In this blog post, I address some more issues:
Input Data Replaced by HTML Tag
The program tries to parse the string ass="BrodTextWhite" width="12">? as a floating-point number. It looks like a fragment of an HTML tag, and I don't know why it has replaced the data.
I will handle this by simply removing the bad record. I haven't seen this problem for other stocks, and the string contains no information that could be used to recover the missing data.
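A minimal sketch of dropping such records, written in Python for illustration only. The function names, the dictionary-based record layout and the `price` field are assumptions, not the program's actual code.

```python
from typing import Dict, List, Optional


def parse_float_or_none(text: str) -> Optional[float]:
    """Return text as a float, or None if it is not a valid number."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def drop_bad_records(records: List[Dict[str, str]], field: str) -> List[Dict[str, str]]:
    """Keep only the records whose `field` can be parsed as a float."""
    return [r for r in records if parse_float_or_none(r[field]) is not None]


# Hypothetical input: one good record and one where an HTML fragment
# has replaced the number.
records = [
    {"price": "12.5"},
    {"price": 'ass="BrodTextWhite" width="12">?'},
]
clean = drop_bad_records(records, "price")
```

Only the first record survives in `clean`; the one holding the HTML fragment is discarded.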
Invalid Data for Dividend or Report Dates
Another issue was that some records had an invalid report date. I solved this by checking whether the string can be parsed as a date. If it can't, I store NULL in the database instead.
As a result, the database now holds a NULL value wherever the date is missing.
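The date check could be sketched like this in Python. The `YYYY-MM-DD` format, the table layout, the in-memory SQLite database and the sample rows are all assumptions made for the example.

```python
import sqlite3
from datetime import date, datetime
from typing import Optional


def parse_date_or_none(text: str, fmt: str = "%Y-%m-%d") -> Optional[date]:
    """Return text as a date, or None if it does not match fmt."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None


# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (stock TEXT, report_date TEXT)")

for stock, raw_date in [("EXAMPLE", "2018-06-13"), ("EXAMPLE", "n/a")]:
    parsed = parse_date_or_none(raw_date)
    # Passing None as a parameter makes SQLite store NULL.
    value = parsed.isoformat() if parsed is not None else None
    conn.execute("INSERT INTO report VALUES (?, ?)", (stock, value))

rows = conn.execute("SELECT report_date FROM report ORDER BY rowid").fetchall()
```

The second row ends up with NULL in `report_date`, which Python reads back as `None`.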
Empty Profit Margin
I was now able to scan the records for three stocks covering almost five years (2013-07-28 to 2018-06-13). For AAK, however, the program tries to parse an empty string as a number.
Several values are empty: Earnings per share, Capital per share, Dividend per share and Profit Margin. I can recover everything except the Profit Margin, which requires information about sales in relation to production costs. Since the record is still important, I will add the stock information but set the profit margin to NULL.
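Keeping the record while turning empty values into NULL might look like the following Python sketch. The field names and the dictionary layout are hypothetical, and the input numbers are made up.

```python
from typing import Dict, Optional

# Hypothetical names for the four key numbers mentioned above.
KEY_FIELDS = (
    "earnings_per_share",
    "capital_per_share",
    "dividend_per_share",
    "profit_margin",
)


def parse_key_numbers(raw: Dict[str, str]) -> Dict[str, Optional[float]]:
    """Parse each key number; empty or invalid strings become None."""
    parsed: Dict[str, Optional[float]] = {}
    for name in KEY_FIELDS:
        text = raw.get(name, "").strip()
        try:
            parsed[name] = float(text)
        except ValueError:
            # float("") raises ValueError, so empty values end up here.
            parsed[name] = None
    return parsed


# Made-up record with an empty profit margin.
record = parse_key_numbers({"earnings_per_share": "3.2", "profit_margin": ""})
```

Unlike removing the record, this keeps the values that could be parsed and leaves the others as `None`, so they can be written as NULL and filled in later where possible.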
Populating Missing Data?
I will create functions that fill in missing data wherever other data in the record makes it possible to calculate the missing value.
For the AAK record of 2018-06-13, the Earnings per share is missing.
I will handle this in two steps (next blog post):
First, I'll add try-catch clauses to handle errors when parsing strings to floats.
After that, I'll add functions to verify and populate null entries of the key numbers.