Saturday 6 July 2019

StockAnalyzer: Validating the Raw Data - Broken Input Data and Missing Report Date

My stock project is progressing, and I'm currently adding fixes for flaws in the input data that I want to store in a SQL database. In my previous blog post, I added a fix for records that have no information about the stock price.

In this blog post, I address some more issues:
Input Data Replaced by HTML Tag
The program tries to parse the string ass="BrodTextWhite" width="12">? as a float. It looks like the tail of an HTML tag attribute, and I don't know why it has replaced the data.

I will handle this by simply removing the bad record. I haven't seen this for other stocks, and the string contains no information from which the missing data could be recovered.
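A minimal sketch of the "drop the bad record" rule, written in Python for illustration. The record layout (a dict mapping field names to raw strings), the field names and the sample values are hypothetical, not the actual StockAnalyzer code.

```python
# Hypothetical numeric fields in one scraped record.
NUMERIC_FIELDS = ("price", "eps", "dividend")


def is_parseable_float(text: str) -> bool:
    """Return True if text can be converted to a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def drop_broken_records(records: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep only records whose numeric fields all parse as floats."""
    return [r for r in records
            if all(is_parseable_float(r[f]) for f in NUMERIC_FIELDS)]


# Hypothetical sample: the second record has HTML debris instead of a price.
records = [
    {"date": "2018-06-13", "price": "180.5", "eps": "7.1", "dividend": "3.2"},
    {"date": "2018-06-12", "price": 'ass="BrodTextWhite" width="12">?',
     "eps": "7.1", "dividend": "3.2"},
]
clean = drop_broken_records(records)
print(len(clean))  # 1
```

Note that float("") also raises ValueError, so this check alone would drop records with empty fields as well; that is why empty values need separate handling when the record should be kept.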

Invalid Data for Dividend or Report Dates
Another issue was that some records had an invalid report date. I solved this by checking whether the string can be parsed as a date; if it can't, I pass NULL to the database instead.

Now, the database has a NULL value when the date is missing.
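A Python sketch of the same idea, assuming dates arrive as YYYY-MM-DD strings (the real input format is an assumption here). It uses the standard library's sqlite3 in memory as a stand-in for the project's SQL database; the table, column and stock names are hypothetical.

```python
import sqlite3
from datetime import date, datetime
from typing import Optional


def parse_date_or_none(text: str) -> Optional[date]:
    """Return the parsed date, or None if text is not a valid YYYY-MM-DD date."""
    try:
        return datetime.strptime(text.strip(), "%Y-%m-%d").date()
    except ValueError:
        return None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (stock TEXT, report_date TEXT)")

# Hypothetical raw values: one valid date, one impossible date, one empty field.
for stock, raw in [("XYZ", "2018-07-12"), ("XYZ", "2018-02-30"), ("XYZ", "")]:
    parsed = parse_date_or_none(raw)
    value = parsed.isoformat() if parsed is not None else None
    # sqlite3 stores Python's None as SQL NULL.
    conn.execute("INSERT INTO reports VALUES (?, ?)", (stock, value))

rows = conn.execute(
    "SELECT stock, report_date FROM reports ORDER BY rowid").fetchall()
print(rows)  # [('XYZ', '2018-07-12'), ('XYZ', None), ('XYZ', None)]
```

Passing None through a parameterized query lets the database driver write NULL, so no special SQL string is needed for the missing case.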


Empty Profit Margin
With these fixes, the program could scan the records of three stocks covering almost five years (2013-07-28 to 2018-06-13). For AAK, however, the program tries to parse an empty string as a number.

Several values are empty: Earnings per share, Capital per share, Dividend per share and Profit Margin. I can recover all of them except the Profit Margin, which requires information about sales in relation to production costs. Since the rest of the record is still important, I will add the stock information but set the profit margin to NULL.
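One way to keep such a record, sketched in Python: treat an empty field as missing (None, which becomes NULL in the database) while still rejecting non-empty garbage. The field names and values are hypothetical.

```python
from typing import Optional


def parse_optional_float(text: str) -> Optional[float]:
    """Map an empty or whitespace-only field to None; otherwise parse it as a float.

    A non-empty, non-numeric string still raises ValueError, so genuinely
    broken input is not silently turned into NULL.
    """
    if text.strip() == "":
        return None
    return float(text)


# Hypothetical record with an empty profit margin.
raw = {"price": "180.5", "profit_margin": ""}
row = {field: parse_optional_float(value) for field, value in raw.items()}
print(row)  # {'price': 180.5, 'profit_margin': None}
```

Keeping "empty" and "unparseable" as separate cases matches the two rules above: empty values are stored as NULL, while HTML debris still causes the record to be rejected.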

Populating Missing Data?
I will create functions that fill in missing values when related data in the record can be used to calculate them.

For the AAK record of 2018-06-13, the Earnings per share is missing.
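As an illustration only, here is a Python sketch of such a fill-in function. It assumes the record also holds the share price and the P/E ratio (an assumption, not something stated above) and uses the definition P/E = price / EPS, so EPS = price / (P/E). The field names and numbers are hypothetical.

```python
from typing import Optional


def fill_eps(record: dict[str, Optional[float]]) -> dict[str, Optional[float]]:
    """Return a copy of record with a missing EPS derived as price / (P/E).

    The gap is only filled when price and a non-zero P/E are present;
    otherwise EPS stays None (NULL in the database).
    """
    filled = dict(record)
    price, pe = record.get("price"), record.get("pe_ratio")
    if filled.get("eps") is None and price is not None and pe:
        filled["eps"] = price / pe
    return filled


print(fill_eps({"price": 200.0, "pe_ratio": 25.0, "eps": None}))
# {'price': 200.0, 'pe_ratio': 25.0, 'eps': 8.0}
```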

I will handle this in two steps in my next blog post:
First, I'll add try-catch clauses to handle errors when parsing strings as floats.
After that, I'll add functions that verify and populate null entries among the key figures.
