Cute Trains and Wide Dreams: June 2019

Saturday, 29 June 2019

StockAnalyzer: Validating the Raw Data - Missing Price

In my last blog post, I was able to add data from July 2013 and later. Before that, the data has a different format and I will handle that soon.

I have seen that I will need to check the data before adding it to the database. Sometimes, data is missing and sometimes, the data comes in a different shape.

I will discuss the issues when I see them.

Missing Price Information:

The web site that I use displays the price (the price of the last transaction) if there has been any transactions that day. If no transactions has happened that day, the price field is empty.

If that number is missing, the parameter "P" will have the value null. However, it is always possible to calculate the price by multiplying P/E by earnings per share (E). I'll illustrate with a real-world example:

On October 9th, 2014, the trade in the Swedish company Eniro was halted by the stock exchange in Stockholm and the Swedish Financial inspection. For that day, all trade was halted for Eniro and there was no price quote for that day.

The stock price number is corresponding to Senast (Latest). Screen dump from June 2019.

The screen shot above shows the web pages that I'm scraping. Most of the times, the fields are populated with data, but there are some exceptions. In the case of Eniro, the dividend (utdelning) and yield (direktavkastning) are omitted since there is no yield for that company. The webscraper interprets those fields to be zero.

Price is available for Eniro (10,19 SEK per share). The price corresponds to the the P/E (1,84) and Earnings per share (5,52) since 1,84 times 5.52 equals 10,16 SEK.

On October 9th, 2014, price info was omitted. It is still possible to estimate the price using:

Price/Earnings rating, the Earnings per share and also
Capital per share and the Price per capital:

On October 9th, 2014, no trades were done for Eniro.
As a consequence of that, there is no record of stock price for that date.
Multiplying P/E by E gives 6,77 SEK.

I expect this to happen very seldom, but I will add a row to the database that indicates when the price information is missing. The neural network will tell whether a missing price information will predict anything in the future.

Coding:

After fetching the stock record, all data are represented O as a string list. In this case, the string for price is empty, and string2Float thus returns null.

I added a function inside the method for scanning the stock record. That function is in the same scope as the other variables and can modify them.

If P is null, P is recovered using P/E and E and a boolean parameter is set to indicate that the parameter was missing from the raw data.

I have updated the database to include a bit telling whether the price information was missing in the stock record.

Now, the price is added along with an indicator of missing price. "1" indicates that the price is missing.

Saturday, 22 June 2019

StockAnalyzer: Scanning the Stock Files for Data

Now, I will start to interpret the data in the input files (YYYYDDMM.csv) and generate SQL commands based on that information.

I will start by analyzing the headers of all the data files.

I collect all unique headers to a stringlist and print the contents to the console.

There is plenty of data but I will only use some of it. The data that is common for all three different kinds of versions are:

Name,
Price (P),
Earning per share (E),
P/E - This is redundant and can be calculated from P and E. I'll use this to verify P, E and P/E values
Annual Dividend (D)
Stock Yield (D/P) - This is redundant and can be used to verify P, D and D/P
Capital per Share (C)
Price/Capital (P/C) - This is redundant and can be used to verify P, C and P/C
RSI - Indicator of volatility
TA - Technical Analysis. This information was available for the first years.
Date for Dividend - Only available for later stocks
Date for report - Only available for later stocks

I will start by scanning one file and extract the values.

The first line is the header. If that header string hasn't been detected before, the program will add that header to a list of headers.

The other lines contains the stock data and I'll parse them to parseRecordToDatabase that extracts the relevant data and creates a SQL query that adds the record to the database.

I will use a switch case to identify the different headers.

I will split the each line of the file into the different data points and do some initial integrity checks of the data.

After the checks, I will create a SQL query string to enter the data to the database. I'll parse that string to my database handler.

Next steps:

Add counters for the number of records that I was able to decode and add to the database
Add two case clauses for the other two types of input data
Add checks for the input data for errors
Analyze a file only if the corresponding date doesn't exist in the database.
Refactor and clean up code

Saturday, 15 June 2019

StockAnalyzer: Selecting the Input Files

Now, it is time to interpret the input files. They are named in the form "YYYYMMDD.csv" and contain a header row that describes the contents.

I need to consider some possible issues for the input data:

The file headers has slightly different formats since I have added some information in the later versions.
The early versions of StockReader had hard coded references to the stocks. Thus, some stock records has sometimes slightly different names of the stocks. Sometimes, the companies changed names too. I need to add checks for the stock names, and possibly add a second table that maps different names to the same stocks.
Sometimes, the data acquisition was interrupted.

I'll start by selecting which files to analyze, based on the file names and the time interval that the user has specified in the dateTimePickers.

In order to do that, I use a built in function for removing the file path and the file extension. The remaining string contains the year, month and date:

Now, I process only one file.

Now, I want to check whether the corresponding date is within the time interval. For that purpose, I need to convert the string 20101110 to a datetime format. This is done in three steps:

Step 1: Check the file format - I'm only interested in comma separate value files.
Step 2: Check the date - I'm only interested in files that are corresponding to plausible dates.
Step 3: Convert the date to dateTime format and check whether the date is between the start and end date that the user has specified.
If all these checks are OK, the program will analyze the file. I'll do that in the next blog post.

Special Characters - ÅÄÖ
Since I am collecting data from Swedish stocks, I will handle strings with Swedish characters. I solved that by adding a streamReader object that is configured for the Windows 1252 encoding that contains Swedish characters.

As a side note, I added an UNIQUE constraint to the database. There can be only one record for one company at a particular date.

Saturday, 8 June 2019

StockAnalyzer: Adding Code

Now, it is time to add some code to the program.

Database
The database is controlled from dbParser.
I added some code for printing the contents of the database to the console and also code for clearing all data from the database.

Code for printing contents of database to console.

Code for deleting all records of the database.
The first step is to delete all records from the StockTable.
The second step is to set the identity counter to zero.

Scanning for Files
I have saved my input data in a local folder on my computer. Thus, it is convenient for me to have that folder as the default folder. However, I want to avoid assuming that that folder exists - someone might fork my repo and install it on another system.

To handle this, the program will check whether my folder exists. If it doesn't, it will use the user's document folder instead.

I'll clean up this form soon.

When the user presses the "Scan to Database" button, the program will scan for the files in the class scanFolder. First, I need a list of all files in the current folder. Once I have that file list, I will iterate over it and scan only the files that are in the time interval.

In the next blog post, I'll make the program select which files to analyze and also start to add records to the database.

Saturday, 1 June 2019

StockAnalyzer: Creating a Structure of the Program

I've been able to populate my database with some dummy data and to read it from the database. That's an important step and now, I need to define the overall structure of the program.

Use Cases identifies some scenarios where an user is using the system:

User wants to delete all stock records from the database
User wants to scan all available files and add the data to the database
User wants to scan all files within a specified time range
User wants to update the database with the latest results from StockReader

The program will have three different modules:

User Interface - Collects the parameters and call methods/functions
FileScanner - Select which csv files to scan and determine whether the data in a row contains enough information so that it can be added to the database.
DBParser - Build SQL queries and launch them. Database error handling.

Now, I have a rough idea what the program should look like. I have added a couple of user interface widgets to my app. The code is stubbed right now:

I will clean up this form later.

In the "Time Interval to Scan" group box, the user will be able to limit the number of files that shall be loaded into the database.

In the "Directory to Scan" group box, the user will be able to select where the input files are. The text of the button shows the current folder.

The user will be able to clear the database and scan all files that matches the dates in the folder. The algorithm will look like this:

List all csv files in the current folder.
For all files that are matching the time interval:

Check the headers in order to know how to interpret the data.
For all rows in the file:

If the needed data is there for that stock, add the information to the database

I have also created two classes: FileScanner and DBParser. Those are instantiated when the program starts.

In the next blog post, I'll fill the classes and widgets with code.