Saturday, 15 June 2019

StockAnalyzer: Selecting the Input Files

Now, it is time to interpret the input files. They are named in the form "YYYYMMDD.csv" and contain a header row that describes the contents.

I need to consider some possible issues for the input data:
  • The file headers have slightly different formats, since I added some information in later versions.
  • The early versions of StockReader had hard-coded references to the stocks. As a result, some records use slightly different names for the same stock. Sometimes the companies changed names, too. I need to add checks for the stock names, and possibly a second table that maps the different names to the same stock.
  • Sometimes, the data acquisition was interrupted.
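The name-mapping idea from the list above could look roughly like this. It is a minimal sketch in Python, used here only for illustration; the alias table, the stock names in it, and the function name are hypothetical and not taken from StockAnalyzer.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
# The names below are placeholders, not real records from the input files.
STOCK_ALIASES = {
    "Example AB": "Example",
    "Example AB ser. B": "Example",
    "Old Example Holding": "Example",
}


def canonical_stock_name(raw_name, aliases=STOCK_ALIASES):
    """Return the canonical name for a stock, or the cleaned name if unknown."""
    # Trim the name and collapse repeated whitespace before the lookup.
    cleaned = " ".join(raw_name.split())
    return aliases.get(cleaned, cleaned)
```

Unknown names pass through unchanged, so they can be logged and added to the table later.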

I'll start by selecting which files to analyze, based on the file names and the time interval that the user has specified in the dateTimePickers.

To do that, I use a built-in function that removes the file path and the file extension. The remaining string contains the year, month and day.
For now, I process only one file.
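Stripping the path and extension can be sketched as follows. This is an illustrative Python version, not the code used in StockAnalyzer; the function name and the folder in the usage example are assumptions.

```python
from pathlib import PureWindowsPath


def date_part_of(file_path):
    """Return the file name without its folder and extension."""
    # PureWindowsPath accepts both backslashes and forward slashes.
    return PureWindowsPath(file_path).stem
```

For a file such as `C:\data\20101110.csv` (a hypothetical location), this yields the string `20101110`.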
Next, I want to check whether the corresponding date is within the time interval. For that purpose, I need to convert a string such as "20101110" to a DateTime value. This is done in three steps:
Step 1: Check the file format - I'm only interested in comma-separated values (CSV) files.
Step 2: Check the date - I'm only interested in files whose names correspond to plausible dates.
Step 3: Convert the string to a DateTime value and check whether it lies between the start and end dates that the user has specified.
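The three steps above can be sketched like this. It is a Python illustration under my own assumptions: the function name is hypothetical, and plain `date` values stand in for what the dateTimePickers return.

```python
from datetime import date, datetime
from pathlib import PureWindowsPath


def file_date_in_range(file_path, start, end):
    """Return the file's date if it passes all three checks, else None."""
    path = PureWindowsPath(file_path)

    # Step 1: only comma-separated values files are of interest.
    if path.suffix.lower() != ".csv":
        return None

    # Step 2: the name must be eight digits that form a real calendar date.
    stem = path.stem
    if len(stem) != 8 or not (stem.isascii() and stem.isdigit()):
        return None
    try:
        file_date = datetime.strptime(stem, "%Y%m%d").date()
    except ValueError:  # e.g. month 13 or day 40
        return None

    # Step 3: the date must lie within the user's interval (inclusive).
    return file_date if start <= file_date <= end else None
```

Returning the parsed date instead of a plain yes/no means the caller does not have to parse the file name a second time.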
If all these checks are OK, the program will analyze the file. I'll do that in the next blog post.

Special Characters - ÅÄÖ
Since I am collecting data from Swedish stocks, I need to handle strings with Swedish characters. I solved that by adding a StreamReader object that is configured for the Windows-1252 encoding, which includes the Swedish characters.
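The same idea, reading a file with an explicit Windows-1252 encoding, looks like this in a short Python sketch. The function name is hypothetical, and the comma delimiter is assumed from the file format described earlier.

```python
import csv


def read_rows(file_path):
    """Read a CSV file as Windows-1252 text and return its rows as lists."""
    # "cp1252" is Python's codec name for Windows-1252.
    # newline="" lets the csv module handle line endings itself.
    with open(file_path, newline="", encoding="cp1252") as handle:
        return list(csv.reader(handle))
```

Without the explicit encoding, characters such as Å, Ä and Ö may be decoded incorrectly when the platform default differs from the encoding the files were written in.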

As a side note, I added a UNIQUE constraint to the database, so there can be only one record per company for a particular date.
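A constraint of this kind can be sketched as follows. The post does not say which database engine is used, so SQLite via Python's `sqlite3` module is an assumption, and the table and column names are hypothetical.

```python
import sqlite3


def create_table(conn):
    """Create a table that allows one record per company and date."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stock_record (
            company     TEXT NOT NULL,
            record_date TEXT NOT NULL,  -- stored as YYYYMMDD
            price       REAL,
            UNIQUE (company, record_date)
        )
        """
    )


def insert_record(conn, company, record_date, price):
    """Insert a record; return False if it would duplicate an existing one."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO stock_record (company, record_date, price) "
                "VALUES (?, ?, ?)",
                (company, record_date, price),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint is violated.
        return False
```

With the constraint in place, analyzing the same file twice cannot create duplicate rows.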
