Saturday 17 August 2019

StockAnalyzer: Handling Stock Name Mismatches

One flaw in the earlier versions of my web scraper was that it used hard-coded links when retrieving the stock data. The links were built from hard-coded numbers, and I needed a safeguard to detect whether those numbers had changed.

The program took the first name ("Eniro") as the stock name. It used the corresponding numbers (869, 2398 and 46100341) to build three URL strings for fetching the stock data.

For that reason, I stored the stock name as the first two entries of each record: the first name is the stock that I wanted to fetch, and the second name is the stock that was actually collected. Ideally the two names match, but the collected data contains several mismatches.

I need to verify the stock records before adding them to the database. One check is whether the first string is "---Void---". That string is a placeholder, and any record that carries it must be discarded.
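The placeholder check can be sketched in a few lines. This is only an illustration in Python, not the code from my program: the record layout (a list whose first two entries are the wanted and the collected name) and the helper names is_placeholder and drop_placeholders are assumptions made for the example.

```python
# Placeholder string taken from the post; everything else is illustrative.
VOID_MARKER = "---Void---"


def is_placeholder(record):
    """Return True when the record's first entry is the placeholder string."""
    return len(record) > 0 and record[0] == VOID_MARKER


def drop_placeholders(records):
    """Keep only the records that are not placeholders."""
    return [record for record in records if not is_placeholder(record)]


# Assumed layout: [wanted name, collected name, ...scraped values].
records = [
    ["Eniro", "Eniro"],
    ["---Void---", ""],
]
kept = drop_placeholders(records)
```

Running the filter before any other validation keeps the later name checks from having to deal with empty placeholder rows.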


Further, I need to check whether the stock data I have really matches the stock name. To do this, I'll use an XML file that contains three lists:
  • White List for the allowed combinations of names,
  • Grey List for the combinations of names that haven't been verified by the user yet, and
  • Black List for the illegal combinations of names (these records will be ignored by the program).
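To make the three lists concrete, here is a rough sketch of how such a file could be read and used to classify a name pair. It is written in Python for brevity and is not the program's actual code. The element and attribute names (NameLists, WhiteList, GreyList, BlackList, Pair, wanted, collected) and the sample names other than "Eniro" are made up for the example, and returning "unknown" for a pair that is on none of the lists is my own assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout; the real XML schema is not shown in this post.
SAMPLE_XML = """
<NameLists>
  <WhiteList>
    <Pair wanted="Eniro" collected="Eniro" />
  </WhiteList>
  <GreyList>
    <Pair wanted="Example A" collected="Example A Ltd" />
  </GreyList>
  <BlackList>
    <Pair wanted="Example A" collected="Example B" />
  </BlackList>
</NameLists>
"""

LIST_NAMES = ("WhiteList", "GreyList", "BlackList")


def load_lists(xml_text):
    """Parse the XML text into a dict of list name -> set of (wanted, collected)."""
    root = ET.fromstring(xml_text.strip())
    lists = {}
    for list_name in LIST_NAMES:
        node = root.find(list_name)
        pairs = set()
        if node is not None:
            for pair in node.findall("Pair"):
                pairs.add((pair.get("wanted"), pair.get("collected")))
        lists[list_name] = pairs
    return lists


def classify(wanted, collected, lists):
    """Return 'black', 'white', 'grey' or 'unknown' for a name combination."""
    pair = (wanted, collected)
    if pair in lists["BlackList"]:
        return "black"  # illegal combination: ignore the record
    if pair in lists["WhiteList"]:
        return "white"  # allowed combination: safe to store
    if pair in lists["GreyList"]:
        return "grey"   # waiting for the user to verify
    return "unknown"    # assumption: not seen before


lists = load_lists(SAMPLE_XML)
status = classify("Eniro", "Eniro", lists)
```

In this sketch the Black List is checked first, so a pair that ends up on more than one list by mistake is still rejected.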


A lesson learnt from this is to make sure that the scraped data is already in good shape at the time of scraping, rather than repairing it afterwards. The next blog post will cover some XML topics for CSharp.
