With the verified training data, I've been trying to predict stock performance from a few key numbers.
I'm using the scikit-learn toolbox for Python, and the initial finding is that the neural network hasn't been able to predict stock performance: the correlation between the predicted values and the target values is close to zero.
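As a rough illustration of the metric, here is a minimal sketch of how the correlation between predictions and targets can be computed. It assumes the Pearson correlation is meant; the helper name `prediction_correlation` and the random data are made up for the example and are not the project's code or data.

```python
import numpy as np


def prediction_correlation(y_true, y_pred):
    """Pearson correlation between target values and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # A constant series has zero variance, so the correlation is undefined.
    # Report 0.0 in that case (a choice made for this sketch).
    if y_true.std() == 0 or y_pred.std() == 0:
        return 0.0
    return float(np.corrcoef(y_true, y_pred)[0, 1])


# Synthetic illustration only: targets and "predictions" that are unrelated.
rng = np.random.default_rng(0)
targets = rng.normal(0.0, 0.01, size=1000)
unrelated = rng.normal(0.0, 0.01, size=1000)

print(prediction_correlation(targets, unrelated))              # near zero
print(prediction_correlation(targets, 2 * targets + 0.001))    # perfectly correlated
```

A value near zero means the predictions carry no linear information about the targets, which is the situation described above.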
Findings
I've run several iterations with different parameters, like alpha (learning rate), random state, batch size and whether to use historic data or not. They all point to the same conclusion - there is no easy-to-catch pattern that the neural network can find.
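Such a parameter sweep could look roughly like the sketch below. It is not the code I ran: the data is random placeholder noise, and the network size and grid values are assumptions. Note that in scikit-learn's `MLPRegressor`, `alpha` is the L2 regularization strength and the step size is `learning_rate_init`, so the sketch varies both.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Short training runs may not converge; silence that warning for the sketch.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Placeholder data: 5 "key numbers" per row and a pure-noise target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = rng.normal(scale=0.01, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hypothetical grid values, chosen only to keep the example fast.
grid = ParameterGrid({
    "alpha": [1e-4, 1e-2],
    "learning_rate_init": [1e-3, 1e-2],
    "batch_size": [32, 128],
    "random_state": [0, 1],
})

results = []
for params in grid:
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(16,), max_iter=200, **params),
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Guard against constant predictions, where the correlation is undefined.
    corr = float(np.corrcoef(y_test, pred)[0, 1]) if pred.std() > 0 else 0.0
    results.append((params, corr))

for params, corr in results:
    print(f"{corr:+.3f}  {params}")
```

Comparing the correlation across all combinations shows whether any setting does consistently better than the rest, or whether they all hover around zero.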
These findings are consistent with the efficient-market hypothesis: past information is already priced into current stock prices.
The distribution of the result data is hard to model with a neural network. Most of the values are small, but a few outliers generate large losses in the training function.
Figure: The result distribution. 95% of the results are in the range -0.0143 to 0.0166, but some outliers are much bigger.
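To make the outlier problem concrete, the sketch below computes the central 95% range of a data set and the share of the total squared error that comes from the values outside it. The data is a synthetic heavy-tailed stand-in (a scaled Student's t distribution), not the project's results, and comparing against a zero prediction is an assumption made for the illustration.

```python
import numpy as np

# Synthetic heavy-tailed stand-in for the result data.
rng = np.random.default_rng(0)
results = 0.005 * rng.standard_t(df=3, size=10_000)

# Central 95% range: from the 2.5th to the 97.5th percentile.
low, high = np.percentile(results, [2.5, 97.5])
inside = (results >= low) & (results <= high)

# Squared error of a model that always predicts zero, split by
# whether the value lies inside or outside the central range.
squared = results ** 2
outlier_share = squared[~inside].sum() / squared.sum()

print(f"95% of the results lie in [{low:.4f}, {high:.4f}]")
print(f"the remaining 5% cause {outlier_share:.0%} of the squared error")
```

With a squared-error loss, the few values outside the range contribute far more than their 5% share of the data, so they dominate what the training function tries to fit.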
For reference, I trained a neural network on a data set that ships with sklearn (load_diabetes). That network found a fairly strong correlation between predictions and results (75%). This suggests that my setup for training on and evaluating a data set works, so the lack of correlation comes from the stock data rather than from a mistake in the pipeline.
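A reference run of this kind can be sketched as follows. The network size, regularization, split and scaling are assumptions for the example, not the settings I used, so the printed correlation will not necessarily match the figure above.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the inputs, then fit a small network (assumed settings).
network = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), alpha=1e-2,
                 max_iter=2000, random_state=0),
)
# Also scale the target during training; predictions are mapped back
# to the original units automatically.
model = TransformedTargetRegressor(regressor=network,
                                   transformer=StandardScaler())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
correlation = float(np.corrcoef(y_test, predictions)[0, 1])
print(f"correlation on held-out data: {correlation:.2f}")
```

The correlation is measured on held-out rows only, so it reflects how well the network generalizes rather than how well it memorized the training set.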
Summary of the project and the future
I have no plans to continue the project at this moment. The web scraper will keep recording stock records and I'll update the documentation.
Lessons learned:
- Verify the data early in the project!
- Use standard schemes for saving the data. Preferably CSV format, and/or a format that can easily be imported into a database.
- Add checks for inconsistent data that notifies the operator.
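As an illustration of the last two points, here is a minimal sketch of consistency checks on CSV records. The column names (`ticker`, `date`, `price`), the date format and the rules are hypothetical, and the actual notification (e-mail, log file and so on) is left out; the sketch only collects and prints the problems.

```python
import csv
import io
from datetime import datetime


def find_inconsistencies(rows):
    """Return a list of human-readable problems found in the records."""
    problems = []
    seen = set()
    # Data rows start on line 2 because line 1 is the CSV header.
    for line_no, row in enumerate(rows, start=2):
        key = (row.get("ticker"), row.get("date"))
        if key in seen:
            problems.append(f"line {line_no}: duplicate record for {key}")
        seen.add(key)

        try:
            datetime.strptime(row.get("date") or "", "%Y-%m-%d")
        except ValueError:
            problems.append(f"line {line_no}: invalid date {row.get('date')!r}")

        try:
            price = float(row.get("price") or "")
        except ValueError:
            problems.append(f"line {line_no}: price is not a number")
        else:
            if price <= 0:
                problems.append(f"line {line_no}: non-positive price {price}")
    return problems


# Made-up sample with one missing price, one duplicate with a negative
# price, and one impossible date.
sample = """ticker,date,price
ABC,2020-10-01,101.5
ABC,2020-10-02,
ABC,2020-10-02,-3
XYZ,2020-13-01,55.0
"""

problems = find_inconsistencies(csv.DictReader(io.StringIO(sample)))
for problem in problems:
    print(problem)
```

Running checks like these every time the scraper writes data would surface bad records when they appear instead of at training time.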
Unplanned future work:
- Map the results into a more machine-learning-friendly distribution. Maybe a logarithmic scale?
- Iterate over different prediction windows. Currently only a 7-day window is used.
- Use other algorithms, like k-nearest neighbors.
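For the first item, one possible mapping is sketched below. Because the results can be negative, a plain logarithm does not work, so the sketch uses a signed logarithm instead. This is an assumption about what a "logarithmic scale" could mean here, and the function names, the `scale` value and the two large sample values are made up for the illustration.

```python
import numpy as np


def signed_log(x, scale=0.01):
    """Compress large magnitudes while keeping the sign and the order."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x) / scale)


def signed_log_inverse(z, scale=0.01):
    """Map transformed values back to the original scale."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * scale * np.expm1(np.abs(z))


# The two middle values are the 95% bounds quoted in the text; the
# outer two are invented outliers.
returns = np.array([-0.25, -0.0143, 0.0, 0.0166, 0.40])
compressed = signed_log(returns)
restored = signed_log_inverse(compressed)

print(compressed)
print(restored)
```

The network would be trained on the compressed values, and its predictions mapped back with the inverse. The outliers stay the largest values but no longer dwarf the rest.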
For the end of the year, I'll explore the Internet of Things (IoT) with Arduino and Raspberry Pi. The train project will continue in 2021.