Saturday 31 October 2020

IOT: A New Project

For the coming year, I will focus on IOT - the Internet of Things. I have bought an Arduino starter kit (a microcontroller with some basic electronic components such as a motor, diodes, buttons, potentiometers and an LCD display). Together with a Raspberry Pi computer that I got as a birthday gift, I have something to start exploring.
My first project on my own - a simple traffic light.

I'll start by learning the basics - I'll follow the examples in the "Arduino Projects Book" and bring up a Raspberry Pi headless (no mouse, keyboard or monitor connected) with basic security and connectivity.
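The traffic-light logic itself can be separated from the hardware. Here is a minimal sketch of the state sequence in Python; the states and timings are placeholder values, and on the Raspberry Pi the same sequence could drive GPIO outputs (for example via the RPi.GPIO library):

```python
from itertools import cycle

# Placeholder states and durations (seconds) for a simple traffic light.
SEQUENCE = [
    ("red", 4.0),
    ("red+yellow", 1.0),
    ("green", 4.0),
    ("yellow", 1.0),
]

def traffic_light(steps):
    """Return (state, duration) tuples for the given number of steps."""
    states = cycle(SEQUENCE)
    return [next(states) for _ in range(steps)]
```

Keeping the sequence in plain data like this makes it easy to test the logic on a laptop before wiring up any LEDs.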

The long-term vision is a remotely controlled vehicle with a web server that can take pictures and stream video. Maybe I'll also add some sensors to log temperature.

The initial project plan looks like this:
Green means "done",
Yellow means "work in progress",
Blue means "not started".

The main obstacle to a fully mobile mini-vehicle is charging the battery without connecting it manually. I'll look into that later on.

Saturday 24 October 2020

StockPredictor: Using Neural Networks to Try to Predict Stock Values

With the verified training data, I've been trying to predict stock performances, given some key numbers. 

I'm using the scikit-learn toolbox for Python, and the initial finding is that the neural network hasn't been able to predict the stock performances. The correlation between the predicted values and the target values is close to zero.

Findings
I've run several iterations with different parameters, like alpha (the regularization strength), random state, batch size, and whether to use historic data or not. They all point to the same conclusion - there is no easily captured pattern for the neural network to find.

These findings support the efficient-market hypothesis - past information is already priced into the current stock prices.

The distribution of the result data is hard to model with a neural network. Most of the values are small, but a few outliers generate large losses for the training function.

The result distribution: 95% of the results are in the range -0.0143 to 0.0166, but some outliers are much larger.
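To illustrate why the outliers matter so much, here is a small numerical example with made-up numbers in the same ballpark as the distribution above. With a squared-error loss, one extreme return contributes more to the loss than all the typical-sized returns combined:

```python
import numpy as np

# Made-up returns: 95 values near the typical range, plus one extreme outlier.
typical = np.full(95, 0.015)
outlier = np.array([1.5])
errors = np.concatenate([typical, outlier])

# Mean squared error with and without the outlier.
mse_typical = np.mean(typical ** 2)
mse_all = np.mean(errors ** 2)
print(mse_typical, mse_all)
```

The single outlier dominates the loss, so the training function spends most of its effort on points the model probably cannot predict anyway.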


For reference, I trained a neural network on a dataset that ships with sklearn (load_diabetes). That network found a fairly strong correlation between predictions and results (75%), which tells me that I know how to set up and use data sets for machine learning.
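A sanity check along those lines can be set up in a few lines. The hyperparameters below are illustrative choices, not the exact ones used in the project:

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Train an MLP on a known dataset to verify the overall setup works.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)
model.fit(X_train, y_train)

# Correlation between predictions and true values on held-out data.
corr = np.corrcoef(model.predict(X_test), y_test)[0, 1]
print(f"correlation: {corr:.2f}")
```

If this reference run shows a clear correlation while the stock data shows none, the problem is in the data, not in the pipeline.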

Summary of the project and the future
I have no plans to continue the project at this moment. The web scraper will keep recording stock records and I'll update the documentation.

Lessons learned:
  • Verify the data early in the project!
  • Use standard schemes for saving the data - preferably CSV, and/or a format that can easily be imported into a database.
  • Add checks for inconsistent data that notifies the operator.
Unplanned future work:
  • Map the results onto a more machine-learning-friendly distribution. Maybe a logarithmic scale?
  • Iterate over different prediction windows. Currently only 7 days is used.
  • Use other algorithms, like K-nearest neighbor.
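One candidate for the logarithmic mapping is a signed-log transform. This is only a sketch of the idea: it compresses the outliers while preserving the sign and ordering of the returns.

```python
import numpy as np

def signed_log(x):
    """Compress heavy tails while keeping sign and order of the values."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(np.abs(x))

# Example values spanning the typical range plus two outliers.
returns = np.array([-2.0, -0.0143, 0.0, 0.0166, 1.5])
mapped = signed_log(returns)
```

Small values pass through almost unchanged, while the outliers shrink, which should make the loss less dominated by a handful of extreme points.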
For the rest of the year, I'll explore the Internet of Things (IOT) with Arduino and Raspberry Pi. The train project will continue in 2021.

Saturday 10 October 2020

StockPredictor: Exploring the MLPRegressor

I want to predict stock prices using existing data. Optionally, I'll convert the problem into a classification problem ("to buy or not to buy"). With the training data in place, I'll explore the Python package scikit-learn.
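The classification variant could look like this. The 1% threshold and the helper name are arbitrary examples, not values from the project:

```python
import numpy as np

def to_buy_labels(returns, threshold=0.01):
    """Label an example 1 ("buy") if its future return beats the threshold."""
    return (np.asarray(returns) > threshold).astype(int)

labels = to_buy_labels([-0.02, 0.005, 0.03, 0.012])
```

With labels like these, the same features could be fed to a classifier (e.g. MLPClassifier) instead of a regressor.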

This article discusses some machine learning algorithms that can be used to predict a numerical value. Neural networks and K-nearest neighbors seem promising. I suspect non-linear relations between features and results, so I don't expect linear regression to work. However, I'll give it a try later.

Multilayer Perceptron Regressor

I based my program on the examples above, with my data from StockReader.
After training the neural network on ~210 000 examples, I test it on ~50 000 examples. I compare how the neural network is doing against a dart-throwing monkey (a simple benchmark that predicts each stock's performance as the average daily price increase of the stocks).
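The monkey benchmark is simple to compute. The numbers below are synthetic stand-ins for the real dataset, just to show the mechanics:

```python
import numpy as np

# Synthetic daily returns standing in for the real training/test data.
rng = np.random.default_rng(0)
train_returns = rng.normal(0.0005, 0.01, size=1000)
test_returns = rng.normal(0.0005, 0.01, size=200)

# The "monkey" predicts the training-set average for every test example.
baseline = np.full_like(test_returns, train_returns.mean())
monkey_mae = np.median(np.abs(baseline - test_returns))
print(f"benchmark median absolute error: {monkey_mae:.4f}")
```

Any model worth keeping has to beat this constant prediction on the same error metric.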

The initial output from training the neural network and verifying it on the test data shows no improvement over the monkey: the median absolute error for the neural network was 0.095, compared to the benchmark's 0.064.

The next step will be to investigate other parameters for the neural network, along with other algorithms. Since I will compare different models, I need to introduce a cross-validation set. This is important, since the model selection itself must be verified on data the selection never saw.
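A three-way split is one way to set this up: the validation set picks between models, and the test set is only used to report the final score. The set sizes below are examples based on the counts mentioned above:

```python
import numpy as np

# Shuffle the example indices and carve out train/validation/test sets.
rng = np.random.default_rng(1)
n = 260_000
indices = rng.permutation(n)

train_idx = indices[:210_000]        # fit the models
val_idx = indices[210_000:235_000]   # choose between models
test_idx = indices[235_000:]         # report the final score once
```

scikit-learn's `train_test_split` (applied twice) or `cross_val_score` could achieve the same thing with less manual bookkeeping.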


Useful link: