Saturday, 26 September 2020

StockPredictor: Adding Past Values to Training Data

In my training data for the predictions, I use the following data to predict the changes in stock prices:

Stock Price, P/E, P/C, Yield, Price Margin, RSI and the number of days to dividend. 

I want to be able to add historic key numbers as well and I'll specify that using bit masks,

PriceP/EP/CYieldRSIPMRSI
Days to Dividend
Todayxxxxxxxx
One week agoxxx
Two weeks agoxxx
Three weeks agoxxxx
Code (binary)
11110101100111011011000100010011
Code (Hex)F59DB113

For simplicity, if one parameter is required two weeks ago, the script that generates the training data will add all parameters two weeks ago. The selection of parameters will happen in the prediction script. 

There won't be matching historic dates for all stock records in the database. For the historic dates, I'll try the calculated date +/- one day.
If the current stock record date is on the 27th, I'll search for the one-week-old record on the 20th. If that date can't be found, I'll search for the 19th and 21th. If neither can be found, the program will give up fetching records for that date.

This means that the training set will be some 50% smaller. Removing half of the training set in order to train the neural network on what happened to stocks over the last weeks can, in theory, bias the underlying data. Assumption: I assume that the performance of the recorded stocks are independent on when I recorded the training data.

In the network trainer, the data that isn't used will be removed from the training set. The script will drop the columns that I don't need according to the bitmask.

The next step is to evaluate some machine learning algorithms on the data.

Saturday, 12 September 2020

StockPredictor: Selecting Algorithm and Some Design Considerations

This is the third part of the stock project, where I will use the validated data to train a machine learning system to predict the performance of a stock based on the data that was available at the date of the stock record.

The first step is to make some design considerations for the algorithm. The earliest program will only look at the current stock record for features (X) and the results (y) will be the daily change of the stock.

If I select the prediction window to one week, one example of features and results would be:


The results from a query in the database will look like this:

The X data can consist of values of P, P/E, P/C, Yield, PMI, RSI and the time to/since the last dividend. The y data for the stock record of 2018-10-31 will be the change (%) divided by the days: 
The daily increase of ABB's stock price was 0,054% during one week in November, 2018.
It would have been more mathematically correct to take the seventh
root of the quota, but this time I'll prefer simplicity.

For training data to be useful, there must be a price record in the future that can be used ad y.

I'm saving the data to a huge panda dataframe (the process of populating it from the database (with some necessary processing) takes ~35 minutes. To avoid repeating this cumbersome process every time, I save the dataframe to a csv file, that can be used directly to train the machine learning algorithm. Loading that file takes less than a second.

The Data Set
I suspect that some of the data are highly correlated to each other. For example, Price, P/E and P/C are correlated in the short-term, since earnings and capital per share seldom changes. 

Sometimes, the data contains zeros or null values. 

Regression or Neural Networks

Both regression and neural networks has some pros for this dataset. 

Linear regression

A linear regression would be intuitive for predicting the change in stock price. But there are some zeros in the underlying data that I fear will skew the results. It also seems to be tricky to handle XOR relationships between features. 

Multi-Layer Neural Networks can model complex relationships such as XOR relations and zeroed records. I'll start with this one initially. The

SciKit-Learn

Sklearn has a neural-network-ish regressor that I will investigate.