Cute Trains and Wide Dreams: AI


Tuesday, 19 August 2025

AI: How May AI Tools Impact Our Thinking

I've spent some time reading and summarizing a couple of papers that discuss how AI is impacting our thinking. After that, I also had an AI agent summarize the papers and discuss my conclusions.

Your Brain on ChatGPT

This paper from MIT compared three small groups of students that were to write essays with different levels of assistance: No assistance at all, Search Engines and Large Language Models.

The term self-efficacy reflects the student's confidence in his/her own ability to learn. The report mentioned that students low in self-efficacy may use LLMs to a larger extent than self-confident students.

Cognitive Load versus Engagement

There is a difference between high-competence and low-competence use of LLMs. High-competence users apply LLMs strategically for active learning, revisiting and synthesizing information to create coherent knowledge structures. This reduces cognitive strain while they remain engaged with the material.

LLM bots such as an Instructor Bot and an Emotional Support Bot can improve performance and reduce stress.

Cons of LLMs: laziness, one single answer compared with web searches, no person-to-person discussions, and more superficial, effortless learning.

The Illusion of Thinking

Large Reasoning Models (LRMs) are LLMs that can perform some kind of "reasoning" in different steps. Apple has investigated some models on simple puzzles. Depending on the task complexity, LLMs are better (simple tasks), LRMs are better (moderately complex tasks) or both collapse (complex tasks).

Put differently, will future employers allow employees to spend several hours understanding complex topics by actually reading about them, or will employees be expected to use LLMs to quickly generate convincing TL;DR results that are more or less factual? The latter option may make employees lazy and worse at problem solving and independent thinking.

Thursday, 17 July 2025

Exploring AI Tools for Coding

As I work in tech, the developments in AI will have a serious impact on my work, even if I'm not a software developer.

My summer project (on the rare occasions when I'm not focusing on my family, house and geopolitics) will be to investigate tools for coding and working in tech using AI tools that are available today. Maybe I'll reboot my train project.

Initial YouTube videos for exploring AI

Of the tools below, I'll focus on GitHub Copilot, Gemini and Cursor. I have some experience with the two former tools. I'll keep an open mindset and try to avoid the inevitable flamewars that come with any new technology. Many of the cool tools will vanish when the AI world enters the next AI winter, so I'll avoid focusing on just a few tools.

This is an overview of some current tools for developers.

Tools to check: React, Express, Tailwind, Reddus and Dino for web development.

Vibe coding fundamentals

Vibe coding is kinda like having a junior developer available that can help with some basic non-perfect coding. Still, there will be need for coding, design thinking and debugging.

This one summarizes the Google AI Essentials course.

Break down complex problems into specific tasks.

Four levels of thinking for vibe coding: Logical, Analytical, Computational and Procedural

Tools to check: Replit, Windsurf, Cursor

Fundamental skills: The Friendly Cat Dances Constantly.

Thinking - have a clear description of problem. PRD - Product Requirement Document
Framework - Help the AI help you find a framework that solves your problem
Checkpoints - use GIT
Debugging
Context

Gemini has some support for advanced research. To be added to my backlog.

Prompt engineering: Tiny Crabs Ride Enormous Iguanas

Task
Context
References
Examples
Iterations

Revisit prompting framework
Separate prompts into shorter sentences
Try different phrasing or switch to analogous task
Introduce constraints
Check the following in the prompt responses:

Is the output accurate and unbiased?
Does the output contain sufficient information?
Is the output relevant?
Is the output consistent when the prompt is used several times?

Glossary:

Shot = Example

Persona - Ask the AI to act as an expert in a specific field

Context

Task

Links

https://aistudio.google.com/prompts/new_chat

https://github.com/i-am-bee Bee agent framework

Brilliant.org

https://grow.google/prompting-essentials/

Saturday, 26 December 2020

AI: Colorizing Old Videos

It has been a couple of weeks since my last blog post, and that for a good reason (my second son arrived earlier this month).

I announced his arrival with some posts about some of his ancestors. One of the posts contained a video from the early 1950's.

There is an online tool (Jupyter/Python) that can colorize black-and-white photos and videos.

What It Does

DeOldify is based on Generative Adversarial Networks (GANs), where two neural networks compete with each other. One neural network (the generator) tries to produce realistic colorized pictures, and the other (the discriminator) tries to tell whether pictures are authentic or colorized. The generator tries to fool the discriminator, and the discriminator tries to expose the generator. Trained together this way, they will for example learn to create pictures that appear realistic to human beings.

DeOldify uses an extension of GAN that is called NoGAN. NoGAN eliminates some side effects of GAN for videos, for example flickering colors between frames.

You can find an interview with one of the inventors below.

How to Use:

Step 1: I uploaded the black/white video file to YouTube. I wanted it to be available for DeOldify but I didn't want it to be searchable yet, so I used the unlisted option. With the unlisted option, one needs the direct link to the video in order to see it.

Step 2: Follow the steps in the DeOldify Colab notebook. One can run it in the cloud, or on a local computer, if desired. Provide the link to the video from step 1 when it is time to run the Colorize! step.

Step 3: The colorization can process a couple of frames per second, so this will take some time. Once the colorization is done, download the video file to your computer.

My original video:

My Colorized Video:

I added a small watermark in the lower left side of the video to indicate that this video has been post processed.

Saturday, 24 October 2020

StockPredictor: Using Neural Networks to Try to Predict Stock Values

With the verified training data, I've been trying to predict stock performances, given some key numbers.

I'm using the scikit-learn toolbox for Python, and the initial finding is that the neural network hasn't been able to predict the stock performance. The correlation between the predicted values and the target values is close to zero.

Findings

I've run several iterations with different parameters, like alpha (learning rate), random state, batch size and whether to use historic data or not. They all point to the same conclusion - there is no easy-to-catch pattern that the neural network can find.

These findings support the efficient-market hypothesis - past information is already priced into the current stock prices.

The distribution of the result data is hard to model using a neural network. Most of the values are small, but a few outliers generate large losses for the training function.

The result distribution. 95% of the results are in the range -0.0143 to 0.0166, but some outliers are much bigger.

For reference, I trained a neural network for a dataset that is available in sklearn (load_diabetes). That network found a quite strong correlation between predictions and the results (75%). This means that I know how to set up and use data sets for machine learning.

Summary of the project and the future

I have no plans to continue the project at this moment. The web scraper will keep recording stock records and I'll update the documentation.

Lessons learned:

Verify the data early in the project!
Use standard schemes for saving the data. Preferably CSV format, and/or a format that can easily be imported into a database.
Add checks for inconsistent data that notifies the operator.

Unplanned future work:

Map the results into a more machine-learning-friendly distribution. Maybe a logarithmic scale?
Iterate over different prediction windows. Currently only 7 days is used.
Use other algorithms, like K-nearest neighbor.

For the end of the year, I'll explore the Internet of Things (IoT) with Arduino and Raspberry Pi. The train project will continue in 2021.

Saturday, 10 October 2020

StockPredictor: Exploring the MLPRegressor

I want to predict stock prices using existing data. Optionally, I'll convert the problem into a classification problem ("to buy or not to buy"). With the training data in place, I'll explore the Python package scikit-learn.

This article discusses some machine learning algorithms that can be used to predict a numerical value. Neural networks and K-nearest neighbors seem promising. I suspect non-linear relations between features and results, so I don't expect linear regression to work. However, I'll give it a try later.

Multilayer Perceptron Regressor

I based my program on the examples above, with my data from StockReader.

After training the neural network on ~210 000 examples, I test it on ~50 000 examples. I compare how the neural network is doing with a dart-throwing monkey (a simple prediction that the stock performance will be the average daily price increase of the stocks).

The initial output from training the neural network together with verifying it on the test data shows no improvement compared to the monkey. The Median Average Error for the Neural network was 0.095, compared to the benchmark of 0.064.

The next step will be to investigate other parameters for the neural network, along with other algorithms. Since I will compare different models, I will need to introduce cross validation data. This is important, since I need to verify the model selection itself.

Useful link:

https://towardsdatascience.com/ml-preface-2-355b1775723e

Saturday, 26 September 2020

StockPredictor: Adding Past Values to Training Data

In my training data for the predictions, I use the following data to predict the changes in stock prices:

Stock Price, P/E, P/C, Yield, Price Margin, RSI and the number of days to dividend.

I want to be able to add historic key numbers as well and I'll specify that using bit masks,

	Price	P/E	P/C	Yield	RSI	PM	RSI	Days to Dividend
Today	x	x	x	x	x	x	x	x
One week ago	x				x			x
Two weeks ago	x	x		x
Three weeks ago	x		x	x	x
Code (binary)	1111	0101	1001	1101	1011	0001	0001	0011
Code (Hex)	F	5	9	D	B	1	1	3

For simplicity, if one parameter is required two weeks ago, the script that generates the training data will add all parameters two weeks ago. The selection of parameters will happen in the prediction script.

There won't be matching historic dates for all stock records in the database. For the historic dates, I'll try the calculated date +/- one day.

If the current stock record date is on the 27th, I'll search for the one-week-old record on the 20th. If that date can't be found, I'll search for the 19th and 21st. If neither can be found, the program will give up fetching records for that date.
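The lookup described above can be sketched as follows. This is only an illustration, assuming the records are available in a dictionary keyed by date; the function name `find_historic_record`, the `records` layout and the prices are hypothetical and not taken from the real script.

```python
from datetime import date, timedelta

def find_historic_record(records, current, weeks_back, tolerance_days=1):
    """Return the record roughly `weeks_back` weeks before `current`, or None."""
    target = current - timedelta(weeks=weeks_back)
    # Try the exact date first, then one day earlier and one day later.
    offsets = [0]
    for d in range(1, tolerance_days + 1):
        offsets += [-d, d]
    for offset in offsets:
        candidate = target + timedelta(days=offset)
        if candidate in records:
            return records[candidate]
    return None  # give up fetching a record for this date

# Hypothetical records keyed by date (the prices are made up).
records = {
    date(2020, 9, 21): {"price": 101.5},
    date(2020, 9, 27): {"price": 103.0},
}
one_week_old = find_historic_record(records, date(2020, 9, 27), weeks_back=1)
two_weeks_old = find_historic_record(records, date(2020, 9, 27), weeks_back=2)
```

Here the record for the 20th is missing, so the search falls back to the 21st, while the two-week lookup gives up and returns None.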

This means that the training set will be some 50% smaller. Removing half of the training set in order to train the neural network on what happened to stocks over the last weeks can, in theory, bias the underlying data. Assumption: the performance of the recorded stocks is independent of when I recorded the training data.

In the network trainer, the data that isn't used will be removed from the training set. The script will drop the columns that I don't need according to the bitmask.
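A minimal sketch of how such a bitmask could be decoded into a list of columns to keep. It assumes that the lowest bit means "today", the next bit "one week ago" and so on, which matches the codes in the table above. The column naming scheme (`Price_w1` for the price one week ago) and the helper name `selected_columns` are hypothetical.

```python
def selected_columns(masks, max_weeks=4):
    """Expand {feature: bitmask} into the names of the columns to keep."""
    keep = []
    for feature, mask in masks.items():
        for weeks_ago in range(max_weeks):
            # Bit 0 = today, bit 1 = one week ago, bit 2 = two weeks ago, ...
            if mask & (1 << weeks_ago):
                keep.append(f"{feature}_w{weeks_ago}")
    return keep

# Three of the codes from the table: Price = F, P/E = 5, Days to Dividend = 3.
masks = {"Price": 0xF, "P/E": 0x5, "DaysToDividend": 0x3}
columns = selected_columns(masks)

# Dropping the unused columns from one (made-up) training row.
row = {"Price_w0": 100.0, "Price_w1": 99.0, "P/E_w0": 15.0, "P/E_w1": 14.8}
filtered_row = {name: value for name, value in row.items() if name in columns}
```

With these masks, the P/E value from one week ago is dropped, since the code 5 (binary 0101) only selects today and two weeks ago.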

The next step is to evaluate some machine learning algorithms on the data.

Saturday, 12 September 2020

StockPredictor: Selecting Algorithm and Some Design Considerations

This is the third part of the stock project, where I will use the validated data to train a machine learning system to predict the performance of a stock based on the data that was available at the date of the stock record.

The first step is to make some design considerations for the algorithm. The earliest program will only look at the current stock record for features (X) and the results (y) will be the daily change of the stock.

If I select the prediction window to one week, one example of features and results would be:

The results from a query in the database will look like this:

The X data can consist of values of P, P/E, P/C, Yield, PMI, RSI and the time to/since the last dividend. The y data for the stock record of 2018-10-31 will be the change (%) divided by the days:

The daily increase of ABB's stock price was 0.054% during one week in November 2018. It would have been more mathematically correct to take the seventh root of the price ratio, but this time I'll prefer simplicity.
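To see how small the difference between the two approaches is, here is a short sketch. The weekly change of 0.378% is an assumed number, chosen only because it gives the 0.054% per day mentioned above; it is not taken from the database.

```python
def simple_daily_change(total_change_pct, days):
    """Total change (%) divided evenly over the days."""
    return total_change_pct / days

def compound_daily_change(total_change_pct, days):
    """Daily change (%) that compounds to the total change: the n-th root of the price ratio."""
    ratio = 1 + total_change_pct / 100
    return (ratio ** (1 / days) - 1) * 100

weekly_change = 0.378  # percent, assumed for illustration
simple = simple_daily_change(weekly_change, 7)
compound = compound_daily_change(weekly_change, 7)
```

For changes this small, the compounded value is only marginally lower than the simple one, which is why the simplification is harmless here.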

For training data to be useful, there must be a price record in the future that can be used as y.

I'm saving the data to a huge pandas dataframe. The process of populating it from the database (with some necessary processing) takes ~35 minutes. To avoid repeating this cumbersome process every time, I save the dataframe to a CSV file that can be used directly to train the machine learning algorithm. Loading that file takes less than a second.

The Data Set

I suspect that some of the data are highly correlated to each other. For example, Price, P/E and P/C are correlated in the short-term, since earnings and capital per share seldom change.

Sometimes, the data contains zeros or null values.

Regression or Neural Networks

Both regression and neural networks have some pros for this dataset.

Linear regression

A linear regression would be intuitive for predicting the change in stock price. But there are some zeros in the underlying data that I fear will skew the results. It also seems to be tricky to handle XOR relationships between features.
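The XOR concern can be illustrated with a tiny sketch. It assumes NumPy is available and uses a plain least-squares fit instead of scikit-learn; the four data points are the classic XOR truth table, not stock data.

```python
import numpy as np

# XOR truth table: the output is 1 only when exactly one input is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Add the constant feature and solve the least-squares problem.
A = np.hstack([np.ones((4, 1)), X])
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
predictions = A @ theta
```

The best linear fit predicts 0.5 for all four rows, so a linear model carries no information about an XOR relationship.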

Multi-layer neural networks can model complex relationships such as XOR relations and zeroed records. I'll start with this one.

SciKit-Learn

Sklearn has a neural-network-ish regressor that I will investigate.

Saturday, 11 January 2020

2019 in Retrospect

This year, I've had some progress on my pet projects. The progress rate was slower in Q4 as we have bought a house. We'll move there in 2020.

Maintaining a house will require some work and learning activities. I expect that the pace of my pet projects will be slower than before.

2019

TrafficControl (C++ / Qt) - Paused

No work on this project in 2019.

StockReader/StockAnalyzer (C# / SQL / MSVC) - Current focus

Creating a C# program that scans a folder for result files, extracts stock records and parses the data to a database. I've got some experience with Microsoft Visual Studio by now and I'm still learning...
Next step will be to fix issues with data conversion to SQL
After that, I'll explore some Machine Learning algorithms and check if I can predict stock price changes based on key numbers.

ApartmentPredictor (Python) - Done

A Python script that made Linear Regression Analysis of past apartment prices in an area. The predictions were based on the apartment size and the monthly cost.

Machine Learning Course at Stanford (Octave / Matlab) - Done

I successfully attended and finished the course in Machine Learning from Stanford University.
I applied some machine learning for a pet project for a friend. The simulations indicated that the result was random to a very high degree.

TravelTimeCalculator (Java / SQLite / Android Studio) - Done

I've completed the project now. TravelTimeCalculator allows the user to compare travel time/cost between several positions on a map. They can also customize the travel cost to consider travel time, economic cost and environmental cost.

Work (C / Python / 4G / 5G)

At work, I've focused on test and some coding for 5G with focus on antenna calibration.

To Do in 2020:

Prio 1: The New House!

Prio 2: StockReader/StockAnalyzer

Saturday, 30 November 2019

ApartmentPredictor: A Simple Logistic Regression Price Predictor (1)

Problem:
I want to predict apartment prices for a given area using the size and the monthly cost, using a regression model with statistics for sold apartments in a specified area.

It is fairly simple to get a sample of past apartment deals for an area using some real estate listings.

Preparing the Data
I created a web scraper in Python that uses regular expressions and the Requests package.

In this data set, the Price reflects the final price agreed between the seller and the buyer.

The next step is to prepare the data for the SciKit package in Python. I will use two features (area and cost) to construct a model of seven parameters to optimize for:

I created two numpy arrays for the raw data:

X (number of past apartment deals, number of features)
y (number of past apartment deals)

The pandas dataframe is populated by the two arrays:

The result is:

The data set's properties are:

Most of the features are derived from Cost and Area

So, just guessing that the final price would be the mean price would give an average error of 347 kSEK. This will be a measure of the model's performance: I hope that it will generate a lower error than 347 kSEK.
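The baseline above is the mean absolute error of a "model" that always guesses the mean price. A minimal sketch of that calculation, using four made-up prices instead of the scraped data:

```python
from statistics import mean

def mean_baseline_error(prices):
    """Average absolute error when every price is guessed to be the mean price."""
    average_price = mean(prices)
    return mean(abs(price - average_price) for price in prices)

prices_ksek = [2100, 2500, 2900, 3300]  # hypothetical final prices in kSEK
baseline = mean_baseline_error(prices_ksek)
```

Any regression model worth keeping should produce a lower average error than this number on the same data.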

In the next blog post, I'll use SciKit to generate an initial prediction and evaluate that one.

Tools that I use:
SciKit is a popular package for data analysis, data mining and machine learning. I use the linear regression.
The pandas package is popular for machine learning too. It makes it easy to organize data.
Numpy is useful for matrices, linear algebra and organizing data.

Thanks Nagesh for the inspiring article.

Saturday, 23 November 2019

Machine Learning: Lecture Notes (Week 9)

Gaussian Distribution (Normal Distribution) is used for anomaly detection. I'm quite familiar with this so I won't discuss this in my blog.

The Parameter Distribution Problem
Given a data set, estimate which distribution gave this data set using the Maximum Likelihood method:

Estimate the mean (mu) as the average of the samples.
Estimate the variance (sigma squared) as the average squared deviation from that mean.

Anomaly Detection Algorithm
Given a training set of m samples, each with n features, where the samples follow the normal distribution, p(x) can be calculated as:

Choose features that might indicate anomalous examples.
Fit parameters for the different features such as mean values and variance.
Calculate p(x). If p(x) is smaller than a threshold, an anomaly is detected.

Put simply, the likelihood for one example is calculated. If that likelihood is too small, we probably have an anomaly.
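The steps above can be sketched in plain Python. This assumes that the features are independent and Gaussian, so p(x) is the product of the per-feature densities. The training samples and the threshold `epsilon` are made up for illustration.

```python
import math

def fit_gaussian(samples):
    """Maximum likelihood mean and variance for each feature (column)."""
    m = len(samples)
    n = len(samples[0])
    mu = [sum(row[j] for row in samples) / m for j in range(n)]
    var = [sum((row[j] - mu[j]) ** 2 for row in samples) / m for j in range(n)]
    return mu, var

def p(x, mu, var):
    """Product of the one-dimensional Gaussian densities of all features."""
    prob = 1.0
    for xj, mj, vj in zip(x, mu, var):
        prob *= math.exp(-((xj - mj) ** 2) / (2 * vj)) / math.sqrt(2 * math.pi * vj)
    return prob

def is_anomaly(x, mu, var, epsilon):
    return p(x, mu, var) < epsilon

# Hypothetical training set: m = 5 samples with n = 2 features each.
training = [[1.0, 10.0], [1.2, 9.5], [0.9, 10.5], [1.1, 10.2], [0.8, 9.8]]
mu, var = fit_gaussian(training)
epsilon = 1e-3  # assumed threshold; in practice tuned on labelled examples
```

A sample such as [3.0, 10.0] lies far from the mean of the first feature and gets a p(x) far below the threshold, while [1.0, 10.0] does not.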

This is the last lecture note for this course, since I've completed the course. The next blog post will discuss a simple logistic regression example for apartment prices.

Saturday, 16 November 2019

Machine Learning: Lecture Notes (Week 8)

Unsupervised Learning
Unsupervised learning is training on a training set X without labels y.

Clustering algorithms find clusters of samples with similar features.

K-means
K-means is the most popular clustering algorithm.

It takes K random points (for example two) as initial cluster centroids.

Ignore the bias feature.
Repeat:

For each sample in the training set, it assigns it to the closest centroid.
Each centroid is moved to the average location of its assigned training samples.

Optimization Objective
K-means can get stuck in a local optimum. Use random initialization several times to overcome this.
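A small pure-Python sketch of the K-means loop with several random initializations. The helper names (`kmeans`, `cost`, `best_of`) and the six sample points are made up; the cost is the sum of squared distances between each point and its centroid.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random samples as initial centroids
    clusters = []
    for _ in range(iterations):
        # Assignment step: each sample goes to its closest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [squared_distance(point, c) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Move step: each centroid moves to the average of its samples.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = tuple(sum(v) / len(cluster) for v in zip(*cluster))
    return centroids, clusters

def cost(centroids, clusters):
    return sum(squared_distance(p, c)
               for c, cluster in zip(centroids, clusters) for p in cluster)

def best_of(points, k, restarts=10):
    """Run K-means with different seeds and keep the run with the lowest cost."""
    runs = [kmeans(points, k, seed=s) for s in range(restarts)]
    return min(runs, key=lambda run: cost(*run))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = best_of(points, 2)
```

For these two well separated groups, each centroid ends up in the middle of one group of three points.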

Elbow method for selecting number of clusters.

Create a curve of the cost function versus the number of clusters. The "elbow" will indicate the suitable number of clusters.

It can be necessary to reduce the number of features. Sometimes, features can be seen as redundant.

This can be done by projecting data from a set of higher dimensions to a plane of lower dimensions.

Principal Component Analysis

PCA tries to minimize the error between the data points and their projections.

PCA should not be confused with linear regression.

Start with mean normalization (subtracting average from each value). The scale of features must be on a comparable scale.

Calculate the covariance matrix: Sigma = (1/m) Σ x⁽ⁱ⁾ (x⁽ⁱ⁾)^T, summed over all m samples

Compute [U, S, V] = svd(Sigma) using singular value decomposition (SVD). The columns of U are the eigenvectors of Sigma.

This gives U, a matrix of n column vectors. The first k column vectors span the new lower-dimensional space for the PCA.

U_reduce = U(:,1:k)

z = U_reduce^T x

Restoring:

x_approx = U_reduce z
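A NumPy sketch of the same steps, assuming the samples are stored as rows of X and that U_reduce consists of the first k columns of U. Only mean normalization is applied here; scaling the features to comparable ranges is left out for brevity. The four data points are made up and lie exactly on a line, so one component is enough to restore them.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the first k principal components and restore them."""
    mu = X.mean(axis=0)
    X_norm = X - mu                    # mean normalization
    m = X.shape[0]
    Sigma = (X_norm.T @ X_norm) / m    # covariance matrix, n x n
    U, S, Vt = np.linalg.svd(Sigma)
    U_reduce = U[:, :k]                # first k column vectors
    Z = X_norm @ U_reduce              # z = U_reduce^T x for every sample
    X_approx = Z @ U_reduce.T + mu     # x_approx = U_reduce z, plus the mean
    return Z, X_approx, S

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
Z, X_approx, S = pca(X, k=1)
```

The singular values in S indicate how much variance each component holds; here the second one is practically zero.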

Anomaly Detection - when to use

Use AD when there are few positive examples

Many different and unpredictable types of anomalies.

What features to use?

Use histograms to see whether a feature is Gaussian or not. If it is Gaussian, it is suitable for anomaly detection. If it isn't, transform it so that it looks more Gaussian (for example with a logarithm).

Recommender Systems

One approach can be to use a version of a regression analysis.

Collaborative Filtering

In this case, we don't have any information about the movies, such as romance/action etc. The users have specified which movies they like (theta values).

Alternate between optimizing for the input data (x) and the model parameters (theta) to get lower errors.

1. Initialize x and theta to small random values.

2. Minimize J(x, theta) using gradient descent.
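The two steps can be sketched with NumPy. This is an illustration under several assumptions: Y is a made-up rating matrix (movies in rows, users in columns, 0 meaning "not rated"), R marks the rated entries, and the number of features, the learning rate `alpha` and the regularization `lam` are arbitrary choices.

```python
import numpy as np

def collaborative_filtering(Y, R, n_features=2, alpha=0.01, lam=0.1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    # Step 1: small random values for movie features (X) and user parameters (Theta).
    X = rng.normal(scale=0.1, size=(n_movies, n_features))
    Theta = rng.normal(scale=0.1, size=(n_users, n_features))
    history = []
    # Step 2: gradient descent on J(x, theta), updating both at the same time.
    for _ in range(steps):
        error = (X @ Theta.T - Y) * R  # only rated entries contribute
        cost = 0.5 * np.sum(error ** 2) + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2))
        history.append(float(cost))
        X_grad = error @ Theta + lam * X
        Theta_grad = error.T @ X + lam * Theta
        X = X - alpha * X_grad
        Theta = Theta - alpha * Theta_grad
    return X, Theta, history

Y = np.array([[5, 4, 0], [4, 0, 1], [1, 1, 5], [0, 1, 4]], dtype=float)
R = (Y > 0).astype(float)
X, Theta, history = collaborative_filtering(Y, R)
predicted = X @ Theta.T  # includes predictions for the unrated entries
```

The recorded cost history should fall as both the movie features and the user parameters are learned.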

Saturday, 9 November 2019

Machine Learning: Lecture Notes (Week 7)

Support Vector Machine (SVM)
The cost function will be modified compared to the one based on the sigmoid function in logistic regression. It will consist of a section with constant slope and a flat section.

In SVM, the expression C·A + B shall be minimized. A is the cost term, B is the regularization term and C controls the weight between them.

When training the hypothesis function, the hypothesis (theta multiplied by input data) shall be

Cost1: Bigger than 1, if y = 1
Cost0: Smaller than -1, if y = 0

Support Vector Machine
Define f as a similarity function that calculates the proximity to landmarks (combinations of feature values):

The function is based on a Gaussian kernel. Sigma affects the pointiness of the curve - a small sigma means a pointier curve.
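A short sketch of the Gaussian similarity function, f = exp(-||x - l||² / (2 sigma²)). The sample point and the landmarks are made up.

```python
import math

def gaussian_similarity(x, landmark, sigma):
    """Close to 1 when x is near the landmark, close to 0 when it is far away."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2 * sigma ** 2))

x = (1.0, 2.0)
at_landmark = gaussian_similarity(x, (1.0, 2.0), sigma=1.0)
nearby_wide = gaussian_similarity(x, (2.0, 2.0), sigma=2.0)
nearby_narrow = gaussian_similarity(x, (2.0, 2.0), sigma=0.5)
```

At the same distance, the narrow kernel (small sigma) gives a much lower similarity than the wide one, which is the pointiness effect mentioned above.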

Big C (small lambda): Low bias, high variance. Optimizing training set.
Small C (big lambda): High bias, low variance. Optimizing to reduce overfitting.

Saturday, 2 November 2019

Machine Learning: Lecture Notes (Week 6)

Machine Learning Diagnostics
When getting poor test results from a trained machine learning algorithm, some solutions are available:

More training examples
More/less features
Higher/smaller training rate
Adding polynomial features

The challenge is how to find which of the solutions will help out in a particular project.

Machine Learning Diagnostics are tests that can help narrow down which of the actions listed above may improve the performance of an ML project.

How to Evaluate a Hypothesis
A hypothesis should be able to generalize to new examples that aren't in the training set. An overfitted system will perform poorly in this respect.

One way of verifying a hypothesis is to divide the dataset into a training set and a test set. After training a neural network, apply the hypothesis to the test set and check whether the error is acceptable. ~70% of data can be training set.

Model Selection with Train/Validation/Test sets
Degree of polynomial for regression can be determined by fitting the training data to a set of polynomials with different degrees. By evaluating the polynomials on a test set, a model is selected. However, that may not be a good generalization since that polynomial is fitted to the test set (the parameter d) and may not be generalized enough.

Instead, divide the data set into three parts:

Training set ~60%
Cross validation set (CV) ~20%
Test set ~20%

Calculate corresponding cost functions.

Train the models to minimize the training set cost function. Test the models using the cross validation set and select the model with the lowest cross validation error. Estimate the generalization error using the test set.
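A sketch of this procedure with NumPy, selecting the degree of a polynomial. The data set is synthetic (a noisy parabola) and the random seed, the noise level and the range of degrees are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = x ** 2 + rng.normal(scale=0.1, size=100)  # synthetic data, true degree is 2

# Split the shuffled indices 60% / 20% / 20%.
indices = rng.permutation(100)
train, cv, test = indices[:60], indices[60:80], indices[80:]

def mse(coefficients, xs, ys):
    return float(np.mean((np.polyval(coefficients, xs) - ys) ** 2))

# Fit one model per degree on the training set only.
models = {d: np.polyfit(x[train], y[train], d) for d in range(1, 6)}

# Select the degree with the lowest cross validation error.
cv_errors = {d: mse(c, x[cv], y[cv]) for d, c in models.items()}
best_degree = min(cv_errors, key=cv_errors.get)

# The test set is used only once, for the selected model.
test_error = mse(models[best_degree], x[test], y[test])
```

Since the test set takes no part in fitting or in choosing the degree, its error is a fair estimate of how the selected model generalizes.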

Bias vs Variance
High Bias - Underfit
High Variance - Overfit

Calculate the Training and Cross Validation error. Plot error vs degree of polynomial.

Bias: Both training and CV errors are high
Variance: CV errors are high, but training errors are low.

Regularization
For a polynomial as a hypothesis, a high value for lambda (regularization) implies that all non-bias parameters will be close to zero. A low value for lambda may give an overfitted system. How to choose the regularization parameter?

Calculate the cost functions for the training set, the cross validation set and the test set (mean-square error).
Try different lambdas (doubling each step) and minimize the training cost function for each of them.
Check which lambda gives the lowest error on the cross validation set.
Finally, calculate the test error on the test data.
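A sketch of the lambda search with NumPy, using regularized linear regression solved in closed form. The data set, the polynomial degree (8) and the starting lambda (0.01) are assumptions made for the illustration. The bias parameter is left out of the penalty.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=100)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=100)  # synthetic data

A = np.vander(x, 9, increasing=True)  # columns 1, x, x^2, ..., x^8
train, cv, test = np.split(rng.permutation(100), [60, 80])

def ridge_fit(A, y, lam):
    """Minimize the squared error plus lam times the squared non-bias parameters."""
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # the bias parameter is not regularized
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

def mse(theta, A, y):
    return float(np.mean((A @ theta - y) ** 2))

lambdas = [0.01 * 2 ** step for step in range(10)]  # doubling each step
thetas = {lam: ridge_fit(A[train], y[train], lam) for lam in lambdas}
cv_errors = {lam: mse(thetas[lam], A[cv], y[cv]) for lam in lambdas}
best_lambda = min(cv_errors, key=cv_errors.get)
test_error = mse(thetas[best_lambda], A[test], y[test])
```

Note that the errors used for the comparison are calculated without the regularization term; lambda only affects the fitting.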

Readers that follow the course may note that I've omitted some parts of the lectures. I am doing that for a pragmatic reason - time. I take these notes in order to help me remember (rubberduck) the important lessons. If you want better coverage of the course, I recommend taking the course.

Saturday, 26 October 2019

Machine Learning: Lecture Notes (week 4/5)/Neural Networks

My earlier blog posts on Neural Networks are here.

Neural Networks are suitable for more non-linear representations. When the feature set gets bigger, the polynomial will be unreasonably big.

Terminology:

The course considers two types of classification: Binary and Multi-class classification.

The cost function will be a generalisation of the cost function for logistic regression. For the multi-class case, the cost function will be calculated for each output.

Back Propagation - Cost Function

Back propagation starts with calculating the error of the output layer (L). Using that, the error of the last hidden layer (L-1) can be calculated, and so on back to the first hidden layer.

The cost function for neural networks is more complex than the cost function for regularized logistic regressions, with a couple of nested summations.

Let's look at the summations:

The first summation spans over the training set. This is similar to the logistic regression.
The second summation spans over the output nodes. This means that the cost function must consider all different output bins.
The first summation in the regularization term spans over all layers:

The second summation in the regularization term spans over all the input nodes in the current layer, including bias unit (number of columns in the matrix)
The third summation in the regularization terms spans over all the output nodes in the current layer, excluding the bias unit (number of rows in the matrix)

Saturday, 5 October 2019

Machine Learning: Lecture Notes (week 3)

These are some lecture notes from the third week of the online Machine Learning Course at Stanford University.

Classification
Many machine learning problems concern classification. That may be dirty/clean, malignant/benign tumor, spam/no spam email etc.

In this case, the training output y will be either zero (negative class) or one (positive class) for the two class case.

Logistic Regression / Hypothesis Representation
Logistic regression model is improved with a sigmoid function:

The sigmoid function offers a smooth transition from 0 (z << 0) to 1 (z >> 0).

h_θ(x) is the probability for a positive result, given the input x.
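A minimal sketch of the sigmoid function and the hypothesis h_θ(x) = sigmoid(θ^T x). The parameter values in the example are made up.

```python
import math

def sigmoid(z):
    """Smooth transition from 0 (z << 0) to 1 (z >> 0)."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """Probability of a positive result; x[0] is the constant feature 1."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

theta = [-3.0, 1.0]                      # hypothetical parameters
p_low = hypothesis(theta, [1.0, 1.0])    # z = -2
p_mid = hypothesis(theta, [1.0, 3.0])    # z = 0
p_high = hypothesis(theta, [1.0, 5.0])   # z = 2
```

With these parameters, z = 0 for x₁ = 3, so the predicted probability there is exactly 0.5.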

Decision Boundary
A decision boundary is the limit between the set of x that results in h_θ(x)>0.5 and the set of h_θ(x)<0.5

Cost Function

This cost function will give zero cost if the hypothesis is exactly right, and a cost that approaches infinity if the hypothesis is completely wrong.

Simplified Cost Function and Gradient Descent
For the two binary cases for the cost function (y=0 and y=1), the cost function can be written as:

This function can be derived using maximum likelihood estimation.

The optimization problem is now to minimize J with respect to the parameters, given the observations.

The gradient descent method will update the parameters with the error multiplied by the observations:
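A NumPy sketch of the cost function and one gradient descent step for logistic regression. The four training examples and the learning rate are made up; the first column of X is the constant feature.

```python
import math
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """J = -average of y*log(h) + (1 - y)*log(1 - h)."""
    h = sigmoid(X @ theta)
    return float(-np.mean(y * np.log(h) + (1 - y) * np.log(1 - h)))

def gradient_step(theta, X, y, alpha):
    """Update every parameter with the error (h - y) multiplied by its feature."""
    h = sigmoid(X @ theta)
    return theta - alpha * (X.T @ (h - y)) / len(y)

X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5], [1.0, 3.5]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
initial_cost = cost(theta, X, y)  # log(2), since every h is 0.5
for _ in range(200):
    theta = gradient_step(theta, X, y, alpha=0.1)
final_cost = cost(theta, X, y)
```

The update has the same form as for linear regression; only the hypothesis differs.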

Advanced Optimization
Other optimization algorithms are available as well:

1. Gradient Descent
2. Conjugate Gradient
3. BFGS
4. L-BFGS

Algorithms 2-4 don't need a manually selected alpha (learning rate) and are often faster than gradient descent. They are also more complex.

Regularization
Too many features in the hypothesis may cause overfitting. That means that the model fits the training set perfectly but generalizes poorly to new cases. There are two ways to handle overfitting:

Reducing number of features (manually or using a selection algorithm)
Regularization (keep all features but reduce magnitude of theta)

Regularization will add the sizes of the hypothesis parameters to the cost function. By convention, the first theta parameter (θ₀) is not regularized.

The vector form will be:

Saturday, 28 September 2019

Machine Learning: Lecture Notes (1)

Here are some lecture notes from the second week of the online Stanford course in Machine Learning. I take the notes for myself as a form of rubberducking.

Terminology from week 1 (single feature linear regression):

m is the number of training examples.
n is the number of features.
y is the output from the training set
x are the input features in the training set
x⁽ⁱ⁾ are the input features of the i-th training example.
h_θ(x) = θ₀ + θ₁x₁ + ... + θ_n x_n is the hypothesis function that will be modified to fit the training set.
θ_j are the hypothesis parameters for each feature.
J is the cost function. A common cost function is the mean square error function.
α is the learning rate

Multiple features:

It is common to add a constant feature to the model: θ₀, where x₀ = 1
Using matrices, h_θ(x) = θ^T x

Gradient Descent Algorithm
For each θ_j, simultaneously:
Subtract α times the derivative of the cost function with respect to θ_j.
For the mean square error, that is α times the average of (hypothesis minus the actual observation) multiplied by that feature value.

Gradient Descent - Feature Scaling
The gradient descent algorithm can converge quicker if the feature ranges are scaled to the same interval.
Feature scaling is dividing the input variables by the range of the input variables. The new range will be 1, but the values can still be higher.

Mean normalization means subtracting the mean from each value. If you combine those methods, you get:

x_i := (x_i - mean(x))/range(x)
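The combined formula as a small sketch. The apartment sizes are made-up values.

```python
def scale_feature(values):
    """Mean normalization followed by division by the range."""
    mean = sum(values) / len(values)
    value_range = max(values) - min(values)
    return [(value - mean) / value_range for value in values]

sizes = [50, 75, 100]  # hypothetical apartment sizes in square metres
scaled = scale_feature(sizes)
```

After scaling, the values are centred around zero and span a range of exactly 1.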

Normal Equation
This is a way to minimize J without iterations.

Gradient Descent or Normal Equation
GD: Need to choose a learning rate. NE: No need to choose a learning rate.
GD: Needs many iterations. NE: Needs no iterations
GD: O(kn^2). NE: O(n^3). NE is heavy for large feature sets!
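Both methods can be compared on a tiny example with NumPy. The normal equation is assumed here in its usual form, θ = (X^T X)^-1 X^T y, and the data set (y = 1 + 2x), the learning rate and the number of iterations are made up.

```python
import numpy as np

# First column is the constant feature x0 = 1.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])  # y = 1 + 2x

# Normal equation: no learning rate and no iterations.
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent: needs a learning rate and many iterations.
theta_gd = np.zeros(2)
alpha = 0.05
for _ in range(5000):
    gradient = X.T @ (X @ theta_gd - y) / len(y)
    theta_gd = theta_gd - alpha * gradient
```

Both approaches end up at θ₀ = 1 and θ₁ = 2 for this data set.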

The next step is a programming assignment where I'll implement linear regression to a simple data set. I won't publish that on my blog, however.

Saturday, 21 September 2019

Machine Learning: Online Stanford Course

I have spent some time on programming this week - but not on my pet projects.

For obvious reasons, I can't elaborate on the programming I've done at work, other than that I have learnt a lot.

I've joined a dojo exercise on Machine Learning with an online course from Stanford/Coursera. We followed an example where we implemented a simple linear regression.

So far, I think the course is good for a beginner in AI. I seriously consider purchasing a course certificate when I finish that course.

Saturday, 27 April 2019

StockAnalyzer: Applying Machine Learning on Stock Information

After using some neural networks to analyse a small set of data for a friend's pet project, I will focus on my own data.

I have a web scraper, StockReader, that collects some information about public stocks that are listed in Sweden. That has been in use for eight years and I have more than 1500 snapshots of a couple of hundred stocks.

It was originally written in C++ (the code was terrible, but it was essential for me to learn to build a more complex program) and I later ported that to a twenty-four line Python script that is scheduled to run three times a week.

Now, I want to use the data to learn more about machine learning and analysis of time series. I also want to get experience from C#/Visual Studio and Angular JS.

I don't expect to find a magic algorithm that helps me in stock-picking. The purpose of this project is to learn coding, machine learning, C#, Visual Studio and Angular JS.

The Different Technology Areas for the Project
The data is saved in csv files (comma separated values). The data includes stock price, earnings per share, dividends, profit margin, RSI and date for the quarterly reports.

I will create a Windows program based on C# in Visual Studio to populate a MySQL database with the stock data. It would probably be easier using Python but I want to explore new tools and programming languages.

After the data is in the database, I will build a web app using Angular JS. That program will check the data and search for possible stock splits and inconsistent data.

Once the data is corrected, I will analyse it. Since Python has very powerful AI packages, I will use Python to extend the web app above.

I will start by describing the data and some relevant topics for stock markets.

Saturday, 20 April 2019

Machine Learning: Using a Neural Network for Value Prediction

Until now, I've been using a Neural Network for a binary classification. Predicting a continuous value would be more relevant for my case.

I found an example that uses the scikit-learn package.

The Peer Project
This time, I've made some changes to the input data:

I will make the neural network train for the actual training output data instead of a binary representation of that data.
I will sort the training input data in three groups of thirteen values:

The values are initially arranged in triplets (likelihoods for events 1, X and 2) for thirteen samples.

I sorted the array by descending values with a simple modification of the numpy sort command:
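
One common way to get a descending sort from NumPy is to negate the array, sort it, and negate it again. The sketch below assumes the triplets are stored as rows of a 2-D array; the numbers are made up, and the final regrouping step is only one guess at how the three groups could be laid out:

```python
import numpy as np

# Made-up likelihood triplets (events 1, X and 2), one row per sample.
triplets = np.array([
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.25, 0.45, 0.30],
])

# np.sort is ascending, so sorting the negated values gives descending order.
sorted_desc = -np.sort(-triplets, axis=1)

# Regroup: all highest values first, then all middle values, then all lowest.
grouped = sorted_desc.T.ravel()
print(sorted_desc)
print(grouped)
```

With thirteen samples instead of three rows, `grouped` would hold three groups of thirteen values.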

I use the MLPRegressor with some different random seeds and 10000 iterations.
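
A setup along these lines could look like the sketch below. The data is a random placeholder (an assumption, since the real inputs are not shown here), and the hidden layer size of 10 is carried over from the earlier post rather than confirmed for this run:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Placeholder data (an assumption): 60 rows of 39 inputs, i.e. 13 triplets each.
rng = np.random.default_rng(0)
X = rng.random((60, 39))
y = rng.random(60)

errors = {}
for seed in (300, 400):
    # The seed controls both the train/test split and the initial weights.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    for max_iter in (1, 10000):
        model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=max_iter,
                             random_state=seed)
        model.fit(X_train, y_train)  # may emit a ConvergenceWarning
        errors[(seed, max_iter)] = mean_squared_error(
            y_test, model.predict(X_test))

for key, mse in errors.items():
    print(key, round(float(mse), 4))
```

Because the placeholder targets are pure noise, there is no reason to expect more iterations to give a lower test error here.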

The results are a bit disappointing. For some random seeds (and different sets of training/test data), the errors are smaller after 10000 iterations, compared to after one iteration.

Seed 300, 10000 iterations.

Seed 400, 1 iteration

Seed 400, 10000 iterations

Seed 400, 1 iteration

The warning messages recur and indicate that the convergence is too weak. There seems to be no easily detectable link between the inputs and the magnitude of the output.

Adding more iterations does not seem to be the magical solution either.

For 20 000 iterations, the error is smaller than for 10 000 iterations.
But for 40 000 iterations, the error is increasing.

It seems that it is very important to be able to interpret the neural networks!

Checking a Known Data Set

As a sanity check, I've been training the MLPRegressor on a small training set:

This sample should be quite easy for a neural network to train on.

I tested with a test size of 0.2 (6 or 7 samples in training set and 1 or 2 samples in test set):

One iteration:

The MLPRegressor wasn't able to converge after one iteration.
This makes sense, since it takes quite a number of iterations to converge.

Ten iterations:

I still see the warning about convergence, but the errors are smaller.
This means that the algorithm is converging after all.

A Thousand Iterations:

The convergence warning isn't shown anymore, and the errors are smaller now.

A Million Iterations:

Here, I'm running twice on the same setup.
The first optimization converged much more poorly than the second one.
The reason for this is probably an unfortunate selection of training data.

Another optimization with a poor distribution of test and training data.
The training data contained all the "zeros", but the test data contained none.
This made a difference between the training and test data.

The example above illustrates how important it is to have a big set of data to test and train on.
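
To see how much a split can vary on a tiny set, here is a small sketch with eight made-up samples (an assumption; the real data set is not reproduced here) and a test size of 0.2:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Eight made-up samples: three "zeros" and five "ones" as targets.
X = np.arange(8).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1, 1, 1])

splits = []
for seed in range(5):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    splits.append((y_train, y_test))
    print(f"seed {seed}: train targets {sorted(y_train.tolist())}, "
          f"test targets {sorted(y_test.tolist())}")
```

With only two samples in the test set, some seeds can leave the test set without a single "zero", so the test error says little about how well the network handles that case.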

I'll finish this part of my Neural Network project for now. The next project will start in the next blog post and cover much more training data.

Saturday, 13 April 2019

Machine Learning: Tweaking the Hidden Layer

In the last blog post, I used a neural network with a hidden layer to try to predict a binary outcome based on some input data. Now, I'll investigate how different sizes of the hidden layer affect the output of the network.

Keep in mind that:

I'm learning about neural networks - this blog describes my learning curve and not recommended ways of handling neural networks.
Different shapes and depths of neural networks work for different problems; there is no "Golden" neural network.
The data I'm using for this example is probably random in the sense that there is likely no obvious connection between input and output.

10 Neurons in Hidden Layer:

(the setting from the last blog post): The neural network was well fitted for the training set, but worse than guessing for the test set.

1 Neuron in Hidden layer:

This is practically a single-layer neural network. For the training set, the predictions are accurate for the "ones", but the predictions are 0.5 for the other values. This means that the neural network fails to predict "zero" at all. The prediction "0.5" can be seen as an attempt by the neural network to predict zero.

2 Neurons in Hidden Layer:

The predictions on the training set are better, but two predictions are still at 0.5 (samples 9 and 23). For the test set, the network is confident but wrong in six cases out of eight.

4 Neurons in Hidden Layer:

Now, the neural network fits the training set (no errors). It has a similar confusion matrix to the neural network above, with 2 neurons in the hidden layer.

8+ Neurons in Hidden Layer:

The network is very confident all the time, but the test predictions are bad. This indicates that the neural network is well fitted to the training data (and that the sigmoid function is narrow enough to make the estimates ones or zeros). Adding more nodes makes the estimates more confident, but not more correct.

Matrix Calculations:

I've made a summary of the matrix calculations below:
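
As a rough sketch of the kind of matrix calculations involved in a network with one hidden layer and sigmoid activations (the shapes, weight names and random values are assumptions for illustration, and bias terms are left out):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, W2):
    """Forward pass: inputs -> hidden layer -> one output per sample."""
    hidden = sigmoid(X @ W1)     # shape (samples, hidden neurons)
    return sigmoid(hidden @ W2)  # shape (samples, 1)

rng = np.random.default_rng(1)
X = rng.random((5, 3))        # 5 samples with 3 inputs each
W1 = rng.normal(size=(3, 4))  # weights from inputs to 4 hidden neurons
W2 = rng.normal(size=(4, 1))  # weights from hidden neurons to the output
pred = forward(X, W1, W2)
print(pred.shape)
```

Changing the second dimension of `W1` (and the first of `W2`) is what changing the number of neurons in the hidden layer amounts to.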

Some Findings:

A neural network should have a general understanding of the input, and it should ideally be less confident.

This exercise illustrates one of the dangers with neural networks: they can generate very nice predictions that have low uncertainties, but fail to predict new data. They can give us an illusion of seeing patterns that don't exist.

There is a risk that a suboptimal neural network will tell us what we want to hear. This makes it very important to look at the results with an open and still critical mindset.