Saturday 30 March 2019

Machine Learning: Rubber Ducking a Neural Network

In the book The Pragmatic Programmer, a programmer used a rubber duck to help him understand his code issues.

He described his code to the duck line by line, simply enough for a rubber duck to understand. After describing a couple of lines, he understood the issue himself.

I'll try to do this for a simple neural network program that I have. It is based on a tutorial for a very simple neural network.
The input data is weighted and summed up to generate a prediction.
In the example, a farmer uses a dataset of eight flowers that are either red or blue. Each flower has a width and a length.
The length is on the X axis and the width is on the Y axis.
The training data consists of blue flowers (0) and red flowers (1).
The gray flowers are to be classified by the neural network.
I start by guessing the bias and the weight factors for the neural network.
After that, I iterate over all known flowers and estimate the color of each one using the weights. I sum up the errors as a measure of the network's progress. The weights and bias are then adjusted using derivatives (slopes) of a cost function.
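The training loop described above can be sketched in Python. This is a minimal sketch, not the author's program: it assumes a sigmoid output and a squared-error cost, uses made-up flower measurements, and the names TRAINING_DATA, train and predict are hypothetical. It updates the parameters once per pass over all flowers (batch gradient descent); the tutorial may update differently.

```python
import math
import random

# Hypothetical, made-up flowers (NOT the tutorial's data): (length, width, target)
# with target 0 = blue and 1 = red.
TRAINING_DATA = [
    (1.0, 1.0, 0), (1.5, 0.5, 0), (2.0, 1.0, 0), (2.5, 0.5, 0),
    (3.5, 1.5, 1), (4.0, 1.0, 1), (4.5, 1.5, 1), (5.0, 1.0, 1),
]


def sigmoid(z):
    """Squash the weighted sum into a value between 0 (blue) and 1 (red)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(data, iterations, learning_rate=0.02, seed=0):
    """Batch gradient descent on the squared error (prediction - target) ** 2.

    Returns (w1, w2, b, total_cost). total_cost is summed over all flowers in the
    last iteration, before that iteration's update, and is NaN if no iteration ran.
    """
    rng = random.Random(seed)
    # Initial guess: small random weights and bias.
    w1, w2, b = (rng.uniform(-0.5, 0.5) for _ in range(3))
    total_cost = float("nan")
    for _ in range(iterations):
        dw1 = dw2 = db = 0.0
        total_cost = 0.0
        for length, width, target in data:
            pred = sigmoid(w1 * length + w2 * width + b)
            total_cost += (pred - target) ** 2
            # Chain rule: dC/dz = 2 * (pred - target) * sigmoid'(z),
            # where sigmoid'(z) = pred * (1 - pred).
            dcost_dz = 2.0 * (pred - target) * pred * (1.0 - pred)
            dw1 += dcost_dz * length
            dw2 += dcost_dz * width
            db += dcost_dz
        # Step against the summed slopes to reduce the cost.
        w1 -= learning_rate * dw1
        w2 -= learning_rate * dw2
        b -= learning_rate * db
    return w1, w2, b, total_cost


def predict(params, length, width):
    """Probability that a flower is red, given trained (w1, w2, b, ...)."""
    w1, w2, b = params[:3]
    return sigmoid(w1 * length + w2 * width + b)
```

Usage: `params = train(TRAINING_DATA, 10_000)` followed by `predict(params, 3.0, 1.0)` gives the probability that a 3.0 by 1.0 flower is red. Running `train` with 0 iterations returns a NaN cost, since no cost has been computed yet.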

Each iteration tries to bring the predictions closer to the targets.

I ran the program several times on the same dataset, with different numbers of iterations over the training data (the 10 000-iteration case was run three times).
Iterations | w1 | w2 | b | Flower 0 prediction | Flower 0 cost | Flower 1 prediction | Flower 1 cost | Training data cost
0 (guess) | -0.134 | -0.033 | -0.253 | 0.381 | 0.145 | 0.291 | 0.502 | NaN
1 | 0.109 | 0.142 | 0.165 | 0.616 | 0.379 | 0.690 | 0.096 | 2.51
10 | 0.345 | -0.334 | -0.954 | 0.317 | 0.100 | 0.566 | 0.188 | 1.742
100 | 0.900 | -0.530 | -3.126 | 0.091 | 8.228e-03 | 0.598 | 0.162 | 1.394
1 000 | 1.576 | -0.316 | -5.865 | 21.51e-03 | 462.5e-06 | 0.713 | 82.45e-03 | 1.316
10 000 | 2.250 | -0.102 | -8.392 | 5.948e-03 | 35.38e-06 | 0.836 | 26.78e-03 | 1.316
10 000 | 2.250 | -0.103 | -8.391 | 5.945e-03 | 35.36e-06 | 0.836 | 26.73e-03 | 1.316
10 000 | 2.251 | -0.106 | -8.390 | 5.945e-03 | 35.34e-06 | 0.837 | 26.66e-03 | 1.315
100 000 | 2.907 | 0.120 | -10.78 | 1.833e-03 | 3.360e-06 | 0.918 | 6.660e-03 | 1.239
1 000 000 | 3.547 | 0.373 | -13.12 | 0.595e-03 | 0.3543e-06 | 0.961 | 1.491e-03 | 1.192
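The first rows of the table are consistent with a squared-error cost per flower, with blue as target 0 and red as target 1. Assuming the prediction is a sigmoid of the weighted sum (the predictions always fall between 0 and 1, which suggests this), with x1 as the length and x2 as the width, the cost and the slope used to update w1 look like this, checked against the guessed parameters in the first row:

```latex
\begin{aligned}
z &= w_1 x_1 + w_2 x_2 + b, \qquad p = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad C = (p - t)^2 \\
C_{\text{flower 0}} &= (0.381 - 0)^2 \approx 0.145, \qquad C_{\text{flower 1}} = (0.291 - 1)^2 \approx 0.50 \\
\frac{\partial C}{\partial w_1} &= 2(p - t)\,\sigma'(z)\,x_1 = 2(p - t)\,p\,(1 - p)\,x_1
\end{aligned}
```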

As expected, the result after the first iteration is basically still a guess. It takes a lot of iterations to get predictions that are close to the actual values: for flower 0 (blue), it takes thousands of iterations, and for flower 1 (red), it takes hundreds of thousands of iterations.

I also ran a prediction with the same training data but with a test flower that was slightly shorter. The neural network had a much harder time with that flower (it predicted 51% red). As the training data shows, the rightmost gray (unknown) flower is surrounded by red ones. However, the blue flowers in the middle may still be close enough to affect its prediction. The leftmost gray flower is easier to classify.

Iterating hundreds of thousands of times takes a while. I need to find better ways to estimate the new parameters, such as optimized algorithms and built-in functions.
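One common speed-up is to replace the per-flower loop with NumPy array operations, so each iteration handles the whole training set in a few built-in calls. A minimal sketch under the same assumptions as before (sigmoid output, squared-error cost, made-up data); train_vectorized and the data arrays are hypothetical names.

```python
import numpy as np

# Hypothetical, made-up flowers (NOT the tutorial's data): [length, width] rows,
# targets 0 = blue and 1 = red.
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.0], [2.5, 0.5],
              [3.5, 1.5], [4.0, 1.0], [4.5, 1.5], [5.0, 1.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_vectorized(X, y, iterations, learning_rate=0.02, seed=0):
    """Batch gradient descent on the summed squared error, one array pass per iteration."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, size=X.shape[1])  # one weight per feature
    b = rng.uniform(-0.5, 0.5)
    for _ in range(iterations):
        pred = sigmoid(X @ w + b)                   # all predictions at once
        dcost_dz = 2.0 * (pred - y) * pred * (1.0 - pred)
        w -= learning_rate * (X.T @ dcost_dz)       # sums the per-flower slopes
        b -= learning_rate * dcost_dz.sum()
    return w, b


w, b = train_vectorized(X, y, 20_000)
print(sigmoid(X @ w + b).round(2))  # predicted probability of "red" per flower
```

The math is the same as a per-flower loop; the speed-up comes from NumPy doing the inner loop in compiled code.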

I see three factors that will make machine learning difficult:
  • Machine learning takes a lot of computational resources.
  • The data is often imperfect due to bad sensors, operator errors and other factors.
  • The world itself is often irregular, with stochastic processes and unknown unknowns that will confuse the learning of neural networks.
Why code the algorithm myself instead of using one of the existing implementations? The purpose of this experiment is to learn neural networks from the ground up, not to make cool predictions without understanding what I'm doing.

In the next blog post, I'll try to use some real-world data to see whether the network can predict the outcome.

Saturday 23 March 2019

Machine Learning Project

My next project will explore machine learning and neural networks.

I'll play around with some real data for a simple machine learning project, where I want to build a simple neural network to estimate the order of magnitude for four output numbers. The input will be thirteen pairs of floating-point numbers between 0 and 1.

A simple neural network can look like this:
https://www.python-course.eu/neural_networks_with_python_numpy.php
The input nodes are yellow, the internal (hidden) layer nodes are green and the output nodes are red.

In my case, the input will be an array of 26 elements and the output will be an array of four elements. The size of the input may result in numerous matrix calculations, but it should be OK if I use the built-in algorithms for matrix multiplications.
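A forward pass through such a network is a sequence of matrix multiplications, which is where NumPy's built-in routines help. The sketch below assumes one hidden layer of 10 nodes, random initial weights and a sigmoid activation on every layer. All of these are placeholder choices, not decisions from the post, and init_network and forward are hypothetical names. Because a sigmoid output lies between 0 and 1, estimating orders of magnitude would likely need some output scaling on top of this.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_network(layer_sizes, seed=0):
    """Random weights and zero biases for each pair of consecutive layers."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def forward(network, x):
    """Propagate one input vector through all layers."""
    activation = x
    for weights, biases in network:
        activation = sigmoid(weights @ activation + biases)
    return activation


# Hypothetical input: 13 pairs of floats in [0, 1), flattened to 26 elements.
pairs = np.random.default_rng(1).random((13, 2))
x = pairs.reshape(26)

net = init_network([26, 10, 4])  # 26 inputs, 10 hidden nodes (arbitrary), 4 outputs
print(forward(net, x).shape)     # (4,)
```

Changing the number or size of hidden layers only means changing the list passed to init_network, which makes it cheap to play around with different architectures.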

I don't know what a good neural network looks like in terms of number of internal layers and their respective sizes. If the calculations are not too heavy, I'll just play around.

First, I need to learn the basics of neural networks in Python. I will start by following a basic tutorial and code the same network to get the basic concept.


Saturday 16 March 2019

TravelTimeCalculator: Fixing Bugs, Testing on Physical Phone and Closing the Project

So far, I've been testing the app on a simulated device from Android Studio.

The next step is to test the app on a real device. I'm using a Samsung SM-J500FN (Android 6.0.1, API 23). That phone is about four years old, and I wasn't sure how responsive the app would be on it.

Once I installed the app, it worked like a charm. The app is responsive and runs more smoothly than on the virtual devices. That makes sense, since a virtual Android device emulates a smartphone's application processor and its full software stack, and running that on a Windows PC requires quite a lot of system resources.



The last bug to fix was the app's inability to find valid values in the database when the directions were reversed. I fixed it by checking whether the directions are stored in direct or reversed order in the database.
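The reversed-direction fix can be sketched with Python's built-in sqlite3 module, since Android's database is also SQLite. The table layout, column names and find_route are hypothetical, not the app's actual DBHelper code. The idea is to look up the requested direction first and fall back to the reversed pair. Treating a reversed match as valid assumes distance and duration are roughly the same in both directions.

```python
import sqlite3

# Hypothetical schema: each origin/destination pair is stored once, in one direction.
SCHEMA = """CREATE TABLE routes (
    origin TEXT, destination TEXT, mode TEXT, distance INTEGER, duration INTEGER)"""

QUERY = ("SELECT distance, duration FROM routes "
         "WHERE origin = ? AND destination = ? AND mode = ?")


def find_route(conn, origin, destination, mode):
    """Return (distance, duration, reversed_match) or None if the pair is unknown."""
    row = conn.execute(QUERY, (origin, destination, mode)).fetchone()
    if row is not None:
        return row[0], row[1], False
    # Not stored in the requested direction: look the pair up in reverse.
    row = conn.execute(QUERY, (destination, origin, mode)).fetchone()
    if row is not None:
        return row[0], row[1], True
    return None


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute("INSERT INTO routes VALUES ('Home', 'Work', 'DRIVING', 12000, 900)")
print(find_route(conn, "Work", "Home", "DRIVING"))  # (12000, 900, True)
```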


Why TravelTimeCalculator Won't be Released
The purpose of this project for me is to learn Android - not to release an app that can be downloaded.

An app that is available on Google Play must be tested thoroughly on several phones, with different settings and lots of configurations. And once it is released, someone has to handle the incoming bug reports. I have neither the time nor the motivation to do that.

Further, if I released it on Google Play, I'd need to provide a Directions API key so that users could get directions at all. If the app became popular and generated lots of requests, I'd get a bill from Google. So I would have to choose between making the app free and risking a bill, or charging a small fee for it. The quality of the app can't justify charging for it.

Therefore, I'll keep the app as it is. Anyone can download it from GitHub and build it themselves.

In the spring, I'll continue with TrafficControl and also have a small project on AI/Machine Learning.

Saturday 9 March 2019

TravelTimeCalculator: Applying Settings

In this post, I'll use the values in the settings to estimate the cheapest travel mode.

The DBHelper method getAllDistanceDuration is extended to return the cost as well. It uses the following parameters:
  • Origin 
  • Destination
  • Emission cost (from settings)
  • Time cost (from settings)
  • Start and stop time for bicycling (from settings)
  • Start and stop time for driving (from settings)
  • Cost per kilometer for driving (from settings)
  • Emissions per kilometer for driving (from settings)
  • Ticket cost for transit (from settings)
  • Emissions per kilometer for transit (from settings)
I query the distances and durations from the database.

The values are stored in an array of integers.

The settings are retrieved from SharedPreferences.
I calculate the relevant costs and compare them with each other. The lowest cost is returned from the function.
Calculation of costs from the database. 
Then I update the marker with the lowest cost and the corresponding mode of transport.
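The cost comparison can be sketched as follows. The formula is an assumption, not the app's code: time is charged at the time cost per hour, emissions at the emission cost per kilogram, driving adds a cost per kilometre, transit adds one ticket, and driving and bicycling are only considered between their start and stop hours. The Settings fields, cheapest_mode and all example numbers are hypothetical. Distances are assumed to be in metres and durations in seconds, as integers read from the database.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical mirror of the values read from SharedPreferences."""
    emission_cost_per_kg: float
    time_cost_per_hour: float
    bicycling_hours: tuple[int, int]  # (start, stop), 24-hour clock
    driving_hours: tuple[int, int]
    driving_cost_per_km: float
    driving_emissions_kg_per_km: float
    transit_ticket_cost: float
    transit_emissions_kg_per_km: float


def allowed(hours, hour):
    start, stop = hours
    return start <= hour < stop


def cheapest_mode(routes, s, hour):
    """routes: {mode: (distance_m, duration_s)}. Returns (mode, cost) or None."""
    costs = {}
    for mode, (distance_m, duration_s) in routes.items():
        km = distance_m / 1000.0
        cost = duration_s / 3600.0 * s.time_cost_per_hour  # every mode costs time
        if mode == "DRIVING":
            if not allowed(s.driving_hours, hour):
                continue
            cost += km * s.driving_cost_per_km
            cost += km * s.driving_emissions_kg_per_km * s.emission_cost_per_kg
        elif mode == "TRANSIT":
            cost += s.transit_ticket_cost
            cost += km * s.transit_emissions_kg_per_km * s.emission_cost_per_kg
        elif mode == "BICYCLING":
            if not allowed(s.bicycling_hours, hour):
                continue
        costs[mode] = cost
    if not costs:
        return None
    best = min(costs, key=costs.get)
    return best, costs[best]


settings = Settings(1.0, 10.0, (6, 22), (7, 19), 0.2, 0.15, 3.0, 0.05)
routes = {"DRIVING": (10000, 900), "TRANSIT": (10000, 1800), "BICYCLING": (10000, 2400)}
print(cheapest_mode(routes, settings, hour=12))
```

With these made-up numbers, driving wins at noon, while at 22:00 both driving and bicycling fall outside their hours and transit is returned.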

Finally, some notes about bugs for this project:

One issue, unrealistic calculations for the travel mode "DRIVING", is now solved.

Two issues remain.
The second one will be tricky and the first one probably needs some Googling.

Once those two issues are solved and I've cleaned the code, I'll stop developing this app. In the next blog post, I'll discuss the journey of learning to build an Android App, and why this app won't reach Google Play for now. 

Saturday 2 March 2019

TravelTimeCalculator: Implementing Settings

The first step is to add the settings menus for the app by adding the elements to the XML file.
I'm keeping the settings from the example for now, since I may need them later to understand how to bind the settings to the app code.
After that, I want to show the current value of each setting below its title. This already works for the comparison mode setting (Distance/Duration/Cost), so let's look at that setting.

When selecting a new comparison mode (Cost, Distance or Duration), the new value is sent as an integer.
When that setting changes, onPreferenceChange is triggered with two parameters: "Compare duration, distance or Cost? Duration" and an integer that corresponds to the new value (1 for Cost, 2 for Distance and 3 for Duration).

The first part of the string corresponds to the title tag of the item, and the second part corresponds to the value of the setting before it is changed. The integer corresponds to the new value.

Defining settings isn't enough: something must happen when they are changed. When a setting is changed, the corresponding entry in the settings screen is updated. This is done by binding the values.

When any setting is changed, a method is invoked that saves all the settings to variables inside TravelTimeHandler. Changing parameters directly in another class is a dirty solution, but it will do for now.
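The flow described above can be sketched in plain Python. This is not Android code: SettingsScreen, on_preference_change and the keys "compare_mode" and "time_cost" are hypothetical stand-ins. Only TravelTimeHandler and the value codes 1 = Cost, 2 = Distance, 3 = Duration come from the post. Like Android's onPreferenceChange, the callback returns True to accept the new value.

```python
# Value codes for the comparison mode setting, as described in the post.
COMPARE_MODES = {1: "Cost", 2: "Distance", 3: "Duration"}


class TravelTimeHandler:
    """Stand-in for the app's handler; holds the currently active settings."""

    def __init__(self):
        self.settings = {}


class SettingsScreen:
    """Hypothetical stand-in for the settings screen and its change listener."""

    def __init__(self, handler):
        self.values = {}     # current value per setting key
        self.summaries = {}  # text shown below each setting title
        self.handler = handler

    def on_preference_change(self, key, new_value):
        self.values[key] = new_value
        # Bind the value to the UI: show it below the setting title.
        if key == "compare_mode":
            self.summaries[key] = COMPARE_MODES[new_value]
        else:
            self.summaries[key] = str(new_value)
        # Like the post's quick solution: copy all settings into the handler.
        self.handler.settings = dict(self.values)
        return True  # accept the change


handler = TravelTimeHandler()
screen = SettingsScreen(handler)
screen.on_preference_change("compare_mode", 1)
print(screen.summaries["compare_mode"])  # Cost
```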

The next step is to actually use the parameters.