Saturday 2 November 2019

Machine Learning: Lecture Notes (Week 6)

Machine Learning Diagnostics
When getting poor test results from a trained machine learning algorithm, some solutions are available:

  • More training examples
  • More/less features
  • Higher/smaller training rate
  • Adding polynomial features


The challenge is how to find which of the solutions will help out in a particular project.

Machine Learning Diagnostics are tests that can help narrowing down which actions listed above may help improving the performance of a ML project.

How to Evaluate a Hypothesis
A hypothesis should be able to generalize to new examples that aren't in the training set. An over fitted system will preform poorly with respect to this.

One way of verifying a hypothesis is to divide the dataset into a training set and a test set. After training a neural network, apply the hypothesis to the test set and check whether the error is acceptable. ~70% of data can be training set.

Model Selection with Train/Validation/Test sets
Degree of polynomial for regression can be determined by fitting the training data to a set of polynomials with different degrees. By evaluating the polynomials on a test set, a model is selected. However, that may not be a good generalization since that polynomial is fitted to the test set (the parameter d) and may not be generalized enough.

Instead, divide the data set into three parts:

  • Training set ~60%
  • Cross validation set (CV) ~20%
  • Test set ~20%

Calculate corresponding cost functions.

Train the data to minimize the training set cost function. Test the models using the cross validation set and select the model with the lowest  cross validation error. Estimate the error using the test set.

Bias vs Variance
High Bias - Underfit
High Variance - Overfit

Calculate the Training and Cross Validation error. Plot error vs degree of polynomial.

Bias: Both training and CV errors are high
Variance: CV errors are high, but training errors are low.

Regularization
For a polynomial as a hypothesis, a high value for lambda (regularization) implies that all non-bias parameters will be close to zero. A low value for lambda gives an overfitted system. How to chose the regularization parameter?


  • Calculate the cost functions for the training set, the cross validation set and the test set (mean-square error).
  • Try different lambdas (doubling each step) and minimize the cost functions. 
  • Use the cross validation set and check which of them has the lowest error on the cross validation set.
  • Finally, calculate the test error on the test data.

Readers that follow the course may note that I've omitted some parts of the lectures. I am doing that for a pragmatic reason - time. I take these notes in order to help me remember (rubberduck) the important lessons. If you want better coverage of the course. I recommend taking the course.

No comments:

Post a Comment