Saturday, 5 October 2019

Machine Learning: Lecture Notes (week 3)

These are some lecture notes from the third week of the online Machine Learning Course at Stanford University.

Classification
Many machine learning problems concern classification: dirty/clean, malignant/benign tumor, spam/not spam email, and so on.

In the two-class case, the training output y will be either zero (the negative class) or one (the positive class).
Logistic Regression / Hypothesis Representation
The logistic regression model passes a linear combination of the inputs through a sigmoid function:

hθ(x) = g(θᵀx),  where g(z) = 1 / (1 + e^(−z))

The sigmoid function offers a smooth transition from 0 (z << 0) to 1 (z >> 0).

hθ(x) is interpreted as the probability of a positive result (y = 1), given the input x.
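A minimal sketch of this hypothesis in Python with NumPy (the function and variable names are illustrative, not from the course):

import numpy as np

def sigmoid(z):
    # Smooth transition from 0 (z << 0) to 1 (z >> 0)
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    # h_theta(x) = g(theta^T x): estimated probability that y = 1 given x
    return sigmoid(theta @ x)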
Decision Boundary
A decision boundary is the boundary between the set of x for which hθ(x) ≥ 0.5 (predict y = 1) and the set for which hθ(x) < 0.5 (predict y = 0). Since g(z) ≥ 0.5 exactly when z ≥ 0, the boundary is where θᵀx = 0.
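For example, with θ = (−3, 1, 1)ᵀ the model predicts y = 1 whenever −3 + x1 + x2 ≥ 0, so the decision boundary is the straight line x1 + x2 = 3.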

Cost Function

The cost for a single training example is:

Cost(hθ(x), y) = −log(hθ(x))       if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x))   if y = 0

This cost function gives zero cost when the hypothesis matches the label exactly, and the cost grows towards infinity as the hypothesis approaches the opposite label.
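As a sketch, the per-example cost in Python (assuming h = hθ(x) has already been computed, e.g. with the hypothesis function above):

import numpy as np

def cost_single(h, y):
    # -log(h) if y = 1, -log(1 - h) if y = 0
    return -np.log(h) if y == 1 else -np.log(1.0 - h)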


Simplified Cost Function and Gradient Descent
Because y is always 0 or 1, the two cases of the cost function can be combined into a single expression:

J(θ) = −(1/m) Σ [ y(i) log(hθ(x(i))) + (1 − y(i)) log(1 − hθ(x(i))) ],  summed over i = 1, …, m

This function can be derived using maximum likelihood estimation.

The optimization problem is now to minimize J(θ) with respect to the parameters θ (the observations are fixed).
Gradient descent updates each parameter with the prediction error multiplied by the corresponding observation:

θj := θj − α (1/m) Σ (hθ(x(i)) − y(i)) xj(i),  summed over i = 1, …, m

This update has the same form as for linear regression; only the hypothesis hθ(x) differs.
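A minimal vectorized sketch in NumPy, assuming X is the m × (n+1) design matrix with a leading column of ones and y is the m-vector of 0/1 labels (names are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum(y * log(h) + (1 - y) * log(1 - h))
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

def gradient_step(theta, X, y, alpha):
    # theta := theta - alpha * (1/m) * X^T (h - y)
    m = len(y)
    h = sigmoid(X @ theta)
    return theta - alpha * (X.T @ (h - y)) / m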
Advanced Optimization
Several optimization algorithms can be used for this minimization:
  1. Gradient Descent
  2. Conjugate Gradient
  3. BFGS
  4. L-BFGS
Algorithms 2-4 do not require selecting a learning rate α manually and are often faster than gradient descent, but they are also more complex.
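They do not have to be implemented by hand. A sketch using SciPy's general-purpose optimizer, reusing the sigmoid and cost functions from the sketch above (theta0, X and y are assumed to be defined):

import numpy as np
from scipy.optimize import minimize

def gradient(theta, X, y):
    # Partial derivatives of J(theta); same expression as in the update rule
    m = len(y)
    h = sigmoid(X @ theta)
    return (X.T @ (h - y)) / m

result = minimize(cost, theta0, args=(X, y), jac=gradient, method='BFGS')
theta_opt = result.x  # fitted parameters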

Regularization
Too many features in the hypothesis may cause overfitting: the model fits the training set very well but generalizes poorly to new cases. There are two ways to handle overfitting:

  • Reducing number of features (manually or using a selection algorithm)
  • Regularization (keep all features but reduce the magnitudes of the parameters θ)

Regularization adds the squared magnitudes of the hypothesis parameters to the cost function as a penalty term, (λ / 2m) Σ θj² summed over j = 1, …, n. By convention the first parameter θ0 is still optimized but is excluded from the penalty.

The vector form (the normal equation for regularized linear regression) will be:

θ = (XᵀX + λL)⁻¹ Xᵀ y

where L is the (n+1) × (n+1) identity matrix with its top-left element, corresponding to θ0, set to zero.
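A sketch of the regularized cost and gradient for logistic regression, with θ0 excluded from the penalty (same X, y conventions and sigmoid as in the sketches above; lam stands for λ):

import numpy as np

def cost_reg(theta, X, y, lam):
    # Unregularized cost plus (lambda / 2m) * sum of theta_j^2 for j >= 1
    m = len(y)
    h = sigmoid(X @ theta)
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    return J + lam * (theta[1:] @ theta[1:]) / (2 * m)

def gradient_reg(theta, X, y, lam):
    # Gradient of the regularized cost; theta_0 is not penalized
    m = len(y)
    h = sigmoid(X @ theta)
    grad = (X.T @ (h - y)) / m
    grad[1:] += (lam / m) * theta[1:]
    return grad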
