Saturday 16 November 2019

Machine Learning: Lecture Notes (Week 8)

Unsupervised Learning
Unsupervised learning means training on a training set X without labels y.

Clustering algorithms find groups of similar samples in the data.

K-means
K-means is the most popular clustering algorithm.

It starts from K randomly chosen points (typically K training samples) as the initial cluster centroids, then repeats two steps, as sketched in the code below.
  • Ignore the bias feature x0.
  • Repeat:
    • Assign each sample in the training set to its closest centroid.
    • Move each centroid to the average location of its assigned training samples.
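A minimal Octave sketch of these two steps, assuming X is an m x n matrix of samples; the names run_kmeans, K and max_iters are chosen here for illustration:

function [centroids, idx] = run_kmeans(X, K, max_iters)
  m = size(X, 1);
  idx = zeros(m, 1);
  centroids = X(randperm(m, K), :);   % K random training samples as initial centroids
  for iter = 1:max_iters
    for i = 1:m                       % assignment step: closest centroid per sample
      [~, idx(i)] = min(sum((centroids - X(i, :)).^2, 2));
    end
    for k = 1:K                       % update step: centroid = mean of its samples
      centroids(k, :) = mean(X(idx == k, :), 1);
    end
  end
end

Empty clusters are not handled in this minimal sketch.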


Optimization Objective
The cost (distortion) that K-means minimizes is the average squared distance between each sample and the centroid of its assigned cluster:

J = (1/m) * sum_i ||x^(i) - mu_(c^(i))||^2

K-means can get stuck in a local optimum of J. Use random initialization several times to overcome this: run K-means with different random initial centroids and keep the clustering with the lowest cost, as in the sketch below.
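A sketch of the random-restart loop, reusing the run_kmeans function above (the restart count of 50 is an arbitrary choice):

best_cost = Inf;
for t = 1:50
  [centroids, idx] = run_kmeans(X, K, 100);
  cost = mean(sum((X - centroids(idx, :)).^2, 2));   % distortion J for this run
  if cost < best_cost
    best_cost = cost;
    best_centroids = centroids;
    best_idx = idx;
  end
end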

Elbow method for selecting the number of clusters

Plot the cost function J against the number of clusters K. The "elbow" of the curve indicates a suitable number of clusters, as in the sketch below.
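A sketch of the elbow curve, again reusing run_kmeans from above:

costs = zeros(1, 10);
for K = 1:10
  [centroids, idx] = run_kmeans(X, K, 100);
  costs(K) = mean(sum((X - centroids(idx, :)).^2, 2));  % distortion J at this K
end
plot(1:10, costs);
xlabel('number of clusters K');
ylabel('cost J');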

It can be necessary to reduce the number of features, for example when some features are redundant (highly correlated with each other).

This can be done by projecting the data from a higher-dimensional space onto a lower-dimensional surface, such as a line or a plane.

Principal Component Analysis
PCA finds a lower-dimensional surface that minimizes the projection error: the squared distance between each data point and its projection onto the surface.

PCA should not be confused with linear regression: linear regression minimizes the vertical distance to a predicted value y, while PCA minimizes the orthogonal projection distance and uses no label y.

Start with mean normalization (subtract the mean of each feature from its values) and, if needed, feature scaling so that all features are on a comparable scale.

Calculate the covariance matrix Sigma = (1/m) * sum_i x^(i) (x^(i))^T (equivalently, Sigma = (1/m) * X^T X when the samples are the rows of X).
Compute the eigenvectors of Sigma using singular value decomposition (SVD): [U, S, V] = svd(Sigma). The columns of U are the eigenvectors; S holds the singular values.

This gives U, an n x n matrix of column vectors. The first k columns span the lower-dimensional surface that PCA projects onto.

U_reduce = U(:, 1:k)    (the first k columns of U)

z = U_reduce^T x

Restoring an approximation of the original point:
x_approx = U_reduce z
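Putting the steps together, a minimal Octave sketch, assuming X_raw is an m x n data matrix and k is the target dimension (both names chosen here for illustration):

[m, n] = size(X_raw);
mu = mean(X_raw);
s = std(X_raw);
X = (X_raw - mu) ./ s;          % mean normalization and feature scaling
Sigma = (1 / m) * (X' * X);     % n x n covariance matrix
[U, S, V] = svd(Sigma);         % columns of U are the principal components
U_reduce = U(:, 1:k);           % keep the first k components
Z = X * U_reduce;               % projection: row i is z^(i) = U_reduce' * x^(i)
X_approx = Z * U_reduce';       % approximate reconstruction in the original n dimensions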



Anomaly Detection - when to use
Use anomaly detection rather than supervised learning when there are very few positive (anomalous) examples and when there are many different, unpredictable types of anomalies. The model fits a Gaussian distribution to each feature and flags a sample as anomalous when its estimated density p(x) falls below a threshold epsilon.

What features to use?
Use histograms to see whether a feature is roughly Gaussian. If it is, it is suitable for anomaly detection as-is. If it isn't, transform it towards a Gaussian shape (for example with log(x) or a fractional power such as x^0.5).
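A minimal Octave sketch of the density model, assuming X is an m x n matrix of normal training samples, x is a new 1 x n sample, and epsilon has been chosen on a labeled validation set (all names here are illustrative):

mu = mean(X);                    % 1 x n feature means
sigma2 = var(X, 1);              % 1 x n feature variances (1/m normalization)
p = prod((1 ./ sqrt(2 * pi * sigma2)) .* exp(-((x - mu).^2) ./ (2 * sigma2)));
is_anomaly = p < epsilon;        % flag the sample when its density is too low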

Recommender Systems
One approach is content-based recommendation: treat each user's ratings as a separate linear regression problem over known movie features.

Collaborative Filtering
In this case, we don't have any information about the movies, such as romance/action etc. The users have expressed their preferences (theta values) by rating some movies, and the movie features (x values) are learned from those ratings.

Alternately optimizing over the input features x and the model parameters theta (or optimizing both simultaneously) drives the error down.

1. Initialize x and theta to small random values.
2. Minimize J(x, theta) using gradient descent, as sketched below.
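A minimal Octave sketch of these two steps, assuming Y is a num_movies x num_users rating matrix with R(i, j) = 1 where user j rated movie i; the feature count, learning rate alpha, regularization lambda and iteration count are arbitrary illustrative choices:

[num_movies, num_users] = size(Y);
num_features = 10;
X = 0.01 * randn(num_movies, num_features);     % step 1: small random movie features
Theta = 0.01 * randn(num_users, num_features);  % step 1: small random user parameters
alpha = 0.005; lambda = 1;
for iter = 1:500                                % step 2: gradient descent on J(X, Theta)
  E = (X * Theta' - Y) .* R;                    % prediction errors, only where ratings exist
  X_grad = E * Theta + lambda * X;              % dJ/dX (regularized)
  Theta_grad = E' * X + lambda * Theta;         % dJ/dTheta (regularized)
  X = X - alpha * X_grad;
  Theta = Theta - alpha * Theta_grad;
end

The predicted rating of movie i by user j is then X(i, :) * Theta(j, :)'.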
