Saturday, 23 November 2019

Machine Learning: Lecture Notes (Week 9)

Gaussian Distribution (Normal Distribution) is used for anomaly detection. I'm quite familiar with this so I won't discuss this in my blog.

The Parameter Distribution Problem
Given a data set, estimate which distribution gave this data set using the Maximum Likelihood method:

  • Estimate that the average (mu) to be the same as the average.
  • Estimate that the variance (sigma) to be the same as the standard error squared.

Anomaly Detection Algorithm
Given a training set of m samples, each with n features, where the samples follow the normal distribution, p(x) can be calculated as:

  • Choose features that might indicate anomalous examples.
  • Fit parameters for the different features such as mean values and variance.
  • Calculate p(x). If p(x) is smaller than a threshold, an anomaly is detected.


Put simply, the likelihood for one example is calculated. If that likelihood is too small, we probably have an anomaly.

This is the last lecture note for this course, since I've completed the course. The next blog post will discuss a simple logistic regression example for apartment prices.

No comments:

Post a Comment