By Andrew Ng
1. Logistic Regression
1.1 Visualizing Data Part
└ plotData function
1.2 Advanced Optimization Part
└ mapFeature function
└ costFunctionReg function
└ fminunc function
1.3 Decision Boundary and Prediction Part
└ plotDecisionBoundary function
└ predict function
1. Logistic Regression
We'll implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you're the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.
1.1 Visualizing Data Part
plotData(X, y);
Figure: plot of the training data (ex2data2.txt)
This function plots the data points X and y into a new figure. In other words, it plots the data points with + for the positive examples and o for the negative examples. X is assumed to be an M x 2 matrix.
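For reference, a minimal sketch of how plotData can be written in Octave/MATLAB (the index vectors pos and neg and the marker styling are illustrative choices, not prescribed by the exercise):
function plotData(X, y)
  % Separate positive (y = 1) and negative (y = 0) examples
  pos = find(y == 1);
  neg = find(y == 0);
  figure; hold on;
  plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);           % positives as +
  plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);   % negatives as o
  hold off;
end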
1.2 Advanced Optimization Part
First, you're given a dataset with data points that are not linearly separable. However, we would still like to use logistic regression to classify the data points. To do so, we introduce more features: in particular, we add polynomial features to the data matrix (similar to polynomial regression).
Second, to get the optimized theta, we'll use an advanced optimization function called fminunc (i.e., function minimization unconstrained) rather than the naive gradient descent algorithm. It has a clever inner loop called a line search algorithm that automatically tries out different values for the learning rate alpha and picks a good one.
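As a sketch of how fminunc is typically called for this problem (assuming the data matrix X already contains the mapped polynomial features including the intercept column, and using the costFunctionReg implemented below):
initial_theta = zeros(size(X, 2), 1);                   % one parameter per mapped feature
lambda = 1;                                             % regularization strength
options = optimset('GradObj', 'on', 'MaxIter', 400);    % we supply the gradient ourselves
[theta, cost] = fminunc(@(t) costFunctionReg(t, X, y, lambda), initial_theta, options);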
1.2.1 mapFeature function
One way to fit the data better is to create more features from each data point. In this function, we'll map the features into all polynomial terms of $x_1$ and $x_2$ up to the sixth power.
As a result of this mapping, our vector of two features (the scores on the two QA tests) is transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot. While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. So, we need to implement regularized logistic regression to fit the data.
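A sketch of mapFeature, which maps two input features into all monomials $x_1^i x_2^j$ with $i + j \le 6$ (1 + 2 + ... + 7 = 28 terms, including the constant term):
function out = mapFeature(X1, X2)
  degree = 6;
  out = ones(size(X1(:, 1)));                        % bias (intercept) term
  for i = 1:degree
    for j = 0:i
      out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);   % add the term x1^(i-j) * x2^j
    end
  end
end
The training matrix can then be built as X = mapFeature(X(:, 1), X(:, 2));, which also takes care of adding the column of ones for the intercept.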
1.2.2 costFunctionReg function
We'll implement code to compute the cost function and gradient for regularized logistic regression. Before starting with the actual cost function, recall that the logistic regression hypothesis is defined as: $$h_\theta(x) = g(\theta^Tx) ,$$ where the function $g$ is the sigmoid function, which is defined as: $$g(z) = {1\over{1+e^{-z}}}$$ Recall that the regularized cost function in logistic regression is: $$J(\theta) = {1\over m}\sum_{i=1}^{m}\left[-y^{(i)}\log\left(h_\theta(x^{(i)})\right) - (1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right] + {\lambda\over 2m}\sum_{j=1}^{n}\theta_j^2$$
Notice that we should not regularize the parameter $\theta_0$. The gradient of the cost function is a vector where the $j^{th}$ element is defined as follows: $${\partial J(\theta)\over\partial\theta_0} = {1\over m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)} \qquad \text{for } j = 0$$ $${\partial J(\theta)\over\partial\theta_j} = \left({1\over m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}\right) + {\lambda\over m}\theta_j \qquad \text{for } j \ge 1$$
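A vectorized sketch of costFunctionReg, together with the sigmoid helper it assumes (in practice each function lives in its own .m file). It follows the formulas above and leaves $\theta_0$ out of the regularization term:
function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));                       % element-wise, works on scalars, vectors, matrices
end

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  m = length(y);                                % number of training examples
  h = sigmoid(X * theta);                       % h_theta(x) for every example
  theta_reg = [0; theta(2:end)];                % zero out theta_0 so it is not regularized

  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
      + (lambda / (2 * m)) * (theta_reg' * theta_reg);
  grad = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
end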
1.3 Decision Boundary and Prediction Part
After learning the parameters $\theta$, we now plot the non-linear decision boundary by computing the classifier's predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1.
plotDecisionBoundary(theta, X, y);
Figures: training data with decision boundary ($\lambda=1$); no regularization, i.e. overfitting ($\lambda=0$); too much regularization, i.e. underfitting ($\lambda=100$)
1.3.1 plotDecisionBoundary function
This function plots the data points X and y into a new figure with the decision boundary defined by theta.
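For the non-linear case, a sketch of the contour-based part of plotDecisionBoundary: evaluate $\theta^T \cdot \mathrm{mapFeature}(u, v)$ over a grid and draw the level-0 contour. The grid range and resolution are illustrative, and the first column of X is assumed to be the intercept so that columns 2-3 hold the original two features:
plotData(X(:, 2:3), y);                    % plot the raw data points first
hold on;

u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);
z = zeros(length(u), length(v));
for i = 1:length(u)
  for j = 1:length(v)
    z(i, j) = mapFeature(u(i), v(j)) * theta;   % theta' * mapped features at grid point (u(i), v(j))
  end
end
z = z';                                    % transpose so contour(u, v, z) lines up with the axes
contour(u, v, z, [0, 0], 'LineWidth', 2);  % the boundary is where theta' * x = 0
hold off;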
1.3.2 predict function
This function predicts whether the label is 0 or 1 using the learned logistic regression parameters theta: it outputs 1 whenever $h_\theta(x) \ge 0.5$ and 0 otherwise.
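A minimal sketch of predict, thresholding the hypothesis at 0.5:
function p = predict(theta, X)
  p = sigmoid(X * theta) >= 0.5;    % p(i) = 1 when h_theta(x^(i)) >= 0.5, otherwise 0
end
Comparing p against y then gives the training accuracy, e.g. mean(double(p == y)) * 100.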