Logistic Regression


By Andrew Ng

1. Logistic Regression 
    1.1 Visualizing Data Part
            └ plotData function
    1.2 Advanced Optimization Part
            └ mapFeature function
            └ costFunctionReg function
            └ fminunc function    
    1.3 Decision Boundary and Prediction Part
            └ plotDecisionBoundary function
            └ predict function




1. Logistic Regression

We'll implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you're the product manager of the factory and you have the test results for some microchips on two different tests. From these two tests, you would like to determine whether the microchips should be accepted or rejected. To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.


1.1 Visualizing Data Part

plotData(X, y);
[Figure: Plot of the training data (ex2data2.txt)]
1.1.1 plotData function
This function plots the data points X and y into a new figure. In other words, it plots the data points with + for the positive examples and o for the negative examples. X is assumed to be an M x 2 matrix.
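
A minimal Octave sketch of what plotData could look like (the marker styles are an illustrative choice, not mandated by the exercise):

function plotData(X, y)
  % Plot positive examples as + and negative examples as o on a new figure.
  figure; hold on;
  pos = find(y == 1);   % indices of accepted microchips
  neg = find(y == 0);   % indices of rejected microchips
  plot(X(pos, 1), X(pos, 2), 'k+', 'LineWidth', 2, 'MarkerSize', 7);
  plot(X(neg, 1), X(neg, 2), 'ko', 'MarkerFaceColor', 'y', 'MarkerSize', 7);
  hold off;
end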



1.2 Advanced Optimization Part
First, you're given a dataset with data points that are not linearly separable. However, you would still like to use logistic regression to classify the data points. To do so, you introduce more features. In particular, you add polynomial features to the data matrix (similar to polynomial regression).
Second, to obtain the optimized theta, we'll use an advanced optimization function called fminunc (i.e., function minimization unconstrained) rather than the naive gradient descent algorithm. It has a clever inner loop called a line search algorithm that automatically tries out different values of the learning rate alpha and picks a good one.
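
A sketch of how fminunc might be invoked here, assuming the costFunctionReg function implemented in 1.2.2 below returns both the cost and the gradient (the variable names initial_theta and options are illustrative):

initial_theta = zeros(size(X, 2), 1);     % initial fitting parameters
lambda = 1;                               % regularization strength
% 'GradObj' on tells fminunc that our function also returns the gradient
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, cost] = fminunc(@(t) costFunctionReg(t, X, y, lambda), ...
                        initial_theta, options);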

1.2.1 mapFeature function
One way to fit the data better is to create more features from each data point. In this function, we'll map the features into all polynomial terms of $x_1$ and $x_2$ up to the sixth power.
As a result of this mapping, our vector of two features (the scores on the two QA tests) has been transformed into a 28-dimensional vector. A logistic regression classifier trained on this higher-dimensional feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot. While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting. So, we need to implement regularized logistic regression to fit the data.
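
One way to write this mapping in Octave, as a sketch (the nested loop emits terms in order of increasing total degree, and the column count works out to $1 + 2 + \dots + 7 = 28$):

function out = mapFeature(X1, X2)
  % Map two features to all polynomial terms of x1 and x2 up to degree 6:
  % 1, x1, x2, x1^2, x1*x2, x2^2, x1^3, ..., x1*x2^5, x2^6.
  degree = 6;
  out = ones(size(X1(:, 1)));            % bias (intercept) column
  for i = 1:degree
    for j = 0:i
      out(:, end + 1) = (X1 .^ (i - j)) .* (X2 .^ j);
    end
  end
end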

1.2.2 costFunctionReg function
We'll implement code to compute the cost function and gradient for regularized logistic regression. Before starting with the actual cost function, recall that the logistic regression hypothesis is defined as: $$h_\theta(x) = g(\theta^Tx),$$ where the function $g$ is the sigmoid function, defined as: $$g(z) = {1\over{1+e^{-z}}}$$ Recall that the regularized cost function in logistic regression is: $$J(\theta) = {1\over m}\sum_{i=1}^{m}\Big[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - (1-y^{(i)})\log\big(1-h_\theta(x^{(i)})\big)\Big] + {\lambda\over 2m}\sum_{j=1}^{n}\theta_j^2$$
Notice that we should not regularize the parameter $\theta_0$. The gradient of the cost function is a vector where the $j^{th}$ element is defined as follows: $${\partial J(\theta)\over\partial\theta_0} = {1\over m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)} \qquad \text{for } j = 0$$ $${\partial J(\theta)\over\partial\theta_j} = \left({1\over m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)x_j^{(i)}\right) + {\lambda\over m}\theta_j \qquad \text{for } j \geq 1$$
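
A vectorized Octave sketch of both computations, with a sigmoid helper implementing $g(z)$ above (zeroing out the first entry of theta keeps $\theta_0$ out of the regularization terms):

function [J, grad] = costFunctionReg(theta, X, y, lambda)
  m = length(y);                     % number of training examples
  h = sigmoid(X * theta);            % h_theta(x) for every example
  theta_reg = [0; theta(2:end)];     % exclude theta_0 from regularization
  J = (1 / m) * (-y' * log(h) - (1 - y)' * log(1 - h)) ...
      + (lambda / (2 * m)) * (theta_reg' * theta_reg);
  grad = (1 / m) * (X' * (h - y)) + (lambda / m) * theta_reg;
end

function g = sigmoid(z)
  g = 1 ./ (1 + exp(-z));            % element-wise sigmoid
end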



1.3 Decision Boundary and Prediction Part
After learning the parameters $\theta$, we plot the non-linear decision boundary by computing the classifier's predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1.

plotDecisionBoundary(theta, X, y);
[Figure: Training data with decision boundary ($\lambda = 1$)]
[Figure: No regularization, overfitting ($\lambda = 0$)]
[Figure: Too much regularization, underfitting ($\lambda = 100$)]

1.3.1 plotDecisionBoundary function
This function plots the data points X and y into a new figure with the decision boundary defined by theta.
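
For the non-linear case, the boundary can be sketched by evaluating $\theta^T \cdot \mathrm{mapFeature}(u, v)$ over a grid and drawing the zero-level contour, as in this sketch (the grid range is an assumption based on the plotted data):

u = linspace(-1, 1.5, 50);
v = linspace(-1, 1.5, 50);
z = zeros(length(u), length(v));
for i = 1:length(u)
  for j = 1:length(v)
    z(i, j) = mapFeature(u(i), v(j)) * theta;   % score at grid point (u_i, v_j)
  end
end
z = z';                                   % transpose before calling contour
contour(u, v, z, [0, 0], 'LineWidth', 2); % draw only the z = 0 level set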

1.3.2 predict function
This function predicts whether the label is 0 or 1 using the learned logistic regression parameters theta: the model outputs 1 when $h_\theta(x) \geq 0.5$ and 0 otherwise.
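
Since $g(z) \geq 0.5$ exactly when $z \geq 0$, the function reduces to a one-line threshold, as in this sketch (assuming the same sigmoid helper as in 1.2.2):

function p = predict(theta, X)
  % Output 1 when h_theta(x) >= 0.5, and 0 otherwise.
  p = sigmoid(X * theta) >= 0.5;
end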



REFERENCES
[1] Andrew Ng, Machine Learning, Stanford University, Coursera.
