In this blog, I will talk about two models that are commonly used for classification tasks: Logistic Regression and Softmax Regression.
Logistic Regression is commonly used to estimate the probability that an instance belongs to a particular class, which makes it a binary classifier. Like a Linear Regression model, a Logistic Regression model computes a weighted sum of the input features, but instead of outputting that sum directly, it outputs the logistic of the result:

$$\hat{p} = \sigma(\theta^T x), \qquad \sigma(t) = \frac{1}{1 + e^{-t}}$$
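To make the "logistic of a weighted sum" concrete, here is a minimal sketch in plain Python. The helper names and the parameter values are made up for illustration and are not fitted to any data; the first entry of `theta` is assumed to be the bias term.

```python
import math

def sigmoid(t):
    """Logistic function: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))

def estimate_probability(theta, x):
    """Return sigmoid(theta . x), the estimated probability of the positive class.

    theta[0] is treated as the bias term, so x is prefixed with a constant 1.
    """
    features = [1.0] + list(x)
    weighted_sum = sum(w * f for w, f in zip(theta, features))
    return sigmoid(weighted_sum)

# Hypothetical parameters for a single-feature model (not fitted to any data)
theta = [-4.0, 2.5]
p = estimate_probability(theta, [2.0])  # weighted sum = -4.0 + 2.5 * 2.0 = 1.0
print(round(p, 3))  # 0.731
```

A weighted sum of 0 gives a probability of exactly 0.5, positive sums give more than 0.5, and negative sums give less.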
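The objective of training is to set the parameter vector θ so that the model estimates high probabilities for positive instances (y = 1) and low probabilities for negative instances (y = 0). This idea is captured by the following cost function for a single training instance:

$$c(\theta) = \begin{cases} -\log(\hat{p}) & \text{if } y = 1 \\ -\log(1 - \hat{p}) & \text{if } y = 0 \end{cases}$$

This function makes sense because -log(t) grows very large as t approaches 0, so the cost is large if the model estimates a probability close to 0 for a positive instance, and it is also large if the model estimates a probability close to 1 for a negative instance. Conversely, -log(t) is close to 0 when t is close to 1, so the cost is small when the estimate is close to the correct label.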
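The behaviour of this cost can be checked with a few numbers. The sketch below uses a hypothetical helper, `instance_cost`, and made-up probabilities to show how the penalty changes with the estimate.

```python
import math

def instance_cost(p_hat, y):
    """Cost of one instance: -log(p_hat) if y == 1, otherwise -log(1 - p_hat)."""
    return -math.log(p_hat) if y == 1 else -math.log(1.0 - p_hat)

# Positive instance (y = 1): the cost is small near 1 and large near 0
for p_hat in (0.99, 0.5, 0.01):
    print(p_hat, round(instance_cost(p_hat, 1), 3))

# Negative instance (y = 0): the pattern is mirrored
for p_hat in (0.99, 0.5, 0.01):
    print(p_hat, round(instance_cost(p_hat, 0), 3))
```

A confident wrong estimate (0.01 for a positive instance) costs several hundred times more than a confident right one (0.99).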
Let’s use the iris dataset and try to build a classifier to detect the Iris-Virginica type based only on the petal width feature.
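A minimal sketch of such a classifier with scikit-learn might look like the following. It assumes the library's default `LogisticRegression` settings, since the exact settings are not stated here, and the variable names are my own.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, 3:]                # petal width (cm), kept as a 2-D column
y = (iris["target"] == 2).astype(int)  # 1 if Iris-Virginica, else 0

# Assumption: default hyperparameters
log_reg = LogisticRegression()
log_reg.fit(X, y)

# Predicted class and estimated probability for two petal widths
samples = [[1.7], [1.5]]
predictions = log_reg.predict(samples)
virginica_probas = log_reg.predict_proba(samples)[:, 1]
print(predictions)
print(virginica_probas)
```

`predict` returns the class with the higher estimated probability, while `predict_proba` exposes the probabilities themselves.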
The classifier predicts that a flower with a petal width of 1.7 cm is an Iris-Virginica, and that a flower with a petal width of 1.5 cm is not.
The Logistic Regression model can be generalized to support multiple classes directly, without having to train and combine multiple binary classifiers. This is called Softmax Regression, or Multinomial Logistic Regression.
How does it work? When given an instance x, the Softmax Regression model first computes a score for each class k, then estimates the probability of each class by applying the softmax function to the scores.
Softmax score for class k:

$$s_k(x) = \left(\theta^{(k)}\right)^T x$$
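Note that each class has its own dedicated parameter vector $\theta^{(k)}$. Once you have computed the score $s_k(x)$ of every class for the instance x, you can estimate the probability $\hat{p}_k$ that the instance belongs to class k by running the scores through the softmax function, which computes the exponential of every score and then normalizes them by dividing by the sum of all the exponentials (K is the number of classes):

$$\hat{p}_k = \frac{\exp\left(s_k(x)\right)}{\sum_{j=1}^{K} \exp\left(s_j(x)\right)}$$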
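The two steps (score each class, then normalize the exponentials) can be sketched in plain Python as follows. The parameter vectors are made up for illustration and are not fitted. Subtracting the maximum score before exponentiating is a common numerical-stability trick that I have added; it does not change the result.

```python
import math

def softmax_scores(thetas, x):
    """Score of each class: dot product of that class's parameter vector with x."""
    return [sum(w * f for w, f in zip(theta_k, x)) for theta_k in thetas]

def softmax(scores):
    """Exponentiate each score, then divide by the sum of the exponentials."""
    max_score = max(scores)  # shifting by the max avoids overflow in exp
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical parameter vectors for K = 3 classes and 2 features (not fitted)
thetas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [2.0, 1.0]

scores = softmax_scores(thetas, x)  # [2.0, 1.0, 3.0]
probs = softmax(scores)
print([round(p, 3) for p in probs])
print(probs.index(max(probs)))      # index of the most probable class
```

The probabilities always sum to 1, and the class with the highest score always gets the highest probability.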
The objective is to have a model that estimates a high probability for the target class. Minimizing the cross entropy cost function leads toward this objective because it penalizes the model when it estimates a low probability for the target class. Cross entropy is frequently used to measure how well a set of estimated class probabilities matches the target classes.
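Cross entropy cost function:

$$J(\Theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(\hat{p}_k^{(i)}\right)$$

Here m is the number of instances, $\hat{p}_k^{(i)}$ is the estimated probability that the ith instance belongs to class k, and $y_k^{(i)}$ is equal to 1 if the target class for the ith instance is k; otherwise, it’s equal to 0.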
Notice that when K=2, the cost function is equivalent to the Logistic Regression’s cost function.
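The equivalence for K=2 can be checked numerically. The sketch below uses hypothetical helper names and made-up probabilities. It computes the cross entropy over two classes and the averaged Logistic Regression cost on the same estimates.

```python
import math

def cross_entropy(probas, targets):
    """Average of -log(probability assigned to the target class).

    probas[i][k] is the estimated probability of class k for instance i.
    targets[i] is the index of the true class, so only that term is non-zero.
    """
    m = len(probas)
    return -sum(math.log(p[t]) for p, t in zip(probas, targets)) / m

def log_loss(p_hats, ys):
    """Average Logistic Regression cost for binary labels ys."""
    m = len(ys)
    total = sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(p_hats, ys))
    return -total / m

# Made-up positive-class probabilities and binary labels
p_hats = [0.9, 0.2, 0.6]
ys = [1, 0, 1]

# With two classes, class 0 gets probability 1 - p and class 1 gets p
two_class_probas = [[1 - p, p] for p in p_hats]

ce = cross_entropy(two_class_probas, ys)
ll = log_loss(p_hats, ys)
print(ce, ll)
```

Both values agree because, with two classes, the only non-zero term for each instance is either log(p) or log(1 - p).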
Let’s use Softmax Regression to classify the iris flowers into all three classes.
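A hedged sketch with scikit-learn is shown below. Two things here are assumptions rather than settings confirmed in this post: the regularization value `C=10`, and the reliance on recent scikit-learn versions fitting a multinomial (softmax) model for multiclass targets with the default `lbfgs` solver.

```python
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

iris = datasets.load_iris()
X = iris["data"][:, [2, 3]]  # petal length and petal width (cm)
y = iris["target"]           # 0 = setosa, 1 = versicolor, 2 = virginica

# Assumption: C=10 (weaker regularization than the default C=1)
softmax_reg = LogisticRegression(C=10)
softmax_reg.fit(X, y)

# Flower with a petal length of 5 cm and a petal width of 2 cm
sample = [[5, 2]]
predicted_class = softmax_reg.predict(sample)[0]
class_probas = softmax_reg.predict_proba(sample)[0]
print(iris["target_names"][predicted_class])
print(class_probas.round(3))
```

The exact probabilities depend on the scikit-learn version and on the regularization strength, so they may differ slightly from run to run across environments.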
If a flower’s petal length is 5 cm and its petal width is 2 cm, it is an Iris-Virginica with 94.2% probability.
In this blog, I summarized Logistic Regression and Softmax Regression and showed how they can be used in Python. I hope you find it useful.
- Aurélien Géron. 2017. “Chapter 4: Training Models.” Hands-On Machine Learning with Scikit-Learn & TensorFlow, pp. 136-144.
- Fotomanie, “Iris”, pixabay.com. [Online]. Available: https://pixabay.com/photos/flower-iris-wild-flower-purple-76336/