A good way to reduce overfitting is to regularize the model, that is, to
constrain it: the fewer degrees of freedom a model has, the harder it is for it
to overfit the data. For a linear model, regularization is typically achieved
by constraining the **weights of the model**. In this post, I will talk about
how to constrain the weights in the following models:

- Ridge Regression
- Lasso Regression
- Elastic Net

## Ridge Regression

*Ridge Regression* is a regularized version of Linear Regression: a
*regularization term* is added to the cost function. Note that the
regularization term should only be added to the cost function during training;
once the model is trained, its performance should be evaluated with the
unregularized measure.

The hyperparameter *α* controls how much you want to regularize the model. If
*α* = 0, then Ridge Regression is just Linear Regression. If *α* is very large,
then all weights end up very close to zero and the result is a flat line going
through the data’s mean.
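
To make the effect of *α* concrete, here is a minimal sketch that fits `scikit-learn`'s `Ridge` on a small synthetic dataset for increasing values of *α*. The data (`y = 4 + 3x` plus noise), the random seed, and the grid of *α* values are assumptions made up for illustration. Note also that `scikit-learn` scales its penalty relative to the error term a little differently from the formulation in this post, so the numeric value of *α* is not directly comparable.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(42)
X = 2 * rng.random((50, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=50)

alphas = [0.01, 1.0, 100.0, 1e6]
weights = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X, y)
    weights.append(model.coef_[0])
    print(f"alpha={alpha:>10}: weight={model.coef_[0]:.5f}, "
          f"intercept={model.intercept_:.5f}")

# With a huge alpha the weight is almost zero, so the prediction is
# (almost) a flat line at the mean of y.
flat_model = Ridge(alpha=1e6).fit(X, y)
print("mean of y:", y.mean())
```

The printed weights should shrink as *α* grows, and the intercept of the last model should sit very close to the mean of `y`.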

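Ridge Regression cost function:

$$
J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + \alpha \, \frac{1}{2} \sum_{i=1}^{n} w_i^2
$$

Here *n* is the number of features. If we define **w** as the vector of feature
weights, then the regularization term is *α* times
$\frac{1}{2}\left(\lVert \mathbf{w} \rVert_2\right)^2$,
where $\lVert \mathbf{w} \rVert_2$
represents the *l2* norm of the weight vector.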
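
As a worked example of this cost, the sketch below computes it by hand with NumPy: the plain MSE plus *α* times half the squared *l2* norm of the weights. The function names `mse` and `ridge_cost` and the toy numbers are hypothetical, chosen only for illustration; the bias term `b` is kept outside the penalty, which is the usual convention.

```python
import numpy as np

def mse(w, b, X, y):
    """Plain mean squared error of the linear model X @ w + b."""
    residuals = X @ w + b - y
    return float(np.mean(residuals ** 2))

def ridge_cost(w, b, X, y, alpha):
    """Training cost: MSE plus alpha times half the squared l2 norm of w.

    The bias term b is not included in the penalty.
    """
    return mse(w, b, X, y) + alpha * 0.5 * float(np.sum(w ** 2))

# Toy values, chosen only for illustration
X_toy = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y_toy = np.array([1.0, 2.0, 3.0])
w_toy = np.array([0.5, -0.5])
b_toy = 0.1

train_cost = ridge_cost(w_toy, b_toy, X_toy, y_toy, alpha=1.0)  # used while training
eval_cost = mse(w_toy, b_toy, X_toy, y_toy)                     # used for evaluation
print(train_cost, eval_cost)
```

The two values differ exactly by the penalty, which is why the regularized cost is only meaningful during training and the plain MSE is the one to report afterwards.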

Ridge Regression can be performed with `scikit-learn` using its `Ridge` class.
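
A minimal sketch of Ridge Regression in `scikit-learn` might look like the following. The synthetic dataset, the seed, and the choice `alpha=1.0` are assumptions for illustration, not values from a real experiment.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

# alpha is the regularization strength
ridge_reg = Ridge(alpha=1.0)
ridge_reg.fit(X, y)

# Predict for a new instance with x = 1.5
ridge_pred = ridge_reg.predict([[1.5]])
print(ridge_pred)
```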

## Lasso Regression

*Least Absolute Shrinkage and Selection Operator Regression* (simply called
*Lasso Regression*) is another regularized version of Linear Regression. Like
Ridge Regression, it adds a regularization term to the cost function, but it
uses the *l1* norm of the weight vector instead of half the square of the *l2*
norm.

Lasso Regression cost function:

$$
J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + \alpha \sum_{i=1}^{n} \lvert w_i \rvert
$$

An important characteristic of Lasso Regression is that **it tends to
completely eliminate the weights of the least important features** (i.e., it
sets them to zero).
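
The sketch below illustrates this behaviour on made-up data: ten features, of which only the first two actually influence the target. The dataset, the seed, and `alpha=0.2` are assumptions for illustration. Ridge is fitted alongside for comparison, since it shrinks weights without setting them exactly to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (an assumption for illustration):
# 10 features, but only features 0 and 1 affect y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(scale=0.5, size=200)

lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=0.2).fit(X, y)

# Count weights that are exactly zero in each model
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso weights:", np.round(lasso.coef_, 3))
print("Ridge weights:", np.round(ridge.coef_, 3))
print("exact zeros (Lasso, Ridge):", n_zero_lasso, n_zero_ridge)
```

On data like this, Lasso should zero out most or all of the eight uninformative weights while keeping the two informative ones, whereas the Ridge weights stay small but nonzero.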

Lasso Regression can be performed with `scikit-learn` using its `Lasso` class.
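
A minimal sketch of Lasso Regression in `scikit-learn`, again on a synthetic dataset; the data, the seed, and `alpha=0.1` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

# alpha is the regularization strength
lasso_reg = Lasso(alpha=0.1)
lasso_reg.fit(X, y)

# Predict for a new instance with x = 1.5
lasso_pred = lasso_reg.predict([[1.5]])
print(lasso_pred)
```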

## Elastic Net

*Elastic Net* is a middle ground between Ridge Regression and Lasso Regression.
Its regularization term is a simple mix of the Ridge and Lasso regularization
terms, and you can control the mix ratio *r*. When *r* = 0, Elastic Net is
equivalent to Ridge Regression; when *r* = 1, it is equivalent to Lasso
Regression.

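Elastic Net cost function:

$$
J(\mathbf{w}) = \mathrm{MSE}(\mathbf{w}) + r \alpha \sum_{i=1}^{n} \lvert w_i \rvert + \frac{1 - r}{2} \, \alpha \sum_{i=1}^{n} w_i^2
$$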

Elastic Net can be performed with `scikit-learn` using its `ElasticNet` class.
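
A minimal sketch of Elastic Net in `scikit-learn`, where the `l1_ratio` parameter plays the role of the mix ratio *r*. The synthetic data, the seed, and the values `alpha=0.1` and `l1_ratio=0.5` are assumptions for illustration. The last few lines check the *r* = 1 case against `Lasso` with the same `alpha`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Synthetic data (an assumption for illustration): y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X[:, 0] + rng.normal(size=100)

# l1_ratio corresponds to the mix ratio r
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)

# Predict for a new instance with x = 1.5
enet_pred = elastic_net.predict([[1.5]])
print(enet_pred)

# With l1_ratio=1.0 the penalty is pure l1, so the fit should match Lasso
enet_r1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso_reg = Lasso(alpha=0.1).fit(X, y)
same_as_lasso = bool(np.allclose(enet_r1.coef_, lasso_reg.coef_))
print("r = 1 matches Lasso:", same_as_lasso)
```

The *r* = 0 case is not compared against `Ridge` here because `scikit-learn` scales the error term differently in the two estimators, so the same `alpha` does not give identical weights.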

## Conclusion

So when should we use Linear Regression, Ridge Regression, Lasso Regression or Elastic Net?

It’s almost always preferable to have at least a little bit of regularization, so we should avoid plain Linear Regression. Ridge Regression is a good choice by default. However, if you suspect that only a few features are useful, you should choose Lasso Regression or Elastic Net, because they tend to completely eliminate the weights of the least important features. If the number of features is greater than the number of training instances or if several features are strongly correlated, Elastic Net is preferred over Lasso Regression since Lasso may behave erratically.

## Reference

- Aurélien Géron. 2017. “Chapter 4 Training Models”,
*Hands-On Machine Learning with Scikit-Learn & TensorFlow*, pp. 129-136.
- stevepb, “Cheese”,
*pixabay.com*. [Online]. Available: https://pixabay.com/photos/pawn-chess-pieces-strategy-chess-2430046/