Suppose you ask a complex question to thousands of random people, then aggregate their answers. In many cases you will find that this aggregated answer is better than an expert’s answer. Similarly, if you aggregate the predictions of a group of predictors, you will often get better predictions than with the best individual predictor. This technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method. In this blog I’ll talk about the following Ensemble Learning algorithms:
- Voting Classifiers
- Bagging and Pasting
- Random Forests
Voting Classifiers
Imagine that you have trained a few classifiers, each one achieving about 80% accuracy. You may have a Logistic Regression classifier, an SVM classifier, a Random Forest classifier, a K-Nearest Neighbors classifier, and perhaps a few more. A simple way to create a better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote approach is called hard voting.
The following code creates and trains a voting classifier in Scikit-Learn, composed of three diverse classifiers.
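Here is a minimal sketch of that code, assuming a toy dataset generated with make_moons (any classification training set split into X_train and y_train would work the same way):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Toy dataset, just to have something to train on
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC()

# Hard voting: predict the class that gets the most votes
voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)
```

Calling voting_clf.predict(X_test) then returns, for each instance, the class that received the most votes from the three underlying classifiers.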
If all classifiers are able to estimate class probabilities, then you can let Scikit-Learn predict the class with the highest class probability, averaged over all the individual classifiers. This is called soft voting. It often achieves higher performance than hard voting because it gives more weight to highly confident votes. You only need to replace voting='hard' with voting='soft' and ensure that all classifiers can estimate class probabilities. For the SVC classifier, we need to set its probability hyperparameter to True.
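For example, reusing the three classifiers from the sketch above, the soft-voting version looks roughly like this:

```python
# Soft voting: average predicted class probabilities instead of counting votes.
# SVC needs probability=True so it exposes predict_proba (adds some training cost).
log_clf = LogisticRegression()
rnd_clf = RandomForestClassifier()
svm_clf = SVC(probability=True)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='soft')
voting_clf.fit(X_train, y_train)
```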
Bagging and Pasting
Another approach is to use the same training algorithm for every predictor, but to train them on different random subsets of the training set. When sampling is performed with replacement, this method is called bagging. When sampling is performed without replacement, it’s called pasting.
Scikit-Learn offers a simple API for both bagging and pasting with the BaggingClassifier class. The following code trains an ensemble of 500 Decision Tree classifiers, each trained on 100 training instances randomly sampled from the training set with replacement.
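A sketch of that setup, assuming the same X_train and y_train as in the voting example:

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# 500 trees, each trained on 100 instances sampled with replacement (bagging);
# set bootstrap=False to sample without replacement (pasting) instead.
# n_jobs=-1 uses all available CPU cores for training and prediction.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
```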
The BaggingClassifier automatically performs soft voting if the base classifier can estimate class probabilities.
Random Forests
A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method, typically with max_samples set to the size of the training set. Instead of building a BaggingClassifier and passing it a DecisionTreeClassifier, you can instead use the RandomForestClassifier class, which is more convenient and optimized for Decision Trees. The following code trains a Random Forest classifier with 500 trees.
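A minimal sketch, again assuming the same training data (max_leaf_nodes=16 is just an example regularization setting, not a requirement):

```python
from sklearn.ensemble import RandomForestClassifier

# 500 trees, each limited to 16 leaf nodes, trained in parallel on all CPU cores
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)
```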
With a few exceptions, a RandomForestClassifier has all the hyperparameters of a DecisionTreeClassifier (to control how the trees are grown), plus all the hyperparameters of a BaggingClassifier to control the ensemble itself. The Random Forest algorithm introduces extra randomness when growing trees; instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features. This results in greater tree diversity, which trades a higher bias for a lower variance.
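To make the connection concrete, the following BaggingClassifier is roughly equivalent to the Random Forest above (a sketch adapted from Géron's chapter; splitter='random' approximates the per-split feature randomness):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Roughly equivalent to the RandomForestClassifier above: extra randomness at each
# split, plus bootstrap samples the size of the training set (max_samples=1.0).
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, n_jobs=-1)
bag_clf.fit(X_train, y_train)
```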
Conclusion
In this blog, I introduced 3 Ensemble Learning algorithms: Voting Classifiers, Bagging and Pasting, and Random Forests. I hope it’s useful for you.
Reference
- Aurélien Géron. 2017. “Chapter 7: Ensemble Learning and Random Forests.” Hands-On Machine Learning with Scikit-Learn & TensorFlow, pp. 183–193.
- yunje5054, “Ensemble music played saxophone”, pixabay.com. [Online]. Available: https://pixabay.com/photos/ensemble-music-played-saxophone-619258/