Suppose you ask a complex question to thousands of random people, then aggregate their answers. In many cases you will find that this aggregated answer is better than an expert’s answer. Similarly, if you aggregate the predictions of a group of predictors, you will often get better predictions than with the best individual predictor. This technique is called Ensemble Learning, an Ensemble Learning algorithm is called an Ensemble method. In this blog I’ll talk about following Ensemble Learning algorithms:
- Voting Classifiers
- Bagging and Pasting
- Random Forests
Imagine that you have trained a few classifiers, each one achieves about 80% accuracy. You may have a Logistic Regression classifier, an SVM classifier, a Random Forest classifier, a K-Nearest Neighbors classifier and perhaps a few more. A way to create a better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes.
The following codes create and train a voting classifier in Scikit-Learn, composed of 3 diverse classifiers:
If all classifiers are able to estimate class probabilities, then you can let
Scikit-Learn predict the class with the highest class probability, averaged
over all the individual classifiers. This is called soft voting. It often
achieves higher performance than hard voting because it gives more weight to
highly confident votes. You only need to replace
voting='soft' and ensure that all classifiers can estimate class
probabilities. For SVC classifier, we need to set its
Bagging and Pasting
Another approach is to use the same training algorithm for every predictor, but to train them on different random subsets of the training set. When sampling is performed with replacement, this method is called bagging. When sampling is performed without replacement, it’s called pasting.
Scikit-Learn offers a simple API for both bagging and pasting with the
BaggingClassifier class. The following codes train an ensemble of 500
Decision Tree classifiers, each train on 100 training instances randomly
sampled from the training set with replacement.
BaggingClassifier automatically performs soft voting if the base
classifier can estimate class probabilities.
A Random Forest is an ensemble of Decision Trees, generally trained via the
bagging method, typically with
max_samples set to the size of the training
set. Instead of building a
BaggingClassifier and passing it a
DecisionTreeClassifier, you can instead use the
class, which is more convenient and optimized for Decision Trees. The following
codes train a Random Forest classifier with 500 trees:
With a few exceptions, a
RandomForestClassifier has all the hyperparameters
DecisionTreeClassifier plus all the hyperparameters of a
BaggingClassifier to control the ensemble itself. The Random Forest algorithm
introduces extra randomness when growing trees; instead of searching for the
very best feature when splitting a mode, it searches for the best feature among
a random subset of features. This results in a greater tree diversity, which
trades a higher bias for a lower variance.
In this blog, I introduced 3 Ensemble Learning algorithms: Voting Classifiers, Bagging and Pasting, Random Forests. Hope it’s useful for you.
- Aurélien Géron. 2017. “Chapter 7 Ensemble Learning and Random Forests” Hands-On Machine Learning with Scikit-Learn & TensorFlow p 183-193
- yunje5054, “Ensemble music played saxophone”, pixabay.com. [Online]. Available: https://pixabay.com/photos/ensemble-music-played-saxophone-619258/