1. Introduction
In this tutorial, we’ll review Random Forests (RF) and Extremely Randomized Trees (ET): what they are, how they are structured, and how they differ.
2. Definitions
Random Forest and Extremely Randomized Trees belong to a class of algorithms known as ensemble learning algorithms. Ensemble learning algorithms utilize the power of many learning algorithms to execute a task. For example, in a classification task, an ensemble learning algorithm may aggregate the predictions from several different classifiers to make a final prediction.
This concept is based on the notion that using multiple learning algorithms can lead to a better final prediction. Next, let’s look at Random Forests and Extremely Randomized Trees in detail.
3. Random Forest
When we talk of Random Forests, we’re referring to learning algorithms consisting of multiple decision trees. A Random Forest constructs multiple decision trees (a forest) during training, each over a different subset of the training data. The idea is the same as with other ensembles: the combined results from several trees are likely to be better than those from a single tree.
Given a dataset with several features, an RF algorithm will sample subsets of observations with different features from the dataset. A decision tree is then constructed over each subset. This process of sampling subsets with replacement is known as bootstrapping.
Note that, when constructing each decision tree, RF chooses the best split at each node. This process is repeated on different subsets of the data, with varying features, until the specified number of trees has been constructed.
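To make the bootstrapping step concrete, here’s a minimal sketch in Python using NumPy. The toy dataset, the feature-subset size, and the variable names are illustrative assumptions, not part of any particular library’s API:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Toy dataset: 10 observations, 4 features (illustrative values only)
X = rng.normal(size=(10, 4))
y = rng.integers(0, 2, size=10)

n_samples, n_features = X.shape

# Bootstrapping: sample observations *with replacement* ...
bootstrap_rows = rng.choice(n_samples, size=n_samples, replace=True)

# ... and pick a random subset of features for this tree
feature_subset = rng.choice(n_features, size=2, replace=False)

X_subset = X[bootstrap_rows][:, feature_subset]
y_subset = y[bootstrap_rows]

# A single decision tree would now be fit on (X_subset, y_subset),
# choosing the best split at each node; the forest repeats this
# process for every tree it builds.
```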
After obtaining results from all the trees, the final prediction is obtained through majority voting for classification or averaging for regression. For instance, let’s consider an RF classification task with six trees. Suppose five of these trees predict class 0. By majority voting, the final class is assigned as 0.
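As a quick illustration of this aggregation step, the snippet below tallies the hypothetical predictions of six trees; the values are made up for the example:

```python
from collections import Counter

# Hypothetical predictions from six individual trees for one sample
tree_predictions = [0, 0, 1, 0, 0, 0]

# Classification: majority vote
final_class = Counter(tree_predictions).most_common(1)[0][0]
print(final_class)  # 0 -> five of the six trees voted for class 0

# Regression: the forest would average the trees' outputs instead
tree_outputs = [2.1, 1.9, 2.3, 2.0, 2.2, 1.8]
final_value = sum(tree_outputs) / len(tree_outputs)
```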
3.1. Advantages and Disadvantages
Random Forests are robust, working well on both regression and classification tasks. Additionally, RF algorithms work well with large datasets and different data types, such as numerical, binary and categorical.
However, complexity and computational cost grow with the number of trees, resulting in longer training times. In addition, the sampling of subsets may introduce some bias.
3.2. Applications
RF can be applied to almost any classification or regression task. However, common application areas are remote sensing, stock market prediction, fraud detection, sentiment analysis, and product recommendation.
4. Extremely Randomized Trees
Extremely Randomized Trees, also known as Extra Trees, construct multiple trees like RF algorithms, but over the entire dataset rather than bootstrapped subsets. During training, ET constructs each tree over every observation in the dataset, though with different subsets of features.
It is important to note that although bootstrapping is not part of ET’s original formulation, some implementations allow us to add it. Furthermore, when constructing each decision tree, the ET algorithm splits nodes at random rather than searching for the best split.
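In scikit-learn, for example, this behaviour is exposed through ExtraTreesClassifier: bootstrapping is off by default and can be switched on, while node splits are drawn at random. The dataset below is a synthetic placeholder:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# By default, each tree sees the whole dataset (bootstrap=False)
# and splits are chosen at random among candidate features
et = ExtraTreesClassifier(n_estimators=100, bootstrap=False, random_state=0)
et.fit(X, y)

# Bootstrapping can be added if desired
et_bootstrap = ExtraTreesClassifier(n_estimators=100, bootstrap=True, random_state=0)
et_bootstrap.fit(X, y)
```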
4.1. Advantages and Disadvantages
The main advantage of Extra Trees is a reduction in bias, which comes from using the entire dataset when constructing each tree. Different subsets of the data may introduce different biases into the results, so Extra Trees avoids this by training each tree on the full dataset.
Another advantage of Extra Trees is reduced variance. This is a result of the randomized splitting of nodes within the decision trees, which means the algorithm is not heavily influenced by particular features or patterns in the dataset.
4.2. Applications
Like Random Forests, we can apply Extra Trees to classification and regression tasks. In some instances, Extra Trees are also used for feature selection, where an Extra Trees classifier is used to pick the features that matter most.
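One common way to do this, assuming scikit-learn, is to rank features by the fitted model’s feature_importances_ and keep only the strongest ones, for example via SelectFromModel; the data here is again synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

et = ExtraTreesClassifier(n_estimators=100, random_state=0)
et.fit(X, y)

# Impurity-based importance of each of the 20 features
print(et.feature_importances_)

# Keep only the features whose importance exceeds the mean importance
selector = SelectFromModel(et, prefit=True)
X_selected = selector.transform(X)
print(X_selected.shape)  # fewer columns than the original X
```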
5. Differences and Similarities
RFs and ETs are similar in that they both construct multiple decision trees to use for the task at hand, whether classification or regression. However, subtle differences exist between the two.
Let’s look at these:
| Random Forest | Extremely Randomized Trees |
| --- | --- |
| Samples subsets through bootstrapping | Samples the entire dataset |
| Nodes are split using the best split | Nodes are split randomly |
| Medium variance | Low variance |
| Slower, since finding the best split at each node takes time | Faster, since node splits are random |
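To see these differences in practice, one option is to train both ensembles on the same synthetic dataset and compare accuracy and training time; the exact numbers will vary with the data and hardware, so this is only a sketch:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = [
    ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("Extra Trees", ExtraTreesClassifier(n_estimators=200, random_state=0)),
]

for name, model in models:
    start = time.perf_counter()
    scores = cross_val_score(model, X, y, cv=5)
    elapsed = time.perf_counter() - start
    print(f"{name}: accuracy={scores.mean():.3f}, time={elapsed:.2f}s")
```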
6. Conclusions
In this tutorial, we reviewed Random Forests and Extremely Randomized Trees. Random Forests build multiple decision trees over bootstrapped subsets of the data, whereas Extra Trees algorithms build multiple decision trees over the entire dataset. In addition, RF searches for the best split at each node, while ET chooses splits at random.
Most importantly, the choice of which one to use always depends on the dataset available and the task at hand.