What is a Decision Tree in Machine Learning?
A decision tree is a supervised learning technique that has a pre-defined target variable and is most often used in classification problems. The tree can be applied to either categorical or continuous input & output variables. A trained tree resembles a flow chart: each internal (non-leaf) node tests an attribute, each branch represents an outcome of that test, and each leaf (terminal) node holds a class label. The uppermost node in the tree is called the root node.
In the decision process, the sample (population) is split into two or more maximally homogeneous sub-populations, with each split chosen according to the most significant splitter or differentiator among the input variables.
The ultimate goal is to create a predictive model that can take observations about a sample (the branches) and make accurate conclusions about the sample’s target value (the leaves).
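The process above can be sketched with scikit-learn (assuming it is installed; the iris dataset and the `max_depth` setting are illustrative choices, not part of the original text):

```python
# A minimal sketch of training a classification tree with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# max_depth caps how many attribute tests lie between the root and a leaf.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X, y)

# Each prediction walks from the root, through attribute tests on the
# branches, down to a leaf that holds a class label.
label = clf.predict(X[:1])[0]
print(label, clf.tree_.node_count)
```

The fitted `tree_` attribute exposes the learned structure, so you can inspect how many internal and leaf nodes the splitting process produced.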
Types of Decision Trees:
Decision trees fall into two main families, defined by the type of target variable they predict:
- Classification trees – Used to predict the class to which the data sample belongs.
- Regression trees – Used when the outcome isn’t a class, but rather a real number.
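To make the contrast concrete, here is a minimal regression-tree sketch (toy sine-wave data and parameters are illustrative assumptions):

```python
# A regression tree predicts a real number rather than a class label.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel()  # a continuous target, not categories

reg = DecisionTreeRegressor(max_depth=4, random_state=0)
reg.fit(X, y)

# The leaf reached by an input stores the mean target value of the
# training samples that fell into it.
pred = reg.predict([[3.0]])[0]
```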
Some approaches construct multiple decision trees, or ensembles, to solve specific problems. A few common examples:
- Boosted trees – Built sequentially, with each new tree focusing on instances the previous trees modeled incorrectly. AdaBoost is a common example. This works for both regression and classification problems.
- Bootstrap aggregated (bagged) decision trees – Each tree is trained on a repeated random resample of the data, and the trees’ predictions are combined into a consensus prediction by vote or average.
- Random forests – A variant of bootstrap aggregating used to compensate for decision trees’ tendency to “overfit” the training data, by also restricting each split to a random subset of the input variables.
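The single-tree vs. ensemble trade-off can be sketched as follows (the breast-cancer dataset, split, and tree counts are illustrative assumptions, not prescribed by the text):

```python
# Comparing one decision tree with a random forest on held-out data.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A single, fully grown tree tends to overfit the training sample.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# 100 trees, each grown on a bootstrap resample with a random feature
# subset at every split, combined by majority vote.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), forest.score(X_te, y_te))
```

On most splits of data like this, the forest’s held-out accuracy matches or exceeds the single tree’s, which is the overfitting compensation the bullet describes.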