Decision Tree

What is a Decision Tree in Machine Learning?

A decision tree is a supervised learning technique with a pre-defined target variable, most often used in classification problems. It can be applied to both categorical and continuous input and output variables. The resulting model resembles a flow chart: each internal (non-leaf) node tests an attribute, each branch represents an outcome of that test, and each leaf (terminal) node holds a class label. The uppermost node in the tree is called the root node.

In the decision process, the sample (population) is split into two or more maximally homogeneous sub-populations, based on the most significant splitter, or differentiator, among the input variables.

The ultimate goal is to create a predictive model that follows observations about a sample (the branches) to an accurate conclusion about the sample’s target value (the leaves).
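
The flow-chart structure above can be sketched as nested if/else tests. The feature names, thresholds, and class labels here are hypothetical, chosen only to illustrate how a root node, an internal node, and leaf nodes fit together:

```python
# A tiny hand-built decision tree. Each `if` is a test node, each branch is
# an outcome of that test, and each `return` is a leaf holding a class label.
# Features and thresholds are illustrative, not learned from real data.

def classify(petal_length, petal_width):
    if petal_length < 2.5:          # root node: the most significant split
        return "setosa"             # leaf
    else:
        if petal_width < 1.8:       # internal node: a second test
            return "versicolor"     # leaf
        else:
            return "virginica"      # leaf

print(classify(1.4, 0.2))  # -> setosa
print(classify(4.5, 1.5))  # -> versicolor
```

A learned tree has the same shape; training simply chooses which attribute to test at each node and where to place the thresholds.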

Types of Decision Trees:

The two main families of decision trees are defined by function:

  • Classification trees – Used to predict the class to which a data sample belongs.
  • Regression trees – Used when the outcome isn’t a class, but rather a real number.
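
The two families differ mainly in how they score a candidate split. A common choice (an assumption here, not the only option) is Gini impurity for classification trees and variance reduction for regression trees; a minimal sketch:

```python
# Split-quality criteria for the two tree families. Function names are
# illustrative. Lower is better in both cases: a perfect split produces
# child nodes with zero impurity (classification) or zero variance (regression).

def gini(labels):
    """Gini impurity of a set of class labels (classification trees)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def variance(values):
    """Variance of real-valued targets (regression trees)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

print(gini(["yes", "yes", "yes"]))   # pure node -> 0.0
print(gini(["yes", "no"]))           # maximally mixed (2 classes) -> 0.5
print(variance([3.0, 3.0, 3.0]))     # identical targets -> 0.0
```

Training greedily picks, at each node, the attribute and threshold that most reduce the chosen criterion in the resulting sub-populations.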

Some approaches construct multiple decision trees, or ensembles, to solve specific problems. A few common examples: 

  • Boosted trees – Built sequentially, with each new tree trained on the instances the previous trees modeled incorrectly (for example, AdaBoost). Boosting works for both regression and classification problems.
  • Bootstrap aggregated decision trees – Used for classifying data that’s difficult to label, by repeatedly resampling the training data and building a consensus prediction.
  • Random forests – A variant of bootstrap aggregating used to compensate for “overfitting” of data.
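
The bootstrap-aggregation idea can be sketched in a few lines: draw resampled copies of the data, fit a weak learner (here a one-level "decision stump", a hypothetical stand-in for a full tree) to each copy, and combine them by majority vote. The dataset and learner below are illustrative only:

```python
import random

def fit_stump(data):
    """One-level tree: pick the threshold on x that best separates labels."""
    best = None
    for x, _ in data:
        left = [y for xi, y in data if xi <= x]
        right = [y for xi, y in data if xi > x]
        # misclassifications if each side predicts its majority label
        errs = min(left.count(0), left.count(1)) + min(right.count(0), right.count(1))
        if best is None or errs < best[0]:
            lmaj = max(set(left), key=left.count)
            rmaj = max(set(right), key=right.count) if right else lmaj
            best = (errs, x, lmaj, rmaj)
    _, thr, lmaj, rmaj = best
    return lambda x: lmaj if x <= thr else rmaj

def bagged_predict(stumps, x):
    """Consensus prediction: majority vote across the ensemble."""
    votes = [s(x) for s in stumps]
    return max(set(votes), key=votes.count)

random.seed(0)
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]  # (feature, label)
# each stump is trained on a bootstrap sample (drawn with replacement)
stumps = [fit_stump([random.choice(data) for _ in data]) for _ in range(25)]
print(bagged_predict(stumps, 2), bagged_predict(stumps, 7))
```

A random forest adds one more twist: besides resampling the rows, each tree also considers only a random subset of the input variables at each split, which further decorrelates the trees and counters overfitting.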