Tree Induction Algorithm

What is a Tree Induction Algorithm in Machine Learning?

A tree induction algorithm is a method for building a decision tree that does not use backpropagation; instead, the tree's decision points are determined in a top-down, recursive way. Sometimes referred to as "divide and conquer," this approach resembles a traditional flow chart: if Yes then do A, if No then do B. To produce a good tree model, one needs to determine the nodes (i.e., which feature in the data to use to make a decision), the number of splits, and the respective split thresholds.

A tree induction algorithm will typically use impurity to determine these factors. Impurity measures how well each node, with its corresponding splits and thresholds, separates the data. For instance, if a single node found a set of splits and thresholds that partitioned the data into a collection of "buckets" in which every data point belonged to the same class, we would have a model that perfectly classifies the data; each resulting bucket would be pure. Therefore we want to maximize the resulting purity after each node decision (equivalently, we wish to minimize the impurity).
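The idea above can be sketched in a few lines of Python. This is a minimal, illustrative example, not a production implementation: it uses the Gini index (one common impurity measure) and scores candidate thresholds on a single hypothetical feature.

```python
# Sketch: Gini impurity and split scoring for one feature (illustrative names).
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2). Zero for a perfectly pure bucket."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(values, labels, threshold):
    """Weighted impurity of the two buckets produced by splitting at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Toy data: one feature, two well-separated classes.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["A", "A", "A", "B", "B", "B"]

# Try every observed value as a threshold; keep the lowest-impurity split.
best = min(((split_impurity(xs, ys, t), t) for t in xs), key=lambda p: p[0])
print(best)  # (0.0, 3.0): splitting at 3.0 yields two perfectly pure buckets
```

A real tree induction algorithm repeats this search over every feature at every node, recursing into each bucket until the leaves are pure (or a stopping criterion is met).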

How are Tree Induction Algorithms Used?

Like all decision trees, this algorithm includes a root node, branches, and leaf nodes. Each internal node represents a test conducted on an input, the branches are the outcome of some test, and each leaf node contains the classification label. The uppermost node in the tree is the root node.
With tree induction, each branch also represents a possible course of action based on the outcome of its test, and the leaf node is the decision that will ultimately be made.
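The structure described above can be written out directly as nested conditionals. The sketch below hand-codes a tiny tree with iris-style feature names; the features and thresholds are purely illustrative, not learned from data.

```python
# Illustrative sketch: a decision tree as nested if/else tests.
# The root node tests one feature; each branch leads either to another
# internal test or to a leaf holding the classification label.

def predict(sample):
    # Root node: test the first feature.
    if sample["petal_length"] <= 2.5:
        return "setosa"            # leaf node
    else:
        # Internal node: a second test on another feature.
        if sample["petal_width"] <= 1.7:
            return "versicolor"    # leaf node
        else:
            return "virginica"     # leaf node

print(predict({"petal_length": 1.4, "petal_width": 0.2}))  # setosa
```

Each call to `predict` traces exactly one root-to-leaf path, which is why tree decisions are easy to explain: the path itself is the explanation.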

Induction trees often serve as sub-trees within a larger "forest" of decision trees. Sometimes these simplified trees are the byproduct of a more complex decision tree that has been "pruned" of anomalies and outliers in the training data, or of a larger tree that has been optimized to reduce its number of leaves, i.e. its "cost complexity." Either approach leaves the tree with a straightforward decision path and only a few options available at any given layer.
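One simple form of pruning can be sketched as follows. The tree encoding here is hypothetical (a leaf is a bare label; an internal node is a `(test, left, right)` tuple): a subtree whose leaves all predict the same class is collapsed into a single leaf, reducing the leaf count and hence the cost-complexity penalty.

```python
# Illustrative sketch: collapsing redundant subtrees to reduce leaf count.
# Encoding (hypothetical): leaf = label string, internal node = (test, left, right).

def leaves(tree):
    """Collect the labels of all leaves under `tree`."""
    if not isinstance(tree, tuple):
        return [tree]
    _, left, right = tree
    return leaves(left) + leaves(right)

def prune(tree):
    """Bottom-up: replace any subtree whose leaves all agree with one leaf."""
    if not isinstance(tree, tuple):
        return tree
    test, left, right = tree
    left, right = prune(left), prune(right)
    # If every leaf below this node predicts the same class, the test adds
    # complexity without changing any decision, so replace it with a leaf.
    labels = set(leaves((test, left, right)))
    if len(labels) == 1:
        return labels.pop()
    return (test, left, right)

tree = ("x <= 2", ("y <= 5", "A", "A"), "B")
pruned = prune(tree)
print(pruned)  # ('x <= 2', 'A', 'B') -- one fewer internal node, same predictions
```

Full cost-complexity pruning is more involved (it trades training error against the number of leaves via a penalty term), but the effect is the same: a smaller tree with a simpler decision path.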