Efficient non-greedy optimization of decision trees

11/12/2015
by Mohammad Norouzi et al.

Decision trees and randomized forests are widely used in computer vision and machine learning. Standard algorithms for decision tree induction optimize the split functions one node at a time according to some splitting criterion. This greedy procedure often leads to suboptimal trees. In this paper, we present an algorithm for optimizing the split functions at all levels of the tree jointly with the leaf parameters, based on a global objective. We show that the problem of finding optimal linear-combination (oblique) splits for decision trees is related to structured prediction with latent variables, and we formulate a convex-concave upper bound on the tree's empirical loss. The run-time of computing the gradient of the proposed surrogate objective with respect to each training exemplar is quadratic in the tree depth, and thus training deep trees is feasible. The use of stochastic gradient descent for optimization enables effective training with large datasets. Experiments on several classification benchmarks demonstrate that the resulting non-greedy decision trees outperform greedy decision tree baselines.
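To make the setup concrete, here is a minimal NumPy sketch of the general idea: an oblique tree whose linear split weights and leaf parameters are trained jointly (non-greedily) with stochastic gradient descent on a differentiable surrogate. The sigmoid soft-routing and cross-entropy loss used here are a generic stand-in, not the paper's convex-concave bound, and all names (W, Theta, leaf_probs, etc.) are illustrative assumptions rather than the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# A depth-2 oblique tree: 3 internal nodes (each a linear split w^T x) and
# 4 leaves (each a class-score vector theta). Soft routing below is a generic
# differentiable surrogate used only to illustrate joint SGD training of all
# splits and leaves; it is NOT the paper's convex-concave upper bound.
D, K = 2, 3                              # input dim, number of classes
W = 0.1 * rng.standard_normal((3, D))    # split weights for nodes 0, 1, 2
Theta = np.zeros((4, K))                 # class scores for leaves 0..3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def leaf_probs(x):
    """Probability of reaching each of the 4 leaves under soft routing."""
    g = sigmoid(W @ x)                   # go-right probability at each node
    return np.array([
        (1 - g[0]) * (1 - g[1]),         # left, left
        (1 - g[0]) * g[1],               # left, right
        g[0] * (1 - g[2]),               # right, left
        g[0] * g[2],                     # right, right
    ])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy data: three Gaussian blobs, one per class.
X = np.vstack([rng.normal(m, 0.3, size=(100, D)) for m in ([0, 0], [2, 0], [1, 2])])
y = np.repeat([0, 1, 2], 100)

lr = 0.1
for epoch in range(30):
    for i in rng.permutation(len(X)):
        x, t = X[i], y[i]
        p_leaf = leaf_probs(x)                   # (4,)
        scores = p_leaf @ Theta                  # expected class scores
        p_cls = softmax(scores)

        # Cross-entropy gradient w.r.t. scores, then back to leaves and splits.
        d_scores = p_cls.copy()
        d_scores[t] -= 1.0
        d_pleaf = Theta @ d_scores               # dL/d p_leaf (uses old Theta)
        Theta -= lr * np.outer(p_leaf, d_scores)

        # Backpropagate through the routing probabilities to the split weights.
        g = sigmoid(W @ x)
        dg = np.zeros(3)
        dg[0] = (-(1 - g[1]) * d_pleaf[0] - g[1] * d_pleaf[1]
                 + (1 - g[2]) * d_pleaf[2] + g[2] * d_pleaf[3])
        dg[1] = (1 - g[0]) * (d_pleaf[1] - d_pleaf[0])
        dg[2] = g[0] * (d_pleaf[3] - d_pleaf[2])
        W -= lr * (dg * g * (1 - g))[:, None] * x[None, :]

acc = np.mean([np.argmax(leaf_probs(x) @ Theta) == t for x, t in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```

The soft-routing stand-in simply keeps the sketch differentiable end to end; the paper instead bounds the empirical loss of a tree with hard oblique splits, which is what makes the per-exemplar gradient cost scale quadratically with depth rather than requiring node-by-node greedy search.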


Related research

06/19/2015 - CO2 Forest: Improved Random Forest by Continuous Optimization of Oblique Splits
05/05/2023 - Learning Decision Trees with Gradient Descent
12/19/2017 - A Faster Drop-in Implementation for Leaf-wise Exact Greedy Induction of Decision Tree Using Pre-sorted Deque
04/12/2016 - Confidence Decision Trees via Online and Active Learning for Streaming (BIG) Data
02/27/2019 - Neural Packet Classification
12/29/2020 - Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity
10/19/2021 - Optimal randomized classification trees
