Scalable Optimal Multiway-Split Decision Trees with Constraints

02/14/2023
by Shivaram Subramanian, et al.

There has been a surge of interest in recent years in learning optimal decision trees using mixed-integer programs (MIP), as heuristic-based methods do not guarantee optimality and struggle to incorporate constraints that are critical for many practical applications. However, existing MIP methods that build on an arc-based formulation do not scale well, as the number of binary variables is on the order of 𝒪(2^d N), where d is the depth of the tree and N is the size of the dataset. Moreover, they can only handle sample-level constraints and linear metrics. In this paper, we propose a novel path-based MIP formulation in which the number of decision variables is independent of N. We present a scalable column generation framework to solve the MIP optimally. Our framework produces a multiway-split tree, which is more interpretable than the typical binary-split tree due to its shorter rules. Our method can handle nonlinear metrics such as the F1 score and incorporate a broader class of constraints. We demonstrate its efficacy with extensive experiments. We present results on datasets containing up to 1,008,372 samples, whereas existing MIP-based decision tree models do not scale well beyond a few thousand data points. We report superior or competitive results compared to state-of-the-art MIP-based methods, with up to a 24x reduction in runtime.
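To make the scaling argument concrete, the following is a minimal sketch of how the 𝒪(2^d N) term grows with tree depth and dataset size. The function name `arc_based_binary_vars` is hypothetical, and the count drops all constant factors, so it is an order-of-magnitude illustration rather than the exact variable count of any particular arc-based formulation.

```python
def arc_based_binary_vars(depth: int, n_samples: int) -> int:
    """Order-of-magnitude estimate 2^d * N (constant factors ignored).

    Assumption: this only mirrors the O(2^d N) growth stated in the
    abstract; it is not the exact count of any specific formulation.
    """
    return (2 ** depth) * n_samples


# Compare a dataset of a few thousand points with the largest one
# mentioned in the abstract (1,008,372 samples).
for n in (1_000, 1_008_372):
    for d in (2, 3, 4):
        count = arc_based_binary_vars(d, n)
        print(f"N={n:>9,}  d={d}:  ~{count:,} binary variables")
```

Under this rough estimate, even a shallow tree on the largest dataset implies millions of binary variables, which is the motivation for a formulation whose number of decision variables does not depend on N.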


