Optimization of Tree Ensembles

05/30/2017
by Velibor V. Mišić, et al.

Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality, and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that it can efficiently solve large-scale instances to near or full optimality and that it outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance against novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
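To make the problem statement concrete, here is a minimal Python sketch of tree ensemble optimization on a toy instance. It is not the mixed-integer formulation or either of the large-scale methods described in the abstract. It uses plain exhaustive enumeration instead, which is exact because each tree is piecewise constant between its split thresholds. The tuple-based tree encoding, the function names (`predict_tree`, `predict_ensemble`, `collect_thresholds`, `optimize_ensemble`), and the two example trees are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical encoding (an assumption for this sketch):
# a tree is either a leaf (a float prediction) or a split
# (feature_index, threshold, left_subtree, right_subtree),
# where the left branch is taken when x[feature_index] <= threshold.

def predict_tree(tree, x):
    """Walk from the root to a leaf and return the leaf's prediction."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree

def predict_ensemble(trees, weights, x):
    """Weighted sum of the individual tree predictions."""
    return sum(w * predict_tree(t, x) for t, w in zip(trees, weights))

def collect_thresholds(tree, out):
    """Record every split threshold, grouped by feature index."""
    if isinstance(tree, tuple):
        feature, threshold, left, right = tree
        out.setdefault(feature, set()).add(threshold)
        collect_thresholds(left, out)
        collect_thresholds(right, out)

def optimize_ensemble(trees, weights, n_features):
    """Maximize the ensemble prediction by brute-force enumeration.

    The prediction only changes when a feature crosses a threshold, so one
    representative value per interval between thresholds is enough.
    """
    thresholds = {}
    for tree in trees:
        collect_thresholds(tree, thresholds)

    candidates = []
    for f in range(n_features):
        cuts = sorted(thresholds.get(f, ()))
        # Each cut represents the interval ending at it (the "<=" side);
        # one extra point represents everything above the largest cut.
        reps = cuts + [cuts[-1] + 1.0] if cuts else [0.0]
        candidates.append(reps)

    best_x, best_val = None, float("-inf")
    for x in product(*candidates):
        val = predict_ensemble(trees, weights, x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val

# Toy ensemble of two depth-2 trees over two controllable features.
tree_a = (0, 2.0, 1.0, (1, 5.0, 4.0, 3.0))
tree_b = (1, 3.0, (0, 4.0, 2.0, 6.0), 0.0)
best_x, best_val = optimize_ensemble([tree_a, tree_b], [0.5, 0.5], n_features=2)
print(best_x, best_val)
```

The number of candidate points is the product of the per-feature interval counts, so this enumeration grows exponentially with the number of features and is only workable for very small ensembles. That growth is one way to see why a structured formulation, such as the mixed-integer one the abstract describes, is needed for large-scale instances.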


research
01/13/2020

Trees, forests, and impurity-based variable importance

Tree ensemble methods such as random forests [Breiman, 2001] are very po...
research
02/28/2023

Tightness of prescriptive tree-based mixed-integer optimization formulations

We focus on modeling the relationship between an input feature vector an...
research
09/25/2015

Evasion and Hardening of Tree Ensemble Classifiers

Classifier evasion consists in finding for a given instance x the neares...
research
07/08/2022

On data-driven chance constraint learning for mixed-integer optimization problems

When dealing with real-world optimization problems, decision-makers usua...
research
02/15/2023

Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

The interpretability of models has become a crucial issue in Machine Lea...
research
11/21/2019

JANOS: An Integrated Predictive and Prescriptive Modeling Framework

Business research practice is witnessing a surge in the integration of p...
research
05/18/2019

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

Tree ensembles, such as random forests and AdaBoost, are ubiquitous mach...
