Unboxing Tree Ensembles for interpretability: a hierarchical visualization tool and a multivariate optimal re-built tree

02/15/2023
by Giulia Di Teodoro, et al.

The interpretability of models has become a crucial issue in Machine Learning because of the growing impact of algorithmic decisions on real-world applications. Tree ensemble methods, such as Random Forests or XGBoost, are powerful learning tools for classification tasks. However, while combining multiple trees may provide higher prediction quality than a single tree, it sacrifices interpretability, resulting in "black-box" models. In light of this, we aim to develop an interpretable representation of a tree-ensemble model that can provide valuable insights into its behavior. First, given a target tree-ensemble model, we develop a hierarchical visualization tool based on a heatmap representation of the forest's feature use, taking both the frequency with which a feature is selected and the level at which it is selected as indicators of importance. Next, we propose a mixed-integer linear programming (MILP) formulation for constructing a single optimal multivariate tree that accurately mimics the target model's predictions. The goal is to provide an interpretable surrogate model based on oblique hyperplane splits, which uses only the most relevant features according to the forest's importance indicators defined above. The MILP model includes a penalty on feature selection, based on each feature's frequency in the forest, to further induce sparsity in the splits. The natural formulation has been strengthened to improve the computational performance of mixed-integer software. Computational experiments are carried out on benchmark datasets from the UCI repository using a state-of-the-art off-the-shelf solver. Results show that the proposed model is effective in yielding a shallow interpretable tree that approximates the tree-ensemble decision function.
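As a rough illustration of the feature-use heatmap idea, the sketch below counts how often each feature is chosen for a split at each depth across a set of trees. It is not the authors' tool: the function name `feature_level_counts`, the raw-count indicator, and the toy trees are assumptions made for this example. The input format mirrors the `children_left`, `children_right`, and `feature` arrays that scikit-learn exposes on a fitted tree's `tree_` attribute, where a child index of -1 marks a leaf.

```python
from collections import deque


def feature_level_counts(trees, n_features, max_depth):
    """Count how often each feature is used for a split at each depth.

    `trees` is an iterable of (children_left, children_right, feature)
    triples. A child index of -1 marks a leaf node.
    Returns a max_depth x n_features list of lists (rows are depths).
    """
    counts = [[0] * n_features for _ in range(max_depth)]
    for left, right, feature in trees:
        queue = deque([(0, 0)])  # (node id, depth), starting at the root
        while queue:
            node, depth = queue.popleft()
            if left[node] == -1 or depth >= max_depth:
                continue  # leaf, or deeper than the levels we track
            counts[depth][feature[node]] += 1
            queue.append((left[node], depth + 1))
            queue.append((right[node], depth + 1))
    return counts


# Two tiny hand-written trees over 3 features (hypothetical data).
# tree_a: a single split on feature 0.
tree_a = ([1, -1, -1], [2, -1, -1], [0, -2, -2])
# tree_b: root splits on feature 0, its left child splits on feature 2.
tree_b = ([1, 3, -1, -1, -1], [2, 4, -1, -1, -1], [0, 2, -2, -2, -2])

heat = feature_level_counts([tree_a, tree_b], n_features=3, max_depth=2)
for depth, row in enumerate(heat):
    print(f"depth {depth}: {row}")
```

For a fitted scikit-learn forest, the triples could be collected from each tree in `forest.estimators_` via `est.tree_.children_left`, `est.tree_.children_right`, and `est.tree_.feature`. The resulting matrix can then be passed to any heatmap plotting routine. How frequency and level are combined into a single importance indicator is left open here, since the abstract does not specify it.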

research
10/19/2022

Margin Optimal Classification Trees

In recent years there has been growing attention to interpretable machin...
research
02/22/2022

On Uncertainty Estimation by Tree-based Surrogate Models in Sequential Model-based Optimization

Sequential model-based optimization sequentially selects a candidate poi...
research
06/08/2015

Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests

It is becoming increasingly important for machine learning methods to ma...
research
11/20/2019

LionForests: Local Interpretation of Random Forests through Path Selection

Towards a future where machine learning systems will integrate into ever...
research
02/22/2022

Transition Matrix Representation of Trees with Transposed Convolutions

How can we effectively find the best structures in tree models? Tree mod...
research
05/30/2017

Optimization of Tree Ensembles

Tree ensemble models such as random forests and boosted trees are among ...
research
05/18/2019

Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

Tree ensembles, such as random forests and AdaBoost, are ubiquitous mach...