Individualized and Global Feature Attributions for Gradient Boosted Trees in the Presence of ℓ_2 Regularization

11/08/2022
by Qingyao Sun, et al.

While ℓ_2 regularization is widely used in training gradient boosted trees, popular individualized feature attribution methods for trees such as Saabas and TreeSHAP overlook the training procedure. We propose Prediction Decomposition Attribution (PreDecomp), a novel individualized feature attribution for gradient boosted trees trained with ℓ_2 regularization. Theoretical analysis shows that the inner product between PreDecomp and labels on in-sample data is essentially the total gain of a tree, and that PreDecomp can faithfully recover additive models in the population case when features are independent. Inspired by the connection between PreDecomp and total gain, we also propose TreeInner, a family of debiased global feature attributions defined in terms of the inner product between any individualized feature attribution and labels on out-sample data for each tree. Numerical experiments on a simulated dataset and a genomic ChIP dataset show that TreeInner has state-of-the-art feature selection performance. Code reproducing the experiments is available at https://github.com/nalzok/TreeInner.
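
The inner-product idea behind TreeInner can be illustrated with a minimal sketch. It is not the authors' implementation (see the repository above for that). It assumes the per-tree individualized attributions on held-out data have already been computed by some method (PreDecomp, Saabas, or TreeSHAP) and stacked into an array. The function name `inner_product_importance` and the array layout are hypothetical choices made for illustration, as is summing the per-tree scores into one global score.

```python
import numpy as np


def inner_product_importance(attributions, labels):
    """Aggregate per-tree inner products between attributions and labels.

    attributions: array of shape (n_trees, n_samples, n_features) holding an
        individualized attribution per tree, sample, and feature
        (assumed precomputed on out-sample data).
    labels: array of shape (n_samples,) with the out-sample labels.

    Returns an array of shape (n_features,) with one global score per feature.
    """
    attributions = np.asarray(attributions, dtype=float)
    labels = np.asarray(labels, dtype=float)
    if attributions.ndim != 3 or attributions.shape[1] != labels.shape[0]:
        raise ValueError("expected (n_trees, n_samples, n_features) and (n_samples,)")
    # Inner product over samples for every tree and feature: (n_trees, n_features).
    per_tree = np.einsum("tnk,n->tk", attributions, labels)
    # Sum the per-tree scores (an illustrative aggregation choice).
    return per_tree.sum(axis=0)


# Toy example: 2 trees, 2 samples, 2 features.
toy_attributions = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
toy_labels = [1.0, 2.0]
scores = inner_product_importance(toy_attributions, toy_labels)
```

Evaluating the inner product on out-sample rather than in-sample data is what the abstract credits with the debiasing effect; the sketch leaves the data split to the caller.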
