Optimal Sparse Recovery with Decision Stumps

03/08/2023
by Kiarash Banihashem, et al.

Decision trees are widely used for their low computational cost, good predictive performance, and ability to assess the importance of features. Although they are often used in practice for feature selection, the theoretical guarantees of these methods are not well understood. Here we obtain a tight finite-sample bound for the feature selection problem in linear regression using single-depth decision trees. We examine the statistical properties of these "decision stumps" for the recovery of the s active features out of p total features, where s ≪ p. Our analysis yields a tight O(s log p) sample-complexity guarantee for high-dimensional sparse systems, matching the finite-sample bound attained by the Lasso and improving upon previous bounds for both the median and optimal splitting criteria. Our results extend to the non-linear regime as well as to arbitrary sub-Gaussian distributions, demonstrating that tree-based methods attain strong feature selection properties under a wide variety of settings and further shedding light on the success of these methods in practice. As a byproduct of our analysis, we show that recovery can be provably guaranteed even when the number of active features s is unknown. We further validate our theoretical results and proof methodology with computational experiments.
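To make the setting concrete, here is a minimal sketch (not the paper's algorithm, just an illustration of the idea) of feature screening with median-split decision stumps: each feature is scored by the variance reduction of a single stump fit on that feature alone, and the top-s features are selected. The data-generating model, sample sizes, and the median splitting rule below are illustrative choices.

```python
import numpy as np

def stump_scores(X, y):
    """Score each feature by the variance reduction of a
    median-split decision stump fit on that feature alone."""
    n, p = X.shape
    base = np.var(y)
    scores = np.empty(p)
    for j in range(p):
        thresh = np.median(X[:, j])
        left = X[:, j] <= thresh
        right = ~left
        # weighted within-node variance after the split
        impurity = (left.sum() * np.var(y[left]) +
                    right.sum() * np.var(y[right])) / n
        scores[j] = base - impurity
    return scores

# Toy sparse linear model: s = 3 active features out of p = 50.
rng = np.random.default_rng(0)
n, p, s = 2000, 50, 3
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = [2.0, -1.5, 1.0]   # only the first s coefficients are nonzero
y = X @ beta + 0.1 * rng.standard_normal(n)

# Rank features by stump score and keep the top s.
top = np.argsort(stump_scores(X, y))[::-1][:s]
print(sorted(top.tolist()))
```

With n on the order of s log p (up to constants), the active features separate cleanly from the inactive ones in the score ranking, which is the recovery phenomenon the bound formalizes.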


