ControlBurn: Nonlinear Feature Selection with Sparse Tree Ensembles

07/08/2022
by   Brian Liu, et al.

ControlBurn is a Python package for constructing feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. The algorithms in this package first build large tree ensembles that prioritize basis functions with few features, and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their impact on predictions. ControlBurn thus offers the accuracy and flexibility of tree-ensemble models together with the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package supports acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and ℓ_0-based regularizers. The package is user-friendly and open-source: its documentation and source code are available at https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.
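The core idea described above — grow many shallow, feature-sparse trees, then pick a sparse subset of them with a lasso penalty weighted by each tree's feature count — can be sketched with scikit-learn primitives. This is an illustration of the method under stated assumptions, not the ControlBurn package's actual API; the ensemble type, penalty strength, and thresholds below are illustrative choices.

```python
# Sketch of the ControlBurn idea (not the package's API):
# 1) grow many shallow trees, so each basis function uses few features;
# 2) select a sparse subset via a weighted lasso, where tree i's penalty
#    weight w_i is the number of distinct features it splits on.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=500, n_features=20,
                       n_informative=5, random_state=0)

# Step 1: a forest of shallow trees (each tree touches few features).
forest = RandomForestRegressor(n_estimators=100, max_depth=3,
                               random_state=0).fit(X, y)

# Basis matrix: one column of predictions per tree.
B = np.column_stack([t.predict(X) for t in forest.estimators_])

# w_i = number of distinct features used by tree i
# (internal nodes have feature index >= 0; leaves are negative).
weights = np.array([len(set(t.tree_.feature[t.tree_.feature >= 0]))
                    for t in forest.estimators_])

# Step 2: weighted lasso via column rescaling — a plain lasso on
# B / w solves  min ||y - B c||^2 + alpha * sum_i w_i |c_i|.
lasso = Lasso(alpha=1.0).fit(B / weights, y)

# Selected features: union of features in trees kept by the lasso.
kept = [set(t.tree_.feature[t.tree_.feature >= 0])
        for t, c in zip(forest.estimators_, lasso.coef_)
        if abs(c) > 1e-8]
selected = sorted(set().union(*kept))
print("selected features:", selected)
```

Rescaling columns by their penalty weights is a standard way to reduce a weighted lasso to an ordinary one; the real package solves this selection problem directly and adds the warm-start path, sketching, and group/cost extensions described above.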

