StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables

07/08/2020
by Brian Lucena, et al.

Gradient boosting methods based on Structured Categorical Decision Trees (SCDT) have been demonstrated to outperform numerical and one-hot encodings on problems where the categorical variable has a known underlying structure. However, the enumeration procedure in SCDT is infeasible except for categorical variables of low or moderate cardinality. We propose and implement two methods that overcome these computational obstacles and efficiently perform gradient boosting on complex structured categorical variables. The resulting package, called StructureBoost, is shown to outperform established packages such as CatBoost and LightGBM on problems with categorical predictors that contain sophisticated structure. Moreover, we demonstrate that StructureBoost can make accurate predictions on unseen categorical values because it knows the underlying structure.
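To give a feel for why exhaustive enumeration becomes costly, here is a minimal sketch under stated assumptions. It assumes the structure is given as an undirected graph over the categorical values and that a split is allowable only when both sides are connected in that graph; the abstract does not spell out either detail. The function names (`is_connected`, `enumerate_connected_splits`, `path_graph`, `cycle_graph`) are hypothetical and are not part of the StructureBoost API. The brute-force search below inspects on the order of 2^(n-1) candidate subsets for n values, which is the kind of growth that makes enumeration impractical at high cardinality.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if `nodes` is non-empty and connected within the graph."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor in nodes and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def enumerate_connected_splits(adjacency):
    """Brute-force every two-way split whose sides are both connected.

    Assumption (illustrative only): `adjacency` maps each categorical value
    to its neighbors in the structure graph.
    """
    values = sorted(adjacency)
    anchor, rest = values[0], values[1:]
    splits = []
    # Pin the first value to the left side so each unordered split appears once.
    # `size` stops before len(rest) so the right side is never empty.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            left = {anchor, *extra}
            right = set(values) - left
            if is_connected(left, adjacency) and is_connected(right, adjacency):
                splits.append((frozenset(left), frozenset(right)))
    return splits


def path_graph(n):
    """Values 0..n-1 arranged in a line (e.g. ordinal categories)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def cycle_graph(n):
    """Values 0..n-1 arranged in a ring (e.g. months of the year)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


if __name__ == "__main__":
    print(len(enumerate_connected_splits(path_graph(4))))   # 3
    print(len(enumerate_connected_splits(cycle_graph(4))))  # 6
```

On a line or a ring the number of allowable splits stays small, but the search still visits every candidate subset, and on densely connected structures the number of allowable splits itself grows quickly.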


research
04/15/2020

Exploiting Categorical Structure Using Tree-Based Methods

Standard methods of using categorical variables as predictors either end...
research
11/03/2016

Categorical Reparameterization with Gumbel-Softmax

Categorical variables are a natural choice for representing discrete str...
research
10/24/2018

CatBoost: gradient boosting with categorical features support

In this paper we present CatBoost, a new open-sourced gradient boosting ...
research
09/27/2020

A grammar of graphics framework for generalized parallel coordinate plots

Parallel coordinate plots (PCP) are a useful tool in exploratory data an...
research
07/05/2023

A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

High-cardinality categorical variables are variables for which the numbe...
research
04/08/2021

A global method for mixed categorical optimization with catalogs

In this article, we propose an algorithmic framework for globally solvin...
research
12/22/2021

Evaluating categorical encoding methods on a real credit card fraud detection database

Correctly dealing with categorical data in a supervised learning context...
