A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

07/05/2023
by Fabio Sigrist, et al.

High-cardinality categorical variables are variables for which the number of levels is large relative to the sample size of the data set; in other words, there are few data points per level. Machine learning methods can have difficulties with such variables. In this article, we empirically compare several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, as well as linear mixed effects models, using multiple tabular data sets with high-cardinality categorical variables. We find that, first, machine learning models with random effects have higher prediction accuracy than their classical counterparts without random effects and, second, tree-boosting with random effects outperforms deep neural networks with random effects.
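
To make the definition above concrete, the following minimal sketch counts the levels of a categorical variable and the average number of data points per level. The function name `cardinality_summary` and the toy `customer_id` data are hypothetical and used only for illustration; they do not come from the article.

```python
from collections import Counter


def cardinality_summary(values):
    """Summarize how many observations each level of a categorical variable has.

    Hypothetical helper for illustration only; not taken from the article.
    """
    if not values:
        raise ValueError("values must contain at least one observation")
    counts = Counter(values)      # number of observations per level
    n_samples = len(values)
    n_levels = len(counts)
    return {
        "n_samples": n_samples,
        "n_levels": n_levels,
        # Few data points per level indicates high cardinality.
        "mean_per_level": n_samples / n_levels,
        # Number of levels relative to the sample size.
        "levels_to_samples": n_levels / n_samples,
    }


# Assumed toy data: 8 observations spread over 6 customer IDs.
customer_id = ["c1", "c2", "c2", "c3", "c4", "c5", "c6", "c6"]
summary = cardinality_summary(customer_id)
print(summary)
```

In this toy example the variable has 6 levels for 8 observations, so most levels are observed only once, which is the situation the abstract describes as high cardinality.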

research
01/30/2023

Machine Learning with High-Cardinality Categorical Features in Actuarial Applications

High-cardinality categorical features are pervasive in actuarial data (e...
research
04/01/2021

Regularized target encoding outperforms traditional methods in supervised machine learning with high cardinality features

Because most machine learning (ML) algorithms are designed for numerical...
research
02/10/2023

Predicting the cardinality of a reduced Gröbner basis

We use ansatz neural network models to predict key metrics of complexity...
research
05/19/2021

Latent Gaussian Model Boosting

Latent Gaussian models and boosting are widely used techniques in statis...
research
10/16/2012

Fast Exact Inference for Recursive Cardinality Models

Cardinality potentials are a generally useful class of high order potent...
research
07/08/2020

StructureBoost: Efficient Gradient Boosting for Structured Categorical Variables

Gradient boosting methods based on Structured Categorical Decision Trees...
research
06/15/2021

CatBoost model with synthetic features in application to loan risk assessment of small businesses

Loan risk for small businesses has long been a complex problem worthy of...
