Binarsity: a penalization for one-hot encoded features

03/24/2017
by Mokhtar Z. Alaya, et al.

This paper addresses large-scale linear supervised learning in settings where a large number of continuous features are available. We propose to combine the well-known trick of one-hot encoding of continuous features with a new penalization called binarsity. Within each group of binary features produced by the one-hot encoding of a single raw continuous feature, this penalization applies total-variation regularization together with an extra linear constraint that avoids collinearity within the group. We establish non-asymptotic oracle inequalities for generalized linear models, and numerical experiments illustrate the good performance of our approach on several datasets. Notably, the numerical complexity of our method is comparable to that of standard ℓ_1 penalization.
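As a rough illustration of the two ingredients named in the abstract, the sketch below one-hot encodes a continuous feature into bins and evaluates a within-group total-variation penalty. It is a minimal sketch under stated assumptions, not the authors' implementation: the quantile binning rule, the function names, the optional per-group weights, and the specific sum-to-zero form of the linear constraint are all assumptions made here, since the abstract only says that "an extra linear constraint" is used.

```python
import numpy as np


def one_hot_bins(x, n_bins):
    """One-hot encode a continuous feature into n_bins bins.

    Assumption: bins are delimited by evenly spaced empirical quantiles.
    """
    # Interior cut points (n_bins - 1 of them).
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Bin index of each sample, in 0..n_bins-1.
    idx = np.searchsorted(cuts, x, side="right")
    encoded = np.zeros((x.shape[0], n_bins))
    encoded[np.arange(x.shape[0]), idx] = 1.0
    return encoded


def group_total_variation(theta_groups, weights=None):
    """Sum over groups of the total variation of each coefficient block.

    theta_groups: list of 1-D arrays, one block per raw continuous feature.
    weights: optional per-group weights (hypothetical; defaults to 1).
    """
    total = 0.0
    for j, theta in enumerate(theta_groups):
        w = 1.0 if weights is None else weights[j]
        # Total variation: sum of absolute differences of consecutive coefficients.
        total += w * np.sum(np.abs(np.diff(theta)))
    return total


def satisfies_sum_zero(theta_groups, tol=1e-10):
    """Check an assumed linear constraint: each block sums to zero."""
    return all(abs(np.sum(theta)) <= tol for theta in theta_groups)


x = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
B = one_hot_bins(x, n_bins=3)          # each row has exactly one 1
theta = [np.array([1.0, 1.0, -2.0])]   # one block for the single raw feature
penalty = group_total_variation(theta)
feasible = satisfies_sum_zero(theta)
```

Each row of a one-hot block sums to one, so the columns of a block are collinear with an intercept; a linear constraint on each coefficient block is one way to remove that redundancy. The total-variation term penalizes jumps between consecutive bins, so coefficients of adjacent bins tend to be equal and uninformative features tend toward a constant block.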
