Sufficient Representations for Categorical Variables

08/26/2019
by Jonathan Johannemann, et al.

Many learning algorithms require categorical data to be transformed into real vectors before it can be used as input. Often, categorical variables are encoded as one-hot (or dummy) vectors. However, this mode of representation can be wasteful since it adds many low-signal regressors, especially when the number of unique categories is large. In this paper, we investigate simple alternative solutions for universally consistent estimators that rely on lower-dimensional real-valued representations of categorical variables that are "sufficient" in the sense that no predictive information is lost. We then compare preexisting and proposed methods on simulated and observational datasets.
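
To make the dimensionality contrast concrete, the following minimal sketch compares a one-hot encoding with one possible lower-dimensional alternative. The alternative shown, replacing each category with the per-category mean of the other numeric covariates, is an assumption chosen for illustration and is not taken from the abstract. The function names and toy data are hypothetical, and whether such an encoding is "sufficient" in the abstract's sense depends on conditions not stated here.

```python
from collections import defaultdict


def one_hot_encode(categories):
    """Map each label to an indicator vector with one slot per distinct level."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    encoded = []
    for c in categories:
        row = [0.0] * len(levels)
        row[index[c]] = 1.0
        encoded.append(row)
    return encoded


def group_mean_encode(categories, covariates):
    """Replace each label with the per-category mean of the numeric covariates.

    Illustrative assumption: the output width equals the number of
    covariates, not the number of category levels.
    """
    sums = {}
    counts = defaultdict(int)
    for c, x in zip(categories, covariates):
        if c not in sums:
            sums[c] = [0.0] * len(x)
        sums[c] = [s + v for s, v in zip(sums[c], x)]
        counts[c] += 1
    means = {c: [s / counts[c] for s in sums[c]] for c in sums}
    return [means[c] for c in categories]


# Toy data (hypothetical): six rows, four category levels, two covariates.
categories = ["a", "b", "a", "c", "d", "b"]
covariates = [
    [1.0, 10.0], [2.0, 20.0], [3.0, 30.0],
    [4.0, 40.0], [5.0, 50.0], [6.0, 60.0],
]

one_hot = one_hot_encode(categories)
group_means = group_mean_encode(categories, covariates)

print(len(one_hot[0]))      # width grows with the number of levels
print(len(group_means[0]))  # width is fixed by the number of covariates
```

With many unique categories, the one-hot width grows with the number of levels, while the group-mean width stays tied to the number of covariates. That is the kind of saving the abstract alludes to.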

research
06/26/2015

Clustering categorical data via ensembling dissimilarity matrices

We present a technique for clustering categorical data by generating man...
research
06/04/2018

Similarity encoding for learning with dirty categorical variables

For statistical learning, categorical variables in a table are usually c...
research
05/09/2018

Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes

Bayesian Optimization (BO) methods are useful for optimizing functions t...
research
07/03/2019

Encoding high-cardinality string categorical variables

Statistical analysis usually requires a vector representation of categor...
research
11/23/2021

ptype-cat: Inferring the Type and Values of Categorical Variables

Type inference is the task of identifying the type of values in a data c...
research
11/05/2018

Visualizing class specific heterogeneous tendencies in categorical data

In multiple correspondence analysis, both individuals (observations) and...
research
12/01/2021

Dimensionality Reduction for Categorical Data

Categorical attributes are those that can take a discrete set of values,...
