ptype-cat: Inferring the Type and Values of Categorical Variables

11/23/2021
by Taha Ceritli, et al.

Type inference is the task of identifying the type of values in a data column and has been studied extensively in the literature. Most existing type inference methods support data types such as Boolean, date, float, integer and string. However, these methods do not consider non-Boolean categorical variables, where there are more than two possible values encoded by integers or strings. Such columns are therefore annotated as either integer or string rather than categorical, and the user must convert them to categorical manually. In this paper, we propose a probabilistic type inference method that can identify the general categorical data type (including non-Boolean variables). Additionally, we identify the possible values of each categorical variable by adapting the existing type inference method ptype. Combining these methods, we present ptype-cat, which achieves better results than existing applicable solutions.
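To make the problem concrete, the sketch below contrasts a conventional per-value type check with a simple uniqueness heuristic for flagging categorical columns. It is not the ptype-cat model, which is probabilistic. The function names `naive_type` and `guess_categorical` and the thresholds `max_unique` and `max_ratio` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def _parses_as(cast, value):
    """Return True if `cast(value)` succeeds (e.g. int("3"))."""
    try:
        cast(value)
        return True
    except ValueError:
        return False


def naive_type(values):
    """Conventional inference: picks integer, float, or string only.

    Assumption: `values` is a list of raw strings read from one column.
    """
    if all(_parses_as(int, v) for v in values):
        return "integer"
    if all(_parses_as(float, v) for v in values):
        return "float"
    return "string"


def guess_categorical(values, max_unique=10, max_ratio=0.5):
    """Heuristic stand-in (NOT ptype-cat): flag a column as categorical
    when it has few distinct values relative to its length.

    Returns (is_categorical, sorted list of distinct values).
    """
    if not values:
        return False, []
    counts = Counter(values)
    ratio = len(counts) / len(values)
    is_cat = len(counts) <= max_unique and ratio <= max_ratio
    return is_cat, sorted(counts)


# A rating column encoded with integers: a conventional method reports
# "integer", although the column is really categorical with three values.
ratings = ["1", "2", "3", "1", "2", "3", "1", "2"]
print(naive_type(ratings))         # integer
print(guess_categorical(ratings))  # (True, ['1', '2', '3'])

# An identifier column is also integer-typed but is not categorical.
ids = ["101", "102", "103", "104"]
print(guess_categorical(ids))      # (False, ['101', '102', '103', '104'])
```

A fixed threshold like this is brittle: it depends on the column length and on arbitrary cut-offs, which is one reason a probabilistic treatment of the categorical type is attractive.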

research
03/20/2020

Probabilistic learning of boolean functions applied to the binary classification problem with categorical covariates

In this work we cast the problem of binary classification in terms of es...
research
11/22/2019

ptype: Probabilistic Type Inference

Type inference refers to the task of inferring the data type of a given ...
research
11/23/2021

Identifying the Units of Measurement in Tabular Data

We consider the problem of identifying the units of measurement in a dat...
research
08/26/2019

Sufficient Representations for Categorical Variables

Many learning algorithms require categorical data to be transformed into...
research
02/11/2020

A Method Expanding 2 by 2 Contingency Table by Obtaining Tendencies of Boolean Operators: Boolean Monte Carlo Method

A medical test and accuracy of diagnosis are often discussed with contin...
research
05/06/2020

Graph Spectral Feature Learning for Mixed Data of Categorical and Numerical Type

Feature learning in the presence of a mixed type of variables, numerical...
research
11/05/2020

Towards a more perfect union type

We present a principled theoretical framework for inferring and checking...