Quasi-orthonormal Encoding for Machine Learning Applications

05/29/2020
by Haw-minn Lu, et al.

Most machine learning models, especially artificial neural networks, require numerical rather than categorical data. We briefly describe the advantages and disadvantages of common encoding schemes. For example, one-hot encoding is commonly used for attributes with a few unrelated categories, and word embeddings for attributes with many related categories (e.g., words). Neither is suitable for encoding attributes with many unrelated categories, such as diagnosis codes in healthcare applications. Applying one-hot encoding to diagnosis codes, for example, can result in extremely high-dimensional, low-sample-size problems or artificially induce machine learning artifacts, not to mention the explosion of computing resources needed. Quasi-orthonormal encoding (QOE) fills the gap. We briefly show how QOE compares to one-hot encoding. We provide example code showing how to implement QOE using popular ML libraries such as TensorFlow and PyTorch, and a demonstration of QOE on MNIST handwriting samples.
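The dimensionality trade-off described above can be illustrated with a small sketch. The abstract does not describe how QOE vectors are constructed, so the code below is not the authors' method: it assumes normalized random Gaussian vectors as a stand-in for a set of nearly orthogonal unit vectors. The names `random_unit_codes` and `max_abs_inner_product`, and the sizes 200 and 100, are hypothetical choices for illustration only. The idea it shows is that relaxing exact orthogonality allows more code vectors than dimensions.

```python
import math
import random


def one_hot(index, num_categories):
    """Return the standard one-hot vector for a category index."""
    vec = [0.0] * num_categories
    vec[index] = 1.0
    return vec


def random_unit_codes(num_categories, dim, seed=0):
    """Stand-in construction (an assumption, not the paper's method):
    draw Gaussian vectors and normalize each one to unit length."""
    rng = random.Random(seed)
    codes = []
    for _ in range(num_categories):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        codes.append([x / norm for x in v])
    return codes


def max_abs_inner_product(codes):
    """Largest |<u, v>| over all distinct pairs; 0 for an orthonormal set."""
    worst = 0.0
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            dot = sum(a * b for a, b in zip(codes[i], codes[j]))
            worst = max(worst, abs(dot))
    return worst


# Hypothetical sizes: 200 categories encoded in 100 dimensions.
num_categories, dim = 200, 100
one_hot_codes = [one_hot(i, num_categories) for i in range(num_categories)]
quasi_codes = random_unit_codes(num_categories, dim)

print("one-hot dimension:", len(one_hot_codes[0]))
print("stand-in dimension:", len(quasi_codes[0]))
print("one-hot max |inner product|:", max_abs_inner_product(one_hot_codes))
print("stand-in max |inner product|:", round(max_abs_inner_product(quasi_codes), 3))
```

One-hot vectors are exactly orthonormal but need one dimension per category. The stand-in vectors use half as many dimensions at the cost of small, nonzero inner products between distinct categories. A deliberate construction would presumably bound that overlap more tightly than random sampling does.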


research
06/04/2018

Similarity encoding for learning with dirty categorical variables

For statistical learning, categorical variables in a table are usually c...
research
04/15/2021

Geometry encoding for numerical simulations

We present a notion of geometry encoding suitable for machine learning-b...
research
10/25/2022

Unsupervised Anomaly Detection for Auditing Data and Impact of Categorical Encodings

In this paper, we introduce the Vehicle Claims dataset, consisting of fr...
research
10/04/2022

Representing missing values through polar encoding

We propose polar encoding, a representation of categorical and numerical...
research
01/27/2022

Fairness implications of encoding protected categorical attributes

Protected attributes are often presented as categorical features that ne...
research
03/30/2022

Does Configuration Encoding Matter in Learning Software Performance? An Empirical Study on Encoding Schemes

Learning and predicting the performance of a configurable software syste...
research
11/07/2021

A Review of Location Encoding for GeoAI: Methods and Applications

A common need for artificial intelligence models in the broader geoscien...
