Robust Variational Autoencoders for Outlier Detection in Mixed-Type Data

07/15/2019
by Simão Eduardo, et al.

We focus on the problem of unsupervised cell outlier detection in mixed-type tabular datasets. Traditional methods for outlier detection are concerned only with detecting which rows in the dataset are outliers. However, identifying which cells in the dataset corrupt a specific row is an important problem in practice, especially in high-dimensional tables. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells in the dataset. RVAE learns the probability of each cell in the dataset being an outlier, which balances the contributions of the different likelihood models in the row outlier score and makes the method suitable for outlier detection in mixed-type datasets. We show experimentally that RVAE performs better than several state-of-the-art methods in cell outlier detection for tabular datasets, while providing comparable or better results for row outlier detection.
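
To make the idea of a per-cell outlier probability concrete, the following is a minimal sketch and not the RVAE model itself. It assumes each cell is explained either by a "clean" likelihood or by a broad "outlier" likelihood, and it uses Bayes' rule to get the probability that the cell came from the outlier component. There is no neural network here: the likelihood parameters are fixed by hand, and the names `log_likelihood`, `outlier_probability`, `score_row`, the prior `prior_clean=0.9`, and the choice of averaging cell probabilities into a row score are all illustrative assumptions rather than details taken from the paper.

```python
import math

EPS = 1e-12  # floor so an unseen category does not produce log(0)


def log_likelihood(value, model):
    """Log-likelihood of one cell under a hand-specified model (assumed form).

    model is ("gaussian", mean, std) for numeric features or
    ("categorical", {category: probability}) for categorical features.
    """
    kind = model[0]
    if kind == "gaussian":
        _, mean, std = model
        return (-0.5 * math.log(2 * math.pi * std ** 2)
                - (value - mean) ** 2 / (2 * std ** 2))
    if kind == "categorical":
        _, probs = model
        return math.log(max(probs.get(value, 0.0), EPS))
    raise ValueError(f"unknown model kind: {kind}")


def outlier_probability(log_p_clean, log_p_outlier, prior_clean=0.9):
    """Posterior probability that a cell is an outlier in a two-component mixture."""
    a = math.log(prior_clean) + log_p_clean
    b = math.log(1.0 - prior_clean) + log_p_outlier
    m = max(a, b)  # subtract the max for numerical stability
    wa, wb = math.exp(a - m), math.exp(b - m)
    return wb / (wa + wb)


def score_row(row, clean_models, outlier_models, prior_clean=0.9):
    """Return per-cell outlier probabilities and their mean as a row score.

    Because every cell score is a probability in [0, 1], numeric and
    categorical features contribute on the same scale (an assumed, simple
    way to balance different likelihood types).
    """
    cell_scores = {
        name: outlier_probability(
            log_likelihood(value, clean_models[name]),
            log_likelihood(value, outlier_models[name]),
            prior_clean,
        )
        for name, value in row.items()
    }
    row_score = sum(cell_scores.values()) / len(cell_scores)
    return cell_scores, row_score


# Hypothetical example: two numeric features and one categorical feature.
clean_models = {
    "age": ("gaussian", 40.0, 10.0),
    "income": ("gaussian", 50_000.0, 15_000.0),
    "city": ("categorical", {"Paris": 0.5, "Lyon": 0.3, "Nice": 0.2}),
}
outlier_models = {
    "age": ("gaussian", 40.0, 100.0),               # much broader Gaussian
    "income": ("gaussian", 50_000.0, 1_000_000.0),  # much broader Gaussian
    "city": ("categorical", {"Paris": 1 / 3, "Lyon": 1 / 3, "Nice": 1 / 3}),
}
row = {"age": 35.0, "income": 900_000.0, "city": "Paris"}

cell_scores, row_score = score_row(row, clean_models, outlier_models)
for name, score in cell_scores.items():
    print(f"{name}: outlier probability = {score:.3f}")
print(f"row score = {row_score:.3f}")
```

In this toy row only the `income` cell receives a high outlier probability, so the score points to the corrupted cell rather than just flagging the whole row. In a learned model the clean likelihoods would come from a decoder network instead of fixed parameters.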


research
08/17/2016

Outlier Detection on Mixed-Type Data: An Energy-based Approach

Outlier detection amounts to finding data points that differ significant...
research
07/17/2022

Repairing Systematic Outliers by Learning Clean Subspaces in VAEs

Data cleaning often comprises outlier detection and data repair. Systema...
research
11/16/2021

Automatically detecting anomalous exoplanet transits

Raw light curve data from exoplanet transits is too complex to naively a...
research
08/18/2022

Outlier Detection using Self-Organizing Maps for Automated Blood Cell Analysis

The quality of datasets plays a crucial role in the successful training ...
research
08/19/2021

Efficient remedies for outlier detection with variational autoencoders

Deep networks often make confident, yet incorrect, predictions when test...
research
05/20/2023

Technical outlier detection via convolutional variational autoencoder for the ADMANI breast mammogram dataset

The ADMANI datasets (annotated digital mammograms and associated non-ima...
research
03/14/2023

RODD: Robust Outlier Detection in Data Cubes

Data cubes are multidimensional databases, often built from several sepa...
