RECol: Reconstruction Error Columns for Outlier Detection

02/04/2021
by   Jörn Hees, et al.

Detecting outliers or anomalies is a common data analysis task. It is usually treated as a sub-field of unsupervised machine learning, and a large variety of approaches exists, but the vast majority treat the input features as independent and often fail to recognize even simple (linear) relationships in the input feature space. Hence, we introduce RECol, a generic data pre-processing approach that generates additional columns in a leave-one-out fashion: for each column, we try to predict its values from the other columns, and the prediction errors form a reconstruction error column. We run experiments across a large variety of common baseline approaches and benchmark datasets, with and without our RECol pre-processing method. The results show that the generated reconstruction error feature space generally seems to support common outlier detection methods and often considerably improves their ROC-AUC and PR-AUC values.
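The leave-one-out column generation described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `add_recol_columns`, the choice of an ordinary least-squares linear model as the per-column predictor, the absolute error as the reconstruction error, and fitting and scoring on the same data are all assumptions made for the example.

```python
import numpy as np


def add_recol_columns(X):
    """Append one reconstruction-error column per input column.

    Hypothetical helper: each column is predicted from all other
    columns with a least-squares linear model (an assumption; any
    regressor could be swapped in).
    """
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    errors = np.empty_like(X)
    for j in range(n_cols):
        # Leave column j out and use the remaining columns plus an
        # intercept term as predictors.
        others = np.delete(X, j, axis=1)
        design = np.column_stack([others, np.ones(n_rows)])
        coef, *_ = np.linalg.lstsq(design, X[:, j], rcond=None)
        # Absolute difference between actual and predicted values.
        errors[:, j] = np.abs(X[:, j] - design @ coef)
    # Original features followed by their reconstruction-error columns.
    return np.hstack([X, errors])


# Toy data: the second column is roughly twice the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2.0 * a + rng.normal(scale=0.01, size=200)])
# Row 0 has unremarkable values per column but breaks the relationship.
X[0] = [0.0, 3.0]

augmented = add_recol_columns(X)
```

In this toy setup, row 0 looks ordinary in each column on its own, yet it gets by far the largest values in the two appended error columns. The augmented matrix can then be passed to any outlier detector in place of the original features.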


