Outlier Detection on Mixed-Type Data: An Energy-based Approach

08/17/2016
by   Kien Do, et al.

Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous, where a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from Mv.RBM as an outlier score, detecting outliers as those data points lying in low-density regions. The method is fast to learn and compute, and is scalable to massive datasets. At the same time, the outlier score is identical to the data negative log-density up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of Mv.RBM is a powerful and efficient outlier scoring method, which is highly competitive against state-of-the-art methods.
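To make the idea of free energy as an outlier score concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: it assumes a toy RBM with binary hidden units, unit-variance Gaussian visible units for the continuous attributes, and Bernoulli visible units for the binary attributes. The function name `free_energy` and the parameters `a`, `b`, `c`, `W_cont`, `W_bin` are hypothetical, and the weights in the usage example are random placeholders rather than trained values. Under these assumptions, summing out the hidden units gives a free energy equal to the negative log-density up to the additive constant log Z, so a higher value marks a point as more outlying.

```python
import numpy as np


def free_energy(x_cont, x_bin, a, b, c, W_cont, W_bin):
    """Free energy of a toy mixed Gaussian/binary RBM (illustrative assumption).

    x_cont: (n, d_cont) continuous attributes, unit-variance Gaussian visibles
    x_bin:  (n, d_bin) binary attributes in {0, 1}
    a, b:   visible biases for the continuous and binary attributes
    c:      hidden biases, shape (k,)
    W_cont: (d_cont, k) weights from continuous visibles to hidden units
    W_bin:  (d_bin, k) weights from binary visibles to hidden units
    Returns an array of shape (n,); larger means lower model density.
    """
    x_cont = np.atleast_2d(np.asarray(x_cont, dtype=float))
    x_bin = np.atleast_2d(np.asarray(x_bin, dtype=float))
    # Terms that depend only on the visible units.
    visible_term = 0.5 * np.sum((x_cont - a) ** 2, axis=1) - x_bin @ b
    # Total input to each hidden unit from both attribute types.
    hidden_input = c + x_cont @ W_cont + x_bin @ W_bin
    # Summing out binary hidden units yields a softplus per hidden unit;
    # logaddexp(0, z) = log(1 + exp(z)) computed stably.
    hidden_term = np.sum(np.logaddexp(0.0, hidden_input), axis=1)
    return visible_term - hidden_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_cont, d_bin, k = 2, 3, 4
    # Placeholder parameters; a real model would learn these from data.
    a, b, c = np.zeros(d_cont), np.zeros(d_bin), np.zeros(k)
    W_cont = 0.1 * rng.standard_normal((d_cont, k))
    W_bin = 0.1 * rng.standard_normal((d_bin, k))
    x_cont = rng.standard_normal((5, d_cont))
    x_bin = rng.integers(0, 2, size=(5, d_bin))
    scores = free_energy(x_cont, x_bin, a, b, c, W_cont, W_bin)
    # Rank points from most to least outlying under the toy model.
    print(np.argsort(-scores))
```

Because the score needs only one matrix product per attribute type and a softplus, scoring is linear in the number of data points, which is in line with the abstract's claim that the method is fast to compute.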


research
11/01/2016

Local Subspace-Based Outlier Detection using Global Neighbourhoods

Outlier detection in high-dimensional data is a challenging yet importan...
research
07/15/2019

Robust Variational Autoencoders for Outlier Detection in Mixed-Type Data

We focus on the problem of unsupervised cell outlier detection in mixed ...
research
07/28/2016

Robust Contextual Outlier Detection: Where Context Meets Sparsity

Outlier detection is a fundamental data science task with applications r...
research
06/13/2020

SDCOR: Scalable Density-based Clustering for Local Outlier Detection in Massive-Scale Datasets

This paper presents a batch-wise density-based clustering approach for l...
research
09/15/2023

BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus

RANSAC-based algorithms are the standard techniques for robust estimatio...
research
07/19/2018

Simple robust genomic prediction and outlier detection for a multi-environmental field trial

The aim of plant breeding trials is often to identify germplasms that ar...
research
04/12/2017

Provable Self-Representation Based Outlier Detection in a Union of Subspaces

Many computer vision tasks involve processing large amounts of data cont...
