Detecting Outliers in High-dimensional Data with Mixed Variable Types using Conditional Gaussian Regression Models

03/03/2021
by Mads Lindskou, et al.

Outlier detection has gained increasing interest in recent years, due to newly emerging technologies and the huge amount of high-dimensional data that are now available. Outlier detection can help practitioners identify unwanted noise and locate interesting abnormal observations. To this end, we develop a novel outlier detection method for possibly high-dimensional datasets with both discrete and continuous variables. We exploit the family of decomposable graphical models to model the relationships between the variables, and use this model to form an exact likelihood ratio test of whether a given observation is an outlier. We show that our method outperforms the state-of-the-art Isolation Forest algorithm on a real data example.

research
09/09/2019

Outlier Detection in High Dimensional Data

High-dimensional data poses unique challenges in outlier detection proce...
research
07/01/2022

A geometric framework for outlier detection in high-dimensional data

Outlier or anomaly detection is an important task in data analysis. We d...
research
09/30/2016

Outlier Detection from Network Data with Subnetwork Interpretation

Detecting a small number of outliers from a set of data observations is ...
research
02/08/2020

SUOD: Toward Scalable Unsupervised Outlier Detection

Outlier detection is a key field of machine learning for identifying abn...
research
08/18/2023

Outlier detection for mixed-type data: A novel approach

Outlier detection can serve as an extremely important tool for researche...
research
12/14/2021

Linear Discriminant Analysis with High-dimensional Mixed Variables

Datasets containing both categorical and continuous variables are freque...
research
09/04/2023

Robust penalized least squares of depth trimmed residuals regression for high-dimensional data

Challenges with data in the big-data era include (i) the dimension p is ...
