Typical Yet Unlikely: Using Information Theoretic Approaches to Identify Outliers which Lie Close to the Mean

11/01/2022
by Matthew J. Vowels, et al.

Normality, in the colloquial sense, has historically been considered an aspirational trait, synonymous with harmony and ideality. The arithmetic average has often been used to characterize normality, and it is widely used, both productively and unproductively, as a blunt way to characterize samples and outliers. A number of prior commentaries in psychology and social science have highlighted the need for caution when reducing complex phenomena to a single mean value. However, to the best of our knowledge, none have explained why the mean provides such a poor characterization of normality, particularly in the context of multi-dimensionality and outlier detection. We demonstrate that even for datasets with a relatively low number of dimensions (<10), data start to exhibit a number of peculiarities, which become progressively more severe as the number of dimensions increases. The availability of large, multi-dimensional datasets is increasing, and it is therefore especially important that researchers understand the peculiar characteristics of such data. We show that normality can be better characterized with "typicality", an information-theoretic concept relating to the entropy of a distribution. An application of typicality to both synthetic and real-world data reveals that, in multi-dimensional space, to be normal (or close to the mean) is actually to be highly atypical. This motivates us to update our working definition of an outlier, and we demonstrate that typicality is a viable method for outlier detection which is consistent with this updated definition. In contrast, whilst the popular Mahalanobis-based outlier detection method can be used to identify points far from the mean, it fails to identify those which are too close to it. Typicality can be used to identify both, and performs well regardless of the dimensionality of the problem.
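
The contrast between typicality and Mahalanobis distance can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a standard isotropic Gaussian with known parameters, and the function names (gaussian_nll, typicality_gap, mahalanobis), the choice of 100 dimensions, and the 99% quantile thresholds are all hypothetical choices made for illustration. A point is scored by how far its negative log-density, per dimension, lies from the entropy of the distribution.

```python
import numpy as np

def gaussian_nll(x):
    # Negative log-density (in nats) under N(0, I_d), an assumed model.
    d = x.shape[-1]
    return 0.5 * np.sum(x**2, axis=-1) + 0.5 * d * np.log(2 * np.pi)

def typicality_gap(x):
    # Per-dimension distance between a point's negative log-density and
    # the differential entropy of N(0, I_d). Large values mean "atypical".
    d = x.shape[-1]
    entropy = 0.5 * d * np.log(2 * np.pi * np.e)
    return np.abs(gaussian_nll(x) - entropy) / d

def mahalanobis(x):
    # Mahalanobis distance to the mean; for N(0, I_d) this is the norm.
    return np.sqrt(np.sum(x**2, axis=-1))

rng = np.random.default_rng(0)
d = 100  # illustrative dimensionality, not taken from the paper
samples = rng.standard_normal((10_000, d))

# Illustrative thresholds: the 99% quantile of each score on the samples.
gap_threshold = np.quantile(typicality_gap(samples), 0.99)
md_threshold = np.quantile(mahalanobis(samples), 0.99)

mean_point = np.zeros((1, d))      # exactly at the mean
far_point = np.full((1, d), 3.0)   # far from the mean in every dimension

mean_gap = typicality_gap(mean_point)[0]
far_gap = typicality_gap(far_point)[0]
mean_md = mahalanobis(mean_point)[0]
far_md = mahalanobis(far_point)[0]

print("typicality flags mean point:", mean_gap > gap_threshold)
print("typicality flags far point: ", far_gap > gap_threshold)
print("Mahalanobis flags mean point:", mean_md > md_threshold)
print("Mahalanobis flags far point: ", far_md > md_threshold)
```

Under these assumptions, the point at the mean has the highest density of any point yet sits 0.5 nats per dimension away from the entropy, which is well outside the range occupied by the sampled points. A one-sided Mahalanobis threshold cannot flag it because its distance is zero, whereas the typicality gap is two-sided by construction.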
