TVOR: Finding Discrete Total Variation Outliers among Histograms

12/21/2020
by Nikola Banić, et al.

Pearson's chi-squared test can detect outliers in the data distribution of a given set of histograms. However, in fields such as demographics (e.g. birth years), outliers may be easier to find in terms of histogram smoothness, where techniques such as Whipple's or Myers' indices successfully handle only specific anomalies. This paper proposes detecting smoothness outliers among histograms by using the relation between their discrete total variations (DTV) and their respective sample sizes. This relation is mathematically derived to be applicable in all cases and is simplified by an accurate linear model. The deviation of a histogram's DTV from the value predicted by the model is used as the outlier score, and the proposed method is named Total Variation Outlier Recognizer (TVOR). TVOR requires no prior assumptions about the distribution of the histograms' samples, has no hyperparameters that require tuning, is not limited to specific patterns, and is applicable to histograms with the same bins. Each bin can have an arbitrary interval, which can also be unbounded. TVOR finds DTV outliers more easily than Pearson's chi-squared test; for distribution outliers, the opposite holds. TVOR is tested on real census data, where it successfully finds suspicious histograms. The source code is given at https://github.com/DiscreteTotalVariation/TVOR.
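The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their code is in the repository linked above), and it rests on several assumptions: the DTV is taken to be the sum of absolute differences between neighbouring bin counts, the linear model is fitted by ordinary least squares on the given histograms themselves, and the function names `discrete_total_variation`, `fit_line`, and `dtv_outlier_scores` are hypothetical. The paper's derived model and any normalization it applies may differ.

```python
from typing import List, Sequence, Tuple


def discrete_total_variation(hist: Sequence[float]) -> float:
    """Sum of absolute differences between neighbouring bin counts (assumed DTV definition)."""
    return sum(abs(b - a) for a, b in zip(hist, hist[1:]))


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept.

    Assumes at least two distinct x values; otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dtv_outlier_scores(histograms: Sequence[Sequence[float]]) -> List[float]:
    """Score each histogram by how far its DTV lies from the fitted line.

    All histograms are assumed to share the same bins. Larger positive scores
    mean the histogram is rougher than its sample size would suggest.
    """
    sizes = [sum(h) for h in histograms]                      # sample size per histogram
    dtvs = [discrete_total_variation(h) for h in histograms]  # roughness per histogram
    slope, intercept = fit_line(sizes, dtvs)
    return [d - (slope * n + intercept) for n, d in zip(sizes, dtvs)]


if __name__ == "__main__":
    # Made-up toy data: four smooth histograms and one jagged one (the last).
    toy = [
        [10, 12, 14, 12, 10],
        [20, 24, 28, 24, 20],
        [30, 36, 42, 36, 30],
        [40, 48, 56, 48, 40],
        [5, 40, 5, 40, 26],
    ]
    for hist, score in zip(toy, dtv_outlier_scores(toy)):
        print(hist, round(score, 2))
```

In this toy data the jagged histogram receives the largest score. Fitting the line on the same histograms that are being scored is a simplification chosen for brevity: a strong outlier pulls the fit toward itself, so a more robust fit may be preferable in practice.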


