Inapplicability of the TVOR Method to USHMM Data Outlier Identification

03/27/2021
by Melkior Ornik, et al.

The recent paper "TVOR: Finding Discrete Total Variation Outliers Among Histograms" [arXiv:2012.11574] introduces the Total Variation Outlier Recognizer (TVOR) method for identifying outliers among a given set of histograms. The method compares the smoothness of each histogram, measured by its discrete total variation, to that of the other histograms in the dataset, under the assumption that most histograms in the dataset should be of similar smoothness. The paper concludes by applying the TVOR model to histograms of ages of Holocaust victims produced using United States Holocaust Memorial Museum (USHMM) data, and purports to identify the list of victims of the Jasenovac concentration camp as potentially suspicious. In this paper, we show that the TVOR model and its assumptions are grossly inapplicable to the considered dataset. Namely, the dataset does not satisfy the model's critical assumption that the distributions of the victims' ages share similar smoothness across lists; the model is biased toward assigning a higher outlier score to histograms of larger size; and the dataset has not been reviewed to remove obvious data processing errors, which leads to duplication of hundreds of thousands of entries when the data analysis is performed.
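
For readers unfamiliar with the smoothness measure mentioned above, the following is a minimal sketch of the discrete total variation of a histogram, taken here in its standard form as the sum of absolute differences between adjacent bin counts. The function name and the toy histograms are illustrative assumptions, and the sketch does not reproduce the TVOR paper's normalization or outlier scoring.

```python
from typing import Sequence


def discrete_total_variation(counts: Sequence[float]) -> float:
    """Sum of absolute differences between adjacent histogram bins.

    This is the textbook definition only; it is an assumption that it
    matches the quantity used inside TVOR before any normalization.
    """
    return sum(abs(b - a) for a, b in zip(counts, counts[1:]))


# Two hypothetical toy histograms with the same number of bins.
smooth = [1, 2, 3, 4, 3, 2, 1]
jagged = [1, 4, 1, 4, 1, 4, 1]

# The same smooth shape with ten times as many entries in every bin.
smooth_large = [10 * c for c in smooth]

print(discrete_total_variation(smooth))        # 6
print(discrete_total_variation(jagged))        # 18
print(discrete_total_variation(smooth_large))  # 60
```

The raw quantity scales linearly with the bin counts, so two histograms of identical shape but different sizes have different raw values. Any comparison across histograms of different sizes therefore depends on how this dependence on size is handled.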


