Inapplicability of the TVOR method to USHMM Data Outlier Identification

by Melkior Ornik, et al.

The recent paper "TVOR: Finding Discrete Total Variation Outliers Among Histograms" [arXiv:2012.11574] introduces the Total Variation Outlier Recognizer (TVOR) method for identifying outliers among a given set of histograms. The method compares the smoothness of each histogram, measured by its discrete total variation, to that of the other histograms in the dataset, under the assumption that most histograms in the dataset should be similarly smooth. The paper concludes by applying the TVOR model to histograms of the ages of Holocaust victims produced from United States Holocaust Memorial Museum (USHMM) data, and purports to identify the list of victims of the Jasenovac concentration camp as potentially suspicious. In this paper, we show that the TVOR model and its assumptions are grossly inapplicable to the considered dataset. Namely, the dataset does not satisfy the model's critical assumption of shared smoothness between the distributions of victims' ages across lists; the model is biased toward assigning higher outlier scores to larger histograms; and the dataset has not been reviewed to remove obvious data processing errors, which leads to the duplication of hundreds of thousands of entries during data analysis.
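To make the smoothness measure concrete, the following is a minimal sketch of the discrete total variation of a histogram, taken here as the sum of absolute differences between neighbouring bins. It is not the TVOR scoring procedure itself, which the abstract does not specify; the function names, the toy bin counts, and the normalization by total count are illustrative assumptions only.

```python
from typing import Sequence


def discrete_total_variation(counts: Sequence[float]) -> float:
    """Sum of absolute differences between neighbouring bin counts."""
    return float(sum(abs(b - a) for a, b in zip(counts, counts[1:])))


def normalized_total_variation(counts: Sequence[float]) -> float:
    """Total variation divided by the total count.

    Assumption: this is one simple way to compare histograms of
    different sizes, not necessarily the one used by TVOR.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no entries")
    return discrete_total_variation(counts) / total


# Hypothetical toy histograms: same shape, ten times as many entries.
small = [1, 3, 2, 5]
large = [10 * c for c in small]

tv_small = discrete_total_variation(small)   # |3-1| + |2-3| + |5-2| = 6
tv_large = discrete_total_variation(large)   # ten times larger: 60
ntv_small = normalized_total_variation(small)
ntv_large = normalized_total_variation(large)
```

The raw total variation scales linearly with the bin counts, so two histograms with an identical shape but different sizes receive different raw values. Any comparison of smoothness across lists therefore has to account for histogram size in some way.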






