Distance Assessment and Hypothesis Testing of High-Dimensional Samples using Variational Autoencoders

Given two distinct datasets, an important question is whether they arose from the same data-generating function or, alternatively, how their data-generating functions diverge from one another. In this paper, we introduce an approach for measuring the distance between two high-dimensional datasets using variational autoencoders. This approach is augmented by a permutation hypothesis test that checks, at a given significance level, the hypothesis that the data-generating distributions are the same. We evaluate both the distance measurement and the hypothesis testing approaches on generated and public datasets. According to the results, the proposed approach can be used for data exploration (e.g. by quantifying the discrepancy/separability between categories of images), which can be particularly useful in the early phases of the pipeline of most machine learning projects.
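
The permutation test mentioned above can be illustrated with a minimal sketch. The abstract does not specify how the variational autoencoder produces the distance, so the `mean_gap` function below is a hypothetical stand-in (Euclidean distance between sample means), and the names `permutation_test`, `n_permutations`, and `seed` are assumptions made for illustration. Any distance that maps two samples to a number, including a VAE-based one, could be passed in its place.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def mean_gap(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Hypothetical stand-in distance: Euclidean distance between sample means.

    This is NOT the VAE-based distance of the paper; it only fills the slot.
    """
    dim = len(a[0])
    mean_a = [sum(x[i] for x in a) / len(a) for i in range(dim)]
    mean_b = [sum(x[i] for x in b) / len(b) for i in range(dim)]
    return sum((p - q) ** 2 for p, q in zip(mean_a, mean_b)) ** 0.5


def permutation_test(
    a: Sequence[Point],
    b: Sequence[Point],
    distance: Callable[[Sequence[Point], Sequence[Point]], float] = mean_gap,
    n_permutations: int = 500,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed distance, p-value) under the null of equal distributions."""
    rng = random.Random(seed)
    observed = distance(a, b)
    pooled: List[Point] = list(a) + list(b)
    n_a = len(a)
    exceed = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it at the original sizes.
        rng.shuffle(pooled)
        if distance(pooled[:n_a], pooled[n_a:]) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    sample_a = [[0.01 * i, 0.0] for i in range(20)]
    sample_b = [[5.0 + 0.01 * i, 5.0] for i in range(20)]
    dist, p = permutation_test(sample_a, sample_b)
    print(f"distance={dist:.3f}, p-value={p:.4f}")
```

The null hypothesis would be rejected when the returned p-value falls below the chosen significance level. The permutation scheme itself does not depend on the distance used, which is why the distance is passed in as a parameter.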


