Graph-Based Two-Sample Tests for Discrete Data

11/12/2017
by Jingru Zhang, et al.

In two-sample comparison, tests based on a graph constructed on the observations from similarity information among them are gaining attention because of their flexibility and good performance under various settings for high-dimensional and non-Euclidean data. However, when there are repeated observations or ties in the similarity graph, these graph-based tests can be problematic because they are sensitive to the choice of the similarity graph. We study two ways to fix the "tie" problem for the existing graph-based test statistics and a new max-type statistic. Analytic p-value approximations for these extended graph-based tests are also derived and shown to work well for finite samples, making the tests fast to apply to large datasets. The new tests are illustrated in the analysis of a phone-call network dataset. All proposed tests are implemented in the R package gTests.
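To make the setting concrete, the following is a minimal Python sketch of the classical edge-count idea that graph-based two-sample tests build on: pool the two samples, build a minimum spanning tree (MST) on the pooled observations, and count the edges that join observations from different samples. It is not the extended statistics studied in the paper and not the gTests API; the function names (`prim_mst`, `between_sample_edges`, `exact_permutation_pvalue`) and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def prim_mst(points, dist):
    """Return the edges (i, j) of a minimum spanning tree built with Prim's algorithm."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Shortest edge leaving the tree. Equal distances are broken by index
        # order, an arbitrary choice: with ties the MST is not unique.
        _, i, j = min(
            (dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def between_sample_edges(edges, labels):
    """Count edges whose endpoints carry different sample labels (0 or 1)."""
    return sum(labels[i] != labels[j] for i, j in edges)


def exact_permutation_pvalue(edges, labels):
    """Exact permutation p-value: few between-sample edges is evidence of a difference.

    Enumerates every relabeling with the same sample sizes, so it is only
    feasible for very small samples.
    """
    n = len(labels)
    n1 = sum(labels)
    observed = between_sample_edges(edges, labels)
    hits = total = 0
    for ones in combinations(range(n), n1):
        ones = set(ones)
        relabeled = [1 if k in ones else 0 for k in range(n)]
        total += 1
        if between_sample_edges(edges, relabeled) <= observed:
            hits += 1
    return hits / total


# Toy example (assumed data): two well-separated samples on the real line.
points = [0, 1, 2, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
edges = prim_mst(points, lambda a, b: abs(a - b))
print(between_sample_edges(edges, labels))
print(exact_permutation_pvalue(edges, labels))
```

With repeated observations, several pairs sit at the same distance, so the tie-breaking step inside `prim_mst` decides which tree is built, and the edge count can change with it. This is the kind of dependence on the similarity graph that the abstract describes.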

research
08/17/2021

Limiting distributions of graph-based test statistics

Two-sample tests utilizing a similarity graph on observations are useful...
research
05/27/2022

New graph-based multi-sample tests for high-dimensional and non-Euclidean data

Testing the equality in distributions of multiple samples is a common ta...
research
06/18/2020

Asymptotic distribution-free change-point detection for data with repeated observations

In the regime of change-point detection, a nonparametric framework based...
research
06/24/2021

Two-sample tests for repeated measurements of histogram objects with applications to wearable device data

Repeated observations have become increasingly common in biomedical rese...
research
12/24/2021

RISE: Rank in Similarity Graph Edge-Count Two-Sample Test

Two-sample hypothesis testing for high-dimensional data is ubiquitous no...
research
06/03/2022

New kernel-based change-point detection

Change-point analysis plays a significant role in various fields to reve...
research
10/14/2018

Sequential Change-point Detection for High-dimensional and non-Euclidean Data

In many modern applications, high-dimensional/non-Euclidean data sequenc...
