Kernel Two-Sample Hypothesis Testing Using Kernel Set Classification

06/18/2017
by Hamed Masnadi-Shirazi, et al.

The two-sample hypothesis testing problem is studied for the challenging scenario of high-dimensional data sets with small sample sizes. We show that the two-sample hypothesis testing problem can be posed as a one-class set classification problem. In the set classification problem, the goal is to classify a set of data points that are assumed to share a common class. We prove that the average probability of error given a set is less than or equal to the Bayes error and decreases as a power of n, the number of sample data points in the set. We use the positive definite Set Kernel to map sets of data directly to an associated Reproducing Kernel Hilbert Space, without the need to learn a probability distribution. Specifically, we solve the two-sample hypothesis testing problem using a one-class SVM in conjunction with the proposed Set Kernel. We compare the proposed method with the Maximum Mean Discrepancy, F-Test and T-Test methods on several challenging simulated data sets with high dimension and small sample size. We also perform two-sample hypothesis testing experiments on six cancer gene expression data sets and achieve zero type-I and type-II errors on all of them.
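The abstract does not state the form of the Set Kernel, so the following is only a minimal sketch under a stated assumption: that a kernel between two sets can be taken as the average of a Gaussian base kernel over all cross-set pairs of points (the common mean-embedding construction, which is positive definite). The function names `rbf`, `set_kernel` and `mmd2_biased` and the parameter `gamma` are hypothetical and do not come from the paper. The sketch also shows how the biased Maximum Mean Discrepancy estimate, one of the baselines named above, can be written with the same averaged kernel. It does not reproduce the one-class SVM step.

```python
import math

def rbf(x, y, gamma=1.0):
    """Gaussian base kernel between two points given as equal-length sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def set_kernel(A, B, gamma=1.0):
    """Assumed set kernel: mean of the base kernel over all pairs (a in A, b in B)."""
    total = sum(rbf(a, b, gamma) for a in A for b in B)
    return total / (len(A) * len(B))

def mmd2_biased(A, B, gamma=1.0):
    """Biased squared-MMD estimate expressed through the averaged kernel."""
    return (set_kernel(A, A, gamma)
            + set_kernel(B, B, gamma)
            - 2.0 * set_kernel(A, B, gamma))

# Two small sets of 3-dimensional points (illustrative values only).
set_a = [(0.0, 0.1, 0.2), (0.1, 0.0, 0.3)]
set_b = [(2.0, 2.1, 1.9), (2.2, 1.8, 2.0)]

same = mmd2_biased(set_a, set_a)   # identical sets give zero discrepancy
diff = mmd2_biased(set_a, set_b)   # well-separated sets give a positive value
```

Under this assumption, a Gram matrix of `set_kernel` values between whole sets could be passed to any kernel method that accepts a precomputed kernel. Whether this matches the paper's construction should be checked against the full text.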


research
04/09/2017

Strictly Proper Kernel Scoring Rules and Divergences with an Application to Kernel Two-Sample Hypothesis Testing

We study strictly proper scoring rules in the Reproducing Kernel Hilbert...
research
01/03/2019

Instance-Based Classification through Hypothesis Testing

Classification is a fundamental problem in machine learning and data min...
research
08/31/2015

Wald-Kernel: Learning to Aggregate Information for Sequential Inference

Sequential hypothesis testing is a desirable decision making strategy in...
research
12/11/2018

Bounding the Error From Reference Set Kernel Maximum Mean Discrepancy

In this paper, we bound the error induced by using a weighted skeletoniz...
research
10/16/2019

Generative Learning of Counterfactual for Synthetic Control Applications in Econometrics

A common statistical problem in econometrics is to estimate the impact o...
research
07/13/2017

Small Sample Inference for the Common Coefficient of Variation

This paper utilizes the modified signed log-likelihood ratio method for ...
research
11/22/2021

Using prior information to boost power in correlation structure support recovery

Hypothesis testing of structure in correlation and covariance matrices i...
