Instance-Based Classification through Hypothesis Testing

01/03/2019
by   Zengyou He, et al.
0

Classification is a fundamental problem in machine learning and data mining. During the past decades, numerous classification methods have been presented based on different principles. However, most existing classifiers cast the classification problem as an optimization problem and do not address the issue of statistical significance. In this paper, we formulate the binary classification problem as a two-sample testing problem. More precisely, our classification model is a generic framework that is composed of two steps. In the first step, the distance between the test instance and each training instance is calculated to derive two distance sets. In the second step, the two-sample test is performed under the null hypothesis that the two sets of distances are drawn from the same cumulative distribution. After these two steps, we have two p-values for each test instance and the test instance is assigned to the class associated with the smaller p-value. Essentially, the presented classification method can be regarded as an instance-based classifier based on hypothesis testing. The experimental results on 40 real data sets show that our method is able to achieve the same level performance as the state-of-the-art classifiers and has significantly better performance than existing testing-based classifiers. Furthermore, we can handle outlying instances and control the false discovery rate of test instances assigned to each class under the same framework.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/18/2017

Kernel Two-Sample Hypothesis Testing Using Kernel Set Classification

The two-sample hypothesis testing problem is studied for the challenging...
research
07/31/2019

A Novel Multiple Classifier Generation and Combination Framework Based on Fuzzy Clustering and Individualized Ensemble Construction

Multiple classifier system (MCS) has become a successful alternative for...
research
03/05/2023

A Semi-Bayesian Nonparametric Hypothesis Test Using Maximum Mean Discrepancy with Applications in Generative Adversarial Networks

A classic inferential problem in statistics is the two-sample hypothesis...
research
10/03/2021

Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

We introduce Learn then Test, a framework for calibrating machine learni...
research
06/04/2020

A statistical Testing Procedure for Validating Class Labels

Motivated by an open problem of validating protein identities in label-f...
research
05/21/2020

Detecting a botnet in a network

We formalize the problem of detecting the presence of a botnet in a netw...
research
04/29/2021

Querying multiple sets of p-values through composed hypothesis testing

Motivation: Combining the results of different experiments to exhibit co...

Please sign up or login with your details

Forgot password? Click here to reset