Nearest Neighbor Classification based on Imbalanced Data: A Statistical Approach

06/22/2022
by Anvit Garg, et al.

In classification problems where the competing classes differ substantially in size, many popular classifiers are biased towards the larger classes, and the nearest neighbor classifier is no exception. To address this problem, we develop a statistical method for nearest neighbor classification based on such imbalanced data sets. We first construct a classifier for the binary classification problem and then extend it to problems involving more than two classes. Unlike existing oversampling methods, the proposed classifiers do not generate any pseudo-observations, so their results are exactly reproducible. We establish the Bayes risk consistency of these classifiers under appropriate regularity conditions. Analyses of several benchmark data sets demonstrate their superior performance over existing methods.
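
The bias towards larger classes mentioned in the abstract can be illustrated with a minimal sketch. It uses a plain k-nearest-neighbor majority vote, not the classifier proposed in the paper, and the one-dimensional toy data, the function name `knn_predict`, and the choices of k are all hypothetical. The query point sits at the centre of the minority cluster, yet once k exceeds twice the minority class size, the majority class is guaranteed to win the vote.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Plain k-NN majority vote on 1-D points; train is a list of (x, label)."""
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 20 majority points (label 0), 3 minority points (label 1).
majority = [(x, 0) for x in range(20)]
minority = [(21, 1), (25, 1), (29, 1)]
train = majority + minority

query = 25  # centre of the minority cluster

small_k = knn_predict(train, query, k=3)  # neighbors: the 3 minority points
large_k = knn_predict(train, query, k=7)  # neighbors: 3 minority + 4 majority
print(small_k, large_k)
```

With k = 3 the query is assigned to the minority class, while with k = 7 the four majority neighbors outvote the three minority points, even though no majority observation lies closer than any minority one.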

research
05/26/2014

Stabilized Nearest Neighbor Classifier and Its Statistical Properties

The stability of statistical analysis is an important indicator for repr...
research
07/05/2021

Statistical Theory for Imbalanced Binary Classification

Within the vast body of statistical theory developed for binary classifi...
research
08/19/2020

Neural Neighborhood Encoding for Classification

Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter [Dasgu...
research
12/07/2016

Extend natural neighbor: a novel classification method with self-adaptive neighborhood parameters in different stages

Various kinds of k-nearest neighbor (KNN) based classification methods a...
research
02/08/2019

Nearest Neighbor Classifier based on Generalized Inter-point Distances for HDLSS Data

In high dimension, low sample size (HDLSS) settings, Euclidean distance ...
research
03/17/2022

Nearest Neighbor Classifier with Margin Penalty for Active Learning

As deep learning becomes the mainstream in the field of natural language...
research
04/09/2020

Multiclass Classification via Class-Weighted Nearest Neighbors

We study statistical properties of the k-nearest neighbors algorithm for...
