An Adjusted Nearest Neighbor Algorithm Maximizing the F-Measure from Imbalanced Data

09/02/2019
by   Rémi Viola, et al.

In this paper, we address the challenging problem of learning from imbalanced data using a Nearest-Neighbor (NN) algorithm. In this setting, the minority examples typically belong to the class of interest, which calls for optimizing specific criteria such as the F-Measure. Based on simple geometrical ideas, we introduce an algorithm that reweights the distance between a query sample and any positive training example. This modifies the Voronoi regions and thus the decision boundaries of the NN algorithm. We provide a theoretical justification for the weighting scheme needed to reduce the False Negative rate while controlling the number of False Positives. We perform an extensive experimental study on many public imbalanced datasets, as well as on large-scale non-public data from the French Ministry of Economy and Finance on a tax fraud detection task. The results show that our method is very effective and, interestingly, yields the best performance when combined with state-of-the-art sampling methods.
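
To make the reweighting idea concrete, here is a minimal sketch rather than the authors' implementation. The abstract does not give the exact form of the weighting, so the sketch assumes a single multiplicative factor `gamma` applied to the Euclidean distance between the query and each positive training example; the function name `gamma_knn_predict`, the parameter names, and the 0/1 label encoding are all hypothetical. Under this assumption, `gamma < 1` pulls positive examples closer to every query, which enlarges their Voronoi regions and should reduce false negatives at the cost of some extra false positives.

```python
import math

def gamma_knn_predict(train_points, train_labels, query, gamma=0.5, k=1):
    """Predict 0 or 1 for `query` with a distance-reweighted k-NN vote.

    Assumption (not stated in the abstract): distances to positive
    examples (label 1) are multiplied by `gamma`; distances to negative
    examples (label 0) are left unchanged.
    """
    scaled = []
    for point, label in zip(train_points, train_labels):
        d = math.dist(query, point)  # plain Euclidean distance
        if label == 1:
            d *= gamma               # reweight only positive examples
        scaled.append((d, label))

    # Keep the k training examples with the smallest reweighted distance.
    scaled.sort(key=lambda pair: pair[0])
    top = scaled[:k]

    # Majority vote among the selected neighbors.
    positives = sum(label for _, label in top)
    return 1 if positives > len(top) / 2 else 0


# Toy 1-D data: one negative at 0, one positive at 10, query at 4.
train_points = [(0.0,), (10.0,)]
train_labels = [0, 1]
query = (4.0,)

print(gamma_knn_predict(train_points, train_labels, query, gamma=1.0))  # 0
print(gamma_knn_predict(train_points, train_labels, query, gamma=0.5))  # 1
```

With `gamma=1.0` the rule reduces to standard NN and the query is assigned to the closer negative example (distance 4 versus 6). With `gamma=0.5` the positive example's distance shrinks to 3, so the decision boundary shifts toward the negative example and the query is labeled positive. In practice `gamma` would have to be tuned, for instance on held-out data against the F-Measure.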

research
04/27/2019

Guarantees on Nearest-Neighbor Condensation heuristics

The problem of nearest-neighbor (NN) condensation aims to reduce the siz...
research
11/29/2017

NPC: Neighbors Progressive Competition Algorithm for Classification of Imbalanced Data Sets

Learning from many real-world datasets is limited by a problem called th...
research
01/02/2023

P3DC-Shot: Prior-Driven Discrete Data Calibration for Nearest-Neighbor Few-Shot Classification

Nearest-Neighbor (NN) classification has been proven as a simple and eff...
research
07/08/2019

Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Nearest-neighbor (NN) procedures are well studied and widely used in bot...
research
05/01/2022

Nearest Neighbor Knowledge Distillation for Neural Machine Translation

k-nearest-neighbor machine translation (NN-MT), proposed by Khandelwal e...
research
08/02/2019

On the Merge of k-NN Graph

K-nearest neighbor graph is the fundamental data structure in many disci...
research
04/26/2020

Deep k-NN for Noisy Labels

Modern machine learning models are often trained on examples with noisy ...
