Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier - A Review

08/14/2017
by V. B. Surya Prasath, et al.

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with that of the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question: which of the large number of available distance and similarity measures should be used for the KNN classifier? This review attempts to answer that question by evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real-world datasets, with and without different levels of added noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best on most datasets compared with the other tested distances. In addition, the performance of the KNN degraded by only about 20% when the noise level reached 90%, and this held for all the distances used. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
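The dependence of KNN on the chosen distance can be illustrated with a minimal sketch. It assumes a plain majority-vote KNN and three standard distances (Euclidean, Manhattan, Chebyshev). The function names (`knn_predict`, `accuracy`) and the toy data are hypothetical and are not taken from the review. The non-convex distance the review found best is not reproduced here.

```python
import math
from collections import Counter

# Three standard distance functions; each takes two equal-length numeric sequences.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, distance):
    """Label `query` by majority vote among its k nearest training examples."""
    # Rank training indices by their distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)),
                     key=lambda i: distance(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k, distance):
    """Fraction of test examples whose predicted label matches the true label."""
    correct = sum(
        knn_predict(train_X, train_y, x, k, distance) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_X)

# Toy data invented for illustration only (assumption, not from the review).
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5), (6.5, 7.5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X = [(1.2, 1.4), (6.8, 6.9), (3.0, 3.0)]
test_y = ["a", "b", "a"]

# Evaluate the same classifier under each distance; only the metric changes.
for name, dist in [("euclidean", euclidean),
                   ("manhattan", manhattan),
                   ("chebyshev", chebyshev)]:
    print(name, accuracy(train_X, train_y, test_X, test_y, k=3, distance=dist))
```

Because the distance is passed in as a function, swapping in another measure requires no change to the classifier itself. On data this cleanly separated all three distances agree. The gaps the review reports would only appear on harder, real-world datasets.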


research
02/08/2019

Nearest Neighbor Classifier based on Generalized Inter-point Distances for HDLSS Data

In high dimension, low sample size (HDLSS) settings, Euclidean distance ...
research
03/05/2018

Local Distance Metric Learning for Nearest Neighbor Algorithm

Distance metric learning is a successful way to enhance the performance ...
research
09/22/2017

Intrinsic Metrics: Nearest Neighbor and Edge Squared Distances

Some researchers have proposed using non-Euclidean metrics for clusterin...
research
11/18/2011

Multi-font Multi-size Kannada Numeral Recognition Based on Structural Features

In this paper a fast and novel method is proposed for multi-font multi-s...
research
12/21/2021

Combining Minkowski and Chebyshev: New distance proposal and survey of distance metrics using k-nearest neighbours classifier

This work proposes a distance that combines Minkowski and Chebyshev dist...
research
01/07/2021

Distances with mixed type variables some modified Gower's coefficients

Nearest neighbor methods have become popular in official statistics, mai...
research
01/18/2016

Zero-error dissimilarity based classifiers

We consider general non-Euclidean distance measures between real world o...
