
Benefit of Interpolation in Nearest Neighbor Algorithms

by Yue Xing, et al.

Over-parameterized models have attracted much attention in the era of data science and deep learning. It has been observed empirically that although such models, e.g. deep neural networks, over-fit the training data, they can still achieve small testing error and sometimes even outperform traditional algorithms designed to avoid over-fitting. The major goal of this work is to sharply quantify the benefit of data interpolation in the context of the nearest neighbors (NN) algorithm. Specifically, we consider a class of interpolated weighting schemes and carefully characterize their asymptotic performance. Our analysis reveals a U-shaped performance curve with respect to the level of data interpolation, and proves that a mild degree of data interpolation strictly improves both the prediction accuracy and the statistical stability of the (un-interpolated) optimal kNN algorithm. This theoretically justifies (and predicts) the existence of the second U-shaped curve in the recently discovered double descent phenomenon. Note that our goal in this study is not to promote the use of the interpolated-NN method, but to obtain theoretical insight into data interpolation inspired by the aforementioned phenomenon.
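To make the idea of an interpolated weighting scheme concrete, here is a minimal sketch of distance-weighted kNN regression in which each of the k nearest neighbors receives weight d^(-gamma). The function name, signature, and the specific d^(-gamma) weighting are illustrative assumptions, not the paper's exact scheme: the point is only that for gamma > 0 the weight diverges as a neighbor's distance shrinks, so the predictor reproduces training labels exactly (it interpolates), while gamma = 0 recovers the plain unweighted kNN average.

```python
import numpy as np

def interpolated_knn_predict(X_train, y_train, x, k=5, gamma=1.0):
    """Illustrative interpolated kNN regression (not the paper's exact scheme).

    Weights the i-th of the k nearest neighbors by d_i ** (-gamma), so the
    prediction equals the training label whenever x coincides with a
    training point (gamma > 0). gamma = 0 gives the standard kNN average.
    """
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    idx = np.argsort(d)[:k]                   # indices of the k nearest neighbors
    dk = d[idx]
    if np.any(dk == 0):                       # x is a training point: interpolate its label
        return y_train[idx[dk == 0]][0]
    w = dk ** (-gamma) if gamma > 0 else np.ones(k)
    return np.dot(w, y_train[idx]) / w.sum()
```

Here gamma plays the role of the "level of data interpolation": the paper's U-shaped curve says a mildly positive level can beat the un-interpolated (gamma = 0) optimally tuned kNN, while too aggressive interpolation hurts again.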


Related research:

- Statistical Optimality of Interpolated Nearest Neighbor Algorithms
- Predictive Power of Nearest Neighbors Algorithm under Random Perturbation
- Theoretically Principled Deep RL Acceleration via Nearest Neighbor Function Approximation
- Nearest Neighbor Value Interpolation
- Do Deeper Convolutional Networks Perform Better?
- Multiple Descent: Design Your Own Generalization Curve
- Beyond Occam's Razor in System Identification: Double-Descent when Modeling Dynamics