Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation

by Stanislav Minsker, et al.
University of Southern California

We study the supervised clustering problem under the two-component anisotropic Gaussian mixture model in high dimensions and in the non-asymptotic setting. We first derive a lower bound and a matching upper bound for the minimax risk of clustering in this framework. We also show that, in the high-dimensional regime, the linear discriminant analysis (LDA) classifier is sub-optimal in the minimax sense. Next, we characterize precisely the risk of ℓ_2-regularized supervised least squares classifiers. We deduce that the interpolating solution may outperform the regularized classifier under mild assumptions on the covariance structure of the noise. Our analysis also shows that interpolation can be robust to corruption in the covariance of the noise when the signal is aligned with the "clean" part of the covariance, for a properly defined notion of alignment. To the best of our knowledge, this peculiar phenomenon has not yet been investigated in the rapidly growing literature on interpolation. We conclude that interpolation is not only benign but can also be optimal, and in some cases robust.



Sharp optimal recovery in the Two Component Gaussian Mixture Model

In this paper, we study the problem of clustering in the Two component G...

Sharp optimal recovery in the Two Gaussian Mixture Model

In this paper, we study the non-asymptotic problem of exact recovery in ...

Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures

This paper considers binary classification of high-dimensional features ...

A Large Dimensional Analysis of Regularized Discriminant Analysis Classifiers

This article carries out a large dimensional analysis of standard regula...

Sharp Statistical Guarantees for Adversarially Robust Gaussian Classification

Adversarial robustness has become a fundamental requirement in modern ma...

Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

In this manuscript we consider the problem of generalized linear estimat...

High-dimensional logistic entropy clustering

Minimization of the (regularized) entropy of classification probabilitie...