Are Gaussian data all you need? Extents and limits of universality in high-dimensional generalized linear estimation

02/17/2023
by Luca Pesce, et al.

In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the high-dimensional regime. Motivated by the recent stream of results on the Gaussian universality of the test and training errors in generalized linear estimation, we ask: "when is a single Gaussian enough to characterize the error?" Our formula allows us to give sharp answers to this question, both in the positive and negative directions. More precisely, we show that the sufficient conditions for Gaussian universality (or lack thereof) crucially depend on the alignment between the target weights and the means and covariances of the mixture clusters, which we precisely quantify. In the particular case of least-squares interpolation, we prove a strong universality property of the training error, and show that it follows a simple, closed-form expression. Finally, we apply our results to real datasets, clarifying some recent discussion in the literature about Gaussian universality of the errors in this context.
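To make the question concrete, the following is a minimal numerical sketch, not the paper's method or formula. It compares the training and test errors of ridge regression on Gaussian mixture data against the errors on a single Gaussian with the same mean and covariance. Everything specific here is an assumption made for illustration: the equal-weight mixture with identity cluster covariances, the `tanh` link in the single-index labels, the squared loss, and the helper names `sample_mixture`, `sample_matched_gaussian`, `ridge_errors`, and `run`.

```python
import numpy as np


def sample_mixture(n, means, rng):
    """Draw n points from an equal-weight mixture of unit-covariance Gaussians."""
    k, d = means.shape
    cluster = rng.integers(0, k, size=n)
    return means[cluster] + rng.standard_normal((n, d))


def sample_matched_gaussian(n, means, rng):
    """Draw n points from one Gaussian sharing the mixture's mean and covariance."""
    k, d = means.shape
    mu = means.mean(axis=0)
    centered = means - mu
    # Mixture covariance = within-cluster part (identity) + between-cluster part.
    cov = np.eye(d) + centered.T @ centered / k
    return rng.multivariate_normal(mu, cov, size=n)


def ridge_errors(x_train, y_train, x_test, y_test, lam):
    """Fit ridge regression and return (training MSE, test MSE)."""
    d = x_train.shape[1]
    gram = x_train.T @ x_train + lam * np.eye(d)
    w_hat = np.linalg.solve(gram, x_train.T @ y_train)
    train_err = np.mean((x_train @ w_hat - y_train) ** 2)
    test_err = np.mean((x_test @ w_hat - y_test) ** 2)
    return train_err, test_err


def run(sampler, n, n_test, means, w_star, lam, seed):
    """Sample data, label it with a single-index model, and report ridge errors."""
    rng = np.random.default_rng(seed)
    d = means.shape[1]
    x_train = sampler(n, means, rng)
    x_test = sampler(n_test, means, rng)
    # Single-index labels: y = f(w_star . x / sqrt(d)), with f = tanh assumed here.
    y_train = np.tanh(x_train @ w_star / np.sqrt(d))
    y_test = np.tanh(x_test @ w_star / np.sqrt(d))
    return ridge_errors(x_train, y_train, x_test, y_test, lam)


if __name__ == "__main__":
    d = 200
    setup_rng = np.random.default_rng(0)
    means = setup_rng.standard_normal((2, d))  # two cluster means (illustrative)
    w_star = setup_rng.standard_normal(d)      # target weights (illustrative)
    for name, sampler in [("mixture", sample_mixture),
                          ("matched Gaussian", sample_matched_gaussian)]:
        train_err, test_err = run(sampler, 400, 2000, means, w_star, 0.1, seed=1)
        print(f"{name}: train={train_err:.4f} test={test_err:.4f}")
```

Varying how `w_star` aligns with the cluster means (for example, setting it parallel or orthogonal to their difference) is one way to probe numerically whether the two sets of errors agree. Such a finite-size experiment only suggests, and does not establish, the asymptotic statements in the abstract.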

research
02/17/2023

Universality laws for Gaussian mixtures in generalized linear models

Let (x_i, y_i)_i=1,…,n denote independent samples from a general mixture...
research
05/26/2022

Gaussian Universality of Linear Classifiers with Random Labels in High-Dimension

While classical in many theoretical settings, the assumption of Gaussian...
research
10/25/2022

Interpolating Discriminant Functions in High-Dimensional Gaussian Latent Mixtures

This paper considers binary classification of high-dimensional features ...
research
11/13/2021

Minimax Supervised Clustering in the Anisotropic Gaussian Mixture Model: A new take on Robust Interpolation

We study the supervised clustering problem under the two-component aniso...
research
05/18/2023

High-dimensional Asymptotics of Denoising Autoencoders

We address the problem of denoising data from a Gaussian mixture using a...
research
10/21/2022

A Non-Asymptotic Moreau Envelope Theory for High-Dimensional Generalized Linear Models

We prove a new generalization bound that shows for any class of linear p...
research
12/01/2014

Classification and Reconstruction of High-Dimensional Signals from Low-Dimensional Features in the Presence of Side Information

This paper offers a characterization of fundamental limits on the classi...
