On the Role of Channel Capacity in Learning Gaussian Mixture Models

02/15/2022
by   Elad Romanov, et al.

This paper studies the sample complexity of learning the k unknown centers of a balanced Gaussian mixture model (GMM) in ℝ^d with spherical covariance matrix σ^2𝐈. In particular, we are interested in the following question: what is the maximal noise level σ^2 for which the sample complexity is essentially the same as when estimating the centers from labeled measurements? To that end, we restrict attention to a Bayesian formulation of the problem, where the centers are uniformly distributed on the sphere √(d)𝒮^(d-1). Our main results characterize the exact noise threshold σ^2 below which the GMM learning problem, in the large system limit d,k→∞, is as easy as learning from labeled observations, and above which it is substantially harder. The threshold occurs at (log k)/d = (1/2)log(1 + 1/σ^2), which is the capacity of the additive white Gaussian noise (AWGN) channel. Thinking of the set of k centers as a code, this noise threshold can be interpreted as the largest noise level for which the error probability of the code over the AWGN channel is small. Previous works on the GMM learning problem have identified the minimum distance between the centers as a key parameter in determining the statistical difficulty of learning the corresponding GMM. While our results are only proved for GMMs whose centers are uniformly distributed over the sphere, they hint that it is perhaps the decoding error probability associated with the center constellation as a channel code, rather than just the minimum distance, that determines the statistical difficulty of learning the corresponding GMM.
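The threshold condition can be rearranged in closed form: setting (log k)/d = (1/2)log(1 + 1/σ^2) and exponentiating gives k^(2/d) = 1 + 1/σ^2, i.e. σ^2 = 1/(k^(2/d) − 1). A minimal sketch of this computation (the function name and example values of k and d are illustrative, not from the paper):

```python
import math

def noise_threshold(k: int, d: int) -> float:
    """Largest sigma^2 satisfying (log k)/d = (1/2)*log(1 + 1/sigma^2).

    Exponentiating both sides gives k^(2/d) = 1 + 1/sigma^2,
    hence sigma^2 = 1 / (k^(2/d) - 1).
    """
    return 1.0 / (k ** (2.0 / d) - 1.0)

# Illustrative values: k = 1024 centers in dimension d = 100.
k, d = 1024, 100
sigma2 = noise_threshold(k, d)

# Sanity check: the AWGN capacity at this noise level equals (log k)/d.
capacity = 0.5 * math.log(1.0 + 1.0 / sigma2)
assert abs(capacity - math.log(k) / d) < 1e-9
```

Below this σ^2 the rate (log k)/d sits under the channel capacity, matching the regime in which the paper shows learning is as easy as from labeled data.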


research
04/13/2020

Learning Mixtures of Spherical Gaussians via Fourier Analysis

Suppose that we are given independent, identically distributed samples x...
research
11/21/2017

Parameter Estimation in Gaussian Mixture Models with Malicious Noise, without Balanced Mixing Coefficients

We consider the problem of estimating means of two Gaussians in a 2-Gaus...
research
01/03/2021

Improved Convergence Guarantees for Learning Gaussian Mixture Models by EM and Gradient EM

We consider the problem of estimating the parameters of a Gaussian Mixture ...
research
12/19/2018

Sharp optimal recovery in the Two Component Gaussian Mixture Model

In this paper, we study the problem of clustering in the Two component G...
research
03/15/2020

On Approximation, Bounding & Exact Calculation of Block Error Probability for Random Codes

This paper presents a method to calculate the exact block error probabil...
research
08/07/2016

Statistical Guarantees for Estimating the Centers of a Two-component Gaussian Mixture by EM

Recently, a general method for analyzing the statistical accuracy of the...
research
02/20/2023

Replicable Clustering

In this paper, we design replicable algorithms in the context of statist...
