Small Covers for Near-Zero Sets of Polynomials and Learning Latent Variable Models
Let V be any vector space of multivariate degree-d homogeneous polynomials with codimension at most k, and let S be the set of points where all polynomials in V nearly vanish. We establish a qualitatively optimal upper bound on the size of ϵ-covers for S in the ℓ_2-norm. Roughly speaking, we show that there exists an ϵ-cover for S of cardinality M = (k/ϵ)^{O_d(k^{1/d})}. Our result is constructive, yielding an algorithm to compute such an ϵ-cover that runs in time poly(M). Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables. These include density and parameter estimation for k-mixtures of spherical Gaussians (with known common covariance), PAC learning one-hidden-layer ReLU networks with k hidden units (under the Gaussian distribution), density and parameter estimation for k-mixtures of linear regressions (with Gaussian covariates), and parameter estimation for k-mixtures of hyperplanes. Our algorithms run in time quasi-polynomial in the parameter k, whereas previous algorithms for these problems had running times exponential in k^{Ω(1)}. At a high level, our algorithms for all of these learning problems work as follows: by estimating low-degree moments of the underlying distribution, we find a vector space of polynomials that nearly vanish on the unknown hidden parameters. Our structural result allows us to compute a quasi-polynomial sized cover for this set of hidden parameters, which we then exploit in our learning algorithms.
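To make the first step of this pipeline concrete, here is a minimal Python sketch, not the paper's algorithm, of how low-degree moments can reveal a vector space of polynomials that nearly vanish on the hidden parameters. It treats the simplest instance: degree-1 homogeneous polynomials (linear forms) and a uniform k-mixture of spherical Gaussians N(μ_i, I), using the identity E[xx^T] − I = (1/k) Σ_i μ_i μ_i^T, a PSD matrix of rank at most k whose near-null space is a codimension-≤k space of linear forms nearly vanishing on every mean. The dimension n, sample count m, and all variable names are illustrative assumptions, not from the paper.

```python
# Minimal sketch (not the paper's algorithm): degree-1 case of
# "low-degree moments => polynomials nearly vanishing on hidden parameters".
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 10, 3, 200_000          # ambient dimension, components, samples (illustrative)

# Hidden parameters: k unit-norm means; data is a uniform mixture of N(mu_i, I).
mus = rng.normal(size=(k, n))
mus /= np.linalg.norm(mus, axis=1, keepdims=True)
labels = rng.integers(k, size=m)
x = mus[labels] + rng.normal(size=(m, n))

# Degree-2 moment: E[x x^T] - I = (1/k) * sum_i mu_i mu_i^T, PSD of rank <= k.
M = x.T @ x / m - np.eye(n)

# Eigenvectors with near-zero eigenvalues span a codimension-<=k space of
# linear forms p_v(y) = <v, y> that nearly vanish on every hidden mean.
eigvals, eigvecs = np.linalg.eigh(M)   # eigenvalues in ascending order
null_space = eigvecs[:, : n - k]

# Sanity check: |p_v(mu_i)| is small for every mean and every v in the space,
# shrinking as the sample count m grows.
print(np.abs(mus @ null_space).max())
```

The paper's structural result then enters where this sketch stops: the set S of points on which all polynomials in such a space nearly vanish admits an ϵ-cover of size (k/ϵ)^{O_d(k^{1/d})}, and searching over a cover of the hidden parameters is what yields the quasi-polynomial dependence on k.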