Learning Distributions Generated by One-Layer ReLU Networks

09/04/2019
by   Shanshan Wu, et al.

We consider the problem of estimating the parameters of a d-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm to estimate the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error ϵ||W||_F using Õ(1/ϵ^2) samples and Õ(d^2/ϵ^2) time (log factors are ignored for simplicity). This implies that we can estimate the distribution up to ϵ in total variation distance using Õ(κ^2d^2/ϵ^2) samples, where κ is the condition number of the covariance matrix. Our only assumption is that the bias vector is non-negative. Without this non-negativity assumption, we show that estimating the bias vector within an error ϵ requires a number of samples that is at least exponential in 1/ϵ^2. Our algorithm is based on the key observation that vector norms and pairwise angles can be estimated separately. We use a recent result on learning from truncated samples. We also prove two sample complexity lower bounds: Ω(1/ϵ^2) samples are required to estimate the parameters up to error ϵ, while Ω(d/ϵ^2) samples are necessary to estimate the distribution up to ϵ in total variation distance. The first lower bound implies that our algorithm is optimal for parameter estimation. Finally, we show an interesting connection between learning a two-layer generative model and non-negative matrix factorization. Experimental results are provided to support our analysis.
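The key observation above — that row norms and pairwise angles of W can be estimated separately — can be illustrated with a minimal sketch. The sketch below assumes a zero bias vector for simplicity (the paper's actual algorithm handles general non-negative biases via truncated-sample estimation, which is not shown here), and uses two standard Gaussian facts: for s ~ N(0, σ²), E[max(0, s)] = σ/√(2π), and for two jointly Gaussian coordinates with correlation ρ, P(both positive) = 1/4 + arcsin(ρ)/(2π). The specific matrix W is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth weight matrix (lower triangular, positive diagonal,
# so that Cholesky of W W^T recovers W exactly). Bias is fixed at zero here.
W = np.array([[2.0, 0.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.5, 1.0, 1.5]])
d = W.shape[0]

# Draw i.i.d. samples x = ReLU(W z) with z ~ N(0, I_d).
n = 200_000
Z = rng.standard_normal((n, d))
X = np.maximum(Z @ W.T, 0.0)          # shape (n, d)

# Norms: x_i = max(0, w_i . z) with w_i . z ~ N(0, ||w_i||^2),
# so E[x_i] = ||w_i|| / sqrt(2*pi).
norms_hat = X.mean(axis=0) * np.sqrt(2.0 * np.pi)

# Angles: with zero bias, x_i > 0 iff w_i . z > 0. For correlation
# rho_ij = cos(angle(w_i, w_j)),
#   P(x_i > 0 and x_j > 0) = 1/4 + arcsin(rho_ij) / (2*pi),
# so rho_ij can be read off sign co-occurrence frequencies.
S = (X > 0.0).astype(float)
P = (S.T @ S) / n                      # empirical co-occurrence probabilities
rho_hat = np.sin(2.0 * np.pi * (P - 0.25))

# Combine: Gram matrix G = W W^T has entries ||w_i|| ||w_j|| rho_ij.
G_hat = np.outer(norms_hat, norms_hat) * rho_hat

# Any L with L L^T = G generates the same distribution (z is rotation
# invariant); Cholesky picks the lower-triangular representative.
L = np.linalg.cholesky(G_hat)
```

With n = 200,000 samples, `norms_hat` and `L` match the true row norms and W to within a few percent; the separation into a norm estimate (first moments) and an angle estimate (sign statistics) mirrors the decomposition the abstract describes.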



Code Repositories

densityEstimation

Code for our paper "Learning Distributions Generated by One-Layer ReLU Networks"