Learning Distributions Generated by One-Layer ReLU Networks
We consider the problem of estimating the parameters of a d-dimensional rectified Gaussian distribution from i.i.d. samples. A rectified Gaussian distribution is defined by passing a standard Gaussian distribution through a one-layer ReLU neural network. We give a simple algorithm that estimates the parameters (i.e., the weight matrix and bias vector of the ReLU neural network) up to an error ϵ||W||_F using Õ(1/ϵ^2) samples and Õ(d^2/ϵ^2) time (log factors are ignored for simplicity). This implies that we can estimate the distribution up to ϵ in total variation distance using Õ(κ^2 d^2/ϵ^2) samples, where κ is the condition number of the covariance matrix. Our only assumption is that the bias vector is non-negative. Without this non-negativity assumption, we show that estimating the bias vector within an error ϵ requires a number of samples at least exponential in 1/ϵ^2. Our algorithm builds on the key observation that vector norms and pairwise angles can be estimated separately, combined with a recent result on learning from truncated samples. We also prove two sample complexity lower bounds: Ω(1/ϵ^2) samples are required to estimate the parameters up to error ϵ, while Ω(d/ϵ^2) samples are necessary to estimate the distribution up to ϵ in total variation distance. The first lower bound implies that our algorithm is optimal for parameter estimation. Finally, we show an interesting connection between learning a two-layer generative model and non-negative matrix factorization. Experimental results are provided to support our analysis.
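To illustrate the "norms and angles estimated separately" observation, the following is a minimal numpy sketch, not the paper's full algorithm: it assumes a zero bias vector (so the truncated-sample step for estimating the bias is omitted), and the function names estimate_row_norms and estimate_pairwise_angles are hypothetical. It uses E[max(0, z)] = σ/√(2π) for z ~ N(0, σ²) to recover each row norm of W, and the Gaussian orthant probability P(y_i = 0, y_j = 0) = 1/4 + arcsin(cos θ_ij)/(2π) to recover each pairwise angle.

```python
import numpy as np

# Minimal sketch (simplifying assumption: zero bias): recover row norms and
# pairwise angles of W from samples y = ReLU(W x), x ~ N(0, I_d).

def estimate_row_norms(Y):
    # y_i = max(0, z_i) with z_i ~ N(0, ||w_i||^2), so E[y_i] = ||w_i|| / sqrt(2*pi).
    return np.sqrt(2.0 * np.pi) * Y.mean(axis=0)

def estimate_pairwise_angles(Y):
    # With zero bias, P(y_i = 0, y_j = 0) = 1/4 + arcsin(rho_ij) / (2*pi),
    # where rho_ij = cos(angle between rows w_i and w_j); invert this formula.
    n = Y.shape[0]
    zero = (Y <= 0.0).astype(float)
    p00 = zero.T @ zero / n                        # empirical joint-zero probabilities
    rho = np.clip(np.sin(2.0 * np.pi * (p00 - 0.25)), -1.0, 1.0)
    np.fill_diagonal(rho, 1.0)
    return np.arccos(rho)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k, n = 5, 4, 200_000
    W = rng.normal(size=(k, d))                    # ground-truth weight matrix
    X = rng.normal(size=(n, d))
    Y = np.maximum(X @ W.T, 0.0)                   # i.i.d. rectified Gaussian samples
    print("true norms:", np.linalg.norm(W, axis=1))
    print("est. norms:", estimate_row_norms(Y))
    # estimate_pairwise_angles(Y) similarly approximates the angles between rows of W.
```

With enough samples, the estimated norms and angles converge to those of W; handling a non-negative bias additionally requires the learning-from-truncated-samples step mentioned in the abstract.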