More is Less: Inducing Sparsity via Overparameterization

12/21/2021
by Hung-Hsu Chou, et al.

In deep learning it is common to overparameterize neural networks, that is, to use more parameters than training samples. Quite surprisingly, training such networks via (stochastic) gradient descent leads to models that generalize very well, whereas classical statistics would suggest overfitting. To gain understanding of this implicit bias phenomenon, we study the special case of sparse recovery (compressive sensing), which is of interest in its own right. More precisely, in order to reconstruct a vector from underdetermined linear measurements, we introduce a corresponding overparameterized square loss functional in which the vector to be reconstructed is deeply factorized into several vectors. We show that, under a very mild assumption on the measurement matrix, vanilla gradient flow for the overparameterized loss functional converges to a solution of minimal ℓ_1-norm. The latter is well known to promote sparse solutions. As a by-product, our results significantly improve the sample complexity for compressive sensing obtained in previous works. The theory accurately predicts the recovery rate in numerical experiments. For the proofs, we introduce the concept of solution entropy, which bypasses the obstacles caused by non-convexity and should be of independent interest.
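
The mechanism described in the abstract can be illustrated with a small numerical sketch. The snippet below is a simplified, assumed instance of the idea rather than the paper's exact construction: it reparameterizes the unknown vector as an entrywise power difference u^N - v^N, runs plain gradient descent (a discretization of the gradient flow studied in the paper) on the overparameterized square loss from a small positive initialization, and in this regime typically recovers the sparse vector. The depth N, step size, initialization scale, and problem sizes are illustrative choices, not values prescribed by the paper.

```python
import numpy as np

# Minimal sketch (assumed setup): recover a sparse vector x from
# underdetermined measurements y = A x by overparameterizing
#     x = u^N - v^N   (entrywise powers)
# and running plain gradient descent on 0.5 * ||A x - y||^2.
# A small positive initialization is what drives the implicit bias
# toward solutions of small l1-norm.

rng = np.random.default_rng(0)

n, m, s = 100, 40, 5                 # ambient dim, measurements, sparsity
N = 3                                # depth of the factorization (illustrative)

A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
support = rng.choice(n, s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], s) * (1.0 + rng.random(s))
y = A @ x_true

alpha = 1e-3                         # small initialization scale (assumed)
u = np.full(n, alpha)
v = np.full(n, alpha)
step = 0.02

for _ in range(100_000):
    x = u**N - v**N                  # current reconstruction
    g = A.T @ (A @ x - y)            # gradient of the loss w.r.t. x
    # chain rule: dL/du = N u^(N-1) * g,  dL/dv = -N v^(N-1) * g
    u, v = u - step * N * u**(N - 1) * g, v + step * N * v**(N - 1) * g

x_hat = u**N - v**N
print("relative error :", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
print("l1 norm  (hat) :", np.linalg.norm(x_hat, 1))
print("l1 norm (true) :", np.linalg.norm(x_true, 1))
```

Shrinking the initialization scale strengthens the sparsity bias (at the cost of a longer plateau before the support entries grow), which matches the gradient-flow picture the abstract refers to.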

Related research

02/05/2020  Sample Complexity Bounds for 1-bit Compressive Sensing and Binary Stable Embeddings with Generative Priors
10/11/2021  A global convergence theory for deep ReLU implicit networks via over-parameterization
06/02/2022  Gradient flow dynamics of shallow ReLU networks for square loss and orthogonal inputs
05/07/2020  Compressive sensing with un-trained neural networks: Gradient descent finds the smoothest approximation
09/12/2018  Fast Signal Recovery from Saturated Measurements by Linear Loss and Nonconvex Penalties
11/18/2019  Implicit Regularization of Normalization Methods
