When is a Convolutional Filter Easy To Learn?

09/18/2017
by Simon S. Du et al.

We analyze the convergence of the (stochastic) gradient descent algorithm for learning a convolutional filter with the Rectified Linear Unit (ReLU) activation function. Our analysis does not rely on any specific form of the input distribution, and our proofs use only the definition of ReLU, in contrast with previous works that are restricted to standard Gaussian inputs. We show that (stochastic) gradient descent with random initialization can learn the convolutional filter in polynomial time, with a convergence rate that depends on the smoothness of the input distribution and the closeness of patches. To the best of our knowledge, this is the first recovery guarantee for gradient-based algorithms learning a convolutional filter on non-Gaussian input distributions. Our theory also justifies the two-stage learning-rate strategy used in training deep neural networks. While our focus is theoretical, we also present experiments that illustrate our theoretical findings.
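The setting above can be made concrete with a small numerical sketch. The snippet below is not the authors' code; it is a minimal illustration, under assumptions of our own (Gaussian patches chosen purely for convenience, a noiseless teacher filter w_star, squared loss, and made-up learning rates, iteration counts, and dimensions), of SGD with random initialization learning a single ReLU convolutional filter, including the two-stage learning-rate strategy mentioned in the abstract. Note the paper's analysis covers more general, non-Gaussian input distributions.

```python
# Minimal sketch: SGD learning a single convolutional filter w, where the
# label is the average ReLU response of a ground-truth filter w_star on
# each patch of the input. All names and constants here are illustrative
# assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

patch_dim, num_patches = 8, 4              # filter size; patches per input
w_star = rng.standard_normal(patch_dim)    # ground-truth (teacher) filter
w = 0.1 * rng.standard_normal(patch_dim)   # random initialization

def predict(w, patches):
    # Convolutional prediction: ReLU of the filter response on each patch,
    # average-pooled over patches.
    return np.mean(np.maximum(patches @ w, 0.0))

def sgd_step(w, patches, y, lr):
    # (Sub)gradient of the squared loss 0.5 * (predict(w) - y)^2; the ReLU
    # derivative is the indicator of a positive pre-activation.
    pre = patches @ w
    grad = (predict(w, patches) - y) * (patches.T @ ((pre > 0) / num_patches))
    return w - lr * grad

# Two-stage learning-rate strategy: a larger rate early, then a smaller one
# for the final refinement (the switch point and rates are made up here).
for t in range(30000):
    patches = rng.standard_normal((num_patches, patch_dim))  # one fresh sample
    y = predict(w_star, patches)                             # noiseless label
    lr = 0.1 if t < 10000 else 0.01
    w = sgd_step(w, patches, y, lr)

print("relative error:", np.linalg.norm(w - w_star) / np.linalg.norm(w_star))
```

In a toy run like this, the iterate typically gets close to w_star during the large-learning-rate phase and the error then shrinks further after the rate is decreased, mirroring the two-stage behavior the paper analyzes.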


Related research

06/20/2018  Learning One-hidden-layer ReLU Networks via Gradient Descent
We study the problem of learning one-hidden-layer neural networks with R...

02/26/2017  Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs
Deep learning models are often successfully trained using gradient desce...

01/10/2023  Semiparametric Regression for Spatial Data via Deep Learning
In this work, we propose a deep learning-based method to perform semipar...

08/24/2021  The staircase property: How hierarchical structure can guide deep learning
This paper identifies a structural property of data distributions that e...

05/09/2021  Directional Convergence Analysis under Spherically Symmetric Distribution
We consider the fundamental problem of learning linear predictors (i.e.,...

06/05/2022  Demystifying the Global Convergence Puzzle of Learning Over-parameterized ReLU Nets in Very High Dimensions
This theoretical paper is devoted to developing a rigorous theory for de...

03/28/2017  Unifying the Stochastic Spectral Descent for Restricted Boltzmann Machines with Bernoulli or Gaussian Inputs
Stochastic gradient descent based algorithms are typically used as the g...
