CNNs are Globally Optimal Given Multi-Layer Support

12/07/2017
by   Chen Huang, et al.

Stochastic Gradient Descent (SGD) is the central workhorse for training modern CNNs. Although it gives impressive empirical performance, it can be slow to converge. In this paper we explore a novel strategy for training a CNN using an alternation scheme that offers substantial speedups during training. We make the following contributions: (i) we replace the ReLU non-linearity within a CNN with positive hard-thresholding, (ii) we reinterpret this non-linearity as a binary state vector, making the entire CNN linear once the multi-layer support is known, and (iii) we demonstrate that, under certain conditions, a global optimum of the CNN can be found through local descent. We then employ a novel alternation strategy (between weights and support) for CNN training that leads to substantially faster convergence, has attractive theoretical properties, and achieves state-of-the-art results on large-scale datasets (e.g. ImageNet) as well as other standard benchmarks.
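The core idea, as described in the abstract, is that once the binary support of the hard-thresholding non-linearities is frozen, the network's forward map is linear in each layer's weights, so weight updates become much better behaved. Below is a minimal sketch of that alternation on a toy two-layer network; it is not the authors' implementation, and the network sizes, threshold `tau`, learning rate, and iteration counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.0  # assumed threshold of the positive hard-thresholding non-linearity

def hard_threshold(z, tau):
    """Positive hard-thresholding: keep z where z > tau, zero elsewhere.
    Returns the activation and its binary support mask."""
    support = (z > tau).astype(z.dtype)
    return z * support, support

# Toy regression data (illustrative only)
X = rng.standard_normal((128, 16))
Y = rng.standard_normal((128, 4))

W1 = 0.1 * rng.standard_normal((16, 32))
W2 = 0.1 * rng.standard_normal((32, 4))
lr = 1e-2

for outer in range(20):
    # --- Support step: one forward pass fixes the binary support masks. ---
    Z1 = X @ W1
    _, S1 = hard_threshold(Z1, tau)

    # --- Weight step: with S1 frozen, the output ((X @ W1) * S1) @ W2 is
    # linear in each weight matrix, so plain gradient steps act on a
    # fixed (piecewise-)linear model rather than a shifting non-linearity.
    for inner in range(10):
        A1 = (X @ W1) * S1              # masked pre-activations, support frozen
        pred = A1 @ W2
        err = pred - Y                   # squared-error residual
        grad_W2 = A1.T @ err / len(X)
        grad_W1 = X.T @ ((err @ W2.T) * S1) / len(X)
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1

    loss = 0.5 * np.mean((((X @ W1) * S1) @ W2 - Y) ** 2)
    print(f"outer {outer:02d}  loss {loss:.4f}")
```

The sketch only illustrates the alternating structure (recompute support, then descend on weights with support held fixed); the paper's actual optimality conditions and training procedure are given in the full text.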
