Make Deep Networks Shallow Again

09/15/2023
by Bernhard Bermeitinger, et al.

Deep neural networks have an excellent track record and are therefore widely viewed as the architecture of choice for complex applications. For a long time, their main shortcoming was the vanishing gradient, which prevented numerical optimization algorithms from converging acceptably. A breakthrough came with residual connections: an identity mapping running in parallel to a conventional layer. This concept applies to stacks of layers of the same dimension and substantially alleviates the vanishing-gradient problem. A stack of residual blocks can be expressed as an expansion of terms similar to a Taylor expansion. This expansion suggests truncating the higher-order terms, which yields an architecture consisting of a single broad layer composed of all the initially stacked layers in parallel. In other words, the sequential deep architecture is replaced by a parallel shallow one. Prompted by this observation, we investigated the performance of the parallel architecture in comparison to the sequential one. We trained both architectures on the computer-vision datasets MNIST and CIFAR-10 for a total of 6912 combinations of the number of convolutional layers, the number of filters, the kernel size, and other meta-parameters. Our findings show a surprising equivalence between the deep (sequential) and shallow (parallel) architectures: both layouts reach similar training and validation losses. This implies that a wide, shallow architecture can replace a deep network without sacrificing performance. Such a substitution has the potential to simplify network architectures, improve optimization efficiency, and accelerate training.
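
To make the truncation concrete: a stack of L residual blocks computes y = (I + F_L) ∘ … ∘ (I + F_1)(x); expanding the composition and dropping every term in which two or more of the F_i are chained leaves y ≈ x + F_1(x) + … + F_L(x), i.e. all blocks applied to the same input in parallel. The sketch below illustrates the two layouts. It is not taken from the paper; it assumes PyTorch, and the block definition (a 3x3 convolution followed by ReLU) and the class names are illustrative assumptions.

```python
# Minimal sketch (not the authors' code), assuming PyTorch.
# Channel count, kernel size, and depth are arbitrary illustrative choices.
import torch
import torch.nn as nn


def conv_block(channels, kernel_size=3):
    """One conventional layer F_i: a same-width convolution plus nonlinearity."""
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2),
        nn.ReLU(),
    )


class DeepResidualStack(nn.Module):
    """Sequential (deep) layout: y = (I + F_L) o ... o (I + F_1)(x)."""

    def __init__(self, channels, depth):
        super().__init__()
        self.blocks = nn.ModuleList([conv_block(channels) for _ in range(depth)])

    def forward(self, x):
        for f in self.blocks:
            x = x + f(x)  # residual connection around each layer
        return x


class ShallowParallelStack(nn.Module):
    """Parallel (shallow) layout: y = x + F_1(x) + ... + F_L(x),
    i.e. the expansion truncated after the first-order terms."""

    def __init__(self, channels, depth):
        super().__init__()
        self.blocks = nn.ModuleList([conv_block(channels) for _ in range(depth)])

    def forward(self, x):
        return x + sum(f(x) for f in self.blocks)


if __name__ == "__main__":
    x = torch.randn(8, 16, 32, 32)  # a batch of 16-channel feature maps
    deep = DeepResidualStack(channels=16, depth=4)
    shallow = ShallowParallelStack(channels=16, depth=4)
    print(deep(x).shape, shallow(x).shape)  # both: torch.Size([8, 16, 32, 32])
```

Both modules keep the input dimension unchanged, so they are drop-in replacements for one another; the paper's comparison varies exactly the kind of meta-parameters exposed here (depth, number of filters, kernel size).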

Related research

11/22/2015 · Gradual DropIn of Layers to Train Very Deep Neural Networks
We introduce the concept of dynamically growing a neural network during ...

12/11/2018 · Layer-Parallel Training of Deep Residual Neural Networks
Residual neural networks (ResNets) are a promising class of deep neural ...

06/16/2021 · Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary Framework
Enforcing orthogonality in neural networks is an antidote for gradient v...

05/23/2017 · Input Fast-Forwarding for Better Deep Learning
This paper introduces a new architectural framework, known as input fast...

05/24/2016 · FractalNet: Ultra-Deep Neural Networks without Residuals
We introduce a design strategy for neural network macro-architecture bas...

05/20/2016 · Residual Networks Behave Like Ensembles of Relatively Shallow Networks
In this work we propose a novel interpretation of residual networks show...

10/06/2018 · Co-Stack Residual Affinity Networks with Multi-level Attention Refinement for Matching Text Sequences
Learning a matching function between two text sequences is a long standi...
