GradNets: Dynamic Interpolation Between Neural Architectures

11/21/2015
by Diogo Almeida, et al.

In machine learning, there is a fundamental trade-off between ease of optimization and expressive power. Neural Networks, in particular, have enormous expressive power and yet are notoriously challenging to train. The nature of that optimization challenge changes over the course of learning. Traditionally in deep learning, one makes a static trade-off between the needs of early and late optimization. In this paper, we investigate a novel framework, GradNets, for dynamically adapting architectures during training to get the benefits of both. For example, we can gradually transition from linear to non-linear networks, deterministic to stochastic computation, shallow to deep architectures, or even simple downsampling to fully differentiable attention mechanisms. Benefits include increased accuracy, easier convergence with more complex architectures, solutions to test-time execution of batch normalization, and the ability to train networks of up to 200 layers.
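As a concrete illustration of the interpolation idea described above, here is a minimal sketch, not the authors' implementation, of one such transition: blending a simple linear unit with a ReLU non-linearity via a gate g that ramps from 0 to 1 over training. The `GradReLU` name, the linear ramp schedule, and the identity/ReLU pair are illustrative assumptions.

```python
# Minimal sketch of a GradNet-style transition (illustrative, not the paper's code):
# blend an easy-to-optimize module (identity) with a more expressive one (ReLU)
# using a gate g that ramps linearly from 0 to 1 over training steps.
import torch
import torch.nn as nn


class GradReLU(nn.Module):
    """Interpolates between identity (linear) and ReLU as training progresses."""

    def __init__(self, ramp_steps: int = 10_000):
        super().__init__()
        self.ramp_steps = ramp_steps
        # Track training progress as a buffer so it moves with the module.
        self.register_buffer("step", torch.zeros((), dtype=torch.long))

    def gate(self) -> float:
        # g = 0 means purely linear; g = 1 means purely non-linear.
        return min(self.step.item() / self.ramp_steps, 1.0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        g = self.gate()
        if self.training:
            self.step += 1
        return g * torch.relu(x) + (1.0 - g) * x


if __name__ == "__main__":
    layer = GradReLU(ramp_steps=100)
    x = torch.randn(4, 8)
    for _ in range(150):
        _ = layer(x)  # gate ramps toward 1 as training proceeds
    print("gate after training:", layer.gate())
```

The same gating pattern could, under the framework in the abstract, be applied to the other transitions mentioned, for example blending a deterministic layer with a stochastic one, or a simple downsampling operator with an attention mechanism.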


