Bidirectional Self-Normalizing Neural Networks

by Yao Lu, et al.

The problem of exploding and vanishing gradients has been a long-standing obstacle to the effective training of neural networks. Despite the various tricks and techniques employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing, under mild conditions, that the exploding/vanishing gradient problem disappears with high probability if the neural network has sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, together with orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
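The orthogonal-weight half of this construction can be illustrated with a short sketch. The code below (an illustrative example, not the authors' implementation; the helper `random_orthogonal` is our own naming) draws a random orthogonal matrix via QR decomposition and checks that it preserves Euclidean norms in both the forward direction (`W @ x`) and the backward direction (`W.T @ g`), which is why stacking such linear layers neither amplifies nor attenuates signals or gradients.

```python
import numpy as np

def random_orthogonal(n, rng):
    # QR decomposition of a Gaussian matrix gives an orthogonal Q;
    # multiplying by the signs of R's diagonal makes the distribution
    # uniform (Haar) over the orthogonal group.
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))

rng = np.random.default_rng(0)
W = random_orthogonal(512, rng)

x = rng.standard_normal(512)   # forward signal
g = rng.standard_normal(512)   # backward (gradient) signal

# Forward pass: an orthogonal layer preserves the signal norm.
print(np.allclose(np.linalg.norm(W @ x), np.linalg.norm(x)))

# Backward pass: W.T is also orthogonal, so gradient norms are
# preserved through the linear part of the layer as well.
print(np.allclose(np.linalg.norm(W.T @ g), np.linalg.norm(g)))
```

Norm preservation through the linear layers is only half the story: the paper's Gaussian-Poincaré normalized activation functions are what control how the nonlinearity rescales signal and gradient moments between these orthogonal maps.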




Training of Deep Neural Networks based on Distance Measures using RMSProp

The vanishing gradient problem was a major obstacle for the success of d...

Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths

We consider deep neural networks with a Lipschitz continuous activation ...

Fixed points of arbitrarily deep 1-dimensional neural networks

In this paper, we introduce a new class of functions on ℝ that is closed...

Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design

Deep Neural Networks (DNNs) training can be difficult due to vanishing a...

Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues

Deep neural networks employing error back-propagation for learning can s...

Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks

Deep learning requires several design choices, such as the nodes' activa...

Interpolated Adjoint Method for Neural ODEs

In this paper, we propose a method, which allows us to alleviate or comp...
