Bidirectional Self-Normalizing Neural Networks

06/22/2020
by   Yao Lu, et al.

The problem of exploding and vanishing gradients has been a long-standing obstacle that hinders the effective training of neural networks. Despite the various tricks and techniques employed to alleviate the problem in practice, satisfactory theories and provable solutions are still lacking. In this paper, we address the problem from the perspective of high-dimensional probability theory. We provide a rigorous result showing, under mild conditions, how the exploding/vanishing gradient problem disappears with high probability if the neural networks have sufficient width. Our main idea is to constrain both forward and backward signal propagation in a nonlinear neural network through a new class of activation functions, namely Gaussian-Poincaré normalized functions, and orthogonal weight matrices. Experiments on both synthetic and real-world data validate our theory and confirm its effectiveness on very deep neural networks when applied in practice.
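To illustrate the two ingredients named in the abstract, here is a minimal NumPy sketch (not the authors' code). It assumes the Gaussian-Poincaré normalization condition is E[f(X)^2] = E[f'(X)^2] = 1 for X ~ N(0, 1), enforces it for tanh via an affine rescaling a·f(x) + b whose coefficients are estimated by Monte Carlo, and pairs it with QR-based random orthogonal weights; the helper names gp_normalize and orthogonal are ours, introduced only for this example.

```python
import numpy as np

# --- Gaussian-Poincare (GP) normalization of an activation function ---
# Assumption (our reading of the paper): f is GP-normalized when, for
# X ~ N(0, 1), E[f(X)^2] = 1 and E[f'(X)^2] = 1.  A smooth nonlinearity can
# be brought into this form with an affine map g(x) = a * f(x) + b.

def gp_normalize(f, df, n_samples=1_000_000, seed=0):
    """Return (a, b) such that g(x) = a*f(x) + b is GP-normalized (Monte Carlo)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples)
    fx, dfx = f(x), df(x)

    m1 = fx.mean()           # E[f(X)]
    m2 = (fx ** 2).mean()    # E[f(X)^2]
    d2 = (dfx ** 2).mean()   # E[f'(X)^2]

    # E[g'(X)^2] = a^2 * E[f'(X)^2] = 1  =>  a = 1 / sqrt(E[f'(X)^2])
    a = 1.0 / np.sqrt(d2)

    # E[g(X)^2] = a^2*m2 + 2*a*b*m1 + b^2 = 1  =>  quadratic in b.
    # The Gaussian-Poincare inequality Var[f(X)] <= E[f'(X)^2] makes the
    # discriminant below non-negative, so a real solution for b exists.
    disc = (a * m1) ** 2 - (a ** 2 * m2 - 1.0)
    b = -a * m1 + np.sqrt(disc)
    return a, b


# --- Random orthogonal weights (standard QR-based construction) ---
def orthogonal(n, seed=0):
    """Random n x n orthogonal matrix from the QR decomposition of a Gaussian."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs for a uniform distribution


if __name__ == "__main__":
    # Example: GP-normalize tanh, then check that the mean squared activation
    # stays near 1 across a deep stack of layers with orthogonal weights.
    a, b = gp_normalize(np.tanh, lambda x: 1.0 - np.tanh(x) ** 2)
    act = lambda x: a * np.tanh(x) + b

    width, depth = 512, 100
    h = np.random.default_rng(1).standard_normal(width)
    for layer in range(depth):
        h = act(orthogonal(width, seed=layer) @ h)
    print("mean squared activation after", depth, "layers:", np.mean(h ** 2))
```

Under these assumptions the printed second moment should stay close to 1 even at depth 100, which is the informal content of the abstract's claim that forward signal propagation does not explode or vanish; the analogous statement for gradients is what the paper proves rigorously.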


Related research

08/06/2017 · Training of Deep Neural Networks based on Distance Measures using RMSProp
The vanishing gradient problem was a major obstacle for the success of d...

06/02/2023 · Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths
We consider deep neural networks with a Lipschitz continuous activation ...

03/22/2023 · Fixed points of arbitrarily deep 1-dimensional neural networks
In this paper, we introduce a new class of functions on ℝ that is closed...

05/27/2021 · Hamiltonian Deep Neural Networks Guaranteeing Non-vanishing Gradients by Design
Deep Neural Networks (DNNs) training can be difficult due to vanishing a...

09/07/2023 · Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues
Deep neural networks employing error back-propagation for learning can s...

06/04/2021 · Regularization and Reparameterization Avoid Vanishing Gradients in Sigmoid-Type Networks
Deep learning requires several design choices, such as the nodes' activa...

03/11/2020 · Interpolated Adjoint Method for Neural ODEs
In this paper, we propose a method, which allows us to alleviate or comp...
