Stable Architectures for Deep Neural Networks

05/09/2017
by Eldad Haber, et al.

Deep neural networks have become invaluable tools for supervised machine learning, e.g., classification of text or images. While they often offer superior results over traditional techniques and successfully express complicated patterns in data, deep architectures are known to be challenging to design and train such that they generalize well to new data. An important issue with deep architectures is numerical instability in derivative-based learning algorithms, commonly called exploding or vanishing gradients. In this paper, we propose new forward propagation techniques inspired by systems of ordinary differential equations (ODEs) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is our interpretation of deep learning as a parameter estimation problem for nonlinear dynamical systems. Given this formulation, we analyze the stability and well-posedness of deep learning and use this new understanding to develop new network architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing deep learning for very deep networks. While our new architectures restrict the solution space, several numerical experiments show their competitiveness with state-of-the-art networks.
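
The ODE view of forward propagation can be made concrete with a minimal sketch. It assumes, since the abstract gives no formulas, that a residual-style layer updates features as y_{j+1} = y_j + h * tanh(K_j y_j + b_j), which is a forward Euler step of dy/dt = tanh(K(t) y + b(t)). The antisymmetric helper is one illustrative way to restrict the weights so that the eigenvalues of K have a prescribed non-positive real part. It is not necessarily the exact construction used in the paper. All names here (forward_euler_network, antisymmetric, h, gamma) are hypothetical.

```python
import numpy as np


def forward_euler_network(y0, weights, biases, h):
    """Propagate features through layers y_{j+1} = y_j + h * tanh(K_j y_j + b_j).

    Assumed layer form: each layer is one forward Euler step of size h
    for the ODE dy/dt = tanh(K(t) y + b(t)).
    """
    y = np.asarray(y0, dtype=float)
    for K, b in zip(weights, biases):
        y = y + h * np.tanh(K @ y + b)
    return y


def antisymmetric(A, gamma=0.0):
    """Return 0.5 * (A - A^T) - gamma * I.

    The antisymmetric part has purely imaginary eigenvalues, so every
    eigenvalue of the result has real part -gamma (illustrative constraint).
    """
    A = np.asarray(A, dtype=float)
    return 0.5 * (A - A.T) - gamma * np.eye(A.shape[0])


# Hypothetical setup: 4 features, 200 layers sharing one weight matrix.
rng = np.random.default_rng(0)
n, depth, h = 4, 200, 0.1
y0 = rng.standard_normal(n)
A = rng.standard_normal((n, n))
b = np.zeros(n)

K_stable = antisymmetric(A, gamma=0.1)
y_unconstrained = forward_euler_network(y0, [A] * depth, [b] * depth, h)
y_constrained = forward_euler_network(y0, [K_stable] * depth, [b] * depth, h)

# Real parts of the eigenvalues: arbitrary for A, all equal to -gamma for K_stable.
print(np.linalg.eigvals(A).real)
print(np.linalg.eigvals(K_stable).real)
```

In this sketch the constraint only controls the spectrum of the weight matrix. Whether the discrete propagation is stable also depends on the step size h, which is the link between gradient behavior and the stability of the discrete ODE that the abstract refers to.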

