Toward Equation of Motion for Deep Neural Networks: Continuous-time Gradient Descent and Discretization Error Analysis

10/28/2022
by Taiki Miyagawa, et al.

We derive and solve an “Equation of Motion” (EoM) for deep neural networks (DNNs): a differential equation that precisely describes their discrete learning dynamics. Although differential equations are continuous, they have played a prominent role even in the study of discrete optimization, i.e., gradient descent (GD) algorithms. However, gaps remain between such differential equations and the actual learning dynamics of DNNs because of discretization error. In this paper, we start from gradient flow (GF) and derive a counter term that cancels the discretization error between GF and GD. As a result, we obtain EoM, a continuous differential equation that precisely describes the discrete learning dynamics of GD. We also derive the discretization error of EoM to show to what extent EoM is precise. In addition, we apply EoM to two specific cases: scale- and translation-invariant layers. EoM highlights differences between continuous-time and discrete-time GD, indicating the importance of the counter term for a better description of GD's discrete learning dynamics. Our experimental results support our theoretical findings.
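For intuition, here is a minimal worked sketch of the kind of counter term the abstract refers to, using generic symbols (loss L, parameters θ, learning rate η) rather than the paper's notation; the displayed correction is only the standard leading-order term from backward error analysis, whereas the paper's EoM is designed to cancel the discretization error more precisely.

\[
\theta_{k+1} = \theta_k - \eta\,\nabla L(\theta_k) \qquad \text{(discrete GD update)}
\]
\[
\dot{\theta}(t) = -\nabla L(\theta(t)) \qquad \text{(gradient flow; deviates from one GD step at order } \eta^2\text{)}
\]
\[
\dot{\theta}(t) = -\nabla\!\Big( L(\theta(t)) + \tfrac{\eta}{4}\,\big\|\nabla L(\theta(t))\big\|^2 \Big) \qquad \text{(GF plus a first-order counter term; deviates at order } \eta^3\text{)}
\]

In other words, adding a counter term makes the continuous trajectory track the discrete GD iterates one order more closely in the learning rate; the paper's contribution is to carry this cancellation out systematically and to quantify the error that remains.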


Related research

- Stability of stochastic impulsive differential equations: integrating the cyber and the physical of stochastic systems (04/29/2019)
  According to Newton's second law of motion, we humans describe a dynamic...

- Stochastic Gradient Descent in Continuous Time (11/17/2016)
  Stochastic gradient descent in continuous time (SGDCT) provides a comput...

- Last-Iterate Convergence of Saddle Point Optimizers via High-Resolution Differential Equations (12/27/2021)
  Several widely-used first-order saddle point optimization methods yield ...

- Learning the parameters of a differential equation from its trajectory via the adjoint equation (06/17/2022)
  The paper contributes to strengthening the relation between machine lear...

- Time discretization of a nonlocal phase-field system with inertial term (02/01/2021)
  Time discretizations of phase-field systems have been studied. For examp...

- Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs (05/12/2022)
  The optimization with orthogonality has been shown useful in training de...

- Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules (06/03/2022)
  Neural ordinary differential equations (ODEs) have attracted much attent...
