Zero Stability Well Predicts Performance of Convolutional Neural Networks

06/27/2022
by Liangming Chen, et al.

The question of which convolutional neural network (CNN) structures perform well is fascinating. In this work, we take one step toward the answer by connecting zero stability and model performance. Specifically, we find that if a discrete solver of an ordinary differential equation is zero stable, the CNN corresponding to that solver performs well. We first give an interpretation of zero stability in the context of deep learning and then investigate the performance of existing first- and second-order CNNs under different zero-stability conditions. Based on these preliminary observations, we provide a higher-order discretization for constructing CNNs and propose a zero-stable network (ZeroSNet). To guarantee the zero stability of ZeroSNet, we first deduce a structure that satisfies the consistency conditions and then derive a zero-stable region for a training-free parameter. By analyzing the roots of a characteristic equation, we theoretically obtain the optimal coefficients for combining feature maps. Empirically, we present results from three aspects: extensive experiments at different depths and on different datasets show that the moduli of the characteristic equation's roots are key to the performance of CNNs that require historical features; ZeroSNet outperforms existing CNNs based on high-order discretization; and ZeroSNets show better robustness against input noise. The source code is available at <https://github.com/LongJin-lab/ZeroSNet>.
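For readers who want to see what the root condition in the abstract amounts to, below is a minimal sketch (not the authors' released code; see the repository linked above for that). It assumes an illustrative third-order block of the form x_{n+1} = a_0 x_n + a_1 x_{n-1} + a_2 x_{n-2} + f(x_n), whose first characteristic polynomial is rho(z) = z^3 - a_0 z^2 - a_1 z - a_2; the coefficient names and this specific block form are assumptions for illustration, not necessarily the exact ZeroSNet formulation.

```python
# Minimal sketch (assumed block form, not the authors' code): a linear
# multistep-style update
#     x_{n+1} = a0*x_n + a1*x_{n-1} + a2*x_{n-2} + f(x_n)
# is zero stable iff every root of its first characteristic polynomial
#     rho(z) = z^3 - a0*z^2 - a1*z - a2
# has modulus <= 1, and any root of modulus exactly 1 is simple.

import numpy as np

def is_zero_stable(coeffs, tol=1e-9):
    """coeffs = (a0, a1, a2); returns True if the root condition holds."""
    a0, a1, a2 = coeffs
    roots = np.roots([1.0, -a0, -a1, -a2])
    moduli = np.abs(roots)
    if np.any(moduli > 1.0 + tol):
        return False  # a root lies outside the unit circle: unstable
    # Roots on the unit circle must be simple (no repeated roots there).
    on_circle = roots[np.isclose(moduli, 1.0, atol=tol)]
    for i, r in enumerate(on_circle):
        for s in on_circle[i + 1:]:
            if np.isclose(r, s, atol=1e-6):
                return False
    return True

# rho(z) = z^2 (z - 1): roots 0 (double, inside) and 1 (simple) -> stable.
print(is_zero_stable((1.0, 0.0, 0.0)))    # True
# Coefficients sum to 1 (consistent), but one root is (1 + sqrt(3))/2 > 1.
print(is_zero_stable((2.0, -0.5, -0.5)))  # False
```

Under the consistency condition rho(1) = 0 (i.e., a_0 + a_1 + a_2 = 1 in this sketch), the abstract's claim is that coefficients chosen inside the zero-stable region yield CNNs that perform well, which is what such a check would screen for.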

