A stochastic optimization approach to train non-linear neural networks with a higher-order variation regularization

08/04/2023
by Akifumi Okuno, et al.

While highly expressive parametric models such as deep neural networks are well suited to modeling complicated concepts, training such highly non-linear models carries a high risk of overfitting. To address this issue, this study considers a (k,q)-th order variation regularization ((k,q)-VR), defined as the integral of the q-th power of the absolute k-th order derivative of the model to be trained; penalizing the (k,q)-VR encourages a smoother trained function and is therefore expected to mitigate overfitting. In particular, the (k,q)-VR with q=1 reduces to the conventional (general-order) total variation. Although the (k,q)-VR term is computationally intractable for general parametric models because of the integral, this study provides a stochastic optimization algorithm that efficiently trains general models with the (k,q)-VR penalty without explicit numerical integration. The proposed approach applies even to deep neural networks of arbitrary architecture, as it requires only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that neural networks trained with the (k,q)-VR terms are more "resilient" than those trained with conventional parameter regularization. The proposed algorithm also extends to the training of physics-informed neural networks (PINNs).
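A minimal sketch of the idea (not the authors' implementation): the (k,q)-VR term, R_{k,q}(f) = ∫ |f^{(k)}(x)|^q dx, is estimated by Monte Carlo sampling of input points, the k-th derivative is obtained by repeated automatic differentiation, and the resulting penalty is added to the data-fitting loss inside an ordinary SGD loop. The small MLP, the uniform sampling domain [0, 1], and the hyperparameter values below are illustrative assumptions, not taken from the paper.

```python
import jax
import jax.numpy as jnp

def init_mlp(key, widths=(1, 32, 32, 1)):
    # simple fully connected network with tanh activations
    params = []
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (w_in, w_out)) / jnp.sqrt(w_in),
                       jnp.zeros(w_out)))
    return params

def mlp(params, x):
    h = jnp.atleast_1d(x)
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return (h @ W + b)[0]                 # scalar output f(x)

def kth_deriv(f, k):
    # k-th derivative of a scalar-to-scalar function via repeated autodiff
    for _ in range(k):
        f = jax.grad(f)
    return f

def loss(params, x, y, key, k=2, q=1, lam=1e-3, n_mc=64):
    # data-fitting term (mean squared error)
    preds = jax.vmap(lambda xi: mlp(params, xi))(x)
    mse = jnp.mean((preds - y) ** 2)
    # Monte Carlo estimate of the (k,q)-VR term: \int_0^1 |f^{(k)}(t)|^q dt,
    # approximated by averaging over uniformly sampled points t
    ts = jax.random.uniform(key, (n_mc,))
    dks = jax.vmap(kth_deriv(lambda t: mlp(params, t), k))(ts)
    vr = jnp.mean(jnp.abs(dks) ** q)
    return mse + lam * vr

@jax.jit
def sgd_step(params, x, y, key, lr=1e-2):
    # one plain SGD update on the regularized objective
    grads = jax.grad(loss)(params, x, y, key)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# usage sketch: regression on toy data with fresh sample points each step
key = jax.random.PRNGKey(0)
params = init_mlp(key)
x = jnp.linspace(0.0, 1.0, 128)
y = jnp.sin(2 * jnp.pi * x)
for step in range(1000):
    key, sub = jax.random.split(key)
    params = sgd_step(params, x, y, sub)
```

Because fresh sample points are drawn at every step, the penalty gradient is an unbiased stochastic estimate of the gradient of the integral, which is what lets the regularizer be used without explicit numerical integration.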


