Lipschitzness Effect of a Loss Function on Generalization Performance of Deep Neural Networks Trained by Adam and AdamW Optimizers

03/29/2023
by Mohammad Lashkari, et al.

The generalization performance of deep neural networks with respect to the optimization algorithm is a major concern in machine learning, and it can be affected by various factors. In this paper, we theoretically prove that the Lipschitz constant of the loss function is an important factor in reducing the generalization error of the output model obtained by Adam or AdamW. The result can serve as a guideline for choosing the loss function when the optimization algorithm is Adam or AdamW. In addition, to evaluate the theoretical bound in a practical setting, we consider the human age estimation problem in computer vision. To assess generalization more rigorously, the training and test datasets are drawn from different distributions. Our experimental evaluation shows that a loss function with a smaller Lipschitz constant and maximum value improves the generalization of models trained by Adam or AdamW.
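As a rough illustration of this guideline, the sketch below trains the same small regressor with AdamW under two losses whose Lipschitz constants differ: MSE, whose gradient grows with the error, and the Huber loss, whose gradient is bounded by its delta parameter. This is a minimal sketch, not the paper's method; the model, synthetic data, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's experiment): compare two
# regression losses with different Lipschitz constants under AdamW.
import torch
import torch.nn as nn

def make_model():
    # Small MLP regressor standing in for an age-estimation backbone.
    return nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 1))

# Synthetic stand-in data; the test inputs are shifted so that train and
# test come from different distributions, as in the paper's evaluation.
x_train, y_train = torch.randn(512, 64), torch.rand(512, 1) * 80
x_test,  y_test  = torch.randn(128, 64) + 0.5, torch.rand(128, 1) * 80

losses = {
    "mse":   nn.MSELoss(),             # gradient grows with the error: larger Lipschitz constant
    "huber": nn.HuberLoss(delta=1.0),  # gradient bounded by delta: smaller Lipschitz constant
}

for name, criterion in losses.items():
    torch.manual_seed(0)
    model = make_model()
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
    for _ in range(200):
        opt.zero_grad()
        loss = criterion(model(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        gap = (criterion(model(x_test), y_test) - criterion(model(x_train), y_train)).item()
    # A smaller train/test loss gap indicates better generalization.
    print(f"{name}: train/test loss gap = {gap:.3f}")
```

Under the paper's claim, the loss with the smaller Lipschitz constant (here, Huber) would be expected to show a smaller generalization gap when trained with Adam or AdamW.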


