GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks

Current training methods for deep neural networks boil down to very high dimensional and non-convex optimization problems, which are usually solved by a wide range of stochastic gradient descent methods. While these approaches tend to work in practice, there are still many gaps in the theoretical understanding of key aspects like convergence and generalization guarantees, which are induced by the properties of the optimization surface (loss landscape). In order to gain deeper insights, a number of recent publications have proposed methods to visualize and analyze optimization surfaces. However, the computational cost of these methods is very high, making them hardly usable on larger networks. In this paper, we present the GradVis Toolbox, an open-source library for efficient and scalable visualization and analysis of deep neural network loss landscapes in TensorFlow and PyTorch. By introducing more efficient mathematical formulations and a novel parallelization scheme, GradVis makes it possible to plot 2D and 3D projections of optimization surfaces and trajectories, as well as high-resolution second-order gradient information, for large networks.
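To make the visualization idea concrete, the following is a minimal PyTorch sketch of the common technique such toolboxes build on: evaluating the loss on a 2D plane spanned by two random directions around the current weights, with each direction rescaled to the norm of the corresponding weight tensor (a simplification of filter normalization). This is not the GradVis API; the helper `loss_surface_2d` and its parameters are hypothetical names for illustration.

```python
# Sketch only: a brute-force 2D loss-surface slice around the current weights.
# GradVis's contribution is making this kind of evaluation efficient and parallel;
# this naive version just shows what is being computed.
import torch


def loss_surface_2d(model, loss_fn, data, target, span=1.0, steps=25):
    """Return a (steps x steps) grid of loss values on a random 2D plane."""
    params = list(model.parameters())
    origin = [p.detach().clone() for p in params]

    # Two random directions, rescaled per tensor to match the weight norms.
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in params]
        d = [di * (oi.norm() / (di.norm() + 1e-10)) for di, oi in zip(d, origin)]
        dirs.append(d)

    alphas = torch.linspace(-span, span, steps)
    surface = torch.zeros(steps, steps)
    with torch.no_grad():
        for i, a in enumerate(alphas):
            for j, b in enumerate(alphas):
                # Move the weights to the grid point origin + a*d0 + b*d1.
                for p, o, d0, d1 in zip(params, origin, dirs[0], dirs[1]):
                    p.copy_(o + a * d0 + b * d1)
                surface[i, j] = loss_fn(model(data), target).item()
        # Restore the original weights.
        for p, o in zip(params, origin):
            p.copy_(o)
    return surface
```

The resulting grid can then be rendered as a contour or 3D surface plot; an optimizer trajectory is visualized by projecting the sequence of weight checkpoints onto the same plane.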
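The "second order gradient information" mentioned in the abstract refers to curvature of the loss, which for large networks is typically obtained without ever materializing the Hessian, via Hessian-vector products. Below is a hedged sketch (again not the GradVis API; `top_hessian_eigenvalue` is a hypothetical helper) of the standard approach: power iteration on Hessian-vector products computed with `torch.autograd.grad`.

```python
# Sketch only: estimate the largest Hessian eigenvalue of the loss using
# power iteration on Hessian-vector products (no explicit Hessian needed).
import torch


def top_hessian_eigenvalue(model, loss_fn, data, target, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(data), target)
    # create_graph=True keeps the graph so we can differentiate the gradient again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit-norm starting vector, shaped like the parameter list.
    v = [torch.randn_like(p) for p in params]
    vnorm = torch.sqrt(sum((vi ** 2).sum() for vi in v))
    v = [vi / vnorm for vi in v]

    eigenvalue = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate <grad, v> w.r.t. the parameters.
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        # Rayleigh quotient v^T H v (v has unit norm), then renormalize.
        eigenvalue = sum((h * vi).sum() for h, vi in zip(hv, v)).item()
        hnorm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (hnorm + 1e-10) for h in hv]
    return eigenvalue
```

Running a few such iterations per checkpoint gives the curvature-along-training curves that second-order landscape analyses report; each iteration costs roughly one extra backward pass.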


