Taxonomizing local versus global structure in neural network loss landscapes

07/23/2021
by Yaoqing Yang, et al.

Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization). Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data. By considering a range of metrics that attempt to capture different aspects of the loss landscape, we demonstrate that the best test accuracy is obtained when: the loss landscape is globally well-connected; ensembles of trained models are more similar to each other; and models converge to locally smooth regions. We also show that globally poorly-connected landscapes can arise when models are small or when they are trained to lower quality data; and that, if the loss landscape is globally poorly-connected, then training to zero loss can actually lead to worse test accuracy. Based on these results, we develop a simple one-dimensional model with load-like and temperature-like parameters, we introduce the notion of an effective loss landscape depending on these parameters, and we interpret our results in terms of a rugged convexity of the loss landscape. When viewed through this lens, our detailed empirical results shed light on phases of learning (and consequent double descent behavior), fundamental versus incidental determinants of good generalization, the role of load-like and temperature-like parameters in the learning process, different influences on the loss landscape from model and data, and the relationships between local and global metrics, all topics of recent interest.
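
The abstract's notion of a "globally well-connected" landscape is commonly probed by measuring how much the loss rises along a path between two independently trained models. Below is a minimal PyTorch sketch of that idea, not the paper's exact procedure: it estimates the barrier along a straight line in weight space. The names model_a, model_b, loader, and loss_fn are assumed placeholders for two trained copies of the same architecture, a data loader, and a loss function.

```python
# Minimal sketch (illustrative, not the paper's exact metric): estimate the
# loss barrier along the straight line between two independently trained
# models. A small barrier is a simple signal of a globally well-connected
# loss landscape.
import copy
import torch

def linear_mode_connectivity(model_a, model_b, loader, loss_fn,
                             num_points=11, device="cpu"):
    """Evaluate the loss at convex combinations of two weight vectors and
    return the per-point losses plus the barrier height above the endpoints."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    losses = []
    for alpha in torch.linspace(0.0, 1.0, num_points):
        blended = copy.deepcopy(model_a).to(device)
        # Interpolate floating-point tensors only; keep integer buffers
        # (e.g. BatchNorm's num_batches_tracked) from model_a unchanged.
        blended.load_state_dict({
            k: (1 - alpha) * sd_a[k] + alpha * sd_b[k]
            if sd_a[k].is_floating_point() else sd_a[k]
            for k in sd_a
        })
        blended.eval()
        total, count = 0.0, 0
        with torch.no_grad():
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                total += loss_fn(blended(x), y).item() * x.size(0)
                count += x.size(0)
        losses.append(total / count)
    # The "barrier" is how far the interpolated loss rises above the endpoints.
    barrier = max(losses) - max(losses[0], losses[-1])
    return losses, barrier
```

A barrier near zero along this line is the simplest indication of good connectivity; the paper's analysis also uses curved connecting paths and complementary local metrics, such as the smoothness of the landscape around each trained model.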


Related research

05/28/2023 · A Three-regime Model of Network Pruning
Recent work has highlighted the complex influence training hyperparamete...

04/09/2022 · FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks
Despite their effective use in various fields, many aspects of neural ne...

06/21/2017 · The energy landscape of a simple neural network
We explore the energy landscape of a simple neural network. In particula...

09/21/2022 · Deep Double Descent via Smooth Interpolation
Overparameterized deep networks are known to be able to perfectly fit th...

02/12/2023 · Data efficiency and extrapolation trends in neural network interatomic potentials
Over the last few years, key architectural advances have been proposed f...

10/26/2017 · Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understand the peculiar and counterintuitive ...

06/01/2021 · Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics
To understand better the causes of good generalization performance in st...
