Post-mortem on a deep learning contest: a Simpson's paradox and the complementary roles of scale metrics versus shape metrics

06/01/2021
by Charles H. Martin, et al.

To better understand the causes of good generalization performance in state-of-the-art neural network (NN) models, we analyze a corpus of models made publicly available for a contest to predict the generalization accuracy of NNs. These models span a wide range of qualities and were trained with a variety of architectures and regularization hyperparameters. We identify what amounts to a Simpson's paradox: "scale" metrics (from traditional statistical learning theory) perform well overall but poorly on subpartitions of the data at a given depth, when regularization hyperparameters are varied; while "shape" metrics (from Heavy-Tailed Self-Regularization theory) perform well on subpartitions of the data, when hyperparameters are varied for models of a given depth, but poorly overall, when models of varying depths are aggregated. Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied, as well as the complementary roles of implicit scale versus implicit shape parameters in understanding NN model quality. They also counsel caution when one tries to extract causal insight from a single metric applied to aggregate data, and they highlight the need to go beyond one-size-fits-all metrics, based on upper bounds from generalization theory, to describe the performance of state-of-the-art NN models. Based on these findings, we present two novel shape metrics, one data-independent and the other data-dependent, that can predict trends in the test accuracy of a series of NNs of a fixed architecture/depth as solver hyperparameters are varied.
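To make the scale-versus-shape distinction concrete, below is a minimal sketch (not the authors' code) of the two kinds of layer-level metrics being contrasted: a scale metric, the log Frobenius norm of a layer weight matrix W, and a shape metric, a power-law exponent alpha fit to the tail of the empirical eigenvalue spectrum of W^T W, in the spirit of Heavy-Tailed Self-Regularization. The 10 percent tail fraction and the Hill-type maximum-likelihood fit are illustrative assumptions, not the paper's exact estimator.

import numpy as np

def scale_metric(W):
    # "Scale" metric: log Frobenius norm of the layer weight matrix W.
    return float(np.log(np.linalg.norm(W, "fro")))

def shape_metric(W, tail_frac=0.1):
    # "Shape" metric: power-law exponent alpha of the tail of the
    # empirical spectral density of W^T W, via a Hill-type MLE.
    # tail_frac is an illustrative choice, not the paper's estimator.
    evals = np.linalg.eigvalsh(W.T @ W)          # ascending, all >= 0
    k = max(2, int(tail_frac * evals.size))      # size of the tail sample
    tail = evals[-k:]
    lam_min = tail[0]                            # tail threshold
    # Pareto MLE: alpha = 1 + k / sum_i log(lambda_i / lambda_min)
    return float(1.0 + k / np.sum(np.log(tail / lam_min)))

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256)) / np.sqrt(256)   # toy (untrained) layer
print("scale: log ||W||_F =", round(scale_metric(W), 3))
print("shape: alpha       =", round(shape_metric(W), 3))

In the paper's framing, a shape metric like alpha tracks test-accuracy trends within a series of models of fixed depth as solver hyperparameters vary, whereas norm-based scale metrics fare better when models of different depths are aggregated.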

