Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift

06/27/2022
by Christina Baek, et al.

Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear correlation with its out-of-distribution (OOD) accuracy on several OOD benchmarks, a phenomenon they dubbed "accuracy-on-the-line". While a useful tool for model selection (i.e., the model most likely to perform best OOD is the one with the highest ID accuracy), this fact does not help estimate the actual OOD performance of models without access to a labeled OOD validation set. In this paper, we show that a similar but surprising phenomenon also holds for the agreement between pairs of neural network classifiers: whenever accuracy-on-the-line holds, we observe that the OOD agreement between the predictions of any pair of neural networks (with potentially different architectures) also exhibits a strong linear correlation with their ID agreement. Furthermore, we observe that the slope and bias of OOD vs. ID agreement closely match those of OOD vs. ID accuracy. This phenomenon, which we call agreement-on-the-line, has important practical applications: without any labeled data, we can predict the OOD accuracy of classifiers, since OOD agreement can be estimated with just unlabeled data. Our prediction algorithm outperforms previous methods both in shifts where agreement-on-the-line holds and, surprisingly, when accuracy is not on the line. This phenomenon also provides new insights into deep neural networks: unlike accuracy-on-the-line, agreement-on-the-line appears to hold only for neural network classifiers.
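The core idea in the abstract, that agreement needs no labels and that the agreement line shares its slope and bias with the accuracy line, can be sketched in a few lines of code. The snippet below is a minimal, hypothetical illustration rather than the paper's actual estimator (the paper's ALine-style methods additionally apply a probit transform to accuracies and agreements, which is omitted here for simplicity); the function names `agreement` and `estimate_ood_accuracy` are my own.

```python
import numpy as np

def agreement(preds_a, preds_b):
    """Fraction of examples on which two classifiers predict the same label.
    Requires no ground-truth labels, so it can be computed on unlabeled OOD data."""
    return float(np.mean(np.asarray(preds_a) == np.asarray(preds_b)))

def estimate_ood_accuracy(id_agreements, ood_agreements, id_accuracies):
    """Sketch of the agreement-on-the-line recipe: fit the linear trend of
    OOD vs. ID agreement across many model pairs, then reuse that slope and
    bias (assumed, per the paper, to match the accuracy line) to map each
    model's ID accuracy to a predicted OOD accuracy."""
    slope, bias = np.polyfit(id_agreements, ood_agreements, deg=1)
    return slope * np.asarray(id_accuracies) + bias
```

In practice, `id_agreements` and `ood_agreements` would be collected from the predictions of many independently trained network pairs on held-out ID data and on unlabeled OOD data, respectively.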


