Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data

11/17/2021
by Rui Hu, et al.

Deep network models perform well on In-Distribution (ID) data but can fail significantly on Out-Of-Distribution (OOD) data. While much effort has gone into developing methods that improve OOD generalization, little attention has been paid to evaluating how well models actually handle OOD data. This study analyzes the shortcomings of the conventional ID test and designs OOD test paradigms that more accurately evaluate practical performance. Our analysis builds on an introduced categorization of three types of distribution shifts used to generate OOD data. The main observations are: (1) the ID test neither reflects the actual performance of a single model nor supports fair comparison between different models on OOD data; (2) this failure can be ascribed to learned marginal and conditional spurious correlations resulting from the corresponding distribution shifts. Based on these observations, we propose novel OOD test paradigms to evaluate a model's capacity to generalize to unseen data, and discuss how OOD test results can be used to find bugs in models and guide model debugging.
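To make the distinction between ID and OOD testing concrete, below is a minimal sketch (not the authors' code) of how the same model can be scored on an ID test set and on several OOD test sets built from different distribution shifts. It assumes PyTorch; the loader names (`id_test_loader`, `ood_loaders`) are hypothetical placeholders.

```python
# Minimal sketch: compare ID-test vs. OOD-test accuracy for one model.
# Assumes PyTorch; dataset/loader names are hypothetical placeholders.
import torch

def accuracy(model, loader, device="cpu"):
    """Top-1 accuracy of `model` over a DataLoader yielding (inputs, labels)."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            preds = model(x).argmax(dim=1)
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / max(total, 1)

# Hypothetical usage: `id_test_loader` draws from the training distribution,
# while each entry of `ood_loaders` applies one type of distribution shift.
# id_acc  = accuracy(model, id_test_loader)
# ood_acc = {name: accuracy(model, loader) for name, loader in ood_loaders.items()}
```

In this reading, a large gap between the ID accuracy and the accuracy under a particular shift points to a spurious correlation the model has learned for that shift, which is the kind of concrete "bug" the proposed OOD test paradigms are meant to surface.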

