Are Out-of-Distribution Detection Methods Reliable?

11/20/2022
by Vahid Reza Khazaie, et al.

This paper establishes a novel evaluation framework for assessing the performance of out-of-distribution (OOD) detection in realistic settings. Our goal is to expose the shortcomings of existing OOD detection benchmarks and encourage a necessary research direction shift toward satisfying the requirements of real-world applications. We expand OOD detection research by introducing new OOD test datasets CIFAR-10-R, CIFAR-100-R, and MVTec-R, which allow researchers to benchmark OOD detection performance under realistic distribution shifts. We also introduce a generalizability score to measure a method's ability to generalize from standard OOD detection test datasets to a realistic setting. Contrary to existing OOD detection research, we demonstrate that further performance improvements on standard benchmark datasets do not increase the usability of such models in the real world. State-of-the-art (SOTA) methods tested on our realistic distributionally-shifted datasets drop in performance by up to 45%, underscoring the need to assess the reliability of OOD models before they are deployed in real-world environments.
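
As a rough illustration of the kind of evaluation the abstract describes, the minimal Python sketch below scores an OOD detector on a standard OOD test set and on a realistically shifted one, then reports a simple ratio as a stand-in for a generalizability score. The ratio definition, the AUROC metric choice, the dataset names in the comments, and the Gaussian placeholder scores are assumptions for illustration only, not the paper's exact formulation.

```python
# Hypothetical sketch: compare an OOD detector's AUROC on a standard OOD
# test set versus a realistically shifted OOD test set (e.g., CIFAR-10-R),
# and report a simple ratio-style "generalizability" number.
# The ratio definition below is an assumption, not the paper's formula.
import numpy as np
from sklearn.metrics import roc_auc_score


def auroc(id_scores: np.ndarray, ood_scores: np.ndarray) -> float:
    """AUROC for separating in-distribution (label 0) from OOD (label 1),
    assuming higher scores mean 'more OOD'."""
    labels = np.concatenate([np.zeros_like(id_scores), np.ones_like(ood_scores)])
    scores = np.concatenate([id_scores, ood_scores])
    return roc_auc_score(labels, scores)


def generalizability_score(auroc_standard: float, auroc_shifted: float) -> float:
    """Assumed ratio-style score: 1.0 means no degradation under realistic shift."""
    return auroc_shifted / auroc_standard


# Placeholder detector outputs (e.g., from max-softmax or energy scores).
rng = np.random.default_rng(0)
id_test = rng.normal(0.0, 1.0, 1000)       # in-distribution test scores
ood_standard = rng.normal(2.0, 1.0, 1000)  # standard OOD test set scores
ood_shifted = rng.normal(0.8, 1.0, 1000)   # realistically shifted OOD scores

a_std = auroc(id_test, ood_standard)
a_shift = auroc(id_test, ood_shifted)
print(f"AUROC standard: {a_std:.3f}, AUROC shifted: {a_shift:.3f}, "
      f"generalizability: {generalizability_score(a_std, a_shift):.3f}")
```

A large gap between the two AUROC values (and thus a ratio well below 1.0) is the kind of degradation the paper reports for SOTA methods under realistic shift.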

Related research

02/16/2023 · Unsupervised Evaluation of Out-of-distribution Detection: A Data-centric Perspective
Out-of-distribution (OOD) detection methods assume that they have test g...

11/25/2022 · Wild-Time: A Benchmark of in-the-Wild Distribution Shift over Time
Distribution shift occurs when the test distribution differs from the tr...

10/26/2017 · DoShiCo: a Domain Shift Challenge for Control
Training deep neural control networks end-to-end for real-world applicat...

12/24/2022 · Out-of-Distribution Detection with Reconstruction Error and Typicality-based Penalty
The task of out-of-distribution (OOD) detection is vital to realize safe...

06/15/2023 · OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection
Out-of-Distribution (OOD) detection is critical for the reliable operati...

07/03/2022 · Identifying the Context Shift between Test Benchmarks and Production Data
Across a wide variety of domains, there exists a performance gap between...

11/24/2022 · Turning the Tables: Biased, Imbalanced, Dynamic Tabular Datasets for ML Evaluation
Evaluating new techniques on realistic datasets plays a crucial role in ...
