Beyond AUROC & co. for evaluating out-of-distribution detection performance

While research interest in developing out-of-distribution (OOD) detection methods has grown, there has been comparatively little discussion of how these methods should be evaluated. Given their relevance for safe(r) AI, it is important to examine whether the basis for comparing OOD detection methods is consistent with practical needs. In this work, we take a closer look at the go-to metrics for evaluating OOD detection, and question the approach of exclusively reducing OOD detection to a binary classification task with little consideration for the detection threshold. We illustrate the limitations of current metrics (AUROC & its friends) and propose a new metric, Area Under the Threshold Curve (AUTC), which explicitly penalizes poor separation between in-distribution (ID) and OOD samples. Scripts and data are available at https://github.com/glhr/beyond-auroc
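The limitation described above can be illustrated with a small sketch. The abstract does not spell out how AUTC is computed, so the code below assumes one plausible reading: OOD scores are normalized to [0, 1], higher means "more OOD", and the metric is the mean of the areas under the false-positive-rate and false-negative-rate curves taken over the threshold. The function names and toy scores are hypothetical; the linked repository should be treated as the reference for the authors' definition.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample scores above a random ID sample."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    diff = ood_scores[:, None] - id_scores[None, :]
    # Ties count as half a win (Mann-Whitney U formulation of AUROC).
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def threshold_curve_area(id_scores, ood_scores, n_thresholds=1001):
    """Assumed AUTC-style metric: mean area under FPR(t) and FNR(t), t in [0, 1].

    Assumes scores lie in [0, 1] and that OOD is the positive class.
    Lower is better: 0 means all ID scores are 0 and all OOD scores are 1.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    t = np.linspace(0.0, 1.0, n_thresholds)
    # FPR(t): fraction of ID samples flagged as OOD at threshold t.
    fpr = np.mean(id_scores[None, :] >= t[:, None], axis=1)
    # FNR(t): fraction of OOD samples missed at threshold t.
    fnr = np.mean(ood_scores[None, :] < t[:, None], axis=1)

    def area(y):
        # Trapezoidal rule over the threshold grid.
        return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(t)))

    return 0.5 * (area(fpr) + area(fnr))

# Toy scores (hypothetical): both detectors rank every OOD sample above every
# ID sample, but the second leaves almost no margin for choosing a threshold.
wide_id, wide_ood = [0.05, 0.10, 0.15], [0.85, 0.90, 0.95]
narrow_id, narrow_ood = [0.48, 0.49, 0.50], [0.51, 0.52, 0.53]

auroc_wide = auroc(wide_id, wide_ood)
auroc_narrow = auroc(narrow_id, narrow_ood)
area_wide = threshold_curve_area(wide_id, wide_ood)
area_narrow = threshold_curve_area(narrow_id, narrow_ood)

print(f"AUROC: wide={auroc_wide:.3f}, narrow={auroc_narrow:.3f}")
print(f"Threshold-curve area: wide={area_wide:.3f}, narrow={area_narrow:.3f}")
```

Because AUROC depends only on the ranking of scores, it cannot tell these two detectors apart, whereas a threshold-aware area grows as the ID and OOD score distributions move closer together.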


research
06/02/2023

LoCoOp: Few-Shot Out-of-Distribution Detection via Prompt Learning

We present a novel vision-language prompt learning approach for few-shot...
research
11/30/2020

Feature Space Singularity for Out-of-Distribution Detection

Out-of-Distribution (OoD) detection is important for building safe artif...
research
08/30/2018

Towards a Better Metric for Evaluating Question Generation Systems

There has always been criticism for using n-gram based similarity metric...
research
08/23/2023

CLIPN for Zero-Shot OOD Detection: Teaching CLIP to Say No

Out-of-distribution (OOD) detection refers to training the model on an i...
research
07/15/2022

Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data

Detecting out-of-distribution (OOD) data is a task that is receiving an ...
research
06/01/2016

On the equivalence between Kolmogorov-Smirnov and ROC curve metrics for binary classification

Binary decisions are very common in artificial intelligence. Applying a ...
research
03/27/2019

Tightness-aware Evaluation Protocol for Scene Text Detection

Evaluation protocols play key role in the developmental progress of text...
