Estimating Semantic Similarity between In-Domain and Out-of-Domain Samples

06/01/2023
by Rhitabrat Pokharel, et al.

Prior work typically describes out-of-domain (OOD) or out-of-distribution (OODist) samples as those that originate from datasets or sources different from the training set but belong to the same task. Models are known to usually perform worse on OOD samples than on in-domain (ID) samples, although this observation is not consistent. Another thread of research has focused on OOD detection, albeit mostly using supervised approaches. In this work, we first consolidate and present a systematic analysis of the multiple definitions of OOD and OODist discussed in prior literature. Then, we analyze the performance of a model under ID and OOD/OODist settings in a principled way. Finally, we seek an unsupervised method that reliably identifies OOD/OODist samples without using a trained model. The results of our extensive evaluation on 12 datasets from 4 different tasks suggest that unsupervised metrics show promising potential for this task.
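The abstract does not name the unsupervised metrics that were evaluated, so the following is only an illustrative sketch of the general idea: scoring how similar a candidate dataset is to the training (ID) data without any trained model. It assumes a simple bag-of-words cosine similarity; the function names `token_counts`, `cosine_similarity`, and `dataset_similarity` are hypothetical and do not come from the paper.

```python
import math
import re
from collections import Counter


def token_counts(texts):
    """Pool lowercased word counts over a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty corpus has no direction to compare
    return dot / (norm_a * norm_b)


def dataset_similarity(id_texts, other_texts):
    """Assumed metric: cosine similarity of pooled word counts.

    Higher values suggest the second corpus is lexically closer to
    the in-domain corpus; no trained model is involved.
    """
    return cosine_similarity(token_counts(id_texts), token_counts(other_texts))


if __name__ == "__main__":
    id_texts = ["the movie was great", "a great film"]
    near = ["the film was boring"]
    far = ["stocks fell sharply today"]
    print(dataset_similarity(id_texts, near))
    print(dataset_similarity(id_texts, far))
```

A low score relative to held-out ID data would flag a corpus as a likely OOD candidate under this assumed metric. Word overlap is a crude proxy for semantic similarity, and the choice of threshold is left open here.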

research
09/24/2022

Raising the Bar on the Evaluation of Out-of-Distribution Detection

In image classification, a lot of development has happened in detecting ...
research
02/06/2023

Rethinking Out-of-distribution (OOD) Detection: Masked Image Modeling is All You Need

The core of out-of-distribution (OOD) detection is to learn the in-distr...
research
02/17/2022

Data-SUITE: Data-centric identification of in-distribution incongruous examples

Systematic quantification of data quality is critical for consistent mod...
research
10/14/2020

On Cross-Dataset Generalization in Automatic Detection of Online Abuse

NLP research has attained high performances in abusive language detectio...
research
01/13/2020

Incremental Unsupervised Domain-Adversarial Training of Neural Networks

In the context of supervised statistical learning, it is typically assum...
research
04/09/2022

Understanding, Detecting, and Separating Out-of-Distribution Samples and Adversarial Samples in Text Classification

In this paper, we study the differences and commonalities between statis...
research
07/04/2022

Embedding contrastive unsupervised features to cluster in- and out-of-distribution noise in corrupted image datasets

Using search engines for web image retrieval is a tempting alternative t...
