PnPOOD: Out-Of-Distribution Detection for Text Classification via Plug and Play Data Augmentation

10/31/2021
by   Mrinal Rawat, et al.

While out-of-distribution (OOD) detection has been well explored in computer vision, there have been relatively few prior attempts at OOD detection for NLP classification. In this paper, we argue that these prior attempts do not fully address the OOD problem and may suffer from data leakage and poor calibration of the resulting models. We present PnPOOD, a data augmentation technique that performs OOD detection via out-of-domain sample generation using the recently proposed Plug and Play Language Model (Dathathri et al., 2020). Our method generates high-quality discriminative samples close to the class boundaries, resulting in accurate OOD detection at test time. We demonstrate that our model outperforms prior models on OOD sample detection and exhibits lower calibration error on the 20 Newsgroups text and Stanford Sentiment Treebank datasets (Lang, 1995; Socher et al., 2013). We further highlight an important data leakage issue with datasets used in prior attempts at OOD detection, and share results on a new dataset for OOD detection that does not suffer from the same problem.
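The abstract reports calibration error without saying which metric is used. As an illustration only, the sketch below computes expected calibration error (ECE), a common choice: predictions are grouped into equal-width confidence bins, and the gap between average confidence and accuracy in each bin is averaged, weighted by bin size. The function name, the default of 10 bins, and the input format are assumptions made for this example, not details taken from the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed default; the paper's binning is not stated here
) -> float:
    """Illustrative ECE: size-weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hit_sum = [0.0] * n_bins   # number of correct predictions in each bin
    count = [0] * n_bins       # number of predictions in each bin

    for conf, ok in zip(confidences, correct):
        # Equal-width bins over [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for i in range(n_bins):
        if count[i] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[i] / count[i]
        avg_acc = hit_sum[i] / count[i]
        ece += (count[i] / n) * abs(avg_acc - avg_conf)
    return ece
```

Here `confidences` would hold the classifier's top softmax probability for each test example and `correct` whether the predicted label matched the gold label. A lower value means the reported confidences track the observed accuracy more closely.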


research
07/04/2020

Text Data Augmentation: Towards better detection of spear-phishing emails

Text data augmentation, i.e. the creation of synthetic textual data from...
research
10/06/2022

Augmentor or Filter? Reconsider the Role of Pre-trained Language Model in Text Classification Augmentation

Text augmentation is one of the most effective techniques to solve the c...
research
03/03/2023

Exploring Data Augmentation Methods on Social Media Corpora

Data augmentation has proven widely effective in computer vision. In Nat...
research
12/16/2021

ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification

Data augmentation has been an important ingredient for boosting performa...
research
03/14/2022

On the Calibration of Pre-trained Language Models using Mixup Guided by Area Under the Margin and Saliency

A well-calibrated neural model produces confidence (probability outputs)...
research
04/04/2020

ObjectNet Dataset: Reanalysis and Correction

Recently, Barbu et al introduced a dataset called ObjectNet which includ...
research
05/06/2022

A Data Cartography based MixUp for Pre-trained Language Models

MixUp is a data augmentation strategy where additional samples are gener...
