Investigating Selective Prediction Approaches Across Several Tasks in IID, OOD, and Adversarial Settings

03/01/2022
by   Neeraj Varshney, et al.
1

In order to equip NLP systems with selective prediction capability, several task-specific approaches have been proposed. However, which approaches work best across tasks or even if they consistently outperform the simplest baseline 'MaxProb' remains to be explored. To this end, we systematically study 'selective prediction' in a large-scale setup of 17 datasets across several NLP tasks. Through comprehensive experiments under in-domain (IID), out-of-domain (OOD), and adversarial (ADV) settings, we show that despite leveraging additional resources (held-out data/computation), none of the existing approaches consistently and considerably outperforms MaxProb in all three settings. Furthermore, their performance does not translate well across tasks. For instance, Monte-Carlo Dropout outperforms all other approaches on Duplicate Detection datasets but does not fare well on NLI datasets, especially in the OOD setting. Thus, we recommend that future selective prediction approaches should be evaluated across tasks and settings for reliable estimation of their capabilities.

READ FULL TEXT
research
05/03/2020

How Does Selective Mechanism Improve Self-Attention Networks?

Self-attention networks (SANs) with selective mechanism has produced sub...
research
05/11/2022

Towards Unified Prompt Tuning for Few-shot Text Classification

Prompt-based fine-tuning has boosted the performance of Pre-trained Lang...
research
07/01/2021

Interviewer-Candidate Role Play: Towards Developing Real-World NLP Systems

Standard NLP tasks do not incorporate several common real-world scenario...
research
02/23/2023

What Can We Learn From The Selective Prediction And Uncertainty Estimation Performance Of 523 Imagenet Classifiers

When deployed for risk-sensitive tasks, deep neural networks must includ...
research
05/02/2023

Post-Abstention: Towards Reliably Re-Attempting the Abstained Instances in QA

Despite remarkable progress made in natural language processing, even th...
research
04/17/2023

On Uncertainty Calibration and Selective Generation in Probabilistic Neural Summarization: A Benchmark Study

Modern deep models for summarization attains impressive benchmark perfor...
research
09/18/2018

Testing Selective Influence Directly Using Trackball Movement Tasks

Systems factorial technology (SFT; Townsend & Nozawa, 1995) is regarded ...

Please sign up or login with your details

Forgot password? Click here to reset