Bridging Topic, Domain, and Language Shifts: An Evaluation of Comprehensive Out-of-Distribution Scenarios

09/15/2023
by Andreas Waldis et al.

Language models (LMs) excel in in-distribution (ID) scenarios, where train and test data are independent and identically distributed. However, their performance often degrades in real-world applications such as argument mining, where new topics emerge or other text domains and languages become relevant. To assess LMs' generalization abilities in such out-of-distribution (OOD) scenarios, we simulate distribution shifts by deliberately withholding specific instances for testing, such as those from the social media domain or on the topic of Solar Energy. Unlike prior studies that examine specific shifts and metrics in isolation, we analyze OOD generalization comprehensively: we define three metrics to pinpoint generalization flaws and propose eleven classification tasks covering topic, domain, and language shifts. Overall, we find that prompt-based fine-tuning performs best, notably when train and test splits differ primarily in semantics. At the same time, in-context learning is more effective than prompt-based or vanilla fine-tuning on tasks whose training data diverges heavily from the test data in label distribution. This reveals a crucial drawback of gradient-based learning: it biases LMs toward structural properties of the training data, such as a skewed label distribution.
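
As a minimal sketch of the held-out-split idea described above (the field names, example data, and distribution check are our own illustrative assumptions, not taken from the paper), the following Python snippet withholds every instance of one topic for testing and then compares the label distributions of the resulting splits:

    from collections import Counter

    def ood_split(instances, held_out_topic):
        # Simulate a topic shift: the held-out topic appears only at test time.
        train = [x for x in instances if x["topic"] != held_out_topic]
        test = [x for x in instances if x["topic"] == held_out_topic]
        return train, test

    def label_distribution(split):
        # Relative label frequencies; a large train/test gap is the kind of
        # structural discrepancy the abstract links to in-context learning's edge.
        counts = Counter(x["label"] for x in split)
        total = sum(counts.values())
        return {label: n / total for label, n in counts.items()}

    # Hypothetical argument-mining instances (text, topic, stance label).
    instances = [
        {"text": "Panels pay off within a decade.", "topic": "Solar Energy", "label": "pro"},
        {"text": "Subsidies distort the market.", "topic": "Solar Energy", "label": "con"},
        {"text": "Reactors take decades to build.", "topic": "Nuclear Energy", "label": "con"},
        {"text": "Turbines spoil the landscape.", "topic": "Wind Energy", "label": "con"},
    ]

    train, test = ood_split(instances, "Solar Energy")
    assert all(x["topic"] != "Solar Energy" for x in train)  # topic unseen in training
    print(label_distribution(train), label_distribution(test))

Holding out a domain or language instead of a topic follows the same pattern, filtering on a different field of each instance.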
