Mitigating Label Biases for In-context Learning

by Yu Fei, et al.

Various design settings for in-context learning (ICL), such as the choice and order of the in-context examples, can bias the model's predictions. While many studies discuss these design choices, there have been few systematic investigations into categorizing them and mitigating their impact. In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias (which we conceptualize and detect for the first time). Our analysis demonstrates that prior label bias calibration methods fall short of addressing all three types of biases. Specifically, domain-label bias restricts LLMs to random-level performance on many tasks regardless of the choice of in-context examples. To mitigate the effect of these biases, we propose a simple bias calibration method that estimates a language model's label bias using random in-domain words from the task corpus. After controlling for this estimated bias when making predictions, our novel domain-context calibration significantly improves the ICL performance of GPT-J and GPT-3 on a wide range of tasks. The gain is substantial on tasks with large domain-label bias (up to 37 Macro-F1). Furthermore, our results generalize to models with different scales, pretraining methods, and manually-designed task instructions, showing the prevalence of label biases in ICL.
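The calibration idea in the abstract (estimate the model's label bias from random in-domain words, then control for it at prediction time) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how the bias is controlled for, so the sketch divides each label probability by its estimated bias and renormalizes. `toy_label_probs` is a hypothetical stand-in for a real language model's label distribution, and `toy_corpus`, `n_samples`, and `length` are illustrative choices, not values from the paper.

```python
import random
from typing import Callable, Dict, Sequence

LabelProbs = Dict[str, float]


def sample_random_text(corpus: Sequence[str], length: int, rng: random.Random) -> str:
    """Build a content-free pseudo-input from random words of the task corpus."""
    words = [w for doc in corpus for w in doc.split()]
    return " ".join(rng.choice(words) for _ in range(length))


def estimate_label_bias(
    label_probs: Callable[[str], LabelProbs],
    corpus: Sequence[str],
    labels: Sequence[str],
    n_samples: int = 20,
    length: int = 8,
    seed: int = 0,
) -> LabelProbs:
    """Average the model's label probabilities over random in-domain word strings."""
    rng = random.Random(seed)
    totals = {label: 0.0 for label in labels}
    for _ in range(n_samples):
        probs = label_probs(sample_random_text(corpus, length, rng))
        for label in labels:
            totals[label] += probs[label]
    return {label: totals[label] / n_samples for label in labels}


def calibrate(probs: LabelProbs, bias: LabelProbs) -> LabelProbs:
    """Divide by the estimated bias and renormalize (assumed correction rule).

    Assumes every bias value is strictly positive.
    """
    scores = {label: probs[label] / bias[label] for label in probs}
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def toy_label_probs(text: str) -> LabelProbs:
    """Hypothetical model that leans toward 'negative' unless it sees 'great'."""
    positive = 0.45 if "great" in text.split() else 0.2
    return {"positive": positive, "negative": 1.0 - positive}


# Hypothetical task corpus used only to draw random in-domain words.
toy_corpus = ["the plot was slow", "the acting felt flat", "a long film overall"]

bias = estimate_label_bias(toy_label_probs, toy_corpus, ["positive", "negative"])
raw = toy_label_probs("a great film")
calibrated = calibrate(raw, bias)
print("raw prediction:", max(raw, key=raw.get))
print("calibrated prediction:", max(calibrated, key=calibrated.get))
```

In this toy setup the uncalibrated model prefers "negative" even for an input containing "great", because its bias toward that label outweighs the evidence. Dividing by the bias estimated from random corpus words flips the prediction to "positive".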




In-Contextual Bias Suppression for Large Language Models

Despite their impressive performance in a wide range of NLP tasks, Large...

Interactions of Linguistic and Domain Overhypotheses in Category Learning

For humans learning to categorize and distinguish parts of the world, th...

Mitigating Label Bias via Decoupled Confident Learning

Growing concerns regarding algorithmic fairness have led to a surge in m...

The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence

As the production of and reliance on datasets to produce automated decis...

An Experimental Evaluation of a De-biasing Intervention for Professional Software Developers

CONTEXT: The role of expert judgement is essential in our quest to impro...

Towards Debiasing NLU Models from Unknown Biases

NLU models often exploit biases to achieve high dataset-specific perform...

How User Language Affects Conflict Fatality Estimates in ChatGPT

OpenAI's ChatGPT language model has gained popularity as a powerful tool...
