Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?

09/04/2023
by Leon Weber-Genzel et al.

Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality issues in gold-standard labels. So far, however, AED methods have only been applied to discriminative settings. It is an open question how well they generalize to generative settings, which are becoming widespread with the rise of generative LLMs. In this work, we present Donkii, the first benchmark for AED on instruction-tuning data. It comprises three instruction-tuning datasets enriched with annotations by experts and semi-automatic methods. We find that all three datasets contain clear-cut errors that sometimes propagate directly into instruction-tuned LLMs. We propose four AED baselines for the generative setting and evaluate them comprehensively on the newly introduced benchmark. Our results show that choosing the right AED method and model size is crucial, and we derive practical recommendations from them. To gain further insight, we present a first case study examining how the quality of instruction-tuning datasets influences downstream performance.
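Score-based AED methods typically assign each instance a suspicion score and hand the highest-scoring instances to a human for inspection. The sketch below illustrates that workflow for a generative setting under stated assumptions. It assumes a model has already produced a log-probability for every token of each instance's target response. It scores instances by their mean per-token negative log-likelihood (NLL), ranks them most-suspicious first, and measures precision at k against a set of known errors. The scoring rule, the function names (`mean_token_nll`, `min_token_prob`, `rank_for_inspection`, `precision_at_k`) and the toy log-probabilities are hypothetical illustrations. They are not the four baselines proposed in the paper.

```python
import math
from typing import Dict, List, Set, Tuple


def mean_token_nll(token_logprobs: List[float]) -> float:
    """Average negative log-likelihood over a response's tokens (hypothetical score)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def min_token_prob(token_logprobs: List[float]) -> float:
    """Probability of the least likely token; a very low value may flag a suspicious span."""
    return math.exp(min(token_logprobs))


def rank_for_inspection(instances: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (instance_id, score) pairs sorted so the highest-NLL instances come first."""
    scored = [(iid, mean_token_nll(lps)) for iid, lps in instances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def precision_at_k(ranking: List[Tuple[str, float]], gold_errors: Set[str], k: int) -> float:
    """Fraction of the top-k ranked instances that are known (gold) errors."""
    top_ids = [iid for iid, _ in ranking[:k]]
    return sum(iid in gold_errors for iid in top_ids) / k


if __name__ == "__main__":
    # Toy per-token log-probabilities (hypothetical, not from the Donkii datasets).
    toy = {
        "ex1": [-0.1, -0.2, -0.05],
        "ex2": [-2.3, -0.1, -4.0],
        "ex3": [-0.5, -0.6],
    }
    ranking = rank_for_inspection(toy)
    print([iid for iid, _ in ranking])
    print(precision_at_k(ranking, {"ex2"}, k=1))
```

Mean NLL is used here only because it is one simple, common way to turn token probabilities into a per-instance score. Swapping in `min_token_prob` (ranked ascending) would instead surface instances that contain a single highly unlikely token.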

