Evaluating the Zero-shot Robustness of Instruction-tuned Language Models

06/20/2023
by   Jiuding Sun, et al.
0

Instruction fine-tuning has recently emerged as a promising approach for improving the zero-shot capabilities of Large Language Models (LLMs) on new tasks. This technique has shown particular strength in improving the performance of modestly sized LLMs, sometimes inducing performance competitive with much larger model variants. In this paper we ask two questions: (1) How sensitive are instruction-tuned models to the particular phrasings of instructions, and, (2) How can we make them more robust to such natural language variation? To answer the former, we collect a set of 319 instructions manually written by NLP practitioners for over 80 unique tasks included in widely used benchmarks, and we evaluate the variance and average performance of these instructions as compared to instruction phrasings observed during instruction fine-tuning. We find that using novel (unobserved) but appropriate instruction phrasings consistently degrades model performance, sometimes substantially so. Further, such natural instructions yield a wide variance in downstream performance, despite their semantic equivalence. Put another way, instruction-tuned models are not especially robust to instruction re-phrasings. We propose a simple method to mitigate this issue by introducing “soft prompt” embedding parameters and optimizing these to maximize the similarity between representations of semantically equivalent instructions. We show that this method consistently improves the robustness of instruction-tuned models.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/28/2023

Evaluating the Robustness to Instructions of Large Language Models

Recently, Instruction fine-tuning has risen to prominence as a potential...
research
09/14/2023

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

Training large language models to follow instructions makes them perform...
research
03/17/2022

How Many Data Samples is an Additional Instruction Worth?

Recently introduced instruction-paradigm empowers non-expert users to le...
research
04/24/2023

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

The immense scale of the recent large language models (LLM) allows many ...
research
04/19/2022

What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment

The instruction learning paradigm – where a model learns to perform new ...
research
08/23/2023

Instruction Position Matters in Sequence Generation with Large Language Models

Large language models (LLMs) are capable of performing conditional seque...
research
08/02/2023

Evaluating Instruction-Tuned Large Language Models on Code Comprehension and Generation

In this work, we evaluate 10 open-source instructed LLMs on four represe...

Please sign up or login with your details

Forgot password? Click here to reset