Instruction Position Matters in Sequence Generation with Large Language Models

08/23/2023
by   Yijin Liu, et al.

Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization, through instruction fine-tuning. The fine-tuning data is generally formed by sequentially concatenating a task instruction, an input sentence, and the corresponding response. Because the self-attention mechanism of LLMs tends to emphasize local context, these models risk forgetting the instruction when generating responses to long input sentences. To mitigate this issue, we propose enhancing the instruction-following capability of LLMs by shifting the task instruction to the position after the input sentence. Theoretical analysis suggests that this straightforward change alters the model's learning focus, emphasizing the training of instruction-following capabilities. Concurrently, experimental results demonstrate that our approach consistently outperforms the traditional setting across various model scales (1B / 7B / 13B) and different sequence generation tasks (translation and summarization), without any additional data or annotation costs. Notably, our method significantly improves zero-shot performance on conditional sequence generation, e.g., by up to 9.7 BLEU points on WMT zero-shot translation tasks.
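The core idea reduces to changing the concatenation order of the fine-tuning examples. A minimal sketch of the two layouts, with illustrative function names and a hypothetical translation example (the exact prompt templates are not specified in the abstract):

```python
# Two ways to assemble an instruction-tuning example.
# Conventional layout: the instruction comes before the input, so for long
# inputs the instruction ends up far from the response being generated.
# Proposed layout: the instruction comes after the input, keeping it adjacent
# to the response.

def build_conventional(instruction: str, source: str, response: str) -> str:
    """Conventional layout: [instruction] [input] [response]."""
    return f"{instruction}\n{source}\n{response}"

def build_instruction_last(instruction: str, source: str, response: str) -> str:
    """Proposed layout: [input] [instruction] [response]."""
    return f"{source}\n{instruction}\n{response}"

# Hypothetical example; the real training data comes from translation and
# summarization corpora.
instruction = "Translate the following German sentence into English."
source = "Guten Morgen."
response = "Good morning."

print(build_conventional(instruction, source, response))
print("---")
print(build_instruction_last(instruction, source, response))
```

With the second layout, the instruction is the most recent conditioning text before the response, so the locality of self-attention works for, rather than against, instruction following on long inputs.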
