Improving Translation Faithfulness of Large Language Models via Augmenting Instructions

08/24/2023
by Yijie Chen, et al.

Large Language Models (LLMs) exhibit strong general capabilities, and a compelling current challenge is eliciting their specialized capabilities, such as machine translation, through low-cost instruction tuning. Standard instruction-following data is organized sequentially as the concatenation of an instruction, an input, and a response. Because the attention mechanism of LLMs tends toward local focus, the models attend more to nearby words or sentences at each position, which creates a high risk of the instruction being forgotten during decoding. To alleviate these issues, we propose SWIE (Segment-Weighted Instruction Embedding) and an instruction-following dataset, OVERMISS. SWIE improves the model's instruction understanding by adding a global instruction representation to the subsequent input and response representations. OVERMISS improves model faithfulness by contrasting over-translation and miss-translation outputs with the correct translation. We apply our methods to two mainstream open-source LLMs, BLOOM and LLaMA. The experimental results demonstrate significant improvements in translation performance with SWIE based on BLOOMZ-3b, particularly in zero-shot and long-text translation, owing to the reduced risk of instruction forgetting. Moreover, OVERMISS outperforms the baseline in translation performance (e.g., BLEU score gains ranging from 0.69 to 3.12 and an average improvement of 0.48 percentage points in COMET score for LLaMA-7b), and models combining OVERMISS and SWIE achieve further gains (e.g., BLEU score increases of up to 0.56 for English-to-German across three different backbones); both methods also improve a faithfulness metric based on word alignment.
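
To make the SWIE idea concrete, here is a minimal PyTorch sketch of injecting a pooled global instruction representation into the input and response segments. The function name, the mean pooling, and the fixed per-segment scalar weights are illustrative assumptions, not the paper's exact design; in the actual model the injection would happen inside the transformer rather than as a standalone step.

```python
import torch

def swie_augment(hidden_states, instruction_mask, segment_ids, segment_weights):
    """Add a global instruction vector to non-instruction token representations.

    hidden_states:    (batch, seq_len, d_model) token representations
    instruction_mask: (batch, seq_len) 1 for instruction tokens, 0 elsewhere
    segment_ids:      (batch, seq_len) e.g. 0 = instruction, 1 = input, 2 = response
    segment_weights:  mapping from segment id to a scalar injection weight
    """
    mask = instruction_mask.unsqueeze(-1).to(hidden_states.dtype)
    # Mean-pool the instruction span into one global vector per example
    # (mean pooling is an assumption; a learned adapter could be used instead).
    instr_vec = (hidden_states * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)

    # Build per-token weights from each token's segment id, so the instruction
    # representation can be injected with different strength per segment.
    weights = torch.zeros_like(segment_ids, dtype=hidden_states.dtype)
    for seg_id, w in segment_weights.items():
        weights = torch.where(segment_ids == seg_id,
                              torch.full_like(weights, w), weights)

    # Add the weighted global instruction vector to every token representation.
    return hidden_states + weights.unsqueeze(-1) * instr_vec.unsqueeze(1)
```

For example, with segment_weights={1: 0.1, 2: 0.2}, input tokens receive a lightly weighted copy of the instruction vector and response tokens a stronger one, keeping the instruction visible at every decoding position.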

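A tiny hypothetical OVERMISS-style instance can illustrate the contrast the dataset encodes: a correct translation paired with an over-translation (content added that is absent from the source) and a miss-translation (source content dropped). The field names and sentences below are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical OVERMISS-style instance; schema and wording are assumed
# for illustration and are not taken from the released dataset.
example = {
    "instruction": "Translate the following sentence from English to German.",
    "input": "The committee approved the new budget yesterday.",
    "correct": "Der Ausschuss hat gestern den neuen Haushalt genehmigt.",
    # Over-translation: "einstimmig" (unanimously) is not in the source.
    "over_translation": "Der Ausschuss hat gestern den neuen Haushalt einstimmig genehmigt.",
    # Miss-translation: "gestern" (yesterday) is dropped from the output.
    "miss_translation": "Der Ausschuss hat den neuen Haushalt genehmigt.",
}
```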

research · 08/23/2023
Instruction Position Matters in Sequence Generation with Large Language Models
Large language models (LLMs) are capable of performing conditional seque...

research · 06/19/2023
BayLing: Bridging Cross-lingual Alignment and Instruction Following through Interactive Translation for Large Language Models
Large language models (LLMs) have demonstrated remarkable prowess in lan...

research · 04/05/2023
ParroT: Translating During Chat Using Large Language Models
Large language models (LLMs) like ChatGPT and GPT-4 have exhibited remar...

research · 05/24/2023
Eliciting the Translation Ability of Large Language Models via Multilingual Finetuning with Translation Instructions
Large-scale Pretrained Language Models (LLMs), such as ChatGPT and GPT4,...

research · 07/10/2023
TIM: Teaching Large Language Models to Translate with Comparison
Open-sourced large language models (LLMs) have demonstrated remarkable e...

research · 08/10/2023
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment
Training large language models (LLMs) with open-domain instruction data ...

research · 12/04/2022
Understanding How Model Size Affects Few-shot Instruction Prompting
Large Language Models are affected by the phenomena of memorizing and fo...
