A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models

09/20/2023
by Haoran Xu, et al.

Generative Large Language Models (LLMs) have achieved remarkable advancements in various NLP tasks. However, these advances have not been reflected in the translation task, especially for models of moderate size (i.e., 7B or 13B parameters), which still lag behind conventional supervised encoder-decoder translation models. Previous studies have attempted to improve the translation capabilities of these moderate-sized LLMs, but their gains have been limited. In this study, we propose a novel fine-tuning approach for LLMs that is specifically designed for the translation task, eliminating the need for the abundant parallel data that traditional translation models usually depend on. Our approach consists of two fine-tuning stages: initial fine-tuning on monolingual data, followed by fine-tuning on a small set of high-quality parallel data. We refer to the LLM developed through this strategy as Advanced Language Model-based trAnslator (ALMA). With LLaMA-2 as the underlying model, our results show that the model achieves an average improvement of more than 12 BLEU and 12 COMET over its zero-shot performance across 10 translation directions from the WMT'21 (2 directions) and WMT'22 (8 directions) test datasets. With only 7B or 13B parameters, its performance is significantly better than all prior work and even surpasses the NLLB-54B model and GPT-3.5-text-davinci-003. This method establishes the foundation for a novel training paradigm in machine translation.
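The two-stage recipe described in the abstract can be pictured as a small data curriculum. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `Stage`, `STAGES`, `format_example`, and `build_curriculum`, the record fields, and in particular the prompt template used for parallel data are all hypothetical and do not come from the paper. The sketch only shows the ordering of the data (monolingual text first, then a small parallel set) and leaves the actual fine-tuning step out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str       # label for the fine-tuning stage
    data_kind: str  # "monolingual" or "parallel"


# The two stages named in the abstract, in training order.
STAGES: List[Stage] = [
    Stage(name="stage1_monolingual", data_kind="monolingual"),
    Stage(name="stage2_parallel", data_kind="parallel"),
]


def format_example(stage: Stage, record: Dict[str, str]) -> str:
    """Turn one raw record into a training string for a causal LM."""
    if stage.data_kind == "monolingual":
        # Plain text in a single language; no translation pair is needed.
        return record["text"]
    if stage.data_kind == "parallel":
        # Hypothetical prompt template (an assumption, not from the paper).
        return (
            f"Translate from {record['src_lang']} to {record['tgt_lang']}:\n"
            f"{record['src']}\n"
            f"{record['tgt']}"
        )
    raise ValueError(f"unknown data kind: {stage.data_kind}")


def build_curriculum(
    data: Dict[str, Iterable[Dict[str, str]]]
) -> List[Tuple[str, List[str]]]:
    """Return (stage name, formatted training strings) pairs in order."""
    return [
        (stage.name, [format_example(stage, r) for r in data[stage.data_kind]])
        for stage in STAGES
    ]
```

In a real setup, each stage's strings would be tokenized and passed to a fine-tuning loop, with the second stage starting from the checkpoint produced by the first. Keeping the stages as data rather than as hard-coded steps makes the ordering explicit and easy to inspect.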


