Unsupervised Neural Machine Translation with Generative Language Models Only

10/11/2021
by Jesse Michael Han, et al.

We show how to derive state-of-the-art unsupervised neural machine translation systems from generatively pre-trained language models. Our method consists of three steps: few-shot amplification, distillation, and backtranslation. We first use the zero-shot translation ability of large pre-trained language models to generate translations for a small set of unlabeled sentences. We then amplify these zero-shot translations by using them as few-shot demonstrations for sampling a larger synthetic dataset. This dataset is distilled by discarding the few-shot demonstrations and then fine-tuning. During backtranslation, we repeatedly generate translations for a set of inputs and then fine-tune a single language model on both directions of the translation task at once, ensuring cycle-consistency by swapping the roles of gold monotext and generated translations when fine-tuning. By using our method to leverage GPT-3's zero-shot translation capability, we achieve a new state-of-the-art in unsupervised translation on the WMT14 English-French benchmark, attaining a BLEU score of 42.1.
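The three steps above translate into a short pipeline. Below is a minimal Python sketch of that pipeline under stated assumptions: `generate` and `finetune` are hypothetical placeholders for sampling from and fine-tuning a GPT-3-style model (they are not the paper's released code), and the prompt templates are illustrative.

```python
# Hedged sketch of the three-step method: few-shot amplification,
# distillation, and backtranslation. `generate` and `finetune` are
# hypothetical stand-ins for a generative LM API, not the paper's code.

def generate(model, prompt):
    """Placeholder: sample one continuation of `prompt` from `model`."""
    raise NotImplementedError("wire this to a generative language model")

def finetune(model, examples):
    """Placeholder: fine-tune `model` on (prompt, completion) pairs."""
    raise NotImplementedError("wire this to a fine-tuning API")

def en_fr_prompt(src, demos=()):
    """Zero-shot prompt, optionally prefixed with few-shot demonstrations."""
    shots = "".join(f"English: {e}\nFrench: {f}\n" for e, f in demos)
    return f"{shots}English: {src}\nFrench:"

def fr_en_prompt(src):
    return f"French: {src}\nEnglish:"

def amplify_and_distill(model, seed_sents, monotext):
    # Step 1: zero-shot translations for a small set of unlabeled sentences.
    demos = [(s, generate(model, en_fr_prompt(s))) for s in seed_sents]
    # Step 2 (amplification): reuse the zero-shot outputs as few-shot
    # demonstrations to sample a larger synthetic parallel dataset.
    synthetic = [(s, generate(model, en_fr_prompt(s, demos))) for s in monotext]
    # Distillation: fine-tune with the demonstrations discarded, so the
    # model learns to translate from the bare zero-shot prompt.
    return finetune(model, [(en_fr_prompt(s), t) for s, t in synthetic])

def backtranslation(model, en_monotext, fr_monotext, rounds=3):
    # Step 3: a single model is fine-tuned on both directions at once.
    # Roles are swapped so the target side is always gold monotext and the
    # source side is a generated translation (cycle-consistency).
    for _ in range(rounds):
        pairs = []
        for en in en_monotext:
            fr_hat = generate(model, en_fr_prompt(en))
            pairs.append((fr_en_prompt(fr_hat), en))  # fr->en onto gold English
        for fr in fr_monotext:
            en_hat = generate(model, fr_en_prompt(fr))
            pairs.append((en_fr_prompt(en_hat), fr))  # en->fr onto gold French
        model = finetune(model, pairs)
    return model
```

Swapping the roles of gold monotext and generated translations means the model is never fine-tuned on its own outputs as targets, which is what keeps the backtranslation loop from reinforcing its own errors.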


Related research

06/15/2021
Language Tags Matter for Zero-Shot Neural Machine Translation
Multilingual Neural Machine Translation (MNMT) has aroused widespread in...

10/01/2022
MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation
Large pre-trained language models have brought remarkable progress in NL...

11/08/2022
Conciseness: An Overlooked Language Task
We report on novel investigations into training models that make sentenc...

09/20/2023
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Generative Large Language Models (LLMs) have achieved remarkable advance...

09/29/2022
Bidirectional Language Models Are Also Few-shot Learners
Large language models such as GPT-3 (Brown et al., 2020) can perform arb...

09/09/2021
AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models
Despite their success in a variety of NLP tasks, pre-trained language mo...

05/29/2023
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
Recently, large-scale pre-trained Vision and Language (VL) models have s...
