Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

02/07/2023
by Amirkeivan Mohtashami, et al.

Learned metrics such as BLEURT have become widely used in recent years to evaluate the quality of machine translation systems. Training such metrics requires data that can be expensive and difficult to acquire, particularly for lower-resource languages. We show how knowledge can be distilled from Large Language Models (LLMs) to improve such learned metrics without human annotators: we create synthetic datasets that can be mixed into existing datasets, and doing so requires only a corpus of text in the target language. We show that this approach can improve the performance of a BLEURT-like model on lower-resource languages.
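The abstract does not say how the synthetic data is mixed into the existing data, so the following is only a minimal sketch of one plausible mixing step. Everything in it is an assumption made for illustration: the `RatedPair` record, the `mix_datasets` function, and the `synthetic_fraction` parameter are hypothetical names, and the LLM-based generation of the synthetic examples is not shown.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RatedPair:
    """Hypothetical training record for a learned metric (not from the paper)."""
    reference: str
    candidate: str
    score: float
    synthetic: bool = False  # True if the example was produced without human raters


def mix_datasets(human, synthetic, synthetic_fraction=0.5, seed=0):
    """Return a shuffled list with all human examples plus sampled synthetic ones.

    synthetic_fraction is the assumed target share of synthetic examples in
    the mixed dataset; it is capped by how many synthetic examples exist.
    """
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)
    # Solve n_syn / (n_human + n_syn) = f for n_syn.
    wanted = round(synthetic_fraction * len(human) / (1.0 - synthetic_fraction))
    n_syn = min(wanted, len(synthetic))
    mixed = list(human) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed


# Toy placeholder data, invented purely to exercise the function.
human = [RatedPair(f"ref {i}", f"cand {i}", 0.5) for i in range(4)]
synthetic = [RatedPair(f"syn ref {i}", f"syn cand {i}", 0.1, synthetic=True)
             for i in range(10)]
mixed = mix_datasets(human, synthetic, synthetic_fraction=0.5)
```

Keeping every human-rated example and sampling only the synthetic side is a design choice of this sketch, not something the abstract states; the actual mixing ratio and sampling scheme would need to be taken from the full paper.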


