Document-Level Machine Translation with Large Language Models

by Longyue Wang, et al.
Dublin City University

Large language models (LLMs) such as ChatGPT can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. Taking document-level machine translation (MT) as a testbed, this paper provides an in-depth evaluation of LLMs' ability in discourse modeling. The study focuses on three aspects: 1) Effects of Discourse-Aware Prompts, where we investigate the impact of different prompts on document-level translation quality and discourse phenomena; 2) Comparison of Translation Models, where we compare the translation performance of ChatGPT with commercial MT systems and advanced document-level MT methods; 3) Analysis of Discourse Modeling Abilities, where we further probe the discourse knowledge encoded in LLMs and examine the impact of training techniques on discourse modeling. By evaluating on a number of benchmarks, we surprisingly find that 1) leveraging its powerful long-text modeling capabilities, ChatGPT outperforms commercial MT systems in terms of human evaluation; 2) GPT-4 demonstrates a strong ability to explain discourse knowledge, even though it may select incorrect translation candidates in contrastive testing; 3) ChatGPT and GPT-4 demonstrate superior performance and show potential to become a new and promising paradigm for document-level translation. This work highlights the challenges and opportunities of discourse modeling for LLMs, which we hope can inspire the future design and evaluation of LLMs.
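The abstract refers to contrastive testing without describing it. In the usual form of this evaluation, a system is shown a correct translation alongside minimally different incorrect variants (for example, a wrong pronoun), and it passes an item only if it prefers the correct one. The Python sketch below computes that accuracy under stated assumptions: `score(source, candidate)` is a hypothetical model-scoring function, and `ContrastiveExample`, `contrastive_accuracy`, and the German toy items and scores are invented for illustration. It is not the paper's evaluation code.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ContrastiveExample:
    source: str                  # source text, including the discourse context
    correct: str                 # translation that resolves the phenomenon correctly
    contrastive: Sequence[str]   # minimally different, incorrect variants


def contrastive_accuracy(
    examples: Sequence[ContrastiveExample],
    score: Callable[[str, str], float],
) -> float:
    """Share of examples whose correct candidate strictly outscores every variant."""
    if not examples:
        raise ValueError("at least one example is required")
    hits = 0
    for ex in examples:
        good = score(ex.source, ex.correct)
        if all(good > score(ex.source, bad) for bad in ex.contrastive):
            hits += 1
    return hits / len(examples)


# Toy data (hypothetical): the English pronoun "it" must agree with the German
# noun's gender (der Apfel -> "Er", die Birne -> "Sie"). Scores are invented.
examples = [
    ContrastiveExample("I bought an apple. It is red.",
                       "Er ist rot.", ["Sie ist rot.", "Es ist rot."]),
    ContrastiveExample("I bought a pear. It is ripe.",
                       "Sie ist reif.", ["Er ist reif.", "Es ist reif."]),
]
TOY_SCORES = {
    "Er ist rot.": 0.9, "Sie ist rot.": 0.1, "Es ist rot.": 0.3,
    "Sie ist reif.": 0.4, "Er ist reif.": 0.6, "Es ist reif.": 0.2,
}


def toy_score(source: str, candidate: str) -> float:
    # Hypothetical stand-in for a model's score (e.g. a log-probability);
    # this toy version looks up a fixed value and ignores the source.
    return TOY_SCORES[candidate]


print(contrastive_accuracy(examples, toy_score))  # 0.5: the second example is missed
```

An LLM prompted to pick a candidate directly, rather than to score each one, would replace `score` with a selection step; the accuracy computation itself stays the same.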
