Machine Translation Pre-training for Data-to-Text Generation – A Case Study in Czech

04/05/2020
by Mihir Kale, et al.

While there is a large body of research on deep learning methods for text generation from structured data, almost all of it focuses on English. In this paper, we study the effectiveness of machine-translation-based pre-training for data-to-text generation in non-English languages. Since the structured data is generally expressed in English, generating text in other languages involves translation, transliteration and copying, all of which are already encoded in neural machine translation systems. Moreover, since data-to-text corpora are typically small, this task can benefit greatly from pre-training. Based on our experiments on Czech, a morphologically complex language, we find that pre-training lets us train end-to-end models with significantly improved performance, as judged by both automatic metrics and human evaluation. We also show that this approach has several desirable properties, including improved performance in low-data scenarios and robustness to unseen slot values.
