Shape of synth to come: Why we should use synthetic data for English surface realization

05/06/2020
by Henry Elder, et al.

The Surface Realization Shared Tasks of 2018 and 2019 were Natural Language Generation shared tasks with the goal of exploring approaches to surface realization from Universal-Dependency-like trees to surface strings for several languages. In the 2018 shared task there was very little difference in the absolute performance of systems trained with and without additional, synthetically created data, and a new rule prohibiting the use of synthetic data was introduced for the 2019 shared task. Contrary to the findings of the 2018 shared task, we show, in experiments on the English 2018 dataset, that the use of synthetic data can have a substantial positive effect: an improvement of almost 8 BLEU points for a previously state-of-the-art system. We analyse the effects of synthetic data, and we argue that its use should be encouraged rather than prohibited so that future research efforts continue to explore systems that can take advantage of such data.
