Search and Learn: Improving Semantic Coverage for Data-to-Text Generation

12/06/2021
by Shailza Jolly, et al.

Data-to-text generation systems aim to generate text descriptions based on input data (often represented in tabular form). A typical system requires a large number of training samples to learn the correspondence between tables and texts. However, large training sets are expensive to obtain, limiting the applicability of these approaches in real-world scenarios. In this work, we focus on few-shot data-to-text generation. We observe that, while fine-tuned pretrained language models may generate plausible sentences, they suffer from the low semantic coverage problem in the few-shot setting. In other words, important input slots tend to be missing in the generated text. To address this, we propose a search-and-learning approach that leverages pretrained language models but inserts the missing slots to improve semantic coverage. We further fine-tune our system on the search results to smooth out the search noise, yielding better-quality text and greatly improving inference efficiency. Experiments show that our model achieves high performance on the E2E and WikiBio datasets. In particular, we cover 98.35% of input slots, largely alleviating the low coverage problem.
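To make the "semantic coverage" notion concrete, here is a minimal sketch (not the authors' code; the slot names and matching rule are illustrative assumptions) of how one might detect which input slots are missing from a generated sentence:

```python
# Illustrative sketch only: semantic coverage as "which slot values
# from the input table appear verbatim in the generated text".
# Real systems typically use fuzzier matching; exact substring
# matching is an assumption made here for simplicity.

def missing_slots(slots, text):
    """Return the slot/value pairs whose value does not occur in the text."""
    text_lower = text.lower()
    return {k: v for k, v in slots.items() if v.lower() not in text_lower}

# Hypothetical E2E-style input table and a model output that drops a slot.
slots = {"name": "Aromi", "food": "Chinese", "area": "riverside"}
generated = "Aromi serves Chinese food."
print(missing_slots(slots, generated))  # {'area': 'riverside'}
```

Under this view, the paper's search step would edit the output to insert the detected missing values, and the learning step would fine-tune the model on those repaired outputs.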


