Generating Wikipedia Article Sections from Diverse Data Sources

12/29/2020
by Mingda Chen, et al.

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances, covering a broad range of topics, as well as a variety of flavors of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent and high-quality text but sometimes struggle with coherence.
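
To make the task concrete, below is a minimal sketch of what one instance in a dataset of this kind might look like: a target Wikipedia section paired with (attribute, value) records drawn from tables and metadata, linearized into a single source string for a sequence-to-sequence model. The class, field names, and example values are hypothetical illustrations, not WikiTableT's actual schema.

    from dataclasses import dataclass

    # Hypothetical instance format; field names are illustrative,
    # not the dataset's actual schema.
    @dataclass
    class WikiTableTInstance:
        article_title: str
        section_title: str
        records: list[tuple[str, str]]  # (attribute, value) pairs from tables/metadata
        target_text: str                # reference section text the model should generate

        def linearize(self) -> str:
            """Flatten the records into one source string, a common way to
            feed tabular data to a sequence-to-sequence model."""
            return " | ".join(f"{attr} : {val}" for attr, val in self.records)

    # Invented example for illustration only.
    example = WikiTableTInstance(
        article_title="Example City",
        section_title="Geography",
        records=[("country", "Exampleland"), ("elevation", "120 m")],
        target_text="Example City lies in Exampleland at an elevation of 120 m.",
    )
    print(example.linearize())  # country : Exampleland | elevation : 120 m

Linearizing records this way lets standard encoder-decoder models consume the tabular input directly; the separator tokens here are one plausible choice, not the paper's.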
