Generating Wikipedia Article Sections from Diverse Data Sources

by Mingda Chen, et al.

Datasets for data-to-text generation typically focus either on multi-domain, single-sentence generation or on single-domain, long-form generation. In this work, we create a large-scale dataset, WikiTableT, that pairs Wikipedia sections with their corresponding tabular data and various metadata. WikiTableT contains millions of instances, covering a broad range of topics, as well as a variety of flavors of generation tasks with different levels of flexibility. We benchmark several training and decoding strategies on WikiTableT. Our qualitative analysis shows that the best approaches can generate fluent, high-quality text, but they sometimes struggle with coherence.
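To make the task setup concrete, here is a minimal sketch of what a data-to-text instance of this kind might look like and how it could be linearized for a sequence-to-sequence model. The field names, special tokens, and helper function below are illustrative assumptions, not the dataset's actual schema:

```python
# Hypothetical WikiTableT-style instance: a target Wikipedia section
# paired with tabular records and article/section metadata.
# All field names here are illustrative, not the released schema.
instance = {
    "title": "Albert Einstein",        # article title (metadata)
    "section": "Early life",           # section heading (metadata)
    "records": [                       # tabular data as attribute-value pairs
        ("born", "14 March 1879"),
        ("birthplace", "Ulm, German Empire"),
    ],
    "target_text": "Einstein was born in Ulm ...",  # section to generate
}

def linearize(inst):
    """Flatten metadata and records into one input string, a common way
    to feed tabular data to a seq2seq data-to-text model. The special
    tokens (<title>, <section>, <attr>, <value>) are assumptions."""
    meta = f"<title> {inst['title']} <section> {inst['section']}"
    recs = " ".join(f"<attr> {k} <value> {v}" for k, v in inst["records"])
    return f"{meta} {recs}"
```

A model would then be trained to map `linearize(instance)` to `instance["target_text"]`; decoding strategies such as beam search or sampling operate on that conditional generation step.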




Towards Content Transfer through Grounded Text Generation

Recent work in neural generation has attracted significant interest in c...

WikiGraphs: A Wikipedia Text - Knowledge Graph Paired Dataset

We present a new dataset of Wikipedia articles each paired with a knowle...

Long and Diverse Text Generation with Planning-based Hierarchical Variational Model

Existing neural methods for data-to-text generation are still struggling...

FRUIT: Faithfully Reflecting Updated Information in Text

Textual knowledge bases such as Wikipedia require considerable effort to...

A Generative Approach to Titling and Clustering Wikipedia Sections

We evaluate the performance of transformer encoders with various decoder...

Massive-scale Decoding for Text Generation using Lattices

Neural text generation models like those used for summarization and tran...

PatentTransformer-2: Controlling Patent Text Generation by Structural Metadata

PatentTransformer is our codename for patent text generation based on Tr...