Scalable Micro-planned Generation of Discourse from Structured Data

by Anirban Laha, et al.

We present a framework for generating natural language descriptions from structured data such as tables. Because we want an approach that scales and adapts easily to new domains, our system, unlike existing related systems, does not require parallel data; it relies instead on monolingual corpora and basic, easily accessible NLP tools. The system employs a three-stage pipeline that (i) converts entries in the structured data to a canonical form, (ii) generates a simple sentence for each atomic entry in the canonicalized representation, and (iii) combines the sentences into a coherent, fluent, and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables show that our system outperforms existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other data types such as knowledge graphs and key-value dictionaries.
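To make the three-stage pipeline concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the function names (`canonicalize`, `simple_sentence`, `combine`), the triple representation, the fixed sentence template, and the caller-supplied pronoun are all assumptions made for illustration. The abstract says the system relies on monolingual corpora and basic NLP tools, which this sketch replaces with hard-coded string templates. It also assumes that every triple from a row shares the same subject.

```python
from typing import Dict, List, Tuple

# Hypothetical canonical form: (subject, attribute, value).
Triple = Tuple[str, str, str]


def canonicalize(row: Dict[str, str], subject_key: str) -> List[Triple]:
    """Stage (i): turn one table row into atomic triples."""
    subject = row[subject_key]
    return [(subject, key, value) for key, value in row.items() if key != subject_key]


def simple_sentence(triple: Triple) -> str:
    """Stage (ii): one simple sentence per triple (toy template, an assumption)."""
    subject, attribute, value = triple
    return f"{subject}'s {attribute} is {value}."


def combine(triples: List[Triple], pronoun: str = "Its") -> str:
    """Stage (iii): toy sentence compounding and co-reference replacement.

    The first triple keeps the full subject. The remaining triples are
    joined with "and" (compounding), and the repeated subject is replaced
    by the given possessive pronoun (co-reference replacement).
    """
    if not triples:
        return ""
    first, rest = triples[0], triples[1:]
    paragraph = [simple_sentence(first)]
    if rest:
        clauses = [f"{pronoun.lower()} {attr} is {val}" for _, attr, val in rest]
        compound = " and ".join(clauses)
        paragraph.append(compound[0].upper() + compound[1:] + ".")
    return " ".join(paragraph)


row = {
    "name": "Ada Lovelace",
    "birthplace": "London",
    "field": "mathematics",
    "nationality": "British",
}
triples = canonicalize(row, subject_key="name")
print(combine(triples, pronoun="Her"))
# Ada Lovelace's birthplace is London. Her field is mathematics and her nationality is British.
```

Keeping the stages as separate functions mirrors the pipeline structure described above: under this assumption, a key-value dictionary or knowledge-graph edge list would only need a different `canonicalize` step, while the later stages stay unchanged.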




