Scalable Micro-planned Generation of Discourse from Structured Data

10/05/2018
by   Anirban Laha, et al.

We present a framework for generating natural language descriptions from structured data such as tables. Motivated by the need for an approach that is scalable and easily adaptable to newer domains, our system, unlike existing related systems, does not require parallel data; it relies instead on monolingual corpora and basic NLP tools, which are easily accessible. The system employs a three-stage pipeline that: (i) converts entries in the structured data to a canonical form, (ii) generates a simple sentence for each atomic entry in the canonicalized representation, and (iii) combines the sentences to produce a coherent, fluent and adequate paragraph description through sentence compounding and co-reference replacement modules. Experiments on a benchmark mixed-domain dataset curated for paragraph description from tables reveal the superiority of our system over existing data-to-text approaches. We also demonstrate the robustness of our system in accepting other data types such as knowledge graphs and key-value dictionaries.
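To make the three stages concrete, the following is a minimal toy sketch of the pipeline's data flow. It is not the system described above. It assumes the canonical form is a (subject, relation, object) triple, uses a fixed possessive template in place of the sentence generator learned from monolingual corpora, and applies naive pairwise compounding and pronoun substitution. All function names (`canonicalize`, `simple_sentence`, `combine`, `describe`) are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumption: a canonical entry is a (subject, relation, object) triple.
Triple = Tuple[str, str, str]


def canonicalize(row: Dict[str, str], subject_key: str) -> List[Triple]:
    """Stage (i): turn one table row into triples about the row's subject."""
    subject = row[subject_key]
    return [
        (subject, key.replace("_", " "), value)
        for key, value in row.items()
        if key != subject_key
    ]


def simple_sentence(triple: Triple) -> str:
    """Stage (ii): one simple sentence per triple (fixed template, an assumption)."""
    subject, relation, obj = triple
    return f"{subject}'s {relation} is {obj}."


def combine(subject: str, sentences: List[str]) -> str:
    """Stage (iii): naive co-reference replacement, then pairwise compounding."""
    if not sentences:
        return ""
    possessive = f"{subject}'s "
    # Co-reference replacement: after the first mention, use the pronoun "Its".
    resolved = [sentences[0]]
    for sentence in sentences[1:]:
        if sentence.startswith(possessive):
            sentence = "Its " + sentence[len(possessive):]
        resolved.append(sentence)
    # Compounding: join consecutive sentences two at a time with "and".
    paragraph = []
    for start in range(0, len(resolved), 2):
        pair = resolved[start:start + 2]
        if len(pair) == 2:
            first, second = pair
            if second.startswith("Its "):
                second = "its " + second[len("Its "):]
            paragraph.append(f"{first[:-1]} and {second}")
        else:
            paragraph.append(pair[0])
    return " ".join(paragraph)


def describe(row: Dict[str, str], subject_key: str) -> str:
    """Run the three stages end to end on a single row."""
    triples = canonicalize(row, subject_key)
    sentences = [simple_sentence(t) for t in triples]
    return combine(row[subject_key], sentences)


row = {
    "country": "France",
    "capital": "Paris",
    "currency": "the euro",
    "official_language": "French",
}
print(describe(row, "country"))
```

The same `describe` function would accept a key-value dictionary directly, and a knowledge graph would skip stage (i) if its edges were already triples. That mirrors the claim about other input types only at the level of this sketch.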


