DART: Open-Domain Structured Data Record to Text Generation

07/06/2020
by Dragomir Radev, et al.

We introduce DART, a large dataset for open-domain structured data record to text generation. We represent each structured data record input as a set of RDF entity-relation triples, a format widely used for knowledge representation and semantic description. DART consists of 82,191 examples across different domains. Each input is a semantic RDF triple set derived from data records in tables and the tree ontology of the schema, and is annotated with sentence descriptions that cover all facts in the triple set. This hierarchical, structured format, together with its open-domain nature, differentiates DART from existing table-to-text corpora. We analyze DART with several state-of-the-art text generation models and show that it introduces new and interesting challenges compared to existing datasets. Furthermore, we demonstrate that fine-tuning pretrained language models on DART facilitates out-of-domain generalization on the WebNLG 2017 dataset. DART is available at https://github.com/Yale-LILY/dart.
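To make the input format concrete, here is one way a triple-set record might be held in memory and linearized into a flat string for fine-tuning a pretrained sequence-to-sequence model. This is a minimal sketch under stated assumptions: the `DartExample` class, its field names, the `<H>`/`<R>`/`<T>` separator tokens, and the helper functions are hypothetical and are not taken from the DART release. Consult the repository for the actual file format. The toy record is invented for illustration and does not come from the dataset.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# An RDF-style triple: (subject, relation, object).
Triple = Tuple[str, str, str]


@dataclass
class DartExample:
    """Hypothetical container for one example; field names are illustrative only."""
    tripleset: List[Triple]
    annotations: List[str] = field(default_factory=list)


def linearize(tripleset: List[Triple]) -> str:
    """Flatten a triple set into a single input string for a seq2seq model.

    The <H>/<R>/<T> markers are an assumed convention, not DART's official format.
    """
    return " ".join(f"<H> {s} <R> {r} <T> {o}" for s, r, o in tripleset)


def mentions_all_entities(tripleset: List[Triple], text: str) -> bool:
    """Crude proxy for fact coverage: every subject and object string appears in the text.

    Only surface strings are checked; whether each relation is expressed is not verified.
    """
    lowered = text.lower()
    return all(s.lower() in lowered and o.lower() in lowered for s, _, o in tripleset)


def to_training_pairs(example: DartExample) -> List[Tuple[str, str]]:
    """Pair the linearized triple set with each reference sentence."""
    source = linearize(example.tripleset)
    return [(source, text) for text in example.annotations]


# Toy record invented for illustration; it is not taken from DART.
example = DartExample(
    tripleset=[
        ("Alan Turing", "BIRTH_PLACE", "London"),
        ("Alan Turing", "FIELD", "computer science"),
    ],
    annotations=["Alan Turing, born in London, worked in computer science."],
)

for source, target in to_training_pairs(example):
    print(source)
    # <H> Alan Turing <R> BIRTH_PLACE <T> London <H> Alan Turing <R> FIELD <T> computer science
    print(mentions_all_entities(example.tripleset, target))  # True
```

Linearizing into one string lets an off-the-shelf encoder-decoder consume the triples without graph-specific layers; the trade-off is that the model must infer triple boundaries from the marker tokens. The entity check is only a surface heuristic and cannot confirm that every fact in the triple set is actually described.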
