WDV: A Broad Data Verbalisation Dataset Built from Wikidata

05/05/2022
by Gabriel Amaral, et al.

Data verbalisation is an important task in natural language processing, as there is great benefit in transforming our abundant structured and semi-structured data into human-readable formats. Verbalising Knowledge Graph (KG) data focuses on converting interconnected triple-based claims, each formed of a subject, a predicate, and an object, into text. Although verbalisation datasets exist for some KGs, there are still gaps in their fitness for use in many scenarios. This is especially true for Wikidata, where available datasets either loosely couple claim sets with textual information or focus heavily on predicates around biographies, cities, and countries. To address these gaps, we propose WDV, a large KG claim verbalisation dataset built from Wikidata, with a tight coupling between triples and text, covering a wide variety of entities and predicates. We also evaluate the quality of our verbalisations through a reusable workflow for measuring human-centred fluency and adequacy scores. Our data and code are openly available in the hope of furthering research on KG verbalisation.
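
To make the idea of a triple-based claim concrete, here is a minimal sketch in Python. It is not the method used to build WDV: the `Claim` class, the `verbalise` function, and the example labels are hypothetical and chosen only for illustration. The sketch simply joins the subject, predicate, and object labels into one sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """A single KG claim: subject, predicate, and object labels (hypothetical type)."""
    subject: str
    predicate: str
    object: str


def verbalise(claim: Claim) -> str:
    """Naive template baseline: concatenate the three labels into a sentence.

    This is an illustrative assumption, not the WDV verbalisation model.
    """
    return f"{claim.subject} {claim.predicate} {claim.object}."


# Illustrative labels only; not taken from the dataset itself.
claim = Claim(subject="Douglas Adams", predicate="educated at", object="St John's College")
sentence = verbalise(claim)
print(sentence)
```

The resulting sentence contains all three parts of the claim but is not grammatical English, which suggests why verbalisations are judged on both fluency and adequacy rather than on content coverage alone.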

