Partially-Aligned Data-to-Text Generation with Distant Supervision

10/03/2020
by Zihao Fu, et al.

The Data-to-Text task aims to generate human-readable text that describes given structured data, making the data more interpretable. However, the typical generation task is confined to a few particular domains because it requires well-aligned data, which is difficult and expensive to obtain. Using partially-aligned data is an alternative way of solving the dataset scarcity problem. This kind of data is much easier to obtain since it can be produced automatically. However, using it induces the over-generation problem, which poses difficulties for existing models: they tend to add unrelated excerpts during the generation procedure. To make effective use of automatically annotated partially-aligned datasets, we extend the traditional generation task to a refined task called Partially-Aligned Data-to-Text Generation (PADTG), which is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains. To tackle this new task, we propose a novel distant supervision generation framework. It first estimates the input data's supportiveness for each target word with an estimator, and then applies a supportiveness adaptor and a rebalanced beam search to curb the over-generation problem in the training and generation phases, respectively. We also contribute a partially-aligned dataset, built by sampling sentences from Wikipedia and automatically extracting corresponding KB triples for each sentence from Wikidata. (The data and source code of this paper can be obtained from https://github.com/fuzihaofzh/distant_supervision_nlg.) The experimental results show that our framework outperforms all baseline models and verify the feasibility of utilizing partially-aligned data.
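To make the idea of supportiveness-based rebalancing concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the estimator or the rebalanced beam search are defined, so the lexical-overlap estimator, the function names (`toy_supportiveness`, `rebalanced_score`, `rerank`), the floor value 0.1, and the weight `alpha` are all assumptions made for illustration. The sketch only shows how a per-word supportiveness score could be combined with a model's log-probability so that candidates containing unsupported words rank lower.

```python
import math


def toy_supportiveness(word, triples, floor=0.1):
    """Hypothetical stand-in for a learned estimator.

    Returns 1.0 if the word appears in any element of the input triples,
    otherwise a small floor value (an assumed constant).
    """
    tokens = {
        tok.lower()
        for triple in triples
        for part in triple
        for tok in part.split()
    }
    return 1.0 if word.lower() in tokens else floor


def rebalanced_score(log_prob, words, triples, alpha=1.0):
    """Combine the model log-probability with a log-supportiveness term.

    Unsupported words contribute log(floor) < 0, so they lower the score.
    """
    support = sum(math.log(toy_supportiveness(w, triples)) for w in words)
    return log_prob + alpha * support


def rerank(candidates, triples, alpha=1.0):
    """Sort (log_prob, words) candidates by rebalanced score, best first."""
    return sorted(
        candidates,
        key=lambda c: rebalanced_score(c[0], c[1], triples, alpha),
        reverse=True,
    )


# Toy input: one KB triple and two candidate outputs with made-up log-probs.
triples = [("Marie Curie", "born in", "Warsaw")]
candidates = [
    (-1.0, ["Marie", "Curie", "born", "in", "Warsaw", "famous", "physicist"]),
    (-1.5, ["Marie", "Curie", "born", "in", "Warsaw"]),
]
best = rerank(candidates, triples)[0]
print(" ".join(best[1]))
```

In this toy setup the first candidate has the higher raw log-probability, but its two unsupported words ("famous", "physicist") pull its rebalanced score below that of the shorter candidate. A pure lexical-overlap estimator like this one would also penalize ordinary function words, which is one reason a learned estimator is preferable in practice.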

