Bootstrapping Generators from Noisy Data

04/17/2018
by Laura Perez-Beltrachini, et al.

A core step in statistical data-to-text generation concerns learning correspondences between structured data representations (e.g., facts in a database) and associated texts. In this paper we aim to bootstrap generators from large-scale datasets where the data (e.g., DBPedia facts) and related texts (e.g., Wikipedia abstracts) are only loosely aligned. We tackle this challenging task by introducing a special-purpose content selection mechanism: we use multi-instance learning to automatically discover correspondences between data and text pairs and show how these can be used to enhance the content signal while training an encoder-decoder architecture. Experimental results demonstrate that models trained with content-specific objectives improve upon a vanilla encoder-decoder that relies solely on soft attention.
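To make the multi-instance learning idea concrete, the following is a minimal sketch, not the authors' actual model: under the multi-instance assumption, a fact counts as expressed if at least one word in the text aligns with it, so per-fact word similarities are max-pooled and then averaged over facts to score a loose (data, text) pair. The function name, the dimensions, and the random embeddings standing in for learned encoders are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mil_alignment_score(fact_vecs, word_vecs):
    """Score a loosely aligned (data, text) pair under the
    multi-instance assumption.

    fact_vecs: (num_facts, dim)  one embedding per property-value fact
    word_vecs: (num_words, dim)  one embedding per word of the text

    A fact is taken to be "expressed" if at least one word aligns
    with it, so we max-pool word similarities per fact and average
    the result over facts.
    """
    # Pairwise cosine similarities via broadcasting: (num_facts, num_words)
    sims = F.cosine_similarity(
        fact_vecs.unsqueeze(1), word_vecs.unsqueeze(0), dim=-1
    )
    per_fact = sims.max(dim=1).values  # best-matching word for each fact
    return per_fact.mean()

# Toy usage: random vectors stand in for learned fact/word encoders.
facts = torch.randn(5, 64)   # e.g., 5 DBPedia property-value facts
words = torch.randn(12, 64)  # e.g., a 12-word abstract sentence
print(mil_alignment_score(facts, words))
```

Scores of this kind could then serve as the content signal the abstract describes, indicating which facts a noisy text pair actually verbalises before the encoder-decoder is trained with content-specific objectives.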



Related research

05/19/2023 · DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion
09/10/2019 · Select and Attend: Towards Controllable Content Selection in Text Generation
07/21/2021 · CL4AC: A Contrastive Loss for Audio Captioning
07/02/2020 · Fact-based Text Editing
10/31/2018 · Generating Texts with Integer Linear Programming
10/03/2020 · Partially-Aligned Data-to-Text Generation with Distant Supervision
