Substance over Style: Document-Level Targeted Content Transfer

10/16/2020
by Allison Hegel, et al.

Existing language models excel at writing from scratch, but many real-world scenarios require rewriting an existing document to fit a set of constraints. Although sentence-level rewriting has been fairly well studied, little work has addressed the challenge of rewriting an entire document coherently. In this work, we introduce the task of document-level targeted content transfer and address it in the recipe domain, with a recipe as the document and a dietary restriction (such as vegan or dairy-free) as the targeted constraint. We propose a novel model for this task based on the generative pre-trained language model GPT-2 and train it on a large number of roughly aligned recipe pairs (https://github.com/microsoft/document-level-targeted-content-transfer). Both automatic and human evaluations show that our model outperforms existing methods, generating coherent and diverse rewrites that obey the constraint while remaining close to the original document. Finally, we analyze our model's rewrites to assess progress toward the goal of making language generation more attuned to constraints that are substantive rather than stylistic.
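
The abstract does not specify how constraint adherence or closeness to the original is scored automatically. As a hedged illustration only, and not the authors' evaluation, the sketch below checks a rewrite against a small ingredient blocklist and measures word overlap with the source recipe. The names `FORBIDDEN`, `violations`, and `overlap`, the blocklist contents, and the use of Jaccard similarity are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny blocklists (assumption for illustration);
# a real evaluation would need far more complete ingredient lexicons.
FORBIDDEN = {
    "vegan": {"butter", "milk", "cheese", "cream", "egg", "eggs", "honey", "chicken", "beef"},
    "dairy-free": {"butter", "milk", "cheese", "cream", "yogurt"},
}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens (hyphenated words kept together)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def violations(text: str, constraint: str) -> set[str]:
    """Forbidden ingredient words for `constraint` that still appear in `text`."""
    return set(tokens(text)) & FORBIDDEN[constraint]


def overlap(source: str, rewrite: str) -> float:
    """Jaccard similarity of word sets: a crude proxy for staying close to the source."""
    a, b = set(tokens(source)), set(tokens(rewrite))
    return len(a & b) / len(a | b) if a | b else 1.0


if __name__ == "__main__":
    source = "Melt the butter and stir in the cheese."
    rewrite = "Melt the olive oil and stir in the nutritional yeast."
    print(violations(source, "dairy-free"))   # the source breaks the constraint
    print(violations(rewrite, "dairy-free"))  # the rewrite does not
    print(round(overlap(source, rewrite), 3))
```

A word-level blocklist is only a rough signal: it would wrongly flag a valid substitution such as "oat milk" and miss forbidden ingredients it does not list, which is one reason human evaluation remains necessary alongside automatic checks.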
