Model Criticism for Long-Form Text Generation

10/16/2022
by Yuntian Deng, et al.

Language models can generate highly fluent text, but it remains unclear whether their output retains coherent high-level structure (e.g., story progression). Here, we propose applying a statistical tool, model criticism in latent space, to evaluate the high-level structure of generated text. Model criticism compares the distributions of real and generated data in a latent space obtained from an assumed generative process. Different generative processes expose specific failure modes of the underlying model. We run experiments on three representative aspects of high-level discourse (coherence, coreference, and topicality) and find that transformer-based language models capture topical structure but have a harder time maintaining structural coherence or modeling coreference.
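To make the idea of comparing real and generated data in a latent space concrete, here is a minimal sketch, not the authors' implementation. It assumes that each document has already been mapped to a sequence of discrete latent labels (the labels `intro`, `body`, `end` and all toy sequences below are hypothetical), and it assumes a first-order Markov chain as the generative process over those labels. The chain is fit on latents from real text, and the average negative log-likelihood of generated latents is then compared against that of real latents; a large gap would suggest the generator's high-level structure differs from real data.

```python
import math
from collections import defaultdict

START, END = "<s>", "</s>"

def fit_markov_prior(sequences, states, alpha=1.0):
    """Fit a first-order Markov chain over latent states (add-alpha smoothing).

    This chain is an assumed stand-in for the generative process over latents.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        prev = START
        for z in list(seq) + [END]:
            counts[prev][z] += 1.0
            prev = z
    targets = list(states) + [END]
    prior = {}
    for prev in [START] + list(states):
        total = sum(counts[prev][z] for z in targets) + alpha * len(targets)
        prior[prev] = {z: (counts[prev][z] + alpha) / total for z in targets}
    return prior

def latent_nll(seq, prior):
    """Negative log-likelihood of one latent sequence under the fitted chain."""
    prev, nll = START, 0.0
    for z in list(seq) + [END]:
        nll -= math.log(prior[prev][z])
        prev = z
    return nll

def mean_latent_nll(sequences, prior):
    """Average per-sequence NLL; higher means a worse fit to real structure."""
    return sum(latent_nll(s, prior) for s in sequences) / len(sequences)

# Hypothetical toy latents: section labels for real vs. generated documents.
STATES = ["intro", "body", "end"]
real_latents = [
    ["intro", "body", "body", "end"],
    ["intro", "body", "end"],
    ["intro", "body", "body", "body", "end"],
]
generated_latents = [
    ["body", "intro", "end", "body"],
    ["end", "end", "intro"],
    ["intro", "end", "body", "intro"],
]

prior = fit_markov_prior(real_latents, STATES)
real_score = mean_latent_nll(real_latents, prior)
generated_score = mean_latent_nll(generated_latents, prior)
print(f"real: {real_score:.2f}  generated: {generated_score:.2f}")
```

In practice the real-data score should be computed on held-out documents rather than the ones used to fit the chain, and the step that maps text to latents (here simply assumed) is where the choice of generative process determines which failure mode is being probed.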


