Semantics of the Unwritten

04/05/2020
by He Bai, et al.

The semantics of a text is manifested not only by what is read, but also by what is not read. In this article, we study how such implicit "not read" signals, namely end-of-paragraph (EOP) and end-of-sequence (EOS) markers, affect the quality of text generation. Transformer-based pretrained language models (LMs) have demonstrated the ability to generate long continuations of good quality, and they give us a platform to demonstrate, for the first time, that paragraph layout and text endings are also important components of human writing. Specifically, we find that pretrained LMs generate better continuations when they learn to produce the end-of-paragraph (EOP) token during fine-tuning. Experimental results on English story generation show that EOP leads to a higher BLEU score and lower EOS perplexity. To further investigate the relationship between text endings and EOP, we conduct experiments on a self-collected Chinese essay dataset with Chinese-GPT2, a character-level LM pretrained without paragraph breaks or EOS markers. The results show that Chinese-GPT2 generates better essay endings when paragraph information is available. Experiments on both English stories and Chinese essays demonstrate that learning to end paragraphs benefits continuation generation with pretrained LMs.
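Since the approach relies on off-the-shelf pretrained LMs, the key change is in data preparation: marking paragraph boundaries with an explicit EOP token before fine-tuning. The sketch below illustrates one way this could be set up with the HuggingFace transformers GPT-2 interface; the <EOP> token string, the mark_boundaries helper, and the example text are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch (assumed setup, not the paper's code): insert explicit
# end-of-paragraph (<EOP>) and end-of-sequence tokens into the training
# text before fine-tuning GPT-2 as an ordinary causal LM.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 already has <|endoftext|> as its EOS token; <EOP> is a new
# special token, so the embedding matrix must be resized to include it.
tokenizer.add_special_tokens({"additional_special_tokens": ["<EOP>"]})
model.resize_token_embeddings(len(tokenizer))

def mark_boundaries(story: str) -> str:
    """Join paragraphs with an explicit <EOP> marker and close with EOS."""
    paragraphs = [p.strip() for p in story.split("\n\n") if p.strip()]
    return "<EOP>".join(paragraphs) + tokenizer.eos_token

example = "First paragraph of a story.\n\nSecond paragraph.\n\nThe ending."
ids = tokenizer(mark_boundaries(example), return_tensors="pt").input_ids

# Standard causal-LM fine-tuning objective: the model learns to predict
# <EOP> and <|endoftext|> at paragraph and story boundaries along with
# the ordinary tokens.
loss = model(ids, labels=ids).loss
```

With boundaries marked this way, EOS perplexity can be obtained from the same causal-LM loss by restricting it to the positions where the EOS token appears.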

