TegFormer: Topic-to-Essay Generation with Good Topic Coverage and High Text Coherence

12/27/2022
by Wang Qi, et al.

Creating an essay based on a few given topics is a challenging NLP task. Although several effective methods for this problem, topic-to-essay generation, have appeared recently, there is still much room for improvement, especially in the coverage of the given topics and the coherence of the generated text. In this paper, we propose a novel approach called TegFormer, which utilizes the Transformer architecture where the encoder is enriched with domain-specific contexts and the decoder is enhanced by a large-scale pre-trained language model. Specifically, a Topic-Extension layer that captures the interaction between the given topics and their domain-specific contexts is plugged into the encoder. Since the given topics are usually concise and sparse, this additional layer brings in more topic-related semantics to facilitate the subsequent natural language generation. Moreover, an Embedding-Fusion module that combines the domain-specific word embeddings learnt from the given corpus with the general-purpose word embeddings provided by a GPT-2 model pre-trained on massive text data is integrated into the decoder. Since GPT-2 is pre-trained at a much larger scale, it carries far more implicit linguistic knowledge, which helps the decoder produce more grammatical and readable text. Extensive experiments show that the text generated by TegFormer achieves better topic coverage and higher text coherence than that produced by SOTA topic-to-essay techniques, according to both automatic and human evaluations. Ablation studies further reveal that both the Topic-Extension layer and the Embedding-Fusion module contribute substantially to TegFormer's performance advantage.
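
The abstract only describes the two new components at a high level. As a concrete illustration, the sketch below shows one plausible PyTorch realization: a Topic-Extension layer that lets the sparse topic embeddings attend over domain-specific context embeddings, and an Embedding-Fusion module that gates between corpus-trained and GPT-2 token embeddings for the decoder. The class names, the cross-attention formulation, and the sigmoid gate are assumptions made for this sketch; the paper's exact design may differ.

```python
# Illustrative sketch only (not the authors' released code), assuming a standard
# PyTorch Transformer encoder-decoder stack.
import torch
import torch.nn as nn


class TopicExtensionLayer(nn.Module):
    """Enrich sparse topic embeddings with domain-specific context embeddings
    via cross-attention (assumed mechanism for illustration)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, topic_emb: torch.Tensor, context_emb: torch.Tensor) -> torch.Tensor:
        # topic_emb:   (batch, n_topics, d_model) -- embeddings of the given topics
        # context_emb: (batch, n_ctx, d_model)    -- embeddings of domain-specific context words
        attended, _ = self.cross_attn(query=topic_emb, key=context_emb, value=context_emb)
        # Residual connection preserves the original topic information.
        return self.norm(topic_emb + attended)


class EmbeddingFusion(nn.Module):
    """Gated fusion of domain-specific (corpus-trained) and general-purpose
    (pre-trained GPT-2) token embeddings used as decoder input."""

    def __init__(self, d_domain: int, d_gpt2: int, d_model: int):
        super().__init__()
        self.proj_domain = nn.Linear(d_domain, d_model)
        self.proj_gpt2 = nn.Linear(d_gpt2, d_model)
        self.gate = nn.Linear(2 * d_model, d_model)

    def forward(self, domain_emb: torch.Tensor, gpt2_emb: torch.Tensor) -> torch.Tensor:
        h_d = self.proj_domain(domain_emb)
        h_g = self.proj_gpt2(gpt2_emb)
        # Element-wise gate decides, per dimension, how much weight each source gets.
        g = torch.sigmoid(self.gate(torch.cat([h_d, h_g], dim=-1)))
        return g * h_d + (1.0 - g) * h_g
```

In this reading, the fused embeddings feed the decoder while the extended topic representations serve as the encoder memory; how the two are wired into the full Transformer is detailed in the paper itself.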


