TWAG: A Topic-Guided Wikipedia Abstract Generator

06/29/2021
by Fangwei Zhu, et al.

Wikipedia abstract generation aims to distill a Wikipedia abstract from web sources and has met significant success by adopting multi-document summarization techniques. However, previous works generally treat the abstract as plain text, ignoring the fact that it describes a particular entity and can be decomposed into different topics. In this paper, we propose TWAG, a two-stage model that guides abstract generation with topical information. First, we detect the topic of each input paragraph with a classifier trained on existing Wikipedia articles, which divides the input documents into different topics. Then, we predict the topic distribution of each abstract sentence and decode the sentence from topic-aware representations with a Pointer-Generator network. We evaluate our model on the WikiCatSum dataset, and the results show that TWAG outperforms various existing baselines and is capable of generating comprehensive abstracts. Our code and dataset can be accessed at <https://github.com/THU-KEG/TWAG>.
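The first stage described above, assigning each input paragraph to a topic and grouping the paragraphs accordingly, can be illustrated with a minimal sketch. This is not the TWAG implementation: the function names, the topic labels, and the keyword-based `keyword_topic_classifier` are all hypothetical stand-ins for the trained classifier mentioned in the abstract, and the second stage (topic distribution prediction and Pointer-Generator decoding) is not shown.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def keyword_topic_classifier(paragraph: str) -> str:
    """Hypothetical stand-in for a trained topic classifier.

    A real system would use a model trained on existing Wikipedia
    articles; here a keyword lookup is assumed purely for illustration.
    """
    keywords = {
        "career": ("career", "joined", "appointed"),
        "early life": ("born", "childhood", "grew up"),
    }
    text = paragraph.lower()
    for topic, words in keywords.items():
        if any(word in text for word in words):
            return topic
    return "other"


def group_by_topic(
    paragraphs: List[str], classify: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Divide input paragraphs into topic groups using the given classifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for paragraph in paragraphs:
        groups[classify(paragraph)].append(paragraph)
    return dict(groups)
```

Calling `group_by_topic(paragraphs, keyword_topic_classifier)` returns a dictionary mapping each topic label to the paragraphs assigned to it. A later stage could then build topic-aware representations from each group.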


research
08/20/2016

Topic Sensitive Neural Headline Generation

Neural models have recently been used in text summarization including he...
research
01/04/2016

Scalable Models for Computing Hierarchies in Information Networks

Information hierarchies are organizational structures that are often used to...
research
01/30/2018

Generating Wikipedia by Summarizing Long Sequences

We show that generating English Wikipedia articles can be approached as ...
research
08/01/2023

CoSMo: A constructor specification language for Abstract Wikipedia's content selection process

Representing snippets of information abstractly is a task that needs to ...
research
10/14/2021

Hindsight: Posterior-guided training of retrievers for improved open-ended generation

Many text generation systems benefit from using a retriever to retrieve ...
research
09/23/2020

Crosslingual Topic Modeling with WikiPDA

We present Wikipedia-based Polyglot Dirichlet Allocation (WikiPDA), a cr...
research
09/02/2013

Scalable Probabilistic Entity-Topic Modeling

We present an LDA approach to entity disambiguation. Each topic is assoc...
