Topic Modeling Based Extractive Text Summarization

Text summarization is an approach for identifying the important information in text documents. It aims to generate a shorter version of a source text that includes only its relevant and salient information. In this paper, we propose a novel method that summarizes a text document by clustering its contents according to latent topics produced by topic modeling techniques and then generating an extractive summary for each of the identified text clusters. These extractive sub-summaries are then combined into a single summary of the source document. We evaluate our approach on the less commonly used and challenging WikiHow dataset, which differs from the news datasets typically used for text summarization. The well-known news datasets present their most important information in the first few lines of their source texts, which makes summarizing them a less challenging task than summarizing the WikiHow dataset. In contrast, the documents in the WikiHow dataset are written in a generalized style and have lower abstractedness and a higher compression ratio, thus posing a greater challenge for summary generation. Many current state-of-the-art text summarization techniques tend to eliminate important information present in source documents in favor of brevity. Our proposed technique aims to capture all the varied information present in source documents. Although the dataset proved challenging, extensive tests within our experimental setup show that our model produces encouraging ROUGE results and summaries when compared with other published extractive and abstractive text summarization models.
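The pipeline outlined in the abstract (group content by topic, summarize each group extractively, combine the sub-summaries) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not name the topic model or the extractive scorer, so the sketch assumes a hypothetical keyword-overlap function `toy_assign_topic` as a stand-in for a real topic model such as LDA, and a simple word-frequency score for sentence selection. All function names and the example sentences are illustrative.

```python
import re
from collections import Counter, defaultdict


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def toy_assign_topic(sentence, topic_keywords):
    """Hypothetical stand-in for a topic model: return the index of the
    keyword set that shares the most words with the sentence."""
    words = set(tokenize(sentence))
    overlaps = [len(words & keywords) for keywords in topic_keywords]
    return overlaps.index(max(overlaps))


def summarize_cluster(sentences, k=1):
    """Assumed extractive scorer: keep the k sentences with the highest
    average in-cluster word frequency, in their original order."""
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s):
        tokens = tokenize(s)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


def topic_cluster_summary(sentences, assign_topic, k=1):
    """Cluster sentences by topic, summarize each cluster, and
    concatenate the sub-summaries in topic order."""
    clusters = defaultdict(list)
    for s in sentences:
        clusters[assign_topic(s)].append(s)
    summary = []
    for topic in sorted(clusters):
        summary.extend(summarize_cluster(clusters[topic], k))
    return summary


# Illustrative input: two made-up topics (baking, car washing).
sentences = [
    "Preheat the oven and grease the pan.",
    "Bake the cake in the oven until golden.",
    "Wash the car with soap and water.",
    "Dry the car with a towel.",
]
keywords = [{"oven", "cake", "bake", "pan"}, {"car", "wash", "soap", "towel"}]
summary = topic_cluster_summary(
    sentences, lambda s: toy_assign_topic(s, keywords), k=1)
print(summary)
```

Because each topic cluster contributes at least one sentence, the combined summary covers every topic found in the document, which is the property the abstract contrasts with methods that drop information for brevity. In practice the `assign_topic` callable would wrap a trained topic model rather than fixed keyword sets.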

