Automatic Text Summarization Approaches to Speed up Topic Model Learning Process

03/20/2017
by Mohamed Morchid, et al.

The number of documents available on the Internet grows every day. For this reason, processing this amount of information effectively and expressively has become a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulties in using topic models on huge data collections is the material resources (CPU time and memory) required for model estimation. To deal with this issue, we propose to build topic spaces from summarized documents. In this paper, we present a study of topic space representation in the context of big data. The behavior of the topic space representation is analyzed on different languages. Experiments show that topic spaces estimated from text summaries are as relevant as those estimated from the complete documents. The real advantage of such an approach is the gain in processing time: we show that the processing time can be drastically reduced by using summarized documents (by more than 60% in general). Finally, this study points out the differences between thematic representations of documents depending on the targeted language, such as English or Latin languages.
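The idea of estimating a topic space from summaries rather than full documents can be illustrated with a minimal sketch. The abstract does not say which summarization systems the authors use, so the frequency-based extractive summarizer below is only a stand-in, and the names `summarize`, `token_reduction`, and the `ratio` parameter are hypothetical. It shows how much smaller the input to a topic model becomes after summarization; token count is only a rough proxy for training cost and is not the processing-time measurement reported in the paper.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase alphabetic tokens only; a real pipeline would also drop stop words.
    return re.findall(r"[a-z]+", text.lower())


def summarize(text, ratio=0.4):
    """Keep the highest-scoring sentences, returned in their original order.

    Assumption: a sentence is scored by the mean document frequency of its
    words. This is a stand-in, not the summarizer used in the paper.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]),
                    reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(ranked))


def token_reduction(docs, ratio=0.4):
    """Fraction of tokens removed when each document is replaced by its summary."""
    full = sum(len(tokenize(d)) for d in docs)
    short = sum(len(tokenize(summarize(d, ratio))) for d in docs)
    return 1.0 - short / full
```

In such a setup, the summaries returned by `summarize` would be passed to the topic model trainer in place of the full documents, and `token_reduction` gives a quick estimate of how much less text the trainer has to process.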


