Contextualization for the Organization of Text Documents Streams

There has been a significant effort by the research community to provide methods for organizing documents with the help of information retrieval techniques. In this report, we present several experiments with stream analysis methods for exploring streams of text documents. We use only dynamic algorithms to explore, analyze, and organize the flow of text documents. This document presents a case study of architectures developed for Text Document Stream Organization, using incremental algorithms such as Incremental TextRank and IS-TFIDF. Both algorithms rest on the assumption that mapping text documents and their document-term matrix into lower-dimensional evolving networks allows faster processing than batch algorithms. With this architecture, and using FastText embeddings to measure similarity between documents, we compare methods on large text datasets and evaluate their clustering capabilities against ground truth. The datasets used were Reuters and COVID-19 emotions. The results offer a new view of how similarity can be contextualized in document-stream organization tasks, based on the similarity between documents in the stream and on the algorithms mentioned above.
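To make the idea of incremental processing concrete, the sketch below keeps sparse TF-IDF statistics that are updated one document at a time as the stream arrives, and finds the most similar earlier document by cosine similarity. It is a hypothetical, generic illustration, not the authors' IS-TFIDF or Incremental TextRank. It also uses TF-IDF cosine similarity rather than the FastText embeddings the report uses. The class and function names (`IncrementalTfidf`, `cosine`, `most_similar`), the tokenizer, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


class IncrementalTfidf:
    """Sparse TF-IDF whose statistics are updated one document at a time.

    Hypothetical illustration of incremental TF-IDF; NOT the IS-TFIDF
    algorithm described in the report.
    """

    def __init__(self):
        self.n_docs = 0            # number of documents seen so far
        self.doc_freq = Counter()  # term -> number of documents containing it
        self.term_counts = []      # raw term counts, one Counter per document

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase alphanumeric runs
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, text):
        """Ingest one document from the stream and return its index."""
        counts = Counter(self.tokenize(text))
        self.doc_freq.update(counts.keys())  # each distinct term counted once per doc
        self.term_counts.append(counts)
        self.n_docs += 1
        return self.n_docs - 1

    def idf(self, term):
        # Assumed smoothed IDF, computed from the statistics at call time
        return math.log((1 + self.n_docs) / (1 + self.doc_freq[term])) + 1.0

    def vector(self, index):
        """Sparse TF-IDF vector (dict term -> weight) under current statistics."""
        return {t: c * self.idf(t) for t, c in self.term_counts[index].items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector for the dot product
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(model, index):
    """Return (other_index, score) of the earlier document closest to `index`.

    Returns (None, 0.0) when no earlier document shares any term.
    """
    query = model.vector(index)
    best = (None, 0.0)
    for other in range(index):
        score = cosine(query, model.vector(other))
        if score > best[1]:
            best = (other, score)
    return best


if __name__ == "__main__":
    stream = [
        "oil prices rise as opec cuts output",
        "stocks fall on weak earnings",
        "opec agrees to cut oil output again",
    ]
    model = IncrementalTfidf()
    for text in stream:
        i = model.add(text)
        if i > 0:
            print(i, most_similar(model, i))
```

In this sketch, only raw term counts and document frequencies are stored, so adding a document costs time proportional to its length rather than a full recomputation over the corpus. The trade-off is that IDF weights, and therefore earlier vectors, drift as the stream grows; they are recomputed lazily when a similarity is requested.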

Related research

Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs (11/29/2018)
In this report, we experimented with several concepts regarding text str...

A Comparative Study on Data Representation to Categorize Text Documents (03/06/2022)
In the modern world text documents play an important role in most of the...

Adapting The Secretary Hiring Problem for Optimal Hot-Cold Tier Placement under Top-K Workloads (01/22/2019)
Top-K queries are an established heuristic in information retrieval. Thi...

Automated Big Text Security Classification (10/21/2016)
In recent years, traditional cybersecurity safeguards have proven ineffe...

Automatic Detection of Trends in Dynamical Text: An Evolutionary Approach (01/12/2006)
This paper presents an evolutionary algorithm for modeling the arrival d...

Interactive Storytelling over Document Collections (02/21/2016)
Storytelling algorithms aim to 'connect the dots' between disparate docu...

Evaluation of Partition-Based Text Clustering Techniques to Categorize Indic Language Documents (03/06/2022)
Wide availability of electronic data has led to the vast interest in tex...