Incremental Sparse TFIDF & Incremental Similarity with Bipartite Graphs

In this report, we experiment with several concepts related to the analysis of text streams. We test an implementation of Incremental Sparse TF-IDF (IS-TFIDF) and Incremental Cosine Similarity (ICS) based on bipartite graphs. In these graphs, one type of node represents documents and the other represents words, so that when a word arrives in the stream we know which documents are affected: they are the neighbors of that word in the graph. With this information, we can leverage optimized algorithms developed for graph-based applications. The idea is similar to the use of hash tables or other computer science structures that provide fast access to information in memory.
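To make the document-word bipartite graph concrete, the following is a minimal Python sketch and not the implementation evaluated in the report. The class name `BipartiteTfidf`, its methods, and the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1` are assumptions chosen for illustration. It shows how the neighbors of each arriving word identify the documents affected by a new arrival, and how TF-IDF vectors and cosine similarity can then be computed on demand for those documents only.

```python
import math
from collections import Counter, defaultdict


class BipartiteTfidf:
    """Toy document-word bipartite graph (hypothetical names, illustrative only)."""

    def __init__(self):
        self.doc_words = {}                # document node -> Counter of word counts
        self.word_docs = defaultdict(set)  # word node -> set of neighboring documents

    def add_document(self, doc_id, tokens):
        """Insert a document; return existing documents sharing a word with it."""
        counts = Counter(tokens)
        affected = set()
        for word in counts:
            # Neighbors of the word are the documents touched by this arrival.
            affected |= self.word_docs[word]
            self.word_docs[word].add(doc_id)
        self.doc_words[doc_id] = counts
        affected.discard(doc_id)
        return affected

    def tfidf(self, doc_id):
        """Sparse TF-IDF vector of one document, computed on demand.

        Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
        """
        n_docs = len(self.doc_words)
        counts = self.doc_words[doc_id]
        total = sum(counts.values())
        return {
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + len(self.word_docs[word]))) + 1)
            for word, count in counts.items()
        }

    def cosine(self, doc_a, doc_b):
        """Cosine similarity between the sparse TF-IDF vectors of two documents."""
        vec_a, vec_b = self.tfidf(doc_a), self.tfidf(doc_b)
        dot = sum(weight * vec_b[word] for word, weight in vec_a.items() if word in vec_b)
        norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
        norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
        if norm_a == 0.0 or norm_b == 0.0:
            return 0.0
        return dot / (norm_a * norm_b)


graph = BipartiteTfidf()
graph.add_document("d1", ["graph", "stream", "text"])
affected = graph.add_document("d2", ["graph", "node"])
# Only the documents in `affected` need their similarity to "d2" recomputed.
similarities = {doc: graph.cosine("d2", doc) for doc in affected}
```

In this sketch the word-to-documents map plays the role of the graph adjacency: a new document triggers similarity updates only for the documents it shares at least one word with, rather than for the whole collection.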
