Biclustering and Boolean Matrix Factorization in Data Streams

12/05/2020
by   Stefan Neumann, et al.
We study the clustering of bipartite graphs and Boolean matrix factorization in data streams. We consider a streaming setting in which the vertices from the left side of the graph arrive one by one, each together with all of its incident edges. We provide an algorithm that, after one pass over the stream, recovers the set of clusters on the right side of the graph using sublinear space; to the best of our knowledge, this is the first algorithm with this property. We also show that after a second pass over the stream, the left clusters of the bipartite graph can be recovered, and we show how to extend our algorithm to solve the Boolean matrix factorization problem (by exploiting the correspondence between Boolean matrices and bipartite graphs). We evaluate an implementation of the algorithm on synthetic data and on real-world data. On real-world datasets, the algorithm is orders of magnitude faster than a static baseline algorithm while providing quality results within a factor of 2 of the baseline algorithm. Our algorithm scales linearly in the number of edges in the graph. Finally, we analyze the algorithm theoretically and provide sufficient conditions under which the algorithm recovers a set of planted clusters under a standard random graph model.
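
The correspondence between Boolean matrices and bipartite graphs, and the streaming setting described above, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names `boolean_product` and `stream_left_vertices` and the toy factor matrices are hypothetical, chosen only for illustration. The sketch assumes that rows correspond to left vertices, columns to right vertices, and that a 1 in entry (i, j) denotes an edge.

```python
from typing import Iterator, List, Set, Tuple

Matrix = List[List[int]]


def boolean_product(left: Matrix, right: Matrix) -> Matrix:
    """Boolean matrix product: entry (i, j) is 1 iff some k has
    left[i][k] == 1 and right[k][j] == 1 (OR of ANDs, so 1 + 1 = 1)."""
    inner, cols = len(right), len(right[0])
    return [
        [int(any(row[k] and right[k][j] for k in range(inner))) for j in range(cols)]
        for row in left
    ]


def stream_left_vertices(matrix: Matrix) -> Iterator[Tuple[int, Set[int]]]:
    """Yield each left vertex (row index) together with all of its
    incident edges, given as the set of its right-side neighbours."""
    for i, row in enumerate(matrix):
        yield i, {j for j, bit in enumerate(row) if bit}


# Hypothetical toy instance: 4 left vertices, 4 right vertices, 2 clusters.
left_factor = [[1, 0], [1, 0], [1, 1], [0, 1]]   # left vertex -> clusters
right_factor = [[1, 1, 0, 0], [0, 1, 1, 1]]      # cluster -> right vertices

# The Boolean product is the biadjacency matrix of the bipartite graph.
matrix = boolean_product(left_factor, right_factor)

for vertex, neighbours in stream_left_vertices(matrix):
    print(vertex, sorted(neighbours))
```

In this reading, each row of `right_factor` is a right-side cluster and each column of `left_factor` is the matching left-side cluster, so factorizing the matrix and biclustering the graph describe the same structure.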

research
01/17/2019

Boolean matrix factorization meets consecutive ones property

Boolean matrix factorization is a natural and a popular technique for su...
research
08/12/2023

Latent Random Steps as Relaxations of Max-Cut, Min-Cut, and More

Algorithms for node clustering typically focus on finding homophilous st...
research
01/28/2019

From-Below Boolean Matrix Factorization Algorithm Based on MDL

During the past few years Boolean matrix factorization (BMF) has become ...
research
12/08/2018

Counting Butterfies from a Large Bipartite Graph Stream

We consider the estimation of properties on massive bipartite graph stre...
research
12/24/2018

The content correlation of multiple streaming edges

We study how to detect clusters in a graph defined by a stream of edges,...
research
03/06/2019

topFiberM: Scalable and Efficient Boolean Matrix Factorization

Matrix Factorization has many applications such as clustering. When the ...
research
10/25/2021

SSMF: Shifting Seasonal Matrix Factorization

Given taxi-ride counts information between departure and destination loc...
