Parallel Index-Based Structural Graph Clustering and Its Approximation

12/21/2020
by Tom Tseng, et al.

SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share work among queries with different SCAN parameter settings. Since users of SCAN often explore many parameter settings to find good clusterings, it is worthwhile to precompute an index that speeds up queries. This paper presents a practical and provably efficient parallel index-based SCAN algorithm based on GS*-Index, a recent sequential algorithm. Our parallel algorithm improves upon the asymptotic work of the sequential algorithm by using integer sorting. It is also highly parallel, achieving logarithmic span (parallel time) for both index construction and clustering queries. Furthermore, we apply locality-sensitive hashing (LSH) to design a novel approximate SCAN algorithm and prove guarantees for its clustering behavior. We present an experimental evaluation of our algorithms on large real-world graphs. On a 48-core machine with two-way hyper-threading, our parallel index construction achieves 50–151× speedup over the construction of GS*-Index. In fact, even on a single thread, our index construction algorithm is faster than GS*-Index. Our parallel index query implementation achieves 5–32× speedup over GS*-Index queries across a range of SCAN parameter values, and our implementation is always faster than ppSCAN, a state-of-the-art parallel SCAN algorithm. Moreover, our experiments show that applying LSH results in faster index construction while maintaining good clustering quality.
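To make the idea of an index that serves many (μ, ε) queries concrete, here is a minimal sequential sketch. It is not the authors' parallel algorithm: it uses no integer sorting, no parallelism, and no LSH. It assumes the usual SCAN cosine similarity over closed neighborhoods N[v] = N(v) ∪ {v}, and assumes that a vertex counts itself toward the μ threshold. The index layout (a per-vertex "neighbor order" sorted by similarity and a per-μ "core order" sorted by core threshold) is modeled on our understanding of GS*-Index. The names `build_index` and `query` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def structural_similarity(adj, u, v):
    """Cosine structural similarity over closed neighborhoods N[x] = N(x) | {x}."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def build_index(adj):
    """Hypothetical GS*-Index-style index (simplified, sequential sketch).

    neighbor_order[v]: (similarity, neighbor) pairs, most similar first.
    core_order[mu]:    (threshold, v) pairs, largest threshold first, where
                       threshold is the largest eps at which v is a core for mu.
    """
    neighbor_order = {
        v: sorted(((structural_similarity(adj, v, w), w) for w in adj[v]), reverse=True)
        for v in adj
    }
    core_order = defaultdict(list)
    for v, sims in neighbor_order.items():
        # Assumption: v counts itself (similarity 1), so it needs mu - 1
        # eps-similar neighbors; its threshold is the (mu-1)-th largest similarity.
        for mu in range(2, len(sims) + 2):
            core_order[mu].append((sims[mu - 2][0], v))
    for entries in core_order.values():
        entries.sort(reverse=True)
    return neighbor_order, core_order


def query(index, mu, eps):
    """Return SCAN clusters (list of vertex sets) for (mu, eps), assuming mu >= 2."""
    neighbor_order, core_order = index
    # Cores form a prefix of core_order[mu]: stop at the first threshold below eps.
    cores = set()
    for threshold, v in core_order.get(mu, []):
        if threshold < eps:
            break
        cores.add(v)

    parent = {v: v for v in cores}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union cores joined by an eps-similar edge; each neighbor list is scanned
    # only while similarities stay >= eps.
    for u in cores:
        for sim, w in neighbor_order[u]:
            if sim < eps:
                break
            if w in cores:
                parent[find(u)] = find(w)

    clusters = defaultdict(set)
    for u in cores:
        root = find(u)
        clusters[root].add(u)
        # Add eps-similar neighbors; non-core (border) ones may join several clusters.
        for sim, w in neighbor_order[u]:
            if sim < eps:
                break
            clusters[root].add(w)
    return list(clusters.values())


# Two 4-cliques joined by the bridge edge 3-4.
adj = {
    0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
    4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6},
}
index = build_index(adj)  # built once, reused for every query below
print(sorted(sorted(c) for c in query(index, mu=3, eps=0.7)))
# [[0, 1, 2, 3], [4, 5, 6, 7]]
print(sorted(sorted(c) for c in query(index, mu=3, eps=0.3)))
# [[0, 1, 2, 3, 4, 5, 6, 7]]
```

The design choice this illustrates is amortization: similarities are computed and sorted once in `build_index`, and each query only reads the prefixes of the sorted lists whose values are at least ε. In this example the bridge edge has similarity 2/√25 = 0.4, so it links the two cliques only when ε ≤ 0.4.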

research
06/14/2023

Parallel Algorithms for Hierarchical Nucleus Decomposition

Nucleus decompositions have been shown to be a useful tool for finding d...
research
09/01/2020

ParIS+: Data Series Indexing on Multi-Core Architectures

Data series similarity search is a core operation for several data serie...
research
12/02/2019

GPU Algorithm for Earliest Arrival Time Problem in Public Transport Networks

Given a temporal graph G, a source vertex s, and a departure time at sou...
research
06/07/2021

Parallel Batch-Dynamic k-Core Decomposition

Maintaining a k-core decomposition quickly in a dynamic graph is an impo...
research
06/23/2022

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

Obtaining scalable algorithms for hierarchical agglomerative clustering ...
research
07/23/2019

BPPSA: Scaling Back-propagation by Parallel Scan Algorithm

In an era when the performance of a single compute device plateaus, soft...
research
11/07/2022

visClust: A visual clustering algorithm based on orthogonal projections

We present a novel clustering algorithm, visClust, that is based on lowe...
