Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach

04/30/2020
by Guimu Guo, et al.

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph in which each vertex is connected to at least a γ fraction of the other vertices. Mining maximal quasi-cliques is notoriously expensive: the state-of-the-art algorithm scales only to small graphs with thousands of vertices, which has hampered its adoption in real applications involving big graphs. We developed a task-based system called G-thinker for massively parallel graph mining, the first graph mining system that scales with the number of CPU cores. G-thinker provides a unique opportunity to scale the compute-intensive quasi-clique mining. This paper designs parallel algorithms for mining maximal quasi-cliques on G-thinker that scale to big graphs. Our algorithms follow the idea of divide and conquer, partitioning the problem of mining a big graph into tasks that mine smaller subgraphs. However, we find that a direct application of G-thinker is insufficient: the running times of different tasks differ drastically, which violates G-thinker's original design assumption and requires a system reforge. We also observe that the running time of a task is highly unpredictable from the features extracted from its subgraph alone. This makes it difficult to pinpoint expensive tasks to decompose for concurrent processing. Moreover, size-threshold-based partitioning under-partitions some tasks but over-partitions others, leading to poor load balancing and enormous task-partitioning overheads. We address this issue by proposing a novel time-delayed divide-and-conquer strategy that strikes a balance between the workload spent on actual mining and the cost of balancing the workloads. Extensive experiments verify that our G-thinker algorithm scales perfectly with the number of CPU cores, achieving over 300x speedup when running on a graph with over 1M vertices in a small cluster.
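To make the γ-quasi-clique definition concrete, here is a minimal brute-force sketch. It is not the authors' G-thinker algorithm. It assumes the graph is given as a hypothetical dict mapping each vertex to the set of its neighbours, with no self-loops. A vertex set S of size k qualifies when every vertex in S has at least γ(k − 1) neighbours inside S. A set is maximal when no proper superset also qualifies. The function names and the `min_size` parameter are illustrative assumptions. Because the enumeration examines every vertex subset, its cost is exponential in the number of vertices, which hints at why the abstract calls the problem notoriously expensive.

```python
from itertools import combinations


def is_quasi_clique(adj, vertices, gamma):
    """Return True if every vertex in `vertices` is adjacent to at least
    gamma * (k - 1) of the other k - 1 vertices in the set.

    `adj` is assumed to map each vertex to a set of neighbours (no self-loops).
    """
    vs = set(vertices)
    k = len(vs)
    # Degree of each vertex counted only within the candidate set.
    return all(len(adj[v] & vs) >= gamma * (k - 1) for v in vs)


def maximal_quasi_cliques(adj, gamma, min_size=2):
    """Brute-force enumeration of maximal gamma-quasi-cliques (tiny graphs only).

    Sizes are scanned from largest to smallest. A qualifying set is reported
    only if no already-reported (hence larger, maximal) set strictly contains
    it. Any larger quasi-clique containing it lies inside some maximal one
    that was found earlier, so this check suffices even though
    quasi-cliques are not closed under taking subsets.
    """
    nodes = sorted(adj)
    found = []
    for k in range(len(nodes), min_size - 1, -1):
        for combo in combinations(nodes, k):
            s = frozenset(combo)
            if is_quasi_clique(adj, s, gamma) and not any(s < t for t in found):
                found.append(s)
    return found


if __name__ == "__main__":
    # A 4-cycle 0-1-2-3-0 plus a pendant vertex 4 attached to 0.
    graph = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}
    for qc in maximal_quasi_cliques(graph, gamma=0.5):
        print(sorted(qc))
```

With γ = 0.5, the 4-cycle {0, 1, 2, 3} qualifies even though it is not a clique, because every vertex has 2 ≥ 0.5 × 3 neighbours inside it. Adding the pendant vertex 4 breaks the condition. Parallel systems such as the one described above avoid this exhaustive scan by pruning and by splitting the search into tasks over smaller subgraphs.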

research
02/20/2019

Load-Balancing for Parallel Delaunay Triangulations

Computing the Delaunay triangulation (DT) of a given point set in R^D is...
research
01/30/2020

Shared-Memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs

Maximal Clique Enumeration (MCE) is a fundamental graph mining problem, ...
research
05/23/2023

Fast Maximal Quasi-clique Enumeration: A Pruning and Branching Co-Design Approach

Mining cohesive subgraphs from a graph is a fundamental problem in graph...
research
10/03/2018

Mining Contrasting Quasi-Clique Patterns

Mining dense quasi-cliques is a well-known clustering task with applicat...
research
08/28/2018

Enumerating Top-k Quasi-Cliques

Quasi-cliques are dense incomplete subgraphs of a graph that generalize ...
research
08/18/2020

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods

Mining dense subgraphs is an important primitive across a spectrum of gr...
research
12/28/2017

ASYMP: Fault-tolerant Mining of Massive Graphs

We present ASYMP, a distributed graph processing system developed for th...
