
Load-Balancing for Parallel Delaunay Triangulations
Computing the Delaunay triangulation (DT) of a given point set in R^D is...

Shared-Memory Parallel Maximal Clique Enumeration from Static and Dynamic Graphs
Maximal Clique Enumeration (MCE) is a fundamental graph mining problem, ...

Mining Contrasting Quasi-Clique Patterns
Mining dense quasi-cliques is a well-known clustering task with applicat...

Enumerating Top-k Quasi-Cliques
Quasi-cliques are dense incomplete subgraphs of a graph that generalize ...

Mining Large Quasi-cliques with Quality Guarantees from Vertex Neighborhoods
Mining dense subgraphs is an important primitive across a spectrum of gr...

A Structure-aware Approach for Efficient Graph Processing
With the advent of big data, graphs are processed in an iterative man...

A general-purpose hierarchical mesh partitioning method with node balancing strategies for large-scale numerical simulations
Large-scale parallel numerical simulations are essential for a wide rang...

Scalable Mining of Maximal Quasi-Cliques: An Algorithm-System Codesign Approach
Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph in which each vertex connects to at least a γ fraction of the other vertices. Mining maximal quasi-cliques is notoriously expensive: the state-of-the-art algorithm scales only to small graphs with thousands of vertices, which has hampered its adoption in real applications involving big graphs. We developed a task-based system called G-thinker for massively parallel graph mining, which is the first graph mining system that scales with the number of CPU cores. G-thinker provides a unique opportunity to scale the compute-intensive quasi-clique mining. This paper designs parallel algorithms for mining maximal quasi-cliques on G-thinker that scale to big graphs. Our algorithms follow the idea of divide and conquer, which partitions the problem of mining a big graph into tasks that mine smaller subgraphs. However, we find that a direct application of G-thinker is insufficient because the running times of different tasks differ drastically, which violates the original design assumption of G-thinker and requires a system reforge. We also observe that the running time of a task is highly unpredictable from the features extracted from its subgraph alone, which makes it difficult to pinpoint expensive tasks to decompose for concurrent processing; size-threshold-based partitioning under-partitions some tasks but over-partitions others, leading to poor load balancing and enormous task-partitioning overheads. We address this issue by proposing a novel time-delayed divide-and-conquer strategy that strikes a balance between the work spent on actual mining and the cost of balancing the workloads. Extensive experiments verify that our G-thinker algorithm scales perfectly with the number of CPU cores, achieving over 300x speedup when running on a graph with over 1M vertices in a small cluster.
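
To make the definition and the time-delayed idea concrete, here is a minimal Python sketch. It is not the paper's algorithm and does not use the G-thinker API. It assumes a brute-force set-enumeration search with no pruning, a sequential task queue standing in for parallel workers, and one reading of "time-delayed": a task mines by backtracking until a time budget expires, then hands its unexplored branches off as new tasks. All names (`ADJ`, `is_quasi_clique`, `run_task`, `mine`, `budget`) are hypothetical.

```python
import time
from collections import deque

# Toy undirected graph (an assumption for illustration): a 4-cycle plus chord 0-2.
ADJ = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}

def is_quasi_clique(adj, vertices, gamma):
    """True if each vertex is adjacent to at least a gamma fraction of the others."""
    s = set(vertices)
    if len(s) < 2:
        return True  # no "other vertices", so the condition holds trivially
    need = gamma * (len(s) - 1) - 1e-9  # small tolerance for float rounding
    return all(len(adj[v] & (s - {v})) >= need for v in s)

def run_task(adj, gamma, min_size, task, budget, clock=time.monotonic):
    """Mine one task by backtracking; after `budget` seconds, stop and
    return the unexplored branches as new tasks instead of mining them."""
    start = clock()
    found, spawned = [], []
    stack = [task]
    while stack:
        s, ext = stack.pop()
        # Always process the popped node, so every task makes progress.
        if len(s) >= min_size and is_quasi_clique(adj, s, gamma):
            found.append(frozenset(s))
        # Children extend s with one candidate; later candidates stay in ext.
        children = [(s + (v,), ext[i + 1:]) for i, v in enumerate(ext)]
        if clock() - start > budget:
            spawned.extend(children)  # decompose only after the delay
            spawned.extend(stack)
            break
        stack.extend(children)
    return found, spawned

def mine(adj, gamma, min_size, budget):
    """Drain the task queue (sequentially here) and keep only maximal results."""
    queue = deque([((), tuple(sorted(adj)))])
    results = set()
    while queue:
        found, spawned = run_task(adj, gamma, min_size, queue.popleft(), budget)
        results.update(found)
        queue.extend(spawned)
    return {q for q in results if not any(q < r for r in results)}

if __name__ == "__main__":
    print(mine(ADJ, gamma=0.6, min_size=3, budget=0.01))
```

In this sketch, cheap tasks finish within the budget and are never split, while expensive ones are decomposed only once they have shown themselves to be expensive. This is one way to avoid predicting task cost from subgraph features in advance. The set of results does not depend on `budget`; only the number of tasks created changes.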