GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

03/28/2022
by David Tench, et al.

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketches) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: GraphZeppelin can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result GraphZeppelin vastly increases the scale of graphs that can be processed.
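To make the problem setting concrete, the following is a minimal sketch of the lossless baseline that the abstract contrasts against: keep every live edge in memory, apply insertions and deletions as they arrive, and compute connected components on demand with union-find. This is not GraphZeppelin's method and does not use CubeSketches. The class and method names (`LosslessDynamicGraph`, `insert`, `delete`, `connected_components`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class LosslessDynamicGraph:
    """Illustrative baseline: stores every live edge explicitly.

    Space grows with the number of edges, which for dense graphs is
    quadratic in the number of nodes. This is the RAM cost the abstract
    describes as prohibitive for very large graphs.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.edges = set()  # each edge stored as (smaller_id, larger_id)

    def insert(self, u, v):
        # Stream update: edge insertion.
        self.edges.add((min(u, v), max(u, v)))

    def delete(self, u, v):
        # Stream update: edge deletion (no-op if the edge is absent).
        self.edges.discard((min(u, v), max(u, v)))

    def connected_components(self):
        # Union-find over the current edge set.
        parent = list(range(self.num_nodes))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in self.edges:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

        groups = defaultdict(list)
        for node in range(self.num_nodes):
            groups[find(node)].append(node)
        return sorted(sorted(members) for members in groups.values())


# A small dynamic stream: three insertions followed by one deletion.
graph = LosslessDynamicGraph(num_nodes=5)
graph.insert(0, 1)
graph.insert(1, 2)
graph.insert(3, 4)
graph.delete(1, 2)
components = graph.connected_components()
print(components)
```

The point of the sketch is the memory footprint rather than the algorithm: `edges` holds the whole graph, so this approach stops working once the graph outgrows RAM. A sketching system, as described above, replaces the explicit edge set with a structure that is asymptotically smaller than a lossless representation.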
