
GraphZeppelin: Storage-Friendly Sketching for Connected Components on Dynamic Graph Streams

by David Tench, et al.

Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call GraphZeppelin, uses new linear sketching data structures (CubeSketches) to solve the streaming connected components problem, and as a result requires space asymptotically smaller than the space required for a lossless representation of the graph. GraphZeppelin is optimized for massive dense graphs: it can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result, GraphZeppelin vastly increases the scale of graphs that can be processed.
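To make the problem statement concrete, the following is a minimal sketch of the lossless baseline the abstract contrasts against: keep every current edge in memory and recompute components on demand. It is not GraphZeppelin's method and does not use CubeSketches. The class name `LosslessDynamicGraph` and its methods are hypothetical names chosen for illustration, and integer node ids in the range 0 to n-1 are assumed.

```python
from collections import defaultdict


class LosslessDynamicGraph:
    """Hypothetical baseline: stores every current edge explicitly.

    Memory grows with the number of edges, which is the cost the
    abstract says becomes prohibitive for large dense graphs.
    """

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        self.edges = set()

    def insert(self, u, v):
        # Store each undirected edge once, as an ordered pair.
        self.edges.add((min(u, v), max(u, v)))

    def delete(self, u, v):
        self.edges.discard((min(u, v), max(u, v)))

    def connected_components(self):
        # Union-find over the edges that are currently present.
        parent = list(range(self.num_nodes))

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]  # path halving
                x = parent[x]
            return x

        for u, v in self.edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv

        groups = defaultdict(list)
        for node in range(self.num_nodes):
            groups[find(node)].append(node)
        return sorted(groups.values())


g = LosslessDynamicGraph(5)
g.insert(0, 1)
g.insert(1, 2)
g.insert(3, 4)
print(g.connected_components())  # [[0, 1, 2], [3, 4]]
g.delete(1, 2)
print(g.connected_components())  # [[0, 1], [2], [3, 4]]
```

Because this baseline stores the edge set itself, a dense graph on n nodes can require space proportional to n squared, which is the lossless representation the abstract says a sketch-based approach improves on asymptotically.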



