Faster motif counting via succinct color coding and adaptive sampling

09/04/2020
by Marco Bressan, et al.

We address the problem of computing the distribution of induced connected subgraphs, also known as graphlets or motifs, in large graphs. The current state-of-the-art algorithms estimate motif counts via uniform sampling, leveraging the color coding technique of Alon, Yuster and Zwick. In this work we extend the applicability of this approach by introducing a set of algorithmic optimizations and techniques that reduce the running time and space usage of color coding and improve the accuracy of the counts. To this end, we first show how to optimize color coding to efficiently build a compact table of a representative subsample of all graphlets in the input graph. For 8-node motifs, we can build such a table in one hour for a graph with 65M nodes and 1.8B edges, which is 2000 times larger than the state of the art. We then introduce a novel adaptive sampling scheme that breaks the “additive error barrier” of uniform sampling, guaranteeing multiplicative approximations instead of just additive ones. This allows us to count not only the most frequent motifs, but also extremely rare ones. For instance, on one graph we accurately count nearly 10,000 distinct 8-node motifs whose relative frequency is so small that uniform sampling would take centuries to find them. Our results show that color coding is still the most promising approach to scalable motif counting.
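To make the underlying idea concrete, here is a minimal sketch of basic color coding in the style of Alon, Yuster and Zwick. It is not the authors' implementation and it does not build the compact graphlet table described above: as a simplifying assumption, it counts simple (non-induced) paths on k nodes rather than induced graphlets. The function names `count_colorful_paths` and `estimate_paths` and the adjacency-dict input format are hypothetical choices made for this illustration.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count undirected simple paths on k nodes whose nodes have k distinct colors.

    adj    -- dict mapping each node to a list of its neighbors (undirected graph)
    colors -- dict mapping each node to a color in range(k)
    """
    # layer[v][mask] = number of colorful paths ending at v whose color set is mask
    layer = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for v in adj:
            bit = 1 << colors[v]
            for u in adj[v]:
                for mask, cnt in layer[u].items():
                    # Extend a path ending at u only if v's color is still unused.
                    if not mask & bit:
                        m = mask | bit
                        nxt[v][m] = nxt[v].get(m, 0) + cnt
        layer = nxt
    directed = sum(sum(d.values()) for d in layer.values())
    # For k >= 2 every undirected path is counted once from each endpoint.
    return directed // 2 if k > 1 else directed


def estimate_paths(adj, k, trials, seed=0):
    """Estimate the number of k-node paths by averaging over random colorings."""
    rng = random.Random(seed)
    # Probability that a fixed set of k nodes receives k distinct colors.
    p_colorful = factorial(k) / k**k
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / (trials * p_colorful)


# A triangle contains three 3-node paths, and all are colorful under this coloring.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(count_colorful_paths(triangle, {0: 0, 1: 1, 2: 2}, 3))  # prints 3
```

Because a colorful walk can never revisit a node, the dynamic program only needs to track the set of colors used, not the set of nodes, which is what keeps the table size manageable. Dividing the colorful count by the probability that a k-node set is colorful gives an unbiased estimate of the total count.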


