Motivo: fast motif counting via succinct color coding and adaptive sampling

06/04/2019
by Marco Bressan, et al.
The randomized technique of color coding is behind state-of-the-art algorithms for estimating graph motif counts. Those algorithms, however, do not yet scale to very large graphs with billions of edges. In this paper we develop novel tools for the "motif counting via color coding" framework. As a result, our new algorithm, Motivo, scales to larger graphs while at the same time providing more accurate graphlet counts than ever before. This is achieved thanks to two types of improvements. First, we design new succinct data structures that support fast common color coding operations, and a biased coloring trick that trades accuracy for running time and memory usage. These adaptations drastically reduce the time and memory requirements of color coding. Second, we develop an adaptive graphlet sampling strategy, based on a fractional set cover problem, that breaks the additive approximation barrier of standard sampling. This strategy gives multiplicative approximations for all graphlets at once, allowing us to count not only the most frequent graphlets but also extremely rare ones. To give an idea of the improvements, in 40 minutes Motivo counts 7-node motifs on a graph with 65M nodes and 1.8B edges; this is 30 and 500 times larger than the state of the art in terms of nodes and edges, respectively. On the accuracy side, in one hour Motivo produces accurate counts of ≈ 10,000 distinct 8-node motifs on graphs where state-of-the-art algorithms fail even to find the second most frequent motif. Our method requires just a high-end desktop machine. These results show how color coding can bring motif mining to the realm of truly massive graphs using only ordinary hardware.
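
For readers unfamiliar with color coding, the basic idea can be illustrated with a minimal sketch. This is not Motivo's implementation: it omits the succinct data structures, the biased coloring trick, and the adaptive sampling described above, and it counts only simple k-node paths rather than general graphlets. The function names, the adjacency-dict graph representation, and the `trials` and `seed` parameters are assumptions made for this illustration. Each node receives one of k colors uniformly at random, a dynamic program counts the "colorful" paths (all colors distinct), and the count is rescaled by the probability k!/k^k that a fixed path is colorful.

```python
import random
from math import factorial


def count_colorful_paths(adj, colors, k):
    """Count k-node paths (k >= 2) whose nodes all have distinct colors.

    adj    -- dict mapping each node to an iterable of its neighbors (undirected)
    colors -- dict mapping each node to a color in range(k)
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    # table[v][mask] = number of colorful paths ending at v that use
    # exactly the set of colors encoded by the bitmask `mask`.
    table = {v: {1 << colors[v]: 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: {} for v in adj}
        for u in adj:
            for mask, cnt in table[u].items():
                for v in adj[u]:
                    bit = 1 << colors[v]
                    if mask & bit:
                        continue  # color already used: path would not be colorful
                    nxt[v][mask | bit] = nxt[v].get(mask | bit, 0) + cnt
        table = nxt
    total = sum(cnt for masks in table.values() for cnt in masks.values())
    # Every undirected path was counted once per direction.
    return total // 2


def estimate_paths(adj, k, trials=100, seed=0):
    """Estimate the number of k-node simple paths by averaging over random colorings."""
    rng = random.Random(seed)
    p_colorful = factorial(k) / k ** k  # chance that a fixed k-node path is colorful
    total = 0
    for _ in range(trials):
        colors = {v: rng.randrange(k) for v in adj}
        total += count_colorful_paths(adj, colors, k)
    return total / (trials * p_colorful)
```

Distinct colors force distinct nodes, so the dynamic program never needs to track which nodes a path has visited, only which colors. That is what makes the table small enough to be worth compressing further.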


research
09/04/2020

Faster motif counting via succinct color coding and adaptive sampling

We address the problem of computing the distribution of induced connecte...
research
10/01/2018

A sampling framework for counting temporal motifs

Pattern counting in graphs is fundamental to network science tasks, and ...
research
10/24/2019

Scaling Betweenness Approximation to Billions of Edges by MPI-based Adaptive Sampling

Betweenness centrality is one of the most popular vertex centrality meas...
research
11/13/2022

Reinforcement Learning Enhanced Weighted Sampling for Accurate Subgraph Counting on Fully Dynamic Graph Streams

As the popularity of graph data increases, there is a growing need to co...
research
06/29/2021

Few-Shot Electronic Health Record Coding through Graph Contrastive Learning

Electronic health record (EHR) coding is the task of assigning ICD codes...
research
02/10/2023

Characterization of Simplicial Complexes by Counting Simplets Beyond Four Nodes

Simplicial complexes are higher-order combinatorial structures which hav...
research
08/29/2019

Efficient Implementation of Color Coding Algorithm for Subgraph Isomorphism Problem

We consider the subgraph isomorphism problem where, given two graphs G (...
