How to Count Triangles, without Seeing the Whole Graph

06/22/2020
by Suman K. Bera, et al.

Triangle counting is a fundamental problem in the analysis of large graphs. There is a rich body of work on this problem in various streaming and distributed models, yet all of these algorithms require reading the whole input graph. In many scenarios, we do not have access to the whole graph and can only sample a small portion of it (typically through crawling). In such a setting, how can we accurately estimate the triangle count of the graph? We formally study triangle counting in the random walk access model introduced by Dasgupta et al. (WWW '14) and Chierichetti et al. (WWW '16). We have access to an arbitrary seed vertex of the graph and can only perform random walks. This model is restrictive in access and captures the challenges of collecting real-world graphs: even sampling a uniform random vertex is a hard task in this model. Despite these challenges, we design a provable and practical algorithm, TETRIS, for triangle counting in this model. TETRIS is the first provably sublinear algorithm (for most natural parameter settings) that approximates the triangle count in the random walk model, for graphs with low mixing time. Our result builds on recent advances in the theory of sublinear algorithms. The final sample built by TETRIS is a careful mix of random walks and degree-biased sampling of neighborhoods. Empirically, TETRIS accurately counts triangles on a variety of large graphs, getting estimates within 5% relative error while looking at only 3% of the edges.
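
To make the access model concrete, here is a minimal Python sketch of a naive random-walk estimator. It is not TETRIS, and it makes two simplifying assumptions that the setting above does not grant for free: the number of edges m is already known, and the walk is close to its stationary distribution after a fixed burn-in. Under those assumptions each step of the walk traverses a roughly uniform random edge, and since summing the common-neighbor counts over all edges counts every triangle three times, m times the average common-neighbor count divided by 3 estimates the triangle count. The names `estimate_triangles`, `neighbors`, `num_edges`, and `burn_in` are hypothetical and chosen only for this illustration.

```python
import random


def estimate_triangles(neighbors, seed_vertex, num_edges, walk_length,
                       burn_in=100, rng=None):
    """Naive triangle-count estimate from a single random walk.

    neighbors: dict mapping each vertex to a list of its neighbors.
    num_edges: assumed to be known in advance (a simplification).
    """
    rng = rng or random.Random(0)

    # Burn-in: walk away from the arbitrary seed vertex so that later
    # steps are (approximately) drawn from the stationary distribution.
    u = seed_vertex
    for _ in range(burn_in):
        u = rng.choice(neighbors[u])

    total_common = 0
    for _ in range(walk_length):
        v = rng.choice(neighbors[u])
        # Number of triangles that contain the traversed edge (u, v).
        total_common += len(set(neighbors[u]) & set(neighbors[v]))
        u = v

    # Each triangle is counted once per edge, i.e. three times overall.
    return num_edges * total_common / (3 * walk_length)


# Usage: the complete graph on 5 vertices has 10 edges and 10 triangles.
k5 = {i: [j for j in range(5) if j != i] for i in range(5)}
k5_estimate = estimate_triangles(k5, seed_vertex=0, num_edges=10,
                                 walk_length=1000)
```

On a complete graph every edge lies in the same number of triangles, so the estimate is exact there; on real graphs the variance of this naive estimator can be large, which is part of why a more careful sampling scheme is needed.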

research
03/29/2020

How the Degeneracy Helps for Triangle Counting in Graph Streams

We revisit the well-studied problem of triangle count estimation in grap...
research
12/07/2022

DeMEtRIS: Counting (near)-Cliques by Crawling

We study the problem of approximately counting cliques and near cliques ...
research
09/04/2017

Estimating graph parameters via random walks with restarts

In this paper we discuss the problem of estimating graph parameters from...
research
02/23/2018

Estimating Graphlet Statistics via Lifting

Exploratory analysis over network data is often limited by our ability t...
research
09/10/2017

WRS: Waiting Room Sampling for Accurate Triangle Counting in Real Graph Streams

If we cannot store all edges in a graph stream, which edges should we st...
research
10/07/2018

Graphlet Count Estimation via Convolutional Neural Networks

Graphlets are defined as k-node connected induced subgraph patterns. For...
research
10/24/2017

Provable and practical approximations for the degree distribution using sublinear graph samples

The degree distribution is one of the most fundamental properties used i...