Distributed Data Summarization in Well-Connected Networks

08/01/2019
by Hsin-Hao Su, et al.

We study distributed algorithms for fundamental problems in data summarization. Given a communication graph $G$ of $n$ nodes, each of which may initially hold a value, we focus on computing $\sum_{i=1}^{N} g(f_i)$, where $f_i$ is the number of occurrences of value $i$ and $g$ is a fixed function. This includes important statistics such as the number of distinct elements, the frequency moments, and the empirical entropy of the data. In the CONGEST model, a simple adaptation of streaming lower bounds shows that computing some of these statistics exactly requires $\tilde{\Omega}(D + n)$ rounds, where $D$ is the diameter of the graph. However, these lower bounds do not hold for well-connected graphs. We give an algorithm that computes $\sum_{i=1}^{N} g(f_i)$ exactly in $\tau_G \cdot 2^{O(\sqrt{\log n})}$ rounds, where $\tau_G$ is the mixing time of $G$. This also has applications in computing the top $k$ most frequent elements. We demonstrate that the GOSSIP model and the CONGEST model are highly similar in well-connected graphs. In particular, we show that each round of the GOSSIP model can be simulated almost perfectly in $\tilde{O}(\tau_G)$ rounds of the CONGEST model. To this end, we develop a new algorithm for the GOSSIP model that $(1 \pm \epsilon)$-approximates the $p$-th frequency moment $F_p = \sum_{i=1}^{N} f_i^p$ in $\tilde{O}(\epsilon^{-2} n^{1-k/p})$ rounds, for $p > 2$, when the number of distinct elements $F_0$ is at most $O(n^{1/(k-1)})$. This result can be translated back to the CONGEST model with an $\tilde{O}(\tau_G)$-factor blow-up in the number of rounds.
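To make the target quantity concrete, the following is a minimal centralized sketch of the sum of g(f_i) over all values i. It is not the distributed algorithm from the paper, only a reference for what that algorithm is meant to output. The helper name `summarize`, the sample data, and the convention that g(0) = 0 (so values that never occur contribute nothing) are assumptions made for illustration.

```python
import math
from collections import Counter

def summarize(values, g):
    """Return the sum of g(f_i) over all values i that occur at least once.

    Assumes g(0) = 0, so values with zero occurrences are skipped.
    """
    freqs = Counter(values)  # maps each value i to its frequency f_i
    return sum(g(f) for f in freqs.values())

# Hypothetical input: the values held by six nodes.
values = [3, 1, 3, 2, 3, 1]
m = len(values)

# Number of distinct elements F_0: g(f) = 1 for every f > 0.
distinct = summarize(values, lambda f: 1)

# Second frequency moment F_2: g(f) = f^2.
f2 = summarize(values, lambda f: f ** 2)

# Empirical entropy: g(f) = (f / m) * log2(m / f).
entropy = summarize(values, lambda f: (f / m) * math.log2(m / f))
```

Each statistic differs only in the choice of g, which is why a single routine for the general sum covers all three.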


