Communication-Efficient (Weighted) Reservoir Sampling

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed streams presented as a sequence of mini-batches of items. We present and analyze a fully distributed algorithm for both problems. An experimental evaluation of weighted sampling on up to 256 nodes shows good speedups, while theoretical analysis promises good scaling to much larger machines.
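The abstract does not spell out the algorithm, but the standard building block for weighted reservoir sampling over a stream is the Efraimidis-Spirakis scheme: each item of weight w draws a key u^(1/w) with u uniform in (0,1), and the k items with the largest keys form a weighted sample without replacement. A minimal single-node sketch (function name and mini-batch framing are illustrative, not from the paper):

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=None):
    """Keep a size-k weighted sample from an iterable of (item, weight)
    pairs using the Efraimidis-Spirakis key u**(1/w); weights must be > 0.
    A min-heap of (key, item) retains the k largest keys seen so far."""
    rng = rng or random.Random()
    heap = []  # min-heap; heap[0] holds the smallest of the k kept keys
    for item, weight in stream:
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            # Evict the current minimum key in favor of the new item.
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# Mini-batch / distributed use (illustrative): each node keeps its local
# top-k (key, item) pairs per batch; merging the nodes' candidates and
# re-taking the global top-k by key yields the same distribution, which
# is what makes low-communication distributed variants possible.
```

Because the keys are independent of where an item was seen, local reservoirs can be merged by simply keeping the k largest keys overall, so only O(k) candidates per node ever need to be communicated.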


Parallel Weighted Random Sampling

Data structures for efficient sampling from a set of weighted items are ...

Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

Gradient boosting decision tree (GBDT) is a powerful and widely-used mac...

Weighted Reservoir Sampling from Distributed Streams

We consider message-efficient continuous random sampling from a distribu...

Approximate Integration of streaming data

We approximate analytic queries on streaming data with a weighted reserv...

Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks

Training Graph Convolutional Networks (GCNs) is expensive as it needs to...

Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online

We consider the problem of sampling n numbers from the range {1,...,N} w...

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. A...
