Communication-Efficient (Weighted) Reservoir Sampling

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed streams presented as a sequence of mini-batches of items. We present and analyze a fully distributed algorithm for both problems. An experimental evaluation of weighted sampling on up to 256 nodes shows good speedups, while theoretical analysis promises good scaling to much larger machines.
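The abstract does not spell out the algorithm, but the standard building block for weighted reservoir sampling over a stream is the Efraimidis-Spirakis scheme: each item of weight w draws a key u^(1/w) with u uniform in (0,1), and the k items with the largest keys form a weighted sample without replacement. A minimal single-node sketch (function name and mini-batch framing are illustrative, not from the paper):

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=None):
    """Keep a size-k weighted sample from an iterable of (item, weight)
    pairs using the Efraimidis-Spirakis key u**(1/w); weights must be > 0.
    A min-heap of (key, item) retains the k largest keys seen so far."""
    rng = rng or random.Random()
    heap = []  # min-heap; heap[0] holds the smallest of the k kept keys
    for item, weight in stream:
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            # Evict the current minimum key in favor of the new item.
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# Mini-batch / distributed use (illustrative): each node keeps its local
# top-k (key, item) pairs per batch; merging the nodes' candidates and
# re-taking the global top-k by key yields the same distribution, which
# is what makes low-communication distributed variants possible.
```

Because the keys are independent of where an item was seen, local reservoirs can be merged by simply keeping the k largest keys overall, so only O(k) candidates per node ever need to be communicated.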


Parallel Weighted Random Sampling

Data structures for efficient sampling from a set of weighted items are ...

Communication-Efficient Weighted Sampling and Quantile Summary for GBDT

Gradient boosting decision tree (GBDT) is a powerful and widely-used mac...

Weighted Reservoir Sampling from Distributed Streams

We consider message-efficient continuous random sampling from a distribu...

Approximate Integration of streaming data

We approximate analytic queries on streaming data with a weighted reserv...

Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks

Training Graph Convolutional Networks (GCNs) is expensive as it needs to...

Efficient Random Sampling - Parallel, Vectorized, Cache-Efficient, and Online

We consider the problem of sampling n numbers from the range {1,...,N} w...

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. A...
