Communication Efficient Checking of Big Data Operations

We propose fast probabilistic algorithms with low (i.e., sublinear in the input size) communication volume to check the correctness of operations in Big Data processing frameworks and distributed databases. Our checkers cover many of the commonly used operations, including sum, average, median, and minimum aggregation, as well as sorting, union, merge, and zip. An experimental evaluation of our implementation in Thrill (Bingmann et al., 2016) confirms the low overhead and high failure detection rate predicted by the theoretical analysis.
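
The abstract does not say how a checker keeps its communication volume sublinear. One way to do this for sum aggregation is to condense each PE's data into a small, fixed number of hash buckets and then compare the bucket sums of input and output with a single reduction. The sketch below illustrates that idea under explicit assumptions. It is not the Thrill implementation described above. The function names (`bucket_of`, `local_condense`, `check_sum_aggregation`), the seeded BLAKE2b hash, the bucket and iteration counts, and the simulation of PEs as lists inside one process are all hypothetical choices. Values are assumed to be integers so that sums are exact.

```python
import hashlib
import random


def bucket_of(key, seed: int, num_buckets: int) -> int:
    """Map a key to a bucket with a seeded hash; each seed acts as a fresh hash function."""
    digest = hashlib.blake2b(
        repr(key).encode(), digest_size=8, key=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little") % num_buckets


def local_condense(inputs, outputs, seed: int, num_buckets: int) -> list:
    """Per-PE work: bucketed sum of local input values minus local output sums."""
    acc = [0] * num_buckets
    for key, value in inputs:
        acc[bucket_of(key, seed, num_buckets)] += value
    for key, total in outputs:
        acc[bucket_of(key, seed, num_buckets)] -= total
    return acc


def check_sum_aggregation(input_parts, output_parts,
                          num_buckets: int = 16, iterations: int = 4,
                          seed: int = 42) -> bool:
    """Return False if the aggregation is certainly wrong, True if no error was found.

    input_parts[i] and output_parts[i] hold the (key, value) pairs of simulated PE i.
    """
    rng = random.Random(seed)  # stands in for a random seed shared by all PEs
    for _ in range(iterations):
        s = rng.getrandbits(64)
        # Simulated all-reduce: each PE contributes only num_buckets numbers,
        # independent of how many pairs it holds.
        total = [0] * num_buckets
        for inputs, outputs in zip(input_parts, output_parts):
            local = local_condense(inputs, outputs, s, num_buckets)
            total = [a + b for a, b in zip(total, local)]
        if any(total):  # a nonzero bucket means at least one key's sum is wrong
            return False
    return True


if __name__ == "__main__":
    inputs = [[("a", 1), ("b", 2)], [("a", 3), ("c", 5)]]
    correct = [[("a", 4)], [("b", 2), ("c", 5)]]
    wrong = [[("a", 4)], [("b", 2), ("c", 6)]]
    print(check_sum_aggregation(inputs, correct))  # True
    print(check_sum_aggregation(inputs, wrong))    # False
```

The error is one-sided. A rejection is always correct, and a single wrong key is always caught. Several wrong keys can cancel only if they share a bucket in every iteration, so more buckets or iterations lower the miss probability. This sketch checks only the per-bucket sums. It would not, for example, catch an output that splits one key's sum across duplicate entries.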

Related research
07/06/2017

A Survey on Geographically Distributed Big-Data Processing using MapReduce

Hadoop and Spark are widely used distributed processing frameworks for l...
08/05/2021

An Abstract View of Big Data Processing Programs

This paper proposes a model for specifying data flow based parallel data...
03/25/2021

Understanding the Challenges and Assisting Developers with Developing Spark Applications

To process data more efficiently, big data frameworks provide data abstr...
03/12/2019

Distributed Dependency Discovery

We analyze the problem of discovering dependencies from distributed big ...
01/31/2020

Similarity for Finding the Domain of a Sentence

English. This document aims to study the best algorithms to verify the b...
05/23/2020

Benchmarking and Performance Modelling of MapReduce Communication Pattern

Understanding and predicting the performance of big data applications ru...
08/18/2020

Training with Imbalanced Datasets

English. The following document pursues the objective of comparing some ...