Distributed Dependency Discovery

03/12/2019
by Hemant Saxena, et al.

We analyze the problem of discovering dependencies from distributed big data. Existing (non-distributed) algorithms focus on minimizing computation by pruning the search space of possible dependencies. However, distributed algorithms must also optimize communication costs, especially in shared-nothing settings, leading to a more complex optimization space. To understand this space, we introduce six primitives shared by existing dependency discovery algorithms, corresponding to data processing steps separated by communication barriers. Through case studies, we show how the primitives allow us to analyze the design space and develop communication-optimized implementations. Finally, we support our analysis with an experimental evaluation on real datasets.

research
10/26/2021

Evaluating Serverless Architecture for Big Data Enterprise Applications

In this paper, we investigate serverless computing for performing large ...
research
11/28/2022

OpTree: An Efficient Algorithm for All-gather Operation in Optical Interconnect Systems

All-gather collective communication is one of the most important communi...
research
10/23/2017

Communication Efficient Checking of Big Data Operations

We propose fast probabilistic algorithms with low (i.e., sublinear in th...
research
05/06/2019

Errata Note: Discovering Order Dependencies through Order Compatibility

A number of extensions to the classical notion of functional dependencie...
research
10/08/2017

Discovery of Paradigm Dependencies

Missing and incorrect values often cause serious consequences. To deal w...
research
01/18/2021

DFOGraph: An I/O- and Communication-Efficient System for Distributed Fully-out-of-Core Graph Processing

With the magnitude of graph-structured data continually increasing, grap...
research
03/03/2021

Relate and Predict: Structure-Aware Prediction with Jointly Optimized Neural DAG

Understanding relationships between feature variables is one important w...
