To Ship or Not to (Function) Ship (Extended version)

07/30/2018
by Feilong Liu, et al.

Sampling is often used to reduce query latency for interactive big data analytics. The established parallel data processing paradigm relies on function shipping: a coordinator dispatches queries to worker nodes and then collects the results. The commoditization of high-performance networking makes data shipping possible: the coordinator reads data directly from the workers' memory using RDMA while the workers process other queries. In this work, we explore when to use function shipping and when to use data shipping for interactive query processing with sampling. Which approach should be preferred depends on the amount of data transferred, the current CPU utilization, the sampling method, and the number of queries executed over the data set. The results show that data shipping is up to 6.5x faster when performing clustered sampling with heavily utilized workers.
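The abstract names four factors that drive the choice. A toy cost model makes the trade-off easier to see. It is a minimal sketch under loudly stated assumptions and is not the authors' model.

The assumptions are as follows:

- All constants in the hypothetical `Cluster` class are invented.
- Worker slowdown under load follows an M/M/1-style factor of 1/(1 - u).
- Data shipping pulls the sample once with one-sided reads, which consume no worker CPU, and then reuses that sample for every query.
- Clustered sampling needs few large reads, while row-level sampling needs many small ones.

The names `Workload`, `Cluster`, `function_shipping_cost`, `data_shipping_cost`, and `choose` are all illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    sample_bytes: float        # bytes in the sample the queries touch
    num_reads: int             # RDMA reads needed to fetch it (few if clustered, many if row-level)
    num_queries: int           # queries evaluated over the same sample
    worker_utilization: float  # fraction of worker CPU busy with other work, in [0, 1)


@dataclass
class Cluster:
    # Hypothetical hardware constants, chosen only for illustration.
    bandwidth_bytes_per_s: float = 10e9  # RDMA bandwidth
    read_latency_s: float = 2e-6         # fixed cost per RDMA read
    scan_bytes_per_s: float = 2e9        # single-core scan rate
    rpc_latency_s: float = 20e-6         # dispatch plus result round trip


def function_shipping_cost(w: Workload, c: Cluster) -> float:
    """Each query runs on a worker whose CPU is shared with other work."""
    if not 0.0 <= w.worker_utilization < 1.0:
        raise ValueError("worker_utilization must be in [0, 1)")
    # Assumed M/M/1-style slowdown: a busier worker finishes the scan later.
    slowdown = 1.0 / (1.0 - w.worker_utilization)
    per_query = c.rpc_latency_s + slowdown * w.sample_bytes / c.scan_bytes_per_s
    return w.num_queries * per_query


def data_shipping_cost(w: Workload, c: Cluster) -> float:
    """The coordinator pulls the sample once via one-sided reads, then scans locally."""
    transfer = w.num_reads * c.read_latency_s + w.sample_bytes / c.bandwidth_bytes_per_s
    local_scans = w.num_queries * w.sample_bytes / c.scan_bytes_per_s
    return transfer + local_scans


def choose(w: Workload, c: Optional[Cluster] = None) -> str:
    """Return the strategy with the lower modeled latency."""
    c = c or Cluster()
    if data_shipping_cost(w, c) < function_shipping_cost(w, c):
        return "data shipping"
    return "function shipping"


if __name__ == "__main__":
    # 64 MB sample: clustered (1,000 reads) vs. row-level (1,000,000 reads).
    print(choose(Workload(64e6, 1_000, 1, 0.9)))      # busy workers, clustered
    print(choose(Workload(64e6, 1_000_000, 1, 0.0)))  # idle workers, row-level
    print(choose(Workload(64e6, 1_000, 1, 0.1)))      # lightly loaded, one query
    print(choose(Workload(64e6, 1_000, 10, 0.1)))     # lightly loaded, ten queries
```

Under these made-up constants, the model behaves as follows:

- **Clustered sampling on busy workers** favors data shipping.
- **Row-level sampling on idle workers** favors function shipping, because per-read latency dominates.
- **Lightly loaded workers** tip toward data shipping only once enough queries reuse the fetched sample.

These are properties of the toy model, not measurements from the paper.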
