CIAO: An Optimization Framework for Client-Assisted Data Loading

02/23/2021
by Cong Ding, et al.

Data loading is one of the most common performance bottlenecks for big data applications, especially when they operate on inefficient human-readable formats such as JSON or CSV. Parsing, validation, integrity checking, and data structure maintenance are all computationally expensive steps in loading these formats. Despite these costs, many records may be filtered out later during query evaluation because of highly selective predicates, so the computation spent loading them is wasted. Meanwhile, the computing power of clients typically goes unexploited. Here, we explore investing a limited number of client cycles in prefiltering to accelerate data loading and enable data skipping during query execution. In this paper, we present CIAO, a tunable system in which clients cooperate with the server to enable efficient partial loading and data skipping for a given workload. We propose an efficient algorithm that selects a near-optimal set of predicates to push down within a given budget. Moreover, CIAO addresses the trade-off between client cost and server savings by setting different budgets for different clients. We implemented CIAO and evaluated its performance on three real-world datasets. Our experimental results show that the system accelerates data loading by up to 21x and query execution by up to 23x, and improves end-to-end performance by up to 19x, within a budget of 1.0 microseconds of latency per record on clients.
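To make the idea of choosing predicates under a client budget concrete, the following is a minimal sketch of a generic greedy cost-benefit heuristic. It is not the algorithm from the paper, whose details are not given in the abstract. The `Predicate` fields, the independence assumption between predicates, and the benefit model (estimated fraction of records skipped per conjunctive query) are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    # All fields are hypothetical inputs chosen for illustration.
    name: str
    cost_us: float      # assumed client cost per record, in microseconds (> 0)
    selectivity: float  # assumed fraction of records satisfying the predicate


def estimated_skip_fraction(query, chosen):
    """Estimated fraction of records skippable for one conjunctive query,
    assuming the pushed-down predicates are statistically independent."""
    passing = 1.0
    for p in query:
        if p in chosen:
            passing *= p.selectivity
    return 1.0 - passing


def workload_benefit(workload, chosen):
    """Sum of estimated skip fractions over all queries in the workload."""
    return sum(estimated_skip_fraction(q, chosen) for q in workload)


def select_predicates(candidates, workload, budget_us):
    """Greedily add the predicate with the best marginal benefit per
    microsecond until nothing useful fits in the remaining budget."""
    chosen, spent = set(), 0.0
    remaining = set(candidates)
    while remaining:
        base = workload_benefit(workload, chosen)
        best, best_ratio = None, 0.0
        for p in remaining:
            if spent + p.cost_us > budget_us:
                continue  # does not fit in the remaining budget
            gain = workload_benefit(workload, chosen | {p}) - base
            ratio = gain / p.cost_us
            if ratio > best_ratio:
                best, best_ratio = p, ratio
        if best is None:
            break
        chosen.add(best)
        spent += best.cost_us
        remaining.remove(best)
    return chosen, spent
```

A per-client budget, as described in the abstract, would correspond here to calling `select_predicates` with a different `budget_us` for each client. A greedy ratio rule like this is only a heuristic and carries no near-optimality guarantee on its own.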


