Lambada: Interactive Data Analytics on Cold Data using Serverless Cloud Infrastructure

12/02/2019
by Ingo Müller, et al.

The promise of ultimate elasticity and operational simplicity of serverless computing has recently led to an explosion of research in this area. In the context of data analytics, the concept sounds appealing, but due to the limitations of current offerings, there is no consensus yet on whether this approach is technically and economically viable. In this paper, we identify interactive data analytics on cold data as a use case where serverless computing excels. We design and implement Lambada, a system following a purely serverless architecture, to illustrate when and how serverless computing should be employed for data analytics. We propose several system components that overcome the previously known limitations inherent in the serverless paradigm as well as additional ones we identify in this work. We show that, thanks to careful design, a serverless query processing system can be simultaneously one order of magnitude faster and two orders of magnitude cheaper than commercial Query-as-a-Service systems, the only alternative with similar operational simplicity.


research
09/12/2019

Simple-ML: Towards a Framework for Semantic Data Analytics Workflows

In this paper we present the Simple-ML framework that we develop to supp...
research
04/24/2018

In-Browser Split-Execution Support for Interactive Analytics in the Cloud

The canonical analytics architecture today consists of a browser connect...
research
07/26/2019

ServerMix: Tradeoffs and Challenges of Serverless Data Analytics

Serverless computing has become very popular today since it largely simp...
research
04/07/2020

Modularis: Modular Data Analytics for Hardware, Software, and Platform Heterogeneity

Today's data analytics displays an overwhelming diversity along many dim...
research
02/04/2019

Declarative Data Analytics: a Survey

The area of declarative data analytics explores the application of the d...
research
07/30/2018

To Ship or Not to (Function) Ship (Extended version)

Sampling is often used to reduce query latency for interactive big data ...
research
05/09/2018

RHEEMix in the Data Jungle -- A Cross-Platform Query Optimizer --

In pursuit of efficient and scalable data analytics, the insight that "o...
