An Elastic Ephemeral Datastore using Cheap, Transient Cloud Resources

05/23/2022
by Malte Brodmann, et al.

Spot instances are virtual machines offered at 60-90% lower cost that can be reclaimed at any time, with only a short warning period. Spot instances have already been used to significantly reduce the cost of processing workloads in the cloud. However, leveraging spot instances to reduce the cost of stateful cloud applications is much more challenging, as the sudden preemptions lead to data loss. In this work, we propose leveraging spot instances to decrease the cost of ephemeral data management in distributed data analytics applications. We specifically target ephemeral data because this large class of data in modern analytics workloads has low durability requirements: if lost, the data can be regenerated by re-executing compute tasks. We design an elastic, distributed ephemeral datastore that handles node preemptions transparently to user applications and minimizes data loss by redistributing data during node preemption warning periods. We implement our elastic datastore on top of the Apache Crail datastore and evaluate the system with various workloads and VM types. By leveraging spot instances, we show that we can run TPC-DS queries at 60% lower cost than when using on-demand VMs for the datastore, while increasing end-to-end execution time by only 2.1%.
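The abstract does not say how data is redistributed during a preemption warning period. The Python sketch below illustrates the general idea under simplifying assumptions that are not taken from the paper: transfer throughput is a single constant, blocks on the preempted node are evacuated in a given priority order, and each block goes to the surviving node with the most free space. Blocks that cannot be moved before the deadline are treated as lost and must be regenerated by re-executing compute tasks. `Block` and `plan_evacuation` are hypothetical names; they are not part of Apache Crail or the authors' system.

```python
from dataclasses import dataclass


@dataclass
class Block:
    block_id: str
    size_bytes: int


def plan_evacuation(blocks, free_capacity, bandwidth_bytes_per_s, warning_s):
    """Hypothetical sketch: choose which blocks of a node that is about to be
    preempted can be copied to surviving nodes before the warning period ends.

    blocks: Blocks stored on the preempted node, in priority order (assumed).
    free_capacity: dict mapping surviving node name -> free bytes.
    Returns (placement, lost): placement maps block_id -> target node;
    lost lists block_ids that must be regenerated by re-running tasks.
    """
    # Bytes that can be shipped before the VM is reclaimed (constant-rate assumption).
    budget = bandwidth_bytes_per_s * warning_s
    capacity = dict(free_capacity)  # copy so the caller's dict is not modified
    placement = {}
    lost = []
    for block in blocks:
        # Greedy load balancing: target the surviving node with the most free space.
        target = max(capacity, key=capacity.get, default=None)
        if (
            target is None
            or block.size_bytes > budget
            or block.size_bytes > capacity[target]
        ):
            # Skip this block; smaller blocks later in the list may still fit.
            lost.append(block.block_id)
            continue
        placement[block.block_id] = target
        capacity[target] -= block.size_bytes
        budget -= block.size_bytes
    return placement, lost


blocks = [Block("b1", 400), Block("b2", 300), Block("b3", 500)]
placement, lost = plan_evacuation(blocks, {"n1": 600, "n2": 500}, 100, 10)
print(placement, lost)  # {'b1': 'n1', 'b2': 'n2'} ['b3']
```

In this example the warning window allows 1,000 bytes to be transferred, so `b1` and `b2` are moved while `b3` no longer fits in the remaining budget and would be recomputed. A real system would also have to account for concurrent transfers, metadata updates, and clients reading the blocks mid-migration.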
