Starling: A Scalable Query Engine on Cloud Function Services

11/26/2019
by Matthew Perron, et al.

As with on-premises systems, the natural choice for running database analytics workloads in the cloud is to provision a cluster of nodes to run a database instance. However, analytics workloads are often bursty or low volume, leaving clusters idle much of the time, so customers pay for compute resources even when they go unused. The ability of cloud function services, such as AWS Lambda or Azure Functions, to run small, fine-grained tasks makes them appear to be a natural choice for query processing in such settings. But implementing an analytics system on cloud functions comes with its own set of challenges. These include managing hundreds of tiny, stateless, resource-constrained workers, handling stragglers, and shuffling data through opaque cloud services. In this paper we present Starling, a query execution engine built on cloud function services that employs a number of techniques to mitigate these challenges, providing interactive query latency at a lower total cost than provisioned systems with low-to-moderate utilization. In particular, on a 1TB TPC-H dataset in cloud storage, Starling is less expensive than the best provisioned systems for workloads in which queries arrive 1 minute apart or more. Starling also has lower latency than competing systems reading from cloud object stores and can scale to larger datasets.
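
The cost trade-off described in the abstract can be illustrated with a small break-even calculation. This is a minimal sketch and not the paper's cost model: it assumes an always-on cluster billed by the hour, a fixed per-query charge on the function service, and evenly spaced query arrivals. The function names and the dollar figures in the usage note are hypothetical placeholders, not measurements from Starling.

```python
def provisioned_cost_per_query(cluster_cost_per_hour: float,
                               interarrival_seconds: float) -> float:
    """Cost of an always-on cluster attributed to each query.

    Assumption: one query arrives every `interarrival_seconds`, and the
    cluster is billed for the whole interval whether busy or idle.
    """
    if cluster_cost_per_hour <= 0 or interarrival_seconds <= 0:
        raise ValueError("costs and interarrival time must be positive")
    return cluster_cost_per_hour * interarrival_seconds / 3600.0


def break_even_interarrival_seconds(cluster_cost_per_hour: float,
                                    function_cost_per_query: float) -> float:
    """Interarrival time at which both options cost the same per query.

    If queries arrive farther apart than this, the pay-per-query option
    is cheaper under the assumptions above.
    """
    if cluster_cost_per_hour <= 0 or function_cost_per_query <= 0:
        raise ValueError("costs must be positive")
    return 3600.0 * function_cost_per_query / cluster_cost_per_hour


def cheaper_option(cluster_cost_per_hour: float,
                   function_cost_per_query: float,
                   interarrival_seconds: float) -> str:
    """Return which option costs less per query ('functions' on a tie)."""
    provisioned = provisioned_cost_per_query(cluster_cost_per_hour,
                                             interarrival_seconds)
    return "provisioned" if provisioned < function_cost_per_query else "functions"
```

With placeholder prices of 12.0 per hour for the cluster and 0.25 per query on the function service, `break_even_interarrival_seconds(12.0, 0.25)` gives the spacing between queries beyond which paying per query costs less. The sketch ignores storage charges, query runtime, and cluster autoscaling, all of which would shift the break-even point in practice.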


research
05/23/2022

An Elastic Ephemeral Datastore using Cheap, Transient Cloud Resources

Spot instances are virtual machines offered at 60-90 reclaimed at any ti...
research
02/11/2021

Silentium! Run-Analyse-Eradicate the Noise out of the DB/OS Stack

When multiple tenants compete for resources, database performance tends ...
research
12/19/2021

An Experimental and Comparative Benchmark Study Examining Resource Utilization in Managed Hadoop Context

Transitioning cloud-based Hadoop from IaaS to PaaS, which are commercial...
research
05/22/2023

On-demand Container Loading in AWS Lambda

AWS Lambda is a serverless event-driven compute service, part of a categ...
research
12/09/2020

JANUS: Benchmarking Commercial and Open-Source Cloud and Edge Platforms for Object and Anomaly Detection Workloads

With diverse IoT workloads, placing compute and analytics close to where...
research
07/30/2018

To Ship or Not to (Function) Ship (Extended version)

Sampling is often used to reduce query latency for interactive big data ...
research
06/22/2022

A milestone for FaaS pipelines; object storage vs VM-driven data exchange

Serverless functions provide high levels of parallelism, short startup t...
