A milestone for FaaS pipelines; object storage vs VM-driven data exchange

06/22/2022
by Germán T. Eizaguirre, et al.

Serverless functions provide high levels of parallelism, short startup times, and "pay-as-you-go" billing. These attributes make them a natural substrate for data analytics workflows. However, functions cannot communicate directly with each other, which makes executing workflows challenging. The current practice is to share intermediate data among functions through remote object storage (e.g., IBM COS). Contrary to conventional wisdom, the performance of object storage is not well understood. For instance, object storage can even outperform seemingly simpler approaches, such as executing shuffle stages (e.g., GroupBy) inside powerful VMs to avoid all-to-all transfers between functions. Leveraging a genomics pipeline, we show that object storage is a reasonable choice for data passing when the appropriate number of functions is used in shuffling stages.
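
To make the all-to-all pattern concrete, the following is a minimal sketch of a GroupBy shuffle routed through object storage. It is not the authors' implementation: the in-memory `store` dictionary is only a stand-in for a remote bucket (such as one in IBM COS), and the function name `shuffle_groupby`, the key layout `shuffle/m{m}/r{r}`, and the request counters are all hypothetical names chosen for illustration. The sketch assumes each mapper writes one object per reducer, so the number of object writes and reads both grow as the product of the two function counts.

```python
from collections import defaultdict


def shuffle_groupby(records, num_mappers, num_reducers):
    """GroupBy via an all-to-all shuffle through a simulated object store.

    Returns the grouped values plus the number of object writes and reads.
    """
    store = {}  # stand-in for a remote bucket: object key -> object body
    puts = gets = 0

    # Map stage: each mapper partitions its slice of the input by key hash
    # and writes one intermediate object per reducer.
    for m in range(num_mappers):
        parts = defaultdict(list)
        for key, value in records[m::num_mappers]:
            parts[hash(key) % num_reducers].append((key, value))
        for r in range(num_reducers):
            store[f"shuffle/m{m}/r{r}"] = parts[r]
            puts += 1

    # Reduce stage: each reducer reads its partition from every mapper
    # and groups the values by key.
    groups = {}
    for r in range(num_reducers):
        for m in range(num_mappers):
            for key, value in store[f"shuffle/m{m}/r{r}"]:
                groups.setdefault(key, []).append(value)
            gets += 1

    return groups, puts, gets
```

With `num_mappers` mappers and `num_reducers` reducers, this layout issues `num_mappers * num_reducers` writes and the same number of reads, which is one reason the number of functions used in a shuffle stage matters for object-storage-based data passing.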

