Pangea: Monolithic Distributed Storage for Data Analytics

08/18/2018
by Jia Zou, et al.

Storage and memory systems for modern data analytics are heavily layered: shared persistent data, cached data, and non-shared execution data are managed in separate systems, such as a distributed file system like HDFS, an in-memory file system like Alluxio, and a computation framework like Spark. Such layering introduces significant performance and management costs, both from redundantly copying data across layers and from deciding how to allocate resources among the layers. In this paper we propose Pangea, a single system that manages all data (both intermediate and long-lived), together with its buffering/caching, data placement optimization, and failure recovery, in one monolithic storage system without any layering. We present a detailed performance evaluation of Pangea and show that its performance compares favorably with several widely used layered systems such as Spark.
