Storage and Memory Characterization of Data Intensive Workloads for Bare Metal Cloud

05/22/2018
by Hosein Mohammadi Makrani, et al.

As the cost per byte of storage systems decreases dramatically, SSDs are finding their way into emerging cloud infrastructure. A similar trend is occurring in the main memory subsystem, as advanced DRAM technologies with higher capacity, higher frequency, and more channels are being deployed in cloud-scale solutions, especially in non-virtualized environments where cloud subscribers can specify the exact configuration of the underlying hardware. Given the performance sensitivity of standard workloads to memory hierarchy parameters, it is important to understand the role of memory and storage for data-intensive workloads. In this paper, we investigate how the choice of DRAM (high-end vs. low-end) impacts the performance of Hadoop, Spark, and MPI based Big Data workloads in the presence of different storage types on bare metal cloud. Through a methodical experimental setup, we analyze the impact of DRAM capacity, operating frequency, number of channels, storage type, and scale-out factors on the performance of these popular frameworks. Based on micro-architectural analysis, we classify data-intensive workloads into three groups: I/O bound, compute bound, and memory bound. The characterization results show that DRAM capacity, frequency, and number of channels do not play a significant role in the performance of the studied Hadoop workloads, as these are mostly I/O bound. On the other hand, our results reveal that iterative tasks (e.g., machine learning) in Spark and MPI benefit from high-end DRAM, in particular from high frequency and a large number of channels, as they are memory or compute bound. Our results also show that using a PCIe SSD cannot shift the bottleneck from storage to memory, although it can change the workload behavior from I/O bound to compute bound.


research
02/20/2019

Understanding the Interactions of Workloads and DRAM Types: A Comprehensive Experimental Study

It has become increasingly difficult to understand the complex interacti...
research
11/30/2016

Memory Controller Design Under Cloud Workloads

This work studies the behavior of state-of-the-art memory controller des...
research
06/13/2021

Farview: Disaggregated Memory with Operator Off-loading for Database Engines

Cloud deployments disaggregate storage from compute, providing more flex...
research
11/20/2021

Freeing Compute Caches from Serialization and Garbage Collection in Managed Big Data Analytics

Managed analytics frameworks (e.g., Spark) cache intermediate results in...
research
03/01/2022

Pond: CXL-Based Memory Pooling Systems for Cloud Platforms

Public cloud providers seek to meet stringent performance requirements a...
research
06/07/2023

An Analytical Model-based Capacity Planning Approach for Building CSD-based Storage Systems

The data movement in large-scale computing facilities (from compute node...
research
10/26/2021

Endure: A Robust Tuning Paradigm for LSM Trees Under Workload Uncertainty

Log-Structured Merge trees (LSM trees) are increasingly used as the stor...
