Is Big Data Performance Reproducible in Modern Cloud Networks?

12/19/2019
by Alexandru Uta, et al.

Performance variability has been acknowledged as a problem for over a decade by cloud practitioners and performance engineers. Yet, our survey of top systems conferences reveals that the research community regularly disregards variability when running experiments in the cloud. Focusing on networks, we assess the impact of variability on cloud-based big-data workloads by gathering traces from mainstream commercial clouds and private research clouds. Our data collection consists of millions of data points gathered while transferring over 9 petabytes of data. We characterize the network variability present in our data and show that, even though commercial cloud providers implement mechanisms for quality-of-service enforcement, variability still occurs, and is even exacerbated by such mechanisms and service provider policies. We show how big-data workloads suffer from significant slowdowns and lack predictability and replicability, even when state-of-the-art experimentation techniques are used. We provide guidelines for practitioners to reduce the volatility of big-data performance, making experiments more repeatable.

research
03/26/2019

Apache Hive: From MapReduce to Enterprise-grade Big Data Warehousing

Apache Hive is an open-source relational database system for analytic bi...
research
07/27/2018

Greening Cloud-Enabled Big Data Storage Forensics: Syncany as a Case Study

The pervasive nature of cloud-enabled big data storage solutions introdu...
research
07/05/2018

A Comparative Study of Containers and Virtual Machines in Big Data Environment

Container technique is gaining increasing attention in recent years and ...
research
09/21/2018

S3BD: Secure Semantic Search over Encrypted Big Data in the Cloud

Cloud storage is a widely utilized service for both personal and enterpr...
research
09/21/2023

A Multi-faceted Analysis of the Performance Variability of Virtual Machines

Cloud computing and virtualization solutions allow one to rent the virtu...
research
12/13/2021

Meterstick: Benchmarking Performance Variability in Cloud and Self-hosted Minecraft-like Games Extended Technical Report

Due to increasing popularity and strict performance requirements, online...
research
03/04/2022

Benchmarking tunnel and encryption methodologies in cloud environments

The recent past has seen the adoption of multi-cloud deployments by ente...
