A Comparative Study of Containers and Virtual Machines in Big Data Environment

07/05/2018
by Qi Zhang, et al.

Container technology has gained increasing attention in recent years and has become an alternative to traditional virtual machines. Enterprises adopt containers primarily for their convenience in encapsulating and deploying applications, their lightweight operations, and their efficiency and flexibility in resource sharing. However, there is still no in-depth, systematic comparison of how big data applications, such as Spark jobs, perform in a container environment versus a virtual machine environment. In this paper, by running various Spark applications with different configurations, we evaluate the two environments from several aspects: how conveniently the execution environment can be set up, what the makespans of different workloads are in each setup, how efficiently hardware resources such as CPU and memory are utilized, and how well each environment scales. The results show that, compared with virtual machines, containers provide an easier-to-deploy and more scalable environment for big data workloads. This work can help practitioners and researchers make more informed decisions when tuning their cloud environments and configuring big data applications, so as to achieve better performance and higher resource utilization.


