Modeling the Linux page cache for accurate simulation of data-intensive applications

01/05/2021
by Hoang-Dung Do, et al.

The emergence of Big Data in recent years has resulted in a growing need for efficient data processing solutions. While infrastructures with sufficient compute power are available, the I/O bottleneck remains. The Linux page cache is an efficient approach to reduce I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks simulate page caching only partially or not at all. As a result, simulation-based performance studies of data-intensive applications lead to inaccurate results. In this paper, we propose an I/O simulation model that includes the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model in different conditions, including sequential and concurrent applications, as well as local and remote I/Os. We find that our page cache model reduces the simulation error by up to an order of magnitude compared to state-of-the-art cacheless simulations.
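To illustrate why modeling the page cache changes simulated I/O times, and what separates the writeback and writethrough modes mentioned above, here is a toy Python sketch. It is not the authors' model and not the WRENCH or SimGrid API. The class name `PageCacheSim`, its parameters (capacity in blocks, block size in MB, memory and disk bandwidth in MB/s), and the simplifications are all hypothetical. It uses a single LRU list, whereas the Linux kernel maintains active and inactive lists. It also writes dirty data back only on eviction or an explicit flush, with no dirty-ratio thresholds or periodic writeback.

```python
from collections import OrderedDict


class PageCacheSim:
    """Hypothetical toy page cache: one LRU list of fixed-size blocks.

    Every method returns simulated time in seconds. This is an illustration
    only, not the model from the paper or any WRENCH/SimGrid interface.
    """

    def __init__(self, capacity_blocks, block_mb, mem_bw, disk_bw, writeback=True):
        if capacity_blocks < 1:
            raise ValueError("capacity_blocks must be at least 1")
        self.capacity = capacity_blocks
        self.block_mb = block_mb
        self.mem_bw = mem_bw    # MB/s, assumed memory bandwidth
        self.disk_bw = disk_bw  # MB/s, assumed disk bandwidth
        self.writeback = writeback
        # block id -> dirty flag; the last item is the most recently used
        self.lru = OrderedDict()

    def _mem_time(self):
        return self.block_mb / self.mem_bw

    def _disk_time(self):
        return self.block_mb / self.disk_bw

    def _insert(self, block, dirty):
        """Insert or refresh a block, evicting LRU blocks (dirty ones are flushed first)."""
        if block in self.lru:
            self.lru.move_to_end(block)
            self.lru[block] = self.lru[block] or dirty
            return 0.0
        cost = 0.0
        while len(self.lru) >= self.capacity:
            _, was_dirty = self.lru.popitem(last=False)
            if was_dirty:
                cost += self._disk_time()  # write the dirty block to disk before dropping it
        self.lru[block] = dirty
        return cost

    def read(self, block):
        if block in self.lru:
            self.lru.move_to_end(block)
            return self._mem_time()  # cache hit: served from memory
        # cache miss: possible eviction cost, then a disk read; the block is cached clean
        return self._insert(block, dirty=False) + self._disk_time()

    def write(self, block):
        if self.writeback:
            # writeback: the write completes in memory and the block is marked dirty
            return self._mem_time() + self._insert(block, dirty=True)
        # writethrough: the block is cached clean and written to disk synchronously
        return self._disk_time() + self._insert(block, dirty=False)

    def flush(self):
        """Write every dirty block to disk, e.g. at the end of a simulated run."""
        dirty_blocks = [b for b, d in self.lru.items() if d]
        for b in dirty_blocks:
            self.lru[b] = False
        return len(dirty_blocks) * self._disk_time()


cache = PageCacheSim(capacity_blocks=2, block_mb=1, mem_bw=4, disk_bw=1, writeback=True)
print(cache.write(0), cache.write(1), cache.read(0), cache.read(2), cache.flush())
```

With these illustrative numbers, the writeback writes and the cache hit each cost 0.25 s. The read of block 2 costs 2.0 s: one second to flush the evicted dirty block 1 and one second for the disk read itself. A cacheless simulation would charge disk bandwidth for every one of these operations, which is the kind of discrepancy the abstract attributes to cacheless simulators.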

research
12/16/2018

Performance Evaluation of Big Data Processing Strategies for Neuroimaging

Neuroimaging datasets are rapidly growing in size as a result of advance...
research
06/09/2023

CAWL: A Cache-aware Write Performance Model of Linux Systems

The performance of data intensive applications is often dominated by the...
research
01/04/2019

Page Cache Attacks

We present a new hardware-agnostic side-channel attack that targets one ...
research
01/24/2019

Accuracy vs. Computational Cost Tradeoff in Distributed Computer System Simulation

Simulation is a fundamental research tool in the computer architecture f...
research
04/28/2021

FaaT: A Transparent Auto-Scaling Cache for Serverless Applications

Function-as-a-Service (FaaS) has become an increasingly popular way for ...
research
04/27/2018

Intermediate Data Caching Optimization for Multi-Stage and Parallel Big Data Frameworks

In the era of big data and cloud computing, large amounts of data are ge...
research
08/15/2015

Cracking Intel Sandy Bridge's Cache Hash Function

On Intel Sandy Bridge processor, last level cache (LLC) is divided into ...