Fine-grained Policy-driven I/O Sharing for Burst Buffers

06/20/2023
by Ed Karrels, et al.

A burst buffer is a common way to bridge the performance gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution, which significantly limits the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness, so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications based purely on real-time I/O behavior, without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair and size-fair, as well as composite policies, e.g., group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5-13.7% higher I/O throughput and 19.5-40.4% lower performance variation than existing algorithms. For real applications, ThemisIO significantly reduces the slowdown caused by I/O interference by 59.1-99.8%.
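
The abstract gives no implementation details, so the following is only a minimal sketch of how a statistical token scheme with opportunity fairness might look, not ThemisIO's actual code. It assumes a size-fair policy in which each job's share is proportional to its node count, and it assumes that every free I/O slot is awarded by a weighted random draw among the jobs that currently have pending requests. All names (`size_fair_shares`, `pick_next_job`, `simulate`) are hypothetical.

```python
import random
from collections import Counter, deque


def size_fair_shares(job_nodes):
    """Assumed size-fair policy: a job's share is proportional to its node count."""
    total = sum(job_nodes.values())
    return {job: nodes / total for job, nodes in job_nodes.items()}


def pick_next_job(shares, queues, rng):
    """Weighted random draw among jobs that have pending requests.

    Jobs with empty queues are skipped, so their share is implicitly
    redistributed to the active jobs (the opportunity-fairness idea).
    """
    active = [job for job in shares if queues.get(job)]
    if not active:
        return None
    weights = [shares[job] for job in active]
    return rng.choices(active, weights=weights, k=1)[0]


def simulate(job_nodes, pending, slots, seed=0):
    """Serve up to `slots` requests and count how many each job received."""
    rng = random.Random(seed)
    shares = size_fair_shares(job_nodes)
    queues = {job: deque(range(count)) for job, count in pending.items()}
    served = Counter()
    for _ in range(slots):
        job = pick_next_job(shares, queues, rng)
        if job is None:  # nothing left to serve
            break
        queues[job].popleft()
        served[job] += 1
    return served


if __name__ == "__main__":
    # Job A runs on 3 nodes, job B on 1 node; both are I/O-active.
    print(simulate({"A": 3, "B": 1}, {"A": 10000, "B": 10000}, slots=4000))
    # Job A is idle, so job B receives every slot.
    print(simulate({"A": 3, "B": 1}, {"A": 0, "B": 100}, slots=50))
```

Over many slots the served counts in this sketch approach the configured shares while both jobs are active, and an idle job's share goes to whichever jobs still have requests queued. A composite policy such as group-then-user-fair could presumably be modeled by computing the shares hierarchically before the draw.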


research
02/12/2019

Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications

GPU computing is becoming increasingly more popular with the proliferati...
research
04/22/2022

nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling

Future Exascale systems will feature massive parallelism, many-core proc...
research
08/19/2020

FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

Modern user-facing latency-sensitive web services include numerous distr...
research
10/06/2020

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

Resource provisioning in multi-tenant stream processing systems faces th...
research
02/11/2023

Analyzing DCTCP and Cubic Buffer Sharing under Diverse Router Configurations

In this work, we look at the impact of router configurations on DCTCP an...
research
08/24/2018

Performance evaluation of job schedulers on Hadoop YARN

To solve the limitation of Hadoop on scalability, resource sharing, and ...
research
04/18/2022

Unveiling User Behavior on Summit Login Nodes as a User

We observe and analyze usage of the login nodes of the leadership class ...
