Fine-grained Policy-driven I/O Sharing for Burst Buffers

by Ed Karrels, et al.

A burst buffer is a common method to bridge the performance gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution, all of which significantly limit the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness, so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications based purely on real-time I/O behavior, without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair and size-fair, as well as composite policies, e.g., group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5-13.7% higher I/O throughput and 19.5-40.4% lower performance variation than existing algorithms. For real applications, ThemisIO significantly reduces the slowdown caused by I/O interference by 59.1-99.8%.
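
The abstract does not spell out how the statistical tokens work, so the following is only a minimal sketch of one plausible reading: each job with pending requests gets a share determined by the policy, and the next request to serve is drawn at random in proportion to those shares. Jobs with empty queues receive no share, which is one simple way to model reallocating unused I/O resources. All names here (job_shares, pick_next, the "user" and "nodes" fields) are hypothetical and are not ThemisIO's actual API.

```python
import random
from collections import deque


def job_shares(jobs, policy):
    """Return {job_id: share} for the given jobs; shares sum to 1.

    jobs maps a job id to {"user": str, "nodes": int} (hypothetical fields).
    """
    if policy == "size-fair":
        # Assumption: share is proportional to the job's node count.
        total = sum(j["nodes"] for j in jobs.values())
        return {jid: j["nodes"] / total for jid, j in jobs.items()}
    if policy == "user-fair":
        # Assumption: users split evenly, then each user's jobs split evenly.
        by_user = {}
        for jid, j in jobs.items():
            by_user.setdefault(j["user"], []).append(jid)
        return {
            jid: 1.0 / (len(by_user) * len(jids))
            for jids in by_user.values()
            for jid in jids
        }
    raise ValueError(f"unknown policy: {policy}")


def pick_next(queues, jobs, policy, rng):
    """Pick the next (job_id, request) to serve, or None if all queues are empty.

    Only jobs with pending requests are considered, so an idle job's share
    is implicitly redistributed among the active ones.
    """
    active = {jid: jobs[jid] for jid, q in queues.items() if q}
    if not active:
        return None
    shares = job_shares(active, policy)
    ids = list(shares)
    # Statistical selection: draw a job with probability equal to its share.
    jid = rng.choices(ids, weights=[shares[i] for i in ids], k=1)[0]
    return jid, queues[jid].popleft()


if __name__ == "__main__":
    jobs = {
        "a": {"user": "u1", "nodes": 1},
        "b": {"user": "u1", "nodes": 3},
        "c": {"user": "u2", "nodes": 4},
    }
    # Job "c" is idle, so only "a" and "b" compete for service.
    queues = {"a": deque(range(100)), "b": deque(range(100)), "c": deque()}
    print(pick_next(queues, jobs, "size-fair", random.Random(0)))
```

Over many draws, the fraction of requests served for each active job approaches its share, which is the sense in which such a scheme would balance I/O cycles statistically rather than by strict round-robin. A composite policy such as group-then-user-fair could be sketched the same way by nesting one more level of even splitting.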



Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications

GPU computing is becoming increasingly more popular with the proliferati...

nOS-V: Co-Executing HPC Applications Using System-Wide Task Scheduling

Future Exascale systems will feature massive parallelism, many-core proc...

FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices

Modern user-facing latency-sensitive web services include numerous distr...

Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

Resource provisioning in multi-tenant stream processing systems faces th...

Analyzing DCTCP and Cubic Buffer Sharing under Diverse Router Configurations

In this work, we look at the impact of router configurations on DCTCP an...

Performance evaluation of job schedulers on Hadoop YARN

To solve the limitation of Hadoop on scalability, resource sharing, and ...

Unveiling User Behavior on Summit Login Nodes as a User

We observe and analyze usage of the login nodes of the leadership class ...
