A Software-Defined QoS Provisioning Framework for HPC Applications

by Neda Tavakoli, et al.

With the emergence of large-scale, data-intensive, high-performance applications, new I/O challenges arise in efficiently managing petabytes of information in High-Performance Computing (HPC) environments. Data management environments must meet the performance needs of such applications, which are expressed through Quality-of-Service (QoS) metrics such as desired bandwidth, response-time guarantees, and resource utilization. Traditional high-performance management platforms face considerable challenges in flexibility and in addressing a variety of QoS metrics and constraints. A software-defined approach is considered promising for tackling these challenges, and various prototypes have already been deployed in Cloud-based data centers. In this paper, we investigate the idea of using a software-defined approach to provide I/O QoS provisioning for HPC applications. We identify the key challenges posed by the high degree of concurrency and variation in HPC platforms, and propose a series of novel designs that extend the general software-defined approach to meet this goal. Specifically, we introduce a borrowing-based strategy and a new M-LWDF algorithm based on traditional token-bucket algorithms to ensure fair and efficient utilization of resources for HPC applications. Because current HPC platforms lack software-defined frameworks, we evaluate our framework through simulation. The experimental results show that our strategies significantly improve upon general HPC frameworks and lead to clear performance gains for HPC applications.
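
To make the token-bucket and borrowing ideas concrete, the following is a minimal Python sketch. It is not the design evaluated in the paper, whose details the abstract does not give. It assumes one bucket per application and a simple policy where an application that is short on tokens borrows from whichever other buckets currently hold the most. The names `TokenBucket`, `admit`, and the parameters `rate`, `capacity`, and `cost` are hypothetical, and the M-LWDF scheduling step is not modeled.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Classic token bucket: tokens accrue at `rate` up to `capacity`."""
    rate: float            # tokens added per second (assumed unit)
    capacity: float        # maximum tokens the bucket can hold
    tokens: float = 0.0    # tokens currently available

    def refill(self, elapsed: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + self.rate * elapsed)

    def take(self, amount: float) -> float:
        # Remove up to `amount` tokens and return how many were removed.
        granted = min(self.tokens, amount)
        self.tokens -= granted
        return granted


def admit(buckets: dict, app: str, cost: float) -> bool:
    """Admit an I/O request of `cost` tokens for `app`.

    Hypothetical borrowing policy: use the application's own tokens
    first, then borrow from the other buckets, largest surplus first.
    The request is rejected, with no tokens consumed, if the total
    across all buckets cannot cover `cost`.
    """
    if sum(b.tokens for b in buckets.values()) < cost:
        return False
    remaining = cost - buckets[app].take(cost)
    lenders = sorted(
        (b for name, b in buckets.items() if name != app),
        key=lambda b: b.tokens,
        reverse=True,
    )
    for lender in lenders:
        if remaining <= 0:
            break
        remaining -= lender.take(remaining)
    return True


buckets = {
    "app_a": TokenBucket(rate=10.0, capacity=10.0, tokens=4.0),
    "app_b": TokenBucket(rate=10.0, capacity=10.0, tokens=10.0),
}
accepted = admit(buckets, "app_a", 6.0)   # app_a borrows 2 tokens from app_b
rejected = admit(buckets, "app_a", 20.0)  # total supply is too small
```

In this sketch a lender is never repaid. A real borrowing strategy would need some rule for returning or accounting for borrowed tokens to keep the allocation fair over time.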


