Noise in the Clouds: Influence of Network Performance Variability on Application Scalability

10/27/2022
by Daniele De Sensi, et al.

Cloud computing represents an appealing opportunity for cost-effective deployment of HPC workloads on the best-fitting hardware. However, although cloud and on-premise HPC systems offer similar computational resources, their network architecture and performance may differ significantly. For example, these systems use fundamentally different network transport and routing protocols, which may introduce network noise that can ultimately limit application scalability. This work analyzes network performance, scalability, and cost of running HPC workloads on cloud systems. First, we consider latency, bandwidth, and collective communication patterns in detailed small-scale measurements, and then we simulate network performance at a larger scale. We validate our approach on four popular cloud providers and three on-premise HPC systems, showing that network (and also OS) noise can significantly impact performance and cost at both small and large scale.
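
One intuition for why noise limits scaling is that a synchronized step, such as a collective operation, finishes only when its slowest rank does. The sketch below is an illustrative toy model, not the authors' method. It assumes that each rank's latency is a fixed base plus independent exponential noise, and the parameter values are made up. The names `expected_step_time` and `simulated_step_time` are hypothetical helpers. The sketch compares the closed-form expected step time with a seeded Monte Carlo estimate.

```python
import random
from statistics import fmean


def expected_step_time(n_ranks: int, base_us: float, noise_mean_us: float) -> float:
    """Closed-form expected duration of one synchronized step (toy model).

    Hypothetical assumption: each rank's latency is base_us plus an
    exponential noise term with mean noise_mean_us, i.i.d. across ranks.
    The step waits for the slowest rank, and the expected maximum of n
    i.i.d. exponentials with mean m is m * H_n (the n-th harmonic number).
    """
    harmonic = sum(1.0 / k for k in range(1, n_ranks + 1))
    return base_us + noise_mean_us * harmonic


def simulated_step_time(n_ranks: int, base_us: float, noise_mean_us: float,
                        trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity under the same toy noise model."""
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        # Each rank draws its own noisy latency; the step ends with the slowest one.
        slowest = max(base_us + rng.expovariate(1.0 / noise_mean_us)
                      for _ in range(n_ranks))
        samples.append(slowest)
    return fmean(samples)


if __name__ == "__main__":
    # Illustrative values only: 2 us base latency, 0.5 us mean noise per rank.
    for n in (1, 16, 256, 4096):
        print(n, round(expected_step_time(n, base_us=2.0, noise_mean_us=0.5), 3))
```

In this model the average per-rank latency never changes. Even so, the expected step time grows roughly logarithmically with the rank count because H_n grows like ln n. This is one simple way to see how noise that is negligible on a few nodes can dominate at large scale. Real network noise distributions, such as those measured in the paper, need not be exponential or independent.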

