Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters

09/17/2021
by Pouya Kousha, et al.

Understanding and visualizing the full-stack performance trade-offs and interplay between HPC applications, MPI libraries, the communication fabric, and the file system is a challenging endeavor. Designing a holistic profiling and visualization method for HPC communication networks is difficult because different levels of communication coexist and interact with each other on the communication fabric. A breakdown of traffic is essential for understanding the interplay of the different layers, along with the application's communication behavior, without losing a general view of network traffic. Unfortunately, existing profiling tools are disjoint: they either profile and visualize only a few levels of the HPC stack, which limits the insights they can provide, or they provide extremely detailed information that requires a steep learning curve to understand. We design our profiling tool's visualization to provide holistic, real-time insights into HPC communication stacks. In this paper, we propose and implement visualization methods that represent cross-stack metrics and enable holistic insight. Moreover, we propose and implement low-overhead I/O profiling inside the communication library, collect and store the profiling information, and then study and evaluate the correlation of I/O traffic with MPI communication using a cross-stack approach with INAM. Through experimental evaluations and use cases, we demonstrate novel benefits of our real-time cross-stack communication analysis for detecting bottlenecks and understanding communication performance.
