Monitoring Collective Communication Among GPUs

Communication among devices in multi-GPU systems plays an important role in performance and scalability. To optimize an application, programmers need to know the type and amount of communication taking place among GPUs. Although prior work gathers this information for MPI applications on distributed-memory systems and for multi-threaded applications on shared-memory systems, no existing tool identifies communication among GPUs. Our prior work, ComScribe, presents a point-to-point (P2P) communication detection tool for GPUs sharing a common host. In this work, we extend ComScribe to identify communication among GPUs for both collective and P2P communication primitives in NVIDIA's NCCL library. Collective communication, in addition to P2P communication, is common in HPC and AI workloads, so it is important to monitor the data movement that collectives induce. Our tool extracts the size and frequency of data transfers in an application and visualizes them as a communication matrix. To demonstrate the tool in action, we present communication matrices and statistics for two applications from the machine translation and image classification domains.
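To illustrate the interception idea described above, the following C++ sketch wraps two NCCL primitives via LD_PRELOAD, records the bytes each call moves, and accumulates them in a per-endpoint table that could later be rendered as a communication matrix. This is an illustrative approximation under stated assumptions, not ComScribe's actual implementation: the helper names (log_transfer, TransferTable), the build and run commands, the peer = -1 convention for collectives, and the rank-to-device simplification are all assumptions; only the ncclAllReduce and ncclSend signatures are taken from nccl.h as NCCL declares them.

// comscribe_sketch.cpp -- hypothetical illustration only, not ComScribe's code.
// Wraps two NCCL primitives via LD_PRELOAD, records the bytes each call moves,
// and accumulates them in a per-endpoint table. Build/run commands are assumptions:
//
//   g++ -shared -fPIC comscribe_sketch.cpp -o libcomscribe_sketch.so -ldl
//   LD_PRELOAD=./libcomscribe_sketch.so ./nccl_application

#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for RTLD_NEXT on glibc
#endif
#include <dlfcn.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <map>
#include <mutex>
#include <utility>

namespace {

struct TransferTable {
  std::mutex mutex;
  // (local device, peer) -> total bytes. Collective calls are logged with peer = -1,
  // because attributing collective traffic to device pairs requires modeling the
  // ring/tree algorithm NCCL selects, which this sketch omits.
  std::map<std::pair<int, int>, size_t> bytes_moved;

  ~TransferTable() {
    // Dump on program exit; a real tool would emit a full NxN matrix plus
    // per-primitive call counts instead of this flat listing.
    for (const auto& entry : bytes_moved) {
      std::fprintf(stderr, "device %d -> peer %d : %zu bytes\n",
                   entry.first.first, entry.first.second, entry.second);
    }
  }
};

TransferTable& table() {
  static TransferTable t;   // constructed on first use, dumped at exit
  return t;
}

size_t datatype_bytes(ncclDataType_t t) {
  switch (t) {
    case ncclInt8:   case ncclUint8:                     return 1;
    case ncclFloat16:                                    return 2;
    case ncclInt32:  case ncclUint32: case ncclFloat32:  return 4;
    case ncclInt64:  case ncclUint64: case ncclFloat64:  return 8;
    default:                                             return 0;  // unknown type
  }
}

void log_transfer(int peer, size_t bytes) {
  int device = -1;
  cudaGetDevice(&device);   // GPU the calling thread currently targets
  TransferTable& t = table();
  std::lock_guard<std::mutex> lock(t.mutex);
  t.bytes_moved[{device, peer}] += bytes;
}

}  // namespace

// Intercept ncclAllReduce: record the payload size, then forward to the real symbol.
extern "C" ncclResult_t ncclAllReduce(const void* sendbuff, void* recvbuff,
                                      size_t count, ncclDataType_t datatype,
                                      ncclRedOp_t op, ncclComm_t comm,
                                      cudaStream_t stream) {
  using fn_t = ncclResult_t (*)(const void*, void*, size_t, ncclDataType_t,
                                ncclRedOp_t, ncclComm_t, cudaStream_t);
  static fn_t real = (fn_t)dlsym(RTLD_NEXT, "ncclAllReduce");
  log_transfer(/*peer=*/-1, count * datatype_bytes(datatype));
  return real(sendbuff, recvbuff, count, datatype, op, comm, stream);
}

// Intercept ncclSend: the destination is explicit, so the transfer can be attributed
// to an endpoint pair. Note: `peer` is an NCCL rank; mapping ranks to physical GPUs
// is omitted here for brevity.
extern "C" ncclResult_t ncclSend(const void* sendbuff, size_t count,
                                 ncclDataType_t datatype, int peer,
                                 ncclComm_t comm, cudaStream_t stream) {
  using fn_t = ncclResult_t (*)(const void*, size_t, ncclDataType_t, int,
                                ncclComm_t, cudaStream_t);
  static fn_t real = (fn_t)dlsym(RTLD_NEXT, "ncclSend");
  log_transfer(peer, count * datatype_bytes(datatype));
  return real(sendbuff, count, datatype, peer, comm, stream);
}

Interception through the dynamic linker keeps the monitored application unmodified: the wrapper library resolves the NCCL symbols first, logs the call, and forwards to the real implementation found with dlsym(RTLD_NEXT, ...). Wrapping the remaining collective and P2P primitives follows the same pattern.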
