Node-Type-Based Load-Balancing Routing for Parallel Generalized Fat-Trees

11/21/2022
by   John Gliksberg, et al.

High-Performance Computing (HPC) clusters are made up of a variety of node types (usually compute, I/O, service, and GPGPU nodes), and applications do not use nodes of different types in the same way. The resulting communication patterns reflect the organization of groups of nodes, and current routing algorithms that are optimal for all-to-all patterns will not always maximize performance for group-specific communications. Since application communication patterns are rarely available beforehand, we choose to rely on node types as a good guess for node usage. We provide a description of node type heterogeneity and analyse the performance degradation caused by an unlucky distribution of nodes of the same type. We provide an extension to routing algorithms for Parallel Generalized Fat-Tree topologies (PGFTs) which balances load amongst groups of nodes of the same type. We show how it removes these performance issues by comparing results in a variety of situations against corresponding classical algorithms.
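The kind of imbalance described in the abstract can be illustrated with a toy model. The sketch below is not the algorithm from the paper: it assumes a single switch with K uplinks, a classical destination-modulo-K port choice, and a hypothetical placement in which every fourth node is an I/O node. The helper names (`type_rank`, `uplink_load`) and the node layout are invented for illustration only. It compares uplink load for traffic towards the I/O group when the port is chosen from the global node index versus from the node's rank within its own type.

```python
from collections import defaultdict

def type_rank(node_types):
    """Return, for each node, its 0-based rank among nodes of the same type."""
    seen = defaultdict(int)
    ranks = []
    for t in node_types:
        ranks.append(seen[t])
        seen[t] += 1
    return ranks

def uplink_load(dest_ids, keys, k):
    """Count how many destinations are routed through each of k uplinks,
    choosing the uplink as keys[dest] modulo k."""
    load = [0] * k
    for d in dest_ids:
        load[keys[d] % k] += 1
    return load

# Hypothetical layout (an assumption): 16 nodes, every fourth one is an I/O node.
K = 4
NODE_TYPES = ["io" if i % 4 == 0 else "compute" for i in range(16)]
IO_NODES = [i for i, t in enumerate(NODE_TYPES) if t == "io"]

# Classical choice: key is the global node index.
classic = uplink_load(IO_NODES, list(range(len(NODE_TYPES))), K)

# Type-aware choice: key is the node's rank within its own type.
by_type = uplink_load(IO_NODES, type_rank(NODE_TYPES), K)

print("global index:", classic)
print("rank in type:", by_type)
```

In this layout all I/O destinations share one uplink under the global index, while indexing within the type spreads them over all four. A real PGFT has several levels and parallel links, so this only conveys the intuition behind balancing per node type.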


research
11/21/2022

High-Quality Fault-Resiliency in Fat-Tree Networks (Extended Abstract)

Coupling regular topologies with optimized routing algorithms is key in ...
research
07/01/2021

Scalable Node-Disjoint and Edge-Disjoint Multi-wavelength Routing

Probabilistic message-passing algorithms are developed for routing trans...
research
11/23/2022

High-Quality Fault Resiliency in Fat Trees

Coupling regular topologies with optimised routing algorithms is key in ...
research
03/24/2023

Generalized Distance Metric for Different DHT Routing Algorithms in Peer-to-Peer Networks

We present a generalized distance metric that can be used to identify ro...
research
05/19/2020

Efficient Process-to-Node Mapping Algorithms for Stencil Computations

Good process-to-compute-node mappings can be decisive for well performin...
research
01/01/2020

AIR – A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing

Distributed Stream Processing Systems (DSPSs) are among the currently mo...
research
08/10/2018

Self-Organization Scheme for Balanced Routing in Large-Scale Multi-Hop Networks

We propose a self-organization scheme for cost-effective and load-balanc...
