An In-Depth Analysis of the Slingshot Interconnect

08/20/2020
by Daniele De Sensi, et al.

The interconnect is one of the most critical components in large-scale computing systems, and its impact on application performance grows with system size. In this paper, we describe Slingshot, an interconnection network for large-scale computing systems. Slingshot is based on high-radix switches, which make it possible to build exascale and hyperscale datacenter networks with at most three switch-to-switch hops. Slingshot also provides efficient adaptive routing and congestion control algorithms, as well as highly tunable traffic classes. It uses an optimized Ethernet protocol, which keeps it interoperable with standard Ethernet devices while delivering high performance to HPC applications. We analyze the extent to which Slingshot provides these features by evaluating it on microbenchmarks, on several datacenter and AI applications, and on HPC applications. We find that applications running on Slingshot are less affected by congestion than on previous-generation networks.
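The claim that high-radix switches keep every switch-to-switch path within three hops can be made concrete with a small model. The sketch below assumes an idealized balanced dragonfly: groups of a fully connected switches, exactly one global link between every pair of groups, and a = 2p = 2h, where p is the number of endpoint ports and h the number of global ports per switch. The abstract does not name the topology, and the function names and the 64-port radix are hypothetical choices for illustration, not Slingshot's actual configuration. The code sizes the largest such network for a given radix and checks the hop bound with a breadth-first search on small instances.

```python
from collections import deque

# Hypothetical illustration: an idealized balanced dragonfly, NOT Slingshot's
# real deployment parameters. Names and sizing rule are assumptions.


def balanced_dragonfly(radix: int) -> dict:
    """Size an idealized balanced dragonfly (a = 2p = 2h) for a switch radix.

    Each switch uses p endpoint ports, a - 1 local ports (groups are fully
    connected) and h global ports, so p + (a - 1) + h = 4h - 1 <= radix.
    """
    h = (radix + 1) // 4          # global ports per switch
    p, a = h, 2 * h               # endpoints per switch, switches per group
    groups = a * h + 1            # one global link between every pair of groups
    return {"h": h, "p": p, "a": a, "groups": groups,
            "switches": a * groups, "endpoints": a * p * groups}


def switch_graph(h: int) -> dict:
    """Build the switch-to-switch adjacency of a small balanced dragonfly."""
    a, groups = 2 * h, 2 * h * h + 1
    adj = {(g, s): set() for g in range(groups) for s in range(a)}
    # Local links: every switch in a group connects to every other switch in it.
    for g in range(groups):
        for s in range(a):
            for t in range(s + 1, a):
                adj[(g, s)].add((g, t))
                adj[(g, t)].add((g, s))
    # Global links: exactly one per pair of groups, assigned round-robin so
    # that global port k of a group lives on switch k // h.
    next_port = [0] * groups
    for gi in range(groups):
        for gj in range(gi + 1, groups):
            u = (gi, next_port[gi] // h)
            v = (gj, next_port[gj] // h)
            next_port[gi] += 1
            next_port[gj] += 1
            adj[u].add(v)
            adj[v].add(u)
    return adj


def diameter(adj: dict) -> int:
    """Longest shortest path in switch-to-switch hops, via BFS from every switch."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    print(balanced_dragonfly(64)["endpoints"])  # 262656
    print(diameter(switch_graph(2)))            # 3
```

With a hypothetical 64-port switch, this sizing rule gives 513 groups and 262,656 endpoints, while the search on small instances finds a diameter of three hops (local, global, local). The numbers illustrate the scaling argument only; they do not describe any particular Slingshot installation.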
