
-
PsPIN: A high-performance low-power architecture for flexible in-network compute
The capacity of offloading data and control tasks to the network is beco...
-
An In-Depth Analysis of the Slingshot Interconnect
The interconnect is one of the most critical components in large scale c...
-
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
The recent line of research into topology design focuses on lowering net...
-
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging
Deep learning at scale is dominated by communication time. Distributing ...
-
Mitigating Network Noise on Dragonfly Networks through Application-Aware Routing
System noise can negatively impact the performance of HPC systems, and t...
-
Network-Accelerated Non-Contiguous Memory Transfers
Applications often communicate data that is non-contiguous in the send- ...
-
Taming Unbalanced Training Workloads in Deep Learning with Partial Collective Operations
Load imbalance pervasively exists in distributed deep learning training ...
-
FatPaths: Routing in Supercomputers, Data Centers, and Clouds with Low-Diameter Networks when Shortest Paths Fall Short
We introduce FatPaths: a simple, generic, and robust routing architectur...
-
SimFS: A Simulation Data Virtualizing File System Interface
Nowadays simulations can produce petabytes of data to be stored in paral...