
-
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Dense linear algebra kernels, such as linear solvers or tensor contracti...
-
PsPIN: A high-performance low-power architecture for flexible in-network compute
The capacity of offloading data and control tasks to the network is beco...
-
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
The recent line of research into topology design focuses on lowering net...
-
Optimizing the Data Movement in Quantum Transport Simulations via Data-Centric Parallel Programming
Designing efficient cooling systems for integrated circuits (ICs) relies...
-
A Data-Centric Approach to Extreme-Scale Ab initio Dissipative Quantum Transport Simulations
The computational efficiency of a state of the art ab initio quantum tra...
-
Network-Accelerated Non-Contiguous Memory Transfers
Applications often communicate data that is non-contiguous in the send- ...
-
Stateful Dataflow Multigraphs: A Data-Centric Model for Performance Portability on Heterogeneous Architectures
The ubiquity of accelerators in high-performance computing has driven pr...
-
Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs
With the ubiquity of accelerators, such as FPGAs and GPUs, the complexit...