Exploiting Inter-Operation Data Reuse in Scientific Applications using GOGETA

by Raveesh Garg, et al.

HPC applications are critical in scientific domains ranging from molecular dynamics to chemistry to fluid dynamics. Conjugate Gradient (CG) is a popular kernel in iterative linear HPC solvers and is used across numerous scientific domains. However, the HPCG benchmark shows that the performance Top500 HPC systems achieve on CG is a small fraction of what they achieve on Linpack. Although a significant portion of the computation in CG applications consists of dense and sparse matrix multiplications, the skewed SpMMs/GEMMs in HPC solvers have poor arithmetic intensity, which makes their execution highly memory bound, unlike GEMMs in DNNs, which have high arithmetic intensity. The problem of low-intensity individual skewed GEMMs also exists in emerging workloads from other domains, such as Graph Neural Networks and Transformers. In this work, we identify reuse opportunities between the tensors in these solver applications in order to extract reuse across the entire Directed Acyclic Graph (DAG) of tensor operations rather than within individual tensor operations. These opportunities depend on the dimensions of the tensors and the structure of the tensor dependency graph. We propose a systematic methodology to determine the kinds of reuse opportunities in the graph of operations and to determine the loop order and tiling of the interdependent operations. Building on this methodology, we propose a novel mapping strategy, GOGETA, that improves reuse of HPC applications on spatial accelerators. We also propose a data organization strategy in the buffer. Our mapping achieves a geomean 6.7x reduction in memory accesses.
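The contrast between skewed and square GEMMs can be illustrated with a rough calculation. The sketch below is not taken from the paper: the function name, the matrix dimensions, and the traffic model (each operand read once, the output written once, 8-byte FP64 elements) are all assumptions chosen for illustration.

```python
def gemm_arithmetic_intensity(m, k, n, bytes_per_element=8):
    """FLOPs per byte for C[m, n] = A[m, k] @ B[k, n].

    Assumes an idealized traffic model: A and B are each read once and
    C is written once. bytes_per_element=8 assumes FP64 data.
    """
    flops = 2 * m * k * n                       # one multiply and one add per MAC
    elements_moved = m * k + k * n + m * n      # A + B + C
    return flops / (elements_moved * bytes_per_element)


# Hypothetical shapes: a square, DNN-style GEMM versus a tall, skinny
# (skewed) GEMM of the kind that appears in block iterative solvers.
square_ai = gemm_arithmetic_intensity(1024, 1024, 1024)
skewed_ai = gemm_arithmetic_intensity(1_000_000, 8, 8)

print(f"square GEMM: {square_ai:.2f} FLOPs/byte")
print(f"skewed GEMM: {skewed_ai:.2f} FLOPs/byte")
```

Under these assumptions the skewed GEMM performs roughly one floating-point operation per byte moved, while the square GEMM performs dozens. This is why reuse has to come from neighboring operations in the dependency graph rather than from within a single skewed operation.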


Evaluating Emerging CXL-enabled Memory Pooling for HPC Systems

Current HPC systems provide memory resources that are statically configu...

Parallel Algorithms for Tensor Train Arithmetic

We present efficient and scalable parallel algorithms for performing mat...

Performance Analysis of Scientific Computing Workloads on Trusted Execution Environments

Scientific computing sometimes involves computation on sensitive data. D...

TensorFlow Doing HPC

TensorFlow is a popular emerging open-source programming framework suppo...

ALTO: Adaptive Linearized Storage of Sparse Tensors

The analysis of high-dimensional sparse data is becoming increasingly po...

A High-Throughput Solver for Marginalized Graph Kernels on GPU

We present the design of a solver for the efficient and high-throughput ...

NVM-ESR: Using Non-Volatile Memory in Exact State Reconstruction of Preconditioned Conjugate Gradient

HPC systems are a critical resource for scientific research and advanced...
