General Purpose Graphics Processing Units (GPGPU) are used in most of th...
Iterative stencils are used widely across the spectrum of High Performan...
In this humorous and thought-provoking article, we discuss certain myths...
A considerable amount of research and engineering went into designing pr...
Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU...
Scientific communities are increasingly adopting machine learning and de...
The fastest supercomputer in 2020, Fugaku, has not only achieved digital...
Computed Tomography (CT) is a key 3D imaging technology that fundamental...
Matrix engines or units, in different forms and affinities, are becoming...
The dedicated memory of hardware accelerators can be insufficient to sto...
We present scalable hybrid-parallel algorithms for training large-scale ...
GPUs are playing an increasingly important role in general-purpose compu...
With the end of both Dennard scaling and Moore's law, computer users a...
In this paper we evaluate the performance of FPGAs for high-order stenci...
Stencil computation is one of the most widely-used compute patterns in h...
Supported by their high power efficiency and recent advancements in High...
Computed Tomography (CT) is a widely used technology that requires compu...
This paper proposes a versatile high-performance execution model, inspir...
Graph Convolutional Networks (GCNs) have recently been getting much attention ...
In this paper, we give a new linear time correctness condition for proof...
Graph pattern matching algorithms to handle million-scale dynamic graphs...
Large-scale distributed training of deep neural networks suffers from the...
Among the (uncontended) common wisdom in High-Performance Computing (HPC...
NVIDIA cuDNN is a low-level library that provides GPU kernels frequently...
Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...
Recent developments in High Level Synthesis tools have attracted softwar...