
Overcoming Limitations of GPGPU-Computing in Scientific Applications

by   Connor Kenyon, et al.

The performance of discrete general-purpose graphics processing units (GPGPUs) has been improving at a rapid pace. The PCIe interconnect that carries data between system host memory and the GPU has not improved as quickly, leaving a performance gap: the GPU sits idle while waiting for PCIe data transfers. In this article, we explore two alternatives to the limited PCIe bandwidth: the NVIDIA NVLink interconnect, and zero-copy algorithms for shared-memory Heterogeneous System Architecture (HSA) devices. The OpenCL SHOC benchmark suite is used to measure the performance of each device on various scientific application kernels.
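To see why the interconnect matters, a back-of-envelope comparison of ideal transfer times is useful. The sketch below contrasts nominal peak bandwidths for PCIe 3.0 x16 and an aggregate of four NVLink 1.0 links; these figures are common published peak values, not measurements from the article, and real sustained bandwidth is lower.

```python
# Ideal host-to-device transfer time at a given link bandwidth.
# Bandwidth figures below are nominal peak values (assumptions,
# not benchmark results from the article).

PCIE3_X16_GBPS = 16.0  # ~16 GB/s theoretical peak, one direction
NVLINK1_GBPS = 80.0    # ~80 GB/s aggregate for four NVLink 1.0 links

def transfer_ms(buffer_bytes: int, bandwidth_gbps: float) -> float:
    """Ideal transfer time in milliseconds for a buffer of the given size."""
    return buffer_bytes / (bandwidth_gbps * 1e9) * 1e3

buf = 1 << 30  # 1 GiB working set
pcie = transfer_ms(buf, PCIE3_X16_GBPS)
nvlink = transfer_ms(buf, NVLINK1_GBPS)
print(f"PCIe 3.0 x16: {pcie:.1f} ms  NVLink: {nvlink:.1f} ms  "
      f"ratio: {pcie / nvlink:.1f}x")
```

For a 1 GiB buffer this gives roughly 67 ms over PCIe 3.0 x16 versus about 13 ms over NVLink, a 5x gap before any kernel work begins; a zero-copy path on a shared-memory HSA device avoids the bulk transfer entirely.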
