RegDem: Increasing GPU Performance via Shared Memory Register Spilling

07/05/2019
by Putt Sakdhnagool, et al.

GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means fewer threads can be resident at once, and therefore lower program performance. Our investigation found that registers are often the occupancy limiter. The de facto approach, based on the nvcc compiler, spills excess registers to off-chip memory, ignoring shared memory and leaving on-chip resources underutilized. To reduce register demand, this paper presents a binary translation technique, called RegDem, that spills excess registers to the underutilized shared memory by transforming the GPU assembly code (SASS). Most GPU programs do not fully use shared memory, which leaves room for RegDem to use it for register spilling. The higher occupancy achieved by RegDem outweighs the slightly higher cost of accessing shared memory instead of registers. The paper also presents a compile-time performance predictor that models instruction stalls to choose the best version from a set of program variants. Cumulatively, these techniques outperform the nvcc compiler by a geometric mean of 9%, with the highest observed improvement being 18%.
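The core arithmetic behind this argument is that registers cap how many warps fit on a streaming multiprocessor (SM), and that demoting some registers to unused shared memory can raise that cap until shared memory itself becomes the limiter. The Python sketch below illustrates this with a simplified occupancy model. The per-SM limits in `SMLimits`, the helpers `occupancy`, `demote` and `best_demotion`, and the occupancy-only selection rule are hypothetical assumptions for illustration: they ignore allocation granularity and per-block shared memory caps, and they are not RegDem's SASS transformation or its stall-based performance predictor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Assumed per-SM resource limits (illustrative; real values vary by GPU)."""
    registers: int = 65536          # 32-bit registers per SM
    shared_mem_bytes: int = 98304   # shared memory available per SM (96 KB)
    max_warps: int = 64             # maximum resident warps per SM
    max_blocks: int = 32            # maximum resident thread blocks per SM
    warp_size: int = 32


def occupancy(threads_per_block, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Return (occupancy, limiting resource) under a simplified model.

    Ignores register and shared-memory allocation granularity, so real
    occupancy can be lower than this estimate.
    """
    warps_per_block = -(-threads_per_block // sm.warp_size)  # ceiling division
    regs_per_block = regs_per_thread * warps_per_block * sm.warp_size
    limits = {
        "warps": sm.max_warps // warps_per_block,
        "blocks": sm.max_blocks,
        "registers": sm.registers // regs_per_block,
    }
    if smem_per_block > 0:
        limits["shared memory"] = sm.shared_mem_bytes // smem_per_block
    limiter = min(limits, key=limits.get)  # resource allowing the fewest blocks
    return limits[limiter] * warps_per_block / sm.max_warps, limiter


def demote(threads_per_block, regs_per_thread, smem_per_block, k):
    """Hypothetical: move k 32-bit registers per thread into shared memory."""
    extra_smem = k * 4 * threads_per_block  # 4 bytes per demoted register per thread
    return regs_per_thread - k, smem_per_block + extra_smem


def best_demotion(threads_per_block, regs_per_thread, smem_per_block, sm=SMLimits()):
    """Smallest k that maximizes estimated occupancy (occupancy-only heuristic)."""
    best_k = 0
    best_occ = occupancy(threads_per_block, regs_per_thread, smem_per_block, sm)[0]
    for k in range(1, regs_per_thread):
        regs, smem = demote(threads_per_block, regs_per_thread, smem_per_block, k)
        occ = occupancy(threads_per_block, regs, smem, sm)[0]
        if occ > best_occ:  # strict: keep the smallest k for a given occupancy
            best_k, best_occ = k, occ
    return best_k, best_occ


if __name__ == "__main__":
    print(occupancy(256, 64, 0))      # (0.5, 'registers')
    print(best_demotion(256, 64, 0))  # (13, 0.625)
```

In this model, a 256-thread block using 64 registers per thread is register-limited at 50% occupancy; demoting 13 registers per thread (13,312 bytes of shared memory per block) admits a fifth block and 62.5% occupancy, while demoting many more makes shared memory the limiter instead. The sketch counts only occupancy and ignores the extra latency of shared memory accesses, which the paper weighs against the occupancy gain.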
