Exploring Memory Persistency Models for GPUs

04/24/2019
by Zhen Lin, et al.

Given its high integration density, high speed, byte addressability, and low standby power, non-volatile or persistent memory is expected to supplement or replace DRAM as main memory. Through persistency programming models (which define the durability ordering of stores) and durable transaction constructs, the programmer can provide recoverable data structures (RDS) that allow programs to recover to a consistent state after a failure. While persistency models have been well studied for CPUs, they have been neglected for graphics processing units (GPUs). Considering the importance of GPUs as a dominant accelerator for high-performance computing, we investigate persistency models for GPUs. GPU applications differ substantially from CPU applications, so in this paper we adapt, re-architect, and optimize CPU persistency models for GPUs. We design a pragma-based compiler scheme to express persistency models for GPUs. We identify that the thread hierarchy in GPUs offers intuitive scopes to form epochs and durable transactions. We find that undo logging incurs significant performance overhead, and we propose using idempotency analysis to reduce both the logging frequency and the size of the logs. Through both real-system and simulation evaluations, we show that our proposed architecture support incurs low overhead.
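The abstract does not detail how idempotency analysis shrinks the undo log. As a rough illustration of the general idea only, under our own simplifying assumptions (persistent memory modeled as a Python dict, one sequential region instead of GPU threads, deterministic re-execution after a failure), the sketch below logs a store only when it overwrites a location the region has already read; every other partial store is simply redone when the region re-executes. `IdempotentRegion`, `region`, and `Crash` are hypothetical names, and this is not the authors' GPU design.

```python
from typing import Dict, List, Optional, Set, Tuple


class Crash(Exception):
    """Simulated failure partway through a region."""


class IdempotentRegion:
    """Toy durable region over a dict that stands in for persistent memory.

    Hypothetical model: a store is undo-logged only if it is the first write to
    an address the region has already read (a write-after-read that would clobber
    one of the region's inputs and break idempotency). Other stores go unlogged.
    """

    def __init__(self, pm: Dict[int, int]) -> None:
        self.pm = pm
        self.exposed_reads: Set[int] = set()  # addresses read before the region wrote them
        self.written: Set[int] = set()
        self.undo_log: List[Tuple[int, int]] = []

    def load(self, addr: int) -> int:
        if addr not in self.written:
            self.exposed_reads.add(addr)
        return self.pm[addr]

    def store(self, addr: int, value: int) -> None:
        if addr in self.exposed_reads and addr not in self.written:
            self.undo_log.append((addr, self.pm[addr]))  # save the input value once
        self.written.add(addr)
        self.pm[addr] = value

    def reset(self) -> None:
        # Clear per-region bookkeeping (e.g. after the region commits or is recovered).
        self.exposed_reads.clear()
        self.written.clear()
        self.undo_log.clear()

    def recover(self) -> None:
        # Restore clobbered inputs; unlogged partial stores are redone on re-execution.
        for addr, old in reversed(self.undo_log):
            self.pm[addr] = old
        self.reset()


def region(r: IdempotentRegion, crash_before_store: Optional[int] = None) -> None:
    """Hypothetical region: pm[1] = 2*pm[0]; pm[2] += 1; pm[3] = pm[0] + old pm[2]."""
    count = 0

    def st(addr: int, value: int) -> None:
        nonlocal count
        if count == crash_before_store:
            raise Crash()
        r.store(addr, value)
        count += 1

    x = r.load(0)
    st(1, 2 * x)  # pm[1] is never read in the region: not logged
    c = r.load(2)
    st(2, c + 1)  # pm[2] was read first: logged
    st(3, x + c)  # pm[3] is never read in the region: not logged


# Reference result from a failure-free run.
clean = {0: 10, 1: 0, 2: 5, 3: 0}
region(IdempotentRegion(clean))

# Fail before the third store, recover, then re-execute the whole region.
pm = {0: 10, 1: 0, 2: 5, 3: 0}
r = IdempotentRegion(pm)
try:
    region(r, crash_before_store=2)
except Crash:
    pass
log_entries = len(r.undo_log)  # 1 entry for 2 completed stores
r.recover()
region(r)
assert pm == clean  # {0: 10, 1: 20, 2: 6, 3: 15}
```

In this toy run only one of the region's three stores needs a log entry. Skipping the `recover()` call before re-executing would leave `pm[2]` and `pm[3]` at 7 and 16 instead of 6 and 15, which is why the write-after-read store must be logged.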
