HALCONE: A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU Systems

07/08/2020
by   Saiful A. Mojumder, et al.

Multi-GPU (MGPU) systems are widely used for compute-intensive workloads, but inefficiencies in the memory hierarchy and in data movement waste GPU resources and make MGPU systems difficult to program. First, because hardware-level coherence is lacking, the MGPU programming model requires the programmer to replicate data and repeatedly transfer it between the GPUs' memories, which uses precious GPU memory inefficiently. Second, maintaining coherence across an MGPU system requires transferring data over low-bandwidth, high-latency off-chip links, which degrades system performance. Third, because the programmer must maintain data coherence manually, programming an MGPU system to maximize its throughput is extremely challenging. To address these issues, we propose HALCONE, a novel lightweight timestamp-based coherence protocol for MGPU systems, and modify the memory hierarchy of the GPUs to support physically shared memory. HALCONE replaces the Compute Unit (CU) level logical time counters with cache-level logical time counters to reduce coherence traffic. Furthermore, HALCONE introduces a novel timestamp storage unit (TSU) in the main memory that performs coherence actions with no additional performance overhead. The proposed HALCONE protocol maintains data coherence in the MGPU memory hierarchy with minimal performance overhead (less than 1%). Using a set of standard MGPU benchmarks, we observe that a 4-GPU system with shared memory and HALCONE performs, on average, 4.6× better than a 4-GPU system with existing RDMA and 3× better than one with the recently proposed HMG coherence protocol. We demonstrate the scalability of HALCONE using different GPU counts (2, 4, 8, and 16) and different CU counts (32, 48, and 64 CUs per GPU) for 11 standard benchmarks.
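The abstract does not spell out the protocol's rules, so the following is only a minimal sketch of the general idea behind timestamp (lease) based coherence with one logical counter per cache. It is not HALCONE's actual protocol. The names `TimestampStore` (a stand-in for the role a memory-side timestamp table might play), `TimestampedCache`, `Entry`, `LEASE`, and the specific update rules are all assumptions chosen for illustration, following the generic pattern in which a write is ordered after every outstanding read lease.

```python
from dataclasses import dataclass

LEASE = 10  # hypothetical lease length, in logical time units


@dataclass
class Entry:
    value: int = 0
    wts: int = 0  # logical time of the last write to this block
    rts: int = 0  # logical time until which a cached copy may be read


class TimestampStore:
    """Hypothetical memory-side table holding data and per-block timestamps."""

    def __init__(self):
        self.blocks = {}

    def _entry(self, addr):
        return self.blocks.setdefault(addr, Entry())

    def read(self, addr, cts):
        e = self._entry(addr)
        # Extend the read lease so the requester can keep the copy for a while.
        e.rts = max(e.rts, max(cts, e.wts) + LEASE)
        return Entry(e.value, e.wts, e.rts)

    def write(self, addr, value, cts):
        e = self._entry(addr)
        # Order the write after every lease already handed out for this block.
        e.wts = max(cts, e.rts + 1)
        e.rts = e.wts + LEASE
        e.value = value
        return Entry(e.value, e.wts, e.rts)


class TimestampedCache:
    """Write-through cache with a single logical counter (cts) per cache."""

    def __init__(self, memory):
        self.memory = memory
        self.cts = 0
        self.lines = {}
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        line = self.lines.get(addr)
        if line is not None and self.cts <= line.rts:
            self.hits += 1  # lease still valid at this cache's logical time
            return line.value
        self.misses += 1  # absent or lease expired: refetch, no invalidation message
        line = self.memory.read(addr, self.cts)
        self.cts = max(self.cts, line.wts)
        self.lines[addr] = line
        return line.value

    def write(self, addr, value):
        line = self.memory.write(addr, value, self.cts)
        self.cts = line.wts  # the writer's logical time moves to the write's time
        self.lines[addr] = line


def demo():
    memory = TimestampStore()
    a, b = TimestampedCache(memory), TimestampedCache(memory)
    seen = [a.read("x")]         # miss: fetches 0 with a lease
    b.write("x", 1)              # ordered after a's lease
    seen.append(a.read("x"))     # hit: old value is still valid in a's logical time
    b.write("flag", 1)
    seen.append(a.read("flag"))  # miss: a's counter advances past its lease on x
    seen.append(a.read("x"))     # lease expired: refetches the new value
    return seen, a.hits, a.misses
```

In this sketch, calling `demo()` returns `([0, 0, 1, 1], 1, 3)`: cache `a` keeps using its leased copy of `x` until observing `flag` moves its logical counter forward, at which point the stale copy expires on its own. No invalidation messages are exchanged between the caches, which is the property that makes timestamp schemes attractive when inter-GPU links are slow.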

research
08/05/2020

MGPU-TSM: A Multi-GPU System with Truly Shared Memory

The sizes of GPU applications are rapidly growing. They are exhausting t...
research
05/19/2017

GPU System Calls

GPUs are becoming first-class compute citizens and are being tasked to p...
research
04/23/2021

A Case for Fine-grain Coherence Specialization in Heterogeneous Systems

Hardware specialization is becoming a key enabler of energy-efficient per...
research
02/10/2020

Rainbow: A Composable Coherence Protocol for Multi-Chip Servers

The use of multi-chip modules (MCM) and/or multi-socket boards is the mo...
research
02/19/2020

Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics

This work provides the first study to explore the interaction of update ...
research
09/30/2022

Hardware Trojan Threats to Cache Coherence in Modern 2.5D Chiplet Systems

As industry moves toward chiplet-based designs, the insertion of hardwar...
research
07/01/2021

MIND: In-Network Memory Management for Disaggregated Data Centers

Memory-compute disaggregation promises transparent elasticity, high util...