A Case for Fine-grain Coherence Specialization in Heterogeneous Systems

04/23/2021
by Johnathan Alsop, et al.

Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands. Traditionally, communication between accelerators has been inefficient, typically orchestrated through explicit DMA transfers between different address spaces. More recently, industry has proposed unified coherent memory, which enables implicit data movement and more data reuse, but these interfaces often limit the coherence flexibility available to heterogeneous systems. This paper demonstrates the benefits of fine-grained coherence specialization for heterogeneous systems. We propose an architecture that enables low-complexity, independent specialization of each individual coherence request in heterogeneous workloads by building upon a simple and flexible baseline coherence interface, Spandex. We then describe how to optimize individual memory requests to improve cache reuse and performance-critical memory latency in emerging heterogeneous workloads. Collectively, our techniques enable significant gains, reducing execution time by up to 61% or network traffic by up to 99% while adding minimal complexity to the Spandex protocol.

research
07/08/2020

HALCONE : A Hardware-Level Timestamp-based Cache Coherence Scheme for Multi-GPU systems

While multi-GPU (MGPU) systems are extremely popular for compute-intensi...
research
08/04/2019

Analysis and Optimization of I/O Cache Coherency Strategies for SoC-FPGA Device

Unlike traditional PCIe-based FPGA accelerators, heterogeneous SoC-FPGA ...
research
02/19/2020

Specializing Coherence, Consistency, and Push/Pull for GPU Graph Analytics

This work provides the first study to explore the interaction of update ...
research
01/25/2017

Hardware Translation Coherence for Virtualized Systems

To improve system performance, modern operating systems (OSes) often und...
research
09/14/2021

Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs

One of the most critical aspects of integrating loosely-coupled accelera...
research
07/01/2021

MIND: In-Network Memory Management for Disaggregated Data Centers

Memory-compute disaggregation promises transparent elasticity, high util...
research
12/20/2021

Dijkstra-Through-Time: Ahead of time hardware scheduling method for deterministic workloads

Most of the previous works on data flow optimizations for Machine Learni...
