Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs

09/14/2021
by   Joseph Zuckerman, et al.

One of the most critical aspects of integrating loosely-coupled accelerators in heterogeneous SoC architectures is orchestrating their interactions with the memory hierarchy, especially in terms of navigating the various cache-coherence options: from accelerators accessing off-chip memory directly, bypassing the cache hierarchy, to accelerators having their own private cache. By running real-size applications on FPGA-based prototypes of many-accelerator multi-core SoCs, we show that the best cache-coherence mode for a given accelerator varies at runtime, depending on the accelerator's characteristics, the workload size, and the overall SoC status. Cohmeleon applies reinforcement learning to select the best coherence mode for each accelerator dynamically at runtime, as opposed to statically at design time. It makes these selections adaptively, by continuously observing the system and measuring its performance. Cohmeleon is accelerator-agnostic and architecture-independent, and it requires minimal hardware support. It is also transparent to application programmers and has negligible software overhead. FPGA-based experiments show that our runtime approach offers, on average, a 38% speedup with a 66% reduction of off-chip memory accesses compared to state-of-the-art design-time approaches. Moreover, it can match runtime solutions that are manually tuned for the target architecture.
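The runtime selection loop described above can be illustrated with a minimal sketch. It assumes a simple tabular, epsilon-greedy learner; the abstract does not specify the learning algorithm, the state encoding, or the reward, so the class `CoherenceSelector`, the mode labels in `MODES`, the state tuple, and all parameter values are hypothetical and are not taken from Cohmeleon itself.

```python
import random
from collections import defaultdict

# Hypothetical mode labels. The abstract only names the two extremes
# (direct off-chip access and a private accelerator cache).
MODES = ["non_coherent_dma", "llc_coherent_dma", "coherent_dma", "private_cache"]


class CoherenceSelector:
    """Tabular epsilon-greedy selector (an assumed stand-in, not the paper's method)."""

    def __init__(self, modes=MODES, epsilon=0.1, alpha=0.25, seed=0):
        self.modes = list(modes)
        self.epsilon = epsilon  # probability of trying a random mode
        self.alpha = alpha      # learning rate for the value estimate
        self.rng = random.Random(seed)
        # One value estimate per (state, mode) pair, initialised to zero.
        self.q = defaultdict(lambda: {m: 0.0 for m in self.modes})

    def select(self, state):
        """Pick a coherence mode for the observed state."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.modes)  # explore
        values = self.q[state]
        return max(self.modes, key=lambda m: values[m])  # exploit

    def update(self, state, mode, reward):
        """Move the estimate for (state, mode) toward the measured reward."""
        old = self.q[state][mode]
        self.q[state][mode] = old + self.alpha * (reward - old)


# Usage: observe the system, choose a mode, run, then feed back a measurement.
selector = CoherenceSelector()
state = ("fft", "large_workload", "busy_soc")  # hypothetical state encoding
mode = selector.select(state)
selector.update(state, mode, reward=1.0)  # e.g. a normalised speedup
```

In this sketch the state would stand for the accelerator's characteristics, the workload size, and the SoC status, and the reward for a measured performance figure; how these are actually encoded is left open by the abstract.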

