Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

12/09/2020
by   Qinzhe Wu, et al.
0

Cross-core communication is increasingly a bottleneck as the number of processing elements increase per system-on-chip. Typical hardware solutions to cross-core communication are often inflexible; while software solutions are flexible, they have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel light-weight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache-injection (stashing) between PEs on the coherence bus, reducing the latency for core-to-core communication. VL is particularly effective for fine-grain tasks on streaming data. Evaluation on a full system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61

READ FULL TEXT
research
09/01/2016

On-Chip Mechanisms to Reduce Effective Memory Access Latency

This dissertation develops hardware that automatically reduces the effec...
research
05/14/2013

Phase-Priority based Directory Coherence for Multicore Processor

As the number of cores in a single chip increases, a typical implementat...
research
10/27/2020

Jiffy: A Fast, Memory Efficient, Wait-Free Multi-Producers Single-Consumer Queue

In applications such as sharded data processing systems, sharded in-memo...
research
06/07/2020

Stochastic Automata Network for Performance Evaluation of Heterogeneous SoC Communication

To meet ever increasing demand for performance of emerging System-on-Chi...
research
10/16/2022

RevaMp3D: Architecting the Processor Core and Cache Hierarchy for Systems with Monolithically-Integrated Logic and Memory

Recent nano-technological advances enable the Monolithic 3D (M3D) integr...
research
12/31/2020

Data Criticality in Multi-Threaded Applications: An Insight for Many-Core Systems

Multi-threaded applications are capable of exploiting the full potential...
research
03/16/2022

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications

Responding to the "datacenter tax" and "killer microseconds" problems fo...

Please sign up or login with your details

Forgot password? Click here to reset