Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

by Qinzhe Wu, et al.

Cross-core communication is increasingly a bottleneck as the number of processing elements (PEs) per system-on-chip increases. Typical hardware solutions to cross-core communication are often inflexible, while software solutions are flexible but have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel lightweight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache-injection (stashing) between PEs on the coherence bus, reducing the latency for core-to-core communication. VL is particularly effective for fine-grain tasks on streaming data. Evaluation on a full system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61
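To make the shared-state problem concrete, the following is a minimal sketch of a conventional software M:N queue. It is not VL and not code from the paper. The names SharedStateQueue, shared_updates, and run are hypothetical, and a lock stands in for the atomic compare-and-swap that a lock-free queue would apply to its head and tail indices. The sketch only counts how often state shared by all endpoints is modified; on real hardware, each such update would typically move a cache line between cores.

```python
import threading
import time


class SharedStateQueue:
    """Bounded M:N FIFO whose head and tail are shared by every endpoint.

    Illustrative only: a lock stands in for atomic CAS on head/tail.
    """

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._capacity = capacity
        self._head = 0  # next slot to read, shared by all consumers
        self._tail = 0  # next slot to write, shared by all producers
        self._lock = threading.Lock()
        self.shared_updates = 0  # number of writes to the shared indices

    def try_push(self, item):
        with self._lock:
            if self._tail - self._head == self._capacity:
                return False  # queue is full
            self._slots[self._tail % self._capacity] = item
            self._tail += 1
            self.shared_updates += 1
            return True

    def try_pop(self):
        with self._lock:
            if self._head == self._tail:
                return False, None  # queue is empty
            item = self._slots[self._head % self._capacity]
            self._head += 1
            self.shared_updates += 1
            return True, item


def run(num_producers=3, num_consumers=2, items_per_producer=100, capacity=8):
    """Move items from M producers to N consumers; return (updates, items)."""
    queue = SharedStateQueue(capacity)
    producers_done = threading.Event()
    results = [[] for _ in range(num_consumers)]  # one private list each

    def producer(pid):
        for i in range(items_per_producer):
            while not queue.try_push((pid, i)):
                time.sleep(0)  # yield while the queue is full

    def consumer(cid):
        while True:
            # Sample the flag before popping, so "done and empty" is final.
            done = producers_done.is_set()
            ok, item = queue.try_pop()
            if ok:
                results[cid].append(item)
            elif done:
                return
            else:
                time.sleep(0)  # yield while the queue is empty

    p_threads = [threading.Thread(target=producer, args=(p,))
                 for p in range(num_producers)]
    c_threads = [threading.Thread(target=consumer, args=(c,))
                 for c in range(num_consumers)]
    for t in p_threads + c_threads:
        t.start()
    for t in p_threads:
        t.join()
    producers_done.set()
    for t in c_threads:
        t.join()

    received = [item for per_consumer in results for item in per_consumer]
    return queue.shared_updates, received


if __name__ == "__main__":
    updates, received = run()
    print(len(received), "items moved with", updates, "shared-index updates")
```

In this sketch every successful enqueue or dequeue writes an index that all producers or all consumers contend on, so shared-state updates grow linearly with message count regardless of how many endpoints participate. This is the kind of coherent shared state the abstract says VL reduces to zero.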






