Virtual-Link: A Scalable Multi-Producer, Multi-Consumer Message Queue Architecture for Cross-Core Communication

12/09/2020
by Qinzhe Wu, et al.

Cross-core communication is increasingly a bottleneck as the number of processing elements (PEs) per system-on-chip increases. Typical hardware solutions to cross-core communication are often inflexible; software solutions are flexible but have performance scaling limitations. A key problem, as we will show, is that of shared state in software-based message queue mechanisms. This paper proposes Virtual-Link (VL), a novel light-weight communication mechanism with hardware support to facilitate M:N lock-free data movement. VL reduces the amount of coherent shared state, which is a bottleneck for many approaches, to zero. VL provides a further latency benefit by keeping data on the fast path (i.e., within the on-chip interconnect). VL enables directed cache-injection (stashing) between PEs on the coherence bus, reducing the latency of core-to-core communication. VL is particularly effective for fine-grain tasks on streaming data. Evaluation on a full system simulator with 7 benchmarks shows that VL achieves a 2.09x speedup over state-of-the-art software-based communication mechanisms, while reducing memory traffic by 61
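
To make the shared-state problem concrete, the following is a minimal sketch of a conventional software M:N queue, not of VL itself. It assumes a lock-protected ring buffer (the abstract's software baselines may well be lock-free; a lock is used here only to keep the sketch short and correct), and the names `SharedStateQueue`, `run`, and the `shared_updates` counter are hypothetical, introduced for illustration. The point it shows is that every producer and every consumer writes the same queue metadata (`head`, `tail`, `count`), which on real hardware means those cache lines move between cores on each operation.

```python
import threading


class SharedStateQueue:
    """Bounded M:N ring buffer whose metadata is shared by all threads.

    Illustrative only: a lock-based software queue, not the VL mechanism.
    """

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.capacity = capacity
        self.head = 0            # next slot to dequeue (written by every consumer)
        self.tail = 0            # next slot to enqueue (written by every producer)
        self.count = 0           # occupancy (written by everyone)
        self.shared_updates = 0  # instrumentation: writes to shared metadata
        self.lock = threading.Lock()
        self.not_full = threading.Condition(self.lock)
        self.not_empty = threading.Condition(self.lock)

    def push(self, item):
        with self.not_full:
            while self.count == self.capacity:
                self.not_full.wait()
            self.buf[self.tail] = item
            self.tail = (self.tail + 1) % self.capacity
            self.count += 1
            self.shared_updates += 2  # tail and count were modified
            self.not_empty.notify()

    def pop(self):
        with self.not_empty:
            while self.count == 0:
                self.not_empty.wait()
            item = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            self.count -= 1
            self.shared_updates += 2  # head and count were modified
            self.not_full.notify()
            return item


def run(num_producers=3, num_consumers=2, items_per_producer=100, capacity=8):
    """Move every item from M producers to N consumers through one queue."""
    q = SharedStateQueue(capacity)
    total = num_producers * items_per_producer
    received = []
    received_lock = threading.Lock()

    def producer(pid):
        for i in range(items_per_producer):
            q.push((pid, i))

    def consumer(quota):
        for _ in range(quota):
            item = q.pop()
            with received_lock:
                received.append(item)

    # Split the total item count across consumers so every pop is matched by a push.
    base, extra = divmod(total, num_consumers)
    quotas = [base + (1 if c < extra else 0) for c in range(num_consumers)]

    threads = [threading.Thread(target=producer, args=(p,)) for p in range(num_producers)]
    threads += [threading.Thread(target=consumer, args=(n,)) for n in quotas]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, q.shared_updates
```

In this sketch each message costs four writes to metadata that all threads contend on (two on `push`, two on `pop`), regardless of how many producers and consumers there are. That per-message shared-state traffic is the kind of overhead the abstract says VL reduces to zero.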

09/01/2016

On-Chip Mechanisms to Reduce Effective Memory Access Latency

This dissertation develops hardware that automatically reduces the effec...
05/14/2013

Phase-Priority based Directory Coherence for Multicore Processor

As the number of cores in a single chip increases, a typical implementat...
06/07/2020

Stochastic Automata Network for Performance Evaluation of Heterogeneous SoC Communication

To meet ever increasing demand for performance of emerging System-on-Chi...
10/27/2020

Jiffy: A Fast, Memory Efficient, Wait-Free Multi-Producers Single-Consumer Queue

In applications such as sharded data processing systems, sharded in-memo...
09/11/2020

An Open-Source Platform for High-Performance Non-Coherent On-Chip Communication

On-chip communication infrastructure is a central component of modern sy...
03/16/2022

ORCA: A Network and Architecture Co-design for Offloading us-scale Datacenter Applications

Responding to the "datacenter tax" and "killer microseconds" problems fo...
12/31/2020

Data Criticality in Multi-Threaded Applications: An Insight for Many-Core Systems

Multi-threaded applications are capable of exploiting the full potential...