Inter-thread Communication in Multithreaded, Reconfigurable Coarse-grain Arrays

01/16/2018
by Dani Voitsechov, et al.

Traditional von Neumann GPGPUs only allow threads to communicate through memory on a group-to-group basis. In this model, a group of producer threads writes intermediate values to memory, and a group of consumer threads reads them after a barrier synchronization. To alleviate the memory bandwidth pressure imposed by this method of communication, GPGPUs provide a small scratchpad memory that keeps intermediate values from overloading DRAM bandwidth. In this paper we introduce direct inter-thread communication for massively multithreaded CGRAs, where intermediate values are communicated directly through the compute fabric on a point-to-point basis. This method avoids the need to write values to memory, eliminates the need for a dedicated scratchpad, and avoids workgroup-global barriers. The paper introduces extensions to the programming model (CUDA) and the execution model, as well as the hardware primitives that facilitate the communication. Our simulations of Rodinia benchmarks running on the new system show that direct inter-thread communication provides an average speedup of 4.5x (13.5x max) and reduces system power by an average of 7x (33x max) when compared to an equivalent Nvidia GPGPU.


