MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing

03/11/2021
by Xinfeng Xie, et al.

With the growing number of data-intensive workloads, the GPU, the state-of-the-art single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth wall. To alleviate this bottleneck, previously proposed 3D-stacking near-bank computing accelerators exploit the abundant bank-internal bandwidth by bringing computation closer to the DRAM banks. However, these accelerators are specialized for certain application domains, with simple architectural data paths and customized software mapping schemes. For general-purpose scenarios, lightweight hardware designs for diverse data paths, architectural support for the SIMT programming model, and end-to-end software optimizations remain challenging. To address these issues, we propose MPU (Memory-centric Processing Unit), the first SIMT processor based on a 3D-stacking near-bank computing architecture. First, to realize diverse data paths with small overheads while leveraging bank-level bandwidth, MPU adopts a hybrid pipeline capable of offloading instructions to near-bank compute logic. Second, we explore two architectural supports for the SIMT programming model: a near-bank shared-memory design and a multiple-activated-row-buffers enhancement. Third, we present an end-to-end compilation flow for MPU that supports CUDA programs. To fully utilize MPU's hybrid pipeline, we develop a backend optimization for the instruction offloading decision. Evaluation results show that MPU achieves a 3.46x speedup and a 2.57x energy reduction compared with an NVIDIA Tesla V100 GPU on a set of representative data-intensive workloads.
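Because MPU's compilation flow consumes ordinary CUDA programs, the workloads it targets look like standard bandwidth-bound SIMT kernels. As an illustration only (this kernel and its launch parameters are assumptions, not taken from the paper's benchmark suite), the sketch below shows a streaming SAXPY kernel: one fused multiply-add per 12 bytes of DRAM traffic keeps it bound by memory bandwidth, which is exactly the kind of instruction stream a near-bank offloading backend would consider moving next to the banks.

// Streaming SAXPY: 8 bytes loaded and 4 bytes stored per fused multiply-add,
// so runtime is dominated by DRAM bandwidth rather than arithmetic throughput.
// Illustrative sketch only; not from the MPU paper.
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 24;                      // 16M elements (~64 MB per array)
    float *x = nullptr, *y = nullptr;
    cudaMalloc((void **)&x, n * sizeof(float));
    cudaMalloc((void **)&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));        // placeholder initialization
    cudaMemset(y, 0, n * sizeof(float));
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    cudaFree(x);
    cudaFree(y);
    return 0;
}

On a conventional GPU, every element of x and y must cross the off-chip memory interface; a near-bank design like MPU can instead execute the multiply-add close to the banks holding the operands, which is the setting in which the reported bandwidth advantage applies.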
