SMART: A Heterogeneous Scratchpad Memory Architecture for Superconductor SFQ-based Systolic CNN Accelerators

09/03/2021
by   Farzaneh Zokaee, et al.
0

Ultra-fast & low-power superconductor single-flux-quantum (SFQ)-based CNN systolic accelerators are built to enhance the CNN inference throughput. However, shift-register (SHIFT)-based scratchpad memory (SPM) arrays prevent a SFQ CNN accelerator from exceeding 40% of its peak throughput, due to the lack of random access capability. This paper first documents our study of a variety of cryogenic memory technologies, including Vortex Transition Memory (VTM), Josephson-CMOS SRAM, MRAM, and Superconducting Nanowire Memory, during which we found that none of the aforementioned technologies made a SFQ CNN accelerator achieve high throughput, small area, and low power simultaneously. Second, we present a heterogeneous SPM architecture, SMART, composed of SHIFT arrays and a random access array to improve the inference throughput of a SFQ CNN systolic accelerator. Third, we propose a fast, low-power and dense pipelined random access CMOS-SFQ array by building SFQ passive-transmission-line-based H-Trees that connect CMOS sub-banks. Finally, we create an ILP-based compiler to deploy CNN models on SMART. Experimental results show that, with the same chip area overhead, compared to the latest SHIFT-based SFQ CNN accelerator, SMART improves the inference throughput by 3.9× (2.2×), and reduces the inference energy by 86% (71%) when inferring a single image (a batch of images).

READ FULL TEXT

page 1

page 2

page 3

page 4

research
11/14/2020

Memory-Efficient Dataflow Inference for Deep CNNs on FPGA

Custom dataflow Convolutional Neural Network (CNN) inference accelerator...
research
01/18/2018

ECA: Energy-Efficient FPGA-based Convolutional Neural Networks Architecture for Single Image Super-Resolution

Convolutional neural networks (CNN) show the excellent performance compa...
research
01/18/2018

An Energy-Efficient FPGA-based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution

Convolutional neural networks (CNNs) demonstrate excellent performance a...
research
08/24/2020

Tearing Down the Memory Wall

We present a vision for the Erudite architecture that redefines the comp...
research
03/22/2022

Scale-out Systolic Arrays

Multi-pod systolic arrays are emerging as the architecture of choice in ...
research
12/04/2021

IMCRYPTO: An In-Memory Computing Fabric for AES Encryption and Decryption

This paper proposes IMCRYPTO, an in-memory computing (IMC) fabric for ac...
research
04/10/2020

SMART Paths for Latency Reduction in ReRAM Processing-In-Memory Architecture for CNN Inference

This research work proposes a design of an analog ReRAM-based PIM (proce...

Please sign up or login with your details

Forgot password? Click here to reset