Scale-out Systolic Arrays

03/22/2022
by Ahmet Caner Yüzügüler, et al.

Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systolic arrays to maximize effective throughput/Watt (i.e., throughput/Watt adjusted for array utilization) poses a unique set of challenges. In this work, we study three key pillars of multi-pod systolic array design: array granularity, interconnect, and tiling. We identify the optimal array granularity across workloads and show that state-of-the-art commercial accelerators use suboptimal array sizes for single-tenancy workloads. We then evaluate the bandwidth/latency trade-offs of interconnects and show that Butterfly networks offer a scalable topology for accelerators with a large number of pods. Finally, we introduce a novel data tiling scheme with custom partition sizes to maximize utilization in optimally sized pods. Based on these three pillars, we propose Scale-out Systolic Arrays (SOSA), a multi-pod inference accelerator for both single- and multi-tenancy workloads. We show that SOSA scales to up to 600 TeraOps/s in effective throughput on state-of-the-art DNN inference workloads and outperforms state-of-the-art multi-pod accelerators by a factor of 1.5.
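To make the notion of effective throughput/Watt and the granularity trade-off concrete, here is a minimal sketch under a deliberately simplified model. It is not the authors' methodology: it assumes a weight-stationary array where a k x n weight matrix is cut into rows x cols tiles, counts PEs left idle by partially filled edge tiles, and ignores pipeline fill/drain and memory stalls. The function names (`pod_utilization`, `effective_tops_per_watt`, `butterfly_cost`) and all numbers are hypothetical. The Butterfly helper uses the standard property that a butterfly network of 2x2 switches over N = 2^s endpoints has log2(N) stages of N/2 switches.

```python
import math


def pod_utilization(k: int, n: int, rows: int, cols: int) -> float:
    """Fraction of processing elements (PEs) holding useful weights when a
    k x n weight matrix is tiled onto a rows x cols weight-stationary array.

    Simplified model (an assumption, not SOSA's method): partially filled edge
    tiles leave PEs idle; pipeline fill/drain and memory stalls are ignored.
    """
    k_tiles = math.ceil(k / rows)
    n_tiles = math.ceil(n / cols)
    allocated_pes = (k_tiles * rows) * (n_tiles * cols)
    return (k * n) / allocated_pes


def effective_tops_per_watt(peak_tops: float, watts: float, utilization: float) -> float:
    """Throughput/Watt adjusted for array utilization."""
    return peak_tops * utilization / watts


def butterfly_cost(num_pods: int) -> tuple[int, int]:
    """Return (stages, 2x2 switches) of a butterfly network over num_pods pods.

    Assumes num_pods is a power of two: log2(N) stages of N/2 switches each.
    """
    if num_pods < 2 or num_pods & (num_pods - 1):
        raise ValueError("num_pods must be a power of two >= 2")
    stages = num_pods.bit_length() - 1
    return stages, stages * (num_pods // 2)


if __name__ == "__main__":
    # Hypothetical 300 x 300 weight matrix mapped onto a large vs. a small array.
    for size in (128, 32):
        u = pod_utilization(300, 300, size, size)
        print(f"{size}x{size} array: utilization = {u:.3f}")
    # Illustrative figures only: 100 peak TOPS at 50 W on the 128x128 array.
    print(effective_tops_per_watt(100.0, 50.0, pod_utilization(300, 300, 128, 128)))
    print(butterfly_cost(32))
```

Under this toy model, the 300 x 300 matrix fills only about 61% of a 128x128 array but about 88% of a 32x32 array, which is the intuition behind favoring smaller pods and scaling out across many of them, at the price of a larger interconnect (here, 5 stages and 80 switches for 32 pods).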
