The Case for Asymmetric Systolic Array Floorplanning

09/06/2023
by C. Peltekis, et al.

The widespread proliferation of deep learning applications has triggered the need to accelerate them directly in hardware. General Matrix Multiplication (GEMM) kernels are elemental deep-learning constructs, and they map naturally onto Systolic Arrays (SAs). SAs are regular structures that are well suited to accelerating matrix multiplications. Typical SAs use a pipelined array of Processing Elements (PEs), which communicate through local connections and pre-orchestrated data movements. In this work, we show that the physical layout of SAs should be asymmetric to minimize wirelength and improve energy efficiency. An asymmetric floorplan better matches the different widths of the horizontal and vertical data buses and their different switching-activity profiles. It is demonstrated that such physically asymmetric SAs reduce interconnect power by 9.1% when executing Convolutional Neural Network (CNN) layers, as compared to SAs of the same size but with a square (i.e., symmetric) layout. The savings in interconnect power translate, in turn, to 2.1% overall power savings.
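
As a rough illustration of why a non-square PE shape can pay off, here is a minimal sketch of a hypothetical first-order model; it is not the authors' methodology. It assumes every PE has the same fixed area, that each horizontal link between neighbouring PEs is one PE-width long and each vertical link one PE-height long, and that interconnect cost is proportional to bits × switching activity × wire length. The function names, the bus widths (8-bit operands horizontally, 32-bit partial sums vertically) and the unit activities are illustrative assumptions only.

```python
import math


def _link_weights(rows, cols, bits_h, bits_v, act_h, act_v):
    """Return (a, b): switched-wire weights multiplying PE width and PE height."""
    if rows < 2 or cols < 2:
        raise ValueError("need at least a 2x2 array so both link types exist")
    # rows * (cols - 1) horizontal links, each one PE-width long
    a = rows * (cols - 1) * bits_h * act_h
    # (rows - 1) * cols vertical links, each one PE-height long
    b = (rows - 1) * cols * bits_v * act_v
    return a, b


def interconnect_cost(width, height, rows, cols, bits_h, bits_v, act_h=1.0, act_v=1.0):
    """Hypothetical cost: sum over links of bits * activity * wire length."""
    a, b = _link_weights(rows, cols, bits_h, bits_v, act_h, act_v)
    return a * width + b * height


def optimal_pe_shape(pe_area, rows, cols, bits_h, bits_v, act_h=1.0, act_v=1.0):
    """Minimize a*W + b*H subject to W*H = pe_area (optimum has a*W == b*H)."""
    a, b = _link_weights(rows, cols, bits_h, bits_v, act_h, act_v)
    width = math.sqrt(pe_area * b / a)
    return width, pe_area / width


if __name__ == "__main__":
    # Assumed example: 16x16 array, PE area 100 (arbitrary units),
    # 8-bit operands flowing horizontally, 32-bit partial sums vertically.
    side = math.sqrt(100.0)
    square = interconnect_cost(side, side, 16, 16, 8, 32)
    w, h = optimal_pe_shape(100.0, 16, 16, 8, 32)
    best = interconnect_cost(w, h, 16, 16, 8, 32)
    print(w, h)          # 20.0 5.0
    print(square, best)  # 96000.0 76800.0
```

In this toy setting the wider, equally active vertical bus pushes the optimum toward short, wide PEs (20 × 5 instead of 10 × 10) and lowers the modeled switched wirelength by 20%; with equal bus widths and activities the optimum falls back to a square PE. These numbers come only from the assumed model and should not be compared with the figures quoted in the abstract. The model also ignores whether the PE logic can actually be placed at such an aspect ratio.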

Related research

08/03/2020
High Throughput Matrix-Matrix Multiplication between Asymmetric Bit-Width Operands
Matrix multiplications between asymmetric bit-width operands, especially...

04/25/2023
Low-Power Data Streaming in Systolic Arrays with Bus-Invert Coding and Zero-Value Clock Gating
Systolic Array (SA) architectures are well suited for accelerating matri...

06/19/2023
From array algebra to energy efficiency on GPUs: Data and hardware shapes with dimension-lifting to optimize memory-processor layouts
We present a new formulation for parallel matrix multiplication (MM) to ...

11/22/2022
ArrayFlex: A Systolic Array Architecture with Configurable Transparent Pipelining
Convolutional Neural Networks (CNNs) are the state-of-the-art solution f...

11/28/2021
Search for Optimal Systolic Arrays: A Comprehensive Automated Exploration Framework and Lessons Learned
Systolic arrays have been widely used for accelerating HPC and deep lear...

05/16/2020
Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference
Convolutional neural network (CNN) inference on mobile devices demands e...

02/03/2022
Learning with Asymmetric Kernels: Least Squares and Feature Interpretation
Asymmetric kernels naturally exist in real life, e.g., for conditional p...
