Serving Recurrent Neural Networks Efficiently with a Spatial Accelerator

09/26/2019
by Tian Zhao, et al.

Recurrent Neural Network (RNN) applications form a major class of AI-powered, low-latency data center workloads. Most execution models for RNN acceleration break computation graphs into BLAS kernels, and this decomposition leads to significant inter-kernel data movement and resource underutilization. We show that by supporting more general loop constructs that capture design parameters in accelerators, it is possible to improve resource utilization through cross-kernel optimization without sacrificing programmability. This level of abstraction enables a design space search that can lead to efficient use of on-chip resources on a spatial architecture across a range of problem sizes. We evaluate our optimization strategy at this abstraction level on DeepBench using a configurable spatial accelerator. We demonstrate that this implementation provides geometric-mean improvements of 30x in performance, 1.6x in area, and 2x in power efficiency compared to a Tesla V100 GPU, and a geometric-mean speedup of 2x compared to the Microsoft Brainwave implementation on a Stratix 10 FPGA.
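The abstract contrasts decomposing an RNN into separate BLAS kernels with expressing it as more general loops that permit cross-kernel optimization. The Python sketch below is a hypothetical illustration of that contrast on a single LSTM step, not the authors' implementation or their accelerator's programming model. `lstm_step_unfused` mimics a kernel-by-kernel schedule in which every pre-activation vector is materialized, while `lstm_step_fused` merges the matrix-vector products and the element-wise updates into one loop over hidden units. All names (`gemv`, `lstm_step_unfused`, `lstm_step_fused`, `random_params`) are assumptions made for this example.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GATES = "ifog"  # input, forget, output, candidate (hypothetical LSTM layout)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gemv(m: Matrix, v: Vector) -> Vector:
    """BLAS-style matrix-vector product: one 'kernel' that writes a full output vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step_unfused(W: Dict[str, Matrix], U: Dict[str, Matrix], b: Dict[str, Vector],
                      x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """Kernel-by-kernel schedule: each gate's pre-activation vector is materialized
    before the element-wise kernels consume it (inter-kernel data movement)."""
    pre = {g: [a + u + bias for a, u, bias in zip(gemv(W[g], x), gemv(U[g], h), b[g])]
           for g in GATES}
    c_new = [sigmoid(pf) * cj + sigmoid(pi) * math.tanh(pg)
             for pi, pf, pg, cj in zip(pre["i"], pre["f"], pre["g"], c)]
    h_new = [sigmoid(po) * math.tanh(cj) for po, cj in zip(pre["o"], c_new)]
    return h_new, c_new


def lstm_step_fused(W: Dict[str, Matrix], U: Dict[str, Matrix], b: Dict[str, Vector],
                    x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
    """Fused loop: for each hidden unit j, the four dot products and the element-wise
    update run in the same iteration, so only scalars are live between them."""
    h_new: Vector = []
    c_new: Vector = []
    for j in range(len(c)):
        z = {g: sum(w * xi for w, xi in zip(W[g][j], x))
                + sum(u * hi for u, hi in zip(U[g][j], h))
                + b[g][j]
             for g in GATES}
        cj = sigmoid(z["f"]) * c[j] + sigmoid(z["i"]) * math.tanh(z["g"])
        c_new.append(cj)
        h_new.append(sigmoid(z["o"]) * math.tanh(cj))
    return h_new, c_new


def random_params(hidden: int, inp: int, seed: int = 0):
    """Hypothetical random weights, used only to exercise the two schedules."""
    rng = random.Random(seed)
    W = {g: [[rng.uniform(-0.5, 0.5) for _ in range(inp)] for _ in range(hidden)] for g in GATES}
    U = {g: [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)] for g in GATES}
    b = {g: [rng.uniform(-0.1, 0.1) for _ in range(hidden)] for g in GATES}
    return W, U, b
```

Both schedules compute the same values. The difference is that the fused loop never holds a full pre-activation vector, which is the kind of inter-kernel intermediate the abstract says cross-kernel optimization avoids. How the paper actually maps such loops onto spatial hardware is not described in this abstract.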
