Sparse Winograd Convolutional Neural Networks on Small-Scale Systolic Arrays

10/03/2018
by Feng Shi, et al.

The reconfigurability, energy efficiency, and massive parallelism of FPGAs make them one of the best choices for implementing efficient deep learning accelerators. However, state-of-the-art implementations seldom balance the throughput of the compute fabric against the ability of the memory subsystem to sustain it. In this paper, we implement an FPGA accelerator that combines sparse Winograd convolution, clusters of small-scale systolic arrays, and a tailored memory layout. We also provide an analytical model of the general Winograd convolution algorithm as a design reference. Experimental results on VGG16 show that the accelerator achieves very high computational resource utilization, 20x-30x better energy efficiency, and more than 5x speedup over a dense implementation.
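For readers unfamiliar with the Winograd algorithm the abstract builds on, the following is a minimal sketch (not from the paper) of the 1D case F(2, 3): two outputs of a 3-tap filter computed with 4 multiplications instead of 6. The 2D variant used in CNN layers nests this transform over rows and columns, and sparse Winograd additionally skips multiplications wherever the transformed filter entries are zero. The function names here are illustrative, not the authors' API.

```python
def winograd_f23(d, g):
    """F(2,3) Winograd: input tile d of 4 samples, 3-tap filter g -> 2 outputs."""
    # Input transform B^T d
    bd = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Filter transform G g (precomputable once per filter)
    gg = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Elementwise product in the Winograd domain: 4 multiplications.
    # Sparse Winograd skips the products where gg[i] == 0.
    m = [bd[i] * gg[i] for i in range(4)]
    # Output transform A^T m
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]

def direct_conv(d, g):
    """Reference: direct sliding dot product, 6 multiplications."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]
```

For example, `winograd_f23([1, 2, 3, 4], [1, 1, 1])` yields `[6.0, 9.0]`, matching `direct_conv` on the same inputs; the multiplication savings grow with the tile size, which is what makes the algorithm attractive for the compute-bound convolution layers targeted by the accelerator.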


Related research

02/20/2017 - A GPU-Outperforming FPGA Accelerator Architecture for Binary Convolutional Neural Networks
FPGA-based hardware accelerators for convolutional neural networks (CNNs...

09/19/2023 - Flip: Data-Centric Edge CGRA Accelerator
Coarse-Grained Reconfigurable Arrays (CGRA) are promising edge accelerat...

04/01/2020 - Efficient Implementation of Multi-Channel Convolution in Monolithic 3D ReRAM Crossbar
Convolutional neural networks (CNNs) demonstrate promising accuracy in a...

11/07/2018 - Packing Sparse Convolutional Neural Networks for Efficient Systolic Array Implementations: Column Combining Under Joint Optimization
This paper describes a novel approach of packing sparse convolutional ne...

05/26/2022 - HashPIM: High-Throughput SHA-3 via Memristive Digital Processing-in-Memory
Recent research has sought to accelerate cryptographic hash functions as...

02/18/2022 - EF-Train: Enable Efficient On-device CNN Training on FPGA Through Data Reshaping for Online Adaptation or Personalization
Conventionally, DNN models are trained once in the cloud and deployed in...

11/15/2019 - Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs
Generative adversarial networks (GANs) have shown excellent performance ...
