unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation

03/09/2021
by Stylianos I. Venieris, et al.

Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs), enabling the deployment of diverse models without fabric reconfiguration. This flexibility, however, often comes with significantly reduced performance on memory-bound layers and resource underutilisation due to the suboptimal mapping of certain layers on the engine's fixed configuration. In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time. We refer to these approaches as on-the-fly. To minimise the negative impact of limited bandwidth on memory-bound layers, we present a novel hardware component that enables the on-chip on-the-fly generation of weights. We further introduce an input selective processing element (PE) design that balances the load between PEs on suboptimally mapped layers. Finally, we present unzipFPGA, a framework to train on-the-fly models and traverse the design space to select the highest performing CNN engine configuration. Quantitative evaluation shows that unzipFPGA yields an average speedup of 2.14x and 71% over optimised status-quo and pruned CNN engines under constrained bandwidth and up to 3.69x higher performance density over the state-of-the-art FPGA-based CNN accelerators.
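To make the on-the-fly idea concrete, the following is a minimal software sketch of a convolution layer that stores its weights in a compressed form and regenerates the full weight tensor in a pre-convolution stage, trading on-chip compute for reduced off-chip weight bandwidth. The class name, the low-rank compression scheme, and all parameters are illustrative assumptions for exposition, not the paper's exact hardware mechanism.

```python
import numpy as np

class OnTheFlyConv:
    """Illustrative layer: weights are kept compressed (here, a low-rank
    factorisation) and expanded on the fly just before convolution."""

    def __init__(self, out_ch, in_ch, k, rank):
        self.shape = (out_ch, in_ch, k, k)
        # Compressed representation: two small factors instead of the
        # full (out_ch, in_ch, k, k) weight tensor.
        rng = np.random.default_rng(0)
        self.U = rng.standard_normal((out_ch, rank)).astype(np.float32)
        self.V = rng.standard_normal((rank, in_ch * k * k)).astype(np.float32)

    def expand_weights(self):
        # Pre-convolution stage: regenerate the full weights at run time,
        # so only the small factors ever cross the memory interface.
        return (self.U @ self.V).reshape(self.shape)

    def forward(self, x):
        w = self.expand_weights()  # generated per layer, per invocation
        out_ch, in_ch, k, _ = self.shape
        n, _, h, width = x.shape
        oh, ow = h - k + 1, width - k + 1
        y = np.zeros((n, out_ch, oh, ow), dtype=np.float32)
        # Naive valid convolution, written for clarity rather than speed.
        for i in range(oh):
            for j in range(ow):
                patch = x[:, :, i:i + k, j:j + k].reshape(n, -1)
                y[:, :, i, j] = patch @ w.reshape(out_ch, -1).T
        return y
```

Under this assumed scheme, the stored footprint is `out_ch*rank + rank*in_ch*k*k` values instead of `out_ch*in_ch*k*k`, which is where the bandwidth relief on memory-bound layers would come from.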

