FPGA Resource-aware Structured Pruning for Real-Time Neural Networks

08/09/2023
by   Benjamin Ramhorst, et al.

Neural networks achieve state-of-the-art performance in image classification, speech recognition, scientific analysis and many more application areas. With the ever-increasing need for faster computation and lower power consumption, driven by real-time systems and Internet-of-Things (IoT) devices, FPGAs have emerged as suitable devices for deep learning inference. Due to the high computational complexity and memory footprint of neural networks, various compression techniques, such as pruning, quantization and knowledge distillation, have been proposed in the literature. Pruning sparsifies a neural network, reducing the number of multiplications and the memory footprint. However, pruning often fails to capture properties of the underlying hardware, causing unstructured sparsity and load-balance inefficiency, thus bottlenecking resource improvements. We propose a hardware-centric formulation of pruning, casting it as a knapsack problem with resource-aware tensor structures. The primary emphasis is on real-time inference, with latencies on the order of 1 μs, accelerated with hls4ml, an open-source framework for deep learning inference on FPGAs. Evaluated on a range of tasks, including real-time particle classification at CERN's Large Hadron Collider and fast image classification, the proposed method achieves a reduction of 55% or more in the utilization of digital signal processing (DSP) blocks and up to 81% in memory (BRAM) utilization.
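As a rough illustration of the knapsack view of pruning mentioned in the abstract, the sketch below solves a plain single-resource 0/1 knapsack by dynamic programming. It is not the authors' implementation and does not use hls4ml. The function name `select_structures`, the importance scores, and the per-structure costs are all hypothetical, chosen only to show how "keep the most valuable weight structures within a resource budget" can be posed as a knapsack problem.

```python
from typing import List, Tuple


def select_structures(
    values: List[float], costs: List[int], budget: int
) -> Tuple[float, List[int]]:
    """Generic 0/1 knapsack via dynamic programming (illustrative only).

    values[i]: assumed importance of weight structure i (e.g. sum of |w|).
    costs[i]:  assumed integer resource cost of keeping structure i
               (e.g. number of DSP blocks it would occupy).
    budget:    total resource units available.
    Returns the best total importance and the indices of structures to keep.
    """
    n = len(values)
    # best[i][b] = max importance using the first i structures within budget b
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        v, c = values[i - 1], costs[i - 1]
        for b in range(budget + 1):
            best[i][b] = best[i - 1][b]  # option 1: prune structure i-1
            if c <= b and best[i - 1][b - c] + v > best[i][b]:
                best[i][b] = best[i - 1][b - c] + v  # option 2: keep it

    # Backtrack through the table to recover which structures were kept.
    keep: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if best[i][b] != best[i - 1][b]:
            keep.append(i - 1)
            b -= costs[i - 1]
    keep.reverse()
    return best[n][budget], keep


# Hypothetical example: four weight structures and a budget of 3 resource units.
values = [4.0, 1.5, 3.0, 0.5]
costs = [2, 1, 2, 1]
total, keep = select_structures(values, costs, budget=3)
pruned = [i for i in range(len(values)) if i not in keep]
print(total, keep, pruned)
```

Structures not returned in `keep` would be the ones zeroed out. A formulation that tracks several resources at once (for example DSP and BRAM together) would need a multi-dimensional budget, which this one-dimensional sketch deliberately leaves out.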


