Towards Fast and Energy-Efficient Binarized Neural Network Inference on FPGA

10/04/2018
by Cheng Fu, et al.

Binarized Neural Network (BNN) removes bitwidth redundancy in classical CNN by using a single bit (-1/+1) for network parameters and intermediate representations, which greatly reduces off-chip data transfer and storage overhead. However, a large amount of computation redundancy still exists in BNN inference. By analyzing local properties of images and the learned BNN kernel weights, we observe an average of ∼78% input similarity and ∼59% weight similarity, measured by our proposed metric, in common network architectures. Thus there does exist redundancy that can be exploited to further reduce the amount of on-chip computation. Motivated by this observation, in this paper we propose two types of fast and energy-efficient architectures for BNN inference, and we provide analysis and insights for picking the better of the two strategies for a given dataset and network model. By reusing the results of previous computations, many cycles of data buffer access and computation can be skipped. Through experiments, we demonstrate that 80% of the computation and 40% of the buffer access can be skipped by exploiting BNN similarity. Our design thus achieves a 17% reduction in total power consumption, a 54% reduction in on-chip power consumption, and a 2.4× maximum speedup compared to the baseline without our reuse technique. Our design is also 1.9× more area-efficient than the state-of-the-art BNN inference design. We believe our deployment of BNN on FPGA points to a promising future of running deep learning models on mobile devices.
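To make the reuse idea concrete, below is a minimal software sketch (not the authors' RTL; function and variable names are hypothetical). In a BNN with -1/+1 values encoded as 0/1 bits, a dot product reduces to XNOR plus popcount: the result is matches minus mismatches, i.e. n - 2·popcount(x XOR w). The second function shows the reuse principle the abstract describes: when consecutive inputs are highly similar, only the bits that flipped can change the result, so a previous result can be incrementally updated instead of recomputed.

```python
def bnn_dot(x_bits: int, w_bits: int, n: int) -> int:
    """Binarized dot product over n bits packed into integers.

    With -1/+1 encoded as 1/0, each matching bit contributes +1 and each
    mismatching bit -1, so the sum is n - 2 * popcount(x XOR w).
    """
    mismatches = bin((x_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches


def bnn_dot_reuse(prev_result: int, prev_x: int, x_bits: int, w_bits: int) -> int:
    """Update a previous dot product incrementally (hypothetical sketch).

    Only input bits that flipped since the previous window can change the
    result; each flipped bit shifts the sum by +/-2 depending on whether
    it now matches the corresponding weight bit.
    """
    flipped = prev_x ^ x_bits
    result = prev_result
    while flipped:
        bit = flipped & -flipped  # isolate the lowest flipped bit
        # If the new input bit matches the weight bit, that term went -1 -> +1.
        result += 2 if (x_bits & bit) == (w_bits & bit) else -2
        flipped ^= bit
    return result
```

In hardware, the analogous optimization is to skip the buffer reads and XNOR-popcount cycles entirely whenever the flipped-bit mask for a window is empty (or nearly so), which is the intuition behind the computation- and access-skipping rates reported above.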


