CodeX: Bit-Flexible Encoding for Streaming-based FPGA Acceleration of DNNs

01/17/2019
by   Mohammad Samragh, et al.

This paper proposes CodeX, an end-to-end framework that facilitates encoding, bitwidth customization, fine-tuning, and implementation of neural networks on FPGA platforms. CodeX incorporates nonlinear encoding into the computation flow of neural networks to save memory. The encoded features demand significantly less storage than the raw full-precision activation values; as a result, the CodeX hardware engine executes entirely within the FPGA using on-chip streaming buffers, with no access to off-chip DRAM. We further propose a fully automated algorithm, inspired by reinforcement learning, that determines a customized encoding bitwidth for each network layer. The CodeX full-stack framework comprises a compiler that takes a high-level Python description of an arbitrary neural network architecture and instantiates the corresponding modules from the CodeX hardware library for FPGA implementation. Proof-of-concept evaluations on the MNIST, SVHN, and CIFAR-10 datasets demonstrate an average 4.65x throughput improvement over stand-alone weight encoding. We further compare CodeX with six existing full-precision DNN accelerators on ImageNet, showing average improvements of 3.6x in throughput and 2.54x in performance-per-watt.
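The abstract does not spell out the encoding scheme itself. The sketch below illustrates one plausible reading of "nonlinear encoding": a K-means-style codebook that replaces each 32-bit activation with a b-bit index into a small table of full-precision centroids. All function names here are illustrative assumptions, not taken from the CodeX codebase.

```python
import numpy as np

def build_codebook(activations, num_bits, iters=20):
    """Fit a nonlinear (non-uniform) codebook with 2**num_bits centroids
    via 1-D K-means over the observed activation values."""
    k = 2 ** num_bits
    # Initialize centroids on quantiles so codes track the activation distribution.
    centroids = np.quantile(activations, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        codes = np.argmin(np.abs(activations[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = activations[codes == j]
            if members.size > 0:
                centroids[j] = members.mean()
    return centroids

def encode(activations, centroids):
    """Replace each full-precision value with its nearest-centroid index."""
    codes = np.argmin(np.abs(activations[:, None] - centroids[None, :]), axis=1)
    return codes.astype(np.uint8)  # b-bit codes (b <= 8 here) instead of 32-bit floats

def decode(codes, centroids):
    """Recover approximate activations by codebook lookup."""
    return centroids[codes]

# Example: 3-bit codes shrink activation storage roughly 32/3 (about 10.7x),
# at the cost of storing 2**3 = 8 full-precision centroids per layer.
acts = np.random.randn(4096).astype(np.float32)
cb = build_codebook(acts, num_bits=3)
approx = decode(encode(acts, cb), cb)
```

Under this reading, only the b-bit codes stream between layers, while the per-layer centroid table is small enough to live in on-chip registers or BRAM, which is what makes a DRAM-free streaming design plausible.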


