eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference

10/13/2019
by Chao-Tsung Huang, et al.

Convolutional neural networks (CNNs) have recently demonstrated superior quality for computational imaging applications. They therefore have great potential to revolutionize the image pipelines in cameras and displays. However, it is difficult for conventional CNN accelerators to support ultra-high-resolution videos at the edge because of their considerable DRAM bandwidth and power consumption. Finding a more memory- and computation-efficient microarchitecture is therefore crucial to speeding up this coming revolution. In this paper, we approach this goal by jointly considering the inference flow, network model, instruction set, and processor design to optimize both hardware performance and image quality. We apply a block-based inference flow, which can eliminate all DRAM bandwidth for feature maps, and accordingly propose a hardware-oriented network model, ERNet, to optimize image quality under hardware constraints. Then we devise a coarse-grained instruction set architecture, FBISA, to support power-hungry convolution with massive parallelism. Finally, we implement an embedded processor, eCNN, which accommodates ERNet and FBISA with a flexible processing architecture. Layout results show that it can support high-quality ERNets for super-resolution and denoising at up to 4K Ultra-HD 30 fps while using only DDR-400 and consuming 6.94 W on average. By comparison, the state-of-the-art Diffy uses dual-channel DDR3-2133 and consumes 54.3 W to support lower-quality VDSR at Full HD 30 fps. Lastly, we also present application examples of high-performance style transfer and object recognition to demonstrate the flexibility of eCNN.
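The idea behind a block-based inference flow is that each output block depends only on a bounded input neighbourhood (the receptive field), so an image can be processed block by block with all intermediate feature maps kept in small on-chip buffers instead of DRAM. The sketch below is a minimal illustration under assumptions of our own, not eCNN's actual flow: single-channel integer images, a stack of 3x3 convolutions with ReLU, zero padding applied once at the frame border, and overlapping input blocks that carry a halo of one pixel per layer. The helper names (`frame_based`, `block_based`, etc.) are hypothetical. It checks that stitching the per-block results reproduces the whole-frame result exactly.

```python
from typing import List

Image = List[List[int]]  # row-major single-channel image


def pad(img: Image, r: int) -> Image:
    """Zero-pad an image by r pixels on every side."""
    w = len(img[0]) + 2 * r
    out = [[0] * w for _ in range(r)]
    for row in img:
        out.append([0] * r + list(row) + [0] * r)
    out += [[0] * w for _ in range(r)]
    return out


def conv3x3_valid_relu(img: Image, k: Image) -> Image:
    """3x3 'valid' cross-correlation followed by ReLU; output shrinks by 1 pixel per side."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h - 2):
        row = []
        for x in range(w - 2):
            s = sum(k[dy][dx] * img[y + dy][x + dx] for dy in range(3) for dx in range(3))
            row.append(max(s, 0))
        out.append(row)
    return out


def run_layers(img: Image, kernels: List[Image]) -> Image:
    """Apply the whole (hypothetical) network: one 3x3 conv + ReLU per kernel."""
    for k in kernels:
        img = conv3x3_valid_relu(img, k)
    return img


def frame_based(img: Image, kernels: List[Image]) -> Image:
    """Reference: process the full frame at once (all feature maps are frame-sized)."""
    return run_layers(pad(img, len(kernels)), kernels)


def block_based(img: Image, kernels: List[Image], block: int) -> Image:
    """Process block x block output tiles independently from overlapping input tiles.

    Each input tile is the output block plus a halo of r = len(kernels) pixels,
    so the largest feature map held at once is (block + 2r)^2 instead of H x W.
    """
    r = len(kernels)
    h, w = len(img), len(img[0])
    padded = pad(img, r)
    out = [[0] * w for _ in range(h)]
    for y0 in range(0, h, block):
        for x0 in range(0, w, block):
            bh, bw = min(block, h - y0), min(block, w - x0)
            tile = [row[x0:x0 + bw + 2 * r] for row in padded[y0:y0 + bh + 2 * r]]
            res = run_layers(tile, kernels)  # shrinks to bh x bw after r layers
            for dy in range(bh):
                out[y0 + dy][x0:x0 + bw] = res[dy]
    return out


if __name__ == "__main__":
    import random

    random.seed(0)
    img = [[random.randint(0, 9) for _ in range(13)] for _ in range(11)]
    kernels = [[[random.randint(-2, 2) for _ in range(3)] for _ in range(3)] for _ in range(3)]
    assert block_based(img, kernels, block=4) == frame_based(img, kernels)
    print("block-based output matches frame-based output")
```

The cost of this scheme is recomputation in the overlapping halos, which grows with network depth relative to the block size; that trade-off is why a block-based flow constrains the choice of network model. This sketch does not model ERNet's structure or eCNN's block sizes.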
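To put the DRAM figures in perspective, a back-of-the-envelope calculation helps. The assumptions here are ours and are not stated in the abstract: peak (not sustained) rates on a 64-bit DRAM bus, 8-bit values, a denoising-style workload whose input and output are both 3840x2160 RGB frames, and a hypothetical 64-channel intermediate feature map that a frame-based flow would write to and read back from DRAM once per frame.

```python
def dram_peak_gbps(mega_transfers_per_s: float, bus_bits: int = 64, channels: int = 1) -> float:
    """Peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes), assuming a bus_bits-wide bus per channel."""
    return mega_transfers_per_s * 1e6 * (bus_bits / 8) * channels / 1e9


def map_traffic_gbps(width: int, height: int, channels: int, fps: float,
                     bytes_per_value: int = 1, accesses: int = 2) -> float:
    """DRAM traffic in GB/s for a width x height x channels map moved `accesses` times per frame."""
    return width * height * channels * bytes_per_value * accesses * fps / 1e9


if __name__ == "__main__":
    print(f"DDR-400, 1 channel:     {dram_peak_gbps(400):.2f} GB/s")              # 3.20 GB/s
    print(f"DDR3-2133, 2 channels:  {dram_peak_gbps(2133, channels=2):.2f} GB/s")  # 34.13 GB/s
    # Input read + output write of 8-bit RGB 4K UHD frames at 30 fps.
    print(f"4K30 RGB image I/O:     {map_traffic_gbps(3840, 2160, 3, 30):.2f} GB/s")   # 1.49 GB/s
    # One hypothetical 64-channel feature map written and read back per frame.
    print(f"4K30 64-ch feature map: {map_traffic_gbps(3840, 2160, 64, 30):.2f} GB/s")  # 31.85 GB/s
```

Under these assumptions, image I/O alone fits within DDR-400's peak bandwidth, whereas spilling even a single intermediate feature map per frame would exceed it by roughly a factor of ten. This is consistent with the abstract's point that eliminating feature-map DRAM traffic is what allows such a modest memory interface.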
