ACNPU: A 4.75TOPS/W 1080P@30FPS Super Resolution Accelerator with Decoupled Asymmetric Convolution

08/30/2023
by Tun-Hao Yang, et al.

Deep learning-driven super-resolution (SR) outperforms traditional techniques but faces the challenges of high computational complexity and memory bandwidth. As a result, many accelerators opt for simpler, shallower models such as FSRCNN, compromising image quality to meet real-time requirements, especially on resource-limited edge devices. This paper proposes ACNPU, an energy-efficient SR accelerator, to tackle this challenge. With decoupled asymmetric convolution and a split-bypass structure, ACNPU improves image quality by 0.34 dB using a 27-layer model while requiring 36% less complexity than FSRCNN at a similar model size. The hardware-friendly 17K-parameter model enables holistic model fusion, instead of localized layer fusion, which removes external DRAM access for intermediate feature maps. On-chip memory bandwidth, and with it power consumption, is further reduced by an input-stationary data flow and parallel-layer execution. The hardware is regular and easy to control: clusters of processing elements (PEs) with reconfigurable inputs and a uniform data flow support the different layer types. Implemented in a 40 nm CMOS process, the design uses 2,333K gates and 198 KB of SRAM. ACNPU achieves 31.7 FPS and 124.4 FPS for ×2 and ×4 Full-HD generation, respectively, with an energy efficiency of 4.75 TOPS/W.
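Asymmetric convolution, in one common form, replaces a k×k kernel with a k×1 kernel followed by a 1×k kernel. The minimal Python sketch below works under that assumption: it checks that the two-step version reproduces a k×k convolution whenever the k×k kernel is a rank-1 outer product, and it counts multiply-accumulates (MACs) per output pixel for one layer. The function names are hypothetical, and the abstract does not say how ACNPU's decoupled asymmetric convolution or split-bypass structure is actually arranged, so this illustrates the general technique rather than the paper's layer design.

```python
# Sketch (assumption): "asymmetric convolution" taken here as a k x 1 conv
# followed by a 1 x k conv. Hypothetical helpers, not ACNPU's actual layers.

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation (as used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


def asymmetric_conv(image, col, row):
    """Apply a k x 1 kernel (`col`), then a 1 x k kernel (`row`)."""
    vertical = conv2d_valid(image, [[c] for c in col])  # k x 1 pass
    return conv2d_valid(vertical, [row])                 # 1 x k pass


def macs_per_pixel(k, c_in, c_out, asymmetric):
    """MACs per output pixel for one layer (hypothetical cost model)."""
    if asymmetric:
        # k x 1 conv (c_in -> c_out) followed by 1 x k conv (c_out -> c_out)
        return k * c_in * c_out + k * c_out * c_out
    return k * k * c_in * c_out  # plain k x k conv (c_in -> c_out)


if __name__ == "__main__":
    img = [[float((3 * i + 5 * j) % 7) for j in range(5)] for i in range(5)]
    col, row = [1.0, 2.0, 1.0], [1.0, 0.0, -1.0]      # Sobel factors
    full = [[c * r for r in row] for c in col]        # rank-1 3 x 3 kernel
    print(conv2d_valid(img, full) == asymmetric_conv(img, col, row))
    print(macs_per_pixel(3, 16, 16, False), macs_per_pixel(3, 16, 16, True))
```

With k = 3 and equal input and output channel counts, the pair costs 2k = 6 instead of k² = 9 weights per channel pair, a one-third MAC reduction in this toy cost model. A general (non-rank-1) k×k kernel is only approximated by the pair, so a model built this way would presumably be trained directly in its asymmetric form.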
