On-Chip CNN Accelerator for Image Super-Resolution

01/18/2018
by   Jung-Woo Chang, et al.
0

To implement convolutional neural networks (CNN) in hardware, the state-of-the-art CNN accelerators pipeline computation and data transfer stages using an off-chip memory and simultaneously execute them on the same timeline. However, since a large amount of feature maps generated during the operation should be transmitted to the off-chip memory, the pipeline stage length is determined by the off-chip data transfer stage. Fusion architectures that can fuse multiple layers have been proposed to solve this problem, but applications such as super-resolution (SR) require a large amount of an on-chip memory because of the high resolution of the feature maps. In this paper, we propose a novel on-chip CNN accelerator for SR to optimize the CNN dataflow in the on-chip memory. First, the convolution loop optimization technique is proposed to prevent using a frame buffer. Second, we develop a combined convolutional layer processor to reduce the buffer size used to store the feature maps. Third, we explore how to perform low-cost multiply-and-accumulate operations in the deconvolutional layer used in SR. Finally, we propose a two-stage quantization algorithm to select the optimized hardware size for the limited number of DSPs to implement the on-chip CNN accelerator. We evaluate our proposed accelerator with FSRCNN, which is most popular as the CNN-based SR algorithm. Experimental results show that the proposed accelerator requires 9.21ms to achieve an output image with the 2560x1440 pixel resolution, which is 36 times faster than the conventional method. In addition, we reduce the on-chip memory usage and DSP usage by 4 times and 1.44 times, respectively, compared to the conventional methods.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/18/2018

ECA: Energy-Efficient FPGA-based Convolutional Neural Networks Architecture for Single Image Super-Resolution

Convolutional neural networks (CNN) show the excellent performance compa...
research
10/12/2021

Memory-Efficient CNN Accelerator Based on Interlayer Feature Map Compression

Existing deep convolutional neural networks (CNNs) generate massive inte...
research
05/19/2021

Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA

Deep convolutional neural networks have achieved remarkable progress in ...
research
01/18/2018

An Energy-Efficient FPGA-based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution

Convolutional neural networks (CNNs) demonstrate excellent performance a...
research
12/22/2022

AoCStream: All-on-Chip CNN Accelerator With Stream-Based Line-Buffer Architecture

Convolutional neural network (CNN) accelerators are being widely used fo...
research
08/30/2023

ACNPU: A 4.75TOPS/W 1080P@30FPS Super Resolution Accelerator with Decoupled Asymmetric Convolution

Deep learning-driven superresolution (SR) outperforms traditional techni...
research
05/09/2022

A Real Time Super Resolution Accelerator with Tilted Layer Fusion

Deep learning based superresolution achieves high-quality results, but i...

Please sign up or login with your details

Forgot password? Click here to reset