An Energy-Efficient FPGA-based Deconvolutional Neural Networks Accelerator for Single Image Super-Resolution
Convolutional neural networks (CNNs) demonstrate excellent performance compared with conventional machine learning algorithms in various computer vision applications. In recent years, FPGA-based CNN accelerators have been proposed to optimize performance and power efficiency. Most accelerators are designed for object detection and recognition algorithms that operate on low-resolution (LR) images. However, image super-resolution (SR) cannot run in real time on a typical accelerator because of the long execution cycles required to generate high-resolution (HR) images, such as those used in ultra-high-definition (UHD) systems. In this paper, we propose a novel CNN accelerator with efficient parallelization methods for SR applications. First, we propose a new methodology for optimizing the deconvolutional neural networks (DCNNs) used to upscale feature maps, based on the trained filters. Second, we propose a novel method to optimize the CNN dataflow using on-chip memory so that the SR algorithm can run at low power in display applications. Third, we propose a two-stage quantization algorithm that determines the optimized hardware size for a limited number of DSPs. Finally, we present an energy-efficient architecture for SR and validate it on a mobile panel with quad-high-definition (QHD) resolution. Our experimental results show that, with the same hardware resources, the proposed DCNN accelerator achieves a throughput up to 108 times greater than that of a conventional DCNN accelerator. In addition, our SR system achieves an energy efficiency of 92.7 GOPS/W, 173.5 GOPS/W, and 286.8 GOPS/W when the scale factor for SR is 2, 3, and 4, respectively. Furthermore, we demonstrate that our system restores HR images with a higher peak signal-to-noise ratio (PSNR) than conventional SR systems.
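As background for the deconvolution layers discussed above, the sketch below illustrates why a naive transposed-convolution (deconvolution) implementation is wasteful: upscaling by a stride inserts zeros into the feature map before an ordinary convolution, so many multiply-accumulates operate on zero-valued inputs. This is a minimal, generic illustration, not the authors' accelerator or optimization method; the function name, tensor sizes, and single-channel setting are assumptions chosen for brevity.

```python
import numpy as np

def deconv2d_via_zero_insertion(x, w, stride=2):
    """Transposed convolution (cross-correlation convention) of a single-channel
    feature map x with kernel w, done naively: zero-insertion then convolution."""
    h_in, w_in = x.shape
    k = w.shape[0]
    # Insert (stride - 1) zeros between input samples.
    up = np.zeros(((h_in - 1) * stride + 1, (w_in - 1) * stride + 1), dtype=x.dtype)
    up[::stride, ::stride] = x
    # Full padding so the output has the usual size (H - 1) * stride + k.
    up = np.pad(up, k - 1)
    h_out = up.shape[0] - k + 1
    w_out = up.shape[1] - k + 1
    out = np.zeros((h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            # Most of these products hit inserted zeros -- the redundancy a
            # DCNN accelerator dataflow aims to eliminate.
            out[i, j] = np.sum(up[i:i + k, j:j + k] * w)
    return out

x = np.random.rand(4, 4).astype(np.float32)   # low-resolution feature map
w = np.random.rand(3, 3).astype(np.float32)   # trained deconvolution kernel
y = deconv2d_via_zero_insertion(x, w, stride=2)
print(y.shape)  # (9, 9): upscaled feature map
```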