SRZoo: An integrated repository for super-resolution using deep learning

06/02/2020
by   Jun-Ho Choi, et al.
Yonsei University

Deep learning-based image processing algorithms, including image super-resolution methods, have improved significantly in performance in recent years. However, their implementations and evaluations are scattered across various deep learning frameworks and evaluation criteria. In this paper, we propose an integrated repository for super-resolution tasks, named SRZoo, to provide state-of-the-art super-resolution models in a single place. Our repository offers not only converted versions of existing pre-trained models, but also documentation and toolkits for converting other models. In addition, SRZoo provides platform-agnostic image reconstruction tools to obtain super-resolved images and evaluate the performance in place. It also opens the way to extensions toward advanced image-based research and other image processing models. The software, documentation, and pre-trained models are publicly available on GitHub.

1 Introduction

In recent years, the performance of image processing algorithms has been significantly improved thanks to the development of deep learning-based approaches. Massively enhanced computing environments power them in terms of both hardware (e.g., accelerated calculation by graphics processing units (GPUs)) and software (e.g., open-sourced versatile deep learning frameworks). Many image-based tasks such as object classification, object detection, super-resolution, and image enhancement benefit from such improvements.

When introducing a new image processing algorithm, it is necessary to compare its performance with that of other state-of-the-art algorithms to demonstrate its improvement. Various performance measures such as computation time and quantitative errors can be considered. To make a fair comparison, the algorithms must be run in the same environment and evaluated with the same criteria. However, we observe that such a fair comparison is not common practice for enhancement models such as super-resolution, unlike image classification and detection algorithms, for the following reasons.

First, several integrated repositories provide state-of-the-art image classification and detection models, e.g., Keras Applications [6] and TensorFlow Detection Model Zoo [15]. However, there are very few repositories for the enhancement models, which makes employing such models harder than the classification models. There exist some repositories for deep super-resolution models, e.g., VideoSuperResolution [16] and super-resolution [14]. However, they do not contain the official pre-trained models provided by the original authors.

Fig. 1: Overall structure of SRZoo

In addition, the performance evaluation metrics for the enhancement models are less standardized than those for the classification models. In the classification tasks, the performance of different models has been fairly compared using the same criteria, since the metrics are usually straightforward (e.g., counting the number of misclassified images). However, in super-resolution tasks, we observe that the performance of the models is often evaluated using different criteria. For example, different color spaces (e.g., RGB and YCbCr), different measurement implementations (e.g., different implementations of calculating structural similarity (SSIM)), and different post-processing methods (e.g., shaving edges and applying geometric self-ensemble [8]) are used. Moreover, we find that most super-resolution methods are implemented on deep learning frameworks such as TensorFlow and PyTorch but are evaluated separately on MATLAB.

Motivated by these observations, we develop an integrated repository for deep image super-resolution models, which provides various state-of-the-art pre-trained models that are ready to be deployed, along with additional useful tools. Image super-resolution is one of the most rapidly evolving research topics in recent years and can be used in various applications such as surveillance, content streaming, and medical diagnosis [20]. Thus, our repository can make it easier to apply state-of-the-art super-resolution models to such applications. In addition, it helps researchers evaluate performance with the same evaluation criteria.

Fig. 1 shows the overall structure of the proposed repository, named SRZoo, a model zoo for super-resolution models. In this paper, we present the motivations, main features, performance comparison, and applications of our work in Sections 2, 3, 4, and 5, respectively.

2 Motivations

2.1 Models with different frameworks

The image super-resolution methods proposed in the literature have been implemented in various deep learning frameworks such as Caffe [5], PyTorch [21, 17, 1], and TensorFlow [7]. Therefore, evaluating their performance requires setting up all of these frameworks in the same computing environment for a fair comparison. In addition, when one needs to use the models to build an extension of the super-resolution task, it is necessary to implement it repeatedly for all the frameworks. Examples of such extensions include image classification using super-resolved images [4] and robustness analysis of super-resolution against adversarial perturbations [3].

For the image classification and detection tasks, there exist integrated repositories that provide various state-of-the-art classification models on a single deep learning framework. For example, Keras Applications [6] provides various deep image classification models with pre-trained weights. It enables researchers to employ or modify the models for purposes other than the original image classification task.

2.2 Models with different evaluation criteria

Method | Color channels | Shaving edges (pixels) | Image precision | Self-ensemble
--- | --- | --- | --- | ---
EDSR [8] | Y (YCbCr) | | Integer (8-bit) | Yes
EDSR [8] | RGB | | Integer (8-bit) | Yes
RCAN [21] | Y (YCbCr) | | Float | Yes
ESRGAN [17] | Y (YCbCr) | | Float | No
RRDB [17] | Y (YCbCr) | | Float | No
CARN [1] | Y (YCbCr) | | Float | No
  •  On the DIV2K dataset

  •  The number of pixels corresponding to the upscaling factor

Table 1: Conditions used for evaluating popular super-resolution methods.

When developing a new algorithm, it is necessary to evaluate both the new and existing algorithms under the same conditions. However, we find that many super-resolution methods are evaluated with different criteria, which is often overlooked when the performance comparison is reported. Table 1 shows the evaluation methods used for some popular super-resolution models. For instance, some super-resolution performance results are reported with the images converted to floating-point pixel values (e.g., using the “” function in MATLAB), whereas some other results are obtained from the original integer pixel values. In addition, for calculating the SSIM values [18], some super-resolution methods employ the function included in MATLAB, while others employ the code provided by the original authors, and the two produce slightly different results.

In the case of super-resolution models, it is harder to standardize the evaluation criteria than for image classification models. For image classification, the success rate of classification, such as the top-$k$ error, is usually used as the evaluation criterion, which is straightforward and clearly defined. However, there are multiple options that alter the evaluation procedure of super-resolution. For example, there exist various configurations (e.g., color space, shaving edges) and evaluation methods (e.g., calculating pixel-wise differences, measuring structural similarity, considering perceptual quality). These factors make it difficult to compare different algorithms under the same conditions.

Model (year) | Reported PSNR (dB) | Reported SSIM | Measured PSNR (dB) | Measured SSIM | NIQE | Running time, CPU (s) | Running time, GPU (s)
--- | --- | --- | --- | --- | --- | --- | ---
ESRGAN (2018) [17] | N/A | N/A | 25.31 | 0.6502 | 3.664 | 8.236 | 0.106
EDSR-baseline (2017) [8] | N/A | N/A | 27.57 | 0.7357 | 5.913 | 1.018 | 0.015
CARN (2018) [1] | 27.58 | 0.7349 | 27.58 | 0.7358 | 6.071 | 1.006 | 0.017
FRSR (2019) [13] | 27.60 | 0.7366 | 27.61 | 0.7374 | 5.853 | 1.284 | 0.028
EUSR (2018) [7] | 27.69 | 0.739 | 27.69 | 0.7403 | 5.964 | 1.790 | 0.036
EDSR (2017) [8] | 27.71 | 0.7420 | 27.73 | 0.7422 | 5.884 | 8.806 | 0.133
EUSR+ (2018) [7] | 27.74 | 0.741 | 27.75 | 0.7415 | 6.019 | 14.223 | 0.249
RCAN (2018) [21] | 27.77 | 0.7436 | 27.75 | 0.7432 | 5.921 | 6.607 | 0.102
EDSR+ (2017) [8] | 27.79 | 0.7437 | 27.81 | 0.7439 | 5.981 | 68.570 | 0.977
RCAN+ (2018) [21] | 27.85 | 0.7455 | 27.83 | 0.7451 | 6.013 | 52.451 | 0.800
RRDB (2018) [17] | 27.85 | 0.7455 | 27.85 | 0.7455 | 5.967 | 8.105 | 0.106
RRDB (2018) [17] + self-ensemble | N/A | N/A | 27.90 | 0.7466 | 6.008 | 65.631 | 0.794
Table 2: Performance comparison of selected super-resolution models included in SRZoo, where a scaling factor of 4 is used on the BSD100 dataset [9]. The models are sorted in terms of the measured PSNR values. The running time is measured with Intel i7-7700K (CPU) and NVIDIA GeForce GTX 1080 (GPU).

3 Key Features of SRZoo

We implement SRZoo using TensorFlow with Python to achieve three principal features while ensuring compatibility with various hardware platforms. These features are explained in this section.

3.1 Model conversion

The main objective of SRZoo is to provide various state-of-the-art deep learning-based super-resolution models in the same repository. It enables researchers to evaluate the performance of the “official” pre-trained models, i.e., the model parameters provided by the original authors, under the same conditions. To do this, SRZoo provides 26 pre-trained super-resolution models (as of October 2019; more to be added if available). They were originally implemented in different deep learning frameworks, including PyTorch and TensorFlow. These models are converted for SRZoo using open-source model conversion tools, e.g., pytorch2keras [12] to convert PyTorch-based models. Since these tools are optimized for image classification models, we modify them to support converting super-resolution models. We provide the modified code along with documentation to enable the conversion of other models.

3.2 Model configuration

SRZoo supports various model configurations for the super-resolution tasks. For instance, SRZoo supports various upscaling factors, e.g., , , , and . In addition, it supports the geometric self-ensemble method, which is widely used for performance boosting during test [8, 21, 7]. Thus, it is possible to apply the method to the super-resolution models that do not employ it originally.
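As an illustration of the geometric self-ensemble idea, the following is a minimal NumPy sketch, not SRZoo's own code. It assumes the model is any callable that maps an (H, W, C) array to an upscaled (sH, sW, C) array; the names `self_ensemble` and `nearest_upscale` are hypothetical, and `nearest_upscale` is only a stand-in for a real super-resolution network.

```python
import numpy as np


def self_ensemble(model, image):
    """Average the model outputs over the 8 flip/rotation variants of `image`.

    `model` is assumed to be a callable mapping an (H, W, C) array to an
    upscaled (sH, sW, C) array.
    """
    outputs = []
    for flip in (False, True):
        for k in range(4):
            # Forward transform: optional horizontal flip, then k quarter turns.
            x = image[:, ::-1] if flip else image
            x = np.rot90(x, k, axes=(0, 1))
            y = model(x)
            # Inverse transform on the output, applied in reverse order.
            y = np.rot90(y, -k, axes=(0, 1))
            y = y[:, ::-1] if flip else y
            outputs.append(y)
    return np.mean(outputs, axis=0)


def nearest_upscale(x, scale=2):
    """Stand-in 'model': nearest-neighbour upscaling (not a real SR network)."""
    return np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)


image = np.random.default_rng(0).random((5, 7, 3))
ensembled = self_ensemble(nearest_upscale, image)
```

Because the wrapper only needs a callable, it can in principle be placed around any model, which is consistent with applying self-ensemble to models that were not originally evaluated with it. The cost is eight forward passes per image.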

3.3 Performance evaluation

Although most recent super-resolution methods are implemented in deep learning frameworks rather than MATLAB, almost every method is separately evaluated with MATLAB-based code. In addition, as mentioned in Section 2.2, different evaluation criteria are used, which prevents a direct comparison of performance. To alleviate this, SRZoo provides Python-based evaluation code to measure the performance of various super-resolution methods in one place and compare them more equitably than before. It enables evaluating the performance within the same environment in terms of both hardware and software configurations. Besides, it is easy to add a new evaluation metric, which can be done by adding a new “Evaluator” class to SRZoo. Furthermore, SRZoo also supports comparing the models in terms of computation speed, along with image quality.
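To make the idea of a pluggable metric concrete, the sketch below shows one possible shape for such a class hierarchy. It is an assumption for illustration only: the base class layout, the `name` attribute, the `evaluate` method, and the `run_evaluators` helper are all hypothetical and are not taken from SRZoo's actual interface.

```python
import numpy as np


class Evaluator:
    """Hypothetical base class: one metric computed on an (output, truth) pair."""

    name = "base"

    def evaluate(self, output, truth):
        raise NotImplementedError


class MeanAbsoluteErrorEvaluator(Evaluator):
    """Example of a new metric added by subclassing the base class."""

    name = "mae"

    def evaluate(self, output, truth):
        diff = np.asarray(output, dtype=np.float64) - np.asarray(truth, dtype=np.float64)
        return float(np.mean(np.abs(diff)))


class MaxAbsoluteErrorEvaluator(Evaluator):
    """A second metric, added the same way."""

    name = "max_abs_error"

    def evaluate(self, output, truth):
        diff = np.asarray(output, dtype=np.float64) - np.asarray(truth, dtype=np.float64)
        return float(np.max(np.abs(diff)))


def run_evaluators(evaluators, output, truth):
    """Apply every evaluator to the same image pair and collect the scores."""
    return {e.name: e.evaluate(output, truth) for e in evaluators}


truth = np.zeros((4, 4))
output = np.ones((4, 4))
output[0, 0] = 3.0
scores = run_evaluators(
    [MeanAbsoluteErrorEvaluator(), MaxAbsoluteErrorEvaluator()], output, truth
)
```

With a design along these lines, every model is scored by the same metric objects on the same image pairs, so adding a metric does not require touching the model code.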

4 Performance comparison

Fig. 2: Performance comparison of super-resolution models using SRZoo. (a) PSNR vs. NIQE. (b) PSNR vs. running time on a CPU.

Since the state-of-the-art super-resolution models are included in a single repository, it is possible to make a fair comparison of their performance with SRZoo. Table 2 shows a performance comparison of selected super-resolution models included in SRZoo. We use all 100 images of BSD100 [9], which is one of the most widely used benchmark datasets for super-resolution. The models are run on an Intel i7-7700K (CPU) and an NVIDIA GeForce GTX 1080 (GPU). The peak signal-to-noise ratio (PSNR), SSIM, and natural image quality evaluator (NIQE) [11] values are measured for the super-resolved outputs. We follow the most widely used conditions in Table 1. In particular, we calculate the quality metrics on the Y channel of the YCbCr color space with floating-point precision and shave the image edges by the number of pixels corresponding to the upscaling factor.
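The evaluation protocol described above (Y channel, floating-point precision, edges shaved by the upscaling factor) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions rather than SRZoo's evaluation code: the function names are hypothetical, the luma conversion uses the ITU-R BT.601 coefficients for RGB values in [0, 255], and the PSNR peak value of 255 is an assumed convention.

```python
import numpy as np


def rgb_to_y(rgb):
    """BT.601 luma (Y of YCbCr) for an RGB array with values in [0, 255]."""
    rgb = np.asarray(rgb, dtype=np.float64)
    return 16.0 + (
        65.481 * rgb[..., 0] + 128.553 * rgb[..., 1] + 24.966 * rgb[..., 2]
    ) / 255.0


def shave(image, border):
    """Remove `border` pixels from each side of a 2-D array."""
    if border <= 0:
        return image
    return image[border:-border, border:-border]


def psnr_y(output_rgb, truth_rgb, scale, peak=255.0):
    """PSNR on the Y channel, in floating point, after shaving `scale` pixels."""
    y_out = shave(rgb_to_y(output_rgb), scale)
    y_gt = shave(rgb_to_y(truth_rgb), scale)
    mse = np.mean((y_out - y_gt) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


truth = np.full((16, 16, 3), 100.0)
shifted = truth + 1.0          # uniform error of one intensity level per channel
border_only = truth.copy()
border_only[0, 0] = 0.0        # error confined to the shaved border
```

In this sketch, `psnr_y(border_only, truth, scale=4)` is infinite because the only differing pixel is shaved away, which shows how the shaving choice alone can change a reported score.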

We also include in Table 2 the quality metrics reported in the original papers of the compared models. Note that the PSNR and SSIM values of ESRGAN and EDSR-baseline are not reported in the original papers. The differences between the reported and measured metrics justify the necessity of SRZoo for a fair comparison of the models with the same criteria. In addition, SRZoo can evaluate the models with metrics that are not used in the original papers, including NIQE and running time. Furthermore, SRZoo can boost the performance of existing super-resolution models. For instance, RRDB combined with the geometric self-ensemble method shows the best performance in terms of PSNR and SSIM, even though the original implementation does not employ the self-ensemble method.

Fig. 2 depicts the performance comparison of the models. We exclude ESRGAN, whose PSNR value is much lower than those of the others. Note that a smaller value of NIQE means better performance. In Fig. 2a, it is observed that there is no significant correlation between the PSNR and NIQE values. In other words, models having larger PSNR values (i.e., less distortion) are not guaranteed to have smaller NIQE values (i.e., higher perceptual quality), as also noted in [2]. Besides, employing geometric self-ensemble is harmful to perceptual quality, even though it helps reduce the distortion. Fig. 2b shows that the models producing higher-quality images tend to have higher computational complexity. Nevertheless, the RRDB model shows better performance than EDSR+ and RCAN+ in terms of both PSNR and running time. These results demonstrate that SRZoo enables a comprehensive analysis of the performance of super-resolution models.
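One way to check the two observations about PSNR and NIQE is to compute them directly from the measured values in Table 2. The sketch below uses only the Python standard library; the `pearson` helper and the variable names are ours, and ESRGAN is excluded as in Fig. 2.

```python
import math

# Measured values from Table 2 (BSD100, scaling factor 4), ESRGAN excluded.
measured = {
    "EDSR-baseline": (27.57, 5.913),
    "CARN": (27.58, 6.071),
    "FRSR": (27.61, 5.853),
    "EUSR": (27.69, 5.964),
    "EDSR": (27.73, 5.884),
    "EUSR+": (27.75, 6.019),
    "RCAN": (27.75, 5.921),
    "EDSR+": (27.81, 5.981),
    "RCAN+": (27.83, 6.013),
    "RRDB": (27.85, 5.967),
    "RRDB + self-ensemble": (27.90, 6.008),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


psnr = [p for p, _ in measured.values()]
niqe = [q for _, q in measured.values()]
r = pearson(psnr, niqe)

# NIQE change when self-ensemble is added (positive means worse perceptual quality).
pairs = [("EDSR", "EDSR+"), ("EUSR", "EUSR+"), ("RCAN", "RCAN+"),
         ("RRDB", "RRDB + self-ensemble")]
niqe_change = {base: measured[ens][1] - measured[base][1] for base, ens in pairs}

print(f"Pearson r (PSNR vs. NIQE): {r:.3f}")
print(niqe_change)
```

For all four model pairs in the table, the self-ensemble variant has a higher NIQE than its base model, which matches the statement that self-ensemble reduces distortion at the expense of perceptual quality.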

5 Applications

5.1 Platform-agnostic image reconstruction

Fig. 3: Image super-resolution using SRZoo running on (a) a desktop web browser and (b) a mobile web browser

SRZoo provides the pre-trained super-resolution models in a “ready-for-deployment” condition. For various applications of super-resolution, SRZoo allows obtaining super-resolved images regardless of the specific platform. For example, the models in SRZoo can be deployed on mobile platforms via TensorFlow Lite or TensorFlow.js. Fig. 3 shows examples in which the super-resolved output for a given image is produced on both desktop and mobile web browsers. In addition, the capability of platform-agnostic image reconstruction also allows evaluating the performance of the super-resolution models on various platforms.

5.2 Extension to advanced topics

Along with employing the super-resolution models as a standalone application, it is also possible to employ them as a part of other tasks. For instance, super-resolution can be used as a pre-processing tool to enhance the performance of other image-related tasks, e.g., image classification and image captioning [19]. Since SRZoo provides a separate class for loading the models, it can serve as a “plugin” for such tasks without any additional work. In addition, the provided super-resolution models can be used to thoroughly analyze their intermediate processes by computing the features and gradients. One example use case is to examine the robustness of the super-resolution models by adding a small perturbation to a given input image that deteriorates the super-resolved outputs [3]. Therefore, SRZoo can be employed as a testbed to analyze and refine the state-of-the-art deep learning-based super-resolution methods.

5.3 Extension to other manipulation models

SRZoo can be easily extended to other image processing tasks thanks to its well-structured architecture. There are various image manipulation algorithms, including image deblurring, image style transfer, and image compression or decompression. These take an input image and produce an output image having improved quality or different characteristics. Since SRZoo is developed to deal with models that take images as inputs and produce images as outputs (which image classification models do not), it can be used to employ such algorithms with only a few modifications. As with the super-resolution models, only a pre-trained model and a configuration file that contains the properties of the model are required. As a proof-of-concept, we provide a pre-trained deep learning-based image compression model [10] in SRZoo. SRZoo treats the model as an image processing algorithm having an upscaling factor of 1; thus, no modification of the original SRZoo code is necessary.

6 Conclusion

In this paper, we proposed SRZoo, which is developed for deep learning-based state-of-the-art super-resolution models. SRZoo aims to provide an integrated repository that contains various pre-trained super-resolution models, which are ready for deployment, and evaluation tools that run in one place. It enables us to overcome the limitations in employing various super-resolution methods implemented in different deep learning frameworks and to perform a fair comparison of the models using the same evaluation criteria. We showed the main features of SRZoo in terms of handy model conversion, various model configurations, and fair performance evaluation. In addition, we suggested possible applications of SRZoo.

References

  • [1] N. Ahn, B. Kang, and K. Sohn (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision, pp. 252–268. Cited by: §2.1, Table 1, Table 2.
  • [2] Y. Blau and T. Michaeli (2018) The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6228–6237. Cited by: §4.
  • [3] J. Choi, H. Zhang, J. Kim, C. Hsieh, and J. Lee (2019) Evaluating robustness of deep image super-resolution against adversarial attacks. In Proceedings of the IEEE International Conference on Computer Vision, Cited by: §2.1, §5.2.
  • [4] S. Hao, W. Wang, Y. Ye, E. Li, and L. Bruzzone (2018) A deep network architecture for super-resolution-aided hyperspectral image classification with classwise loss. IEEE Transactions on Geoscience and Remote Sensing 56 (8), pp. 4650–4663. Cited by: §2.1.
  • [5] Z. Hui, X. Wang, and X. Gao (2018) Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 723–731. Cited by: §2.1.
  • [6] Keras Applications. Note: https://keras.io/applications/ Cited by: §1, §2.1.
  • [7] J. Kim and J. Lee (2018) Deep residual network with enhanced upscaling module for super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 913–921. Cited by: §2.1, Table 2, §3.2.
  • [8] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Cited by: §1, Table 1, Table 2, §3.2.
  • [9] D. Martin, C. Fowlkes, D. Tal, and J. Malik (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the IEEE International Conference on Computer Vision, pp. 416–423. Cited by: Table 2, §4.
  • [10] F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool (2018) Conditional probability models for deep image compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4394–4402. Cited by: §5.3.
  • [11] A. Mittal, R. Soundararajan, and A. C. Bovik (2013) Making a “completely blind” image quality analyzer. IEEE Signal Processing Letters 20 (3), pp. 209–212. Cited by: §4.
  • [12] Pytorch2keras. Note: https://github.com/nerox8664/pytorch2keras Cited by: §3.1.
  • [13] J. W. Soh, G. Y. Park, J. Jo, and N. I. Cho (2019) Natural and realistic single image super-resolution with explicit natural manifold discrimination. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8122–8131. Cited by: Table 2.
  • [14] Super-resolution. Note: https://github.com/icpm/super-resolution Cited by: §1.
  • [15] TensorFlow detection model zoo. Note: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md Cited by: §1.
  • [16] Video Super Resolution. Note: https://github.com/LoSealL/VideoSuperResolution Cited by: §1.
  • [17] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. C. Loy (2018) ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision Workshops, pp. 63–79. Cited by: §2.1, Table 1, Table 2.
  • [18] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13 (4), pp. 600–612. Cited by: §2.2.
  • [19] M. Yin, Y. Zhang, X. Li, and S. Wang (2018) When deep fool meets deep prior: Adversarial attack on super-resolution network. In Proceedings of the ACM International Conference on Multimedia Conference, pp. 1930–1938. Cited by: §5.2.
  • [20] L. Yue, H. Shen, J. Li, Q. Yuan, H. Zhang, and L. Zhang (2016) Image super-resolution: The techniques, applications, and future. Signal Processing 128, pp. 389–408. Cited by: §1.
  • [21] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision, pp. 286–301. Cited by: §2.1, Table 1, Table 2, §3.2.