From General to Specific: Online Updating for Blind Super-Resolution

Shang Li, et al. · 07/06/2021

Most deep learning-based super-resolution (SR) methods are not image-specific: 1) they are exhaustively trained on datasets synthesized with predefined blur kernels (e.g. bicubic), regardless of the domain gap with test images; 2) their model weights are fixed during testing, so test images with various degradations are super-resolved by the same set of weights. However, the degradations of real images are various and unknown (blind SR), and it is hard for a single model to perform well in all cases. To address these issues, we propose an online super-resolution (ONSR) method. It does not rely on predefined blur kernels and allows the model weights to be updated according to the degradation of the test image. Specifically, ONSR consists of two branches, namely the internal branch (IB) and the external branch (EB). IB learns the specific degradation of the given test LR image, and EB learns to super-resolve images degraded by the learned degradation. In this way, ONSR customizes a specific model for each test image, and thus is more tolerant of the various degradations in real applications. Extensive experiments on both synthesized and real-world images show that ONSR can generate more visually favorable SR results and achieves state-of-the-art performance in blind SR.


1 Introduction

Single image super-resolution (SISR) aims to reconstruct a plausible high-resolution (HR) image from its low-resolution (LR) counterpart. As a fundamental vision task, it has been widely applied in video enhancement, medical imaging and surveillance imaging. Mathematically, the HR image and LR image are related by a degradation model

$$\mathbf{y} = (\mathbf{x} \otimes \mathbf{k})\downarrow_s + \mathbf{n}, \tag{1}$$

where $\otimes$ represents two-dimensional convolution of the HR image $\mathbf{x}$ with blur kernel $\mathbf{k}$, $\downarrow_s$ denotes the $s$-fold downsampler, and $\mathbf{n}$ is usually assumed to be additive white Gaussian noise (AWGN) usr . The goal of SISR is to restore the corresponding HR image of the given LR image, which is a classical ill-posed inverse problem.
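To make the notation concrete, the degradation model of Eq. (1) can be written out in a few lines of code. The following PyTorch sketch is illustrative only: the flat test kernel and the noise level are arbitrary choices of ours, not values used in the paper.

```python
import torch
import torch.nn.functional as F

def degrade(x, kernel, scale, noise_sigma=0.0):
    """Eq. (1): y = (x ⊗ k) ↓_s + n, for a batched image tensor.

    x: HR tensor of shape (N, C, H, W); kernel: 2-D blur kernel (kh, kw)
    summing to 1; scale: integer downsampling factor s."""
    c = x.shape[1]
    w = kernel.expand(c, 1, *kernel.shape)               # one kernel per channel
    pad = (kernel.shape[-2] // 2, kernel.shape[-1] // 2)
    x = F.conv2d(x, w, padding=pad, groups=c)            # x ⊗ k (blur)
    y = x[..., ::scale, ::scale]                         # s-fold downsampler
    if noise_sigma > 0:
        y = y + noise_sigma * torch.randn_like(y)        # AWGN n
    return y

# Toy usage: a flat 5x5 kernel and 4x downsampling.
hr = torch.rand(1, 3, 128, 128)
k = torch.full((5, 5), 1.0 / 25)
lr = degrade(hr, k, scale=4, noise_sigma=0.01)           # -> (1, 3, 32, 32)
```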

Figure 1: The adaptation problem of the offline trained ESRGAN and RCAN. The corresponding blur kernel is at the bottom right of each image.

Recently, SR has been continuously advanced by various deep learning-based methods rcan ; casg . Although these methods have exhibited promising performance, they share a common limitation: they are too 'general' and not image-specific. First, they rely heavily on external information: they are exhaustively trained on LR-HR image pairs synthesized with predefined blur kernels, ignoring the real degradations of test images (i.e. non-blind SR). When the degradations of test images differ from the predefined ones, these methods may suffer a significant performance drop. Second, their model weights are fixed during testing. Since they are trained offline, test images with various degradations are super-resolved by the same set of weights. However, different test images are usually corrupted by different degradations; a model that performs well on certain degradations is likely to perform badly on others. Thus, training a single model for a wide range of degradations may lead to sub-optimal results. For example, as shown in Figure 1, ESRGAN esrgan and RCAN rcan are trained on bicubically synthesized LR-HR pairs. They perform excellently on bicubically downscaled images but incur adaptation problems when dealing with images degraded by different kernels. Therefore, these methods may only perform well in a very limited case: when the blur kernels of test images are similar to each other and all included in the predefined kernels. Unfortunately, this case is rare in real applications.

To address these issues, a straightforward idea is to customize a model for each test image. Some 'zero-shot' methods zssr ; kernel_gan have tried to get rid of datasets synthesized by predefined kernels. They exploit the recurrence of patches across scales of the LR image, and train models via the test image and its downscaled version. Although these methods may be suitable for regions where the recurrences are salient, the limited training samples, without any external HR information, largely restrict their performance. Instead, we propose an online super-resolution (ONSR) method, which not only involves the test LR image in model optimization as the 'zero-shot' methods do, but also leverages the benefits of external learning-based methods. Specifically, we design two branches, namely the internal branch (IB) and the external branch (EB). IB utilizes the inherent information of the test LR image and learns its specific degradation. With the aid of the learned degradation, EB utilizes external HR images to render general priors and trains a specific SR model. Without relying on predefined kernels, ONSR can still make full use of external HR images and customize a specific model for each test LR image.

In summary, our main contributions are as follows:

  • To handle the various and unknown blur kernels in blind SR, we propose an online super-resolution (ONSR) method. It customizes a specific model for each test LR image and thus achieves more robust performance in different cases.

  • We design two branches, namely internal branch (IB) and external branch (EB). They could work together to better incorporate the general priors from external images and specific degradation of the test image.

  • Extensive experiments on both synthesized and real-world images show that ONSR can generate more visually favorable SR results and achieve state-of-the-art performance on blind SR.

2 Related Works

2.1 Non-Blind Super-Resolution

Most learning-based SR approaches focus on non-blind SISR, in which case the blur kernel and noise level are known beforehand. These methods are optimized with external supervision, i.e. via LR-HR pairs synthesized by predefined blur kernels simusr . With the flourishing of deep learning, convolutional neural networks (CNNs) were successfully adopted for single image super-resolution srcnn . After the proposal of residual learning resnet , which simplifies the optimization of deep CNNs, SR networks tended to become even deeper, and their representation capability was significantly improved. Attention mechanisms rcan and feature aggregation hdrn have also been adopted to further boost performance. Besides, some non-blind methods srmd ; usr simultaneously use the predefined blur kernel and synthetic LR-HR data to advance SR performance. However, these methods only work well for certain degradations; the results may deteriorate dramatically when there exists a domain gap between training samples and the real test image. Instead, our method focuses on blind SR, in which case the degradation from HR to LR images is unavailable.

2.2 Blind Super-Resolution

Blind SR assumes that the degradations of test images are unknown and various, which is more applicable to real images. This problem is much more challenging, as it is difficult for a single model to generalize to different degradations. In wang2017ensemble and wang2020blind , the final results are ensembled from models that are capable of handling different cases, so the ensembled results can be more robust to different degradations. But since there are an infinite number of degradations, we cannot train a model for each of them. Other methods try to utilize the internal prior of the test image itself. In dpn , the model is finetuned via similar pairs searched from the test image. In nonpara ; kernel_gan and zssr , the blur kernel is first estimated by maximizing the similarity of recurring patches across scales of the LR image, and then used to synthesize LR-HR training samples. However, the internal patches of the test image are limited, which heavily restricts the performance of these methods. Differently, our ONSR still estimates the blur kernel from the test image, but it optimizes the SR network via external HR images. In this way, ONSR can simultaneously take the benefits of the internal and external priors in the LR and HR images.

Figure 2: (a) Offline training scheme. Training datasets are synthesized from external HR images. The SR model is trained offline and only performs inference online. (b) The online training scheme of ZSSR zssr . Only the test image is used as training data. The SR model is trained online.

2.3 Offline & Online Training in Super-Resolution

Most deep learning-based SR methods are optimized offline. These models are trained via a large number of synthesized paired LR-HR samples hdrn ; ikc , and the model weights are fixed during testing. Thus, their model weights are completely determined by external data, without considering the inherent information of the test image: LR images that may be degraded by various kernels are super-resolved by the same set of model weights, and the domain gap between training and testing data may impair performance. Contrary to offline training, online training gets the test LR image involved in model optimization. For example, ZSSR zssr is an online-trained SR method: it is optimized by the test LR image and its downscaled version. Therefore, it can customize the network weights for each test LR image and achieve more robust performance over different images. However, the training samples of most online-trained models are limited to only one test image, which heavily restricts their performance. Instead, our ONSR utilizes external HR images during the online training phase, and in this way it can better incorporate the general priors of the external data and the inherent information of the test LR image.

3 Method

3.1 Motivation

As we have discussed above, previous non-blind SR methods are usually trained offline (as shown in Figure 2(a)) simusr , which means LR images with various degradations are super-resolved with the same set of weights, regardless of the specific degradation of the test image. To address this problem, a straightforward idea is to adopt an online training algorithm, i.e. to adjust the model weights for each test LR image with its own degradation. A similar idea, namely 'zero-shot' learning, is used in ZSSR. As shown in Figure 2(b), ZSSR is trained with the test LR image and its downscaled version. However, this pipeline has two inborn drawbacks: 1) with a limited number of training samples, it only allows relatively simple network architectures in order to avoid overfitting, which adversely affects the representation capability of deep learning; 2) no HR images are involved, so it is difficult for the model to learn general priors of HR images, which are also essential for SR reconstruction Ulyanov2018DeepIP .

The drawbacks of ZSSR motivate us to think: a better online updating algorithm should be able to utilize both the test LR image and external HR images. The former provides inherent information about the specific degradation, and the latter enables the model to exploit better general priors. Therefore, a "general" SR model can be adjusted to process the test LR image according to its "specific" degradation, which we call: from "general" to "specific".

Figure 3: The online updating scheme of ONSR. Top: internal branch. Bottom: external branch. Images with solid borders are the inputs. Images with dotted borders are the outputs of $\mathcal{G}$ or $\mathcal{K}$.

3.2 Formulation

According to the framework of MAP (maximum a posteriori) estimation ren2020neural , blind super-resolution can be formulated as:

$$\min_{\mathbf{x},\mathbf{k}} \|\mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s\|^2 + \lambda \Phi(\mathbf{x}) + \gamma \Omega(\mathbf{k}), \tag{2}$$

where $\|\mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s\|^2$ is the fidelity term, $\Phi(\mathbf{x})$ and $\Omega(\mathbf{k})$ model the priors of the sharp image and the blur kernel, and $\lambda$ and $\gamma$ are trade-off regularization parameters. Although many delicate handcrafted priors, such as the sparsity of the dark channel darkchannel , $L_0$-regularized intensity LO , and the recurrence of internal patches recurrence , have been suggested for $\Phi(\mathbf{x})$ and $\Omega(\mathbf{k})$, these heuristic priors cannot cover the more concrete and essential characteristics of different LR images. To circumvent this issue, we design two modules, i.e. the reconstruction module $\mathcal{G}$ and the degradation estimation module $\mathcal{K}$, which can capture the priors of $\mathbf{x}$ and $\mathbf{k}$ in a learnable manner. We substitute $\mathbf{x}$ by $\mathcal{G}(\mathbf{y})$ and write the degradation process $(\cdot \otimes \mathbf{k})\downarrow_s$ as $\mathcal{K}(\cdot)$; the problem then becomes:

$$\min_{\mathcal{G},\mathcal{K}} \|\mathbf{y} - \mathcal{K}(\mathcal{G}(\mathbf{y}))\|^2. \tag{3}$$

The prior terms are removed because they can also be captured by the generative networks $\mathcal{G}$ and $\mathcal{K}$ Ulyanov2018DeepIP .

This problem involves the optimization of two neural networks, i.e. $\mathcal{G}$ and $\mathcal{K}$. Thus, we can adopt an alternating optimization strategy:

$$\mathcal{K}^{t+1} = \arg\min_{\mathcal{K}} \|\mathbf{y} - \mathcal{K}(\mathcal{G}^{t}(\mathbf{y}))\|^2, \qquad \mathcal{G}^{t+1} = \arg\min_{\mathcal{G}} \|\mathbf{y} - \mathcal{K}^{t+1}(\mathcal{G}(\mathbf{y}))\|^2. \tag{4}$$

In the first step, we fix $\mathcal{G}$ and optimize $\mathcal{K}$, while in the second step we fix $\mathcal{K}$ and optimize $\mathcal{G}$.

So far, only the given LR image $\mathbf{y}$ is involved in this optimization. However, as we have discussed in Sec 3.1, the limited training sample may not be enough for $\mathcal{G}$ to get sufficiently optimized, because there are usually too many learnable parameters in $\mathcal{G}$. Thus, we introduce external HR images $\mathbf{x}_{ext}$ in the optimization of $\mathcal{G}$: in the $\mathcal{G}$ step, we degrade $\mathbf{x}_{ext}$ by $\mathcal{K}^{t+1}$ to a fake LR image $\mathcal{K}^{t+1}(\mathbf{x}_{ext})$. Then $(\mathcal{K}^{t+1}(\mathbf{x}_{ext}), \mathbf{x}_{ext})$ forms a paired sample that can be used to optimize $\mathcal{G}$. The alternating optimization process thus becomes:

$$\mathcal{K}^{t+1} = \arg\min_{\mathcal{K}} \|\mathbf{y} - \mathcal{K}(\mathcal{G}^{t}(\mathbf{y}))\|^2, \qquad \mathcal{G}^{t+1} = \arg\min_{\mathcal{G}} \|\mathbf{x}_{ext} - \mathcal{G}(\mathcal{K}^{t+1}(\mathbf{x}_{ext}))\|^2, \tag{5}$$

in which $\mathcal{G}$ is optimized by the external dataset, while $\mathcal{K}$ is optimized by the given LR image only. At this point, we have derived the proposed method from the perspective of alternating optimization, which may help better understand ONSR.

Input: the LR image to be reconstructed $\mathbf{y}$; the external HR image dataset $\mathcal{X}$; the maximum updating step $T$; the online testing step interval $\Delta t$
Output: the best SR image $\mathbf{x}^{SR}$
1:  Load the pretrained model into $\mathcal{G}$
2:  $t \leftarrow 0$
3:  while $t < T$ do
4:      $t \leftarrow t + 1$
5:      Sample LR image patches $\mathbf{y}_p$ from $\mathbf{y}$
6:      Sample HR image patches $\mathbf{x}_p$ from $\mathcal{X}$
7:      // Online testing
8:      if $t \bmod \Delta t = 0$ then
9:          $\mathbf{x}^{SR} \leftarrow \mathcal{G}(\mathbf{y})$
10:     end if
11:     // Online updating different modules
12:     Compute $\mathcal{L}_{IB}$ and $\mathcal{L}_{adv}$ on $\mathbf{y}_p$ (Eqs. 6, 8) and update $\mathcal{K}$ with $\mathcal{L}_{\mathcal{K}}$ (Eq. 9)
13:     Compute $\mathcal{L}_{EB}$ on $\mathbf{x}_p$ (Eq. 7) and update $\mathcal{G}$ with $\mathcal{L}_{\mathcal{G}}$ (Eq. 10)
14:     Update the discriminator $\mathcal{D}$ on real patches $\mathbf{y}_p$ and fake patches $\mathcal{K}(\mathcal{G}(\mathbf{y}_p))$
15: end while
Algorithm 1: Algorithm of ONSR
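For concreteness, a minimal PyTorch-style rendering of Algorithm 1 might look as follows. The module names (netG for $\mathcal{G}$, netK for $\mathcal{K}$, netD for $\mathcal{D}$), the patch samplers, the learning rates, and the value of alpha are our own placeholders rather than the authors' released code; the three update steps follow Eqs. (6)-(10) of Sections 3.3 and 3.4.

```python
import torch

def onsr_online_update(netG, netK, netD, y_lr, hr_sampler, lr_sampler,
                       T=500, test_interval=10, alpha=0.1):
    """Sketch of the ONSR online updating loop (Algorithm 1)."""
    opt_g = torch.optim.Adam(netG.parameters(), lr=1e-4)  # illustrative lrs
    opt_k = torch.optim.Adam(netK.parameters(), lr=1e-4)
    opt_d = torch.optim.Adam(netD.parameters(), lr=1e-4)
    l1, bce = torch.nn.L1Loss(), torch.nn.BCEWithLogitsLoss()
    best_sr = None
    for t in range(1, T + 1):
        y_p = lr_sampler(y_lr)      # patches from the test LR image
        x_p = hr_sampler()          # patches from external HR images
        if t % test_interval == 0:  # online testing
            with torch.no_grad():
                best_sr = netG(y_lr)
        # Internal branch: update K with L_K = L_IB + alpha * L_adv (Eq. 9).
        fake_lr = netK(netG(y_p).detach())      # G is frozen in this step
        pred = netD(fake_lr)
        loss_k = l1(fake_lr, y_p) + alpha * bce(pred, torch.ones_like(pred))
        opt_k.zero_grad(); loss_k.backward(); opt_k.step()
        # External branch: update G with L_G = L_EB (Eq. 10).
        sr_ext = netG(netK(x_p).detach())       # K is frozen in this step
        loss_g = l1(sr_ext, x_p)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        # Discriminator: real LR patches vs. fake LR patches
        # (non-saturating logistic loss, a standard stand-in for Eq. 8).
        d_real, d_fake = netD(y_p), netD(fake_lr.detach())
        loss_d = bce(d_real, torch.ones_like(d_real)) + \
                 bce(d_fake, torch.zeros_like(d_fake))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    return best_sr
```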

3.3 Online Super-Resolution

As illustrated in Figure 3, our online SR (ONSR) method consists of two branches, i.e. the internal branch (IB) and the external branch (EB). Both branches share two modules, i.e. the reconstruction module $\mathcal{G}$ and the degradation estimation module $\mathcal{K}$. $\mathcal{G}$ aims to map the given LR image from the LR domain to the HR domain, i.e. to reconstruct an SR image, while $\mathcal{K}$ aims to estimate the specific degradation of the test LR image.

In IB, only the given LR image is involved. As shown in Figure 3, the inputs of IB are patches $\mathbf{y}_p$ randomly selected from the test LR image. An input LR patch is first super-resolved by $\mathcal{G}$ to an SR patch, and this SR patch is then degraded by $\mathcal{K}$ to a fake LR patch. To guarantee that the fake LR patch can be translated back to the original LR domain, it is supervised by the original LR patch via an L1 loss. The paired SR and LR patches help $\mathcal{K}$ learn the specific degradation of the test image. The optimization details will be further explained in Section 3.4.

In EB, only external HR images are involved. The inputs of EB are patches $\mathbf{x}_p$ randomly selected from different external HR images. Conversely to IB, an external HR patch is first degraded by $\mathcal{K}$ to a fake LR patch. As the weights of $\mathcal{K}$ are shared between IB and EB, the external patches are actually degraded by the learned degradation. Thus, the paired HR and fake LR patches help $\mathcal{G}$ learn to super-resolve LR images with the specific degradation.

According to the above analysis, the loss functions of IB and EB can be formulated as:

$$\mathcal{L}_{IB} = \|\mathcal{K}(\mathcal{G}(\mathbf{y}_p)) - \mathbf{y}_p\|_1, \tag{6}$$
$$\mathcal{L}_{EB} = \|\mathcal{G}(\mathcal{K}(\mathbf{x}_p)) - \mathbf{x}_p\|_1. \tag{7}$$

Since the information in the single test LR image is limited, we further adopt an adversarial learning strategy to help $\mathcal{K}$ better learn the specific degradation. As shown in Figure 3, we introduce a discriminator $\mathcal{D}$, which discriminates the distribution characteristics of the LR image. It forces $\mathcal{K}$ to generate fake LR patches that are more similar to the real ones, so that more accurate degradations can be learned by $\mathcal{K}$. We use the original GAN formulation as follows:

$$\mathcal{L}_{adv} = \mathbb{E}\left[\log \mathcal{D}(\mathbf{y}_p)\right] + \mathbb{E}\left[\log\left(1 - \mathcal{D}(\mathcal{K}(\mathcal{G}(\mathbf{y}_p)))\right)\right]. \tag{8}$$

Adversarial training is not used for the intermediate SR output of $\mathcal{G}$, because it may lead $\mathcal{G}$ to generate unrealistic textures esrgan . We also examine this choice experimentally in Section 4.4.3.

3.4 Separate Optimization

Generally, most SR networks are optimized by the weighted sum of all objectives, so all modules in an SR network are treated indiscriminately. Unlike this commonly used joint optimization method, we propose a separate optimization strategy. Specifically, $\mathcal{K}$ is optimized by the objectives that are directly related to the test LR image, while $\mathcal{G}$ is optimized by the objectives that are related to external HR images. The losses for these two modules are as follows:

$$\mathcal{L}_{\mathcal{K}} = \mathcal{L}_{IB} + \alpha \mathcal{L}_{adv}, \tag{9}$$
$$\mathcal{L}_{\mathcal{G}} = \mathcal{L}_{EB}, \tag{10}$$

where $\alpha$ controls the relative importance of the two losses. We will investigate the influence of $\alpha$ in Section 4.4.5.

We adopt this separate optimization strategy for two reasons. First, as analyzed in Section 3.2, $\mathcal{G}$ and $\mathcal{K}$ are alternately optimized in ONSR, and separate optimization may make these modules easier to converge usr . Second, $\mathcal{K}$ aims to learn the specific degradation of the test image, while $\mathcal{G}$ needs to learn general priors from external HR images, so it is more targeted for them to be optimized separately. We experimentally demonstrate the superiority of separate optimization in Sec 4.4.4. The overall algorithm is shown in Algorithm 1, and the two strategies are contrasted in the sketch below.
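A schematic contrast of the two strategies; the parameter grouping and helper names are ours, and the adversarial terms are omitted for brevity:

```python
import torch

l1 = torch.nn.L1Loss()

def joint_step(netG, netK, y_p, x_p, opt_all):
    """Joint optimization: one optimizer steps all modules on the summed loss.
    opt_all would be e.g. Adam over itertools.chain of both parameter sets."""
    loss = l1(netK(netG(y_p)), y_p) + l1(netG(netK(x_p)), x_p)
    opt_all.zero_grad(); loss.backward(); opt_all.step()

def separate_step(netG, netK, y_p, x_p, opt_k, opt_g):
    """Separate optimization (ONSR): K sees only the internal-branch
    objective (Eq. 9), G sees only the external-branch objective (Eq. 10)."""
    loss_k = l1(netK(netG(y_p).detach()), y_p)   # updates K only
    opt_k.zero_grad(); loss_k.backward(); opt_k.step()
    loss_g = l1(netG(netK(x_p).detach()), x_p)   # updates G only
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```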

Figure 4: The architecture and initialization of $\mathcal{K}$.

3.5 Network Instantiation

Most existing SR structures can be used as $\mathcal{G}$ and integrated into ONSR. In this paper, we mainly use the Residual-in-Residual Dense Block (RRDB) proposed in ESRGAN esrgan . RRDB combines a multi-level residual network with dense connections, is easy to train, and has promising performance on SR. $\mathcal{G}$ consists of 23 RRDBs and an upsampling module, and is initialized with the pretrained network parameters. The pretrained model can render additional priors of external data, and also provides a comparatively reasonable initial point to accelerate optimization.

As illustrated in Figure 4, $\mathcal{K}$ constitutes the degradation model. Since blurring and downsampling in Eq. 1 are linear transforms, we design $\mathcal{K}$ as a deep linear network. Theoretically, a single convolutional layer should be able to represent all possible blur-and-downsample degradations in Eq. 1. However, according to Arora2018OnTO , linear networks have infinitely many equivalent global minima, and gradient-based optimization is faster for deeper linear networks than for shallower ones. Thus, we employ three convolutional layers with no activations and a bicubic downsampling layer in $\mathcal{K}$. Similarly, to obtain a reasonable initial point, kernel_gan is supervised by bicubically downsampled data at the beginning; our bicubic downsampling layer serves the same purpose but is simpler and more elegant. Besides, to accelerate the convergence of $\mathcal{K}$, we initialize all convolutional layers with isotropic Gaussian kernels with a standard deviation of 1, as shown in Figure 4. Considering that images with a larger downsampling factor are usually more seriously degraded, we set the kernel sizes of the three convolutional layers differently for scale factors 2 and 4.
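A sketch of such a degradation module is given below; the depthwise convolutions and the concrete kernel sizes are our own assumptions, since the exact per-scale sizes are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_init_(conv, sigma=1.0):
    """Fill a depthwise conv layer with an isotropic Gaussian kernel."""
    k = conv.kernel_size[0]
    ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2.0
    g = torch.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    with torch.no_grad():
        conv.weight.copy_((g / g.sum()).expand_as(conv.weight))

class DegradationModule(nn.Module):
    """Sketch of K: three linear (activation-free) depthwise conv layers
    followed by a bicubic downsampling layer. The kernel sizes (7/5/3)
    are illustrative placeholders."""
    def __init__(self, scale=2, channels=3, sizes=(7, 5, 3)):
        super().__init__()
        self.scale = scale
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2,
                      groups=channels, bias=False)   # linear blur per channel
            for k in sizes)
        for conv in self.convs:
            gaussian_init_(conv)                     # sigma = 1 initialization

    def forward(self, x):
        for conv in self.convs:
            x = conv(x)                              # cascaded linear blur
        return F.interpolate(x, scale_factor=1.0 / self.scale,
                             mode="bicubic", align_corners=False)
```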

$\mathcal{D}$ is a VGG-style network simonyan2014very that performs discrimination on LR patches.

Figure 5: Visual comparison of ONSR and SotA SR methods for ×2 SR. The model name is denoted above the corresponding patch and PSNR/SSIM is denoted below.
Figure 6: Visual comparison of ONSR and SotA SR methods for ×4 SR. The model name is denoted above the corresponding patch and PSNR/SSIM is denoted below.

4 Experiments

4.1 Experimental Setup

Datasets. We use the 800 HR images from the training set of DIV2K div2k as the external HR dataset and evaluate SR performance on DIV2KRK kernel_gan . LR images in DIV2KRK are generated by blurring and subsampling each image from the validation set (100 images) of DIV2K with randomly generated kernels. These kernels are isotropic or anisotropic Gaussian kernels with random lengths independently distributed for each axis, rotated by a random angle. To deviate from a regular Gaussian kernel, uniform multiplicative noise (up to a fixed fraction of each pixel value of the kernel) is further applied.
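For intuition, a DIV2KRK-style random kernel can be sampled roughly as follows; the kernel size, length range, and noise amplitude in this sketch are illustrative placeholders rather than the exact values used by kernel_gan .

```python
import numpy as np

def random_anisotropic_gaussian(size=13, len_range=(0.6, 5.0), noise=0.25):
    """Sample an anisotropic Gaussian kernel with independent per-axis
    lengths, a random rotation, and multiplicative noise (all illustrative)."""
    lam = np.random.uniform(*len_range, size=2)   # per-axis lengths
    theta = np.random.uniform(0, np.pi)           # random rotation angle
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    cov = R @ np.diag(lam) @ R.T                  # rotated covariance matrix
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    pts = np.stack([xx, yy], axis=-1)             # (size, size, 2) grid
    quad = np.einsum("...i,ij,...j->...", pts, np.linalg.inv(cov), pts)
    k = np.exp(-0.5 * quad)
    k *= np.random.uniform(1 - noise, 1 + noise, k.shape)  # deviate from Gaussian
    return k / k.sum()
```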

Evaluation Metrics. To quantitatively compare different methods, we use PSNR, SSIM wang2004image , Perceptual Index (PI) blau2018the and Learned Perceptual Image Patch Similarity (LPIPS) zhang2018the . Lower PI and LPIPS indicate higher perceptual quality.

Training Details. For each input minibatch, we randomly sample 10 patches from the LR image and 10 patches from different HR images, where the HR patch size is the LR patch size multiplied by the scaling factor. The ADAM optimizer is used for optimization. The discriminator $\mathcal{D}$ uses its own learning rate, and the learning rates of $\mathcal{G}$ and $\mathcal{K}$ are set per scale factor. We set the online updating step to 500 for each image, and the LR image is tested every 10 steps. To accelerate the optimization, we initialize ONSR with the bicubically pretrained model of RRDB, which is publicly available.
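The patch sampling itself is straightforward; a minimal sketch follows, with the patch size left as a parameter since the exact value is not reproduced above.

```python
import torch

def sample_patches(img, patch_size, n=10):
    """Randomly crop n patches from a (1, C, H, W) image tensor."""
    _, _, h, w = img.shape
    patches = []
    for _ in range(n):
        top = torch.randint(0, h - patch_size + 1, (1,)).item()
        left = torch.randint(0, w - patch_size + 1, (1,)).item()
        patches.append(img[..., top:top + patch_size, left:left + patch_size])
    return torch.cat(patches, dim=0)   # (n, C, patch_size, patch_size)
```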

Type1: Non-Blind SR
  Bicubic                                  ×2: 28.81 / 0.8090 / 6.7039 / 0.3609    ×4: 25.46 / 0.6837 / 8.6414 / 0.5572
  ZSSR zssr                                ×2: 29.09 / 0.8215 / 6.2707 / 0.3252    ×4: 25.61 / 0.6920 / 8.1941 / 0.5192
  ESRGAN esrgan                            ×2: 29.18 / 0.8212 / 6.1826 / 0.3178    ×4: 25.57 / 0.6906 / 8.3554 / 0.5266
  RRDB esrgan                              ×2: 29.19 / 0.8224 / 6.4801 / 0.3376    ×4: 25.66 / 0.6937 / 8.5510 / 0.5416
  RCAN rcan                                ×2: 27.94 / 0.7885 / 6.8855 / 0.3417    ×4: 24.75 / 0.6337 / 8.4560 / 0.5830
Type2: Blind SR
  Cornillere et al. cornillere2019blind    ×2: 29.42 / 0.8459 / 4.8343 / 0.1957    ×4: - / - / - / -
  dSRVAE dsrvae                            ×2: - / - / - / -                       ×4: 25.07 / 0.6553 / 5.7329 / 0.4664
  Ji et al. Ji2020RealWorldSV              ×2: - / - / - / -                       ×4: 25.41 / 0.6890 / 8.2348 / 0.5219
  KernelGAN+ZSSR kernel_gan                ×2: 29.93 / 0.8548 / 5.2483 / 0.2430    ×4: 26.76 / 0.7302 / 7.2357 / 0.4449
  ONSR (Ours)                              ×2: 31.34 / 0.8866 / 4.7952 / 0.2207    ×4: 27.66 / 0.7620 / 7.2298 / 0.4071
Table 1: Quantitative comparison of ONSR and SotA SR methods on DIV2KRK; each cell lists PSNR / SSIM / PI / LPIPS. Red: best. Blue: second best

4.2 Super-Resolution on Synthetic Data

We compare ONSR with other state-of-the-art (SotA) methods on the synthetic dataset DIV2KRK. We present two types of algorithms for analysis: 1) Type1 includes ESRGAN esrgan , RRDB esrgan , RCAN rcan and ZSSR zssr , which are non-blind SotA SR methods trained on bicubically downsampled images. 2) Type2 comprises blind SR methods, including KernelGAN+ZSSR kernel_gan , dSRVAE dsrvae , Ji et al. Ji2020RealWorldSV and Cornillere et al. cornillere2019blind .

Quantitative Results. In Table 1, the SotA non-blind SR methods, which have remarkable performance under the bicubic downsampling setting, suffer a severe performance drop on DIV2KRK due to the domain gap; RCAN is even worse than naive bicubic interpolation. ESRGAN and RRDB share the same architecture as $\mathcal{G}$, but ONSR outperforms them by a large margin of about 2.1dB and 2dB for scales 2 and 4 respectively. This improvement may be attributed to online updating. Although the Type2, i.e. blind, SR methods achieve significantly better quantitative results than the non-blind methods, they still cannot generalize well to different degradations: KernelGAN+ZSSR improves over previous methods, but its performance is still inferior to ONSR by a large margin.

Qualitative Results. In Figures 5 and 6, we present visual comparisons of these methods for scales 2 and 4 respectively. The SotA non-blind SR methods tend to produce blurry edges and undesirable artifacts, such as the window contours in image 085. Similarly, the blind SR methods also tend to generate over-smoothed patterns, while the results of our method are clearer and more visually natural.

Figure 7: Results on the real image "Chip".
Figure 8: Visual comparison of model adaptation to real-world video frames (from YouTube).

4.3 Super-Resolution on Real-World Data

Besides the above experiments on synthetic test images, we also conduct experiments on real images, which are more challenging due to the complicated and unknown degradations in real-world scenarios. Since there are no ground-truth HR images, we only provide visual comparisons. As shown in Figure 7, the letter "X" restored by RRDB, ESRGAN and ZSSR is blurry or has unpleasant artifacts. For RCAN, there is even a color difference from the original frame. The result of IKC is better, but the super-resolved image of our ONSR has sharper edges and higher contrast, and looks more visually natural. We also apply these methods to raw YouTube video frames. As shown in Figure 8, the SR frames generated by most methods are seriously blurred or contain numerous mosaic artifacts, while ONSR produces visually promising images with clearer edges and fewer artifacts.

Method Scale PSNR SSIM PI LPIPS
IKC ikc 2 31.20 0.8767 5.1511 0.2350
RRDB-G esrgan 31.18 0.8763 4.8995 0.2213
ONSR-G (Ours) 31.53 0.8889 4.5586 0.1949
IKC ikc 4 27.69 0.7657 6.9027 0.3863
RRDB-G esrgan 27.73 0.7660 6.8767 0.3834
ONSR-G (Ours) 28.05 0.7775 6.7716 0.3781
Table 2: Quantitative comparison of ONSR-G and IKC on DIV2KRK. Red: best. Blue: second best
Figure 9: PSNR and visual results of $\mathcal{G}$ initialized by (a) no pretrained model, (b) the RRDB pretrained model, (c) the RRDB-G pretrained model, at different updating steps.
Figure 10: Visual comparison of ONSR-G and other methods on image 026 and image 099.

4.4 Ablation Study

4.4.1 Study on the initialization of $\mathcal{G}$

In this section, we experimentally investigate the influence of the initialization of $\mathcal{G}$. We initialize $\mathcal{G}$ in three different ways: 1) with no pretrained model, 2) with the bicubically pretrained model (i.e. RRDB), 3) with a pretrained model (denoted RRDB-G) obtained as in ikc . In ikc , the SR module of IKC is pretrained with image pairs synthesized with isotropic Gaussian blur kernels of different widths. In the same manner, we pretrain an RRDB-G model to initialize the SR module of our method (the resulting model is denoted ONSR-G). From Figure 9, we can see that: 1) the SR results of $\mathcal{G}$ initialized by a pretrained model are more visually reasonable, indicating that the pretrained model provides a better initial point and guides $\mathcal{G}$ toward better performance; 2) a more powerful pretrained SR module better initializes $\mathcal{G}$ and accelerates convergence, thus achieving better performance.

As shown in Table 1 and Table 2, RRDB-G performs better than the bicubically pretrained RRDB and achieves performance comparable to the strong blind SR baseline IKC. Based on the pretrained RRDB-G, ONSR-G also outperforms IKC in both PSNR and SSIM for scale factors 2 and 4 (by 0.33dB and 0.36dB PSNR respectively, per Table 2). Moreover, our ONSR-G further improves the performance of RRDB-G with the online updating scheme. Thus, it is necessary to involve the test LR image in the model optimization: the online updating scheme can effectively exploit the inherent information of the test LR image and combine it with external priors to adjust the "general" SR model to better deal with "specific" degradations. We also provide visual comparisons in Figure 10.

Method PSNR SSIM PI LPIPS
RDN rdn 25.66 0.6935 8.5341 0.5411
ON-RDN 27.30 0.7498 7.4274 0.4377
RCAN rcan 24.75 0.6337 8.4560 0.5830
ON-RCAN 27.58 0.7612 7.1290 0.4020
Table 3: The performance of different $\mathcal{G}$ architectures on DIV2KRK with scale factor 4.

4.4.2 Study on different $\mathcal{G}$

In this subsection, we experimentally show that online updating works well for different $\mathcal{G}$ architectures. We replace the architecture of $\mathcal{G}$ with different existing SR models, using the two SotA supervised SR models RDN rdn and RCAN rcan as $\mathcal{G}$ respectively.

As shown in Table 3, with only the bicubically pretrained models, neither RDN nor RCAN can adapt to LR images with different degradations. However, our online updating scheme can further adjust these models (denoted ON-RDN and ON-RCAN) to the specific degradations in test images, so their performance is greatly improved. Moreover, these experiments also suggest that the effectiveness of online updating is robust to different architectures of $\mathcal{G}$.

Figure 11: Study on different modules.

4.4.3 Study on different modules

To explain the roles the different modules (i.e. IB, EB and $\mathcal{D}$) play in ONSR, we design four other methods, termed IBSR, EBSR, IB-EBSR and IB-EB-GSR (as shown in Figure 11), and compare their performance on DIV2KRK.

IBSR. IBSR only has an internal branch to exploit the internal properties of the test LR image for degradation estimation and SR reconstruction, which is optimized online.

EBSR. Contrary to IBSR, EBSR only has an external branch to capture general priors of external HR images, and is optimized offline. After offline training, the fixed $\mathcal{G}$ module is used to test LR images.

IB-EBSR. IB-EBSR has both internal branch and external branch but no GAN modules.

IB-EB-GSR. IB-EB-GSR has both the LR discriminator $\mathcal{D}$ and an additional HR discriminator $\mathcal{D}_{HR}$ on the SR output, to explore the underlying distribution characteristics of the test LR and external HR images.

The quantitative comparisons on DIV2KRK are shown in Table 4. As one can see, IB-EBSR outperforms both IBSR and EBSR by a large margin, indicating that both IB and EB are important for SR performance. The performance of IB-EBSR can be further improved if $\mathcal{D}$ is introduced (i.e. ONSR), which suggests that adversarial training helps $\mathcal{K}$ to be better optimized. However, when both discriminators are added, as in IB-EB-GSR, the performance is inferior to ONSR. In IB-EB-GSR, the initial SR results of $\mathcal{G}$ are likely to have unpleasant artifacts or distortions; besides, the external HR images cannot provide direct pixelwise supervision to the SR output of the test image. Therefore, applying the HR discriminator may hinder the optimization of IB-EB-GSR.

Method Scale PSNR SSIM Scale PSNR SSIM
IBSR 2 28.05 0.8277 4 25.51 0.6976
EBSR 30.82 0.8806 26.56 0.7249
IB-EBSR 31.10 0.8850 27.60 0.7609
IB-EB-GSR 31.29 0.8859 27.34 0.7507
ONSR 31.34 0.8866 27.66 0.7620
Table 4: The effect of different modules on DIV2KRK.
Figure 12: The average PSNR (left) and SSIM (right) of Joint Optimization and Separate Optimization at different training steps.

4.4.4 Study on separate optimization

In this section, we experimentally compare separate optimization with joint optimization. In separate optimization, $\mathcal{G}$ and $\mathcal{K}$ are alternately optimized via the test LR image and external HR images respectively, while in joint optimization both modules are optimized together. As shown in Table 5, separate optimization surpasses joint optimization in all metrics for scale factors 2 and 4.

We also compare the convergence of the two optimization strategies by plotting their PSNR and SSIM at regular step intervals. As shown in Figure 12, the results of separate optimization are consistently higher and grow faster than those of joint optimization. This indicates that separate optimization not only helps the network converge faster, but also helps it converge to a better point. This property allows us to trade off SR effectiveness against efficiency by setting different numbers of training iterations.

Method Scale PSNR SSIM PI LPIPS
Joint Optimization 2 31.03 0.8827 4.8759 0.2212
Separate Optimization 31.34 0.8860 4.7952 0.2207
Joint Optimization 4 26.97 0.7399 7.5985 0.4445
Separate Optimization 27.66 0.7620 7.2298 0.4071
Table 5: The impact of the proposed separate optimization scheme on DIV2KRK with scale factors 2 and 4.
Figure 13: Visualization of SR images with different settings of $\alpha$.

4.4.5 Study on $\alpha$

As mentioned above, the weight $\alpha$ for the GAN loss needs to be tuned so that the degradation of the test LR image can be better estimated and the SR image better restored. From Table 6, a small nonzero $\alpha$ (second column) is the best choice for optimizing the network. Also, as shown in Figure 13, as $\alpha$ increases toward 1, or when $\alpha = 0$, i.e. with no adversarial training, the SR results become either more blurred or contain more artifacts.

α         0                                          1
PSNR ×2   31.10   31.34   31.30   31.28   31.26   31.25
PSNR ×4   27.60   27.66   27.42   26.72   26.12   26.07
Table 6: Average PSNR on DIV2KRK for different settings of α; the leftmost column is α = 0 (no adversarial training) and the rightmost is α = 1.
Scale 2:
  EDSR edsr      25.54  27.82  20.59  21.34  27.66  27.28  26.90  26.07  27.14  26.96  19.72  19.86
  RCAN rcan      29.48  26.76  25.31  24.37  24.38  24.10  24.25  23.63  20.31  20.45  20.57  22.04
  ZSSR zssr      29.44  29.48  28.57  27.42  27.15  26.81  27.09  26.25  14.22  14.22  16.02  19.39
  IRCNN ircnn    29.60  30.16  29.50  28.37  28.07  27.95  28.21  27.19  28.58  26.79  29.02  28.96
  USRNet usr     30.55  30.96  30.56  29.49  29.13  29.12  29.28  28.28  30.90  30.65  30.60  30.75
  ONSR           31.66  31.98  31.40  30.17  29.76  29.63  29.86  28.87  30.93  30.78  30.80  31.12
Scale 4:
  EDSR edsr      21.45  22.73  21.60  20.62  23.16  23.66  23.16  23.00  24.00  23.78  19.79  19.67
  RCAN rcan      22.68  25.31  25.59  24.63  24.37  24.23  24.43  23.74  20.06  20.05  20.33  21.47
  ZSSR zssr      23.50  24.33  24.56  24.65  24.52  24.20  24.56  24.55  16.94  16.43  18.01  20.68
  IRCNN ircnn    23.99  25.01  25.32  25.45  25.36  25.26  25.34  25.47  24.69  24.39  24.44  24.57
  USRNet usr     25.30  25.96  26.18  26.29  26.20  26.15  26.17  26.30  25.91  25.57  25.76  25.70
  ONSR           26.51  27.24  27.50  27.57  27.43  27.30  27.36  27.51  26.17  26.17  26.21  26.30
Table 7: Average PSNR results in the non-blind setting for ×2 and ×4 SR; the 12 columns correspond to the 12 blur kernels described in Section 4.5. Red: best. Blue: second best

4.5 Non-Blind Setting

To investigate the upper bound of ONSR, we also compare it with other methods (Table 7) in the non-blind setting, i.e. when the blur kernel is known and participates in the network optimization. For ONSR, we substitute $\mathcal{K}$ by the ground-truth degradation.
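In terms of the sketches above, this amounts to freezing the degradation module to the known operator; the helper names below are ours, not the paper's.

```python
# Non-blind ONSR: replace the learned degradation module with the known
# ground-truth kernel, reusing the degrade() sketch from the introduction.
netK = lambda x: degrade(x, gt_kernel, scale=4)
# The K-update step of Algorithm 1 is then skipped, and only netG is
# updated online against pairs synthesized with the true degradation.
```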

Datasets. Following usr , performance is evaluated on BSD68 Martin2001ADO . 12 representative and diverse blur kernels are used to synthesize the corresponding test LR images, including 4 isotropic Gaussian kernels with different widths, 4 anisotropic Gaussian kernels from srmd , and 4 motion blur kernels from Boracchi2012ModelingTP ; Levin2009UnderstandingAE .

Quantitative Results. As reported in Table 7, ONSR outperforms all other methods on the 12 blur kernels by a large margin, which indicates its robustness. Besides, given the GT blur kernels, our online updating scheme efficiently adjusts the model to different degradations without training on large-scale paired samples.

4.6 Speed Comparison

Method Speed(s) PSNR(dB)
KernelGAN+ZSSR kernel_gan 1127.84 26.76
ONSR (Ours) 314.46 27.12
Table 8: Average running time comparison on DIV2KRK.

4.6.1 Speed on image-specific problem

In DIV2KRK, the degradation of each image is different and unknown, which we call the image-specific problem. Online blind SR methods are more suitable for this case. Thus, we compare the runtime of ONSR with a typical SotA online SR method, KernelGAN+ZSSR kernel_gan , using its official code, on DIV2KRK with scaling factor 4. For ONSR, we set the number of training steps to 100 for each image, and the LR image is tested every 10 steps. The average running time of the networks is evaluated on the same machine with an NVIDIA 2080Ti GPU. As shown in Table 8, the PSNR of ONSR is higher than that of KernelGAN+ZSSR, while ONSR is nearly 4 times faster.

4.6.2 Speed on degradation-specific problem

Set5 Set14 BSD100 Urban100
IKC 31.67 / 3.423 28.31 / 4.984 27.37 / 3.147 25.33 / 18.276
ONSR-G 31.75 / 0.471 28.34 / 0.847 27.48 / 0.467 25.97 / 2.489
Table 9: PSNR (dB) / Speed (s) on the Gaussian8-degraded benchmark datasets from ikc .

We refer to the case where multiple images share the same degradation as the degradation-specific problem. ikc proposed a test kernel set for this problem, namely Gaussian8, which consists of eight selected isotropic Gaussian blur kernels with kernel widths in the range [1.80, 3.20]. We synthesize test LR images by degrading HR images from the common benchmark datasets (i.e. Set5 set5 , Set14 set14 , BSD100 bsd100 , Urban100 urban100 ) with Gaussian8, so each dataset contains eight degradations.

In this case, we randomly select a fraction of the LR images to online update the model for each degradation. The optimal model weights are then fixed to process the remaining images with the corresponding degradation. As shown in Table 9, ONSR can be significantly accelerated in this way: it outperforms IKC on all datasets while being nearly 7 times faster.

5 Conclusion and Future Work

In this paper, we argue that most current SR methods are not image-specific. To address this limitation, we propose an online super-resolution (ONSR) method, which customizes a specific model for each test image. In detail, we design two branches, namely the internal branch (IB) and the external branch (EB). IB learns the specific degradation of the test image, and EB learns to super-resolve images degraded by the learned degradation. IB involves only the test LR image, while EB uses external HR images. In this way, ONSR leverages both the inherent information of the test LR image and the general priors of external HR images. Extensive experiments on both synthetic and real-world images prove the superiority of ONSR in the blind SR problem. These results indicate that customizing a model for each test image is more practical in real applications than training a general model for all LR images. Moreover, the speed of ONSR may be further improved by designing more lightweight modules for faster inference or by elaborating the training strategy to accelerate convergence. Faster speed would make ONSR more practical for processing large numbers of test images, such as low-resolution videos, which is the focus of our future work.

References

  • (1) K. Zhang, L. V. Gool, R. Timofte, Deep unfolding network for image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3217–3226.

  • (2) Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu, Image super-resolution using very deep residual channel attention networks, in: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 286–301.
  • (3) Y. Yang, Y. Qi, Image super-resolution via channel attention and spatial graph convolutional network, Pattern Recognition 112 (2021) 107798.
  • (4) X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, C. Change Loy, ESRGAN: Enhanced super-resolution generative adversarial networks, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.

  • (5) A. Shocher, N. Cohen, M. Irani, “zero-shot” super-resolution using deep internal learning, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3118–3126.
  • (6) S. Bell-Kligler, A. Shocher, M. Irani, Blind super-resolution kernel estimation using an internal-gan, in: NeurIPS, 2019.
  • (7) N. Ahn, J. Yoo, K.-A. Sohn, Simusr: A simple but strong baseline for unsupervised image super-resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 474–475.
  • (8) C. Dong, C. C. Loy, K. He, X. Tang, Learning a deep convolutional network for image super-resolution, in: European conference on computer vision, Springer, 2014, pp. 184–199.
  • (9) K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
  • (10) K. Jiang, Z. Wang, P. Yi, J. Jiang, Hierarchical dense recursive network for image super-resolution, Pattern Recognition 107 (2020) 107475.
  • (11) K. Zhang, W. Zuo, L. Zhang, Learning a single convolutional super-resolution network for multiple degradations, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 3262–3271.
  • (12) L. Wang, Z. Huang, Y. Gong, C. Pan, Ensemble based deep networks for image super-resolution, Pattern recognition 68 (2017) 191–198.
  • (13) Y. Wang, L. Wang, H. Wang, P. Li, H. Lu, Blind single image super-resolution with a mixture of deep networks, Pattern Recognition 102 (2020) 107169.
  • (14) Y. Liang, R. Timofte, J. Wang, S. Zhou, Y. Gong, N. Zheng, Single-image super-resolution-when model adaptation matters, Pattern Recognition (2021) 107931.
  • (15) T. Michaeli, M. Irani, Nonparametric blind super-resolution, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 945–952.
  • (16) J. Gu, H. Lu, W. Zuo, C. Dong, Blind super-resolution with iterative kernel correction, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 1604–1613.
  • (17) D. Ulyanov, A. Vedaldi, V. Lempitsky, Deep image prior, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9446–9454.
  • (18) D. Ren, K. Zhang, Q. Wang, Q. Hu, W. Zuo, Neural blind deconvolution using deep priors, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 3341–3350.
  • (19) J. Pan, D. Sun, H. Pfister, M.-H. Yang, Blind image deblurring using dark channel prior, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1628–1636.
  • (20) J. Pan, Z. Hu, Z. Su, M.-H. Yang, Deblurring text images via l0-regularized intensity and gradient prior, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2901–2908.
  • (21) T. Michaeli, M. Irani, Blind deblurring using internal patch recurrence, in: European conference on computer vision, Springer, 2014, pp. 783–798.
  • (22) S. Arora, N. Cohen, E. Hazan, On the optimization of deep networks: Implicit acceleration by overparameterization, in: International Conference on Machine Learning, PMLR, 2018, pp. 244–253.

  • (23) K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, CoRR abs/1409.1556.
  • (24) E. Agustsson, R. Timofte, Ntire 2017 challenge on single image super-resolution: Dataset and study, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 126–135.
  • (25) Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE transactions on image processing 13 (4) (2004) 600–612.
  • (26) Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, L. Zelnik-Manor, The 2018 PIRM challenge on perceptual image super-resolution, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2018.
  • (27) R. Zhang, P. Isola, A. A. Efros, E. Shechtman, O. Wang, The unreasonable effectiveness of deep features as a perceptual metric, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 586–595.

  • (28) V. Cornillere, A. Djelouah, W. Yifan, O. Sorkine-Hornung, C. Schroers, Blind image super-resolution with spatially variant degradations, ACM Transactions on Graphics (TOG) 38 (6) (2019) 1–13.
  • (29) Z.-S. Liu, W.-C. Siu, L.-W. Wang, C.-T. Li, M.-P. Cani, Unsupervised real image super-resolution via generative variational autoencoder, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 442–443.

  • (30) X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, F. Huang, Real-world super-resolution via kernel estimation and noise injection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 466–467.
  • (31) Y. Zhang, Y. Tian, Y. Kong, B. Zhong, Y. Fu, Residual dense network for image super-resolution, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2472–2481.
  • (32) B. Lim, S. Son, H. Kim, S. Nah, K. Mu Lee, Enhanced deep residual networks for single image super-resolution, in: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 136–144.
  • (33) K. Zhang, W. Zuo, S. Gu, L. Zhang, Learning deep cnn denoiser prior for image restoration, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 3929–3938.
  • (34) D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, IEEE, 2001, pp. 416–423.
  • (35) G. Boracchi, A. Foi, Modeling the performance of image restoration from motion blur, IEEE Transactions on Image Processing 21 (8) (2012) 3502–3517.
  • (36) A. Levin, Y. Weiss, F. Durand, W. T. Freeman, Understanding and evaluating blind deconvolution algorithms, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009, pp. 1964–1971.
  • (37) M. Bevilacqua, A. Roumy, C. Guillemot, M. L. Alberi-Morel, Low-complexity single-image super-resolution based on nonnegative neighbor embedding, in: British Machine Vision Conference (BMVC), 2012.
  • (38) R. Zeyde, M. Elad, M. Protter, On single image scale-up using sparse-representations, in: International conference on curves and surfaces, Springer, 2010, pp. 711–730.
  • (39) D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics, in: Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, Vol. 2, IEEE, 2001, pp. 416–423.
  • (40) J.-B. Huang, A. Singh, N. Ahuja, Single image super-resolution from transformed self-exemplars, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 5197–5206.