End-to-end Alternating Optimization for Blind Super Resolution

05/14/2021 · by Zhengxiong Luo, et al.

Previous methods decompose the blind super-resolution (SR) problem into two sequential steps: i) estimating the blur kernel from the given low-resolution (LR) image and ii) restoring the SR image based on the estimated kernel. This two-step solution involves two independently trained models, which may not be well compatible with each other: a small estimation error in the first step can cause a severe performance drop in the second, while the first step can only utilize limited information from the LR image, which makes it difficult to predict a highly accurate blur kernel. To address these issues, instead of considering the two steps separately, we adopt an alternating optimization algorithm that estimates the blur kernel and restores the SR image in a single model. Specifically, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores the SR image based on the predicted kernel, and Estimator estimates the blur kernel with the help of the restored SR image. We alternate these two modules repeatedly and unfold this process to form an end-to-end trainable network. In this way, Estimator utilizes information from both the LR and SR images, which makes the estimation of the blur kernel easier. More importantly, Restorer is trained with the kernel estimated by Estimator, instead of the ground-truth kernel, so Restorer can be more tolerant to the estimation error of Estimator. Extensive experiments on synthetic datasets and real-world images show that our model largely outperforms state-of-the-art methods and produces more visually favorable results at a much higher speed. The source code is available at <https://github.com/greatlog/DAN.git>.


1 Introduction

Single image super-resolution (SISR) aims to recover the high-resolution (HR) version of a given degraded low-resolution (LR) image. It has wide applications in video enhancement, medical imaging, as well as security and surveillance imaging. Mathematically, the degradation process can be expressed as

$$\mathbf{y} = (\mathbf{x} \otimes \mathbf{k})\downarrow_s + \mathbf{n}, \qquad (1)$$

where $\mathbf{x}$ is the original HR image, $\mathbf{y}$ is the degraded LR image, $\otimes$ denotes the two-dimensional convolution of $\mathbf{x}$ with the blur kernel $\mathbf{k}$, $\mathbf{n}$ denotes Additive White Gaussian Noise (AWGN), and $\downarrow_s$ denotes the standard $s$-fold downsampler, which keeps only the upper-left pixel of each distinct $s \times s$ patch [58]. SISR then refers to the process of recovering $\mathbf{x}$ from $\mathbf{y}$. It is a highly ill-posed inverse problem and has always been a challenging task [5].
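For concreteness, the degradation in Equation 1 can be sketched in a few lines of PyTorch; the function name `degrade` and its arguments are ours, and an odd-sized, normalized blur kernel is assumed:

```python
import torch
import torch.nn.functional as F

def degrade(hr, kernel, scale, noise_sigma=0.0):
    """Synthesize an LR image following Eq. (1): y = (x ⊗ k)↓s + n.

    hr:     HR image tensor of shape (B, C, H, W)
    kernel: blur kernel tensor of shape (k, k), odd-sized and summing to one
    scale:  integer downsampling factor s
    """
    b, c, h, w = hr.shape
    k = kernel.shape[-1]
    # Blur every channel with the same kernel (depth-wise convolution).
    weight = kernel.view(1, 1, k, k).repeat(c, 1, 1, 1).to(hr)
    blurred = F.conv2d(F.pad(hr, [k // 2] * 4, mode='reflect'), weight, groups=c)
    # s-fold downsampling: keep the upper-left pixel of each s x s patch.
    lr = blurred[:, :, ::scale, ::scale]
    # Additive white Gaussian noise.
    if noise_sigma > 0:
        lr = lr + noise_sigma * torch.randn_like(lr)
    return lr
```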

During the past five years, deep neural networks (DNNs) have achieved remarkable results on SISR [3, 53]. But most of these methods [25, 26] assume that the blur kernel is predefined as the kernel of bicubic interpolation. In this case, the SR task degenerates to finding the inverse of bicubic downsampling. However, blur kernels in real applications are much more complicated. They are usually unknown and differ from image to image, as blur kernels are easily influenced by camera intrinsic parameters, camera pose, etc. Consequently, there is a domain gap between bicubically synthesized training samples and real images. This domain gap leads to a severe performance drop when these networks are applied in real applications [32]. Thus, more attention should be paid to SR in the context of unknown blur kernels, i.e. blind SR.

In blind SR, there is one more undetermined variable, i.e. the blur kernel $\mathbf{k}$, and the optimization becomes much more difficult. To make this problem easier to solve, previous methods [60, 28] usually decompose it into two sequential steps: i) estimating the blur kernel from the LR image and ii) restoring the SR image based on the estimated kernel. This two-step solution involves two independently trained models, so they may not be well compatible with each other. Specifically, the model in the second step is usually trained with ground-truth kernels, while during testing it is provided with the kernel estimated in the first step. As a result, a small estimation error in the first step can cause a severe performance drop in the following one [20]. On the other hand, the first step can only utilize limited information from the LR image, which makes it difficult to predict a highly accurate blur kernel. Consequently, although both models can perform well individually, the final result may be suboptimal when they are combined.

Instead of considering these two steps separately, we adopt an alternating optimization algorithm, which can estimate the blur kernel and restore the SR image in the same model. In detail, we design two convolutional neural modules, namely Restorer and Estimator. Restorer restores the SR image based on the blur kernel predicted by Estimator, and the restored SR image is further used to help Estimator estimate a better blur kernel. Once the blur kernel is manually initialized, the two modules cooperate with each other to form a closed loop, which can be iterated over and over. The iterating process is then unfolded into an end-to-end trainable network, which we call a deep alternating network (DAN). In this way, Estimator can utilize information from both the LR and SR images, which makes the estimation of the blur kernel easier. More importantly, Restorer is trained with the kernel estimated by Estimator, instead of the ground-truth kernel, so during testing Restorer can be more tolerant to the estimation error of Estimator. Besides, the results of both modules are substantially improved during the iterations, so our alternating optimization algorithm is likely to produce better final results than the direct two-step solutions.

We summarize our contributions into three points:

  • We adopt an alternating optimization algorithm to estimate the blur kernel and restore the SR image for blind SR in a single network (DAN), which helps the two modules stay well compatible with each other and thus achieve better final results than the previous two-step solutions.

  • We design two convolutional neural modules, which can be alternated repeatedly and then unfolded to form an end-to-end trainable network, without any pre/post-processing. It is easier to train and runs faster than the previous two-step solutions. To the best of our knowledge, the proposed method is the first end-to-end network for blind SR.

  • Extensive experiments on synthetic and real-world images show that our model can largely outperform state-of-the-art methods and produce more visually favorable results at a much higher speed.

A preliminary version of this work was presented as a conference paper [37]. In the current work, we extend it in the following significant ways:

  • We propose a dual-path conditional block (DPCB) to optimize the architectures of both Estimator and Restorer (Sec 3.4.1). Compared with the original conditional residual block (CRB), DPCB has two advantages: i) DPCB can simultaneously explore deep features of both its basic and conditional inputs, while CRB only focuses on the basic one. This enables DPCB to model a deeper correlation between the two inputs and helps improve the performance of Estimator and Restorer. ii) The dual-path design in DPCB abandons the expansion and concatenation operations in CRB, which saves much computation. Experiments show that DPCB accelerates the whole network by 28%.

  • In the current version, Estimator is supervised by the complete blur kernel, instead of the kernel in the reduced space as in the conference version. On the one hand, the stronger supervision may help Estimator to be better optimized. On the other hand, the complete predicted kernel can readily be used in other tasks, whereas the reduced kernel can only be used by the Restorer.

  • We investigate more details and add considerable analysis compared with the initial version, such as visualization of the predicted kernels and ablation studies on the architectures of Restorer and Estimator.

2 Related Work

2.1 Super Resolution for Bicubic Downsampling

Learning-based methods for SISR usually require a large number of paired HR and LR images as training samples. However, such paired samples are hard to obtain in the real world. Consequently, researchers manually synthesize LR images from HR images with predefined downsampling settings. The most popular setting is bicubic interpolation, i.e. defining $\mathbf{k}$ in Equation 1 as the bicubic kernel. In this way, a large number of paired samples can be easily synthesized, which helps boost the development of various deep-learning-based methods. Since the advent of SRCNN [14], various DNNs [53, 21, 23] have been proposed based on this setting, and most of them focus on optimizing the network architecture for SR. Strategies such as post-upscaling [15], residual learning [30], and the pixel-shuffle operation [45] have become default choices for building an SR network. After the proposal of RCAN [61], RRDB [52] and SAN [13], the performance in the context of bicubic downsampling has even begun to saturate on common benchmark datasets.

Despite the great achievements made in super-resolving bicubically downsampled images, it is still difficult to apply SR methods in real scenarios, because the blur kernels of real images are usually unknown, differ from image to image, and are much more complicated than the bicubic one. Consequently, due to the domain gap between real and synthesized data, methods designed for bicubically downsampled images suffer severe performance drops in real applications [32, 11]. To address this issue, researchers have begun to work on the more challenging case where the degradations of test images are unknown, i.e. blind super resolution.

2.2 Two-step Blind Super Resolution

As indicated in Equation 1, blind super resolution involves solving for both the blur kernel $\mathbf{k}$ and the SR image $\mathbf{x}$. Previous methods usually decompose it into two sequential steps, and each step is an independent research field.

Kernel estimation. The first step is estimating the blur kernel from the test image. As this is an ill-posed problem [33, 35], some priors are usually needed to solve it properly. In [40], a non-parametric method is used that utilizes the patch recurrence between the test image and its downscaled version. A similar idea is also adopted in [6, 9], but powered by neural networks and adversarial training [19]. Another widely used family is the extreme channel priors. In [41, 42], Pan et al. first propose the dark channel prior, i.e. the dark channel of a natural image is usually sparse, which can be used for solving the blur kernel of a blurred image. In [55, 10], the bright channel prior is further proposed, and the idea is extended to extreme channel priors. Although these manually designed priors may help in some cases, they are often violated in applications. Consequently, as we will show in Sec 4.1.3, the accuracy of the estimated kernels is still limited.

Super Resolution with a given kernel. The second step is super-resolving the image with the estimated kernel. This research field is also known as non-blind SR, in which methods are designed under the assumption that the ground-truth blur kernel is known. In [18, 46, 47], the blur kernel is used to downsample images and synthesize training samples, which are used to train a specific model for a given kernel and LR image. In [60], the kernel and the LR image are directly concatenated at the first layer of a DNN, so the SR result is closely correlated with both the LR image and the blur kernel. In [28], Zhang et al. propose a method based on the ADMM algorithm. They interpret the problem as MAP optimization and solve the data term and prior term alternately. A similar idea is adopted in [58]. These methods can achieve remarkable performance as long as the ground-truth blur kernel is known. However, in real applications, the blur kernels are predicted by kernel-estimation methods and are biased from the ground-truth ones. As we will illustrate in Sec 4.1.1, this bias causes a severe performance drop when the two steps are combined.

2.3 End-to-End Blind Super Resolution

End-to-end methods for blind SR have rarely been studied. In [20], a kernel-estimation module and a non-blind SR module are first integrated into a single blind SR method. It further proposes a correction module, which uses the super-resolved image to iteratively correct the estimated kernel. However, the three modules in [20] are still trained in two steps, which is complicated and may restrict the performance. In our method, the kernel-estimation and SR modules are optimized end to end, which is not only much simpler but also helps the two modules become more compatible with each other and achieve better performance.

Fig. 1: The overview of the deep alternating network (DAN).

3 End-to-End Alternating Optimization

In this section, we first illustrate the overall algorithm of our proposed method and then go into the details. We start from the formulation of blind SR, which helps us explain our method mathematically. The design details are described at last.

3.1 Formulation

As shown in Equation 1, there are three variables, i.e. $\mathbf{k}$, $\mathbf{x}$ and $\mathbf{n}$, to be determined in the blind SR problem. From Equation 1 we can get

$$\mathbf{n} = \mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s. \qquad (2)$$

As $\mathbf{n}$ is assumed to be Gaussian noise with zero mean, the blind SR problem can be mathematically expressed as an optimization problem in the Maximum A Posteriori (MAP) framework [59]:

$$\min_{\mathbf{k},\,\mathbf{x}} \left\| \mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s \right\|_2^2. \qquad (3)$$

Thus, the number of variables that need to be determined is reduced to two. However, this optimization problem is still ill-posed and has an infinite number of solutions [5]. To get it properly solved, some prior terms are usually added [44, 29]:

$$\min_{\mathbf{k},\,\mathbf{x}} \left\| \mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s \right\|_2^2 + \phi(\mathbf{x}) + \psi(\mathbf{k}), \qquad (4)$$

where $\phi(\mathbf{x})$ denotes the prior term for the HR image and $\psi(\mathbf{k})$ the prior term for the blur kernel. In [49], Tipping et al. model the imaging process and parameterize it with several unknown variables. They further assume that these unknown variables follow high-dimensional Gaussian distributions. With the elaborated imaging model and strong assumptions, they succeed in solving this optimization problem directly. However, the imaging model or the assumptions about the unknown variables may be easily violated in real applications. On the other hand, without these strong assumptions, it is extremely difficult to solve this problem directly.

3.2 Two-Step Solution

Given that the overall blind SR problem is difficult to tackle, previous methods usually decompose it into two sequential steps:

$$\mathbf{k} = M(\mathbf{y}), \qquad \mathbf{x} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - (\mathbf{x} \otimes \mathbf{k})\downarrow_s \right\|_2^2 + \phi(\mathbf{x}), \qquad (5)$$

where $M(\cdot)$ denotes the function that estimates $\mathbf{k}$ from $\mathbf{y}$, and the second step is usually solved by a non-blind SR method as described in Sec 2.2. As we have mentioned in Sec 2.2, the two steps are independent research fields in most cases. Both of them only consider the performance under their own given conditions, while ignoring the overall performance. This two-step solution has three drawbacks. Firstly, it usually requires training of two or even more models, which is rather complicated. Secondly, the kernel estimation $M(\cdot)$ can only utilize information from $\mathbf{y}$. However, this is also an ill-posed problem: $\mathbf{k}$ cannot be properly solved without information from $\mathbf{x}$. Lastly, the non-blind SR model of the second step is trained with ground-truth kernels, while during testing it can only access the kernels estimated in the first step. The difference between ground-truth and estimated kernels usually causes a severe performance drop of the non-blind SR model [20].

3.3 Unfolding the Alternating Optimization

To address the drawbacks of the two-step solution, we propose an end-to-end network that can largely alleviate these issues. We still split the problem into two subproblems. However, instead of solving them sequentially, we adopt an alternating optimization algorithm, which restores the SR image and estimates the corresponding blur kernel alternately. The mathematical expression is

$$\mathbf{x}_{i} = \arg\min_{\mathbf{x}} \left\| \mathbf{y} - (\mathbf{x} \otimes \mathbf{k}_{i-1})\downarrow_s \right\|_2^2 + \phi(\mathbf{x}), \qquad \mathbf{k}_{i} = \arg\min_{\mathbf{k}} \left\| \mathbf{y} - (\mathbf{x}_{i} \otimes \mathbf{k})\downarrow_s \right\|_2^2 + \psi(\mathbf{k}). \qquad (6)$$

We define two solvers, namely Estimator and Restorer, for the two subproblems respectively. For the Estimator subproblem there even exists an analytic solution [51]. However, in the current work, we choose to implement both solvers with convolutional neural modules, for three reasons: 1) It is difficult to determine appropriate analytic forms of the two prior terms, while neural modules are good at learning such priors implicitly [50, 4, 22]. 2) Both modules operate on intermediate results, i.e. the intermediate SR images and blur kernels, instead of ground-truth ones. Methods based on ground-truth assumptions may fail in this case. We also experimentally find that a neural-network-based Estimator is more robust than the analytic solution in our method. 3) Once the neural modules are trained, inference is straightforward and fast.

Thus, we alternately solve the two subproblems with two neural modules. As shown in Figure 1, we fix the number of iterations and unfold the iterating process to form an end-to-end trainable network, which we call a deep alternating network (DAN). We initialize the kernel with the Dirac function, i.e. the center of the kernel is one and all other entries are zero. Following [20, 60], the kernel is also reshaped and then reduced by principal component analysis (PCA) [43]. Both modules are supervised only at the last iteration by the L1 loss. The whole network can be trained well without any restrictions on intermediate results, because the parameters of both modules are shared across iterations.
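The reshaping and PCA reduction of the kernel can be sketched as follows; the helper name `reduce_kernel` and the precomputed `pca_matrix` are illustrative assumptions rather than the released implementation:

```python
import torch

def reduce_kernel(kernels, pca_matrix):
    """Flatten each k x k kernel and project it with a precomputed PCA matrix
    (shape: reduced_dim x k*k), as in [20, 60]. Both arguments are tensors."""
    flat = kernels.view(kernels.size(0), -1)   # (B, k*k)
    return flat @ pca_matrix.t()               # (B, reduced_dim)
```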

In DAN, Estimator takes both the LR and SR images as inputs, which makes the estimation of the blur kernel much easier. More importantly, Restorer is trained with the kernel estimated by Estimator, instead of the ground-truth kernel as in previous methods. Thus, Restorer can be more tolerant to the estimation error of Estimator during testing. Besides, compared with previous two-step solutions, the results of both modules in DAN can be substantially improved over the iterations, so DAN is likely to reach better final results. In particular, when the scale factor is 1, DAN becomes a deblurring network.
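Putting the pieces together, a minimal sketch of the unfolded loop looks like the following, treating Estimator and Restorer as black-box modules; the attribute names and the default iteration count and kernel size are our assumptions, and the PCA reduction from the sketch above is omitted:

```python
import torch
import torch.nn as nn

class DAN(nn.Module):
    """Sketch of the unfolded deep alternating network."""
    def __init__(self, estimator, restorer, kernel_size=21, n_iter=4):
        super().__init__()
        self.estimator = estimator   # predicts the blur kernel from (LR, SR)
        self.restorer = restorer     # predicts the SR image from (LR, kernel)
        self.kernel_size = kernel_size
        self.n_iter = n_iter         # fixed number of unfolded iterations (placeholder value)

    def init_kernel(self, lr):
        # Dirac initialization: one at the center, zeros elsewhere (kept as a flat vector;
        # the PCA reduction would be applied before feeding the Restorer).
        k = torch.zeros(lr.size(0), self.kernel_size ** 2, device=lr.device)
        k[:, self.kernel_size ** 2 // 2] = 1.0
        return k

    def forward(self, lr):
        kernel = self.init_kernel(lr)
        sr = None
        for _ in range(self.n_iter):       # module parameters are shared across iterations
            sr = self.restorer(lr, kernel)
            kernel = self.estimator(lr, sr)
        # Only the outputs of the last iteration are supervised with the L1 loss.
        return sr, kernel
```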

Fig. 2: The details of (a) dual-path conditional block (DPCB), (b) dual-path conditional group (DPCG), (c) Restorer, and (d) Estimator. 'GAP' denotes Global Average Pooling. In each block, one input is the basic input and the other is the conditional input.

3.4 Instantiate Convolutional Neural Modules

The most direct way to build Estimator and Restorer is to reuse the kernel-estimation network and the non-blind SR network of previous methods [20, 58]. However, on the one hand, those networks are too large to be directly combined. On the other hand, the performance of our proposed method depends more on the compatibility between Estimator and Restorer, and architectures designed for the case where each works alone may not be suitable within DAN. Thus, in this section, we specially design the architectures of Estimator and Restorer.

3.4.1 Design of Basic Elements

Analysis. Both modules in our network have two inputs. Estimator takes the LR and SR images, and Restorer takes the LR image and the blur kernel as inputs. We define the LR image as the basic input of both modules, and the other input as the conditional input: the blur kernel is the conditional input of Restorer, and the SR image is the conditional input of Estimator. During iterating, the basic inputs of both modules stay the same, but their conditional inputs are repeatedly updated. We claim that it is critically important to keep the output of each module closely related to its conditional input. Otherwise, the iterating results will collapse to a fixed point at the first iteration. Specifically, if Estimator outputs the same kernel regardless of the SR image, or Restorer outputs the same SR image regardless of the blur kernel, their outputs will only depend on the basic input, and the results will stay the same during the iterations.

Conditional Residual Block. In the conference version [37], a conditional residual block (CRB) is used to ensure the outputs of Estimator and Restorer are closely related to their conditional inputs. However, this block has three drawbacks: 1) In Restorer, the conditional input, i.e. the estimated kernel, has to be expanded spatially to be concatenated with the LR features, which largely increases the computational cost. 2) Experiments show that the channel attention layer (CALayer) in CRB is time-consuming and easily leads to gradient explosion, which slows down inference and makes training unstable. 3) All blocks in the network are conditioned on the same features, which may restrict the representational ability of the whole network.

Dual-Path Conditional Block. To overcome the drawbacks of the conditional residual block, we propose a dual-path conditional block (DPCB) in this paper. As shown in Figure 2 (a), there are two paths in a DPCB, i.e. the conditional path (top) and the basic path (bottom). We do not concatenate the conditional and basic paths directly. Instead, they are first processed independently and then multiplied together to correlate them. If the conditional input has a different spatial size from the basic input, it is expanded just before the multiplication. In this way, convolutions on the conditional input are performed before the spatial expansion, which saves much computation. Besides, we add a skip connection on the conditional path, so that the basic features at different depths are conditioned on different features. This may improve the representational ability of the whole module and enhance the final results. We also remove the channel attention layer to accelerate inference and stabilize training.
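A sketch of a DPCB under these design rules is given below; the layer counts, activation, and default kernel sizes are our assumptions, and both paths are assumed to carry the same channel width (the modules map their raw inputs to this width beforehand):

```python
import torch.nn as nn

class DPCB(nn.Module):
    """Sketch of a dual-path conditional block (DPCB)."""
    def __init__(self, nf, ksize_basic=3, ksize_cond=1):
        super().__init__()
        self.path_basic = nn.Sequential(
            nn.Conv2d(nf, nf, ksize_basic, padding=ksize_basic // 2),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(nf, nf, ksize_basic, padding=ksize_basic // 2),
        )
        self.path_cond = nn.Sequential(
            nn.Conv2d(nf, nf, ksize_cond, padding=ksize_cond // 2),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Conv2d(nf, nf, ksize_cond, padding=ksize_cond // 2),
        )

    def forward(self, basic, cond):
        fb = self.path_basic(basic)
        fc = self.path_cond(cond)
        cond_out = fc + cond                         # skip connection on the conditional path
        # Expand spatially only at the fusion point (assumes a 1x1 conditional input
        # when the sizes differ), so convolutions on that path stay cheap.
        if fc.shape[-2:] != fb.shape[-2:]:
            fc = fc.expand(-1, -1, fb.size(-2), fb.size(-1))
        basic_out = fb * fc + basic                  # multiplicative fusion + skip on the basic path
        return basic_out, cond_out
```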

Dual-Path Conditional Group. We further adopt the residual-in-residual (RIR) structure proposed in [61]. As shown in Figure 2 (b), we add long skip connections when several DPCBs are sequentially stacked. These blocks form what we call a dual-path conditional group (DPCG). The long skip connections further help stabilize training and enhance the results of very deep networks [61]. Since the conditional and basic paths are processed independently, the convolutional layers on the two paths can also have different configurations, e.g. different kernel sizes and strides, as shown in Figure 2 (b).
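Reusing the DPCB sketch above, a DPCG can then be sketched as stacked DPCBs wrapped by long skip connections on both paths; the number of blocks per group is a placeholder:

```python
import torch.nn as nn

class DPCG(nn.Module):
    """Sketch of a dual-path conditional group (assumes the DPCB sketch above)."""
    def __init__(self, nf, n_blocks=5, ksize_basic=3, ksize_cond=1):
        super().__init__()
        self.blocks = nn.ModuleList(
            [DPCB(nf, ksize_basic, ksize_cond) for _ in range(n_blocks)]
        )

    def forward(self, basic, cond):
        b, c = basic, cond
        for block in self.blocks:
            b, c = block(b, c)
        # Long skip connections on both paths (residual-in-residual).
        return b + basic, c + cond
```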

3.4.2 Restorer

The whole structure of Restorer is shown in Figure 2 (c). Both inputs are first mapped to the same number of channels, each by a single convolutional layer. The body of Restorer consists of only DPCGs. The conditional input, i.e. the reduced kernel, has a spatial size of 1×1, so it needs to be expanded spatially before being multiplied with the basic input in each DPCB. Fortunately, the conditional input keeps this small spatial size along the conditional path, which saves much computation compared with the conference version [37]. We use PixelShuffle [45] layers to upscale the features to the desired size. In practice, Restorer stacks several DPCGs, each of which contains several DPCBs.
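A rough sketch of Restorer along these lines is shown below (assuming the DPCG sketch above); the channel width, group/block counts, and the reduced-kernel dimension are placeholders, not the published configuration:

```python
import torch.nn as nn

class Restorer(nn.Module):
    """Sketch of Restorer: map both inputs to a common width, stack DPCGs,
    and upscale with PixelShuffle."""
    def __init__(self, scale=4, nf=64, n_groups=5, n_blocks=5, kernel_dim=10):
        super().__init__()
        self.head_lr = nn.Conv2d(3, nf, 3, padding=1)   # basic input: LR image
        self.head_k = nn.Conv2d(kernel_dim, nf, 1)      # conditional input: reduced kernel (1x1 spatial)
        self.groups = nn.ModuleList(
            [DPCG(nf, n_blocks, ksize_basic=3, ksize_cond=1) for _ in range(n_groups)]
        )
        self.upscale = nn.Sequential(
            nn.Conv2d(nf, nf * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
            nn.Conv2d(nf, 3, 3, padding=1),
        )

    def forward(self, lr, reduced_kernel):
        b = self.head_lr(lr)
        # reduced_kernel: (B, kernel_dim) -> (B, kernel_dim, 1, 1)
        c = self.head_k(reduced_kernel.view(*reduced_kernel.shape, 1, 1))
        for g in self.groups:
            b, c = g(b, c)
        return self.upscale(b)
```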

3.4.3 Estimator

The whole structure of Estimator is shown in Figure 2 (d). The SR image super-resolved by Restorer is first downscaled by a strided convolutional layer, and the resulting feature maps are used as the conditional input of Estimator. The body of Estimator also consists of only DPCGs. The basic and conditional paths use the same kernel size. In practice, the body of Estimator consists of one DPCG, which contains several DPCBs.

In the conference version [37], Estimator only predicts kernels in the reduced space and is only supervised by the reduced kernel. There are two drawbacks to this design: 1) Estimator cannot predict complete kernels, i.e. kernels before the PCA transformation. Even if the final SR result is good enough, we do not know what the blur kernel looks like. 2) Although the reduced kernel is well supervised, the complete kernel is not well constrained, whereas according to [33] it is better to restrict the complete kernel to sum to one, which is important for the convergence of the whole algorithm [34]. Thus, in the current version, Estimator directly predicts all elements of the blur kernel, i.e. the complete kernel. We further add a Softmax [8] layer at the end of Estimator, which explicitly forces the complete kernel to sum to one. Experiments in Sec 4.1.3 indicate that the kernels predicted by the modified Estimator show fewer visual differences from the ground truth and smaller quantitative errors.
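A corresponding sketch of the modified Estimator, ending with the Softmax that normalizes the complete kernel, is given below (assuming the DPCG sketch above); widths, block counts, and the kernel size are again placeholders:

```python
import torch.nn as nn

class Estimator(nn.Module):
    """Sketch of Estimator: downscale the SR image with a strided convolution,
    run a DPCG conditioned on it, pool spatially, and predict a complete kernel
    normalized to sum to one."""
    def __init__(self, scale=4, nf=32, n_blocks=5, kernel_size=21):
        super().__init__()
        self.head_lr = nn.Conv2d(3, nf, 3, padding=1)         # basic input: LR image
        self.head_sr = nn.Conv2d(3, nf, scale, stride=scale)  # conditional input: downscaled SR image
        self.group = DPCG(nf, n_blocks, ksize_basic=3, ksize_cond=3)
        self.tail = nn.Conv2d(nf, kernel_size ** 2, 1)
        self.softmax = nn.Softmax(dim=1)                      # forces the kernel to sum to one

    def forward(self, lr, sr):
        b = self.head_lr(lr)
        c = self.head_sr(sr)
        b, _ = self.group(b, c)
        feat = self.tail(b).mean(dim=[2, 3])   # global average pooling over spatial dims
        return self.softmax(feat)              # (B, kernel_size**2): the complete, flat kernel
```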

Method Scale Set5 Set14 BSD100 Urban100 Manga109
  PSNR   SSIM   PSNR   SSIM   PSNR   SSIM   PSNR   SSIM   PSNR   SSIM
Bicubic 2 28.82 0.8577 26.02 0.7634 25.92 0.7310 23.14 0.7258 25.60 0.8498
CARN [2] 30.99 0.8779 28.10 0.7879 26.78 0.7286 25.27 0.7630 26.86 0.8606
Bicubic+ZSSR [46] 31.08 0.8786 28.35 0.7933 27.92 0.7632 25.25 0.7618 28.05 0.8769
[41]+CARN [2] 24.20 0.7496 21.12 0.6170 22.69 0.6471 18.89 0.5895 21.54 0.7946
CARN [2]+[41] 31.27 0.8974 29.03 0.8267 28.72 0.8033 25.62 0.7981 29.58 0.9134
IKC [20] 37.19 0.9526 32.94 0.9024 31.51 0.8790 29.85 0.8928 36.93 0.9667
DANv1  [37] 37.34 0.9526 33.08 0.9041 31.76 0.8858 30.60 0.9060 37.23 0.9710
DANv2 37.60 0.9544 33.44 0.9094 32.00 0.8904 31.43 0.9174 38.07 0.9734
Bicubic 3 26.21 0.7766 24.01 0.6662 24.25 0.6356 21.39 0.6203 22.98 0.7576
CARN [2] 27.26 0.7855 25.06 0.6676 25.85 0.6566 22.67 0.6323 23.84 0.7620
Bicubic+ZSSR [46] 28.25 0.7989 26.11 0.6942 26.06 0.6633 23.26 0.6534 25.19 0.7914
[41]+CARN [2] 19.05 0.5226 17.61 0.4558 20.51 0.5331 16.72 0.4578 18.38 0.6118
CARN [2]+[41] 30.31 0.8562 27.57 0.7531 27.14 0.7152 24.45 0.7241 27.67 0.8592
IKC [20] 33.06 0.9146 29.38 0.8233 28.53 0.7899 24.43 0.8302 32.43 0.9316
DANv1 [37] 34.04 0.9199 30.09 0.8287 28.94 0.7919 27.65 0.8352 33.16 0.9382
DANv2 34.19 0.9209 30.20 0.8309 29.03 0.7948 27.83 0.8395 33.28 0.9400
Bicubic 4 24.57 0.7108 22.79 0.6032 23.29 0.5786 20.35 0.5532 21.50 0.6933
CARN [2] 26.57 0.7420 24.62 0.6226 24.79 0.5963 22.17 0.5865 21.85 0.6834
Bicubic+ZSSR [46] 26.45 0.7279 24.78 0.6268 24.97 0.5989 22.11 0.5805 23.53 0.7240
[41]+CARN [2] 18.10 0.4843 16.59 0.3994 18.46 0.4481 15.47 0.3872 16.78 0.5371
CARN [2]+[41] 28.69 0.8092 26.40 0.6926 26.10 0.6528 23.46 0.6597 25.84 0.8035
IKC [20] 31.67 0.8829 28.31 0.7643 27.37 0.7192 25.33 0.7504 28.91 0.8782
DANv1 [37] 31.89 0.8864 28.42 0.7687 27.51 0.7248 25.86 0.7721 30.50 0.9037
DANv2 32.00 0.8885 28.50 0.7715 27.56 0.7277 25.94 0.7748 30.45 0.9037
TABLE I: Quantitative comparison with SOTA SR methods with Setting 1. The best two results are indicated in bold and underlined respectively.
Fig. 3: Visual results of img 005, img 013, img 047 and img 052 in Urban100. Best viewed in color.
Fig. 4: Visual results of img 092 and img 096 in Urban100.

4 Experiments

To fully investigate the proposed method, experiments are performed on both synthetic and real images. In experiments on synthetic images, we evaluate its quantitative results under different settings and perform controlled experiments to help analyze the proposed method. In experiments on real images, we provide a qualitative comparison to demonstrate the effectiveness of the proposed method.

4.1 Experiments on Synthetic Images

To fully investigate the proposed method, extensive experiments are performed under two different degradation settings. Setting 1 only focuses on cases of isotropic Gaussian blur kernels. In this case, different blur kernels can be quantitatively compared, which can help study the influence of blur kernels. Setting 2 focuses on cases of more general and irregular blur kernels. Intuitively, Setting 2 is relatively more difficult and can help study the performance of the proposed method.

Setting 1. Following the setting in [20], isotropic Gaussian blur kernels of a fixed size are used. During training, the kernel width is uniformly sampled in [0.2, 4.0], [0.2, 3.0] and [0.2, 2.0] for scale factors ×4, ×3 and ×2 respectively. For quantitative evaluation, we collect HR images from the commonly used benchmark datasets Set5 [7], Set14 [57], Urban100 [24], BSD100 [38] and Manga109 [39]. Since fixed kernels are needed for a reasonable comparison, we uniformly choose 8 kernels, denoted as Gaussian8, from the ranges [1.8, 3.2], [1.35, 2.40] and [0.80, 1.60] for scale factors ×4, ×3 and ×2 respectively. The HR images are first blurred by the selected kernels and then downsampled to form the synthetic test images.
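A sketch of how such isotropic Gaussian test kernels can be generated is shown below; the kernel size is a placeholder, and the [1.8, 3.2] width range is one of the ranges quoted above:

```python
import numpy as np

def isotropic_gaussian_kernel(width, size=21):
    """Isotropic Gaussian blur kernel of standard deviation `width`,
    normalized to sum to one (a sketch of the Setting-1 kernels)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * width ** 2))
    return k / k.sum()

# Eight evenly spaced widths, e.g. from the [1.8, 3.2] range quoted above.
gaussian8 = [isotropic_gaussian_kernel(w) for w in np.linspace(1.8, 3.2, 8)]
```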

Types Method | Scale 2: PSNR SSIM | Scale 4: PSNR SSIM
Class 1 Bicubic 28.73 0.8040 25.33 0.6795
Bicubic kernel + ZSSR [46] 29.10 0.8215 25.61 0.6911
EDSR [36] 29.17 0.8216 25.64 0.6928
RCAN [61] 29.20 0.8223 25.66 0.6936
Class 2   PDN [48] - 1st in NTIRE’19 track4 / / 26.34 0.7190
  WDSR [56] - 1st in NTIRE’19 track2 / / 21.55 0.6841
WDSR [56] - 1st in NTIRE’19 track3 / / 21.54 0.7016
  WDSR [56] - 2nd in NTIRE’19 track4 / / 25.64 0.7144
Ji et al. [27] - 1st in NTIRE’20 track 1 / / 25.43 0.6907
Class 3 Cornillere et al. [12] 29.46 0.8474 / /
Michaeli et al. [40] + SRMD  [60] 25.51 0.8083 23.34 0.6530
Michaeli et al. [40] + ZSSR [46] 29.37 0.8370 26.09 0.7138
KernelGAN [6] + SRMD [60] 29.57 0.8564 25.71 0.7265
KernelGAN [6] + USRNet [58] / / 20.06 0.5359
KernelGAN  [6]+ ZSSR [46] 30.36 0.8669 26.81 0.7316
Ours DANv1 32.56 0.8997 27.55 0.7582
DANv2 32.58 0.9048 28.74 0.7893
TABLE II: Quantitative comparison with SOTA SR methods with Setting 2. The best two results are indicated in bold and underlined respectively.
Fig. 5: Visual results on DIV2KRK. From the leftmost to the rightmost column are, respectively, the ground-truth kernels, kernels estimated by Pan et al. [41], kernels estimated by KernelGAN [6], kernels estimated by DANv2, the corresponding LR images, and the SR results restored by DANv2. From the top to the bottom row are results of Image 001, 008, and 088 respectively. We also list the PSNR of the SR images restored by DANv2. Best viewed in color.
Fig. 6: Visual results of img 864, img 816, img 812 and img 853 in DIV2KRK. Best viewed in color.
Fig. 7: Visual results of img 003 and img 074 in Urban100. Best viewed in color.
Fig. 8: The L1 error of predicted kernels with different kernel widths σ (left) and PSNR results with respect to kernels of different σ (right).

Setting 2. Following the setting in [6], the kernel sizes are set separately for scales ×2 and ×4. We first generate anisotropic Gaussian kernels: the lengths of the two axes are uniformly sampled, and the kernel is rotated by a random angle. To deviate from a regular Gaussian, we further apply uniform multiplicative noise (up to 25% of each pixel value of the kernel) and normalize the kernel to sum to one. For testing, we use the benchmark dataset DIV2KRK from [6].
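A sketch of generating such an irregular kernel is given below; the kernel size, axis-length range, and rotation range are placeholders for the values omitted above, while the 25% multiplicative noise and the final normalization follow the description:

```python
import numpy as np

def random_anisotropic_kernel(size=31, length_range=(0.6, 5.0), noise_level=0.25):
    """Sketch of a Setting-2 kernel: an anisotropic Gaussian with random axis
    lengths and rotation, perturbed by uniform multiplicative noise and
    re-normalized to sum to one."""
    lx, ly = np.random.uniform(*length_range, size=2)    # axis lengths (std devs)
    theta = np.random.uniform(-np.pi, np.pi)              # random rotation angle (assumed range)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s], [s, c]])
    sigma = rot @ np.diag([lx ** 2, ly ** 2]) @ rot.T     # covariance of the Gaussian
    inv_sigma = np.linalg.inv(sigma)

    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    coords = np.stack([xx, yy], axis=-1)                   # (size, size, 2)
    k = np.exp(-0.5 * np.einsum('hwi,ij,hwj->hw', coords, inv_sigma, coords))
    # Deviate from a regular Gaussian with multiplicative noise, then normalize.
    k *= 1.0 + noise_level * np.random.uniform(-1.0, 1.0, k.shape)
    return k / k.sum()
```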

Data. For both settings, we collect HR images from DIV2K [1] and Flickr2K [16] as the training set. We first crop all HR images into patches and use them to synthesize training pairs on the fly. The synthesized pairs are further cropped so that the LR images have the same size for all scale factors.

Training. All models are trained with the same batch size and the same total number of iterations. We use Adam [31] as the optimizer. The learning rate is decayed by half at fixed intervals during training. All models are trained on RTX 2080Ti GPUs.

Evaluation metric. All methods are evaluated by PSNR and SSIM [54]. Both metrics are calculated on the Y channel (i.e. luminance) of the transformed YCbCr space.
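For reference, PSNR on the Y channel can be computed as sketched below (BT.601 luminance weights; border cropping, which evaluation scripts often apply, is omitted):

```python
import numpy as np

def rgb_to_y(img):
    """Luminance (Y) channel of an RGB image in [0, 255], ITU-R BT.601 weights."""
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0

def psnr_y(sr, hr):
    """PSNR computed on the Y channel only; `sr` and `hr` are HxWx3 arrays in [0, 255]."""
    diff = rgb_to_y(sr.astype(np.float64)) - rgb_to_y(hr.astype(np.float64))
    mse = np.mean(diff ** 2)
    return 10.0 * np.log10(255.0 ** 2 / mse)
```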

4.1.1 Quantitative Comparisons

In this section, we provide quantitative results of different methods under different settings.

Setting 1. For the first setting, we evaluate our method on test images synthesized with the Gaussian8 kernels. We denote the DAN of the conference version [37] as DANv1 and the DAN of the current paper as DANv2. We mainly compare our results with ZSSR [46] (using the bicubic kernel) and IKC [20]. We also include a comparison with CARN [2]. Since it is not designed for blind SR, we apply the deblurring method [41] before or after CARN. The results are shown in Table I.

Although CARN achieves remarkable results in the context of bicubic downsampling, it suffers a severe performance drop when applied to images with unknown blur kernels. Its performance is largely improved when it is followed by a deblurring method, but is still inferior to that of blind SR methods. ZSSR trains a specific network for each test image by utilizing internal patch recurrence. However, ZSSR has an inherent drawback: the training samples for each image are limited, so it cannot learn a good prior for HR images. IKC is also a two-step solution for blind SR. Although the accuracy of the estimated kernel is largely improved in IKC, the final result is still suboptimal.

Both DANv1 and DANv2 are trained in an end-to-end manner, which is not only much easier than training two-step solutions but also more likely to reach a better optimum. As shown in Table I, they outperform the other methods by a large margin; in particular, DANv1 clearly surpasses IKC on Urban100 at every scale. This comparison indicates the importance of end-to-end training in blind SR. On the other hand, DANv2 improves considerably over DANv1, which suggests that the optimized structures of Restorer and Estimator are better than those of the conference version.

Setting 2. The second setting involves irregular blur kernels, which are more general but also more difficult to handle. For Setting 2, we mainly compare three classes of methods: i) SOTA SR algorithms trained on bicubically downsampled images, such as EDSR [36] and RCAN [61]; ii) blind SR methods designed for the NTIRE competitions, such as PDN [48] and WDSR [56]; iii) two-step solutions, i.e. combinations of a kernel-estimation method and a non-blind SR method, such as KernelGAN [6] and ZSSR [46]. The PSNR and SSIM results on the Y channel are shown in Table II.

Similarly, the performance of methods trained on bicubically downsampled images is limited by the domain gap. Thus, their results are only slightly better than that of interpolation. The methods in Class 2 are trained on synthesized images provided in the NTIRE competition. Although these methods achieve remarkable results in the competition, they still cannot generalize well to irregular blur kernels.

The comparison among the methods of Class 3 is particularly instructive. Specifically, USRNet [58] achieves remarkable results when GT kernels are provided, and KernelGAN also performs well on kernel estimation. However, when they are combined, as shown in Table II, the final SR results are worse than those of most other methods. This indicates that it is important for the Estimator and Restorer to be compatible with each other. Additionally, although a better kernel-estimation method can benefit the SR results, the overall performance is still largely inferior to that of both DANv1 and DANv2. This comparison again indicates the importance of end-to-end training for blind SR. Compared with DANv1, the performance of DANv2 is further improved; in particular, DANv2 outperforms DANv1 by 1.19 dB for scale ×4. On the one hand, DPCB largely improves the representational ability of DANv2. On the other hand, DANv2 can be trained more stably than DANv1, and thus can be better optimized and achieve better results.

4.1.2 Qualitative Comparisons with Other Methods

In this section, we provide some visual results of different methods under different settings for qualitative comparisons.

Setting 1. The visual results of img 005, img 013, img 047 and img 052 in Urban100 are shown in Figure 3 for comparisons between DAN and other methods. As one can see, ZSSR and CARN cannot even restore clear edges. IKC performs better, but its edges are still severely blurred. DANv2 restores sharper edges and simultaneously alleviates the blurriness. This comparison indicates that DAN produces more visually pleasant SR images. For the qualitative comparison between DANv1 and DANv2, we need to focus on harder cases, because for relatively easy cases both models perform well enough and their results are hard to distinguish visually. We provide their results on img 092 and img 096 in Urban100 for comparison. As shown in Figure 4, DANv1 tends to mix stripes of different directions during super-resolution, while DANv2 is more stable in such cases.

Setting 2. The visual results of img 864, img 816, img 812 and img 853 in DIV2KRK are shown in Figure 6 for comparisons between DAN and other methods. We note that bicubic interpolation is actually a strong baseline in blind SR: although KernelGAN + ZSSR and Ji et al. achieve better overall results on DIV2KRK, bicubic interpolation can still outperform them in many cases. As indicated in the figure, compared with the other three methods, the SR images produced by DAN are much sharper and cleaner. We also provide individual comparisons between DANv1 and DANv2 in Figure 7. As one can see, the SR images of DANv1 are still slightly blurred, while those of DANv2 are much cleaner.

4.1.3 Study of Estimated Kernels

Accuracy.

We calculate the L1 error of the predicted kernels to quantitatively evaluate their accuracy. As we want to investigate the performance over different kernels, we measure the predicted kernels in Setting 1, because the kernels in Setting 1 can be distinguished by their standard deviation σ. We calculate the L1 errors in the reduced space, and the results on Urban100 are shown in Figure 8 (a). As one can see, the L1 errors of the reduced kernels predicted by DANv1 and DANv2 are much lower than those of IKC. This suggests that the overall improvement of DAN may partially come from more accurately predicted kernels. We also note that DANv2 predicts more accurate kernels than DANv1, which validates the modifications to Estimator described in Sec 3.4.3. We further plot the PSNR results with respect to kernels of different σ in Figure 8 (b). As σ increases, the performance gap between IKC and DAN becomes larger, which indicates that DAN may have better generalization ability.

Visualization. Unlike DANv1, DANv2 directly predicts the complete blur kernel instead of its reduced representation, which enables us to visualize the estimated kernels. In this section, we visualize some estimated kernels to qualitatively assess the performance of Estimator. Since the Gaussian kernels in Setting 1 are hard to distinguish visually, we choose to visualize estimated kernels on DIV2KRK: its irregular kernels are more difficult to estimate, and the differences between methods are easier to see. We use the results of KernelGAN [6] and Pan et al. [41] for comparison. As shown in Figure 5, the kernels estimated by Pan et al. collapse to the central area, which indicates that this method fails on relatively large kernels. The kernels estimated by KernelGAN tend to be isotropic and look very different from the ground-truth kernels. Compared with these two methods, DAN estimates the kernels much more accurately, even when the ground-truth kernels are highly anisotropic.

4.1.4 Non-blind Setting

In this section, we replace the estimated kernel with the ground truth (GT) to further investigate the influence of Estimator. If GT kernels are provided, the iterating process becomes meaningless, so we test Restorer with just one forward pass. The results for Setting 1 are shown in Table III. The results remain almost unchanged, and sometimes even get slightly worse, when GT kernels are provided. This indicates that Estimator may already satisfy the requirements of Restorer, and that the superiority of DAN also partially comes from the good cooperation between Estimator and Restorer.

Methods Set5 Set14 B100 Urban100 Manga109
DANv2 32.00 28.50 27.56 25.94 30.45
DANv2(GT kernel) 31.98 28.49 27.56 25.95 30.46
TABLE III: PSNR results when GT kernels are provided.

4.1.5 Ablation Study of Network Architectures

In this section, we investigate the influence of different architectural choices, including DPCB, DPCG, and the Softmax layer in Estimator. We use the DAN of the conference version, i.e. DANv1, as the baseline, denoted as experiment 1. In experiment 2, we replace the conditional residual blocks of DANv1 with DPCBs; to keep the model size comparable, the number of blocks is adjusted accordingly. In experiment 3, we further add the long skip connections. We introduce the Softmax layer to Estimator in experiment 4, and the network finally becomes DANv2. We report the results of the different experiments on Set14 under Setting 1. As shown in Table IV, compared with the original conditional residual block of DANv1, DPCB and DPCG clearly improve the results, and the Softmax layer in Estimator improves them further, which indicates that explicitly restricting the estimated kernels to sum to one is helpful.

Exp. DPCB DPCG Softmax Results
TABLE IV: Ablation study results of network architectures. Results are reported as average PSNR on Urban100 in Setting 1.
Fig. 9: PSNR and visual results with different iterations during testing on Set5 and Set14.
Fig. 10: Visual results on real image chip.

4.1.6 Study of Iterations

After the model is trained, we change the number of iterations at test time to see whether the two modules have learned a convergent behavior or have simply 'remembered' the iteration number. The model is trained with a fixed number of iterations, but during testing we gradually increase the iteration number. As shown in Figure 9 (a) and (c), the average PSNR results on Set5 and Set14 first increase rapidly and then gradually converge. It should be noted that when we iterate more times than during training, the performance does not become worse, and sometimes even becomes slightly better. Although the increment is relatively small, it suggests that the two modules may have learned to cooperate with each other, instead of behaving like ordinary end-to-end networks, whose performance drops significantly when the test setting differs from the training setting. It also suggests that the estimation errors of intermediate results do not destroy the convergence of DAN; in other words, DAN is robust to various estimation errors.
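With an unfolded model such as the DAN sketch in Sec 3.3, this test only requires changing the loop length at inference time; the attribute name `n_iter` follows our earlier sketch:

```python
# Using the DAN sketch from Sec 3.3: run more alternations at test time than at
# training time; the shared-parameter modules simply loop for more rounds.
model.eval()
model.n_iter = 8                       # e.g. more iterations than used for training
with torch.no_grad():
    sr, kernel = model(lr)
```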

Methods Params (M) GFLOPs Speed (s)
KernelGAN[6]+ZSSR [46] /
IKC  [20]
DANv1 [37]
DANv2
TABLE V: Comparisons on model complexities and inference speed of different methods.

4.1.7 Inference Speed

Compared with other blind SR methods, our end-to-end model also has an advantage in inference speed. To make a quantitative comparison, we evaluate the average speed of different methods on the same platform. We choose the 40 images synthesized from Set5 with the Gaussian8 kernels as test images, and all methods are evaluated on the same platform with an RTX 2080Ti GPU. We choose KernelGAN [6] + ZSSR [46] and IKC [20] as the compared methods. The model complexities and inference speeds are shown in Table V. The FLOPs of KernelGAN + ZSSR are left out because it re-trains a different model for each test image, in which case FLOPs cannot indicate the model complexity. As shown in Table V, the average speed of DANv1 is 0.75 seconds per image, nearly 554 times faster than KernelGAN + ZSSR and 5 times faster than IKC. This indicates that DAN not only largely outperforms SOTA blind SR methods in PSNR but also runs at a much higher speed. DANv2 further accelerates DANv1 (by about 28%, cf. Sec 1), mainly because DPCB removes the expansion and concatenation operations of CRB, which saves much computation and memory.

4.2 Experiments on Real World Images

We also conduct experiments to show that DAN generalizes well to real-world images. We use the model trained under Setting 1 to upscale the commonly used real image chip [17]. We use KernelGAN [6] + ZSSR [46] and IKC [20] as representative blind SR methods, and CARN [2] as a representative non-blind SR method. Since this is a real image without ground truth, we can only provide a visual comparison, shown in Figure 10. As one can see, the result of KernelGAN + ZSSR is slightly better than bicubic interpolation but is still heavily blurred. The result of CARN is over-smoothed and its edges are not sharp enough. IKC produces a cleaner result, but there are still some artifacts: the letter 'X' restored by IKC has an obvious dark line at its top-right part, which is much lighter in the image restored by DAN. This suggests that even though DAN is trained on synthesized image pairs, it can still generalize to images from real applications in some cases.

5 Conclusion

In this paper, we have proposed an end-to-end algorithm for blind SR. The algorithm is based on alternating optimization, whose two parts are both implemented by convolutional modules, namely Restorer and Estimator. We unfold the alternating process to form an end-to-end trainable network. In this way, Estimator can utilize information from both the LR and SR images, which makes it easier to estimate the blur kernel. More importantly, Restorer is trained with the kernel estimated by Estimator, instead of the ground-truth kernel, so Restorer can be more tolerant to the estimation error of Estimator. Experiments show that the compatibility of the two modules may matter more than their individual accuracy, which is the main reason why the proposed method outperforms previous two-step solutions. Our main contributions are that we provide an end-to-end algorithm for blind SR and demonstrate that an end-to-end pipeline is important for the final performance. In the future, we will try to apply similar ideas to other low-level vision tasks, such as deblurring and denoising.

References

  • [1] E. Agustsson and R. Timofte (2017) NTIRE 2017 challenge on single image super-resolution: dataset and study. pp. 1122–1131. Cited by: §4.1.
  • [2] N. Ahn, B. Kang, and K. Sohn (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision, pp. 252–268. Cited by: TABLE I, §4.1.1, §4.2.
  • [3] S. Anwar, S. Khan, and N. Barnes (2020) A deep journey into super-resolution: a survey. ACM Computing Surveys (CSUR) 53 (3), pp. 1–34. Cited by: §1.
  • [4] M. Asim, F. Shamshad, and A. Ahmed (2020) Blind image deconvolution using deep generative priors. IEEE Transactions on Computational Imaging 6, pp. 1493–1506. Cited by: §3.3.
  • [5] S. Baker and T. Kanade (2002) Limits on super-resolution and how to break them. IEEE Transactions on Pattern Analysis and Machine Intelligence 24 (9), pp. 1167–1183. Cited by: §1, §3.1.
  • [6] S. Bell-Kligler, A. Shocher, and M. Irani (2019) Blind super-resolution kernel estimation using an internal-gan. In Advances in Neural Information Processing Systems, Cited by: §2.2, Fig. 5, §4.1.1, §4.1.3, §4.1.7, §4.1, §4.2, TABLE II, TABLE V.
  • [7] M. Bevilacqua, A. Roumy, C. Guillemot, and M. Alberi-Morel (2012) Low-complexity single-image super-resolution based on nonnegative neighbor embedding. In BMVC, Cited by: §4.1.
  • [8] L. Boltzmann (1868) Studien uber das gleichgewicht der lebenden kraft. Wissenschafiliche Abhandlungen 1, pp. 49–96. Cited by: §3.4.3.
  • [9] A. Bulat, J. Yang, and G. Tzimiropoulos (2018) To learn image super-resolution, use a gan to learn how to do image degradation first. In Proceedings of the European Conference on Computer Vision, Cited by: §2.2.
  • [10] J. Cai, W. Zuo, and L. Zhang (2020) Dark and bright channel prior embedded network for dynamic scene deblurring. IEEE Transactions on Image Processing 29, pp. 6885–6897. Cited by: §2.2.
  • [11] S. Chen, Z. Han, E. Dai, X. Jia, Z. Liu, L. Xing, X. Zou, C. Xu, J. Liu, and Q. Tian (2020) Unsupervised image super-resolution with an indirect supervised path. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 468–469. Cited by: §2.1.
  • [12] V. Cornillere, A. Djelouah, W. Yifan, O. Sorkine-Hornung, and C. Schroers (2019) Blind image super-resolution with spatially variant degradations. ACM Transactions on Graphics (TOG) 38 (6), pp. 1–13. Cited by: TABLE II.
  • [13] T. Dai, J. Cai, Y. Zhang, S. Xia, and L. Zhang (2019) Second-order attention network for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11065–11074. Cited by: §2.1.
  • [14] C. Dong, C. C. Loy, K. He, and X. Tang (2015) Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 38 (2), pp. 295–307. Cited by: §2.1.
  • [15] C. Dong, C. C. Loy, and X. Tang (2016) Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision, pp. 391–407. Cited by: §2.1.
  • [16] R. T. et al. (2017) NTIRE 2017 challenge on single image super-resolution: methods and results. pp. 1110–1121. Cited by: §4.1.
  • [17] R. Fattal (2007) Image upsampling via imposed edge statistics. ACM Trans. Graph. 26, pp. 95. Cited by: §4.2.
  • [18] D. Glasner, S. Bagon, and M. Irani (2009) Super-resolution from a single image. 2009 IEEE 12th International Conference on Computer Vision, pp. 349–356. Cited by: §2.2.
  • [19] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, Cited by: §2.2.
  • [20] J. Gu, H. Lu, W. Zuo, and C. Dong (2019) Blind super-resolution with iterative kernel correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1604–1613. Cited by: §1, §2.3, §3.2, §3.3, §3.4, TABLE I, §4.1.1, §4.1.7, §4.1, §4.2, TABLE V.
  • [21] M. Haris, G. Shakhnarovich, and N. Ukita (2018) Deep back-projection networks for super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1664–1673. Cited by: §2.1.
  • [22] X. Hou, H. Luo, J. Liu, B. Xu, K. Sun, Y. Gong, B. Liu, and G. Qiu (2019) Learning deep image priors for blind image denoising. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 0–0. Cited by: §3.3.
  • [23] X. Hu, H. Mu, X. Zhang, Z. Wang, T. Tan, and J. Sun (2019) Meta-sr: a magnification-arbitrary network for super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1575–1584. Cited by: §2.1.
  • [24] J. Huang, A. Singh, and N. Ahuja (2015) Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5197–5206. Cited by: §4.1.
  • [25] Z. Hui, X. Gao, Y. Yang, and X. Wang (2019) Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, pp. 2024–2032. Cited by: §1.
  • [26] Z. Hui, X. Wang, and X. Gao (2018) Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 723–731. Cited by: §1.
  • [27] X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, and F. Huang (2020) Real-world super-resolution via kernel estimation and noise injection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 466–467. Cited by: TABLE II.
  • [28] Kai Zhang and Wangmeng Zuo and Lei Zhang (2019) Deep plug-and-play super-resolution for arbitrary blur kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1671–1681. Cited by: §1, §2.2.
  • [29] A. Kaufman and R. Fattal (2020) Deblurring using analysis-synthesis networks pair. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5811–5820. Cited by: §3.1.
  • [30] J. Kim, J. Kwon Lee, and K. Mu Lee (2016) Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1646–1654. Cited by: §2.1.
  • [31] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.
  • [32] T. Köhler, M. Bätz, F. Naderi, A. Kaup, A. Maier, and C. Riess (2019) Toward bridging the simulated-to-real gap: benchmarking super-resolution on real data. IEEE transactions on pattern analysis and machine intelligence 42 (11), pp. 2944–2959. Cited by: §1, §2.1.
  • [33] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman (2009) Understanding and evaluating blind deconvolution algorithms. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1964–1971. Cited by: §2.2, §3.4.3.
  • [34] A. Levin (2006) Blind motion deblurring using image statistics. Advances in Neural Information Processing Systems 19, pp. 841–848. Cited by: §3.4.3.
  • [35] Levin, Anat and Weiss, Yair and Durand, Fredo and Freeman, William T (2011) Efficient marginal likelihood optimization in blind deconvolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2657–2664. Cited by: §2.2.
  • [36] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 136–144. Cited by: §4.1.1, TABLE II.
  • [37] Z. Luo, Y. Huang, S. Li, L. Wang, and T. Tan (2020) Unfolding the alternating optimization for blind super resolution. Advances in Neural Information Processing Systems 33. Cited by: §1, §3.4.1, §3.4.2, §3.4.3, TABLE I, §4.1.1, TABLE V.
  • [38] D. R. Martin, C. C. Fowlkes, D. Tal, and J. Malik (2001) A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001 2, pp. 416–423 vol.2. Cited by: §4.1.
  • [39] Y. Matsui, K. Ito, Y. Aramaki, A. Fujimoto, T. Ogawa, T. Yamasaki, and K. Aizawa (2016) Sketch-based manga retrieval using manga109 dataset. Multimedia Tools and Applications 76, pp. 21811–21838. Cited by: §4.1.
  • [40] T. Michaeli and M. Irani (2013) Nonparametric blind super-resolution. IEEE International Conference on Computer Vision, pp. 945–952. Cited by: §2.2, TABLE II.
  • [41] J. Pan, D. Sun, H. Pfister, and M. Yang (2018) Deblurring images via dark channel prior. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, pp. 2315–2328. Cited by: §2.2, TABLE I, Fig. 5, §4.1.1, §4.1.3.
  • [42] Pan, Jinshan and Sun, Deqing and Pfister, Hanspeter and Yang, Ming-Hsuan (2016) Blind image deblurring using dark channel prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1628–1636. Cited by: §2.2.
  • [43] K. Pearson (1901) LIII. on lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11), pp. 559–572. Cited by: §3.3.
  • [44] D. Ren, K. Zhang, Q. Wang, Q. Hu, and W. Zuo (2020) Neural blind deconvolution using deep priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3341–3350. Cited by: §3.1.
  • [45] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang (2016) Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1874–1883. Cited by: §2.1, §3.4.2.
  • [46] A. Shocher, N. Cohen, and M. Irani (2018) ”Zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Cited by: §2.2, TABLE I, §4.1.1, §4.1.1, §4.1.7, §4.2, TABLE II, TABLE V.
  • [47] J. W. Soh, S. Cho, and N. I. Cho (2020) Meta-transfer learning for zero-shot super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3516–3525. Cited by: §2.2.
  • [48] R. Timofte, S. Gu, J. Wu, and L. Van Gool (2018) Ntire 2018 challenge on single image super-resolution: methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 852–863. Cited by: §4.1.1, TABLE II.
  • [49] M. E. Tipping, C. M. Bishop, et al. (2003) Bayesian image super-resolution. Advances in Neural Information Processing Systems, pp. 1303–1310. Cited by: §3.1.
  • [50] D. Ulyanov, A. Vedaldi, and V. Lempitsky (2018) Deep image prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9446–9454. Cited by: §3.3.
  • [51] G. Wang, Y. Wei, S. Qiao, P. Lin, and Y. Chen (2018) Generalized inverses: theory and computations. Vol. 53, Springer. Cited by: §3.3.
  • [52] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy (2018) Esrgan: enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision Workshops, pp. 0–0. Cited by: §2.1.
  • [53] Z. Wang, J. Chen, and S. C. Hoi (2020) Deep learning for image super-resolution: a survey. IEEE transactions on pattern analysis and machine intelligence. Cited by: §1, §2.1.
  • [54] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli (2004) Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing 13 (4), pp. 600–612. Cited by: §4.1.
  • [55] Y. Yan, W. Ren, Y. Guo, R. Wang, and X. Cao (2017) Image deblurring via extreme channels prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4003–4011. Cited by: §2.2.
  • [56] J. Yu, Y. Fan, and T. Huang (2020) Wide activation for efficient image and video super-resolution. In British Machine Vision Conference, Cited by: §4.1.1, TABLE II.
  • [57] R. Zeyde, M. Elad, and M. Protter (2010) On single image scale-up using sparse-representations. In Curves and Surfaces, Cited by: §4.1.
  • [58] K. Zhang, L. V. Gool, and R. Timofte (2020) Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3217–3226. Cited by: §1, §2.2, §3.4, §4.1.1, TABLE II.
  • [59] K. Zhang, W. Zuo, S. Gu, and L. Zhang (2017) Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3929–3938. Cited by: §3.1.
  • [60] K. Zhang, W. Zuo, and L. Zhang (2018) Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3262–3271. Cited by: §1, §2.2, §3.3, TABLE II.
  • [61] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu (2018) Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision, pp. 286–301. Cited by: §2.1, §3.4.1, §4.1.1, TABLE II.