
Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution

03/27/2022
by   Jie Liang, et al.

Efficient and effective real-world image super-resolution (Real-ISR) is a challenging task due to the unknown complex degradation of real-world images and the limited computation resources in practical applications. Recent research on Real-ISR has achieved significant progress by modeling the image degradation space; however, these methods largely rely on heavy backbone networks and they are inflexible to handle images of different degradation levels. In this paper, we propose an efficient and effective degradation-adaptive super-resolution (DASR) network, whose parameters are adaptively specified by estimating the degradation of each input image. Specifically, a tiny regression network is employed to predict the degradation parameters of the input image, while several convolutional experts with the same topology are jointly optimized to specify the network parameters via a non-linear mixture of experts. The joint optimization of multiple experts and the degradation-adaptive pipeline significantly extend the model capacity to handle degradations of various levels, while the inference remains efficient since only one adaptively specified network is used for super-resolving the input image. Our extensive experiments demonstrate that the proposed DASR is not only much more effective than existing methods on handling real-world images with different degradation levels but also efficient for easy deployment. Codes, models and datasets are available at https://github.com/csjliang/DASR.

1 Introduction

Single image super-resolution (SISR) [20, 49, 65, 37, 45] is an active research topic in low-level vision, aiming at reconstructing a high-resolution (HR) version of a degraded low-resolution (LR) image. Since the seminal work of SRCNN [8], many convolutional neural network (CNN) based SISR methods [41, 22, 43, 66, 19] have been proposed, most of which assume a pre-defined degradation process (e.g., bicubic down-sampling) from HR to LR images. Despite their great success, the performance of these non-blind SISR methods deteriorates significantly on real-world images [30] because of the mismatch between the degradation model of the training data and that of the real-world test data [60].

The blind image super-resolution (BISR) methods [30, 61, 36, 14, 67] have been proposed to address the problems of non-blind SISR methods by considering more complex degradation kernels extracted from real-world images. However, the degradation space of these methods is actually restricted to a set of pre-collected kernels, such as the DPED kernel pool [67, 16]. For real-world images, the degradation space can be much larger, including more types of kernels and more complex ones than the DPED kernel pool, stronger and more complex noise, and other degradation operations such as compression. Therefore, much recent research has focused on the real-world image super-resolution (Real-ISR) task [4, 51, 35, 33, 10, 34, 18, 40] by modeling and synthesizing the complex degradation process of real-world images [3, 52]. Representative works include BSRGAN [60] and Real-ESRGAN [47], which introduce comprehensive degradation operations such as blur, noise, down-sampling, and JPEG compression, and control the severity of each operation by randomly sampling the respective hyper-parameters. They further employ a random shuffle of degradation orders [60] and a second-order degradation process [47], respectively, to better simulate real-world complex degradations.

Despite the remarkable progress of BSRGAN [60] and Real-ESRGAN [47] in improving image perceptual quality, they have several limitations for practical usage. On one hand, they are basically designed to work on severely degraded LR images. While BSRGAN and Real-ESRGAN can generate a certain amount of details on some tough LR images, they struggle to generate fine details on mildly degraded LR inputs. It is thus highly desirable to develop Real-ISR models that can handle images with different degradation levels. On the other hand, BSRGAN and Real-ESRGAN rely on heavy backbone networks (e.g., RRDB [49]), which makes them difficult to deploy on devices with limited computational resources [7, 63, 1, 55, 44]. Efficient Real-ISR models are therefore also highly desirable.

To tackle the above problems, in this paper we propose a degradation-adaptive super-resolution (DASR) network whose parameters are adaptively specified for each input image according to its degradation. Our DASR consists of a tiny regression network that estimates the degradation parameters of the input image and multiple light-weight super-resolution experts, which are jointly optimized on a balanced degradation space. For each input image, an adaptive network is constructed via a non-linear mixture of experts, whose weighting factors are specified by the estimated degradation parameters. The multiple super-resolution experts and the degradation-aware mixture significantly improve the model capacity for handling images of different degradations. Meanwhile, the whole pipeline of DASR is highly efficient and meets the requirements of practical Real-ISR tasks, as only one adapted network is employed to super-resolve the image during inference and the cost of mixing experts is negligible.

The contributions of this paper are two-fold. First, we propose a degradation-adaptive super-resolution network, which significantly improves the model capacity to super-resolve images of various degradation levels. Second, the pipeline of our DASR network is highly efficient, providing a good solution for performing Real-ISR in practical applications. Extensive experiments verify the effectiveness and efficiency of the proposed method.

2 Related Work

2.1 Real-World Image Super-Resolution

Effectively and efficiently reproducing the HR image from low-quality, low-resolution real-world images is a challenging problem in SISR research. The distribution of real-world images can differ dramatically due to the varying image degradation processes, different imaging devices, and image signal processing methods [30, 52]. Some works [4, 64] have tried to capture real-world HR-LR image pairs by adjusting the focal length of the camera, yet the data collection is tedious and can only describe a limited subspace of image degradations. Unsupervised methods [52, 10] have also been proposed to explore domain adaptation between synthesized LR images and real ones, yet the domain gap remains inevitable and deteriorates the SR performance [33, 35].

Recently, several Real-ISR methods such as BSRGAN [60], Real-ESRGAN [47] and SwinIR [28] have achieved remarkable progress by introducing comprehensive degradation models to effectively synthesize real-world-like training images. However, they rely on heavy and computationally intensive backbone networks, e.g., RRDB [49] and the Swin transformer [32], and are inflexible in processing images of different degradation levels. In this paper, we propose a degradation-adaptive framework to address this issue, targeting an effective and efficient network for the challenging Real-ISR task.

2.2 Image Degradation Modeling

In many non-blind SISR methods [25, 49, 65, 48, 20, 37, 11], the degradation model is simply assumed as bicubic down-sampling or blurred down-sampling with a Gaussian kernel. The performance of these non-blind methods can be dramatically undermined when applied to images with different degradations [30]. As a remedy, SRMD [61], UDVD [53] and some other methods [58, 62] extend the degradation space to cover more blur kernels and noise levels, and use the degradation map as additional input to perform conditional SISR. While these methods can handle multiple degradations with a single model, they rely on accurate degradation estimation, which itself is also a challenging task.

A few blind SISR methods have been proposed for unknown degradations [46, 15, 31, 39, 3, 56, 52]. In KMSR [67], a kernel pool is constructed from real photographs using a generative adversarial network [12], followed by synthesizing training pairs in a more realistic way. Some methods like IKC [14] and VBSR [6] incorporate a blur kernel estimator into the SISR framework, which can be adaptive to images degraded by different blur kernels [36, 23]. However, most blind SISR methods are trained with a pre-collected kernel pool [67, 16]; hence they are not truly blind and can hardly generalize to real-world images.

Recent Real-ISR methods such as BSRGAN [60] and Real-ESRGAN [47] further extend the degradation modeling space by incorporating comprehensive degradation types with randomly sampled degradation parameters to enhance the variation. The larger degradation space helps the trained Real-ISR model improve the perceptual quality of some tough LR inputs. However, the degradation parameter sampling in BSRGAN and Real-ESRGAN is unbalanced, which hinders training a flexible network and limits the trained model in generating fine details, especially for inputs with mild degradations. In this work, we propose to balance the degradation space by partitioning it into three levels with balanced sampling frequencies. Such a balanced space facilitates the optimization of our degradation-adaptive model on different degradation levels and brings a better approximation to real-world LR images.

2.3 Mixture of Experts and Dynamic Convolution

The mixture of experts (MoE, [17, 21, 13, 2]) is a long-standing method that calculates the weighted sum of multiple expert networks to improve the performance. A trainable gating network is employed to compute the weight for activating each expert [38], usually based on an explicit (e.g., labeled classes) or implicit (content clustering) partition of the data. In this paper, we calculate the adaptive weight of experts according to the degradation of the image for the Real-ISR tasks. Besides, instead of activating all experts and calculating the weighted sum of outputs as in previous MoE methods [50], we adaptively mix the network parameters, resulting in only one adapted network for inference. Such a pipeline is effective and efficient due to the increased non-linearity and the fast inference.

Dynamic convolution [5, 27] or conditional convolution [26, 54] aims to enhance the feature representation capacity by making the convolutional parameters sample-adaptive. Most existing methods optimize multiple sets of convolutional parameters and learn feature self-attention to linearly combine the parameters. However, this pipeline introduces many computations to obtain the self-attention, forcing a trade-off between efficiency and effectiveness. In this paper, we achieve the non-linear mixture of experts via an adapted conditional convolution, where the conditions are the degradation parameters and the weighting factors are calculated once for all layers to maintain efficiency.

3 Methodology

Figure 1: Overall pipeline of the proposed DASR. For each convolution layer, the parameters of the experts are mixed according to the degradation-adaptive weighting factors, and the LR input is super-resolved by the resulting adapted network to approximate the ground-truth HR image.

This section presents our degradation-adaptive network for real-world image super-resolution, i.e., DASR. As shown in Figure 1, DASR mainly consists of a degradation prediction network and a CNN-based SR network with multiple experts. In the following sections, we first provide the details of the proposed DASR framework and then introduce our degradation modeling to set degradation parameters and generate training pairs.

3.1 Degradation-Adaptive Super-Resolution

Degradation prediction network. To allow efficient and degradation-adaptive super-resolution, we propose to estimate the degradation parameters of each input image via a lightweight regression network. We employ a set of parameters to elaborately describe the degradation space; the details of degradation space modeling will be discussed in Section 3.2. To make the estimation process efficient, the prediction network consists of several convolution layers with Leaky ReLU activation, followed by a global average pooling layer: the convolution layers extract image spatial degradation features, and the global average pooling layer aggregates them into the estimated degradation parameters.
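As an illustration, a minimal PyTorch sketch of such a prediction network is given below; the layer count, channel width and the output dimension are assumed placeholder values, not the configuration reported in this paper.

```python
import torch
import torch.nn as nn

class DegradationPredictor(nn.Module):
    """Tiny regression network: conv + LeakyReLU stack, then global average pooling.

    num_layers, width and num_params below are illustrative assumptions,
    not the settings used in the paper.
    """
    def __init__(self, in_ch=3, width=64, num_layers=5, num_params=33):
        super().__init__()
        layers = []
        ch = in_ch
        for _ in range(num_layers):
            layers += [nn.Conv2d(ch, width, 3, padding=1),
                       nn.LeakyReLU(0.1, inplace=True)]
            ch = width
        layers.append(nn.Conv2d(ch, num_params, 3, padding=1))
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, lr_image):
        feat = self.features(lr_image)       # spatial degradation features
        v_hat = self.pool(feat).flatten(1)   # (B, num_params) estimated degradation
        return v_hat

net = DegradationPredictor()
v_hat = net(torch.randn(1, 3, 64, 64))       # -> torch.Size([1, 33])
```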

To optimize the prediction network, we introduce a regression loss between the estimated degradation parameters and the ground-truth ones using the ℓ1-norm distance as follows:

L_reg = ‖ v̂ − v ‖_1,     (1)

where v and v̂ denote the ground-truth and estimated degradation parameter vectors, respectively. According to the degradation model, each parameter in v is randomly sampled to specify the degradation process used to generate the LR-HR image pairs.

Image super-resolution network. An ideal Real-ISR method is expected to be both effective and efficient. On one hand, in real-world SR tasks the computation resources are usually limited, especially on edge devices. On the other hand, the model should be able to effectively handle images with various kinds of degradations. Nevertheless, most current SR methods [28, 60, 47, 25, 29] can only trade off between efficiency and effectiveness, and they are inflexible in handling images with different degradation types and levels.

To develop an effective and efficient Real-ISR model, we propose a degradation-adaptive SR network that boosts the model capacity via a non-linear mixture of experts (MoE) strategy, whose additional cost is negligible during inference. Specifically, we employ multiple convolutional experts, each of which is a light-weight SR network, e.g., SRResNet [25] or EDSR-M [29], with its own independent parameters. All experts share the same network topology and are optimized jointly under the supervision of the same loss. Our idea is to implicitly train each expert to handle images falling into a sub-space of the degradation space so that the experts can work together to process images with various kinds of degradations in the whole space.

A vector of weighting factors, which is adaptive to the degradation of the input image, is then calculated to adaptively mix the experts. The weighting factors are computed from the estimated degradation parameters via a tiny network with two fully-connected layers. As both the degradation parameters and the weighting vector are of low dimension, this weighting network is highly efficient. Note that if the weighting vector were constrained to be one-hot, only one expert would be activated for super-resolving the input, which degrades our framework to a competitive MoE [38]; such a design may perform well on tasks whose sample distribution space can be partitioned with clear boundaries, yet it can hardly work well for the Real-ISR task with a large and continuous degradation space.
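A hedged sketch of this weighting module is shown below: two fully-connected layers map the estimated degradation vector to one weighting factor per expert. The dimensions and hidden width are assumptions of the sketch, and, following the ablation discussion in Section 4.5, no sigmoid is applied to the output.

```python
import torch.nn as nn

class ExpertWeighting(nn.Module):
    """Two fully-connected layers mapping the estimated degradation
    parameters to one weighting factor per expert (illustrative sketch).

    num_params, hidden and num_experts are assumed placeholder values.
    """
    def __init__(self, num_params=33, hidden=64, num_experts=5):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_params, hidden),
            nn.LeakyReLU(0.1, inplace=True),
            nn.Linear(hidden, num_experts),
        )

    def forward(self, v_hat):
        # one weight vector per image; reused by every convolution layer,
        # left unconstrained (no sigmoid/softmax) in this sketch
        return self.mlp(v_hat)   # (B, num_experts)
```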

With the multiple experts and their adaptive weighting factors, we mix the experts adaptively in a non-linear manner. For each convolution layer l of the desired network, we employ the dynamic convolution technique [54, 5] to parameterize the convolutional kernels as follows:

F^l_out = A( (Σ_i w_i · W^l_i) * F^l_in ),     (2)

where F^l_in and F^l_out denote the input and output features of layer l, w_i indicates the i-th element of the weighting vector, W^l_i denotes the layer-l parameters of expert i, and A is the activation function. That is, we adaptively fuse the parameters of each layer among all experts, resulting in a single adapted network.

Note that in classic dynamic convolution, the weighting factors of each layer are calculated by an independent network conditioned on the feature map of the preceding layer, thus introducing non-negligible computational costs. In contrast, we learn a single set of degradation-adaptive weighting factors shared by all convolution layers, which is very efficient. Our framework follows the spirit of MoE but acts in a non-linear manner due to the activation operations in intermediate layers. The non-linearity and the degradation-adaptive mixture of multiple experts significantly extend the model capacity to handle degradations of various levels.
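The following sketch illustrates the parameter-level mixture of Eq. (2) for one convolution layer: each expert holds its own kernel, the shared degradation-adaptive weights form a single fused kernel, and only that fused kernel is convolved with the feature map. The shapes, bias handling and per-sample loop are choices of this illustration, not the released implementation; a grouped-convolution trick could batch the loop.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedExpertConv(nn.Module):
    """One convolution layer whose kernel is a weighted sum of expert kernels
    (sketch of the mixture in Eq. (2), with assumed shapes)."""
    def __init__(self, in_ch, out_ch, ksize=3, num_experts=5):
        super().__init__()
        self.weight = nn.Parameter(
            torch.randn(num_experts, out_ch, in_ch, ksize, ksize) * 0.02)
        self.bias = nn.Parameter(torch.zeros(num_experts, out_ch))
        self.padding = ksize // 2

    def forward(self, x, expert_weights):
        # x: (B, in_ch, H, W); expert_weights: (B, num_experts), shared by all layers
        outs = []
        for b in range(x.size(0)):
            w = expert_weights[b]                                  # (num_experts,)
            kernel = (w[:, None, None, None, None] * self.weight).sum(0)
            bias = (w[:, None] * self.bias).sum(0)
            outs.append(F.conv2d(x[b:b + 1], kernel, bias, padding=self.padding))
        return torch.cat(outs, dim=0)

# usage: an adapted layer followed by the activation, as in Eq. (2)
layer = MixedExpertConv(64, 64)
feat = torch.randn(2, 64, 32, 32)
w = torch.randn(2, 5)                 # degradation-adaptive weights
out = F.leaky_relu(layer(feat, w), 0.1)
```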

Our DASR is very efficient. During inference, only one adapted network is deployed, rather than all expert models as in classic MoE methods [17, 21]. The degradation prediction network and the weighting module are also very light-weight, so the inference cost is of the same order as that of a single expert network. The computational overhead caused by the mixture operation is negligible: the mixture only involves multiplications and additions over the expert parameters, whose number is independent of the size of the input image and, for a light-weight backbone such as SRResNet or EDSR-M, is small. Compared with the calculation of multiple feature maps, the complexity of mixing parameters is therefore several orders of magnitude lower and can be neglected.
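To make this complexity argument concrete, the back-of-the-envelope comparison below contrasts the cost of mixing parameters with the cost of a single 3x3 convolution; the expert parameter count follows the SRResNet entry in Table 2, while the feature-map size is an assumed example.

```python
# Rough cost comparison in multiply-adds; sizes are assumed for illustration.
num_experts = 5
params_per_expert = 1.5e6                 # order of an SRResNet-sized expert (Table 2)
mix_ops = num_experts * params_per_expert # weighted sum over all expert parameters

H, W, C_in, C_out, k = 270, 480, 64, 64, 3
conv_ops = H * W * C_in * C_out * k * k   # a single 3x3 conv layer on one feature map

print(f"parameter mixing : {mix_ops:.2e} MACs")    # ~7.5e6
print(f"single conv layer: {conv_ops:.2e} MACs")   # ~4.8e9
```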

3.2 Degradation Modeling

Since high-quality real-world LR-HR pairs are hard to collect due to the misalignment issue [4, 64], degradation modeling is very important for synthesizing realistic LR inputs from a given HR image for Real-ISR model training. A degradation space should be pre-defined to synthesize training pairs and perform degradation-adaptive optimization. The degradation of an LR sample is controlled by a degradation parameter vector v, whose entries specify the types and severities of the degrading operations. In our DASR, v also serves as the ground-truth for training the degradation prediction network.

The image degradation model has recently been improved significantly, from simple bicubic down-sampling [8, 49] to shuffled [60] and second-order [47] pipelines. We adopt the degradation operations of blurring (both isotropic and anisotropic Gaussian blur), resizing (both down-sampling and up-sampling with area, bilinear and bicubic operations), noise corruption (both additive Gaussian and Poisson noise), and JPEG compression in our modeling. In v, we use a one-hot code to indicate the type of a degradation operation and a single value to record its degradation level, normalized by the respective dynamic range.

It is worth mentioning that, different from methods [61, 14] which quantify a blur kernel by its kernel coefficients, we quantify a blurring degradation by its kernel size, the standard deviations along the two principal axes, and the rotation degree. In this way, the degradation parameters are more interpretable in specifying the degradation types and levels, and can better support the degradation-aware mixture of experts. Meanwhile, the resulting parameter vector has a much lower dimension than a full kernel vector, which would be much harder to estimate. Benefiting from the interpretability and compactness of the degradation space, our DASR allows explicit user control over the degradation parameters during inference, which facilitates user-interactive applications that customize the desired super-resolving effect.
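As an illustration of this compact blur parameterization, the sketch below synthesizes an anisotropic Gaussian kernel from its kernel size, the two principal standard deviations and the rotation angle, i.e., the only blur-related information the degradation vector needs to carry; the implementation details are our own and only indicative.

```python
import numpy as np

def anisotropic_gaussian_kernel(ksize, sigma1, sigma2, theta):
    """Build a ksize x ksize anisotropic Gaussian kernel from the four
    interpretable blur parameters (illustrative sketch)."""
    # covariance matrix of the Gaussian, rotated by theta
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    cov = rot @ np.diag([sigma1 ** 2, sigma2 ** 2]) @ rot.T
    inv_cov = np.linalg.inv(cov)

    r = (ksize - 1) / 2.0
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    coords = np.stack([xs, ys], axis=-1)                      # (ksize, ksize, 2)
    expo = np.einsum('...i,ij,...j->...', coords, inv_cov, coords)
    kernel = np.exp(-0.5 * expo)
    return kernel / kernel.sum()

# four scalars describe the blur, instead of ksize*ksize kernel coefficients
k = anisotropic_gaussian_kernel(ksize=21, sigma1=3.0, sigma2=1.0, theta=np.pi / 6)
print(k.shape, k.sum())   # (21, 21), sums to ~1
```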

Though the shuffled degradation pipeline in BSRGAN [60] and the second-order degradation pipeline in Real-ESRGAN [47] can generate a sufficiently large degradation space, it is hard for them to train a model that can adaptively handle images with different levels of degradations. Our DASR is designed to be adaptive to a wide range of real-world inputs with multiple light-weight expert networks, each of which is expected to handle a subspace of images with a certain degradation level. Therefore, we partition the whole degradation space into three levels by specifying the parameter ranges accordingly. Among them, Level-I and Level-II are generated with first-order degradation using small and large parameter ranges, respectively, while Level-III is generated by second-order degradation. Due to space limitations, more details of the degradation operations and the specification of the parameter ranges are provided in Section 6.1 in the Appendix.

3.3 Training Losses

The learnable modules of our DASR network include the degradation prediction network, the expert weighting module and the super-resolution experts. As mentioned in Section 3.1, the regression loss L_reg is used to optimize the prediction network. To optimize the overall framework, following many works in the literature [47, 60, 49], we adopt the ℓ1-norm pixel-wise loss L_pix, the perceptual loss L_percep and the adversarial loss L_adv. The total loss is defined as follows (more details are provided in Section 6.2 in the Appendix):

L_total = L_reg + L_pix + λ_percep · L_percep + λ_adv · L_adv,     (3)

where λ_percep and λ_adv denote the balancing parameters.
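A minimal sketch of how these training losses could be combined is given below; the loss terms follow the description above, while the balancing values and the perceptual/adversarial callables are placeholders rather than the paper's exact settings.

```python
import torch

def total_loss(sr, hr, v_hat, v_gt, percep_fn, adv_fn,
               lambda_percep=1.0, lambda_adv=0.1):
    """Combine the DASR training losses as described in Section 3.3 (sketch).

    percep_fn / adv_fn stand in for a VGG-based perceptual loss and a GAN
    generator loss; the balancing weights are illustrative assumptions.
    """
    l_reg = torch.abs(v_hat - v_gt).mean()   # degradation regression, Eq. (1)
    l_pix = torch.abs(sr - hr).mean()        # pixel-wise fidelity
    l_percep = percep_fn(sr, hr)             # perceptual loss
    l_adv = adv_fn(sr)                       # adversarial (generator) loss
    return l_pix + lambda_percep * l_percep + lambda_adv * l_adv + l_reg
```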

4 Experiments

4.1 Training Details

Following previous works [49, 47], we employ DIV2K, Flickr2K, and OutdoorSceneTraining datasets for training our DASR model. For efficiency, we employ the SRResNet [25] as our backbone. The weights of the experts are initialized by the model pre-trained with pixel-wise loss. The Adam [24] optimizer is employed to train the network. The learning rate is set to , the total batch size is and the training iteration is set to . We balance the training loss with . Without loss of generality and for a fair comparison, we conduct Real-ISR experiments with the scale factor of by following the setting in BSRGAN [60] and Real-ESRGAN [47]. In our experiment, the dimension of degradation parameters is and the number of experts is . The LR patch size is set to .

4.2 Evaluation and Compared Methods

We evaluate our DASR method both quantitatively and qualitatively. For quantitative evaluation, as in BSRGAN [60], we synthesize LR-HR pairs by applying the three levels of degradations to the validation images of the DIV2K dataset, yielding one set of LR-HR pairs for each level. We also make a comparison on the original DIV2K dataset with bicubic downsampling. An illustration of images with different degradations is shown in Fig. 2, and more samples are shown in Section 6.3 in the Appendix. For qualitative evaluation, we also employ the images in the RealSRSet [60, 47], where the input images are corrupted by various blur, noise, or other real degradation operations.

We compare the proposed DASR with representative and state-of-the-art SR methods, including RRDB [49], ESRGAN [49], IKC [14], BSRGAN [60], Real-ESRGAN [47] and Real-SwinIR (-M and -L) [28]. Among them, RRDB is trained on bicubic degradation with pixel-wise loss; ESRGAN is trained on bicubic degradation with pixel-wise, perceptual and adversarial losses; IKC is a representative BISR method trained on various isotropic Gaussian blur kernels; BSRGAN and Real-ESRGAN are state-of-the-art Real-ISR methods with a heavy RRDB backbone; Real-SwinIR is trained on the degradation space of BSRGAN with the computationally expensive SwinIR backbone.

For a more comprehensive and fair comparison, we also re-train those commonly used backbone networks, including SRResNet, EDSR, RRDB, and SwinIR, with our constructed training dataset. Following the common practice [60, 47], we employ PSNR (the larger the better) and LPIPS (learned perceptual image patch similarity, the smaller the better) to quantitatively compare the performance of different methods on synthetic datasets, and make visual comparisons on real-world images since there are no reference images.

4.3 Quantitative Comparison

Figure 2: Sample images with different levels of degradations in our datasets.

Effectiveness. In Table 1 and Table 2, we quantitatively compare the performance of competing methods in terms of PSNR and LPIPS on datasets with different levels of degradations. Specifically, Table 1 compares the methods trained with their own degradation models, while Table 2 compares the methods re-trained on our proposed degradation space.

As shown in Table 1, existing methods can only achieve satisfactory performance on datasets with a specific type of degradation. For example, RRDB and ESRGAN can respectively achieve good fidelity and perceptual quality on the bicubic-downsampled dataset, yet their performance drops dramatically when handling images with other degradations, even for the ‘Level-I’ degradation with mild noise and blurs. Real-ESRGAN, BSRGAN, and Real-SwinIR perform well on the most severely degraded dataset, i.e., ‘Level-III’. However, their performance deteriorates much on the other three datasets.

In contrast, our DASR achieves stable and significant improvements over the other methods under the first three types of degradations, which cover the majority of real-world images, while achieving highly competitive (among the best two) results for the last type of degradation. For example, DASR outperforms Real-ESRGAN by about 1.7 dB in PSNR and 0.06 in LPIPS on the ‘Level-I’ dataset. On the ‘Level-III’ dataset with severely degraded images (as shown in Fig. 2 (d)), DASR achieves almost the same PSNR and LPIPS indices as BSRGAN. These observations clearly demonstrate that our DASR can generalize well to images with a wide range of degradations.

To further validate the effectiveness of our degradation-adaptive strategy, in Table 2 we re-train the backbones of popular SR models on our proposed degradation space. Note that the heavy RRDB backbone is adopted in both BSRGAN and Real-ESRGAN, while the lightweight SRResNet is adopted as the backbone of our DASR. As can be seen from the table, with the same network topology and similar computational overhead, our DASR outperforms the baseline SRResNet on all datasets by a large margin, e.g., improving PSNR by 0.5 dB on the bicubic-downsampled dataset and LPIPS by about 0.01 on the Level-II dataset. This demonstrates that the degradation-adaptive mixture of multiple experts can significantly extend the model capacity while maintaining efficiency.

Compared to the RRDB and SwinIR backbones adopted in recent state-of-the-art methods [60, 47, 28], our DASR consumes much fewer computational resources, e.g., about 31% and 8% of the latency of RRDB and SwinIR, respectively. At the same time, DASR outperforms these heavy models in terms of reconstruction fidelity on all datasets, demonstrating the effectiveness of degradation-adaptive super-resolution and its high efficiency for practical deployment.

D-Level Metric RRDB ESRGAN IKC BSRGAN Real-ESRGAN Real-SwinIR-M Real-SwinIR-L DASR
Bicubic PSNR 30.92 28.17 28.01 27.32 26.65 26.83 27.21 28.55
LPIPS 0.2537 0.1154 0.2695 0.2364 0.2284 0.2221 0.2135 0.1696
Level-I PSNR 26.27 21.16 24.09 26.78 26.17 26.21 26.45 27.84
LPIPS 0.3419 0.4727 0.3805 0.2412 0.2312 0.2247 0.2161 0.1707
Level-II PSNR 26.46 22.77 25.39 26.75 26.16 26.12 26.39 27.58
LPIPS 0.4441 0.4900 0.4531 0.2462 0.2391 0.2313 0.2213 0.2126
Level-III PSNR 23.91 23.63 22.91 24.05 23.81 23.34 23.46 23.93
LPIPS 0.7631 0.7314 0.7583 0.3995 0.3901 0.3844 0.3765 0.4144
Table 1: Quantitative comparison of different methods on datasets with different degradations (D-Level). ‘Bicubic’ denotes the DIV2K validation set with bicubic degradation, while ‘Level I’, ‘II’, and ‘III’ denote the datasets with mild, medium, and severe degradations, respectively. For the compared methods, we employ their officially released pre-trained models. The PSNR is calculated on the Y channel of YCbCr space.

Efficiency. The inference efficiency is a crucial factor in Real-ISR tasks due to the limited computational resources in practical applications. We compare different backbone networks in terms of multiple efficiency-related metrics and depict the results in the bottom rows of Table 2.

Data & Metrics SRResNet EDSR SwinIR RRDB DASR

Bicubic PSNR 28.05 28.26 28.28 27.92 28.55
LPIPS 0.1747 0.1807 0.1488 0.1473 0.1696
Level-I PSNR 27.60 27.79 27.78 27.84 27.84
LPIPS 0.1772 0.1834 0.1531 0.1569 0.1707
Level-II PSNR 27.34 27.53 27.45 27.29 27.58
LPIPS 0.2228 0.2284 0.1854 0.1886 0.2126
Level-III PSNR 23.71 23.87 23.60 23.54 23.93
LPIPS 0.4419 0.4351 0.3869 0.3847 0.4144
Latency (ms) 113 105 1719 460 142
#FLOPs (GMac) 166 130 539 1176 184
#Params (M) 1.52 1.52 11.72 16.70 8.07
#Memory (M) 2359 2169 2699 2417 2452
Table 2: Quantitative comparison of different backbone networks re-trained on our proposed degradation space, together with the efficiency comparison (bottom rows). The evaluation datasets are the same as in Table 1. For the efficiency evaluation, the input-dependent FLOPs metric is calculated on LR inputs of a fixed resolution; the Latency and Memory are the average inference time and the maximum GPU memory allocation on the DIV2K validation dataset. Statistics are collected following the implementation of [57, 59] by using an NVIDIA V100 GPU.

As shown in the table, the computational overhead of different backbone networks differs dramatically. For example, RRDB [49], which is employed in recent Real-ISR methods [60, 47], consumes about 7 times the FLOPs and more than 4 times the inference time of SRResNet [25]. In other words, the RRDB-based Real-ISR methods achieve superior performance at the price of applicability. The recent transformer-based SwinIR has an acceptable number of FLOPs; however, it actually consumes much more inference time due to the heavy attention computation and frequent memory access.

Benefiting from the light SRResNet-based backbone and the efficient degradation prediction and parameter fusion, our DASR is very efficient. Specifically, the degradation prediction network and the weighting module together introduce only a small overhead in FLOPs, latency, parameters and GPU memory. Besides, the cost of the parameter fusion operation is negligible, as it involves only as many multiplications and additions as there are expert parameters, and they can be computed in parallel. Compared with classical MoE methods that mix the feature maps of all experts [17, 21, 9, 50], our DASR conducts only one forward pass. As a result, the computational cost increases only slightly with a larger number of experts, which supports a flexible extension of model capacity.

It is worth mentioning that although our model has more parameters, the maximum GPU memory consumption does not increase much, as shown in the #Memory row of Table 2, since storing model parameters costs much less memory than storing input-dependent feature maps. Moreover, the increased model parameters do not demand much storage space, which is much easier to afford than computing power.

4.4 Qualitative Comparison

Figure 3: Qualitative comparison of competing methods on images with different degradations. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.

Fig. 3 shows visual comparisons between different methods on images with different degradations. One can see that DASR stably restores sharp and realistic details and removes artifacts over a wide range of degradations. Specifically, the first sample image is degraded with bicubic downsampling and suffers from aliasing. Neither BSRGAN nor Real-ESRGAN can generate satisfactory texture details, even with the heavy RRDB backbone. This is because these two methods are trained on pairs with relatively severe degradations, so that their denoising capacity is strengthened while their detail-generation capacity is limited. Similar observations can be made on all four samples in Fig. 3.

The RRDB backbone trained with pixel-wise loss performs well on the first two samples in generating texture details, yet it does not generalize to the last two samples whose degradations are severe. This is reasonable since all its training pairs are generated by bicubic downsampling. In addition, the results of RRDB on the first and third samples are blurry, which is a well-acknowledged side-effect of the pixel-wise loss. By applying perceptual and adversarial losses, ESRGAN achieves sharper results yet introduces many visual artifacts due to the instability of training generative adversarial networks. ESRGAN also amplifies the noise, as shown in the second sample. By considering different blur kernels, IKC can restore rich textures on most images, yet it brings overshoot artifacts when facing unseen kernels in real-world images (the fourth sample). It also lacks the capacity to remove noise, as shown in the second sample.

The results of Real-SRGAN are obtained by re-training the SRResNet backbone on our proposed degradation space with the same loss as in Real-ESRGAN [47]. It can be observed that, due to its limited feature representation capacity, Real-SRGAN does not perform well on the four samples compared to our DASR. In the first three samples, Real-SRGAN generates messy details or artifacts, as the light-weight model limits its capacity to achieve degradation-adaptive super-resolution. On the last sample, which is a real-world image, Real-SRGAN fails to reconstruct rich details. In contrast, our proposed DASR outperforms the others in reconstructing realistic details and suppressing artifacts, thanks to the effective degradation-adaptive framework and the joint optimization of multiple experts. More visual comparisons can be found in Section 6.4 in the Appendix.

Figure 4: Ablation study. (a) and (b) validate models with different numbers of experts; (c) appends a sigmoid layer to the weighting module; (d) conducts classical MoE [17, 21, 9, 50] where the outputs of multiple experts are fused; (e) performs dynamic convolution with a single expert by learning a mapping matrix and multiplying it with the parameters; (f) conducts dynamic convolution following the work [27]; (g) applies the EDSR-M backbone to DASR; (h) denotes our default DASR model.

4.5 Ablation Study

We conduct comprehensive ablation studies on our proposed DASR model by using real-world images and depict the visual results in Fig. 4.

Effectiveness of the number of experts. The models in Figs. 4(a) and (b) evaluate the selection of the number of experts. It can be seen that using too few experts leads to relatively smooth results, while the models in (h) and (b) with more experts enhance the generation of details. As the model in (b) shows similar visual quality to our default model in (h), we consider the default number of experts sufficient to model the proposed degradation space.

Effectiveness of model design. Figs. 4(c) and (d) validate the effectiveness of our model design. The result in (c) demonstrates that adding a sigmoid layer to the weighting module does not improve the performance; as we mix different experts in terms of model parameters, there is no need to enforce positive weights with a sigmoid layer. The experts in Fig. 4(d) are fused following the strategy of classical MoE [17, 21, 9, 50], where all experts are forwarded and their outputs are fused. We can see that the result of classical MoE in (d) lacks fine details compared to (h), yet its computational cost is several times that of our DASR.

Effectiveness of different dynamic convolutions. Figs. 4(e) and (f) compare different dynamic convolution schemes [27, 53] that do not introduce many additional parameters. While their inference latency and FLOPs are increased, the performance of these methods drops, e.g., the artifacts generated in (e). We believe it is the joint optimization of multiple experts and the degradation-adaptive mixture that makes our DASR more effective than these alternatives.

Generalization to different backbones. Fig. 4(g) applies the EDSR-M backbone to DASR. The satisfactory perceptual quality of (g) demonstrates the generalization capacity of our proposed DASR to different backbone networks.

4.6 User-Interactive Super-resolution

Figure 5: Example of user-interactive super-resolution. (a) is the input image with bicubic upsampling; (b) is the result of DASR where the degradation parameters are estimated automatically by model ; (c) and (d) are generated by manually increasing and decreasing the scale of blur kernel, respectively; (e) and (f) are the super-resolution results by manually increasing and decreasing the level of noise, respectively.

One interesting advantage of our DASR over other Real-ISR methods is that it supports easy user-interactive super-resolution during inference, owing to its interpretable and compact degradation representation.

We depict an example of user-interactive super-resolution in Fig. 5. As can be seen, the proposed DASR allows explicit user control to customize the super-resolution effects. Manually setting larger values to the blur-related parameters (e.g., kernel scale) leads to sharper super-resolution results, as shown in Fig. 5(c), while adjusting the level of noise can flexibly balance between image details and noise, as shown in Fig. 5(e) and (f). Such an advantage of flexible user control makes our DASR very attractive in practical Real-ISR tasks.

5 Conclusion

In this paper, we proposed an efficient degradation-adaptive network, namely DASR, for the real-world image super-resolution (Real-ISR) task. To improve the modeling capacity and flexibility for various degradation levels, we jointly learned multiple super-resolution experts and adaptively mixed them into a single network in a degradation-aware manner. The proposed DASR is not only degradation adaptive but also efficient during inference. Extensive quantitative and qualitative experiments demonstrated that DASR achieves superior performance on images with a wide range of degradation levels while keeping good efficiency for easy deployment. In addition, DASR allows easy user control for customized super-resolution results.

6 Appendix

6.1 Detailed Settings of Degradation Modeling

We report the detailed parameter settings of our degradation modeling in Table 3. We partition the whole degradation space into three levels and randomly select one of them, with a balanced probability of 1/3, to generate the LR-HR image pairs during training. For the blur operation, we use isotropic and anisotropic Gaussian kernels with certain probabilities, where the two standard deviations are set equal if an isotropic blur kernel is specified. In the second degradation stage of Level-III, following the practice in Real-ESRGAN [47], we skip the blur operation with a certain probability and perform sinc kernel filtering with a certain probability. We finally resize the image to the desired LR size, i.e., the original size divided by the scale factor.

For those operations that have more than one mode, e.g., the resize mode, we use a one-hot vector to indicate the choice of mode in v. For the other parameters, we normalize each of them as x_norm = (x − x_min) / (x_max − x_min), where x, x_norm, x_min and x_max indicate the original value, the normalized value, and the minimum and maximum values of the parameter, respectively.
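For illustration, the snippet below encodes one degradation operation into the vector in the way just described: a one-hot code for the operation mode and a min-max normalized value for each scalar parameter. The example operation, its modes and its parameter range are assumed placeholders, not entries of Table 3.

```python
import numpy as np

def one_hot(index, num_modes):
    v = np.zeros(num_modes, dtype=np.float32)
    v[index] = 1.0
    return v

def normalize(value, vmin, vmax):
    # x_norm = (x - x_min) / (x_max - x_min)
    return (value - vmin) / (vmax - vmin)

# Example: encode a resize operation (modes and range are assumed placeholders).
resize_modes = ['area', 'bilinear', 'bicubic']
entry = np.concatenate([
    one_hot(resize_modes.index('bicubic'), len(resize_modes)),  # resize mode
    [normalize(0.8, 0.5, 1.5)],                                 # scale factor
]).astype(np.float32)
print(entry)   # approximately [0, 0, 1, 0.3]
```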

6.2 Details of Training Losses

As discussed in Section 3.3, the total training loss is defined in Eq. (3) of the main paper, where the regression loss is given in Eq. (1). For the other three losses, the settings are the same as in Real-ESRGAN [47]. Specifically, the pixel loss is defined as the ℓ1 distance between the super-resolved image and the ground-truth HR image. For the perceptual loss, we first extract feature maps from five convolution stages of the super-resolved and ground-truth images by using the pre-trained VGG19 network [42], then calculate the weighted sum of the respective distances between the corresponding feature maps as the perceptual loss, where the weights are set to [0.1, 0.1, 1, 1, 1]. For the adversarial loss, the U-Net discriminator with spectral normalization is adopted.
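A hedged sketch of such a VGG-based perceptual loss is given below. The choice of which VGG19 activations to tap, the slicing indices and the pretrained-weight handling are assumptions of this sketch; only the five-stage structure and the stage weights [0.1, 0.1, 1, 1, 1] follow the text above.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    """Weighted sum of l1 distances between VGG19 feature maps of the
    super-resolved and ground-truth images (illustrative sketch).

    The slice indices pick one activation per VGG stage and are an
    assumption of this sketch; the stage weights follow the paper's text.
    """
    def __init__(self, weights=(0.1, 0.1, 1.0, 1.0, 1.0)):
        super().__init__()
        features = vgg19(weights='IMAGENET1K_V1').features.eval()
        for p in features.parameters():
            p.requires_grad = False
        cuts = [4, 9, 18, 27, 36]          # assumed cut points, one per stage
        self.stages = nn.ModuleList()
        prev = 0
        for c in cuts:
            self.stages.append(features[prev:c])
            prev = c
        self.weights = weights

    def forward(self, sr, hr):
        loss, x, y = 0.0, sr, hr
        for stage, w in zip(self.stages, self.weights):
            x, y = stage(x), stage(y)      # features of consecutive VGG stages
            loss = loss + w * torch.abs(x - y).mean()
        return loss
```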

6.3 More Sample Images

In Fig. 6, we provide more sample images with different degradation levels in our datasets, as well as the ground-truth HR images. As can be seen from the figure, those images can cover a wide range of real-world degradations. The balanced sampling from the three levels during training improves the generalization capacity of our DASR to real-world images with different degradations.

6.4 More Qualitative Comparisons

In Fig. 7, we provide more qualitative comparisons of competing methods on real-world images, while in Figs. 8, 9, 10 and 11, we provide more qualitative comparisons of competing methods on datasets with bicubic, Level-I, Level-II and Level-III degradations, respectively. Our models are trained using the images in the DIV2K, Flickr2K, and OutdoorSceneTraining datasets. To further validate the generalization capability of DASR to different image contents, the visual comparisons in Figs. 8, 9, 10 and 11 also include images from the Urban100 dataset, generated with the same degrading strategy as in our main paper. From these figures, observations consistent with our main paper can be made: our DASR generates more realistic structures and details under different degradations, benefiting from its degradation-adaptive strategy and the joint training and adaptive mixture of multiple experts.

Level Operation Parameter Stage-1 Range Stage-2 Range
Blur kernel size - -
standard deviation - -
standard deviation - -
rotation degree - -
Resize [up, down, keep] - - -
scale factor - -
resize mode [‘a’, ‘b’, ‘b’] - -
Noise type [‘G’, ‘P’] - -
sigma of Gaussian - -
scale of Poisson - -
gray probability - -
JPEG quality factor - -
mode of final resize [‘a’, ‘b’, ‘b’] - -
Blur kernel size - -
standard deviation - -
standard deviation - -
rotation degree - -
Resize [up, down, keep] - - -
scale factor - -
resize mode [‘a’, ‘b’, ‘b’] - -
Noise type [‘G’, ‘P’] - -
sigma of Gaussian - -
scale of Poisson - -
gray probability - -
JPEG quality factor - -
mode of final resize [‘a’, ‘b’, ‘b’] - -
Blur kernel size
standard deviation
standard deviation
rotation degree
sinc kernel size - -
of sinc kernel - -
Resize [up, down, keep] - -
scale factor
resize mode [‘a’, ‘b’, ‘b’] [‘a’, ‘b’, ‘b’]
Noise type [‘G’, ‘P’] [‘G’, ‘P’]
sigma of Gaussian
scale of Poisson
gray probability
JPEG quality factor
operating order - - R-J or J-R
mode of final resize - - [‘a’, ‘b’, ‘b’]
Table 3: Detailed parameter settings of the degradation sub-spaces of the three levels. Here, ‘-’ indicates that the operation is not activated and the corresponding entry in the degradation vector is padded with a default value; [‘a’, ‘b’, ‘b’] denote the resize modes of [area, bilinear, bicubic]; [‘G’, ‘P’] denote the noise types of [Gaussian, Poisson]; the sinc-kernel parameter is its cutoff frequency; R-J and J-R indicate the different operating orders of resizing and JPEG compression.
Figure 6: More sample images with different levels of degradations in our constructed datasets, as well as the ground-truth HR images. Level-I, -II, and -III represent samples generated from the mild, medium, and severe degradation sub-spaces, respectively.
Figure 7: More qualitative comparisons of competing methods on real-world images. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.
Figure 8: More qualitative comparisons of competing methods on images with bicubic downsampling. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.
Figure 9: More qualitative comparisons of competing methods on images with Level-I degradation. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.
Figure 10: More qualitative comparisons of competing methods on images with Level-II degradation. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.
Figure 11: More qualitative comparisons of competing methods on images with Level-III degradation. The results of (b-f) are generated by using the officially released models, while the output of (g) is obtained by re-training the SRResNet backbone with our proposed degradation model. Please zoom in for better viewing of details.

References

  • [1] N. Ahn, B. Kang, and K. Sohn (2018) Fast, accurate, and lightweight super-resolution with cascading residual network. In ECCV, Cited by: §1.
  • [2] R. Aljundi, P. Chakravarty, and T. Tuytelaars (2017) Expert gate: lifelong learning with a network of experts. In CVPR, Cited by: §2.3.
  • [3] A. Bulat, J. Yang, and G. Tzimiropoulos (2018) To learn image super-resolution, use a GAN to learn how to do image degradation first. In ECCV, Cited by: §1, §2.2.
  • [4] J. Cai, H. Zeng, H. Yong, Z. Cao, and L. Zhang (2019) Toward real-world single image super-resolution: a new benchmark and a new model. In ICCV, Cited by: §1, §2.1, §3.2.
  • [5] Y. Chen, X. Dai, M. Liu, D. Chen, L. Yuan, and Z. Liu (2020) Dynamic convolution: attention over convolution kernels. In CVPR, Cited by: §2.3, §3.1.
  • [6] V. Cornillere, A. Djelouah, W. Yifan, O. Sorkine-Hornung, and C. Schroers (2019) Blind image super-resolution with spatially variant degradations. ACM Transactions on Graphics (TOG) 38 (6), pp. 1–13. Cited by: §2.2.
  • [7] C. Dong, C. L. Chen, and T. Xiaoou (2016) Accelerating the super-resolution convolutional neural network. In ECCV, Cited by: §1.
  • [8] C. Dong, C. C. Loy, K. He, and X. Tang (2014) Learning a deep convolutional network for image super-resolution. In ECCV, Cited by: §1, §3.2.
  • [9] M. Emad, M. Peemen, and H. Corporaal (2022) MoESR: blind super-resolution using kernel-aware mixture of experts. In WACV, Cited by: Figure 4, §4.3, §4.5.
  • [10] M. Fritsche, S. Gu, and R. Timofte (2019) Frequency separation for real-world super-resolution. In ICCVW, Cited by: §1, §2.1.
  • [11] D. Fuoli, L. Van Gool, and R. Timofte (2021) Fourier space losses for efficient perceptual image super-resolution. In ICCV, Cited by: §2.2.
  • [12] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. NeurIPS. Cited by: §2.2.
  • [13] S. Gross, M. Ranzato, and A. Szlam (2017) Hard mixtures of experts for large scale weakly supervised vision. In CVPR, Cited by: §2.3.
  • [14] J. Gu, H. Lu, W. Zuo, and C. Dong (2019) Blind super-resolution with iterative kernel correction. In CVPR, Cited by: §1, §2.2, §3.2, §4.2.
  • [15] Z. Hui, J. Li, X. Wang, and X. Gao (2021) Learning the non-differentiable optimization for blind super-resolution. In CVPR, Cited by: §2.2.
  • [16] A. Ignatov, N. Kobyshev, R. Timofte, K. Vanhoey, and L. Van Gool (2017) DSLR-quality photos on mobile devices with deep convolutional networks. In ICCV, Cited by: §1, §2.2.
  • [17] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton (1991) Adaptive mixtures of local experts. Neural computation 3 (1), pp. 79–87. Cited by: §2.3, §3.1, Figure 4, §4.3, §4.5.
  • [18] X. Ji, Y. Cao, Y. Tai, C. Wang, J. Li, and F. Huang (2020) Real-world super-resolution via kernel estimation and noise injection. In CVPRW, Cited by: §1.
  • [19] Y. Jo, S. W. Oh, P. Vajda, and S. J. Kim (2021) Tackling the ill-posedness of super-resolution through adaptive target generation. In CVPR, Cited by: §1.
  • [20] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In ECCV, Cited by: §1, §2.2.
  • [21] M. I. Jordan and R. A. Jacobs (1994) Hierarchical mixtures of experts and the em algorithm. Neural computation 6 (2), pp. 181–214. Cited by: §2.3, §3.1, Figure 4, §4.3, §4.5.
  • [22] J. Kim, J. K. Lee, and K. M. Lee (2016) Deeply-recursive convolutional network for image super-resolution. In CVPR, Cited by: §1.
  • [23] S. Y. Kim, H. Sim, and M. Kim (2021) KOALAnet: blind super-resolution using kernel-oriented adaptive local adjustment. In CVPR, Cited by: §2.2.
  • [24] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §4.1.
  • [25] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, et al. (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, Cited by: §2.2, §3.1, §3.1, §4.1, §4.3.
  • [26] C. Li, A. Zhou, and A. Yao (2021) Omni-dimensional dynamic convolution. In ICLR, Cited by: §2.3.
  • [27] Y. Li, Y. Chen, X. Dai, M. Liu, D. Chen, Y. Yu, L. Yuan, Z. Liu, M. Chen, and N. Vasconcelos (2021) Revisiting dynamic convolution via matrix decomposition. In ICLR, Cited by: §2.3, Figure 4, §4.5.
  • [28] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte (2021) SwinIR: image restoration using swin transformer. In ICCVW, Cited by: §2.1, §3.1, §4.2, §4.3.
  • [29] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee (2017) Enhanced deep residual networks for single image super-resolution. In CVPRW, Cited by: §3.1, §3.1.
  • [30] A. Liu, Y. Liu, J. Gu, Y. Qiao, and C. Dong (2021) Blind image super-resolution: a survey and beyond. arXiv preprint arXiv:2107.03055. Cited by: §1, §1, §2.1, §2.2.
  • [31] P. Liu, H. Zhang, Y. Cao, S. Liu, D. Ren, and W. Zuo (2020) Learning cascaded convolutional networks for blind single image super-resolution. Neurocomputing 417, pp. 371–383. Cited by: §2.2.
  • [32] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo (2021) Swin transformer: hierarchical vision transformer using shifted windows. In ICCV, Cited by: §2.1.
  • [33] A. Lugmayr, M. Danelljan, R. Timofte, M. Fritsche, S. Gu, K. Purohit, P. Kandula, M. Suin, A. Rajagoapalan, N. H. Joon, et al. (2019) AIM 2019 challenge on real-world image super-resolution: methods and results. In ICCVW, Cited by: §1, §2.1.
  • [34] A. Lugmayr, M. Danelljan, and R. Timofte (2019) Unsupervised learning for real-world super-resolution. In ICCVW, Cited by: §1.
  • [35] A. Lugmayr, M. Danelljan, and R. Timofte (2020) NTIRE 2020 challenge on real-world image super-resolution: methods and results. In CVPRW, Cited by: §1, §2.1.
  • [36] Z. Luo, Y. Huang, S. Li, L. Wang, and T. Tan (2020) Unfolding the alternating optimization for blind super resolution. In NeurIPS, Cited by: §1, §2.2.
  • [37] C. Ma, Y. Rao, Y. Cheng, C. Chen, J. Lu, and J. Zhou (2020) Structure-preserving super resolution with gradient guidance. In CVPR, Cited by: §1, §2.2.
  • [38] S. Maeda (2020) Fast and flexible image blind denoising via competition of experts. In CVPRW, Cited by: §2.3, §3.1.
  • [39] S. Maeda (2020) Unpaired image super-resolution using pseudo-supervision. In CVPR, Cited by: §2.2.
  • [40] H. Ren, A. Kheradmand, M. El-Khamy, S. Wang, D. Bai, and J. Lee (2020) Real-world super-resolution using generative adversarial networks. In CVPRW, Cited by: §1.
  • [41] M. S. Sajjadi, B. Scholkopf, and M. Hirsch (2017) Enhancenet: single image super-resolution through automated texture synthesis. In ICCV, Cited by: §1.
  • [42] K. Simonyan and A. Zisserman (2015) Very deep convolutional networks for large-scale image recognition. In ICLR, Cited by: §6.2.
  • [43] J. W. Soh, G. Y. Park, J. Jo, and N. I. Cho (2019) Natural and realistic single image super-resolution with explicit natural manifold discrimination. In CVPR, Cited by: §1.
  • [44] D. Song, Y. Wang, H. Chen, C. Xu, C. Xu, and D. Tao (2021) Addersr: towards energy efficient image super-resolution. In CVPR, Cited by: §1.
  • [45] J. Sun, Z. Xu, and H. Shum (2010) Gradient profile prior and its applications in image super-resolution and enhancement. IEEE Transactions on Image Processing 20 (6), pp. 1529–1542. Cited by: §1.
  • [46] L. Wang, Y. Wang, X. Dong, Q. Xu, J. Yang, W. An, and Y. Guo (2021) Unsupervised degradation representation learning for blind super-resolution. In CVPR, Cited by: §2.2.
  • [47] X. Wang, L. Xie, C. Dong, and Y. Shan (2021) Real-ESRGAN: training real-world blind super-resolution with pure synthetic data. In ICCVW, Cited by: §1, §1, §2.1, §2.2, §3.1, §3.2, §3.2, §3.3, §4.1, §4.2, §4.2, §4.2, §4.3, §4.3, §4.4, §6.1, §6.2.
  • [48] X. Wang, K. Yu, C. Dong, and C. C. Loy (2018) Recovering realistic texture in image super-resolution by deep spatial feature transform. In CVPR, Cited by: §2.2.
  • [49] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao, and C. Change Loy (2018) ESRGAN: enhanced super-resolution generative adversarial networks. In ECCVW, Cited by: §1, §1, §2.1, §2.2, §3.2, §3.3, §4.1, §4.2, §4.3.
  • [50] Y. Wang, L. Wang, H. Wang, P. Li, and H. Lu (2020) Blind single image super-resolution with a mixture of deep networks. Pattern Recognition 102, pp. 107169. Cited by: §2.3, Figure 4, §4.3, §4.5.
  • [51] P. Wei, Z. Xie, H. Lu, Z. Zhan, Q. Ye, W. Zuo, and L. Lin (2020) Component divide-and-conquer for real-world image super-resolution. In ECCV, Cited by: §1.
  • [52] Y. Wei, S. Gu, Y. Li, R. Timofte, L. Jin, and H. Song (2021) Unsupervised real-world image super resolution via domain-distance aware training. In CVPR, Cited by: §1, §2.1, §2.2.
  • [53] Y. Xu, S. R. Tseng, Y. Tseng, H. Kuo, and Y. Tsai (2020) Unified dynamic convolutional network for super-resolution with variational degradations. In CVPR, Cited by: §2.2, §4.5.
  • [54] B. Yang, G. Bender, Q. V. Le, and J. Ngiam (2019) Condconv: conditionally parameterized convolutions for efficient inference. NeurIPS. Cited by: §2.3, §3.1.
  • [55] W. Yang, W. Wang, X. Zhang, S. Sun, and Q. Liao (2019) Lightweight feature fusion network for single image super-resolution. IEEE Signal Processing Letters 26 (4), pp. 538–542. Cited by: §1.
  • [56] Y. Yuan, S. Liu, J. Zhang, Y. Zhang, C. Dong, and L. Lin (2018) Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In CVPRW, Cited by: §2.2.
  • [57] K. Zhang, M. Danelljan, Y. Li, R. Timofte, et al. (2020) AIM 2020 challenge on efficient super-resolution: methods and results. In ECCVW, Cited by: Table 2.
  • [58] K. Zhang, L. V. Gool, and R. Timofte (2020) Deep unfolding network for image super-resolution. In CVPR, Cited by: §2.2.
  • [59] K. Zhang, S. Gu, R. Timofte, et al. (2019) AIM 2019 challenge on constrained super-resolution: methods and results. In ICCVW, Cited by: Table 2.
  • [60] K. Zhang, J. Liang, L. Van Gool, and R. Timofte (2021) Designing a practical degradation model for deep blind image super-resolution. In ICCV, Cited by: §1, §1, §1, §2.1, §2.2, §3.1, §3.2, §3.2, §3.3, §4.1, §4.2, §4.2, §4.2, §4.3, §4.3.
  • [61] K. Zhang, W. Zuo, and L. Zhang (2018) Learning a single convolutional super-resolution network for multiple degradations. In CVPR, Cited by: §1, §2.2, §3.2.
  • [62] K. Zhang, W. Zuo, and L. Zhang (2019) Deep plug-and-play super-resolution for arbitrary blur kernels. In CVPR, Cited by: §2.2.
  • [63] X. Zhang, H. Zeng, and L. Zhang (2021) Edge-oriented convolution block for real-time super resolution on mobile devices. In ACM Multimedia, Cited by: §1.
  • [64] X. Zhang, Q. Chen, R. Ng, and V. Koltun (2019) Zoom to learn, learn to zoom. In CVPR, Cited by: §2.1, §3.2.
  • [65] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu (2018) Image super-resolution using very deep residual channel attention networks. In ECCV, Cited by: §1, §2.2.
  • [66] Y. Zhang, Y. Tian, Y. Kong, B. Zhong, and Y. Fu (2018) Residual dense network for image super-resolution. In CVPR, Cited by: §1.
  • [67] R. Zhou and S. Susstrunk (2019) Kernel modeling super-resolution on real low-resolution images. In ICCV, Cited by: §1, §2.2.