Conditional Meta-Network for Blind Super-Resolution with Multiple Degradations

04/08/2021 · Guanghao Yin, et al. · Zhejiang University, ByteDance Inc.

Although single-image super-resolution (SISR) methods have achieved great success on single degradations, they still suffer a performance drop under multiple degrading effects in real scenarios. Recently, several blind and non-blind models for multiple degradations have been explored. However, those methods usually degrade significantly under distribution shifts between the training and test data. To this end, we propose, for the first time, a conditional meta-network framework (named CMDSR), which helps the SR framework learn how to adapt to changes in the input distribution. We extract the degradation prior at task-level with the proposed ConditionNet, and use it to adapt the parameters of the basic SR network (BaseNet). Specifically, the ConditionNet of our framework first learns the degradation prior from a support set, which is composed of a series of degraded image patches from the same task. Then the adaptive BaseNet rapidly shifts its parameters according to the conditional features. Moreover, in order to better extract the degradation prior, we propose a task contrastive loss that decreases the inner-task distance and increases the cross-task distance between task-level features. Without predefined degradation maps, our blind framework performs a single parameter update to yield considerable SR results. Extensive experiments demonstrate the effectiveness of CMDSR over various blind and even non-blind methods. The flexible BaseNet structure also reveals that CMDSR can serve as a general framework for a large family of SISR models.


1 Introduction

Single image super-resolution (SISR) has posed a long-standing challenge in low-level vision with numerous important applications. It is an ill-posed problem that aims to restore a High-Resolution (HR) image by adding missing high-frequency information to a Low-Resolution (LR) image. Since the pioneering method SRCNN [5], deep learning approaches [11, 12, 18, 46, 16, 36, 33] have exhibited impressive performance. However, most existing methods focus on a fixed degradation, e.g., bicubic down-sampling or single Gaussian blurring. These settings severely limit their generalization ability. In addition to down-sampling, unknown blurring and noise may also be introduced during the acquisition of LR images. When the data distribution at test time mismatches the training distribution (referred to as distribution shift [26, 14]), such learning-based models suffer a severe performance drop [38].

Figure 1: SR results (×4) with anisotropic Gaussian blur and AWGN (Severe degradation). Although no anisotropic blur kernels were used for training, our CMDSR outperforms all the blind cascaded schemes, and even the non-blind method SRMD [43].

In recent years, several non-blind and blind approaches for multiple degradations have been proposed. The non-blind methods [27, 43, 44, 37] usually take the ground truth (GT) degradation maps as an additional input to establish the LR-HR mapping. Although the non-blind models achieve satisfactory performance with the guidance of predefined information, unknown realistic degradations largely limit their usage in real-world applications. On the other hand, the blind methods [22, 29, 7, 3] only consider blur and down-sampling in the degradation model. Cascaded schemes combining blind denoising, blur estimation, and SR methods have therefore been organized to restore multi-degraded LR images [43, 19, 21]. However, the stages negatively affect each other (e.g., the denoiser makes the LR image more blurred and leads to kernel mismatch, increasing the difficulty of deblurring, as seen in Fig. 4). Recently, there have been some new attempts at blind SR. Several CycleGAN [47] based methods [4, 40, 21, 20] learn from unpaired LR-HR images, but they are more difficult to train. ZSSR [29] explores a zero-shot solution for the first time, where a CNN learns the mapping from the LR image and its downscaled versions (self-supervision), but it requires thousands of self-training iterations for each LR image. More recently, two optimization-based meta-learning strategies, MZSR [31] and MLSR [25], have been proposed to accelerate self-training from 1000 steps to 10. However, they show worse results at large scale factors because the self-downsampled image cannot provide enough information.

Given the facts above, our work makes the first attempt to address the following two questions simultaneously: (1) Can we propose a blind framework that effectively handles multiple degradations, especially when accurate degradation estimation is very difficult? (2) Is it possible to overcome the distribution shift with an adaptive model that adapts its parameters to unknown degraded LR images?

Figure 2: Conditional feature extraction at task-level. A prior fact is that degraded LR images from the same task share common degradation parameters (e.g., kernel width and noise level), while these parameters differ between tasks. Therefore, we attempt to learn the degradation prior at task-level and use the extracted conditional feature to adapt the SR model to a new task.

In this paper, we propose a conditional meta-network for blind SR with multiple degradations (CMDSR) to largely overcome the two aforementioned problems. For the first challenge, a piece of prior knowledge inspires us to handle it at task-level. As shown in Fig. 2, LR images with different degradations obey different distributions. Although accurately estimating the degradation is hard, images from the same task contain similar implicit features that describe their common degradation pattern. Therefore, we group these LR images into different tasks and extract the degradation prior at task-level to describe the degradation pattern. For the second challenge, we use the distribution information extracted from a group of LR samples to make the SR network adaptively adjust its parameters according to the distribution changes, so that our framework can handle distribution shifts.

Specifically, our CMDSR consists of two parts: the BaseNet and the ConditionNet. As shown in Fig. 3, the shallow ConditionNet learns the feature representations of different tasks. Then, BaseNet multiplies its convolution weights channel-wise with the modulated conditional features. Finally, the adapted BaseNet restores the LR image. Inspired by recent contrastive learning [8, 9], we propose a task contrastive loss to decrease the distances between conditional features from the same task and increase those between different tasks. Algorithm 1 presents the training stage, where BaseNet and ConditionNet are alternately optimized with different step counts and loss functions. Algorithm 2 presents the test stage, where the degradation prior extracted by ConditionNet adapts BaseNet to handle distribution shift.

Since the shallow ConditionNet only operates on small support patches, the time and computation cost of conditional feature extraction is very small compared with BaseNet reconstruction. Without designing a new complicated SR network, the proposed framework simply uses 10 res-blocks as BaseNet (called SRResNet-10) and achieves performance superior to blind methods. For complicated degradations, CMDSR even outperforms the non-blind models. It should be noted that our framework places no strict restrictions on the BaseNet structure; the ablation experiments in Table 5 prove that CMDSR can be extended to other SISR models. To the best of our knowledge, the proposed CMDSR is the first meta-network framework for blind SISR with multiple degradations.

In summary, our overall contribution is three-fold: (1) We present the first blind meta-network framework to adaptively handle SISR with multiple degradations at task-level. (2) A task contrastive loss is proposed for task-level feature extraction. (3) Our method is blind, fast, and flexible, and hence can be applied as a general framework.

Figure 3: Overall scheme of our proposed CMDSR. Left: Network structures of ConditionNet and BaseNet. ConditionNet extracts the conditional feature from the input support set. BaseNet adapts its parameters to the task according to the modulated features. Although we simply use SRResNet-10 as BaseNet, CMDSR has no strict restrictions on the BaseNet structure. Right: Loss function optimization. BaseNet and ConditionNet are alternately trained with different loss configurations.

2 Related Work

Blind Single-Image Super-Resolution. Compared with typical SISR models [5, 11, 18, 46] tailored to a specific downsampler, blind SISR is a more challenging task that assumes the blur kernels are unavailable at test time. Previous methods usually combine well-designed kernel estimation with typical SISR methods. Michaeli et al. [22] mined the internal patch recurrence to estimate the blur kernel. Bell-Kligler et al. [3] proposed KernelGAN to learn the blur kernel distribution. To relieve the mismatch between the estimated kernel and the real kernel, IKC [7] iteratively trains estimation and correction networks. Although the accuracy of the estimated kernel is largely improved, severe degradations remain very challenging. Borrowing ideas from zero-shot and self-supervised learning, ZSSR [29] efficiently exploits the internal recurrence of information inside an image. But this image-specific model requires self-training for each LR image, which is time-consuming and cannot be applied to deep structures.

Multiple Degradations. Relatively little attention has been paid to SISR with multiple degradations despite its importance for real applications. From the maximum a posteriori (MAP) perspective, existing non-blind methods, e.g., SRMD [43] and UDVD [37], concatenate the LR image with predefined blur kernel and noise maps as input. Thus, the SR result closely depends on both the LR image and the degradation pattern. The blind schemes [43, 21] are usually sequential combinations of denoising [41], blur estimation [3], and SR models [29]. CBSR [19] adopted a cascaded architecture that can be jointly learned end-to-end from training data. All those methods degrade under distribution shifts. Recently, unpaired SISR methods [4, 21] conduct domain transfer between the clean and real degraded domains, but training a stable model for various shifts remains very challenging.

Meta-Learning. Meta-learning, commonly known as learning to learn, refers to the process of improving a learning algorithm over multiple learning episodes. As [17, 39] point out, diverse meta-learning methods can be categorized into three groups: (1) Metric-based methods [35, 30, 32] perform non-parametric learning in a metric space, but are largely restricted to popular few-shot classification settings. (2) Optimization-based methods [2, 6, 17, 45] use gradient descent to solve the optimization problem of the meta-learner. The most famous example is MAML [6], which learns transferable initial parameters such that a few gradient updates lead to performance improvement. Recently, [45] proposed Adaptive Risk Minimization to handle group distribution shift in image classification. (3) Network-based methods [28, 23, 24] use a network to learn cross-task knowledge and rapidly update its parameters for a new task.

There are few explorations of meta-learning for SISR. Recently, two gradient-based meta-learning models [31, 25] have been proposed. MZSR [31] and MLSR [25] both employ the typical MAML framework [6] to accelerate self-supervised training. Nevertheless, for large scale factors, the self-downsampled LR "son" image becomes too small to provide enough information. Our proposed framework directly extracts the distribution prior at task-level and adapts the parameters of the SR network, which avoids these shortcomings.

3 Proposed Method

3.1 CMDSR Setting

Our work focuses on blind SISR with multiple degradations, including blur, noise, and down-sampling, which happen simultaneously in real-world cases [38]. The degradation process is formulated as:

$I_{LR} = (I_{HR} \otimes k)\downarrow_s + n$   (1)

where $I_{LR}$, $I_{HR}$, $k$, $\otimes$, $\downarrow_s$, and $n$ denote the LR image, the HR image, the blur kernel, convolution, decimation with scale factor $s$, and additive Gaussian noise, respectively. In this paper, we use the configuration in Eq. (1) to synthesize LR images for training.
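To make Eq. (1) concrete, below is a minimal PyTorch sketch of the degradation pipeline for the isotropic case used in training. The kernel size, padding choice, and helper names are our own illustrative assumptions, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(width: float, size: int = 21) -> torch.Tensor:
    """Isotropic Gaussian blur kernel with standard deviation `width`."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    k = torch.exp(-(xx ** 2 + yy ** 2) / (2 * width ** 2))
    return k / k.sum()

def degrade(hr: torch.Tensor, width: float, noise_level: float, scale: int = 4) -> torch.Tensor:
    """Apply Eq. (1): blur, decimate by `scale`, then add AWGN.

    hr: (B, C, H, W) tensor in [0, 1]; noise_level is on the 0-255 scale.
    """
    k = gaussian_kernel(width).to(hr)
    weight = k.repeat(hr.size(1), 1, 1, 1)                  # one kernel per channel
    pad = k.size(-1) // 2
    blurred = F.conv2d(F.pad(hr, [pad] * 4, mode="reflect"),
                       weight, groups=hr.size(1))           # HR convolved with k
    lr = blurred[..., ::scale, ::scale]                     # decimation by s
    noise = torch.randn_like(lr) * noise_level / 255.0      # AWGN term n
    return (lr + noise).clamp(0.0, 1.0)
```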

The key goal of our work is to develop a framework that can adapt and generalize in the face of degradation shift using only a small number of examples. To accomplish this, we need a representation that describes the degradation prior of the LR image and guides the model to adapt to this degradation pattern. As explained in Section 1, LR images from the same task are degraded with the same pattern, which inspires us to view this problem at task-level rather than at the level of single images. Therefore, we present a new approach that mines the implicit task-level semantics across different tasks. The extracted feature can further serve as a context prior to adapt the parameters of the SR model.

In our framework, we provide two settings for accessing the training data: (1) The training data are grouped into different tasks. We consider a multi-degradation distribution $p(\mathcal{T})$ over meta-training tasks $\{\mathcal{T}_i\}$. Task $\mathcal{T}_i$ consists of LR-HR pairs, where the LR images are synthesized from HR images with the $i$-th degradation configuration. (2) ConditionNet extracts a task-level feature from $n$ LR patches (named the support set) belonging to the same task, and BaseNet restores a single LR input. With these settings, our framework treats the training data at task-level.
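A hypothetical task sampler matching setting (1) might look as follows, reusing the degrade() helper from the previous sketch. The kernel-width upper bound is an assumption, while the AWGN range follows Section 4.1.

```python
import random
import torch

def sample_task(hr_patches: torch.Tensor, n_support: int, scale: int = 4):
    """Build one meta-training task: every patch shares one degradation config.

    hr_patches: (M, C, H, W) pool of HR patches. Returns the support set S,
    the degraded LR inputs, and the HR targets for this task.
    """
    width = random.uniform(0.2, 4.0)    # isotropic kernel width (upper bound assumed)
    noise = random.uniform(0.0, 75.0)   # AWGN level range from Section 4.1
    lr = degrade(hr_patches, width, noise, scale)  # degrade() from the sketch above
    idx = torch.randperm(lr.size(0))[:n_support]
    return lr[idx], lr, hr_patches      # (support set S, LR inputs, HR targets)
```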

3.2 Networks of CMDSR

Our framework consists of ConditionNet and BaseNet, which are shown in Fig. 3. It should be noted that our framework has no strict restrictions on the BaseNet structure. In this paper, the backbone of BaseNet is simply designed as SRResNet-10, which consists of 10 res-blocks.

First, ConditionNet, denoted as $\mathcal{C}$, extracts the conditional feature $F_c$, which describes the degradation pattern of the input support set $S$. It is formulated as:

$F_c = \mathcal{C}(S; \theta_c)$   (2)

where $\theta_c$ denotes the parameters of ConditionNet and $n$ denotes the size of the support set at each step. In order to extract the task-level feature, we design a shallow ConditionNet with 2 average pooling layers and 4 convolution layers, each followed by ReLU, and keep the input sample size unchanged between training and test time. The internal channels of the convolution layers are 64, 64, 128, and 128.
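For clarity, here is a minimal sketch of how such a ConditionNet could be implemented. The placement of the pooling layers within the stack is an assumption on top of the description above.

```python
import torch
import torch.nn as nn

class ConditionNet(nn.Module):
    """4 conv layers + ReLU with 2 average poolings, pooled to one vector per task."""

    def __init__(self, in_ch: int = 3, feat_dim: int = 128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
            nn.Conv2d(128, feat_dim, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, support: torch.Tensor) -> torch.Tensor:
        """support: (n, C, h, w) patches of one task -> (feat_dim,) feature."""
        f = self.body(support).mean(dim=(2, 3))  # global average pool per patch
        return f.mean(dim=0)                     # average across the support set
```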

Then, BaseNet, denoted as $\mathcal{B}$, adapts its original parameters $\theta_b$ to $\theta'_b$ with the conditional feature $F_c$. Specifically, we adapt the parameters of the 20 conv-layers inside the 10 internal res-blocks. We use 20 fully-connected layers to generate adaptive coefficients, taking $F_c$ as input. Each FC modulation layer changes the number of channels to match the convolution weights of its conv-layer. Then, the modulated features are multiplied channel-wise with the convolution weights:

$W'_{i,j} = W_{i,j} \cdot v_{i,j}$   (3)

where $W_{i,j}$ and $W'_{i,j}$ are the original and adapted weights, and $v_{i,j}$ is the modulation variable corresponding to the $j$-th channel of the $i$-th conv-layer. Finally, the adapted BaseNet restores the input LR image $I_{LR}$ to the SR image $I_{SR}$. The whole process of BaseNet is formulated as:

$I_{SR} = \mathcal{B}(I_{LR}; \theta'_b)$   (4)
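The modulation of Eq. (3) can be sketched as follows. ModulatedConv is an illustrative name and the bias handling is an assumption, but the channel-wise weight scaling matches the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulatedConv(nn.Module):
    """Conv layer whose per-output-channel weights are scaled by an FC-modulated vector."""

    def __init__(self, in_ch: int, out_ch: int, cond_dim: int = 128):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.fc = nn.Linear(cond_dim, out_ch)      # FC modulation layer

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        v = self.fc(cond)                           # modulation variables v_{i,j}
        w = self.conv.weight * v.view(-1, 1, 1, 1)  # Eq. (3): W' = W * v, channel-wise
        return F.conv2d(x, w, self.conv.bias, padding=1)
```

In a BaseNet built from such blocks, the same conditional feature would be fed to every res-block's modulation layer.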

3.3 Loss Functions

Owing to the fact that ConditionNet and BaseNet serve different purposes, they have different sensitivities to the learning rate and loss functions. Hence, we optimize them alternately with different learning rates and optimization objectives: ConditionNet is trained once after every $m$ steps of BaseNet training. The details of the loss functions are as follows.

Reconstruction Loss. Similar to most SISR models [11, 12, 18, 46], we adopt a supervised reconstruction loss that measures the pixel-wise distance between the HR image and the SR output of BaseNet:

$\mathcal{L}_{rec} = \lVert I_{HR} - I_{SR} \rVert_1$   (5)

Task Contrastive Loss. As the prior knowledge explained before, our ConditionNet should output conditional features that are similar for the same degradation and dissimilar across different degradations. Instead of matching an input to a fixed target, recent works on contrastive learning [8, 9] measure the similarities of sample pairs in a representation space. Inspired by them, we propose a task contrastive loss, which decreases the inner-task distance and increases the cross-task distance between conditional features.

For the inner-task loss, we sample two support sets $S_1$ and $S_2$ from the same task, each containing $n$ LR patches. ConditionNet extracts the features $F_1$ and $F_2$ from $S_1$ and $S_2$. The inner-task loss is calculated as:

$\mathcal{L}_{inner} = \lVert F_1 - F_2 \rVert$   (6)

For the cross-task loss, we resample $n$ LR images from another task, denoted as support set $S_3$, whose degradation distribution differs from that of $S_1$ and $S_2$. ConditionNet extracts the conditional feature $F_3$ from $S_3$. Then, the cross-task loss is calculated as:

$\mathcal{L}_{cross} = \lVert F_1 - F_3 \rVert + \lVert F_2 - F_3 \rVert$   (7)

Finally, we use logarithm and exponential transformations to combine $\mathcal{L}_{inner}$ and $\mathcal{L}_{cross}$. These transformations smoothly optimize ConditionNet to decrease the inner-task distance and increase the cross-task distance. When $\mathcal{L}_{inner}$ is small and $\mathcal{L}_{cross}$ is large, the combined loss approaches zero. The task contrastive loss is formulated as:

$\mathcal{L}_{task} = \log\left(1 + e^{\mathcal{L}_{inner} - \mathcal{L}_{cross}}\right)$   (8)

Combined Loss. As shown in Table 4, if we train ConditionNet only with the task contrastive loss in an unsupervised way, the output feature may not be entirely beneficial for the generalization of SISR. In order to balance task-level feature extraction and SR reconstruction, we combine the reconstruction loss in Eq. (5) and the task contrastive loss in Eq. (8) with a coefficient $\lambda$ to constrain ConditionNet:

$\mathcal{L}_{cond} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{task}$   (9)
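Under the reconstructions above, the losses could be sketched as below. The choice of l1/MSE distances and the coefficient value are assumptions; the softplus form implements the log-exp combination of Eq. (8).

```python
import torch.nn.functional as F

def reconstruction_loss(sr, hr):                  # Eq. (5)
    return F.l1_loss(sr, hr)

def task_contrastive_loss(f1, f2, f3):            # Eqs. (6)-(8)
    # f1, f2: features of two support sets of the same task; f3: another task.
    l_inner = F.mse_loss(f1, f2)                          # Eq. (6)
    l_cross = F.mse_loss(f1, f3) + F.mse_loss(f2, f3)     # Eq. (7)
    # Small when the inner-task distance is far below the cross-task distance.
    return F.softplus(l_inner - l_cross)          # Eq. (8): log(1 + exp(.))

def condition_loss(sr, hr, f1, f2, f3, lam=0.1):  # Eq. (9); lam value is illustrative
    return reconstruction_loss(sr, hr) + lam * task_contrastive_loss(f1, f2, f3)
```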

3.4 CMDSR Algorithm

Algorithm 1: CMDSR Training
Data: distribution over tasks: $p(\mathcal{T})$
Input: ConditionNet and BaseNet parameters: $\theta_c$, $\theta_b$
Input: task size $N$, support size $n$, update step $m$, loss coefficient $\lambda$, learning rates $\alpha$, $\beta$
1  for each training step do
2      Randomly sample $N$ tasks $\mathcal{T}_i \sim p(\mathcal{T})$;
3      foreach $\mathcal{T}_i$ do
4          Sample LR-HR patches and a support set $S_i$ from $\mathcal{T}_i$;
5          Extract conditional feature: $F_i = \mathcal{C}(S_i; \theta_c)$;
6          Compute adapted parameters of BaseNet: $\theta'_b$;
7          Evaluate the reconstruction loss in Eq. (5): $\mathcal{L}_{rec}^i$;
8      end foreach
9      Update BaseNet with the reconstruction loss: $\theta_b \leftarrow \theta_b - \alpha \nabla_{\theta_b} \sum_i \mathcal{L}_{rec}^i$;
10     if step Mod $m$ == 0 then
11         Resample two support sets $S_1^i$, $S_2^i$ from the tasks of line 2;
12         Resample support sets $S_3^i$ from other tasks;
13         foreach $\mathcal{T}_i$ do
14             Evaluate the inner-task loss in Eq. (6): $\mathcal{L}_{inner}^i$;
15             Evaluate the cross-task loss in Eq. (7): $\mathcal{L}_{cross}^i$;
16             Evaluate the task contrastive loss in Eq. (8): $\mathcal{L}_{task}^i$;
17         end foreach
18         Update ConditionNet with the combined loss: $\theta_c \leftarrow \theta_c - \beta \nabla_{\theta_c} \sum_i (\mathcal{L}_{rec}^i + \lambda \mathcal{L}_{task}^i)$;
19     end if
20 end for

Algorithm 2: CMDSR Test
Data: LR test image $I_{LR}$, LR support set $S$
Input: trained parameters of ConditionNet and BaseNet: $\theta_c$, $\theta_b$
Output: restored SR image $I_{SR}$
1  Extract conditional feature: $F_c = \mathcal{C}(S; \theta_c)$;
2  Compute adapted parameters of BaseNet: $\theta'_b$;
3  return $I_{SR} = \mathcal{B}(I_{LR}; \theta'_b)$

The CMDSR training procedure is shown in Algorithm 1. ConditionNet and BaseNet are alternately trained until they converge. In line 2, $N$ tasks are randomly sampled from the degradation distribution at each step. In lines 3-9, BaseNet is adapted and supervised with HR-LR pairs. In lines 10-19, once every $m$ steps, ConditionNet is optimized with the combination of the unsupervised loss of line 16 and the supervised loss of line 7.

The CMDSR test stage is shown in Algorithm 2. For the test support set, we can randomly sample patches either from other LR images that share the same degradation pattern as the test image, or from the test image itself. For convenience, we choose self-patches to build the support set at test time. With the conditional feature extracted from the support set, BaseNet performs fast one-step adaptation to the test distribution and produces the restored SR image.
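A sketch of Algorithm 2 with self-patches as the support set could look as follows. The patch size, patch count, and the BaseNet call signature are assumptions.

```python
import torch

@torch.no_grad()
def cmdsr_test(lr_image, condition_net, base_net, n_patches=20, ps=32):
    """One-step test-time adaptation; lr_image is (1, C, H, W)."""
    _, _, h, w = lr_image.shape
    ys = torch.randint(0, h - ps + 1, (n_patches,)).tolist()
    xs = torch.randint(0, w - ps + 1, (n_patches,)).tolist()
    support = torch.cat([lr_image[..., y:y + ps, x:x + ps]   # self-patches as support set
                         for y, x in zip(ys, xs)], dim=0)
    cond = condition_net(support)        # degradation prior of this test image
    return base_net(lr_image, cond)      # adapted BaseNet restores in one pass
```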

4 Experiments

4.1 Experimental Setting

As introduced before, the input of CMDSR consists of two parts: the support set for ConditionNet and the LR image for BaseNet. During training, the support patches and LR inputs are fixed-size crops; at test time, ConditionNet takes patches cropped from the test image itself and BaseNet takes the full LR image. For the training configuration, each support set contains 20 patches, and ConditionNet joins training after BaseNet has been trained for 9 steps. The reconstruction and task contrastive losses are combined with the coefficient $\lambda$ of Eq. (9), BaseNet and ConditionNet use separate initial learning rates, and the ADAM optimizer [13] is applied.

Average PSNR (×4) on the Simple and Middle degradations (isotropic blur, AWGN level 15):

Degradation  Type                                             Model                                    Set5   Set14  BSD100
Simple       BI-structured SR model                           RCAN [46]                                24.83  23.64  23.33
Simple       Blind multi-degraded SR model                    ZSSR [29]                                25.40  24.30  24.05
Simple       Blind multi-degraded SR model                    IRCNN [42]                               28.35  -      -
Simple       Blind denoising/deblurring + blind SR model      DnCNN [41] + KernelGAN [3] + ZSSR [29]   27.02  25.46  25.34
Simple       Blind denoising/deblurring + blind SR model      DnCNN [41] + IKC [7]                     28.16  26.11  25.68
Simple       Blind denoising + GT kernel maps + non-blind SR  DnCNN [41] + SRMDNF [43]                 28.31  26.19  25.79
Simple       GT degradation maps + non-blind SR model         SRMD [43]                                28.79  26.48  25.95
Simple       GT degradation maps + non-blind SR model         UDVD [37]                                29.04  26.82  26.08
Simple       Conditional meta-network                         CMDSR (ours)                             28.35  26.23  25.83
Middle       BI-structured SR model                           RCAN [46]                                23.24  22.42  22.48
Middle       Blind multi-degraded SR model                    ZSSR [29]                                24.91  23.74  23.57
Middle       Blind multi-degraded SR model                    IRCNN [42]                               24.36  -      -
Middle       Blind denoising/deblurring + blind SR model      DnCNN [41] + KernelGAN [3] + ZSSR [29]   26.08  24.66  24.65
Middle       Blind denoising/deblurring + blind SR model      DnCNN [41] + IKC [7]                     26.84  25.09  25.02
Middle       Blind denoising + GT kernel maps + non-blind SR  DnCNN [41] + SRMDNF [43]                 23.85  21.04  21.79
Middle       GT degradation maps + non-blind SR model         SRMD [43]                                26.82  25.12  24.86
Middle       GT degradation maps + non-blind SR model         UDVD [37]                                26.98  25.33  24.96
Middle       Conditional meta-network                         CMDSR (ours)                             27.10  25.39  25.12

Table 1: Average PSNR values with scale factor 4 on the Simple and Middle degradations, whose degradation patterns lie within the range of the training set. We use the provided open-source codes of the SOTA models to compute their results, except those of IRCNN [42] and UDVD [37], which are extracted directly from their publications. Our CMDSR clearly outperforms all blind methods and even surpasses the non-blind methods on the Middle degradation.
Figure 4: SR results (×4) of image "baby" with Middle degradation. Our CMDSR outperforms all the blind and non-blind methods.

We use the LR-HR pairs of DIV2K [1] for meta-training. Following previous works [43, 37], the degraded LR images of different tasks are synthesized based on Eq. (1). Specifically, we use only isotropic Gaussian blur kernels for training. The blur kernel widths start from 0.2 and are sampled with a stride of 0.1. For noise, we use Additive White Gaussian Noise (AWGN) with noise levels in the range [0, 75]. Due to the page limit, we present results of ×4 SR tasks. All experiments were conducted on NVIDIA Tesla V100 GPUs.

Average PSNR (×4) on the Severe degradation (anisotropic blur, AWGN level 50):

Type                                             Model                                    Set5   Set14  BSD100
BI-structured SR model                           RCAN [46]                                16.10  15.79  15.75
Blind multi-degraded SR model                    ZSSR [29]                                17.89  17.46  17.79
Blind denoising/deblurring + blind SR model      DnCNN [41] + KernelGAN [3] + ZSSR [29]   22.32  21.69  22.34
Blind denoising/deblurring + blind SR model      DnCNN [41] + IKC [7]                     22.18  21.63  22.23
Blind denoising + GT kernel maps + non-blind SR  DnCNN [41] + SRMDNF [43]                 21.63  21.18  21.99
GT degradation maps + non-blind SR model         SRMD [43]                                22.43  21.83  22.43
Conditional meta-network                         CMDSR (ours)                             23.07  22.14  23.03

Table 2: Average PSNR values with scale factor 4 on the Severe degradation, whose degradation pattern does not appear in the training set. We use the open-source codes of the SOTA models to compute results, except for IRCNN [42] and UDVD [37], which do not provide their original codes for SISR. Note that the non-blind SRMD uses anisotropic blur kernels for training, whereas our CMDSR, trained only with isotropic blur kernels, still significantly outperforms all SOTA methods.
Figure 5: SR results (×4) of image "pepper" with Severe degradation. CMDSR outperforms all the blind and non-blind methods.

4.2 Experiments on Synthetic Images

To demonstrate the effectiveness and generalization of our framework, we evaluate the proposed CMDSR from the perspectives of matched degradation and shifted degradation. Following [43, 37], we use the Simple and Middle testsets, which lie within the range of the meta-training data. Since our framework is trained with isotropic Gaussian blur kernels only, we add the Severe testset with an anisotropic Gaussian blur kernel to validate whether CMDSR can handle degradation shift. Precisely, the three testsets are synthesized as: (1) Simple: an isotropic Gaussian blur kernel with a smaller kernel width, followed by BI downsampling (×4) and AWGN with noise level 15. (2) Middle: an isotropic Gaussian blur kernel with a larger kernel width, followed by BI downsampling (×4) and AWGN with noise level 15. (3) Severe: an anisotropic Gaussian blur kernel (defined by two kernel widths and a rotation angle), followed by BI downsampling (×4) and AWGN with noise level 50.
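For reference, one common way to construct an anisotropic Gaussian kernel like the one used by the Severe testset is via a rotated covariance matrix. The exact widths, angle, and kernel size are not specified above, so the parameters here are placeholders.

```python
import math
import torch

def anisotropic_gaussian_kernel(w1: float, w2: float, theta: float, size: int = 21) -> torch.Tensor:
    """Gaussian kernel with widths w1, w2 along axes rotated by angle theta (radians)."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    c, s = math.cos(theta), math.sin(theta)
    rot = torch.tensor([[c, -s], [s, c]])
    cov = rot @ torch.diag(torch.tensor([w1 ** 2, w2 ** 2])) @ rot.T
    inv = torch.inverse(cov)
    pts = torch.stack([xx, yy], dim=-1)                        # (size, size, 2) grid
    expo = -0.5 * torch.einsum("...i,ij,...j->...", pts, inv, pts)
    k = torch.exp(expo)
    return k / k.sum()                                         # normalize to sum 1
```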

We systematically compare the proposed framework with non-blind and blind methods. For the non-blind methods, the two latest models SRMD [43] and UDVD [37] are used, which take the accurate blur kernel and noise maps as additional inputs. For the blind methods, the SOTA BI-structured SR model RCAN [46] is compared first. Because blind SR for multiple degradations has not been studied sufficiently, besides ZSSR [29] (5000 steps) and IRCNN [42], we follow [43, 3, 21] and add cascaded schemes that combine SR models with blind denoising and deblurring methods: DnCNN [41] + KernelGAN [3] + ZSSR [29] (5000 steps), and DnCNN [41] + IKC [7]. Moreover, to evaluate the mutual negative influence between cascaded stages, we add a baseline combining a blind denoiser with a non-blind SR model: DnCNN [41] + SRMDNF [43].

Figure 6: SR results (×4) of real images "frog" and "flower". CMDSR produces the best results with fewer artifacts and brighter colors.

Matched Degradation. Table 1 shows PSNR values on the Simple and Middle degradations, whose degradation patterns match the range of the training data. Unaware of the multiple degradations, the BI-structured model RCAN [46] produces the worst PSNR. As the kernel width and noise level increase, the cascaded blind methods suffer the mutual negative influence between stages: the denoiser makes the LR image more blurred and leads to kernel mismatch, increasing the difficulty of deblurring. The severe PSNR drops in Table 1 and the over-sharp results of DnCNN [41] + SRMDNF [43] in Fig. 4 confirm this phenomenon. Our CMDSR achieves better PSNR than all blind schemes on the Simple degradation, though slightly lower than the non-blind methods SRMD [43] and UDVD [37], which take the accurate blur kernel and noise maps as additional inputs. However, when the degradation becomes more complicated, the generalization of our adaptive framework becomes prominent: CMDSR achieves the best performance on the Middle degradation, better even than the non-blind methods. As shown in Fig. 4, CMDSR produces sharper and clearer SR results. These results demonstrate that CMDSR is an effective blind framework for handling multiple degradations.

Shifted Degradation. Table 2 shows PSNR values on the Severe degradation, where the degradation level is higher and the blur kernel does not appear in the training data. Qualitative comparisons are shown in Fig. 1 and Fig. 5. Our CMDSR significantly outperforms all the blind and non-blind methods, because the parameters of BaseNet are not fixed but adapt to the new degradation at test time. It should be emphasized that the non-blind SRMD [43] is trained with both isotropic and anisotropic Gaussian blur kernels, yet our blind CMDSR, trained only with isotropic blur kernels, achieves better qualitative and quantitative results. These results further demonstrate the generalization of our framework in handling distribution shifts.

4.3 Experiments on Real Images

We further extend the experiments to real images. The most representative blind and non-blind methods, DnCNN [41] + IKC [7] and SRMD [43], are compared with our framework. Since there are no GT degradation patterns for real images, the degradation parameters of SRMD [43] are found by manual grid search as in [43]. The qualitative results on the real images "frog" [15] and "flower" [43] are shown in Fig. 6. The blind scheme produces over-sharp results and the non-blind SRMD [43] fails to recover sharp edges. Our CMDSR produces the best results with fewer artifacts, sharper edges, and even brighter colors.

4.4 Ablation Experiments

In this section, we use the training data and settings of Section 4.1 to conduct all the ablation studies.

Model                          Parameters   PSNR
SRResNet-10 w/o ConditionNet   1.04M        26.09
SRResNet-16 w/o ConditionNet   1.48M        26.62
SRResNet-10 w/ ConditionNet    1.46M        27.10

Table 3: Average ×4 PSNR of SRResNet-10, SRResNet-16, and SRResNet-10 + ConditionNet on Set5 with the Middle degradation.

BaseNet w/ and w/o ConditionNet. We first evaluate BaseNet with and without ConditionNet to show the importance of the conditional feature. Although ConditionNet is not directly used for SISR, it introduces additional parameters. For a fair comparison, we add SRResNet-16, whose parameter count nearly equals that of the complete CMDSR, and train it with the same synthetic data as ours. As shown in Table 3, the PSNR of CMDSR is much better, which proves the significance of the conditional meta-network.

Figure 7: The t-SNE visualization of modulated conditional features. Different colors represent different tasks.

Visualizations of Conditional Features. To verify that ConditionNet can efficiently extract task-level features, we compare conditional features within and across tasks. Using the DIV2K validation set, we randomly sample 8 different tasks and 400 support sets per task. We take the conditional features modulated by the first modulation layer and show their t-SNE [34] visualizations in Fig. 7. Due to the page limit, we list the values of the 64 channels of each feature in the Supplementary. Clearly, the modulated features within a task are similar while those across tasks differ significantly, which is consistent with the prior knowledge in Fig. 2.
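The visualization itself is straightforward to reproduce with scikit-learn. A minimal sketch, assuming features is an (n_sets, dim) array of modulated conditional features and labels holds each support set's task index:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_task_features(features: np.ndarray, labels: np.ndarray) -> None:
    """Project conditional features to 2-D and color points by task."""
    emb = TSNE(n_components=2, perplexity=30).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=5)
    plt.title("t-SNE of modulated conditional features")
    plt.show()
```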

Combination of Loss Functions. We compare the results of CMDSR when ConditionNet is separately trained with the task contrastive loss, the reconstruction loss, and the combined loss. As shown in Table 4, if ConditionNet is trained with the unsupervised task contrastive loss alone, CMDSR collapses and produces results even worse than the single BaseNet. Using only the supervised reconstruction loss is acceptable, but the combined loss achieves the best results, because it balances task-level feature extraction and the generalization of SISR.

Model                   Loss                                   PSNR
ConditionNet of CMDSR   task contrastive loss (Eq. (8)) only   23.36
ConditionNet of CMDSR   reconstruction loss (Eq. (5)) only     26.84
ConditionNet of CMDSR   combined loss (Eq. (9))                27.10

Table 4: Average ×4 PSNR on Set5 with the Middle degradation, where ConditionNet is separately trained with the three losses.
BaseNet                      Parameters   PSNR
VDSR [11] w/o ConditionNet   0.67M        26.49
VDSR [11] w/ ConditionNet    -            26.97
IDN [10] w/o ConditionNet    0.80M        26.53
IDN [10] w/ ConditionNet     -            27.03
EDSR [18] w/o ConditionNet   43M          26.81
EDSR [18] w/ ConditionNet    -            27.51

Table 5: Average ×4 PSNR on Set5 with the Middle degradation using other structures for BaseNet.

Can CMDSR Extend to Other SISR Structures? As mentioned before, CMDSR is a flexible and general framework with no strict restrictions on the BaseNet structure. Therefore, we replace SRResNet-10 with three other SISR models: VDSR [11], IDN [10], and EDSR [18]. We also use the same training data as in Section 4.1 to train these models without ConditionNet as baselines. As listed in Table 5, all the joint models obtain significant improvements, and EDSR [18] achieves the best results with the largest parameter count. We believe our framework can be extended to more complicated structures in the future.

5 Conclusion

In this paper, we investigated the blind SISR problem with multiple degradations. Inspired by meta-learning, we designed a framework that learns how to adapt to changes in the input distribution. Specifically, a ConditionNet extracts task-level features from batches of LR patches, and the BaseNet rapidly adapts its parameters according to the conditional features. Extensive experiments reveal that our framework can handle distribution shift with only one-step adaptation. In complicated cases, it even outperforms non-blind methods. The framework also remains applicable when the BaseNet structure is replaced with other SISR models. In future work, we will extend our general framework to more CNN models and more low-level vision tasks.

References

  • [1] Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 126–135, 2017.
  • [2] Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, Brendan Shillingford, and Nando De Freitas. Learning to learn by gradient descent by gradient descent. arXiv preprint arXiv:1606.04474, 2016.
  • [3] Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. Blind super-resolution kernel estimation using an internal-gan. In Advances in Neural Information Processing Systems, pages 284–293, 2019.
  • [4] Adrian Bulat, Jing Yang, and Georgios Tzimiropoulos. To learn image super-resolution, use a gan to learn how to do image degradation first. In Proceedings of the European conference on computer vision (ECCV), pages 185–200, 2018.
  • [5] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Image super-resolution using deep convolutional networks. IEEE transactions on pattern analysis and machine intelligence, 38(2):295–307, 2015.
  • [6] Chelsea Finn, Pieter Abbeel, and Sergey Levine. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pages 1126–1135, 2017.
  • [7] Jinjin Gu, Hannan Lu, Wangmeng Zuo, and Chao Dong. Blind super-resolution with iterative kernel correction. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1604–1613, 2019.
  • [8] Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 2, pages 1735–1742. IEEE, 2006.
  • [9] Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
  • [10] Zheng Hui, Xiumei Wang, and Xinbo Gao. Fast and accurate single image super-resolution via information distillation network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 723–731, 2018.
  • [11] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1646–1654, 2016.
  • [12] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1637–1645, 2016.
  • [13] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR (Poster), 2015.
  • [14] David Lazer, Ryan Kennedy, Gary King, and Alessandro Vespignani. The parable of google flu: traps in big data analysis. Science, 343(6176):1203–1205, 2014.
  • [15] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. The noise clinic: a blind image denoising algorithm. Image Processing On Line, 5:1–54, 2015.
  • [16] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, Zehan Wang, et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4681–4690, 2017.
  • [17] Yoonho Lee and Seungjin Choi. Gradient-based meta-learning with learned layerwise metric and subspace. In International Conference on Machine Learning, pages 2927–2936, 2018.
  • [18] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 136–144, 2017.
  • [19] Pengju Liu, Hongzhi Zhang, Yue Cao, Shigang Liu, Dongwei Ren, and Wangmeng Zuo. Learning cascaded convolutional networks for blind single image super-resolution. Neurocomputing, 417:371–383, 2020.
  • [20] Zhi-Song Liu, Wan-Chi Siu, Li-Wen Wang, Chu-Tak Li, and Marie-Paule Cani. Unsupervised real image super-resolution via generative variational autoencoder. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 442–443, 2020.
  • [21] Shunta Maeda. Unpaired image super-resolution using pseudo-supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 291–300, 2020.
  • [22] Tomer Michaeli and Michal Irani. Nonparametric blind super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 945–952, 2013.
  • [23] Tsendsuren Munkhdalai and Hong Yu. Meta networks. In International Conference on Machine Learning, pages 2554–2563, 2017.
  • [24] Boris N Oreshkin, Pau Rodriguez, and Alexandre Lacoste. Tadam: task dependent adaptive metric for improved few-shot learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, pages 719–729, 2018.
  • [25] Seobin Park, Jinsu Yoo, Donghyeon Cho, Jiwon Kim, and Tae Hyun Kim. Fast adaptation to super-resolution networks via meta-learning. In 16th European Conference on Computer Vision, ECCV 2020, pages 754–769, 2020.
  • [26] Joaquin Quionero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. The MIT Press, 2009.
  • [27] Gernot Riegler, Samuel Schulter, Matthias Ruther, and Horst Bischof. Conditioned regression models for non-blind single image super-resolution. In Proceedings of the IEEE International Conference on Computer Vision, pages 522–530, 2015.
  • [28] Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In International conference on machine learning, pages 1842–1850, 2016.
  • [29] Assaf Shocher, Nadav Cohen, and Michal Irani. “zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3118–3126, 2018.
  • [30] Jake Snell, Kevin Swersky, and Richard Zemel. Prototypical networks for few-shot learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 4080–4090, 2017.
  • [31] Jae Woong Soh, Sunwoo Cho, and Nam Ik Cho. Meta-transfer learning for zero-shot super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3516–3525, 2020.
  • [32] Flood Sung, Yongxin Yang, Li Zhang, Tao Xiang, Philip HS Torr, and Timothy M Hospedales. Learning to compare: Relation network for few-shot learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1199–1208, 2018.
  • [33] Chunwei Tian, Yong Xu, Wangmeng Zuo, Bob Zhang, Lunke Fei, and Chia-Wen Lin. Coarse-to-fine cnn for image super-resolution. IEEE Transactions on Multimedia, 2020.
  • [34] Laurens Van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(11), 2008.
  • [35] Oriol Vinyals, Charles Blundell, Timothy Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. Advances in Neural Information Processing Systems, 29:3630–3638, 2016.
  • [36] Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, pages 0–0, 2018.
  • [37] Yu-Syuan Xu, Shou-Yao Roy Tseng, Yu Tseng, Hsien-Kai Kuo, and Yi-Min Tsai. Unified dynamic convolutional network for super-resolution with variational degradations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12496–12505, 2020.
  • [38] Chih-Yuan Yang, Chao Ma, and Ming-Hsuan Yang. Single-image super-resolution: A benchmark. In European conference on computer vision, pages 372–386. Springer, 2014.
  • [39] Huaxiu Yao, Xian Wu, Zhiqiang Tao, Yaliang Li, Bolin Ding, Ruirui Li, and Zhenhui Li. Automated relational meta-learning. In International Conference on Learning Representations, 2019.
  • [40] Yuan Yuan, Siyuan Liu, Jiawei Zhang, Yongbing Zhang, Chao Dong, and Liang Lin. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 701–710, 2018.
  • [41] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.
  • [42] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3929–3938, 2017.
  • [43] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3262–3271, 2018.
  • [44] Kai Zhang, Wangmeng Zuo, and Lei Zhang. Deep plug-and-play super-resolution for arbitrary blur kernels. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1671–1681, 2019.
  • [45] Marvin Zhang, Henrik Marklund, Abhishek Gupta, Sergey Levine, and Chelsea Finn. Adaptive risk minimization: A meta-learning approach for tackling group shift. arXiv preprint arXiv:2007.02931, 2020.
  • [46] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), pages 286–301, 2018.
  • [47] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.