Clean Images are Hard to Reblur: A New Clue for Deblurring

April 26, 2021 · Seungjun Nah et al., Seoul National University

The goal of dynamic scene deblurring is to remove the motion blur present in a given image. Most learning-based approaches implement their solutions by minimizing the L1 or L2 distance between the output and the reference sharp image. Recent attempts improve the perceptual quality of the deblurred image by using features learned from visual recognition tasks. However, those features were originally designed to capture high-level contexts rather than low-level structures of the given image, such as blurriness. We propose a novel low-level perceptual loss to make images sharper. To better focus on image blurriness, we train a reblurring module that amplifies the unremoved motion blur. Motivated by the idea that a well-deblurred clean image should contain zero-magnitude motion blur that is hard to amplify, we design two types of reblurring loss functions. The supervised reblurring loss at the training stage compares the amplified blur between the deblurred image and the reference sharp image. The self-supervised reblurring loss at the inference stage inspects whether the deblurred image still contains noticeable blur that could be amplified. Our experimental results demonstrate that the proposed reblurring losses improve the perceptual quality of the deblurred images in terms of NIQE and LPIPS scores as well as visual sharpness.


1 Introduction

(a) Sharp / Blur
(b) SRN
(c) DeblurGANv2
(d) Ours
Figure 1: Comparison of the deblurred images and their reblurred counterparts. For each image, we visualize the remaining blur kernel [7] at the center pixel on the bottom-right side. Upper: The kernels from the previous methods indicate the direction of the original blur. Lower: When the proposed reblurring module is applied, our result does not lose sharpness, as we reconstruct an output that is hard to reblur.

Dynamic scene deblurring aims to remove unwanted motion blur from an image and recover the latent sharp image. Blind image deblurring is a challenging ill-posed problem as both the locally-varying blur kernel and the latent image have to be found from a large solution space. Traditional optimization-based approaches [11, 46, 14, 15] tried to relieve the ill-posedness by designing priors that reflect the statistical properties of desired solutions. With $\mathbf{b}$ and $\mathbf{l}$ denoting the vectorized blurry and latent images and $\mathbf{K}$ the large blur kernel matrix, a typical energy formulation is

$E(\mathbf{l}) = \|\mathbf{K}\mathbf{l} - \mathbf{b}\|^{2} + \rho(\mathbf{l}), \quad (1)$

where the first term measures data fidelity and $\rho(\cdot)$ denotes the prior term.

Instead of using such handcrafted knowledge, recent deep learning methods take advantage of learning from large-scale datasets [27, 38, 30, 26, 36]. Usually, the learning is driven by minimizing a pixel-wise distance loss, e.g., L1 or L2, to maximize PSNR between the deblurred and the reference sharp images. With the advent of modern CNN architectures, state-of-the-art deblurring networks [27, 42, 8, 48, 32] have been developed toward better model capacity and deblurring accuracy. Still, most methods tend to suffer from blurry predictions due to the inherent limitation [20, 23] of PSNR-oriented solutions for ill-posed problems. To complement the conventional learning objectives, several attempts such as the perceptual [13] and the adversarial loss [20, 27, 17] have been made to improve the visual quality and the sharpness of the model output. Nevertheless, the previous perceptual losses may not be optimal for blur removal as low-level structural properties such as blurriness are not explicitly considered in their formulations. Rather, they originate from features learned for high-level tasks such as image classification and real/fake image discrimination. As illustrated in Figure 1, results from the existing deblurring methods are not as sharp as the ground-truth example but are still blurry to a degree. Despite the reduced strength of blur in the deblurred images, we still observe remaining directional motion information.

This observation tells us that applying the VGG and the adversarial loss together [18] is not sufficient to obtain perceptually pleasing and sharp images across different architectures [43, 18]. Recognizing this inherent limitation of the previous loss terms, we conjecture that eliminating the motion cues remaining in the deblurred images could play an essential role in generating sharp and clean images. Starting from this motivation, we introduce the concept of reblurring, which amplifies the unremoved blur in a given image. An ideally deblurred image should be sharp enough so that no noticeable blur can be found in it to be amplified, i.e., clean images are hard to reblur. In contrast, it is easier to predict the original shape of the blur by recognizing the remaining blur kernel if the motion blur is not sufficiently removed. We propose to use this difference as a new optimization objective, the reblurring loss, for the image deblurring problem.

The reblurring loss is realized by jointly training a deblurring module and a paired reblurring module. From a deblurred output, the reblurring module tries to make the reblurred image as close as possible to the original blurry image. Using the property that the reblurred results should vary by the degree of input blur to be amplified, we construct two types of loss functions. During the joint training, the supervised reblurring loss compares the amplified blurs between the deblurred and the sharp image. Complementing the L1 intensity loss, the supervised reblurring loss guides the deblurring module to focus on and eliminate the remaining blur information. While the training method is similar to the adversarial training of GANs [10], the purposes and effects of the adversary are different. Our reblurring loss concentrates on the image blurriness regardless of the image realism in the training process. Furthermore, we apply a self-supervised reblurring loss at test time so that the deblurred image becomes infeasible to reblur, as a sufficiently sharp image would be. The self-supervised reblurring loss lets the deblurring module adaptively optimize for each input without ground truth.

The reblurring loss functions provide additional optimization directives to the deblurring module and can be generally applied to any learning-based methods. With the proposed approach, sharper images can be obtained without modifying the structure of the deblurring module.

We summarize our contributions as follows:

  • Based on the observation that clean images are hard to reblur, we propose novel loss functions for image deblurring. Our reblurring loss reflects the preference for sharper images and contributes to visually pleasing deblurring results.

  • At test-time, the reblurring loss can be implemented without a ground-truth image. We perform test-time adaptive inference via self-supervised optimization to each input.

  • Our method is generally applicable to any learning-based method and can be used jointly with other loss terms. Experiments show that the concept of the reblurring loss consistently contributes to achieving state-of-the-art visual sharpness as well as LPIPS and NIQE scores across different model architectures.

(a) Reblurring module training process
(b) Image deblurring with reblurring loss
Figure 2: Overviews of the proposed reblurring and deblurring framework.

2 Related Works

Image Deblurring. In the classic energy optimization framework, the energy is formulated by the likelihood and the prior term. Due to the large solution space of the ill-posed dynamic scene deblurring problem, prior terms have been the essential element in alleviating the optimization ambiguity by encoding the preference on the solutions. Sophisticated prior terms were carefully designed with human knowledge on natural image statistics [21, 7, 11, 46, 41, 47, 14, 15, 31]. In the recent work of Li et al. [22], a learned prior derived from a classifier discriminating blurry and clean images was also shown to be effective. Deep priors were also used for image deconvolution problems [33, 28].

On the other hand, deep learning methods have benefited from learning on large-scale datasets. The datasets consisting of realistic blur [27, 38, 30, 26, 8, 12, 36] align the temporal centers of the blurry and the sharp image pairs with high-speed cameras. Learning from such temporally aligned datasets relieves the ill-posedness of deblurring compared with the large solution space in the energy optimization framework. Thus, more attention has been paid to designing CNN architectures and datasets than to the loss or the solution preference.

In early work, the alternating kernel and image estimation processes [7] were implemented with CNNs [35]. In [40, 9], the spatially varying blur kernels are estimated by assuming locally linear blur, followed by non-blind deconvolution with them. Later, end-to-end learning without explicit kernel estimation became prevalent. Motivated by the coarse-to-fine approach, a multi-scale CNN was proposed in [27] to expand the receptive field efficiently. Several studies have proposed scale-recurrent architectures [43, 8] that share parameters across the scales. On the other hand, [49, 39] sequentially stacked network modules. Recently, [32] proposed a multi-temporal model that deblurs an image recursively. To handle spatially varying blur kernels efficiently, spatially non-uniform operations were embedded in the neural networks [50, 48].

Perceptual Image Restoration. Conventional image restoration methods mainly optimize L1 or L2 objectives to achieve higher PSNR. However, such approaches suffer from blurry and over-smoothed outputs [13, 52, 23]. The primary reason is that the learned models predict an average of all possible solutions under the ill-posedness [20]. To deal with the issue, several studies utilize deep features of the pretrained VGG [37] network that are more related to human perception [13, 20, 52]. These methods can produce perceptually better results by minimizing the distance between the output and ground-truth images in the feature domain. Recent methods further introduce adversarial training [10] so that outputs of the restoration models become indistinguishable from real samples [27, 29, 17, 18].

Nevertheless, an inherent limitation of existing perceptual objectives is that they are not specialized for image restoration. For example, the VGG features [37] are learned for high-level visual recognition [34], while the adversarial loss [10] only contributes to reconstructing realistic images without considering the existence of motion blur. Therefore, blindly optimizing those terms may not yield an optimal solution in terms of image deblurring. In practice, we observed that those objectives still tend to leave blur footprints unremoved, making it possible to estimate the original blur. Our reblurring loss is explicitly designed to improve the perceptual quality of deblurred images by reducing the remaining blurriness and is thus more suitable for the deblurring task.

Image Blurring. As an image could be blurred in various directions and strengths, image blurring is another ill-posed problem. Thus, intrinsic [1] or extrinsic [6, 5] information is often incorporated. In the case of a non-ideally sharp image, Bae and Durand [1] detected the small local blur kernel in the image to magnify the defocus blur for the bokeh effect. On the other hand, [6] estimated the kernel by computing the optical flow from the neighboring video frames. In a similar sense, [5] used multiple video frames to synthesize blur. Without such blur or motion cues, there could be infinitely many types of plausible blur applicable to an image. Thus, [51] used a generative model to synthesize many realistic blurry images. Contrary to the above approaches, [2] deliberately blurred an already blurry image in many ways to find the local blur kernel. Our image reblurring concept is similar to [1] in the sense that an intrinsic cue in an image is used to amplify blur. Nonetheless, our main goal is to use reblurring to provide a guide to the deblurring model so that such blur cues would be removed.

3 Proposed Method

In this section, we describe the detailed concept of image reblurring and how the reblurring operation can be learned. The proposed reblurring loss can support deblurring modules to reconstruct perceptually favorable and sharp outputs. At the training and testing stages, we formulate the reblurring loss in a supervised and a self-supervised manner, respectively. For simplicity, we refer to the blurry, the deblurred, and the sharp image as $B$, $\hat{S}$, and $S$, and to the deblurring and reblurring modules as $M_D$ and $M_R$, respectively.

#ResBlocks 4 8 16 32
Deblur PSNR wrt sharp GT 28.17 29.67 30.78 31.48
Reblur PSNR wrt blur GT 34.29 32.66 31.90 31.48

Table 1: Deblurring and reblurring PSNR (dB) by deblurring model capacity. Both tasks are trained independently with L1 loss on the GOPRO [27] dataset. We note that #ResBlocks varies for the deblur network only.

3.1 Clean Images are Hard to Reblur

As shown in Figure 1, outputs from the existing deblurring methods still contain undesired motion trajectories that are not completely removed from the input. Ideally, a well-deblurred image should not contain any motion cues, making reblurring infeasible. To validate our motivation that clean images are hard to reblur, we first build a reblurring module $M_R$ which amplifies the remaining blur in $\hat{S}$. The module is trained with the following blur reconstruction loss so that it learns the inverse operation of deblurring:

$\mathcal{L}_{recon} = \| M_R(\hat{S}) - B \|_{1}. \quad (2)$

We apply $M_R$ to the deblurred images from deblurring modules of varying capacities. Table 1 shows that the higher the deblurring PSNR, the lower the reblurring PSNR becomes. It demonstrates that better-deblurred images are harder to reblur, justifying our motivation.

In contrast to the non-ideally deblurred images, $M_R$ should not be able to generate motion blur from a sharp image $S$. For a high-quality clean image, $M_R$ should preserve the sharpness. However, optimizing the blur reconstruction loss alone may lead $M_R$ to learn the pixel average of all blur trajectories in the training dataset. In such a case, $M_R$ will apply a radial blur without considering the input variety. To confine the blur domain of $M_R$ to the motion-incurred blur, we use sharp images to penalize such undesired operations. Specifically, we introduce a network-generated pseudo-sharp image, obtained by feeding a real sharp image to the deblurring module $M_D$, as $\tilde{S} = M_D(S)$. We define the sharpness preservation loss as follows:

$\mathcal{L}_{preserve} = \| M_R(\tilde{S}) - S \|_{1}. \quad (3)$

We use the pseudo-sharp image $\tilde{S}$ instead of the real image $S$ to make our reblurring module focus on image sharpness and blurriness rather than image realism. While $\hat{S}$ and $\tilde{S}$ only differ by their sharpness, $\hat{S}$ and $S$ also vary by their realism.

Combining the two loss terms, we train the reblurring module $M_R$ by optimizing the joint loss $\mathcal{L}_{M_R}$:

$\mathcal{L}_{M_R} = \mathcal{L}_{recon} + \mathcal{L}_{preserve}. \quad (4)$

As zero-magnitude blur should remain unaltered by $M_R$, the sharpness preservation loss can be considered a special case of the blur reconstruction loss. Figure 2(a) illustrates how our reblurring module $M_R$ is trained.
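The reblurring module training with (2)-(4) can be summarized in a minimal PyTorch sketch. The function and variable names (reblur_module_step, deblur_net, reblur_net, optimizer_R) are illustrative assumptions, and the actual module designs follow Figure S1 and Tables S1 and S3.

```python
import torch
import torch.nn.functional as F

def reblur_module_step(deblur_net, reblur_net, optimizer_R, blurry, sharp):
    """One update of the reblurring module M_R with the joint loss of Eq. (4)."""
    with torch.no_grad():                      # M_D is not updated in this step
        deblurred = deblur_net(blurry)         # \hat{S}: may still contain blur
        pseudo_sharp = deblur_net(sharp)       # \tilde{S} = M_D(S)
    loss_recon = F.l1_loss(reblur_net(deblurred), blurry)       # Eq. (2)
    loss_preserve = F.l1_loss(reblur_net(pseudo_sharp), sharp)  # Eq. (3)
    loss = loss_recon + loss_preserve                           # Eq. (4)
    optimizer_R.zero_grad()
    loss.backward()
    optimizer_R.step()
    return loss.item()
```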

3.2 Supervision from Reblurring Loss

Figure 3: Image deblurring and reblurring illustrated from the perspective of sharpness and realism. Training our modules with $\mathcal{L}_{reblur}$ improves image sharpness without considering image realism. Image realism can be optionally handled by an adversarial loss $\mathcal{L}_{adv}$.

The blurriness of images can be easily compared by amplifying the blur. Thus, we propose a new optimization objective by processing the deblurred and the sharp image with the proposed reblurring module $M_R$. To suppress remaining blur in the output of the deblurring CNN $M_D$, our reblurring loss for image deblurring is defined as follows:

$\mathcal{L}_{reblur} = \| M_R(\hat{S}) - M_R(S) \|_{1}. \quad (5)$

Unlike the sharpness preservation term in (3), we do not use the pseudo-sharp image in our reblurring loss (5). As the quality of the pseudo-sharp image depends on the state of the deblurring module $M_D$, using $\tilde{S}$ may make training unstable and difficult to optimize, especially at the early stage. Thus, we use a real sharp image $S$ to stabilize the training. Nevertheless, as $M_R$ is trained to focus on sharpness by (4), so is the reblurring loss $\mathcal{L}_{reblur}$.

Using our reblurring loss in (5), the deblurring module is trained to minimize the following objective $\mathcal{L}_{M_D}$:

$\mathcal{L}_{M_D} = \mathcal{L}_{1} + \lambda \mathcal{L}_{reblur}, \quad (6)$

where $\mathcal{L}_{1}$ is the conventional L1 loss between $\hat{S}$ and $S$, and the hyperparameter $\lambda$ is empirically set to 1. Figure 2(b) shows how the deblurring model is trained with our proposed reblurring loss.

For each training iteration, we alternately optimize the two modules $M_D$ and $M_R$ by $\mathcal{L}_{M_D}$ and $\mathcal{L}_{M_R}$, respectively. While such a strategy may look similar to adversarial training [10], the optimization objectives are different. As neural networks are well known to easily discriminate real and fake images [44], the realism could serve as a more salient feature than image blurriness. Thus, an adversarial loss may overlook image blurriness as $\hat{S}$ and $S$ can already be discriminated by their realism difference. On the other hand, our reblurring loss is explicitly designed to prefer sharp images regardless of realism. Figure 3 conceptually compares the actual roles of the proposed reblurring loss $\mathcal{L}_{reblur}$ and the existing adversarial loss $\mathcal{L}_{adv}$.
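A minimal sketch of one deblurring-module update with the supervised reblurring loss (5)-(6) is given below; the helper names and the use of separate optimizers for the two modules are assumptions rather than a verbatim description of the implementation.

```python
import torch
import torch.nn.functional as F

def deblur_module_step(deblur_net, reblur_net, optimizer_D, blurry, sharp,
                       lambda_reblur=1.0):
    """One update of the deblurring module M_D with Eq. (6)."""
    deblurred = deblur_net(blurry)                   # \hat{S}
    loss_l1 = F.l1_loss(deblurred, sharp)            # conventional L1 term
    # Eq. (5): compare amplified blur; the real sharp image S is the reference.
    reblurred_out = reblur_net(deblurred)            # gradient flows back to M_D
    with torch.no_grad():
        reblurred_ref = reblur_net(sharp)            # M_R(S), no gradient needed
    loss_reblur = F.l1_loss(reblurred_out, reblurred_ref)
    loss = loss_l1 + lambda_reblur * loss_reblur     # Eq. (6), lambda = 1
    optimizer_D.zero_grad()
    loss.backward()
    optimizer_D.step()                               # only M_D's parameters are updated
    return loss.item()

# Alternating training: for each batch, update M_R by Eq. (4) and M_D by Eq. (6).
# for blurry, sharp in loader:
#     reblur_module_step(deblur_net, reblur_net, optimizer_R, blurry, sharp)
#     deblur_module_step(deblur_net, reblur_net, optimizer_D, blurry, sharp)
```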

3.3 Test-time Adaptation by Self-Supervision


Figure 4: The proposed self-supervised test-time adaptation. We repetitively find the latent image that reblurs to the current deblurred image.

Algorithm 1 Optimization process in test-time adaptation
1: procedure TestTimeAdaptation($B$)
2:     $\alpha$: test-time learning rate
3:     $\theta_D$: weights of $M_D$
4:     $\hat{S}_0 \leftarrow M_D(B)$
5:     for $t = 1, \dots, T$ do
6:         $\hat{S} \leftarrow M_D(B)$
7:         $\mathcal{L}_{self} \leftarrow \| M_R(\hat{S}) - \hat{S}^{\text{detach}} \|_1$
8:         Update $\theta_D$ by $\mathcal{L}_{self}$ and $\alpha$
9:     $\hat{S} \leftarrow$ histogram_matching($\hat{S}$, $\hat{S}_0$)
10:    return $\hat{S}$

Supervised learning methods have fixed model weights at test time as training with ground truth is no longer available. Every image is treated equally at test time regardless of the scene content and the blur difficulty. In contrast, providing a self-supervised loss can make a model adapt to each test input, improving the generalization ability. Thus, we use the proposed reblurring operation to enable a novel self-supervised optimization without the need for ground truth.

During the training of $M_D$, $M_R$ delivers supervision from the reference $S$ by (5). With the learned reblurring operation, we can further inspect whether the deblurred image is of high quality in terms of sharpness without reference data. If $\hat{S}$ gets blurred by passing through $M_R$, we can consider it to be insufficiently deblurred, as we have discussed in Figure 1. A clean image should remain as itself due to the sharpness preservation loss $\mathcal{L}_{preserve}$. Thus, we construct the self-supervised reblurring loss $\mathcal{L}_{self}$ that could serve as a prior term encoding the preference on sharp images:

$\mathcal{L}_{self} = \| M_R(\hat{S}) - \hat{S}^{\text{detach}} \|_{1}, \quad (7)$

where $\hat{S}^{\text{detach}}$ denotes an image with the same values as $\hat{S}$ through which the gradient does not backpropagate in the optimization process. We minimize $\mathcal{L}_{self}$ for each test image to obtain a sharper result. Allowing the gradient to flow through $\hat{S}^{\text{detach}}$ could let $M_D$ fall into undesired local minima where both $\hat{S}$ and $M_R(\hat{S})$ are blurry. Since $\mathcal{L}_{self}$ only considers the sharpness of an image, we keep the color consistency by matching the color histogram between the test-time adapted image and the initially deblurred image. The detailed process of the test-time adaptation strategy is described in Algorithm 1 and conceptually illustrated in Figure 4.
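Algorithm 1 can be sketched in PyTorch as follows. The learning rate, step count, tensor shapes, and the use of scikit-image's match_histograms (available in recent scikit-image versions) for the color-consistency step are illustrative assumptions made for this sketch.

```python
import copy

import torch
import torch.nn.functional as F
from skimage.exposure import match_histograms

def test_time_adapt(deblur_net, reblur_net, blurry, steps=5, lr=1e-6):
    """Self-supervised test-time adaptation of M_D on a single blurry image B."""
    deblur_net = copy.deepcopy(deblur_net)             # adapt a per-image copy of M_D
    optimizer = torch.optim.Adam(deblur_net.parameters(), lr=lr)
    with torch.no_grad():
        initial = deblur_net(blurry)                   # \hat{S}_0, color reference
    for _ in range(steps):
        deblurred = deblur_net(blurry)                 # \hat{S}
        reblurred = reblur_net(deblurred)              # M_R(\hat{S})
        # Eq. (7): detach the target so M_D cannot blur both images to match them.
        loss_self = F.l1_loss(reblurred, deblurred.detach())
        optimizer.zero_grad()
        loss_self.backward()
        optimizer.step()
    with torch.no_grad():
        adapted = deblur_net(blurry)
    # Keep color consistency with the initially deblurred image.
    adapted_np = adapted.squeeze(0).permute(1, 2, 0).cpu().numpy()
    initial_np = initial.squeeze(0).permute(1, 2, 0).cpu().numpy()
    return match_histograms(adapted_np, initial_np, channel_axis=-1)
```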

4 Experiments

We demonstrate the effectiveness of our reblurring loss by applying it to multiple model architectures. We show the experimental results with a baseline residual U-Net and the state-of-the-art image deblurring models: the sRGB version of SRN [43] and DHN, our modified version of DMPHN [49]. For the reblurring module, we use simple residual networks with 1 or 2 convolutional ResBlock(s). The training and evaluation were done with the widely used GOPRO [27] and REDS [26] datasets. The GOPRO dataset consists of 2103 training and 1111 test images with various dynamic motion blur. Similarly, the REDS dataset has 24000 training and 3000 validation images publicly available. On each dataset, every experiment was done under the same training environment. We mainly compare the LPIPS [52] and NIQE [25] perceptual metrics. For further implementation details, please refer to the supplementary material.
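As a concrete example of the evaluation protocol, LPIPS can be computed with the lpips package released with [52]. The AlexNet backbone below is the package default and an assumption on our side; the package expects inputs scaled to [-1, 1].

```python
import lpips
import torch

lpips_fn = lpips.LPIPS(net='alex')   # lower LPIPS means perceptually closer

def lpips_score(deblurred, sharp):
    # deblurred, sharp: float tensors of shape (1, 3, H, W) with values in [0, 1]
    with torch.no_grad():
        return lpips_fn(deblurred * 2 - 1, sharp * 2 - 1).item()
```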

4.1 Effect of Reblurring Loss

We implement the reblurring loss with varying degrees of emphasis on sharpness by controlling the reblurring module capacity. For a more balanced quality between PSNR and perceptual sharpness, we use 1 ResBlock for $M_R$. To put more weight on the perceptual quality, we allocate a larger capacity to $M_R$ by using 2 ResBlocks. For notational simplicity, we denote the reblurring loss with $n$ ResBlock(s) in the reblurring module as $\mathcal{L}_{reblur,n}$.

Tables 2 and 3 show how the deblurring performance varies depending on the training loss functions. With $\mathcal{L}_{reblur,1}$, LPIPS and NIQE improve to a moderate degree while the PSNR and SSIM metrics remain at a similar level. Meanwhile, $\mathcal{L}_{reblur,2}$ more aggressively optimizes the perceptual metrics. The perceptual metric improvements are consistently witnessed with different architectures on both the GOPRO and the REDS datasets.

Method LPIPS NIQE PSNR SSIM
U-Net ($\mathcal{L}_{1}$ only) 0.1635 5.996 29.66 0.8874
+ $\mathcal{L}_{reblur,1}$ 0.1365 5.629 29.58 0.8869
+ $\mathcal{L}_{reblur,2}$ 0.1238 5.124 29.44 0.8824
SRN ($\mathcal{L}_{1}$ only) 0.1246 5.252 30.62 0.9078
+ $\mathcal{L}_{reblur,1}$ 0.1140 5.136 30.74 0.9104
+ $\mathcal{L}_{reblur,2}$ 0.1037 4.887 30.57 0.9074
DHN ($\mathcal{L}_{1}$ only) 0.1179 5.490 31.53 0.9207
+ $\mathcal{L}_{reblur,1}$ 0.0975 5.472 31.53 0.9217
+ $\mathcal{L}_{reblur,2}$ 0.0837 5.076 31.34 0.9177

Table 2: Perceptual metric improvements from the reblurring loss on the GOPRO [27] dataset. The reblurring loss consistently improves LPIPS and NIQE over the standard $\mathcal{L}_{1}$ loss.
Method LPIPS NIQE PSNR SSIM
U-Net ($\mathcal{L}_{1}$ only) 0.1486 3.649 30.80 0.8772
+ $\mathcal{L}_{reblur,1}$ 0.1435 3.487 30.76 0.8776
+ $\mathcal{L}_{reblur,2}$ 0.1252 2.918 30.46 0.8717
SRN ($\mathcal{L}_{1}$ only) 0.1148 3.392 31.89 0.8999
+ $\mathcal{L}_{reblur,1}$ 0.1071 3.305 32.01 0.9044
+ $\mathcal{L}_{reblur,2}$ 0.0947 2.875 31.82 0.9026
DHN ($\mathcal{L}_{1}$ only) 0.0942 3.288 32.65 0.9152
+ $\mathcal{L}_{reblur,1}$ 0.0931 3.248 32.57 0.9143
+ $\mathcal{L}_{reblur,2}$ 0.0805 2.830 32.44 0.9122

Table 3: Quantitative comparison on the REDS [26] dataset by loss function. The reblurring loss improves LPIPS and NIQE over the standard $\mathcal{L}_{1}$ loss.
Figure 5: Visual comparison of deblurred results by training loss function on GOPRO dataset. Upper: SRN, Lower: U-Net.

4.2 Effect of Sharpness Preservation Loss

In training $M_R$, we used both the blur reconstruction loss $\mathcal{L}_{recon}$ and the sharpness preservation loss $\mathcal{L}_{preserve}$. The latter term plays an essential role in concentrating only on the motion-driven blur in the given image and keeping sharp images sharp. Table 4 presents the performance gains from using $\mathcal{L}_{preserve}$ jointly with $\mathcal{L}_{recon}$ in terms of the perceptual quality.

Table 4 also justifies the effectiveness of the pseudo-sharp image in sharpness preservation. We found that using the real sharp image $S$ in $\mathcal{L}_{preserve}$ for $M_R$, in addition to $\mathcal{L}_{recon}$, causes less stable training than using $\tilde{S}$. Using the pseudo-sharp image confines the input distribution of $M_R$ to the output domain of $M_D$. While the real sharp data differ from the deblurred image in terms of realism, the pseudo-sharp image only differs by the sharpness. Thus, the reblurring module can focus on the image sharpness without being distracted by other unintended properties. Furthermore, it leads the two loss terms $\mathcal{L}_{recon}$ and $\mathcal{L}_{preserve}$ to reside under the same objective: amplifying any noticeable blur and preserving sharpness when no blur is found.

Method LPIPS NIQE PSNR SSIM
U-Net ($\mathcal{L}_{recon}$ only) 0.1301 5.132 29.47 0.8839
with $\mathcal{L}_{preserve}$ on $S$ 0.1410 5.307 29.15 0.8694
with $\mathcal{L}_{preserve}$ on $\tilde{S}$ 0.1238 5.124 29.44 0.8824

Table 4: The effect of the sharpness preservation in training our reblurring module measured on GOPRO [27] dataset. In (3), using the pseudo-sharp image instead of the real one leads to better deblurring performance. We note that the reblurring module is constructed using 2 ResBlocks.

4.3 Comparison with Other Perceptual Losses

The reblurring loss provides a conceptually different learning objective from the adversarial and the perceptual losses and is designed to focus on the motion blur. Table 5 compares the effectiveness of $\mathcal{L}_{reblur}$ with the adversarial loss $\mathcal{L}_{adv}$ and the VGG perceptual loss [13] by applying them to SRN [43] on the GOPRO dataset. While our method provides quantitatively better perceptual scores, the different perceptual losses are oriented to varying goals and are not in an essentially competing relation. They do not necessarily conflict with each other and can be jointly applied in training to capture the perceptual quality in varying aspects.

Method LPIPS NIQE PSNR SSIM
SRN ($\mathcal{L}_{1}$) 0.1246 5.252 30.62 0.9078
+ $\mathcal{L}_{adv}$ 0.1141 4.960 30.53 0.9068
+ $\mathcal{L}_{VGG}$ 0.1037 4.945 30.60 0.9074
+ $\mathcal{L}_{reblur,2}$ 0.1037 4.887 30.57 0.9074

Table 5: Comparison of reblurring loss and other perceptual losses on GOPRO [27] dataset applied to SRN.

4.4 Effect of Test-time Adaptation

We conduct test-time adaptation with the proposed self-supervised reblurring loss to make the deblurred image even sharper. Figure 6 shows the test-time adapted results with SRN. Compared with the baseline trained with the L1 loss, our results exhibit an improved trade-off between PSNR and the perceptual metrics, LPIPS and NIQE. Tables 6 and 7 provide detailed quantitative test-time adaptation results on the GOPRO and REDS datasets, respectively, with various deblurring module architectures.

Figure 6: Test-time adaptation results using SRN on the GOPRO [27] dataset. The proposed self-supervised objective improves the trade-off between the perceptual image quality (LPIPS, NIQE) and PSNR compared with the baseline.
Method LPIPS NIQE PSNR SSIM
U-Net ($\mathcal{L}_{1}$) 0.1635 5.996 29.66 0.8874
U-Net ($\mathcal{L}_{reblur,1}$) 0.1365 5.629 29.58 0.8869
TTA step 5 0.1327 5.599 29.52 0.8878
U-Net ($\mathcal{L}_{reblur,2}$) 0.1238 5.124 29.44 0.8824
TTA step 5 0.1187 5.000 29.42 0.8831
SRN ($\mathcal{L}_{1}$) 0.1246 5.252 30.62 0.9078
SRN ($\mathcal{L}_{reblur,1}$) 0.1140 5.136 30.74 0.9104
TTA step 1 0.1129 5.125 30.74 0.9107
TTA step 3 0.1112 5.101 30.70 0.9108
TTA step 5 0.1101 5.079 30.60 0.9100
SRN ($\mathcal{L}_{reblur,2}$) 0.1037 4.887 30.57 0.9074
TTA step 5 0.0983 4.730 30.44 0.9067
DHN ($\mathcal{L}_{1}$) 0.1179 5.490 31.53 0.9207
DHN ($\mathcal{L}_{reblur,1}$) 0.0975 5.472 31.53 0.9217
TTA step 5 0.0940 5.343 31.32 0.9208
DHN ($\mathcal{L}_{reblur,2}$) 0.0837 5.076 31.34 0.9177
TTA step 5 0.0805 4.948 31.28 0.9174

Table 6: Test-time adaptation results of various deblurring networks on GOPRO [27] dataset.
Method LPIPS NIQE PSNR SSIM
U-Net ($\mathcal{L}_{1}$) 0.1486 3.649 30.80 0.8772
U-Net ($\mathcal{L}_{reblur,2}$) 0.1252 2.918 30.46 0.8717
TTA step 5 0.1226 2.849 30.25 0.8701
SRN ($\mathcal{L}_{1}$) 0.1148 3.392 31.89 0.8999
SRN ($\mathcal{L}_{reblur,2}$) 0.0947 2.875 31.82 0.9026
TTA step 5 0.0909 2.798 31.50 0.9008
DHN ($\mathcal{L}_{1}$) 0.0942 3.288 32.65 0.9152
DHN ($\mathcal{L}_{reblur,2}$) 0.0805 2.830 32.44 0.9122
TTA step 5 0.0763 2.761 32.17 0.9110

Table 7: Test-time adaptation results of various deblurring methods on REDS [26] dataset.
(a)-(c) Deblurred results by different training objectives
(d) TTA step 5
Figure 7: Qualitative comparison between different training objectives and the test-time adaptation. Patches are sampled from the REDS [26] dataset validation split.

4.5 Comparison with State-of-the-Art Methods

We have improved the perceptual quality of the deblurred images by training several different model architectures. We compare the perceptual quality with other state-of-the-art methods in Figure 8. Notably, DeblurGAN-v2 was trained with the VGG loss and the adversarial loss. Our results achieve visually sharper textures from the reblurring loss and test-time adaptation.

4.6 Real World Image Deblurring

While our method uses synthetic datasets [27, 26] for training, the trained models generalize to real blurry images. In Figure 9, we show deblurred results on the Lai et al. [19] dataset with the DHN model. Compared with the baseline loss, our reblurring loss provides an improved deblurring quality. As a real test image could deviate from the training data distribution, a single forward inference may not produce optimal results. With the self-supervised test-time adaptation, our deblurred images reveal sharper and more detailed textures.

(a) Blurry input
(b) SE-Sharing [8]
(c) DeblurGAN-v2 [18]
(d) Ours TTA step 5
Figure 8: Qualitative comparison between state-of-the-art deblurring methods on the GOPRO [27] dataset. Our approach uses the SRN [43] model as a baseline architecture.
(a) Blurry input
(b) Baseline $\mathcal{L}_{1}$
(c) $\mathcal{L}_{reblur}$ (Ours)
(d) TTA step 5 (Ours)
Figure 9: Qualitative comparison of deblurring results on the real-world images [19] by different loss functions and test-time adaptation. The proposed test-time adaptation greatly improves visual quality and sharpness of the deblurred images.

5 Conclusion

In this paper, we validate a new observation that clean, sharp images are hard to reblur and develop a new low-level perceptual loss. We construct a reblurring loss that accounts for image blurriness by jointly training a pair of deblurring and reblurring modules. The supervised reblurring loss provides an amplified view of motion blur, while the self-supervised loss inspects the blurriness with the learned reblurring module weights. The self-supervision lets the deblurring module adapt to a new image at test time without ground truth. By applying the loss terms to state-of-the-art deblurring architectures, we demonstrated that our method consistently improves the perceptual sharpness of the deblurred images both quantitatively and visually.

Appendix: Supplementary Material

In this supplementary material, we explain the detailed experimental results that are not shown in the main manuscript. In Section S1, we show the implementation details with the model architecture specifics, training details, and the evaluation metrics. Section S2 describes how the reblurring module design and size are determined. Then, in Section S3, we describe the different characteristics of the proposed reblurring loss and the other perceptual losses. We combine our reblurring loss with the other perceptual losses to take advantage of multiple perspectives. In Section S4, we show the effect of test-time adaptation and the trade-off relation between the conventional distortion metrics (PSNR, SSIM) and the perceptual metrics (LPIPS, NIQE) compared with the baselines. In Section S5, we visually validate the effect of the reblurring loss and test-time adaptation.

S1 Implementation Details

Model Architecture. In the main manuscript, we performed the experiments with 3 model architectures. We set our baseline model as a light-weight residual U-Net architecture that runs at fast speed. The baseline model is used to design our reblurring loss with pseudo-sharp images through the ablation study in Table 4.

For the reblurring operation, we use a simple residual network without strides to avoid deconvolution artifacts. The baseline U-Net and the reblurring module architectures are shown in Figure S1. The detailed parameters of the U-Net and $M_R$ are specified in Tables S1 and S3, respectively.


Figure S1: The baseline U-Net architecture and the reblurring module architecture. We use the same reblurring module for all experiments except for the number of ResBlocks.
# Layer description Output shape
Input
1 conv
2 conv
3 conv
4-19 8 ResBlocks ()
20 conv
21 conv
22 conv

Table S1: U-Net module specifics

In addition to the U-Net, experiments were conducted with state-of-the-art deblurring models based on SRN [42] and DMPHN [49]. SRN [43] was originally designed to operate on grayscale images with an LSTM module. Later, the authors released the sRGB version of the code without the LSTM, exhibiting improved accuracy. We adopted the revised SRN structure in our experiments.

The other model we chose is based on DMPHN (1-2-4-8) [49]. DMPHN performs hierarchical residual refinement to produce the final output. The model consists of convolutional layers with ReLU activations that are spatially shift-equivariant. In [49], each level splits the given image and performs the convolutional operations on the divided patches. As the convolutional weights do not differ across the patches, the operations do not necessarily have to be done patch-wise. Thus, we remove the multi-patch strategy and perform the convolution on the whole input without dividing the image into patches. We refer to the modified model as DHN. As shown in Table S2, convolution on the whole image brings higher accuracy than patch-wise convolution.

Method LPIPS NIQE PSNR SSIM
DMPHN ($\mathcal{L}_{1}$ only) 0.1184 5.542 31.42 0.9191
DHN ($\mathcal{L}_{1}$ only) 0.1179 5.490 31.53 0.9207

Table S2: DMPHN modification results on GOPRO [27] dataset. DHN without patch-wise convolution brings improved accuracy.

Metrics. To quantitatively compare the deblurred images in the following sections, we use PSNR, SSIM [45], LPIPS [52], and NIQE [25]. In the image deblurring literature, SSIM has typically been measured by the MATLAB ssim function on sRGB images. SSIM was originally developed for grayscale images, and the MATLAB ssim function applied to a 3-dimensional tensor considers an image to be a 3D grayscale volume. Thus, most of the previous SSIM measures were not accurate, leading to higher values. Instead, we measured SSIM for each channel separately and averaged the results. We used the skimage.metrics.structural_similarity function in the scikit-image package for Python to measure SSIM for multi-channel images.
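A minimal sketch of this per-channel SSIM measurement is given below; the data_range argument assumes 8-bit sRGB inputs and the helper name is illustrative.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ssim_rgb(img, ref, data_range=255):
    """SSIM computed on each sRGB channel separately and averaged."""
    scores = [structural_similarity(img[..., c], ref[..., c], data_range=data_range)
              for c in range(img.shape[-1])]
    return float(np.mean(scores))
```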

Training. For all the experiments, we performed the same training process for a fair comparison. On the GOPRO dataset [27], we trained each model for 4000 epochs. On the REDS dataset [26], the models are trained for 200 epochs. The Adam [16] optimizer is used in all cases. When calculating the distance between images with an Lp norm, we always set p = 1, using the L1 distance. Starting from the initial learning rate, the learning rate is halved when training reaches 50%, 75%, and 90% of the total epochs. We used PyTorch 1.7.1 with CUDA 11.0 to implement the deblurring methods. Mixed-precision training [24] is employed to accelerate operations on RTX 2080 Ti GPUs.
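The optimizer and step schedule described above can be set up as in the sketch below; the initial learning rate value is an assumption since it is not restated here.

```python
import torch

def make_training_setup(model, total_epochs, base_lr=1e-4):
    """Adam optimizer with the learning rate halved at 50%, 75%, and 90% of training."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)
    milestones = [int(total_epochs * r) for r in (0.5, 0.75, 0.9)]
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.5)
    scaler = torch.cuda.amp.GradScaler()   # mixed-precision training [24]
    return optimizer, scheduler, scaler
```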

# Layer description Output shape
Input
1 conv
2-5 2 ResBlocks ()
6 conv

Table S3: Reblurring module specifics
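A stride-free residual reblurring module in the spirit of Table S3 (conv, n ResBlocks, conv) can be sketched as below. The channel width of 64 and the 3x3 kernels are assumptions, as the exact feature sizes are not reproduced in the table above.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)        # residual connection

class ReblurNet(nn.Module):
    """conv -> n ResBlocks -> conv, without strides (avoids deconvolution artifacts)."""
    def __init__(self, n_resblocks=2, channels=64):
        super().__init__()
        layers = [nn.Conv2d(3, channels, 3, padding=1)]
        layers += [ResBlock(channels) for _ in range(n_resblocks)]
        layers += [nn.Conv2d(channels, 3, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```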

S2 Determining Reblurring Module Size

As our reblurring loss is realized by $M_R$, the reblurring module design plays an essential role. As shown in Figure S1, the architecture is a simple ResNet. Table S4 shows the relation between the deblurring performance and the module size by changing the number of ResBlocks.

For all deblurring module architectures, LPIPS was the best when the number of ResBlocks was 2. NIQE showed good performance at similar or slightly larger module sizes. PSNR and SSIM tended to decrease as the number of ResBlocks increased. For larger numbers of ResBlocks, we witnessed that sharper edges could be obtained, but cartoon-like artifacts with over-strong edges sometimes appeared.

Considering the trade-off between the PSNR and the perceptual metrics, we chose 1 and 2 ResBlocks in the following experiments. $\mathcal{L}_{reblur,1}$ finds a balance between PSNR and LPIPS, and $\mathcal{L}_{reblur,2}$ puts more weight on the perceptual quality.

Method LPIPS NIQE PSNR SSIM
U-Net ( only) 0.1635 5.996 29.66 0.8874
0.1365 5.629 29.58 0.8869
0.1238 5.124 29.44 0.8824
0.1386 5.448 29.38 0.8819
0.1415 5.513 29.25 0.8789
SRN ( only) 0.1246 5.252 30.62 0.9078
0.1140 5.136 30.74 0.9104
0.1037 4.887 30.57 0.9074
0.1091 4.875 30.50 0.9060
0.1155 5.041 30.53 0.9056
DHN ( only) 0.1179 5.490 31.53 0.9207
0.0975 5.472 31.53 0.9217
0.0837 5.076 31.34 0.9177
0.0845 4.963 31.26 0.9159
0.0861 5.041 31.19 0.9149

Table S4: The effect of the reblurring loss on the GOPRO [27] dataset by the reblurring module size. The reblurring module size varies by the number of ResBlocks.

S3 Combining Reblurring Loss with Other Perceptual Losses

Method LPIPS NIQE PSNR SSIM
SRN ( only) 0.1246 5.252 30.62 0.9078
0.1037 4.945 30.60 0.9074
0.0928 4.671 30.64 0.9079
0.1141 4.960 30.53 0.9068
0.1014 4.811 30.56 0.9075
DHN ( only) 0.1179 5.490 31.53 0.9207
0.0994 5.022 31.48 0.9195
0.0773 4.897 31.28 0.9161
0.0969 5.026 31.46 0.9188
0.0835 4.799 31.28 0.9162

Table S5: Results on GOPRO [27] dataset by adding the reblurring loss to the other perceptual losses.
Method LPIPS NIQE PSNR SSIM
SRN ( only) 0.1148 3.392 31.89 0.8999
0.1000 3.256 31.86 0.9001
0.0868 2.835 31.83 0.9015
DHN ( only) 0.0942 3.288 32.65 0.9152
0.0812 3.171 32.61 0.9146
0.0723 2.821 32.48 0.9133

Table S6: Results on REDS [26] dataset by adding the reblurring loss to the other perceptual losses.

Our reblurring loss is a new perceptual loss that is sensitive to the blurriness of an image, a type of structure-level information, while other perceptual losses such as the VGG loss [13] and the adversarial loss [20] are more related to high-level contexts. As the VGG model [37] is trained to recognize image classes, optimizing with the VGG loss could make an image better recognizable. In the GAN framework [10], it is well known that discriminators can easily tell fake images from real images [44], being robust against JPEG compression and blurring. In the adversarial loss from the discriminator, the realism difference could be more salient than other features such as blurriness.

With the perceptual loss functions designed with different objectives, combining them could bring visual quality improvements in various aspects. Tables S5 and S6 show the effect of applying our reblurring loss jointly with the other perceptual losses on the GOPRO and REDS datasets. We omit the loss coefficients for simplicity. We used fixed weights for the VGG loss and the adversarial loss $\mathcal{L}_{adv}$. We witness that LPIPS and NIQE further improve when our reblurring loss is combined with the VGG or the adversarial loss.

S4 Perception vs. Distortion Trade-Off

It is known in the image restoration literature that the distortion error and the perceptual quality error are in a trade-off relation [4, 3]. The relation is often witnessed by training a single model with different loss functions. In most cases, to obtain a better perceptual quality from a single model architecture, retraining with another loss from scratch is necessary. Our test-time adaptation with the self-supervised reblurring loss, in contrast, can provide steps toward perceptual quality without full retraining.

In Figures S2 and S3, we present the perception-distortion trade-off from our test-time adaptation. LPIPS and NIQE scores consistently improve with each adaptation step in both the SRN and DHN models. While PSNR is moderately sacrificed by the adaptation, SSIM improves in the early steps as it better reflects the structural information. Our results show an improved trade-off between the distortion and perception metrics over the baseline models trained with the L1 loss.

(a) PSNR vs LPIPS
(b) PSNR vs NIQE
(c) SSIM vs LPIPS
(d) SSIM vs NIQE
Figure S2: Perception-distortion trade-off from test-time adaptation applied to SRN model on GOPRO [27] dataset.
(a) PSNR vs LPIPS
(b) PSNR vs NIQE
(c) SSIM vs LPIPS
(d) SSIM vs NIQE
Figure S3: Perception-distortion trade-off from test-time adaptation applied to DHN model on GOPRO [27] dataset.

S5 Visual Comparison of Loss Function

In this section, we present a visual comparison of deblurred results. In Figures S4 and S5, we perform a visual ablation by showing the deblurred results from the baseline L1 loss, our reblurring loss, and additional test-time adaptation. For the deblurring module $M_D$, we used DHN. For the reblurring module $M_R$, 2 ResBlocks are used. Our final result reveals sharper image structure and texture.

In Figures S6 and S7, we compare the effect of 3 different perceptual losses. The results from the VGG loss, the adversarial loss, and our reblurring loss are shown. Our reblurring loss exhibits clearer edges and facial details than the other perceptual losses.

(a) Blur
(b) Our deblurred image (TTA step 5)
(c) Blur
(d)-(e) Crops deblurred with the baseline L1 loss and with our reblurring loss
(f) Ours (TTA step 5)
Figure S4: Visual comparison of deblurred results by reblurring loss and test-time adaptation on REDS [26] dataset.
(a) Blur
(b) Our deblurred image (TTA step 5)
(c) Blur
(d)-(e) Crops deblurred with the baseline L1 loss and with our reblurring loss
(f) Ours (TTA step 5)
Figure S5: Visual comparison of deblurred results by reblurring loss and test-time adaptation on REDS [26] dataset.
(a) Blur
(b) Our deblurred image (reblurring loss)
(c) Blur
(d)-(g) Crops deblurred with different perceptual losses
Figure S6: Visual comparison of perceptual losses on REDS [26] dataset.
(a) Blur
(b) Our deblurred image (reblurring loss)
(c) Blur
(d)-(g) Crops deblurred with different perceptual losses
Figure S7: Visual comparison of perceptual losses on REDS [26] dataset.

References

  • [1] S. Bae and F. Durand (2007) Defocus magnification. Computer Graphics Forum 26 (3), pp. 571–579. Cited by: §2.
  • [2] Y. Bahat, N. Efrat, and M. Irani (2017) Non-uniform blind deblurring by reblurring. In ICCV, Cited by: §2.
  • [3] Y. Blau, R. Mechrez, R. Timofte, T. Michaeli, and L. Zelnik-Manor (2018-09) The 2018 PIRM challenge on perceptual image super-resolution. In Proceedings of the ECCV Workshops, Cited by: §S4.
  • [4] Y. Blau and T. Michaeli (2018) The perception-distortion tradeoff. In CVPR, Cited by: §S4.
  • [5] T. Brooks and J. T. Barron (2019) Learning to synthesize motion blur. In CVPR, Cited by: §2.
  • [6] H. Chen, J. Gu, O. Gallo, M. Liu, A. Veeraraghavan, and J. Kautz (2018) Reblur2deblur: deblurring videos via self-supervised learning. In ICCP, Cited by: §2.
  • [7] S. Cho and S. Lee (2009) Fast motion deblurring. In ACM SIGGRAPH Asia, Cited by: Figure 1, §2, §2.
  • [8] H. Gao, X. Tao, X. Shen, and J. Jia (2019) Dynamic scene deblurring with parameter selective sharing and nested skip connections. In CVPR, Cited by: §1, §2, §2, (b)b.
  • [9] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. van den Hengel, and Q. Shi (2017) From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur. In CVPR, Cited by: §2.
  • [10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NIPS, Cited by: §1, §2, §2, §3.2, §S3.
  • [11] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf (2011) Fast removal of non-uniform camera shake. In ICCV, Cited by: §1, §2.
  • [12] M. Jin, Z. Hu, and P. Favaro (2019) Learning to extract flawless slow motion from blurry videos. In CVPR, Cited by: §2.
  • [13] J. Johnson, A. Alahi, and L. Fei-Fei (2016) Perceptual losses for real-time style transfer and super-resolution. In ECCV, Cited by: §1, §2, §S3, §4.3.
  • [14] T. H. Kim, B. Ahn, and K. M. Lee (2013) Dynamic scene deblurring. In ICCV, Cited by: §1, §2.
  • [15] T. H. Kim and K. M. Lee (2014) Segmentation-free dynamic scene deblurring. In CVPR, Cited by: §1, §2.
  • [16] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §S1.
  • [17] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas (2018) DeblurGAN: blind motion deblurring using conditional adversarial networks. In CVPR, Cited by: §1, §2.
  • [18] O. Kupyn, T. Martyniuk, J. Wu, and Z. Wang (2019) DeblurGAN-v2: deblurring (orders-of-magnitude) faster and better. In ICCV, Cited by: §1, §2, (c)c.
  • [19] W. Lai, J. Huang, Z. Hu, N. Ahuja, and M. Yang (2016) A comparative study for single image blind deblurring. In CVPR, Cited by: Figure 9, §4.6.
  • [20] C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, and W. Shi (2017) Photo-realistic single image super-resolution using a generative adversarial network. In CVPR, Cited by: §1, §2, §S3.
  • [21] A. Levin (2006) Blind motion deblurring using image statistics. NIPS. Cited by: §2.
  • [22] L. Li, J. Pan, W. Lai, C. Gao, N. Sang, and M. Yang (2018) Learning a discriminative prior for blind image deblurring. In CVPR, Cited by: §2.
  • [23] S. Menon, A. Damian, S. Hu, N. Ravi, and C. Rudin (2020) PULSE: self-supervised photo upsampling via latent space exploration of generative models. In CVPR, Cited by: §1, §2.
  • [24] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, et al. (2017) Mixed precision training. arXiv preprint arXiv:1710.03740. Cited by: §S1.
  • [25] A. Mittal, R. Soundararajan, and A. C. Bovik (2012) Making a “completely blind” image quality analyzer. IEEE SPL 20 (3), pp. 209–212. Cited by: §S1, §4.
  • [26] S. Nah, S. Baik, S. Hong, G. Moon, S. Son, R. Timofte, and K. M. Lee (2019) NTIRE 2019 challenges on video deblurring and super-resolution: dataset and study. In CVPR Workshops, Cited by: §1, §S1, §2, Table S6, Figure 7, §4.6, Table 3, Table 7, §4, Figure S4, Figure S5, Figure S6, Figure S7.
  • [27] S. Nah, T. H. Kim, and K. M. Lee (2017) Deep multi-scale convolutional neural network for dynamic scene deblurring. In CVPR, Cited by: Table S2, §1, §S1, Table S4, §2, §2, §2, Table 1, Table S5, Figure S2, Figure S3, Figure 6, Figure 8, §4.6, Table 2, Table 4, Table 5, Table 6, §4.
  • [28] Y. Nan and H. Ji (2020) Deep learning for handling kernel/model uncertainty in image deconvolution. In CVPR, Cited by: §2.
  • [29] T. M. Nimisha, A. Kumar Singh, and A. N. Rajagopalan (2017) Blur-invariant deep learning for blind-deblurring. In ICCV, Cited by: §2.
  • [30] M. Noroozi, P. Chandramouli, and P. Favaro (2017) Motion deblurring in the wild. In GCPR, Cited by: §1, §2.
  • [31] J. Pan, D. Sun, H. Pfister, and M. Yang (2016) Blind image deblurring using dark channel prior. In CVPR, Cited by: §2.
  • [32] D. Park, D. U. Kang, J. Kim, and S. Y. Chun (2020) Multi-temporal recurrent neural networks for progressive non-uniform single image deblurring with incremental temporal training. In ECCV, Cited by: §1, §2.
  • [33] D. Ren, K. Zhang, Q. Wang, Q. Hu, and W. Zuo (2020) Neural blind deconvolution using deep priors. In CVPR, Cited by: §2.
  • [34] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. (2015) ImageNet large scale visual recognition challenge. IJCV 115 (3), pp. 211–252. Cited by: §2.
  • [35] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf (2015) Learning to deblur. IEEE TPAMI 38 (7), pp. 1439–1451. Cited by: §2.
  • [36] Z. Shen, W. Wang, X. Lu, J. Shen, H. Ling, T. Xu, and L. Shao (2019) Human-aware motion deblurring. In ICCV, Cited by: §1, §2.
  • [37] K. Simonyan and A. Zisserman (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556. Cited by: §2, §2, §S3.
  • [38] S. Su, M. Delbracio, J. Wang, G. Sapiro, W. Heidrich, and O. Wang (2017) Deep video deblurring for hand-held cameras. In CVPR, Cited by: §1, §2.
  • [39] M. Suin, K. Purohit, and A. N. Rajagopalan (2020) Spatially-attentive patch-hierarchical network for adaptive motion deblurring. In CVPR, Cited by: §2.
  • [40] J. Sun, W. Cao, Z. Xu, and J. Ponce (2015) Learning a convolutional neural network for non-uniform motion blur removal. In CVPR, Cited by: §2.
  • [41] L. Sun, S. Cho, J. Wang, and J. Hays (2013) Edge-based blur kernel estimation using patch priors. In ICCP, Cited by: §2.
  • [42] X. Tao, H. Gao, R. Liao, J. Wang, and J. Jia (2017) Detail-revealing deep video super-resolution. In ICCV, Cited by: §1, §S1.
  • [43] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia (2018) Scale-recurrent network for deep image deblurring. In CVPR, Cited by: §1, §S1, §2, Figure 8, §4.3, §4.
  • [44] S. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros (2020) CNN-Generated images are surprisingly easy to spot… for now. In CVPR, Cited by: §3.2, §S3.
  • [45] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, et al. (2004) Image quality assessment: from error visibility to structural similarity. IEEE TIP 13 (4), pp. 600–612. Cited by: §S1.
  • [46] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce (2012) Non-uniform deblurring for shaken images. IJCV 98 (2), pp. 168–186. Cited by: §1, §2.
  • [47] L. Xu, S. Zheng, and J. Jia (2013) Unnatural L0 sparse representation for natural image deblurring. In CVPR, Cited by: §2.
  • [48] Y. Yuan, W. Su, and D. Ma (2020) Efficient dynamic scene deblurring using spatially variant deconvolution network with optical flow guided training. In CVPR, Cited by: §1, §2.
  • [49] H. Zhang, Y. Dai, H. Li, and P. Koniusz (2019) Deep stacked hierarchical multi-patch network for image deblurring. In CVPR, Cited by: §S1, §S1, §2, §4.
  • [50] J. Zhang, J. Pan, J. Ren, Y. Song, L. Bao, R. W.H. Lau, and M. Yang (2018) Dynamic scene deblurring using spatially variant recurrent neural networks. In CVPR, Cited by: §2.
  • [51] K. Zhang, W. Luo, Y. Zhong, L. Ma, B. Stenger, W. Liu, and H. Li (2020) Deblurring by realistic blurring. In CVPR, Cited by: §2.
  • [52] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang (2018) The unreasonable effectiveness of deep features as a perceptual metric. In CVPR, Cited by: §S1, §2, §4.