
Beyond Camera Motion Removing: How to Handle Outliers in Deblurring

Performing camera motion deblurring is an important low-level vision task for achieving better imaging quality. When a scene has outliers such as saturated pixels and salt-and-pepper noise, the image becomes more difficult to restore. In this paper, we propose an edge-aware scale-recurrent network (EASRN) to conduct camera motion deblurring. EASRN has a separate deblurring module that removes blur at multiple scales and an upsampling module that fuses different input scales. We propose a salient edge detection network to supervise the training process and solve the outlier problem by proposing a novel method of dataset generation. Light streaks are printed on the sharp image to simulate the cutoff effect from saturation. We evaluate our method on the standard deblurring datasets. Both objective evaluation indexes and subjective visualization show that our method results in better deblurring quality than the other state-of-the-art approaches.





1 Introduction

Figure 1: A case of camera motion deblurring with outlier removal. (a) Blurred image; (b) result from Hu et al. [9], who utilize “light streaks” to estimate the blur kernel; (c) result from SRN [26], which is trained on the GoPro dataset; (d) result from the proposed EASRN.

Image blur due to camera motion is a common type of image quality degradation. Camera motion deblurring benefits include the ability to extend the shutter duration during photography and image quality improvements, such as higher dynamic range, more accurate color reproduction, and lower noise levels.

When scene illumination conditions are inappropriate, blurred images are often accompanied by overexposure or heavy noise. Regardless of whether traditional methods or learning-based methods are used, outliers such as saturated pixels and salt-and-pepper noise play negative roles in image restoration. The outliers mislead the estimation of the degradation function [3] and cut off the real light intensity information, which leads to ringing artifacts during deblurring.

Hence, our research focuses on camera motion deblurring with outlier removal via a convolutional neural network (CNN) framework. At the framework level, we select a scale-recurrent structure [26] because it fits the “coarse-to-fine” strategy, which is effective in both learning-based and rule-based deblurring approaches. When the number of parameters is limited, the set of degradation functions that a CNN can restore is finite. Rescaling the blurred image enlarges the set of restorable degradation functions. The scale-recurrent network (SRN) can process large-scale blurred images via multiscale iteration. Moreover, the scale-recurrent structure effectively reduces the number of parameters [26]. Within each scale, we design a novel Encoder-Decoder ResBlock network to achieve deblurring and a cascaded ResBlock network to perform the conversions between scales.

According to Liu et al. [14], the perceived quality of a deblurred image depends primarily on three factors: residual blur, ringing artifacts and noise in the deblurred result. Among these three factors, ringing artifacts most strongly affect the human visual response. Ringing artifacts appear in the mid-frequency content of the deblurred image. In addition, according to Xu [29], not all edges of an image should be treated equally because edges that are smaller than the blur kernel are harmful to blur kernel estimation. Hence, we train a salient edge detection network to supervise the deblurring. A salient edge (SE) loss is proposed to suppress ringing artifacts in the deblurred results.

As mentioned above, outlier removal is important in camera motion deblurring. We address this issue at the dataset level. Previous learning-based methods employed two technical routes to synthesize blurred/sharp image pairs. One approach synthesizes blurred images by convolving sharp images with uniform or nonuniform blur kernels [8, 2]. However, this approach has difficulty reflecting the continuous spatial changes of blur in the image.

The other approach is to generate blurred images by averaging consecutive short-exposure frames from high frame-rate videos [16]. This approach is aimed at deblurring dynamic scenes. The image quality of the published dataset is unsatisfactory. The main problem with CNN methods based on dynamic scene datasets under camera motion deblurring is that they are likely to transform independent edges or gradient areas in the latent image into a single strong edge. Moreover, none of the existing datasets have considered the outlier issue. Hence, we propose a dataset synthesis approach that utilizes optical flow and light streaks to more realistically simulate camera motion blur. In real camera motion cases, we prefer the dynamic objects to remain blurred rather than risk creating artifacts. The experimental results demonstrate that our dataset is effective at deblurring blurred images with outliers.

The main contributions of our work are as follows.

  • Based on a more accurate imaging model of the blur process, we propose a dataset synthesis approach that considers the outlier issue in camera motion deblurring. With the contribution of our dataset, the deblurring neural network is able to restore blurred images with outliers.

  • We propose a novel scale-recurrent network architecture for camera motion deblurring that consists of two parts: a deblurring part and an upsampling part. Both parts have distinctive capabilities and can conveniently be integrated into multiscale frameworks.

  • We propose a salient edge loss to prevent deblurred results from ringing artifacts and design a salient edge detection network to supervise the deblurring process, which makes the deblurred results more suitable for human vision.

Based on the above contributions, our work succeeds in solving the camera motion blur issue with outliers. Through extensive empirical tests and evaluations, the proposed method outperforms the state-of-the-art methods with regard to both image restoration quality and robustness, as shown in Fig. 1.

2 Related Work

Figure 2: The proposed EASRN framework and architectures.

Image deblurring is divided into two categories based on whether the blur kernel is known, i.e., nonblind deblurring [3, 21, 20] and blind deblurring [22, 3, 27, 1, 4, 11, 17, 19, 29, 18, 10]. The latter is more realistic yet more ill-posed. Blind deblurring methods include both rule-based and learning-based approaches. The rapid development of deep learning techniques has improved the ability to perform image deblurring. Previous works [25, 32, 31, 6] replaced some steps of rule-based frameworks with CNNs. More recent work has focused on creating end-to-end networks for motion deblurring. Nah et al. [16] adopted the coarse-to-fine strategy of rule-based deblurring frameworks and proposed a multiscale CNN to enlarge the receptive field for large motion cases. Their GoPro dataset has been widely applied in dynamic scene deblurring studies. Tao et al. [26] extended the multiscale CNN to a scale-recurrent network (SRN), which not only reduces the number of parameters but also increases the robustness of the deblurring effect. Gao et al. [5] further developed the SRN by adding parameter-selective sharing and nested skip connections. Kupyn et al. [12] employed a generative adversarial network (GAN) for single image motion deblurring, named DeblurGAN, which provided another approach to deblurring. However, because they were trained on the GoPro dataset, all of the above methods [26, 5, 12, 16] share the same problem: they are likely to transform independent edges or gradient areas in the latent image into one strong edge.

Regarding outlier removal in deblurring, several rule-based deblurring approaches have made contributions. Cho et al. [3] analyzed the effect of outliers and proposed a probability model to extend the Bayesian framework. Pan et al. [18] analyzed how outliers mislead algorithms in blur kernel estimation and proposed a confidence function to remove the outliers during kernel estimation, which enabled the algorithm to address blurred images with outliers. Hu et al. [9] attempted to employ light streaks to estimate the blur kernel, and the proposed method performs impressively in various situations. However, when the scene has no obvious light streaks, this method is unsatisfactory. To the best of our knowledge, no existing learning-based approach is targeted toward deblurring images with outliers.

3 Proposed Method

In this section, we describe our model development. Fig. 2 provides an overview of the proposed network architecture, called the edge-aware scale-recurrent network (EASRN). The network has a recurrent architecture consisting of two parts: a deblurring subnet and an upsampling subnet. A blurred image is first decomposed into a multiscale Gaussian pyramid. We deblur from the minimum scale to the maximum scale. At each scale, the restoration proceeds as follows.

The deblurring subnet and the upsampling subnet each act as a function. The intermediate variable at a given scale represents the preprocessed blurred image, which is deblurred by the deblurring subnet to obtain the latent image at that scale. Then, this latent image is upsampled and concatenated with the next-scale blurred image, forming the input to the upsampling subnet, which produces the next-scale intermediate variable. The latent image at the full resolution is the final output.
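The coarse-to-fine recurrence described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `deblur_subnet` and `upsample_subnet` are hypothetical callables standing in for the two trained subnets, the pyramid uses 2x2 block averaging instead of a true Gaussian filter, upsampling is nearest-neighbour, and images are single-channel.

```python
import numpy as np


def downsample2(img):
    """Halve the resolution by 2x2 block averaging (stand-in for a Gaussian pyramid step)."""
    return 0.25 * (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2])


def upsample2(img):
    """Double the resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)


def build_pyramid(blurred, num_scales=3):
    """Return the pyramid ordered from the coarsest scale to full resolution."""
    blurred = np.asarray(blurred, dtype=float)
    factor = 2 ** (num_scales - 1)
    if blurred.shape[0] % factor or blurred.shape[1] % factor:
        raise ValueError("image sides must be divisible by 2**(num_scales - 1)")
    pyramid = [blurred]
    for _ in range(num_scales - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid[::-1]


def scale_recurrent_deblur(blurred, deblur_subnet, upsample_subnet, num_scales=3):
    """Run the deblur/upsample recurrence from the coarsest to the finest scale."""
    pyramid = build_pyramid(blurred, num_scales)
    net_input = pyramid[0]  # at the coarsest scale the input is the blurred image itself
    latent = None
    for i in range(num_scales):
        latent = deblur_subnet(net_input)
        if i + 1 < num_scales:
            # Upsample the latent image and concatenate it with the next-scale blurred image.
            stacked = np.stack([upsample2(latent), pyramid[i + 1]], axis=-1)
            net_input = upsample_subnet(stacked)
    return latent  # latent image at full resolution
```

With identity placeholders such as `deblur_subnet=lambda x: x` and `upsample_subnet=lambda s: s[..., 1]`, the function simply returns the input image, which makes the data flow easy to check before plugging in real networks.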

3.1 Deblurring subnet

The deblurring subnet is designed to deblur an input image. Our deblurring subnet employs the encoder-decoder network structure. The encoder outputs a feature map that represents the input. The decoder uses the feature maps from the encoder and tries to achieve the closest match to the intended output. The encoder uses max pooling to downsample the feature maps, while the decoder uses deconvolution upsampling. Skip connections are added between corresponding feature maps in the encoder-decoder to combine different information levels.

Here, we introduce residual-in-residual blocks (Res-in-Res Blocks) in both the encoder and decoder parts. Each Res-in-Res Block contains two ResBlocks [7] in a residual unit with a short skip connection. At the same time, an inception module is adopted between the encoder and decoder to capture multiscale information. The four filter sizes in the inception module are set to 1, 3, 5 and 7, and the channel numbers of the encoder-decoder pairs are 32, 64, 128 and 256. Two additional convolution layers are employed at the front and back ends to extract features from the input and combine them into the output. A residual connection from the input blurred image to the output deblurred result is added to improve the residual learning. Except for the inception module, all kernel sizes are set to 3. A Leaky Rectified Linear Unit (LReLU) is employed as the activation function for the entire subnet. Compared to Tao et al. [26], who used large convolution kernel sizes and cascaded ResBlocks to form encoder/decoder blocks, our structure has fewer parameters and better performance.

3.2 Upsampling subnet

The upsampling subnet is designed to fuse the lower-scale output of the deblurring subnet with the next-highest scale of the blurred image. Ideally, the low-frequency information is taken from the lower-scale output and the high-frequency information from the upper-scale blurred image. Hence, we separate the upsampling task from deblurring and design an extraction-and-upsampling subnet to improve the multiscale fusion effect.

The upsampling subnet consists of an upsampling and concatenation layer and 3 Res-in-Res Blocks. The Res-in-Res Blocks have 32 channels. The number of channels in the output is the same as that of the input. As in the deblurring subnet, we adopt the same filter configuration and LReLU activation. Using this subnet, the intermediate deblurring result restores more high-frequency textures.

3.3 Training losses

As mentioned above, the image quality of the deblurred result reflects three main aspects: noise level, ringing artifacts and residual blur. The outliers in blurred images increase the difficulty of solving the above three problems. Hence, we design a loss for each aspect.

For noise reduction and data fidelity, we use the Manhattan (L1) distance at each scale. The ground truth is decomposed into a Gaussian pyramid with the same number of scales as the blurred image, and the fidelity loss accumulates the L1 distance between the latent image and the ground truth at every scale.

Compared with the Euclidean loss, the L1 norm has better noise tolerance and performs better for ringing artifact removal.
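A minimal sketch of such a multiscale Manhattan-distance fidelity term is given below. The function name `multiscale_l1_loss` and the per-scale mean normalization are assumptions made for illustration; the exact normalization used for EASRN is not recoverable from the text.

```python
import numpy as np


def multiscale_l1_loss(latents, ground_truths):
    """Sum over scales of the mean absolute error between latent image and ground truth."""
    if len(latents) != len(ground_truths):
        raise ValueError("both pyramids must have the same number of scales")
    total = 0.0
    for latent, target in zip(latents, ground_truths):
        diff = np.asarray(latent, dtype=float) - np.asarray(target, dtype=float)
        total += float(np.mean(np.abs(diff)))  # L1 (Manhattan) distance, averaged per pixel
    return total
```

The noise tolerance mentioned above follows from the fact that a single outlier pixel with error 10 in a 100-pixel image adds 0.1 to the mean absolute error but 1.0 to the mean squared error.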

To remove ringing artifacts, we employ the Manhattan distance between the salient edge maps of the deblurred result and ground truth to represent the ringing artifact loss. A salient edge-detection network (SEDnet) is designed to obtain the salient edge map. An example salient edge-detection case is shown in Fig. 6 and discussed in detail in Section 4.2.

We employ a structure similar to that of the deblurring subnet for SEDnet; the only difference is that we remove the residual connection from the input to the output. This similar structure makes the receptive fields of the two networks match. Only the full-resolution deblurred result is required to have a salient edge map that is as close as possible to that of the ground truth, because it is uncertain whether the ringing artifacts of intermediate latent images can still be corrected at later scales.

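Thus, the ringing artifact loss is the weighted Manhattan distance between the SEDnet output for the full-resolution deblurred result and the SEDnet output for the ground truth. Here, SEDnet is treated as a function, and a weight coefficient scales the loss.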
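The structure of this loss can be illustrated with a small sketch. SEDnet itself is a trained CNN, so the code below substitutes a plain finite-difference gradient magnitude as a hypothetical stand-in edge detector; `gradient_magnitude`, `salient_edge_loss` and the default `weight=1.0` are illustrative names and values, not the authors' settings.

```python
import numpy as np


def gradient_magnitude(img):
    """Stand-in edge detector: finite-difference gradient magnitude of a 2-D image."""
    grad_rows, grad_cols = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(grad_rows, grad_cols)


def salient_edge_loss(deblurred, ground_truth, edge_fn=gradient_magnitude, weight=1.0):
    """Weighted mean absolute difference between the edge maps of the two images."""
    edge_diff = edge_fn(deblurred) - edge_fn(ground_truth)
    return weight * float(np.mean(np.abs(edge_diff)))
```

An overshoot next to a step edge, which is how ringing typically shows up, changes the edge map and is therefore penalized, while a result identical to the ground truth gives a loss of zero.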

The residual blur of the deblurred result reflects the loss of detail. We employ a perceptual loss at each scale to enhance the detail in the deblurred results. We select two feature maps from VGG-19 for the comparison. However, the deblurred result shows grid artifacts if only the perceptual loss is used. Hence, we also employ total variation regularization to remove the grid effect. The detail enhancement loss therefore combines the perceptual term, computed on each of the selected VGG-19 feature maps, with the total variation term, each weighted by its own coefficient.
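For readers unfamiliar with total variation regularization, the following sketch shows one common (anisotropic) form: the sum of absolute differences between neighbouring pixels. Whether EASRN uses this exact variant is an assumption; the function name `total_variation` is illustrative.

```python
import numpy as np


def total_variation(img):
    """Anisotropic total variation: sum of absolute vertical and horizontal differences."""
    img = np.asarray(img, dtype=float)
    vertical = np.abs(img[1:, :] - img[:-1, :]).sum()
    horizontal = np.abs(img[:, 1:] - img[:, :-1]).sum()
    return float(vertical + horizontal)
```

A flat image has zero total variation, whereas a pixel-level checkerboard, which resembles the grid effect mentioned above, has a large value, so adding this term to the loss discourages such patterns.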

We restrict the output from three aspects: noise level, ringing artifact removal and detail enhancement. The total loss combines the fidelity loss, the ringing artifact loss and the detail enhancement loss.
3.4 Dataset

Figure 3: Flowchart for synthesizing blurred/sharp image pairs.

Acquiring blurred/sharp image pairs directly from hardware devices is difficult. A typical approach for obtaining synthetic image pairs is to generate blurred images by averaging consecutive short-exposure frames from high frame-rate videos. However, datasets produced using this approach are not appropriate for our task because the scenes are dynamic; in contrast, we aim to restore images that suffer from camera motion blur. Another widely adopted method is to create blurred images by convolving the sharp image with either uniform or nonuniform blur kernels. The fidelity of the blur kernels determines the value of the synthetic dataset. Boracchi and Foi [2] analyzed the coordinate function of each pixel in image space under real-world camera-motion conditions. We follow this idea. First, we assume that the camera motion is a Markovian process. Next, we calculate the coordinates of each pixel to obtain the optical flow map at each time step. Then, we warp the sharp image using the optical flow map and subpixel interpolation to obtain the intermediate image at that time step. Finally, we accumulate and average the intermediate image series to obtain the blurred image and add Gaussian noise after the blurring process is complete. Compared with previous approaches, our synthesis method is closer to real camera motion situations.
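The warp-accumulate-average procedure can be sketched as follows. This is a heavily simplified illustration, not the authors' pipeline: the camera motion is reduced to a global translation following a Gaussian random walk (a simple Markov process), so the optical flow is the same for every pixel; borders wrap around; and `num_steps`, `max_shift` and `noise_std` are hypothetical parameters for a single-channel image.

```python
import numpy as np


def random_walk_trajectory(num_steps, max_shift, rng):
    """Markov camera trajectory: each position depends only on the previous one."""
    steps = rng.normal(0.0, 1.0, size=(num_steps, 2))
    traj = np.cumsum(steps, axis=0)
    traj = traj - traj[0]  # start at zero displacement
    peak = np.abs(traj).max()
    if peak > max_shift:
        traj = traj * (max_shift / peak)  # bound the maximum shift
    return traj


def shift_bilinear(img, dy, dx):
    """Translate a 2-D image by (dy, dx) with bilinear (subpixel) interpolation."""
    iy, ix = int(np.floor(dy)), int(np.floor(dx))
    fy, fx = dy - iy, dx - ix

    def rolled(sy, sx):
        return np.roll(img, (sy, sx), axis=(0, 1))  # wrap-around borders

    return ((1 - fy) * (1 - fx) * rolled(iy, ix) + (1 - fy) * fx * rolled(iy, ix + 1)
            + fy * (1 - fx) * rolled(iy + 1, ix) + fy * fx * rolled(iy + 1, ix + 1))


def synthesize_blur(sharp, num_steps=16, max_shift=30.0, noise_std=0.02, seed=0):
    """Average warped copies of the sharp image, then add Gaussian noise."""
    rng = np.random.default_rng(seed)
    sharp = np.asarray(sharp, dtype=float)
    traj = random_walk_trajectory(num_steps, max_shift, rng)
    frames = [shift_bilinear(sharp, dy, dx) for dy, dx in traj]
    blurred = np.mean(frames, axis=0)
    blurred = blurred + rng.normal(0.0, noise_std, size=blurred.shape)
    return blurred, traj
```

Applying `synthesize_blur` to an image containing a single bright pixel returns the blur kernel traced by the trajectory, which is a convenient way to inspect the simulated motion.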

To simulate the degradation caused by outliers in blurred images, we apply an extra ‘print light streaks’ processing step to part of the dataset. Hu et al. [9] indicate that distinguishing the light streaks in blurred images from the other salient edges is advantageous for kernel estimation. We expect the CNN to extract and utilize the blur information in the light streaks, or at least to distinguish them from normally exposed texture. First, we generate light streaks using the random trajectory generation approach proposed by Boracchi and Foi [2]. The intensity of the light streaks is 1 to 10 times the maximum of the dynamic range of the sharp image, which simulates the cutoff effect of the limited dynamic range. The intensities of the three channels are treated independently to simulate colorful light sources. Next, we randomly select a certain percentage of the sharp images before blur processing and insert random numbers of the generated light streaks into them. Then, the sharp images with light streaks are blurred by the proposed blur synthesis method. Finally, the values of both the sharp and blurred images are clipped to match the original dynamic range. Because blur processing leads to a center offset of the PSF (point spread function), the sharp image is registered and warped correspondingly to obtain the ground truth. A flowchart of this process is shown in Fig. 3, and Fig. 4 shows an example blurred/sharp pair with the extra ‘print light streaks’ processing.
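The reason for inserting over-range intensities before blurring, and clipping only afterwards, can be seen in a one-dimensional toy example. The sketch below is an illustration under assumptions: the dynamic range is taken as [0, 1], the blur is a hypothetical uniform box kernel, and `simulate_saturated_streak` is an illustrative name.

```python
import numpy as np


def simulate_saturated_streak(length, position, intensity, kernel_len):
    """Compare blur-then-clip (physically plausible) with clip-then-blur on a 1-D scene."""
    scene = np.zeros(length)
    scene[position] = intensity  # light source brighter than the dynamic range [0, 1]
    kernel = np.full(kernel_len, 1.0 / kernel_len)  # uniform motion blur (assumption)
    blur_then_clip = np.clip(np.convolve(scene, kernel, mode="same"), 0.0, 1.0)
    clip_then_blur = np.convolve(np.clip(scene, 0.0, 1.0), kernel, mode="same")
    return blur_then_clip, clip_then_blur


streak, dim_streak = simulate_saturated_streak(21, 10, 8.0, 4)
```

Here a source 8 times brighter than the range, blurred over 4 pixels, yields a fully saturated 4-pixel streak, whereas blurring an already clipped source yields a faint streak with peak value 0.25. Only the first case reproduces the cutoff effect seen in real saturated photographs.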

Figure 4: An example of a synthesized blurred/sharp image pair.

3.5 Implementation details

We implemented all our models using the TensorFlow platform. Training was performed on two NVIDIA TITAN Xp GPUs. Our source code will be released publicly.

Data Preparation. For EASRN, we excluded blurred images and selected 3,155 high-resolution images from the Flickr2K and DIV2K datasets as the sharp images. After random cropping, flipping and gamma correction operations, we obtained 50,480 sharp image crops of fixed size. On one-third of the sharp images, we added 2 to 20 light streaks randomly. The maximum relative shift in the optical flow is constrained to 30 pixels by setting an appropriate range for the camera motion. The standard deviation of the Gaussian noise is set to no more than 0.02. After obtaining the blurred/sharp image pairs, we divided the dataset into 46,000 pairs for training and 4,480 pairs for evaluation.

For SEDnet, we employed the augmented BSDS data from Xie and Tu [28] as our dataset. The BSDS 500 dataset provides an empirical basis for research on image segmentation and boundary detection, and it includes 500 hand-labeled image-mask pairs. After scaling, rotation and flipping operations, an augmented BSDS 500 dataset includes 28,801 pairs.

Model Parameters. The blurred image and ground truth are decomposed into 3 scales during training. The deblurring process moves from the smallest to the largest scale. At the minimum scale, the input to the deblurring subnet is the blurred image at that scale. The remaining hyperparameters were fixed throughout training.

Training Details. For EASRN, we employed the Adam optimizer. The learning rate decayed exponentially every 40 epochs from its initial value at a power of 0.1. In each iteration, we sampled a batch of 8 blurred/sharp image pairs and randomly cropped them to a fixed size for both the training input and the ground truth. SEDnet was trained separately with its own loss. The pretrained VGG-19 and SEDnet were then loaded directly as fixed networks. The complete training process requires approximately 48 hours for 80 epochs.
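One plausible reading of this schedule is a staircase decay in which the learning rate is multiplied by 0.1 every 40 epochs. The sketch below implements that reading only; the interpretation of "power of 0.1" as a multiplicative decay rate, the `staircase` flag and the function name are assumptions, and `initial_lr` is left as a parameter because the initial value is not given here.

```python
def learning_rate(epoch, initial_lr, decay_rate=0.1, decay_epochs=40, staircase=True):
    """Exponentially decayed learning rate at a given epoch."""
    if staircase:
        exponent = epoch // decay_epochs  # piecewise-constant schedule
    else:
        exponent = epoch / decay_epochs  # smooth exponential schedule
    return initial_lr * decay_rate ** exponent
```

Under this reading, an 80-epoch run uses the initial rate for epochs 0 to 39 and one tenth of it for epochs 40 to 79.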

4 Experimental Results

In this section, we analyze the effectiveness of the proposed contributions by controlled experiments.

4.1 Effectiveness of the upsampling subnet

Figure 5: Details of the intermediate results. (a) The deblurring output from the upper scale after the upsampling operation; (b) the blurred input in the current scale; (c) the output of the upsampling subnet in the current scale; (d) the deblurring output in the current scale; (e) the deblurring result from the model without an upsampling subnet; (f) detail enlargements of (a), (d) and (e).

To demonstrate the effectiveness of the separated upsampling subnet, we ablated the upsampling subnet of EASRN and trained the modified network. In this model, the latent output is upsampled and concatenated directly with the next-scale blurred image to form the input for the next scale. Fig. 5 shows the details of the intermediate results from the proposed method and the model without the upsampling subnet. The output of the upsampling subnet fuses the low-frequency information of the upsampled upper-scale deblurring output (Fig. 5(a)) with the high-frequency information of the current-scale blurred input (Fig. 5(b)). The deblurred result in Fig. 5(d) has more texture and higher resolution compared with the image in Fig. 5(a). However, the network without the upsampling subnet generates a deblurred result that differs only slightly from the upsampled upper-scale image. Moreover, with the preprocessing from the upsampling subnet, the deblurring subnet operates independently at every scale.

4.2 Effectiveness of SED loss

Figure 6: A comparison between models with and without SED loss.

To demonstrate the effectiveness of SED loss, we trained the proposed network without the SED loss. The obtained model is named EASRN w.o. SED. The quantitative results of this model on Kohler’s dataset are listed in Table 2. The PSNR of the network with SED loss is higher than that of the model without SED loss (27.45 vs. 26.64), while the MSSIM values are nearly identical (0.8468 vs. 0.8471). To indicate the role of SED loss more intuitively, the image deblurring case shown in Fig. 6 is employed to illustrate the effect. Distortions of the spatial intensity distribution around edges result in ringing artifacts. The SED loss applies additional constraints to the pixels whose corresponding regions in the ground truth are salient edges. The SED loss requires the salient edges to appear in the deblurred results with the correct position and contrast. Hence, the deblurred results from EASRN show fewer ringing artifacts than those of the model without SED loss.

4.3 Characteristic of the proposed dataset

Our dataset is designed to simulate camera motion blur in photography. Blur in dynamic scenes is not our restoration target; instead, the intent of the proposed dataset is that blur caused by a limited depth of field or by scene dynamics should remain unchanged after deblurring. With this practical use in mind, we cap the size of the blur kernel relative to the image size.

To demonstrate the characteristics of our dataset, we employ the proposed dataset to retrain the SRN model [26], whose network architecture is similar to ours, and set the training parameters the same as for the original SRN reported in [26]. After 80 epochs, the losses converge. We name this retrained model SRN*. The quantitative results on Lai’s dataset are listed in Table 3. All the SRN* evaluation indices are higher than those of the original SRN. This result indicates that the proposed synthesis method is closer to real camera motion blur. The effect of outlier removal is discussed in the Lai’s dataset part of Section 4.4.

Next, we analyze the side effect of the additional ‘print light streaks’ processing. In addition to the dataset that contains 1/3 overexposed (OE) images, we construct a control group in which none of the sharp images include added light streaks. We trained the model with the control group dataset and obtained a version named EASRN w.o. OE. The quantitative comparison results on Kohler’s dataset are listed in Table 2. The PSNR and MSSIM of EASRN are lower than those of EASRN w.o. OE because Kohler’s dataset does not include blurred images with outliers. This result indicates that the model gains the ability to handle outliers at the cost of a slight reduction in restoration quality on outlier-free images.

4.4 Comprehensive comparison

We also performed a comprehensive comparison between the proposed EASRN and both rule-based and learning-based state-of-the-art deblurring approaches on the GoPro dataset, Kohler’s dataset, and Lai’s dataset. SRN [26] and DeblurGAN [12] are two representative deblurring networks. Pan et al. [17] and Xu et al. [30] achieved the highest performances among the traditional deblurring methods according to Lai et al. [13]. Hence, the above four blind deblurring methods are our main comparisons. For SRN [26] and DeblurGAN [12], we employed the pretrained models downloaded from the authors’ webpages. For Pan et al. [17] and Xu et al. [30], we employed the deblurring results reported by Lai et al. [13].

GoPro dataset. The GoPro dataset represents dynamic scene blurring, which is not exactly the same as camera motion blurring. The intent of EASRN is that blur caused by limited depth of field and dynamic scenes should remain unchanged after deblurring. We randomly generated 220 synthetically blurred images from 11 GoPro videos as our test images. Table 1 summarizes the performance and efficiency of SRN [26], DeblurGAN [12] and the proposed method. The deblurring performance of EASRN lies between that of SRN [26] and DeblurGAN [12] according to the ranked PSNR and SSIM results. Considering that SRN [26] and DeblurGAN [12] were trained on the GoPro dataset but our method was not, these deblurring results are acceptable. Moreover, the computational time of our method per test image is only 0.22 s, which is the fastest among these methods.

Measure    SRN       DeblurGAN    EASRN
PSNR       28.73     27.15        27.69
SSIM       0.9517    0.8755       0.9354
FLOPs      1434 G    678 G        984 G
Time       0.33 s    0.62 s       0.22 s
Table 1: Performance and efficiency comparison on the GoPro test dataset.
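For reference, PSNR follows the standard definition 10·log10(peak² / MSE). The sketch below assumes images scaled to [0, 1] (hence `peak=1.0`); the function name `psnr` is illustrative, and the evaluation scripts used for Tables 1 and 2 may differ in details such as color handling or border cropping.

```python
import math

import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = float(np.mean((reference - estimate) ** 2))
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)
```

A uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and therefore a PSNR of 20 dB, so the roughly 1 dB gap between SRN and EASRN in Table 1 corresponds to an MSE ratio of about 1.27.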

Kohler’s dataset. Kohler’s dataset contains images that represent camera motion blurring. We employed the evaluation code from Kohler’s webpage to calculate the PSNR and MSSIM. Table 2 summarizes the PSNR and MSSIM results on Kohler’s dataset. The deblurring performance of EASRN is better than that of SRN [26] and DeblurGAN [12] on this dataset. Our training dataset uses a generation approach similar to that of Kohler’s dataset, whereas the other two models are trained on the GoPro dataset. Because Kohler’s dataset does not include saturated blurred images, the EASRN trained without the OE dataset performs better on this dataset than does the one trained with the OE dataset.

Measure    SRN       DeblurGAN    EASRN w.o. SED    EASRN w.o. OE    EASRN (full)
PSNR       26.75     24.64        26.64             27.78            27.45
MSSIM      0.8370    0.7880       0.8471            0.8547           0.8468
Table 2: PSNR and MSSIM comparisons on Kohler’s dataset.
Figure 7: The saturated deblurring cases on the synthetic part of Lai’s dataset.
Figure 8: The typical deblurring cases on the real-world part of Lai’s dataset.

Lai’s dataset. Lai’s dataset emphasizes real-world blurring. In the synthetic portion, 25 sharp images are degraded by 4 kernels/gyro. series to obtain 100 uniform/nonuniform blurred images. Moreover, 20% of the blurred images are saturated. Hence, we additionally tested SRN* to validate the effect of our dataset. For the tests with the synthetic portion of Lai’s dataset, we selected VIF [24] and IFC [23] as the evaluation indexes. According to Lai et al. [13], VIF emphasizes image edges, while IFC focuses on high-quality details. VIF and IFC are more highly correlated with subjective visualization results than PSNR and SSIM. Table 3 summarizes the performance comparisons on the synthetic part and the real part of Lai’s dataset. EASRN achieves the best score in every column of Table 3. The proposed dataset is beneficial for deblurring saturated blurred images. Fig. 7 shows the saturated deblurring cases on the synthetic part of Lai’s dataset. The upper image is a nonuniform blurred image, and the bottom image is a uniform blurred image. Comparing Fig. 7(c) and Fig. 7(d), SRN* gains the capability to restore the saturated images, while the SRN trained with the GoPro dataset cannot distinguish the real blur kernel under the interference of outliers. Comparing Fig. 7(f) with the other deblurring results, the EASRN results have higher detail contrast and fewer ringing artifacts. In the nonuniform blurred case, we examine the lower-left and upper-right patches of the image more closely. These two patches have different blur kernels. Both patches are well deblurred by EASRN, which indicates its capability for nonuniform deblurring.

Method        Uniform VIF    Uniform IFC    Nonuniform VIF    Nonuniform IFC    Real (quality index [14])
Pan et al.    0.0842         0.626          0.3458            2.5975            -10.78
Xu et al.     0.0586         0.3659         0.2527            1.7812            -10.58
DeblurGAN     0.1245         0.9430         0.3539            2.6563            -10.71
SRN           0.1072         0.8090         0.3373            2.4884            -11.79
SRN*          0.1196         0.9005         0.3604            2.7219            -11.44
EASRN         0.1409         0.9775         0.3851            2.9063            -9.97
Table 3: Performance comparisons on Lai’s dataset.

The real-world part of Lai’s dataset consists of 100 real-world blurred images, of which 27 are saturated images. These images cover various types of scenes and have different qualities and resolutions. We select a half-subjective no-reference evaluation method proposed by Liu et al. [14] to validate the performance of EASRN. Using this method, a higher image quality index indicates a better deblurring effect. As shown in Table 3, EASRN performs the best among the tested methods. On the entire dataset, EASRN obtains the top-1 quality index on 40% of the images and ranks in the top 3 on 84% of them, which demonstrates the robustness of EASRN. Fig. 8 shows some typical deblurring cases on the real-world part of Lai’s dataset. Compared with the other methods, the deblurring results of EASRN have sharper salient edges and include more high-frequency details, fewer ringing artifacts, and less noise.

5 Conclusion

In this paper, we proposed an edge-aware scale-recurrent neural network, called EASRN, to perform camera motion deblurring with outliers. Benefitting from separate subnets and a scale-recurrent approach, EASRN can handle more complicated blurring situations. We addressed the saturation problem in deblurring by devising a novel dataset generation method, and proposed a salient edge detection neural network to assist in constructing a loss function that suppresses ringing artifacts. When equipped with the three contributions mentioned above, EASRN obtains better image quality in the deblurred results than do other state-of-the-art methods.


  • [1] L. Bar, N. Kiryati, and N. Sochen (2006) Image deblurring in the presence of impulsive noise.

    International Journal of Computer Vision

    70 (3), pp. 279–298.
    Cited by: §2.
  • [2] G. Boracchi and A. Foi (2012) Modeling the performance of image restoration from motion blur. IEEE Transactions on Image Processing 21 (8), pp. 3502–3517. Cited by: §1, §3.4, §3.4.
  • [3] S. Cho, J. Wang, and S. Lee (2011) Handling outliers in non-blind image deconvolution. In 2011 International Conference on Computer Vision, pp. 495–502. Cited by: §1, §2, §2.
  • [4] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman (2006) Removing camera shake from a single photograph. In ACM transactions on graphics (TOG), Vol. 25, pp. 787–794. Cited by: §2.
  • [5] H. Gao, X. Tao, X. Shen, and J. Jia (2019) Dynamic scene deblurring with parameter selective sharing and nested skip connections. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3848–3856. Cited by: §2.
  • [6] D. Gong, J. Yang, L. Liu, Y. Zhang, I. Reid, C. Shen, A. Van Den Hengel, and Q. Shi (2017) From motion blur to motion flow: a deep learning solution for removing heterogeneous motion blur. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2319–2328. Cited by: §2.
  • [7] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.1.
  • [8] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf (2011) Fast removal of non-uniform camera shake. In 2011 International Conference on Computer Vision, pp. 463–470. Cited by: §1.
  • [9] Z. Hu, S. Cho, J. Wang, and M. Yang (2014) Deblurring low-light images with light streaks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3382–3389. Cited by: Figure 1, §2, §3.4.
  • [10] Z. Hu, L. Yuan, S. Lin, and M. Yang (2016) Image deblurring using smartphone inertial sensors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1855–1864. Cited by: §2.
  • [11] D. Krishnan and R. Fergus (2009) Fast image deconvolution using hyper-laplacian priors. In Advances in neural information processing systems, pp. 1033–1041. Cited by: §2.
  • [12] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin, and J. Matas (2018) Deblurgan: blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8183–8192. Cited by: §2, §4.4, §4.4, §4.4.
  • [13] W. Lai, J. Huang, Z. Hu, N. Ahuja, and M. Yang (2016) A comparative study for single image blind deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1709. Cited by: §4.4, §4.4.
  • [14] Y. Liu, J. Wang, S. Cho, A. Finkelstein, and S. Rusinkiewicz (2013) A no-reference metric for evaluating the quality of motion deblurring. ACM Trans. Graph. 32 (6), pp. 175–1. Cited by: §1, §4.4.
  • [15] A. L. Maas, A. Y. Hannun, and A. Y. Ng (2013) Rectifier nonlinearities improve neural network acoustic models. In Proc. icml, Vol. 30, pp. 3. Cited by: §3.1.
  • [16] S. Nah, T. Hyun Kim, and K. Mu Lee (2017) Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3883–3891. Cited by: §1, §2.
  • [17] J. Pan, Z. Hu, Z. Su, and M. Yang (2014) Deblurring text images via l0-regularized intensity and gradient prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2901–2908. Cited by: §2, §4.4.
  • [18] J. Pan, Z. Lin, Z. Su, and M. Yang (2016) Robust kernel estimation with outliers handling for image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2800–2808. Cited by: §2, §2.
  • [19] J. Pan, D. Sun, H. Pfister, and M. Yang (2016) Blind image deblurring using dark channel prior. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1628–1636. Cited by: §2.
  • [20] W. H. Richardson (1972) Bayesian-based iterative method of image restoration. JoSA 62 (1), pp. 55–59. Cited by: §2.
  • [21] L. I. Rudin, S. Osher, and E. Fatemi (1992) Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena 60 (1-4), pp. 259–268. Cited by: §2.
  • [22] Q. Shan, J. Jia, and A. Agarwala (2008) High-quality motion deblurring from a single image. Acm transactions on graphics (tog) 27 (3), pp. 73. Cited by: §2.
  • [23] H. R. Sheikh, A. C. Bovik, and G. De Veciana (2005) An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Transactions on image processing 14 (12), pp. 2117–2128. Cited by: §4.4.
  • [24] H. R. Sheikh and A. C. Bovik (2006) Image information and visual quality. IEEE Transactions on image processing 15 (2), pp. 430–444. Cited by: §4.4.
  • [25] J. Sun, W. Cao, Z. Xu, and J. Ponce (2015) Learning a convolutional neural network for non-uniform motion blur removal. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 769–777. Cited by: §2.
  • [26] X. Tao, H. Gao, X. Shen, J. Wang, and J. Jia (2018) Scale-recurrent network for deep image deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8174–8182. Cited by: Figure 1, §1, §2, §3.1, §4.3, §4.4, §4.4, §4.4.
  • [27] O. Whyte, J. Sivic, and A. Zisserman (2014) Deblurring shaken and partially saturated images. International journal of computer vision 110 (2), pp. 185–201. Cited by: §2.
  • [28] S. Xie and Z. Tu (2015) Holistically-nested edge detection. In Proceedings of the IEEE international conference on computer vision, pp. 1395–1403. Cited by: §3.5.
  • [29] L. Xu and J. Jia (2010) Two-phase kernel estimation for robust motion deblurring. In European conference on computer vision, pp. 157–170. Cited by: §1, §2.
  • [30] L. Xu, S. Zheng, and J. Jia (2013) Unnatural l0 sparse representation for natural image deblurring. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1107–1114. Cited by: §4.4.
  • [31] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang (2017) Beyond a gaussian denoiser: residual learning of deep cnn for image denoising. IEEE Transactions on Image Processing 26 (7), pp. 3142–3155. Cited by: §2.
  • [32] K. Zhang, W. Zuo, S. Gu, and L. Zhang (2017) Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3929–3938. Cited by: §2.