Learning to Deblur Images with Exemplars

05/15/2018 ∙ by Jinshan Pan, et al. ∙ University of California, Merced

Human faces are one interesting object class with numerous applications. While significant progress has been made in the generic deblurring problem, existing methods are less effective for blurry face images. The success of the state-of-the-art image deblurring algorithms stems mainly from implicit or explicit restoration of salient edges for kernel estimation. However, existing methods are less effective as only few edges can be restored from blurry face images for kernel estimation. In this paper, we address the problem of deblurring face images by exploiting facial structures. We propose a deblurring algorithm based on an exemplar dataset without using coarse-to-fine strategies or heuristic edge selections. In addition, we develop a convolutional neural network to restore sharp edges from blurry images for deblurring. Extensive experiments against the state-of-the-art methods demonstrate the effectiveness of the proposed algorithms for deblurring face images. In addition, we show the proposed algorithms can be applied to image deblurring for other object classes.


1 Introduction

The goal of image deblurring is to recover the sharp contents and corresponding blur kernel from one blurry input. The image formation is usually formulated as

$$B = I \ast k + \varepsilon, \tag{1}$$

where $B$ is the blurred input image, $I$ is the latent sharp image, $k$ is the blur kernel, $\ast$ is the convolution operator, and $\varepsilon$ is the noise term. The single image deblurring problem has attracted much attention with significant advances in recent years [1, 2, 3, 4, 5, 6, 7, 8, 9]. As image deblurring is an ill-posed problem, additional information is required to constrain the solutions. One common approach is to exploit statistical priors of natural images such as heavy-tailed gradient distributions [1, 2, 3, 10], the normalized sparsity ($\ell_1/\ell_2$) prior [7], and sparsity constraints [11]. While these priors have been shown to be effective for deblurring in general, they are not designed to capture image properties for specific object classes. Recently, numerous methods that exploit specific properties have been developed for text and low-light images [12, 13, 14, 15]. As human faces are one of the most interesting objects that find numerous applications, we mainly focus on face image deblurring in this work.
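The formation model in (1) can be simulated in a few lines. The following is a minimal sketch rather than the authors' code: it assumes circular (periodic) boundary conditions so that the convolution can be computed with the FFT, and the function name `blur_image`, the 5x5 motion kernel, and the noise level are hypothetical choices made for illustration.

```python
import numpy as np

def blur_image(sharp, kernel, noise_sigma=0.0, seed=0):
    """Simulate B = I * k + eps with circular convolution via the FFT."""
    h, w = sharp.shape
    kh, kw = kernel.shape
    # Zero-pad the kernel to the image size and shift its centre to (0, 0)
    # so that the blurred image is not translated.
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(sharp) * np.fft.fft2(padded)))
    # Additive Gaussian noise term (an assumption; the text only says "noise").
    rng = np.random.default_rng(seed)
    return blurred + noise_sigma * rng.standard_normal((h, w))

# Hypothetical 5x5 horizontal motion-blur kernel that sums to one.
kernel = np.zeros((5, 5))
kernel[2, :] = 1.0 / 5.0
sharp = np.zeros((32, 32))
sharp[:, 16:] = 1.0  # a vertical step edge
blurred = blur_image(sharp, kernel, noise_sigma=0.01)
```

Blind deblurring has to invert this process without knowing `kernel`, which is why additional priors or predicted edges are needed.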

Fig. 1: A challenging example. (a) Blurred face image. (b)-(d) Results of Cho and Lee [4], Krishnan et al. [7], and Xu et al. [9]. (e)-(f) Intermediate results of Krishnan et al. [7] and Xu et al. [9]. (g) Restored salient edges by our exemplar-based method visualized by Poisson reconstruction. (h) Deblurred image by our method (with the support size of pixels).

The success of the state-of-the-art image deblurring methods hinges on implicit or explicit restoration of salient edges for kernel estimation [4, 5, 6, 9]. Existing algorithms predict sharp edges mainly based on local image gradients, without considering the structural information of an object class. Because image deblurring is ill-posed, ambiguity inevitably arises in restoring salient edges when only local appearance is considered. Furthermore, for blurred images without much texture, the edge prediction schemes require parameter tuning and do not usually perform well. For example, face images have similar components and skin complexion with less texture than natural images, and existing deblurring methods do not perform well on such inputs. Fig. 1(a) shows a blurry face image which contains scarce texture as a result of large motion blur. For such images, it is difficult to restore a sufficient number of sharp edges for kernel estimation using the state-of-the-art methods. Fig. 1(b) and (c) show that the state-of-the-art methods based on explicit edge prediction [4] and the sparsity prior [7] do not deblur this image well.

In this work, we first propose an exemplar-based method to address the above-mentioned issues for deblurring face images. To exploit the structural information from one specific class, we collect an exemplar dataset and restore important visual information for kernel estimation. For each test image, we use the exemplar with the most similar facial structure to restore salient edges and guide the kernel estimation process. Fig. 1(g) shows that the proposed method is able to restore important facial structures for kernel estimation and to deblur this image (Fig. 1(h)).

Predicting salient edges based on exemplars entails an effective similarity metric and a search in a large exemplar dataset, which is computationally expensive. We therefore further develop a deep convolutional neural network (CNN) to restore salient edges from the blurred input. The proposed CNN-based algorithm performs favorably against the exemplar-based method and can be carried out in real-time. In addition, we show that the proposed algorithm can be directly applied to deblur images of other object classes.

2 Related Work

Image deblurring has been studied extensively in computer vision and machine learning. In this section we discuss the most relevant algorithms and put this work in proper context.

Statistical Priors. Since blind image deblurring is an ill-posed problem, it requires certain assumptions or prior knowledge to constrain the solution space. Early approaches, e.g., [16], assume simple parametric blur kernels to deblur images, which cannot deal with complex motion blur. As image gradients of natural images can be modeled well by a heavy-tailed distribution, Fergus et al. [1] use a mixture of Gaussians to learn the statistical prior for deblurring. Similarly, Shan et al. [3] use a parametric model to approximate the heavy-tailed prior for natural images. In [11], Cai et al. assume that the latent images and kernels can be sparsely represented by an over-complete dictionary based on wavelets. On the other hand, it has been shown that the most favorable solution for a maximum a posteriori (MAP) deblurring method with sparsity prior is usually a blurred image rather than a sharp one [10]. As [10] is usually computationally expensive, an efficient algorithm for approximation of marginal likelihood is developed [2] for image deblurring.

Image Priors in Favor of Clear Images. Different image priors that favor clear images instead of blurred images have been introduced for image deblurring. Krishnan et al. [7] present a normalized sparsity prior, and Xu et al. [9] use the $L_0$ constraint on image gradients for kernel estimation. Non-parametric patch priors that model edges and corners have also been proposed [17] for blur kernel estimation. We note that although the use of sparse priors facilitates kernel estimation, it is likely to fail when the blurred images do not contain rich texture. In [18], Michaeli and Irani exploit internal patch recurrence for image deblurring. This method performs well when images contain repetitive patch patterns, but may fail otherwise. A class-specific image prior [19] has been shown to be effective for certain object categories but less effective for scenes with complex backgrounds. Recently, Pan et al. [20] develop an image prior based on the dark channel prior [21] for blur kernel estimation. However, this method does not perform well when clear images do not contain zero-intensity pixels or the blurred images contain noise.

Fig. 2: Effect of salient edges in kernel estimation. (a) True image and kernel. (h) Blurred image. (b)-(f) Extracted salient edges of facial components from the clear images visualized by Poisson reconstruction. (g) The ground-truth edges of (a). (i)-(n) Deblurred results by using edges (b)-(g), respectively.

Edge Selection. In addition to statistical priors, numerous blind image deblurring methods explicitly exploit edges for kernel estimation [4, 5, 6, 22]. Joshi et al. [6] and Cho et al. [22] use the restored sharp edges from a blurred image for kernel estimation. In [4], Cho and Lee utilize bilateral and shock filters to predict sharp edges. The blur kernel is determined by alternating between restoring sharp edges and estimating blur kernels in a coarse-to-fine manner. As strong edges restored from a blurred image are not necessarily useful for kernel estimation, Xu and Jia [5] develop a method to select informative ones for deblurring. Despite demonstrated success, these methods rely largely on image filtering methods (e.g., shock and bilateral filters) and heuristics for restoring sharp edges, which are less effective for objects with specific geometric structures.

Face Deblurring. A few algorithms have been developed to deblur face images for the recognition task. Nishiyama et al. [23] learn subspaces from blurred face images with known blur kernels for recognition. As the set of blur kernels is pre-defined, the application domain of this approach is limited. Zhang et al. [24] propose a joint image restoration and recognition method based on sparse representations. However, this method is most effective for well cropped and aligned face images with simple motion blurs.

Example-based Deblurring. Recently, HaCohen et al. [25] propose a deblurring method which uses sharp reference examples for guidance. The method requires a reference image with the same contents as the input to obtain dense correspondence for reconstruction. Although it has been shown to deblur specific images well, the assumption of using reference images with the same contents limits its application domain. In contrast, the proposed methods do not require the exemplar to have the same or closely similar contents as the input. The blurred face image can be of a different identity and background when compared to the exemplar images. The proposed methods only require the matched example to have similar structures (in terms of image gradients) for kernel estimation instead of using dense corresponding pixels. As such, the proposed algorithms can be applied to class-specific image deblurring with fewer constraints.

Convolutional Neural Networks. Convolutional neural networks have been widely used in low-level vision tasks including image denoising [26], super-resolution [27, 28], non-blind deconvolution [29, 30], blind image deblurring [31] and image filtering [32, 33]. Schuler et al. [31] incorporate a sharpening convolutional neural network into an iterative blind deconvolution method to estimate the blur kernel. However, this method needs to re-train different networks for kernels of different sizes, which limits the application domains. In [32], Xu et al. propose a method to learn edge-aware filters using a deep convolutional neural network. However, we note that this method can only be applied to approximate edge-aware filters for clear images. This method cannot be directly applied to restore salient edges from blurry images for kernel estimation.

3 Proposed Algorithms

As the kernel estimation problem is non-convex [1, 2], most state-of-the-art deblurring methods use coarse-to-fine approaches to refine the results. Furthermore, explicit or implicit edge selection schemes are adopted to constrain and converge to feasible solutions. Notwithstanding the demonstrated success in deblurring images, these methods are less effective for face images that contain fewer textured contents. To address these issues, we first propose an exemplar-based algorithm to estimate blur kernels for face images. The proposed method restores important structural information from exemplars to facilitate accurate kernel estimation. To reduce the computational cost, we further propose a CNN-based algorithm which can predict sharp edges more efficiently than the exemplar-based method.

Fig. 3: Extracted salient edges (see Section 3.2.1 for details). (a) Input image. (b) Initial contour. (c) Refined contour.
Fig. 4: Kernel estimation accuracy (KS stands for kernel similarity) with respect to restored salient edges from different facial components. The labels (b)-(g) on the $x$-axis represent the 6 facial components in Fig. 2(b)-(g).

3.1 Structure of Face Images

We first determine the types and number of salient edges from exemplars for kernel estimation within the context of face deblurring. For face images, the salient edges that capture the object structure may be the lower contour, mouth, eyes, nose, eyebrows and hair. As eyebrows and hair have small edges with large variations which may be less effective for kernel estimation [5, 34], we do not consider them as useful structures. Fig. 2 shows several components restored from a clear face image as approximations of the latent image for kernel estimation.

To extract salient edges as shown in Fig. 2(b)-(g), we manually locate the initial contours of the informative components (Fig. 3(b)), and use the guided filter [35] for refinement. The optimal threshold, computed by the Otsu method [36], is applied to each filtered image to obtain the refined binary contour mask $M$ of the facial components (Fig. 3(c)). As such, the salient edge is defined by

$$\nabla S = \nabla I \odot M, \tag{2}$$

where $I$ is the clear image, $\nabla$ is the gradient operator, and $\odot$ denotes element-wise multiplication. We use the horizontal and vertical derivatives to compute image gradients.
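The thresholding and masking steps can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the guided-filter refinement is not reproduced, so `filtered_contour` is assumed to already be the filtered contour image; the Otsu threshold is a plain histogram-based version; and the function names are hypothetical.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Threshold that maximises the between-class variance (Otsu's method)."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)            # probability mass at or below each bin
    m = np.cumsum(p * centers)   # cumulative (unnormalised) mean
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros_like(w0)
    sigma_b[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return centers[np.argmax(sigma_b)]

def gradients(img):
    """Horizontal and vertical forward differences (zero at the far border)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return gx, gy

def salient_edges(clear, filtered_contour):
    """Keep the gradients of the clear image only inside the binary mask."""
    mask = filtered_contour > otsu_threshold(filtered_contour)
    gx, gy = gradients(clear)
    return gx * mask, gy * mask, mask
```

The returned pair of masked derivative images plays the role of the salient edges used to guide kernel estimation.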

We evaluate these edges by considering them as the predicted salient edges in the deblurring framework and estimate the blur kernels according to [2] by

(3)

where $\nabla S$ is the gradient of the salient edges restored from an exemplar image as shown in Fig. 2(b)-(g), $\nabla B$ is the gradient computed from the blurred input (Fig. 2(h)), $k$ is the blur kernel, and $\alpha$ is a weight (e.g., 0.005 in this work) for the regularization term. The sparse deconvolution method [2] with a hyper-Laplacian prior is employed to restore latent images (Fig. 2(i)-(n)). The deblurred results using the above-mentioned components (e.g., Fig. 2(l) and (m)) are comparable to those using the ground-truth edges (Fig. 2(n)), which provide the ideal case for salient edge prediction of the input blurry image.

To support the above-mentioned observations, we collect a set of 160 images generated from 20 images (10 images from the CMU PIE dataset [37] and 10 images from the Internet) convolving with 8 blur kernels, and restore the corresponding edges from different combinations of components (i.e., Fig. 2(b)-(g)). We conduct the same experiment as Fig. 2, and compute the average accuracy of the estimated kernels in terms of kernel similarity [34]. The red dashed curve in Fig. 4 shows the relationship between the edges of facial components and accuracy of estimated kernels. As shown in the figure, the metric converges when all the mentioned components (e.g., Fig. 2(e)) are included, and the set of edges is sufficient (kernel similarity value of in Fig. 4) for accurate kernel estimation.

For real-world applications, the ground-truth edges are not available. Recent methods adopt thresholding and similar techniques to select salient edges for kernel estimation and this inevitably introduces some incorrect edges from a blurred image. Furthermore, the edge selection strategies, either explicitly or implicitly, consider only local edges rather than structural information of a particular object class, e.g., facial components and contour. In contrast, we consider important geometric structures of a face image for kernel estimation. From the experiments with different facial components, we determine that the set of lower face contour, mouth and eyes is sufficient to achieve accurate kernel estimation and deblurred results. More importantly, these components can also be robustly restored [38] unlike the other parts (e.g., eyebrows or nose in Fig. 2(a)). Thus, we use these three components as the informative structures for face image deblurring.

3.2 Structure Prediction

Based on above discussions, we propose two structure prediction methods for blur kernel estimation.

3.2.1 Structure Prediction by Exemplars

We use a set of face images from the CMU PIE dataset [37] as our exemplars for deblurring. The selected face images are from different identities with varying facial expressions and poses. For each exemplar, we restore the informative structures (i.e., lower face contour, eyes and mouth) as discussed in Section 3.1. As such, a set of exemplar structures are generated as the potential facial structure for kernel estimation.

Given a blurred image $B$, we search for its best matched exemplar structure. We use the maximum response of normalized cross-correlation to find the best candidate based on image gradients,

$$v_i = \max_{t} \left\{ \frac{\sum_{x} \nabla B(x)\, \nabla T_i(x + t)}{\|\nabla B(x)\|_2 \, \|\nabla T_i(x + t)\|_2} \right\}, \tag{4}$$

where $i$ is the index of the exemplar, $T_i$ is the $i$-th exemplar, and $t$ is the possible shift between the image gradients $\nabla T_i$ and $\nabla B$. The value of $v_i$ is large if $\nabla T_i$ is similar to $\nabla B$. To deal with face images of different scales, we resize each exemplar with sampled scaling factors in the range [1/2, 2] at a sampling step size of 0.5 before using (4). Similarly, to deal with rotated face images, we rotate each exemplar with rotation angles in [-10, 10] degrees, sampled at a step size of 1, before using (4).
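The matching step can be sketched with an FFT-based cross-correlation. This is a simplified illustration, not the authors' code: it assumes circular shifts, a single gradient map per image (for example the gradient magnitude), and exemplars already resized to the input size; the scale and rotation sampling described above is omitted, and the names `ncc_score` and `best_exemplar` are hypothetical.

```python
import numpy as np

def ncc_score(grad_b, grad_t):
    """Maximum over circular shifts of the normalized cross-correlation."""
    fb = np.fft.fft2(grad_b)
    ft = np.fft.fft2(grad_t)
    # corr[s] = sum_x grad_b(x + s) * grad_t(x) for every circular shift s.
    corr = np.real(np.fft.ifft2(fb * np.conj(ft)))
    denom = np.linalg.norm(grad_b) * np.linalg.norm(grad_t)
    return corr.max() / denom if denom > 0 else 0.0

def best_exemplar(grad_b, exemplar_grads):
    """Return the index of the best matched exemplar and all scores."""
    scores = [ncc_score(grad_b, g) for g in exemplar_grads]
    return int(np.argmax(scores)), scores
```

By the Cauchy-Schwarz inequality each score is at most 1, and a score of 1 is reached when one gradient map is a shifted, positively scaled copy of the other.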

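The predicted salient edges for kernel estimation are defined by

$$\nabla S = \nabla T_{i^*}, \tag{5}$$

where $i^* = \arg\max_i v_i$, and $\nabla T_{i^*}(x)$ is computed by

$$\nabla T_{i^*}(x) = \begin{cases} \nabla T_{i^*}(x), & \text{if } M_{i^*}(x) = 1, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

Here $M_{i^*}$ is the contour mask for the $i^*$-th exemplar. In the experiments, we find that the method using the edges of exemplars $\nabla T_{i^*}(x)$ as the predicted salient edges performs similarly to that using the edges of the input image $\nabla B(x)$ (see Section 4). The reason is that $\nabla T_{i^*}(x)$ and $\nabla B(x)$ share similar structures due to the matching step, and thus the results using either of them as the guidance are similar.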

We conduct experiments with quantitative evaluations to demonstrate the effectiveness and robustness of our matching criterion. We collect 100 clear images from 50 identities, with 2 images for each. The images from the same person are different in terms of facial expression and background. In the test phase, we blur one image with random noise as the test image, and use the others as exemplars. If the matched exemplar is the image from the same person, we consider that as a success. We evaluate each image with 8 blur kernels and 11 noise levels (0-10%) and show the matching accuracy in Fig. 5(b). We note that although noise decreases the average matching values (see Fig. 5(a)), it does not affect the matching accuracy (Fig. 5(b)).

Fig. 5: Effect of noise on the proposed matching criterion. (a) Average matching values. (b) Matching accuracy.

3.2.2 Structure Prediction by a Deep CNN

Although the above-mentioned structure prediction method can effectively predict $\nabla S$, searching for the best matched exemplar in a large dataset is computationally expensive. In this section, we propose an approach to predict structure information from a blurry face image based on a CNN, which has a similar effect for edge prediction but is about 3,000 times faster than the exemplar-based method (see Section 4).

Layer 1 2 3 4 5 6
Filter size
Channel 64 64 64 64 64 1
TABLE I: CNN architecture for structure prediction.

Proposed Network.

Given a blurred face image, our goal is to predict the salient structure by a CNN, where each layer contains convolution operations followed by non-linear activations. The network architecture and parameters are shown in Table I. We assume that the input blurred face image is of size for 1 gray channel, where is the spatial resolution. In the structure prediction network, the first convolution layer takes a large filter () to capture large spatial information. The subsequent layers take the output from the previous layer by applying an filter. The response of each convolution layer is given by

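$$f_l^n = \sigma\Big(\sum_{m} f_{l-1}^m \ast w_l^{m,n} + b_l^n\Big), \tag{7}$$

where $f_l$ and $f_{l-1}$ are the feature maps of layer $l$ and $l-1$, respectively. In addition, $w_l$ is the convolution kernel, and the indices $(m, n)$ denote the mapping from the $m$-th feature map of one layer to the $n$-th feature map of the next layer. The function $\sigma(\cdot)$ denotes the Rectified Linear Unit (ReLU) [39] and $b_l$ is the bias.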
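A single layer response of this form can be sketched in plain NumPy. This is a didactic sketch, not the trained network: it assumes odd filter sizes, zero padding that keeps the spatial size, and the cross-correlation convention common in CNN code (which differs from a true convolution only by a flip of the filter); the function name `conv_layer` and the array layout are hypothetical.

```python
import numpy as np

def conv_layer(features, weights, bias):
    """One layer: ReLU(sum_m f^m (x) w^{m,n} + b^n).

    features: (M, H, W) input feature maps
    weights:  (N, M, kh, kw) filters with odd kh and kw
    bias:     (N,) one bias per output feature map
    returns:  (N, H, W) output feature maps
    """
    n_out, n_in, kh, kw = weights.shape
    _, h, w = features.shape
    padded = np.pad(features, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros((n_out, h, w))
    for dy in range(kh):
        for dx in range(kw):
            window = padded[:, dy:dy + h, dx:dx + w]  # shifted inputs, (M, H, W)
            # Mix the M input maps into the N output maps for this filter tap.
            out += np.einsum('nm,mhw->nhw', weights[:, :, dy, dx], window)
    return np.maximum(out + bias[:, None, None], 0.0)  # ReLU
```

Stacking six such layers with 64, 64, 64, 64, 64 and 1 output channels would follow the channel counts listed in Table I; the filter sizes would have to be taken from the original table.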

The proposed network is motivated by the state-of-the-art edge prediction approaches [4, 5] which rely on heuristic filtering methods to select sharp edges. These edge prediction methods usually contain two main steps: 1) suppression of minor details by a smoothing filter and 2) enhancement of strong structures by the shock filter. In this work, we propose a deep CNN to restore sharp edges from blurred images, where the first few layers are designed to remove details (see Fig. 6(b)) and the following layers are used to restore sharp edges (see Fig. 6(d)).

We note that Xu et al. [32] propose a CNN to approximate various image filters. However, this network architecture cannot learn sharp structure information when the input image is blurred, as the mapping function between the blurred images and sharp structures is more complex. Fig. 6 shows the structure prediction results by the CNN [32] and our network. As the method by Xu et al. [32] is designed to restore edges from clear images, it does not perform well on blurry inputs, as shown in Fig. 6(c). In contrast, the proposed network restores sharp edges (Fig. 6(d)) from the blurred input images, especially at the face contour, nose and eye regions.

Fig. 6: Extracted salient edges by [32] and our method. (a) Input blurred images. (b) Some intermediate feature maps generated by the proposed network. (c) Restored edges by [32]. (d) Restored edges by the proposed CNN.

Training.

Learning the mapping function between a blurred face image and the corresponding structure is achieved by minimizing the loss between the gradient of the reconstructed structure and the corresponding gradient of the ground-truth structure ,

(8)

where is the number of blurred face images in training set, is the sparse regularization to enforce sparsity on gradients and is the parameter for the regularization. For the ground-truth structure , we use the smoothing filter [40] to remove extraneous details in the clear face image . Then the smoothed result can be considered as the desired sharp edges .

Fig. 7: Examples of synthetic blur kernels.

To generate blurred face images, we synthesize blur kernels that appear realistic by sampling random 3D trajectories as in [41]; the obtained trajectories are projected and rasterized to random square kernel sizes in the range from up to pixels. Some examples of the generated blur kernels are shown in Fig. 7. We synthetically generate blurred face images by convolving each clean image with 50 generated blur kernels. With the blurred face images and corresponding clear images, the network parameters are learned by minimizing the energy function (8) using the stochastic gradient descent (SGD) scheme. In the test stage, we apply the trained network to a blurred face image to generate the salient edges contained in $\nabla S$.
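To make the kernel synthesis step concrete, the sketch below rasterizes a random trajectory into a normalized square kernel. It is only a stand-in for the procedure of [41]: instead of projecting random 3D camera trajectories, it uses a simple 2D random walk with a smoothly varying velocity, and the function name, the number of steps, and the kernel size are all hypothetical.

```python
import numpy as np

def random_motion_kernel(size, steps=200, seed=0):
    """Rasterize a smooth 2D random walk into a size x size blur kernel."""
    rng = np.random.default_rng(seed)
    # The velocity follows a random walk and the position integrates it,
    # which gives a smooth, curved path (an assumed trajectory model).
    velocity = np.cumsum(rng.standard_normal((steps, 2)), axis=0)
    path = np.cumsum(velocity, axis=0)
    # Translate and scale the path so that it fits inside the kernel support.
    path -= path.min(axis=0)
    span = path.max()
    if span > 0:
        path *= (size - 1) / span
    kernel = np.zeros((size, size))
    idx = np.rint(path).astype(int)
    # Accumulate the time spent at each pixel along the trajectory.
    np.add.at(kernel, (idx[:, 1], idx[:, 0]), 1.0)
    return kernel / kernel.sum()

kernels = [random_motion_kernel(15, seed=s) for s in range(4)]
```

Each clean training image would then be convolved with such kernels to form blurred and sharp training pairs.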

3.3 Kernel Estimation from Exemplar Structure

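After obtaining salient edges $\nabla S$ by the exemplar-based or CNN-based method, we estimate the blur kernel by alternately solving

$$\min_{I} \|I \ast k - B\|_2^2 + \lambda \|\nabla I\|_0 + \mu \|\nabla I - \nabla S\|_2^2 \tag{9}$$

and

$$\min_{k} \|\nabla S \ast k - \nabla B\|_2^2 + \gamma \|k\|_2^2, \tag{10}$$

where $\lambda$, $\mu$, and $\gamma$ are parameters for the regularization terms. Here the $L_0$-norm is employed to restore $I$ and remove ringing artifacts in $I$ as shown by [40], and the last term in (9) enforces that the gradient of $I$ is similar to the predicted $\nabla S$. In (10), the $L_2$-norm based regularization is employed to stabilize the blur kernel estimation with a fast solver.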

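We use the half-quadratic splitting minimization method [40] to solve (9). By introducing the auxiliary variable $w$ corresponding to $\nabla I$, we rewrite (9) as

$$\min_{I, w} \|I \ast k - B\|_2^2 + \beta \|w - \nabla I\|_2^2 + \lambda \|w\|_0 + \mu \|\nabla I - \nabla S\|_2^2, \tag{11}$$

where $\lambda$ and $\mu$ are the weights of the $L_0$ term and the edge-guidance term in (9), and $\beta$ is a scalar weight that is increased by a factor of 2 over iterations. When $\beta$ is close to infinity, the solution of (11) approaches that of (9).

We note that (11) can be efficiently solved by alternately minimizing $I$ and $w$. At each iteration, the solution of $I$ can be obtained by

$$\min_{I} \|I \ast k - B\|_2^2 + \beta \|w - \nabla I\|_2^2 + \mu \|\nabla I - \nabla S\|_2^2, \tag{12}$$

which has a closed-form solution computed in the frequency domain by

$$I = \mathcal{F}^{-1}\left( \frac{\overline{\mathcal{F}(k)}\,\mathcal{F}(B) + \beta F_w + \mu F_S}{\overline{\mathcal{F}(k)}\,\mathcal{F}(k) + (\beta + \mu) F_\nabla} \right). \tag{13}$$

Here $\mathcal{F}(\cdot)$ and $\mathcal{F}^{-1}(\cdot)$ denote the Discrete Fourier Transform (DFT) and inverse DFT, respectively; $\overline{(\cdot)}$ is the complex conjugate operator; and $F_w = \overline{\mathcal{F}(\nabla_h)}\,\mathcal{F}(w_h) + \overline{\mathcal{F}(\nabla_v)}\,\mathcal{F}(w_v)$, $F_S = \overline{\mathcal{F}(\nabla_h)}\,\mathcal{F}(\nabla_h S) + \overline{\mathcal{F}(\nabla_v)}\,\mathcal{F}(\nabla_v S)$, and $F_\nabla = \overline{\mathcal{F}(\nabla_h)}\,\mathcal{F}(\nabla_h) + \overline{\mathcal{F}(\nabla_v)}\,\mathcal{F}(\nabla_v)$, where $\nabla_v$ and $\nabla_h$ denote the vertical and horizontal derivative operators.

Given $I$, the solution of $w$ in (11) can be obtained by

$$w = \begin{cases} \nabla I, & \text{if } |\nabla I|^2 \geq \lambda / \beta, \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

The main steps for solving (9) are shown in Algorithm 1.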

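  Input: Blurred image $B$ and estimated kernel $k$.
  $I \leftarrow B$, $\beta \leftarrow \beta_0$.
  repeat
     solve $w$ using (14).
     solve $I$ using (13).
     $\beta \leftarrow 2\beta$.
  until $\beta > \beta_{\max}$
  Output: Latent image $I$.
Algorithm 1 Solving (9)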
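The alternation in Algorithm 1 can be sketched as follows. This is a minimal sketch under stated assumptions rather than the authors' MATLAB code: it assumes circular boundary conditions, an $L_0$ term on the image gradient with weight `lam`, a quadratic edge-guidance term with weight `mu`, and a splitting weight `beta` that doubles each iteration. The parameter names, their default values, the initial `beta`, and the stopping value `beta_max` are hypothetical.

```python
import numpy as np

def psf_to_otf(kernel, shape):
    """Zero-pad a kernel, centre it at (0, 0), and return its 2D DFT."""
    kh, kw = kernel.shape
    padded = np.zeros(shape)
    padded[:kh, :kw] = kernel
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def grad(img):
    """Circular forward differences (horizontal, vertical)."""
    return np.roll(img, -1, axis=1) - img, np.roll(img, -1, axis=0) - img

def solve_latent(blurred, kernel, sx, sy, lam=2e-3, mu=1e-2, beta_max=1e5):
    """Half-quadratic splitting for the guided L0 latent-image update."""
    shape = blurred.shape
    fk = psf_to_otf(kernel, shape)
    # The DFT of each derivative operator is obtained from its impulse response.
    delta = np.zeros(shape)
    delta[0, 0] = 1.0
    dx, dy = grad(delta)
    fdx, fdy = np.fft.fft2(dx), np.fft.fft2(dy)
    grad_energy = np.abs(fdx) ** 2 + np.abs(fdy) ** 2
    data = np.conj(fk) * np.fft.fft2(blurred)
    guide = mu * (np.conj(fdx) * np.fft.fft2(sx) + np.conj(fdy) * np.fft.fft2(sy))
    latent = blurred.copy()
    beta = 2.0 * lam  # assumed initial value
    while beta < beta_max:
        # w-step: hard-threshold the gradients of the current estimate.
        gx, gy = grad(latent)
        keep = gx ** 2 + gy ** 2 >= lam / beta
        wx, wy = gx * keep, gy * keep
        # I-step: closed-form least-squares solution in the frequency domain.
        num = data + guide + beta * (
            np.conj(fdx) * np.fft.fft2(wx) + np.conj(fdy) * np.fft.fft2(wy))
        den = np.abs(fk) ** 2 + (beta + mu) * grad_energy
        latent = np.real(np.fft.ifft2(num / den))
        beta *= 2.0
    return latent
```

The hard threshold follows from comparing the two candidate costs of the $w$-step for each pixel: keeping the gradient costs `lam`, while setting it to zero costs `beta` times its squared magnitude.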

Based on the above analysis, the main steps for the proposed kernel estimation algorithm are summarized in Algorithm 2. We use the conjugate gradient method to solve the least squares problem (10).
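For illustration, the regularized least-squares kernel update can also be written in closed form. The sketch below does not use the conjugate gradient method mentioned above: it swaps in a Fourier-domain solution that is exact only under circular boundary conditions and with the kernel unconstrained over the whole image domain, followed by cropping to the kernel support, clipping negative values, and normalization. The function name and the default `gamma` are hypothetical.

```python
import numpy as np

def estimate_kernel(sx, sy, bx, by, ksize, gamma=1e-3):
    """Closed-form minimizer of ||grad S * k - grad B||^2 + gamma ||k||^2.

    sx, sy: horizontal and vertical gradients of the predicted salient edges
    bx, by: horizontal and vertical gradients of the blurred image
    ksize:  (kh, kw) kernel support with odd side lengths
    """
    fsx, fsy = np.fft.fft2(sx), np.fft.fft2(sy)
    fbx, fby = np.fft.fft2(bx), np.fft.fft2(by)
    num = np.conj(fsx) * fbx + np.conj(fsy) * fby
    den = np.abs(fsx) ** 2 + np.abs(fsy) ** 2 + gamma
    full = np.real(np.fft.ifft2(num / den))
    # The kernel centre sits at (0, 0); shift it into the top-left window.
    kh, kw = ksize
    full = np.roll(full, (kh // 2, kw // 2), axis=(0, 1))
    kernel = np.clip(full[:kh, :kw], 0.0, None)  # blur kernels are non-negative
    total = kernel.sum()
    return kernel / total if total > 0 else kernel
```

In an alternating scheme this update would be interleaved with the latent image update, with the salient edges refreshed from the new latent image after each round.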

In Algorithm 2, we update the initial predicted $\nabla S$ to remove extraneous weak edges generated by inaccurate estimation of the CNN-based or exemplar-based method (see Section 4.6 for more analysis).

  Input: Blurred image $B$ and predicted salient edges $\nabla S$ by the exemplar or CNN-based method.
  for each iteration do
     solve $k$ using (10).
     solve $I$ using Algorithm 1.
     $\nabla S \leftarrow \nabla I$. // Update the salient edges
  end for
  Output: Blur kernel $k$.
Algorithm 2 Blur kernel estimation algorithm

3.4 Recovering Latent Images

Once the blur kernel is determined, the latent image can be estimated by a number of non-blind deconvolution methods. In this work, we use the method with a hyper-Laplacian prior  [42] to recover the latent image.

4 Experimental Results

We evaluate the proposed algorithm against the state-of-the-art image deblurring methods on face images. In addition, we show that the proposed algorithm can be applied to other deblurring tasks by using exemplars of specific classes with categorical structures. Implemented in MATLAB, it takes about seconds for the exemplar-based method to process a blurred image of pixels on an Intel Xeon CPU with 12 GB RAM. The code and dataset are available on the authors’ websites and more results can be found in the supplementary document, available online. As the method [25] requires a reference image with same contents as the blurred image, this is not included in performance evaluation. However, for completeness we provide some comparisons in the supplementary material.

Parameter setting.

In all the experiments, the parameters , , and are set to be , , and , respectively. The sensitivity analysis on these parameters is presented in Section 4.6.

Dataset.

For the exemplar-based method, we use a set of face images from the CMU PIE dataset [37] (which contains face images in different poses and expressions) as our dataset. To train the proposed network, we use exemplar images and blur kernels as the training dataset. That is, a set of blurred images is used in the training process. The identities of exemplar and test sets are not overlapped in all the experiments.

4.1 Synthetic Dataset using Frontal Faces

For quantitative evaluations, we collect a dataset of 60 clear face images and 8 ground-truth kernels in a way similar to [10] to generate a test set of 480 blurred inputs. We evaluate the proposed algorithms against state-of-the-art methods based on edge selection [4, 5] and sparsity priors [3, 7, 2, 9]. We use the non-blind deconvolution method [43] and adopt the error metric proposed by Levin et al. [10] for fair comparison. Fig. 8 shows the cumulative error ratio where higher curves indicate more accurate results. The proposed algorithms generate better results than state-of-the-art methods for face image deblurring. The results show the advantages of using facial structures as the guidance over those using local edge selection methods [4, 5, 9].
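The error ratio of Levin et al. [10] and the cumulative curves plotted from it can be computed as in the sketch below. It is an illustration of the metric only: the error ratio divides the squared error of the image restored with the estimated kernel by the squared error of the image restored with the ground-truth kernel, and the function names and threshold grid are hypothetical.

```python
import numpy as np

def error_ratio(restored_est, restored_gt_kernel, sharp):
    """SSD error with the estimated kernel over SSD error with the true kernel."""
    num = np.sum((restored_est - sharp) ** 2)
    den = np.sum((restored_gt_kernel - sharp) ** 2)
    return num / den

def cumulative_curve(ratios, thresholds):
    """Fraction of test images whose error ratio is at most each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return np.array([(ratios <= t).mean() for t in thresholds])

# Hypothetical error ratios for four test images.
curve = cumulative_curve([1.0, 2.5, 4.0, 10.0], thresholds=[1, 2, 3, 4, 5])
```

A higher curve means that more test images fall below each error-ratio threshold, which is why higher curves indicate more accurate kernel estimates.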

We evaluate different schemes to predict the edges $\nabla S$: 1) using the edges of exemplars $\nabla T_{i^*}$ as $\nabla S$ (original); 2) using the edges predicted by the CNN; 3) using the edges of the input image $\nabla B$ as $\nabla S$ (i.e., using $\nabla B$ instead of $\nabla T_{i^*}$ to compute $\nabla S$ in (6)); 4) not using $\nabla S$ at all. Fig. 8(a) shows that the first three approaches perform similarly, as the matched $\nabla T_{i^*}$, the edges predicted by the CNN, and $\nabla B$ share similar structures, which also demonstrates the effectiveness of the proposed exemplar-based and CNN-based methods. On the other hand, the schemes using the predicted edges $\nabla S$ perform significantly better than the one without using predicted edges.

Fig. 8: Quantitative comparisons with several state-of-the-art single-image blind deblurring methods: Shan et al. [3], Cho and Lee [4], Xu and Jia [5], Krishnan et al. [7], Levin et al. [2], Zhong et al. [44], Xu et al. [9], Sun et al. [17], and Michaeli and Irani [18]. (a) Results on noise-free images. (b) Results on noisy images.
Fig. 9: An example from the synthesized frontal face test dataset. (a) Input and kernel. (b) $\nabla S$ from exemplar. (c) CNN-based. (d) Sun et al. [17]. (e) Xu and Jia [5]. (f) Xu et al. [9]. (g) Michaeli and Irani [18]. (h) Ours without $\nabla S$. (i) Our exemplar-based. (j) Our CNN-based.
Fig. 10: An example from the synthesized profile face test dataset. (a) Input and kernel. (b) Predicted $\nabla S$. (c) Shan et al. [3]. (d) Cho and Lee [4]. (f) Krishnan et al. [7]. (h) Xu et al. [9]. (i) Ours without $\nabla S$. (j) Our CNN-based.

We note that the proposed method without predicted $\nabla S$ does not use coarse-to-fine strategies and generates results similar to those of [9], which indicates that the coarse-to-fine strategy does not help kernel estimation on blurry face images with less texture. In addition, we note that the results generated by the exemplar-based method are slightly better than those by the CNN-based one as shown in Fig. 8(a). One of the main reasons is that the exemplar-based method directly uses the structures of the clear exemplars, while the CNN-based algorithm uses the structures predicted from blurred inputs via regression. Thus, edges from exemplars are much sharper than those of the CNN-based method, which accordingly leads to better kernel estimates. However, the run time of the prediction step by the CNN-based algorithm is significantly less than that of the exemplar-based method as shown in Table II. The average run time of the prediction step by the exemplar-based method is 4260 seconds. In contrast, the average run time of the prediction step by the CNN-based method is only 0.95 seconds.

Time Prediction step Deblurring step Total
Exemplar-based
Deep CNN-based
TABLE II: Average run time (/s) on the 480 test images.

Fig. 8(b) shows the quantitative comparisons when 1% random noise is added to the test images. For the CNN-based algorithm, we train the proposed network using noise-free images (denoted as $\nabla S$ from CNN w/o noise in Fig. 8(b)) and images with random noise (denoted as $\nabla S$ from CNN w/ noise in Fig. 8(b)) to evaluate the deblurring performance under noise. Compared to other state-of-the-art methods, the proposed algorithms perform well on blurry images with noise. We note that the results on noisy images show higher curves than those on noise-free images. The reason is that a noisy input increases the denominator value of the measure [10]. Thus, under the same blur kernel, the error ratios from noisy images are usually smaller than those from noise-free inputs.

We show one example from the test set in Fig. 9. The method based on the patch recurrence prior [18] generates deblurred images with significant blur residual as the statistical models are designed for generic objects without exploiting categorical structures. The edge-based methods [5, 17] do not perform well for face deblurring as the assumption that there exist a sufficient number of sharp edges in the latent images does not hold. Compared to the method based on an $L_0$-regularization [9], the results by the proposed algorithms contain significantly fewer artifacts.

In Fig. 9(b), although the best matched exemplars are from different identities with different facial expressions, the main structures of (a) and (b) are similar, e.g., the lower face contours and upper eye contours. In addition, the learned sharp edges capture the main structures of the blurred inputs as shown in Fig. 9(c). The deblurred results also indicate that our search approach (4) is able to find the image with similar structure, and the learning scheme (8) is able to restore the sharp latent edge from an input. The results shown in Fig. 9(i) and (j) demonstrate that the predicted salient edges significantly improve the accuracy of kernel estimation, while the results without predicted salient edges are similar to delta functions. Although our method is also developed within the MAP framework, the predicted salient edges based on the matched exemplar or CNN provide good initialization for kernel estimation such that the issue with delta kernel solution (e.g., Fig. 9(h)) is addressed effectively.

4.2 Synthetic Dataset using Profile Faces

We collect a dataset of 50 clear profile face images from the PICS dataset (http://pics.psych.stir.ac.uk/) and 8 ground-truth kernels from [10] to generate a test set of 400 blurred face images. As the proposed algorithms perform similarly, as discussed in Section 4.1, we only compare the CNN-based method with the state-of-the-art methods [2, 4, 5, 7, 9, 17]. One example from this profile face dataset and the deblurred results are shown in Fig. 10.
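
The construction of such a test set follows the image formation model (1): every sharp image is convolved with every kernel, and noise is optionally added. The sketch below is a minimal illustration, assuming grayscale NumPy arrays, circular boundary handling, and Gaussian noise; none of these choices is stated in the paper, and the function names are hypothetical.

```python
import numpy as np

def blur(image, kernel, noise_level=0.0, rng=None):
    """Apply B = I * k + n using FFT-based circular convolution (an assumption)."""
    h, w = image.shape
    kh, kw = kernel.shape
    padded = np.zeros((h, w))
    padded[:kh, :kw] = kernel
    # Centre the kernel at the origin so the blurred image is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    blurred = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))
    if noise_level > 0:
        rng = np.random.default_rng() if rng is None else rng
        # Gaussian noise is an assumption; the paper only says "random noise".
        blurred = blurred + rng.normal(0.0, noise_level, size=image.shape)
    return blurred

def make_test_set(images, kernels, noise_level=0.0, seed=0):
    """Blur every image with every kernel, e.g. 50 images x 8 kernels = 400."""
    rng = np.random.default_rng(seed)
    return [blur(img, k, noise_level, rng) for img in images for k in kernels]
```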

Fig. 10(b) shows the structures predicted by the proposed CNN method for the blurry profile face images. Note that most blurred edges are not included in the predicted salient structures. Similar to the results presented in Section 4.1, the estimated kernels and restored images by Cho and Lee [4] contain a significant amount of noise as shown in Fig. 10(d). The deblurred results by the methods based on sparsity priors [3, 7] contain ringing artifacts as shown in Fig. 10(c) and (f).

Quantitatively, Fig. 11 shows that the proposed algorithm based on the CNN performs well against the state-of-the-art methods on this dataset of profile face images based on the cumulative error ratio [10].

Fig. 11: Quantitative comparisons on profile faces with several state-of-the-art single-image blind deblurring methods: Shan et al. [3], Cho and Lee [4], Krishnan et al. [7], Levin et al. [2], Xu et al. [9], and Sun et al. [17].
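
A cumulative error ratio curve of this kind can be sketched as follows, assuming it reports, for each threshold, the fraction of test images whose error ratio does not exceed that threshold. The function name and the example thresholds are illustrative.

```python
import numpy as np

def cumulative_error_ratio(ratios, thresholds):
    """Fraction of test images whose error ratio is at most each threshold."""
    ratios = np.asarray(ratios, dtype=float)
    return np.array([np.mean(ratios <= t) for t in thresholds])
```

A method whose curve lies higher reaches a given success rate at a smaller error ratio, which is why higher curves indicate better kernel estimates.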

4.3 Real Images

We evaluate the proposed algorithms against the state-of-the-art deblurring methods using real blurred images. The input image in Fig. 12(a) contains some noise and saturated pixels. The deblurred results by the state-of-the-art methods [4, 5, 17, 18, 44] contain noticeable noise and ringing artifacts. In contrast, the proposed exemplar-based method is able to deblur this image with fewer visual artifacts and finer details (Fig. 12(i)), even though the best matched exemplar (Fig. 12(b)) differs significantly from the input. Furthermore, the deblurred result by the proposed CNN-based method also contains fewer ringing artifacts as shown in Fig. 12(j).

Fig. 13(a) shows another example of a real captured image. The deblurring methods based on edge selection [4, 5, 17] do not perform well as ambiguous edges are selected for kernel estimation. Similarly, the deblurred images by the methods based on natural priors [9, 18] contain artifacts, while the exemplar-based and CNN-based methods generate sharper contents as shown in Fig. 13(i) and (j).

(a) Input (b) Exemplar-based (c) Sun et al. [17] (d) Cho and Lee [4] (e) Xu and Jia [5]
(f) Zhong et al. [44] (g) Xu et al. [9] (h) Michaeli and Irani [18] (i) Our exemplar-based (j) Our CNN-based
Fig. 12: Real captured example with some noise and saturated pixels. The estimated kernel is of pixels.
(a) Input (b) Exemplar-based (c) Sun et al. [17] (d) Cho and Lee [4] (e) Xu and Jia [5]
(f) Zhong et al. [44] (g) Xu et al. [9] (h) Michaeli and Irani [18] (i) Our exemplar-based (j) Our CNN-based
Fig. 13: Example of real captured image. The estimated kernel is of pixels.
(a) Input (b) Exemplar-based (c) Sun et al. [17] (d) Cho and Lee [4] (e) Krishnan et al. [7]
(f) Zhong et al. [44] (g) Xu et al. [9] (h) Michaeli and Irani [18] (i) Our exemplar-based (j) Our CNN-based
Fig. 14: Object deblurring. Our method generates the deblurred result with fewer ringing artifacts.

4.4 Object Deblurring

In this work, we focus on face image deblurring, as it is of great interest with numerous applications. However, the proposed methods can be applied to other deblurring tasks by using exemplars of specific classes with categorical structures. We use one example in Fig. 14 to show the proposed methods can be extended to object deblurring.

Similar to face deblurring, we first collect a set of exemplar images and restore categorical structures (e.g., car body, windows and wheels for car images) using the method described in Section 3.2.1. For each test image, we use (4) to find the best exemplar image as shown in Fig. 14(b) and compute salient edges according to (5). Finally, we use the same algorithm (Algorithm 2) for object deblurring. For the CNN-based method, we first use the exemplars to generate blurred images and sharp edges using the method in Section 3.2.2, and then train a network based on the synthetic data.

The results generated by [4, 7, 17] contain significant ringing artifacts as shown in Fig. 14(c)-(e) and (g). In addition, the deblurred results by the state-of-the-art methods [44, 9, 18] contain blurry regions as shown in Fig. 14(f) and (h). In contrast, the results generated by our exemplar-based (Fig. 14(i)) and CNN-based (Fig. 14(j)) methods are sharper with significantly fewer artifacts.

4.5 Natural Image Deblurring

In contrast to the exemplar-based method, the proposed CNN-based algorithm is not limited to the structures of specific scenarios (e.g., poses). Thus, it can be applied to deblur images of other classes, e.g., natural scenes. Fig. 15 shows that the proposed CNN-based method is able to deblur natural images effectively. Overall, the proposed method performs comparably to the state-of-the-art natural image deblurring algorithms [17, 18].

(a) Input (b) Predicted (c) Xu and Jia [5] (d) Sun et al. [17] (e) Michaeli and Irani [18] (f) Ours
Fig. 15: Natural image deblurring. Our CNN-based method can be applied to natural image deblurring and generates the image with few ringing artifacts and much clearer characters.
Fig. 16: Results without and with predicted salient edges. (a)-(d) The 1st, 2nd, 5th, and 9th iteration intermediate results, respectively, using the edge selection method [4] to predict salient edges in Algorithm 2. (e) Deblurred result with the edge selection method [4] to predict salient edges in Algorithm 2. (f)-(i) The 1st, 2nd, 5th, and 9th iteration intermediate results, respectively, using the proposed exemplar-based method to predict salient edges in Algorithm 2. (j) The deblurred result based on exemplars. (k)-(n) The 1st, 2nd, 5th, and 9th iteration intermediate results, respectively, using the proposed CNN-based method to predict salient edges in Algorithm 2. (o) The deblurred result based on the proposed CNN. The blurred image in this figure is the same as that of Fig. 1.

4.6 Analysis and Discussion

In this section, we analyze the effect of the proposed edge prediction algorithms. We show that the proposed algorithms are not sensitive to variations in dataset size, image noise, and parameters. In addition, we discuss the limitations of the proposed algorithms.

Effect of predicted salient edges.

The initial predicted salient edges play a critical role in kernel estimation. We use an example to demonstrate the effectiveness of the proposed algorithm for predicting initial salient edges. Fig. 16(a)-(e) show that the deblurred results using the edge selection method [4] contain artifacts as ambiguous edges are selected. However, the proposed methods using the facial structure predicted by exemplars (Fig. 16(f)-(j)) and the CNN (Fig. 16(k)-(o)) do not include ambiguous edges and thus estimate kernels better. Fig. 16(f)-(i) and (k)-(n) also demonstrate that the salient edges predicted by the proposed algorithms lead to faster convergence than the edge selection method [4].

We note that the proposed algorithm does not require coarse-to-fine kernel estimation strategies or heuristic edge selections. The coarse-to-fine strategy can be viewed as the initialization for the finer levels, which constrains the solution space and reduces the computational load. Recent results of several state-of-the-art methods [4, 7, 9] show that effective salient edges at the initial stage are important for kernel estimation. If salient edges can be obtained effectively, it is not necessary to use coarse-to-fine strategies or specific edge selection, thereby simplifying the kernel estimation process significantly. Our exemplar-based and CNN-based methods operate on the input image at the original scale only and exploit the sharp structure information to constrain the solution space. By exploiting salient edges from the facial structures, the proposed methods perform well without using coarse-to-fine strategies and achieve fast convergence. In the method by Cho and Lee [4], blur kernels are estimated in a coarse-to-fine manner based on a heuristic edge selection strategy. However, it is difficult to select salient edges from heavily blurred images without exploiting any structural information (Fig. 16(a)). Compared to the intermediate results using the prior (Fig. 1(f)), our methods based on exemplars and CNN restore the important facial components effectively (Fig. 16(i) and (n)), thereby facilitating kernel estimation and image restoration.
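
To illustrate why reliable salient edges make single-scale kernel estimation feasible, the following sketch solves a quadratic kernel-estimation problem in closed form in the Fourier domain. It assumes an objective of the form commonly used in edge-based methods such as [4], namely a least-squares fit between the blurred gradients and the salient edges convolved with the kernel, plus a small ridge term with weight `gamma`. The exact objective of Algorithm 2 is not restated in this section, so the function names, the circular boundary handling, and the placement of the kernel support in the top-left corner are all assumptions.

```python
import numpy as np

def circular_gradients(img):
    """Forward differences with wrap-around boundaries."""
    gx = np.roll(img, -1, axis=1) - img
    gy = np.roll(img, -1, axis=0) - img
    return gx, gy

def estimate_kernel(edges_x, edges_y, blur_x, blur_y, ksize, gamma=1e-6):
    """Closed-form minimiser of sum_d ||edges_d * k - blur_d||^2 + gamma ||k||^2."""
    Sx, Sy = np.fft.fft2(edges_x), np.fft.fft2(edges_y)
    Bx, By = np.fft.fft2(blur_x), np.fft.fft2(blur_y)
    num = np.conj(Sx) * Bx + np.conj(Sy) * By
    den = np.abs(Sx) ** 2 + np.abs(Sy) ** 2 + gamma
    # Kernel support is assumed to sit in the top-left ksize x ksize corner.
    k = np.real(np.fft.ifft2(num / den))[:ksize, :ksize]
    k = np.maximum(k, 0.0)  # blur kernels are non-negative
    return k / k.sum()      # and sum to one
```

When the supplied edges are close to those of the latent image, one such solve already yields a plausible kernel, which is the role a coarse-to-fine pyramid otherwise plays as initialization.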

Robustness of exemplar structures.

In the exemplar-based method, we use (4) to find the best matched exemplar in the gradient space. The matched exemplar should share similar, although not perfect, structural information with the input image (e.g., Fig. 1(g)). Furthermore, the shared structures should not contain numerous false salient edges caused by blur. We also note that most mismatched contours caused by facial expressions correspond to small gradients in the blurred images. In such cases, these extraneous weak edges are not expected to help estimate kernels according to the edge-based methods [4, 5]. To alleviate this problem, we update the exemplar edges iteratively (see Algorithm 2) to increase their reliability as shown in Fig. 16(f)-(i). Consequently, the matched exemplars help estimate blur kernels and restore latent face images.
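
Matching in the gradient space can be sketched as below. The score used here, a normalized correlation between gradient magnitude maps, is a stand-in chosen for illustration; the actual criterion is (4), which is defined earlier in the paper and is not reproduced in this section. All function names are hypothetical.

```python
import numpy as np

def gradient_magnitude(img):
    """Per-pixel gradient magnitude from central differences."""
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def match_score(blurred, exemplar):
    """Normalized correlation between gradient magnitudes (illustrative score)."""
    a = gradient_magnitude(blurred).ravel()
    b = gradient_magnitude(exemplar).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def best_exemplar(blurred, exemplars):
    """Linear search over the exemplar set; returns the best index and all scores."""
    scores = [match_score(blurred, e) for e in exemplars]
    return int(np.argmax(scores)), scores
```

Because the score depends on where strong gradients lie rather than on intensities, an exemplar of a different identity can still be selected as long as its main contours align with those of the blurred input.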

(a) Exemplar-based deblurring method
(b) CNN-based deblurring method
Fig. 17: Sensitivity analysis of dataset size.

Robustness to dataset size.

Although a larger dataset is likely to contain more diverse exemplars that facilitate the matching process of the proposed method, the linear search can be computationally expensive. Empirically, we show that blurry face images can be deblurred well when coarse matches are available in a small exemplar set. We apply the k-means clustering method to a set of face images, and choose 40, 80, 100, and 200 centers as the exemplar datasets, respectively. Similar to [10], we generate 40 blurred images from 5 images (of identities different from those of the exemplars) and 8 blur kernels for experiments. The cumulative error ratio [10] is used to evaluate the method. Fig. 17(a) shows that the proposed exemplar-based method performs well with a small set of exemplars (e.g., 40). As the exemplar dataset size increases, the estimated results do not change significantly, which demonstrates the robustness of the proposed method to exemplar dataset size.
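
Reducing the exemplar set by clustering can be sketched as follows. This is a plain Lloyd-style k-means written with NumPy only; it assumes each face image has been flattened into a feature vector and that the image nearest to each center is kept as an exemplar. The paper does not state which features are clustered or how the centers are turned into exemplars, so both choices are assumptions.

```python
import numpy as np

def kmeans_exemplars(features, k, n_iter=20, seed=0):
    """Cluster feature vectors; return the index of the sample nearest each center."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct samples.
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=0)
```

With `k` set to 40, 80, 100, or 200, the returned indices would give exemplar sets of the sizes compared in Fig. 17(a).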

To assess the sensitivity of the CNN-based structure prediction method to dataset size, we evaluate the proposed method with different numbers of training images. We use 200, 500, 1000, and 1500 training images for the datasets, respectively. For each clear image in these datasets, we synthesize the blurred images using the kernels generated in Section 3.2.2. We use the same 40 test images as for the exemplar-based method to evaluate the sensitivity of the CNN-based method to dataset size. Fig. 17(b) shows that the proposed CNN-based method performs well with a small set of exemplars (e.g., 200). As the number of training images increases, the performance of the proposed method does not change significantly, especially when 1500 or all images are used. The results show that the proposed CNN-based method is robust to different dataset sizes.

Robustness to noise.

If the blurred image contains large noise, edge selection [4, 5] and other state-of-the-art methods (e.g., [2, 7, 9]) may not perform well for kernel estimation. However, the proposed methods perform well in such cases due to the robust matching criterion (see analysis in Section 3.2.1). We show some examples in Section 4.

Parameter analysis.

The proposed deblurring model involves three main parameters , , and . We evaluate the effects of these parameters on image deblurring using the dataset with 32 blurred images. For each parameter, we carry out experiments with different settings by varying one and fixing the others using the kernel similarity metric to measure accuracy of estimated kernels. Fig. 18 shows the proposed deblurring algorithm is insensitive to parameter settings.
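
The kernel similarity used in this analysis can be sketched as below, assuming the common definition as the maximum normalized cross-correlation between two kernels over all relative shifts (the form used, for example, in [34]). The function name is illustrative.

```python
import numpy as np

def kernel_similarity(k1, k2):
    """Maximum normalized cross-correlation of two kernels over all shifts."""
    h = k1.shape[0] + k2.shape[0] - 1
    w = k1.shape[1] + k2.shape[1] - 1
    # Zero-padding to (h, w) makes the FFT product a full linear cross-correlation.
    spectrum = np.fft.fft2(k1, (h, w)) * np.conj(np.fft.fft2(k2, (h, w)))
    corr = np.real(np.fft.ifft2(spectrum))
    return float(corr.max() / (np.linalg.norm(k1) * np.linalg.norm(k2)))
```

Taking the maximum over shifts makes the measure invariant to translation of the estimated kernel, which matters because blind deconvolution recovers the kernel only up to a shift.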

Fig. 18: Sensitivity analysis with respect to parameters , , and .
Filter size Kernel size
32.84 29.99 33.33 32.58 22.03 26.60 22.84
32.58 29.21 33.60 32.60 22.19 26.00 22.93
33.11 29.56 32.99 32.49 22.01 26.48 24.09
33.51 29.14 33.01 32.16 21.78 25.92 23.58
33.11 29.72 33.48 33.01 22.16 25.75 23.89
32.83 29.06 32.98 32.14 21.51 25.83 23.58
TABLE III: Effect of the filter size in the first layer on image deblurring.
Fig. 19: The proposed CNN-based method is not able to handle the image with large blur. (a) Blurred image. (b) Salient edges predicted by the CNN. (c) Exemplar-based salient edges. (d) Results by the CNN-based method. (e) Results by the exemplar-based method.

In the proposed network, we note that the filter size of the first layer plays an important role in predicting sharp edges from blurred images. We evaluate the effect of this parameter on the proposed test dataset and use the PSNR as the metric in Table III. Table III demonstrates that the proposed model is insensitive to changes in filter size within a certain range.
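
For reference, the PSNR values reported in Table III can be computed as in the following sketch. It assumes images scaled to the range [0, 1], so the peak value is 1; the function name and the `peak` parameter are illustrative.

```python
import numpy as np

def psnr(restored, reference, peak=1.0):
    """Peak signal-to-noise ratio in dB between a restored and a reference image."""
    diff = np.asarray(restored, dtype=float) - np.asarray(reference, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))
```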

Note that we use a large filter size in the first layer of the proposed network. It is interesting to analyze the case where the filter sizes of all the convolution layers are the same, e.g., the widely used setting, pixels. For fair comparisons, we use the same receptive field in the network and train it on the same training dataset. Table IV demonstrates that, on the frontal face dataset, the network with this setting generates results similar to those of the proposed method.
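
Matching receptive fields between two filter-size settings can be checked with a short calculation. The sketch assumes stride-1 convolutions without dilation or pooling, for which each layer with filter size k enlarges the receptive field by k - 1. The layer configurations in the usage note are hypothetical and are not the sizes used in the paper.

```python
def receptive_field(filter_sizes):
    """Receptive field of stacked stride-1, undilated convolution layers."""
    return 1 + sum(k - 1 for k in filter_sizes)
```

For instance, a hypothetical network with filter sizes 9, 5, and 3 has the same receptive field as seven layers of size 3, so `receptive_field([9, 5, 3])` and `receptive_field([3] * 7)` are equal.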

Filter size settings Proposed setting Filter size with
Avg. PSNR
TABLE IV: Effect of different filter size settings in the convolution layers of the proposed network. The results are generated by networks whose receptive fields are the same.

Limitations.

As mentioned in Section 4.1, the exemplar-based edge prediction method is time-consuming (see Table II) and does not deblur face images well when the main components cannot be extracted, e.g., profile faces. Furthermore, it is not able to deblur generic images where the salient structures cannot be extracted. Although the CNN-based method is more efficient and able to handle profile faces, it is not able to handle images with large blur. Fig. 19 shows an example where the main structures are severely blurred. It is difficult for the CNN-based method to predict salient edges from such blurred images (see Fig. 19(b)), while the exemplar-based method performs better in such cases with the help of exemplars (with the same pose).

5 Conclusions

We propose an exemplar-based deblurring algorithm for face images that exploits the structural information. The proposed method uses facial structures and reliable edges from exemplars for kernel estimation without resorting to complex edge predictions. Our method generates good initialization without using coarse-to-fine optimization strategies to enforce convergence, and performs well when the blurred images do not contain rich texture. In addition, we propose a CNN-based deblurring method which can effectively predict the sharp structure from a blurred input in real time. Extensive evaluations against state-of-the-art deblurring methods show that the proposed algorithms are effective for deblurring face images. We also show that the proposed methods can be applied to deblurring other object classes.

References

  • [1] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman, “Removing camera shake from a single photograph,” ACM Trans. Graph., vol. 25, no. 3, pp. 787–794, 2006.
  • [2] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Efficient marginal likelihood optimization in blind deconvolution,” in CVPR, 2011, pp. 2657–2664.
  • [3] Q. Shan, J. Jia, and A. Agarwala, “High-quality motion deblurring from a single image,” ACM Trans. Graph., vol. 27, no. 3, p. 73, 2008.
  • [4] S. Cho and S. Lee, “Fast motion deblurring,” ACM Trans. Graph., vol. 28, no. 5, p. 145, 2009.
  • [5] L. Xu and J. Jia, “Two-phase kernel estimation for robust motion deblurring,” in ECCV, 2010, pp. 157–170.
  • [6] N. Joshi, R. Szeliski, and D. J. Kriegman, “PSF estimation using sharp edge prediction,” in CVPR, 2008, pp. 1–8.
  • [7] D. Krishnan, T. Tay, and R. Fergus, “Blind deconvolution using a normalized sparsity measure,” in CVPR, 2011, pp. 233–240.
  • [8] A. Goldstein and R. Fattal, “Blur-kernel estimation from spectral irregularities,” in ECCV, 2012, pp. 622–635.
  • [9] L. Xu, S. Zheng, and J. Jia, “Unnatural L0 sparse representation for natural image deblurring,” in CVPR, 2013, pp. 1107–1114.
  • [10] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman, “Understanding and evaluating blind deconvolution algorithms,” in CVPR, 2009, pp. 1964–1971.
  • [11] J.-F. Cai, H. Ji, C. Liu, and Z. Shen, “Framelet based blind motion deblurring from a single image,” IEEE Trans. Image Process., vol. 21, no. 2, pp. 562–572, 2012.
  • [12] H. Cho, J. Wang, and S. Lee, “Text image deblurring using text-specific properties,” in ECCV, 2012, pp. 524–537.
  • [13] J. Pan, Z. Hu, Z. Su, and M.-H. Yang, “Deblurring text images via L0-regularized intensity and gradient prior,” in CVPR, 2014.
  • [14] X. Cao, W. Ren, W. Zuo, X. Guo, and H. Foroosh, “Scene text deblurring using text-specific multiscale dictionaries,” IEEE Trans. Image Process., vol. 24, no. 4, pp. 1302–1314, 2015.
  • [15] Z. Hu, S. Cho, J. Wang, and M.-H. Yang, “Deblurring low-light images with light streaks,” in CVPR, 2014.
  • [16] Y. Yitzhaky, I. Mor, A. Lantzman, and N. S. Kopeika, “Direct method for restoration of motion-blurred images,” J. Opt. Soc. Am. A, vol. 15, no. 6, pp. 1512–1519, 1998.
  • [17] L. Sun, S. Cho, J. Wang, and J. Hays, “Edge-based blur kernel estimation using patch priors,” in ICCP, 2013.
  • [18] T. Michaeli and M. Irani, “Blind deblurring using internal patch recurrence,” in ECCV, 2014, pp. 783–798.
  • [19] S. Anwar, C. P. Huynh, and F. Porikli, “Class-specific image deblurring,” in ICCV, 2015, pp. 495–503.
  • [20] J. Pan, D. Sun, H. Pfister, and M.-H. Yang, “Blind image deblurring using dark channel prior,” in CVPR, 2016, pp. 1628–1636.
  • [21] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” in CVPR, 2009, pp. 1956–1963.
  • [22] T. S. Cho, S. Paris, B. K. P. Horn, and W. T. Freeman, “Blur kernel estimation using the radon transform,” in CVPR, 2011, pp. 241–248.
  • [23] M. Nishiyama, A. Hadid, H. Takeshima, J. Shotton, T. Kozakaya, and O. Yamaguchi, “Facial deblur inference using subspace analysis for recognition of blurred faces,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 4, pp. 838–845, 2011.
  • [24] H. Zhang, J. Yang, Y. Zhang, and T. S. Huang, “Close the loop: joint blind image restoration and recognition with sparse representation prior,” in ICCV, 2011, pp. 770–777.
  • [25] Y. HaCohen, E. Shechtman, and D. Lischinski, “Deblurring by example using dense correspondence,” in ICCV, 2013.
  • [26] V. Jain and S. Seung, “Natural image denoising with convolutional networks,” in NIPS, 2009, pp. 769–776.
  • [27] C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in ECCV, 2014, pp. 184–199.
  • [28] Z. Wang, D. Liu, J. Yang, W. Han, and T. Huang, “Deep networks for image super-resolution with sparse prior,” in ICCV, 2015, pp. 370–378.
  • [29] J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in CVPR, 2015, pp. 769–777.
  • [30] L. Xu, J. S. Ren, C. Liu, and J. Jia, “Deep convolutional neural network for image deconvolution,” in NIPS, 2014, pp. 1790–1798.
  • [31] C. J. Schuler, M. Hirsch, S. Harmeling, and B. Schölkopf, “Learning to deblur,” IEEE Trans. Pattern Anal. Mach. Intell., 2015.
  • [32] L. Xu, J. Ren, Q. Yan, R. Liao, and J. Jia, “Deep edge-aware filters,” in ICML, 2015, pp. 1669–1678.
  • [33] J. S. Ren, L. Xu, Q. Yan, and W. Sun, “Shepard convolutional neural networks,” in NIPS, 2015, pp. 901–909.
  • [34] Z. Hu and M.-H. Yang, “Good regions to deblur,” in ECCV, 2012, pp. 59–72.
  • [35] K. He, J. Sun, and X. Tang, “Guided image filtering,” in ECCV, 2010, pp. 1–14.
  • [36] N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. Syst., Man, and Cybern., vol. 9, no. 1, pp. 62–66, 1979.
  • [37] R. Gross, I. Matthews, J. F. Cohn, T. Kanade, and S. Baker, “Multi-pie,” in FG, 2008, pp. 1–8.
  • [38] X. Zhu and D. Ramanan, “Face detection, pose estimation, and landmark localization in the wild,” in CVPR, 2012, pp. 2879–2886.
  • [39] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in ICML, 2010, pp. 807–814.
  • [40] L. Xu, C. Lu, Y. Xu, and J. Jia, “Image smoothing via L0 gradient minimization,” ACM Trans. Graph., vol. 30, no. 6, p. 174, 2011.
  • [41] U. Schmidt, C. Rother, S. Nowozin, J. Jancsary, and S. Roth, “Discriminative non-blind deblurring,” in CVPR, 2013, pp. 604–611.
  • [42] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph., vol. 26, no. 3, p. 70, 2007.
  • [43] D. Zoran and Y. Weiss, “From learning models of natural image patches to whole image restoration,” in ICCV, 2011, pp. 479–486.
  • [44] L. Zhong, S. Cho, D. Metaxas, S. Paris, and J. Wang, “Handling noise in single image deblurring using directional filters,” in CVPR, 2013, pp. 612–619.