Deblurring techniques have made considerable progress in recent years. Early deblurring methods focused only on the blur caused by camera shake in constant-depth images [8, 12, 34, 39]. More recent methods handle blurred images with depth variations [15, 23, 29, 37], and some even handle object motion blur [17, 19, 28]. Object motion deblurring is very challenging because it requires estimating independent, spatially-varying blur kernels.
What makes object motion deblurring even more difficult is the occlusion generated by intra-frame motion. While occlusions generated by inter-frame motion cause photo-inconsistency, occlusions generated by intra-frame motion cause a mixture of foreground and background pixels at occlusion boundaries in the blurred image. These ambiguous pixels can lead to severe ringing artifacts in the deblurring results.
To address this problem, several methods explicitly modeled the occlusions using layered blur models [1, 4, 11]. In particular, Wulff and Black [35] proposed a layered blur model for the case where both layers are blurred, and obtained convincing results. They modeled a blurred image as a composition of individually blurred foreground and background layers, and this generative model could express the layer interaction caused by occlusions. However, their layered blur model does not reflect the actual blur generation process. A blurred image is the integration of the intermediate images that the camera sees while the shutter is open, which differs from their layered blur model. Figure 2 shows a counterexample. The foreground and background are moving at similar speeds in the image, and the blurred image is observed as an integration of the intermediate images. Since the tiger's left eye (red color) in the background is occluded by the foreground fence during the entire time window between the opening and closing of the camera shutter, it should not be exposed in the blurry image, yet it is visible in their generative model. This error can cause severe ringing artifacts in deblurring results.
In this paper, we propose a new layered blur model that reflects the actual blur generation process, together with a corresponding occlusion-aware video deblurring method. We enhance the model by changing the order of layer composition and blurring so that it follows the actual blur generation process. Using a carefully designed likelihood derived from our layered blur model, a clear foreground and background can be recovered from blurred images with occluding objects. Specifically, given a set of motion-blurred video frames, our method estimates the clear foreground, the clear background, the clear alpha blending mask, and the motion of each layer. Figure 1 shows that our result is better than that of [35], especially at occlusion boundaries. We also analyze the layered blur model theoretically and experimentally. We show that the model of [35] is a good approximation that is identical to our model in some specific physical situations, and that the blur kernels at boundaries have distinct characteristics that cannot be captured by conventional blur models.
2 Related Work
Early works on deblurring focused on the blur caused by camera shake in constant-depth images. They are roughly categorized into spatially-invariant and spatially-varying configurations. Spatially-invariant deblurring achieved some success in single-image deblurring [8, 12, 31, 36] and video deblurring . However, the spatially-invariant blur model cannot deal with rotational camera motion, which is a significant and common component in practical scenarios . To overcome this limitation, some researchers parameterized the blur as a set of possible camera motions in 3D space, and this approach has been applied to single-image [13, 14, 34, 39] and video deblurring [7, 27]. Although these methods handle spatially-varying motion blur to some extent, they are limited to camera shake at a constant depth and cannot handle more general depth variations or object motion.
In the case of blurred images including depth variations, the blur cannot be represented by a simple homography. Some methods solved this problem by casting a blur kernel estimation problem as a scene depth estimation problem [15, 23, 29, 37]. These methods extended the applicability of deblurring methods. However, they are limited to static scenes, and do not take the mixture of pixels at occlusion boundaries into account.
Recently, several object motion deblurring methods have been developed. Some of these methods divided the image into segments and restored each of them independently. They divided the image using hybrid cameras [2, 32], based on similar motions [9, 17, 24], or under the guidance of matting . There are also methods that do not rely on segmentation. Cho et al. [10] used patch-based synthesis for deblurring by detecting and interpolating proper sharp patches at nearby frames. Kim and Lee approximated the blur kernel as pixel-wise 2D linear motion and deblurred dynamic scenes in a single image [18] and in a video [19]. These object motion deblurring methods perform well, but do not consider the interaction between the object and the background at occlusion boundaries.
At occlusion boundaries, blurred pixels consist of a mixture of foreground and background pixels, and this mixture plays an important role in object motion deblurring. To address this problem, some authors used layered models [1, 4, 11]. However, these methods assumed the background to be static and modeled the foreground motion only. Sellent et al. [30] used outlier rejection to handle the occlusions. Takeda and Milanfar [33] proposed a method that can deal with occlusions using a spatiotemporal approach, but it requires the blur kernels to be given in advance and depends on time interpolators.
To deal with the general case where both the foreground and the background are moving independently, Wulff and Black [35] proposed a layered blur model that consists of a composition of individually blurred foreground and background layers. This model included the interaction between layers and improved the performance at occlusion boundaries. However, as shown in Figure 2, this generative model differs from the actual blur generation process and is not always valid.
3 Analysis of Layered Blur Model
In this section, we briefly review the previous layered blur model [35], propose our new layered blur model, and compare them. Later sections show that the difference between these models largely accounts for the removal of serious artifacts at occlusion boundaries.
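First, we define a layered model for a clear image $\mathbf{l} \in \mathbb{R}^{N}$ ($N$ is the number of pixels) as:
$$\mathbf{l} = (\mathbf{1} - \boldsymbol{\alpha}) \circ \mathbf{f} + \boldsymbol{\alpha} \circ \mathbf{b}, \tag{1}$$
where $\mathbf{f}$ and $\mathbf{b}$ are the clear foreground and background layers, respectively, $\boldsymbol{\alpha}$ is an alpha blending mask, $\mathbf{1}$ is a vector with all components equal to 1, and $\circ$ denotes element-wise multiplication. Notice that $\boldsymbol{\alpha}$ multiplies the background image ($\alpha = 1$ means a background pixel) for notational simplicity later on, inspired by [35].
Our goal is to express blurred images using each layer of the reference image (i.e. $\mathbf{f}$, $\mathbf{b}$, and $\boldsymbol{\alpha}$). Before expressing blurred images, we express a warped image using these layers. We assume that the appearance and shape of each layer are constant, which is a common assumption in the deblurring literature [1, 11, 35]. If we let $\theta^{k}_{t}$ denote the motion parameter for layer $k \in \{f, b\}$ from the reference frame to frame $t$, then the warped image at frame $t$ is as follows:
$$\mathbf{l}_t = \left(\mathbf{1} - \mathbf{W}(\theta^{f}_{t})\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}(\theta^{f}_{t})\mathbf{f}\right) + \left(\mathbf{W}(\theta^{f}_{t})\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}(\theta^{b}_{t})\mathbf{b}\right), \tag{2}$$
where $\mathbf{W}(\theta)$ is the warping matrix according to the motion parameter $\theta$. The alpha blending mask is warped by the foreground motion since its appearance depends on the foreground object. Similarly to [35], for notational simplification, we redefine the clear foreground layer as $\mathbf{f} \leftarrow (\mathbf{1} - \boldsymbol{\alpha}) \circ \mathbf{f}$, and abbreviate the notation $\mathbf{W}(\theta^{k}_{t})$ to $\mathbf{W}^{k}_{t}$. Then, the simplified equation is:
$$\mathbf{l}_t = \mathbf{W}^{f}_{t}\mathbf{f} + \left(\mathbf{W}^{f}_{t}\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}^{b}_{t}\mathbf{b}\right). \tag{3}$$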
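As a rough illustration of the warped layered image described above (a warped foreground plus a warped alpha mask multiplied element-wise with a warped background), the following NumPy sketch composes a one-dimensional frame. The cyclic integer shift used as the warp, the helper names `warp` and `composite`, and the toy signal values are assumptions made for this example; the method itself uses affine warping matrices.

```python
import numpy as np

def warp(x, shift):
    """Toy warp: cyclic integer translation (a stand-in for an affine warping matrix)."""
    return np.roll(x, shift)

def composite(f_pre, alpha, b, shift_f, shift_b):
    """Warped layered image: warped foreground + (warped alpha) * (warped background).

    f_pre is the foreground already multiplied by (1 - alpha), i.e. the
    simplified foreground layer. The mask moves with the foreground.
    """
    return warp(f_pre, shift_f) + warp(alpha, shift_f) * warp(b, shift_b)

n = 8
alpha = np.ones(n)             # alpha = 1 marks background pixels
alpha[2:4] = 0.0               # pixels 2 and 3 belong to the foreground
f = np.full(n, 0.9)            # flat, bright foreground
b = np.linspace(0.1, 0.8, n)   # background ramp
f_pre = (1.0 - alpha) * f      # simplified (pre-multiplied) foreground

# Foreground moves one pixel to the right, background one pixel to the left.
frame = composite(f_pre, alpha, b, shift_f=1, shift_b=-1)
print(frame)
```

In the printed frame, the foreground occupies pixels 3 and 4 after its shift, and every other pixel shows the shifted background.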
3.1 Previous Layered Blur Model
Wulff and Black [35] proposed a layered blur model to represent the mixture of the foreground and the background at occlusion boundaries. Their generative model states that a blurred image is the composition of individually blurred foreground and background layers. To express this model, let $\mathbf{B}^{k}_{t}$ denote the blur matrix for layer $k \in \{f, b\}$ at frame $t$, which equals the average of the warping matrices while the shutter is open:
$$\mathbf{B}^{k}_{t} = \frac{1}{\tau}\int_{0}^{\tau} \mathbf{W}(\theta^{k}_{t,s})\, ds, \tag{4}$$
where $\tau$ is the exposure time, and $\theta^{k}_{t,s}$ is the motion parameter from the reference frame at intra-frame capture time $s$ (the elapsed time after the shutter of frame $t$ is opened). Then, the blurred frame $\mathbf{y}_t$ based on [35] is as follows:
$$\mathbf{y}_t = \mathbf{B}^{f}_{t}\mathbf{f} + \left(\mathbf{B}^{f}_{t}\boldsymbol{\alpha}\right) \circ \left(\mathbf{B}^{b}_{t}\mathbf{b}\right). \tag{5}$$
In this model, the blurred frame is the composition of individually blurred layers ($\mathbf{B}^{f}_{t}\mathbf{f}$ and $\mathbf{B}^{b}_{t}\mathbf{b}$). This equation is equivalent to the layered representation of motion-blurred video of [35], although the notation is different.
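The statement that a blur matrix equals the average of the warping matrices seen during exposure can be checked numerically. The sketch below is a minimal one-dimensional illustration under assumptions of our own: warps are cyclic integer translations, the exposure is approximated by a handful of discrete shifts, and the helper names `shift_matrix`, `blur_matrix`, and `blur_then_composite` are hypothetical.

```python
import numpy as np

def shift_matrix(n, s):
    """Warping matrix of a cyclic translation by s pixels: (W x)[i] = x[i - s]."""
    return np.roll(np.eye(n), s, axis=0)

def blur_matrix(n, shifts):
    """Average of the warping matrices visited while the shutter is open."""
    return sum(shift_matrix(n, s) for s in shifts) / len(shifts)

def blur_then_composite(f_pre, alpha, b, shifts_f, shifts_b):
    """Previous layered model: blur each layer on its own, then composite."""
    n = len(alpha)
    B_f = blur_matrix(n, shifts_f)
    B_b = blur_matrix(n, shifts_b)
    return B_f @ f_pre + (B_f @ alpha) * (B_b @ b)

n = 6
B = blur_matrix(n, shifts=[0, 1, 2])
impulse = np.zeros(n)
impulse[1] = 1.0
kernel_response = B @ impulse   # the impulse is spread evenly over pixels 1, 2, 3
print(kernel_response)
```

Averaging three translations turns an impulse into a box of height 1/3, which is the familiar linear motion blur kernel; each row of `B` sums to one, so the overall brightness is preserved.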
3.2 Proposed Layered Blur Model
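Here we propose a new layered blur model. As shown in Figure 2, the previous layered blur model does not reflect the actual blur generation process, in which a blurred image is generated by integrating the intermediate images that the camera sees during exposure. Applying this concept to the layered blur model, a blurred frame is newly represented as follows:
$$\mathbf{y}_t = \frac{1}{\tau}\int_{0}^{\tau} \left[ \mathbf{W}^{f}_{t,s}\mathbf{f} + \left(\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}^{b}_{t,s}\mathbf{b}\right) \right] ds, \tag{6}$$
where $\mathbf{W}^{k}_{t,s}$ is an abbreviated notation of $\mathbf{W}(\theta^{k}_{t,s})$.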
In this model, the blurred frame is the integration of intermediate images, each of which is a composition of intermediate layers. This proposed model coincides with the actual blur generation process.
3.3 Comparison of Layered Blur Models
The main difference between the conventional layered blur model (Eq. (5)) and the proposed layered blur model (Eq. (6)) is the order of a composition and a blurring of layers, as illustrated in Figure 3. While the conventional model composites two layers after blurring each layer (i.e. integrating each intermediate layer), our model composites two layers first and then integrates the composited intermediate images.
Note that although Eq. (5) does not correctly model the actual physical two-layer blurring process, it becomes identical to Eq. (6) in some special cases. We analyze and compare the two models, and show the conditions under which they become equivalent, both analytically and empirically.
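Due to the fact that
$$\frac{1}{\tau}\int_{0}^{\tau} \left(\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}^{b}_{t,s}\mathbf{b}\right) ds \;\neq\; \left(\frac{1}{\tau}\int_{0}^{\tau} \mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}\, ds\right) \circ \left(\frac{1}{\tau}\int_{0}^{\tau} \mathbf{W}^{b}_{t,s}\mathbf{b}\, ds\right) \tag{7}$$
in general, the two models produce different blurred frames:
$$\mathbf{B}^{f}_{t}\mathbf{f} + \left(\mathbf{B}^{f}_{t}\boldsymbol{\alpha}\right) \circ \left(\mathbf{B}^{b}_{t}\mathbf{b}\right) \;\neq\; \frac{1}{\tau}\int_{0}^{\tau} \left[ \mathbf{W}^{f}_{t,s}\mathbf{f} + \left(\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}\right) \circ \left(\mathbf{W}^{b}_{t,s}\mathbf{b}\right) \right] ds. \tag{8}$$
Note, however, that the left and right sides of Eq. (7) become identical when $\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}$ or $\mathbf{W}^{b}_{t,s}\mathbf{b}$ is constant with respect to $s$. This leads to the following three cases in which the two models in Eq. (8) become equivalent.
Background is static ($\mathbf{W}^{b}_{t,s}\mathbf{b}$ is constant w.r.t. $s$)
Foreground is static ($\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}$ is constant w.r.t. $s$)
Background is homogeneous ($\mathbf{W}^{b}_{t,s}\mathbf{b}$ is constant w.r.t. $s$)
Although a homogeneous alpha map ($\mathbf{W}^{f}_{t,s}\boldsymbol{\alpha}$ is constant w.r.t. $s$) can also make the two models identical, it is impossible at occlusion boundaries.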
Figure 4 is the experimental comparison that shows the blurred images corresponding to these three situations. The two models give the same blurred images. Thus, the previous layered blur model is a good approximation of ours, but it may lead to artifacts at occlusion boundaries in general.
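The equivalence conditions can also be checked on a toy example. The sketch below compares the blur-then-composite model of Eq. (5) with the composite-then-blur model of Eq. (6) in one dimension. It is only an illustration under our own assumptions: warps are cyclic integer shifts, the exposure integral is replaced by four discrete samples, and all function names and signal values are hypothetical.

```python
import numpy as np

def warp(x, s):
    """Toy warp: cyclic integer translation by s pixels."""
    return np.roll(x, s)

def blur_then_composite(f_pre, alpha, b, shifts_f, shifts_b):
    """Previous model: average each warped layer, then composite."""
    blurred_f = np.mean([warp(f_pre, s) for s in shifts_f], axis=0)
    blurred_alpha = np.mean([warp(alpha, s) for s in shifts_f], axis=0)
    blurred_b = np.mean([warp(b, s) for s in shifts_b], axis=0)
    return blurred_f + blurred_alpha * blurred_b

def composite_then_blur(f_pre, alpha, b, shifts_f, shifts_b):
    """Proposed model: composite each intermediate image, then average."""
    frames = [warp(f_pre, sf) + warp(alpha, sf) * warp(b, sb)
              for sf, sb in zip(shifts_f, shifts_b)]
    return np.mean(frames, axis=0)

n = 16
alpha = np.ones(n)
alpha[6:10] = 0.0                     # foreground occupies pixels 6..9
f_pre = (1.0 - alpha) * 0.9           # pre-multiplied foreground
b = np.linspace(0.0, 1.0, n)          # textured (ramp) background
fg_motion = [0, 1, 2, 3]
bg_motion = [0, -1, -2, -3]
static = [0, 0, 0, 0]

def gap(b_layer, shifts_f, shifts_b):
    """Largest per-pixel difference between the two models."""
    prev = blur_then_composite(f_pre, alpha, b_layer, shifts_f, shifts_b)
    ours = composite_then_blur(f_pre, alpha, b_layer, shifts_f, shifts_b)
    return np.abs(prev - ours).max()

gap_both_moving = gap(b, fg_motion, bg_motion)
gap_static_bg = gap(b, fg_motion, static)
gap_static_fg = gap(b, static, bg_motion)
gap_flat_bg = gap(np.full(n, 0.5), fg_motion, bg_motion)
print(gap_both_moving, gap_static_bg, gap_static_fg, gap_flat_bg)
```

Only the first gap is noticeably different from zero: when both layers move over a textured background the two models disagree near the occlusion boundary, while a static background, a static foreground, or a homogeneous background makes them coincide up to rounding.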
4 Occlusion-Aware Video Deblurring
In this section, we propose an occlusion-aware deblurring method based on the proposed layered blur model.
Given a set of blurred frames including an occluding layer, we restore clear foreground, background, alpha blending mask, and object motions.
First we discretize the proposed model for deblurring. We divide the exposure time into samples uniformly; denote the sampled times such that . Then, our data term is defined as follows:
where $\nabla$ denotes a gradient operator, which is widely used in the deblurring field to reduce ringing artifacts [8, 19, 39]. We use affine transformations for the motion parameters, and the intra-frame motion parameter is obtained by linearly interpolating the motions of adjacent frames.
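The discretization of the exposure time and the linear interpolation of affine motions can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes that the adjacent frames are `t` and `t + 1`, that the shutter opens at the frame time, that the samples include both endpoints of the exposure, and that affine motions are interpolated entry by entry. The function name `sample_intra_frame_motions` and the example matrices are hypothetical.

```python
import numpy as np

def sample_intra_frame_motions(theta_t, theta_next, duty_cycle, num_samples):
    """Affine motions at uniformly sampled times while the shutter is open.

    theta_t, theta_next: 2x3 affine matrices of two adjacent frames.
    duty_cycle: exposure time divided by frame interval (between 0 and 1).
    """
    # Fractions of the frame interval at which the scene is sampled.
    fractions = np.linspace(0.0, duty_cycle, num_samples)
    return [(1.0 - u) * theta_t + u * theta_next for u in fractions]

theta_t = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])      # identity at frame t
theta_next = np.array([[1.0, 0.0, 4.0],
                       [0.0, 1.0, 2.0]])   # translation by (4, 2) at frame t + 1

motions = sample_intra_frame_motions(theta_t, theta_next,
                                     duty_cycle=0.5, num_samples=5)
translations = np.array([m[:, 2] for m in motions])
print(translations)
```

With a duty cycle of 0.5, the last sample sits halfway between the two frames, so the translation accumulated during exposure is half of the inter-frame translation.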
Since deblurring is a highly ill-posed problem, we add regularization terms to reduce the ambiguity. We enforce hyper-Laplacian priors [20, 21, 25] on the gradients of the foreground and the background images as follows:
Also, since the alpha map is smoother than natural images, we use a Laplacian prior on the gradient of the alpha map:
Additionally, we encourage the alpha map to be close to binary values by using the following constraint:
The final objective function is as follows:
where , , and are parameters that adjust the weight of each term, and denotes a set of the motion parameters for each layer at each frame (i.e. ). We restrict the intensities of , , and to be in the range [0, 1].
Since the problem is non-convex and easy to get stuck at a local minimum, it is important to start with good initial values. We use optical flow  to initialize the motion parameters as done in . Although the optical flow does not handle blurred images correctly, it provides good initial values for the variables. Based on the initial optical flow, the two dominant affine transformations are estimated for each layer using RANSAC. Also, we initialize each layer as an average of aligned input frames using initial motion parameters for each layer . For the alpha blending mask, from the RANSAC result, we first specify the pixels that correspond to the background of each frame to the intermediate masks. Then, we initialize alpha blending mask as an average of the aligned intermediate masks using the motion parameters for the background.
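The idea of extracting two dominant affine motions from an initial flow field with RANSAC can be sketched as below. This is an illustrative approximation under stated assumptions, not the authors' code: correspondences are synthetic and noise-free, the second motion is found by rerunning RANSAC on the outliers of the first, and the helper names `fit_affine` and `ransac_affine`, the iteration count, and the inlier tolerance are all hypothetical choices.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine matrix A such that dst ~ A @ [x, y, 1]."""
    X = np.hstack([src, np.ones((len(src), 1))])
    solution, *_ = np.linalg.lstsq(X, dst, rcond=None)
    return solution.T

def ransac_affine(src, dst, iters=200, tol=0.1, seed=0):
    """Return the affine model with the most inliers and its inlier mask."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(X @ A.T - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis.
    return fit_affine(src[best_inliers], dst[best_inliers]), best_inliers

# Synthetic flow: 60 background points and 40 foreground points.
rng = np.random.default_rng(1)
src = rng.uniform(0.0, 50.0, size=(100, 2))
dst = src.copy()
dst[:60] += [1.0, 0.0]     # background translation
dst[60:] += [-3.0, 2.0]    # foreground translation

A_first, inliers_first = ransac_affine(src, dst)
rest = ~inliers_first
A_second, _ = ransac_affine(src[rest], dst[rest])
print(np.round(A_first, 3))
print(np.round(A_second, 3))
```

The inlier mask of each model also gives a rough per-pixel layer assignment, which is the kind of information the initialization of the alpha blending mask relies on.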
To optimize the non-convex objective function in Eq. (13), we divide the original problem into three sub-problems and use alternating optimization techniques [8, 17, 35] that iteratively estimate each unknown while other unknowns are fixed. In this section, we reorganize each sub-problem as a traditional deconvolution formula and describe how to solve it. Notice that we estimate non-simplified (i.e. ) although we used the simplified formula (i.e. ) in the previous section for notational simplicity.
Latent Image Estimation
In this step, we restore clear foreground and background layers while alpha blending mask and motion parameters are fixed. To make this sub-problem be in a simpler formula, we concatenate some variables. Let denotes the row concatenation of and (i.e. ), and denotes the column concatenation of and (i.e. ) such that
where denotes a diagonal matrix formed from its vector argument. Since can be represented as , is equivalent to our layered blur model.
Then, for multi-frames, let denotes the row concatenation of , and denotes the row concatenation of ( is the number of observed frames). The sub-problem for latent image estimation can be expressed as follows:
This optimization problem is the same as the traditional deconvolution problem with a hyper-Laplacian prior [20]. We optimize Eq. (15) using a conjugate gradient method and a lookup table in the same way as [20].
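To make the concatenation step concrete, the sketch below builds the linear operator that maps the stacked foreground and background layers to one blurred frame, using the fact that an element-wise product with a warped mask is a multiplication by a diagonal matrix. It is a one-dimensional toy under our own assumptions: warps are cyclic integer shifts, the exposure is sampled at three instants, and plain least squares is used as a stand-in for the hyper-Laplacian-regularized solver described in the text. The name `frame_operator` and all values are hypothetical.

```python
import numpy as np

def shift_matrix(n, s):
    """Warping matrix of a cyclic translation by s pixels."""
    return np.roll(np.eye(n), s, axis=0)

def frame_operator(alpha, shifts_f, shifts_b):
    """Matrix [A_f, A_b] with blurred_frame = [A_f, A_b] @ [f; b] (non-simplified f)."""
    n = len(alpha)
    A_f = np.zeros((n, n))
    A_b = np.zeros((n, n))
    for s_f, s_b in zip(shifts_f, shifts_b):
        W_f = shift_matrix(n, s_f)
        W_b = shift_matrix(n, s_b)
        warped_alpha = W_f @ alpha
        A_f += np.diag(1.0 - warped_alpha) @ W_f   # foreground where the mask is 0
        A_b += np.diag(warped_alpha) @ W_b         # background where the mask is 1
    return np.hstack([A_f, A_b]) / len(shifts_f)

n = 12
alpha = np.ones(n)
alpha[4:7] = 0.0
f = np.full(n, 0.9)
b = np.linspace(0.1, 0.8, n)
x_true = np.concatenate([f, b])

# Three frames, each with its own intra-frame motion samples (fg, bg).
frame_motions = [([0, 1, 2], [0, -1, -2]),
                 ([3, 4, 5], [-3, -4, -5]),
                 ([6, 7, 8], [-6, -7, -8])]
A = np.vstack([frame_operator(alpha, sf, sb) for sf, sb in frame_motions])
y = A @ x_true                                   # stacked blurred observations

# Stand-in solver: minimum-norm least squares instead of the regularized problem.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.abs(A @ x_hat - y).max())
```

Foreground entries that lie outside the mask never reach any observation, so the system is rank deficient; this is one reason why priors on the layers are needed in the actual objective.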
Alpha Blending Mask Estimation
In this step, we restore clear alpha blending mask while layer appearances and motion parameters are fixed. Similarly to the latent image estimation step, we define and as follows:
Also, let denotes the row concatenation of , and denotes the row concatenation of . Then, the sub-problem for alpha map estimation can be expressed as follows:
where denotes the iteration number, denotes the dual variable, and and are the parameters for each update step. We apply the conjugate gradient method to optimize .
Motion Parameter Estimation
4.4 Implementation Details
To accelerate the algorithm, we optimize the objective function based on coarse-to-fine approach where the scale factor of image pyramid is 0.8. Also, the parameters used in the optimization is fixed for all experiments as and , where denotes the number of frames. was adjusted to a value between and depending on the shape of the occluding object. Camera duty cycle, which is the ratio of an exposure time to a frame interval, is given from the camera setting (0.5). The overview of our algorithm is summarized in Algorithm 1.
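A coarse-to-fine schedule with a scale factor of 0.8 can be laid out as in the following sketch. The stopping rule (`min_side`), the rounding of image sizes, and the function name `pyramid_sizes` are assumptions for illustration; the text only specifies the scale factor.

```python
def pyramid_sizes(width, height, scale=0.8, min_side=32):
    """Image sizes from coarsest to finest for a coarse-to-fine optimization."""
    sizes = [(width, height)]
    while True:
        w = int(round(sizes[-1][0] * scale))
        h = int(round(sizes[-1][1] * scale))
        if min(w, h) < min_side:      # assumed stopping rule
            break
        sizes.append((w, h))
    return sizes[::-1]                # coarsest level first

levels = pyramid_sizes(640, 480)
for w, h in levels:
    print(w, h)
```

The optimization would start at the first (coarsest) size and reuse each level's estimate, after upsampling, as the initial value for the next level.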
5 Experimental Results
We compare our deblurring results with those of the state-of-the-art video deblurring method [19] and the previous layered deblurring method [35]. Since the source code of the previous method [35] is not available, we focus on comparing our results with the published results of the previous methods. Please see our supplementary material for more results.
The image sequences consist of 5 to 10 images. We processed the sequences on a desktop computer with an Intel i7-6700K CPU and 64 GB of memory. Processing took 10 minutes per 640×480 image with a non-optimized MATLAB implementation.
Figure 1 and Figure 5 show the deblurring results for images with severe occlusions. Since the detailed structure of the bicycle is mixed with the background by the occlusion, the generalized deblurring method [19], which does not consider occlusion, mainly attempts to restore the background. Also, since both the foreground and background are blurred, the previous layered deblurring method [35] does not properly restore the details of the face and the bicycle. In contrast, the proposed method restores the detailed structure of the human face and the bicycle handle by using the carefully designed model.
Figure 6 compares our deblurring results with those of recent methods. The "sign" and "car" sequences correspond to situations where both the foreground and the background are moving. In the results of the previous methods [19, 35], we can see artifacts caused by segmentation errors at boundaries. Our result shows better performance at occlusion boundaries. The "hand" sequence belongs to a situation where the previous model and our model are the same because the background is static. However, even in this situation, our method produces better results because it uses effective regularization and optimization.
Additionally, our method can also achieve the effect of layer separation [40]. It can remove occluding objects even when the images are blurry. Figure 7 shows not only that the fence occluding the tiger is removed, but also that the image is clearly restored. This image sequence was created by a physics-based renderer [16].
6 Discussion
In this section, we analyze the blur kernel at object boundaries, which shows distinctive characteristics that cannot be captured by conventional blur models. In addition, we discuss the limitations of the proposed method and future work.
6.1 Blur Kernel at Occlusion Boundaries
Visualizing the proposed blur model as a traditional blur kernel gives an interesting result at layer boundaries. Figure 8 illustrates the blur kernel of each model at boundaries where the foreground is moving to the left and the background is moving to the right.
Early works that handle abruptly-varying blur find the kernel either of the foreground or of the background [4, 9, 11, 17, 23, 29] such as Figure 8(a), or find an ambiguous kernel between them [18, 19].
In the model of Wulff and Black [35], the pixel at boundaries is blurred by both foreground and background kernels, while each kernel is truncated or diminished in intensity compared to the kernel without occlusion. The foreground kernel is shortened in length to a factor of $1 - \bar{\alpha}$ and the background kernel is reduced in intensity to a proportion of $\bar{\alpha}$ as shown in Figure 8(b), where $\bar{\alpha}$ is the blurred mask value of the corresponding pixel.
In the proposed model, the foreground kernel experiences truncation equal to that of [35], but the background kernel is truncated instead of being weakened to a factor of $\bar{\alpha}$, as shown in Figure 8(c). In representing the occlusion of a background by a foreground object, the proposed model correctly expresses the occlusion event through the length of the blur kernel.
Thus, the blur kernels of the foreground and the background can be separated from or overlap each other depending on the relative velocity of the layers. These distinctive kernel characteristics cannot be captured by conventional blur models.
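The difference between an attenuated and a truncated background kernel can be reproduced on a small one-dimensional example. The sketch below computes, for one boundary pixel, the weights with which background pixels contribute to the blurred value under the blur-then-composite model and under the composite-then-blur model. Cyclic integer shifts, four exposure samples, and the function name `background_kernels` are assumptions made for this illustration.

```python
import numpy as np

def background_kernels(alpha, pixel, shifts_f, shifts_b):
    """Background blur kernel seen at `pixel` under both layered models.

    alpha is 1 on background pixels; the mask moves with the foreground.
    """
    n = len(alpha)
    num = len(shifts_f)
    # Visibility of the background at `pixel` for every exposure sample.
    visible = np.array([alpha[(pixel - s_f) % n] for s_f in shifts_f])
    blurred_alpha = visible.mean()            # blurred mask value at `pixel`
    k_blur_then_comp = np.zeros(n)
    k_comp_then_blur = np.zeros(n)
    for vis, s_b in zip(visible, shifts_b):
        source = (pixel - s_b) % n            # background pixel seen at this instant
        k_blur_then_comp[source] += blurred_alpha / num   # every tap, weakened
        k_comp_then_blur[source] += vis / num             # only the visible taps
    return k_blur_then_comp, k_comp_then_blur

n = 16
alpha = np.ones(n)
alpha[6:10] = 0.0                             # foreground occupies pixels 6..9
# Foreground moves to the left, background moves to the right.
k_prev, k_new = background_kernels(alpha, pixel=5,
                                   shifts_f=[0, -1, -2, -3],
                                   shifts_b=[0, 1, 2, 3])
print(np.nonzero(k_prev)[0], k_prev[np.nonzero(k_prev)])
print(np.nonzero(k_new)[0], k_new[np.nonzero(k_new)])
```

Both kernels carry the same total weight (the blurred mask value), but the first spreads it over all four taps at reduced intensity, whereas the second keeps full intensity on the single tap during which the background was actually visible.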
6.2 Limitation and Future work
In this study, several assumptions are made for our deblurring method. We assumed that the camera duty cycle is given for every frame, and the object motion is smooth in a frame. Since we fix the camera setting, and the exposure time is short enough in videos, these assumptions can be justified to some extent. For the videos without exposure information, the estimation of duty cycle  should be combined to our method.
Also, we parameterized the object motion as an affine motion, which limits the ability to deal with general object motions. Although a projective motion can be applied to our model, further consideration is required for additional problems such as occlusions within a layer or brightness constancy. Combining our method with a non-parametric motion deblurring method [18, 19] is one of our future directions. In addition, our method currently assumes a scene with two layers. Extending it to multiple layers requires additional consideration of the occlusions among the layers and of their depth order. Solving this problem is another future direction.
7 Conclusion
In this paper, we proposed an occlusion-aware video deblurring method based on a new layered blur model, which enables accurate restoration of object boundaries. We addressed the limitation of the conventional layered blur model theoretically and experimentally, and enhanced the model by changing the order of layer composition and blurring so that it follows the actual blur generation process. Based on this model, the proposed occlusion-aware deblurring method obtains a more accurate latent image, object motion, and segmentation mask. We also showed that our model exactly extracts the contribution of occlusion from the original kernel, which captures the property that the foreground and background kernels overlap or separate at boundaries. Experimental results on synthetic and real blurred videos demonstrate the outstanding performance of the proposed method.
References
[1] L. Bar, B. Berkels, M. Rumpf, and G. Sapiro. A variational framework for simultaneous motion estimation and restoration of motion-blurred video. In IEEE International Conference on Computer Vision (ICCV), 2007.
[2] M. Ben-Ezra and S. K. Nayar. Motion deblurring using hybrid imaging. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003.
[3] J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. Journal of Computational Physics, 228(14):5057–5071, 2009.
[4] A. Chakrabarti, T. Zickler, and W. T. Freeman. Analyzing spatially-varying blur. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[5] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
[6] J. Chen, L. Yuan, C.-K. Tang, and L. Quan. Robust dual motion deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
[7] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based non-uniform motion deblurring. Computer Graphics Forum, 31(7):2183–2192, 2012.
[8] S. Cho and S. Lee. Fast motion deblurring. ACM Transactions on Graphics (TOG), 28(5):145, 2009.
[9] S. Cho, Y. Matsushita, and S. Lee. Removing non-uniform motion blur from images. In IEEE International Conference on Computer Vision (ICCV), 2007.
[10] S. Cho, J. Wang, and S. Lee. Video deblurring for hand-held cameras using patch-based synthesis. ACM Transactions on Graphics (TOG), 31(4):64, 2012.
[11] S. Dai and Y. Wu. Removing partial blur in a single image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[12] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. ACM Transactions on Graphics (TOG), 25(3):787–794, 2006.
[13] A. Gupta, N. Joshi, C. L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In European Conference on Computer Vision (ECCV), 2010.
[14] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf. Fast removal of non-uniform camera shake. In IEEE International Conference on Computer Vision (ICCV), 2011.
[15] Z. Hu, L. Xu, and M.-H. Yang. Joint depth estimation and camera shake removal from single blurry image. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[16] W. Jakob. Mitsuba renderer, 2010. http://www.mitsuba-renderer.org.
[17] T. H. Kim, B. Ahn, and K. M. Lee. Dynamic scene deblurring. In IEEE International Conference on Computer Vision (ICCV), 2013.
[18] T. H. Kim and K. M. Lee. Segmentation-free dynamic scene deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[19] T. H. Kim and K. M. Lee. Generalized video deblurring for dynamic scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[20] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In Advances in Neural Information Processing Systems (NIPS), 2009.
[21] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
[22] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder–Mead simplex method in low dimensions. SIAM Journal on Optimization, 9(1):112–147, 1998.
[23] H. S. Lee and K. M. Lee. Dense 3D reconstruction from severely blurred images using a single moving camera. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[24] A. Levin. Blind motion deblurring using image statistics. In Advances in Neural Information Processing Systems (NIPS), 2006.
[25] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 29(9):1647, 2007.
[26] A. Levin, Y. Weiss, F. Durand, and W. T. Freeman. Understanding and evaluating blind deconvolution algorithms. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
[27] Y. Li, S. B. Kang, N. Joshi, S. M. Seitz, and D. P. Huttenlocher. Generating sharp panoramas from motion-blurred videos. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
[28] J. Pan, Z. Hu, Z. Su, H.-Y. Lee, and M.-H. Yang. Soft-segmentation guided object motion deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[29] C. Paramanand and A. N. Rajagopalan. Non-uniform motion deblurring for bilayer scenes. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[30] A. Sellent, C. Rother, and S. Roth. Stereo video deblurring. In European Conference on Computer Vision (ECCV), 2016.
[31] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. ACM Transactions on Graphics (TOG), 27(3):73, 2008.
[32] Y.-W. Tai, H. Du, M. S. Brown, and S. Lin. Correction of spatially varying image and video motion blur using a hybrid camera. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(6):1012–1028, 2010.
[33] H. Takeda and P. Milanfar. Removing motion blur with space–time processing. IEEE Transactions on Image Processing (TIP), 20(10):2990–3000, 2011.
[34] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Non-uniform deblurring for shaken images. International Journal of Computer Vision (IJCV), 98(2):168–186, 2012.
[35] J. Wulff and M. J. Black. Modeling blurred video with layers. In European Conference on Computer Vision (ECCV), 2014.
[36] L. Xu and J. Jia. Two-phase kernel estimation for robust motion deblurring. In European Conference on Computer Vision (ECCV), 2010.
[37] L. Xu and J. Jia. Depth-aware motion deblurring. In IEEE International Conference on Computational Photography (ICCP), 2012.
[38] L. Xu, J. Jia, and Y. Matsushita. Motion detail preserving optical flow estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(9):1744–1757, 2012.
[39] L. Xu, S. Zheng, and J. Jia. Unnatural L0 sparse representation for natural image deblurring. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013.
[40] T. Xue, M. Rubinstein, C. Liu, and W. T. Freeman. A computational approach for obstruction-free photography. ACM Transactions on Graphics (TOG), 34(4):79, 2015.