1 Introduction
Motion blurs are the most common artifacts in videos recorded using handheld cameras. For decades, researchers have studied deblurring algorithms to remove motion blurs, and their methodologies depend on whether the captured scenes are static or nonstatic. Early works on single-image deblurring usually assumed that the scene is static with constant depth [5, 9, 10, 11, 25, 27]. The successful approaches were naturally extended to video deblurring. Cai et al. [2] proposed a multi-image deconvolution method that uses the sparsity of blur kernels and of the clear image to handle registration errors. However, this method only allows two-dimensional translational camera motion, which generates uniform blur. Therefore, it cannot handle rotational camera shake, which is the main cause of large motion blurs [27]. To overcome this limitation, Li et al. [21] parameterized spatially varying motions with 3x3 homographies and could thus handle spatially varying blurs caused by camera rotation. Cho et al. [4] estimated camera motion in three-dimensional space without the assistance of specialized hardware and removed nonuniform blurs caused by projective camera motion. Spatially varying blurs caused by depth variation in a static scene were handled recently by Lee and Lee [19] and by Paramanand and Rajagopalan [23].
However, these approaches assume a static scene and therefore fail on general blurs that arise not only from camera shake but also from moving objects and depth variations in a dynamic scene. Because a spatially varying blur kernel in a dynamic scene is difficult to parameterize with a simple homography, kernel estimation becomes more challenging. Several researchers have therefore focused on restoring dynamic scenes, and their methods fall into two groups: segmentation-based approaches and exemplar-based approaches.
Segmentation-based deblurring approaches simultaneously estimate multiple motions, multiple kernels, and the associated image segments. Cho et al. [6] proposed a method that segments images into multiple regions of homogeneous motion and estimates the corresponding blur kernel as a one-dimensional Gaussian kernel. Therefore, this method cannot handle complex object motions or rotational camera motions, which generate locally varying blurs. Bar et al. [1] proposed a layered model that segments images into two layers (foreground and background) and estimates a linear blur kernel for the foreground layer. Although this method can explicitly handle occluded regions using the layered model, the kernel is limited to a one-dimensional box filter, and only a static camera is allowed. Wulff and Black [28] extended the work of Bar et al. and estimated the parameters of both foreground and background motions. However, the motions within each segment are parameterized only with an affine model, and extending the method to multilayered scenes is difficult because it requires joint estimation of the depth ordering of the layers. In summary, segmentation-based approaches have the advantage of handling blurs caused by moving objects in dynamic scenes, but parameterizing the motion in each segment remains an issue [16]. That is, these approaches fail to segment complex, nonparametrically varying motions such as those of people, because the simple models used in [1, 28] cannot describe them.
The works of Matsushita et al. [22] and Cho et al. [7] are typical exemplar-based approaches. These works estimate latent frames by interpolating sharp patches that commonly exist in a long image sequence. They therefore skip accurate segmentation and deconvolution, which avoids the ringing artifacts that deconvolution can introduce. However, the former work cannot handle blurs caused by moving objects, and the latter can only treat blurs from slightly moving objects in dynamic scenes, because it searches for sharp patches matching a blurry patch using a kernel that is globally parameterized with a homography. Therefore, handling fast-moving objects, whose motions are distinct from the background, is difficult. Moreover, it degrades mid-frequency textures, such as grass and trees, because it restores latent frames by interpolation rather than by deconvolution with spatial priors, which yields overly smooth results.
To alleviate the problems in previous works, we propose a new generalized video deblurring method that estimates latent frames without using global motion parametrization or segmentation. We estimate bidirectional optical flows and use them to estimate pixelwise varying kernels. Therefore, we can naturally handle coexisting blurs caused by camera shake, moving objects with complex motions, and depth variations. However, sharp frames are required to obtain accurate optical flows, because estimating flow fields between blurry images is difficult, and accurate optical flows are in turn necessary to restore sharp frames. This is a typical chicken-and-egg problem, so we estimate both variables simultaneously. To this end, we propose a new single energy model for the joint problem, together with a framework and efficient techniques to optimize it. The result of our system is shown in Fig. 2, in which the moving car is successfully restored because accurate optical flows are jointly estimated.
By minimizing the proposed energy function, we achieve significant improvements on numerous challenging real videos where other methods fail, as shown in Fig. 1. Furthermore, we estimate more accurate optical flows than the state-of-the-art flow estimation method that handles blurry images. These results are demonstrated in our extensive experiments.
2 Generalized Video Deblurring
Most conventional video deblurring methods struggle when various motion blurs coexist in a dynamic scene, because the motions cannot be described by global or segmentwise parameterization. To handle general blurs, we propose a new energy model that uses pixelwise kernel estimation instead. As blind deblurring is a well-known ill-posed problem, our energy model consists not only of data and spatial regularization terms but also of a temporal term. The model is expressed as follows:
$$E = E_{\mathrm{data}} + E_{\mathrm{temporal}} + E_{\mathrm{spatial}} \qquad (1)$$
and the details of each term in (1) are given in the following sections.
2.1 Data Model based on Approximated Blur
In conventional works, the motion blurs of each frame are approximated using parametric models such as homographies and affine models [1, 7, 21, 28]. However, these kernel approximations are valid only when motion blurs are parameterizable within an entire frame or segment. Therefore, pixelwise motion and kernel estimation are required to cope with general blurs. We approximate the pixelwise blur kernel using bidirectional optical flows, in accordance with previous works [8, 16, 24]. Specifically, under the assumption that the velocity of the motion is constant between adjacent frames, our blur model is expressed as follows:
(2) 
where , and denote bidirectional optical flows at frame . Blurry frame and latent frame are and , respectively. Camera duty cycle of the frame is and denotes relative exposure time [21]. We define the image warping, , which transforms the frame to when , and transforms the frame to . Our bidirectional optical flows, duty cycle, and the corresponding piecewise linear kernel used in our blur model are illustrated in Fig. 3.
Although our blur kernel model is simple, it is justified because we target videos with relatively short exposure times. Therefore, we approximate the kernel as piecewise linear using bidirectional optical flows:
(3) 
where is the blur kernel using bidirectional optical flows at pixel location x, and denotes Kronecker delta.
Using this pixelwise kernel approximation, we can easily manage multiple different blurs in a frame, unlike conventional methods. The advantage of our kernel model is shown in Fig. 4: it fits blurs from differently moving objects and camera shake much better than the conventional homography-based model.
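As a rough illustration of the piecewise linear kernel described above, the following sketch rasterizes the kernel of a single pixel from its forward and backward flow vectors and a duty cycle. The function name `piecewise_linear_kernel`, the uniform sampling along each segment, and the rounding to the nearest pixel offset are assumptions made for illustration only; they are not the paper's exact discretization.

```python
def piecewise_linear_kernel(flow_fwd, flow_bwd, duty_cycle, samples=50):
    """Return {(dx, dy): weight} approximating one pixel's blur kernel.

    Assumption: the blur path runs from the pixel along duty_cycle * flow_fwd
    on one side and duty_cycle * flow_bwd on the other, at constant velocity,
    so each half is sampled uniformly and carries half of the total mass.
    """
    kernel = {}
    for flow in (flow_fwd, flow_bwd):
        for s in range(samples):
            # Position along the segment, in (0, duty_cycle).
            t = duty_cycle * (s + 0.5) / samples
            offset = (round(t * flow[0]), round(t * flow[1]))
            kernel[offset] = kernel.get(offset, 0.0) + 1.0
    total = sum(kernel.values())
    # Normalize so the weights are nonnegative and sum to one.
    return {offset: w / total for offset, w in kernel.items()}


if __name__ == "__main__":
    # A pixel moving 4 px to the right, seen with a duty cycle of 0.5.
    k = piecewise_linear_kernel((4.0, 0.0), (-4.0, 0.0), 0.5)
    for offset in sorted(k):
        print(offset, round(k[offset], 3))
```

Because every pixel gets its own pair of flow vectors, two neighboring pixels on different objects can receive entirely different kernels, which is the property a single homography cannot provide.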
Therefore, we cast the pixelwise kernel estimation problem as an optical flow estimation problem. Discretizing the constraint (2) gives the following data term:
(4) 
where the row vector of blur kernel matrix , corresponding to the blur kernel at pixel x, is the vector form of , and its elements are nonnegative and sum to one. Linear operator denotes the Toeplitz matrices corresponding to the partial (e.g., horizontal and vertical) derivative filters. Parameter controls the weight of the data term, and L, u, and B denote the sets of latent frames, optical flows, and blurry frames, respectively.
2.2 Temporal Coherence with Optical Flow Constraint
As described above, optical flows are required to estimate the pixelwise blur kernel. However, the proposed data term in (4) does not include conventional optical flow constraints such as brightness constancy or gradient constancy. In general, such constraints do not hold between two blurry frames. Thus, Portz et al. [24] proposed a method to apply flow constraints between blurry images. Based on the commutative law of shift-invariant kernels [13], the authors of [24] convolved the approximated blur of each observed image with the other image and assumed constant brightness between them at matched points. However, the commutativity property does not hold in theory when the kernel is not translation invariant. Therefore, this approach only works when the motion is smooth enough.
To address this problem, we propose a new model that finds correspondences between two latent sharp images, which allows abrupt changes in motions and in the corresponding kernels. With this model, we need not restrict our blur kernels to be shift invariant. Our model is based on the conventional optical flow constraint between latent images, that is, brightness constancy. The formulation is expressed as follows:
(5) 
where denotes the index of neighboring frames at . Constant parameter controls the weight of each term in the summation. We apply the robust
norm to offer robustness against outliers and occlusions.
Notably, a major difference between the proposed model and the conventional optical flow estimation methods is that our problem is a joint problem. That is, the brightness of latent frames and optical flows need to be simultaneously estimated. Therefore, our model simultaneously enforces the temporal coherence of latent frames and estimates the correspondences.
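The brightness constancy constraint between latent frames can be sketched as a sum of absolute residuals between one latent frame and its neighbor warped by the flow. This is a minimal sketch under assumptions: images are plain lists of rows, the warp uses bilinear sampling with border clamping, the per-neighbor weight is omitted, and the names `bilinear` and `temporal_l1_cost` are hypothetical.

```python
def bilinear(img, x, y):
    """Sample a list-of-rows image at real-valued (x, y), clamping at borders."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0][x0] + ax * img[y0][x1]
    bottom = (1 - ax) * img[y1][x0] + ax * img[y1][x1]
    return (1 - ay) * top + ay * bottom


def temporal_l1_cost(latent_i, latent_n, flow):
    """Sum over pixels of |L_i(x) - L_n(x + u(x))|, with flow[y][x] = (ux, uy)."""
    cost = 0.0
    for y, row in enumerate(latent_i):
        for x, value in enumerate(row):
            ux, uy = flow[y][x]
            cost += abs(value - bilinear(latent_n, x + ux, y + uy))
    return cost


if __name__ == "__main__":
    frame_a = [[0.0, 1.0, 2.0, 3.0]]
    frame_b = [[9.0, 0.0, 1.0, 2.0]]  # frame_a shifted one pixel to the right
    right = [[(1.0, 0.0)] * 4]
    still = [[(0.0, 0.0)] * 4]
    print(temporal_l1_cost(frame_a, frame_b, right))  # small: flow explains the shift
    print(temporal_l1_cost(frame_a, frame_b, still))  # large: flow is wrong
```

In the joint setting, both the frames and the flow passed to such a cost are unknowns, so lowering it can either adjust the correspondences or pull the latent frames toward temporal consistency.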
2.3 Spatial Coherence
To alleviate the difficulties of the highly ill-posed deblurring and optical flow estimation problems, several researchers have emphasized the importance of spatial regularization. Therefore, we also enforce spatial coherence to penalize spatial fluctuations while allowing discontinuities in both latent frames and flow fields. We assume that the spatial priors for latent frames and optical flows are independent. They are expressed as follows:
(6) 
The first term in (6) denotes the spatial regularization term for the latent frames. Although sparser norms (e.g., ) fit the gradient statistics of natural sharp images better [17, 18, 20], we use conventional total variation (TV) based regularization [12, 14, 16], as TV is computationally less expensive. The second term denotes the spatial smoothness term for optical flows. We adopt edge-map coupled TV-based regularization [15] to preserve discontinuities in the flow fields at edges. Similar to [16], the edge-map is expressed as follows:
(7) 
where controls the scale of the edgemap, parameter controls the weight, and is an initial latent image in the iterative optimization framework.
3 Optimization Framework
In the previous sections, we described the data, temporal coherence, and spatial coherence terms. When the camera duty cycle is known, our final objective function becomes as follows:
(8) 
Unlike the work of Cho et al. [7], which performs multiple phases sequentially, our model obtains a solution by minimizing a single objective function. However, because the objective is nonconvex, practical optimization methods are required to obtain an approximate solution. Therefore, we divide the original problem into two subproblems and use conventional iterative and alternating optimization techniques [5, 28] to minimize the nonconvex objective function. In the following sections, we introduce efficient solvers and describe how to estimate each of the unknowns L and u while the other is fixed.
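The alternating strategy can be illustrated on a toy problem. The sketch below is not the paper's energy: it uses a hypothetical two-variable function that is nonconvex jointly but quadratic in each variable when the other is fixed, which loosely mirrors fixing u to solve for L and then fixing L to solve for u.

```python
def energy(a, b):
    """Toy nonconvex energy, quadratic in a for fixed b and vice versa."""
    return (a * b - 6.0) ** 2 + 0.1 * (a - 2.0) ** 2 + 0.1 * (b - 3.0) ** 2


def alternate(a, b, iters=20):
    """Alternately minimize energy over a (b fixed) and over b (a fixed)."""
    history = [energy(a, b)]
    for _ in range(iters):
        # Setting d(energy)/da = 0 with b fixed gives a closed-form update.
        a = (12.0 * b + 0.4) / (2.0 * b * b + 0.2)
        # Likewise for b, using the freshly updated a.
        b = (12.0 * a + 0.6) / (2.0 * a * a + 0.2)
        history.append(energy(a, b))
    return a, b, history


if __name__ == "__main__":
    a, b, history = alternate(1.0, 1.0)
    print("final (a, b):", round(a, 4), round(b, 4))
    print("energy: first", history[0], "last", history[-1])
```

Each half-step solves its subproblem exactly, so the energy never increases, but the result is only a stationary point that depends on the starting values. This is one reason a coarse-to-fine initialization matters for nonconvex objectives of this kind.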
3.1 Sharp Video Restoration
When the optical flows u are fixed, the corresponding blur kernels are also fixed, and our objective function in (8) becomes convex with respect to L. It is expressed as follows:
(9) 
To obtain L, we adopt the conventional convex optimization method in [3], and derive the primal-dual update scheme as follows:
(10) 
where indicates the iteration number, and, and denote the dual variables. Parameters and denote the update steps. A linear operator A calculates the spatial difference between neighboring pixels, and another operator calculates the temporal differences between and . To update the primal variable and obtain in (10), we apply the conjugate gradient method to optimize the quadratic function.
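To make the structure of a primal-dual update concrete, the following sketch applies the first-order primal-dual algorithm of [3] to a much simpler problem: one-dimensional TV denoising, min_x (lam/2)||x - b||^2 + ||Dx||_1, where D is the forward difference. This is an assumption-laden stand-in, not the update in (10): there is no blur matrix or temporal operator, so the primal step has a closed form and no conjugate gradient solve is needed. The names `tv_energy` and `tv_denoise_1d` are hypothetical.

```python
def tv_energy(x, b, lam):
    """(lam / 2) * ||x - b||^2 + sum of |x[i+1] - x[i]|."""
    data = sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    tv = sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
    return 0.5 * lam * data + tv


def tv_denoise_1d(b, lam=1.0, iters=300, tau=0.5, sigma=0.5):
    """Primal-dual iterations for 1D TV denoising; needs tau * sigma * 4 <= 1."""
    n = len(b)
    x = list(b)
    x_bar = list(b)
    y = [0.0] * (n - 1)  # dual variable, one entry per finite difference
    for _ in range(iters):
        # Dual ascent step, then projection onto the interval [-1, 1].
        for i in range(n - 1):
            v = y[i] + sigma * (x_bar[i + 1] - x_bar[i])
            y[i] = max(-1.0, min(1.0, v))
        # Primal descent step using the adjoint of D, then the closed-form
        # proximal map of the quadratic data term.
        x_old = x
        x = []
        for j in range(n):
            left = y[j - 1] if j > 0 else 0.0
            right = y[j] if j < n - 1 else 0.0
            v = x_old[j] - tau * (left - right)
            x.append((v + tau * lam * b[j]) / (1.0 + tau * lam))
        # Over-relaxation of the primal variable.
        x_bar = [2.0 * xn - xo for xn, xo in zip(x, x_old)]
    return x


if __name__ == "__main__":
    noisy = [0.1, -0.1, 0.05, 0.0, 1.1, 0.9, 1.0, 1.05]
    clean = tv_denoise_1d(noisy)
    print([round(v, 3) for v in clean])
    print(tv_energy(noisy, noisy, 1.0), tv_energy(clean, noisy, 1.0))
```

The three steps (dual projection, primal proximal step, over-relaxation) are the same pattern the text describes; in the deblurring setting the primal step becomes a quadratic problem involving the blur kernels, which is where the conjugate gradient method enters.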
3.2 Optical Flows Estimation
When the latent frames L are fixed, the temporal coherence term becomes convex, but the data term remains nonconvex. Therefore, we define a nonconvex fidelity function as follows:
(11) 
To find the optimized values of optical flows u, we first convexify the nonconvex function by applying the first-order Taylor expansion. Similar to [16], we linearize the function near an initial in the iterative process as follows:
(12) 
Therefore, our approximated convex function for optical flows estimation is expressed as follows:
(13) 
Next, we apply the convex optimization technique in [3] to the approximated convex function (13), and the primal-dual update process is expressed as follows:
(14) 
where denotes the dual variable of on the vector space and the diagonal matrix is the weighting matrix denoted as . Parameters and denote the update steps and means .
4 Implementation Details
To handle large blurs and guide fast convergence, we implement our algorithm in the traditional coarse-to-fine framework with empirically determined parameters. We use for most of our experiments, and other parameters are determined as , , , and . In the coarse-to-fine framework, we build an image pyramid with 17 levels for a high-definition (1280x720) video with a scale factor of 0.9, and use bicubic interpolation to propagate both the optical flows and latent frames to the next pyramid level.
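For a sense of scale, the pyramid resolutions implied by 17 levels and a scale factor of 0.9 can be computed directly. The sketch below assumes that each level is the full resolution multiplied by 0.9 raised to the level index and rounded to the nearest integer; the rounding convention and the name `pyramid_sizes` are assumptions, not details given in the text.

```python
def pyramid_sizes(width, height, levels=17, scale=0.9):
    """List (width, height) per level, from finest (index 0) to coarsest."""
    sizes = []
    for k in range(levels):
        factor = scale ** k
        # Assumed convention: round to nearest integer, never below 1 pixel.
        sizes.append((max(1, round(width * factor)),
                      max(1, round(height * factor))))
    return sizes


if __name__ == "__main__":
    for level, (w, h) in enumerate(pyramid_sizes(1280, 720)):
        print(level, w, h)
```

Under this convention the coarsest level is roughly 237x133, so a blur spanning tens of pixels at full resolution shrinks to a few pixels where the optimization starts.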
Moreover, to reduce the number of unknowns in optical flows, we only estimate and . We approximate using and . For example, it satisfies, , as illustrated in Fig. 5, and we can easily apply this for .
The overall process of our algorithm is summarized in Algorithm 1. Further details on estimating the duty cycle and on the postprocessing step that reduces artifacts are given below.
4.1 Duty Cycle Estimation
In this study, we assume that the camera duty cycle is known for every frame. When we use Kinect to capture RGB videos, we can obtain the duty cycle from the public SDK. However, when we deblur conventional data sets, which do not provide exposure information, we apply the technique proposed in [7] to estimate the duty cycle. In contrast to the original method in [7], we use optical flows instead of homographies to obtain initially approximated blur kernels. Therefore, we first estimate flow fields from blurry images with [26], which runs in near real time. We then use them as initial flows and approximate the kernels to estimate the duty cycle.
4.2 Occlusion Detection and Refinement
Our piecewise linear kernel naturally incurs approximation errors, which cause problems such as ringing artifacts. Moreover, our data model in (4) and temporal coherence model in (5) are invalid in occluded regions.
To reduce such artifacts from kernel errors and occlusions, we use spatiotemporal filtering as a postprocessing step:
(15) 
where y denotes a pixel in the 3x3 neighboring patch at location and is the normalization factor (e.g. ). Notably, we enable in (15) for spatial filtering. Our occlusionaware weight is defined as follows:
(16) 
where occlusion state is determined using the method proposed in [15]. The 5x5 patch is centered at x in frame . The similarity control parameter is fixed as .
5 Experimental Results
In what follows, we demonstrate the superiority of the proposed method. (For more results, see the supplementary video.)
First, we compare our deblurring results with those of the state-of-the-art exemplar-based method [7] on the videos used in [7]. As shown in Fig. 8, the captured scenes are dynamic and contain multiple moving objects. The method of [7] fails to restore the moving objects because the object motions are large and distinct from the background. By contrast, our method deblurs both the moving objects and the background better. The exemplar-based approach also fails to handle large blurs, as shown in Fig. 8, because the homographies initially estimated from the heavily blurred images are inaccurate. Moreover, it renders excessively smooth results for mid-frequency textures such as trees, as the method is based on interpolation without a spatial prior for latent frames.
Next, we compare our method with the state-of-the-art segmentation-based approach [28]. In Fig. 8, the captured scene is a bilayer scene used in [28]. Although a bilayer scene is a good example to verify the performance of the layered model, inaccurate segmentation near the boundaries causes serious artifacts in the restored frame. By contrast, our method does not depend on accurate segmentation and thus restores the boundaries much better than the layered model.
In Fig. 10, we quantitatively compare the optical flow accuracy with that of [24] on synthetic blurry images. Although [24] was proposed to handle blurry images in optical flow estimation, its assumption does not hold at motion boundaries, which are very important for deblurring. Therefore, its optical flow is inaccurate at the motion boundaries of moving objects. By contrast, our model allows abrupt changes of motion and thus performs better than the previous model.
Moreover, we show the deblurring results with and without the temporal coherence term in (5), and verify in Fig. 10 that our temporal coherence model significantly reduces ringing artifacts near the edges.
Other deblurring results from numerous real videos are shown in Fig. 11. Notably, our model successfully restores the face, which has highly nonuniform blurs because the person moves rotationally (Fig. 11(e)).
6 Conclusions
In this study, we introduced a novel method that removes general blurs in dynamic scenes, where conventional methods fail. We handled general blurs by estimating a pixelwise kernel using optical flows, and to this end proposed a new energy model that jointly estimates optical flows and latent frames.
We also provided a framework and efficient solvers to minimize the energy function and achieved significant improvements in removing general blurs in dynamic scenes.
Acknowledgments
This research was supported in part by the MKE (The Ministry of Knowledge Economy), Korea and Microsoft Research, under IT/SW Creative research program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA2013H0503131041), and in part by the National Research Foundation of Korea (NRF) grant funded by the Ministry of Science, ICT & Future Planning (MSIP) (No. 20090083495).
References

[1] L. Bar, B. Berkels, M. Rumpf, and G. Sapiro. A variational framework for simultaneous motion estimation and restoration of motion-blurred video. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2007.
 [2] J.-F. Cai, H. Ji, C. Liu, and Z. Shen. Blind motion deblurring using multiple images. Journal of Computational Physics, 228(14):5057–5071, 2009.
 [3] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, May 2011.
 [4] S. Cho, H. Cho, Y.-W. Tai, and S. Lee. Registration based nonuniform motion deblurring. In Computer Graphics Forum, volume 31, pages 2183–2192. Wiley Online Library, 2012.
 [5] S. Cho and S. Lee. Fast motion deblurring. In SIGGRAPH, 2009.
 [6] S. Cho, Y. Matsushita, and S. Lee. Removing nonuniform motion blur from images. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8. IEEE, 2007.
 [7] S. Cho, J. Wang, and S. Lee. Video deblurring for handheld cameras using patch-based synthesis. ACM Transactions on Graphics, 31(4):64:1–64:9, 2012.
 [8] S. Dai and Y. Wu. Motion from blur. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2008.
 [9] R. Fergus, B. Singh, A. Hertzmann, S. T. Roweis, and W. Freeman. Removing camera shake from a single photograph. In SIGGRAPH, 2006.
 [10] A. Gupta, N. Joshi, L. Zitnick, M. Cohen, and B. Curless. Single image deblurring using motion density functions. In ECCV, 2010.
 [11] M. Hirsch, C. J. Schuler, S. Harmeling, and B. Schölkopf. Fast removal of nonuniform camera shake. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 463–470. IEEE, 2011.
 [12] Z. Hu, L. Xu, and M.-H. Yang. Joint depth estimation and camera shake removal from single blurry image. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2014.
 [13] H. Jin, P. Favaro, and R. Cipolla. Visual tracking in the presence of motion blur. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
 [14] T. H. Kim, B. Ahn, and K. M. Lee. Dynamic scene deblurring. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3160–3167. IEEE, 2013.
 [15] T. H. Kim, H. S. Lee, and K. M. Lee. Optical flow via locally adaptive fusion of complementary data costs. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3344–3351. IEEE, 2013.
 [16] T. H. Kim and K. M. Lee. Segmentation-free dynamic scene deblurring. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2014.
 [17] D. Krishnan and R. Fergus. Fast image deconvolution using hyper-Laplacian priors. In NIPS, 2009.
 [18] D. Krishnan, T. Tay, and R. Fergus. Blind deconvolution using a normalized sparsity measure. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2009.
 [19] H. S. Lee and K. M. Lee. Dense 3D reconstruction from severely blurred images using a single moving camera. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2013.
 [20] A. Levin and Y. Weiss. User assisted separation of reflections from a single image using a sparsity prior. IEEE Trans. Pattern Analysis Machine Intelligence, 29(9):1647–1654, 2007.
 [21] Y. Li, S. B. Kang, N. Joshi, S. M. Seitz, and D. P. Huttenlocher. Generating sharp panoramas from motion-blurred videos. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2010.
 [22] Y. Matsushita, E. Ofek, W. Ge, X. Tang, and H.-Y. Shum. Full-frame video stabilization with motion inpainting. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 28(7):1150–1163, 2006.
 [23] C. Paramanand and A. N. Rajagopalan. Nonuniform motion deblurring for bilayer scenes. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2013.
 [24] T. Portz, L. Zhang, and H. Jiang. Optical flow in the presence of spatially-varying motion blur. In Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2012.
 [25] Q. Shan, J. Jia, and A. Agarwala. High-quality motion deblurring from a single image. In SIGGRAPH, 2008.
 [26] A. Wedel, T. Pock, C. Zach, H. Bischof, and D. Cremers. An improved algorithm for TV-L1 optical flow. In Statistical and Geometrical Approaches to Visual Motion Analysis, pages 23–45. Springer, 2009.
 [27] O. Whyte, J. Sivic, A. Zisserman, and J. Ponce. Nonuniform deblurring for shaken images. International Journal of Computer Vision, 98(2):168–186, 2012.
 [28] J. Wulff and M. J. Black. Modeling blurred video with layers. In ECCV, 2014.