Blur Robust Optical Flow using Motion Channel

03/07/2016 ∙ by Wenbin Li, et al. ∙ 0

Estimating optical flow from a realworld video sequence is difficult in the presence of camera shake and other motion blur. In this paper, we first investigate the blur parameterization for video footage using near linear motion elements. We then combine a commercial 3D pose sensor with an RGB camera in order to film video footage of interest together with the camera motion. We illustrate that this additional camera motion/trajectory channel can be embedded into a hybrid framework by interleaving an iterative blind deconvolution and a warping based optical flow scheme. Our method yields improved accuracy over three other state-of-the-art baselines on our proposed ground truth blurry sequences and on several other realworld sequences filmed by our imaging system.




1 Introduction

Optical flow estimation has been widely applied in computer vision, e.g. segmentation, image deblurring and stabilization. In many cases, optical flow is estimated on videos captured by a shaking camera. Such footage may contain a significant amount of camera blur, which brings additional difficulties to the traditional variational optical flow framework: in blurred scenes a pixel may match multiple pixels between the image pair, which violates the basic assumption of the optical flow framework, intensity constancy.

In this paper, we investigate how to precisely estimate optical flow from blurry video footage. We observe that the blur kernel between neighboring frames may be near linear, which can be parameterized using linear elements of the camera motion. In this case, the camera trajectory can be informative for enhancing the image deblurring within a variational optical flow framework. Based on this observation, our major contribution in this paper is to utilise an RGB-Motion Imaging System (an RGB sensor combined with a 3D pose&position tracker) in order to propose: (A) an iterative enhancement process for camera shake blur estimation which encompasses the tracked camera motion (Sec. 3) and a Directional High-pass Filter (Sec. 5 and Sec. 7.2); (B) a Blur-Robust Optical Flow Energy formulation (Sec. 6); and (C) a hybrid coarse-to-fine framework (Sec. 7) for computing optical flow in blur scenes by interleaving an iterative blind deconvolution process and a warping based minimisation scheme. In the evaluation section, we compare our method to three existing state-of-the-art optical flow approaches on our proposed ground truth sequences (Fig. 1, blur and baseline blur-free equivalents) and also illustrate the practical benefit of our algorithm given realworld cases.

Figure 1: Visual comparison of our method to Portz et al. Portz on our ground truth benchmark Grove2 with synthetic camera shake blur. First Column: the input images; Second Column: the optical flow fields calculated by our method and the baseline; Third Column: the RMS error maps against the ground truth.

2 Related Work

Camera shake blur often occurs during fast camera movement in low-light conditions due to the requirement of adopting a longer exposure. Recovering both the blur kernel and the latent image from a single blurred image is known as Blind Deconvolution which is an inherently ill-posed problem. Cho and Lee FMD propose a fast deblurring process within a coarse-to-fine framework (Cho&Lee) using a predicted edge map as a prior. To reduce the noise effect in this framework, Zhong et al. Zhong introduce a pre-filtering process which reduces the noise along a specific direction and preserves the image information in other directions. Their improved framework provides high quality kernel estimation with a low run-time but shows difficulties given combined object and camera shake blur.

Figure 2: RGB-Motion Imaging System. (a): Our system setup using a combined RGB sensor and 3D Pose&Position Tracker. (b): The tracked 3D camera motion in relative frames. The top-right box is the average motion vector, which has a similar direction to the blur kernel. (c): Images captured from our system. The top-right box presents the blur kernel estimated using FMD. (d): The internal process of our system, with the exposure time indicated.

To obtain higher performance, a handful of combined hardware and software-based approaches have also been proposed for image deblurring. Tai et al. tai08 introduce a hybrid imaging system that is able to capture both video at a high frame rate and a blurry image. The optical flow fields between the video frames are utilised to guide blur kernel estimation. Levin et al. levin propose to capture a uniformly blurred image by controlling the camera motion along a parabolic arc. Such uniform blur can then be removed based on the speed or direction of the known arc motion. As a complement to Levin et al.’s levin hardware-based deblurring algorithm, Joshi et al. Joshi apply inertial sensors to capture the acceleration and angular velocity of a camera over the course of a single exposure. This extra information is introduced as a constraint in their energy optimisation scheme for recovering the blur kernel. All the hardware-assisted solutions described provide extra information in addition to the blurry image, which significantly improves overall performance. However, these methods require complex electronic setups and precise calibration.

Optical flow techniques are widely studied and adopted across computer vision because of the dense image correspondences they provide. Such dense tracking is important for other fundamental research topics, e.g. 3D reconstruction reflection and visual effects lv2013game ; lv2014multimodal . In the last two decades, the optical flow model has evolved extensively, one landmark work being the variational model of Horn and Schunck HS where the concept of Brightness Constancy is proposed. Under this assumption, pixel intensity does not change spatio-temporally, which is, however, often weakened in realworld images because of natural noise. To address this issue, some complementary concepts have been developed to improve performance given large displacements Brox , taking advantage of feature-rich surfaces Xu_deblur and adapting to nonrigid deformation in scenes APO ; LME ; APO_JIFS ; moBlur ; tang ; li2013nonrigid . However, flow approaches that can perform well given blurred scenes, where Brightness Constancy is usually violated, are less common. Of the approaches that do exist, Schoueri et al. Schoueri apply a linear deblurring filter before optical flow estimation while Portz et al. Portz attempt to match non-uniform camera motion between neighbouring input images. The former approach may be limited given nonlinear blur in realworld scenes, whereas the latter requires two extra frames to parameterise the motion-induced blur. Regarding non optical-flow based methods, Yuan et al. Yuan align a blurred image to a sharp one by predefining an affine image transform with a blur kernel. Similarly, HaCohen et al. HaCohen achieve alignment between a blurred image and a sharp one by embedding deblurring into the correspondence estimation. Li et al. moBlur present an approach to solve the image deblurring and optical flow simultaneously by using the RGB-Motion imaging.

3 RGB-Motion Imaging System

Camera shake blur within video footage is typically due to fast camera motion and/or long exposure time. In particular, such blur can be considered as a function of the camera trajectory supplied to image space during the exposure time . It therefore follows that knowledge of the actual camera motion between image pairs can provide significant information when performing image deblurring Joshi ; levin .

In this paper, we propose a simple and portable setup (Fig. 2(a)), combining an RGB sensor and a 3D pose&position tracker (SmartNav by NaturalPoint Inc.) in order to capture continuous scenes (video footage) along with real-time camera pose&position information. Note that the RGB sensor could be any camera or a Kinect sensor; a Canon EOS 60D is used in our implementation to capture video at a frame rate of 24 FPS. Furthermore, our tracker provides the rotation (yaw, pitch and roll), translation and zoom information within a reasonable error range (2 mm). To synchronise the tracker data and the image recording, a real time collaboration (RTC) server JeeHang is built using the instant messaging protocol XMPP (also known as Jabber), which is designed for message-oriented communication based on XML and allows real-time responses between different messaging channels or any signal channels that can be transmitted and received in message form. In this case, a time stamp is assigned to each received message package by the central timer of the server. Message packages are synchronised if they contain nearly the same time stamp. We choose Jabber for synchronisation because of its open-source nature and low response delay (around 10 ms).
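The synchronisation rule described above (message packages are paired when their server time stamps nearly coincide) can be sketched as follows. This is a minimal illustration and not the RTC server itself: the function name `pair_by_timestamp`, the tuple layout and the 10 ms tolerance are assumptions made for the example.

```python
from bisect import bisect_left

def pair_by_timestamp(frame_times, tracker_samples, tol=0.010):
    """Pair each frame time with the nearest tracker sample within `tol` seconds.

    frame_times:     sorted list of frame time stamps (seconds)
    tracker_samples: list of (time_stamp, pose) tuples sorted by time stamp
    Returns a list of (frame_time, pose or None).
    """
    times = [t for t, _ in tracker_samples]
    pairs = []
    for ft in frame_times:
        i = bisect_left(times, ft)
        # Candidates: the sample just before and just after the frame time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        best = min(candidates, key=lambda j: abs(times[j] - ft), default=None)
        if best is not None and abs(times[best] - ft) <= tol:
            pairs.append((ft, tracker_samples[best][1]))
        else:
            pairs.append((ft, None))  # no tracker sample close enough
    return pairs
```

Frames without a sufficiently close tracker sample are returned with `None`, so a caller can decide whether to skip them or fall back to another motion estimate.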

Assuming objects have similar depth within the same scene (a common assumption in image deblurring which will be discussed in our future work), the tracked 3D camera motion in image coordinates can be formulated as:


where represents the average of the camera motion vectors from the image to image . X denotes the 3D position of the camera while is a pixel location and represents the number of pixels in an image. represents the 3D projection matrix while and denote the rotation and translation matrices respectively of tracked camera motion in the image domain. All of this information is computed using Optitrack’s Camera SDK (version 1.2.1). Fig. 2(b,c) shows sample data (video frames and camera motion) captured from our imaging system. It is observed that blur from the realworld video is near linear due to the relatively high sampling rate of the camera. The blur direction can therefore be approximately described using the tracked camera motion. Let the tracked camera motion be represented in polar coordinates where and denote the magnitude and directional component respectively. is an index shared between the tracked camera motion and the frame number. In addition, we also consider the combined camera motion vector of neighbouring images as shown in Fig. 2(d), e.g. where denotes the combined camera motion vector from image 1 to image 3. As one of our main contributions, these real-time motion vectors are used to provide additional constraints for blur kernel enhancement (Sec. 7) within our framework.
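The polar representation described above can be illustrated with a short sketch. It assumes the per-pixel motion vectors have already been projected into image coordinates, and it assumes that the combined motion over two frame intervals is the vector sum of the two individual motions; the function names are hypothetical.

```python
import math

def average_motion_polar(motion_vectors):
    """Average 2D image-space motion vectors and return (magnitude, angle).

    The angle is in radians in [-pi, pi], measured from the +x axis.
    """
    n = len(motion_vectors)
    mx = sum(v[0] for v in motion_vectors) / n
    my = sum(v[1] for v in motion_vectors) / n
    return math.hypot(mx, my), math.atan2(my, mx)

def combine(m1, m2):
    """Combined motion over two consecutive frame intervals (assumed vector sum)."""
    return (m1[0] + m2[0], m1[1] + m2[1])
```

For example, two identical vectors `(3, 4)` average to a magnitude of 5 with direction `atan2(4, 3)`.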

4 Blind Deconvolution

The motion blur process can commonly be formulated:


where is a blurred image and represents a blur kernel w.r.t. a specific Point Spread Function. is the latent image of ; denotes the convolution operation and represents spatial noise within the scene. In the blind deconvolution operation, both and are estimated from , which is an ill-posed (but extensively studied) problem. A common approach for blind deconvolution is to solve both and in an iterative framework using a coarse-to-fine strategy:


where represents a regularization that penalizes spatial smoothness with a sparsity prior FMD , and is widely used in recent state-of-the-art work Shan ; Xu_deblur . Due to noise sensitivity, low-pass and bilateral filters Tai are typically employed before deconvolution. Eq. 5 denotes the common definition of an optimal kernel from a filtered image.


where represents the ground truth blur kernel, is a filter, and denotes the optimal blur kernel from the filtered image . The low-pass filtering process improves deconvolution computation by removing spatially-varying high frequency noise but also results in the removal of useful information which yields additional errors over object boundaries. To preserve this useful information, we introduce a directional high-pass filter that utilises our tracked 3D camera motion.
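The blur model at the start of this section (a latent image convolved with a blur kernel, plus noise) can be simulated with a short NumPy sketch. The line-shaped kernel reflects the near linear blur assumption of Sec. 3; the function names, the zero-padded boundary handling and the Gaussian noise model are assumptions made for illustration.

```python
import numpy as np

def linear_motion_kernel(length, theta):
    """Line-shaped blur kernel of about `length` pixels at angle `theta` (radians)."""
    size = length if length % 2 == 1 else length + 1  # odd size so there is a centre
    k = np.zeros((size, size))
    c = size // 2
    half = (length - 1) / 2.0
    for t in np.linspace(-half, half, 4 * length):
        x = int(round(c + t * np.cos(theta)))
        y = int(round(c - t * np.sin(theta)))  # image rows grow downwards
        k[y, x] = 1.0
    return k / k.sum()  # kernel sums to one

def blur(latent, kernel, noise_sigma=0.0, seed=0):
    """Synthesise blurred = latent (*) kernel + noise, same size as `latent`."""
    H, W = latent.shape
    kh, kw = kernel.shape
    shape = (H + kh - 1, W + kw - 1)  # size of the full linear convolution
    full = np.fft.irfft2(np.fft.rfft2(latent, shape) * np.fft.rfft2(kernel, shape), shape)
    out = full[kh // 2: kh // 2 + H, kw // 2: kw // 2 + W]  # crop to 'same' size
    if noise_sigma > 0:
        out = out + np.random.default_rng(seed).normal(0.0, noise_sigma, out.shape)
    return out
```

Blind deconvolution is the inverse task: given only the output of `blur`, recover both `latent` and `kernel`.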

5 Directional High-pass Filter

Figure 3: Directional high-pass filter for blur kernel enhancement. Given the blur direction , a directional high-pass filter along is applied to preserve blur detail in the estimated blur kernel.

Detail enhancement using directional filters has been proved effective in several areas of computer vision Zhong . Here we define a directional high-pass filter as:


where represents a pixel position and denotes a 1D Gaussian based high-pass function. controls the filtering direction along . is a normalization factor defined as . The filter is proposed to preserve overall high frequency details along direction without affecting blur detail in orthogonal directions Chen . Given a directionally filtered image , the optimal blur kernel is defined (Eq 5) as . Fig. 3 demonstrates that noise or object motion within a scene usually results in low frequency noise in the estimated blur kernel (Cho&Lee FMD ). This low frequency noise can be removed by our directional high-pass filter while preserving major blur details. In our method, this directional high-pass filter is supplemented into the Cho&Lee FMD framework using a coarse-to-fine strategy in order to recover high quality blur kernels for use in our optical flow estimation (Sec. 7.2).

6 Blur-Robust Optical Flow Energy

Within a blurry scene, a pair of adjacent natural images may contain different blur kernels, further violating Brightness Constancy. This results in unpredictable flow error across the different blur regions. To address this issue, Portz et al. proposed a modified Brightness Constancy term by matching the non-uniform blur between the input images. As one of our main contributions, we extend this assumption to a novel Blur Gradient Constancy term in order to provide extra robustness against illumination change and outliers. Our main energy function is given as follows:


A pair of consecutively observed frames from an image sequence is considered in our algorithm. represents the current frame and its successor is denoted by where and represent rectangular images in the RGB channel. Here is the latent image and denotes the relative blur kernel. The optical flow displacement between and is defined as . To match the non-uniform blur between input images, the blur kernel from each input image is applied to the other. This yields new blur images and as follows:
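One way to write the kernel exchange described above, under assumed notation (the symbols below are chosen for illustration and are not taken from the source): let $I_1, I_2$ be the observed blurry frames, $l_1, l_2$ their latent images and $k_1, k_2$ their blur kernels, with noise ignored. Because convolution is commutative and associative, applying each kernel to the other frame leaves both frames with the same effective blur $k_1 \otimes k_2$:

```latex
\begin{aligned}
I_1 &= k_1 \otimes l_1, \qquad I_2 = k_2 \otimes l_2,\\
b_1 &= k_2 \otimes I_1 = (k_1 \otimes k_2) \otimes l_1,\\
b_2 &= k_1 \otimes I_2 = (k_1 \otimes k_2) \otimes l_2.
\end{aligned}
```

With a shared effective blur, constancy assumptions between $b_1$ and $b_2$ are no longer broken by the difference between the two kernels.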


Our energy term encompassing Brightness and Gradient Constancy relates to and as follows:


The term presents a spatial gradient and denotes a linear weight. The smoothness regulariser penalizes global variation as follows:


Algorithm 1: Blur-Robust Optical Flow Framework
Input: An image pair , and camera motion , ,
Output: Optimal optical flow field w
A -level top-down pyramid is built with the level index
for coarse to fine do
    Resize , , and with the th scale
    foreach do
        IterBlindDeconv ( )
        DirectFilter ( )
        NonBlindDeconvolve ( )
    endfor
    EnergyOptimisation ( )
endfor

where we apply the Lorentzian regularisation to both the data term and the smoothness term. In our case, image properties such as small details and edges are broken by the camera blur, which leads to additional errors in those regions. We therefore apply strong boundary preservation, even though the non-convex Lorentzian regularisation may make the energy optimisation more difficult (more analysis can be found in Li et al. LME ). In the following section, our optical flow framework is introduced in detail.
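To illustrate why a Lorentzian penalty preserves boundaries, the sketch below uses a common form of the Lorentzian, log(1 + x^2 / (2 sigma^2)); the exact form and scale used in the source are not recoverable from the text, so both are assumptions. The derivative (influence) grows for small residuals and then falls off, so large residuals such as those at motion boundaries are down-weighted.

```python
import math

def lorentzian(x, sigma=1.0):
    """Lorentzian penalty log(1 + x^2 / (2 sigma^2)); non-convex for |x| > sigma * sqrt(2)."""
    return math.log(1.0 + 0.5 * (x / sigma) ** 2)

def lorentzian_influence(x, sigma=1.0):
    """Derivative of the penalty: 2x / (2 sigma^2 + x^2)."""
    return 2.0 * x / (2.0 * sigma ** 2 + x ** 2)
```

By contrast, a quadratic penalty has an influence that grows linearly with the residual, which over-smooths flow discontinuities.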

7 Optical Flow Framework

Our overall framework is outlined in Algorithm 1 based on an iterative top-down, coarse-to-fine strategy. Prior to minimizing the Blur-Robust Optical Flow Energy (Sec. 7.4), a fast blind deconvolution approach FMD is performed for pre-estimation of the blur kernel (Sec. 7.1), which is followed by kernel refinement using our Directional High-pass Filter (Sec. 7.2). All these steps are detailed in the following subsections.

7.1 Iterative Blind Deconvolution

Cho and Lee FMD describe a fast and accurate approach (Cho&Lee) to recover the unique blur kernel. As shown in Algorithm 1, we perform a similar approach for the pre-estimation of the blur kernel within our iterative process, which involves two steps of prediction and kernel estimation. Given the latent image estimated from the consecutively coarser level, the gradient maps of are calculated along the horizontal and vertical directions respectively in order to enhance salient edges and reduce noise in featureless regions of . Next, the predicted gradient maps as well as the gradient map of the blurry image are utilised to compute the pre-estimated blur kernel by minimizing the energy function as follows:


where denotes the weight of Tikhonov regularization and represents a linear weight for the derivatives in different directions. Both and are propagated from the nearest coarse level within the pyramid. To minimise this energy Eq. (12), we follow the inner-iterative numerical scheme of FMD which yields a pre-estimated blur kernel .
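As a rough illustration of kernel estimation from gradient maps with Tikhonov regularisation, the sketch below solves a simplified least-squares problem in closed form in the Fourier domain. This is a swapped-in simplification and not the inner-iterative scheme of FMD that the source follows: it assumes circular boundary conditions, equal weights for the two derivative directions, and hypothetical function names and parameter values.

```python
import numpy as np

def grad_circular(img):
    """Forward differences with wrap-around, along x (columns) and y (rows)."""
    gx = np.roll(img, -1, axis=1) - img
    gy = np.roll(img, -1, axis=0) - img
    return gx, gy

def estimate_kernel(latent, blurred, beta=1e-6):
    """Minimise sum ||grad(latent) (*) k - grad(blurred)||^2 + beta ||k||^2 over k."""
    num = 0.0
    den = beta
    for gl, gb in zip(grad_circular(latent), grad_circular(blurred)):
        Fl, Fb = np.fft.fft2(gl), np.fft.fft2(gb)
        num = num + np.conj(Fl) * Fb
        den = den + np.abs(Fl) ** 2
    k = np.real(np.fft.ifft2(num / den))
    k = np.clip(k, 0.0, None)  # blur kernels are non-negative
    return k / k.sum()         # and sum to one
```

The returned array has the size of the image, with the kernel stored from the top-left corner (the origin of the circular convolution).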

7.2 Directional High-pass Filtering

Once the pre-estimated kernel is obtained, our Directional High-pass Filters are applied to enhance the blur information by reducing noise in the orthogonal direction of the tracked camera motion. Although our RGB-Motion Imaging System provides an intuitively accurate camera motion estimation, outliers may still exist in the synchronisation. We take into account the directional components of two consecutive camera motions and as well as their combination (Fig. 2(d)) for extra robustness. The pre-estimated blur kernel is filtered along its orthogonal direction as follows:


where linearly weights the contribution of filtering in different directions. Note that two consecutive images and are involved in our framework where the former uses the weight set while the other weight set is used for the latter. This filtering process yields an updated blur kernel which is used to update the latent image within a non-blind deconvolution Zhong . Because the convolution operation is computationally expensive in the spatial domain, we consider an equivalent filtering scheme in the frequency domain in the following subsection.

7.3 Convolution for Directional Filtering

Our proposed directional filtering is performed as a convolution operation in the spatial domain, which is often computationally expensive at large image resolutions. In our implementation, we consider a directional filtering scheme in the frequency domain, where the equivalent form of the filtering model Eq. (6) is as follows:


where is the optimal blur kernel in the frequency domain while and present the Fourier Transform of the blur kernel and our directional filter respectively. Thus, the optimal blur kernel in the spatial domain can be calculated as using Inverse Fourier Transform. In this case, the equivalent form of our directional high-pass filter in the frequency domain is defined as follows:


where the line function controls the filtering process along the direction while is the standard deviation for controlling the strength of the filter. Please note that other more sophisticated high-pass filters could also be employed using this directional substitution. Even though this consumes a reasonable proportion of computer memory, convolution in the frequency domain is faster than equivalent computation in the spatial domain.
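A frequency-domain directional high-pass filter of the kind described in this subsection can be sketched as below. The exact filter of the source is not recoverable from the text, so the response used here, 1 - exp(-L^2 / (2 sigma^2)) with the line function L = u cos(theta) + v sin(theta), is an assumed Gaussian-based form; the function names and the default `sigma` are hypothetical.

```python
import numpy as np

def directional_highpass(shape, theta, sigma=0.1):
    """Assumed frequency response 1 - exp(-L^2 / (2 sigma^2)), L = u cos(theta) + v sin(theta)."""
    v = np.fft.fftfreq(shape[0])[:, None]  # vertical frequencies (cycles per pixel)
    u = np.fft.fftfreq(shape[1])[None, :]  # horizontal frequencies (cycles per pixel)
    line = u * np.cos(theta) + v * np.sin(theta)
    return 1.0 - np.exp(-(line ** 2) / (2.0 * sigma ** 2))

def filter_kernel(kernel, theta, sigma=0.1):
    """Filter a blur kernel by point-wise multiplication in the frequency domain."""
    K = np.fft.fft2(kernel)
    return np.real(np.fft.ifft2(K * directional_highpass(kernel.shape, theta, sigma)))
```

The filter only attenuates low frequencies along the chosen direction and leaves the response unchanged along the orthogonal direction, which is the directional substitution idea mentioned above.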

Having performed blind deconvolution and directional filtering (Sec. 7.1, 7.2 and 7.3), two updated blur kernels and on the th level of the pyramid are obtained from input images and respectively, which is followed by the uniform blur image and computation using Eq. (9). In the following subsection, Blur-Robust Optical Flow Energy optimisation on and is introduced in detail.

7.4 Optical Flow Energy optimisation

As mentioned in Sec. 6, our blur-robust energy is continuous but highly nonlinear. Minimisation of such energy functions is extensively studied in the optical flow community. In this section, a numerical scheme combining Euler-Lagrange Equations and Nested Fixed Point Iterations Brox is applied to solve our main energy function Eq. 7. For clarity of presentation, we define the following mathematical abbreviations:

At the first phase of energy minimization, a system is built based on Eq. 7 where Euler-Lagrange is employed as follows:


An -level image pyramid is then constructed from the top coarsest level to the bottom finest level. The flow field is initialized as on the top level and the outer fixed point iterations are applied on w. We assume that the solution converges on the level. We have:


Because of the nonlinearity in terms of , , the system (Eqs. 18, 19) is difficult to solve by linear numerical methods. We apply first order Taylor expansions to remove this nonlinearity in , which results in:

Based on the coarse-to-fine flow assumption of Brox et al. Brox w.r.t. and where the unknown flow field on the next level can be obtained using the flow field and its incremental from the current level . The new system can be presented as follows:


where the terms and contained provide robustness to flow discontinuity on the object boundary. In addition, is also a regularizer for a gradient constraint in motion space. Although we fixed in Eqs. 20 and 21, the nonlinearity in still makes the system difficult to solve. Inner fixed point iterations are applied to remove this nonlinearity: and are assumed to converge within iterations by initializing and . Finally, we have the linear system in and as follows:


where denotes a robustness factor against flow discontinuity and occlusion on the object boundaries. represents the diffusivity of the smoothness regularization.

In our implementation, the image pyramid is constructed with a downsampling factor of 0.75. The final linear system in Eqs. (22, 23) is solved using Conjugate Gradients within 45 iterations.
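The two implementation details above can be sketched in a few lines: the pyramid level sizes obtained with a downsampling factor of 0.75, and a textbook Conjugate Gradient solver capped at 45 iterations. The dense matrix, the stopping tolerance and the minimum level size `min_size` are assumptions for illustration; the linear system of the source comes from the linearised flow equations and is not reproduced here.

```python
import numpy as np

def pyramid_shapes(height, width, factor=0.75, min_size=16):
    """Level sizes from coarsest to finest for a given downsampling factor."""
    shapes = [(height, width)]
    while min(shapes[-1]) * factor >= min_size:
        h, w = shapes[-1]
        shapes.append((int(round(h * factor)), int(round(w * factor))))
    return shapes[::-1]  # coarsest level first

def conjugate_gradient(A, b, max_iter=45, tol=1e-10):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x        # residual
    p = r.copy()         # search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) < tol:
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

In a coarse-to-fine scheme, the solver would be called once per inner fixed point iteration on each pyramid level, starting from the coarsest shape.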

Algorithm 2: Auto Blur-Robust Optical Flow Framework
Input: An image pair , without camera motion
Output: Optimal optical flow field w
A -level top-down pyramid is built with the level index
for coarse to fine do
    Resize , , and with the th scale
    foreach do
        IterBlindDeconv ( )
        DirectFilter ( )
        NonBlindDeconvolve ( )
    endfor
    EnergyOptimisation ( )
    CameraMotionEstimation()
endfor

7.5 Alternative Implementation with Automatic Camera Motion Estimation

As an alternative to using our tracker, we also provide an additional implementation that uses the camera motion estimated generically from the flow field. As shown in Algorithm 2, the system does not take the camera motion as input but computes it (CameraMotionEstimation) at every level of the image pyramid.


On each level, we calculate the Affine Matrix from to using the correspondences and RANSAC. The translation information from is then normalized and converted to the angle format . In this case, our is also downgraded to consider one direction for each level. In the next section, we quantitatively compare our method to other popular baselines.
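The per-level camera motion estimate described above can be sketched with a basic RANSAC loop around a least-squares affine fit. This is a minimal illustration under assumptions: the correspondences are taken to be pixel positions and their flow-displaced positions, and the iteration count, inlier threshold and function names are hypothetical rather than taken from the source.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A such that dst ~= [x, y, 1] @ A.T."""
    X = np.hstack([src, np.ones((len(src), 1))])
    sol, *_ = np.linalg.lstsq(X, dst, rcond=None)  # shape (3, 2)
    return sol.T                                   # shape (2, 3)

def ransac_affine(src, dst, iters=100, thresh=1.0, seed=0):
    """Fit an affine transform that is robust to outlier correspondences."""
    rng = np.random.default_rng(seed)
    X = np.hstack([src, np.ones((len(src), 1))])
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)  # minimal sample
        A = fit_affine(src[idx], dst[idx])
        inliers = np.linalg.norm(X @ A.T - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_affine(src[best], dst[best])  # refit on the largest inlier set

def translation_angle(A):
    """Direction (radians) of the translation part of a 2x3 affine matrix."""
    tx, ty = A[:, 2]
    return np.arctan2(ty, tx)
```

Given a flow field `w` sampled at positions `src`, the correspondences would be `dst = src + w`, and `translation_angle` yields the direction that is passed to the directional filter.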

8 Evaluation

Figure 4: The synthetic blur sequences with the blur kernel, tracked camera motion direction and ground truth flow fields. From Top To Bottom: sequences of RubberWhale, Urban2, Hydrangea and Grove2.

In this section, we evaluate our method on both synthetic and realworld sequences and compare its performance against three existing state-of-the-art optical flow approaches: Xu et al.’s MDP Xu_deblur , Portz et al.’s Portz and Brox et al.’s Brox (an implementation of liu ). MDP is one of the best performing optical flow methods given blur-free scenes, and is one of the top 3 approaches in the Middlebury benchmark Middlebury . Portz et al.’s method represents the current state-of-the-art in optical flow estimation given object blur scenes while Brox et al.’s contains a similar optimisation framework and numerical scheme to Portz et al.’s, and ranks in the midfield of the Middlebury benchmarks based on overall average. Note that all three baseline methods are evaluated using their default parameter settings; all experiments are performed on a computer with a 2.9 GHz 8-core Xeon, an NVIDIA Quadro FX 580 and 16 GB of memory.

In the following subsections, we compare our algorithm (moBlur) and four different implementations (auto, nonGC, nonDF and nonGCDF) against the baseline methods. auto denotes the implementation using the automatic camera motion estimation scheme (Algorithm 2); nonGC represents the implementation without the Gradient Constancy term while nonDF denotes an implementation without the directional filtering process. nonGCDF is the implementation with neither of these features. The results show that our Blur-Robust Optical Flow Energy and Directional High-pass Filter significantly improve algorithm performance for blur scenes in both synthetic and realworld cases.

8.1 Middlebury Dataset with camera shake blur

One advance for evaluating optical flow given scenes with object blur is proposed by Portz et al. Portz where synthetic Ground Truth (GT) scenes are rendered with blurry moving objects against a blur-free static/fixed background. However, their use of synthetic images and controlled object trajectories leads to a lack of global camera shake blur, natural photographic properties and real camera motion behaviour. To overcome these limitations, we render four sequences with camera shake blur and corresponding GT flow-fields by combining sequences from the Middlebury dataset Middlebury with blur kernels estimated using our system.

(a) Left: Quantitative Average Endpoint Error (AEE), Average Angle Error (AAE) and Time Cost (in seconds) comparisons on our synthetic sequences where the subscripts show the rank in relative terms. Right: AEE measure on RubberWhale by ramping up the noise distribution.
(b) Visual comparison on sequences RubberWhale, Urban2, Hydrangea and Grove2 by varying baseline methods. For each sequence, First Row: optical flow fields from different methods. Second Row: the error maps against the ground truth.

Figure 5: Quantitative evaluation on four synthetic blur sequences with both camera motion and ground truth.

In our experiments we select the sequences Grove2, Hydrangea, RubberWhale and Urban2 from the Middlebury dataset. For each of them, four adjacent frames are selected as latent images along with the GT flow field (supplied by Middlebury) for the middle pair. Blur kernels are then estimated FMD from realworld video streams captured using our RGB-Motion Imaging System. As shown in Fig. 4, those kernels are applied to generate blurry images denoted by , , and while the camera motion direction is set for each frame based on the 3D motion data. Although the between latent images can be utilised for the evaluation on relative blur images Sintel ; SintelWS , strong blur can significantly violate the original image intensity, which leads to a multiple correspondences problem: a point in the current image corresponds to multiple points in the consecutive image. To remove such multiple correspondences, we sample a reasonable correspondence set to use as the GT for the blur images where denotes a predefined threshold. Once we obtain , both Average Endpoint Error (AEE) and Average Angle Error (AAE) tests Middlebury are considered in our evaluation. The computation is formulated as follows:


Figure 6: Quantitative comparison between our implementations using RGB-Motion Imaging (moBlur); and automatic camera motion estimation scheme (auto, see Sec. 7.5). From Left To Right: AEE and AAE tests on all sequences respectively; the angular error of camera motion estimated by auto by varying the pyramidal levels of the input images.

Figure 7: AEE measure of our method (moBlur) by varying the input motion directions. (a): the overall measure strategy and error maps of moBlur on sequence Urban2. (b): the quantitative comparison of moBlur against nonDF by ramping up the angle difference . (c): the measure of moBlur against Portz et al. Portz .

where and denote the baseline flow field and the ground truth flow field (after removing multiple correspondences) respectively, while is the number of ground truth vectors in . The factor in AAE is an arbitrary scaling constant to convert the units from pixels to degrees Middlebury . Fig. 5(a) Left shows AEE (in pixels) and AAE (in degrees) tests on our four synthetic sequences. moBlur and nonGC lead both the AEE and AAE tests in all trials. Both Brox et al. and MDP yield significant error in Hydrangea, RubberWhale and Urban2 because those sequences contain large textureless regions with blur, which in turn weakens the inner motion estimation process as shown in Fig. 5(b). Fig. 5(a) also illustrates the average time cost (seconds per frame) of the baseline methods. Our method gives reasonable performance (45 sec. per frame) compared to the state-of-the-art Portz et al. and MDP, even though an inner image deblurring process is involved. Furthermore, Fig. 5(a) Right shows the AEE metric for RubberWhale by varying the distribution of Salt&Pepper noise. It is observed that a higher noise level leads to additional errors for all the baseline methods. Both moBlur and nonGC yield the best performance while Portz et al. and Brox et al. show a similar rising AEE trend when the noise increases.
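The AEE and AAE measures used in this evaluation can be sketched as follows, using the standard Middlebury-style definitions: the endpoint error is the Euclidean distance between flow vectors, and the angular error is the angle between the vectors extended with a third component of 1.0. The array layout (one row per ground truth vector) and the function names are assumptions for the example.

```python
import numpy as np

def aee(flow, gt):
    """Average Endpoint Error in pixels; flow and gt have shape (N, 2) holding (u, v)."""
    return float(np.mean(np.linalg.norm(flow - gt, axis=1)))

def aae(flow, gt):
    """Average Angle Error in degrees between (u, v, 1) and (u_gt, v_gt, 1)."""
    num = 1.0 + np.sum(flow * gt, axis=1)
    den = np.sqrt(1.0 + np.sum(flow ** 2, axis=1)) * np.sqrt(1.0 + np.sum(gt ** 2, axis=1))
    cos = np.clip(num / den, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return float(np.degrees(np.mean(np.arccos(cos))))
```

Only the vectors retained in the sampled ground truth set would be passed to these functions, so pixels with multiple correspondences do not contribute to either score.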

Figure 8: The realworld sequences captured along the tracked camera motion. From Top To Bottom: sequences of warrior, chessboard, LabDesk and shoes.

Fig. 6 shows a quantitative comparison of our two implementations, which use the RGB-Motion Imaging (moBlur) and the automatic camera motion estimation scheme (auto, see Sec. 7.5) respectively. For reference, we also include Portz et al. in this measure. We observe that both of our implementations outperform Portz et al. in the AEE and AAE tests. In particular, moBlur gives the best accuracy in all trials. The implementation auto yields less accurate results than moBlur. This may be because the automatic camera motion estimation is affected by ambiguous blur that is often caused by multiple moving objects. To investigate this issue, we plot the angular error by comparing the auto-estimated camera motion to the ground truth on all the sequences (Fig. 6, right end). We observe that our automatic camera motion estimation scheme leads to higher errors on the upper/coarser levels of the image pyramid. Even though the accuracy improves on the finer levels, the error may accumulate and affect the final result.

In practice, the system may be used in some challenging scenes, e.g. fast camera shaking, super high frame rate capture, or even infrared interference. In those cases, wrong tracked camera motion may be given for some specific frames. To investigate how the tracked camera motion affects the accuracy of our algorithm, we compare moBlur to nonDF (our method without directional filtering) and Portz et al. by varying the direction of the input camera motion. As shown in Fig. 7(a), we rotate the input camera motion vector with respect to the GT blur direction by an angle of degrees. Here represents the ideal situation where the input camera motion has the same direction as the blur direction. The increasing simulates more errors in the camera motion estimation. Fig. 7(b,c) shows the AEE metric by increasing the . We observe that the AEE increases during this test. moBlur outperforms nonDF in both Grove2 and RubberWhale while nonDF provides higher performance in Hydrangea when is larger than . In addition, moBlur outperforms Portz et al. in all trials except Hydrangea where Portz et al. shows a minor advantage (AEE 0.05) when . The rationale behind this experiment is that wrong camera motion may yield significant information loss in the directional high-pass filtering. Such information loss harms the deblurring process and consequently leads to errors in the optical flow estimation. Thus, obtaining precise camera motion is an essential part of this system, as well as a potential direction for future research.

8.2 Realworld Dataset

(a) Visual comparison on realworld sequences of warrior and chessboard.
(b) Visual comparison on realworld sequences of LabDesk and shoes.

Figure 9: Visual comparison of image warping on realworld sequences of warrior, chessboard, LabDesk and shoes, captured by our RGB-Motion Imaging System.

To evaluate our method on realworld scenes, we capture four sequences, warrior, chessboard, LabDesk and shoes, with tracked camera motion using our RGB-Motion Imaging System. As shown in Fig. 8, both warrior and chessboard contain occlusions, large displacements and depth change, while LabDesk and shoes contain object motion blur and large textureless regions within the same scene. Fig. 9 shows a visual comparison of our method moBlur against Portz et al. on these realworld sequences. Our method preserves appearance details on the object surface and reduces boundary distortion after warping using the flow field. In addition, our method is robust in cases where multiple types of blur exist in the same scene (Fig. 9(b), sequence shoes).
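The warping used for this kind of visual comparison can be sketched as a backward warp with bilinear interpolation. The sketch assumes the usual convention that the flow maps pixels of the first frame to positions in the second frame. It is written in plain Python over nested lists for clarity, the function names are hypothetical, and it is not the implementation behind Fig. 9.

```python
import math


def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a real-valued position.

    Positions outside the image are clamped to the nearest border pixel.
    """
    h, w = len(image), len(image[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def warp_backward(image2, flow):
    """Warp frame 2 toward frame 1: out(x, y) = image2(x + u, y + v).

    flow[y][x] holds the (u, v) displacement of pixel (x, y) in frame 1.
    """
    h, w = len(image2), len(image2[0])
    return [[bilinear_sample(image2, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(w)]
            for y in range(h)]
```

With an accurate flow field the warped second frame should closely resemble the first frame, so loss of surface detail or distorted boundaries in the warped image points to errors in the estimated flow.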

9 Conclusion

In this paper, we introduce a novel dense tracking framework that interleaves a popular iterative blind deconvolution with a warping based optical flow scheme. We also investigate the blur parameterization for video footage. In our evaluation, we highlight the advantages of using both the extra motion channel and the directional filtering in optical flow estimation for blurry video footage. Our experiments also demonstrate the improved accuracy of our method under large camera shake blur in both noisy synthetic and realworld cases. One limitation of our method is that the spatial invariance assumption for the blur is not valid in some realworld scenes, which may reduce accuracy where the object depth changes significantly. Finding a depth-dependent deconvolution and a deep data-driven model remains a challenge for future work.

10 Acknowledgements

We thank Ravi Garg and Lourdes Agapito for providing their GT datasets. We also thank Gabriel Brostow and the UCL Vision Group for their helpful comments. The authors are supported by the EPSRC CDE EP/L016540/1 and CAMERA EP/M023281/1; and EPSRC projects EP/K023578/1 and EP/K02339X/1.



References

  • (1) T. Portz, L. Zhang, H. Jiang, Optical flow in the presence of spatially-varying motion blur, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’12), 2012, pp. 1752–1759.
  • (2) S. Cho, S. Lee, Fast motion deblurring, ACM Transactions on Graphics (TOG’09) 28 (5) (2009) 145.
  • (3) L. Zhong, S. Cho, D. Metaxas, S. Paris, J. Wang, Handling noise in single image deblurring using directional filters, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13), 2013, pp. 612–619.
  • (4) Y.-W. Tai, H. Du, M. S. Brown, S. Lin, Image/video deblurring using a hybrid camera, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’08), 2008, pp. 1–8.
  • (5) A. Levin, P. Sand, T. S. Cho, F. Durand, W. T. Freeman, Motion-invariant photography, ACM Transactions on Graphics (TOG’08) 27 (3) (2008) 71.
  • (6) N. Joshi, S. B. Kang, C. L. Zitnick, R. Szeliski, Image deblurring using inertial measurement sensors, ACM Transactions on Graphics (TOG’10) 29 (4) (2010) 30.
  • (7) C. Godard, P. Hedman, W. Li, G. J. Brostow, Multi-view reconstruction of highly specular surfaces in uncontrolled environments, in: 3D Vision (3DV), 2015 International Conference on, IEEE, 2015, pp. 19–27.
  • (8) Z. Lv, A. Tek, F. Da Silva, C. Empereur-Mot, M. Chavent, M. Baaden, Game on, science-how video game technology may help biologists tackle visualization challenges, PloS one 8 (3) (2013) e57990.
  • (9) Z. Lv, A. Halawani, S. Feng, H. Li, S. U. Réhman, Multimodal hand and foot gesture interaction for handheld devices, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 11 (1s) (2014) 10.
  • (10) B. Horn, B. Schunck, Determining optical flow, Artificial Intelligence 17 (1-3) (1981) 185–203.
  • (11) T. Brox, A. Bruhn, N. Papenberg, J. Weickert, High accuracy optical flow estimation based on a theory for warping, in: European Conference on Computer Vision (ECCV’04), 2004, pp. 25–36.
  • (12) L. Xu, S. Zheng, J. Jia, Unnatural l0 sparse representation for natural image deblurring, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13), 2013, pp. 1107–1114.
  • (13) W. Li, D. Cosker, M. Brown, An anchor patch based optimisation framework for reducing optical flow drift in long image sequences, in: Asian Conference on Computer Vision (ACCV’12), Springer, 2012, pp. 112–125.
  • (14) W. Li, D. Cosker, M. Brown, R. Tang, Optical flow estimation using laplacian mesh energy, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’13), IEEE, 2013, pp. 2435–2442.
  • (15) W. Li, D. Cosker, M. Brown, Drift robust non-rigid optical flow enhancement for long sequences, Journal of Intelligent and Fuzzy Systems 0 (0) (2016) 12.
  • (16) W. Li, Y. Chen, J. Lee, G. Ren, D. Cosker, Robust optical flow estimation for continuous blurred scenes using rgb-motion imaging and directional filtering, in: IEEE Winter Conference on Application of Computer Vision (WACV’14), IEEE, 2014, pp. 792–799.
  • (17) R. Tang, D. Cosker, W. Li, Global alignment for dynamic 3d morphable model construction, in: Workshop on Vision and Language (V&LW’12), 2012.
  • (18) W. Li, Nonrigid surface tracking, analysis and evaluation, Ph.D. thesis, University of Bath (2013).
  • (19) Y. Schoueri, M. Scaccia, I. Rekleitis, Optical flow from motion blurred color images, in: Canadian Conference on Computer and Robot Vision, 2009.
  • (20) L. Yuan, J. Sun, L. Quan, H.-Y. Shum, Progressive inter-scale and intra-scale non-blind image deconvolution, ACM Transactions on Graphics (TOG’08) 27 (3) (2008) 74.
  • (21) Y. HaCohen, E. Shechtman, D. B. Goldman, D. Lischinski, Non-rigid dense correspondence with applications for image enhancement, ACM Transactions on Graphics (TOG’11) 30 (4) (2011) 70.
  • (22) J. Lee, V. Baines, J. Padget, Decoupling cognitive agents and virtual environments, in: Cognitive Agents for Virtual Environments, 2013, pp. 17–36.
  • (23) Q. Shan, J. Jia, A. Agarwala, High-quality motion deblurring from a single image, ACM Transactions on Graphics (TOG’08) 27 (3) (2008) 73.
  • (24) Y.-W. Tai, S. Lin, Motion-aware noise filtering for deblurring of noisy and blurry images, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR’12), 2012, pp. 17–24.
  • (25) X. Chen, J. Yang, Q. Wu, J. Zhao, X. He, Directional high-pass filter for blurry image analysis, Signal Processing: Image Communication 27 (2012) 760–771.
  • (26) C. Liu, Beyond pixels: exploring new representations and applications for motion analysis, Ph.D. thesis, Massachusetts Institute of Technology (2009).
  • (27) S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, R. Szeliski, A database and evaluation methodology for optical flow, International Journal of Computer Vision (IJCV’11) 92 (2011) 1–31.
  • (28) D. J. Butler, J. Wulff, G. B. Stanley, M. J. Black, A naturalistic open source movie for optical flow evaluation, in: European Conference on Computer Vision (ECCV’12), 2012, pp. 611–625.
  • (29) J. Wulff, D. J. Butler, G. B. Stanley, M. J. Black, Lessons and insights from creating a synthetic optical flow benchmark, in: ECCV Workshop on Unsolved Problems in Optical Flow and Stereo Estimation (ECCVW’12), 2012, pp. 168–177.