Asymmetric Bilateral Phase Correlation for Optical Flow Estimation in the Frequency Domain

11/01/2018 · Vasileios Argyriou, et al. · Kingston University

We address the problem of motion estimation in images operating in the frequency domain. A method is presented which extends phase correlation to handle multiple motions present in an area. Our scheme is based on a novel Bilateral-Phase Correlation (BLPC) technique that incorporates the concept and principles of Bilateral Filters retaining the motion boundaries by taking into account the difference both in value and distance in a manner very similar to Gaussian convolution. The optical flow is obtained by applying the proposed method at certain locations selected based on the present motion differences and then performing non-uniform interpolation in a multi-scale iterative framework. Experiments with several well-known datasets with and without ground-truth show that our scheme outperforms recently proposed state-of-the-art phase correlation based optical flow methods.




I Introduction

Optical flow estimation is one of the fundamental problems in computer vision [16, 24], with significant applications such as structure from motion, SLAM, 3D reconstruction [3], object and pedestrian tracking, face, object or building detection, behaviour recognition [26, 8], robot or vehicle navigation, image super-resolution, medical image registration, restoration, and compression. Dense motion estimation, or optical flow, is an ill-posed problem, defined as the process of approximating the 3D movement in a scene from a 2D image sequence based on the illumination changes due to camera or object motion. The outcome of this process is a dense vector field providing motion information for each pixel in the image sequence.

Several aspects make this problem particularly challenging, including occlusions (i.e. points can appear or disappear between two frames) and the aperture problem (i.e. regions of uniform appearance where local cues cannot provide any information about the motion). In the literature there are many approaches to overcome these challenges and compute the optical flow, and most of them belong to one of the following categories. Block matching techniques [31] are based on the assumption that all pixels in a block undergo the same motion; they are mainly applied for standards conversion and in international standards for video communications such as MPEGx and H.26x. Gradient-based techniques [23, 32] utilise the spatio-temporal image gradients, also introducing robust or smoothness constraints [50]. Bayesian techniques [19] utilise probabilistic smoothness constraints or Markov Random Fields (MRFs) over the entire image, focusing on finding the Maximum a Posteriori (MAP) solution; these methods are also combined with multi-scale schemes, offering more accurate and robust motion vectors. GPU-based approaches using local operations have been proposed [20], providing accurate estimates. Finally, phase correlation techniques [43] operate in the frequency domain, computing the optical flow by applying phase correlation locally to the frames.

Fig. 1: An example of inaccurate or blurred motion vectors around motion boundaries or depth discontinuities.

At the core of these methods we usually have phase correlation [36, 44], which has become one of the motion estimation methods of choice for a wide range of professional studio and broadcasting applications. Phase Correlation (PC) and other frequency domain approaches (based on the shift property of the Fourier Transform (FT)) offer speed through the use of FFT routines and enjoy a high degree of accuracy, featuring several significant properties: immunity to uniform variations of illumination, insensitivity to changes in spectral energy, and excellent peak localization accuracy. Furthermore, PC provides sub-pixel accuracy, which, as theoretical and experimental analyses have suggested, has a significant impact on motion compensated error performance and on image registration for tracking, recognition, super-resolution and other applications. Sub-pixel accuracy can mainly be achieved through the use of bilinear interpolation, which is also applicable to frequency domain motion estimation methods.

One of the main issues of frequency domain registration methods is that it is hard to identify the correct motion parameters when multiple motions are present in a region. This issue is most pronounced around motion boundaries or depth discontinuities, resulting in inaccurate or blurred motion vectors (see figure 1). One of the most common approaches to overcome it is to incorporate block matching, computing the motion compensated error for a set of candidate motions [1, 40, 39]. Another solution could be to reduce the block/window size used to apply Phase Correlation, but this further degrades the accuracy of the estimates, since large blocks of image data are required to obtain reliable motion estimates. Also, smaller windows will not support large motion vectors. The problem becomes more visible when densely overlapped windows are used and when accurate motion border estimation is essential. Another issue that affects optical flow techniques is the overall complexity, especially for high resolution images or videos (4K-UltraHD), given the computational power available, particularly on mobile devices. Additionally, since the number of real-time computer vision applications constantly increases, the overall performance and complexity should allow real-time or near real-time optical flow estimation for such high resolutions [42].

In this paper we introduce a novel high-performance optical flow algorithm operating in the frequency domain based on the principles of Phase Correlation. The overall complexity is very low, allowing near real-time flow estimation for very high resolution video sequences. In order to overcome the problem of blurred or erroneous motion estimates at motion borders, and to provide accurate and sharp motion vectors, a novel Asymmetric Bilateral-Phase Correlation technique is introduced, incorporating the concept and principles of Bilateral Filters. We propose to take into account the difference in value between neighbours to preserve motion edges, in combination with a weighted average of nearby pixels, in a manner very similar to Gaussian convolution, integrated into the Phase Correlation process. The key idea of the bilateral filter is that for a pixel to influence another pixel, it should not only occupy a nearby location but also have a similar value [45, 14]. Finally, subpixel accuracy is obtained through the use of fast and simple interpolation schemes. Experiments with well-known datasets of video sequences with and without ground truth have shown that our scheme performs significantly better than other state-of-the-art frequency domain optical flow methods, while in comparison with state-of-the-art methods overall it provides very accurate results with low complexity, especially for high resolution 4K video sequences. In summary, our contributions are: 1) A hierarchical framework for optical flow estimation, applying motion estimation techniques only to a small number of regions of interest, selected based on the motion compensated prediction error, allowing us to control the overall complexity and required computational power. 2) A novel asymmetric bilateral filter operating simultaneously on two separate frames and using unequal block sizes, allowing us to carry the local spatial properties and intensity constraints of one frame over to the other (i.e. using the properties of a small block/window to filter the whole image). 3) Integration of the proposed asymmetric bilateral filter into the Phase Correlation process, applicable either in the pixel domain as a pre-processing step or directly in the frequency domain as a multiplication in a higher dimensional space.

This paper is organised as follows. In Section 2, we review the state-of-the-art in optical flow estimation using phase correlation and other pixel or gradient based approaches. In Section 3, we discuss the principles of the proposed Bilateral Phase Correlation (BLPC) and the key features of this optical flow framework are analysed. In Section 4 we present experimental results while in Section 5 we draw conclusions arising from this paper.

II Related work

In the following section, we review the major optical flow methodologies including frequency domain approaches, and energy minimization methods developed for computer vision applications.

Fig. 2: Example of correlation surface characterised by the presence of two peaks corresponding to the present motions.

A brief review of current state-of-the-art Fourier-based methods for optical flow estimation is presented in [27]. Several subpixel motion estimation and image registration algorithms operating in the frequency domain have been introduced [47, 52, 18, 34, 48, 11, 46, 13, 35]. In [22], Hoge proposes to apply a rank-1 approximation to the phase difference matrix and then performs unwrapping to estimate the motion vectors. The work in [25] is a noise-robust extension of [22], where the noise is assumed to be Additive White Gaussian Noise (AWGN). The authors in [7] derive the exact parametric model of the phase difference matrix and solve an optimization problem to fit the analytic model to the noisy data. To estimate the subpixel shifts, Stone et al. fit the phase values to a 2D linear function using linear regression, after masking out frequency components corrupted by aliasing. An extension of this method that additionally estimates planar rotation has been proposed in [49]. Foroosh et al. [15] showed that the phase correlation function is the Dirichlet kernel and provided analytic results for the estimation of the subpixel shifts using an approximation. Finally, a fast method for subpixel estimation based on FFTs has been proposed in [38]. Notice that the above methods either assume aliasing-free images [15, 37, 38, 7, 4, 2], or cope with aliasing by frequency masking [41, 25, 22, 49], which requires fine tuning.

Over the last decades, different optical flow methods based on the above Phase Correlation techniques have been proposed. In [44] Thomas proposes a vector measurement method that has the accuracy of the Fourier techniques, combined with the ability to measure the motion of multiple objects in a scene. The method is also able to assign vectors to individual pixels if required. The authors in [51] proposed a raster-scanning method that moves a window over the image pair and estimates the motion of each small window with their compound phase correlation algorithm. To improve the fitting accuracy of the phase difference plane, a phase fringe filter is utilised in the Fourier domain, following an improved extension of the work in [22]. The main issue with this approach is that it is not clear how it performs in the presence of multiple motions; furthermore, it requires the motion boundaries to be quite distinct. In [21] a regular grid of patches is generated and the optical flow is estimated by calculating the phase correlation of each pair of co-sited patches using the Fourier Mellin Transform (FMT). This approach allows the estimation not only of translation but also of scale and rotation of image patches. Its main limitation is that it cannot handle multiple motions and preserve the motion edges. In the work presented in [5] the authors use an adaptive grid to extract image patches and then apply gradient correlation in an iterative process, using 2D median filters to overcome the issues related to multiple motions. Despite the filtering stage, motion blurring is not avoided in many cases, due to local characteristics that may affect the filter. The authors in [40, 1, 39, 17] proposed a solution based on the concept suggested by Thomas in [44]: defining a large enough set of candidate motion vectors, and using a combinatorial optimization algorithm (such as block-matching) to find, for each point of interest, the candidate which best represents the motion at that location. This is the main difference from the proposed method, since in their work they try to minimize the motion compensated error directly. Also, they use large windows, and the obtained set contains the most representative maxima of phase correlation between the two input images, computed for different overlapping regions. This approach provides better accuracy and contains fewer spurious candidates, but the problem with motion boundaries or depth discontinuities remains, since it depends on the matching algorithm, resulting in inaccurate estimates at the borders.

Fig. 3: An example of obtained images showing that the proposed approach retains pixels that are close to the centre of the window and have intensities similar to the values of the pixels in the neighbourhood of the centre. In these images we can visually observe the effect of the proposed novel asymmetric Bilateral filter integrated into the PC method. In more detail, in the first image we have the small red window used to extract the filter restrictions (spatial and intensity constraints), which are then applied to the whole second frame. This allows us to correlate the two frames using only the Bilateral filter information inside the red square.

III Bilateral-PC for optical flow estimation

In this work, we propose a new optical flow estimation technique operating in the frequency domain based on a novel extension of Phase Correlation that incorporates the principles of bilateral filters. Initially an overview of Phase Correlation is presented, followed by an analysis of the proposed Bilateral-PC technique. Finally, a novel framework for efficient optical flow estimation in the Fourier domain is discussed supporting high resolution images or videos (4K).

Let $f_1(x, y)$ and $f_2(x, y)$ be two image functions, related by an unknown translation $(d_x, d_y)$:

$$f_2(x, y) = f_1(x - d_x, y - d_y) \qquad (1)$$

Phase correlation schemes are utilised to estimate this translational displacement. Considering each image as a continuous periodic image function, the Fourier transform of $f_1$ is given by

$$F_1(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f_1(x, y)\, e^{-j2\pi(ux/M + vy/N)} \qquad (2)$$

where $u = 0, \ldots, M-1$, $v = 0, \ldots, N-1$, and $M \times N$ is the image size.

Moving to the shifted version of the image, $f_2$, its DFT is given, based on the Fourier shift property and assuming no aliasing, by

$$F_2(u, v) = F_1(u, v)\, e^{-j2\pi(u d_x/M + v d_y/N)} \qquad (3)$$

Fig. 4: The overall methodology and the related stages for optical flow estimation applied in each layer of the image pyramid.
Fig. 5: An example of the selected pixel locations by applying first uniform sampling (first from left), and then calculating the frame difference (second), applying a binary threshold (third) and finally combining both (fourth).
Fig. 6: Selected frames from the datasets used in our evaluation.
Fig. 7: Selected frames from the artificial examples used in our evaluation. In the last two columns we can see the obtained colour-coded optical flows using PC and BLPC, respectively. It is clear how well the proposed BLPC method distinguishes the multiple motions at each pixel location, without being affected by the background movement, in comparison to PC.

Since the shift property of the Fourier Transform refers to integer shifts and aliasing-free signals are assumed, we regard our sampling device as eliminating aliasing. Traditionally, phase correlation (PC) is used to estimate the translational displacement, and it is perhaps the most widely used correlation-based method in image registration. It finds the maximum of the phase difference function, which is defined as the inverse FT of the normalized cross-power spectrum

$$C(x, y) = \mathcal{F}^{-1}\left\{ \frac{F_1(u, v)\, F_2^{*}(u, v)}{\left| F_1(u, v)\, F_2^{*}(u, v) \right|} \right\} \qquad (4)$$

where $^{*}$ denotes the complex conjugate and $\mathcal{F}^{-1}$ the inverse Fourier transform.
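As a concrete illustration, the phase difference function described above can be sketched in a few lines of numpy. This is a minimal sketch for integer, circular shifts only; the conjugate is placed so that a positive shift of the second frame yields a positive displacement:

```python
import numpy as np

def phase_correlation(f1, f2, eps=1e-12):
    """Estimate the integer translation between two equally sized frames via
    the inverse FFT of the normalized cross-power spectrum."""
    F1 = np.fft.fft2(f1)
    F2 = np.fft.fft2(f2)
    cross = np.conj(F1) * F2  # conjugation order chosen for sign convention
    c = np.fft.ifft2(cross / (np.abs(cross) + eps)).real
    # The peak location gives the displacement (modulo the frame size).
    dy, dx = np.unravel_index(np.argmax(c), c.shape)
    # Map peaks in the upper half-range to negative shifts.
    if dy > f1.shape[0] // 2:
        dy -= f1.shape[0]
    if dx > f1.shape[1] // 2:
        dx -= f1.shape[1]
    return dx, dy

# A frame and a copy cyclically shifted by 5 rows and 3 columns.
rng = np.random.default_rng(0)
f1 = rng.random((64, 64))
f2 = np.roll(f1, shift=(5, 3), axis=(0, 1))
print(phase_correlation(f1, f2))  # -> (3, 5)
```

For a pure shift the correlation surface is a delta function, so the argmax recovers the displacement exactly; real image pairs produce a broader peak, motivating the sub-pixel refinement discussed later.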

Fig. 8: The ratio of the highest over the second highest peak of each correlation surface, over each pixel, for all the artificial examples. There are three rows of images and two pairs of columns. In each column pair, the left image is obtained using the classical PC method, while the other corresponds to BLPC. The images demonstrate the improved accuracy of the BLPC method, since we have higher peaks in the correlation surfaces, especially at the borders of the moving objects (clear shapes and borders of the moving objects in relation to figure 7).

III-A Proposed methodology for Bilateral-PC

One of the main problems that we encounter with Phase Correlation is the unreliable estimates in the presence of multiple motions (i.e. motion boundaries or depth discontinuities). Considering the example shown in figure 2, where two motions are present (a foreground object and the background), the obtained correlation surface is characterised by the presence of two peaks corresponding to the two motions. As a result, it is not feasible to identify the correspondence between pixels and the estimated motion vectors. This problem is more pronounced when a dense vector field (optical flow) is estimated using overlapped windows. In such cases the estimated motion vector is assigned only to the pixel located at the centre of each window. To overcome this problem a novel Bilateral Phase Correlation technique is proposed, based on the principles of Bilateral filters. This filter family is selected due to its higher accuracy in preserving object edges, or in our case motion edges. Bilateral-PC (BLPC) takes into account the difference in value between neighbours to preserve motion edges, in combination with a weighted average of nearby pixels, which allows a pixel to influence another not only if it occupies a nearby location but also if it has a similar value.

The bilateral filter is defined as a weighted average of nearby pixels, in a manner very similar to Gaussian convolution:

$$GC[f_i]_{\mathbf{p}} = \sum_{\mathbf{q}} G_{\sigma}(\|\mathbf{p} - \mathbf{q}\|)\, f_i(\mathbf{q}) \qquad (5)$$

where $i$ indicates the frame/window (see equation (1)), $\sum_{\mathbf{q}}$ denotes a sum over all image pixels indexed by $\mathbf{q}$, and $\|\cdot\|$ represents the norm, e.g., $\|\mathbf{p} - \mathbf{q}\|$ is the Euclidean distance between pixel locations $\mathbf{p}$ and $\mathbf{q}$. Also, $G_{\sigma}$ denotes the 2D Gaussian kernel

$$G_{\sigma}(x) = \frac{1}{2\pi\sigma^2} \exp\!\left( -\frac{x^2}{2\sigma^2} \right)$$

PC 0.041 16.23 27.0 92.4 21.02
BLPC 30.3
TABLE I: Results for the artificial dataset using several performance measures applying the methods on all the pixel locations.
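For reference, the full bilateral filter (the spatial Gaussian above combined with a range weight on intensity differences) can be implemented naively as follows. This is a direct, unoptimized sketch assuming a grayscale float image; the window radius and the two sigma values are illustrative defaults, not the paper's settings:

```python
import numpy as np

def bilateral_filter(img, sigma_s=2.0, sigma_r=0.1, radius=4):
    """Naive bilateral filter: each output pixel is a weighted average of its
    neighbours, weighted both by spatial distance (sigma_s) and by intensity
    difference (sigma_r), so edges are preserved while flat areas smooth."""
    h, w = img.shape
    out = np.zeros_like(img, dtype=float)
    pad = np.pad(img, radius, mode='edge')
    # Precompute the spatial Gaussian over the (2r+1)x(2r+1) window.
    ax = np.arange(-radius, radius + 1)
    yy, xx = np.meshgrid(ax, ax, indexing='ij')
    g_s = np.exp(-(xx**2 + yy**2) / (2 * sigma_s**2))
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2*radius + 1, j:j + 2*radius + 1]
            g_r = np.exp(-(patch - img[i, j])**2 / (2 * sigma_r**2))
            wgt = g_s * g_r
            out[i, j] = (wgt * patch).sum() / wgt.sum()
    return out
```

On a step edge, pixels across the edge receive a negligible range weight, so the edge survives the smoothing; this is exactly the property BLPC exploits for motion boundaries.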

Additionally, the bilateral filter takes into account the difference in value between the neighbours to preserve edges while smoothing. Therefore, for a pixel to influence another one, it should not only be at a close distance but also have a similar value. Thus, the bilateral filter is defined as

$$BF[f_i]_{\mathbf{p}} = \frac{1}{W_{\mathbf{p}}} \sum_{\mathbf{q}} G_{\sigma_s}(\|\mathbf{p} - \mathbf{q}\|)\, G_{\sigma_r}(|f_i(\mathbf{p}) - f_i(\mathbf{q})|)\, f_i(\mathbf{q}) \qquad (6)$$

where $|\cdot|$ denotes the absolute value and $W_{\mathbf{p}}$ is the normalization factor. The parameters $\sigma_s$ and $\sigma_r$ specify the amount of filtering for the image $f_i$. The bilateral filter is not a convolution of the spatial weight $G_{\sigma_s}$ with the product $G_{\sigma_r} f_i$, because the range weight depends on the pixel value $f_i(\mathbf{p})$. Considering that, and selecting a fixed intensity value $\iota$ equal to the intensity of the pixel at the centre $\mathbf{p}_c$ of the selected window of size $w \times w$, we can overcome this problem. Then, the product $G_{\sigma_r}(|\iota - f_i(\mathbf{q})|)\, f_i(\mathbf{q})$ is computed and convolved with the Gaussian kernel $G_{\sigma_s}$, resulting, after normalisation, in the same outcome as the bilateral filter at all pixels $\mathbf{p}$ with $f_i(\mathbf{p}) = \iota$. Instead of using a single intensity $\iota$, a set of values $\mathcal{I}$ obtained from the neighbourhood of $\mathbf{p}_c$ using a window (e.g. $3 \times 3$) is selected. As a result we obtain

$$H_{\iota}(\mathbf{q}) = G_{\sigma_r}(|\iota - f_i(\mathbf{q})|)\, f_i(\mathbf{q}), \qquad W_{\iota}(\mathbf{q}) = G_{\sigma_r}(|\iota - f_i(\mathbf{q})|), \qquad \iota \in \mathcal{I}$$


After the convolution with the spatial kernel and the normalisation we have

$$\tilde{f}_i^{\,\iota} = \frac{G_{\sigma_s} \otimes H_{\iota}}{G_{\sigma_s} \otimes W_{\iota}} \qquad (9)$$

where the division is per-pixel and the denominator corresponds to the total sum of the weights. The final images $\tilde{f}_1$ and $\tilde{f}_2$ are estimated by upsampling and then interpolating equation (9). Also, $w$ is selected to be much smaller than the image size. An example of the obtained images is shown in figure 3; observing the outcomes, it can be seen that this approach retains pixels that are close to the centre of the window and have intensities similar to the values of the pixels in the neighbourhood of the centre.
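The asymmetric idea can be sketched as follows: the intensity constraints are taken from a small window of the first frame, and used as range weights to filter the whole of the second frame, with the fixed-intensity bilateral approximation (range-weighted image, spatially blurred, then normalised). All names, the combination rule over the reference intensities, and the parameter values are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def gauss_blur(img, sigma):
    """Separable Gaussian blur implemented with two 1D convolutions."""
    r = int(3 * sigma)
    k = np.exp(-np.arange(-r, r + 1) ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, 'same'), 1, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, 'same'), 0, tmp)

def asymmetric_bilateral(f1, f2, center, win=3, sigma_s=2.0, sigma_r=0.1):
    """Sketch of an asymmetric bilateral filter: range constraints come from a
    win x win window of frame f1 around `center`; the data being filtered is
    the whole of frame f2."""
    cy, cx = center
    h = win // 2
    refs = f1[cy - h:cy + h + 1, cx - h:cx + h + 1].ravel()  # intensity set I
    num = np.zeros_like(f2, dtype=float)
    den = np.zeros_like(f2, dtype=float)
    for iota in refs:
        w = np.exp(-(f2 - iota) ** 2 / (2 * sigma_r ** 2))  # range weights
        num += gauss_blur(w * f2, sigma_s)  # smoothed, range-weighted data
        den += gauss_blur(w, sigma_s)       # smoothed weight totals
    return num / (den + 1e-12)
```

The output suppresses everything in the second frame that is dissimilar in intensity to the reference window, which is what leaves a single dominant peak in the subsequent correlation.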

Once the filtered images $\tilde{f}_1$ and $\tilde{f}_2$ are obtained and transformed to the Fourier domain, equation (4) is used to estimate the correlation surface. Since the filtering process retains only the pixels related to the one at the centre, the correlation surface contains a single dominant peak, which yields an estimate of the shift parameters, recovered as

$$(d_x, d_y) = \arg\max_{(x, y)} \{ C(x, y) \}$$
Finally, subpixel accuracy is obtained through the use of fast interpolation schemes [37]. In more detail, the subpixel estimates in the $x$ and $y$ directions are obtained by interpolating the correlation values adjacent to the dominant peak.
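The exact interpolation scheme of [37] is not reproduced here; a common alternative for refining a correlation peak to sub-pixel precision is a three-point parabolic fit per axis, shown below as an illustrative substitute rather than the paper's estimator:

```python
def parabolic_peak(c_m, c_0, c_p):
    """Sub-pixel refinement of an integer peak: fit a parabola through the
    peak value c_0 and its two neighbours c_m (left) and c_p (right), and
    return the vertex offset, which lies in (-0.5, 0.5) when c_0 is the max."""
    return (c_p - c_m) / (2 * (2 * c_0 - c_m - c_p))
```

Applied once along each axis of the correlation surface around the integer peak, this shifts the estimate toward the larger of the two neighbours, e.g. `parabolic_peak(0.4, 1.0, 0.6)` gives an offset of 0.1 toward the right neighbour.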

Measures MSE PSNR NRMS Time(s)
Thomas [44] 5.492 24.682 199.55 44.822
Argyr. [5] 5.453 24.747 198.83 45.591
Gaut. [17] 21.89 17.847 408.68 236.68
Yan [51] 5.498 24.679 199.67 2594.7
Ho [21] 3.263 26.753 159.56 291.64
Reyes [40] 1.934 29.240 121.72 54.569
Alba [39] 2.043 28.775 126.81 66.483
BLPC 0.908 32.680 81.64 48.669
TABLE II: The evaluation results for the 4K Test Sequences Reviving the Classics dataset [30] without ground truth, using several performance measures.
Tho [44] 2.30 21.24 40.31 62.0 4.63 9.71
Arg [5] 2.34 21.31 40.25 60.6 4.62 11.47
Gau [17] 2.40 17.45 63.76 25.1 8.22 8.40
Yan [51] 2.31 21.22 40.41 62.4 4.64 9.78
Ho [21] 3.14 24.43 31.31 23.4 3.88 74.3
Rey [40] 1.97 26.42 20.88 11.2 3.38 2.72
Alb [39] 2.06 26.25 21.64 10.2 3.38 7.59
BLPC 1.97 26.68 20.57 9.9 3.37 2.96
TABLE III: The evaluation results for the dataset with ground truth provided by Baker [6], using several performance measures.
Measures MSE PSNR NRMS Time(s)
Thomas [44] 1.4471 33.082 21.108 12.784
Argyr. [5] 1.4352 33.093 21.046 16.009
Gaut. [17] 23.143 18.382 78.725 10.452
Yan [51] 1.4488 33.08 21.117 13.63
Ho [21] 1.081 33.81 18.621 132.56
Reyes [40] 0.3299 36.999 10.218 1.3671
Alba [39] 0.29531 36.891 10.343 8.6594
BLPC 0.2595 37.419 9.0248 0.9956
TABLE IV: The evaluation results for samples of videos with faces from Dhall's dataset [12] without ground truth, using several performance measures.
Measures MSE PSNR NRMS Time(s)
Thomas [44] 1.7381 28.755 25.455 13.526
Argyr. [5] 1.7381 28.756 25.455 16.661
Gaut. [17] 7.2814 21.807 54.722 10.535
Yan [51] 1.7382 28.755 25.455 17.148
Ho [21] 1.8028 28.648 25.882 131.35
Reyes [40] 1.6309 28.947 24.724 0.9989
Alba [39] 1.639 28.93 24.784 6.2942
BLPC 1.6283 28.95 24.705 0.4863
TABLE V: The evaluation results for the CT dataset from Liang [28, 29] without ground truth, using several performance measures.
Tho [44] 19.2 17.51 74.68 80.5 19.57 11
Arg [5] 19.1 17.53 74.63 79.9 19.56 10.9
Gau [17] 30.4 15.62 96.31 66.7 76.03 7.9
Yan [51] 19.2 17.53 74.75 80.8 19.6 9.6
Ho [21] 16.2 18.58 68.89 69.4 18.7 91
Rey [40] 7.5 22.04 47.11 30.8 11.33 1.5
Alb [39] 8.5 21.45 50.51 34.3 12.29 9.4
BLPC 7.2 22.82 46.19 23.3 7.44 1.6
TABLE VI: The evaluation results for the UCLgtOFv1.1 dataset with ground truth [33] using several performance measures.

III-B Optical flow estimation framework

In this section, we present the proposed optical flow estimation framework using the introduced Bilateral-PC. The overall methodology is separated into the stages shown in figure 4. The approach is based on a multi-resolution coarse-to-fine algorithm that constructs an image pyramid by down-sampling the original image into a maximum of three layers by factors that are powers of two. The optical flow is first estimated for the smallest image pair in the pyramid, and is then used to unwarp the next pair by up-sampling and scaling the previous layer's estimates.

The first stage of this framework includes all the pre-processing tasks, such as pyramid formation, estimation of key points, and removal of the less significant ones. Initially, the image is down-sampled by factors of powers of two until the obtained resolution falls below a threshold that depends on the available computational power or the desired complexity of the system. If the down-sampling process was applied more than once before reaching the minimum resolution threshold, one more layer is extracted in-between the smallest one and the original size image. Once the three layers are obtained (small, medium, and the original), key points are estimated on the smallest one. Let $P_u$ be a set of point locations in the smallest layer, selected by applying uniform sampling with a fixed step (see figure 5). Also, the absolute frame difference $D$ is calculated, and then a second set of points is obtained as

$$P_d = \{\, \mathbf{p} : D(\mathbf{p}) > \alpha \max(D) \,\}$$

where $\alpha$ is a scaling factor (see figure 5). The impact of this threshold is proportional to the overall complexity, and adjusting its value can improve or reduce the overall quality, with an impact on the computational cost. The selected values experimentally proved to provide a good balance between overall accuracy and complexity for all video sequences.

Once both sets of key points are estimated, if their total number is above a limit that depends on the desired complexity of the proposed optical flow algorithm, they are uniformly down-sampled. A sampling step is selected so as to produce a new, combined set with fewer points in total than that limit.
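The key-point selection stage can be sketched as follows; the names, the step, the scaling factor and the point limit are illustrative placeholders for the unspecified values in the text:

```python
import numpy as np

def select_keypoints(f1, f2, step=8, alpha=0.2, n_max=500):
    """Sketch of key-point selection: a uniform grid plus points of high
    absolute frame difference, uniformly down-sampled if the combined count
    exceeds n_max (all parameter values are illustrative)."""
    h, w = f1.shape
    # Uniform sampling with a fixed step.
    p_u = [(y, x) for y in range(0, h, step) for x in range(0, w, step)]
    # Points whose absolute frame difference exceeds alpha * max difference.
    diff = np.abs(f2.astype(float) - f1.astype(float))
    ys, xs = np.where(diff > alpha * diff.max())
    p_d = list(zip(ys.tolist(), xs.tolist()))
    pts = p_u + p_d
    if len(pts) > n_max:
        pts = pts[::len(pts) // n_max + 1]  # uniform down-sampling
    return pts
```

The frame-difference term concentrates key points on moving regions, while the uniform grid guarantees coverage of static areas for the later interpolation step.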

During the second stage of the proposed methodology, motion estimation is applied at the selected key locations. Initially, Phase Correlation is applied using a square window centred at each key point location. From the obtained correlation surface in equation (4), the ratio of the highest over the second highest peak is estimated. If this ratio is below a threshold, that is an indication that more than one motion is present in the window; therefore, Bilateral-PC is applied to obtain a correlation surface with a single peak and, consequently, a more accurate motion estimate. The threshold is specified based on the logarithm of the current window size. Additionally, for the points that were removed during down-sampling, motion vectors are estimated using [32]. In order to obtain a dense vector field, non-uniform interpolation with bilateral filtering is applied to the motion components of the sparse key point locations.

Fig. 9: Examples of obtained optical flows using the MPI-SINTEL dataset with ground truth from Butler [9], for several sequences (six columns) and methods (from top to bottom [44, 5, 17, 51, 21, 40, 39], the proposed BLPC and the GT).

At the final stage, we unwarp the next smallest image using the scaled motion estimates from the previous layer. Motion compensation is applied to estimate the compensated image, and its difference from the target frame is calculated. Finally, the process repeats from the first stage, obtaining the new key point locations, until we reach the last layer, which corresponds to the original size image.
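The coarse-to-fine loop can be summarised by the following skeleton for a single global shift; it is a deliberately simplified sketch (decimation without low-pass filtering, circular warping via `np.roll`, and a caller-supplied estimator), not the paper's per-key-point pipeline:

```python
import numpy as np

def build_pyramid(img, levels=3):
    """Down-sample by powers of two; coarsest level first (plain decimation
    for brevity -- a proper implementation would low-pass filter first)."""
    pyr = [img.astype(float)]
    for _ in range(levels - 1):
        pyr.append(pyr[-1][::2, ::2])
    return pyr[::-1]

def coarse_to_fine_shift(f1, f2, estimate, levels=3):
    """Coarse-to-fine skeleton: estimate the shift at the coarsest layer,
    double it, unwarp the next layer with it, and estimate the residual.
    `estimate(a, b)` is any function returning an integer (dx, dy)."""
    dx = dy = 0
    for g1, g2 in zip(build_pyramid(f1, levels), build_pyramid(f2, levels)):
        dx, dy = 2 * dx, 2 * dy                  # scale estimate to this layer
        g2u = np.roll(g2, (-dy, -dx), (0, 1))    # unwarp with current estimate
        ddx, ddy = estimate(g1, g2u)             # residual shift at this layer
        dx, dy = dx + ddx, dy + ddy
    return dx, dy
```

Because only a small residual must be found at each layer, the per-layer estimator can use a small search range (or a small correlation window) while the total recoverable displacement grows with the pyramid depth.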

In the case of an application related to face analysis or tracking, the proposed optical flow framework can be adapted to estimate the motion information only in the area where a face is located. In more detail, if a face is detected in a frame, then from the key point locations we retain only the ones that overlap with the face window. This approach can significantly reduce the computational complexity in applications where motion information is required only for selected parts of a frame.

IV Results

A comparative study was performed with state-of-the-art optical flow techniques operating both in the frequency and the pixel domain. Video sequences with and without ground truth were used for evaluating the performance, and experiments with artificial data were also performed to further demonstrate the concept of the proposed BLPC technique. In this study six datasets were utilised: the video sequences with ground truth provided by Baker in [6], the MPI-SINTEL dataset from Butler in [9], and the UCLgtOFv1.1 dataset in [33]. Furthermore, we used the CT dataset from Liang in [28, 29], samples of videos with faces from Dhall's dataset in [12], and the 4K Test Sequences Reviving the Classics dataset [30] without ground truth. Selected frames from these datasets are shown in figure 6. In our evaluation, several performance measures were utilised, based on the availability or not of ground truth [6]. For all video sequences the motion compensated prediction error is used, defined as the mean square error (MSE) between the ground-truth image (i.e. the frame we try to predict) and the estimated compensated one

$$MSE = \frac{1}{N} \sum_{\mathbf{p}} \left( I_{mc}(\mathbf{p}) - I_{gt}(\mathbf{p}) \right)^2$$

where $N$ is the number of pixels, $I_{mc}$ is the motion compensated image and $I_{gt}$ is the ground-truth frame. For color images, we take the norm of the vector of RGB color differences. Also, based on the MSE we can obtain the PSNR, defined as

$$PSNR = 10 \log_{10} \left( \frac{255^2}{MSE} \right)$$
Another measure based on the motion compensated prediction error is the gradient-normalised root-mean-square difference between the ground-truth image and the compensated one

$$NRMS = \sqrt{ \frac{1}{N} \sum_{\mathbf{p}} \frac{ \left( I_{mc}(\mathbf{p}) - I_{gt}(\mathbf{p}) \right)^2 }{ \| \nabla I_{gt}(\mathbf{p}) \|^2 + \epsilon } }$$

In our experiments the arbitrary scaling constant $\epsilon$ is set to the most commonly used value. For the motion compensation algorithm we used the one suggested in [10], with subpixel accuracy.
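The prediction-error measures above reduce to a few lines of numpy (grayscale frames, 8-bit peak value assumed):

```python
import numpy as np

def mse(pred, gt):
    """Motion-compensated prediction error: mean squared error over pixels."""
    return np.mean((pred.astype(float) - gt.astype(float)) ** 2)

def psnr(pred, gt, peak=255.0):
    """Peak signal-to-noise ratio in dB, derived from the MSE."""
    return 10.0 * np.log10(peak ** 2 / mse(pred, gt))
```

For example, a compensated frame that is off by exactly one grey level everywhere has an MSE of 1.0 and a PSNR of about 48.13 dB at an 8-bit peak.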

Regarding the sequences with ground-truth flow available, the angular error (AE) between a flow vector $(u, v)$ and the ground-truth flow $(u_{gt}, v_{gt})$ was used as a measure of optical flow performance. The AE is defined as the angle in 3D space between $(u, v, 1)$ and $(u_{gt}, v_{gt}, 1)$, and it can be computed using the equation below

$$AE = \cos^{-1} \left( \frac{1 + u\, u_{gt} + v\, v_{gt}}{\sqrt{1 + u^2 + v^2}\, \sqrt{1 + u_{gt}^2 + v_{gt}^2}} \right)$$
With this measure, errors in large flows are penalized less than errors in small flows. Also, we compute an absolute error in flow (AEF), defined as the magnitude of the difference between the estimated and the ground-truth flow vectors:

$$AEF = \sqrt{ (u - u_{gt})^2 + (v - v_{gt})^2 }$$

The AEF measure is probably more appropriate for most applications, since regions of non-zero motion are not penalized less than regions of zero motion.
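Both flow-error measures can be sketched as follows; the AEF is implemented here as the Euclidean (endpoint-style) difference, which is an assumption based on the description above rather than a formula quoted from the paper:

```python
import numpy as np

def angular_error(u, v, u_gt, v_gt):
    """Angular error: the 3D angle (radians) between (u, v, 1) and
    (u_gt, v_gt, 1); errors in large flows are penalized less than in
    small flows because of the added third component."""
    num = 1.0 + u * u_gt + v * v_gt
    den = np.sqrt(1.0 + u**2 + v**2) * np.sqrt(1.0 + u_gt**2 + v_gt**2)
    return np.arccos(np.clip(num / den, -1.0, 1.0))

def flow_error(u, v, u_gt, v_gt):
    """Absolute flow error as the Euclidean distance between the estimated
    and ground-truth flow vectors (endpoint-style definition)."""
    return np.sqrt((u - u_gt)**2 + (v - v_gt)**2)
```

Both functions work element-wise on numpy arrays, so they can be evaluated over a whole flow field and averaged to reproduce per-sequence scores.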

Proof of concept using artificial data: In the first part of this analysis we used artificial examples to demonstrate the issue of multiple motions present in an area for Phase Correlation. Nine pairs of images with two or more moving objects were used in our experiments (see a subset in figure 7). In these examples the ground truth is available for each moving object, and we compared the proposed BLPC method with Phase Correlation. Corresponding blocks centred around each pixel are used to apply PC and BLPC, generating a dense vector field. For the boundary pixels, wrap padding was used, and the obtained motion vector was assigned to the corresponding pixel location. Examples of the obtained optical flow estimates for each case are shown in figure 7. We also plotted the ratio of the highest over the second highest peak of each correlation surface, over each pixel, for all the artificial examples in figure 8. These figures demonstrate that the proposed approach operates with high accuracy in the presence of multiple motions in comparison with the traditional techniques. Furthermore, table I summarises all the obtained results, where we can see that BLPC provides more accurate estimates both in terms of AE and PSNR.

Comparative study using real video sequences: Regarding the real sequences, about 80 videos were used in our evaluation, covering a wide variety of frame resolutions, ranging from nHD, HD and FHD up to 4K UHD. In this comparative study, the performance of the proposed approach is compared with 16 methods in total, seven of them well-known state-of-the-art methods [44, 5, 17, 51, 21, 40, 39]. In our experiments, the overall accuracy of the proposed approach is superior to that of the other frequency domain optical flow methods, resulting in sharp and precise motion estimates with good accuracy over motion boundaries and the small details present in the image scene. The obtained results are also close to those of state-of-the-art methods operating in the pixel domain. In figure 9 we can see the obtained optical flows for several sequences and methods, including the ground truth. Tables II-VI summarise all the obtained results, where we can see that BLPC provides more accurate estimates both in terms of AE and PSNR in comparison with the other state-of-the-art frequency domain methods, while the complexity remains low. Overall, the suggested optical flow method provides significant accuracy, especially at motion boundaries, and outperforms current well-known methods operating in the frequency domain.

V Conclusion

In this paper, a new framework for optical flow estimation in the frequency domain was introduced. The approach extends Phase Correlation with a novel Bilateral-Phase Correlation technique that incorporates the concept and principles of Bilateral Filters. One of the most attractive features of the proposed scheme is that it retains the motion boundaries: the difference in value of the neighbouring pixels is taken into account to preserve motion edges, in combination with a weighted average of nearby intensity values, in a manner very similar to Gaussian convolution. BLPC yields very accurate motion estimates for a variety of test material and motion scenarios, and outperforms the optical flow techniques that are the current registration methods of choice in the frequency domain.
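The bilateral weighting summarised above can be illustrated with a short sketch. This shows the concept only (a spatial Gaussian combined with a range Gaussian relative to the window centre), not the authors' exact formulation; the parameters sigma_s and sigma_r are assumptions:

```python
import numpy as np

def bilateral_weights(window, sigma_s=4.0, sigma_r=0.1):
    """Combined spatial/range weights for a square window, in the
    spirit of the bilateral filter: pixels are down-weighted both by
    their distance from the window centre (spatial term) and by their
    intensity difference from the centre pixel (range term), so that
    samples lying across a motion boundary contribute little."""
    n = window.shape[0]
    c = n // 2
    yy, xx = np.mgrid[0:n, 0:n]
    spatial = np.exp(-((yy - c)**2 + (xx - c)**2) / (2.0 * sigma_s**2))
    range_w = np.exp(-((window - window[c, c])**2) / (2.0 * sigma_r**2))
    return spatial * range_w
```

Multiplying a correlation window by such weights before the Fourier transform is what lets a phase-correlation scheme favour the motion of the region containing the centre pixel rather than mixing all motions present in the window.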


This work is co-funded by NATO within the WITNESS project under grant agreement number G5437. The Titan X Pascal used for this research was donated by NVIDIA.


  • [1] A. Alba, E. Arce-Santana, and M. Rivera. Optical flow estimation with prior models obtained from phase correlation. Lecture Notes in Computer Science, 2010.
  • [2] V. Argyriou. Sub-hexagonal phase correlation for motion estimation. IEEE Transactions on Image Processing, 20(1):110–120, Jan 2011.
  • [3] V. Argyriou and M. Petrou. Photometric stereo: an overview. Adv. Imaging Electron. Phys., 156:1–54, 2009.
  • [4] V. Argyriou and T. Vlachos. Quad-tree motion estimation in the frequency domain using gradient correlation. IEEE Transactions on Multimedia, 9(6):1147–1154, Oct 2007.
  • [5] V. Argyriou and T. Vlachos. Low complexity dense motion estimation using phase correlation. In DSP, pages 1–6, July 2009.
  • [6] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski. A database and evaluation methodology for optical flow. ICCV, 2007.
  • [7] M. Balci and H. Foroosh. Subpixel estimation of shifts directly in the fourier domain. IEEE Tr. Image Proc., 15(7):1965–1972, 2006.
  • [8] V. Bloom, D. Makris, and V. Argyriou. Clustered spatio-temporal manifolds for online action recognition. In 2014 22nd International Conference on Pattern Recognition, pages 3963–3968, Aug 2014.
  • [9] D. Butler, J. Wulff, G. Stanley, and M. Black. A naturalistic open source movie for optical flow evaluation. ECCV, pages 611–625, 2012.
  • [10] S. H. Chan, D. T. Võ, and T. Q. Nguyen. Subpixel motion estimation without interpolation. In ICASSP, pages 722–725, March 2010.
  • [11] P. Cheng and C.-H. Menq. Real-time continuous image registration enabling ultraprecise 2-d motion tracking. Image Processing, IEEE Transactions on, 22(5):2081–2090, May 2013.
  • [12] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Collecting large, richly annotated facial-expression databases from movies. IEEE MultiMedia, 19(3):34–41, July-Sept. 2012.
  • [13] Y. Douini, J. Riffi, M. A. Mahraz, and H. Tairi. Solving sub-pixel image registration problems using phase correlation and lucas-kanade optical flow method. In ISCV, pages 1–5, April 2017.
  • [14] F. Durand and J. Dorsey. Fast bilateral filtering for the display of high-dynamic-range images. ACM Tr. Graphics, 21(3):257–266, 2002.
  • [15] H. Foroosh, J. Zerubia, and M. Berthod. Extension of phase correlation to sub-pixel registration. IEEE Image Process., 11(2):188–200, 2002.
  • [16] D. Fortun, P. Bouthemy, and C. Kervrann. Optical flow modeling and computation: A survey. CVIU, 134:1–21, 2015.
  • [17] T. Gautama and M. Van Hulle. A phase-based approach to the estimation of the optical flow field using spatial filtering. Neural Networks, IEEE Transactions on, 13(5):1127–1136, Sep 2002.
  • [18] K. Ghoul, M. Berkane, and M. Batouche. PC method to the optical flow using neuronal networks. In ICMCS, pages 71–75, 2017.
  • [19] B. Glocker, T. Heibel, N. Navab, P. Kohli, and C. Rother. Triangleflow: Optical flow with triangulation-based higher-order likelihoods. ECCV, pages 272–285, 2010.
  • [20] P. Gwosdek, H. Zimmer, S. Grewenig, A. Bruhn, and J. Weickert. A highly efficient GPU implementation for variational optic flow based on the Euler-Lagrange framework. ECCVW, pages 372–383, 2010.
  • [21] H. T. Ho and R. Goecke. Optical flow estimation using fourier mellin transform. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1–8, June 2008.
  • [22] W. Hoge. Subspace identification extension to the phase correlation method. IEEE Trans. Med. Imag., 22(2):277–280, Feb 2003.
  • [23] B. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17(1–3):185–203, 1981.
  • [24] K. Jia, X. Wang, and X. Tang. Optical flow estimation using learned sparse model. In Proceedings of the 2011 International Conference on Computer Vision, ICCV ’11, pages 2391–2398, Washington, DC, USA, 2011. IEEE Computer Society.
  • [25] Y. Keller and A. Averbuch. A projection-based extension to phase correlation image alignment. Signal Process., 87:124–133, 2007.
  • [26] D. Konstantinidis, T. Stathaki, V. Argyriou, and N. Grammalidis. Building detection using enhanced HOG-LBP features and region refinement processes. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 10(3):888–905, March 2017.
  • [27] S. Kruger and A. Calway. A multiresolution frequency domain method for estimating affine motion parameters. In Proc. IEEE International Conf. on Image Processing, pages 113–116, 1996.
  • [28] X. Liang, Q. Cao, R. Huang, and L. Lin. Recognizing focal liver lesions in contrast-enhanced ultrasound with discriminatively trained spatio-temporal model. IEEE ISBI, 2014.
  • [29] X. Liang, L. Lin, Q. Cao, R. Huang, and Y. Wang. Recognizing focal liver lesions in ceus with dynamically trained latent structured models. IEEE TRANS ON MEDICAL IMAGING, 2015.
  • [30] E. T. Limited. 4k test sequences reviving the classics. 4k-test-sequences, 2014.
  • [31] B. Liu and A. Zaccarin. New fast algorithms for the estimation of block motion vectors. Circ. and Syst. Vid. Tech., 3(2):148–157, 1993.
  • [32] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. IJCAI, pages 674–679, 1981.
  • [33] O. Mac Aodha, G. J. Brostow, and M. Pollefeys. Segmenting video into classes of algorithm-suitability. In CVPR, 2010.
  • [34] V. Maik, E. Chae, L. E. sung, P. C. yong, J. G. hyun, P. Sunhee, H. Jinhee, and J. Paik. Robust sub-pixel image registration based on combination of local phase correlation and feature analysis. IEEE Intern Symp on Consumer Electronics, pages 1–2, June 2014.
  • [35] M. G. Mozerov. Constrained optical flow estimation as a matching problem. IEEE T. Image Proc., 22(5):2044–2055, May 2013.
  • [36] J. Pearson, D. Hines, S. Goldsman, and C. Kuglin. Video rate image correlation processor. Proc. SPIE, 119, 1977.
  • [37] J. Ren, J. Jiang, and T. Vlachos. High-accuracy sub-pixel estimation from noisy images in fourier domain. TIP, 19(5):1379–1384, 2010.
  • [38] J. Ren, T. Vlachos, and J. Jiang. Subspace extension to phase correlation approach for fast image registration. ICIP, pages 481–484, 2007.
  • [39] A. Reyes, A. Alba, and E. Arce-Santana. Efficiency analysis of poc-derived bases for combinatorial motion estimation. Chapter, Image and Video Technology of the series Lecture Notes in Computer Science, 8333:124–135, 2014.
  • [40] A. Reyes, A. Alba, and E. R. Santana. Optical flow estimation using phase-only correlation. Procedia Techn., 7:103–110, 2013.
  • [41] H. Stone, M. Orchard, E. Chang, and S. Martucci. A fast direct fourier-based algorithm for subpixel registration of images. IEEE Trans. Geosci. Remote Sens., 39(10):2235–2243, Oct 2001.
  • [42] M. W. Tao, J. Bai, P. Kohli, and S. Paris. Simpleflow: A non-iterative, sublinear optical flow algorithm. Computer Graphics Forum (Eurographics 2012), 31(2), May 2012.
  • [43] A. Tekalp. Digital video processing. Prentice Hall, 1995.
  • [44] G. Thomas. Television motion measurement for datv and other applications. BBC Res. Dept. Rep., No. 1987/11, 1987.
  • [45] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In Proceedings of the IEEE ICCV, pages 839–846, 1998.
  • [46] X. Tong, Y. Xu, Z. Ye, S. Liu, L. Li, H. Xie, F. Wang, S. Gao, and U. Stilla. An improved phase correlation method based on 2-d plane fitting and the maximum kernel density estimator. Geoscience and Remote Sensing Letters, IEEE, 12(9):1953–1957, Sept 2015.
  • [47] X. Tong, Z. Ye, Y. Xu, S. Liu, L. Li, H. Xie, and T. Li. A novel subpixel phase correlation method using singular value decomposition and unified random sample consensus. Geoscience and Remote Sensing, IEEE Trans, 53(8):4143–4156, 2015.
  • [48] M. Uss, B. Vozel, V. Dushepa, V. Komjak, and K. Chehdi. A precise lower bound on image subpixel registration accuracy. Geoscience and Remote Sensing, IEEE Trans, 52(6):3333–3345, June 2014.
  • [49] P. Vandewalle, S. Susstrunk, and M. Vetterli. A frequency domain approach to registration of aliased images with application to superresolution. EURASIP J. Appl. Signal Process., pages 1–14, 2006.
  • [50] J. Weickert and C. Schnorr. Variational optic flow computation with a spatio-temporal smoothness constraint. JMIV, 14(3):245–255, 2001.
  • [51] H. Yan and J. G. Liu. Robust phase correlation based motion estimation and its applications. BMVC, pages 1–10, 2008.
  • [52] L. Zhongke, Y. Xiaohui, and W. Lenan. Image registration based on hough transform and phase correlation. Neural Networks and Signal Processing, 2:956–959, Dec 2003.