In this paper, we study real-time motion detection, which is useful in many practical applications such as enhancing the sensing ability of autopilot systems, capturing significant cues for event analysis, and providing intelligence in monitoring.
Many methods have been proposed and developed in depth to detect motion in complex scenes. They can be roughly classified into two categories: those that run in real time but perform poorly [1, 2], and those that perform relatively well but are time-consuming [3, 4, 5, 6]. To pursue high performance while maintaining the real-time property, we propose an optical flow based framework that detects motion efficiently. As the 2D projection of real-world motion, optical flow directly reflects the scene’s motion between two frames. This makes optical flow based methods much more effective at detecting moving foreground than other methods. The strategy of our method is to estimate a background optical flow field and compare it with the mixed optical flow field f at the pixel level to identify the moving foreground. More specifically, we model the background optical flow field as a quadratic function of the point coordinates. This is a polynomial regression estimate of the real, complex distribution, and its accuracy relies on the camera motion between two consecutive frames being small. We sample points in the mixed optical flow field to perform least squares regression estimation, and use our dedicated Constrained RANSAC Algorithm (CRA) to improve both accuracy and speed.
Subsequently, the moving foreground is identified by thresholding the vector difference between the optical flow in the mixed optical flow field and that in the background optical flow field. In practice, the spatial distribution of this difference in the background area is too complicated for a fixed threshold, as used in many other works, to be applied. First, the size of its distribution interval in the background area is linearly related to the speed of the camera motion. Second, if the scene change has an evident zooming component, the distribution interval is so large that it usually overlaps with that of the foreground area. In other words, there may be no threshold value that completely separates the moving foreground from the background. Unlike many other works, which pay less attention to threshold design, we propose an adaptive threshold that matches the speed of the scene’s motion to avoid making too many false positive judgments.
The contributions of this work are as follows. First, we propose a novel and effective optical flow based motion detection framework. The framework requires no model construction, training, or updating before it is used, and can thus be run online. Moreover, it is efficient enough for general real-time applications. Second, adaptive intervals and adaptive thresholds are introduced to strengthen the system’s adaptation to different situations.
II Related Work
From different perspectives, a long line of work has studied the problem of motion detection in non-stationary scenes. We review recent algorithms in three main groups: Gaussian model based, stochastic approximation based, and optical flow based.
Gaussian Model Based. An earlier method used a Dual-Mode Single Gaussian Model (SGM) to model the background at the grid level, and used homography matrices between consecutive frames to achieve motion compensation by mixing models. Foreground was identified by estimating each feature’s conformity to the corresponding SGM. Thanks to the Dual-Mode SGM, the method reduces the foreground’s contamination of the background models. Similarly, Yun and Choi, and Kurnianggoro et al., used a foreground probability map and simple pixel-level background models, respectively, to fine-tune the result of that method. The background models constructed and updated by these methods do not reflect the essence of the problem. They are sensitive to parameters and produce results with low recall.
Stochastic Approximation Based. López-Rubio and López-Rubio used predefined features to form a mixture model representing the distribution of features in the previous frame. They then achieved motion compensation by interpolating a full covariance matrix of the pixel models. The moving foreground was identified according to the probability that a point’s feature belongs to the mixture model. The performance of this method relies heavily on manually selected features and is poor when the scene changes quickly.
Optical Flow Based. Kurnianggoro et al. instead modeled the background using zero optical flow vectors. After a homography matrix was used to align the previous frame, dense optical flow was estimated between the aligned result and the current frame. Finally, a simple threshold on the optical flow magnitude was used to identify foreground points. Because the homography matrices are used only for alignment, the background model and the decision mechanism of this method are too simple to handle intricate unconstrained scenes. Although the recall in the reported experiments was rather high, the precision was much lower than that of the SGM based methods. Narayana et al. used only optical flow orientations to deal with changes in scene depth. This makes their method perform poorly on most tracking video sequences, where the optical flow orientations are the same throughout the scene.
Other methods do not depend on any background model. They construct the foreground contour by detecting large-gradient points in the dense optical flow field. For example, Li and Xu performed mathematical morphology operations on the initial contours to obtain closed boundaries, and then selected the largest contour area as the moving object. This framework is simple to implement, but its simplicity limits it to simple scenes. Papazoglou and Ferrari combined the gradient and direction of the optical flow to generate a better contour. An efficient inside-outside maps algorithm was then used to make an initial estimate of the foreground points, which was finally refined by global optimization. The shortcoming is that the inside-outside maps algorithm gives reasonable results only in simple scenes that contain a single object. Moreover, the optimization step makes the method inefficient.
The framework of our online method for detecting motion in dynamic scenes is shown in Figure 1. It consists of three main processes: mixed optical flow field estimation, background optical flow field estimation, and foreground extraction. Each step of the framework is introduced in detail below.
III-A Mixed Optical Flow Field Estimation
Taking both speed and accuracy into account, FlowNet2.0 is used to estimate the optical flow vectors, which project 2D locations in one frame to their locations in a specified other frame. To improve the perception of slow motion while maintaining the accuracy of the regression analysis, we design the interval between the two frames as an adaptive interval (AI), so that the expected average norm of the background optical flow maintains a fixed magnitude. The interval is updated using the optical flow of the sampled background points from Section III-B, together with a limiting function that has an upper bound of 5.
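The update rule itself is not recoverable from the text above, so the following is only a minimal sketch of one way an adaptive interval could behave. It assumes that the flow norm grows roughly in proportion to the frame interval, and it rescales the interval so that the mean norm of the sampled background flow approaches a target value. The function name `update_interval`, the parameter `target_norm`, and the lower bound of 1 are hypothetical; only the upper bound of 5 comes from the text.

```python
import math


def update_interval(interval, sample_flow, target_norm, max_interval=5):
    """Rescale the frame interval toward a target mean background flow norm.

    interval:     current frame interval (positive integer)
    sample_flow:  iterable of (u, v) flow vectors at sampled background points
    target_norm:  desired mean flow norm (hypothetical parameter name)
    max_interval: upper bound of the limiting step (5 in the text)
    """
    sample_flow = list(sample_flow)
    if not sample_flow:
        return interval  # nothing to measure, keep the current interval

    norms = [math.hypot(u, v) for u, v in sample_flow]
    mean_norm = sum(norms) / len(norms)
    if mean_norm == 0.0:
        return max_interval  # static scene: use the longest allowed interval

    # Assumption: flow magnitude scales linearly with the interval,
    # so mean_norm / interval estimates the per-frame background motion.
    per_frame = mean_norm / interval
    proposed = round(target_norm / per_frame)

    # Limiting step: keep the interval within [1, max_interval].
    return max(1, min(max_interval, proposed))
```

For example, with an interval of 1 and a mean background flow norm of 0.5 against a target of 2.0, this sketch proposes an interval of 4, while a very slow scene saturates at the upper bound.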
III-B Background Optical Flow Field Estimation
In our framework, we consider the background optical flow field as a quadratic function of the point coordinates:
where is a matrix containing 6 unknown parameters and .
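To make the quadratic model concrete, here is a minimal sketch of a least squares fit of a flow field to a quadratic function of the pixel coordinates. The exact parameterization of the coefficient matrix is not recoverable from the text, so the sketch assumes the full set of six monomials (x², xy, y², x, y, 1) and fits one set of six coefficients per flow component. The function names are hypothetical.

```python
import numpy as np


def quadratic_basis(points):
    """Map (x, y) coordinates to the monomials [x^2, x*y, y^2, x, y, 1]."""
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    return np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)


def fit_background_flow(points, flow):
    """Least squares fit of flow (N, 2) at points (N, 2).

    Returns a (6, 2) array: one column of coefficients per flow component
    (an assumed layout, not necessarily the one used in the paper).
    """
    A = quadratic_basis(points)
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(flow, dtype=float), rcond=None)
    return coeffs


def predict_background_flow(points, coeffs):
    """Evaluate the fitted quadratic flow model at the given points."""
    return quadratic_basis(points) @ coeffs
```

At least six sample points in general position are needed for the fit to be determined; with more points the fit averages out noise in the estimated flow.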
We randomly sample points in the mixed optical flow field to perform least squares regression estimation (LSRE). The result is optimized with the RANSAC algorithm to exclude outliers, using a fixed number of iterations. To make the estimate fit the real background optical flow field closely, we should sample as many points as possible and spread them as sparsely as possible. However, sampling too many points reduces the success rate of the RANSAC algorithm. In this work, we design a constrained sampling strategy that avoids overfitting while improving the search efficiency of RANSAC. We call it the Constrained RANSAC Algorithm (CRA). The image plane is first divided into square pieces with a given edge length in pixels. Then a certain percentage of the pieces is selected at random, and in each selected piece one point is chosen at random to build the final set of sample points.
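The constrained sampling step can be sketched as follows. The sampler follows the description above: split the image into square cells, pick a random fraction of the cells, and draw one pixel per chosen cell. The RANSAC loop that follows is a plain textbook version with a fixed iteration count, not the authors' CRA implementation. The names `cell`, `fraction`, `tol`, and `min_samples`, and the six-monomial basis, are assumptions.

```python
import numpy as np


def quadratic_basis(points):
    """Monomials [x^2, x*y, y^2, x, y, 1] for each (x, y) point."""
    points = np.asarray(points, dtype=float)
    x, y = points[:, 0], points[:, 1]
    return np.stack([x * x, x * y, y * y, x, y, np.ones_like(x)], axis=1)


def constrained_sample(height, width, cell, fraction, rng):
    """Pick a random fraction of the square cells, then one pixel in each."""
    rows, cols = height // cell, width // cell
    n_cells = rows * cols
    n_pick = max(1, int(round(fraction * n_cells)))
    chosen = rng.choice(n_cells, size=n_pick, replace=False)
    cy, cx = np.divmod(chosen, cols)
    ys = cy * cell + rng.integers(0, cell, size=n_pick)
    xs = cx * cell + rng.integers(0, cell, size=n_pick)
    return np.stack([xs, ys], axis=1)  # (n_pick, 2) array of (x, y)


def ransac_fit(points, flow, iterations, tol, rng, min_samples=6):
    """Fit the quadratic flow model with a fixed number of RANSAC iterations."""
    A = quadratic_basis(points)
    flow = np.asarray(flow, dtype=float)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(points), size=min_samples, replace=False)
        coeffs, *_ = np.linalg.lstsq(A[idx], flow[idx], rcond=None)
        residual = np.linalg.norm(A @ coeffs - flow, axis=1)
        inliers = residual < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if not best_inliers.any():
        best_inliers[:] = True  # fall back to using every sample
    # Refit on all inliers of the best hypothesis.
    coeffs, *_ = np.linalg.lstsq(A[best_inliers], flow[best_inliers], rcond=None)
    return coeffs, best_inliers
```

Because each chosen cell contributes exactly one point, the samples are spread over the image while their number stays bounded by the number of cells, which is the balance between coverage and RANSAC success rate that the text describes.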
III-C Foreground Extraction
Subsequently, based on the two optical flow fields above, we identify the foreground points using a threshold. The real optical flow of a specific point in the background area is distributed within an interval around the estimated one. We therefore apply an adaptive threshold (AT) to the difference between the ideal background optical flow and the actual optical flow, and obtain a foreground mask as described in formula (3):
where the difference is measured as the 2-norm of the difference vector. The adaptive threshold is defined as:
where two hyper-parameters control the magnitude of the threshold. One is the static component, corresponding to the instability caused by the sensor’s resolution or the precision of the optical flow. The other introduces the dynamic component, so that a higher threshold is used when the sensor moves fast. The mean norm of the background optical flow is used to reflect the speed of the camera motion.
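As an illustration of the thresholding step, the sketch below marks a pixel as foreground when the 2-norm of the difference between the mixed flow and the background flow exceeds a threshold. The exact threshold formula is not recoverable from the text, so a linear form is assumed: a static term plus a gain multiplied by the mean norm of the background flow. The names `static_term` and `dynamic_gain` are hypothetical.

```python
import numpy as np


def foreground_mask(mixed_flow, background_flow, static_term, dynamic_gain):
    """Threshold the flow difference with a speed-dependent threshold.

    mixed_flow, background_flow: (H, W, 2) arrays of flow vectors.
    Returns a boolean (H, W) mask and the threshold that was applied.
    """
    mixed_flow = np.asarray(mixed_flow, dtype=float)
    background_flow = np.asarray(background_flow, dtype=float)

    # 2-norm of the per-pixel difference vector.
    diff_norm = np.linalg.norm(mixed_flow - background_flow, axis=-1)

    # Mean norm of the background flow as a proxy for camera speed.
    camera_speed = np.linalg.norm(background_flow, axis=-1).mean()

    # Assumed linear form: static part plus a part that grows with speed.
    threshold = static_term + dynamic_gain * camera_speed
    return diff_norm > threshold, threshold
```

With this form, a faster camera raises the threshold and suppresses false positives in the background, while the static term sets a floor that absorbs flow estimation noise when the camera is nearly still.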
We summarize the whole procedure in Algorithm 1.
IV Experiments and Results
IV-A Implementation details
We set the parameters according to experimental regression analysis. The expected average norm of the background optical flow is set as . For the Constrained RANSAC Algorithm, the size of the pieces is set as and the sampling proportion is set as . For foreground judging, the parameters are set as follows: , , and .
The proposed method is tested on the challenging DAVIS2016 benchmark. DAVIS2016 consists of 50 sequences with a total of 3455 frames. It covers most of the challenging situations that arise in motion detection, except for scenes with multiple objects or with evident scene zooming. As a supplement, we therefore collect an additional dataset of four public video sequences that contain these missing situations. The details of each video are given in Table I.
|Sequence|Frames|Challenge|
|Playground|1466|Evident scene zooming|
|Horse|500|Evident scene zooming|
IV-C Qualitative results
Our method is compared with the following methods for motion detection in non-stationary scenes: MCD5.8ms, SA, and SCBU. Fig. 2 shows the qualitative results on some key frames from the additional dataset.
The qualitative comparison shows intuitively that the proposed method adapts better to different challenges than the other methods. As shown in the Skating sequence, the SGM based methods MCD5.8ms and SCBU perform poorly when the foreground color is slightly similar to the background. In the Playground sequence, SA cannot deal with the challenges of slow motion and dynamic background. By contrast, the proposed optical flow based method produces a more complete moving foreground. Meanwhile, there are few false positive points in the background area, thanks to the adaptive interval and the adaptive threshold. As the results on the Playground and Highway sequences show, our method is sensitive to shadows, which leads to some false positives and has a negative influence on the quantitative results.
IV-D Quantitative results
Our method is quantitatively compared with state-of-the-art methods on the DAVIS dataset, and the results are listed in Table II. Using the FN2-css-ft-sd optical flow estimation framework, our method outperforms the existing real-time methods with an improvement of on -mean while maintaining a time consumption of 51ms. However, the performance of the mixed optical flow field estimation forms a bottleneck for our method, as the method takes only the mixed optical flow field as input.
IV-E The effects of some key mechanisms
Table III lists some typical combinations of different mechanisms and their -mean scores on the DAVIS dataset. Using a linear function (LF) with a fixed threshold, the proposed method scores only on DAVIS -mean. By contrast, applying a quadratic function (QF) improves the performance by , indicating that the quadratic function fits the background optical flow field better.
The adaptive threshold (AT) also plays an important role in obtaining higher-quality results. With a low threshold, the framework detects more foreground points and scores highly in recall, but it makes so many false positive judgments that its precision is rather low. On the other hand, a high threshold leads to both low recall and low precision, as the framework identifies too many foreground points as background points. Although a tradeoff exists, a fixed threshold (FT) puts a ceiling on the performance and lowers the robustness of the proposed method. Table III records the best performance of our proposed method with a fixed threshold. By applying adaptive thresholds, we enable the algorithm to use a proper threshold for each specific scene situation, dramatically improving the performance of the proposed method on DAVIS -mean by .
The adaptive interval (AI) mechanism improves the performance to a similar degree() as the adaptive threshold mechanism. This is because the adaptive threshold is designed according to the scene speed, which the adaptive interval mechanism has already constrained to around a fixed value. Thus, the two mechanisms achieve the same goal in different ways: applying a proper threshold for the specific scene. However, the constraining capacity of the adaptive interval mechanism is limited, and the adaptive threshold alone cannot improve the perception of slow motion or maintain the fitting quality of the quadratic function. By combining these two mechanisms, we can improve the performance to a higher degree().
The proposed method is implemented in Python on a PC with an Intel i5-7400 CPU, 32 GB RAM, and an Nvidia GTX 1080 GPU. We measure the computation time on video at a resolution of to evaluate the efficiency of the proposed method and two other real-time methods. The computation time of each method is listed in the last column of Table II. MCD5.8ms and SCBU take up to 30ms per frame, nearly four times the figures reported in their papers, because the time consumption of SGM based methods grows linearly with image resolution. In contrast, the efficiency of the proposed method depends mostly on the mixed optical flow estimation and the number of RANSAC iterations. When the FN2 framework is used, the mixed optical flow field estimation accounts for about nine tenths of the total time (186ms), while the other processes take relatively little time (16ms). With the FN2-css-ft-sd framework, which takes only 35ms to estimate the mixed optical flow field, our algorithm can be sped up to 20fps at the cost of only performance degradation on -mean.
In this work, we address the challenging task of real-time motion detection in non-stationary scenes. We have presented an optical flow based framework and demonstrated its performance through extensive experiments. The key strategy for using optical flow efficiently is to estimate a background optical flow field from the mixed optical flow field and use it as the decision criterion. In addition, the adaptive intervals and adaptive thresholds play an important role in improving the robustness of our method. Although the proposed method substantially outperforms the existing real-time methods, there is still much room for improvement compared with the existing accurate methods.
- [1] Moo Yi K, Yun K, Wan Kim S, et al, Detection of moving objects with non-stationary cameras in 5.8 ms: Bringing motion detection to your mobile device, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 27-34, 2013.
- [2] Yun K, Lim J, Choi J Y, Scene conditional background update for moving object detection in a moving camera, Pattern Recognition Letters, 88: 57-63, 2017.
- [3] Siam M, Mahgoub H, Zahran M, et al, MODNet: Moving Object Detection Network with Motion and Appearance for Autonomous Driving, arXiv preprint arXiv:1709.04821, 2017.
- [4] Tokmakov P, Alahari K, Schmid C, Learning motion patterns in videos, IEEE Conference on Computer Vision and Pattern Recognition, 531-539, 2017.
- [5] Papazoglou A, Ferrari V, Fast object segmentation in unconstrained video, Proceedings of the IEEE International Conference on Computer Vision, 1777-1784, 2013.
- [6] Jain S D, Xiong B, Grauman K, Fusionseg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos, Proc. CVPR, 1(2), 2017.
- [7] Kurnianggoro L, Shahbaz A, Jo K H, Dense optical flow in stabilized scenes for moving object detection from a moving camera, IEEE International Conference on Control, Automation and Systems (ICCAS), 704-708, 2016.
- [8] Yun K, Choi J Y, Robust and fast moving object detection in a non-stationary camera via foreground probability based sampling, IEEE International Conference on Image Processing, 4897-4901, 2015.
- [9] Kurnianggoro L, Yu Y, Hernandez D C, et al, Online background-subtraction with motion compensation for freely moving camera, International Conference on Intelligent Computing, Springer, Cham, 569-578, 2016.
- [10] López-Rubio F J, López-Rubio E, Foreground detection for moving cameras with stochastic approximation, Pattern Recognition Letters, 68: 161-168, 2015.
- [11] Narayana M, Hanson A, Learned-Miller E, Coherent motion segmentation in moving camera videos using optical flow orientations, Proceedings of the IEEE International Conference on Computer Vision, 1577-1584, 2013.
- [12] Li X, Xu C, Moving object detection in dynamic scenes based on optical flow and superpixels, IEEE International Conference on Robotics and Biomimetics, 84-89, 2015.
- [13] Ilg E, Mayer N, Saikia T, et al, Flownet 2.0: Evolution of optical flow estimation with deep networks, IEEE Conference on Computer Vision and Pattern Recognition, 2:6, 2017.
- [14] Fischler M A, Bolles R C, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, 24(6): 381-395, 1981.
- [15] Perazzi F, Pont-Tuset J, McWilliams B, et al, A benchmark dataset and evaluation methodology for video object segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 724-732, 2016.
- [16] Wang Y, Jodoin P M, Porikli F, et al, CDnet 2014: An expanded change detection benchmark dataset, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 387-394, 2014.
- [17] Brox T, Malik J, Large displacement optical flow: descriptor matching in variational motion estimation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(3): 500-513, 2011.
- [18] Liu C, Beyond pixels: exploring new representations and applications for motion analysis, Massachusetts Institute of Technology, 2009.