1 Introduction
The demand for video processing is rapidly increasing, driven by growing numbers of sensors with ever higher resolution, new types of sensors, new collection methods and an ever wider range of applications: for example, video surveillance, vehicle automation or wildlife monitoring, with data gathered in the visual/infrared spectrum or via SONAR, from multiple sensors that may be fixed or vehicle/drone-mounted. The overall result is an explosion in the quantity of high-dimensional sensor data. Motion detection is often the fundamental building block for more complex video processing and computer vision applications, e.g. object tracking or human behavior analysis. In practice, many different types of sensors produce data suitable for object extraction; we focus here on video data provided by static optical cameras, noting that the findings generalize to other data types. In this case, the change in position of an object relative to its surrounding environment can be detected by intensity changes over time in a sequence of video frames. The challenge therefore is to separate intensity changes corresponding to moving objects from those generated by background noise, i.e. dynamic and complex backgrounds. From a statistical point of view this can be formulated as a density estimation problem, aiming to find a suitable model describing the background. Moving objects can then be identified by their differences from the background reconstructed from the video frames, via some thresholding, as illustrated in Figure 1.
In practice, the problem of finding a suitable model is difficult and often ill-posed due to the many challenges arising in real videos, e.g., dynamic backgrounds, camouflage effects, camera jitter or noisy images, to name only a few. One framework for tackling these challenges is provided by subspace learning techniques. Recently, robust principal component analysis (RPCA) has been very successful in separating video frames into background and foreground components bouwmans2014robust . However, RPCA comes with relatively high computational costs and is of limited utility for real-time analysis of high-resolution video. Hence, in light of increasing sensor resolutions, there is a need for more rapid algorithms, perhaps obtained by approximating existing techniques.
A competitive alternative is Dynamic Mode Decomposition (DMD) — a data-driven method allowing the decomposition of a matrix representing both time and space Kutz2013 . Due to the particular properties of videos (equally spaced time samples with high temporal correlation), DMD is well suited for motion detection, as first demonstrated by Grosek and Kutz Kutz2014dmd .
1.1 Related work
Bouwmans bouwmans2014traditional and Sobral and Vacavant Sobralreview provide recent and comprehensive reviews of methods for background modeling and related challenges. Among the many different techniques, the class of (robust) subspace models is prominent. PCA can be considered a traditional technique for describing the probability distribution of a static background. However, PCA has some essential shortcomings, and many enhancements have been proposed since the method was first used for background subtraction by Oliver et al.
oliver2000bayesian , e.g. adaptive, incremental or independent PCA. A review of those traditional subspace models and related issues is provided by Bouwmans bouwmans2009subspace . While DMD is related to PCA and shares some of the same limitations, it can overcome others to greatly improve performance. Grosek and Kutz Kutz2014dmd have shown that DMD can in fact be seen as an approximation to robust PCA (see also bouwmans2015decomp ). The idea of RPCA is to separate a matrix $M$ into a low-rank component $L$ and a sparse component $S$:

(1) $M = L + S$
This can be formulated as a convex optimization problem that minimizes a combination of the nuclear norm $\|L\|_*$ and the $\ell_1$ norm $\|S\|_1$. Applied to video data, the low-rank component describes the relatively static background environment, which is allowed to gradually change over time, while the second component captures the moving objects. This approach has gathered substantive attention for foreground detection since the idea was first introduced by Candès et al. candes2011robust and further extended by Zhou et al. StableRPCA to also capture entrywise noise. Bouwmans and Zahzah bouwmans2014robust recently provided a comparative evaluation of the most prominent RPCA implementations, whose results show that the LSADM Goldfarb and TFOCS Becker algorithms perform best at extracting moving objects in terms of the F-measure. Guyon et al. guyon2012moving show in detail how the former algorithm can be used for moving object detection.
The problem formulation via RPCA leads to iterative algorithms with high computational costs. Most of the algorithms require repeated computation of the Singular Value Decomposition (SVD), so clearly the algorithms may be accelerated by using a faster approximate SVD, aiming to find only the dominant singular values. Liu et al. Liu12limitedmemory present a Krylov subspace based algorithm for computing the first $k$ singular values with high precision. They showed that their LMSVD algorithm can reduce the computational time of RPCA substantially. Later they demonstrated even greater computational savings with their Gauss–Newton based SVD algorithm LiuGauss . If high precision is not the main concern, approximate Monte Carlo based SVD algorithms can be interesting alternatives frieze2004fast ; drineas2006fast . A different approach is via randomized matrix algorithms, which are surprisingly robust and provide significant speedups, while being simple to implement Martinsson201147 . Halko et al. halko2011rand and Gu gu2015subspace provide comprehensive surveys of randomized algorithms for constructing approximate matrix decompositions, while Mahoney Mahoney2011 gives a more general overview. One successful approximate robust PCA algorithm using randomized matrix algorithms is GoDec Zhou11godec .

1.2 Motivation and contributions
A core building block of the DMD algorithm, as for RPCA, is the SVD. As noted, traditional deterministic SVD algorithms are expensive to compute and, with increasing data, often pose a computational bottleneck. We propose the use of a fast, probabilistic SVD algorithm, exploiting the rapidly decaying singular values of video data. Randomized SVD is a lean and easy-to-implement technique for computing a robust approximate low-rank SVD halko2011rand . Compared to deterministic truncated or partial SVD algorithms, we gain computational savings on the order of 10 to 30 times. The net effect is a speedup of about 2 to 3 times for randomized DMD over deterministic SVD based DMD. Hence, randomized DMD may facilitate real-time processing of videos. Moreover, randomized SVD and DMD are embarrassingly parallel, and we show that the computational performance can benefit from a Graphics Processing Unit (GPU) implementation. To demonstrate the applicability to motion detection, we have evaluated and compared dynamic mode decomposition against other leading algorithms in the field on a comprehensive set of synthetic and real videos.
The rest of this paper is organized as follows. Section 2 presents randomized SVD as an approximation to the deterministic algorithms. Section 3 first introduces DMD and then shows how a low-rank DMD approximation using randomized SVD can be used for background modeling. Finally, a detailed evaluation of DMD is presented in Section 4. Concluding remarks and further research directions are given in Section 5.
2 Singular Value Decomposition (SVD)
Matrix factorizations are fundamental tools for many practical applications in signal processing, statistical computing and machine learning. The SVD is one such technique, used for data analysis, dimensionality reduction or data compression. Given an arbitrary real matrix $A \in \mathbb{R}^{m \times n}$, we seek a decomposition such that

(2) $A = U \Sigma V^*$

where $U \in \mathbb{R}^{m \times m}$ and $V \in \mathbb{R}^{n \times n}$ are orthogonal matrices, and $\Sigma \in \mathbb{R}^{m \times n}$ is a diagonal matrix with the same dimensions as $A$ watkins2004fundamentals . The columns of $U$ and $V$ are orthonormal, and are called the left and right singular vectors respectively. The singular values, denoted as $\sigma_1 \geq \sigma_2 \geq \dots \geq 0$, are the diagonal elements of $\Sigma$, sorted in decreasing order. While we assume a real matrix here, for generality we use the Hermitian transpose, denoted by the superscript $*$.
In practice we may be interested in a low-rank approximation of $A$ with target rank $k$. Choosing the optimal target rank is highly task dependent, i.e. it depends on whether one is interested in a very good reconstruction of the original data or in a very low-dimensional representation of the data. The rank-$k$ approximation

(3) $A_k = U_k \Sigma_k V_k^*$

has the reconstruction error $\|A - A_k\|_F = \bigl(\sum_{j=k+1}^{n} \sigma_j^2\bigr)^{1/2}$, where the index $F$ denotes the Frobenius norm. Thus, a reasonably small singular value $\sigma_{k+1}$ gives a low reconstruction error, and $k$ can in this case be denoted the effective rank of the matrix $A$. It can be proven that the deterministic SVD provides the optimal rank-$k$ approximation; however, the computational costs can be tremendous for large-scale problems, in particular for unstructured data. In the following, we present a faster randomized algorithm halko2011rand .
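The optimality statement above can be verified numerically with a short NumPy sketch: for a noisy low-rank test matrix (sizes and rank chosen arbitrarily for illustration), the Frobenius error of the rank-$k$ truncation equals the root of the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(42)
# a matrix with effective rank 20, plus a little noise
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 80))
A += 1e-3 * rng.standard_normal((100, 80))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 10
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # rank-k truncation, eq. (3)

err = np.linalg.norm(A - A_k, "fro")
# Frobenius error equals sqrt of the sum of squared tail singular values
assert np.isclose(err, np.sqrt(np.sum(s[k:] ** 2)))
```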
2.1 A Randomized SVD Algorithm
Randomized matrix algorithms are approximate algorithms for linear algebra problems using random sampling and projections to accelerate the computation Mahoney2011 . Given an input matrix $A$ and a desired target rank $k$, the randomized algorithm for computing the approximate low-rank SVD can be roughly divided into two stages.
The first stage is concerned with finding a random low-dimensional subspace that best captures the column space of $A$. Here the idea of random projections is used to build a basis for the column space. We simply draw $k$ random Gaussian vectors $\omega_i$ and compute the random sketch

(4) $y_i = A \omega_i, \quad i = 1, \dots, k$

As a result from probability theory, it follows that the random vectors, and hence the set $\{y_1, \dots, y_k\}$, are linearly independent. We can compute (4) more compactly as the matrix-matrix product

(5) $Y = A \Omega$

where $\Omega \in \mathbb{R}^{n \times k}$ is a random Gaussian matrix. We then compute the QR decomposition of $Y$ to obtain the orthonormal matrix $Q$ so that

(6) $A \approx Q Q^* A$

is satisfied.
In the second stage we project the input matrix onto the low-dimensional subspace

(7) $B = Q^* A$

The action of the column space of $A$ is now restricted to the relatively small (if $k \ll n$) matrix $B \in \mathbb{R}^{k \times n}$. Subsequently we can cheaply compute the deterministic SVD of $B$ as

(8) $B = \tilde{U} \Sigma V^*$

The randomized algorithm can be justified as follows

(9) $A \approx Q Q^* A = Q B = Q \tilde{U} \Sigma V^* = U \Sigma V^*$

Thus we can recover the left singular vectors by computing

(10) $U = Q \tilde{U}$
Algorithm (1) shows a prototype for computing the randomized SVD.¹

¹ See the supplementary materials for a more detailed algorithm and a Python implementation with oversampling parameter and subspace iterations.
Common choices for generating the random matrix $\Omega$ are the normal or uniform distribution. The computational time can be further reduced by first computing a QR decomposition of $B$ and then computing the SVD of the even smaller triangular factor (see Voronin et al. voronin2015rsvdpack for further details).
The approximation error of a randomized SVD can be decreased by introducing a small oversampling parameter $p$. This means that, instead of drawing $k$ random vectors, we generate $k + p$ samples, so that the likelihood of spanning the correct subspace is increased. A small oversampling parameter is generally sufficient. Further, computing $q$ power iterations can increase the accuracy:
(11) $Y = (A A^*)^q A \Omega$
The power iterations drive the spectrum of $B$ down, and the approximation error, which is proportional to the spectrum, decays exponentially with the number of iterations. Even if the signal-to-noise ratio is low, a few power iterations already achieve good results. For numerical reasons a practical implementation should use subspace iterations instead of power iterations gu2015subspace . Halko et al. halko2011rand showed that the approximation error of the randomized SVD has the following error bound, if the oversampling parameter is chosen equal to $k$, i.e. $p = k$:
(12) $\mathbb{E}\,\|A - Q Q^* A\| \leq \left(2 + 4\sqrt{\frac{2\min(m,n)}{k-1}}\right)\sigma_{k+1}$
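The two-stage scheme, together with oversampling and subspace iterations, can be condensed into a few lines of NumPy. This is an illustrative prototype in the spirit of Halko et al.; the function name `rsvd` and the default values `p=10` and `q=2` are our own choices, not prescriptions from the paper.

```python
import numpy as np

def rsvd(A, k, p=10, q=2, seed=None):
    """Approximate rank-k SVD via random projections with
    oversampling p and q subspace iterations."""
    rng = np.random.default_rng(seed)
    # Stage 1: sketch the column space with k + p Gaussian test vectors
    Omega = rng.standard_normal((A.shape[1], k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    # Subspace iterations (numerically stable form of power iterations)
    for _ in range(q):
        Z, _ = np.linalg.qr(A.conj().T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage 2: project onto the small matrix B and take its deterministic SVD
    B = Q.conj().T @ A
    U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_tilde                  # recover the left singular vectors, eq. (10)
    return U[:, :k], s[:k], Vt[:k, :]
```

For a matrix of exact rank $k$, the sketch captures the column space almost surely and the reconstruction is accurate to machine precision.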
2.2 Computational Costs
SVD is often the bottleneck in practical largescale applications. Many different methods for computing the SVD have been proposed and optimized for different problems, exploiting certain matrix properties. Thus, giving a detailed overview of the computational costs is difficult.
In short, however, the time complexity of the ordinary deterministic SVD algorithms is $O(mn^2)$ if $m \geq n$, while modern partial SVD methods based on rank-revealing QR factorization can reduce the time complexity to $O(mnk)$ demmel1997applied . The randomized SVD algorithm using random sampling, as we have presented it here, needs two passes over the input matrix and also has asymptotic costs of $O(mnk)$. Hence, theoretically we have the same asymptotic costs; however, from a practical point of view, it is much cheaper to compute a matrix-matrix multiplication than a column-pivoted QR factorization. The costs can be further decreased by exploiting certain matrix properties to reduce the matrix-matrix multiplication of (5) to $O(mn \log k)$ floating point operations. For example, the Subsampled Random Fourier Transform (SRFT) as proposed by Woolfe et al. woolfe2008fast can be used.
In practice, the computational time of (randomized) SVD algorithms is also heavily driven by the computational platform used, the specific implementation, and whether the matrix fits into fast memory. An advantage of randomized SVD is that it can benefit from parallel computing, for example permitting a GPU implementation, which leads to dramatic acceleration voronin2015rsvdpack . This is because the GPU architecture enables fast generation of random numbers and fast matrix-matrix multiplications.
3 Dynamic Mode Decomposition (DMD)
DMD is a data-driven method, fusing PCA with time-series analysis (Fourier transform in time) Kutz2013 . This integrated approach to decomposing a data matrix overcomes the PCA shortcoming of performing an orthogonalization in space only. DMD is an emergent technique in the fluid mechanics community for analyzing the dynamics of nonlinear systems, originally proposed by Schmid schmid2010dynamic and Rowley et al. rowley2009spectral . Allowing the assessment of spatiotemporally coherent structures, with almost no underlying assumptions, makes DMD interesting for video processing. Specifically, the resulting low-rank features are of interest for modeling the background of surveillance videos. In addition, DMD also allows predictions to be made about short-time future states of video streams kutz2015multi .
To compute the DMD, an ordered and evenly spaced data sequence describing a dynamical system is required. This applies naturally to videos, where a data matrix $X \in \mathbb{R}^{m \times n}$ can be constructed so that its columns are $n$ consecutive grey-scale video frames $x_j$ of $m$ pixels each. The elements of $X$ refer to the intensity of a pixel in space and time. As is common in the DMD literature, we also refer to such a data matrix as a snapshot sequence. Further, it is reasonable to assume that two consecutive video frames are related to each other in time. Mathematically, we can establish the following important relationship

(13) $x_{j+1} = A x_j$

stating that there exists an unknown underlying linear operator $A$ that connects two consecutive video frames schmid2010dynamic . Here the index $j$ denotes a frame in time. It turns out that $A$ is related to the Koopman operator, whose eigenvalue decomposition describes the evolution of a video sequence tu2013dynamic . Hence, the goal of DMD is to find an approximate decomposition of $A$.² It is also interesting to note that, while the operator $A$ is considered to be linear, its eigenvectors and eigenvalues can also describe nonlinear dynamical systems.

² Traditionally, the problem of obtaining the operator $A$ was formulated in terms of a companion matrix in order to emphasize the deeper theoretical relationship to the Arnoldi algorithm and the Koopman operator. We refer to schmid2010dynamic ; rowley2009spectral for further theoretical details.
3.1 LowRank Dynamic Mode Decomposition Algorithm
To compute the DMD we proceed by first arranging the data matrix $X$ into two matrices:

(14) $X_1 = [x_1, x_2, \dots, x_{n-1}]$

(15) $X_2 = [x_2, x_3, \dots, x_n]$

The left snapshot sequence $X_1$ is approximately linked to the right sequence $X_2$ by the operator $A$ as follows

(16) $X_2 \approx A X_1$

This is in fact a well-known linear least squares problem

(17) $\min_A \|X_2 - A X_1\|_F$

An estimate $\hat{A}$ can be computed using the pseudoinverse golub1965calculating as follows

(18) $\hat{A} = X_2 X_1^\dagger = X_2 V \Sigma^{-1} U^*$

where $U$ and $V$ denote the left and right singular vectors of $X_1$ respectively, and $\Sigma$ is the diagonal matrix with the corresponding singular values. However, this direct approach to computing the operator $\hat{A}$ might not be feasible when dealing with high-dimensional data like videos. Instead, it is more desirable to first reduce the dimension using a similarity transformation in order to find an approximate operator $\tilde{A}$ as

(19) $\tilde{A} = U^* \hat{A} U$

In fact it can be shown that $\hat{A}$ and $\tilde{A}$ have the same eigenvalues demmel1997applied . Using the similarity transformation draws a connection between DMD and PCA by projecting onto the principal components (left singular vectors) $U$. We obtain $\tilde{A}$ by plugging (18) into (19) as follows (note that $U$ is a matrix with orthonormal columns and hence $U^* U = I$)

(20) $\tilde{A} = U^* X_2 V \Sigma^{-1}$

It can be seen that the SVD plays a central role in computing the DMD. Computing the SVD can be computationally expensive; however, exploiting the low-dimensional structure of video data (with rank $k \ll n$) allows us to use fast approximate low-rank decomposition techniques, e.g., randomized SVD (rSVD) as described in Section 2. We denote this approach of using rSVD to compute an approximate low-rank dynamic mode decomposition with a specified target rank $k$ as randomized DMD (rDMD). In this case, the dimension of the linear operator reduces to $\tilde{A} \in \mathbb{R}^{k \times k}$. The structure of $\tilde{A}$ is revealed by computing the eigenvalue decomposition of $\tilde{A}$ as
(21) $\tilde{A} W = W \Lambda$

where $W$ is the eigenvector matrix and $\Lambda$ is a diagonal matrix containing the eigenvalues $\lambda_i$. The dynamic modes $\Phi$ are then computed by relating the eigenvectors back to $\hat{A}$, either as schmid2010dynamic

(22) $\Phi = U W$

or more generally as tu2013dynamic

(23) $\Phi = X_2 V \Sigma^{-1} W$

We favor the latter approach.
The original data matrix can be reconstructed by noting that the snapshots can be represented as the linear combination jovanovic2012low

(24) $x_j = \sum_{i=1}^{k} b_i \phi_i \lambda_i^{j-1}$

where $\lambda_i$ denotes the $i$th eigenvalue, $\phi_i$ the $i$th dynamic mode and $b_i$ the corresponding amplitude. Since $\phi_i$ is time independent, computing the amplitudes reduces to solving $x_1 = \Phi b$, i.e.

(25) $b = \Phi^\dagger x_1$

The parameter vector $b$ can be estimated by the linear least squares method Kutz2014dmd . Figure 2 illustrates how the approximate low-rank DMD can be expressed as

(26) $X \approx \Phi B V_{and}$

where $B$ is the diagonal matrix of the amplitudes

(27) $B = \operatorname{diag}(b_1, \dots, b_k)$

and $V_{and}$ is the Vandermonde matrix of the eigenvalues

(28) $V_{and} = \begin{pmatrix} 1 & \lambda_1 & \lambda_1^2 & \cdots & \lambda_1^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \lambda_k & \lambda_k^2 & \cdots & \lambda_k^{n-1} \end{pmatrix}$

From the Vandermonde matrix it is clear that the temporal dynamics retrieved by the DMD consist of single (distinct) frequencies.
The prototype Algorithm (2) summarizes the method for computing the DMD using rSVD.
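The pipeline of equations (14)–(25) can be sketched in Python, combining the randomized SVD of Section 2 with the projected operator, its eigendecomposition, the exact modes and the amplitudes. The function name `rdmd` and the defaults `p=10`, `q=2` are illustrative assumptions; this is not the authors' exact Algorithm (2).

```python
import numpy as np

def rdmd(X, k, p=10, q=2, seed=None):
    """Low-rank DMD of a snapshot matrix X (columns are frames),
    using a randomized SVD of the left snapshot sequence."""
    rng = np.random.default_rng(seed)
    X1, X2 = X[:, :-1], X[:, 1:]            # left/right sequences, eqs. (14)-(15)
    # randomized SVD of X1 with oversampling and subspace iterations
    Omega = rng.standard_normal((X1.shape[1], k + p))
    Q, _ = np.linalg.qr(X1 @ Omega)
    for _ in range(q):
        Z, _ = np.linalg.qr(X1.conj().T @ Q)
        Q, _ = np.linalg.qr(X1 @ Z)
    B = Q.conj().T @ X1
    U_t, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = (Q @ U_t)[:, :k]
    s = s[:k]
    V = Vt[:k, :].conj().T
    A_tilde = U.conj().T @ X2 @ V / s       # projected operator, eq. (20)
    lam, W = np.linalg.eig(A_tilde)         # eigendecomposition, eq. (21)
    Phi = X2 @ V / s @ W                    # exact dynamic modes, eq. (23)
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]  # amplitudes, eq. (25)
    return Phi, lam, b
```

On a toy snapshot matrix built from three geometric sequences, the eigenvalues are recovered and the reconstruction via the Vandermonde matrix (26) is essentially exact.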
3.2 DMD for Background Modeling
In the previous section we have seen how DMD can be used to decompose and reconstruct a matrix. However, using (26) to model the video background directly is a poor strategy. We can of course hope that, when computing the low-rank dynamic mode decomposition, the dominant dynamic modes are not corrupted by any moving objects and capture only background structures. Like PCA, this works when we train DMD on a set of clean video frames. However, that is an unrealistic scenario in real-world applications. More desirable is a decomposition into low-rank (background) and sparse (foreground) components, similar to RPCA candes2011robust
(29) $X = L + S$
Unlike robust PCA, DMD is not capable of directly separating a matrix into these two components. Instead, DMD allows us to compute an approximation to it. First let us connect the DMD eigenvalues to the Fourier modes as follows Kutz2014dmd
(30) $\omega_i = \frac{\ln(\lambda_i)}{\Delta t}$
For standard videos we simply assume the time step $\Delta t = 1$ and hence $\omega_i = \ln(\lambda_i)$. By construction, the eigenvalues are complex. Hence the Fourier modes reveal interesting properties of the corresponding dynamic modes. The real part of $\omega_i$ determines the mode's evolution over time, while the imaginary part is related to the mode's oscillations. Now let us rewrite (26) in terms of the Fourier modes for a low-rank decomposition of a video matrix
(31) $X \approx \sum_{i=1}^{k} b_i \phi_i e^{\omega_i t}$
where $t = (0, 1, \dots, n-1)$ is the time vector. From (31) it is clear that the Fourier modes dictate how the modes evolve, i.e., decay or grow in time. In light of this, the set of modes can be separated into a set containing only Fourier modes $\omega_p$ with $|\omega_p| \approx 0$, which evolve slowly over time and correspond to background modes, and a second set containing the modes describing fast-moving objects. Exploiting this, (31) can be rewritten as
(32) $X \approx \underbrace{\sum_{p} b_p \phi_p e^{\omega_p t}}_{\text{background}} + \underbrace{\sum_{j \neq p} b_j \phi_j e^{\omega_j t}}_{\text{foreground}}$
The background video can then be reconstructed as follows
(33) $L = \sum_{p} b_p \phi_p e^{\omega_p t}$
Foreground objects (sparse components) can then be identified as the difference between the original video data and the background video (discarding the imaginary part)
(34) $S = X - |L|$
We illustrate the concept on a real video in the following examples. Figure 3 shows the Fourier modes of a low-rank dynamic mode decomposition with a given target rank.
The Fourier mode closest to the origin identifies the background mode, shown in Figure 4 (b). However, using just the zero mode leads to a static background model. Figure 4 (c) shows that the waving tree is captured as a foreground object when using the zero mode only. Hence, to better cope with dynamic backgrounds, it is favorable to select a subset of modes for background modeling. Using the first 3 modes decreases the false positive rate, as shown in Figure 4 (d). The number of modes used for modeling the background was chosen semi-arbitrarily; we achieved qualitatively good results with 3 to 5 modes, whereas using only the zero mode is computationally faster.
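The mode-selection step in (30)–(34) can be sketched as follows: map the eigenvalues to Fourier modes, keep the slow ones as background, and take the residual as foreground. The function name `dmd_background` and the magnitude threshold `thresh` are our own illustrative choices.

```python
import numpy as np

def dmd_background(X, Phi, lam, b, thresh=1e-2):
    """Split X into background L (modes with |omega| ~ 0) and foreground S,
    following eqs. (30)-(34); Delta t = 1 is assumed."""
    n = X.shape[1]
    t = np.arange(n)
    omega = np.log(lam.astype(complex))          # Fourier modes, eq. (30)
    slow = np.abs(omega) < thresh                # background mode selection
    dynamics = np.exp(np.outer(omega, t)) * b[:, None]
    L = Phi[:, slow] @ dynamics[slow]            # background, eq. (33)
    S = X - np.abs(L)                            # foreground, eq. (34)
    return L, S
```

A tiny synthetic "video" with a static mode ($\lambda = 1$) and a decaying mode ($\lambda = 0.6$) illustrates the split: only the static mode survives in the background.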
4 Experimental Evaluation
In this section we evaluate both the accuracy and the computational performance of the proposed algorithm and compare it to other state-of-the-art methods. To evaluate the effectiveness of rDMD for detecting moving objects we use two benchmark datasets. First, we test rDMD on ten synthetic videos from the BMC 2012 (Background Models Challenge) dataset vacavant2013benchmark . Furthermore, to evaluate the performance on real videos, we use eight videos from the ChangeDetection.net (CD) dataset wang2014cdnet . The selected videos represent challenging examples for motion detection, for example:

Bootstrapping: No sequence of clean background images is available for training.

Dynamic backgrounds: Moving elements which belong to the background, like waving trees, rain or snowfall.

Illumination changes: Gradual illumination changes of the environment due to fog or sun.

Camouflage: Foreground objects which have the same pixel intensity as background elements, i.e. same color.
4.1 Evaluation Measures
To evaluate the performance of background subtraction algorithms, a binary foreground mask has to be computed using a suitable distance measure

(35) $\mathcal{X}(j) = \begin{cases} 1 & \text{if } d(x_j, l_j) > \tau \\ 0 & \text{otherwise} \end{cases}$

For instance, the Euclidean distance is a common choice for measuring the distance between the pixels of the actual video frame and the reconstructed background frame benezeth2010comparative . However, more sophisticated measures can be formulated and allow adaptive thresholding. The resulting vector $\mathcal{X}$ is called the foreground or motion mask and its elements are binary. In the outcome, $\mathcal{X}(j) = 1$ classifies a pixel as belonging to a foreground object, and $\mathcal{X}(j) = 0$ as a background element. Thus we can visualize the classification results as a confusion matrix

(36)
                     predicted foreground   predicted background
actual foreground            TP                     FN
actual background            FP                     TN

where TP denotes the (number of) True Positive predictions, i.e. pixels which are correctly classified as belonging to a moving foreground object. Similarly, TN denotes the (number of) True Negative predictions, i.e. pixels which are correctly classified as background. False Positives (FP) are background pixels misclassified as foreground, and False Negatives (FN) are foreground pixels misclassified as background. Based on the confusion matrix we can compute the following evaluation measures.
Recall (also called sensitivity, true positive rate or hit rate) measures the algorithm's ability to correctly detect pixels belonging to foreground objects. It is computed as the ratio of predicted true positives to the total number of actual foreground pixels

(37) $\text{Recall} = \frac{TP}{TP + FN}$

Precision (also called positive predictive value) measures how confident we can be that a pixel classified as positive actually belongs to a foreground object. It is computed as the ratio of predicted true positives to the total number of pixels predicted as foreground

(38) $\text{Precision} = \frac{TP}{TP + FP}$

Specificity (also called true negative rate) measures the algorithm's ability to correctly predict pixels belonging to the background. It is computed as the ratio of true negatives to the total number of actual background pixels

(39) $\text{Specificity} = \frac{TN}{TN + FP}$

The F-measure combines recall and precision as their harmonic mean, weighting both measures evenly, defined as

(40) $F = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$

More general definitions of the F-measure also allow different weighting schemes.
From (35) it is obvious that the classification results depend on a predefined fixed threshold $\tau$. To get a global understanding of the algorithm's behavior, the evaluation measures can be computed over a range of different thresholds. The results can then be visualized using precision-recall and Receiver Operating Characteristic (ROC) curves. An advantage of ROC graphs, which plot recall versus 1 − specificity, is their insensitivity to changes in the class distribution fawcett2005ROC . In particular, in dynamic environments such as videos, the number of pixels belonging to foreground objects can vary significantly over frames and is generally much smaller than the number of pixels belonging to the background. A further advantage of ROC curves is that they conveniently summarize the performance with a single global scalar value, the Area Under the Curve (AUC). The perfect ROC curve has an AUC of 1, while random guessing yields an AUC of 0.5. Thus a method with an AUC close to 0.5 or below can be considered useless, while a method with a higher AUC is preferred.
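The masking and scoring steps of (35)–(40) can be condensed into a few helper functions; the names `foreground_mask`, `confusion` and `scores` are our own illustrative choices.

```python
import numpy as np

def foreground_mask(frame, background, tau):
    """Binary motion mask via per-pixel distance thresholding, eq. (35)."""
    return (np.abs(frame - background) > tau).astype(int)

def confusion(pred, truth):
    """Pixel-wise confusion counts for binary foreground masks, eq. (36)."""
    tp = np.sum((pred == 1) & (truth == 1))
    tn = np.sum((pred == 0) & (truth == 0))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    return tp, tn, fp, fn

def scores(pred, truth):
    """Recall, precision, specificity and F-measure, eqs. (37)-(40)."""
    tp, tn, fp, fn = confusion(pred, truth)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, specificity, f_measure
```

Sweeping `tau` over a range of values and collecting `(1 - specificity, recall)` pairs yields the ROC curve described above.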
4.2 Results
DMD is formulated as a batch algorithm here, i.e., previously modeled sequences do not affect the following ones. This allows the algorithm to adapt to changes in the scene, e.g., illumination changes. Also, foreground objects that become background objects (like a recently parked car) can be better captured. On the other hand, it does not allow for dealing with 'sleeping' foreground objects. The performance varies with the number of modes and the length of the snapshot sequence. Our results show that a snapshot sequence of about 100 to 300 video frames can be separated with a very low number of modes. If the video is less noisy, a lower number of dynamic modes is sufficient. However, depending on how fast the foreground objects are moving, using fewer than 100 frames often leads to poor detection performance. Another important issue is the choice of the initial condition used for computing the amplitudes. The default option is to use the first frame of the sequence, as stated in (25); however, we often achieved better results using the median frame instead. Another interesting option is to recompute the amplitudes for small chunks of the sequence. This allows better capture of sudden illumination changes.
For the rDMD algorithm, two further tuning parameters can be specified: the oversampling parameter and the number of power (subspace) iterations of the rSVD algorithm. For computing rDMD in the following we keep both parameters fixed. This parameter setting recovers almost exactly the results achieved with the ordinary DMD algorithm.
We first illustrate in Figure 5 the performance of rDMD compared to ordinary DMD, PCA oliver2000bayesian and robust PCA candes2011robust on two videos. While RPCA performs best both in terms of the AUC and the Fmeasure, rDMD and DMD can be seen as a reasonable approximation. The results also show that the performance difference between rDMD and DMD is insignificant. As expected, DMD performs significantly better than PCA in terms of the Fmeasure.
Table 1 shows the results of randomized DMD for the ten synthetic videos of the BMC dataset and compares them with three leading robust PCA algorithms: LSADM, TFOCS and GoDec. LSADM Goldfarb is a principal component pursuit algorithm, while TFOCS Becker is a quantization-based principal component pursuit algorithm. GoDec Zhou11godec is an approximate RPCA algorithm based on bilateral random projections and, like rDMD, uses the concept of randomized matrix algorithms. Overall the average F-measure shows that the detection performance of rDMD is about 4% lower than that of the RPCA algorithms. The slightly poorer performance is due to the Street 512 and Rotary 522 videos, which emulate windy scenes with additional noise. In these cases the background is very dynamic and the precision of the DMD algorithm is decreased. However, this problem can be compensated for by post-processing the obtained foreground mask with a median filter. This approach improves the overall performance by about 2%. The results on the other videos show that DMD is flexible enough to deal with illumination changes like clouds, fog or sun.
Measure     Street                                  Rotary                                  Average
            112    212    312    412    512         122    222    322    422    522

LSADM (Goldfarb et al. Goldfarb )
Recall      0.874  0.857  0.906  0.862  0.840       0.878  0.880  0.892  0.782  0.830
Precision   0.830  0.965  0.867  0.935  0.742       0.940  0.938  0.892  0.956  0.869
F-Measure   0.851  0.908  0.886  0.897  0.788       0.908  0.908  0.892  0.860  0.849       0.880

TFOCS (Becker et al. Becker )
Recall      0.910  0.843  0.867  0.903  0.834       0.898  0.892  0.892  0.831  0.877
Precision   0.830  0.965  0.899  0.889  0.824       0.924  0.932  0.887  0.940  0.879
F-Measure   0.868  0.900  0.882  0.896  0.829       0.911  0.912  0.889  0.882  0.878       0.885

GoDec (Zhou and Tao Zhou11godec )
Recall      0.841  0.875  0.850  0.868  0.866       0.822  0.879  0.792  0.813  0.866
Precision   0.965  0.942  0.968  0.948  0.902       0.900  0.921  0.953  0.750  0.837
F-Measure   0.899  0.907  0.905  0.906  0.884       0.859  0.900  0.865  0.781  0.851       0.876

rDMD
Recall      0.873  0.855  0.760  0.805  0.783       0.883  0.860  0.772  0.800  0.834
Precision   0.887  0.912  0.902  0.900  0.656       0.896  0.907  0.876  0.902  0.770
F-Measure   0.880  0.882  0.825  0.850  0.714       0.889  0.882  0.820  0.848  0.800       0.839

rDMD (with median filter)
Recall      0.859  0.833  0.748  0.793  0.801       0.862  0.834  0.808  0.761  0.831
Precision   0.906  0.935  0.924  0.916  0.879       0.922  0.936  0.892  0.941  0.894
F-Measure   0.882  0.881  0.826  0.850  0.838       0.891  0.882  0.847  0.842  0.861       0.860
We show in Table 2 the evaluation results for 8 real videos from the CD dataset. The videos are from three different categories: 'Baseline', 'Dynamic Background' and 'Thermal'. At first glance the overall performance of the rDMD algorithm looks poor. This can be related to several challenges faced here. While the performance on the two baseline videos 'Highway' and 'Pedestrians' is good, issues arise for the other two baseline videos. The PETS2006 video is difficult for the DMD algorithm due to camouflage effects, as well as because some of the objects are sleeping foreground objects. The 'Office' video shows even more drastically that DMD cannot cope with sleeping foreground objects. However, integrating DMD into a simple system allowing for background maintenance can help to overcome this problem. The two dynamic background videos show the same problem as before with the synthetic videos: the recall rate is excellent, while the precision is lacking. But again, simply using a median filter for post-processing increases the performance greatly. Figure 6 shows visual results for 3 selected videos.
For comparison, Table 2 also shows the results of two algorithms leading the CD ranking. The FTSG (Flux Tensor with Split Gaussian models) algorithm FTSG is based on a mixture of Gaussians and won the 2014 CD challenge. The PAWCS (Pixel-based Adaptive Word Consensus Segmenter) PAWCS is a word-based approach to background modeling. While the raw results of DMD clearly cannot compete with these two highly optimized methods, simple postprocessing improves the performance substantially. Hence, the object detection rate can be raised further by learning from other background modeling methods, for example by using a more elaborate threshold, or by integrating DMD into a system allowing background maintenance.

Table 2: Evaluation results for 8 real videos from the CD dataset.

                          Baseline                        Dynamic Background    Thermal
Measure      Highway  Pedestrians  PETS2006  Office    Overpass    Canoe     Park   Lakeside   Average

PAWCS (St-Charles et al. PAWCS)
Recall       0.952    0.961        0.945     0.905     0.961       0.947     0.899  0.520
Precision    0.935    0.931        0.919     0.972     0.957       0.929     0.768  0.752
F-Measure    0.944    0.946        0.932     0.937     0.959       0.938     0.829  0.615      0.887

FTSG (Wang et al. wang2014static)
Recall       0.956    0.979        0.963     0.908     0.944       0.913     0.666  0.228
Precision    0.934    0.890        0.883     0.961     0.941       0.985     0.724  0.960
F-Measure    0.945    0.932        0.921     0.934     0.943       0.948     0.694  0.369      0.835

rDMD
Recall       0.810    0.943        0.680     0.482     0.797       0.854     0.736  0.680
Precision    0.789    0.756        0.703     0.560     0.194       0.201     0.610  0.448
F-Measure    0.799    0.839        0.691     0.518     0.312       0.325     0.667  0.540      0.586

rDMD (with median filter)
Recall       0.901    0.976        0.681     0.551     0.778       0.900     0.816  0.655
Precision    0.899    0.945        0.713     0.642     0.929       0.937     0.744  0.571
F-Measure    0.900    0.960        0.696     0.593     0.847       0.918     0.779  0.610      0.788
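The gain from the median filter can be reproduced with a few lines of code. The sketch below is a minimal illustration, not our exact pipeline; the threshold value, filter size and function name are illustrative choices, and SciPy's `median_filter` is assumed to be available:

```python
import numpy as np
from scipy.ndimage import median_filter  # assumes SciPy is installed


def foreground_mask(frame, background, threshold=25.0, size=5):
    """Threshold the frame/background difference, then clean up the
    binary mask with a median filter to suppress isolated false positives."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    mask = diff > threshold
    return median_filter(mask, size=size)
```

Isolated noisy pixels are removed because the median over a 5x5 window stays False unless a majority of the neighborhood is foreground, which is what drives the precision gain without sacrificing much recall.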
4.3 Computational performance
We now evaluate the computational performance of the rSVD and rDMD algorithms, respectively. Our implementations are written in Python (Python Software Foundation, Python Language Reference, version 2.7, available at http://www.python.org), using the multithreaded MKL (Intel Math Kernel Library) accelerated linear algebra library LAPACK. For the GPU implementation we use NVIDIA CUDA in combination with the linear algebra libraries cuBLAS CUDA and CULA CULA . To allow the comparison of rSVD with the fast LMSVD Liu12limitedmemory algorithm we used Matlab. All computations were performed on a standard gaming notebook (Intel Core i7-5500U 2.4 GHz, 8 GB DDR3L memory and NVIDIA GeForce GTX 950M). It is important to note that, in order to achieve any computational advantage with rSVD, the target rank has to be small relative to the matrix dimensions; otherwise a truncated SVD would be faster. Another requirement is that the matrix fits into fast memory.
4.3.1 Randomized SVD
Figure 7 shows the computational time of rSVD compared with the LMSVD and partial SVD (svds) algorithms for two different matrix sizes and a varying target rank. rSVD with one subspace iteration achieves time savings of about a factor of 10 to 30. Compared to LMSVD (with default options) the speedup is about 5 to 8 times. However, the reconstruction error shows that the LMSVD algorithm is more precise, in particular compared to rSVD without subspace iterations.
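For reference, the randomized SVD scheme with subspace iterations can be sketched in a few lines of NumPy. This is a generic sketch of the well-known two-stage algorithm, not our tuned implementation; the function name and the oversampling default p=10 are our own choices:

```python
import numpy as np


def rsvd(A, k, p=10, q=1):
    """Randomized SVD: target rank k, oversampling p, q subspace iterations."""
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix.
    Omega = np.random.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ Omega)
    # Subspace (power) iterations sharpen the basis when the singular
    # values decay slowly, at the cost of extra passes over A.
    for _ in range(q):
        Z, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Z)
    # Stage B: deterministic SVD of the small projected matrix B = Q^T A.
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k, :]
```

A single subspace iteration (q=1) corresponds to the variant timed in Figure 7; setting q=0 recovers the cheaper but less accurate basic scheme.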
4.3.2 Randomized DMD
For feasible real-time processing, our aim was to accelerate the ordinary DMD algorithm by using a randomized matrix algorithm for computing the SVD. Figure 8 shows the computational times we obtain with the randomized algorithm on videos at two different resolutions. The randomized version accelerates the computation by about a factor of 2. Even more drastic is the acceleration using a GPU implementation, which allows processing of up to 180 frames per second for a 720×480 video. For a 320×240 video it increases the throughput from about 300 up to 750 frames per second. The GPU-accelerated DMD implementation benefits in particular from the fast computation of dot products, fast QR decomposition and fast generation of random numbers. Using a high-end graphics card can further improve the results, enabling real-time processing of HD 720p videos and beyond. The limitation of a GPU implementation is that the snapshot sequence has to fit into the graphics card's memory.
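A minimal sketch of DMD built on top of such a randomized SVD might look as follows. This illustrates the general scheme rather than our optimized implementation; the function name and parameter defaults are our own choices:

```python
import numpy as np


def rdmd(X, k, p=10, q=1):
    """DMD of a snapshot matrix X (columns are flattened video frames),
    with the SVD of X1 replaced by a randomized SVD of target rank k."""
    X1, X2 = X[:, :-1], X[:, 1:]          # time-shifted snapshot pairs
    # Randomized range finder for X1 (oversampling p, q subspace iterations).
    Omega = np.random.standard_normal((X1.shape[1], k + p))
    Q, _ = np.linalg.qr(X1 @ Omega)
    for _ in range(q):
        Z, _ = np.linalg.qr(X1.T @ Q)
        Q, _ = np.linalg.qr(X1 @ Z)
    U, s, Vt = np.linalg.svd(Q.T @ X1, full_matrices=False)
    U = (Q @ U)[:, :k]
    s, V = s[:k], Vt[:k, :].T
    # Projected linear operator A_tilde = U^T X2 V S^{-1} and its eigenpairs.
    A_tilde = U.T @ X2 @ V / s
    evals, W = np.linalg.eig(A_tilde)
    Phi = X2 @ V / s @ W                  # (exact) DMD modes
    return Phi, evals
```

For background modeling, the modes whose eigenvalues satisfy |λ| ≈ 1 (near-zero temporal frequency) form the background model; the remaining modes capture the moving foreground.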
5 Conclusion
We have presented a fast algorithm for computing the low-rank DMD using a randomized matrix algorithm. This subtle modification leads to substantial decreases in computational time, enabling DMD to process high-resolution videos in real time. Furthermore, the randomized version is of particular interest for parallel processing, e.g. GPU-accelerated implementations. Randomized matrix algorithms may be beneficial for many other methods built on numerical linear algebra, in particular those that use the SVD.
The suitability of DMD for motion detection has been evaluated on synthetic and real videos using statistical metrics. The results show that DMD can be seen as a fast approximation of RPCA. Compared to other robust PCA algorithms the results are competitive, but compared to the advanced statistical models leading the CD ranking, DMD falls short on the accuracy metrics. However, we have also shown that simple postprocessing can enhance the results substantially. The performance can be further improved by adaptive thresholds or by integration into a background modeling system. This makes DMD interesting for applications where fast processing is more important than extremely high precision.
In addition, we note that DMD is not a method purely designed for background modeling. DMD comes with a rich mathematical framework, which offers potential for further research and applications beyond video. For example, promising research directions are opened by multiresolution dynamic mode decomposition mrdmd , making the algorithm particularly interesting for object tracking and motion estimation. Another direction is compressed DMD for fast background modeling. Both compressed and randomized DMD offer the opportunity to use importance sampling strategies to model a more robust background.
Acknowledgment
For the many discussions and insightful remarks on dynamic mode decomposition we would like to thank J. Nathan Kutz and Steven L. Brunton. We would also like to express our gratitude to Leonid Sigal and the two anonymous reviewers for the many helpful comments to improve this paper. N. Benjamin Erichson acknowledges support from the UK Engineering and Physical Sciences Research Council (EPSRC).
References
 (1) T. Bouwmans, E. H. Zahzah, Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance, Computer Vision and Image Understanding 122 (2014) 22–34. doi:10.1016/j.cviu.2013.11.009.
 (2) J. N. Kutz, Data-Driven Modeling and Scientific Computation: Methods for Complex Systems and Big Data, Oxford University Press, 2013.
 (3) J. Grosek, J. N. Kutz, Dynamic mode decomposition for real-time background/foreground separation in video (2014). arXiv:1404.7592.
 (4) T. Bouwmans, Traditional and recent approaches in background modeling for foreground detection: An overview, Computer Science Review 11–12 (2014) 31–66. doi:10.1016/j.cosrev.2014.04.001.
 (5) A. Sobral, A. Vacavant, A comprehensive review of background subtraction algorithms evaluated with synthetic and real videos, Computer Vision and Image Understanding 122 (2014) 4–21. doi:10.1016/j.cviu.2013.12.005.
 (6) N. Oliver, B. Rosario, A. Pentland, A Bayesian computer vision system for modeling human interactions, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) (2000) 831–843. doi:10.1109/34.868684.
 (7) T. Bouwmans, Subspace learning for background modeling: A survey, Recent Patents on Computer Science 2 (3) (2009) 223–234.
 (8) T. Bouwmans, A. Sobral, S. Javed, S. K. Jung, E.-H. Zahzah, Decomposition into low-rank plus additive matrices for background/foreground separation: A review for a comparative evaluation with a large-scale dataset (2015). arXiv:1511.01245.
 (9) E. J. Candès, X. Li, Y. Ma, J. Wright, Robust principal component analysis?, Journal of the ACM 58 (3) (2011) 1–37. doi:10.1145/1970392.1970395.
 (10) Z. Zhou, X. Li, J. Wright, E. Candes, Y. Ma, Stable principal component pursuit, in: Information Theory Proceedings (ISIT), 2010 IEEE International Symposium on, 2010, pp. 1518–1522. doi:10.1109/ISIT.2010.5513535.
 (11) D. Goldfarb, S. Ma, K. Scheinberg, Fast alternating linearization methods for minimizing the sum of two convex functions, Mathematical Programming 141 (1–2) (2013) 349–382. doi:10.1007/s10107-012-0530-2.
 (12) S. Becker, E. J. Candès, M. C. Grant, Templates for convex cone problems with applications to sparse signal recovery, Mathematical Programming Computation 3 (3) (2011) 165–218. doi:10.1007/s12532-011-0029-5.
 (13) C. Guyon, T. Bouwmans, E.-H. Zahzah, Moving object detection by robust PCA solved via a linearized symmetric alternating direction method, in: Advances in Visual Computing, Springer, 2012, pp. 427–436.
 (14) X. Liu, Z. Wen, Y. Zhang, Limited memory block Krylov subspace optimization for computing dominant singular value decompositions (2012).
 (15) X. Liu, Z. Wen, Y. Zhang, An efficient Gauss–Newton algorithm for symmetric low-rank product matrix approximations, SIAM Journal on Optimization 25 (3) (2015) 1571–1608. doi:10.1137/140971464.
 (16) A. Frieze, R. Kannan, S. Vempala, Fast Monte Carlo algorithms for finding low-rank approximations, Journal of the ACM 51 (6) (2004) 1025–1041.
 (17) P. Drineas, R. Kannan, M. W. Mahoney, Fast Monte Carlo algorithms for matrices II: Computing a low-rank approximation to a matrix, SIAM Journal on Computing 36 (1) (2006) 158–183.
 (18) P.-G. Martinsson, V. Rokhlin, M. Tygert, A randomized algorithm for the decomposition of matrices, Applied and Computational Harmonic Analysis 30 (1) (2011) 47–68. doi:10.1016/j.acha.2010.02.003.
 (19) N. Halko, P. G. Martinsson, J. A. Tropp, Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions, SIAM Review 53 (2) (2011) 217–288. doi:10.1137/090771806.
 (20) M. Gu, Subspace iteration randomization and singular value problems, SIAM Journal on Scientific Computing 37 (3) (2015) A1139–A1173. doi:10.1137/130938700.
 (21) M. W. Mahoney, Randomized algorithms for matrices and data, Foundations and Trends in Machine Learning 3 (2) (2011) 123–224. doi:10.1561/2200000035.
 (22) T. Zhou, D. Tao, GoDec: Randomized low-rank & sparse matrix decomposition in noisy case, in: International Conference on Machine Learning (ICML), 2011, pp. 1–8.
 (23) D. S. Watkins, Fundamentals of Matrix Computations, 2nd Edition, John Wiley & Sons, Inc., New York, NY, USA, 2002.
 (24) S. Voronin, P.-G. Martinsson, RSVDPACK: Subroutines for computing partial singular value decompositions via randomized sampling on single core, multi core, and GPU architectures (2015). arXiv:1502.05366.
 (25) J. Demmel, Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics, 1997. doi:10.1137/1.9781611971446.
 (26) F. Woolfe, E. Liberty, V. Rokhlin, M. Tygert, A fast randomized algorithm for the approximation of matrices, Applied and Computational Harmonic Analysis 25 (3) (2008) 335–366.
 (27) P. J. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal of Fluid Mechanics 656 (2010) 5–28. doi:10.1017/S0022112010001217.
 (28) C. W. Rowley, I. Mezić, S. Bagheri, P. Schlatter, D. S. Henningson, Spectral analysis of nonlinear flows, Journal of Fluid Mechanics 641 (2009) 115–127. doi:10.1017/S0022112009992059.
 (29) J. N. Kutz, X. Fu, S. L. Brunton, Multiresolution dynamic mode decomposition (2015). arXiv:1506.00564.
 (30) J. H. Tu, C. W. Rowley, D. M. Luchtenburg, S. L. Brunton, J. N. Kutz, On dynamic mode decomposition: Theory and applications (2013). arXiv:1312.0041.
 (31) G. Golub, W. Kahan, Calculating the singular values and pseudoinverse of a matrix, Journal of the Society for Industrial & Applied Mathematics, Series B: Numerical Analysis 2 (2) (1965) 205–224.
 (32) M. Jovanović, P. Schmid, J. Nichols, Low-rank and sparse dynamic mode decomposition, Center for Turbulence Research Annual Research Briefs (2012) 139–152.
 (33) A. Vacavant, T. Chateau, A. Wilhelm, L. Lequievre, A benchmark dataset for outdoor foreground/background extraction, in: Computer Vision–ACCV 2012 Workshops, Springer, 2013, pp. 291–300.

 (34) Y. Wang, P.-M. Jodoin, F. Porikli, J. Konrad, Y. Benezeth, P. Ishwar, CDnet 2014: An expanded change detection benchmark dataset, in: IEEE Workshop on Computer Vision and Pattern Recognition, IEEE, 2014, pp. 393–400.
 (35) Y. Benezeth, P.-M. Jodoin, B. Emile, H. Laurent, C. Rosenberger, Comparative study of background subtraction algorithms, Journal of Electronic Imaging 19 (3) (2010). doi:10.1117/1.3456695.
 (36) T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (8) (2006) 861–874. doi:10.1016/j.patrec.2005.10.010.
 (37) R. Wang, F. Bunyak, G. Seetharaman, K. Palaniappan, Static and moving object detection using flux tensor with split Gaussian models, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), IEEE, 2014, pp. 420–424.
 (38) P.-L. St-Charles, G.-A. Bilodeau, R. Bergevin, A self-adjusting approach to change detection based on background word consensus, in: 2015 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2015, pp. 990–997.
 (39) J. Nickolls, I. Buck, M. Garland, K. Skadron, Scalable parallel programming with CUDA, Queue 6 (2) (2008) 40–53. doi:10.1145/1365490.1365500.
 (40) J. R. Humphrey, D. K. Price, K. E. Spagnoli, A. L. Paolini, E. J. Kelmelis, CULA: Hybrid GPU accelerated linear algebra routines (2010). doi:10.1117/12.850538.
 (41) J. N. Kutz, X. Fu, S. L. Brunton, Multiresolution analysis of dynamical systems using dynamic mode decomposition, in: Proceedings of the World Congress on Engineering, Vol. 1, WCE, 2015, pp. 1–4.