I Introduction
The significant developments in sparse modelling over recent decades have opened new research and application fields.
One of the first applications of sparse modelling is the linear regression problem, where $\ell_0$ and $\ell_1$ norm regularisation is considered. The latter has the advantage that the regulariser term is convex, although its sparse interpretation is less obvious [1].
Sparse modelling is further developed in signal processing as compressive sensing [2], where the main idea is to minimise the number of measurements of a signal without loss of decoding accuracy. Compressive sensing addresses two main problems: selecting the optimal design matrix and solving the ill-posed regression problem that arises when the original signal is decoded from the measurements [3].
The idea of sparse Bayesian modelling is mentioned in [4]. This imposes a sparsity-inducing Laplace prior on the regression coefficients, but does not give inference for the whole distribution, only a maximum a posteriori (MAP) estimate. Full inference for this model is provided in [5], using the expectation propagation (EP) technique. Another work is [6], where the prior is modified to a hierarchical Gauss-Gamma distribution. These models are used as a basis for Bayesian compressive sensing in [7] and [8].
The recent monograph [9] presents applications of sparse modelling to image and video processing. One of the essential problems in video processing is foreground detection, which is mostly solved by background subtraction. Background subtraction aims to distinguish the foreground (moving objects) from the background (static objects). Sparseness is natural for the background subtraction problem, as the foreground objects occupy only small regions of a frame. Background subtraction hence represents a natural application area for sparse modelling.
The idea of applying compressive sensing to background subtraction is originally proposed in [10] and developed in [11]. In contrast to these works, in this paper we focus on sparse Bayesian methods for background subtraction and on a comprehensive comparison of these methods with a conventional compressive sensing one.
The contribution of this paper is the application of the Bayesian compressive sensing approach to the background subtraction problem. As far as the authors know, this approach to moving object detection has not been considered yet. Several algorithms are also reviewed and compared to evaluate their applicability in different situations.
II Framework
Assume that we have a static camera and that we can acquire a frame from it that serves as the background reference. The video from the camera consists of a sequence of frames. The aim is to estimate the mask of the foreground objects in these frames.
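II-A Video preprocessing
We convert the source video frames to greyscale. The background frame is converted to a vector $\mathbf{b} \in \mathbb{R}^N$ and the video frames are converted to vectors $\mathbf{v}_t \in \mathbb{R}^N$, where $N$ is the number of pixels in a frame.
II-B Compressive sensing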
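Typically the foreground objects occupy only a part of the image. Therefore the foreground mask $\mathbf{x}_t$, that is the difference between the vectorised current frame and the vectorised background frame, has many values that are close to zero. This intuition can be expressed as the sparsity assumption
$$\|\mathbf{x}_t\|_0 \ll N, \tag{1}$$
where the $\ell_0$ pseudonorm $\|\cdot\|_0$ is the number of non-zero elements of a vector and $N$ is the number of pixels.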
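We apply compressive sensing theory to this problem. It reduces the number of measurements that need to be taken [2], and the results may also be denoised [9]. The values of the foreground mask $\mathbf{x}_t \in \mathbb{R}^N$ are estimated from a set of compressed measurements $\mathbf{y}_t \in \mathbb{R}^K$:
$$\mathbf{y}_t = \Phi \mathbf{x}_t, \tag{2}$$
where the design matrix $\Phi \in \mathbb{R}^{K \times N}$ consists of i.i.d. Gaussian variables. It is selected according to the method proposed in [12].
Since $\mathbf{x}_t = \mathbf{v}_t - \mathbf{b}$, where $\mathbf{v}_t$ and $\mathbf{b}$ are the vectorised current and background frames, the measurements can be obtained at the acquisition step as
$$\mathbf{y}_t = \Phi \mathbf{v}_t - \Phi \mathbf{b}. \tag{3}$$
The vectors $\Phi \mathbf{v}_t$ and $\Phi \mathbf{b}$ are linear combinations of the pixels of the video frames, therefore a single-pixel camera may be used. The reconstruction of the foreground mask is the more difficult problem.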
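The acquisition step described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' implementation: the names `to_vector`, `gaussian_design_matrix`, `phi`, `b`, `v` and `y`, the toy image size and the $1/\sqrt{K}$ scaling of the design matrix are all assumptions made for the example.

```python
import numpy as np

def to_vector(frame):
    """Flatten a greyscale frame (H x W array) into a vector of N = H * W pixels."""
    return np.asarray(frame, dtype=float).reshape(-1)

def gaussian_design_matrix(k, n, rng):
    """K x N matrix with i.i.d. zero-mean Gaussian entries.

    The 1/sqrt(K) scaling is a common convention and an assumption here.
    """
    return rng.standard_normal((k, n)) / np.sqrt(k)

rng = np.random.default_rng(0)
height, width, k = 8, 8, 20           # toy sizes, not those of the experiments

background = rng.uniform(0.0, 200.0, size=(height, width))
frame = background.copy()
frame[2:4, 3:6] += 50.0               # synthetic foreground object (6 pixels)

b = to_vector(background)             # vectorised background frame
v = to_vector(frame)                  # vectorised current frame
phi = gaussian_design_matrix(k, b.size, rng)

y_b = phi @ b                         # compressed background, acquired once
y_v = phi @ v                         # compressed current frame
y = y_v - y_b                         # compressed foreground measurements

x_true = v - b                        # uncompressed foreground, for checking only
print("non-zero foreground pixels:", np.count_nonzero(x_true), "of", x_true.size)
print("number of measurements:", y.size)
print("y equals phi @ (v - b):", np.allclose(y, phi @ x_true))
```

Only the compressed vectors `y_b` and `y_v` are needed at acquisition time; the uncompressed difference `x_true` is formed here solely to check that subtracting compressed measurements is the same as compressing the difference.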
The linear system (3) is underdetermined when $K < N$, therefore an infinite number of solutions exists. The problem can be made well-posed by a regulariser that encodes the assumption that the signal has a sparse structure. The common regularisers used in compressive sensing are the $\ell_p$ norms, where $0 \le p \le 1$.
The conventional methods for solving such systems are the following [13, Chapter 13]:

$\ell_0$ minimisation. Greedy algorithms based on least squares estimates, stochastic search, variational inference;

$\ell_1$ minimisation. Coordinate descent, LARS, the proximal and gradient projection methods;

Non-convex minimisation. Bridge regression, hierarchical adaptive lasso.
In this paper we focus on the Bayesian methods [7, 14] and compare them with orthogonal matching pursuit (OMP) [15], which is a greedy algorithm for $\ell_0$ minimisation. A brief review of these methods follows.
II-B1 Bayesian compressive sensing (BCS)
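The system (3) is reformulated as a linear regression model in [7] (the frame index is omitted for brevity):
$$\mathbf{y} = \Phi \mathbf{x} + \boldsymbol{\xi}, \tag{4}$$
where $\mathbf{y} \in \mathbb{R}^K$ is the vector of measurements, $\mathbf{x} \in \mathbb{R}^N$ is the foreground vector and $\boldsymbol{\xi}$ is a vector whose elements are independent noise terms from the Gaussian distribution: $\xi_j \sim \mathcal{N}(0, \beta^{-1})$. Therefore the likelihood can be expressed as
$$p(\mathbf{y} \mid \mathbf{x}, \beta) = \prod_{j=1}^{K} \mathcal{N}(y_j \mid \boldsymbol{\phi}_j \mathbf{x}, \beta^{-1}), \tag{5}$$
where $y_j$ is the $j$th element of the vector $\mathbf{y}$ and $\boldsymbol{\phi}_j$ is the $j$th row of the matrix $\Phi$.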
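To implement the full Bayesian approach, prior distributions are imposed on all parameters:
$$p(\mathbf{x} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{N} \mathcal{N}(x_i \mid 0, \alpha_i^{-1}), \tag{6}$$
where $x_i$ is the $i$th element of the vector $\mathbf{x}$, $\boldsymbol{\alpha}$ is a prior parameter vector, $\alpha_i$ is the $i$th element of the vector $\boldsymbol{\alpha}$;
$$p(\boldsymbol{\alpha} \mid a, b) = \prod_{i=1}^{N} \mathrm{Gamma}(\alpha_i \mid a, b), \tag{7}$$
$$p(\beta \mid c, d) = \mathrm{Gamma}(\beta \mid c, d), \tag{8}$$
where $\mathrm{Gamma}(\cdot)$ denotes the Gamma distribution and $\beta$ is the noise precision. The hyperparameters $a, b, c, d$ are set close to zero, which makes the hyperpriors uniform.
According to the Bayes rule the posterior distribution can be written as follows:
$$p(\mathbf{x}, \boldsymbol{\alpha}, \beta \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\alpha}, \beta)\, p(\mathbf{x}, \boldsymbol{\alpha}, \beta)}{p(\mathbf{y})}, \tag{9}$$
where $p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\alpha}, \beta)$ is the likelihood term, $p(\mathbf{x}, \boldsymbol{\alpha}, \beta)$ is the prior term and $p(\mathbf{y})$ is the evidence term. The latter can be expressed as:
$$p(\mathbf{y}) = \iiint p(\mathbf{y} \mid \mathbf{x}, \boldsymbol{\alpha}, \beta)\, p(\mathbf{x}, \boldsymbol{\alpha}, \beta)\, d\mathbf{x}\, d\boldsymbol{\alpha}\, d\beta. \tag{10}$$
This integral is intractable, therefore an approximation has to be used.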
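In Bayesian compressive sensing [7] the posterior probability is decomposed into a product of a tractable and an intractable probability, and the intractable one is approximated by a delta-function at its mode:
$$p(\mathbf{x}, \boldsymbol{\alpha}, \beta \mid \mathbf{y}) = p(\mathbf{x} \mid \mathbf{y}, \boldsymbol{\alpha}, \beta)\, p(\boldsymbol{\alpha}, \beta \mid \mathbf{y}) \approx p(\mathbf{x} \mid \mathbf{y}, \boldsymbol{\alpha}, \beta)\, \delta(\boldsymbol{\alpha} - \boldsymbol{\alpha}_{MP})\, \delta(\beta - \beta_{MP}), \tag{11}$$
where $\boldsymbol{\alpha}_{MP}$ and $\beta_{MP}$ are the most probable values of the hyperparameters.
The Bayes rule for the first term of (11) is as follows:
$$p(\mathbf{x} \mid \mathbf{y}, \boldsymbol{\alpha}, \beta) = \frac{p(\mathbf{y} \mid \mathbf{x}, \beta)\, p(\mathbf{x} \mid \boldsymbol{\alpha})}{p(\mathbf{y} \mid \boldsymbol{\alpha}, \beta)}. \tag{12}$$
All the terms here are Gaussian, so this probability can be calculated in closed form. It is the Gaussian distribution $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ with the parameters
$$\boldsymbol{\mu} = \beta \Sigma \Phi^\top \mathbf{y}, \tag{13}$$
$$\Sigma = (\beta \Phi^\top \Phi + A)^{-1}, \tag{14}$$
where $A = \mathrm{diag}(\alpha_1, \dots, \alpha_N)$.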
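The second term of the posterior probability (11) can be expressed as:
$$p(\boldsymbol{\alpha}, \beta \mid \mathbf{y}) = \frac{p(\mathbf{y} \mid \boldsymbol{\alpha}, \beta)\, p(\boldsymbol{\alpha})\, p(\beta)}{p(\mathbf{y})}. \tag{15}$$
As already shown, the denominator here is intractable, so the most probable values of $\boldsymbol{\alpha}$ and $\beta$ are used instead. The hyperpriors are uniform, therefore only the term $p(\mathbf{y} \mid \boldsymbol{\alpha}, \beta)$ needs to be maximised:
$$p(\mathbf{y} \mid \boldsymbol{\alpha}, \beta) = \int p(\mathbf{y} \mid \mathbf{x}, \beta)\, p(\mathbf{x} \mid \boldsymbol{\alpha})\, d\mathbf{x} = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \beta^{-1} I + \Phi A^{-1} \Phi^\top), \tag{16}$$
where $A = \mathrm{diag}(\alpha_1, \dots, \alpha_N)$.
Maximisation of (16) w.r.t. $\boldsymbol{\alpha}$ and $\beta$ gives the following iterative process:
$$\alpha_i^{new} = \frac{\gamma_i}{\mu_i^2}, \tag{17}$$
$$\beta^{new} = \frac{K - \sum_{i} \gamma_i}{\|\mathbf{y} - \Phi \boldsymbol{\mu}\|_2^2}, \tag{18}$$
where $\gamma_i = 1 - \alpha_i \Sigma_{ii}$, $\mu_i$ is the $i$th element of the posterior mean (13) and $\Sigma_{ii}$ is the $i$th diagonal element of the posterior covariance (14).
Note that
$$p(x_i \mid a, b) = \int \mathcal{N}(x_i \mid 0, \alpha_i^{-1})\, \mathrm{Gamma}(\alpha_i \mid a, b)\, d\alpha_i. \tag{19}$$
This is the Student-t distribution, whose probability mass is concentrated around zero. It therefore leads to a sparse vector $\mathbf{x}$.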
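For reference, the marginal in (19) can be written in closed form. The expression below is the standard result for a zero-mean Gaussian whose precision has a Gamma distribution; the symbols $x_i$ (a single weight), $\alpha_i$ (its precision) and $a$, $b$ (the shape and rate of the Gamma hyperprior) are notation assumed here for illustration.

```latex
p(x_i \mid a, b)
  = \int_0^\infty \mathcal{N}(x_i \mid 0, \alpha_i^{-1})\,
      \mathrm{Gamma}(\alpha_i \mid a, b)\, d\alpha_i
  = \frac{b^{a}\,\Gamma(a + \tfrac{1}{2})}{(2\pi)^{1/2}\,\Gamma(a)}
      \left(b + \frac{x_i^{2}}{2}\right)^{-(a + \frac{1}{2})}
```

Here $\Gamma(\cdot)$ is the gamma function. As $a, b \to 0$ the density behaves like $1/|x_i|$, which is sharply peaked at zero; this is the sense in which the hierarchical prior favours sparse solutions.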
The graphical model is displayed in Figure 1.
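The re-estimation scheme of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the solver of [7]: it applies the updates of the form $\alpha_i = \gamma_i / \mu_i^2$ and $\beta = (K - \sum_i \gamma_i) / \|\mathbf{y} - \Phi\boldsymbol{\mu}\|^2$ directly, computes the posterior quantities through the Woodbury identity so that only a $K \times K$ matrix is inverted, and caps the hyperparameters instead of pruning weights. The function names, the caps `alpha_max` and `beta_max`, the initial noise precision and the synthetic test signal are all hypothetical choices.

```python
import numpy as np

def posterior_terms(phi, y, alpha, beta):
    """Posterior mean mu, gamma_i = 1 - alpha_i * Sigma_ii, and K - sum(gamma).

    Works with C = beta^{-1} I + Phi A^{-1} Phi^T (K x K), which is
    algebraically equivalent to forming Sigma = (beta Phi^T Phi + A)^{-1}.
    """
    k = phi.shape[0]
    c = np.eye(k) / beta + (phi / alpha) @ phi.T
    c_inv = np.linalg.inv(c)
    c_inv_phi = c_inv @ phi
    s = np.sum(phi * c_inv_phi, axis=0)     # phi_i^T C^{-1} phi_i
    q = c_inv_phi.T @ y                     # phi_i^T C^{-1} y
    mu = q / alpha
    gamma = s / alpha
    dof_left = np.trace(c_inv) / beta       # equals K - sum(gamma)
    return mu, gamma, dof_left

def bcs_fit(phi, y, n_iter=300, alpha_max=1e8, beta_max=1e6):
    """Iteratively re-estimate alpha and beta, then return the posterior mean."""
    alpha = np.ones(phi.shape[1])
    beta = min(100.0 / max(np.var(y), 1e-12), beta_max)   # heuristic start
    for _ in range(n_iter):
        mu, gamma, dof_left = posterior_terms(phi, y, alpha, beta)
        alpha = np.minimum(gamma / np.maximum(mu ** 2, 1e-300), alpha_max)
        resid = y - phi @ mu
        beta = min(dof_left / max(resid @ resid, 1e-300), beta_max)
    mu, _, _ = posterior_terms(phi, y, alpha, beta)
    return mu, alpha, beta

# Synthetic example: 3 non-zero weights out of 50, 30 noisy measurements.
rng = np.random.default_rng(0)
n, k = 50, 30
x_true = np.zeros(n)
x_true[[4, 17, 33]] = [5.0, -4.0, 3.0]
phi = rng.standard_normal((k, n)) / np.sqrt(k)
y = phi @ x_true + 0.01 * rng.standard_normal(k)

mu, alpha, beta = bcs_fit(phi, y)
support = np.sort(np.argsort(np.abs(mu))[-3:])
rel_err = np.linalg.norm(mu - x_true) / np.linalg.norm(x_true)
print("largest weights at indices", support)
print("relative reconstruction error", rel_err)
```

Weights whose `alpha` reaches the cap have a posterior mean that is numerically zero, which is how sparsity shows up in this sketch. Practical implementations usually remove such weights from the model rather than capping them.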
II-B2 Multitask Bayesian compressive sensing (Multitask BCS)
In [14] a Bayesian method is proposed for processing several signals that have a similar sparse structure. The multitask setting reduces the number of measurements that need to be taken compared with processing all the signals independently. The hyperparameter $\boldsymbol{\alpha}$ is shared by all the tasks. The graphical model is displayed in Figure 2.
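The effect of sharing the prior precisions between tasks can be illustrated with a simplified sketch. It is not the algorithm of [14]: the noise precision `beta` is assumed known and fixed, and the shared precisions are updated by summing the single-task quantities over the tasks, $\alpha_i = \sum_t \gamma_{t,i} / \sum_t \mu_{t,i}^2$, which is the stationarity condition of the summed log evidence. All names and the synthetic data are hypothetical.

```python
import numpy as np

def task_posterior(phi, y, alpha, beta):
    """Posterior mean and gamma_i = 1 - alpha_i * Sigma_ii for one task."""
    k = phi.shape[0]
    c = np.eye(k) / beta + (phi / alpha) @ phi.T   # beta^{-1} I + Phi A^{-1} Phi^T
    c_inv_phi = np.linalg.solve(c, phi)
    s = np.sum(phi * c_inv_phi, axis=0)
    q = c_inv_phi.T @ y
    return q / alpha, s / alpha

def multitask_fit(phis, ys, beta, n_iter=300, alpha_max=1e8):
    """Re-estimate one precision vector alpha shared by all tasks."""
    alpha = np.ones(phis[0].shape[1])
    for _ in range(n_iter):
        gamma_sum = np.zeros_like(alpha)
        mu_sq_sum = np.zeros_like(alpha)
        for phi, y in zip(phis, ys):
            mu, gamma = task_posterior(phi, y, alpha, beta)
            gamma_sum += gamma
            mu_sq_sum += mu ** 2
        alpha = np.minimum(gamma_sum / np.maximum(mu_sq_sum, 1e-300), alpha_max)
    means = np.stack([task_posterior(p, y, alpha, beta)[0] for p, y in zip(phis, ys)])
    return means, alpha

# Four synthetic tasks with the same support but different amplitudes.
rng = np.random.default_rng(1)
n, k, n_tasks, noise_std = 50, 20, 4, 0.01
true_support = [4, 17, 33]
xs, phis, ys = [], [], []
for _ in range(n_tasks):
    x = np.zeros(n)
    x[true_support] = rng.uniform(2.0, 5.0, size=3) * rng.choice([-1.0, 1.0], size=3)
    phi = rng.standard_normal((k, n)) / np.sqrt(k)
    xs.append(x)
    phis.append(phi)
    ys.append(phi @ x + noise_std * rng.standard_normal(k))

means, alpha = multitask_fit(phis, ys, beta=1.0 / noise_std ** 2)
support = np.sort(np.argsort(np.abs(means).mean(axis=0))[-3:])
rel_err = np.linalg.norm(means - np.stack(xs)) / np.linalg.norm(np.stack(xs))
print("shared support estimate", support, "relative error", rel_err)
```

Each task keeps its own design matrix and posterior mean; only `alpha` couples them, so evidence about which components are active is pooled across the batch.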
II-B3 Matching Pursuit
Greedy algorithms for $\ell_0$ minimisation are proposed in [15]. These methods start with a null vector and iteratively add variables to it until a convergence threshold is reached.
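A minimal sketch of orthogonal matching pursuit, the greedy method used for comparison in this paper, is given below. It is an illustrative textbook variant rather than the implementation used in the experiments; the function name `omp`, its stopping parameters and the synthetic signal are assumptions.

```python
import numpy as np

def omp(phi, y, max_nonzero, tol=1e-6):
    """Greedy sparse solution of y ~ phi @ x with at most max_nonzero entries."""
    n = phi.shape[1]
    col_norms = np.linalg.norm(phi, axis=0)
    selected = np.zeros(n, dtype=bool)
    x = np.zeros(n)                          # start from the null vector
    residual = np.array(y, dtype=float)
    for _ in range(max_nonzero):
        if np.linalg.norm(residual) <= tol:
            break                            # residual below the threshold
        # choose the unselected column most correlated with the residual
        scores = np.abs(phi.T @ residual) / col_norms
        scores[selected] = -np.inf
        selected[int(np.argmax(scores))] = True
        # least squares refit on all selected columns (the orthogonal step)
        idx = np.flatnonzero(selected)
        coef, *_ = np.linalg.lstsq(phi[:, idx], y, rcond=None)
        x[:] = 0.0
        x[idx] = coef
        residual = y - phi @ x
    return x

# Noise-free synthetic example: 3 non-zero entries out of 120, 60 measurements.
rng = np.random.default_rng(0)
n, k = 120, 60
x_true = np.zeros(n)
x_true[[7, 40, 99]] = [5.0, -4.0, 3.0]
phi = rng.standard_normal((k, n)) / np.sqrt(k)
y = phi @ x_true

x_hat = omp(phi, y, max_nonzero=6)
print("maximum absolute error:", np.max(np.abs(x_hat - x_true)))
```

The loop stops either when the residual norm falls below `tol` or when `max_nonzero` variables have been added, which corresponds to the convergence threshold mentioned above.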
III Experiments
We use the Convoy dataset [11], which consists of 260 greyscale frames and a background frame. The frames are downscaled to a lower resolution to avoid memory problems. For the multitask algorithm batches of 40 frames are run together, while for the Bayesian compressive sensing and OMP algorithms all the frames are processed independently. There are two sets of experiments, which differ in the number of measurements. For both sets of experiments all three methods are run 10 times with 10 different design matrices shared among the methods. For the quantitative comparison the median values of the quality measures over these runs are presented.
The qualitative comparison of the models with the same design matrix is displayed in Figures 18 – 34. Three demonstrative frames are presented. One can notice that with the same design matrix the models demonstrate similar results. The figures show that the smaller number of measurements can be used for object region detection, while 5000 measurements, which is only about 30% of the input resolution, are enough even to distinguish parts of the objects such as the doors and windows of the cars.
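For the quantitative comparison of the results the following measures are used:

Reconstruction error: $\|\hat{\mathbf{x}} - \mathbf{x}\|_2 / \|\mathbf{x}\|_2$, where $\mathbf{x}$ is the ground truth signal and $\hat{\mathbf{x}}$ is the signal reconstructed by the algorithm;

Background subtraction quality measure (BS quality): $|G \cap D| / |G \cup D|$, where $G$ is the set of ground truth foreground pixels, $D$ is the set of foreground pixels detected by the algorithm and $|\cdot|$ is the cardinality of a set;

Peak signal-to-noise ratio (PSNR): $10 \log_{10}(\mathrm{peakval}^2 / \mathrm{MSE})$, where peakval is the maximum possible pixel value, which is 255 in our case, and MSE is the mean square error between $\mathbf{x}$ and $\hat{\mathbf{x}}$;

Structural similarity index (SSIM): $\dfrac{(2\mu_{\mathbf{x}}\mu_{\hat{\mathbf{x}}} + C_1)(2\sigma_{\mathbf{x}\hat{\mathbf{x}}} + C_2)}{(\mu_{\mathbf{x}}^2 + \mu_{\hat{\mathbf{x}}}^2 + C_1)(\sigma_{\mathbf{x}}^2 + \sigma_{\hat{\mathbf{x}}}^2 + C_2)}$, where $\mu_{\mathbf{x}}$, $\mu_{\hat{\mathbf{x}}}$, $\sigma_{\mathbf{x}}$, $\sigma_{\hat{\mathbf{x}}}$, $\sigma_{\mathbf{x}\hat{\mathbf{x}}}$ are the local means, standard deviations and cross-covariance for the images $\mathbf{x}$, $\hat{\mathbf{x}}$, respectively, and $C_1$, $C_2$ are the regularisation constants.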
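Three of these measures can be sketched in a few lines. The PSNR follows its standard definition; the relative $\ell_2$ form of the reconstruction error, the intersection-over-union form of the BS quality and the intensity `threshold` used to turn a signal into a set of foreground pixels are assumptions made for this illustration. SSIM is left out because it depends on windowing details that are not given here.

```python
import numpy as np

def reconstruction_error(x_true, x_hat):
    """Relative l2 error between the ground truth and the reconstruction."""
    return np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)

def bs_quality(x_true, x_hat, threshold):
    """Intersection over union of the thresholded foreground pixel sets."""
    g = np.abs(x_true) > threshold          # ground truth foreground pixels
    d = np.abs(x_hat) > threshold           # detected foreground pixels
    union = np.logical_or(g, d).sum()
    if union == 0:
        return 1.0                          # both masks are empty
    return np.logical_and(g, d).sum() / union

def psnr(x_true, x_hat, peakval=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((x_true - x_hat) ** 2)
    if mse == 0:
        return np.inf
    return 10.0 * np.log10(peakval ** 2 / mse)

x_true = np.array([0.0, 100.0, 50.0, 0.0])
x_hat = np.array([0.0, 90.0, 0.0, 20.0])

err = reconstruction_error(x_true, x_hat)
iou = bs_quality(x_true, x_hat, threshold=10.0)
p = psnr(x_true, x_hat)
print(err, iou, p)
```

In this toy example one of the two true foreground pixels is detected and one background pixel is falsely detected, so the intersection contains one pixel and the union three.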
The difference between the uncompressed current frame and the uncompressed background frame is used as the ground truth signal for every frame (the second columns in Figures 18 – 34), since this is the signal that is compressed in (3).
The results are presented in Figure 43. All the quality measures (reconstruction error, BS quality, PSNR and SSIM) are calculated for every frame. The mean values over the frames for each measure can be found in Tables I – II.
Table I

| Algorithm | Mean frame reconstruction error | Mean frame BS quality | Mean frame PSNR (dB) | Mean frame SSIM | Mean computational time (hours)* |
|---|---|---|---|---|---|
| BCS | 0.8037 | 0.3518 | 34.2007 | 0.7198 | 0.23 |
| Multitask BCS | 0.7608 | 0.4820 | 37.542 | 0.8384 | 0.67 |
| OMP | 0.8028 | 0.3510 | 34.1705 | 0.7204 | 0.51 |

Table II

| Algorithm | Mean frame reconstruction error | Mean frame BS quality | Mean frame PSNR (dB) | Mean frame SSIM | Mean computational time (hours)* |
|---|---|---|---|---|---|
| BCS | 0.4713 | 0.8119 | 43.8251 | 0.9186 | 0.9 |
| Multitask BCS | 0.4702 | 0.8421 | 45.0028 | 0.9212 | 8.5 |
| OMP | 0.4578 | 0.8109 | 43.2720 | 0.9266 | 4.8 |

* The computational time is given for a batch of 40 frames (BCS and OMP process each frame independently with 4 parallel workers, multitask BCS processes all 40 frames together). The implementation is run on a laptop with an i7-4702HQ CPU at 2.20 GHz and 16 GB RAM, using MATLAB 2015a.
Multitask Bayesian compressive sensing demonstrates the best results according to almost every measure. Bayesian compressive sensing and OMP show competitive results, but Bayesian compressive sensing works faster. It is worth noting that multitask Bayesian compressive sensing has the largest variance among the runs with different design matrices, while the variances of the Bayesian compressive sensing and OMP runs for the same matrices are quite small.
IV Conclusions and future work
This work presents two Bayesian compressive sensing algorithms applied to background subtraction: the conventional Bayesian compressive sensing algorithm and the multitask Bayesian compressive sensing algorithm. The large size of the video frames leads to a high computational time for all methods, which is reported in Tables I – II. However, the results presented in Figures 18 – 34 demonstrate an appropriate reconstruction quality of the original image based on only 5000 measurements (that is, 30% of the original image size).
The conventional Bayesian compressive sensing method demonstrates results similar to those of the greedy algorithm OMP, but BCS is more efficient in terms of computational time. If the computational time is not critical, the extension of the Bayesian method designed for a multitask problem can improve the performance in terms of the different measures. Therefore other extensions of the Bayesian method that include prior information need further research.
The following problems can be addressed in future work. Further research can be done on implementing different sparse Bayesian methods. The EP-based framework with the Laplace prior proposed in [5] can be compared in terms of computational time and reconstruction error. It uses a different inference scheme and prior, so the results should be different. The Markov chain Monte Carlo (MCMC) framework [16] can also be added to the comparison.
The current methods assume that the components of the foreground intensities are not correlated. In most cases the objects are grouped into several clusters, therefore more sophisticated sparsity models can be introduced to reflect the structure of the foreground. The Bayesian framework allows such modifications to be implemented.
Exploring the applications in video tracking is one more avenue for further research.
V Acknowledgements
The authors Olga Isupova and Lyudmila Mihaylova are grateful for the support provided by the EC Seventh Framework Programme [FP7 2013-2017] TRAcking in compleX sensor systems (TRAX), Grant agreement no.: 607400. Lyudmila Mihaylova also acknowledges the support from the UK Engineering and Physical Sciences Research Council (EPSRC) via the Bayesian Tracking and Reasoning over Time (BTaRoT) grant EP/K021516/1.
References
 [1] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski, “Optimization with sparsity-inducing penalties,” Found. Trends Mach. Learn., vol. 4, no. 1, pp. 1–106, Jan. 2012.
 [2] E. Candes and M. Wakin, “An introduction to compressive sampling,” Signal Processing Magazine, IEEE, vol. 25, no. 2, pp. 21–30, March 2008.
 [3] A. Y. Carmi, L. S. Mihaylova, and S. J. Godsill, “Introduction to compressed sensing and sparse filtering,” in Compressed Sensing and Sparse Filtering, ser. Signals and Communication Technology, A. Y. Carmi, L. Mihaylova, and S. J. Godsill, Eds. Springer Berlin Heidelberg, 2014, pp. 1–23.
 [4] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 58, no. 1, pp. 267–288, 1996.

 [5] M. Seeger, “Bayesian inference and optimal design in the sparse linear model,” Journal of Machine Learning Research, vol. 9, pp. 759–813, 2008.
 [6] M. E. Tipping, “Sparse Bayesian learning and the relevance vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
 [7] S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, June 2008.
 [8] M. W. Seeger and H. Nickisch, “Compressed sensing and Bayesian experimental design,” in Proceedings of the 25th International Conference on Machine Learning, ser. ICML ’08. New York, NY, USA: ACM, 2008, pp. 912–919.
 [9] J. Mairal, F. R. Bach, and J. Ponce, “Sparse modeling for image and vision processing,” CoRR, vol. abs/1411.3230, 2014.
 [10] V. Cevher, A. Sankaranarayanan, M. F. Duarte, D. Reddy, and R. G. Baraniuk, “Compressive sensing for background subtraction,” in European Conf. Comp. Vision (ECCV), 2008, pp. 155–168.
 [11] G. Warnell, S. Bhattacharya, R. Chellappa, and T. Basar, “Adaptive-rate compressive sensing using side information,” ArXiv e-prints, Jan. 2014.
 [12] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, 2008.
 [13] K. P. Murphy, Machine Learning: A Probabilistic Perspective. The MIT Press, 2012.
 [14] S. Ji, D. Dunson, and L. Carin, “Multitask compressive sensing,” IEEE Transactions on Signal Processing, vol. 57, no. 1, pp. 92–106, Jan 2009.
 [15] S. Mallat and Z. Zhang, “Matching pursuits with time-frequency dictionaries,” IEEE Transactions on Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec 1993.
 [16] T. Park and G. Casella, “The Bayesian lasso,” Journal of the American Statistical Association, vol. 103, no. 482, pp. 681–686, 2008.