Consider the problem of reconstructing a sequence of sparse signals from a limited number of measurements. Let $x_k \in \mathbb{R}^n$ be the signal at time $k$ and $y_k \in \mathbb{R}^{m_k}$ be the vector of signal measurements at time $k$, where $m_k < n$. Assume the signals evolve according to the dynamical model
$$x_k = f_k(x_{k-1}, x_{k-2}, \ldots) + \epsilon_k, \qquad (1a)$$
$$y_k = A_k x_k, \qquad (1b)$$
where $\epsilon_k \in \mathbb{R}^n$ is modeling noise and $A_k \in \mathbb{R}^{m_k \times n}$ is a sensing matrix. In (1a), $f_k$ is a known, but otherwise arbitrary, map that describes $x_k$ as a function of past signals. We assume that each $x_k$ and each $\epsilon_k$ is sparse, i.e., has a small number of nonzero entries. Our goal is to reconstruct the signal sequence $\{x_k\}$ from the measurement sequence $\{y_k\}$. We require the reconstruction scheme to be recursive (or online), i.e., $x_k$ is reconstructed before acquiring measurements of any future signal $x_j$, $j > k$, and also to use a minimal number of measurements. We formalize the problem as follows.
Problem statement. Given two unknown sparse sequences $\{x_k\}$ and $\{\epsilon_k\}$ satisfying (1), design an online algorithm that 1) uses a minimal number of measurements $m_k$ at time $k$, and 2) perfectly reconstructs each $x_k$ from $y_k$, acquired as in (1b), and possibly from past reconstructed signals.
Note that our setting immediately generalizes from the case where each $x_k$ is sparse to the case where $x_k$ has a sparse representation in a linear, invertible transform. (If $x_k$ is not sparse but $z_k := \Psi x_k$ is, where $\Psi$ is an invertible matrix, then redefine $f_k$ as the composition $\Psi \circ f_k \circ \Psi^{-1}$ and $A_k$ as $A_k \Psi^{-1}$. The signal $z_k$ satisfies (1) with the redefined map and matrix.)
Many problems require estimating a sequence of signals from a sequence of measurements satisfying the model in (1). These include classification and tracking in computer vision systems [2, 3], radar tracking, dynamic MRI, and several tasks in wireless sensor networks.
Our application focus, however, is compressive background subtraction. Background subtraction is a key task for detecting and tracking objects in a video sequence, and it has been applied, for example, in video surveillance [8, 9], traffic monitoring [10, 11], and medical imaging [12, 13]. Although there are many background subtraction techniques, e.g., [14, 3, 15], most of them assume access to full frames and, thus, are inapplicable in compressive video sensing [16, 17, 18], a technology used in cameras where sensing is expensive (e.g., at infrared or UV wavelengths).
In compressive video sensing, one has access not to full frames, as in conventional video, but only to a small set of linear measurements of each frame, as in (1b). Cevher et al. noticed that background subtraction is possible in this context if the foreground pixels, i.e., those associated with a moving object, occupy a small area in each frame. Assuming the background image is known beforehand, compressed sensing techniques [19, 20] such as $\ell_1$-norm minimization allow reconstructing each foreground. This not only reconstructs the original frame (if we add the reconstructed foreground to the known background), but also performs background subtraction as a by-product.
We mention that, with the exception of [21, 22], most approaches to compressive video sensing and to compressive background subtraction assume a fixed number of measurements for all frames [16, 7, 23, 24, 17, 18]. If this number is too small, reconstruction of the frames fails. If it is too large, reconstruction succeeds, but at the cost of spending unnecessary measurements in some or all frames. The work in [21, 22] addresses this problem with an online scheme that uses cross validation to compute the number of required measurements. Given a reconstructed foreground, [21, 22] estimates the area of the true foreground using extra cross-validation measurements. Then, assuming that foreground areas of two consecutive frames are the same, the phase diagram of the sensing matrix, which was computed beforehand, gives the number of measurements for the next frame. This approach, however, fails to use information from past frames in the reconstruction process, information that, as we will see, can be used to significantly reduce the number of measurements.
I-B Overview of our approach and contributions
Overview. Our approach to adaptive-rate signal reconstruction is based on the recent theoretical results of [25, 26], which characterize the performance of sparse reconstruction schemes in the presence of side information. The scheme we are most interested in is $\ell_1$-$\ell_1$ minimization:
$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \beta \|x - w\|_1 \quad \text{subject to} \quad A x = y, \qquad (2)$$
where $x \in \mathbb{R}^n$ is the optimization variable and $\|\cdot\|_1$ is the $\ell_1$-norm. In (2), $y \in \mathbb{R}^m$ is a vector of measurements and $\beta$ is a positive parameter. The vector $w \in \mathbb{R}^n$ is assumed known and is the so-called prior or side information: a vector similar to the vector that we want to reconstruct, say $\bar{x}$. Note that if we set $\beta = 0$ in (2), we obtain basis pursuit, a well-known problem for reconstructing sparse signals which is at the core of the theory of compressed sensing [20, 19]. Problem (2) generalizes basis pursuit by integrating the side information $w$. The work in [25, 26] shows that, if $w$ has reasonable quality and the entries of $A$ are drawn from an i.i.d. Gaussian distribution, the number of measurements required by (2) to reconstruct $\bar{x}$ is much smaller than the number of measurements required by basis pursuit. Furthermore, the theory in [25, 26] establishes that $\beta = 1$ is an optimal choice, irrespective of any problem parameter. This makes the reconstruction problem (2) parameter-free.
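For concreteness, problem (2) with $\beta = 1$ can be recast as a linear program and solved with off-the-shelf tools. The sketch below is a minimal illustration, not the authors' implementation; the function name `l1_l1_min` and the LP reformulation with auxiliary variables $u \geq |x|$ and $v \geq |x - w|$ are ours.

```python
import numpy as np
from scipy.optimize import linprog

def l1_l1_min(A, y, w):
    """Solve  min_x ||x||_1 + ||x - w||_1  s.t.  A x = y  (i.e., (2) with beta = 1),
    recast as a linear program over (x, u, v) with u >= |x| and v >= |x - w|."""
    m, n = A.shape
    I, Z = np.eye(n), np.zeros((n, n))
    # Inequalities:  x - u <= 0,  -x - u <= 0,  x - v <= w,  -x - v <= -w
    A_ub = np.block([[ I, -I,  Z],
                     [-I, -I,  Z],
                     [ I,  Z, -I],
                     [-I,  Z, -I]])
    b_ub = np.concatenate([np.zeros(2 * n), w, -w])
    A_eq = np.hstack([A, np.zeros((m, 2 * n))])      # A x = y
    c = np.concatenate([np.zeros(n), np.ones(2 * n)])  # minimize sum(u) + sum(v)
    bounds = [(None, None)] * n + [(0, None)] * (2 * n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=bounds, method="highs")
    return res.x[:n]
```

With good side information, such a solver typically recovers the signal exactly from far fewer measurements than basis pursuit requires, which is the behavior the theory in [25, 26] predicts.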
We address the problem of recursively reconstructing a sequence of sparse signals satisfying (1) as follows. Assuming the measurement matrix $A_k$ is Gaussian (although Gaussian matrices are hard to implement in practical systems, they have optimal performance; there are, however, other more practical matrices with similar performance, e.g., [28, 29]), we propose an algorithm that uses (2) with $\beta = 1$ to reconstruct each signal $x_k$. Building upon the results of [25, 26], we equip our algorithm with a mechanism to automatically compute an estimate of the number of required measurements. As an application, we consider compressive background subtraction and show how to generate side information from past frames.
Contributions. We summarize our contributions as follows:
We propose an adaptive-rate algorithm for reconstructing sparse sequences satisfying the model in (1).
We establish conditions under which our algorithm reconstructs a finite sparse sequence with large probability.
We describe how to apply the algorithm to compressive background subtraction problems, using motion-compensated extrapolation to predict the next image to be acquired. In other words, we show how to generate side information.
Given that images predicted by motion-compensated extrapolation are known to exhibit Laplacian noise, we then characterize the performance of (2) under this model.
Finally, we show the impressive performance of our algorithm for performing compressive background subtraction on a sequence of real images.
Besides incorporating a scheme to compute a minimal number of measurements on the fly, there is another aspect that makes our algorithm fundamentally different from prior work. As overviewed in Section II, most prior algorithms for reconstructing dynamical sparse signals work well only when the sparsity pattern of $x_k$ varies slowly with time. Our algorithm, in contrast, operates well even when the sparsity pattern of $x_k$ varies arbitrarily between consecutive time instants, as shown by our theory and experiments. What is required to vary slowly is the "quality" of the prediction given by each $f_k$ (i.e., the quality of the side information) and, to a lesser extent, not the sparsity pattern of $x_k$ but only its sparsity, i.e., its number of nonzero entries.
Section II overviews related work. In Section III, we state the results from [25, 26] that are used by our algorithm. Section IV describes the algorithm and establishes reconstruction guarantees. Section V concerns the application to compressive background subtraction. Experimental results illustrating the performance of our algorithm are shown in Section VI; Section VII concludes the paper. The appendix contains the proofs of our results.
II Related work
There is an extensive literature on reconstructing time-varying signals from limited measurements. Here, we provide an overview by referring to a few landmark papers.
The Kalman filter. The classical solution for estimating a sequence of signals satisfying (1) or, in control terminology, the state of a dynamical system, is the Kalman filter. The Kalman filter is an online algorithm that is least-squares optimal when the model is linear, i.e., each $f_k$ is a linear function of the previous state, and the noise sequence $\{\epsilon_k\}$ is Gaussian and independent across time. Several extensions are available when these assumptions do not hold [31, 32, 33]. The Kalman filter and its extensions, however, are inapplicable to our scenario, as they do not easily integrate the additional knowledge that the state is sparse.
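For reference, the predict/update recursion of the classical Kalman filter can be sketched as follows. This is a generic textbook implementation, not tied to this paper's notation.

```python
import numpy as np

def kalman_step(x_est, P, y, F, A, Q, R):
    """One cycle of the Kalman filter for the linear-Gaussian model
    x_k = F x_{k-1} + eps_k (cov Q),  y_k = A x_k + eta_k (cov R)."""
    # Predict step: propagate the state estimate and its covariance.
    x_pred = F @ x_est
    P_pred = F @ P @ F.T + Q
    # Update step: correct the prediction with the new measurement y.
    S = A @ P_pred @ A.T + R                 # innovation covariance
    K = P_pred @ A.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - A @ x_pred)
    P_new = (np.eye(x_est.size) - K @ A) @ P_pred
    return x_new, P_new
```

As the text notes, this recursion has no natural way to enforce sparsity of the state, which is why it does not directly apply to the present setting.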
Dynamical sparse signal reconstruction. Some prior work incorporates signal structure, such as sparsity, into online reconstruction procedures. For example, [34, 35] adapts a Kalman filter to estimate a sequence of sparse signals. Roughly, an estimate of the signal's support is maintained at each time instant, and the Kalman filter computes the (nonzero) signal values. When a change in the support is detected, the estimate of the support is updated using compressed sensing techniques. The work in [34, 35], however, assumes that the support varies very slowly and does not provide any strategy to update (or compute) the number of measurements; indeed, the number of measurements is assumed constant over time. Related work that also assumes a fixed number of measurements includes a scheme based on approximate belief propagation and one that integrates sparsity knowledge into a Kalman filter via a pseudo-measurement technique. The works proposing the online algorithms GROUSE [38, 39] and PETRELS estimate signals that lie on a low-dimensional subspace. Their model can be seen as a particular case of (1), where each map $f_k$ is linear and depends only on the previous signal. Akin to most prior work, both GROUSE and PETRELS assume that the rank of the underlying subspace (i.e., the sparsity of $x_k$) varies slowly with time, and fail to provide a scheme to compute the number of measurements.
where $\lambda > 0$ is a regularization parameter and $\|\cdot\|_2$ is the Euclidean $\ell_2$-norm. Problem (3) is the Lagrangian version of the problem we obtain by replacing the constraint of (2) with $\|Ax - y\|_2 \leq \sigma$, where $\sigma$ is a bound on the measurement noise; see problem (9) below. For $\lambda$ in a given range, the solutions of (9) and (3) coincide. This is why that approach is so closely related to ours. Nevertheless, using (9) has two important advantages: first, in practice, it is easier to obtain bounds on the measurement noise than it is to tune $\lambda$; second, and more importantly, the problem in (9) has well-characterized reconstruction guarantees [25, 26]. It is exactly those guarantees that enable our scheme for computing the number of measurements online. That work, like most prior work, assumes a fixed number of measurements for all signals.
III Preliminaries: Static signal reconstruction using $\ell_1$-$\ell_1$ minimization
This section reviews some results from [25, 26], namely reconstruction guarantees for (2) in a static scenario, i.e., when we estimate just one signal rather than a sequence. As mentioned before, $\beta = 1$ is an optimal choice: it not only minimizes the bounds in [25, 26], but also leads to the best results in practice. This is the reason why we use $\beta = 1$ henceforth.
$\ell_1$-$\ell_1$ minimization. Let $\bar{x} \in \mathbb{R}^n$ be a sparse vector, and assume we have $m$ linear measurements of $\bar{x}$: $y = A\bar{x}$, where $A \in \mathbb{R}^{m \times n}$. Denote the sparsity of $\bar{x}$ with $\bar{s} := |\{i : \bar{x}_i \neq 0\}|$, where $|\cdot|$ is the cardinality of a set. Assume we have access to a signal $w$ similar to $\bar{x}$ (in the sense that $\|w - \bar{x}\|_1$ is small) and suppose we attempt to reconstruct $\bar{x}$ by solving the $\ell_1$-$\ell_1$ minimization problem (2) with $\beta = 1$:
$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \|x - w\|_1 \quad \text{subject to} \quad A x = y. \qquad (4)$$
Note that the components of $w$ that contribute to $\bar{h}$ are the ones defined on the support of $\bar{x}$; thus, $\bar{h} \leq \bar{s}$.
Theorem 1 (Th. 1 in [25]).
Theorem 1 establishes that if the number of measurements is larger than (6) then, with high probability, (4) reconstructs $\bar{x}$ perfectly. The bound in (6) is a function of the signal dimension $n$ and sparsity $\bar{s}$, and of the quantities $\bar{h}$ and $\bar{\xi}$, which depend on the signs of the entries of $\bar{x}$ and $w$, but not on their magnitudes. When $w$ approximates $\bar{x}$ reasonably well, the bound in (6) is much smaller than the one for basis pursuit (recall that basis pursuit is (2) with $\beta = 0$):
$$m \geq 2\,\bar{s}\,\log(n/\bar{s}) + \tfrac{7}{5}\,\bar{s} + 1. \qquad (7)$$
Namely, it is established that if (7) holds and if $A$ has i.i.d. Gaussian entries with zero mean and variance $1/m$ then, with probability similar to the one in Theorem 1, $\bar{x}$ is the unique solution of basis pursuit. Indeed, if $\bar{h}$ is small and $\bar{\xi}$ is larger than a small negative constant, then (6) is much smaller than (7). Note that, in practice, the quantities $\bar{s}$, $\bar{h}$, and $\bar{\xi}$ are unknown, since they depend on the unknown signal $\bar{x}$. In the next section, we propose an online scheme to estimate them using past signals.
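The gap between (6) and (7) is easy to examine numerically. The helpers below assume the closed forms $2\bar{s}\log(n/\bar{s}) + \frac{7}{5}\bar{s} + 1$ for basis pursuit and $2\bar{h}\log\big(n/(\bar{s} + \bar{\xi}/2)\big) + \frac{7}{5}(\bar{s} + \bar{\xi}/2) + 1$ for $\ell_1$-$\ell_1$ minimization, which is our reading of the bounds in [25, 26]; the function names are ours.

```python
import numpy as np

def bp_bound(n, s):
    """Measurements sufficient for basis pursuit [cf. (7)]."""
    return 2.0 * s * np.log(n / s) + 1.4 * s + 1.0

def l1l1_bound(n, s, h, xi):
    """Measurements sufficient for l1-l1 minimization with beta = 1 [cf. (6)]."""
    t = s + xi / 2.0
    return 2.0 * h * np.log(n / t) + 1.4 * t + 1.0

# Good side information means few "bad" components (small h) and xi near zero:
print(bp_bound(1000, 50))          # approx. 370.6
print(l1l1_bound(1000, 50, 2, 0))  # approx. 83.0
```

With $n = 1000$ and $\bar{s} = 50$, good side information cuts the sufficient number of measurements by more than a factor of four in this example.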
where $y = A\bar{x} + \eta$ is a noisy measurement vector with $\|\eta\|_2 \leq \sigma$. Let $\hat{x}$ be any solution of
$$\underset{x}{\text{minimize}} \;\; \|x\|_1 + \|x - w\|_1 \quad \text{subject to} \quad \|A x - y\|_2 \leq \sigma. \qquad (9)$$
Then, with overwhelming probability, $\|\hat{x} - \bar{x}\|_2$ is bounded by a constant multiple of $\sigma$, i.e., (9) reconstructs $\bar{x}$ stably. Our algorithm, described in the next section, adapts easily to the noisy scenario, but we provide reconstruction guarantees only for the noiseless case.
IV Online sparse signal estimation
Algorithm 1 describes our online scheme for reconstructing a sparse sequence satisfying (1). Although described for a noiseless measurement scenario, the algorithm adapts to the noisy scenario in a straightforward way, as discussed later. Such an adaptation is essential when using it on a real system, e.g., a single-pixel camera.
IV-A Algorithm description
The algorithm consists of two parts: the initialization, where the first two signals, $x_1$ and $x_2$, are reconstructed using basis pursuit, and the online estimation, where the remaining signals are reconstructed using $\ell_1$-$\ell_1$ minimization.
Part I: Initialization. In steps 4-9, we compute the numbers of measurements $m_1$ and $m_2$ according to the bound in (7), and then reconstruct $x_1$ and $x_2$ via basis pursuit. The expressions for $m_1$ and $m_2$ in step 5 require estimates $\hat{s}_1$ and $\hat{s}_2$ of the sparsity of $x_1$ and $x_2$, which are given as input to the algorithm. Henceforth, variables with hats refer to estimates. Steps 10-12 initialize the estimator $\Phi_2$: during Part II of the algorithm, $\Phi_k$ should approximate the right-hand side of (6) for $x_k$, i.e., with $\bar{s} = s_k$, $\bar{h} = h_k$, and $\bar{\xi} = \xi_k$, where the subscript $k$ indicates that these are parameters associated with $x_k$.
Part II: Online estimation. The loop in Part II starts by computing the number of measurements as $m_k = (1 + \delta_k)\Phi_{k-1}$, where $\delta_k$, an input to the algorithm, is a (positive) safeguard parameter. We take more measurements of $x_k$ than the number prescribed by $\Phi_{k-1}$, because $\Phi_{k-1}$ is only an approximation of the bound in (6), as explained next. After acquiring $m_k$ measurements of $x_k$, we reconstruct it as $\hat{x}_k$ via $\ell_1$-$\ell_1$ minimization with $\beta = 1$ (step 19). Next, in step 20, we compute the sparsity $\hat{s}_k$ and the quantities in (5), $\hat{h}_k$ and $\hat{\xi}_k$, for $\hat{x}_k$. If the reconstruction of $x_k$ is perfect, i.e., $\hat{x}_k = x_k$, then all these quantities match their true values. In that case, $\Phi_k$ in step 21 will also match the true value of the bound in (6). Note, however, that the bound for $x_k$, $\Phi_k$, is computed only after $x_k$ is reconstructed. Consequently, the number of measurements used in the acquisition of $x_k$, $m_k$, is a function of the bound (6) for $x_{k-1}$. Since the bounds for $x_{k-1}$ and $x_k$ might differ, we take more measurements than specified by $\Phi_{k-1}$, by a factor $1 + \delta_k$, as in step 16. Also, we mitigate the effect of failed reconstructions by filtering $\Phi_k$ with an exponential moving average filter, in step 22. Indeed, if reconstruction fails for some $k$, the resulting $\Phi_k$ might differ significantly from the true bound in (6). The role of the filter is to smooth out such variations.
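The main loop just described can be sketched schematically as follows. The helpers `quality` and `l1l1_bound` follow our reading of (5) and (6) from [25, 26]; `acquire`, `reconstruct`, and `predict` are placeholder callables standing in for the sensing, $\ell_1$-$\ell_1$, and prediction modules; and `alpha` is an assumed filter weight.

```python
import numpy as np

def quality(x, w):
    """Sparsity s and side-information quality (h, xi) [cf. (5)], assumed forms."""
    s = int(np.sum(x != 0))
    h = int(np.sum((x > 0) & (w > x)) + np.sum((x < 0) & (w < x)))
    xi = int(np.sum((w != x) & (x == 0)) - np.sum((w == x) & (x != 0)))
    return s, h, xi

def l1l1_bound(n, s, h, xi):
    """Right-hand side of the measurement bound (6) (assumed closed form)."""
    t = s + xi / 2.0
    return 2.0 * h * np.log(n / t) + 1.4 * t + 1.0

def online_part2(T, acquire, reconstruct, predict, phi, delta=0.2, alpha=0.7):
    """Schematic main loop (Part II) of Algorithm 1, noiseless case."""
    past, rates = [], []
    for k in range(T):
        m_k = int(np.ceil((1.0 + delta) * phi))      # step 16: oversample by 1+delta
        rates.append(m_k)
        w_k = predict(past)                          # side information from past frames
        A_k, y_k = acquire(k, m_k)                   # take m_k measurements
        x_hat = reconstruct(k, A_k, y_k, w_k)        # step 19: l1-l1 minimization
        s, h, xi = quality(x_hat, w_k)               # step 20: estimate s_k, h_k, xi_k
        phi_raw = l1l1_bound(x_hat.size, s, h, xi)   # step 21: bound (6) estimate
        phi = alpha * phi_raw + (1.0 - alpha) * phi  # step 22: moving-average filter
        past.append(x_hat)
    return past, rates
```

The exponential moving average in the last step is what damps the effect of an occasional failed reconstruction on subsequent measurement rates.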
Extension to the noisy case. Algorithm 1 can be easily extended to the scenario where the acquisition process is noisy, i.e., $y_k = A_k x_k + \eta_k$. Assume that $\eta_k$ is arbitrary noise of bounded magnitude, i.e., we know $\sigma_k$ such that $\|\eta_k\|_2 \leq \sigma_k$. In that case, the constraint in the reconstruction problems in steps 8 and 19 should be replaced by $\|A_k x - y_k\|_2 \leq \sigma_k$. The other modification is in steps 11 and 21, whose expressions for $\Phi_k$ are scaled as in (8). Our reconstruction guarantees, however, hold only for the noiseless case.
Remarks. We will see in the next section that Algorithm 1 works well when each $\delta_k$ is chosen according to the prediction quality of $f_k$: the worse the prediction quality, the larger $\delta_k$ should be. In practice, it may be more convenient to make $\delta_k$ constant, as we do in our experiments in Section VI. Note that the conditions under which our algorithm performs well differ from those of the majority of prior work. For example, the algorithms in [34, 35, 7, 36, 21, 37, 44, 40, 38, 39, 22] work well when the sparsity pattern of $x_k$ varies slowly between consecutive time instants. Our algorithm, in contrast, works well when the quality parameters $h_k$ and $\xi_k$ of the side information and also the sparsity $s_k$ of $x_k$ vary slowly; in other words, when the quality of the prediction of $x_k$ varies slowly.
IV-B Reconstruction guarantees
The following result bounds the probability with which Algorithm 1 perfectly reconstructs a finite-length sequence $x_1, \ldots, x_K$. The idea is to rewrite the requirement on $m_k$ as the condition that (6) applied to $x_k$ is at most $1 + \delta_k$ times (6) applied to $x_{k-1}$. If that condition holds for the entire sequence then, using Theorem 1 and assuming that the matrices $A_k$ are drawn independently, we can bound the probability of successful reconstruction. The proof is in Appendix A.
Let $\{x_k\}_{k=1}^K$ satisfy (1) and fix the safeguard parameters $\delta_k$. Let also condition (12) hold for all $k$. Assume $\hat{s}_k \geq s_k$ for $k = 1, 2$, i.e., that the initial sparsity estimates $\hat{s}_1$ and $\hat{s}_2$ are not smaller than the true sparsities of $x_1$ and $x_2$. Assume also that the matrices $A_k$ in Algorithm 1 are drawn independently. Then, the probability (over the sequence of matrices $\{A_k\}$) that Algorithm 1 reconstructs $x_k$ perfectly at all time instants $k = 1, \ldots, K$ is at least
When the conditions of Lemma 2 hold, the probability of perfect reconstruction decreases with the length $K$ of the sequence, albeit at a very slow rate: even for small values of the safeguard parameter, the expression in (11) remains very close to $1$ for large $K$; for larger values, it is closer still to $1$.
Suppose $x_{k-1}$ and $x_k$ are signals for which the logarithmic terms in (12) roughly coincide. In that case, condition (12) tells us that the oversampling factor $1 + \delta_k$ should be larger than the relative variation of the bound (6) from time $k-1$ to time $k$. In general, the magnitude of the remaining terms in (12) can be significant, since they approach zero at a relatively slow rate. Hence, those terms should not be ignored.
Remarks on the noisy case. There is an inherent difficulty in establishing a counterpart of Lemma 2 for the noisy measurement scenario: namely, the quality parameters $h_k$ and $\xi_k$ in (5) are not continuous functions of $x_k$. So, no matter how close a reconstructed signal is to the original one, their quality parameters can differ arbitrarily. Moreover, in the noisy measurement case, we can never guarantee that the reconstructed and the original signals are equal; at most, if (8) holds, they are within a distance proportional to the noise bound $\sigma_k$.
So far, we have considered $\{x_k\}$ and $\{\epsilon_k\}$ to be deterministic sequences. In the next section, we model $\epsilon_k$ (and thus $x_k$) as a Laplacian stochastic process.
V Compressive video background subtraction
We now consider the application of our algorithm to compressive video background subtraction. We start by modeling the problem of compressive background subtraction as the estimation of a sequence of sparse signals satisfying (1). Our background subtraction system, based on Algorithm 1, is then introduced. Finally, we establish reconstruction guarantees for our scheme when $\epsilon_k$ in (1a) is Laplacian noise.
Let $\{Z_k\}$ be a sequence of images, each with resolution $N_1 \times N_2$, and let $z_k \in \mathbb{R}^n$, with $n = N_1 N_2$, be the (column-major) vectorization of the $k$th image. At time instant $k$, we collect $m_k$ linear measurements of $z_k$: $A_k z_k$, where $A_k \in \mathbb{R}^{m_k \times n}$ is a measurement matrix. We decompose each image as the sum of a foreground image, typically sparse, and a background image, assumed known and the same in all frames. Let $x_k$ and $b$ be the vectorizations of the $k$th foreground and of the background, respectively, so that $z_k = x_k + b$. Because the background image is known, we take measurements of it using the same matrix: $A_k b$. Then, we subtract $A_k b$ from $A_k z_k$:
$$y_k := A_k z_k - A_k b = A_k (z_k - b) = A_k x_k. \qquad (13)$$
This equation tells us that, although we cannot measure the foreground image directly, we can still construct a vector of measurements of it, $y_k$, as if we could. Given that $x_k$ is usually sparse, the theory of compressed sensing tells us that it can be reconstructed by solving, for example, basis pursuit [20, 19]. Specifically, if $x_k$ has sparsity $s_k$ and the entries of $A_k$ are realizations of i.i.d. zero-mean Gaussian random variables with variance $1/m_k$, then a number of measurements satisfying (7) suffices to reconstruct $x_k$ perfectly [cf. (7)].
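Identity (13) is readily verified numerically; the snippet below (with made-up dimensions) shows that subtracting the background measurements yields exactly the measurements one would obtain from the foreground alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 64, 32
b = rng.uniform(size=n)                    # known vectorized background
x = np.zeros(n)
x[[3, 17, 40]] = 0.5                       # sparse foreground
z = b + x                                  # observed frame = background + foreground
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ z - A @ b                          # background-subtracted measurements (13)
assert np.allclose(y, A @ x)               # y equals direct foreground measurements
```

The point is that the camera only ever measures full frames $z_k$; the foreground measurements $y_k$ are synthesized afterwards by linearity.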
Notice that (13) is exactly the equation of measurements in (1b). Regarding equation (1a), we will use it to model the estimation of the foreground of each frame, $x_k$, from previous foregrounds, $x_{k-1}, x_{k-2}, \ldots$. We use a motion-compensated extrapolation technique, as explained in Subsection V-C. This technique is known to produce image estimates with an error well modeled as Laplacian and, thus, each $\epsilon_k$ in (1a) is expected to be small. This aligns perfectly with the way we integrate side information into our reconstruction scheme: namely, the second term in the objective of the optimization problem in step 19 of Algorithm 1 is nothing but $\|x - w_k\|_1$, the $\ell_1$-norm of the deviation from the prediction.
V-B Our background subtraction scheme
Fig. 1 shows the block diagram of our compressive background subtraction scheme and, essentially, translates Algorithm 1 into a diagram. The scheme does not apply to the reconstruction of the first two frames, which are reconstructed by solving basis pursuit; this corresponds to Part I of Algorithm 1. The scheme in Fig. 1 depicts Part II of Algorithm 1. The motion extrapolation module constructs a motion-compensated prediction of the current frame by using the two past (reconstructed) frames. Motion estimation is performed in the image domain rather than in the foreground domain, as the former contains more texture, thereby yielding a more accurate motion field. Next, the background frame $b$ is subtracted from the predicted frame to obtain a prediction of the foreground $x_k$, i.e., the side information $w_k$. These two operations are modeled in Algorithm 1 with the function $f_k$, which takes a set of past reconstructed foregrounds (in our case, $\hat{x}_{k-1}$ and $\hat{x}_{k-2}$, to which we add $b$ to obtain the reconstructed frames $\hat{z}_{k-1}$ and $\hat{z}_{k-2}$, respectively), and outputs the side information $w_k$. This is one of the inputs of the $\ell_1$-$\ell_1$ block, which solves the optimization problem (4). To obtain the other input, i.e., the set of foreground measurements $y_k$, we proceed as specified in equation (13): we take measurements $A_k z_k$ of the current frame and, using the same matrix, measurements $A_k b$ of the background. Subtracting them, we obtain $y_k$. The output of the $\ell_1$-$\ell_1$ module is the estimated foreground $\hat{x}_k$, from which we obtain the estimate of the current frame as $\hat{z}_k = \hat{x}_k + b$.
V-C Motion-compensated extrapolation
To obtain an accurate prediction $w_k$, we use a motion-compensated extrapolation technique similar to what is used in distributed video coding for generating decoder-based motion-compensated predictions [45, 46, 47]. Our technique is illustrated in Fig. 2. In the first stage, we perform forward block-based motion estimation between the two most recent reconstructed frames, $\hat{z}_{k-2}$ and $\hat{z}_{k-1}$. The block matching algorithm is performed with half-pel accuracy, with a fixed block size and search range (in pixels). The interpolation required for half-pel motion estimation is performed using the 6-tap filter of H.264/AVC. In addition, we use the $\ell_1$-norm (or sum of absolute differences, SAD) as error metric. The resulting motion vectors are then spatially smoothed by applying a weighted vector-median filter. The filtering improves the spatial coherence of the resulting motion field by removing outliers (i.e., motion vectors that are far from the true motion field). Assuming linear motion across consecutive frames, we linearly project the motion vectors estimated between $\hat{z}_{k-2}$ and $\hat{z}_{k-1}$ to obtain an estimate of the motion between $\hat{z}_{k-1}$ and the current frame; see Fig. 2. During motion compensation, pixels of the prediction that belong to overlapping prediction blocks are estimated as the average of their corresponding motion-compensated pixel predictors in $\hat{z}_{k-1}$. Pixels in uncovered areas (i.e., for which no motion-compensated predictor is available) are estimated by averaging the three neighboring pixel values in the prediction (up, left, and up-left positions, following a raster scan of the frame) and the corresponding pixel in $\hat{z}_{k-1}$.
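The first stage of the technique, forward block matching with the SAD metric, can be sketched as follows. This is a deliberately simplified full-pel version (no half-pel interpolation and no vector-median smoothing), with illustrative block and search sizes.

```python
import numpy as np

def block_match(prev, curr, block=8, search=4):
    """Full-pel forward block matching with the SAD (l1) error metric.
    Returns, for each block of `curr`, the displacement (dy, dx) into `prev`
    minimizing the sum of absolute differences."""
    H, W = curr.shape
    vectors = {}
    for by in range(0, H - block + 1, block):
        for bx in range(0, W - block + 1, block):
            ref = curr[by:by + block, bx:bx + block]
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    if 0 <= y0 <= H - block and 0 <= x0 <= W - block:
                        cand = prev[y0:y0 + block, x0:x0 + block]
                        sad = np.abs(ref - cand).sum()   # l1 error metric
                        if sad < best_sad:
                            best_sad, best = sad, (dy, dx)
            vectors[(by, bx)] = best
    return vectors
```

In the full scheme, these vectors would additionally be refined to half-pel accuracy, median-filtered, and linearly projected forward to the current frame.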
V-D Reconstruction guarantees for Laplacian modeling noise
It is well known that the noise produced by a motion-compensated prediction module, such as the one just described, is Laplacian [45, 50]. In our model, that corresponds to each $\epsilon_k$ in (1a) being Laplacian. We assume each $\epsilon_k$ is independent of the measurement matrix $A_k$. We also assume that the components of each $\epsilon_k$ are independent and have zero mean. The probability distribution of the $i$th component of $\epsilon_k$ is then
$$p(\epsilon) = \frac{\lambda_{k,i}}{2}\, e^{-\lambda_{k,i} |\epsilon|}, \qquad (14)$$
where $\lambda_{k,i} > 0$ is the parameter of the distribution of the $i$th component of $\epsilon_k$. The entries of $\epsilon_k$, although independent, are not identically distributed, since they have possibly different parameters $\lambda_{k,i}$. The variance of each component is given by $\sigma_{k,i}^2 = 2/\lambda_{k,i}^2$.
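The variance relation $\sigma^2 = 2/\lambda^2$ for a Laplacian with density $\frac{\lambda}{2} e^{-\lambda|\epsilon|}$ is easy to sanity-check by sampling; note that NumPy's `laplace` is parameterized by the scale $b = 1/\lambda$, and the value of `lam` below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.5                                 # illustrative Laplacian parameter lambda
samples = rng.laplace(loc=0.0, scale=1.0 / lam, size=200_000)
empirical_var = samples.var()
theoretical_var = 2.0 / lam**2            # variance of the Laplacian, = 0.32 here
# empirical_var closely matches theoretical_var
```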
Resulting model for $x_k$. The sequence $\{\epsilon_k\}$ being stochastic implies that $\{x_k\}$ is also stochastic. Indeed, if each $f_k$ in (1) is measurable, then $\{x_k\}$ is a sequence of random variables. Given the independence across time and across components of the sequence $\{\epsilon_k\}$, the distribution of $x_k$ given the past signals is also Laplacian, yet not necessarily with zero mean. In words, the distribution of the $i$th component of $x_k$, conditioned on all past realizations $x_{k-1}, x_{k-2}, \ldots$, is Laplacian with mean given by the $i$th component of $f_k(x_{k-1}, x_{k-2}, \ldots)$ and parameter $\lambda_{k,i}$. Furthermore, it is independent of the other components.
Reconstruction guarantees. Note that $x_k$ and $\epsilon_k$ being stochastic processes implies that the quantities in (5), which we denote with $h_k$ and $\xi_k$ for signal $x_k$, are random variables. Hence, at each time $k$, the conditions of Theorem 1 become events, and may or may not hold. We now impose conditions on the variances $\sigma_{k,i}^2$ that guarantee the conditions of Theorem 1 are satisfied and, thus, that $\ell_1$-$\ell_1$ minimization reconstructs $x_k$ perfectly, with high probability. Given a set $S \subseteq \{1, \ldots, n\}$, we use $S^c$ to denote its complement in $\{1, \ldots, n\}$.
Let the side information $w$ be given, and let the modeling noise have distribution (14), where the variance of component $i$ is $\sigma_i^2 = 2/\lambda_i^2$. Define the sets of components with zero and nonzero variance as stated, and assume both are nonempty. Assume the measurement matrix is generated as in Theorem 1 with a number of measurements satisfying (16). Let $\hat{x}$ denote the solution of $\ell_1$-$\ell_1$ minimization (4). Then, (17) holds.
The proof is in Appendix B. By assuming each component of the modeling noise is Laplacian with parameter $\lambda_i$ (independent of the other components), Theorem 3 establishes a lower bound on the number of measurements that guarantees perfect reconstruction with probability as in (17). Note that all the quantities in (16) are deterministic. This contrasts with a direct application of Theorem 1 to the problem, since the right-hand side of (6) is a function of the random variables $s$, $h$, and $\xi$. The assumption in the theorem implies that some components of the noise have zero variance and, hence, are equal to zero with probability $1$. Note that, provided the variances $\sigma_i^2$ are known, all the quantities in (16), and consequently in (17), are known.
The proof of Theorem 3 uses the fact that the sparsity of the signal is bounded with probability $1$. This implies that the bound in (16) is always smaller than the one for basis pursuit in (7) whenever the side information is of sufficient quality.