Visual surveillance is a task that often involves collecting a large amount of data in search of information contained in relatively small segments of video. For example, a surveillance system tasked with intruder detection will often spend most of its time collecting observations of a scene in which no intruders are present. Without any such foreground objects, the corresponding surveillance video is useless: it is only the portions of video that depict these unexpected objects in the environment that are useful for surveillance. However, because it is unknown when such objects will appear, many systems gather the same amount of data regardless of scene content. This static approach to sensing is wasteful in that resources are spent collecting unimportant data. However, it is not immediately clear how to efficiently acquire useful data since the periods of scene activity are unknown in advance. If this information were available a priori, a better scheme would be to collect data only during times when foreground objects are present.
In any attempt to do so, the system must make some sort of real-time decision regarding scene activity. However, such a decision can be made only if real-time data to that effect is available. We shall refer to such data as side information. Broadly, this information can come from two sources: a secondary modality and/or the primary video sensor itself. In this paper, we develop two adaptive sensing schemes, one exploiting side information from each source. Our first strategy employs a single video sensor to continuously make observations that are simultaneously used to infer both the foreground and the scene activity. The second adaptive method we present determines scene activity using observations from a secondary visual sensor. Both methods utilize a compressive sensing (CS) camera as the primary modality. While many such sensors are beginning to emerge, our methods are specifically developed for a fast variant of a spatially multiplexing camera such as the single-pixel camera.
In this paper, we consider the following basic scenario: a CS camera is tasked with observing a region for the purpose of obtaining foreground video. Since the foreground often occupies only a relatively small number of pixels, Cevher et al. have shown that a small number of compressive measurements provided by this camera are sufficient to ensure that the foreground can be accurately inferred. However, the solution provided in that work implicitly relies on an assumption that is pervasive in the CS literature: that an upper bound on the sparsity (number of significant components) of the signal(s) under observation is known. Such an assumption enables the use of a static measurement process for each image in the video sequence. However, foreground video is a dynamic entity: changes in the number and appearance of foreground objects can cause large changes in sparsity with respect to time. Underestimating this quantity will lead to the use of a CS system that provides too few measurements for an accurate reconstruction. Overestimating signal sparsity, on the other hand, will require the collection of more measurements than necessary to achieve such a reconstruction. For example, consider Figure 1. The true foreground's (Figure 1a) reconstruction is poor when too few compressive measurements are collected (Figure 1b), but looks virtually the same whether an optimal or a greater-than-optimal number of measurements is acquired (Figures 1c and 1d, respectively). Therefore, depending on the number of measurements acquired at each time instant, the static CS approach is insufficient at worst and wasteful at best.
We provide in this paper novel, adaptive-rate CS strategies that seek to address this problem. The approaches we present utilize two different forms of side information: cross-validation measurements and low-resolution measurements. In each case, we use the extra information in order to predict the number of foreground pixels (sparsity) in the next frame.
I-A Related Work
Adapting the standard CS framework to a dynamic, time-varying signal is something that has been studied from various perspectives by several researchers.
Wakin et al., Park and Wakin, Sankaranarayanan et al., and Reddy et al. have each proposed video-specific versions of CS that leverage signal dynamics such as temporal correlation and optical flow. For measurement models that provide streaming CS measurements, Sankaranarayanan et al., Asif and Romberg, and Angelosante et al. have proposed adaptive CS decoding procedures that are faster and more accurate than those that do not explicitly model the video dynamics.
Vaswani et al., Cossalter et al., and Stankovic et al. propose modifications to the CS decoding step that leverage extra signal support information in order to provide more accurate reconstructions from a fixed number of measurements. More generally, Scarlett et al. provide generic information-theoretic bounds for any support-adaptive decoding procedure. Malioutov et al. and Boufounos et al. propose decoders with adaptive stopping criteria: sequential signal estimates are made until either a consistency or a cross-validation criterion is met.
Several researchers have also considered adaptive encoding techniques. These techniques primarily focus on finding and using the “best” compressive measurement vectors at each instant of time. Ashok et al. propose an offline procedure to design entire measurement matrices optimized for a specific task. Similarly, Duarte-Carvajalino et al. compute class-specific optimal measurements offline, but select which class to use via an online procedure with a fixed number of measurements. Purely online procedures include those developed by Averbuch et al., Ji et al., Chou et al., and Haupt et al.: the next-best measurement vectors are computed by optimizing criterion functions that seek to minimize quantities such as posterior entropy and expected reconstruction error. Some of these methods use a fixed measurement rate, while others propose stopping criteria similar to several of the adaptive decoding procedures.
Some of the above methods exhibit an adaptive measurement rate in that they stop collecting measurements when certain criteria are met. However, due to the dynamic nature of video signals, it may not be possible to evaluate these criteria (as they often involve CS decoding) and collect a new measurement before the signal has significantly changed. Recent adaptive-rate work by Yuan et al. and Schaeffer et al. sidesteps this problem by using a static spatial measurement rate and considering how to adaptively select the temporal compression rate through batch analysis. In contrast, we propose here techniques that specify a fixed number of spatially-multiplexed measurements to acquire before sensing the signal at a given time instant and modify this quantity between acquisitions without assuming that the signal remains static between them. That is, we consider a system in which the decoding procedure is fixed and we are able to change the encoding procedure, which is fundamentally different from the previously-discussed work on adaptive decoding procedures (e.g., that of Vaswani et al.).
This paper is organized as follows. In Section II, we provide a brief overview of CS. Sections III and IV contain a precise formulation of and context for our rate-adaptive CS algorithms. Our measurement acquisition technique is described in Section V. The proposed adaptive rate CS techniques are discussed in Sections VI and VII, and they are experimentally validated in Section VIII. Finally, we provide a summary and future research directions in Section IX.
II Compressive Sensing
Compressive sensing is a relatively new theory in sensing which asserts that a certain class of discrete signals can be adequately sensed by capturing far fewer measurements than the dimension of the ambient space in which they reside. By “adequately sensed,” it is meant that the signal of interest can be accurately inferred using the measurements acquired by the sensor.
In this paper, we use CS in the context of imaging. Consider a grayscale image , vectorized in column-major order as . A traditional camera uses an array of photodetectors in order to produce measurements of : each detector records a single value that defines the corresponding component of . If we are instead able to gather measurements of a fundamentally different type, CS theory suggests that we may be able to determine from far fewer than of them. Specifically, these compressive measurements record linear combinations of pixel values, i.e., , where is referred to as a measurement matrix and .
CS theory presents three general conditions under which the above claim is valid. First, should be sparse or compressible. In general, a vector is said to be sparse if very few of its components are nonzero; more precisely, vectors having no more than nonzero components are said to be -sparse. A vector is said to be compressible if it is well-approximated by a sparse signal, i.e., it has a small number of components with a large magnitude and many with much smaller magnitudes.
Second, the measurement matrix (encoder) should exhibit the restricted isometry property (RIP) of a certain order and constant. Specifically, exhibits the RIP of order with constant if the following inequality holds for all -sparse :
While we will discuss proposed construction methods for a that exhibits the RIP for specified and in Section V, they generally involve selecting such that it exceeds a lower bound that grows with increasing and decreasing .
Finally, an appropriate decoding procedure, , should be used. While many successful decoding schemes have been discussed in the literature, we shall focus here on one in particular:
where the norm is given explicitly by .
With these three conditions in mind, CS theory provides us with the following result: for an -sparse measured with a that exhibits the RIP of order with , will exactly recover . If is compressible, a similar result that bounds the reconstruction error is available. Thus, by modifying the sensor and decoder to implement and , respectively, can be adequately sensed using only measurements.
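To make this decode step concrete, the minimization above can be recast as a linear program by splitting the signal into positive and negative parts. The following small-scale sketch is our own illustration (arbitrary dimensions, random data, and SciPy's general-purpose LP solver), not the paper's implementation:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Recover a sparse x from y = Phi @ x by l1 minimization,
    recast as an LP with x = u - v and u, v >= 0."""
    M, N = Phi.shape
    c = np.ones(2 * N)             # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])  # equality constraint: Phi @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

# Sense a 3-sparse length-100 signal with 40 random Gaussian measurements.
rng = np.random.default_rng(0)
N, M, k = 100, 40, 3
x = np.zeros(N)
x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # entries with variance 1/M
x_hat = basis_pursuit(Phi, Phi @ x)
print(np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```

In this regime (sparsity well below the measurement budget), recovery is exact up to solver tolerance.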
Sensors based on the above theory are still just beginning to emerge. One of the most notable is the single-pixel camera, where the measurements specified by each row of the measurement matrix are sequentially computed in the optical domain via a digital micromirror device and a single photodiode. Throughout the remainder of this paper, we shall assume that such a device is the primary sensor.
III Problem Statement
We assume that we possess a CS camera that is capable of acquiring a variable number of compressive measurements at discrete instants of time. We denote the measurement matrix at time by , and we construct it via a process that depends only on our choice for (see Section V). The value used for will be determined by the adaptive sensing strategy prior to time . The images we observe will be of size , and will denote the specific image at time . Vectorizing using column-major order as allows us to write the compressive measurement process at time as .
We will present two adaptive sensing strategies that will each exploit a different type of side information. The first strategy uses a small set of cross-validation measurements, obtained from a static linear measurement operator referred to as a cross-validation matrix. The second strategy relies on a set of low-resolution measurements that we obtain via a secondary sensor observing the same scene at lower resolution. Such multi-camera systems are not uncommon in the surveillance literature.
Having established the above notation, the problem we address in this paper is that of how to use the observations , , and to select a minimal value for that will ensure gathers enough information to ensure accurate reconstruction of the foreground (dynamic) component of the high-resolution .
IV Compressive Sensing for Background Subtraction
We present our work in the context of the problem of background subtraction for video sequences. Broadly, background subtraction is the process of decomposing an image into foreground and background components, where the foreground usually represents the objects of interest in the environment under observation. For our purposes, we shall adopt the following model for images :
where is an unknown but deterministic static component of each image in the video sequence and
is a random variable. At each time, we estimate the locations of foreground pixels by computing the set of indices , for some pre-defined threshold . We further assume that the components of that correspond to are bounded in magnitude, i.e., for all .
Throughout this work, we shall assume that the components of are distributed as follows:
where each component is assumed to be independent of the others. We have approximated the intensity distribution of those pixels not in as a zero-mean Gaussian under the assumption that is much smaller than .
Following the work of Cevher et al. , we seek to perform background subtraction in the compressive domain. Often, it is the case that the foreground occupies only a very small portion of the image plane, i.e., . Given the foreground model (4), this implies that is compressible in the spatial domain. Therefore, if is known, we can use it, (3), and compressive image measurements to generate the following estimate of :
where and .
As we will discuss in Section V, we construct by taking a subset of rows from a fixed matrix, , and rescaling the result. We can therefore calculate from by similarly dropping components and rescaling. Noting (4), a maximum-likelihood estimate of can be found by computing the mean of compressive measurements of a background-only video sequence, i.e.,
where and for all in the summation. These measurements can be obtained in advance by using the full sensing matrix, , to observe the scene when it is known that there is no foreground component.
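The offline/online split described above might be sketched as follows. Everything here is hypothetical (scene, dimensions, noise levels), and the row-subset-and-rescale step anticipates the construction described in Section V:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 64, 32                          # pixels, maximum measurement budget
Phi_full = rng.standard_normal((M, N)) / np.sqrt(M)

# Offline: average compressive measurements of background-only frames to
# estimate the measurements of the static scene component.
background = rng.uniform(0, 255, N)    # hypothetical static scene
frames = [background + rng.normal(0, 1, N) for _ in range(50)]
y_bg_full = np.mean([Phi_full @ f for f in frames], axis=0)

# Online: with only m rows in use, drop the corresponding components of
# the background estimate and rescale consistently with the sensing matrix.
m = 20
scale = np.sqrt(M / m)
Phi_m = Phi_full[:m] * scale
y_bg_m = y_bg_full[:m] * scale

# Foreground measurements: subtract the background in the compressive domain.
frame = background.copy()
frame[:5] += 80.0                      # hypothetical foreground object
y_fg = Phi_m @ frame - y_bg_m
print(np.linalg.norm(y_fg))
```

The resulting measurements of the (sparse) foreground can then be fed to the decoder of Section II.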
V Sensing Matrix Design
In this section, we discuss our method for constructing adaptive-rate measurement matrices for the purpose of recovering sparse signals from a minimal number of measurements.
V-A Theoretical Guarantees
In Section II, we presented a theoretical result from CS literature that states that will exactly recover an -sparse from if exhibits the RIP of order with
. One of the most prevalent methods discussed in the literature for constructing such matrices involves drawing each matrix entry from a Gaussian distribution with parameters that depend on the number of rows that the matrix possesses. This technique defines the entries
as independent realizations of a Gaussian random variable with zero mean and variance, i.e.,
Such a matrix will exhibit the desired RIP with probability exceeding
The scenarios discussed in this paper require us to find the minimum that will ensure the constructed matrix can successfully recover -sparse signals. Therefore, we now consider the case where , , and are fixed. If we impose a lower bound, , on the probability of success given by (8), rearranging terms reveals that the theory requires
For practical measurement matrices, we are only interested in the case where (i.e., matrices for which compression actually occurs). Combining this requirement with (9) yields the following lower bound for :
For -sparse signals, the reconstruction guarantee that accompanies requires that exhibits the RIP of order with . Using only the second term of the lower bound in (10) and noting that the first term is always positive, we see that requiring such a means that can be no greater than .
In our system, represents the percentage of foreground pixels in the image, and it is unreasonable to expect that this quantity will never exceed . Therefore, if we wish to use CS for compression (i.e., with a measurement matrix that has fewer rows than columns), we must design and use matrices without the guarantee provided by the above result. However, that result is merely sufficient: in the next part, we will experimentally show that similarly-constructed matrices with far fewer rows are indeed still able to provide measurements that enable accurate sparse signal reconstruction.
V-B Practical Sensing Matrix Design Based on Phase Diagrams
Given a candidate sensing matrix construction technique, Donoho and Tanner  discuss an associated phase diagram: a numerical representation of how useful the generated matrices are for CS. Specifically, the ratios (signal undersampling) and (signal sparsity) are considered. A phase diagram is a function defined over the phase space . We discretize this space and perform multiple sense-and-reconstruct experiments at each grid point in order to approximate the phase diagram there: the value of provides the information necessary for matrix construction, and provides the information necessary to generate random sparse signals. We make the approximation using the percentage of trials that result in successful signal recovery, which we define as a normalized reconstruction error of or less.
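A single grid point of such a diagram can be approximated as below. This is our own small-scale sketch: for speed it substitutes orthogonal matching pursuit for the decoder (2), and the dimensions, trial count, and tolerance are arbitrary:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal matching pursuit: greedy k-sparse recovery, used here
    only as a fast stand-in for the l1 decoder when tabulating."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

def success_rate(N, m, k, trials=50, tol=1e-2, seed=0):
    """Fraction of random sense-and-reconstruct trials at grid point
    (delta, rho) = (m/N, k/m) with normalized error below tol."""
    rng = np.random.default_rng(seed)
    wins = 0
    for _ in range(trials):
        x = np.zeros(N)
        x[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
        Phi = rng.standard_normal((m, N)) / np.sqrt(m)
        x_hat = omp(Phi, Phi @ x, k)
        wins += np.linalg.norm(x_hat - x) <= tol * np.linalg.norm(x)
    return wins / trials

print(success_rate(N=128, m=64, k=5))   # easy regime: rate near 1
```

Repeating this over a grid of (delta, rho) values yields the (approximate) phase diagram.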
Even though we cannot use the theoretical guarantee discussed earlier in this section, the first matrix construction technique we use is based on randomly-generated matrices that rely on independent realizations of a Gaussian random variable. Specifically, we use the following construction technique: we generate by drawing each entry according to (7). Then, for a given value of , we form the corresponding matrix via
where denotes the submatrix of corresponding to the first rows. The scaling factor ensures that the relationship between the variance and the number of rows defined in (7) is preserved.
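A minimal sketch of this subselect-and-rescale construction (dimensions arbitrary), verifying that the entry variance matches the reduced row count:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 256, 128
# Full matrix: i.i.d. zero-mean Gaussian entries with variance 1/M.
Phi_full = rng.normal(0.0, 1.0 / np.sqrt(M), size=(M, N))

def sensing_matrix(m):
    """Form the m-row matrix from the first m rows of the full matrix,
    rescaled so each entry again has variance 1/m."""
    return Phi_full[:m] * np.sqrt(M / m)

Phi_m = sensing_matrix(32)
print(Phi_m.shape, Phi_m.var())   # empirical variance close to 1/32
```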
We also analyze a second matrix construction technique based on the discrete Fourier transform (DFT). Specifically, we generate the full matrix by randomly permuting the rows of the DFT matrix and form the adaptive-rate matrix according to (11).
In this paper, we will make predictions regarding the sparsity of the signals we are about to observe. Given a prediction , we will seek the minimum such that (11) generates a sensing matrix capable of providing enough measurements to ensure accurate reconstruction of -sparse signals. In order to determine the mapping from to , we use the associated phase diagram. We construct this diagram (see Figure 2) during a one-time, offline analysis. Then, given and a minimum probability of reconstruction success , we use the phase diagram as a lookup table to find the smallest value of that yields at least a success rate for -sparse signals.
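The lookup step might be sketched as follows; the tabulated success rates below are a stand-in for the offline Monte Carlo data, and the grids, signal size, and threshold are all hypothetical:

```python
import numpy as np

# Hypothetical precomputed phase diagram: success_table[i, j] is the
# empirical recovery rate at undersampling delta[i] = m/N and
# sparsity ratio rho[j] = k/m from the offline analysis.
N = 1024
deltas = np.linspace(0.05, 1.0, 20)
rhos = np.linspace(0.02, 0.5, 25)
# Stand-in for measured data: success below a made-up transition curve.
success_table = (rhos[None, :] < 0.25 + 0.4 * deltas[:, None]).astype(float)

def min_measurements(k_hat, p_min=0.95):
    """Smallest m whose row of the diagram recovers k_hat-sparse signals
    with empirical probability at least p_min (lookup-table style)."""
    for i, delta in enumerate(deltas):
        m = int(np.ceil(delta * N))
        j = np.searchsorted(rhos, k_hat / m)   # nearest tabulated rho >= k/m
        if j < len(rhos) and success_table[i, j] >= p_min:
            return m
    return N    # fall back to full sampling

print(min_measurements(10))
```

Because the diagram is built once offline, this online lookup costs essentially nothing per frame.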
VI Method I: Cross Validation
In this section, we describe a rate-adaptive CS method that utilizes a set of linear cross-validation measurements . An earlier version of this work was presented by Warnell et al. .
VI-A Compressive Sensing with Cross Validation
Let be a set of compressive measurements of a sparse signal obtained using , i.e., . In this section, we will use to denote the -sparse point estimate of this signal obtained using , where is defined as in (2) and denotes a truncation operation that sets all but the largest-magnitude components of the vector-valued argument to zero.
Ward  bounds the error of the above estimate using a cross-validation technique that is based on the Johnson-Lindenstrauss lemma . At the same time is collected, we use a static cross-validation matrix to collect cross-validation measurements . We construct
by drawing each of its entries from an i.i.d. Bernoulli distribution with zero mean and variance. Such a construction leads to the following statement: for given accuracy and confidence parameters and (respectively), rows suffice to ensure that
with probability exceeding .
Let denote the optimal -sparse approximation error measured with respect to the norm, i.e.,
where the -norm is given by . Using the fact that is -sparse, the upper bound in (12) can be extended to as follows:
That is, the observable CV error can be used to upper bound the unobservable optimal -sparse approximation error.
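The following sketch illustrates the role of the cross-validation measurements: the observable error computed from them concentrates around the unobservable estimation error. The signal, "estimate," and dimensions below are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
N, m_cv = 512, 60
x = np.zeros(N)
x[rng.choice(N, 8, replace=False)] = rng.uniform(1, 3, 8)

# Cross-validation matrix: i.i.d. symmetric Bernoulli (+/- 1/sqrt(m_cv))
# entries, so E[||A v||^2] = ||v||^2 for any fixed v.
A = rng.choice([-1.0, 1.0], size=(m_cv, N)) / np.sqrt(m_cv)
w = A @ x                                  # CV measurements of the true signal

x_hat = x.copy()
x_hat[:4] += 0.5                           # stand-in for an imperfect CS estimate
cv_error = np.linalg.norm(A @ x_hat - w)   # observable proxy
true_error = np.linalg.norm(x_hat - x)     # unobservable in practice
print(cv_error, true_error)                # close with high probability
```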
VI-B Adaptive-Rate Compressive Sensing via Cross Validation
Let denote the true value of the foreground sparsity at time , i.e., . The method we present here relies on an estimate of this quantity, which we denote as . Before sensing begins at time , we assume to be -sparse, and select the corresponding minimal (and thus ) according to the phase diagram technique described in Section V. We then use and to collect and . Using the technique described in Section IV, we can find and form the foreground estimate . In a similar fashion, we can also find by subtracting a precalculated set of cross-validation measurements of the static signal component, , from . Finally, we select based on the result of a multiple hypothesis test that uses and .
We formulate the multiple hypothesis test by first assuming that we are able to observe
. We define the null hypothesis as the scenario under which exceeds . If this hypothesis is true, then (i.e., the optimal -sparse approximation to ) captures all foreground pixels and background pixels while neglecting the remaining background pixels. Using (4), it can be shown that is a random variable with mean, , and variance, , given by
We also define a set of hypotheses that are possible when is not true. Let describe the scenario under which . Under , cannot capture all foreground pixels: it neglects the smallest of them and the background pixels. Using (4), it can be shown that the mean, , and variance, , of under these hypotheses are given by
The hypothesis test can be succinctly written as
for . Let
denote the probability density function under the assumption that the corresponding hypothesis is true. We will evaluate explicit assumptions regarding the form of this density in Section VIII. The optimal decision rule for (17) under the minimum probability of error criterion with an equal prior for each hypothesis is given by
Assuming that the sparsity of is a slowly-varying quantity, we choose to set equal to what we believe to be. If , it is our belief that , and we expect the error in to be very small. Therefore, we find the set of foreground entries for this signal, , and set . For any other value of , we set .
Unfortunately, it is impossible to directly observe . However, we can upper bound this quantity using the cross-validation measurements as specified in (14). Therefore, we propose the following modification to (18):
Observing that and are increasing functions of , it is apparent that (19) will potentially yield a value of greater than that which would have been selected by (18). This will result in a higher-than-necessary measurement rate at time , but it will not negatively impact the quality of .
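Under the Gaussian approximation adopted later in Section VIII, the resulting selection reduces to a maximum-likelihood choice among the hypotheses' predicted error distributions. A toy sketch, in which all moments and candidate sparsity values are hypothetical:

```python
import numpy as np

def select_k(cv_error_bound, mus, sigmas, k_hats):
    """Pick the hypothesis whose Gaussian density assigns the highest
    likelihood to the observed CV error bound, then return the
    corresponding next-frame sparsity estimate."""
    log_liks = -0.5 * ((cv_error_bound - mus) / sigmas) ** 2 - np.log(sigmas)
    return k_hats[int(np.argmax(log_liks))]

# Hypothetical moments: the null hypothesis ("enough measurements")
# predicts a small error; the alternatives (sparsity underestimated by
# growing amounts) predict progressively larger errors.
mus = np.array([0.5, 2.0, 4.0, 8.0])
sigmas = np.array([0.2, 0.5, 1.0, 2.0])
k_hats = np.array([20, 25, 30, 40])       # candidate sparsity estimates

print(select_k(0.6, mus, sigmas, k_hats))  # near the null mean -> keep 20
print(select_k(5.0, mus, sigmas, k_hats))  # large error -> larger estimate
```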
We term the strategy we have outlined above adaptive-rate compressive sensing via cross validation (ARCS-CV) and summarize the procedure in Algorithm 1.
VII Method II: Low-Resolution Tracking
In this section, we propose an adaptive method that utilizes a much richer form of side information than the random projections of the previous section: low-resolution images, , that have been captured using a traditional (i.e., non-compressive) camera.
VII-A Low-Resolution Measurements
We assume that the low- and high-resolution images, and , respectively, are related by a simple downsampling operation. Let denote the coordinates of a pixel in the image plane of the low-resolution camera. If we use to denote the corresponding coordinate in the image plane of the compressive camera, the effect of the downsampling operation on coordinates is given by
where we assume the downsampling factor, , to be an integer. Using (20), each pixel in maps to the center of a unique block of pixels in . The effect of the downsampling operation on image intensity is given by averaging the intensities within this block, i.e.,
where the coordinates of the pixels in the block are given explicitly as
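A minimal implementation of this block-averaging downsampler (our own sketch, assuming image dimensions divisible by the downsampling factor):

```python
import numpy as np

def downsample(X, d):
    """Block-average downsampling: each low-resolution pixel is the
    mean of a unique d x d block of the high-resolution image."""
    H, W = X.shape
    assert H % d == 0 and W % d == 0, "dimensions must be divisible by d"
    return X.reshape(H // d, d, W // d, d).mean(axis=(1, 3))

X = np.arange(16, dtype=float).reshape(4, 4)
print(downsample(X, 2))
# each 2x2 block averaged: [[2.5, 4.5], [10.5, 12.5]]
```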
VII-B Object Tracking and Foreground Sparsity
Using the low-resolution images, we assume that we are able to track the foreground objects. Specifically, we assume that at each time index, we are able to estimate a zero-skew affine warp parameter that maps coordinates in an object template image, , to their corresponding location in . Using to denote a pixel coordinate in , specifies the corresponding coordinate in via
We further assume that the time-evolution of is governed by a known Markov dynamical system, i.e.,
for known and i.i.d. system noise .
Let be the set of corner coordinates of in any order that traces its outline. Then, given , we can calculate the position of the tracked object’s bounding box in using (21) and (20). We shall assume that the area of this bounding box specifies the number of foreground components in , i.e., . If this area is not integer-valued, we simply round up. Using the well-known formula for the area of a polygon from its corner coordinates, can be written as , where
and . Above, represents the ceiling function.
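The area computation is the standard shoelace formula followed by a ceiling; a sketch with a made-up bounding box:

```python
import math
import numpy as np

def bbox_sparsity(corners):
    """Sparsity estimate as the rounded-up area of the warped bounding
    box, via the shoelace formula on corner coordinates listed in an
    order that traces the outline."""
    x, y = np.asarray(corners, dtype=float).T
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    return math.ceil(area)

# A 3 x 2.5 box: area 7.5, rounded up to a sparsity estimate of 8.
print(bbox_sparsity([(0, 0), (3, 0), (3, 2.5), (0, 2.5)]))
```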
From (23), it is clear that the distribution of the random variable is a function of the distribution of . For the remainder of this section, we will use to denote the corresponding probability mass function.
Figure 3 illustrates the relationship between a typical high- and low-resolution image pair and shows an example bounding box found by a tracker using the low-resolution image.
VII-C Sparsity Estimation
We now turn our attention to selecting a value to use for , , on the basis of the previous image’s track, . Once a value has been selected, we use the method presented in Section V to select a minimal and the corresponding . We then use to collect compressive measurements of and calculate . Using this procedure, the -generated estimate will obey
One criterion we will consider when selecting is the expected value of the reconstruction error, i.e., we would like to minimize . However, since the nonlinearity of makes determining the statistics of that quantity very difficult, we instead look to minimize the right-hand side of (24). It is easy to see that this quantity can be minimized by selecting as high as possible, which would provide no compression. Therefore, inspired by results from the model-order selection literature   , we penalize larger values of and instead propose to select by solving
where is an importance factor that specifies the tradeoff between low reconstruction error and a small sparsity estimate.
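The selection in (25) can be sketched as a discrete search over candidate sparsity estimates. Since the error bound is not reproduced here, the sketch below substitutes a simple stand-in surrogate (the expected number of missed foreground components) and a hypothetical predicted pmf:

```python
import numpy as np

def select_sparsity(pmf, lam, err=lambda s, k: max(s - k, 0)):
    """Choose the estimate minimizing an expected reconstruction-error
    surrogate plus a penalty lam * k that discourages large (costly)
    estimates. pmf[s] is the predicted probability that the true
    sparsity is s; err is a stand-in for the error bound."""
    sparsities = np.arange(len(pmf))
    costs = [sum(pmf[s] * err(s, k) for s in sparsities) + lam * k
             for k in sparsities]
    return int(np.argmin(costs))

# Predicted sparsity concentrated around 6; a small penalty favors
# covering most of the mass, a larger one pulls the estimate down.
pmf = np.zeros(12)
pmf[[5, 6, 7]] = [0.25, 0.5, 0.25]
print(select_sparsity(pmf, lam=0.05), select_sparsity(pmf, lam=0.6))
```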
We term the strategy that we have outlined above adaptive-rate compressive sensing via low-resolution tracking (ARCS-LRT) and summarize the procedure in Algorithm 2.
VIII Experimental Results
We tested the proposed algorithms on real video sequences captured using traditional cameras. The compressive, cross-validation, and low-resolution measurements were simulated via software. The SPGL1   software package was used to implement the decoding procedure (2). Three video sequences were used: convoy2, marker_cam, and PETS2009_S2L1. convoy2 is a video of vehicles driving past a stationary camera. The vehicles comprise the foreground, and the foreground sparsity varies as a result of these vehicles sequentially entering and exiting the camera’s field of view. marker_cam is a video sequence we captured using a surveillance camera mounted to the side of our building at the University of Maryland, College Park. The sequence begins with a single pedestrian walking in a parking lot, with a second pedestrian joining him halfway through the sequence. The two pedestrians comprise the foreground, and the foreground sparsity varies due to the entrance of the second pedestrian and the variation in each pedestrian’s appearance as he moves relative to the camera. The PETS2009_S2L1 video sequence is a segment taken from the PETS 2009 benchmark data . This sequence consists of four pedestrians entering and exiting the camera’s field of view. Similar to marker_cam, the foreground sparsity changes as a function of the number and appearance of pedestrians. Example images from each dataset are shown in Figure 4.
VIII-A Practical Considerations
Implementation of the ARCS methods presented in Sections VI and VII requires certain practical choices. In this part, we describe the choices we made that generated the results presented later in this section. Specific choices for parameter values for each video sequence are given in Table I.
VIII-A1 Foreground Model
The foreground model specified in (4) is parameterized by and . The value that should be used for will depend on the quality of the estimate of (or, more accurately, in our system): the better (3) describes images in the video sequence, the smaller can be. Since represents the foreground-background intensity threshold, its value depends on the value selected for : should be set high enough to ensure that is sufficiently low, but low enough to ensure that it does not neglect intensities belonging to the foreground.
While we are able to calculate the first- and second-order moments under the various hypotheses, the maximum-likelihood decision rule (19) requires the entire probability density functions for each hypothesis. In our implementation, we approximate these densities by a normal distribution with mean and covariance specified by (15) and (16) under and , respectively. That is, we make the approximation . As a consequence of this approximation, we observed that (19) sometimes yielded a nonzero for sufficiently small cross-validation error upper bounds. However, when this upper bound is low, it is clear that we should select . Therefore, we explicitly impose a selection of for cross-validation error upper bounds that are less than by using
The ARCS-LRT method of Section VII requires low-resolution object tracks in order to reason about the sparsity of the high-resolution foreground. In order to focus on the performance of the adaptive algorithm, we first determined these tracks manually, i.e., by hand-marking bounding boxes around each low-resolution foreground image. We only did this for images in which the object was fully visible. We shall also consider automatically-obtained tracks later in this section.
We used to define the system dynamics in (22) with i.i.d. for each , where the value of should vary with the expected type of object motion.
Given this selection for , represents our belief about the next track given the current one. Due to the complexity of in (23), it is difficult to obtain an exact form for . Therefore, we used the unscented transformation  to obtain the first- and second-order moments, and , respectively. We then approximated using the pdf for a discrete approximation to the normal distribution with the computed mean and covariance.
The sparsity estimator (26) requires values for both and . Since our phase diagram lookup table returns an for which recovers -sparse signals, we selected . For each video sequence, we then selected by evaluating many candidate values and choosing one that provided a good balance between low reconstruction error and a small sparsity estimate.
VIII-B Comparative Results
In order to provide some context in which to interpret the results from our ARCS methods, we present them alongside those from the best-case sensing strategy: oracle CS. Oracle CS uses the true value of as its sparsity estimate, which is impossible to obtain in practice. We compare the average measurement rates and foreground reconstruction errors for the three methods (oracle, ARCS-CV, and ARCS-LRT) in Table II, and show the more detailed dynamic behavior in Figure 5. Note that the measurement values reported for the ARCS algorithms include the necessary overhead for the side information (i.e., the cross-validation and low-resolution measurements).
Table II. Average number of measurements and average reconstruction error for each method.
We first observe that the ARCS-LRT algorithm uses a significantly larger measurement rate than any of the others. This is due to the necessary overhead for the low-resolution side information. In our experiments, we used , i.e., is at least of . A smaller could be selected at the risk of poorer low-resolution tracking. The ARCS-CV algorithm performs much better in terms of measurement rate since the side-information overhead is relatively small (for all datasets, is less than of ).
It can also be seen that the ARCS-LRT sparsity estimate lags behind the true foreground sparsity for those images in which an object is entering or exiting the camera's field of view and is not yet fully visible. This phenomenon is especially visible in the third column (convoy2) of Figure 5. It arises because we manually imposed the condition that the object cannot be tracked unless it is fully visible, which leads to the large spikes in foreground reconstruction error. Once the object becomes fully visible, however, the low-resolution tracks provide the algorithm with enough information to monitor the high-resolution signal sparsity, and the effect disappears.
VIII-C Steady-State Behavior
We analyzed the behavior of our ARCS methods when the signal under observation is static (i.e., for all ). To do so, we created a synthetic data sequence by repeating a single image in the convoy2 data set for which . Figure 6 shows the behavior of each algorithm when the initial sparsity estimate, , is wrong. For each method, we ran two experiments. For the first one, we initialized the sparsity estimate using a value that was too low (). For the second one, we initialized with a value that was too high (). Note that both methods are able to successfully adapt to the true value of , and the ARCS-LRT method adapts very quickly (requiring only a single image) due to the immediate availability of the low-resolution track.
VIII-D ARCS-LRT and Automatic Tracking
We also investigated the effect of using low-resolution tracks obtained via an automatic method. To do so, we implemented a simple blob tracker in MATLAB for the convoy2 sequence and used the resulting tracks in the ARCS-LRT framework. A comparison of algorithm performance between using automatic tracks and our manually-marked tracks is shown in Figure 7. Given the negligible effect of the blob tracker on the behavior of ARCS-LRT, we would not expect more sophisticated automatic tracking techniques to negatively affect performance.
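The crude logic of such a tracker can be sketched as follows (an illustrative stand-in in Python rather than our MATLAB implementation; for simplicity it returns a bounding box over all thresholded foreground pixels instead of labeling connected components):

```python
import numpy as np

def track_blob(frame, background, thresh=25):
    """Return a bounding box (top, left, bottom, right) around the
    thresholded foreground of a grayscale frame, or None when no
    pixel differs from the background by more than thresh."""
    # Background subtraction followed by thresholding.
    fg = np.abs(frame.astype(int) - background.astype(int)) > thresh
    if not fg.any():
        return None
    # Bounding box of all foreground pixels.
    rows = np.where(fg.any(axis=1))[0]
    cols = np.where(fg.any(axis=0))[0]
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])
```

The resulting box plays the role of the hand-marked track in the ARCS-LRT pipeline; a connected-components step would additionally separate multiple objects and suppress isolated noise pixels.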
IX Summary and Future Work
We have described two techniques for using side information to adjust the measurement rate of a dynamic compressive sensing system. These techniques were developed in the specific context of using this system for video background subtraction. The first technique involves collecting side information in the form of a small number of extra cross-validation measurements and using an error bound to infer underlying signal sparsity. The second method uses side information from a secondary, low-resolution, traditional camera in order to infer the sparsity of the high-resolution images. In either case, we used a pre-computed phase diagram as a lookup table to map sparsity estimates to minimal compressive measurement rates. We validated these techniques on real video sequences using practical approximations for theoretical quantities.
This work provides a framework that allows for numerous extensions:
It may be possible to achieve lower measurement rates by modifying the decoder. For example, using techniques like those developed by Vaswani et al. , the phase diagrams we use could be updated accordingly.
In addition to modifying the number of rows, the content of the measurement matrix could also be adjusted between acquisitions. Such a strategy would be theoretically similar to the previously discussed work of Duarte-Carvajalino et al. and others    , but with a measurement budget that is fixed at each time instant yet changes from acquisition to acquisition.
The assumption that the side sensor in ARCS-LRT is co-located with the compressive camera could be removed. This might involve a more complicated mapping function (23) that also incorporates knowledge of the geometrical relationship between the two sensors.
The authors would like to thank Vishal Patel, Rachel Ward, Mark Davenport, and Aswin Sankaranarayanan for their correspondence during the development of this paper.
-  E. Candès, “Compressive sampling,” in Proceedings of the International Congress of Mathematics, 2006.
-  D. Donoho, “Compressed sensing,” IEEE Transactions on Information Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
-  E. Candes and T. Tao, “Near-optimal signal recovery from random projections: universal encoding strategies?” IEEE Transactions on Information Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
-  E. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.
-  R. G. Baraniuk, “Compressive sensing [lecture notes],” IEEE Signal Processing Magazine, vol. 24, no. 4, pp. 118–121, Jul. 2007.
-  R. M. Willett, R. F. Marcia, and J. M. Nichols, “Compressed sensing for practical optical imaging systems: a tutorial,” Optical Engineering, vol. 50, no. 7, 2011.
-  M. Duarte, M. Davenport, D. Takhar, J. Laska, K. Kelly, and R. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 83–91, Mar. 2008.
-  J. Romberg, “Imaging via compressive sampling,” IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 14–20, Mar. 2008.
-  V. Cevher, A. Sankaranarayanan, M. Duarte, D. Reddy, R. Baraniuk, and R. Chellappa, “Compressive sensing for background subtraction,” in Proceedings of the European Conference on Computer Vision, 2008.
-  M. Wakin, J. Laska, M. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. Kelly, and R. Baraniuk, “Compressive imaging for video representation and coding,” in Proceedings of the Picture Coding Symposium, 2006.
-  J. Park and M. Wakin, “A multiscale framework for compressive sensing of video,” in Picture Coding Symposium, 2009.
-  A. Sankaranarayanan, C. Studer, and R. Baraniuk, “CS-MUVI: video compressive sensing for spatial-multiplexing cameras,” in Proceedings of the International Conference on Computational Photography, 2012.
-  D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: programmable pixel compressive camera for high speed imaging,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
-  A. Sankaranarayanan, P. Turaga, R. Chellappa, and R. Baraniuk, “Compressive acquisition of linear dynamical systems,” SIAM Journal on Imaging Sciences, vol. 6, no. 4, pp. 2109–2133, 2013.
-  M. Asif and J. Romberg, “Sparse recovery of streaming signals using l1 homotopy,” arXiv, 2013.
-  D. Angelosante, J. Bazerque, and G. Giannakis, “Online adaptive estimation of sparse signals: where RLS meets the l1 norm,” IEEE Transactions on Signal Processing, vol. 58, no. 7, pp. 3436–3447, 2010.
-  N. Vaswani, “Kalman filtered compressed sensing,” in Proceedings of the IEEE International Conference on Image Processing, 2008.
-  N. Vaswani and W. Lu, “Modified-CS: modifying compressive sensing for problems with partially known support,” IEEE Transactions on Signal Processing, vol. 58, no. 9, pp. 4595–4607, Sep. 2010.
-  N. Vaswani, “LS-CS-residual (LS-CS): compressive sensing on least squares residual,” IEEE Transactions on Signal Processing, vol. 58, no. 8, pp. 4108–4120, Aug. 2010.
-  M. Cossalter, G. Valenzise, M. Tagliasacchi, and S. Tubaro, “Joint compressive video coding and analysis,” IEEE Transactions on Multimedia, vol. 12, no. 3, pp. 168–183, Apr. 2010.
-  V. Stankovic, L. Stankovic, and S. Cheng, “Compressive image sampling with side information,” in Proceedings of the IEEE International Conference on Image Processing, 2009.
-  ——, “Sparse signal recovery with side information,” in Proceedings of the European Signal Processing Conference, 2009.
-  J. Scarlett, J. Evans, and S. Dey, “Compressed sensing with prior information: information-theoretic limits and practical decoders,” IEEE Transactions on Signal Processing, vol. 61, no. 2, pp. 427–439, 2013.
-  D. Malioutov, S. Sanghavi, and A. Willsky, “Sequential compressed sensing,” IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 435–444, Apr. 2010.
-  P. Boufounos, M. Duarte, and R. Baraniuk, “Sparse signal reconstruction from noisy compressive measurements using cross validation,” in Proceedings of the IEEE Workshop on Statistical Signal Processing, 2007.
-  A. Ashok, P. Baheti, and M. Neifeld, “Compressive imaging system design using task-specific information,” Applied Optics, vol. 47, no. 25, pp. 4457–4471, Sep. 2008.
-  J. Duarte-Carvajalino, G. Yu, L. Carin, and G. Sapiro, “Task-driven adaptive statistical compressive sensing of Gaussian mixture models,” IEEE Transactions on Signal Processing, vol. 61, no. 3, pp. 585–600, 2013.
-  A. Averbuch, S. Dekel, and S. Deutsch, “Adaptive compressed image sensing using dictionaries,” SIAM Journal on Imaging Sciences, vol. 5, no. 1, pp. 57–89, Jan. 2012.
-  S. Ji, Y. Xue, and L. Carin, “Bayesian compressive sensing,” IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, Jun. 2008.
-  C. Chou, R. Rana, and W. Hu, “Energy efficient information collection in wireless sensor networks using adaptive compressive sensing,” in Proceedings of the IEEE Conference on Local Computer Networks, 2009.
-  J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, “Sequentially designed compressed sensing,” in Proceedings of the IEEE Statistical Signal Processing Workshop, 2012.
-  X. Yuan, J. Yang, P. Llull, X. Liao, G. Sapiro, D. Brady, and L. Carin, “Adaptive temporal compressive sensing for video,” arXiv, 2013.
-  H. Schaeffer, Y. Yang, and S. Osher, “Real-time adaptive video compressive sensing,” UCLA CAM, Tech. Rep., 2013.
-  R. Baraniuk, M. Davenport, M. Duarte, and C. Hegde, An introduction to compressive sensing. Connexions, 2011. [Online]. Available: http://cnx.org/content/col11133/1.5/
-  X. Clady, F. Collange, F. Jurie, and P. Martinet, “Object tracking with a pan-tilt-zoom camera: application to car driving assistance,” in Proceedings of the IEEE International Conference on Robotics and Automation, 2001.
-  A. Senior, A. Hampapur, and M. Lu, “Acquiring multi-scale images by pan-tilt-zoom control and automatic multi-camera calibration,” in Proceedings of the IEEE Workshop on Applications of Computer Vision, 2005.
-  R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, Jan. 2008.
-  D. Donoho and J. Tanner, “Precise undersampling theorems,” Proceedings of the IEEE, vol. 98, no. 6, pp. 913–924, Jun. 2010.
-  G. Warnell, D. Reddy, and R. Chellappa, “Adaptive rate compressive sensing for background subtraction,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012.
-  R. Ward, “Compressed sensing with cross validation,” IEEE Transactions on Information Theory, vol. 55, no. 12, pp. 5773–5782, 2009.
-  W. Johnson and J. Lindenstrauss, “Extensions of Lipschitz maps into a Hilbert space,” Contemporary Mathematics, vol. 26, pp. 189–206, 1984.
-  R. Kashyap, “A Bayesian comparison of different classes of dynamic models using empirical data,” IEEE Transactions on Automatic Control, vol. 22, no. 5, pp. 715–727, 1977.
-  G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
-  J. Rissanen, “Modeling by shortest data description,” Automatica, vol. 14, no. 5, pp. 465–471, 1978.
-  E. van den Berg and M. P. Friedlander, “Probing the Pareto frontier for basis pursuit solutions,” SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
-  ——, “SPGL1: A solver for large-scale sparse reconstruction,” June 2007, http://www.cs.ubc.ca/labs/scl/spgl1.
-  “PETS 2009 benchmark data,” http://www.cvg.rdg.ac.uk/PETS2009/a.html.
-  S. Julier, J. Uhlmann, and H. Durrant-Whyte, “A new approach for filtering nonlinear systems,” in Proceedings of the American Control Conference, 1995.
-  MATLAB, version R2013a. Natick, Massachusetts: The MathWorks Inc., 2013.