Compressive Acquisition of Dynamic Scenes

Compressive sensing (CS) is a new approach for the acquisition and recovery of sparse signals and images that enables sampling rates significantly below the classical Nyquist rate. Despite significant progress in the theory and methods of CS, little headway has been made in compressive video acquisition and recovery. Video CS is complicated by the ephemeral nature of dynamic events, which makes direct extensions of standard CS imaging architectures and signal models difficult. In this paper, we develop a new framework for video CS for dynamic textured scenes that models the evolution of the scene as a linear dynamical system (LDS). This reduces the video recovery problem to first estimating the model parameters of the LDS from compressive measurements, and then reconstructing the image frames. We exploit the low-dimensional dynamic parameters (the state sequence) and high-dimensional static parameters (the observation matrix) of the LDS to devise a novel compressive measurement strategy that measures only the dynamic part of the scene at each instant and accumulates measurements over time to estimate the static parameters. This enables us to lower the compressive measurement rate considerably. We validate our approach with a range of experiments involving video recovery, sensing of hyper-spectral data, and classification of dynamic scenes from compressive data. Together, these applications demonstrate the effectiveness of the approach.




1 Introduction

The Shannon-Nyquist theorem dictates that to sense features at a particular frequency, we must sample uniformly at twice that rate. For generic imaging applications, this sampling rate might be too high; in modern digital cameras, invariably, the sensed image is compressed immediately without much loss in quality. For other applications, such as high speed imaging and sensing in the non-visual spectrum, camera/sensor designs based on the Shannon-Nyquist theorem lead to impractical and costly designs. Part of the reason for this is that Shannon-Nyquist sampling theory does not exploit any structure in the sensed signal beyond band-limitedness. Signals with redundant structure can potentially be sensed more parsimoniously. This is the key idea underlying the new field of compressive sensing (CS) [7]. When the signal of interest exhibits a sparse representation, CS enables sensing at measurement rates below the Nyquist rate. Indeed, signal recovery is possible from a number of measurements that is proportional to the sparsity level of the signal, as opposed to its bandwidth.

In this paper, we consider the problem of sensing videos compressively. We are interested in this problem motivated by the success of video compression algorithms, which indicates that videos are highly redundant. Bridging the gap between compression and sensing can lead to compelling camera designs that significantly reduce the amount of data sensed and enable designs for application domains where sensing is inherently costly.

Video CS is challenging for two main reasons:

  • Ephemeral nature of videos: The scene changes during the measurement process; moreover, we cannot obtain additional measurements of an event after it has occurred.

  • High-dimensional signals: Videos are significantly higher-dimensional than images. This makes the recovery process computationally intensive.

One way to address these challenges is to narrow our scope to certain parametric models that are suitable for a broad class of videos; this morphs the video recovery problem to one of parameter estimation and provides a scaffold to address the challenges listed above.

In this paper, we develop a CS framework for videos modeled as linear dynamical systems (LDSs), which is motivated, in part, by the extensive use of such models in characterizing dynamic textures [10, 15, 33], activity modeling, and video clustering [37]. Parametric models, like LDSs, offer lower-dimensional representations for otherwise high-dimensional videos. This significantly reduces the number of free parameters that need to be estimated and, as a consequence, reduces the amount of data that needs to be sensed. In the context of video sensing, LDSs offer interesting tradeoffs by characterizing the video signal using a mix of dynamic/time-varying parameters and static/time-invariant parameters. Further, the generative nature of LDSs provides a prior for the evolution of the video in both forward and reverse time. To a large extent, this property helps us circumvent the challenges presented by the ephemeral nature of videos.

The paper makes the following contributions. We propose a framework called CS-LDS for video acquisition using an LDS model coupled with sparse priors for the parameters of the LDS model. The core of the framework is a two-step measurement strategy that enables the recovery of the LDS parameters from compressive measurements by solving a sequence of linear and convex problems. We demonstrate that CS-LDS is capable of sensing videos with far fewer measurements than the Nyquist rate. Finally, the LDS parameters form an important class of features for activity recognition and scene analysis, thereby making our camera designs purposive [25] as well.

2 Background

2.1 Compressive sensing

CS deals with the recovery of a signal $x \in \mathbb{R}^N$ from undersampled linear measurements of the form $y = \Phi x + w$, where $\Phi \in \mathbb{R}^{M \times N}$, with $M < N$, is the measurement matrix and $w$ is the measurement noise [7, 14]. Estimating $x$ from the measurements $y$ is ill-conditioned, since the linear system formed by $\Phi$ is under-determined. CS works under the assumption that the signal $x$ is sparse in a basis $\Psi$; that is, the signal $\theta$, defined as $\theta = \Psi^T x$, has at most $K$ non-zero components. Exploiting the sparsity of $\theta$, the signal $x$ can be recovered exactly from $M = O(K \log(N/K))$ measurements provided the matrix $\Phi\Psi$ satisfies the so-called restricted isometry property (RIP) [4]. In particular, when $\Psi$ is an orthonormal basis and the entries of the matrix $\Phi$ are i.i.d. samples from a sub-Gaussian distribution, the product $\Phi\Psi$ satisfies the RIP. Further, the signal can be recovered from $y$ by solving a convex problem of the form

$$\widehat{\theta} = \arg\min_\theta \|\theta\|_1 \quad \text{s.t.} \quad \|y - \Phi\Psi\theta\|_2 \le \epsilon, \qquad (1)$$

where $\epsilon$ is an upper bound on the norm of the measurement noise $w$. It can be shown that the solution to (1) is, with high probability, the $K$-sparse solution that we seek. The theoretical guarantees of CS have been extended to compressible signals, whose sorted coefficients in $\Psi$ decay rapidly according to a power-law [22].

There exist a wide range of algorithms to solve (1) under various approximations or reformulations [7, 38]. Greedy techniques such as Orthogonal Matching Pursuit [28] and CoSAMP [26] solve the sparse approximation problem efficiently with strong convergence properties and low computational complexity. It is also simple to impose structural constraints such as block sparsity into CoSAMP, giving variants such as model-based CoSAMP [3].
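For concreteness, the greedy recovery idea can be sketched in a few lines of NumPy. This is a generic textbook Orthogonal Matching Pursuit, not the specific solver configuration used in this paper; all problem sizes are illustrative.

```python
import numpy as np

def omp(Phi, y, K):
    """Orthogonal Matching Pursuit: greedily estimate a K-sparse x from y = Phi x."""
    support, residual = [], y.copy()
    coef = np.zeros(0)
    for _ in range(K):
        # Select the column most correlated with the current residual.
        corr = np.abs(Phi.T @ residual)
        corr[support] = 0
        support.append(int(np.argmax(corr)))
        # Least-squares fit over the enlarged support; update the residual.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
N, M, K = 256, 64, 5                              # ambient dim, measurements, sparsity
x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.uniform(1, 2, K) * rng.choice([-1, 1], K)
Phi = rng.standard_normal((M, N)) / np.sqrt(M)    # i.i.d. Gaussian, hence sub-Gaussian
x_hat = omp(Phi, Phi @ x, K)
print(np.linalg.norm(x - x_hat))                  # small: support recovered w.h.p.
```

Note that $M = 64 \ll N = 256$: the signal is recovered from far fewer samples than its ambient dimension, which is the essence of the CS measurement-rate savings discussed above.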

2.2 Video compressive sensing

In this paper, we model a video as a sequence of time-indexed images. Specifically, if $x_t \in \mathbb{R}^N$ is the image of a scene at time $t$, then $\{x_t,\ t = 1, \ldots, T\}$ is the video of the scene from time $1$ to $T$. Further, we also refer to $x_t$ as the "video frame" at time $t$.

In video CS, the goal is to sense a time-varying scene using compressive measurements of the form $y_t = \Phi_t x_t$, where $y_t$, $\Phi_t$, and $x_t$ are the compressive measurements, the measurement matrix, and the video frame at time $t$, respectively. Given the sequence of compressive measurements $\{y_t\}$, our goal is to recover the video $\{x_t\}$. There are currently two fundamentally different imaging architectures for video CS: the single pixel camera (SPC) and the programmable pixel camera. The SPC [16] uses a single or a small number of sensing elements. Typically, a photo-detector is used to obtain a single measurement at each time instant of the form $y_t = \langle \phi_t, x_t \rangle$, where $\phi_t$ is a pseudo-random vector of $\pm 1$s. Typically, under an assumption of a slowly varying scene, consecutive measurements from the SPC are grouped as measurements of the same video frame. This assumption works only when the scene motion is small or when the number of measurements associated with a frame is small. The SPC provides complete freedom in the spatial multiplexing of pixels; however, there is no temporal multiplexing. In contrast, programmable pixel cameras [43, 31, 23] use a full frame sensor array; during each exposure of the sensor array, the shutter at each pixel is temporally modulated. This enables extensive temporal multiplexing but only a limited amount of spatial multiplexing. A key advantage of SPC-based designs is that they can operate efficiently at wavelengths (such as the far infrared) that require exotic detectors; in such cases, building a full frame sensor can be prohibitively expensive.
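The SPC sensing model described above, one pseudo-random inner product per instant with consecutive measurements grouped per frame, can be sketched as follows; the scene, sizes, and random patterns are synthetic stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, M = 64, 10, 8   # pixels per frame, frames, measurements grouped per frame

# A toy "slowly varying scene": each frame drifts slightly from the last.
frames = [rng.standard_normal(N)]
for _ in range(T - 1):
    frames.append(frames[-1] + 0.01 * rng.standard_normal(N))

# The SPC takes ONE inner product <phi_t, x_t> per instant against a +/-1
# pattern; M consecutive measurements are grouped and attributed to one frame.
measurements = []
for x in frames:
    Phi = rng.choice([-1.0, 1.0], size=(M, N))    # pseudo-random +/-1 patterns
    measurements.append(Phi @ x)

y = np.stack(measurements)    # shape (T, M): M grouped measurements per frame
print(y.shape)
```

The grouping is exact only if the scene is static within each group of $M$ measurements; the 1% inter-frame drift above is the kind of slow variation under which the approximation is acceptable.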

To date, recovery algorithms for the SPC have used various signal models to reconstruct the sensed scene. Wakin et al. [45] use 3D wavelets as the sparsifying basis for recovering videos from compressive measurements. Park and Wakin [27] use a coarse-to-fine estimation framework wherein the video, reconstructed at a coarse scale, is used to estimate motion vectors that are subsequently used to design dictionaries for reconstruction at a finer scale. Vaswani [40] and Vaswani and Lu [41] use a sequential framework that exploits the similarity of the signal support between adjacent frames of a video. Under this model, a frame of video is reconstructed using a linear inversion over the support at the previous time instant and a small-scale CS recovery over the residue to detect components beyond the known support. Cevher et al. [9] provide a CS framework for directly sensing innovations over a static scene, thereby enabling background subtraction from compressive measurements.

Figure 1:

An example of an LDS and the models that define it. (a) A few frames of a video of six LEDs flashing independently. (b) Top 20 singular values of the data matrix formed by stacking the frames of the video as its columns. Note how the singular values beyond the top six decay rapidly. The linear nature of light suggests that the frames of the video lie on a six-dimensional subspace; in practice, deviations from linearity due to saturation lead to small departures from this subspace, as seen in the decaying singular values. (c) Basis vectors associated with a six-dimensional approximation of the data; darker pixels denote negative entries while brighter pixels denote positive entries. Together, they define the six-dimensional subspace that constitutes the observation model of the LDS. (d) State sequence associated with the six-dimensional approximation. The smooth variation of the state values indicates predictability over small time durations, one of the key hallmarks of an LDS. These smooth transitions are captured by the state transition model.

2.3 Linear dynamical system model for video sequences

Linear dynamical systems (LDSs) represent an important class of parametric models for time-series data. A wide variety of spatio-temporal signals have often been modeled as realizations of LDSs. These include dynamic textures [15], traffic scenes [10], video inpainting [13], multi-camera tracking [2], and human activities [37]. The interested reader is referred to [36] for a survey of the use of LDSs as a concise representation for a wide range of computer vision problems.

Intuitively, an LDS for a video comprises two models. First, an observation model suggests that frames of the video lie close to a $d$-dimensional subspace; the frame of the video at time $t$ can be represented as $x_t \approx C s_t$, where $C \in \mathbb{R}^{N \times d}$ is a basis for the subspace and $s_t \in \mathbb{R}^d$ is the vector of subspace coefficients, or the state vector, at time $t$. Second, a state transition model captures the fact that the trajectory that the video charts out in this $d$-dimensional subspace varies smoothly and is predictable; it is modeled by a linear evolution of the form $s_{t+1} = A s_t$. Figure 1 provides an example of an LDS.

We now formally define the LDS for a video. The model equations are given by

$$s_{t+1} = A s_t + v_t, \qquad (2)$$
$$x_t = C s_t + w_t, \qquad (3)$$

where $s_t \in \mathbb{R}^d$ is the state vector at time $t$, $d$ is the dimension of the state space, $A \in \mathbb{R}^{d \times d}$ is the state transition matrix, $C \in \mathbb{R}^{N \times d}$ is the observation matrix, and $x_t \in \mathbb{R}^N$ represents the observed measurements; for the videos of interest in this paper, $d \ll N$. $v_t$ and $w_t$ are noise components modeled as Gaussian with zero mean and covariance matrices $Q$ and $R$, respectively. The Gaussian assumption for the process noise is not necessarily an optimal one, but is made for the sake of simplifying the model estimation algorithm. It is known to work well for representing a large class of dynamic textures [15].

An LDS is parameterized by the matrix pair $(C, A)$. Note that the choice of $C$ and the state sequence $\{s_t\}$ is unique only up to a linear transformation, given the inherent ambiguities in the notion of a state space. In particular, given any invertible matrix $P \in \mathbb{R}^{d \times d}$, the LDS defined by $(C P^{-1}, P A P^{-1})$ with the state sequence $\{P s_t\}$ is equivalent to the LDS defined by $(C, A)$ with the state sequence $\{s_t\}$. This lack of uniqueness has implications that we will touch upon later in Section 5.
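The equivalence under an invertible transformation of the state space can be checked numerically; the sketch below uses a toy noise-free LDS with illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d, T = 50, 3, 40

A = 0.95 * np.linalg.qr(rng.standard_normal((d, d)))[0]   # scaled orthogonal transition
C = rng.standard_normal((N, d))                           # observation matrix

# Simulate the LDS: s_{t+1} = A s_t, x_t = C s_t (noise-free for clarity).
s = [rng.standard_normal(d)]
for _ in range(T - 1):
    s.append(A @ s[-1])
X = np.stack([C @ st for st in s], axis=1)                # N x T video matrix

# Any invertible P yields an equivalent parameterization (C P^-1, P A P^-1)
# with states {P s_t}: the generated video is identical.
P = rng.standard_normal((d, d))
C2, A2 = C @ np.linalg.inv(P), P @ A @ np.linalg.inv(P)
s2 = [P @ st for st in s]
X2 = np.stack([C2 @ st for st in s2], axis=1)

print(np.linalg.norm(X - X2))   # ~0: the two systems generate the same video
```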

Given a video sequence, the most common approach to fitting an LDS model is to first estimate a lower-dimensional embedding of the observations via principal component analysis (PCA) and then learn the temporal dynamics captured in $A$ and, equivalently, the state sequence $\{s_t\}$. The most popular model estimation algorithms are N4SID [39], PCA-ID [35], and expectation-maximization (EM) [10]. N4SID is a subspace identification algorithm that provides an asymptotically optimal solution for the model parameters; however, for large problems its computational requirements make the method prohibitive. PCA-ID [35] is a sub-optimal solution to the learning problem. It makes the assumption that the estimation of the observation matrix and the state transition matrix can be separated, which makes it possible to estimate the parameters of the model very efficiently via PCA. Under this assumption, one first estimates the observation matrix $C$ (the space filter) and then uses the result to estimate the state transition matrix $A$ (the time filter) [15]. The learning problem can also be posed as a maximum likelihood estimation of the model parameters that maximize the likelihood of the observations, which can be solved by the EM algorithm [10].
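The space-filter/time-filter factorization behind PCA-ID can be sketched in NumPy as follows; `fit_lds`, the sizes, and the noise-free low-rank video are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def fit_lds(X, d):
    """PCA-style LDS fit: the SVD gives the observation matrix (space filter),
    then least squares on the states gives the transition matrix (time filter)."""
    U, sig, Vt = np.linalg.svd(X, full_matrices=False)
    C_hat = U[:, :d]                         # observation matrix estimate
    S_hat = np.diag(sig[:d]) @ Vt[:d, :]     # state sequence, one column per frame
    # Least-squares fit of s_{t+1} ~ A s_t.
    S0, S1 = S_hat[:, :-1], S_hat[:, 1:]
    A_hat = S1 @ np.linalg.pinv(S0)
    return C_hat, A_hat, S_hat

# Synthetic low-rank "video": N pixels, T frames on a d-dimensional subspace.
rng = np.random.default_rng(3)
N, d, T = 100, 4, 60
A = np.linalg.qr(rng.standard_normal((d, d)))[0]
C = rng.standard_normal((N, d))
s = [rng.standard_normal(d)]
for _ in range(T - 1):
    s.append(A @ s[-1])
X = np.stack([C @ st for st in s], axis=1)

C_hat, A_hat, S_hat = fit_lds(X, d)
# The fitted model reproduces the video up to numerical precision.
print(np.linalg.norm(X - C_hat @ S_hat) / np.linalg.norm(X))
```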

3 CS-LDS Architecture

We provide a high-level overview of our proposed framework for video CS; the goal here is to build a CS framework, implementable on the SPC, for videos that are modeled as LDSs. We flesh out the details in Sections 4 and 5. This amounts to estimating the LDS parameters from compressive measurements; i.e., we seek to recover the model parameters $C$ and $\{s_t\}$ given compressive measurements of the form $y_t = \Phi_t x_t = \Phi_t C s_t$. We recall that $C$ is the time-invariant observation matrix of the LDS, and $x_t$ and $s_t$ are the video frame and the state at time $t$, respectively. The compressive measurements are hence bilinear in the unknown parameters $C$ and $\{s_t\}$. Handling bilinear unknowns typically requires non-convex optimization techniques, thereby invalidating conventional CS recovery algorithms. To avoid this, we propose a two-step sensing method that is specifically designed to address the bilinearity; we refer to this sensing method and its associated recovery algorithm as the CS-LDS framework [34].

Measurement model:

We summarize the CS-LDS measurement model as follows. At time $t$, we take two sets of measurements:

$$y_t = \begin{pmatrix} \bar{y}_t \\ \tilde{y}_t \end{pmatrix} = \begin{pmatrix} \bar{\Phi} \\ \tilde{\Phi}_t \end{pmatrix} x_t, \qquad (4)$$

where $\bar{y}_t \in \mathbb{R}^{\bar{M}}$ and $\tilde{y}_t \in \mathbb{R}^{\tilde{M}_t}$, such that the total number of measurements at each frame is $M_t = \bar{M} + \tilde{M}_t$. The measurement matrix in (4) is composed of two distinct components: the time-invariant part $\bar{\Phi}$ and the time-varying part $\tilde{\Phi}_t$. We denote by $\bar{y}_t = \bar{\Phi} x_t$ the common measurements and by $\tilde{y}_t = \tilde{\Phi}_t x_t$ the innovation measurements.

[Footnote 1: The SPC obtains only one measurement at each time instant. Multiple measurements for a video frame are obtained by grouping consecutive measurements from the SPC. When $M_t$ is small compared to the sampling rate of the SPC, this is an acceptable approximation, especially for slowly varying scenes.]

We solve for the LDS parameters in two steps. First, we obtain an estimate of the state sequence $\{\widehat{s}_t\}$ using only the common measurements $\{\bar{y}_t\}$. Second, we use this state sequence estimate to recover the observation matrix $C$ using the innovation measurements.

State sequence estimation:

We recover the state sequence using only the common measurements $\{\bar{y}_t\}$. The key idea is that when $\{x_t\}$ form the observations of an LDS with system matrices $(C, A)$, the measurements $\{\bar{y}_t = \bar{\Phi} x_t\}$ form the observations of an LDS with system matrices $(\bar{\Phi} C, A)$. Estimation of the state sequence can now be mapped to a simple exercise in system identification. In particular, an estimate of the state sequence can be obtained from the singular value decomposition (SVD) of the block-Hankel matrix with $q$ block rows,

$$H = \begin{pmatrix} \bar{y}_1 & \bar{y}_2 & \cdots & \bar{y}_{T-q+1} \\ \bar{y}_2 & \bar{y}_3 & \cdots & \bar{y}_{T-q+2} \\ \vdots & \vdots & & \vdots \\ \bar{y}_q & \bar{y}_{q+1} & \cdots & \bar{y}_T \end{pmatrix}. \qquad (5)$$

Given the SVD $H = U \Sigma V^T$, the state sequence estimate is given by $[\widehat{s}_1, \widehat{s}_2, \ldots] = \Sigma_d V_d^T$, where $\Sigma_d$ and $V_d$ retain the top $d$ singular values and the corresponding right singular vectors, respectively.

In Section 4, we leverage results from system identification to analyze the properties of this particular estimate as well as characterize the number of measurements required.

Observation matrix estimation:

Given an estimate of the state sequence, $\{\widehat{s}_t\}$, the relationship between the observation matrix and the innovation measurements is linear, i.e., $\tilde{y}_t = \tilde{\Phi}_t C \widehat{s}_t$. In addition, $C$ is time-invariant. Hence, we can accumulate innovation measurements over a duration of time to stably reconstruct $C$. This significantly reduces the number of innovation measurements required at each frame. This is especially important in the context of sensing videos, since the scene changes as we acquire measurements. Hence, requiring fewer measurements for each reconstructed frame of the video implies less error due to motion blur.

Using the estimates of the state sequence $\{\widehat{s}_t\}$, we can recover $C$ by solving the following convex problem:

$$\min_C \sum_{k=1}^d \|\Psi^T c_k\|_1 \quad \text{s.t.} \quad \left\| y_t - \begin{pmatrix} \bar{\Phi} \\ \tilde{\Phi}_t \end{pmatrix} C \widehat{s}_t \right\|_2 \le \epsilon \quad \forall t, \qquad (6)$$

where $c_k$ denotes the $k$-th column of $C$ and $\Psi$ is a sparsifying basis for the columns of $C$. Note that, in (6), we use all of the compressive measurements obtained for each frame of the video; that is, we use both the common and the innovation measurements, since the common measurements, much like the innovation measurements, are linear measurements of the frames. Further, as we show later in Section 5.2, ambiguities in the estimation of the state sequence induce a structured sparsity pattern in the support of $\Psi^T C$. The convex program (6) can be modified to incorporate such constraints. In addition, in Section 5, we also propose a greedy alternative to the convex program.

Figure 2: Block diagram of the CS-LDS framework.

To summarize, the two-step measurement process described in (4) enables a two-step recovery (see Figure 2). First, we obtain an estimate of the state sequence using SVD on just the common measurements. Second, we use the state sequence estimate for recovering the observation matrix using a convex program. The details of these two steps are discussed in the next two sections.
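The two-step pipeline can be condensed into a toy end-to-end simulation. This sketch makes simplifying assumptions that the paper does not: noise-free measurements, a single-block Hankel matrix in the first step, and plain least squares in place of the sparsity-regularized program in the second step; all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
N, d, T = 120, 3, 80          # pixels per frame, state dimension, frames
Mbar, Mtil = 8, 6             # common / innovation measurements per frame

# Ground-truth LDS video (noise-free for clarity).
A = np.linalg.qr(rng.standard_normal((d, d)))[0]   # orthogonal transition matrix
C = rng.standard_normal((N, d))
s = [rng.standard_normal(d)]
for _ in range(T - 1):
    s.append(A @ s[-1])
X = np.stack([C @ st for st in s], axis=1)         # N x T video matrix

# Sensing: a fixed Phi_bar gives the common measurements, while a fresh
# Phi_til per frame gives the innovation measurements.
Phi_bar = rng.standard_normal((Mbar, N))
Y_bar = Phi_bar @ X                                # Mbar x T common measurements
Phi_tils = [rng.standard_normal((Mtil, N)) for _ in range(T)]
y_tils = [Phi_tils[t] @ X[:, t] for t in range(T)]

# Step 1: state estimates via the SVD of the (single-block) Hankel matrix of
# common measurements; the states are recovered up to a linear transform.
U, sig, Vt = np.linalg.svd(Y_bar, full_matrices=False)
S_hat = np.diag(sig[:d]) @ Vt[:d, :]               # d x T state estimates

# Step 2: given the states, each y_t = Phi_t C s_t is linear in C; stack all
# frames and solve least squares for vec(C) (column-major vectorization).
G = np.vstack([np.kron(S_hat[:, t], Phi_tils[t]) for t in range(T)])
y = np.concatenate(y_tils)
vecC, *_ = np.linalg.lstsq(G, y, rcond=None)
C_hat = vecC.reshape(N, d, order="F")

X_hat = C_hat @ S_hat
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))   # ~0: video recovered
```

Although `C_hat` and `S_hat` individually differ from `C` and the true states by the linear-transformation ambiguity, their product reproduces the video, which is what the framework exploits.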

4 Estimating the state sequence

In this section, we discuss methods to estimate the state sequence from the compressive measurements $\{\bar{y}_t\}$. In particular, we seek to establish sufficient conditions under which the state sequence can be estimated reliably.

4.1 Observability of the state sequence

Consider the compressive measurements given by

$$\bar{y}_t = \bar{\Phi} x_t + w_t, \qquad (7)$$

where $\bar{y}_t \in \mathbb{R}^{\bar{M}}$ are the compressive measurements at time $t$, $\bar{\Phi}$ is the corresponding measurement matrix, and $w_t$ is the measurement noise. Note that $\bar{\Phi}$ is time-invariant; hence, (7) is the part of the measurement model described in (4) relating to the common measurements. A key observation is that, when $\{x_t\}$ form the observations of an LDS defined by $(C, A)$, the compressive measurement sequence $\{\bar{y}_t\}$ forms an LDS as well; that is,

$$s_{t+1} = A s_t + v_t, \qquad \bar{y}_t = (\bar{\Phi} C) s_t + w_t. \qquad (8)$$

The LDS associated with $\{\bar{y}_t\}$ is parameterized by the system matrices $(\bar{\Phi} C, A)$. Estimating the state sequence from the observations of an LDS is possible only when the LDS is observable [5]. Thus, it is important to consider the question of observability of the LDS parameterized by $(\bar{\Phi} C, A)$.

[Footnote 2: Observability of LDSs in the context of CS has been studied earlier by Wakin et al. [46], who consider the scenario when the observation matrix is randomly generated and the initial state vector is sparse. In contrast, the analysis we present is for a non-sparse state vector.]

Definition 4.1 (Observability of an LDS [5])

An LDS is observable if, for any possible state sequence, the current state can be estimated from a finite number of observations.

Lemma 4.2 (Test for observability of an LDS [5])

An LDS defined by the system matrices $(C, A)$ with state space dimension $d$ is observable if and only if the observability matrix

$$\mathcal{O}(C, A) = \begin{pmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{d-1} \end{pmatrix}$$

is full rank.

A necessary condition for the observability of the LDS defined by $(\bar{\Phi} C, A)$ is that the LDS defined by $(C, A)$ is observable. However, for the LDSs we consider in this paper, $d \ll N$; for such systems, the LDS defined by $(C, A)$ is observable, since the observation matrix $C$, being a basis for a $d$-dimensional subspace, has full column rank. Given this assumption, we consider the observability of the LDS parameterized by $(\bar{\Phi} C, A)$ next.
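The rank test of Lemma 4.2 is straightforward to run numerically; `observability_matrix` below is an illustrative helper, not code from the paper.

```python
import numpy as np

def observability_matrix(C, A):
    """Stack [C; CA; C A^2; ...; C A^(d-1)] for a d-dimensional state space."""
    d = A.shape[0]
    blocks, CAk = [], C.copy()
    for _ in range(d):
        blocks.append(CAk)
        CAk = CAk @ A
    return np.vstack(blocks)

rng = np.random.default_rng(5)
d = 4
C = rng.standard_normal((30, d))     # generic observation matrix (full column rank)
A = rng.standard_normal((d, d))
O = observability_matrix(C, A)
print(np.linalg.matrix_rank(O) == d)   # True: the LDS (C, A) is observable
```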

Lemma 4.3

For $\bar{M} \ge d$, the LDS defined by $(\bar{\Phi} C, A)$ is observable, with high probability, if the LDS defined by $(C, A)$ is observable and the entries of the matrix $\bar{\Phi}$ are sampled i.i.d. from a sub-Gaussian distribution.

This is established by proving that $\mathrm{rank}(\bar{\Phi} C) = d$ when $\bar{M} \ge d$. Assume that $\mathrm{rank}(\bar{\Phi} C) < d$; i.e., there exists $z \ne 0$ such that $\bar{\Phi} C z = 0$. Let $\phi^T$ be a row of $\bar{\Phi}$. The event $\langle \phi, C z \rangle = 0$ is one of negligible probability when the elements of $\phi$ are assumed to be i.i.d. according to a sub-Gaussian distribution such as the Gaussian or Bernoulli distribution. Hence, $\mathrm{rank}(\bar{\Phi} C) = d$ with high probability when $\bar{M} \ge d$.
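A quick Monte-Carlo sanity check of the key step, that $\bar{\Phi} C$ retains rank $d$ for random sub-Gaussian $\bar{\Phi}$ with only $\bar{M} = d$ rows, can be sketched as follows (sizes and trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
N, d, trials = 200, 5, 100
C = np.linalg.qr(rng.standard_normal((N, d)))[0]   # rank-d observation matrix

full_rank = 0
for _ in range(trials):
    Phi = rng.choice([-1.0, 1.0], size=(d, N))     # Mbar = d Bernoulli rows
    full_rank += np.linalg.matrix_rank(Phi @ C) == d
print(full_rank)   # expect every trial to yield full rank
```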

Observability is the key criterion for recovering the state sequence from the common measurements. When the LDS associated with the common measurements is observable, we can estimate the state sequence, up to a linear transformation, by factorizing the block-Hankel matrix $H$ in (5). $H$ can be written as

$$H = \mathcal{O}_q(\bar{\Phi} C, A)\, [s_1, s_2, \ldots, s_{T-q+1}],$$

where $q$ is the number of block rows in $H$ and $\mathcal{O}_q$ denotes the first $q$ block rows of the observability matrix. Hence, when the observability matrix is full rank, we can recover the state sequence by factoring the Hankel matrix using the SVD. Suppose the SVD of the Hankel matrix is $H = U \Sigma V^T$. Then, the estimate of the state sequence is obtained by

$$[\widehat{s}_1, \widehat{s}_2, \ldots] = \Sigma_d V_d^T, \qquad (9)$$

where $\Sigma_d$ is the diagonal matrix containing the $d$ largest singular values in $\Sigma$, and $V_d$ is the matrix composed of the right singular vectors corresponding to these singular values. The estimate of the state sequence obtained from (9) differs from its true value by a linear transformation. This is a fundamental ambiguity that stems from the lack of uniqueness in the definition of the state space (see Section 2.3). The state sequence estimate in (9) can be improved, especially for high levels of measurement noise, by using the system identification techniques mentioned in Section 2.3. However, the simplicity of this estimate makes it amenable to further analysis.

When $\bar{M} > 1$, we can choose to factorize a smaller-sized Hankel matrix, provided $q \bar{M} \ge d$. Note that when $q = 1$, we do not enforce the constraints provided by the state transition model, thereby reducing the LDS to a simple linear system. For $q > 1$, we enforce the state transition model over $q$ successive time instants; i.e., each column of $H$ is constrained to be consistent with the evolution $\bar{y}_{t+k} \approx (\bar{\Phi} C) A^k s_t$ for $k = 0, \ldots, q-1$. Larger values of $q$ lead to smoother state sequences, since the estimates conform to the state transition model for longer durations.

We next study the observability properties of specific classes of interesting LDSs and the conditions on $\bar{\Phi}$ under which the observability of $(\bar{\Phi} C, A)$ holds.

4.2 Case: $\bar{M} = 1$

A particularly interesting scenario is when we obtain exactly one common measurement for each video frame. For such a scenario, $\bar{M} = 1$ and, hence, the measurement matrix can be written as a row-vector: $\bar{\Phi} = \phi^T$. We now establish conditions under which the observability matrix is full rank for this particular scenario. Let $\Gamma = \phi^T C$. We seek a condition under which the observability matrix, or equivalently its transpose,

$$\mathcal{O}^T = \left[ \Gamma^T,\; A^T \Gamma^T,\; (A^T)^2 \Gamma^T,\; \ldots,\; (A^T)^{d-1} \Gamma^T \right],$$

is full rank. We concentrate on the specific scenario where the matrix $A$ (and hence, $A^T$) is diagonalizable, i.e., $A^T = V \Lambda V^{-1}$, where $V$ is an invertible matrix (hence, full rank) and $\Lambda$ is a diagonal matrix with diagonal elements $\{\lambda_1, \ldots, \lambda_d\}$, the eigenvalues of $A$. For such matrices, the transpose of the observability matrix can be written as

$$\mathcal{O}^T = \left[ V g,\; V \Lambda g,\; V \Lambda^2 g,\; \ldots,\; V \Lambda^{d-1} g \right],$$

where $g = V^{-1} \Gamma^T$. This can be expanded as

$$\mathcal{O}^T = V \left[ g,\; \Lambda g,\; \Lambda^2 g,\; \ldots,\; \Lambda^{d-1} g \right],$$

and further into

$$\mathcal{O}^T = V \,\mathrm{diag}(g) \begin{pmatrix} 1 & \lambda_1 & \cdots & \lambda_1^{d-1} \\ 1 & \lambda_2 & \cdots & \lambda_2^{d-1} \\ \vdots & & & \vdots \\ 1 & \lambda_d & \cdots & \lambda_d^{d-1} \end{pmatrix}.$$

From this factorization, we can establish a sufficient condition for when the observability matrix is full rank.

[Footnote 3: There is an interesting connection to Krylov-subspace methods here. In Krylov-subspace methods, a low-rank approximation to a matrix $A$ is obtained by forming the matrix $[b, Ab, A^2 b, \ldots]$ with $b$ randomly chosen. Convergence proofs for this method are closely related to Theorem 4.4. To the best of our knowledge, diagonalizability of $A$ plays an important role in most of these proofs. The interested reader is referred to [32] for more details.]

Theorem 4.4

Let $\bar{M} = 1$ and let the elements of $\phi$ be i.i.d. samples from a sub-Gaussian distribution. Then, with high probability, the observability matrix is full rank when the state transition matrix $A$ is diagonalizable and its eigenvalues are distinct.

From the discussion above, the observability matrix can be written as a product of three square matrices: $V$, the matrix of eigenvectors of $A^T$; a diagonal matrix with entries defined by the vector $g = V^{-1} C^T \phi$; and a Vandermonde matrix defined by the vector of eigenvalues of $A$. When the eigenvalues are distinct, the first and last matrices are full rank. Given that the elements of $\phi$ are i.i.d., the probability that any entry of $g$ is zero is negligible and, hence, the diagonal matrix is full rank with high probability. Since the product of full-rank square matrices is itself full rank, the observability matrix is full rank with high probability.

Remark: Theorem 4.4 requires that the state transition matrix be full rank (non-zero eigenvalues) and diagonalizable with distinct eigenvalues. Most matrices are diagonalizable (once we allow complex eigenvalues), and hence the requirement that the state transition matrix be diagonalizable is not restrictive. A more restrictive condition is requiring the eigenvalues of the matrix to be distinct. Unfortunately, this eliminates some commonly observed state transition matrices, such as the identity matrix, which is associated with Brownian processes. Nonetheless, Theorem 4.4 is intriguing, since it guarantees recovery of the state sequence even when we obtain only one common measurement per time instant. This is immensely useful in reducing the number of measurements required to sense a video sequence.
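Both sides of this remark can be verified numerically: a diagonalizable transition matrix with distinct eigenvalues yields a full-rank observability matrix from a single random measurement row, while the identity matrix does not. The helper and all sizes below are illustrative.

```python
import numpy as np

def obs_matrix_row(phi, C, A):
    """Observability matrix of (phi^T C, A) for a single measurement row phi."""
    d = A.shape[0]
    gamma = phi @ C                       # the row vector Gamma = phi^T C
    rows = [gamma @ np.linalg.matrix_power(A, k) for k in range(d)]
    return np.vstack(rows)

rng = np.random.default_rng(7)
N, d = 60, 4
C = np.linalg.qr(rng.standard_normal((N, d)))[0]
phi = rng.standard_normal(N)              # one random common-measurement row

A_good = np.diag([0.9, 0.7, 0.5, 0.3])    # distinct eigenvalues: observable
A_bad = np.eye(d)                         # identity (Brownian-like): not observable

print(np.linalg.matrix_rank(obs_matrix_row(phi, C, A_good)))  # 4
print(np.linalg.matrix_rank(obs_matrix_row(phi, C, A_bad)))   # 1
```

With `A_bad = I`, every block of the observability matrix repeats the same row, so a single common measurement can never disambiguate the $d$-dimensional state, matching the remark.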

Interestingly, we can reduce $\bar{M}$ even further. This is achieved by not obtaining common measurements at some time instants.

4.3 Missing measurements: the case $\bar{M} < 1$

If we do not obtain common measurements at some time instants, is it still possible to obtain an estimate of the state sequence? One way to view this problem is that we have incomplete knowledge of the Hankel matrix defined in (5) and we seek to complete this matrix. Matrix completion, especially for low-rank matrices, has received significant attention recently [30, 6, 8].

Given that the Hankel matrix in (5) is low rank for videos modeled as LDSs, we formulate the missing measurement recovery problem as one of matrix completion. Suppose that we have the common measurements only at the time instants given by the index set $\Omega$; i.e., we have knowledge of $\{\bar{y}_t, t \in \Omega\}$. We can recover the missing measurements by exploiting the low-rank property of $H$. Specifically, we solve the following problem to obtain the missing measurements:

$$\min\ \mathrm{rank}(H) \quad \text{s.t.} \quad H \text{ is consistent with } \{\bar{y}_t, t \in \Omega\}. \qquad (10)$$

However, $\mathrm{rank}(\cdot)$ is a non-convex function, which renders the above problem intractable. In practice, we can solve a convex relaxation of this problem,

$$\min\ \|H\|_* \quad \text{s.t.} \quad H \text{ is consistent with } \{\bar{y}_t, t \in \Omega\}, \qquad (11)$$

where $\|H\|_*$ is the nuclear norm of the matrix $H$, which equals the sum of its singular values. Once we fill in the missing measurements, we use (9) to recover an estimate of the state sequence.

[Footnote 4: Historically, the use of nuclear norm-based optimization for system identification goes back to Fazel et al. [19, 20]. Since then, there has been much work towards establishing the equivalence of these two problems [30, 6]. Further, the convex program in (11) was used for video inpainting in [13].]
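In place of a full nuclear-norm solver, a simple alternating-projection sketch conveys the idea: alternately enforce the observed entries and project onto rank-$r$ matrices. This hard-thresholded stand-in assumes the rank is known and is not the solver used in the paper; the sizes and sampling rate are illustrative.

```python
import numpy as np

def complete_low_rank(M_obs, mask, r, iters=500):
    """Fill missing entries of a (known) rank-r matrix by alternating a data
    step on the observed entries with a hard projection onto rank-r matrices."""
    X = np.zeros_like(M_obs)
    for _ in range(iters):
        X = np.where(mask, M_obs, X)                  # enforce observed entries
        U, sig, Vt = np.linalg.svd(X, full_matrices=False)
        X = U[:, :r] @ np.diag(sig[:r]) @ Vt[:r, :]   # project onto rank r
    return X

rng = np.random.default_rng(8)
m, n, r = 30, 40, 2
M_true = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank 2
mask = rng.random((m, n)) < 0.6                                     # observe 60%
M_hat = complete_low_rank(M_true * mask, mask, r)
print(np.linalg.norm(M_hat - M_true) / np.linalg.norm(M_true))      # small
```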

An important quantity to characterize is the proportion of time instants in which we can choose to not obtain common measurements. This amounts to developing a sampling theorem for the completion of low-rank Hankel matrices; to the best of our knowledge, there has been little theoretical work on this problem. Instead, we address it empirically in Section 6.

5 Estimating the observation matrix

In this section, we discuss the estimation of the observation matrix $C$ given the estimates of the state sequence $\{\widehat{s}_t\}$.

5.1 Need for innovation measurements

Given estimates of the state sequence $\{\widehat{s}_t\}$, the compressive measurements are linear in the matrix $C$, which enables a host of conventional $\ell_2$-based methods as well as $\ell_1$-based recovery algorithms to estimate $C$. However, recall that $C$ is an $N \times d$ matrix and, hence, the common measurements by themselves are not enough to recover $C$ unless $\bar{M}$ is large.

The common measurements used in the estimation of the state sequence are measured using a time-invariant measurement matrix $\bar{\Phi}$. A time-invariant measurement matrix, by itself, is not sufficient for estimating $C$ unless $\bar{M}$ is very large. To alleviate this problem, we take additional compressive measurements of each frame using a time-varying measurement matrix. Let $\tilde{y}_t = \tilde{\Phi}_t x_t$, where $\tilde{y}_t \in \mathbb{R}^{\tilde{M}_t}$ and $\tilde{\Phi}_t$ are the compressive measurements and the corresponding measurement matrix at time $t$. As mentioned earlier in Section 3, we refer to these as innovation measurements. Noting that $C$ is a time-invariant parameter, we can collect innovation measurements over a period of time before reconstructing $C$. This enables a significant reduction in the number of measurements taken at each time instant.

5.2 Structured sparsity for $C$

Individual frames of a video, being images, exhibit sparsity/compressibility in certain transform bases such as wavelets and the DCT. If the supports of the frames are highly overlapping, as is to be expected given the redundancies in a video, then the columns of $C$ are compressible in the same transform bases; this is a consequence of $C$ being a basis for the frames of the video. Further, note that the columns of $C$ are also the top principal components and, hence, capture the dominant motion patterns in the scene; when motion in the scene is spatially correlated, the columns of $C$ are compressible in a wavelet/DCT basis. For these reasons, we assume that the columns of $C$ are compressible in a wavelet/DCT basis and employ sparse priors in the recovery of the observation matrix $C$. We can potentially obtain an estimate of $C$ by solving the following convex program:

$$\min_C \sum_{k=1}^d \|\Psi^T c_k\|_1 \quad \text{s.t.} \quad \|\tilde{y}_t - \tilde{\Phi}_t C \widehat{s}_t\|_2 \le \epsilon \quad \forall t. \qquad (12)$$

Here, we denote the columns of the matrix $C$ as $\{c_k\}$, and $\Psi$ is a sparsifying basis for the columns of $C$; in principle, we have the freedom to choose different sparsifying bases for different columns of $C$.

The assumption of compressibility in a transform basis was sufficient for all the videos we test on (see Section 6). However, it is entirely possible that a video is not compressible in a transform basis. There are two possible ways to address such a scenario. First, given training data, we can use dictionary learning algorithms [24] to learn an appropriate basis wherein the columns of $C$ are sparse/compressible. Second, in the absence of training data, we revert to $\ell_2$-based methods to recover $C$; in such cases, we would typically need more measurements to recover $C$.

However, the convex program (12) is not sufficient as-is to recover $C$. The reason stems from ambiguities in the definition of the LDS (see Section 2.3). The use of the SVD for recovering the state sequence introduces an ambiguity in the estimates of the state sequence of the form $\widehat{s}_t = P s_t$, where $P$ is an invertible matrix. As a consequence, this leads to an estimate $\widehat{C}$ satisfying $\widehat{C} = C P^{-1}$. Suppose the columns of $C$ are $K$-sparse (equivalently, compressible for a certain value of $K$) in $\Psi$, with support $S_k$ for the $k$-th column. Then, the columns of $\widehat{C}$ are potentially $dK$-sparse with identical supports $S = \cup_k S_k$. The support is exactly $dK$-sparse when the $S_k$ are disjoint and $P^{-1}$ is dense. At first glance, this seems to be a significant drawback, since the overall sparsity of $\widehat{C}$ has increased to $d^2 K$ (the sparsity of $C$ is $dK$). However, this apparent increase in sparsity is alleviated by the columns having identical supports, which can be exploited in the recovery process [17].

Given the estimates of the state sequence, we estimate the matrix C by solving the following convex program:


where s_i is the i-th row of the matrix of transform coefficients of C, and Ψ is a sparsifying basis for the columns of C. The above problem is an instance of a mixed-norm optimization that promotes group sparsity; in this instance, we use it to promote group column sparsity, i.e., all columns have the same sparsity pattern.
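A mixed-norm program of this type can be solved with a simple proximal-gradient iteration. The sketch below is illustrative only: the matrix A stands in for the composition of the measurement operators and the sparsifying transform, and the names are not the paper's notation.

```python
import numpy as np

def l21_ista(A, Y, lam, n_iter=2000):
    """Proximal-gradient (ISTA) sketch for the group-sparse program
        min_S  0.5 * ||A @ S - Y||_F**2 + lam * sum_i ||S[i, :]||_2,
    whose l2,1 penalty forces all columns of S to share one support."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    S = np.zeros((A.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        G = S - (A.T @ (A @ S - Y)) / L        # gradient step on the data term
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        scale = np.maximum(1.0 - lam / (L * np.maximum(norms, 1e-12)), 0.0)
        S = scale * G                          # row-wise soft threshold (prox of l2,1)
    return S
```

The row-wise threshold is what couples the columns: a row is either retained for all columns or zeroed for all of them, which is exactly the shared-support structure discussed above.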

There are multiple efficient ways to solve this program, including solvers such as SPGL1 [38] and model-based CoSaMP [3]. Algorithm 1 summarizes the model-based CoSaMP algorithm used for recovering the observation matrix C. The specific model used here is a union-of-subspaces model that groups each row of the coefficient matrix into a single subspace/model.

      returns the support of the largest elements of its argument
      represents the submatrix of with rows indexed by and all columns.
      represents the submatrix of with columns indexed by and all rows.
while (stopping conditions are not met)  do
       Compute signal proxy:
       Compute energy in each row:
       Support identification and merger:
       Least squares estimation:
             Find that minimizes
       Pruning support:
       Form new estimate of :
       Update residue:
end while
Algorithm 1: Model-based CoSaMP
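The listing above can be rendered in numpy as a simultaneous (row-sparse) CoSaMP. For simplicity, this sketch assumes a single dense measurement matrix A acting on a row-sparse coefficient matrix S, abstracting away the per-frame measurement operators of the actual acquisition; all names are illustrative.

```python
import numpy as np

def mmv_cosamp(A, Y, K, n_iter=20, tol=1e-8):
    """Model-based (row-sparse / MMV) CoSaMP sketch.
    Recovers a row-sparse matrix S (N x d) from Y = A @ S (m x d),
    exploiting the shared support across columns."""
    N = A.shape[1]
    d = Y.shape[1]
    S = np.zeros((N, d))
    R = Y.copy()                                 # residue
    support = np.array([], dtype=int)
    for _ in range(n_iter):
        proxy = A.T @ R                          # signal proxy
        energy = np.sum(proxy**2, axis=1)        # energy in each row
        cand = np.argsort(energy)[::-1][:2 * K]  # strongest 2K rows
        T = np.union1d(support, cand)            # support merger
        B, _, _, _ = np.linalg.lstsq(A[:, T], Y, rcond=None)  # LS estimate
        keep = np.argsort(np.sum(B**2, axis=1))[::-1][:K]     # prune to K rows
        support = T[keep]
        S = np.zeros((N, d))
        S[support] = B[keep]                     # new estimate
        R = Y - A @ S                            # update residue
        if np.linalg.norm(R) < tol:
            break
    return S
```

The only change relative to plain CoSaMP is that support selection and pruning act on whole rows (the union-of-subspaces model), rather than on individual coefficients.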

5.3 Value of

For stable recovery of the observation matrix C, a minimum total number of measurements is needed; for a large class of practical solvers, a rule of thumb gives this number. Given that we measure time-varying compressive measurements at each time instant, accumulating them over a period of time instants yields the pool of compressive measurements used for estimating C. Hence, for stable recovery of C, we need approximately


This indicates extremely favorable operating scenarios for the CS-LDS framework, especially when the number of frames is large (as in high frame-rate capture). The number of frames is the product of the time duration of the video in seconds and the sampling rate of the measurement device. Given that the complexity of the LDS typically (though not always) depends on the scene and not the sampling rate, for a fixed scene the number of measurements required per frame to estimate C decreases as the sampling rate is increased. Indeed, as the sampling rate increases, the per-frame measurement count can be decreased while keeping the total number of measurements constant. This ensures that (14) is satisfied, enabling stable recovery of C.
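This scaling argument can be made concrete with back-of-the-envelope arithmetic; every number below (frame size, state dimension, capture duration, oversampling constant) is hypothetical, chosen only to illustrate how the per-frame budget falls as the sampling rate grows.

```python
# Back-of-the-envelope illustration; every number here is hypothetical.
N, d = 128 * 128, 20        # pixels per frame, LDS state dimension
T_sec, c = 10.0, 5.0        # capture duration (s), oversampling constant

per_frame = {}
for fs in (30, 120, 480):   # candidate sampling (frame) rates, in Hz
    F = int(T_sec * fs)     # total number of frames captured
    # per-frame measurements needed so that the total accumulated
    # measurement count reaches roughly c * N * d:
    per_frame[fs] = c * N * d / F
```

For a fixed scene (fixed N, d, and duration), quadrupling the sampling rate cuts the per-frame measurement budget by a factor of four, which is the trend the text describes.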

5.4 Mean + LDS

In many instances, a dynamic scene is modeled better as an LDS over a static background μ, that is, y_t = μ + C x_t. This can be handled with two small modifications to Algorithm 1. First, the state sequence is obtained by performing an SVD on the matrix of common measurements modified such that each row sums to zero. This works under the assumption that the sample mean of the common measurements is equal to the compressive measurement of μ. Second, given that the support of μ need not be similar to that of the columns of C, the resulting optimization problem can be reformulated as


As with the convex formulation, the model-based CoSaMP algorithm described in Algorithm 1 can be modified to incorporate the mean term μ; an additional modification here is the requirement to specify a priori the sparsity of the mean μ.
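The modified state estimation step can be sketched in a few lines of numpy under the stated assumption that the empirical mean of the common measurements equals the compressive measurement of the static background; all variable names here are illustrative.

```python
import numpy as np

def state_estimate_mean_lds(Zbar, d):
    """Mean+LDS state estimation sketch.
    Zbar : (M x F) matrix of common measurements, one column per frame.
    d    : assumed state-space dimension.
    Subtracting each row's mean removes the compressed static background
    (assuming the sample mean of the measurements equals the measurement
    of the background); an SVD of the centered matrix then yields the
    state sequence up to the usual invertible ambiguity."""
    mu_z = Zbar.mean(axis=1, keepdims=True)        # compressed background estimate
    U, s, Vt = np.linalg.svd(Zbar - mu_z, full_matrices=False)
    X_hat = np.diag(s[:d]) @ Vt[:d]                # estimated state sequence (d x F)
    return X_hat, mu_z
```

Note that the centering is exact only when the states themselves average to zero over the capture window, which is the assumption stated in the text.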

6 Experiments

We present a range of experiments validating various aspects of the CS-LDS framework. We use permuted noiselets [12] for the measurement matrices, since they have a fast scalable implementation. We use the term compression ratio to denote the reduction in the number of measurements as compared to the Nyquist rate. Finally, we use the reconstruction SNR to evaluate the recovered videos. Given the ground truth video and a reconstruction , the reconstruction SNR in dB is defined by


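For concreteness, the reconstruction SNR can be computed as follows; we assume the standard convention of the ratio of signal energy to error energy on a decibel scale, with the video flattened into a single vector.

```python
import numpy as np

def reconstruction_snr_db(x, x_hat):
    """Reconstruction SNR in dB (standard convention, assumed here):
    20 * log10( ||x|| / ||x - x_hat|| ), over the flattened video."""
    x = np.asarray(x, dtype=float).ravel()
    x_hat = np.asarray(x_hat, dtype=float).ravel()
    return 20.0 * np.log10(np.linalg.norm(x) / np.linalg.norm(x - x_hat))
```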
We compare CS-LDS against frame-by-frame CS, where each frame of the video is recovered separately using conventional CS techniques. We use the term oracle LDS when the parameters and video reconstruction are obtained by operating on the original data itself. Oracle LDS estimates the parameters using a rank-d approximation of the ground truth data. The reconstruction SNR of the oracle LDS gives an upper bound on the achievable SNR. Finally, the ambiguity in the observation matrix (due to the non-uniqueness of the SVD-based factorization) as estimated by oracle LDS and CS-LDS is resolved by finding the best linear transformation that registers the two estimates.

(a) (b)
Figure 3: Accuracy of state sequence estimation from common measurements. Shown are aggregate results over Monte-Carlo runs for an LDS with and . For each Monte-Carlo run, the system matrices and the state sequence were generated randomly. (a) Reconstruction SNR as a function of the number of common measurements per frame. Each curve is for a different level of measurement noise as measured using input SNR. For low noise levels, we obtain a good reconstruction SNR ( 20 dB) even at ; this hints at very high compression ratios. (b) Reconstruction SNR of the Hankel matrix for the scenario with missing common measurements. We can estimate the Hankel matrix very accurately even at missing measurements. This suggests immense flexibility in the implementation of the CS-LDS system.

6.1 State sequence estimation

We first provide empirical verification of the results derived in Sections 4.1 and 4.2. It is worth noting that, in the absence of noise, Theorem 4.4 suggests exact recovery of the state sequence. In practice, it is important to check the robustness of the estimate to measurement noise. Figure 3(a) analyzes the performance of the state space estimation for different numbers of common measurements and different SNRs of the measurement noise. We define the input SNR in dB in terms of the standard deviation of the measurement noise. Here, we consider the scenario of a small number of common measurements per frame; the underlying state space dimension is fixed, over a fixed number of frames. As expected, at low noise levels, the reconstruction SNR is very high even for small numbers of common measurements. In addition, the accuracy remains acceptable even with fewer than one common measurement per frame, especially at low noise levels.

Next, we validate the implications of Section 4.3, where we discuss the scenario of missing common measurements, by simulating various proportions of missing common measurements. Figure 3(b) shows the reconstruction SNR of the Hankel matrix in (5) for varying amounts of missing measurements. We recover the Hankel matrix by solving (11) using CVX [21]. Figure 3(b) demonstrates a very high reconstruction SNR even at a very high rate of missing measurements. As mentioned earlier, not having to sense common measurements at all frames is very useful, since we can stagger our acquisition of common and innovation measurements. In theory, this enables a measurement strategy in which we need to sense only one measurement per frame of the video, without having to group consecutive measurements of the SPC. Hence, we can aim to reconstruct videos at the sampling rate of the SPC. To the best of our knowledge, this is the first video CS acquisition design capable of doing this.
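Outside of a convex-programming toolbox such as CVX, the nuclear-norm recovery of a low-rank matrix with missing entries can be approximated by singular value thresholding (SVT). The sketch below is a generic stand-in for the program in (11), with illustrative parameter choices, not the settings used in the paper.

```python
import numpy as np

def svt_complete(M_obs, mask, tau, step, n_iter=500):
    """Singular value thresholding (SVT) sketch for low-rank matrix
    completion: a stand-in for the nuclear-norm program used to recover
    the Hankel matrix when some common measurements are missing.
    M_obs holds observed entries (zeros elsewhere); mask is 1 where
    an entry was observed."""
    Y = np.zeros_like(M_obs)
    X = np.zeros_like(M_obs)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values
        Y += step * mask * (M_obs - X)                  # ascend on observed residual
    return X
```

The singular-value shrinkage is the proximal operator of the nuclear norm, so the iteration drives the observed-entry residual to zero while keeping the rank of the iterate low.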

Figure 4: Reconstruction of a fire texture of length frames and resolution of pixels. (a-d) A sampling of frames of the (a) ground truth video, (b) oracle LDS reconstruction, (c) CS-LDS reconstruction, and (d) naive frame-to-frame CS reconstruction. The CS-LDS reconstruction closely resembles the oracle LDS result. For the CS-LDS results, compressive measurements were obtained at and measurements per frame, thereby giving a compression ratio of . Reconstruction was performed with and . (e) Ground truth observation matrix. (f) CS-LDS estimate of the observation matrix. In (e) and (f), each column of the observation matrix is visualized as an image. Both the frames of the videos and the observation matrices are shown in false color for better contrast.

6.2 Dynamic Textures

Our test dataset comprises videos from the DynTex dataset [29]. We used the mean+LDS model from Section 5.4 for all the video CS experiments, with the 2D DCT as the sparsifying basis for the columns of C and 2D wavelets as the sparsifying basis for the mean. We used the model-based CoSaMP solver in Algorithm 1 for these results, since it provides explicit control over the sparsity of the mean and of the columns of C. We used (14) as a guide to select these values.

Figure 4 shows the video reconstruction of a dynamic texture from the DynTex dataset [29]. The reconstruction results are at a compression ratio at which frame-to-frame CS recovery is completely infeasible. However, the dynamic component of the scene is relatively small, which allows us to recover the video from relatively few measurements. The reconstruction SNRs of the recovered videos shown are as follows: oracle LDS = dB, frame-to-frame CS = dB, and CS-LDS = dB.

Figure 5: Reconstruction of a video comprising 6 blinking LED lights. We used , , and , chosen based on the overall compression ratio . Each row shows a sampling of frames of the video reconstructed at a different compression ratio. Inset in each row are the resolution of the video used, as well as the compression at sensing and the reconstruction SNR. While performance degrades with increasing compression, it also gains significantly for higher-dimensional data; the reconstruction at pixels preserves finer details.

Figure 5 shows the reconstruction of a video, of 6 blinking LED lights, from the DynTex dataset. We show reconstruction results at different compression ratios as well as different image resolutions. It is noteworthy that, even at a compression, the reconstruction at a resolution of pixels preserves fine details.

Figure 6: Resilience of the CS-LDS framework to measurement noise. (a) Performance plot charting the reconstruction SNR as a function of the compression ratio. Each curve is for a different level of measurement noise, as measured using the input SNR. Reconstruction SNRs were computed using 32 Monte-Carlo simulations. The black dotted line shows the reconstruction SNR for an oracle LDS. (b-d) Snapshots of video frames at various operating points. The dynamic texture of Fig. 4 was used for this result.

Performance with measurement noise:

We validate the performance of our recovery algorithm under various amounts of measurement noise. Note that the columns of C associated with larger singular values are inherently better conditioned to deal with this measurement error; the columns corresponding to the smaller singular values are invariably estimated with higher error. Figure 6 shows the performance of the recovery algorithm for various levels of measurement noise. The effect of the measurement noise on the reconstructions is perceived only at low input SNRs. In part, this robustness to measurement noise is due to the LDS model mismatch dominating the reconstruction error at high input SNRs. As the input SNR drops significantly below the model mismatch term, predictably, the noise starts influencing the reconstructions more. This provides a certain amount of flexibility in the design of potential CS-LDS cameras.

Computation time and spatial resolution:

Figure 7 shows the recovery algorithm applied to a video of length frames at different spatial resolutions. Shown in Figure 7 are the time taken for each recovery, which scales gracefully with increasing spatial resolution, and the reconstruction SNR, which approaches the performance of an oracle LDS. The improvement in reconstruction comes from the increase in the number of compressive measurements at high resolutions, since the compression ratio is held fixed. However, this comes at the cost of requiring a faster compressive camera, since a larger number of measurements must be acquired in the same time.

Figure 7: Reconstruction of a video at different spatial resolutions. (a) Two frames from the ground truth video of frames. (b) Reconstructed frame using an oracle with . (c-f) CS-LDS reconstructions for varying spatial resolution, at a compression of and with . Shown are reconstruction SNR as well as recovery times for each reconstruction. Note that as the spatial resolution increases, the reconstruction performance increases and approaches the performance of oracle LDS. However, for the same compression, recovering at a higher resolution also requires a compressive camera capable of sampling faster.
Figure 8: A gallery of reconstruction results using the CS-LDS framework. Each sub-figure (a-i) shows reconstruction results for a different video. The three rows of each sub-figure correspond to, from top to bottom, the ground truth video and CS-LDS reconstructions at compression ratios of and . Each column is a frame of the video and its reconstruction. Also noted with each reconstruction are the value of and the reconstruction SNR for that result. All videos are from the DynTex dataset [29], downsampled to a spatial resolution of pixels. The “code” in quotes refers to the name of the sequence in the database. For all videos, . Results are best viewed under the “zoom” tool. The interested reader is directed to the project webpage [1] and the supplemental material for videos of these results.

Gallery of results:

Finally, in Figure 8, we demonstrate the performance of the CS-LDS methodology for sensing and reconstructing a wide range of videos. The reader is directed to the supplemental material as well as the project webpage [1] for animated videos of these results.

6.3 Application in activity analysis

As mentioned in Section 2.3, LDSs are often used in classification problems, especially in the context of scene/activity analysis. A key experiment in this context is to check if the CS-LDS framework recovers videos that are sufficiently informative for such applications. To this end, we experiment with two different activity analysis datasets: the UCSD Traffic Dataset [10] and the UMD Human Activity Dataset [42].

Activity recognition methodology

In both the scenarios considered here (single-human activity and traffic), we model the observed video using the linear dynamical model framework. For recognition, we use the Procrustes distance [11] between the column spaces of the observability matrices, in conjunction with a nearest-neighbor classifier. Given the observability matrix defined in (8), let the corresponding orthonormal matrix span its column space. Given two LDSs, the squared Procrustes distance between them is computed from these orthonormal bases. We use this distance function in a nearest-neighbor classifier in both activity classification experiments.
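As a concrete sketch, one standard variant of the squared Procrustes distance between the column spaces of two observability matrices, together with the nearest-neighbor rule, can be written as follows; the specific variant and all names here are illustrative assumptions, not necessarily the exact expression used in the paper.

```python
import numpy as np

def procrustes_dist_sq(O1, O2):
    """Squared Procrustes distance between the column spaces of two
    observability matrices (one standard variant). The inputs need not
    be orthonormal; we orthonormalize with a thin QR first."""
    Q1, _ = np.linalg.qr(O1)
    Q2, _ = np.linalg.qr(O2)
    s = np.linalg.svd(Q1.T @ Q2, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)           # cosines of the principal angles
    return 2.0 * (len(s) - s.sum())     # equals  min_R ||Q1 - Q2 @ R||_F^2

def nearest_neighbor_label(query_O, gallery):
    """1-NN classification over (observability matrix, label) pairs."""
    dists = [procrustes_dist_sq(query_O, O) for O, _ in gallery]
    return gallery[int(np.argmin(dists))][1]
```

Identical subspaces give a distance of zero and orthogonal subspaces give the maximum value, so the 1-NN rule simply assigns the label of the closest model in the gallery.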

The UCSD Traffic Dataset

[10] consists of videos capturing traffic of three types: light, moderate, and heavy. Each video is of length frames at a resolution of pixels. Figure 9 shows the reconstruction results on a traffic sequence from the dataset. We perform a classification experiment of the videos into these three categories. There are four different train-test scenarios provided with the dataset. For comparison, we also perform the same experiments by fitting the LDS model on the original frames (oracle LDS). We perform classification at two different values of the state space dimension and at a fixed compression ratio. Table 1 shows the classification results. We also show comparative results obtained using a probabilistic kernel on dynamic texture models [10] in conjunction with SVMs in the last two rows of the table. Results for the individual experiments were not reported in [10]; only an aggregate number was reported, which is shown in the table. It can be seen that, even without sophisticated non-linear classifiers, we are able to obtain comparable performance using a simple nearest-neighbor classifier on the dynamic texture model parameters. This shows that the obtained parameters possess discriminatory properties and can be used in conjunction with other sophisticated classifiers that build on dynamic texture models, as in [10].

The UMD Human Activity Dataset

[42] consists of videos, each of length frames, depicting different activities: pickup object, jog, push, squat, wave, kick, bend, throw, turn around, and talk on cellphone. Each activity was repeated times, giving a total number of sequences in the dataset. As with the traffic experiment, we use an LDS model on the image intensity values without any feature extraction. Images were cropped to contain the human and resized. The state space dimension was fixed, and the compression was varied. We performed a leave-one-execution-out test. The results are summarized in Table 2. As can be seen, the CS-LDS framework obtained a classification performance that is comparable to the oracle LDS. For this dataset, both oracle LDS and CS-LDS obtained a perfect classification score up to a certain compression ratio. Further, as shown in Table 2, we obtain comparable performance to a far more sophisticated method employing advanced shape-based features for activity recognition [44]. This suggests that the CS-LDS framework should be extremely useful in a wide range of applications beyond just video recovery, and can provide a basis to acquire more sophisticated features for tackling challenging activity recognition problems.

Figure 9: Reconstructions of a traffic scene of pixels at a compression ratio , with and . (a, c) A sampling of the frames of the ground truth and reconstructed video. (b, d) The first ten columns of the ground truth observation matrix and the estimated matrix; both are shown in false color for improved contrast. The quality of the reconstruction and LDS parameters is sufficient for capturing the flow of traffic, as seen in the classification results in Table 1.
Expt 1 Expt 2 Expt 3 Expt 4 Average
(d = 10)
Oracle LDS 85.71 85.93 87.5 92.06 87.8%
CS-LDS 84.12 87.5 89.06 85.71 86.59%
(d = 5)
Oracle LDS 77.77 82.81 92.18 80.95 83.42%
CS-LDS 85.71 73.43 78.1 76.1 78.34%
State KL-SVM (d = 10)[10] n.a. n.a. n.a. n.a. 93%
State KL-SVM (d = 5)[10] n.a. n.a. n.a. n.a. 87%
Table 1: Classification results (in %) on the UCSD Traffic Dataset
Activity Shape dynamics [44]
Pickup Object 100 100 100 100
Jog 100 100 90 100
Push 100 90 50 100
Squat 90 100 100 100
Wave 100 100 60 100
Kick 100 90 80 100
Bend 100 100 100 100
Throw 100 100 90 100
Turn Around 100 100 100 100
Talk on Cellphone 100 20 10 100
Average 94% 90% 78% 100%
Table 2: Classification results (in %) on the UMD Human Activity Database

7 Discussion

In this paper, we have proposed a framework for the compressive acquisition of dynamic scenes modeled as LDSs. In particular, this paper emphasizes the power of predictive/generative video models. In this regard, we have shown that a strong model for the scene dynamics enables stable video reconstructions at very low measurement rates. Notably, it enables the estimation of the state sequence associated with a video even with a fractional number of common measurements per video frame. The use of CS-LDS for dynamic scene modeling and classification also highlights the purposive nature of the framework.

Implementation issues:

The results provided in the paper are mainly based on simulations. While a full-fledged implementation on hardware is beyond the scope of this paper, we discuss some of the key issues and challenges in obtaining such results. Focusing on the single pixel camera (SPC) as our imaging architecture, the achievable compression and resolution are limited by the amount of motion in the scene and the sampling rate of the camera. We discuss the roles these two parameters play in practice.

The amount of motion determines an inherent notion of frame rate for the video; note that real-life scenes have no intrinsic notion of a “frame rate”. If the scene changes negligibly over a time duration , then (for the largest such duration) its reciprocal becomes a good measure of the frame rate of a scene. For example, static scenes do not change over an infinite time duration and, hence, can be sensed at an arbitrarily low frame rate. Given that we seek to sense this scene at a spatial resolution of pixels, a Nyquist camera would need to operate at measurements per second.

Suppose that this scene, over a duration of seconds, can be well approximated by a -dimensional LDS; then the total number of free variables to estimate is approximately for the state sequence and for the observation matrix. An SPC operating at samples per second obtains a total of compressive measurements. If CS-LDS were employed at a compression ratio of , then

The key dependencies here are on how , and change as a function of . In particular, even if and increased as , then would need to scale linearly in to maintain the same compression level.

Connection to affine-rank minimization:

The pioneering work of Fazel [18] in developing convex optimization techniques for system identification problems has interesting parallels to the ideas proposed in this paper. One of the key ideas espoused in [18] is that, when the video sequence is an LDS, the block Hankel matrix is low rank. When we have linear measurements of the video frames, we can solve an affine-rank problem to recover the video. However, such methods optimize over the Hankel matrix directly and lead to computationally infeasible designs even for videos of very small dimensions. In contrast, CS-LDS has been shown to be fast and computationally feasible for very large videos involving millions of variables. The key is our two-step solution, which isolates the unknowns into two manageable sets and solves for each separately.


Universality:

An attractive property of random matrix-based CS measurement is the universality of the measurement process. Universality implies that the sensing process is independent of the subsequent reconstruction algorithm. This makes the sensing design “future-proof”: if we devise a more sophisticated and powerful recovery algorithm in the future, we do not need to redesign the camera or the sensing framework. The CS-LDS framework violates this property. The two-step measurement process of Section 3, which is key to breaking the bilinearity introduced by the LDS prior, implies that the CS-LDS design is not universal. An intriguing direction for future research is the design of a universal CS-LDS measurement process.

Online tracking:

We have made the assumption of a static observation matrix C. However, as the length of the video increases, the assumption of a static observation matrix can be satisfied only by increasing the state space dimension. An alternative approach is to allow for a time-varying observation matrix and track it from the compressive measurements. This would give us the benefit of a low state space dimension while remaining accurate when we sense for long durations.

Figure 10: Performance of MPEG-4 video compression and CS-LDS on the “candle” video (see Figure 7). Shown are (a) the reconstruction SNR at various compression ratios, and (b, c, top) a few reconstructed frames and (b, c, bottom) the error in reconstruction magnified 10×. It is worth noting that the MPEG-4 algorithm has complete access to the ground truth video, while CS-LDS works purely with undersampled linear measurements of the video. Nonetheless, even at the same reconstruction SNR, the quality of the MPEG-4 recovery is significantly better. This can be attributed to its non-linear and adapted coding, which seeks to mitigate errors that are perceptually dominant.

Beyond LDS:

Figure 10 captures the relative performance of the MPEG-4 compression algorithm and CS-LDS on a video. MPEG-4 has access to the ground truth video and, as a consequence, achieves significantly better compression for the same recovery performance (see Figure 10(a)). Further, it is worth noting that the non-linear encoding in MPEG-4 produces errors that are largely imperceptible and, hence, even at the same level of reconstruction error, produces videos of higher visual quality (see Figure 10(b,c)). This points to the inherent drawbacks of a linear encoder. While the CS-LDS framework makes a compelling case study of LDSs for video CS, its applicability to arbitrary videos is limited. In particular, it does not extend to simple non-stationary scenes such as people walking or panning cameras (see the result associated with Figure 8(h)). This motivates the search for models more general than the LDS. In this regard, a promising line of future research is to leverage models from the video compression literature for CS recovery.


ACS and RGB were partially supported by the grants NSF CCF-0431150, CCF-0728867, CCF-0926127, CCF-1117939, ARO MURI W911NF-09-1-0383, W911NF-07-1-0185, DARPA N66001-11-1-4090, N66001-11-C-4092, N66001-08-1-2065, ONR N00014-12-1-0124 and AFOSR FA9550-09-1-0432.

RC was partially supported by the Office of Naval Research under the Grant N00014-12-1-0124.


  • [1] CS-LDS Project webpage. URL =
  • [2] M. Ayazoglu, B. Li, C. Dicle, M. Sznaier, and O. I. Camps, Dynamic subspace-based coordinated multicamera tracking, in IEEE Intl. Conf. Comp. Vision, 2011.
  • [3] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, Model-based compressive sensing, IEEE Trans. Inf. Theory, 56 (2010), pp. 1982–2001.
  • [4] R. G. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constr. Approx., 28 (2008), pp. 253–263.
  • [5] R. W. Brockett, Finite Dimensional Linear Systems, Wiley, 1970.
  • [6] E. J. Candès and B. Recht, Exact matrix completion via convex optimization, Found. Comp. Math., 9 (2009), pp. 717–772.
  • [7] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inf. Theory, 52 (2006), pp. 489–509.
  • [8] E. J. Candès and T. Tao, The power of convex relaxation: Near-optimal matrix completion, IEEE Trans. Inf. Theory, 56 (2010), pp. 2053–2080.
  • [9] V. Cevher, A. C. Sankaranarayanan, M. F. Duarte, D. Reddy, R. G. Baraniuk, and R. Chellappa, Compressive sensing for background subtraction, in Euro. Conf. Comp. Vision, Oct. 2008.
  • [10] A. B. Chan and N. Vasconcelos, Probabilistic kernels for the classification of auto-regressive visual processes, in IEEE Conf. Comp. Vision and Pattern Recog, June 2005.
  • [11] Y. Chikuse, Statistics on special manifolds, Springer Verlag, 2003.
  • [12] R. Coifman, F. Geshwind, and Y. Meyer, Noiselets, Appl. Comp. Harm. Anal., 10 (2001), pp. 27–44.
  • [13] T. Ding, M. Sznaier, and O. I. Camps, A rank minimization approach to video inpainting, in IEEE Intl. Conf. Comp. Vision, 2007.
  • [14] D. L. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, 52 (2006), pp. 1289–1306.
  • [15] G. Doretto, A. Chiuso, Y. N. Wu, and S. Soatto, Dynamic textures, Intl. J. Comp. Vision, 51 (2003), pp. 91–109.
  • [16] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, Single-pixel imaging via compressive sampling, IEEE Signal Process. Mag., 25 (2008), pp. 83–91.
  • [17] M. F. Duarte, M. B. Wakin, D. Baron, S. Sarvotham, and R. G. Baraniuk, Measurement bounds for sparse signal ensembles via graphical models, IEEE Trans. Inf. Theory, 59 (2013), pp. 4280–4289.
  • [18] M. Fazel, Matrix rank minimization with applications, PhD thesis, Stanford University, 2002.
  • [19] M. Fazel, H. Hindi, and S. P. Boyd, A rank minimization heuristic with application to minimum order system approximation, in IEEE Amer. Control Conf., June 2001.
  • [20] ——, Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices, in IEEE Amer. Control Conf., June 2003.
  • [21] M. Grant and S. Boyd, CVX: Matlab software for disciplined convex programming, version 1.21, available at http://cvxr.com/cvx, 2011.
  • [22] J. Haupt and R. Nowak, Signal reconstruction from noisy random projections, IEEE Trans. Inf. Theory, 52 (2006), pp. 4036–4048.
  • [23] Y. Hitomi, J. Gu, M. Gupta, T. Mitsunaga, and S. K. Nayar, Video from a single coded exposure photograph using a learned over-complete dictionary, in IEEE Intl. Conf. Comp. Vision, Nov. 2011.
  • [24] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T. W. Lee, and T. J. Sejnowski, Dictionary learning algorithms for sparse representation, Neural Comp., 15 (2003), pp. 349–396.
  • [25] S. K. Nayar, V. Branzoi, and T. E. Boult, Programmable imaging: Towards a flexible camera, Intl. J. Comp. Vision, 70 (2006), pp. 7–22.
  • [26] D. Needell and J. A. Tropp, Cosamp: Iterative signal recovery from incomplete and inaccurate samples, Appl. Comp. Harm. Anal., 26 (2009), pp. 301–321.
  • [27] J. Y. Park and M. B. Wakin, A multiscale framework for compressive sensing of video, in Pict. Coding Symp., May 2009.
  • [28] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition, in Asilomar Conf. Signals Sys. Comp., Nov. 1993.
  • [29] R. Péteri, S. Fazekas, and M.J. Huiskes, DynTex: A comprehensive database of dynamic textures, Pattern Recog. Letters, 31 (2010), pp. 1627–1632.
  • [30] B. Recht, M. Fazel, and P. A. Parrilo, Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization, arXiv:0706.4138, (2007).
  • [31] D. Reddy, A. Veeraraghavan, and R. Chellappa, P2C2: Programmable pixel compressive camera for high speed imaging, in IEEE Conf. Comp. Vision and Pattern Recog, June 2011.
  • [32] Y. Saad, Krylov subspace methods for solving large unsymmetric linear systems, Math. Comput., 37 (1981), pp. 105–126.
  • [33] P. Saisan, G. Doretto, Y. Wu, and S. Soatto, Dynamic texture recognition, in IEEE Conf. Comp. Vision and Pattern Recog, Dec. 2001.
  • [34] A. C. Sankaranarayanan, P. Turaga, R. Baraniuk, and R. Chellappa, Compressive acquisition of dynamic scenes, in Euro. Conf. Comp. Vision, Sep. 2010.
  • [35] S. Soatto, G. Doretto, and Y. N. Wu, Dynamic textures, in IEEE Intl. Conf. Comp. Vision, July 2001.
  • [36] M. Sznaier, Compressive information extraction: A dynamical systems approach, in System Identification, vol. 16, 2012, pp. 1559–1568.
  • [37] P. Turaga, A. Veeraraghavan, and R. Chellappa, Unsupervised view and rate invariant clustering of video sequences, Comp. Vision and Image Understd., 113 (2009), pp. 353–371.
  • [38] E. van den Berg and M. P. Friedlander, Probing the pareto frontier for basis pursuit solutions, SIAM J. Scientific Comp., 31 (2008), pp. 890–912.
  • [39] P. Van Overschee and B. De Moor, N4SID: Subspace algorithms for the identification of combined deterministic-stochastic systems, Automatica, 30 (1994), pp. 75–93.
  • [40] N. Vaswani, Kalman filtered compressed sensing, in IEEE Conf. Image Process., Oct. 2008.
  • [41] N. Vaswani and W. Lu, Modified-CS: Modifying compressive sensing for problems with partially known support, in Intl. Symp. Inf. Theory, June 2009.
  • [42] A. Veeraraghavan, R. Chellappa, and A. K. Roy-Chowdhury, The function space of an activity, in IEEE Conf. Comp. Vision and Pattern Recog, June 2006.
  • [43] A. Veeraraghavan, D. Reddy, and R. Raskar, Coded strobing photography: Compressive sensing of high speed periodic events, IEEE Trans. Pattern Anal. Mach. Intell., 33 (2011), pp. 671–686.
  • [44] A. Veeraraghavan, A. K. Roy-Chowdhury, and R. Chellappa, Matching shape sequences in video with applications in human movement analysis, IEEE Trans. Pattern Anal. Mach. Intell., 27 (2005), pp. 1896–1909.
  • [45] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, and R. G. Baraniuk, Compressive imaging for video representation and coding, in Pict. Coding Symp., Apr. 2006.
  • [46] M. B. Wakin, B. M. Sanandaji, and T. L. Vincent, On the observability of linear systems from random, compressive measurements, in IEEE Conf. on Decision and Control, Dec. 2010.