Compressed sensing (CS) aims at achieving accurate signal reconstruction while sampling signals at a low sampling rate, typically far smaller than that of Nyquist/Shannon. Let be a signal of interest, a non-adaptive sensing matrix (encoder), consisting of measurements, a measured signal, and a decoder used to reconstruct from . CS develops encoder-decoder pairs such that a small reconstruction error can be achieved.
Reconstructing from is an ill-posed problem whose solution requires some prior information on the signal. Instead of the frequency band-limit signal model assumed in classic Shannon sampling theory, conventional CS adopts a sparse signal model, i.e., there exists a dictionary, typically an orthogonal basis , a linear combination of whose columns generates an accurate approximation of the signal, , the coefficients , , having their amplitude decay fast after being sorted. For signals following the sparse model, it has been shown that using some random sensing matrices such as Gaussian and Bernoulli matrices with measurements, and an minimization or a greedy matching pursuit decoder promoting sparsity, with high probability CS leads to accurate signal reconstruction: The obtained approximation error is tightly upper bounded by a constant times the best -term approximation error, the minimum error that one may achieve by keeping the largest coefficients in [candes2006robust, candes2005decoding, cohen2009compressed, donoho2006compressed]. Redundant and signal adaptive dictionaries that further improve the CS performance with respect to orthogonal bases have been investigated [candes2010compressed, duarte2009learning, peyre2010best]. In addition to sparse models, manifold models have been considered for CS as well [baraniuk2009random, chen2010compressive].
The present paper introduces a novel framework of CS, namely statistical compressed sensing (SCS). As opposed to conventional CS that deals with one signal at a time, SCS aims at efficiently sampling a collection of signals and having accurate reconstruction on average. Instead of restricting to sparse models, SCS works with general Bayesian models. Assuming that the signals
follow a distribution with probability density function (pdf), SCS designs encoder-decoder pairs so that the average error
where is a norm, is small. As an important example, SCS with Gaussian models is here shown to have improved performance (bounds) relative to conventional CS, the signal reconstruction calculated with an optimal decoder implemented via a fast linear filtering. Moreover, for Gaussian mixture models (GMMs) that better describe most real signals, SCS with a piecewise linear decoder is investigated.
The motivation of SCS with Gaussian models is twofold. First, controlling the average error over a collection of signals is useful in signal acquisition, not only because one is often interested in acquiring a collection of signals in real applications, but also because more effective processing of an individual signal, an image or a sound for example, is usually achieved by dividing the signal in (often overlapping) local subparts, patches (see Figure 10) or short-time windows for instance, so a signal can be regarded as a collection of subpart signals [aharon2006k, buades2006review, yu2008audio, yu2010PLE]. In addition, Gaussian mixture models (GMMs), which model signals or subpart signals with a collection of Gaussians, assuming each signal drawn from one of them, have been shown effective in describing real signals, leading to state-of-the-art results in image inverse problems [yu2010PLE] and missing data estimation [leger2010Matrix].
SCS based on a single Gaussian model is first developed in Section II. Following a similar mathematical approach as the one adopted in conventional CS performance analysis [cohen2009compressed], it is shown that with the same random matrices as in conventional CS, but with a considerably reduced number of measurements, and with the optimal decoder implemented via linear filtering, significantly faster than the decoders applied in conventional CS, the average error of Gaussian SCS is tightly upper bounded by a constant times the best -term approximation error with overwhelming probability, the failure probability being orders of magnitude smaller than that of conventional CS. Moreover, stronger yet simpler results further show that for any sensing matrix, the average error of Gaussian SCS is upper bounded by a constant times the best -term approximation with probability one, and the bound constant can be efficiently calculated.
Section III extends SCS to GMMs. A piecewise linear GMM-based SCS decoder, which essentially consists of estimating the signal using each Gaussian model included in the GMM and then selecting the best model, is introduced. The accuracy of the model selection, at the heart of the scheme, is analyzed in detail in terms of the properties of the Gaussian distributions and the number of sensing measurements. These results are then important in the general area of model selection from compressed measurements.
Following [yu2010PLE], Section IV presents an maximum a posteriori expectation-maximization (MAP-EM) algorithm that iteratively estimates the Gaussian models and decodes the signals. GMM-based SCS calculated with the MAP-EM algorithm is applied in real image sensing, leading to improved results with respect to conventional CS, at a considerably lower computational cost.
Ii Performance Bounds for a Single Gaussian Model
This section analyzes the performance bounds of SCS based on a single Gaussian model. Perfect reconstruction of degenerated Gaussian signals is briefly discussed. After reviewing basic properties of linear approximation for Gaussian signals, the rest of the section shows that for Gaussian signals with fast eigenvalue decay, the average error of SCS usingmeasurements and decoded by a linear estimator is tightly upper bounded by that of best -term approximation.
Signals are assumed to follow a Gaussian distribution
in this section. Principal Component Analysis (PCA) calculates a basis changeof the data , with the orthonormal PCA basis that diagonalizes the data covariance matrix
where is a diagonal matrix whose diagonal elements are the sorted eigenvalues, and
the PCA coefficient vector[mallat2008wts]. In this section, for most of the time we will assume without loss of generality that by looking in the PCA domain. For Gaussian and Bernoulli matrices that are known to be universal, analyzing CS in canonical basis or PCA basis is equivalent [baraniuk2008simple].
Ii-a Degenerated Gaussians
Conventional CS is able to perfectly reconstruct -sparse signals, i.e., with at most non-zero entries (typically ), with measurements [cohen2009compressed]. Degenerated Gaussian distributions , where with at most non-zero eigenvalues, give the counterpart of -sparsity for the Gaussian signal models considered in this paper. Such signals belong to a linear subspace . The next lemma gives a condition for perfect reconstruction of signals in .
If is any matrix and is a positive integer, then there is a decoder such that , for all , if and only if .
Suppose there is a decoder such that , for all . Let . We can write where both . Since , . Plugging and into the decoder , we obtain and then .
Suppose . If with , then , so . is thus a one-to-one map. Therefore there must exist a decoder such that . ∎
It is possible to construct matrices of size with which satisfies the requirement of the Lemma. A trivial example is , where
is the identity matrix of sizeand
is a zero matrix of size. Comparing with conventional compressed sensing that requires measurements for exact reconstruction of -sparse signals, with only measurements signals in can be exactly reconstructed. Indeed, in contrast to the -sparse signals where the positions of the non-zero entries are unknown (-sparse signals reside in multiple -dimensional subspaces), with the degenerated Gaussian model , the position of the non-zero coefficients are known a priori to be the first ones. measurements thus suffice for perfect reconstruction.
In the following, we will concentrate on the more general case of non-degenerated Gaussian signals (i.e., Gaussians with full-rank covariance matrices ) with fast eigenvalue decay, in analogy to compressible signals for conventional CS. As mentioned before and will be further experimented in this paper, such simple models not only lead to improved theoretical bounds, but also provide state-of-the-art image reconstruction results.
Ii-B Optimal Decoder
To simplify the notation, we assume without loss of generality that the Gaussian has zero mean , as one can always center the signal with respect to the mean.
It is well-known that the optimal decoders for Gaussian signals are calculated with linear filtering:
[kay1998fundamentals] Let be a random vector with prior pdf , and , , be a sensing matrix. From the measured signal , the optimal decoder that minimizes the mean square error (MSE) as well as the mean absolute error (MAE) where , is obtained with a linear MAP estimator,
and the resulting error is Gaussian with mean zero and with covariance matrix whose trace yields the MSE of SCS.
In contrast to conventional CS, for which the minimization or greedy matching pursuit decoders, calculated with iterative procedures, have been shown optimal under certain conditions on and the signal sparsity [candes2006robust, donoho2006compressed], Gaussian SCS enjoys the advantage of having an optimal decoder (2) calculated fast via a closed-form linear filtering for any .
Applying an independent random matrix realization to sense each signal has been considered in [cohen2009compressed]. In real applications, these random sensing matrices need not to be stored, since the decoder can regenerate them itself given the random seed.
Following a PCA basis change (1), it is equivalent to consider signals and , where is a diagonal matrix whose diagonal elements are the sorted eigenvalues. Theorem 1 and Corollary 1 clearly hold for , with (2), (3), and (4) rewritten as
Note that PCA bases, as sparsifying dictionaries, have been applied to do conventional CS based on sparse models [masiero2010data], which is fundamentally different than the Gaussian models and SCS here studied.
Ii-C Linear vs Nonlinear Approximation
Before proceeding with the analysis of the SCS performance, let us make some comments on the relationship between linear and non-linear approximations for Gaussian signals. In particular, the following is observed via Monte Carlo simulations:
For Gaussian signals , where whose eigenvalues decay fast, the best -term linear approximation
and the nonlinear approximation
where is a thresholding operator that keeps the coefficients of largest amplitude and setting others to zero, lead to comparable approximation errors
Monte Carlo simulations are performed to test this. Assuming a power decay of the eigenvalues [mallat2008wts],
where is the decay parameter, with , Figure 1 plots the MSEs
normalized by the ideal signal energy , of best -term linear and nonlinear approximations as a function of , with typical (for image patches of size for example) values and ( and ). Both MSEs decrease as increases, i.e., as the eigenvalues decay faster. With typical values (similar to the eigenvalue decay calculated with typical image patches) and or 16, both approximations are accurate and generate small and comparable MSEs, their difference being about of the signal energy and ratio about 2.
Following this, the error of Gaussian SCS will be compared with that of best -term linear approximation, which is comparable to that of best -term nonlinear approximation. For simplicity, the best -term linear approximation errors will be denoted as
Note that .
Ii-D Performance of Gaussian SCS – A Numerical Analysis At First
This section numerically evaluates the MSE of Gaussian SCS, and compares it with the minimal MSE generated by best -term linear approximation, proceeding the theoretical bounds later developed.
As before, a power decay of the eigenvalues (11), with , is assumed in the Monte Carlo simulations. An independent random Gaussian matrix realization is applied to sense each signal [cohen2009compressed].
Figures 2 (a) and (c)-top plot the MSE (normalized by the ideal signal energy) of SCS and that of the best -term linear approximation, as well as their ratio as a function of , with fixed at typical values and ( and ). As increases, i.e., as the eigenvalues decay faster, the MSEs for both methods decrease. Their ratio increases almost linearly with . The same is plotted in figures 2 (b) and (c)-bottom, with eigenvalue decay parameter fixed at a typical value , and with varying from to ( from to ). As increases, both MSEs decrease, their ratio being almost constant at about .
These results indicate a good performance of Gaussian SCS, its MSE is only a small number of times larger than that of the best -term linear approximation. 111Simulations using the same coefficient energy power decay model (11) show that the ratio between conventional CS based on sparse models, with measurements, and that of the best -term nonlinear approximation, varies as a function of the decay parameter and . For typical values , the ratio is typically an order of magnitude larger than that between the MSE of SCS and that of the best -term linear approximation. The next sections provide mathematical analysis of this performance.
Let us notice that while the best -term linear approximation decoding is feasible for signals following a single Gaussian distribution, it is impractical with GMMs (assuming multiple Gaussians and that each signal is generated from one of them with an unknown index), since the Gaussian index of the signal is unknown. SCS with GMMs, which describe real data considerably better than a single Gaussian model [yu2010PLE], will be described in sections III and IV.
Ii-E Performance Bounds
Following the analysis techniques in [cohen2009compressed], this section shows that with Gaussian and Bernoulli random matrices of measurements, considerably smaller than the required by conventional CS, the average error of Gaussian SCS is tightly upper bounded by a constant times the best -term linear approximation error with overwhelming probability, the failure probability being orders of magnitude smaller than that of conventional CS.
We consider only the encoder-decoder pairs that preserve , i.e., , satisfied by the optimal in (5) for Gaussian signals , .
Ii-E1 From Null Space Property to Instance Optimality
The instance optimality in expectation bounds the average error of SCS with a constant times that of the best -term linear approximation (13), defining the desired SCS performance:
Let be a random vector that follows a certain distribution. Let be any subset of indices. We say that is instance optimal in expectation in in , with a constant , if
where is the signal restricted to (, and otherwise), the expectation on the left side considered with respect to , and to if one random is drawn independently for each . Similarly, the MSE instance optimality in is defined as
In particular, if , then we say that is instance optimal in expectation of order in , with a constant , if
and is instance optimal of order in MSE, with a constant , if
The null space property in expectation defined next will be shown equivalent to the instance optimality in expectation.
Let be a random vector that follows a certain distribution. Let be any subset of indices. We say that in has the null space property in expectation in in , with constant , if
where is the signal restricted to (, and otherwise), the expectation considered on the left side with respect to , and to if one random is drawn independently for each . Note that . Similarly, the MSE null space property in is defined as
In particular, if , with , then we say that in has the null space property in expectation of order in , with constant , if
and has the MSE null space property of order , if
Let be a random vector that follows a certain distribution. Given an matrix , a norm , and a subset of indices , a sufficient condition that there exists a decoder such that the instance optimality in expectation in in (14) holds with constant , is that the null space property in expectation (18) holds with for this :
A necessary condition is the null space property in expectation (18) with :
To prove the sufficiency of (22), we consider the decoder such that for all ,
By (22), we have
where the second inequality uses the triangle inequality, and the last inequality follows from the choice of the decoder (24).
To prove the necessity of (23), let be any decoder for which (14) holds. Let and let be the linear approximation of in (, and otherwise). Let be any splitting of into two vectors in the linear space . We can write
with . As the right side of (14) is equal to 0 for , we deduce . Since , we have , so that . We derive
A similar proof proceeds for MSE instance optimality and null space property. The second part of the theorem is a direct consequence of the first part. ∎
Comparing to conventional CS that requires the null space property to hold with the best -term nonlinear approximation error [cohen2009compressed], the requirement for Gaussian SCS is relaxed to , thanks to the linearity of the best -term linear approximation for Gaussian signals.
Theorem 2 proves the existence of the decoder for which the instance optimality in expectation holds for , given the null space property in expectation. However, it does not explain how such decoder is implemented. The following Corollary, a direct consequence of theorems 1 and 2, shows that for Gaussian signals the optimal decoder (5) leads to the instance optimality in expectation.
For Gaussian signals , if an sensing matrix satisfies the null space property in expectation (20) of order in , with constant , or the MSE null space property (21) of order with constant , then the optimal and linear decoder satisfies the instance optimality in expectation (16) in , or the MSE instance optimality (17).
Ii-E2 From RIP to Null Space Property
The Restricted Isometry Property (RIP) of a matrix measures its ability to preserve distances, and is related to the null space property in conventional CS [candes2006near, donoho2006compressed]. The new linear RIP of order restricts the requirement of conventional RIP of order to a union of -dimensional linear subspaces with consecutive supports:
Let be a positive integer. Let define a linear subspace of functions with support in the first indices in , a linear subspace of functions with support in the next indices, and so on. The functions in the last linear subspace defined this way may have support with less than indices. An matrix is said to have linear RIP of order with constant if
The linear RIP is a special case of the block RIP [eldar2009robust], with block sparsity one and blocks having consecutive support of the same size.
Let be a random vector that follows a certain distribution. Let be an matrix that satisfies the linear RIP of order with , and let be a decoder. Let . Assume further that decays in : , . Then satisfies the null property in expectation of order in (20), with constant . 222As in [cohen2009compressed], the result here is in the norm, while in the next section we will consider a natural extension of the RIP for SCS which can be studied in the norm, something possible for conventional CS only in a probabilistic setting, with one random sensing matrix independently drawn for each signal [cohen2009compressed].
Let denote the set of first indices of the entries in , the next indices, the next indices, etc. We have
where the second and last inequalities follow the linear RIP property of , the third inequality follows from the triangle equality, and the equality holds since . Hence we have
Ii-E3 From Random Matrices to Linear RIP
The next Theorem shows that Gaussian and Bernoulli matrices satisfy the conventional RIP for one subspace with overwhelming probability. The linear RIP will be addressed after it.
[achlioptas2003database, baraniuk2008simple] Let be a random matrix of size drawn according to any distribution that satisfies the concentration inequality
where , and is a constant depending only on . Then for any set with , we have the conventional RIP condition
where is the set of all vectors in that are zero outside of , with probability greater than or equal to Gaussian and Bernoulli matrices satisfy the concentration inequality (32).
Suppose that , and are given. Let be a random matrix of size drawn according to any distribution that satisfies the concentration inequality (32). Then there exist constants depending only on such that the linear RIP of order (27) holds with probability greater than or equal to for with the prescribed and .
Thus for a fixed , whenever , the exponent in the exponential on the right side of (34) is provided that . We can always choose small enough to ensure . This proves that with a probability , the matrix will satisfy the linear RIP (27). ∎
Comparing with conventional CS, where the null space property requires that the RIP (33) holds for subspaces [baraniuk2008simple, candes2006near, donoho2006compressed], the number of subspaces in the linear RIP (27) is sharply reduced to for Gaussian SCS, thanks to the coefficients pre-ordering and the linear estimation in consequence. Therefore with the same number of measurements , the probability that a Gaussian or Bernoulli matrix satisfies the linear RIP is substantially higher than that for the conventional RIP. Equivalently, given the same probability that satisfies the linear RIP or the conventional RIP of order , the required number of measurements for the linear RIP is , substantially smaller than the required for the conventional RIP. Similar improvements have been obtained with model-based CS that assumes structured sparsity on the signals [baraniuk2010model].
With the results above, we have shown that for Gaussian signals, with sensing matrices satisfying the linear RIP (27) of order , for example Gaussian or Bernoulli matrices with rows, with overwhelming probability, and with the optimal and linear decoder (5), SCS leads to the instance optimality in expectation of order in (16), with constant . is typically small by the definition of CS.
Ii-F Performance Bounds with RIP in Expectation
This section shows that with an RIP in expectation, a matrix isometry property more adapted to SCS, the Gaussian SCS MSE instance optimality (17) of order and constant , holds in the norm with probability one for any matrix. has a closed-form and can be easily computed numerically.
Let be a random vector that follows a certain distribution. Let be an sensing matrix and let be a decoder. Let . in is said to have RIP in expectation in with constant if
where , is the signal restricted to (, and otherwise), and the expectation is with respect to , and to if one random is drawn independently for each .
The conventional RIP is known to be satisfied only by some random matrices, Gaussian and Bernoulli matrices for example, with high probability. For a given matrix, checking the RIP property is however NP-hard [baraniuk2008simple]. By contrast, the constant of the RIP in expectation (35) can be measured for any matrix via a fast Monte Carlo simulation, the quick convergence guaranteed by the concentration of measure [talagrand1996new]. The next proposition, directly following from (6) and (7), further shows that for Gaussian signals, the RIP in expectation has its constant in a closed form.
Assume , is an sensing matrix and is the optimal and linear decoder (5). Then in satisfies the RIP in expectation in ,
where is an extraction matrix giving , i.e., , , all the other entries being zero. The expectation with respect to is calculated if one random is drawn independently for each .
The next Theorem shows that the RIP in expectation leads to the MSE null space property holding in equality.
Let be a random vector that follows a certain distribution, an sensing matrix, and a decoder. Assume and , for some . Assume that in has the RIP in expectation in with constant , and in with constant :
where , and is the signal restricted to (, and otherwise). Then satisfies
where . In particular, if , with , then satisfies the MSE null space property of order , which holds with equality,
We derive (38) by