Compressed sensing (CS) is an emerging signal processing methodology where signals are acquired in compressed form as undersampled linear measurements. The applications of CS are abundant, ranging from radar and error correction to many areas of image processing . The underlying assumption that makes such acquisition and reconstruction possible is that most natural signals are sparse or compressible. We say that a signal is -sparse when
Compressible signals are those which are well-approximated by sparse signals. In the CS framework, we acquire nonadaptive linear measurements of the form
where is an appropriate linear operator and
is vector modeling additive noise. The theory of CS ([22, 9, 21]) ensures that under suitable assumptions on , compressible signals can be approximately reconstructed by solving an minimization problem:
above, and denote the standard and Euclidean norms, bounds the noise level , and denotes the set of minimizers (we use the notation of equality rather than set notation here and use the convention that if the minimizer is not unique, we may choose a solution lexicographically; however, in this particular case under standard compressed sensing assumptions it is known that the solution is indeed unique ). The program may be cast as a second order cone program (SOCP) and can be solved efficiently using standard convex programming methods (see e.g. [3, 17]).
One property of the operator that guarantees sparse signal recovery via , as introduced by Candès and Tao in , is the restricted isometry property (RIP).
A linear operator is said to have the restricted isometry property (RIP) of order and level if
Many distributions of random matrices of dimension
, including randomly subsampled rows from the discrete Fourier transform or from a bounded orthonormal system more generally [47, 45, 43, 44, 5], and randomly-generated circulant matrices , are known to generate RIP matrices of order and level if the number of measurements satisfies . Note that here and throughout we have used the notation (analogously ) to indicate that there exists some absolute constant such that ( once [10, 36, 47, 1].
Candès, Romberg, and Tao  showed that when the measurement operator has the RIP of order and sufficiently small constant , the program recovers an estimation to that satisfies the error bound
where denotes the best -sparse approximation to the signal . Using properties about Gel’fand widths of the ball due to Kashin  and Garnaev–Gluskin , this is the optimal minimax reconstruction rate for -minimization using nonadaptive linear measurements. Due to the rotational-invariance of an RIP matrix with randomized column signs , a completely analogous theory holds for signals that are compressible with respect to a known orthonormal basis or tight frame by replacing with inside the -norm of the minimization problem [6, 31].
1.1 Imaging with CS
Natural images are highly compressible with respect to their gradient representation. For an image one defines its discrete directional derivatives by
The discrete gradient transform is defined in terms of the directional derivatives,
The anisotropic total variation seminorm is defined as
and the isotropic total variation seminorm as
The recovery guarantees we derive apply to both anisotropic and isotropic total variation semi norms, and we use the notation to refer to either choice of seminorm. For brevity of presentation, we provide details only for the isotropic total variation seminorm, but refer the reader to  for the analysis of the anisotropic variant.
The total variation seminorm is a regularizer of choice in many image processing applications. That is, given a collection of noisy linear measurements with of an underlying image , total variation minimization is used to pick from among the possibly infinitely-many images consistent with these measurements:
Properties of TV minimizers in inverse problems have been studied in the discrete and continuous settings [2, 40, 20, 48, 46, 41, 12, 13], and convergence rates of stability measures for TV have also been established [4, 24]. In the setting of compressed sensing and more broadly in other imaging applications, total variation regularization has been used for denoising, deblurring, and inpainting (see e.g. [8, 11, 7, 42, 14, 33, 34, 32, 39, 35, 25, 27, 50, 38] and the references therein). In this article, we focus on recovery guarantees for (TV) in the compressed sensing setting. Along the way, we derive strengthened Sobolev inequalities for discrete signals lying near the null space of operators incoherent with the Haar wavelet basis, and we believe that such bounds should be useful in a broader context for understanding the connection between total variation minimization and -wavelet coefficient minimization.
While (TV) is similar to the -minimization program , the RIP-based theoretical guarantees for do not directly translate to recovery guarantees for (TV) because the gradient map is not well-conditioned on the orthogonal complement of ). In fact, viewed as an invertible operator over mean-zero images, the condition number of the gradient map is proportional to the image side length .333One sees that the norm of is a constant whereas the norm of its inverse is proportional to (one can observe this scaling, for example, by noting it is obtained by the image whose entries are constant). For anisotropic , recovery guarantees in the compressed sensing setting were obtained in  for two-dimensional images.
Theorem A (from ).
For a number of measurements , there are choices of linear operators for which the following holds for any image : Given noisy measurements with noise level , the reconstructed image
satisfies the error bound
Here and throughout, denotes the best -term approximation to the array .
In words, the total variation minimizer estimates to within a factor of the noise level and best -term approximation error of its gradient. The bound in (9) is optimal up to the logarithmic factor .
The contribution of this paper is twofold: We extend the recovery guarantees of Theorem A to the multidimensional setting and to the setting of isotropic total variation minimization, for arbitrary dimension . The precise statement of results is given in Theorem 3. The proofs involve extending the Sobolev inequalities for random subspaces from  to higher-dimensional signal structures, using bounds of Cohen, Dahmen, Daubechies, and DeVore in  on the compressibility of wavelet representations in terms of the bounded variation of a function, which hold for functions in dimension . Hence, our results for total variation, do not hold in dimension . See  for results on one-dimensional total variation under assumptions other than the RIP.
The article is organized as follows. In Section 2 we recall relevant background material on the multidimensional total variation seminorm and multidimensional orthonormal wavelet transform. Section 3 states our main result: total variation minimization provides stable signal recovery for signals of arbitrary dimension . The proof of this result will occupy the remainder of the paper; in Section 4 we prove that the signal gradient is recovered stably, while in Section 5 we pass from stable gradient recovery to stable signal recovery using strengthened Sobolev inequalities which we derive for random subspaces. The proofs of propositions and theorems used along the way are contained in the appendix.
2 Preliminaries for multidimensional signal analysis
The setting for this article is the space of multidimensional arrays of complex numbers,
From here on out, we will use the shorthand . We will also use the convention that vectors such as (apart from index vectors ) are boldface and their scalar components such as are normal typeface. We also treat as the Hilbert space equipped with inner product
where denotes the conjugate of . This Hilbert space is isometric444Recall that if , and is a Hilbert space equipped with the inner product . to the subspace of functions which are constant over cubes of side length , and the isometry is provided by identifying with the function satisfying for . More generally, we denote by the entrywise -norm of the signal .
For , the discrete derivative of in the direction of is the array defined component-wise by
and we define the -dimensional discrete gradient transform through its components
The -dimensional anisotropic total variation seminorm is defined as , while the isotropic total variation seminorm is a mixed - norm of the -dimensional discrete gradient,
A linear operator can be regarded as a sequence of multidimensional arrays componentwise,
and a linear operator can be expressed similarly through its components . If and then the row direct sum operator is the linear operator from to with component arrays given by
Alternatively, for linear operators and , the column direct sum operator has component arrays given by
2.1 The multidimensional Haar wavelet transform
The Haar wavelet transform provides a sparsifying basis for natural signals such as images and movies, and is closely related to the discrete gradient. For a comprehensive introduction to wavelets, we refer the reader to .
The (continuous) multidimensional Haar wavelet basis is derived from a tensor-product representation of the univariate Haar basis, which forms an orthonormal system for square-integrable functions on the unit interval and consists of the constant function
the step function
and dyadic dilations and translations of the step function,
The Haar basis for the higher dimensional space of square-integrable functions on the unit cube consists of tensor-products of the univariate Haar wavelets. Concretely, for and , we define the multivariate functions by
The orthonormal Haar system on is then comprised of the constant function along with all functions of the form
The discrete multidimensional Haar transform is derived from the continuous construction via the isometric identification between and : defining
the matrix product computing the discrete Haar transform can be expressed as , where the indices are in the range (16) but with , which is a set of size . Note that with this normalization, the transform is orthonormal.
2.2 Gradient versus wavelet sparsity
The following is a corollary of a remarkable result from Cohen, Dahmen, Daubechies, and DeVore  which bounds the compressibility of a function’s wavelet representation by its bounded variation, and will be very useful in our analysis.
Proposition 2 (Corollary of Theorem 1.1 from ).
There is a universal constant such that the following holds for any mean-zero in dimension : if the Haar transform coefficients are partitioned by their support into blocks of cardinality , then the coefficient block of th largest -norm, denoted by has -norm bounded by
Thus by equivalence of the anisotropic and isotropic total variation seminorms up to a factor of ,
3 The main result
Our main result concerns near-optimal recovery guarantees for multidimensional total variation minimization from compressed measurements. Recall that a linear operator is said to have the restricted isometry property (RIP) of order and level when
A linear operator satisfies the RIP if and only if the matrix whose th row consists of the unraveled entries of the th multidimensional array satisfies the classical RIP, (1), and so without loss of generality we treat both definitions of the RIP as equivalent.
For our main result it will be convenient to define for a multidimensional array the associated arrays and obtained by concatenating a block of zeros to the beginning and end of oriented in the th direction:
The following lemma relating gradient measurements with to signal measurements with and can be verified by direct algebraic manipulation and thus the proof is omitted.
Given and ,
where the directional derivative is defined in (11).
For a linear operator we define the operators and as the sequences of arrays and , respectively. We conclude from Lemma 3 that .
We are now prepared to state our main result which shows that total variation minimization yields stable recovery of -dimensional signals from RIP measurements.
Let . Fix integers and . Let be such that, composed with the orthonormal Haar wavelet transform, has the restricted isometry property of order and level . Let with be such that has the restricted isometry property of order and level . Set , and consider the linear operator given by
The following holds for any . From noisy measurements with noise level , the solution to
where is the isotropic total variation seminorm, is the associated mixed norm, and is the signal gradient restricted to the subset of largest-magnitude block norms .
1. Following the same lines of reasoning as the proof of this Main Theorem, one may derive error bounds for anisotropic TV minimization that are a bit tighter, namely, one only requires RIP of order . We believe that the suboptimal bounds for isotropic TV are merely an artifact of our proof technique, as isotropic TV minimization is preferred in practice.
2. The third bound shows that the reconstruction error is proportional (up to a logarithmic factor) to the noise level and the tail of the gradient of the signal . A number of i.i.d. and properly normalized Gaussian measurements can be used to construct the measurement operator which, with high probability, satisfies the required RIP conditions of the theorem [1, 47]. From this number of measurements, the error guarantees are optimal up to the factor of required RIP measurements, factor of on the noise dependence, and logarithmic factor in the signal dimension . We emphasize here that the specific construction of the measurement ensemble is likely only an artifact of the proof, and that more general RIP measurements are likely possible. See also  for results using Fourier measurements (for ).
3. The main theorem recovers the total variation guarantees of  when up to a term. This term is lost only because in the higher-dimensional analysis, our proofs require blocking of the wavelet coefficients. This term can be recovered by applying a more efficient blocking strategy and by writing in terms of in (32) of the proof. We write the bound as-is for simplicity.
4. The requirement of sidelength is not an actual restriction, as signals with arbitrary side-length can be extended via reflections across each dimension to a signal of side-length without increasing the total variation by more than a factor of . This requirement again seems to be only an artifact of the proof and one need not perform such changes in practice.
We now turn to the proof of the main theorem. We will first prove the gradient-level recovery bounds and , and then use these to prove the signal-level recovery bound via Sobolev inequalities for incoherent subspaces. Along the way, we must take care to balance estimates involving the gradient vectors and their block norms , as well as the blocks of wavelet coefficients associated to a dyadic cube and their block norms .
4 Stable gradient recovery
In this section we prove statements and of the main theorem concerning stable gradient recovery, using standard results in compressed sensing combined with a summation by parts trick provided by the specific form of the measurements in the Main Theorem and Lemma 3.
Suppose that is a linear operator satisfying the restricted isometry property of order and level , and suppose that the signal satisfies a tube constraint
Suppose further that using the notation of the Main Theorem, for a subset of cardinality (meaning ), satisfies a cone-constraint
Proposition 4 generalizes results in  and its proof is included in the appendix. Using Proposition 4 and RIP assumptions on the operator , the gradient-level recovery guarantees (i) and (ii) reduce to proving that the discrete gradient of the residual signal error satisfies the tube and cone constraints.
(Main Theorem, statements (i) and (ii).)
Let be the residual error, and set .
Then we have
Cone Constraint. Consider the block norm associated to the index , and denote by the index of the th largest block norm , and let . Since is a minimizer of (TV) and satisfies the feasibility constraint in (TV), we have that . By the reverse triangle inequality,
This yields the cone constraint
Tube constraint. Recall that . Since both and are feasible solutions to (TV), Jensen’s inequality gives
By Lemma 3, we have for each component operator ,
Then (where we assume that is ordered appropriately) and
In light of Proposition 4 the proof is complete.
The component operator from the main theorem was not used at all in deriving properties and ; on the other hand, only the measurements in will be used to derive property from and . We conjecture that all measurements in the main result apart from those in the component operator are artifacts of the proof techniques herein.
5 A Sobolev inequality for incoherent subspaces
We now derive a strengthened Sobolev inequality for signals lying near the null space of a matrix which is incoherent to the Haar wavelet basis.
Theorem 6 (Sobolev inequality for incoherent subspaces).
Let and let . Let be a linear map such that, composed with the multivariate Haar wavelet transform , the resulting operator satisfies the restricted isometry property of order and level . Then there is a universal constant such that the following holds: if satisfies the tube constraint , then
and so, by equivalence of the isotropic and anisotropic total variation seminorms up to a factor of ,
The RIP assumptions on imply that for with -sparse wavelet representation,
with the final equality holding because is unitary. This implies that the null space of cannot contain any signals admitting an -sparse wavelet expansion, apart from the zero vector. In particular, the null space of contains no nontrivial constant signals and thus intersects the null space of trivially.
Theorem 6 admits corollaries for various families of random matrices with restricted isometries. For Gaussian random matrices, the theorem implies the following.
Let be a linear map realizable as an matrix whose entries are mean-zero i.i.d. Gaussian random variables. Then there is a universal constant such that with probability exceeding , the following bound holds for any lying in the null-space of :
Proof of Corollary 8.
Proof of Theorem 6.
Consider the signal error , and consider its orthogonal decomposition where is constant and is mean-zero (recall that has constant entries equal to ). Let represent the orthonormal Haar transform of . Suppose without loss of generality that the desired sparsity level is either smaller than or a positive multiple of , and write where either or (for arbitrary , we could consider which satisfies ).
Let be the index set of cardinality which includes the constant Haar coefficient along with the largest-magnitude entries of (not including ). Let be the set of largest-magnitude entries of in , and so on. Note that and similar expressions above and below can have both the meaning of restricting to the indices in as well as being the array whose entries are set to zero outside .
Now, since is mean-zero and , we may apply Proposition 2. To that end, consider the decomposition of into blocks of cardinality as in Proposition 2, according to the support of their wavelets. By definition, is at least as large as for any other of cardinality . Consequently, is smaller than for any other of cardinality . Thus, because ,
where the second to last inequality follows from Proposition 2 and the last inequality from properties of the geometric summation.
We use a similar procedure to bound the -norm of the residual,
By assumption, satisfies the tube constraint and satisfies the restricted isometry property of order . We conclude that