# Near-optimal compressed sensing guarantees for total variation minimization

Consider the problem of reconstructing a multidimensional signal from an underdetermined set of measurements, as in the setting of compressed sensing. Without any additional assumptions, this problem is ill-posed. However, for signals such as natural images or movies, the minimal total variation estimate consistent with the measurements often produces a good approximation to the underlying signal, even if the number of measurements is far smaller than the ambient dimensionality. This paper extends recent reconstruction guarantees for two-dimensional images to signals of arbitrary dimension d > 1 and to isotropic total variation problems. To be precise, we show that a multidimensional signal x can be reconstructed from O(sd · log(N^d)) linear measurements using total variation minimization to within a factor of the best s-term approximation of its gradient. The reconstruction guarantees we provide are necessarily optimal up to polynomial factors in the spatial dimension d.

## 1 Introduction

Compressed sensing (CS) is an emerging signal processing methodology where signals are acquired in compressed form as undersampled linear measurements. The applications of CS are abundant, ranging from radar and error correction to many areas of image processing [18]. The underlying assumption that makes such acquisition and reconstruction possible is that most natural signals are sparse or compressible. We say that a signal x ∈ C^p is s-sparse when

 ∥x∥_0 := |supp(x)| ≤ s ≪ p. (1)

Compressible signals are those which are well approximated by sparse signals. In the CS framework, we acquire nonadaptive linear measurements of the form

 y = M(x) + ξ,

where M : C^p → C^m is an appropriate linear operator and ξ ∈ C^m is a vector modeling additive noise. The theory of CS ([22, 9, 21]) ensures that, under suitable assumptions on M, compressible signals can be approximately reconstructed by solving an ℓ1-minimization problem:

 x̂ = argmin_w ∥w∥_1 such that ∥M(w) − y∥_2 ≤ ε; (L1)

above, ∥·∥_1 and ∥·∥_2 denote the standard ℓ1 and Euclidean norms, ε bounds the noise level ∥ξ∥_2, and argmin denotes the set of minimizers (we use the notation of equality rather than set notation here, with the convention that if the minimizer is not unique, we may choose a solution lexicographically; in this particular case, under standard compressed sensing assumptions, the solution is known to be unique [51]). The program (L1) may be cast as a second-order cone program (SOCP) and can be solved efficiently using standard convex programming methods (see e.g. [3, 17]).
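
As a concrete illustration (not part of the original analysis), the noiseless case ε = 0 of (L1), known as basis pursuit, can be recast as a linear program and solved with off-the-shelf software. The sketch below uses SciPy's `linprog`; the helper name `basis_pursuit` and the Gaussian matrix and problem sizes are our own illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(M, y):
    """Solve min ||w||_1 subject to M w = y (the noiseless case eps = 0
    of (L1)), recast as a linear program via the split w = w_pos - w_neg."""
    m, p = M.shape
    c = np.ones(2 * p)             # objective: sum(w_pos) + sum(w_neg) = ||w||_1
    A_eq = np.hstack([M, -M])      # enforces M (w_pos - w_neg) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]

rng = np.random.default_rng(0)
p, m, s = 40, 25, 3
M = rng.standard_normal((m, p)) / np.sqrt(m)   # i.i.d. Gaussian: RIP w.h.p.
x = np.zeros(p)
x[rng.choice(p, size=s, replace=False)] = rng.standard_normal(s)
x_hat = basis_pursuit(M, M @ x)
print(np.max(np.abs(x_hat - x)))   # small: recovery up to solver tolerance
```

With m = 25 Gaussian measurements of a 3-sparse signal in dimension 40, exact recovery is expected with high probability, matching the qualitative prediction of the RIP theory.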

One property of the operator M that guarantees sparse signal recovery via (L1), as introduced by Candès and Tao in [9], is the restricted isometry property (RIP).

###### Definition 1.

A linear operator M : C^p → C^m is said to have the restricted isometry property (RIP) of order s and level δ ∈ (0, 1) if

 (1 − δ)∥x∥_2^2 ≤ ∥M(x)∥_2^2 ≤ (1 + δ)∥x∥_2^2 for all s-sparse x ∈ C^p. (2)

Many distributions of random matrices of dimension m × p, including randomly subsampled rows from the discrete Fourier transform [47] or from a bounded orthonormal system more generally [47, 45, 43, 44, 5], and randomly-generated circulant matrices [28], are known to generate RIP matrices of order s and level δ once the number of measurements m is on the order of s, up to factors logarithmic in p. Note that here and throughout we use the notation u ≲ v (analogously u ≳ v) to indicate that there exists some absolute constant C > 0 such that u ≤ Cv (u ≥ Cv). Moreover, a matrix whose entries are independent and identically distributed (i.i.d.) realizations of a properly-normalized subgaussian random variable will have the RIP with high probability once m ≳ δ^{−2} s log(p/s) [10, 36, 47, 1].

Candès, Romberg, and Tao [8] showed that when the measurement operator M has the RIP of sufficiently large order (proportional to s) and sufficiently small level δ, the program (L1) recovers an estimate x̂ of x that satisfies the error bound

 ∥x̂ − x∥_2 ≤ C (∥x − x_s∥_1/√s + ε), (3)

where x_s denotes the best s-sparse approximation to the signal x. By properties of Gel'fand widths of the ℓ1 ball due to Kashin [26] and Garnaev–Gluskin [23], this is the optimal minimax reconstruction rate for ℓ1-minimization from nonadaptive linear measurements. Due to the rotational invariance of an RIP matrix with randomized column signs [29], a completely analogous theory holds for signals that are compressible with respect to a known orthonormal basis or tight frame B, obtained by replacing ∥w∥_1 with ∥B^*w∥_1 in the minimization problem (L1) [6, 31].
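
For intuition, the best s-term approximation appearing in (3) is simple to compute: keep the s largest-magnitude entries and zero out the rest. A minimal sketch (the function name `best_s_term` and the test vector are ours, not from the paper):

```python
import numpy as np

def best_s_term(x, s):
    """Best s-sparse approximation x_s: keep the s largest-magnitude
    entries of x and set the rest to zero."""
    xs = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]   # indices of the s largest |x_i|
    xs[keep] = x[keep]
    return xs

x = np.array([5.0, -3.0, 0.5, 0.1, -0.2])
x_s = best_s_term(x, 2)
print(x_s)                         # [ 5. -3.  0.  0.  0.]
print(np.linalg.norm(x - x_s, 1))  # the tail ||x - x_s||_1 appearing in (3)
```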

### 1.1 Imaging with CS

Natural images are highly compressible with respect to their gradient representation. For an image x ∈ C^{N×N}, one defines its discrete directional derivatives by

 x_u : C^{N×N} → C^{(N−1)×N}, (x_u)_{j,k} = x_{j+1,k} − x_{j,k}, (4)
 x_v : C^{N×N} → C^{N×(N−1)}, (x_v)_{j,k} = x_{j,k+1} − x_{j,k}. (5)

The discrete gradient transform is defined in terms of the directional derivatives,

 ((∇x)_{j,k,1}, (∇x)_{j,k,2}) :=
   ((x_u)_{j,k}, (x_v)_{j,k}) for 1 ≤ j ≤ N − 1, 1 ≤ k ≤ N − 1,
   (0, (x_v)_{j,k}) for j = N, 1 ≤ k ≤ N − 1,
   ((x_u)_{j,k}, 0) for k = N, 1 ≤ j ≤ N − 1,
   (0, 0) for j = k = N.

The anisotropic total variation seminorm is defined as

 ∥x∥_{TV_1} := Σ_{j,k=1}^{N} |(∇x)_{j,k,1}| + |(∇x)_{j,k,2}|, (6)

and the isotropic total variation seminorm as

 ∥x∥_{TV_2} := Σ_{j,k=1}^{N} ((∇x)_{j,k,1}^2 + (∇x)_{j,k,2}^2)^{1/2}. (7)

The recovery guarantees we derive apply to both the anisotropic and isotropic total variation seminorms, and we write ∥·∥_{TV} to refer to either choice. For brevity of presentation, we provide details only for the isotropic total variation seminorm, but refer the reader to [37] for the analysis of the anisotropic variant.
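
To make the definitions (4)-(7) concrete, the following sketch computes the discrete gradient and both seminorms for a small image with NumPy; the helper names and test image are ours. For the piecewise-constant image below, the gradient is 3-sparse and both seminorms equal 3.

```python
import numpy as np

def grad2d(x):
    """Discrete gradient of an N x N image, zero-padded at the boundary
    as in the piecewise definition above: channel 0 holds x_u, channel 1 x_v."""
    N = x.shape[0]
    g = np.zeros((N, N, 2))
    g[:-1, :, 0] = x[1:, :] - x[:-1, :]   # x_u: differences in the j direction
    g[:, :-1, 1] = x[:, 1:] - x[:, :-1]   # x_v: differences in the k direction
    return g

def tv1(x):  # anisotropic TV, eq. (6)
    return np.abs(grad2d(x)).sum()

def tv2(x):  # isotropic TV, eq. (7)
    return np.sqrt((grad2d(x) ** 2).sum(axis=2)).sum()

x = np.array([[0., 0., 1.],
              [0., 0., 1.],
              [0., 0., 1.]])   # piecewise-constant image: sparse gradient
print(tv1(x), tv2(x))          # both equal 3.0 for this image
```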

The total variation seminorm is a regularizer of choice in many image processing applications. That is, given a collection of noisy linear measurements y = M(x) + ξ with ∥ξ∥_2 ≤ ε of an underlying image x, total variation minimization is used to pick from among the possibly infinitely-many images consistent with these measurements:

 x̂ = argmin_z ∥z∥_{TV} such that ∥M(z) − y∥_2 ≤ ε. (TV)

Properties of TV minimizers in inverse problems have been studied in the discrete and continuous settings [2, 40, 20, 48, 46, 41, 12, 13], and convergence rates of stability measures for TV have also been established [4, 24]. In the setting of compressed sensing and more broadly in other imaging applications, total variation regularization has been used for denoising, deblurring, and inpainting (see e.g. [8, 11, 7, 42, 14, 33, 34, 32, 39, 35, 25, 27, 50, 38] and the references therein). In this article, we focus on recovery guarantees for (TV) in the compressed sensing setting. Along the way, we derive strengthened Sobolev inequalities for discrete signals lying near the null space of operators incoherent with the Haar wavelet basis, and we believe that such bounds should be useful in a broader context for understanding the connection between total variation minimization and ℓ1-minimization of wavelet coefficients.

While (TV) is similar to the ℓ1-minimization program (L1), the RIP-based theoretical guarantees for (L1) do not directly translate to recovery guarantees for (TV) because the gradient map ∇ is not well-conditioned on the orthogonal complement of its null space (the constant images). In fact, viewed as an invertible operator over mean-zero images, the condition number of the gradient map is proportional to the image side length N.³ For anisotropic TV, recovery guarantees in the compressed sensing setting were obtained in [38] for two-dimensional images.

³ One sees that the norm of ∇ is bounded by an absolute constant, whereas the norm of its inverse is proportional to N; the latter scaling is attained, for example, by a mean-zero image whose entries vary linearly in one coordinate.
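
This conditioning claim can be checked numerically: the smallest nonzero singular value of the one-dimensional forward-difference matrix scales like 1/N while the largest stays bounded, so the condition number over mean-zero vectors grows linearly in N. A quick sketch, with our own helper name:

```python
import numpy as np

def grad_cond(N):
    """Condition number of the 1D forward-difference map restricted to
    mean-zero vectors (its kernel consists of the constant signals)."""
    D = np.zeros((N - 1, N))
    idx = np.arange(N - 1)
    D[idx, idx], D[idx, idx + 1] = -1.0, 1.0
    sv = np.linalg.svd(D, compute_uv=False)  # all N-1 values are nonzero
    return sv[0] / sv[-1]

conds = {N: grad_cond(N) for N in (16, 32, 64)}
print(conds)  # the condition number roughly doubles as N doubles
```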

###### Theorem A (from [38]).

For a number of measurements m ≳ s log(N²/s), there are choices of linear operators M : C^{N×N} → C^m for which the following holds for any image x ∈ C^{N×N}: Given noisy measurements y = M(x) + ξ with noise level ∥ξ∥_2 ≤ ε, the reconstructed image

 x̂ = argmin_z ∥z∥_{TV_1} such that ∥M(z) − y∥_2 ≤ ε (8)

satisfies the error bound

 ∥x − x̂∥_2 ≲ log(N²/s) (∥∇x − (∇x)_s∥_1/√s + ε). (9)

Here and throughout, (∇x)_s denotes the best s-term approximation to the array ∇x.

In words, the total variation minimizer estimates x to within a factor of the noise level and of the best s-term approximation error of its gradient. The bound in (9) is optimal up to the logarithmic factor log(N²/s).

The contribution of this paper is twofold: we extend the recovery guarantees of Theorem A to the multidimensional setting and to the setting of isotropic total variation minimization, for arbitrary dimension d ≥ 2. The precise statement of results is given in Section 3. The proofs involve extending the Sobolev inequalities for random subspaces from [38] to higher-dimensional signal structures, using bounds of Cohen, Dahmen, Daubechies, and DeVore [15] on the compressibility of wavelet representations in terms of the bounded variation of a function, which hold for functions in dimension d ≥ 2. Hence, our results for total variation do not hold in dimension d = 1. See [49] for results on one-dimensional total variation under assumptions other than the RIP.

### 1.2 Organization

The article is organized as follows. In Section 2 we recall relevant background material on the multidimensional total variation seminorm and the multidimensional orthonormal wavelet transform. Section 3 states our main result: total variation minimization provides stable signal recovery for signals of arbitrary dimension d ≥ 2. The proof of this result will occupy the remainder of the paper; in Section 4 we prove that the signal gradient is recovered stably, while in Section 5 we pass from stable gradient recovery to stable signal recovery using strengthened Sobolev inequalities which we derive for random subspaces. The proofs of propositions and theorems used along the way are contained in the appendix.

## 2 Preliminaries for multidimensional signal analysis

We consider d-dimensional signals indexed over the discrete cube:

 x = (x_α), α ≡ (α_1, α_2, …, α_d) ∈ {1, 2, …, N}^d.

From here on out, we will use the shorthand [N] := {1, 2, …, N}. We will also use the convention that vectors such as x (apart from index vectors α) are boldface and their scalar components such as x_α are normal typeface. We treat C^{N^d} as the Hilbert space equipped with the inner product

 ⟨x, y⟩ = Σ_{α∈[N]^d} x_α · ȳ_α, (10)

where ȳ_α denotes the complex conjugate of y_α. This Hilbert space is isometric⁴ to the subspace of L²([0, 1]^d) consisting of functions which are constant over cubes of side length 1/N, with the isometry provided by identifying x with the function taking the appropriately scaled value of x_α on the cube indexed by α. More generally, we denote by ∥x∥_p the entrywise ℓ_p-norm of the signal x.

⁴ Recall that L²(Q) is the Hilbert space of square-integrable functions on Q, equipped with the inner product ⟨f, g⟩ = ∫_Q f(u) ḡ(u) du.

For ℓ ∈ [d], the discrete derivative of x in the direction ℓ is the array x_{r_ℓ} defined component-wise by

 (x_{r_ℓ})_α := x_{(α_1, α_2, …, α_ℓ + 1, …, α_d)} − x_{(α_1, α_2, …, α_ℓ, …, α_d)}, 1 ≤ α_ℓ ≤ N − 1, (11)

and we define the d-dimensional discrete gradient transform ∇ : C^{N^d} → C^{N^d × d} through its components

 (∇x)_{α,ℓ} := (x_{r_ℓ})_α if α_ℓ ≤ N − 1, and 0 otherwise. (12)

The d-dimensional anisotropic total variation seminorm is defined as ∥x∥_{TV_1} := ∥∇x∥_1, while the isotropic total variation seminorm is a mixed ℓ_1-ℓ_2 norm of the d-dimensional discrete gradient,

 ∥x∥_{TV_2} := Σ_{α∈[N]^d} (Σ_{ℓ=1}^{d} (∇x)_{α,ℓ}^2)^{1/2} =: ∥∇x∥_{1,2}. (13)
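
The d-dimensional definitions (11)-(13) translate directly into array code. The sketch below (our own helper names and an illustrative test signal) computes the zero-padded gradient and both seminorms for a 3-dimensional signal with a single jump plane.

```python
import numpy as np

def grad_nd(x):
    """d-dimensional discrete gradient (12): component ell holds the forward
    difference along axis ell, zero-padded where alpha_ell = N."""
    d = x.ndim
    g = np.zeros(x.shape + (d,))
    for ell in range(d):
        sl = [slice(None)] * d
        sl[ell] = slice(0, x.shape[ell] - 1)
        g[tuple(sl) + (ell,)] = np.diff(x, axis=ell)
    return g

def tv1_nd(x):  # anisotropic: entrywise ell_1 norm of the gradient
    return np.abs(grad_nd(x)).sum()

def tv2_nd(x):  # isotropic: mixed ell_1-ell_2 norm, eq. (13)
    return np.sqrt((grad_nd(x) ** 2).sum(axis=-1)).sum()

x = np.zeros((4, 4, 4))
x[2:, :, :] = 1.0              # one jump plane in direction 0: 16 nonzero gradients
print(tv1_nd(x), tv2_nd(x))    # both equal 16.0 for this signal
```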

A linear operator A : C^{N^d} → C^m can be regarded componentwise as a sequence of multidimensional arrays (a_k) ⊂ C^{N^d},

 y_k = [A(x)]_k = ⟨a_k, x⟩, (14)

and a linear operator B : C^{N^d × d} → C^m can be expressed similarly through its components b_k ∈ C^{N^d × d}. If A_1 : C^{N^d} → C^{r_1} has component arrays a_k and A_2 : C^{N^d} → C^{r_2} has component arrays b_k, then the row direct sum operator A_1 ⊕_r A_2 is the linear operator from C^{N^d} to C^{r_1 + r_2} with component arrays given by

 m_k = a_k for 1 ≤ k ≤ r_1, and m_k = b_{k − r_1} for r_1 + 1 ≤ k ≤ r_1 + r_2.

Alternatively, for linear operators B_1 : C^{N^d} → C^m and B_2 : C^{N^d} → C^m with component arrays a_k and b_k respectively, the column direct sum operator B_1 ⊕_c B_2 : C^{N^d × 2} → C^m has component arrays n_k given by

 (n_k)_{α,1} = (a_k)_α and (n_k)_{α,2} = (b_k)_α.

### 2.1 The multidimensional Haar wavelet transform

The Haar wavelet transform provides a sparsifying basis for natural signals such as images and movies, and is closely related to the discrete gradient. For a comprehensive introduction to wavelets, we refer the reader to [19].

The (continuous) multidimensional Haar wavelet basis is derived from a tensor-product representation of the univariate Haar basis, which forms an orthonormal system for square-integrable functions on the unit interval and consists of the constant function

 h_0(t) = 1 for 0 ≤ t < 1, and 0 otherwise,

the step function

 h_1(t) = 1 for 0 ≤ t < 1/2, and −1 for 1/2 ≤ t < 1,

and dyadic dilations and translations of the step function,

 h_{j,k}(t) = 2^{j/2} h_1(2^j t − k), j ∈ N, 0 ≤ k < 2^j. (15)

The Haar basis for the higher-dimensional space L²(Q) of square-integrable functions on the unit cube Q = [0, 1)^d consists of tensor products of the univariate Haar wavelets. Concretely, for e = (e_1, …, e_d) ∈ V := {0, 1}^d \ {(0, …, 0)} and u = (u_1, …, u_d) ∈ Q, we define the multivariate functions h^e by

 h^e(u) = Π_{i=1}^{d} h_{e_i}(u_i).

The orthonormal Haar system on L²(Q) is then comprised of the constant function along with all functions of the form

 h^e_{j,k}(u) = 2^{jd/2} h^e(2^j u − k), e ∈ V, j ≥ 0, k ∈ Z^d ∩ 2^j Q. (16)

The discrete multidimensional Haar transform H : C^{N^d} → C^{N^d} is derived from the continuous construction via the isometric identification between C^{N^d} and the space of functions that are constant on dyadic cubes of side length 1/N: defining

 h^0(α) = N^{−d/2}, h_{j,k,e}(α) = N^{−d/2} h^e_{j,k}(α/N), α ∈ [N]^d, (17)

the discrete Haar transform can be expressed entrywise as [H(x)]_{(j,k,e)} = ⟨h_{j,k,e}, x⟩, where the indices range as in (16) but with 2^j < N; together with the constant coefficient this is a set of size N^d. Note that with this normalization, the transform H is orthonormal.
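
The univariate discrete Haar matrix can be built by the standard averaging/differencing recursion, and its orthonormality checked directly; the multidimensional transform above is then obtained from tensor (Kronecker) products of this matrix. A minimal sketch, assuming a power-of-two side length:

```python
import numpy as np

def haar_matrix(N):
    """Orthonormal discrete Haar transform matrix for N = 2^n, built by the
    standard recursion: coarse averages on top, fine differences below."""
    assert N & (N - 1) == 0, "side length must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < N:
        n = H.shape[0]
        top = np.kron(H, [1.0, 1.0])            # averages of adjacent pairs
        bot = np.kron(np.eye(n), [1.0, -1.0])   # differences of adjacent pairs
        H = np.vstack([top, bot]) / np.sqrt(2.0)
    return H

H = haar_matrix(8)
print(np.allclose(H @ H.T, np.eye(8)))  # rows form an orthonormal basis
```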

### 2.2 Gradient versus wavelet sparsity

The following is a corollary of a remarkable result from Cohen, Dahmen, Daubechies, and DeVore [15] which bounds the compressibility of a function’s wavelet representation by its bounded variation, and will be very useful in our analysis.

###### Proposition 2 (Corollary of Theorem 1.1 from [15]).

There is a universal constant C > 0 such that the following holds for any mean-zero x ∈ C^{N^d} in dimension d ≥ 2: if the Haar transform coefficients of x are partitioned by their support into blocks of cardinality 2^d − 1, then the coefficient block with kth largest ℓ_2-norm, denoted by c_{(k)}, has ℓ_2-norm bounded by

 ∥c_{(k)}∥_2 ≤ C ∥x∥_{TV_1} / (k · 2^{d/2−1}).

Thus, by equivalence of the anisotropic and isotropic total variation seminorms up to a factor of √d,

 ∥c_{(k)}∥_2 ≤ C √d ∥x∥_{TV_2} / (k · 2^{d/2−1}).

Proposition 2, whose derivation from Theorem 1.1 of [15] is outlined in the appendix, will be crucial in our proofs of robust recovery via total variation.

## 3 The main result

Our main result concerns near-optimal recovery guarantees for multidimensional total variation minimization from compressed measurements. Recall that a linear operator M : C^{N^d} → C^m with component arrays m_k is said to have the restricted isometry property (RIP) of order s and level δ ∈ (0, 1) when

 (1 − δ)∥x∥_2^2 ≤ ∥M(x)∥_2^2 ≤ (1 + δ)∥x∥_2^2 for all s-sparse x ∈ C^{N^d}. (18)

A linear operator M satisfies this RIP if and only if the m × N^d matrix whose kth row consists of the unraveled entries of the kth multidimensional array m_k satisfies the classical RIP (2), and so without loss of generality we treat both definitions of the RIP as equivalent.

For our main result it will be convenient to define, for a multidimensional array a, the associated arrays a_{ℓ0} and a_{0ℓ} obtained by concatenating a block of zeros to the beginning and end of a, oriented in the ℓth direction:

 (a_{ℓ0})_α = 0 if α_ℓ = 1, and (a_{ℓ0})_α = a_{(α_1, …, α_ℓ − 1, …, α_d)} for 2 ≤ α_ℓ ≤ N, (19)

and

 (a_{0ℓ})_α = a_{(α_1, …, α_ℓ, …, α_d)} for 1 ≤ α_ℓ ≤ N − 1, and (a_{0ℓ})_α = 0 if α_ℓ = N. (20)

The following lemma, relating gradient measurements with a to signal measurements with a_{ℓ0} and a_{0ℓ}, can be verified by direct algebraic manipulation, and thus its proof is omitted.

###### Lemma 3.

Given x ∈ C^{N^d} and an array a,

 ⟨a, x_{r_ℓ}⟩ = ⟨a_{ℓ0}, x⟩ − ⟨a_{0ℓ}, x⟩,

where the directional derivative x_{r_ℓ} is defined in (11).

For a linear operator A with component arrays a_k, we define the operators A_{ℓ0} and A_{0ℓ} as the operators with component arrays (a_k)_{ℓ0} and (a_k)_{0ℓ}, respectively. We conclude from Lemma 3 that A(x_{r_ℓ}) = A_{ℓ0}(x) − A_{0ℓ}(x).
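
Lemma 3 is easy to verify numerically: padding zeros at the start or end of a implements the summation by parts. The sketch below (our own variable names; a random two-dimensional example with ℓ = 0 indexing the first direction) checks the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
N, ell = 5, 0                         # difference along the first direction
x = rng.standard_normal((N, N))
a = rng.standard_normal((N - 1, N))   # matches the shape of x_{r_ell}

x_r = np.diff(x, axis=ell)            # directional derivative, eq. (11)
a_l0 = np.concatenate([np.zeros((1, N)), a], axis=ell)  # zeros prepended, (19)
a_0l = np.concatenate([a, np.zeros((1, N))], axis=ell)  # zeros appended, (20)

lhs = np.vdot(a, x_r)
rhs = np.vdot(a_l0, x) - np.vdot(a_0l, x)
print(np.isclose(lhs, rhs))  # the summation-by-parts identity of Lemma 3
```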

We are now prepared to state our main result, which shows that total variation minimization yields stable recovery of d-dimensional signals from RIP measurements.

###### Main Theorem.

Let d ≥ 2 and N = 2^n. Fix integers s and m. Let A : C^{N^d} → C^{m_1} be such that, composed with the orthonormal Haar wavelet transform, the operator AH^* has the restricted isometry property of order proportional to sd and sufficiently small level. Let B_1, …, B_d : C^{N^d} → C^{m_2} be such that the column direct sum B = B_1 ⊕_c ⋯ ⊕_c B_d has the restricted isometry property of order proportional to sd and sufficiently small level. Set m = m_1 + 2dm_2, and consider the linear operator M : C^{N^d} → C^m given by the row direct sum

 M = A ⊕_r [B_1]_{10} ⊕_r [B_1]_{01} ⊕_r ⋯ ⊕_r [B_d]_{d0} ⊕_r [B_d]_{0d}. (21)

The following holds for any x ∈ C^{N^d}. From noisy measurements y = M(x) + ξ with noise level ∥ξ∥_2 ≤ ε, the solution to

 x̂ = argmin_z ∥z∥_{TV_2} such that ∥M(z) − y∥_2 ≤ ε (22)

satisfies:

1. ∥∇x − ∇x̂∥_{1,2} ≲ ∥∇x − (∇x)_S∥_{1,2} + √(sd) ε,

2. ∥∇x − ∇x̂∥_2 ≲ ∥∇x − (∇x)_S∥_{1,2}/√s + √d ε,

3. ∥x − x̂∥_2 ≲ √d log(N^d) (∥∇x − (∇x)_S∥_{1,2}/√s + √d ε),

where ∥·∥_{TV_2} is the isotropic total variation seminorm, ∥·∥_{1,2} is the associated mixed norm, and (∇x)_S is the signal gradient restricted to the subset S of the s largest-magnitude block norms ∥(∇x)_α∥_2.

Remarks.

1. Following the same lines of reasoning as in the proof of this Main Theorem, one may derive error bounds for anisotropic TV minimization under slightly weaker assumptions; namely, one only requires the RIP of order proportional to s rather than sd. We believe that the suboptimal RIP order for isotropic TV is merely an artifact of our proof technique, as isotropic TV minimization is preferred in practice.

2. The third bound shows that the reconstruction error is proportional (up to a logarithmic factor) to the noise level ε and to the tail ∥∇x − (∇x)_S∥_{1,2} of the gradient of the signal x. A number m ≳ sd · log(N^d) of i.i.d. and properly-normalized Gaussian measurements can be used to construct the measurement operator M which, with high probability, satisfies the required RIP conditions of the theorem [1, 47]. From this number of measurements, the error guarantees are optimal up to polynomial factors in the spatial dimension d and a logarithmic factor in the ambient dimension N^d. We emphasize here that the specific construction of the measurement ensemble is likely only an artifact of the proof, and that more general RIP measurements are likely possible. See also [30] for results using Fourier measurements (for d = 2).

3. The Main Theorem recovers the total variation guarantees of [38] when d = 2, up to an additional logarithmic term. This term is lost only because, in the higher-dimensional analysis, our proofs require blocking of the wavelet coefficients. It can be recovered by applying a more efficient blocking strategy and by rewriting the bound (32) in the proof accordingly. We write the bound as-is for simplicity.

4. The requirement that the side length be a power of two, N = 2^n, is not an actual restriction, as signals with arbitrary side length can be extended via reflections across each dimension to a signal of dyadic side length without increasing the total variation by more than a constant factor. This requirement again seems to be only an artifact of the proof, and one need not perform such extensions in practice.

We now turn to the proof of the Main Theorem. We will first prove the gradient-level recovery bounds (i) and (ii), and then use these to prove the signal-level recovery bound (iii) via Sobolev inequalities for incoherent subspaces. Along the way, we must take care to balance estimates involving the gradient vectors (∇x)_α and their block norms ∥(∇x)_α∥_2, as well as the blocks of wavelet coefficients associated to a dyadic cube and their block norms.

## 4 Stable gradient recovery

In this section we prove statements (i) and (ii) of the Main Theorem concerning stable gradient recovery, using standard results in compressed sensing combined with a summation-by-parts trick afforded by the specific form of the measurements in the Main Theorem and Lemma 3.

Recall that when a signal obeys a tube constraint and a cone constraint, we can bound the norm of the entire signal, as in [11]. We refer the reader to Section A.1 of [38] for a complete proof.

###### Proposition 4.

Suppose that B is a linear operator satisfying the restricted isometry property of sufficiently large order (proportional to sd) and sufficiently small level, and suppose that the signal h ∈ C^{N^d × d} satisfies the tube constraint

 ∥B(h)∥_2 ≤ √(2d) ε.

Suppose further that, using the notation of the Main Theorem, for a subset R ⊂ [N]^d of cardinality s (meaning that R indexes s gradient blocks h_α ∈ C^d), h satisfies the cone constraint

 ∥h_{R^c}∥_{1,2} ≤ ∥h_R∥_{1,2} + σ. (23)

Then

 ∥h∥_2 ≲ σ/√s + √d ε (24)

and

 ∥h∥_{1,2} ≲ σ + √(sd) ε. (25)

Proposition 4 generalizes results in [8], and its proof is included in the appendix. Using Proposition 4 and the RIP assumptions on the operator B, the gradient-level recovery guarantees (i) and (ii) reduce to proving that the discrete gradient of the residual signal error satisfies the tube and cone constraints.

###### Proof.

(Main Theorem, statements (i) and (ii).)
Let v = x − x̂ be the residual error, and set h = ∇v.

Cone constraint. Consider the block norms ∥(∇x)_α∥_2 for α ∈ [N]^d, and let S ⊂ [N]^d denote the subset indexing the s largest block norms. Since x̂ is a minimizer of (TV) and x is feasible for (TV), we have ∥∇x̂∥_{1,2} ≤ ∥∇x∥_{1,2}. By the reverse triangle inequality,

 ∥(∇x)_S∥_{1,2} − ∥h_S∥_{1,2} − ∥(∇x)_{S^c}∥_{1,2} + ∥h_{S^c}∥_{1,2}
   ≤ ∥(∇x)_S − h_S∥_{1,2} + ∥(∇x)_{S^c} − h_{S^c}∥_{1,2}
   = ∥∇x̂∥_{1,2}
   ≤ ∥∇x∥_{1,2}
   = ∥(∇x)_S∥_{1,2} + ∥(∇x)_{S^c}∥_{1,2}.

This yields the cone constraint

 ∥h_{S^c}∥_{1,2} ≤ ∥h_S∥_{1,2} + 2∥(∇x)_{S^c}∥_{1,2}.

Tube constraint. Recall that v = x − x̂. Since both x and x̂ are feasible solutions to (TV), Jensen's inequality gives

 ∥M(v)∥_2^2 ≤ 2∥M(x) − y∥_2^2 + 2∥M(x̂) − y∥_2^2 ≤ 4ε².

By Lemma 3, we have for each component operator B_j,

 B_j(v_{r_j}) = [B_j]_{j0}(v) − [B_j]_{0j}(v). (26)

Then B(∇v) = Σ_{j=1}^{d} B_j(v_{r_j}) (where we assume that the components of ∇v are ordered appropriately), and

 ∥B(∇v)∥_2^2 = ∥Σ_{j=1}^{d} B_j(v_{r_j})∥_2^2 (27)
   ≤ d Σ_{j=1}^{d} ∥B_j(v_{r_j})∥_2^2
   ≤ 2d Σ_{j=1}^{d} (∥[B_j]_{j0}(v)∥_2^2 + ∥[B_j]_{0j}(v)∥_2^2)
   ≤ 2d ∥M(v)∥_2^2
   ≤ 8d ε².

In light of Proposition 4, the proof is complete. ∎

###### Remark 5.

The component operator A from the Main Theorem was not used at all in deriving properties (i) and (ii); on the other hand, only the measurements in A will be used to derive property (iii) from (i) and (ii). We conjecture that all measurements in the main result apart from those in the component operator A are artifacts of the proof techniques herein.

## 5 A Sobolev inequality for incoherent subspaces

We now derive a strengthened Sobolev inequality for signals lying near the null space of a matrix which is incoherent to the Haar wavelet basis.

###### Theorem 6 (Sobolev inequality for incoherent subspaces).

Let N = 2^n and d ≥ 2, and let s ≥ 1. Let A be a linear map such that, composed with the multivariate Haar wavelet transform H, the resulting operator AH^* satisfies the restricted isometry property of order 2s + 1 and sufficiently small level. Then there is a universal constant C > 0 such that the following holds: if v ∈ C^{N^d} satisfies the tube constraint ∥A(v)∥_2 ≤ ε, then

 ∥v∥_2 ≤ C (∥v∥_{TV_1}/√s) log(N^d) + ε, (28)

and so, by equivalence of the isotropic and anisotropic total variation seminorms up to a factor of √d,

 ∥v∥_2 ≤ C (∥v∥_{TV_2}/√(s/d)) log(N^d) + ε. (29)
###### Remark 7.

The RIP assumption on AH^* implies that for v with s-sparse wavelet representation,

 ∥A(v)∥_2 = ∥AH^*H(v)∥_2 ≈ ∥H(v)∥_2 = ∥v∥_2,

where the approximation holds up to factors (1 ± δ)^{1/2} and the final equality holds because H is unitary. This implies that the null space of A cannot contain any nonzero signal admitting an s-sparse wavelet expansion. In particular, the null space of A contains no nontrivial constant signals, and thus intersects the null space of ∇ trivially.

Theorem 6 admits corollaries for various families of random matrices with restricted isometries. For Gaussian random matrices, the theorem implies the following.

###### Corollary 8.

Let A be a linear map realizable as an m × N^d matrix whose entries are mean-zero i.i.d. Gaussian random variables. Then there are universal constants C, c > 0 such that with probability exceeding 1 − e^{−cm}, the following bounds hold for any x lying in the null space of A:

 ∥x∥_2 ≲ (∥x∥_{TV_1}/√m) [log(N^d)]^2 (30)

and

 ∥x∥_2 ≲ √d (∥x∥_{TV_2}/√m) [log(N^d)]^2. (31)
###### Proof of Corollary 8.

From results on Gaussian matrices and the restricted isometry property (see e.g. [9, 36, 1, 47]), AH^* satisfies the restricted isometry property of the required order and level with the stated probability when s is proportional to m/[log(N^d)]². Substituting this value of s into (28) and (29) yields the claim. ∎

###### Proof of Theorem 6.

Consider a signal v satisfying the tube constraint, and consider its orthogonal decomposition v = v_0 + v_1, where v_0 is constant and v_1 is mean-zero (so v_0 has constant entries equal to the mean of v). Let c = H(v) denote the orthonormal Haar transform of v. Suppose without loss of generality that the desired sparsity level s is either smaller than 2^d − 1 or a positive multiple p(2^d − 1) of 2^d − 1 (for arbitrary s ≥ 2^d − 1, we could instead consider s′ = (2^d − 1)⌊s/(2^d − 1)⌋, which satisfies s′ ≤ s ≤ 2s′).

Let S_0 be the index set of cardinality s + 1 which includes the constant Haar coefficient along with the s largest-magnitude entries of c (not including the constant coefficient). Let S_1 be the set of the s largest-magnitude entries of c in S_0^c, and so on. Note that c_{S_0} and similar expressions above and below can have both the meaning of restricting c to the indices in S_0 and of the array whose entries are set to zero outside of S_0.

Now, since v_1 is mean-zero and d ≥ 2, we may apply Proposition 2. To that end, consider the decomposition of c into blocks of cardinality 2^d − 1 as in Proposition 2, according to the support of their wavelets, and let c_{(j)} denote the block with jth largest ℓ_2-norm. By definition, ∥c_{S_0}∥_1 is at least as large as ∥c_T∥_1 for any other index set T of cardinality s + 1. Consequently, ∥c_{S_0^c}∥_1 is smaller than ∥c_{T^c}∥_1 for any other such T. Thus, because the first p blocks comprise at most s coefficients,

 ∥c_{S_0^c}∥_1 ≤ Σ_{j≥p+1} ∥c_{(j)}∥_1 (32)
   ≤ (2^d − 1)^{1/2} Σ_{j≥p+1} ∥c_{(j)}∥_2
   ≲ ∥v∥_{TV_1} Σ_{ℓ=p+1}^{N^d} 1/ℓ
   ≲ ∥v∥_{TV_1} log(N^d),

where the second-to-last inequality follows from Proposition 2, and the last inequality from properties of the harmonic sum.

We use a similar procedure to bound the ℓ_2-norm of the residual:

 ∥c_{S_0^c}∥_2^2 ≲ Σ_{j≥p+1} ∥c_{(j)}∥_2^2 (33)
   ≲ (∥v∥_{TV_1}^2 / 2^d) Σ_{ℓ=p+1}^{N^d} 1/ℓ²
   ≲ ∥v∥_{TV_1}^2 / (2^d max(1, p))
   ≲ ∥v∥_{TV_1}^2 / s.

Then ∥c_{S_0^c}∥_2 ≲ ∥v∥_{TV_1}/√s.

By assumption, v satisfies the tube constraint ∥A(v)∥_2 ≤ ε, and AH^* satisfies the restricted isometry property of order 2s + 1. We conclude that

 ε ≥ ∥A(v)∥_2 = ∥AH^*(c)∥_2 (34)
   ≥ ∥AH^*(c_{S_0} + c_{S_1})∥_2 − ∥AH^*(c_{(S_0∪S_1)^c})∥_2