
Predictive refinement methodology for compressed sensing imaging

The weak-ℓ^p norm can be used to define a measure s of sparsity. When we compute s for the discrete cosine transform coefficients of a signal, the value of s is related to the information content of said signal. We use this value of s to define a reference-free index E, called the sparsity index, that we can use to predict with high accuracy the quality of signal reconstruction in the setting of compressed sensing imaging. That way, when compressed sensing is framed in the context of sampling theory, we can use E to decide when to further partition the sampling space and increase the sampling rate to optimize the recovery of an image when we use compressed sensing techniques.





1 Introduction

In order to reproduce the voice of a singer who can sing up to a “soprano C”, or at a frequency of about 1,046.5 Hz, Claude Shannon [31] proved that we need to sample her voice once every 1/(2W) seconds. He named this number the Nyquist sampling rate for a signal of band W Hz, i.e., a signal with frequencies no higher than W Hz, after Harry Nyquist, who had “pointed out the fundamental importance of the time interval 1/(2W) in connection with telegraphy.”

Shannon notes that this result was known in other forms to the mathematician J. M. Whittaker [34], but that it otherwise had not appeared explicitly in the literature of communication theory. The idea must have been in the air since Nyquist [24], and Bennett [4] in the steady state case; and Gabor [16] had pointed out that approximately 2TW numbers are sufficient to capture a signal of band W Hz that lasts for T seconds.

Further on in “Communication in the presence of noise” [31], published a year after his seminal “A mathematical theory of communication” [30], Shannon establishes a method to represent geometrically any communication system, and explores the utility of mapping a sequence of samples of a band-limited signal into a high dimensional vector space. And it is here where he makes the most interesting of all remarks, on page 13: “[…] in the case of speech, the ear is insensitive to a certain amount of phase distortion. Messages differing only in the phases of their components […] sound the same. This may have the effect of reducing the number of essential dimensions in the message space.”

In other words, even if the dimension of the ambient vector space where we embed a representation of a signal is very high, we may come up with an equivalence class whose member points carry essentially the same information content as the original signal, as far as the end user is concerned; and that equivalence class, in turn, will induce a low dimensional manifold in the vector space onto which similar messages can be mapped.

These ideas make it natural to frame the theory of compressed sensing [10, 9, 13, 7, 20] in the context of sampling and information theories. To see this, observe that compressed sensing makes it possible to reconstruct a signal, under certain circumstances, with fewer measurements than the otherwise required number of samples dictated by the Nyquist sampling rate. Moreover, even when the reconstruction is not exact, the error will be small.

More specifically, compressed sensing deals with the problem of recovering a signal or message of interest x ∈ ℝ^n, which we assume can be represented as x = Ψs for a matrix Ψ ∈ ℝ^{n×N}, with n ≤ N, from an incomplete set of linear measurements,


y = Ax,    (1)

where y ∈ ℝ^m is the vector of measurements, x is the object to recover, and A ∈ ℝ^{m×n} is the measurement matrix, with m < n and A a full rank matrix. (If n = N, we are in the setting of transform coding, where Ψ represents a unitary transform, for example; and if n < N, we can talk of a dictionary or a frame representation of x.) Given a measurement vector y, eq. 1 represents an underdetermined system of linear equations, with an infinite number of solutions. However, if s has at most k significant components, compared to the rest, we can recover it exactly, or very closely, by solving the constrained problem,


min ‖s̃‖_0 subject to y = Θs̃,    (2)

where Θ = AΨ, and s̃ ∈ ℝ^N. Here ‖s̃‖_0 counts the number of nonzero entries of s̃. If ŝ is a solution to eq. 2, we then synthesize an approximate reconstruction of x by using x̂ = Ψŝ.

Note that since m < n, we have used fewer measurements than the number of coordinates of x, in effect compressing the sensing, hence the name compressed sensing; possibly beating the Nyquist sampling rate; going from a large dimensional message space, ℝ^n, to a smaller dimensional measurement space, ℝ^m, in a manner that hopefully captures the essence of the signal of interest. Just like Shannon envisioned.
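As a toy illustration of eqs. 1 and 2 (with sizes of our own choosing, and a textbook orthogonal matching pursuit, discussed later in section 3.3, standing in as a greedy surrogate for the ℓ^0 solver), m < n Gaussian measurements typically suffice to recover a vector that is sparse in the identity basis, i.e., with Ψ = I:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 64, 48, 3                             # ambient dim, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)    # Gaussian measurement matrix
s = np.zeros(n)
s[rng.choice(n, size=k, replace=False)] = [2.0, -2.0, 2.0]
y = A @ s                                       # eq. 1: m < n linear measurements

def omp(A, y, k):
    """Textbook orthogonal matching pursuit: greedily pick the column most
    correlated with the residual, then refit the chosen columns by least
    squares; a greedy stand-in for the l0 problem in eq. 2."""
    idx, r = [], y.copy()
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(A.T @ r))))
        coef, *_ = np.linalg.lstsq(A[:, idx], y, rcond=None)
        r = y - A[:, idx] @ coef
    s_hat = np.zeros(A.shape[1])
    s_hat[idx] = coef
    return s_hat

s_hat = omp(A, y, k)
print(np.linalg.norm(s - s_hat))   # near zero when the support is recovered
```

With far fewer rows than unknowns the linear system is underdetermined, yet the sparsity prior singles out the right solution with high probability for Gaussian A.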

For all of this to work, we need to make precise the notion of what the “significant components” of s are, a notion which traditionally has translated into talking of sparsity. However, we show in example 1 that the commonly used notion of sparsity—the number of nonzero entries in a vector—is defective, and we propose instead in section 2 a refined notion of sparsity that extends the traditional meaning of the word as used in the compressed sensing and sparse representation literatures. The definition is based on the weak-ℓ^p norm, which we define and study in section 2.1. The weak-ℓ^p norm helps us define, for a given 0 < p < 1, the sparsity function s_p and the sparsity relation <_{s_p}, which induces a strict partial order on vectors of equal energy. We show that, for a given vector x, s_p(x) is a convex function of p, and we use this fact to compute effectively s(x), which we define as the sparsity of x. See section 2.2.

In section 2.3 we study unitary transformations and the sparsity s_p, which we use to define sparsifying transforms and their properties, formalizing well known energy shifting properties of unitary transforms commonly used in compression, for example. This leads in section 2.4 to the study of error analysis and sparsity when we truncate the signal representation of a vector under a sparsifying unitary transform. This analysis is done in terms of the peak signal-to-noise ratio, or PSNR, for which we find a lower bound in terms of the sparsity.
This error analysis, together with musings on information theoretic matters in appendix A, motivates the definition of the sparsity index in section 2.5, which we use in the context of compressed sensing image reconstruction, with the single pixel camera, described in detail in section 3, as our running example: in section 3.1 we provide background on the origin of the single pixel camera; in section 3.2 we provide a physical realization and mathematical model of a single pixel camera, together with two ways of obtaining an image from it, an inefficient one, section 3.2.1, and the compressed sensing way, section 3.2.2. In section 3.3 we show how to solve the single pixel camera compressed sensing problem with either the orthogonal matching pursuit algorithm (OMP), section 3.3.1, or the more efficient basis pursuit algorithm (BP), section 3.3.2, for which, in appendix B, we provide the specific methods that we use to implement it. The characteristics of OMP help us tie the use of the sparsity index to the calculation of a lower bound on the PSNR of the various compressed sensing image reconstructions conducted in section 4 with BP, given that the solutions obtained with OMP and BP are close. Our results show that we can predict the quality of the reconstruction of images with very good accuracy without knowledge of the original, i.e., we show that the sparsity index is a reference-free tool for deciding when to sample a given region at a higher rate to guarantee a minimum local PSNR in the reconstruction.

2 Sparsity

In this section we define the weak-ℓ^p norm, go over some of its properties, and use it to redefine the notion of sparsity, which in common parlance refers to counting the nonzero entries of a vector. We do this because, as we show with an example, the commonly used notion of sparsity is not fully satisfactory; we propose instead a new measure of sparsity that utilizes the weak-ℓ^p norm, mentioned as a measure of sparsity in [6], and used in that capacity in, for example, [12] and [8]. We then derive some properties of this measure of sparsity.

2.1 The weak-ℓ^p norm and its properties

It is easy to see that given a vector x ∈ ℝ^n, there exists a unique vector x* ∈ ℝ^n satisfying the following two properties:

  1. For all i ∈ {1, …, n}, there is a j ∈ {1, …, n} such that x*_i = |x_j|, and

  2. For all i ∈ {1, …, n − 1} we have that x*_i ≥ x*_{i+1}.

These two properties naturally define the ordering operator O, which assigns to x its corresponding x*. We then write x* = O(x), and say that x* is the ordering of x.

Definition 1 (Weak-ℓ^p norm)

Let x ∈ ℝ^n and p > 0. We define the weak-ℓ^p norm of vector x as the number

‖x‖_{w,p} = sup_{t > 0} t · c_x(t)^{1/p},

where c_x(t) = #{i : |x_i| > t}.

We are interested in the weak-ℓ^p norm because, for values of 0 < p < 1 and a given vector x, the quantity ‖x‖_{w,p}^p can be used as a measure of sparsity of x. We elaborate on this later on. First, we address how to compute ‖x‖_{w,p} effectively.

Theorem 1

Given a vector x ∈ ℝ^n, and p > 0, we have that

‖x‖_{w,p} = max_{1 ≤ k ≤ n} k^{1/p} x*_k,    (3)

where x* = O(x). We define the index k_{x,p} as the smallest index k where the right hand side of eq. 3 reaches its maximum.

Proof 1

The statement is trivially true for x = 0. Assume then that x is a nonzero vector with corresponding ordering x* = O(x). First observe that, for a given t > 0, the number of entries in x that are greater in absolute value than t does not depend on the order of said entries. Therefore, for a given t > 0, we have that c_x(t) = c_{x*}(t).

Since x ≠ 0, there is an integer k such that x*_k > 0. Let m be the largest of such integers. Consider the partition of (0, ∞) into the intervals I_m = (0, x*_m), I_k = [x*_{k+1}, x*_k) for 1 ≤ k < m, and I_0 = [x*_1, ∞). We compute the supremum of t^p c_{x*}(t) over each of the intervals defining the partition. For t ∈ I_k, with 1 ≤ k < m, we have that c_{x*}(t) = k, and since raising a number to the power p > 0 is a monotonically increasing operation, we clearly have that sup_{t ∈ I_k} t^p c_{x*}(t) = k (x*_k)^p. Similarly, for t ∈ I_m, we have that sup_{t ∈ I_m} t^p c_{x*}(t) = m (x*_m)^p. Finally, for t ∈ I_0, we have that c_{x*}(t) = 0, and therefore sup_{t ∈ I_0} t^p c_{x*}(t) = 0. The result follows from observing that the supremum of t^p c_x(t) over (0, ∞) is the maximum of the supremums of t^p c_{x*}(t) over each and all of the intervals of the partition, and that k (x*_k)^p = 0 for m < k ≤ n.
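Theorem 1 reduces the computation of the weak-ℓ^p norm to a sort followed by a maximum. The sketch below implements that recipe (the function name and return convention are our own, for illustration):

```python
import numpy as np

def weak_lp_norm(x, p):
    """Compute ||x||_{w,p} and the index k_{x,p} via theorem 1:
    ||x||_{w,p}^p = max_k k * (x*_k)^p, with x* the ordering of x,
    and k_{x,p} the smallest index attaining the maximum."""
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]  # the ordering x*
    k = np.arange(1, xs.size + 1)
    areas = k * xs**p                 # the "rectangle areas" k * (x*_k)^p
    k_xp = int(np.argmax(areas)) + 1  # argmax returns the first maximizer
    return areas[k_xp - 1] ** (1.0 / p), k_xp

nrm, k_xp = weak_lp_norm([0.0, -3.0, 0.0], 0.5)
print(nrm, k_xp)   # for a single nonzero entry, the norm is its magnitude
```

For a vector with one nonzero entry the maximum is attained at k = 1, so the weak norm reduces to the largest absolute entry; property 5 of theorem 2 below (‖x‖_{w,p} ≤ ‖x‖_p) can also be checked numerically against this function.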

We state without proof the following properties of the weak-ℓ^p norm, derived from theorem 1.

Theorem 2

Let x ∈ ℝ^n, p > 0, and α ∈ ℝ. Then

  1. ‖x‖_{w,p} ≥ 0.

  2. ‖x‖_{w,p} = 0 if and only if x = 0.

  3. ‖αx‖_{w,p} = |α| ‖x‖_{w,p}.

  4. The weak-ℓ^p norm does not satisfy the triangle inequality.

  5. ‖x‖_{w,p} ≤ ‖x‖_p, where ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p} is the ℓ^p-norm.

Therefore, the weak-ℓ^p norm is not a true norm, but almost: it is a quasi-norm, which for simplicity we will still refer to as a “norm”. We explore and get acquainted with two more properties of the weak-ℓ^p norm that will be relevant later on.

From the result of theorem 1, we observe that the p-th power of the weak-ℓ^p norm of a vector corresponds to the largest area of a rectangle of width k and height (x*_k)^p, where 1 ≤ k ≤ n. Recall that in theorem 1 we defined k_{x,p} to be the smallest index for which this maximal area is achieved, since we will use it often. For a graphic representation of this concept, see figs. 1 and 2.

Note that for any value of t > 0 and a given x, c_x(t) ≤ #{i : x_i ≠ 0}; while t^p tends to either 1 or 0 as p goes to zero, depending on whether t > 0 or t = 0, respectively, i.e., t^p tends to the characteristic function χ_{(0,∞)}(t) as p goes to zero. Here χ_A(t) = 1 if t ∈ A, and χ_A(t) = 0 otherwise. It follows that

lim_{p→0} sup_{t > 0} t^p c_x(t) = #{i : x_i ≠ 0}.

We conclude from the previous two paragraphs that

‖x‖_0 := #{i : x_i ≠ 0} = lim_{p→0} ‖x‖_{w,p}^p.    (4)

Hence, as p goes to zero, the p-th power of the weak-ℓ^p norm tends to the ℓ^0-norm, which counts the nonzero entries of a vector, as defined in eq. 4. Note that the ℓ^0-norm is not a norm either, since ‖αx‖_0 = ‖x‖_0 ≠ |α| ‖x‖_0 for α ≠ 0 when |α| ≠ 1, but it is commonly called a “norm” nonetheless.
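The limit in eq. 4 is easy to observe numerically. The sketch below (our own, for illustration) evaluates the p-th power of the weak-ℓ^p norm, via theorem 1, on a vector with three nonzero entries as p decreases:

```python
import numpy as np

def weak_norm_pth_power(x, p):
    # ||x||_{w,p}^p = max_k k * (x*_k)^p, by theorem 1
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return float((np.arange(1, xs.size + 1) * xs**p).max())

x = [5.0, -0.3, 2.0, 0.0, 0.0]   # ||x||_0 = 3
for p in (0.5, 0.1, 0.01, 0.001):
    print(p, weak_norm_pth_power(x, p))
# the printed values approach ||x||_0 = 3 as p -> 0
```

The approach is not monotone in general (the maximizing index k can change with p), but the limiting value is the count of nonzero entries.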

However, the ℓ^0-norm is not at all nuanced when we are trying to measure sparsity, usually defined as the count of the nonzero entries of a vector, in cases where a vector has relatively few entries that are considerably larger than the rest in absolute value, a circumstance which we would like to distinguish for reasons that will become clear later on. With this in mind, we propose a new definition and measure of sparsity next.

Figure 1: Graph of the p-th power of the entries of the ordering x* versus the index k. The red circle marks the point corresponding to k_{x,p} in this case. See theorem 1 for the definition of k_{x,p}.
Figure 2: Graph of k (x*_k)^p versus k. The red circle marks the point corresponding to k_{x,p}. Recall that ‖x‖_{w,p}^p = k_{x,p} (x*_{k_{x,p}})^p. This graph corresponds to the same vector x used in fig. 1.

2.2 Defining and measuring sparsity

In common parlance, as we mentioned in section 2.1, we say that a vector x is k-sparse if its ℓ^0-norm is at most k. In other words,

‖x‖_0 ≤ k.    (5)

As argued above, though, this measure of sparsity will not distinguish the following two vectors in ℝ^n as radically different:

Example 1

Consider two vectors x, y ∈ ℝ^n with equal ℓ^0-norms, where most of the entries of x are equal to 1, while most of the entries of y are practically 0, with one entry distinctly nonzero. From the ℓ^0-norm point of view, they are equally sparse.

Clearly, the notion of sparsity defined by eq. 5 cannot distinguish the very different nature of these two vectors, x and y.

Note that in the example above, we deliberately chose both vectors to have approximately equal energy, if we define the energy of a vector x as E(x) = ‖x‖_2^2. With these observations in hand, we put forth the following definitions.

Definition 2 (Sparsity s_p and sparsity relation <_{s_p})

Let 0 < p < 1 and E > 0. Consider the set S_E = {x ∈ ℝ^n : ‖x‖_2^2 = E}, and define the binary relation <_{s_p} on S_E as follows:

x <_{s_p} y if and only if ‖x‖_{w,p}^p < ‖y‖_{w,p}^p.

We call <_{s_p} the sparsity relation (for S_E of order p), and will write x < y for simplicity whenever x <_{s_p} y. If x <_{s_p} y, we say that x is sparser than y. We say that x has sparsity of order p equal to s_p(x) = ‖x‖_{w,p}^p, or simply, that x has sparsity s_p(x).

Theorem 3

Let 0 < p < 1 and E > 0; then (S_E, <_{s_p}) is a strict partially ordered set.

Proof 2

Let x, y, z ∈ S_E. For all x, we have that <_{s_p} is trivially irreflexive, i.e., x ≮_{s_p} x, since s_p(x) = s_p(x), hence it is not the case that s_p(x) < s_p(x). Now assume that x <_{s_p} y and y <_{s_p} z. Then, by definition, we must have that s_p(x) < s_p(y) and s_p(y) < s_p(z); since < is transitive in ℝ, we have that s_p(x) < s_p(z), and therefore x <_{s_p} z, i.e., <_{s_p} is transitive.

When we have a partially ordered set, e.g., (S_E, <_{s_p}), we are usually interested in knowing whether it has maximal or minimal elements with respect to its ordering. Assuming the Axiom of Choice in the form of Zorn’s lemma—which states that a partially ordered set in which every chain (i.e., every totally ordered subset) has an upper (lower) bound necessarily contains at least one maximal (minimal) element—we would then set out to find upper (lower) bounds for each energy level E, to conclude that there exist maximal (minimal) elements in S_E with respect to the partial order <_{s_p}. We leave the task of establishing the existence of maximal or minimal elements for another occasion, since this departs from the focus of our endeavors.

Note that the proof of theorem 3 does not use anywhere that 0 < p < 1 and is, in fact, valid for any p > 0. However, given the aforementioned observations stemming from eq. 4 and eq. 5, it is clear that measuring sparsity with s_p, and comparing the sparsity of two vectors with the sparsity relation <_{s_p}, are meaningful and sensible concepts only when 0 < p < 1. Therefore, going forward, we will assume that 0 < p < 1, unless otherwise noted.

Theorem 4 (Convexity of as a function of )

For all x ∈ ℝ^n, the function that maps p ↦ s_p(x) is a convex function. Moreover, if x* is the ordering of x and p is such that x*_{k_{x,p}} ≠ 1, then p ↦ s_p(x) is strictly convex. (See theorem 1 for the definition of k_{x,p}.)

Proof 3

Let , , , and . We have that, by theorem 1,


where is the ordering of . If we prove that, for all ,


combining eq. 3 and eq. 7, it follows that,


proving that is convex. Hence, we proceed to prove eq. 7. Let , and define the functions and . We then have that,

That is, both functions coincide at values . Noting that the graph of is a line, and observing that , it follows that is convex, and conclude that for all . Setting and , we get that for ,


proving that eq. 7 holds, as required to complete the first half of the proof. For the second half of the claim, simply note that if is such that , eqs. 3, 7 and 8 become strict inequalities, resulting then in strict convexity for .

Definition 3 (Sparsity s)

We define the sparsity s as the function that assigns to every vector x ∈ ℝ^n the number

s(x) = inf_{0 < p < 1} s_p(x).

To check that s is well defined, we simply need to prove that, for every vector x, the set {s_p(x) : 0 < p < 1} is bounded below. Let x ∈ ℝ^n. Then, from theorem 2 and definition 2, we have that

s_p(x) = ‖x‖_{w,p}^p ≥ 0 for all 0 < p < 1,

hence the set is bounded below and, therefore, the number s(x) exists and is unique, which means that s is well defined. Moreover, in light of theorem 4, computing s(x) can be easily achieved by convex minimization techniques.
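Since p ↦ s_p(x) is convex on (0, 1), any one-dimensional convex minimization routine recovers s(x); the sketch below uses a plain grid search for simplicity (function names, and the flat and spiky test vectors in the spirit of example 1, are our own):

```python
import numpy as np

def s_p(x, p):
    # s_p(x) = ||x||_{w,p}^p = max_k k * (x*_k)^p (theorem 1)
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return float((np.arange(1, xs.size + 1) * xs**p).max())

def sparsity(x, grid=2000):
    # s(x) = inf over 0 < p < 1 of s_p(x); convexity in p (theorem 4)
    # means a fine grid, or golden-section search, finds the minimum.
    ps = np.linspace(1e-3, 1.0 - 1e-3, grid)
    return min(s_p(x, p) for p in ps)

n = 100
x = np.ones(n)                          # flat: every entry matters
y = np.full(n, 1e-3); y[0] = 10.0       # spiky: one dominant entry
print(sparsity(x))   # stays at n = 100, since s_p(x) = 100 for every p
print(sparsity(y))   # much smaller, reflecting the single significant entry
```

The flat vector keeps full sparsity n, while the spiky vector, with comparable energy concentrated in one entry, scores far lower, which is exactly the distinction the ℓ^0-norm misses.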

It is easy to see that has the following properties, which we state without proof.

Theorem 5

Let , and its ordering. Then,

  1. .

  2. If is such that , i.e., the ordering of is a vector in the diagonal of , then

    where . Recall that, by definition of , we must have .

  3. If is such that , then .

Figure 3: Graph of sparsity s_p as a function of p for vectors x and y from example 1. The red line at the top corresponds to x, and the black graph, two line segments, corresponds to y. Note that we are using a logarithmic scale on the ordinate axis. The point at the bottom shows approximately where s_p(y) reaches its minimum, equal to the sparsity s(y).

With this new definition of sparsity in hand, we revisit example 1 by computing s(x) and s(y). This calculation requires us to compute s_p repeatedly, which from now on we do as prescribed by theorem 1.

We have that , and therefore , which, we note, is equal to . Now for , we have that . If we draw the graph of as a function of , we see that it is the union of two curves and , with and , where is the abscissa such that , readily seen as the minimum of over . It is easy to compute that , from which , faithfully reflecting the fact that most of the entries in are practically zero, except for one of them, which is distinctly nonzero. See fig. 3.

2.3 Unitary transforms and sparse representations

In this section we use our new definition of sparsity to explore unitary transforms and sparse representations stemming from them, which we define next.

Definition 4 (Sparsifying transform and sparse representation)

Let Ψ ∈ ℝ^{n×n} be a unitary matrix, and consider the transform T_Ψ that assigns to every vector x ∈ ℝ^n the vector s = Ψx. We say that T_Ψ is a sparsifying transform for x if and only if

s <_{s_p} x.

In this case we say that Ψ is a sparsifying matrix for x, s is a sparse representation of x (under Ψ), and x admits (s as) a sparse representation (under Ψ).

Note that the notion of a sparsifying transform is well defined since it applies to unitary matrices, which preserve energy, i.e., ‖Ψx‖_2 = ‖x‖_2 for all x ∈ ℝ^n, and therefore s and x can be compared by the sparsity relation <_{s_p}; see definition 2.

Theorem 6 (Sparsity and energy distribution)

Let be a sparsifying matrix for , and let be a vector whose transform is . If and are the orderings of and , respectively, then there exists an integer such that and for all . Moreover,

Proof 4

Let , and . Since is sparsifying for , we have, by definition 4, that , from which, by theorem 1,

hence . Therefore, the set . Let , be the largest integer in , from which it follows that,


Now, since is a unitary matrix, we must have that , from which,

from which the inequalities in eq. 10 are easily derived.

Observe that theorem 6 tells us that the energy in a signal x gets redistributed into potentially fewer coefficients of its sparse representation s when Ψ is a sparsifying matrix for x. We can colloquially say that the energy gets squeezed to the right in the ordering of the transform when compared to the ordering of the signal. See fig. 4, for example.
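The behavior described in theorem 6 can be observed with the orthonormal DCT-II, a classic sparsifying transform. In the sketch below (our own construction, for illustration), the signal is built to be exactly sparse in the DCT basis, so its transform carries the same energy in far fewer significant coefficients:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: rows are cosines, C @ C.T = I."""
    C = np.sqrt(2.0 / n) * np.cos(
        np.pi * np.outer(np.arange(n), np.arange(n) + 0.5) / n)
    C[0] /= np.sqrt(2.0)
    return C

def s_p(x, p):
    # p-th power of the weak-l^p norm, via theorem 1
    xs = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]
    return float((np.arange(1, xs.size + 1) * xs**p).max())

n = 64
Psi = dct_matrix(n)
coeffs = np.zeros(n)
coeffs[[3, 7, 12]] = [2.0, -1.5, 1.0]    # three active DCT atoms
x = Psi.T @ coeffs                        # dense signal, sparse under Psi
s = Psi @ x                               # its sparse representation

print(np.linalg.norm(x), np.linalg.norm(s))  # equal: unitary transform
print(s_p(x, 0.5), s_p(s, 0.5))              # s_p drops: s is sparser than x
```

The two energies agree to machine precision, while s_p falls sharply under the transform, i.e., s <_{s_p} x, so Ψ is a sparsifying matrix for this x.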

Figure 4: Graph of the squared entries of the orderings and of a signal and its sparse transform , respectively, under a sparsifying matrix for that signal. The red circle denotes the point , with in this example.
Theorem 7

Let be a sparsifying matrix for , and let be a vector whose transform is . Then . Moreover, if , then .

Proof 5

Let and . Since is a sparsifying matrix for , . But, by definition, , hence is a lower bound for . Therefore, . Hence, , where .

Now assume that . Definition 2 and eq. 4 imply that . By definition, this means that for all , there exists a such that for all , . Let , then there exists a such that for all we have that,

Since is a continuous function of , and is compact, there exists a such that , therefore . Since is a sparsifying matrix for , we have that . Hence, .

In an observation similar to, but in the opposite direction of, what happens to the energy in view of theorem 6: here, the sparsity of the transform of a signal under a sparsifying matrix gets shifted to the left of the sparsity value of said signal.

In the proof of theorem 7, given a vector , we used the notation to talk about a value of for which reaches its minimum as a function of on , resulting in . If is such that , by theorem 4, is strictly convex as a function of , making unique in this case. When , from eq. 4, we can set , and think of as the unique value of for which . These results and observations can be summarized in the following theorem.

Theorem 8

Given a vector , if , then there is a unique such that . Recall that, , and that is the smallest integer such that .

A couple of remarks are in order. The rather technical condition in theorem 8, for a given vector x, that

is necessary for there to be a unique value p such that s(x) = s_p(x), is not uncommon when x is a random or semi-structured vector. We do not have a proof of this statement, but it is our empirical observation that all vectors s that are the transform of some real life vector x, such as a natural image, under a unitary matrix Ψ, satisfy the condition summarized in eq. 12. Moreover, even if the set of minimizers of