Coherence and sufficient sampling densities for reconstruction in compressed sensing

02/12/2013 ∙ by Franz J. Király, et al. ∙ Freie Universität Berlin, UCL

We give a new, very general, formulation of the compressed sensing problem in terms of coordinate projections of an analytic variety, and derive sufficient sampling rates for signal reconstruction. Our bounds are linear in the coherence of the signal space, a geometric parameter independent of the specific signal and measurement, and logarithmic in the ambient dimension where the signal is presented. We exemplify our approach by deriving sufficient sampling densities for low-rank matrix completion and distance matrix completion which are independent of the true matrix.


1 Introduction

1.1 Compressed Sensing, Randomness, and Sampling Rates

Compressed sensing is the task of recovering a signal $x$ from some low-complexity measurement $\Omega(x)$, the sample of $x$. The sampling process, that is, the acquisition process of the sample $\Omega(x)$, is usually random and undirected, and it comes with a so-called sampling rate. Increasing the sampling rate usually improves the quality of the reconstruction but comes at a cost, whereas decreasing it makes the acquisition easier but hinders reconstruction. Therefore, a central question of compressed sensing is what the minimal sampling rate has to be in order to allow reconstruction of the signal from the sample.

The oldest and best-known example of this is the Nyquist sampling theorem [14], which roughly states that a signal bandlimited to frequency $f$ has to be sampled with a frequency/density of at least $2f$ in order to allow reconstruction. Landau's [12, Theorem 1] and a simple Poisson approximation (or coupon collector) argument imply that for uniform random sampling, to ascertain a density of $2f$, a rate of $m \log m$ total samples, where $m = 2f$ is the number of equidistant samples needed, is necessary and sufficient. The ratio $1/\log m$ can be interpreted as the average informativity of one equidistant (non-random) measurement, in the sense of how much it contributes to reconstruction.
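The $m \log m$ rate is exactly the coupon collector effect. The following quick simulation (ours, not from the paper) draws uniform samples from $m$ Nyquist cells until every cell has been hit at least once; the empirical count tracks $m \log m$ closely.

```python
import numpy as np

rng = np.random.default_rng(0)

def draws_to_cover(m: int) -> int:
    """Uniform draws from {0, ..., m-1} until every cell is hit at least once."""
    seen, draws = set(), 0
    while len(seen) < m:
        seen.add(int(rng.integers(m)))
        draws += 1
    return draws

for m in [100, 1000, 10000]:
    avg = np.mean([draws_to_cover(m) for _ in range(20)])
    print(f"m={m:6d}  avg draws={avg:9.0f}  m*log(m)={m * np.log(m):9.0f}")
```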

Sufficient sampling rates of the form “$C \cdot m \log m$”, with some problem-specific constant $C$ and $m$ a natural measure of the problem size, appear throughout the modern compressed sensing literature. A few examples are: image reconstruction [6, Theorem 1], matrix completion [2, Theorems 1.1, 1.2], dictionary learning [17, Theorems 7, 8], and phase retrieval [1, Theorem 1.1]. Usually, these bounds are derived by analyzing some optimization problem, or information-theoretic thresholds, under assumptions which, while not overly restrictive, are very specific to one problem.

We argue that those bounds on the sampling rates are epiphenomena of guiding principles in compressed sensing, similar to the Nyquist sampling bound. To do this, we give a general formulation of the problem in which, associated to the sampling process, there are two numerical invariants: the coherence $c(X)$ of the signal space $X$ and the ambient dimension $n$. The “dictionary” between the classical setting and our novel framework for compressed sensing is, intuitively:

Classical                         Compressed Sensing
Signal Space                      Signal Manifold $X$
Sampling (random)                 Random Projection $\Omega$
Bandlimit $f$                     Manifold Dimension $d = \dim X$
Sampling Density $2f$             Ambient Dimension $n$
Informativity $1/\log m$          Coherence $c(X)$
                                  (in general $d/n \le c(X) \le 1$)
Sampling Rate $m \log m$          Sampling Probability $p$

Our main Theorem 1 says that a sampling rate of $p = O(c(X) \log n)$ is sufficient for signal reconstruction, w.h.p. Further, we will see that $c(X) \ge d/n$, where $d = \dim X$ is the number of degrees of freedom in choosing the signal $x$. This relation shows that, when the coherence is near the lower bound, $n \cdot c(X)$ is in complete analogy with the bandlimit from the classical setting, and that $O(d \log n)$ independently chosen measurements are sufficient for signal reconstruction. Coherence captures the structural constraint on the sufficient sampling rate, and the $\log n$ term appears because measurements are chosen independently, with the same probability. This result is existentially optimal, since the logarithmic terms cannot be removed in some examples.

1.2 The Mathematical Sampling Model

We will consider the following sampling model for compressed sensing: the signals will be considered as being contained in $\mathbb{K}^n = \operatorname{span}(e_1, \dots, e_n)$, with $e_1, \dots, e_n$ the standard basis of elementary vectors. The field $\mathbb{K}$ is always $\mathbb{R}$ or $\mathbb{C}$ in this paper.

This setup imposes no restriction on the signal $x$, since we are only fixing a finite/discrete representation by $n$ numbers, and the continuous case is recovered by taking the limit in $n$. Examples include representing $x$ by a bandlimited discrete Fourier transform, or as a finite matrix instead of a kernel function or graphon.

We will model the sampling process by a map $\Omega$, which is chosen uniformly from a restricted family. For example, in the case of the bandlimited signal, the mapping would be initially linear, of the form $x(t_j) = \sum_k c_k \phi_k(t_j)$, with $t_j$ being the chosen sampling points, $\phi_k$ the Fourier basis, and $c_k$ the Fourier coefficients of $x$. The map $\Omega$ would send the coefficients $c_k$ to the measurements $x(t_j)$. To obtain a universal formulation, we now perform a change of parameterization on the left side, by changing it to contain all possible measurements, instead of the signal. In the example, the signal would be parameterized not by the Fourier coefficients $c_k$, but instead by all possible values $x(t)$, with $t$ ranging over a set which is much larger than the set of actual measurements contained in one sample.

This re-parameterization makes the ambient dimension $n$ large, but it makes the single coordinates dependent as well. (In the example, the dependencies are linear.) In other words, under the re-parameterization, all possible signals lie in a low-dimensional submanifold $X$ of $\mathbb{K}^n$. The sampling process then becomes a coordinate projection map $\Omega$ onto entries of the true signal, chosen uniformly and independently.
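A minimal numerical sketch of this re-parameterization (ours; the sizes $n = 256$, $s = 8$ and the Bernoulli rate are arbitrary choices): a bandlimited signal with $s$ Fourier coefficients, re-parameterized by all $n$ of its sample values, lies on the $s$-dimensional column span of a partial Fourier matrix, and sampling is literally a coordinate projection.

```python
import numpy as np

n, s = 256, 8                                        # ambient dimension, Fourier modes
t = np.arange(n) / n
F = np.exp(2j * np.pi * np.outer(t, np.arange(s)))   # n x s Fourier basis
c = np.random.default_rng(1).standard_normal(s)      # Fourier coefficients
x = F @ c                                            # the signal, a point of X = colspan(F)

observed = np.random.default_rng(2).random(n) < 0.3  # Bernoulli(p) coordinate mask
# In the linear case, reconstruction is least squares on the observed rows of F:
c_hat, *_ = np.linalg.lstsq(F[observed], x[observed], rcond=None)
print(np.allclose(c_hat, c))                         # True whenever F[observed] has rank s
```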

Under the re-parameterization, $\Omega$ is chosen independently of the problem. All the structural information about the signal is moved to the manifold $X$, which determines the dependencies between the coordinates. The key concept of coherence will then be a property of $X$, as opposed to the usual view where compression and sampling constraints are assumed for the particular signal or enforced by special assumptions on the sampling operator $\Omega$.

1.3 Contributions

Our main contributions, discussed in more detail below, are:

  • A problem-independent formulation of compressed sensing.

  • A problem-independent generalization of the sampling density, given by the coherence $c(X)$ of the signal class $X$. Determination of coherence for linear sampling, matrix completion, combinatorial rigidity and kernel matrices.

  • Derivation of problem-independent bounds for the sampling rate, taking the form $p = O(c(X) \log n)$. We recover bounds known in the compressed sensing literature, and derive novel bounds for combinatorial rigidity and kernel matrices.

  • Explanation of the $\log n$ term as an epiphenomenon of sampling randomness.

1.4 Main theorem: coherence and reconstruction

Our main result relates the coherence of the signal space $X$ to a sampling rate which suffices to achieve reconstruction of a typical signal $x \in X$. We show:

Theorem 1.

Let $X \subseteq \mathbb{K}^n$ be an irreducible algebraic variety, let $\Omega$ be the projection onto a set of coordinates, each chosen independently with probability $p$, and let $x \in X$ be generic. There is an absolute constant $C$ such that if

$$p \;\ge\; \min\bigl(1,\; C \cdot c(X) \cdot \log n\bigr),$$

then $x$ is reconstructible from $\Omega(x)$ (i.e., the fiber $\Omega^{-1}(\Omega(x)) \cap X$ is finite) with probability at least $1 - n^{-2}$; the exponent can be increased at the cost of increasing $C$.

Here, generic can be taken to mean that if $x$ is sampled from a (Hausdorff-)continuous probability density on $X$, then the statement holds with probability one.
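For concrete varieties, reconstructibility in the sense of Theorem 1 can be tested numerically through Lemma B.1 in the appendix: for generic $x$, the fiber is finite exactly when the coordinate projection restricted to the tangent space $T_x X$ is injective, i.e., when the observed rows of a tangent basis retain full rank. The sketch below is ours (the function name and the rank-1 example are our own choices, not the paper's code).

```python
import numpy as np

def finitely_reconstructible(T: np.ndarray, observed: np.ndarray) -> bool:
    """T: n x k matrix whose columns span the tangent space T_x X;
    generic finiteness of the fiber <=> the observed rows keep its rank (Lemma B.1)."""
    return np.linalg.matrix_rank(T[observed]) == np.linalg.matrix_rank(T)

# Example: X = rank-1 2x2 matrices in K^4 (dim X = 3), x = u v^T.
rng = np.random.default_rng(3)
u, v = rng.standard_normal(2), rng.standard_normal(2)
# The tangent space at u v^T is {a v^T + u b^T}, spanned by 4 flattened matrices:
T = np.column_stack([np.outer(e, v).ravel() for e in np.eye(2)] +
                    [np.outer(u, e).ravel() for e in np.eye(2)])   # 4 x 4, rank 3

print(finitely_reconstructible(T, np.array([True, True, True, False])))
# True: three generic entries determine a rank-1 2x2 matrix up to finitely
# many completions (here uniquely: m22 = m12 * m21 / m11).
```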

1.5 Applications

We illustrate Theorem 1 by a number of examples, which will also show that the bounds on the sampling rate there cannot be lowered much.

Linear Sampling and the Nyquist bound

When the sampling manifold $X$ is a $d$-dimensional linear subspace $U$ of $\mathbb{K}^n$, as in the case of the bandlimited signal, Proposition 2.4 below implies that $c(U) \ge d/n$. We then recover a statement which is qualitatively similar to the random version of the Nyquist bound:

Theorem 2.

Let $U \subseteq \mathbb{K}^n$ be a $d$-dimensional linear space, and let $x \in U$ be generic. There is an absolute constant $C$, such that if each coordinate of $x$ is observed independently with probability

$$p \;\ge\; \min\bigl(1,\; C \cdot c(U) \cdot \log n\bigr),$$

then $x$ can be reconstructed from the observations with probability at least $1 - n^{-2}$.

If $U$ is maximally incoherent in the sense of Definition 2.7, then $c(U) = d/n$, and the required expected number of samples is $O(d \log n)$, which is in line with our discussion in section 1.1 regarding the Nyquist criterion. In section 1.6, we will give a simple example showing that this cannot be improved.
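An empirical illustration of Theorem 2 (ours, with arbitrary sizes): for a random, hence incoherent, $d$-flat, reconstruction succeeds as soon as the observed rows of a basis matrix have rank $d$. Note that for such a generic flat the transition already happens at roughly $d$ samples; the logarithmic factor in Theorem 2 covers worst-case incoherent geometries such as the disjoint-support example of section 1.6.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 500, 5
U = np.linalg.qr(rng.standard_normal((n, d)))[0]   # basis of a random d-flat

def success(p: float) -> bool:
    mask = rng.random(n) < p                       # Bernoulli(p) coordinate sample
    return np.linalg.matrix_rank(U[mask]) == d     # observed rows determine x

for p in [0.005, 0.02, 0.08]:
    rate = np.mean([success(p) for _ in range(200)])
    print(f"p={p:5.3f}  expected samples={p * n:5.1f}  success rate={rate:.2f}")
```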

Low-Rank Matrices

Another important application is low-rank matrix completion. Here, $X$ is the determinantal variety of $n \times n$ matrices of rank at most $r$, which we show has $c(X) = r(2n - r)/n^2$. We therefore obtain:

Theorem 3.

Let $r$ be fixed, and let $A$ be a generic $n \times n$ matrix of rank at most $r$. There is an absolute constant $C$, such that if each entry of $A$ is observed independently with probability

$$p \;\ge\; \min\Bigl(1,\; C \cdot \frac{r(2n-r)}{n^2} \cdot \log n\Bigr),$$

then $A$ can be reconstructed from the observations with probability at least $1 - n^{-2}$.

Bounds of this type have been observed in [4], [9] and [10]; however, all of these results are stated in the context of some reconstruction method, and therefore make additional incoherence-type assumptions on the matrix $A$ and the sampling. Thus, the novelty of Theorem 3 is that it applies to a full-measure subset of low-rank matrices. Also, it has been noted already in [4] that the order of the bound in $n$ cannot be improved.

The analogue to Theorem 3 holds, with a coherence of the same order, if $A$ is symmetric. We will also show that similar types of bounds hold for kernel matrices.
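The finite completability asserted by Theorem 3 can be checked numerically with the tangent-space test of Lemma B.1, since the tangent space to the determinantal variety at $A = UV^T$ is $\{U B^T + C V^T\}$. The sketch below is ours; the sizes and the sampling rate $p = 0.6$ are arbitrary choices well inside the theorem's regime.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r, p = 30, 30, 2, 0.6
U, V = rng.standard_normal((m, r)), rng.standard_normal((n, r))
A = U @ V.T                                        # generic rank-r matrix

# A spanning set of the tangent space at A, flattened to vectors in R^(m*n):
T = np.column_stack(
    [np.outer(U[:, k], e).ravel() for k in range(r) for e in np.eye(n)] +
    [np.outer(e, V[:, k]).ravel() for k in range(r) for e in np.eye(m)])
d = np.linalg.matrix_rank(T)                       # = r(m + n - r) = 116 here

mask = rng.random(m * n) < p                       # Bernoulli(p) entry mask
print(np.linalg.matrix_rank(T[mask]) == d)         # True: A is finitely completable
```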

Distance Matrices

A further related topic is the complexity of distance matrices, in which either the signal or the sampling exhibits the dependencies of a distance matrix (also sometimes called a similarity matrix). The best-known case is that of Euclidean distance matrices: an $n \times n$ matrix $D$ such that $D_{ij} = \|p_i - p_j\|^2$ for some set of points $p_1, \dots, p_n$ in $\mathbb{R}^r$. The sampling rate in distance matrix completion describes (a) the density of random measurements needed to reconstruct an incomplete distance matrix and, simultaneously, (b) the sampling threshold at which the points $p_i$ can be triangulated. On the theoretical side, the asymptotics of this phase transition has attracted a lot of attention in the context of combinatorial rigidity theory [8], where the exact bound for this phase transition has not been known except in the cases $r = 1$ and $r = 2$, i.e., points on the line and in the plane. By bounding the coherence of the set of distance matrices as $c \le C \cdot r/n$ for some global constant $C$, we determine this sampling rate for all dimensions $r$:

Theorem 4.

Let $r$ be fixed, and let $D$ be a generic distance matrix of $n$ points in $r$-space. There is a global constant $C$, such that if each entry of $D$ is observed independently with probability

$$p \;\ge\; \min\Bigl(1,\; C \cdot \frac{r \log n}{n}\Bigr),$$

then $D$ can be reconstructed from the observations with probability at least $1 - n^{-2}$.

In the language of rigidity theory [8], Theorem 4 says that with the stated sampling rate $p$, the random graph $G(n, p)$ is generically rigid in dimension $r$ w.h.p. Because the minimum degree of a graph that is generically rigid in dimension $r$ must be at least $r$, the order of the lower bound on $p$ cannot be improved by more than a factor depending only on $r$; again, this can be seen by a coupon collector's argument. Our result can be seen as a density extension of Laman's Theorem [11] to dimensions $r \ge 3$; Laman's Theorem is known [7] to imply a necessary and sufficient bound on the sampling density in dimension $2$. We will also argue that similar results hold for kernel distance matrices as well.
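Generic rigidity, and hence the reconstructability of a sampled distance matrix, is itself a rank condition: a generic framework of $n$ points in $r$-space is rigid iff its rigidity matrix has rank $rn - r(r+1)/2$. The following simulation of Theorem 4's setting is ours (the parameters are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, p = 60, 3, 0.4                       # sampling rate of the distance entries
P = rng.standard_normal((n, r))            # generic points in r-space
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

R = np.zeros((len(edges), n * r))          # rigidity matrix: one row per sample
for row, (i, j) in enumerate(edges):
    R[row, i * r:(i + 1) * r] = P[i] - P[j]
    R[row, j * r:(j + 1) * r] = P[j] - P[i]

target = n * r - r * (r + 1) // 2          # dof of the points minus rigid motions
print(np.linalg.matrix_rank(R) == target)  # True <=> G(n, p) generically rigid here
```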

1.6 Fixed coordinates and the logarithmic term

Before continuing, we want to highlight an important conceptual point. Since all the projections are linear, it could be counter-intuitive that the number of measurements needed for reconstruction is on the order of $d \log n$, with $d = \dim X$, and not simply $d$, especially in light of the following (probably folklore) theorem, proven in the Appendix:

Theorem 5.

Let $X \subseteq \mathbb{K}^n$ be an algebraic variety of dimension $d$, and let $x \in X$. Let $\pi : \mathbb{K}^n \to \mathbb{K}^m$ be a generic linear map. If $m \ge d + 1$, then $x$ is uniquely determined by the values of $\pi(x)$ and the condition that $x \in X$.

Thus, we could guess naïvely that $d + 1$ total samples are enough. The subtlety that the naïve guess misses is that we are dealing with coordinate projections, and that $X$ can be inherently aligned with the coordinate system in a way that requires more samples. Consider the case of a linear space $U$, as in Theorem 2, and assume that $n/d$ is an integer. Let $U$ be spanned by $d$ vectors which are supported on disjoint sets of $n/d$ coordinates each and have equal entries in the non-zero coordinates. It is easy to see that $c(U) = d/n$, which is minimal. However, to have any hope of reconstructing a point $x \in U$, we need to measure at least one coordinate in the support of each spanning vector. A coupon collector's argument then shows that, indeed, $\Theta(d \log d)$ samples are required, and this is $\Theta(d \log n)$ when $d$ is polynomial in $n$. At the other extreme, if $U$ is spanned by the coordinate vectors $e_1, \dots, e_d$, then $c(U) = 1$ and $\Theta(n)$ samples are needed. Examples like these illustrate why coherence is the right concept: it depends on the coordinate system chosen for $\mathbb{K}^n$ and the embedding of $U$. Dimension, on the other hand, is intrinsic to $U$, so it can't capture the behavior of coordinate projections.
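The two extremes can be simulated directly (our sketch, with sampling by uniform draws with replacement; $n = 1024$ and $d = 32$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
n, d = 1024, 32

def samples_needed(supports) -> int:
    """Uniform coordinate draws until every spanning vector's support is hit."""
    hit, count = [False] * len(supports), 0
    while not all(hit):
        i = int(rng.integers(n)); count += 1
        for k, s in enumerate(supports):
            if i in s:
                hit[k] = True
    return count

# Extreme 1: d disjoint supports of size n/d (minimal coherence c(U) = d/n):
disjoint = [set(range(k * (n // d), (k + 1) * (n // d))) for k in range(d)]
# Extreme 2: d singleton supports e_1, ..., e_d (maximal coherence c(U) = 1):
aligned = [{k} for k in range(d)]

print(np.mean([samples_needed(disjoint) for _ in range(50)]))  # ~ d log d ~ 130
print(np.mean([samples_needed(aligned) for _ in range(50)]))   # ~ n log d ~ 4200
```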

1.7 Acknowledgements

FK is supported by Mathematisches Forschungsinstitut Oberwolfach (MFO), and LT by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement no 247029-SDModels.

2 Coherence and Signal Reconstruction

2.1 Coherence and Bounds on Coherence

In this section, we introduce our concepts, and define what the coherence should be. As discussed in section 1.2, the sampling process consists of randomly and independently observing coordinates of the signal without repetition, and this is no restriction of generality, as we have also discussed there.

Definition 2.1

Let $X \subseteq \mathbb{K}^n$ be an analytic variety. Fix coordinates for $\mathbb{K}^n$. Let $S$ be the Bernoulli random experiment yielding a random subset of $\{1, \dots, n\}$, where each $i$ is contained in $S$ independently with probability $p$ (the sampling density). We will call the projection map $\Omega$ of $\mathbb{K}^n$ onto the coordinates selected by $S$, which is an analytic-map-valued random variable, a random sample of $X$ with sampling rate $p$.

The coherence takes the place of the factor of oversampling needed to guarantee reconstruction. Intuitively, it can also be interpreted as the infinitesimal randomness of a signal. We define it first for linear sampling, as in the case of the bandlimited signal discussed in section 1.2. Figure 1 (a) gives a schematic of the concept.

Definition 2.2

Let $U \subseteq \mathbb{K}^n$ be a $d$-dimensional affine space (for short, a $d$-flat). Let $P_U$ be the orthogonal (resp. unitary) projection operator onto the linear part of $U$, and let $e_1, \dots, e_n$ be a fixed orthonormal basis of $\mathbb{K}^n$. Then the coherence of $U$ with respect to the basis is defined as

$$c(U) \;=\; \max_{1 \le i \le n} \|P_U e_i\|^2.$$

When not stated otherwise, the basis will be the canonical basis of the ambient space.

Note that coherence is always coherence with respect to the fixed coordinate system of the sampling regime, and this will be understood in what follows.
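Definition 2.2 is directly computable: if the columns of a matrix $Q$ form an orthonormal basis of $U$, then $\|P_U e_i\|^2$ is the squared norm of the $i$-th row of $Q$. A small sketch of ours:

```python
import numpy as np

def coherence(Q: np.ndarray) -> float:
    """Coherence of the flat spanned by the orthonormal columns of Q (real case):
    c(U) = max_i ||P_U e_i||^2 = largest squared row norm of Q."""
    return float((Q ** 2).sum(axis=1).max())

n, d = 100, 5
Q = np.linalg.qr(np.random.default_rng(8).standard_normal((n, d)))[0]
print(d / n, coherence(Q), 1.0)       # Proposition 2.4: d/n <= c(U) <= 1
print(coherence(np.eye(n)[:, :d]))    # coordinate-aligned flat: c(U) = 1, maximal
```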

Remark 2.3

Let $U$ be a $d$-flat in $\mathbb{K}^n$. Then the coherence does not depend on whether we consider $U$ as a $d$-flat in $\mathbb{K}^n$, or as a $d$-flat in $\mathbb{K}^N$ for $N \ge n$ (assuming the chosen basis of $\mathbb{K}^N$ contains the basis of $\mathbb{K}^n$). Moreover, if $\mathbb{K} = \mathbb{R}$, the coherence of $U$ equals that of the complex closure of $U$. Therefore, while coherence depends on the choice of coordinate system, it is invariant under extensions of the coordinate system.

Figure 1: Schematic of coherent and incoherent spaces: (a) The projections of the coordinate vectors onto the linear space $U$ are all of roughly the same size, so $U$ has nearly minimal coherence. The flat $V$ is a translate of the span of a coordinate vector, giving it maximal coherence; observing the other coordinates gives no information about a point in $V$. (b) A variety $X$ in $\mathbb{K}^n$ and the tangent flats at two points. The coherence of $X$ is close to minimal, as witnessed by the point $x$.

A crucial property of the coherence is that it is bounded in both directions:

Proposition 2.4.

Let $U$ be a $d$-flat in $\mathbb{K}^n$. Then,

$$\frac{d}{n} \;\le\; c(U) \;\le\; 1,$$

and both bounds are achieved.

Proof.

Without loss of generality, we can assume that $0 \in U$ and therefore that $U$ is linear, since coherence, as defined in Definition 2.2, is invariant under translation of $U$. First we show the upper bound. For that, note that for an orthogonal projection operator $P$ and any vector $v$, one has $\|P v\| \le \|v\|$. Thus, by definition, $c(U) \le \max_i \|e_i\|^2 = 1$. For tightness, take $U$ to be the span of $e_1, \dots, e_d$. Now we show the lower bound. We proceed by contradiction. Assume $\|P_U e_i\|^2 < d/n$ for all $i$. This would imply

$$d \;=\; \|P_U\|_F^2 \;=\; \sum_{i=1}^n \|P_U e_i\|^2 \;<\; n \cdot \frac{d}{n} \;=\; d,$$

which is a contradiction, where in the first equality we used the fact that orthogonal projections onto a $d$-dimensional space have Frobenius norm $\sqrt{d}$. When $n/d$ is an integer, the tightness of the lower bound follows from the example in section 1.6. In general, it follows from the existence of finite tight frames¹ [5]. (¹We cordially thank Andriy Bondarenko for pointing this out.) ∎

We extend the coherence to arbitrary manifolds by minimizing over tangent spaces; see Figure 1 (b) for an example.

Definition 2.5

Let $X \subseteq \mathbb{K}^n$ be a (real or complex) irreducible analytic variety of dimension $d$ (affine or projective). Let $x \in X$ be a smooth point, and let $T_x X$ be the tangent $d$-flat of $X$ at $x$. We define

$$c(x, X) \;=\; c(T_x X).$$

If it is clear from the context in which variety we consider $x$ to be contained, we also write $c(x)$. Furthermore, we define the coherence of $X$ to be

$$c(X) \;=\; \inf_{x \in X^{\mathrm{sm}}} c(x, X),$$

where $X^{\mathrm{sm}}$ denotes the set of smooth points (= the so-called smooth locus) of $X$.

Remark 2.3 again implies that the coherence is invariant under the choice of ambient space and depends only on the coordinate system. Also, if $X$ is a $d$-flat, then the definitions of $c(X)$ given by Definitions 2.2 and 2.5 agree. Therefore, we again obtain:

Proposition 2.6.

Let $X \subseteq \mathbb{K}^n$ be an irreducible analytic variety of dimension $d$. Then, $d/n \le c(X) \le 1$, and both bounds are tight.

Proof.

Let $d = \dim X$. Irreducibility of $X$ implies that, at each smooth point $x \in X$, the tangent space $T_x X$ is a $d$-flat in $\mathbb{K}^n$. Both bounds and their tightness then follow from Proposition 2.4. ∎

Definition 2.7

An analytic variety $X \subseteq \mathbb{K}^n$ is called maximally incoherent if $c(X) = \dim(X)/n$.

2.2 The Main Theorem

With all concepts in place, we state our main result, which we recall from the introduction (Theorem 1).

Proof.

By the definition of coherence, for every $\varepsilon > 0$ there exists an $x' \in X$ such that $X$ is smooth at $x'$ and $c(x') \le c(X) + \varepsilon$. Now let $x \in X$ be generic; we can assume, by possibly changing $x'$, that $X$ is also smooth at $x$. Let $T$ and $T'$ be the respective tangent spaces at $x$ and $x'$. Note that $\Omega(x)$ is a point-valued discrete random variable, and $\Omega(T')$ is a flat-valued random variable. By the equivalence of the statements (iv) and (v) in Lemma B.1, it suffices to show that the operator

$$\frac{1}{p}\, P_{T'}\, \Omega^* \Omega\, P_{T'} \;-\; P_{T'}$$

is contractive, where $P_{T'}$ is the orthogonal projection from $\mathbb{K}^n$ onto $T'$, with probability at least $1 - n^{-2}$ under the assumptions on $p$. Let $d = \dim T'$ and let $u_1, \dots, u_d$ be an orthonormal coordinate system for $T'$. Then the sampling $\Omega$ has, when we consider $T'$ to be embedded into $\mathbb{K}^n$, the matrix representation

$$\Omega^* \Omega \;=\; \operatorname{diag}(\delta_1, \dots, \delta_n),$$

where the $\delta_i$ are independent Bernoulli random variables with $\mathbb{P}(\delta_i = 1) = p$ and $\mathbb{P}(\delta_i = 0) = 1 - p$. Thus, in matrix representation,

$$\frac{1}{p}\, P_{T'}\, \Omega^* \Omega\, P_{T'} - P_{T'} \;=\; \frac{1}{p} \sum_{i=1}^n (\delta_i - p)\, (P_{T'} e_i)(P_{T'} e_i)^*.$$

By Rudelson's Lemma B.2, it follows that

$$\mathbb{E}\, \Bigl\| \frac{1}{p}\, P_{T'}\, \Omega^* \Omega\, P_{T'} - P_{T'} \Bigr\| \;\le\; C_1 \sqrt{\frac{c(T') \log n}{p}}$$

for an absolute constant $C_1$, provided the right hand side is smaller than $1$. The latter is true if and only if

$$p \;\ge\; C_1^2\, c(T') \log n.$$

Now let $N$ be an open neighborhood of $x'$ in $X^{\mathrm{sm}}$ such that $c(y) \le c(X) + 2\varepsilon$ for all $y \in N$. Then, one can write

$$N \;=\; \bigcup_{y \in S} N_y$$

with a countable subset $S \subseteq N$. By construction of $N$, one has $c(y) \le c(X) + 2\varepsilon$ for all $y \in S$. Applying Talagrand's inequality in the form [2, Theorem 9.1], one obtains an exponential tail bound for the deviation of the operator norm above from its expectation, with an absolute constant $C_2$. Since $\varepsilon$ was arbitrary, it follows that contractivity, and thus reconstructibility, holds under the stated bound on $p$. Substituting $p$, and proceeding as in the proof of Theorem 4.2 in [2] (while changing absolute constants), one arrives at the statement. ∎

Remark 2.8

That the manifold $X$ in the theorem needs to be algebraic is no major restriction, since in the cases we are going to consider, the dependencies in $X$ will be algebraic, or can be made algebraic by a canonical transform. Moreover, Theorem 1 cannot be expected to hold for general analytic manifolds, since one might “piece together” pieces of manifolds with different identifiability characteristics. For such an object there is no global, prototypical generic behavior.

Remark 2.9

By the bounds given in Proposition 2.6, the best obtainable bound in Theorem 1 is $p \ge \min(1, C \cdot (\dim X / n) \cdot \log n)$, attained in the case where $X$ is maximally incoherent.

2.3 Coherence of subvarieties and secants

In the following, we derive some further results on how coherence behaves under restriction and under summation of signals, which will prove useful for computing or bounding coherence in specific examples.

Lemma 2.10.

Let $U$ be a $d$-flat, and let $X \subseteq U$ be a subvariety. Then, $c(X) \le c(U)$.

Proof.

We first prove the statement for the case where $X = V$ is a flat; without loss of generality one can then assume that $0 \in V$. Let $P_U$ be the orthogonal projection onto $U$, and similarly $P_V$ the orthogonal projection onto $V$. Since $V \subseteq U$, it holds that $\|P_V v\| \le \|P_U v\|$ for any $v$. Thus,

$$c(V) \;=\; \max_i \|P_V e_i\|^2 \;\le\; \max_i \|P_U e_i\|^2 \;=\; c(U).$$

The statement for the case where $X$ is an irreducible variety follows from the statement for vector spaces. Namely, for any smooth $x \in X$, it implies $c(x, X) \le c(U)$, since the tangent space of $X$ at $x$ is contained in (the linear part of) $U$. By taking the infimum, we obtain the statement. ∎

Lemma 2.11.

Let $X, Y \subseteq \mathbb{K}^n$ be analytic varieties, and let $Z = X + Y = \{x + y : x \in X,\, y \in Y\}$ be the sum of $X$ and $Y$. Then, $c(Z) \le c(X) + c(Y)$.

Proof.

Denote $Z = X + Y$, and let $z \in Z$ be an arbitrary smooth point. By definition, there are smooth $x \in X$ and $y \in Y$ such that $z = x + y$. Let $T_x$ be the tangent space to $X$ at $x$, and let $T_y$ be the tangent space of $Y$ at $y$. An elementary calculation shows $T_z Z \subseteq T_x + T_y$, thus $c(z, Z) \le c(T_x + T_y)$ by Lemma 2.10. Since $z$ was arbitrary, we have $c(Z) \le c(X) + c(Y)$. ∎

Remark 2.12

In general, it is false that $c(X + Y) \le \max(c(X), c(Y))$. Consider for example $X = \operatorname{span}(e_1 + e_2)$ and $Y = \operatorname{span}(e_1 - e_2)$ in $\mathbb{K}^2$: both have coherence $1/2$, while their sum is all of $\mathbb{K}^2$ and has coherence $1$.

3 Coherence for Matrix Completion, Rigidity, and Kernels

3.1 Low-Rank Matrix Completion: the Determinantal Variety

In this section, we compute the coherence for completion of non-symmetric and symmetric bounded-rank matrices, and then apply Theorem 1 to obtain sufficient sampling rates for identifiability in matrix completion.

Definition 3.1

Denote by $\mathcal{M}(m \times n, r)$ the set of matrices in $\mathbb{K}^{m \times n}$ of rank $r$ or less, and by $\mathcal{S}(n, r)$ the set of symmetric real resp. Hermitian complex $n \times n$ matrices of rank $r$ or less. Since the matrices in $\mathcal{S}(n, r)$ are symmetric resp. Hermitian, we will consider it as canonically embedded in $\binom{n+1}{2}$-space.

$\mathcal{M}(m \times n, r)$ is called the determinantal variety of $m \times n$-matrices of rank (at most) $r$, and $\mathcal{S}(n, r)$ the determinantal variety of symmetric $n \times n$-matrices of rank (at most) $r$.

We first obtain the coherences of fixed matrices:

Proposition 3.2.

Let $A \in \mathcal{M}(m \times n, r)$ have rank exactly $r$, let $V$ be the row span of $A$, and let $U$ be the column span of $A$. Then, $c(A) = c(U) + c(V) - c(U)\,c(V)$, and, if $A$ is symmetric resp. Hermitian with span $U$, then $c(A) = 2c(U) - c(U)^2$.

Proof.

The calculation leading to [2, Equation 4.9] shows in both cases that $\|P_{T_A}(e_i e_j^*)\|^2 = \|P_U e_i\|^2 + \|P_V e_j\|^2 - \|P_U e_i\|^2 \|P_V e_j\|^2$, from which the statement follows. ∎

Proposition 3.3.

One has $c(\mathcal{M}(m \times n, r)) = \frac{r(m+n-r)}{mn}$ and $c(\mathcal{S}(n, r)) = \frac{r(2n-r)}{n^2}$. In particular, $\mathcal{M}(m \times n, r)$ is maximally incoherent, whereas $\mathcal{S}(n, r)$ is not.

Proof.

We recall the fact that for any pair of linear $r$-flats $U$ and $V$ in $\mathbb{K}^m$- resp. $\mathbb{K}^n$-space, there exists an $A \in \mathcal{M}(m \times n, r)$ such that the row span of $A$ is exactly $V$, and the column span of $A$ is exactly $U$. Similarly, there is $A \in \mathcal{S}(n, r)$ such that row and column span of $A$ are equal to $U$. By Proposition 2.4, there exist $r$-flats $U$ and $V$ with $c(U) = r/m$ and $c(V) = r/n$. Therefore, by Proposition 3.2, there is $A \in \mathcal{M}(m \times n, r)$ with $c(A) = \frac{r}{m} + \frac{r}{n} - \frac{r^2}{mn} = \frac{r(m+n-r)}{mn}$, so $c(\mathcal{M}(m \times n, r)) = \frac{r(m+n-r)}{mn}$ follows from the lower bound in Proposition 2.6, since this value equals $\dim \mathcal{M}(m \times n, r)/(mn)$. For the equality $c(\mathcal{S}(n, r)) = \frac{r(2n-r)}{n^2}$, it suffices to show that this value is attained and is a lower bound. The inequality $c(\mathcal{S}(n, r)) \le \frac{r(2n-r)}{n^2}$ follows from Proposition 3.2 by considering $U = V$ with $c(U) = r/n$. For the converse, let $A \in \mathcal{S}(n, r)$ be arbitrary. It suffices to show that there is $B \in \mathcal{M}(n \times n, r)$ with $c(B) \le c(A)$. Let $U$ be the row and column span of $A$, such that $c(U) \ge r/n$. Choosing a $B$ with column (and thus also row) span $U$ yields, by Proposition 3.2, a $B$ with $c(B) = 2c(U) - c(U)^2 = c(A)$, from which the claim follows. ∎
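Propositions 3.2 and 3.3 translate directly into a numerical recipe (ours): compute the coherences of the row and column spans from a singular value decomposition, combine them as in Proposition 3.2, and compare against the minimum $r(m+n-r)/(mn)$.

```python
import numpy as np

def flat_coherence(Q: np.ndarray) -> float:
    return float((Q ** 2).sum(axis=1).max())     # orthonormal columns assumed

def matrix_coherence(A: np.ndarray, r: int) -> float:
    """Coherence at A of the determinantal variety, via Proposition 3.2:
    c(A) = c(U) + c(V) - c(U) c(V), U = column span, V = row span."""
    U, _, Vt = np.linalg.svd(A)
    cu, cv = flat_coherence(U[:, :r]), flat_coherence(Vt[:r].T)
    return cu + cv - cu * cv

rng = np.random.default_rng(9)
m, n, r = 40, 40, 3
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
print(matrix_coherence(A, r))            # slightly above ...
print(r * (m + n - r) / (m * n))         # ... the minimum r(m+n-r)/(mn)
```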

From our main Theorem 1, we obtain the following corollary for low-rank matrices, namely Theorem 3 as stated in the introduction.

Proof.

Combine Theorem 1 with the explicit formula for the coherence in Proposition 3.3. ∎

3.2 Distance Matrix Completion: the Cayley-Menger Variety

In this section, we will bound the coherence of the Cayley-Menger variety, i.e., the set of Euclidean distance matrices, by relating it to symmetric low-rank matrices. We first introduce notation for the set of signals:

Definition 3.4

Assume $n \ge r + 2$. We will denote by $\mathrm{CM}_n(r)$ the set of real Euclidean distance matrices of $n$ points in $r$-space, i.e.,

$$\mathrm{CM}_n(r) \;=\; \bigl\{ D \in \mathbb{R}^{n \times n} \;:\; D_{ij} = \|p_i - p_j\|^2 \text{ for some } p_1, \dots, p_n \in \mathbb{R}^r \bigr\}.$$

Since the elements of $\mathrm{CM}_n(r)$ are symmetric and have zero diagonals, we will consider $\mathrm{CM}_n(r)$ as canonically embedded in $\binom{n}{2}$-space.

$\mathrm{CM}_n(r)$ is called the Cayley-Menger variety of $n$ points in $r$-space.
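Concretely (our sketch; "distance matrix" here means squared distances, as is usual in this algebraic setting), a Euclidean distance matrix of $n$ points in $r$-space has rank at most $r + 2$, which is the root of the relation between $\mathrm{CM}_n(r)$ and the symmetric determinantal varieties:

```python
import numpy as np

rng = np.random.default_rng(10)
n, r = 50, 3
P = rng.standard_normal((n, r))                      # n points in r-space
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # squared-distance matrix
print(np.linalg.matrix_rank(D))                      # r + 2 = 5

# Reason: D = g 1^T + 1 g^T - 2 P P^T with g_i = ||p_i||^2, hence rank <= r + 2.
g = (P ** 2).sum(1)
print(np.allclose(D, g[:, None] + g[None, :] - 2 * P @ P.T))  # True
```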

We will now continue with introducing maps related to the above sets:

Definition 3.5

We define canonical surjections

$$\gamma : (\mathbb{R}^r)^n \to \mathcal{S}(n, r), \quad (p_1, \dots, p_n) \mapsto \bigl( \langle p_i, p_j \rangle \bigr)_{ij},$$

$$\delta : (\mathbb{R}^r)^n \to \mathrm{CM}_n(r), \quad (p_1, \dots, p_n) \mapsto \bigl( \|p_i - p_j\|^2 \bigr)_{ij}.$$

Note that $\gamma$ and $\delta$ depend on $n$ and $r$, but these are not explicitly written as parameters in order to keep notation simple. Which map is referred to will be clear from the format of the argument.

We now define a “normalized version” of $\mathrm{CM}_n(r)$:

Definition 3.6

Denote by $S^r$ the unit sphere in $\mathbb{R}^{r+1}$. Then, define $\widehat{\mathrm{CM}}_n(r) = \gamma\bigl((S^r)^n\bigr)$, the set of Gram matrices of $n$ points on $S^r$. Since $\widehat{\mathrm{CM}}_n(r)$ contains only symmetric matrices with diagonal entries one, we will consider it as a subset of $\binom{n}{2}$-space.

Remark 3.7

The maps $\gamma$ and $\delta$ are algebraic maps, and the sets $\mathcal{S}(n, r)$, $\mathrm{CM}_n(r)$ and $\widehat{\mathrm{CM}}_n(r)$ are irreducible algebraic varieties². (²Irreducibility for $\mathcal{S}(n, r)$ and $\mathrm{CM}_n(r)$ follows from irreducibility of the respective ranges of the complex closures of $\gamma$ and $\delta$ and surjectivity; irreducibility of $\widehat{\mathrm{CM}}_n(r)$ can be shown in a similar way; note that the real maps are in general not surjective.)

Lemma 3.8.

For arbitrary $A \in \widehat{\mathrm{CM}}_n(r)$, one has $c\bigl(A, \widehat{\mathrm{CM}}_n(r)\bigr) \le c\bigl(A, \mathcal{S}(n, r+1)\bigr)$.

Proof.

If $\operatorname{rank} A = r + 1$, then $A$ is a smooth point of both varieties, the tangent space of $\widehat{\mathrm{CM}}_n(r)$ at $A$ is contained in that of $\mathcal{S}(n, r+1)$, and the statement follows from Lemma 2.10. If $\operatorname{rank} A < r + 1$, note that the coherence of a general matrix does not depend on the rank bound of the variety it is considered in, since $c(A, \mathcal{S}(n, r+1)) = c(A, \mathcal{S}(n, \operatorname{rank} A))$. Take $U$ to be the span of $A$. Then, take any matrix $B$ whose rows are a basis for the row span of $A$. Then, $B$ has span $U$, and by Proposition 3.2, $c(A, \mathcal{S}(n, \operatorname{rank} A)) = 2c(U) - c(U)^2$. The statement follows from this. ∎

The dimensions of the above varieties are classically known:

Proposition 3.9.

One has

$$\dim \mathcal{S}(n, r) \;=\; rn - \binom{r}{2}, \qquad \dim \mathrm{CM}_n(r) \;=\; \dim \widehat{\mathrm{CM}}_n(r) \;=\; rn - \binom{r+1}{2},$$

and the dimensions are the same for the complex closures.

Central in the proof will be the following map:

Definition 3.10

For $t > 0$ we will denote by

$$\nu_t : (\mathbb{R}^r)^n \to (t \cdot S^r)^n$$

the map which considers a point $p_i \in \mathbb{R}^r$ as a point in the hyperplane $\{x \in \mathbb{R}^{r+1} : x_{r+1} = t\}$ and projects it radially onto the sphere $t \cdot S^r$ (this is well-defined and analytic if we fix any branch of the square root).

Proposition 3.11.

For any generic $p \in (\mathbb{R}^r)^n$, one has $c\bigl(\delta(p), \mathrm{CM}_n(r)\bigr) \le \liminf_{t \to \infty} c\bigl(\gamma(\nu_t(p)/t), \widehat{\mathrm{CM}}_n(r)\bigr)$.

Proof.

Lemma 3.13 implies that the tangent flats of the two varieties converge to one another; the claim then follows from the continuity of the coherence of flats on the Grassmann manifold and Lemma 3.8. ∎

We can bound the coherence of $\mathrm{CM}_n(r)$ as follows:

Proposition 3.12.

There is a global constant $C$, such that $c(\mathrm{CM}_n(r)) \le C \cdot r/n$.

Proof.

It follows from [2, Lemma 2.2] that for any fixed set of singular values there exists a matrix with maximally incoherent row and column spans, i.e., with $c(U) = c(V) = r/n$, which has these singular values. By taking the singular values to be all one, and replacing the matrix with a symmetric matrix having the same row or column span, as in the proof of Proposition 3.3, we see by Proposition 3.2, together with Propositions 3.9 and 3.11, that $c(\mathrm{CM}_n(r)) \le C \cdot r/n$ for a global constant $C$. ∎

Our stated bounds on the number of samples required for distance matrix reconstruction then follow from the following lemma:

Lemma 3.13.

Let $p \in (\mathbb{R}^r)^n$ be generic. Let $D = \delta(p)$ and $G_t = \gamma(\nu_t(p)/t)$, and let $T$ resp. $T_t$ be the respective tangent flats. Then, for $t \to \infty$, we have convergence $T_t \to T$, where we consider the tangent flats as points on the real Grassmann manifold of $d$-flats in $\binom{n}{2}$-space, with $d = rn - \binom{r+1}{2}$.

Proof.

Note that

$$\nu_t(p_i) \;=\; \frac{t\,(p_i, t)}{\|(p_i, t)\|} \;=\; (p_i, t) + O(1/t).$$

An explicit calculation shows:

$$\frac{\partial\, \nu_t(p_i)_k}{\partial\, (p_j)_l} \;=\; \delta_{ij}\, \delta_{kl} + O(1/t),$$

where $\delta_{ij}$ is the usual Kronecker delta. Thus, the Jacobians of the two parameterizations agree up to terms of order $O(1/t)$, which implies that $T_t$ converges to $T$ in the Grassmann manifold when taking the limit $t \to \infty$; the statement directly follows. ∎

We can now prove Theorem 4, as stated in the introduction.

Proof.

This follows from Theorem 1 and the coherence bounds from Propositions 3.11 and 3.12. ∎

3.3 Kernels

Our framework can also be applied to analyze kernel functions via their coherence; namely, coherence can be interpreted as the average contribution one entry of the kernel matrix makes to characterizing the whole of the data. While the set of kernel matrices is in general not algebraic anymore, it is analytic, and can be related to the examples above, yielding the following result:

Theorem 6.

Let $k$ be a polynomial kernel or an RBF kernel, and let $K$ be an $n \times n$ symmetric kernel matrix of $n$ generic data points in $\mathbb{R}^r$. Then, there is a global constant $C$, such that if each entry of $K$ is observed independently with probability

$$p \;\ge\; \min\Bigl(1,\; C \cdot \frac{r \log n}{n}\Bigr),$$

then $K$ is determined by the observations with probability at least $1 - n^{-2}$.

Proof.

This follows from Theorems 3 and 4, and the fact that the entries of $K$ are finite-degree functions of the entries of either a rank-$r$ Gram matrix or a distance matrix. ∎

Theorem 6 means that while kernel matrices are not necessarily algebraic, they also exhibit sampling bounds governed by a coherence-equivalent of order $r/n$.
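The reduction in the proof of Theorem 6 can be made concrete for an RBF kernel (our sketch, with arbitrary parameters): the kernel matrix is an entrywise invertible function of the squared-distance matrix, so an observed subset of kernel entries carries exactly the same information as the corresponding subset of distance entries.

```python
import numpy as np

rng = np.random.default_rng(11)
n, r, gamma = 40, 3, 0.5
P = rng.standard_normal((n, r))                      # n generic points in r-space
D = ((P[:, None, :] - P[None, :, :]) ** 2).sum(-1)   # squared-distance matrix
K = np.exp(-gamma * D)                               # RBF kernel matrix

D_back = -np.log(K) / gamma                          # entrywise inverse link
print(np.allclose(D_back, D))                        # True: sampling K ~ sampling D
```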

4 Conclusion

We expect that the framework presented here will serve as the basis for investigations into a broader set of applications than just the examples here.

Also, we would expect an investigation of Theorem 1 for different sampling scenarios to be very interesting.

Namely, one can ask in which cases the $\log n$ term can be removed, in dependence on the particular sampling distribution or the signal space, keeping in mind that the coupon collector's lower bound is not compulsory in every scenario, and that various results exist which assert, under different sampling assumptions or other kinds of sparsity assumptions, bounds that are linear in the number of degrees of freedom.

Any result along these lines would potentially allow us to address the question of the required sampling rates needed for reconstruction of only a linear-size fraction of the coordinates of the signal, which is enough for many practical scenarios.

References

  • Candès et al. [2012] E. J. Candès, T. Strohmer, and V. Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Communications on Pure and Applied Mathematics, 2012.
  • Candès and Recht [2009] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9(6):717–772, 2009.
  • Candès and Romberg [2007] Emmanuel J. Candès and Justin Romberg. Sparsity and incoherence in compressive sampling. Inverse Problems, 23(3):969–985, 2007.
  • Candès and Tao [2010] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: near-optimal matrix completion. IEEE Trans. Inform. Theory, 56(5):2053–2080, 2010.
  • Casazza and Leon [2006] Peter G. Casazza and Manuel T. Leon. Existence and construction of finite tight frames. J. Concr. Appl. Math., 4(3):277–289, 2006.
  • Donoho [2006] David L. Donoho. Compressed sensing. Information Theory, IEEE Transactions on, 52(4):1289–1306, april 2006.
  • Jackson et al. [2007] Bill Jackson, Brigitte Servatius, and Herman Servatius. The 2-dimensional rigidity of certain families of graphs. J. Graph Theory, 54(2):154–166, 2007.
  • Kasiviswanathan et al. [2011] Shiva Kasiviswanathan, Cristopher Moore, and Louis Theran. The rigidity transition in random graphs. In Proc. of SODA’11, 2011.
  • Keshavan et al. [2010] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Trans. Inform. Theory, 56(6):2980–2998, 2010.
  • Király et al. [2012] Franz J. Király, Louis Theran, Ryota Tomioka, and Takeaki Uno. The algebraic combinatorial approach for low-rank matrix completion. Preprint, arXiv:1211.4116, 2012.
  • Laman [1970] G. Laman. On graphs and rigidity of plane skeletal structures. J. Engrg. Math., 4:331–340, 1970.
  • Landau [1967] Henry J. Landau. Necessary density conditions for sampling and interpolation of certain entire functions. Acta Mathematica, 117(1):37–52, 1967. ISSN 0001-5962.
  • Mumford [1999] David Mumford. The Red Book of Varieties and Schemes. Lecture Notes in Mathematics. Springer-Verlag Berlin Heidelberg, 1999.
  • Nyquist [1928] Harry Nyquist. Thermal agitation of electric charge in conductors. Phys. Rev., 32:110–113, Jul 1928.
  • Rudelson [1999] M. Rudelson. Random vectors in the isotropic position. J. Funct. Anal., 164(1):60–72, 1999.
  • Rudin [1976] Walter Rudin. Principles of mathematical analysis. McGraw-Hill Book Co., New York, third edition, 1976. International Series in Pure and Applied Mathematics.
  • Spielman et al. [2012] Daniel A. Spielman, Huan Wang, and John Wright. Exact recovery of sparsely-used dictionaries. Journal of Machine Learning Research - Proceedings Track, 23:37.1–37.18, 2012.

Appendix A Finiteness of Random Projections

The theorem, which will be proved in this section and which is probably folklore, states that for a general system of coordinates, a number of $\dim X + 1$ observations is sufficient for identifiability.

Theorem 7.

Let $X \subseteq \mathbb{K}^n$ be an algebraic variety or a compact analytic variety, and let $\pi : \mathbb{K}^n \to \mathbb{K}^m$ be a generic linear map. Let $x \in X$ be a smooth point. Then, $\pi|_X^{-1}(\pi(x))$ is finite if and only if $m \ge \dim X$, and $\pi|_X^{-1}(\pi(x)) = \{x\}$ if $m \ge \dim X + 1$.

Proof.

The theorem follows from the more general height-theorem-like statement that

$$\dim(X \cap H) \;=\; \dim X - k \quad \text{(or } X \cap H = \emptyset\text{)},$$

where $H$ is a generic flat of codimension $k$. Then, the first statement about generic finiteness follows by taking a generic $\pi$ and observing that $\pi^{-1}(\pi(x))$ is a generic flat of codimension $m$ through $x$ if $\pi$ is generic. That implies in particular that if $m \ge \dim X$, then the fiber $\pi|_X^{-1}(\pi(x))$ for a generic $\pi$ consists of finitely many points, which can be separated by an additional generic projection, thus the statement follows. ∎

Theorem 7 can be interpreted in two ways. On one hand, it means that any point on $X$ can be reconstructed from $\dim X + 1$ random linear projections. On the other hand, it means that if the chosen coordinate system in which $X$ lives is random, then $\dim X + 1$ measurements suffice for (finite) identifiability; no more structural information is needed. In view of Theorem 1, this implies that the log-factor and the probabilistic phenomena in identifiability occur when the chosen coordinate system is degenerate with respect to the variety $X$, in the sense that it is intrinsically aligned.
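The contrast between generic linear measurements (Theorem 7) and coordinate projections (Theorem 1) is visible numerically via the tangent criterion of Lemma B.1. In this sketch of ours, $X$ is the variety of rank-1 $3 \times 3$ matrices ($\dim X = 5$); for a generic Gaussian measurement map, the fiber becomes finite exactly at $m = \dim X$, with no coordinate-alignment obstruction and no log factor.

```python
import numpy as np

rng = np.random.default_rng(12)
u, v = rng.standard_normal(3), rng.standard_normal(3)
# Spanning set of the tangent space at x = u v^T, flattened into R^9 (rank 5):
T = np.column_stack([np.outer(e, v).ravel() for e in np.eye(3)] +
                    [np.outer(u, e).ravel() for e in np.eye(3)])

for m in [3, 4, 5, 6]:
    A = rng.standard_normal((m, 9))                  # generic linear measurements
    finite = np.linalg.matrix_rank(A @ T) == np.linalg.matrix_rank(T)
    print(m, finite)                                 # finite fiber iff m >= 5
```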

Appendix B Analytic Reconstruction Bounds and Concentration Inequalities

This appendix collects some analytic criteria and bounds which are used in the proof of Theorem 1. The first lemma relates local injectivity to generic finiteness and contractivity of a linear map. It is related to [2, Corollary 4.3].

Lemma B.1.

Let $f : X \to Y$ be a surjective map of complex algebraic varieties, and let $x \in X$ and $y = f(x)$ be smooth points of $X$ resp. $Y$. Let

$$df : T_x X \to T_y Y$$

be the induced map of tangent spaces³. (³$T_x X$ is the tangent plane of $X$ at $x$, which is identified with a vector space of formal differentials $dx$, where $dx$ is interpreted at $x$. Similarly, $T_y Y$ is identified with the formal differentials around $y$. The linear map $df$ is induced by considering $y = f(x)$ and setting $dy = df(dx)$; one checks that this is a linear map since $x$ and $y$ are smooth. Furthermore, $T_x X$ and $T_y Y$ can be endowed with the Euclidean norm and scalar product they inherit from the tangent planes. Thus, $df$ is also a linear map of normed vector spaces, which is always bounded and continuous, but not necessarily proper.) Then, the following are equivalent:

(i)

There is a complex open neighborhood $U \ni x$ such that the restriction $f|_U : U \to f(U)$ is bijective.

(ii)

$df$ is bijective.

(iii)

There exists an invertible linear map $\psi : T_y Y \to T_x X$ with $\psi \circ df = \operatorname{id}$.

(iv)

There exists a linear map $\psi : T_y Y \to T_x X$ such that the linear map

$$\psi \circ df - \operatorname{id},$$

where $\operatorname{id}$ is the identity operator on $T_x X$, is contractive⁴. (⁴A linear operator $A$ is contractive if $\|A v\| < \|v\|$ for all $v$ with $v \ne 0$.)

If moreover $X$ is irreducible, then the following is also equivalent:

(v)

$f^{-1}(f(x'))$ is finite for generic $x' \in X$.

Proof.

(ii) is equivalent to the fact that the matrix representing $df$ is an invertible matrix. Thus, by the properties of the matrix inverse, (ii) is equivalent to (iii), and (ii) is equivalent to (i) by the constant rank theorem (e.g., 9.6 in Rudin [16]).

By the upper semicontinuity theorem (I.8, Corollary 3 in Mumford [13]), (i) is equivalent to (v) in the special case that $X$ is irreducible.

(ii) $\Rightarrow$ (iv): Since $df$ is bijective, there exists a linear inverse $\psi$ such that $\psi \circ df = \operatorname{id}$. Thus $\psi \circ df - \operatorname{id} = 0$,

which is by definition a contractive linear map.

(iv) $\Rightarrow$ (iii): We proceed by contradiction. Assume that (iii) is false, i.e., $df$ is not bijective. Since $f$ is surjective, $\dim T_y Y \le \dim T_x X$, so $df$ is then not injective, which implies that for each linear map $\psi$, the composition $\psi \circ df$ is rank deficient. Thus, for every $\psi$, there exists a non-zero $v \in T_x X$ with $\psi(df(v)) = 0$. Without loss of generality we can assume that $\|v\| = 1$, else we multiply $v$ by a constant factor. By construction,

$$\|(\psi \circ df - \operatorname{id})(v)\| \;=\; \|v\| \;=\; 1,$$

so $\psi \circ df - \operatorname{id}$ cannot be contractive. Since $\psi$ was arbitrary, this proves that (iv) cannot hold if (iii) does not hold, which is equivalent to the claim. ∎

The second lemma is a consequence of Rudelson’s Lemma, see Rudelson [15], for Bernoulli samples.

Lemma B.2.

Let $v_1, \dots, v_n$ be vectors in $\mathbb{K}^d$ with $\sum_{i=1}^n v_i v_i^* = \operatorname{id}$, and let $\delta_1, \dots, \delta_n$ be i.i.d. Bernoulli variables, taking value $1$ with probability $p$ and $0$ with probability $1 - p$. Then,

$$\mathbb{E}\, \Bigl\| \frac{1}{p} \sum_{i=1}^n \delta_i\, v_i v_i^* - \operatorname{id} \Bigr\| \;\le\; C \sqrt{\frac{\log d}{p}}\, \max_i \|v_i\|,$$

with an absolute constant $C$, provided the right hand side is $1$ or smaller.

Proof.

The statement is exactly Theorem 3.1 in Candès and Romberg [3], up to a renaming of variables; the proof can also be found there. It can also be directly obtained from Rudelson's original formulation in Rudelson [15] by substituting the Bernoulli sampling in the above formulation for the uniform sampling in Rudelson's formulation, and upper bounding the right hand side in Rudelson's estimate. ∎