On the convex geometry of blind deconvolution and matrix completion

02/28/2019, by Felix Krahmer et al.

Low-rank matrix recovery from structured measurements has been a topic of intense study in the last decade and many important problems like matrix completion and blind deconvolution have been formulated in this framework. An important benchmark method to solve these problems is to minimize the nuclear norm, a convex proxy for the rank. A common approach to establish recovery guarantees for this convex program relies on the construction of a so-called approximate dual certificate. However, this approach provides only limited insight in various respects. Most prominently, the noise bounds exhibit seemingly suboptimal dimension factors. In this paper we take a novel, more geometric viewpoint to analyze both the matrix completion and the blind deconvolution scenario. We find that for both these applications the dimension factors in the noise bounds are not an artifact of the proof, but the problems are intrinsically badly conditioned. We show, however, that bad conditioning only arises for very small noise levels: Under mild assumptions that include many realistic noise levels we derive near-optimal error estimates for blind deconvolution under adversarial noise.


1 Introduction

A number of recent works have explored the observation that various ill-posed inverse problems in signal processing, imaging, and machine learning can be naturally formulated as the task of recovering a low-rank matrix $X_0 \in \mathbb{C}^{d_1 \times d_2}$ from an underdetermined system of structured linear measurements

$y = \mathcal{A}(X_0) + e,$

where $\mathcal{A}: \mathbb{C}^{d_1 \times d_2} \to \mathbb{C}^m$ is a linear map and $e \in \mathbb{C}^m$, $\|e\|_2 \le \tau$, represents additive noise. Such problems include, for example, matrix completion [2], phase retrieval [3], blind deconvolution [4], robust PCA [5], and demixing [6]. In this paper, we aim to analyze the worst case scenario, that is, we do not make any assumptions on the noise except for the bound $\|e\|_2 \le \tau$ on its Euclidean norm (this scenario is sometimes referred to as adversarial noise, as it allows for noise specifically designed to be most harmful in a given situation). A natural first approach to recover $X_0$, which remains an important benchmark, is to solve the semidefinite program

minimize $\|X\|_\star$
subject to $\|\mathcal{A}(X) - y\|_2 \le \tau$,

where $\|X\|_\star$ denotes the nuclear norm, i.e., the sum of the singular values of $X$. Recovery guarantees have been shown under the assumption that the measurement operator $\mathcal{A}$ possesses a certain degree of randomness. To establish such guarantees, various proof strategies have been proposed, including approaches via the restricted isometry property [7, 8], descent cone analysis [9], and so-called approximate dual certificates [10, 11]. While the latter approach remains state of the art for many structured problems, including the highly relevant problems of randomized blind deconvolution and matrix completion, it has some apparent disadvantages. Most prominently, the resulting recovery guarantees take the form

(1)

where $\hat{X}$ denotes a minimizer of the semidefinite program above and $\|\cdot\|_F$ denotes the Frobenius norm, whereas under comparable normalization, the first two approaches, when applicable, give rise to superior recovery guarantees of the form

$\|\hat{X} - X_0\|_F \le C\,\tau$

without any dimensional factor. Before this paper it was an open question whether the additional dimension scaling factor in (1) is a proof artifact. Similarly, for randomized blind deconvolution one of the coherence terms appearing in the result was believed to arise only from the proof technique (cf. [12, Remark 2]).

Another drawback of proceeding via an approximate dual certificate is that it gives only limited insight into geometric properties of the problems such as the null-space property [13], which is also an important ingredient for the study of some more efficient non-convex algorithms [14, 15].

Approaches via descent cone analysis [9], in contrast, provide much more geometric insight. The underlying idea of such approaches is to study the minimum conic singular value defined by

$\lambda_{\min}\big(\mathcal{A}; \mathcal{D}(X_0)\big) := \inf_{Z \in \mathcal{D}(X_0),\ \|Z\|_F = 1} \|\mathcal{A}(Z)\|_2$

for the descent cone $\mathcal{D}(X_0)$ of the underlying atomic norm, the nuclear norm in the case of low-rank matrix recovery. For a more detailed review of this approach, including a precise definition of the descent cone, we refer to Section 2.3 below. Through the study of the minimum conic singular value many superior results were obtained for low-rank recovery problems, most importantly in the context of phase retrieval [16, 17]. Furthermore, minimum conic singular values can also help to understand certain nonlinear measurement models [18].

For all these reasons, it would be desirable to apply this approach to matrix completion and blind deconvolution as well. A challenge that one faces, however, is that for both problems one cannot hope to recover all low-rank matrices; rather, only matrices that satisfy certain coherence constraints are admissible (cf. the discussion in [19, Section 5.4]). In this article we address this challenge, providing the first geometric analysis of these problems. We find that the dimensional factors appearing in the error bounds reflect the true scaling of the minimum conic singular value and hence intrinsically relate to the underlying geometry. Nevertheless, for blind deconvolution, near-optimal recovery is possible if the noise level is not too small.

1.1 Organization of the paper and our contribution

In Section 2 we will review blind deconvolution, matrix completion, as well as some techniques related to descent cone analysis. In Section 3 we will present the main results of this paper. Theorems 3.1 and 3.5 establish that for both blind deconvolution and matrix completion, nuclear norm minimization is intrinsically ill-conditioned. In contrast, Theorem 3.7 provides a near-optimal error bound for blind deconvolution when the noise level is not too small, implying that the conditioning problems only take effect for very small noise levels. The upper bounds for the minimum conic singular value which are the main ingredients of Theorems 3.1 and 3.5 are derived in Section 4. In Section 5 we prove the stability results for blind deconvolution.
We believe that not only our results, but also the proof techniques and geometric insights in this manuscript will be of general interest and help to obtain further understanding of low-rank matrix recovery models, in particular under coherence constraints. We discuss interesting directions for future research in Section 6.

2 Background and related work

2.1 Blind deconvolution

Blind deconvolution problems arise in a number of different areas in science and engineering such as astronomy, imaging, and communications. The goal is to recover both an unknown signal $x \in \mathbb{C}^L$ and an unknown kernel $w \in \mathbb{C}^L$ from their convolution. In this paper we work with the circular convolution $w \ast x \in \mathbb{C}^L$, which is defined by

$(w \ast x)_i = \sum_{j=1}^{L} w_j\, x_{i-j+1},$

where the index difference $i - j + 1$ is considered modulo $L$. Without further assumptions on $w$ and $x$ this bilinear map is far from injective. Consequently, it is crucial to impose structural constraints on both $w$ and $x$. Arguably, the simplest such model is given by linear constraints, that is, both $w$ and $x$ are constrained to lie in known subspaces. Such a model is reasonable in many applications. In wireless communication, for example, it makes sense to assume that the channel behaviour is dominated by the most direct paths, and for the signal a subspace model can be enforced by embedding the message via a suitable coding map into a higher-dimensional space before transmission.
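For concreteness, the following minimal numpy sketch (with zero-based indices and arbitrary placeholder dimensions) evaluates the circular convolution both directly from its defining sum and via the discrete Fourier transform; the agreement of the two computations is precisely the diagonalization property used below.

```python
import numpy as np

L = 8
rng = np.random.default_rng(0)
w = rng.standard_normal(L) + 1j * rng.standard_normal(L)
x = rng.standard_normal(L) + 1j * rng.standard_normal(L)

# Direct evaluation of the circular convolution (zero-based indexing):
# (w * x)_i = sum_j w_j x_{(i - j) mod L}
conv_direct = np.array([
    sum(w[j] * x[(i - j) % L] for j in range(L)) for i in range(L)
])

# Equivalent evaluation via the FFT, since the discrete Fourier transform
# diagonalizes circular convolution.
conv_fft = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x))

assert np.allclose(conv_direct, conv_fft)
```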

The first rigorous recovery guarantees for such a model were derived by Ahmed, Recht, and Romberg [4]. More precisely, they assume that $w = Bh$ with $h \in \mathbb{C}^K$, where $B \in \mathbb{C}^{L \times K}$ is a fixed, deterministic matrix such that $B^* B = \mathrm{Id}_K$ (i.e., $B$ is an isometry), and they model $x = \bar{C} m$ with $m \in \mathbb{C}^N$, where $\bar{C}$ denotes the complex conjugate of $C \in \mathbb{C}^{L \times N}$. Here, the matrix $C$ is a random matrix whose entries are independent and identically distributed with circularly symmetric normal distribution. In this paper we also adopt this model.

Using the well-known fact that the Fourier transform diagonalizes the circular convolution, one can rewrite the measurements in the Fourier domain, where $F$ denotes the normalized, unitary discrete Fourier matrix. Denoting by $b_\ell$ and $c_\ell$ the $\ell$-th rows of the matrices $FB$ and $F\bar{C}$, respectively, one observes that each entry of the transformed measurement vector is a bilinear expression in $h$ and $m$. Furthermore, because of the rotation invariance of the circularly symmetric normal distribution, all the entries of the vectors $c_\ell$ are (jointly) independent and identically distributed. Noting that each such expression is linear in the lifted rank-one matrix $hm^*$, Ahmed, Recht, and Romberg [4] defined the operator $\mathcal{A}$ by

(2)

obtaining the measurement model $y = \mathcal{A}(hm^*) + e$, where $e \in \mathbb{C}^L$ is additive noise. The goal is then to determine $h$ and $m$ from $y$ up to the inherent scaling ambiguity or, equivalently, to find the rank-one matrix $X_0 = hm^*$.
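The following numpy sketch illustrates the lifting: each measurement is a bilinear expression in $h$ and $m$ and therefore a linear function of the rank-one matrix $hm^*$. The vectors $b_\ell$ and $c_\ell$ are drawn at random purely for illustration, and the particular conjugation convention ($b_\ell^* X c_\ell$) is an assumption made for this sketch; the exact definition of the operator in (2) may differ in such details.

```python
import numpy as np

rng = np.random.default_rng(1)
L, K, N = 64, 8, 8

# Illustrative stand-ins for the rows b_l and c_l appearing in the model.
B_rows = rng.standard_normal((L, K)) + 1j * rng.standard_normal((L, K))
C_rows = rng.standard_normal((L, N)) + 1j * rng.standard_normal((L, N))

def measure(X):
    """One possible lifted operator: the l-th entry is b_l^* X c_l."""
    return np.array([B_rows[l].conj() @ X @ C_rows[l] for l in range(L)])

h = rng.standard_normal(K) + 1j * rng.standard_normal(K)
m = rng.standard_normal(N) + 1j * rng.standard_normal(N)
X0 = np.outer(h, m.conj())                      # lifted rank-one matrix h m^*

# The bilinear measurements (b_l^* h)(m^* c_l) agree with the linear map
# applied to X0, which is the whole point of the lifting.
y_bilinear = np.array([(B_rows[l].conj() @ h) * (m.conj() @ C_rows[l])
                       for l in range(L)])
assert np.allclose(measure(X0), y_bilinear)
```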

For $e = 0$, among all solutions of the equation $y = \mathcal{A}(X)$, the matrix $X_0 = hm^*$ is the one with the smallest rank. For this reason, Ahmed, Recht, and Romberg [4] suggested minimizing a natural proxy for the rank, the nuclear norm $\|\cdot\|_\star$, defined as the sum of the singular values of a matrix:

minimize $\|X\|_\star$ subject to $\|\mathcal{A}(X) - y\|_2 \le \tau$.  (3)

Here $\tau$ is an a priori bound for the noise level, that is, we assume that $\|e\|_2 \le \tau$. For this semidefinite program, they establish the following recovery guarantee.

Theorem 2.1 ([4]).

Consider measurements of the form for , , , and as defined in (2). Assume that and

Then with probability exceeding

every minimizer of the SDP (3) satisfies

(4)

Here and are coherence parameters, which are defined via

and

The third coherence factor is a technical term corresponding to a partition that is constructed as a part of the proof of Theorem 2.1, which is based on the Golfing Scheme [11].

To put the impact of the coherence factors into perspective, observe that if all vectors $b_\ell$ have the same $\ell_2$-norm, one obtains the smallest possible value of the first coherence factor; this will be the case, for example, when $B$ is a low-frequency Fourier matrix, as it appears in applications in wireless communication. The second coherence factor is always bounded below by an absolute constant. If it is small, this indicates that the mass of $h$ is distributed fairly evenly among the coefficients $\langle b_\ell, h \rangle$; the minimal value is attained when these coefficients all have the same magnitude. Numerical simulations in [4] confirm that vectors $h$ corresponding to large values of this coherence factor show worse performance, indicating that this factor may be necessary.
The last coherence factor, in contrast, will no longer appear in our result below, which is why we refrain from a detailed discussion; we refer the interested reader to [12, Remark 2.1] and [20, Section 2.3] for details.
For generic vectors the coherence parameters are reasonably small. For example, if $h$ is chosen from the uniform distribution on the sphere, one can show that with high probability its coherence factor is small up to logarithmic factors.
For the noiseless case, i.e., $\tau = 0$, Theorem 2.1 yields exact recovery, and the required sample complexity of order $K + N$ up to logarithmic factors is optimal, as the number of degrees of freedom of the rank-one matrix $hm^*$ is $K + N - 1$ (see [21] for an exact identifiability analysis based on algebraic geometry). However, if there is noise, the bound for the reconstruction error scales with an additional dimension factor, in contrast to other measurement scenarios such as low-rank matrix recovery from Gaussian measurements (see, e.g., [9]).

Let us comment on some related work. The foundational paper [4] has triggered a number of follow-up works on the problem of randomized blind deconvolution. A first line of works extended the result to recovering multiple signals from their superposition, a problem often referred to as blind demixing [12, 20]. Another line of works investigated non-convex (gradient-descent based) algorithms [22, 23, 24], which have the advantage that they are computationally less expensive, as they operate in the natural parameter space. It has been shown that they require a near-optimal number of measurements for recovery. For such an algorithm, [22] derived near-optimal noise bounds for a Gaussian noise model. However, as in this paper we focus on the scenario of adversarial noise (instead of random noise), the resulting guarantees are not comparable to ours below.

2.2 Matrix completion

The matrix completion problem of reconstructing a low-rank matrix $X_0 \in \mathbb{C}^{d_1 \times d_2}$ (we assume w.l.o.g. that $d_1 \le d_2$) from only a part of its entries arises in many different applications such as collaborative filtering [25] and multiclass learning [26]. For this reason, this problem has seen a flurry of work in the last decade, and we will only be able to give a very selective overview of this topic. The precise sampling model that we consider is that $m$ entries of $X_0$ are sampled uniformly at random with replacement. Denoting by $e_a$ and $e_b$ the standard coordinate vectors in $\mathbb{C}^{d_1}$ and $\mathbb{C}^{d_2}$, respectively, the corresponding measurement operator $\mathcal{A}: \mathbb{C}^{d_1 \times d_2} \to \mathbb{C}^m$ can be written as

$[\mathcal{A}(X)]_i = \sqrt{\tfrac{d_1 d_2}{m}}\, \langle e_{a_i} e_{b_i}^*, X \rangle_F = \sqrt{\tfrac{d_1 d_2}{m}}\, X_{a_i b_i},  \qquad (5)$

where $(a_i, b_i) \in [d_1] \times [d_2]$ is chosen uniformly at random for each $i \in [m]$ (and independently from all other measurements). The scaling factor in the definition of the measurement operator is chosen to ensure that $\mathbb{E}[\mathcal{A}^* \mathcal{A}] = \mathrm{Id}$. (Some other papers on matrix completion choose a different scaling. We have chosen this normalization because in this way the results for the matrix completion problem can be better compared to those for the blind deconvolution scenario.) Alternative sampling models analyzed in other works include sampling a subset of entries uniformly at random (i.e., without replacement, see, e.g., [27]), or sampling using random selectors.
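A minimal numpy sketch of this sampling model is given below; the index ranges are zero-based, and the normalization $\sqrt{d_1 d_2 / m}$ is the one that makes $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = \mathrm{Id}$, as discussed above.

```python
import numpy as np

def completion_operator(d1, d2, m, rng):
    """Draw m entry positions uniformly at random with replacement and
    return the corresponding (rescaled) measurement map A."""
    rows = rng.integers(0, d1, size=m)
    cols = rng.integers(0, d2, size=m)
    scale = np.sqrt(d1 * d2 / m)                # ensures E[A* A] = Id
    return lambda X: scale * X[rows, cols]

rng = np.random.default_rng(3)
d1, d2, m = 40, 50, 600
X = rng.standard_normal((d1, d2))

# Empirical check of the normalization: E ||A(X)||_2^2 = ||X||_F^2.
samples = [np.linalg.norm(completion_operator(d1, d2, m, rng)(X)) ** 2
           for _ in range(1000)]
print(np.mean(samples), np.linalg.norm(X, 'fro') ** 2)
```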

Again we aim to recover $X_0$ from noisy observations $y = \mathcal{A}(X_0) + e$, with a noise vector $e$ that satisfies $\|e\|_2 \le \tau$, via the SDP

minimize $\|X\|_\star$ subject to $\|\mathcal{A}(X) - y\|_2 \le \tau$.  (6)

For matrix completion, this approach was first studied in [2].
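For small instances the estimator can be evaluated directly with a generic convex solver. The sketch below is a simplified real-valued variant that observes a fixed set of entries through a binary mask (rather than the with-replacement operator (5)) and uses cvxpy with its default solver; it is meant only to illustrate the nuclear norm minimization (6), not the precise setting of the theorems below.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
d1, d2, r, tau = 30, 30, 2, 1e-2

X0 = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
mask = (rng.random((d1, d2)) < 0.5).astype(float)   # observed entries
noise = rng.standard_normal((d1, d2)) * mask
noise *= tau / np.linalg.norm(noise)                 # adversarial noise, norm tau
Y = (X0 + noise) * mask                              # observed noisy entries

X = cp.Variable((d1, d2))
constraints = [cp.norm(cp.multiply(mask, X) - Y, 'fro') <= tau]
prob = cp.Problem(cp.Minimize(cp.normNuc(X)), constraints)
prob.solve()
print("relative error:", np.linalg.norm(X.value - X0) / np.linalg.norm(X0))
```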

It is well known that, similarly to the blind deconvolution problem, some incoherence assumptions are necessary to allow for successful recovery. Indeed, suppose that $X_0 = e_1 e_1^*$. Then, if the number of samples is much smaller than $d_1 d_2$, with high probability it holds that $\mathcal{A}(X_0) = 0$ and one cannot hope to recover $X_0$. To avoid such special cases, one needs to ensure that the mass of the Frobenius norm of $X_0$ is spread out over all entries rather evenly. This property is captured by the coherence parameter introduced in [11], which is defined in terms of the factors $U$ and $V$ arising from the singular value decomposition $X_0 = U \Sigma V^*$.
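Numerically, incoherence is often quantified through the row norms of the singular vector factors. The sketch below uses one standard convention (the maximum rescaled row norm of $U$ and $V$); the exact definition and normalization used in the results cited below may differ.

```python
import numpy as np

def coherence(X, r):
    """Coherence of a rank-r matrix: how concentrated the left and right
    singular subspaces are on individual coordinates (1 = maximally spread)."""
    d1, d2 = X.shape
    U, _, Vh = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vh[:r, :].conj().T
    mu_U = (d1 / r) * np.max(np.sum(np.abs(U) ** 2, axis=1))
    mu_V = (d2 / r) * np.max(np.sum(np.abs(V) ** 2, axis=1))
    return max(mu_U, mu_V)

rng = np.random.default_rng(5)
d, r = 100, 3
# A generic random low-rank matrix is incoherent ...
X_generic = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))
# ... while a spiky matrix such as e_1 e_1^T is maximally coherent (value d).
X_spiky = np.zeros((d, d))
X_spiky[0, 0] = 1.0
print(coherence(X_generic, r), coherence(X_spiky, 1))
```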

For this coherence parameter, a series of works [2, 27, 11, 28] established the following recovery guarantee for the noiseless scenario.

Theorem 2.2 ([28]).

Consider measurements of the form $y = \mathcal{A}(X_0)$, where $X_0$ is a rank-$r$ matrix and $\mathcal{A}$ is given by (5). Assume that

Then with high probability the matrix $X_0$ is the unique minimizer of the SDP (6) with $\tau = 0$.

As for blind deconvolution, this result has been shown using an approximate dual certificate. In [29] this result has been generalized to the case of adversarial noise, showing that with high probability every minimizer $\hat{X}$ of (6) satisfies

(7)

whenever $\|e\|_2 \le \tau$. As in the blind deconvolution framework, this error bound differs from the one for full Gaussian measurements as discussed, for example, in [9], and also from oracle estimates [5, Section III.B] by a dimensional scaling factor, which will be addressed in this paper.

Random noise models for matrix completion have also been studied in a number of works. In particular, we would like to mention [30, 31], which derive near-optimal rates (both in sample size and estimation error) for matrix completion under subexponential noise with a slightly different nuclear-norm penalized estimator than the one we consider, as long as the noise level is not too small. Similar bounds have also been obtained in [32] using an estimator that is closer to the one in this work.

Apart from convex methods, many nonconvex algorithms have also been proposed and analysed, for example a number of variants of gradient descent (see, e.g., [33, 34, 35, 36, 37, 14, 15, 23]). Arguably the strongest result for matrix completion under adversarial noise has been shown in [33, 38]. These works propose a non-convex algorithm based on Riemannian optimization and show that if the number of measurements is large enough, the true matrix can be reconstructed up to an estimation error superior to the one in [29]. Namely, for $\kappa$ denoting the condition number of the matrix $X_0$, they show that the output of their algorithm satisfies (in our notation)

(8)

provided the noise level is below a certain small threshold that scales with the smallest singular value of $X_0$. For error vectors that are spread out evenly and matrices that are well conditioned, this bound is superior to (7) in the sense that the scaling factors that appear only depend on the rank and not on the dimension. It should be noted, though, that in contrast to nuclear norm minimization the underlying algorithm requires precise knowledge of the true rank of the matrix to be recovered.

Just before completion of this manuscript, Chen et al. [39] bridged convex and nonconvex approaches, using nonconvex methods to analyze a convex recovery scheme. Their results provide near-optimal recovery guarantees for the matrix completion problem via nuclear norm minimization under a subgaussian random noise model for a much larger range of admissible noise levels than the aforementioned works. More precisely, the proof is based on the observation that in their scenario the minimizer of the convex problem is very close to an approximate critical point of a non-convex gradient-based method. This allows them to transfer existing stability results [23] for non-convex optimization to the convex problem. However, the required sample complexity scales suboptimally in the rank of the matrix and, similarly to (8), the error bound depends on the condition number $\kappa$.

2.3 Descent cone analysis

In recent years a number of works have studied low-rank matrix recovery and compressed sensing via a descent cone analysis. This approach was pioneered for $\ell_1$-norm minimization in [40] and for more general (atomic) norms in [9]. Here the descent cone of a norm at a point is the set of all possible directions such that the norm does not increase.

Definition 2.3.

For any matrix $X$ define its descent cone by

$\mathcal{D}(X) := \big\{ Z : \|X + tZ\|_\star \le \|X\|_\star \ \text{for some } t > 0 \big\}.$

To understand its relevance for recovery guarantees, assume for a moment that we are in the noiseless scenario, i.e., $\tau = 0$ and $y = \mathcal{A}(X_0)$. Then the matrix $X_0$ is the unique minimizer of the semidefinite program (3) if and only if the null space of $\mathcal{A}$ intersects the descent cone $\mathcal{D}(X_0)$ only trivially. In the case of noise, the constraint in the SDPs (3) and (6) defines a region around the affine subspace consistent with the observed measurements in the noiseless scenario. The intersection of this region with the set of all signals that have a smaller nuclear norm than the ground truth is the set of feasible solutions that are preferred to $X_0$. The following quantity for a matrix $X$, which is often referred to as the minimum conic singular value, quantifies the size of this intersection:

$\lambda_{\min}\big(\mathcal{A}; \mathcal{D}(X)\big) := \inf_{Z \in \mathcal{D}(X),\ \|Z\|_F = 1} \|\mathcal{A}(Z)\|_2 .$

If $\lambda_{\min}(\mathcal{A}; \mathcal{D}(X_0))$ becomes larger, this intersection becomes smaller, which translates into stronger recovery guarantees. The following theorem confirms this intuition.

Theorem 2.4.

[9, Proposition 2.2] Let $\mathcal{A}$ be a linear operator and assume that $y = \mathcal{A}(X_0) + e$ with $\|e\|_2 \le \tau$. Then any minimizer $\hat{X}$ of the SDP (3) satisfies

$\|\hat{X} - X_0\|_F \le \frac{2\tau}{\lambda_{\min}\big(\mathcal{A}; \mathcal{D}(X_0)\big)}.$

When the measurement matrices of the operator $\mathcal{A}$ are full Gaussian matrices (in contrast to the rank-one measurements considered in this paper) and $\mathcal{A}$ is normalized such that $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = \mathrm{Id}$, then for an arbitrary low-rank matrix $X_0$ one has with high probability that $\lambda_{\min}(\mathcal{A}; \mathcal{D}(X_0))$ is bounded from below by a constant. Consequently, Theorem 2.4 yields an optimal estimation error even for adversarial noise. As we will show, this is no longer the case for blind deconvolution and matrix completion.
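The minimum conic singular value is hard to compute exactly, but any explicit member of the descent cone yields an upper bound, since $\lambda_{\min}$ is an infimum. The Monte Carlo sketch below exploits that every $Z = W - X_0$ with $\|W\|_\star \le \|X_0\|_\star$ lies in $\mathcal{D}(X_0)$; the operators and dimensions are illustrative choices, and such generic sampling typically produces only loose bounds (close to one under the normalization $\mathbb{E}[\mathcal{A}^*\mathcal{A}] = \mathrm{Id}$), whereas the upper bounds derived in Section 4 rely on carefully constructed directions.

```python
import numpy as np

def conic_upper_bound(A, X0, n_samples, rng):
    """Monte Carlo upper bound on lambda_min(A; D(X0)): every Z = W - X0
    with ||W||_* <= ||X0||_* lies in the descent cone, so the ratio
    ||A(Z)||_2 / ||Z||_F upper-bounds the minimum conic singular value."""
    nuc_X0 = np.linalg.norm(X0, 'nuc')
    best = np.inf
    for _ in range(n_samples):
        W = rng.standard_normal(X0.shape)
        W *= nuc_X0 / np.linalg.norm(W, 'nuc')   # enforce ||W||_* = ||X0||_*
        Z = W - X0
        best = min(best, np.linalg.norm(A(Z)) / np.linalg.norm(Z))
    return best

rng = np.random.default_rng(6)
d, m = 20, 200
X0 = np.outer(rng.standard_normal(d), rng.standard_normal(d))

# Entry-sampling operator (with replacement) as in (5).
rows, cols = rng.integers(0, d, size=m), rng.integers(0, d, size=m)
A_completion = lambda Z: np.sqrt(d * d / m) * Z[rows, cols]

# Dense Gaussian operator with the same normalization E[A* A] = Id.
G = rng.standard_normal((m, d * d)) / np.sqrt(m)
A_gaussian = lambda Z: G @ Z.ravel()

print("completion:", conic_upper_bound(A_completion, X0, 2000, rng))
print("gaussian:  ", conic_upper_bound(A_gaussian, X0, 2000, rng))
```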

The geometric analysis of linear inverse problems via the descent cone and the minimum conic singular value has led to many new results and insights in compressed sensing and low-rank matrix recovery. For convex programs the phase transition of the success rate could be precisely predicted [41]. As the proofs are specific to full Gaussian measurements, they do not apply to a number of important structured and heavy-tailed measurement scenarios. Stronger results [42, 43, 17, 44, 16] were subsequently obtained using Mendelson's small ball method [45, 46], a powerful tool for bounding a nonnegative empirical process from below.

2.4 Notation

For $n \in \mathbb{N}$ we will write $[n]$ to denote the set $\{1, \ldots, n\}$. For any set $S$ we will denote its cardinality by $|S|$. For a complex number $z$ we will denote its real part by $\operatorname{Re}(z)$ and its imaginary part by $\operatorname{Im}(z)$. By $\log$ we will denote the logarithm. By $\mathbb{E}[X]$ we will denote the expectation of a random variable $X$, and by $\mathbb{P}(E)$ we denote the probability of an event $E$. For a vector $x \in \mathbb{C}^n$ we will denote its $\ell_2$-norm by $\|x\|_2$ and its Hermitian transpose by $x^*$; for $x, y \in \mathbb{C}^n$ the (Euclidean) inner product is denoted by $\langle x, y \rangle$. Furthermore, for a matrix $X \in \mathbb{C}^{K \times N}$ its spectral norm is denoted by $\|X\|$, i.e., the dual norm of the nuclear norm $\|X\|_\star$. Moreover, the Frobenius norm of $X$ is defined by $\|X\|_F = \big(\sum_{i,j} |X_{ij}|^2\big)^{1/2}$ with corresponding inner product $\langle X, Y \rangle_F = \operatorname{tr}(X^* Y)$. When we study matrix completion, we will work with matrices in $\mathbb{C}^{d_1 \times d_2}$, and the previous quantities are defined analogously.

3 Our results

3.1 Instability of low-rank matrix recovery

3.1.1 Blind deconvolution

Our first main result states that randomized blind deconvolution can be unstable under adversarial noise.

Theorem 3.1.

Let . Assume that is an integer multiple of and that

Then there exists a matrix satisfying and with having rows of equal norm, i.e., , such that for all and the following holds:
With probability at least , where , there is such that for all there exists an adversarial noise vector with that admits an alternative solution with the following properties.

  • The alternative solution is feasible, i.e., it satisfies the constraint of the SDP (3) for the noisy measurement vector.

  • The alternative solution is preferred to the true matrix by the SDP (3), i.e., its nuclear norm is not larger, but

  • The alternative solution is far from the true solution in Frobenius norm.

The constants , , and are universal.

Remark 3.2.

The matrix $B$ in the above result fits exactly into the framework of Theorem 2.1, and one can also check that for our choice of parameters (see the proof of Proposition 3.3 for their definition) the relevant coherence quantities remain small. That is, the assumptions of Theorem 2.1 cannot be enough to deduce stability.

We do not expect, however, that this kind of instability occurs for arbitrary isometric embeddings $B$. In particular, if $B$ is a random embedding, we expect that a result similar to the one in [17] applies.

To put our results in perspective note that for , which is the minimal number of measurements required for noiseless recovery, it holds that . Up to logarithmic factors, this coincides with the rate predicted by (4), whenever .

Theorem 3.1 is a direct consequence of the following proposition, which we think is interesting in its own right.

Proposition 3.3.

Let . Assume that is an integer multiple of and that

(9)

Then there exists satisfying and , whose corresponding measurement operator satisfies the following.
Let , and set . Then with probability at least it holds that

(10)

Here , , and are absolute constants.

The proof of Proposition 3.3 will be provided in Section 4. Note that by definition of the minimum conic singular value Proposition 3.3 is equivalent to the statement that with high probability there is such that

Our construction of such a matrix relies on the observation that with high probability there is a rank-one matrix in the null space of the measurement operator which is relatively close to the descent cone (with respect to the Frobenius distance). Perturbing this matrix suitably, one can then obtain an element of the descent cone which fulfills the bound above.
The existence of such a matrix also reveals a fact about the geometry of the problem which we find somewhat surprising: while the null space of the measurement operator does not intersect the descent cone (otherwise exact recovery would not be possible), the angle between these objects is very small. This is very different from the behavior for measurement matrices with i.i.d. Gaussian entries (instead of the structured rank-one measurements considered here).

Remark 3.4.

While it is preferred to the true solution by the SDP (3), the alternative solution is typically not a minimizer of (3). To see this, assume that without noise exact recovery is possible, which is the case with high probability by Theorem 2.1. Then consider perturbations of the form constructed in the proof of Proposition 3.3. As the perturbation direction is not contained in the null space of the measurement operator (otherwise exact recovery would not be possible), it follows that

where the last line is due to .

On the other hand, we also have that due to and, hence, is admissible whenever is admissible. Consequently, the SDP (3) will always prefer to and will never be a minimizer. It remains an open problem what one can say about the minimizer of (3), see also Section 6.
Even if the minimizer of (3) is closer to the ground truth (in -distance) than , however, the nuclear norms of and will be very close, which can easily lead to numerical instabilities.

3.1.2 Matrix completion

Our second main result states that for arbitrary incoherent low-rank matrices, matrix completion is unstable with high probability. Note that, in contrast to Theorem 3.1, which is based on a specific choice of parameters, the following result holds for an arbitrary incoherent matrix $X_0$.

Theorem 3.5.

Let $\mathcal{A}$ be defined as in (5). Assume that $X_0 \in \mathbb{C}^{d_1 \times d_2}$ is a rank-$r$ matrix with singular value decomposition $X_0 = U \Sigma V^*$. Moreover, assume that

Then with probability at least there is such that for all there exists an adversarial noise vector with that admits an alternative solution with the following properties.

  • The alternative solution is feasible, i.e., it satisfies the constraint of the SDP (6) for the noisy measurement vector.

  • The alternative solution is preferred to the true matrix by the SDP (6), i.e., its nuclear norm is not larger, but

  • The alternative solution is far from the true solution in Frobenius norm.

Here the constants , , and are universal.

Again, to put our results in perspective note that for , which is the minimal number of measurements required for noiseless recovery, it holds that . Up to logarithmic factors, this coincides with the rate predicted by (7).

Theorem 3.5 is a direct consequence of the following proposition, which in our opinion is of independent interest, as it provides a negative answer to a question by Tropp [19, Section 5.4].

Proposition 3.6.

Let $X_0$ be a rank-$r$ matrix with corresponding singular value decomposition $X_0 = U \Sigma V^*$. Moreover, assume that

(11)

Then with probability at least it holds that

(12)

The constants , , and are universal.

Proposition 3.6 corresponds to Proposition 3.3 for blind deconvolution and will be proved analogously. We will again show that with high probability there is such that and is relatively close to the descent cone of in -distance. Setting for a suitable yields an element of with

3.2 Stable recovery

A geometric interpretation of Theorems 3.1 and 3.5 is that the nuclear norm ball is near-tangential to the kernels of both the matrix completion and the randomized blind deconvolution measurement operators. Given that tangent spaces only provide a local approximation, these results leave open what happens at some distance, i.e., for larger noise levels; this will depend on the curvature of the nuclear norm ball.

Our third main result concerns exactly this problem for the randomized blind deconvolution setup. As it turns out, the descent directions along which $\|\mathcal{A}(\cdot)\|_2$ is very small correspond to directions of significant curvature. That is, only a very short segment in such a direction will have smaller nuclear norm than the ground truth, and the corresponding alternative solutions all correspond to very small noise levels. For noise levels that are large enough, in contrast, these directions can be excluded and one can obtain near-optimal error bounds. In order to formulate this observation precisely, we denote the set of $\mu$-incoherent vectors with respect to $B$ by

With this notation, our result reads as follows.

Theorem 3.7.

Let and such that . Assume that

Then with probability at least the following statement holds for all , all , all , and all with :
Any minimizer of (3) satisfies

Here , , and are absolute constants.

In words, this theorem establishes linear scaling in the noise level with only a logarithmic dimensional factor, in contrast to the polynomial factor required for small noise levels as a consequence of Theorem 3.1. Here the admissible noise threshold can be chosen arbitrarily small, at the expense of an increased number of measurements. For example, when one is interested in noise levels of the largest order for which one can expect meaningful error bounds despite the additional logarithmic factors, near-linear error bounds will be guaranteed for a sample complexity of

Remark 3.8.

A similar approach to the proof of Theorem 3.7 also yields a corresponding result for rank-one matrix completion. Arguably, however, matrix completion is mainly of interest for ground truth matrices of rank higher than one, so we decided to omit the proof details.

4 Upper bounds for the minimum conic singular values

4.1 Characterization of the descent cone of the nuclear norm

The goal of this section is to prove Proposition 3.3 and Proposition 3.6, from which we will then be able to deduce Theorem 3.1 and Theorem 3.5. For that we first discuss a characterization of the descent cone $\mathcal{D}(X)$. In order to state this characterization, Lemma 4.1, we need to introduce some additional notation. Let $X$ be a matrix of rank $r$. We will denote its corresponding singular value decomposition by $X = U \Sigma V^*$, where $\Sigma$ is a diagonal matrix with nonnegative entries and $U$ and $V$ are unitary matrices, i.e., $U^* U = \mathrm{Id}$ and $V^* V = \mathrm{Id}$. This allows us to define the tangent space $T$ of the manifold of rank-$r$ matrices at the point $X$ by

(13)

By $P_T$ we will denote the orthogonal projection onto $T$, and by $P_{T^\perp}$ the projection onto its orthogonal complement.
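The projections $P_T$ and $P_{T^\perp}$ admit a simple closed form in terms of the singular vectors. The numpy sketch below uses the standard formula $P_T(Z) = P_U Z + Z P_V - P_U Z P_V$ with $P_U = UU^*$ and $P_V = VV^*$; this is a generic convention meant only to illustrate the notation introduced above.

```python
import numpy as np

def tangent_projections(X, r):
    """Return P_T and P_{T^perp} for the tangent space of the manifold of
    rank-r matrices at X, built from the top-r singular vectors."""
    U, _, Vh = np.linalg.svd(X, full_matrices=False)
    U, V = U[:, :r], Vh[:r, :].conj().T
    PU, PV = U @ U.conj().T, V @ V.conj().T
    P_T = lambda Z: PU @ Z + Z @ PV - PU @ Z @ PV
    P_Tperp = lambda Z: Z - P_T(Z)
    return P_T, P_Tperp

rng = np.random.default_rng(7)
d1, d2, r = 15, 12, 2
X = rng.standard_normal((d1, r)) @ rng.standard_normal((r, d2))
P_T, P_Tperp = tangent_projections(X, r)

Z = rng.standard_normal((d1, d2))
# P_T is idempotent, and P_T(Z) and P_{T^perp}(Z) are orthogonal with
# respect to the Frobenius inner product.
assert np.allclose(P_T(P_T(Z)), P_T(Z))
assert abs(np.sum(P_T(Z) * P_Tperp(Z))) < 1e-8
```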

Lemma 4.1.
(Lemma 4.1 is similar to well-known results in convex optimization and may be known to the community. As we could not find it in the literature in this form, we decided to include a proof for completeness.)

Let $X$ be a matrix of rank $r$ with corresponding singular value decomposition $X = U \Sigma V^*$. Then

where $\overline{\mathcal{D}(X)}$ denotes the topological closure of $\mathcal{D}(X)$.

The proof of Lemma 4.1 relies on the duality between the descent cone and the subdifferential of a convex function. In the following we will denote by $\partial \|X\|_\star$ the subdifferential of the nuclear norm at the point $X$. We will use that a characterization of $\partial \|X\|_\star$ is well known [47]. Namely, for all $X$ with corresponding singular value decomposition $X = U \Sigma V^*$ it holds that

(14)
Proof.

Recall that for a set of matrices $\mathcal{M}$ its polar cone $\mathcal{M}^{\circ}$ is defined by

$\mathcal{M}^{\circ} := \big\{ Z : \operatorname{Re}\langle Z, M \rangle_F \le 0 \ \text{for all } M \in \mathcal{M} \big\}.$

For all $X$ we have the following polarity relation between the descent cone and the subdifferential (see, e.g., [48, Theorem 23.7]; note that [48] considers only sets and functions defined on $\mathbb{R}^n$ with the usual Euclidean inner product, but the space of complex matrices together with the inner product $\operatorname{Re}\langle \cdot, \cdot \rangle_F$ may be identified with a real-valued vector space with standard Euclidean inner product, and applying the results of [48] to this vector space yields the above-mentioned results.)

Thus, it follows from the bipolar theorem (see, e.g., [49, p. 53]) that

Hence, in order to complete the proof it is sufficient to show that

(15)

First, suppose that satisfies . We have to show that for all . Indeed,

In the first inequality we have used that the spectral norm is the dual norm of the nuclear norm. The second inequality follows from . Hence, we have shown that . Next, let be arbitrary. Choose such that and . Then by (14) it follows that and as