# Blind Deconvolutional Phase Retrieval via Convex Programming

We consider the task of recovering two real or complex $m$-vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation based on a lifted matrix recovery formulation, which allows a nontrivial convex relaxation of the bilinear measurements from convolution. We prove that if the two signals belong to known random subspaces of dimensions $k$ and $n$, then they can be recovered up to the inherent scaling ambiguity with $m \gg (k+n)\log^2 m$ phaseless measurements. Our method provides the first theoretical recovery guarantee for this problem by a computationally efficient algorithm and does not require a solution estimate to be computed for initialization. Our proof is based on Rademacher complexity estimates. Additionally, we provide an ADMM implementation of the method and numerical experiments that verify the theory.


## 1 Introduction

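This paper considers the recovery of two unknown signals (real- or complex-valued) from magnitude-only measurements of their convolution. Let $w$ and $x$ be vectors residing in $\mathbb{H}^m$, where $\mathbb{H}$ denotes either $\mathbb{R}$ or $\mathbb{C}$. Moreover, denote by $F$ the DFT matrix with entries $F[\omega,t] = \frac{1}{\sqrt{m}} e^{-j 2\pi \omega t/m}$, $1 \le \omega, t \le m$. We observe the phaseless Fourier coefficients of the circular convolution $w \circledast x$ of $w$ and $x$:

$$ y = |F(w \circledast x)|, \tag{1} $$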

where $|z|$ returns the element-wise absolute value of the vector $z$. We are interested in recovering $w$ and $x$ from the phaseless measurements $y$ of their circular convolution. In other words, the problem concerns blind deconvolution of two signals from phaseless measurements. The problem can also be viewed as identifying the structural properties on $w$ such that its convolution with the signal/image of interest $x$ makes the phase retrieval of the signal $x$ well-posed. Since $w$ and $x$ are both unknown and, in addition, the measurements are phaseless, the inverse problem becomes severely ill-posed, as many pairs of $w$ and $x$ correspond to the same $y$. We show that this non-linear problem can be efficiently solved, under Gaussian measurements, using a semidefinite program, and we prove this assertion theoretically. We also propose a heuristic approach to solve the proposed semidefinite program in a computationally efficient manner. Numerical experiments show that, using this algorithm, one can successfully recover a blurred image from the magnitude-only measurements of its Fourier spectrum.

Phase retrieval has been of continued interest in the fields of signal processing, imaging, physics, computational science, etc. Perhaps the single most important context in which phase retrieval arises is X-ray crystallography [Har93, Mil90], where the far-field pattern of X-rays scattered from a crystal forms a Fourier transform of its image, and it is only possible to measure the intensities of the electromagnetic radiation. However, with the advancement of imaging technologies, the phase retrieval problem continues to arise in several other imaging modalities such as diffraction imaging [BDP07], microscopy [MISE08], and astronomical imaging [FD87]. In the imaging context, the result in this paper would mean that if rays are convolved with a generic pattern (either man-made or naturally arising due to propagation of light through some unknown media) prior to being scattered/reflected from the object, the image of the object can be recovered from the Fourier intensity measurements later on. As is well known from Fourier optics [Goo08], the convolution of visible light with a generic pattern can be implemented using a lens-grating-lens setup.

Blind deconvolution is a fundamental problem in signal processing, communications, and in general system theory. Visible light communication has been proposed as a standard in 5G communications for local area networks [ATO13, ROJ15, ATO10]. Propagation of information carrying light through an unknown communication medium is modeled as a convolution. The channel is unknown and at the receiver it is generally difficult to measure the phase information in the propagated light. The result in this paper says that the transmitted signal can be blindly deconvolved from the unknown channel from the Fourier intensity measurements of the light only. The reader is referred to Section 4.1 of the Appendix for a detailed description of the visible light communication and its connection to our formulation.

### 1.1 Observations in Matrix Form

The phase retrieval and blind deconvolution problems have been extensively studied in the signal processing community in recent years [CLS15, ARR14] by lifting the unknown vectors to a higher dimensional matrix space formed by their outer products. The resulting rank-1 matrix is recovered using the nuclear norm as a convex relaxation of the non-convex rank constraint. Recently, other forms of convex relaxations have been proposed [BR17b, GS18, AAH17a, AAH17b] that solve both problems in the native (unlifted) space, leading to computationally efficient convex programs. This paper handles the non-linear convolutional phase retrieval problem by lifting it into a bilinear problem. The resulting problem, though still non-convex, gives way to an effective convex relaxation that provably recovers $w$ and $x$ exactly.

It is clear from (1) that uniquely recovering $w$ and $x$ is not possible without extra knowledge or information about the problem. We will address the problem under the broad and generally applicable structural assumption that both vectors $w$ and $x$ are members of known subspaces of $\mathbb{H}^m$. This means that $w$ and $x$ can be parameterized in terms of unknown lower dimensional vectors $h \in \mathbb{H}^k$ and $m \in \mathbb{H}^n$, respectively, as follows

 w=Bh, x=Cm, (2)

where $B \in \mathbb{H}^{m \times k}$ and $C \in \mathbb{H}^{m \times n}$ are known matrices whose columns span the subspaces in which $w$ and $x$ reside, respectively. Recovering $h$ and $m$ would imply the recovery of $w$ and $x$; therefore, we take $h$ and $m$ as the unknowns in the inverse problem henceforth. Since the circular convolution operator diagonalizes in the Fourier domain, the measurements in (1) take the following form after incorporating the subspace constraints in (2)

$$ y = \frac{1}{\sqrt{m}}\,|\hat{B}h \odot \hat{C}m|, $$

where $\hat{B} = \sqrt{m}FB$, $\hat{C} = \sqrt{m}FC$, and $\odot$ represents the Hadamard product. Denoting by $b_\ell^*$ and $c_\ell^*$ the rows of $\hat{B}$ and $\hat{C}$, respectively, the entries of the measurements can be expressed as

$$ y_\ell^2 = \frac{1}{m}\,|\langle b_\ell, h\rangle \langle c_\ell, m\rangle|^2, \quad \ell = 1,2,3,\ldots,m. $$
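The equivalence between the time-domain measurements (1) and the Fourier-domain form above can be checked numerically. The following is a minimal sketch, assuming a unitary DFT and the scaling $\hat{B} = \sqrt{m}FB$, $\hat{C} = \sqrt{m}FC$; the sizes, the seed, and the name `m_vec` (standing for the vector $m$ of the text) are illustrative choices, not part of the original.

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, n = 64, 4, 5  # hypothetical problem sizes

# Known subspace matrices and unknown coefficients (real-valued case).
B = rng.normal(scale=1 / np.sqrt(m), size=(m, k))
C = rng.normal(scale=1 / np.sqrt(m), size=(m, n))
h = rng.normal(size=k)
m_vec = rng.normal(size=n)  # the vector called m in the text
w, x = B @ h, C @ m_vec

# Route 1: circular convolution in time, then the unitary DFT.
idx = (np.arange(m)[:, None] - np.arange(m)[None, :]) % m
conv = x[idx] @ w  # conv[i] = sum_j w[j] * x[(i - j) mod m]
y_direct = np.abs(np.fft.fft(conv, norm="ortho"))

# Route 2: stay in the Fourier domain (assumed scaling: B_hat = sqrt(m) F B).
B_hat = np.sqrt(m) * np.fft.fft(B, axis=0, norm="ortho")
C_hat = np.sqrt(m) * np.fft.fft(C, axis=0, norm="ortho")
y_fourier = np.abs((B_hat @ h) * (C_hat @ m_vec)) / np.sqrt(m)

print(np.allclose(y_direct, y_fourier))
```

Working with `B_hat` and `C_hat` avoids forming the convolution explicitly, which is why the analysis is carried out on the rows $b_\ell$ and $c_\ell$.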

Evidently the problem is non-linear in both unknowns. However, it reduces to a bilinear problem in the lifted variables $hh^*$ and $mm^*$, taking the form

$$ y_\ell^2 = \frac{1}{m}\langle b_\ell b_\ell^*, hh^*\rangle \langle c_\ell c_\ell^*, mm^*\rangle = \frac{1}{m}\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, M\rangle, \tag{3} $$

where $H$ and $M$ are the rank-1 matrices $hh^*$ and $mm^*$, respectively. Treating the lifted variables $H$ and $M$ as unknowns makes the measurements bilinear in the unknowns, a structure that will help us formulate an effective convex relaxation.
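The lifting identity in (3) can be illustrated with a few lines of NumPy. This is only a sketch under assumed sizes and random complex data; `B_hat`, `C_hat` and `m_vec` are illustrative names for $\hat{B}$, $\hat{C}$ and the vector $m$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, n = 32, 3, 4  # hypothetical sizes

B_hat = rng.normal(size=(m, k)) + 1j * rng.normal(size=(m, k))
C_hat = rng.normal(size=(m, n)) + 1j * rng.normal(size=(m, n))
h = rng.normal(size=k) + 1j * rng.normal(size=k)
m_vec = rng.normal(size=n) + 1j * rng.normal(size=n)

# Lifted rank-1 variables H = h h^*, M = m m^*.
H = np.outer(h, h.conj())
M = np.outer(m_vec, m_vec.conj())

# u_l = <b_l b_l^*, H> and v_l = <c_l c_l^*, M>, one value per measurement.
u = np.einsum("li,ij,lj->l", B_hat, H, B_hat.conj()).real
v = np.einsum("li,ij,lj->l", C_hat, M, C_hat.conj()).real

y2_lifted = u * v / m
y2_direct = np.abs((B_hat @ h) * (C_hat @ m_vec)) ** 2 / m
print(np.allclose(y2_lifted, y2_direct))
```

Each measurement is a product of two quantities that are linear in $H$ and in $M$ separately, which is the bilinear structure exploited below.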

### 1.2 Novel Convex Relaxation

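The task of recovering $H$ and $M$ from $y$ in (3) can be naturally posed as an optimization program

$$
\begin{aligned}
&\text{find} && H,\, M \\
&\text{subject to} && \tfrac{1}{m}\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, M\rangle = y_\ell^2, \quad \ell = 1,2,3,\ldots,m, \\
& && \operatorname{rank}(H) = 1, \quad \operatorname{rank}(M) = 1.
\end{aligned} \tag{4}
$$

However, both the measurement and the rank constraints are non-convex. Further, the immediate convex relaxation of each measurement constraint is trivial, as the convex hull of the set of $(H,M)$ satisfying $\tfrac{1}{m}\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, M\rangle = y_\ell^2$ is the set of all possible $(H,M)$.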

To derive our convex relaxation, recall that the true $H$ and $M$ are also positive semidefinite (PSD). Incorporating the PSD constraint in the optimization program means that the variables $u_\ell := \langle b_\ell b_\ell^*, H\rangle$ and $v_\ell := \langle c_\ell c_\ell^*, M\rangle$ are necessarily non-negative. That is,

$$ H \succeq 0 \ \text{and}\ M \succeq 0 \implies u_\ell \ge 0 \ \text{and}\ v_\ell \ge 0, $$

where the implication simply follows from the definition of PSD matrices. This observation restricts the hyperbolic constraint set in Figure 1 to the first quadrant only. For a fixed $\ell$, we propose replacing the non-convex hyperbolic set $\{(u_\ell,v_\ell) : \tfrac{1}{m}u_\ell v_\ell = y_\ell^2,\ u_\ell \ge 0,\ v_\ell \ge 0\}$ with its convex hull $\{(u_\ell,v_\ell) : \tfrac{1}{m}u_\ell v_\ell \ge y_\ell^2,\ u_\ell \ge 0,\ v_\ell \ge 0\}$. In short, our convex relaxation is possible because the PSD constraint from lifting happens to select a specific branch of the hyperbola given by any particular bilinear measurement, and this single branch has a nontrivial convex hull.
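The geometric claim can be probed numerically: convex combinations of points on the first-quadrant branch stay in the region where the product is at least the constant, while mixing the two branches can reach the origin. The sketch below uses an arbitrary constant `c` standing for $m y_\ell^2$; all values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
c = 2.0  # stands for m * y_l^2 (arbitrary illustrative value)

# Pairs of points on the first-quadrant branch u * v = c.
t = rng.uniform(0.1, 10.0, size=(1000, 2))
p = np.stack([t[:, 0], c / t[:, 0]], axis=1)
q = np.stack([t[:, 1], c / t[:, 1]], axis=1)

# Random convex combinations of those pairs.
lam = rng.uniform(size=(1000, 1))
z = lam * p + (1 - lam) * q
min_product = np.min(z[:, 0] * z[:, 1])  # never drops below c (up to round-off)

# Mixing both branches: a point and its mirror image average to the origin,
# which violates u * v >= c; without the sign restriction the hull is trivial.
mirror_mid = 0.5 * (p[0] + (-p[0]))
print(min_product >= c - 1e-9, mirror_mid)
```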

The rest of the convex relaxation is standard, as the rank constraint in (4) is then relaxed with a nuclear-norm minimization, which reduces to trace minimization in the PSD case. Hence, we study the convex program

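$$
\begin{aligned}
&\text{minimize} && \operatorname{Tr}(H) + \operatorname{Tr}(M) \\
&\text{subject to} && \tfrac{1}{m}\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, M\rangle \ge y_\ell^2, \quad \ell = 1,2,\ldots,m, \\
& && H \succeq 0, \quad M \succeq 0.
\end{aligned} \tag{5}
$$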
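One way to see that each constraint in (5) is semidefinite-representable is the elementary fact that, for scalars $u, v$ and $y \ge 0$, the conditions $u \ge 0$, $v \ge 0$, $\tfrac{1}{m}uv \ge y^2$ hold exactly when the $2 \times 2$ matrix with diagonal $(u, v)$ and off-diagonal $\sqrt{m}\,y$ is PSD. This reformulation is an observation added here for illustration and is not claimed to be the implementation used by the authors; the helper names are hypothetical.

```python
import numpy as np

def hyperbolic_ok(u, v, y, m):
    """Constraint of (5) in the scalars u = <b b*, H>, v = <c c*, M>."""
    return u >= 0 and v >= 0 and u * v / m >= y ** 2

def psd_ok(u, v, y, m):
    """Same constraint written as a 2x2 positive semidefinite condition."""
    block = np.array([[u, np.sqrt(m) * y], [np.sqrt(m) * y, v]])
    return np.linalg.eigvalsh(block).min() >= 0

rng = np.random.default_rng(4)
m, y = 10, 0.7  # illustrative values
samples = rng.uniform(-1.0, 6.0, size=(2000, 2))
# Skip points numerically on the boundary, where round-off decides the answer.
samples = samples[np.abs(samples[:, 0] * samples[:, 1] - m * y ** 2) > 1e-6]
agree = all(hyperbolic_ok(u, v, y, m) == psd_ok(u, v, y, m) for u, v in samples)
print(agree)
```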

### 1.3 Main Result

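As we are presenting the first analytical results on this problem, we choose the subspace matrices $B$ and $C$ to be standard Gaussian:

$$ B[\ell,i] \sim \text{Normal}\!\left(0, \tfrac{1}{m}\right),\ (\ell,i) \in [m]\times[k], \quad \text{and} \quad C[\ell,i] \sim \text{Normal}\!\left(0, \tfrac{1}{m}\right),\ (\ell,i) \in [m]\times[n]. \tag{6} $$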

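Note that this choice results in the entries of $\hat{B}$ and $\hat{C}$ having unit variance. We show that with this choice the optimization program in (5) recovers a global scaling $(\tilde{H},\tilde{M})$ of the true solution $(H^\natural, M^\natural)$. We will interchangeably use the notation $(H,M)$ to denote the pair of matrices $H$ and $M$, or the block diagonal matrix

$$ (H,M) = \begin{bmatrix} H & 0 \\ 0 & M \end{bmatrix}. \tag{7} $$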

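The exact value of the unknown scalar multiple can be characterized for the solution of (5). Observe that the solution $(\tilde{H},\tilde{M})$ of the convex optimization program in (5) obeys $\operatorname{Tr}(\tilde{H}) = \operatorname{Tr}(\tilde{M})$. We aim to show that the solution of the optimization program recovers the following scaling $(\tilde{H},\tilde{M})$ of the true solution $(H^\natural, M^\natural)$:

$$ \tilde{H} = \sqrt{\frac{\operatorname{Tr}(M^\natural)}{\operatorname{Tr}(H^\natural)}}\, H^\natural, \qquad \tilde{M} = \sqrt{\frac{\operatorname{Tr}(H^\natural)}{\operatorname{Tr}(M^\natural)}}\, M^\natural. $$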
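The scaling above balances the two traces, and among all rescalings $(\alpha H^\natural, M^\natural/\alpha)$, which leave every bilinear measurement unchanged, it attains the smallest trace sum. A small numerical sketch with arbitrary illustrative vectors (`h_nat`, `m_nat` are hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(3)
h_nat = rng.normal(size=4)
m_nat = rng.normal(size=6)
H_nat, M_nat = np.outer(h_nat, h_nat), np.outer(m_nat, m_nat)

# Scaling from the displayed formula.
s = np.sqrt(np.trace(M_nat) / np.trace(H_nat))
H_til, M_til = s * H_nat, M_nat / s

# Trace sum along the family (alpha * H_nat, M_nat / alpha).
alphas = np.linspace(0.2, 5.0, 97) * s
objective = alphas * np.trace(H_nat) + np.trace(M_nat) / alphas
best_alpha = alphas[np.argmin(objective)]

print(np.isclose(np.trace(H_til), np.trace(M_til)), np.isclose(best_alpha, s))
```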

Note that . The main result can now be stated as follows.

###### Theorem 1 (Exact Recovery)

Consider the magnitude-only spectrum measurements (1) of the convolution of two unknown vectors $w$ and $x$ in $\mathbb{H}^m$. Suppose that $w$ and $x$ are generated as in (2), where $B$ and $C$ are known standard Gaussian matrices as in (6). Then the convex optimization program in (5) uniquely recovers $(\tilde{H},\tilde{M})$ for

with probability at least

whenever , where is a constant that depends on .

### 1.4 Main Contributions

In this paper, we study the combination of two important and notoriously challenging signal recovery problems: phase retrieval and blind deconvolution. We introduce a novel convex formulation that is possible because the algebraic structure from lifting resolves the bilinear ambiguity just enough to permit a nontrivial convex relaxation of the measurements. The strengths of our approach are that it allows a novel convex program that is the first to provably permit recovery guarantees with optimal sample complexity for the joint task of phase retrieval and blind deconvolution when the signals belong to known random subspaces. Additionally, unlike many recent convex relaxations and nonconvex approaches, our approach does not require an initialization or estimate of the true solution in order to be stated or solved. Admittedly, our method, directly interpreted, is computationally prohibitive for large problem sizes because lifting squares the dimensionality of the problem. Nonetheless, techniques, such as Burer-Monteiro approaches that only maintain low-rank representations [BM03], have been developed for similar problems. This current work provides the theoretical justification for the exploration of such problems in this difficult combination of phase retrieval and blind deconvolution, and we leave such work for future research.

We do not want to give the reader the impression that the present paper solves the problem of blind deconvolutional phase retrieval in practice. The numerical experiments we perform do indeed show excellent agreement with the theorem in the case of random subspaces. Such subspaces are unlikely to appear in practice, and typically appropriate subspaces would be deterministic, including partial Discrete Cosine Transforms or partial Discrete Wavelet Transforms. Numerical experiments, not shown, indicate that our convex relaxation is less effective for the cases of these deterministic subspaces. We suspect this is due to the fact that the subspaces for both measurements should be mutually incoherent, in addition to both being incoherent with respect to the Fourier basis given by the measurements. As with the initial recovery theory for the problems of compressed sensing and phase retrieval, we have studied the random case in order to show information theoretically optimal sample complexity is possible by efficient algorithms. Based on this work, it is clear that blind deconvolutional phase retrieval is still a very challenging problem in the presence of deterministic matrices, and one for which development of convex or nonconvex methods may provide substantial progress in applications.

## 2 Proof of Theorem 1

To prove Theorem 1, we will show that $(\tilde{H},\tilde{M})$ is the unique minimizer of an optimization program with a larger feasible set defined by linear constraints.

###### Lemma 1

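If $(\tilde{H},\tilde{M})$ is the unique solution to

$$
\begin{aligned}
&\text{minimize} && \|H\|_* + \|M\|_* \\
&\text{subject to} && \tfrac{1}{m}\left(\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, \tilde{M}\rangle + \langle b_\ell b_\ell^*, \tilde{H}\rangle \langle c_\ell c_\ell^*, M\rangle\right) \ge 2y_\ell^2, \quad \ell = 1,2,3,\ldots,m,
\end{aligned} \tag{8}
$$

then $(\tilde{H},\tilde{M})$ is the unique solution to (5).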

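Proof.
Start by observing that the trace in (5) can be replaced with the nuclear norm, since the two coincide on the set of PSD matrices. This gives

$$
\begin{aligned}
&\text{minimize} && \|H\|_* + \|M\|_* \\
&\text{subject to} && \tfrac{1}{m}\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, M\rangle \ge y_\ell^2, \quad \ell = 1,2,\ldots,m, \\
& && H \succeq 0, \quad M \succeq 0.
\end{aligned} \tag{9}
$$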

It suffices now to show that the feasible set of (8) contains the feasible set of (9). Recall the notations

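$$ u_\ell = \langle b_\ell b_\ell^*, H\rangle, \quad v_\ell = \langle c_\ell c_\ell^*, M\rangle, \quad \tilde{u}_\ell = \langle b_\ell b_\ell^*, \tilde{H}\rangle, \quad \text{and} \quad \tilde{v}_\ell = \langle c_\ell c_\ell^*, \tilde{M}\rangle. $$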

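We use the fact that a convex set with smooth boundary is contained in a half space defined by the tangent hyperplane at any point on the boundary of the set. Consider the boundary point $(\tilde{u}_\ell, \tilde{v}_\ell)$, and observe that

$$ \left\{(u_\ell,v_\ell) \in \mathbb{R}^2 \ \middle|\ \tfrac{1}{m}u_\ell v_\ell \ge y_\ell^2,\ u_\ell \ge 0,\ \text{and}\ v_\ell \ge 0\right\} \subseteq \left\{(u_\ell,v_\ell) \in \mathbb{R}^2 \ \middle|\ \tfrac{1}{m}\begin{bmatrix}\tilde{v}_\ell \\ \tilde{u}_\ell\end{bmatrix} \cdot \begin{bmatrix}u_\ell - \tilde{u}_\ell \\ v_\ell - \tilde{v}_\ell\end{bmatrix} \ge 0\right\}. $$

Rewriting $u_\ell$ and $v_\ell$ in the form of the original constraints, and using $\tfrac{1}{m}\tilde{u}_\ell \tilde{v}_\ell = y_\ell^2$, we have that any feasible point of (9) satisfies $\tfrac{1}{m}\left(\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, \tilde{M}\rangle + \langle b_\ell b_\ell^*, \tilde{H}\rangle \langle c_\ell c_\ell^*, M\rangle\right) \ge 2y_\ell^2$, which is exactly the constraint of (8).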

The geometry of the linearly constrained program (8) is also shown in Figure 1 (Right), where the hyperbolic set is replaced by an envelope of hyperplanes defined by the linear constraints of (8). Visually, it is clear from Figure 1 that the feasible set of (8) is larger than that of (5).
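The containment of the hyperbolic region in the tangent half space can be checked on random points. In the sketch below, `u_t` and `v_t` stand for $(\tilde{u}_\ell, \tilde{v}_\ell)$ and `c` for $m y_\ell^2$; the numerical values are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(6)
u_t, v_t = 0.5, 4.0  # hypothetical tangent point on the hyperbola
c = u_t * v_t        # plays the role of m * y_l^2

# Random points of the first quadrant with u * v >= c (up to round-off).
u = rng.uniform(0.01, 20.0, size=5000)
v = (c / u) * rng.uniform(1.0, 5.0, size=5000)

# Linearized constraint of (8), multiplied through by m.
linearized = v_t * u + u_t * v
gap = linearized - 2 * c  # non-negative by the AM-GM inequality
print(gap.min() >= -1e-9)
```

The inequality is tight only at the tangent point itself, which is why the linear program (8) has a strictly larger feasible set.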

Define a set , and and define a linear map as

$$ \mathcal{A}((H,M)) = \left[\langle A_1, (H,M)\rangle, \ldots, \langle A_m, (H,M)\rangle\right]^{T}; $$

one can imagine as a matrix with vectorized as its rows. The linear constraints in the (8) are ; the inequality here applies elementwise. Furthermore, define , and it is easy to see that

We want to show that any feasible perturbation around the truth strictly increases the objective. From the discussion above, it is clear that the perturbations in $\mathcal{N}$ do not change the objective and also lead to feasible points of (8). Our general strategy will be to resolve any perturbation into two components, one in $\mathcal{N}$ and the other in $\mathcal{N}_\perp$, where $\mathcal{N}_\perp$ is the orthogonal complement of the subspace $\mathcal{N}$. The component in $\mathcal{N}$ does not affect the objective. We show that the components in $\mathcal{N}_\perp$ of all the feasible perturbations lead to a strict increase in the objective of (8). This should imply that the minimizer of (8) can be anywhere in the set . However, as we are minimizing the (trace) norms, an arbitrarily large scaling of the solution is prevented and it is restricted to the subset . Moreover, among these solutions only $(\tilde{H},\tilde{M})$ lies in the feasible set of (9). Given this, the fact that $(\tilde{H},\tilde{M})$ is a minimizer of (8) implies that $(\tilde{H},\tilde{M})$ is the unique minimizer of (9).

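We begin by characterizing the set of descent directions for the objective function of the optimization program (8). Let $T_{\tilde{h}}$ and $T_{\tilde{m}}$ be the sets of symmetric matrices of the form

$$ T_{\tilde{h}} := \{X = \tilde{h}z^* + z\tilde{h}^*\}, \qquad T_{\tilde{m}} := \{X = \tilde{m}z^* + z\tilde{m}^*\}, $$

and denote the orthogonal complements by $T_{\tilde{h}}^\perp$ and $T_{\tilde{m}}^\perp$, respectively. Note that $X \in T_{\tilde{h}}^\perp$ iff both the row and column spaces of $X$ are perpendicular to $\tilde{h}$. $P_{T_{\tilde{h}}}$ denotes the orthogonal projection onto the set $T_{\tilde{h}}$, and a matrix $X$ of appropriate dimensions can be projected onto $T_{\tilde{h}}$ as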

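$$ P_{T_{\tilde{h}}}(X) := \frac{\tilde{h}\tilde{h}^*}{\|\tilde{h}\|_2^2}X + X\frac{\tilde{h}\tilde{h}^*}{\|\tilde{h}\|_2^2} - \frac{\tilde{h}\tilde{h}^*}{\|\tilde{h}\|_2^2}X\frac{\tilde{h}\tilde{h}^*}{\|\tilde{h}\|_2^2}. $$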
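The projection formula translates directly into code. The following sketch, with a hypothetical helper `proj_T` and random complex test data, checks the three properties used in the analysis: the map is idempotent, it fixes matrices of the form $\tilde{h}z^* + z\tilde{h}^*$, and it annihilates matrices whose row and column spaces are perpendicular to $\tilde{h}$.

```python
import numpy as np

def proj_T(X, h):
    """Orthogonal projection of X onto T_h = {h z^* + z h^*}."""
    P = np.outer(h, h.conj()) / np.vdot(h, h).real  # projector onto span(h)
    return P @ X + X @ P - P @ X @ P

rng = np.random.default_rng(8)
d = 5
h = rng.normal(size=d) + 1j * rng.normal(size=d)
z = rng.normal(size=d) + 1j * rng.normal(size=d)
X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))

in_T = np.outer(h, z.conj()) + np.outer(z, h.conj())  # element of T_h
P = np.outer(h, h.conj()) / np.vdot(h, h).real
Q = np.eye(d) - P
in_T_perp = Q @ X @ Q  # rows and columns perpendicular to h

print(np.allclose(proj_T(proj_T(X, h), h), proj_T(X, h)))
```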

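Similarly, define the projection operator $P_{T_{\tilde{m}}}$. The projections onto the orthogonal complements are then simply $I - P_{T_{\tilde{h}}}$ and $I - P_{T_{\tilde{m}}}$, where $I$ is the identity operator. We use $X_{T_{\tilde{h}}}$ as a shorthand for $P_{T_{\tilde{h}}}(X)$. Using the notation in (7), the objective of (8) is $\|(H,M)\|_*$, and the subdifferential of the objective at the proposed solution is

$$ \partial\|(\tilde{H},\tilde{M})\|_* := \left\{G = (\tilde{h}\tilde{h}^*, \tilde{m}\tilde{m}^*) + (W_{T_{\tilde{h}}^\perp}, W_{T_{\tilde{m}}^\perp}),\ \ \|(W_{T_{\tilde{h}}^\perp}, W_{T_{\tilde{m}}^\perp})\| \le 1\right\}. $$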

The set of descent directions of the objective of (8) is defined as

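$$
\begin{aligned}
&\left\{(\delta H,\delta M) \in \mathcal{N}_\perp : \langle G, (\delta H,\delta M)\rangle \le 0,\ \forall G \in \partial\|(\tilde{H},\tilde{M})\|_*\right\} \\
&\quad \subseteq \left\{(\delta H,\delta M) \in \mathcal{N}_\perp : \langle (\tilde{h}\tilde{h}^*, \tilde{m}\tilde{m}^*), (\delta H,\delta M)\rangle + \|(\delta H_{T_{\tilde{h}}^\perp}, \delta M_{T_{\tilde{m}}^\perp})\|_* \le 0\right\} \\
&\quad \subset \left\{(\delta H,\delta M) \in \mathcal{N}_\perp : \|(\delta H_{T_{\tilde{h}}^\perp}, \delta M_{T_{\tilde{m}}^\perp})\|_* \le \|(\delta H_{T_{\tilde{h}}}, \delta M_{T_{\tilde{m}}})\|_F\right\} =: \mathcal{Q}.
\end{aligned} \tag{10}
$$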

We quantify the "width" of the set of descent directions through a Rademacher complexity, and a probability that the gradients of the constraint functions of (8) lie in a certain half space. This enables us to build an argument using the small ball method [KM15, Men14] that it is unlikely to have points that meet the constraints in (8) and still lie in $\mathcal{Q}$. Before moving forward, we introduce the above-mentioned Rademacher complexity and probability term.

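Denote the constraint functions as $f_\ell(H,M)$ (for brevity, we will often drop the dependence on $H$ and $M$ in the notation). For a set $\mathcal{Q}$, the Rademacher complexity of the gradients $\nabla f_\ell$ is defined as

$$ C(\mathcal{Q}) := \mathbb{E} \sup_{(H,M) \in \mathcal{Q}} \frac{1}{\sqrt{m}} \sum_{\ell=1}^{m} \varepsilon_\ell \left\langle \nabla f_\ell, \frac{(H,M)}{\|(H,M)\|_F} \right\rangle, \tag{11} $$

where $\varepsilon_1, \ldots, \varepsilon_m$ are iid Rademacher random variables independent of everything else in the expression. For a convex set $\mathcal{Q}$, $C(\mathcal{Q})$ is a measure of the width of $\mathcal{Q}$ around the origin in terms of the gradients $\nabla f_\ell$. For example, a random choice of gradient might yield little overlap with a structured set $\mathcal{Q}$, leading to a smaller complexity $C(\mathcal{Q})$.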

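Our result also depends on a probability $p_\tau(\mathcal{Q})$ and a positive parameter $\tau$, defined as

$$ p_\tau(\mathcal{Q}) := \inf_{(H,M) \in \mathcal{Q}} \mathbb{P}\left(\langle \nabla f, (H,M)\rangle \ge \tau \|(H,M)\|_F\right). \tag{12} $$

The probability $p_\tau(\mathcal{Q})$ quantifies the visibility of the set $\mathcal{Q}$ through the gradient vectors $\nabla f_\ell$. A small value of $\tau$ and $p_\tau(\mathcal{Q})$ means that the set $\mathcal{Q}$ mainly remains invisible through the lenses of $\nabla f_\ell$. This can be appreciated just by noting that $p_\tau(\mathcal{Q})$ depends on the correlation of the elements of $\mathcal{Q}$ with the gradient vectors $\nabla f_\ell$.

The following lemma shows that the minimizer of the linear program (8) almost always resides in the desired set for a sufficiently large $m$, quantified in terms of $C(\mathcal{Q})$, $p_\tau(\mathcal{Q})$, and $\tau$.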

###### Lemma 2

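Consider the optimization program in (8), and let $\mathcal{Q}$, characterized in (10), be the set of descent directions for which $C(\mathcal{Q})$ and $p_\tau(\mathcal{Q})$ can be determined using (11) and (12). Choose

$$ m \ge \left(\frac{2C(\mathcal{Q}) + t\tau}{\tau\, p_\tau(\mathcal{Q})}\right)^2 $$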

for any . Then the minimizer of (8) lies in the set with probability at least .

Proof of this lemma is based on small ball method developed in [KM15, Men14] and further studied in [LM18, LM17]. The proof is mainly repeated using the argument in [BR17a], and is provided in the Appendix for completeness.
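To get a feel for the sample-size condition of Lemma 2, the bound can be evaluated for given values of $C(\mathcal{Q})$, $\tau$, $p_\tau(\mathcal{Q})$ and $t$. The helper below is a hypothetical convenience function, and the numbers passed to it are arbitrary placeholders rather than quantities computed in the paper.

```python
import math

def min_measurements(C_Q, tau, p_tau, t):
    """Smallest integer m with m >= ((2 C(Q) + t tau) / (tau p_tau(Q)))^2."""
    if tau <= 0 or not 0 < p_tau <= 1:
        raise ValueError("need tau > 0 and p_tau in (0, 1]")
    return math.ceil(((2 * C_Q + t * tau) / (tau * p_tau)) ** 2)

# Placeholder inputs: doubling C(Q) roughly quadruples the required m.
m_small = min_measurements(C_Q=10.0, tau=1.0, p_tau=0.5, t=1.0)
m_large = min_measurements(C_Q=20.0, tau=1.0, p_tau=0.5, t=1.0)
print(m_small, m_large)
```

Because the bound is quadratic in $C(\mathcal{Q})$, an estimate of order $\sqrt{(k+n)\log^2 m}$ for the complexity translates into a sample requirement of order $(k+n)\log^2 m$.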

With Lemma 2 in place, an application of Lemma 1 and the discussion after it proves that, for the choice of $m$ outlined in Lemma 2, $(\tilde{H},\tilde{M})$ is the unique minimizer of (5). The last missing piece in the proof of Theorem 1 is the computation of the Rademacher complexity $C(\mathcal{Q})$ and the probability $p_\tau(\mathcal{Q})$ for the set $\mathcal{Q}$.

We begin with evaluation of the complexity

 C(Q)

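Splitting $(\delta H, \delta M)$ between $T_{\tilde{h}}, T_{\tilde{m}}$ and $T_{\tilde{h}}^\perp, T_{\tilde{m}}^\perp$, and using Hölder's inequality, we obtain

$$
\begin{aligned}
C(\mathcal{Q}) \le{} & \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*),\ \tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\right)\right\|_F \cdot \sup_{(\delta H,\delta M)\in\mathcal{Q}} \left\|\frac{(\delta H_{T_{\tilde{h}}}, \delta M_{T_{\tilde{m}}})}{\|(\delta H,\delta M)\|_F}\right\|_F \\
&+ \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell b_\ell b_\ell^*,\ \tilde{u}_\ell c_\ell c_\ell^*\right)\right\| \cdot \sup_{(\delta H,\delta M)\in\mathcal{Q}} \left\|\frac{(\delta H_{T_{\tilde{h}}^\perp}, \delta M_{T_{\tilde{m}}^\perp})}{\|(\delta H,\delta M)\|_F}\right\|_*.
\end{aligned}
$$

On the set $\mathcal{Q}$, defined in (10), we have

$$ \left\|\frac{(\delta H_{T_{\tilde{h}}^\perp}, \delta M_{T_{\tilde{m}}^\perp})}{\|(\delta H,\delta M)\|_F}\right\|_* \le \left\|\frac{(\delta H_{T_{\tilde{h}}}, \delta M_{T_{\tilde{m}}})}{\|(\delta H,\delta M)\|_F}\right\|_F \le 1. $$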

Using Jensen’s inequality, the first expectation simply becomes

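$$
\begin{aligned}
\mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*),\ \tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\right)\right\|_F
&\le \sqrt{\frac{1}{m}\,\mathbb{E}\left\|\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*),\ \tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\right)\right\|_F^2} \\
&= \sqrt{\frac{1}{m}\sum_{\ell=1}^{m}\mathbb{E}\left(\|\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*)\|_F^2 + \|\tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\|_F^2\right)},
\end{aligned}
$$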

where the last equality follows by taking the expectation over the $\varepsilon_\ell$'s. Recall from the definition of the projection operator that , and . It can be easily verified that and, therefore,

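$$ \mathbb{E}\|\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*)\|_F^2 \le \mathbb{E}|c_\ell^*\tilde{m}|^4 \cdot \mathbb{E}\left(\frac{2|b_\ell^*\tilde{h}|^2}{\|\tilde{h}\|_2^2}\|b_\ell\|_2^2 - \frac{|b_\ell^*\tilde{h}|^4}{\|\tilde{h}\|_2^4}\right) \le 3\|\tilde{m}\|_2^4\,(6k-3), $$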

where we used a simple calculation involving fourth moments of Gaussians. In an exactly similar manner, we can also show that $\mathbb{E}\|\tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\|_F^2 \le 3\|\tilde{h}\|_2^4\,(6n-3)$. Putting these together gives us

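$$ \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell P_{T_{\tilde{h}}}(b_\ell b_\ell^*),\ \tilde{u}_\ell P_{T_{\tilde{m}}}(c_\ell c_\ell^*)\right)\right\|_F \le 5\max\left(\|\tilde{h}\|_2^2, \|\tilde{m}\|_2^2\right)\sqrt{k+n}. $$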

Moreover,

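$$ \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell b_\ell b_\ell^*,\ \tilde{u}_\ell c_\ell c_\ell^*\right)\right\| \le \mathbb{E}\max_\ell(\tilde{u}_\ell, \tilde{v}_\ell) \cdot \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(b_\ell b_\ell^*,\ c_\ell c_\ell^*\right)\right\|. $$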

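Standard net arguments (see, for example, Sec. 5.4.1 of [EK12]) show that

$$ \mathbb{P}\left(\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(b_\ell b_\ell^*,\ c_\ell c_\ell^*\right)\right\| \ge c\sqrt{k+n}\right) \le e^{-cm}, \quad \text{provided that } m \ge c(k+n). $$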

This directly implies that The random variables and being sub-exponential have Orlicz-1 norms bounded by . Using standard results, such as Lemma 3 in [vdGL13], we then have Putting these together yields

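$$ \mathbb{E}\left\|\frac{1}{\sqrt{m}}\sum_{\ell=1}^{m}\varepsilon_\ell\left(\tilde{v}_\ell b_\ell b_\ell^*,\ \tilde{u}_\ell c_\ell c_\ell^*\right)\right\| \le c\max\left(\|\tilde{h}\|_2^2, \|\tilde{m}\|_2^2\right)\sqrt{(k+n)\log^2 m}. \tag{13} $$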

We now have all the ingredients for the final bound on $C(\mathcal{Q})$, stated below:

$$ C(\mathcal{Q}) \le c\max\left(\|\tilde{h}\|_2^2, \|\tilde{m}\|_2^2\right)\sqrt{(k+n)\log^2 m}. \tag{14} $$
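The $\sqrt{k+n}$ scaling of the Rademacher sum used in the net argument above can be probed by simulation. The sketch below assumes real standard Gaussian rows $b_\ell$, $c_\ell$ and uses the fact that the operator norm of a block diagonal matrix is the larger of the two block norms; the function name, sizes and trial count are illustrative, and the experiment is a sanity check rather than a substitute for the proof.

```python
import numpy as np

def rademacher_opnorm(m, k, n, rng):
    """Operator norm of (1/sqrt(m)) * sum_l eps_l (b_l b_l^T, c_l c_l^T)."""
    b = rng.normal(size=(m, k))
    c = rng.normal(size=(m, n))
    eps = rng.choice([-1.0, 1.0], size=m)          # Rademacher signs
    S_b = (b * eps[:, None]).T @ b / np.sqrt(m)    # sum_l eps_l b_l b_l^T
    S_c = (c * eps[:, None]).T @ c / np.sqrt(m)
    return max(np.linalg.norm(S_b, 2), np.linalg.norm(S_c, 2))

rng = np.random.default_rng(7)
m, k, n = 400, 6, 8  # hypothetical sizes with m well above k + n
norms = [rademacher_opnorm(m, k, n, rng) for _ in range(50)]
ratio = float(np.mean(norms)) / np.sqrt(k + n)
print(round(ratio, 2))
```

A ratio that stays of order one as the sizes vary is what the bound predicts.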

### 2.2 Probability $p_\tau(\mathcal{Q})$

The calculations for the probability $p_\tau(\mathcal{Q})$ and the positive parameter $\tau$ are given in the Appendix due to limitations of space. We find that

$$ p_\tau(\mathcal{Q}) \ge c > 0, \quad \text{and} \quad \tau = c\max\left(\|\tilde{h}\|_2^2, \|\tilde{m}\|_2^2\right). \tag{15} $$

The complexity estimate in (14), the values of $p_\tau(\mathcal{Q})$ and $\tau$ stated in (15), and an application of Lemma 2 together prove Theorem 1.

## 3 Convex Implementation and Phase Transition

To implement the semi-definite convex program (5), we propose a numerical scheme based on the alternating direction method of multipliers (ADMM). Due to the space limit, the technical details of the algorithm are moved to Section 4.4 of the Appendix.

To illustrate the perfect recovery region, in Figure 2 we present the phase portrait associated with the proposed convex framework. For each fixed value of $m$, we run the algorithm for 100 different combinations of $k$ and $n$, each time using a different set of Gaussian matrices $B$ and $C$. If the algorithm converges to a sufficiently close neighborhood of the ground-truth solution (a distance less than 1% of the solution’s norm), we label the experiment as successful. Figure 2 shows the collected success frequencies, where solid black corresponds to 100% success and solid white corresponds to 0% success. For an empirically selected constant , the success region almost perfectly stands on the left side of the line .

A similar phase transition diagram can be obtained when $B$ is a subset of the columns of the identity matrix and $C$ is Gaussian as before. This importantly hints that the convex framework is applicable to more realistic deterministic subspace models.
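The success criterion used for the phase portrait can be written as a short helper. The sketch below assumes that "distance" means the Frobenius norm of the block diagonal pair in (7) and that the reference is the rescaled truth $(\tilde{H},\tilde{M})$; both the function names and this reading of the criterion are assumptions made for illustration.

```python
import numpy as np

def is_success(H_est, M_est, H_ref, M_ref, tol=0.01):
    """True if (H_est, M_est) is within tol (relative) of (H_ref, M_ref)."""
    err = np.sqrt(np.linalg.norm(H_est - H_ref, "fro") ** 2
                  + np.linalg.norm(M_est - M_ref, "fro") ** 2)
    ref = np.sqrt(np.linalg.norm(H_ref, "fro") ** 2
                  + np.linalg.norm(M_ref, "fro") ** 2)
    return bool(err < tol * ref)

def success_frequency(outcomes):
    """Fraction of successful trials for one cell of the phase portrait."""
    return float(np.mean(outcomes))

# Toy usage with synthetic "estimates" (no solver involved).
rng = np.random.default_rng(9)
h_ref, m_ref = rng.normal(size=3), rng.normal(size=4)
H_ref, M_ref = np.outer(h_ref, h_ref), np.outer(m_ref, m_ref)
close = is_success(1.001 * H_ref, 1.001 * M_ref, H_ref, M_ref)
far = is_success(1.1 * H_ref, 1.1 * M_ref, H_ref, M_ref)
freq = success_frequency([close, far, close, close])
print(close, far, freq)
```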

## 4 Appendix

The material presented in this section is supplementary to the manuscript above. This section contains extended discussions, additional technical proofs, and details of the convex program implementation.

### 4.1 Visible Light Communication

As discussed in the body of the paper, an important application domain where blind deconvolution from phaseless Fourier measurements arises is visible light communication (VLC). A stylized VLC setup is shown in Figure 3. A message $m$ is to be transmitted using visible light. The message is first coded by multiplying it with a tall coding matrix $C$, and the resultant information $x = Cm$ is modulated on a light wave. The light wave propagates through an unknown medium. This propagation can be modeled as a convolution of the information signal $x$ with an unknown channel $w$. The vector $w$ contains the channel taps, and frequently in realistic applications it has only a few significant taps. In this case, one can model

$$ w \approx Bh, $$

where $h$ is a short vector, and $B$ in this case is a subset of the columns of an identity matrix. Generally, multipath channels are well modeled with non-zero taps in the top locations of $w$. In that case, $B$ is exactly known to be the top few columns of the identity matrix.
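The channel model with a few leading taps is easy to write down explicitly. The sketch below builds $B$ as the first $k$ columns of the identity for hypothetical sizes and tap values, and also shows that, under the assumed scaling $\hat{B} = \sqrt{m}FB$ with a unitary DFT, the entries of $\hat{B}$ have unit modulus, so this subspace is deterministic rather than Gaussian.

```python
import numpy as np

m, k = 16, 3  # hypothetical signal length and number of channel taps

# B selects the top k locations: first k columns of the m x m identity.
B = np.eye(m)[:, :k]
h = np.array([1.0, -0.5, 0.25])  # illustrative tap values
w = B @ h                        # channel: taps followed by zeros

# Fourier-domain subspace matrix (assumed scaling, unitary DFT).
B_hat = np.sqrt(m) * np.fft.fft(B, axis=0, norm="ortho")

print(w[:5], np.allclose(np.abs(B_hat), 1.0))
```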

In visible light communication, there is always a difficulty associated with measuring phase information in the received light. Figure 3 shows a setup where we measure the phaseless Fourier transform (light through the lens) of this signal. The measurements are therefore

$$ y = |F(Bh \circledast Cm)|, $$

and one wants to recover $h$ and $m$ given the knowledge of $B$ and the coding matrix $C$. Here $C$ is chosen to be random Gaussian, and $B$ consists of columns of the identity. As mentioned at the end of the numerics section, with this subspace model we obtain similar recovery results as one would have for both $B$ and $C$ being random Gaussian. The proposed convex program solves this difficult inverse problem and recovers the true solution with these subspace models.

### 4.2 Proof of Lemma 2

The proof is based on small ball method developed in [KM15, Men14] and further studied in [LM18] and [LM17]. The proof is mainly repeated using a similar line of argument as in [BR17a], and is provided here for completeness.

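The rest of the proof now concerns showing that $(\tilde{H},\tilde{M})$ is the unique solution to the linearly constrained optimization program (8). Define the one-sided loss function

$$ L(H,M) := \sum_{\ell=1}^{m}\left(2y_\ell^2 - \frac{1}{m}\left(\langle b_\ell b_\ell^*, H\rangle \langle c_\ell c_\ell^*, \tilde{M}\rangle + \langle b_\ell b_\ell^*, \tilde{H}\rangle \langle c_\ell c_\ell^*, M\rangle\right)\right)_+, \tag{16} $$

where $(\cdot)_+$ denotes the positive part. Using this definition, we rewrite (8) compactly as

$$
\begin{aligned}
&\text{minimize} && \|H\|_* + \|M\|_* \\
&\text{subject to} && L(H,M) \le 0.
\end{aligned} \tag{17}
$$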
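The one-sided loss (16) can be evaluated directly, which makes the feasibility condition $L(H,M) \le 0$ concrete. The following sketch assumes the $\tfrac{1}{m}$ factor multiplies both bilinear terms, as in (8); the helper name, the real Gaussian test data, and the reference pair `H_t`, `M_t` (standing for $(\tilde{H},\tilde{M})$) are illustrative.

```python
import numpy as np

def one_sided_loss(H, M, H_t, M_t, B_hat, C_hat, y):
    """Sum over l of the positive part of the violated constraints of (8)."""
    m = len(y)
    quad = lambda A, X: np.einsum("li,ij,lj->l", A, X, A.conj()).real
    u, v = quad(B_hat, H), quad(C_hat, M)          # <b b*, H>, <c c*, M>
    u_t, v_t = quad(B_hat, H_t), quad(C_hat, M_t)  # same at the reference
    resid = 2 * y ** 2 - (u * v_t + u_t * v) / m
    return float(np.sum(np.maximum(resid, 0.0)))

rng = np.random.default_rng(5)
m, k, n = 40, 3, 4
B_hat, C_hat = rng.normal(size=(m, k)), rng.normal(size=(m, n))
h_t, m_t = rng.normal(size=k), rng.normal(size=n)
H_t, M_t = np.outer(h_t, h_t), np.outer(m_t, m_t)
y = np.abs((B_hat @ h_t) * (C_hat @ m_t)) / np.sqrt(m)

loss_at_ref = one_sided_loss(H_t, M_t, H_t, M_t, B_hat, C_hat, y)
loss_up = one_sided_loss(2 * H_t, 2 * M_t, H_t, M_t, B_hat, C_hat, y)
loss_down = one_sided_loss(0.5 * H_t, 0.5 * M_t, H_t, M_t, B_hat, C_hat, y)
print(loss_at_ref, loss_up, loss_down)
```

Scaling the pair up keeps it feasible while scaling it down violates every constraint, which matches the role of the trace objective in pushing the solution toward the boundary.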

The goal of the proof is to show that all descent directions that also obey the constraint set have a small norm. Since $(\delta H, \delta M)$ is a feasible perturbation from the proposed optimal $(\tilde{H},\tilde{M})$, we have from the constraints above that

$$ L(\tilde{H} + \delta H, \tilde{M} + \delta M) \le 0. \tag{18} $$

We begin by expanding the loss function below

 L(~H+δH,~M+δM) =m∑ℓ=1[(2y2ℓ−(⟨bℓb<