## 1 Introduction

We study the problem of recovering two unknown vectorss and in from observations

(1) |

where denotes entrywise product and is noise. Let and such that and with and . The bilinear inverse problem (BIP) we consider is to find and , up to the inherent scaling ambiguity, from , , and .

BIPs, in general, have many applications in signal processing and machine learning and include fundamental practical problems like phase retrieval

[Fienup, 1982, Candès and Li, 2012, Candès et al., 2013], blind deconvolution [Ahmed et al., 2014, Stockham et al., 1975, Kundur and Hatzinakos, 1996, Aghasi et al., 2016a], non-negative matrix factorization [Hoyer, 2004, Lee and Seung, 2001], self-calibration [Ling and Strohmer, 2015], blind source separation [O’Grady et al., 2005], dictionary learning [Tosic and Frossard, 2011], etc. These problems are in general challenging and suffer from identifiability issues that make the solution set non-unique and non-convex. A common identifiability issue, also shared by the BIP in (1), is the scaling ambiguity. In particular, if solves a BIP, then so does for any nonzero . In this paper, we resolve this scaling ambiguity by finding the point in the solution set closest to the origin with respect to the norm.Another identifiability issue of the BIP in (1) is if solves (1), then so does , where is the vector of ones. In prior works like Ahmed et al. [2014], this identifiability issue is resolved by assuming the unknown signals live in known subspaces. In contrast we resolve the identifiability issue by assuming the signals are sparse with respect to known bases or dictionaries. Natural choices for such bases include the standard basis, the Discrete Cosine Transform (DCT) basis, and a wavelet basis. In addition to sparsity of the unknown vectors, and , with respect to known dictionaries, we also require sign information of and . Knowing sign information is justified in, for example, imaging applications where and correspond to images, which contain non-negative pixel values.

In this paper, we study a sparse bilinear inverse problem, where the unknown vectors are sparse with respect to known dictionaries. Similar sparse BIPs have been extensively studied in the literature and are known to be challenging. In particular, the best known algorithm for sparse BIPs that can provably recovery and require measurements that scale quadratically, up to log factors, with respect to the sparsity levels, and . Recent work on sparse rank-1 matrix recovery problem in Lee et al. [2017], which is motivated by considering the lifted version of the sparse blind deconvolution problem, provides an exact recovery guarantee of the sparse vectors and that satisfy a ‘peakiness’ condition, i.e. for some absolute constant , using a non-convex approach. This result holds with high probability for random measurements if the number of measurements satisfy , up to a log factor. For general vectors that are not constrained to a class of sparse vectors like those satisfying the peakiness condition, the same work shows exact recovery is possible if the number of measurements satisfy , up to a log factor.

The main contribution of the present paper is to introduce a convex program in the natural parameter space for the sparse BIP described in (1) and show that it can stably recover sparse vectors, up to the global scaling ambiguity, provided they satisfy a comparable-effective-sparsity condition. Precisely, we say the sparse vectors and are -comparable-effective-sparse, for some , if there exists an that satisfies and

(2) |

Note that the ratio of the to norm of a vector is an approximate notion of the square root of the sparsity of a vector. Additionally, we assume the noise in (1) does not change the sign of the measurements. Specifically, we consider noise such that

(3) |

Under the assumptions that the sparse vectorss satisfy (2) and the noise is small as in (3), we show that the convex program stably recovers the unknown vectors, up to the global scaling ambiguity, with high probability provided and are random and the number measurements satisfy , up to log factors. Similar to the result in Lee et al. [2017], this results has optimal sample complexity, up to log factors, for a class of sparse signals, namely those with comparable sparsity levels.

### 1.1 Convex program and main results

We introduce a convex program written in the natural parameter space for the bilinear inverse problem described in (1). Let with and . Let , and , where and are the th row of and , respectively, and is the th entry of . Also, let and . The convex program we consider to recover is the -BranchHull program

(4) | ||||

The motivation for the feasible set in program (4) follows from the observation that each measurement defines a hyperbola in . As shown in Figure 3, the sign information restricts to one of the branches of the hyperbola. The feasible set in (4) corresponds to the convex hull of particular branches of the hyperbola for each . This also implies that the feasible set is convex as it is the intersection of convex sets.

The objective function in (4) is an minimization over that finds a point with . Geometrically, this happens as the solution lies at the intersection of the -ball and the hyperbolic curve (constraint) as shown in Figure 3. So, the minimizer of (4) in the noiseless case, under successful recovery, is .

We now present our main result which states that the -BranchHull program (4) stably recovers and , up to the global scaling ambiguity, in the presence of small dense noise. We show that if and live in random subspaces with and containing at most and non zero entries, and satisfy comparable-effective-sparsity condition (2), the noise satisfy (3), and there are at least number of measurements, then the minimizer of the -BranchHull program is close to the bilinear ambiguity set with high probability. Moreover, in the case noiseless case, the minimizer of the -BranchHull is the point on this bilinear ambiguity set with equal norm with high probability, i.e. the minimizer is with high probability.

###### Theorem 1 (Noisy recovery).

Fix . Fix that are -comparable-effective-sparse as defined in (2) with , and , . Let and have i.i.d. entries. Let contain measurements that satisfy (1) with noise satisfying (3). If for any then the -BranchHull program (4) recovers that satisfies

with probability at least . Here, and are constants that depend quadratically on and , respectively. Furthermore, if .

Theorem 1 shows that exact recovery, up to the global scaling ambiguity, of sparse vectors that satisfy the comparable-effective-sparsity condition is possible if the number of measurements satisfy . This result is optimal, up to the log factors. Numerical simulation on synthetic data verify Theorem 1 in the noiseless case and show that the constant in the sample complexity is not too large. We also present the results of numerical simulation on two real images which shows that a total variation reformulation of the convex program (4) can successfully recover the piecewise constant part of an otherwise distorted image.

### 1.2 Prior art for bilinear inverse problems

Recent approaches to solving bilinear inverse problems like blind deconvolution and phase retrieval include lifting the problems into a low rank matrix recovery task or to formulate a convex or non-convex optimization programs in the natural parameter space. Lifting transforms the problem of recovering and from bilinear measurements to the problem of recovering a low rank matrix from linear measurements. The low rank matrix can then be recovered using a semidefinite program. The result in Ahmed et al. [2014] for blind deconvolution showed that if and are coefficients of the target signals with respect to Fourier and Gaussian bases, respectively, then the lifting method successfully recovers the low rank matrix. The recovery occurs with high probability under near optimal sample complexity. Unfortunately, solving the semidefinite program is prohibitively computationally expensive because they operate in high-dimension space. Also, it is not clear whether or not it is possible to enforce additional structure like sparsity of and in the lifted formulation in a way that allows optimal sample complexity [Li and Voroninski, 2013, Oymak et al., 2015].

In comparison to the lifting approach for blind deconvolution and phase retrieval, methods that formulate an algorithm in the natural parameter space, such as alternating minimization and gradient descent based method, are computationally efficient and also enjoy rigorous recovery guarantees under optimal or near optimal sample complexity [Li et al., 2016, Candès et al., 2015, Netrapalli et al., 2013, Sun et al., 2016, Lee et al., 2017]. The work in Lee et al. [2017] for sparse blind deconvolution is based on alternating minimization. In the paper, the authors use an alternating minimization that successively approximate the sparse vectors while enforcing the low rank property of the lifted matrix. However, because these methods are non-convex, convergence to the global optimal requires a good initialization [Tu et al., 2015, Chen and Candes, 2015, Li et al., 2016].

Other approaches that operate in the natural parameter space include PhaseMax [Bahmani and Romberg, 2016, Goldstein and Studer, 2016] and BranchHull [Aghasi et al., 2016b]

. PhaseMax is a linear program which has been proven to find the target signal in phase retrieval under optimal sample complexity if a good anchor vector is available. As with alternating minimization and gradient descent based approach, PhaseMax requires a good initializer to even be stated. In PhaseMax the initialization is part of the objective function but in alternating minimization the initialization is part of the algorithmic implementation. BranchHull is a convex program which solves the BIP described in (

1) in the dense signal case under optimal sample complexity. Like the -BranchHull presented in this paper, BranchHull does not require an initialization but does require the sign information of the signals.The -BranchHull program (4) combines strengths of both the lifting method and the gradient descent based method. Specifically, the -BranchHull program is a convex program that operates in the natural parameter space, without a need for an initialization. These strengths are achieved at the cost of the sign information of the target signals and , which can be justified in imaging applications where the goal might be to recover pixel values of a target image, which are non-negative.

### 1.3 Discussion and extensions

The -BranchHull formulation is inspired by the BrachHull formation introduced in Aghasi et al. [2016b], which is a novel convex relaxation for the bilinear recovery of the entrywise product of vectors with known signs, and share many of its advantages and drawbacks. Like in BranchHull, -BranchHull finds a point in the convex feasibility set that is closest to the origin. The important difference between these formulations is that BranchHull finds the point with the least norm while -BranchHull finds the point with the least -norm. Another difference is that BranchHull enjoys recovery guarantee when those vectors belong to random real subspaces while -BranchHull enjoys recovery guarantee under a much weaker condition of those vectors admitting a sparse representation with respect to known dictionaries.

Similar to BranchHull, -BranchHull is a flexible and can altered to tolerate large outliers. In this paper, we show that the -BranchHull formulation is stable to small dense noise. However, as stated formulation (4) is not robust to outliers. This is because the formulations is particularly susceptible to noise that changes the sign of even a single measurement. For the bilinear inverse problem as described in (1) with small dense noise and arbitrary outliers, we propose the following robust -BranchHull program by adding a slack variable.

-RBH: | (5) | |||

For measurements with incorrect sign, the corresponding slack variables shifts the feasible set so that the target signal is feasible. In the outlier case, the penalty promotes sparsity of slack variable . We implement a slight variation of the above program to remove distortions from an otherwise piecewise constant signal. In the case where is a piecewise constant signal, is a distortion signal and is the distorted signal, the total variation version (6) of the robust BranchHull program (5), under successful recovery, produces the piecewise constant signal , up to a scaling.

(6) | ||||

In (6), TV is a total variation operator and is the norm of the vector containing pairwise difference of neighboring elements of the target signal . We implement (6) to remove distortions from images in Section 3 and leave detailed theoretical analysis of robust BranchHull (5) and its variant (6) to future work. It would also be interesting to develop convex relaxations in the natural parameter space that do not require sign information and to extend the analysis to the case when the phases of complex vectors are known and to the case of deterministic dictionaries instead of random dictionaries. All of these directions are left for future research.

### 1.4 Organization of the paper

The remainder of the paper is organized as follows. In Section 1.5, we present notations used throughout the paper. In Section 2, we present an Alternating Direction of Multipliers implementation of robust -BranchHull program (5). In Section 3, we observe the performance of -BranchHull on synthetic random data and natural images. In Section 4, we present the proof of Theorem 1.

### 1.5 Notation

Vectors and matrices are written with boldface, while scalars and entries of vectors are written in plain font. For example, is the the entry of the vector . We write as the vector of all ones with dimensionality appropriate for the context. We write as the identity matrix. For any , let such that . For any matrix , let be the Frobenius norm of . For any vector , let be the number of non-zero entries in . For and , is the corresponding vector in , and .

## 2 Algorithm

In this section, we present an Alternating Direction Method of Multipliers (ADMM) implementation of an extension of the robust -BranchHull program (5). The ADMM implementation of the -BranchHull program (4) is similar to the ADMM implementation of (7) and we leave it to the readers. The extension of the robust -BranchHull program we consider is

(7) | |||

where . The above extension reduces to the robust -BranchHull program if . Recalling that and , we form the block diagonal matrices and and define the vectors

Using this notation, our convex program can be compactly written as

Here is the convex feasible set of . Introducing a new variable the resulting convex program can be written as

We may now form the scaled ADMM through the following steps

(8) | ||||

(9) | ||||

(10) |

which are followed by the dual updates

We would like to note that the first three steps of the proposed ADMM scheme can be presented in closed form. The update in (8) is the following projection

where is the projection of onto . Details of computing the projection onto are presented in Appendix 2.1. The update in (9) can be written in terms of the soft-thresholding operator

where for ,

and is the th entry of . Finally, the update in (10) takes the following form

In our implementation of the ADMM scheme, we initialize the algorithm with the , , .

### 2.1 Evaluation of the Projection Operator

Given a point , in this section we focus on deriving a closed-form expression for , where

is the convex feasible set of . It is straightforward to see that the resulting projection program decouples into convex programs in as

(11) |

Throughout this derivation we assume that (derivation of the projection for the case is easy) and as a result of which the second constraint is never active (because then and the first constraint requires that ). We also consistently use the fact that and are signs and nonzero.

Forming the Lagrangian as

along with the primal constraints, the KKT optimality conditions are

(12) | ||||

(13) | ||||

(14) | ||||

(15) | ||||

(16) |

We now proceed with the possible cases.

Case 1. :

In this case we have and this result would only be acceptable when and .

Case 2. , :

In this case the first feasibility constraint of (11) requires that , which is not possible when .

Case 3. , :

Similar to the previous case, this cannot happen when .

Case 4. , :

In this case we have

Now combining this observation with (12) and (14) yields

(17) |

and therefore

(18) |

Similarly, (13) yields

(19) |

Knowing that , can be eliminated between (17) and (19) to generate the following forth order polynomial equation in terms of :

After solving this 4-th order polynomial equation (e.g., the root command in MATLAB) we pick the real root which obeys

(20) |

Note that the second inequality in (20) warrants nonnegative values for thanks to (18). After picking the right root, we can explicitly obtain using (19) and calculate the solutions and using (12) and (14). Technically, in using the ADMM scheme for each we solve a forth-order polynomial equation and find the projection.

## 3 Numerical Experiments

In this section, we provide numerical experiments on synthetic random data and natural images where the signals follow the model in (1). We first show a phase portrait that verifies Theorem 1 in the noiseless case. Consider the following measurements: fix , and let . Let the target signal be such that both and have non-zero entries with the nonzero indices randomly selected and set to . Let and be the number of nonzero entries in and , respectively. Let and such that and . Lastly, let and .

Figure 4 shows the fraction of successful recoveries from 10 independent trials using (4) for the bilinear inverse problem (1) from data as described above. Let be the output of (4) and let be the candidate minimizer. We solve (4) using an ADMM implementation similar to the ADMM implementation detailed in Section 2 with the step size parameter . For each trial, we say (4) successfully recovers the target signal if . Black squares correspond to no successful recovery and white squares correspond to 100% successful recovery. The line corresponds to with and indicates that the sample complexity constant in Theorem 1, in the noiseless case, is not very large.

We now show the result of using the total variation BranchHull program (6) to remove distortions from real images . In the experiments, The observation is the column-wise vectorization of the image , the target signal is the vectorization of the piecewise constant image and corresponds to the distortions in the image. We use (6) to recover piecewise constant target images like in the foreground of Figure 4(a) with TV, where in block form. Here, and with

We use the total variation BranchHull program on two real images. The first image, shown in Figure 4(a), was captured using a camera and resized to a image. The measurement is the vectorization of the image with . Let be the identity matrix. Let be the inverse DCT matrix. Let with the first column set to and remaining columns randomly selected from columns of without replacement. The matrix is scaled so that . The vector of known sign is set to . Let be the output of (6) with and . Figure 4(b) corresponds to and shows that the object in the center was successfully recovered.

The second real image, shown in Figure 4(c), is an image of rice grains. The size of the image is . The measurement is the vectorization of the image with . Let be the identity matrix. Let with the first column set to . The remaining columns of are sampled from Bessel function of the first kind with each column corresponding to a fixed . Specifically, fix with . For each remaining column of , fix and let . The matrix is scaled so that . The vector of known sign is set to . Let be the output of (6) with and . Figure 4(d) corresponds to .

## 4 Proof Outline

In this section, we provide a proof of Theorem 1 by considering a program similar to -BrachHull program (4) with a different representation of the constraint set. Let , and define a convex function

where is a piecewise constant function such that

(21) |

Let with . We note that

is also a convex function because its epigraph is a convex set. The epigraph is convex because it is the inverse image of a convex set over a linear map. Define a one-sided loss function

where denotes the positive side. We analyze the following generalized version of the -BranchHull program:

(22) | ||||

Program (22) is equivalent to the -BranchHull in the sense that the objective and the constraint set of both the programs are the same. Lemma 1 shows that the set defined by constraints with and the set defined by constraints are the same set.

###### Lemma 1.

Fix such that and . Let , and . Let contain measurements that satisfy (1). The set is equal to the set

###### Proof.

Fix an . It is sufficient to show that the set is equal to the set . Consider a . We have

(23) | ||||

(24) |

where (23) holds because and and (24) holds because . Thus, . Now, consider a . W.L.O.G. assume . The reverse implications above implies . Also, because . If , then . If , then

So, because implies . Thus,