1 Introduction
When faced with an underdetermined system of equations, one typically applies a regularization strategy in order to recover well-posedness. The choice of regularization depends on the particular application at hand and should be made to drive the solution towards desired properties. In the absence of precise goals, the most popular choice favors solutions with minimum norm. However, an increasingly popular alternative is to search for sparse solutions (which often have nonminimal norm). In the case of a system of linear equations, this alternative has been investigated in numerous works, see, e.g., [Bruckstein et al., 2009] for a review, and entails many applications of great importance, particularly for signal processing where it goes under the name of compressed sensing/sampling [Donoho, 2006, Candès, 2006]. Formally, finding the sparsest solutions of linear systems can be written as the minimization, under the constraints of the linear system, of the number of nonzero variables, which is a nonsmooth, nonconvex and NP-hard problem. Two main approaches can be distinguished to tackle such problems. The first one, known as Basis Pursuit (BP), relies on a convex relaxation based on the minimization of an $\ell_1$ norm, while the second one applies a greedy strategy to add nonzero variables one by one. Many results on the convergence of these methods to the sparsest solution are available in the literature [Bruckstein et al., 2009, Eldar and Kutyniok, 2012, Foucart and Rauhut, 2013].
More recently, a few works introduced extensions of this problem to nonlinear equations. In particular, the first greedy approaches appeared in [Blumensath and Davies, 2008, Blumensath, 2013, Beck and Eldar, 2013, Ehler et al., 2014], while BP algorithms were developed in [Ohlsson et al., 2013b] to find sparse solutions of systems of quadratic equations and in [Ohlsson et al., 2013a] for more general nonlinear equations. Formally, these problems can be formulated as
(1)  $\min_{x \in \mathbb{R}^n} \|x\|_0$
s.t.  $f_i(x) = b_i, \quad i = 1, \dots, N,$
where the $f_i : \mathbb{R}^n \to \mathbb{R}$ are nonlinear functions and $\|x\|_0$ denotes the $\ell_0$ pseudo-norm of $x$, i.e., the number of nonzero components $x_j \neq 0$.
Here, we focus on the case where the $f_i$'s in (1) are polynomial functions of maximal degree $d$:
(2)  $f_i(x) = \sum_{\alpha \in \mathcal{A}} a_{i,\alpha} x^\alpha, \quad i = 1, \dots, N,$
where $\mathcal{A} \subset \mathbb{N}^n$ is the set of multi-indexes $\alpha$ with $1 \le |\alpha| = \sum_{j=1}^n \alpha_j \le d$ and $x^\alpha = \prod_{j=1}^n x_j^{\alpha_j}$ are the corresponding monomials. This includes the particular case of quadratic equations ($d = 2$) considered in [Ohlsson et al., 2013b], while in [Ohlsson et al., 2013a] this setting is used to deal with the more general nonlinear case via the Taylor expansions of the nonlinear functions. Note that the formulation in (1)–(2) also entails cases outside of the quasilinear setting considered in [Ehler et al., 2014].
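To make this representation concrete, the monomial basis can be enumerated programmatically. The sketch below (helper names are ours, not the paper's) lists all multi-indexes $\alpha$ with $1 \le |\alpha| \le d$, enumerated degree by degree so that the linear monomials come first, and evaluates a polynomial from its coefficient sequence.

```python
import itertools
import numpy as np

def multi_indexes(n, d):
    """All multi-indexes alpha in N^n with 1 <= |alpha| <= d,
    enumerated degree by degree so the n linear monomials come first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def poly_eval(coeffs, alphas, x):
    """Evaluate f(x) = sum_alpha coeffs[alpha] * x^alpha."""
    x = np.asarray(x, dtype=float)
    return sum(c * np.prod(x ** np.array(a)) for c, a in zip(coeffs, alphas))
```

For instance, with $n = 2$ and $d = 2$ this yields the five monomials $x_1, x_2, x_1^2, x_1 x_2, x_2^2$.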
The present paper proposes a new BP approach to solve (1)–(2), which, in comparison with the previous works [Ohlsson et al., 2013b, Ohlsson et al., 2013a], is simpler and amounts to solving an easier optimization problem. More precisely, the method proposed in Sect. 2.1 relies on a simple change of variables that suffices to yield an efficient algorithm implemented as a classical $\ell_1$-minimization problem. However, the structure of the polynomials is discarded and the solution may not satisfy the original polynomial constraints. Note that this change of variables is also found in [Ohlsson et al., 2013a], though with constraints handled in a more complex manner, and is closely related to the lifting technique of [Ohlsson et al., 2013b] for quadratic BP and the one of [Vidal et al., 2005] for subspace clustering.
The method in [Ohlsson et al., 2013b] enforces the structure of the quadratic polynomials via rank constraints, which leads to optimization problems with additional levels of relaxation and the introduction of a parameter tuning the trade-off between the minimization of the $\ell_1$ norm on the one hand and the satisfaction of the rank constraint on the other hand. In [Ohlsson et al., 2013a], the structure of higher-degree polynomials is enforced by a set of quadratic constraints on the monomials¹, which are then relaxed as in [Ohlsson et al., 2013b]. Instead, the method proposed in Sect. 2.2 implements the structural knowledge via group-sparsity. This results in an $\ell_1/\ell_2$-minimization, i.e., a second-order cone program (SOCP), which can be solved more efficiently. In addition, this SOCP formulation is easily extended in Sect. 2.4 to benefit from reweighting techniques for improving the sparsity of the solution.
¹For example, the structure of a higher-degree monomial can be enforced by quadratic relations between lower-degree monomials. But note that since these constraints are relaxed in the final formulation, the estimation can yield values violating such a relation, which then recursively implies that all monomial constraints involving the affected monomial are meaningless.
The conditions for exact recovery of the proposed BP methods are analyzed in Sect. 2.3. In particular, we show that though the simple $\ell_1$-minimization does not include structural constraints, exact recovery occurs for sufficiently sparse cases. A similar condition is proved for the $\ell_1/\ell_2$-minimization based on group-sparsity.
The greedy approach is discussed in Sect. 3, where two variants are proposed: an exact algorithm for solving the group-sparsity optimization problem in small-scale cases and an approximate one which remains efficient in higher-dimensional settings. Previous greedy approaches [Blumensath, 2013, Beck and Eldar, 2013] considered the problem in its sparsity-constrained form, where the sum of squared errors over the equations (2) is minimized subject to $\|x\|_0 \le k$ for an a priori fixed $k$. The iterative hard thresholding of [Blumensath, 2013] is a gradient descent algorithm with an additional projection onto the feasible set at each iteration via a simple thresholding operation. In [Beck and Eldar, 2013], this is interpreted as a fixed-point iteration enforcing a necessary (but not sufficient) optimality condition, which requires a well-chosen step size to converge satisfactorily. In particular, the step size must be an upper bound on the Lipschitz constant of the gradient of the objective function, which however is not (globally) Lipschitz continuous for polynomials in (2) of degree $d > 1$. The sparse-simplex algorithm of [Beck and Eldar, 2013] is a coordinate descent method which enjoys similar but less restrictive convergence properties while being parameter-free. However, for polynomial equations as in (2), each iteration requires solving several one-dimensional minimizations of a polynomial whose degree grows with $d$, which becomes difficult for large degrees. On the contrary, the proposed greedy algorithms remain simple thanks to the group-sparse and linearized formulation of the problem: each iteration requires only solving least-squares problems.
Extensions of the methods are discussed in Sections 4 and 5. In particular, Section 4 deals with the issue of purely nonlinear polynomials, for which the solution to (1)–(2) cannot be estimated as directly as for polynomials involving linear monomials, but which arise in important applications such as phase retrieval [Kohler and Mandel, 1973, Gonsalves, 1976, Gerchberg and Saxton, 1972, Fienup, 1982, Marchesini, 2007, Candès et al., 2013b, Candès et al., 2013a]. Then, the case where the equations (2) are perturbed by noise is considered in Sect. 5. In particular, the analysis provides stable recovery results for polynomial BP denoising in the flavor of the one obtained in [Donoho et al., 2006] for linear BP denoising.
Finally, numerical experiments in Sect. 6 show the efficiency of the proposed methods and extensions. Results are in line with the ones found in classical sparse optimization with linear constraints. In particular, all methods can recover the sparsest solution in sufficiently sparse cases and the greedy approach is the fastest while the BP methods based on convex relaxations benefit from a slightly higher probability of success.
2 Polynomial basis pursuit
This section develops two basis pursuit approaches to find sparse solutions of systems of polynomial equations, via the minimization of an $\ell_1$ norm for the first one and of a mixed $\ell_1/\ell_2$ norm for the second one. The methods are developed under the following assumption, which will be relaxed in Sect. 4.
Assumption 1.
For all $j \in \{1, \dots, n\}$, there is an $i \in \{1, \dots, N\}$ such that $a_{i,\alpha} \neq 0$ for the multi-index $\alpha$ determined by $\alpha_j = 1$ and $\alpha_k = 0$ for all $k \neq j$.
In particular, Assumption 1 ensures that the polynomials include a linear part, or more precisely, that for any variable $x_j$, $j = 1, \dots, n$, the linear monomial $x_j$ has a nonzero coefficient in at least one of the polynomials.
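As a sketch, Assumption 1 can be checked directly on the coefficients. The function below (its name is ours) assumes the coefficients are stored as an $N$-by-$M$ array whose first $n$ columns correspond to the linear monomials.

```python
import numpy as np

def satisfies_assumption_1(C, n):
    """Check that every base variable's linear monomial has a nonzero
    coefficient in at least one polynomial (rows of C are polynomials,
    the first n columns correspond to the monomials x_1, ..., x_n)."""
    C = np.asarray(C, dtype=float)
    return bool(np.any(C[:, :n] != 0, axis=0).all())
```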
2.1 Polynomial basis pursuit via $\ell_1$-minimization
We start by rewriting the constraints in (1)–(2) as
(3)  $A \phi(x) = b,$
where $A \in \mathbb{R}^{N \times M}$ is the matrix of polynomial coefficients $a_{i,\alpha}$, $b = [b_1, \dots, b_N]^T$, and $\phi : \mathbb{R}^n \to \mathbb{R}^M$ is a mapping that computes all the $M$ monomials of degree $|\alpha|$, $1 \le |\alpha| \le d$, with $n$ variables. Note that $x$ is embedded in $\phi(x)$ as the components corresponding to the multi-indexes $\alpha$, $|\alpha| = 1$, such that $\alpha_j = 1$ for one $j$ and $\alpha_k = 0$ for $k \neq j$. Assuming that these are the first $n$ components of $\phi(x)$, we also define the (linear) inverse mapping $P : \mathbb{R}^M \to \mathbb{R}^n$, such that $P \phi(x) = x$, as
$P = [\, I_n \ \ 0_{n \times (M-n)} \,],$
where $I_n$ is the $n$-by-$n$ identity matrix.
The high-dimensional lifting by $\phi$ allows us to recast (1)–(2) as a standard (i.e., linear) problem in sparse optimization:
(4)  $\min_{w \in \mathbb{R}^M} \|w\|_0$
s.t.  $A w = b,$
where the structured vector of monomials $\phi(x)$ is replaced by an unstructured vector of variables $w$. This constitutes the first level of relaxation in the proposed approach, where the components of $w$ are not constrained to be interdependent monomials. While this yields a rather crude approximation, it will serve as the basis for the refined approach of Sect. 2.2, where additional structure will be imposed on $w$. The second level of relaxation comes from the BP approach, in which problems such as (4) are typically solved via the convex relaxation
(5)  $\min_{w \in \mathbb{R}^M} \|W w\|_1$
s.t.  $A w = b,$
where $W = \mathrm{diag}(\|a_1\|_2, \dots, \|a_M\|_2)$, with $a_j$ the $j$th column of $A$, is a diagonal matrix of precompensating weights. Then, for polynomial BP, an estimate of the solution to (1)–(2) can be easily computed under Assumption 1 as $\hat{x} = P \hat{w}$.
Problem (5) can be formulated as a linear program and solved by generic interior-point algorithms in polynomial time, while it can also be solved by more efficient and dedicated algorithms, see, e.g., [Foucart and Rauhut, 2013, Chapter 15]. However, the complexity with respect to the number of base variables $n$ remains exponential because of the dimension $M$, which scales exponentially with $n$. The literature on basis pursuit and compressed sensing [Bruckstein et al., 2009, Foucart and Rauhut, 2013] tells us that the sparser the solution to (4) is, the more likely the convex relaxation (5) is to yield its recovery. Here, by construction, we know that there is at least a very sparse vector satisfying the constraints: with $x^0$ the sparsest solution to (1)–(2), the vector $\phi(x^0)$ has a relative sparsity level no worse than that of $x^0$, as stated by Proposition 1 below. Note that a better bound implying an increased level of sparsity depending on $d$ for very sparse cases is derived in Appendix A.
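A minimal sketch of this $\ell_1$ approach on an invented toy system: lift the constraints, solve $\min \|w\|_1$ s.t. $Aw = b$ as a linear program by splitting $w$ into nonnegative parts, and read off the first $n$ components. The precompensating weights $W$ are omitted here for simplicity, and all names are ours.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def multi_indexes(n, d):
    """Multi-indexes of degree 1..d, the n linear ones first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def lift(x, alphas):
    """The mapping phi: stack all monomials x^alpha."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

def l1_basis_pursuit(A, b):
    """min ||w||_1  s.t.  A w = b, via the standard LP split w = u - v."""
    M = A.shape[1]
    res = linprog(np.ones(2 * M), A_eq=np.hstack([A, -A]), b_eq=b,
                  bounds=[(0, None)] * (2 * M))
    assert res.success
    return res.x[:M] - res.x[M:]

n, d = 3, 2
alphas = multi_indexes(n, d)
A = np.random.default_rng(0).standard_normal((7, len(alphas)))
x_true = np.array([1.0, 0.0, 0.0])         # sparse ground truth
b = A @ lift(x_true, alphas)
w_hat = l1_basis_pursuit(A, b)
x_hat = w_hat[:n]                          # inverse mapping P: first n components
```

By construction, $\phi(x_{\text{true}})$ is feasible, so the returned $\hat{w}$ satisfies the constraints with an $\ell_1$ norm no larger than $\|\phi(x_{\text{true}})\|_1$.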
Proposition 1.
Let the mapping $\phi$ be defined as above. Then, the vector $\phi(x)$ is at least as sparse as the vector $x$ in the relative sense that the inequality
$\frac{\|\phi(x)\|_0}{M} \le \frac{\|x\|_0}{n}$
holds for all $x \in \mathbb{R}^n$.
Proposition 1 implies that if (1)–(2) has a sparse solution then so does (4). To prove Proposition 1, we first need the following lemma.
Lemma 1.
For all triplet such that , the inequality
holds.
Proof.
On the one hand, we have
and a similar expression with instead of . On the other hand, with , we have
which then yields the sought statement.
∎
We now give the proof of Proposition 1.
Proof.
By construction, the number of nonzeros in $\phi(x)$ is equal to the sum over $k$, $1 \le k \le d$, of the number of monomials of degree $k$ in $\|x\|_0$ variables. This yields
Since , Lemma 1 gives the bound
from which the statement follows.
∎
2.2 Polynomial basis pursuit via $\ell_1/\ell_2$-minimization (group sparsity)
The approach proposed above relies on a rather crude approximation. In particular, the polynomial equations are not guaranteed to be satisfied by the solution $\hat{x} = P\hat{w}$, since the factorization $\hat{w} = \phi(\hat{x})$ may not hold. In other words, the linearization in (3) together with the direct optimization of $w$ discard the desired structure for $\hat{w}$. In practice, the solution can be checked a posteriori by testing whether $f_i(\hat{x}) = b_i$, $i = 1, \dots, N$. But in order to increase the probability of obtaining a satisfactory solution, i.e., one which satisfies the original polynomial constraints, we must embed the structure of the polynomial in the problem formulation.
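The a posteriori check just described can be sketched as follows: recover $\hat{x}$ from the first $n$ components of $\hat{w}$ and test whether $\hat{w}$ actually factorizes as $\phi(\hat{x})$ (helper names are ours).

```python
import itertools
import numpy as np

def multi_indexes(n, d):
    """Multi-indexes of degree 1..d, the n linear ones first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def lift(x, alphas):
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

def factorizes(w_hat, n, alphas, tol=1e-8):
    """A posteriori structure check: does w_hat equal phi(x_hat) for
    x_hat given by its first n components?"""
    w_hat = np.asarray(w_hat, dtype=float)
    x_hat = w_hat[:n]
    return bool(np.linalg.norm(lift(x_hat, alphas) - w_hat) <= tol)
```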
Let us define the index sets
(6)  $\mathcal{G}_j = \{ m \in \{1, \dots, M\} : \text{the } m\text{th monomial involves } x_j \}, \quad j = 1, \dots, n.$
Then, the structural information we aim at embedding is given by the following implication:
(7)  $x_j = 0 \ \Rightarrow \ [\phi(x)]_m = 0, \quad \forall m \in \mathcal{G}_j,$
which formalizes the fact that whenever a variable is zero, all monomials involving this variable must be zero. Such a structure can be favored via group-sparsity optimization, as detailed next.
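A sketch of these index sets and of the implication they encode: each group collects the positions of all monomials involving one variable, and zeroing a variable zeroes its whole group in $\phi(x)$ (helper names are ours).

```python
import itertools
import numpy as np

def multi_indexes(n, d):
    """Multi-indexes of degree 1..d, the n linear ones first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def lift(x, alphas):
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

def groups(n, d):
    """Index sets as in (6): the j-th group collects the positions of
    all monomials involving the variable x_j."""
    alphas = multi_indexes(n, d)
    return [[m for m, a in enumerate(alphas) if a[j] > 0] for j in range(n)]
```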
Let us define the set of mappings $\phi_j : \mathbb{R}^n \to \mathbb{R}^{M_j}$, each computing the subset of components of $\phi(x)$ involving the variable $x_j$ (see Appendix C for the precise value of $M_j$). Note that these mappings are nonlinear in $x$ but linear in $\phi(x)$: $\phi_j(x) = P_j \phi(x)$, where $P_j$ is an $M_j$-by-$M$ binary matrix filled with zeros except for a 1 on each row at the $m$th column for all $m \in \mathcal{G}_j$. Further define the $\ell_0$ pseudo-norm of a vector-valued sequence $(v_j)_{j=1}^n$ as the number of nonzero vectors in the sequence:
$\|(v_j)_{j=1}^n\|_0 = \#\{ j \in \{1, \dots, n\} : v_j \neq 0 \}.$
We call $w$ a group-sparse vector if $\|(P_j w)_{j=1}^n\|_0$ is small. By construction and due to (7), a sparse $x$ leads to a group-sparse $\phi(x)$ with
$\|(\phi_j(x))_{j=1}^n\|_0 = \|x\|_0.$
Therefore, in order to solve (1)–(2) we search for a group-sparse solution to the linear system $A w = b$.
Such group-sparse solutions can be found by solving
(8)  $\min_{w \in \mathbb{R}^M} \|(W_j P_j w)_{j=1}^n\|_0$
s.t.  $A w = b,$
where we replaced each $P_j$ by a weighted version $W_j P_j$, with $W_j$ a diagonal matrix of positive weights, without effect on the $\ell_0$ pseudo-norm. The reason for introducing these precompensating weights will become clear in the analysis of Sect. 2.3.
Let us define the mixed $\ell_1/\ell_q$ norm of a vector-valued sequence $(v_j)_{j=1}^n$ as
$\|(v_j)_{j=1}^n\|_{1,q} = \sum_{j=1}^n \|v_j\|_q.$
Problem (8) can be relaxed by replacing the $\ell_0$ pseudo-norm with such a norm. In this paper we only consider the case $q = 2$, i.e.,
$\|(v_j)_{j=1}^n\|_{1,2} = \sum_{j=1}^n \|v_j\|_2,$
but other norms could be used, such as the $\ell_1/\ell_\infty$ norm. This yields the convex relaxation²
(9)  $\min_{w \in \mathbb{R}^M} \sum_{j=1}^n \|W_j P_j w\|_2$
s.t.  $A w = b,$
which is easily reformulated as a second-order cone program (SOCP) that can be solved by generic software such as CVX [Grant and Boyd, 2013, Grant and Boyd, 2008] or MOSEK [Andersen and Andersen, 2000] (more efficient dedicated solvers can also be found, e.g., [Deng et al., 2011]). However, as for the $\ell_1$-minimization (5), the complexity remains exponential with respect to the number of base variables $n$ via the dimension $M$.
²Note that since all the groups have the same number of variables, they need not be weighted by a function of the number of variables in each group.
Adding structure via constraints.
All components of the mapping $\phi$ involving only even degrees of the base variables are nonnegative. These structural constraints can help to drive the solution towards one that correctly factorizes and corresponds to a solution of the polynomial equations. This is obtained by solving
(10)  $\min_{w \in \mathbb{R}^M} \sum_{j=1}^n \|W_j P_j w\|_2$
s.t.  $A w = b, \quad w_m \ge 0 \ \ \text{for all components } m \text{ involving only even degrees.}$
This optimization problem has the same complexity as (9) since we simply added nonnegativity constraints to some variables.
2.3 Analysis
The following derives conditions for exact recovery of the sparse solution to the system of polynomial equations via the various convex relaxations. These conditions are based on the mutual coherence of the matrix $A$ as defined, e.g., in [Donoho and Huo, 2001, Bruckstein et al., 2009].
Definition 1.
The mutual coherence of a matrix $A$ with columns $a_j$ is
$\mu(A) = \max_{j \neq k} \frac{|a_j^T a_k|}{\|a_j\|_2 \|a_k\|_2}.$
In order for the mutual coherence of $A$ to be defined, we focus on the case where the following assumption holds.
Assumption 2.
All columns of the matrix $A$ are nonzero, i.e., $\|a_m\|_2 > 0$, $m = 1, \dots, M$, or, equivalently, for every multi-index $\alpha \in \mathcal{A}$ there is an $i$ such that the corresponding polynomial coefficient $a_{i,\alpha} \neq 0$.
Assumption 2 is slightly more restrictive than Assumption 1, which only constrains the first $n$ columns of $A$. If Assumption 2 does not hold, the following analysis can be reproduced under Assumption 1 by considering the submatrix containing the nonzero columns of $A$ and a similarly truncated mapping $\phi$. Adjustments then need to be made where numbers of columns are used, i.e., by substituting the reduced numbers of columns for $M$ and $M_j$. However, in the case where both Assumptions 1 and 2 do not hold, i.e., with zero columns corresponding to base variables, the estimate of $x$ cannot be obtained by the inverse mapping $P$. This particular case will be discussed in Sect. 4.
Note that whenever a condition requires the mutual coherence to be defined, Assumption 2 implicitly holds. In such cases, Assumption 2 will not be explicitly stated in the theorems below.
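Definition 1 translates directly into code; the sketch below also guards against the zero columns excluded by Assumption 2.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct
    columns of A (Definition 1); requires all columns nonzero
    (Assumption 2)."""
    A = np.asarray(A, dtype=float)
    G = A.T @ A
    norms = np.sqrt(np.diag(G))
    assert np.all(norms > 0), "Assumption 2 violated: zero column"
    C = np.abs(G) / np.outer(norms, norms)
    np.fill_diagonal(C, 0.0)   # exclude the j = k terms
    return float(C.max())
```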
2.3.1 $\ell_1$-minimization method
The result below characterizes a case where the simple $\ell_1$-minimization method is sufficient to solve the sparse optimization problem (1)–(2).
Theorem 1.
Proof.
Assume (1)–(2) has a unique solution $x^0$. Then, $A w = b$ has a solution $w^0 = \phi(x^0)$ with a sparsity bounded via Proposition 1 as
$\|w^0\|_0 \le \frac{M}{n} \|x^0\|_0.$
But, according to Theorem 7 in [Bruckstein et al., 2009], if
(11)  $\|w^0\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right),$
then, on the one hand, $w^0$ is the sparsest solution to $A w = b$, and on the other hand, it is also the unique solution to (5). Thus, if (11) holds, $w^0$ is the unique solution to both (4) and (5), i.e., $\hat{w} = w^0$. Since $x^0$ is given by the first $n$ components of $w^0$ and $\hat{x}$ by the ones of $\hat{w}$, this completes the proof.
∎
Other less conservative conditions can be similarly obtained by considering the exact value of $\|w^0\|_0$ or tighter bounds (see Appendix B), but these do not take the form of a simple inequality on $\|x^0\|_0$. Another condition for very sparse cases can be similarly obtained by using Proposition 2 in Appendix A instead of Proposition 1.
2.3.2 $\ell_1/\ell_2$-minimization method
The first result below shows that the group-sparse problem (8) can be used as a proxy for the polynomial problem (1)–(2).
Theorem 2.
Proof.
Assume there is a solution $x$ to the polynomial system (2) that is at least as sparse as $\hat{x}$. Then
and
which contradicts the fact that $\hat{w}$ is the unique solution to (8), unless $x = \hat{x}$ and $\phi(x) = \hat{w}$.
∎
The next result provides a condition on the sparsity of the solution to (1)–(2) under which it can be recovered by solving the convex problem (9) or its variant with nonnegativity constraints (10). This result requires the following lemma.
Lemma 2.
Let $A$ be an $N$-by-$M$ matrix with mutual coherence $\mu(A)$ as defined in Definition 1. Let $D$ be the diagonal matrix of entries $D_{mm} = \|a_m\|_2$. Then, for all $u \in \mathbb{R}^M$, the bound
(12)  $\|A u\|_2^2 \ \ge\ (1 + \mu(A)) \|D u\|_2^2 - \mu(A) \|D u\|_1^2$
holds.
Proof.
For all , we have , which further implies and
Then, we have
Note that $D^{-1} A^T A D^{-1}$ is a normalized matrix with ones on the diagonal and off-diagonal entries equal to the normalized inner products of the columns of $A$, bounded in magnitude by $\mu(A)$ via Definition 1. Thus, we obtain
and, by adding to both sides, the bound
which is precisely (12).
∎
Theorem 3.
Proof.
The vector is the unique solution to (9) if the inequality
holds for all satisfying . The inequality above can be rewritten as
where . By the triangle inequality, , this condition is met if
or
(13) 
By defining as the set of indexes corresponding to nonzero columns of , Lemma 2 yields
where is the number of components of associated to a variable . Due to the fact that , we also have
which then leads to
Introducing this result in (13) gives the condition
Finally, given that , this yields
(14) 
or, after dividing by and rearranging the terms,
which can be rewritten as in the statement of the Theorem.
It remains to prove that is also the unique solution to (10) in the case where this condition is satisfied and is the unique solution to (9). To see this, note that is by definition a feasible point of (10). In addition, the feasible set of (10) is included in the one of (9), while the problems (9) and (10) share the same cost function. Thus, if is the unique solution to (9), there cannot be another feasible with lower or equal cost function value for (10).
∎
Corollary 1.
Proof.
Assume there exists another solution to (1)–(2), and thus with . Then, Theorem 3 implies that both and are unique solutions to either (9) or (10), and thus that (where is similarly defined as the solution to either (9) or (10) in this case). But this contradicts the definition of the mapping implying whenever . Therefore, the assumption cannot hold and is the unique solution to (1)–(2), while .
∎
Theorem 4.
Proof.
By following steps similar to those in the proof of Theorem 3 with instead of , we obtain (by replacing by in (14)) that is the unique solution to (9). Then, let be a solution to (8). This implies and, under the condition of the Theorem,
(15) 
By following similar steps again, we obtain that is also the unique solution to (9) and thus that is the unique solution to (8).
Finally, since is the unique solution to (8), if satisfies the polynomial equations (2), Theorem 2 implies that is the solution to (1)–(2).
∎
2.4 Enhancing sparsity
In practice, convex relaxations such as the ones described above provide a good step towards the solution but might fail to yield the exact solution with sufficient sparsity. In such cases, it is common practice to improve the sparsity of the solution by repeating the procedure with a well-chosen weighting of the variables, as described in [Candès et al., 2008, Le et al., 2013]. These techniques can be directly applied to improve the $\ell_1$-minimization method of Sect. 2.1, while they are adapted below to group-sparsity as considered in Sect. 2.2.
2.4.1 Iterative reweighting
The classical reweighting scheme of [Candès et al., 2008] for sparse recovery improves the sparsity of the solution by solving a sequence of linear programs. It can be adapted to the group-sparse recovery problem by iteratively solving
(16)  $\min_{w \in \mathbb{R}^M} \sum_{j=1}^n r_j \|W_j P_j w\|_2$
s.t.  $A w = b,$
with weights $r_j$ initially set to 1 and refined at each iteration by
$r_j = \frac{1}{\|W_j P_j \hat{w}\|_2 + \epsilon},$
for a given small value of $\epsilon > 0$.
The basic idea is to decrease the influence of groups of variables with large norms that are assumed to be nonzero in the solution while increasing the weight of groups with small norms in order to force them towards zero.
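A sketch of one such weight update; the form $1/(\text{norm} + \epsilon)$ is our assumption, consistent with the classical reweighted scheme of [Candès et al., 2008] applied to group norms.

```python
import numpy as np

def reweight(w_hat, group_indexes, eps=1e-3):
    """One reweighting step: groups with a small norm in the current
    iterate get a large weight, pushing them towards zero at the next
    iteration, while large-norm groups keep a small weight."""
    w_hat = np.asarray(w_hat, dtype=float)
    return np.array([1.0 / (np.linalg.norm(w_hat[g]) + eps)
                     for g in group_indexes])
```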
2.4.2 Selective $\ell_1/\ell_2$-minimization
The selective $\ell_1$-minimization (S$\ell_1$M) algorithm proposed in [Le et al., 2013] is another reweighted minimization mechanism, which sets a single weight to zero at each iteration. Though typically requiring more computation time than the previous approach due to a number of iterations equal to the number of nonzero elements, this algorithm can recover sparse solutions in cases where the classical reweighting scheme of [Candès et al., 2008] fails. Other advantages include the absence of a tuning parameter and the presence of a convergence analysis [Le et al., 2013]. The algorithm below is an adaptation of S$\ell_1$M to group-sparse problems.
1. Initialize all weights: $r_j = 1$, $j = 1, \dots, n$.
2. Solve the weighted group-sparse problem (16).
3. Find $j^* = \arg\max_{j : r_j \neq 0} \|W_j P_j \hat{w}\|_2$.
4. Set $r_{j^*} = 0$ (to relax the sparsity constraint on the $j^*$th group).
5. Repeat from Step 2 until the optimal value of (16) is zero.
Note that in this algorithm, the number of iterations is equal to the number of nonzero groups³, which, for polynomial basis pursuit, equals the number of nonzero base variables and is typically small. This results in a fast and accurate method for polynomial basis pursuit, as will be seen in Sect. 6.
³The maximal number of iterations is the number of groups $n$, but if the correct sparsity pattern is recovered then the algorithm stops earlier.
3 Greedy approach
As mentioned in the introduction, there are two major techniques to minimize an $\ell_0$ pseudo-norm: the BP approach and the greedy approach. We now consider the second one to solve problem (8), and more particularly develop two variants of the greedy approach: an exact method and an approximate method. The exact method is intended for small-scale problems where the number of possible combinations of base variables remains small. The approximate method is designed to circumvent this limitation and applies to much larger problems.
Exact greedy algorithm.
The exact method is implemented as follows, where we let the error tolerance $\epsilon = 0$ if the polynomial system of equations is assumed to be feasible, and $\epsilon > 0$ otherwise.
1. Initialize the sparsity level: $k = 0$.
2. Increment $k \leftarrow k + 1$.
3. For all combinations $S$ of $k$ variables among $n$:
(a) Set $\mathcal{I} = \bigcup_{j \in S} \mathcal{G}_j$, where the index sets $\mathcal{G}_j$ are defined as in (6).
(b) Build the submatrix $A_{\mathcal{I}}$ with the columns of $A$ with index in $\mathcal{I}$.
(c) Solve the least-squares problem
$\hat{\eta} = \arg\min_{\eta} \| A_{\mathcal{I}} \eta - b \|_2^2.$
(d) Update the error $e = \| A_{\mathcal{I}} \hat{\eta} - b \|_2$.
(e) If $e \le \epsilon$, compute $\hat{w}$ by setting its components of index in $\mathcal{I}$ to the values in $\hat{\eta}$ and the others to 0. Return $\hat{w}$.
4. If $k < n$, repeat from Step 2, otherwise return an infeasibility certificate.
In Step 3.(a), $\mathcal{I}$ corresponds to the support of $\phi(x)$ when $x$ has support $S$, where $S$ is a combination of $k$ indexes from 1 to $n$. The maximal number of least-squares (LS) problems to solve in Step 3.(c) is $\sum_{k=1}^{n} \binom{n}{k} = 2^n - 1$. But many of these are spared by starting with the sparsest combinations and stopping as soon as a solution is found. Thus, if a sparse solution $x$ exists, only $\sum_{k=1}^{\|x\|_0} \binom{n}{k}$ LS problems are solved. At iteration $k$, the LS problem is of size $N$-by-$|\mathcal{I}|$, with $|\mathcal{I}|$ growing combinatorially with $k$ and $d$. Thus, assuming a polynomial complexity for an LS problem of a given size, the overall complexity of the algorithm scales exponentially with $n$, $\|x\|_0$ and $d$.
With $\epsilon = 0$, the exact greedy algorithm above can be slightly modified to compute all the solutions to (8), simply by letting the for loop in Step 3 run to completion instead of returning as soon as a solution is found. As a result, the algorithm could provide a uniqueness certificate for the solution of (8), from which Theorem 2 could be applied to conclude that the solution coincides with the unique minimizer of (1)–(2).
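The exact greedy search can be sketched as below: supports are enumerated by increasing size, and for each candidate support a least-squares problem on the corresponding columns of $A$ is solved until the residual falls below the tolerance. The toy data and helper names are ours; note that, as in the $\ell_1$ method, the returned $w$ solves the lifted linear system but its monomial structure is not enforced.

```python
import itertools
import numpy as np

def multi_indexes(n, d):
    """Multi-indexes of degree 1..d, the n linear ones first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def exact_greedy(A, b, alphas, n, tol=1e-8):
    """Enumerate variable supports S by increasing size; return the first
    one whose monomial columns fit b up to tol."""
    for k in range(1, n + 1):
        for S in itertools.combinations(range(n), k):
            # columns whose monomials involve only variables in S
            I = [m for m, a in enumerate(alphas)
                 if all(j in S for j in range(n) if a[j] > 0)]
            eta, *_ = np.linalg.lstsq(A[:, I], b, rcond=None)
            if np.linalg.norm(A[:, I] @ eta - b) <= tol:
                w = np.zeros(A.shape[1])
                w[I] = eta
                return set(S), w
    return None, None        # infeasibility certificate

# toy system with n = 2, d = 2 and a sparse solution supported on x_1
alphas = multi_indexes(2, 2)             # x1, x2, x1^2, x1*x2, x2^2
A = np.array([[1., 0., 1., 0., 0.],
              [0., 1., 0., 1., 0.],
              [1., 1., 1., 1., 1.]])
b = A @ np.array([3., 0., 9., 0., 0.])   # = A @ phi((3, 0))
support, w = exact_greedy(A, b, alphas, 2)
```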
Approximate greedy algorithm.
The approximate method is similar except that it explores only a single branch of the tree of possible combinations. Its implementation uses a set $S$ of retained variables (more precisely, $S$ contains the indexes of these variables):
1. Initialize the set of nonzero variables: $S = \emptyset$.
2. For all $j \in \{1, \dots, n\} \setminus S$:
(a) Set $\mathcal{I}_j = \bigcup_{l \in S \cup \{j\}} \mathcal{G}_l$, where the index sets $\mathcal{G}_l$ are defined as in (6).
(b) Build the submatrix $A_{\mathcal{I}_j}$ with the columns of $A$ with index in $\mathcal{I}_j$.
(c) Solve the least-squares problem
$\hat{\eta}_j = \arg\min_{\eta} \| A_{\mathcal{I}_j} \eta - b \|_2^2.$
3. Select the variable that minimizes the error if added to $S$:
$j^* = \arg\min_{j \notin S} \| A_{\mathcal{I}_j} \hat{\eta}_j - b \|_2.$
4. Update $S \leftarrow S \cup \{j^*\}$.
5. Repeat from Step 2 until $\| A_{\mathcal{I}_{j^*}} \hat{\eta}_{j^*} - b \|_2 \le \epsilon$.
6. Compute $\hat{w}$ by setting its components of index in $\mathcal{I}_{j^*}$ to the values in $\hat{\eta}_{j^*}$ and the others to 0.
7. Return $\hat{w}$ and the error $\| A \hat{w} - b \|_2$.
The algorithm starts with an empty set of nonzero variables and adds a single variable to that set at each iteration. The variable retained at a given iteration is the one that, if added, leads to the minimum sum of squared errors for the equations $A w = b$. In Step 2.(a), $\mathcal{I}_j$ corresponds to the support of $\phi(x)$ when $x$ has support $S \cup \{j\}$. Note that the value of the minimizer $\hat{\eta}_{j^*}$ is not retained but re-estimated at the next iteration. The reason for this is that there is no guarantee that the components of $\hat{\eta}_{j^*}$ correspond to monomials of base variables.
Since one base variable is added at each iteration, the number of iterations equals the sparsity $\|\hat{x}\|_0$ of the returned solution. In this approximate variant of the algorithm, each iteration requires solving at most $n$ LS problems, which yields a total number of LS problems bounded by $n \|\hat{x}\|_0$. Then, bounding the complexity of each LS problem as for the exact greedy algorithm leads to an overall complexity that grows only linearly with $n$ for a fixed sparsity level.
Since we do not have a result on the convergence of the algorithm to the sparsest solution, we cannot bound $\|\hat{x}\|_0$ except by $n$, and the worst-case complexity is exponential in $n$. However, in practice, if the algorithm finds a sparse solution $\hat{x}$, the complexity is exponential in its sparsity $\|\hat{x}\|_0$ but only linear in $n$. This is rather promising given the empirical results to be shown in Sect. 6, which suggest that this occurs with high probability.
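The approximate greedy variant can be sketched similarly: starting from an empty support, it adds at each iteration the single variable whose inclusion yields the smallest least-squares residual (toy data and helper names are ours).

```python
import itertools
import numpy as np

def multi_indexes(n, d):
    """Multi-indexes of degree 1..d, the n linear ones first."""
    out = []
    for deg in range(1, d + 1):
        for combo in itertools.combinations_with_replacement(range(n), deg):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            out.append(tuple(alpha))
    return out

def approx_greedy(A, b, alphas, n, tol=1e-8):
    """Add one variable per iteration, keeping the candidate with the
    smallest residual; stop once the residual falls below tol."""
    S = set()
    while len(S) < n:
        best = None
        for j in sorted(set(range(n)) - S):
            # columns whose monomials involve only variables in S + {j}
            I = [m for m, a in enumerate(alphas)
                 if all(l in S | {j} for l in range(n) if a[l] > 0)]
            eta, *_ = np.linalg.lstsq(A[:, I], b, rcond=None)
            err = np.linalg.norm(A[:, I] @ eta - b)
            if best is None or err < best[0]:
                best = (err, j, I, eta)
        err, j, I, eta = best
        S.add(j)
        if err <= tol:
            w = np.zeros(A.shape[1])
            w[I] = eta
            return S, w
    return S, None

# same toy system as for the exact variant: sparse solution on x_1
alphas = multi_indexes(2, 2)
A = np.array([[1., 0., 1., 0., 0.],
              [0., 1., 0., 1., 0.],
              [1., 1., 1., 1., 1.]])
b = A @ np.array([3., 0., 9., 0., 0.])
support, w = approx_greedy(A, b, alphas, 2)
```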
4 Purely nonlinear polynomials
We now consider the case of purely nonlinear polynomials without a linear part for some variables, i.e., with $a_{i,\alpha} = 0$, $i = 1, \dots, N$, for some $\alpha$ such that $|\alpha| = 1$ (according to the ordering of the multi-indexes, the linear monomials correspond to the first $n$ coefficients with