# Finding sparse solutions of systems of polynomial equations via group-sparsity optimization

The paper deals with the problem of finding sparse solutions to systems of polynomial equations possibly perturbed by noise. In particular, we show how these solutions can be recovered from group-sparse solutions of a derived system of linear equations. Then, two approaches are considered to find these group-sparse solutions. The first one is based on a convex relaxation resulting in a second-order cone programming formulation which can benefit from efficient reweighting techniques for sparsity enhancement. For this approach, sufficient conditions for the exact recovery of the sparsest solution to the polynomial system are derived in the noiseless setting, while stable recovery results are obtained for the noisy case. Though lacking a similar analysis, the second approach provides a more computationally efficient algorithm based on a greedy strategy adding the groups one-by-one. With respect to previous work, the proposed methods recover the sparsest solution in a very short computing time while remaining at least as accurate in terms of the probability of success. This probability is empirically analyzed to emphasize the relationship between the ability of the methods to solve the polynomial system and the sparsity of the solution.


## 1 Introduction

When faced with an underdetermined system of equations, one typically applies a regularization strategy in order to recover well-posedness. The choice of regularization depends on the particular application at hand and should be made to drive the solution towards desired properties. In the absence of precise goals, the most popular choice favors solutions with minimum $\ell_2$-norm. However, an alternative becoming more and more popular is to search for sparse solutions (which often have non-minimal $\ell_2$-norm). In the case of a system of linear equations, this alternative has been investigated in numerous works, see, e.g., [Bruckstein et al., 2009] for a review, and entails many applications of great importance, particularly for signal processing where it goes under the name of compressed sensing/sampling [Donoho, 2006, Candès, 2006]. Formally, finding the sparsest solutions of linear systems can be written as the minimization, under the constraints of the linear system, of the number of nonzero variables, which is a nonsmooth, nonconvex and NP-hard problem. Two main approaches can be distinguished to tackle such problems. The first one, known as Basis Pursuit (BP), relies on a convex relaxation based on the minimization of an $\ell_1$-norm, while the second one applies a greedy strategy to add nonzero variables one-by-one. Many results on the convergence of these methods to the sparsest solution are available in the literature [Bruckstein et al., 2009, Eldar and Kutyniok, 2012, Foucart and Rauhut, 2013].

More recently, a few works introduced extensions of this problem to nonlinear equations. In particular, the first greedy approaches appeared in [Blumensath and Davies, 2008, Blumensath, 2013, Beck and Eldar, 2013, Ehler et al., 2014], while BP algorithms were developed in [Ohlsson et al., 2013b] to find sparse solutions of systems of quadratic equations and in [Ohlsson et al., 2013a] for more general nonlinear equations. Formally, these problems can be formulated as

$$\min_{x\in\mathbb{R}^n} \ \|x\|_0 \quad \text{s.t.}\quad y_i = f_i(x),\quad i=1,\dots,N, \tag{1}$$

where $f_i:\mathbb{R}^n\to\mathbb{R}$, $i=1,\dots,N$, are nonlinear functions and $\|x\|_0$ denotes the $\ell_0$-pseudo-norm of $x$, i.e., the number of nonzero components of $x$.

Here, we focus on the case where the $f_i$'s in (1) are polynomial functions of maximal degree $d$:

$$y_i = f_i(x) = p_i^d(x) = b_i + \sum_{k=1}^{M} a_{ik}\, x^{\alpha_k},\quad i=1,\dots,N, \tag{2}$$

where $\{\alpha_k\}_{k=1}^M \subset \mathbb{N}^n$ is the set of multi-indexes with $1 \le |\alpha_k| \le d$ and $x^{\alpha_k} = \prod_{j=1}^n x_j^{(\alpha_k)_j}$ are the corresponding monomials. This includes the particular case considered in [Ohlsson et al., 2013b] with $d=2$, while in [Ohlsson et al., 2013a] this setting is used to deal with the more general case via the Taylor expansions of the nonlinear functions $f_i$. Note that the formulation in (1)–(2) also entails cases outside of the quasi-linear setting considered in [Ehler et al., 2014].
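To make the notation concrete, the following sketch (in Python, with illustrative names of our own) enumerates the multi-indexes $\alpha_k$ with $1\le|\alpha_k|\le d$ and evaluates a polynomial of the form (2):

```python
from itertools import combinations_with_replacement

def multi_indexes(n, d):
    """Enumerate all multi-indexes alpha in N^n with 1 <= |alpha| <= d,
    listing the linear monomials (|alpha| = 1) first."""
    alphas = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            alpha = [0] * n
            for j in combo:
                alpha[j] += 1
            alphas.append(tuple(alpha))
    return alphas

def poly_eval(a_i, b_i, alphas, x):
    """Evaluate f_i(x) = b_i + sum_k a_ik x^alpha_k as in (2)."""
    val = b_i
    for a_ik, alpha in zip(a_i, alphas):
        mono = 1.0
        for xj, p in zip(x, alpha):
            mono *= xj ** p
        val += a_ik * mono
    return val

# n = 2 variables, degree d = 2: monomials x1, x2, x1^2, x1*x2, x2^2, so M = 5
alphas = multi_indexes(2, 2)
print(len(alphas))  # 5
```

With this ordering, $M = \sum_{q=1}^d \binom{n+q-1}{q}$, e.g., $M=5$ for $n=d=2$.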

The present paper proposes a new BP approach to solve (1)–(2), which, in comparison with the previous works [Ohlsson et al., 2013b, Ohlsson et al., 2013a], is simpler and amounts to solving an easier optimization problem. More precisely, the method proposed in Sect. 2.1 relies on a simple change of variables sufficient to result in an efficient algorithm implemented as a classical $\ell_1$-minimization problem. However, the structure of the polynomials is discarded and the solution may not satisfy the original polynomial constraints. Note that this change of variables is also found in [Ohlsson et al., 2013a], though with constraints handled in a more complex manner, and closely related to the lifting technique of [Ohlsson et al., 2013b] for quadratic BP and the one of [Vidal et al., 2005] for subspace clustering.

The method in [Ohlsson et al., 2013b] enforces the structure of the quadratic polynomials via rank constraints, which leads to optimization problems with additional levels of relaxation and the introduction of a parameter tuning the trade-off between the minimization of the $\ell_1$-norm on the one hand and the satisfaction of the rank constraint on the other hand. In [Ohlsson et al., 2013a], the structure of higher-degree polynomials is enforced by a set of quadratic constraints on the monomials, tying each higher-degree monomial variable to products of lower-degree ones. But note that since these constraints are relaxed in the final formulation, the estimation can yield a zero estimate for a low-degree monomial, which then recursively implies that all monomial constraints involving it are meaningless. These quadratic constraints are then relaxed as in [Ohlsson et al., 2013b].

Instead, the method proposed in Sect. 2.2 implements the structural knowledge via group-sparsity. This results in an $\ell_1/\ell_2$-minimization, i.e., a second-order cone program (SOCP) which can be solved more efficiently. In addition, this SOCP formulation is easily extended in Sect. 2.4 to benefit from reweighting techniques for improving the sparsity of the solution.

The conditions for exact recovery of the proposed BP methods are analyzed in Sect. 2.3. In particular, we show that though the simple $\ell_1$-minimization does not include structural constraints, exact recovery occurs for sufficiently sparse cases. A similar condition is proved for the $\ell_1/\ell_2$-minimization based on group-sparsity.

The greedy approach is discussed in Sect. 3, where two variants are proposed: an exact algorithm for solving the group-sparsity optimization problem in small-scale cases and an approximate one which remains efficient in higher-dimensional settings. Previous greedy approaches [Blumensath, 2013, Beck and Eldar, 2013] considered the problem in its sparsity-constrained form, where the sum of squared errors over the equations (2) is minimized subject to $\|x\|_0 \le s$ for an a priori fixed $s$. The iterative hard thresholding of [Blumensath, 2013] is a gradient descent algorithm with an additional projection onto the feasible set at each iteration via a simple thresholding operation. In [Beck and Eldar, 2013], this is interpreted as a fixed point iteration enforcing a necessary (but not sufficient) optimality condition, which requires a well-chosen step-size to converge satisfactorily. In particular, the step-size must be an upper bound on the Lipschitz constant of the gradient of the objective function, which however is not (globally) Lipschitz continuous for polynomials in (2) with degree $d>1$. The sparse-simplex algorithm of [Beck and Eldar, 2013] is a coordinate descent method which enjoys similar but less restrictive convergence properties while being parameter-free. However, for polynomial equations as in (2), each iteration requires solving several one-dimensional minimizations of a polynomial of degree $2d$, which becomes difficult for $d>2$. On the contrary, the proposed greedy algorithms remain simple thanks to the group-sparse and linearized formulation of the problem: each iteration requires only solving least-squares problems.

Extensions of the methods are discussed in Sections 4 and 5. In particular, Section 4 deals with the issue of purely nonlinear polynomials, for which the solution to (1)–(2) cannot be estimated as directly as for polynomials involving linear monomials, but which arise in important applications such as phase retrieval [Kohler and Mandel, 1973, Gonsalves, 1976, Gerchberg and Saxton, 1972, Fienup, 1982, Marchesini, 2007, Candès et al., 2013b, Candès et al., 2013a]. Then, the case where the equations (2) are perturbed by noise is considered in Sect. 5. In particular, the analysis provides stable recovery results for polynomial BP denoising in the flavor of the one obtained in [Donoho et al., 2006] for linear BP denoising.

Finally, numerical experiments in Sect. 6 show the efficiency of the proposed methods and extensions. Results are in line with the ones found in classical sparse optimization with linear constraints. In particular, all methods can recover the sparsest solution in sufficiently sparse cases and the greedy approach is the fastest while the BP methods based on convex relaxations benefit from a slightly higher probability of success.

## 2 Polynomial basis pursuit

This section develops two basis pursuit approaches to find sparse solutions of systems of polynomial equations via the minimization of an $\ell_1$-norm for the first one and of a mixed $\ell_1/\ell_2$-norm for the second one. The methods are developed under the following assumption, which will be relaxed in Sect. 4.

###### Assumption 1.

For all $j\in\{1,\dots,n\}$, there exists $i\in\{1,\dots,N\}$ such that $a_{ik}\neq 0$ for the index $k$ determined such that $(\alpha_k)_j = 1$ and $(\alpha_k)_l = 0$ for $l\neq j$.

In particular, Assumption 1 ensures that the polynomials include a linear part, or more precisely, that for any variable $x_j$, $j=1,\dots,n$, the linear monomial $x_j$ has a nonzero coefficient in at least one of the polynomials.

### 2.1 Polynomial basis pursuit via ℓ1-minimization

We start by rewriting the constraints in (1)–(2) as

$$y_i = f_i(x) = b_i + \sum_{k=1}^{M} a_{ik}\, x^{\alpha_k} = a_i^T\phi(x) + b_i, \tag{3}$$

where $a_i = [a_{i1},\dots,a_{iM}]^T$ and $\phi:\mathbb{R}^n\to\mathbb{R}^M$ is a mapping that computes all the monomials of degree $q$, $1\le q\le d$, with $n$ variables. Note that $x$ is embedded in $\phi(x)$ as the components corresponding to the multi-indexes $\alpha_k$, $k=1,\dots,n$, such that $(\alpha_k)_j = 1$ for $j=k$ and $(\alpha_k)_j = 0$ for $j\neq k$. Assuming that these are the first $n$ components of $\phi(x)$, we also define the (linear) inverse mapping $\phi^{-1}$, such that $\phi^{-1}(\phi(x)) = x$, as

$$\phi^{-1}(\phi) = \begin{bmatrix} I_n & 0 \end{bmatrix}\phi,$$

where $I_n$ is the $n$-by-$n$ identity matrix.
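As a sanity check of the embedding, a small sketch of $\phi$ and its linear inverse $[I_n\ 0]$ (assuming numpy; the function names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def multi_indexes(n, d):
    # linear monomials first, matching the embedding of x in phi(x)
    alphas = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            a = [0] * n
            for j in combo:
                a[j] += 1
            alphas.append(tuple(a))
    return alphas

def lift(x, alphas):
    """phi(x): the vector of all monomials x^alpha_k, k = 1, ..., M."""
    x = np.asarray(x, dtype=float)
    return np.array([np.prod(x ** np.array(a)) for a in alphas])

def unlift(phi_vec, n):
    """phi^{-1}(phi) = [I_n 0] phi: keep the first n (linear) components."""
    return phi_vec[:n]

n, d = 3, 2
alphas = multi_indexes(n, d)
x = np.array([1.0, 0.0, -2.0])
assert np.allclose(unlift(lift(x, alphas), n), x)
```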

The high-dimensional lifting by $\phi$ allows us to recast (1)–(2) as a standard (i.e., linear) problem in sparse optimization:

$$\min_{\phi\in\mathbb{R}^M} \ \|\phi\|_0 \quad \text{s.t.}\quad A\phi = y - b, \tag{4}$$

where $A = [a_1,\dots,a_N]^T$, $y = [y_1,\dots,y_N]^T$, $b = [b_1,\dots,b_N]^T$, and $\phi(x)$ is replaced by an unstructured vector $\phi\in\mathbb{R}^M$. This constitutes the first level of relaxation in the proposed approach, where the components of $\phi$ are not constrained to be interdependent monomials. While this yields a rather crude approximation, it will serve as the basis for the refined approach of Sect. 2.2, where additional structure will be imposed on $\phi$.

The second level of relaxation comes from the BP approach, in which problems such as (4) are typically solved via the convex relaxation

$$\hat{\phi} = \arg\min_{\phi\in\mathbb{R}^M} \ \|W\phi\|_1 \quad \text{s.t.}\quad A\phi = y - b, \tag{5}$$

where $W = \operatorname{diag}(w_1,\dots,w_M)$, with $w_k = \|A_k\|_2$ and $A_k$ the $k$th column of $A$, is a diagonal matrix of precompensating weights. Then, for polynomial BP, an estimate of the solution to (1)–(2) can be easily computed under Assumption 1 as $\hat{x} = \phi^{-1}(\hat{\phi})$.

Problem (5) can be formulated as a linear program and solved by generic interior-point algorithms in polynomial time, while it can also be solved by more efficient and dedicated algorithms, see, e.g., [Foucart and Rauhut, 2013, Chapter 15]. However, the complexity with respect to the number of variables $n$ remains high because of the dimension $M$, which scales exponentially with the degree $d$.
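For illustration, the LP reformulation of (5) introduces auxiliary variables $t_k \ge |\phi_k|$ and minimizes $\sum_k w_k t_k$. The following sketch (assuming numpy and scipy are available; the function name and test data are ours, not the authors' implementation) solves it with a generic LP solver:

```python
import numpy as np
from scipy.optimize import linprog

def l1_basis_pursuit(A, c, w):
    """Solve min ||W phi||_1 s.t. A phi = c through the standard LP
    reformulation with auxiliary variables t_k >= |phi_k| (a sketch of
    (5), not the authors' implementation)."""
    N, M = A.shape
    cost = np.concatenate([np.zeros(M), w])      # minimize sum_k w_k t_k
    A_ub = np.block([[np.eye(M), -np.eye(M)],    # phi - t <= 0
                     [-np.eye(M), -np.eye(M)]])  # -phi - t <= 0
    b_ub = np.zeros(2 * M)
    A_eq = np.hstack([A, np.zeros((N, M))])      # equality constraints A phi = c
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=c,
                  bounds=[(None, None)] * M + [(0, None)] * M)
    return res.x[:M]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 12))
phi0 = np.zeros(12)
phi0[3] = 2.0                        # a 1-sparse feasible point
c = A @ phi0
w = np.linalg.norm(A, axis=0)        # precompensating weights w_k = ||A_k||_2
phi_hat = l1_basis_pursuit(A, c, w)
assert np.allclose(A @ phi_hat, c, atol=1e-6)
```

By optimality, the returned $\hat{\phi}$ satisfies the equality constraints and has a weighted $\ell_1$-norm no larger than that of any feasible point.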

The literature on basis pursuit and compressed sensing [Bruckstein et al., 2009, Foucart and Rauhut, 2013] tells us that the sparser the solution to (4) is, the more likely the convex relaxation (5) is to yield its recovery. Here, by construction we know that there is at least a very sparse vector satisfying the constraints: with $x^0$ the sparse solution to (1)–(2), the vector $\phi(x^0)$ has a relative sparsity level at least as good as that of $x^0$, as stated by Proposition 1 below. Note that a better bound implying an increased level of sparsity depending on $d$ for very sparse cases is derived in Appendix A.

###### Proposition 1.

Let the mapping $\phi$ be defined as above. Then, the vector $\phi(x)$ is at least as sparse as the vector $x$ in the sense that the inequality

$$\frac{\|\phi(x)\|_0}{M} \le \frac{\|x\|_0}{n}$$

holds for all $x\in\mathbb{R}^n$.

Proposition 1 implies that if (1)–(2) has a sparse solution then so does (4). To prove Proposition 1, we first need the following lemma.

###### Lemma 1.

For all triplets $(a,b,c)$ of positive integers such that $a \ge b$, the inequality

$$\frac{1}{a}\binom{a+c-1}{c} \ge \frac{1}{b}\binom{b+c-1}{c}$$

holds.

###### Proof.

On the one hand, we have

$$\frac{1}{a}\binom{a+c-1}{c} = \frac{(a+c-1)!}{a\,c!\,(a-1)!} = \frac{1}{a\,c!}\prod_{i=0}^{c-1}(a+i) = \frac{1}{c!}\prod_{i=1}^{c-1}(a+i)$$

and a similar expression with $b$ instead of $a$. On the other hand, with $a \ge b$, we have

$$\forall i\in\mathbb{N},\quad a+i \ge b+i \ \Rightarrow\ \frac{1}{c!}\prod_{i=1}^{c-1}(a+i) \ge \frac{1}{c!}\prod_{i=1}^{c-1}(b+i),$$

which then yields the sought statement.

We now give the proof of Proposition 1.

###### Proof.

By construction, the number of nonzeros in $\phi(x)$ is equal to the sum over $q$, $1\le q\le d$, of the number of monomials of degree $q$ in $\|x\|_0$ variables. This yields

$$\frac{\|\phi(x)\|_0}{M} = \frac{\sum_{q=1}^{d}\binom{\|x\|_0+q-1}{q}}{\sum_{q=1}^{d}\binom{n+q-1}{q}} = \frac{\|x\|_0}{n}\cdot\frac{\frac{1}{\|x\|_0}\sum_{q=1}^{d}\binom{\|x\|_0+q-1}{q}}{\frac{1}{n}\sum_{q=1}^{d}\binom{n+q-1}{q}}.$$

Since $n \ge \|x\|_0$, Lemma 1 gives the bound

$$\frac{1}{\|x\|_0}\sum_{q=1}^{d}\binom{\|x\|_0+q-1}{q} \le \frac{1}{n}\sum_{q=1}^{d}\binom{n+q-1}{q},$$

from which the statement follows.
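Proposition 1 can also be checked numerically by counting, for a generic $x$ with a given support, the monomials that survive (a small pure-Python check; the function name is ours):

```python
from itertools import combinations_with_replacement

def count_monomials(support, n, d):
    """Number of multi-indexes with 1 <= |alpha| <= d supported inside
    the given variable support; equals ||phi(x)||_0 for generic x with
    that support."""
    support = set(support)
    count = 0
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            if set(combo) <= support:
                count += 1
    return count

n, d = 5, 3
M = count_monomials(range(n), n, d)  # here M = 55
for s in range(1, n + 1):
    # ||phi(x)||_0 / M  <=  ||x||_0 / n  (Proposition 1)
    assert count_monomials(range(s), n, d) / M <= s / n + 1e-12
```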

### 2.2 Polynomial basis pursuit via ℓ1/ℓ2-minimization (group sparsity)

The approach proposed above relies on a rather crude approximation. Thus, the polynomial equations are not guaranteed to be satisfied by the solution $\hat{x} = \phi^{-1}(\hat{\phi})$, due to the factorization of the polynomial that may not hold for $\hat{\phi}$. In other words, the linearization in (3) together with the direct optimization over $\phi$ discard the desired structure for $\hat{\phi}$. In practice, the solution can be checked a posteriori by testing whether $y_i = f_i(\hat{x})$, $i=1,\dots,N$. But in order to increase the probability of obtaining a satisfactory solution, i.e., one which satisfies the original polynomial constraints, we must embed the structure of the polynomial in the problem formulation.

Let us define the index sets

$$\mathcal{I}_j = \{k\in\{1,\dots,M\} : (\alpha_k)_j \neq 0\},\quad j=1,\dots,n. \tag{6}$$

Then, the structural information we aim at embedding is given by the following implication:

$$\forall j\in\{1,\dots,n\},\ \forall k\in\mathcal{I}_j,\quad \begin{cases} x_j = 0 &\Rightarrow\ (\phi(x))_k = 0,\\ (\phi(x))_k \neq 0 &\Rightarrow\ x_j \neq 0,\end{cases} \tag{7}$$

which formalizes the fact that whenever a variable is zero, all monomials involving this variable must be zero. Such a structure can be favored via group-sparsity optimization, as detailed next.

Let us define the set of mappings $\phi_j:\mathbb{R}^n\to\mathbb{R}^{M_j}$, $j=1,\dots,n$, each computing the subset of components of $\phi(x)$ involving the variable $x_j$ (see Appendix C for the precise value of $M_j$). Note that these mappings are nonlinear in $x$ but linear in $\phi(x)$: $\phi_j(x) = B_j\phi(x)$, where $B_j$ is an $M_j$-by-$M$ binary matrix filled with zeros except for a 1 on each row, at the $k$th column for all $k\in\mathcal{I}_j$. Further define the $\ell_0$-pseudo-norm of a vector-valued sequence as the number of nonzero vectors in the sequence:

$$\left\|\{u_j\}_{j=1}^n\right\|_0 = \left|\left\{j\in\{1,\dots,n\} : u_j \neq 0\right\}\right|.$$

We call $\phi$ a group-sparse vector if $\|\{B_j\phi\}_{j=1}^n\|_0$ is small. By construction and due to (7), a sparse $x$ leads to a group-sparse $\phi(x)$ with

$$\left\|\{B_j\phi(x)\}_{j=1}^n\right\|_0 = \|x\|_0.$$

Therefore, in order to solve (1)–(2) we search for a group-sparse solution to the linear system $A\phi = y - b$.
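The index sets $\mathcal{I}_j$, the selection matrices $B_j$, and the identity above can be sketched as follows (assuming numpy; names are ours):

```python
import numpy as np
from itertools import combinations_with_replacement

def multi_indexes(n, d):
    alphas = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            a = [0] * n
            for j in combo:
                a[j] += 1
            alphas.append(tuple(a))
    return alphas

n, d = 3, 2
alphas = multi_indexes(n, d)
M = len(alphas)

# index sets I_j of the monomials involving x_j, as in (6)
I = [[k for k, a in enumerate(alphas) if a[j] != 0] for j in range(n)]

# binary selection matrices B_j with phi_j(x) = B_j phi(x)
B = [np.eye(M)[idx, :] for idx in I]

x = np.array([2.0, 0.0, -1.0])
phi_x = np.array([np.prod(x ** np.array(a)) for a in alphas])

# group-sparsity identity: nonzero groups match nonzero variables
n_nonzero_groups = sum(np.linalg.norm(Bj @ phi_x) > 0 for Bj in B)
assert n_nonzero_groups == np.count_nonzero(x)
```

Note that the groups overlap: a monomial such as $x_1x_2$ belongs to both $\mathcal{I}_1$ and $\mathcal{I}_2$.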

Such group-sparse solutions can be found by solving

$$\min_{\phi\in\mathbb{R}^M} \ \left\|\{W_j\phi\}_{j=1}^n\right\|_0 \quad \text{s.t.}\quad A\phi = y - b, \tag{8}$$

where we replaced each $B_j$ by a weight matrix $W_j = B_jW$, with $W$ the diagonal matrix of weights $w_k = \|A_k\|_2$ as in (5), without effect on the $\ell_0$-pseudo-norm. The reason for introducing these precompensating weights will become clear in the analysis of Sect. 2.3.

Let us define the $\ell_{p,q}$-norm of a vector-valued sequence as

$$\left\|\{u_j\}_{j=1}^n\right\|_{p,q} = \left(\sum_{j=1}^n \|u_j\|_q^p\right)^{\frac{1}{p}}.$$

Problem (8) can be relaxed by replacing the $\ell_0$-pseudo-norm with an $\ell_{p,q}$-norm. In this paper we only consider the case $p=1$ and $q=2$, i.e., the mixed $\ell_1/\ell_2$-norm, but other norms could be used, such as the $\ell_{1,\infty}$-norm. This yields the convex relaxation (note that since all the groups have the same number of variables, they need not be weighted by a function of the number of variables in each group)

$$\hat{\phi} = \arg\min_{\phi\in\mathbb{R}^M} \ \sum_{j=1}^n \|W_j\phi\|_2 \quad \text{s.t.}\quad A\phi = y - b, \tag{9}$$

which is easily reformulated as a second-order cone program (SOCP) that can be solved by generic software such as CVX [Grant and Boyd, 2013, Grant and Boyd, 2008] or MOSEK [Andersen and Andersen, 2000] (more efficient dedicated solvers can also be found, e.g., [Deng et al., 2011]). However, as for the $\ell_1$-minimization (5), the complexity remains high with respect to the number $n$ of base variables via the dimension $M$, which scales exponentially with the degree $d$.
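To give a feel for what a solver for (9) does without dedicated SOCP software, the following numpy sketch applies a majorization-minimization (iteratively reweighted minimum-norm) scheme to the group-norm objective under the equality constraints. This is an illustrative alternative of our own, not the CVX/MOSEK route cited above, and all names are ours:

```python
import numpy as np

def group_bp_irls(A, c, groups, n_iter=20, eps=1e-6):
    """Majorization-minimization sketch for (9): minimize
    sum_j ||phi_{G_j}||_2 s.t. A phi = c.  Each step solves a weighted
    minimum-norm problem in closed form.  Groups may overlap, as the
    monomial groups do.  (Illustrative, not an SOCP solver.)"""
    N, M = A.shape
    q = np.ones(M)  # quadratic surrogate weights
    phi = np.zeros(M)
    for _ in range(n_iter):
        Qinv = np.diag(1.0 / q)
        # phi = argmin sum_i q_i phi_i^2  s.t.  A phi = c
        phi = Qinv @ A.T @ np.linalg.solve(A @ Qinv @ A.T, c)
        # majorize each group norm around the current iterate
        q = np.zeros(M)
        for G in groups:
            q[G] += 1.0 / (2.0 * np.linalg.norm(phi[G]) + eps)
    return phi

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 12))
groups = [list(range(4 * j, 4 * j + 4)) for j in range(3)]
phi0 = np.zeros(12)
phi0[:4] = rng.standard_normal(4)  # a single nonzero group
c = A @ phi0
phi_hat = group_bp_irls(A, c, groups)
assert np.allclose(A @ phi_hat, c, atol=1e-6)
```

Each iterate is feasible by construction; the reweighting drives entire groups towards zero rather than individual components.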

All components of the mapping $\phi$ involving only even degrees of the base variables are nonnegative. These structural constraints can help to drive the solution towards one that correctly factorizes and corresponds to a solution of the polynomial equations. This is obtained by solving

$$\hat{\phi} = \arg\min_{\phi\in\mathbb{R}^M} \ \sum_{j=1}^n \|W_j\phi\|_2 \quad \text{s.t.}\quad A\phi = y - b,\quad \phi_k \ge 0 \ \ \forall k \text{ such that } (\alpha_k)_j \text{ is even for all } j. \tag{10}$$

This optimization problem has the same complexity as (9) since we simply added nonnegativity constraints to some variables.

Other forms of prior knowledge can be easily introduced in (10). For instance, if (tight) box constraints on the base variables are available, then lower and upper bounds on all monomials can be derived. Finally, note that these structural constraints can also be added to the $\ell_1$-minimization (5).

### 2.3 Analysis

The following derives conditions of exact recovery of the sparse solution to the system of polynomial equations via various convex relaxations. These conditions are based on the mutual coherence of the matrix $A$ as defined e.g. in [Donoho and Huo, 2001, Bruckstein et al., 2009].

###### Definition 1.

The mutual coherence of a matrix $A$ with columns $A_k$ is

$$\mu(A) = \max_{1\le i<j\le M} \frac{\left|A_i^T A_j\right|}{\|A_i\|_2\,\|A_j\|_2}.$$
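This quantity is straightforward to compute; a minimal numpy sketch (the function name is ours):

```python
import numpy as np

def mutual_coherence(A):
    """mu(A): maximal absolute inner product between distinct normalized
    columns of A (Definition 1); requires nonzero columns (Assumption 2)."""
    G = A / np.linalg.norm(A, axis=0)  # normalize each column
    C = np.abs(G.T @ G)
    np.fill_diagonal(C, 0.0)
    return float(C.max())

# orthogonal columns give mu = 0; duplicated columns give mu = 1
assert mutual_coherence(np.eye(3)) < 1e-12
assert abs(mutual_coherence(np.array([[1.0, 1.0, 0.0],
                                      [0.0, 0.0, 1.0]])) - 1.0) < 1e-12
```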

In order for the mutual coherence of $A$ to be defined, we focus on the case where the following assumption holds.

###### Assumption 2.

All columns of the matrix $A$ are nonzero, i.e., $\|A_k\|_2 > 0$, $k=1,\dots,M$, or, equivalently, for all $k\in\{1,\dots,M\}$ there is at least one $i\in\{1,\dots,N\}$ such that the corresponding polynomial coefficient $a_{ik} \neq 0$.

Assumption 2 is slightly more restrictive than Assumption 1, which only constrains the first $n$ columns of $A$. If Assumption 2 does not hold, the following analysis can be reproduced under Assumption 1 by considering the submatrix containing the nonzero columns of $A$ and a similarly truncated mapping $\phi$. Adjustments then need to be made where numbers of columns are used, i.e., by substituting the numbers of nonzero columns for $M$ and $M_j$. However, for the case where both Assumptions 1 and 2 do not hold, i.e., of zero columns corresponding to base variables, the estimate $\hat{x}$ cannot be obtained by the inverse mapping $\phi^{-1}$. This particular case will be discussed in Sect. 4.

Note that whenever a condition requires the mutual coherence to be defined, Assumption 2 implicitly holds. In such cases, Assumption 2 will not be explicitly stated in the theorems below.

#### 2.3.1 ℓ1-minimization method

The result below characterizes a case where the simple $\ell_1$-minimization method is sufficient to solve the sparse optimization problem (1)–(2).

###### Theorem 1.

Let $x^0$ denote the unique solution to (1)–(2). If the inequality

$$\|x^0\|_0 < \frac{n}{2M}\left(1 + \frac{1}{\mu(A)}\right) \tag{11}$$

holds, then the solution $\hat{\phi}$ to (5) is unique and equal to $\phi(x^0)$, thus providing $\hat{x} = \phi^{-1}(\hat{\phi}) = x^0$.

###### Proof.

Assume (1)–(2) has a unique solution . Then, has a solution with a sparsity bounded by Proposition 1 as

 ∥ϕ0∥0≤Mn∥x0∥0.

But, according to Theorem 7 in [Bruckstein et al., 2009], if

$$\|\phi^0\|_0 < \frac{1}{2}\left(1 + \frac{1}{\mu(A)}\right)$$

then, on the one hand, $\phi^0$ is the sparsest solution to $A\phi = y - b$, and on the other hand, it is also the unique solution to (5). Thus, if (11) holds, $\phi^0$ is the unique solution to both (4) and (5), i.e., $\hat{\phi} = \phi^0$. Since $x^0$ is given by the first $n$ components of $\phi^0$ and $\hat{x}$ by the ones of $\hat{\phi}$, this completes the proof.

Other less conservative conditions can be similarly obtained by considering the exact value of $\|\phi^0\|_0$ or tighter bounds (see Appendix B), but these do not take the form of a simple inequality on $\|x^0\|_0$. Another condition for very sparse cases can be similarly obtained by using Proposition 2 in Appendix A instead of Proposition 1.

#### 2.3.2 ℓ1/ℓ2-minimization method

The first result below shows that the group-sparse problem (8) can be used as a proxy for the polynomial problem (1)–(2).

###### Theorem 2.

If the solution $\phi^*$ to (8) is unique and yields $x^* = \phi^{-1}(\phi^*)$ such that $x^*$ is a solution to the system of polynomial equations (2), then $x^*$ is the sparsest solution to the system of polynomial equations, i.e., the unique solution to (1)–(2).

###### Proof.

Assume there is an $x^0$ solution to the polynomial system (2) and at least as sparse as $x^*$. Then

$$A\phi(x^0) = y - b$$

and

$$\left\|\{W_j\phi(x^0)\}_{j=1}^n\right\|_0 = \|x^0\|_0 \le \|x^*\|_0 \le \left\|\{W_j\phi^*\}_{j=1}^n\right\|_0,$$

which contradicts the fact that $\phi^*$ is the unique solution to (8) unless $\phi(x^0) = \phi^*$ and $x^0 = x^*$.

The next result provides a condition on the sparsity of the solution to (1)–(2) under which it can be recovered by solving the convex problem (9) or its variant with nonnegativity constraints (10). This result requires the following lemma.

###### Lemma 2.

Let $A$ be an $N\times M$ matrix with mutual coherence $\mu(A)$ as defined in Definition 1. Let $W$ be the $M\times M$ diagonal matrix of entries $w_i = \|A_i\|_2$. Then, for all $\delta$ such that $A\delta = 0$ and all $i\in\{1,\dots,M\}$, the bound

$$w_i^2\delta_i^2 \le \frac{\mu^2(A)}{1+\mu^2(A)}\|W\delta\|_2^2 \tag{12}$$

holds.

###### Proof.

For all $\delta$ such that $A\delta = 0$, we have $A^TA\delta = 0$, which further implies $W^{-1}A^TAW^{-1}W\delta = 0$ and

$$-W\delta = \left(W^{-1}A^TAW^{-1} - I\right)W\delta.$$

Then, we have

$$w_i^2\delta_i^2 = (W\delta)_i^2 = \left(\left(W^{-1}A^TAW^{-1}-I\right)W\delta\right)_i^2 = \left(\sum_{j=1}^M \left(W^{-1}A^TAW^{-1}-I\right)_{i,j}(W\delta)_j\right)^2 \le \sum_{j=1}^M \left(W^{-1}A^TAW^{-1}-I\right)_{i,j}^2(W\delta)_j^2.$$

Note that $W^{-1}A^TAW^{-1}$ is a normalized matrix with ones on the diagonal and off-diagonal $(i,j)$-entries equal to $A_i^TA_j/(\|A_i\|_2\|A_j\|_2)$, bounded in absolute value by $\mu(A)$ via Definition 1. Thus, we obtain

$$w_i^2\delta_i^2 \le \mu^2(A)\sum_{j\neq i} w_j^2\delta_j^2$$

and, by adding $\mu^2(A)\,w_i^2\delta_i^2$ to both sides, the bound

$$w_i^2\delta_i^2 \le \frac{\mu^2(A)}{1+\mu^2(A)}\sum_{j=1}^M w_j^2\delta_j^2,$$

which is precisely (12).

###### Theorem 3.

Let $x^0$ be a solution to the polynomial equations (2) and $\phi^0 = \phi(x^0)$. If the condition

$$\|x^0\|_0 < \frac{1}{2\sqrt{m}}\sqrt{1 + \frac{1}{\mu^2(A)}},$$

where $m = \max_{1\le j\le n} m_j$ is the maximal number of components of $\phi(x)$ associated to a single variable, holds, then $\phi^0$ is the unique solution to both (9) and (10).

###### Proof.

The vector $\phi^0$ is the unique solution to (9) if the inequality

$$\sum_{j=1}^n \|W_j(\phi^0+\delta)\|_2 > \sum_{j=1}^n \|W_j\phi^0\|_2$$

holds for all $\delta \neq 0$ satisfying $A\delta = 0$. The inequality above can be rewritten as

$$\sum_{j\in I_0} \|W_j\delta\|_2 + \sum_{j\notin I_0}\left(\|W_j(\phi^0+\delta)\|_2 - \|W_j\phi^0\|_2\right) > 0,$$

where $I_0 = \{j\in\{1,\dots,n\} : W_j\phi^0 = 0\}$. By the triangle inequality, $\|W_j(\phi^0+\delta)\|_2 - \|W_j\phi^0\|_2 \ge -\|W_j\delta\|_2$, this condition is met if

$$\sum_{j\in I_0}\|W_j\delta\|_2 - \sum_{j\notin I_0}\|W_j\delta\|_2 > 0$$

or

$$\sum_{j=1}^n \|W_j\delta\|_2 - 2\sum_{j\notin I_0}\|W_j\delta\|_2 > 0. \tag{13}$$

By defining $G_j$ as the set of indexes corresponding to nonzero columns of $W_j$, Lemma 2 yields

$$\|W_j\delta\|_2^2 = \sum_{i\in G_j} w_i^2\delta_i^2 \le m_j\,\frac{\mu^2(A)}{1+\mu^2(A)}\|W\delta\|_2^2,$$

where $m_j = |G_j|$ is the number of components of $\phi(x)$ associated to the variable $x_j$. Due to the fact that $\bigcup_{k=1}^n G_k = \{1,\dots,M\}$, we also have

$$\|W\delta\|_2^2 = \sum_{i=1}^M w_i^2\delta_i^2 \le \sum_{k=1}^n\sum_{i\in G_k} w_i^2\delta_i^2 = \sum_{k=1}^n \|W_k\delta\|_2^2 \le \left(\sum_{k=1}^n\|W_k\delta\|_2\right)^2,$$

and thus, with $m = \max_{1\le k\le n} m_k$,

$$\|W_j\delta\|_2^2 \le m\,\frac{\mu^2(A)}{1+\mu^2(A)}\left(\sum_{k=1}^n\|W_k\delta\|_2\right)^2.$$

Introducing this result in (13) gives the condition

$$\sum_{j=1}^n\|W_j\delta\|_2 - 2\left(n - |I_0|\right)\frac{\mu(A)\sqrt{m}}{\sqrt{1+\mu^2(A)}}\sum_{k=1}^n\|W_k\delta\|_2 > 0.$$

Finally, given that $n - |I_0| \le \|x^0\|_0$, this yields

$$\sum_{j=1}^n\|W_j\delta\|_2 - 2\|x^0\|_0\,\frac{\mu(A)\sqrt{m}}{\sqrt{1+\mu^2(A)}}\sum_{k=1}^n\|W_k\delta\|_2 > 0, \tag{14}$$

or, after dividing by $\sum_{k=1}^n\|W_k\delta\|_2 > 0$ and rearranging the terms,

$$\|x^0\|_0 < \frac{\sqrt{1+\mu^2(A)}}{2\mu(A)\sqrt{m}},$$

which can be rewritten as in the statement of the Theorem.

It remains to prove that $\phi^0$ is also the unique solution to (10) in the case where this condition is satisfied and $\phi^0$ is the unique solution to (9). To see this, note that $\phi^0$ is by definition a feasible point of (10). In addition, the feasible set of (10) is included in the one of (9), while the problems (9) and (10) share the same cost function. Thus, if $\phi^0$ is the unique solution to (9), there cannot be another feasible $\phi$ with lower or equal cost function value for (10).

###### Corollary 1.

Let $x^0$ be a solution to the polynomial equations (2) and $m = \max_{1\le j\le n} m_j$ be the maximal number of components of $\phi(x)$ associated to a single variable. If the condition

$$\|x^0\|_0 < \frac{1}{2\sqrt{m}}\sqrt{1+\frac{1}{\mu^2(A)}}$$

holds, then $x^0$ is the unique solution to the minimization problem (1)–(2) and it can be computed as $x^0 = \phi^{-1}(\hat{\phi})$ with $\hat{\phi}$ the solution to either (9) or (10).

###### Proof.

Assume there exists another solution $x^1$ to (1)–(2), and thus with $\|x^1\|_0 \le \|x^0\|_0$. Then, Theorem 3 implies that both $\phi(x^0)$ and $\phi(x^1)$ are unique solutions to either (9) or (10), and thus that $\phi(x^0) = \phi(x^1)$ (where $\phi(x^1)$ is similarly defined as the solution to either (9) or (10) in this case). But this contradicts the definition of the mapping $\phi$ implying $\phi(x^0) \neq \phi(x^1)$ whenever $x^0 \neq x^1$. Therefore, the assumption cannot hold and $x^0$ is the unique solution to (1)–(2), while $x^0 = \phi^{-1}(\hat{\phi})$.

###### Theorem 4.

Let $\hat{\phi}$ be a solution to (9) and $m = \max_{1\le j\le n} m_j$ be the maximal number of components of $\phi(x)$ associated to a single variable. If the condition

$$\left\|\{W_j\hat{\phi}\}_{j=1}^n\right\|_0 < \frac{1}{2\sqrt{m}}\sqrt{1+\frac{1}{\mu^2(A)}}$$

holds, then $\hat{\phi}$ is the unique solution to both (8) and (9). If, in addition, $\hat{x} = \phi^{-1}(\hat{\phi})$ satisfies the polynomial constraints (2), then $\hat{x}$ is the unique solution to (1)–(2).

###### Proof.

By following steps similar to those in the proof of Theorem 3 with $\hat{\phi}$ instead of $\phi^0$, we obtain (by replacing $\|x^0\|_0$ by $\|\{W_j\hat{\phi}\}_{j=1}^n\|_0$ in (14)) that $\hat{\phi}$ is the unique solution to (9). Then, let $\phi^*$ be a solution to (8). This implies $\|\{W_j\phi^*\}_{j=1}^n\|_0 \le \|\{W_j\hat{\phi}\}_{j=1}^n\|_0$ and, under the condition of the Theorem,

$$\left\|\{W_j\phi^*\}_{j=1}^n\right\|_0 < \frac{1}{2\sqrt{m}}\sqrt{1+\frac{1}{\mu^2(A)}}. \tag{15}$$

By following similar steps again, we obtain that $\phi^*$ is also the unique solution to (9) and thus that $\hat{\phi} = \phi^*$ is the unique solution to (8).

Finally, since $\hat{\phi}$ is the unique solution to (8), if $\hat{x}$ satisfies the polynomial equations (2), Theorem 2 implies that $\hat{x}$ is the solution to (1)–(2).

In comparison with Theorem 3, Theorem 4 provides a condition that only depends on the estimate $\hat{\phi}$ obtained by solving (9) rather than on the sought solution $x^0$.

### 2.4 Enhancing sparsity

In practice, convex relaxations such as the ones described above provide a good step towards the solution but might fail to yield the exact solution with sufficient sparsity. In such cases, it is common practice to improve the sparsity of the solution by repeating the procedure with a well-chosen weighting of the variables as described in [Candès et al., 2008, Le et al., 2013]. These techniques can be directly applied to improve the $\ell_1$-minimization method of Sect. 2.1, while they are adapted below to group-sparsity as considered in Sect. 2.2.

#### 2.4.1 Iterative reweighting

The classical reweighting scheme of [Candès et al., 2008] for sparse recovery improves the sparsity of the solution by solving a sequence of linear programs. It can be adapted to the group-sparse recovery problem by iteratively solving

$$\hat{\phi} = \arg\min_{\phi\in\mathbb{R}^M} \ \sum_{j=1}^n \mu_j\|W_j\phi\|_2 \quad \text{s.t.}\quad A\phi = y - b,\quad \phi_k \ge 0 \ \ \forall k \text{ such that } (\alpha_k)_j \text{ is even for all } j, \tag{16}$$

with weights $\mu_j$ initially set to 1 and refined at each iteration by

$$\mu_j = \frac{1}{\|W_j\hat{\phi}\|_2 + \epsilon}$$

for a given small value of $\epsilon > 0$.

The basic idea is to decrease the influence of groups of variables with large $\ell_2$-norms that are assumed to be nonzero in the solution while increasing the weight of groups with small norms in order to force them towards zero.
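The weight update alone can be sketched in a few lines (illustrative names and values of our own):

```python
import numpy as np

def update_group_weights(group_norms, eps=1e-6):
    """Reweighting rule mu_j = 1 / (||W_j phi_hat||_2 + eps): groups with
    large norms (assumed nonzero) get small weights, while groups with
    small norms get large weights pushing them towards zero."""
    return 1.0 / (np.asarray(group_norms, dtype=float) + eps)

mu = update_group_weights([3.0, 0.05, 0.0])  # illustrative group norms
assert mu[0] < mu[1] < mu[2]
```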

#### 2.4.2 Selective ℓ1/ℓ2-minimization

The SM algorithm proposed in [Le et al., 2013] is another reweighted $\ell_1$-minimization mechanism, which sets a single weight to zero at each iteration. Though typically requiring more computation time than the previous approach due to a number of iterations equal to the number of nonzero elements, this algorithm can recover sparse solutions in cases where the classical reweighting scheme of [Candès et al., 2008] fails. Other advantages include the absence of a tuning parameter and the availability of a convergence analysis [Le et al., 2013]. The algorithm below is an adaptation of SM to group-sparse problems.

1. Initialize all weights: $\mu_j = 1$, $j=1,\dots,n$.

2. Solve the weighted group-sparse problem (16).

3. Find $k = \arg\max_{j \,:\, \mu_j \neq 0} \|W_j\hat{\phi}\|_2$.

4. Set $\mu_k = 0$ (to relax the sparsity constraint on the $k$th group).

5. Repeat from Step 2 until $\|W_j\hat{\phi}\|_2 = 0$ for all $j$ such that $\mu_j \neq 0$.

Note that in this algorithm, the number of iterations is equal to the number of nonzero groups (the maximal number of iterations is the number of groups $n$, but if the correct sparsity pattern is recovered then the algorithm stops earlier), which, for polynomial basis pursuit, is $\|\hat{x}\|_0$ and is typically small. This results in a fast and accurate method for polynomial basis pursuit, as will be seen in Sect. 6.

## 3 Greedy approach

As mentioned in the introduction, there are two major techniques to minimize an $\ell_0$-pseudo-norm: the BP approach and the greedy approach. We now consider the second one to solve problem (8), and more particularly develop two variants of the greedy approach: the exact method and the approximate method. The exact method is intended for small-scale problems where the number of possible combinations of base variables remains small. The approximate method is designed to circumvent this limitation and applies to much larger problems.

##### Exact greedy algorithm.

The exact method is implemented as follows, where we let the tolerance $\epsilon = 0$ if the polynomial system of equations is assumed to be feasible, and $\epsilon > 0$ otherwise.

1. Initialize: $s = 0$.

2. $s \leftarrow s + 1$.

3. For all combinations $I$ of $s$ variables among $\{1,\dots,n\}$:

1. Set $S = \{1,\dots,M\}\setminus\bigcup_{j\notin I}\mathcal{I}_j$, where the index sets $\mathcal{I}_j$ are defined as in (6).

2. Build the submatrix $A_S$ with the columns of $A$ with index in $S$.

3. Solve

$$\phi_S = \arg\min_{\phi\in\mathbb{R}^{|S|}} \|A_S\phi + b - y\|_2^2.$$

4. Update the error $e = \|A_S\phi_S + b - y\|_2^2$.

5. If $e \le \epsilon$, compute $\hat{\phi}$ by setting its components of index in $S$ to the values in $\phi_S$ and the others to 0. Return $\hat{\phi}$.

4. If $s < n$, repeat from Step 2, otherwise return an infeasibility certificate.

In Step 3.(a), $S$ corresponds to the support of $\phi(x)$ when the support of $x$ is $I$, where $I$ is a combination of $s$ indexes from 1 to $n$. The maximal number of least squares (LS) problems to solve in Step 3.(c) is $\sum_{s=1}^n\binom{n}{s} = 2^n - 1$. But many of these are spared by starting with the sparsest combinations and stopping as soon as a solution is found. Thus, if a sparse solution $\hat{x}$ with $\|\hat{x}\|_0 \ll n$ exists, only $\sum_{s=1}^{\|\hat{x}\|_0}\binom{n}{s}$ LS problems are solved. At iteration $s$, the LS problem is of size $N\times|S|$, with $|S| = \sum_{q=1}^d\binom{s+q-1}{q}$. Thus, assuming a complexity $O(N|S|^2)$ for an LS problem of size $N\times|S|$, the overall complexity of the algorithm is exponential in $\|\hat{x}\|_0$, $n$ and $d$.
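The steps above can be sketched as follows (assuming numpy; names, tolerances and test data are illustrative, not the authors' code):

```python
import numpy as np
from itertools import combinations, combinations_with_replacement

def multi_indexes(n, d):
    alphas = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            a = [0] * n
            for j in combo:
                a[j] += 1
            alphas.append(tuple(a))
    return alphas

def exact_greedy(A, c, alphas, n, eps=1e-10):
    """Exact greedy sketch: enumerate candidate supports of increasing
    size s and solve one LS problem per support (illustrative)."""
    M = A.shape[1]
    for s in range(1, n + 1):
        for I in combinations(range(n), s):
            # S = support of phi(x) when supp(x) = I
            S = [k for k, a in enumerate(alphas)
                 if all(j in I for j in range(n) if a[j] != 0)]
            phi_S = np.linalg.lstsq(A[:, S], c, rcond=None)[0]
            e = float(np.sum((A[:, S] @ phi_S - c) ** 2))
            if e <= eps:
                phi = np.zeros(M)
                phi[S] = phi_S
                return phi, e
    return None, None  # infeasible up to eps

n, d = 3, 2
alphas = multi_indexes(n, d)
rng = np.random.default_rng(2)
A = rng.standard_normal((5, len(alphas)))
x0 = np.array([2.0, 0.0, 0.0])  # 1-sparse solution
c = A @ np.array([np.prod(x0 ** np.array(a)) for a in alphas])
phi_hat, e = exact_greedy(A, c, alphas, n)
assert np.allclose(phi_hat[:n], x0, atol=1e-6)
```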

With $\epsilon = 0$, the exact greedy algorithm above can be slightly modified to compute all the solutions to (8), simply by letting the for loop in Step 3 complete instead of returning as soon as a solution is found. As a result, the algorithm could provide a uniqueness certificate for the solution of (8), from which Theorem 2 could be applied to conclude that the solution coincides with the unique minimizer of (1)–(2).

##### Approximate greedy algorithm.

The approximate method is similar except that it explores only a single branch of the tree of possible combinations. Its implementation uses a set $J$ of retained variables (more precisely, $J$ contains the indexes of these variables):

1. Initialize the set of nonzero variables: $J = \emptyset$.

2. For all $j\in\{1,\dots,n\}$,

1. Set $S_j = \{1,\dots,M\}\setminus\bigcup_{l\notin J\cup\{j\}}\mathcal{I}_l$, where the index sets $\mathcal{I}_l$ are defined as in (6).

2. Build the submatrix $A_{S_j}$ with the columns of $A$ with index in $S_j$.

3. Solve

$$\phi_j = \arg\min_{\phi\in\mathbb{R}^{|S_j|}} \|A_{S_j}\phi + b - y\|_2^2.$$

3. Select the variable that minimizes the error if added to $J$:

$$k = \arg\min_{j\in\{1,\dots,n\}} \|A_{S_j}\phi_j + b - y\|_2^2.$$

4. Update $J \leftarrow J\cup\{k\}$.

5. Repeat from Step 2 until $\|A_{S_k}\phi_k + b - y\|_2^2 \le \epsilon$.

6. Compute $\hat{\phi}$ by setting its components of index in $S_k$ to the values in $\phi_k$ and the others to 0.

7. Return $\hat{\phi}$ and the error $\|A_{S_k}\phi_k + b - y\|_2^2$.

The algorithm starts with an empty set of nonzero variables and adds a single variable to that set at each iteration. The variable retained at a given iteration is the one that, if added, leads to the minimum sum of squared errors for the equations (2). In Step 2.(a), $S_j$ corresponds to the support of $\phi(x)$ when the support of $x$ is $J\cup\{j\}$. Note that the value of the minimizer $\phi_j$ is not retained but re-estimated at the next iteration. The reason for this is that there is no guarantee that the components of $\phi_j$ correspond to monomials of base variables.

Since one base variable is added at each iteration, the number of iterations equals the sparsity $\|\hat{x}\|_0$ of the returned solution, with $\|\hat{x}\|_0 \le n$. In this approximate variant of the algorithm, each iteration requires solving only $n$ LS problems, which yields a total number of LS problems equal to $n\|\hat{x}\|_0$. Then, bounding the complexity of each LS problem as for the exact greedy algorithm leads to an overall complexity bounded by $n\|\hat{x}\|_0$ times the cost of the largest LS problem.

Since we do not have a result on the convergence of the algorithm to the sparsest solution, we cannot bound $\|\hat{x}\|_0$ except by $n$, and the worst-case complexity is exponential in $n$. However, in practice, if the algorithm finds a sparse solution $\hat{x}$, the complexity is exponential in its sparsity $\|\hat{x}\|_0$, but only linear in $n$. This is rather promising given the empirical results to be shown in Sect. 6, which suggest that this occurs with high probability.
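The approximate greedy loop can be sketched as follows (assuming numpy; names, tolerances and test data are illustrative, not the authors' code; for simplicity the candidate loop skips variables already retained):

```python
import numpy as np
from itertools import combinations_with_replacement

def multi_indexes(n, d):
    alphas = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(range(n), q):
            a = [0] * n
            for j in combo:
                a[j] += 1
            alphas.append(tuple(a))
    return alphas

def approx_greedy(A, c, alphas, n, eps=1e-10):
    """Approximate greedy sketch: add the base variable whose inclusion
    most reduces the squared equation error, re-estimating all retained
    coefficients at each step (illustrative implementation)."""
    M = A.shape[1]
    J = set()
    while True:
        best = None
        for j in range(n):
            if j in J:
                continue  # only candidate variables not yet retained
            Jj = J | {j}
            # S_j = support of phi(x) when supp(x) = J U {j}
            S = [k for k, a in enumerate(alphas)
                 if all(l in Jj for l in range(n) if a[l] != 0)]
            phi_S = np.linalg.lstsq(A[:, S], c, rcond=None)[0]
            e = float(np.sum((A[:, S] @ phi_S - c) ** 2))
            if best is None or e < best[0]:
                best = (e, j, S, phi_S)
        e, k, S, phi_S = best
        J.add(k)
        if e <= eps or len(J) == n:
            phi = np.zeros(M)
            phi[S] = phi_S
            return phi, e

n, d = 3, 2
alphas = multi_indexes(n, d)
rng = np.random.default_rng(3)
A = rng.standard_normal((6, len(alphas)))
x0 = np.array([0.0, -1.5, 0.0])  # 1-sparse solution
c = A @ np.array([np.prod(x0 ** np.array(a)) for a in alphas])
phi_hat, e = approx_greedy(A, c, alphas, n)
assert np.allclose(phi_hat[:n], x0, atol=1e-6)
```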

## 4 Purely nonlinear polynomials

We now consider the case of purely nonlinear polynomials without a linear part for some variables, i.e., with $a_{ij} = 0$, $i=1,\dots,N$, for some $j\in\{1,\dots,n\}$ (according to the ordering of the multi-indexes, the linear monomials correspond to the first $n$ coefficients with