# Constrained Optimization Involving Nonconvex ℓ_p Norms: Optimality Conditions, Algorithm and Convergence

This paper investigates the optimality conditions for characterizing the local minimizers of constrained optimization problems involving an ℓ_p norm (0<p<1) of the variables, which may appear in either the objective or the constraint. Problems of this kind are applicable to a wide range of areas since the ℓ_p norm usually promotes sparse solutions. However, the nonsmooth and non-Lipschitz nature of the ℓ_p norm often makes these problems difficult to analyze and solve. We provide the calculation of the subgradients of the ℓ_p norm and the normal cones of the ℓ_p ball. For both problems, we derive the first-order necessary conditions under various constraint qualifications. We also derive the sequential optimality conditions for both problems and study the conditions under which they imply the first-order necessary conditions. We point out that the sequential optimality conditions are readily satisfied by iteratively reweighted algorithms and show that global convergence can be derived directly from them.


## 1 Introduction

Sparse regularization problems have attracted considerable attention over the past decades and have numerous applications in areas including compressed sensing [5], biomedical engineering [22, 10], sensor selection [1] and portfolio management [6]. This is because sparse solutions usually lead to better generalization performance and robustness. A common approach to promoting sparsity in the solution is to include a sparsity-inducing term such as the $\ell_1$ norm or the $\ell_0$ norm of the variables, either in the objective as a penalty or in the constraint. In recent years, nonconvex and/or non-Lipschitz sparsity-inducing terms such as the $\ell_p$ (quasi-)norm ($0<p<1$) have been shown [13] to perform better in many situations. As a consequence, many works in the past decade have focused on designing and analyzing algorithms for solving the unconstrained $\ell_p$ regularized problems [23, 8, 11, 20, 18, 9]. However, there are far fewer works on the constrained case despite its wider applicability. We list two examples of constrained optimization problems involving an $\ell_p$ norm.

Example 1. In the first example, we consider the cloud radio access network (Cloud-RAN) power consumption problem [15, 19], which solves a group sparse problem to induce group sparsity in the beamformers and thereby guide the remote radio head (RRH) selection. This group sparse problem is addressed by minimizing the mixed $\ell_2/\ell_p$-norm with $p \in (0,1)$, yielding the following problem

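$$\min_{v} \ \sum_{l=1}^{L} \sqrt{\rho_l z_l}\,\|\tilde v_l\|_2^p \quad \text{s.t.} \quad \sqrt{\sum_{i\neq k} \|h_k^H v_i\|_2^2 + \sigma_k^2} \le \frac{1}{\gamma_k}\Re(h_k^H v_k), \quad \|\tilde v_l\|_2 \le \sqrt{P_l}, \quad l=1,\dots,L,\ k=1,\dots,K.$$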

Here the Cloud-RAN architecture of this model has $L$ RRHs and $K$ single-antenna mobile users (MUs), where the $l$-th RRH is equipped with multiple antennas. The vector $v_{lk}$ is the transmit beamforming vector from the $l$-th RRH to the $k$-th user, and $\tilde v_l$ collects the transmit vectors of the $l$-th RRH into one group. The relative fronthaul link power consumption is denoted by $\rho_l$, and the inefficiency of the radio frequency power amplifier by $z_l$. The channel propagation between user $k$ and RRH $l$ is denoted as $h_{kl}$. $P_l$ is the maximum transmit power of the $l$-th RRH, $\sigma_k$ is the noise at MU $k$, and $\gamma_k$ is the target signal-to-interference-plus-noise ratio (SINR).

Example 2 (The $\ell_p$-constrained sparse coding). In the context of sparse coding [17], the task is to reconstruct the unknown sparse code word $x \in \mathbb{R}^n$ from the linear measurements $y = Ax + \varepsilon$, where $y \in \mathbb{R}^m$ represents the data with $m$ features, $\varepsilon \in \mathbb{R}^m$ denotes the noise vector, and $A \in \mathbb{R}^{m \times n}$ corresponds to the fixed dictionary that consists of $n$ atoms with respect to its columns. This problem can be formulated as

$$\min_{x \in \mathbb{R}^n} \ \|Ax - y\|_2^2 \quad \text{s.t.} \quad \|x\|_p^p \le \theta, \tag{1.1}$$

where the $\ell_p$ ball constraint is to induce sparsity in the code word.
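To make the role of the ball constraint in (1.1) concrete, the following minimal sketch evaluates the objective and the constraint for two candidate code words that fit the same data exactly. The function names, the tiny dictionary `A`, and the values of `p` and `theta` are illustrative assumptions, not taken from the paper.

```python
def lp_power(x, p):
    """Return ||x||_p^p = sum_i |x_i|^p for 0 < p < 1."""
    return sum(abs(xi) ** p for xi in x)


def residual_sq(A, x, y):
    """Return ||Ax - y||_2^2 for a dense matrix A given as a list of rows."""
    total = 0.0
    for row, yi in zip(A, y):
        r = sum(a * xi for a, xi in zip(row, x)) - yi
        total += r * r
    return total


def is_feasible(x, p, theta, tol=1e-12):
    """Check the ball constraint ||x||_p^p <= theta of problem (1.1)."""
    return lp_power(x, p) <= theta + tol


# Hypothetical data: both candidates reproduce y exactly.
A = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 1.0]
sparse = [0.0, 0.0, 1.0]    # one nonzero entry
dense = [0.5, 0.25, 0.75]   # three nonzero entries

for name, x in (("sparse", sparse), ("dense", dense)):
    print(name, residual_sq(A, x, y), lp_power(x, 0.5), is_feasible(x, 0.5, 1.0))
```

With `p = 0.5` and `theta = 1.0`, only the sparse candidate satisfies the constraint even though both have zero residual, which is the sparsity-inducing effect the example describes.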

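In this paper, we consider the following two general forms of constrained nonlinear optimization with $\ell_p$ norms. The first one is the constrained $\ell_p$ regularized problem, meaning the $\ell_p$ norm appears in the objective as a penalty,

$$\min_{x} \ F(x) := f_0(x) + \lambda\|x\|_p^p \quad \text{s.t.} \quad f_j(x) \le 0,\ \forall j \in \mathcal{I}; \quad f_j(x) = 0,\ \forall j \in \mathcal{E}. \tag{P1}$$

The second one has the $\ell_p$ norm in the constraint and requires it to be no larger than a prescribed value,

$$\min_{x} \ f_0(x) \quad \text{s.t.} \quad \|x\|_p^p \le \theta; \quad f_j(x) \le 0,\ \forall j \in \mathcal{I}; \quad f_j(x) = 0,\ \forall j \in \mathcal{E}. \tag{P2}$$

Here, $f_j$, $j \in \{0\} \cup \mathcal{I} \cup \mathcal{E}$, are continuously differentiable on $\mathbb{R}^n$ and $\|x\|_p^p = \sum_{i=1}^n |x_i|^p$ with $p \in (0,1)$. The positive $\lambda$ is a given regularization parameter, and the positive $\theta$ is a prescribed scalar referred to as the radius of the $\ell_p$-ball.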

Despite the advantages of the nonconvex $\ell_p$ norm in promoting sparse solutions, problems of the forms (P1) and (P2) are generally not easy to handle. This is largely due to the nonconvex and non-Lipschitz continuous nature of the $\ell_p$ norm, which makes it difficult to characterize the optimal solutions. In particular, verifiable optimality conditions are often difficult to derive, which is an obstacle to designing efficient numerical algorithms. For example, for (P1), many researchers [7, 12, 19, 18] approximate the $\ell_p$ term by Lipschitz continuous functions and then solve for an approximate solution. As for (P2), to the best of our knowledge, not much has been done except for the special case in which only the $\ell_p$ ball constraint is present [24], that is, the projection onto the $\ell_p$ ball.

### 1.1 Literature review

The optimality conditions of the unconstrained and inequality constrained versions of (P1) were studied in [21]. The authors derived the first-order and second-order necessary conditions by assuming that the “extended” linear independence constraint qualification (ELICQ) is satisfied by (P1), meaning that the LICQ is satisfied at the local minimizer in the subspace consisting of the nonzero variables. They also stated that the second-order optimality conditions can be derived by considering the reduced problems obtained by fixing the zero components at a stationary point. In [3], Bian and Chen derived a first-order necessary optimality condition using the theory of the generalized directional derivative and the tangent cone. In particular, for the case that the constraints are all linear, Haeser et al. [9] articulated first- and second-order necessary optimality conditions for this problem based on the $\epsilon$-perturbed problem and the limits as the perturbation $\epsilon \to 0$. Sufficient conditions for the $\epsilon$-perturbed stationary points are also presented. As for (P2), [24] derived optimality conditions for the special case where only the $\ell_p$ ball constraint exists using the concept of generalized Fréchet normal cone. To the best of our knowledge, there has been no study on the optimality conditions for more general cases of this problem.

### 1.2 Contributions

In this paper, we are interested in deriving the optimality conditions to characterize the local solutions of (P1) and (P2) under different constraint qualifications (CQs). First of all, we analyze the basic properties of the $\ell_p$ norm and the $\ell_p$ norm ball. We derive the regular and general subgradients of the $\ell_p$ norm and the regular and general normal cones of the $\ell_p$ norm ball, which indicate that the $\ell_p$ norm is subdifferentially regular and the $\ell_p$ ball is Clarke regular. For (P1) and (P2), we derive the Karush-Kuhn-Tucker (KKT) conditions and discuss the constraint qualifications that ensure that the KKT conditions are satisfied at a local minimizer. For (P2), we believe this is the first result.

Recently, Andreani et al. [2] introduced the sequential optimality conditions, namely, the approximate KKT (AKKT) conditions for constrained smooth optimization problems, which are commonly satisfied by many algorithms. They also proposed the Cone-Continuity Property (CCP), under which the AKKT conditions imply the KKT conditions. This is widely believed to be one of the weakest qualifications under which the KKT conditions hold at a local minimizer. We also define the sequential optimality conditions for (P1) and (P2) and explore the constraint qualification under which the sequential conditions imply the KKT conditions. We believe these results are stronger than existing ones.

To exhibit the use of the proposed sequential optimality conditions, we extend the well-known iteratively reweighted algorithms for solving the unconstrained $\ell_p$-regularized problem to constrained cases and show that those conditions are satisfied at the limit points of the generated iterates. Therefore, under the proposed constraint qualification, the limit points also satisfy the KKT conditions.
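For readers unfamiliar with the iteratively reweighted idea, the following toy sketch shows it on the simplest unconstrained instance. It is an illustration under stated assumptions rather than the algorithm analyzed in this paper: the smooth term is taken to be $\frac{1}{2}\|x-b\|_2^2$, each $|x_i|^p$ is smoothed to $(|x_i|+\epsilon)^p$ with a fixed hypothetical $\epsilon$, and every iteration minimizes the resulting weighted $\ell_1$ surrogate exactly by soft-thresholding. All names are hypothetical.

```python
def soft_threshold(z, t):
    """Minimizer of 0.5*(u - z)^2 + t*|u| over u, for t >= 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def reweighted_l1_denoise(b, lam, p, eps=1e-3, iters=50):
    """Toy iteratively reweighted l1 scheme for
    min_x 0.5*||x - b||^2 + lam * sum_i (|x_i| + eps)^p  (assumed model)."""
    x = list(b)
    for _ in range(iters):
        # Weights come from linearizing (|x_i| + eps)^p at the current iterate.
        w = [p * (abs(xi) + eps) ** (p - 1.0) for xi in x]
        # The weighted l1 subproblem is separable and solved in closed form.
        x = [soft_threshold(bi, lam * wi) for bi, wi in zip(b, w)]
    return x


x_hat = reweighted_l1_denoise([3.0, 0.05], lam=0.5, p=0.5)
print(x_hat)
```

In this run the small entry is driven to exactly zero while the large entry is only mildly shrunk, which is the qualitative behavior that motivates reweighting.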

### 1.3 Notation and preliminary

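We use $0$ as the vector filled with all zeros of appropriate size. For $x \in \mathbb{R}^n$ and an index set $\mathcal{S} \subset \{1,\dots,n\}$, let $\mathbb{R}^{\mathcal{S}}$ be the reduced subspace of $\mathbb{R}^n$ that consists of the components $i \in \mathcal{S}$, and denote $x_{\mathcal{S}}$ as the subvector of $x$ containing the elements $x_i$, $i \in \mathcal{S}$. Letting , we also write to denote with and . For differentiable $f$, let $\nabla_{\mathcal{S}} f(x)$ be the vector consisting of $\nabla_i f(x)$, $i \in \mathcal{S}$. In (P1) and (P2), define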

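$$\Gamma = \{x \mid f_j(x) \le 0,\ \forall j \in \mathcal{I};\ f_j(x) = 0,\ \forall j \in \mathcal{E}\},$$

and the index set of active inequalities by $\mathcal{A}(x) = \{j \in \mathcal{I} \mid f_j(x) = 0\}$. We define the sets of zeros and nonzeros in $x$ as

$$\mathcal{Z}(x) = \{i \mid x_i = 0\} \quad \text{and} \quad \mathcal{N}(x) = \{i \mid x_i \ne 0\}.$$

For simplicity, we use shorthands such as $\bar{\mathcal{A}} = \mathcal{A}(\bar x)$, $\bar{\mathcal{Z}} = \mathcal{Z}(\bar x)$ and $\bar{\mathcal{N}} = \mathcal{N}(\bar x)$.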

For , define the neighborhood with radius as . For any cone $K \subset \mathbb{R}^n$, the polar of $K$ is defined to be the cone

$$K^* := \{v \in \mathbb{R}^n \mid \langle v, w\rangle \le 0 \text{ for all } w \in K\}.$$

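Whenever $K_1 \subset K_2$, one has $K_2^* \subset K_1^*$. Hence, $K_1^* \subset (K_1 \cap K_2)^*$ and $K_2^* \subset (K_1 \cap K_2)^*$, implying $K_1^* + K_2^* \subset (K_1 \cap K_2)^*$. For $C \subset \mathbb{R}^n$, its horizon cone is defined by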

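$$C^\infty = \begin{cases} \{x \mid \exists\, x^\nu \in C,\ \lambda^\nu \searrow 0, \text{ with } \lambda^\nu x^\nu \to x\} & \text{when } C \ne \emptyset, \\ \{0\} & \text{when } C = \emptyset. \end{cases}$$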

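Another operation on $C$ is the smallest cone containing $C$, namely the positive hull of $C$, which is defined as $\operatorname{pos} C = \{0\} \cup \{\lambda x \mid x \in C,\ \lambda > 0\}$. A vector $w$ is tangent to a set $C$ at a point $\bar x \in C$, written $w \in T_C(\bar x)$, if $(x^\nu - \bar x)/\tau^\nu \to w$ for some $x^\nu \to \bar x$ with $x^\nu \in C$ and $\tau^\nu \searrow 0$. The boundary of a set $C$ is denoted as $\partial C$.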

###### Definition 1.1 ([14] Definition 6.3, 6.4).
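1. Let $C \subset \mathbb{R}^n$ and $\bar x \in C$. A vector $v$ is a regular normal to $C$ at $\bar x$, written $v \in \hat N_C(\bar x)$, if

$$\limsup_{x \xrightarrow{C} \bar x,\ x \ne \bar x} \frac{\langle v, x - \bar x\rangle}{\|x - \bar x\|} \le 0.$$

It is a (general) normal to $C$ at $\bar x$, written $v \in N_C(\bar x)$, if there are sequences $x^\nu \xrightarrow{C} \bar x$ and $v^\nu \to v$ with $v^\nu \in \hat N_C(x^\nu)$. We call $\hat N_C(\bar x)$ the regular normal cone and $N_C(\bar x)$ the normal cone to $C$ at $\bar x$.

2. A set $C$ is regular at one of its points $\bar x$ in the sense of Clarke if it is locally closed at $\bar x$ and $N_C(\bar x) = \hat N_C(\bar x)$.

For a nonempty convex $C$ at $\bar x \in C$, $N_C(\bar x) = \hat N_C(\bar x) = \{v \mid \langle v, x - \bar x\rangle \le 0 \text{ for all } x \in C\}$.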

###### Definition 1.2 ([14] Definition 8.3).

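Consider a function $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ and a point $\bar x$ with $f(\bar x)$ finite. For a vector $v \in \mathbb{R}^n$, one says that

1. $v$ is a regular subgradient of $f$ at $\bar x$, written $v \in \hat\partial f(\bar x)$, if

$$f(x) \ge f(\bar x) + \langle v, x - \bar x\rangle + o(|x - \bar x|);$$
2. $v$ is a (general) subgradient of $f$ at $\bar x$, written $v \in \partial f(\bar x)$, if there are sequences $x^\nu \xrightarrow{f} \bar x$ (that is, $x^\nu \to \bar x$ with $f(x^\nu) \to f(\bar x)$) and $v^\nu \in \hat\partial f(x^\nu)$ with $v^\nu \to v$;

3. $v$ is a horizon subgradient of $f$ at $\bar x$, written $v \in \partial^\infty f(\bar x)$, if there are sequences $x^\nu \xrightarrow{f} \bar x$ and $v^\nu \in \hat\partial f(x^\nu)$ with $\lambda^\nu v^\nu \to v$ for some sequence $\lambda^\nu \searrow 0$.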

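For $f: \mathbb{R}^n \to \overline{\mathbb{R}}$, the epigraph of $f$ is the set $\operatorname{epi} f := \{(x, \alpha) \in \mathbb{R}^n \times \mathbb{R} \mid \alpha \ge f(x)\}$.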

###### Definition 1.3 ([14] Definition 7.25).

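A function $f: \mathbb{R}^n \to \overline{\mathbb{R}}$ is called subdifferentially regular at $\bar x$ if $f(\bar x)$ is finite and $\operatorname{epi} f$ is Clarke regular at $(\bar x, f(\bar x))$ as a subset of $\mathbb{R}^n \times \mathbb{R}$.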

## 2 First-order necessary optimality conditions

In this section, we present the first-order necessary optimality conditions for (P1) and (P2). Before proceeding to the optimality conditions, we provide some basic properties.

### 2.1 Basic Properties

Denote $\phi(x) := \|x\|_p^p$ and the $\ell_p$ norm ball $\Theta := \{x \in \mathbb{R}^n \mid \|x\|_p^p \le \theta\}$. In this subsection, we provide basic properties of $\phi$ and $\Theta$. In particular, we derive the regular and general subgradients of $\phi$ and the regular and general normal cones of $\Theta$, and then show that $\phi$ is subdifferentially regular on $\mathbb{R}^n$ and $\Theta$ is Clarke regular at each of its points.

The regular, general, and horizon subgradients of $\phi$ can be calculated as follows.

###### Theorem 2.1.

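For any $\bar x \in \mathbb{R}^n$, it holds that

$$\partial\phi(\bar x) = \hat\partial\phi(\bar x) = \{v \in \mathbb{R}^n \mid v_j = \operatorname{sign}(\bar x_j)\,p\,|\bar x_j|^{p-1},\ j \in \bar{\mathcal{N}}\}, \tag{2.1}$$

$$\hat\partial\phi(\bar x)^\infty = \partial^\infty\phi(\bar x) = \{v \in \mathbb{R}^n \mid v_j = 0,\ j \in \bar{\mathcal{N}}\}. \tag{2.2}$$

Therefore, $\phi$ is subdifferentially regular at every $\bar x \in \mathbb{R}^n$.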

###### Proof.

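We first consider $|x_i|^p$ on $\mathbb{R}$. If $\bar x_i \ne 0$, then $\hat\partial|\bar x_i|^p = \{\operatorname{sign}(\bar x_i)\,p\,|\bar x_i|^{p-1}\}$. On the other hand, $\lim_{t \to 0} |t|^p/|t| = +\infty$, implying $|t|^p \ge vt + o(|t|)$ for any $v \in \mathbb{R}$. Therefore, if $\bar x_i = 0$, it follows from the definition of regular subgradient that $\hat\partial|\bar x_i|^p = \mathbb{R}$. Hence, $\partial|\bar x_i|^p = \hat\partial|\bar x_i|^p$. By [14, Proposition 10.5], we have

$$\hat\partial\phi(\bar x) = \partial\phi(\bar x) = \partial|\bar x_1|^p \times \dots \times \partial|\bar x_n|^p = \{v \in \mathbb{R}^n \mid v_j = \operatorname{sign}(\bar x_j)\,p\,|\bar x_j|^{p-1},\ j \in \bar{\mathcal{N}}\}.$$

This proves (2.1).

By the definition of horizon cone and (2.1), it is obvious that $\hat\partial\phi(\bar x)^\infty = \{v \in \mathbb{R}^n \mid v_j = 0,\ j \in \bar{\mathcal{N}}\}$. We next prove $\partial^\infty\phi(\bar x) = \{v \in \mathbb{R}^n \mid v_j = 0,\ j \in \bar{\mathcal{N}}\}$.

For any $v$ with $v_j = 0$, $j \in \bar{\mathcal{N}}$, and any $\lambda^\nu \searrow 0$, we can select a sequence $x^\nu \to \bar x$ such that $\mathcal{N}(x^\nu) = \bar{\mathcal{N}}$. Let $v^\nu_j = \operatorname{sign}(x^\nu_j)\,p\,|x^\nu_j|^{p-1}$ for $j \in \bar{\mathcal{N}}$ and $v^\nu_j = v_j/\lambda^\nu$ for $j \in \bar{\mathcal{Z}}$. From (2.1), this means $v^\nu \in \hat\partial\phi(x^\nu)$ and $\lambda^\nu v^\nu \to v$. Therefore, $v \in \partial^\infty\phi(\bar x)$.

On the other hand, for $x$ sufficiently close to $\bar x$, it holds that $\bar{\mathcal{N}} \subset \mathcal{N}(x)$. Therefore, by (2.1), $v_j = \operatorname{sign}(x_j)\,p\,|x_j|^{p-1}$, $j \in \bar{\mathcal{N}}$, for any $v \in \hat\partial\phi(x)$. Hence, for any sequences $x^\nu \to \bar x$, $v^\nu \in \hat\partial\phi(x^\nu)$, $\lambda^\nu \searrow 0$ with $\lambda^\nu v^\nu \to v$, it holds that $\lambda^\nu v^\nu_j \to 0$ for any $j \in \bar{\mathcal{N}}$, or, equivalently, $v_{\bar{\mathcal{N}}} = 0$. Overall, we have shown that $\partial^\infty\phi(\bar x) = \{v \in \mathbb{R}^n \mid v_j = 0,\ j \in \bar{\mathcal{N}}\}$.

It then follows from [14, Corollary 8.11] that $\phi$ is subdifferentially regular at any $\bar x \in \mathbb{R}^n$. ∎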
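The formula (2.1) can be checked numerically. The sketch below is a hypothetical helper set (none of these names come from the paper): it computes the components that (2.1) fixes on the nonzero entries, compares one of them with a central finite difference of $\|x\|_p^p$, and evaluates the quotient $|t|^p/|t|$, whose blow-up near $t = 0$ is the reason the components on the zero entries are unrestricted.

```python
def phi(x, p):
    """phi(x) = ||x||_p^p."""
    return sum(abs(xi) ** p for xi in x)


def support(x):
    """Indices of nonzero entries, i.e. N(x)."""
    return [j for j, xj in enumerate(x) if xj != 0.0]


def subgradient_on_support(x, p):
    """Components v_j, j in N(x), shared by every subgradient in (2.1)."""
    out = {}
    for j in support(x):
        sign = 1.0 if x[j] > 0 else -1.0
        out[j] = sign * p * abs(x[j]) ** (p - 1.0)
    return out


def central_difference(x, j, p, h=1e-6):
    """Finite-difference estimate of the j-th partial derivative of phi."""
    up = list(x)
    dn = list(x)
    up[j] += h
    dn[j] -= h
    return (phi(up, p) - phi(dn, p)) / (2.0 * h)


def quotient_at_zero(t, p):
    """|t|^p / |t|, which is unbounded as t -> 0 when 0 < p < 1."""
    return abs(t) ** p / abs(t)


x_bar = [4.0, 0.0, -1.0]
print(subgradient_on_support(x_bar, 0.5))
print(central_difference(x_bar, 0, 0.5))
print(quotient_at_zero(1e-4, 0.5), quotient_at_zero(1e-8, 0.5))
```

Only the entries indexed by the support are returned because, by (2.1), the remaining components of a subgradient may take any real value.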

The regular and general normal vectors can be calculated as follows.

###### Theorem 2.2.

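For any $\bar x \in \Theta$, the normal cone to $\Theta$ at $\bar x$ is given by

$$N_\Theta(\bar x) = \begin{cases} \{v \in \mathbb{R}^n \mid v_j = \lambda \operatorname{sign}(\bar x_j)\,p\,|\bar x_j|^{p-1},\ j \in \bar{\mathcal{N}};\ \lambda \ge 0\} & \text{if } \bar x \in \partial\Theta, \\ \{0\} & \text{if } \bar x \in \operatorname{int}\Theta. \end{cases} \tag{2.3}$$

Furthermore, $\Theta$ is Clarke regular at any $\bar x \in \Theta$, i.e., $\hat N_\Theta(\bar x) = N_\Theta(\bar x)$.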

###### Proof.

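We only prove the case that $\bar x \in \partial\Theta$ since the other is trivial. We have $\Theta = \{x \mid \phi(x) \le \theta\}$ and $0 \notin \partial\phi(\bar x)$. Together with Theorem 2.1 and [14, Proposition 10.3], it holds that

$$\hat N_\Theta(\bar x) = N_\Theta(\bar x) = \operatorname{pos} \partial\phi(\bar x) \cup \partial^\infty\phi(\bar x)$$

and $\Theta$ is Clarke regular at $\bar x$. By the definition of pos and (2.1),

$$\operatorname{pos} \partial\phi(\bar x) = \{0\} \cup \{\lambda v \in \mathbb{R}^n \mid v_j = \operatorname{sign}(\bar x_j)\,p\,|\bar x_j|^{p-1},\ j \in \bar{\mathcal{N}};\ \lambda > 0\}.$$

Therefore, it holds that

$$N_\Theta(\bar x) = \operatorname{pos} \partial\phi(\bar x) \cup \partial^\infty\phi(\bar x) = \{v \in \mathbb{R}^n \mid v_j = \lambda \operatorname{sign}(\bar x_j)\,p\,|\bar x_j|^{p-1},\ j \in \bar{\mathcal{N}};\ \lambda \ge 0\}. \qquad ∎$$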
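The description (2.3) translates into a simple membership test: on the boundary, a vector is normal exactly when its components on the nonzero entries are a common nonnegative multiple of the corresponding components in (2.1). The sketch below is an illustrative assumption about how one might code this check; the function name and tolerance are hypothetical.

```python
def in_lp_ball_normal_cone(v, x, p, theta, tol=1e-9):
    """Test v in N_Theta(x) using (2.3), for x in the ball ||x||_p^p <= theta."""
    phi = sum(abs(xi) ** p for xi in x)
    if phi > theta + tol:
        raise ValueError("x is not in the l_p ball")
    if phi < theta - tol:
        # Interior point: the normal cone is {0}.
        return all(abs(vi) <= tol for vi in v)
    supp = [j for j, xj in enumerate(x) if xj != 0.0]
    if not supp:
        raise ValueError("a boundary point with theta > 0 must be nonzero")
    g = {j: (1.0 if x[j] > 0 else -1.0) * p * abs(x[j]) ** (p - 1.0) for j in supp}
    # Least-squares multiplier; it is exact whenever v is normal.
    lam = sum(v[j] * g[j] for j in supp) / sum(g[j] ** 2 for j in supp)
    if lam < -tol:
        return False
    lam = max(lam, 0.0)
    # Components on the zero entries of x are unrestricted.
    return all(abs(v[j] - lam * g[j]) <= tol for j in supp)


print(in_lp_ball_normal_cone([2.0, -7.0], [1.0, 0.0], 0.5, 1.0))
print(in_lp_ball_normal_cone([3.0, 2.0], [0.25, 0.25], 0.5, 1.0))
```

Note that the second argument of the first call has a zero entry, so the corresponding component of `v` is ignored, in line with (2.3).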

Now we have from [14, Theorem 8.15] the following first-order necessary condition for (P1). For (P2), we only focus on the local minimizers on the boundary of the $\ell_p$ ball, i.e., $\|\bar x\|_p^p = \theta$; otherwise, the characterization of local minimizers reverts to the case of traditional constrained nonlinear problems.

###### Theorem 2.3.

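Suppose $f_0$ is differentiable over $\Gamma$. The following statements hold true.

1. If $\partial^\infty\phi(\bar x)$ contains no vector $v \ne 0$ such that $-v \in N_\Gamma(\bar x)$, then a necessary condition for $\bar x$ to be locally optimal for (P1) is

$$0 \in \nabla f_0(\bar x) + \lambda\,\partial\phi(\bar x) + N_\Gamma(\bar x). \tag{2.4}$$
2. Suppose that $\bar x$ is a local minimizer of (P2) with $\|\bar x\|_p^p = \theta$. Then

$$-\nabla f_0(\bar x) \in \hat N_{\Theta \cap \Gamma}(\bar x) \subset N_{\Theta \cap \Gamma}(\bar x). \tag{2.5}$$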

Obviously, to find an optimal solution, we can only focus on the top semicircle of the given ball, so that the original problem is equivalent to

$$\min_{x_1} \ (x_1+1)^2 + \sqrt{x_1} + \sqrt{1 - \sqrt{-x_1^2 + 2x_1}} \quad \text{s.t.} \quad 0 \le x_1 \le 2$$

The derivative over the domain is always positive, therefore is a global minimizer. However, there exist , such that and . In this case, one can see (2.4) does not hold at .

### 2.2 Optimality conditions for (P1)

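To make condition (2.4) for (P1) informative, we need to clarify when the qualification in Theorem 2.3 holds and how to calculate the elements in $N_\Gamma(\bar x)$. For this purpose, we define the following extended Mangasarian-Fromovitz constraint qualification (EMFCQ).

The extended MFCQ (EMFCQ) holds at $\bar x$ for $\Gamma$ if the subvectors $\nabla_{\bar{\mathcal{N}}} f_j(\bar x)$, $j \in \mathcal{E}$, are linearly independent and there exists $d \in \mathbb{R}^{\bar{\mathcal{N}}}$ such that

$$\langle \nabla_{\bar{\mathcal{N}}} f_j(\bar x), d\rangle = 0,\ j \in \mathcal{E} \quad \text{and} \quad \langle \nabla_{\bar{\mathcal{N}}} f_j(\bar x), d\rangle < 0,\ j \in \bar{\mathcal{A}}. \tag{2.6}$$

Obviously, the EMFCQ is a weaker condition than the ELICQ proposed in [21]. Moreover, if the EMFCQ holds at $\bar x$ for $\Gamma$, then the MFCQ naturally holds at $\bar x$ for $\Gamma$; letting

$$\Lambda_\Gamma(\bar x) = \Big\{ \sum_{j \in \mathcal{E} \cup \bar{\mathcal{A}}} y_j \nabla f_j(\bar x) \ \Big|\ y_j \ge 0,\ j \in \bar{\mathcal{A}} \Big\}, \tag{2.7}$$

we have from [14, Theorem 6.14] that $\Gamma$ is regular at $\bar x$ and $N_\Gamma(\bar x) = \Lambda_\Gamma(\bar x)$.

In fact, the EMFCQ guarantees that the condition in Theorem 2.3 about $\partial^\infty\phi(\bar x)$ holds true.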

###### Theorem 2.4.
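1. Suppose the EMFCQ is satisfied at $\bar x$ for $\Gamma$. Then $\partial^\infty\phi(\bar x)$ contains no vector $v \ne 0$ such that $-v \in N_\Gamma(\bar x)$. Furthermore, a necessary condition for $\bar x$ to be locally optimal for (P1) is that there exist $y_j$, $j \in \mathcal{E}$, and $y_j \ge 0$, $j \in \bar{\mathcal{A}}$, such that

$$\nabla_i f_0(\bar x) + \lambda p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + \sum_{j \in \mathcal{E} \cup \bar{\mathcal{A}}} y_j \nabla_i f_j(\bar x) = 0,\quad i \in \bar{\mathcal{N}}. \tag{2.8}$$
2. Suppose $\Gamma$ is closed and convex in (P1). If $\bar x$ is a local minimizer of (P1) and $\partial^\infty\phi(\bar x)$ contains no vector $v \ne 0$ such that $-v \in N_\Gamma(\bar x)$, then it holds that

$$\nabla_i f_0(\bar x) + \lambda p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + v_i = 0,\quad i \in \bar{\mathcal{N}};\quad v \in N_\Gamma(\bar x). \tag{2.9}$$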
###### Proof.

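(a) Assume by contradiction that there exists nonzero $v \in \partial^\infty\phi(\bar x)$ such that $-v \in N_\Gamma(\bar x)$; then it follows from Theorem 2.1 and (2.7) that

$$\sum_{j \in \mathcal{E} \cup \bar{\mathcal{A}}} y_j \nabla_{\bar{\mathcal{N}}} f_j(\bar x) = -v_{\bar{\mathcal{N}}} = 0;\quad \sum_{j \in \mathcal{E} \cup \bar{\mathcal{A}}} y_j \nabla_{\bar{\mathcal{Z}}} f_j(\bar x) = -v_{\bar{\mathcal{Z}}} \ne 0;\quad y_j \ge 0,\ j \in \bar{\mathcal{A}}. \tag{2.10}$$

Since the EMFCQ holds true at $\bar x$, the dual form [16] of condition (2.6) tells that $y = 0$ is the unique solution of the system

$$\sum_{j \in \mathcal{E} \cup \bar{\mathcal{A}}} y_j \nabla_{\bar{\mathcal{N}}} f_j(\bar x) = 0,\quad y_j \ge 0,\ j \in \bar{\mathcal{A}}.$$

It follows that $v_{\bar{\mathcal{Z}}} = 0$, contradicting (2.10). Therefore, for any nonzero $v \in \partial^\infty\phi(\bar x)$, $-v \notin N_\Gamma(\bar x)$. From Theorem 2.3, at a local optimal solution of (P1), (2.8) is satisfied.

(b) This is trivially true from Theorem 2.3. ∎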

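We call the conditions (2.8) the Karush-Kuhn-Tucker (KKT) conditions for (P1). Using the notation of $\Lambda_\Gamma(\bar x)$, they can also be equivalently written as

$$-\nabla f_0(\bar x) \in \lambda\,\partial\phi(\bar x) + \Lambda_\Gamma(\bar x), \tag{2.11}$$

and $\Lambda_\Gamma(\bar x)$ is replaced with $N_\Gamma(\bar x)$ when $\Gamma$ is a closed and convex set.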
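The conditions (2.8) only involve the nonzero components of the candidate point, which makes them easy to evaluate once gradients and multipliers are available. The following is a minimal sketch of such a residual check; the function name, the quadratic test objective $f_0(x) = \frac{1}{2}\|x - b\|_2^2$, and the single constraint $x_1 - 1 \le 0$ are assumptions introduced here for illustration.

```python
def kkt_residual_p1(grad_f0, x, lam, p, constraint_grads=(), y=()):
    """Largest violation of (2.8) over the nonzero entries of x.

    grad_f0: gradient of f_0 at x.
    constraint_grads: gradients of the equality and active inequality
    constraints at x; y: the matching multipliers. The caller is
    responsible for y_j >= 0 on active inequalities.
    """
    worst = 0.0
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # (2.8) imposes nothing on zero components
        sign = 1.0 if xi > 0 else -1.0
        term = grad_f0[i] + lam * p * sign * abs(xi) ** (p - 1.0)
        term += sum(yj * gj[i] for yj, gj in zip(y, constraint_grads))
        worst = max(worst, abs(term))
    return worst


def grad_half_sq_dist(x, b):
    """Gradient of the assumed objective 0.5 * ||x - b||^2."""
    return [xi - bi for xi, bi in zip(x, b)]


x_bar = [1.0, 0.0]
# No constraints, b chosen so that x_bar satisfies (2.8).
print(kkt_residual_p1(grad_half_sq_dist(x_bar, [1.5, 0.3]), x_bar, 1.0, 0.5))
# One active inequality x_1 - 1 <= 0 with multiplier 0.5.
print(kkt_residual_p1(grad_half_sq_dist(x_bar, [2.0, 0.3]), x_bar, 1.0, 0.5,
                      constraint_grads=[[1.0, 0.0]], y=[0.5]))
```

A zero residual only certifies the first-order condition on the support; the qualification in Theorem 2.4 still has to be checked separately.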

### 2.3 Optimality conditions for (P2)

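We also consider other verifiable forms of condition (2.5) when some constraint qualification (CQ) is satisfied at $\bar x$. For $\bar x \in \Theta \cap \Gamma$, define the extended linearized cone as:

$$\Upsilon_{\Theta \cap \Gamma}(\bar x) := \{d \in \mathbb{R}^n \mid \langle v, d\rangle \le 0,\ \forall v \in \partial\phi(\bar x);\ \langle \nabla f_j(\bar x), d\rangle = 0,\ j \in \mathcal{E};\ \langle \nabla f_j(\bar x), d\rangle \le 0,\ j \in \bar{\mathcal{A}}\}.$$

Obviously,

$$N_\Theta(\bar x) + \Lambda_\Gamma(\bar x) = \Upsilon_{\Theta \cap \Gamma}(\bar x)^* = \Big\{v \in \mathbb{R}^n \ \Big|\ v_i = y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + \sum_{j \in \bar{\mathcal{A}} \cup \mathcal{E}} y_j \nabla_i f_j(\bar x),\ i \in \bar{\mathcal{N}};\ y_j \ge 0,\ j \in \{0\} \cup \bar{\mathcal{A}} \Big\}.$$

It follows from [14, Theorem 6.14] that $N_\Theta(\bar x) + \Lambda_\Gamma(\bar x) \subset \hat N_{\Theta \cap \Gamma}(\bar x)$. Hence, we have the following result.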

###### Proposition 2.5.

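For $\bar x$ with $\|\bar x\|_p^p = \theta$, $N_\Theta(\bar x) + \Lambda_\Gamma(\bar x) \subset \hat N_{\Theta \cap \Gamma}(\bar x)$. Therefore, if $-\nabla f_0(\bar x) \in N_\Theta(\bar x) + \Lambda_\Gamma(\bar x)$, meaning that there exist $y_j$, $j \in \{0\} \cup \mathcal{E} \cup \bar{\mathcal{A}}$, with $y_j \ge 0$, $j \in \{0\} \cup \bar{\mathcal{A}}$, such that

$$\nabla_i f_0(\bar x) + y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + \sum_{j \in \bar{\mathcal{A}} \cup \mathcal{E}} y_j \nabla_i f_j(\bar x) = 0,\quad i \in \bar{\mathcal{N}},$$

then the first-order necessary condition (2.5) is satisfied at $\bar x$.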

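The extended MFCQ (EMFCQ) for (P2) holds at $\bar x$ if the subvectors $\nabla_{\bar{\mathcal{N}}} f_j(\bar x)$, $j \in \mathcal{E}$, are linearly independent and there exists $d \in \mathbb{R}^{\bar{\mathcal{N}}}$ such that

$$\langle p \operatorname{sign}(\bar x_{\bar{\mathcal{N}}})|\bar x_{\bar{\mathcal{N}}}|^{p-1}, d\rangle < 0,\quad \langle \nabla_{\bar{\mathcal{N}}} f_j(\bar x), d\rangle = 0,\ j \in \mathcal{E} \quad \text{and} \quad \langle \nabla_{\bar{\mathcal{N}}} f_j(\bar x), d\rangle < 0,\ j \in \bar{\mathcal{A}}. \tag{2.12}$$

Equivalently, the dual form of the EMFCQ for (P2) holds at $\bar x$ if $y = 0$ is the unique solution of

$$\Big\{ y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + \sum_{j \in \bar{\mathcal{A}} \cup \mathcal{E}} y_j \nabla_i f_j(\bar x) = 0,\ i \in \bar{\mathcal{N}};\quad y_j \ge 0,\ j \in \{0\} \cup \bar{\mathcal{A}} \Big\}.$$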

We now state the necessary optimality conditions for various situations.

###### Theorem 2.6.

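Suppose $\bar x$ with $\|\bar x\|_p^p = \theta$ is locally optimal for (P2).

1. If the extended MFCQ holds at $\bar x$, then there exist $y_0 \ge 0$ and $y_j$, $j \in \mathcal{E} \cup \bar{\mathcal{A}}$, with $y_j \ge 0$, $j \in \bar{\mathcal{A}}$, such that

$$\nabla_i f_0(\bar x) + y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + \sum_{j \in \bar{\mathcal{A}} \cup \mathcal{E}} y_j \nabla_i f_j(\bar x) = 0,\quad i \in \bar{\mathcal{N}}. \tag{2.13}$$
2. Suppose $\Gamma$ is closed and convex. If $v = 0$ is the only vector such that $v \in N_\Theta(\bar x)$ and $-v \in N_\Gamma(\bar x)$, then $N_{\Theta \cap \Gamma}(\bar x) = N_\Theta(\bar x) + N_\Gamma(\bar x)$. Therefore, if $\bar x$ is locally optimal for (P2), then

$$\nabla_i f_0(\bar x) + y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} + v_i = 0,\quad i \in \bar{\mathcal{N}};\quad y_0 \ge 0;\quad v \in N_\Gamma(\bar x).$$
3. Suppose $\Gamma = \mathbb{R}^n$. If $\bar x$ is locally optimal for (P2), then there exists $y_0 \ge 0$ such that

$$\nabla_i f_0(\bar x) + y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} = 0,\quad i \in \bar{\mathcal{N}}.$$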
###### Proof.

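(a) From [14, Theorem 6.14], if the extended MFCQ holds for (P2) at $\bar x$ with $\|\bar x\|_p^p = \theta$, then $\Theta \cap \Gamma$ is regular at $\bar x$ and

$$\hat N_{\Theta \cap \Gamma}(\bar x) = \Upsilon_{\Theta \cap \Gamma}(\bar x)^*. \tag{2.14}$$

By Theorem 2.3, (a) is true.

(b) By [14, Theorem 6.42], $N_{\Theta \cap \Gamma}(\bar x) = N_\Theta(\bar x) + N_\Gamma(\bar x)$. Therefore, if $\bar x$ is locally optimal, then $-\nabla f_0(\bar x) \in N_\Theta(\bar x) + N_\Gamma(\bar x)$.

(c) Trivial by (b). ∎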

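We call the conditions (2.13) the Karush-Kuhn-Tucker (KKT) conditions for (P2). Using the notation of $\Lambda_\Gamma(\bar x)$, they can also be equivalently written as

$$-\nabla f_0(\bar x) \in N_\Theta(\bar x) + \Lambda_\Gamma(\bar x), \tag{2.15}$$

and $\Lambda_\Gamma(\bar x)$ is replaced with $N_\Gamma(\bar x)$ when $\Gamma$ is a closed and convex set.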
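When the ball is the only constraint, (2.13) reduces to finding a single multiplier $y_0 \ge 0$ with $\nabla_i f_0(\bar x) + y_0\, p \operatorname{sign}(\bar x_i)|\bar x_i|^{p-1} = 0$ on the nonzero entries. The sketch below, an illustration with hypothetical names and an assumed objective $f_0(x) = \frac{1}{2}\|x - b\|_2^2$, computes the best nonnegative multiplier in the least-squares sense and reports the remaining violation.

```python
def kkt_multiplier_p2_ball_only(grad_f0, x, p):
    """Best y0 >= 0 for (2.13) with no constraints other than the ball.

    Returns (y0, residual), where residual is the largest violation of
    grad_f0[i] + y0 * p * sign(x_i) * |x_i|^(p-1) = 0 over nonzero x_i.
    The point x is assumed to lie on the boundary of the l_p ball.
    """
    supp = [i for i, xi in enumerate(x) if xi != 0.0]
    if not supp:
        raise ValueError("a boundary point with theta > 0 must be nonzero")
    g = {i: (1.0 if x[i] > 0 else -1.0) * p * abs(x[i]) ** (p - 1.0) for i in supp}
    num = -sum(grad_f0[i] * g[i] for i in supp)
    den = sum(g[i] ** 2 for i in supp)
    y0 = max(0.0, num / den)  # projection of the unconstrained fit onto y0 >= 0
    residual = max(abs(grad_f0[i] + y0 * g[i]) for i in supp)
    return y0, residual


x_bar = [1.0, 0.0]                       # on the boundary for p = 0.5, theta = 1
b_outside = [3.0, 0.1]                   # data pulling x_bar outward
b_inside = [0.5, 0.0]                    # data pulling x_bar inward
for b in (b_outside, b_inside):
    grad = [xi - bi for xi, bi in zip(x_bar, b)]
    print(kkt_multiplier_p2_ball_only(grad, x_bar, 0.5))
```

In the second case the fitted multiplier is clipped to zero and a positive residual remains, signalling that the point cannot be a local minimizer on the boundary for that data.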

## 3 First-order sequential optimality condition

In this section, we turn to study the sequential optimality conditions under the approximate Karush-Kuhn-Tucker (AKKT) conditions, which are defined as follows.

###### Definition 3.1.
1. For (P1), we say that $\bar x$ satisfies the AKKT if there exist , , such that