1 Introduction
Sparse regularization problems have attracted considerable attention over the past decades and have numerous applications in areas including compressed sensing [5], biomedical engineering [22, 10], sensor selection [1] and portfolio management [6]. This is because sparse solutions usually lead to better generalization performance and robustness. A common approach to promoting sparsity in the solution is to include a sparsity-inducing term, such as the $\ell_0$ norm or the $\ell_1$ norm of the variables, either in the objective as a penalty or in the constraint. In recent years, nonconvex and/or non-Lipschitz sparsity-inducing terms such as the $\ell_p$ (quasi-)norm ($0<p<1$) have been shown [13] to have preferable performance in many situations. As a consequence, in the past decade, many works have focused on designing and analyzing algorithms for solving the unconstrained $\ell_p$ regularized problem [23, 8, 11, 20, 18, 9]. However, for the constrained case there are far fewer works, despite its wider applicability. We list two examples of constrained optimization problems involving an $\ell_p$ norm.
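To see concretely why the $\ell_p$ quasi-norm with $p<1$ is preferred for promoting sparsity, note that $\|x\|_p^p=\sum_i|x_i|^p$ interpolates between the $\ell_1$ norm ($p=1$) and the number of nonzero entries ($p\to 0$). A minimal numerical illustration (the test vector below is an arbitrary example of ours):

```python
import numpy as np

def lp_p(x, p):
    """Evaluate the l_p penalty ||x||_p^p = sum_i |x_i|^p."""
    return float(np.sum(np.abs(x) ** p))

x = np.array([3.0, 0.5, 0.0, 0.01])

# p = 1 recovers the l1 norm of x.
print(lp_p(x, 1.0))               # about 3.51

# As p decreases toward 0, ||x||_p^p approaches the number of nonzeros (here 3),
# so small-but-nonzero entries are penalized almost as heavily as large ones.
for p in (0.5, 0.1, 0.01):
    print(p, lp_p(x, p))
```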
Example 1 (Cloud-RAN power consumption). In the first example, we consider the cloud radio access network (Cloud-RAN) power consumption problem [15, 19], which solves a group sparse problem to induce group sparsity in the beamformers and thereby guide the remote radio head (RRH) selection. This group sparse problem is addressed by minimizing a mixed $\ell_p/\ell_2$ norm with $0<p<1$, yielding the following problem
Here the Cloud-RAN architecture of this model has multiple RRHs and single-antenna mobile users (MUs), and each RRH is equipped with several antennas. The transmit beamforming vectors from the RRHs to the users carry a group structure, with one group per RRH. The model further involves the relative fronthaul link power consumption, the drain efficiency of the radio frequency power amplifier, the channel propagation between each user and each RRH, the maximum transmit power of each RRH, the noise at each MU, and the target signal-to-interference-plus-noise ratio (SINR).

Example 2 (Constrained sparse coding). In the context of sparse coding [17], the task is to reconstruct an unknown sparse code word $x$ from the linear measurements $b=Ax+\varepsilon$, where $b$ represents the data, $\varepsilon$ denotes the noise vector, and $A$ corresponds to the fixed dictionary whose columns are the atoms. This problem can be formulated as
$$\min_{x\in\mathbb{R}^n}\ \tfrac{1}{2}\|Ax-b\|_2^2 \quad \text{s.t.}\quad \|x\|_p^p\le\gamma, \tag{1.1}$$
where the $\ell_p$ ball constraint induces sparsity in the code word.
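The sparse coding model (1.1) can be instantiated numerically; the sketch below builds a toy instance, where the dictionary, code word, noise level, and radius are all illustrative assumptions of ours, not data from any referenced work:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 20, 50, 0.5        # measurements, atoms, quasi-norm exponent (illustrative)

A = rng.standard_normal((m, n))                    # dictionary; columns are atoms
x_true = np.zeros(n)
x_true[[3, 17, 41]] = [1.0, -2.0, 0.5]             # sparse code word
b = A @ x_true + 0.01 * rng.standard_normal(m)     # noisy linear measurements

def fit(x):
    """Least-squares data-fit term of (1.1)."""
    return 0.5 * np.linalg.norm(A @ x - b) ** 2

def lp_p(x):
    """Constraint function ||x||_p^p of the l_p ball."""
    return float(np.sum(np.abs(x) ** p))

gamma = 4.0                  # assumed radius of the l_p ball
print(lp_p(x_true) <= gamma)                       # True: the true code word is feasible
```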
In this paper, we consider the following two general forms of constrained nonlinear optimization with $\ell_p$ norms. The first is the constrained $\ell_p$ regularized problem, in which the $\ell_p$ norm appears in the objective as a penalty,
$$\min_{x\in\mathbb{R}^n}\ f(x)+\lambda\|x\|_p^p \quad \text{s.t.}\quad c_i(x)=0,\ i\in\mathcal{E}, \qquad c_i(x)\le 0,\ i\in\mathcal{I}, \tag{P1}$$
where $\mathcal{E}$ and $\mathcal{I}$ are finite index sets of equality and inequality constraints.
The second has the $\ell_p$ norm in the constraint and requires it to be smaller than a prescribed value,
$$\min_{x\in\mathbb{R}^n}\ f(x) \quad \text{s.t.}\quad c_i(x)=0,\ i\in\mathcal{E}, \qquad c_i(x)\le 0,\ i\in\mathcal{I}, \qquad \|x\|_p^p\le\gamma, \tag{P2}$$
where $\mathcal{E}$ and $\mathcal{I}$ index the equality and inequality constraints.
Here, $f$ and the constraint functions $c_i$ are continuously differentiable on $\mathbb{R}^n$, and $p\in(0,1)$. The positive $\lambda$ is a given regularization parameter, and the positive $\gamma$ is a prescribed scalar referred to as the radius of the $\ell_p$ ball.
Despite the advantages of the nonconvex $\ell_p$ norm in promoting sparse solutions, problems of the forms (P1) and (P2) are generally not easy to handle. This is largely due to the nonconvex and non-Lipschitz nature of the $\ell_p$ norm, which makes it difficult to characterize the optimal solutions. In particular, verifiable optimality conditions are often difficult to derive, which remains an obstacle to designing efficient numerical algorithms. For example, for (P1), many researchers [7, 12, 19, 18] tend to approximate the $\ell_p$ term by Lipschitz continuous functions and then solve for an approximate solution. As for (P2), to the best of our knowledge, not much has been done except for the special case in which only the $\ell_p$ ball constraint is present in the problem [24], i.e., the projection onto the $\ell_p$ ball.
1.1 Literature review
The optimality conditions of the unconstrained and inequality constrained versions of (P1) were studied in [21]. The authors derived first-order and second-order necessary conditions by assuming that the "extended" linear independence constraint qualification (ELICQ) is satisfied by (P1), meaning the LICQ is satisfied at the local minimizer in the subspace consisting of the nonzero variables. They also noted that second-order optimality conditions can be derived by considering the reduced problems obtained by fixing the zero components at a stationary point. In [3], Bian and Chen derived a first-order necessary optimality condition using the theory of the generalized directional derivative and the tangent cone. For the case in which the constraints are all linear, Haeser et al. [9] established first- and second-order necessary optimality conditions based on a perturbed problem and the limits of the perturbation. Sufficient conditions for the perturbed stationary points were also presented. As for (P2), [24] derived optimality conditions for the special case where only the $\ell_p$ ball constraint exists, using the concept of the generalized Fréchet normal cone. To the best of our knowledge, there has been no study of the optimality conditions for more general cases of this problem.
1.2 Contributions
In this paper, we are interested in deriving optimality conditions that characterize the local solutions of (P1) and (P2) under different constraint qualifications (CQs). First, we analyze the basic properties of the $\ell_p$ norm and the $\ell_p$ norm ball. We derive the regular and general subgradients of the $\ell_p$ norm and the regular and general normal cones of the $\ell_p$ norm ball, which show that the $\ell_p$ norm is subdifferentially regular and the $\ell_p$ ball is Clarke regular. For (P1) and (P2), we derive the Karush-Kuhn-Tucker (KKT) conditions and discuss the constraint qualifications that ensure the KKT conditions are satisfied at a local minimizer. For (P2), we believe this is the first such result.
Recently, Andreani et al. [2] introduced sequential optimality conditions, namely the approximate KKT (AKKT) conditions, for constrained smooth optimization problems; these are commonly satisfied by the iterates of many algorithms. They also proposed the Cone-Continuity Property (CCP), under which the AKKT conditions imply the KKT conditions; this is widely believed to be one of the weakest qualifications under which the KKT conditions hold at a local minimizer. We likewise define sequential optimality conditions for (P1) and (P2) and explore constraint qualifications under which the sequential conditions imply the KKT conditions. We believe these results are considerably stronger than existing ones.
To demonstrate the use of the proposed sequential optimality conditions, we extend the well-known iteratively reweighted algorithms for solving the unconstrained $\ell_p$ regularized problem to the constrained cases and show that those conditions are satisfied at the limit points of the generated iterates. Therefore, under the proposed constraint qualifications, the limit points also satisfy the KKT conditions.
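For reference, the unconstrained $\ell_p$ regularized least-squares problem that the iteratively reweighted algorithms target can be sketched as follows. This is a generic reweighted-$\ell_2$ (majorize-minimize) variant with a smoothing parameter, not the exact method analyzed in this paper, and all parameter choices and data are assumptions:

```python
import numpy as np

def irls_lp(A, b, lam, p=0.5, eps=1e-6, iters=50):
    """Reweighted-l2 sketch for min_x 0.5*||A x - b||^2 + lam * sum_i |x_i|^p.
    Each iteration majorizes |x_i|^p at the current iterate and solves the
    resulting weighted ridge subproblem in closed form."""
    x = np.linalg.lstsq(A, b, rcond=None)[0]     # least-squares starting point
    for _ in range(iters):
        # Weights from the majorizer (p/2) * (x_i^2 + eps)**(p/2 - 1) * x_i^2.
        w = (p / 2.0) * (x ** 2 + eps) ** (p / 2.0 - 1.0)
        # Normal equations of the weighted ridge subproblem.
        x = np.linalg.solve(A.T @ A + 2.0 * lam * np.diag(w), A.T @ b)
    return x

# Noiseless toy demo (all data are illustrative assumptions).
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10)
x_true[[2, 7]] = [1.0, -2.0]
b = A @ x_true
x = irls_lp(A, b, lam=0.01)
print(np.round(x, 2))      # close to x_true, with near-zero off-support entries
```

The weights blow up on components approaching zero, which is what pins those components near zero and mimics the nonsmooth $\ell_p$ penalty; the smoothing parameter `eps` keeps the weights finite.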
1.3 Notation and preliminaries
We use $0$ to denote the vector of all zeros of appropriate size. For $x\in\mathbb{R}^n$ and an index set $\mathcal{J}\subseteq\{1,\dots,n\}$, $x_{\mathcal{J}}$ denotes the subvector of $x$ containing the elements $x_i$, $i\in\mathcal{J}$, and $\mathbb{R}^{|\mathcal{J}|}$ is the corresponding reduced subspace. For $y\in\mathbb{R}^{|\mathcal{J}|}$, we also write $(y;0)$ for the vector $x\in\mathbb{R}^n$ with $x_{\mathcal{J}}=y$ and all remaining components zero. For differentiable $f$, $\nabla_{\mathcal{J}} f(x)$ is the vector consisting of the partial derivatives $\partial f(x)/\partial x_i$, $i\in\mathcal{J}$. In (P1) and (P2), define the index set of active inequalities by $\mathcal{A}(x)=\{i\in\mathcal{I} : c_i(x)=0\}$. We define the sets of zeros and nonzeros in $x$ as $\mathcal{I}_0(x)=\{i : x_i=0\}$ and $\mathcal{I}_1(x)=\{i : x_i\ne 0\}$. For simplicity, when the reference point $\bar x$ is clear from the context, we use the shorthands $\mathcal{I}_0=\mathcal{I}_0(\bar x)$, $\mathcal{I}_1=\mathcal{I}_1(\bar x)$, and $\mathcal{A}=\mathcal{A}(\bar x)$.
For $\bar x\in\mathbb{R}^n$ and $\delta>0$, define the neighborhood with radius $\delta$ as $\mathcal{B}(\bar x,\delta)=\{x\in\mathbb{R}^n:\|x-\bar x\|\le\delta\}$. For any cone $K\subseteq\mathbb{R}^n$, the polar of $K$ is defined to be the cone
$$K^{*}=\{v\in\mathbb{R}^n : \langle v,x\rangle\le 0 \ \text{for all } x\in K\}.$$
Whenever $K_1\subseteq K_2$, one has $K_2^{*}\subseteq K_1^{*}$. For a set $C\subseteq\mathbb{R}^n$, its horizon cone is defined by
$$C^{\infty}=\{v\in\mathbb{R}^n : \exists\, x^{\nu}\in C,\ \lambda^{\nu}\downarrow 0, \ \text{with } \lambda^{\nu}x^{\nu}\to v\}.$$
Another operation on a set $C$ is taking the smallest cone containing $C$, namely the positive hull of $C$, which is defined as $\operatorname{pos} C=\{0\}\cup\{\lambda x : x\in C,\ \lambda>0\}$. A vector $v$ is tangent to a set $C$ at a point $\bar x\in C$, written $v\in T_C(\bar x)$, if there exist sequences $x^{\nu}\to\bar x$ with $x^{\nu}\in C$ and $\tau^{\nu}\downarrow 0$ such that $(x^{\nu}-\bar x)/\tau^{\nu}\to v$. The boundary of a set $C$ is denoted by $\operatorname{bd} C$.
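The polar-cone definition can be sanity-checked numerically on a simple cone. The sketch below tests the defining inequality $\langle v,x\rangle\le 0$ against points of the nonnegative orthant $K=\mathbb{R}^2_+$; the function name and tolerance are our own illustrative choices:

```python
import numpy as np

def in_polar_of_orthant(v, samples=1000, seed=0):
    """Test the polar-cone inequality <v, x> <= 0 against points of K = R^2_+.
    Uses the coordinate axes plus random nonnegative sample points."""
    rng = np.random.default_rng(seed)
    X = np.vstack([np.eye(2), rng.random((samples, 2))])  # points of the orthant
    return bool(np.all(X @ np.asarray(v) <= 1e-12))

# The polar of the nonnegative orthant R^2_+ is the nonpositive orthant R^2_-.
print(in_polar_of_orthant([-1.0, -0.5]))   # True
print(in_polar_of_orthant([1.0, -2.0]))    # False: fails already at x = (1, 0)
```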
Definition 1.1 ([14] Definition 6.3, 6.4).

Let $C\subseteq\mathbb{R}^n$ and $\bar x\in C$. A vector $v$ is a regular normal to $C$ at $\bar x$, written $v\in\widehat N_C(\bar x)$, if
$$\langle v, x-\bar x\rangle \le o(\|x-\bar x\|) \quad \text{for } x\in C.$$
It is a (general) normal to $C$ at $\bar x$, written $v\in N_C(\bar x)$, if there are sequences $x^{\nu}\to\bar x$ with $x^{\nu}\in C$ and $v^{\nu}\to v$ with $v^{\nu}\in\widehat N_C(x^{\nu})$. We call $\widehat N_C(\bar x)$ the regular normal cone and $N_C(\bar x)$ the normal cone to $C$ at $\bar x$.

A set $C\subseteq\mathbb{R}^n$ is regular at one of its points $\bar x\in C$ in the sense of Clarke if it is locally closed at $\bar x$ and $N_C(\bar x)=\widehat N_C(\bar x)$.
For a nonempty convex set $C$ and $\bar x\in C$,
$$N_C(\bar x)=\widehat N_C(\bar x)=\{v : \langle v, x-\bar x\rangle\le 0 \ \text{for all } x\in C\}.$$
Definition 1.2 ([14] Definition 8.3).
Consider a function $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ and a point $\bar x$ with $f(\bar x)$ finite. For a vector $v\in\mathbb{R}^n$, one says that

$v$ is a regular subgradient of $f$ at $\bar x$, written $v\in\widehat\partial f(\bar x)$, if
$$f(x)\ge f(\bar x)+\langle v, x-\bar x\rangle+o(\|x-\bar x\|);$$

$v$ is a (general) subgradient of $f$ at $\bar x$, written $v\in\partial f(\bar x)$, if there are sequences $x^{\nu}\to\bar x$ with $f(x^{\nu})\to f(\bar x)$, and $v^{\nu}\in\widehat\partial f(x^{\nu})$ with $v^{\nu}\to v$;

$v$ is a horizon subgradient of $f$ at $\bar x$, written $v\in\partial^{\infty} f(\bar x)$, if there are sequences $x^{\nu}\to\bar x$ with $f(x^{\nu})\to f(\bar x)$, and $v^{\nu}\in\widehat\partial f(x^{\nu})$ with $\lambda^{\nu}v^{\nu}\to v$ for some sequence $\lambda^{\nu}\downarrow 0$.
For $f:\mathbb{R}^n\to\overline{\mathbb{R}}$, the epigraph of $f$ is the set
$$\operatorname{epi} f=\{(x,\alpha)\in\mathbb{R}^n\times\mathbb{R} : \alpha\ge f(x)\}.$$
Definition 1.3 ([14] Definition 7.25).
A function $f:\mathbb{R}^n\to\overline{\mathbb{R}}$ is called subdifferentially regular at $\bar x$ if $f(\bar x)$ is finite and $\operatorname{epi} f$ is Clarke regular at $(\bar x, f(\bar x))$ as a subset of $\mathbb{R}^n\times\mathbb{R}$.
2 Firstorder necessary optimality conditions
In this section, we present the first-order necessary optimality conditions for (P1) and (P2). Before proceeding to the optimality conditions, we provide some basic properties.
2.1 Basic Properties
Denote by $\mathcal{B}=\{x\in\mathbb{R}^n : \|x\|_p^p\le\gamma\}$ the $\ell_p$ norm ball. In this subsection, we provide basic properties of the function $\|\cdot\|_p^p$ and of $\mathcal{B}$. In particular, we derive the regular and general subgradients of $\|\cdot\|_p^p$ and the regular and general normal cones of $\mathcal{B}$, and then show that $\|\cdot\|_p^p$ is subdifferentially regular and $\mathcal{B}$ is Clarke regular on its boundary.
The regular, general, and horizon subgradients of $\|\cdot\|_p^p$ can be calculated as follows.
Theorem 2.1.
For any $\bar x\in\mathbb{R}^n$, it holds that
$$\widehat\partial\|\bar x\|_p^p=\partial\|\bar x\|_p^p=\bigl\{v\in\mathbb{R}^n : v_i=p\,\mathrm{sign}(\bar x_i)|\bar x_i|^{p-1} \text{ for } \bar x_i\ne 0,\ v_i\in\mathbb{R} \text{ for } \bar x_i=0\bigr\}, \tag{2.1}$$
$$\partial^{\infty}\|\bar x\|_p^p=\bigl\{v\in\mathbb{R}^n : v_i=0 \text{ for } \bar x_i\ne 0,\ v_i\in\mathbb{R} \text{ for } \bar x_i=0\bigr\}. \tag{2.2}$$
Therefore, $\|\cdot\|_p^p$ is subdifferentially regular at every $\bar x\in\mathbb{R}^n$.
Proof.
We first consider the scalar function $t\mapsto|t|^p$ on $\mathbb{R}$. If $t\ne 0$, its regular subdifferential is the singleton containing the derivative $p\,\mathrm{sign}(t)|t|^{p-1}$. On the other hand, $|t|^p/|t|\to+\infty$ as $t\to 0$, implying that the defining inequality of a regular subgradient holds at $t=0$ for any $v\in\mathbb{R}$. Therefore, the regular subdifferential at $t=0$ is all of $\mathbb{R}$. By [14, Proposition 10.5], applied to the separable sum $\|x\|_p^p=\sum_i|x_i|^p$, we have
This proves (2.1).
By the definition of the horizon cone and (2.1), one inclusion is obvious. We next prove the reverse inclusion.
For any such $v$ and any $\lambda^{\nu}\downarrow 0$, we can select a sequence $v^{\nu}$ such that $\lambda^{\nu}v^{\nu}\to v$; from (2.1), each $v^{\nu}$ is a regular subgradient, and therefore $v$ is a horizon subgradient.
On the other hand, for points sufficiently close to $\bar x$, the components indexed by $\mathcal{I}_1(\bar x)$ remain nonzero. Therefore, by (2.1), the corresponding components of any regular subgradient equal $p\,\mathrm{sign}(x_i)|x_i|^{p-1}$ and stay bounded. Hence, for any sequences $x^{\nu}\to\bar x$, $v^{\nu}\in\widehat\partial\|x^{\nu}\|_p^p$ and $\lambda^{\nu}\downarrow 0$, it holds that $\lambda^{\nu}v_i^{\nu}\to 0$ for $i\in\mathcal{I}_1(\bar x)$; equivalently, every horizon subgradient vanishes on $\mathcal{I}_1(\bar x)$. Overall, we have shown (2.2).
It then follows from [14, Corollary 8.11] that is subdifferentially regular at any . ∎
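The subgradient formula of Theorem 2.1 can be checked by finite differences on the nonzero components, where $\|\cdot\|_p^p$ is smooth with partial derivatives $p\,\mathrm{sign}(x_i)|x_i|^{p-1}$ (the zero components carry an unbounded subgradient set and admit no such check). A small sketch with arbitrary test data:

```python
import numpy as np

def lp_p(x, p):
    return float(np.sum(np.abs(x) ** p))

def grad_on_support(x, p):
    """Gradient of ||x||_p^p on the nonzero components:
    d/dx_i ||x||_p^p = p * sign(x_i) * |x_i|**(p - 1) whenever x_i != 0."""
    g = np.zeros_like(x)
    nz = x != 0
    g[nz] = p * np.sign(x[nz]) * np.abs(x[nz]) ** (p - 1.0)
    return g

p, h = 0.5, 1e-7
x = np.array([2.0, -1.0, 0.0, 0.5])       # arbitrary point with one zero component
g = grad_on_support(x, p)

# Central finite differences on the nonzero coordinates match the formula.
for i in (0, 1, 3):
    e = np.zeros_like(x)
    e[i] = h
    fd = (lp_p(x + e, p) - lp_p(x - e, p)) / (2 * h)
    print(i, abs(fd - g[i]) < 1e-5)        # True for each nonzero coordinate
```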
The regular and general normal vectors can be calculated as follows.
Theorem 2.2.
For any $\bar x$ with $\|\bar x\|_p^p=\gamma$, $N_{\mathcal{B}}(\bar x)=\operatorname{pos}\partial\|\bar x\|_p^p$, where $\mathcal{B}=\{x:\|x\|_p^p\le\gamma\}$, i.e.,
$$N_{\mathcal{B}}(\bar x)=\bigl\{v\in\mathbb{R}^n : \exists\, t\ge 0 \text{ with } v_i=t\,p\,\mathrm{sign}(\bar x_i)|\bar x_i|^{p-1} \text{ for } \bar x_i\ne 0,\ v_i\in\mathbb{R} \text{ for } \bar x_i=0\bigr\}. \tag{2.3}$$
Furthermore, $\mathcal{B}$ is Clarke regular at any such $\bar x$, i.e., $N_{\mathcal{B}}(\bar x)=\widehat N_{\mathcal{B}}(\bar x)$.
Proof.
We only prove the case in which $\bar x$ has zero components, since the other case is trivial. Together with Theorem 2.1 and [14, Proposition 10.3], it holds that
and is Clarke regular at . By the definition of pos and (2.1),
Therefore, it holds that
∎
Now we have from [14, Theorem 8.15] the following first-order necessary condition for (P1). For (P2), we only focus on the local minimizers on the boundary of the $\ell_p$ ball; otherwise, the characterization of local minimizers reverts to the case of traditional constrained nonlinear problems.
Theorem 2.3.
Obviously, to find an optimal solution, we can only focus on the top semicircle of the given ball, so that the original problem is equivalent to
The derivative over the domain is always positive, and therefore the corresponding boundary point is a global minimizer. In this case, however, one can see that (2.4) does not hold at this point.
2.2 Optimality conditions for (P1)
To make condition (2.4) for (P1) informative, we need to clarify when it holds and how to calculate the elements it involves. For this purpose, we define the following extended Mangasarian-Fromovitz constraint qualification (EMFCQ).
The extended MFCQ (EMFCQ) holds at $\bar x$ for (P1) if the subvectors $\nabla_{\mathcal{I}_1}c_i(\bar x)$, $i\in\mathcal{E}$, are linearly independent and there exists $d$ such that
$$\nabla_{\mathcal{I}_1}c_i(\bar x)^{\top}d=0,\ i\in\mathcal{E}, \qquad \nabla_{\mathcal{I}_1}c_i(\bar x)^{\top}d<0,\ i\in\mathcal{A}(\bar x). \tag{2.6}$$
Obviously, the EMFCQ is a weaker condition than the ELICQ proposed in [21]. Moreover, if the EMFCQ holds at $\bar x$ for (P1), then the MFCQ naturally holds at $\bar x$; letting
$$\Omega=\{x\in\mathbb{R}^n : c_i(x)=0,\ i\in\mathcal{E},\ c_i(x)\le 0,\ i\in\mathcal{I}\}, \tag{2.7}$$
we have from [14, Theorem 6.14] that $\Omega$ is regular at $\bar x$.
In fact, the EMFCQ guarantees that the condition in Theorem 2.3 holds true.
Theorem 2.4.
Proof.
(a) Assume by contradiction that there exists a nonzero vector satisfying the stated conditions; then it follows from Theorem 2.1 and (2.7) that
(2.10) 
Since the EMFCQ holds at $\bar x$, the dual form [16] of condition (2.6) tells us that $0$ is the unique solution of the system
It follows that this vector vanishes, contradicting (2.10). Therefore, the condition holds for any nonzero vector. From Theorem 2.3, (2.8) is satisfied at a local optimal solution of (P1).
(b) This is trivially true from Theorem 2.3. ∎
We call conditions (2.8) the Karush-Kuhn-Tucker (KKT) conditions for (P1). Using the notation of $\Omega$, they can also be equivalently written as
$$0\in\nabla f(\bar x)+\lambda\,\partial\|\bar x\|_p^p+\widehat N_{\Omega}(\bar x), \tag{2.11}$$
where $\widehat N_{\Omega}$ is replaced with $N_{\Omega}$ when $\Omega$ is a closed and convex set.
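As an illustration of the stationarity part of the KKT conditions, the sketch below builds a toy instance of (P1) with no active constraints and the smooth term $f(x)=\tfrac12\|x-a\|^2$, in which a given sparse point is stationary by construction; only the nonzero components contribute a residual, since a subgradient of $\|\cdot\|_p^p$ is unrestricted on the zero components. All data are illustrative assumptions:

```python
import numpy as np

p, lam = 0.5, 0.1            # assumed exponent and regularization parameter

def penalty_grad(x):
    """p * sign(x_i) * |x_i|**(p-1) on the support; the zero components of a
    subgradient of ||.||_p^p are unrestricted, so they add no residual."""
    g = np.zeros_like(x)
    nz = x != 0
    g[nz] = p * np.sign(x[nz]) * np.abs(x[nz]) ** (p - 1.0)
    return g

# Toy smooth term f(x) = 0.5 * ||x - a||^2, with a chosen so that the sparse
# point xbar satisfies stationarity on its support:
#   grad f(xbar) + lam * penalty_grad(xbar) = 0  on  {i : xbar_i != 0}.
xbar = np.array([1.0, 0.0, -4.0])
a = xbar + lam * penalty_grad(xbar)

grad_f = xbar - a
residual = grad_f + lam * penalty_grad(xbar)
print(np.linalg.norm(residual[xbar != 0]))   # ~0: stationarity holds on the support
```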
2.3 Optimality conditions for (P2)
We also consider other verifiable forms of condition (2.5) when some constraint qualification (CQ) is satisfied at $\bar x$. For $\bar x$ with $\|\bar x\|_p^p=\gamma$, define the extended linearized cone as:
Obviously,
It follows from [14, Theorem 6.14] that the corresponding polarity relation holds. Hence, we have the following result.
Proposition 2.5.
For with , . Therefore, if , meaning that there exist such that
then the firstorder necessary condition (2.5) is satisfied at .
The extended MFCQ (EMFCQ) for (P2) holds at $\bar x$ if the subvectors of the equality-constraint gradients on the nonzero components are linearly independent and there exists $d$ such that
(2.12) 
Equivalently, the dual form of the EMFCQ for (P2) holds at $\bar x$ if $0$ is the unique solution of
We now state the necessary optimality conditions for various situations.
Theorem 2.6.
Proof.
(a) From [14, Theorem 6.14], if the extended MFCQ holds for (P2) at $\bar x$ with $\|\bar x\|_p^p=\gamma$, then the feasible set is regular at $\bar x$ and
(2.14) 
By Theorem 2.3, (a) is true.
(b) By [14, Theorem 6.42], the stated inclusion holds. Therefore, if $\bar x$ is locally optimal, the conclusion follows.
(c) Trivial by (b). ∎
We call conditions (2.13) the Karush-Kuhn-Tucker (KKT) conditions for (P2). Using the notation of $\Omega$, they can also be equivalently written as
$$0\in\nabla f(\bar x)+N_{\mathcal{B}}(\bar x)+\widehat N_{\Omega}(\bar x), \tag{2.15}$$
where $\mathcal{B}=\{x:\|x\|_p^p\le\gamma\}$, and $\widehat N_{\Omega}$ is replaced with $N_{\Omega}$ when $\Omega$ is a closed and convex set.
3 Firstorder sequential optimality condition
In this section, we turn to the study of sequential optimality conditions, namely the approximate Karush-Kuhn-Tucker (AKKT) conditions, which are defined as follows.