1 Introduction
Minimization of the nonconvex and nonsmooth function
(1.1) 
is a core part of nonlinear programming and applied mathematics. Unlike traditional convergence results on global minimizers in the convex setting, the convergence of a nonconvex algorithm only guarantees that the iterates approach a critical point. In most practical cases, the objective functions enjoy the Kurdyka-Łojasiewicz property (see definitions in Sec. 2). In this paper, we consider the convergence analysis under the Kurdyka-Łojasiewicz property assumption on the objective function .
In [5], for the sequence generated by a very general scheme for problem (1.1), the authors consider three conditions: a sufficient descent condition, a relative error condition, and a continuity condition. Mathematically, these three conditions can be presented as: for some
(1.2) 
where denotes the limiting subdifferential of (see the definition in Sec. 2). In fact, various algorithms satisfy these three conditions. The third condition usually follows from the minimization performed in each iteration. The proofs in [5] use a local analysis: the authors first prove that the sequence enters a neighborhood of some point after sufficiently many iterations and then employ the Kurdyka-Łojasiewicz property around that point. In the later paper [9], the authors prove a uniformized Kurdyka-Łojasiewicz lemma for a closed set, which considerably simplifies the proofs.
1.1 A novel convergence framework
In this paper, we consider the convergence of inexact nonconvex and nonsmooth algorithms. We stress that the inexact algorithms discussed in our paper differ from those in [5]. There, an assumption is imposed on the noise: it must be bounded by the successive difference of the iterates. The “inexact algorithm” in [5] is thus much closer to a “proximal algorithm”. For example, if is differentiable (possibly nonconvex), the nonconvex gradient descent algorithm performs as
(1.3) 
If the gradient of is Lipschitz with and , the sequence generated by (1.3) satisfies condition (1.2). Now suppose that the iteration is corrupted by some noise in each step, i.e.,
(1.4) 
Then the sequence generated by (1.4) is likely to violate some conditions in (1.2) when . The existing analysis cannot be directly applied to algorithm (1.4). The authors of [5] proposed the following assumption on the noise:
(1.5) 
where . Under this assumption, they can continue using the sufficient descent condition and the relative error condition. In this paper, we dispense with dependence assumptions such as (1.5). Although in this case the inexact algorithms always fail to obey the first two of the core conditions (1.2), we find that many of them satisfy an alternative condition:
(1.6) 
where are constants, is a nonnegative sequence, and is a sequence satisfying
(1.7) 
for some . The continuity condition is kept here. Obviously, if , and , the condition reduces to (1.2). Thus, our work can also be regarded as a generalization of [5]. Our approach is to first prove convergence for a general inexact algorithm whose sequence satisfies condition (1.6) under a specific summability assumption on . We then prove that several classical inexact algorithms satisfy condition (1.6).
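The noisy gradient iteration (1.4) can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation: it assumes the noise enters additively as x_{k+1} = x_k - step * grad(x_k) + e_k, and the double-well test function, the step size, and the noise sequence are all hypothetical choices made for illustration. The noise magnitude decays like 1/(k+1)^2, so it is summable but not tied to the successive difference of the iterates as in (1.5).

```python
def inexact_gradient_descent(grad, x0, step, noise, iters):
    """Run x_{k+1} = x_k - step * grad(x_k) + noise(k) and return all iterates.

    The additive placement of the noise is an assumption of this sketch.
    """
    xs = [x0]
    for k in range(iters):
        xs.append(xs[-1] - step * grad(xs[-1]) + noise(k))
    return xs


# Hypothetical test problem: the nonconvex double well f(x) = x**4/4 - x**2/2,
# whose critical points are -1, 0 and 1.
def double_well_grad(x):
    return x**3 - x


# Hypothetical noise with summable magnitude: |e_k| = 0.5 / (k + 1)**2.
def summable_noise(k):
    return 0.5 * (-1) ** k / (k + 1) ** 2


iterates = inexact_gradient_descent(double_well_grad, 2.0, 0.1, summable_noise, 2000)
x_final = iterates[-1]
# Total length of the path travelled by the iterates.
path_length = sum(abs(b - a) for a, b in zip(iterates, iterates[1:]))
print(x_final, double_well_grad(x_final), path_length)
```

In this run the iterates settle near the critical point 1 and the accumulated path length stays bounded, which is the kind of finite-length behavior the framework aims to establish.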
The core of the proof lies in using an auxiliary function whose successive difference bounds the successive difference of the sequence . If is semialgebraic, the new function is then Kurdyka-Łojasiewicz. We then build a sufficient descent property involving the new function and . We denote , which is a composition of , in (3.3). In each iteration, the distance between the subdifferential of the new function and the origin is bounded by the composition of , and . We then prove that has finite length provided is also summable. In proving the finite length, the key step is using the Kurdyka-Łojasiewicz property of the new Lyapunov function. The proof techniques are motivated by the methodology proposed in [5].
1.2 Related work
Recently, convergence analysis in nonconvex optimization has increasingly relied on the Kurdyka-Łojasiewicz property. In [3], the authors proved the convergence of the proximal algorithm for minimizing Kurdyka-Łojasiewicz functions and studied the rates at which the iterates converge to a critical point. An alternating proximal algorithm was considered in [4], and its convergence was proved under the Kurdyka-Łojasiewicz assumption on the objective function. Later, a proximal linearized alternating minimization algorithm was proposed and studied in [9]. A convergence framework covering various nonconvex algorithms was given in [5]. In [14], the authors modified the framework to analyze splitting methods with variable metric and proved general convergence rates. The nonconvex ADMM was studied under the Kurdyka-Łojasiewicz assumption in [20, 21]. The later paper [32] proposed a nonconvex primal-dual algorithm and proved its convergence. Kurdyka-Łojasiewicz-based convergence analysis was applied to the reweighted algorithm in [35], and the extension to the reweighted nuclear norm version was developed in [34]. Recently, the Kurdyka-Łojasiewicz property has also been employed in the convergence analysis of the DC algorithm [2].
1.3 Contribution and organization
In this paper, we focus on inexact nonconvex algorithms. We first propose a new framework (1.6), which is more general than the frameworks proposed in [5] and [14]. Convergence is proved for any sequence satisfying (1.6) with and if is a Kurdyka-Łojasiewicz function. In the analysis, we employ a new Lyapunov function which is a composition of and the length of the noise. The new framework covers several kinds of algorithms, and we apply our results to them. For a specific algorithm, we just need to verify that (1.6) and (1.7) hold.
The rest of the paper is organized as follows. In section 2, we list necessary preliminaries. Section 3 contains the main results. In section 4, we provide the applications. Section 5 concludes the paper.
2 Preliminaries
This section presents the mathematical tools which will be used in our proofs and contains two parts: in the first one, we introduce the basic definitions and properties for subdifferentials; in the second one, the KŁ property is introduced.
2.1 Subdifferential
More details about the definition of the subdifferential can be found in the textbooks [27, 28]. Given a lower semicontinuous function , its domain is defined by
The notion of subdifferential plays a central role in variational analysis.
Definition 1 (subdifferential).
Let be a proper and lower semicontinuous function.

For a given , the Fréchet subdifferential of at , written , is the set of all vectors which satisfy
When , we set .

The (limiting) subdifferential, or simply the subdifferential, of at , written , is defined through the following closure process
It is easy to verify that the Fréchet subdifferential is convex and closed while the subdifferential is closed. When is convex, the definition agrees with the subdifferential in convex analysis as
The graph of subdifferential for a real extended valued function is defined by
And the domain of the subdifferential of is given as
Let be a sequence in such that . If converges to as and converges to as , then . A necessary condition for to be a minimizer of is
(2.1) 
When is convex, (2.1) is also sufficient. A point that satisfies (2.1) is called a (limiting) critical point. The set of critical points of is denoted by .
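A short worked example may help to separate the two subdifferentials and to show what the critical point condition (2.1) excludes. The symbols used here are assumptions of this sketch, since the original notation is not recoverable from the text: \hat{\partial} stands for the Fréchet subdifferential and \partial for the limiting one.

```latex
% Example 1: f(x) = -|x| on the real line, at the point x = 0.
% Frechet subdifferential: u must satisfy
%   liminf_{y -> 0, y != 0} ( -|y| - u y ) / |y| >= 0,
% i.e. -1 - u >= 0 (taking y > 0) and -1 + u >= 0 (taking y < 0),
% which is impossible.
\hat{\partial} f(0) = \emptyset .
% Limiting subdifferential: f'(x) = -1 for x > 0 and f'(x) = 1 for x < 0,
% so passing to the limit x -> 0 from each side gives
\partial f(0) = \{-1,\, 1\} .
% Hence 0 \notin \partial f(0): the local maximizer x = 0 is not a critical point.

% Example 2: f(x) = |x| (convex), at the point x = 0.
\hat{\partial} f(0) = \partial f(0) = [-1,\, 1] \ni 0 ,
% so x = 0 is a critical point, and by convexity it is a global minimizer.
```

The first example also illustrates that the limiting subdifferential need not be convex, in contrast to the Fréchet subdifferential.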
2.2 Kurdyka-Łojasiewicz function
With the definition of the subdifferential, we are now prepared to introduce the Kurdyka-Łojasiewicz property and function.
Definition 2.
[22, 18, 7] (a) The function is said to have the Kurdyka-Łojasiewicz property at if there exist , a neighborhood of and a continuous concave function such that

.

is on .

for all , .

for all in , the Kurdyka-Łojasiewicz inequality holds
(2.2)
(b) Proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of are called KŁ functions.
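The Kurdyka-Łojasiewicz inequality can be checked numerically on simple smooth functions. The sketch below assumes the standard form of the inequality, phi'(f(x) - f(x_bar)) * dist(0, subdifferential of f at x) >= 1, and uses two hypothetical one-dimensional examples at x_bar = 0: f(x) = x^2 with phi(s) = s^(1/2), and f(x) = x^4 with phi(s) = s^(1/4). For smooth f the distance term is just |f'(x)|. The helper name `kl_product` is introduced here for illustration only.

```python
def kl_product(f, df, phi_prime, x, x_bar=0.0):
    """Return phi'(f(x) - f(x_bar)) * |f'(x)| for a smooth one-dimensional f."""
    return phi_prime(f(x) - f(x_bar)) * abs(df(x))


# Sample points on both sides of x_bar = 0 (x_bar itself is excluded because
# the inequality is only required where f(x) > f(x_bar)).
samples = [sign * t for t in (1e-3, 1e-2, 1e-1, 0.5) for sign in (-1.0, 1.0)]

# f(x) = x^2 with phi(s) = s^(1/2), so phi'(s) = 0.5 * s^(-1/2).
quadratic = [
    kl_product(lambda x: x**2, lambda x: 2.0 * x, lambda s: 0.5 * s ** (-0.5), x)
    for x in samples
]

# f(x) = x^4 with phi(s) = s^(1/4), so phi'(s) = 0.25 * s^(-3/4).
quartic = [
    kl_product(lambda x: x**4, lambda x: 4.0 * x**3, lambda s: 0.25 * s ** (-0.75), x)
    for x in samples
]

print(min(quadratic), min(quartic))
```

In both cases the product equals 1 up to rounding, so the inequality holds with equality. The flatter function x^4 needs the smaller exponent 1/4 in phi, which is the usual link between the flatness of the function near the critical point and the shape of the desingularizing function.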
It is hard to judge directly whether a function is Kurdyka-Łojasiewicz or not. Fortunately, the concept of semialgebraicity helps to identify and check a very rich class of Kurdyka-Łojasiewicz functions.
Definition 3 (Semialgebraic sets and functions [7, 8]).
(a) A subset of is a real semialgebraic set if there exists a finite number of real polynomial functions such that
(b) A function is called semialgebraic if its graph
is a semialgebraic subset of .
Moreover, semialgebraic sets and functions enjoy many useful stability properties [7, 8]. We list a few examples here:

Real polynomial functions.

Indicator functions of semialgebraic sets.

Finite sums and products of semialgebraic functions.

Composition of semialgebraic functions.

Sup/Inf type function, e.g., is semialgebraic when is a semialgebraic function and a semialgebraic set.

Cone of PSD matrices, Stiefel manifolds and constant rank matrices.
Now we present a lemma for the uniformized KŁ property. With this lemma, we can make the proofs much more concise.
Lemma 1 ([9]).
Let be a proper lower semicontinuous function and be a compact set. If is constant on and satisfies the KŁ property at each point of , then there exists a concave function satisfying the four assumptions in Definition 2 and such that for any and any satisfying and , it holds that
(2.3) 
3 Convergence analysis
The sequence is assumed to satisfy
(3.1) 
It is worth mentioning that assumption (3.1) is necessary to guarantee the sequence convergence in the general case. To see this, we consider the inexact gradient example (1.4) in the very special case that . Then we get . Further, we consider the one-dimensional case, in which ; we set . In this example, will diverge if (3.1) fails to hold. However, in our proofs, (3.1) alone is not enough to guarantee the sequence convergence; the final assumption is slightly stronger than (3.1).
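The one-dimensional example above can be reproduced numerically. This sketch assumes that (3.1) is a summability requirement on the noise and that, with a zero objective, the noisy iteration reduces to x_{k+1} = x_k + e_k. Both are readings of the surrounding text rather than formulas taken from it, and the two noise sequences are hypothetical.

```python
def drift_after(noise, steps):
    """Iterate x_{k+1} = x_k + noise(k) from x = 0 for k = 1, ..., steps."""
    x = 0.0
    for k in range(1, steps + 1):
        x += noise(k)
    return x


# Non-summable noise e_k = 1/k: the iterates follow the harmonic partial sums.
harmonic_short = drift_after(lambda k: 1.0 / k, 1_000)
harmonic_long = drift_after(lambda k: 1.0 / k, 100_000)

# Summable noise e_k = 1/k^2: the iterates stay below pi^2 / 6.
square_long = drift_after(lambda k: 1.0 / k**2, 100_000)

print(harmonic_short, harmonic_long, square_long)
```

With e_k = 1/k the iterate keeps growing as more steps are taken, even though each individual perturbation vanishes, whereas with e_k = 1/k^2 it stabilizes.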
Now, we introduce the Lyapunov function used in the analysis. Given any fixed , we denote a new function, which plays an important role in the analysis, as
(3.2) 
We also need to define the new sequences as
(3.3) 
Since when and is large enough, is well defined. The aim in this part is to prove that generated by the algorithm converges to a critical point of , and to build the relationships between the critical points of and . The proof contains two main steps:

Find a positive constant such that

Find another positive constant such that
Lemma 2.
Assume that is generated by the general inexact algorithm satisfying conditions (1.6) and (1.7), and condition (3.1) holds.
Then, we have the following results.
(1) It holds that
(3.4) 
Consequently, is bounded if is coercive.
(2) , which implies that
(3.5) 
Proof.
(1) By direct algebraic computation, we obtain
(3.6)  
If is coercive, then is coercive. Thus, is bounded because is bounded.
Lemma 3.
If the conditions of Lemma 2 hold,
(3.7) 
Proof.
Direct calculation yields
(3.8) 
Thus, we have
(3.9)  
∎
In the following, we establish some results about the limit points of the sequence generated by the general algorithm. We need a definition about the limit point which is introduced in [5].
Definition 4.
For a sequence , define
where is the starting point.
Lemma 4.
Suppose that is generated by the general algorithm, that is coercive, and that the conditions of Lemma 2 hold. Then, we have the following results.
(1) For any , we have and .
(2) is nonempty and .
(2’) is nonempty and
(3) .
(3’) .
(4) The function is finite and constant on .
(4’) The function is finite and constant on .
Proof.
(1) Noting , and
(2) It is easy to see the coercivity of . With Lemma 2 and the coercivity of , is bounded. Thus, is nonempty. Assume that , from the definition, there exists a subsequence . From Lemmas 2 and 3, we have . The closedness of indicates that , i.e. .
(2’) With the facts and , we can easily derive the results.
(3)(3’) This item follows as a consequence of the definition of the limit point.
(4) Let be the limit of . Take a stationary point ; by the continuity condition, there exists satisfying . We denote . Thus, the subsequence and , and we have
(4’) The proof is similar to (4).
∎
Lemma 5.
Proof.
Obviously, is semialgebraic, and hence KŁ. Let be a cluster point of ; then is also a cluster point of . If for some , then, since is decreasing, as . Using Lemma 2, as . In the following, we consider the case . From Lemmas 1 and 4, there exist such that for any and any satisfying and . From Lemma 4, when is large enough,
Thus, there exists a concave function such that
(3.12) 
Therefore, we have
where is due to the concavity of , and depends on Lemma 2, uses the KŁ property, and follows from Lemma 3. That is also
(3.13) 
where uses the Schwarz inequality with , and , and . Multiplying (3.13) with , we have
(3.14)  
Summing both sides from to , and with simplifications,
(3.15) 
Letting and using and , we then derive
(3.16) 
By using (1.7), we are then led to
(3.17) 
Thus, has only one stationary point . From Lemma 4, . ∎
The requirement (3.10) is complicated and impractical in applications. Thus, we consider the case in which the sequence has the polynomial form with . We try to simplify (3.10) in this case. The task then reduces to the following mathematical analysis problem: find the minimum such that for any , there exists such that (3.10) holds. Direct calculations give us
(3.18) 
Thus, we need
(3.19) 
After simplifications, we get
(3.20) 
Then, the problem reduces to
(3.21) 
Figure 1 shows the function values between . We can see that is decreasing to at . Therefore, we get . In other words, if with any fixed , there exists such that (3.10) holds, and the sequence then converges to some critical point of . Therefore, we obtain the following result.
4 Applications to several nonconvex algorithms
In this part, several classical nonconvex inexact algorithms are considered. We apply our theoretical findings to these algorithms and derive the corresponding convergence results. As presented before, we just need to check whether each algorithm satisfies the three conditions in (1.6). For a closed (possibly nonconvex) function , we denote
(4.1) 
Unlike in the convex case, is a point-to-set operator and may have more than one solution. We present a useful lemma which plays a very important role in the analysis.
Lemma 6.
For any and , if ,
(4.2) 
Of course, we also have
(4.3) 
In subsections 4.1-4.4, the point is itself, i.e., .
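The point-to-set nature of the proximal operator in the nonconvex case can be seen on a scalar example. The sketch below assumes the common definition prox(y) = argmin over x of g(x) + 0.5 * (x - y)^2, whose scaling may differ from the one in (4.1), and takes the hypothetical penalty g(x) = lam * [x != 0] with lam > 0. For x != 0 the objective is minimized at x = y, so only the candidates 0 and y need to be compared.

```python
def prox_l0_scalar(y, lam):
    """Return all minimizers of lam * [x != 0] + 0.5 * (x - y)**2 over real x.

    Assumes lam > 0. The only candidates are x = 0 and x = y.
    """
    if y == 0.0:
        return [0.0]
    cost_zero = 0.5 * y * y  # objective value at x = 0
    cost_keep = lam          # objective value at x = y, the best nonzero choice
    if cost_zero < cost_keep:
        return [0.0]
    if cost_zero > cost_keep:
        return [y]
    return [0.0, y]          # tie: the prox contains two points


print(prox_l0_scalar(0.5, 0.5))  # below the threshold sqrt(2 * lam) = 1
print(prox_l0_scalar(2.0, 0.5))  # above the threshold
print(prox_l0_scalar(1.0, 0.5))  # exactly at the threshold
```

At the threshold |y| = sqrt(2 * lam) both 0 and y are minimizers, so an algorithm built on this operator has to select one element of the set in each iteration.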
4.1 Inexact nonconvex gradient and proximal algorithm
The nonconvex proximal gradient algorithm is developed for the nonconvex composite optimization
(4.4) 
where is differentiable and is Lipschitz with , and is closed. Both and may be nonconvex. The nonconvex inexact proximal gradient algorithm can be described as
(4.5) 
where is the stepsize, prox is the proximal operator and is the noise. In the convex case, this algorithm is discussed in [38, 29], and the acceleration is studied in [31].
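A small instance of an inexact proximal gradient iteration is sketched below. It is illustrative only and rests on several assumptions not confirmed by the text: the noise is added to the argument of the proximal operator, the smooth part is the hypothetical f(x) = 0.5 * ||x - b||^2, the nonsmooth part is the nonconvex penalty lam * ||x||_0, and ties in the proximal operator are broken toward zero. The names `hard_threshold` and `inexact_prox_gradient` are introduced here for illustration.

```python
import math


def hard_threshold(y, tau):
    """One selection of the prox of the scaled l0 penalty: zero entries with |y_i| <= tau."""
    return [yi if abs(yi) > tau else 0.0 for yi in y]


def inexact_prox_gradient(b, lam, gamma, noise, iters):
    """Iterate x <- prox_{gamma * lam * ||.||_0}(x - gamma * (x - b) + e_k) from x = 0.

    The gradient of f(x) = 0.5 * ||x - b||^2 is x - b. The scalar noise e_k is
    added to every coordinate before the prox (an assumption of this sketch).
    """
    tau = math.sqrt(2.0 * gamma * lam)  # hard-thresholding level
    x = [0.0] * len(b)
    for k in range(iters):
        e = noise(k)
        y = [xi - gamma * (xi - bi) + e for xi, bi in zip(x, b)]
        x = hard_threshold(y, tau)
    return x


# Hypothetical data and a summable noise sequence |e_k| = 0.1 / (k + 1)**2.
b = [3.0, 0.1, -2.0]
x_final = inexact_prox_gradient(
    b, lam=0.5, gamma=0.5, noise=lambda k: 0.1 * (-1) ** k / (k + 1) ** 2, iters=200
)
print(x_final)
```

In this run the small entry of b is thresholded to zero and the two large entries are recovered, so the iterates approach a critical point of the composite objective despite the perturbation in every step.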
Lemma 7.
Let and let the sequence be generated by algorithm (4.5). Then we have
(4.6) 
Proof.
Lemma 8.
Let the sequence be generated by algorithm (4.5). Then we have
(4.12) 
Proof.
We have
(4.13) 
Therefore,
(4.14) 
Thus, we have
(4.15)  