Minimization of the nonconvex and nonsmooth function
is a core part of nonlinear programming and applied mathematics. Unlike the traditional convergence results for global minimizers in the convex setting, convergence of a nonconvex algorithm only guarantees that the iterates approach a critical point. In most practical cases, the objective function enjoys the Kurdyka-Łojasiewicz property (see the definitions in Sec. 2). In this paper, we study convergence under the assumption that the objective function has the Kurdyka-Łojasiewicz property.
In paper , for the sequence generated by a very general scheme for problem (1.1), the authors consider three conditions: a sufficient descent condition, a relative error condition, and a continuity condition. Mathematically, these three conditions can be stated as: for some
where denotes the limiting subdifferential of (see the definition in Sec. 2). Various algorithms satisfy these three conditions. The third condition usually follows from the minimization performed in each iteration. The proofs in  use a local analysis: the authors first prove that the sequence enters a neighborhood of some point after sufficiently many iterations and then employ the Kurdyka-Łojasiewicz property around that point. In a later paper , the authors prove a uniformized Kurdyka-Łojasiewicz lemma for a closed set, which considerably simplifies the proofs.
1.1 A novel convergence framework
In this paper, we consider the convergence of inexact nonconvex and nonsmooth algorithms. We stress that the inexact algorithms discussed in our paper differ from those in paper . In that paper, an assumption is imposed on the noise: it must be bounded by the successive difference of the iterates. The “inexact algorithm” in  is therefore much closer to a “proximal algorithm”. For example, if is differentiable (possibly nonconvex), the nonconvex gradient descent algorithm reads
However, the sequence generated by (1.4) is likely to violate some of the conditions in (1.2) when . The existing analysis therefore cannot be applied directly to algorithm (1.4). The authors in  proposed the following assumption on the noise:
where . Under this assumption, they can continue to use the sufficient descent condition and the relative error condition. In this paper, we drop dependence assumptions such as (1.5). Although in this case the inexact algorithms always fail to obey the first two conditions in (1.2), we find that many of them satisfy an alternative condition:
where are constants, and is a nonnegative sequence, and and is a sequence satisfying
for some . The continuity condition is kept here. Obviously, if , and , the condition reduces to (1.2). Thus, our work can also be regarded as a generalization of paper . Our approach is to first prove convergence for a general inexact algorithm whose sequence satisfies condition (1.6) under a specific summability assumption on . We then prove that several classical inexact algorithms satisfy condition (1.6).
The core of the proof lies in using an auxiliary function whose successive difference bounds the successive difference of the sequence . If is semi-algebraic, the new function is Kurdyka-Łojasiewicz. We then build a sufficient descent property involving the new function and . We denote , which is a composition of , in (3.3). In the -th iteration, the distance between the subdifferential of the new function and the origin is bounded by the composition of , and . We then prove the finite length of provided is also summable. In proving the finite length, the key step is to use the Kurdyka-Łojasiewicz property of the new Lyapunov function. The proof techniques are motivated by the methodology proposed in .
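The inexact gradient iteration (1.4) discussed above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration only: the double-well test function, the step size, the starting point, and the deterministic noise `0.5 * (-1)**k / (k + 1)**2`, which is summable but is not tied to the successive difference of the iterates.

```python
def inexact_gradient_descent(grad, x0, step, noise, num_iters):
    """Run x_{k+1} = x_k - step * grad(x_k) + noise(k) and return the last iterate."""
    x = x0
    for k in range(num_iters):
        x = x - step * grad(x) + noise(k)
    return x


def double_well_grad(x):
    # Gradient of the nonconvex function f(x) = x**4 / 4 - x**2 / 2,
    # whose critical points are -1, 0 and 1.
    return x**3 - x


def summable_noise(k):
    # Hypothetical noise with alternating sign and magnitude 0.5 / (k + 1)**2,
    # so the sum of |noise(k)| over all k is finite.
    return 0.5 * (-1) ** k / (k + 1) ** 2


x_final = inexact_gradient_descent(
    double_well_grad, x0=2.0, step=0.05, noise=summable_noise, num_iters=2000
)
print(x_final, double_well_grad(x_final))
```

In this toy run the iterates settle near the critical point 1 even though the noise never vanishes exactly, which is the kind of behavior the framework is meant to cover.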
1.2 Related work
Recently, convergence analysis in nonconvex optimization has increasingly relied on the Kurdyka-Łojasiewicz property. In paper , the authors proved the convergence of the proximal algorithm for minimizing Kurdyka-Łojasiewicz functions. In , the rates at which the iterates converge to a critical point were studied. An alternating proximal algorithm was considered in , and its convergence was proved under the Kurdyka-Łojasiewicz assumption on the objective function. Later, a proximal linearized alternating minimization algorithm was proposed and studied in . A convergence framework that covers various nonconvex algorithms was given in . In , the authors modified the framework to analyze splitting methods with variable metric and proved general convergence rates. The nonconvex ADMM was studied under the Kurdyka-Łojasiewicz assumption in [20, 21]. A later paper  proposed a nonconvex primal-dual algorithm and proved its convergence. The Kurdyka-Łojasiewicz analysis was applied to the convergence of the reweighted algorithm in , and the extension to the reweighted nuclear norm version was developed in . Recently, the DC algorithm has also been analyzed using the Kurdyka-Łojasiewicz property .
1.3 Contribution and organization
In this paper, we focus on inexact nonconvex algorithms. We first propose a new framework (1.6), which is more general than the frameworks proposed in  and . Convergence is proved for any sequence satisfying (1.6) with and if is a Kurdyka-Łojasiewicz function. In the analysis, we employ a new Lyapunov function which is a composition of the and the length of the noise. The new framework covers several kinds of algorithms, and we apply our results to them. For a specific algorithm, we only need to verify that (1.6) and (1.7) hold.
The rest of the paper is organized as follows. In section 2, we list necessary preliminaries. Section 3 contains the main results. In section 4, we provide the applications. Section 5 concludes the paper.
2 Preliminaries
This section presents the mathematical tools used in our proofs. It contains two parts: the first introduces the basic definitions and properties of subdifferentials; the second introduces the KŁ property.
2.1 Subdifferential
The notion of subdifferential plays a central role in variational analysis.
Definition 1 (subdifferential).
Let be a proper and lower semicontinuous function.
For a given , the Fréchet subdifferential of at , written , is the set of all vectors which satisfy
When , we set .
The (limiting) subdifferential, or simply the subdifferential, of at , written , is defined through the following closure process
It is easy to verify that the Fréchet subdifferential is convex and closed, while the subdifferential is closed. When is convex, the definition agrees with the subgradient in convex analysis as
The graph of subdifferential for a real extended valued function is defined by
And the domain of the subdifferential of is given as
Let be a sequence in such that . If converges to as and converges to as , then . A necessary condition for to be a minimizer of is
2.2 Kurdyka-Łojasiewicz function
With the definition of the subdifferential, we are now prepared to introduce the Kurdyka-Łojasiewicz property and Kurdyka-Łojasiewicz functions.
is on .
for all , .
for all in , the Kurdyka-Łojasiewicz inequality holds
(b) Proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of are called KŁ functions.
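As a small numerical illustration of the Kurdyka-Łojasiewicz inequality, the sketch below checks the standard form φ'(f(x) − f(x̄)) · dist(0, ∂f(x)) ≥ 1 for one hand-picked example. The notation φ, f, x̄ and the choices f(x) = x², x̄ = 0, φ(s) = 2√s are assumptions made for this example only; they are not taken from the paper.

```python
import math


def f(x):
    # Smooth test function with its only critical point at x = 0.
    return x * x


def dist_zero_to_subdiff(x):
    # f is differentiable, so its subdifferential is the singleton {2x}.
    return abs(2.0 * x)


def phi_prime(s):
    # Derivative of the assumed desingularizing function phi(s) = 2 * sqrt(s),
    # which is concave, vanishes at 0 and has a positive derivative for s > 0.
    return 1.0 / math.sqrt(s)


def kl_product(x, x_bar=0.0):
    """Left-hand side phi'(f(x) - f(x_bar)) * dist(0, subdiff f(x))."""
    return phi_prime(f(x) - f(x_bar)) * dist_zero_to_subdiff(x)


samples = [10.0 ** (-p) for p in range(1, 8)]
print([kl_product(x) for x in samples])
```

For this choice the product equals 2 (up to rounding) at every nonzero point, so the inequality holds with room to spare near x̄ = 0.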
It is hard to directly judge whether a function is Kurdyka-Łojasiewicz or not. Fortunately, the concept of semi-algebraicity can help to find and check a very rich class of Kurdyka-Łojasiewicz functions.
(a) A subset of is a real semi-algebraic set if there exist finitely many real polynomial functions such that
(b) A function is called semi-algebraic if its graph
is a semi-algebraic subset of .
Real polynomial functions.
Indicator functions of semi-algebraic sets.
Finite sums and products of semi-algebraic functions.
Composition of semi-algebraic functions.
Sup/Inf type function, e.g., is semi-algebraic when is a semi-algebraic function and a semi-algebraic set.
Cone of PSD matrices, Stiefel manifolds and constant rank matrices.
Now we present a lemma for the uniformized KŁ property. With this lemma, we can make the proofs much more concise.
Lemma 1 ().
Let be a proper lower semicontinuous function and be a compact set. If is constant on and satisfies the KŁ property at each point of , then there exists a concave function satisfying the four assumptions in Definition 2 and such that, for any and any satisfying and , it holds that
3 Convergence analysis
The sequence is assumed to satisfy
It is worth mentioning that assumption (3.1) is necessary to guarantee convergence of the sequence in the general case. To see this, we consider the inexact gradient example (1.4) in the very special case that . We then get . Further, we consider the one-dimensional case, in which ; we set . In this example, will diverge if (3.1) fails to hold. However, in our proofs, (3.1) alone does not suffice to guarantee convergence of the sequence; the final assumption is slightly stronger than (3.1).
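The one-dimensional example above can be reproduced numerically. The sketch below assumes that (3.1) is a summability condition on the noise and that, with a zero objective, the inexact gradient step collapses to adding the noise to the current iterate. The two noise sequences `1/(k+1)` and `1/(k+1)**2` are illustrative choices, not taken from the paper.

```python
import math


def accumulate_noise(noise, num_iters, x0=0.0):
    """Iterate x_{k+1} = x_k + noise(k), i.e. an inexact gradient step with f = 0."""
    x = x0
    for k in range(num_iters):
        x = x + noise(k)
    return x


def harmonic_noise(k):
    # Not summable: the partial sums grow like log(k).
    return 1.0 / (k + 1)


def squared_noise(k):
    # Summable: the partial sums converge to pi**2 / 6.
    return 1.0 / (k + 1) ** 2


LIMIT = math.pi ** 2 / 6
for n in (10**2, 10**4, 10**5):
    print(n, accumulate_noise(harmonic_noise, n), accumulate_noise(squared_noise, n), LIMIT)
```

With the non-summable noise the iterate keeps growing as the iteration count increases, while with the summable noise it stabilizes near the printed limit.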
Now, we introduce the Lyapunov function used in the analysis. Given any fixed , we denote a new function, which plays an important role in the analysis, as
We also need to define the new sequences as
Since when and is large enough, is well-defined. The aim of this part is to prove that generated by the algorithm converges to a critical point of , and to establish the relationship between the critical points of and . The proof contains two main steps:
Find a positive constant such that
Find another positive constants such that
(1) It holds that
And then, is bounded if is coercive.
(2) , which implies that
(1) By direct algebraic computation, we easily obtain
If is coercive, then is coercive. Thus, is bounded because is bounded.
If the conditions of Lemma 2 hold,
Direct calculation yields
Thus, we have
In the following, we establish some results about the limit points of the sequence generated by the general algorithm. We need the definition of the limit-point set introduced in .
For a sequence , define
where is the starting point.
Suppose that is generated by the general algorithm, is coercive, and the conditions of Lemma 2 hold. Then we have the following results.
(1) For any , we have and .
(2) is nonempty and .
(2’) is nonempty and
(4) The function is finite and constant on .
(4’) The function is finite and constant on .
(1) Noting , and
(2) It is easy to see the coercivity of . With Lemma 2 and the coercivity of , is bounded. Thus, is nonempty. Assume that ; by definition, there exists a subsequence . From Lemmas 2 and 3, we have . The closedness of indicates that , i.e. .
(2’) With the facts and , we can easily derive the results.
(3)(3’) This item follows as a consequence of the definition of the limit point.
(4) Let be the limit of . There exists one stationary point , from the continuity condition, there exists satisfying . We denote that . Thus, the subsequence and . And we have
(4’) The proof is similar to (4).
Obviously, is semi-algebraic, and hence KŁ. Let be a cluster point of ; then is also a cluster point of . If for some , then, since is decreasing, as . Using Lemma 2, as . In the following, we consider the case . From Lemmas 1 and 4, there exist such that for any and any satisfying and . From Lemma 4, when is large enough,
Thus, there exists a concave function such that
Therefore, we have
where uses the Schwarz inequality with , and , and . Multiplying (3) with , we have
Summing both sides from to , and with simplifications,
Letting and using and , we then derive
By using (1.7), we are then led to
Thus, has only one stationary point . From Lemma 4, . ∎
The requirement (3.10) is complicated and impractical in applications. Thus, we consider the case in which the sequence has the polynomial form with . We simplify (3.10) in this case. The task then reduces to the following analysis problem: find the minimum such that, for any , there exists for which (3.10) holds. Direct calculations give
Thus, we need
After simplifications, we get
Then, the problem reduces to
Figure 1 shows the function values between . We can see that is decreasing to at . Therefore, we get . That is to say, if with any fixed , there exists such that (3.10) holds. The sequence then converges to some critical point of . Therefore, we obtain the following result.
4 Applications to several nonconvex algorithms
In this part, we consider several classical nonconvex inexact algorithms. We apply our theoretical findings to these algorithms and derive the corresponding convergence results. As presented before, we only need to check whether an algorithm satisfies the three conditions in (1.6). For a closed function (possibly nonconvex) , we denote
Unlike the convex case, is a point-to-set operator and may return more than one solution. We present a useful lemma which plays a very important role in the analysis.
For any and , if ,
Of course, we also have
In subsections 4.1-4.4, the point is itself, i.e., .
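To illustrate why the proximal map of a nonconvex function can be set-valued, the sketch below computes every minimizer of a scalar proximal problem. It assumes the usual form argmin over x of g(x) + (x − y)²/2 and picks g(x) = λ·1[x ≠ 0], the scalar ℓ0 penalty, with λ > 0. Both the form of the proximal map and the choice of g are assumptions for this example; `prox_l0_scalar` is a hypothetical helper name.

```python
def prox_l0_scalar(y, lam):
    """Return all minimizers of lam * (x != 0) + 0.5 * (x - y)**2 over real x.

    Assumes lam > 0. Any nonzero candidate is best placed at x = y, so only
    x = 0 and x = y have to be compared.
    """
    cost_zero = 0.5 * y * y  # objective value at x = 0
    cost_keep = lam          # objective value at x = y when y != 0
    if y == 0.0 or cost_zero < cost_keep:
        return [0.0]
    if cost_zero > cost_keep:
        return [y]
    return [0.0, y]          # tie: two distinct minimizers


for y in (1.0, 2.0, 3.0):
    print(y, prox_l0_scalar(y, lam=2.0))
```

With `lam=2.0` the tie occurs exactly at `|y| = 2`, where both 0 and `y` minimize the objective; an algorithm built on such a map has to pick one element of the returned set.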
4.1 Inexact nonconvex gradient and proximal algorithm
The nonconvex proximal gradient algorithm is developed for the nonconvex composite optimization
where is differentiable, is Lipschitz with , and is closed; both and may be nonconvex. The nonconvex inexact proximal gradient algorithm can be described as
Let and let the sequence be generated by algorithm (4.5). Then we have
Let the sequence be generated by algorithm (4.5). Then we have
Thus, we have