A convergence frame for inexact nonconvex and nonsmooth algorithms and its applications to several iterations

09/12/2017 ∙ by Tao Sun, et al. ∙ Xiangtan University ∙ NetEase, Inc

In this paper, we consider the convergence of an abstract inexact nonconvex and nonsmooth algorithm. We assume a pseudo sufficient descent condition and a pseudo relative error condition, both of which are related to an auxiliary sequence, together with a continuity condition. In fact, a wide range of classical inexact nonconvex and nonsmooth algorithms satisfy these three conditions. Under a finite energy assumption on the auxiliary sequence, we prove that the sequence generated by the general algorithm converges to a critical point of the objective function, provided the objective has the Kurdyka-Łojasiewicz property. The core of the proof lies in building a new Lyapunov function whose successive difference bounds the successive difference of the points generated by the algorithm. We then apply our findings to several classical nonconvex iterative algorithms and derive the corresponding convergence results.


1 Introduction

Minimization of the nonconvex and nonsmooth function

(1.1)

is a core part of nonlinear programming and applied mathematics. Unlike the traditional convergence results on global minimizers in the convex setting, convergence of a nonconvex algorithm only guarantees that the iterates approach a critical point. In most practical cases, the objective functions enjoy the Kurdyka-Łojasiewicz property (see the definitions in Sec. 2). In this paper, we carry out the convergence analysis under the Kurdyka-Łojasiewicz property assumption on the objective function.

In [5], for the sequence generated by a very general scheme for problem (1.1), the authors consider three conditions: a sufficient descent condition, a relative error condition and a continuity condition. Mathematically, these three conditions can be presented as: for some

(1.2)

where ∂ denotes the limiting subdifferential (see the definition in Sec. 2). Various algorithms satisfy these three conditions. The third condition is usually derived from the minimization performed in each iteration. The proofs in [5] use a local analysis: the authors first prove that the sequence falls into a neighborhood of some point after enough iterations and then employ the Kurdyka-Łojasiewicz property around that point. In the later paper [9], the authors prove a uniformized Kurdyka-Łojasiewicz lemma for a compact set, which simplifies the proofs considerably.
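For orientation, the three conditions are commonly stated in the literature on this framework in the form below. This is a sketch rather than a quotation of (1.2): the symbols $F$ (objective), $a, b > 0$ (constants), $x^k$ (iterates), $w^{k+1}$ (a subgradient) and $\tilde{x}$ (a cluster point) are assumed notation and may differ from the notation used in the paper.

```latex
% Sufficient descent condition
F(x^{k+1}) + a\,\|x^{k+1} - x^{k}\|^{2} \le F(x^{k}),
% Relative error condition
\exists\, w^{k+1} \in \partial F(x^{k+1}) \ \text{such that}\
\|w^{k+1}\| \le b\,\|x^{k+1} - x^{k}\|,
% Continuity condition
\exists\, (x^{k_j})_{j} \ \text{and}\ \tilde{x} \ \text{such that}\
x^{k_j} \to \tilde{x} \ \text{and}\ F(x^{k_j}) \to F(\tilde{x}) \ \text{as}\ j \to \infty .
```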

1.1 A novel convergence framework

In this paper, we consider the convergence of inexact nonconvex and nonsmooth algorithms. We stress that the inexact algorithms discussed in our paper differ from those in [5]. There, an assumption is imposed on the noise: it must be bounded by the successive difference of the iterates. The “inexact algorithm” in [5] is therefore much closer to a “proximal algorithm”. For example, if the objective is differentiable (possibly nonconvex), the nonconvex gradient descent algorithm performs as

(1.3)

If the gradient of is Lipschitz with and , the sequence generated by (1.3) satisfies condition (1.2). However, suppose the iteration is corrupted by some noise in each step, i.e.,

(1.4)

The sequence generated by (1.4) is likely to violate some of the conditions in (1.2) when . The existing analysis therefore cannot be applied directly to algorithm (1.4). The authors of [5] proposed the following assumption on the noise:

(1.5)

where . Under this assumption, they can continue using the sufficient descent condition and the relative error condition. In this paper, we drop dependence assumptions like (1.5). Although in this case the inexact algorithms generally fail to obey the first two conditions in (1.2), we find that many of them satisfy an alternative condition:

(1.6)

where are constants, and is a nonnegative sequence, and and is a sequence satisfying

(1.7)

for some . The continuity condition is kept here. Obviously, if , and , the condition reduces to (1.2). Thus, our work can also be regarded as a generalization of [5]. Our approach is to first prove convergence for a general inexact algorithm whose sequence satisfies condition (1.6) under a specific summability assumption on . We then prove that several classical inexact algorithms satisfy condition (1.6).
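The difference between the exact step (1.3) and the noisy step (1.4) can be illustrated numerically. The following is a minimal sketch, not the authors' experiment: the test function f(x) = x^4/4 - x^2/2, the step size, the starting point and the noise model e_k = 0.5·(-1)^k/(k+1)^2 are all assumptions chosen for illustration. The noise is summable and independent of the successive difference of the iterates, which is the situation the paper targets.

```python
def grad(x):
    # Gradient of the nonconvex test function f(x) = x**4/4 - x**2/2 (assumed example).
    return x**3 - x

def inexact_gradient_descent(x0, step=0.1, iters=2000, noise_power=2.0):
    """Iterate x <- x - step * grad(x) + e_k with |e_k| = 0.5 / (k+1)**noise_power."""
    x = x0
    path_length = 0.0  # accumulates sum_k |x_{k+1} - x_k|
    for k in range(iters):
        e_k = 0.5 * (-1) ** k / (k + 1) ** noise_power  # hypothetical summable noise
        x_next = x - step * grad(x) + e_k
        path_length += abs(x_next - x)
        x = x_next
    return x, path_length

x_final, length = inexact_gradient_descent(x0=1.5)
print("final iterate:", x_final)
print("gradient norm:", abs(grad(x_final)))
print("path length  :", length)
```

In this run the iterates settle near the critical point x = 1 and the accumulated path length stays bounded, which is the finite length behaviour the analysis aims to establish.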

The core of the proof lies in using an auxiliary function whose successive difference gives a bound on the successive difference of the sequence . If is semi-algebraic, the new function is then Kurdyka-Łojasiewicz. We then build a sufficient descent property involving the new function and . We denote , which is a composition of , in (3.3). In the -th iteration, the distance between the subdifferential of the new function and the origin is bounded by the composition of , and . We then prove the finite length of provided is also summable. In proving the finite length, the key part is using the Kurdyka-Łojasiewicz property of the new Lyapunov function. The proof techniques are motivated by the methodology proposed in [5].

1.2 Related work

Recently, convergence analysis in nonconvex optimization has paid increasing attention to using the Kurdyka-Łojasiewicz property in proofs. In [3], the authors proved the convergence of the proximal algorithm for minimizing Kurdyka-Łojasiewicz functions, and the rates at which the iterates converge to a critical point were also studied. An alternating proximal algorithm was considered in [4], and its convergence was proved under the Kurdyka-Łojasiewicz assumption on the objective function. Later, a proximal linearized alternating minimization algorithm was proposed and studied in [9]. A convergence framework covering various nonconvex algorithms was given in [5]. In [14], the authors modified the framework to analyze splitting methods with variable metric, and proved general convergence rates. The nonconvex ADMM was studied under the Kurdyka-Łojasiewicz assumption in [20, 21]. The later paper [32] proposed a nonconvex primal-dual algorithm and proved its convergence. The Kurdyka-Łojasiewicz convergence analysis was applied to the reweighted algorithm in [35], and the extension to the reweighted nuclear norm version was developed in [34]. Recently, the Kurdyka-Łojasiewicz property has also been employed in the convergence analysis of the DC algorithm [2].

1.3 Contribution and organization

In this paper, we focus on inexact nonconvex algorithms. We first propose a new framework (1.6), which is more general than the frameworks proposed in [5] and [14]. Convergence is proved for any sequence satisfying (1.6) with and if is a Kurdyka-Łojasiewicz function. In the analysis, we employ the new Lyapunov function, which is a composition of the and the length of the noise. The new framework proposed in this paper covers various kinds of algorithms. We then apply our results to these algorithms. For a specific algorithm, we just need to verify that (1.6) and (1.7) hold.

The rest of the paper is organized as follows. In section 2, we list necessary preliminaries. Section 3 contains the main results. In section 4, we provide the applications. Section 5 concludes the paper.

2 Preliminaries

This section presents the mathematical tools which will be used in our proofs and contains two parts: in the first one, we introduce the basic definitions and properties for subdifferentials; in the second one, the KŁ property is introduced.

2.1 Subdifferential

More details about the definition of the subdifferential can be found in the textbooks [27, 28]. Given a lower semicontinuous function , its domain is defined by

The notion of subdifferential plays a central role in variational analysis.

Definition 1 (subdifferential).

Let be a proper and lower semicontinuous function.

  1. For a given , the Fréchet subdifferential of at , written

    , is the set of all vectors

    which satisfy

    When , we set .

  2. The (limiting) subdifferential, or simply the subdifferential, of at , written , is defined through the following closure process

It is easy to verify that the Fréchet subdifferential is convex and closed while the subdifferential is closed. When is convex, the definition agrees with the subgradient in convex analysis as

The graph of subdifferential for a real extended valued function is defined by

And the domain of the subdifferential of is given as

Let be a sequence in such that . If converges to as and converges to as , then . A necessary condition for to be a minimizer of is

(2.1)

When is convex, (2.1) is also sufficient. A point that satisfies (2.1) is called a (limiting) critical point. The set of critical points of is denoted by .

2.2 Kurdyka-Łojasiewicz function

With the definition of the subdifferential, we are now prepared to introduce the Kurdyka-Łojasiewicz property and Kurdyka-Łojasiewicz functions.

Definition 2.

[22, 18, 7] (a) The function is said to have the Kurdyka-Łojasiewicz property at if there exist , a neighborhood of and a continuous concave function such that

  1. .

  2. is on .

  3. for all , .

  4. for all in , the Kurdyka-Łojasiewicz inequality holds

    (2.2)

(b) Proper lower semicontinuous functions which satisfy the Kurdyka-Łojasiewicz inequality at each point of are called KŁ functions.
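A small worked example may help make Definition 2 concrete. The function $f(x) = x^2$ on the real line, the point $\bar{x} = 0$ and the desingularizing function $\varphi(s) = \sqrt{s}$ are choices made here for illustration and are not taken from the paper.

```latex
% phi(s) = sqrt(s) is continuous and concave on [0, eta), phi(0) = 0,
% continuously differentiable on (0, eta), with phi'(s) = 1/(2 sqrt(s)) > 0.
% For f(x) = x^2, \bar{x} = 0 and any x \neq 0:
\varphi'\bigl(f(x) - f(\bar{x})\bigr)\,\operatorname{dist}\bigl(0, \partial f(x)\bigr)
  = \frac{1}{2\sqrt{x^{2}}}\cdot |2x|
  = \frac{1}{2|x|}\cdot 2|x|
  = 1 \;\ge\; 1 .
```

So the Kurdyka-Łojasiewicz inequality holds at $\bar{x} = 0$ with this $\varphi$, here with equality.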

It is hard to directly judge whether a function is Kurdyka-Łojasiewicz or not. Fortunately, the concept of semi-algebraicity can help to find and check a very rich class of Kurdyka-Łojasiewicz functions.

Definition 3 (Semi-algebraic sets and functions [7, 8]).

(a) A subset of is a real semi-algebraic set if there exists a finite number of real polynomial functions such that

(b) A function is called semi-algebraic if its graph

is a semi-algebraic subset of .

Moreover, the class of semi-algebraic sets and functions is rich and stable under many operations [7, 8]. We list a few examples here:

  • Real polynomial functions.

  • Indicator functions of semi-algebraic sets.

  • Finite sums and products of semi-algebraic functions.

  • Composition of semi-algebraic functions.

  • Sup/Inf type function, e.g., is semi-algebraic when is a semi-algebraic function and a semi-algebraic set.

  • Cone of PSD matrices, Stiefel manifolds and constant rank matrices.

Now we present a lemma for the uniformized KŁ property. With this lemma, we can make the proofs much more concise.

Lemma 1 ([9]).

Let be a proper lower semi-continuous function and be a compact set. If is constant on and satisfies the KŁ property at each point of , then there exists a concave function satisfying the four assumptions in Definition 2 and such that for any and any satisfying and , it holds that

(2.3)

3 Convergence analysis

The sequence is assumed to satisfy

(3.1)

It is worth mentioning that assumption (3.1) is necessary to guarantee sequence convergence in the general case. To see this, consider the inexact gradient example (1.4) in the very special case that . Then we get . Further, consider the one-dimensional case, in which ; we set . In this example, will diverge if (3.1) fails to hold. However, (3.1) alone is not enough to guarantee sequence convergence in our proofs; the final assumption for sequence convergence is slightly stronger than (3.1).
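The one-dimensional example can be checked numerically. The sketch below assumes that the special case is the one where the gradient term vanishes, so that the noisy step reduces to x_{k+1} = x_k + e_k, and it treats the requirement as summability of the noise magnitudes; both readings are assumptions, since the displays are not reproduced here. The two noise sequences 1/k and 1/k^2 are illustrative choices.

```python
def drift(noise, iters):
    """Run x_{k+1} = x_k + e_k from x_0 = 0, i.e. a noisy step with zero gradient."""
    x = 0.0
    for k in range(1, iters + 1):
        x += noise(k)
    return x

def harmonic(k):
    return 1.0 / k       # nonnegative noise that is not summable

def quadratic(k):
    return 1.0 / k**2    # nonnegative noise that is summable

for n in (10**3, 10**5):
    print(n, drift(harmonic, n), drift(quadratic, n))
```

With the non-summable noise the iterate keeps growing as more steps are taken, while with the summable noise it stabilizes.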

Now, we introduce the Lyapunov function used in the analysis. Given any fixed , we denote a new function, which plays an important role in the analysis, as

(3.2)

We also need to define the new sequences as

(3.3)

Since when and is large enough, is well-defined. The aim of this part is to prove that generated by the algorithm converges to a critical point of , and to relate the critical points of and . The proof contains two main steps:

  1. Find a positive constant such that

  2. Find another positive constants such that

Lemma 2.

Assume that is generated by the general inexact algorithm satisfying conditions (1.6) and (1.7), and condition (3.1) holds. Then, we have the following results.

(1) It holds that

(3.4)

And then, is bounded if is coercive.

(2) , which implies that

(3.5)
Proof.

(1) From the direct algebra computations, we can easily obtain

(3.6)

If is coercive, then is coercive. Thus, is bounded because is bounded.

(2) From (3.4), is descending. Note that , is convergent. Hence, we can easily have that

With (1.7), we then prove the result. ∎

Lemma 3.

If the conditions of Lemma 2 hold,

(3.7)
Proof.

Direct calculation yields

(3.8)

Thus, we have

(3.9)

In the following, we establish some results about the limit points of the sequence generated by the general algorithm. We need a definition about the limit point which is introduced in [5].

Definition 4.

For a sequence , define that

where is the starting point.

Lemma 4.

Suppose that is generated by the general algorithm and is coercive, and that the conditions of Lemma 2 hold. Then we have the following results.

(1) For any , we have and .

(2) is nonempty and .

(2’) is nonempty and

(3) .

(3’) .

(4) The function is finite and constant on .

(4’) The function is finite and constant on .

Proof.

(1) Noting , and

(2) It is easy to see the coercivity of . With Lemma 2 and the coercivity of , is bounded. Thus, is nonempty. Assume that , from the definition, there exists a subsequence . From Lemmas 2 and 3, we have . The closedness of indicates that , i.e. .

(2’) With the facts and , we can easily derive the results.

(3)(3’) This item follows as a consequence of the definition of the limit point.

(4) Let be the limit of . There exists one stationary point , from the continuity condition, there exists satisfying . We denote that . Thus, the subsequence and . And we have

(4’) The proof is similar to (4).

Lemma 5.

Suppose that is a closed, coercive, semi-algebraic function. Let the sequence be generated by the general scheme and let conditions (1.6) and (1.7) hold. If there exists such that the sequence satisfies

(3.10)

Then, the sequence has finite length, i.e.

(3.11)

And converges to a critical point of .

Proof.

Obviously, is semi-algebraic, and then KŁ. Let be a cluster point of , then, is also a cluster point of . If for some , with the fact is decreasing, as . Using Lemma 2, as . In the following, we consider the case . From Lemmas 1 and 4, there exist such that for any and any satisfying that and . From Lemma 4, as is large enough,

Thus, there exists a concave function such that

(3.12)

Therefore, we have

where is due to the concavity of , and depends on Lemma 2, uses the KŁ property, and follows from Lemma 3. That is also

(3.13)

where uses the Schwarz inequality with , and , and . Multiplying (3.13) with , we have

(3.14)

Summing both sides from to , and with simplifications,

(3.15)

Letting and using and , we then derive

(3.16)

By using (1.7), we are then led to

(3.17)

Thus, has only one stationary point . From Lemma 4, . ∎

The requirement (3.10) is complicated and impractical in applications. Thus, we consider the case where the sequence has the polynomial form with . We try to simplify (3.10) in this case. The task then reduces to the following mathematical analysis problem: find the minimum such that for any , there exists that makes (3.10) hold. Direct calculations give us

(3.18)

Thus, we need

(3.19)

After simplifications, we get

(3.20)

Then, the problem reduces to

(3.21)

Figure 1 shows the function values between . We can see is decreasing to at . Therefore, we get . That is to say, if with any fixed , there exists such that (3.10) holds. The sequence then converges to some critical point of . Therefore, we obtain the following result.

Theorem 1 (Convergence result).

Suppose that is a closed, coercive, semi-algebraic function. Let the sequence be generated by the general scheme and let conditions (1.6) and (1.7) hold. The sequence obeys

(3.22)

Then, the sequence has finite length, i.e.

(3.23)

And converges to a critical point of .


Figure 1: Function on the interval

4 Applications to several nonconvex algorithms

In this part, several classical nonconvex inexact algorithms are considered. We apply our theoretical findings to these algorithms and derive the corresponding convergence results. As presented before, we just need to check whether the algorithm satisfies the three conditions in (1.6). For a closed (possibly nonconvex) function , we denote

(4.1)

Unlike in the convex case, the proximal map is a point-to-set operator and may have more than one solution. We present a useful lemma which plays a very important role in the analysis.

Lemma 6.

For any and , if ,

(4.2)

Of course, we also have

(4.3)

In subsections 4.1-4.4, the point is itself, i.e., .
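The point-to-set nature of the proximal map in the nonconvex case, noted at the start of this section, can be seen on a scalar example. The sketch below is an illustration under assumptions: it takes the penalty lam·[y ≠ 0] (a scalar ℓ0-type term) and the common normalization in which the proximal map minimizes lam·[y ≠ 0] + (y - x)^2/(2·gamma); the function name `prox_l0` and the parameters `lam` and `gamma` are hypothetical, and the scaling in (4.1) may differ.

```python
def prox_l0(x, lam, gamma):
    """Return all minimizers of lam*[y != 0] + (y - x)**2 / (2*gamma) over real y."""
    if x == 0.0:
        return [0.0]
    keep_cost = lam                    # best nonzero choice is y = x, which costs lam
    zero_cost = x * x / (2.0 * gamma)  # choosing y = 0 costs the quadratic term only
    if keep_cost < zero_cost:
        return [x]
    if keep_cost > zero_cost:
        return [0.0]
    return [0.0, x]                    # tie: two distinct minimizers

print(prox_l0(2.0, lam=0.5, gamma=1.0))  # large input is kept
print(prox_l0(0.5, lam=0.5, gamma=1.0))  # small input is set to zero
print(prox_l0(1.0, lam=0.5, gamma=1.0))  # input exactly at the threshold
```

At the threshold x^2 = 2·gamma·lam both candidates attain the same objective value, so the proximal map returns a set with two elements rather than a single point.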

4.1 Inexact nonconvex gradient and proximal algorithm

The nonconvex proximal gradient algorithm is developed for the nonconvex composite optimization

(4.4)

where is differentiable and is Lipschitz with , and is closed; both and may be nonconvex. The nonconvex inexact proximal gradient algorithm can be described as

(4.5)

where is the stepsize, prox is the proximal operator and is the noise. In the convex case, this algorithm is discussed in [38, 29], and the acceleration is studied in [31].
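A minimal sketch of one way such an iteration can look is given below. It is written under several assumptions that are not taken from the paper: the smooth part is f(x) = 0.5·||Ax - b||^2 with A the identity, the nonsmooth part is lam·||x||_0, the noise is added to the gradient step before the proximal map is applied, and the noise model (-1)^k/(k+1)^2 is summable. The helper names `grad_f`, `hard_threshold` and `inexact_prox_grad` are hypothetical.

```python
def grad_f(x, A, b):
    # Gradient A^T (A x - b) of f(x) = 0.5 * ||A x - b||^2, using plain lists.
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(x))]

def hard_threshold(v, lam, gamma):
    # One selection from the prox of lam*||.||_0 (ties are resolved toward zero).
    return [vi if vi * vi > 2.0 * gamma * lam else 0.0 for vi in v]

def inexact_prox_grad(x, A, b, lam, gamma, iters=500):
    for k in range(iters):
        g = grad_f(x, A, b)
        e = [(-1) ** k / (k + 1) ** 2 for _ in x]  # hypothetical summable noise
        v = [xi - gamma * gi + ei for xi, gi, ei in zip(x, g, e)]
        x = hard_threshold(v, lam, gamma)
    return x

A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.1]
x = inexact_prox_grad([0.0, 0.0], A, b, lam=0.5, gamma=0.5)
print(x)
```

In this run the first coordinate approaches 3 and the second is eventually thresholded to zero, even though the early noise temporarily pushes it above the threshold.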

Lemma 7.

Let and let the sequence be generated by algorithm (4.5). Then we have

(4.6)
Proof.

The -Lipschitz continuity of gives

(4.7)

On the other hand, with Lemma 6, we have

(4.8)

This is also

(4.9)

Summing (4.7) and (4.9),

(4.10)

With the Cauchy-Schwarz inequality, we have

(4.11)

Combining (4.11) and (4.10), we then prove the result. ∎

Lemma 8.

Let the sequence be generated by algorithm (4.5). Then we have

(4.12)
Proof.

We have

(4.13)

Therefore,

(4.14)

Thus, we have

(4.15)