# Convergence Analysis of Alternating Nonconvex Projections

We consider the convergence properties of the alternating projection algorithm (a.k.a. alternating projections), which has been widely utilized to solve many practical problems in machine learning, signal and image processing, communication, and statistics. In this paper, we formalize two properties of proper, lower semi-continuous, and semi-algebraic sets: the three-point property for all possible iterates, and the local contraction property that serves as the non-expansiveness property of the projector, but only for iterates that are close enough to each other. Then, by exploiting the geometric properties of the objective function around its critical points, i.e., the Kurdyka-Łojasiewicz (KL) property, we establish a new convergence analysis framework showing that if one set satisfies the three-point property and the other obeys the local contraction property, the iterates generated by alternating projections form a convergent sequence that converges to a critical point. We complete this study by providing a convergence rate that depends on the explicit expression of the KL exponent. As a byproduct, we use our new analysis framework to recover the linear convergence rate of alternating projections onto closed convex sets. To illustrate the power of our new framework, we provide new convergence results for a class of concrete applications: alternating projections for designing structured tight frames, which are widely used in sparse representation, compressed sensing, and communication. We believe that our new analysis framework can be applied to guarantee the convergence of alternating projections for many other nonconvex and nonsmooth sets.


## 1 Introduction

We consider the problem of finding the minimum Euclidean distance between two sets:

 minimize_{x∈X, y∈Y} g(x, y) = ∥x − y∥_2^2,  (1)

where X and Y are two nonempty closed subsets of R^n that are possibly nonconvex. A simple but popular approach for solving (1) is the alternating projection method (a.k.a. alternating projections), which alternately projects the iterates onto the sets X and Y:

 x_{k+1} ∈ argmin_{x∈X} g(x, y_k) = P_X(y_k),  y_{k+1} ∈ argmin_{y∈Y} g(x_{k+1}, y) = P_Y(x_{k+1}).  (2)

Here, for a closed subset V, P_V represents the orthogonal projection onto V, that is,

 P_V(u) := argmin_{v∈V} ∥v − u∥_2^2.

In case there exists more than one choice for P_X(y_k) (or P_Y(x_{k+1})) in (2), we pick any of them. The alternating projection method for solving (1) is depicted in Algorithm 1.
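As a minimal illustration of the iteration (2) (our own sketch, not part of the paper), the following runs Algorithm 1 between an illustrative pair of sets whose projections have closed forms: the closed unit disk X and the vertical line Y = {y : y[0] = 3}.

```python
import numpy as np

def alternating_projections(proj_X, proj_Y, y0, iters=500):
    """Algorithm 1: alternate x_{k+1} = P_X(y_k) and y_{k+1} = P_Y(x_{k+1})."""
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        x = proj_X(y)  # x_{k+1} = argmin_{x in X} ||x - y_k||_2^2
        y = proj_Y(x)  # y_{k+1} = argmin_{y in Y} ||x_{k+1} - y||_2^2
    return x, y

# Illustrative sets: X = closed unit disk, Y = line {y : y[0] = 3}.
proj_X = lambda u: u / max(1.0, np.linalg.norm(u))
proj_Y = lambda u: np.array([3.0, u[1]])

x, y = alternating_projections(proj_X, proj_Y, [3.0, 4.0])
# The iterates approach the closest pair x = (1, 0), y = (3, 0), at distance 2.
```

Since the two sets here do not intersect, the iterates settle on a pair of points achieving the minimum distance, exactly the situation targeted by problem (1).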

Alternating projections has been widely utilized for solving practical problems, provided that the subproblems in (2) (i.e., the orthogonal projections onto the sets X and Y) can be computed efficiently. Compared with gradient-based local search algorithms (such as gradient descent), the alternating projection method is step-size free and often converges faster empirically. Choosing an appropriate step size is one of the major challenges in gradient-based optimization algorithms. The alternating projection method is easy to implement for many practical applications because there is no step size to tune and one only needs to solve (2), which admits a closed-form solution in many cases. Typical applications include feasibility problems, where the alternating projection method has been successfully employed for solving linear and nonlinear systems of equations; see [1, 2, 3]. Alternating projection has also been widely applied to convex feasibility problems; see [4] for a comprehensive overview. In the area of image restoration, Youla et al. [5] estimated an image from its incomplete observations by recursively computing projections onto closed convex sets and provided a theoretical convergence analysis when the underlying ground-truth image lies in the intersection of these convex sets; this was further extended in [6], where the revised alternating projection method allows parallel computing and inexact projections at each step. In signal processing and inverse problems, Bauschke et al. [7] formulated the classical phase retrieval problem in the minimum Euclidean distance framework (1), and Byrne [8] presented a unified treatment of many iterative algorithms in signal processing and inverse problems from an alternating projection perspective. We refer the reader to [9] and the references therein for many other applications involving alternating projections.

Although alternating projections is known to work surprisingly well in practice, fully understanding the theoretical foundation of this phenomenon, especially the convergence behavior of these methods, remains an active research area. Our main interest is in convergence results guaranteeing that the sequence of iterates is convergent and satisfies certain optimality conditions.

### 1.1 Previous related work

Alternating projections has a long history, tracing back to John von Neumann [10], who showed that alternating projection between two closed subspaces of a Hilbert space converges globally to a point in the intersection of the two subspaces if they intersect nontrivially. Aronszajn [11] proved that the rate of convergence is linear, depending on the principal angle between the two subspaces. Bregman [12] extended alternating projection onto subspaces to projection onto closed convex sets (POCS) with similar convergence guarantees. The convergence rate of POCS is known to be linear if the relative interiors of the two convex sets intersect [13]. See [4] for a comprehensive survey on POCS. Alternating projections has also been widely utilized when the sets do not intersect. It was pointed out in [14] that when the two sets are closed and convex, alternating projections converges to a pair of points in X and Y achieving the minimum Euclidean distance between the sets.

Unlike alternating projections between convex sets, the theoretical results for alternating projections between nonconvex sets are limited. Tropp et al. [15] applied a theorem of Meyer [16] to obtain subsequence convergence results for alternating projections applied to a class of nonconvex sets onto which the orthogonal projection is unique. Certain properties of the nonconvex sets have been imposed to obtain stronger convergence results. Lewis et al. [17] utilized the notion of regularity of the intersection of the two sets. In particular, if the two sets have a linearly regular intersection and at least one set is super-regular at a common point, the alternating projection algorithm is proved to converge to this common point at a linear rate, provided that the algorithm is initialized close enough to it [17]. Recently, Drusvyatskiy et al. [18] proved that if the two sets intersect transversally at a common point and Algorithm 1 starts close enough to this common point, then the alternating projection algorithm converges linearly to it, without the assumption that one set is super-regular there.

Another method closely related to alternating projections is the Gauss-Seidel method, also known as alternating optimization or alternating minimization, which aims to solve problems similar to (1) but with a general objective function, i.e.,

 minimize_{x∈X, y∈Y} Φ(x, y).  (3)

Alternating minimization solves (3) by the same approach as in (2) (with g replaced by Φ), keeping one variable fixed and optimizing over the other. In this sense, alternating projections is a special case of alternating minimization, which has also been widely utilized in a variety of applications, such as blind deconvolution [19], interference alignment [20], image reconstruction [21], matrix completion [22], and so on. Note that although the idea of alternately updating the variables by solving the subproblems exactly is simple and heuristic, the convergence analysis for alternating minimization is far more complicated than the algorithm appears. In particular, it is not even guaranteed that the limit points of the sequence generated by the algorithm are critical points of the problem [23]. For example, Powell [24] constructed a counterexample revealing that the Gauss-Seidel method may cycle indefinitely without converging to a critical point when the problem has three variables. Thus, additional assumptions on problem (3) are required to obtain convergence guarantees for alternating minimization. The convergence of alternating minimization under a strong convexity assumption was studied in [25]. If the minimum with respect to each block of variables is unique, Bertsekas [26] showed that any limit point of the sequence generated by alternating minimization is a critical point. When both X and Y are closed convex sets and the objective function is strictly quasiconvex with respect to each variable, Grippo and Sciandrone [23] provided subsequence convergence results characterizing certain properties of the limit points of the sequence generated by alternating minimization. Csiszár and Tusnády [27] proved convergence of alternating minimization in terms of the objective function values under the assumptions of the so-called three-point property and four-point property [27]; see [28] for a comprehensive review. Several results on the convergence rate of the method for solving convex minimization problems have been established in the literature. Luo and Peng [29] established a linear rate of convergence of alternating minimization under a set of assumptions such as strong convexity with respect to each variable and a local error bound on the objective function. A sublinear convergence rate for the sequence of function values was obtained in [30, 31] under general convexity assumptions (without strong convexity).

We finally mention other closely related recent works on proximal algorithms, including proximal alternating minimization [32] and proximal alternating linearized minimization [33]. Under the assumption that the objective function satisfies the so-called Kurdyka-Łojasiewicz (KL) inequality [34, 35], the convergence of the sequence of iterates generated by proximal alternating algorithms was established in [36, 37, 32, 33] for general nonsmooth optimization problems that are not required to be convex. As pointed out by Bolte et al. [36, 37], the KL inequality is quite universal in the sense that if a function is proper, lower semi-continuous, and semi-algebraic or subanalytic, then it satisfies the KL inequality at any point in its effective domain; see also [33, Theorem 5.1]. The KL property has proved very useful for analyzing the convergence behavior of proximal-type algorithms for general nonsmooth and nonconvex problems [38, 32, 39, 33].

### 1.2 Outline and our contributions

In this paper, we provide new convergence results for alternating projections (i.e., Algorithm 1) when applied to nonconvex sets X and Y that satisfy Assumption 1 in Section 2. One of our main results (Theorem 1) in Section 2 can be summarized as follows: assume that the sets satisfy the three-point property and the local contraction property (see Assumption 1); then the sequence generated by alternating projections is convergent and converges to a critical point of (1). The underpinning fact from which the new result is established is the use of the three-point property to guarantee the asymptotic regularity of the sequence in one variable, and of the local contraction property to ensure a similar asymptotic regularity in the other variable. The sequence convergence property is then obtained by exploiting the KL property of the objective function. We complete this result by a study of the convergence rate, which depends on the explicit expression of the KL exponent characterizing the geometric properties of the problem around its critical points.

Let {(x_k, y_k)} be the sequence of iterates generated by alternating projections. We now give some insight into our proof strategy, which is of independent interest.

• (partial) sufficient decrease property: utilizing the three-point property, we find a positive constant α such that

 g(x_{k−1}, y_{k−1}) − g(x_k, y_k) ≥ α∥y_{k−1} − y_k∥_2^2,  (4)

which guarantees the asymptotic regularity of {y_k}, i.e., lim_{k→∞}∥y_k − y_{k−1}∥_2 = 0. This together with the local contraction property gives the asymptotic regularity of {x_k};

• safeguard property: find a positive constant c such that

 ∥d_k∥_2 ≤ c∥y_k − y_{k−1}∥_2,  d_k ∈ ∂f(x_k, y_k);
• using the KL property to show that the sequence {(x_k, y_k)} is a Cauchy sequence.

We note that the first two requirements are slightly different from the standard ones shared by most descent algorithms [38, 32, 33]. We use the first requirement as an example to illustrate the difference. As pointed out in [38, 32, 33], the standard sufficient decrease property has the form

 g(x_{k−1}, y_{k−1}) − g(x_k, y_k) ≥ α(∥y_{k−1} − y_k∥_2^2 + ∥x_{k−1} − x_k∥_2^2),  (5)

which is stronger than (4). The partial sufficient decrease property in (4), which depends on the iterate gap of only one variable, gives us the freedom to place different requirements on the two sets. Typical examples satisfying the three-point property (and hence (4)) include convex sets and unit spheres. The assumption of the local contraction property (see (7)) on the set X is very mild, as it basically requires that ∥x_{k+1} − x_k∥_2 is small when ∥y_k − y_{k−1}∥_2 converges to 0. On the other hand, the classical sufficient decrease property (5) depends on the iterate gaps of both variables and thus places similar requirements on both sets.
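The partial sufficient decrease (4) is easy to check numerically. The sketch below (ours, with illustrative convex sets, so that α = 1 holds by the argument of (8)) runs Algorithm 1 and verifies (4) along the whole trajectory:

```python
import numpy as np

# Illustrative sets: X = closed unit disk, Y = line {y : y[0] = 3}; both are
# closed and convex, so the three-point property holds with alpha = 1.
proj_X = lambda u: u / max(1.0, np.linalg.norm(u))
proj_Y = lambda u: np.array([3.0, u[1]])
g = lambda x, y: float(np.linalg.norm(x - y) ** 2)

y = np.array([3.0, 4.0])     # y_0 in Y
x = proj_X(y)                # x_0 in X
holds = True
for _ in range(100):
    x_next = proj_X(y)       # x_{k+1} = P_X(y_k)
    y_next = proj_Y(x_next)  # y_{k+1} = P_Y(x_{k+1})
    # partial sufficient decrease (4) with alpha = 1:
    holds = holds and (g(x, y) - g(x_next, y_next)
                       >= np.linalg.norm(y - y_next) ** 2 - 1e-12)
    x, y = x_next, y_next
```

Only the gap in y appears on the right-hand side, exactly as in (4); no decrease proportional to ∥x_{k+1} − x_k∥² is asserted.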

Unlike the convergence results in [17] and [18], which require the two sets to intersect and an initialization near the intersection, our result can be applied to any two sets, including those with an empty intersection. Checking whether two sets intersect is nontrivial; it is even harder to find an initialization close enough to the intersection. Also, as the examples in Section 3 show, it is common that the two sets do not intersect and the goal is to find a pair of points with minimum distance.

As subspaces and closed convex sets automatically satisfy the three-point property and the local contraction property (see (6) and (7)), our results cover the global sequence convergence result (with a linear convergence rate) for alternating projection onto subspaces and closed convex sets [14]. However, our proof technique differs from most existing ones for analyzing the convergence of alternating projections onto subspaces or closed convex sets [14] in that we exploit the geometric properties of the objective function around its critical points (i.e., the KL property). The KL property enables us to apply our convergence results to general closed nonconvex, semi-algebraic sets that obey the three-point property and the local contraction property. The KL property has also been utilized to address the convergence of alternating projections for general nonconvex sets in [32]. In particular, Attouch et al. [32] provided a revised version of the alternating projection method with guaranteed sequence convergence, in which each subproblem in (2) is augmented with a proximal regularizer; these proximal terms ensure the convergence of the corresponding algorithm. However, Algorithm 1 itself is widely utilized in practical applications, as it is very simple and decreases the objective function in (1) the most at each step. Thus, we stress that our main interest is to provide a convergence analysis for alternating projections, rather than new algorithms for solving (1). In particular, the sequence convergence result for Algorithm 1 under certain conditions on the sets X and Y (see Assumption 1) provides theoretical guarantees for the practical utilization of the classical alternating projections.

To illustrate the power of our convergence analysis framework, we give new convergence results for a class of concrete applications: designing structured tight frames via Algorithm 1 [15]. A tight frame is a generalization of an orthonormal basis and has wide applications in communication and signal processing. For example, an equiangular tight frame is a natural choice for sparsely representing signals, as it has low mutual coherence, and thus has been extensively utilized in sparse representation and in sensing matrix design for compressed sensing systems [40, 15, 41, 42, 43, 44]. Also, designing tight frames with prescribed column norms is crucial for direct-sequence code division multiple access (DS-CDMA) in communication [15], as it is directly related to the construction of optimal signature sequences. As stylized applications of Theorem 1, in Section 3 we provide sequence convergence results that improve upon the previous subsequence convergence result in [15] for designing structured tight frames via alternating projections.
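For concreteness, here is a minimal sketch (ours, with assumed conventions: an α-tight frame X satisfies X Xᵀ = α²I, and the structural constraint is unit-norm columns) of this style of frame design by alternating projections; the nearest α-tight frame is the rescaled polar factor of the SVD:

```python
import numpy as np

def proj_tight(G, alpha):
    # Nearest alpha-tight frame (X @ X.T = alpha^2 * I): rescale the polar
    # factor of the SVD G = U @ S @ Vt (assumes G has full row rank).
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return alpha * (U @ Vt)

def proj_unit_columns(G):
    # Nearest matrix with unit-norm columns: normalize each column.
    return G / np.maximum(np.linalg.norm(G, axis=0), 1e-12)

d, N = 3, 7
alpha = np.sqrt(N / d)  # N unit-norm columns force ||F||_F^2 = N = alpha^2 * d
rng = np.random.default_rng(0)
F = proj_unit_columns(rng.standard_normal((d, N)))
gaps = []
for _ in range(300):
    X = proj_tight(F, alpha)   # project onto the set of alpha-tight frames
    F = proj_unit_columns(X)   # project onto the unit-norm-column structure
    gaps.append(np.linalg.norm(X - F))
```

Each iterate X is exactly tight and each F has exactly unit-norm columns by construction, while the gap ∥X − F∥ is nonincreasing; the two sets need not intersect, which is precisely the regime addressed by the analysis above.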

## 2 Convergence Analysis for Alternating Projections

###### Definition 1.

[33] Let h : R^d → (−∞, +∞] be a proper and lower semi-continuous function, whose domain is defined as

 dom h := {u ∈ R^d : h(u) < ∞}.

The Fréchet subdifferential ˆ∂h of h at u is defined by

 ˆ∂h(u) = {z : liminf_{v→u, v≠u} [h(v) − h(u) − ⟨z, v − u⟩]/∥v − u∥ ≥ 0}

for any u ∈ dom h, and ˆ∂h(u) = ∅ if u ∉ dom h.

The limiting subdifferential of h at u ∈ dom h is defined as

 ∂h(u) = {z : ∃ u_k → u, h(u_k) → h(u), z_k ∈ ˆ∂h(u_k) → z}.

We say u is a limiting critical point of h if it satisfies the first-order optimality condition 0 ∈ ∂h(u). Throughout the paper, when it is clear from the context, we omit the word "limiting" and simply call ∂h(u) and u the subdifferential and a critical point of h, respectively. The following KL property characterizes the local geometric properties of the objective function around its critical points and has proved very useful for convergence analysis [38, 32, 39, 33].

###### Definition 2.

[38] A proper, lower semi-continuous function h is said to satisfy the Kurdyka-Łojasiewicz (KL) property if, for any critical point ū of h, there exist C_1 > 0, θ ∈ [0, 1), and δ > 0 such that

 |h(u) − h(ū)|^θ ≤ C_1 dist(0, ∂h(u)),  ∀ u ∈ B(ū, δ).

Here θ is often referred to as the KL exponent.

We now state the main assumption made in this paper to show the convergence of alternating projections.

###### Assumption 1.

Let X and Y be any two closed semi-algebraic sets, and let {(x_k, y_k)} be the sequence of iterates generated by the alternating projection method (i.e., Algorithm 1). Assume the sequence is bounded and the sets X and Y obey the following properties:

1. three-point property of Y: there exists a nonnegative function δ_α with α > 0 such that δ_α(y, y') ≥ α∥y − y'∥_2^2 for all y, y' ∈ Y, and

 δ_α(y_{k−1}, y_k) + g(x_k, y_k) ≤ g(x_k, y_{k−1}),  ∀ k ≥ 1;  (6)
2. local contraction property of X: there exist ϵ > 0 and β > 0 such that whenever ∥y_k − y_{k−1}∥_2 ≤ ϵ, we have

 ∥x_{k+1} − x_k∥_2 = ∥P_X(y_k) − P_X(y_{k−1})∥_2 ≤ β∥y_k − y_{k−1}∥_2.  (7)

The three-point property (6), along with a so-called four-point property, has been widely utilized for proving convergence of the sequence of function values (rather than of the iterates) generated by alternating minimization [27, 45]. As we consider convergence of the iterates, the function δ_α in (6) is slightly stronger than the one in [27, 45], where the function is only required to be positive for distinct arguments. We note that the three-point property (6) mostly characterizes a property of the set Y. A typical example satisfying this three-point property (6) is a closed convex set Y, which obeys (6) with δ_α(y, y') = ∥y − y'∥_2^2 (i.e., α = 1), since

 g(x_k, y_{k−1}) − g(x_k, y_k) = ∥x_k − y_{k−1}∥_2^2 − ∥x_k − y_k∥_2^2
  = ∥x_k − y_k + y_k − y_{k−1}∥_2^2 − ∥x_k − y_k∥_2^2
  = ∥y_k − y_{k−1}∥_2^2 + 2⟨x_k − y_k, y_k − y_{k−1}⟩
  ≥ ∥y_k − y_{k−1}∥_2^2,  (8)

where the last inequality follows from the fact that Y is a closed convex set and y_k = P_Y(x_k), so that

 ⟨x − P_Y(x), P_Y(x) − y'⟩ ≥ 0,  ∀ y' ∈ Y.  (9)

Another example is the unit sphere Y = {y : ∥y∥_2 = 1}, which satisfies (6) for any x_k that is bounded away from zero. In particular, for any x, its projection onto the unit sphere is defined as

 P_Y(x) = x/∥x∥_2 if x ≠ 0, and P_Y(x) = u if x = 0,  (10)

where u represents an arbitrary unit vector. Now, writing y = P_Y(x), for any y' ∈ Y we have

 ∥x − y'∥_2^2 − ∥x − y∥_2^2 = 2xᵀy − 2xᵀy'
  = ∥x∥_2 (2yᵀy − 2yᵀy')
  = ∥x∥_2 (yᵀy − 2yᵀy' + ∥y'∥_2^2)
  = ∥x∥_2 ∥y − y'∥_2^2,  (11)

where the first line utilizes ∥y∥_2 = ∥y'∥_2 = 1 and the second line follows from x = ∥x∥_2 y. It is clear from (11) that the unit sphere obeys the three-point property (6) for all x_k bounded away from zero. With this example, we stress that the three-point property (6) of Y is only required to hold for all possible iterates, rather than for every x.
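The identity in (11) can also be verified numerically to machine precision; the quick check below is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
y = x / np.linalg.norm(x)     # y = P_Y(x): projection onto the unit sphere
yp = rng.standard_normal(5)
yp /= np.linalg.norm(yp)      # an arbitrary point y' on the unit sphere

lhs = np.linalg.norm(x - yp) ** 2 - np.linalg.norm(x - y) ** 2
rhs = np.linalg.norm(x) * np.linalg.norm(y - yp) ** 2
# lhs and rhs agree, confirming the chain of equalities in (11)
```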

The local contraction property of X in (7) is mild: it basically requires that the projections of y_k and y_{k−1} onto X are not far apart when y_k is close enough to y_{k−1}. This property is expected to hold if we want to guarantee the convergence of alternating projections. Similarly, a typical example satisfying this local contraction property (7) is a closed convex set X, with β = 1 and ϵ an arbitrary positive number in (7):

 ∥P_X(y_k) − P_X(y_{k−1})∥_2 ≤ ∥y_k − y_{k−1}∥_2  (12)

for arbitrary y_k, y_{k−1} (not only along the algorithm trajectory). Property (12) is also known as the non-expansiveness of the orthogonal projector onto a convex set. To see this, utilizing property (9) for the convex set X, we have

 ⟨y_k − P_X(y_k), P_X(y_{k−1}) − P_X(y_k)⟩ ≤ 0,  ⟨y_{k−1} − P_X(y_{k−1}), P_X(y_k) − P_X(y_{k−1})⟩ ≤ 0.

Summing up the above two inequalities gives

 ∥P_X(y_{k−1}) − P_X(y_k)∥_2^2 ≤ ⟨y_{k−1} − y_k, P_X(y_{k−1}) − P_X(y_k)⟩,

which implies the desired non-expansiveness property (12) by applying the Cauchy-Schwarz inequality ⟨y_{k−1} − y_k, P_X(y_{k−1}) − P_X(y_k)⟩ ≤ ∥y_{k−1} − y_k∥_2 ∥P_X(y_{k−1}) − P_X(y_k)∥_2.

Other examples satisfying the local contraction property (7) include the set of tight frames, which is presented in Section 3.
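The non-expansiveness (12) can likewise be checked numerically for a simple convex set; in the sketch below (our illustration), X is the closed Euclidean unit ball:

```python
import numpy as np

def proj_ball(u):
    # Orthogonal projection onto the closed unit ball {x : ||x||_2 <= 1}.
    n = np.linalg.norm(u)
    return u if n <= 1.0 else u / n

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    a, b = rng.standard_normal(4), rng.standard_normal(4)
    # non-expansiveness (12): ||P_X(a) - P_X(b)||_2 <= ||a - b||_2
    ok = ok and (np.linalg.norm(proj_ball(a) - proj_ball(b))
                 <= np.linalg.norm(a - b) + 1e-12)
```

Note that this global contraction with β = 1 holds for arbitrary pairs of points, whereas (7) only asks for a contraction along iterates that are already close, which is what makes the assumption usable for nonconvex X.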

### 2.1 Convergence to a critical value

We transform the constrained problem (1) into the following equivalent unconstrained form:

 f(x, y) = g(x, y) + δ_X(x) + δ_Y(y),  (13)

where δ_X (resp. δ_Y) is the indicator function of the set X (resp. Y).

To simplify the notation, we stack x and y into one variable z = (x, y). Under Assumption 1, we begin by showing the convergence of {f(z_k)} and that the sequence is regular (i.e., lim_{k→∞}∥z_k − z_{k−1}∥_2 = 0) in the following result.

###### Lemma 1.

Under Assumption 1, we have the following assertions.

1. There exists a positive constant α such that

 f(z_{k−1}) − f(z_k) ≥ α∥y_{k−1} − y_k∥_2^2,  ∀ k ≥ 1,  (14)

and

 lim_{k→∞}∥y_k − y_{k−1}∥_2 = 0,  lim_{k→∞}∥x_k − x_{k−1}∥_2 = 0.  (15)
2. The sequence {f(z_k)} is monotonically decreasing and convergent.

3. Denote d_k := (2(y_{k−1} − y_k), 0) for all k ≥ 1. Then

 d_k ∈ ∂f(x_k, y_k).  (16)
###### Proof of Lemma 1.

Show 1: Utilizing the fact that δ_α(y_{k−1}, y_k) ≥ α∥y_{k−1} − y_k∥_2^2 and invoking (6) gives

 f(x_k, y_{k−1}) − f(z_k) ≥ α∥y_{k−1} − y_k∥_2^2.  (17)

Since x_k = P_X(y_{k−1}) minimizes g(x, y_{k−1}) over x ∈ X, we have

 f(x_k, y_{k−1}) ≤ f(z_{k−1}),

which together with (17) gives

 f(z_{k−1}) − f(z_k) ≥ α∥y_{k−1} − y_k∥_2^2.

Hence the function value sequence {f(z_k)} is convergent, since it is nonincreasing and bounded below by 0. Repeating (14) for all k and summing up, we have

 ∑_{k=1}^∞ ∥y_{k−1} − y_k∥_2^2 ≤ (1/α) ∑_{k=1}^∞ (f(z_{k−1}) − f(z_k)) ≤ (1/α) f(x_0, y_0),

which immediately implies

 lim_{k→∞}∥y_k − y_{k−1}∥_2 = 0.

The above equation implies that for any ϵ > 0, there exists k_1 such that ∥y_k − y_{k−1}∥_2 ≤ ϵ for all k ≥ k_1. Picking ϵ such that (7) holds, we have

 ∥x_{k+1} − x_k∥_2 ≤ β∥y_k − y_{k−1}∥_2

for all k ≥ k_1, since by assumption ∥y_k − y_{k−1}∥_2 ≤ ϵ for all k ≥ k_1. Letting k → ∞, we conclude

 lim_{k→∞}∥x_k − x_{k−1}∥_2 = 0.

Show 2: From assertion 1, we have

 f(z_{k−1}) ≥ f(z_k),  ∀ k ≥ 1,

which together with the fact that f(z_k) ≥ 0 gives that the sequence {f(z_k)} is monotonically decreasing and bounded below, hence convergent.

Show 3: By the definition of x_k = P_X(y_{k−1}), 0 must lie in the subdifferential at x_k of the function x ↦ ∥x − y_{k−1}∥_2^2 + δ_X(x). Hence

 0 ∈ 2(x_k − y_{k−1}) + ∂δ_X(x_k).  (18)

Similarly, since y_k = P_Y(x_k),

 0 ∈ ∂_y f(x_k, y_k).

Noting that

 ∂_x f(x_k, y_k) = 2(x_k − y_k) + ∂δ_X(x_k),

which together with (18) gives

 2(x_k − y_k) − 2(x_k − y_{k−1}) = 2(y_{k−1} − y_k) ∈ ∂_x f(x_k, y_k).

Thus, we have

 (2(y_{k−1} − y_k), 0) ∈ ∂f(x_k, y_k).

This completes the proof of Lemma 1. ∎

Lemma 1 ensures a sufficient decrease of the objective function after one update of x and y. However, we note that the sufficient decrease guaranteed by (14) is slightly different from the classical one in convergence analysis (as in [32]), where f(z_{k−1}) − f(z_k) ≥ α∥z_{k−1} − z_k∥_2^2 for some α > 0 is required.

Let L(z_0) denote the set of limit points of {z_k}, i.e.,

 L(z_0) := {z̄ ∈ R^n × R^n : ∃ an increasing sequence of integers {k_m}_{m∈N} such that lim_{m→∞} z_{k_m} = z̄}.

The following result establishes several properties of the limit point set L(z_0).

###### Lemma 2.

Under Assumption 1, L(z_0) obeys the following properties.

1. L(z_0) is a nonempty compact connected set, and the iterate sequence satisfies

 lim_{k→∞} dist(z_k, L(z_0)) = 0.  (19)
2. The objective function f is finite and constant on L(z_0), and

 lim_{k→∞} f(z_k) = f(z⋆),  ∀ z⋆ ∈ L(z_0).  (20)
3. Any z⋆ ∈ L(z_0) is a critical point of (13).

###### Proof of Lemma 2.

Show 1: It is clear that {z_k} has at least one convergent subsequence since, by assumption, the sequence is bounded; thus L(z_0) is nonempty. Moreover, writing L(z_0) as the intersection over q ∈ N of the closures of the tails {z_k : k ≥ q}, each of these closures is compact since the sequence is bounded, and we conclude that L(z_0) is compact by interpreting it as an intersection of compact sets. The connectedness of L(z_0) is a direct consequence of (15) and classical properties of sequences in R^n; see [33, Lemma 3.5]. Finally, (19) follows from classical properties of sequences in R^n.

Show 2: We extract an arbitrary convergent subsequence {z_{k_m}} from {z_k} with limit z⋆ = (x⋆, y⋆). Since x_{k_m} ∈ X and y_{k_m} ∈ Y for all m, it follows from the closedness of X and Y that x⋆ ∈ X, y⋆ ∈ Y, and

 δ_X(x⋆) = δ_X(x_{k_m}) = 0,  δ_Y(y⋆) = δ_Y(y_{k_m}) = 0,  ∀ m ≥ 1,

which together with the fact that g is a continuous function gives

 lim_{m→∞} f(z_{k_m}) = lim_{m→∞} [g(z_{k_m}) + δ_X(x_{k_m}) + δ_Y(y_{k_m})] = f(z⋆).

Now utilizing the assertion in Lemma 1 that the sequence {f(z_k)} is convergent, we have

 f(z⋆) = lim_{m→∞} f(z_{k_m}) = lim_{k→∞} f(z_k).

Thus the objective function f is constant on L(z_0), since z⋆ is the limit of an arbitrary convergent subsequence.

Show 3: It follows from (15) and (16) that d_k ∈ ∂f(z_k) and

 lim_{k→∞} d_k = 0.

Now for any convergent subsequence {z_{k_m}} with limit z⋆, the pair (z_{k_m}, d_{k_m}) belongs to the graph of ∂f and d_{k_m} → 0. By invoking (20) and the closedness of the graph of the limiting subdifferential ∂f, we immediately conclude that (z⋆, 0) belongs to the graph of ∂f, hence

 0 ∈ ∂f(z⋆),

which implies that any limit point of {z_k} is a critical point of (13). ∎

### 2.2 Convergence to a critical point

The following result establishes that f obeys the KL property uniformly around L(z_0).

###### Lemma 3.

There exist uniform constants C > 0, θ ∈ [0, 1), and δ > 0 such that

 |f(z) − f(z⋆)|^θ ≤ C dist(0, ∂f(z))  (21)

for any z⋆ ∈ L(z_0) and any z with dist(z, L(z_0)) ≤ δ.

###### Proof of Lemma 3.

Under the semi-algebraic assumption on the sets X and Y, we immediately conclude that the indicator functions δ_X and δ_Y are semi-algebraic. Then f satisfies the KL property at any point in its effective domain, since it is proper, lower semi-continuous, and semi-algebraic [37]. The remaining proof follows from the uniformization argument in [38] and Lemma 2. ∎

###### Theorem 1.

Under Assumption 1, the sequence {z_k} is convergent and converges to a critical point of (13).

###### Proof of Theorem 1.

Invoking (19) and Lemma 3, we know there exists k_2 such that (21) holds at z = z_k for all k ≥ k_2. Now from the concavity of the function t ↦ t^{1−θ} on [0, +∞), we have

 (f(z_{k+1}) − f(z⋆))^{1−θ} ≤ (f(z_k) − f(z⋆))^{1−θ} + (1−θ) (f(z_{k+1}) − f(z_k)) / (f(z_k) − f(z⋆))^θ.

Thus, for all k ≥ k_2,

 (f(z_k) − f(z⋆))^{1−θ} − (f(z_{k+1}) − f(z⋆))^{1−θ}
  ≥ (1−θ) (f(z_k) − f(z_{k+1})) / |f(z_k) − f(z⋆)|^θ
  ≥ (1−θ) ∥y_k − y_{k+1}∥_2^2 / (C dist(0, ∂f(x_k, y_k)))
  ≥ ((1−θ)/C) ∥y_k − y_{k+1}∥_2^2 / (2∥y_{k−1} − y_k∥_2)
  = ((1−θ)/(2C)) (∥y_k − y_{k+1}∥_2^2/∥y_{k−1} − y_k∥_2 + ∥y_{k−1} − y_k∥_2 − ∥y_{k−1} − y_k∥_2)
  ≥ ((1−θ)/(2C)) (2∥y_k − y_{k+1}∥_2 − ∥y_{k−1} − y_k∥_2),  (22)

where the third line follows from (14) and (21), the fourth line utilizes (16), and the last line uses the inequality a²/b + b ≥ 2a for a, b > 0. Repeating the above inequality for k from k_2 onwards and summing gives

 ∑_{k=k_2}^∞ ∥y_{k+1} − y_k∥_2 ≤ (2C/(1−θ)) (f(z_{k_2}) − f(z⋆))^{1−θ} + ∥y_{k_2−1} − y_{k_2}∥_2 < ∞,  (23)

which implies that the series ∑_k ∥y_{k+1} − y_k∥_2 is convergent. Thus,

 limsup_{m→∞, m_2≥m_1≥m} ∑_{k=m_1}^{m_2} ∥y_{k+1} − y_k∥_2 = 0.

By the triangle inequality we have

 limsup_{m→∞, m_2≥m_1≥m} ∑_{k=m_1}^{m_2} ∥y_{k+1} − y_k∥_2 ≥ limsup_{m→∞, m_2≥m_1≥m} ∥y_{m_2+1} − y_{m_1}∥_2,

which gives limsup_{m→∞, m_2≥m_1≥m} ∥y_{m_2+1} − y_{m_1}∥_2 = 0. Thus the sequence {y_k} is Cauchy, hence convergent.

Since lim_{k→∞}∥y_k − y_{k−1}∥_2 = 0, there exists k_1 such that ∥y_k − y_{k−1}∥_2 ≤ ϵ for all k ≥ k_1, where ϵ is the fixed constant defined in the local contraction property in Assumption 1. It then follows from (7) that

 ∥x_{k+1} − x_k∥_2 ≤ β∥y_k − y_{k−1}∥_2,  ∀ k ≥ max{k_0, k_1}.

Now invoking (23) gives

 ∑_{k=max{k_0, k_1, k_2}}^∞ ∥x_{k+1} − x_k∥_2 ≤ β ∑_{k=max{k_0, k_1, k_2}}^∞ ∥y_k − y_{k−1}∥_2 < ∞,

which (with a similar Cauchy-sequence argument as for {y_k}) implies that the sequence {x_k} is convergent. ∎

### 2.3 Convergence rate

Theorem 1 reveals that the sequence {z_k} is convergent. Given the explicit KL exponent θ in Lemma 3, we can obtain the convergence rate, i.e., how fast the sequence converges to its limit point. We note that the connection between the convergence rate and the KL exponent has been extensively exploited in [38, 32, 33]. The following result establishes the convergence rate for the sequence {z_k} based on the explicit KL exponent θ.
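As a quick empirical illustration of linear convergence (ours, not from the paper), consider alternating projections between two intersecting lines in R²: each full sweep contracts the distance to the intersection point by exactly the squared cosine of the angle between the lines.

```python
import numpy as np

ang = 0.5                                  # angle between the two lines (radians)
u = np.array([1.0, 0.0])                   # X = span{u}
v = np.array([np.cos(ang), np.sin(ang)])   # Y = span{v}
proj = lambda w, s: (s @ w) * s            # orthogonal projection onto span{s}

y = 5.0 * v
errs = []
for _ in range(20):
    x = proj(y, u)                         # x_{k+1} = P_X(y_k)
    y = proj(x, v)                         # y_{k+1} = P_Y(x_{k+1})
    errs.append(np.linalg.norm(y))         # distance to the intersection (origin)

ratios = [errs[k + 1] / errs[k] for k in range(len(errs) - 1)]
# Every ratio equals cos(ang)**2: geometric (linear) convergence.
```

This is the classical angle-dependent linear rate for subspaces [10, 11], which the KL-based rate below recovers in the θ ∈ (0, 1/2] regime.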

###### Theorem 2.

(convergence rate) Suppose the sequence {z_k} is generated by Algorithm 1 and converges to a critical point z⋆, and assume the function f obeys the KL property with KL exponent θ at this critical point z⋆. Then we have:

1. if θ = 0, then {z_k} converges to z⋆ in a finite number of steps.

2. if θ ∈ (0, 1/2], then there exist c̃ > 0, ρ ∈ [0, 1), and a positive integer k̃ such that

 ∥z_k − z⋆∥_2 ≤ c̃ ρ^k,  ∀ k ≥ k̃.
3. if θ ∈ (1/2, 1), then there exist c̄ > 0 and a positive integer k̄ such that

 ∥z_k − z⋆∥_2 ≤ c̄ k^{−(1−θ)/(2θ−1)},  ∀ k ≥ k̄.
###### Proof of Theorem 2.
¹The proof of Theorem 2 shares similar strategies with those in [38, 32, 33]. However, as we explained in Section 1.2, the sufficient decrease property (and also the safeguard property) in (4) utilized here is slightly different from the standard one in (5). Thus, we include the proof of Theorem 2.

Since the function f satisfies the KL property at z⋆ with exponent θ, there exist C > 0 and δ⋆ > 0 such that

 |f(z) − f(z⋆)|^θ ≤ C dist(0, ∂f(z)),  ∀ z ∈ B(z⋆, δ⋆).

Since z_k → z⋆, there exists a positive integer k_2 such that z_k ∈ B(z⋆, δ⋆) for all k ≥ k_2. This together with the above KL property implies

 |f(z_k) − f(z⋆)|^θ ≤ C dist(0, ∂f(z_k)),  ∀ k ≥ k_2.  (24)

In the remainder of the proof, we consider k ≥ k_2 as we utilize (24) to prove the three assertions of Theorem 2.

Show 1: In the case θ = 0, it follows from (24) that

 dist(0, ∂f(z_k)) ≥ 1/C,  if f(z_k) > f(z⋆);  dist(0, ∂f(z_k)) = 0,  if f(z_k) = f(z⋆).

Suppose at -th iteration , which implies that . This together with (14) (i.e. ) and (16) (i.e.