We consider the problem of finding the minimum Euclidean distance between two sets:
where and are two nonempty closed subsets of and are possibly nonconvex. A simple but popular approach for solving (1) is the alternating projection method (a.k.a. alternating projections), which alternately projects the iterates onto the sets and :
Here for a closed subset , represents the orthogonal projection onto , that is,
Alternating projections has been widely used in practice whenever (2) (i.e., the orthogonal projection onto the sets and ) can be solved efficiently. Compared with gradient-based local search algorithms (such as gradient descent), the alternating projection method is step-size free and has been observed to converge faster in practice. Choosing an appropriate step-size is one of the major challenges in gradient-based optimization algorithms. By contrast, the alternating projection method is easy to implement in many applications: there is no step-size to tune, and we only need to solve (2), which in many cases admits a closed-form solution. Typical applications include the system feasibility problem, where the alternating projection method has been successfully employed for solving linear and nonlinear systems of equations; see [1, 2, 3]. Alternating projections has also been widely applied to the convex feasibility problem; see  for a comprehensive view. In the area of image restoration, Youla et al. estimated an image from its incomplete observation by recursively computing projections onto closed convex sets and provided a theoretical convergence analysis for the case where the underlying ground truth image lies in the intersection of these convex sets; this was further extended in , where the revised alternating projection method allows parallel computing and inexact projection at each step. In signal processing and inverse problems, Bauschke et al.  formulated the classical phase retrieval problem in the minimum Euclidean distance framework (1), and Byrne  presented a unified treatment of many iterative algorithms in signal processing and inverse problems from an alternating projection perspective. We refer the reader to  and the references therein for many other applications involving alternating projections.
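As a concrete illustration of the step-size free iteration described above, the following minimal Python sketch alternates between two closed-form projections in the plane. The choice of sets (the unit circle, which is nonconvex, and the horizontal line at height 2, which does not intersect it), the starting point, the stopping rule, and all function names are assumptions made for this example only and are not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two points in the plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def project_circle(p):
    """Projection onto the unit circle (a nonconvex set).

    The projection is not unique at the origin; (1, 0) is an arbitrary choice.
    """
    n = math.hypot(p[0], p[1])
    if n == 0.0:
        return (1.0, 0.0)
    return (p[0] / n, p[1] / n)

def project_line(p, height=2.0):
    """Projection onto the horizontal line {(t, height)} (a convex set)."""
    return (p[0], height)

def alternating_projections(x0, max_iters=200, tol=1e-12):
    """Alternate between the two projections until the circle iterate stalls."""
    x = project_circle(x0)
    for _ in range(max_iters):
        y = project_line(x)
        x_next = project_circle(y)
        moved = dist(x_next, x)
        x = x_next
        if moved < tol:
            break
    return x, project_line(x)

x_star, y_star = alternating_projections((3.0, 0.5))
gap = dist(x_star, y_star)  # distance between the two limit points
```

In this toy instance the iterates approach the pair (0, 1) and (0, 2), which attains the minimum distance 1 between the two sets. No step-size appears anywhere in the loop; each update is a single closed-form projection.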
Although alternating projections is known to work surprisingly well in practice, fully understanding the theoretical foundation of this phenomenon, especially the convergence behavior of these methods, remains an active research area. Our main interest is a convergence result guaranteeing that the sequence of iterates is convergent and satisfies certain optimality conditions.
1.1 Previous related work
Alternating projections has a long history that can be traced back to John von Neumann , who showed that alternating projection between two closed subspaces of a Hilbert space is guaranteed to converge globally to a point in the intersection of the two subspaces, if they intersect non-trivially. Aronszajn  proved that the rate of convergence is linear and depends on the principal angle between the two subspaces. Bregman  extended alternating projection onto subspaces to projection onto closed convex sets (POCS) with a similar convergence guarantee. The convergence rate of POCS is known to be linear if the relative interiors of the two convex sets intersect . See  for a comprehensive survey on POCS. Alternating projections has also been widely utilized when the sets do not intersect. It has been pointed out in  that, when the two sets are closed and convex, alternating projections converges to a pair of points in and that attain the minimum Euclidean distance.
Unlike alternating projections between convex sets, the theoretical results for alternating projections between nonconvex sets are limited. Tropp et al.  applied the theorem of Meyer  to obtain subsequence convergence results for alternating projections applied to a class of nonconvex sets onto which the orthogonal projection is unique. Certain properties have been imposed on the nonconvex sets to obtain stronger convergence results. Lewis et al.  utilized the notion of regularity of the intersection between the two sets. In particular, if the two sets have a linearly regular intersection and at least one set is super-regular at a common point in the intersection, the alternating projection algorithm is proved to converge to this common point at a linear rate, provided that the algorithm is initialized at a point that is close enough to this common point . Recently, Drusvyatskiy et al.  proved that if the two sets intersect transversally at a common point and Algorithm 1 starts at a point close enough to this common point, then the alternating projection algorithm converges linearly to this common point without the assumption that one set is super-regular at the common point.
Another method closely related to alternating projections is the Gauss-Seidel method, also known as alternating optimization or alternating minimization, which aims to solve problems similar to (1) but with general objective functions, i.e.,
Alternating minimization solves (3) with the same approach as in (2) (by replacing by ), keeping one variable fixed while optimizing the other. In this sense, alternating projections is a special case of alternating minimization, which has also been widely utilized in a variety of applications, such as blind deconvolution , interference alignment , image reconstruction , matrix completion , and so on. Note that although the idea of alternately updating the variables by solving the subproblems exactly is simple and heuristic, the convergence analysis for alternating minimization is far more complicated than the algorithm appears. In particular, it is not even guaranteed that alternating minimization converges in the sense that the limit points of the sequence generated by the algorithm are critical points of the problem. For example, Powell  constructed a counterexample revealing that the Gauss-Seidel method may cycle indefinitely without converging to a critical point when the problem has three variables. Thus, additional properties of problem (3) are required to obtain convergence guarantees for alternating minimization. The convergence of alternating minimization under a strong convexity assumption was studied in . If the minimum with respect to each block of variables is unique, Bertsekas  showed that any limit point of the sequence generated by alternating minimization is a critical point. When both and are closed convex sets and the objective function is strictly quasiconvex with respect to each variable, Grippo and Sciandrone  provided subsequence convergence results that characterize certain properties of the limit points of the sequence generated by alternating minimization. Csiszár and Tusnády  established the convergence of alternating minimization in terms of the objective function values under the assumptions of the so-called three-point property and four-point property ; see  for a comprehensive review. Several results on the convergence rate of the method for solving convex minimization problems have been established in the literature. Luo and Peng  established a linear rate of convergence of alternating minimization under a set of assumptions such as strong convexity with respect to each variable and a local error bound of the objective function. A sublinear convergence rate for the sequence of function values was obtained in [30, 31] under general convexity assumptions (without strong convexity).
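To make the Gauss-Seidel (alternating minimization) scheme concrete, here is a minimal Python sketch on a toy strongly convex quadratic. The objective f(x, y) = (x - 1)^2 + (y - 2)^2 + x*y, the starting point, and the function name are hypothetical choices for illustration and do not come from the paper. Each subproblem is solved exactly by setting the corresponding partial derivative to zero.

```python
def alternating_minimization(x, y, num_iters=50):
    """Exact block-coordinate minimization of f(x, y) = (x-1)^2 + (y-2)^2 + x*y.

    This toy objective is an assumption for illustration only.
    """
    for _ in range(num_iters):
        # Minimize over x with y fixed: 2(x - 1) + y = 0.
        x = 1.0 - y / 2.0
        # Minimize over y with x fixed: 2(y - 2) + x = 0.
        y = 2.0 - x / 2.0
    return x, y

x_min, y_min = alternating_minimization(5.0, -3.0)
```

For this objective the unique critical point is (0, 2), and the error in y shrinks by a factor of 1/4 per sweep, which is consistent with the linear rates reported under strong convexity.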
Finally, we mention other closely related recent work on proximal algorithms, including proximal alternating minimization  and proximal alternating linearized minimization . Under the assumption that the objective function satisfies the so-called Kurdyka-Łojasiewicz (KL) inequality [34, 35], the convergence of the sequence of iterates generated by the proximal alternating algorithms was established in [36, 37, 32, 33] for general nonsmooth optimization problems that are not required to be convex. As pointed out by Bolte et al. [36, 37], the KL inequality is quite universal in the sense that if a function is proper, lower semi-continuous and semi-algebraic or subanalytic, the function satisfies the KL inequality at any point in its effective domain; see also [33, Theorem 5.1]. The KL property has proved to be very useful for analyzing the convergence behavior of proximal-type algorithms for general nonsmooth and nonconvex problems [38, 32, 39, 33].
1.2 Outline and our contributions
In this paper, we provide new convergence results for alternating projections (i.e., Algorithm 1) when applied to nonconvex sets and that satisfy Assumption 1 in Section 2. One of our main results (Theorem 1) in Section 2 can be summarized as follows: Assume that the sets satisfy the three-point property and the local contraction property (see Assumption 1). Then the sequence generated by alternating projections is convergent and converges to a critical point of (1). The key to the new result is to use the three-point property to guarantee the asymptotic regularity of the sequence in terms of one variable, and the local contraction property to ensure a similar asymptotic regularity of the sequence in terms of the other variable. The sequence convergence property is then obtained by exploiting the KL property of the objective function. We complete this result with a study of the convergence rate, which depends on the explicit expression for the KL exponent characterizing the geometrical properties of the problem around its critical points.
Let be the sequence of iterates generated by the alternating projections. We now give some insights into our proof strategy which is of independent interest.
(partial) sufficient decrease property: Utilizing the three-point property, we find a positive constant such that
which guarantees the asymptotic regular property of , i.e., . This together with the local contraction property gives the asymptotic regular property of ;
safeguard property: find a positive constant such that
using the KL property to show that the sequence is a Cauchy sequence.
We note that the first two requirements are slightly different from the standard ones that are shared by most descent algorithms [38, 32, 33]. We use the first requirement as an example to illustrate the difference. As pointed out in [38, 32, 33], the standard sufficient decrease property has the form
which is stronger than (4). The partial sufficient decrease property in (4), which depends on the iterate gap of only one variable, gives us the freedom to impose different requirements on the two sets. Typical examples of sets satisfying the three-point property (and hence (4)) include convex sets and unit spheres. The assumption of the local contraction property (see (7)) on the set is very mild as it basically requires that is small when converges to 0. On the other hand, the classical sufficient decrease property (5) depends on the iterate gaps of both variables and thus imposes similar requirements on both sets.
Unlike the convergence results in  and , which require that the two sets intersect and that the initialization is near the intersection, our result also applies when the two sets have an empty intersection. Checking whether the two sets intersect is non-trivial; it is even harder to find a proper initialization that is close enough to the intersection. Also, as the examples in Section 3 show, it is common that the two sets do not intersect and the goal is to find a pair of points that attain the minimum distance.
As subspaces and closed convex sets automatically satisfy the three-point property and the local contraction property (see (6) and (7)), our results cover the global sequence convergence result (with a linear convergence rate) for alternating projection onto subspaces and closed convex sets . However, our proof technique differs from most existing ones for analyzing the convergence of alternating projections onto subspaces or closed convex sets  in that we exploit the geometric properties of the objective function around its critical points (i.e., the KL property). The KL property enables us to apply our convergence results to general closed nonconvex, semi-algebraic sets that obey the three-point property and the local contraction property. The KL property has also been utilized to address the convergence issue of alternating projections for general nonconvex sets and in . In particular, Attouch et al.  provided a revised version of the alternating projection method with guaranteed sequence convergence. With a proximal regularizer, and are updated  respectively by and with rather than as in Algorithm 1. The proximal regularizers and ensure the convergence of the corresponding algorithm. However, Algorithm 1 is widely utilized in practical applications as it is a very simple algorithm and achieves the largest decrease of the objective function in (1) at each step. Thus, we stress that our main interest is to provide a convergence analysis for alternating projections, rather than to provide new algorithms for solving (1). In particular, the sequence convergence result for Algorithm 1 under certain conditions on the sets and (see Assumption 1) provides theoretical guarantees for the practical use of the naive or classical alternating projections.
To illustrate the power of our convergence analysis framework, we give new convergence results for a class of concrete applications: designing structured tight frames via Algorithm 1 . A tight frame is a generalization of an orthonormal basis and has wide applications in communication and signal processing. For example, an equiangular tight frame is a natural choice for sparsely representing signals as it has low mutual coherence, and thus has been extensively utilized in sparse representation and sensing matrix design for compressed sensing systems [40, 15, 41, 42, 43, 44]. Also, designing tight frames with prescribed column norms is crucial for direct sequence-code division multiple access (DS-CDMA) in communication  as it is directly related to the construction of the optimal signature sequences. As stylized applications of Theorem 1, in Section 3, we provide sequence convergence results that improve upon the previous subsequence convergence result in  for designing structured tight frames via alternating projections.
2 Convergence Analysis for alternating projections
We start with some important definitions.
Let be a proper and lower semi-continuous function, whose domain is defined as
The Fréchet subdifferential of at is defined by
for any and if .
The limiting subdifferential of at is defined as follows
We say that is a limiting critical point of if it satisfies the first-order optimality condition . Throughout the paper, when it is clear from the context, we omit the word “limiting” and just call and the subdifferential and a critical point of , respectively. The following KL property characterizes the local geometric properties of the objective function around its critical points and has proved to be very useful for convergence analysis [38, 32, 39, 33].
 A proper lower semi-continuous function is said to satisfy the Kurdyka-Łojasiewicz (KL) property if, for any critical point of , there exist such that
Here is often referred to as the KL exponent.
We now state the main assumption used in this paper to show the convergence of alternating projections.
Let and be any two closed semi-algebraic sets, and let be the sequence of iterates generated by the alternating projection method (i.e., Algorithm 1). Assume the sequence is bounded and the sets and obey the following properties:
three-point property of : there exists a nonnegative function with such that and
local contraction property of : there exist and such that when , we have
This three-point property (6), along with a so-called four-point property, has been widely utilized for proving the convergence of the sequence (rather than the iterates ) generated by alternating minimization [27, 45]. As we consider the convergence of the iterates, the requirement on the function in (6) is slightly stronger than the one in [27, 45], where the function is only required to be positive, i.e., for all and . We note that the three-point property (6) mostly characterizes a certain property of the set . A typical example satisfying this three-point property (6) is a closed convex set, which obeys (6) for any with since
where the last inequality follows from the fact that is a closed convex set such that
Another example is the unit sphere which satisfies (6) for any that is not zero. In particular, for any , its projection onto is defined as
represents an arbitrary unit vector. Now by defining, for any , we have
where the first line utilizes and the second line follows from . It is clear from (11) that the set obeys the three-point property (6) for all that are bounded away from zero. With this example, we stress that the three-point property (6) of is only required to hold for all possible iterates rather than for any .
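The unit-sphere example can be checked numerically. The exact display (11) is not reproduced in this text, so the sketch below verifies one standard form of the underlying identity, derived only from the two facts used above (points of the sphere have unit norm, and the projection of a nonzero y is y/||y||): ||x - y||^2 - ||P(y) - y||^2 = ||y|| * ||x - P(y)||^2 for every x on the sphere. The symbol names, the dimension, and the sampling ranges are assumptions for illustration.

```python
import math
import random

def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def sub(a, b):
    """Componentwise difference a - b."""
    return [s - t for s, t in zip(a, b)]

def project_sphere(y):
    """Projection of a nonzero vector onto the unit sphere."""
    n = norm(y)
    return [t / n for t in y]

rng = random.Random(0)
max_residual = 0.0
for _ in range(1000):
    # A random point x on the unit sphere in R^3.
    x = project_sphere([rng.gauss(0.0, 1.0) for _ in range(3)])
    # A random point y to be projected; skip points too close to the origin.
    y = [rng.uniform(-2.0, 2.0) for _ in range(3)]
    if norm(y) < 1e-6:
        continue
    p = project_sphere(y)
    lhs = norm(sub(x, y)) ** 2 - norm(sub(p, y)) ** 2
    rhs = norm(y) * norm(sub(x, p)) ** 2
    max_residual = max(max_residual, abs(lhs - rhs))
```

The factor ||y|| on the right-hand side is what degenerates as y approaches zero, which matches the remark that the property holds only for points bounded away from zero.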
The local contraction property of in (7) is mild: it basically requires that the projections of and onto are not far apart when is close enough to . This property is expected to hold if we want to guarantee the convergence of alternating projections. Similarly, a typical example satisfying this local contraction property (7) is a closed convex set with and an arbitrary positive number in (7):
for arbitrary (not only the algorithm trajectory). (12) is also known as the nonexpansiveness property of the orthogonal projector onto a convex set. To see this, utilizing the property (9) for the convex set , , we have
Summing up the above two inequalities gives
which implies the desired nonexpansiveness property (12) by applying the Cauchy-Schwarz inequality that .
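Both convex-set facts used above can be sanity-checked on a simple example. The sketch below uses a box in R^3 (a hypothetical choice of closed convex set, with a closed-form projection by clipping) and tests two standard inequalities: the Pythagorean-type inequality ||x - y||^2 >= ||x - P(y)||^2 + ||P(y) - y||^2 for x in the set, which is one common way to express the three-point property for convex sets, and the nonexpansiveness ||P(y) - P(z)|| <= ||y - z||. The exact displays (6) and (12) are not visible in this text, so these forms are assumptions.

```python
import random

def project_box(y, lo=-1.0, hi=1.0):
    """Projection onto the box [lo, hi]^n, computed by clipping each coordinate."""
    return [min(max(t, lo), hi) for t in y]

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((s - t) ** 2 for s, t in zip(a, b))

rng = random.Random(0)
violations = 0
for _ in range(1000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # a point inside the box
    y = [rng.uniform(-3.0, 3.0) for _ in range(3)]  # arbitrary points
    z = [rng.uniform(-3.0, 3.0) for _ in range(3)]
    py, pz = project_box(y), project_box(z)
    # Pythagorean-type (three-point) inequality for a convex set.
    if sq_dist(x, y) < sq_dist(x, py) + sq_dist(py, y) - 1e-12:
        violations += 1
    # Nonexpansiveness of the projection (compared in squared form).
    if sq_dist(py, pz) > sq_dist(y, z) + 1e-12:
        violations += 1
```

The small tolerance only absorbs floating-point rounding; in exact arithmetic both inequalities hold for every sample.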
2.1 Convergence to a critical value
We transform the constrained problem into the following equivalent form without any constraints:
where (and ) is the indicator function of the set (and ).
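The unconstrained reformulation can be sketched in Python as follows. The indicator function of a set is 0 on the set and +infinity outside it, so adding the two indicators to the distance term reproduces the constraints. The concrete sets (unit circle and the line at height 2), the membership tolerances, the use of the squared distance without a scaling factor, and all names are assumptions for this example, since the display (13) is not reproduced here.

```python
import math

def in_circle(p, tol=1e-9):
    """Membership test for the unit circle, up to a numerical tolerance."""
    return abs(math.hypot(p[0], p[1]) - 1.0) <= tol

def on_line(p, height=2.0, tol=1e-9):
    """Membership test for the horizontal line {(t, height)}."""
    return abs(p[1] - height) <= tol

def indicator(is_member):
    """Indicator function value: 0 on the set, +infinity outside it."""
    return 0.0 if is_member else math.inf

def penalized_objective(x, y):
    """Squared distance plus the indicator functions of the two sets."""
    squared_distance = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    return squared_distance + indicator(in_circle(x)) + indicator(on_line(y))

feasible_value = penalized_objective((0.0, 1.0), (0.0, 2.0))
infeasible_value = penalized_objective((0.0, 0.5), (0.0, 2.0))
```

A feasible pair returns the plain squared distance, while any pair that leaves either set returns +infinity, so minimizing this function over all pairs is equivalent to the constrained problem.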
To simplify the notation, we stack and into one variable as . Under Assumption 1, we begin by showing the convergence of and that the sequence is regular (i.e., ) in the following result.
Under Assumption 1 and , we have the following assertions.
We have for some positive
The sequence is monotonically decreasing and convergent.
Denote by for all . Then
Proof of Lemma 1.
Show : Utilizing the fact that and invoking (6) gives
It follows from the fact that
which together with (17) gives
Hence the function value sequence is convergent since . Repeating (14) for all and summing them up, we have
which immediately implies
The above equation implies that for any , there exists such that . Picking such that (7) holds, we have
for all since by assumption for all . Letting , we conclude
Show : from the statement , we have
which together with the fact that gives that the sequence is monotonically decreasing and lower bounded, hence convergent.
Lemma 1 ensures a sufficient decrease of the objective function after one update of and . However, we note that the sufficient decrease guaranteed by (14) is slightly different from the classical one in convergence analysis (like in ), where for some is required.
Let denote the set of limit points of , i.e.,
The following result establishes several properties of the limit points set .
Proof of Lemma 2.
Show : It is clear that has at least one convergent subsequence since by assumption the sequence is bounded. Also we have . Since lies in a closed set and is bounded, the set is compact for any . We conclude that is compact since it is an intersection of compact sets. The connectedness of is a direct consequence of and some classical properties of sequences in ; see [33, Lemma 3.5]. Finally, (19) follows from classical properties of sequences in .
Show : we extract an arbitrary convergent subsequence from with limit . Since we have and for all , it follows from the closedness of and that and
which together with the fact that is a continuous function gives
Now utilizing the statement in creftypecap 1 that the sequence is convergent, we have
Thus the objective function is constant on since is the limit point of any convergent subsequence.
Now for any convergent subsequence with limit , we have that belongs to the graph of and . By invoking (20) and the definition of , we immediately conclude that belongs to the graph of , hence
which implies that any limit point of is a critical point for (1).
2.2 Convergence to a critical point
The following result establishes that obeys the KL property at .
There exist uniform constants and such that
for any and with .
Proof of Lemma 3.
Under the semi-algebraic assumption of sets and , we immediately conclude that the indicator functions and are semi-algebraic. We then have satisfies the KL property at any point in its effective domain, since it is lower semi-continuous and semi-algebraic . The remaining proof follows from creftypecap 1 in  and creftypecap 2. ∎
Proof of Theorem 1.
Invoking (19), we know there exists such that for all . Now from the concavity of the function with domain , we have
Thus, for all ,
which implies that the series is convergent. Thus,
From the triangle inequality we have
which gives that . Thus the sequence is Cauchy, hence it is convergent.
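The finite-length argument (summable step lengths imply a Cauchy sequence) can be observed numerically on a toy instance. The sketch below records the step lengths of the circle iterate when alternating between the unit circle and the line at height 2; the sets, the starting point, and the variable names are assumptions for illustration and are not the paper's examples.

```python
import math

def project_circle(p):
    """Projection of a nonzero point onto the unit circle."""
    n = math.hypot(p[0], p[1])
    return (p[0] / n, p[1] / n)

def project_line(p, height=2.0):
    """Projection onto the horizontal line {(t, height)}."""
    return (p[0], height)

x = project_circle((3.0, 0.5))
steps = []
for _ in range(60):
    # One full sweep: project onto the line, then back onto the circle.
    x_next = project_circle(project_line(x))
    steps.append(math.hypot(x_next[0] - x[0], x_next[1] - x[1]))
    x = x_next

total_length = sum(steps)       # length of the whole trajectory
tail_length = sum(steps[30:])   # contribution of the later iterations
ratio = steps[20] / steps[19]   # empirical contraction factor
```

In this instance the step lengths decay geometrically with a ratio close to 1/2, so the partial sums of the step lengths stabilize quickly and the tail after 30 sweeps is negligible, which is the behavior the Cauchy argument relies on.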
2.3 Convergence rate
Theorem 1 reveals that the sequence is convergent. Given the explicit KL exponent in Lemma 3, we can obtain a convergence rate describing how fast the sequence converges to its limit point. We note that the connection between the convergence rate and the KL exponent has been widely exploited in [38, 32, 33]. The following result establishes the convergence rate for the sequence based on the explicit KL exponent .
(convergence rate) Suppose the sequence is generated by Algorithm 1 and converges to a critical point , and assume the function obeys the KL property with the KL exponent at this critical point . Then we have
if , converges to in a finite number of steps.
if , then there exist a , and a positive integer such that
if , then there exist a and a positive integer such that
Proof of Theorem 2. (Footnote 1: The proof of Theorem 2 follows strategies similar to those in [38, 32, 33]. However, as we explained in Section 1.2, the sufficient decrease property (and also the safeguard property) in (4), which is utilized here, is slightly different from the standard one in (5). Thus, we include the proof of Theorem 2.)
Since the function satisfies the KL property at , there exist and such that
It follows from the fact that there exists a positive integer such that for all . This together with the above KL property implies