1 Introduction
Large-scale nonsmooth and nonconvex optimization problems are ubiquitous in machine learning and data analysis, and tremendous efforts have thus been directed at designing efficient algorithms for solving them. One popular class of algorithms is the class of first-order methods. These methods are noted for their simplicity, ease of implementation and relatively (often surprisingly) good performance; some notable examples include the proximal gradient algorithm, inertial proximal algorithms and the alternating direction method of multipliers. Due to the excellent performance and wide applicability of first-order methods, their convergence behavior has been extensively studied in recent years; see, for example,
[1, 2, 3, 10, 19, 20, 24, 26, 35] and references therein. Analyzing the convergence rate of first-order methods is an important step towards a better understanding of existing algorithms, and is also crucial for developing new optimization models and numerical schemes. As demonstrated in [2, Theorem 3.4], the convergence behavior of many first-order methods can be understood using the celebrated Kurdyka–Łojasiewicz (KL) property and its associated KL exponent; see Definitions 2.2 and 2.3. The KL property and its associated exponent have their roots in algebraic geometry, and they describe a qualitative relationship between the value of a suitable potential function (depending on the optimization model and the algorithm being considered) and some first-order information (gradient or subgradient) of the potential function. The KL property has been applied to analyzing the local convergence rates of various first-order methods for a wide variety of problems by many researchers; see, for example, [2, 16, 24, 43]. In these studies, a prototypical theorem on convergence rate takes the following form:
Prototypical result on convergence rate. For a certain algorithm of interest, consider a suitable potential function. Suppose that the potential function satisfies the KL property with an exponent of $\alpha\in[0,1)$, and that $\{x^k\}$ is a bounded sequence generated by the algorithm. Then the following results hold.

If $\alpha=0$, then $\{x^k\}$ converges finitely.

If $\alpha\in(0,\frac12]$, then $\{x^k\}$ converges locally linearly.

If $\alpha\in(\frac12,1)$, then $\{x^k\}$ converges locally sublinearly.
While this kind of convergence result is prominent and theoretically powerful, for the results to be fully informative, one has to be able to estimate the KL exponent. Moreover, in order to guarantee a local linear convergence rate, it is desirable to be able to determine whether a given model has a KL exponent of at most $\frac12$, or be able to construct a new model whose KL exponent is at most $\frac12$ if the old one does not have the desired KL exponent. However, as noted in [31, Page 63, Section 2.1], the KL exponent of a given function is often extremely hard to determine or estimate. There are only a few results available in the literature concerning the explicit KL exponent of a function. One scenario where an explicit estimate of the KL exponent is known is when the function can be expressed as the maximum of finitely many polynomials. In this case, it has been shown in [23, Theorem 3.3] that the KL exponent can be estimated explicitly in terms of the dimension of the underlying space and the maximum degree of the involved polynomials. However, the derived estimate grows rapidly with the dimension of the problem, and so leads to a rather weak sublinear convergence rate. Only recently has a dimension-independent KL exponent of convex piecewise linear-quadratic functions become known, thanks to [7, Theorem 5], which connects the KL property with the concept of error bound (a notion different from the Luo–Tseng error bound to be discussed in Definition 2.1) for convex functions. In addition, a KL exponent of $\frac12$ was only very recently established in [27] for a class of quadratic optimization problems with matrix variables satisfying orthogonality constraints. Nevertheless, the KL exponents of many common optimization models, such as the least squares problem with smoothly clipped absolute deviation (SCAD) regularization [14] or minimax concave penalty (MCP) regularization [45] and the logistic regression problem with $\ell_1$ regularization [39], are still unknown to the best of our knowledge. In this paper, we attempt to further address the problem of determining the explicit KL exponents of optimization models, especially those that arise in practical applications.
The main contributions of this paper are rules for computing explicitly the KL exponent of many (convex or nonconvex) optimization models that arise in applications such as statistical machine learning. We accomplish this via two different means: studying calculus rules and building connections with the concept of Luo–Tseng error bound; see Definition 2.1. The Luo–Tseng error bound has been used for establishing local linear convergence of various first-order methods, and was shown to hold for a wide range of problems; see, for example, [28, 29, 30, 40, 41, 46] for details. This concept is different from the error bound studied in [7] because the Luo–Tseng error bound is defined for specially structured optimization problems and involves first-order information, while the error bound studied in [7] does not explicitly involve any first-order information. The different nature of these two concepts was also noted in [7, Section 1], in which the Luo–Tseng error bound was referred to as a "first-order error bound".
In this paper, we first study various calculus rules for the KL exponent. For example, we deduce the KL exponent of the minimum of finitely many KL functions, the KL exponent of the Moreau envelope of a convex KL function, and the KL exponent of a convex objective from its Lagrangian relaxation, under suitable assumptions. This is the content of Section 3. These rules are useful in our subsequent analysis of the KL exponent of concrete optimization models that arise in applications. Next, we show that if the Luo–Tseng error bound holds and a mild assumption on the separation of stationary values is satisfied, then the function is a KL function with an exponent of $\frac12$. This is done in Section 4. Upon making this connection, we can take advantage of the relatively better studied concept of Luo–Tseng error bound, which is known to hold for a wide range of concrete optimization problems; see, for example, [28, 29, 30, 40, 41, 46]. Hence, in Section 5, building upon the calculus rules and the connection with the Luo–Tseng error bound, we show that many optimization models that arise in applications such as sparse recovery have objectives whose KL exponent is $\frac12$; this covers the least squares problem with smoothly clipped absolute deviation (SCAD) [14] or minimax concave penalty (MCP) [45] regularization, and the logistic regression problem with $\ell_1$ regularization [39]. In addition, we also illustrate how our results can be used for establishing linear convergence of some first-order methods, such as the proximal gradient algorithm and the inertial proximal algorithm [35] with constant step-sizes. Finally, we present some concluding remarks in Section 6.
2 Notation and preliminaries
In this paper, we use $\mathbb{R}^n$ to denote the $n$-dimensional Euclidean space, equipped with the standard inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. The closed ball centered at $\bar x\in\mathbb{R}^n$ with radius $r>0$ is denoted by $B(\bar x,r)$. We denote the nonnegative orthant by $\mathbb{R}^n_+$, and the set of $n\times n$ symmetric matrices by $\mathcal{S}^n$. For a vector $x\in\mathbb{R}^n$, we use $\|x\|_1$ to denote the $\ell_1$ norm and $\|x\|_0$ to denote the number of entries in $x$ that are nonzero (the "$\ell_0$ norm"). For a (nonempty) closed set $C\subseteq\mathbb{R}^n$, the indicator function $\delta_C$ is defined as
$$\delta_C(x):=\begin{cases}0 & \text{if } x\in C,\\ \infty & \text{otherwise.}\end{cases}$$
In addition, we denote the distance from an $x\in\mathbb{R}^n$ to $C$ by $\mathrm{dist}(x,C):=\inf_{y\in C}\|x-y\|$, and the set of points in $C$ that achieve this infimum (the projection of $x$ onto $C$) is denoted by $\mathrm{Proj}_C(x)$. The set $\mathrm{Proj}_C(x)$ becomes a singleton if $C$ is a closed convex set. Finally, we write $\mathrm{ri}\,C$ to represent the relative interior of a closed convex set $C$.
For an extended-real-valued function $f:\mathbb{R}^n\to[-\infty,\infty]$, the domain is defined as $\mathrm{dom}\,f:=\{x\in\mathbb{R}^n:\ f(x)<\infty\}$. Such a function is called proper if it is never $-\infty$ and its domain is nonempty, and is called closed if it is lower semicontinuous. For a proper function $f$, we let $z\xrightarrow{f}x$ denote $z\to x$ and $f(z)\to f(x)$. The regular subdifferential of a proper function $f$ [38, Page 301, Definition 8.3(a)] at $x\in\mathrm{dom}\,f$ is given by
$$\hat\partial f(x):=\left\{\zeta\in\mathbb{R}^n:\ \liminf_{z\to x,\ z\neq x}\frac{f(z)-f(x)-\langle\zeta,z-x\rangle}{\|z-x\|}\ge 0\right\}.$$
The (limiting) subdifferential of a proper function $f$ [38, Page 301, Definition 8.3(b)] at $x\in\mathrm{dom}\,f$ is then defined by
$$\partial f(x):=\left\{\zeta\in\mathbb{R}^n:\ \exists\, x^k\xrightarrow{f}x\ \text{and}\ \zeta^k\to\zeta\ \text{with}\ \zeta^k\in\hat\partial f(x^k)\ \text{for all}\ k\right\}. \quad (1)$$
By convention, if $x\notin\mathrm{dom}\,f$, then $\hat\partial f(x)=\partial f(x):=\emptyset$. We also write $\mathrm{dom}\,\partial f:=\{x\in\mathbb{R}^n:\ \partial f(x)\neq\emptyset\}$. It is well known that when $f$ is continuously differentiable, the subdifferential (1) reduces to the gradient of $f$ denoted by $\nabla f$; see, for example, [38, Exercise 8.8(b)]. Moreover, when $f$ is convex, the subdifferential (1) reduces to the classical subdifferential in convex analysis; see, for example, [38, Proposition 8.12]. The limiting subdifferential enjoys rich and comprehensive calculus rules and has been widely used in nonsmooth and nonconvex optimization [33, 38]. We also define the limiting (resp. regular) normal cone of a closed set $C$ at $x\in C$ as $N_C(x):=\partial\delta_C(x)$ (resp. $\hat N_C(x):=\hat\partial\delta_C(x)$), where $\delta_C$ is the indicator function of $C$. A closed set $C$ is called regular at $x\in C$ if $N_C(x)=\hat N_C(x)$ (see [38, Definition 6.4]), and a proper closed function $f$ is called regular at $x\in\mathrm{dom}\,f$ if its epigraph is regular at the point $(x,f(x))$ (see [38, Definition 7.25]). Finally, we say that $\bar x$ is a stationary point of a proper closed function $f$ if $0\in\partial f(\bar x)$. It is known that any local minimizer of $f$ is a stationary point; see, for example, [38, Theorem 10.1].
For a proper closed convex function $g:\mathbb{R}^n\to(-\infty,\infty]$, the proximal mapping $\mathrm{prox}_g$ at any $v\in\mathbb{R}^n$ is defined as
$$\mathrm{prox}_g(v):=\operatorname*{arg\,min}_{u\in\mathbb{R}^n}\left\{g(u)+\frac12\|u-v\|^2\right\},$$
where $\operatorname{arg\,min}$ denotes the unique minimizer of the above optimization problem. (This problem has a unique minimizer because the objective is proper closed and strongly convex. For a general optimization problem, we use $\operatorname{Arg\,min}$ to denote the set of minimizers, which may be empty, a singleton or may contain more than one point.) This mapping is nonexpansive, i.e., for any $u,v\in\mathbb{R}^n$, we have
$$\|\mathrm{prox}_g(u)-\mathrm{prox}_g(v)\|\le\|u-v\|; \quad (2)$$
see, for example, [36, Page 340]. Moreover, it is routine to show that $v=\mathrm{prox}_g(v)$ if and only if $0\in\partial g(v)$.
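As a concrete illustration (our own sketch, not part of the paper's development), for $g=\lambda\|\cdot\|_1$ the proximal mapping has the well-known soft-thresholding closed form, and the nonexpansiveness property (2) can be spot-checked numerically:

```python
import numpy as np

def prox_l1(v, lam):
    # Proximal mapping of g = lam*||.||_1 at v: the unique minimizer of
    # u -> lam*||u||_1 + 0.5*||u - v||^2, given by componentwise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# Soft-thresholding shrinks each entry toward zero by lam.
assert np.allclose(prox_l1(np.array([3.0, -0.5, 1.2]), 1.0), [2.0, 0.0, 0.2])

# Nonexpansiveness, cf. (2): ||prox_g(u) - prox_g(v)|| <= ||u - v||.
rng = np.random.default_rng(0)
u, v = rng.standard_normal(5), rng.standard_normal(5)
assert np.linalg.norm(prox_l1(u, 0.5) - prox_l1(v, 0.5)) <= np.linalg.norm(u - v)
```

The fixed-point characterization above is also visible here: $\mathrm{prox}_g(v)=v$ forces every entry of $v$ to satisfy $0\in\partial(\lambda|\cdot|)(v_i)$, i.e., $v=0$ when $\lambda>0$.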
The following property is defined for proper closed functions of the form $f=h+P$, where $h$ is a proper closed function with an open domain $\mathrm{dom}\,h$, $h$ is continuously differentiable with a locally Lipschitz continuous gradient on $\mathrm{dom}\,h$, and $P$ is proper closed convex. Recall that for this class of functions, we have $x\in X$ if and only if $x=\mathrm{prox}_P(x-\nabla h(x))$, where $X$ denotes the set of stationary points of $f$. Indeed, we have
$$0\in\partial f(x)\ \overset{\text{(i)}}{\Longleftrightarrow}\ 0\in\nabla h(x)+\partial P(x)\ \Longleftrightarrow\ x-\nabla h(x)\in x+\partial P(x)\ \Longleftrightarrow\ x=\mathrm{prox}_P\big(x-\nabla h(x)\big),$$
where (i) follows from [38, Exercise 8.8(c)].
Definition 2.1.
(Luo–Tseng error bound; we adapt the definition from [41, Assumption 2a]) Let $X$ be the set of stationary points of $f$. Suppose that $X\neq\emptyset$. We say that the Luo–Tseng error bound (referred to as the first-order error bound in [7, Section 1]) holds if for any $\zeta\ge\inf f$, there exist $\epsilon>0$ and $\tau>0$ so that
$$\mathrm{dist}(x,X)\le\tau\,\big\|\mathrm{prox}_P\big(x-\nabla h(x)\big)-x\big\| \quad (3)$$
whenever $\|\mathrm{prox}_P(x-\nabla h(x))-x\|<\epsilon$ and $f(x)\le\zeta$.
It is known that this property is satisfied for many choices of $h$ and $P$, and we refer to [28, 29, 30, 40, 41, 46] and references therein for more detailed discussions. This property has been used to establish local linear convergence of various first-order methods applied to minimizing $f$.
Recently, the following property has also been used extensively for analyzing the convergence rates of first-order methods, mainly for possibly nonconvex objective functions; see, for example, [2, 3].
Definition 2.2.
(KL property and KL function) We say that a proper closed function $f$ has the Kurdyka–Łojasiewicz (KL) property at $\hat x\in\mathrm{dom}\,\partial f$ if there exist a neighborhood $U$ of $\hat x$, $\nu\in(0,\infty]$ and a continuous concave function $\varphi:[0,\nu)\to[0,\infty)$ with $\varphi(0)=0$ such that:

(i) $\varphi$ is continuously differentiable on $(0,\nu)$ with $\varphi'>0$;

(ii) for all $x\in U$ with $f(\hat x)<f(x)<f(\hat x)+\nu$, one has
$$\varphi'\big(f(x)-f(\hat x)\big)\,\mathrm{dist}\big(0,\partial f(x)\big)\ge 1.$$

A proper closed function $f$ satisfying the KL property at all points in $\mathrm{dom}\,\partial f$ is called a KL function.
Definition 2.3.
(KL exponent) For a proper closed function $f$ satisfying the KL property at $\hat x\in\mathrm{dom}\,\partial f$, if the corresponding function $\varphi$ can be chosen as $\varphi(s)=\bar c\,s^{1-\alpha}$ for some $\bar c>0$ and $\alpha\in[0,1)$, i.e., there exist $c,\epsilon>0$ and $\nu\in(0,\infty]$ so that
$$\mathrm{dist}\big(0,\partial f(x)\big)\ge c\big(f(x)-f(\hat x)\big)^{\alpha} \quad (4)$$
whenever $\|x-\hat x\|\le\epsilon$ and $f(\hat x)<f(x)<f(\hat x)+\nu$, then we say that $f$ has the KL property at $\hat x$ with an exponent of $\alpha$. If $f$ is a KL function and has the same exponent $\alpha$ at any $\hat x\in\mathrm{dom}\,\partial f$, then we say that $f$ is a KL function with an exponent of $\alpha$. (In classical algebraic geometry, this exponent is also referred to as the Łojasiewicz exponent.)
This definition encompasses broad classes of functions that arise in practical optimization problems. For example, it is known that if $f$ is a proper closed semialgebraic function [3], then $f$ is a KL function with a suitable exponent $\alpha\in[0,1)$. As established in [2, Theorem 3.4] and much subsequent work, the KL exponent has a close relationship with the rate of convergence of many commonly used optimization methods.
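As a simple worked instance of Definition 2.3 (our own illustration, not from the paper), consider $f(x)=x^2$ on $\mathbb{R}$ with $\hat x=0$:

```latex
% f(x) = x^2, \hat{x} = 0: here \partial f(x) = \{2x\}, so
\mathrm{dist}\bigl(0,\partial f(x)\bigr) = 2|x| = 2\bigl(f(x)-f(\hat x)\bigr)^{1/2},
% i.e., inequality (4) holds globally with \alpha = 1/2 and c = 2.
```

More generally, any strongly convex quadratic satisfies the KL property with an exponent of $\frac12$ at its minimizer, which is the prototype behind the linear-convergence results discussed above.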
Before ending this section, we state two auxiliary lemmas. The first result is an immediate consequence of the fact that the set-valued mapping $x\mapsto\partial f(x)$ is outer semicontinuous with respect to $f$-attentive convergence, i.e., $z\xrightarrow{f}x$ (see [38, Proposition 8.7]), and can be found in [2, Remark 4(b)]. This result will be used repeatedly at various places in our discussion below. We include a proof for self-containedness.
Lemma 2.1.
Suppose that $f$ is a proper closed function, $\bar x\in\mathrm{dom}\,\partial f$ and $0\notin\partial f(\bar x)$. Then, for any $\alpha\in[0,1)$, $f$ satisfies the KL property at $\bar x$ with an exponent of $\alpha$.
Proof.
Fix any $\alpha\in[0,1)$. Since $0\notin\partial f(\bar x)$ and $\partial f(\bar x)$ is nonempty and closed, it follows that $\mathrm{dist}(0,\partial f(\bar x))$ is positive and finite. Define $c:=\frac12\,\mathrm{dist}(0,\partial f(\bar x))$. We claim that there exist $\epsilon>0$ and $\nu\in(0,\infty)$ so that $\mathrm{dist}(0,\partial f(x))\ge c$ whenever $\|x-\bar x\|\le\epsilon$ and $f(\bar x)<f(x)<f(\bar x)+\nu$.
Suppose for the sake of contradiction that this is not true. Then there exists a sequence $\{x^k\}$ with $\|x^k-\bar x\|\le\frac1k$ and $f(\bar x)<f(x^k)<f(\bar x)+\frac1k$ so that
$$\mathrm{dist}\big(0,\partial f(x^k)\big)<c.$$
In particular, there exists a sequence $\{\zeta^k\}$ satisfying $\zeta^k\in\partial f(x^k)$ and $\|\zeta^k\|<c$ for all $k$. By passing to a subsequence if necessary, we may assume without loss of generality that $\zeta^k\to\zeta$ for some $\zeta$ with $\|\zeta\|\le c$, and we have $\zeta\in\partial f(\bar x)$, thanks to [38, Proposition 8.7]. But then we have $\mathrm{dist}(0,\partial f(\bar x))\le\|\zeta\|\le c=\frac12\,\mathrm{dist}(0,\partial f(\bar x))$, a contradiction. Thus, there exist $\epsilon>0$ and $\nu\in(0,\infty)$ so that $\mathrm{dist}(0,\partial f(x))\ge c$ whenever $\|x-\bar x\|\le\epsilon$ and $f(\bar x)<f(x)<f(\bar x)+\nu$.
Using this, we see immediately that
$$\mathrm{dist}\big(0,\partial f(x)\big)\ \ge\ c\ \ge\ c\,\nu^{-\alpha}\big(f(x)-f(\bar x)\big)^{\alpha}$$
whenever $\|x-\bar x\|\le\epsilon$ and $f(\bar x)<f(x)<f(\bar x)+\nu$, showing that $f$ satisfies the KL property at $\bar x$ with an exponent of $\alpha$. This completes the proof. ∎
The second result concerns the equivalence of “norms”, whose proof is simple and is omitted.
Lemma 2.2.
Let $p\in(0,\infty)$ and define $\|x\|_p:=\big(\sum_{i=1}^n|x_i|^p\big)^{1/p}$ for $x\in\mathbb{R}^n$. Then there exist $c_1,c_2>0$ so that
$$c_1\|x\|\le\|x\|_p\le c_2\|x\|$$
for any $x\in\mathbb{R}^n$.
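To make this concrete (our own numerical sketch, under the convention $\|x\|_p=(\sum_i|x_i|^p)^{1/p}$): for $0<p\le 2$ one may take $c_1=1$ and $c_2=n^{1/p-1/2}$, which can be spot-checked as follows.

```python
import numpy as np

def lp(x, p):
    # (sum |x_i|^p)^(1/p): a norm for p >= 1 and a quasi-norm for 0 < p < 1
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

n, p = 6, 0.5
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(n)
    e = float(np.linalg.norm(x))  # Euclidean norm
    # candidate constants: c1 = 1 and c2 = n^(1/p - 1/2), valid for 0 < p <= 2
    assert e - 1e-9 <= lp(x, p) <= n ** (1.0 / p - 0.5) * e + 1e-9
```

Such equivalence constants necessarily depend on the dimension $n$, which is why statements of this type are only used for fixed finite dimensions.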
3 Calculus of the KL exponent
In this section, we discuss how the KL exponent behaves under various operations on KL functions. We briefly summarize our results below. The required assumptions will be made explicit in the respective theorems.

Exponent for the composition $g\circ F$ when the Jacobian of $F$ is surjective, given the exponent of $g$; see Theorem 3.2.

Exponent for the block separable sum $\sum_{i=1}^p f_i(x_i)$ given the exponents of the $f_i$; see Theorem 3.3.

Exponent for the Moreau envelope of a convex KL function; see Theorem 3.4.

Deducing the exponent of a convex objective from its Lagrangian relaxation; see Theorem 3.5.

Deducing the exponent of a partly smooth KL function by looking at its restriction to its active manifold; see Theorem 3.7.
We shall make use of some of these calculus rules in Section 5 to deduce the KL exponent of some concrete optimization models.
We start with our first result, which concerns the minimum of finitely many KL functions. This rule will prove to be useful in Section 5. Indeed, as we shall see there, many nonconvex optimization problems that arise in applications have objectives that can be written as the minimum of finitely many KL functions whose exponents can be deduced from our results in Section 4; this includes some prominent and widely used NP-hard optimization models, for example, the least squares problem with cardinality constraint [6].
Theorem 3.1.
(Exponent for minimum of finitely many KL functions) Let $f_i$, $i=1,\ldots,r$, be proper closed functions, $f:=\min_{1\le i\le r}f_i$ be continuous on $\mathrm{dom}\,\partial f$ and $\bar x\in\mathrm{dom}\,\partial f$, where we set $I(\bar x):=\{i:\ f_i(\bar x)=f(\bar x)\}$. Suppose further that each $f_i$, $i\in I(\bar x)$, satisfies the KL property at $\bar x$ with an exponent of $\alpha_i\in[0,1)$. Then $f$ satisfies the KL property at $\bar x$ with an exponent of $\alpha:=\max\{\alpha_i:\ i\in I(\bar x)\}$.
Proof.
From the definition of $I(\bar x)$, we see that $f_i(\bar x)>f(\bar x)$ for all $i\notin I(\bar x)$. Since each $f_i$ is lower semicontinuous and the function $f$ is continuous on $\mathrm{dom}\,\partial f$, there exists $\epsilon_0>0$ such that for all $x\in\mathrm{dom}\,\partial f$ with $\|x-\bar x\|\le\epsilon_0$, we have
$$f_i(x)>f(x)\quad\text{for all } i\notin I(\bar x).$$
Thus, whenever $x\in\mathrm{dom}\,\partial f$ and $\|x-\bar x\|\le\epsilon_0$, we have $f(x)=\min_{i\in I(\bar x)}f_i(x)$.
Next, using this and the subdifferential rule for the minimum of finitely many functions [32, Theorem 5.5], we obtain for all $x\in\mathrm{dom}\,\partial f$ with $\|x-\bar x\|\le\epsilon_0$ that
$$\partial f(x)\subseteq\bigcup\big\{\partial f_i(x):\ i\in I(\bar x)\ \text{and}\ f_i(x)=f(x)\big\}. \quad (5)$$
On the other hand, by assumption, for each $i\in I(\bar x)$, there exist $c_i>0$, $\epsilon_i>0$, $\nu_i\in(0,\infty]$ such that for all $x$ with $\|x-\bar x\|\le\epsilon_i$ and $f_i(\bar x)<f_i(x)<f_i(\bar x)+\nu_i$, one has
$$\mathrm{dist}\big(0,\partial f_i(x)\big)\ge c_i\big(f_i(x)-f_i(\bar x)\big)^{\alpha_i}. \quad (6)$$
Let $c:=\min_{i\in I(\bar x)}c_i$, $\epsilon:=\min\{\epsilon_0,\min_{i\in I(\bar x)}\epsilon_i\}$ and $\nu:=\min\{1,\min_{i\in I(\bar x)}\nu_i\}$. Take any $x\in\mathrm{dom}\,\partial f$ with $\|x-\bar x\|\le\epsilon$ and $f(\bar x)<f(x)<f(\bar x)+\nu$. Then we have
$$\mathrm{dist}\big(0,\partial f(x)\big)\ \ge\ \min_{\substack{i\in I(\bar x)\\ f_i(x)=f(x)}}\mathrm{dist}\big(0,\partial f_i(x)\big)\ \ge\ \min_{\substack{i\in I(\bar x)\\ f_i(x)=f(x)}}c_i\big(f_i(x)-f_i(\bar x)\big)^{\alpha_i}\ \ge\ c\big(f(x)-f(\bar x)\big)^{\alpha},$$
where the first inequality follows from (5), the second inequality follows from (6), the construction of $\epsilon$, $\nu$ and $c$, as well as the facts that $f_i(\bar x)=f(\bar x)$ and $f_i(x)=f(x)$ for the indices $i$ under consideration; these facts give $(f_i(x)-f_i(\bar x))^{\alpha_i}=(f(x)-f(\bar x))^{\alpha_i}\ge(f(x)-f(\bar x))^{\alpha}$, where the last inequality uses $\nu\le1$. This completes the proof. ∎
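To illustrate Theorem 3.1 on a toy instance (our own example, not from the paper), take $f_1(x)=x^2$, $f_2(x)=(x-2)^2$ and $f=\min\{f_1,f_2\}$ with $\bar x=0$:

```latex
% I(\bar{x}) = \{1\} since f_1(0) = 0 < 4 = f_2(0); near \bar{x} = 0 we have f = f_1, and
\mathrm{dist}\bigl(0,\partial f_1(x)\bigr) = 2|x| = 2\,f_1(x)^{1/2},
% so f_1 satisfies the KL property at 0 with exponent \alpha_1 = 1/2, and Theorem 3.1
% yields the KL property of f at 0 with exponent \max_{i \in I(\bar{x})} \alpha_i = 1/2.
```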
We have the following immediate corollary.
Corollary 3.1.
Let $f_i$, $i=1,\ldots,r$, be proper closed functions with $\mathrm{dom}\,\partial f_i=\mathrm{dom}\,f_i$ for all $i$, and $f:=\min_{1\le i\le r}f_i$ be continuous on $\mathrm{dom}\,\partial f$. Suppose further that each $f_i$ is a KL function with an exponent of $\alpha_i\in[0,1)$ for $i=1,\ldots,r$. Then $f$ is a KL function with an exponent of $\max\{\alpha_i:\ i=1,\ldots,r\}$.
Proof.
In view of Theorem 3.1, it suffices to show that for any $\bar x\in\mathrm{dom}\,\partial f$ and any $i\in I(\bar x)$, we have $\bar x\in\mathrm{dom}\,\partial f_i$. To this end, take any $\bar x\in\mathrm{dom}\,\partial f$. Note that we have $\partial f(\bar x)\neq\emptyset$ by the definition, and hence $\bar x\in\mathrm{dom}\,f$. In addition, from the definition of $I(\bar x)$, we have $f_i(\bar x)=f(\bar x)$ for all $i\in I(\bar x)$. Hence, $f_i(\bar x)$ is finite for all $i\in I(\bar x)$. Thus, we conclude that $\bar x\in\mathrm{dom}\,f_i$ for all $i\in I(\bar x)$, which implies that $\bar x\in\mathrm{dom}\,\partial f_i$ because $\mathrm{dom}\,\partial f_i=\mathrm{dom}\,f_i$ for all $i$ by assumption. ∎
The next theorem concerns the composition of a KL function with a smooth function that has a surjective Jacobian mapping.
Theorem 3.2.
(Exponent for composition of KL functions) Let $f:=g\circ F$, where $g$ is a proper closed function on $\mathbb{R}^m$ and $F:\mathbb{R}^n\to\mathbb{R}^m$ is a continuously differentiable mapping. Suppose in addition that $g$ is a KL function with an exponent of $\alpha\in[0,1)$ and the Jacobian $\nabla F(\bar x)$ is a surjective mapping at some $\bar x\in\mathrm{dom}\,\partial f$. Then $f$ has the KL property at $\bar x$ with an exponent of $\alpha$.
Proof.
Note from [38, Exercise 10.7] and $\bar x\in\mathrm{dom}\,\partial f$ that $F(\bar x)\in\mathrm{dom}\,\partial g$. As $g$ is a KL function, there exist $c>0$, $\epsilon>0$, $\nu\in(0,\infty]$ such that for all $y$ with $\|y-F(\bar x)\|\le\epsilon$ and $g(F(\bar x))<g(y)<g(F(\bar x))+\nu$, one has
$$\mathrm{dist}\big(0,\partial g(y)\big)\ge c\big(g(y)-g(F(\bar x))\big)^{\alpha}. \quad (7)$$
On the other hand, since the linear map $\nabla F(\bar x)$ is surjective and $F$ is continuously differentiable, it follows from the classical Lyusternik–Graves theorem (see, for example, [33, Theorem 1.57]) that there are numbers $\kappa>0$ and $\delta>0$ such that for all $x$ with $\|x-\bar x\|\le\delta$,
$$\mathbb{B}_m\subseteq\nabla F(x)\big(\kappa\,\mathbb{B}_n\big),$$
where $\mathbb{B}_n$ and $\mathbb{B}_m$ are the closed unit balls in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. This implies that for all $\eta\in\mathbb{R}^m$ we have the following estimate:
$$\|\nabla F(x)^{*}\eta\|\ge\kappa^{-1}\|\eta\| \quad (8)$$
whenever $\|x-\bar x\|\le\delta$. Moreover, from the chain rule of the limiting subdifferential for composite functions (see, for example, [38, Exercise 10.7]), we have for all $x$ with $\|x-\bar x\|\le\delta$ that
$$\partial f(x)=\nabla F(x)^{*}\,\partial g\big(F(x)\big),$$
because $\nabla F(x)$ is a surjective mapping for all such $x$.
Now, let $\delta_0\in(0,\delta]$ be such that $\|F(x)-F(\bar x)\|\le\epsilon$ for all $x$ with $\|x-\bar x\|\le\delta_0$, and note that $f(x)=g(F(x))$ and $f(\bar x)=g(F(\bar x))$. Fix any $x\in\mathrm{dom}\,\partial f$ with $\|x-\bar x\|\le\delta_0$ and $f(\bar x)<f(x)<f(\bar x)+\nu$. Let $\zeta\in\partial f(x)$ be such that $\|\zeta\|=\mathrm{dist}(0,\partial f(x))$. Then, we have $\zeta=\nabla F(x)^{*}\eta$ for some $\eta\in\partial g(F(x))$. Hence, it follows from (8) that
$$\|\zeta\|\ge\kappa^{-1}\|\eta\|\ge\kappa^{-1}\,\mathrm{dist}\big(0,\partial g(F(x))\big).$$
In addition, since $g(F(\bar x))<g(F(x))<g(F(\bar x))+\nu$, applying (7) with $y=F(x)$ gives us that
$$\mathrm{dist}\big(0,\partial g(F(x))\big)\ge c\big(g(F(x))-g(F(\bar x))\big)^{\alpha}=c\big(f(x)-f(\bar x)\big)^{\alpha}.$$
Therefore,
$$\mathrm{dist}\big(0,\partial f(x)\big)=\|\zeta\|\ge\kappa^{-1}c\big(f(x)-f(\bar x)\big)^{\alpha}.$$
∎
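As a quick sanity check of Theorem 3.2 (our own example, not from the paper), let $g(y)=\|y\|^2$, which is a KL function with an exponent of $\frac12$, and $F(x)=Ax$ for some $A\in\mathbb{R}^{m\times n}$ with full row rank, so that the Jacobian $\nabla F\equiv A$ is surjective:

```latex
% f(x) = ||Ax||^2 and, at any \bar{x} with A\bar{x} = 0 (so f(\bar{x}) = 0),
\mathrm{dist}\bigl(0,\partial f(x)\bigr) = \|2A^{T}Ax\|
  \ge 2\sigma_{\min}(A)\,\|Ax\| = 2\sigma_{\min}(A)\,\bigl(f(x)-f(\bar x)\bigr)^{1/2},
% where \sigma_{\min}(A) > 0 is the smallest singular value of A, confirming the
% exponent of 1/2 predicted by the theorem.
```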
Our next theorem concerns separable sums.
Theorem 3.3.
(Exponent for block separable sums of KL functions) Let $n_1,\ldots,n_p$ be positive integers such that $\sum_{i=1}^p n_i=n$. Let $f(x):=\sum_{i=1}^p f_i(x_i)$ for $x=(x_1,\ldots,x_p)$, where $f_i$, $i=1,\ldots,p$, are proper closed functions on $\mathbb{R}^{n_i}$ with $x_i\in\mathbb{R}^{n_i}$. Suppose further that each $f_i$ is a KL function with an exponent of $\alpha_i\in[0,1)$ and that each $f_i$ is continuous on $\mathrm{dom}\,\partial f_i$, $i=1,\ldots,p$. Then $f$ is a KL function with an exponent of $\alpha:=\max\{\alpha_i:\ i=1,\ldots,p\}$.
Proof.
Denote $\bar x=(\bar x_1,\ldots,\bar x_p)\in\mathrm{dom}\,\partial f$ with $\bar x_i\in\mathbb{R}^{n_i}$, $i=1,\ldots,p$. Then [38, Proposition 10.5] shows that $\bar x_i\in\mathrm{dom}\,\partial f_i$ for each $i$. As each $f_i$, $i=1,\ldots,p$, is a KL function with exponent $\alpha_i$, there exist $c_i>0$, $\epsilon_i>0$, $\nu_i\in(0,1]$ such that for all $x_i$ with $\|x_i-\bar x_i\|\le\epsilon_i$ and $f_i(\bar x_i)<f_i(x_i)<f_i(\bar x_i)+\nu_i$, one has
$$\mathrm{dist}\big(0,\partial f_i(x_i)\big)\ge c_i\big(f_i(x_i)-f_i(\bar x_i)\big)^{\alpha_i}. \quad (9)$$
Since the left hand side of (9) is always nonnegative, the above relation holds trivially whenever $f_i(x_i)=f_i(\bar x_i)$. In addition, since $f_i$ is continuous on $\mathrm{dom}\,\partial f_i$ by assumption for each $i$, by shrinking $\epsilon_i$ if necessary, we conclude that $f_i(x_i)<f_i(\bar x_i)+\nu_i$ whenever $x_i\in\mathrm{dom}\,\partial f_i$ and $\|x_i-\bar x_i\|\le\epsilon_i$. Thus, we have from these two observations that for all $x_i\in\mathrm{dom}\,\partial f_i$ with $\|x_i-\bar x_i\|\le\epsilon_i$ and $f_i(x_i)\ge f_i(\bar x_i)$,
$$\mathrm{dist}\big(0,\partial f_i(x_i)\big)\ge c_i\big(f_i(x_i)-f_i(\bar x_i)\big)^{\alpha_i}. \quad (10)$$
Let $\epsilon:=\min_{1\le i\le p}\epsilon_i$ and $\nu:=\min_{1\le i\le p}\nu_i$. Take any $x=(x_1,\ldots,x_p)$ with $\|x-\bar x\|\le\epsilon$ and $f(\bar x)<f(x)<f(\bar x)+\nu$. We will now verify (4). To this end, let $I$ be such that
$$I:=\big\{i:\ f_i(x_i)>f_i(\bar x_i)\big\}.$$
If $x\notin\mathrm{dom}\,\partial f$, then clearly
$$\mathrm{dist}\big(0,\partial f(x)\big)\ge\big(f(x)-f(\bar x)\big)^{\alpha}$$
since $\mathrm{dist}(0,\partial f(x))=\infty$. Thus, we consider the case where $x\in\mathrm{dom}\,\partial f$. In this case, recall from [38, Proposition 10.5] that
$$\partial f(x)=\partial f_1(x_1)\times\cdots\times\partial f_p(x_p).$$
Hence, there exist $\zeta_i\in\partial f_i(x_i)$ with $\|\zeta_i\|=\mathrm{dist}(0,\partial f_i(x_i))$ such that
$$\mathrm{dist}\big(0,\partial f(x)\big)^2=\sum_{i=1}^p\|\zeta_i\|^2.$$
This together with (10) implies that
$$\|\zeta_i\|\ge c_i\big(f_i(x_i)-f_i(\bar x_i)\big)^{\alpha_i}\ge c_i\big(f_i(x_i)-f_i(\bar x_i)\big)^{\alpha} \quad (11)$$
for $i\in I$. Define $a_i:=f_i(x_i)-f_i(\bar x_i)$ and $c:=\min_{1\le i\le p}c_i$. Since $\alpha_i\le\alpha$ and $a_i<\nu_i\le1$ for each $i$, it then follows from (11) that, for some $\hat c>0$,
$$\mathrm{dist}\big(0,\partial f(x)\big)=\Big(\sum_{i=1}^p\|\zeta_i\|^2\Big)^{\frac12}\ \ge\ \Big(\sum_{i\in I}c_i^2\,a_i^{2\alpha}\Big)^{\frac12}\ \ge\ c\,\hat c\Big(\sum_{i\in I}a_i\Big)^{\alpha}\ \ge\ c\,\hat c\big(f(x)-f(\bar x)\big)^{\alpha},$$
where the second inequality follows from Lemma 2.2 with $2\alpha$ in place of $p$, and the last inequality holds because $f(x)-f(\bar x)=\sum_{i=1}^p a_i\le\sum_{i\in I}a_i$. This completes the proof. ∎
We now discuss the operation of taking the Moreau envelope, a common device for smoothing the objective function of convex optimization problems.
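Before stating the theorem, here is a small numerical sketch (our own illustration, not the paper's development): for $f=|\cdot|$ and $\lambda=1$ the Moreau envelope is the classical Huber function, which can be checked against brute-force minimization on a grid.

```python
import numpy as np

GRID = np.linspace(-5.0, 5.0, 200001)  # grid step 5e-5

def moreau_env_abs(y):
    # F(y) = min_x { |x| + 0.5*(x - y)^2 }: Moreau envelope of |.| (lambda = 1),
    # approximated by brute-force minimization over a fine grid
    return float(np.min(np.abs(GRID) + 0.5 * (GRID - y) ** 2))

def huber(y):
    # known closed form of the Moreau envelope of |.| with lambda = 1
    return 0.5 * y * y if abs(y) <= 1.0 else abs(y) - 0.5

for y in [0.3, 2.5, -1.7]:
    assert abs(moreau_env_abs(y) - huber(y)) < 1e-4
```

Note that the envelope is continuously differentiable even though $|\cdot|$ is not, in line with the gradient formula (12) used in the proof below.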
Theorem 3.4.
(Exponent for Moreau envelope of convex KL functions) Let $f$ be a proper closed convex function that is a KL function with an exponent of $\alpha\in[\frac12,1)$. Suppose further that $f$ is continuous on $\mathrm{dom}\,\partial f$. Fix $\lambda>0$ and consider
$$F_\lambda(x):=\inf_{y\in\mathbb{R}^n}\Big\{f(y)+\frac{1}{2\lambda}\|x-y\|^2\Big\}.$$
Then $F_\lambda$ is a KL function with an exponent of $\alpha$.
Proof.
It suffices to consider the case $\lambda=1$ and show that $F:=F_1$ has the KL property with an exponent of $\alpha$ at any fixed $\bar x\in X:=\operatorname{Arg\,min}f$, in view of Lemma 2.1 and the convexity of $F$.
To this end, recall from [4, Proposition 12.29] that, for all $x$,
$$\nabla F(x)=x-\mathrm{prox}_f(x), \quad (12)$$
and that $\nabla F$ is Lipschitz continuous with a Lipschitz constant of $1$. Consequently, we have for any $x$ that
$$F(x)-F(\bar x)=f\big(\mathrm{prox}_f(x)\big)+\frac12\big\|x-\mathrm{prox}_f(x)\big\|^2-F(\bar x)=f\big(\mathrm{prox}_f(x)\big)-f(\bar y)+\frac12\|\nabla F(x)\|^2, \quad (13)$$
where $\bar y$ is the projection of $\mathrm{prox}_f(x)$ onto $X$, and the last equality holds because $f(\bar y)=f(\bar x)=F(\bar x)$.
Next, note that we have $\operatorname{Arg\,min}F=\operatorname{Arg\,min}f=X$ according to [4, Proposition 12.28], which implies $\nabla F(\bar x)=0$ as $\bar x\in X$. This together with (12) gives $\mathrm{prox}_f(\bar x)=\bar x$. Hence, $F(\bar x)=f(\bar x)$. Since $f$ is a KL function with an exponent of $\alpha$ and $\bar x\in\mathrm{dom}\,\partial f$, using the fact that $f$ is continuous on $\mathrm{dom}\,\partial f$, we obtain that there exist $c>0$ and $\epsilon>0$ so that
$$\mathrm{dist}\big(0,\partial f(y)\big)\ge c\big(f(y)-f(\bar x)\big)^{\alpha} \quad (14)$$
whenever $y\in\mathrm{dom}\,\partial f$ and $\|y-\bar x\|\le\epsilon$; here, the condition on the bound on function values is waived by using the continuity of $f$ on $\mathrm{dom}\,\partial f$ and choosing a smaller $\epsilon$ if necessary. Moreover, in view of [7, Theorem 5(i)], by shrinking $\epsilon$ if necessary, we conclude that there exists $c'>0$ so that
$$\mathrm{dist}(y,X)\le c'\big(f(y)-f(\bar x)\big)^{1-\alpha} \quad (15)$$
whenever $y\in\mathrm{dom}\,\partial f$ and $\|y-\bar x\|\le\epsilon$; here, the condition on the bound on function values is waived similarly as before. Finally, since $f(y)\ge f(\bar x)$ for all $y$, we have $\big(f(y)-f(\bar x)\big)^{1-\alpha}=\big[\big(f(y)-f(\bar x)\big)^{\alpha}\big]^{\frac{1-\alpha}{\alpha}}$. Combining this with (14) and (15) implies that for some $c_0>0$,
$$\mathrm{dist}(y,X)\le c_0\,\mathrm{dist}\big(0,\partial f(y)\big)^{\frac{1-\alpha}{\alpha}} \quad (16)$$
whenever $y\in\mathrm{dom}\,\partial f$ and $\|y-\bar x\|\le\epsilon$.
Now, using the definition of the proximal mapping as a minimizer, we have by the first-order optimality condition that for any $x$,
$$x-\mathrm{prox}_f(x)\in\partial f\big(\mathrm{prox}_f(x)\big).$$
In particular, $\mathrm{prox}_f(x)\in\mathrm{dom}\,\partial f$. In addition, using the above relation and (12), we deduce that
$$\mathrm{dist}\big(0,\partial f(\mathrm{prox}_f(x))\big)\le\|\nabla F(x)\|. \quad (17)$$
Fix an arbitrary $x$ with $\|x-\bar x\|\le\epsilon$. Then $\|\mathrm{prox}_f(x)-\bar x\|=\|\mathrm{prox}_f(x)-\mathrm{prox}_f(\bar x)\|\le\|x-\bar x\|$, where the inequality is due to (2). Let $y:=\mathrm{prox}_f(x)$. Then $y\in\mathrm{dom}\,\partial f$ and $\|y-\bar x\|\le\epsilon$. Hence, the relations (16) and (17) imply that
$$\mathrm{dist}(y,X)\le c_0\|\nabla F(x)\|^{\frac{1-\alpha}{\alpha}}.$$
Applying (13) with this $y$ and combining it with the preceding relation, we obtain further that
$$F(x)-F(\bar x)\le f(y)-f(\bar y)+\frac12\|\nabla F(x)\|^2\quad\text{with}\quad\|y-\bar y\|\le c_0\|\nabla F(x)\|^{\frac{1-\alpha}{\alpha}} \quad (18)$$
whenever $\|x-\bar x\|\le\epsilon$. Finally, from the convexity of $f$, we have
$$f(y)-f(\bar y)\le\langle x-y,\ y-\bar y\rangle=\langle\nabla F(x),\ y-\bar y\rangle, \quad (19)$$
where the equality follows from (12), since $x-y=x-\mathrm{prox}_f(x)\in\partial f(y)$.
Shrink