Deducing Kurdyka-Łojasiewicz exponent via inf-projection

02/10/2019
by   Peiran Yu, et al.
UNSW
Hong Kong Polytechnic University

The Kurdyka-Łojasiewicz (KL) exponent plays an important role in estimating the convergence rate of many contemporary first-order methods. In particular, a KL exponent of 1/2 is related to local linear convergence. Nevertheless, the KL exponent is in general extremely hard to estimate. In this paper, we show under mild assumptions that the KL exponent is preserved via inf-projection. Inf-projection is a fundamental operation that is ubiquitous when reformulating optimization problems via the lift-and-project approach. By studying how it acts on the KL exponent, we show that the KL exponent is 1/2 for several important convex optimization models, including some semidefinite-programming-representable functions and functions that involve C^2-cone reducible structures, under conditions such as strict complementarity. Our results are applicable to concrete optimization models such as group fused Lasso and overlapping group Lasso. In addition, for nonconvex models, we show that the KL exponent of many difference-of-convex functions can be derived from that of their natural majorant functions, and that the KL exponent of the Bregman envelope of a function is the same as that of the function itself. Finally, we estimate the KL exponent of the sum of the least squares function and the indicator function of the set of matrices of rank at most k.


1 Introduction

Many problems in machine learning, signal processing and data analysis involve large-scale nonsmooth nonconvex optimization problems. These problems are typically solved using first-order methods, which are noted for their scalability and ease of implementation. Commonly used first-order methods include the proximal gradient method and its variants, and splitting methods such as the Douglas-Rachford splitting method and its variants; see the recent expositions [15, 36] and references therein for more detail. In the general nonconvex setting, convergence properties of the sequences generated by these algorithms are typically analyzed by assuming that a certain potential function has the so-called Kurdyka-Łojasiewicz (KL) property.

Loosely speaking, the KL property holds when it is possible to bound the function value deviation in terms of a generalized notion of “gradient”; see Definition 2.1 below for the precise definition. This property is satisfied by a large class of functions such as proper closed semi-algebraic functions; see, for example, [5]. It has been the main workhorse for establishing convergence of sequences generated by various first-order methods, especially in nonconvex settings [4, 5, 6, 13]. Moreover, when it comes to estimating local convergence rates, the so-called KL exponent plays a key role; see, for example, [4, Theorem 2], [24, Theorem 3.4] and [29, Theorem 3]. Roughly speaking, an exponent of $\frac12$ for a suitable potential function corresponds to a local linear convergence rate, while an exponent in $(\frac12,1)$ corresponds to a local sublinear rate. However, as noted in [35, Page 63, Section 2.1], explicit estimation of the KL exponent of a given function is difficult in general. Nevertheless, due to its significance in convergence rate analysis, KL exponent computation has become an important research topic in recent years and some positive results have been obtained. For instance, we now know the KL exponent of the maximum of finitely many polynomials [28, Theorem 3.3] and the KL exponent of a class of quadratic optimization problems with matrix variables satisfying orthogonality constraints [33]. In addition, it has been shown that the KL exponent is closely related to several existing and widely studied error bound concepts such as the Hölder growth condition and the Luo-Tseng error bound; see, for example, [12, Theorem 5], [20, Theorem 3.7], [20, Proposition 3.8], [21, Corollary 3.6] and [30, Theorem 4.1]. Taking advantage of these connections, we now also know that convex models that satisfy the second-order growth condition have KL exponent $\frac12$, and so do models that satisfy the Luo-Tseng error bound condition together with a mild assumption on the separation of stationary values; see the recent work [16, 30, 48] for concrete examples. This sets the stage for developing calculus rules for the KL exponent in [30] to deduce the KL exponent of a function from functions with known KL exponents. For example, it was shown in [30, Corollary 3.1] that under mild conditions, if each $f_i$ is a KL function with exponent $\alpha_i$, $i = 1,\ldots,m$, then the KL exponent of $\min_{1\le i\le m} f_i$ is given by $\max\{\alpha_i : 1\le i\le m\}$. This was then used in [30, Section 5.2] to show that the least squares loss with smoothly clipped absolute deviation (SCAD) [23] or minimax concave penalty (MCP) regularization [47] has KL exponent $\frac12$.
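To make the role of the exponent concrete, here is a standard one-dimensional example (our illustration, not taken from the paper). For $f(x) = |x|^\gamma$ with $\gamma > 1$, we have $\operatorname{dist}(0,\partial f(x)) = \gamma|x|^{\gamma-1}$, so that

$$\operatorname{dist}(0,\partial f(x)) = \gamma\,\bigl(f(x) - f(0)\bigr)^{1 - 1/\gamma},$$

i.e., $f$ satisfies the KL property at $\bar x = 0$ with exponent $\alpha = 1 - 1/\gamma$; in particular, $\gamma = 2$ gives the exponent $\frac12$ associated with linear convergence.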

In this paper, we will further explore this line of research and study how the KL exponent behaves under the inf-projection operation, which is a significant generalization of the operation of taking the minimum of finitely many functions. Precisely, let $\mathbb{X}$ and $\mathbb{Y}$ be two finite dimensional Hilbert spaces and let $F:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}\cup\{\infty\}$ be a proper closed function (we refer the readers to Section 2 for relevant definitions). We call the function $f(x) := \inf_{y\in\mathbb{Y}} F(x,y)$ for $x\in\mathbb{X}$ an inf-projection of $F$. The name comes from the fact that the strict epigraph of $f$, defined as $\{(x,\gamma)\in\mathbb{X}\times\mathbb{R} : f(x) < \gamma\}$, is equal to the projection of the strict epigraph of $F$ onto $\mathbb{X}\times\mathbb{R}$. Functions represented in terms of inf-projections arise naturally in sensitivity analysis as value functions; see, for example, [14, Chapter 3.2]. Inf-projection also appears when representing functions as optimal values of linear programming problems, or, more generally, semidefinite programming (SDP) problems; see [26] for SDP-representable functions. It is known that inf-projection preserves nice properties of $F$ such as convexity [39, Proposition 2.22(a)]. In this paper, we show that, under mild assumptions, the KL exponent is also preserved under inf-projection. Based on this result and the ubiquity of inf-projection, we are then able to obtain KL exponents of various important convex and nonconvex models that were out of reach in previous studies.
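Before listing our contributions, here is a minimal numerical sketch of an inf-projection (ours, not taken from the paper): the Moreau envelope of $|\cdot|$, namely $f(x) = \inf_y\{|y| + \frac12(y-x)^2\}$, whose known closed form is the Huber function.

```python
import numpy as np

# Minimal numerical sketch: the Moreau envelope of |.| as an inf-projection.
# F(x, y) = |y| + (y - x)^2 / 2;  f(x) = inf_y F(x, y) is the Huber function.
ys = np.linspace(-5.0, 5.0, 20001)  # grid over the variable y being projected out

def f(x):
    return np.min(np.abs(ys) + 0.5 * (ys - x) ** 2)

def huber(x):  # known closed form of this particular envelope
    return 0.5 * x ** 2 if abs(x) <= 1 else abs(x) - 0.5

for x in [-2.0, -0.3, 0.0, 0.7, 3.0]:
    assert abs(f(x) - huber(x)) < 1e-4  # grid minimum matches the closed form
```

Consistent with [39, Proposition 2.22(a)], this envelope inherits convexity from $F(x,y) = |y| + \frac12(y-x)^2$; the results of this paper show that, under mild assumptions, the KL exponent is inherited in the same way.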

More explicitly, our contributions are listed as follows.

  1. Inf-projection: We derive the KL exponent of the inf-projection $f$ given that of $F$ under mild assumptions; see Theorem 3.1. As an immediate consequence, we strengthen [30, Theorem 3.1] by relaxing the continuity assumption there; see Corollary 3.1.

  2. SDP-representable functions: We obtain the KL exponent of some SDP-representable functions under suitable Slater and strict complementarity conditions on the SDP representation; see Theorem 4.1.

  3. Sum of LMI-representable functions: We show that the strict complementarity condition can be imposed directly on the original function when deducing the KL exponent of a special class of SDP-representable functions: sums of linear-matrix-inequality-representable (LMI-representable) functions, possibly together with the nuclear norm; see Theorem 4.2 and Theorem 4.3.

  4. Functions with $\mathcal{C}^2$-cone reducible structure: We compute the KL exponent of some functions that involve $\mathcal{C}^2$-cone reducible structures under suitable relative interior conditions; see Theorem 4.4 and Corollary 4.1.

  5. Functions with DC structure: We relate the KL exponent of a difference-of-convex (DC) function $P_1 - P_2\circ A$ to that of its majorant $(x,y)\mapsto P_1(x) + P_2^*(y) - \langle Ax, y\rangle$, where $P_2^*$ is the Fenchel conjugate of $P_2$. Here, $A$ is a linear map, $P_1$ is proper closed convex and $P_2$ is convex continuous; see Theorem 5.1.

  6. Bregman envelope: We infer the KL exponent of the Bregman envelope of a function $f$ given that of $f$ itself, where the envelope is formed using a suitable Bregman distance; see Theorem 5.2. This generalizes existing results concerning the Moreau envelope and the forward-backward envelope; see Remark 5.1.

  7. Lagrangian relaxation: We determine the KL exponent of the function $f + \delta_S$ from that of its Lagrangian relaxation, where $f$ and the constraint functions defining $S$ are continuously differentiable, $\delta_S$ is the indicator function of the constraint set $S$, and the so-called linear independence constraint qualification (LICQ) holds; see Theorem 5.3.

  8. Rank constraints: We estimate the KL exponent of the sum of the least squares function and the indicator function of the set of matrices of rank at most $k$; see Theorem 5.4. A small projection sketch follows this list.
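For item 8, the basic computational primitive behind such rank-constrained models is the Euclidean projection onto the set of matrices of rank at most $k$. A minimal sketch (ours; `proj_rank_k` is a hypothetical helper name) based on the Eckart-Young theorem:

```python
import numpy as np

# Hypothetical helper: Euclidean projection onto {X : rank(X) <= k} via
# truncated SVD (Eckart-Young), the building block when running first-order
# methods on the rank-constrained least-squares model of item 8.
def proj_rank_k(X, k):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[k:] = 0.0  # zero out all but the k largest singular values
    return U @ np.diag(s) @ Vt

A = np.random.default_rng(0).standard_normal((6, 5))
P = proj_rank_k(A, 2)
assert np.linalg.matrix_rank(P) <= 2
```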

The rest of the paper is organized as follows. We present necessary notation and preliminary materials in Section 2. The KL exponent under inf-projection is studied in Section 3. The results developed are then used for studying KL exponent in Section 4 for various structured convex models, and in Section 5 for several nonconvex models. Finally, some concluding remarks are given in Section 6.

2 Notation and preliminaries

In this paper, we use $\mathbb{X}$ and $\mathbb{Y}$ to denote two finite dimensional Hilbert spaces. We use $\langle\cdot,\cdot\rangle$ to denote the inner product of the underlying Hilbert space and use $\|\cdot\|$ to denote the associated norm. Moreover, for a linear map $\mathcal{A}$ between two such spaces, we use $\mathcal{A}^*$ to denote its adjoint. Next, we let $\mathbb{R}$ denote the set of real numbers and $\mathbb{R}_+$ the set of nonnegative real numbers. We also let $\mathbb{R}^{m\times n}$ denote the set of all $m\times n$ matrices. The (trace) inner product of two matrices $A, B\in\mathbb{R}^{m\times n}$ is defined as $\langle A, B\rangle := \operatorname{tr}(A^\top B)$, where $\operatorname{tr}$ denotes the trace of a square matrix. The Frobenius norm of a matrix $A$ is denoted by $\|A\|_F$, which is defined as $\|A\|_F := \sqrt{\langle A, A\rangle}$. Finally, the space of $n\times n$ symmetric matrices is denoted by $\mathcal{S}^n$, the cone of positive semidefinite matrices is denoted by $\mathcal{S}^n_+$, and we write $X\succeq 0$ (resp., $X\succ 0$) to mean $X\in\mathcal{S}^n_+$ (resp., $X\in\operatorname{int}\mathcal{S}^n_+$, where $\operatorname{int}\mathcal{S}^n_+$ is the interior of $\mathcal{S}^n_+$).
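A two-line numerical check of these matrix identities (our illustration), namely $\langle A, B\rangle = \operatorname{tr}(A^\top B)$ and $\|A\|_F = \sqrt{\langle A, A\rangle}$:

```python
import numpy as np

# Check: <A, B> = tr(A^T B) equals the entrywise sum, and ||A||_F = sqrt(<A, A>).
rng = np.random.default_rng(3)
A, B = rng.standard_normal((3, 4)), rng.standard_normal((3, 4))
assert np.isclose(np.trace(A.T @ B), np.sum(A * B))
assert np.isclose(np.sqrt(np.trace(A.T @ A)), np.linalg.norm(A, "fro"))
```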

For a set $S\subseteq\mathbb{X}$, we denote the distance from an $x\in\mathbb{X}$ to $S$ as $\operatorname{dist}(x,S) := \inf_{u\in S}\|x - u\|$. The closure (resp., interior) of $S$ is denoted by $\operatorname{cl}S$ (resp., $\operatorname{int}S$), and we use $B(x,r)$ to denote the closed ball centered at $x$ with radius $r > 0$, i.e., $B(x,r) := \{u\in\mathbb{X} : \|u - x\|\le r\}$. For a convex set $C\subseteq\mathbb{X}$, we denote its relative interior by $\operatorname{ri}C$, and use $C^\circ$ to denote its polar, which is defined as

$$C^\circ := \{u\in\mathbb{X} : \langle u, x\rangle \le 1\ \text{for all}\ x\in C\}.$$

Finally, the indicator function of a nonempty set $S\subseteq\mathbb{X}$ is denoted by $\delta_S$, which equals zero in $S$ and is infinity otherwise. If $S$ is in addition convex, we use $\sigma_S$ to denote its support function, which is defined as $\sigma_S(u) := \sup_{x\in S}\langle u, x\rangle$ for $u\in\mathbb{X}$.

For an extended-real-valued function $f:\mathbb{X}\to[-\infty,\infty]$, we denote its epigraph by $\operatorname{epi}f := \{(x,\gamma)\in\mathbb{X}\times\mathbb{R} : f(x)\le\gamma\}$. Such a function is said to be proper if its domain $\operatorname{dom}f := \{x\in\mathbb{X} : f(x) < \infty\}$ is nonempty and $f$ never takes the value $-\infty$. A proper function is closed if it is lower semicontinuous. For a proper function $f$, its regular subdifferential at $x\in\operatorname{dom}f$ is defined in [39, Definition 8.3] by

$$\hat\partial f(x) := \left\{\xi\in\mathbb{X} : \liminf_{u\to x,\, u\ne x}\frac{f(u) - f(x) - \langle\xi, u - x\rangle}{\|u - x\|}\ge 0\right\}.$$

The subdifferential of $f$ at $x\in\operatorname{dom}f$ (which is also called the limiting subdifferential) is defined in [39, Definition 8.3] by

$$\partial f(x) := \left\{\xi\in\mathbb{X} : \exists\, x^k\xrightarrow{f}x\ \text{and}\ \xi^k\in\hat\partial f(x^k)\ \text{with}\ \xi^k\to\xi\right\};$$

here, $x^k\xrightarrow{f}x$ means both $x^k\to x$ and $f(x^k)\to f(x)$. Moreover, we set $\partial f(x) := \emptyset$ for $x\notin\operatorname{dom}f$ by convention, and write $\operatorname{dom}\partial f := \{x\in\mathbb{X} : \partial f(x)\ne\emptyset\}$. It is known in [39, Exercise 8.8] that $\partial f(x) = \{\nabla f(x)\}$ if $f$ is continuously differentiable at $x$. Moreover, when $f$ is proper convex, the limiting subdifferential reduces to the classical subdifferential in convex analysis; see [39, Proposition 8.12]. Finally, for a nonempty closed set $S\subseteq\mathbb{X}$, we define its normal cone at an $x\in S$ by $N_S(x) := \partial\delta_S(x)$. If $S$ is in addition convex, we define its tangent cone at an $x\in S$ by $T_S(x) := \operatorname{cl}\bigcup_{\lambda > 0}\lambda(S - x)$.

For a function $G:\mathbb{X}\to\mathbb{Y}$ that is continuously differentiable on $\mathbb{X}$, we use $DG(x)$ to denote the derivative mapping of $G$ at $x\in\mathbb{X}$: this is the linear map from $\mathbb{X}$ to $\mathbb{Y}$ defined by

$$DG(x)u := \lim_{t\to 0}\frac{G(x + tu) - G(x)}{t}\quad\text{for each } u\in\mathbb{X}.$$

We denote the adjoint of the derivative mapping by $\nabla G(x) := [DG(x)]^*$. This latter mapping is referred to as the gradient mapping of $G$ at $x$. For a proper convex function $h:\mathbb{X}\to\mathbb{R}\cup\{\infty\}$, its Fenchel conjugate is

$$h^*(u) := \sup_{x\in\mathbb{X}}\{\langle u, x\rangle - h(x)\}\quad\text{for } u\in\mathbb{X};$$

moreover, it is known that the following equivalence holds (see [38, Theorem 23.5]):

$$u\in\partial h(x)\iff x\in\partial h^*(u)\iff h(x) + h^*(u) = \langle u, x\rangle. \tag{2.1}$$
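As a quick instance of (2.1) in the notation above (our example): for $h = \delta_C$ with $C$ nonempty closed convex, we have $h^* = \sigma_C$, and (2.1) specializes to

$$u\in N_C(x) \iff x\in C\ \text{and}\ \sigma_C(u) = \langle u, x\rangle.$$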

For a proper closed convex function $h$, its asymptotic (or recession) function is defined by $h^\infty(u) := \lim_{t\to\infty}\frac{h(x_0 + tu) - h(x_0)}{t}$ for any $x_0\in\operatorname{dom}h$; see [7, Theorem 2.5.1]. Finally, for a proper function $f:\mathbb{X}\to\mathbb{R}\cup\{\infty\}$, we say that it is level-bounded if, for each $\gamma\in\mathbb{R}$, the set $\{x\in\mathbb{X} : f(x)\le\gamma\}$ is bounded. Moreover, for a proper function $F:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}\cup\{\infty\}$, following [39, Definition 1.16], we say that $F$ is level-bounded in $y$ locally uniformly in $x$ if for each $\bar x\in\mathbb{X}$ and $\gamma\in\mathbb{R}$ there is a neighborhood $U$ of $\bar x$ such that the set $\{(x,y) : x\in U,\ F(x,y)\le\gamma\}$ is bounded.
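To illustrate the locally uniform notion (our example): $F(x,y) := (y - x)^2$ is level-bounded in $y$ locally uniformly in $x$, since for any bounded neighborhood $U$ and any $\gamma\in\mathbb{R}$, the set $\{(x,y) : x\in U,\ (y - x)^2\le\gamma\}$ is bounded. By contrast, $F(x,y) := (xy - 1)^2$ fails this property at $\bar x = 0$: for any $\gamma\ge 1$, the corresponding set contains $(0, y)$ for every $y\in\mathbb{R}$.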

We next recall the Kurdyka-Łojasiewicz (KL) property and the notion of KL exponent; see [34, 27, 4, 5, 6, 30]. This property has been used extensively in analyzing convergence of first-order methods; see, for example, [4, 5, 6, 13, 45].

Definition 2.1 (Kurdyka-Łojasiewicz property and exponent).

We say that a proper closed function $f:\mathbb{X}\to\mathbb{R}\cup\{\infty\}$ satisfies the Kurdyka-Łojasiewicz (KL) property at $\bar x\in\operatorname{dom}\partial f$ if there are $a\in(0,\infty]$, a neighborhood $U$ of $\bar x$ and a continuous concave function $\varphi:[0,a)\to[0,\infty)$ with $\varphi(0) = 0$ such that

  1. $\varphi$ is continuously differentiable on $(0,a)$ with $\varphi' > 0$ on $(0,a)$;

  2. For any $x\in U$ with $f(\bar x) < f(x) < f(\bar x) + a$, it holds that

    $$\varphi'\bigl(f(x) - f(\bar x)\bigr)\operatorname{dist}\bigl(0,\partial f(x)\bigr)\ge 1. \tag{2.2}$$

If $f$ satisfies the KL property at $\bar x\in\operatorname{dom}\partial f$ and the $\varphi$ in (2.2) can be chosen as $\varphi(s) = cs^{1-\alpha}$ for some $c > 0$ and $\alpha\in[0,1)$, then we say that $f$ satisfies the KL property at $\bar x$ with exponent $\alpha$.

A proper closed function $f$ satisfying the KL property at every point in $\operatorname{dom}\partial f$ is said to be a KL function, and a proper closed function $f$ satisfying the KL property with exponent $\alpha$ at every point in $\operatorname{dom}\partial f$ is said to be a KL function with exponent $\alpha$.

KL functions form a broad class of functions which arise naturally in many applications. For instance, it is known that proper closed semi-algebraic functions are KL functions with exponents in $[0,1)$; see, for example, [5]. The KL property is a key ingredient in many contemporary convergence analyses for first-order methods, and the KL exponent plays an important role in identifying the local convergence rate; see, for example, [4, Theorem 2], [24, Theorem 3.4] and [29, Theorem 3]. In this paper, we will study how the KL exponent behaves under inf-projection, and use the rules developed to compute the KL exponents of various functions and to derive new calculus rules for the KL exponent.
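As a numerical illustration of Definition 2.1 (our sketch, not from the paper), the strongly convex quadratic $f(x) = x^\top Qx$ satisfies $\operatorname{dist}(0,\partial f(x))\ge c\,(f(x) - f(0))^{1/2}$ near $0$ with $c = 2\sqrt{\lambda_{\min}(Q)}$, since $\|Qx\|^2\ge\lambda_{\min}(Q)\,x^\top Qx$; this is exactly the KL property at $0$ with exponent $\frac12$:

```python
import numpy as np

# Check dist(0, grad f(x)) >= c * (f(x) - f(0))^(1/2) for f(x) = x^T Q x,
# i.e., the KL inequality at 0 with exponent 1/2 and c = 2 sqrt(lambda_min(Q)).
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Q = M.T @ M + np.eye(4)                      # symmetric positive definite
c = 2.0 * np.sqrt(np.linalg.eigvalsh(Q)[0])  # smallest eigenvalue of Q

for _ in range(1000):
    x = 0.1 * rng.standard_normal(4)         # points near the minimizer 0
    f_gap = x @ Q @ x                        # f(x) - f(0)
    grad_norm = np.linalg.norm(2.0 * Q @ x)
    assert grad_norm >= c * np.sqrt(f_gap) - 1e-12
```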

Before ending this section, we present two auxiliary lemmas. The first lemma concerns the uniformized KL property. It is a specialization of [13, Lemma 6] and explicitly involves the KL exponent.

Lemma 2.1 (Uniformized KL property with exponent).

Suppose that $f:\mathbb{X}\to\mathbb{R}\cup\{\infty\}$ is a proper closed function and let $\Gamma$ be a compact set. If $f$ is constant on $\Gamma$ and satisfies the KL property at each point of $\Gamma$ with exponent $\alpha$, then there exist $c$, $\epsilon > 0$ such that

$$\operatorname{dist}(0,\partial f(x))\ge c\,\bigl(f(x) - f(\bar x)\bigr)^{\alpha}$$

for any $\bar x\in\Gamma$ and any $x$ satisfying $\operatorname{dist}(x,\Gamma) < \epsilon$ and $f(\bar x) < f(x) < f(\bar x) + \epsilon$.

Proof.

Replace the $\varphi$ in the proof of [13, Lemma 6] by $s\mapsto cs^{1-\alpha}$ for some $c > 0$. The desired conclusion can then be proved analogously as in [13, Lemma 6]. ∎

The next lemma is a direct consequence of results in [42]; see [42, Theorem 3.3] and the discussion following [42, Eq. (1.4)] concerning the degree of singularity.

Lemma 2.2 (Error bound for standard SDP problems under strict complementarity).

Let , be a linear map, and define the function by

where . Suppose that and there exists satisfying . Then for any bounded neighborhood of , there exists such that for any ,

Proof.

Observe that

(2.3)

where (a) follows from [39, Exercise 8.8], (b) follows from [38, Theorem 23.8] and the assumption , and (c) follows from [38, Corollary 6.6.2]. Since , we deduce further from (2.3) the existence of satisfying . This means that the following semidefinite programming problem

satisfies the strict complementarity condition.

Next, since , we have that and thus

Using this, the strict complementarity condition, [42, Theorem 3.3] and the discussion following [42, Eq. (1.4)] (see also [22, Theorem 2.3] and the discussion preceding [22, Proposition 3.2]), we conclude that for any bounded neighborhood of , there exists such that for any ,

where the second inequality holds for some thanks to the Hoffman error bound. This completes the proof. ∎

3 KL exponent via inf-projection

In this section, we study how the KL exponent behaves under inf-projection. Specifically, given a proper closed function $F:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}\cup\{\infty\}$ with known KL exponent, we would like to deduce the KL exponent of $f(x) := \inf_{y\in\mathbb{Y}}F(x,y)$ under suitable assumptions. Here is our main theorem of this section.

Theorem 3.1 (KL exponent via inf-projection).

Let $F:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}\cup\{\infty\}$ be a proper closed function and define $f(x) := \inf_{y\in\mathbb{Y}}F(x,y)$ and $Y(x) := \operatorname{Arg\,min}_{y\in\mathbb{Y}}F(x,y)$ for $x\in\mathbb{X}$. Let $\bar x\in\operatorname{dom}\partial f$ and $\alpha\in[0,1)$, and suppose that the following conditions hold:

  1. The function $F$ is level-bounded in $y$ locally uniformly in $x$.

  2. It holds that $\partial F(\bar x,\bar y)\ne\emptyset$ for all $\bar y\in Y(\bar x)$.

  3. The function $F$ satisfies the KL property with exponent $\alpha$ at every point in $\{\bar x\}\times Y(\bar x)$.

Then $f$ is proper closed and satisfies the KL property at $\bar x$ with exponent $\alpha$.

Proof.

Since $F$ is proper closed and level-bounded in $y$ locally uniformly in $x$, we apply [39, Theorem 10.13] and conclude that for any $x\in\operatorname{dom}f$,

(3.1)

Moreover, we have from [39, Theorem 1.17] that $f$ is proper and closed, and that $Y(x)$ is a nonempty closed set whenever $x\in\operatorname{dom}f$.

Since is level-bounded in locally uniformly in and , there exists and a bounded set so that whenever , we have . Thus, for any satisfying and , we obtain

(3.2)

In particular, we deduce that $Y(\bar x)$ is compact.

Using the compactness of $Y(\bar x)$ and the facts that $F$ is constant (equal to $f(\bar x)$) on $\{\bar x\}\times Y(\bar x)$ and satisfies the KL property with exponent $\alpha$ at every point in $\{\bar x\}\times Y(\bar x)$, we deduce from Lemma 2.1 that there exist $c,\,\epsilon > 0$ such that

(3.3)

for any satisfying

(3.4)

Without loss of generality, we may assume .

We next show that

(3.5)

here we recall from [39, Section 5B] that

To this end, fix any . From the definition, there exists with and such that for all . Then we have

where (a) is due to the closedness of , (b) holds because , and (c) holds because . The above relation implies that . This proves (3.5).

Since (3.5) holds, by picking so that and using [39, Proposition 5.12], we see that for this , there exists such that

whenever and , where the first equality follows from (3.2) and the facts that and . This further implies that

(3.6)

for any with and .

Now, fix any with . Then in view of (3.6), we have for any that

where the last inequality follows from the choice of . This together with shows that the relation (3.4) holds for any such and any . Hence, using (3.3) we conclude that

where the first inequality follows from (3.1) and the last equality follows from the definition of . This completes the proof. ∎
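A minimal instance of Theorem 3.1 (our illustration, not from the paper): take $F(x,y) = x^2 + (y - x)^2$. Then $F$ is level-bounded in $y$ locally uniformly in $x$, $Y(x) = \{x\}$, and, being a positive definite quadratic, $F$ satisfies the KL property with exponent $\frac12$ everywhere. Theorem 3.1 thus certifies the (known) exponent $\frac12$ of the inf-projection $f(x) = \inf_y F(x,y) = x^2$ at $\bar x = 0$.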

Theorem 3.1 can be viewed as a generalization of [30, Theorem 3.1], which studied the KL exponent of the minimum of finitely many proper closed functions with known KL exponents. Indeed, let $f_i$, $i = 1,\ldots,m$, be proper closed functions. If we let $\mathbb{Y} = \mathbb{R}$ and define $F$ by

$$F(x,y) := \begin{cases} f_i(x) & \text{if } y = i \text{ for some } i\in\{1,\ldots,m\},\\ \infty & \text{otherwise}, \end{cases} \tag{3.7}$$

then it is not hard to see that this $F$ is proper closed and $\min_{1\le i\le m}f_i(x) = \inf_{y\in\mathbb{R}}F(x,y)$ for all $x\in\mathbb{X}$. Moreover, one can check directly from the definition that

$$Y(x) = I(x) := \Big\{i\in\{1,\ldots,m\} : f_i(x) = \min_{1\le j\le m}f_j(x)\Big\}. \tag{3.8}$$

Thus, we have the following immediate corollary of Theorem 3.1, which is a slight generalization of [30, Theorem 3.1] obtained by dropping the continuity assumption there.

Corollary 3.1 (KL exponent for minimum of finitely many functions).

Let $f_i$, $i = 1,\ldots,m$, be proper closed functions and define $f := \min_{1\le i\le m}f_i$. Let $\bar x\in\operatorname{dom}\partial f$, where $I(\bar x)$ is defined as in (3.8). Suppose that for each $i\in I(\bar x)$, the function $f_i$ satisfies the KL property at $\bar x$ with exponent $\alpha_i$. Then $f$ satisfies the KL property at $\bar x$ with exponent $\alpha := \max\{\alpha_i : i\in I(\bar x)\}$.

Proof.

Define $F$ as in (3.7). Then $F$ is proper closed and $f(x) = \inf_{y\in\mathbb{R}}F(x,y)$ for all $x$. Moreover, $Y(\bar x) = I(\bar x)$ by (3.8). It is clear that this $F$ is level-bounded in $y$ locally uniformly in $x$. Moreover, in view of (3.8) and the assumption that each $f_i$ with $i\in I(\bar x)$ satisfies the KL property at $\bar x$ (so that $\bar x\in\operatorname{dom}\partial f_i$), we see that $\partial F(\bar x,\bar y)\ne\emptyset$ whenever $\bar y\in Y(\bar x)$. Finally, it is routine to show that $F$ satisfies the KL property with exponent $\alpha$ at $(\bar x, i)$ for each $i\in I(\bar x)$. Thus, $F$ satisfies the KL property with exponent $\alpha$ on $\{\bar x\}\times Y(\bar x)$. The desired conclusion now follows from Theorem 3.1. ∎
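For instance (our illustration), let $f_1(x) = x^2$, $f_2(x) = (x - 2)^2$ and $f = \min\{f_1, f_2\}$. At $\bar x = 0$ we have $I(\bar x) = \{1\}$ since $f_2(0) = 4 > 0 = f_1(0)$, and $f_1$ satisfies the KL property at $0$ with exponent $\frac12$. Corollary 3.1 then shows that the nonsmooth, nonconvex function $f$ satisfies the KL property at $0$ with exponent $\frac12$.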

The next corollary can be proved similarly as [30, Corollary 3.1] by using Corollary 3.1 in place of [30, Theorem 3.1].

Corollary 3.2.

Let $f_i$, $i = 1,\ldots,m$, be proper closed functions and define $f := \min_{1\le i\le m}f_i$. Suppose that for each $i$, the function $f_i$ is a KL function with exponent $\alpha_i$. Then $f$ is a KL function with exponent $\max_{1\le i\le m}\alpha_i$.

Finally, we show in the next corollary that one can relax some conditions of Theorem 3.1 when $F$ is in addition convex.

Corollary 3.3 (KL exponent via inf-projections under convexity).

Let $F:\mathbb{X}\times\mathbb{Y}\to\mathbb{R}\cup\{\infty\}$ be a proper closed convex function and define $f(x) := \inf_{y\in\mathbb{Y}}F(x,y)$ and $Y(x) := \operatorname{Arg\,min}_{y\in\mathbb{Y}}F(x,y)$ for $x\in\mathbb{X}$. Let $\bar x\in\operatorname{dom}\partial f$ and $\alpha\in[0,1)$. Then $\partial F(\bar x,\bar y)\ne\emptyset$ for all $\bar y\in Y(\bar x)$. Now, suppose in addition that the following conditions hold:

  1. The set $Y(\bar x)$ is nonempty and compact.

  2. The function $F$ satisfies the KL property with exponent $\alpha$ at every point in $\{\bar x\}\times Y(\bar x)$.

Then $f$ is proper closed and satisfies the KL property at $\bar x$ with exponent $\alpha$.

Proof.

We first show that $\partial F(\bar x,\bar y)\ne\emptyset$ whenever $\bar y\in Y(\bar x)$. To this end, since $\bar x\in\operatorname{dom}\partial f$, we see that $\partial f(\bar x)\ne\emptyset$ and hence $f$ is proper. Moreover, the function $f$ is convex as an inf-projection of the convex function $F$; see [39, Proposition 2.22(a)]. Now, for the proper convex function $f$, we have from the definition that $f^*(\xi) = F^*(\xi, 0)$ for any $\xi\in\mathbb{X}$. Taking a $\xi\in\partial f(\bar x)$ and using (2.1), we see further that for any $\bar y\in Y(\bar x)$,

$$F(\bar x,\bar y) + F^*(\xi, 0) = f(\bar x) + f^*(\xi) = \langle\xi,\bar x\rangle = \langle(\xi, 0), (\bar x,\bar y)\rangle,$$

where the first equality holds because $\bar y\in Y(\bar x)$. In view of (2.1), the above relation further implies that $(\xi, 0)\in\partial F(\bar x,\bar y)$. This proves $\partial F(\bar x,\bar y)\ne\emptyset$ whenever $\bar y\in Y(\bar x)$.

Next, suppose in addition that conditions (i) and (ii) hold. In view of Theorem 3.1 and the preceding discussions, it remains to show that condition (i) in Theorem 3.1 is satisfied, i.e., $F$ is level-bounded in $y$ locally uniformly in $x$. Suppose to the contrary that this fails. Then there exists a sequence $\{(x^k, y^k)\}$ with $\{x^k\}$ bounded and $\sup_k F(x^k, y^k) < \infty$ such that $\|y^k\|\to\infty$. By passing to a subsequence if necessary, we may assume $y^k/\|y^k\|\to\hat y$ for some $\hat y$ with $\|\hat y\| = 1$. Since $\{F(x^k, y^k)\}$ is bounded above and $\{x^k\}$ is bounded, we have

$$F^\infty(0,\hat y)\le\liminf_{k\to\infty}\frac{F(x^k, y^k)}{\|y^k\|}\le 0,$$

where $F^\infty$ is the asymptotic function of $F$ and the first inequality follows from [7, Theorem 2.5.1]. This together with the convexity of $F$ and [7, Proposition 2.5.2] shows that

$$F(x, y + t\hat y)\le F(x, y)\quad\text{for all }(x,y)\in\operatorname{dom}F\text{ and }t\ge 0.$$

Since $Y(\bar x)$ is nonempty by condition (i), we can take $\bar y\in Y(\bar x)$ and set $(x, y) = (\bar x,\bar y)$ in the above display to conclude that $F(\bar x,\bar y + t\hat y)\le F(\bar x,\bar y)$ for all $t\ge 0$. This further implies that $\bar y + t\hat y\in Y(\bar x)$ for all $t\ge 0$, which contradicts the compactness of $Y(\bar x)$. Thus, $F$ is level-bounded in $y$ locally uniformly in $x$, proving that condition (i) in Theorem 3.1 is satisfied. The desired conclusion now follows from an application of Theorem 3.1. ∎

As we will see in Sections 4 and 5, the results on how the KL exponent behaves under the inf-projection operation (Theorem 3.1 and Corollary 3.3) allow us to obtain KL exponents of various important convex and nonconvex models that were out of reach in previous studies. This includes a large class of semidefinite-programming-representable functions, nonconvex models such as rank-constrained least squares problems, and Bregman envelopes.

4 Deducing KL exponent for some convex models

4.1 Convex models with SDP-representable structure

In this section, we explore the KL exponent of functions that are SDP-representable. These functions arise in various applications and include important examples such as least squares loss functions, the $\ell_1$ norm and the nuclear norm; see, for example, [11, Section 4.2] for more discussion. Following [26, Eq. (1.3)], we say that a function is SDP-representable if its epigraph can be expressed as the feasible region of some SDP problem, i.e.,

(4.1)

for some and . Using these symmetric matrices, we define a linear map as

(4.2)

Then it is routine to show that is given by for . Now, if we define

(4.3)

then it holds that $f(x) = \inf_{t\in\mathbb{R}}F(x,t)$ for all $x$. We will explore how to deduce the KL exponent of $f$ under suitable conditions on this representation.
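To make (4.1) concrete in code (a sketch assuming cvxpy with its bundled SDP solver is available; the nuclear-norm model below is the textbook representation, not one taken from this paper): the nuclear norm satisfies $\|X\|_* = \min\{(\operatorname{tr}W_1 + \operatorname{tr}W_2)/2 : [W_1, X;\, X^\top, W_2]\succeq 0\}$, so its epigraph is the projection of an SDP feasible region.

```python
import cvxpy as cp
import numpy as np

# Sketch: the nuclear norm is SDP-representable,
#   ||X||_* = min (tr W1 + tr W2)/2  s.t.  [[W1, X], [X^T, W2]] is PSD,
# so its epigraph is carved out by one semidefinite constraint, as in (4.1).
X = np.random.default_rng(2).standard_normal((3, 4))
Z = cp.Variable((7, 7), symmetric=True)  # Z plays the role of [[W1, X], [X^T, W2]]
prob = cp.Problem(cp.Minimize(0.5 * cp.trace(Z)),
                  [Z >> 0, Z[:3, 3:] == X])
prob.solve()
assert abs(prob.value - np.linalg.norm(X, "nuc")) < 1e-3  # within solver tolerance
```

Here the auxiliary blocks $W_1, W_2$ play the role of the projected-out variable $y$ in the inf-projection $f(x) = \inf_y F(x,y)$.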

We start with an auxiliary lemma. In what follows, for an SDP-representable function $f$ with its epigraph represented as in (4.1) and the corresponding $F$ defined in (4.3), we define the following set for each $\bar x\in\operatorname{dom}f$:

(4.4)
Lemma 4.1.

Let $f$ be a proper closed function, $\bar x\in\operatorname{dom}f$ and $\alpha\in[0,1)$. Suppose that the following conditions hold:

  1. The function $f$ is SDP-representable with its epigraph represented as in (4.1).

  2. The set defined as in (4.4) is nonempty and compact.

  3. The function $F$ defined in (4.3) satisfies the KL property with exponent $\alpha$ at every point of the set defined as in (4.4).

Then $f$ satisfies the KL property at $\bar x$ with exponent $\alpha$.

Proof.

Observe from the definition that

We will now check the conditions in Corollary 3.3 and apply that corollary to deduce the KL property of $f$ from that of $F$. First, note that $F$ is not identically infinity because $f$ is proper. Since $F$ is clearly closed and convex, we conclude that $F$ is proper closed and convex.

Next, by assumption, we see that $F$ satisfies the KL property with exponent $\alpha$ on the set defined as in (4.4) and that this set is nonempty and compact. Thus, conditions (i) and (ii) in Corollary 3.3 are satisfied and the desired conclusion follows from a direct application of Corollary 3.3. This completes the proof. ∎

We are now ready to state and prove our main result in this section.

Theorem 4.1 (KL exponent of SDP-representable functions).

Let $f$ be a proper closed function and $\bar x\in\operatorname{dom}f$. Suppose in addition that $f$ is SDP-representable with its epigraph represented as in (4.1) and that the following conditions hold:

  1. (Slater’s condition) There exists such that , where and are given in (4.1) and (4.2) respectively.

  2. (Compactness) The set defined as in (4.4) is nonempty and compact.

  3. (Strict complementarity) It holds that strict complementarity is satisfied at every point of the set defined as in (4.4), where $F$ is defined as in (4.3).

Then $f$ satisfies the KL property at $\bar x$ with exponent $\frac12$.

Remark 4.1.

In Theorem 4.1, we require a strict complementarity condition to hold at every point of the set defined as in (4.4). This sometimes can be hard to check in practice. In Sections 4.1.1 and 4.1.2, we will impose additional assumptions on $f$ so that this condition can be replaced by a form of strict complementarity condition imposed directly on the original function $f$ (instead of on its SDP representation $F$).

Proof.

In view of Lemma 4.1, it suffices to show that $F$ satisfies the KL property with exponent $\frac12$ at every point of the set defined as in (4.4).

Fix any such point. Our proof will be divided into several steps.

Step 1: We show that it suffices to consider another auxiliary function of the form (4.3) so that (i) the corresponding linear map is injective; (ii) a certain strict complementarity condition holds. See (4.6) and (4.7) below.

To this end, recall from the assumption that . This together with [39, Exercise 8.8] shows that

(4.5)

where $F$ is defined as in (4.3), and $0$ is the zero vector of the appropriate dimension. Next, since and we have