1 Introduction
It has long been recognized that many convex optimization problems can be put into the form
(1) 
where is a finite-dimensional Euclidean space, is a proper convex function that is continuously differentiable on , and is a closed proper convex function. On the one hand, the constrained minimization problem
where is a closed convex set, is an instance of Problem (1) with being the indicator function of ; i.e.,
On the other hand, various data fitting problems in machine learning, signal processing, and statistics can be formulated as Problem (1), where
is a loss function measuring the deviation of a solution from the observations and
is a regularizer intended to induce certain structure in the solution. With the advent of the big data era, instances of Problem (1) that arise in contemporary applications often involve a large number of variables. This has sparked a renewed interest in first-order methods for solving Problem (1) in recent years; see, e.g., [38, 32, 41] and the references therein. From a theoretical point of view, a fundamental issue concerning these methods is to determine their convergence rates. It is well known that various first-order methods for solving Problem (1) will converge at the sublinear rate of , where is the number of iterations; see, e.g., [37, 5, 38, 23]. Moreover, this convergence rate is optimal when the functions and are given by first-order oracles [22]. However, in many applications, both and are given explicitly and have very specific structure. It has been observed numerically that first-order methods for solving structured instances of Problem (1) converge at a much faster rate than that suggested by the theory; see, e.g., [11, 42]. Thus, it is natural to ask whether the structure of the problem can be exploited in the convergence analysis to yield sharper convergence rate results.

As it turns out, a very powerful approach to addressing the above question is to study a so-called error bound property associated with Problem (1). Formally, let be the set of optimal solutions to Problem (1), assumed to be nonempty. Furthermore, let be a set satisfying and be a function satisfying if and only if . We say that Problem (1) possesses a Lipschitzian error bound (or simply error bound) for with test set and residual function if there exists a constant such that
(2) 
where denotes the Euclidean distance from the vector to the set ; cf. [26]. Conceptually, the error bound (2) provides a handle on the structure of the objective function of Problem (1) in the neighborhood of the optimal solution set via the residual function . For the purpose of analyzing the convergence rates of firstorder methods, one particularly useful choice of the residual function is , where is the residual map defined by
(3) 
and is the proximal map associated with ; i.e.,
(4) 
Indeed, by comparing the optimality conditions of (1) and (4), it is immediate that if and only if . Moreover, it is known that many firstorder methods for solving Problem (1) have update rules that aim at reducing the value of the residual function; see, e.g., [18, 6, 39]. This leads to the following instantiation of (2):
Error Bound with Proximal Map-Based Residual Function. For any , there exist constants and such that
(EBP) 
The usefulness of the error bound (EBP) comes from the fact that whenever it holds, a host of first-order methods for solving Problem (1), such as the proximal gradient method, the extragradient method, and the coordinate (gradient) descent method, can be shown to converge linearly; see [18, 38] and the references therein. Thus, an important research issue is to identify conditions on the functions and under which the error bound (EBP) holds. Nevertheless, despite the efforts of many researchers over a long period of time, the repertoire of instances of Problem (1) that are known to possess the error bound (EBP) is still rather limited. Below are some representative scenarios in which (EBP) has been shown to hold:
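To make the objects in (3)–(4) concrete, the following sketch instantiates the proximal-map-based residual for a small ℓ1-regularized least-squares problem and illustrates the linear convergence just discussed. All names (`soft`, `residual`, the data `A`, `b`, the weight `lam`) are hypothetical choices for illustration, not notation from this paper.

```python
import numpy as np

def soft(z, t):
    """Proximal map of t * ||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def residual(x, grad_f, prox_h):
    """Proximal-map-based residual: R(x) = x - prox_h(x - grad_f(x))."""
    return x - prox_h(x - grad_f(x))

# An l1-regularized least-squares instance; a tall, generic A makes the
# smooth part strongly convex, so the proximal gradient method converges
# linearly rather than at the generic sublinear rate.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_h = lambda z: soft(z, lam)
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad_f

F = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

# Proximal gradient iterations with step size 1/L
x = np.zeros(10)
vals = []
for _ in range(1500):
    x = soft(x - grad_f(x) / L, lam / L)
    vals.append(F(x))

gap0 = vals[0] - vals[-1]              # initial optimality gap (approximate)
gap300 = vals[300] - vals[-1]          # gap after 300 iterations
res_norm = np.linalg.norm(residual(x, grad_f, prox_h))
```

At (near) optimality the residual vanishes, consistent with the fact that if and only if the point is optimal, and the objective gap decays geometrically on this strongly convex instance.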

(S1) ([25, Theorem 3.1]) , is strongly convex, is Lipschitz continuous, and is arbitrary (but closed, proper, and convex).

(S2) ([17, Theorem 2.1]) , takes the form , where , are given and is proper and convex with the properties that (i) is continuously differentiable on , assumed to be nonempty and open, and (ii) is strongly convex and is Lipschitz continuous on any compact subset of , and has a polyhedral epigraph.
In many applications, such as regression problems, the function of interest is not strongly convex but has the structure described in scenarios (S2) and (S3). However, a number of widely used structure-inducing regularizers —most notably the nuclear norm regularizer—are not covered by these scenarios. One of the major difficulties in establishing the error bound (EBP) for regularizers other than those described in scenarios (S2) and (S3) is that they typically have nonpolyhedral epigraphs. Moreover, existing approaches to establishing the error bound (EBP) are quite ad hoc in nature and cannot be easily generalized. Thus, in order to identify more scenarios in which the error bound (EBP) holds, some new ideas would seem to be necessary.
In this paper, we present a new analysis framework for studying the error bound property (EBP) associated with Problem (1). The framework applies to the setting where has the form described in scenario (S2) and is any closed proper convex function. In particular, it applies to all the scenarios (S1)–(S3). Our first contribution is to elucidate the relationship between the error bound property (EBP) and various notions in set-valued analysis. This allows us to utilize powerful tools from set-valued analysis to elicit the key properties of Problem (1) that can guarantee the validity of (EBP). Specifically, we show that the problem of establishing the error bound (EBP) can be reduced to that of checking the calmness of a certain set-valued mapping induced by the optimal solution set of Problem (1); see Corollary 1. Furthermore, using the fact that can be expressed as the intersection of a polyhedron and the inverse of the subdifferential of at a certain point (see Proposition 1), we show that the calmness of is in turn implied by (i) the bounded linear regularity of the two intersecting sets and (ii) the calmness of at ; see Theorem 2. These results provide a concrete starting point for verifying the error bound property (EBP) and make it possible to simplify the analysis substantially. We remark that when has a polyhedral epigraph, the early works [18, 19] of Luo and Tseng have already pointed out a connection between (EBP) and the calmness of a certain polyhedral multifunction. However, such an idea has not been further explored in the literature to tackle more general forms of .
To demonstrate the power of our proposed framework, we apply it to scenarios (S1)–(S3) and show that the error bound results in [25, 17, 38] can be recovered in a unified manner; see Sections 4.1–4.3. It is worth noting that scenario (S3) involves the nonpolyhedral grouped LASSO regularizer, and the existing proof of the validity of the error bound (EBP) in this scenario employs a highly intricate argument [38]. By contrast, our approach leads to a much simpler and more transparent proof. Motivated by the above success, we proceed to apply our framework to the following scenario, which again involves a nonpolyhedral regularizer and arises in the context of lowrank matrix optimization:

(S4) takes the form , where is a given linear operator, is a given matrix, is as in scenario (S2), and is the nuclear norm regularizer; i.e., .
The validity of the error bound (EBP) in this scenario was left as an open question in [38] and to date is still unresolved.¹ As our second contribution in this work, we show that under a strict complementarity-type regularity condition on the optimal solution set of Problem (1), the error bound (EBP) holds in scenario (S4); see Proposition 12. This is achieved by verifying conditions (i) and (ii) mentioned in the preceding paragraph. Specifically, we first show that condition (i) is satisfied under the said regularity condition. Then, we prove that is calm everywhere, which implies that condition (ii) is always satisfied; see Proposition 11. We note that to the best of our knowledge, this last result is new and could be of independent interest. To further understand the role of the regularity condition, we demonstrate via a concrete example that without such a condition, the error bound (EBP) could fail to hold; see Section 4.4.4. Consequently, we obtain a rather complete answer to the question raised by Tseng [38].

¹ It was claimed in [13] that the error bound (EBP) holds in scenario (S4). However, there is a critical flaw in the proof. Specifically, contrary to what was claimed in [13, Supplementary Material, Section C], the matrices and that satisfy displayed equations (37) and (38) need not satisfy displayed equation (35). The erroneous claim was due to an incorrect application of [35, Lemma 4.3]. We thank Professor Defeng Sun and Ms. Ying Cui for bringing this issue to our attention.
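Since scenario (S4) involves the nuclear norm, first-order methods applied to it need its proximal map, which is given by soft-thresholding the singular values (a standard fact). A minimal sketch, with all variable names chosen for illustration:

```python
import numpy as np

def prox_nuclear(Z, t):
    """Prox of t * (nuclear norm): soft-threshold the singular values of Z."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

rng = np.random.default_rng(1)
Z = rng.standard_normal((6, 4))
t = 0.8
P = prox_nuclear(Z, t)

def prox_obj(X):
    # The objective defining the proximal map, cf. (4):
    # 0.5 * ||X - Z||_F^2 + t * ||X||_*
    return 0.5 * np.linalg.norm(X - Z, "fro") ** 2 + t * np.linalg.norm(X, "nuc")
```

Since the prox objective is strongly convex, `P` should beat any perturbation of itself; thresholding also shrinks (and often zeroes out) singular values, which is how the nuclear norm promotes low-rank solutions.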
The following notation will be used throughout the paper. Let denote finite-dimensional Euclidean spaces. The closed ball around with radius in is given by . For simplicity, we denote the closed unit ball in by . We use and to denote the sets of real symmetric matrices and orthogonal matrices, respectively. Given a matrix , we use or (resp. or ) to indicate that is positive semidefinite (resp. positive definite). Also, we use and to denote the Frobenius norm and spectral norm of the matrix , respectively.
2 Preliminaries
2.1 Basic Setup
Consider the optimization problem (1). Recall that its optimal value and optimal solution set are denoted by and , respectively. We shall make the following assumptions in our study:
Assumption 1
(Structural Properties of the Objective Function)

The function takes the form
(5)
where is a linear operator, is a given vector, and is a convex function with the following properties:

The effective domain of is nonempty and open, and is continuously differentiable on .

For any compact convex set , the function is strongly convex and its gradient is Lipschitz continuous on .


The function is convex, closed, and proper.
Assumption 2
(Properties of the Optimal Solution Set) The optimal solution set is nonempty and compact. In particular, .
The above assumptions yield several useful consequences. First, Assumption 1(ai) implies that is also nonempty and open, and is continuously differentiable on . Second, under Assumption 1(aii), if the Lipschitz constant of on the compact convex set is , then the Lipschitz constant of on is at most , where is the spectral norm of . Third, Assumption 1 implies that is a closed proper convex function. Together with Assumption 2 and [30, Corollary 8.7.1], we conclude that for any , the level set is a compact subset of .
Assumptions 1 and 2 are automatically satisfied in a number of applications. As an illustration, consider the problem of regularized empirical risk minimization of linear predictors, which underlies much of the development in machine learning. With being the number of data points and , the problem takes the form
(6) 
where is the th component of the vector and represents the th linear prediction, is the th response, is a smooth convex loss function, and is a regularizer used to induce certain structure in the solution. It is clear that Problem (6) is an instance of Problem (1). Moreover, one can easily verify that when instantiated with the loss functions and regularizers in Table 1—which have been widely used in the machine learning literature—Problem (6) satisfies both Assumptions 1 and 2.
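As a concrete (and hypothetical) instance of (6), one may take the logistic loss with binary responses; the chain rule then expresses the gradient of the empirical risk through the data matrix, which is exactly the composite structure postulated in Assumption 1. A sketch, with all names illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 5))       # rows are the feature vectors a_i
b = rng.choice([-1.0, 1.0], size=30)   # binary responses b_i

def f(x):
    """Logistic empirical risk: sum_i log(1 + exp(-b_i * <a_i, x>))."""
    return np.sum(np.log1p(np.exp(-b * (A @ x))))

def grad_f(x):
    """Chain rule: grad f(x) = A^T w, where w_i depends only on (A x)_i."""
    w = -b / (1.0 + np.exp(b * (A @ x)))
    return A.T @ w
```

The gradient depends on the iterate only through the linear predictions, which is the structural feature that the error bound analysis of the composite form exploits.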


2.2 A Characterization of the Optimal Solution Set
Since Problem (1) is an unconstrained convex optimization problem, its first-order optimality condition is both necessary and sufficient for optimality. Hence, we have
(7) 
The following proposition shows that under Assumptions 1 and 2, the optimal solution set admits an alternative, more explicit characterization. Such a characterization will be central to our analysis of the error bound property associated with Problem (1).
Proposition 1
Proof The proof of (8) is rather standard; cf. [36, 17]. For completeness’ sake, we include the proof here. For arbitrary , let and . Note that the line segment between and is a compact convex subset of . By Assumption 1(aii), the function is strongly convex on this set. Thus, there exists a such that
Due to (5), the above is equivalent to
Moreover, the convexity of gives
Upon adding the above two inequalities and using , we have
This implies that , for otherwise the above inequality contradicts the fact that is the optimal value of Problem (1). Consequently, the map is invariant over ; i.e., there exists a such that for all . Now, using (5) and Assumption 1(ai), we compute . Since for all , we have for all . This completes the proof of (8).
2.3 Tools from Set-Valued Analysis
Proposition 1 reveals that the optimal solution set of Problem (1) is completely characterized by the vectors and . Thus, in order to estimate for some , a natural idea is to take and an arbitrary and establish a relationship between and . Intuitively, if is “nice” (e.g., satisfies a certain regularity condition), then one should be able to control the (local) growth of by that of a “nice” function of . Such an idea can be formalized using tools from set-valued analysis, which we now introduce.

Let and be finite-dimensional Euclidean spaces. We say that a mapping is a multifunction (or set-valued mapping) from to (denoted by ) if it assigns a subset of to each vector . The graph and domain of are defined by
respectively. The inverse mapping of , denoted by , is the multifunction from to defined by
Before we proceed further, let us briefly illustrate some of the concepts above.
Example

Let be a given matrix. The mapping defined by is a multifunction from to . Here, is simply the solution set of the linear system .

Let be a closed proper convex function. Its subdifferential is a multifunction from to . Moreover, by [30, Corollary 23.5.1], we have , where is the conjugate of .
Next, we introduce two regularity notions regarding set-valued mappings.
Definition 1
(see, e.g., [8, Chapter 3H])

A multifunction is said to be calm at for if and there exist constants such that
(10) 
A multifunction is said to be metrically subregular at for if and there exist constants such that
(11)
The notions of calmness and metric subregularity have played a central role in the study of error bounds; see, e.g., [26, 9, 31, 27, 15] and the references therein. To see what these notions would yield in the context of Problem (1), consider the multifunction given by
(12) 
Suppose that is calm at for , where and are given in Proposition 1. Note that and . Hence, by (10), there exist constants such that
(13) 
Since is equivalent to and , it follows from (13) that
(14) 
which is an error bound for with test set and residual function . Incidentally, the inequality (14) also shows that the multifunction is metrically subregular at for .
The error bound (14) shows that under a calmness assumption on the multifunction given in (12), the local growth of is on the order of , where and is arbitrary. This realizes the idea mentioned at the beginning of this subsection. However, we are ultimately interested in establishing the error bound (EBP), which is concerned with the test set (where is arbitrary and depends on ) and residual function . At first sight, it is not clear whether the error bounds (EBP) and (14) are compatible. Indeed, the former involves only easily computable quantities (i.e., and ), while the latter involves quantities that are generally not known a priori (i.e., , , and ). Nevertheless, as we shall demonstrate in Section 3, the latter can be used to establish the former under some mild conditions.
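For intuition about how an inequality of the form (14) or (EBP) can hold, consider the simplest smooth case where the nonsmooth part is zero: the proximal map is then the identity and the residual map reduces to the gradient. For a strongly convex quadratic, the error bound holds globally with constant equal to the reciprocal of the smallest Hessian eigenvalue. A sketch (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
Q = M @ M.T + np.eye(5)            # positive definite Hessian
c = rng.standard_normal(5)

x_star = np.linalg.solve(Q, c)     # unique minimizer of f(x) = 0.5 x'Qx - c'x
mu = np.linalg.eigvalsh(Q)[0]      # smallest eigenvalue: strong convexity modulus

# With h = 0, the proximal map is the identity, so R(x) = grad f(x) = Qx - c
R = lambda x: Q @ x - c

# dist(x, X*) = ||x - x_star|| <= (1/mu) * ||R(x)||, since
# x - x_star = Q^{-1} R(x) and the operator norm of Q^{-1} is 1/mu
pts = [x_star + rng.standard_normal(5) for _ in range(20)]
bound_holds = all(
    np.linalg.norm(p - x_star) <= np.linalg.norm(R(p)) / mu + 1e-9
    for p in pts
)
```

This is exactly scenario (S1) in its simplest form; the content of the paper is that such inequalities persist for far less regular composite objectives.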
Before we leave this section, let us record two useful results regarding the notions of calmness and metric subregularity. The first is a well-known equivalence between the calmness of a multifunction and the metric subregularity of its inverse. One direction of the equivalence has already manifested itself in our discussion above.
Fact 1
(see, e.g., [8, Theorem 3H.3]) For a multifunction , let . Then, is calm at for if and only if its inverse is metrically subregular at for .
The second result concerns a multifunction that is calm at for a set of points . It shows that if is compact, then the neighborhoods around each in the definition of calmness can be made uniform.
Proposition 2
For a multifunction , let and suppose that is compact. Then, the following statements are equivalent:

is calm at for any .

There exist constants such that
Proof It is clear that (b) implies (a). Hence, suppose that (a) holds. By (10), given any , there exist constants such that
(15) 
Let denote the open unit ball around the origin in . Then, the set forms an open cover of the compact set . Hence, by the Heine-Borel theorem, there exist points (where is finite) such that . We claim that there exists a constant such that . Indeed, suppose that this is not the case. Then, for , we can find vectors such that for ,
Since is compact and , by passing to a subsequence if necessary, we may assume that for some . Then, we have
which shows that . On the other hand, since , there exists an index such that . This implies that
for , which contradicts the fact that . Thus, the claim is established.
Now, upon setting , we obtain
where the second inclusion is due to (15) and the fact that for . This completes the proof.
3 Sufficient Conditions for the Validity of the Error Bound (EBP)
Following our discussion in Section 2.3, we now show that under Assumptions 1 and 2, the error bound (EBP) is implied by a certain calmness property of the multifunction given in (12). This is achieved by exploring the relationships between error bounds defined using different test sets and residual functions. For the sake of convenience, we shall refer to the multifunction given in (12) as the solution map associated with Problem (1) in the sequel.
3.1 Error Bound with NeighborhoodBased Test Set
To begin, recall that the error bound (EBP) involves the test set , where is arbitrary and depends on . The following proposition shows that under Assumptions 1 and 2, we can replace the test set by a neighborhood of . This would facilitate our analysis of the relationship between the error bound (EBP) and the calmness of the solution map , as the latter is also defined in terms of a neighborhood of .
Proposition 3
Proof To establish the error bound (EBP), it suffices to show that for any , there exists an such that
Suppose that this does not hold. Then, there exist a scalar and a sequence in such that for and , but for . Since is compact by Assumption 2, by passing to a subsequence if necessary, we may assume that for some . Using the fact that is 1Lipschitz continuous on (see, e.g., [6, Lemma 2.4]) and is continuous on (Assumption 1(ai)), we see that is continuous on . This, together with the fact that , implies that ; i.e., . However, this contradicts the fact that for , and the proof is completed.

Before we proceed, two remarks are in order. First, the reverse implication in Proposition 3 is also true if, in addition to Assumptions 1 and 2, the optimal solution set of Problem (1) is contained in the relative interior of . However, since we will mostly focus on sufficient conditions for the error bound (EBP) to hold, we will not pursue this here. Second, for those instances of Problem (1) that do not satisfy Assumption 2, one or both of the error bounds (EBP) and (EBN) could fail to hold. The following example demonstrates such a possibility.
Example Let . Define the function by
and take to be the indicator function of . Furthermore, let be given by . It can be verified that is convex and continuously differentiable on with
Moreover, we have
which shows that the level sets of are closed but not bounded. It follows that is a closed proper convex function with and
Next, we determine the residual map on . Recall that
Since is the indicator function of , it is easy to see that is the projection operator onto . Note that for each , the function
is decreasing in . Moreover, it can be verified that for all . It follows that
In particular, we have for all .
Now, observe that for any , if satisfies , then for any . However, we have for any and . It follows that there do not exist constants such that (EBP) holds. Similarly, for any , we have
Since for any , there does not exist a constant such that (EBN) holds. In fact, the same arguments show that the instance in question does not possess a Hölderian error bound; i.e., the error bounds (EBP) and (EBN) fail to hold even if one replaces the inequality by for any .
3.2 Error Bound with Alternative Residual Function
As the reader would recall, a motivation for using as the residual function is that the optimal solution set can be characterized as . Since admits the alternative characterization (9), we can define another residual function by
and consider the error bound
(EBR) 
where are constants and , are given in Proposition 1. Our interest in the error bound (EBR) stems from the following result, which reveals that it is closely related to a certain calmness property of the solution map :
Proposition 4
Proof Suppose that the error bound (EBR) holds. Let be arbitrary and suppose that . In particular, we have and . Using the inequality , which is valid for all , we see from (EBR) that
Since , this implies that . Hence, is calm at for any .
Conversely, suppose that is calm at for any