 # On the Location of the Minimizer of the Sum of Strongly Convex Functions

The problem of finding the minimizer of a sum of convex functions is central to the field of distributed optimization. Thus, it is of interest to understand how that minimizer is related to the properties of the individual functions in the sum. In the case of single-dimensional strongly convex functions, it is easy to show that the minimizer lies in the interval bracketed by the smallest and largest minimizers of the set of functions. However, a similar characterization for multi-dimensional functions is not currently available. In this paper, we provide an upper bound on the region containing the minimizer of the sum of two strongly convex functions. We consider two scenarios with different constraints on the upper bound of the gradients of the functions. In the first scenario, the gradient constraint is imposed on the location of the potential minimizer, while in the second scenario, the gradient constraint is imposed on a given convex set in which the minimizers of two original functions are embedded. We characterize the boundaries of the regions containing the minimizer in both scenarios.

## Authors


## I Introduction

The problem of distributed optimization arises in a variety of applications, including machine learning [1, 2, 3, 4], control of large-scale systems [5, 6], and cooperative robotic systems [7, 8, 9, 10, 11]. In such problems, each node in a network has access to a local convex function (e.g., representing certain data available at that node), and all nodes are required to calculate the minimizer of the sum of the local functions. There is a significant literature on distributed algorithms that allow the nodes to achieve this objective [12, 13, 14, 15, 16, 17, 18]. The local functions in the above settings are typically assumed to be private to the nodes. However, there are certain common assumptions that are made about the characteristics of such functions, including strong convexity and bounds on the gradients (e.g., due to minimization over a convex set).

In certain applications, it may be of interest to determine a region where the minimizer of the sum of the functions can be located, given only the minimizers of the local functions, their strong convexity parameters, and the bound on their gradients (either at the minimizer or at the boundaries of a convex constraint set). For example, when the network contains malicious nodes that do not follow the distributed optimization algorithm, one cannot guarantee that all nodes calculate the true minimizer. Instead, one must settle for algorithms that allow the non-malicious nodes to converge to a certain region [19, 20]. In such situations, knowing the region where the minimizer can lie would allow us to evaluate the efficacy of such resilient distributed optimization algorithms. Similarly, suppose that the true functions at some (or all) nodes are not known (e.g., due to noisy data, or if the nodes obfuscate their functions due to privacy concerns). A key question in such scenarios is to determine how far the minimizer of the sum of the true functions can be from the minimizer calculated from the noisy (or obfuscated) functions. The region containing all possible minimizers of the sum of functions (calculated using only their local minimizers, convexity parameters, and bound on the gradients) would provide the answer to this question.

When the local functions at each node are single dimensional (i.e., $f_i : \mathbb{R} \to \mathbb{R}$) and strongly convex, it is easy to see that the minimizer of the sum of functions must be in the interval bracketed by the smallest and largest minimizers of the local functions. This is because the gradients of all the functions will have the same sign outside that region, and thus cannot sum to zero. However, a similar characterization of the region containing the minimizer of multidimensional functions is lacking in the literature, and is significantly more challenging to obtain. For example, the conjecture that the minimizer of a sum of convex functions is in the convex hull of their local minimizers can be easily disproved via simple examples: there exist two-dimensional functions $f_1$ and $f_2$ whose sum has a minimizer lying outside the line segment joining their individual minimizers. Thus, in this paper, our goal is to take a step toward characterizing the region containing the minimizer of a sum of strongly convex functions. Specifically, we focus on characterizing this region for the sum of two strongly convex functions under various assumptions on their gradients (as described in the next section). As we will see, the analysis is significantly complicated even for this scenario. Nevertheless, we obtain such a region and gain insights that could potentially be leveraged in future work to tackle the sum of multiple functions.
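The convex hull conjecture mentioned above can be checked numerically. The following is a minimal sketch using two quadratic functions chosen here purely for illustration; the matrices `A1`, `A2`, the vectors `b1`, `b2`, and the helper `quadratic_minimizer` are assumptions of this sketch and are not necessarily the example intended by the authors.

```python
import numpy as np

def quadratic_minimizer(A, b):
    """Minimizer of f(x) = 0.5 * x^T A x - b^T x for a symmetric positive definite A."""
    return np.linalg.solve(A, b)

# Hypothetical example functions (assumed for illustration):
#   f1(x, y) = x^2 - x*y + 0.5*y^2
#   f2(x, y) = x^2 + x*y + 0.5*y^2 - 4*x - 2*y
A1 = np.array([[2.0, -1.0], [-1.0, 1.0]])
b1 = np.zeros(2)
A2 = np.array([[2.0, 1.0], [1.0, 1.0]])
b2 = np.array([4.0, 2.0])

x1_star = quadratic_minimizer(A1, b1)           # minimizer of f1
x2_star = quadratic_minimizer(A2, b2)           # minimizer of f2
x_star = quadratic_minimizer(A1 + A2, b1 + b2)  # minimizer of f1 + f2

# Both local minimizers have second coordinate 0, so their convex hull
# (a line segment) lies on the horizontal axis. The minimizer of the sum
# has a nonzero second coordinate and therefore lies outside that segment.
outside_hull = abs(x_star[1]) > 1e-9
print(x1_star, x2_star, x_star, outside_hull)
```

Both matrices are positive definite, so each function is strongly convex, yet the minimizer of the sum leaves the segment joining the two local minimizers.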

## II Notation and Preliminaries

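Sets: We denote the closure and interior of a set $\mathcal{E}$ by $\bar{\mathcal{E}}$ and $\mathcal{E}^\circ$, respectively. The boundary of a set $\mathcal{E}$ is defined as $\partial\mathcal{E} = \bar{\mathcal{E}} \setminus \mathcal{E}^\circ$.

Linear Algebra: We denote by $\mathbb{R}^n$ the $n$-dimensional Euclidean space. For simplicity, we often use $(x_1, \ldots, x_n)$ to represent the column vector $[x_1 \; x_2 \; \cdots \; x_n]^T$. We use $e_i$ to denote the $i$-th basis vector (the vector of all zeros except for a one in the $i$-th position). We denote by $\|\cdot\|$ the Euclidean norm and by $\angle(u, v)$ the angle between vectors $u$ and $v$. Note that $\angle(u, v) = \arccos\left(\frac{u^T v}{\|u\| \|v\|}\right)$. We use $\mathcal{B}(x_0, r)$ and $\bar{\mathcal{B}}(x_0, r)$ to denote the open and closed ball, respectively, centered at $x_0$ of radius $r$.

Convex Sets and Functions: A set $\mathcal{C}$ in $\mathbb{R}^n$ is said to be convex if, for all $x$ and $y$ in $\mathcal{C}$ and all $t$ in the interval $(0, 1)$, the point $(1 - t)x + ty$ also belongs to $\mathcal{C}$. A differentiable function $f$ is called strongly convex with parameter $\sigma > 0$ (or $\sigma$-strongly convex) if $(\nabla f(x) - \nabla f(y))^T (x - y) \ge \sigma \|x - y\|^2$ holds for all points $x, y$ in its domain. We denote the set of all $\sigma$-strongly convex functions by $\mathcal{S}(\sigma)$.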
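As an illustration of the strong convexity definition, the sketch below samples random pairs of points and checks the gradient inequality for a quadratic $f(x) = \frac{1}{2} x^T A x$, whose strong convexity parameter is the smallest eigenvalue of $A$. The matrix `A` and the helper names are hypothetical and only serve this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical symmetric positive definite matrix defining f(x) = 0.5 * x^T A x.
A = np.array([[2.0, -1.0], [-1.0, 1.0]])
sigma = np.linalg.eigvalsh(A).min()  # strong convexity parameter of this quadratic

def grad(x):
    """Gradient of f(x) = 0.5 * x^T A x."""
    return A @ x

def satisfies_strong_convexity(x, y, sigma, tol=1e-12):
    """Check (grad f(x) - grad f(y))^T (x - y) >= sigma * ||x - y||^2 up to a tolerance."""
    diff = x - y
    return (grad(x) - grad(y)) @ diff >= sigma * (diff @ diff) - tol

# Each entry of `pairs` holds two random points x and y in R^2.
pairs = rng.normal(size=(1000, 2, 2))
all_ok = all(satisfies_strong_convexity(x, y, sigma) for x, y in pairs)
print(sigma, all_ok)
```

A random check of this kind cannot prove strong convexity, but it is a quick way to catch a wrongly chosen parameter $\sigma$.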

## III Problem Statement

We will consider two scenarios in this paper. We first consider constraints on the gradients of the local functions at the location of the potential minimizer, and then consider constraints on the gradients inside a convex constraint set.

### III-A Problem 1

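Consider two strongly convex functions $f_1 : \mathbb{R}^n \to \mathbb{R}$ and $f_2 : \mathbb{R}^n \to \mathbb{R}$. The two functions $f_1$ and $f_2$ have strong convexity parameters $\sigma_1$ and $\sigma_2$, respectively, and minimizers $x_1^*$ and $x_2^*$, respectively. Let $x^*$ denote the minimizer of $f_1 + f_2$, and suppose that the norms of the gradients of $f_1$ and $f_2$ are bounded above by a finite number $L$ at $x^*$. Our goal is to estimate the region $\mathcal{M}$ containing all possible values of $x^*$ satisfying the above conditions. More specifically, we wish to estimate the region

$$\mathcal{M}(x_1^*, x_2^*, \sigma_1, \sigma_2, L) \triangleq \{x \in \mathbb{R}^n : \exists f_1 \in \mathcal{S}(\sigma_1), \exists f_2 \in \mathcal{S}(\sigma_2), \nabla f_1(x_1^*) = 0, \nabla f_2(x_2^*) = 0, \nabla f_1(x) = -\nabla f_2(x), \|\nabla f_1(x)\| = \|\nabla f_2(x)\| \le L\}. \tag{1}$$

For simplicity of notation, we will omit the arguments of the set $\mathcal{M}(x_1^*, x_2^*, \sigma_1, \sigma_2, L)$ and write it as $\mathcal{M}$ or $\mathcal{M}(x_1^*, x_2^*)$.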

### III-B Problem 2

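Consider two strongly convex functions $f_1 : \mathbb{R}^n \to \mathbb{R}$ and $f_2 : \mathbb{R}^n \to \mathbb{R}$. The two functions $f_1$ and $f_2$ have strong convexity parameters $\sigma_1$ and $\sigma_2$, respectively, and minimizers $x_1^*$ and $x_2^*$, respectively. Suppose that we also have a compact convex set $\mathcal{C} \subset \mathbb{R}^n$ containing the minimizers $x_1^*$ and $x_2^*$. Let $x^*$ denote the minimizer of $f_1 + f_2$ within the region $\mathcal{C}$. The norm of the gradients of both functions $f_1$ and $f_2$ is bounded above by a finite number $L$ everywhere in the set $\mathcal{C}$. Our goal is to estimate the region $\mathcal{N}$ containing all possible values of $x^*$ satisfying the above conditions. More specifically, define $\mathcal{F}(\sigma, L, \mathcal{C})$ to be the family of functions that are $\sigma$-strongly convex and whose gradient norm is upper bounded by $L$ everywhere inside the convex set $\mathcal{C}$:

$$\mathcal{F}(\sigma, L, \mathcal{C}) \triangleq \{f : f \in \mathcal{S}(\sigma), \|\nabla f(x)\| \le L, \forall x \in \mathcal{C}\}.$$

Then, we wish to characterize the region

$$\mathcal{N}(x_1^*, x_2^*, \sigma_1, \sigma_2, L) \triangleq \{x \in \mathbb{R}^n : \exists f_1 \in \mathcal{F}(\sigma_1, L, \mathcal{C}), \exists f_2 \in \mathcal{F}(\sigma_2, L, \mathcal{C}), \nabla f_1(x_1^*) = 0, \nabla f_2(x_2^*) = 0, \nabla f_1(x) = -\nabla f_2(x)\}. \tag{2}$$

For simplicity of notation, we will omit the arguments of the set $\mathcal{N}(x_1^*, x_2^*, \sigma_1, \sigma_2, L)$ and write it as $\mathcal{N}$ or $\mathcal{N}(x_1^*, x_2^*)$.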

### III-C A Preview of the Solution

We provide two examples of the region containing the minimizer of the sum of $2$-dimensional functions in both scenarios in Fig. 1, where $x_1^*$ and $x_2^*$ are the minimizers of $f_1$ and $f_2$, respectively; we derive these regions in the rest of the paper. Notice that the region containing set $\mathcal{M}$ (the area bounded by the red line) is bigger than the region containing set $\mathcal{N}$ (the area bounded by the blue line). In addition, even though we have changed the shape of the convex set $\mathcal{C}$ in the two examples, the minimizer regions are similar.

Fig. 1: The red lines are the boundary of the region that contains $\mathcal{M}$, while the blue lines are the boundary of the region that contains $\mathcal{N}$, where convex sets $\mathcal{C}_1$ and $\mathcal{C}_2$ are a circle (Left) and a box (Right) respectively.

## IV Problem 1: Gradient Constraint at Location of Potential Minimizer

In this section, we consider the first scenario, in which the gradient constraint is imposed at the location of the potential minimizer, and derive an approximation to the set $\mathcal{M}$ in (1).

Consider functions $f_1 \in \mathcal{S}(\sigma_1)$ with minimizer $x_1^*$ and $f_2 \in \mathcal{S}(\sigma_2)$ with minimizer $x_2^*$. Without loss of generality, we can assume $x_1^* = (-r, 0, \ldots, 0)$ and $x_2^* = (r, 0, \ldots, 0)$ for some $r > 0$, since for any $x_1^*$ and $x_2^*$ such that $x_1^* \ne x_2^*$, we can find an affine transformation that maps the original minimizers into these values and also preserves the distance between these points, i.e., $\|x_1^* - x_2^*\| = 2r$. The minimizer region in the original coordinates can then be obtained by applying the inverse transformation to the derived region.

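We will be using the following functions throughout our analysis. For $i \in \{1, 2\}$, define

$$\tilde\phi_i(x, L) \triangleq \arccos\left(\frac{\sigma_i}{L} \|x - x_i^*\|\right), \tag{3}$$

for all $x$ such that $\|x - x_i^*\| \le \frac{L}{\sigma_i}$. For simplicity of notation, if $L$ is a constant, we will omit the arguments and write it as $\tilde\phi_i(x)$ or $\tilde\phi_i$. Furthermore, for all $x \in \mathbb{R}^n \setminus \{x_1^*, x_2^*\}$, define

$$\psi(x) \triangleq \pi - (\alpha_2(x) - \alpha_1(x)),$$

where $\alpha_i(x)$ is the angle between $x - x_i^*$ and $x_2^* - x_1^*$, i.e., $\alpha_i(x) = \angle(x - x_i^*, x_2^* - x_1^*)$.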

###### Lemma 1

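Necessary conditions for a point $x \in \mathbb{R}^n$ to be a minimizer of $f_1 + f_2$ when the gradients of $f_1$ and $f_2$ are bounded by $L$ at $x$ are (i) $\|x - x_i^*\| \le \frac{L}{\sigma_i}$ for $i \in \{1, 2\}$, and (ii) $\tilde\phi_1(x) + \tilde\phi_2(x) \ge \psi(x)$.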

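From the definition of strongly convex functions,

$$(\nabla f_i(x) - \nabla f_i(y))^T (x - y) \ge \sigma_i \|x - y\|^2$$

for all $x, y \in \mathbb{R}^n$ and for $i \in \{1, 2\}$. Since $x_1^*$ and $x_2^*$ are the minimizers of $f_1$ and $f_2$ respectively, we have $\nabla f_i(x_i^*) = 0$ and we get

$$\begin{aligned} (\nabla f_i(x) - \nabla f_i(x_i^*))^T (x - x_i^*) &\ge \sigma_i \|x - x_i^*\|^2 \\ \Rightarrow \quad \nabla f_i(x)^T \frac{x - x_i^*}{\|x - x_i^*\|} &\ge \sigma_i \|x - x_i^*\| \ge 0. \end{aligned} \tag{4}$$

Let $u_i(x) \triangleq \frac{x - x_i^*}{\|x - x_i^*\|}$ be the unit vector in the direction of $x - x_i^*$ and let $\phi_i(x) \triangleq \angle(\nabla f_i(x), u_i(x))$ for $i \in \{1, 2\}$, as shown in Fig. 2. From (4), we get

$$\nabla f_i(x)^T u_i(x) = \|\nabla f_i(x)\| \cos(\phi_i(x)) \ge \sigma_i \|x - x_i^*\|.$$

If $x$ is a candidate minimizer then we can apply the gradient norm constraint $\|\nabla f_i(x)\| \le L$ to the above inequality to obtain

$$\cos(\phi_i(x)) \ge \frac{\sigma_i}{L} \|x - x_i^*\|. \tag{5}$$

If $\|x - x_i^*\| = \frac{L}{\sigma_i}$ then $\phi_i(x) = 0$. On the other hand, if $\|x - x_i^*\| > \frac{L}{\sigma_i}$ then there is no $\phi_i(x)$ that can satisfy the inequality (5). Therefore, if $\|x - x_1^*\| > \frac{L}{\sigma_1}$ or $\|x - x_2^*\| > \frac{L}{\sigma_2}$, we conclude that $x$ cannot be the minimizer of the function $f_1 + f_2$.

Fig. 2: The quantities $\phi_i(x_0)$ represent the angles between $\nabla f_i(x_0)$ and $u_i(x_0)$. The quantities $\tilde\phi_i(x_0)$ represent the maximum possible values for $\phi_i(x_0)$ in order for $x_0$ to be a minimizer. In other words, the angles $\phi_1(x_0)$ and $\phi_2(x_0)$ must lie in the shaded regions.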

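Suppose that $\|x - x_i^*\| \le \frac{L}{\sigma_i}$ for $i \in \{1, 2\}$ so that $\tilde\phi_1(x)$ and $\tilde\phi_2(x)$ are well-defined. In order to capture the possible gradients of $f_1$ at point $x$, define the set of vectors whose norms are at most $L$ and that satisfy (5):

$$\mathcal{G}_1(x) \triangleq \left\{g \in \mathbb{R}^n : \|g\| \le L, \ \angle(g, u_1(x)) \le \arccos\left(\frac{\sigma_1}{L} \|x - x_1^*\|\right)\right\}.$$

Since $x$ can be the minimizer of the function $f_1 + f_2$ only when $\nabla f_1(x) = -\nabla f_2(x)$, we define a set of vectors whose norms are at most $L$ and that satisfy (5) to capture the possible negated gradient vectors of $f_2$:

$$\mathcal{G}_2(x) \triangleq \left\{g \in \mathbb{R}^n : \|g\| \le L, \ \angle(-g, u_2(x)) \le \arccos\left(\frac{\sigma_2}{L} \|x - x_2^*\|\right)\right\}.$$

Note that $\psi(x)$ can be viewed geometrically as the angle between $u_1(x)$ and $-u_2(x)$ as shown in Fig. 2. If $\mathcal{G}_1(x) \cap \mathcal{G}_2(x) = \emptyset$, then $x$ cannot be the minimizer of the function $f_1 + f_2$ because it is not possible to choose $\nabla f_1(x)$ and $\nabla f_2(x)$ such that $\nabla f_1(x) = -\nabla f_2(x)$ and inequality (5) is satisfied for $i = 1$ and $i = 2$ simultaneously.

Fig. 3: The green region in the figure is the set $\mathcal{G}_1(x_0)$ and the yellow region is the set $\mathcal{G}_2(x_0)$. These regions are defined by the angles $\tilde\phi_1$ and $\tilde\phi_2$. If these regions overlap, the point $x_0$ is a minimizer candidate.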

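Recall that $\alpha_i(x)$ is the angle between $x - x_i^*$ and $x_2^* - x_1^*$ for $i \in \{1, 2\}$, i.e., $\alpha_i(x) = \angle(x - x_i^*, x_2^* - x_1^*)$. Note that $\alpha_2(x) \ge \alpha_1(x)$ due to the definition of $\alpha_i(x)$. Then, the angle between $u_1(x)$ and $u_2(x)$ is $\alpha_2(x) - \alpha_1(x)$. Therefore, the angle between $u_1(x)$ and $-u_2(x)$ is equal to $\pi - (\alpha_2(x) - \alpha_1(x)) = \psi(x)$.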

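Let $\tilde\phi_i(x)$ be the maximum angle $\phi_i(x)$ that satisfies inequality (5), i.e., $\tilde\phi_i(x) = \arccos\left(\frac{\sigma_i}{L} \|x - x_i^*\|\right)$ as given by (3). By the definition of $\psi(x)$, if $\tilde\phi_1(x) + \tilde\phi_2(x) \ge \psi(x)$, there is an overlapping region caused by $\mathcal{G}_1(x)$ and $\mathcal{G}_2(x)$ as shown in Fig. 3 and there exist gradients $\nabla f_1(x)$ and $\nabla f_2(x)$ such that $\nabla f_1(x) = -\nabla f_2(x)$. On the other hand, if $\tilde\phi_1(x) + \tilde\phi_2(x) < \psi(x)$ then $\mathcal{G}_1(x) \cap \mathcal{G}_2(x) = \emptyset$ and it is not possible to choose gradients $\nabla f_1(x)$ and $\nabla f_2(x)$ such that they cancel each other. In this case, we can conclude that this $x$ cannot be the minimizer of the function $f_1 + f_2$.

Note that the angles $\tilde\phi_1(x)$, $\tilde\phi_2(x)$, $\alpha_1(x)$, and $\alpha_2(x)$ can be expressed as functions of $\|x - x_1^*\|$, $\|x - x_2^*\|$, and $\|x_1^* - x_2^*\|$. Thus, from the proof of Lemma 1, the inequality $\tilde\phi_1(x) + \tilde\phi_2(x) \ge \psi(x)$ depends only on the distances between the three points $x$, $x_1^*$, and $x_2^*$. Therefore, the candidate minimizer property of $x$ can be fully described by the 2-D picture in Fig. 3.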
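To make this dependence on the three distances explicit, here is a short worked derivation. It is a sketch that assumes the shorthand $d_i = \|x - x_i^*\|$ and $\|x_1^* - x_2^*\| = 2r$, and that the angles $\alpha_i(x)$ are measured between $x - x_i^*$ and $x_2^* - x_1^*$. Applying the law of cosines to the triangle with vertices $x$, $x_1^*$, and $x_2^*$ gives

$$\cos\alpha_1(x) = \frac{d_1^2 + 4r^2 - d_2^2}{4 r d_1}, \qquad \cos\alpha_2(x) = \frac{d_1^2 - d_2^2 - 4r^2}{4 r d_2}, \qquad \cos\tilde\phi_i(x) = \frac{\sigma_i}{L} d_i .$$

The sign in the second expression reflects that $\alpha_2(x)$ is the exterior angle of the triangle at $x_2^*$. Every angle in the condition of Lemma 1 is therefore determined by $d_1$, $d_2$, and $r$ alone.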

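Now we consider the relationship between the set $\mathcal{M}$ in (1) (which is the set that we want to identify) and certain other sets which we define below. Define the set

$$\hat{\mathcal{M}}(x_1^*, x_2^*) \triangleq \left\{x \in \mathbb{R}^n : \tilde\phi_1(x) + \tilde\phi_2(x) \ge \psi(x), \ \|x - x_1^*\| \le \frac{L}{\sigma_1}, \ \|x - x_2^*\| \le \frac{L}{\sigma_2}\right\}. \tag{6}$$

Note that based on Lemma 1, $\hat{\mathcal{M}}$ contains the minimizers of $f_1 + f_2$.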
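The membership test for the set in (6) can be sketched in code. The function below is a minimal illustration that assumes the coordinate convention $x_1^* = (-r, 0, \ldots, 0)$ and $x_2^* = (r, 0, \ldots, 0)$; the name `in_M_hat`, its parameters, and the tolerance handling are hypothetical choices made for this example.

```python
import numpy as np

def in_M_hat(x, r, sigma1, sigma2, L, tol=1e-12):
    """Check the two conditions of Lemma 1 for a point x.

    Assumes x1* = (-r, 0, ..., 0) and x2* = (r, 0, ..., 0) with r > 0.
    """
    x = np.asarray(x, dtype=float)
    e1 = np.zeros_like(x)
    e1[0] = 1.0
    d1 = np.linalg.norm(x + r * e1)  # distance to x1*
    d2 = np.linalg.norm(x - r * e1)  # distance to x2*
    if d1 == 0.0 or d2 == 0.0:
        # The angles alpha_i are undefined at the local minimizers themselves.
        raise ValueError("x coincides with a local minimizer")
    # Condition (i): x lies in both closed balls of radius L / sigma_i.
    if d1 > L / sigma1 + tol or d2 > L / sigma2 + tol:
        return False
    # Maximum admissible angles from (3).
    phi1 = np.arccos(np.clip(sigma1 * d1 / L, 0.0, 1.0))
    phi2 = np.arccos(np.clip(sigma2 * d2 / L, 0.0, 1.0))
    # Angles between x - x_i* and the first coordinate axis.
    alpha1 = np.arccos(np.clip((x[0] + r) / d1, -1.0, 1.0))
    alpha2 = np.arccos(np.clip((x[0] - r) / d2, -1.0, 1.0))
    psi = np.pi - (alpha2 - alpha1)
    # Condition (ii): the two gradient cones can overlap.
    return bool(phi1 + phi2 >= psi - tol)

print(in_M_hat([0.0, 1.0], r=1.0, sigma1=1.0, sigma2=1.0, L=10.0))
print(in_M_hat([0.0, 5.0], r=1.0, sigma1=1.0, sigma2=1.0, L=10.0))
```

With these illustrative parameters, the point $(0, 5)$ lies inside both balls yet fails the angle condition, which shows that condition (ii) of Lemma 1 is what shrinks the region beyond the intersection of the two balls.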

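Define $\mathcal{H}(x_1^*, x_2^*)$ to be the set of points such that there exist strongly convex functions (with given strong convexity parameters and minimizers) whose gradients can be bounded by $L$ at those points:

$$\mathcal{H}(x_1^*, x_2^*) \triangleq \{x \in \mathbb{R}^n : \exists f_1 \in \mathcal{S}(\sigma_1), \exists f_2 \in \mathcal{S}(\sigma_2), \nabla f_1(x_1^*) = 0, \nabla f_2(x_2^*) = 0, \|\nabla f_1(x)\| \le L, \|\nabla f_2(x)\| \le L\}. \tag{7}$$

Define $\mathcal{H}_i(x_i^*)$ to be the set of points such that there exists a $\sigma_i$-strongly convex function with minimizer $x_i^*$ whose gradient is bounded by $L$ at those points:

$$\mathcal{H}_i(x_i^*) \triangleq \{x \in \mathbb{R}^n : \exists f_i \in \mathcal{S}(\sigma_i), \nabla f_i(x_i^*) = 0, \|\nabla f_i(x)\| \le L\}, \quad i = 1, 2.$$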
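###### Lemma 2

$\mathcal{M} \subseteq \hat{\mathcal{M}} \subseteq \mathcal{H}$ and $\mathcal{H} = \mathcal{H}_1 \cap \mathcal{H}_2$, where $\mathcal{H}_i$ is the closed ball centered at $x_i^*$ of radius $\frac{L}{\sigma_i}$ for $i \in \{1, 2\}$.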

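From Lemma 1, we get $\mathcal{M} \subseteq \hat{\mathcal{M}}$. From the definition of a strongly convex function,

$$(\nabla f_i(x) - \nabla f_i(y))^T (x - y) \ge \sigma_i \|x - y\|^2$$

for all $x, y \in \mathbb{R}^n$, where $i \in \{1, 2\}$. Substitute $x_i^*$ into $y$ to get

$$\begin{aligned} (\nabla f_i(x) - \nabla f_i(x_i^*))^T (x - x_i^*) &\ge \sigma_i \|x - x_i^*\|^2 \\ \Leftrightarrow \quad \|\nabla f_i(x)\| \|x - x_i^*\| \cos(\phi_i(x)) &\ge \sigma_i \|x - x_i^*\|^2 \\ \Rightarrow \quad L &\ge \sigma_i \|x - x_i^*\| \\ \Leftrightarrow \quad \|x - x_i^*\| &\le \frac{L}{\sigma_i} \end{aligned} \tag{8}$$

where the equality occurs when $f_i$ is chosen such that $\|\nabla f_i(x)\| = L$ and $\phi_i(x) = 0$. Note that the above sequence of inequalities uses the fact that $\|\nabla f_i(x)\| \le L$ and $\cos(\phi_i(x)) \le 1$. Since $x \in \mathcal{H}_i$, from (8), we have that $\mathcal{H}_i$ is contained in the closed ball centered at $x_i^*$ of radius $\frac{L}{\sigma_i}$.

For the converse, consider a point $x$ in the closed ball centered at $x_i^*$ of radius $\frac{L}{\sigma_i}$. By choosing the quadratic function $f_i(y) = \frac{\sigma_i}{2} \|y - x_i^*\|^2$, for which $\nabla f_i(y) = \sigma_i (y - x_i^*)$, one can easily verify that $f_i \in \mathcal{S}(\sigma_i)$, $\nabla f_i(x_i^*) = 0$, and $\|\nabla f_i(x)\| = \sigma_i \|x - x_i^*\| \le L$. So, this closed ball is contained in $\mathcal{H}_i$.

From the definition of $\mathcal{H}$ and $\mathcal{H}_i$, we get $\mathcal{H} = \mathcal{H}_1 \cap \mathcal{H}_2$. Finally, since the conditions of the set $\mathcal{H}$ are the same as the last two conditions in the set $\hat{\mathcal{M}}$, we get $\hat{\mathcal{M}} \subseteq \mathcal{H}$.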

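The result from Lemma 2 shows that the set $\hat{\mathcal{M}}$ contains the set $\mathcal{M}$ from (1) within it. Thus, we will derive the equation of the boundary of $\hat{\mathcal{M}}$ in $n$-dimensional space from the angles defined in (3), and the necessary condition $\tilde\phi_1(x) + \tilde\phi_2(x) \ge \psi(x)$.

From this point, we will denote $x = (z_1, z)$ where $z_1 \in \mathbb{R}$ and $z \in \mathbb{R}^{n-1}$.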

###### Lemma 3

(i) if and only if .
(ii) if and only if .

Consider case (i) with . First, suppose where . Since , . By the location of , we get and . Consequently, we obtain . Since , the inequality holds. This means that .

Second, suppose where and . By the location of , we get , , and . Consequently, we obtain . In order to satisfy the inequality , we have to choose and . However, since , , and , we get and conclude that . Thus, we have .

If , then and therefore . Combining the analysis above, we can conclude that if and only if . A similar proof applies to case (ii).

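Define the set of points

$$\mathcal{T}_n(L) = \left\{(z_1, z) \in \mathbb{R}^n : \frac{z_1^2 + \|z\|^2 - r^2}{d_1^2 d_2^2} + \frac{\sigma_1 \sigma_2}{L^2} = \sqrt{\frac{1}{d_1^2} - \frac{\sigma_1^2}{L^2}} \cdot \sqrt{\frac{1}{d_2^2} - \frac{\sigma_2^2}{L^2}}\right\}$$

where $d_1 = \sqrt{(z_1 + r)^2 + \|z\|^2}$ and $d_2 = \sqrt{(z_1 - r)^2 + \|z\|^2}$. For simplicity of notation, if $L$ is a constant, we will omit the argument and write it as $\mathcal{T}_n$. In addition, since $\mathcal{H}_i$ is the closed ball centered at $x_i^*$ of radius $\frac{L}{\sigma_i}$, we can write $\partial\mathcal{H}_i$ for $i \in \{1, 2\}$ as follows:

$$\partial\mathcal{H}_1 = \left\{(z_1, z) \in \mathbb{R}^n : (z_1 + r)^2 + \|z\|^2 = \frac{L^2}{\sigma_1^2}\right\},$$

$$\partial\mathcal{H}_2 = \left\{(z_1, z) \in \mathbb{R}^n : (z_1 - r)^2 + \|z\|^2 = \frac{L^2}{\sigma_2^2}\right\}.$$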
###### Lemma 4

The set of points $x$ satisfying $\tilde\phi_1(x) + \tilde\phi_2(x) = \psi(x)$ is equivalent to $\mathcal{T}_n$.

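From Fig. 3, for any point $x = (z_1, z)$ with $d_i = \|x - x_i^*\|$, the $z_1$-axis equations are given by (with the argument $x$ elided for notational convenience)

$$z_1 = d_1 \cos\alpha_1 - r = d_2 \cos\alpha_2 + r \quad \Leftrightarrow \quad \cos\alpha_1 = \frac{z_1 + r}{d_1} \ \text{ and } \ \cos\alpha_2 = \frac{z_1 - r}{d_2}. \tag{9}$$

The $z$-axes equations are given by

$$\|z\| = d_1 \sin\alpha_1 = d_2 \sin\alpha_2 \quad \Leftrightarrow \quad \sin\alpha_1 = \frac{\|z\|}{d_1} \ \text{ and } \ \sin\alpha_2 = \frac{\|z\|}{d_2}. \tag{10}$$

Consider the equation

$$\tilde\phi_1(x) + \tilde\phi_2(x) = \pi - (\alpha_2(x) - \alpha_1(x)). \tag{11}$$

Since $\tilde\phi_i(x) \in [0, \frac{\pi}{2}]$ for $i \in \{1, 2\}$, we get $\tilde\phi_1(x) + \tilde\phi_2(x) \in [0, \pi]$. Since $\alpha_1(x), \alpha_2(x) \in [0, \pi]$ and $\alpha_2(x) \ge \alpha_1(x)$, $\pi - (\alpha_2(x) - \alpha_1(x)) \in [0, \pi]$. Thus, since the cosine function is one-to-one for this range of angles, equation (11) is equivalent to

$$\cos(\tilde\phi_1 + \tilde\phi_2) = \cos(\pi - (\alpha_2 - \alpha_1)) \quad \Leftrightarrow \quad \cos(\tilde\phi_1 + \tilde\phi_2) = -\cos(\alpha_2 - \alpha_1).$$

Expanding this equation and substituting (9), (10), and $\cos\tilde\phi_i = \frac{\sigma_i}{L} d_i$ for $i \in \{1, 2\}$, we get

$$\frac{\sigma_1 \sigma_2}{L^2} d_1 d_2 - \sqrt{1 - \frac{\sigma_1^2 d_1^2}{L^2}} \cdot \sqrt{1 - \frac{\sigma_2^2 d_2^2}{L^2}} = -\frac{z_1^2 + \|z\|^2 - r^2}{d_1 d_2}.$$

Dividing the above equation by $d_1 d_2$ and rearranging yields the defining equation of $\mathcal{T}_n$.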
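The equivalence established in this proof can be sanity-checked numerically in the plane ($n = 2$). The sketch below locates, by bisection along a vertical line, a point where $\tilde\phi_1 + \tilde\phi_2 = \psi$ and then evaluates the residual of the defining equation of $\mathcal{T}_n$ at that point. The parameter values and the helper names (`angle_gap`, `tn_residual`, `boundary_height`) are assumptions made for illustration only.

```python
import numpy as np

def distances(z1, zn, r):
    """Distances from (z1, zn) to x1* = (-r, 0) and x2* = (r, 0)."""
    return np.hypot(z1 + r, zn), np.hypot(z1 - r, zn)

def angle_gap(z1, zn, r, s1, s2, L):
    """Value of phi1~ + phi2~ - psi at the point (z1, zn)."""
    d1, d2 = distances(z1, zn, r)
    phi_sum = np.arccos(s1 * d1 / L) + np.arccos(s2 * d2 / L)
    psi = np.pi - (np.arccos((z1 - r) / d2) - np.arccos((z1 + r) / d1))
    return phi_sum - psi

def tn_residual(z1, zn, r, s1, s2, L):
    """Left-hand side minus right-hand side of the equation defining T_n."""
    d1, d2 = distances(z1, zn, r)
    lhs = (z1**2 + zn**2 - r**2) / (d1**2 * d2**2) + s1 * s2 / L**2
    rhs = np.sqrt(1 / d1**2 - s1**2 / L**2) * np.sqrt(1 / d2**2 - s2**2 / L**2)
    return lhs - rhs

def boundary_height(z1, r, s1, s2, L, lo, hi, iters=100):
    """Bisection on zn; assumes angle_gap is positive at lo and negative at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if angle_gap(z1, mid, r, s1, s2, L) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative parameters (assumed): r = 1, sigma1 = 1, sigma2 = 2, L = 10.
params = dict(r=1.0, s1=1.0, s2=2.0, L=10.0)
zn_star = boundary_height(0.3, lo=0.1, hi=4.0, **params)
gap = angle_gap(0.3, zn_star, **params)
residual = tn_residual(0.3, zn_star, **params)
print(zn_star, gap, residual)
```

Both the angle gap and the residual should vanish up to floating-point error at the located point, which is consistent with the claim that the angle equality (11) and the equation of $\mathcal{T}_n$ describe the same set.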

For convenience, we define for and . We also define and .

In the following lemma, we will show that if we consider the points in $\partial\mathcal{H}_1$ or $\partial\mathcal{H}_2$, we can simplify the angle condition given in (6) for $\hat{\mathcal{M}}$.

###### Lemma 5

Consider .
(i) If then if and only if .
(ii) If then if and only if .

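Consider part (i). Since $x \in \partial\mathcal{H}_1$, we get $d_1 = \frac{L}{\sigma_1}$ and thus $\tilde\phi_1(x) = 0$ from (3). Consider the inequality

$$\tilde\phi_1(x) + \tilde\phi_2(x) < \pi - (\alpha_2(x) - \alpha_1(x)).$$

Substitute $\tilde\phi_1(x) = 0$ and take the cosine of both sides of the inequality (which reverses the inequality, since the cosine is decreasing on $[0, \pi]$) and use (3) to get

$$\frac{\sigma_2}{L} d_2 > -\cos(\alpha_2(x) - \alpha_1(x)).$$

Expand the cosine and substitute the equations (9) and (10) to obtain

$$\frac{\sigma_2}{L} d_2 > -\frac{z_1^2 + \|z\|^2 - r^2}{d_1 d_2}. \tag{12}$$

Since $x \in \partial\mathcal{H}_1$, we have $d_1 = \frac{L}{\sigma_1}$, and $\|z\|^2 = \frac{L^2}{\sigma_1^2} - (z_1 + r)^2$. Also, $d_2^2 = (z_1 - r)^2 + \|z\|^2$. Multiply the inequality (12) by $d_1 d_2$ and then substitute $d_1$, $\|z\|^2$, and $d_2^2$ to get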

 σ2σ1(−4rz1+L2σ21)>2r2+2rz1−L2σ21 ⇔ z1(2r+4rσ2σ1)<σ2σ1⋅