Value of Information in Greedy Submodular Maximization

07/25/2018 ∙ by David Grimsman, et al. ∙ The Regents of the University of California

The maximization of submodular functions is an NP-Hard problem for certain subclasses of functions, for which a simple greedy algorithm has been shown to guarantee a solution whose quality is within 1/2 of that of the optimal. When this algorithm is implemented in a distributed way, agents sequentially make decisions based on the decisions of all previous agents. This work explores how limited access to the decisions of previous agents affects the quality of the solution of the greedy algorithm. Specifically, we provide tight upper and lower bounds on how well the algorithm performs, as a function of the information available to each agent. Intuitively, the results show that performance roughly degrades proportionally to the size of the largest group of agents which make decisions independently. Additionally, we consider the case where a system designer is given a set of agents and a global limit on the amount of information that can be accessed. Our results show that the best designs are to partition the agents into equally-sized sets, and allow agents to access the decisions of all previous agents within the same set.


I Introduction

The optimization of submodular functions is a well-studied topic due to its application in many common engineering problems. Examples include information gathering [17], maximizing influence in social networks [15], image segmentation in image processing [16], multiple object detection in computer vision [3], document summarization [19], path planning of multiple robots [27], sensor placement [18, 21], and resource allocation in multi-agent systems [21]. The key thread in these problems is that each exhibits some form of a “diminishing returns” property, e.g., adding more sensors to a sensor placement problem improves performance, but every additional sensor marginally contributes less to the overall performance as the number of sensors increases. Any problem exhibiting such behavior can likely be formulated as a submodular optimization problem.

While polynomial algorithms exist to solve submodular minimization problems [13, 14, 26], maximization has been shown to be NP-Hard for important subclasses of submodular functions [20]. Thus a tremendous effort has been placed on developing fast algorithms that approximate the solution to the submodular maximization problem [24, 8, 22, 4, 32, 28, 25]. A resounding message from this extensive research is that very simple algorithms can provide strong guarantees on the quality of the approximation.

The seminal work in [8] demonstrates that a greedy algorithm provides a solution that is within $1/2$ of the quality of the optimal solution. In fact, more sophisticated algorithms can often be derived for certain classes of submodular maximization problems that push these guarantees from $1/2$ to $1 - 1/e$ [24, 5, 7]. Progress beyond this level of suboptimality is not possible in general, because it was also shown that no polynomial-time algorithm can achieve a higher guarantee than $1 - 1/e$, unless $P = NP$ [6].

One appealing trait of the greedy algorithm is that it can be implemented in a distributed way while still maintaining the performance guarantee. In the distributed greedy algorithm, each agent in the set sequentially selects its choice by greedily optimizing the global objective function conditioned on the decisions of the previous agents. However, this requires agents to have full access to the decisions of all previous agents. In addition, each agent must have access to the value of the global objective function for previous agents’ decisions, plus any choice in its own decision set. In many cases, these informational demands may be costly or infeasible.

Research has therefore begun to explore how limited information can impact the performance of the greedy algorithm on distributed submodular maximization problems. For example, [21] focuses on the submodular resource allocation problem, modeled as a game played among agents. The resulting Nash equilibria have the familiar performance guarantee, however it is shown that when information is limited to be local instead of global, the performance guarantee degrades to , where is the number of agents. The work in [23] formulates the problem of selecting representative data points from a large corpus as a submodular maximization problem. In order to perform the optimization in a distributed way, agents are partitioned into sets, where the full greedy algorithm is performed among agents within a set, while no information is transferred between sets. In this setting, the paper shows that the algorithm performance is worse than , even when a preprocessing algorithm is used to intelligently assign decision sets to each agent. Other work in [25] discusses the role of information in the task assignment problem. It is shown that the distributed greedy algorithm can be implemented asynchronously, with convergence in a finite number of steps. Additionally, when agent action sets are based on spatial proximity, agents need only consider local information to achieve the bound. Finally, the work in [10] studies the performance of the distributed greedy algorithm when an agent can only observe a local subset of its predecessors. It is shown that localizing information, particularly when agents are partitioned from each other, leads to a degradation in performance. For instance, in the case where agents are partitioned into sets, performing the full greedy algorithm within the set and obtaining no information outside the set, the performance degrades proportionally to the number of sets in the partition.

This paper more closely relates to the work done in [10] in evaluating informational constraints. We leverage a similar model and seek to find how limiting which decisions an agent can access impacts the overall performance of the distributed greedy algorithm. We also consider the scenario where a system designer is given a set of agents and a global limit on the amount of information that can be accessed, and seek to find the best policies to ensure the highest performance possible.

More specifically, the contributions of this paper are the following results:

  1. Theorem 1 gives lower and upper bounds on worst-case performance of the greedy algorithm on any submodular function for any given set of constraints. The bounds show that worst-case performance (roughly) degrades proportionally to the size of the largest group of agents which make decisions independently.

  2. Theorem 2 shows the best performance of the greedy algorithm that a system designer can achieve with a fixed number of agents and constraints. The results show that when information is costly, the best system design is to partition the agents into equal sets, and have agents in each set execute the full greedy algorithm.

The remainder of this paper is dedicated to proving and discussing these two theorems.

II Model

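This paper focuses on a distributed algorithm for solving submodular maximization. To that end, let $S$ be a set of elements and $f : 2^S \to \mathbb{R}$ have the following properties:

  • Normalized: $f(\emptyset) = 0$.

  • Monotonic: For $A \subseteq B \subseteq S$, $f(A) \leq f(B)$.

  • Submodular: For $A \subseteq B \subseteq S$ and $x \in S \setminus B$, the following holds:

    $f(A \cup \{x\}) - f(A) \geq f(B \cup \{x\}) - f(B)$    (1)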

For simplicity, we will refer to a function with all three above properties merely as submodular.
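The three properties can be checked by brute force on a small ground set. The following is a minimal sketch, not part of the original paper: the helper names, the coverage function `cover_count`, and the counterexample `squared_size` are all hypothetical and chosen only for illustration. The check enumerates every pair of nested subsets, so it is only practical for very small ground sets.

```python
from itertools import combinations


def powerset(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_normalized(f):
    """Normalized: the empty set has value zero."""
    return f(frozenset()) == 0


def is_monotone(f, ground):
    """Monotone: A subset of B implies f(A) <= f(B)."""
    subsets = list(powerset(ground))
    return all(f(a) <= f(b) for a in subsets for b in subsets if a <= b)


def is_submodular(f, ground):
    """Diminishing returns: adding x to A helps at least as much as adding x to B."""
    subsets = list(powerset(ground))
    for a in subsets:
        for b in subsets:
            if not a <= b:
                continue
            for x in ground - b:
                if f(a | {x}) - f(a) < f(b | {x}) - f(b):
                    return False
    return True


# Hypothetical example: each element covers some items; f counts covered items.
ground = frozenset({"a", "b", "c"})
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {3}}


def cover_count(subset):
    covered = set()
    for element in subset:
        covered |= coverage[element]
    return len(covered)


def squared_size(subset):
    """A monotone function with increasing returns, hence not submodular."""
    return len(subset) ** 2


print(is_normalized(cover_count), is_monotone(cover_count, ground),
      is_submodular(cover_count, ground))
print(is_submodular(squared_size, ground))
```

The coverage function passes all three checks, while `squared_size` fails the diminishing-returns test because its marginal gains grow with the size of the set.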

In this paper we focus on distributed approaches to submodular optimization where there are a set of decision-making agents , and each agent is associated with an action set . Notationally, we define as the family of action sets, an action for agent as , and an action profile as . We also overload the notation of to allow multiple actions as inputs: , and for . The submodular maximization problem addressed in this work is to find

(2)

As shown in [10], this type of constraint where we choose from a family of subsets is also referred to in the literature as a partition matroid constraint.

We next present two relevant problems that can be modeled accordingly. This serves to give a scope and relevance to the model, as well as provide an example that will be leveraged throughout the rest of the paper.

Example 1 (Vehicles target assignment problem [1]).

Consider the classic vehicles target assignment problem where there are a collection of targets and each target has an associated value . Further, there exists a collection of agents, and each agent

is associated with a success probability

and a set of possible assignments . The agents make decisions to reach a feasible allocation of agents to targets that optimizes a system-level performance metric of the form:

(3)

Note that the objective function given in (3) is submodular, as can be expressed as a function of the form for an appropriate choice of the domain set , i.e., and the action sets can be expressed as disjoint sets in .

Example 2 (Weighted set cover problem [9]).

Consider the subset of the vehicles target assignment problem where for all . Then, the objective function takes on the form

(4)

Note that (4) can now be expressed by a submodular function with domain . An instance of such a problem is shown in Figure 1.
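To make the relationship between the two examples concrete, here is a small sketch. It assumes the commonly used form of the target assignment objective (each target's value times the probability that at least one assigned agent succeeds); the exact expressions (3) and (4) are not reproduced in this text, so that form, the function names, and the toy data are assumptions. Each action is modeled as a set of targets.

```python
from math import prod


def target_assignment_value(profile, target_values, success_prob):
    """Assumed form of (3): value of each target times the probability
    that at least one agent assigned to it succeeds."""
    total = 0.0
    for target, value in target_values.items():
        assigned = [i for i, action in enumerate(profile) if target in action]
        miss = prod(1.0 - success_prob[i] for i in assigned)  # empty product is 1
        total += value * (1.0 - miss)
    return total


def set_cover_value(profile, target_values):
    """Assumed form of (4): total value of all targets covered by some agent."""
    covered = set().union(*profile)
    return sum(target_values[t] for t in covered)


# Hypothetical instance: three targets, three agents.
target_values = {"t1": 3, "t2": 2, "t3": 1}
profile = [{"t1"}, {"t1"}, {"t3"}]

print(set_cover_value(profile, target_values))
print(target_assignment_value(profile, target_values, [1.0, 1.0, 1.0]))
print(target_assignment_value(profile, target_values, [0.5, 0.5, 0.5]))
```

With all success probabilities equal to one, the two functions agree, which is the sense in which the weighted set cover problem is a special case of the target assignment problem.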

We will henceforth focus on submodular functions of the form , where , without explicitly highlighting the structure of and the action sets . Furthermore, we will rely on the weighted set cover problem in several of the forthcoming proofs.

III The Greedy Algorithm

(a) The setup of a weighted set cover problem. The targets are , each represented by a box, and each with a corresponding value. The available choices to each agent are represented by the black lines (both dotted and solid) - for instance , , etc. The dashed lines represent an optimal set of choices. The goal for the agents is to maximize in (4). Using the generalized distributed algorithm (i.e., agents choose according to (7)), agent 1 chooses , since . Then, agent 2, who (according to the graph) does not know that agent 1 has chosen , also chooses , since . Agent 3 observes that agents 1 and 2 have both chosen , so it chooses , since . Finally, agent 4, observing that agent 1 has chosen (but not that agent 3 has chosen ), chooses , since . These results are summarized in the table below.
Algorithm
Optimal 9
Distributed Greedy 8
Generalized Distributed Greedy 6
(b) For the weighted set cover problem outlined above, this table shows the agents’ decisions in an optimal case, the case where the distributed greedy algorithm is used (agents choose according to (5)) and the case where the generalized distributed algorithm is used (agents choose according to (7), constrained to the graph shown above). The difference between the distributed greedy algorithm and the generalized version can be seen in the choices of agents 2 and 4. Agent 2 chooses when it can observe that has already been chosen by agent 1, otherwise it chooses . Likewise, agent 4 chooses only when it knows that has already been selected. Therefore, as the informational constraints grow, the solution quality decreases. As a note, in this case we see that .
Fig. 1: An instance of the weighted set cover problem and the performance of the greedy algorithm in solving it.

One of the most well-studied algorithms to solve the submodular maximization problem is the greedy algorithm. This algorithm requires agents to make decisions sequentially (footnote 1), so without loss of generality we impose an ordering on the agents according to their labels, i.e., agent 1 chooses first, agent 2 chooses second, etc.

Footnote 1: Although we state that the agents must choose sequentially, the real restriction is that the flow of information in the system is acyclic. Accordingly, if the agents select their decisions with regard to another process, e.g., a synchronous best reply process, they will still arrive at the same solution. Once agent 1 has chosen, it will not make a different choice regardless of order, and agent 2 will not switch after agent 1 has decided, etc. See [10] for more details.

Agent $i$ makes its choice based on the following rule:

(5)

In words, each agent selects the action that would optimize the objective function given knowledge of the action choices of the previous agents and the global objective function .
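The rule described above can be sketched on a weighted set cover instance. This is an illustrative sketch under stated assumptions: the instance is hypothetical, actions are modeled as sets of targets, and ties are broken in favor of the first listed option (Python's `max` behavior), which here happens to produce the unfavorable outcome.

```python
from itertools import product


def cover_value(actions, values):
    """Weighted set cover objective: total value of the covered targets."""
    covered = set().union(*actions)
    return sum(values[t] for t in covered)


def distributed_greedy(action_sets, values):
    """Agents choose in order; each maximizes the objective given all earlier choices."""
    chosen = []
    for options in action_sets:
        best = max(options, key=lambda a: cover_value(chosen + [a], values))
        chosen.append(best)
    return chosen


def brute_force_optimum(action_sets, values):
    """Best achievable value over all action profiles (exponential; small instances only)."""
    return max(cover_value(p, values) for p in product(*action_sets))


# Hypothetical instance: agent 1 is indifferent between t1 and t2,
# agent 2 can only choose t1.
values = {"t1": 1, "t2": 1}
action_sets = [
    [frozenset({"t1"}), frozenset({"t2"})],
    [frozenset({"t1"})],
]

greedy_profile = distributed_greedy(action_sets, values)
greedy_value = cover_value(greedy_profile, values)
optimal_value = brute_force_optimum(action_sets, values)
print(greedy_value, optimal_value)
```

On this instance the greedy solution covers only one target while the optimal profile covers both, so the ratio of the two values is 1/2.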

It is well-known that for any submodular , any set of action spaces , and any order of agents, the quality of the resulting solution derived from the distributed greedy algorithm compared to the optimal solution satisfies

(6)

where . In other words, the quality of the solution is within that of the optimal [24]. For special classes of submodular functions, and additional constraints placed on (for instance if ), [24] also shows that the solution of the greedy algorithm lies within of the optimal.

In (5), agent must have access to the decisions of all previous agents. However, there are many applications where this level of informational demand may be impractical. A more generalized version of the distributed greedy algorithm is proposed in [10], where each agent makes its choice using the following rule:

(7)

where , and . The sets characterize the informational constraints of the agent in the sense that is the set of agents whose choices agent can access when making its own action decision. The central topic of discussion here is how the structure of impacts the performance guarantees associated with this generalized greedy algorithm. Note that (7) is simply one decision rule that could be used by the agents. Analysis of whether this rule is optimal among all possible decision rules given the local information is the topic of ongoing research.
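The generalized rule can be sketched in the same way: each agent evaluates the objective using only the choices of the agents it can observe. The sketch below is an assumption-laden illustration, not the authors' implementation. Agents are 0-indexed, `in_neighbors[i]` is a hypothetical list holding the indices of earlier agents whose choices agent `i` can access, and the instance is invented.

```python
def cover_value(actions, values):
    """Weighted set cover objective: total value of the covered targets."""
    covered = set().union(*actions)
    return sum(values[t] for t in covered)


def generalized_greedy(action_sets, values, in_neighbors):
    """Each agent maximizes the objective over its own action plus the
    actions of the earlier agents it is allowed to observe."""
    chosen = []
    for i, options in enumerate(action_sets):
        if any(j >= i for j in in_neighbors[i]):
            raise ValueError("agents may only observe earlier agents")
        visible = [chosen[j] for j in in_neighbors[i]]
        best = max(options, key=lambda a: cover_value(visible + [a], values))
        chosen.append(best)
    return chosen


# Hypothetical instance: both agents can choose either target.
values = {"t1": 2, "t2": 1}
action_sets = [
    [frozenset({"t1"}), frozenset({"t2"})],
    [frozenset({"t1"}), frozenset({"t2"})],
]

full_info = generalized_greedy(action_sets, values, [[], [0]])
no_info = generalized_greedy(action_sets, values, [[], []])
print(cover_value(full_info, values), cover_value(no_info, values))
```

When agent 2 observes agent 1, it avoids the already covered target and the total value is 3. Without that information both agents pick the same target and the value drops to 2.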

It is helpful to model the informational constraints as a graph , where is a set of nodes and is a set of directed edges between nodes. In this scenario each node is an agent (and thus we use the terms interchangeably) and each edge implies that , i.e., is the set of in-neighbors for vertex . Since there is an imposed ordering on the vertices, and the agents choose sequentially, the set is the set of admissible graphs that correspond to a set of informational constraints.

The solution of the algorithm defined by (7) is denoted and the optimal decisions as to explicitly highlight the dependence of (7) on the graph . We define the efficiency guarantees associated with this algorithm as:

(8)

Note that could in fact be a set when (7) is not unique, so we write with the understanding that is evaluated at the worst possible candidate solution, i.e., .

One goal of this paper is to characterize the efficiency guarantees associated with this more generalized version of the distributed greedy algorithm for any submodular function and action sets . To that end, we define

(9)

In words, is the worst-case efficiency for any and family of sets as defined above, given the informational constraints among the agents represented by .

IV Efficiency Bounds

In this section we present lower and upper bounds for the worst-case efficiency based on the structure of the graph . We begin with some preliminaries from graph theory, and then present the bounds.

IV-A Preliminaries

(a) In this graph, there are 4 cliques of size 1 (one for each node), 5 cliques of size 2 (one representing each edge), and 2 cliques of size 3 (the sets and ). Thus . A minimum clique cover is , so . The maximum independent set is , thus . Since , we know that . Appendix -A shows that , making it a graph that meets the upper bound for Theorem 1 (see Section IV-C). Lastly, it is also an example of a graph without the Sibling Property (see Section V-C), since no such exists from Definition 1.
(b) A graph where , , , and maximizes (12). As a note, this is the graph with the fewest number of nodes and edges such that . This is also a graph with the Sibling Property (see Section V-C), since for maximum independent set , , so from Definition 1.
Fig. 2: Two example graphs showcasing the graph properties defined in Section IV-A. These graphs will be referred to throughout the paper to illustrate the tightness of bounds in Theorem 1 and to illustrate the Sibling Property (see Section V-C).

For all definitions in this section, we assume that is a general directed graph. We begin with cliques: a clique is a set of nodes such that for every , either or . The clique number is the number of nodes in the largest clique in . We denote by the set of all cliques in . A clique cover is a partition on such that the nodes in each set of the partition form a clique. The clique cover number is the minimum number of sets needed to form a clique cover of . For an example, see Figure 2(a).

Another important notion in graph theory is that of independence. An independent set is a set of vertices such that implies . A maximum independent set is an independent set of such that no other independent set has more vertices. The independence number is the number of nodes in the largest independent set in . For an example, see Figure 2(a).
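For small graphs, the clique number and independence number can be computed by exhaustive search. The sketch below is illustrative only: the 4-node graph is a hypothetical example (it is not one of the paper's figures), and two nodes are treated as adjacent when a directed edge exists in either direction, matching the definition of a clique given above.

```python
from itertools import combinations


def adjacent(u, v, edges):
    """Two nodes are adjacent if a directed edge exists in either direction."""
    return (u, v) in edges or (v, u) in edges


def is_clique(nodes, edges):
    return all(adjacent(u, v, edges) for u, v in combinations(nodes, 2))


def is_independent(nodes, edges):
    return not any(adjacent(u, v, edges) for u, v in combinations(nodes, 2))


def largest_subset_size(n, edges, predicate):
    """Size of the largest node subset satisfying `predicate` (brute force)."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if predicate(subset, edges):
                return size
    return 0


# Hypothetical directed graph on nodes 0..3 with edges from lower to higher index.
edges = {(0, 1), (0, 2), (1, 2), (2, 3)}

clique_number = largest_subset_size(4, edges, is_clique)
independence_number = largest_subset_size(4, edges, is_independent)
print(clique_number, independence_number)
```

Here the largest clique is {0, 1, 2} and a maximum independent set is {0, 3}, so the clique number is 3 and the independence number is 2.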

The work in [11] equivalently characterizes the independence number as the solution to an integer linear program (footnote 2).

Footnote 2: It is actually the chromatic number and clique number that are defined this way in [11]. However, using graph complementarity, it is an easy extension to show that the solution to the linear program in (10) yields a maximum independent set.

Let be the binary matrix whose rows are indicator vectors for the cliques in . In other words, if node belongs to clique in , and 0 otherwise. Note that also includes cliques of size 1 (the individual nodes). Then is given by

(10)
subject to

It is similarly shown that is characterized by the dual to this problem, implying that . As an example, for the graph in Figure 2(a),

(11)

Using this in (10), it is straightforward to show that the optimal solution is , i.e., , and the maximum independent set is .

Note by definition that and are always positive integers. However, in many applications, it is helpful to consider a real-valued relaxation on these notions: this is the motivation for fractional graph theory [11]. Here we leverage the fractional independence number , which we define as the real-valued relaxation to (10) (footnote 3):

(12)
subject to

Footnote 3: Another definition of fractional independence exists in the literature (see [2]), which was created to preserve certain properties of graph independence (such as nested maximality), but has not been shown to preserve , where is the complement graph of and is the fractional clique number of .

Likewise, , the fractional clique cover number of , can be defined by its dual

(13)
subject to

In accordance with the Strong Duality of Linear Programming, it follows that:

(14)

An example of a graph where the independence number differs from the fractional independence number is found in Figure 2(b).
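The gap between the integer and fractional quantities can be verified by hand on a 5-cycle. The sketch below assumes the standard form of the relaxation (maximize the sum of node weights subject to every clique carrying total weight at most 1, with nonnegative weights) and its dual (minimize the total clique weight subject to every node being covered with weight at least 1). The choice of the 5-cycle and all names are assumptions for illustration; the text does not state which graph appears in the figure. A feasible primal point and a feasible dual point with equal objective values certify optimality of both.

```python
from fractions import Fraction
from itertools import combinations

n = 5
edges = {(i, (i + 1) % n) for i in range(n)}  # 5-cycle (orientation is irrelevant here)

# The 5-cycle has no triangles, so its cliques are single nodes and edges.
cliques = [(i,) for i in range(n)] + [tuple(sorted(e)) for e in edges]

# Candidate primal solution: weight 1/2 on every node.
x = [Fraction(1, 2)] * n
primal_feasible = all(sum(x[v] for v in c) <= 1 for c in cliques)
primal_value = sum(x)

# Candidate dual solution: weight 1/2 on every edge clique, 0 on single nodes.
y = {c: (Fraction(1, 2) if len(c) == 2 else Fraction(0)) for c in cliques}
dual_feasible = all(sum(w for c, w in y.items() if v in c) >= 1 for v in range(n))
dual_value = sum(y.values())


def integer_independence_number(n, edges):
    """Largest set of pairwise non-adjacent nodes, by brute force."""
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if not any((u, v) in edges or (v, u) in edges
                       for u, v in combinations(subset, 2)):
                return size
    return 0


print(primal_feasible, dual_feasible, primal_value, dual_value)
print(integer_independence_number(n, edges))
```

Both candidate solutions are feasible with value 5/2, so by weak duality the fractional independence number and the fractional clique cover number of the 5-cycle both equal 5/2, whereas the integer independence number is 2.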

IV-B Result

We now present results regarding the quality of the solution provided by the generalized distributed greedy algorithm subject to the informational constraints represented by a graph . We show that the performance degrades proportionally to the fractional independence number of (footnote 4).

Footnote 4: In comparison to the bounds shown in [10], the bounds shown in our work are tighter in all cases. In fact, except in certain corner cases (for example, both bounds are the same on a full clique), our results are strictly tighter.

Theorem 1.

For any graph ,

(15)

The upper bound shows that it is impossible to construct a graph such that the greedy algorithm’s performance is better than for all possible and . Likewise, the lower bound means that no and can result in a performance lower than .

The formal proof for this theorem is given in Section IV-D, but here we give a brief outline of the argument. For the upper bound, we develop a canonical and , which are dependent on the cliques in . Then we show that for this example, is the inverse of the solution to (12), i.e., . The lower bound is found by leveraging the properties of submodularity and monotonicity, showing that the highest lower bound requires solving (13). Then, using (14), we show that Theorem 1 holds.

IV-C Examples

Theorem 1 shows lower and upper bounds on , but we have not shown whether either of these bounds is tight. There exist graphs for which and can be chosen to meet the lower bound, and there also exist graphs whose lower bound can be proven to meet the upper bound. In this section, we provide an example of each.

Fig. 3: An example of a graph where , and an instance of a weighted set cover problem using the same notation as in Figure 1. Here , and we can see that . The worst-case results from the generalized distributed greedy algorithm occur when , and therefore . This means , so the lower bound in Theorem 1 is tight for this graph.
Example 3.

The weighted set coverage problem presented in Figure 3 is an example showing that the lower bound from Theorem 1 is tight. For this graph , . As shown, . Since by definition, it follows that .

Example 4.

The graph in Figure 2(a) is an example where the upper bound from Theorem 1 is tight. Here , and it is shown in Appendix -A for this graph that no and can be constructed to give a worse efficiency than .

IV-D Proof for Theorem 1

Before starting the proof, we introduce some notation. Let and . Then we define

(16)

to be the marginal contribution of the choice of agent given the choices of the agents in set . One property to note about is the following holds when is a clique:

(17)

We now begin the theorem proof. We first show the upper bound by constructing a canonical example and , which can be applied to any , and show that . Then we prove the lower bound leveraging the properties of submodularity and monotonicity.

Consider a base set of elements and let for all . We endeavor to design submodular so that for all , and in the worst case . Let have the following restrictions:

  1. .

  2. .

  3. for any .

  4. for all and .

First note that Restrictions 2 and 3 hold if and only if for all

(18)

If one sets values for all which meet this requirement, a full function on can be created with a simple extension. Let , then define

(19)

We claim the function created in this way is normalized, monotone, and submodular. For , we know by Restriction 4 that for , . For , it should be clear from (19) that . We also see that (monotone) and (normalized).

Restrictions 1, 3, and 4 imply that, since agents are choosing according to (7), every agent is choosing between equally desirable choices. However, it should be clear from Restriction 4 that , and in the worst case . Therefore,

(20)

which is true by Restrictions 1, 2 and 4. This shows how the values directly determine the upper bound on efficiency of the algorithm for this scenario.

Since by definition , one can find the lowest such upper bound on by setting so as to maximize (from (20)) subject to the constraint in (18). This is precisely the linear program in (12), whose value is . Therefore is an upper bound on .

For the lower bound, let and for ease of notation. Then consider the following:

(21)
(22)
(23)

where (22) follows from submodularity and (23) follows from the decision-making rule in (7). Let and let for . Then it follows that

(24)
(25)
(26)

where (25) is true by submodularity and (26) is true by (17) with some algebraic manipulation. The procedure followed in (24)–(26) can be thought of as an algorithm to put the sum in (23) into a more convenient form (for our purposes) with respect to some clique . One could run the same procedure on the sum in (26) with respect to some clique , to obtain:

(27)

Adding more cliques to the set simply requires an update on the two places that appears in the equation. Therefore, if the set of cliques is the full set , then

(28)

Note that this holds for any set of , but in order to remove the third term from the inequality, we impose a constraint that . Alternatively stated, we require that

(29)

Under these conditions and by monotonicity, (28) becomes , which implies that

(30)

To find the highest such lower bound on , one needs to find the minimum value for the sum in (30), subject to the constraint in (29). This is precisely the linear program defined in (13). Therefore, it follows that . ∎

V Optimal Structures

In this section, we describe how to build a graph that yields the highest efficiency subject to a constraint on the number of edges.

V-A Preliminaries

(a) The Turán graph , where . No other graph with 8 vertices can have more edges without also having a clique of size 4 or higher.
(b) The complement Turán graph , where . No other graph with 8 vertices can have fewer edges without also having an independent set of size 4 or higher.
Fig. 4: A Turán graph and its complement

We denote and , i.e., is a graph in that maximizes efficiency. The complement of graph is such that and if and only if . It is straightforward to show that .

In graph theory a Turán graph is a graph with vertices created with the following algorithm:

  1. Partition the vertices into disjoint sets such that for all .

  2. Create edges between all nodes not within the same set.
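The two construction steps can be sketched directly. This is a minimal illustration in which the function names and 0-indexed node labels are assumptions, and the balanced partition is taken to mean that set sizes differ by at most one.

```python
from itertools import combinations


def balanced_partition(n, r):
    """Split nodes 0..n-1 into r consecutive sets whose sizes differ by at most 1."""
    q, extra = divmod(n, r)
    parts, start = [], 0
    for k in range(r):
        size = q + (1 if k < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def turan_edges(n, r):
    """Undirected edges (u < v) joining every pair of nodes in different sets."""
    parts = balanced_partition(n, r)
    part_of = {v: k for k, part in enumerate(parts) for v in part}
    return {(u, v) for u, v in combinations(range(n), 2)
            if part_of[u] != part_of[v]}


parts = balanced_partition(8, 3)
edges = turan_edges(8, 3)
print([len(p) for p in parts], len(edges))
```

For 8 nodes and 3 sets, the partition has sizes 3, 3, and 2, and the resulting graph has 21 edges: all 28 possible pairs minus the 7 pairs that fall within the same set.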

A result known as Turán’s theorem states that is an -node graph with the highest number of edges that has clique number or smaller [29]. Alternatively stated,

(31)
Fig. 5: The efficiency of for all values of , with example graphs for a few values of . Notice the “dead zones”, where adding more edges does not lead to any higher efficiency guarantees.

The complement of a Turán graph, denoted , is created with the same procedure as a Turán graph, except that in Step 2, edges are created among all nodes within the same set. An example of a Turán graph and its complement is found in Figure 4. Thus we can also state, similar to (31), that

(32)

In words is a graph with the fewest edges that has independence number . It should also be clear that

(33)

Lastly, we define the graph

(34)

which is the complement -node Turán graph with the lowest independence number among all graphs with the number of edges less than or equal to (footnote 5).

Footnote 5: Searching over the space of complement Turán graphs can be done simply. Adapting part of Turán’s theorem, we see that . Therefore, one can start by setting to this minimum value, and then determining whether (see Lemma 2 below). If the statement is not true, can be incremented until it is.
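The search over complement Turán graphs can be sketched as a simple loop. This is a hedged illustration: it starts from a single clique rather than from the lower bound mentioned in the text (that bound is not reproduced here), and it counts edges by construction, assuming the complement Turán graph consists of disjoint cliques of nearly equal size. The function names and the symbols `n`, `m`, and `k` are assumptions.

```python
from math import comb


def complement_turan_edge_count(n, k):
    """Edges in k disjoint cliques on n nodes with sizes differing by at most 1."""
    q, extra = divmod(n, k)
    return extra * comb(q + 1, 2) + (k - extra) * comb(q, 2)


def smallest_feasible_clique_count(n, m):
    """Smallest number of cliques k such that the n-node complement Turán
    graph with k cliques uses at most m edges."""
    for k in range(1, n + 1):
        if complement_turan_edge_count(n, k) <= m:
            return k
    raise ValueError("unreachable for m >= 0: k = n uses zero edges")


print(smallest_feasible_clique_count(8, 7))
print(smallest_feasible_clique_count(8, 28))
print(smallest_feasible_clique_count(8, 0))
```

With 8 nodes and an edge budget of 7, two cliques of size 4 would need 12 edges, so the search settles on three cliques of sizes 3, 3, and 2. Since each maximum independent set of such a graph takes one node per clique, the returned value is also the independence number of the selected graph.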

V-B Result

The main result of this section regarding efficient graph structures is stated below and later proved in Section V-D.

Theorem 2.

Consider two nonnegative integers and such that . If , then . If , then is the full clique on nodes, minus the edge .

An illustration of as a function of the number of edges is given in Figure 5. One item to note is that there may be extra edges not used in our design of . For instance, in Figure 5, the efficiency is the same when . This implies that and can be the same graph, and for any value of in between. Hence, there are “dead zones” seen in the graph in Figure 5.

V-C The Sibling Property

Here we present a graph property, along with a corollary to Theorem 1. These results are key to the proof for Theorem 2.

Definition 1.

Let . Then has the Sibling Property if for some maximum independent set , there exist and such that (see Figure 2).

Lemma 1.

If a graph lacks the Sibling Property, then

  1. There is a unique maximum independent set .

  2. The set must include nodes and .

  3. The induced subgraph created by removing the set from must be such that .

  4. Every node outside the set must have outgoing edges to at least 2 nodes in (footnote 6).

Footnote 6: In the literature, such a is called a perfect independent set. We present a proof here that suits the needs of this work, but it is also shown in [31] that every unique maximum independent set is perfect.

Proof.

We prove each property separately:

Property 1: Suppose there are 2 maximum independent sets and . Let and . By definition, all nodes in either set cannot have any outgoing edges. This implies that : in other words, and are independent from each other, and neither nor are maximum, a contradiction.

Property 2: First, suppose that is not included in . Then there exists an edge for some . By definition, this means has the Sibling Property, a contradiction. Now suppose that is not in . Since does not have the Sibling Property, then by definition for all . This means that another maximum independent set is , which is a contradiction to Property 1.

Property 3: If this were not true, then would not be a unique maximum independent set.

Property 4: Let . By definition, cannot have any incoming edges from and if there are no edges between and , then must be part of , a contradiction. Therefore, we consider the case where for some , but no outgoing edges from to exist. This means that another maximum independent set is , and is not unique. By Property 1, this is a contradiction. ∎

Corollary 1.

For a graph with the Sibling Property,

(35)

with equality when

Proof.

We provide an example which gives us the upper bound using a weighted set cover problem. Let be a maximum independent set of and let be defined as in Definition 1. Then , where if or , and otherwise. The action sets are

(36)

Each agent in is equally incentivized to choose either option, since none of them can access the choices of the others. Therefore, the worst case in the greedy algorithm is for every agent in to choose , implying . Each agent makes the other choice in the optimal solution, so . Therefore is an upper bound on .

In the case where , (14) shows that , which implies by Theorem 1 that . ∎

V-D Proof for Theorem 2

In this section we present the proof for Theorem 2, beginning with two lemmas. The first characterizes the number of edges in a complement Turán graph, and the second characterizes the fewest number of edges in a graph without the Sibling Property.

Lemma 2.

Let . Then the number of edges in is

(37)
Proof.

This can be shown by construction. Recall that is a set of disconnected cliques, of as close to equal size as possible. Making purely equal-sized cliques would mean that each clique is of size , with nodes left over. If each of these remaining nodes is added to a different clique, then consists of cliques of size and the rest of size . Since a clique of size contains edges, we can see that the first line in (37) is the number of edges in all the larger cliques, and the second line is the number of edges in all the smaller cliques. ∎
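The counting argument in this proof can be checked numerically. The sketch below is illustrative and uses assumed names (`n` nodes, `k` cliques); since expression (37) is not reproduced in this text, the closed form is written directly from the construction described in the proof and compared against an explicit enumeration of within-clique pairs.

```python
from itertools import combinations
from math import comb


def edge_count_closed_form(n, k):
    """Larger cliques (size q + 1) plus smaller cliques (size q)."""
    q, extra = divmod(n, k)
    larger = extra * comb(q + 1, 2)      # `extra` cliques receive one leftover node
    smaller = (k - extra) * comb(q, 2)   # remaining cliques keep the base size
    return larger + smaller


def edge_count_by_enumeration(n, k):
    """Build the balanced partition explicitly and count pairs within each set."""
    q, extra = divmod(n, k)
    label, node = {}, 0
    for part in range(k):
        for _ in range(q + (1 if part < extra else 0)):
            label[node] = part
            node += 1
    return sum(1 for u, v in combinations(range(n), 2) if label[u] == label[v])


agree = all(
    edge_count_closed_form(n, k) == edge_count_by_enumeration(n, k)
    for n in range(1, 13)
    for k in range(1, n + 1)
)
print(agree, edge_count_closed_form(8, 3))
```

For 8 nodes and 3 cliques the count is 7: two cliques of size 3 contribute 3 edges each and one clique of size 2 contributes 1.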

Lemma 3.

Let have nodes, be without the Sibling Property, and such that . Then the number of edges in satisfies

(38)

Furthermore, for any values of and , such a can be constructed so that (38) is at equality.

Proof.

In this proof, we construct a such that (38) is at equality, then reason that no other graph with nodes, independence number , and without the Sibling Property can have fewer edges. The proof also leverages the properties for a graph without the Sibling Property, found in Lemma 1. Let be the unique maximum independent set (Property 1) in , and let be the induced subgraph of created by removing the nodes in . Then we know that (Property 3). From (32) and Lemma 2, the minimum number of edges that such a can have is . Finally, every node in must have outgoing edges to at least two nodes in , therefore, must have an additional edges. Thus the minimum number of edges to construct is given in (38). ∎

We now commence with the proof for Theorem 2. The case trivially holds, so we assume that . Recall that the graph is a set of disconnected cliques, which implies that any maximum independent set has one node from each clique, and that no maximum independent set is unique. Therefore, by Lemma 1, Property 1, has the Sibling Property. In light of (33), it follows from Corollary 1 that . The statement in (32) also shows that no other graph with edges can have a smaller independence number. Combining this with Corollary 1 implies that no other graph with the Sibling Property (and same number of nodes and edges) can have a higher efficiency.

It remains to confirm that any graph without the Sibling Property cannot have a higher efficiency than , given nodes and edges – with the exception when . Let be a graph with nodes, edges, without the Sibling Property, and with independence number . We assume that has the fewest number of edges (as dictated by Lemma 3), with the highest possible efficiency . By Corollary 1 and (33), this is the same efficiency as , thus we seek to characterize when the number of edges in is greater than or equal to that of . In other words, only if

(39)

In order to show when this condition holds, we divide the remainder of the proof into four cases, the union of which covers all possible values of and . In the first case, when , we prove (39) is false for all values of (which corresponds to the case in the theorem statement when ). In the other cases, we show that (39) is true, justifying that .

Case 1: . Here, is a clique, and has edges. The graph is such that

(40)

which is one less edge than . Thus, for any value of , there exists a where (39) is false. Such a is shown for in Figure 2(a), and a trivial extension to the proof in Appendix -A shows that for any value of . Since is created with the fewest number of edges, it follows that (39) is false only when . By the construction in the proof of Lemma 3, such a is the full clique minus the edge .

Case 2: . In this case, is the graph with no edges and efficiency . Any graph with 1 or 0 edges must have this same efficiency, so (39) is true in this case.

In the remaining cases, we assume that , which also implies that . In both cases, we show that (39) holds.

Case 3: . This condition implies the following:

Leveraging the above statements, and become:

(41)
(42)

Using these expressions to evaluate (39) yields

(43)

We can now use the identity to change the requirement in (43) to

(44)

Since , a sufficient statement for (43) to hold can be found by replacing with , which can be simplified to

(45)

The expression on the left side of the inequality is nondecreasing in . Since , if the inequality is true for , then it is true for all relevant values of . If we let in (