α^α-Rank: Scalable Multi-agent Evaluation through Evolution

09/25/2019
by   Yaodong Yang, et al.
HUAWEI Technologies Co., Ltd.

Although challenging, strategy profile evaluation in large connected learner networks is crucial for enabling the next wave of machine learning applications. Recently, α-Rank, an evolutionary algorithm, has been proposed as a solution for ranking joint policy profiles in multi-agent systems. α-Rank claimed scalability through a polynomial-time implementation with respect to the total number of pure strategy profiles. In this paper, we formally prove that such a claim is not grounded. In fact, we show that α-Rank exhibits an exponential complexity in the number of agents, hindering its application beyond a small finite number of joint profiles. Realizing such a limitation, we contribute by proposing a scalable evaluation protocol that we title α^α-Rank. Our method combines evolutionary dynamics with stochastic optimization and double oracles for truly scalable ranking with linear (in the number of agents) time and memory complexities. Our contributions allow us, for the first time, to conduct large-scale evaluation experiments of multi-agent systems, where we show successful results on joint strategy spaces of size O(2^25) (i.e., ≈33 million strategy profiles) -- a setting not evaluable using current techniques.


1 Introduction

Scalable policy evaluation and learning have been long-standing challenges in multi-agent reinforcement learning (MARL), with two difficulties obstructing progress. First, joint strategy spaces explode exponentially when a large number of strategic decision-makers is considered, and second, the underlying game dynamics may exhibit cyclic behavior (e.g., the game of Rock-Paper-Scissors), rendering the choice of an appropriate evaluation criterion non-trivial.

Focusing on the second challenge, much work in multi-agent systems has followed a game-theoretic treatment, proposing fixed points, e.g., the Nash equilibrium (Nash, 1950), as potentially valid evaluation metrics. Though appealing, such measures are normative only when prescribing behaviors of perfectly rational agents – an assumption rarely met in reality (Grau-Moya et al., 2018; Wen et al., 2019). In fact, many game dynamics have been proven not to converge to any fixed-point equilibria (Hart & Mas-Colell, 2003; Viossat, 2007), but rather to limit cycles (Palaiopanos et al., 2017; Bowling & Veloso, 2001). Apart from these inconsistencies, solving for a Nash equilibrium even in "simple" settings, e.g., two-player games, is known to be PPAD-complete (Chen & Deng, 2005) – a demanding complexity class when it comes to computational requirements.

To address some of the above limitations, Omidshafiei et al. (2019) recently proposed α-Rank as a graph-based game-theoretic solution to multi-agent evaluation. α-Rank adopts Markov Conley Chains to highlight the presence of cycles in game dynamics and attempts to compute stationary distributions as a means for strategy profile ranking. Though successful in small-scale applications, α-Rank suffers severely in scalability, contrary to the polynomial-time claims made in Omidshafiei et al. (2019). In fact, we show that α-Rank exhibits exponential time and memory complexities, which sheds light on the small-scale empirical study conducted in Omidshafiei et al. (2019), whereby the largest reported game included only four agents with four available strategies each.

In this work, we put forward α^α-Rank as a scalable alternative for multi-agent evaluation with linear time and memory demands. Our method combines numerical optimization with evolutionary game theory for a scalable solver capable of handling large joint spaces with millions of strategy profiles. To handle even larger profiles, e.g., tens to hundreds of millions, we further introduce an oracle (McMahan et al., 2003) mechanism transforming joint evaluation into a sequence of incremental sub-games with varying sizes. Given our algorithmic advancements, we justify our claims in a large-scale empirical study involving systems with up to O(2^25) possible strategy profiles. We first demonstrate the computational advantages of α^α-Rank on stochastic matrices of varying sizes against other implementations in Numpy, PyTorch, and OpenSpiel (Lanctot et al., 2019). With these successes, we then consider experiments unsolvable by current techniques. Precisely, we evaluate multi-agent systems in self-driving and Ising-model scenarios, each exhibiting a prohibitively large strategy space (i.e., on the order of thousands for the former, and tens of millions for the latter). Here, we again show that α^α-Rank is capable of recovering correct strategy rankings in such complex domains.

2 α-Rank & Its Limitations

Figure 1: Example of population-based evaluation with three learners (star, triangle, circle), each with three strategies (denoted by the colors) and multiple copies per population. a) Each population obtains a fitness value depending on the strategies chosen, b) a mutation strategy appears (red star), and c) each population either keeps its original strategy or adopts the novel strategy.

In α-Rank, strategy profiles of agents are evaluated through an evolutionary process of mutation and selection. Initially, agent populations are constructed by creating multiple copies of each learner, assuming that all agents (in one population) execute the same unified policy. With this, α-Rank then simulates a multi-agent game played by randomly sampled learners from each population. Upon game termination, each participating agent receives a payoff to be used in policy mutation and selection after its return to the population. Here, the agent is faced with a probabilistic choice between switching to the mutation policy, continuing to follow its current policy, or randomly selecting a novel policy (other than the previous two) from the pool. This process repeats with the goal of determining an evolutionarily strong profile that spreads across the populations of agents. Each of the above three phases is demonstrated in Fig. 1 on a simple example of three agents – depicted by different symbols – each equipped with three strategies – depicted by the colors.

2.1 Mathematical Formalisation of α-Rank

We next formalize the process posed by α-Rank, which will lead to its limitations and also pave the way for our own proposed solution. We consider N agents, with each agent i ∈ {1, …, N} having access to a set of strategies of size k_i. At round t of the evaluation process, we denote the strategy profile for agent i by π_t^i ∈ S_t^i = {π_t^{i,1}, …, π_t^{i,k_i}}, with π_t^{i,j} : X × A^i → [0, 1] representing the j-th allowed policy of the i-th learner, where X represents the set of states and A^i is the set of actions for agent i. With this, we define a joint strategy profile for all participating agents as the collection of policies belonging to the joint strategy pool, π_t ∈ S_t, with π_t = [π_t^1, …, π_t^N] and S_t = S_t^1 × ⋯ × S_t^N.

To evaluate performance, we assume each agent i is additionally equipped with a payoff (reward) function P^i : S_t → ℝ. Crucially, the domain of P^i is the pool of joint strategies so as to accommodate the effect of other learners on the i-th player's performance, further complicating the evaluation process. Finally, given a joint profile π_t, we define the corresponding joint payoff to be the collection of all individual payoff functions, i.e., P(π_t) = [P^1(π_t), …, P^N(π_t)].
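To make the combinatorial growth of the joint pool concrete, the following minimal Python snippet (with hypothetical per-agent pools) enumerates S_t as a Cartesian product; its size, the product of the k_i's, is what later renders exact methods intractable:

from itertools import product

# Hypothetical pools: three agents, three strategies each (as in Fig. 1).
agent_pools = [["s0", "s1", "s2"] for _ in range(3)]

# The joint strategy pool is the Cartesian product of the per-agent pools.
joint_pool = list(product(*agent_pools))
print(len(joint_pool))  # 3 * 3 * 3 = 27 joint strategy profiles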

After attaining rewards from the environment, each agent returns to its population and faces a choice between switching to a mutation policy, exploring a novel policy, or sticking to its current one. Such a choice is probabilistic and defined proportionally to rewards. Precisely, agent i adopts its next policy according to a distribution governed by an exploration parameter μ (note that in the original paper μ is heuristically set to a small positive constant to ensure at most two varying policies per population; theoretical justification can be found in Fudenberg & Imhof (2006)), the policies π_t^{-i} followed by the other agents at round t, and a ranking-intensity parameter α. As noted in Omidshafiei et al. (2019), one can relate the above switching process to a random walk on a Markov chain with states defined as the elements of S_t and transition probabilities determined through the payoff functions. In particular, each entry of the transition probability matrix T ∈ ℝ^{n×n}, with n = ∏_{i=1}^N k_i the total number of joint profiles, refers to the probability of one agent switching from one policy to another in relation to the attained payoffs. Precisely, consider any two joint strategy profiles π_t and π̂_t that differ in only one individual strategy, i.e., there exists a unique agent i such that π̂_t^i ≠ π_t^i and π̂_t^{-i} = π_t^{-i}. For such pairs, we set T[π_t, π̂_t] = η ρ_{π_t^i → π̂_t^i}(π_t^{-i}), with η = (∑_{l=1}^N (k_l − 1))^{-1} a normalising constant and ρ defining the probability that one copy of agent i playing π̂_t^i invades the population in which all other copies (in that population) play π_t^i. Following Pinsky & Karlin (2010), for P^i(π̂_t^i, π_t^{-i}) ≠ P^i(π_t^i, π_t^{-i}), such a probability is formalized as:

ρ_{π_t^i → π̂_t^i}(π_t^{-i}) = (1 − e^{−α Δ^i}) / (1 − e^{−m α Δ^i}),  where Δ^i = P^i(π̂_t^i, π_t^{-i}) − P^i(π_t^i, π_t^{-i}),     (1)

with m being the size of each population (and ρ = 1/m when Δ^i = 0). So far, we have presented the relevant derivation for an entry of the state-transition matrix when exactly one agent differs in exactly one strategy. Having one policy change, however, only represents a subset of the allowed variations, and two more cases need to be considered. First, we turn our attention to variations in joint policies involving more than one individual strategy, i.e., profiles π_t and π̂_t differing in the strategies of at least two agents. Here, we set T[π_t, π̂_t] = 0 (this assumption significantly reduces the analysis complexity, as detailed in Fudenberg & Imhof (2006)). Consequently, the probability of the remaining event, a self-transition, can thus be written as T[π_t, π_t] = 1 − ∑_{π̂_t ≠ π_t} T[π_t, π̂_t]. Summarising the above three cases, we can then write the (π_t, π̂_t) entry of the Markov chain's transition matrix as:

T[π_t, π̂_t] = η ρ_{π_t^i → π̂_t^i}(π_t^{-i})   if π̂_t differs from π_t only in agent i's strategy,
T[π_t, π_t] = 1 − ∑_{π̄_t ≠ π_t} T[π_t, π̄_t]   for the self-transition, and
T[π_t, π̂_t] = 0   otherwise.     (2)
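For illustration, a small Python helper below evaluates the fixation probability of Eqn. 1 under the stated assumptions (population size m, ranking intensity α, and the neutral-drift value 1/m when the payoffs tie); the function and argument names are our own:

import numpy as np

def fixation_probability(payoff_mutant, payoff_resident, alpha, m):
    """Probability that a single mutant strategy takes over a population of
    size m of residents, given the two payoffs (Eqn. 1)."""
    delta = payoff_mutant - payoff_resident
    if np.isclose(delta, 0.0):
        return 1.0 / m          # neutral drift when payoffs are equal
    return (1.0 - np.exp(-alpha * delta)) / (1.0 - np.exp(-m * alpha * delta))

# Example: a mutant with a higher payoff fixates with probability close to 1
# when the ranking intensity is large.
print(fixation_probability(1.0, 0.5, alpha=50.0, m=25))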

The goal in α-Rank is to establish an ordering of policy profiles dependent on the evolutionary stability of each joint strategy. In other words, higher-ranked strategies are those that are prevalent in the populations for larger average amounts of time. Formally, such a notion can be derived as the limiting vector v of our Markov chain when evolving from an initial distribution v_0. Knowing that the limiting vector is a stationary distribution, one can calculate strategy rankings as the solution to the following eigenvector problem:

T^⊤ v = v,  with  v^⊤ 1_n = 1  and  v ≥ 0.     (3)

2.2 Limitations of α-Rank

Though the work in Omidshafiei et al. (2019) seeks to determine a solution to the above problem, it is worth mentioning that α-Rank suffers from one major drawback – scalability – that we remedy in this paper. We note that the solution methodology in α-Rank is in fact unscalable to settings involving more than a handful of agents. In particular, the authors claim polynomial complexity for their solution to the problem in Eqn. 3. Such complexity, however, is polynomial in an exponentially large search space, i.e., the space of joint strategy profiles. As such, the polynomial-complexity claim is not grounded and needs to be revisited. In short, α-Rank exhibits a complexity that is exponential in the number of agents for determining a ranking, thus rendering it inapplicable to settings involving more than a small number of agents.

Before we look into the scalability issue of α-Rank, it is worth mentioning that our proposed method is still bounded by another major limitation inherent to α-Rank, namely that it prohibits behavioural improvement through strategy adaptation. A natural, also simple, extension to α-Rank is to allow policy refinements by introducing Policy-Space Response Oracles (PSRO) (Lanctot et al., 2017), inducing PSRO-α-Rank. The idea of PSRO-α-Rank is that after each round of α-Rank evaluation, one can augment the strategy space of each agent by finding the best response, through reinforcement learning algorithms, to the other agents under the top-ranked strategy profile. However, as we show in Appendix C.3, such an idea shows only minor advantages compared to other PSRO baselines, including PSRO-Replicator Dynamics (Lanctot et al., 2017) and PSRO-Nash (Balduzzi et al., 2019). Consequently, we believe the major issue of α-Rank is still scalability. Since our proposed α^α-Rank is beneficial to all PSRO extensions that are built on α-Rank, we leave the experimental exploration of PSRO-α-Rank for future work.

In what comes next, we first discuss traditional approaches that could help solve Eqn. 3; we soon realize that an off-the-shelf solution is unavailable. Hence, we propose an efficient evaluation algorithm, α^α-Rank, based on stochastic optimization with suitable complexities and rigorous theoretical guarantees. Finally, we propose a search heuristic that further scales up our method by introducing oracles, which we name α^α-Oracle.

3 Scalable Evaluation for Multi-Agent Systems

The problem of computing stationary distributions is a long-standing classical problem in linear algebra. Various techniques, including the power method, PageRank, eigenvalue decomposition, and mirror descent, can be utilized for solving the problem in Eqn. 3. As we demonstrate next, any such implementation scales exponentially in the number of learners, as we summarize in Table 1.

3.1 Traditional Approaches

Power Method.

One of the most common approaches to computing the solution of Eqn. 3 is the power method. The power method computes the stationary vector by constructing the sequence x_{t+1} = T^⊤ x_t / ‖T^⊤ x_t‖ from a non-zero initial vector x_0. Though viable, we first note that the power method exhibits an exponential memory complexity in terms of the number of agents. To formally derive the bound, define n = ∏_{i=1}^N k_i to represent the total number of joint strategy profiles, and n_T the total number of transitions between the states of the Markov chain in Section 2. By construction, one can easily see that n_T grows proportionally to n ∑_{i=1}^N k_i, as each row and column of T contains ∑_{i=1}^N (k_i − 1) + 1 non-zero elements. Hence, the memory complexity of such an implementation is of the order O(n ∑_{i=1}^N k_i), which is exponential in N.
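A minimal sketch of the iteration just described is given below (for a dense row-stochastic matrix T; names and the stopping rule are our own choices):

import numpy as np

def power_method(T, iters=1000, tol=1e-10):
    """Power iteration for the stationary distribution of a row-stochastic T:
    repeatedly apply T^T to a probability vector. Storing T (and touching all
    of its non-zero entries each step) is what scales exponentially in N."""
    n = T.shape[0]
    x = np.full(n, 1.0 / n)
    for _ in range(iters):
        x_next = T.T @ x
        x_next /= x_next.sum()      # re-normalise onto the simplex
        if np.linalg.norm(x_next - x, 1) < tol:
            return x_next
        x = x_next
    return x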

Analyzing its time complexity, on the other hand, requires a careful consideration that links convergence rates with the graph topology induced by the Markov chain. Precisely, the convergence rate of the power method is dictated by the second-smallest eigenvalue λ_2(L_G) of the normalized Laplacian L_G of the graph G associated with the Markov chain in Section 2; the number of required iterations grows with 1/λ_2(L_G). Hence, as long as the second-smallest eigenvalue of the normalized Laplacian is well-behaved, one would expect suitable time-complexity guarantees. To this end, we prove the following lemma:

Lemma: [Second-Smallest Eigenvalue] Consider the Markov chain defined in Section 2 with states in S_t and transition probability matrix T. The second-smallest eigenvalue of the normalized Laplacian L_G of the graph G associated with the Markov chain is given by: λ_2(L_G) = min_i k_i / ∑_{l=1}^N (k_l − 1).

Due to space constraints, the full proof of the above lemma is deferred to Appendix A.1. The importance of Lemma A.1 is that the resultant time complexity of the power method remains exponential in the number of agents: although 1/λ_2(L_G) = ∑_{l=1}^N (k_l − 1) / min_i k_i grows only polynomially in the k_i's, each iteration requires a multiplication by T, whose number of non-zero entries is of the order n ∑_i k_i with n = ∏_i k_i. Hence, the overall time complexity is exponential in N.

PageRank.

Inspired by ranking web pages on the internet, one can consider PageRank (Page et al., 1999) for computing the solution to the eigenvalue problem in Eqn. 3. Applied to our setting, we first realize that its memory requirement is analogous to that of the power method, i.e., of the order O(n ∑_i k_i), and its time complexity is likewise exponential in the number of agents (see Table 1).

Eigenvalue Decomposition.

Apart from the above, we can also consider the problem as a standard eigenvalue-decomposition task (which is also how the original α-Rank is implemented according to Lanctot et al. (2019)) and adopt the method in Coppersmith & Winograd (1990) to compute the stationary distribution. Unfortunately, state-of-the-art techniques for eigenvalue decomposition also require exponential memory and exhibit a time complexity of the form O(n^ω) with ω ≈ 2.376. Clearly, these bounds restrict α-Rank to a small number of agents N.
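For reference, the dense baseline we later benchmark against amounts to the following (a sketch using NumPy; it materialises the full n × n matrix and therefore inherits the exponential memory cost):

import numpy as np

def stationary_by_eig(T):
    """Dense eigen-decomposition baseline: take the left eigenvector of the
    row-stochastic matrix T associated with eigenvalue 1 and renormalise it
    onto the simplex."""
    eigvals, eigvecs = np.linalg.eig(T.T)
    lead = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
    v = np.real(eigvecs[:, lead])
    return v / v.sum()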

Mirror Descent.

The ordered subsets mirror descent algorithm (Ben-Tal et al., 2001) requires at each iteration a projection onto the standard n-dimensional simplex. As stated in that paper, computing this projection requires time that grows with the simplex dimension n. In our setting, n = ∏_i k_i is the total number of joint strategy profiles, so the projection step alone is exponential in the number of agents N. This makes mirror descent inapplicable for evaluating α-Rank when N is large.
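To make the per-iteration cost tangible, below is one common sort-based Euclidean projection onto the simplex (shown purely as an illustration; the mirror-descent variant of Ben-Tal et al. (2001) uses its own projection, but any such step must touch all n coordinates):

import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex using the
    standard sort-based procedure; the sort alone costs O(n log n)."""
    n = v.shape[0]
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)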

Method          Time                                Memory
Power Method    exponential in N                    exponential in N
PageRank        exponential in N                    exponential in N
Eig. Decomp.    exponential in N                    exponential in N
OSMD            exponential in N                    exponential in N
Our Method      linear in N (polynomial in 1/ε)     linear in N
Table 1: Time and space complexity comparison (ε: precision parameter).

3.2 Our Proposal: An Optimization-Based Solution

Rather than seeking an exact solution to the problem in Eqn. 3, one can consider approximate solvers by defining a constrained optimization objective:

min_{x ∈ ℝ^n} ‖T^⊤ x − x‖_2^2   s.t.   x^⊤ 1_n = 1,  x ≥ 0.     (4)

The constrained objective in Eqn. 4 simply seeks a vector x minimizing the distance between T^⊤ x and x itself (i.e., attempting to solve T^⊤ x = x), while ensuring that x lies on the n-dimensional simplex (i.e., x^⊤ 1_n = 1 and x ≥ 0). Due to the time and memory complexities required for computing exact solutions, we focus on determining an approximate vector defined as the solution to the following relaxed problem of Eqn. 4, which retains only inequality constraints:

min_{x ∈ ℝ^n} ‖T^⊤ x − x‖_2^2   s.t.   x_i ≥ δ  for all i ∈ {1, …, n},     (5)
for a small constraint-relaxation parameter δ > 0.

The optimization problem in Eqn. 5 can be solved using a barrier-like technique that we detail below. Before that, it is instructive to clarify the connection between the original and the relaxed problems:

Proposition: [Connections to Markov Chain] Let x̄ be a solution to the relaxed optimization problem in Eqn. 5. Then, after normalisation onto the simplex, x̄ recovers the stationary distribution of the Markov chain in Section 2.

Importantly, the above proposition allows us to focus on solving the problem in Eqn. 5, which only exhibits inequality constraints. Problems of this nature can be solved by considering a barrier function, leading to an unconstrained finite-sum minimization problem. To do so, denoting by T_{:,i} the i-th column of T (equivalently, the i-th row of T^⊤), we can write ‖T^⊤ x − x‖_2^2 = ∑_{i=1}^n (T_{:,i}^⊤ x − x_i)^2. Introducing logarithmic barrier functions for the inequality constraints, with λ > 0 being a penalty parameter, we arrive at:

min_{x ∈ ℝ^n} ∑_{i=1}^n (T_{:,i}^⊤ x − x_i)^2 − λ ∑_{j=1}^n log(x_j − δ).     (6)

Eqn. 6 is a standard finite-sum minimization problem that can be solved using any off-the-shelf stochastic optimization algorithm, e.g., stochastic gradient descent or ADAM (Kingma & Ba, 2014), among others. A stochastic gradient execution involves sampling a strategy profile index i_k at iteration k and then executing a descent step, with ĝ_k(x_k) being a sub-sampled gradient of Eqn. 6, η_k a decaying learning rate, and λ_k a scheduled penalty parameter with λ_k ∝ 1/k^ζ for some ζ > 0:

x_{k+1} = x_k − η_k ĝ_k(x_k).     (7)
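The following rough sketch conveys the flavour of this update (it is not the paper's exact objective or schedule: we assume the residual ‖T^⊤ x − x‖^2 sampled one coordinate at a time, a plain logarithmic barrier on positivity, and hand-picked hyper-parameters):

import numpy as np

def barrier_sgd(A, iters=20000, lr0=0.05, lam0=0.1, zeta=0.75, seed=0):
    """Stochastic gradient descent on sum_i ((A x - x)_i)^2 - lam * sum_j log(x_j),
    sampling one row of A per step and decaying the barrier weight lam.
    Here A stands in for T^T of the Markov chain in Section 2."""
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = np.full(n, 1.0 / n)
    for k in range(1, iters + 1):
        i = rng.integers(n)                    # sample one strategy profile index
        r = A[i] @ x - x[i]                    # sampled residual (A x - x)_i
        grad = 2.0 * n * r * A[i]              # n * per-sample gradient (unbiased for the sum)
        grad[i] -= 2.0 * n * r
        lam = lam0 / k ** zeta                 # scheduled penalty parameter
        grad -= lam / x                        # gradient of the log-barrier term
        grad = np.clip(grad, -1e3, 1e3)        # crude safeguard for this sketch
        x = np.maximum(x - (lr0 / np.sqrt(k)) * grad, 1e-12)
    return x / x.sum()                         # renormalise onto the simplex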

See Phase I in Algorithm 1 for the pseudo-code. We further derive the following convergence theorem:

Theorem: [Convergence of Barrier Method] Let x_T be the output of a gradient algorithm descending on the objective in Eqn. 6 after T iterations. Then the expected optimality gap decays polynomially in T, where the expectation is taken w.r.t. all randomness of the stochastic gradient implementation, and ζ is the decay rate of λ_k, i.e., λ_k ∝ 1/k^ζ.

1:Inputs: Evaluation Parameters: an initial vector x_0, total number of iterations T, decaying learning rate η_k, penalty parameter λ_0, decay rate ζ, total number of joint strategy profiles n, and a constraint relaxation term δ. Oracle Parameters: initialize a restricted strategy pool S_0^i for every agent i by randomly sampling from its full strategy pool.
2:Set outer iteration count t = 0
3:while stopping criteria not met do:
4:       Phase I: Scalable Policy Evaluation (Section 3.2):
5:          for k = 0, …, T − 1 do:
6:                Uniformly sample one strategy profile index i_k
7:                Construct the i_k-th row of T^⊤ (i.e., the i_k-th column of T) with entries defined by Eqn. 2
8:                Update the solution as in Eqn. 7 and decay the penalty parameter λ_k
9:       Extract from x_T the approximate stationary distribution
10:       Phase II (if turned on): Scalable Policy Evaluation with Oracle (Section 3.3):
11:          for each agent i do:
12:                Compute the best-response strategy by solving Eqn. 8.
13:                Update the strategy pool of each agent by adding its best response to S_t^i
14:       Set t = t + 1
15:Return: Best-performing strategy profile across all agents.
Algorithm 1 α^α-Rank/Oracle: Scalable Multi-Agent Evaluation

The proof of the above theorem (see the full proof in Appendix A.2) is interesting in itself, but a more important aspect is the memory and time complexity implications posed by our algorithm. Theorem 2 implies that after a number of iterations determined by a precision parameter ε, our algorithm outputs a vector whose expected optimality gap is at most ε. Moreover, one can easily see (more details on these derivations can be found in Appendix A.3) that each step touches only one column of T, which contains ∑_{i=1}^N (k_i − 1) + 1 non-zero elements, so the per-iteration time and memory costs of our update rule scale with ∑_{i=1}^N k_i rather than with n. This eventually leads to memory and time complexities that are linear in the number of agents N (see the comparison in Table 1). Hence, our algorithm is able to achieve an exponential reduction, in terms of the number of agents, in both memory and time complexities.

3.3 Heuristic Search by Introducing Oracles

So far, we have presented scalable multi-agent evaluation through stochastic optimization. We can further boost the scalability of our method (to tens of millions of joint profiles) by introducing an oracle mechanism. The heuristic of oracles was first introduced for solving large-scale zero-sum matrix games (McMahan et al., 2003). The idea is to first create a restricted sub-game in which all players are only allowed to play a restricted number of strategies, and then to expand the sub-game by incorporating each player's best response to its opponents; the sub-game is replayed with the agents' augmented strategy pools before a new round of best responses is found. The worst-case scenario of introducing oracles would be to solve the original evaluation problem in full size. The best response is assumed to be given by an oracle that can be implemented simply by a grid search. Precisely, given the top-ranked profile at iteration t, the goal for agent i is to select the optimal strategy from its pre-defined strategy pool S^i so as to maximize its reward:

π_{t+1}^i = argmax_{π^i ∈ S^i} 𝔼 [ R^i(s, a^i, a^{-i}) ],  with a^i ∼ π^i and a^{-i} ∼ π_t^{-i},     (8)

with s denoting the state and a^i, a^{-i} denoting the actions of agent i and its opponents, respectively. The heuristic of solving the full game through restricted sub-games is crucial especially when it is prohibitively expensive to list all joint strategy profiles, e.g., in scenarios involving tens of millions of joint profiles.
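A minimal grid-search oracle in this spirit is sketched below (the helper names and the payoff_fn interface are hypothetical, not part of the released implementation):

def best_response(agent, strategy_pool, top_profile, payoff_fn):
    """Grid-search oracle: scan the agent's strategy pool and keep the strategy
    with the highest payoff against the other agents' top-ranked strategies.
    `payoff_fn(agent, profile)` is assumed to return the agent's average reward
    under the given joint profile (e.g., estimated from simulation roll-outs)."""
    best_strategy, best_reward = None, float("-inf")
    for strategy in strategy_pool:
        candidate = dict(top_profile)
        candidate[agent] = strategy          # deviate only in this agent's slot
        reward = payoff_fn(agent, candidate)
        if reward > best_reward:
            best_strategy, best_reward = strategy, reward
    return best_strategy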

For a complete exposition, we summarize the pseudo-code in Algorithm 1. In the first phase, vanilla α^α-Rank is executed (lines 4-9), while in the second (lines 11-13), α^α-Rank with the oracle (if turned on) is computed. To avoid any confusion, we refer to the latter as α^α-Oracle. Note that although in two-player zero-sum games the oracle algorithm (McMahan et al., 2003) is guaranteed to converge to the minimax equilibrium, providing valid convergence guarantees for α^α-Oracle is an interesting direction for future work. In this paper, we instead demonstrate the effectiveness of such an approach in the large-scale empirical study of Section 4.

4 Experiments

Figure 2: Ranking intensity sweep on (a) Battle of the Sexes, (b) Biased RPS, and (c) Prisoner's Dilemma.
Figure 3: Comparisons of time and memory complexities on varying sizes of random matrices.

In this section, we evaluate the scalability properties of α^α-Rank (all experiments are run on a single machine with an Intel I9-9900X CPU). Precisely, we demonstrate that our method is capable of successfully recovering optimal policies in self-driving car simulations and in the Ising model, where strategy spaces are of the order of up to tens of millions of possible strategy profiles. We note that these sizes are well beyond the capabilities of state-of-the-art methods, e.g., α-Rank (Omidshafiei et al., 2019), which considers at most four agents with four strategies each, or AlphaStar, which handles far fewer strategies, as detailed in Vinyals et al. (2019).

Sparsity Data Structures. During the implementation phase, we realised that the transition probability matrix T of the Markov chain induces a sparsity pattern (each row and column of T contains only ∑_{i=1}^N (k_i − 1) + 1 non-zero elements; see Section 3.2) that, if exploited, can lead to significant speed-ups. To fully leverage such sparsity, we tailored a novel data structure for the sparse storage and computations needed by Algorithm 1. More details can be found in Appendix B.1.
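The core idea can be sketched as follows: rather than storing T, we generate the non-zero entries of a single row on demand from the current profile's one-strategy deviations (a simplified illustration with hypothetical helper names; transition_prob is assumed to implement Eqn. 2):

import numpy as np

def sparse_row(profile, strategy_sizes, transition_prob):
    """Return the non-zero entries of T's row for `profile`: only profiles that
    differ in a single agent's strategy, plus the self-transition, are non-zero."""
    cols, vals, total = [], [], 0.0
    for agent, size in enumerate(strategy_sizes):
        for s in range(size):
            if s == profile[agent]:
                continue
            neighbour = list(profile)
            neighbour[agent] = s                         # one-strategy deviation
            p = transition_prob(tuple(profile), tuple(neighbour))
            cols.append(tuple(neighbour))
            vals.append(p)
            total += p
    cols.append(tuple(profile))                          # self-transition closes the row
    vals.append(1.0 - total)
    return cols, np.asarray(vals)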

Correctness of Ranking Results. Before conducting large-scale experiments, it is instructive to validate the correctness of our results on simple cases, especially those reported by Omidshafiei et al. (2019). We therefore test on three normal-form games. Due to space constraints, we defer the full description of these tasks to Appendix B.2. Fig. 2 shows that the results generated by α^α-Rank (Phase I of Algorithm 1) are indeed consistent with α-Rank's results.

Complexity Results on Random Matrices. We measured the time and memory needed by our method for computing the stationary distribution on simulated random matrices of varying sizes. Baselines include eigenvalue decomposition from Numpy, optimization tools in PyTorch, and α-Rank from OpenSpiel (Lanctot et al., 2019). For our algorithm, we terminated execution once the gradient norm fell below a predefined threshold. According to Fig. 3, α^α-Rank achieves a three-orders-of-magnitude reduction in time compared to eigenvalue decomposition. Most importantly, the performance gap keeps growing with increasing matrix size.

Autonomous Driving on Highway: Highway (Leurent, 2018) provides an environment for simulating self-driving scenarios, with social vehicles designed to mimic real-world traffic flow serving as strategy pools. We conducted a ranking experiment involving five agents, each with several strategies, i.e., a strategy space on the order of thousands of possible strategy profiles. Agent strategies varied between "rational" and "dangerous" drivers, which we encoded using different reward functions during training (complete details of the reward functions can be found in Appendix C.2). Under this setting, we know upfront that the optimal profile corresponds to all agents being rational, i.e., five rational drivers. Cars were trained using value iteration, and rewards averaged over 200 test trials are reported. Due to the size of the strategy space, we considered both α^α-Rank and α^α-Oracle. We set α^α-Oracle to run a fixed budget of gradient updates when solving for the top-ranked strategy profile (Phase I in Algorithm 1). The results depicted in Fig. 4(a) clearly demonstrate that both our implementations are capable of recovering the correct highest-ranked strategy profile. We also note that, though such sizes are feasible using α-Rank and the power method, our results achieve a four-orders-of-magnitude reduction in the total number of iterations.

Figure 4: Large-scale multi-agent evaluations. (a) Convergence to the optimal joint strategy profile in the self-driving simulation on the highway. (b) Status of the Ising-model equilibrium measured against temperature. (c) Convergence of the top-ranked profile from α^α-Oracle at a fixed temperature.

Ising Model Experiment: The Ising model (Ising, 1925) is the standard model for describing ferromagnetism in statistical mechanics. It assumes a system of magnetic spins, where each spin s_j is either an up-spin (+1) or a down-spin (−1). The system energy is defined by E(s) = −∑_{⟨i,j⟩} J_{ij} s_i s_j − ∑_j h_j s_j, with J_{ij} and h_j being constant coefficients. The probability of one spin configuration s is P(s) = e^{−E(s)/τ} / ∑_{s'} e^{−E(s')/τ}, where τ is the environmental temperature. Finding the equilibrium of the system is notoriously hard because computing the partition function requires enumerating all possible configurations. Traditional approaches include Markov Chain Monte Carlo (MCMC). An interesting phenomenon is the phase change: the spins settle into an ordered equilibrium at low temperatures, and as τ increases, this equilibrium suddenly breaks and the system becomes chaotic.
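For concreteness, the energy of a small 2D lattice can be computed as below (a standard nearest-neighbour form with uniform coupling J and field h, and periodic boundaries for brevity; the paper's exact coefficients are described only qualitatively above):

import numpy as np

def ising_energy(spins, J=1.0, h=0.0):
    """Energy of a 2D lattice of +1/-1 spins with nearest-neighbour coupling J
    and external field h (periodic boundary conditions via np.roll)."""
    interaction = (spins * np.roll(spins, 1, axis=0)).sum() \
                + (spins * np.roll(spins, 1, axis=1)).sum()
    return -J * interaction - h * spins.sum()

spins = np.random.default_rng(0).choice([-1, 1], size=(5, 5))
print(ising_energy(spins))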

Here we try to observe the phase change through multi-agent evaluation methods. We treat each spin as an agent, define its reward from its local energy terms, and set the ranking intensity α in accordance with the temperature τ so as to build the link between Eqn. 1 and P(s). We consider the top-ranked strategy profile from α^α-Oracle as the system equilibrium and compare it against the ground truth from MCMC. We consider a five-by-five 2D model, which induces a prohibitively large strategy space of size 2^25 (tens of millions of joint profiles) to which the existing baselines, including α-Rank on a single machine, are inapplicable. Fig. 4(b) illustrates that our method identifies the same phase change as MCMC suggests. We show an example of how α^α-Oracle's top-ranked profile finds the system equilibrium in Fig. 4(c) at a fixed temperature.
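The MCMC ground truth can be reproduced with a standard Metropolis sampler of the form sketched below (again an assumed, generic implementation rather than the paper's code):

import numpy as np

def metropolis_sweep(spins, temperature, J=1.0, h=0.0, rng=None):
    """One Metropolis sweep over a 2D +1/-1 lattice: propose single-spin flips
    and accept each with probability min(1, exp(-dE / temperature))."""
    rng = rng or np.random.default_rng()
    L = spins.shape[0]
    for _ in range(spins.size):
        i, j = rng.integers(L), rng.integers(L)
        neighbours = spins[(i + 1) % L, j] + spins[(i - 1) % L, j] \
                   + spins[i, (j + 1) % L] + spins[i, (j - 1) % L]
        dE = 2.0 * spins[i, j] * (J * neighbours + h)   # energy change of flipping (i, j)
        if dE <= 0 or rng.random() < np.exp(-dE / temperature):
            spins[i, j] *= -1
    return spins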

5 Conclusion

In this paper, we demonstrated that the approach in Omidshafiei et al. (2019) exhibits exponential time and memory complexities. We then proposed α^α-Rank as a scalable solution for multi-agent evaluation with linear time and memory demands. In a set of experiments, we demonstrated that our method is truly scalable and capable of handling large strategy spaces.

There are many interesting avenues for future research. First, we plan to theoretically analyze the convergence properties of the resulting oracle algorithm and to further introduce policy learning through oracles. Second, we plan to take our method to the real world by conducting multi-robot experiments.

References

Appendix

Appendix A Comprehensive Proofs

A.1 Lemma of Second-Smallest Eigenvalue

Lemma: [Second-Smallest Eigenvalue] Consider the Markov chain defined in Section 2 with states in S_t and transition probability matrix T. The second-smallest eigenvalue of the normalized Laplacian L_G of the graph G associated with the Markov chain is given by: λ_2(L_G) = min_i k_i / ∑_{l=1}^N (k_l − 1).

Proof: For simplicity, we drop the round index t in the derivation below. Notice that the underlying graph G of the constructed Markov chain can be represented as a Cartesian product of complete graphs (here, K_k denotes the complete graph with k nodes):

G = K_{k_1} □ K_{k_2} □ ⋯ □ K_{k_N}.     (9)

Indeed, two vertices are connected by an edge if and only if the corresponding joint strategy profiles differ in exactly one individual strategy. Hence, the spectral properties of G can be described in terms of the spectral properties of its factors as follows (Barik et al., 2015): every eigenvalue of the unnormalized Laplacian of G is of the form λ_{j_1}(L_{K_{k_1}}) + ⋯ + λ_{j_N}(L_{K_{k_N}}), with the corresponding eigenvector given by the Kronecker product v_{j_1} ⊗ ⋯ ⊗ v_{j_N},

where λ_{j_i}(L_{K_{k_i}}) is an eigenvalue of the unnormalized Laplacian of the complete graph K_{k_i} and v_{j_i} is the corresponding eigenvector (in other words, L_{K_{k_i}} v_{j_i} = λ_{j_i} v_{j_i} for all i). The spectrum of the unnormalized Laplacian of the complete graph K_k is given by {0, k, …, k}, and the only eigenvector corresponding to the zero eigenvalue is the all-ones vector. Therefore, the minimum non-zero eigenvalue of the unnormalized Laplacian of G is given by min_i k_i. Finally, due to the fact that G is a regular graph (with the degree of each node equal to ∑_{l=1}^N (k_l − 1)), the smallest non-zero eigenvalue of the normalized Laplacian of G is given by λ_2(L_G) = min_i k_i / ∑_{l=1}^N (k_l − 1).
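The closed form can be checked numerically on a small example (an illustrative script using Kronecker sums to build the product graph; the chosen dimensions are hypothetical):

import numpy as np
from functools import reduce

def second_eig_normalized_laplacian(sizes):
    """Second-smallest eigenvalue of the normalized Laplacian of the Cartesian
    product of complete graphs K_{k_1} x ... x K_{k_N}."""
    adjs = [np.ones((k, k)) - np.eye(k) for k in sizes]
    eyes = [np.eye(k) for k in sizes]
    n = int(np.prod(sizes))
    A = np.zeros((n, n))
    for idx, adj in enumerate(adjs):
        factors = eyes[:idx] + [adj] + eyes[idx + 1:]
        A += reduce(np.kron, factors)          # Kronecker-sum adjacency of the product
    degree = A.sum(axis=1)[0]                  # regular graph: degree = sum(k_i - 1)
    L = np.eye(n) - A / degree                 # normalized Laplacian of a regular graph
    return np.sort(np.linalg.eigvalsh(L))[1]

# Three agents with three strategies each: expect min(k_i) / sum(k_i - 1) = 3/6 = 0.5.
print(second_eig_normalized_laplacian([3, 3, 3]))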

Given this result, the overall time complexity of the power method is exponential in N. Indeed, notice that 1/λ_2(L_G) = ∑_{l=1}^N (k_l − 1) / min_i k_i grows only polynomially in the k_i's, whereas each iteration requires a multiplication by T, whose size n = ∏_i k_i is exponential in N. As for the memory complexity, the power method has the same requirements as the PageRank algorithm (due to the necessity of storing the matrix T). These results imply that the power method scales exponentially with the number of agents N and is therefore inapplicable when N is large.

A.2 Theorem of Convergence of Barrier Method

1:Input: initial vector x_0, iteration budget T, learning rates η_k, initial penalty parameter λ_0, decay rate ζ.
2:Output: approximate solution x_T.
3:Set k = 0 and λ_0 as given.
4:for k = 0 to T − 1 do:
5:     Sample a strategy profile index i_k and compute the sub-sampled gradient ĝ_k(x_k) of Eqn. 6.
6:     Update x_{k+1} = x_k − η_k ĝ_k(x_k).
7:     Update λ_{k+1} ∝ 1/(k + 1)^ζ.
8:end for
9:Set x_T as the output.
Algorithm 2 Log-Barrier Stochastic Gradient Descent

Theorem: [Convergence of Barrier Method] Let x_T be the output of a gradient algorithm descending on the objective in Eqn. 6 after T iterations. Then the expected optimality gap decays polynomially in T, where the expectation is taken w.r.t. all randomness of the stochastic gradient implementation, and ζ is the decay rate of λ_k, i.e., λ_k ∝ 1/k^ζ. See Algorithm 2.

Proof: Let x* and x*_λ be the solutions of Eqn. (5) and Eqn. (6), respectively. The convergence guarantee for the logarithmic barrier method (Nocedal & Wright, 2006) with penalty parameter λ and barrier parameter δ gives:

(10)

and substituting the decay schedule of λ into (10) gives:

(11)

Applying the convergence guarantees of the stochastic gradient descent method to the convex objective in Eqn. 6 gives:

Using the definition of the objective function: