1 Introduction
Multi-agent decision coordination is prevalent in many real-world applications, such as traffic light control [32], warehouse commissioning [10] and wind farm control [31]. Often, such settings can be formulated as coordination problems in which agents have to cooperate in order to optimize a shared team reward [5].
Handling multi-agent settings is challenging, as the size of the joint action space scales exponentially with the number of agents in the system. Therefore, an approach that directly considers all agents' actions jointly is computationally intractable. This has made such coordination problems the central focus in the planning literature [20, 13, 14, 15]. Fortunately, in real-world settings agents often only directly affect a limited set of neighboring agents. This means that the global reward received by all agents can be decomposed into local components that only depend on small subsets of agents. Exploiting such loose couplings is key to keeping multi-agent decision problems tractable.
In this work, we consider learning to coordinate in multi-agent systems. For example, consider a wind farm comprised of a set of wind turbines. The objective is to maximize the farm's total productivity. When upstream turbines directly face the incoming wind stream, energy is extracted from the wind. This reduces the productivity of downstream turbines, potentially hurting the farm's overall power production. However, turbines have the option to rotate, in order to deflect the turbulent flow away from turbines downwind [29]. Due to the complex nature of the aerodynamic interactions between the turbines, constructing a model of the environment and deriving a control policy using planning techniques is extremely challenging [24]. Instead, a joint control policy among the turbines can be learned to effectively maximize the productivity of the wind farm. The system is loosely coupled, as redirection only directly affects adjacent turbines.
While most of the literature only considers approximate reinforcement learning methods, it has recently been shown [4] that one can achieve theoretical bounds on the regret (i.e., how much reward is lost due to learning). In this work, we use the multi-agent multi-armed bandit problem, and improve upon the state of the art. Specifically, we propose the Multi-Agent Thompson Sampling (MATS) algorithm, which exploits loosely-coupled interactions in multi-agent systems. The loose couplings are formalized as a coordination graph, which defines for each pair of agents whether their actions depend on each other. We assume the graph structure is known beforehand, which is the case in many real-world applications with sparse agent interactions (e.g., wind farm control). Our method leverages the exploration-exploitation mechanism of Thompson sampling (TS). TS has been shown to be highly competitive with other popular methods, e.g., UCB [8]. Recently, theoretical guarantees on its regret have been established [1], which renders the method increasingly popular in the literature. Additionally, due to its Bayesian nature, problem-specific priors can be specified. We argue that this has strong relevance in many practical fields, such as advertisement selection [8] and influenza mitigation [22].
We provide a finite-time Bayesian regret analysis and prove that the upper regret bound of MATS is low-order polynomial in the number of actions of a single agent for sparse coordination graphs (Corollary 1). This is a significant improvement over the exponential bound of classic TS, which is obtained when the coordination graph is ignored [1]. We show that MATS improves upon the state of the art in various synthetic settings. Moreover, MATS achieves high performance on a realistic wind farm control task, in which multiple wind turbines have to be jointly aligned to maximize the power production.
We define the problem setting, the multi-agent multi-armed bandit, in Section 2 and describe our method, MATS, in Section 3. We provide a theoretical and empirical analysis of MATS in Sections 4 and 5, respectively. We discuss the results in Section 6. Finally, we compare with related work in Section 7 and conclude that MATS achieves state-of-the-art performance, both empirically and theoretically, in Section 8.
2 Problem Statement
In this work, we adopt the multi-agent multi-armed bandit (MAMAB) setting [4]. A MAMAB is similar to the multi-armed bandit formalism [28], but considers multiple agents factored into groups. When the agents have pulled a joint arm, each group receives a reward. The goal shared by all agents is to maximize the total sum of rewards. Formally,
Definition 1.
A multi-agent multi-armed bandit (MAMAB) is a tuple $\langle \mathcal{D}, \mathcal{A}, f \rangle$, where

- $\mathcal{D} = \{1, \dots, m\}$ is the set of $m$ enumerated agents. This set is factorized into $\rho$, possibly overlapping, subsets of agents $\mathcal{D}^e$.

- $\mathcal{A} = \mathcal{A}_1 \times \dots \times \mathcal{A}_m$ is the set of joint actions, or joint arms, which is the Cartesian product of the sets of actions $\mathcal{A}_i$ for each of the $m$ agents in $\mathcal{D}$. We denote $\mathcal{A}^e$ as the set of local joint actions, or local arms, for the group $\mathcal{D}^e$.

- $f(a)$ is a stochastic function providing a global reward when a joint arm, $a \in \mathcal{A}$, is pulled. The global reward function is decomposed into noisy, observable and independent local reward functions, i.e., $f(a) = \sum_{e=1}^{\rho} f^e(a^e)$. A local function $f^e$ only depends on the local arm $a^e$ of the subset of agents in $\mathcal{D}^e$.

We denote the mean reward of a joint arm as $\mu(a) = \mathbb{E}\left[f(a)\right]$. For simplicity, we refer to the $i$-th agent by its index $i$.
The dependencies between the local reward functions and the agents are described by a coordination graph [14].
Definition 2.
A coordination graph is a bipartite graph $G = \langle \mathcal{D}, \{f^1, \dots, f^\rho\}, E \rangle$, whose nodes are the agents in $\mathcal{D}$ and the components of a factored reward function $f = \sum_{e=1}^{\rho} f^e$, and an edge $\langle i, f^e \rangle \in E$ exists if and only if agent $i$ influences component $f^e$.
The dependencies in a MAMAB can be described by setting $i \in \mathcal{D}^e \Leftrightarrow \langle i, f^e \rangle \in E$.
In this setting, the objective is to minimize the expected cumulative regret [2], which is the cost incurred when pulling a particular joint arm instead of the optimal one.
Definition 3.
The expected cumulative regret of pulling a sequence of joint arms until time step $T$ according to policy $\pi$ is
(1) $\mathbb{E}\left[R(T, \pi)\right] = \mathbb{E}\left[\sum_{t=1}^{T} \Delta(a_t) \,\middle|\, \pi\right]$
with
(2) $\Delta(a_t) = \mu(a_*) - \mu(a_t)$
where $a_*$ is the optimal joint arm and $a_t$ is the joint arm pulled at time $t$. For the sake of brevity, we will omit $\pi$ when the context is clear.
Cumulative regret could, in principle, be minimized by a policy that considers the full joint arm space, thereby ignoring loose couplings between agents. However, this leads to a combinatorial problem, as the joint arm space scales exponentially with the number of agents. Therefore, loose couplings need to be taken into account whenever possible.
3 Multi-Agent Thompson Sampling
We propose the Multi-Agent Thompson Sampling (MATS) algorithm for decision making in loosely-coupled multi-agent multi-armed bandit problems. Consider a MAMAB with $\rho$ groups (Definition 1). The local means $\mu^e(a^e)$ are treated as unknown. According to the Bayesian formalism, we exert our beliefs over the local means in the form of a prior, $p(\mu^e(a^e))$. At each time step $t$, MATS draws a sample $\mu_t^e(a^e)$ from the posterior for each group $e$ and local arm $a^e$ given the history, $\mathcal{H}_{t-1}$, consisting of local actions and rewards associated with past pulls:
(3) $\mu_t^e(a^e) \sim p\left(\mu^e(a^e) \mid \mathcal{H}_{t-1}\right)$
Note that during this step, MATS samples directly from the posterior over the unknown local means, which implies that the sample $\mu_t^e(a^e)$ and the unknown mean $\mu^e(a^e)$ are independent and identically distributed at time step $t$.
Thompson sampling (TS) chooses the arm with the highest sample, i.e.,
(4) $a_t = \arg\max_{a \in \mathcal{A}} \mu_t(a)$
However, in our case, the expected reward is decomposed into several local means. As conflicts between overlapping groups will arise, the optimal local arms for an agent in two groups may differ. Therefore, we must define the arg max operator to deal with the factored representation of a MAMAB, while still returning the full joint arm that maximizes the sum of samples, i.e.,
(5) $a_t = \arg\max_{a \in \mathcal{A}} \sum_{e=1}^{\rho} \mu_t^e(a^e)$
To this end, we use variable elimination (VE), which computes the joint arm that maximizes the global reward without explicitly enumerating the full joint arm space [14]. Specifically, VE consecutively eliminates an agent from the coordination graph, while computing its best response with respect to its neighbors. VE is guaranteed to return the optimal joint arm and has a computational complexity that is exponential in the induced width of the graph, i.e., the number of neighbors of an agent at the time of its elimination. However, as the method is typically applied to a loosely-coupled coordination graph, the induced width is generally much smaller than the size of the full joint action space, which renders the maximization problem tractable [14, 15].
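To make the elimination step concrete, below is a minimal sketch of VE on a chain-structured coordination graph, in which every local reward couples two consecutive agents. The reward tables and problem size are illustrative and not taken from the paper; for general graphs, the intermediate tables grow exponentially with the induced width.

```python
def ve_chain(local_rewards, n_actions):
    """Maximize the sum of local rewards over a chain of agents.

    local_rewards[e][(a, b)] is the (sampled) local reward of group e,
    which couples agent e and agent e + 1.
    """
    # Forward pass: eliminate agents left to right. msg[b] holds the best
    # achievable sum over eliminated agents, given that the current agent
    # plays b; back stores the maximizing action of each eliminated agent.
    msg = {a: 0.0 for a in range(n_actions)}
    back = []
    for table in local_rewards:
        new_msg, choice = {}, {}
        for b in range(n_actions):
            best_val, best_a = max(
                (msg[a] + table[(a, b)], a) for a in range(n_actions)
            )
            new_msg[b] = best_val
            choice[b] = best_a
        back.append(choice)
        msg = new_msg
    # Backward pass: follow the backpointers to recover the joint arm.
    last = max(msg, key=msg.get)
    joint = [last]
    for choice in reversed(back):
        joint.append(choice[joint[-1]])
    joint.reverse()
    return joint, msg[last]

# Illustrative two-group chain over three binary agents.
joint, value = ve_chain(
    [{(0, 0): 0.2, (0, 1): 1.0, (1, 0): 0.3, (1, 1): 0.1},
     {(0, 0): 0.5, (0, 1): 0.2, (1, 0): 0.9, (1, 1): 0.4}],
    n_actions=2,
)
# joint == [0, 1, 0]: the alternating arm maximizes 1.0 + 0.9 = 1.9
```

The forward pass touches each local table once, so for this chain the cost is linear in the number of groups; brute-force enumeration of all 8 joint arms would give the same answer.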
Finally, the joint arm that maximizes Equation 5, $a_t$, is pulled and a reward is obtained for each group. The method is formally described in Algorithm 1.
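As an illustration of the full loop, the sketch below runs MATS on a hypothetical three-agent chain with Bernoulli local rewards and conjugate Beta posteriors. The groups, success probabilities and horizon are invented for this example, and the joint maximization enumerates the tiny joint arm space where Algorithm 1 would use VE.

```python
import itertools
import random

random.seed(0)
N_AGENTS, N_ACTIONS = 3, 2
GROUPS = [(0, 1), (1, 2)]                 # agents covered by each local reward
# Hypothetical Bernoulli success probabilities per group and local arm.
P = [
    {(0, 0): 0.2, (0, 1): 0.9, (1, 0): 0.3, (1, 1): 0.1},
    {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.8, (1, 1): 0.3},
]
# Beta(1, 1) priors: [alpha, beta] counts per group and local arm.
post = [{arm: [1, 1] for arm in p} for p in P]

def mats_step():
    # 1. Sample a mean for every local arm from its posterior.
    sample = [{arm: random.betavariate(a, b) for arm, (a, b) in g.items()}
              for g in post]
    # 2. Pick the joint arm maximizing the sum of sampled local means
    #    (exhaustive here; VE in the actual algorithm).
    joint = max(
        itertools.product(range(N_ACTIONS), repeat=N_AGENTS),
        key=lambda a: sum(sample[e][tuple(a[i] for i in GROUPS[e])]
                          for e in range(len(GROUPS))),
    )
    # 3. Pull it, observe one local reward per group, update the posteriors.
    for e, grp in enumerate(GROUPS):
        arm = tuple(joint[i] for i in grp)
        r = 1 if random.random() < P[e][arm] else 0
        post[e][arm][0] += r
        post[e][arm][1] += 1 - r
    return joint

joint = None
for _ in range(2000):
    joint = mats_step()
```

With these invented probabilities the best joint arm is (0, 1, 0), and over time the Beta posteriors concentrate the pulls on it.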
MATS belongs to the class of probability matching methods [21].
Definition 4.
Given history $\mathcal{H}_{t-1}$, the probability distribution of the pulled arm $a_t$ is equal to the probability distribution of the optimal arm $a_*$. Formally,
(6) $P\left(a_t = a \mid \mathcal{H}_{t-1}\right) = P\left(a_* = a \mid \mathcal{H}_{t-1}\right)$
4 Bayesian Regret Analysis
Many multi-agent systems are composed of locally connected agents. When formalized as a MAMAB (Definition 1), our method is able to exploit these local structures during the decision process. We provide a regret bound for MATS that scales sublinearly with a factor $\tilde{A}$, where $\tilde{A} \triangleq \sum_{e=1}^{\rho} \left|\mathcal{A}^e\right|$ is the total number of local arms.
Consider a MAMAB with $\rho$ groups and the following assumptions on the rewards:
Assumption 1.
The global rewards have a mean between 0 and 1, i.e., $\mu(a) \in [0, 1]$ for all $a \in \mathcal{A}$.
Assumption 2.
The local rewards shifted by their mean are $\sigma$-subgaussian distributed, i.e., for all $e$ and $a^e \in \mathcal{A}^e$, the shifted reward $f^e(a^e) - \mu^e(a^e)$ is $\sigma$-subgaussian.
Consider the event $\mathcal{E}_T$, which states that, until time step $T$, the differences between the local sample means and the true means are bounded by a time-dependent threshold, i.e.,
(7) $\mathcal{E}_T \triangleq \left\{ \forall e, \forall a^e, \forall t \le T : \left|\hat{\mu}_t^e(a^e) - \mu^e(a^e)\right| \le c_t^e(a^e) \right\}$
with
(8) $c_t^e(a^e) \triangleq \sqrt{\frac{2\sigma^2 \log(\delta^{-1})}{n_t^e(a^e)}}$
where $\hat{\mu}_t^e(a^e)$ is the sample mean and $n_t^e(a^e)$ the number of pulls of local arm $a^e$ before time $t$, and $\delta$ is a free parameter that will be chosen later. We denote the complement of the event by $\bar{\mathcal{E}}_T$.
Lemma 1.
(Concentration inequality) The probability of exceeding the error bound on the local sample means is linearly bounded by $\delta$. Specifically,
(9) $p\left(\bar{\mathcal{E}}_T\right) \le 2\delta \tilde{A} T$
where $\tilde{A}$ is the total number of local arms.
Proof.
Using the union bound (U), we can bound the probability of observing event $\bar{\mathcal{E}}_T$ as
(10) $p\left(\bar{\mathcal{E}}_T\right) \overset{\text{(U)}}{\le} \sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} \sum_{t=1}^{T} p\left(\left|\hat{\mu}_t^e(a^e) - \mu^e(a^e)\right| > c_t^e(a^e)\right)$
The estimated mean $\hat{\mu}_t^e(a^e)$ is a weighted sum of random variables distributed according to a $\sigma$-subgaussian distribution with mean $\mu^e(a^e)$. Hence, Hoeffding's inequality (H) is applicable [30]:
(11) $p\left(\left|\hat{\mu}_t^e(a^e) - \mu^e(a^e)\right| > c_t^e(a^e)\right) \overset{\text{(H)}}{\le} 2\exp\left(-\frac{n_t^e(a^e)\left(c_t^e(a^e)\right)^2}{2\sigma^2}\right) = 2\delta$
Therefore, the following concentration inequality on $\bar{\mathcal{E}}_T$ holds:
(12) $p\left(\bar{\mathcal{E}}_T\right) \le 2\delta \tilde{A} T$
∎
Lemma 2.
(Bayesian regret bound under $\mathcal{E}_T$) Provided that the error bound on the local sample means is never exceeded until time $T$, the Bayesian regret bound, when using the MATS policy $\pi^{\mathrm{MATS}}$, is of the order
(13) $\mathbb{E}\left[R(T) \mid \mathcal{E}_T\right] \in O\left(\sqrt{2\sigma^2 \log(\delta^{-1})\, \tilde{A} \rho T}\right)$
Proof.
Consider this upper bound on the sample means:
(14) $u_t^e(a^e) \triangleq \hat{\mu}_t^e(a^e) + c_t^e(a^e), \qquad u_t(a) \triangleq \sum_{e=1}^{\rho} u_t^e(a^e)$
Given history $\mathcal{H}_{t-1}$, the statistics $\hat{\mu}_t^e(a^e)$ and $n_t^e(a^e)$ are known, rendering $u_t$ a deterministic function. Therefore, the probability matching property of MATS (Equation 6) can be applied as follows:
(15) $\mathbb{E}\left[u_t(a_*) \mid \mathcal{H}_{t-1}\right] = \mathbb{E}\left[u_t(a_t) \mid \mathcal{H}_{t-1}\right]$
Hence, using the tower rule (T) and Equation 15, the regret can be bounded as
(16) $\mathbb{E}\left[R(T) \mid \mathcal{E}_T\right] \overset{\text{(T)}}{=} \mathbb{E}\left[\sum_{t=1}^{T} \mathbb{E}\left[\mu(a_*) - \mu(a_t) \mid \mathcal{H}_{t-1}\right] \,\middle|\, \mathcal{E}_T\right] = \mathbb{E}\left[\sum_{t=1}^{T} \mathbb{E}\left[\mu(a_*) - u_t(a_*) + u_t(a_t) - \mu(a_t) \mid \mathcal{H}_{t-1}\right] \,\middle|\, \mathcal{E}_T\right]$
Note that the expression $\mu(a_*) - u_t(a_*)$ is always negative under $\mathcal{E}_T$, i.e.,
(17) $\mu(a_*) - u_t(a_*) = \sum_{e=1}^{\rho} \left(\mu^e(a_*^e) - \hat{\mu}_t^e(a_*^e) - c_t^e(a_*^e)\right) \le 0$
while $u_t(a_t) - \mu(a_t)$ is bounded by twice the threshold $c_t^e(a_t^e)$, i.e.,
(18) $u_t(a_t) - \mu(a_t) = \sum_{e=1}^{\rho} \left(\hat{\mu}_t^e(a_t^e) + c_t^e(a_t^e) - \mu^e(a_t^e)\right) \le \sum_{e=1}^{\rho} 2 c_t^e(a_t^e)$
Thus, Equation 16 can be bounded as
(19) $\mathbb{E}\left[R(T) \mid \mathcal{E}_T\right] \le \mathbb{E}\left[\sum_{t=1}^{T} \sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} 2 c_t^e(a^e)\, \mathbb{1}\left[a_t^e = a^e\right] \,\middle|\, \mathcal{E}_T\right]$
where $\mathbb{1}[\cdot]$ is the indicator function. The terms in the summation are only non-zero at the time steps when the local action $a^e$ is pulled, i.e., when $a_t^e = a^e$. Additionally, note that only at these time steps, the counter $n_t^e(a^e)$ increases by exactly 1. Therefore, the following equality holds:
(20) $\sum_{t=1}^{T} c_t^e(a^e)\, \mathbb{1}\left[a_t^e = a^e\right] = \sum_{n=1}^{n_T^e(a^e)} \sqrt{\frac{2\sigma^2 \log(\delta^{-1})}{n}}$
The function $n \mapsto 1/\sqrt{n}$ is decreasing and integrable. Hence, using the right Riemann sum,
(21) $\sum_{n=1}^{n_T^e(a^e)} \frac{1}{\sqrt{n}} \le \int_{0}^{n_T^e(a^e)} \frac{dx}{\sqrt{x}} = 2\sqrt{n_T^e(a^e)}$
Combining Equations 19–21 leads to the bound
(22) $\mathbb{E}\left[R(T) \mid \mathcal{E}_T\right] \le 4\sqrt{2\sigma^2 \log(\delta^{-1})}\; \mathbb{E}\left[\sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} \sqrt{n_T^e(a^e)} \,\middle|\, \mathcal{E}_T\right]$
We use the relationship $\lVert x \rVert_1 \le \sqrt{d}\, \lVert x \rVert_2$ between the 1- and 2-norm of a vector $x$, where $d$ is the number of elements in the vector, as follows:
(23) $\sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} \sqrt{n_T^e(a^e)} \le \sqrt{\tilde{A} \sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} n_T^e(a^e)}$
Finally, note that the sum of all counts is equal to the total number of local pulls done by MATS until time $T$, i.e.,
(24) $\sum_{e=1}^{\rho} \sum_{a^e \in \mathcal{A}^e} n_T^e(a^e) = \rho T$
Using Equations 22–24, the complete regret bound under $\mathcal{E}_T$ is given by
(25) $\mathbb{E}\left[R(T) \mid \mathcal{E}_T\right] \le 4\sqrt{2\sigma^2 \log(\delta^{-1})\, \tilde{A} \rho T}$
∎
Theorem 1.
Under Assumptions 1 and 2, the expected cumulative regret of MATS is bounded as $\mathbb{E}\left[R(T, \pi^{\mathrm{MATS}})\right] \in O\left(\sqrt{\tilde{A} \rho T \log(\tilde{A} T)}\right)$.
Proof.
Split the regret over the events $\mathcal{E}_T$ and $\bar{\mathcal{E}}_T$. Under $\mathcal{E}_T$, Lemma 2 applies. Under $\bar{\mathcal{E}}_T$, the per-step regret is at most 1 (Assumption 1), so by Lemma 1 this case contributes at most $2\delta \tilde{A} T^2$ to the expected regret. Choosing $\delta$ inversely polynomial in $\tilde{A}$ and $T$, e.g., $\delta = (\tilde{A} T^2)^{-1}$, renders this contribution constant, while $\log(\delta^{-1}) \in O(\log(\tilde{A} T))$, which yields the stated bound. ∎
Corollary 1.
If $\left|\mathcal{A}_i\right| \le k$ for all agents $i$, and if $\left|\mathcal{D}^e\right| \le d$ for all groups $e$, then
(30) $\mathbb{E}\left[R(T, \pi^{\mathrm{MATS}})\right] \in O\left(\rho k^{d/2} \sqrt{T \log(\tilde{A} T)}\right)$
Proof.
The number of local arms satisfies $\tilde{A} = \sum_{e=1}^{\rho} \left|\mathcal{A}^e\right| \le \rho k^d$; substituting this into Theorem 1 yields the result. ∎
Corollary 1 tells us that the regret is sublinear in terms of time and low-order polynomial in terms of the largest action space of a single agent when the number of groups and the number of agents per group are small. This reflects the main contribution of this work. When agents are loosely coupled, the effective joint arm space is significantly reduced, and MATS provides a mechanism that efficiently deals with such settings. This is a significant improvement over the established classic regret bounds of vanilla TS when the MAMAB is 'flattened' and the factored structure is neglected [26, 21]. The classic bounds scale exponentially with the number of agents, which is infeasible in many multi-agent environments.
5 Experiments
We evaluate the performance of MATS on the benchmark problems proposed in the paper that introduced MAUCE [4], which is the current state-of-the-art algorithm for multi-agent bandit problems, and on one novel setting that falls outside the domain of the theoretical guarantees of both algorithms.
The first set of benchmarks comprises two synthetic settings (i.e., Bernoulli 0101-Chain and Gem Mining) and a wind farm control task. We compare against a random policy (rnd), Sparse Cooperative Q-Learning (SCQL) and the state-of-the-art algorithm, MAUCE. For SCQL and MAUCE, we use the same exploration parameters provided for those settings [4]. For MATS, we always use non-informative Jeffreys priors.
Additionally, we introduce a novel variant of the 0101-Chain with Poisson-distributed local rewards. A Poisson distribution is super-Gaussian, meaning that its tails decay more slowly towards zero than the tails of any Gaussian. Therefore, both the assumptions made in Theorem 1 and those underlying the established regret bound of MAUCE are violated. Additionally, as the rewards are highly skewed, we expect that the use of symmetric exploration bounds in MAUCE will often lead to either over- or under-exploration of the local arms. We assess the performance of both methods on this benchmark.
5.1 Bernoulli 0101-Chain
The Bernoulli 0101-Chain consists of $m$ agents and $m - 1$ local reward distributions. Each agent $i$ can choose between two actions: 0 and 1. In the coordination graph, agents $i$ and $i + 1$ are connected to a local reward $f^i(a_i, a_{i+1})$. Thus, each pair of agents should locally coordinate in order to find the best joint arm. The local rewards are drawn from a Bernoulli distribution with a different success probability per group. These success probabilities are given in Table 1. The optimal joint arm is an alternating sequence of zeros and ones, starting with 0.
To ensure that the assumptions made in the regret analyses of MAUCE and MATS hold, we divide the local rewards by the number of groups, such that the global rewards are between 0 and 1.
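For concreteness, a sketch of this normalized reward model is given below. The success probabilities used here are placeholders, not the values of Table 1; the actual table additionally breaks the tie between the two alternating sequences so that the one starting with 0 is uniquely optimal.

```python
import random

def make_chain(n_agents, probs, rng):
    """Bernoulli 0101-Chain sketch. probs[(a, b)] is the success
    probability of the local reward shared by two consecutive agents."""
    n_groups = n_agents - 1
    def pull(joint):
        # Each local reward is divided by the number of groups so that
        # the global mean reward stays in [0, 1].
        return [(1 if rng.random() < probs[(joint[e], joint[e + 1])] else 0)
                / n_groups
                for e in range(n_groups)]
    return pull

rng = random.Random(42)
# Placeholder table: alternating local arms (0, 1) and (1, 0) pay best.
probs = {(0, 0): 0.5, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.5}
pull = make_chain(5, probs, rng)
rewards = pull([0, 1, 0, 1, 0])   # one normalized local reward per group
```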
Figure 1: Cumulative normalized regret averaged over 100 runs for the (a) Bernoulli 0101-Chain, (b) Gem Mining and (d) Poisson 0101-Chain, and over 10 runs for the (c) Wind Farm. Both the mean (line) and standard deviation (shaded area) are plotted.
5.2 Gem Mining
In the Gem Mining problem, a mining company wants to excavate a set of mines for gems (i.e., local rewards). The goal is to maximize the total number of gems found over all mines. However, the company's workers live in separate villages (i.e., agents), and only one van per village is available. Therefore, each village needs to decide to which mine its workers should be sent (i.e., local action). Moreover, workers can only commute to nearby mines (i.e., coordination graph). Hence, a group can be constructed per mine, consisting of all agents that can travel toward the mine. An example of a coordination graph is given in Figure 2.
The reward is drawn from a Bernoulli distribution, where the probability of finding a gem at a mine depends on the number of workers at the mine and on a base probability that is sampled uniformly at random for each mine. When more workers are excavating a mine, the probability of finding a gem increases. Each village is populated by a number of workers sampled uniformly at random. The coordination graph is generated by sampling, for each village, the number of mines to which it should be connected; each village is then connected to that many consecutive mines, starting from its own index. The last village is always connected to 4 mines.
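The generation procedure can be sketched as follows. The sampling ranges and the probability model are placeholders for the elided experimental parameters, chosen only to illustrate the structure (workers per village, contiguous village-to-mine connections, and a gem probability that grows with the number of workers).

```python
import random

def make_gem_mining(n_villages, rng):
    """Return (workers, groups): workers per village and, per mine, the
    list of villages whose workers can reach it. Ranges are placeholders."""
    workers = [rng.randint(1, 5) for _ in range(n_villages)]
    n_mines = n_villages + 3              # leaves room for the last village
    groups = [[] for _ in range(n_mines)]
    for v in range(n_villages):
        # Each village connects to a few consecutive mines; the last
        # village is always connected to 4 mines.
        k = 4 if v == n_villages - 1 else rng.randint(2, 4)
        for m in range(v, v + k):
            groups[m].append(v)
    return workers, [g for g in groups if g]

def gem_probability(n_workers, base_p):
    # Placeholder model: more workers raise the chance of finding a gem.
    return 1.0 - (1.0 - base_p) ** n_workers

rng = random.Random(3)
workers, groups = make_gem_mining(10, rng)
```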
5.3 Wind Farm Control Application
A wind farm consists of a group of wind turbines installed to extract energy from wind. From the perspective of a single turbine, aligning with the incoming wind vector usually ensures the highest productivity. However, translating this control policy directly to an entire wind farm may be suboptimal. As wind passes through the farm, downstream turbines observe a significantly lower wind speed. This is known as the wake effect, which is due to the turbulence generated behind operational turbines.
In recent work, the possibility of deflecting the wake away from downstream turbines through rotor misalignment has been investigated [29]. While a misaligned turbine produces less energy on its own, the group's total productivity is increased. Physically, the wake effect diminishes over long distances, and thus turbines tend to only influence their neighbors. We can use this domain knowledge to define agent groups and organize them in a graph structure. Note that the graph structure depends on the incoming wind vector. Nevertheless, atmospheric conditions are typically discretized when analyzing operational regimes [17]; thus, a graph structure can be constructed independently for each possible discretized incoming wind vector. We construct a graph structure for one possible wind vector.
We demonstrate our method on a virtual wind farm, consisting of 11 turbines, of which the layout is shown in Figure 3. We use the state-of-the-art WISDEM FLORIS simulator [25]. For MATS, we assume the local power productions are sampled from Gaussians with unknown mean and variance, which leads to a Student's t-distribution on the mean when using a Jeffreys prior [16]. The results for the wind farm control setting are shown in Figure 1(c).
5.4 Poisson 0101-Chain
We introduce a novel benchmark with Poisson-distributed local rewards, for which the established regret bounds of MATS and MAUCE do not hold. Similar to the Bernoulli 0101-Chain, agents need to coordinate their actions in order to obtain an alternating sequence of zeros and ones. However, as the rewards are highly skewed and super-Gaussian, this setting is much more challenging. The means of the Poisson distributions are given in Table 2. We also divide the rewards by the number of groups, similar to the Bernoulli 0101-Chain.
For MAUCE, an exploration parameter must be chosen. This exploration parameter denotes the range of the observed rewards. As a Poisson distribution has unbounded support, we rely on percentiles of the reward distribution. Specifically, we choose the 95th percentile of the reward distribution of the optimal arm as the exploration parameter of MAUCE. For MATS, we use non-informative Jeffreys priors on the unknown means, which for the Poisson likelihood is a Gamma prior [23]. The results are shown in Figure 1(d).
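For illustration, sampling from this Gamma posterior can be sketched as follows. Under the Jeffreys prior p(lambda) proportional to lambda^(-1/2), i.e., Gamma(1/2, 0), the posterior after observing n Poisson rewards with sum S is Gamma(1/2 + S, n) in (shape, rate) form. The observed counts below are invented.

```python
import random

def sample_poisson_mean(rewards, rng):
    """Draw one posterior sample of a Poisson mean under the Jeffreys prior."""
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one observation for a proper posterior")
    shape = 0.5 + sum(rewards)
    # random.gammavariate takes (shape, scale); scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / n)

rng = random.Random(7)
obs = [3, 1, 4, 1, 5]                          # illustrative Poisson counts
samples = [sample_poisson_mean(obs, rng) for _ in range(5000)]
posterior_mean = sum(samples) / len(samples)   # concentrates near (0.5 + 14) / 5 = 2.9
```

Because the Gamma posterior is right-skewed, like the Poisson rewards themselves, the sampled means reflect the asymmetry that symmetric exploration bounds cannot capture.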
6 Discussion
While both MAUCE and MATS achieve sublinear regret in terms of time and low-order polynomial regret in terms of the number of local arms for sparse coordination graphs, MATS consistently outperforms both MAUCE and SCQL empirically. Especially in the wind farm control task, the cumulative regret of MATS converges to about 2, while the cumulative regret of MAUCE continues increasing past 10 (see Figure 1(c)). We argue that the high performance of MATS is due to its ability to seamlessly include domain knowledge about the shape of the reward distributions and to treat the problem parameters as unknowns. To highlight the power of this property, we introduced the Poisson 0101-Chain. In this setting, the reward distributions are highly skewed, and the mean does not match the median. Since the mean falls well above 50% of all samples, the true mean is expected to lie above the sample mean of the initially observed rewards. Naturally, this bias averages out in the limit, but it may have a large impact during the early exploration stage. The high standard deviations in Figure 1(d) support this impact. Although the established regret bounds of MATS and MAUCE do not apply for super-Gaussian reward distributions, we demonstrate that MATS exploits density information about the rewards to achieve more targeted exploration. In Figure 1(d), the cumulative regret of MATS stagnates around 7500 time steps, while the cumulative regret of MAUCE continues to increase significantly. As MAUCE only supports symmetric exploration bounds, it is challenging to correctly assess the amount of exploration needed to solve the task.
Throughout the experiments, exploration constants had to be specified for MAUCE, which were challenging to choose and to interpret in terms of the density of the data. In contrast, MATS uses either statistics about the data (if available) or, potentially non-informative, beliefs defined by the user. For example, in the wind farm case, the spread of the data is unknown. MATS effectively maintains a posterior on the variance and uses it to balance exploration and exploitation, while still outperforming MAUCE with a manually calibrated exploration range (see Figure 1(c)).
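As an illustration of maintaining such a posterior: for Gaussian rewards with unknown mean and variance under the Jeffreys prior, the marginal posterior of the mean is a Student's t-distribution centered at the sample mean and scaled by the sample spread, which can be sampled with standard-library primitives as sketched below. The observations are invented.

```python
import math
import random
import statistics

def sample_gaussian_mean(rewards, rng):
    """Draw mu ~ xbar + (s / sqrt(n)) * t_{n-1}, the posterior of the mean
    of a Gaussian with unknown mean and variance under the Jeffreys prior."""
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two observations for a proper posterior")
    xbar = statistics.fmean(rewards)
    s = statistics.stdev(rewards)              # sample std, n - 1 denominator
    # Student-t sample via a standard normal and a chi-square draw.
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate((n - 1) / 2.0, 2.0)   # chi-square with n - 1 dof
    t = z / math.sqrt(v / (n - 1))
    return xbar + s / math.sqrt(n) * t

rng = random.Random(1)
obs = [1.1, 0.9, 1.3, 0.8, 1.0, 1.2]           # illustrative power readings
samples = [sample_gaussian_mean(obs, rng) for _ in range(20000)]
center = sum(samples) / len(samples)           # concentrates near mean(obs)
```

The heavier t-tails automatically widen exploration when few observations are available and tighten it as the variance estimate stabilizes, without any hand-tuned exploration range.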
7 Related Work
Multi-agent reinforcement learning and planning with loose couplings has been investigated in sequential decision problems [15, 19, 11, 27]. In sequential settings, the value function cannot be factorized exactly. Therefore, it is challenging to provide convergence and optimality guarantees. While for planning some theoretical guarantees can be provided [27], in the learning literature the focus has been on empirical validation [19]. In this work, we focus on MAMABs, which are single-shot stateless problems. In such settings, the reward function is factored exactly into components that only depend on a subset of agents.
The combinatorial bandit [6, 7, 12, 9] is a variant of the multi-armed bandit in which, rather than one-dimensional arms, an arm vector has to be pulled. In our work, the arms' dimensionality corresponds to the number of agents in the system, and similarly to combinatorial bandits, the number of arms increases exponentially with this quantity. We consider a variant of this framework, called the semi-bandit problem [3], in which local components of the global reward are observable. For this setting, an algorithm was constructed that assumes access to an oracle, which provides a joint action that attains a fraction $\alpha$ of the optimal expected reward with probability $\beta$ [9]. Instead, we assume the availability of a coordination graph, which we argue is a reasonable assumption in many multi-agent settings.
Sparse Cooperative Q-Learning is an algorithm that also assumes the availability of a coordination graph [18]. However, although strong experimental results were reported, no theoretical guarantees were provided. Later, the Multi-Agent Upper-Confidence Exploration (MAUCE) algorithm was introduced [4]. The authors provide a tight problem-dependent regret bound for MAMABs and demonstrate high performance on a variety of benchmarks. Both methods leverage the coordination graph and use variable elimination to find the optimal joint action, given the local rewards per group of agents. Our method builds upon this idea, but provides a Bayesian alternative based on Thompson sampling (TS).
8 Conclusions
We proposed Multi-Agent Thompson Sampling (MATS), a novel Bayesian algorithm for multi-agent multi-armed bandits. The method exploits loose connections between agents to solve multi-agent coordination tasks efficiently. Specifically, we proved that, for subgaussian rewards with bounded means, the expected cumulative regret grows sublinearly in time and is low-order polynomial in the highest number of actions of a single agent when the coordination graph is sparse. Empirically, we showed a significant improvement over the state-of-the-art algorithm, MAUCE, on several synthetic benchmarks and on a realistic wind farm control task.
9 Acknowledgments
Timothy Verstraeten, Eugenio Bargiacchi and Pieter J.K. Libin were supported by an FWO PhD grant (Fonds Wetenschappelijk Onderzoek - Vlaanderen). Diederik M. Roijers was a Postdoctoral Fellow with the FWO (grant #12J0617N).
References
 [1] (2012) Analysis of Thompson sampling for the multi-armed bandit problem. In Conference on Learning Theory, pp. 39–1. Cited by: §1.
 [2] (2013) Further optimal regret bounds for Thompson sampling. In Artificial Intelligence and Statistics, pp. 99–107. Cited by: §2.
 [3] (2011) Minimax policies for combinatorial prediction games.. In COLT, Vol. 19, pp. 107–132. Cited by: §7.

 [4] (2018) Learning to coordinate with coordination graphs in repeated single-stage multi-agent decision problems. In International Conference on Machine Learning, pp. 491–499. Cited by: §1, §2, §5, §7.
 [5] (1996) Planning, learning and coordination in multiagent decision processes. In TARK 1996: Proceedings of the 6th conference on Theoretical aspects of rationality and knowledge, pp. 195–210. Cited by: §1.
 [6] (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning 5 (1), pp. 1–122. Cited by: §7.
 [7] (2012) Combinatorial bandits. Journal of Computer and System Sciences 78 (5), pp. 1404–1422. Cited by: §7.
 [8] (2011) An empirical evaluation of Thompson sampling. In Advances in Neural Information Processing Systems, pp. 2249–2257. Cited by: §1.
 [9] (2013) Combinatorial multi-armed bandit: general framework, results and applications. In Proceedings of the 30th International Conference on Machine Learning, pp. 151–159. Cited by: §7.
 [10] (2017) Decentralised online planning for multirobot warehouse commissioning. In Proceedings of the 16th Conference on Autonomous Agents and MultiAgent Systems, pp. 492–500. Cited by: §1.
 [11] (2010) Learning multiagent state space representations. In Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’10, pp. 715–722. Cited by: §7.
 [12] (2012) Combinatorial network optimization with unknown variables: multi-armed bandits with linear rewards and individual observations. IEEE/ACM Transactions on Networking (TON) 20 (5), pp. 1466–1478. Cited by: §7.
 [13] (2001) Maxnorm projections for factored MDPs. In Proc. of the 17th International Joint Conference on Artificial Intelligence (IJCAI), pp. 673–682. Cited by: §1.
 [14] (2002) Multiagent planning with factored mdps. In Advances in neural information processing systems, pp. 1523–1530. Cited by: §1, §2, §3.
 [15] (2002) Contextspecific multiagent coordination and planning with factored mdps. In AAAI/IAAI, pp. 253–259. Cited by: §1, §3, §7.
 [16] (2014) Optimality of Thompson sampling for Gaussian bandits depends on priors. In Artificial Intelligence and Statistics, pp. 375–383. Cited by: §5.3.
 [17] (2012) Wind turbines – Part 4: Design requirements for wind turbine gearboxes (No. IEC 614004). Note: accessed 6 March 2019 External Links: Link Cited by: §5.3.
 [18] (2004) Sparse cooperative Q-learning. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, New York, NY, USA. External Links: Document Cited by: §7.
 [19] (2006) Using the maxplus algorithm for multiagent decision making in coordination graphs. In RoboCup 2005: Robot Soccer World Cup IX, A. Bredenfeld, A. Jacoff, I. Noda, and Y. Takahashi (Eds.), Lecture Notes in Computer Science, Vol. 4020, pp. 1–12. Cited by: §7.
 [20] (2000) Policy iteration for factored MDPs. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence, San Francisco, CA, USA, pp. 326–334. Cited by: §1.
 [21] (2018) Bandit algorithms. preprint. Cited by: §3, §4.
 [22] (2018) Bayesian bestarm identification for selecting influenza mitigation strategies. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 456–471. Cited by: §1.
 [23] (2012) The bugs book: a practical introduction to bayesian analysis. Chapman and Hall/CRC. Cited by: §5.1, §5.2, §5.4.
 [24] (2013) A modelfree approach to wind farm control using game theoretic methods. IEEE Transactions on Control Systems Technology 21 (4), pp. 1207–1214. Cited by: §1.
 [25] (2019) FLORIS. Version 1.0.0. GitHub. External Links: Link Cited by: §5.3.
 [26] (2014) Learning to optimize via posterior sampling. Mathematics of Operations Research 39 (4), pp. 1221–1243. Cited by: §4.
 [27] (2016) Solving transitionindependent multiagent MDPs with sparse interactions. In AAAI 2016: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Cited by: §7.
 [28] (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 (3/4), pp. 285–294. Cited by: §2, §3.
 [29] (2016) Yawmisalignment and its impact on wind turbine loads and wind farm power output. Journal of Physics: Conference Series 753 (6). Cited by: §1, §5.3.

 [30] (2018) High-dimensional probability: an introduction with applications in data science. Vol. 47, Cambridge University Press. Cited by: §4.
 [31] (2019) Fleet-wide data-enabled reliability improvement of wind turbines. Renewable and Sustainable Energy Reviews 109, pp. 428–437. Cited by: §1.
 [32] (2000) Multiagent reinforcement learning for traffic light control. In Machine Learning: Proceedings of the Seventeenth International Conference (ICML’2000), pp. 1151–1158. Cited by: §1.