1 Introduction
An agent can be defined as an entity (or computer program) that behaves autonomously within an arbitrary system in the pursuit of some goals [Wooldridge 2009]. A multiagent system (MAS) is a system where multiple agents interact in the pursuit of such goals. Within a MAS, agents may interact with each other directly, via communication acts, or indirectly, by acting on the shared environment. In addition, agents may decide to cooperate, to achieve a common goal, or to compete, to serve their own interests at the expense of other agents. In particular, agents may form cooperative teams, which can in turn compete against other teams of agents. Multiagent systems play an important role in distributed artificial intelligence, thanks to their ability to model a wide variety of real-world scenarios, where information and control are decentralized and distributed among a set of agents.
Figure 1 illustrates a MAS example. It represents a sensor network where a group of agents, equipped with sensors, seeks to determine the position of some targets. Agents may interact with each other and move away from their current position. The figure depicts the targets as star-shaped objects. The dotted lines define an interaction graph and the directional arrows illustrate agents' movements. In addition, various events that obstruct the sensors of an agent may dynamically occur. For instance, the presence of an obstacle within the agent's sensing range may be detected after the agent's movement.
Within a MAS, an agent is:

Autonomous, as it operates without the direct intervention of humans or other entities and has full control over its own actions and internal state (e.g., in the example, an agent can decide to sense, to move, etc.);

Interactant, in the sense that it interacts with other agents in order to achieve its objectives (e.g., in the example, agents may exchange information concerning results of sensing activities);

Reactive, as it responds to changes that occur in the environment and/or to requests from other agents (e.g., in the example, agents may react with a move action to the sudden appearance of obstacles);

Proactive, because of its goal-driven behavior, which allows the agent to take initiatives beyond the reactions in response to its environment.
[Figure 1: Illustration of a multiagent system: sensors (agents) seek to determine the position of the targets.]

Agent architectures are the fundamental mechanisms underlying the autonomous agent components, supporting their behavior in real-world, dynamic, and uncertain environments. Agent architectures based on decision theory, game theory, and constraint programming have been developed successfully and are popular in the Autonomous Agents and Multi-Agent Systems (AAMAS) community.
Decision theory [Raiffa 1968] assumes that the agent's actions and the environment are inherently uncertain and models such uncertainty explicitly. Agents acting in complex and dynamic environments are required to deal with various sources of uncertainty. The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) framework [Bernstein et al. 2002] is one of the most general multiagent frameworks, focused on team coordination in the presence of uncertainty about agents' actions and observations. The ability to capture a wide range of complex scenarios makes Dec-POMDPs of central interest within MAS research. However, the price of this generality is a high complexity for generating optimal solutions: Dec-POMDPs are nondeterministic exponential (NEXP) complete [Bernstein et al. 2002], even for two-agent problems, and scalability remains a critical challenge [Amato et al. 2013].

Game theory [Binmore 1992] studies interactions between self-interested agents, aiming at maximizing the welfare of the participants. Some of the most compelling applications of game theory to MAS have been in the area of auctions and negotiations [Kraus 1997, Noriega and Sierra 1999, Parsons and Wooldridge 2002]. These approaches model the trading process by which agents can reach agreements on matters of common interest, using market-oriented and cooperative mechanisms, such as reaching Nash equilibria. Typical resolution approaches aim at deriving a set of equilibrium strategies for each agent such that, when these strategies are employed, no agent can profit by unilaterally deviating from its strategy. A limitation of game-theoretic approaches is the lack of an agent's ability to reason upon a global objective, as the underlying model relies on the interactions of self-interested agents.
Constraint programming [Rossi et al. 2006] aims at solving decision-making problems formulated as optimization problems over some real-world objective. Constraint programs use the notion of constraints, i.e., relations among entities of the problem (variables), in both problem modeling and problem solving. Constraint programming relies on inference techniques that prevent the exploration of those parts of the solution search space whose assignments to variables are inconsistent with the constraints and/or dominated with respect to the objective function. Distributed Constraint Optimization Problems (DCOPs) [Modi et al. 2005, Petcu and Faltings 2005b, Gershman et al. 2009, Yeoh and Yokoo 2012] are problems where agents need to coordinate their value assignments, in a decentralized manner, to optimize their objective functions. DCOPs focus on attaining a global optimum given the interaction graph of a collection of agents. This approach can be effectively used to model a wide range of problems. Problem solving and communication strategies are directly linked in DCOPs. This feature makes the algorithmic components of a DCOP suitable for exploiting the structure of the interaction graph of the agents to generate efficient solutions.
The absence of a framework to model dynamic problems and uncertainty makes classical DCOPs unsuitable for solving certain classes of multiagent problems, such as those characterized by action uncertainty and dynamic environments. However, since its original introduction, the DCOP model has undergone a process of continuous evolution to capture diverse characteristics of agent behavior and the environment in which agents operate. Researchers have proposed a number of DCOP frameworks that differ from each other in terms of expressiveness and classes of problems they can target, extending the DCOP model to handle both dynamic and uncertain environments. However, current research has not explored how the different DCOP frameworks relate to each other within the general MAS context, which is critical to understand: (i) what resolution methods could be borrowed from other MAS paradigms, and (ii) what applications can be most effectively modeled within each framework. While there are important existing surveys for Distributed Constraint Satisfaction [Yokoo and Hirayama 2000] and Distributed Constraint Optimization [Meisels 2008], this survey aims to comprehensively analyze and categorize the different DCOP frameworks proposed by the MAS community. We do so by presenting an extensive review of the DCOP model and its extensions, the different resolution methods, as well as a number of applications modeled within each particular DCOP extension. This analysis also provides opportunities to identify open challenges and discuss future directions in the general DCOP research area.
List of key symbols

a_i    Agent                              π      Projection operator
x_i    Decision variable                  P      Probability function
r_i    Random variable                    L_i    a_i's local variables
D_i    Domain of x_i                      N_i    a_i's neighbors
Ω_i    Event space of r_i                 C_i    a_i's children
f_j    Cost function                      PC_i   a_i's pseudo-children
x^j    Scope of f_j                       P_i    a_i's parent
|A|    Number of agents                   PP_i   a_i's pseudo-parents
|X|    Number of variables                A^j    Agents whose variables are in x^j
|R|    Number of random variables         E_C    Set of edges of the constraint graph
|F|    Number of cost functions           E_T    Tree edges of the pseudo-tree
d      Size of the largest domain         E_F    Set of edges of the factor graph
F      Global objective function          w*     Induced width of the pseudo-tree
F̄      Vector of objective functions      ℓ      Size of the largest neighborhood
F_i    Objective function in F̄            q      Size of the largest local variable set
u      Utopia point                       S      Maximal sample size
∞      Infeasible value                   |PS|   Size of the Pareto set
σ      Complete assignment                b      Size of the largest bin
σ_V    Partial assignment for the         k      Number of iterations of the algorithm
       variables in V
Σ      State space
This survey paper is organized as follows. The next section provides an overview of two relevant constraint satisfaction models and their generalizations to the distributed case. Section 3 introduces DCOPs, overviews the representation and coordination models adopted during the resolution of DCOPs, and proposes a classification of the different variants of DCOPs based on the characteristics of the agents and the environment. Section 4 presents the classical DCOP model as well as two notable extensions: one characterized by asymmetric cost functions and another by multi-objective optimization. Section 5 presents a DCOP model where the environment changes over time. Section 6 discusses DCOP models in which agents act under uncertainty and may have partial knowledge of the environment in which they act. Section 7 discusses DCOP models in which agents are non-cooperative. For each of these models, the paper introduces their formal definitions, discusses related concepts, and describes several resolution algorithms. A summary of the various classes of problems discussed in this survey is given in Table 5. Section 8 describes a number of applications that have been proposed in the DCOP literature. Section 9 provides a critical review of the DCOP variants surveyed and focuses on their applicability in various settings; additionally, it describes some potential future directions for research. Finally, Section 10 provides concluding remarks. To facilitate the reading of this survey, Table 1 summarizes the most commonly used symbols and notations.
2 Overview of (Distributed) Constraint Satisfaction and Optimization
This section provides an overview of several constraint satisfaction models, which form the foundation of DCOPs. Figure 2 illustrates the relations among these models.
2.1 Constraint Satisfaction Problems
Constraint Satisfaction Problems (CSPs) [Golomb and Baumert 1965, Mackworth and Freuder 1985, Apt 2003, Rossi et al. 2006] are decision problems that involve the assignment of values to variables, under a set of specified constraints on how the variable values should relate to each other. A number of problems can be formulated as CSPs, including resource allocation, vehicle routing, circuit diagnosis, scheduling, and bioinformatics. Over the years, CSPs have become the paradigm of choice to address difficult combinatorial problems, drawing and integrating insights from diverse domains, including artificial intelligence and operations research [Rossi et al. 2006].
A CSP is a tuple ⟨X, D, C⟩, where:

X = {x_1, …, x_n} is a finite set of variables.

D = {D_1, …, D_n} is a set of finite domains for the variables in X, with D_i being the set of possible values for the variable x_i.

C is a finite set of constraints over subsets of X, where a constraint c_j, defined on the k variables x_{j_1}, …, x_{j_k}, is a relation c_j ⊆ D_{j_1} × ⋯ × D_{j_k}.^1 The set of variables x^j = {x_{j_1}, …, x_{j_k}} is referred to as the scope of c_j. The constraint c_j is called a unary constraint if k = 1 and a binary constraint if k = 2. For all other values of k, the constraint is called a k-ary constraint.^2

^1 The presence of a fixed ordering of the variables is assumed.
^2 A constraint with k = 3 is also called a ternary constraint, and a constraint whose scope spans all the variables of the problem is also called a global constraint.
A partial assignment is a value assignment for a proper subset of variables from X that is consistent with their respective domains, i.e., it is a partial function σ : X → ⋃_i D_i such that, for each x_i ∈ X, if σ(x_i) is defined, then σ(x_i) ∈ D_i. An assignment is complete if it assigns a value to each variable in X. The notation σ is used to denote a complete assignment and, for a set of variables V = {x_{j_1}, …, x_{j_h}} ⊆ X, σ_V denotes the projection of the values in σ associated to the variables in V, where j_1 < ⋯ < j_h. The goal in a CSP is to find a complete assignment σ such that, for each c_j ∈ C, σ_{x^j} ∈ c_j, that is, a complete assignment that satisfies all the problem constraints. Such a complete assignment is called a solution of the CSP.
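As an illustration, the definitions above can be turned into a small backtracking solver. This is a minimal sketch, not a production CSP engine; the variable names and the map-coloring-style constraints below are invented for the example, and constraints are given extensionally as predicates over their scope.

```python
def solve_csp(variables, domains, constraints):
    """Backtracking search for a CSP solution (or None if unsatisfiable).

    variables:   ordered list of variable names
    domains:     dict mapping each variable to its finite domain
    constraints: list of (scope, predicate) pairs; `predicate` receives the
                 values of the scope variables and returns True iff allowed
    """
    def consistent(assignment):
        # Check only constraints whose scope is fully assigned.
        for scope, allowed in constraints:
            if all(v in assignment for v in scope):
                if not allowed(*(assignment[v] for v in scope)):
                    return False
        return True

    def backtrack(assignment, remaining):
        if not remaining:
            return dict(assignment)      # complete, consistent assignment
        var, *rest = remaining
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                result = backtrack(assignment, rest)
                if result is not None:
                    return result
            del assignment[var]
        return None

    return backtrack({}, list(variables))

# Toy instance: three variables with two colors each, and binary
# difference constraints between x1-x2 and x2-x3.
domains = {v: ['r', 'g'] for v in ('x1', 'x2', 'x3')}
constraints = [(('x1', 'x2'), lambda a, b: a != b),
               (('x2', 'x3'), lambda a, b: a != b)]
solution = solve_csp(['x1', 'x2', 'x3'], domains, constraints)
```

Note that the solver returns the first solution found under the given variable and value orderings; real CSP solvers add propagation and variable-ordering heuristics on top of this skeleton.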
2.2 Weighted Constraint Satisfaction Problems
A solution of a CSP must satisfy all of its constraints. In many practical cases, however, it is desirable to consider complete assignments whose constraints can be violated to some degree. The Weighted Constraint Satisfaction Problem (WCSP) [Shapiro and Haralick 1981, Larrosa 2002] was introduced to capture this property. In WCSPs, constraints are treated as preferences that specify the extent of satisfaction (or violation) of the associated constraint.
A WCSP is a tuple ⟨X, D, C⟩, where X and D are the set of variables and their domains as defined in a CSP, and C is a set of weighted constraints. A weighted constraint c_j ∈ C is a function c_j : D_{j_1} × ⋯ × D_{j_k} → R⁺ ∪ {∞}, where x^j = {x_{j_1}, …, x_{j_k}} is the scope of c_j and ∞ is a special element used to denote that a given combination of values for the variables in x^j is not allowed; it has the property that a + ∞ = ∞, for all a ∈ R⁺ ∪ {∞}. The cost of an assignment σ is the sum of the evaluations of the constraints involving all the variables in σ. A solution is a complete assignment with cost different from ∞, and an optimal solution is a solution with minimal cost.
Thus, a WCSP is a generalization of a CSP which, in turn, can be seen as a WCSP whose constraints use exclusively the costs 0 and ∞. The terms WCSP and Constraint Optimization Problem (COP) have been used interchangeably in the literature, and the use of the latter term has been widely adopted in recent years.
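The cost semantics above can be sketched by exhaustive enumeration, with `math.inf` playing the role of the infeasible cost ∞ (which absorbs any finite addend, as required). The two-variable instance is invented for illustration; enumeration is exponential and only meant to make the definition concrete.

```python
import math
from itertools import product

INF = math.inf  # the "not allowed" cost: a + INF == INF for any finite a

def optimal_wcsp(variables, domains, weighted_constraints):
    """Find a minimum-cost complete assignment by exhaustive search.

    weighted_constraints: list of (scope, cost_fn) pairs; cost_fn returns
    a non-negative cost, or INF for a forbidden value combination.
    """
    best, best_cost = None, INF
    for values in product(*(domains[v] for v in variables)):
        sigma = dict(zip(variables, values))
        cost = sum(fn(*(sigma[v] for v in scope))
                   for scope, fn in weighted_constraints)
        if cost < best_cost:
            best, best_cost = sigma, cost
    return best, best_cost

# One soft constraint that forbids x1 = x2 = 1 and otherwise prefers
# small values (cost a + b).
doms = {'x1': [0, 1], 'x2': [0, 1]}
cons = [(('x1', 'x2'), lambda a, b: INF if a == b == 1 else a + b)]
sigma, cost = optimal_wcsp(['x1', 'x2'], doms, cons)
```

A CSP is recovered by restricting every `cost_fn` to return only `0` (allowed) or `INF` (forbidden).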
2.3 Distributed Constraint Satisfaction Problems
When the elements of a CSP are distributed among a set of autonomous agents, the resulting model is referred to as a Distributed Constraint Satisfaction Problem (DisCSP) [Yokoo et al. 1998, Yokoo 2001]. A DisCSP is a tuple ⟨X, D, C, A, α⟩, where X, D, and C are the set of variables, their domains, and the set of constraints, as defined in a CSP; A is a finite set of autonomous agents; and α : X → A is a surjective function, from variables to agents, which assigns the control of each variable x ∈ X to an agent α(x). The goal in a DisCSP is to find a complete assignment that satisfies all the constraints of the problem.
DisCSPs can be seen as an extension of CSPs to the multiagent case, where agents communicate with each other to assign values to the variables they control so as to satisfy all the problem constraints. For a survey on the topic, the interested reader is referred to Chapter 20 of [Rossi et al. 2006].
2.4 Distributed Constraint Optimization Problems
Similar to the generalization of CSPs to COPs, the Distributed Constraint Optimization Problem (DCOP) model [Modi et al. 2005, Petcu and Faltings 2005b, Gershman et al. 2009, Yeoh and Yokoo 2012] emerges as a generalization of the DisCSP model, where constraints specify a degree of preference over their violation, rather than a Boolean satisfaction metric. DCOPs can also be viewed as an extension of the COP framework to the multiagent case, where agents control variables and constraints, and need to coordinate the value assignment for the variables they control so as to optimize a global objective function. The DCOP framework is formally introduced in the next section.
3 DCOP Classification
The DCOP model has undergone a process of continuous evolution to capture diverse characteristics of the agent behavior and the environment in which agents operate. This section proposes a classification of DCOP models from a multiagent systems perspective. It accounts for the different assumptions made about the behavior of the agents and their interactions with the environment. The classification is based on the following elements (summarized in Table 2):
Element       | Parameter  | Characterization
Agent(s)      | Behavior   | Deterministic / Stochastic
              | Knowledge  | Total / Partial
              | Teamwork   | Cooperative / Competitive
Environment   | Behavior   | Deterministic / Stochastic
              | Evolution  | Static / Dynamic
Agent Behavior: This parameter captures the stochastic nature of the effects of an action being executed. These effects can be either deterministic or stochastic.

Agent Knowledge: This parameter captures the knowledge of an agent about its own state and the environment. It can be total or partial (i.e., incomplete).

Agent Teamwork: This parameter characterizes the approach undertaken by (teams of) agents to solve a distributed problem. It can be either a cooperative or a competitive resolution approach. In the former class, all agents cooperate to achieve a common goal (i.e., they all optimize a global objective function). In the latter class, each agent (or team of agents) seeks to achieve its own individual goal (i.e., each agent optimizes its individual objective functions).

Environment Behavior: This parameter captures the exogenous properties of the environment. The response of the environment to the execution of an action can be either deterministic or stochastic.

Environment Evolution: This parameter captures whether the DCOP remains unchanged over time (static) or changes over time (dynamic).
Figure 3 illustrates a categorization of the DCOP models proposed to date from a MAS perspective. This survey focuses on the DCOP models proposed at the junction of constraint programming, game theory, and decision theory. The classical DCOP model is directly inherited from constraint programming, as it extends the WCSP model to a distributed setting. It is characterized by a static model, a deterministic environment and agent behavior, total agent knowledge, and cooperative agent teamwork. Game-theoretic concepts explored in the context of auctions and negotiations have influenced the DCOP framework, leading to the development of the Asymmetric DCOP and the Multi-Objective DCOP. The DCOP framework has also borrowed fundamental decision-theoretic concepts related to modeling uncertain and dynamic environments, resulting in models like the Probabilistic DCOP and the Dynamic DCOP. Researchers from the DCOP community have also designed solutions that inherit from all three communities.
The next sections describe the different DCOP frameworks, starting with classical DCOPs before proceeding to their various extensions. The survey focuses on a categorization based on three dimensions: agent knowledge, environment behavior, and environment evolution. It assumes a deterministic agent behavior and a fully cooperative agent teamwork (unless otherwise specified), as these are, by far, the most common assumptions adopted by the DCOP community. The DCOP models associated with this categorization are summarized in Table 3. The bottom-right entry of the table is left empty, indicating a promising model with dynamic and uncertain environments that, to the best of our knowledge, has not been explored yet. There has been only a modest amount of effort in modeling the different aspects of teamwork within the DCOP community. Section 7 describes a formalism that has been adopted to model DCOPs with mixed cooperative and competitive agents.
                        | Environment Behavior
Environment Evolution   | Deterministic    | Stochastic
Static                  | Classical DCOP   | Probabilistic DCOP
Dynamic                 | Dynamic DCOP     | —
4 Classical DCOP
With respect to the proposed categorization, in the classical DCOP model [Modi et al. 2005, Petcu and Faltings 2005b, Gershman et al. 2009, Yeoh and Yokoo 2012] the agents are fully cooperative and have deterministic behavior and total knowledge. Additionally, the environment is static and deterministic. This section reviews the formal definition of classical DCOPs, presents some relevant solving algorithms, and provides details of selected variants of classical DCOPs of particular interest.
4.1 Definition
A classical DCOP is described by a tuple ⟨A, X, D, F, α⟩, where:

A = {a_1, …, a_m} is a finite set of agents.

X = {x_1, …, x_n} is a finite set of variables, with n ≥ m.

D = {D_1, …, D_n} is a set of finite domains for the variables in X, with D_i being the domain of variable x_i.

F = {f_1, …, f_e} is a finite set of cost functions, with f_j : D_{j_1} × ⋯ × D_{j_k} → R⁺ ∪ {∞}, where, similar to WCSPs, x^j = {x_{j_1}, …, x_{j_k}} ⊆ X is the set of variables relevant to f_j, referred to as the scope of f_j. The arity of a cost function is the number of variables in its scope. Each cost function f_j represents a factor in a global objective function F. In the DCOP literature, the cost functions are also called constraints, utility functions, or reward functions.

α : X → A is a total and onto function, from variables to agents, which assigns the control of each variable x ∈ X to an agent α(x).
With a slight abuse of notation, A^j will be used to denote the set of agents whose variables are involved in the scope of f_j, i.e., A^j = {α(x_i) | x_i ∈ x^j}. A partial assignment is a value assignment for a proper subset of variables of X. An assignment is complete if it assigns a value to each variable in X. For a given complete assignment σ, we say that a cost function f_j is satisfied by σ if f_j(σ_{x^j}) ≠ ∞. A complete assignment is a solution of a DCOP if it satisfies all its cost functions. The goal in a DCOP is to find a solution that minimizes the total problem cost expressed by its cost functions:^3

^3 Alternatively, one can define a maximization problem by substituting the min operator in Equation 1 with max. Typically, if the objective functions are referred to as utility functions or reward functions, then the DCOP is a maximization problem. Conversely, if the objective functions are referred to as cost functions, then the DCOP is a minimization problem.
σ* = argmin_{σ ∈ Σ} F(σ) = argmin_{σ ∈ Σ} ∑_{f_j ∈ F} f_j(σ_{x^j})    (1)

where Σ is the state space, defined as the set of all possible solutions.
Given an agent a_i, L_i denotes the set of variables controlled by agent a_i, or its local variables, and N_i denotes the set of its neighboring agents. A cost function f_j is said to be hard if, for every complete assignment σ, f_j(σ_{x^j}) ∈ {0, ∞}. Otherwise, the cost function is said to be soft.
Finding an optimal solution for a classical DCOP is known to be NP-hard [Modi et al. 2005].
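The components of the tuple and the objective of Equation 1 can be sketched concretely. The toy instance below (agent names, variables, and cost tables) is invented for illustration, and the exhaustive search over the state space reflects the worst-case hardness of the problem rather than any particular distributed DCOP algorithm.

```python
import math
from itertools import product

# A toy DCOP with m = 2 agents, each controlling one binary variable.
agents = ['a1', 'a2']
variables = ['x1', 'x2']
domains = {'x1': [0, 1], 'x2': [0, 1]}
alpha = {'x1': 'a1', 'x2': 'a2'}     # variable -> controlling agent

# Cost functions: (scope, cost table); math.inf marks a hard violation.
cost_functions = [
    (('x1',),      {(0,): 2, (1,): 0}),
    (('x1', 'x2'), {(0, 0): 1, (0, 1): 3, (1, 0): math.inf, (1, 1): 0}),
]

def F(sigma):
    """Global objective: the sum of all cost functions under sigma."""
    return sum(table[tuple(sigma[v] for v in scope)]
               for scope, table in cost_functions)

# Exhaustive argmin over the state space (exponential in n).
state_space = [dict(zip(variables, vals))
               for vals in product(*(domains[v] for v in variables))]
best = min(state_space, key=F)
```

Any assignment with finite cost satisfies all cost functions, i.e., it is a solution; `best` is the optimal one.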
4.2 DCOP: Representation and Coordination
Representation in DCOPs plays a fundamental role, both from an agent coordination perspective and from an algorithmic perspective. This section discusses the most predominant representations adopted in various DCOP algorithms. It starts by describing some widely adopted assumptions regarding agent knowledge and coordination, which will apply throughout this document, unless otherwise stated:

A variable and its domain are known exclusively to the agent controlling it and its neighboring agents.

Each agent knows the values of the cost functions involving at least one of its local variables. No other agent has knowledge about such cost functions.

Each agent knows (and may communicate with) exclusively its own neighboring agents.
4.2.1 Constraint Graph
Given a DCOP P, G_P = (X, E_C) is the constraint graph of P, where an undirected edge {x, y} ∈ E_C exists if and only if there exists f_j ∈ F such that {x, y} ⊆ x^j. A constraint graph is a standard way to visualize a DCOP instance. It underlines the agents' locality of interactions and is therefore commonly adopted by DCOP resolution algorithms.
Given an ordering o on X, a variable x_i is said to have a higher priority with respect to a variable x_j if x_i appears before x_j in o. Given a constraint graph G_P and an ordering o on its nodes, the induced graph G*_P on o is the graph obtained by connecting the nodes, processed in increasing order of priority, to all their higher-priority neighbors. For a given node, the number of higher-priority neighbors is referred to as its width. The induced width w* of G_P is the maximum width over all the nodes of G*_P on ordering o.
Figure 4(a) shows an example constraint graph of a DCOP with four agents a_1 through a_4, each controlling one variable x_i with domain {0, 1}. There are two cost functions: a ternary cost function with scope {x_1, x_2, x_3}, represented by a clique among x_1, x_2, and x_3; and a binary cost function with scope {x_3, x_4}.
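The two notions above can be sketched in a few lines: building the constraint graph from the cost-function scopes, and computing the induced width along an ordering. The example scopes (a ternary function over {x1, x2, x3} plus a binary one over {x3, x4}) are illustrative, and the code assumes that the first element of the ordering has the highest priority.

```python
from collections import defaultdict

def constraint_graph(scopes):
    """Undirected constraint graph: an edge joins two variables that
    share the scope of at least one cost function."""
    adj = defaultdict(set)
    for scope in scopes:
        for u in scope:
            for v in scope:
                if u != v:
                    adj[u].add(v)
    return adj

def induced_width(adj, order):
    """Induced width along `order`: process nodes from last (lowest
    priority) to first, connecting each node's higher-priority
    neighbors pairwise."""
    pos = {v: i for i, v in enumerate(order)}
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    width = 0
    for v in reversed(order):
        higher = {u for u in adj[v] if pos[u] < pos[v]}
        width = max(width, len(higher))
        for u in higher:                          # induced edges
            adj[u] |= higher - {u}
    return width

adj = constraint_graph([('x1', 'x2', 'x3'), ('x3', 'x4')])
w = induced_width(adj, ['x1', 'x2', 'x3', 'x4'])
```

On this ordering the node x3 has the two higher-priority neighbors x1 and x2, so the induced width is 2.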
4.2.2 Pseudo-Tree
A number of DCOP algorithms require a partial ordering among the agents. In particular, when such an order is derived from a depth-first search (DFS) exploration, the resulting structure is known as a (DFS) pseudo-tree. A pseudo-tree arrangement for a DCOP P is a subgraph T_P = (X, E_T) of G_P such that T_P is a spanning tree of G_P, i.e., a connected subgraph of G_P containing all the nodes and being a rooted tree, with the following additional condition: for each f_j ∈ F, if x_i, x_k ∈ x^j, then x_i and x_k appear in the same branch of T_P (i.e., x_i is an ancestor of x_k in T_P or vice versa). Edges of G_P that are in (respectively out of) E_T are called tree edges (respectively back-edges). The tree edges connect parent-child nodes, while back-edges connect a node with its pseudo-parents and its pseudo-children. The separator of an agent a_i is the set containing all the ancestors of a_i in the pseudo-tree (through tree edges or back-edges) that are connected to a_i or to one of its descendants. The notation C_i, PC_i, P_i, and PP_i will be used to indicate the set of children, pseudo-children, parent, and pseudo-parents of the agent a_i.
Both the constraint graph and the pseudo-tree representations cannot deal explicitly with k-ary cost functions (with k > 2). A typical artifact to deal with such cost functions in a pseudo-tree representation is to introduce a virtual variable that monitors the value assignments for all the variables in the scope of the cost function and generates the cost values [Bowring et al. 2006]; alternatively, the role of the virtual variable can be delegated to one of the variables participating in the cost function [Pecora et al. 2006, Matsui et al. 2008].
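A DFS pseudo-tree construction can be sketched as follows. This is a minimal centralized sketch on an invented graph (the clique {x1, x2, x3} plus the edge {x3, x4}); it assumes neighbors are visited in sorted order, whereas actual DCOP algorithms build the pseudo-tree via distributed DFS token passing and use heuristics for the visit order.

```python
def dfs_pseudo_tree(adj, root):
    """Build a DFS pseudo-tree from a constraint graph.

    Returns (parent, pseudo_parents): tree edges connect each node to
    parent[node]; the remaining graph edges toward ancestors become
    back-edges, recorded as pseudo-parent links.
    """
    parent, visited, order = {root: None}, set(), []

    def dfs(u):
        visited.add(u)
        order.append(u)                 # preorder: parents before children
        for v in sorted(adj[u]):
            if v not in visited:
                parent[v] = u
                dfs(v)
    dfs(root)

    ancestors, pseudo_parents = {}, {}
    for u in order:
        p = parent[u]
        ancestors[u] = (ancestors[p] | {p}) if p is not None else set()
        # back-edges: neighbors that are ancestors but not the parent
        pseudo_parents[u] = (adj[u] & ancestors[u]) - {p}
    return parent, pseudo_parents

# Clique {x1, x2, x3} plus edge {x3, x4}: DFS places the whole clique on
# one branch, so every constrained pair shares a branch, as required.
adj = {'x1': {'x2', 'x3'}, 'x2': {'x1', 'x3'},
       'x3': {'x1', 'x2', 'x4'}, 'x4': {'x3'}}
parent, pseudo = dfs_pseudo_tree(adj, 'x1')
```

Here the edge {x1, x3} becomes a back-edge, so x1 is a pseudo-parent of x3 (and x3 a pseudo-child of x1).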
4.2.3 Factor Graph
Another way to represent DCOPs is through a factor graph [Kschischang et al. 2001]. A factor graph is a bipartite graph used to represent the factorization of a function. In particular, given the global objective function F, the corresponding factor graph is composed of variable nodes x_i ∈ X, factor nodes f_j ∈ F, and edges such that there is an undirected edge between factor node f_j and variable node x_i if x_i ∈ x^j.
Factor graphs can handle k-ary cost functions explicitly. To do so, they use a method similar to the one adopted within pseudo-trees for such cost functions: they delegate the control of a factor node to one of the agents controlling a variable in the scope of the corresponding cost function. From an algorithmic perspective, algorithms designed over factor graphs can directly handle k-ary cost functions, while algorithms designed over pseudo-trees require changes in the algorithm design so as to delegate the control of the k-ary cost functions to some particular entity.
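Constructing the bipartite structure is straightforward; the sketch below uses the same illustrative scopes as before, and the factor labels f0, f1 are arbitrary names introduced here.

```python
def factor_graph(cost_functions):
    """Bipartite factor-graph structure for a factored objective.

    cost_functions: list of (scope, table) pairs; each cost function
    becomes one factor node, linked to every variable in its scope.
    """
    variables, factors, edges = set(), [], set()
    for j, (scope, _table) in enumerate(cost_functions):
        factor = f"f{j}"                 # arbitrary factor label
        factors.append(factor)
        for x in scope:
            variables.add(x)
            edges.add((factor, x))       # one edge per variable in scope
    return sorted(variables), factors, edges

# A ternary cost function maps to a single factor node with three
# incident edges; no clique encoding or virtual variable is needed.
variables, factors, edges = factor_graph([(('x1', 'x2', 'x3'), None),
                                          (('x3', 'x4'), None)])
```

Note how x3, shared by both scopes, is adjacent to both factor nodes; this is the structure that message-passing algorithms such as Max-Sum operate on.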
4.3 Algorithms
The field of classical DCOPs is mature and a number of different resolution algorithms have been proposed. DCOP algorithms can be classified as being either complete or incomplete, based on whether they guarantee finding the optimal solution or they trade optimality for shorter execution times, producing near-optimal solutions. They can also be characterized based on their runtime characteristics, their memory requirements, and their communication requirements (e.g., the number and size of the messages that they send, and whether they communicate with their neighboring agents only or also with non-neighboring agents). Table 3 tabulates the properties of a number of key DCOP algorithms that will be surveyed in Sections 4.3.4 and 4.3.5. An algorithm is said to be anytime if it can return a valid solution even if the DCOP agents are interrupted at any time before the algorithm terminates. Anytime algorithms are expected to find solutions of increasing quality as they keep running [Zivan et al. 2014].

All these algorithms were originally developed under the assumption that each agent controls exactly one variable, and the description of their properties follows the same assumption. These properties may change when generalizing the algorithms to allow agents to control multiple variables, depending on how the algorithms are generalized. Throughout this document, the following notation will often be adopted when discussing the complexity of the algorithms:

n refers to the number of variables in the problem; in Table 3, n also refers to the number of agents in the problem, since each agent controls exactly one variable;

d refers to the size of the largest domain;

w* refers to the induced width of the pseudo-tree;

ℓ refers to the largest number of neighboring agents; and

k refers to the number of iterations in incomplete algorithms.
In addition, each of these classes can be categorized into several groups, depending on the degree of locality exploited by the algorithms, the way local information is updated, and the type of exploration process adopted. These different categories are described next.
Algorithm  Quality Characteristics  Runtime Characteristics  Memory  Communication Characteristics  
Optimal?  Error Bound?  Complexity  Anytime?  per Agent  # Messages  Message Size  Local Communication?  
SyncBB  
AFB  
ADOPT  
ConcFB  
DPOP  
OptAPO  
MaxSum  
Region Optimal  
MGM  
DSA  
DUCT  
DGibbs 
4.3.1 Partial Centralization
In general, the DCOP solving process is decentralized, driving DCOP algorithms to follow the agent knowledge and communication restrictions described in Section 4.2. However, some algorithms explore methods to centralize the decisions to be taken by a group of agents, by delegating them to one of the agents in the group. These algorithms explore the concept of partial centralization [Hirayama and Yokoo 1997, Mailler and Lesser 2004, Petcu et al. 2007], and thus they are classified as partially centralized algorithms. Typically, partial centralization improves the algorithms' performance, allowing agents to coordinate their local assignments more efficiently. However, such performance enhancement comes with a loss of information privacy, as the centralizing agent needs to be granted access to the local subproblems of the other agents in the group [Greenstadt et al. 2007, Mailler and Lesser 2004]. In contrast, fully decentralized algorithms better preserve information privacy, at the cost of a larger communication effort.
4.3.2 Synchronicity
DCOP algorithms can enhance their effectiveness by exploiting distributed and parallel processing. Based on the way the agents update their local information, DCOP algorithms are classified as synchronous or asynchronous. Asynchronous algorithms allow agents to update the assignments of their variables based solely on their local view of the problem, and thus independently from the actual decisions of the other agents [Modi et al. 2005, Farinelli et al. 2008, Gershman et al. 2009]. In contrast, synchronous algorithms constrain the agents' decisions to follow a particular order, typically enforced by the representation structure adopted [Mailler and Lesser 2004, Petcu and Faltings 2005b, Pearce and Tambe 2007].
Synchronous algorithms tend to delay the actions of some agents, guaranteeing that their local view of the problem is always consistent with that of the other agents. In contrast, asynchronous algorithms tend to minimize the idle time of the agents, which in turn can react quickly to each message being processed; however, they provide no guarantee on the consistency of each agent's local view of the problem. Such an effect has been studied by Peri and Meisels (2013), who conclude that inconsistent agent views may have a negative impact on network load and algorithm performance, and that introducing some level of synchronization may be beneficial for some algorithms, enhancing their performance.
4.3.3 Exploration Process
The resolution process adopted by each algorithm can be classified into three categories [Yeoh 2010]:

Search-based algorithms use search techniques to explore the space of possible solutions. These algorithms are often derived from corresponding search techniques developed for centralized AI search problems, such as best-first search and depth-first search.

Inference-based algorithms are derived from dynamic programming and belief propagation techniques. These algorithms allow agents to exploit the structure of the constraint graph to aggregate costs from their neighbors, effectively reducing the problem size at each step of the algorithm.

Sampling-based algorithms are incomplete approaches that sample the search space to approximate a function (typically, a probability distribution) by statistical inference.
Figure 5 illustrates a taxonomy of classical DCOP algorithms. The following subsections summarize some representative complete and incomplete algorithms from each of the classes introduced above. A detailed description of the DCOP algorithms is beyond the scope of this manuscript. The interested reader is referred to the original articles that introduce each algorithm.
4.3.4 Complete Algorithms
Some of the algorithms described below were originally designed to solve the variant of DCOPs that maximizes rewards, while others solve the variant that minimizes costs. However, algorithms that maximize rewards can be easily adapted to minimize costs. For consistency, this survey describes the version of each algorithm that minimizes costs. It also describes their quality, runtime, memory, and communication characteristics, as summarized in Table 3.
SyncBB [Hirayama & Yokoo, 1997]. Synchronous Branch-and-Bound (SyncBB) is a complete, synchronous, search-based algorithm that can be considered a distributed version of a branch-and-bound algorithm. It uses a complete ordering of the agents to extend a Current Partial Assignment (CPA) via a synchronous communication process. The CPA holds the assignments of all the variables controlled by all the visited agents and, in addition, functions as a mechanism to propagate bound information. The algorithm prunes those parts of the search space whose solution quality is suboptimal by exploiting the bounds that are updated at each step of the algorithm.
SyncBB agents perform O(d^n) operations, where n is the number of variables and d is the maximum domain size, since the lowest-priority agent needs to enumerate all possible value combinations for all variables. While, by default, it is not an anytime algorithm, it can easily be extended with an anytime property since it is a branch-and-bound algorithm. The memory requirement per SyncBB agent is O(n), since the lowest-priority agent stores the value assignment of all problem variables. In terms of communication requirements, SyncBB agents send O(d^n) messages: the lowest-priority agent enumerates all possible value combinations for all variables and sends a message for each combination. The largest message, which contains the value assignment of all variables, is of size O(n). Finally, the communication model of SyncBB depends on the given complete ordering of the agents. Thus, agents may communicate with non-neighboring agents.
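To make the CPA extension and bound-based pruning concrete, the following is a minimal single-process simulation of a SyncBB-style search; the instance encoding, function names, and data structures are illustrative assumptions, not taken from the original presentation.

```python
def sync_bb(variables, domains, cost):
    """variables: ordered list (the agent ordering); domains: var -> list of values;
    cost: dict (var_i, var_j) -> function(v_i, v_j) returning a non-negative cost."""
    best = {"cost": float("inf"), "assignment": None}

    def cpa_cost(cpa):
        # Cost of all constraints whose variables are fully instantiated by the CPA.
        total = 0
        for (xi, xj), f in cost.items():
            if xi in cpa and xj in cpa:
                total += f(cpa[xi], cpa[xj])
        return total

    def extend(cpa, idx):
        if idx == len(variables):          # all agents assigned: new upper bound
            best["cost"], best["assignment"] = cpa_cost(cpa), dict(cpa)
            return
        var = variables[idx]
        for value in domains[var]:
            cpa[var] = value
            # Prune: with non-negative costs, the CPA cost lower-bounds any extension.
            if cpa_cost(cpa) < best["cost"]:
                extend(cpa, idx + 1)
            del cpa[var]

    extend({}, 0)
    return best["cost"], best["assignment"]
```

In the distributed algorithm, the recursive calls correspond to CPA messages passed down the agent ordering, and backtracking corresponds to messages sent back up.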
AFB [Gershman et al., 2009]. Asynchronous Forward Bounding (AFB) is a complete, asynchronous, search-based algorithm. It can be considered an asynchronous version of SyncBB. In this algorithm, agents communicate their cost estimates, which in turn are used to compute bounds and prune the search space. In AFB, agents extend a CPA sequentially, provided that the lower bound on their costs does not exceed the global upper bound, that is, the cost of the best solution found so far. Each agent performing an assignment (the "assigning" agent) triggers asynchronous checks of bounds by sending forward messages containing copies of the CPA to agents that have not yet assigned their variables. The unassigned agents that receive a CPA estimate the lower bound of the CPA given their local view of the constraint graph and send their estimates back to the agent that originated the forward message. The assigning agent receives these estimates asynchronously and aggregates them into an updated lower bound. If the updated lower bound exceeds the current upper bound, the agent initiates a backtracking phase. The runtime, memory, and communication characteristics of AFB are identical to those of SyncBB, for the same reasons. However, while both AFB and SyncBB agents communicate with non-neighboring agents, AFB agents broadcast some of their messages, while SyncBB agents do not.
ADOPT [Modi et al., 2005]. Asynchronous Distributed OPTimization (ADOPT) is a complete, asynchronous, search-based algorithm. It can be considered a distributed version of a memory-bounded best-first search algorithm. It makes use of a DFS pseudotree ordering of the agents. The algorithm relies on maintaining, in each agent, lower and upper bounds on the solution cost for the subtree rooted at its node in the DFS tree. Agents explore partial assignments in best-first order, that is, in increasing lower-bound order. They use COST messages (propagated upwards in the DFS pseudotree) and THRESHOLD and VALUE messages (propagated downwards in the pseudotree) to iteratively tighten the lower and upper bounds, until the lower bound of the minimum-cost solution equals its upper bound. ADOPT agents store lower bounds as thresholds, which can be used to prune partial assignments that are provably suboptimal.
Similar to SyncBB and AFB, ADOPT agents perform O(d^n) operations, since the lowest-priority agent needs to enumerate all possible value combinations for all variables when the pseudotree degenerates into a pseudochain. It is also not an anytime algorithm, as it is a best-first search algorithm. The memory requirement per ADOPT agent is O(n·d): O(n) is used to store a context, which is the value assignment of all higher-priority variables, and O(n·d) is used to store the lower and upper bounds for each domain value and each variable belonging to the agent's child agents. Finally, ADOPT agents communicate exclusively with their neighboring agents.
ADOPT has been extended in several ways. In particular, BnB-ADOPT [Yeoh et al., 2010; Gutierrez & Meseguer, 2012b] uses a branch-and-bound method to reduce the amount of computation performed during search, and ADOPT(k) combines both ADOPT and BnB-ADOPT into an integrated algorithm [Gutierrez et al., 2011]. There are also extensions that trade solution optimality for smaller runtimes [Yeoh et al., 2009a], extensions that use more memory for smaller runtimes [Yeoh et al., 2009b], and extensions that maintain soft arc consistency [Bessiere et al., 2012; Bessiere et al., 2014; Gutierrez & Meseguer, 2012a; Gutierrez et al., 2013].
Finally, the No-Commitment Branch and Bound (NCBB) algorithm [Chechetka & Sycara, 2006] can be considered a variant of ADOPT and SyncBB. Similar to ADOPT, NCBB agents exploit the structure defined by a pseudotree order to decompose the global objective function. This allows the agents to search non-intersecting parts of the search space concurrently. Another main feature of NCBB is the eager propagation of lower bounds on solution cost: an NCBB agent propagates its lower bound every time it learns about its ancestors' assignments. This feature provides efficient pruning of the search space. The runtime, memory, and communication characteristics of NCBB are the same as those of ADOPT, except that NCBB is an anytime algorithm.
ConcFB [Netzer et al., 2012]. Concurrent Forward Bounding (ConcFB) is a complete, asynchronous, search-based algorithm that runs multiple versions of AFB concurrently. By running multiple concurrent search processes, it is able to quickly find a solution, apply a forward-bounding process to detect regions of the search space to prune, and dynamically create new search processes when detecting promising subspaces. Similar to AFB, it uses a complete ordering of agents and variables instead of pseudotrees. As such, it is able to simplify the management of reordering heuristics, which can provide a substantial speed-up to the search process [Zivan & Meisels, 2006]. The algorithm operates as follows: each agent maintains a global upper bound, which is updated during the search process. The highest-priority agent begins the process by generating a number of different search processes (SPs), one for each value of its variable. It then sends an LB_Request message to all unassigned agents. This LB_Request message contains the current CPA and triggers a calculation of the lower bounds of the receiving agents, which are sent back to the sender via an LB_Report message. If the sum of the aggregated costs and the current CPA cost is no smaller than the current upper bound, the agent selects another value for its variable and repeats the process. If the agent has exhausted all value assignments for its variable, then it backtracks, sending the CPA to the last assigning agent. If the CPA cost is lower than the current upper bound, then it forwards the CPA message to the next unassigned agent. Upon receiving a CPA message, the agent repeats the above process. When the lowest-priority agent finds a solution resulting in a new upper bound, it broadcasts the upper bound via a UB message, which is stored by each agent.
Netzer et al. (2012) described a series of enhancements that can be used to speed up the search process of ConcFB, including dynamic variable ordering and dynamic splitting. Although the process within a subproblem is carried out in a synchronous fashion, different subproblems are explored independently. Thus, the agents act asynchronously and concurrently. The runtime, memory, and communication characteristics of ConcFB are identical to those of AFB, since it runs multiple versions of AFB concurrently.
DPOP [Petcu & Faltings, 2005b]. Distributed Pseudotree Optimization Procedure (DPOP) is a complete, synchronous, inference-based algorithm that makes use of a DFS pseudotree ordering of the agents. It involves three phases. In the first phase, the agents order themselves into a DFS pseudotree. In the second phase, called the UTIL propagation phase, each agent, starting from the leaves of the pseudotree, aggregates the costs in its subtree for each value combination of the variables in its separator. The aggregated costs are encoded in a UTIL message, which is propagated from children to their parents, up to the root. In the third phase, called the VALUE propagation phase, each agent, starting from the root of the pseudotree, selects the optimal value for its variable. The optimal values are calculated based on the UTIL messages received from the agent's children and the VALUE message received from its parent. The VALUE messages contain the optimal values of the agents and are propagated from parents to their children, down to the leaves of the pseudotree.
DPOP agents perform O(d^n) operations in the worst case: when an agent optimizes for each value combination of the variables in its separator, it takes O(d^n) operations, since there can be O(n) variables in the separator set in the worst case. It is not an anytime algorithm, as it terminates upon finding its first solution, which is an optimal solution. The memory requirement per DPOP agent is O(d^{w*}), where w* is the induced width of the pseudotree ordering, since it stores all value combinations of the variables in its separator. In terms of communication requirements, DPOP agents send O(n) messages in total; UTIL messages are propagated up the pseudotree and VALUE messages are propagated down the pseudotree. The largest message sent by an agent, which contains the aggregated costs in its subtree for each value combination of the variables in its separator, is of size O(d^{w*}). Finally, DPOP agents communicate only with their neighboring agents.
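The UTIL and VALUE phases can be sketched for the simplest pseudotree shape, a chain, where each agent's separator is just its parent; the instance and helper names below are illustrative assumptions.

```python
def dpop_chain(order, domains, edge_cost):
    """order: chain of variables, root first; edge_cost: (parent, child) -> cost fn."""
    util, best = {}, {}
    # UTIL phase: from the leaf upward, each agent reports, for every value of
    # its parent, the minimum cost achievable in the subtree below it.
    for i in range(len(order) - 1, 0, -1):
        var, parent = order[i], order[i - 1]
        f = edge_cost[(parent, var)]
        child_util = util.get(order[i + 1]) if i + 1 < len(order) else None
        util[var], best[var] = {}, {}
        for vp in domains[parent]:
            def subtree_cost(v):
                return f(vp, v) + (child_util[v] if child_util else 0)
            v_star = min(domains[var], key=subtree_cost)
            util[var][vp] = subtree_cost(v_star)
            best[var][vp] = v_star
    # VALUE phase: the root picks its best value; choices propagate downward.
    root = order[0]
    root_value = min(domains[root], key=lambda v: util[order[1]][v])
    assignment = {root: root_value}
    for i in range(1, len(order)):
        assignment[order[i]] = best[order[i]][assignment[order[i - 1]]]
    return util[order[1]][root_value], assignment
```

On a general pseudotree, each `util[var]` table would instead range over all value combinations of the agent's separator, which is what makes DPOP's messages exponential in the induced width.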
DPOP has also been extended in several ways to enhance its performance and capabilities. O-DPOP and MB-DPOP trade runtime for smaller memory requirements [Petcu & Faltings, 2006; Petcu & Faltings, 2007a], A-DPOP trades solution optimality for smaller runtimes [Petcu & Faltings, 2005a], SS-DPOP trades runtime for increased privacy [Greenstadt et al., 2007], PC-DPOP trades privacy for smaller runtimes [Petcu et al., 2007], H-DPOP propagates hard constraints for smaller runtimes [Kumar et al., 2008], BrC-DPOP enforces branch consistency for smaller runtimes [Fioretto et al., 2014], and ASP-DPOP is a declarative version of DPOP that uses Answer Set Programming [Le et al., 2015].
OptAPO [Mailler & Lesser, 2004]. Optimal Asynchronous Partial Overlay (OptAPO) is a complete, asynchronous, search-based algorithm. It trades agent privacy for smaller runtimes through partial centralization. It employs a cooperative mediation scheme, in which agents can act as mediators and propose value assignments to other agents. In particular, the agents check whether there is a conflicting assignment with some neighboring agent. If a conflict is found, the agent with the highest priority acts as a mediator. During mediation, OptAPO solves subproblems using a centralized branch-and-bound search, and when solutions of overlapping subproblems still have conflicting assignments, the solving agents increase the degree of centralization to resolve them. By sharing their knowledge with centralized entities, agents can improve their local decisions, reducing communication costs. For instance, the algorithm has been shown to be superior to ADOPT on simple combinatorial problems [Mailler & Lesser, 2004]. However, it is possible that several mediators solve overlapping problems, duplicating effort [Petcu et al., 2007], which can be a bottleneck in dense problems.
OptAPO agents perform O(d^n) operations in the worst case, as a mediator agent may end up solving the entire problem. Like ADOPT and DPOP, OptAPO is not an anytime algorithm. The memory requirement per OptAPO agent is O(d^n), since it needs to store all value combinations of the variables in its mediation group, which contains O(n) variables in the worst case. In terms of communication requirements, the number of messages sent by OptAPO agents decreases with increasing partial centralization. The size of the messages is bounded by O(n + d): in the initialization phase of each mediation step, each agent sends its domain to its neighbors, together with the list of variables that it seeks to mediate. Finally, OptAPO agents can communicate with non-neighboring agents during the mediation phase.
The original version of OptAPO has been shown to be incomplete due to the asynchronicity of the different mediators' groups, which can lead to race conditions. Grinshpoun and Meisels (2008) proposed a complete variant that remedies this issue.
4.3.5 Incomplete Algorithms
Max-Sum [Farinelli et al., 2008]. Max-Sum is an incomplete, synchronous, inference-based algorithm based on belief propagation. It operates on factor graphs by performing a marginalization process over the cost functions and optimizing the costs for each given variable. This process is performed by recursively propagating messages between variable nodes and factor nodes. The value assignments take into account their impact on the marginalized cost function. Max-Sum is guaranteed to converge to an optimal solution in acyclic graphs, but convergence is not guaranteed on cyclic graphs.
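The variable-to-factor and factor-to-variable message updates described above can be sketched as follows (in min-sum form, matching the cost-minimization convention of this survey); the factor-graph encoding and function names are illustrative assumptions.

```python
from itertools import product

def max_sum_round(domains, factors, msgs):
    """One synchronous round of message passing. factors: name -> (scope, cost fn);
    msgs: (sender, receiver) -> {value: cost}, for variable and factor nodes."""
    new = {}
    # Variable-to-factor: sum the messages from the variable's *other* factors.
    for fname, (scope, _) in factors.items():
        for x in scope:
            new[(x, fname)] = {
                v: sum(msgs.get((g, x), {}).get(v, 0.0)
                       for g, (s, _) in factors.items() if x in s and g != fname)
                for v in domains[x]}
    # Factor-to-variable: minimize the cost function plus incoming messages.
    for fname, (scope, f) in factors.items():
        for x in scope:
            others = [y for y in scope if y != x]
            table = {}
            for v in domains[x]:
                best = float("inf")
                for combo in product(*(domains[y] for y in others)):
                    args = dict(zip(others, combo)); args[x] = v
                    c = f(*(args[y] for y in scope)) + sum(
                        msgs.get((y, fname), {}).get(val, 0.0)
                        for y, val in zip(others, combo))
                    best = min(best, c)
                table[v] = best
            new[(fname, x)] = table
    return new

def beliefs(domains, factors, msgs):
    # Each variable picks the value minimizing the sum of its factor messages.
    return {x: min(domains[x], key=lambda v: sum(
                msgs.get((f, x), {}).get(v, 0.0) for f in factors))
            for x in domains}
```

The inner minimization over all value combinations of the other variables in a factor's scope is the source of Max-Sum's exponential per-iteration cost.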
Max-Sum agents perform O(d^{|N_i|}) operations in each iteration, where N_i is the set of neighbors of agent a_i, as each agent needs to optimize over all value combinations of its neighboring variables. It is not an anytime algorithm. The memory requirement per Max-Sum agent is O(d^{|N_i|}), since it needs to store all value combinations of its neighboring variables. In terms of communication requirements, in the worst case each Max-Sum agent sends O(|N_i|) messages in each iteration, one to each of its neighbors; thus, the total number of messages sent per iteration is proportional to the number of edges in the factor graph. Each message is of size O(d), as it needs to contain the current aggregated costs for all values of the agent's variable. Finally, the agents communicate exclusively with their neighboring agents.
Max-Sum has been extended in several ways. Bounded Max-Sum bounds the quality of the solutions found by removing a subset of edges from a cyclic DCOP graph to make it acyclic, and then running Max-Sum to solve the acyclic problem [Rogers et al., 2011]; Improved Bounded Max-Sum improves on the error bounds [Rollon & Larrosa, 2012]; and Max-Sum_ADVP guarantees convergence on cyclic graphs through a two-phase value propagation process [Zivan & Peled, 2012; Chen et al., 2017]. Max-Sum and its extensions have been successfully used to solve a number of large-scale, complex MAS applications (see Section 8).
Region Optimal [Pearce & Tambe, 2007]. Region-optimal algorithms are incomplete, synchronous, search-based algorithms that allow users to specify regions of the constraint graph and solve the subproblem within each region optimally. Regions may be defined to have a maximum size of k agents [Pearce & Tambe, 2007], a maximum distance of t hops from each agent [Kiekintveld et al., 2010], or a combination of both size and hops [Vinyals et al., 2011]. The concept of k-size optimality is defined with respect to the set of agents whose assignments differ between two assignments x and x', denoted by C(x, x'). The deviating cost of x' with respect to x, denoted by Δ(x', x), is defined as the difference F(x') − F(x) between the aggregated cost of x' and that of x. An assignment x is k-size optimal if, for every assignment x' such that |C(x, x')| ≤ k, we have that Δ(x', x) ≥ 0. In contrast, the concept of t-distance emphasizes the number of hops from a central agent a_i: the region Ω_t(a_i) is the set of agents that are separated from a_i by at most t hops. An assignment x is t-distance optimal if F(x) ≤ F(x') for every x' with C(x, x') ⊆ Ω_t(a_i), for any agent a_i. The solutions found therefore have theoretical error bounds that are a function of k and/or t. Region-optimal algorithms adopt a partially centralized resolution scheme in which the subproblem within each region is solved optimally by a centralized authority [Tassa et al., 2016]. However, this scheme can be altered to use a distributed algorithm to solve each subproblem.
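The size-based optimality criterion can be checked naively by enumerating all assignments that deviate in at most k variables; this brute-force checker (names and encoding are illustrative assumptions) is only practical for tiny instances, but makes the definition concrete.

```python
from itertools import product

def conflicts(x, y):
    # C(x, y): the variables whose assignments differ between the two solutions.
    return [v for v in x if x[v] != y[v]]

def is_k_size_optimal(x, k, domains, total_cost):
    """True if no assignment differing from x in at most k variables is cheaper."""
    for combo in product(*domains.values()):
        y = dict(zip(domains, combo))
        if len(conflicts(x, y)) <= k and total_cost(y) < total_cost(x):
            return False
    return True
```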
Region-optimal agents perform O(d^k) operations in each iteration, as each agent runs DPOP to solve the problem within its region optimally. It is also an anytime algorithm, as solutions of improving quality are found until they are region-optimal. The memory requirement per region-optimal agent is O(d^k), since its region may have an induced width of up to k and it uses DPOP to solve the problem within its region. In terms of communication requirements, each region-optimal agent sends one message to each agent within its region; thus, the total number of messages grows with the number and size of the regions. Each message is of size O(d^k), as it uses DPOP. Finally, the agents communicate with all agents within their region, up to a distance of k or t hops away. Thus, they may communicate with non-neighboring agents.
An asynchronous version of region-optimal algorithms, called Distributed Asynchronous Local Optimization (DALO), was proposed by Kiekintveld et al. (2010). The DALO simulator provides a mechanism to coordinate the decisions of local groups of agents based on the concepts of k-size and t-distance optimality.
MGM [Maheswaran et al., 2004a]. Maximum Gain Message (MGM) is an incomplete, synchronous, search-based algorithm that performs a distributed local search. Each agent starts by assigning a random value to each of its variables. Then, it sends this information to all its neighbors. Upon receiving the values of its neighbors, it calculates the maximum gain (i.e., the maximum decrease in cost) it can obtain by changing its value, and sends this information to all its neighbors. Upon receiving the gains of its neighbors, the agent changes its value if its gain is the largest among those of its neighbors. This process repeats until a termination condition is met. MGM provides no quality guarantees on the returned solution.
MGM agents perform O(d·|N_i|) operations in each iteration, as each agent needs to compute the cost of each of its d values, taking into account the values of all its neighbors N_i. MGM is anytime, since agents only change their values when they have a non-negative gain. The memory requirement per MGM agent is O(|N_i|), as each agent needs to store the values of all its neighboring agents. In terms of communication requirements, each MGM agent sends O(|N_i|) messages, one to each of its neighboring agents; thus, the total number of messages sent per iteration is proportional to the number of edges in the constraint graph. Each message is of constant size, as it contains either the agent's current value or the agent's current gain. Finally, the agents communicate exclusively with their neighboring agents.
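One MGM iteration, simulated in a single process, can be sketched as follows; the lexicographic tie-break on agent names is an assumption added for determinism (the paper's tie-breaking details may differ), and the instance encoding is illustrative.

```python
def mgm_step(values, domains, neighbors, cost):
    """One MGM iteration. values: var -> value; neighbors: var -> list of vars;
    cost: frozenset({u, v}) -> symmetric binary cost function."""
    def local_cost(var, v):
        return sum(cost[frozenset({var, u})](v, values[u]) for u in neighbors[var])
    # Each agent computes its best unilateral move and the resulting gain.
    gains, proposals = {}, {}
    for var in values:
        best_v = min(domains[var], key=lambda v: local_cost(var, v))
        gains[var] = local_cost(var, values[var]) - local_cost(var, best_v)
        proposals[var] = best_v
    # An agent moves only if its (gain, name) pair beats all of its neighbors',
    # so no two neighboring agents ever move in the same iteration.
    new_values = dict(values)
    for var in values:
        if gains[var] > 0 and all((gains[var], var) > (gains[u], u)
                                  for u in neighbors[var]):
            new_values[var] = proposals[var]
    return new_values
```

Because neighboring agents never move simultaneously, the global cost is monotonically non-increasing, which is what makes MGM anytime.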
DSA [Zhang et al., 2005]. Distributed Stochastic Algorithm (DSA) is an incomplete, synchronous, search-based algorithm that is similar to MGM, except that each agent neither sends its gain to its neighbors nor necessarily changes its value to the value with the maximum gain. Instead, it decides stochastically whether to take on the value with the maximum gain or another value with a smaller gain. This stochasticity allows DSA to escape from local minima. Similar to MGM, it repeats the process until a termination condition is met, and it cannot provide quality guarantees on the returned solution. The runtime, memory, and communication characteristics of DSA are identical to those of MGM, since it is essentially a stochastic variant of MGM.
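The stochastic decision rule can be sketched as below; the specific rule (move to the best value with probability p only when it strictly improves the local cost) is one of several DSA variants and is an assumption here, as are the instance and parameter p.

```python
import random

def dsa_step(values, domains, neighbors, cost, p=0.7, rng=random):
    """One DSA iteration: each agent moves to its best value with probability p
    whenever that move strictly decreases its local cost."""
    snapshot = dict(values)               # all agents decide on the same view
    def local_cost(var, v):
        return sum(cost[frozenset({var, u})](v, snapshot[u]) for u in neighbors[var])
    for var in values:
        best_v = min(domains[var], key=lambda v: local_cost(var, v))
        if local_cost(var, best_v) < local_cost(var, snapshot[var]) \
                and rng.random() < p:
            values[var] = best_v
    return values
```

Note that, unlike MGM, neighboring agents may move simultaneously here; the activation probability p is precisely what keeps such simultaneous moves (and the resulting oscillations) rare.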
DUCT [Ottens et al., 2017]. The Distributed Upper Confidence Tree (DUCT) algorithm is an incomplete, synchronous, sampling-based algorithm that is inspired by Monte-Carlo Tree Search and employs confidence bounds to solve DCOPs. DUCT emulates a search process analogous to that of ADOPT, where agents select the values to assign to their variables according to the information encoded in their context messages (i.e., the assignments to all the variables in the receiving variable's separator). However, rather than systematically selecting the next value to assign to their own variables, DUCT agents sample such values. To focus on promising assignments, DUCT constructs a confidence bound on the cost associated with the best value for each context, and agents sample the value with the lowest bound. This process is started by the root agent of the pseudotree: after sampling a value for its variable, it communicates its assignment to its children in a context message. When an agent receives this message, it repeats this process, until the leaf agents are reached. When the leaf agents choose a value assignment, they calculate the cost within their context and propagate this information up the tree in a cost message. This process continues for a given number of iterations or until convergence is achieved, i.e., until the sampled values in two successive iterations do not change. DUCT is thus able to provide quality guarantees on the returned solution.
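The flavor of confidence-bound value selection can be sketched with a UCB-style rule for cost minimization; the exact bound used by DUCT differs, so the formula, names, and statistics kept below are assumptions for illustration only.

```python
import math

def pick_value(stats, total_samples):
    """stats: value -> (best_cost_seen, times_sampled).
    The agent samples the value with the lowest confidence bound."""
    def bound(v):
        best_cost, n = stats[v]
        if n == 0:
            return float("-inf")        # try unsampled values first
        # Subtracting the exploration term makes rarely-sampled values look
        # optimistically cheap, balancing exploration and exploitation.
        return best_cost - math.sqrt(2.0 * math.log(total_samples) / n)
    return min(stats, key=bound)
```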
DUCT agents perform O(d·|N_i|) operations in each iteration, as each agent needs to compute the cost of each of its values, taking into account the values of all its neighbors. It is an anytime algorithm; the quality guarantee improves as the number of iterations increases. The memory requirement per DUCT agent is O(d^n), since it needs to store the best cost for all possible contexts (more precisely, it is O(d^h), where h is the depth of the pseudotree; in the worst case, when the pseudotree degenerates into a pseudochain, h = n). In terms of communication requirements, in each iteration, each DUCT agent sends one message to its parent in the pseudotree and one message to each of its children; thus, the total number of messages sent per iteration is O(n). Each message is of size O(n), as context messages contain the value assignments of all higher-priority agents. Finally, the agents communicate exclusively with their neighboring agents.
D-Gibbs [Nguyen et al., 2013]. The Distributed Gibbs (D-Gibbs) algorithm is an incomplete, synchronous, sampling-based algorithm that extends the Gibbs sampling process [Geman & Geman, 1984] by tailoring it to solve DCOPs in a decentralized manner. The Gibbs sampling process is a centralized Markov Chain Monte Carlo algorithm that can be used to approximate joint probability distributions. By mapping DCOPs to maximum a posteriori estimation problems, probabilistic inference algorithms like Gibbs sampling can be used to solve DCOPs.
Like DUCT, it operates on a pseudotree, and the agents sample sequentially from the root of the pseudotree down to the leaves. Like DUCT, each agent also stores a context (i.e., the current assignment to all the variables in its separator), and it samples based on this information. Specifically, it computes the probability of each of its values given its context and chooses its current value based on this probability distribution. After it chooses its value, it informs its lower-priority neighbors of its value, and its children start to sample. This process continues until all the leaf agents have sampled. Cost information is then propagated up the pseudotree. This process continues for a fixed number of iterations or until convergence. Like DUCT, D-Gibbs is also able to provide quality guarantees on the returned solution.
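The per-agent sampling step that D-Gibbs decentralizes can be sketched as a sequential Gibbs sweep, where costs are turned into probabilities via exp(−cost/temp); the temperature parameter and the instance encoding are illustrative assumptions.

```python
import math, random

def gibbs_sweep(values, domains, neighbors, cost, rng, temp=1.0):
    """One sequential sweep: each variable is resampled from a distribution
    P(v) proportional to exp(-local_cost(v) / temp), given its neighbors' values."""
    for var in values:
        weights = []
        for v in domains[var]:
            c = sum(cost[frozenset({var, u})](v, values[u]) for u in neighbors[var])
            weights.append(math.exp(-c / temp))
        # Roulette-wheel sampling from the unnormalized weights.
        r = rng.random() * sum(weights)
        for v, w in zip(domains[var], weights):
            r -= w
            if r <= 0:
                values[var] = v
                break
    return values
```

Lower-cost values receive exponentially larger probabilities, so repeated sweeps concentrate the samples around low-cost solutions, which is what underpins D-Gibbs' probabilistic quality guarantees.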
The runtime characteristics of D-Gibbs are identical to those of DUCT, and for the same reasons. However, its memory requirements are smaller: the memory requirement per D-Gibbs agent is O(|N_i|), since it needs to store only the current values of all its neighbors. In terms of communication requirements, in each iteration, each D-Gibbs agent sends O(|N_i|) messages, one to each of its neighbors; thus, the total number of messages sent per iteration is proportional to the number of edges in the constraint graph. Each message is of constant size, since it contains only the current value of the agent or the partial cost of its solution. Finally, the agents communicate exclusively with their neighboring agents.
A version of the algorithm that speeds up the agents' sampling process with Graphical Processing Units (GPUs) is described in [Fioretto et al., 2016a].
4.4 Tradeoffs Between the Various DCOP Algorithms
The various DCOP algorithms discussed above provide a good coverage across various characteristics that may be important in different applications. As such, how well suited an algorithm is for an application depends on how well the algorithm’s characteristics match up to the application’s characteristics. The next section discusses several suggestions for the types of algorithms that are recommended based on the characteristics of the application at hand.
4.4.1 Complete Algorithms
When optimality is a requirement of the application, then one is limited to complete algorithms:

If the agents in the application have large amounts of memory and it is faster to send a few large messages than many small ones, then inference-based algorithms (e.g., DPOP and its extensions) are preferred over search-based algorithms (e.g., SyncBB, AFB, ADOPT, ConcFB, OptAPO). This is because, in general, search-based algorithms perform some amount of redundant communication. Thus, for a given problem instance, the overall runtime of inference-based algorithms tends to be smaller than that of search-based ones.

If the agents in the application have limited amounts of memory, then one has to use search-based algorithms (e.g., SyncBB, AFB, ADOPT, ConcFB, OptAPO), which have small memory requirements. The exception is when the problem has a small induced width (e.g., the constraint graph is acyclic), in which case inference-based algorithms (e.g., DPOP) are also preferred.

If partial centralization is allowed by the application, then OptAPO is preferred, as it has been shown to outperform many of the other search algorithms [Mailler & Lesser, 2004].

Otherwise, ConcFB is recommended, as it has been shown to outperform AFB thanks to its concurrent search [Netzer et al., 2012], and AFB has been shown to outperform ADOPT and SyncBB [Gershman et al., 2009]. The exception is if the application does not permit agents to communicate directly with non-neighbors, in which case ConcFB, AFB, and SyncBB cannot be used, and one is restricted to ADOPT or one of its variants. Note that many of the variants (e.g., BnB-ADOPT, NCBB) have been shown to significantly outperform ADOPT while maintaining the same runtime, memory, and communication requirements [Chechetka & Sycara, 2006; Yeoh et al., 2010].

4.4.2 Incomplete Algorithms
In terms of incomplete algorithms, the following recommendations are given:

If the solution returned must have an accompanying quality guarantee, then one can choose to use Bounded Max-Sum, region-optimal algorithms, DUCT, or D-Gibbs. Bounded Max-Sum allows users to choose the error bound as a function of the different subsets of edges that can be removed from the graph to make it acyclic. Region-optimal algorithms allow users to parameterize the error bound according to the size of the region or the number of hops that the solution should be optimal for. Finally, DUCT and D-Gibbs allow users to parameterize the error bound based on the number of sampling iterations to conduct. The error bounds of these two algorithms are also probabilistic (i.e., the likelihood that the quality of the solution is within the error bound is a function of the number of iterations). Therefore, the choice of algorithm will depend on the type of error bound one would like to impose on the solutions. One may also choose to use a number of extensions of complete algorithms (e.g., Weighted (BnB-)ADOPT and A-DPOP) that allow users to parameterize the error bound and affect the degree of speedup.

If a solution quality guarantee is not required, then one can also use Max-Sum, MGM, or DSA. Their performance depends on a number of factors: if the problem has large domain sizes, MGM and DSA often outperform Max-Sum, since the memory and computational complexities of Max-Sum grow quickly with the domain size. However, if the problem has a small induced width (for instance, when its constraint graph is acyclic), then Max-Sum is very efficient. It is even guaranteed to find optimal solutions when the induced width is 1. In general, Max-Sum tends to find solutions of good quality, especially when considering its recent improvements (e.g., [Zivan et al., 2017]).

If the problem has hard constraints (i.e., certain value combinations are prohibited), then the sampling-based algorithms (i.e., DUCT and D-Gibbs) are not recommended, as they are unable to handle such problems: they require the cost functions to be smooth and exploit that characteristic to explore the search space. Thus, one is restricted to search-based or inference-based algorithms.

In general, MGM and DSA are good robust benchmarks as they tend to find reasonably high quality solutions in practice. However, if specific problem characteristics are known, such as the ones discussed above, then certain algorithms may be able to exploit them to find better solutions.
4.5 Notable Variant: Asymmetric DCOPs
Asymmetric DCOPs [Grinshpoun et al., 2013] are used to model multiagent problems where the agents controlling the variables in the scope of a cost function may incur different costs for the same joint assignment. Such a problem cannot be naturally represented by classical DCOPs, which require that all agents controlling variables participating in a cost function incur the same cost.
4.5.1 Definition
An Asymmetric DCOP is a tuple $\langle A, X, D, F, \alpha \rangle$, where $A$, $X$, $D$, and $\alpha$ are as defined in Definition 4.1, and each cost function $f_i \in F$ is defined as $f_i : \bigtimes_{x_j \in \mathbf{x}^i} D_j \to \mathbb{R}^{|\mathbf{x}^i|}$, where $\mathbf{x}^i$ is the scope of $f_i$. In other words, an Asymmetric DCOP is a DCOP where the cost that an agent incurs from a cost function may differ from the cost that another agent incurs from the same cost function.
As costs for the participating agents may differ from each other, the goal in Asymmetric DCOPs is different from the goal in classical DCOPs. Given a cost function $f_i$ and a complete assignment $\mathbf{x}$, let $f_i(\mathbf{x})|_{a_j}$ denote the cost incurred by agent $a_j$ from cost function $f_i$ under the complete assignment $\mathbf{x}$. Then, the goal in Asymmetric DCOPs is to find the solution $\mathbf{x}^*$:
(2) $\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}} \sum_{f_i \in F} \; \sum_{a_j} f_i(\mathbf{x})|_{a_j}$
As in classical DCOPs, solving Asymmetric DCOPs is NP-hard. In particular, it is possible to reduce any Asymmetric DCOP to an equivalent classical DCOP by introducing a polynomial number of variables and constraints, as described in the next section.
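The asymmetric objective above (sum, over all cost functions, of the per-agent costs) can be sketched as follows. This is a minimal sketch under an assumed table-based representation; the names are illustrative:

```python
def asymmetric_cost(assignment, cost_functions):
    """Total asymmetric cost: for each cost function, sum the (possibly
    different) costs incurred by every agent participating in it.
    cost_functions: list of (scope, table) pairs, where table maps a tuple
    of values for the scope to a dict {agent: cost}."""
    total = 0
    for scope, table in cost_functions:
        values = tuple(assignment[x] for x in scope)
        total += sum(table[values].values())  # aggregate all sides
    return total
```

A classical DCOP is the special case in which, for every entry, all agents in the dict share the same cost.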
4.5.2 Relation to Classical DCOPs
One way to solve MAS problems with asymmetric costs via classical DCOPs is through the Private Events As Variables (PEAV) model [Maheswaran et al., 2004]. It can capture asymmetric costs by introducing, for each agent, as many "mirror" variables as the number of variables held by its neighboring agents. Consistency with the neighbors' state variables is imposed by a set of equality constraints. However, this formalism suffers from scalability problems, as it may result in a significant increase in the number of variables in a DCOP. In addition, Grinshpoun et al. (2013) showed that most of the existing incomplete classical DCOP algorithms cannot be used to effectively solve Asymmetric DCOPs, even when the problems are reformulated through the PEAV model: Such algorithms are unable to distinguish between different solutions that satisfy all hard constraints, and thus converge to one of those solutions without being able to escape that local optimum. Therefore, it is important to design specialized algorithms to solve Asymmetric DCOPs.
4.5.3 Algorithms
Current research on the design of Asymmetric DCOP algorithms has focused on adapting existing classical DCOP algorithms to handle asymmetric costs. Asymmetric DCOPs require that the agents whose variables participate in a cost function coordinate to aggregate their individual costs. Two approaches to do so have been identified [Brito et al., 2009]:

A two-phase strategy, where only one side of the constraint (i.e., the cost incurred by one agent) is considered in the first phase. The other side(s) (i.e., the costs incurred by the other agent(s)) are considered in the second phase, once a complete assignment is produced. As a result, the costs of all agents are aggregated.

A single-phase strategy, which requires a systematic check of each side of the constraint before reaching a complete assignment. Checking each side of the constraint is often referred to as back checking, a process that can be performed either synchronously or asynchronously.
Complete Algorithms
SyncABB-2ph [Grinshpoun et al., 2013]. Synchronous Asymmetric Branch and Bound 2-phase (SyncABB-2ph) is a complete, synchronous, search-based algorithm that extends SyncBB with the two-phase strategy. Phase 1 emulates SyncBB, where each agent considers its side of the cost functions shared with higher-priority agents. Phase 2 starts once a complete assignment is found. During this phase, each agent aggregates the sides of the cost functions that were not considered during Phase 1 and verifies that the known bound is not exceeded. If the bound is exceeded, Phase 2 ends and the agents restart Phase 1 by backtracking and resuming the search from the lower-priority agent that exceeded the bound. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB.
SyncABB-1ph [Grinshpoun et al., 2013; Levit et al., 2013]. Synchronous Asymmetric Branch and Bound 1-phase (SyncABB-1ph) is a complete, synchronous, search-based algorithm that extends SyncBB with the one-phase strategy. Each agent, after extending the CPA, updates the bound with the local cost associated with the cost functions involving its variables, as done in SyncBB. In addition, the CPA is sent back to the already-assigned agents to update its bound via a sequence of back checking operations. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB.
ATWB [Grinshpoun et al., 2013]. The Asymmetric Two-Way Bounding (ATWB) algorithm is a complete, asynchronous, search-based algorithm that extends AFB to accommodate both forward bounding and backward bounding. The forward bounding is performed analogously to AFB. The backward bounding, instead, is achieved by sending copies of the CPA backward to the agents whose assignments are included in the CPA. Similar to what is done in AFB, agents that receive a copy of the CPA compute their estimates and send them forward to the assigning agent. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of AFB.
Incomplete Algorithms
ACLS [Grinshpoun et al., 2013]. Asymmetric Coordinated Local Search (ACLS) is an incomplete, synchronous, search-based algorithm that extends DSA. After a random value initialization, each agent exchanges its values with all its neighboring agents. At the end of this step, each agent identifies all possible improving assignments for its own variables, given the current neighbors' choices. Each agent then selects one such assignment, according to the distribution of gains (i.e., reductions in costs) from each proposed assignment, and exchanges it with its neighbors. When an agent receives a proposed assignment, it responds with the evaluation of its side of the cost functions, resulting from its current assignment and the proposed assignments of the other agents participating in the cost function. After receiving the evaluations from each of its neighbors, each agent estimates the potential gain or loss derived from its assignment, and commits to a change with a given probability, similar to agents in DSA, to escape local minima. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of DSA.
MCS-MGM [Grinshpoun et al., 2013]. Minimal Constraint Sharing MGM (MCS-MGM) is an incomplete, synchronous, search-based algorithm that extends MGM by considering each side of the cost function. Like MGM, the agents operate in an iterative fashion, exchanging their current values at the start of each iteration. Afterwards, each agent sends the cost for its side of each cost function to the neighboring agents that participate in the same cost function. (This is the version of the algorithm that is guaranteed to converge to a local optimum. In the original version, which lacks such a guarantee, each agent sends the cost only if its gain with the neighbor's new values is larger than the neighbor's last known gain.) Upon receiving this information, each agent knows the total cost of each cost function, obtained by adding together the values of both its sides. Therefore, like in MGM, the agent can calculate the maximum gain (i.e., the maximum reduction in costs) if it changes its values, and will send this information to all its neighbors. Upon receiving the gains of its neighbors, each agent changes its value if its gain is the largest among its neighbors. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of MGM.
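The gain-coordination rule shared by MGM and MCS-MGM can be sketched in two small steps, compute the best local gain, then move only if that gain wins among neighbors. This is a minimal sketch with illustrative helper names, not the survey's pseudocode:

```python
def mgm_gain(value, domain, neighbor_values, cost):
    """Compute the best local gain (cost reduction) and the value achieving it."""
    current = cost(value, neighbor_values)
    best_value = min(domain, key=lambda v: cost(v, neighbor_values))
    return current - cost(best_value, neighbor_values), best_value

def mgm_decide(value, best_value, my_gain, neighbor_gains):
    """Move only if this agent's gain is positive and strictly the largest
    among its neighbors, preventing simultaneous conflicting moves."""
    if my_gain > 0 and all(my_gain > g for g in neighbor_gains):
        return best_value
    return value
```

In MCS-MGM, `cost` would already include both sides of each shared cost function, reconstructed from the per-side costs received from neighbors.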
4.6 Notable Variant: Multi-Objective DCOPs
Multi-Objective Optimization (MOO) [Miettinen, 1999; Marler and Arora, 2004] aims at solving problems that involve more than one objective function to be optimized simultaneously. In a MOO problem, optimal decisions need to accommodate potentially conflicting objectives. Multi-Objective DCOPs combine MOO and DCOPs [Delle Fave et al., 2011].
4.6.1 Definition
A Multi-Objective DCOP (MO-DCOP) is a tuple $\langle A, X, D, \mathbf{F}, \alpha \rangle$, where $A$, $X$, $D$, and $\alpha$ are as defined in Definition 4.1, and $\mathbf{F} = [F_1, \ldots, F_h]^{\mathsf{T}}$ is a vector of multi-objective function sets, where each $F_i$ is a set of cost functions as defined in Definition 4.1. For a complete assignment $\mathbf{x}$ of an MO-DCOP, let the cost for $\mathbf{x}$ according to the multi-objective optimization function set $F_i$, where $1 \le i \le h$, be
(3) $F_i(\mathbf{x}) = \sum_{f \in F_i} f(\mathbf{x})$
The goal of an MO-DCOP is to find a complete assignment $\mathbf{x}^*$ such that:
(4) $\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}} \mathbf{F}(\mathbf{x}) = \operatorname*{argmin}_{\mathbf{x}} \left[ F_1(\mathbf{x}), \ldots, F_h(\mathbf{x}) \right]^{\mathsf{T}}$
where $\mathbf{F}(\mathbf{x})$ is the cost vector of the MO-DCOP. A solution to an MO-DCOP involves the optimization of a set of partially-ordered assignments. The above definition considers a pointwise comparison of vectors, i.e., $\mathbf{F}(\mathbf{x}) \le \mathbf{F}(\mathbf{x}')$ if $F_i(\mathbf{x}) \le F_i(\mathbf{x}')$ for all $1 \le i \le h$. Typically, there is no single global solution where all the objectives are optimized at the same time. Thus, solutions of an MO-DCOP are characterized by the concept of Pareto optimality, which can be defined through the concept of dominance:
Definition 1 (Dominance)
A solution $\mathbf{x}$ is dominated by a solution $\mathbf{x}'$ iff $F_i(\mathbf{x}') \le F_i(\mathbf{x})$ for all $1 \le i \le h$ and $F_i(\mathbf{x}') < F_i(\mathbf{x})$ for at least one $i$.
Definition 2 (Pareto Optimality)
A solution is Pareto optimal iff it is not dominated by any other solution.
Therefore, a solution is Pareto optimal iff there is no other solution that improves at least one objective function without deteriorating the cost of another function. Another important concept is the Pareto front:
Definition 3 (Pareto Front)
The Pareto front is the set of all cost vectors of all Pareto optimal solutions.
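The dominance and Pareto-front definitions above translate directly into code. The following is a minimal sketch over lists of cost vectors (lower is better); the function names are illustrative:

```python
def dominates(u, v):
    """u dominates v iff u is no worse in every objective (these are costs,
    so lower is better) and strictly better in at least one."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

def pareto_front(cost_vectors):
    """Keep only the non-dominated cost vectors."""
    return [u for u in cost_vectors
            if not any(dominates(v, u) for v in cost_vectors if v != u)]
```

Note the quadratic pairwise check: this is exactly the dominance filtering that algorithms such as MO-SBB and DIPLS apply to their lists of candidate solutions.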
Solving an MO-DCOP is equivalent to finding the Pareto front. However, even for tree-structured MO-DCOPs, the size of the Pareto front may be exponential in the number of variables (in the worst case, every possible solution is Pareto optimal). Thus, multi-objective algorithms often provide solutions that may not be Pareto optimal but may satisfy other criteria that are significant for practical applications. A widely-adopted criterion is that of weak Pareto optimality:
Definition 4 (Weak Pareto Optimality)
A solution $\mathbf{x}$ is weakly Pareto optimal iff there is no other solution $\mathbf{x}'$ such that $F_i(\mathbf{x}') < F_i(\mathbf{x})$ for all $1 \le i \le h$.
In other words, a solution is weakly Pareto optimal if there is no other solution that improves all of the objective functions simultaneously. An alternative approach to Pareto optimality is one that uses the concept of utopia points:
Definition 5 (Utopia Point)
A cost vector $\mathbf{u} = [u_1, \ldots, u_h]^{\mathsf{T}}$ is a utopia point iff $u_i = \min_{\mathbf{x}} F_i(\mathbf{x})$ for all $1 \le i \le h$.
In other words, a utopia point is the vector of costs obtained by independently optimizing $h$ mono-objective DCOPs, each associated with one objective of the multi-objective function vector. In general, the utopia point is unattainable. Therefore, different approaches focus on finding a compromise solution [Salukvadze, 1971], which is a Pareto optimal solution that is close to the utopia point. The notion of closeness depends on the approach adopted.
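The utopia point and a distance-based compromise can be sketched as follows. This is a minimal sketch that assumes the attainable cost vectors are already enumerated and uses L1 distance as one possible notion of closeness; the names are illustrative:

```python
def utopia_point(cost_vectors):
    """Componentwise minimum over attainable cost vectors: the cost obtained
    by optimizing each objective independently."""
    return tuple(min(cs) for cs in zip(*cost_vectors))

def compromise(pareto, utopia):
    """A compromise solution: the Pareto-optimal cost vector closest to the
    utopia point, here in L1 (Manhattan) distance."""
    return min(pareto, key=lambda u: sum(abs(a - b) for a, b in zip(u, utopia)))
```

In a real MO-DCOP, the enumeration would be replaced by solving the $h$ mono-objective DCOPs, as DP-AOF-style algorithms do.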
Similar to their centralized counterparts, MO-DCOPs have been shown to be NP-hard (their decision versions) and #P-hard (the related counting versions), and to have exponentially many non-dominated points [Glaßer et al., 2010].
4.6.2 Algorithms
This section categorizes the proposed MO-DCOP algorithms into two classes, complete and incomplete, according to their ability to find the complete set of Pareto optimal solutions or only a subset of it.
Complete Algorithms
MO-SBB [Medi et al., 2014]. Multi-Objective Synchronous Branch and Bound (MO-SBB) is a complete, synchronous, search-based algorithm that extends SyncBB. It uses a search strategy analogous to that of the mono-objective SyncBB: After establishing a complete ordering, MO-SBB agents extend a CPA with their own value assignments and the currently associated cost vectors. Once a non-dominated solution is found, it is broadcast to all agents, which add the solution to a list of global bounds. Thus, agents maintain an approximation of the Pareto front, which is used to bound the exploration, and extend the CPA only if the new partial assignment is not dominated by solutions in the list of global bounds. When the algorithm terminates, it returns the set of Pareto optimal solutions obtained by filtering the list of global bounds by dominance. The worst-case runtime and communication requirements of this algorithm are the same as those of SyncBB. In terms of memory requirements, each MO-SBB agent needs an amount of memory proportional to the size of the Pareto set, in order to store the list of global bounds.
Pseudo-tree Based Algorithm [Matsui et al., 2012]. The proposed algorithm is a complete, asynchronous, search-based algorithm that extends ADOPT. It introduces the notion of boundaries on the vectors of multi-objective values, which extends the concept of lower and upper bounds to vectors of values. The proposed approach starts with the assumption that all objective sets contain the same number of cost functions, i.e., $|F_1| = \cdots = |F_h| = m$. Furthermore, the cost functions within each $F_i$ are sorted according to a predefined ordering, and, for each $1 \le j \le m$, the scope of $f^i_j$ (i.e., the $j$-th function in $F_i$) is the same for each $i$ (i.e., all functions in the same position in different $F_i$ have the same scope). Thus, without loss of generality, the notation $\mathbf{x}^j$ will be used to refer to the scope of $f^i_j$.
Given a complete assignment $\mathbf{x}$, for $1 \le j \le m$, let $\vec{f}_j(\mathbf{x}) = [f^1_j(\mathbf{x}), \ldots, f^h_j(\mathbf{x})]$ be the vector of cost values. The notion of non-dominance is applied to these vectors, where a vector $\vec{f}_j(\mathbf{x})$ is non-dominated iff there is no other vector $\vec{f}_j(\mathbf{x}')$ such that $f^i_j(\mathbf{x}') \le f^i_j(\mathbf{x})$ for all $i$ and $f^i_j(\mathbf{x}') < f^i_j(\mathbf{x})$ for at least one $i$. The algorithm uses the notion of non-dominance for bounded vectors to retain exclusively non-dominated vectors.
The worst-case runtime and communication requirements of this algorithm are the same as those of ADOPT. In terms of memory requirements, each agent needs enough memory to store the retained non-dominated cost vectors; notice that, in the worst case, the number of combinations of cost vectors grows exponentially with the number of tuples of cost values. This algorithm has also been extended to solve Asymmetric MO-DCOPs [Matsui et al., 2014], a model that extends both Asymmetric DCOPs and MO-DCOPs.
Incomplete Algorithms
B-MOMS [Delle Fave et al., 2011]. Bounded Multi-Objective Max-Sum (B-MOMS) is an incomplete, asynchronous, inference-based algorithm, and was the first MO-DCOP algorithm introduced. It extends Bounded Max-Sum to compute bounded approximations for MO-DCOPs. It consists of three phases. The Bounding Phase generates an acyclic subgraph of the multi-objective factor graph, using a generalization of the maximum spanning tree problem to vector weights. During the Max-Sum Phase, the agents coordinate to find the Pareto optimal set of solutions of the acyclic factor graph generated in the bounding phase. This is achieved by extending the addition and marginal maximization operators adopted in Max-Sum to the case of multiple objectives. Finally, the Value Propagation Phase allows agents to select a consistent variable assignment, as there may be multiple Pareto optimal solutions. The bounds provided by the algorithm are computed using the notion of utopia points.
The worst-case runtime requirement of this algorithm is the same as that of Max-Sum. In terms of communication requirements, the number of messages sent is also the same as in Max-Sum, but each message now carries vectors of costs rather than scalars, so its size grows with the number of objectives. In terms of memory requirements, each B-MOMS agent needs a correspondingly larger amount of memory to store and process the messages received.
DP-AOF [Okimoto et al., 2013]. Dynamic Programming based on Aggregate Objective Functions (DP-AOF) is an incomplete, synchronous, inference-based algorithm. It adapts the AOF technique [Miettinen, 1999], designed to solve centralized multi-objective optimization problems, to solve MO-DCOPs. Centralized AOF adopts a scalarization to convert a MOO problem into a single-objective optimization problem. This is done by assigning a weight $w_i$ to each of the cost functions in the objective vector such that $\sum_{i=1}^{h} w_i = 1$ and $w_i \ge 0$ for all $1 \le i \le h$. The resulting mono-objective function can be solved using any mono-objective optimization technique, with the guarantee of finding a Pareto optimal solution [Miettinen, 1999].
DP-AOF proceeds in two phases. First, it computes the utopia point by solving as many mono-objective DCOPs as the number of objective functions in the MO-DCOP. DP-AOF uses DPOP to solve these mono-objective DCOPs. It then constructs a new problem building upon the solutions obtained from the first phase. Such a problem is used to assign weights to each objective function of the MO-DCOP to construct the new mono-objective function in the same way as centralized AOF, which can then be solved optimally. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of DPOP, except that the number of operations and the number of messages are larger by a factor of $h$ (the number of objectives), since it runs DPOP $h$ times to solve the mono-objective DCOPs.
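The AOF weighted-sum scalarization described above can be sketched as follows; a minimal sketch in which the objective functions are plain Python callables and the names are illustrative:

```python
def aof_scalarize(objectives, weights):
    """Build a mono-objective function f(x) = sum_i w_i * F_i(x) from a list
    of objective functions, with non-negative weights summing to 1."""
    assert all(w >= 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9
    def mono(assignment):
        return sum(w * F(assignment) for w, F in zip(weights, objectives))
    return mono
```

Minimizing `mono` with any mono-objective solver (DPOP, in DP-AOF's case) yields a Pareto optimal solution of the original multi-objective problem; the choice of weights selects which Pareto optimal solution is returned.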
MODPOP [Okimoto et al., 2014]. Multi-Objective $L_p$-norm based Distributed Pseudo-tree Optimization Procedure (MODPOP) is an incomplete, synchronous, inference-based algorithm. It adapts DPOP using a scalarization measure based on the $L_p$ norm to find a subset of the Pareto front of an MO-DCOP. Similar to DP-AOF, the algorithm proceeds in two phases. Its first phase is the same as the first phase of DP-AOF: It solves $h$ mono-objective DCOPs using DPOP to find the utopia point. In the second phase, the agents coordinate to find a solution that minimizes the distance from the utopia point according to the $L_p$ norm. The algorithm is guaranteed to find a Pareto optimal solution only when the $L_1$ norm (Manhattan norm) is adopted. In this case, MODPOP finds a Pareto optimal solution that minimizes the average of the cost values of all objectives. The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of DP-AOF.
DIPLS [Wack et al., 2014]. Distributed Iterated Pareto Local Search (DIPLS) is an incomplete, synchronous, search-based algorithm. It extends the Pareto Local Search (PLS) algorithm [Paquete et al., 2004], a hill-climbing algorithm designed to solve centralized multi-objective optimization problems, to solve MO-DCOPs. The idea behind DIPLS is to evolve an initial solution toward the Pareto front. To do so, it starts from an initial set of random assignments and applies PLS iteratively to generate new non-dominated solutions. DIPLS requires a total ordering of the agents and elects one agent as the controller. At each iteration, the controller filters the set of solutions by dominance and broadcasts them to the agents in the MO-DCOP. Upon receiving a solution, an agent generates a list of neighboring solutions by modifying the assignments of the variables that it controls, and sends them back to the controller. When the controller receives the messages from all agents, it proceeds to filter (by dominance) the set of solutions received, and, if a new non-dominated solution is found, it repeats the process.
The worst-case runtime of this algorithm is dominated by the controller agent, which is required to check the dominance of the newly generated solutions at each iteration. In terms of memory requirements, DIPLS agents use an amount of space proportional to the size of the Pareto front in order to store it. Finally, in terms of communication requirements, the controller agent broadcasts messages that contain the current Pareto front; thus, the message size is also proportional to the size of the Pareto front.
5 Dynamic DCOPs
Within a real-world MAS application, agents often act in dynamic environments that evolve over time. For instance, in a disaster management search-and-rescue scenario, new information (e.g., the number of victims in particular locations, or priorities on the buildings to evacuate) typically becomes available in an incremental manner. Thus, the information flow modifies the environment over time. To cope with such a requirement, researchers have introduced the Dynamic DCOP (D-DCOP) model, where cost functions can change during the problem-solving process, agents may fail, and new agents may be added to the DCOP being solved. With respect to the categorization described in Section 3, in the D-DCOP model, the agents are fully cooperative and they have deterministic behavior and total knowledge. On the other hand, the environment is dynamic and deterministic.
5.1 Definition
The Dynamic DCOP (D-DCOP) model is defined as a sequence of classical DCOPs $\langle P_1, P_2, \ldots, P_T \rangle$, where each $P_t$ is a DCOP representing the problem at time step $t$, for $1 \le t \le T$. The goal in a D-DCOP is to solve the DCOP at each time step optimally. By assumption, the agents have total knowledge about their current environment (i.e., the current DCOP), but they are unaware of changes to the problem in future time steps.
In a dynamic system, agents are required to adapt as quickly as possible to environmental changes. Stability [Dijkstra, 1974; Verfaillie and Jussien, 2005] is a core algorithmic concept whereby an algorithm seeks to minimize the number of steps that it requires to converge to a solution each time the problem changes. In such a context, these converged solutions are also called stable solutions. Self-stabilization is a related concept derived from the area of fault tolerance:
Definition 6 (Self-stabilization)
A system is self-stabilizing iff the following two properties hold:

Convergence: The system reaches a stable solution in a finite number of steps, starting from any given state. In the DCOP context, this property expresses the ability of the agents to coordinate a joint assignment for their variables that optimizes the problem at time step $t$, starting from an assignment of the problem's variables at time step $t-1$.

Closure: The system remains in a stable solution, provided that no changes in the environment happen. In the DCOP context, this means that agents do not change the assignment of their variables after converging to a solution.
Solving D-DCOPs is NP-hard, as it requires solving each DCOP of the sequence independently.
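At its simplest, solving a D-DCOP is a loop over the sequence of DCOPs, and stability amounts to warm-starting each solve from the previous solution. The following is a minimal sketch; `solve` is an assumed black-box single-DCOP solver, and the names are illustrative:

```python
def solve_d_dcop(problems, solve, initial=None):
    """Solve a sequence of DCOPs, warm-starting each solve from the previous
    solution so that converged (stable) solutions change as little as possible."""
    solutions, current = [], initial
    for problem in problems:
        current = solve(problem, current)  # reuse the previous assignment
        solutions.append(current)
    return solutions
```

Self-stabilizing algorithms such as S-DPOP effectively internalize this loop, reusing the structures built for $P_{t-1}$ rather than re-solving $P_t$ from scratch.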
5.2 Algorithms
In principle, one could use classical DCOP algorithms to solve the DCOP at each time step. However, the dynamic evolution of the environment imposes additional requirements on algorithm design, so that agents can respond promptly and efficiently to environmental changes over time. In particular, D-DCOP algorithms often satisfy the self-stabilizing property. As in the previous sections, the algorithms are categorized as either complete or incomplete, according to their ability to determine the optimal solution at each time step.
5.2.1 Complete Algorithms
S-DPOP [Petcu and Faltings, 2005]. Self-stabilizing DPOP (S-DPOP) is a synchronous, inference-based algorithm that extends DPOP to handle dynamic environments. It is composed of three self-stabilizing phases: (i) a self-stabilizing DFS pseudo-tree generation, whose goal is to create and maintain a DFS pseudo-tree structure; (ii) a self-stabilizing algorithm for the UTIL propagation phase; and (iii) a self-stabilizing algorithm for the VALUE propagation phase. These procedures work as in DPOP and are invoked whenever any change in the DCOP problem sequence is revealed. Additionally, Petcu and Faltings (2005) discuss self-stabilizing extensions that can be used to provide guarantees on how the system transitions from one valid state to the next after an environment change.
The worst-case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of DPOP. Additionally, upon changes to the problem, S-DPOP stabilizes after a number of UTIL and VALUE messages that is bounded in terms of the depth of the pseudo-tree and the number of cost functions of the problem.
I-ADOPT and I-BnB-ADOPT [Yeoh et al., 2011]. Incremental Any-space ADOPT (I-ADOPT) and Incremental Any-space BnB-ADOPT (I-BnB-ADOPT) are asynchronous, search-based algorithms that extend ADOPT and BnB-ADOPT, respectively. In the incremental any-space versions of the algorithms, each agent maintains bounds for multiple contexts; in contrast, agents in ADOPT and BnB-ADOPT maintain bounds for one context only. By doing so, when solving the next DCOP in the sequence, agents may reuse the bound information computed for the previous DCOP. In particular, the algorithms identify affected agents, i.e., agents that cannot reuse the information computed in the previous iterations, and recompute bounds exclusively for such agents.
The worst-case runtime and communication requirements of these algorithms to solve the DCOP at each time step are the same as those of ADOPT. However, since these algorithms have the any-space property, their minimal memory requirements are the same as those of ADOPT, but they can use more memory, if available, to speed up the search.
5.2.2 Incomplete Algorithms
SBDO [Billiau et al., 2012]. Support Based Distributed Optimization (SBDO) is an asynchronous, search-based algorithm that extends the Support Based Distributed Search algorithm [Harvey et al., 2007] to the multiagent case. It uses two types of messages: isgood and nogood. Isgood messages contain an ordered partial assignment and are exchanged among neighboring agents upon a change in their value assignments. Each agent, upon receiving a message, decides what value to assign to its own variables, attempting to minimize its local costs, and communicates such decisions to its neighboring agents via isgood messages. Nogood messages are used in response to violations of hard constraints, or in response to obsolete assignments. A nogood message is augmented with a justification, that is, the set of hard constraints that are violated; justifications are saved locally within each agent. This information is used to discard partial assignments that are supersets of one of the known nogoods. The changes of the dynamic environment are communicated via messages sent from the environment to the agents. In particular, changes in hard constraints require the update of all the justifications in all nogoods.
The worst-case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB each time the problem changes.
FMS [Ramchurn et al., 2010]. Fast Max-Sum (FMS) is an asynchronous, inference-based algorithm that extends Max-Sum to the Dynamic DCOP model. As in Max-Sum, the algorithm operates on a factor graph. Solution stability is maintained by recomputing only those factors that changed between the previous DCOP and the current DCOP. Ramchurn et al. (2010) exploit domain-specific properties of a task allocation problem to reduce the number of states over which each factor has to compute its solution. In addition, FMS is able to efficiently manage the addition or removal of tasks (i.e., factors) by performing message propagation exclusively on the factor-graph regions that are affected by such topological changes. The worst-case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of Max-Sum.
FMS has been extended in several ways. Bounded Fast Max-Sum provides bounds on the solution found, and also guarantees self-stabilization [Macarthur et al., 2010]. Branch-and-Bound Fast Max-Sum (BnB-FMS) extends FMS by providing online domain pruning using a branch-and-bound technique [Macarthur et al., 2011].
5.3 Notable Variants: D-DCOPs with Commitment Deadlines or Markovian Properties
We now describe several notable variants of D-DCOPs and their corresponding algorithms.
RS-DPOP [Petcu and Faltings, 2007]. In this proposed model, agents have commitment deadlines and stability constraints. In other words, some of the variables may be unassigned at a given point in time, while others must be assigned within a specific deadline. Commitment deadlines are either hard or soft. Hard commitments model irreversible processes: once a hard-committed variable is assigned, its value cannot be changed. Soft commitments model contracts with penalties: if a soft-committed variable has been assigned at time step $t-1$, its value can be changed at time step $t$, at the price of a cost penalty. These costs are modeled via stability constraints, defined as binary relations $s_i(x_i^{t-1}, x_i^{t})$ representing the cost of changing the value of variable $x_i$ from time step $t-1$ to time step $t$. Given the set of stability constraints $S$, at each time step $t$, the goal is to find a solution $\mathbf{x}^t$:
$\mathbf{x}^t = \operatorname*{argmin}_{\mathbf{x}} \sum_{f \in F} f(\mathbf{x}) + \sum_{s_i \in S} s_i(x_i^{t-1}, x_i)$
The latter term accounts for the penalties associated with the value-assignment updates of the soft-committed variables.
To solve this problem, Petcu and Faltings (2007) extended S-DPOP to RS-DPOP. (The full name of the algorithm was not provided by the authors.) Like S-DPOP, it is a synchronous, inference-based algorithm. Unlike S-DPOP, its UTIL and VALUE propagation phases now take into account the commitment deadlines. The worst-case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of S-DPOP.
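The stability-constraint objective described above, base DCOP cost plus penalties for changing soft-committed variables, can be sketched as follows. This is a minimal sketch with an assumed per-variable penalty representation; the names are illustrative:

```python
def rs_objective(assignment, prev_assignment, cost_functions, stability):
    """Objective with stability constraints: base DCOP cost plus a penalty
    for every soft-committed variable whose value changed since the last
    time step. stability: dict mapping a variable to its change penalty."""
    base = sum(f(assignment) for f in cost_functions)
    penalty = sum(c for x, c in stability.items()
                  if x in prev_assignment and assignment[x] != prev_assignment[x])
    return base + penalty
```

Hard commitments would instead be enforced as constraints (the variable's value is fixed), not as finite penalties.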
Distributed Q-learning and R-learning [Nguyen et al., 2014]. In this proposed model, called Markovian Dynamic DCOPs (MD-DCOPs), the DCOP in the next time step depends on the solution (i.e., the assignment of all variables) adopted by the agents for the DCOP in the current time step. However, the transition function between these two DCOPs is not known to the agents, and the agents must, thus, learn it. The Distributed Q-learning and R-learning algorithms are synchronous, reinforcement-learning-based algorithms that extend the centralized Q-learning [Abounadi et al., 2001] and centralized R-learning [Schwartz, 1993; Mahadevan, 1996] algorithms. Each agent maintains Q-values and R-values for each pair $\langle \mathbf{x}, d \rangle$, where $\mathbf{x}$ is the solution for the previous DCOP and $d$ is the value of its variables in the cost function. These Q- and R-values represent the predicted cost the agent will incur if it assigns its variables the values $d$ when $\mathbf{x}$ is the previous solution. The agents repeatedly refine these values and choose the values with the minimum Q- or R-value at each time step. The worst-case runtime, communication, and memory requirements of these two algorithms to solve the DCOP at each time step are the same as those of DPOP, as they use DPOP as a subroutine to update the Q- and R-values. The exception is that agents in the Distributed Q-learning algorithm also broadcast their value assignments at each time step to all other agents; thus, they send a number of messages quadratic in the number of agents in each time step, instead of the linear number of messages sent by DPOP. (A single broadcast message is counted as $|A| - 1$ peer-to-peer messages, where $|A|$ is the number of agents in the problem.)
A related model is Proactive Dynamic DCOPs (PD-DCOPs) [Hoang et al., 2016, 2017], where the transition functions between two subsequent DCOPs are known and can be exploited by the resolution process. Additionally, another key difference between the two models is that, in PD-DCOPs, the DCOP in the next time step does not depend on the solution in the current time step, but instead depends on the values of the random variables at the current time step. Researchers have introduced a number of offline proactive and online reactive algorithms to solve this problem [Hoang et al., 2016, 2017].
6 Probabilistic DCOPs
The DCOP models discussed so far can model MAS problems in deterministic environments. However, many real-world applications are characterized by environments with stochastic behavior. In other words, there are exogenous events that can influence the outcome of an agent's actions. For example, the weather conditions or the state of a malfunctioning device can affect the cost of an agent's action. To cope with such scenarios, researchers have introduced Probabilistic DCOP (P-DCOP) models, where the uncertainty in the state of the environment is modeled through stochasticity in the cost functions. With respect to the DCOP categorization described in Section 3, in the P-DCOP model, the agents are fully cooperative and have deterministic behavior. Additionally, the environment is static and stochastic. While a large body of research has focused on problems where agents have total knowledge, this section also discusses a subclass of P-DCOPs in which the agents' knowledge of the environment is limited, and the agents must balance the exploration of the unknown environment against the exploitation of the known costs.
6.1 Definition
A common strategy to model uncertainty is to augment the outcome of the cost functions with a stochastic character [Atlas &amp; Decker 2010, Stranders et al. 2011, Nguyen et al. 2012]. Another method is to introduce additional random variables as input to the cost functions, which simulate exogenous uncontrollable traits of the environment [Léauté &amp; Faltings 2009, Léauté &amp; Faltings 2011, Wang et al. 2011]. To cope with such a variety, this section introduces the Probabilistic DCOP (P-DCOP) model, which generalizes the proposed models of uncertainty. A P-DCOP is defined by a tuple $\langle \mathbf{A}, \mathbf{X}, \mathbf{D}, \mathbf{F}, \alpha, \mathbf{I}, \mathbf{\Omega}, \mathbf{P}, \mathcal{E}, \mathcal{U} \rangle$, where $\mathbf{A}$ and $\mathbf{D}$ are as defined in Definition 4.1. In addition,

$\mathbf{X}$ is a mixed set of decision variables and random variables.

$\mathbf{I} \subseteq \mathbf{X}$ is a set of random variables modeling uncontrollable stochastic events, such as weather or a malfunctioning device.

$\mathbf{F}$ is the set of cost functions, each defined over a mixed set of decision variables and random variables, and such that each value combination of the decision variables in the scope of a cost function $f_j \in \mathbf{F}$ results in a probability distribution. As a result, $f_j$ is itself a random variable, given the local value assignment and a realization of the random variables involved in $f_j$.

$\alpha : \mathbf{X} \setminus \mathbf{I} \to \mathbf{A}$ is a mapping from decision variables to agents. Notice that random variables are not controlled by any agent, as their outcomes do not depend on the agents' actions.

$\mathbf{\Omega} = \{\Omega_r\}_{r \in \mathbf{I}}$ is the (possibly discrete) set of events for the random variables (e.g., the different weather conditions or stress levels a device is subjected to), such that each random variable $r \in \mathbf{I}$ takes values in $\Omega_r$. In other words, $\Omega_r$ is the domain of random variable $r$.

$\mathbf{P} = \{p_r\}_{r \in \mathbf{I}}$ is a set of probability distributions for the random variables, such that $p_r$ assigns a probability value to each event $\omega \in \Omega_r$ and $\sum_{\omega \in \Omega_r} p_r(\omega) = 1$ for each random variable $r \in \mathbf{I}$.

$\mathcal{E}$ is an evaluator function over random variables that, given an assignment of values to the decision variables, summarizes the distribution of the aggregated cost functions into a real value.

$\mathcal{U}$ is a utility function that, given a random variable, returns an ordered set of its possible outcomes, based on the decision maker's preferences. This function is needed because the cost functions have uncertain outcomes and, thus, their distributions are not readily comparable.
The goal in a P-DCOP is to find a solution $\mathbf{x}^*$, that is, an assignment of values to all the decision variables, such that:
(5) $\mathbf{x}^* = \operatorname*{argmin}_{\mathbf{x}} \; \mathcal{E}\Big( \bigoplus_{f_j \in \mathbf{F}} \mathcal{U}\big( f_j(\mathbf{x}) \big) \Big)$
where argmin or argmax is selected depending on the algorithm adopted, and $\bigoplus$ is the operator used to aggregate the values of the functions $f_j \in \mathbf{F}$. Typically, such an operator is a summation; however, other operators have been proposed to handle continuous distributions.
The probability distribution over the domain of a random variable is called a belief. An assignment to all random variables in $\mathbf{I}$ describes a (possible) scenario governed by the environment. As the random variables are not under the control of the agents, they act independently of the decision variables; their realizations are drawn from their probability distributions. Furthermore, the random variables are assumed to be independent of each other and, thus, model independent sources of exogenous uncertainty.
The utility function $\mathcal{U}$ enables us to compare the uncertain outcomes of the cost functions. In general, the utility function is nondecreasing, that is, the lower the cost, the higher the utility. However, the utility function should be defined for the specific application of interest. For example, in farming, the utility increases with the amount of produce harvested; yet farmers may prefer a smaller but highly certain harvest over a larger but highly uncertain and, thus, risky one.
The evaluator function $\mathcal{E}$ is used to summarize in one criterion the cost of a given assignment that depends on the random variables. A possible evaluator function is the expectation: $\mathcal{E}(\mathbf{x}) = \mathbb{E}\big[ \bigoplus_{f_j \in \mathbf{F}} f_j(\mathbf{x}) \big]$.
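To make the expectation evaluator concrete, the following sketch (a hypothetical two-variable instance; the variable names, domains, and cost functions are invented for illustration) enumerates the joint events of two independent random variables and averages the aggregated cost of a fixed decision assignment:

```python
from itertools import product

# Hypothetical instance: independent random variables r1 (weather) and
# r2 (device state). Each joint event is weighted by the product of the
# marginal probabilities, reflecting the independence assumption.

omega = {"r1": ["sun", "rain"], "r2": ["ok", "fault"]}            # domains
prob = {"r1": {"sun": 0.7, "rain": 0.3},                           # beliefs
        "r2": {"ok": 0.9, "fault": 0.1}}

def f1(x, r):   # cost depends on decision x["x1"] and realization of r1
    return 1.0 if r["r1"] == "sun" else 4.0 + x["x1"]

def f2(x, r):   # cost depends on x["x1"] and realization of r2
    return 0.0 if r["r2"] == "ok" else 10.0 - x["x1"]

def expected_cost(x):
    names = list(omega)
    total = 0.0
    for events in product(*(omega[n] for n in names)):
        r = dict(zip(names, events))
        p = 1.0
        for n in names:
            p *= prob[n][r[n]]              # independence: product of marginals
        total += p * (f1(x, r) + f2(x, r))  # aggregation operator: summation
    return total

print(round(expected_cost({"x1": 1}), 3))  # → 3.1
```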
Let us now introduce some concepts that are commonly adopted in the study of PDCOPs.
Definition 7 (Convolution)
The convolution $f_X * f_Y$ of the probability density functions (PDFs) $f_X$ and $f_Y$ of two independent random variables $X$ and $Y$ is the integral of the product of the two functions after one is reversed and shifted:
(6) $(f_X * f_Y)(z) = \int_{-\infty}^{\infty} f_X(\tau)\, f_Y(z - \tau)\, d\tau$
It produces a new PDF that expresses the overlapping area between $f_X$ and $f_Y$ as a function of the amount by which one of the original functions is translated. In other words, the convolution determines the distribution of the sum of the two random variables. The counterpart for the distribution of the sum of two independent discrete variables is:
(7) $\Pr(X + Y = z) = \sum_{k} \Pr(X = k)\, \Pr(Y = z - k)$
In a P-DCOP, the value returned by a function $f_j$, for an assignment to the decision variables in its scope, is a random variable. Thus, the global cost is also a random variable, whose probability density function is the convolution of the PDFs of the individual $f_j$'s. The convolution of two PDFs in a P-DCOP therefore plays the role that the summation of the utilities of two cost functions plays in classical DCOPs.
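Equation 7 can be checked directly. The sketch below (using fair dice as an assumed example) convolves two discrete distributions to obtain the distribution of their sum:

```python
# Sketch: distribution of the sum of two independent discrete random
# variables via the discrete convolution of their PMFs (Equation 7).

def convolve(pmf_x, pmf_y):
    out = {}
    for vx, px in pmf_x.items():
        for vy, py in pmf_y.items():
            # Each pair of outcomes contributes p_x * p_y to their sum.
            out[vx + vy] = out.get(vx + vy, 0.0) + px * py
    return out

die = {v: 1 / 6 for v in range(1, 7)}   # fair six-sided die (assumed example)
two_dice = convolve(die, die)

print(round(two_dice[7], 4))  # → 0.1667 (six of the 36 outcomes sum to 7)
```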
A common task in optimization under uncertainty is that of ranking a set of random variables $\{X_1, \ldots, X_n\}$ with cumulative distribution functions (CDFs) $\{F_{X_1}, \ldots, F_{X_n}\}$. These distributions are also commonly called lotteries, a concept related to that of stochastic dominance, a form of stochastic ordering based on preferences over outcomes: it refers to situations where one probability distribution over possible outcomes can be ranked as superior to another.
First-order stochastic dominance refers to the situation where one lottery is unambiguously better than another:
Definition 8 (First-Order Stochastic Dominance)
Given two random variables $X$ and $Y$ with CDFs $F_X$ and $F_Y$, respectively, $X$ first-order stochastically dominates $Y$ iff:
(8) $F_X(z) \leq F_Y(z)$
for all $z$, with a strict inequality over some interval.
If $X$ first-order stochastically dominates $Y$, then $Y$ necessarily has a strictly smaller expected value: $\mathbb{E}[Y] < \mathbb{E}[X]$. In other words, if $X$ dominates $Y$, then the decision maker prefers $X$ over $Y$ regardless of what her utility function is, as long as it is weakly increasing.
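A first-order dominance check reduces to a pointwise comparison of the CDFs. The sketch below (with hypothetical discrete lotteries) evaluates Equation 8 on the union of the two supports:

```python
# Sketch: first-order stochastic dominance between two discrete lotteries,
# checked pointwise on their CDFs (example values are hypothetical).

def cdf(pmf, z):
    return sum(p for v, p in pmf.items() if v <= z)

def fosd(pmf_x, pmf_y):
    # X dominates Y iff F_X(z) <= F_Y(z) everywhere, strictly somewhere.
    grid = sorted(set(pmf_x) | set(pmf_y))
    le = all(cdf(pmf_x, z) <= cdf(pmf_y, z) + 1e-12 for z in grid)
    strict = any(cdf(pmf_x, z) < cdf(pmf_y, z) - 1e-12 for z in grid)
    return le and strict

X = {1: 0.2, 2: 0.3, 3: 0.5}   # stochastically larger lottery
Y = {1: 0.5, 2: 0.3, 3: 0.2}

mean = lambda pmf: sum(v * p for v, p in pmf.items())
print(fosd(X, Y))              # → True
print(mean(X) > mean(Y))       # → True (dominance implies larger expectation)
```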
It is not always the case that one CDF first-order stochastically dominates another. In such cases, one can use second-order stochastic dominance to compare them, which refers to the situation where one lottery is unambiguously less risky than another:
Definition 9 (Second-Order Stochastic Dominance)
Given two random variables $X$ and $Y$ with CDFs $F_X$ and $F_Y$, respectively, $X$ second-order stochastically dominates $Y$ iff:
(9) $\int_{-\infty}^{z} \big[ F_Y(t) - F_X(t) \big] \, dt \geq 0$
for all $z$, with a strict inequality for some values of $z$.
If $X$ second-order stochastically dominates $Y$, then $\mathbb{E}[X] \geq \mathbb{E}[Y]$. If the integral in Equation 9 is equal to zero for all $z \geq \bar{z}$, for some sufficiently large $\bar{z}$, then $\mathbb{E}[X] = \mathbb{E}[Y]$. In this case, as both lotteries are equal in expectation, the decision maker prefers the lottery $X$, which has less variance and is, thus, less risky.
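When neither CDF dominates the other pointwise, Equation 9 compares the cumulative areas under the CDFs. A sketch (with hypothetical equal-mean lotteries) illustrates the check for discrete supports:

```python
# Sketch: second-order stochastic dominance via cumulative differences of
# the CDFs (Equation 9), on hypothetical equal-mean lotteries.

def cdf(pmf, z):
    return sum(p for v, p in pmf.items() if v <= z)

def sosd(pmf_x, pmf_y):
    # X dominates Y iff the integral of (F_Y - F_X) up to z is >= 0 for
    # all z, strictly positive somewhere. CDFs are piecewise constant, so
    # the integral is a sum of rectangle areas between support points.
    grid = sorted(set(pmf_x) | set(pmf_y))
    area, prev = 0.0, grid[0]
    ok, strict = True, False
    for z in grid[1:]:
        area += (cdf(pmf_y, prev) - cdf(pmf_x, prev)) * (z - prev)
        prev = z
        if area < -1e-12:
            ok = False
        if area > 1e-12:
            strict = True
    return ok and strict

X = {2: 1.0}              # sure outcome: mean 2, zero variance
Y = {1: 0.5, 3: 0.5}      # risky lottery with the same mean

print(sosd(X, Y))  # → True: the sure lottery is less risky
```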
Another common concept in P-DCOPs is that of regret. In decision theory, regret expresses the negative emotion arising from learning that a solution different from the one adopted would have had a more favorable outcome. In P-DCOPs, the regret of a given solution is typically defined as the difference between its associated cost and that of the theoretically optimal solution. The notion of regret is especially useful in allowing agents to make robust decisions in settings where they have limited information about the cost functions.
An important type of regret is the minimax regret, a decision rule used to minimize the possible loss in the worst case (i.e., the maximum regret). As opposed to the expected regret, the minimax regret is independent of the probabilities of the various outcomes. Thus, it can be used when the probabilities of the outcomes are unknown or difficult to estimate.
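A minimax-regret choice can be sketched as follows (the cost table over candidate solutions and environment scenarios is hypothetical; note that no probabilities over scenarios are needed):

```python
# Sketch: minimax regret over a hypothetical table of costs, indexed by
# candidate solution and environment scenario. No probabilities required.

costs = {
    "s1": {"sunny": 4, "rainy": 12},
    "s2": {"sunny": 6, "rainy": 7},
    "s3": {"sunny": 9, "rainy": 6},
}

scenarios = ["sunny", "rainy"]
# Best achievable cost in each scenario, over all candidate solutions.
best = {sc: min(costs[s][sc] for s in costs) for sc in scenarios}

def max_regret(solution):
    # Worst-case regret: largest gap to the scenario-optimal cost.
    return max(costs[solution][sc] - best[sc] for sc in scenarios)

choice = min(costs, key=max_regret)
print(choice, max_regret(choice))  # → s2 2
```

Note that s2 is never the cheapest option in any single scenario, yet it minimizes the worst-case regret, which is exactly the robustness that the minimax rule buys.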
Solving P-DCOPs is PSPACE-hard, as, in general, the process is required to remember a solution for each possible state associated with the uncertain random variables. The study of complexity classes for P-DCOPs is largely unexplored. We thus foresee it as a potential direction for future research, with particular focus on identifying fragments of P-DCOPs characterized by a lower complexity than the one above.
6.2 Algorithms
Unlike Classical DCOPs and Dynamic DCOPs, where the algorithms solve the same problem, P-DCOP algorithms approach the problem uncertainty in different ways and, thus, solve different variants of the problem. This is due to the greater modeling flexibility offered by the P-DCOP framework; as such, the proposed algorithms are often not directly comparable to one another. We categorize P-DCOP algorithms into complete and incomplete ones, according to whether or not they guarantee finding the optimal solution for given evaluator and utility functions. Unless otherwise specified, the ordering operator in Equation 5 is argmin.
6.2.1 Complete Algorithms
E[DPOP] [Léauté &amp; Faltings 2011]. E[DPOP] is a synchronous, sampling-based and inference-based algorithm. It can be either complete or incomplete, depending on the E[DPOP] variant used, as described below. E[DPOP] uses a collaborative sampling strategy, where all agents concerned with a given random variable agree on a common sample set that will be used to estimate the PDF of that random variable. Agents performing collaborative sampling independently propose sample sets for the random variables influencing the variables they control, and elect one agent among themselves as responsible for combining the proposed sample sets into one. The algorithm is defined over P-DCOPs with deterministic cost function outcomes, that is, for each combination of values of the variables in the scope of a cost function, the function results in a degenerate distribution (i.e., a distribution that yields a single value), and the utility function is the identity function. The evaluator function is the expectation of the sum over all functions in $\mathbf{F}$.
E[DPOP] builds on top of DPOP and proceeds in four phases: in Phase 1, the agents order themselves into a pseudo-tree, ignoring the random variables; in Phase 2, the agents bind each random variable to some decision variable; in Phases 3 and 4, the agents run the UTIL and VALUE propagation phases as in DPOP, except that the random variables are sampled. Based on the strategy adopted to bind the random variables in Phase 2, the algorithm has two variants [Léauté &amp; Faltings 2009]. In Local-E[DPOP], a random variable is assigned to each decision variable responsible for enforcing a constraint involving