Distributed Constraint Optimization Problems and Applications: A Survey

The field of Multi-Agent Systems (MAS) is an active area of research within Artificial Intelligence, with an increasingly important impact on industrial and other real-world applications. Within a MAS, autonomous agents interact to pursue personal interests and/or to achieve common objectives. Distributed Constraint Optimization Problems (DCOPs) have emerged as one of the prominent agent architectures to govern the agents' autonomous behavior, where both algorithms and communication models are driven by the structure of the specific problem. During the last decade, several extensions to the DCOP model have enabled DCOPs to support MAS in complex, real-time, and uncertain environments. This survey provides an overview of the DCOP model, gives a classification of its multiple extensions, and addresses both the resolution methods and the applications that find a natural mapping within each class of DCOPs. The proposed classification suggests several future perspectives for DCOP extensions and identifies challenges in the design of efficient resolution algorithms, possibly through the adaptation of strategies from different areas.

1 Introduction

An agent can be defined as an entity (or computer program) that behaves autonomously within an arbitrary system in the pursuit of some goals [Wooldridge, 2009]. A multi-agent system (MAS) is a system where multiple agents interact in the pursuit of such goals. Within a MAS, agents may interact with each other directly, via communication acts, or indirectly, by acting on the shared environment. In addition, agents may decide to cooperate, to achieve a common goal, or to compete, to serve their own interests at the expense of other agents. In particular, agents may form cooperative teams, which can in turn compete against other teams of agents. Multi-agent systems play an important role in distributed artificial intelligence, thanks to their ability to model a wide variety of real-world scenarios, where information and control are decentralized and distributed among a set of agents.

Figure 1 illustrates a MAS example. It represents a sensor network where a group of agents, equipped with sensors, seeks to determine the position of some targets. Agents may interact with each other and move away from their current positions. The figure depicts the targets as star-shaped objects. The dotted lines define an interaction graph and the directional arrows illustrate agents' movements. In addition, various events that obstruct the sensors of an agent may occur dynamically. For instance, an obstacle within an agent's sensing range may be detected after the agent's movement.

Within a MAS, an agent is:

  • Autonomous, as it operates without the direct intervention of humans or other entities and has full control over its own actions and internal state (e.g., in the example, an agent can decide to sense, to move, etc.);

  • Interactant, in the sense that it interacts with other agents in order to achieve its objectives (e.g., in the example, agents may exchange information concerning results of sensing activities);

  • Reactive, as it responds to changes that occur in the environment and/or to requests from other agents (e.g., in the example, agents may react with a move action to the sudden appearance of obstacles); and

  • Proactive, because of its goal-driven behavior, which allows the agent to take initiatives beyond the reactions in response to its environment.

Figure 1: Illustration of a multi-agent system: Sensors (agents) seek to determine the position of the targets.

Agent architectures are the fundamental mechanisms underlying the autonomous agent components, supporting their behavior in real-world, dynamic, and uncertain environments. Agent architectures based on decision theory, game theory, and constraint programming have successfully been developed and are popular in the Autonomous Agents and Multi-Agent Systems (AAMAS) community.

Decision theory [Raiffa, 1968] assumes that the agent's actions and the environment are inherently uncertain and models such uncertainty explicitly. Agents acting in complex and dynamic environments are required to deal with various sources of uncertainty. The Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) framework [Bernstein et al., 2002] is one of the most general multi-agent frameworks, focused on team coordination in the presence of uncertainty about agents' actions and observations. The ability to capture a wide range of complex scenarios makes Dec-POMDPs of central interest within MAS research. However, the result of this generality is a high complexity for generating optimal solutions. Dec-POMDPs are non-deterministic exponential (NEXP) complete [Bernstein et al., 2002], even for two-agent problems, and scalability remains a critical challenge [Amato et al., 2013].

Game theory [Binmore, 1992] studies interactions between self-interested agents, aiming at maximizing the welfare of the participants. Some of the most compelling applications of game theory to MAS have been in the area of auctions and negotiations [Kraus, 1997; Noriega & Sierra, 1999; Parsons & Wooldridge, 2002]. These approaches model the trading process by which agents can reach agreements on matters of common interest, using market-oriented and cooperative mechanisms, such as reaching Nash equilibria. Typical resolution approaches aim at deriving a set of equilibrium strategies for each agent, such that, when these strategies are employed, no agent can profit by unilaterally deviating from its strategy. A limitation of game-theoretic approaches is the lack of an agent's ability to reason upon a global objective, as the underlying model relies on the interactions of self-interested agents.

Constraint programming [Rossi et al., 2006] aims at solving decision-making problems formulated as optimization problems over some real-world objective. Constraint programs use the notion of constraints – i.e., relations among entities of the problem (variables) – in both problem modeling and problem solving. Constraint programming relies on inference techniques that prevent the exploration of those parts of the solution search space whose assignments to variables are inconsistent with the constraints and/or dominated with respect to the objective function. Distributed Constraint Optimization Problems (DCOPs) [Modi et al., 2005; Petcu & Faltings, 2005b; Gershman et al., 2009; Yeoh & Yokoo, 2012] are problems where agents need to coordinate their value assignments, in a decentralized manner, to optimize their objective functions. DCOPs focus on attaining a global optimum given the interaction graph of a collection of agents. This approach can be effectively used to model a wide range of problems. Problem solving and communication strategies are directly linked in DCOPs. This feature makes the algorithmic components of a DCOP suitable for exploiting the structure of the interaction graph of the agents to generate efficient solutions.

The absence of a framework to model dynamic problems and uncertainty makes classical DCOPs unsuitable for solving certain classes of multi-agent problems, such as those characterized by action uncertainty and dynamic environments. However, since its original introduction, the DCOP model has undergone a process of continuous evolution to capture diverse characteristics of agent behavior and of the environment in which the agents operate. Researchers have proposed a number of DCOP frameworks that differ from each other in terms of expressiveness and the classes of problems they can target, extending the DCOP model to handle both dynamic and uncertain environments. However, current research has not explored how the different DCOP frameworks relate to each other within the general MAS context, which is critical to understand: (i) what resolution methods could be borrowed from other MAS paradigms, and (ii) what applications can be most effectively modeled within each framework. While there are important existing surveys for Distributed Constraint Satisfaction [Yokoo & Hirayama, 2000] and Distributed Constraint Optimization [Meisels, 2008], this survey aims to comprehensively analyze and categorize the different DCOP frameworks proposed by the MAS community. We do so by presenting an extensive review of the DCOP model and its extensions, the different resolution methods, as well as a number of applications modeled within each particular DCOP extension. This analysis also provides opportunities to identify open challenges and discuss future directions in the general DCOP research area.

List of key symbols

$a_i$           Agent                                        $\pi$          Projection operator
$x_i$           Decision variable                            $\mathbb{P}$   Probability function
$r_i$           Random variable                              $L_i$          $a_i$'s local variables
$D_i$           Domain of $x_i$                              $N_i$          $a_i$'s neighbors
$\Omega_i$      Event space of $r_i$                         $C_i$          $a_i$'s children
$f_j$           Cost function                                $PC_i$         $a_i$'s pseudo-children
$x^{f_j}$       Scope of $f_j$                               $P_i$          $a_i$'s parent
$p$             Number of agents                             $PP_i$         $a_i$'s pseudo-parents
$n$             Number of variables                          $A^{f_j}$      Agents whose variables are in $x^{f_j}$
$q$             Number of random variables                   $E_C$          Set of edges of the constraint graph
$m$             Number of cost functions                     $E_T$          Tree edges of the pseudo-tree
$d$             Size of the largest domain                   $E_F$          Set of edges of the factor graph
$F_g$           Global objective function                    $w^*$          Induced width of the pseudo-tree
$\mathbf{F}$    Vector of objective functions                $k$            Size of the largest neighborhood
$F_i$           Objective function in $\mathbf{F}$           $l$            Size of the largest local variable set
$\mathbf{u}$    Utopia point                                 $s$            Maximal sample size
$\infty$        Infeasible value                             $|P|$          Size of the Pareto set
$\sigma$        Complete assignment                          $b$            Size of the largest bin
$\sigma_V$      Partial assignment for the variables in $V$  $t$            Number of iterations of the algorithm
$\Sigma$        State space
Table 1: Commonly Used Symbols and Notations

This survey paper is organized as follows. The next section provides an overview of two relevant constraint satisfaction models and their generalizations to the distributed case. Section 3 introduces DCOPs, overviews the representation and coordination models adopted during the resolution of DCOPs, and proposes a classification of the different variants of DCOPs based on the characteristics of the agents and the environment. Section 4 presents the classical DCOP model as well as two notable extensions: one characterized by asymmetric cost functions and another by multi-objective optimization. Section 5 presents a DCOP model where the environment changes over time. Section 6 discusses DCOP models in which agents act under uncertainty and may have partial knowledge of the environment in which they act. Section 7 discusses DCOP models in which agents are non-cooperative. For each of these models, the paper introduces their formal definitions, discusses related concepts, and describes several resolution algorithms. A summary of the various classes of problems discussed in this survey is given in Table 5. Section 8 describes a number of applications that have been proposed in the DCOP literature. Section 9 provides a critical review of the DCOP variants surveyed and focuses on their applicability in various settings. Additionally, it describes some potential future directions for research. Finally, Section 10 provides concluding remarks. To facilitate the reading of this survey, Table 1 summarizes the most commonly used symbols and notations.

2 Overview of (Distributed) Constraint Satisfaction and Optimization

This section provides an overview of several constraint satisfaction models, which form the foundation of DCOPs. Figure 2 illustrates the relations among these models.

2.1 Constraint Satisfaction Problems

Constraint Satisfaction Problems (CSPs) [Golomb & Baumert, 1965; Mackworth & Freuder, 1985; Apt, 2003; Rossi et al., 2006] are decision problems that involve the assignment of values to variables, under a set of specified constraints on how the variable values should be related to each other. A number of problems can be formulated as CSPs, including resource allocation, vehicle routing, circuit diagnosis, scheduling, and bioinformatics. Over the years, CSPs have become the paradigm of choice to address difficult combinatorial problems, drawing and integrating insights from diverse domains, including artificial intelligence and operations research [Rossi et al., 2006].

A CSP is a tuple $\langle X, D, C \rangle$, where:

  • $X = \{x_1, \ldots, x_n\}$ is a finite set of variables.

  • $D = \{D_1, \ldots, D_n\}$ is a set of finite domains for the variables in $X$, with $D_i$ being the set of possible values for the variable $x_i$.

  • $C$ is a finite set of constraints over subsets of $X$, where a constraint $c_i$, defined on the $k$ variables $x_{i_1}, \ldots, x_{i_k}$, is a relation $c_i \subseteq D_{i_1} \times \cdots \times D_{i_k}$, where $k$ is its arity. The set of variables $x^{c_i} = \{x_{i_1}, \ldots, x_{i_k}\}$ is referred to as the scope of $c_i$ (the presence of a fixed ordering of the variables is assumed). $c_i$ is called a unary constraint if $k = 1$ and a binary constraint if $k = 2$. For all other values of $k$, the constraint is called a $k$-ary constraint (a constraint with $k = 3$ is also called a ternary constraint, and a constraint whose scope spans all the variables in $X$ is also called a global constraint).

A partial assignment is a value assignment for a proper subset of variables from $X$ that is consistent with their respective domains; i.e., it is a partial function $\sigma : X \to \bigcup_{i=1}^{n} D_i$ such that, for each $x_i \in X$, if $\sigma(x_i)$ is defined, then $\sigma(x_i) \in D_i$. An assignment is complete if it assigns a value to each variable in $X$. The notation $\sigma$ is used to denote a complete assignment, and, for a set of variables $V = \{x_{i_1}, \ldots, x_{i_k}\} \subseteq X$, $\sigma_V$ denotes the projection of the values in $\sigma$ onto the variables in $V$. The goal in a CSP is to find a complete assignment $\sigma$ such that, for each $c_i \in C$, $\sigma_{x^{c_i}} \in c_i$, that is, a complete assignment that satisfies all the problem constraints. Such a complete assignment is called a solution of the CSP.
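To make these definitions concrete, the following minimal Python sketch (a hypothetical map-coloring instance; all names are ours) represents a CSP as variables, domains, and constraints given as scope–relation pairs, and enumerates the complete assignments that satisfy every constraint:

    from itertools import product

    # A tiny map-coloring CSP: variables, finite domains, and constraints
    # given as (scope, relation), where the relation lists the allowed tuples.
    variables = ["x1", "x2", "x3"]
    domains = {"x1": ["r", "g"], "x2": ["r", "g"], "x3": ["r", "g"]}

    def different(scope):
        # Binary "not equal" constraint: the allowed tuples are the pairs
        # of distinct values drawn from the two variables' domains.
        return (scope, {(u, v) for u in domains[scope[0]]
                        for v in domains[scope[1]] if u != v})

    constraints = [different(("x1", "x2")), different(("x2", "x3"))]

    def is_solution(assignment):
        # A complete assignment is a solution iff its projection onto every
        # constraint's scope belongs to that constraint's relation.
        return all(tuple(assignment[x] for x in scope) in rel
                   for scope, rel in constraints)

    solutions = [dict(zip(variables, vals))
                 for vals in product(*(domains[x] for x in variables))
                 if is_solution(dict(zip(variables, vals)))]
    print(solutions)  # the two alternating colorings of the chain x1-x2-x3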

2.2 Weighted Constraint Satisfaction Problems

A solution of a CSP must satisfy all of its constraints. In many practical cases, however, it is desirable to consider complete assignments that may violate some constraints, with each violation measured by a degree of cost. The Weighted Constraint Satisfaction Problem (WCSP) [Shapiro & Haralick, 1981; Larrosa, 2002] was introduced to capture this property. WCSPs are problems whose constraints are treated as preferences that specify the extent of satisfaction (or violation) of the associated constraint.

A WCSP is a tuple $\langle X, D, \tilde{C} \rangle$, where $X$ and $D$ are the set of variables and their domains as defined in a CSP, and $\tilde{C}$ is a set of weighted constraints. A weighted constraint $\tilde{c}_i \in \tilde{C}$ is a function $\tilde{c}_i : \prod_{x_j \in x^{\tilde{c}_i}} D_j \to \mathbb{R}^+ \cup \{\infty\}$, where $x^{\tilde{c}_i}$ is the scope of $\tilde{c}_i$ and $\infty$ is a special element used to denote that a given combination of values for the variables in $x^{\tilde{c}_i}$ is not allowed; it has the property that $\infty + a = a + \infty = \infty$, for all $a \in \mathbb{R}^+ \cup \{\infty\}$. The cost of an assignment $\sigma$ is the sum of the evaluations of the constraints involving all the variables in $\sigma$. A solution is a complete assignment with cost different from $\infty$, and an optimal solution is a solution with minimal cost.

Thus, a WCSP is a generalization of a CSP which, in turn, can be seen as a WCSP whose constraints use exclusively the costs $0$ and $\infty$. The terms WCSP and Constraint Optimization Problem (COP) have been used interchangeably in the literature, and the use of the latter term has been widely adopted in recent years.
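Continuing in the same style, the sketch below (hypothetical data) replaces relations with cost tables, using $\infty$ to mark forbidden combinations, and retrieves an optimal solution by exhaustive evaluation:

    from itertools import product
    import math

    variables = ["x1", "x2"]
    domains = {"x1": [0, 1], "x2": [0, 1]}

    # Weighted constraints: (scope, cost table); math.inf marks forbidden tuples.
    constraints = [(("x1",), {(0,): 2, (1,): 0}),
                   (("x1", "x2"), {(0, 0): 1, (0, 1): math.inf,
                                   (1, 0): 3, (1, 1): 0})]

    def cost(assignment):
        # Cost of a complete assignment: the sum of each weighted constraint
        # evaluated on the projection of the assignment onto its scope.
        return sum(table[tuple(assignment[x] for x in scope)]
                   for scope, table in constraints)

    best = min((dict(zip(variables, vals))
                for vals in product(*(domains[x] for x in variables))),
               key=cost)
    print(best, cost(best))  # a solution, since its cost is finite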

Figure 2: DCOP Problems as a Generalization and Extension of Constraint Satisfaction Problems

2.3 Distributed Constraint Satisfaction Problems

When the elements of a CSP are distributed among a set of autonomous agents, the resulting model is referred to as a Distributed Constraint Satisfaction Problem (DisCSP) [Yokoo et al., 1998; Yokoo, 2001]. A DisCSP is a tuple $\langle X, D, C, A, \alpha \rangle$, where $X$, $D$, and $C$ are the set of variables, their domains, and the set of constraints, as defined in a CSP; $A = \{a_1, \ldots, a_p\}$ is a finite set of autonomous agents; and $\alpha : X \to A$ is a surjective function, from variables to agents, which assigns the control of each variable $x_i \in X$ to an agent $\alpha(x_i)$. The goal in a DisCSP is to find a complete assignment that satisfies all the constraints of the problem.

DisCSPs can be seen as an extension of CSPs to the multi-agent case, where agents communicate with each other to assign values to the variables they control so as to satisfy all the problem constraints. For a survey on the topic, the interested reader is referred to Chapter 20 of [Rossi et al., 2006].

2.4 Distributed Constraint Optimization Problems

Similar to the generalization of CSPs to COPs, the Distributed Constraint Optimization Problem (DCOP) model [Modi et al., 2005; Petcu & Faltings, 2005b; Gershman et al., 2009; Yeoh & Yokoo, 2012] emerges as a generalization of the DisCSP model, where constraints specify a degree of preference over their violation, rather than a Boolean satisfaction metric. DCOPs can also be viewed as an extension of the COP framework to the multi-agent case, where agents control variables and constraints, and need to coordinate the value assignment for the variables they control so as to optimize a global objective function. The DCOP framework is formally introduced in the next section.

3 DCOP Classification

The DCOP model has undergone a process of continuous evolution to capture diverse characteristics of the agent behavior and the environment in which agents operate. This section proposes a classification of DCOP models from a multi-agent systems perspective. It accounts for the different assumptions made about the behavior of the agents and their interactions with the environment. The classification is based on the following elements (summarized in Table 2):

Element                       Characterization
Agent(s)      Behavior        Deterministic / Stochastic
              Knowledge       Total / Partial
              Teamwork        Cooperative / Competitive
Environment   Behavior        Deterministic / Stochastic
              Evolution       Static / Dynamic
Table 2: DCOP Classification Elements
  • Agent Behavior: This parameter captures the stochastic nature of the effects of an action being executed. These effects can be either deterministic or stochastic.

  • Agent Knowledge: This parameter captures the knowledge of an agent about its own state and the environment. It can be total or partial (i.e., incomplete).

  • Agent Teamwork: This parameter characterizes the approach undertaken by (teams of) agents to solve a distributed problem. It can be either a cooperative or a competitive resolution approach. In the former class, all agents cooperate to achieve a common goal (i.e., they all optimize a global objective function). In the latter class, each agent (or team of agents) seeks to achieve its own individual goal (i.e., each agent optimizes its individual objective functions).

  • Environment Behavior: This parameter captures the exogenous properties of the environment. The response of the environment to the execution of an action can be either deterministic or stochastic.

  • Environment Evolution: This parameter captures whether the DCOP does not change over time (static) or it changes over time (dynamic).

Figure 3 illustrates a categorization of the DCOP models proposed to date from a MAS perspective. This survey focuses on the DCOP models proposed at the junction of constraint programming, game theory, and decision theory. The classical DCOP model is directly inherited from constraint programming as it extends the WCSP model to a distributed setting. It is characterized by a static model, a deterministic environment and agent behavior, a total agent knowledge, and a cooperative agent teamwork. Game theoretical concepts explored in the context of auctions and negotiations have influenced the DCOP framework leading to the development of the Asymmetric DCOP and the Multi-Objective DCOP. The DCOP framework has also borrowed fundamental decision theoretical concepts related to modeling uncertain and dynamic environments, resulting in models like the Probabilistic DCOP and the Dynamic DCOP. Researchers from the DCOP community have also designed solutions that inherit from all of the three communities.

The next sections describe the different DCOP frameworks, starting with classical DCOPs before proceeding to their various extensions. The survey focuses on a categorization based on three dimensions: agent knowledge, environment behavior, and environment evolution. It assumes a deterministic agent behavior, a fully cooperative agent teamwork, and a total agent knowledge (unless otherwise specified), as these are, by far, the most common assumptions adopted by the DCOP community. The DCOP models associated with this categorization are summarized in Table 3. The bottom-right entry of the table is left empty, indicating a promising model with dynamic and uncertain environments that, to the best of our knowledge, has not been explored yet. There has been only a modest amount of effort in modeling the different aspects of teamwork within the DCOP community. Section 7 describes a formalism that has been adopted to model DCOPs with mixed cooperative and competitive agents.

Environment              Environment Behavior
Evolution        Deterministic       Stochastic
Static           Classical DCOP      Probabilistic DCOP
Dynamic          Dynamic DCOP        —
Table 3: DCOP Models
Figure 3: DCOPs Within a MAS Perspective

4 Classical DCOP

With respect to the proposed categorization, in the classical DCOP model [Modi et al., 2005; Petcu & Faltings, 2005b; Gershman et al., 2009; Yeoh & Yokoo, 2012] the agents are fully cooperative and have deterministic behavior and total knowledge. Additionally, the environment is static and deterministic. This section reviews the formal definition of classical DCOPs, presents some relevant solving algorithms, and provides details of selected variants of classical DCOPs of particular interest.

4.1 Definition

A classical DCOP is described by a tuple $\langle A, X, D, F, \alpha \rangle$, where:

  • $A = \{a_1, \ldots, a_p\}$ is a finite set of agents.

  • $X = \{x_1, \ldots, x_n\}$ is a finite set of variables, with $n \geq p$.

  • $D = \{D_1, \ldots, D_n\}$ is a set of finite domains for the variables in $X$, with $D_i$ being the domain of variable $x_i$.

  • $F = \{f_1, \ldots, f_m\}$ is a finite set of cost functions, with $f_j : \prod_{x_i \in x^{f_j}} D_i \to \mathbb{R}^+ \cup \{\infty\}$, where, similar to WCSPs, $x^{f_j} \subseteq X$ is the set of variables relevant to $f_j$, referred to as the scope of $f_j$. The arity of a cost function is the number of variables in its scope. Each cost function $f_j$ represents a factor in a global objective function $F_g$. In the DCOP literature, the cost functions are also called constraints, utility functions, or reward functions.

  • $\alpha : X \to A$ is a total and onto function, from variables to agents, which assigns the control of each variable $x_i$ to an agent $\alpha(x_i)$.

With a slight abuse of notation, $A^{f_j}$ will be used to denote the set of agents whose variables are involved in the scope of $f_j$, i.e., $A^{f_j} = \{\alpha(x_i) \mid x_i \in x^{f_j}\}$. A partial assignment is a value assignment for a proper subset of variables of $X$. An assignment is complete if it assigns a value to each variable in $X$. For a given complete assignment $\sigma$, we say that a cost function $f_j$ is satisfied by $\sigma$ if $f_j(\sigma_{x^{f_j}}) \neq \infty$. A complete assignment is a solution of a DCOP if it satisfies all its cost functions. The goal in a DCOP is to find a solution that minimizes the total problem cost expressed by its cost functions (alternatively, one can define a maximization problem by substituting the $\min$ operator in Equation 1 with $\max$; typically, if the objective functions are referred to as utility functions or reward functions, the DCOP is a maximization problem, while if they are referred to as cost functions, the DCOP is a minimization problem):

$$\sigma^* = \operatorname*{argmin}_{\sigma \in \Sigma} F_g(\sigma) = \operatorname*{argmin}_{\sigma \in \Sigma} \sum_{f_j \in F} f_j(\sigma_{x^{f_j}}) \qquad (1)$$

where $\Sigma$ is the state space, defined as the set of all possible solutions.

Given an agent $a_i$, $L_i$ denotes the set of variables controlled by agent $a_i$, called its local variables, and $N_i$ denotes the set of its neighboring agents. A cost function $f_j$ is said to be hard if, for every assignment $\sigma$, $f_j(\sigma_{x^{f_j}}) \in \{0, \infty\}$. Otherwise, the cost function is said to be soft.

Finding an optimal solution for a classical DCOP is known to be NP-hard [Modi et al., 2005].
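To make the tuple $\langle A, X, D, F, \alpha \rangle$ and Equation 1 concrete, the following minimal sketch (a hypothetical two-agent instance; all names are ours) evaluates the global objective by brute force. This is a centralized check of the objective only, not a distributed resolution algorithm:

    from itertools import product
    import math

    agents = ["a1", "a2"]
    domains = {"x1": [0, 1], "x2": [0, 1]}
    alpha = {"x1": "a1", "x2": "a2"}  # each agent controls one variable

    # Cost functions as (scope, function) pairs; math.inf marks a hard constraint.
    F = [(("x1",), lambda v: 2 * v[0]),
         (("x1", "x2"), lambda v: math.inf if v == (0, 1) else abs(v[0] - v[1]))]

    def F_g(sigma):
        # Global objective (Equation 1): the sum of all cost functions, each
        # applied to the projection of sigma onto its scope.
        return sum(f(tuple(sigma[x] for x in scope)) for scope, f in F)

    Sigma = [dict(zip(domains, vals)) for vals in product(*domains.values())]
    print(min(Sigma, key=F_g))  # argmin over the state space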

4.2 DCOP: Representation and Coordination

Representation in DCOPs plays a fundamental role, both from an agent coordination perspective and from an algorithmic perspective. This section discusses the most predominant representations adopted in various DCOP algorithms. It starts by describing some widely adopted assumptions regarding agent knowledge and coordination, which will apply throughout this document, unless otherwise stated:

  • A variable and its domain are known exclusively to the agent controlling it and its neighboring agents.

  • Each agent knows the cost functions involving at least one of its local variables. No other agent has knowledge of such cost functions.

  • Each agent knows, and may communicate with, exclusively its own neighboring agents.

4.2.1 Constraint Graph

Given a DCOP $P$, $G_P = (X, E_C)$ is the constraint graph of $P$, where an undirected edge $\{x_i, x_j\} \in E_C$ exists if and only if there exists $f_k \in F$ such that $\{x_i, x_j\} \subseteq x^{f_k}$. A constraint graph is a standard way to visualize a DCOP instance. It underlines the agents' locality of interactions and is, therefore, commonly adopted by DCOP resolution algorithms.

Given an ordering $o$ on $X$, a variable $x_i$ is said to have a higher priority with respect to a variable $x_j$ if $x_i$ appears before $x_j$ in $o$. Given a constraint graph $G_P$ and an ordering $o$ on its nodes, the induced graph $G_P^*$ on $o$ is the graph obtained by connecting the nodes, processed in increasing order of priority, to all their higher-priority neighbors. For a given node, the number of its higher-priority neighbors is referred to as its width. The induced width $w^*$ of $G_P$ is the maximum width over all the nodes of $G_P^*$ on the ordering $o$.

Figure 4(a) shows an example constraint graph of a DCOP with four agents $a_1$ through $a_4$, each controlling one variable $x_i$ with domain $\{0, 1\}$. There are two cost functions: a 3-ary cost function with scope $\{x_1, x_2, x_3\}$, represented by a clique among $x_1$, $x_2$, and $x_3$; and a binary cost function involving $x_4$.


(a) Constraint Graph                 (b) Pseudo-Tree                        (c) Factor Graph

Figure 4: DCOP Representations
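The induced width just defined can be computed mechanically. The sketch below follows the definition above: nodes are processed from lowest to highest priority, and each node's higher-priority neighbors are counted (its width) and connected to one another. The edge set is one possible reading of Figure 4(a) and is therefore an assumption:

    # Width and induced width of an ordering (minimal sketch).
    def induced_width(nodes, edges, order):
        pos = {x: i for i, x in enumerate(order)}   # smaller index = higher priority
        adj = {x: set() for x in nodes}
        for u, v in edges:
            adj[u].add(v); adj[v].add(u)
        width = 0
        for x in reversed(order):                   # increasing order of priority
            higher = {y for y in adj[x] if pos[y] < pos[x]}
            width = max(width, len(higher))
            for u in higher:                        # connect higher-priority neighbors
                for v in higher:
                    if u != v:
                        adj[u].add(v)
        return width

    # Hypothetical reading of Figure 4(a): a clique on x1, x2, x3 plus x4.
    edges = [("x1", "x2"), ("x1", "x3"), ("x2", "x3"), ("x3", "x4")]
    print(induced_width(["x1", "x2", "x3", "x4"], edges,
                        ["x1", "x2", "x3", "x4"]))  # 2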

4.2.2 Pseudo-Tree

A number of DCOP algorithms require a partial ordering among the agents. In particular, when such an order is derived from a depth-first search (DFS) exploration, the resulting structure is known as a (DFS) pseudo-tree. A pseudo-tree arrangement for a DCOP $P$ is a subgraph $T_P = (X, E_T)$ of $G_P$ such that $T_P$ is a spanning tree of $G_P$ – i.e., a connected subgraph of $G_P$ containing all the nodes and being a rooted tree – with the following additional condition: for each pair of variables $x_i, x_j$ appearing in the scope of a common cost function, $x_i$ and $x_j$ appear in the same branch of $T_P$ (i.e., $x_i$ is an ancestor of $x_j$ in $T_P$ or vice versa). Edges of $G_P$ that are in (respectively, out of) $E_T$ are called tree edges (respectively, backedges). The tree edges connect parent-child nodes, while backedges connect a node with its pseudo-parents and its pseudo-children. The separator of an agent $a_i$ is the set containing all the ancestors of $a_i$ in the pseudo-tree (through tree edges or backedges) that are connected to $a_i$ or to one of its descendants. The notation $C_i$, $PC_i$, $P_i$, and $PP_i$ will be used to indicate the set of children, the set of pseudo-children, the parent, and the set of pseudo-parents of the agent $a_i$, respectively.

Both the constraint graph and the pseudo-tree representations cannot deal explicitly with $k$-ary cost functions (with $k > 2$). A typical artifact to deal with such cost functions in a pseudo-tree representation is to introduce a virtual variable that monitors the value assignments for all the variables in the scope of the cost function and generates the cost values [Bowring et al., 2006]; the role of the virtual variable can also be delegated to one of the variables participating in the cost function [Pecora et al., 2006; Matsui et al., 2008].

Figure 4(b) shows one possible pseudo-tree for the example DCOP in Figure 4(a). The solid lines are tree edges and the dotted lines are backedges.
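A DFS pseudo-tree can be built by a plain depth-first traversal of the constraint graph: traversal edges become tree edges, and every remaining edge is a backedge, which by the DFS property always connects a node to one of its ancestors. A minimal sketch, reusing the hypothetical adjacency from the previous example:

    def dfs_pseudo_tree(adj, root):
        parent, visited = {root: None}, {root}
        def dfs(u):
            for v in adj[u]:
                if v not in visited:
                    visited.add(v); parent[v] = u
                    dfs(v)
        dfs(root)
        tree = {tuple(sorted((parent[u], u))) for u in parent if parent[u]}
        all_edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
        return parent, tree, all_edges - tree     # backedges = non-tree edges

    adj = {"x1": ["x2", "x3"], "x2": ["x1", "x3"],
           "x3": ["x1", "x2", "x4"], "x4": ["x3"]}
    parent, tree, back = dfs_pseudo_tree(adj, "x1")
    print(parent)  # e.g., x1 -> x2 -> x3 -> x4 along tree edges
    print(back)    # {('x1', 'x3')}: a backedge to a pseudo-parent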

4.2.3 Factor Graph

Another way to represent DCOPs is through a factor graph [Kschischang et al., 2001]. A factor graph is a bipartite graph used to represent the factorization of a function. In particular, given the global objective function $F_g$, the corresponding factor graph is composed of variable nodes $x_i \in X$, factor nodes $f_j \in F$, and a set of edges $E_F$ such that there is an undirected edge between factor node $f_j$ and variable node $x_i$ if and only if $x_i \in x^{f_j}$.

Factor graphs can handle $k$-ary cost functions explicitly. To do so, they use a method similar to the one adopted within pseudo-trees for such cost functions: They delegate the control of a factor node to one of the agents controlling a variable in the scope of the cost function. From an algorithmic perspective, algorithms designed over factor graphs can directly handle $k$-ary cost functions, while algorithms designed over pseudo-trees require changes in the algorithm design so as to delegate the control of the $k$-ary cost functions to some particular entity.

Figure 4(c) shows the factor graph of the example DCOP in Figure 4(a), where each agent $a_i$ controls its variable $x_i$ and, in addition, the control of each of the two cost functions is delegated to one of the agents whose variables appear in its scope.
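Since a factor graph is just a bipartite adjacency structure, it can be derived directly from the cost functions' scopes. A short sketch, with scopes chosen as a hypothetical reading of the running example:

    # Factor graph: an edge between a factor and a variable whenever the
    # variable appears in the factor's scope (hypothetical scopes).
    scopes = {"f1": ("x1", "x2", "x3"), "f2": ("x3", "x4")}

    edges = [(f, x) for f, scope in scopes.items() for x in scope]
    factors_of = {}
    for f, x in edges:
        factors_of.setdefault(x, []).append(f)
    print(edges)       # the bipartite edge set E_F
    print(factors_of)  # each variable node's neighboring factor nodes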

4.3 Algorithms

The field of classical DCOPs is mature and a number of different resolution algorithms have been proposed. DCOP algorithms can be classified as being either complete or incomplete, based on whether they can guarantee the optimal solution or they trade optimality for shorter execution times, producing near-optimal solutions. They can also be characterized based on their runtime characteristics, their memory requirements, and their communication requirements (e.g., the number and size of the messages they send, and whether they communicate with their neighboring agents only or also with non-neighboring agents). Table 4 tabulates the properties of a number of key DCOP algorithms that are surveyed in Sections 4.3.4 and 4.3.5. An algorithm is said to be anytime if it can return a valid solution even if the DCOP agents are interrupted at any time before the algorithm terminates. Anytime algorithms are expected to find solutions of increasing quality as they keep running [Zivan et al., 2014].

All these algorithms were originally developed under the assumption that each agent controls exactly one variable, and the description of their properties follows the same assumption. These properties may change when generalizing the algorithms to allow agents to control multiple variables, and they will depend on how the algorithms are generalized. Throughout this document, the following notation will often be adopted when discussing the complexity of the algorithms:

  • $n$ refers to the number of variables in the problem; in Table 4, $n$ also refers to the number of agents in the problem, since each agent controls exactly one variable;

  • $d$ refers to the size of the largest domain;

  • $w^*$ refers to the induced width of the pseudo-tree;

  • $k$ refers to the largest number of neighboring agents; and

  • $t$ refers to the number of iterations in incomplete algorithms.

In addition, each of these classes can be categorized into several groups, depending on the degree of locality exploited by the algorithms, the way local information is updated, and the type of exploration process adopted. These different categories are described next.

Algorithm        Optimal?  Error Bound?  Complexity       Anytime?  Memory per Agent  # Messages      Message Size  Local Comm.?
SyncBB           yes       —             $O(d^n)$         no        $O(n)$            $O(d^n)$        $O(n)$        no
AFB              yes       —             $O(d^n)$         no        $O(n)$            $O(d^n)$        $O(n)$        no
ADOPT            yes       —             $O(d^n)$         no        $O(n \cdot d)$    $O(d^n)$        $O(n)$        yes
ConcFB           yes       —             $O(d^n)$         no        $O(n)$            $O(d^n)$        $O(n)$        no
DPOP             yes       —             $O(d^{w^*+1})$   no        $O(d^{w^*})$      $O(n)$          $O(d^{w^*})$  yes
OptAPO           yes       —             $O(d^n)$         no        $O(d^n)$          —               $O(n+d)$      no
Max-Sum          no        no            $O(d^k)$         no        $O(d^k)$          $O(n \cdot k)$  $O(d)$        yes
Region Optimal   no        yes           $O(d^{w^*+1})$   yes       $O(d^{w^*})$      $O(n \cdot k)$  $O(d^{w^*})$  no
MGM              no        no            $O(d \cdot k)$   yes       $O(k)$            $O(n \cdot k)$  $O(1)$        yes
DSA              no        no            $O(d \cdot k)$   yes       $O(k)$            $O(n \cdot k)$  $O(1)$        yes
DUCT             no        yes           $O(d \cdot k)$   yes       $O(d^n)$          $O(n)$          $O(n)$        yes
D-Gibbs          no        yes           $O(d \cdot k)$   yes       $O(n)$            $O(n)$          $O(1)$        yes
(For the incomplete algorithms, complexity and message counts are given per iteration.)
Table 4: Quality, Runtime, Memory, and Communication Characteristics of DCOP Algorithms

4.3.1 Partial Centralization

In general, the DCOP solving process is decentralized, driving DCOP algorithms to follow the agent knowledge and communication restrictions described in Section 4.2. However, some algorithms explore methods to centralize the decisions to be taken by a group of agents, by delegating them to one of the agents in the group. These algorithms explore the concept of partial centralization [Hirayama & Yokoo, 1997; Mailler & Lesser, 2004; Petcu et al., 2007], and thus they are classified as partially centralized algorithms. Typically, partial centralization improves the algorithms' performance, allowing agents to coordinate their local assignments more efficiently. However, such performance enhancement comes with a loss of information privacy, as the centralizing agent needs to be granted access to the local subproblems of the other agents in the group [Greenstadt et al., 2007; Mailler & Lesser, 2004]. In contrast, fully decentralized algorithms inherently preserve more information privacy, at the cost of a larger communication effort.

4.3.2 Synchronicity

DCOP algorithms can enhance their effectiveness by exploiting distributed and parallel processing. Based on the way the agents update their local information, DCOP algorithms are classified as synchronous or asynchronous. Asynchronous algorithms allow agents to update the assignments of their variables based solely on their local view of the problem, and thus independently from the actual decisions of the other agents [Modi et al., 2005; Farinelli et al., 2008; Gershman et al., 2009]. In contrast, synchronous algorithms constrain the agents' decisions to follow a particular order, typically enforced by the representation structure adopted [Mailler & Lesser, 2004; Petcu & Faltings, 2005b; Pearce & Tambe, 2007].

Synchronous algorithms tend to delay the actions of some agents, guaranteeing that their local view of the problem is always consistent with that of the other agents. In contrast, asynchronous algorithms tend to minimize the idle time of the agents, which in turn can react quickly to each message being processed; however, they provide no guarantee on the consistency of each agent's local view. This effect has been studied by Peri and Meisels (2013), who concluded that inconsistent agent views may have a negative impact on network load and algorithm performance, and that introducing some level of synchronization may be beneficial for some algorithms, enhancing their performance.

4.3.3 Exploration Process

The resolution process adopted by each algorithm can be classified into three categories [Yeoh, 2010]:

  • Search-based algorithms are based on the use of search techniques to explore the space of possible solutions. These algorithms are often derived from corresponding search techniques developed for centralized AI search problems, such as best-first search and depth-first search.

  • Inference-based algorithms are derived from dynamic programming and belief propagation techniques. These algorithms allow agents to exploit the structure of the constraint graph to aggregate costs from their neighbors, effectively reducing the problem size at each step of the algorithm.

  • Sampling-based algorithms are incomplete approaches that sample the search space to approximate a function (typically, a probability distribution) as a product of statistical inference.

Figure 5 illustrates a taxonomy of classical DCOP algorithms. The following subsections summarize some representative complete and incomplete algorithms from each of the classes introduced above. A detailed description of the DCOP algorithms is beyond the scope of this manuscript. The interested reader is referred to the original articles that introduce each algorithm.

Figure 5: Classical DCOP Algorithm Taxonomy

4.3.4 Complete Algorithms

Some of the algorithms described below were originally designed to solve the variant of DCOPs that maximizes rewards, while others solve the variant that minimizes costs. However, the algorithms that maximize rewards can be easily adapted to minimize costs. For consistency, this survey describes the versions of the algorithms that focus on the minimization of costs. It also describes their quality, runtime, memory, and communication characteristics, as summarized in Table 4.

SyncBB [Hirayama & Yokoo, 1997].   Synchronous Branch-and-Bound (SyncBB) is a complete, synchronous, search-based algorithm that can be considered as a distributed version of a branch-and-bound algorithm. It uses a complete ordering of the agents to extend a Current Partial Assignment (CPA) via a synchronous communication process. The CPA holds the assignments of all the variables controlled by all the visited agents and, in addition, functions as a mechanism to propagate bound information. The algorithm prunes those parts of the search space whose solution quality is sub-optimal, by exploiting the bounds that are updated at each step of the algorithm.

SyncBB agents perform $O(d^n)$ operations, since the lowest-priority agent needs to enumerate all possible value combinations of all variables. While, by default, it is not an anytime algorithm, it can easily be extended to have the anytime property, since it is a branch-and-bound algorithm. The memory requirement per SyncBB agent is $O(n)$, since the lowest-priority agent stores the value assignments of all problem variables. In terms of communication requirements, SyncBB agents send $O(d^n)$ messages: The lowest-priority agent enumerates all possible value combinations of all variables and sends a message for each combination. The largest message, which contains the value assignment of all variables, is of size $O(n)$. Finally, the communication model of SyncBB depends on the given complete ordering of the agents. Thus, agents may communicate with non-neighboring agents.
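The following single-process sketch captures the branch-and-bound recursion that SyncBB distributes along its agent ordering: the CPA is the prefix of assigned values, and a branch is pruned as soon as its cost reaches the best known bound. The two-variable instance and the cost_of_prefix helper are hypothetical:

    import math

    def branch_and_bound(order, domains, cost_of_prefix):
        best = {"cost": math.inf, "cpa": None}
        def extend(cpa):
            if cost_of_prefix(cpa) >= best["cost"]:
                return                              # prune: bound reached
            if len(cpa) == len(order):              # complete assignment
                best["cost"] = cost_of_prefix(cpa)
                best["cpa"] = dict(zip(order, cpa))
                return
            for value in domains[order[len(cpa)]]:  # extend the CPA
                extend(cpa + [value])
        extend([])
        return best

    # Hypothetical instance: a unary cost on x1 plus a penalty for equality.
    def cost_of_prefix(cpa):
        cost = 2 * cpa[0] if cpa else 0
        if len(cpa) >= 2 and cpa[0] == cpa[1]:
            cost += 5
        return cost

    print(branch_and_bound(["x1", "x2"], {"x1": [0, 1], "x2": [0, 1]},
                           cost_of_prefix))  # cost 0 with x1=0, x2=1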

AFB [Gershman et al., 2009].   Asynchronous Forward Bounding (AFB) is a complete, asynchronous, search-based algorithm. It can be considered as an asynchronous version of SyncBB. In this algorithm, agents communicate their cost estimates, which in turn are used to compute bounds and prune the search space. In AFB, agents extend a CPA sequentially, provided that the lower bound on its cost does not exceed the global upper bound, that is, the cost of the best solution found so far. Each agent performing an assignment (the "assigning" agent) triggers asynchronous checks of bounds by sending forward messages containing copies of the CPA to the agents that have not yet assigned their variables. The unassigned agents that receive a CPA estimate the lower bound of the CPA given their local view of the constraint graph and send their estimates back to the agent that originated the forward message. The assigning agent receives these estimates asynchronously and aggregates them into an updated lower bound. If the updated lower bound exceeds the current upper bound, the agent initiates a backtracking phase.

The runtime, memory, and communication characteristics of AFB are identical to those of SyncBB, for the same reasons. However, while both AFB and SyncBB agents communicate with non-neighboring agents, AFB agents broadcast some of their messages while SyncBB agents do not.

ADOPT [Modi et al., 2005].   Asynchronous Distributed OPTimization (ADOPT) is a complete, asynchronous, search-based algorithm. It can be considered as a distributed version of a memory-bounded best-first search algorithm. It makes use of a DFS pseudo-tree ordering of the agents. The algorithm relies on maintaining, in each agent, lower and upper bounds on the solution cost for the subtree rooted at its node in the DFS tree. Agents explore partial assignments in best-first order, that is, in increasing lower-bound order. They use COST messages (propagated upwards in the DFS pseudo-tree) and THRESHOLD and VALUE messages (propagated downwards in the pseudo-tree) to iteratively tighten the lower and upper bounds, until the lower bound of the minimum-cost solution is equal to its upper bound. ADOPT agents store lower bounds as thresholds, which can be used to prune partial assignments that are provably sub-optimal.

Similar to SyncBB and AFB, ADOPT agents perform $O(d^n)$ operations, since the lowest-priority agent needs to enumerate all possible value combinations of all variables when the pseudo-tree degenerates into a pseudo-chain. It is also not an anytime algorithm, as it is a best-first search algorithm. The memory requirement per ADOPT agent is $O(n + d \cdot |C_i|)$, where $O(n)$ is used to store a context, which is the value assignment of all higher-priority variables, and $O(d \cdot |C_i|)$ is used to store the lower and upper bounds for each domain value and each child agent. Finally, ADOPT agents communicate exclusively with their neighboring agents.

ADOPT has been extended in several ways. In particular, BnB-ADOPT [Yeoh et al., 2010; Gutierrez & Meseguer, 2012b] uses a branch-and-bound method to reduce the amount of computation performed during search, and ADOPT(k) combines both ADOPT and BnB-ADOPT into an integrated algorithm [Gutierrez et al., 2011]. There are also extensions that trade solution optimality for smaller runtimes [Yeoh et al., 2009a], extensions that use more memory for smaller runtimes [Yeoh et al., 2009b], and extensions that maintain soft arc consistency [Bessiere et al., 2012; Bessiere et al., 2014; Gutierrez & Meseguer, 2012a; Gutierrez et al., 2013].

Finally, the No-Commitment Branch and Bound (NCBB) algorithm [Chechetka & Sycara, 2006] can be considered as a variant of ADOPT and SyncBB. Similar to ADOPT, NCBB agents exploit the structure defined by a pseudo-tree order to decompose the global objective function. This allows the agents to search non-intersecting parts of the search space concurrently. Another main feature of NCBB is the eager propagation of lower bounds on the solution cost: An NCBB agent propagates its lower bound every time it learns about its ancestors' assignments. This feature provides an efficient pruning of the search space. The runtime, memory, and communication characteristics of NCBB are the same as those of ADOPT, except that NCBB is an anytime algorithm.

ConcFB [Netzer et al., 2012].   Concurrent Forward Bounding (ConcFB) is a complete, asynchronous, search-based algorithm that runs multiple parallel versions of AFB concurrently. By running multiple concurrent search procedures, it is able to quickly find a solution, apply a forward-bounding process to detect regions of the search space to prune, and dynamically create new search processes when detecting promising sub-spaces. Similar to AFB, it uses a complete ordering of agents and variables instead of pseudo-trees. As such, it is able to simplify the management of reordering heuristics, which can provide a substantial speed-up to the search process [Zivan & Meisels, 2006].

The algorithm operates as follows: Each agent maintains a global upper bound, which is updated during the search process. The highest-priority agent begins the process by generating a number of different search processes (SPs), one for each value of its variable. It then sends an LB_Request message to all unassigned agents. This LB_Request message contains the current CPA and triggers a calculation of the lower bounds at the receiving agents, which are sent back to the sender agent via an LB_Report message. If the sum of the aggregated costs and the current CPA cost is no smaller than the current upper bound, the agent selects another value for its variable and repeats the process. If the agent has exhausted all value assignments for its variable, then it backtracks, sending the CPA to the last assigning agent. If the CPA cost is lower than the current upper bound, then it forwards the CPA message to the next unassigned agent. Upon receiving a CPA message, the agent repeats the above process. When the lowest-priority agent finds a solution resulting in a new upper bound, it broadcasts the upper bound via a UB message, which is stored by each agent.

Netzer et al. (2012) described a series of enhancements that can be used to speed up the search process of ConcFB, including dynamic variable ordering and dynamic splitting. Although the process within each subproblem is carried out in a synchronous fashion, different subproblems are explored independently. Thus, the agents act asynchronously and concurrently. The runtime, memory, and communication characteristics of ConcFB are identical to those of AFB, since it runs multiple parallel versions of AFB concurrently.

DPOP [Petcu & Faltings, 2005b].   Distributed Pseudo-tree Optimization Procedure (DPOP) is a complete, synchronous, inference-based algorithm that makes use of a DFS pseudo-tree ordering of the agents. It involves three phases. In the first phase, the agents order themselves into a DFS pseudo-tree. In the second phase, called the UTIL propagation phase, each agent, starting from the leaves of the pseudo-tree, aggregates the costs in its subtree for each value combination of the variables in its separator. The aggregated costs are encoded in a UTIL message, which is propagated from children to their parents, up to the root. In the third phase, called the VALUE propagation phase, each agent, starting from the root of the pseudo-tree, selects the optimal value for its variable. The optimal values are calculated based on the UTIL messages received from the agent's children and the VALUE message received from its parent. The VALUE messages contain the optimal values of the agents and are propagated from parents to their children, down to the leaves of the pseudo-tree.

DPOP agents perform $O(d^{w^*+1})$ operations: when an agent optimizes for each value combination of the variables in its separator, it considers $O(d^{w^*})$ combinations, since there are up to $w^*$ variables in the separator set in the worst case. It is not an anytime algorithm, as it terminates upon finding its first solution, which is an optimal solution. The memory requirement per DPOP agent is $O(d^{w^*})$, since it stores all value combinations of the variables in its separator. In terms of communication requirements, DPOP agents send $O(n)$ messages in total; UTIL messages are propagated up the pseudo-tree and VALUE messages are propagated down the pseudo-tree. The largest message sent by an agent, which contains the aggregated costs in its subtree for each value combination of the variables in its separator, is of size $O(d^{w^*})$. Finally, DPOP agents communicate with their neighboring agents only.
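The sketch below simulates DPOP's two main phases centrally on a small chain-shaped pseudo-tree with hypothetical binary cost functions between each node and its parent; the util tables play the role of the UTIL messages, and the top-down pass plays the role of the VALUE messages:

    domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
    parent = {"x2": "x1", "x3": "x2"}                 # x1 is the root
    cost = {("x2", "x1"): lambda c, p: 1 if c == p else 0,
            ("x3", "x2"): lambda c, p: 2 * c + (3 if c == p else 0)}

    util = {}   # util[child][parent_value] = optimal cost of child's subtree
    for child in ["x3", "x2"]:                        # UTIL phase: leaves to root
        f = cost[(child, parent[child])]
        util[child] = {p: min(f(c, p) + sum(util[g][c] for g in util
                                            if parent[g] == child)
                              for c in domains[child])
                       for p in domains[parent[child]]}

    value = {"x1": min(domains["x1"],                 # root picks its best value
                       key=lambda v: sum(util[c][v] for c in util
                                         if parent[c] == "x1"))}
    for child in ["x2", "x3"]:                        # VALUE phase: root to leaves
        f = cost[(child, parent[child])]
        value[child] = min(domains[child],
                           key=lambda c: f(c, value[parent[child]])
                           + sum(util[g][c] for g in util if parent[g] == child))
    print(value)  # an optimal complete assignment for this instance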

DPOP has also been extended in several ways to enhance its performance and capabilities. O-DPOP and MB-DPOP trade runtimes for smaller memory requirements [Petcu & Faltings, 2006; Petcu & Faltings, 2007a], A-DPOP trades solution optimality for smaller runtimes [Petcu & Faltings, 2005a], SS-DPOP trades runtime for increased privacy [Greenstadt et al., 2007], PC-DPOP trades privacy for smaller runtimes [Petcu et al., 2007], H-DPOP propagates hard constraints for smaller runtimes [Kumar et al., 2008], BrC-DPOP enforces branch consistency for smaller runtimes [Fioretto et al., 2014], and ASP-DPOP is a declarative version of DPOP that uses Answer Set Programming [Le et al., 2015].

OptAPO [Mailler & Lesser, 2004].   Optimal Asynchronous Partial Overlay (OptAPO) is a complete, asynchronous, search-based algorithm. It trades agent privacy for smaller runtimes through partial centralization. It employs a cooperative mediation schema, where agents can act as mediators and propose value assignments to other agents. In particular, the agents check if there is a conflicting assignment with some neighboring agent. If a conflict is found, the agent with the highest priority acts as a mediator. During mediation, OptAPO solves subproblems using a centralized branch-and-bound-based search, and when solutions of overlapping subproblems still have conflicting assignments, the solving agents increase the degree of centralization to resolve them. By sharing their knowledge with centralized entities, agents can improve their local decisions, reducing the communication costs. For instance, the algorithm has been shown to be superior to ADOPT on simple combinatorial problems [Mailler & Lesser, 2004]. However, it is possible that several mediators solve overlapping problems, duplicating efforts [Petcu et al., 2007], which can be a bottleneck in dense problems.

OptAPO agents perform $O(d^n)$ operations in the worst case, as a mediator agent may end up solving the entire problem. Like ADOPT and DPOP, OptAPO is not an anytime algorithm. The memory requirement per OptAPO agent is $O(d^n)$ in the worst case, since it needs to store all value combinations of the variables in its mediation group, which may grow to include all problem variables. In terms of communication requirements, the number of messages sent decreases with increasing partial centralization. The size of the messages is bounded by $O(n + d)$: in the initialization phase of each mediation step, each agent sends its domain to its neighbors, together with the list of variables that it seeks to mediate. Finally, OptAPO agents can communicate with non-neighboring agents during the mediation phase.

The original version of OptAPO has been shown to be incomplete, due to the asynchronicity of the different mediators' groups, which can lead to race conditions. Grinshpoun and Meisels (2008) proposed a complete variant that remedies this issue.

4.3.5 Incomplete Algorithms

Max-Sum [Farinelli et al., 2008].   Max-Sum is an incomplete, synchronous, inference-based algorithm based on belief propagation. It operates on factor graphs by performing a marginalization process of the cost functions and optimizing the costs for each given variable. This process is performed by recursively propagating messages between variable nodes and factor nodes. The value assignments take into account their impact on the marginalized cost function. Max-Sum is guaranteed to converge to an optimal solution in acyclic graphs, but convergence is not guaranteed on cyclic graphs.

Max-Sum agents perform $O(d^k)$ operations in each iteration, where each agent needs to optimize over all value combinations of its neighboring variables. It is not an anytime algorithm. The memory requirement per Max-Sum agent is $O(d^k)$, since it needs to store all value combinations of its neighboring variables. In terms of communication requirements, in the worst case, each Max-Sum agent sends $O(k)$ messages in each iteration, one to each of its neighbors. Thus, the total number of messages sent across all agents is $O(n \cdot k)$ per iteration. Each message is of size $O(d)$, as it needs to contain the current aggregated costs for each of the values of the agent's variable. Finally, the agents communicate exclusively with their neighboring agents.
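The core of Max-Sum is the pair of message equations exchanged on the factor graph. In min-sum form (cost minimization), a variable-to-factor message sums the messages from the variable's other factors, and a factor-to-variable message minimizes the factor's cost plus the incoming messages over the other variables in its scope. A minimal sketch of the factor-to-variable update on a single hypothetical binary factor:

    from itertools import product

    domains = {"x1": [0, 1], "x2": [0, 1]}
    scope = ("x1", "x2")
    f = {(u, v): (0 if u != v else 2) for u, v in product([0, 1], repeat=2)}

    # Variable-to-factor messages, initialized to zero (first iteration).
    msg_var_to_fac = {x: {v: 0.0 for v in domains[x]} for x in scope}

    def factor_to_variable(target):
        # For each value of the target variable, minimize the factor cost plus
        # incoming messages over the other variables in the factor's scope.
        others = [x for x in scope if x != target]
        return {v: min(f[tuple(v if x == target else w[x] for x in scope)]
                       + sum(msg_var_to_fac[x][w[x]] for x in others)
                       for w in (dict(zip(others, vals))
                                 for vals in product(*(domains[x] for x in others))))
                for v in domains[target]}

    for x in scope:
        print(x, factor_to_variable(x))   # each variable's marginal costs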

Max-Sum has been extended in several ways. Bounded Max-Sum bounds the quality of the solutions found by removing a subset of edges from a cyclic DCOP graph to make it acyclic, and running Max-Sum to solve the acyclic problem [Rogers et al., 2011]; Improved Bounded Max-Sum improves on the error bounds [Rollon & Larrosa, 2012]; and Max-Sum_ADVP guarantees convergence in cyclic graphs through a two-phase value propagation [Zivan & Peled, 2012; Chen et al., 2017]. Max-Sum and its extensions have been successfully used to solve a number of large-scale, complex MAS applications (see Section 8).

Region Optimal [Pearce & Tambe, 2007].   Region-optimal algorithms are incomplete, synchronous, search-based algorithms that allow users to specify regions of the constraint graph and solve the subproblem within each region optimally. Regions may be defined to have a maximum size of $k$ agents [Pearce & Tambe, 2007], a maximum distance of $t$ hops from each agent [Kiekintveld et al., 2010], or a combination of both size and hops [Vinyals et al., 2011]. The concept of $k$-optimality is defined with respect to the number of agents whose assignments conflict: for two complete assignments $\sigma$ and $\sigma'$, the set of such agents is denoted by $\Delta(\sigma, \sigma')$. The deviating cost of $\sigma$ with respect to $\sigma'$, denoted by $\delta(\sigma, \sigma')$, is defined as the difference of the aggregated cost associated to the assignment $\sigma$ ($F_g(\sigma)$) minus the cost associated to $\sigma'$ ($F_g(\sigma')$). An assignment $\sigma$ is $k$-optimal if, for all $\sigma'$ such that $|\Delta(\sigma, \sigma')| \leq k$, we have that $\delta(\sigma, \sigma') \leq 0$. In contrast, the concept of $t$-distance emphasizes the number of hops from a central agent $a_i$ of the region $\Omega_t(a_i)$, that is, the set of agents that are separated from $a_i$ by at most $t$ hops. An assignment $\sigma$ is $t$-distance optimal if $\delta(\sigma, \sigma') \leq 0$ for all $\sigma'$ with $\Delta(\sigma, \sigma') \subseteq \Omega_t(a_i)$, for any agent $a_i$. Therefore, the solutions found have theoretical error bounds that are a function of $k$ and/or $t$. Region-optimal algorithms adopt a partially centralized resolution scheme in which the subproblem within each region is solved optimally by a centralized authority [Tassa et al., 2016]. However, this scheme can be altered to use a distributed algorithm to solve each subproblem.
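The definition of $k$-optimality can be checked directly, albeit only on tiny problems, by enumerating all assignments that deviate from $\sigma$ in at most $k$ variables, as in the hypothetical sketch below:

    from itertools import combinations, product

    # Brute-force k-optimality check: sigma is k-optimal if no assignment
    # differing in at most k variables has strictly lower global cost.
    def is_k_optimal(sigma, domains, global_cost, k):
        names = list(sigma)
        for r in range(1, k + 1):
            for subset in combinations(names, r):
                for vals in product(*(domains[x] for x in subset)):
                    neighbor = dict(sigma)
                    neighbor.update(zip(subset, vals))
                    if global_cost(neighbor) < global_cost(sigma):
                        return False
        return True

    # Hypothetical instance: two binary variables; cost favors disagreement.
    domains = {"x1": [0, 1], "x2": [0, 1]}
    cost = lambda s: (3 if s["x1"] == s["x2"] else 0) + s["x1"]
    print(is_k_optimal({"x1": 0, "x2": 0}, domains, cost, 1))  # False: flipping x2 helps
    print(is_k_optimal({"x1": 0, "x2": 1}, domains, cost, 2))  # True: global optimum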

Region-optimal agents perform $O(d^{w^*+1})$ operations in each iteration, as each agent runs DPOP to solve the problem within its region optimally. It is also an anytime algorithm, as solutions of improving quality are found until they are region-optimal. The memory requirement per region-optimal agent is $O(d^{w^*})$, since its region may have an induced width of $w^*$ and it uses DPOP to solve the problem within its region. In terms of communication requirements, each region-optimal agent sends $O(k)$ messages, one to each agent within its region. Thus, the total number of messages sent across all agents is $O(n \cdot k)$ per iteration. Each message is of size $O(d^{w^*})$, as it uses DPOP. Finally, the agents communicate with all the agents within their region. Thus, they may communicate with non-neighboring agents.

An asynchronous version of region-optimal algorithms, called Distributed Asynchronous Local Optimization (DALO), was proposed by Kiekintveld et al. (2010). The DALO simulator provides a mechanism to coordinate the decisions of local groups of agents based on the concepts of $k$-optimality and $t$-distance.

MGM [Maheswaran et al., 2004a].   Maximum Gain Message (MGM) is an incomplete, synchronous, search-based algorithm that performs a distributed local search. Each agent starts by assigning a random value to each of its variables. Then, it sends this information to all its neighbors. Upon receiving the values of its neighbors, it calculates the maximum gain (i.e., the maximum decrease in cost) achievable by changing its value, and sends this information to all its neighbors. Upon receiving the gains of its neighbors, the agent changes its value if its gain is the largest among those of its neighbors. This process repeats until a termination condition is met. MGM provides no quality guarantees on the returned solution.

MGM agents perform $O(d \cdot k)$ operations in each iteration, as each agent needs to compute the cost for each of its values, taking into account the values of all its neighbors. MGM is anytime, since agents only change their values when they have a non-negative gain. The memory requirement per MGM agent is $O(k)$: Each agent needs to store the values of all its neighboring agents. In terms of communication requirements, each MGM agent sends $O(k)$ messages, one to each of its neighboring agents. Thus, the total number of messages sent across all agents is $O(n \cdot k)$ per iteration. Each message is of constant size, as it contains either the agent's current value or the agent's current gain. Finally, the agents communicate exclusively with their neighboring agents.
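One synchronous MGM round can be simulated centrally as below (a hypothetical two-agent instance; real MGM exchanges value and gain messages rather than reading shared state). Ties between equal gains are broken by agent identifier, so that two neighbors never move simultaneously:

    def mgm_round(values, domains, local_cost, neighbors):
        gains, moves = {}, {}
        for a in values:                              # best local gain per agent
            best_v = min(domains[a], key=lambda v: local_cost(a, v, values))
            gains[a] = local_cost(a, values[a], values) - local_cost(a, best_v, values)
            moves[a] = best_v
        new_values = dict(values)
        for a in values:                              # move only if locally maximal
            if gains[a] > 0 and all((gains[a], a) > (gains[b], b)
                                    for b in neighbors[a]):
                new_values[a] = moves[a]
        return new_values

    neighbors = {"a1": ["a2"], "a2": ["a1"]}
    local_cost = lambda a, v, vals: sum(v == vals[b] for b in neighbors[a])
    print(mgm_round({"a1": 0, "a2": 0}, {"a1": [0, 1], "a2": [0, 1]},
                    local_cost, neighbors))  # only one of the two agents moves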

DSA [Zhang et al., 2005].   Distributed Stochastic Algorithm (DSA) is an incomplete, synchronous, search-based algorithm that is similar to MGM, except that each agent does not send its gains to its neighbors and does not deterministically change its value to the value with the maximum gain. Instead, it decides stochastically whether to take on the value with the maximum gain or other values with smaller gains. This stochasticity allows DSA to escape from local minima. Like MGM, it repeats the process until a termination condition is met, and it cannot provide quality guarantees on the returned solution. The runtime, memory, and communication characteristics of DSA are identical to those of MGM, since it is essentially a stochastic variant of MGM.
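The difference from MGM is confined to the decision step, sketched below; the activation probability `p = 0.7` is illustrative, and the precise rule varies across the DSA variants (A through E) described in the literature:

```python
import random

def dsa_decide(current_value, best_value, gain, p=0.7, rng=random):
    """DSA's stochastic decision step: no gain messages are exchanged;
    an agent that found an improving value adopts it only with
    activation probability p. `gain` is the cost decrease of moving
    to `best_value`, computed exactly as in the MGM sketch above."""
    if gain > 0 and rng.random() < p:
        return best_value
    return current_value
```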

DUCT [Ottens et al., 2017].   The Distributed Upper Confidence Tree (DUCT) algorithm is an incomplete, synchronous, sampling-based algorithm that is inspired by Monte-Carlo Tree Search and employs confidence bounds to solve DCOPs. DUCT emulates a search process analogous to that of ADOPT, where agents select the values to assign to their variables according to the information encoded in their context messages (i.e., the assignments to all the variables in the receiving variable's separator). However, rather than systematically selecting the next value to assign to their own variables, DUCT agents sample such values. To focus on promising assignments, DUCT constructs, for each value, a confidence bound such that the cost associated to the best value for any given context is, with high probability, at least as large as the bound; agents then sample the value with the lowest bound. This process is started by the root agent of the pseudo-tree: After sampling a value for its variable, it communicates its assignment to its children in a context message. When an agent receives this message, it repeats this process, until the leaf agents are reached. When the leaf agents choose a value assignment, they calculate the cost within their context and propagate this information up the tree in a cost message. This process continues for a given number of iterations or until convergence is achieved, i.e., until the sampled values in two successive iterations do not change. As a result, DUCT is able to provide quality guarantees on the returned solution.
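The value-selection step can be illustrated with a generic UCB-style lower-confidence rule; this is only a sketch under that assumption, as the exact bound used by DUCT differs in its constants and in how contexts are handled:

```python
import math

def duct_pick_value(domain, mean_cost, count, total_samples, scale=1.0):
    """Pick the value with the lowest optimistic (lower-confidence) cost
    bound for the current context (total_samples >= 1 is assumed).

    mean_cost[v]: empirical mean cost observed when sampling value v
    count[v]:     number of times v has been sampled in this context
    """
    def lower_bound(v):
        if count[v] == 0:
            return float("-inf")  # unsampled values are maximally optimistic
        return mean_cost[v] - scale * math.sqrt(math.log(total_samples) / count[v])
    return min(domain, key=lower_bound)
```

Values that have been sampled rarely receive a generous (low) bound and are therefore explored, while well-sampled values are chosen on the strength of their empirical mean cost.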

DUCT agents perform $O(d \cdot n)$ operations in each iteration, where $d$ is the maximum domain size and $n$ is the maximum number of neighbors, as each agent needs to compute the cost for each of its values by taking into account the values of all its neighbors. It is an anytime algorithm: The quality guarantee improves with an increasing number of iterations. The memory requirement per DUCT agent is $O(d^a)$ (more precisely, $O(d^h)$, where $h$ is the depth of the pseudo-tree; in the worst case, when the pseudo-tree degenerates into a pseudo-chain, $h$ equals the number of agents $a$), since it needs to store the best cost for all possible contexts. In terms of communication requirement, in each iteration, each DUCT agent sends one message to its parent in the pseudo-tree and one message to each of its children in the pseudo-tree. Thus, the total number of messages sent across all agents is $O(a)$ per iteration. Each message is of size $O(a)$, as context messages contain the value assignments of all higher-priority agents. Finally, the agents communicate exclusively with their neighboring agents.

D-Gibbs [Nguyen et al., 2013].   The Distributed Gibbs (D-Gibbs) algorithm is an incomplete, synchronous, sampling-based algorithm that extends the Gibbs sampling process [Geman & Geman, 1984] by tailoring it to solve DCOPs in a decentralized manner. The Gibbs sampling process is a centralized Markov Chain Monte-Carlo algorithm that can be used to approximate joint probability distributions. By mapping DCOPs to maximum a-posteriori estimation problems, probabilistic inference algorithms like Gibbs sampling can be used to solve DCOPs.

Like DUCT, it too operates on a pseudo-tree, and the agents sample sequentially from the root of the pseudo-tree down to the leaves. Like DUCT, each agent also stores a context (i.e., the current assignment to all the variables in its separator) and it samples based on this information. Specifically, it computes the probability for each of its values given its context and chooses its current value based on this probability distribution. After it chooses its value, it informs its lower priority neighbors of its value, and its children agents start to sample. This process continues until all the leaf agents sample. Cost information is propagated up the pseudo-tree. This process continues for a fixed number of iterations or until convergence. Like DUCT, D-Gibbs is also able to provide quality guarantees on the returned solution.
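The per-agent sampling step can be sketched as follows, assuming a Boltzmann distribution over local costs; the `temperature` parameter and the helper `local_cost` (which sums the agent's function costs under its current context) are illustrative:

```python
import math, random

def gibbs_sample_value(domain, local_cost, rng=random, temperature=1.0):
    """Sample a value with probability proportional to exp(-cost/temperature),
    where local_cost(v) is the aggregated cost of the agent's functions
    given the current values of its separator/neighbors."""
    weights = [math.exp(-local_cost(v) / temperature) for v in domain]
    total = sum(weights)
    r, acc = rng.random() * total, 0.0
    for v, w in zip(domain, weights):
        acc += w
        if r <= acc:
            return v
    return domain[-1]  # guard against floating-point round-off
```

Low-cost values are exponentially more likely to be drawn, which is what biases the Markov chain toward low-cost (high-probability) joint assignments.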

The runtime characteristics of D-Gibbs are identical to those of DUCT, and for the same reasons. However, its memory requirements are smaller: The memory requirement per D-Gibbs agent is $O(n)$, where $n$ is the maximum number of neighbors, since it needs to store only the current values of all its neighbors. In terms of communication requirement, in each iteration, each D-Gibbs agent sends $O(n)$ messages, one to each of its neighbors. Thus, the total number of messages sent across all agents is $O(n \cdot a)$ per iteration. Each message is of constant size, since it contains only the current value of the agent or the partial cost of its solution. Finally, the agents communicate exclusively with their neighboring agents.

A version of the algorithm that speeds up the agents' sampling process with Graphics Processing Units (GPUs) is described in [Fioretto et al., 2016a].

4.4 Tradeoffs Between the Various DCOP Algorithms

The various DCOP algorithms discussed above provide good coverage across the characteristics that may be important in different applications. As such, how well suited an algorithm is for an application depends on how well the algorithm's characteristics match the application's requirements. The following subsections offer several suggestions for the types of algorithms recommended based on the characteristics of the application at hand.

4.4.1 Complete Algorithms

When optimality is a requirement of the application, then one is limited to complete algorithms:

  • If the agents in the application have large amounts of memory and it is faster to send a few large messages than many small messages, then inference-based algorithms (e.g., DPOP and its extensions) are preferred over search-based algorithms (e.g., SyncBB, AFB, ADOPT, ConcFB, OptAPO). This is because, in general, search algorithms perform some amount of redundant communication. Thus, for a given problem instance, the overall runtime of inference-based algorithms tends to be smaller than the runtime of search-based ones.

  • If the agents in the application have limited amounts of memory, then one has to use the search-based algorithms (e.g., SyncBB, AFB, ADOPT, ConcFB, OptAPO), which have small memory requirements. The exception is when the problem has a small induced width (e.g., the constraint graph is acyclic), in which case inference-based algorithms (e.g., DPOP) are also preferred.

4.4.2 Incomplete Algorithms

In terms of incomplete algorithms, the following recommendations are given:

  • If the solution returned must have an accompanying quality guarantee, then, one can choose to use Bounded Max-Sum, region-optimal algorithms, DUCT, or D-Gibbs. Bounded Max-Sum allows users to choose the error bound as a function of the different subsets of edges that can be removed from the graph to make it acyclic. Region-optimal algorithms allow users to parameterize the error bound according to the size of the region or the number of hops that the solution should be optimal for. Finally, DUCT and D-Gibbs allow users to parameterize the error bound based on the number of sampling iterations to conduct. The error bounds for these two algorithms are also probabilistic bounds (i.e., the likelihood that the quality of the solution is within an error bound is a function of the number of iterations). Therefore, the choice of algorithm will depend on the type of error bound one would like to impose on the solutions. One may also choose to use a number of extensions of complete algorithms (e.g., Weighted (BnB-)ADOPT and A-DPOP) that allow users to parameterize the error bound and affect the degree of speedup.

  • If a solution quality guarantee is not required, then one can also use Max-Sum, MGM, or DSA. Their performance depends on a number of factors: If the problem has large domain sizes, MGM and DSA often outperform Max-Sum, since the memory and computational complexities of Max-Sum grow much faster with the domain size than those of MGM and DSA. However, if the problem has small induced widths (for instance, when its constraint graph is acyclic), then Max-Sum is very efficient. It is even guaranteed to find optimal solutions when the induced width is 1. In general, Max-Sum tends to find solutions of good quality, especially when considering its recent improvements (e.g., [Zivan et al., 2017]).

  • If the problem has hard constraints (i.e., certain value combinations are prohibited), then the sampling algorithms (i.e., DUCT and D-Gibbs) are not recommended as they are not able to handle such problems. They require the cost functions to be smooth, and exploit that characteristic to explore the search space. Thus, one is restricted to search- or inference-based algorithms.

  • In general, MGM and DSA are good robust benchmarks as they tend to find reasonably high quality solutions in practice. However, if specific problem characteristics are known, such as the ones discussed above, then certain algorithms may be able to exploit them to find better solutions.

4.5 Notable Variant: Asymmetric DCOPs

Asymmetric DCOPs [Grinshpoun et al., 2013] are used to model multi-agent problems where the agents controlling the variables in the scope of a cost function can incur different costs, given a fixed joint assignment. Such a problem cannot be naturally represented by classical DCOPs, which require that all agents controlling variables participating in a cost function incur the same cost.

4.5.1 Definition

An Asymmetric DCOP is a tuple whose agents, variables, domains, and variable-to-agent mapping are as defined in Definition 4.1, and where each cost function maps each combination of value assignments of the variables in its scope to a vector of costs, one for each agent controlling a variable in that scope. In other words, an Asymmetric DCOP is a DCOP where the cost that an agent incurs from a cost function may differ from the cost that another agent incurs from the same cost function.

As costs for participating agents may differ from each other, the goal in Asymmetric DCOPs is different from the goal in classical DCOPs. Given a cost function $f_j$ and a complete assignment $x$, let $f_j^i(x)$ denote the cost incurred by agent $a_i$ from cost function $f_j$ under the complete assignment $x$. Then, the goal in Asymmetric DCOPs is to find the solution:

$x^* = \operatorname*{argmin}_{x} \sum_{f_j \in F} \sum_{a_i \in A(f_j)} f_j^i(x)$   (2)

where $A(f_j)$ denotes the set of agents controlling variables in the scope of $f_j$.
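The following sketch makes the objective concrete: each cost function carries one cost table per participating agent, and the goal sums all sides. The data layout (`functions` as a list of scope/side-cost pairs) is purely illustrative:

```python
def asymmetric_cost(assignment, functions):
    """Total cost of a complete assignment in an Asymmetric DCOP.

    functions: list of (scope, side_costs) pairs, where side_costs maps
    each participating agent to its own cost table over the scope's values.
    """
    total = 0
    for scope, side_costs in functions:
        values = tuple(assignment[x] for x in scope)
        # Unlike classical DCOPs, each agent contributes its own cost
        # for the same value combination.
        total += sum(table[values] for table in side_costs.values())
    return total

# Example: one binary function between x1 and x2 with asymmetric sides.
functions = [
    (('x1', 'x2'), {
        'a1': {(0, 0): 5, (0, 1): 2, (1, 0): 3, (1, 1): 0},
        'a2': {(0, 0): 1, (0, 1): 4, (1, 0): 0, (1, 1): 6},
    }),
]
print(asymmetric_cost({'x1': 1, 'x2': 0}, functions))  # 3 + 0 = 3
```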

As in classical DCOPs, solving Asymmetric DCOPs is NP-hard. In particular, it is possible to reduce any Asymmetric DCOP to an equivalent classical DCOP by introducing a polynomial number of variables and constraints, as described in the next section.

4.5.2 Relation to Classical DCOPs

One way to solve MAS problems with asymmetric costs via classical DCOPs is through the Private Event As Variables (PEAV) model [Maheswaran et al., 2004a]. It can capture asymmetric costs by introducing, for each agent, as many “mirror” variables as the number of variables held by neighboring agents. Consistency with the neighbors' state variables is imposed by a set of equality constraints. However, this formalism suffers from scalability problems, as it may result in a significant increase in the number of variables in a DCOP. In addition, Grinshpoun et al. (2013) showed that most of the existing incomplete classical DCOP algorithms cannot be used to effectively solve Asymmetric DCOPs, even when the problems are reformulated through the PEAV model. They showed that such algorithms are unable to distinguish between different solutions that satisfy all hard constraints, resulting in convergence to one of those solutions and the inability to escape that local optimum. Therefore, it is important to design specialized algorithms to solve Asymmetric DCOPs.

4.5.3 Algorithms

Research on the design of Asymmetric DCOP algorithms has focused on adapting existing classical DCOP algorithms to handle asymmetric costs. Asymmetric DCOPs require that the agents whose variables participate in a cost function coordinate the aggregation of their individual costs. To do so, two approaches have been identified [Brito, Meisels, Meseguer, & Zivan, 2009]:

  • A two-phase strategy, where only one side of the constraint (i.e., the cost induced by one agent) is considered in the first phase. The other side(s) (i.e., the cost induced by the other agent(s)) is considered in the second phase once a complete assignment is produced. As a result, the costs of all agents are aggregated.

  • A single-phase strategy, which requires a systematic check of each side of the constraint before reaching a complete assignment. Checking each side of the constraint is often referred to as back checking, a process that can be performed either synchronously or asynchronously.


Complete Algorithms

SyncABB-2ph [Grinshpoun et al., 2013].   Synchronous Asymmetric Branch and Bound - 2-phase (SyncABB-2ph) is a complete, synchronous, search-based algorithm that extends SyncBB with the two-phase strategy. Phase 1 emulates SyncBB, where each agent considers the values of its cost functions with higher-priority agents. Phase 2 starts once a complete assignment is found. During this phase, each agent aggregates the sides of the cost functions that were not considered during Phase 1 and verifies that the known bound is not exceeded. If the bound is exceeded, Phase 2 ends and the agents restart Phase 1 by backtracking and resuming the search from the lower-priority agent that exceeded the bound. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB.

SyncABB-1ph [Grinshpoun et al., 2013; Levit, Grinshpoun, Meisels, & Bazzan, 2013].   Synchronous Asymmetric Branch and Bound - 1-phase (SyncABB-1ph) is a complete, synchronous, search-based algorithm that extends SyncBB with the one-phase strategy. Each agent, after having extended the CPA, updates the bound with the local cost associated to the cost functions involving its variables, as done in SyncBB. In addition, the CPA is sent back to the already-assigned agents to update its bound via a sequence of back checking operations. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB.

ATWB [Grinshpoun et al., 2013].   The Asymmetric Two-Way Bounding (ATWB) algorithm is a complete, asynchronous, search-based algorithm that extends AFB to accommodate both forward bounding and backward bounding. The forward bounding is performed analogously to AFB. The backward bounding, instead, is achieved by sending copies of the CPA backward to the agents whose assignments are included in the CPA. Similar to what is done in AFB, agents that receive a copy of the CPA compute their estimates and send them forward to the assigning agent. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of AFB.


Incomplete Algorithms

ACLS [Grinshpoun et al., 2013].   Asymmetric Coordinated Local Search (ACLS) is an incomplete, synchronous, search-based algorithm that extends DSA. After a random value initialization, each agent exchanges its values with all its neighboring agents. At the end of this step, each agent identifies all possible improving assignments for its own variables, given the current neighbors' choices. Each agent then selects one such assignment, according to the distribution of gains (i.e., reductions in costs) of each proposal assignment, and exchanges it with its neighbors. When an agent receives a proposal assignment, it responds with the evaluation of its side of the cost functions, resulting from its current assignment and the proposal assignments of the other agents participating in the cost function. After receiving the evaluations from each of its neighbors, each agent estimates the potential gain or loss derived from its assignment, and commits to a change with a given probability, similar to agents in DSA, to escape from local minima. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of DSA.

MCS-MGM [Grinshpoun et al., 2013].   Minimal Constraint Sharing MGM (MCS-MGM) is an incomplete, synchronous, search-based algorithm that extends MGM by considering each side of the cost function. Like MGM, the agents operate in an iterative fashion, where they exchange their current values at the start of each iteration. Afterwards, each agent sends the cost for its side of each cost function to its neighboring agents that participate in the same cost function. (This is the version of the algorithm that is guaranteed to converge to a local optimum. In the original version, which lacks such a guarantee, each agent sends the cost only if its gain with the neighbor's new values is larger than the neighbor's last known gain.) Upon receiving this information, each agent knows the total cost of each cost function, obtained by adding together the values of both sides of the cost function. Therefore, as in MGM, the agent can calculate the maximum gain (i.e., maximum reduction in cost) achievable by changing its values, and it sends this information to all its neighbors. Upon receiving the gains of its neighbors, each agent changes its value if its gain is the largest among its neighbors. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of MGM.

4.6 Notable Variant: Multi-Objective DCOPs

Multi-Objective Optimization (MOO) [Miettinen, 1999; Marler & Arora, 2004] aims at solving problems involving more than one objective function to be optimized simultaneously. In a MOO problem, optimal decisions need to accommodate potentially conflicting objectives. Multi-Objective DCOPs (MO-DCOPs) combine MOO problems and DCOPs [Delle Fave, Stranders, Rogers, & Jennings, 2011].

4.6.1 Definition

A Multi-Objective DCOP (MO-DCOP) is a tuple whose agents, variables, domains, and variable-to-agent mapping are as defined in Definition 4.1, and whose objective is described by a vector $\mathbf{F} = [F_1, \ldots, F_h]^\top$ of multi-objective functions, where each $F_i$ is a set of cost functions as defined in Definition 4.1. For a complete assignment $x$ of a MO-DCOP, the cost of $x$ according to the $i$-th objective ($1 \leq i \leq h$) is

$F_i(x) = \sum_{f_j \in F_i} f_j(x)$   (3)

The goal of a MO-DCOP is to find a complete assignment $x^*$ such that:

$x^* = \operatorname*{argmin}_{x} \; \mathbf{F}(x) = [F_1(x), \ldots, F_h(x)]^\top$   (4)

where $\mathbf{F}(x)$ is the cost vector of the MO-DCOP for assignment $x$. A solution to a MO-DCOP involves the optimization of a set of partially-ordered assignments. The above definition considers point-wise comparison of vectors, i.e., $\mathbf{F}(x) \leq \mathbf{F}(x')$ iff $F_i(x) \leq F_i(x')$ for all $i$. Typically, there is no single global solution where all the objectives are optimized at the same time. Thus, solutions of a MO-DCOP are characterized by the concept of Pareto optimality, which can be defined through the concept of dominance:

Definition 1 (Dominance)

A solution $x$ is dominated by a solution $x'$ iff $F_i(x') \leq F_i(x)$ for all $i$, and $F_i(x') < F_i(x)$ for at least one $i$.

Definition 2 (Pareto Optimality)

A solution is Pareto optimal iff it is not dominated by any other solution.

Therefore, a solution is Pareto optimal iff there is no other solution that improves at least one objective function without deteriorating the cost of another function. Another important concept is the Pareto front:

Definition 3 (Pareto Front)

The Pareto front is the set of all cost vectors of all Pareto optimal solutions.

Solving a MO-DCOP is equivalent to finding the Pareto front. However, even for tree-structured MO-DCOPs, the size of the Pareto front may be exponential in the number of variables (in the worst case, every possible solution is Pareto optimal). Thus, multi-objective algorithms often provide solutions that may not be Pareto optimal but may satisfy other criteria that are significant for practical applications. A widely-adopted criterion is that of weak Pareto optimality:

Definition 4 (Weak Pareto Optimality)

A solution $x$ is weakly Pareto optimal iff there is no other solution $x'$ such that $F_i(x') < F_i(x)$ for all $i$.

In other words, a solution is weakly Pareto optimal if there is no other solution that improves all of the objective functions simultaneously. An alternative approach to Pareto optimality is one that uses the concept of utopia points:

Definition 5 (Utopia Point)

A cost vector $\mathbf{u} = [u_1, \ldots, u_h]^\top$ is the utopia point iff $u_i = \min_x F_i(x)$ for all $i$.

In other words, the utopia point is the vector of costs obtained by independently solving $h$ mono-objective DCOPs, each associated to one objective of the multi-objective function vector. In general, $\mathbf{u}$ is unattainable. Therefore, different approaches focus on finding a compromise solution [Salukvadze, 1971], that is, a Pareto optimal solution that is close to the utopia point. The notion of closeness depends on the approach adopted.
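The following sketch operationalizes Definitions 1, 3, and 5 over an explicitly enumerated set of cost vectors; this enumeration is only for illustration, since, as noted above, the front can be exponentially large:

```python
def dominates(c1, c2):
    """c1 dominates c2 iff c1 is no worse in every objective and strictly
    better in at least one (Definitions 1-2, cost minimization)."""
    return all(a <= b for a, b in zip(c1, c2)) and any(a < b for a, b in zip(c1, c2))

def pareto_front(cost_vectors):
    """Filter a set of cost vectors by dominance (Definition 3)."""
    return [c for c in cost_vectors
            if not any(dominates(o, c) for o in cost_vectors if o != c)]

def utopia_point(cost_vectors):
    """Component-wise minimum over all cost vectors (Definition 5);
    in general it is not attained by any single solution."""
    return tuple(min(c[i] for c in cost_vectors) for i in range(len(cost_vectors[0])))

vecs = [(1, 4), (2, 2), (3, 1), (3, 3)]
print(pareto_front(vecs))   # [(1, 4), (2, 2), (3, 1)]  ((3, 3) is dominated)
print(utopia_point(vecs))   # (1, 1), attained by no single solution here
```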

Similar to their centralized counterparts, MO-DCOPs have been shown to be NP-hard (their decision versions) and #P-hard (the related counting versions), and to have exponentially many non-dominated points [Glaßer, Reitwießner, Schmitz, & Witek, 2010].

4.6.2 Algorithms

This section categorizes the proposed MO-DCOP algorithms into two classes: complete and incomplete algorithms, according to their ability to find the complete set of Pareto optimal solutions or only a subset of it.

Complete Algorithms

MO-SBB [Medi, Okimoto, & Inoue, 2014].   Multi-Objective Synchronous Branch and Bound (MO-SBB) is a complete, synchronous, search-based algorithm that extends SyncBB. It uses a search strategy analogous to that of the mono-objective SyncBB: After establishing a complete ordering, MO-SBB agents extend a CPA with their own value assignments and the associated cost vectors. Once a non-dominated solution is found, it is broadcast to all agents, which add the solution to a list of global bounds. Thus, agents maintain an approximation of the Pareto front, which is used to bound the exploration, and extend the CPA only if the new partial assignment is not dominated by a solution in the list of global bounds. When the algorithm terminates, it returns the set of Pareto optimal solutions obtained by filtering the list of global bounds by dominance. The worst case runtime and communication requirements of this algorithm are the same as those of SyncBB. In terms of memory requirement, each MO-SBB agent needs $O(p)$ additional memory, where $p$ is the size of the Pareto set.

Pseudo-tree Based Algorithm [Matsui, Silaghi, Hirayama, Yokoo, & Matsuo, 2012].   The proposed algorithm is a complete, asynchronous, search-based algorithm that extends ADOPT. It introduces the notion of boundaries on the vectors of multi-objective values, which extends the concept of lower and upper bounds to vectors of values. The proposed approach starts from the assumption that all objectives contain the same number of cost functions. Furthermore, the cost functions within each objective $F_i$ are sorted according to a predefined ordering, and, for each position $j$, the scope of the $j$-th function is the same in every objective (i.e., all functions in the same position in different objectives have the same scope). Thus, without loss of generality, one can refer to the common scope of the $j$-th cost functions.

Given a complete assignment $x$, for each position $j$, let $\mathbf{c}_j(x) = [f_j^1(x), \ldots, f_j^h(x)]^\top$ be the vector of cost values, where $f_j^i$ denotes the $j$-th cost function of objective $F_i$. The notion of non-dominance is applied to these vectors: a vector is non-dominated iff there is no other vector whose components are all no larger, with at least one strictly smaller. The algorithm uses the notion of non-dominance for bounded vectors to retain exclusively non-dominated vectors.

The worst case runtime and communication requirements of this algorithm are the same as those of ADOPT. In terms of memory requirement, each agent needs to store the retained cost vectors; notice that, in the worst case, the number of non-dominated cost vectors grows exponentially with the number of tuples of cost values. This algorithm has also been extended to solve Asymmetric MO-DCOPs [Matsui, Silaghi, Hirayama, Yokoo, & Matsuo, 2014], an extension of both Asymmetric DCOPs and MO-DCOPs.

Incomplete Algorithms

B-MOMS [Delle Fave et al., 2011].   Bounded Multi-Objective Max-Sum (B-MOMS) is an incomplete, asynchronous, inference-based algorithm, and was the first MO-DCOP algorithm introduced. It extends Bounded Max-Sum to compute bounded approximations for MO-DCOPs. It consists of three phases. The Bounding Phase generates an acyclic subgraph of the multi-objective factor graph, using a generalization of the maximum spanning tree problem to vector weights. During the Max-Sum Phase, the agents coordinate to find the Pareto optimal set of solutions to the acyclic factor graph generated in the bounding phase. This is achieved by extending the addition and marginal maximization operators adopted in Max-Sum to the case of multiple objectives. Finally, the Value Propagation Phase allows agents to select a consistent variable assignment, as there may be multiple Pareto optimal solutions. The bounds provided by the algorithm are computed using the notion of utopia points.

The worst case runtime requirement of this algorithm is the same as that of Max-Sum. In terms of communication requirement, the number of messages sent is also the same as in Max-Sum, but each message now carries a set of non-dominated cost vectors rather than a single value, and its size grows accordingly. In terms of memory requirement, each B-MOMS agent needs a corresponding amount of memory to store and process the messages received.

DP-AOF [Okimoto, Clement, & Inoue, 2013].   Dynamic Programming based on Aggregate Objective Functions (DP-AOF) is an incomplete, synchronous, inference-based algorithm. It adapts the AOF technique [Miettinen, 1999], designed to solve centralized multi-objective optimization problems, to solve MO-DCOPs. Centralized AOF adopts a scalarization to convert a MOO problem into a single-objective optimization problem. This is done by assigning a weight $w_i$ to each objective in the objective vector such that $\sum_{i=1}^{h} w_i = 1$ and $w_i > 0$ for all $i$. The resulting mono-objective function can be solved using any mono-objective optimization technique, with the guarantee of finding a Pareto optimal solution [Miettinen, 1999].
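The scalarization itself is a one-liner; the sketch below shows it for a two-objective cost vector, with illustrative weights:

```python
def aof_scalarize(objective_costs, weights):
    """Collapse a vector of objective costs into one scalar cost.

    weights must be positive and sum to 1; minimizing the resulting
    mono-objective function yields a Pareto optimal solution.
    """
    assert abs(sum(weights) - 1.0) < 1e-9 and all(w > 0 for w in weights)
    return sum(w * c for w, c in zip(weights, objective_costs))

# Two objectives weighted 0.7 / 0.3:
print(aof_scalarize([10.0, 4.0], [0.7, 0.3]))  # 8.2
```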

DP-AOF proceeds in two phases. First, it computes the utopia point by solving as many mono-objective DCOPs as the number of objective functions in the MO-DCOP; DP-AOF uses DPOP to solve these mono-objective DCOPs. It then constructs a new problem building upon the solutions obtained from the first phase. Such a problem is used to assign weights to each objective function of the MO-DCOP to construct the new mono-objective function in the same way as centralized AOF, which can then be solved optimally. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of DPOP, except that the number of operations and the number of messages are larger by a factor of $h$ (the number of objectives), since it runs DPOP $h$ times to solve the mono-objective DCOPs.

MO-DPOP [Okimoto, Schwind, Clement, & Inoue, 2014].   Multi-Objective $L_p$-norm based Distributed Pseudo-tree Optimization Procedure (MO-DPOP) is an incomplete, synchronous, inference-based algorithm. It adapts DPOP using a scalarization measure based on the $L_p$-norm to find a subset of the Pareto front of a MO-DCOP. Similar to DP-AOF, the algorithm proceeds in two phases. Its first phase is the same as the first phase of DP-AOF: It solves $h$ mono-objective DCOPs using DPOP to find the utopia point $\mathbf{u}$. In the second phase, the agents coordinate to find a solution that minimizes the distance from $\mathbf{u}$ according to the $L_p$-norm. The algorithm is guaranteed to find a Pareto optimal solution only when the $L_1$-norm (Manhattan norm) is adopted. In this case, MO-DPOP finds a Pareto optimal solution that minimizes the average of the cost values of all objectives. The worst case runtime, memory, and communication requirements of this algorithm are the same as those of DP-AOF.

DIPLS [Wack, Okimoto, Clement, & Inoue, 2014].   Distributed Iterated Pareto Local Search (DIPLS) is an incomplete, synchronous, search-based algorithm. It extends the Pareto Local Search (PLS) algorithm [Paquete, Chiarandini, & Stützle, 2004], a hill-climbing algorithm designed to solve centralized multi-objective optimization problems, to solve MO-DCOPs. The idea behind DIPLS is to evolve an initial solution toward the Pareto front. To do so, it starts from an initial set of random assignments and applies PLS iteratively to generate new non-dominated solutions. DIPLS requires a total ordering of the agents and elects one agent as the controller. At each iteration, the controller filters the set of solutions by dominance and broadcasts them to the agents in the MO-DCOP. Upon receiving a solution, an agent generates a list of neighboring solutions by modifying the assignments of the variables that it controls, and sends them back to the controller. When the controller receives the messages from all agents, it proceeds to filter (by dominance) the set of solutions received, and, if a new non-dominated solution is found, it repeats the process.

The worst case runtime of this algorithm is dominated by the dominance checks that the controller agent performs over the newly generated solutions at each iteration. In terms of memory requirement, DIPLS agents use an amount of space proportional to the size of the (approximated) Pareto front, which they need to store. Finally, in terms of communication requirement, the controller agent broadcasts messages that contain the current Pareto front; thus, the message size is also proportional to the size of the Pareto front.

5 Dynamic DCOPs

Within a real-world MAS application, agents often act in dynamic environments that evolve over time. For instance, in a disaster management search and rescue scenario, new information (e.g., the number of victims in particular locations or priorities on the buildings to evacuate) typically becomes available in an incremental manner. Thus, the information flow modifies the environment over time. To cope with such a requirement, researchers have introduced the Dynamic DCOP (D-DCOP) model, where cost functions can change during the problem solving process, agents may fail, and new agents may be added to the DCOP being solved. With respect to the categorization described in Section 3, in the D-DCOP model, the agents are fully cooperative and they have deterministic behavior and total knowledge. On the other hand, the environment is dynamic and deterministic.

5.1 Definition

The Dynamic DCOP (D-DCOP) model is defined as a sequence of classical DCOPs $\langle P_1, P_2, \ldots, P_T \rangle$, where each $P_t$ is a DCOP representing the problem at time step $t$, for $1 \leq t \leq T$. The goal in a D-DCOP is to solve the DCOP at each time step optimally. By assumption, the agents have total knowledge about their current environment (i.e., the current DCOP), but they are unaware of changes to the problem in future time steps.

In a dynamic system, agents are required to adapt as fast as possible to environmental changes. Stability [Dijkstra, 1974; Verfaillie & Jussien, 2005] is a core algorithmic concept in which an algorithm seeks to minimize the number of steps that it requires to converge to a solution each time the problem changes. In such a context, these converged solutions are also called stable solutions. Self-stabilization is a related concept derived from the area of fault tolerance:

Definition 6 (Self-stabilization)

A system is self-stabilizing iff the following two properties hold:

  • Convergence: The system reaches a stable solution in a finite number of steps, starting from any given state. In the DCOP context, this property expresses the ability of the agents to coordinate a joint assignment for their variables that optimizes the problem at time step $t$, starting from an assignment of the problem's variables at time step $t-1$.

  • Closure: The system remains in a stable solution, provided that no changes in the environment happen. In the DCOP context, this means that agents do not change the assignment of their variables after converging to a solution.

Solving D-DCOPs is NP-hard, as it requires solving each DCOP of the sequence independently, and each such DCOP is itself NP-hard.

5.2 Algorithms

In principle, one could use classical DCOP algorithms to solve the DCOP at each time step $t$. However, the evolution of the dynamic environment imposes firm requirements on the algorithm design, so that the agents can respond promptly and efficiently to environmental changes over time. In particular, D-DCOP algorithms often satisfy the self-stabilizing property. As in the previous sections, the algorithms are categorized as either complete or incomplete, according to their ability to determine the optimal solution at each time step.

5.2.1 Complete Algorithms

S-DPOP [Petcu & Faltings, 2005c].   Self-stabilizing DPOP (S-DPOP) is a synchronous, inference-based algorithm that extends DPOP to handle dynamic environments. It is composed of three self-stabilizing phases: (i) a self-stabilizing DFS pseudo-tree generation, whose goal is to create and maintain a DFS pseudo-tree structure; (ii) a self-stabilizing algorithm for the UTIL propagation phase; and (iii) a self-stabilizing algorithm for the VALUE propagation phase. These procedures work as in DPOP and are invoked whenever a change in the DCOP problem sequence is revealed. Additionally, Petcu and Faltings discuss self-stabilizing extensions that can be used to provide guarantees about the way the system transitions from one valid state to the next, after an environment change.

The worst case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of DPOP. Additionally, upon changes to the problem, S-DPOP stabilizes after a number of UTIL and VALUE messages that is bounded by a function of the depth of the pseudo-tree and the number of cost functions of the problem.

I-ADOPT and I-BnB-ADOPT [Yeoh, Varakantham, Sun, & Koenig, 2011].   Incremental Any-space ADOPT (I-ADOPT) and Incremental Any-space BnB-ADOPT (I-BnB-ADOPT) are asynchronous, search-based algorithms that extend ADOPT and BnB-ADOPT, respectively. In the incremental any-space versions of the algorithms, each agent maintains bounds for multiple contexts; in contrast, agents in ADOPT and BnB-ADOPT maintain bounds for one context only. By doing so, when solving the next DCOP in the sequence, agents may reuse the bound information computed for the previous DCOP. In particular, the algorithms identify affected agents, that is, agents that cannot reuse the information computed in previous iterations, and recompute bounds exclusively for such agents.

The worst case runtime and communication requirements of these algorithms to solve the DCOP at each time step are the same as those of ADOPT. However, since they have the any-space property, their minimal memory requirements are the same as those of ADOPT, but they can use more memory, if available, to speed up the search.

5.2.2 Incomplete Algorithms

SBDO [Billiau, Chang, & Ghose, 2012a].   Support Based Distributed Optimization (SBDO) is an asynchronous, search-based algorithm that extends the Support Based Distributed Search algorithm [Harvey, Chang, & Ghose, 2007] to the multi-agent case. It uses two types of messages: is-good and no-good. Is-good messages contain an ordered partial assignment and are exchanged among neighboring agents upon a change in their value assignments. Each agent, upon receiving a message, decides what values to assign to its own variables, attempting to minimize its local costs, and communicates such decisions to its neighboring agents via is-good messages. No-good messages are used in response to violations of hard constraints, or in response to obsolete assignments. A no-good message is augmented with a justification, that is, the set of hard constraints that are violated, and is saved locally within each agent. This information is used to discard partial assignments that are supersets of a known no-good. The changes of the dynamic environment are communicated via messages sent from the environment to the agents. In particular, changes in hard constraints require the update of the justifications in all no-goods.

The worst case runtime, memory, and communication requirements of this algorithm are the same as those of SyncBB each time the problem changes.

FMS [Ramchurn, Farinelli, Macarthur, & Jennings, 2010].   Fast Max-Sum (FMS) is an asynchronous, inference-based algorithm that extends Max-Sum to the Dynamic DCOP model. As in Max-Sum, the algorithm operates on a factor graph. Solution stability is maintained by recomputing only those factors that changed between the previous DCOP $P_{t-1}$ and the current DCOP $P_t$. Ramchurn et al. (2010) exploit domain-specific properties in a task allocation problem to reduce the number of states over which each factor has to compute its solution. In addition, FMS is able to efficiently manage the addition or removal of tasks (i.e., factors) by performing message propagation exclusively on the factor graph regions affected by such topological changes. The worst case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of Max-Sum.

FMS has been extended in several ways. Bounded Fast Max-Sum provides bounds on the solution found and guarantees self-stabilization [Macarthur, Farinelli, Ramchurn, & Jennings, 2010]. Branch-and-Bound Fast Max-Sum (BnB-FMS) extends FMS with online domain pruning using a branch-and-bound technique [Macarthur, Farinelli, Ramchurn, & Jennings, 2011].

5.3 Notable Variants: D-DCOPs with Commitment Deadlines or Markovian Properties

We now describe several notable variants of D-DCOPs and their corresponding algorithms.

RS-DPOP [Petcu & Faltings, 2007b].   In this proposed model, agents have commitment deadlines and stability constraints. In other words, some of the variables may be unassigned at a given point in time, while others must be assigned within a specific deadline. Commitment deadlines are either hard or soft. Hard commitments model irreversible processes: Once a hard-committed variable is assigned, its value cannot be changed. Soft commitments model contracts with penalties: If a soft-committed variable has been assigned at time step $t-1$, its value can be changed at time step $t$, at the price of a cost penalty. These costs are modeled via stability constraints, defined as binary relations $\gamma_i(x_i^{t-1}, x_i^t)$ representing the cost of changing the value of variable $x_i$ from time step $t-1$ to time step $t$. Given the set $\Gamma$ of stability constraints, at each time step $t$ the goal is to find a solution:

$x^{*t} = \operatorname*{argmin}_{x} \Big( \sum_{f_j \in F} f_j(x) + \sum_{\gamma_i \in \Gamma} \gamma_i(x_i^{t-1}, x_i) \Big)$

The latter term accounts for the penalties associated with the value assignment updates of the soft-committed variables.

To solve this problem, Petcu and Faltings (2007b) extended S-DPOP to RS-DPOP. (The full name of the algorithm was not provided by the authors.) Like S-DPOP, it is a synchronous, inference-based algorithm. Unlike S-DPOP, its UTIL and VALUE propagation phases take the commitment deadlines into account. The worst case runtime, memory, and communication requirements of this algorithm to solve the DCOP at each time step are the same as those of S-DPOP.

Distributed Q-learning and R-learning [Nguyen, Yeoh, Lau, Zilberstein, & Zhang, 2014].   In this proposed model, called Markovian Dynamic DCOPs (MD-DCOPs), the DCOP in the next time step depends on the solution (i.e., the assignment of all variables) adopted by the agents for the DCOP in the current time step. However, the transition function between these two DCOPs is not known to the agents, who must therefore learn it. The Distributed Q-learning and R-learning algorithms are synchronous, reinforcement-learning-based algorithms that extend the centralized Q-learning [Abounadi, Bertsekas, & Borkar, 2001] and centralized R-learning [Schwartz, 1993; Mahadevan, 1996] algorithms. Each agent maintains Q-values and R-values for each state-value pair, where the state is the solution of the previous DCOP and the value is the assignment of its variables in the cost function. These Q- and R-values represent the predicted cost the agent will incur if it assigns that value to its variables when that state is the previous solution. The agents repeatedly refine these values and, at each time step, choose the values with the minimum Q- or R-value.
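The per-agent refinement follows the familiar Q-learning update, adapted here to cost minimization; this is a minimal sketch, and the state representation (the previous time step's joint solution) as well as the learning-rate and discount values are illustrative assumptions:

```python
def q_update(Q, state, value, observed_cost, next_state, domain,
             alpha=0.1, gamma=0.9):
    """One Q-learning update for a cost-minimizing agent.

    Q: dict mapping (state, value) -> predicted cost
    state: previous joint solution (must be hashable)
    value: this agent's current assignment
    """
    # Optimistic one-step lookahead: cheapest predicted continuation.
    best_next = min(Q.get((next_state, v), 0.0) for v in domain)
    old = Q.get((state, value), 0.0)
    # Move the estimate toward the observed cost plus discounted future cost.
    Q[(state, value)] = old + alpha * (observed_cost + gamma * best_next - old)
```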

The worst case runtime, communication, and memory requirements of these two algorithms to solve the DCOP at each time step are the same as those of DPOP, as they use DPOP as a subroutine to update the Q- and R-values. The exception is that agents in the Distributed Q-learning algorithm also broadcast their value assignments at each time step to all other agents. Counting a single broadcast as $a$ peer-to-peer messages, where $a$ is the number of agents in the problem, they send $O(a^2)$ messages in each time step instead of the $O(a)$ messages of DPOP.

A related model is that of Proactive Dynamic DCOPs (PD-DCOPs) [Hoang et al., 2016; Hoang et al., 2017], where the transition functions between two subsequent DCOPs are known and can be exploited by the resolution process. Another key difference between the two models is that, in PD-DCOPs, the DCOP in the next time step does not depend on the solution of the current time step, but rather on the values of the random variables at the current time step. Researchers have introduced a number of offline proactive and online reactive algorithms to solve this problem [Hoang et al., 2016; Hoang et al., 2017].

6 Probabilistic DCOPs

The DCOP models discussed so far can model MAS problems in deterministic environments. However, many real-world applications are characterized by environments with a stochastic behavior. In other words, there are exogenous events that can influence the outcome of an agent’s action. For example, the weather conditions or the state of a malfunctioning device can affect the cost of an agent’s action. To cope with such scenarios, researchers have introduced Probabilistic DCOP (P-DCOP) models, where the uncertainty in the state of the environment is modeled through stochasticity in the cost functions. With respect to the DCOP categorization described in Section 3, in the P-DCOP model, the agents are fully cooperative and have a deterministic behavior. Additionally, the environment is static and stochastic. While a large body of research has focused on problems where agents have total knowledge, this section includes a discussion of a subclass of P-DCOPs where the agents’ knowledge of the environment is limited, and the agents must balance the exploration of the unknown environment and the exploitation of the known costs.

6.1 Definition

A common strategy to model uncertainty is to augment the outcome of the cost functions with a stochastic character [Atlas & Decker, 2010; Stranders, Delle Fave, Rogers, & Jennings, 2011; Nguyen, Yeoh, & Lau, 2012]. Another method is to introduce additional random variables as input to the cost functions, which simulate exogenous uncontrollable traits of the environment [Léauté & Faltings, 2009; Léauté & Faltings, 2011; Wang, Sycara, & Scerri, 2011]. To cope with such a variety, this section introduces the Probabilistic DCOP (P-DCOP) model, which generalizes the proposed models of uncertainty. A P-DCOP is defined by a tuple $\langle A, X, D, F, \alpha, \Omega, P, \mathcal{E}, U \rangle$, where $A$ and $D$ are as defined in Definition 4.1. In addition,

  • $X$ is a mixed set of decision variables and random variables.

  • The random variables in $X$, denoted $R = \{r_1, \ldots, r_q\}$, model uncontrollable stochastic events, such as weather or a malfunctioning device.

  • $F$ is the set of cost functions, each defined over a mixed set of decision variables and random variables, and such that each value combination of the decision variables in the scope of a cost function $f_j$ results in a probability distribution over costs. As a result, $f_j$ is itself a random variable, given a local value assignment and a realization of the random variables involved in $f_j$.

  • $\alpha$ is a mapping from decision variables to agents. Notice that random variables are not controlled by any agent, as their outcomes do not depend on the agents' actions.

  • $\Omega = \{\Omega_1, \ldots, \Omega_q\}$ is the (possibly discrete) set of events for the random variables (e.g., the different weather conditions or stress levels a device is subjected to), such that each random variable $r_i$ takes values in $\Omega_i$. In other words, $\Omega_i$ is the domain of random variable $r_i$.

  • $P = \{p_1, \ldots, p_q\}$ is a set of probability distributions for the random variables, such that $p_i$ assigns a probability value to each event $\omega \in \Omega_i$ and $\sum_{\omega \in \Omega_i} p_i(\omega) = 1$, for each random variable $r_i$.

  • $\mathcal{E}$ is an evaluator function that, given an assignment of values to the decision variables, summarizes the distribution of the aggregated cost functions into a real value.

  • $U$ is a utility function that, given a random variable, returns an ordered set of its different outcomes, based on the decision maker's preferences. This function is needed when the cost functions have uncertain outcomes and, thus, their distributions are not readily comparable.

The goal in a P-DCOP is to find a solution $x^*$, that is, an assignment of values to all the decision variables, such that:

$x^* = \operatorname*{argmin}_{x} \; \mathcal{E}\Big( \bigoplus_{f_j \in F} f_j(x) \Big)$   (5)

where argmin or argmax is selected depending on the algorithm adopted, and $\bigoplus$ is the operator used to aggregate the values of the cost functions $f_j$. Typically, such an operator is a summation; however, to handle continuous distributions, other operators have been proposed.

The probability distribution over the domain of a random variable is called a belief. An assignment to all random variables in $R$ describes a (possible) scenario governed by the environment. As the random variables are not under the control of the agents, they act independently of the decision variables; their outcomes are drawn from their probability distributions. Furthermore, the random variables are assumed to be independent of each other and, thus, model independent sources of exogenous uncertainty.

The utility function enables us to compare the uncertain cost outcomes of the cost functions. In general, the utility function is non-decreasing, that is, the lower the cost, the higher the utility. However, the utility function should be defined for the specific application of interest. For example, in farming, the utility increases with the amount of produce harvested. However, farmers may prefer a smaller but highly certain amount of produce harvested over a larger but highly uncertain and, thus, risky outcome.

The evaluator function $\mathcal{E}$ is used to summarize into one criterion the costs of a given assignment, which depend on the random variables. A common choice is the expectation: $\mathcal{E}(x) = \mathbb{E}\big[\bigoplus_{f_j \in F} f_j(x)\big]$.
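For discrete scenarios, the expectation evaluator can be computed by enumerating all joint realizations of the (independent) random variables, as in the sketch below; the data layout (`random_domains`, `probs`, cost functions as callables) is an illustrative assumption:

```python
import itertools

def expected_cost(assignment, functions, random_domains, probs):
    """Expectation evaluator: expected aggregated cost of a
    decision-variable assignment.

    random_domains: {r: list of events};  probs: {r: {event: probability}}
    Each function takes (assignment, scenario) and returns a cost, where
    scenario assigns one event to every random variable.
    """
    rvars = list(random_domains)
    total = 0.0
    for events in itertools.product(*(random_domains[r] for r in rvars)):
        scenario = dict(zip(rvars, events))
        p = 1.0
        for r in rvars:
            p *= probs[r][scenario[r]]  # independence of the random variables
        total += p * sum(f(assignment, scenario) for f in functions)
    return total
```

The enumeration is exponential in the number of random variables, which is precisely why several P-DCOP algorithms resort to sampling instead.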

Let us now introduce some concepts that are commonly adopted in the study of P-DCOPs.

Definition 7 (Convolution)

The convolution $f_X * f_Y$ of the probability density functions (PDFs) $f_X$ and $f_Y$ of two independent random variables $X$ and $Y$ is the integral of the product of the two functions after one is reversed and shifted:

$(f_X * f_Y)(t) = \int_{-\infty}^{\infty} f_X(\tau) \, f_Y(t - \tau) \, d\tau$   (6)

It produces a new PDF that expresses the overlapping area between $f_X$ and $f_Y$ as a function of the amount by which one of the original functions is translated. In other words, the convolution yields the distribution of the sum of the two random variables. The counterpart for the distribution of the sum of two independent discrete variables is:

$p_{X+Y}(z) = \sum_{k} p_X(k) \, p_Y(z - k)$   (7)

In a P-DCOP, the value returned by a cost function $f_j$, for an assignment of the variables in its scope, is a random variable. Thus, the global cost is also a random variable, whose probability density function is the convolution of the PDFs of the individual $f_j$'s. The convolution of two PDFs in a P-DCOP is therefore the analogue of the summation of the costs of two cost functions in classical DCOPs.
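Equation 7 is straightforward to implement for finite-support distributions; the sketch below combines the cost distributions of two independent functions, with illustrative probability mass functions:

```python
from collections import defaultdict

def convolve_pmfs(p, q):
    """Distribution of X + Y for independent discrete X ~ p and Y ~ q
    (Equation 7); p and q map outcomes to probabilities."""
    out = defaultdict(float)
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] += px * qy
    return dict(out)

# Cost distributions of two independent binary-outcome cost functions:
print(convolve_pmfs({0: 0.5, 2: 0.5}, {1: 0.3, 3: 0.7}))
# {1: 0.15, 3: 0.5, 5: 0.35}
```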

A common concept in optimization under uncertainty is that of ranking a set of random variables $\{X_1, \ldots, X_m\}$ through their cumulative distribution functions (CDFs) $\{F_{X_1}, \ldots, F_{X_m}\}$. These distributions are also commonly called lotteries, a concept related to that of stochastic dominance, which is a form of stochastic ordering based on preferences over outcomes. It refers to situations where one probability distribution over possible outcomes can be ranked as superior to another.

The first-order stochastic dominance refers to the situation when one lottery is unambiguously better than another:

Definition 8 (First-Order Stochastic Dominance)

Given two random variables $X$ and $Y$ with CDFs $F_X$ and $F_Y$, respectively, $X$ first-order stochastically dominates $Y$ iff:

$F_X(z) \geq F_Y(z)$   (8)

for all $z$, with a strict inequality over some interval. (As the lotteries here represent costs, the dominant variable is the one more likely to realize low costs.)

If $X$ first-order stochastically dominates $Y$, then $X$ necessarily has a strictly smaller expected value: $\mathbb{E}[X] < \mathbb{E}[Y]$. In other words, if $X$ dominates $Y$, then the decision maker prefers $X$ over $Y$ regardless of what her utility function is, as long as it is weakly increasing.
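For distributions evaluated over a discretized grid of cost values, the check is a direct transcription of Definition 8 (in the cost-oriented direction used above); `cdf_x` and `cdf_y` are assumed to be callables returning CDF values:

```python
def first_order_dominates(cdf_x, cdf_y, grid):
    """Check whether X first-order stochastically dominates Y over a grid
    of cost values, per Definition 8: X's CDF lies (weakly) above Y's
    everywhere and strictly above it somewhere."""
    diffs = [cdf_x(z) - cdf_y(z) for z in grid]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)
```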

It is not always the case that one CDF will first-order stochastically dominate another. In such a case, one can use the second-order stochastic dominance to compare them. The latter refers to the situation when one lottery is unambiguously less risky than another:

Definition 9 (Second-Order Stochastic Dominance)

Given two random variables $X$ and $Y$ with CDFs $F_X$ and $F_Y$, respectively, $X$ second-order stochastically dominates $Y$ iff:

$\int_{-\infty}^{z} \big[ F_X(t) - F_Y(t) \big] \, dt \geq 0$   (9)

for all $z$, with a strict inequality for some values of $z$.

If $X$ second-order stochastically dominates $Y$, then $\mathbb{E}[X] \leq \mathbb{E}[Y]$. If Equation 9 holds with equality for all $z \geq \bar{z}$, for some sufficiently large $\bar{z}$, then $\mathbb{E}[X] = \mathbb{E}[Y]$. In this case, as both lotteries are equal in expectation, the decision maker prefers the lottery $X$, which has less variance and is, thus, less risky.

Another common concept in P-DCOPs is that of regret. In decision theory, regret expresses the negative emotion arising from learning that a different solution than the one adopted would have had a more favorable outcome. In P-DCOPs, the regret of a given solution is typically defined as the difference between its associated cost and that of the theoretically optimal solution. The notion of regret is especially useful in allowing agents to make robust decisions in settings where they have limited information about the cost functions.

An important type of regret is the minimax regret. Minimax regret is a decision rule used to minimize the possible loss in the worst case (i.e., maximum regret). As opposed to the expected regret, minimax regret is independent of the probabilities of the various outcomes. Thus, minimax regret can be used when the probabilities of the outcomes are unknown or difficult to estimate.
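Concretely, minimax regret selects, over an enumerated set of candidate solutions and scenarios, the solution whose worst-case gap to the scenario-optimal cost is smallest; the sketch below assumes such explicit enumerations, which is an illustrative simplification:

```python
def minimax_regret_choice(solutions, scenarios, cost):
    """Pick the solution minimizing the worst-case regret across scenarios.

    cost(sol, sc) -> cost of solution `sol` if scenario `sc` materializes.
    Note: no scenario probabilities are needed.
    """
    # Best achievable cost in each scenario (the hindsight optimum).
    best_in = {sc: min(cost(s, sc) for s in solutions) for sc in scenarios}
    def worst_regret(s):
        return max(cost(s, sc) - best_in[sc] for sc in scenarios)
    return min(solutions, key=worst_regret)
```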

Solving P-DCOPs is PSPACE-hard as, in general, the process is required to remember a solution for each possible state associated to the uncertain random variables. The study of complexity classes for P-DCOPs is largely unexplored. Thus, we foresee this as a potential direction for future research, with a particular focus on determining fragments of P-DCOPs characterized by a lower complexity than the one above.

6.2 Algorithms

Unlike classical DCOP and Dynamic DCOP algorithms, which all solve the same problem, P-DCOP algorithms approach the problem's uncertainty in different ways and, thus, solve different variants of the problem. This is due to the greater modeling flexibility offered by the P-DCOP framework. As such, the proposed algorithms are often not directly comparable to one another. We categorize P-DCOP algorithms into complete and incomplete algorithms, according to their ability to guarantee finding optimal solutions for given evaluator and utility functions. Unless otherwise specified, the ordering operator in Equation 5 refers to the argmin operator.

6.2.1 Complete Algorithms

E[DPOP] [Léauté & Faltings, 2011].   E[DPOP] is a synchronous, sampling-based and inference-based algorithm. It can be either complete or incomplete, based on the E[DPOP] variant used, as described below. E[DPOP] uses a collaborative sampling strategy, where all agents concerned with a given random variable agree on a common sample set that will be used to estimate the PDF of that random variable. Agents performing collaborative sampling independently propose sample sets for the random variables influencing the variables they control, and elect one agent among themselves as responsible for combining the proposed sample sets into one. The algorithm is defined over P-DCOPs whose evaluator function is the expectation and whose cost functions have deterministic outcomes, that is, for each combination of values of the variables in the scope of a cost function, the result is a degenerate distribution (i.e., a distribution that takes a single value), and the utility function is the identity function.