1 Introduction
Distributed Constraint Optimization Problems (DCOPs) are an important framework for coordinating interactions in cooperative Multi-Agent Systems (MAS). More specifically, agents in this framework coordinate value assignments to their variables in such a way as to minimize the aggregated cost of their constraints [24]. DCOPs have gained popularity due to their applications in various real-world multi-agent coordination problems, including distributed meeting scheduling [15], sensor networks [6] and smart grids [8].
Over the last two decades, several algorithms have been proposed to solve DCOPs, and they can be broadly classified into two classes: exact and non-exact. The former always provide an optimal solution of a given DCOP. However, solving DCOPs optimally is NP-hard, so scalability becomes an issue as the system grows. In contrast, non-exact algorithms sacrifice some solution quality for scalability. Among the non-exact algorithms, DSA [25], DSAN [2], MGM & MGM2 [14], Max-Sum [7], Max-Sum_ADVP [28], DSA-SDP [26], GDBA [17] and PD-Gibbs [22] are commonplace.

In general, DCOP models assume that all the variables used to model a problem are discrete. However, many real-world applications (e.g. target tracking sensor orientation [9], sleep scheduling of wireless sensors [11]) are best modelled with continuous variables. To address this, Stranders et al. [20] proposed a model that facilitates the use of continuous variables, later referred to as a Functional DCOP (FDCOP) [10]. In contrast to the standard model, an FDCOP assumes all the variables are continuous and models all the constraints in the form of functions of those variables (instead of the tabular form used in the DCOP model). Among the FDCOP solvers, CMS [20], HCMS [23], EF-DPOP & AF-DPOP [10] and PFD [3] are well known.
Against this background, it is obvious that the DCOP and FDCOP models can only deal with problems having discrete and continuous valued variables, respectively. In this paper, we first combine both models into a Mixed Integer Functional DCOP (MIFDCOP) model that can deal with problems regardless of the type of their variables and the representation of their constraints. We then develop a new algorithm that we call Distributed Parallel Simulated Annealing (DPSA), which can be directly applied to DCOPs and FDCOPs, and even more importantly to their generalized version, MIFDCOPs.
The DPSA algorithm is based on the Simulated Annealing (SA) meta-heuristic, which is motivated by a physical analogy of controlled temperature cooling (i.e. annealing) of a material [12]. One of the most important factors that influence the quality of the solution produced by SA is this temperature parameter, widely denoted as T. More precisely, SA starts with a high value of T and, during the search process, continuously cools it down to near zero. When T is high, SA only explores the search space without exploiting, which makes its behaviour similar to a random search procedure. On the other hand, when T is near zero, SA tends to only exploit and its exploration capability diminishes; in such a scenario, SA emulates the behaviour of a greedy algorithm. In fact, SA balances exploration and exploitation most effectively in some optimal temperature region that lies between these two extremes [19]. Several existing works also discuss a constant optimal temperature at which SA performs best [1, 4]. Unfortunately, the optimal temperature region varies from one type of problem to another and from one instance to another of the same problem. Considering this background, we present a novel method in which agents cooperatively try to learn this optimal temperature region using a Monte Carlo importance sampling method called Cross-Entropy sampling [13] (discussed in the background). Using the knowledge learned during this process, agents cooperatively solve the given problem. This results in a significant improvement of solution quality compared to the state-of-the-art algorithms in both DCOP and FDCOP settings. Moreover, we apply and evaluate both DSAN (i.e. the only other SA based DCOP solver) and DPSA in MIFDCOP settings and show that DPSA outperforms DSAN in this setting by a notable margin.

2 Background
We first describe DCOPs and FDCOPs in detail, as they form the basis of the MIFDCOP model. We then conclude this section with the background literature necessary for this work.
2.1 DCOP Model
A DCOP is defined by a tuple ⟨A, X, D, F, α⟩ [16]: A is a set of agents {a_1, a_2, …, a_n}. X is a set of discrete variables {x_1, x_2, …, x_m}, which are controlled by the set of agents A. D is a set of discrete and finite variable domains {D_1, D_2, …, D_m}, where each D_i is a set containing the values that may be assigned to its associated variable x_i. F is a set of constraints {f_1, f_2, …, f_l}, where each f_i ∈ F is a function of a subset of variables x^i ⊆ X defining the relationship among the variables in x^i. Thus, the function f_i denotes the cost for each possible assignment of the variables in x^i. α : X → A is a variable-to-agent mapping function which assigns the control of each variable x ∈ X to an agent of A. Each variable is controlled by a single agent; however, each agent can hold several variables.
Within the framework, the objective of a DCOP algorithm is to produce X*, a complete assignment that minimizes the aggregated cost of the constraints (for a maximization problem, replace min with max), as shown in Equation 1.

X* = argmin_X Σ_{f_i ∈ F} f_i(x^i)   (1)
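For intuition, Equation 1 can be checked by brute force on a tiny invented DCOP; the two variables, their domains and the tabular constraint below are illustrative only, not from the paper:

```python
from itertools import product

# Hypothetical toy DCOP: two discrete variables with domain {0, 1, 2}
# and a single binary constraint given in tabular form.
domains = {"x1": [0, 1, 2], "x2": [0, 1, 2]}
cost_table = {(a, b): (a - b) ** 2 + a for a in domains["x1"] for b in domains["x2"]}

def aggregated_cost(assignment):
    # Only one constraint here; in general, sum over all f_i in F.
    return cost_table[(assignment["x1"], assignment["x2"])]

# X* = argmin over all complete assignments (Equation 1).
names = list(domains)
best = min(
    (dict(zip(names, values)) for values in product(*domains.values())),
    key=aggregated_cost,
)
print(best, aggregated_cost(best))  # → {'x1': 0, 'x2': 0} 0
```

Of course, this exhaustive enumeration is exactly what is intractable in general (NP-hard), which is why the local search algorithms discussed below matter.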
For ease of understanding, we assume that each agent controls one variable. Thus, the terms ‘variable’ and ‘agent’ are used interchangeably throughout this paper.
2.2 Functional DCOP Model
FDCOPs can be defined by a tuple ⟨A, X, D, F, α⟩, where A, F and α have the same definitions as in the DCOP model. However, the set of variables X and the set of domains D are defined as follows: X is a set of continuous variables {x_1, x_2, …, x_m} that are controlled by agents in A. D is a set of continuous domains {D_1, D_2, …, D_m}, where each variable x_i can take any value in a range D_i = [LB_i, UB_i]. A notable difference between FDCOPs and DCOPs is found in the representation of the cost functions. In DCOPs, the cost functions are conventionally represented in tabular form, while in FDCOPs, each constraint is represented in the form of a function [10].
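This representational difference can be sketched in code; the quadratic cost below is a hypothetical stand-in, not a constraint from the paper:

```python
# DCOP style: a binary constraint as an explicit cost table over the
# Cartesian product of two small discrete domains.
tabular_constraint = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.5}

# FDCOP style: the same idea expressed as a function of continuous
# variables over continuous domains D_i = [lower, upper].
def functional_constraint(x, y):
    # Hypothetical cost function, invented for illustration.
    return (x - y) ** 2 + 0.1 * x

print(tabular_constraint[(0, 1)])       # lookup for discrete values
print(functional_constraint(0.3, 0.7))  # evaluation at continuous values
```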
2.3 Distributed Simulated Annealing
Distributed Simulated Annealing (DSAN) [2] is the only existing Simulated Annealing (SA) based DCOP solver. DSAN is a local search algorithm that executes the following steps iteratively:

Each agent a_i selects a random value from its domain D_i.

The agent then assigns the selected value to its variable x_i with probability min(1, e^{Δ/t_i}), where Δ is the local gain and t_i is the temperature at iteration i. Note that the authors of DSAN suggest fixed cooling schedules for t_i. However, setting the temperature parameter with such a fixed schedule does not take into account its impact on the performance of the algorithm.

Finally, agents notify their neighbouring agents if the value of a variable changes.
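The steps above can be sketched for a single agent as follows; the local cost function, the toy convex objective and the 1/i cooling schedule are our own illustrative choices, not the schedule prescribed by the DSAN authors:

```python
import math
import random

def dsan_step(current, domain, local_cost, t):
    """One DSAN iteration for a single agent (minimization).

    local_cost(v) stands in for the aggregated cost of the agent's own
    constraints when its variable takes value v.
    """
    proposal = random.choice(domain)                   # step 1: random value
    gain = local_cost(current) - local_cost(proposal)  # local gain (delta)
    # Step 2: accept with probability min(1, e^(gain/t)); a non-negative
    # gain is always accepted, since e^(gain/t) >= 1.
    if gain >= 0 or random.random() < math.exp(gain / t):
        return proposal                                # step 3: notify neighbours
    return current

# Hypothetical usage with a convex local cost and a fixed cooling schedule.
random.seed(0)
cost = lambda v: (v - 3) ** 2
value = 9
for i in range(1, 200):
    value = dsan_step(value, list(range(10)), cost, t=1.0 / i)
print(value)
```

As the temperature decays, worse proposals are accepted less and less often, so the behaviour shifts from near-random search toward greedy hill climbing.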
2.4 Anytime Local Search
Anytime Local Search (ALS) is a general framework that gives distributed iterative local search DCOP algorithms, such as DSAN described above, an anytime property. Specifically, ALS uses a BFS-tree to calculate the global cost (i.e. to evaluate Equation 1) of the system's state during each iteration and keeps track of the best state visited by the algorithm. Hence, using this framework, agents can adopt the best decision that they explored during the iterative search process instead of the one that occurs at the termination of the algorithm (see [27] for more details).
2.5 CrossEntropy Sampling
Cross-Entropy (CE) is a Monte Carlo method for importance sampling. CE has successfully been applied to importance sampling, rare-event simulation and optimization (discrete, continuous, and noisy problems) [13]. Algorithm 1 sketches an example that iteratively learns the optimal value x* of a vector within a search space. The algorithm starts with a probability distribution P(x; v) over the search space, with the parameter vector v initialized to a certain value (that may be random). At each iteration, it takes a number of samples (a parameter of the algorithm) from the probability distribution P(x; v). After that, each sample point is evaluated with a problem-dependent objective function. The top G among the sample points are used to calculate a new value of the parameter vector, which is referred to as ṽ. Finally, ṽ is used to update v. At the end of the learning process, most of the probability density of P(x; v) will be allocated near the optimal value x*.

3 Mixed Integer Functional DCOP Model
We now formulate the Mixed Integer Functional DCOP (MIFDCOP) model, which combines the DCOP and FDCOP models. This removes the requirement that all the variables be either continuous or discrete and that the constraints be represented in either tabular or functional form. More formally, an MIFDCOP is a tuple ⟨A, X, D, F, α⟩, where A and α are as defined in standard DCOPs. The key differences are as follows:

X is a set of variables {x_1, x_2, …, x_m}, where each x_i is either a discrete or a continuous variable.

D is a set of finite domains {D_1, D_2, …, D_m}. If a variable x_i is discrete, its domain D_i is the same as in the DCOP model; otherwise, D_i is the same as in the FDCOP model.

F is a set of constraint functions {f_1, f_2, …, f_l}. Each constraint function can be modeled as follows: when the subset of the variables involved with f_i contains only discrete variables, f_i can be modeled either in tabular form or in functional form; otherwise, it is modeled only in functional form.
It is worth highlighting that both DCOPs and FDCOPs are special cases of MIFDCOPs wherein either all the variables are discrete and constraints are in tabular form or all the variables are continuous and constraints are in functional form, respectively.
MIFDCOPs are specifically useful when the variables in X represent decisions about a heterogeneous set of actions. Suppose, for instance, we want to model a target classification problem in a network of optical sensor agents. In the FDCOP model of this problem [21], an agent was only able to rotate its sensor to change its viewing direction. Now imagine that agents can also optically zoom in or zoom out to increase clarity or field of vision, respectively. The decision about rotation is best modeled with a continuous variable (as described in [21]), while the decision about optical zoom is naturally modeled using a discrete variable. This problem, and similar problems where heterogeneous types of decision variables are needed, can easily be modeled with the newly proposed MIFDCOPs. We provide a small example of an MIFDCOP in Figure 1.
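The sensor example can be written down as a tiny MIFDCOP fragment; the variable names, domains and cost function below are invented for illustration:

```python
# Hypothetical MIFDCOP fragment for one optical sensor agent: a continuous
# rotation variable and a discrete zoom variable (names and values invented).
variables = {
    "rotation": {"type": "continuous", "domain": (0.0, 360.0)},
    "zoom":     {"type": "discrete",   "domain": [1, 2, 4, 8]},
}

def coverage_cost(rotation, zoom, target_angle=120.0):
    """Illustrative mixed constraint in functional form: penalize aiming
    away from a target, with higher zoom sharpening the effective aim."""
    error = abs(rotation - target_angle)
    aim_error = min(error, 360.0 - error)  # wrap-around angular distance
    return aim_error / zoom

print(coverage_cost(110.0, 4))  # → 2.5
```

Because one of the variables involved is continuous, this constraint can only be expressed in functional form, as required by the model above.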
4 The DPSA Algorithm
We now describe the DPSA algorithm for solving MIFDCOPs. As discussed earlier, a key motivation behind DPSA is to learn the optimal temperature region for Simulated Annealing (SA). Thus, it is important that we formally define the optimal temperature and the optimal temperature region.
Definition 1. The Optimal Temperature for a given simulation length L is a constant temperature T* at which the expected solution cost yielded by running SA for L iterations is the lowest over all temperatures.

Definition 2. The Optimal Temperature Region (OTR) of length ℓ is a continuous interval [T_min, T_max] with T_max − T_min = ℓ that contains the optimal temperature. If we set T_min near zero and T_max to a very large number, the interval will always be an OTR by the above definition, although not a useful one. Our goal is to find an OTR with sufficiently small ℓ.
The DPSA algorithm consists of two main components: the parallel SA component and the learning component. The parallel SA component is an extension of the existing DSAN algorithm that simulates K systems in parallel. Additionally, we introduce the SelectNext() function, in which agents use different strategies to select a value from their domains depending on the corresponding variable type, continuous or discrete. This simple modification makes DPSA (and also DSAN) applicable to DCOPs, FDCOPs and MIFDCOPs. We also modify the existing ALS framework to make it compatible with parallel simulation.
The other significant component of DPSA is the iterative learning component sketched in the pseudocode. It starts with a large temperature region and iteratively tries to shrink it down to an OTR of length ℓ. To achieve this, at each iteration agents cooperatively perform actions (i.e. synchronously simulate parallel SA with different constant temperatures), collect feedback (i.e. the costs yielded by the different simulations) and use the feedback to update the current temperature region toward the goal region. The underlying algorithm used in the learning process is based on cross-entropy importance sampling. However, to make DPSA sample efficient, we present modifications that significantly reduce the number of iterations and parallel simulations needed.
We structure the rest of our discussion in this section as follows: we first describe parallel SA and our modification of ALS. Then we discuss the learning component of DPSA and the techniques that make it sample efficient. Finally, we analyze the complexity of DPSA.
In DPSA, the Simulate() function runs SA on K copies of a system in parallel. This function is called in two different contexts: during learning, to collect feedback, and after the learning process has ended. The main difference is that in the first context the function runs a short simulation at K different constant temperatures (one fixed temperature for each copy), whereas in the second it runs for significantly longer, with all K systems running on the learned OTR with a linear schedule (discussed shortly). In addition, in the first case all copies are initialized with the same random value from the domain. This is done because we want to isolate the effect of the different constant temperatures on the simulation step; initializing the copies with different initial states would add more noise to the feedback. In the second case, we initialize with the best state found so far. Note that, to avoid confusion, we use x_i to refer to the actual decision variable and x_i^k to refer to x_i in the k-th copy of the system. A context variable represents the context in which the function was called. Depending on its value, the variables of all K systems are initialized and the length of the simulation is set as discussed above. After that, the main simulation starts and runs for the set number of iterations.
At the start of each iteration, each agent shares the current state of each system (i.e. the variable assignment of each system) with its neighbours. Each agent then updates and performs the other operations related to the modified ALS (discussed shortly). After that, for each of the K systems, agent a_i picks a value from its domain D_i. If x_i is discrete, it picks this value uniformly at random. If x_i is continuous, it picks it either using a Gaussian distribution N(x_i^k, σ) or a uniform distribution U(LB_i, UB_i). Here, σ is an input parameter, and [LB_i, UB_i] is the bound of D_i. The difference is that the Gaussian gives preference to values near the currently assigned value, while the uniform distribution has no preference. Afterward, each agent selects the temperature for the current iteration, i.e. T (line 15). If the function is called in the learning context, T is always set to a constant; more precisely, it is set to the k-th sampled temperature (from lines 29, 31). Otherwise, if the learned OTR is [T_min, T_max], a linear schedule calculates the temperature at iteration t of a simulation of length L using Equation 2:

T = T_max − (t / L) · (T_max − T_min)   (2)
Finally, agents assign the value to x_i^k with probability min(1, e^{Δ/T}), where Δ is the local gain if the value is changed (line 16). If this gain is non-negative, the value will always be changed (since e^{Δ/T} ≥ 1); otherwise, it will be accepted with a probability less than one.
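Putting these pieces together, the per-agent operations of one Simulate() iteration might look like the following sketch (the function names and the clipping of Gaussian proposals to the domain bounds are our own choices, not the authors' pseudocode):

```python
import math
import random

def propose(value, domain, is_discrete, sigma):
    """SelectNext(): uniform choice for a discrete domain; for a continuous
    domain, a Gaussian centred on the current value (shown here, clipped
    to the bounds) or a uniform draw over the bounds."""
    if is_discrete:
        return random.choice(domain)
    lo, hi = domain
    return min(hi, max(lo, random.gauss(value, sigma)))

def temperature(t, length, t_min, t_max):
    """Equation 2: linear schedule over the learned OTR [t_min, t_max]."""
    return t_max - (t / length) * (t_max - t_min)

def accept(gain, temp):
    """Accept with probability min(1, e^(gain/temp)); a non-negative
    local gain is always accepted."""
    return gain >= 0 or random.random() < math.exp(gain / temp)

random.seed(1)
v = propose(5.0, (0.0, 10.0), is_discrete=False, sigma=0.5)
print(v, accept(-1.0, temperature(0, 100, 0.1, 2.0)))
```

During learning, temperature() would simply be replaced by the constant assigned to the k-th system copy.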
We now describe our modification of ALS. This is used to collect feedback from the simulations during learning and to give DPSA its anytime property. We modify ALS in the following ways:

Since DPSA simulates K systems in parallel, modified_ALS() keeps track of the best state and cost found by each system separately within the duration of a call of the Simulate() function. This is used for feedback.

Modified_ALS() also keeps track of the best state and cost across all K systems and all calls of the Simulate() function. Using this, agents assign values to their decision variables. This gives DPSA its anytime property.
The first part of the modification can easily be done by running each system at each call with its own separate ALS. For the second part, we can have a meta-ALS that uses the information of the ALSs in the first part to calculate the best state and cost across calls and systems. Due to space constraints, we omit the pseudocode.
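A minimal sketch of the bookkeeping modified_ALS() needs (our own data structure; the actual pseudocode is omitted in the paper):

```python
class ModifiedALS:
    """Tracks the best (cost, state) per system within one Simulate() call
    (used as feedback) and the best across all systems and all calls
    (used to give DPSA its anytime property)."""

    def __init__(self, k):
        self.per_call = [(float("inf"), None)] * k
        self.global_best = (float("inf"), None)

    def start_call(self):
        # Reset per-system bests at the start of each Simulate() call.
        self.per_call = [(float("inf"), None)] * len(self.per_call)

    def update(self, k, cost, state):
        if cost < self.per_call[k][0]:
            self.per_call[k] = (cost, state)
        if cost < self.global_best[0]:
            self.global_best = (cost, state)

    def feedback(self):
        return [cost for cost, _ in self.per_call]

# Hypothetical usage with K = 2 systems.
als = ModifiedALS(k=2)
als.update(0, 7.0, {"x": 1})
als.update(0, 5.0, {"x": 2})
als.update(1, 6.0, {"x": 3})
print(als.feedback(), als.global_best)  # → [5.0, 6.0] (5.0, {'x': 2})
```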
We now discuss the learning component. Here, we start with a probability distribution P(T; v) over a large temperature region, where v is the parameter vector of the distribution. At the start of each iteration, the root agent samples K points, i.e. a set of constant temperatures, from this distribution (line 29). Agents then propagate this information down the pseudo-tree (lines 28–33). After that, agents synchronously call the Simulate() function S times (line 35). At each call, agents simulate SA at the K sampled constant temperatures in parallel. Then, using modified_ALS(), agents collect feedback, i.e. the cost of the best solution found by each simulation (line 36). By the law of large numbers, the average feedback approaches its expected value for large S; we therefore take the mean over the S feedback values (lines 37–38). Note that in the pseudocode we use C_k^s to refer to the best cost found by the k-th system in the s-th call, and C* to refer to the actual best cost found so far. After all the feedback is collected, we use it to update the parameter vector v using the updateParameter() function (line 39). The updateParameter() function takes the G best sample points (lines 19–20) and uses them to update the parameter vector (line 21). In this way, agents iteratively learn the parameter vector v.

The parameter vector and its update depend on the particular distribution used. In this paper, we focus on two distributions, namely the Gaussian N and the uniform U. The parameter vector for N is v = ⟨μ, σ⟩ and consists of the mean and the standard deviation. The new parameter vector ṽ is calculated from the set T_G of the G best sample points as:

ṽ = ⟨ mean(T_G), std(T_G) ⟩   (3)
The parameter vector for U is v = ⟨T_min, T_max⟩ and consists of the current bounds of the temperature region. The new parameter vector is calculated from the set T_G of the G best sample points as:

ṽ = ⟨ min(T_G), max(T_G) ⟩   (4)
Finally, v is updated as shown in Equation 5, where α is the learning rate:

v ← α · ṽ + (1 − α) · v   (5)
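Equations 3–5 can be sketched together as a single update routine; the sample temperatures and feedback values below are hypothetical:

```python
import statistics

def update_parameter(samples, feedback, g, v, alpha, dist="uniform"):
    """Cross-entropy update: fit new parameters to the G best sampled
    temperatures (Eq. 3 for Gaussian, Eq. 4 for uniform), then blend
    with the old parameter vector v using learning rate alpha (Eq. 5)."""
    # Select the g sample points with the lowest (best) mean feedback cost.
    best = [t for t, _ in sorted(zip(samples, feedback), key=lambda p: p[1])[:g]]
    if dist == "gaussian":
        v_new = (statistics.mean(best), statistics.pstdev(best))   # Eq. 3
    else:
        v_new = (min(best), max(best))                             # Eq. 4
    return tuple(alpha * n + (1 - alpha) * o for n, o in zip(v_new, v))  # Eq. 5

# Hypothetical round: five constant temperatures and their mean best costs.
samples = [0.0, 25.0, 50.0, 75.0, 100.0]
feedback = [90.0, 40.0, 55.0, 70.0, 95.0]
print(update_parameter(samples, feedback, g=2, v=(0.0, 100.0), alpha=0.8))
# → (20.0, 60.0)
```

With α < 1, the region contracts gradually toward the best-performing temperatures rather than jumping to them outright.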
Updating the parameters in the way discussed above requires a considerable number of iterations and samples. To reduce this number, we now discuss a few techniques that we used in the experimental section. First, when the number of parallel simulations is small (i.e. the value of K is small), taking a random sample over a large range is not efficient; a stratified sample covers a large range better. For example, when using U, we may take samples at regular intervals using Equation 6:

T_k = T_min + (k − 1) · (T_max − T_min) / (K − 1),   k = 1, …, K   (6)
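Under this stratified scheme, the K sample points can be generated as follows (a direct transcription of the regular-interval idea in Equation 6):

```python
def stratified_samples(t_min, t_max, k):
    """Equation 6: K sample temperatures at regular intervals over the
    current region [t_min, t_max]."""
    step = (t_max - t_min) / (k - 1)
    return [t_min + i * step for i in range(k)]

print(stratified_samples(0.0, 100.0, 5))  # → [0.0, 25.0, 50.0, 75.0, 100.0]
```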
Second, when S is small, the estimation of the expected cost becomes noisy. To address this, when two sample points produce feedback within a bound b of each other, we consider them equally good. We calculate this bound b as shown in Equation 7:

b = s_e · C*   (7)

Here, s_e stands for sensitivity and is an algorithm parameter, and C* is the best cost found so far. Accordingly, the G-th best sample point used in line 19 is determined only up to this bound b.
Finally, when S is small, setting the learning rate α to a larger value will speed up the learning process. However, if it is set too high, the algorithm might prematurely converge or skip the optimal temperature. Additionally, we can terminate the learning early if all the sample points are within b of each other.
We now provide an example of the learning process.

Example: Suppose we use a uniform distribution U over an initial temperature region. In the first round, the sample points are taken at regular intervals over this region (Equation 6). Each point then receives feedback from the simulations, and the top G points are selected. Finally, the parameter vector is updated toward the min and max of the selected points. This process repeats until the termination conditions are met. We give a visual overview of the process in Figure 2.
After the learning process ends, agents call the Simulate() function a final time. This time, the simulation runs for significantly longer on the learned optimal temperature region. This concludes our discussion of the learning component. It is worth noting that even though we apply this learning component to learn a parameter value for SA, it can also be applied to learn parameter(s) of other DCOP algorithms. In this way, it can be thought of as a generic distributed parameter learning algorithm for DCOPs.
In terms of complexity, the main cost is incurred by the Simulate() function. Each iteration of this function requires calculating the local gain for K systems. The calculation of the local gain requires O(|N|) operations, where |N| is the number of neighbours. Hence, the total computation complexity per agent per iteration is O(K|N|). In terms of communication complexity, K variable assignments are transferred at each iteration, giving O(K) complexity. Finally, agents have to store K local variable assignments, each of which requires O(|N|) memory, meaning the total memory requirement is O(K|N|). It is worth mentioning that the memory requirement of modified_ALS() is O(KH), where H is the height of the BFS-tree.
5 Experimental Results
Table 1: Solution costs on random DCOPs for varying numbers of agents (A) and densities (P).

| Algorithm | A=25, P=0.1 | A=25, P=0.6 | A=50, P=0.1 | A=50, P=0.6 | A=75, P=0.1 | A=75, P=0.6 |
|-----------|-------------|-------------|-------------|-------------|-------------|-------------|
| DSA-C     | 432         | 5725        | 2605        | 27163       | 7089        | 65519       |
| DSA-SDP   | 325         | 5635        | 2365        | 27210       | 6701        | 65600       |
| GDBA      | 386         | 5465        | 2475        | 26950       | 6867        | 65156       |
| MGM2      | 352         | 5756        | 2481        | 27421       | 6962        | 65988       |
| PD-Gibbs  | 398         | 5875        | 2610        | 27350       | 7178        | 65650       |
| MS_ADVP   | 400         | 5805        | 2550        | 27400       | 7058        | 66008       |
| DSAN      | 408         | 5802        | 2639        | 27413       | 7224        | 66085       |
| DPSA      | 268         | 5358        | 2136        | 26240       | 6276        | 63998       |
Initially, we evaluate DPSA against the state-of-the-art DCOP algorithms on 7 different benchmarks. We then test DPSA and DSAN against the state-of-the-art FDCOP solvers. Finally, we present the comparative solution quality produced by DPSA and DSAN in an MIFDCOP setting.
For the former, we consider the following seven benchmarking DCOP algorithms: DSA-C (p = 0.8), MGM2 (offer probability 0.5), Max-Sum_ADVP, DSA-SDP (pA = 0.6, pB = 0.15, pC = 0.4, pD = 0.8), DSAN, GDBA and PD-Gibbs. For all the benchmarking algorithms, the parameter settings that yielded the best results are selected. We compare these algorithms on six random DCOP settings and on Weighted Graph Coloring Problems (WGCPs). For all settings, we use the Erdős–Rényi topology (i.e. random graphs) to generate the constraint graphs [5]. For random DCOPs, we vary the density from 0.1 to 0.6 and the number of agents from 25 to 75. For WGCPs, we set p = 0.05 and the number of agents to 120. We then take constraint costs uniformly at random and set the domain size to 10. For all the benchmarks, we use the same set of parameters for DPSA. Note that DPSA initializes the CE parameter vector with a large temperature region (this initial value is used in all the settings of this section). In all of the settings described above, we run the benchmarking algorithms on 50 independently generated problems and 50 times on each problem for 500 ms. To conduct these experiments, we use a GCP n2-highcpu-64 instance, a cloud computing service which is publicly accessible at cloud.google.com. Note that unless stated otherwise, all differences shown in this section are statistically significant.
Figure 3 shows a comparison between DPSA and the benchmarking algorithms on the random DCOP (A = 75, P = 0.1) setting, while Table 1 presents how the performance of these algorithms varies with the number of agents and the density. When the density is low, the closest competitor to DPSA is DSA-SDP. Even though both algorithms keep improving the solution until the end of the run, DPSA makes a significant improvement once it starts running in the optimal temperature region after the learning process ends, and we see a big decline after 250 ms. From the results in Table 1, it can be observed that DPSA produces solutions that are better than those of DSA-SDP, with the margin depending on the number of agents. However, when the density is high, GDBA is the closest competitor to DPSA, and DPSA still outperforms GDBA in dense settings. The other competing algorithms perform equal to or worse than GDBA and DSA-SDP and produce an even bigger performance difference with DPSA (up to 15%–61% in sparse settings and 9.6%–3.2% in dense settings). Also note that the optimal cost for (A = 25, P = 0.1), which we generate using the well-known DPOP [18] algorithm, is 253, while DPSA produces 268 in the same setting.

Figure 4 shows a comparison between DPSA and the benchmarking algorithms on the WGCP (A = 120, P = 0.05) benchmark. We see a similar trend here as observed in the random DCOP settings. For the first 1200 iterations (up to 250 ms), i.e. during the learning stage, DPSA improves the solution with several small steps, and after that it takes a big step toward a better solution when run longer in the OTR. In this experiment, DPSA demonstrates its most notable performance: among the benchmarking algorithms, GDBA is the closest but is still outperformed by DPSA by 1.3 times, and the margin over the other algorithms is even larger. The trends seen in Figures 3 and 4, together with the performance of DPSA compared to the current state-of-the-art DCOP algorithms, signify that SA applied in the optimal temperature region is extremely effective at solving DCOPs. Since both DPSA and DSAN apply the same principle, the big performance gain of DPSA in terms of solution quality can be credited to the fact that DPSA runs significantly longer near the optimal temperature.
We now compare DPSA and DSAN against the current state-of-the-art FDCOP solvers, namely PFD and HCMS, on large random graphs with binary quadratic cost functions of the form ax² + bxy + cy². To generate the random graphs, we use the Erdős–Rényi topology with the number of agents set to 50 and P = 0.2. We choose the coefficients of the cost functions (a, b, c) uniformly at random and use the same domain range for each agent. We run all algorithms on 50 independently generated problems and 50 times on each problem for 1 second, using the same hardware setup as in the previous settings. For PFD, we use the configuration suggested in [3]. For HCMS, we choose the number of discrete points to be 3; the discrete points are chosen randomly within the domain range. Finally, we use the same parameters for DPSA as before. To select neighbouring values in DPSA and DSAN, we use both a uniform distribution over the domain and a normal distribution.

Figure 5 shows a comparison between DPSA and the benchmarking algorithms on the binary quadratic FDCOP (A = 50, P = 0.2) benchmark. For this benchmark, the uniform distribution for neighbour selection performs better than the normal distribution for both DPSA and DSAN. DSAN (Uniform) produces similar final solution quality to PFD but has significantly better anytime performance. DPSA, on the other hand, produces solutions of significantly better quality, with the closest competitors PFD and DSAN being outperformed by 10.1%. This demonstrates that DPSA is also an effective FDCOP solver.
Finally, we test DPSA and DSAN in the MIFDCOP setting. For this, we use the same set of problems as for FDCOPs, except that we randomly make 50% of the variables discrete and discretize their domains accordingly. Figure 6 shows the anytime performance of DPSA and DSAN on the binary quadratic MIFDCOP (A = 50, P = 0.2) benchmark. We see a similar trend as in the FDCOP benchmark. Here, DSAN converges quickly to a local optimum and fails to make any further improvement. DPSA, on the other hand, avoids local optima by maintaining a good balance between exploration and exploitation while operating in the optimal temperature region, and produces a 46% better solution.
6 Conclusions and Future Work
In this paper, we present the MIFDCOP model, which generalizes the DCOP and FDCOP models. We then propose a versatile algorithm called DPSA that can be applied to MIFDCOPs. Finally, in our experimental section, we present results indicating that DPSA outperforms the state-of-the-art DCOP and FDCOP algorithms by a significant margin and produces higher-quality solutions than DSAN on the MIFDCOP benchmark. The learning problem we present in this paper can also be modeled as a multi-armed bandit with a continuous action space, which we will explore in future work. We would also like to apply the learning component to other DCOP algorithms, such as DSA-SDP and Max-Sum with damping, to improve their solution quality.
References
 [1] (1999) A simulated annealing algorithm with constant temperature for discrete stochastic optimization. Management science 45 (5), pp. 748–764. Cited by: §1.
 [2] (2004) Distributed simulated annealing. Cited by: §1, §2.3.
 [3] (2020) A particle swarm based algorithm for functional distributed constraint optimization problems. In AAAI, Cited by: §1, §5.
 [4] (1990) An improved annealing scheme for the qap. European Journal of Operational Research 46, pp. 93 – 100. Cited by: §1.
 [5] (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5 (1), pp. 17–60. Cited by: §5.
 [6] (2014) Agentbased decentralised coordination for sensor networks using the maxsum algorithm. Autonomous agents and multiagent systems 28 (3), pp. 337–380. Cited by: §1.
 [7] (2008) Decentralised coordination of lowpower embedded devices using the maxsum algorithm. In AAMAS, Cited by: §1.
 [8] (2017) A distributed constraint optimization (dcop) approach to the economic dispatch with demand response. In AAMAS, Cited by: §1.
 [9] (2003) Distributed sensor networks a multiagent perspective, chapter distributed coordination through anarchic optimization. Kluwer Academic Dordrecht. Cited by: §1.
 [10] (2019) New algorithms for functional distributed constraint optimization problems. arXiv preprint arXiv:1905.13275. Cited by: §1, §2.2.
 [11] (2004) Network coverage using low dutycycled sensors: random & coordinated sleep algorithms. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pp. 433–442. Cited by: §1.
 [12] (1983) Optimization by simulated annealing.. Science 220 4598, pp. 671–80. Cited by: §1.
 [13] (2011) Handbook of monte carlo methods. Cited by: §1, §2.5.
 [14] (2004) Distributed algorithms for dcop: a graphicalgamebased approach.. In ISCA PDCS, pp. 432–439. Cited by: §1.
 [15] (2004) Taking DCOP to the real world: efficient complete solutions for distributed multievent scheduling. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004., pp. 310–317. Cited by: §1.
 [16] (2005) ADOPT: asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence 161 (12), pp. 149–180. Cited by: §2.1.
 [17] (2016) Distributed breakout: beyond satisfaction.. In IJCAI, pp. 447–453. Cited by: §1.
 [18] (2005) A scalable method for multiagent constraint optimization. In IJCAI, Cited by: §5.
 [19] (1989) Simulated annealing algorithms: an overview. IEEE Circuits and Devices magazine 5 (1), pp. 19–26. Cited by: §1.
 [20] (2009) Decentralised coordination of continuously valued control parameters using the maxsum algorithm. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent SystemsVolume 1, pp. 601–608. Cited by: §1.
 [21] (2010) Decentralised coordination of information gathering agents. Cited by: §3.
 [22] (2019) Distributed gibbs: a linear-space sampling-based dcop algorithm. Journal of Artificial Intelligence Research 64, pp. 705–748. External Links: Document Cited by: §1.
 [23] (2010) A hybrid continuous maxsum algorithm for decentralised coordination.. In Proceedings of the 19th European Conference on Artificial Intelligence, pp. 61–66. Cited by: §1.
 [24] (1998) The distributed constraint satisfaction problem: formalization and algorithms. IEEE Transactions on knowledge and data engineering 10 (5), pp. 673–685. Cited by: §1.
 [25] (2005) Distributed stochastic search and distributed breakout: properties, comparison and applications to constraint optimization problems in sensor networks. Artif. Intell. 161, pp. 55–87. Cited by: §1.
 [26] (2014) Explorative anytime local search for distributed constraint optimization. Artificial Intelligence 212, pp. 1–26. Cited by: §1.
 [27] (2014) Explorative anytime local search for distributed constraint optimization. Artif. Intell. 212, pp. 1–26. Cited by: §2.4.
 [28] (2012) Max/minsum distributed constraint optimization through value propagation on an alternating dag. In AAMAS, Cited by: §1.