Learning Optimal Temperature Region for Solving Mixed Integer Functional DCOPs

02/27/2020 · Saaduddin Mahmud, et al. · University of Southampton, University of Dhaka, Imperial College London

Distributed Constraint Optimization Problems (DCOPs) are an important framework that models coordinated decision-making problems in multi-agent systems with a set of discrete variables. Later work has extended this to model problems with a set of continuous variables (F-DCOPs). In this paper, we combine both of these models into the Mixed Integer Functional DCOP (MIF-DCOP) model that can deal with problems regardless of the type of their variables. We then propose a novel algorithm, called Distributed Parallel Simulated Annealing (DPSA), where agents cooperatively learn the optimal parameter configuration for the algorithm while also solving the given problem using the learned knowledge. Finally, we empirically benchmark our approach in DCOP, F-DCOP and MIF-DCOP settings and show that DPSA produces solutions of significantly better quality than the state-of-the-art non-exact algorithms in their corresponding settings.


1 Introduction

Distributed Constraint Optimization Problems (DCOPs) are an important framework for coordinating interactions in cooperative Multi-Agent Systems (MAS). More specifically, agents in this framework coordinate value assignments to their variables in such a way that minimizes constraint violations by optimizing their aggregated costs [24]. DCOPs have gained popularity due to their applications in various real-world multi-agent coordination problems, including distributed meeting scheduling [15], sensor networks [6] and smart grids [8].

Over the last two decades, several algorithms have been proposed to solve DCOPs, and they can be broadly classified into two classes: exact and non-exact. The former always provides an optimal solution of a given DCOP. However, solving DCOPs optimally is NP-hard, so scalability becomes an issue as the system grows. In contrast, non-exact algorithms compromise some solution quality for scalability. Among the non-exact algorithms, DSA [25], DSAN [2], MGM & MGM2 [14], Max-Sum [7], Max-Sum_ADVP [28], DSA-SDP [26], GDBA [17] and PD-Gibbs [22] are commonplace.

In general, DCOP models assume that all the variables that are used to model a problem are discrete. However, many real-world applications (e.g. target tracking sensor orientation [9], sleep scheduling of wireless sensors [11]) are best modelled with continuous variables. In order to address this, Stranders et al. [20] proposed a model that facilitates the use of continuous variables, later referred to as a Functional DCOP (F-DCOP) [10]. In contrast to the standard model, F-DCOPs assume all the variables are continuous and model all the constraints as functions of those variables (instead of the tabular form used in the DCOP model). Among the F-DCOP solvers, CMS [20], HCMS [23], EF-DPOP & AF-DPOP [10] and PFD [3] are well known.

Against this background, it is obvious that the DCOP and F-DCOP models can only deal with problems having discrete and continuous valued variables, respectively. In this paper, we first combine both of the models into a Mixed Integer Functional DCOP (MIF-DCOP) model that can deal with problems regardless of the type of their variables and the representation of the constraints. We then develop a new algorithm that we call Distributed Parallel Simulated Annealing (DPSA) that can be directly applied to DCOPs and F-DCOPs, and even more importantly to their generalized version, MIF-DCOPs.

The DPSA algorithm is based on the Simulated Annealing (SA) meta-heuristic, which is motivated by a physical analogy of controlled temperature cooling (i.e. annealing) of a material [12]. One of the most important factors that influence the quality of the solution produced by SA is the temperature parameter, widely denoted as T. More precisely, SA starts with a high value of T and during the search process continuously cools it down to near zero. When T is high, SA only explores the search space without exploiting, which makes its behaviour similar to a random search procedure. On the other hand, when T is near zero, SA tends only to exploit and its exploration capability diminishes; in such a scenario, SA emulates the behaviour of a greedy algorithm. In fact, SA most effectively balances exploration and exploitation in some optimal temperature region that lies between these two extremes [19]. Several existing works also discuss a constant optimal temperature at which SA performs best [4][1]. Unfortunately, the optimal temperature region varies from one type of problem to another, and from one instance to another of the same problem. Considering this background, we present a novel method where agents cooperatively try to learn this optimal temperature region using a Monte Carlo importance sampling method called Cross-Entropy sampling [13] (discussed in the background). Using the knowledge learned during this process, agents cooperatively solve the given problem. This results in a significant improvement of solution quality compared to the state-of-the-art algorithms in both DCOP and F-DCOP settings. Moreover, we apply and evaluate both DSAN (i.e. the only other SA based DCOP solver) and DPSA in MIF-DCOP settings and show that DPSA outperforms DSAN in this setting by a notable margin.
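
To make the temperature's role concrete, the following is a minimal sketch of the standard Metropolis acceptance rule that SA uses; the function name and the example local gain of -10 are purely illustrative and not taken from the paper.

```python
import math
import random

def accept(delta, temperature):
    """Metropolis rule: always accept an improvement (delta >= 0); otherwise
    accept the worsening move with probability exp(delta / T)."""
    if delta >= 0:
        return True
    if temperature <= 0:
        return False
    return random.random() < math.exp(delta / temperature)

# At a high T almost every move is accepted (random-search-like exploration);
# near T = 0 only improving moves are accepted (greedy exploitation).
for T in (1000.0, 10.0, 0.01):
    print(f"T = {T:>7}: P(accept a -10 move) = {math.exp(-10.0 / T):.4f}")
```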

2 Background

We first describe DCOPs and F-DCOPs in detail which will be the basis for the MIF-DCOP model. We then conclude this section with the literature necessary for this work.

2.1 DCOP Model

A DCOP is defined by a tuple ⟨A, X, D, F, α⟩ [16]:

  • A is a set of agents {a_1, a_2, ..., a_n}.

  • X is a set of discrete variables {x_1, x_2, ..., x_m}, which are controlled by the set of agents A.

  • D is a set of discrete and finite variable domains {D_1, D_2, ..., D_m}, where each D_i is a set containing the values which may be assigned to its associated variable x_i.

  • F is a set of constraints {f_1, f_2, ..., f_l}, where each f_i ∈ F is a function of a subset of variables x^i ⊆ X defining the relationship among the variables in x^i. Thus, the function f_i denotes the cost for each possible assignment of the variables in x^i.

  • α : X → A is a variable-to-agent mapping function which assigns the control of each variable x_j ∈ X to an agent of A. Each variable is controlled by a single agent. However, each agent can hold several variables.

Within the framework, the objective of a DCOP algorithm is to produce X*, a complete assignment that minimizes (for a maximization problem, replace min with max) the aggregated cost of the constraints, as shown in Equation 1.

X* = argmin_X Σ_{f_i ∈ F} f_i(x^i)     (1)

For ease of understanding, we assume that each agent controls one variable. Thus, the terms ‘variable’ and ‘agent’ are used interchangeably throughout this paper.
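
As a quick illustration of Equation 1, the sketch below evaluates the aggregated cost of a complete assignment and brute-forces X* for a tiny, hypothetical two-variable instance with one tabular constraint; the variable names, domains and costs are made up for illustration.

```python
from itertools import product

# Hypothetical two-variable DCOP: variables x1, x2 with domain {0, 1},
# and a single tabular constraint over (x1, x2).
domains = {"x1": [0, 1], "x2": [0, 1]}
constraints = [
    (("x1", "x2"), {(0, 0): 5, (0, 1): 2, (1, 0): 3, (1, 1): 8}),
]

def aggregated_cost(assignment):
    """Evaluate the objective of Equation 1 for a complete assignment."""
    return sum(table[tuple(assignment[v] for v in scope)]
               for scope, table in constraints)

# Brute-force the optimal assignment X* (only feasible for tiny instances).
best = min((dict(zip(domains, values)) for values in product(*domains.values())),
           key=aggregated_cost)
print(best, aggregated_cost(best))
```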

2.2 Functional DCOP Model

F-DCOPs can be defined by a tuple ⟨A, X, D, F, α⟩, where A, F and α have the same definitions as in the DCOP model. However, the set of variables X and the set of domains D are defined as follows: X is a set of continuous variables {x_1, x_2, ..., x_m} that are controlled by the agents in A; D is a set of continuous domains {D_1, D_2, ..., D_m}, where each variable x_i can take any value in the range D_i = [LB_i, UB_i]. A notable difference between F-DCOPs and DCOPs is found in the representation of the cost functions. In DCOPs, the cost functions are conventionally represented in tabular form, while in F-DCOPs, each constraint is represented in the form of a function [10].

2.3 Distributed Simulated Annealing

Distributed Simulated Annealing (DSAN) [2] is the only existing Simulated Annealing (SA) based DCOP solver. DSAN is a local search algorithm that executes the following steps iteratively:

  • Each agent a_i selects a random value v from its domain D_i.

  • Agent a_i then assigns the selected value v to x_i with probability min(1, exp(Δ / t_k)), where Δ is the local gain of switching to v and t_k is the temperature at iteration k. Note that the authors of DSAN suggest fixed cooling schedules for t_k. However, setting the value of the temperature parameter with such a fixed schedule does not take into account its impact on the performance of the algorithm (a single-agent sketch of this step is given after this list).

  • Finally, agents notify neighbouring agents if the value of a variable changes.
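
Below is a minimal single-agent sketch of the DSAN step just described; the function signature, the local_gain callable and the explicit min(1, exp(Δ/t)) acceptance test are our illustrative rendering under the assumptions above, not the authors' code.

```python
import math
import random

def dsan_step(agent_value, domain, local_gain, temperature):
    """One DSAN iteration from a single agent's perspective (illustrative sketch).

    agent_value : current value of the agent's variable
    domain      : list of candidate values
    local_gain  : callable mapping a candidate value to the local cost reduction
                  obtained by switching to it (positive = improvement)
    temperature : current temperature t_k from the fixed schedule
    """
    candidate = random.choice(domain)           # pick a random value from the domain
    delta = local_gain(candidate)
    # Accept with probability min(1, exp(delta / t_k)).
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate                        # value changes; neighbours are notified
    return agent_value                          # keep the old value
```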

2.4 Anytime Local Search

Anytime Local Search (ALS) is a general framework that gives distributed iterative local search DCOP algorithms such as DSAN described above an anytime property. Specifically, ALS uses a BFS-tree to calculate the global cost (i.e. evaluate Equation 1) of the system’s state during each iteration and keeps track of the best state visited by the algorithm. Hence, using this framework, agents can carry out the best decision that they explored during the iterative search process instead of the one that occurs at the termination of the algorithm (see [27] for more details).

2.5 Cross-Entropy Sampling

Parameters: N (number of samples), G (number of elite samples)
1   Initialize the parameter vector θ
2   while the termination condition is not met do
3       X ← N samples taken from the distribution P(θ)
4       Evaluate each point in X on the objective function
5       sort(X) by objective value
6       θ̃ ← parameters calculated using X(1:G)
7       Update θ using θ̃
Algorithm 1: Cross-Entropy Sampling

Cross-Entropy (CE) is a Monte Carlo method for importance sampling. CE has successfully been applied to importance sampling, rare-event simulation and optimization (discrete, continuous, and noisy problems) [13]. Algorithm 1 sketches an example that iteratively learns the optimal value of a vector within a search space. The algorithm starts with a probability distribution P(θ) over the search space, with the parameter vector θ initialized to a certain value (which may be random). At each iteration, it takes N (a parameter of the algorithm) samples from the probability distribution P(θ) (line 3). After that, each sample point is evaluated on a problem-dependent objective function. The top G among the N sample points are used to calculate the new value of the parameter vector, referred to as θ̃ (lines 4-6). Finally, θ̃ is used to update θ (line 7). At the end of the learning process, most of the probability density of P(θ) will be allocated near the optimal value of the vector.

3 Mixed Integer Functional DCOP Models

We now formulate the Mixed Integer Functional DCOP (MIF-DCOP) model, which combines both the DCOP and F-DCOP models. This removes the requirement that all the variables be either continuous or discrete and that the constraints be represented in either tabular or functional form. More formally, a MIF-DCOP is a tuple ⟨A, X, D, F, α⟩, where A and α are as defined in standard DCOPs. The key differences are as follows:

  • X is a set of variables {x_1, x_2, ..., x_m}, where each x_i is either a discrete or a continuous variable.

  • D is a set of domains {D_1, D_2, ..., D_m}. If a variable x_i is discrete, its domain D_i is the same as in the DCOP model; otherwise, D_i is the same as in the F-DCOP model.

  • F is a set of constraint functions {f_1, f_2, ..., f_l}. Each constraint function f_i can be modeled as follows: when the subset of variables x^i involved with f_i contains only discrete variables, it can be modeled either in tabular or in functional form; otherwise, it is modeled only in functional form.

Figure 1: Example of a MIF-DCOP.

It is worth highlighting that both DCOPs and F-DCOPs are special cases of MIF-DCOPs wherein either all the variables are discrete and constraints are in tabular form or all the variables are continuous and constraints are in functional form, respectively.

MIF-DCOPs are specifically useful when the variables in X represent decisions about a heterogeneous set of actions. Suppose, for instance, we want to model a target classification problem in a network of optical sensor agents. In the F-DCOP model of this problem [21], an agent is only able to rotate its sensor to change its viewing direction. Now imagine that agents can also optically zoom in or zoom out to increase clarity or field of vision, respectively. The decision about rotation is best modeled with a continuous variable (as described in [21]), while the decision about optical zoom is naturally modeled using a discrete variable. This problem, and similar problems where heterogeneous types of decision variables are needed, can easily be modeled with the newly proposed MIF-DCOPs. We provide a small example of a MIF-DCOP in Figure 1.
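
As a concrete (and entirely hypothetical) illustration of mixed variable types, the sketch below shows one way such an instance could be represented; the variable names, domains and the coverage_cost function are ours and do not correspond to the instance in Figure 1.

```python
# A tiny illustrative MIF-DCOP-style instance: one continuous rotation variable
# and one discrete zoom variable per sensor agent (names and ranges are hypothetical).
variables = {
    "rotation_1": ("continuous", (0.0, 360.0)),   # domain is a real interval
    "zoom_1":     ("discrete",   [1, 2, 4, 8]),   # domain is a finite set
}

# A constraint involving a continuous variable must be functional; a constraint
# over discrete variables only could instead be a cost table.
def coverage_cost(rotation_1, zoom_1):
    """Hypothetical functional constraint mixing both variable types."""
    return abs(rotation_1 - 90.0) / zoom_1

constraints = [(("rotation_1", "zoom_1"), coverage_cost)]
```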

4 The DPSA Algorithm

1   Function Simulate(IsLearning):
2       if IsLearning = True then
3           for k ← 1 to K do: initialize x_i^k with the same random value from D_i
4       else
5           for k ← 1 to K do: initialize x_i^k with the value of the best state found so far
6       set the simulation length L (short if IsLearning, longer otherwise)
7       while iteration ≤ L do
8           for k ← 1 to K do
9               send the value of x_i^k to the neighbouring agents
10          for k ← 1 to K do
11              collect the values of the neighbouring agents for system k
12          update the best states using Modified_ALS()
13          for k ← 1 to K do
14              v_k ← SelectNext(x_i^k, D_i)
15              T ← Scheduler(k, iteration, IsLearning)
16              set x_i^k ← v_k with probability min(1, exp(Δ_k / T))
17  Function UpdateParameter(C):
18      sort the sampled temperatures T by their mean feedback costs C
19      C_G ← the G-th best cost in C
20      E ← {T_k ∈ T : C_k ≤ C_G}
21      θ̃ ← parameters calculated from E (Equations 3-4); update θ using θ̃ (Equation 5)
22  Function Main():
23      construct a BFS-tree over the agents
24      initialize the parameter vector θ with a large temperature region
25      C* ← ∞
26      r ← 1
27      while r ≤ R and the termination condition is not met do
28          if the agent is the root then
29              T ← K points sampled from P(θ)
30          else
31              receive T from the parent agent in the BFS-tree
32          send T to all the child agents in the BFS-tree
33          C ← (0, ..., 0)
34          for s ← 1 to S do
35              synchronously start Simulate(True)
36              wait for Modified_ALS() to terminate and collect the best cost C_k^s of each system k
37              for k ← 1 to K do
38                  C_k ← C_k + C_k^s / S
39          UpdateParameter(C)
40          r ← r + 1
41      synchronously start Simulate(False)
Algorithm 2: The DPSA Algorithm

We will now describe the DPSA algorithm for solving MIF-DCOPs. As discussed earlier, a big motivation behind DPSA is to learn the optimal temperature region for Simulated Annealing (SA). Thus, it is important that we formally define optimal temperature and optimal temperature region.

Definition 1. The Optimal Temperature given a simulation length L is a constant temperature at which the expected solution cost yielded by running SA for L iterations is the lowest among all constant temperatures.

Definition 2. The Optimal Temperature Region (OTR) of length ℓ is a continuous interval [T_min, T_max], with T_max − T_min = ℓ, that contains the optimal temperature. If we set T_min to near zero and T_max to a very large number, it will always be an OTR by the above definition, although not a useful one. Our goal is to find an OTR with a sufficiently small ℓ.

The DPSA algorithm consists of two main components: the parallel SA component and the learning component. The parallel SA component (lines 1-16) is an extension of the existing DSAN algorithm that simulates K copies of the system in parallel. Additionally, we introduce the SelectNext() function, in which agents use different strategies to select a value from their domain depending on the type of the variable, continuous or discrete (line 14). This simple modification makes DPSA (and also DSAN) applicable to DCOPs, F-DCOPs and MIF-DCOPs. We also modify the existing ALS framework to make it compatible with parallel simulation.

The other significant component of DPSA is the iterative learning component, sketched in the pseudo-code from line 22 to line 41. It starts with a large temperature region and iteratively tries to shrink it down to an OTR of the desired length. To achieve this, at each iteration, agents cooperatively perform actions (i.e. synchronously simulate parallel SA with different constant temperatures), collect feedback (i.e. the cost yielded by the different simulations) and use the feedback to update the current temperature region toward the goal region. The underlying algorithm used in the learning process is based on cross-entropy importance sampling. However, to make DPSA sample efficient, we present modifications that significantly reduce the number of iterations and parallel simulations needed.

We structure the rest of our discussion in this section as follows: we first describe parallel SA and our modification of ALS. Then we discuss the learning component of DPSA and the techniques that make it sample efficient. Finally, we analyze the complexity of DPSA.

In DPSA, the Simulate() function (lines 1-16) runs SA on K copies of a system in parallel. This function is called in two different contexts: during learning to collect feedback (line 35) and after the learning process has ended (line 41). The main difference is that in the first context the function runs a short simulation at K different constant temperatures (one fixed temperature for each copy). In the second case, the function runs for significantly longer and all K systems run on the learned OTR with a linear schedule (discussed shortly). In addition, in the first case, all copies are initialized with the same random value from the domain. This is done because we want to identify the effect of the different constant temperatures on the simulation step, and initializing them with different initial states would add more noise to the feedback. In the second case, we initialize with the best state found so far. Note that, to avoid confusion, we use x_i to refer to the actual decision variable and x_i^k to refer to x_i on the k-th copy of the system. The variable IsLearning (line 1) represents the context in which the function was called. Depending on its value, the variables of all K systems are initialized and the length of the simulation is set as we discussed (lines 2-6). After that, the main simulation starts and runs for L iterations.

Figure 2: Overview of DPSA Algorithm

At the start of each iteration, each agent shares the current state of each system (i.e. the variable assignment of each system) with its neighbours (lines 8-11). Each agent then updates and performs the other operations related to the modified ALS (discussed shortly) (line 12). After that, for each of the K systems, agent a_i picks a value from its domain (line 14). If x_i is discrete, it picks this value uniformly at random. If x_i is continuous, it picks it using either a Gaussian distribution N(x_i^k, σ) or a uniform distribution U(LB_i, UB_i). Here, σ is an input parameter, and [LB_i, UB_i] is the bound of D_i. The difference is that the Gaussian gives preference to values near the currently assigned value, while the uniform does not have any preference. Afterward, each agent selects the temperature for the current iteration, i.e. T (line 15). If the function is called during the learning context, T is always set to a constant. More precisely, it is set to the k-th value of T (from lines 29, 31). Otherwise, if the learned OTR is [T_min, T_max], it uses a linear schedule to calculate the temperature using Equation 2:

T = T_max − (T_max − T_min) · (iteration / L)     (2)

Finally, agents assign the value v_k to x_i^k with the probability min(1, exp(Δ / T)), where Δ is the local gain if the value is changed (line 16). If this gain is non-negative, the value will always be changed (since exp(Δ / T) ≥ 1). Otherwise, it will be accepted with a certain probability less than one.
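
A small sketch of the value-selection and scheduling steps follows; clipping the Gaussian draw to the domain bounds and the default σ are our assumptions rather than details specified in the pseudo-code.

```python
import random

def select_next(current, domain, is_discrete, sigma=1.0):
    """SelectNext() sketch: uniform choice for a discrete domain; for a continuous
    domain, a Gaussian draw around the current value (clipped to [LB, UB])."""
    if is_discrete:
        return random.choice(domain)
    lower, upper = domain
    return min(max(random.gauss(current, sigma), lower), upper)

def linear_schedule(t_min, t_max, iteration, length):
    """Linear temperature schedule over the learned OTR [T_min, T_max] (Equation 2)."""
    return t_max - (t_max - t_min) * iteration / length
```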

We now describe our modification of ALS. This is used to collect feedback from the simulations during learning and to give DPSA its anytime property. We modify ALS in the following ways:

  • Since DPSA simulates K systems in parallel, Modified_ALS() keeps track of the best state and cost found by each system separately within the duration of a call of the Simulate() function. This is used for feedback.

  • Modified_ALS() also keeps track of the best state and cost across all K systems and all the calls of the Simulate() function. Using this, agents assign values to their decision variables. This is used to give DPSA its anytime property.

The first part of the modification can easily be done by running each system at each call with its own separate ALS. For the second part, we can have a meta-ALS that utilizes the information of the ALS instances in the first part to calculate the best state and cost across calls and systems. Due to space constraints, we omit the pseudo-code.
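
Since the pseudo-code is omitted, the following rough sketch shows only the bookkeeping side of the modification (class and method names are ours); the distributed cost aggregation that ALS performs over the BFS-tree is not shown.

```python
class ModifiedALS:
    """Illustrative bookkeeping only: track the best state per system within one
    Simulate() call (used as feedback), and the best state across all systems and
    calls (used for the anytime property)."""

    def __init__(self, num_systems):
        self.num_systems = num_systems
        self.global_best_cost = float("inf")
        self.global_best_state = None
        self.reset_call()

    def reset_call(self):
        # Per-system bests, reset at the start of each call of Simulate().
        self.call_best_cost = [float("inf")] * self.num_systems
        self.call_best_state = [None] * self.num_systems

    def update(self, k, state, cost):
        if cost < self.call_best_cost[k]:
            self.call_best_cost[k] = cost
            self.call_best_state[k] = state
        if cost < self.global_best_cost:        # meta-ALS across calls and systems
            self.global_best_cost = cost
            self.global_best_state = state
```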

We now discuss the learning component (lines 22-41). Here, we start with a probability distribution P(θ) over a large temperature region, where θ is the parameter vector of the distribution. At the start of each iteration, the root agent samples K points (i.e. a set T of constant temperatures) from this distribution (line 29). Agents then propagate this information down the pseudo-tree (lines 28-33). After that, agents synchronously call the Simulate() function S times (line 35). At each call, agents simulate SA at the K sampled constant temperatures (i.e. the set T) in parallel. Then, using Modified_ALS(), agents collect feedback, i.e. the cost of the best solution found by each simulation (line 36). According to the law of large numbers, the average should be close to the expected value, i.e. the actual feedback, given a large S. So we take a mean over the S feedbacks (lines 37-38). Note that in the pseudo-code, we use C_k^s to refer to the best cost found by the k-th system in the s-th call, and we use C* to refer to the actual best cost found so far. After all the feedback is collected, we use it to update the parameter vector θ using the UpdateParameter() function (line 39). The UpdateParameter() function takes the G best sample points (lines 19-20) and uses them to update the parameter vector (line 21). In this way, agents iteratively learn the parameter vector θ.

The parameter vector and its update depend on the particular distribution used. In this paper, we focus on two particular distributions, namely the Gaussian N(μ, σ) and the uniform U(T_min, T_max). The parameter vector θ for N is (μ, σ) and consists of the mean and the standard deviation. The new parameter vector θ̃ is calculated from the elite set E (the G best sample points) as:

μ̃ = (1/|E|) Σ_{T_k ∈ E} T_k,    σ̃ = sqrt((1/|E|) Σ_{T_k ∈ E} (T_k − μ̃)²)     (3)

The parameter vector θ for U is (T_min, T_max) and consists of the current bounds of the temperature region. The new parameter vector θ̃ is calculated as:

θ̃ = (min(E), max(E))     (4)

Finally, θ is updated as shown in Equation 5, where η is the learning rate.

θ ← (1 − η) · θ + η · θ̃     (5)
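
A small sketch of the updates in Equations 3-5 follows; the learning-rate value and the elite list in the usage line are hypothetical.

```python
import statistics

def update_uniform(theta, elite, lr=0.5):
    """CE update for a uniform sampling distribution over the temperature region.
    theta = (t_min, t_max); elite = the G best sampled temperatures (Eqs. 4-5)."""
    t_min, t_max = theta
    new_min, new_max = min(elite), max(elite)           # Equation 4
    return ((1 - lr) * t_min + lr * new_min,            # Equation 5
            (1 - lr) * t_max + lr * new_max)

def update_gaussian(theta, elite, lr=0.5):
    """CE update for a Gaussian sampling distribution; theta = (mu, sigma)."""
    mu, sigma = theta
    new_mu, new_sigma = statistics.mean(elite), statistics.pstdev(elite)  # Equation 3
    return ((1 - lr) * mu + lr * new_mu,
            (1 - lr) * sigma + lr * new_sigma)

# Hypothetical usage: shrink an initial region [0.1, 1000] toward the elite temperatures.
print(update_uniform((0.1, 1000.0), elite=[4.0, 7.5, 12.0]))
```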

Updating the parameters in the way discussed above requires a considerable number of iterations and samples. To reduce the number of iterations and samples required, we now discuss a few techniques that we use in the experimental section. First, when the number of parallel simulations is small (i.e. the value of K is small), taking a random sample over a large range is not efficient, and taking a stratified sample will cover a large range better. For example, when using U(T_min, T_max), we may take samples at regular intervals using Equation 6:

T_k = T_min + (k − 1) · (T_max − T_min) / (K − 1)     (6)

Second, when S is small, the estimation of the expected cost becomes noisy. To address this, when two sample points produce feedback within a bound ε of each other, we consider them equally good. We calculate this bound ε as shown in Equation 7:

(7)

Here, V stands for sensitivity and is an algorithm parameter. According to this, we may calculate the cut-off in line 19 as follows: the G-th best of C, plus ε.

Finally, when the number of learning rounds is small, setting the learning rate to a larger value will speed up the learning process. However, if it is set too high, the algorithm might prematurely converge or skip the optimal temperature. Additionally, we can terminate before the round limit is reached if all the sample points are within the bound ε of each other.
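
The first two sample-efficiency techniques can be sketched as follows; the regular spacing in stratified_temperatures matches our reading of Equation 6, and ε is simply passed in as a parameter since Equation 7 is not reproduced here.

```python
def stratified_temperatures(t_min, t_max, k):
    """Stratified sampling in the spirit of Equation 6: K temperatures at regular
    intervals over the current region instead of K random draws."""
    step = (t_max - t_min) / (k - 1)
    return [t_min + i * step for i in range(k)]

def elite_with_sensitivity(temperatures, costs, g, eps):
    """Treat sample points whose mean feedback lies within eps of the G-th best
    cost as equally good when selecting the elite set."""
    cutoff = sorted(costs)[g - 1] + eps
    return [t for t, c in zip(temperatures, costs) if c <= cutoff]

# Hypothetical usage with K = 5 temperatures over [0.1, 1000].
temps = stratified_temperatures(0.1, 1000.0, 5)
print(elite_with_sensitivity(temps, [90, 40, 35, 60, 120], g=2, eps=5.0))
```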

We now provide an example of the learning process.
Example: Suppose we use a uniform distribution over an initial temperature region. In the first round, the sampled points are taken at regular intervals over this region. Feedback is then collected for each sampled point, the top G sample points are selected, and the parameter update takes the minimum and maximum of the selected points (weighted by the learning rate, as in Equations 4 and 5). This process repeats until the termination conditions are met. We give a visual overview of the process in Figure 2.

After the learning process ends, agents call the Simulate() function for the final time at line 41. At this time, the simulation usually runs for longer, on the learned optimal temperature region. This concludes our discussion of the learning component. It is worth noting that even though we apply this learning component to learn a parameter value for SA, it can also be applied to learn parameter(s) of other DCOP algorithms. In this way, it can be thought of as a generic distributed parameter learning algorithm for DCOPs.

In terms of complexity, the main cost is incurred by the Simulate() function. Each iteration of this function requires calculating the local gain for K systems. The calculation of the local gain requires O(|N_i|) operations, where |N_i| is the number of neighbours of agent a_i. Hence, the total computational complexity per iteration is O(K · |N_i|). In terms of communication complexity, K variable assignments are transferred at each iteration, which gives it O(K) complexity. Finally, agents have to save K local variable assignments, each of which requires O(|N_i|) memory, meaning the total memory requirement will be O(K · |N_i|). It is worth mentioning that the memory requirement of Modified_ALS() is O(K · H), where H is the height of the BFS-tree.

5 Experimental Results

Algorithm   |A| = 25            |A| = 50            |A| = 75
            P = 0.1   P = 0.6   P = 0.1   P = 0.6   P = 0.1   P = 0.6
DSA-C       432       5725      2605      27163     7089      65519
DSA-SDP     325       5635      2365      27210     6701      65600
GDBA        386       5465      2475      26950     6867      65156
MGM-2       352       5756      2481      27421     6962      65988
PD-Gibbs    398       5875      2610      27350     7178      65650
MS_ADVP     400       5805      2550      27400     7058      66008
DSAN        408       5802      2639      27413     7224      66085
DPSA        268       5358      2136      26240     6276      63998

Table 1: Comparison of DPSA and the benchmarking algorithms on different configurations of random DCOPs.

Initially, we evaluate DPSA against the state-of-the-art DCOP algorithms on 7 different benchmarks. We then test DPSA and DSAN against the state-of-the-art F-DCOP solvers. Finally, we present comparative solution quality produced by DPSA and DSAN on a MIF-DCOP setting.

For the former, we consider the following seven benchmarking DCOP algorithms: DSA-C (p = 0.8), MGM-2 (offer probability 0.5), Max-Sum ADVP, DSA-SDP (pA = 0.6, pB = 0.15, pC = 0.4, pD = 0.8), DSAN, GDBA and PD-Gibbs. For all the benchmarking algorithms, the parameter settings that yielded the best results are selected. We compare these algorithms on six random DCOP settings and on Weighted Graph Coloring Problems (WGCPs). For all settings, we use the Erdős-Rényi topology (i.e. random graphs) to generate the constraint graphs [5]. For random DCOPs, we vary the density from 0.1 to 0.6 and the number of agents from 25 to 75. For WGCPs, we set the density to 0.05 and the number of agents to 120. We then take constraint costs uniformly at random and set the domain size to 10. For all the benchmarks, we use the same parameter settings for DPSA. Note that DPSA initializes the CE parameter vector θ with a large temperature region (this initial value of θ is used in all the settings of this section). In all of the settings described above, we run the benchmarking algorithms on 50 independently generated problems and 50 times on each problem for 500 ms. In order to conduct these experiments, we use a GCP-n2-highcpu-64 instance, a cloud computing service which is publicly accessible at cloud.google.com. Note that unless stated otherwise, all differences shown in this section are statistically significant.

Figure 3 shows a comparison between DPSA and the benchmarking algorithms on the random DCOP (|A| = 75, p = 0.1) setting, while Table 1 presents how the performance of these algorithms varies with the number of agents and the density. When the density is low, the closest competitor to DPSA is DSA-SDP. Even though both algorithms keep improving the solution until the end of the run, DPSA makes a significant improvement when it starts running in the optimal temperature region after the learning process ends, and we see a big decline after 250 ms. From the results in Table 1, it can be observed that DPSA produces better solutions than DSA-SDP, with the margin depending on the number of agents. However, when the density is high, GDBA is the closest competitor to DPSA; in dense settings, DPSA still outperforms GDBA. The other competing algorithms perform equally to or worse than GDBA and DSA-SDP and produce an even bigger performance difference with DPSA (up to 15% - 61% in sparse settings and 9.6% - 3.2% in dense settings). Also note that the optimal cost for (|A| = 25, p = 0.1), which we generate using the well-known DPOP [18] algorithm, is 253, while DPSA produces 268 in the same setting.

Figure 3: Comparison of DPSA and the benchmarking algorithms on random DCOPs (|A| = 75, P = 0.1).
Figure 4: Comparison of DPSA and the benchmarking algorithms on weighted graph coloring problems (|A| = 120, P = 0.05).

Figure 4 shows a comparison between DPSA and the benchmarking algorithms on the WGCP (|A| = 120, P = 0.05) benchmark. We see a similar trend here as observed in the random DCOP settings. For the first 1200 iterations (up to 250 ms), i.e. during the learning stage, DPSA improves the solution in several small steps, and after that it takes a big step toward a better solution when run for longer in the OTR. In this experiment, DPSA demonstrates a notable performance advantage. Among the benchmarking algorithms, GDBA is the closest, but it is still outperformed by DPSA by a factor of 1.3. Compared with the other algorithms, DPSA finds solutions that are better by an even larger factor. The trends seen in Figures 3 and 4, and the performance of DPSA compared to the current state-of-the-art DCOP algorithms, signify that DPSA applied in the optimal temperature region is extremely effective at solving DCOPs. Since both DPSA and DSAN apply the same principle, the big performance gain of DPSA in terms of solution quality can be credited to the fact that DPSA runs significantly longer near the optimal temperature.

Figure 5: Comparison of DPSA and the benchmarking algorithms on binary quadratic F-DCOPs (|A| = 50, P = 0.2).

We now compare DPSA and DSAN against the current state-of-the-art F-DCOP solvers, namely PFD and HCMS, on large random graphs with binary quadratic cost functions of the form f(x, y) = ax² + bxy + cy². To generate the random graphs, we use the Erdős-Rényi topology with the number of agents set to 50 and the density set to 0.2. We choose the coefficients of the cost functions (a, b, c) randomly and set the same domain range for each agent. We run all algorithms on 50 independently generated problems and 50 times on each problem for 1 second, and we use the same hardware setup as in the previous settings. For PFD, we use the same configuration suggested in [3]. For HCMS, we choose the number of discrete points to be 3; the discrete points are chosen randomly within the domain range. Finally, we use a fixed parameter configuration for DPSA across all runs of this benchmark. To select neighbouring values in DPSA and DSAN, we use both a uniform distribution over the domain and a normal distribution centred at the current value.

Figure 5 shows a comparison between DPSA and the benchmarking algorithms on the binary quadratic F-DCOP (|A| = 50, P = 0.2) benchmark. For this benchmark, the uniform distribution for neighbour selection performs better than the normal distribution for both DPSA and DSAN. DSAN (uniform) produces solution quality similar to PFD; however, it has significantly better anytime performance. On the other hand, DPSA produces solutions of significantly better quality, with the closest competitors PFD and DSAN being outperformed by 10.1%. This demonstrates that DPSA is also an effective F-DCOP solver.

Figure 6: Comparison of DPSA and the benchmarking algorithms on binary quadratic MIF-DCOPs (|A| = 50, P = 0.2).

Finally, we test DPSA and DSAN in the MIF-DCOP setting. For this, we use the same set of problems as for F-DCOPs, except that we randomly make 50% of the variables discrete and discretize their domains. Figure 6 shows the anytime performance of DPSA and DSAN on the binary quadratic MIF-DCOP (|A| = 50, P = 0.2) benchmark. We see a similar trend as in the F-DCOP benchmark. Here, DSAN converges quickly to a local optimum and fails to make any further improvement. On the other hand, DPSA avoids local optima by maintaining a good balance between exploration and exploitation through operating in the optimal temperature region, and produces a 46% better solution.

6 Conclusions and Future Works

In this paper, we present the MIF-DCOP model, which generalizes the DCOP and F-DCOP models. We then propose a versatile algorithm called DPSA that can be applied to MIF-DCOPs. Finally, in our experimental section we present results indicating that DPSA outperforms the state-of-the-art DCOP and F-DCOP algorithms by a significant margin and produces higher-quality solutions than DSAN on the MIF-DCOP benchmark. The learning problem we present in this paper can also be modeled as a multi-armed bandit problem with a continuous action space, which we will explore in future work. We would also like to apply the learning component to other DCOP algorithms, such as DSA-SDP and Max-Sum with damping, to improve their solution quality.

References

  • [1] M. H. Alrefaei and S. Andradóttir (1999) A simulated annealing algorithm with constant temperature for discrete stochastic optimization. Management science 45 (5), pp. 748–764. Cited by: §1.
  • [2] M. Arshad and M. C. Silaghi (2004) Distributed simulated annealing. Cited by: §1, §2.3.
  • [3] M. Choudhury, S. Mahmud, and Md. M. Khan (2020) A particle swarm based algorithm for functional distributed constraint optimization problems. In AAAI, Cited by: §1, §5.
  • [4] D. T. Connolly (1990) An improved annealing scheme for the qap. European Journal of Operational Research 46, pp. 93 – 100. Cited by: §1.
  • [5] P. Erdős and A. Rényi (1960) On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci 5 (1), pp. 17–60. Cited by: §5.
  • [6] A. Farinelli, A. Rogers, and N. R. Jennings (2014) Agent-based decentralised coordination for sensor networks using the max-sum algorithm. Autonomous agents and multi-agent systems 28 (3), pp. 337–380. Cited by: §1.
  • [7] A. Farinelli, A. Rogers, A. Petcu, and N. R. Jennings (2008) Decentralised coordination of low-power embedded devices using the max-sum algorithm. In AAMAS, Cited by: §1.
  • [8] F. Fioretto, W. Yeoh, E. Pontelli, Y. Ma, and S. J. Ranade (2017) A distributed constraint optimization (dcop) approach to the economic dispatch with demand response. In AAMAS, Cited by: §1.
  • [9] S. Fitzpatrick and L. Meetrens (2003) Distributed sensor networks a multiagent perspective, chapter distributed coordination through anarchic optimization. Kluwer Academic Dordrecht. Cited by: §1.
  • [10] K. D. Hoang, W. Yeoh, M. Yokoo, and Z. Rabinovich (2019) New algorithms for functional distributed constraint optimization problems. arXiv preprint arXiv:1905.13275. Cited by: §1, §2.2.
  • [11] C. Hsin and M. Liu (2004) Network coverage using low duty-cycled sensors: random & coordinated sleep algorithms. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pp. 433–442. Cited by: §1.
  • [12] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi (1983) Optimization by simulated annealing.. Science 220 4598, pp. 671–80. Cited by: §1.
  • [13] D. P. Kroese, T. Taimre, and Z. I. Botev (2011) Handbook of monte carlo methods. Cited by: §1, §2.5.
  • [14] R. T. Maheswaran, J. P. Pearce, and M. Tambe (2004) Distributed algorithms for dcop: a graphical-game-based approach.. In ISCA PDCS, pp. 432–439. Cited by: §1.
  • [15] R. T. Maheswaran, M. Tambe, E. Bowring, J. P. Pearce, and P. Varakantham (2004) Taking DCOP to the real world: efficient complete solutions for distributed multi-event scheduling. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004., pp. 310–317. Cited by: §1.
  • [16] P. J. Modi, W. Shen, M. Tambe, and M. Yokoo (2005) ADOPT: asynchronous distributed constraint optimization with quality guarantees. Artificial Intelligence 161 (1-2), pp. 149–180. Cited by: §2.1.
  • [17] S. Okamoto, R. Zivan, A. Nahon, et al. (2016) Distributed breakout: beyond satisfaction.. In IJCAI, pp. 447–453. Cited by: §1.
  • [18] A. Petcu and B. Faltings (2005) A scalable method for multiagent constraint optimization. In IJCAI, Cited by: §5.
  • [19] R. A. Rutenbar (1989) Simulated annealing algorithms: an overview. IEEE Circuits and Devices magazine 5 (1), pp. 19–26. Cited by: §1.
  • [20] R. Stranders, A. Farinelli, A. Rogers, and R. Jennings (2009) Decentralised coordination of continuously valued control parameters using the max-sum algorithm. In Proceedings of the 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pp. 601–608. Cited by: §1.
  • [21] R. Stranders (2010) Decentralised coordination of information gathering agents. Cited by: §3.
  • [22] N. Thien, W. Yeoh, H. Lau, and R. Zivan (2019-03) Distributed gibbs: a linear-space sampling-based dcop algorithm. Journal of Artificial Intelligence Research 64, pp. 705–748. External Links: Document Cited by: §1.
  • [23] T. Voice, R. Stranders, A. Rogers, and N. R. Jennings (2010) A hybrid continuous max-sum algorithm for decentralised coordination.. In Proceedings of the 19th European Conference on Artificial Intelligence, pp. 61–66. Cited by: §1.
  • [24] M. Yokoo, E. H. Durfee, T. Ishida, and K. Kuwabara (1998) The distributed constraint satisfaction problem: formalization and algorithms. IEEE Transactions on knowledge and data engineering 10 (5), pp. 673–685. Cited by: §1.
  • [25] W. Zhang, G. Wang, Z. Xing, and L. Wittenburg (2005) Distributed stochastic search and distributed breakout: properties, comparison and applications to constraint optimization problems in sensor networks. Artif. Intell. 161, pp. 55–87. Cited by: §1.
  • [26] R. Zivan, S. Okamoto, and H. Peled (2014) Explorative anytime local search for distributed constraint optimization. Artificial Intelligence 212, pp. 1–26. Cited by: §1.
  • [27] R. Zivan, S. Okamoto, and H. Peled (2014) Explorative anytime local search for distributed constraint optimization. Artif. Intell. 212, pp. 1–26. Cited by: §2.4.
  • [28] R. Zivan and H. Peled (2012) Max/min-sum distributed constraint optimization through value propagation on an alternating dag. In AAMAS, Cited by: §1.