Best-First Width Search for Multi Agent Privacy-preserving Planning

06/10/2019
by Alfonso E. Gerevini, et al.
University of Brescia

In multi-agent planning, preserving the agents' privacy has become an increasingly popular research topic. To preserve privacy, the agents jointly compute a plan that achieves mutual goals while keeping certain information private to the individual agents. Unfortunately, this can severely restrict the accuracy of the heuristic functions used while searching for solutions. It has recently been shown that, for centralized planning, the performance of goal-oriented search can be improved by combining it with width-based search; the combination of these techniques has been called best-first width search. In this paper, we investigate the use of best-first width search in the context of (decentralized) multi-agent privacy-preserving planning, addressing the challenges related to the agents' privacy and performance. In particular, we show that best-first width search is a very effective approach over several benchmark domains, even when the search is driven by heuristics that only roughly estimate the distance from goal states, computed without using the private information of other agents. An experimental study analyses the effectiveness of our techniques and compares them with the state of the art.


1 Introduction

In recent years, several frameworks for multi-agent (MA) planning have been proposed, e.g., [brafman2008one, nissim2014distributed, torreno2014fmap]. Most of them consider, in different ways, the agents’ privacy: some or all agents have private knowledge that cannot be communicated to other agents during the planning process and plan execution. This prevents the straightforward usage of most of the current powerful techniques developed for centralized (classical) planning, which are based on heuristic functions computed using the knowledge of all the involved agents.

For classical planning, it has been shown that width-based search algorithms can solve instances of many existing domains in low polynomial time when they feature atomic goals. Width-based search relies on the notion of “novelty”. The novelty of a state s was originally defined as the size of the smallest tuple of facts that holds in s for the first time in the search, considering all previously generated states [LipovetzkyG12]. Width-based search methods are pure exploration methods that are not goal-oriented. For computing plans that are not necessarily optimal, the performance of goal-oriented search can be improved by combining it with width-based search. The combination yields a search procedure, called best-first width search, that outperforms state-of-the-art planners even when the estimate of the distance to the problem goals is inaccurate [DBLP:conf/aaai/LipovetzkyG17]. The heuristic used to guide such a procedure exploits the knowledge of the whole problem specification.

In the setting of MA planning, computing search heuristics using the knowledge of all the involved agents can require many exchanges of information among agents, and this may compromise the agents’ privacy. To preserve the privacy of the involved agents, the distance to the problem goals is estimated using the knowledge of a single agent. However, this estimate is much more inaccurate than the one obtainable using the knowledge of all the agents. Since for classical planning best-first width search performs very well even when the estimate of the goal distance is inaccurate, such a procedure is a good candidate for effectively solving MA-planning problems without compromising the agents’ privacy. The contribution of this paper is an investigation of best-first width search for decentralized privacy-preserving MA planning. Specifically, we propose a new search procedure, MA-BFWS, which uses width-based exploration in the form of novelty-based preferences to complement goal-directed heuristic search.

To preserve the privacy of the involved agents, the private knowledge shared among agents is encrypted. An agent shares with the other agents a description of its search states in which all of its private facts that are true in a state are substituted with a string code obtained by encrypting all these private fact names together. This encryption has an impact on the measure of novelty, and hence it also affects the definition of the heuristics guiding the search.

We adapt the definition of classical width [LipovetzkyG12] to MA planning, and we propose a definition of state novelty for which the search can be complete when states are pruned only if their novelty is greater than the width of the problem. Then, we define several heuristics for which the preferred states in the open list are the ones with the smallest novelty and, among those, the ones with the lowest goal distance. For this purpose, we also define the novelty in a different way, taking into account the heuristics used to estimate the goal distance [DBLP:conf/aaai/LipovetzkyG17].

Finally, an experimental study evaluates the effectiveness of the proposed heuristics, and compares the proposed approach with state-of-the-art planners that preserve the agents’ privacy in a form weaker than ours, showing that best-first width search is competitive also for MA planning.

2 Background

The MA-STRIPS planning problem.

Our work relies on MA-STRIPS, a “minimalistic” extension of the STRIPS language to MA planning [brafman2008one], which is the basis of the most popular definition of MA-planning problems (see, e.g., [nissim2014distributed, maliah2016stronger]).

Definition 1

A MA-STRIPS planning problem for a set of agents {α_1, …, α_n} is a 4-tuple Π = ⟨{A_i}_{i=1}^{n}, P, I, G⟩ where:

  • A_i is the set of actions agent α_i is capable of executing, with A_i ∩ A_j = ∅ for every pair of distinct agents α_i and α_j;

  • P is a finite set of propositions;

  • I ⊆ P is the initial state;

  • G ⊆ P is the set of goals.

Each action a consists of a name, a set of preconditions, pre(a), representing the facts required to be true for the execution of the action, a set of additive effects, add(a), representing the facts that the action makes true, a set of deleting effects, del(a), representing the facts that the action makes false, and a real number, cost(a), representing the cost of the action. A fact is private for an agent if the other agents can neither achieve, destroy, nor require the fact [brafman2008one]; a fact is public otherwise. An action is private if all its preconditions and effects are private; the action is public otherwise. A state obtained by executing a public action is said to be public; otherwise, it is private.
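As an illustration, the following is a minimal C++ sketch of this action representation and of the privacy test (a simplified rendering of the definitions above, not the authors' implementation; facts are plain integers):

    // Minimal sketch of MA-STRIPS actions and the private/public distinction.
    #include <set>
    #include <string>

    struct Action {
        std::string name;
        std::set<int> pre;   // preconditions
        std::set<int> add;   // additive effects
        std::set<int> del;   // deleting effects
        double cost;         // action cost
    };

    // An action is private if all the facts it requires, adds, or deletes
    // are private to its owner; it is public otherwise.
    bool is_private_action(const Action& a, const std::set<int>& private_facts) {
        auto all_private = [&](const std::set<int>& facts) {
            for (int f : facts)
                if (!private_facts.count(f)) return false;
            return true;
        };
        return all_private(a.pre) && all_private(a.add) && all_private(a.del);
    }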

To maintain the agents’ privacy, the private knowledge shared among agents can be encrypted. An agent can share with the other agents a description of its search state in which each private fact that is true in the state is substituted with a string obtained by encrypting the fact name [Bonisoli2018]. This encryption of states does not reveal the names of the private facts of each agent to the other agents, but an agent can still detect the existence of a private fact of another agent α_j and monitor its truth value during search. This allows the other agents to infer the existence of private actions of α_j, as well as to infer their causal effects. Another way to share states containing private knowledge during the search is to substitute, for each agent α_i, all the private facts of α_i that are true in a state with a single string obtained by encrypting all these private fact names together [nissim2014distributed]. Such a string denotes a dummy private fact of α_i, which is treated by the other agents as a regular fact. The work presented in this paper uses this latter method for the encryption of states. With this method, the other agents can only infer the existence of a group of private facts of α_i, since the encrypted string contained in the states exchanged by α_i substitutes a group of an arbitrary number of private facts of α_i.
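The following C++ sketch illustrates this second encoding scheme under two simplifying assumptions of ours: facts are strings, and std::hash stands in for a real encryption of the concatenated private fact names.

    // Sketch of the state encoding sent to other agents: all of the sender's
    // private facts true in the state are replaced by one opaque string.
    #include <functional>
    #include <set>
    #include <string>

    struct EncodedState {
        std::set<std::string> public_facts;
        std::string dummy;  // single dummy fact standing for the private part
    };

    EncodedState encode(const std::set<std::string>& state,
                        const std::set<std::string>& my_private_facts) {
        EncodedState out;
        std::string private_part;
        for (const auto& f : state) {
            if (my_private_facts.count(f)) private_part += f + ";";
            else out.public_facts.insert(f);
        }
        // Distinct sets of true private facts yield distinct dummy facts,
        // while the individual fact names remain hidden.
        out.dummy = "enc:" + std::to_string(std::hash<std::string>{}(private_part));
        return out;
    }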

Width-based Search.

Pure width-based search algorithms are exploration algorithms that do not look at the goal at all. The simplest such algorithm is IW(1), which is a plain breadth-first search where newly generated states that do not make an atom true for the first time in the search are pruned. The algorithm IW(2) is similar, except that a state s is pruned when there are no atoms p and q such that the pair of atoms (p, q) is true in s and false in all the states generated before s.

IW(k) is a normal breadth-first search except that newly generated states are pruned when their “novelty” is greater than k, where the novelty of a state s is m iff there is a tuple of m atoms such that s is the first state in the search that makes all the atoms in the tuple true, with no tuple of smaller size having this property [LipovetzkyG12]. While simple, it has been shown that IW manages to solve arbitrary instances of many of the standard benchmark domains in low polynomial time provided that the goal is a single atom. Such domains can be shown to have a small and bounded width w that does not depend on the instance size, which implies that they can be solved (optimally) by running IW(w). Moreover, IW(k) runs in time and space that are exponential in k and not in the number of problem variables.
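As a toy illustration of the pruning rule, the following C++ sketch implements an IW(1)-style breadth-first search over states represented as vectors of atom truth values (our simplification; the successor function is left abstract):

    // IW(1) sketch: prune every state that makes no atom true for the first time.
    #include <deque>
    #include <functional>
    #include <vector>

    using State = std::vector<bool>;  // truth value of each atom

    void iw1(const State& init,
             const std::function<std::vector<State>(const State&)>& successors) {
        std::vector<bool> seen_atom(init.size(), false);
        std::deque<State> open{init};
        while (!open.empty()) {
            State s = open.front();
            open.pop_front();
            bool novel = false;
            for (std::size_t p = 0; p < s.size(); ++p)
                if (s[p] && !seen_atom[p]) { seen_atom[p] = true; novel = true; }
            if (!novel) continue;                  // novelty > 1: prune
            for (auto& t : successors(s)) open.push_back(t);
        }
    }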

The procedure IW, which calls IW(1), IW(2), … sequentially, has been used to solve instances featuring multiple (conjunctive) atomic goals in the context of Serialized IW (SIW), an algorithm that calls IW to achieve one atomic goal at a time [LipovetzkyG12].

Width-based exploration in the form of simple novelty-based preferences, instead of pruning, can provide an effective complement to goal-directed heuristic search without sacrificing completeness. Indeed, it has recently been shown that the combination of width-based search and heuristic search, called best-first width search (BFWS), yields a search scheme that is better than both and outperforms state-of-the-art planners [DBLP:conf/aaai/LipovetzkyG17].

3 Related Work

The MA-planning algorithm most similar to ours is MAFS [nissim2014distributed]. MAFS is a distributed best-first search in which each agent considers a separate search space. The existing work investigating the use of a distributed A* for partial-order MA planning shares our motivations concerning the preservation of the agents’ privacy [torreno2014fmap]. Differently from that approach, our MA-planning procedure searches in the space of world states, rather than in the space of partial plans, and it exchanges states among agents rather than partial plans.

Our work is also related to work on developing heuristics for MA planning. The work in [stolba2015comparison] studies the use of heuristic functions based on the heuristic of the well-known planner FF. Similarly, the work in [stolba2014relaxation] proposes a distributed algorithm computing a complete relaxed planning graph and, subsequently, extracting a relaxed plan from it. Differently, our heuristics combine the novelty measure of search states with a more inaccurate, but computationally cheaper, estimate of the cost required to achieve the problem goals.

Using width-based search for MA planning is not a novel idea. The work in [bazzotti2018iterative] studies the usage of Serialized IW, abbreviated as MA-SIW, in the setting of MA planning. The MA problem solved by MA-SIW is split into a sequence of episodes, where each episode is a subproblem solved by IW, returning a path to a state where one more problem goal has been achieved with respect to the last episode. Our approach does not split the MA problem into subproblems, but solves the whole problem at once, using novelty as a heuristic to guide the search.

Width-based search was also used for solving a classical planning problem obtained from the compilation of a MA-planning problem [Muise2015]. That work applies to centralized MA planning, while ours investigates the distributed MA-planning problem.

An important difference between our approach and the existing ones using heuristic search for distributed privacy-preserving MA planning (e.g., [maliah2016stronger, stolba2015admissible]) is that, with our approach, the public projection of public actions is not shared. Our conjecture is that, without sharing such a projection, it is more difficult to infer the private preconditions and effects of public actions, since the agents ignore their existence. While sharing the public projection of public actions may be useful to compute more accurate search heuristics, our approach is nevertheless competitive with the state-of-the-art planners.

4 Width-based Search for MA planning

A problem of SIW and MA-SIW is that they are incomplete. In this section, we propose another way of using width-based search for MA planning, which guarantees that a solution is found whenever the problem is solvable.

1  Procedure w-MA-BFWS(I, G, A_i, w, f)
2    OPEN ← {I}; MESSAGES ← ∅
3    while OPEN is not empty or MESSAGES is not empty do (*)
4      foreach state m in MESSAGES do
5        compute the novelty and f(m); move m from MESSAGES to OPEN
6      end foreach
7      s ← best state in OPEN according to f
8      if s satisfies G then /* Plan found */
9        return TraceBack(s)
10     end if
11     if s was generated by agent α_i and s is public then
12       send ⟨s, g(s)⟩ to all the other agents
13     end if
14     foreach a ∈ A_i s.t. pre(a) ⊆ s do /* Expand */
15       s' ← (s \ del(a)) ∪ add(a)
16       compute the novelty w(s') and f(s')
17       if w(s') ≤ w then
18         add s' to OPEN
19       end if
20     end foreach
21   end while
22   return failure

(*) More precisely, when the OPEN and MESSAGES lists of an agent become empty, the agent sends a special message to the other agents representing the fact that its own lists are empty; similarly, the agent sends another special message to the others when its own open list is not empty anymore. The algorithm terminates when the lists of all the agents are empty.

Algorithm 1: w-MA-BFWS, run by agent α_i from the initial state I to achieve goals G using only the set of actions A_i of α_i. The output is a single-agent solution plan for α_i, or failure. Parameter w is an upper bound on the novelty of expanded states, g is the accumulated cost, and f is the evaluation function used to sort the open list.

Algorithm 1 shows a search procedure for an agent of the MA-planning problem combining width-based search and goal-directed search, which we call w-MA-BFWS (we write k-MA-BFWS for w-MA-BFWS with w = k). Parameter w is an upper bound on the novelty of the states that can be expanded, i.e., states with novelty greater than w are pruned from the search space. The version of the algorithm without this pruning is called MA-BFWS; in this version the novelty is used only as a preference criterion for ranking the search states in the open list.

Each agent considers a separate search space: each agent maintains its own list of open states, OPEN, and, when it expands an open state, it generates a set of successor states using its own actions. Moreover, each agent also maintains its own list of received messages to process, MESSAGES. Algorithm 1 assumes the presence of a separate thread listening for incoming messages sent by the other agents; each time a message is received, it is added to the end of the MESSAGES list.
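A minimal sketch of this arrangement between the listener thread and the search thread is shown below (our illustration; receive_state is a hypothetical blocking stub for the underlying transport):

    // Listener thread appends incoming states to MESSAGES; the search thread
    // drains the list at each iteration of Algorithm 1.
    #include <deque>
    #include <mutex>
    #include <thread>

    struct Message { /* encoded state, accumulated cost, sender id */ };

    std::deque<Message> MESSAGES;
    std::mutex mtx;

    Message receive_state();  // blocking receive from the other agents (stub)

    void listener() {
        for (;;) {
            Message m = receive_state();
            std::lock_guard<std::mutex> lock(mtx);
            MESSAGES.push_back(m);      // added to the end of the list
        }
    }

    std::deque<Message> drain_messages() {  // called by the search thread
        std::lock_guard<std::mutex> lock(mtx);
        std::deque<Message> batch;
        batch.swap(MESSAGES);
        return batch;
    }

    // Usage: std::thread t(listener); runs alongside the search loop.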

Agent α_i iteratively expands the states in the open list and those contained in the received messages. The main loop (steps 3–21) is repeated until the lists OPEN and MESSAGES are empty. Agent α_i extracts all the states in MESSAGES, computes their novelty according to the states generated or received by α_i so far, computes the given heuristic function f, and adds the states to the open list; then, α_i extracts the best state s from OPEN according to f (steps 4–7). We consider f as a sequence of arbitrary heuristics that are applied consecutively to break ties. Each time a state s is extracted from OPEN, α_i first checks whether the state satisfies the goals of the planning problem. If it does, agent α_i, together with the other agents, reconstructs the plan achieving s and returns its single-agent solution plan (steps 8–10). Once an agent expands a solution state s, procedure TraceBack reconstructs the solution plan: agent α_i begins the trace-back and, when it reaches a state received via a message, it sends a trace-back message to the agent who sent that message. This continues until the initial state is reached. The MA plan derived from the trace-back is a solution of the MA-planning problem. Finally, at step 9 Algorithm 1 returns the plan output by TraceBack.
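The evaluation function f can be realized as a lexicographic comparator over the sequence of heuristics, as in the following C++ sketch (our illustration; the heuristic names in the usage comment are hypothetical):

    // f = <h1, ..., hn>: later heuristics only break ties left by earlier ones.
    #include <functional>
    #include <queue>
    #include <vector>

    struct Node { /* search state and bookkeeping */ };
    using Heuristic = std::function<double(const Node&)>;

    struct LexCompare {
        std::vector<Heuristic> hs;
        bool operator()(const Node& a, const Node& b) const {
            for (const auto& h : hs) {
                double ha = h(a), hb = h(b);
                if (ha != hb) return ha > hb;  // lower value = extracted first
            }
            return false;
        }
    };

    // OPEN sorted by f, e.g. f = <novelty, false_goals, h_ff>:
    // std::priority_queue<Node, std::vector<Node>, LexCompare>
    //     open(LexCompare{{novelty, false_goals, h_ff}});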

Then, agent α_i checks whether state s is the result of one of its own public actions and, in this case, it sends a message to all the other agents containing state s together with its accumulated cost g(s) from the initial state (steps 11–13). Finally, α_i expands state s by applying its executable actions and, for each successor state s' of s, it evaluates the novelty and the evaluation function f, and decides whether to add s' to its OPEN list according to the novelty of s' (steps 14–20).

Algorithm 1 prunes a state according to a novelty measure akin to the novelty heuristics of [KatzLMT17], but defined instead on the basis of the cost accumulated along the trajectory from the problem initial state to the state (steps 17–18).

Definition 2 (Accumulated Cost Novelty)

The novelty w(s) of a state s is the size of the smallest tuple of facts t in s that (1) is achieved for the first time during the search, or (2) is such that every other previously generated state where t is true has a longer path, i.e., a greater accumulated cost g.
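A sketch of how this accumulated-cost novelty can be computed for tuples of size up to 2, keeping a table with the best g-value seen so far for each tuple (our illustration; the list of true facts is assumed sorted, and a pair (p, p) encodes the singleton tuple {p}):

    #include <algorithm>
    #include <map>
    #include <utility>
    #include <vector>

    using Tuple = std::pair<int, int>;
    std::map<Tuple, double> best_g;   // lowest accumulated cost per seen tuple

    int novelty(const std::vector<int>& true_facts, double g) {
        int w = 3;  // stands for "greater than 2"
        for (std::size_t i = 0; i < true_facts.size(); ++i)
            for (std::size_t j = i; j < true_facts.size(); ++j) {
                auto it = best_g.find({true_facts[i], true_facts[j]});
                // Novel if the tuple is seen for the first time, or if every
                // earlier state containing it had a greater accumulated cost.
                if (it == best_g.end() || g < it->second)
                    w = std::min(w, i == j ? 1 : 2);
            }
        for (std::size_t i = 0; i < true_facts.size(); ++i)   // update table
            for (std::size_t j = i; j < true_facts.size(); ++j) {
                Tuple t{true_facts[i], true_facts[j]};
                auto it = best_g.find(t);
                if (it == best_g.end() || g < it->second) best_g[t] = g;
            }
        return w;
    }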

The accumulated cost of the states that are in the OPEN list of agent α_i at the same time can be very different, because the search does not necessarily extract from OPEN the state with the lowest accumulated cost, and OPEN may also contain states incoming from other agents, which visit different search spaces that might contain states with a much greater g-value.

To guarantee the agents’ privacy, the private knowledge contained in the search states exchanged among agents is encrypted, and the encryption affects the measure of novelty. E.g., consider states s1 = {p, q}, s2 = {p, r}, and s3 = {p, q, r}, where q and r are private facts of an agent different from α_i. Each distinct set of private facts is encrypted into a distinct string: with the encryption, the descriptions of these states for α_i are s1 = {p, •1}, s2 = {p, •2}, and s3 = {p, •3}, where •1, •2, and •3 encrypt {q}, {r}, and {q, r}, respectively. Assume that the order in which these states are processed by α_i is s1, s2, s3. Then, without the encryption, w(s3) = 2, since p, q, and r are each true in a previous state while the pair (q, r) is true for the first time; with the encryption, we have w(s3) = 1, because in s3 the special string •3 representing the encrypted facts is true for the first time in the search. This consequently affects the pruning of the search space: 1-MA-BFWS without the encryption prunes s3 from the search space, while 1-MA-BFWS encrypting the private facts does not prune s3.

Lemma 1

The novelty of a state computed over the set of previously generated encrypted states is lower than or equal to the novelty computed over the set of previously generated states without the encryption.

Proof. For simplicity, consider a MA-planning problem with only two agents α_i and α_j, and consider the computation of the novelty for α_i; the proof for problems with more than two agents is similar. Take a state s and a tuple t of facts such that, without the encryption, w(s) = |t|. With the encryption, we distinguish three cases. (1) Tuple t is formed only by public facts or private facts of α_i. In this case, since only the private facts of α_j are encrypted for α_i, the facts forming tuple t are the same as without the encryption, and hence even with the encryption w(s) = |t|. (2) Tuple t contains at least one private fact of agent α_j, and the set of private facts of α_j that are true in s is different from the set of private facts of α_j true in every previously generated state whose accumulated cost is lower than or equal to g(s). Then, with the encryption, this set is substituted by a new string. Such a string denotes a dummy fact that is false in all the previously generated states. Hence, by definition, with the encryption w(s) = 1, which of course is lower than or equal to |t|. (3) Tuple t contains at least one private fact of agent α_j, and the set of private facts of α_j that are true in s is the same as in a state s' that has been previously expanded with g(s') ≤ g(s). With the encryption, this set is substituted by a string e, which denotes a dummy fact that, in this case, is true in both s and s'. Therefore, the smallest tuple in s that is true for the first time in the search is formed by public facts or private facts of α_i, plus possibly e. By definition, the size of such a tuple is lower than or equal to |t|, since the private facts of α_j in t are replaced by the single dummy fact e.

The definition of width of [LipovetzkyG12] for the state model induced by STRIPS applies directly to the state model induced by MA-STRIPS.

Definition 3 (MA-STRIPS width)

The width w(Π) of a MA-STRIPS planning problem Π is w if there is a sequence of tuples of facts t_0, t_1, …, t_m such that (1) t_0 is true in the initial state I, (2) |t_j| ≤ w for j = 0, …, m, (3) all optimal plans for t_j can be extended into optimal plans for t_{j+1} by adding a single action, and (4) all optimal plans for t_m are also optimal plans for Π.

In the definition above, some of the actions that extend optimal plans for a tuple t_j into optimal plans for tuple t_{j+1} can be private. w-MA-BFWS does not send states generated by private actions to other agents. In the following theorem, note that the novelty of a state s is computed with respect to the search space of the agent α_i that generated s; the search space of α_i includes the states it generated as well as the states received from other agents.

Theorem 1

w-MA-BFWS with w = w(Π) is complete for problems Π of width w(Π) when the action cost function is strictly positive.

Proof Sketch. The definition of width (Def. 3) implies that there is an optimal plan where every state along the plan has novelty at most w(Π) (Def. 2), inducing a sequence of tuples that complies with the conditions in Definition 3. If w = w(Π), by Definition 2, for each state s along such a plan there must exist at least one tuple t with |t| ≤ w(Π) such that no other state s' with t true in s' and g(s') ≤ g(s) can be generated. w-MA-BFWS with w = w(Π) is thus guaranteed to generate each state along the plan, no matter the order in which states are generated, as long as, by assumption, zero-cost actions are not allowed. The state expansion order is determined by breaking ties through the sequence of search heuristics in f. No heuristic causes w-MA-BFWS to prune nodes, even when the estimated goal distance is infinite, as an agent does not have access to the private actions of the other agents and the heuristic cannot be proved to be safe. When a state in an optimal plan has been generated by another agent, its private facts are encrypted; given Lemma 1, the novelty of such states is guaranteed to be lower than or equal to the novelty without the encryption, hence they are not pruned by w-MA-BFWS.

Theorem 2

MA-BFWS is complete.

Proof. MA-BFWS does not prune the search space according to the novelty of search states. In MA-BFWS, each agent α_i expands all the search states reachable from the problem initial state except the private states of agents different from α_i. This is the same set of search states expanded by MAFS. Since MAFS is a complete search procedure [nissim2014distributed], MA-BFWS is complete as well.

Theorem 3

Let P_pub be the set of public facts, and P_i the set of private facts of agent α_i, so that the total number of facts relevant for α_i is N = |P_pub| + |P_i|. Let D be the number of encrypted strings denoting dummy fluents that can be sent to agent α_i. w-MA-BFWS using heuristic f terminates after expanding at most

  1. O(n · (N+D)^w) states if action costs are 0,

  2. O(n · (N+D)^{2w}) states if action costs are 1,

  3. O(n · |A| · (N+D)^{2w}) states when the cost function is θ : A → R≥0,

where n is the number of agents, and |A| is the number of available actions for all the agents.

Proof. Let M = N + D be the number of possible state facts for an agent of the MA problem. We distinguish three cases. (1) When all action costs are one, consider the longest path an agent can expand. For the last state of the path to be expanded, every state along the path needs to have novelty at most w; therefore, each such state either makes a tuple of size at most w true for the first time, or achieves a tuple t with a lower g-value than every previously generated state where t is true. A path stops being expanded as soon as its last state has novelty greater than w. Since g grows monotonically along a path, the g-value of a tuple cannot be improved more than once along the same path, and once the last state of a path is expanded, all its tuples have been generated with smaller g-values. Therefore, each tuple can let at most O(M^w) states be expanded with novelty at most w and, given that there are at most O(M^w) tuples of size at most w, in total an agent can expand O(M^{2w}) states. In the worst case, each agent can expand its state space independently, yielding the overall bound O(n · M^{2w}). (2) In case all action costs are zero, the g-value can never be improved once a tuple has been made true by a state; therefore, each tuple can let just one state be expanded with novelty at most w, and the total number of expanded states is at most O(n · M^w). (3) If the cost function maps actions to non-negative real numbers, then the g-value of each tuple t can be improved once for each different cost of the actions that add t, which in the worst case multiplies the bound of the unit-cost case by |A|; therefore, the total number of states that can be expanded is O(n · |A| · M^{2w}).

Domain    #Instances   1-MA-BFWS   2-MA-BFWS   MA-BFWS(hFF)
—             214       100.0%      100.0%      100.0%
—             155        85.81%      91.61%     100.0%
—             185        95.68%     100.0%      100.0%
—             255        82.75%      66.27%      99.22%
—             172         0.0%       93.6%      100.0%
—             277        98.92%     100.0%       99.28%
—             488        20.9%       98.36%     100.0%
—              61        54.1%       93.44%      98.36%
—              95        90.53%      98.95%      98.95%
—             160        51.88%      36.88%      58.75%
—            1084        98.89%      99.17%      97.05%
—             258        99.22%      79.84%     100.0%
Overall      3404        77.59%      91.63%      96.94%
Table 1: Number of instances per domain, and coverage of 1-MA-BFWS and 2-MA-BFWS, guided by novelty and accumulated cost, w.r.t. MA-BFWS guided by hFF computed by each agent using its own actions, for problem instances with a single goal.
Corollary 1

Let w_max be the maximum novelty of a state expanded by MA-BFWS once a plan has been found and MA-BFWS terminates. Then, the number of expanded states is bounded by the complexity of w-MA-BFWS with w = w_max.

If a problem is solved by 1-MA-BFWS, it does not follow that the problem has width 1; rather, this gives a lower bound, much like the notion of effective width discussed by [LipovetzkyG12]. Still, it provides an estimate of how hard it is to solve a MA-planning problem. Table 1 shows that, even in the MA setting, for single-atom goals all domains but one generally have width lower than or equal to 2. For this analysis, we considered the domains from the distributed track of the first international competition on distributed and multi-agent planning. For each instance with k goal atoms, we created k instances with a single goal, and ran 1-MA-BFWS and 2-MA-BFWS over each one of them; the total number of instances is 3404. The search heuristic used for 1-MA-BFWS and 2-MA-BFWS is very simple: the best state in the open list is selected among those with the lowest novelty measure w, breaking ties with the accumulated cost g. For each domain, we show the total number of single-goal instances and the percentage of instances solved with width equal to 1 and lower than or equal to 2. We considered unit action costs; therefore, by Theorem 3, the table shows that 77.59% of the single-goal problems can be solved in time quadratic in the number of propositions of the problem. These blind and bounded planners perform well with respect to a baseline goal-directed heuristic search planner, MA-BFWS guided by the same hFF heuristic used by planner FF but computed by each agent using only its own actions. However, problems with multiple goals in general have a higher width; in the next section we explore how to scale up to multiple goals.

5 Novelty-based heuristics

In MA planning, agents have private information that they do not want to share with the others. A heuristic computed using only the knowledge of one single agent can be much more inaccurate than one using the knowledge of all the agents. In this section, we propose some search heuristics for privacy-preserving MA planning that combine the measure of the novelty of search states with the estimated distance to the problem goals, where the goal distance is estimated using the knowledge of a single agent. Our conjecture is that, in the MA setting, width-based exploration in the form of novelty-based preferences can complement goal-directed heuristic search, so that the search can be effectively guided towards a goal state even if the goal-directed heuristics are inaccurate.

The computation and memory cost of determining that the novelty of a state is m is exponential in m, since all the tuples of size up to m may have to be stored and considered. For efficiency, we simplify the computation of novelty to only 3 levels, i.e., the novelty of a state is determined to be equal to 1, equal to 2, or greater than 2.

For our heuristic functions, we use the measure of novelty introduced by [DBLP:conf/aaai/LipovetzkyG17]. Given arbitrary functions h_1, …, h_n, the novelty w(s) of a newly generated state s is m iff there is a tuple of m atoms, and no tuple of smaller size, that is true in s but false in all the previously generated states with the same values of h_1, …, h_n. For example, a new state s has novelty 1 if there is an atom that is true in s and false in all the states s' generated before s with h_i(s') = h_i(s) for all i. In the rest of the paper, the novelty measure is sometimes denoted as w_{h_1,…,h_n} in order to make explicit the functions used in its definition and computation.
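A sketch of this partitioned novelty measure, restricted to the three levels used in the paper (our illustration; facts are sorted integers, and the partition key collects the values h_1(s), …, h_n(s)):

    #include <map>
    #include <set>
    #include <utility>
    #include <vector>

    using Key = std::vector<double>;  // (h1(s), ..., hn(s))
    std::map<Key, std::set<int>> seen_atoms;                  // per partition
    std::map<Key, std::set<std::pair<int, int>>> seen_pairs;  // per partition

    int novelty(const std::vector<int>& facts, const Key& k) {
        int w = 3;  // stands for "greater than 2"
        auto& atoms = seen_atoms[k];
        auto& pairs = seen_pairs[k];
        for (std::size_t i = 0; i < facts.size(); ++i) {
            if (atoms.insert(facts[i]).second) w = 1;  // new atom in partition
            for (std::size_t j = i + 1; j < facts.size(); ++j)
                if (pairs.insert({facts[i], facts[j]}).second && w > 2) w = 2;
        }
        return w;
    }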

The first heuristic we study is f = ⟨w, hFF⟩, where component hFF denotes the goal-directed heuristic used by planner FF. The goal distance of an agent α_i from a search state s is estimated as the number of actions of α_i in a relaxed plan constructed from s to achieve the problem goals. The plan is relaxed because it is a solution of a relaxed problem in which the negative action effects are removed. Substantially, the best state in OPEN according to f is not selected among those with the lowest estimated goal distance, but among those with the lowest novelty measure w, and heuristic hFF is only used to break ties. The same heuristic was proposed for classical planning, obtaining surprisingly good results [DBLP:conf/aaai/LipovetzkyG17]. The difference with respect to classical planning is that, for MA planning, the estimated distance to the problem goals is much more inaccurate: for an agent α_i, the relaxed plan is extracted using only the actions of α_i. When an agent evaluates the search states by using only its own set of actions, it is possible that at least one of the goals is evaluated as unreachable. In this case, the extraction of the relaxed plan fails, and the estimated distance is evaluated as infinite. This is due to the agent not being able to solve the problem alone, needing to cooperate with other agents.

We also consider other types of information for the definition of the search heuristic, in order to overcome the inaccuracy of goal-directed heuristics computed using only the knowledge of a single agent. In the following, given a search state s, #g(s) and #u(s) denote the number of problem goals that are false in s and the number of problem goals that are unreachable from s, respectively. For an agent α_i, the number of goals unreachable from s is estimated by constructing, with the actions of α_i, a relaxed planning graph (RPG) from s; the goals that are not contained in the last level of the RPG are considered unreachable.
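Both counters can be computed with a delete-relaxed reachability fixpoint over the agent's own actions, as in the following sketch (our illustration):

    #include <set>
    #include <vector>

    struct Action {
        std::set<int> pre, add;  // delete effects are ignored in the relaxation
    };

    // #g(s): goals false in s.
    int count_false_goals(const std::set<int>& s, const std::set<int>& goals) {
        int n = 0;
        for (int g : goals)
            if (!s.count(g)) ++n;
        return n;
    }

    // #u(s): goals not reached in the RPG built from s with the agent's actions.
    int count_unreachable_goals(std::set<int> reached,  // initially the facts of s
                                const std::vector<Action>& acts,
                                const std::set<int>& goals) {
        bool changed = true;
        while (changed) {  // relaxed reachability fixpoint
            changed = false;
            for (const auto& a : acts) {
                bool applicable = true;
                for (int p : a.pre)
                    if (!reached.count(p)) { applicable = false; break; }
                if (!applicable) continue;
                for (int e : a.add)
                    if (reached.insert(e).second) changed = true;
            }
        }
        return count_false_goals(reached, goals);
    }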

Planner MA-BFWS with heuristic function f = ⟨w, #g, hFF⟩ selects the next state to expand among those in OPEN with the lowest novelty measure w, breaking ties according to the number #g of goals that are false in the state. Heuristic hFF is finally used to break the ties when there is more than one state in OPEN with the same lowest measure of novelty and the same lowest number of false goals.

Similarly, MA-BFWS with heuristic function f = ⟨w, #u, #g, hFF⟩ selects the next state to expand among those in OPEN with the lowest novelty measure w, breaking ties according to the number #u of goals that are unreachable from the state. If there is more than one state in OPEN with the same lowest measure of novelty and the same lowest number of unreachable goals, the ties are broken according to the number of goals false in the state. Finally, heuristic hFF is used only if there are still ties to break.

The drawback of hFF for MA planning is that the estimated goal distance from a search state s is often infinite, even though s is not a dead-end. As stated before, the reason for this is that from s the planning problem cannot be solved by an agent alone. With the next search heuristic, we study a method to overcome this problem, in which an agent extracts a relaxed plan from s to the (sub)set of problem goals that are reachable from s. The estimated distance from s to all the problem goals is the number of actions in the relaxed plan plus the number of problem goals unreachable from s multiplied by a constant; in our experiments, such a constant is equal to the maximum number of levels in the RPGs constructed so far. This variant of hFF is denoted as ĥFF. Essentially, the information about the unreachable goals is used to refine the estimated goal distance. We report experiments with MA-BFWS using f = ⟨w, #u, #g, ĥFF⟩, i.e., the function obtained from ⟨w, #u, #g, hFF⟩ by using ĥFF in place of hFF as the goal-directed component of the evaluation function.
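With our own notation |π_i^rel(s)| for the length of the relaxed plan of agent α_i from s to the reachable goals, and C for the constant, the refined estimate described above reads:

    \hat{h}^{FF}(s) \;=\; |\pi_i^{rel}(s)| \;+\; C \cdot \#u(s),
    \qquad C = \text{max. number of levels among the RPGs built so far}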

Components #u and ĥFF of the heuristic are computationally expensive, since for each expanded state #u requires the construction of an RPG, and ĥFF additionally requires the extraction of a relaxed plan from the RPG. The last two heuristics we study consider the tradeoff between the accuracy of the estimated goal distance and its computational cost. For this, the construction of the RPG and the extraction of the relaxed plan are not performed for each expanded state, but only for the initial state of the planning problem and for the search states incoming from other agents. The facts that are preconditions of the actions in the relaxed plan are called relevant. Let s' be the last incoming state on the way to state s for which a relaxed plan was extracted. For evaluating the goal distance of state s, we count the number of relevant facts that have not been made true on the way from s' to s; we denote this counter by #r. This measure is similar to the one proposed by [DBLP:conf/aaai/LipovetzkyG17] for classical planning. The difference with respect to classical planning is that a relaxed plan is extracted for each incoming state, instead of for the states that decrement the number of achieved problem goals in relation to their parent. This is needed because the relevant fluents are not sent among agents, in order to avoid compromising privacy. Planner MA-BFWS with f = ⟨w, #g, #r⟩ considers counter #r in place of the more computationally expensive components #u and ĥFF.

The drawback of heuristic ⟨w, #g, #r⟩ is that, when the number of exchanged messages is high, it still requires constructing the RPG many times. The construction of the RPG is computationally much more expensive than extracting the relaxed plan and, when it is performed many times, it can become the bottleneck of the search procedure. Thereby, we propose another heuristic which, for each agent α_i, requires the construction of the RPG only from the initial state of the problem. With the aim of maintaining the agents’ privacy, the RPG is still constructed by using only the actions of a single agent. Nevertheless, when the MA-planning problem cannot be solved by an agent alone, the last level of the RPG does not contain all the problem goals. For this reason, the construction of the RPG from the initial state is special, and is done in two steps. The first step is the ordinary construction of the RPG from the initial state. Then, in the second step, the preconditions of actions of agent α_i that are not added by actions of α_i and are not true in the last level of the RPG are made true in the last level of the RPG. Finally, the construction of the RPG continues from the last level constructed so far.
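A sketch of the two-step construction (our illustration; relaxed_fixpoint is the delete-relaxed reachability closure from the previous sketch):

    #include <set>
    #include <vector>

    struct Action { std::set<int> pre, add; };

    // Delete-relaxed reachability closure, as sketched earlier.
    std::set<int> relaxed_fixpoint(std::set<int> reached,
                                   const std::vector<Action>& acts);

    std::set<int> two_step_rpg(const std::set<int>& I,
                               const std::vector<Action>& my_actions) {
        // Step 1: ordinary RPG over the agent's own actions.
        std::set<int> level = relaxed_fixpoint(I, my_actions);
        // Step 2: preconditions that no own action adds and that are still
        // unreached are assumed achievable by the other agents.
        std::set<int> own_add;
        for (const auto& a : my_actions)
            own_add.insert(a.add.begin(), a.add.end());
        for (const auto& a : my_actions)
            for (int p : a.pre)
                if (!level.count(p) && !own_add.count(p)) level.insert(p);
        // Continue the construction from the extended last level.
        return relaxed_fixpoint(level, my_actions);
    }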

Consider a state s to be expanded. For this last heuristic, ⟨w, #g, #r0⟩, the counter #r0 is defined as the number of relevant facts in the RPG constructed from the problem initial state that have not been made true on the way from the initial state to s. The computation of #r0 differs from that of #r, because an agent alone can reconstruct only the portion of the trajectory from the last incoming state s' to s, and cannot reconstruct the trajectory from the initial state to s'. The trajectory from the initial state to s' can contain other actions of agent α_i that should be taken into account in the definition of the set of relevant facts that have not been made true on the way to s. For this, the presence of these actions of α_i is estimated by solving a super-relaxed planning problem, i.e., a planning problem with the same initial state as the original problem, the set of facts that are true in s' as goals, and a set of actions obtained from the actions of α_i by ignoring the action preconditions that are unreachable from the initial state, as well as the negative action effects. The procedure for the extraction of the super-relaxed plan is similar to the one used by FF. The positive effects of the actions in such a super-relaxed plan are facts that we estimate have been made true on the way from the initial state to s'. Therefore, for ⟨w, #g, #r0⟩ we define the counter #r0 as the number of relevant facts that have been made true neither in the super-relaxed plan from the initial state to s' nor on the way from s' to s.

Domain        hFF   ⟨w,hFF⟩  ⟨w,#g,hFF⟩  ⟨w,#u,#g,hFF⟩  ⟨w,#u,#g,ĥFF⟩  ⟨w,#g,#r⟩  ⟨w,#g,#r0⟩
From CoDMAP
—              19      20        20           20             20            20          20
—               5       6        19           19             20            20          20
—              18      20        20           20             20            20          20
—               3       3        20           20             20            20          20
—               3       3        20           20             20            20          20
—              14      17        20           20             20            20          20
—              18      19        20           20             20            20          20
—              18      18        18           18             18            18          14
—              20      19        19           20             20            20          20
—               2       2         2            2              2             2           2
—               3       3        10           15             15            11          12
—              20      20        20           20             20            20          20
From MBS
—               0       0        11           11             12             2          19
—               0       0         0            0              1             0          15
—               0       0        20           20             20            20          19
—               0       0        20           20             20            20          19
Overall (320) 143     150       259          265            268           253         280
Table 2: Number of problems solved by MA-BFWS with the seven different heuristics for the benchmark problems of CoDMAP and MBS. The best performance is in bold.

6 Experiments

In this section, we present an experimental study aimed at testing the effectiveness of the heuristics described so far. First, we describe the experimental settings; then we evaluate the effectiveness of our heuristics; finally, we compare the performance of our approach with the state of the art.

Our code is written in C++, and exploits the Nanomsg open-source library to exchange messages [nanomsg]. Each agent uses three threads: two send and receive messages, while the third conducts the search, so that the search is asynchronous w.r.t. the communication routines. The behavior of MA-BFWS depends on the order in which messages are received by an agent. Each time a run of MA-BFWS is repeated, the agents’ threads can be scheduled differently by the operating system, so the behavior of MA-BFWS can also differ. Thereby, for each problem of our benchmark, we ran MA-BFWS five times and consider the performance of the algorithm as the median over the five runs. When MA-BFWS exceeded the CPU-time limit in more than two of the five runs, we consider the problem unsolved.

The benchmark used in our experiments includes the twelve domains proposed by [vstolba2015competition] for the distributed track of the first international competition on distributed and MA planning (CoDMAP), and four domains derived by [MAFSB17]; in the following, these latter four domains are collectively abbreviated as MBS. The difference w.r.t. the CoDMAP domains they derive from is that, in the MBS domains, many private actions need to be executed between two consecutive public actions, and agents must choose among several paths for achieving goals. All domains have uniform action costs.

All tests were run on an InfiniBand cluster with 512 nodes and 128 GB of RAM each; each node has two 8-core Intel Xeon E5-2630 v3 processors at 2.40 GHz. Given a MA-planning problem, for each agent in the problem we limited the usage of resources to 3 CPU cores and 8 GB of RAM. Moreover, unless otherwise specified, the time limit was 5 minutes, after which the termination of all threads was forced.

Table 2 shows the number of problems solved by MA-BFWS using seven different heuristics for the benchmark problems of CoDMAP and MBS. Planner MA-BFWS with f = ⟨hFF⟩ is the baseline for our comparison, since it does not use novelty-based preferences to guide the search. For six out of the sixteen domains, MA-BFWS with hFF solves almost all the problems; these are the domains whose problems require less interaction among the agents.

MA-BFWS with ⟨w, hFF⟩ solves a few more problems than with hFF alone, and the domains where it performs well are the same as with hFF. Surprisingly, MA-BFWS with ⟨w, #g, hFF⟩ solves many more problems than with hFF and ⟨w, hFF⟩. The main difference between ⟨w, hFF⟩ and ⟨w, #g, hFF⟩ is that the novelty-based exploration gives preference according to the number #g of unachieved goals, which clearly results in a positive interplay with the search procedure. Interestingly, MA-BFWS with ⟨w, #g, hFF⟩ solves many problems of CoDMAP domains requiring a greater interaction among different agents, as well as several problems from benchmark MBS.

MA-BFWS with ⟨w, #u, #g, hFF⟩ or ⟨w, #u, #g, ĥFF⟩ solves a few more problems than with ⟨w, #g, hFF⟩, showing that the information about the number of goals unreachable from the search states can be useful. Heuristic ⟨w, #g, #r⟩ is computationally less expensive than those using #u and ĥFF, but its goal-directed component is less accurate, and the results in Table 2 show that this tradeoff between computational cost and accuracy does not pay off: MA-BFWS with ⟨w, #g, #r⟩ is better than with hFF and ⟨w, hFF⟩, but solves fewer problems than with ⟨w, #g, hFF⟩, ⟨w, #u, #g, hFF⟩, and ⟨w, #u, #g, ĥFF⟩. The reason for this behavior is that, when the number of incoming messages is high, heuristic ⟨w, #g, #r⟩ is still computationally quite expensive.

Figure 1: Coverage as a function of time for MA-BFWS using the seven heuristics over benchmarks CoDMAP and MBS.

The cheapest heuristic function to compute is ⟨w, #g, #r0⟩: the most expensive step in the computation of our heuristics is the RPG construction, and #r0 constructs the RPG only once. The results in Table 2 indicate that ⟨w, #g, #r0⟩ is a good tradeoff between accuracy and computational cost, since MA-BFWS with ⟨w, #g, #r0⟩ solves the largest set of problems. It solves several problems even for an MBS domain whose problems are unsolved with almost any other heuristic function.

Figure 1 shows the coverage of MA-BFWS with the seven heuristic functions using a time limit ranging from 0 to 300 seconds. For small time limits (up to about 25 seconds), the best-performing heuristic differs; with a time limit between 25 and 300 seconds, MA-BFWS with heuristic ⟨w, #g, #r0⟩ solves the largest set of problems. Interestingly, the coverages obtained using a time limit of 150 seconds are substantially the same as with 300 seconds.

Table 3 shows the performance of MA-BFWS using the proposed search heuristics in terms of average time, plan length, number of exchanged messages, number of expanded states, time score, and quality score. The averages are computed over the problems solved by all the compared heuristics; the time and quality scores are the measures originally proposed for the seventh international planning competition [IPC7]. MA-BFWS with ⟨w, #u, #g, ĥFF⟩ is on average the fastest, and its average numbers of exchanged messages and expanded states are accordingly the lowest, followed closely by ⟨w, #u, #g, hFF⟩. Remarkably, the average numbers of exchanged messages and expanded states of MA-BFWS with hFF and ⟨w, hFF⟩ are almost two orders of magnitude greater than with the other heuristics.

Metric     hFF    ⟨w,hFF⟩  ⟨w,#g,hFF⟩  ⟨w,#u,#g,hFF⟩  ⟨w,#u,#g,ĥFF⟩  ⟨w,#g,#r⟩  ⟨w,#g,#r0⟩
Avg.T      8.62    6.36      1.69         1.57           1.55          2.14        3.51
Avg.L     61.4    55.9      61.4         60.3           61.0          92.8        67.0
kMess   1749.7  1433.8      25.9         25.0           24.8          46.8        97.4
kStates  952.8   779.6      17.9         16.7           16.2          34.3        82.1
Score Q  122.6   134.4     211.1        217.5          213.1         176.1       220.8
Score T  121.9   124.9     223.9        231.7          228.1         220.0       233.4
Table 3: Average time (seconds), average plan length, number of exchanged messages (in thousands), number of expanded states (in thousands), and quality and time scores of MA-BFWS with the seven heuristics for benchmarks CoDMAP and MBS.

The limits of our approach are inherited from width-based algorithms. Width-based algorithms such as IW perform poorly on problems with high width. Variants such as SIW and BFWS try to mitigate the high width of the problems by using serialization or heuristics. When novelty is used for pruning, the algorithms may become incomplete; if novelty is used as a preference, completeness is not compromised. In general, novelty helps if the paths to the goal have low width, while problems that require reaching states with high width remain more challenging.

Domain          MAFS   MA-BFWS   PSM   MAPlan
—                20       20      20     20
—                 8       20      12     17
—                20       20      16     20
—                20       20       8     12
—                18       20      18     18
—                20       20      20     19
—                20       20      20     13
—                 4       17      17     16
—                20       20      20     20
—                 0        2       4      0
—                 1       14      15     18
—                20       20      20     10
Overall (240)   171      213     190    184
Table 4: Number of problems solved by MAFS, by MA-BFWS with heuristic ⟨w, #g, #r0⟩, and by PSM and MAPlan for benchmark CoDMAP. The best performance is in bold.

Finally, we compared our approach with three existing approaches: MAFS, which is the approach most closely related to our work, and the best configurations of PSM and MAPlan, the two best planners that took part in the CoDMAP competition [vstolba2015competition]. Table 4 shows the results of this comparison for the CoDMAP domains. As for benchmark MBS, MAFS solves no problem, while PSM and MAPlan do not support private goals, which are present in these problems. The time limit used for this comparison is 30 minutes, the same limit used in the competition.

The results in Table 4 show that, for the competition problems, MA-BFWS outperforms MAFS and is better than PSM and MAPlan. Another planner we experimented with is GPPP [maliah2016stronger] which, to the best of our knowledge, is the state of the art for the CoDMAP problems; we observed that, in our test environment, MA-BFWS solves a few more problems than GPPP. Remarkably, the only type of information that the agents share using our approach is the exchanged search states, while the other planners also require sharing information for the computation of the search heuristics. In this sense, besides solving more problems, MA-BFWS preserves the agents’ privacy more strongly than the other planners.

7 Conclusions

Goal-directed search is the main computational approach that has been investigated in classical planning and, subsequently, in MA planning. For classical planning, width-based exploration in the form of novelty-based preferences provides an effective complement to goal-directed search.

In our setting for MA planning, in order to preserve privacy, we do not transmit the public projection of the actions, and hence the proposed goal-directed heuristics are not as informed as in classical planning. Moreover, the encryption of the private knowledge that the agents share during the search affects the measure of novelty. Nevertheless, this work shows that the combination of efficiently computed goal-directed heuristics and width-based search is effective for MA planning as well. This opens up the possibility of increasing the privacy-preserving properties of MA-planning algorithms. For instance, given the success of black-box planning for single agents [frances2017purely], we plan to investigate the implications of fully protected models given as black boxes, and the effect of novelty pruning in terms of sent messages and privacy.

Acknowledgements

This research was carried out with the support of resources of the National Collaborative Research Infrastructure Strategy (NeCTAR) and of the Big & Open Data Innovation Laboratory (BODaI-Lab) of the University of Brescia, which is granted by Fondazione Cariplo and Regione Lombardia. Nir Lipovetzky has been partially funded by DST Group.

References