Introduction
Multi-Agent Path Finding (MAPF) is the problem of finding collision-free paths for a given number of agents from their given start locations to their given goal locations in a given environment. MAPF problems arise for aircraft-towing vehicles [Morris et al. 2016], office robots [Veloso et al. 2015], video game characters [Silver 2005] and warehouse robots [Wurman, D'Andrea, and Mountz 2008], among others.
Several recently developed MAPF solvers scale to large MAPF instances. However, agents typically cannot execute their MAPF plans perfectly since they often traverse their paths more slowly than intended. Their delay probabilities can be estimated, but current MAPF solvers do not use this information, which often leads to frequent and runtime-intensive replanning or plan-execution failures.
We thus formalize the MAPF Problem with Delay Probabilities (MAPF-DP), where each agent traverses edges on an undirected graph (that models the environment) to move from its start vertex to its goal vertex. At any discrete time step, the agent can execute either 1) a wait action, resulting in it staying in its current vertex, or 2) a move action with the intent of traversing an outgoing edge of its current vertex, resulting in it staying in its current vertex with the delay probability and traversing the edge otherwise. The MAPF-DP problem is the problem of finding 1) a MAPF-DP plan that consists of a path for each agent from its start vertex to its goal vertex (given by a sequence of wait and move actions) and 2) a plan-execution policy that controls with GO or STOP commands how each agent proceeds along its path such that no collisions occur during plan execution. There are two kinds of collisions, namely vertex collisions (where two agents occupy the same vertex at the same time step) and edge collisions (where two agents traverse the same edge in opposite directions at the same time step).
We make the following contributions to solve the MAPF-DP problem with small average makespans: First, we formalize the MAPF-DP problem, define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss two classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a two-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.
Background and Related Work
The MAPF problem is NP-hard to solve optimally for flowtime minimization and NP-hard to approximate within any constant factor less than 4/3 for makespan minimization [Ma et al. 2016]. Search-based MAPF solvers can be optimal, bounded-suboptimal or suboptimal [Standley 2010, Luna and Bekris 2011, Wang and Botea 2011, Goldenberg et al. 2014, Sharon et al. 2013, Sharon et al. 2015, Boyarski et al. 2015, Wagner and Choset 2015, Ma and Koenig 2016, Cohen et al. 2016]. Current MAPF solvers typically assume perfect plan execution. However, utilizing probabilistic information about imperfect plan execution can reduce frequent and time-intensive replanning and plan-execution failures.
Partially Observable Markov Decision Processes (POMDPs) are a general probabilistic planning framework. The MAPF-DP problem can be solved with POMDPs, but this is tractable only for very few agents in very small environments since the size of the state space is proportional to the size of the environment to the power of the number of agents and the size of the belief space is proportional to the size of the state space to the power of the length of the planning horizon [Kurniawati, Hsu, and Lee 2008, Ma and Pineau 2015]. Several specialized probabilistic planning frameworks, such as transition-independent decentralized Markov Decision Processes (Dec-MDPs) [Becker et al. 2004] and Multi-Agent Markov Decision Processes (MMDPs) [Boutilier 1996], can solve larger probabilistic planning problems than POMDPs. In transition-independent Dec-MDPs, the local state of each agent depends only on its previous local state and the action taken by it [Goldman and Zilberstein 2004]. MAPF-DP is indeed transition-independent. However, there are interactions among agents since the reward of each agent depends on whether it is involved in a collision and thus on the local states of other agents and the actions taken by them. Fully decentralized probabilistic planning frameworks thus cannot prevent collisions. Fully centralized probabilistic planning frameworks can prevent collisions but are more runtime-intensive and can thus scale poorly. For example, the MAPF-DP problem can be solved with transition-independent MMDPs [Scharpff et al. 2016]. In fact, the most closely related research to ours is that on approximating MMDPs [Liu and Michael 2016], although it handles different types of dynamics than we do. The runtime of probabilistic planning frameworks can be reduced by exploiting the problem structure, including when interactions among agents are sparse. For example, decentralized sparse-interaction Markov Decision Processes (Dec-SIMDPs) [Melo and Veloso 2011] assume that interactions among agents occur only in well-defined interaction areas in the environment (which is not the case for MAPF-DP in general), but typically still do not scale to more than 10 agents.
The model shaping technique for decentralized POMDPs [Velagapudi et al. 2011] can compute policies for hundreds of agents greedily and UM* [Wagner 2015] scales to larger numbers of agents (with identical delay probabilities), but the plan execution for both approaches is completely decentralized and thus cannot prevent collisions.
Problem Definition: Planning
A MAPF-DP instance is characterized by an undirected graph whose vertices correspond to locations and whose edges correspond to transitions between locations. We are given $m$ agents $a_1, \dots, a_m$. Each agent $a_i$ has a unique start vertex $s_i$, a unique goal vertex $g_i$ and a delay probability $p_i$. A path for agent $a_i$ is expressed by a function $x_i$ that maps each time index $t \in \{0, \dots, T_i\}$ to a vertex $x_i(t)$ such that $x_i(0) = s_i$, consecutive vertices $x_i(t)$ and $x_i(t+1)$ are either identical (when agent $a_i$ is scheduled to execute a wait action) or connected by an edge (when agent $a_i$ is scheduled to execute a move action from vertex $x_i(t)$ to vertex $x_i(t+1)$) and $x_i(T_i) = g_i$. A MAPF-DP plan consists of a path for each agent.
Problem Definition: Plan Execution
The local state of agent $a_i$ at time step $t$ during plan execution is a time index $t_i$. We set $t_i = 0$ at time step 0 and always update its local state such that agent $a_i$ is in vertex $x_i(t_i)$ at time step $t$. The agent knows its current local state and receives messages from some of the other agents about their local states. At each time step, its plan-execution policy maps this knowledge to one of the commands GO or STOP that control how it proceeds along its path.

If the command is GO at time step $t$:

- If $t_i = T_i$, then agent $a_i$ executes no action and remains in its current vertex $x_i(T_i)$ since it has entered its last local state (and thus the end of its path). We thus leave its local state unchanged.
- If $t_i < T_i$ and $x_i(t_i) = x_i(t_i+1)$, then agent $a_i$ executes a wait action to remain in its current vertex $x_i(t_i)$. The execution of wait actions never fails. We thus update its local state to $t_i + 1$ (success).
- If $t_i < T_i$ and $x_i(t_i) \neq x_i(t_i+1)$, then agent $a_i$ executes a move action from its current vertex $x_i(t_i)$ to vertex $x_i(t_i+1)$. The execution of move actions fails with delay probability $p_i$, with the effect that the agent executes no action and remains delayed in its current vertex $x_i(t_i)$. We thus leave its local state unchanged with probability $p_i$ (failure) and update it to $t_i + 1$ with probability $1 - p_i$ (success).

If the command is STOP at time step $t$, then agent $a_i$ executes no action and remains in its current vertex $x_i(t_i)$. We thus leave its local state unchanged.
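The transition rules above can be sketched as a single-agent state update. This is a minimal simulation under our own data layout (a path as a list of vertices; function and parameter names are ours, not from the paper):

```python
import random

def step(path, state, command, p_delay, rng=random):
    """One plan-execution step for a single agent (sketch of the rules above).

    path:    list of vertices x_i(0), ..., x_i(T_i)
    state:   current local state (time index into path)
    command: "GO" or "STOP"
    p_delay: delay probability of the agent
    Returns the new local state.
    """
    T = len(path) - 1
    if command == "STOP" or state == T:
        return state                      # no action: agent stays put
    if path[state] == path[state + 1]:
        return state + 1                  # wait action: never fails
    # move action: fails (agent stays delayed) with probability p_delay
    return state if rng.random() < p_delay else state + 1
```

With `p_delay = 0` this reduces to perfect plan execution; with `p_delay > 0` repeated `GO` commands may be needed before a move succeeds.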
Our objective is to find a combination of a MAPF-DP plan and a plan-execution policy with a small average makespan, which is the average earliest time step during plan execution when all agents have entered their last local states. The MAPF problem is a special case where the delay probabilities of all agents are zero and the plan-execution policies always provide GO commands.
Valid MAPF-DP Plans
Definition 1.
A valid MAPF-DP plan is a plan with two properties:

1. $x_i(t) \neq x_j(t)$ for all agents $a_i \neq a_j$ and time indices $t$ [two agents are never scheduled to be in the same vertex at the same time index, that is, the vertices of two agents in the same local state are different].
2. $x_i(t+1) \neq x_j(t)$ for all agents $a_i \neq a_j$ and time indices $t$ [an agent is never scheduled to be in a vertex at a time index $t+1$ when any other agent is scheduled to be in the same vertex at time index $t$, that is, the vertex of an agent in a local state has to be different from the vertex of any other agent in the immediately preceding local state].
Figure 1 shows a sample MAPF-DP instance where the blue agent has to move from its start vertex to its goal vertex and the red agent has to move from its start vertex to its goal vertex. One of the agents has to move north to let the other one pass. The resulting paths form a valid MAPF-DP plan. However, there is also a pair of paths that forms a valid MAPF plan but not a valid MAPF-DP plan since it violates Property 2.
Property 1 of Definition 1 is necessary to be able to execute valid MAPF-DP plans without vertex collisions because two agents could otherwise be in the same vertex at the same time step (under perfect or imperfect plan execution). Property 2 is also necessary because an agent could otherwise enter the vertex of some other agent that unsuccessfully tries to leave the same vertex at the same time step (under imperfect plan execution). Property 2 is also necessary to be able to execute valid MAPF-DP plans without edge collisions (under perfect or imperfect plan execution).
Robust Plan-Execution Policies
We study two kinds of decentralized robust plan-execution policies for valid MAPF-DP plans, which are plan-execution policies that prevent all collisions during the imperfect plan execution of valid MAPF-DP plans.
Fully Synchronized Policies (FSPs)
Fully Synchronized Policies (FSPs) attempt to keep all agents in lockstep as much as possible by providing a GO command to an agent if and only if the agent has not yet entered its last local state and all other agents have either entered their last local states or have left all local states that precede the local state of the agent itself. FSPs can be implemented easily if each agent sends a message to all other agents when it enters a new local state. An agent can implement its FSP simply by counting how many messages it has received from each other agent and providing a GO command to itself in local state $t_i$ if and only if it has not yet entered its last local state and has received $\min(t_i, T_j)$ messages over the course of plan execution from each other agent $a_j$.
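The FSP decision rule above can be sketched as follows, assuming each agent already knows the other agents' local states from the received messages (the data layout is ours, not from the paper):

```python
def fsp_command(agent, states, T):
    """GO/STOP decision of a Fully Synchronized Policy (sketch).

    agent:  index of the deciding agent
    states: current local states of all agents, as known from messages
    T:      last local state (path length - 1) of each agent
    """
    if states[agent] == T[agent]:
        return "STOP"          # already entered its last local state
    for j, s in enumerate(states):
        # another unfinished agent still lags behind the deciding agent
        if j != agent and s < T[j] and s < states[agent]:
            return "STOP"
    return "GO"
```

Note that an agent only ever waits for agents with strictly smaller local states, which is also the basis of the deadlock-freedom argument later in the paper.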
Minimal Communication Policies (MCPs)
FSPs have two drawbacks. First, agents wait unnecessarily, which results in large average makespans. Second, each agent always needs to know the local states of all other agents, which results in many sent messages. Property 2 of Definition 1 suggests that robust plan-execution policies for valid MAPF-DP plans could provide a GO command to an agent if and only if the agent has not yet entered its last local state and all other agents have left all local states that precede the local state of the agent itself and whose vertices are the same as the vertex of the next local state of the agent itself. This way, it is guaranteed that the vertex of the next local state of the agent is different from the vertices of all other agents in their current local states. Minimal Communication Policies (MCPs) address these drawbacks by identifying such critical dependencies between agents and obeying them during plan execution, an idea that originated in the context of centralized non-robust plan-execution policies [Hönig et al. 2016].
The local state of an agent at any time step during plan execution is a time index $t$. Since we need to relate the local states of different agents, we use $x_i(t)$ in the following not only to refer to the vertex assigned to local state $t$ of agent $a_i$ but also to the local state of agent $a_i$ itself (instead of $t$), depending on the context.
Every valid MAPF-DP plan defines a total order on the local states of all agents, which we relax to a partial order $\prec$ as follows:

1. $x_i(t) \prec x_i(t+1)$ [agent $a_i$ enters a local state $x_i(t+1)$ during plan execution only after it enters local state $x_i(t)$].
2. $x_j(t'+1) \prec x_i(t+1)$ for all agents $a_i \neq a_j$ with $x_j(t') = x_i(t+1)$ and $x_j(t')$ preceding $x_i(t+1)$ in the total order [agent $a_i$ enters a local state $x_i(t+1)$ with a vertex during plan execution only after agent $a_j$ has left a local state $x_j(t')$ with the same vertex (and thus entered local state $x_j(t'+1)$) that precedes local state $x_i(t+1)$].

Property 1 of the partial order enforces that each agent visits its locations in the same order as in the MAPF-DP plan. Property 2 enforces that any two agents visit the same location in the same order as in the MAPF-DP plan. We can express the partial order with a directed graph whose vertices correspond to local states and whose edges correspond to the partial order given by the two properties above. Property 2 specifies the critical dependencies between agents. Edges are redundant and can then be removed from the directed graph when they are implied by the other edges due to transitivity. A transitive reduction of the directed graph minimizes the number of remaining edges. It can be computed in polynomial time [Aho, Garey, and Ullman 1972], is unique, contains all edges between local states of the same agent (since they are never redundant) and thus minimizes the number of edges between the local states of different agents.
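For illustration, a transitive reduction of a DAG can be computed with a simple cubic-time sketch that drops every edge implied by a longer path (this is not the algorithm of Aho, Garey, and Ullman; node identifiers are assumed hashable):

```python
def transitive_reduction(nodes, edges):
    """Transitive reduction of a DAG given as a set of (u, v) edges (sketch).

    An edge (u, v) is redundant, and removed, when v is still reachable
    from u without using (u, v) directly.
    """
    adj = {u: {v for (a, v) in edges if a == u} for u in nodes}
    reduced = set(edges)
    for u, v in edges:
        # depth-first search from u, skipping the direct edge (u, v)
        stack = [w for w in adj[u] if w != v]
        seen = set(stack)
        while stack:
            w = stack.pop()
            if w == v:
                reduced.discard((u, v))   # implied by a longer path
                break
            for x in adj[w] - seen:
                seen.add(x)
                stack.append(x)
    return reduced
```

For a DAG, checking reachability in the original graph (rather than in the partially reduced one) is sufficient, since implied edges are exactly those bypassed by a longer path.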
MCPs can be implemented easily if each agent $a_j$ sends a message to each other agent $a_i$ when agent $a_j$ enters a new local state $x_j(t'+1)$ (as in Property 2) if and only if the transitive reduction contains an edge $x_j(t'+1) \prec x_i(t+1)$ for some local state $x_i(t+1)$ (as in Property 2) of agent $a_i$. Since the transitive reduction minimizes the number of edges between the local states of different agents, it also minimizes the number of sent messages. An agent can implement its MCP simply by counting how many messages it has received from each other agent and providing a GO command to itself in local state $x_i(t)$ if and only if it has not yet entered its last local state and has received a number of messages over the course of plan execution from each other agent $a_j$ that corresponds to the number of incoming edges from local states of agent $a_j$ to its local states up to and including $x_i(t+1)$.
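The message-counting decision rule for MCPs can be sketched as follows, assuming the per-sender message counts required by each local state have been precomputed from the transitive reduction (the data layout and names are ours, not from the paper):

```python
def mcp_command(agent, received, needed, state, T):
    """GO/STOP decision of a Minimal Communication Policy (sketch).

    agent:    index of the deciding agent (kept only for readability)
    received: received[j] = number of messages received from agent j so far
    needed:   needed[j]   = number of edges in the transitive reduction that
              go from local states of agent j to the deciding agent's local
              states up to and including the next one (precomputed)
    state, T: current and last local state of the deciding agent
    """
    if state == T:
        return "STOP"          # end of the path: nothing left to do
    if all(received[j] >= needed[j] for j in needed):
        return "GO"            # all critical dependencies satisfied
    return "STOP"
```

Unlike an FSP, this rule only consults the few senders with edges in the transitive reduction, which is what keeps the number of messages small.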
Figure 2 shows a sample partial order on the local states for the MAPF-DP instance from Figure 1 and its valid MAPF-DP plan. One of its edges, for example, is implied by other edges and can thus be removed from the directed graph. Figure 3 shows the resulting transitive reduction, which implies that one of the agents has to wait in an early local state until it has received one message from the other agent during the course of plan execution but can then proceed through all future local states without waiting.
Properties of FSPs and MCPs
Neither FSPs nor MCPs result in deadlocks during the plan execution of valid MAPF-DP plans because there always exists at least one agent that is provided a GO command before all agents have entered their last local states (namely an agent with the smallest local state among all agents that have not yet entered their last local states, since an agent can wait only for other agents with smaller local states).
Both FSPs and MCPs are robust plan-execution policies due to Properties 1 and 2 of valid MAPF-DP plans. We now provide a proof sketch for the robustness of MCPs.
First, consider a valid MAPF-DP plan and assume that $x_i(t_1) = x_j(t_2)$ for two agents $a_i$ and $a_j$ with $i \neq j$. Then, 1) $t_1 \neq t_2$ since $x_i(t) \neq x_j(t)$ for all $t$ according to Property 1 of Definition 1 and 2) $|t_1 - t_2| \neq 1$ since $x_i(t+1) \neq x_j(t)$ for all $t$ according to Property 2 of Definition 1 (State Property).
Second, we show by contradiction that no vertex collisions can occur during plan execution. Assume that a vertex collision occurs between agents $a_i$ and $a_j$ with $i \neq j$ when agent $a_i$ is in local state $x_i(t_1)$ and agent $a_j$ is in local state $x_j(t_2)$, that is, $x_i(t_1) = x_j(t_2)$. Assume without loss of generality that $x_i(t_1)$ precedes $x_j(t_2)$ in the total order. Then, $x_i(t_1+1) \prec x_j(t_2)$ according to Property 2 of the partial order since $x_j(t_2) = x_i(t_1)$ according to our vertex collision assumption. Thus, agent $a_j$ can enter local state $x_j(t_2)$ only after agent $a_i$ reaches local state $x_i(t_1+1)$ and has thus left local state $x_i(t_1)$, which is a contradiction with the vertex collision assumption.
Third, we show by contradiction that no edge collisions can occur during plan execution. Assume that an edge collision between agents $a_i$ and $a_j$ with $i \neq j$ occurs when agent $a_i$ changes its local state from $x_i(t_1)$ to $x_i(t_1+1)$ and agent $a_j$ changes its local state from $x_j(t_2)$ to $x_j(t_2+1)$, that is, $x_i(t_1) = x_j(t_2+1)$ and $x_i(t_1+1) = x_j(t_2)$. Assume without loss of generality that $t_1 \leq t_2$. Case 1) If $t_1 = t_2$, then $x_i(t_1+1) = x_j(t_1)$, which is a contradiction with the State Property. Case 2) If $t_1 < t_2$, then $x_i(t_1+1) \prec x_j(t_2+1)$ according to Property 2 of the partial order since $x_j(t_2+1) = x_i(t_1)$ according to our edge collision assumption and $t_1 < t_2 + 1$ according to the case assumption. Thus, agent $a_j$ can enter local state $x_j(t_2+1)$ only after agent $a_i$ reaches local state $x_i(t_1+1)$, which is a contradiction with the edge collision assumption.
Approximate Minimization in Expectation
MCPs are robust plan-execution policies for valid MAPF-DP plans that do not stop agents unnecessarily and result in few sent messages. We present a MAPF-DP solver, called Approximate Minimization in Expectation (AME), that determines valid MAPF-DP plans so that their combination with MCPs results in small average makespans.
AME is a two-level MAPF-DP solver that is based on Conflict-Based Search (CBS) [Sharon et al. 2015]. Its high-level search imposes constraints on the low-level search that resolve violations of Properties 1 and 2 of Definition 1 (called conflicts). Its low-level search plans paths for single agents that obey these constraints and result in small average makespans. The average makespan of a MAPF-DP plan is the expectation of the maximum of (one or more) random variables that represent the time steps when all agents enter their last local states. Moreover, the average time step when an agent enters a local state is the expectation of the maximum of random variables as well. It is often difficult to obtain good closed-form approximations of the expectation of the maximum of random variables. AME thus approximates it with the maximum over the expectations of the random variables, which typically results in an underestimate but, according to our experimental results, a close approximation. The approximate average time step $\hat{t}_i(t)$ when agent $a_i$ enters local state $x_i(t)$ for a given MAPF-DP plan is 0 for $t = 0$ and

$\hat{t}_i(t) = \max\Big(\hat{t}_i(t-1),\ \max_{x_j(t') \prec x_i(t),\, j \neq i} \hat{t}_j(t')\Big) + \begin{cases} 1 & \text{if } x_i(t-1) = x_i(t) \\ \frac{1}{1-p_i} & \text{otherwise} \end{cases} \qquad (1)$

otherwise, since agent $a_i$ first enters local state $x_i(t-1)$ at approximate average time step $\hat{t}_i(t-1)$, then might have to wait for messages from other agents $a_j$ that they send when they enter their local states $x_j(t')$ at approximate average time steps $\hat{t}_j(t')$ and finally has to successfully execute one action (perhaps repeatedly) to enter local state $x_i(t)$. The average number of time steps that it needs for the successful execution of the action is 1 (for a wait action) if $x_i(t-1) = x_i(t)$ and $\frac{1}{1-p_i}$ (for a move action) otherwise. The approximate average makespan of the given MAPF-DP plan is then $\max_i \hat{t}_i(T_i)$ since all agents need to enter their last local states. One might be able to obtain better approximations with more runtime-intensive importance sampling or dynamic programming methods, but the runtime of the resulting AME variant would be large since it needs to compute many such approximations.
High-Level Search
Algorithm 1 shows the high-level search of AME, which is similar to the high-level search of CBS. In the following, we point out the differences. Each high-level node $N$ contains the following items:

- A set of constraints $N.constraints$, each of the form $(a_i, t, v)$, that states that the vertex $x_i(t)$ of agent $a_i$ in local state $t$ has to be different from vertex $v$.
- A (labeled) MAPF-DP plan $N.plan$ that contains a path for each agent $a_i$ (that obeys the constraints $N.constraints$) and an approximation (called label) $\hat{t}_i(t)$ of each average time step when agent $a_i$ enters local state $x_i(t)$ during plan execution with MCPs.
- The key $N.key$ of high-level node $N$ that encodes its priority (smaller keys have higher priority) and is equal to the approximate average makespan of MAPF-DP plan $N.plan$ given by ApproximateAverageMakespan($N.plan$).
When a conflict exists in MAPF-DP plan $N.plan$, then the high-level search creates two child nodes of node $N$ [Line 15] whose constraints are initially set to the constraints $N.constraints$ [Line 16] and whose MAPF-DP plan is initially set to MAPF-DP plan $N.plan$ [Line 17]. Assume that the earliest conflict is a violation of Property 1 in Definition 1, in which case the vertices of two agents $a_i$ and $a_j$ in a local state $t$ are both identical to a vertex $v$. In this case, AME adds the constraint $(a_i, t, v)$ to the constraints of the first child node and the constraint $(a_j, t, v)$ to the constraints of the second child node [Line 18], thus preventing the conflict in both cases. Assume that the earliest conflict is a violation of Property 2 in Definition 1, in which case the vertex of an agent $a_i$ in a local state $t+1$ and the vertex of some other agent $a_j$ in the immediately preceding local state $t$ are both identical to a vertex $v$. In this case, AME adds the constraint $(a_i, t+1, v)$ to the constraints of the first child node and the constraint $(a_j, t, v)$ to the constraints of the second child node [Line 18], thus preventing the conflict in both cases.
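The branching step above can be sketched as follows; the conflict and constraint encodings are ours, not from the paper:

```python
def branch(node, conflict):
    """CBS-style branching of the high-level search (sketch).

    conflict: ("vertex", i, j, t, v) -- Property 1 violated: agents i and j
              both have vertex v in local state t
              ("edge", i, j, t, v)   -- Property 2 violated: agent i has
              vertex v in local state t+1 and agent j has vertex v in
              local state t
    A constraint (agent, local_state, vertex) forbids that agent's path
    from using that vertex in that local state.
    """
    kind, i, j, t, v = conflict
    left, right = dict(node), dict(node)
    if kind == "vertex":
        left["constraints"] = node["constraints"] | {(i, t, v)}
    else:  # Property 2 conflict: constrain agent i one local state later
        left["constraints"] = node["constraints"] | {(i, t + 1, v)}
    right["constraints"] = node["constraints"] | {(j, t, v)}
    return left, right
```

In both cases at least one of the two added constraints must hold in any conflict-free plan, so the split loses no solutions.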
Low-Level Search
LowLevelSearch($N$, $a_i$, $key$) finds a new path for agent $a_i$ and the labels of this path. It uses the paths of the other agents and their labels in $N.plan$ but does not update them. (The paths are empty directly after the execution of Line 2.) It performs a focal search with re-expansions in a state space whose states $(v, t)$ correspond to pairs of vertices and local states (except for those pairs ruled out by constraints in $N.constraints$ that pertain to agent $a_i$) and whose edges connect state $(v, t)$ to state $(v', t+1)$ if and only if $v = v'$ (for a wait action) or $v$ and $v'$ are connected by an edge of the graph (for a move action). The g-value of a state $(v, t)$ approximates (sic!) the approximate average time step $\hat{t}_i(t)$. The start state is $(s_i, 0)$ and its g-value is 0. When the low-level search expands state $(v, t)$, it sets the g-value of its successor $(v', t+1)$ according to Equation (1) to the minimum of its current g-value and

$\max\Big(g(v, t),\ \max_{x_j(t') \prec x_i(t+1),\, j \neq i} \hat{t}_j(t')\Big) + c,$

where $c$ is 1 if $v = v'$ and $\frac{1}{1-p_i}$ otherwise. The low-level search decides which state to expand next based on 1) the f-value of the state, which is the sum of its g-value and its h-value, where the h-value is $\frac{1}{1-p_i}$ times the distance from location $v$ to location $g_i$ in the graph (which is an optimistic estimate of the average number of time steps required to move from location $v$ to location $g_i$) and 2) the number of conflicts of the path for agent $a_i$ that corresponds to the locations in the states on the found path from the start state to $(v, t)$ with the paths of the other agents.
The low-level search starts in Phase 1. The objective in this phase is to find a path for agent $a_i$ so that it enters its last local state with a reasonably small approximate average number of time steps, namely one that is no larger than the approximate average makespan $key$ of the MAPF-DP plan in the parent node of node $N$ in the high-level search, and has a small number of conflicts. The first part of the objective tries to ensure that the approximate average makespan of the resulting MAPF-DP plan in node $N$ is no larger than that of the MAPF-DP plan in the parent node of node $N$, and the second part tries to ensure that the resulting MAPF-DP plan has a small number of conflicts so that the high-level search has a small runtime since it needs to resolve only a small number of conflicts. The low-level search thus repeatedly expands a state with the smallest number of conflicts among all states in the priority queue whose f-values are no larger than $key$.
If no such state exists, then the low-level search switches to Phase 2. The objective in this phase is to find a path for agent $a_i$ so that it enters its last local state with a small approximate average number of time steps. This objective tries to ensure that the approximate average makespan of the resulting MAPF-DP plan in node $N$ is not much larger than that of the MAPF-DP plan in the parent node of node $N$. The low-level search thus repeatedly expands a state with the smallest f-value among all states in the priority queue.
The low-level search terminates successfully when it is about to expand a state $(v, t)$ with $v = g_i$ for which $N.constraints$ contains no constraints of the form $(a_i, t', g_i)$ with $t' \geq t$. It then sets $T_i = t$, the locations that form the path of agent $a_i$ to the corresponding locations in the states on the found path from the start state to $(v, t)$ and the approximate average time steps $\hat{t}_i(\cdot)$ to the corresponding g-values of these states. The low-level search terminates unsuccessfully when the priority queue becomes empty. The low-level search currently does not terminate otherwise, but we might be able to make it complete by using an upper bound on the smallest average makespan of any valid MAPF-DP plan, similar to upper bounds in the context of valid MAPF plans [Kornhauser, Miller, and Spirakis 1984].
Future Work
The low-level search is currently the weakest part of AME due to the many approximations that keep its runtime small, which is important since the high-level search runs many low-level searches. We expect that future work will be able to improve the low-level search substantially. For example, the approximate average time steps of agents other than agent $a_i$ could be updated before, during or after the low-level search, which would provide more accurate values for the current and future low-level searches as well as the current high-level search. Once the low-level search finds a path for agent $a_i$ and the high-level search replaces the path of agent $a_i$ in the MAPF-DP plan in the current high-level node with this path, it could update the approximate average time steps of all agents to the ideal approximate average time steps given by Equation (1), for example as part of the execution of ApproximateAverageMakespan on Lines 7 and 21. Many other improvements are possible as well.
Experiments
We evaluate AME with MCPs on a 2.50 GHz Intel Core i5-2450M PC with 6 GB RAM.
Experiment 1: MAPF Solvers
          AME                                                        Push and Swap                        Adapted CBS
id        runtime (s)  approx. makespan  makespan (± 95% CI)  messages    runtime (s)  makespan (± 95% CI)  messages    runtime (s)  makespan (± 95% CI)  messages
random 1  0.058  63.15  71.28 0.34  267  0.031  812.41 0.40  287        
random 2  0.052  66.22  73.02 0.29  257  0.025  768.30 0.43  257        
random 3  0.080  78.44  84.90 0.40  373  0.052  934.59 0.33  387        
random 4  0.063  67.00  72.89 0.37  251  0.028  755.95 0.33  255        
random 5  0.050  65.13  73.98 0.31  255  0.029  875.48 0.47  318  282.079  84.11 0.40  282  
random 6  0.052  62.89  66.98 0.36  257  0.031  830.77 0.32  290        
random 7  0.495  67.22  71.34 0.36  269  0.038  785.55 0.46  274        
random 8  0.042  49.33  51.72 0.35  164  0.024  648.80 0.35  199  197.911  52.35 0.37  163  
random 9  0.051  56.27  61.30 0.27  247  0.052  780.60 0.30  294        
random 10  0.487  60.06  64.77 0.38  234  0.032  750.12 0.35  284        
warehouse 1  0.124  114.32  124.18 0.44  705  0.055  1,399.14 0.43  703        
warehouse 2  0.106  119.74  124.63 0.51  762  0.055  1,620.03 0.60  810        
warehouse 3  0.107  112.96  117.00 0.53  609  0.032  1,295.75 0.53  616        
warehouse 4  0.090  114.90  117.31 0.52  541  0.043  1,246.47 0.67  571        
warehouse 5          0.060  1,453.36 0.54  783        
warehouse 6  0.111  127.65  131.10 0.59  710  0.037  1,437.01 0.58  664        
warehouse 7  0.142  87.45  96.54 0.34  488  0.028  1,154.21 0.60  403        
warehouse 8          0.024  1,233.13 0.58  401        
warehouse 9  0.087  103.51  107.33 0.42  462  0.024  1,088.53 0.44  422        
warehouse 10  0.183  120.76  127.36 0.53  909  0.057  1,541.56 0.62  678       
We compare AME to two MAPF solvers, namely 1) Adapted CBS, a CBS variant that assumes perfect plan execution and computes valid MAPF-DP plans, minimizes the makespan and breaks ties toward paths with fewer actions, and 2) Push and Swap [Luna and Bekris 2011], a MAPF solver that assumes perfect plan execution and computes valid MAPF-DP plans where exactly one agent executes a move action at each time step and all other agents execute wait actions. We generate 10 MAPF-DP instances (labeled random 1-10) in 30×30 4-neighbor grids with 10% randomly blocked cells and random but unique start and unique goal cells for 35 agents whose delay probabilities for AME are sampled uniformly at random from the delay probability range. In the same way, we generate 10 MAPF-DP instances (labeled warehouse 1-10) in a simulated warehouse environment with random but unique start and unique goal cells on the left and right sides. Figure 4 shows two MAPF-DP instances: random 1 (top) and warehouse 1 (bottom).
Table 1 reports for each MAPF-DP instance the runtime, the approximate average makespan calculated by AME, the average makespan over 1,000 plan-execution runs with MCPs together with 95%-confidence intervals and the number of sent messages. Dashes indicate that the MAPF-DP instance was not solved within a runtime limit of 5 minutes. There is no obvious difference in the numbers of sent messages of the three MAPF(-DP) solvers. However, AME seems to find MAPF-DP plans with smaller average makespans than Adapted CBS, which seems to find MAPF-DP plans with smaller average makespans than Push and Swap. The approximate average makespans calculated by AME are underestimates but reasonably close to the average makespans. AME and Push and Swap seem to run faster than Adapted CBS. In fact, Adapted CBS did not solve MAPF-DP instances with more than 35 agents within the runtime limit, while AME and Push and Swap seem to scale to larger numbers of agents than reported here (see also Experiment 3).
Experiment 2: Delay Probability Ranges
We use AME with different delay probability ranges. We repeat Experiment 1 with 19 MAPFDP instances generated from the MAPFDP instance labeled “random 1” in Experiment 1, one for each . For each MAPFDP instance, the delay probabilities of all agents are sampled from the delay probability range by sampling the average number of time steps needed for the successful execution of single move actions uniformly at random from and then calculating .
range parameter  runtime (s)  approx. makespan  makespan (± 95% CI)  messages
2  0.073  77.92  84.30 0.42  251  
3  0.525  123.92  131.12 0.79  301  
4  0.356  144.61  157.88 0.96  287  
5  0.311  133.55  157.00 0.98  278  
6  0.623  168.51  192.76 1.46  299  
7  0.346  264.78  279.51 2.05  289  
8  0.236  333.09  349.72 2.69  293  
9  0.779  260.58  271.71 2.28  294  
10  1.751  307.63  336.95 2.26  305  
11  2.528  337.15  375.46 2.74  312  
12  1.374  323.87  383.25 2.53  300  
13  0.683  381.63  413.18 3.19  282  
14  2.583  440.94  498.30 3.32  278  
15  1.414  470.06  524.94 3.95  295  
16  7.072  554.32  607.20 4.26  316  
17  2.116  451.32  570.15 3.90  275  
18  3.410  763.44  782.40 6.08  306  
19  5.708  462.71  666.42 5.29  309  
20  7.812  490.26  591.35 3.73  323 
Table 2 reports the same measures as used in Experiment 1, and Figure 5 visualizes the results. Larger delay probability ranges seem to result in larger runtimes, approximate average makespans calculated by AME and average makespans (although there is lots of noise). The differences between the approximate average makespans calculated by AME and average makespans are larger as well but remain reasonable.
Experiment 3: Numbers of Agents
We use AME with different numbers of agents. We repeat Experiment 1 with 50 MAPF-DP instances in 30×30 4-neighbor grids generated as in Experiment 1 for each number of agents.
agents  solved  runtime (s)  approx. makespan  makespan  messages
50  0.94  0.166  69.32  75.19  474.62  
100  0.68  4.668  78.48  87.29  1,554.71  
150  0.10  134.155  81.77  96.43  2,940.40  
200  0       
Table 3 reports the same measures as used in Experiment 1, averaged over all MAPFDP instances that were solved within a runtime limit of 5 minutes. AME solves most MAPFDP instances with 50 agents and then degrades gracefully with the number of agents.
Experiment 4: Plan-Execution Policies
          MCPs                                 FSPs                                 dummy policies
id        makespan (± 95% CI)  messages    makespan (± 95% CI)  messages    makespan (± 95% CI)  collisions
random 1  71.28 0.34  267  140.29 0.50  23,109  67.82 0.35  16.68  
random 2  73.02 0.29  257  143.55 0.55  19,316  71.96 0.31  14.27  
random 3  84.90 0.40  373  160.43 0.59  24,098  81.20 0.37  27.71  
random 4  72.89 0.37  251  141.71 0.52  19,587  69.16 0.36  25.38  
random 5  73.98 0.31  255  141.49 0.54  20,794  69.59 0.32  14.98  
random 6  66.98 0.36  257  115.98 0.51  20,597  66.76 0.37  15.19  
random 7  71.34 0.36  269  124.03 0.54  20,481  70.79 0.38  16.53  
random 8  51.72 0.35  164  96.04 0.46  16,665  51.65 0.38  8.81  
random 9  61.30 0.27  247  113.76 0.46  20,976  58.52 0.23  10.33  
random 10  64.77 0.38  234  114.04 0.50  19,834  64.00 0.38  17.51  
warehouse 1  124.18 0.44  705  219.63 0.65  28,794  122.42 0.42  34.59  
warehouse 2  124.63 0.51  762  235.35 0.72  34,154  124.40 0.60  68.68  
warehouse 3  117.00 0.53  609  206.29 0.65  26,647  117.89 0.54  29.61  
warehouse 4  117.31 0.52  541  194.07 0.59  24,889  116.02 0.53  28.09  
warehouse 6  131.10 0.59  710  205.54 0.71  29,462  131.54 0.60  37.41  
warehouse 7  96.54 0.34  488  187.90 0.59  22,401  95.80 0.35  24.91  
warehouse 9  107.33 0.42  462  187.80 0.56  18,950  105.63 0.45  22.21  
warehouse 10  127.36 0.53  909  226.95 0.73  32,903  127.59 0.55  43.78 
We use AME with 3 plan-execution policies, namely 1) MCPs, 2) FSPs and 3) dummy (non-robust) plan-execution policies that always provide GO commands. We repeat Experiment 1 for each plan-execution policy.
Table 4 reports for each solved MAPF-DP instance and plan-execution policy the average makespan over 1,000 plan-execution runs together with 95%-confidence intervals, the number of sent messages for MCPs and FSPs, and the average number of collisions for dummy plan-execution policies. The number of sent messages is zero (and thus not shown) for dummy plan-execution policies since, different from MCPs and FSPs, they do not prevent collisions. The average makespan for MCPs is only slightly larger than that for dummy plan-execution policies, and both the average makespan and the number of sent messages for MCPs are smaller than those for FSPs.
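The ranking in Table 4 matches what one would expect from the policies' synchronization behavior: a dummy policy lets every agent run at its own pace (but allows collisions), while an FSP forces all agents to finish time step t before any agent starts time step t + 1, so every step costs the maximum of the agents' geometric completion times. The toy simulation below (our sketch only; it ignores collisions and the per-edge coordination that MCPs perform) reproduces this qualitative gap:

```python
import random

def dummy_makespan(path_lengths, delay_prob, rng):
    # Dummy policy: every agent always receives GO and proceeds
    # independently; the makespan is the slowest agent's finish time.
    times = []
    for n in path_lengths:
        t = 0
        done = 0
        while done < n:
            t += 1
            if rng.random() >= delay_prob:
                done += 1
        times.append(t)
    return max(times)

def fsp_makespan(path_lengths, delay_prob, rng):
    # Fully synchronized policy: all agents must complete plan step t
    # before any agent starts plan step t + 1, so each plan step costs
    # the maximum of the active agents' geometric step times.
    total = 0
    for step in range(max(path_lengths)):
        active = sum(1 for n in path_lengths if n > step)
        worst = 0
        for _ in range(active):
            t = 1
            while rng.random() < delay_prob:
                t += 1
            worst = max(worst, t)
        total += worst
    return total
```

Averaged over repeated runs with, say, 20 agents, 60-move paths and delay probability 0.2, the synchronized variant roughly doubles the makespan of the unsynchronized one, mirroring the FSP-versus-dummy columns of Table 4.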
Conclusions
In this paper, we formalized the Multi-Agent Path-Finding Problem with Delay Probabilities (MAPF-DP) to account for imperfect plan execution and then developed an efficient way of solving it with small average makespans, namely with Approximate Minimization in Expectation (a 2-level MAPF-DP solver for generating valid MAPF-DP plans) and Minimal Communication Policies (decentralized robust plan-execution policies for executing valid MAPF-DP plans without collisions).
References
[Aho, Garey, and Ullman 1972] Aho, A. V.; Garey, M. R.; and Ullman, J. D. 1972. The transitive reduction of a directed graph. SIAM Journal on Computing 1(2):131–137.
[Becker et al. 2004] Becker, R.; Zilberstein, S.; Lesser, V.; and Goldman, C. V. 2004. Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research 22(1):423–455.
[Boutilier 1996] Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Conference on Theoretical Aspects of Rationality and Knowledge, 195–210.
[Boyarski et al. 2015] Boyarski, E.; Felner, A.; Stern, R.; Sharon, G.; Tolpin, D.; Betzalel, O.; and Shimony, S. E. 2015. ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. In International Joint Conference on Artificial Intelligence, 740–746.
[Cohen et al. 2016] Cohen, L.; Uras, T.; Kumar, T. K. S.; Xu, H.; Ayanian, N.; and Koenig, S. 2016. Improved solvers for bounded-suboptimal multi-agent path finding. In International Joint Conference on Artificial Intelligence, 3067–3074.
[Goldenberg et al. 2014] Goldenberg, M.; Felner, A.; Stern, R.; Sharon, G.; Sturtevant, N. R.; Holte, R. C.; and Schaeffer, J. 2014. Enhanced Partial Expansion A*. Journal of Artificial Intelligence Research 50:141–187.
[Goldman and Zilberstein 2004] Goldman, C. V., and Zilberstein, S. 2004. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research 22:143–174.
[Hönig et al. 2016] Hönig, W.; Kumar, T. K. S.; Cohen, L.; Ma, H.; Xu, H.; Ayanian, N.; and Koenig, S. 2016. Multi-agent path finding with kinematic constraints. In International Conference on Automated Planning and Scheduling, 477–485.
[Kornhauser, Miller, and Spirakis 1984] Kornhauser, D.; Miller, G.; and Spirakis, P. 1984. Coordinating pebble motion on graphs, the diameter of permutation groups, and applications. In Annual Symposium on Foundations of Computer Science, 241–250.
[Kurniawati, Hsu, and Lee 2008] Kurniawati, H.; Hsu, D.; and Lee, W. S. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, 65–72.
[Liu and Michael 2016] Liu, L., and Michael, N. 2016. An MDP-based approximation method for goal constrained multi-MAV planning under action uncertainty. In IEEE International Conference on Robotics and Automation, 56–62.
[Luna and Bekris 2011] Luna, R., and Bekris, K. E. 2011. Push and Swap: Fast cooperative path-finding with completeness guarantees. In International Joint Conference on Artificial Intelligence, 294–300.
[Ma and Koenig 2016] Ma, H., and Koenig, S. 2016. Optimal target assignment and path finding for teams of agents. In International Conference on Autonomous Agents and Multiagent Systems, 1144–1152.
[Ma and Pineau 2015] Ma, H., and Pineau, J. 2015. Information gathering and reward exploitation of subgoals for POMDPs. In AAAI Conference on Artificial Intelligence, 3320–3326.
[Ma et al. 2016] Ma, H.; Tovey, C.; Sharon, G.; Kumar, T. K. S.; and Koenig, S. 2016. Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI Conference on Artificial Intelligence, 3166–3173.
[Melo and Veloso 2011] Melo, F. S., and Veloso, M. 2011. Decentralized MDPs with sparse interactions. Artificial Intelligence 175(11):1757–1789.
[Morris et al. 2016] Morris, R.; Pasareanu, C.; Luckow, K.; Malik, W.; Ma, H.; Kumar, S.; and Koenig, S. 2016. Planning, scheduling and monitoring for airport surface operations. In AAAI-16 Workshop on Planning for Hybrid Systems, 608–614.
[Scharpff et al. 2016] Scharpff, J.; Roijers, D. M.; Oliehoek, F. A.; Spaan, M. T. J.; and de Weerdt, M. M. 2016. Solving transition-independent multi-agent MDPs with sparse interactions. In AAAI Conference on Artificial Intelligence, 3174–3180.
[Sharon et al. 2013] Sharon, G.; Stern, R.; Goldenberg, M.; and Felner, A. 2013. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence 195:470–495.
[Sharon et al. 2015] Sharon, G.; Stern, R.; Felner, A.; and Sturtevant, N. R. 2015. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219:40–66.
[Silver 2005] Silver, D. 2005. Cooperative pathfinding. In Artificial Intelligence and Interactive Digital Entertainment, 117–122.
[Standley 2010] Standley, T. S. 2010. Finding optimal solutions to cooperative pathfinding problems. In AAAI Conference on Artificial Intelligence, 173–178.
[Velagapudi et al. 2011] Velagapudi, P.; Varakantham, P.; Sycara, K. P.; and Scerri, P. 2011. Distributed model shaping for scaling to decentralized POMDPs with hundreds of agents. In International Conference on Autonomous Agents and Multiagent Systems, 955–962.
[Veloso et al. 2015] Veloso, M.; Biswas, J.; Coltin, B.; and Rosenthal, S. 2015. CoBots: Robust symbiotic autonomous mobile service robots. In International Joint Conference on Artificial Intelligence, 4423–4429.
[Wagner and Choset 2015] Wagner, G., and Choset, H. 2015. Subdimensional expansion for multirobot path planning. Artificial Intelligence 219:1–24.
[Wagner 2015] Wagner, G. 2015. Subdimensional Expansion: A Framework for Computationally Tractable Multirobot Path Planning. Ph.D. Dissertation, Carnegie Mellon University.
[Wang and Botea 2011] Wang, K., and Botea, A. 2011. MAPP: A scalable multi-agent path planning algorithm with tractability and completeness guarantees. Journal of Artificial Intelligence Research 42:55–90.
[Wurman, D’Andrea, and Mountz 2008] Wurman, P. R.; D’Andrea, R.; and Mountz, M. 2008. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine 29(1):9–20.