Multi-Agent Path Finding with Delay Probabilities

12/15/2016
by   Hang Ma, et al.
University of Southern California

Several recently developed Multi-Agent Path Finding (MAPF) solvers scale to large MAPF instances by searching for MAPF plans on 2 levels: The high-level search resolves collisions between agents, and the low-level search plans paths for single agents under the constraints imposed by the high-level search. We make the following contributions to solve the MAPF problem with imperfect plan execution with small average makespans: First, we formalize the MAPF Problem with Delay Probabilities (MAPF-DP), define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss 2 classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a 2-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.


Introduction

Multi-Agent Path Finding (MAPF) is the problem of finding collision-free paths for a given number of agents from their given start locations to their given goal locations in a given environment. MAPF problems arise for aircraft towing vehicles [Morris et al.2016], office robots [Veloso et al.2015], video game characters [Silver2005] and warehouse robots [Wurman, D’Andrea, and Mountz2008], among others.

Several recently developed MAPF solvers scale to large MAPF instances. However, agents typically cannot execute their MAPF plans perfectly since they often traverse their paths more slowly than intended. Their delay probabilities can be estimated but current MAPF solvers do not use this information, which often leads to frequent and runtime-intensive replanning or plan-execution failures.

We thus formalize the MAPF Problem with Delay Probabilities (MAPF-DP), where each agent traverses edges on an undirected graph (that models the environment) to move from its start vertex to its goal vertex. At any discrete time step, the agent can either execute 1) a wait action, resulting in it staying in its current vertex, or 2) a move action with the intent of traversing an outgoing edge of its current vertex, resulting in it staying in its current vertex with the delay probability and traversing the edge otherwise. The MAPF-DP problem is the problem of finding 1) a MAPF-DP plan that consists of a path for each agent from its start vertex to its goal vertex (given by a sequence of wait and move actions) and 2) a plan-execution policy that controls with GO or STOP commands how each agent proceeds along its path such that no collisions occur during plan execution. There are 2 kinds of collisions, namely vertex collisions (where 2 agents occupy the same vertex at the same time step) and edge collisions (where 2 agents traverse the same edge in opposite directions at the same time step).


Figure 1: A MAPF-DP instance.

We make the following contributions to solve the MAPF-DP problem with small average makespans: First, we formalize the MAPF-DP problem, define valid MAPF-DP plans and propose the use of robust plan-execution policies for valid MAPF-DP plans to control how each agent proceeds along its path. Second, we discuss 2 classes of decentralized robust plan-execution policies (called Fully Synchronized Policies and Minimal Communication Policies) that prevent collisions during plan execution for valid MAPF-DP plans. Third, we present a 2-level MAPF-DP solver (called Approximate Minimization in Expectation) that generates valid MAPF-DP plans.

Background and Related Work

The MAPF problem is NP-hard to solve optimally for flowtime minimization and to approximate within any constant factor less than 4/3 for makespan minimization [Ma et al.2016]. Search-based MAPF solvers can be optimal, bounded suboptimal or suboptimal [Standley2010, Luna and Bekris2011, Wang and Botea2011, Goldenberg et al.2014, Sharon et al.2013, Sharon et al.2015, Boyarski et al.2015, Wagner and Choset2015, Ma and Koenig2016, Cohen et al.2016]. Current MAPF solvers typically assume perfect plan execution. However, utilizing probabilistic information about imperfect plan execution can reduce frequent and time-intensive replanning and plan-execution failures.

Partially Observable Markov Decision Processes (POMDPs) are a general probabilistic planning framework. The MAPF-DP problem can be solved with POMDPs but this is tractable only for very few agents in very small environments since the size of the state space is proportional to the size of the environment to the power of the number of agents and the size of the belief space is proportional to the size of the state space to the power of the length of the planning horizon [Kurniawati, Hsu, and Lee2008, Ma and Pineau2015]. Several specialized probabilistic planning frameworks, such as transition-independent decentralized Markov Decision Processes (Dec-MDPs) [Becker et al.2004] and Multi-Agent Markov Decision Processes (MMDPs) [Boutilier1996] can solve larger probabilistic planning problems than POMDPs. In transition-independent Dec-MDPs, the local state of each agent depends only on its previous local state and the action taken by it [Goldman and Zilberstein2004]. MAPF-DP is indeed transition independent. However, there are interactions among agents since the reward of each agent depends on whether it is involved in a collision and thus on the local states of other agents and the actions taken by them. Fully decentralized probabilistic planning frameworks thus cannot prevent collisions. Fully centralized probabilistic planning frameworks can prevent collisions but are more runtime-intensive and can thus scale poorly. For example, the MAPF-DP problem can be solved with transition-independent MMDPs [Scharpff et al.2016]. In fact, the most closely related research to ours is that on approximating MMDPs [Liu and Michael2016] although it handles different types of dynamics than we do. The runtime of probabilistic planning frameworks can be reduced by exploiting the problem structure, including when interactions among agents are sparse. For example, decentralized sparse-interaction Markov Decision Processes (Dec-SIMDPs) [Melo and Veloso2011] assume that interactions among agents occur only in well-defined interaction areas in the environment (which is not the case for MAPF-DP in general), but typically still do not scale to more than 10 agents. The model shaping technique for decentralized POMDPs [Velagapudi et al.2011] can compute policies for hundreds of agents greedily and UM* [Wagner2015] scales to larger numbers of agents (with identical delay probabilities), but the plan execution for both approaches is completely decentralized and thus cannot prevent collisions.

Problem Definition: Planning

A MAPF-DP instance is characterized by an undirected graph $G = (V, E)$ whose vertices $V$ correspond to locations and whose edges $E$ correspond to transitions between locations. We are given $M$ agents $a_1, a_2, \ldots, a_M$. Each agent $a_i$ has a unique start vertex $s_i$, a unique goal vertex $g_i$ and a delay probability $p_i$. A path for agent $a_i$ is expressed by a function $l_i$ that maps each time index $x = 0, 1, \ldots, X_i$ to a vertex $l_i(x) \in V$ such that $l_i(0) = s_i$, consecutive vertices $l_i(x)$ and $l_i(x+1)$ are either identical (when agent $a_i$ is scheduled to execute a wait action) or connected by an edge (when agent $a_i$ is scheduled to execute a move action from vertex $l_i(x)$ to vertex $l_i(x+1)$) and $l_i(X_i) = g_i$. A MAPF plan consists of a path for each agent.

Problem Definition: Plan Execution

The local state $x_i^t$ of agent $a_i$ at time step $t$ during plan execution is a time index. We set $x_i^0 = 0$ and always update its local state such that it is in vertex $l_i(x_i^t)$ at time step $t$. The agent knows its current local state and receives messages from some of the other agents about their local states. At each time step, its plan-execution policy maps this knowledge to one of the commands GO or STOP that control how it proceeds along its path.

  1. If the command is GO at time step $t$:

    1. If $x_i^t = X_i$, then agent $a_i$ executes no action and remains in its current vertex $l_i(x_i^t)$ since it has entered its last local state (and thus the end of its path). We thus update its local state to $x_i^{t+1} = x_i^t$.

    2. If $x_i^t \neq X_i$ and $l_i(x_i^t) = l_i(x_i^t + 1)$, then agent $a_i$ executes a wait action to remain in its current vertex $l_i(x_i^t)$. The execution of wait actions never fails. We thus update its local state to $x_i^{t+1} = x_i^t + 1$ (success).

    3. If $x_i^t \neq X_i$ and $l_i(x_i^t) \neq l_i(x_i^t + 1)$, then agent $a_i$ executes a move action from its current vertex $l_i(x_i^t)$ to vertex $l_i(x_i^t + 1)$. The execution of move actions fails with delay probability $p_i$ with the effect that the agent executes no action and remains delayed in its current vertex $l_i(x_i^t)$. We thus update its local state to $x_i^{t+1} = x_i^t$ with probability $p_i$ (failure) and $x_i^{t+1} = x_i^t + 1$ with probability $1 - p_i$ (success).

  2. If the command is STOP at time step $t$, then agent $a_i$ executes no action and remains in its current vertex $l_i(x_i^t)$. We thus update its local state to $x_i^{t+1} = x_i^t$.

Our objective is to find a combination of a MAPF plan and a plan-execution policy with small average makespan, which is the average earliest time step during plan execution when all agents have entered their last local states. The MAPF problem is a special case where the delay probabilities of all agents are zero and the plan-execution policies always provide GO commands.
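The update rules above can be sketched as a single transition function. This is only an illustrative sketch: the names `step` and `run_single_agent`, the representation of a path as a Python list of vertex labels, and the use of `random.Random` are assumptions made here, not part of the original formulation.

```python
import random

def step(local_state, path, command, delay_prob, rng=random):
    """Return the next local state (a time index into `path`) of one agent."""
    last = len(path) - 1
    if command == "STOP" or local_state == last:
        return local_state                  # no action is executed
    if path[local_state] == path[local_state + 1]:
        return local_state + 1              # wait actions never fail
    # Move action: fails (agent stays delayed) with probability delay_prob.
    if rng.random() < delay_prob:
        return local_state
    return local_state + 1

def run_single_agent(path, delay_prob, seed=0):
    """Count the time steps one agent needs when it always receives GO."""
    rng = random.Random(seed)
    x, t = 0, 0
    while x < len(path) - 1:
        x = step(x, path, "GO", delay_prob, rng)
        t += 1
    return t
```

With a delay probability of zero and GO commands only, the number of time steps equals the number of actions on the path, which matches the remark that MAPF is the special case without delays.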

Valid MAPF-DP Plans

Definition 1.

A valid MAPF-DP plan is a plan with 2 properties (where $l_i(x)$ denotes the vertex that the path of agent $a_i$ assigns to time index $x$):

  1. $\forall i, j, x$ with $i \neq j$: $l_i(x) \neq l_j(x)$ [two agents are never scheduled to be in the same vertex at the same time index, that is, the vertices of two agents in the same local state are different].

  2. $\forall i, j, x$ with $i \neq j$: $l_i(x+1) \neq l_j(x)$ [an agent is never scheduled to be in a vertex at a time index $x+1$ when any other agent is scheduled to be in the same vertex at time index $x$, that is, the vertex of an agent in a local state $x+1$ has to be different from the vertex of any other agent in local state $x$].

Figure 1 shows a sample MAPF-DP instance where the blue agent has to move from its start vertex to its goal vertex and the red agent has to move from its start vertex to its goal vertex . Agent has to move north to let agent pass. The paths and form a valid MAPF-DP plan. However, the paths and a valid MAPF plan but not a valid MAPF-DP plan since violates Property 2.

Property 1 of Definition 1 is necessary to be able to execute valid MAPF-DP plans without vertex collisions because two agents could otherwise be in the same vertex at the same time step (under perfect or imperfect plan execution). Property 2 is also necessary because an agent could otherwise enter the vertex of some other agent that unsuccessfully tries to leave the same vertex at the same time step (under imperfect plan execution). Property 2 is also necessary to be able to execute valid MAPF-DP plans without edge collisions (under perfect or imperfect plan execution).
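A minimal check of the two properties of Definition 1 might look as follows. The function names are hypothetical, paths are assumed to be lists of vertex labels, and the sketch additionally assumes that an agent rests in its last vertex once its path ends.

```python
def vertex_at(path, x):
    """Vertex at time index x; assumes the agent rests at its last vertex afterwards."""
    return path[min(x, len(path) - 1)]

def violations(paths):
    """Return (property, i, j, x) for every violation of Definition 1."""
    found = []
    horizon = max(len(p) for p in paths)
    for x in range(horizon):
        for i in range(len(paths)):
            for j in range(len(paths)):
                if i == j:
                    continue
                # Property 1: agents i and j share a vertex at time index x.
                if i < j and vertex_at(paths[i], x) == vertex_at(paths[j], x):
                    found.append((1, i, j, x))
                # Property 2: agent i is at index x+1 where agent j is at index x.
                if vertex_at(paths[i], x + 1) == vertex_at(paths[j], x):
                    found.append((2, i, j, x))
    return found

def is_valid_mapf_dp_plan(paths):
    return not violations(paths)
```

For instance, an agent that follows another agent directly (`["A", "B", "C"]` and `["Z", "A", "B"]`) never shares a vertex at the same time index, yet the check reports Property 2 violations, which is the situation the paragraph above describes.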

Robust Plan-Execution Policies

We study 2 kinds of decentralized robust plan-execution policies for valid MAPF-DP plans, which are plan-execution policies that prevent all collisions during the imperfect plan execution of valid MAPF-DP plans.

Fully Synchronized Policies (FSPs)

Fully Synchronized Policies (FSPs) attempt to keep all agents in lockstep as much as possible by providing a GO command to an agent if and only if the agent has not yet entered its last local state and all other agents have either entered their last local states or have left all local states that precede the local state of the agent itself. FSPs can be implemented easily if each agent sends a message to all other agents when it enters a new local state. An agent can implement its FSP simply by counting how many messages it has received from each other agent and providing a GO command to itself in local state $x$ if and only if it has not yet entered its last local state and has received $x$ messages over the course of plan execution from each other agent.
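The FSP rule can be sketched as a small decision function. As a simplification assumed here, the function reads the local states of the other agents directly instead of counting messages; `fsp_command`, `local_states` and `last_states` are illustrative names.

```python
def fsp_command(i, local_states, last_states):
    """GO/STOP command for agent i under a Fully Synchronized Policy (sketch)."""
    x = local_states[i]
    if x == last_states[i]:
        return "STOP"                 # agent i has entered its last local state
    for j, y in enumerate(local_states):
        if j == i:
            continue
        finished = (y == last_states[j])
        if not finished and y < x:
            return "STOP"             # agent j has not yet left all states before x
    return "GO"
```

An agent that is ahead of an unfinished agent is stopped, while the lagging agent is allowed to proceed, which is what keeps the agents close to lockstep.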

Minimal Communication Policies (MCPs)

FSPs have 2 drawbacks. First, agents wait unnecessarily, which results in large average makespans. Second, each agent always needs to know the local states of all other agents, which results in many sent messages. Property 2 of Definition 1 suggests that robust plan-execution policies for valid MAPF-DP plans could provide a GO command to an agent if and only if the agent has not yet entered its last local state and all other agents have left all local states that precede the local state of the agent itself and whose vertices are the same as the vertex of the next local state of the agent itself. This way, it is guaranteed that the vertex of the next local state of the agent is different from the vertices of all other agents in their current local states. Minimal Communication Policies (MCPs) address these drawbacks by identifying such critical dependencies between agents and obeying them during plan execution, an idea that originated in the context of centralized non-robust plan-execution policies [Hönig et al.2016].

The local state $x_i^t$ of an agent $a_i$ at any time step $t$ during plan execution is a time index $x$. Since we need to relate the local states of different agents, we use $l_i(x)$ in the following not only to refer to the vertex assigned to local state $x$ of agent $a_i$ but also to the local state $x$ of agent $a_i$ itself (instead of $x$), depending on the context.

Every valid MAPF-DP plan defines a total order on the local states of all agents, which we relax to a partial order $\rightarrow$ as follows:

  1. $\forall i, x$: $l_i(x) \rightarrow l_i(x+1)$ [agent $a_i$ enters a local state $x+1$ during plan execution only after it enters local state $x$].

  2. $\forall i, j, x, x'$ with $i \neq j$, $x < x'$ and $l_j(x) = l_i(x')$: $l_j(x+1) \rightarrow l_i(x')$ [agent $a_i$ enters a local state $x'$ with a vertex $l_i(x')$ during plan execution only after agent $a_j$ has left a local state $x$ with vertex $l_j(x) = l_i(x')$ (and thus entered local state $x+1$) that precedes local state $x'$].

Property 1 of the partial order enforces that each agent visits its locations in the same order as in the MAPF-DP plan. Property 2 enforces that any two agents visit the same location in the same order as in the MAPF-DP plan. We can express the partial order with a directed graph whose vertices correspond to local states and whose edges correspond to the partial order given by the two properties above. Property 2 specifies the critical dependencies between agents. Edges are redundant and can be removed from the directed graph when they are implied by the other edges due to transitivity. A transitive reduction of the directed graph minimizes the number of remaining edges. It can be computed in polynomial time [Aho, Garey, and Ullman1972], is unique, contains all edges between local states of the same agent (since they are never redundant) and thus minimizes the number of edges between the local states of different agents.
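The construction of the directed graph and its transitive reduction can be sketched as follows. The sketch assumes a valid MAPF-DP plan (so the graph is acyclic), represents a local state as a pair `(agent, time index)`, and uses a simple reachability test per edge rather than the algorithm of the cited work; all function names are illustrative.

```python
from itertools import product

def partial_order_edges(paths):
    """Edges (u, v) between local states: u must be entered before v."""
    edges = set()
    for i, path in enumerate(paths):
        for x in range(len(path) - 1):
            edges.add(((i, x), (i, x + 1)))               # Property 1
    for (i, pi), (j, pj) in product(enumerate(paths), repeat=2):
        if i == j:
            continue
        for x in range(len(pj) - 1):                      # agent j leaves state x
            for xp in range(x + 1, len(pi)):
                if pj[x] == pi[xp]:
                    edges.add(((j, x + 1), (i, xp)))      # Property 2
    return edges

def transitive_reduction(edges):
    """Drop every edge u -> v for which another path from u to v exists (DAG only)."""
    succ = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)

    def reachable_without(u, v):
        # Depth-first search from u that skips the direct edge u -> v.
        stack = [w for w in succ.get(u, ()) if w != v]
        seen = set(stack)
        while stack:
            n = stack.pop()
            if n == v:
                return True
            for w in succ.get(n, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False

    return {(u, v) for u, v in edges if not reachable_without(u, v)}
```

In a directed acyclic graph the set of edges that are not implied by any longer path is exactly the unique transitive reduction, which is why the per-edge test suffices here.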

MCPs can be implemented easily if each agent $a_j$ sends a message to each other agent $a_i$ when agent $a_j$ enters a new local state $x$ (= $x+1$ in Property 2) if and only if the transitive reduction contains an edge $l_j(x) \rightarrow l_i(x')$ for some local state $x'$ (= $x'$ in Property 2) of agent $a_i$. Since the transitive reduction minimizes the number of edges between the local states of different agents, it also minimizes the number of sent messages. An agent $a_i$ can implement its MCP simply by counting how many messages it has received from each other agent and providing a GO command to itself in local state $x$ if and only if it has not yet entered its last local state and has received a number of messages over the course of plan execution from each other agent $a_j$ that corresponds to the number of incoming edges from local states of agent $a_j$ to its local states $0, 1, \ldots, x+1$.
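A small simulation can illustrate how an MCP prevents collisions under delays. The sketch below makes two simplifying assumptions that are not in the text: it checks the local states of other agents directly instead of counting messages, and it uses all Property 2 edges rather than only those of the transitive reduction (which yields the same GO/STOP decisions but not the same message count). All names are hypothetical.

```python
import random

def cross_agent_edges(paths):
    """Map each local state (i, x') to the states (j, y) that must be entered first."""
    incoming = {}
    for i, pi in enumerate(paths):
        for j, pj in enumerate(paths):
            if i == j:
                continue
            for x in range(len(pj) - 1):
                for xp in range(x + 1, len(pi)):
                    if pj[x] == pi[xp]:
                        incoming.setdefault((i, xp), []).append((j, x + 1))
    return incoming

def mcp_command(i, states, paths, incoming):
    x = states[i]
    if x == len(paths[i]) - 1:
        return "STOP"
    # GO only if every prerequisite of the next local state has been entered.
    ok = all(states[j] >= y for j, y in incoming.get((i, x + 1), []))
    return "GO" if ok else "STOP"

def execute(paths, delay_probs, seed=0, max_steps=10_000):
    """Simulate plan execution and return the makespan; assert on any collision."""
    rng = random.Random(seed)
    incoming = cross_agent_edges(paths)
    states = [0] * len(paths)
    for t in range(max_steps):
        if all(s == len(p) - 1 for s, p in zip(states, paths)):
            return t
        new_states = list(states)
        for i in range(len(paths)):
            if mcp_command(i, states, paths, incoming) != "GO":
                continue
            is_wait = paths[i][states[i]] == paths[i][states[i] + 1]
            if is_wait or rng.random() >= delay_probs[i]:
                new_states[i] += 1
        old = [p[s] for p, s in zip(paths, states)]
        new = [p[s] for p, s in zip(paths, new_states)]
        assert len(set(new)) == len(new), "vertex collision"
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                swapped = old[i] == new[j] and old[j] == new[i] and old[i] != new[i]
                assert not swapped, "edge collision"
        states = new_states
    raise RuntimeError("step limit reached")
```

Running `execute` with different seeds on a valid MAPF-DP plan gives one sampled makespan per run, so averaging over many runs estimates the average makespan of the plan under this policy.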


Figure 2: A directed graph that specifies a partial order on the local states for the MAPF-DP instance from Figure 1 and its valid MAPF-DP plan and .
Figure 3: The transitive reduction for Figure 2.

Figure 2 shows a sample partial order on the local states for the MAPF-DP instance from Figure 1 and its valid MAPF-DP plan and . , for example, is implied by and can thus be removed from the directed graph. Figure 3 shows the resulting transitive reduction, which implies that agent has to wait in local state until it has received one message from agent during the course of plan execution but can then proceed through all future local states without waiting.

Properties of FSPs and MCPs

Neither FSPs nor MCPs result in deadlocks during the plan execution of valid MAPF-DP plans because there always exists at least one agent that is provided a GO command before all agents have entered their last local states (namely an agent with the smallest local state among all agents that have not yet entered their last local states since an agent can wait only for other agents with smaller local states).

Both FSPs and MCPs are robust plan-execution policies due to Properties 1 and 2 of valid MAPF-DP plans. We now provide a proof sketch for the robustness of MCPs.

First, consider a valid MAPF-DP plan and assume that $l_i(x) = l_j(x')$ for two agents $a_i$ and $a_j$ with $i \neq j$. Then, 1) $x \neq x'$ since $l_i(x) \neq l_j(x)$ according to Property 1 of Definition 1 and 2) $x \neq x' + 1$ since $l_i(x'+1) \neq l_j(x')$ according to Property 2 of Definition 1 (State Property). By symmetry, $x' \neq x + 1$ holds as well.

Second, we show by contradiction that no vertex collisions can occur during plan execution. Assume that a vertex collision occurs between agents $a_i$ and $a_j$ with $i \neq j$ when agent $a_i$ is in local state $x$ and agent $a_j$ is in local state $x'$. Assume without loss of generality that $x \leq x'$. Then, $l_i(x+1) \rightarrow l_j(x')$ according to Property 2 of the partial order since $l_i(x) = l_j(x')$ according to our vertex collision assumption and $x < x'$ according to the State Property. Thus, agent $a_j$ can leave local state $x' - 1$ only when agent $a_i$ reaches local state $x + 1$, which is a contradiction with the vertex collision assumption.

Third, we show by contradiction that no edge collisions can occur during plan execution. Assume that an edge collision between agents $a_i$ and $a_j$ with $i \neq j$ occurs when agent $a_i$ changes its local state from $x$ to $x+1$ and agent $a_j$ changes its local state from $x'$ to $x'+1$, that is, $l_i(x) = l_j(x'+1)$ and $l_i(x+1) = l_j(x')$. Assume without loss of generality that $x \leq x'$. Case 1) If $x = x'$, then $l_j(x+1) = l_i(x)$, which is a contradiction with the State Property. Case 2) If $x < x'$, then $l_i(x+1) \rightarrow l_j(x'+1)$ according to Property 2 of the partial order since $l_i(x) = l_j(x'+1)$ according to our edge collision assumption and $x < x'+1$ according to the case assumption. Thus, agent $a_j$ can leave local state $x'$ only when agent $a_i$ reaches local state $x+1$, which is a contradiction with the edge collision assumption.

Approximate Minimization in Expectation

MCPs are robust plan-execution policies for valid MAPF-DP plans that do not stop agents unnecessarily and result in few sent messages. We present a MAPF-DP solver, called Approximate Minimization in Expectation (AME), that determines valid MAPF-DP plans so that their combination with MCPs results in small average makespans.

AME is a 2-level MAPF-DP solver that is based on Conflict-Based Search (CBS) [Sharon et al.2015]. Its high-level search imposes constraints on the low-level search that resolve violations of Properties 1 and 2 of Definition 1 (called conflicts). Its low-level search plans paths for single agents that obey these constraints and result in small average makespans. The average makespan of a MAPF-DP plan is the expectation of the maximum of (one or more) random variables that represent the time steps when all agents enter their last local states. Moreover, the average time step when an agent enters a local state is the expectation of the maximum of random variables as well. It is often difficult to obtain good closed-form approximations of the expectation of the maximum of random variables. AME thus approximates it with the maximum over the expectations of the random variables, which typically results in an underestimate but, according to our experimental results, a close approximation. The approximate average time step $\tilde{t}_i(x)$ when agent $a_i$ enters a local state $x$ for a given MAPF-DP plan is 0 for $x = 0$ and

$$\tilde{t}_i(x+1) = \max\Big(\tilde{t}_i(x),\ \max_{j, x' :\, l_j(x') \rightarrow l_i(x+1)} \tilde{t}_j(x')\Big) + \hat{t}_i(x) \qquad (1)$$

otherwise since agent $a_i$ first enters local state $x$ at approximate average time step $\tilde{t}_i(x)$, then might have to wait for messages from other agents $a_j$ that they send when they enter their local states $x'$ at approximate average time steps $\tilde{t}_j(x')$ and finally has to successfully execute one action (perhaps repeatedly) to enter local state $x+1$. The average number of time steps $\hat{t}_i(x)$ that it needs for the successful execution of the action is 1 (for a wait action) if $l_i(x) = l_i(x+1)$ and $1/(1-p_i)$ (for a move action) otherwise. The approximate average makespan of the given MAPF-DP plan is then $\max_i \tilde{t}_i(X_i)$, where $X_i$ is the last local state of agent $a_i$, since all agents need to enter their last local states. One might be able to obtain better approximations with more runtime-intensive importance sampling or dynamic programming methods but the runtime of the resulting AME variant would be large since it needs to compute many such approximations.
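The recursion of Equation (1) can be evaluated by sweeping over the time indices in increasing order. The sketch below assumes a valid MAPF-DP plan (so every dependency points to a strictly smaller time index) and uses all Property 2 dependencies instead of the transitive reduction; since the labels never decrease along an edge, redundant edges do not change the maximum. The function names are illustrative.

```python
def approximate_times(paths, delay_probs):
    """Labels t[i][x]: approximate average time step when agent i enters state x."""
    n = len(paths)
    t = [[0.0] * len(p) for p in paths]
    horizon = max(len(p) for p in paths)
    for x1 in range(1, horizon):                  # x1 plays the role of x + 1
        for i in range(n):
            if x1 >= len(paths[i]):
                continue
            x = x1 - 1
            ready = t[i][x]
            # Wait for every other agent j that occupies the target vertex at an
            # earlier index xj until it has entered its next local state xj + 1.
            for j in range(n):
                if j == i:
                    continue
                for xj in range(min(x, len(paths[j]) - 1)):
                    if paths[j][xj] == paths[i][x1]:
                        ready = max(ready, t[j][xj + 1])
            is_wait = paths[i][x] == paths[i][x1]
            t[i][x1] = ready + (1.0 if is_wait else 1.0 / (1.0 - delay_probs[i]))
    return t

def approximate_average_makespan(paths, delay_probs):
    return max(labels[-1] for labels in approximate_times(paths, delay_probs))
```

In a plan where a delay-free agent has to pass through vertices that a slower agent (delay probability 0.5, hence 2 expected time steps per move) vacates first, the labels of the fast agent are pushed back by the labels of the slow one, which is the waiting effect the inner maximum of Equation (1) captures.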

 1  Root.constraints ← ∅;
 2  Root.plan ← ∅;
 3  for each agent a_i do
 4      if LowLevelSearch(a_i, Root, 0) returns no path (nor its labels) then
 5          return "No solution exists";
 6      Add the returned path (and its labels) to Root.plan;
 7  Root.key ← ApproximateAverageMakespan(Root.plan);
 8  Priorityqueue ← {Root};
 9  while Priorityqueue ≠ ∅ do
10      N ← Priorityqueue.pop();
11      if FindConflicts(N.plan) returns no conflicts then
12          return "Solution is" N.plan;
13      Conflict ← earliest returned conflict;
14      for each agent a_i involved in Conflict do
15          N' ← new node with parent node N;
16          N'.constraints ← N.constraints;
17          N'.plan ← N.plan;
18          Add one new constraint for agent a_i to N'.constraints (see main text);
19          if LowLevelSearch(a_i, N', N.key) returns a path (and its labels) then
20              Replace the path (and its labels) of agent a_i in N'.plan with the returned path (and its labels);
21              N'.key ← ApproximateAverageMakespan(N'.plan);
22              Priorityqueue.insert(N');
23  return "No solution exists";
Algorithm 1: High-Level Search of AME.

High-Level Search

Algorithm 1 shows the high-level search of AME, which is similar to the high-level search of CBS. In the following, we point out the differences. Each high-level node $N$ contains the following items:

  1. A set of constraints N.constraints of the form $(a_i, l, x)$ that states that the vertex of agent $a_i$ in local state $x$ has to be different from vertex $l$.

  2. A (labeled) MAPF-DP plan N.plan that contains a path for each agent (that obeys the constraints N.constraints) and an approximation $\tilde{t}_i(x)$ (called label) of each average time step when agent $a_i$ enters local state $x$ during plan execution with MCPs.

  3. The key N.key of high-level node $N$ that encodes its priority (smaller keys have higher priority) and is equal to the approximate average makespan of MAPF-DP plan N.plan given by ApproximateAverageMakespan(N.plan).

When a conflict exists in MAPF-DP plan N.plan, then the high-level search creates 2 child nodes $N'$ of node $N$ [Line 15] whose constraints are initially set to the constraints N.constraints [Line 16] and whose MAPF-DP plan is initially set to MAPF-DP plan N.plan [Line 17]. Assume that the earliest conflict is a violation of Property 1 in Definition 1, in which case the vertices of two agents $a_i$ and $a_j$ in a local state $x$ are both identical to a vertex $l$. In this case, AME adds the constraint $(a_i, l, x)$ to the constraints of the first child node and the constraint $(a_j, l, x)$ to the constraints of the second child node [Line 18], thus preventing the conflict in both cases. Assume that the earliest conflict is a violation of Property 2 in Definition 1, in which case the vertex of an agent $a_i$ in a local state $x+1$ and the vertex of some other agent $a_j$ in the immediately preceding local state $x$ are both identical to a vertex $l$. In this case, AME adds the constraint $(a_i, l, x+1)$ to the constraints of the first child node and the constraint $(a_j, l, x)$ to the constraints of the second child node [Line 18], thus preventing the conflict in both cases.
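The branching rule can be sketched as a function that turns one conflict into the two constraints of the child nodes. The tuple layout `(agent, vertex, local_state)` and the names `child_constraints` and `obeys` are assumptions made for illustration; the second helper also assumes that an agent rests in its last vertex after its path ends.

```python
def child_constraints(prop, i, j, x, vertex):
    """Constraints (agent, vertex, local_state) for the first and second child node."""
    if prop == 1:
        # Agents i and j are both scheduled to be in `vertex` in local state x.
        return [(i, vertex, x), (j, vertex, x)]
    if prop == 2:
        # Agent i is in `vertex` in state x + 1 while agent j is there in state x.
        return [(i, vertex, x + 1), (j, vertex, x)]
    raise ValueError("prop must be 1 or 2")

def obeys(path, agent, constraints):
    """True if `path` of `agent` avoids every constrained (vertex, local_state) pair."""
    def at(k):
        return path[min(k, len(path) - 1)]
    return all(at(k) != v for a, v, k in constraints if a == agent)
```

Each child receives exactly one of the two constraints, so the conflicting pair of local states cannot reappear in either subtree while the other agent's path stays unrestricted.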

Low-Level Search

LowLevelSearch($a_i$, $N$, key) finds a new path for agent $a_i$ and the labels of this path. It uses the paths of the other agents and their labels in N.plan but does not update them. (The paths are empty directly after the execution of Line 2.) It performs a focal search with re-expansions in a state space whose states correspond to pairs $(l, x)$ of vertices and local states (except for those pairs ruled out by constraints in N.constraints that pertain to agent $a_i$) and whose edges connect state $(l, x)$ to state $(l', x+1)$ if and only if $l = l'$ (for a wait action) or $(l, l') \in E$ (for a move action). The g-value of a state $(l, x)$ approximates (sic!) the approximate average time step $\tilde{t}_i(x)$. The start state is $(s_i, 0)$ and its g-value is 0. When the low-level search expands state $(l, x)$, it sets the g-value of its successor $(l', x+1)$ according to Equation (1) to the minimum of its current g-value and

$$\max\Big(g(l, x),\ \max_{j, x' :\, j \neq i,\ x' < x+1,\ l_j(x') = l'} \tilde{t}_j(x'+1)\Big) + \hat{t}_i(x),$$

where $\hat{t}_i(x)$ is 1 if $l = l'$ and $1/(1-p_i)$ otherwise. The low-level search decides which state to expand next based on 1) the f-value of the state, which is the sum of its g-value and its h-value, where the h-value is $1/(1-p_i)$ times the distance from location $l$ to location $g_i$ in graph $G$ (which is an optimistic estimate of the average number of time steps required to move from location $l$ to location $g_i$) and 2) the number of conflicts of the path for agent $a_i$ that corresponds to the locations in the states on the found path from the start state to $(l, x)$ with the paths of other agents.

The low-level search starts in Phase 1. The objective in this phase is to find a path for agent $a_i$ so that it enters its last local state with a reasonably small approximate average number of time steps, namely one that is no larger than the approximate average makespan key of the MAPF-DP plan in the parent node of node $N$ in the high-level search, and has a small number of conflicts. The first part of the objective tries to ensure that the approximate average makespan of the resulting MAPF-DP plan in node $N$ is no larger than the one of the MAPF-DP plan in the parent node of node $N$, and the second part tries to ensure that the resulting MAPF-DP plan has a small number of conflicts so that the high-level search has a small runtime since it needs to resolve only a small number of conflicts. The low-level search thus repeatedly expands a state with the smallest number of conflicts among all states in the priority queue whose f-values are no larger than key.

If no such state exists, then the low-level search switches to Phase 2. The objective in this phase is to find a path for agent $a_i$ so that it enters its last local state with a small approximate average number of time steps. This objective tries to ensure that the approximate average makespan of the resulting MAPF-DP plan in node $N$ is not much larger than the one of the MAPF-DP plan in the parent node of node $N$. The low-level search thus repeatedly expands a state with the smallest f-value among all states in the priority queue.
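The choice of the next state to expand in the two phases can be sketched as follows. This is a simplified illustration under assumptions not stated in the text: open states are plain dictionaries with the hypothetical fields `f` and `conflicts`, ties in Phase 1 are broken toward smaller f-values, and the phase is re-evaluated on every call rather than stored.

```python
def select_state(open_states, key):
    """Return (phase, state) for the next expansion of the low-level search."""
    # Phase 1: among states with f <= key, prefer the fewest conflicts.
    focal = [s for s in open_states if s["f"] <= key]
    if focal:
        return 1, min(focal, key=lambda s: (s["conflicts"], s["f"]))
    # Phase 2: no state is within the bound, so fall back to the smallest f-value.
    return 2, min(open_states, key=lambda s: s["f"])
```

A linear scan is used only to keep the sketch short; a practical focal search would typically maintain two priority queues instead.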

The low-level search terminates successfully when it is about to expand a state $(l, x)$ with $l = g_i$ and N.constraints contains no constraints of the form $(a_i, g_i, x')$ with $x' > x$. It then sets $X_i = x$, the locations $l_i(0), \ldots, l_i(X_i)$ that form the path of agent $a_i$ to the corresponding locations in the states on the found path from the start state to $(l, x)$ and the approximate average time steps $\tilde{t}_i(0), \ldots, \tilde{t}_i(X_i)$ to the corresponding g-values of these states. The low-level search terminates unsuccessfully when the priority queue becomes empty. The low-level search currently does not terminate otherwise but we might be able to make it complete by using an upper bound on the smallest average makespan of any valid MAPF-DP plan, similar to upper bounds in the context of valid MAPF plans [Kornhauser, Miller, and Spirakis1984].

Future Work

The low-level search is currently the weakest part of AME because it relies on many approximations to keep its runtime small, which is important since the high-level search runs many low-level searches. We expect that future work will be able to improve the low-level search substantially. For example, the approximate average time steps for agents different from agent $a_i$ could be updated before, during or after the low-level search, which would provide more accurate values for the current and future low-level searches as well as the current high-level search. Once the low-level search finds a path for agent $a_i$ and the high-level search replaces the path of agent $a_i$ in the MAPF-DP plan in the current high-level node with this path, it could update the approximate average time steps of all agents to the ideal approximate average time steps given by Equation (1), for example as part of the execution of ApproximateAverageMakespan on Lines 7 and 21. Many other improvements are possible as well.

Experiments

We evaluate AME with MCPs on a 2.50 GHz Intel Core i5-2450M PC with 6 GB RAM.

Figure 4: Two MAPF-DP instances: random 1 (top) and warehouse 1 (bottom). Blocked cells are shown in black. The start and goal cells for each agent are represented by a solid circle and a hollow circle of the same color, respectively.

Experiment 1: MAPF Solvers

              AME                                                              Push and Swap                              Adapted CBS
id            runtime (s)  approx. average makespan  average makespan  messages  runtime (s)  average makespan  messages  runtime (s)  average makespan  messages
random 1      0.058        63.15                     71.28 ± 0.34      267       0.031        812.41 ± 0.40     287       -            -                 -
random 2      0.052        66.22                     73.02 ± 0.29      257       0.025        768.30 ± 0.43     257       -            -                 -
random 3      0.080        78.44                     84.90 ± 0.40      373       0.052        934.59 ± 0.33     387       -            -                 -
random 4      0.063        67.00                     72.89 ± 0.37      251       0.028        755.95 ± 0.33     255       -            -                 -
random 5      0.050        65.13                     73.98 ± 0.31      255       0.029        875.48 ± 0.47     318       282.079      84.11 ± 0.40      282
random 6      0.052        62.89                     66.98 ± 0.36      257       0.031        830.77 ± 0.32     290       -            -                 -
random 7      0.495        67.22                     71.34 ± 0.36      269       0.038        785.55 ± 0.46     274       -            -                 -
random 8      0.042        49.33                     51.72 ± 0.35      164       0.024        648.80 ± 0.35     199       197.911      52.35 ± 0.37      163
random 9      0.051        56.27                     61.30 ± 0.27      247       0.052        780.60 ± 0.30     294       -            -                 -
random 10     0.487        60.06                     64.77 ± 0.38      234       0.032        750.12 ± 0.35     284       -            -                 -
warehouse 1   0.124        114.32                    124.18 ± 0.44     705       0.055        1,399.14 ± 0.43   703       -            -                 -
warehouse 2   0.106        119.74                    124.63 ± 0.51     762       0.055        1,620.03 ± 0.60   810       -            -                 -
warehouse 3   0.107        112.96                    117.00 ± 0.53     609       0.032        1,295.75 ± 0.53   616       -            -                 -
warehouse 4   0.090        114.90                    117.31 ± 0.52     541       0.043        1,246.47 ± 0.67   571       -            -                 -
warehouse 5   -            -                         -                 -         0.060        1,453.36 ± 0.54   783       -            -                 -
warehouse 6   0.111        127.65                    131.10 ± 0.59     710       0.037        1,437.01 ± 0.58   664       -            -                 -
warehouse 7   0.142        87.45                     96.54 ± 0.34      488       0.028        1,154.21 ± 0.60   403       -            -                 -
warehouse 8   -            -                         -                 -         0.024        1,233.13 ± 0.58   401       -            -                 -
warehouse 9   0.087        103.51                    107.33 ± 0.42     462       0.024        1,088.53 ± 0.44   422       -            -                 -
warehouse 10  0.183        120.76                    127.36 ± 0.53     909       0.057        1,541.56 ± 0.62   678       -            -                 -
Table 1: Results of different MAPF(-DP) solvers for MAPF-DP instances with 35 agents and delay probability range .

We compare AME to 2 MAPF solvers, namely 1) Adapted CBS, a CBS variant that assumes perfect plan execution and computes valid MAPF-DP plans, minimizes and breaks ties toward paths with smaller and thus fewer actions and 2) Push and Swap [Luna and Bekris2011], a MAPF solver that assumes perfect plan execution and computes valid MAPF-DP plans where exactly one agent executes a move action at each time step and all other agents execute wait actions. We generate 10 MAPF-DP instances (labeled random 1-10) in 30×30 4-neighbor grids with 10% randomly blocked cells and random but unique start and unique goal cells for 35 agents whose delay probabilities for AME are sampled uniformly at random from the delay probability range . In the same way, we generate 10 MAPF-DP instances (labeled warehouse 1-10) in a simulated warehouse environment with random but unique start and unique goal cells on the left and right sides. Figure 4 shows two MAPF-DP instances: random 1 (top) and warehouse 1 (bottom).

Table 1 reports for each MAPF-DP instance the runtime, the approximate average makespan calculated by AME, the average makespan over 1,000 plan-execution runs with MCPs together with 95%-confidence intervals and the number of sent messages. Dashes indicate that the MAPF-DP instance was not solved within a runtime limit of 5 minutes. There is no obvious difference in the numbers of sent messages of the 3 MAPF(-DP) solvers. However, AME seems to find MAPF-DP plans with smaller average makespans than Adapted CBS, which seems to find MAPF-DP plans with smaller average makespans than Push and Swap. The approximate average makespans calculated by AME are underestimates but reasonably close to the average makespans. AME and Push and Swap seem to run faster than Adapted CBS. In fact, Adapted CBS did not solve MAPF-DP instances with more than 35 agents within the runtime limit while AME and Push and Swap seem to scale to larger numbers of agents than reported here (see also Experiment 3).
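The average makespans and their 95%-confidence intervals can be computed from the makespans of the individual plan-execution runs. The sketch below uses the normal approximation (mean plus or minus 1.96 standard errors). That choice is an assumption, since the text does not state how the intervals were constructed, and the function name and sample data are hypothetical.

```python
import math
import statistics

def mean_with_ci95(samples):
    """Return (mean, half_width) of a normal-approximation 95% confidence interval."""
    n = len(samples)
    mean = statistics.fmean(samples)
    # Standard error of the mean from the sample standard deviation.
    std_err = statistics.stdev(samples) / math.sqrt(n)
    return mean, 1.96 * std_err

# Hypothetical makespans from a handful of plan-execution runs.
makespans = [70, 72, 71, 69, 73, 71, 70, 72]
mean, half_width = mean_with_ci95(makespans)
print(f"{mean:.2f} ± {half_width:.2f}")
```

With 1,000 runs per instance, the normal approximation is usually adequate, and the half-width shrinks with the square root of the number of runs.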

Experiment 2: Delay Probability Ranges

We use AME with different delay probability ranges. We repeat Experiment 1 with 19 MAPF-DP instances generated from the MAPF-DP instance labeled “random 1” in Experiment 1, one for each . For each MAPF-DP instance, the delay probabilities of all agents are sampled from the delay probability range by sampling the average number of time steps needed for the successful execution of single move actions uniformly at random from and then calculating .
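The sampling procedure can be sketched as follows. The conversion from an average number of time steps t to a delay probability p uses p = 1 - 1/t, which is an assumption here: it holds if each attempt to execute a move action succeeds independently with probability 1 - p, so that the number of time steps until success is geometrically distributed with mean 1/(1 - p). The names `sample_delay_probabilities` and `t_max`, and the sampling interval from 1 to `t_max`, are illustrative.

```python
import random

def sample_delay_probabilities(num_agents, t_max, seed=0):
    """Sample one delay probability per agent.

    Assumption: t is drawn uniformly from [1, t_max] and converted with
    p = 1 - 1/t (the mean of a geometric distribution is 1 / (1 - p)).
    """
    rng = random.Random(seed)
    probabilities = []
    for _ in range(num_agents):
        t = rng.uniform(1.0, t_max)  # average time steps per successful move
        probabilities.append(1.0 - 1.0 / t)
    return probabilities

probs = sample_delay_probabilities(num_agents=35, t_max=2)
print(min(probs), max(probs))
```

Sampling t rather than p directly spreads the instances evenly over the expected execution time of a move action, which grows quickly as p approaches 1.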

| | runtime (s) | approximate average makespan | average makespan | messages |
|---|---|---|---|---|
| 2 | 0.073 | 77.92 | 84.30 ± 0.42 | 251 |
| 3 | 0.525 | 123.92 | 131.12 ± 0.79 | 301 |
| 4 | 0.356 | 144.61 | 157.88 ± 0.96 | 287 |
| 5 | 0.311 | 133.55 | 157.00 ± 0.98 | 278 |
| 6 | 0.623 | 168.51 | 192.76 ± 1.46 | 299 |
| 7 | 0.346 | 264.78 | 279.51 ± 2.05 | 289 |
| 8 | 0.236 | 333.09 | 349.72 ± 2.69 | 293 |
| 9 | 0.779 | 260.58 | 271.71 ± 2.28 | 294 |
| 10 | 1.751 | 307.63 | 336.95 ± 2.26 | 305 |
| 11 | 2.528 | 337.15 | 375.46 ± 2.74 | 312 |
| 12 | 1.374 | 323.87 | 383.25 ± 2.53 | 300 |
| 13 | 0.683 | 381.63 | 413.18 ± 3.19 | 282 |
| 14 | 2.583 | 440.94 | 498.30 ± 3.32 | 278 |
| 15 | 1.414 | 470.06 | 524.94 ± 3.95 | 295 |
| 16 | 7.072 | 554.32 | 607.20 ± 4.26 | 316 |
| 17 | 2.116 | 451.32 | 570.15 ± 3.90 | 275 |
| 18 | 3.410 | 763.44 | 782.40 ± 6.08 | 306 |
| 19 | 5.708 | 462.71 | 666.42 ± 5.29 | 309 |
| 20 | 7.812 | 490.26 | 591.35 ± 3.73 | 323 |

Table 2: Results of AME for MAPF-DP instances with 35 agents on a 30×30 4-neighbor grid with 10% randomly blocked cells and different delay probability ranges .
Figure 5: Visualization of Table 2, where the x-axis shows the average number of time steps needed for the successful execution of single move actions. The average makespans are shown in red, and the approximate average makespans calculated by AME are shown in blue. The grey line corresponds to .

Table 2 reports the same measures as used in Experiment 1, and Figure 5 visualizes the results. Larger delay probability ranges seem to result in larger runtimes, larger approximate average makespans calculated by AME and larger average makespans, although the results are noisy. The differences between the approximate average makespans calculated by AME and the average makespans grow as well but remain reasonable.

Experiment 3: Numbers of Agents

We use AME with different numbers of agents. We repeat Experiment 1 with 50 MAPF-DP instances in 30×30 4-neighbor grids generated as in Experiment 1 for each number of agents.

| agents | solved (fraction) | runtime (s) | approximate average makespan | average makespan | messages |
|---|---|---|---|---|---|
| 50 | 0.94 | 0.166 | 69.32 | 75.19 | 474.62 |
| 100 | 0.68 | 4.668 | 78.48 | 87.29 | 1,554.71 |
| 150 | 0.10 | 134.155 | 81.77 | 96.43 | 2,940.40 |
| 200 | 0 | - | - | - | - |

Table 3: Results of AME for MAPF-DP instances with different numbers of agents on 30×30 4-neighbor grids with 10% randomly blocked cells and delay probability range .

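Table 3 reports the same measures as used in Experiment 1, averaged over all MAPF-DP instances that were solved within a runtime limit of 5 minutes. AME solves 94% of the MAPF-DP instances with 50 agents and then degrades gracefully with the number of agents: it solves 68% of the instances with 100 agents, 10% of the instances with 150 agents and none of the instances with 200 agents.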

Experiment 4: Plan-Execution Policies

| id | MCPs: average makespan | MCPs: messages | FSPs: average makespan | FSPs: messages | Dummy plan-execution policies: average makespan | Dummy plan-execution policies: average collisions |
|---|---|---|---|---|---|---|
| random 1 | 71.28 ± 0.34 | 267 | 140.29 ± 0.50 | 23,109 | 67.82 ± 0.35 | 16.68 |
| random 2 | 73.02 ± 0.29 | 257 | 143.55 ± 0.55 | 19,316 | 71.96 ± 0.31 | 14.27 |
| random 3 | 84.90 ± 0.40 | 373 | 160.43 ± 0.59 | 24,098 | 81.20 ± 0.37 | 27.71 |
| random 4 | 72.89 ± 0.37 | 251 | 141.71 ± 0.52 | 19,587 | 69.16 ± 0.36 | 25.38 |
| random 5 | 73.98 ± 0.31 | 255 | 141.49 ± 0.54 | 20,794 | 69.59 ± 0.32 | 14.98 |
| random 6 | 66.98 ± 0.36 | 257 | 115.98 ± 0.51 | 20,597 | 66.76 ± 0.37 | 15.19 |
| random 7 | 71.34 ± 0.36 | 269 | 124.03 ± 0.54 | 20,481 | 70.79 ± 0.38 | 16.53 |
| random 8 | 51.72 ± 0.35 | 164 | 96.04 ± 0.46 | 16,665 | 51.65 ± 0.38 | 8.81 |
| random 9 | 61.30 ± 0.27 | 247 | 113.76 ± 0.46 | 20,976 | 58.52 ± 0.23 | 10.33 |
| random 10 | 64.77 ± 0.38 | 234 | 114.04 ± 0.50 | 19,834 | 64.00 ± 0.38 | 17.51 |
| warehouse 1 | 124.18 ± 0.44 | 705 | 219.63 ± 0.65 | 28,794 | 122.42 ± 0.42 | 34.59 |
| warehouse 2 | 124.63 ± 0.51 | 762 | 235.35 ± 0.72 | 34,154 | 124.40 ± 0.60 | 68.68 |
| warehouse 3 | 117.00 ± 0.53 | 609 | 206.29 ± 0.65 | 26,647 | 117.89 ± 0.54 | 29.61 |
| warehouse 4 | 117.31 ± 0.52 | 541 | 194.07 ± 0.59 | 24,889 | 116.02 ± 0.53 | 28.09 |
| warehouse 6 | 131.10 ± 0.59 | 710 | 205.54 ± 0.71 | 29,462 | 131.54 ± 0.60 | 37.41 |
| warehouse 7 | 96.54 ± 0.34 | 488 | 187.90 ± 0.59 | 22,401 | 95.80 ± 0.35 | 24.91 |
| warehouse 9 | 107.33 ± 0.42 | 462 | 187.80 ± 0.56 | 18,950 | 105.63 ± 0.45 | 22.21 |
| warehouse 10 | 127.36 ± 0.53 | 909 | 226.95 ± 0.73 | 32,903 | 127.59 ± 0.55 | 43.78 |

Table 4: Results of AME for the 18 solved MAPF-DP instances from Experiment 1 and different plan-execution policies.

We use AME with 3 plan-execution policies, namely 1) MCPs, 2) FSPs and 3) dummy (non-robust) plan-execution policies that always provide GO commands. We repeat Experiment 1 for each plan-execution policy.

Table 4 reports for each solved MAPF-DP instance and plan-execution policy the average makespan over 1,000 plan-execution runs together with 95%-confidence intervals, the number of sent messages for MCPs and FSPs and the average number of collisions for dummy plan-execution policies. The number of sent messages is zero (and thus not shown) for dummy plan-execution policies since, unlike MCPs and FSPs, they do not prevent collisions. The average makespan for MCPs seems to be only slightly larger than that for dummy plan-execution policies, and both the average makespan and the number of sent messages for MCPs seem to be smaller than those for FSPs.
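To illustrate why dummy plan-execution policies reach small makespans but incur collisions, the following sketch executes paths with GO commands at every time step. It is not the authors' simulator. The function name is hypothetical, paths are assumed to contain only move actions, each move is assumed to be delayed independently with the agent's delay probability, and only collisions where two agents occupy the same cell at the end of a time step are counted.

```python
import random
from itertools import combinations

def run_dummy_policy(paths, delay_probs, rng):
    """Execute paths with GO commands at every time step.

    Returns (makespan, collisions). Assumptions: paths hold only move
    actions, all delay probabilities are below 1, and a collision is two
    agents occupying the same cell at the end of a time step (other
    collision types are not counted).
    """
    index = [0] * len(paths)  # position of each agent along its path
    makespan = collisions = 0
    while any(i < len(p) - 1 for i, p in zip(index, paths)):
        makespan += 1
        for a, path in enumerate(paths):
            # A move succeeds with probability 1 - delay_probs[a].
            if index[a] < len(path) - 1 and rng.random() >= delay_probs[a]:
                index[a] += 1
        cells = [path[i] for i, path in zip(index, paths)]
        collisions += sum(1 for x, y in combinations(cells, 2) if x == y)
    return makespan, collisions

# Two hypothetical paths that cross at cell (1, 1).
paths = [[(1, 0), (1, 1), (1, 2)], [(0, 1), (1, 1), (2, 1)]]
rng = random.Random(0)
runs = [run_dummy_policy(paths, [0.3, 0.3], rng) for _ in range(1000)]
print(sum(m for m, _ in runs) / 1000, sum(c for _, c in runs) / 1000)
```

A robust plan-execution policy would withhold the GO command from an agent whenever its move could lead to a collision, which is what separates MCPs and FSPs from this dummy policy.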

Conclusions

In this paper, we formalized the Multi-Agent Path-Finding Problem with Delay Probabilities (MAPF-DP) to account for imperfect plan execution and then developed an efficient way of solving it with small average makespans, namely with Approximate Minimization in Expectation (a 2-level MAPF-DP solver for generating valid MAPF-DP plans) and Minimal Communication Policies (decentralized robust plan-execution policies for executing valid MAPF-DP plans without collisions).

References

  • [Aho, Garey, and Ullman1972] Aho, A. V.; Garey, M. R.; and Ullman, J. D. 1972. The transitive reduction of a directed graph. SIAM Journal on Computing 1(2):131–137.
  • [Becker et al.2004] Becker, R.; Zilberstein, S.; Lesser, V.; and Goldman, C. V. 2004. Solving transition independent decentralized Markov decision processes. Journal of Artificial Intelligence Research 22(1):423–455.
  • [Boutilier1996] Boutilier, C. 1996. Planning, learning and coordination in multiagent decision processes. In Conference on Theoretical Aspects of Rationality and Knowledge, 195–210.
  • [Boyarski et al.2015] Boyarski, E.; Felner, A.; Stern, R.; Sharon, G.; Tolpin, D.; Betzalel, O.; and Shimony, S. E. 2015. ICBS: Improved conflict-based search algorithm for multi-agent pathfinding. In International Joint Conference on Artificial Intelligence, 740–746.
  • [Cohen et al.2016] Cohen, L.; Uras, T.; Kumar, T. K. S.; Xu, H.; Ayanian, N.; and Koenig, S. 2016. Improved solvers for bounded-suboptimal multi-agent path finding. In International Joint Conference on Artificial Intelligence, 3067–3074.
  • [Goldenberg et al.2014] Goldenberg, M.; Felner, A.; Stern, R.; Sharon, G.; Sturtevant, N. R.; Holte, R. C.; and Schaeffer, J. 2014. Enhanced Partial Expansion A*. Journal of Artificial Intelligence Research 50:141–187.
  • [Goldman and Zilberstein2004] Goldman, C. V., and Zilberstein, S. 2004. Decentralized control of cooperative systems: Categorization and complexity analysis. Journal of Artificial Intelligence Research 22:143–174.
  • [Hönig et al.2016] Hönig, W.; Kumar, T. K. S.; Cohen, L.; Ma, H.; Xu, H.; Ayanian, N.; and Koenig, S. 2016. Multi-agent path finding with kinematic constraints. In International Conference on Automated Planning and Scheduling, 477–485.
  • [Kornhauser, Miller, and Spirakis1984] Kornhauser, D.; Miller, G.; and Spirakis, P. 1984. Coordinating pebble motion on graphs, the diameter of permutation groups, and applications. In Annual Symposium on Foundations of Computer Science, 241–250.
  • [Kurniawati, Hsu, and Lee2008] Kurniawati, H.; Hsu, D.; and Lee, W. S. 2008. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In Robotics: Science and Systems, 65–72.
  • [Liu and Michael2016] Liu, L., and Michael, N. 2016. An MDP-based approximation method for goal constrained multi-MAV planning under action uncertainty. In IEEE International Conference on Robotics and Automation, 56–62.
  • [Luna and Bekris2011] Luna, R., and Bekris, K. E. 2011. Push and Swap: Fast cooperative path-finding with completeness guarantees. In International Joint Conference on Artificial Intelligence, 294–300.
  • [Ma and Koenig2016] Ma, H., and Koenig, S. 2016. Optimal target assignment and path finding for teams of agents. In International Conference on Autonomous Agents and Multiagent Systems, 1144–1152.
  • [Ma and Pineau2015] Ma, H., and Pineau, J. 2015. Information gathering and reward exploitation of subgoals for POMDPs. In AAAI Conference on Artificial Intelligence, 3320–3326.
  • [Ma et al.2016] Ma, H.; Tovey, C.; Sharon, G.; Kumar, T. K. S.; and Koenig, S. 2016. Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI Conference on Artificial Intelligence, 3166–3173.
  • [Melo and Veloso2011] Melo, F. S., and Veloso, M. 2011. Decentralized MDPs with sparse interactions. Artificial Intelligence 175(11):1757–1789.
  • [Morris et al.2016] Morris, R.; Pasareanu, C.; Luckow, K.; Malik, W.; Ma, H.; Kumar, S.; and Koenig, S. 2016. Planning, scheduling and monitoring for airport surface operations. In AAAI-16 Workshop on Planning for Hybrid Systems, 608–614.
  • [Scharpff et al.2016] Scharpff, J.; Roijers, D. M.; Oliehoek, F. A.; Spaan, M. T. J.; and de Weerdt, M. M. 2016. Solving transition-independent multi-agent MDPs with sparse interactions. In AAAI Conference on Artificial Intelligence, 3174–3180.
  • [Sharon et al.2013] Sharon, G.; Stern, R.; Goldenberg, M.; and Felner, A. 2013. The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence 195:470–495.
  • [Sharon et al.2015] Sharon, G.; Stern, R.; Felner, A.; and Sturtevant, N. R. 2015. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219:40–66.
  • [Silver2005] Silver, D. 2005. Cooperative pathfinding. In Artificial Intelligence and Interactive Digital Entertainment, 117–122.
  • [Standley2010] Standley, T. S. 2010. Finding optimal solutions to cooperative pathfinding problems. In AAAI Conference on Artificial Intelligence, 173–178.
  • [Velagapudi et al.2011] Velagapudi, P.; Varakantham, P.; Sycara, K. P.; and Scerri, P. 2011. Distributed model shaping for scaling to decentralized POMDPs with hundreds of agents. In International Conference on Autonomous Agents and Multi-agent Systems, 955–962.
  • [Veloso et al.2015] Veloso, M.; Biswas, J.; Coltin, B.; and Rosenthal, S. 2015. CoBots: Robust symbiotic autonomous mobile service robots. In International Joint Conference on Artificial Intelligence, 4423–4429.
  • [Wagner and Choset2015] Wagner, G., and Choset, H. 2015. Subdimensional expansion for multirobot path planning. Artificial Intelligence 219:1–24.
  • [Wagner2015] Wagner, G. 2015. Subdimensional Expansion: A Framework for Computationally Tractable Multirobot Path Planning. Ph.D. Dissertation, Carnegie Mellon University.
  • [Wang and Botea2011] Wang, K., and Botea, A. 2011. MAPP: a scalable multi-agent path planning algorithm with tractability and completeness guarantees. Journal of Artificial Intelligence Research 42:55–90.
  • [Wurman, D’Andrea, and Mountz2008] Wurman, P. R.; D’Andrea, R.; and Mountz, M. 2008. Coordinating hundreds of cooperative, autonomous vehicles in warehouses. AI Magazine 29(1):9–20.