Leveraging Experience in Lifelong Multi-Agent Pathfinding

by   Nitzan Madar, et al.
Stanford University

In Lifelong Multi-Agent Path Finding (L-MAPF), a team of agents performs a stream of tasks consisting of multiple locations to be visited by the agents on a shared graph while avoiding collisions with one another. L-MAPF is typically tackled by partitioning it into multiple consecutive, and hence similar, "one-shot" MAPF queries, as in the Rolling-Horizon Collision Resolution (RHCR) algorithm. Therefore, a solution to one query informs the next query, which leads to similarity with respect to the agents' start and goal positions, and how collisions need to be resolved from one query to the next. Thus, experience from solving one MAPF query can potentially be used to speed up solving the next one. Despite this intuition, current L-MAPF planners solve consecutive MAPF queries from scratch. In this paper, we introduce a new RHCR-inspired approach called exRHCR, which exploits experience in its constituent MAPF queries. In particular, exRHCR employs an extension of Priority-Based Search (PBS), a state-of-the-art MAPF solver. The extension, which we call exPBS, warm-starts the search with the priorities between agents used by PBS in the previous MAPF instances. We demonstrate empirically that exRHCR solves L-MAPF instances up to 39% faster than RHCR, which can potentially increase throughput for given task streams by increasing the number of agents a planner can cope with for a given time budget.



1 Introduction and Related Work

(a) The exRHCR algorithm for L-MAPF.
(b) The exPBS algorithm for MAPF.
Figure 1: Visualization of our algorithmic framework. (a) To solve the L-MAPF problem, exRHCR solves a sequence of (bounded-horizon) MAPF queries, where a new query is solved once per replanning period. The task assigner (TA) generates the next query from the current query and the solution computed for it. In exRHCR, queries are grouped into batches, where the first query in each batch is solved using PBS and the remaining queries in the batch (as many as the experience lookahead) are solved using exPBS. This is in contrast to RHCR, which uses only PBS. The PBS priority set that was used to solve the first query of a batch is extracted and used to initialize exPBS in the remaining queries of that batch. This process repeats iteratively. (b) The internal MAPF solver exPBS leverages experience by using it in the root node of the priority tree (PT) to constrain and guide the search. exPBS uses WL-DFS, a DFS variant that limits the search tree width to avoid searching in over-constrained trees. When the width limit is violated or the root PT node contains an infeasible seed priority, the original PBS is used as a fallback.

Multi-Agent Path Finding (MAPF) is the problem of finding collision-free paths for a fleet of agents operating on a shared graph Stern et al. (2019). MAPF has been widely used in modeling a variety of applications including autonomous warehouse management Hönig et al. (2019), multi-robot motion planning Dayan et al. (2021), and multi-drone delivery Choudhury et al. (2021).

A significant body of work is devoted to the study of the one-shot version of MAPF wherein each agent needs to reach a specific goal location from a given start. It is usually desirable to find high-quality solutions, which minimize a given objective function such as the total travel time of all the agents. Finding optimal solutions to MAPF is typically computationally hard Yu (2016); Ma et al. (2016). Nevertheless, numerous approaches have been developed which strive to return optimal or bounded-suboptimal solutions, including SAT solvers Surynek et al. (2016), graph-theoretic approximation algorithms Yu (2016); Demaine et al. (2019), and search-based approaches Sharon et al. (2013); Wagner and Choset (2015); Li et al. (2019a).

One of the most popular approaches for MAPF is Conflict-Based Search (CBS) Sharon et al. (2015), which is a two-level optimal solver. The top level performs a best-first search on a binary constraint tree (CT), where a given tree node specifies temporal and spatial constraints between agents in order to avoid collisions. The bottom level searches for single-agent paths that abide by the constraints in the given CT node. The high-level search continues until a feasible (collision-free) plan is found. Due to the possibly long runtime of the approach Gordon et al. (2021), subsequent work introduced various suboptimal CBS extensions to improve runtime Barer et al. (2014); Li et al. (2021a). A recent approach called Priority-Based Search (PBS) trades the completeness guarantees of CBS for improved efficiency by exploring a binary priority tree (PT), whose nodes specify priorities between agents Ma et al. (2019a). Priorities can be viewed as a coarser and more computationally efficient alternative to CBS’s constraints (see more details in Section 3).

The efficiency of PBS makes it particularly suitable for solving large problem instances, or in cases where multiple MAPF instances need to be solved rapidly. For example, PBS has recently been used as a building block within an approach for tackling the Lifelong MAPF (L-MAPF) problem Ma et al. (2017, 2019b); Liu et al. (2019); Salzman and Stern (2020). In L-MAPF the agents need to execute a stream of tasks consisting of multiple locations to be visited by each agent (rather than moving to a specific goal location as in the one-shot version). A recent work Li et al. (2021b) proposed an effective approach called Rolling-Horizon Collision Resolution (RHCR) to solve L-MAPF, by breaking the L-MAPF problem into a sequence of one-shot MAPF problems (see Section 3). Clearly, by improving the efficiency of the internal MAPF solver (e.g., PBS) we can improve the efficiency of RHCR overall, as it solves multiple MAPF queries.

In this work we explore the use of experience gained from solving previous MAPF queries to speed up the solution of the entire L-MAPF problem. Using experience has been considered in various domains, and properly using experience to improve MAPF solvers has been identified as a key challenge and opportunity Salzman and Stern (2020). For instance, in robot motion planning previous solutions can be reused for a new query Coleman et al. (2015). Similar ideas were employed within CBS-based approaches to incentivize agents to traverse certain regions, e.g., corridors, in a specific manner to resolve conflicts Cohen and Koenig (2016); Li et al. (2019b, 2020). Learning-based approaches have been used to implicitly encode experience for conflict resolution in CBS Huang et al. (2021) and MAPF algorithm selection Kaduri et al. (2020). With that said, we are not aware of systematic approaches that exploit experience within MAPF queries in L-MAPF.


In this paper we develop the Experienced RHCR (exRHCR) approach for L-MAPF, which transfers experience gained from solving one MAPF query to the next one in order to improve planning times (Figure 1(a) and Section 4). In particular, exRHCR solves constituent MAPF queries by interleaving calls to the vanilla PBS and an extension of PBS which we call Experienced PBS (exPBS) (Figure 1(b)). Unlike the standard PBS, which begins the high-level search on a given MAPF instance from an empty priority set at the PT root, exPBS is initialized with a specific priority set for the root node that encodes the experience gained from the solution of a previous MAPF query using PBS. This can reduce the depth of the search tree and speed up the solution of MAPF queries. We also consider a lightweight version of exPBS that uses total priorities as experience, which leads to good performance in easier instances. We demonstrate that exRHCR solves L-MAPF instances up to 39% faster than RHCR, which can potentially increase the throughput for a given stream of tasks by increasing the number of agents we can cope with for a given time budget (Section 5).

2 Problem Definition

In this section we provide a definition of Lifelong MAPF (L-MAPF), for which we design an effective algorithmic approach in Section 4. Before describing L-MAPF, we first define the single-query, or one-shot, setting termed MAPF.

2.1 Multi Agent Path Finding (MAPF)

In the MAPF problem, we are given a graph and a set of agents, where each agent starts at a start vertex and needs to reach a goal vertex. Time is discretized such that at each timestep, each agent occupies exactly one vertex. Between any two consecutive timesteps, each agent can either move to an adjacent vertex (i.e., along an edge connecting its current vertex and the destination vertex) or wait at its current vertex, where each action is assigned a unit cost.

The goal of MAPF is to find a feasible solution plan, which consists of a set of paths, one for each agent from its start vertex to its goal vertex, such that no conflicts arise between the different agents. In particular, for a given agent, a path is a sequence of vertices such that every two consecutive vertices are either connected by an edge or identical (a wait action), and the sequence begins at the agent’s start vertex and ends at its goal vertex. There are two types of conflicts between any two given agents to avoid: In a vertex conflict, the agents occupy the same vertex at the same timestep. In an edge conflict, the agents cross the same edge at the same timestep, i.e., they swap vertices between two consecutive timesteps.

We will also consider a relaxed version of MAPF called Windowed-MAPF (W-MAPF), which is defined for a constant parameter indicating the window size within which conflicts should be resolved Silver (2005). That is, a solver for W-MAPF needs to avoid conflicts only up to the timestep given by the window size, after which conflicts are allowed. This relaxation reduces the size of the search tree, in comparison to MAPF, which leads to faster computation times. Most solvers for vanilla MAPF can be adapted to work for the windowed case.
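The two conflict types and the windowed relaxation can be illustrated with a small checker. This is only a sketch under stated assumptions: vertices are integers, a path lists the vertex occupied at timesteps 0, 1, 2, and so on, an agent is assumed to rest at its final vertex once its path ends, and the names `first_conflict` and `window` are hypothetical and not taken from the paper's code.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Path = List[int]  # vertex occupied at timesteps 0, 1, 2, ...


def position(path: Path, t: int) -> int:
    """Vertex occupied at time t; the agent rests at its last vertex afterwards (assumption)."""
    return path[min(t, len(path) - 1)]


def first_conflict(
    paths: Dict[str, Path], window: Optional[int] = None
) -> Optional[Tuple[str, str, int, str]]:
    """Return (agent_a, agent_b, timestep, kind) for an earliest conflict, or None.

    If `window` is given, conflicts after that timestep are ignored (W-MAPF).
    """
    horizon = max(len(p) for p in paths.values()) - 1
    if window is not None:
        horizon = min(horizon, window)
    pairs = list(combinations(sorted(paths), 2))
    for t in range(horizon + 1):
        # Vertex conflict: both agents occupy the same vertex at timestep t.
        for a, b in pairs:
            if position(paths[a], t) == position(paths[b], t):
                return (a, b, t, "vertex")
        # Edge conflict: the agents swap vertices between timesteps t-1 and t.
        if t > 0:
            for a, b in pairs:
                if (position(paths[a], t - 1) == position(paths[b], t)
                        and position(paths[a], t) == position(paths[b], t - 1)):
                    return (a, b, t, "edge")
    return None
```

For instance, two agents that swap positions along one edge yield an edge conflict at timestep 1, while a conflict that first occurs at timestep 3 is not reported when `window=2`.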

2.2 Lifelong Multi Agent Path Finding (L-MAPF)

In L-MAPF, agents need to execute a stream of tasks, where each task consists of moving between two specific vertices, while avoiding conflicts. In this setting the objective can be completing all tasks as quickly as possible, or maximizing throughput, which is the average number of tasks completed per unit of time.

A critical component within L-MAPF solvers is a task assigner, which specifies for each agent the next task it performs. There are various approaches to design task assigners Ma et al. (2017); Liu et al. (2019), which are outside the scope of this work. One common approach for tackling L-MAPF problems is partitioning it into a sequence of W-MAPF problems, where the task assigner is invoked after every MAPF query is completed to generate start and goal locations for the next MAPF instance Ma et al. (2017); Liu et al. (2019); Li et al. (2021b). We describe one such approach called RHCR in the next section.

3 Algorithmic Background

(a) Initial query solved with PBS.
(b) Query solved with exPBS.
(c) Query solved with exPBS.
Figure 2: A toy example visualizing an execution of RHCR and exRHCR for three agents, whose current start and goal positions are illustrated as solid and dashed shapes, respectively. Solid lines represent the portion of each path within the time window (where collisions are avoided) and dashed lines represent the rest (where collisions are ignored). Solid markers on the paths represent the first steps, which are executed before replanning, and the rest are represented by hollow markers. Here, we chose exPBS and PBS to break ties in the low and high-level search in a specific manner, but other valid execution outcomes are possible. (a) The W-MAPF instance with its solution obtained by PBS, where the experience gained is a priority set that forces lower-priority agents to avoid the paths of higher-priority agents (which causes one of the agents to wait). See full details of PBS’s execution in the appendix. (b) and (c) show the next two queries, which both use this experience; this yields a feasible solution already in the tree root, i.e., without further expansion of PT nodes. In contrast, to solve the query in (b), RHCR would expand two PT nodes representing the same conflicts as in (a) (to see this, note that for some shortest paths the agents do collide at the same positions as in (a)). In both (b) and (c), exRHCR finds the continuation of the solution found in (a). In (b), two of the agents wait (one in place along its path and one in its start position) to avoid the paths of higher-priority agents. In the query in (c), the agents can follow their shortest paths without colliding, thus RHCR finds a solution in the first PT node without expanding further nodes. exRHCR also finds a solution from the first node although an experience is given.

We describe two algorithmic components, which we will build upon to design our approach for L-MAPF (Section 4). In particular, we describe Priority-Based Search (PBS) for MAPF, and Rolling-Horizon Collision Resolution (RHCR) for L-MAPF.

3.1 Priority-Based Search (PBS)

PBS is a recent approach for MAPF that can be used to solve instances with a large number of agents, or situations where multiple MAPF instances need to be solved rapidly (as in L-MAPF). We now provide an overview of PBS, and refer the reader to Ma et al. (2019a) for the full description.

PBS maintains priorities between agents, which are used to resolve conflicts. In the high-level search, PBS explores a priority tree (PT), where a given node of the PT encodes a (partial) priority set. A priority between two agents means that the first agent has precedence over the second whenever a low-level search is invoked (see below). In addition to the ordering, each PT node maintains single-agent plans that represent the current MAPF solution (possibly containing conflicts). PBS starts the high-level search with the tree root, whose priority set is empty, and assigns to each agent its shortest path. We mention that PBS can be initialized with a non-empty priority set in its root node, although previous work has not proposed an effective method for choosing this set. Whenever PBS expands a node, it invokes a low-level search to compute a new set of plans that abide by the node’s priority set. If a collision between two agents is encountered in the new plans, PBS generates two child PT nodes, each of which extends the priority set with one of the two possible priorities between the colliding agents. The high-level search chooses which PT node to expand at each step in a depth-first search (DFS) manner. The high-level search terminates when a valid solution is found at some node, or when no more nodes remain for expansion, in which case PBS declares failure.

The low-level search of PBS proceeds in the following manner. For a given PT node, PBS performs a topological sort of the agents according to the node’s priority set, from high priority to low, and plans individual-agent paths based on the sorting. Given the topological sort, which may cover only some of the agents, the low-level search iterates over the agents in the sort and updates their plans such that they do not collide with any higher-priority agents (agents that do not appear in the sort maintain their original plans). It then checks whether collisions occur between all the agents combined.
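The two priority-set operations described above, branching on a collision and ordering agents for the low-level search, can be sketched as follows. This is an illustration only and not the PBS implementation: priorities are assumed to be `(higher, lower)` pairs of agent names, the function names are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter
from typing import FrozenSet, List, Optional, Tuple

Priority = Tuple[str, str]  # (higher, lower): `higher` has precedence over `lower`


def child_priority_sets(
    priorities: FrozenSet[Priority], a: str, b: str
) -> List[FrozenSet[Priority]]:
    """Priority sets of the two child PT nodes created when agents a and b collide."""
    return [priorities | {(a, b)}, priorities | {(b, a)}]


def planning_order(priorities: FrozenSet[Priority]) -> Optional[List[str]]:
    """Topologically sort the agents that appear in `priorities`, highest priority first.

    Agents that appear in no priority are omitted (they keep their current plans).
    Returns None if the priorities are cyclic, i.e. no valid order exists.
    """
    sorter: TopologicalSorter = TopologicalSorter()
    for higher, lower in sorted(priorities):
        sorter.add(lower, higher)  # `higher` must be planned before `lower`
    try:
        return list(sorter.static_order())
    except CycleError:
        return None
```

With the priorities `("a", "b")` and `("b", "c")`, the planning order is `a`, `b`, `c`; an agent `d` that appears in no priority is left out of the order and keeps its plan.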

3.2 Rolling-Horizon Collision Resolution (RHCR)

RHCR Li et al. (2021b) is a state-of-the-art framework for solving L-MAPF. RHCR accepts as parameters a time window and a replanning rate, and decomposes the L-MAPF problem into a sequence of W-MAPF queries that are solved one by one. In particular, after obtaining a W-MAPF plan for a query with the given time window, the next query is generated by executing the solution plan for a number of steps equal to the replanning rate. This yields the agents’ start locations for the next query. The goal locations remain the same for agents that did not complete their task, and new goal locations are assigned by the task assigner for agents that did. For a concrete example of a task assigner, see Section 5.
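How one query's plan determines the next query's start locations can be sketched in a few lines. The function name `next_starts` is hypothetical, paths are assumed to list the vertex occupied at each timestep, and an agent whose path is shorter than the replanning rate is assumed to stay at its last vertex.

```python
from typing import Dict, List


def next_starts(plan: Dict[str, List[int]], replanning_rate: int) -> Dict[str, int]:
    """Start vertex of each agent in the next query, after executing the
    current plan for `replanning_rate` timesteps."""
    return {
        agent: path[min(replanning_rate, len(path) - 1)]
        for agent, path in plan.items()
    }
```

For example, with a replanning rate of 2, an agent following the path `[0, 1, 2, 3, 4]` starts the next query at vertex 2, even though its plan extends further into the time window.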

RHCR requires a W-MAPF solver as an internal submodule. Accordingly, several bounded-horizon versions of state-of-the-art MAPF algorithms were used, among which PBS proved to be the most effective as a W-MAPF solver within RHCR. Overall, it was observed that by using a small time window, and consequently a small replanning rate, RHCR can obtain faster solutions than alternative approaches. This, in turn, makes it possible to solve queries containing more agents and potentially improve the system’s throughput Li et al. (2021b).

4 Leveraging Experience in Lifelong MAPF

In this section we present our algorithmic approach for leveraging experience in L-MAPF, which we call Experienced RHCR (exRHCR). First we discuss some properties of the original RHCR framework, which will be instrumental in the development of exRHCR.

Clearly, by improving the efficiency of the internal MAPF solver (e.g., PBS) we can improve the efficiency of RHCR overall, as it solves multiple W-MAPF queries. However, speeding up MAPF solvers without leveraging additional structure within the RHCR approach can be difficult. Next, we observe that there is an underlying structure emerging from the fact that the W-MAPF queries are consecutive, i.e., a solution plan to one query defines the next query. In particular, considering that a short replanning rate is typically used by RHCR, consecutive W-MAPF queries can be quite similar to one another in terms of initial and goal agent locations. This, in turn, can lead to similar queries, in terms of the conflicts that need to be avoided and the agents between which they arise. Despite this, current RHCR implementations solve every subsequent W-MAPF query from scratch.

In contrast to RHCR, our approach transfers experience gained from solving one W-MAPF query to the next one in order to improve planning times. In particular, exRHCR solves W-MAPF queries by interleaving calls to the vanilla PBS and an extension of PBS which we call Experienced PBS (exPBS). Unlike the standard PBS, which begins the high-level search on a given W-MAPF instance from an empty priority set at the root of the PT, exPBS is initialized with a specific priority set for the root of the PT. This priority set encodes the experience gained from the solution of a previous W-MAPF query obtained via PBS. See the illustration of exRHCR and exPBS in Figure 1. Additionally, in Figure 2 we illustrate two consecutive W-MAPF sub-queries within an L-MAPF problem that have the same conflicts, in terms of the agents and positions involved, when PBS is used to solve both queries. However, the conflicts in the second query can be avoided by reusing the experience obtained in the first query when solving the second with exPBS.

Encoding experience as priorities has the desirable property of being generalizable, in the sense that a priority used for solving one query can be applied to a similar query without necessarily over-constraining the solution. This is because priorities provide high-level specifications that are likely to apply in a variety of queries, rather than hard constraints that need to be followed exactly (e.g., the constraints used in CBS that specify for each agent exact locations that should be avoided at specific points in time). Experience encoded in the form of a priority set can thus be thought of as a way to warm-start consecutive MAPF queries.

In some cases previous experience might not be useful for the given query, in which case we want to ensure the robustness of our MAPF solver to uninformative experience. In exPBS this is achieved by restarting the search with an empty priority set if the experience causes the search to diverge. In the remainder of this section we detail exRHCR and the internal MAPF solver exPBS.

4.1 Experienced RHCR (exRHCR)

The exRHCR algorithm accepts as inputs a graph representing the environment and a task assigner, which implicitly maintains the agents’ current start and goal positions as well as the remaining tasks. Using the task assigner as a black box helps keep our description below generic. Similarly to RHCR, exRHCR has the parameters of time window and replanning rate. Two additional parameters that we introduce for exRHCR are the experience lookahead, which determines the number of exPBS calls after every PBS call (see details below), and the exPBS high-level PT width limit. The latter parameter helps to identify situations where previous experience turns out to be uninformative for the current query, and to restart the high-level search with an empty priority set (see Section 4.2).

The motivation behind using the lookahead parameter is keeping the experience up-to-date, and employing it only when it is likely to be relevant to the current query, which should not differ vastly from the query from which the experience was extracted. We also ensure the experience does not become overconstrained by generating a brand new experience every few runs (rather than, e.g., constantly passing the priority set obtained from solving the current query as experience to the next query, and so on, where priorities are accumulated from each run).

We suggest setting the lookahead such that the experience is used in all the queries of the planning horizon in which it was created, which maximizes the utilization of the experience. See more details in Section 5.2.

exRHCR is detailed in Algorithm 1. exRHCR first uses the task assigner to obtain the initial W-MAPF instance and initializes to zero the counter that represents the current W-MAPF query index [Line 1]. It then iteratively solves the current and all subsequent W-MAPF instances until all tasks are completed [Line 2]. It runs PBS with the time window on the current query to obtain a plan, which specifies agent paths for the current W-MAPF instance [Line 3]. In contrast to RHCR, the priority set of the PT node for which the plan was obtained is stored in memory. We term this the seed priority set. Subsequently, and similarly to RHCR, a new W-MAPF query is generated by the task assigner, and the counter is updated [Line 4]. This is done by executing the plan for a number of steps equal to the replanning rate to update the agents’ start locations, and potentially updating the agents’ goals and tasks. At this point, exRHCR differs from RHCR: for the next W-MAPF instances, as many as the experience lookahead [Line 5], unless all tasks are finished [Line 6], the algorithm invokes exPBS with the seed priority set (using the same window size and with the width-limit parameter, which will be explained shortly) [Line 7]. Given the new solution plan and the replanning rate, the query and the counter are updated as before [Line 8] and this inner loop is repeated.

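Inputs: L-MAPF query, graph, task assigner
Parameters: window size, replanning rate, experience lookahead, width limit
Output: Paths for all agents

1:  query ← initial query from task assigner;   counter ← 0;
2:  while not all tasks finished do
3:     plan, seed priority set ← PBS (query, window size);
4:     query ← next query from task assigner (plan, replanning rate);   counter ← counter + 1;
5:     repeat lookahead times // run exPBS with PBS as fallback
6:         if all tasks finished return // tasks finished
7:         plan ← exPBS (query, window size, seed priority set, width limit);
8:         query ← next query from task assigner (plan, replanning rate);   counter ← counter + 1;
9:  end while
Algorithm 1 exRHCR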
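The interleaving of PBS and exPBS calls in Algorithm 1 can be made concrete with a short sketch that lists which solver handles each query. It assumes, purely for illustration, that the total number of queries is known in advance (in Algorithm 1 the loop instead runs until the task assigner reports that all tasks are finished); `exrhcr_schedule` is a hypothetical helper name.

```python
from typing import List


def exrhcr_schedule(num_queries: int, lookahead: int) -> List[str]:
    """Solver used for each consecutive W-MAPF query: one PBS call that
    produces a fresh seed priority set, followed by up to `lookahead`
    exPBS calls that reuse it."""
    schedule: List[str] = []
    while len(schedule) < num_queries:
        schedule.append("PBS")  # Line 3: solve from scratch, store the seed
        for _ in range(lookahead):  # Line 5: reuse the seed `lookahead` times
            if len(schedule) == num_queries:  # Line 6: all tasks finished
                break
            schedule.append("exPBS")  # Line 7
    return schedule
```

With a lookahead of 2, seven queries are handled as PBS, exPBS, exPBS, PBS, exPBS, exPBS, PBS. With a lookahead of 0, every query is solved by PBS, which matches plain RHCR.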

4.2 Experienced PBS (exPBS)

exPBS utilizes PBS’s option of starting the high-level search from a given seed priority set, which we refer to as an experience, rather than from the typically-used empty priority set. An additional difference is that exPBS avoids overexploring the PT by limiting the width of the explored PT rooted in the seed priority set: if the seed does not lead to a solution fast enough (or does not lead to a solution at all), exPBS restarts the search with an empty priority set by calling the “vanilla” PBS. We call the usage of PBS after exPBS terminates with no solution the “fallback”. To limit the tree width, exPBS explores the high-level PT using a Width-Limited Depth-First Search (WL-DFS), which we describe below, rather than the standard DFS used by PBS. Thus, exPBS accepts two additional parameters when compared to PBS: the seed priority set and the width-limit parameter.

We provide additional details on WL-DFS. Given a tree, let its width denote the maximal number of nodes on any single level (where two nodes are on the same tree level if their distance from the root, or depth, is the same). The width-limit parameter specifies the maximal width allowed when exploring a PT using WL-DFS. To keep track of the current width of the PT, we maintain for each level a counter representing the number of nodes in the level, and increment it whenever new nodes are added. When the width of the PT exceeds the width limit (or when no solution exists), WL-DFS aborts the search of the PT rooted in the seed priority set and invokes the vanilla PBS solver without width limitation.
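A minimal sketch of the width-limited search on a generic tree may help make the per-level counters concrete. It is not the exPBS implementation: the tree is supplied through hypothetical `children` and `is_goal` callables, and a return value of `None` stands for the situation in which the caller would fall back to vanilla PBS.

```python
from typing import Callable, Dict, List, Optional, Tuple, TypeVar

Node = TypeVar("Node")


def wl_dfs(
    root: Node,
    children: Callable[[Node], List[Node]],
    is_goal: Callable[[Node], bool],
    width_limit: int,
) -> Optional[Node]:
    """Depth-first search that aborts once any tree level holds more than
    `width_limit` generated nodes. Returns a goal node, or None if the
    limit is exceeded or the tree is exhausted (the fallback case)."""
    level_count: Dict[int, int] = {0: 1}  # nodes generated so far per depth
    stack: List[Tuple[Node, int]] = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if is_goal(node):
            return node
        kids = children(node)
        level_count[depth + 1] = level_count.get(depth + 1, 0) + len(kids)
        if level_count[depth + 1] > width_limit:
            return None  # tree became too wide: give up on this root
        for kid in reversed(kids):  # reversed so the first child is expanded first
            stack.append((kid, depth + 1))
    return None


def binary_children(label: str) -> List[str]:
    """Toy tree for the example: binary strings of length up to 3."""
    return [label + "0", label + "1"] if len(label) < 3 else []
```

On the toy tree, searching for `"011"` succeeds with a width limit of 4 but is aborted with a limit of 3, because the third level already contains four generated nodes by the time the goal's parent is expanded.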

We chose to limit the search effort by limiting the PT width, as we found that it is a good indicator of whether the experience is over-constrained (over-constrained experience tends to force the search to expand entire, wide subtrees). We found (see Section 5) that alternative measures such as the number of expanded nodes require per-instance tuning, as exPBS uses a Depth-First Search (DFS) on the PT and thus we need to account for the scenario and the number of agents. In contrast, we empirically found that the width is a robust parameter that does not require tuning. Indeed, in all our experiments (Section 5) we used the same width value. Finally, in Section 5.3 we study the effect of WL-DFS with different values of the width limit on the overall performance and show the robustness of the method to the specific choice of this value.

4.3 Alternative Instantiation with Total Priority

(a) Warehouse environment.
(b) Sorting environment.
Figure 3: Benchmark environments for L-MAPF problems. (a) A Warehouse domain with obstacles (black) representing pods Liu et al. (2019). Working stations are drawn in blue and task locations around the inventory pods are drawn in green. (b) A Sorting center with obstacles representing chutes Li et al. (2021b). Working stations are marked in blue, and task locations, which represent drop-off locations around chutes, are shown in green. Note that we used the undirected versions of these benchmarks.

(a) Runtime for Warehouse.
(b) Runtime for Sorting.
(c) Success Rate for Warehouse.
(d) Success Rate for Sorting.
Figure 4: L-MAPF experiment results. (a) and (b) depict the average runtime and standard deviation of a W-MAPF query for the different L-MAPF solvers. For each environment, the top plot shows the relative improvement of each method compared to RHCR. (c) and (d) depict the success rate obtained by all planners.

The key ingredients of our algorithmic framework are (i) the notion of experience derived from a non-informed planner (partial priority and PBS in the method described) and (ii) how the experience is used in the W-MAPF planner (exPBS in the method described). Here, we suggest an alternative instantiation and discuss its merits.

Recall that after running PBS, we obtain a partial priority, which is used within exPBS. Here we introduce a lightweight alternative, which computes a total priority (i.e., one that specifies priorities between all the agents in the form of a single ordering) that is consistent with the partial priority (namely, every pairwise priority that holds in the partial priority also holds in the total priority). Note that such a consistent total priority always exists and is easy to compute. Now, we run exPBS with the total priority as the seed, which boils down to running a prioritized planner Silver (2005) and running PBS in case of failure.
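One simple way to obtain a total priority that is consistent with a given partial priority is a topological sort over all agents. The sketch below uses Kahn's algorithm with ties broken by agent name; this tie-breaking rule, the `(higher, lower)` pair encoding, and the function names are assumptions made for illustration, and the paper does not state which consistent total priority it selects.

```python
import heapq
from typing import Dict, List, Set, Tuple

Priority = Tuple[str, str]  # (higher, lower)


def total_priority(agents: List[str], partial: Set[Priority]) -> List[str]:
    """Order all agents from highest to lowest priority, consistently with
    `partial` (Kahn's algorithm; ties broken alphabetically by agent name)."""
    lower_of: Dict[str, List[str]] = {a: [] for a in agents}
    blockers: Dict[str, int] = {a: 0 for a in agents}  # unplaced higher agents
    for higher, lower in partial:
        lower_of[higher].append(lower)
        blockers[lower] += 1
    ready = [a for a in agents if blockers[a] == 0]
    heapq.heapify(ready)
    order: List[str] = []
    while ready:
        agent = heapq.heappop(ready)
        order.append(agent)
        for lower in lower_of[agent]:
            blockers[lower] -= 1
            if blockers[lower] == 0:
                heapq.heappush(ready, lower)
    if len(order) != len(agents):
        raise ValueError("partial priority is cyclic")
    return order


def is_consistent(order: List[str], partial: Set[Priority]) -> bool:
    """True if every pairwise priority in `partial` also holds in `order`."""
    rank = {agent: i for i, agent in enumerate(order)}
    return all(rank[higher] < rank[lower] for higher, lower in partial)
```

Agents that are unconstrained by the partial priority still receive a position in the total order, which is what makes the resulting seed more restrictive than the partial priority it extends.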

Using a total priority is less generalizable than a partial priority and thus the planner is more likely to fall back to PBS. On the other hand, running a prioritized planner is extremely fast and when the problem is “easy” it may often succeed even with this more constrained notion of experience. As we will see in our experiments, this approach is advantageous in easier settings due to its simplicity.

5 Experimental Results

We provide an empirical evaluation of our exRHCR approach and compare it to RHCR. We implemented the algorithms in C++ and tested them on an Ubuntu machine with RAM and an Intel i7 CPU. We used benchmarks simulating a warehouse (Figure 3(a)) and a sorting center (Figure 3(b)). Code and benchmarks are available at https://github.com/NitzanMadar/exPBS-exRHCR.

For both environments we randomly initialize the start location for each agent from all possible locations, and the task assigner samples a goal location randomly from the blue and green locations depicted in Figure 3. Each time an agent reaches a goal location of a specific color the task assigner specifies a new goal location uniformly at random from the other color that is not currently assigned to another agent.

In Section 5.1 we compare our exRHCR framework with RHCR. Next, in Sections 5.2 and 5.3 we focus on exRHCR with partial priorities, which has more promise in tackling hard L-MAPF instances, and discuss the effect that the lookahead and width-limit parameters have on exRHCR and exPBS, respectively.

5.1 L-MAPF Experiments

For each of the two environments (i.e., Warehouse and Sorting) we generated multiple L-MAPF instances and tested them on a varying numbers of agents . In particular, we set and for Warehouse and Sorting, respectively. For each combination of an environment and , we randomly generated  L-MAPF instances.

We consider three planners, RHCR, and our two planners: (i) using a partial priority with exPBS, denoted by and (ii) the alternative instantiation described in Section 4.3 using a total priority with prioritized planning, denoted by . We set the replanning rate  and the time window to be and for both Warehouse and Sorting environments. For exRHCR we used an experience lookahead of , and width limit when using partial priority experience.

In Figure 4, we report the solvers’ average MAPF query runtime for a total L-MAPF execution of timesteps. In particular, each of the L-MAPF instances induces W-MAPF instances (a total of timesteps divided by the replan rate of ). We report the average runtime and standard deviation across all W-MAPF instances solved, as well as success rates. We limit the runtime for a W-MAPF query to seconds, after which we declare failure. In such a case the runtime of a failed W-MAPF instance is seconds.

exRHCR with improved the average runtime (over RHCR) up to  in Warehouse and  in Sorting. When considering , we can see (as in Ma et al. (2019a)) that the approach is highly effective when the number of agents is small. However, given a large number of agents, this method has a high failure rate (fallback is used roughly of the time for the largest number of agents). For a large number of agents achieves the best performance compared to RHCR and .

We provide a few more observations. (i) In all the L-MAPF experiments we performed, the difference in the average solution cost between PBS and exPBS was negligible (roughly ), which suggests that reusing experience does not hinder solution quality. (ii) As a result, the throughput difference per instance and number of agents is also negligible (roughly ). (iii) The improved runtime we described above (Figure 4) suggests that for a given time budget and an average W-MAPF query, exRHCR can accommodate more agents than RHCR. This suggests that the improved efficiency of our approach can improve the overall throughput in automated logistic domains.

5.2 Effect of Lookahead Parameter on exRHCR

Figure 5: Average PT depth of for the Warehouse and Sorting domains using 150 and 300 agents, respectively for different values of . Here, corresponds to running RHCR without using experience.

We consider the effect that the lookahead parameter has on the performance of . Specifically, we fixed and for the Sorting and Warehouse environments, respectively, and evaluated the average PT depth as a function of for different values of (Figure 5).

First, we observe that our method improves over RHCR for every value of considered. Additionally, our suggested value of yields close-to-optimal values and is a good rule of thumb in the absence of any other information. Not surprisingly, selecting reduces performance in all of the cases tested, as it uses experience outside the planning horizon for which it was created. Finally, selecting may be beneficial when the replanning rate is small and the time window is large, as the relevance of the experience seems to diminish as increases. For example, if and , in the last query in which the experience is used, it is relevant for only out of the steps.

We also compare our framework to two alternatives that do not use the lookahead parameter , as mentioned in Section 4.1: (i) running PBS once and using the same experience for all subsequent queries within exPBS and (ii) using the priority set from a solution for a given query as an experience in the next query. We used the Warehouse domain with the same parameters as in Section 5.1 and . We found that the average runtime increases by 8% and 21%, respectively, using alternative (i) compared to exRHCR, and by 9% and 15%, respectively, using alternative (ii). Additionally, the success rate dropped by 15.2% and 11.1% using (i), and by 10.6% and 11.1% using (ii).

5.3 Effect of Width Limit on WL-DFS

We assess how the width limit parameter affects the performance of exPBS and consequently . We use the Warehouse environment with and and fix the number of agents to be . We report average metrics (detailed shortly) for MAPF queries for different values of in Figure 6. Note that for exPBS behaves very similarly to PBS as it reverts to PBS after the first backtrack. The setting of simulates a version wherein no fallback is taken and WL-DFS is equivalent to DFS.

For every value of we logged the following attributes and report averaged values across all MAPF queries (starting with the top attribute and going in a clockwise manner with respect to Figure 6): (i) average runtime; (ii) average PT width; (iii) average number of A* node expansions in the low-level exPBS and PBS search; (iv) average number of PT node expansions in the high-level exPBS and PBS search; (v) average depth of the PT (in case of a fallback, this is the sum of the depths of the exPBS and PBS trees). The reported values in the plot are normalized by the maximal value per attribute.
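The per-attribute normalization used for the radar plot can be illustrated as follows. This is a small sketch under stated assumptions: the nested-dictionary layout, the function name, and the sample numbers in the usage note are hypothetical, and all values are assumed to be non-negative.

```python
def normalize_by_max(results):
    """Scale each attribute by its maximal value across width limits.

    results: {attribute: {width_limit: averaged_value}}
    Returns the same structure, with every attribute's largest value
    mapped to 1.0 (values are assumed non-negative).
    """
    normalized = {}
    for attribute, by_limit in results.items():
        peak = max(by_limit.values())
        normalized[attribute] = {
            limit: (value / peak if peak else 0.0)
            for limit, value in by_limit.items()
        }
    return normalized
```

For example, `normalize_by_max({"runtime": {0: 2.0, 5: 1.0}})` maps the runtime at width limit 0 to 1.0 and at width limit 5 to 0.5, which makes attributes with different units comparable on a single plot.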

Having a small opportunity to reuse experience , or allowing the search to explore the PT rooted in a seed experience indefinitely , significantly reduces the performance of exRHCR in comparison with the other values of  for all attributes, indicating the benefits of WL-DFS.

However, the reason why each one of these extreme values is not useful is different: When , exPBS starts to run and in many cases (roughly of the time) falls back to PBS. This implies a small overhead caused by unnecessarily running exPBS many times. In contrast, when , exPBS rarely falls back to PBS, which may imply over-exploration of the PT when the PBS fallback could have found a solution quicker. This implies a large overhead of running exPBS incurred a small number of times.
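The trade-off between the two extremes can be made concrete with a toy width-limited depth-first search that falls back to a from-scratch search. This is only a sketch and not the authors' WL-DFS: here the "width" is approximated by the number of dead ends (backtracks) encountered, which is an assumption, and the function names and the tree encoding are hypothetical.

```python
def width_limited_dfs(root, children, is_solution, width_limit):
    """DFS that gives up once more than `width_limit` dead ends were hit.

    Returns (solution_node or None, number_of_dead_ends).
    """
    dead_ends = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if is_solution(node):
            return node, dead_ends
        kids = children(node)
        if not kids:
            dead_ends += 1  # a backtrack: this branch led nowhere
            if dead_ends > width_limit:
                return None, dead_ends
            continue
        stack.extend(reversed(kids))  # visit children left to right
    return None, dead_ends


def solve_with_fallback(seed_root, scratch_root, children, is_solution, width_limit):
    """Try the tree rooted at the experience first, then search from scratch."""
    node, _ = width_limited_dfs(seed_root, children, is_solution, width_limit)
    if node is not None:
        return node, "experience"
    node, _ = width_limited_dfs(scratch_root, children, is_solution, float("inf"))
    return node, "fallback"
```

With a limit of 0, the first backtrack already triggers the fallback, so the experience-based attempt is mostly overhead. With an infinite limit, the fallback is never taken and the search degenerates to plain DFS over the experience-rooted tree.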

Finally, we consider an alternative implementation that uses a limit on the number of node expansions (rather than width). We used the Warehouse domain with the same parameters as used in Section 5.1 and . To estimate the number of nodes to be used as a parameter, we empirically found from previous experiments the average number of nodes expanded when exPBS succeeds in a given scenario. In our case, this was 99 and 209 for 180 and 220 agents, respectively. We evaluated a node limit of 80, 100 and 150 for 180 agents and a node limit of 150, 200 and 250 for 220 agents. We found that the success rate dropped by 0%–2.6% for 180 agents and by 5.6%–56% for 220 agents. Moreover, the runtime increased by roughly 1%–18% for 180 agents and by 0.7%–10% for 220 agents. We note the parameter selection is more sensitive and domain-dependent compared to width limit, as mentioned in Section 


Figure 6: Radar plot depicting the effect of  on various attributes of . All values are normalized according to the maximal value attained on the axis they belong to.

6 Conclusions and Future Work

In this paper we described exRHCR, a new approach for leveraging experience in L-MAPF instances, which reduces the computational effort in constituent MAPF queries by reusing priority sets from previous queries within W-MAPF solvers. We demonstrated empirically that our approach can substantially improve runtime and has the potential to increase system throughput by incorporating additional agents.

Our work introduces various directions for future research. In the short run, we plan to explore approaches for systematic selection of the parameter , and additional heuristics for terminating and restarting the exPBS search.

In the long run, it would be beneficial to consider advanced experience-retrieval strategies. For example, can we design an “experience database” that contains queries and their solution priorities? Here one can potentially retrieve the nearest-neighbor query under some metric to be used in exPBS. Learning-based approaches can come in handy as well by, e.g., identifying similar queries and generating experience artificially.

Finally, the algorithms we introduced can be used to create a hierarchical approach for solving W-MAPF problems in L-MAPF: after running PBS we obtain a partial priority  and then compute a consistent total priority . We then start by running a prioritized planner using . In the case of failure, we fall back to exPBS which uses  to warm-start the search and if this planner fails, we fall back to PBS.
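The two ingredients of this hierarchical scheme can be sketched as follows: deriving a total priority that is consistent with a partial one (a topological sort), and trying solvers in order until one succeeds. This is a minimal sketch under stated assumptions: the function names, the `(higher, lower)` pair encoding, and the solver interface (a callable returning a plan or `None`) are hypothetical, and the actual planners are not implemented here.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def consistent_total_priority(agents, partial_priority):
    """Order agents from highest to lowest priority.

    partial_priority: iterable of (higher, lower) pairs.
    Raises graphlib.CycleError if the pairs are contradictory.
    """
    sorter = TopologicalSorter({agent: set() for agent in agents})
    for higher, lower in partial_priority:
        sorter.add(lower, higher)  # `higher` must come before `lower`
    return list(sorter.static_order())


def solve_hierarchically(query, solvers):
    """Try (name, solver) pairs in order; return the first success.

    Each solver is a callable returning a plan, or None on failure.
    """
    for name, solver in solvers:
        plan = solver(query)
        if plan is not None:
            return name, plan
    return None, None
```

In the setting described above, the solver list would hold a prioritized planner using the total priority, then exPBS warm-started with the partial priority, and finally PBS. Cheaper planners come first so that the more expensive ones run only when needed.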


This research was supported in part by the Israeli Ministry of Science & Technology grants no. 3-16079 and 3-17385, by the United States-Israel Binational Science Foundation (BSF) grants no. 2019703 and 2021643, and the Ravitz Fellowship. The authors also thank the anonymous reviewers for their insightful comments and suggestions.


  • M. Barer, G. Sharon, R. Stern, and A. Felner (2014) Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In SoCS, pp. 19–27. Cited by: §1.
  • S. Choudhury, K. Solovey, M. J. Kochenderfer, and M. Pavone (2021) Efficient large-scale multi-drone delivery using transit networks. J. Artif. Intell. Res. 70, pp. 757–788. Cited by: §1.
  • L. Cohen and S. Koenig (2016) Bounded suboptimal multi-agent path finding using highways. In IJCAI, pp. 3978–3979. Cited by: §1.
  • D. Coleman, I. A. Şucan, M. Moll, K. Okada, and N. Correll (2015) Experience-based planning with sparse roadmap spanners. In ICRA, pp. 900–905. Cited by: §1.
  • D. Dayan, K. Solovey, M. Pavone, and D. Halperin (2021) Near-optimal multi-robot motion planning with finite sampling. In ICRA, pp. 9190–9196. Cited by: §1.
  • E. D. Demaine, S. P. Fekete, P. Keldenich, H. Meijer, and C. Scheffer (2019) Coordinated motion planning: reconfiguring a swarm of labeled robots with bounded stretch. SIAM J. on Comput. 48 (6), pp. 1727–1762. Cited by: §1.
  • O. Gordon, Y. Filmus, and O. Salzman (2021) Revisiting the complexity analysis of conflict-based search: new computational techniques and improved bounds. In SoCS, pp. 64–72. Cited by: §1.
  • W. Hönig, S. Kiesel, A. Tinka, J. W. Durham, and N. Ayanian (2019) Persistent and robust execution of MAPF schedules in warehouses. RA-L 4 (2), pp. 1125–1131. Cited by: §1.
  • T. Huang, S. Koenig, and B. Dilkina (2021) Learning to resolve conflicts for multi-agent path finding with conflict-based search. In AAAI, pp. 11246–11253. Cited by: §1.
  • O. Kaduri, E. Boyarski, and R. Stern (2020) Algorithm selection for optimal multi-agent pathfinding. In ICAPS, pp. 161–165. Cited by: §1.
  • J. Li, G. Gange, D. Harabor, P. J. Stuckey, H. Ma, and S. Koenig (2020) New techniques for pairwise symmetry breaking in multi-agent path finding. In ICAPS, Vol. 30, pp. 193–201. Cited by: §1.
  • J. Li, D. Harabor, P. J. Stuckey, A. Felner, H. Ma, and S. Koenig (2019a) Disjoint splitting for multi-agent path finding with conflict-based search. In ICAPS, Vol. 29, pp. 279–283. Cited by: §1.
  • J. Li, D. Harabor, P. J. Stuckey, H. Ma, and S. Koenig (2019b) Symmetry-breaking constraints for grid-based multi-agent path finding. In AAAI, Vol. 33, pp. 6087–6095. Cited by: §1.
  • J. Li, W. Ruml, and S. Koenig (2021a) EECBS: a bounded-suboptimal search for multi-agent path finding. In AAAI, Cited by: §1.
  • J. Li, A. Tinka, S. Kiesel, J. W. Durham, T. K. S. Kumar, and S. Koenig (2021b) Lifelong multi-agent path finding in large-scale warehouses. In AAAI, pp. 11272–11281. Cited by: §1, §2.2, §3.2, §3.2, Figure 3.
  • M. Liu, H. Ma, J. Li, and S. Koenig (2019) Task and path planning for multi-agent pickup and delivery. In AAMAS, pp. 1152–1160. Cited by: §1, §2.2, Figure 3.
  • H. Ma, D. Harabor, P. J. Stuckey, J. Li, and S. Koenig (2019a) Searching with consistent prioritization for multi-agent path finding. In AAAI, Vol. 33, pp. 7643–7650. Cited by: §1, §3.1, §5.1.
  • H. Ma, W. Hönig, T. K. S. Kumar, N. Ayanian, and S. Koenig (2019b) Lifelong path planning with kinematic constraints for multi-agent pickup and delivery. In AAAI, pp. 7651–7658. Cited by: §1.
  • H. Ma, J. Li, T. K. S. Kumar, and S. Koenig (2017) Lifelong multi-agent path finding for online pickup and delivery tasks. In AAMAS, pp. 837–845. Cited by: §1, §2.2.
  • H. Ma, C. A. Tovey, G. Sharon, T. K. S. Kumar, and S. Koenig (2016) Multi-agent path finding with payload transfers and the package-exchange robot-routing problem. In AAAI, pp. 3166–3173. Cited by: §1.
  • O. Salzman and R. Stern (2020) Research challenges and opportunities in multi-agent path finding and multi-agent pickup and delivery problems. In AAMAS, pp. 1711–1715. Cited by: §1, §1.
  • G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant (2015) Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219, pp. 40–66. Cited by: §1.
  • G. Sharon, R. Stern, M. Goldenberg, and A. Felner (2013) The increasing cost tree search for optimal multi-agent pathfinding. Artificial Intelligence 195, pp. 470–495. Cited by: §1.
  • D. Silver (2005) Cooperative pathfinding. Artificial Intelligence and Interactive Digital Entertainment 1, pp. 117–122. Cited by: §2.1, §4.3.
  • R. Stern, N. R. Sturtevant, A. Felner, S. Koenig, H. Ma, T. T. Walker, J. Li, D. Atzmon, L. Cohen, T. S. Kumar, et al. (2019) Multi-agent pathfinding: definitions, variants, and benchmarks. In SoCS, pp. 151–158. Cited by: §1.
  • P. Surynek, A. Felner, R. Stern, and E. Boyarski (2016) Efficient SAT approach to multi-agent path finding under the sum of costs objective. In ECAI, pp. 810–818. Cited by: §1.
  • G. Wagner and H. Choset (2015) Subdimensional expansion for multirobot path planning. Artificial Intelligence 219, pp. 1–24. Cited by: §1.
  • J. Yu (2016) Intractability of optimal multirobot path planning on planar graphs. RA-L 1 (1), pp. 33–40. Cited by: §1.


We extend the explanation of the exRHCR toy example we used in Figure 2 by providing a visualization of how PBS constructs its PT and obtains a solution for the first W-MAPF query . Recall that and we have three agents (solid shapes) with three assigned tasks whose goal locations are shown as dashed shapes.

In Figure 8 we visualize the final structure of the PT of PBS after a solution is found. The high-level search begins in the root node  with the empty priority set. Figure 6(a) illustrates the solution paths obtained by the low-level search for node . In particular, each agent takes its shortest path without regard to the other agents as no priorities were specified. The paths are:

where for a pair the values represent the row and column numbers of a cell. As shown, the first collision occurs between agent and agent , at timestep in position (this position is visualized with a criss-cross red-green pattern denoting the agents in collision). The second collision occurs between agent and at timestep in position (marked by blue-red criss-cross). Due to the collision, the node is expanded into the two nodes and . As the first collision occurs between and , the priority is added to the priority set of , and to .
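The collision check and node expansion described above can be sketched as follows. This is an illustrative sketch only: it detects vertex collisions but not edge (swap) collisions, it assumes an agent waits at its last cell once its path ends, and the function names and the paths in the usage note are hypothetical (they are not the paths of the example in the figure).

```python
from itertools import combinations


def first_vertex_collision(paths):
    """Find the earliest vertex collision.

    paths: {agent: [(row, col), ...]}, indexed by timestep.
    Returns (timestep, cell, agent_a, agent_b) or None if collision-free.
    """
    horizon = max(len(path) for path in paths.values())
    for t in range(horizon):
        for a, b in combinations(sorted(paths), 2):
            # Assumption: an agent stays at its final cell after arriving.
            cell_a = paths[a][min(t, len(paths[a]) - 1)]
            cell_b = paths[b][min(t, len(paths[b]) - 1)]
            if cell_a == cell_b:
                return t, cell_a, a, b
    return None


def expand(priorities, a, b):
    """Priority sets of the two child nodes created for a collision.

    priorities: frozenset of (higher, lower) pairs of the parent node.
    """
    return priorities | {(a, b)}, priorities | {(b, a)}
```

For instance, with the hypothetical paths `{"a1": [(0, 0), (0, 1)], "a2": [(1, 1), (0, 1)]}`, the first collision is at timestep 1 in cell `(0, 1)`, and expanding the root yields one child in which `a1` has priority over `a2` and one in which `a2` has priority over `a1`.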

Next, we assume for the purpose of the example that DFS chooses for expansion (expanded nodes are marked with gray background) and executes the low-level search to find paths that abide by the priority set . As there are no specified priorities between and  their paths from  remain as-is, whereas the path of agent is updated and a wait action is added at position , as it needs to be executed while giving precedence to (see Figure 6(b)). The new path of is . The current state still contains a conflict between , as detailed before. Consequently, two child nodes and with the additional priorities and , respectively, are added to the PT.

In the next step of the high-level search, DFS picks for expansion, where , which induces the total priority order of . Correspondingly, a low-level solution can be defined to be

As there are no conflicts for the first timesteps, we have found a feasible solution (Figure 6(c)). Note that this solution is not unique: other priority sets can be used to solve this W-MAPF, and different paths can be obtained by the low-level search depending on tie breaks.