## 1 Introduction

The multi-agent path finding (MAPF) problem consists of a graph $G=(V,E)$ and a set of $m$ agents $A=\{a_1,\dots,a_m\}$. Time is discretized into time steps. The arrangement of agents at time step $t$ is denoted as $\alpha_t$. Each agent $a_i$ has a start position $\alpha_0(a_i)\in V$ and a goal position $\alpha_+(a_i)\in V$. At each time step an agent can either move to an adjacent location or wait in its current location. The task is to find a sequence of move/wait actions for each agent $a_i$, moving it from $\alpha_0(a_i)$ to $\alpha_+(a_i)$ such that agents do not conflict, i.e., do not occupy the same location at the same time. Formally, a MAPF instance is a tuple $\Sigma=(G,A,\alpha_0,\alpha_+)$. A solution for $\Sigma$ is a sequence of arrangements $\alpha_0,\alpha_1,\dots,\alpha_\mu$ such that $\alpha_{t+1}$ results from valid movements from $\alpha_t$ for $t=0,1,\dots,\mu-1$, and $\alpha_\mu=\alpha_+$.

MAPF has practical applications in video games, traffic control, robotics, and other domains (see [Sharon et al.2015] for a survey). The scope of this paper is limited to the setting of fully cooperative agents that are centrally controlled. MAPF is usually solved with the aim of minimizing one of two commonly used global cumulative cost functions: (1) Sum-of-costs (denoted $\xi$) is the summation, over all agents, of the number of time steps required to reach the goal location [Dresner and Stone2008, Standley2010, Sharon et al.2013, Sharon et al.2015]. Formally, $\xi=\sum_{i=1}^{m}\xi(a_i)$, where $\xi(a_i)$ is the individual path cost of agent $a_i$. (2) Makespan (denoted $\mu$) is the time until the last agent reaches its destination (i.e., the maximum of the individual costs) [Surynek2010, Surynek2014, Surynek2015].
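The definitions above can be made concrete with a small checker. The following is a minimal sketch, assuming the graph is given as an adjacency dictionary and each agent's plan as a list of vertices indexed by time step; the function names are illustrative and not taken from any solver discussed here. Only the conflict stated above (two agents in the same location at the same time) is checked.

```python
from typing import Dict, Hashable, List, Set

Vertex = Hashable
Path = List[Vertex]  # path[t] is the agent's location at time step t


def pad(path: Path, length: int) -> Path:
    """Extend a path by waiting at its final vertex up to `length` steps."""
    return path + [path[-1]] * (length - len(path))


def is_valid(adj: Dict[Vertex, Set[Vertex]], paths: List[Path]) -> bool:
    """Every step is a wait or a move along an edge, and no two agents
    occupy the same vertex at the same time step."""
    horizon = max(len(p) for p in paths)
    padded = [pad(p, horizon) for p in paths]
    for p in padded:
        for u, v in zip(p, p[1:]):
            if u != v and v not in adj[u]:
                return False
    for t in range(horizon):
        occupied = [p[t] for p in padded]
        if len(set(occupied)) != len(occupied):
            return False
    return True


def individual_cost(path: Path) -> int:
    """Time step at which the agent reaches its final vertex and stays."""
    t = len(path) - 1
    while t > 0 and path[t - 1] == path[-1]:
        t -= 1
    return t


def sum_of_costs(paths: List[Path]) -> int:
    return sum(individual_cost(p) for p in paths)


def makespan(paths: List[Path]) -> int:
    return max(individual_cost(p) for p in paths)


if __name__ == "__main__":
    # 2x2 grid: 0-1, 0-2, 1-3, 2-3; two agents swap opposite corners.
    adj = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
    plans = [[0, 1, 3], [3, 2, 0]]
    print(is_valid(adj, plans), sum_of_costs(plans), makespan(plans))
```

Trailing waits at the goal are not counted in the individual cost, which matches the reading of cost as the number of time steps required to reach the goal.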

Optimal solvers for MAPF can be divided into two classes. (1) Search-based solvers. These algorithms consider MAPF as a graph search problem. Some of these algorithms are variants of the A* algorithm that search in a global search space consisting of all different ways to place agents into vertices, one agent per vertex [Standley2010, Wagner and Choset2015]. Other algorithms such as Icts [Sharon et al.2013] and Cbs [Sharon et al.2015, Boyarski et al.2015] search different search spaces and employ novel (non-A*) search trees. All these search-based solvers were originally designed for the sum-of-costs MAPF variant, but with simple modifications they can be adapted to the makespan variant. (2) Reduction-based solvers. By contrast, many recent optimal solvers reduce MAPF to known problems such as CSP [Ryan2010], SAT [Surynek2012], Integer Linear Programming [Yu and LaValle2013a] and Answer Set Programming [Erdem et al.2013]. While most reduction-based solvers address the makespan variant, an optimal reduction-based solver for the sum-of-costs variant was recently introduced [Surynek et al.2016b]. In this paper we further widen this direction and introduce SAT-based suboptimal solvers.

Finding optimal solutions for both variants is NP-hard [Yu and LaValle2013b, Surynek2010], as the state space grows exponentially with $m$ (the number of agents). Therefore, many suboptimal solvers were developed. Some suboptimal solvers aim to quickly find paths for all agents while paying no attention to the quality of the solution, i.e., how far it is from the optimal solution. We refer to such algorithms as any solution MAPF solvers. Many any solution MAPF solvers were proposed [Ryan2010, Cohen et al.2015, Silver2005, Botea and Surynek2015, Sajid et al.2012], and there is even a polynomial time any solution MAPF solver. These algorithms are usually used when $m$ is large, and some of them are not complete.

In some cases, the user might ask for some guarantee on the quality of the solution returned. A common type of such a requirement is that the solution found is bounded suboptimal, i.e., that its cost is at most $(1+\epsilon)\cdot\xi^*$, where $\xi^*$ is the cost of the optimal solution and $\epsilon$ is a parameter that sets the desired amount of suboptimality, sometimes called the error. A solver that returns bounded-suboptimal solutions is referred to as a bounded-suboptimal algorithm or, more specifically, $(1+\epsilon)$-bounded suboptimal.

Despite the large number of papers devoted to optimal or to suboptimal solutions, we are only aware of two approaches that provide bounded suboptimal solutions: Ecbs [Barer et al.2014] and Cbs with highways [Cohen et al.2015]. Both are modifications of the conflict based search (Cbs) algorithm. In this paper we introduce two new SAT-based solvers: uMdd-Sat, an any solution MAPF solver, and eMdd-Sat, a bounded-suboptimal MAPF solver. We experimentally compare our new SAT solvers with relevant any solution or bounded-suboptimal algorithms and show that our SAT solvers are comparable to, and in many circumstances outperform, the other algorithms.

## 2 Background: Optimal SAT-based Solver

The suboptimal algorithms presented in this paper are based on a SAT-based optimal MAPF algorithm for the sum-of-costs variant (called Mdd-Sat), which was introduced in [Surynek et al.2016a]. The main idea in Mdd-Sat is to convert the optimization problem (finding the minimal sum-of-costs) to a sequence of decision problems: is there a solution of a given sum-of-costs $\xi$? A formula $\mathcal{F}(\xi)$ has been introduced such that $\mathcal{F}(\xi)$ is satisfiable if and only if there is a solution of sum-of-costs $\xi$. We now provide the details about $\mathcal{F}(\xi)$ that are needed for the rest of this paper. More information on this formula and its exact variables can be found in [Surynek et al.2016a].

Let $\xi_0(a_i)$ be the cost of the shortest individual path for agent $a_i$ (ignoring collisions with the other agents), and let $\xi_0=\sum_{i=1}^{m}\xi_0(a_i)$. $\xi_0$ is called the sum of individual costs (SIC) [Sharon et al.2013]. It is a known admissible heuristic for optimal sum-of-costs search algorithms, since it is a lower bound on the minimal sum-of-costs. $\xi_0$ is calculated by relaxing the problem: the other agents are omitted and a single-agent shortest path problem is solved for each agent. Similarly, we define $\mu_0=\max_{i=1,\dots,m}\xi_0(a_i)$. $\mu_0$ is the length of the longest of the shortest individual paths and is thus a lower bound on the minimal makespan. $\mathcal{F}(\xi)$ is built on top of the following understanding about the maximal makespan ($\mu$) of solutions with sum-of-costs $\xi$. Let $\Delta=\xi-\xi_0$.

###### Proposition 1

If a solution with sum-of-costs $\xi$ exists then its makespan is at most $\mu_0+\Delta$.

Proof outline [Surynek et al.2016a]: Clearly, if there is a solution of cost $\xi_0$ then its makespan will be no greater than $\mu_0$. But we want a solution of cost $\xi$, which is $\xi_0$ plus some $\Delta$. In the worst case all the $\Delta$ extra moves belong to the agent with the largest shortest path. Thus, the resulting path of that agent would have length $\mu_0+\Delta$, as required.
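As an illustration of the two lower bounds and of the makespan bound from Proposition 1, the sketch below computes the sum of individual costs (written `xi0`) and the longest individual shortest path (written `mu0`) by breadth-first search on an unweighted graph. The adjacency-dictionary representation and all function names are assumptions made for this example.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple

Vertex = Hashable


def bfs_dist(adj: Dict[Vertex, Set[Vertex]], source: Vertex) -> Dict[Vertex, int]:
    """Shortest-path distances from `source`, ignoring all other agents."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def lower_bounds(adj: Dict[Vertex, Set[Vertex]],
                 starts: List[Vertex],
                 goals: List[Vertex]) -> Tuple[int, int]:
    """Return (xi0, mu0): sum and maximum of the individual shortest paths."""
    costs = [bfs_dist(adj, s)[g] for s, g in zip(starts, goals)]
    return sum(costs), max(costs)


def makespan_bound(mu0: int, xi0: int, xi: int) -> int:
    """Upper bound on the makespan of any solution with sum-of-costs xi."""
    return mu0 + (xi - xi0)


if __name__ == "__main__":
    # Path graph 0-1-2-3 with two agents.
    adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    xi0, mu0 = lower_bounds(adj, starts=[0, 3], goals=[3, 2])
    print(xi0, mu0, makespan_bound(mu0, xi0, xi0 + 2))
```

A KeyError from `bfs_dist(adj, s)[g]` signals that a goal is unreachable from its start even when the other agents are ignored.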

Based on Proposition 1, $\mathcal{F}(\xi)$ is constructed by generating a time expansion graph [Surynek et al.2016a] (denoted TEG) with $\mu=\mu_0+\Delta$ layers. A TEG is a directed acyclic graph (DAG) in which the set of vertices of the underlying graph is duplicated for all time steps from 0 up to the desired number of layers ($\mu$). Possible actions (move along edges or wait) are represented as directed edges between successive time steps. Formally, a TEG with $\mu$ layers is defined as follows:

###### Definition 1

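A time expansion graph of depth $\mu$ is a digraph $(V',E')$ where $V'=\{u^t \mid u\in V,\ t=0,1,\dots,\mu\}$ and $E'=\{(u^t,v^{t+1}) \mid t=0,1,\dots,\mu-1,\ \{u,v\}\in E \text{ or } u=v\}$.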

Figure 1 illustrates a TEG with 4 layers. $\mathcal{F}(\xi)$ has a propositional variable for every pair of agent and edge in the TEG. Setting this variable to TRUE represents that the edge is traversed by that agent. Thus, an assignment to these variables represents a solution to the MAPF problem. Appropriate constraints are added to $\mathcal{F}(\xi)$ to make sure the solution is valid.
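The time expansion graph can be sketched in a few lines. The code below is an illustrative construction under the assumption that the underlying graph is an adjacency dictionary and that a TEG vertex is represented as a pair `(vertex, time)`; it does not reproduce the encoding of any particular solver.

```python
from typing import Dict, Hashable, Set, Tuple

Vertex = Hashable
TimedVertex = Tuple[Vertex, int]


def time_expansion_graph(adj: Dict[Vertex, Set[Vertex]], mu: int):
    """Copy every vertex for time steps 0..mu and connect successive
    time steps by wait edges and by move edges along the original graph."""
    vertices: Set[TimedVertex] = {(v, t) for v in adj for t in range(mu + 1)}
    edges: Set[Tuple[TimedVertex, TimedVertex]] = set()
    for t in range(mu):
        for u in adj:
            edges.add(((u, t), (u, t + 1)))      # wait at u
            for v in adj[u]:
                edges.add(((u, t), (v, t + 1)))  # move from u to v
    return vertices, edges


if __name__ == "__main__":
    adj = {0: {1}, 1: {0, 2}, 2: {1}}
    vertices, edges = time_expansion_graph(adj, mu=3)
    print(len(vertices), len(edges))
```

Because every edge goes from time step `t` to `t + 1`, the result is acyclic by construction. A SAT encoding would then introduce one propositional variable per (agent, edge) pair over this edge set.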

To ensure that a satisfying assignment of $\mathcal{F}(\xi)$ represents a solution with sum-of-costs not exceeding $\xi$, we add a cardinality constraint over these agent-edge variables. A cardinality constraint allows counting the variables set to TRUE in a formula and, in general, bounding a numeric cost. The SAT literature offers several techniques for encoding a cardinality constraint [Bailleux and Boufkhad2003, Silva and Lynce2007]. Formally, for a bound $\lambda$ and a set of propositional variables $X=\{x_1,\dots,x_k\}$, the cardinality constraint $\leq_\lambda\{x_1,\dots,x_k\}$ is satisfied iff the number of variables from the set that are set to TRUE is $\leq\lambda$. In fact, for each agent $a_i$ we use the cardinality constraint to bound only the number of edges the agent traverses in addition to its first $\xi_0(a_i)$ edges, so that the cardinality constraint ensures that the number of such extra moves is at most $\Delta$. This is done by modifying the TEG, marking some edges as standard and others as extra (see Figure 1). We do so for efficiency reasons, following Surynek et al. (2016).
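To make the notion of a cardinality constraint concrete, the following sketch generates CNF clauses for "at most `k` of the variables `xs` are TRUE" using a common textbook sequential-counter scheme with auxiliary counter variables. It is shown for illustration only: the encoding actually used inside the solver may differ, and in this textbook form the number of auxiliary variables grows with both the number of inputs and the bound. Literals are signed integers in the DIMACS convention; `at_most_k` and `satisfiable_given` are hypothetical helper names.

```python
import itertools
from typing import Dict, List


def at_most_k(xs: List[int], k: int, next_var: int) -> List[List[int]]:
    """CNF clauses forcing at most k of the variables xs to be TRUE.
    Auxiliary variables are numbered from next_var upwards."""
    n = len(xs)
    if k >= n:
        return []                      # constraint is trivially satisfied
    if k == 0:
        return [[-x] for x in xs]      # every variable must be FALSE
    # s[i][j] is TRUE when at least j+1 of xs[0..i] are TRUE.
    s = [[next_var + i * k + j for j in range(k)] for i in range(n - 1)]
    clauses = [[-xs[0], s[0][0]]]
    clauses += [[-s[0][j]] for j in range(1, k)]
    for i in range(1, n - 1):
        clauses.append([-xs[i], s[i][0]])
        clauses.append([-s[i - 1][0], s[i][0]])
        for j in range(1, k):
            clauses.append([-xs[i], -s[i - 1][j - 1], s[i][j]])
            clauses.append([-s[i - 1][j], s[i][j]])
        clauses.append([-xs[i], -s[i - 1][k - 1]])   # would exceed k
    clauses.append([-xs[n - 1], -s[n - 2][k - 1]])   # would exceed k
    return clauses


def satisfiable_given(clauses: List[List[int]], fixed: Dict[int, bool]) -> bool:
    """Brute-force check: can the remaining variables be set so that all
    clauses hold? Only meant for validating tiny encodings."""
    free = sorted({abs(l) for c in clauses for l in c} - set(fixed))
    for bits in itertools.product([False, True], repeat=len(free)):
        val = dict(fixed)
        val.update(zip(free, bits))
        if all(any(val[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False


if __name__ == "__main__":
    cnf = at_most_k([1, 2, 3], k=1, next_var=4)
    print(satisfiable_given(cnf, {1: True, 2: False, 3: False}))
    print(satisfiable_given(cnf, {1: True, 2: True, 3: False}))
```

In a real pipeline the clause list would be handed to an external SAT solver together with the clauses that model valid movements; the brute-force checker is only there to validate the encoding on small inputs.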

Algorithm 1 summarizes the Mdd-Sat algorithm. $\Delta$ is initialized as zero and is increased in every iteration. In each iteration, $\xi$ is set to $\xi_0+\Delta$ and $\mu$ to $\mu_0+\Delta$. Next, the formula $\mathcal{F}(\xi)$ is built, representing the decision problem of asking whether there is a solution with sum-of-costs $\xi$ and makespan $\mu$. A SAT solver is tasked to check if $\mathcal{F}(\xi)$ is satisfiable. If it is, the corresponding solution is returned. Otherwise $\Delta$, and consequently $\xi$ and $\mu$, are incremented by 1 and another iteration of building $\mathcal{F}(\xi)$ and running the SAT solver is started. This algorithm is complete for solvable instances but cannot detect unsolvability. However, unsolvability can be detected in polynomial time using other algorithms [Kornhauser et al.1984].
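The iterative scheme just described can be sketched as a short loop. In this sketch the formula construction and the SAT call are hidden behind a hypothetical callback `solve(mu, extra_bound)`, which is assumed to return a solution if one exists within `mu` time steps using at most `extra_bound` extra moves, and `None` otherwise. The cap `max_delta` is an addition for this example so that the loop terminates on unsolvable inputs.

```python
from typing import Callable, Optional, Tuple, TypeVar

Solution = TypeVar("Solution")


def mdd_sat(xi0: int,
            mu0: int,
            solve: Callable[[int, int], Optional[Solution]],
            max_delta: int = 1000) -> Optional[Tuple[int, Solution]]:
    """Increase delta until the decision problem becomes satisfiable.
    Returns (sum_of_costs, solution) or None if max_delta is exhausted."""
    for delta in range(max_delta + 1):
        mu = mu0 + delta                 # makespan bound from Proposition 1
        solution = solve(mu, delta)      # at most delta extra moves allowed
        if solution is not None:
            return xi0 + delta, solution
    return None


if __name__ == "__main__":
    # Toy oracle: the only solution needs 3 extra moves and 6 time steps.
    def toy_solve(mu: int, extra_bound: int) -> Optional[str]:
        return "plan" if extra_bound >= 3 and mu >= 6 else None

    print(mdd_sat(xi0=10, mu0=4, solve=toy_solve))
```

Since every smaller value of delta was refuted before the first satisfiable call, the returned sum-of-costs is optimal under the stated assumption about `solve`.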

## 3 From Optimal to Suboptimal Solver

To convert Mdd-Sat (Algorithm 1) to a suboptimal any solution algorithm, we simply remove the cardinality constraints from the construction of $\mathcal{F}(\xi)$. Let $\mathcal{H}(\xi)$ denote the resulting formula. Since $\mathcal{H}(\xi)$ has all the constraints in $\mathcal{F}(\xi)$ except the cardinality constraints, a satisfying assignment to $\mathcal{H}(\xi)$ clearly still represents a feasible solution (no collisions between agents etc.). Since $\mathcal{H}(\xi)$ is less constrained than $\mathcal{F}(\xi)$, we expect it to be solved faster. Indeed, we observed this in our preliminary experiments. Using $\mathcal{H}(\xi)$ in Algorithm 1 instead of $\mathcal{F}(\xi)$ loses, however, sum-of-costs optimality.

Hence, replacing $\mathcal{F}(\xi)$ with $\mathcal{H}(\xi)$ in Algorithm 1 leads to a suboptimal version of the Mdd-Sat solver that is faster than the optimal version. We refer to this unbounded version of Mdd-Sat as uMdd-Sat. A key question is: what is the suboptimality of the solutions uMdd-Sat returns? Is it really unbounded? We show later that even without the cardinality constraints, the suboptimality of the returned solutions is bounded, due to how $\mathcal{H}(\xi)$ is constructed. Next, we show how to control the suboptimality of the returned solution by introducing a relaxed version of the optimal cardinality constraints, allowing the algorithm's user to balance runtime and suboptimality.

### 3.1 Bounded Suboptimal SAT-based Solver

The key to our bounded-suboptimal SAT-based solver is that it modifies the parameter $\Delta$ used in the construction of $\mathcal{F}(\xi)$. In Mdd-Sat, $\Delta$ is incremented by one in every iteration. Making this parameter less restrictive, that is, replacing $\Delta$ with $\Delta+\Delta'$, where $\Delta'\geq 0$ is an integer value, produces a formula of the same size but representing more solutions. (The change from $\Delta$ to $\Delta+\Delta'$ does not affect the number of clauses that represent the cardinality constraint, because we coded the cardinality constraints using a sequential counter, whose size is proportional to the number of propositional variables involved but not to the value of the bound [Sinz2005].) Since $\xi_0+\Delta\leq\xi_0+\Delta+\Delta'$, we expect a formula with the sum-of-costs bounded by $\xi_0+\Delta+\Delta'$ to be easier to solve than that with the original bound $\xi_0+\Delta$.

The following proposition shows that for a solvable MAPF instance the sum-of-costs of the solution obtained by the above process differs from the optimal one by at most $\Delta'$. Let us denote the formula constructed for a TEG with $\mu$ layers (representing a makespan of $\mu$) and cardinality parameter $\Delta$ as $\mathcal{F}(\mu,\Delta)$.

###### Proposition 2

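Let $\Delta'$ be a non-negative integer and let $\mathcal{F}(\mu_0+\Delta,\Delta+\Delta')$ be the first satisfiable formula encountered in the sequence of formulae $\mathcal{F}(\mu_0,\Delta')$, $\mathcal{F}(\mu_0+1,1+\Delta')$, …, $\mathcal{F}(\mu_0+\Delta-1,\Delta-1+\Delta')$, $\mathcal{F}(\mu_0+\Delta,\Delta+\Delta')$. Then the solution represented by $\mathcal{F}(\mu_0+\Delta,\Delta+\Delta')$ has sum-of-costs $\xi\leq\xi^*+\Delta'$, where $\xi^*$ is the optimal sum-of-costs for $\Sigma$.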

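Proof: Formula $\mathcal{F}(\mu_0+\Delta-1,\Delta-1+\Delta')$ in the penultimate iteration was not solvable. This means that no solution of makespan at most $\mu_0+\Delta-1$ and sum-of-costs at most $\xi_0+\Delta-1+\Delta'$ exists. But we also know that all solutions of sum-of-costs $\xi_0+\Delta-1$ fit under the makespan of at most $\mu_0+\Delta-1$. Hence unsolvability of formula $\mathcal{F}(\mu_0+\Delta-1,\Delta-1+\Delta')$ together with $\Delta'\geq 0$ implies that there is no solution of sum-of-costs $\xi_0+\Delta-1$ at all. Therefore, the optimal sum-of-costs is at least $\xi_0+\Delta$. The solvability of $\mathcal{F}(\mu_0+\Delta,\Delta+\Delta')$ tells us that there is a solution of sum-of-costs at most $\xi_0+\Delta+\Delta'$, which differs from the optimum by at most $\Delta'$.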

Observe that the only property of $\Delta'$ we used was that it is a non-negative integer; there is no requirement that it must be constant across individual iterations of the algorithm. Proposition 2 holds even if we use a non-negative $\Delta'$ that is a function of $\Delta$ instead of a constant. This property can be used to modify the above SAT-based framework to a $(1+\epsilon)$-bounded suboptimal algorithm.

###### Corollary 1

Given an error $\epsilon>0$, the iterative SAT-based suboptimal framework can be modified to a $(1+\epsilon)$-bounded suboptimal algorithm by appropriate setting of $\Delta'$.

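Proof: Let $\Delta'=\lfloor\epsilon\cdot(\xi_0+\Delta)\rfloor$. Hence the sum-of-costs of the solution returned by the algorithm is at most $\xi_0+\Delta+\Delta'\leq(1+\epsilon)\cdot(\xi_0+\Delta)$ while the optimum is at least $\xi_0+\Delta$; hence the ratio between the sum-of-costs of the returned solution and the sum-of-costs of the optimal one is at most $1+\epsilon$.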

The pseudo-code of the $(1+\epsilon)$-bounded suboptimal SAT-based algorithm is presented as Algorithm 2. We refer to this algorithm as eMdd-Sat.
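A minimal sketch of the bounded-suboptimal loop follows. It assumes that the relaxation in iteration `delta` is set to `floor(epsilon * (xi0 + delta))`, which is one setting consistent with the ratio argument of Corollary 1 (here `xi0` is the sum of individual shortest-path costs and `mu0` the longest individual shortest path). As before, `solve(mu, extra_bound)` is a hypothetical callback that hides formula construction and the SAT call, and `max_delta` is an addition for this example.

```python
import math
from typing import Callable, Optional, Tuple, TypeVar

Solution = TypeVar("Solution")


def emdd_sat(xi0: int,
             mu0: int,
             epsilon: float,
             solve: Callable[[int, int], Optional[Solution]],
             max_delta: int = 1000) -> Optional[Tuple[int, int, Solution]]:
    """Iterate delta upwards, relaxing the cardinality bound in each step.
    Returns (lower_bound_on_optimum, upper_bound_on_cost, solution)."""
    for delta in range(max_delta + 1):
        lower = xi0 + delta                      # optimum is at least this
        relax = math.floor(epsilon * lower)      # assumed relaxation rule
        solution = solve(mu0 + delta, delta + relax)
        if solution is not None:
            return lower, lower + relax, solution
    return None


if __name__ == "__main__":
    # Toy oracle: the only solution needs 3 extra moves and 6 time steps.
    def toy_solve(mu: int, extra_bound: int) -> Optional[str]:
        return "plan" if extra_bound >= 3 and mu >= 6 else None

    print(emdd_sat(xi0=10, mu0=4, epsilon=0.1, solve=toy_solve))
    print(emdd_sat(xi0=10, mu0=4, epsilon=0.0, solve=toy_solve))
```

With `epsilon = 0` the relaxation vanishes and the loop behaves like the optimal scheme; larger values let the loop stop in an earlier iteration at the price of a looser cost guarantee.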

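Note that a further minor improvement of the pseudo-code is possible, which exploits the original optimization of the formula. Observe that in any solution of makespan $\mu$ to a MAPF problem with $m$ agents, the sum-of-costs is at most $m\cdot\mu$. Therefore, if $m\cdot\mu\leq(1+\epsilon)\cdot(\xi_0+\Delta)$, then there is no need to add any cardinality constraints to the formula, as the sum-of-costs of the solution is guaranteed to be bounded by $(1+\epsilon)\cdot(\xi_0+\Delta)$.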

This inequality represents a limit on the degree of relaxation achievable by allowing more freedom over the cost bound imposed by the cardinality constraint. Hence the $(1+\epsilon)$-bounded suboptimal SAT-based algorithm tends to be near optimal anyway. More precisely, the algorithm is effectively $\min\{1+\epsilon,\ m\cdot\mu/(\xi_0+\Delta)\}$-bounded in the worst case.

## 4 Experimental Evaluation

We performed a large set of experiments to evaluate uMdd-Sat and eMdd-Sat, our suggested any solution and bounded suboptimal versions of Mdd-Sat. We used various 4-connected grids as the underlying graphs.

(i) The first set of small densely populated instances consisted of grids of sizes 8x8, 16x16, and 32x32 with nodes occupied by obstacles. To obtain instances of various difficulties the number of agents was varied from 1 to 32, 1 to 128, and 1 to 256 in the case of 8x8, 16x16, and 32x32 grids respectively (the step was varied from 1 in the range of small units of agents to 16 in the range of hundreds of agents). Ten random instances were generated for each number of agents by randomly choosing an initial position and then performing a random walk to set the target position.

(ii) Instances of the second testing set are based on three structurally different large maps taken from Sturtevant's repository [Sturtevant2012]. These are Dragon Age: Origins (DAO) maps denoted as brc202d, den520d, and ost003d, which are a standard benchmark for MAPF (see Figure 2). Again the number of agents was varied from 1 to 256 to obtain instances of various difficulties (the step ranged from 1 to 16) and 10 random instances were generated for each number of agents.

All tests were run on a machine with an Intel i7 3.2 GHz CPU and 8 GB RAM under Ubuntu Linux 15 and Windows 10 respectively. The timeout for all solvers was set to 500 seconds.

### 4.1 Evaluation of the Unbounded Case

In this section we evaluate the performance of uMdd-Sat, our any solution suboptimal SAT-based solver. We compared uMdd-Sat with two suboptimal algorithms that are by design unbounded: Push-and-Swap [Luna and Bekris2011, de Wilde et al.2014], which is a polynomial time rule-based algorithm, and UniRobot [Surynek2015], which is a SAT-based algorithm that reduces MAPF with agents to a problem of finding vertex disjoint paths [Seymour1980]. We also compared uMdd-Sat against Ecbs [Barer et al.2014], a state-of-the-art bounded-suboptimal algorithm that is based on the Cbs MAPF solver. To make the comparison with unbounded MAPF solvers fair, we set the suboptimality bound of Ecbs to a very large number (500).

Runtime results of the experiments with the unbounded versions on small grids and DAO maps are shown in Figures 3 and 4 respectively. Runtimes for all testing instances that were below the limit of 500 seconds were sorted and shown in the figure (the x-axis corresponds to ordering of instances according to increasing runtime and the y-axis corresponds to runtime in seconds). The intuitive understanding of this presentation is that the faster algorithm has its line in the lower part of the figure.

Consider first the runtime results for the 16x16 and 32x32 grids (Figure 3). Push-and-Swap is the fastest algorithm in these small grids and UniRobot turned out to be the worst performing algorithm. The comparison of Ecbs and uMdd-Sat shows that in the easier instances (those sorted on the left-hand side of the $x$-axis) Ecbs is faster, while for the harder instances (those on the right-hand side of the $x$-axis) uMdd-Sat performs better.

Now consider the runtime results on the DAO maps (Figure 4), which are much larger than the aforementioned grids. Here too, UniRobot turned out to be the worst performing algorithm, and is in general not applicable on large DAO maps. The bottom-right plot in Figure 4 shows the number of instances solved by the remaining algorithms as a function of the runtime; that is, for a given value $x$ the value $y$ shows the number of instances solved within $x$ seconds. All three algorithms (Ecbs, Push-and-Swap, and uMdd-Sat) managed to solve all instances within the time limit, but uMdd-Sat was somewhat slower than the other two. Considering the other plots in Figure 4, we can conclude that in this domain Ecbs was in general faster.
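The two presentations used in these runtime figures can be derived from raw per-instance runtimes as sketched below. The sketch assumes that a timed-out run is recorded as `None`; the function names are illustrative only.

```python
from typing import List, Optional


def sorted_runtime_curve(runtimes: List[Optional[float]],
                         limit: float = 500.0) -> List[float]:
    """Runtimes of instances solved below the limit, in increasing order.
    Plotting the i-th value at x = i gives the sorted-runtime curve."""
    return sorted(r for r in runtimes if r is not None and r < limit)


def solved_within(runtimes: List[Optional[float]], seconds: float) -> int:
    """Number of instances whose runtime does not exceed `seconds`."""
    return sum(1 for r in runtimes if r is not None and r <= seconds)


if __name__ == "__main__":
    runs = [3.0, None, 0.5, 600.0, 12.0]
    print(sorted_runtime_curve(runs))
    print([solved_within(runs, x) for x in (1, 10, 100)])
```

In the sorted-runtime view a lower curve indicates a faster algorithm, while in the solved-within view a higher curve does.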

While the compared algorithms do not provide a bound on the sum-of-costs of their solutions, it may still be of interest in practice. We observed that the sum-of-costs of the solutions found by the tested algorithms were significantly different from the optimum and from each other. Figure 5 shows the sum-of-costs of the solutions found for the DAO instances. This presentation is similar to the plots in the previous figures, but here the instances are sorted according to their sum-of-costs. The interpretation is the same: lower curves correspond to lower sum-of-costs. The results show that both UniRobot and Push-and-Swap generate worse solutions than Ecbs and uMdd-Sat. The quality of the solutions returned by Ecbs and uMdd-Sat is comparable, with slightly better solutions found by Ecbs in some cases.

Altogether we can conclude that for the unbounded suboptimal case uMdd-Sat is a reasonable option: perhaps not always the fastest or the one with the lowest sum-of-costs, but comparable to the state of the art. This is encouraging, especially since, if SAT solvers continue to become better, the performance of SAT-based algorithms such as uMdd-Sat will continue to improve.

### 4.2 Evaluation of the Bounded Case

Next, we conducted experiments to evaluate eMdd-Sat, our bounded-suboptimal Mdd-Sat variant. Here we only compared against Ecbs, as the other algorithms (Push-and-Swap and UniRobot) do not guarantee a bounded-suboptimal solution. The first set of experiments evaluates the behavior of both algorithms for different values of $\epsilon$, i.e., for different required suboptimality bounds. The same set of instances used for the unbounded experiments was also used here.

First, we measured the success rate of each algorithm, which is the ratio of successfully solved instances under a predetermined time limit. The time limit in our experiments was 500 seconds. Figure 6 shows the algorithms' success rate ($y$-axis) as a function of the required suboptimality bound ($x$-axis), which ranges from 1.1 to 1.0. Results for 32x32 grids with 100 agents and ost003d with 200 agents are shown. It can be observed that eMdd-Sat is better than Ecbs for suboptimality bounds closer to optimal, outperforming Ecbs starting at bound and lower. For the 32x32 grids, which are more dense than the DAO map, the advantage of eMdd-Sat begins even earlier, again highlighting the advantage of SAT-based algorithms in harder problems. Next, we focus our evaluation on bound , to focus on the cases where eMdd-Sat is effective.
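The success rate as defined here is a simple ratio. A minimal sketch, assuming timed-out runs are recorded as `None` and that a run counts as solved when it finishes within the limit:

```python
from typing import List, Optional


def success_rate(runtimes: List[Optional[float]], limit: float = 500.0) -> float:
    """Fraction of instances solved within the time limit."""
    if not runtimes:
        raise ValueError("success rate is undefined for an empty instance set")
    solved = sum(1 for r in runtimes if r is not None and r <= limit)
    return solved / len(runtimes)


if __name__ == "__main__":
    print(success_rate([12.5, 430.0, None, 3.1]))
```

Computing this value once per suboptimality bound and per algorithm yields the curves of a success-rate plot.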

Results for this bound on small grids and DAO maps are presented in Figures 7 and 8. Here we can observe that eMdd-Sat tends to be faster in all small grids for the harder problems. In our analysis (results not shown due to space limitations), we observed that these were the cases with a higher density of agents.

Results for DAO maps indicate that in easier instances containing fewer agents Ecbs is faster. However, with increasing difficulty of instances and density of agents the gap in performance narrows until eMdd-Sat starts to perform better in harder instances. This trend is most visible on the ost003d map.

Let us note that the maximum suboptimality bound achievable by relaxing the cardinality constraint within the suboptimal eMdd-Sat approach for DAO maps is: for brc202d, for den520d, and for ost003d, all cases with 200 agents. Setting these or greater bounds in eMdd-Sat is equivalent to complete removal of the cardinality constraint. That is, it is equivalent to running uMdd-Sat.

## 5 Conclusions

The SAT-based approach represented by eMdd-Sat has the advantage of using the learning mechanism built into the external SAT solver. On the other hand, search-based methods represented by Ecbs are specially designed for solving MAPF and do not bring the overhead of a general purpose SAT solver. We attribute the good performance of the eMdd-Sat approach to the clause learning mechanism.

This conclusion corresponds with the fact that the advantage of eMdd-Sat appears in harder instances with long runs of the SAT solver, where the clause learning mechanism has enough time to prune the search space efficiently. On the other hand, the SAT-based approach has the overhead of building the formula and communicating with the external solver, which negatively affects performance in sparsely occupied instances.

One possible future research direction is to integrate a learning mechanism into a specialized MAPF solver, which would eliminate the overhead of using the external SAT solver. Vertices represented within layers of the MDD can be regarded as values of multi-valued decision variables representing positions of agents at individual time steps. A learning mechanism over such finite domain variables would be very similar to nogood recording known from modern CSP solvers [Dechter2003]. Reasoning about MAPF in the context of nogood recording and CSP would open the door to higher level constraint propagation than that offered by SAT's unit propagation. Lastly, we believe that this work will open the way to developing bounded-suboptimal SAT-based algorithms for other planning problems.

## 6 Acknowledgements

This paper is supported by a project commissioned by the New Energy and Industrial Technology Development Organization Japan (NEDO) and the joint grant of the Israel Ministry of Science and the Czech Ministry of Education Youth and Sports number 8G15027.

## References

- [Bailleux and Boufkhad2003] O. Bailleux and Y. Boufkhad. Efficient CNF encoding of boolean cardinality constraints. In CP, pages 108–122, 2003.
- [Barer et al.2014] Max Barer, Guni Sharon, Roni Stern, and Ariel Felner. Suboptimal variants of the conflict-based search algorithm for the multi-agent pathfinding problem. In Symposium on Combinatorial Search (SoCS), 2014.
- [Botea and Surynek2015] A. Botea and P. Surynek. Multi-agent path finding on strongly biconnected digraphs. In AAAI, pages 2024–2030, 2015.
- [Boyarski et al.2015] E. Boyarski, A. Felner, R. Stern, G. Sharon, D. Tolpin, O. Betzalel, and S. Shimony. ICBS: improved conflict-based search algorithm for multi-agent pathfinding. In IJCAI, pages 740–746, 2015.
- [Cohen et al.2015] L. Cohen, T. Uras, and S. Koenig. Feasibility study: Using highways for bounded-suboptimal MAPF. In SoCS, pages 2–8, 2015.
- [de Wilde et al.2014] B. de Wilde, A. ter Mors, and C. Witteveen. Push and rotate: a complete multi-agent pathfinding algorithm. JAIR, 51:443–492, 2014.
- [Dechter2003] Rina Dechter. Constraint processing. Elsevier Morgan Kaufmann, 2003.
- [Dresner and Stone2008] K. Dresner and P. Stone. A multiagent approach to autonomous intersection management. JAIR, 31:591–656, 2008.
- [Erdem et al.2013] E. Erdem, D. G. Kisa, U. Oztok, and P. Schueller. A general formal framework for pathfinding problems with multiple agents. In AAAI, 2013.
- [Kornhauser et al.1984] D. Kornhauser, G. Miller, and P. Spirakis. Coordinating pebble motion on graphs, the diameter of permutation groups, and applications. In FoCS, pages 241–250, 1984.
- [Luna and Bekris2011] R. Luna and K. E. Bekris. Push and swap: Fast cooperative path-finding with completeness guarantees. In IJCAI, pages 294–300, 2011.
- [Ryan2010] M. Ryan. Constraint-based multi-robot path planning. In ICRA, pages 922–928, 2010.
- [Sajid et al.2012] Qandeel Sajid, Ryan Luna, and Kostas Bekris. Multi-agent pathfinding with simultaneous execution of single-agent primitives. In SOCS, 2012.
- [Seymour1980] P. D. Seymour. Disjoint paths in graphs. Discrete Mathematics, 29(3):293–309, 1980.
- [Sharon et al.2013] G. Sharon, R. Stern, M. Goldenberg, and A. Felner. The increasing cost tree search for optimal multi-agent pathfinding. Artif. Intell., 195:470–495, 2013.
- [Sharon et al.2015] G. Sharon, R. Stern, A. Felner, and N. R. Sturtevant. Conflict-based search for optimal multi-agent pathfinding. Artif. Intell., 219:40–66, 2015.
- [Silva and Lynce2007] J. Silva and I. Lynce. Towards robust CNF encodings of cardinality constraints. In CP, pages 483–497, 2007.
- [Silver2005] D. Silver. Cooperative pathfinding. In AIIDE, pages 117–122, 2005.
- [Sinz2005] C. Sinz. Towards an optimal CNF encoding of boolean cardinality constraints. In CP, 2005.
- [Standley2010] T. Standley. Finding optimal solutions to cooperative pathfinding problems. In AAAI, pages 173–178, 2010.
- [Sturtevant2012] Nathan R. Sturtevant. Benchmarks for grid-based pathfinding. Computational Intelligence and AI in Games, 4(2):144–148, 2012.
- [Surynek et al.2016a] Pavel Surynek, Ariel Felner, Roni Stern, and Eli Boyarski. Efficient SAT approach to multi-agent path finding under the sum of costs objective. In ECAI, pages 810–818, 2016.
- [Surynek et al.2016b] Pavel Surynek, Ariel Felner, Roni Stern, and Eli Boyarski. An empirical comparison of the hardness of multi-agent path finding under the makespan and the sum of costs objectives. In Symposium on Combinatorial Search (SoCS), 2016.
- [Surynek2010] P. Surynek. An optimization variant of multi-robot path planning is intractable. In AAAI, 2010.
- [Surynek2012] P. Surynek. Towards optimal cooperative path planning in hard setups through satisfiability solving. In PRICAI, pages 564–576, 2012.
- [Surynek2014] P. Surynek. Compact representations of cooperative path-finding as SAT based on matchings in bipartite graphs. In ICTAI, pages 875–882, 2014.
- [Surynek2015] P. Surynek. Reduced time-expansion graphs and goal decomposition for solving cooperative path finding sub-optimally. In IJCAI, pages 1916–1922, 2015.
- [Wagner and Choset2015] G. Wagner and H. Choset. Subdimensional expansion for multirobot path planning. Artif. Intell., 219:1–24, 2015.
- [Yu and LaValle2013a] J. Yu and S. LaValle. Planning optimal paths for multiple robots on graphs. In ICRA, pages 3612–3617, 2013.
- [Yu and LaValle2013b] J. Yu and S. M. LaValle. Structure and intractability of optimal multi-robot path planning on graphs. In AAAI, 2013.