## 1. Introduction

The problem of planning collision-free routes for multiple mobile robots to their destinations can be modeled as a cooperative pathfinding problem. Traditionally, this problem is simulated in highly organized environments such as grids containing several obstacles and agents. The straightforward method for finding the agents' paths is to search a joint configuration space composed of the state spaces of the individual agents. Such a space is typically searched with a heuristic-guided algorithm such as A* [hart_formal_1968]. However, cooperative pathfinding has been proved to be PSPACE-hard [hopcroft_complexity_1984]. In 2005, David Silver [silver_cooperative_2005] introduced three decoupled approaches that decompose the problem into several single-agent navigation problems: Local Repair A* (LRA*), Hierarchical Cooperative A* (HCA*) and Windowed Hierarchical Cooperative A* (WHCA*). While these methods can find solutions efficiently, their completeness and optimality cannot be guaranteed.

In 2010, Standley [standley_finding_2010] proposed two techniques for the centralized approach, which takes account of all agents at once: Independence Detection (ID) and Operator Decomposition (OD). Their combination, the ID+OD algorithm, is capable of solving relatively large problems in milliseconds and is both complete and optimal. Standley later refined the algorithm into an anytime algorithm, Optimal Anytime (OA), which first finds a solution rapidly and then uses any spare time to improve that solution incrementally [standley_complete_2011]. While these algorithms offer high solution quality, their scalability is limited.

There have also been many attempts to solve the multi-agent path planning problem with sampling-based algorithms such as RRT [lavalle_randomized_2001], e.g. [kamio_random_2005]. These algorithms cannot guarantee convergence to the optimal solution, except [ferguson_anytime_2006] and [otte_any-com_2014], which, however, were not compared to the traditional techniques in terms of solution quality and scalability. After Karaman and Frazzoli introduced the asymptotically optimal RRT* algorithm [karaman_sampling-based_2011] in 2011, Čáp [cap_multi-agent_2013] married it to the classical multi-agent motion-planning formulation and proposed the Multi-agent RRT* (MA-RRT*) algorithm, which outperforms Standley's OA algorithm in success rate and runtime. While MA-RRT* can solve the multi-agent path planning problem efficiently, its scalability in relatively dense environments is limited: the GREEDY function cannot effectively guide the random tree around obstacles toward the goals at each timestep.

To employ MA-RRT* in a more complex world containing more obstacles, one can increase the algorithm's runtime to improve the chance of finding a solution. However, this neither guarantees that paths will be found in time nor ensures solution quality. To date, no prior work has enhanced the ability of MA-RRT* to find solutions in relatively dense environments. This work presents a new MA-RRT*-based algorithm, Multi-agent RRT* Potential Field (MA-RRT*PF), which employs a potential field in the heuristic function to guide the GREEDY search of the rapidly-exploring random tree.

The main contributions of this paper are as follows: 1) The proposed MA-RRT*PF scales better than MA-RRT*, and even than isMA-RRT* (the improved version of MA-RRT*), in relatively dense environments, while its solution quality is close to that of MA-RRT*. 2) The enhanced version of MA-RRT*PF, informed-sampling MA-RRT*PF (isMA-RRT*PF), performs very similarly to isMA-RRT* with respect to suboptimality, while scaling much better than isMA-RRT*, MA-RRT* and MA-RRT*PF in relatively dense environments.

## 2. Problem Formulation

To make a fair comparison with the MA-RRT* algorithm, which was evaluated on graphs, this paper tests all algorithms in a four-connected grid world and uses the following definition. Assume that n agents operate in a Euclidean space, and each agent, which occupies a single cell of the grid world, has a unique start location and destination. At each timestep, an agent can move to one of its four neighboring cells if that cell is free, or stay at its current location [standley_complete_2011]. A cell is free if it will not be occupied by an agent at the end of the timestep and does not contain an obstacle. The total number of timesteps an agent takes from its start state to its destination is the cost of that agent's path. If all agents reach their goals without collision, the sum of the individual path costs is taken as the cost of the final solution, which is the metric of solution quality.
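The grid-world model above can be sketched as follows; this is a minimal illustration of the movement rule and cost metric, with all names of our own choosing rather than the paper's:

```python
# Minimal sketch of the four-connected grid-world model: an agent may wait
# or move to a free 4-neighbor, and the solution cost is the sum over agents
# of the timesteps each agent takes.
from typing import List, Set, Tuple

Cell = Tuple[int, int]

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]  # wait, or step to a 4-neighbor

def successors(cell: Cell, size: int, obstacles: Set[Cell]) -> List[Cell]:
    """Cells an agent may occupy at the next timestep."""
    out = []
    for dx, dy in MOVES:
        nxt = (cell[0] + dx, cell[1] + dy)
        if 0 <= nxt[0] < size and 0 <= nxt[1] < size and nxt not in obstacles:
            out.append(nxt)
    return out

def solution_cost(paths: List[List[Cell]]) -> int:
    """Sum over agents of the number of timesteps from start to destination."""
    return sum(len(p) - 1 for p in paths)
```

Note that "free" here only checks static obstacles; the per-timestep occupancy condition from the definition is enforced by the collision check of the planner, not by this helper.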

## 3. Multi-agent RRT*

The multi-agent RRT* algorithm is based on the RRT* algorithm, which can expeditiously find a path from a specific start location to a given target region in a continuous state space by incrementally building a tree [karaman_sampling-based_2011]. Even after the first solution is found, RRT* continues to improve the solution quality by sampling new random states, which can lead to the discovery of a lower-cost path.

MA-RRT* inherits all the properties of RRT*, but it differs from RRT* in the grid world. The main difference is that in a continuous configuration space two nodes can be connected whenever they are mutually visible, whereas in the discrete space two nodes can only be connected if a heuristic search finds a valid path between them. Thus, MA-RRT* is essentially a graph version of RRT* (G-RRT*), except that it searches for the shortest path in a configuration space that represents the joint state of all agents [cap_multi-agent_2013]. Algorithm 1 shows the skeleton of the MA-RRT* algorithm, and the EXTEND and GREEDY procedures are shown in Algorithms 2 and 3 respectively.

MA-RRT* begins with a tree rooted at the joint initial state and repeatedly samples the free joint space, extending the tree toward each random sample. Through this process the tree spans the free space until the algorithm is interrupted. At each iteration, the SAMPLE routine randomly chooses a free state in the joint space. The EXTEND function then adds a new node to the tree (if the tree does not already contain it) by steering from the nearest tree node toward the sample, after which it searches the tree for nodes near the new node within a given radius and chooses among them the parent that gives the new node the lowest cost from the initial state. Finally, it updates the costs of the nearby nodes in the rewiring procedure, which checks whether any of them can decrease its total cost by adopting the new node as its parent.
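The choose-parent and rewire steps of EXTEND can be sketched as below. This is a hedged, simplified reconstruction of the standard RRT* extension (the tree is a parent/cost dictionary, and `steer` is collapsed to accepting the sample directly), not the paper's exact Algorithm 2:

```python
# Simplified RRT*-style EXTEND: pick the lowest-cost parent among nearby
# tree nodes for the new node, then rewire neighbors through it.
import math
from typing import Dict, Tuple

Node = Tuple[float, float]

def dist(a: Node, b: Node) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])

def extend(parent: Dict[Node, Node], cost: Dict[Node, float],
           x_rand: Node, near_radius: float) -> Node:
    x_nearest = min(cost, key=lambda v: dist(v, x_rand))
    x_new = x_rand  # a real implementation steers a bounded step toward x_rand
    if x_new in cost:
        return x_new
    # choose parent: the near node minimizing the cost-to-come of x_new
    near = [v for v in cost if dist(v, x_new) <= near_radius] or [x_nearest]
    best = min(near, key=lambda v: cost[v] + dist(v, x_new))
    parent[x_new] = best
    cost[x_new] = cost[best] + dist(best, x_new)
    # rewire: near nodes adopt x_new as parent if that lowers their cost
    for v in near:
        if cost[x_new] + dist(x_new, v) < cost[v]:
            parent[v], cost[v] = x_new, cost[x_new] + dist(x_new, v)
    return x_new
```

In MA-RRT* the `dist` call would additionally be replaced by the GREEDY local steer, since two joint states are connectable only if a valid path exists between them.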

The GREEDY function, shown in Algorithm 3, checks whether the local steer procedure can connect two states in the configuration space. In this function, the joint state x is decomposed into n single-agent states so that a candidate path can be searched for each agent separately. At each timestep, the algorithm steers all agents in a particular order from their start node s toward their destination d, relying solely on a heuristic-guided search that uses the Euclidean distance to the target as its metric. If all agents reach the target, or the cost of the paths exceeds a user-specified threshold, the local steer process returns the calculated paths of all agents. Next, the algorithm checks whether the routes generated for the agents collide. If they conflict, the algorithm returns the old paths created at the prior timestep; otherwise, it returns the newest paths as a series of joint transitional states (an edge of the tree) between s and d (if all agents reached the target) or between s and the currently reached node (if at least one agent did not reach its goal and the cost of the paths exceeded the threshold).
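A minimal sketch of the two ingredients described above, per-agent Euclidean-greedy steering with a cost bound, plus a collision check over the resulting paths. Function names and the vertex-only collision rule are our illustrative assumptions:

```python
# Per-agent greedy steer: at each step move to the in-bounds, obstacle-free
# 4-neighbor closest (Euclidean) to the goal, stopping at the goal or at a
# user-specified cost bound. Then check the agents' paths for collisions.
import math
from typing import List, Set, Tuple

Cell = Tuple[int, int]

def greedy_steer(start: Cell, goal: Cell, size: int,
                 obstacles: Set[Cell], max_cost: int) -> List[Cell]:
    path = [start]
    cur = start
    while cur != goal and len(path) - 1 < max_cost:
        nbrs = [(cur[0] + dx, cur[1] + dy) for dx, dy in ((1,0), (-1,0), (0,1), (0,-1))]
        nbrs = [c for c in nbrs
                if 0 <= c[0] < size and 0 <= c[1] < size and c not in obstacles]
        if not nbrs:  # boxed in: no legal move
            break
        cur = min(nbrs, key=lambda c: math.hypot(c[0] - goal[0], c[1] - goal[1]))
        path.append(cur)
    return path

def paths_collide(paths: List[List[Cell]]) -> bool:
    """Vertex-collision check: two agents in the same cell at the same timestep."""
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        cells = [p[min(t, len(p) - 1)] for p in paths]
        if len(set(cells)) < len(cells):
            return True
    return False
```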

The performance of MA-RRT* can be improved by sampling more frequently from regions that are more likely to contain high-quality solutions [cap_multi-agent_2013]. Therefore, the new SAMPLE routine, shown in Algorithm 5, samples a random state for each agent separately around that agent's optimal path and returns the joint state. This improved version is called informed-sampling MA-RRT* (isMA-RRT*); as shown in Algorithm 4, it first runs G-RRT* for every agent and then runs MA-RRT* for all agents together.
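The biased SAMPLE idea can be sketched as below; the noise model (a cell on the agent's precomputed path, perturbed within a `spread` box) is our illustrative assumption, not the paper's exact Algorithm 5:

```python
# Informed sampling: draw each agent's random state near its precomputed
# single-agent optimal path, and return the joint state.
import random
from typing import List, Tuple

Cell = Tuple[int, int]

def informed_sample(optimal_paths: List[List[Cell]], size: int,
                    spread: int = 1, rng: random.Random = random) -> List[Cell]:
    joint = []
    for path in optimal_paths:
        x, y = rng.choice(path)  # a cell on this agent's optimal path
        # perturb within the grid bounds so samples cluster around the path
        x = min(size - 1, max(0, x + rng.randint(-spread, spread)))
        y = min(size - 1, max(0, y + rng.randint(-spread, spread)))
        joint.append((x, y))
    return joint
```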

## 4. Multi-agent RRT* Potential Field

MA-RRT* is proved to be convergent, sound, complete and optimal in [cap_multi-agent_2013]. However, the GREEDY procedure, which guides the tree toward the goal location using a Euclidean-distance-based heuristic function, cannot avoid obstacles effectively.

The reason is that the heuristic function always chooses, from the child nodes of the node incorporated into the path at the prior timestep, the node closest to the target as the next point to be merged. At the next timestep, the same procedure may choose the previous node again if obstacles block the straight line from the current position to the target. Thus, the GREEDY search loops between the two points until the maximum cost is reached, losing the chance to connect the two given states through free space and reducing the exploration ability of the search tree.
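The looping failure can be reproduced on a made-up layout: with a wall directly between start and goal, a purely Euclidean greedy step bounces between two cells and never reaches the gap in the wall. The grid and wall below are our illustrative example:

```python
# A 4x4 grid with a vertical wall between start and goal; the only gap is at
# (1, 3). A pure Euclidean greedy step oscillates instead of detouring.
import math

obstacles = {(1, 0), (1, 1), (1, 2)}  # wall; (1, 3) is the only way through
goal = (2, 1)

def step(cur):
    nbrs = [(cur[0] + dx, cur[1] + dy) for dx, dy in ((1,0), (-1,0), (0,1), (0,-1))]
    nbrs = [c for c in nbrs if 0 <= c[0] < 4 and 0 <= c[1] < 4 and c not in obstacles]
    return min(nbrs, key=lambda c: math.hypot(c[0] - goal[0], c[1] - goal[1]))

path = [(0, 1)]
for _ in range(6):
    path.append(step(path[-1]))
# path now alternates between (0, 1) and (0, 2) and never reaches the gap.
```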

To deal with this problem, MA-RRT*PF is proposed, which takes the skeleton of MA-RRT* and modifies it with a potential field procedure inside the GREEDY function. The skeleton of MA-RRT*PF is shown in Algorithm 6. Like MA-RRT*, the tree grows from an initial state, and the SAMPLE procedure draws a random state in the joint configuration space. The tree then spans toward the sample from its nearest node, chooses the best parent among the nearby nodes, and updates their costs in the EXTEND function, which is shown in Algorithm 7.

In the GREEDY function that guides the random tree toward the random samples, shown in Algorithm 8, a G-value is introduced to form a potential field on a motion graph that contains all obstacles and all agents' start and goal locations. On this map, obstacles are marked with G-value 1, start and goal locations with -1, and free cells are initially set to 0.

At the beginning of the GREEDY function, the motion graph is decomposed into n graphs, one per agent. Each agent's motion graph is identical to the original map in size and G-values, but instead of recording all start and goal locations it contains only that agent's own start point and target. As in the GREEDY function of MA-RRT*, the joint state of all agents is broken down into n single-agent states to steer each agent from the start node s to the goal node d. The core difference is that the next node is selected using both the Euclidean-distance function and the G-value function: if all agents find a path between s and d, the algorithm returns the paths of all agents when they are collision-free, and otherwise returns the paths found at the prior timestep. As shown in Algorithm 8, the distance function calculates the Euclidean distance from a node to the goal, and the G-value function returns the G-value of a position on the map.

At the end of each timestep, the G-value of the point that an agent has just incorporated into its path is increased by 1, while the others remain the same, forming a potential field on the map. This potential field guides the agent toward areas that have been explored infrequently in the single-agent configuration space. Thus, the distance function favors the nodes closest to the target, while the G-value function favors the nodes that have rarely been explored; the heuristic chooses a child node of the current node that satisfies both conditions to incorporate into the path. As a result, the algorithm can quickly break the loop described above in the GREEDY function of MA-RRT*, and MA-RRT*PF can extend the search tree widely through free space by connecting most states, even when obstacles block the straight line between them.
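A hedged reconstruction of the potential-field-guided step: among the successors, prefer the lowest G-value and break ties by Euclidean distance, then raise the G-value of the chosen cell. The exact way Algorithm 8 combines the two functions is not fully specified here, so the lexicographic ordering below is our assumption:

```python
# Potential-field-guided step: pick the successor with the lowest (G-value,
# Euclidean distance to goal) pair, then increment its G-value so repeated
# visits become less attractive and the oscillation loop is broken.
import math
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]

def pf_step(cur: Cell, goal: Cell, size: int, obstacles: Set[Cell],
            gval: Dict[Cell, float]) -> Cell:
    nbrs = [(cur[0] + dx, cur[1] + dy) for dx, dy in ((1,0), (-1,0), (0,1), (0,-1))]
    nbrs = [c for c in nbrs if 0 <= c[0] < size and 0 <= c[1] < size
            and c not in obstacles]
    nxt = min(nbrs, key=lambda c: (gval.get(c, 0.0),
                                   math.hypot(c[0] - goal[0], c[1] - goal[1])))
    gval[nxt] = gval.get(nxt, 0.0) + 1  # incorporated into the path: raise potential
    return nxt
```

On the walled example used earlier, repeatedly applying `pf_step` accumulates potential on the revisited cells, pushes the walk out to the gap in the wall, and reaches the goal, which pure Euclidean greedy never does.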

However, like MA-RRT*, the MA-RRT*PF algorithm samples states uniformly in the agents' joint configuration space, which leads to a relatively low convergence rate. To speed up MA-RRT*PF in finding solutions, we adopt the idea from isMA-RRT*. In the improved version, the algorithm first runs G-RRT* for every single agent to find optimal paths and then runs MA-RRT*PF for all agents together with biased sampling, which draws states near the single-agent optimal paths. This algorithm is called informed-sampling MA-RRT*PF (isMA-RRT*PF) and is shown in Algorithm 9; its SAMPLE procedure is presented in Algorithm 10.

## 5. Experiments and Results

This paper compares the capability of MA-RRT*, MA-RRT*PF, isMA-RRT* and isMA-RRT*PF in terms of scalability and suboptimality. In the sampling procedure, all four algorithms choose the final goal state as the new random sample with a user-specified probability p to speed up the spanning toward the target. All experiments were performed in MATLAB 2018a (64-bit) within a common program framework and run on an Intel Core i7-8700K 3.7 GHz CPU. To make a fair comparison between the four algorithms, this paper uses the problem instance set of [cap_multi-agent_2013], described as follows, to evaluate their capability. The agents run in a square grid world in which each agent occupies a single cell. At each timestep, an agent can wait on its cell or move to one of the four neighboring cells if it is free. To construct relatively dense environments, 25 percent of the cells were removed to represent obstacles. A unique start location and destination were selected randomly for every agent.
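The goal-biased sampling shared by all four algorithms can be sketched as below; p is from the text, while the routine body is an illustrative assumption:

```python
# Goal-biased SAMPLE: with probability p return the joint goal state,
# otherwise draw a uniform obstacle-free cell for each agent.
import random
from typing import List, Set, Tuple

Cell = Tuple[int, int]

def sample(goal: List[Cell], size: int, obstacles: Set[Cell],
           p: float, rng: random.Random = random) -> List[Cell]:
    if rng.random() < p:
        return list(goal)
    joint = []
    for _ in goal:
        while True:  # rejection-sample a free cell
            c = (rng.randrange(size), rng.randrange(size))
            if c not in obstacles:
                joint.append(c)
                break
    return joint
```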

The problem instance set varied two parameters: the grid size (10x10, 30x30, 50x50, 70x70, 90x90) and the number of agents (1 through 10), the same as in [cap_multi-agent_2013]. Every grid size was combined with every number of agents, and for each combination this paper randomly generated 120 instances, so the experiment contained 6000 different problem instances in total. All algorithms were run on the same instance set, and the runtime for each instance was limited to 5 seconds. The results are plotted in Figure 1 and Figure 2.

In Figure 1, the x-axis is the index of the instances sorted by the runtime needed to find the first valid solution, and the y-axis is that runtime; the ordering can differ between algorithms. The last x-position of each performance curve indicates the number of instances solved within 5 seconds. MA-RRT* solved 19% of the instances, MA-RRT*PF 58%, isMA-RRT* 34% and isMA-RRT*PF 80%.

The relative solution quality is shown in Figure 2, which compares the suboptimality of all algorithms in terms of the first returned solution and the best solution found within the 5-second runtime limit. The suboptimality is calculated by the following formula:
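Following the metric of [cap_multi-agent_2013] (a reconstruction; $c_{\mathrm{lb}}$, the sum of each agent's individually optimal path cost ignoring the other agents, is a lower bound on the optimal cost):

$$\mathrm{suboptimality} = \frac{c_{\mathrm{sol}} - c_{\mathrm{lb}}}{c_{\mathrm{lb}}}$$

where $c_{\mathrm{sol}}$ is the cost of the solution returned by the algorithm.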

As shown in Figure 2, MA-RRT*PF and isMA-RRT*PF have suboptimality similar to that of MA-RRT* and isMA-RRT* respectively.

## 6. Conclusion and Future Work

This paper proposed MA-RRT*PF, an anytime algorithm that offers a practical and straightforward solution to the multi-agent path planning problem in relatively dense environments. Unlike MA-RRT*, whose random tree is often blocked by obstacles, our technique employs a potential field procedure in the GREEDY function to guide the rapidly-exploring random tree to avoid obstacles efficiently and reach the goals. We compared the capability of our algorithm with MA-RRT* and isMA-RRT*. The experimental results show that MA-RRT*PF performs much better than MA-RRT*, and even isMA-RRT*, in terms of scalability while still maintaining solution quality. In addition, the improved version, isMA-RRT*PF, has the highest success rate (80%) among the four algorithms, while its solution quality is very close to that of isMA-RRT*. Thus, MA-RRT*PF and isMA-RRT*PF are more practical than MA-RRT* and isMA-RRT* for most multi-agent path planning problems in relatively dense environments.

This paper simulated the algorithm on a motion graph, in which states in the tree are connected by a valid path. However, the algorithm can also be extended to continuous space by using a straight-line visibility check in place of the GREEDY function.

In the future, we will continue to improve the scalability of MA-RRT*PF in relatively dense environments by employing different heuristic functions.

The authors would like to thank Dr. Kaigui and Dr. Dongming for their guidance. This work is supported by the National Natural Science Foundation of China under Grant No. 61662083.