1 Introduction
One of the fundamental tasks for a team of mobile agents is planning trajectories for the agents so as to avoid collisions between them. The problem arises in a variety of application domains such as in the teams of autonomous robots or in air traffic management systems.
The problem of finding coordinated paths for a group of agents is studied since 1980s. In theory it is well known that finding even nonoptimal collisionfree paths from defined start positions to goal positions for a group of objects in restricted 2d space is PSPACEhard hopcroft and thus (unless P=PSPACE) there exist instances that cannot be solved in polynomial time.
Recent advancements in the field of multiagent path planning include methods that solve restricted variant of the cooperative path finding problem, where pointlike agents move using synchronous discrete steps on a graph. For this variant of the problem there exist complete polynomial algorithms such as BIBOX surynek or Push & Rotate dewilde , nonpolynomial (but often effective) optimal algorithms standley and anytime algorithms cap_2013_a . However, the graph abstraction used in this problem formulation is not suitable for systems where the size of the agents cannot be neglected.
Research in trajectory planning for multiple agents has been made in the field of decoupled and centralized continuous planning domains. Decoupled approaches often offer better efficiency, but lose completeness. Frequently used decoupled technique is prioritized planning erdmann . Here priorities are assigned to agents which specify the order in which their singleagent planning will take place. Each agent takes into account the plans of all agents with higher priority as moving obstacles. This method can be very efficient especially in uncluttered environments, but it is intrinsically incomplete. The method requires that the individual replanning takes place in the spacetime. It is relatively straightforward to perform such planning with forward search algorithms such as A*, but it remains unclear how should be such a planning efficiently done in continuous space e.g. using some of the samplingbased algorithms.
Little attention has been devoted so far to the problem of finding coordinated trajectories for agents in continuous environments with polygonal obstacles. A straightforward solution is to construct the joint state space of all agents and search such a space using one of the samplingbased methods for continuous path planning. Alternatively, one can resort to a reactive collision avoidance techniques that proved to be very efficient in practice. The most prominent representatives of the two mentioned approaches are RRT* and ORCA.
In 2011 Karaman and Frazzoli karaman published the RRT* algorithm, an anytime extension of RRT (rapidly exploring random tree lavalle ) that is probabilistically complete and asymptotically optimal. The algorithm is designed for continuous state spaces in which it can efficiently find a path from a given start state to a given goal region by incrementally building a tree that is rooted at the start state and spans towards randomly sampled states from some given state space. Once such a tree first reaches the goal region, we can follow its edges backwards to obtain the first feasible path from start state to the target region. With more and more samples being added to the tree, the solution incrementally improves.
Optimal reciprocal collision avoidance (ORCA) is one of the many methods based on velocity obstacle paradigm. It was proposed in 2011 by Jur van den Berg et al. vandenberg and demonstrated to be capable of generating collisionfree motion for a large number of agents in a cluttered workspace with local guarantees on optimality and safety. Even though ORCA is a powerful tool for solving multiagent collision avoidance, there are many instances of the problem that it is not able to solve due to the lack of cooperation and planning. An example of such an instance is any scenario where an agent has to make a long trip away from a narrow corridor to clear the way to another agent.
The goal of this paper is to analyze the problem of planning collisionfree continuous paths for multiple holonomic agents and evaluate the performance of the two approaches in 2d polygonal environments.
We study the applicability of the RRT* algorithm for multiagent path planning and compare it with ORCA – one of the most widely used techniques for the problem, which has been successfully used in several software and hardware multiagent implementations. Moreover, we propose a novel variant of RRT* algorithm that is specifically suited for multiagent path planning. The new ORCARRT* combines RRT* planning with reactive collision avoidance. Our experiments show that the combined algorithm is able to solve instances that were not solved by either of the techniques alone. Further, on many other instances the new algorithm provides higher quality solution that the other algorithms.
The rest of the paper is structured as follows. In Section 2, we state the multiagent path finding problem. Then, in Section 3 we describe the multiagent RRT* and introduce three variants of the algorithm that are applicable for multiagent path planning in continous environments. The paper is concluded by a largescale experimental analysis comparing the performance of the individual algorithms.
2 Problem statement
We define the multiagent path finding problem as follows. Consider agents operating in 2dimensional Euclidean space with polygonal obstacles. The starting positions of agents are given as tuple , where is the starting position of th agent. The tuple gives the agents’ destinations. We assume that the agents have discshaped bodies, where the radius of agent is denoted as . The final trajectory of th agent is a mapping of time to 2 dimensional Euclidean space of the agent, where . The time is the minimal time after which the th agent remains at its destination,
Let denote the regions of the space occupied by the obstacles. Then, the collisionfree property of the set of multiagent trajectories can be defined as
where is Euclidean distance in 2 dimensional space and denotes a disc of radius centered at . The task is to find tuple such that the agents never collide and the sum of times agents spend on the path to their final destinations is minimal. The problem statement is defined as follows:
3 RRT* for multiagent path finding
The RRT* algorithm was originally designed for singleagent motion planning, however, it can be adapted to solve multiagent path planning problems. In this section we will discuss how can be the RRT* algorithm leveraged in a multiagent setting.
We let the RRT* algorithm operate in the joint state space , where is the state space of the th agent. The initial state in the joint state space is , goal state is and any other sampled state is also tuple containing positions of all agents. The algorithm searches for a path from to , which can be then decomposed into a set trajectories for each individual agent . The pseudocode of RRT* algorithm for multiagent path planning is exposed in Algorithm 1.
The distance metric used in the algorithm is the sum of Euclidean distances of all agents:
where and .
Now, we provide definitions of the following RRT* primitive procedures:
Nearest Neighbor: Given a graph and a state , the function returns a vertex that is the closest to state in terms of the distance metric .
Near Vertices: Given a graph , a state and the numbers , the function returns a set , where , is a constant and is the dimension of the space .
Local Steering Procedure: Given two states and , a domainspecific local steering procedure returns true if the steering procedure is able to connect the state to the state . In the context of multiagent path finding, the steering procedure seeks for a set of paths such that and for each agent and all paths are mutually collisionfree. We consider three different methods for local steering which are discussed in detail in Section 3.1.
Find best parent: Given a graph , a set and a state , the function returns the best parent vertex for state , i.e. the vertex that yields the lowest cost of path .
Rewire: Given a graph , a set and a state the function examines all vertices from to see whether their cost can be improved by going through the new vertex . If there are any such vertices, the tree is rewired so that these vertices become children of , which allows them to improve the cost of their path from the initial state .
The main loop of the multiagent RRT* algorithm (see Algorithm 1) works as follows: Until interrupted, the algorithm samples states from the joint state space (line 4). Each such a random sample is used in an attempt to extend the tree. First, the nearest vertex from the tree to the random sample is determined (line 5). Then, the algorithm attempts to connect the vertex to the random sample using the local steering procedure (line 6). If the two points can be connected using the local steering procedure the algorithm proceeds to its optimizing part, otherwise it tries again with another sample. Then a set of vertices that are within a specified distance to the is determined (line 7). The set serves two purposes: Firstly it is used to find the best parent for the new vertex (line 8). Secondly, after the new vertex is added to the tree, the vertices from are rewired if it improves their cost (line 10).
Once the goal state^{1}^{1}1Here we assume that the point
is sampled with a certain nonzero probability (e.g. 1%) to overcome the need to define the goal as a region instead of a point.
is successfuly added to the tree, one can follow the links backwards to obtain a valid path from the state to the state . However, even after the first solution has been returned, the algorithm does not stop iterating. Instead, the algorithm continously extends and rewires the tree, which leads to incremental discovery of further higherquality paths.3.1 Multiagent Extensions
The crucial component influencing the performance of the RRT* algorithm is the local steering procedure. We consider a multiagent steering function of the following form:
where . The function is an extension function that returns a set of paths , where is a path for agent such that and . Note that the steering procedure returns true only if the trajectories generated by the extension function are 1) avoiding all static obstacles and 2) the trajectories are mutually collisionfree.
We study three methods that can serve as a valid extension function for the multiagent steering procedure: 1) Line extension connects individual agents using straight lines, 2) VisibilityGraph extension connects individual agents using the optimal singleagent path, and 3) ORCA extension generates joint path by simulating ORCA algorithm, where each agent tries to follow its singleagent optimal path.
3.1.1 Line Extension
The first considered extension method is a relatively straightforward extension of classical singleagent RRT* planner as exposed in karaman , which connects two states using straight line paths. In a multiagent setting, we connect each of the agents by a straight path:
where the is a function defined as
where denotes the maximal speed of the th agent. Observe that such a trajectory prescribes that the agent should move to the point along the straight line at maximum speed; when the point is reached, the agent stays there. Note that it can happen that a solution returned by this method is rejected by the steering procedure if a) any of the trajectories intersects some of the obstacles or b) some of the trajectories are in mutual collision.
3.1.2 Visibility Graph Extension
Another considered method uses the singleagent optimal paths (i.e. optimally avoids obstacles, but ignores interactions among the agents) between the points and . In 2d polygonal environments, one can efficiently find such a path by constructing a visibility graph and subsequently by finding the shortest path in such a graph. Let the shortest path in the visibility graph of th agent be represented as a sequence of edges . The resulting trajectory of agent will follow this path. The Visibility Graph extension function can be formally defined as
where is a function defined as
where function returns the source vertex of the edge . Note that the trajectories returned by this method are guaranteed to be obstacle avoiding, therefore, the steering procedure may only reject them if the resulting trajectories are in mutual collision.
3.1.3 ORCA Extension
The last considered extension method generates trajectories by following optimal singleagent paths together with Optimal Reciprocal Collision Avoidance (ORCA) technique, which is used for reactive collision avoidance between the agents. ORCA is a decentralized reactive collision avoidance method based on the velocityobstacle paradigm that performs optimization in the space of velocity vectors for each agent. During ORCA, each agent creates a velocity obstacle for every other agent based on their currently observed position and velocity. Each such a velocity obstacle cuts out a halfplane in the space of possible velocities for the agent. Given the agent’s desired velocity and the constraints induced by the other agents, a linear program with
constraints is constructed and solved to obtain the optimal velocity vector the agent should follow in the next time step vandenberg .To provide the desired velocity vector in each time step, we use the same visibility graph based navigation as in the visibility graph extension method. At each timestep, the agent finds the optimal path from its current position to its destination on the visibility graph and sets the desired velocity vector to point at this direction.
The ORCA extension function is defined as
where is a trajectory of th agent obtained by simulating the ORCA method with as start positions and as goal positions.
The trajectories returned by ORCA are guaranteed to be obstacle avoiding. They are also guaranteed to be mutually collisionfree. However, due to the reactive nature of the algorithm, the method is not guaranteed to find a solution if a solution exists. Since some of the executions may end up in infinite loops or deadlocks, it is not uncommon to see the agents being stuck at one point and never reach their designated target position. Therefore one has to bound the maximum number of timestep each ORCA simulation is allowed to perform.
Our hypothesis is that there is a significant number of problem instances that cannot be solved by ORCA alone, but that could be efficiently solved by multiagent RRT* with ORCA extensions.
An example of an artificial instance that was not solved by ORCA is in Figure 1. In such a scenario, the reactive technique tryies to resolve the conflict by letting both agents to go back in the corridor, but at some point it decides to return back to its desired directon, resulting in a deadlock situation.
4 Experimental analysis
In this section we experimentally evaluate features of proposed extension of multiagent RRT* algorithm. We compare three RRT* based algorithms  Line RRT*, Visibility Graph RRT* and ORCA RRT* with reactive ORCA. The experiments have been performed on four types of 2d testing environments – empty, door, cross and maze environments (see Figure 2). All environments are constrained by a fixed boundary having 1000x1000 dimension. The metrics for comparison has been focused on success rate of the algorithms and the quality of the solution for various environment settings. The measured parameters are:

idealistic solution cost is the cost of a solution for which the function in Equation 2 is relaxed in a way that it omits its first constraint i.e. permits collisions between agents. The idealistic (i.e. lower bound) solution cost is defined as
The goal arrival time is obtained by computing a singleagent optimal path for each agent using the visibility graph.

suboptimality measure gives indicative quality ratio by comparison of the given solution to the lower bound of the solution provided by idealistic solution cost. It shows how many times is the given solution worse than the idealistic solution. It is defined as
.

success rate shows the percentage of scenarios solved by the algorithms. The success rate depends on suboptimality in a way that a given solution is successful only if its suboptimality is lower than a defined threshold. If the algorithm does not provide any solution within time frame defined by experiment setting, it is automatically considered unsuccessful.
4.1 Benchmark set
The experiment has been performed on a benchmark set composed of four environments as depicted in Figure 2 with the number of agents varying from 2 to 10 and the agent body radii ranging from 50 to 100. For each combination of environment, number of agents and agent radius, the benchmark set contains 10 different settings of agents’ start and goal positions, altogether 2160 benchmark scenarios. Since RRT* is a stochastic algorithm, we run the algorithm five times with different random seeds for each problem instance. Altogether, the presented experimental results are based on 34560 runs.
The benchmark set was created by an algorithm that guarantees that for each problem instance there is exactly one collision cluster i.e. the path finding problem cannot be divided into mutually noncolliding groups of agents, which would be easier to solve. This is guaranteed by adding agent’s random start and goal positions iteratively only when a collision occurs between agent’s shortest path from start to goal and any other agent’s shortest path. The procedure that creates a problem instance is given in Algorithm 2.
An example of one such generated problem instance is depicted in Figure 3. It shows the maze environment with 10 agents of minimal radus and maximal radius and the corresponding optimal singleagent trajectories obtained by running the A* algorithm on a visibility graph.
In the next sections we will discuss the results of experimental evaluation from the perspective of the instance set coverage/success rate of the algorithms and the solution quality/suboptimality.
4.2 Success rate
We compare the success rate of four implemented algorithms with respect to different numbers of agents and different radii of agents. The limit on runtime of the tested algorithms has been set to 5 seconds. To illustrate the distribution of solution quality, we plot these graphs for different suboptimality thresholds:
We can observe three significant phenomena. Firstly, the success rate of RRT* based algorithms LineRRT* and Visibility GraphRRT* drops fast with increasing number of agents. This behavior was expected because the planning takes place in a state space exponential in the number of agents. These algorithms are therefore able to solve the defined problems for the number of agents up to approximately 5 in the studied scenarios. Note that RRT* is provably probabilistically complete karaman , but for some instances given the limited runtime the algorithm was not able to find even the first feasible solution. On the other hand the success rate of both RRT* algorithms is stable with changing threshold.
Second, the success rate of the ORCA reactive technique drops with the increasing radii of the agents. This can be partly explained by the existence of corridors in the test scenarios that with the increasing radii of the agents become "narrow" and therefore hard to solve locally. If the agents are small, then they can swap positions without leaving the corridor, which is easy to achieve using local collision avoidance. If the agents are large, then there will be corridors that can accommodate only one agent at a time, which requires longer term planning, where one agent keeps the way clear for the other agent to pass which is only achievable with planning approaches.
Further, we can observe that the solutions generated by ORCA are often of low quality (notice the difference between Figure 4(d) and 5(d)). This typically happens in crowded environments, where are the agents likely to get stuck in slowly evolving deadlock situations.
Third, the success rate of the ORCARRT* algorithm is close to one for both high number of agents and high agent radius. It drops only with the combination of high extremes of both parameters. This behavior is achieved by the combination of planning and reactive approaches. The planning component is able to solve the instances that require planning, while the reactive component is able to solve instances with higher numbers of agents.
Moreover, we can see that the success rate of ORCARRT* deteriorates much less rapidly with decreasing threshold than pure ORCA, which implies that ORCARRT* in general finds in the given runtime limit higher quality solutions than pure ORCA alone. Since the first extension in ORCARRT* is in fact identical to running pure ORCA, these results confirm that the algorithm exhibits incremental behavior, i.e. it improves the quality of the generated solution in time.
An important observation is that the there are instances that neither of the algorithms was able to solve on its own, but that got solved when the two algorithms were combined. For many other instances the combination of RRT* and ORCA provided a higherquality result within the given runtime limit than each of the algorithms alone.
4.3 Suboptimality
Figure 8 and 8 show the histograms of ranks assigned to algorithms for runtime limits 5 and 1 seconds. A rank from 1 to 4 is assigned to each algorithm for each experiment according to its solution suboptimality compared to other algorithms. If two algorithms achieve the same suboptimality, the ranks are assigned to them randomly. If an algorithm was not able to find any solution, its rank is 4. Rank 1 means that an algorithm achieved lowest suboptimality for particular problem instance. Difference between figures 8 and 8 shows how the ranks of the algorithms (e.g. solution quality – suboptimality) depend on the running time limit. We observe that VisibilityGraphRRT* algorithm achieved the worst ranks. LineRRT* is slightly better due to it’s ability to sample the state space very fast. ORCA algorithm achieved the second rank and ORCARRT* always achieved the first rank in the majority of problem instances. The difference between Figure 8 and 8 shows that while ORCA finds the best solution early, it does not benefit from added runtime. The variants of RRT* incrementally improve the solution and thus their performance is more dependent on the given runtime limit.
The results of the experiments confirm the ability of ORCARRT* to find higherquality solutions compared to both RRT* variants. As a complement to the best problem instance set coverage, the ORCARRT* is also dominant in terms of the quality of the returned solution.
5 Conclusions
In this paper we studied the problem of finding coordinated paths for holonomic agents in 2d polygonal environments. This problem is challenging due to its prohibitive complexity. We studied several RRT*based algorithms for multiagent coordinated path finding and a reactive approach ORCA. We found that while both approaches have limited coverage of the problem instance space, an approach combining planning and reactive technique benefits from both its parts, providing a better problem instance set coverage and higher solution quality.
We call the new algorithm ORCARRT*. While RRT*based algorithms often suffer from the exponential growth of the state space and thus are unable to solve instances with high number of agents, the reactive part of ORCARRT* is able to overcome this problem. On the other hand reactive techniques are often unable to solve problems containing local minima. Due to its RRT* planning part the ORCARRT* algorithm can avoid such local minima by random sampling of the state space.
ORCARRT* is an anytime algorithm, which can iteratively improve the provided solution. We experimented with several running time limits and examined the differences in the provided solutions. In the future work we plan to deploy the ORCARRT* on hardware agents and investigate possibility of the extension towards nonholonomic agents.
Acknowledgements
The presented work was supported by the Czech Republic Ministry of Education, Youth and Sports, grants no. 7H11102 (D3CoS) and no. LD12044, and by the ARTEMIS Joint Undertaking under the number 269336 (www.d3cos.eu).
References
 [1] B. de Wilde, A. W. ter Mors, and C. Witteveen. Push and rotate: cooperative multiagent path planning. AAMAS, 2013.
 [2] M. Erdmann and T. LozanoPerez. On multiple moving objects. Algorithmica, 2:477–521, 1987.
 [3] J. Hopcroft, J. Schwartz, and M. Sharir. On the complexity of motion planning for multiple independent objects; pspacehardness of the warehouseman’s problem. IJRR, 3(4):76–88, 1984.
 [4] Karaman and Frazzoli. Samplingbased algorithms for optimal motion planning. I. J. Robotic Res., 20(5):846–894, 2011.
 [5] S. M. LaValle and J. J. Kuffner. Randomized kinodynamic planning. I. J. Robotic Res., 20(5):378–400, 2001.
 [6] T. S. Standley. Finding optimal solutions to cooperative pathfinding problems. AAAI, 2010.
 [7] P. Surynek. A novel approach to path planning for multiple robots in biconnected graphs. In Proceedings of ICRA 2009, pages 3613–3619, 2009.
 [8] J. Van Den Berg, S. J. Guy, M. Lin, and D. Manocha. Reciprocal nbody collision avoidance. In Robotics research, pages 3–19. Springer, 2011.
 [9] M. Čáp, P. Novák, J. Vokřínek, and M. Pěchouček. Multiagent RRT*: Samplingbased cooperative pathfinding (extended abstract). In Proceedings of the AAMAS’13, 2013.