multiagent_planning
Implementation of several multiagent trajectory generation algorithms
This paper introduces a novel algorithm for multiagent offline trajectory generation based on distributed model predictive control (DMPC). By predicting future states and sharing this information with their neighbours, the agents are able to detect and avoid collisions while moving towards their goals. The proposed algorithm computes transition trajectories for dozens of vehicles in a few seconds. It reduces the computation time by more than 85% compared to previous optimization approaches based on sequential convex programming (SCP), while causing only a small impact on the optimality of the plans. We replaced the previous compatibility constraints in DMPC, which limit the motion of the agents in order to avoid collisions, by relaxing the collision constraints and enforcing them only when required. The approach was validated both through extensive simulations for a wide range of randomly generated transitions and with teams of up to 25 quadrotors flying in confined indoor spaces.
Generating collision-free trajectories when dealing with multiagent systems is a safety-critical task. In missions that require cooperation of multiple agents, such as crop inspection [1] and warehouse management [2], it is desirable to safely drive the agents from their current locations to a set of final positions. Solving this task, known as multiagent point-to-point transition, is therefore an integral component of any robust multiagent system.
There are two main variations of the multiagent point-to-point transition problem: the labelled and the unlabelled agent problem. In the former, each agent has a fixed final position that cannot be exchanged with other agents [3, 4]; in the latter, the algorithm is free to assign the goals to the agents, so as to reduce the complexity of the transition problem [5]. This paper will focus on the labelled agent problem.
A common approach is to formulate this as an optimization problem. One of the first techniques developed relied on Mixed Integer Linear Programming (MILP), modelling collision constraints as binary variables
[3]. This method, although viable, is computationally expensive and not suited for large groups of agents. More recently, Sequential Convex Programming (SCP) [6] has been used for faster computation compared to MILP. In [4], SCP is used to compute energy-optimal trajectories for quadrotor teams. Although useful for small teams, the algorithm does not scale well with the number of agents. A decoupled version of that algorithm is proposed in [7, 8], which provides better scalability at the cost of suboptimal solutions. However, the required decoupling leads to a sequential greedy strategy (i.e., turning agent trajectories previously solved for into obstacles for subsequent agents) that decreases the success rate as the number of agents increases.
Discrete approaches divide the space in a grid and use known discrete search algorithms [9], but limit the initial and final locations to be vertices of the underlying grid. Other mixed approaches combine optimization techniques and predefined behaviours to manage collisions in 2D [10].
Distributed optimization approaches can effectively include pair-wise distance constraints [11]. Furthermore, the computational effort is distributed among the agents and therefore reduced compared to centralized approaches. Optimal reciprocal collision avoidance (ORCA) leverages velocity obstacles to guarantee collision-free trajectories for holonomic [12] and non-holonomic [13] agents. While provably safe, the method may be conservative by assuming a constant velocity profile over a time horizon, effectively limiting the motion of the agents. Distributed model predictive control (DMPC) [14] has been applied in coordination tasks such as formation control [15, 16], but not explicitly for point-to-point transitions. Particularly interesting are synchronous implementations of DMPC [17], where the agents simultaneously update their predictions, further reducing execution time thanks to parallel computing.
Both ORCA and DMPC require explicit coordination of the agents in order to guarantee collision avoidance. In ORCA, the agents are aware of the collision avoidance policies of nearby neighbours, and coordinate to share the responsibility of avoiding the collision. In DMPC, a compatibility constraint is usually introduced to limit the position deviation between prediction updates [18]. In this paper, we empirically demonstrate that, in moderately large swarms, agents without these explicit coordination schemes still exhibit high collision avoidance capabilities. Furthermore, the removal of the explicit coordination has the potential of reducing algorithm complexity and conservative behaviour of the agents.
The key contributions of this paper are three-fold: we present a fast and parallelizable DMPC algorithm for multiagent point-to-point transitions, replace previous conservative compatibility constraints in DMPC by relaxing the collision constraints, and demonstrate the efficacy of the method with quadrotor transitions in high-density scenarios.
The rest of the paper is organized as follows: Section II states the problem. Section III introduces the optimization formulation to solve it. The algorithm is presented in Section IV and demonstrated in simulation (Section V) and experiments with a swarm of quadrotors (Section VI).
The goal is to generate collision-free trajectories that drive agents from initial to final locations within a given amount of time, subject to state and actuation constraints. We aim to generate such trajectories offline and execute them with our experimental platform, the Crazyflie 2.0 quadrotor.
The agents are modeled as unit masses in ℝ³, with double-integrator dynamics. This is a simplified model of a quadrotor equipped with an underlying position controller. We use p_i[k], v_i[k], a_i[k] ∈ ℝ³ to represent the discretized position, velocity and acceleration of agent i at time step k, where accelerations are the inputs. With a discretization step h, the dynamic equations are given by
(1)  p_i[k+1] = p_i[k] + h v_i[k] + (h²/2) a_i[k]
(2)  v_i[k+1] = v_i[k] + h a_i[k]
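As a concrete illustration, the update equations above can be written as a short propagation routine. Python/NumPy is used for illustration only (the paper's implementations were in MATLAB and C++), and the time-step value is an assumption, not taken from the paper:

```python
import numpy as np

# Sketch of the discretized double-integrator model (Eqs. 1-2).
# p, v, a are 3D position, velocity and acceleration vectors.
H = 0.2  # discretization step h [s] -- illustrative value, not from the paper

def step(p, v, a, h=H):
    """Propagate one agent one time step under constant acceleration a."""
    p_next = p + h * v + 0.5 * h**2 * a
    v_next = v + h * a
    return p_next, v_next
```

Iterating this map with the optimized input sequence is exactly how the generated trajectory is rolled out in an offline setting.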
We constrain the motion of the agents to match the physical world. First, the agents have limited actuation, which bounds their maximum acceleration,
(3)  a_min ≤ a_i[k] ≤ a_max
Secondly, the agents must remain within a volume (e.g., an indoor flying arena). We impose:
(4)  p_min ≤ p_i[k] ≤ p_max
The collision avoidance constraint is designed such that the agents safely traverse the environment. In the case of quadrotors, aerodynamic effects from neighbouring agents may lead to crashes. Thus, we model the collision boundary for each agent as an ellipsoid elongated along the vertical axis, to capture the downwash effect of the agents’ propellers, similar to [9]. The collision constraint between agents i and j is defined using a scaling matrix Θ,
(5)  ‖Θ^{-1}(p_i[k] − p_j[k])‖_n ≥ r_min
where n is the degree of the ellipsoid (n = 2 is a usual choice) and r_min is the minimum distance between agents in the XY plane. The scaling matrix Θ is defined as
(6)  Θ = diag(a, b, c)
We choose a = b = 1 and c > 1, elongating the boundary along the vertical axis. Thus, the required minimum distance in the vertical axis is c·r_min.
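A minimal sketch of this ellipsoidal check, assuming n = 2 and illustrative values for r_min and the scaling matrix (the paper's actual parameter values are not reproduced here):

```python
import numpy as np

# Sketch of the ellipsoidal collision constraint (Eq. 5), assuming n = 2
# and illustrative values a = b = 1, c = 2 for the scaling matrix.
R_MIN = 0.30                       # minimum XY distance r_min [m] -- illustrative
THETA = np.diag([1.0, 1.0, 2.0])   # ellipsoid stretched along the vertical axis

def collision_free(p_i, p_j, theta=THETA, r_min=R_MIN):
    """True if ||Theta^{-1} (p_i - p_j)|| >= r_min."""
    return np.linalg.norm(np.linalg.solve(theta, p_i - p_j)) >= r_min
```

With these values, a purely vertical separation must exceed c·r_min = 0.6 m, while a purely horizontal one only needs to exceed r_min = 0.3 m.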
The problem formulated in Sec. II can be naturally posed as an optimization problem. In single-agent standard model predictive control (MPC), an optimization problem is solved at each time step, which finds an optimal input sequence over a given prediction horizon based on a model that describes the agent’s dynamics. The first input of that sequence is applied to the real system and the resulting state is measured, which is the starting point for the next optimization problem. In an offline planning scenario such as ours, we do not measure the agent’s state after applying an input (since there is no physical agent yet); instead, we apply the input directly to the model to compute the next step of the generated trajectory. The same procedure is repeated until the whole trajectory is generated. This methodology can be applied in a distributed fashion, where each agent executes the iterative optimization to generate trajectories, but with the possibility of sharing information with neighbouring agents.
Our approach is based on synchronous DMPC, where the agents share their previously predicted state sequence with their neighbours before simultaneously solving the next optimization problems. At every discrete-time index t, each agent i simultaneously computes a new input sequence over the horizon following these steps:
Check for future collisions using the latest predicted states of the neighbours, computed at the previous time step.
Build the optimization problem, including state and actuation constraints, and collision constraints if we detect collisions.
After obtaining the next optimal sequence, the first element is applied to the model and the agents move one step ahead. Future states are predicted over the horizon and shared with other agents.
The process is repeated until all agents reach their desired goals. Below we formalize the mathematical setup of the optimization problem.
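The three steps above can be sketched as a synchronous round in which every agent reads only the predictions shared at the previous time step; all function and variable names here are illustrative, not the paper's:

```python
# Skeleton of one synchronous DMPC round. The QP solve, collision detection
# and state propagation are abstracted away as callables; `predictions` maps
# agent id -> data predicted at the previous time step.
def dmpc_round(agents, predictions, solve_qp, detect_collisions, propagate):
    new_predictions = {}
    for i in agents:  # this loop can run in parallel: no data dependency
        neighbours = {j: predictions[j] for j in agents if j != i}
        colliding = detect_collisions(predictions[i], neighbours)  # step 1
        U = solve_qp(i, colliding, neighbours)                     # step 2
        new_predictions[i] = propagate(i, U)                       # step 3
    return new_predictions  # shared by all agents before the next round
```

Because `new_predictions` is only published after every agent has solved, all agents within a round see the same (one-step-old) information, which is what makes the scheme synchronous and parallelizable.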
Using the dynamics in (1) and (2), we can develop a linear model to express the agents’ states over a horizon of fixed length K. First we introduce the notation x[k|t], which represents the predicted value of x[k] with the information available at time step t. In what follows, k is the discrete-time index of the prediction horizon. The dynamic model of agent i, with state x_i = [p_i; v_i], is given by
(7)  x_i[k+1|t] = A x_i[k|t] + B a_i[k|t],  with  A = [ I_3  h I_3 ; 0_3  I_3 ],  B = [ (h²/2) I_3 ; h I_3 ]
with I_3 being a 3×3 identity matrix and 0_3 a 3×3 matrix of zeros. We select the acceleration as the model’s input (and variable to optimize). A compact representation of the states over the horizon is
(8)  X_i = Γ x_i[0|t] + Φ U_i
where X_i = [x_i[1|t]; …; x_i[K|t]] is the stacked predicted state sequence, U_i = [a_i[0|t]; …; a_i[K−1|t]] is the stacked input sequence (model input), and Γ, Φ collect the powers A^k and the products A^{k−l} B, respectively. Define the initial state at instant t as x_i[0|t]. Then we can write the position sequence P_i = [p_i[1|t]; …; p_i[K|t]] as an affine function of the input sequence U_i,
(9)  P_i = Λ U_i + A_0 x_i[0|t]
where Λ is defined as
(10)  Λ = [ Ψ B          0_3          …  0_3
            Ψ A B        Ψ B          …  0_3
            ⋮            ⋮            ⋱  ⋮
            Ψ A^{K−1} B  Ψ A^{K−2} B  …  Ψ B ]
with matrix Ψ = [I_3  0_3] selecting the first three rows of the matrix products (those corresponding to the position states). Lastly, A_0 reflects the propagation of the initial state,
(11)  A_0 = [ Ψ A ; Ψ A² ; … ; Ψ A^K ]
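A sketch of how the prediction matrices Λ and A_0 of Eqs. (9)–(11) could be built for the double integrator; NumPy is used for illustration (the paper's implementation was in MATLAB/C++), and the function name is ours:

```python
import numpy as np

# Sketch of the prediction matrices in Eqs. (9)-(11), assuming time step h
# and horizon length K. Psi selects the position rows of the state [p; v].
def prediction_matrices(h, K):
    I3, Z3 = np.eye(3), np.zeros((3, 3))
    A = np.block([[I3, h * I3], [Z3, I3]])    # Eq. (7) state matrix
    B = np.vstack([0.5 * h**2 * I3, h * I3])  # Eq. (7) input matrix
    Psi = np.hstack([I3, Z3])                 # picks p out of [p; v]
    Lam = np.zeros((3 * K, 3 * K))
    A0 = np.zeros((3 * K, 6))
    Apow = np.eye(6)
    for k in range(K):                        # row block k -> position p[k+1]
        Apow = Apow @ A                       # now holds A^{k+1}
        A0[3*k:3*k+3, :] = Psi @ Apow
        for l in range(k + 1):                # column block l -> input a[l]
            Lam[3*k:3*k+3, 3*l:3*l+3] = (
                Psi @ np.linalg.matrix_power(A, k - l) @ B)
    return Lam, A0
```

With these matrices, the whole position sequence is the single affine expression P = Λ U + A_0 x_0, which is what makes the cost and constraints below quadratic and linear in U.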
The objective function that is minimized to compute the optimal input sequence has three main components: trajectory error, control effort and input variation. A similar formulation can be found in [19].
This term drives the agents to their goals. We aim to minimize the sum of errors between the positions at the last κ time steps of the horizon and the desired final position p_{d,i}. The error term is defined as
(12)  e_i = Σ_{k=K−κ+1}^{K} (p_{d,i} − p_i[k|t])^T Q (p_{d,i} − p_i[k|t])
This term can be turned into a quadratic cost function in terms of the input sequence using (9),
(13)  J_{e,i} = U_i^T Λ^T Q̃ Λ U_i − 2 (P_{d,i} − A_0 x_i[0|t])^T Q̃ Λ U_i
where P_{d,i} stacks p_{d,i} over the horizon and Q̃ is a positive semidefinite and block-diagonal matrix that weights the error at each time step. A value of κ leads to Q̃ = blockdiag(0_3, …, 0_3, Q, …, Q), with the last κ blocks given by a matrix Q chosen as a diagonal positive semidefinite matrix. Higher values of Q lead to more aggressive agent behaviour, with agents moving faster towards their goals, but may also lead to overshooting at the target location.
We also aim to minimize the control effort using the quadratic cost function
(14)  J_{u,i} = U_i^T R̃ U_i
Similarly, R̃ is positive semidefinite and block-diagonal, R̃ = blockdiag(R, …, R), where R weights the penalty on the control effort.
This term is used to minimize variations of the acceleration, leading to smooth input trajectories. We define the quadratic cost
(15)  δ_i = Σ_{k=0}^{K−1} ‖a_i[k|t] − a_i[k−1|t]‖²
To transform (15) into a quadratic form, first we define a differencing matrix Δ,
(16)  Δ = [ I_3    0_3   …    0_3
            −I_3   I_3   …    0_3
            ⋮      ⋱     ⋱    ⋮
            0_3    …    −I_3  I_3 ]
and introduce the vector U_{−1,i} to include the term a_i[−1|t] (the previously applied input),

(17)  U_{−1,i} = [a_i[−1|t]; 0; …; 0]
Finally, we write (15) in quadratic form as
(18)  J_{δ,i} = U_i^T Δ^T S̃ Δ U_i − 2 U_{−1,i}^T S̃ Δ U_i
where S̃ is positive semidefinite and block-diagonal, defined as S̃ = blockdiag(S, …, S), where S weights the penalty on control variation. The total cost function is obtained by adding together (13), (14) and (18),
(19)  J_i(U_i) = J_{e,i} + J_{u,i} + J_{δ,i}
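Assembling the three cost terms into a single quadratic form J(U) = Uᵀ H U + fᵀ U (dropping constants) could look as follows. The weight values and helper name are illustrative assumptions; the matrix names follow the notation above:

```python
import numpy as np

# Sketch assembling the total quadratic cost (Eq. 19) from the trajectory
# error (13), control effort (14) and input variation (18) terms.
# q, r, s are illustrative scalar weights; kappa is the number of terminal
# steps penalized in the error term.
def cost_matrices(Lam, A0, x0, p_d, K, kappa=1, q=100.0, r=1.0, s=10.0,
                  a_prev=np.zeros(3)):
    Qt = np.zeros((3 * K, 3 * K))            # error weight: last kappa blocks
    for k in range(K - kappa, K):
        Qt[3*k:3*k+3, 3*k:3*k+3] = q * np.eye(3)
    Rt = r * np.eye(3 * K)                   # control-effort weight (Eq. 14)
    Delta = np.eye(3 * K)                    # first-difference matrix (Eq. 16)
    for k in range(1, K):
        Delta[3*k:3*k+3, 3*(k-1):3*k] = -np.eye(3)
    St = s * np.eye(3 * K)
    U_prev = np.zeros(3 * K)
    U_prev[:3] = a_prev                      # Eq. (17): previously applied input
    P_d = np.tile(p_d, K)                    # stacked goal position
    H = Lam.T @ Qt @ Lam + Rt + Delta.T @ St @ Delta
    f = -2 * Lam.T @ Qt @ (P_d - A0 @ x0) - 2 * Delta.T @ St @ U_prev
    return H, f
```

Since H is a sum of positive semidefinite terms, the resulting QP is convex, which is what standard QP solvers require.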
When computing the input sequence over the horizon, the agents must satisfy constraints (3) and (4). Define the stacked bound vectors

(20)  A_min = [a_min; …; a_min],  A_max = [a_max; …; a_max],  P_min = [p_min; …; p_min],  P_max = [p_max; …; p_max]
The physical limits are formulated as

(21)  A_min ≤ U_i ≤ A_max,
      P_min ≤ Λ U_i + A_0 x_i[0|t] ≤ P_max
Lastly, the inequality constraints of the problem are stacked and brought in the form A_in U_i ≤ b_in.
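A sketch of this stacking under assumed bounds; the box limits below are illustrative, not the paper's values:

```python
import numpy as np

# Sketch of the stacked physical-limit constraints (Eqs. 20-21) in the form
# A_in U <= b_in: box bounds on the accelerations (the decision variables)
# and, through Lam and A0, on the positions over the horizon.
def physical_limits(Lam, A0, x0, K, a_max=1.0, p_min=None, p_max=None):
    n = 3 * K
    I = np.eye(n)
    P_min = np.tile([-5.0, -5.0, 0.0] if p_min is None else p_min, K)
    P_max = np.tile([5.0, 5.0, 2.5] if p_max is None else p_max, K)
    base = A0 @ x0                       # position offset from initial state
    A_in = np.vstack([I, -I, Lam, -Lam])
    b_in = np.concatenate([a_max * np.ones(n), a_max * np.ones(n),
                           P_max - base, -(P_min - base)])
    return A_in, b_in
```

Both rows of position bounds reuse the same Λ and A_0, so no new variables are introduced; only U appears in the constraints.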
If agent i does not detect any future collisions, then it updates its input sequence by solving the following optimization problem:

minimize_{U_i}  J_i(U_i)
subject to      A_in U_i ≤ b_in
The formulation in (III-E) results in a quadratic programming problem with 3K decision variables and a number of inequality constraints that does not depend on the number of agents, which means it scales independently of the swarm size.
The previous formulation is useful for simple planning scenarios where the agents can follow straight lines to their goals without colliding. In a more general setting, agents must avoid each other constantly to reach their goals. In order to avoid collisions, we leverage the predictive nature of DMPC to detect colliding trajectories using an adapted version of (5). Agent i detects a collision violation at time step k of the previously computed horizon whenever the inequality
(22)  ‖Θ^{-1}(p_i[k|t−1] − p_j[k|t−1])‖_n ≥ r_min
does not hold for some neighbour j. Note that at solving time t, the agents only have information of the other agents computed at t − 1, meaning that the collision is predicted to happen at time step t − 1 + k. In what follows, k_c represents the first time step of the horizon where agent i predicts a collision with any neighbour. We include collision constraints with the subset of agents Ω_i defined as
Ω_i = { j ≠ i : ‖Θ^{-1}(p_i[k_c|t−1] − p_j[k_c|t−1])‖_n ≤ ρ·r_min },

where ρ models the radius around the agent, which defines the neighbours to be considered as obstacles when solving the problem. For example, we may include all agents within a radius 3 times bigger than the collision boundary, in which case ρ = 3. The main idea is that by limiting the size of the subset Ω_i, the number of inequality constraints for agent i can be reduced while still including all the critical information to compute its next action.
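The subset selection can be sketched as follows; for simplicity a plain Euclidean radius stands in for the ellipsoidal boundary, and the factor of 3 matches the example above:

```python
import numpy as np

# Sketch of the neighbour subset: only agents within a modelling radius
# (here 3x the collision boundary) are treated as obstacles when building
# the collision constraints. Euclidean distance is a simplification of the
# ellipsoidal boundary used in the text.
def neighbour_subset(i, positions, r_min, factor=3.0):
    p_i = positions[i]
    return [j for j, p_j in positions.items()
            if j != i and np.linalg.norm(p_j - p_i) <= factor * r_min]
```

Agents outside the radius contribute no constraint rows at all, which keeps the per-agent QP small even in large swarms.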
If the agent detects collisions, it must include collision constraints to compute the new input sequence. To treat infeasibility issues while solving the optimization problem, we formulate the following relaxed collision constraint:
(23)  ‖Θ^{-1}(p_i[k_c+1|t] − p_j[k_c+1|t−1])‖_n ≥ r_min + ε_{ij}
where ε_{ij} ≤ 0 is a new decision variable that relaxes the constraint. Note that at time t, we aim to optimize the value of ε_{ij} to satisfy (23). The constraint is linearized using a Taylor series expansion about the previous predicted position of agent i at time k_c + 1, namely p̂_i = p_i[k_c+1|t−1],
(24)  ν_{ij}^T p_i[k_c+1|t] ≥ r_min + ε_{ij} + ν_{ij}^T p̂_i − ‖Θ^{-1} ξ_{ij}‖_n
with ξ_{ij} = p̂_i − p_j[k_c+1|t−1] and ν_{ij} the gradient of the left-hand side of (23) evaluated at p̂_i (for n = 2, ν_{ij} = Θ^{-2} ξ_{ij} / ‖Θ^{-1} ξ_{ij}‖_2). On the left-hand side of (24), we note that the constraint is imposed on the position at time k_c + 1, which is one time step after the predicted collision. This choice was made based on an empirical assessment of the algorithm’s performance on a wide range of transition scenarios. It was found that by imposing the constraint one time step after the predicted collision, the agents exhibited more preemptive collision avoidance capabilities and were able to complete the transitions faster on average. Furthermore, one can argue that the separating-plane constraint in (24), although applied only at time k_c + 1, implicitly constrains the positions in the previous time steps of the horizon as well.
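A sketch of this linearization for n = 2, returning the constraint normal and right-hand side; the sign convention (relaxation variable ε ≤ 0, constraint read as ν·p ≥ rhs + ε) is an assumption of this sketch:

```python
import numpy as np

# Sketch of the linearized, relaxed collision constraint (Eqs. 23-24):
# a first-order expansion of g(p) = ||Theta^{-1}(p - p_j)|| about the
# previously predicted position p_hat of agent i. Returns (nu, rhs) such
# that the constraint reads nu @ p >= rhs + eps, with eps <= 0.
def linearized_collision(p_hat, p_j, theta, r_min):
    xi = p_hat - p_j
    g = np.linalg.norm(np.linalg.solve(theta, xi))   # g evaluated at p_hat
    nu = np.linalg.solve(theta @ theta, xi) / g      # gradient of g at p_hat
    rhs = r_min - g + nu @ p_hat                     # rearranged Taylor bound
    return nu, rhs
```

Geometrically, ν defines a separating plane: agent i is pushed to the side of the plane away from neighbour j, and ε lets the solver soften that push when the problem would otherwise be infeasible.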
To turn the collision constraint into an affine function of the decision variables, first we augment the previous formulation to include the relaxation variables. Consider E_i, defined as the stacked vector of all ε_{ij}, j ∈ Ω_i. We now introduce the augmented decision vector Û_i, obtained by concatenating vectors U_i and E_i. The matrices derived above can be easily augmented to account for the augmented decision vector, by completing them with zeros where multiplied with the vector E_i. We cast (24) into an affine function of the decision variables,
(25)  −ν_{ij}^T Λ_{(k_c+1)} U_i + ε_{ij} ≤ ν_{ij}^T (A_{0,(k_c+1)} x_i[0|t] − p̂_i) − r_min + ‖Θ^{-1} ξ_{ij}‖_n
where Λ_{(k_c+1)} is defined as
(26)  Λ_{(k_c+1)} = [ Ψ A^{k_c} B   Ψ A^{k_c−1} B   …   Ψ B   0_3   …   0_3 ],

i.e., the block row of Λ corresponding to time step k_c + 1, with A_{0,(k_c+1)} = Ψ A^{k_c+1} the corresponding block row of A_0.
By stacking the inequalities in (25) for all colliding neighbours j ∈ Ω_i, we obtain the complete collision constraint,
(27)  A_coll Û_i ≤ b_coll
Additionally, we impose ε_{ij} ≥ ε_min in order to bound the amount of relaxation allowed. We also consider the following linear and quadratic cost terms to penalize the relaxation of the collision constraint:
J_{ε,i} = −ω_1 Σ_{j∈Ω_i} ε_{ij} + ω_2 Σ_{j∈Ω_i} ε_{ij}², where ω_1, ω_2 are scalar tuning parameters measuring how much the relaxation is penalized. The augmented cost function in the collision avoidance case is defined as
(28)  J_{aug,i}(Û_i) = J_i(U_i) + J_{ε,i}(E_i)
Finally, the convex optimization problem with collision avoidance for agent i is formulated as

minimize_{Û_i}  J_{aug,i}(Û_i)
subject to      A_{in,aug} Û_i ≤ b_{in,aug}
The subscript ‘aug’ indicates the use of augmented state matrices, as outlined before. The inequality tuple (A_{in,aug}, b_{in,aug}) is obtained by stacking the physical limits, the collision constraints and the relaxation-variable bounds. The augmented problem has 3K + |Ω_i| decision variables and a correspondingly augmented set of inequality constraints.
The proposed DMPC algorithm for point-to-point transitions is outlined in Algorithm 1. It requires as input the initial and desired final locations of the N agents, and outputs the trajectory to complete the transition. Variables p, v and a are defined as the concatenation of the transition trajectories for every agent, while the prediction variable is the concatenation of the latest predicted positions for all agents.
In line 1, every predicted position sequence is initialized as a line pointing from initial to final location with a constant velocity profile. Each agent’s states are initialized to be at the corresponding initial position with zero velocity. The main loop (lines 3-11) repeatedly solves optimization problems for the agents, building the transition trajectory until they arrive at their goals or a maximum number of time steps is exceeded. Note that at the first iteration, the latest predicted positions are those of this initialization. The inner loop (lines 4-9) can be solved either sequentially or in parallel, since there is no data dependency between the individual problems of each agent.
To build and solve the corresponding QP (line 5), first we check for predicted collisions over the horizon, as described in Sec. III-F. If no collisions are detected, we solve the reduced problem in (III-E); otherwise we solve the collision avoidance problem in (III-F). If the optimizer finds a solution to the QP, then we can propagate the states using (8) and obtain the predicted position and velocity over the horizon (lines 6-9). Lastly, if a solution for the transition was found, we interpolate the solution with a finer time step to obtain a higher resolution. An optional step is to scale the solution, as suggested in [7], to push the acceleration limits to the maximum allowed. Finally, we verify that no collision constraints were violated after interpolation (note that the scaling step only compresses the position profile of each agent, which has no effect on the pairwise distances). If the solution passes all the sanity checks, then the algorithm is deemed successful; otherwise an empty solution is returned.
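The interpolation and post-check can be sketched as follows; linear interpolation is an assumption of this sketch, and the paper's actual interpolation scheme may differ:

```python
import numpy as np

# Sketch of the post-processing in Algorithm 1: interpolate the coarse
# solution to a finer time step, then verify that the minimum pairwise
# distance never drops below the collision bound.
def interpolate_and_check(trajectories, h, h_fine, dist_min):
    """trajectories: dict agent -> (T x 3) positions at coarse step h."""
    T = next(iter(trajectories.values())).shape[0]
    t_coarse = np.arange(T) * h
    t_fine = np.arange(0, t_coarse[-1] + 1e-9, h_fine)
    fine = {i: np.column_stack([np.interp(t_fine, t_coarse, traj[:, d])
                                for d in range(3)])
            for i, traj in trajectories.items()}
    ids = list(fine)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            d = np.linalg.norm(fine[ids[a]] - fine[ids[b]], axis=1)
            if d.min() < dist_min:
                return None          # sanity check failed: report no solution
    return fine
```

Returning an empty result on failure mirrors the algorithm's behaviour of declaring the run unsuccessful rather than emitting an unsafe plan.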
To illustrate how DMPC is able to manage colliding trajectories, Fig. 6 shows a transition problem for four agents in the plane. Initially, as shown in Fig. 6(a), the agents follow a direct path towards their desired final locations. In Fig. 6(b), collisions are detected and the constraints start acting within the optimization problem. After a few time steps, the agents compute the non-colliding plan seen in Fig. 6(c). The trajectories generated with a centralized approach are quite different from the DMPC trajectories, as shown in Fig. 6(d). However, the sum of travelled distances of all agents is fairly similar in both cases, with only a 1.7% increase in the distributed approach. A more thorough comparison is provided in Sec. V.
Since the algorithm acts as an offline trajectory planner, whenever it fails to find a solution we can apply repair strategies and retry solving the problem. We identify possible reasons for algorithm failure together with repair strategies.
QP infeasible: An agent is unable to avoid collisions given the acceleration and relaxation limits. A proposed repair strategy is to locally retry solving the QP with a looser relaxation bound.
Unavoided collision: This can happen when solving the problem (a collision predicted at the very first step of the horizon means that it is unavoidable), or it could be detected after the interpolation step. Solving with different tuning parameters and/or a tighter lower bound ε_min may lead to a solution.
Not reaching goal within the maximum number of time steps: The lack of coordination between the agents may lead to oscillations or deadlocks, in which case the agents may not reach their goal location within the maximum allowed time. Higher values of the trajectory-error weight may help in avoiding such situations.
Strategy (1) is actively used inside the QP function of Algorithm 1. The other two strategies are not part of the proposed method, but one could add an additional outer loop that checks for the failure of the algorithm and applies a repair strategy if required.
This section provides a simulation analysis of the DMPC algorithm. Implementation was done in MATLAB 2017a (using a sequential implementation of Algorithm 1) and executed on a PC with an Intel Xeon CPU with 8 cores and 16 GB of RAM, running at 3 GHz. The agents were modelled based on the Crazyflie 2.0 platform, with the collision-boundary parameters chosen to avoid downwash effects.
To test the capabilities of DMPC to compute transition trajectories, we compared its performance with two other state-of-the-art algorithms: centralized [4] and decoupled [7] SCP. Scenarios with random sets of initial and final positions were generated and solved using all three algorithms. The volume of the workspace was kept fixed, and the number of agents ranged from 4 to 20. Since the centralized and decoupled approaches require a fixed arrival time, we first solved each test using DMPC and determined the required time to complete the transition, and then set that as the arrival time of the SCP methods. If DMPC failed to solve, the arrival time was set to the maximum allowed transition time.
All three approaches shared the same time step parameters, as well as the agent specifications. For DMPC we used a fixed horizon length, error weighting and maximum relaxation bound. After solving the problem with each method, the solution was interpolated and analysed for possible collisions. For all the methods we allowed a collision tolerance of 5 cm after interpolation, meaning that for success we required an inter-agent distance greater than r_min − 5 cm.
Fig. (a) shows the probability of success as the density of agents increases. The proposed DMPC algorithm was able to find a solution in more than 95% of the trials for every density scenario considered. The centralized approach was able to find a solution in every case, while the decoupled approach failed increasingly often as the density grew.
As for the computation time, Fig. (b) reveals the biggest advantage of our approach: a reduction of up to 97% in computation time with respect to centralized SCP, and of 85% with respect to decoupled SCP. The various design decisions to speed up the algorithm, such as including collision constraints only when needed, and only at one time step, have a noticeable impact and are the main reason for the scalability of the method. The significant runtime variance observed in the other two approaches is due to the test-by-test variance in arrival time, as seen in Fig. (d). Note that even with the highly reduced runtimes, the tested implementation of DMPC was not taking advantage of its parallelizable nature. To measure the optimality of the generated trajectories we analysed the sum of distances travelled by the agents, as highlighted in Fig. (c). Our distributed approach produces longer paths on average, with respect to both centralized and decoupled SCP. The suboptimality increases with the density of the environment, since the agents actively adjust their trajectories to avoid collisions, and oftentimes those adjustments lead to non-optimal paths towards their goals.
We previously analysed the performance of DMPC over increasingly dense environments, but with a relatively low number of agents. To evaluate the algorithm’s performance with a higher number of agents, we considered scenarios with a fixed workspace density and a wide range of team sizes. The results are summarized in Fig. 12. From the failure sources listed in Sec. IV-B, only collision failures were identified for this test case. Even though the density of agents was constant, the proposed DMPC algorithm had a higher failure rate as the number of agents increased. The inclusion of more decision-making agents exposes a limitation of our method, mainly due to the lack of explicit coordination between the agents. In particular, our collision avoidance strategy assumes that the neighbouring agents are ‘passive’ and follow their predicted paths, which is not true since they are also changing plans to avoid collisions. Despite this limitation, our algorithm remains effective for moderately large swarms, achieving a 90% success rate with up to 100 agents.
In this section we present experimental results using Algorithm 1 as an offline trajectory planner for a swarm of Crazyflie 2.0 quadrotors. The algorithm was implemented in C++ using OOQP as the solver. A video of the performance is found at http://tiny.cc/dmpc-swarm.
Leveraging the parallel nature of the inner loop of Algorithm 1, we can design a strategy to parallelize the computation. The idea is to equally split the agents into smaller clusters to be solved in parallel using a multicore processor. The optimization problems of the agents inside a cluster are solved sequentially, but with the advantage of iterating through fewer agents. After all the clusters finish solving their QPs, they synchronize the updated predictions and repeat the process.
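This clustering strategy can be sketched as follows; a thread pool stands in for the paper's multicore C++ implementation, and all names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the cluster-parallel strategy: split the agents into n_clusters
# groups, solve each group's QPs sequentially inside a worker, and join
# before the next DMPC iteration.
def split_clusters(agents, n_clusters):
    # Round-robin split so cluster sizes differ by at most one agent.
    return [agents[c::n_clusters] for c in range(n_clusters)]

def parallel_round(agents, solve_agent, n_clusters=8):
    def solve_cluster(cluster):
        return {i: solve_agent(i) for i in cluster}  # sequential in-cluster
    results = {}
    with ThreadPoolExecutor(max_workers=n_clusters) as pool:
        for part in pool.map(solve_cluster,
                             split_clusters(agents, n_clusters)):
            results.update(part)  # synchronize predictions across clusters
    return results
```

The synchronization point after `pool.map` corresponds to the clusters sharing their updated predictions before the next iteration.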
The number of cores of a CPU constrains the maximum number of clusters we can deploy. In Fig. 13 we tested different numbers of clusters to solve a wide variety of transition scenarios. It was found that 8 clusters led to the best result for our computing hardware. In this case, each cluster ran separately on one of the eight cores of our CPU. This parallel strategy (8 clusters) reduced the computation time by more than 60% compared to purely sequential execution (1 cluster).
To perform the pre-computed transition motion on the quadrotors, we communicated via radio link with each drone and sent the following information at 100 Hz: 1) position setpoints and 2) position and velocity estimates from an overhead motion capture system. The setpoints were tracked using an on-board position controller based on
[20]. One transition scenario is depicted in Fig. 16, in which the swarm was to transition from a grid to a ‘DSL’ configuration. The difficulty of this particular scenario was increased by the central agent acting as a static obstacle (i.e., an obstacle with constant position over the horizon). The DMPC algorithm was able to find a solution for this case in 1.8 seconds. In Fig. 16(a), the curves delimiting the gray area correspond to the minimum and maximum inter-agent distances at each step over six independent executions of the transition. Although trajectories are planned such that every inter-agent distance remains above the warning zone (yellow band), the experimental curve goes slightly below that value. The warning zone is, in practice, a safety margin to compensate for phenomena not modelled in our planning algorithm, such as imperfect trajectory tracking, time delays and aerodynamics. Also, since we were using an overhead motion capture system, the reflective markers sitting on top of the drones may be occluded while performing convoluted transitions. During these occlusion periods, the state estimation of our system relied on a constant-acceleration model to predict the current position, introducing biases in the position estimate. Taking all these phenomena into account, it is natural for the minimum distance curve to drop somewhat below the planned value; it nonetheless remains above the collision zone. It is critical for the warning zone to be large enough to absorb any mismatch between the idealized planning and the real world. Its size is directly controlled by the choice of r_min, which must be made carefully for robust trajectory executions.
Finally, Fig. 16(b) shows that the agents progress towards their goals and are able to complete the transition up to a small tolerance. Once the agents entered the tolerance region below the dashed red line, they were automatically commanded to hover in place. The on-board position controller reported a maximum error close to 3 cm during hover, which explains why the maximum distance curve goes slightly above the tolerance region in the steady-state regime.
Aside from the showcased scenario, the system has been tested on many randomly generated transitions, as seen in the video that accompanies this paper.
The DMPC algorithm developed in this paper enables fast multiagent point-to-point trajectory generation. Using model-based predictions, the agents detect and avoid future collisions while moving to their waypoints. We incorporate collision constraints only when required and with a limited subset of neighbours, which allows our method to drastically outperform previous sequential convex programming (SCP) methods in terms of runtime, with a small impact on the optimality of the plans. The formulation allows for parallel computing, which further reduces the runtime.
Our constraint relaxation paradigm showed high success rate in completing transitions for moderately large swarms, even in high-density scenarios. Our approach circumvents the conservative nature of previous DMPC approaches, which limit the motion of the agents between prediction updates. The empirical results of this work show that collision avoidance can be achieved without the previously used compatibility constraints. Experimental tests with quadrotors demonstrated the versatility of the method to quickly generate intricate transition trajectories mid-flight.
As future work, we plan to investigate coordination paradigms that enable our algorithm to find a solution for even larger swarms, without compromising the scalability demonstrated in this work.