## 1 Introduction

Due to their practical importance, multi-agent collision avoidance and control have been extensively studied across different communities including AI, robotics and control. Considering continuous stochastic trajectories, reflecting each agent’s uncertainty about its neighbours’ time-indexed locations in an environment space, we exploit a distribution-independent bound on collision probabilities to develop a conservative collision-prediction module. It avoids temporal discretisation by stating collision-prediction as a one-dimensional optimization problem. If mean and standard deviation are computable Lipschitz functions of time, one can derive Lipschitz constants that allow us to guarantee collision prediction success with low computational effort. This is often the case, for instance, when dynamic knowledge of the involved trajectories is available (e.g. maximum velocities or even the stochastic differential equations).

To avoid collisions detected by the prediction module, we let an agent re-plan repeatedly until no more collisions occur with a definable probability. Here, re-planning refers to modifying a control signal (influencing the basin of attraction and equilibrium point of the agent’s stochastic dynamics) so as to bound the collision probability while seeking low plan execution cost in expectation. To keep the exposition concrete, we focus our descriptions on an example scenario where the plans correspond to sequences of setpoints of a feedback controller regulating an agent’s noisy state trajectory. However, one can apply our method in the context of more general policy search problems.

In order to foster low social cost across the entire agent collective, we compare two different coordination mechanisms. Firstly, we consider a simple fixed-priority scheme [11], and secondly, we modify an auction-based coordination protocol [7] to work in our continuous setting. In contrast to pre-existing work in auction-style multi-agent planning (e.g. [7, 16]) and multi-agent collision avoidance (e.g. [15, 1, 2]), we avoid a priori discretizations of space and time. Instead, we recast the coordination problem as one of incremental open-loop policy search, that is, as a succession of continuous optimisation or root-finding problems that can be efficiently and reliably solved by modern optimisation and root-finding techniques (e.g. [23, 13]).

While our current experiments were conducted with linear stochastic differential equation (SDE) models with state-independent noise (yielding Gaussian processes), our method is also applicable to any situation where mean and covariances can be evaluated. This encompasses non-linear, non-Gaussian cases that may have state-dependent uncertainties (cf. [12]).

This preprint is an extended and improved version of a conference paper that appeared in Proc. of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014) [6].

### 1.1 Related Work

Multi-agent trajectory planning and task allocation methods have been related to auction mechanisms by identifying locations in state space with atomic goods to be auctioned in a sequence of repeated coordination rounds (e.g. [7, 16, 26]). Unfortunately, even in finite domains the coordination is known to be intractable: for instance, the sequential allocation problem is known to be NP-hard in the number of goods and agents [22, 14]. Furthermore, collision avoidance corresponds to non-convex interactions. This renders standard optimization techniques that rely on convexity of the joint state space inapplicable to the coordination problem.

In recent years, several works have investigated the use of mixed-integer programming techniques for single- and multi-agent model-predictive control with collision avoidance both in deterministic and stochastic settings [7, 19]. To connect the problem to pre-existing mixed-integer optimization tools these works had to limit the models to dynamics governed by linear, time-discrete difference equations with state-independent state noise. The resulting plans were finite sequences of control inputs that could be chosen freely from a convex set. The controls gained from optimization are open-loop; to obtain closed-loop policies, the optimization problems have to be successively re-solved on-line in a receding horizon fashion. However, computational effort may prohibit such an approach in multi-agent systems with rapidly evolving states.

Furthermore, prior time-discretisation comes with a natural trade-off. On the one hand, one would desire a high temporal resolution in order to limit the chance of missing a collision predictably occurring between consecutive time steps. On the other hand, communication restrictions, as well as poor scalability of mixed-integer programming techniques in the dimensionality of the input vectors, impose severe restrictions on this resolution. To address this trade-off, [10] proposed to interpolate between the optimized time steps in order to detect collisions occurring between the discrete time-steps. Whenever a collision was detected they proposed to augment the temporal resolution by the time-step of the detected collision, thereby growing the state-vectors incrementally as needed. A conflict detected at some time is then resolved by solving a new mixed-integer linear programme over an augmented state space, now including the state at that time. This approach can result in a succession of solution attempts of optimization problems of increasing complexity, but can nonetheless prove relatively computationally efficient. Unfortunately, their method is limited to linear, deterministic state-dynamics.

Another thread of works relies on dividing space into polytopes [17, 1], while still others [8, 9, 21, 15] adopt a potential field. In not accommodating uncertainty and stochasticity, these approaches are forced to be overly conservative in order to prevent collisions in real systems.

In contrast to all these works, we will consider a different scenario. Our exposition focuses on the assumption that each agent is regulated by influencing its continuous stochastic dynamics. For instance, we might have a given feedback controller with which one can interact by providing a sequence of setpoints constituting the agent’s plan. While this restricts the choice of control action, it also simplifies computation as the feedback law is fixed. The controller can generate a continuous, state-dependent control signal based on a discrete number of control decisions, embodied by the setpoints. Moreover, it renders our method applicable in settings where the agents’ plants are controlled by standard off-the-shelf controllers (such as the omnipresent PID-controllers) rather than by more sophisticated customized ones. Instead of imposing discreteness, we make the often more realistic assumption that agents follow continuous time-state trajectories within a given continuous time interval. Unlike most work [25, 27, 21, 1] in this field, we allow for stochastic dynamics, where each agent cannot be certain about the location of its team-members. This is crucial for many real-world multi-agent systems. The uncertainties are modelled as state-noise which can reflect physical disturbances or merely model inaccuracies. While our exposition’s focus is on stochastic differential equations, our approach is generally applicable in all contexts where the first two moments of the predicted trajectories can be evaluated for all time-steps. As noted above, this paper is an extended version of work that has been published in the proceedings of AAMAS’14 [6] and an earlier stage of this work was presented at an ICML [5] workshop.

## 2 Predictive Probabilistic Collision Detection with Criterion Functions

Task. Our aim is to design a collision-detection module that can decide whether a set of (predictive) stochastic trajectories is collision-free (in the sense defined below). The module we will derive is guaranteed to make this decision correctly, based on knowledge of the first and second order moments of the trajectories alone. In particular, no assumptions are made about the family of stochastic processes the trajectories belong to. As the required collision probabilities will generally have to be expressed as non-analytic integrals, we will content ourselves with a fast, conservative approach. That is, we are willing to tolerate a non-zero false-alarm-rate as long as decisions can be made rapidly and with zero false-negative rate. Of course, for certain distributions and plant shapes, one may derive closed-form solutions for the collision probability that may be less conservative and hence, lead to faster termination and shorter paths. In such cases, our derivations can serve as a template for the construction of criterion functions on the basis of the tighter probabilistic bounds.

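Problem Formalization. Formally, a collision between two objects (or agents) $a, r$ at time $t \in I \subset \mathbb{R}$ can be described by the event

$$\mathfrak{C}^{a,r}(t) = \left\{ \big(x^a(t), x^r(t)\big) \,\middle|\, \|x^a(t) - x^r(t)\|_2 \le \frac{\Lambda^a + \Lambda^r}{2} \right\}.$$

Here, $\Lambda^a, \Lambda^r$ denote the objects’ diameters, and $x^a, x^r : I \to \mathbb{R}^D$ are two (possibly uncertain) trajectories in a common, $D$-dimensional interaction space.

In a stochastic setting, we desire to bound the collision probability below a threshold $\delta \in (0,1)$ at any given time in $I$. We loosely say that the trajectories are collision-free if $\Pr[\mathfrak{C}^{a,r}(t)] < \delta$ for all $t \in I$.

Approach. For conservative collision detection between two agents’ stochastic trajectories $x^a, x^r$, we construct a criterion function $\gamma^{a,r} : I \to \mathbb{R}$ (e.g. as per Eq. 2 below). A conservative criterion function has the property $\gamma^{a,r}(t) > 0 \Rightarrow \Pr[\mathfrak{C}^{a,r}(t)] < \delta$. That is, a collision between the trajectories with probability above $\delta$ can be ruled out if $\gamma^{a,r}$ attains only positive values. If one could evaluate the function $t \mapsto \Pr[\mathfrak{C}^{a,r}(t)]$, an ideal criterion function would be

$$\gamma^{a,r}_{\mathrm{ideal}}(t) := \delta - \Pr[\mathfrak{C}^{a,r}(t)]. \tag{1}$$

It is ideal in the sense that $\gamma^{a,r}_{\mathrm{ideal}}(t) > 0 \Leftrightarrow \Pr[\mathfrak{C}^{a,r}(t)] < \delta$. However, in most cases, evaluating this criterion function in closed form will not be feasible. Therefore, we adopt a conservative approach: we determine a criterion function $\gamma^{a,r}$ such that, provably, $\gamma^{a,r}(t) > 0 \Rightarrow \Pr[\mathfrak{C}^{a,r}(t)] < \delta$, including the possibility of false alarms. That is, it is possible that for some times $t$, $\gamma^{a,r}(t) \le 0$, in spite of $\Pr[\mathfrak{C}^{a,r}(t)] < \delta$.

Utilising the conservative criterion functions for collision-prediction, we assume a collision occurs unless $\min_{t \in I} \gamma^{a,r}(t) > 0$. If the trajectories’ means and standard deviations are Lipschitz functions of time then one can often show that $\gamma^{a,r}$ is Lipschitz as well. In such cases negative values of $\gamma^{a,r}$ can be found or ruled out rapidly, as will be discussed in Sec. 2.1. In situations where a Lipschitz constant is unavailable or hard to determine, we can base our detection on the output of a global minimization method such as DIRECT [13].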

### 2.1 Finding negative function values of Lipschitz functions

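Let $t_0, t_f \in \mathbb{R}$ with $t_0 < t_f$ and $I = [t_0, t_f]$. Assume we are given a
Lipschitz continuous *target function* $f : I \to \mathbb{R}$ with Lipschitz constant
$L \ge 0$. That is, $|f(t) - f(t')| \le L\,|t - t'|$ for all $t, t' \in I$. Let
$t_0 < t_1 < \dots < t_N = t_f$ and define $G_N = (t_0, \dots, t_N)$ to be the *sample grid* of
size $N + 1$ consisting of the inputs at which we choose to evaluate the
target $f$.

*Our goal is to prove or disprove the existence of a negative function value of target $f$*.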

#### 2.1.1 A naive algorithm

As a first, naive method, Alg. 1 leverages Lipschitz continuity to answer the question of positivity correctly after a finite number of function evaluations.

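The algorithm evaluates the function values on a finite grid assuming a uniform constant Lipschitz number $L$. The grid is iteratively refined until either a negative function value is found or the Lipschitz continuity of function $f$ allows us to infer that no negative function values can exist. The latter is the case whenever $\min_{t \in G} f(t) > L\Delta$, where $G = (t_0, t_0 + \Delta, t_0 + 2\Delta, \dots, t_f)$ is the grid of function input (time) samples, and $L$ a Lipschitz number of the function $f$ which is to be evaluated.

The claim is established by the following Lemma:

###### Lemma 2.1.

Let $f : [t_0, t_f] \to \mathbb{R}$ be a Lipschitz function with Lipschitz number $L \ge 0$. Furthermore, let $G = (t_0, t_0 + \Delta, t_0 + 2\Delta, \dots, t_f)$ be an equidistant grid with spacing $\Delta > 0$.

We have $f(t) > 0$ for all $t \in [t_0, t_f]$, if $f(t) > L\Delta$ for all $t \in G$.

###### Proof.

Since $L$ is a Lipschitz constant of $f$ we have $|f(t) - f(t')| \le L\,|t - t'|$ for all $t, t' \in [t_0, t_f]$. Now, let $t^* \in [t_0, t_f]$ and $t_i, t_{i+1} \in G$ such that $t^* \in [t_i, t_{i+1}]$. Consistent with the premise of the implication we aim to show, we assume $f(t_i), f(t_{i+1}) > L\Delta$ and, without loss of generality, we assume $f(t_i) \le f(t_{i+1})$. Let $d := |t^* - t_i|$. Since $d \le \Delta$ we have $0 < f(t_i) - L\Delta \le f(t_i) - L d$. Finally, $|f(t^*) - f(t_i)| \le L d$ implies $f(t^*) \ge f(t_i) - L d > 0$. ∎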

Apart from providing a termination criterion, the lemma establishes that larger Lipschitz numbers will generally cause longer run-times of the algorithm, as finer resolutions will be required to ensure non-negativity of the function under investigation.
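As an illustration, the grid-refinement idea behind the naive method can be sketched in Python. This is a minimal sketch under stated assumptions and not a transcription of Alg. 1: the function name `has_negative_value_naive`, the grid-doubling schedule and the `max_doublings` cap are hypothetical choices made for the example, and positivity is certified with the sufficient condition that all grid values exceed the Lipschitz number times the grid spacing.

```python
import math


def has_negative_value_naive(f, t0, tf, lipschitz, max_doublings=30):
    """Search [t0, tf] for a negative value of a Lipschitz function f.

    Returns True if a negative value is found on the grid, False if
    positivity on the whole interval is certified, and None if neither
    happened within max_doublings refinements (hypothetical safeguard).
    """
    n = 1  # number of grid intervals
    for _ in range(max_doublings):
        delta = (tf - t0) / n
        values = [f(t0 + i * delta) for i in range(n + 1)]
        smallest = min(values)
        if smallest < 0:
            return True   # a negative function value was discovered
        if smallest > lipschitz * delta:
            return False  # Lipschitz continuity rules out negative values
        n *= 2            # refine the equidistant grid
    return None


# Strictly positive target with Lipschitz number 0.5
print(has_negative_value_naive(lambda t: 1 + 0.5 * math.sin(t), 0.0, 10.0, 0.5))
# Target that dips below zero around t = 3 (|f'| <= 14 on [0, 10])
print(has_negative_value_naive(lambda t: (t - 3) ** 2 - 1, 0.0, 10.0, 14.0))
```

Doubling the number of intervals reuses no earlier evaluations, which keeps the sketch short; the adaptive scheme of the next subsection avoids this waste.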

#### 2.1.2 An improved adaptive algorithm

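Next, we will present an improved version of the algorithm provided above.
Given the target $f$ with Lipschitz constant $L$ and sample grid $G_N = (t_0, \dots, t_N)$, we can define two functions, *ceiling* $u_N(t) := \min_i \big(f(t_i) + L\,|t - t_i|\big)$ and *floor* $l_N(t) := \max_i \big(f(t_i) - L\,|t - t_i|\big)$, such that (i) they bound the target, $l_N(t) \le f(t) \le u_N(t)$ for all $t \in I$, and (ii) the bounds get tighter for denser grids. In particular, one can show that $u_N, l_N \to f$ uniformly if $G_N$ converges to a dense subset of $I$.
Define $\xi_N := \arg\min_{t \in I} l_N(t)$.
It has been shown that $\xi_N \in \{\xi_1^N, \dots, \xi_N^N\}$, where $\xi_i^N = \frac{t_{i-1} + t_i}{2} - \frac{f(t_i) - f(t_{i-1})}{2L}$, and
$l_N(\xi_i^N) = \frac{f(t_{i-1}) + f(t_i)}{2} - L\,\frac{t_i - t_{i-1}}{2}$ (see [23, 13]).
It is trivial to refine this to take localised Lipschitz constants into account:
$\xi_i^N = \frac{t_{i-1} + t_i}{2} - \frac{f(t_i) - f(t_{i-1})}{2L_i}$, where $L_i$ is a Lipschitz number valid on interval $[t_{i-1}, t_i]$.

This suggests the following algorithm: We refine the grid $G_N$ to grid $G_{N+1}$ by including $\xi_N$ as a new sample. This process is repeated until either of the following stopping conditions is met: (i) a negative function value of $f$ is discovered ($f(\xi_N) < 0$), or (ii) $l_N(\xi_N) \ge 0$ (in which case we are guaranteed that no negative function values can exist).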

For pseudo-code refer to Alg. 2.
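For readers without access to the pseudo-code, the adaptive refinement can be sketched as follows. The sketch assumes a single global Lipschitz number (no localised constants) and uses the standard saw-tooth lower bound known from Lipschitz optimisation: on each grid interval the floor attains its minimum where the two cones emanating from the interval end points intersect. The function name and the `max_evals` cap are hypothetical and not taken from Alg. 2.

```python
import bisect
import math


def has_negative_value_adaptive(f, t0, tf, lipschitz, max_evals=10_000):
    """Adaptive search for a negative value of a Lipschitz function f.

    Returns True if a negative value is found, False if the floor is
    non-negative everywhere, None if max_evals is exhausted.
    """
    if lipschitz <= 0:
        raise ValueError("this sketch assumes a strictly positive Lipschitz number")
    ts = [t0, tf]
    fs = [f(t0), f(tf)]
    if min(fs) < 0:
        return True
    for _ in range(max_evals):
        # Locate the global minimiser of the floor over all grid intervals.
        best_val, best_xi = math.inf, None
        for i in range(1, len(ts)):
            val = 0.5 * (fs[i - 1] + fs[i]) - 0.5 * lipschitz * (ts[i] - ts[i - 1])
            if val < best_val:
                best_val = val
                best_xi = 0.5 * (ts[i - 1] + ts[i]) - (fs[i] - fs[i - 1]) / (2 * lipschitz)
        if best_val >= 0:
            return False  # stopping condition (ii): floor is non-negative
        y = f(best_xi)
        if y < 0:
            return True   # stopping condition (i): negative value found
        k = bisect.bisect(ts, best_xi)  # keep the grid sorted
        ts.insert(k, best_xi)
        fs.insert(k, y)
    return None
```

Each iteration spends exactly one new function evaluation at the point where the current floor is lowest, so samples concentrate where a sign change is still possible.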

### 2.2 Deriving collision criterion functions

This subsection is dedicated to the derivation of a (Lipschitz) criterion function. Along the lines of the approach of [7, 20], the idea is to define hyper-cuboids sufficiently large to contain a large enough proportion of each agent’s probability mass to ensure that no collision occurs (with sufficient confidence) as long as the cuboids do not overlap. We then define the criterion function so as to take on negative values whenever the hyper-cuboids do overlap.

For ease of notation, we omit the time index $t$. For instance, in this subsection, $x^a$ now denotes the random variable $x^a(t)$ rather than the stochastic trajectory. Next, we derive sufficient conditions for the absence of collisions, i.e. for the collision probability to remain below the desired threshold.

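To this end, we make an intermediate step: For each agent $q \in \{a, r\}$ we define an open hyper-cuboid $H^q$ centred around mean $\mu^q$. As a $D$-dimensional hyper-cuboid, $H^q$ is completely determined by its centre point $\mu^q$ and its edge lengths $l_1^q, \dots, l_D^q$. Let $B^{a,r}$ denote the event that $x^a \in H^a$ and $x^r \in H^r$. We derive a simple disjunctive constraint on the component distances of the means under which we can guarantee that the collision probability is not greater than the probability of at least one object being outside its hyper-cuboid. This is the case if the hypercuboids do not overlap. That is, their max-norm distance is at least $\Lambda^{a,r} := \frac{\Lambda^a + \Lambda^r}{2}$, where $\Lambda^a, \Lambda^r$ are the objects’ diameters.

Before engaging in a formal discussion we need to establish a preparatory fact:

###### Lemma 2.2.

Let $\mu_j^q$ denote the $j$th component of object $q$’s mean and $r_j^q = \frac{1}{2} l_j^q$. Furthermore, let $F^{a,r} := \{\|x^a - x^r\|_2 > \Lambda^{a,r}\}$ be the event that no collision occurs and $B^{a,r}$ the event that $x^a \in H^a$ and $x^r \in H^r$. Assume the component-wise distance between the hyper-cuboids is at least $\Lambda^{a,r}$, which is expressed by the following disjunctive constraint:

$$\exists j \in \{1, \dots, D\}: \; |\mu_j^a - \mu_j^r| \ge \Lambda^{a,r} + r_j^a + r_j^r.$$

Then, we have:

$$B^{a,r} \subseteq F^{a,r}.$$

###### Proof.

Since $\|x\|_\infty \le \|x\|_2$ we have $F_\infty := \{\|x^a - x^r\|_\infty > \Lambda^{a,r}\} \subseteq F^{a,r}$. It remains to be shown that $B^{a,r} \subseteq F_\infty$: Let $(x^a, x^r) \in B^{a,r}$. Thus, $|x_j^q - \mu_j^q| < r_j^q$ for all $j$ and $q \in \{a, r\}$. For contradiction, assume $(x^a, x^r) \notin F_\infty$. Then, $|x_j^a - x_j^r| \le \Lambda^{a,r}$ for all $j$.

Hence, $|\mu_j^a - \mu_j^r| \le |\mu_j^a - x_j^a| + |x_j^a - x_j^r| + |x_j^r - \mu_j^r| < r_j^a + \Lambda^{a,r} + r_j^r$ for all $j$, which contradicts our disjunctive constraint in the premise of the lemma. ∎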

###### Theorem 2.3.

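Let $\mu_j^q$ denote the $j$th component of object $q$’s mean and $r_j^q = \frac{1}{2} l_j^q$ the corresponding radius (half the $j$th edge length of hyper-cuboid $H^q$). Assume $x^a, x^r$ are random variables with means $\mu^a, \mu^r$, respectively. The max-norm distance between hypercuboids $H^a, H^r$ is at least $\Lambda^{a,r} = \frac{\Lambda^a + \Lambda^r}{2}$ (i.e. the hypercuboids do not overlap), which is expressed by the following disjunctive constraint:

$$\exists j \in \{1, \dots, D\}: \; |\mu_j^a - \mu_j^r| \ge \Lambda^{a,r} + r_j^a + r_j^r.$$

Then, we have:

$$\Pr[\mathfrak{C}^{a,r}] \le P^a + P^r$$

where $P^q := \Pr[x^q \notin H^q]$ for $q \in \{a, r\}$.

One way to define a criterion function is as follows:

$$\gamma^{a,r}(t; \varrho(t)) := \max_{j \in \{1, \dots, D\}} \Big( |\mu_j^a(t) - \mu_j^r(t)| - \Lambda^{a,r} - r_j^a(t) - r_j^r(t) \Big) \tag{2}$$

where $\varrho = (r_1^a, \dots, r_D^a, r_1^r, \dots, r_D^r)$ is the parameter vector of radii. (For notational convenience, we will often omit explicit mention of parameter $\varrho$ in the function argument.)

For more than two agents, agent $a$’s overall criterion function is

$$\Gamma^a(t) := \min_{r \neq a} \gamma^{a,r}(t).$$

Thm. 2.3 tells us that the collision probability is bounded from above by the desired threshold $\delta$ if $\gamma^{a,r}(t) > 0$, provided we chose the radii $r_j^q$ ($q \in \{a, r\}$, $j = 1, \dots, D$) such that $P^a, P^r \le \frac{\delta}{2}$.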

Let $q \in \{a, r\}$. Probability theory provides several distribution-independent bounds relating the radii of a (possibly partly unbounded) hypercuboid to the probability of not falling into it. That is, these are bounds of the form

$$P^q \le \beta(r_1^q, \dots, r_D^q; \Theta)$$

where $\beta$ is a continuous function that decreases monotonically with increasing radii and $\Theta$ represents additional information. In the case of Chebyshev-type bounds, information about the first two moments is folded in, i.e. $\Theta = (\mu^q, C^q)$ where $C^q$ is the variance (-covariance) matrix. We then solve for radii that fulfil the inequality while simultaneously ensuring collision avoidance with the desired probability.

Inspecting Eq. 2, it becomes clear that, in order to maximally diminish conservatism of the criterion function, it would be ideal to choose the radii in $\varrho$ such that $\gamma^{a,r}(t; \varrho)$ is maximised subject to the constraints $\beta(r_1^q, \dots, r_D^q; \Theta) \le \frac{\delta}{2}$. Solving this constrained optimisation problem can often be done in closed form.

In the context where $\beta$ is derived from a Chebyshev-type bound, we propose to set as many radii as possible to large values (in order to decrease $\beta$ and thereby satisfy the constraints) while setting the radii $r_j^a, r_j^r$ as small as possible without violating the constraint (where $j$ is some dimension). That is, we define the radii as follows: Set $r_i^q := \infty$ for all $i \neq j$. The remaining unknown variable, $r_j^q$, then is defined as the solution to the equation $\beta(\infty, \dots, \infty, r_j^q, \infty, \dots, \infty; \Theta) = \frac{\delta}{2}$. The resulting criterion function, denoted by $\gamma_j^{a,r}$, which we obtain with this procedure of course depends on the arbitrary choice of dimension $j$. Therefore, we obtain a less conservative criterion function by repeating this process for each dimension $j$ and then constructing a new criterion function as the point-wise maximum: $\gamma^{a,r}(t) := \max_{j = 1, \dots, D} \gamma_j^{a,r}(t)$.

A concrete example of this procedure is provided below.

#### 2.2.1 Example constructions of distribution-independent criterion functions

We can use the above derivation as a template for generating criterion functions.

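Consider the following concrete example. Combining union bound and the standard (one-dim.) Chebyshev bound yields $\Pr[x^q \notin H^q] \le \sum_{j=1}^{D} \Pr\big[|x_j^q - \mu_j^q| \ge r_j^q\big] \le \sum_{j=1}^{D} \frac{C_{jj}^q}{(r_j^q)^2}$. Setting every radius, except $r_j^q$, to infinitely large values and the bound equal to $\frac{\delta}{2}$ yields $\frac{\delta}{2} = \frac{C_{jj}^q}{(r_j^q)^2}$, i.e. $r_j^q = \sqrt{\frac{2\,C_{jj}^q}{\delta}}$. (Note, this is a correction of the radius provided in the conference version of this paper.) Finally, inserting these radii ($r_i^q = \infty$ for $i \neq j$) into Eq. 2 yields our first collision criterion function:

$$\gamma_j^{a,r}(t) := |\mu_j^a(t) - \mu_j^r(t)| - \Lambda^{a,r} - \sqrt{\frac{2\,C_{jj}^a(t)}{\delta}} - \sqrt{\frac{2\,C_{jj}^r(t)}{\delta}}.$$

Of course, this argument can be made for any choice of dimension $j$. Hence, a less conservative, yet valid, choice is

$$\gamma^{a,r}(t) := \max_{j = 1, \dots, D} \left( |\mu_j^a(t) - \mu_j^r(t)| - \Lambda^{a,r} - \sqrt{\frac{2\,C_{jj}^a(t)}{\delta}} - \sqrt{\frac{2\,C_{jj}^r(t)}{\delta}} \right). \tag{3}$$

Notice, this function has the desirable property of being Lipschitz continuous, provided the mean and standard deviation functions are. In particular, it is easy to show $L(\gamma^{a,r}) \le \max_j \Big( L(\mu_j^a) + L(\mu_j^r) + \sqrt{\tfrac{2}{\delta}}\,\big(L(\sigma_j^a) + L(\sigma_j^r)\big) \Big)$ with $\sigma_j^q = \sqrt{C_{jj}^q}$, where, as before, $L(f)$ denotes a Lipschitz constant of function $f$.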
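To make the construction concrete, the following Python sketch evaluates a Chebyshev-based criterion value at a single time instant. It rests on assumptions that should be checked against the formulas of this section: each agent's allowance is half of the collision threshold, so that the one-dimensional Chebyshev bound gives a radius of $\sqrt{2\,\mathrm{variance}/\delta}$, and the criterion is the maximum over dimensions of the mean distance minus the mean plant diameter minus both radii. All function and parameter names are hypothetical.

```python
import math


def chebyshev_radius(variance, delta):
    """Radius r solving variance / r**2 == delta / 2 (one-dim. Chebyshev bound)."""
    return math.sqrt(2.0 * variance / delta)


def chebyshev_criterion(mu_a, mu_r, var_a, var_r, diam_a, diam_r, delta):
    """Criterion value at one time instant; positive values certify that
    the collision probability is at most delta (under the assumptions above).

    mu_*:  per-dimension means, var_*: per-dimension variances,
    diam_*: plant diameters, delta: collision probability threshold.
    """
    min_distance = 0.5 * (diam_a + diam_r)
    return max(
        abs(ma - mr) - min_distance
        - chebyshev_radius(va, delta) - chebyshev_radius(vr, delta)
        for ma, mr, va, vr in zip(mu_a, mu_r, var_a, var_r)
    )


# Two agents far apart along the first axis: criterion is positive.
far = chebyshev_criterion((0, 0), (10, 0), (0.01, 0.01), (0.01, 0.01), 1, 1, 0.05)
# Same uncertainty, but means only one unit apart: criterion is negative.
near = chebyshev_criterion((0, 0), (1, 0), (0.01, 0.01), (0.01, 0.01), 1, 1, 0.05)
print(far > 0, near < 0)
```

Evaluating this function along a time grid, or handing it to a Lipschitz search such as the one of Sec. 2.1, turns the bound into a collision-prediction test.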

For the special case of two dimensions, we can derive a less conservative alternative criterion function based on a tighter two-dimensional Chebyshev-type bound [28]:

###### Theorem 2.4 (Alternative collision criterion function).

Let spatial dimensionality be . Choosing

() in Eq. 2 yields a valid distribution-independent criterion function. That is, .

A proof sketch and a Lipschitz constant (for non-zero uncertainty) are provided in the appendix. Note, the Lipschitz constant we have derived therein becomes infinite in the limit of vanishing variance. In that case, the presence of negative criterion values can be tested based on the sign of the minimum of the criterion function, which can be found by employing a global optimiser. Future work will investigate to what extent Hölder continuity, instead of Lipschitz continuity, can be leveraged to yield an algorithm similar to the one provided in Sec. 2.1.2.

#### 2.2.2 Multi-agent case.

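Let $a \in \mathfrak{A}$ and let $\mathfrak{A}' \subset \mathfrak{A}$, such that $a \notin \mathfrak{A}'$, be a subset of agents. We define the event that $a$ collides with at least one of the agents in $\mathfrak{A}'$ at time $t$ as $\mathfrak{C}^{a,\mathfrak{A}'}(t) := \bigcup_{r \in \mathfrak{A}'} \mathfrak{C}^{a,r}(t)$. By union bound, $\Pr[\mathfrak{C}^{a,\mathfrak{A}'}(t)] \le \sum_{r \in \mathfrak{A}'} \Pr[\mathfrak{C}^{a,r}(t)]$.

###### Theorem 2.5 (Multi-Agent Criterion).

Let $\gamma^{a,r}$ ($r \in \mathfrak{A}'$) be valid criterion functions defined w.r.t. collision bound $\delta$.
We define *multi-agent collision criterion function* $\Gamma^{a,\mathfrak{A}'}(t) := \min_{r \in \mathfrak{A}'} \gamma^{a,r}(t)$. If $\Gamma^{a,\mathfrak{A}'}(t) > 0$ then the collision probability with $\mathfrak{A}'$ is bounded below $|\mathfrak{A}'|\,\delta$. That is, $\Pr[\mathfrak{C}^{a,\mathfrak{A}'}(t)] < |\mathfrak{A}'|\,\delta$.

###### Proof.

Let $a \in \mathfrak{A}$ and let $\mathfrak{A}' \subset \mathfrak{A}$, such that $a \notin \mathfrak{A}'$, be a subset of agents. We define the event that $a$ collides with at least one of the agents in $\mathfrak{A}'$ at time $t$ as $\mathfrak{C}^{a,\mathfrak{A}'}(t) := \bigcup_{r \in \mathfrak{A}'} \mathfrak{C}^{a,r}(t)$.

We have established that if $\gamma^{a,r}(t) > 0$ then $\Pr[\mathfrak{C}^{a,r}(t)] < \delta$. Now, let $\Gamma^{a,\mathfrak{A}'}(t) > 0$. Hence, $\min_{r \in \mathfrak{A}'} \gamma^{a,r}(t) > 0$. Thus, $\gamma^{a,r}(t) > 0$ for all $r \in \mathfrak{A}'$. Therefore, $\Pr[\mathfrak{C}^{a,r}(t)] < \delta$ for all $r \in \mathfrak{A}'$. By union bound, $\Pr[\mathfrak{C}^{a,\mathfrak{A}'}(t)] \le \sum_{r \in \mathfrak{A}'} \Pr[\mathfrak{C}^{a,r}(t)]$. Consequently, we have $\Pr[\mathfrak{C}^{a,\mathfrak{A}'}(t)] < |\mathfrak{A}'|\,\delta$. ∎

Moreover, $\Gamma^{a,\mathfrak{A}'}$ is Lipschitz if the constituent functions $\gamma^{a,r}$ are (see Appendix B).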
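The multi-agent construction amounts to taking a point-wise minimum over pairwise criterion functions. The short Python sketch below illustrates this, together with the union-bound bookkeeping; the helper names and the toy pairwise criteria are hypothetical, and the probability bound returned is simply the number of other agents times the pairwise threshold, as suggested by the union-bound argument.

```python
def multi_agent_criterion(pairwise_criteria, t):
    """Point-wise minimum of pairwise criterion functions at time t."""
    return min(gamma(t) for gamma in pairwise_criteria)


def union_collision_bound(num_other_agents, delta):
    """Union bound on the probability of colliding with any other agent,
    valid whenever every pairwise criterion value is positive."""
    return num_other_agents * delta


# Toy pairwise criteria (hypothetical): distance-like functions of time.
criteria = [lambda t: 2.0 - t, lambda t: 0.5 + t, lambda t: 1.0]

print(multi_agent_criterion(criteria, 0.0))   # smallest of 2.0, 0.5, 1.0
print(multi_agent_criterion(criteria, 3.0))   # first criterion turns negative
print(union_collision_bound(len(criteria), 0.01))
```

If an overall threshold is desired for the collision with any agent, one option consistent with this bound is to define each pairwise criterion with respect to the overall threshold divided by the number of other agents.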

Our distribution-independent collision criterion functions have the virtue that they work for all distributions, not only the omnipresent Gaussian. Unfortunately, distribution-independence is gained at the price of conservativeness (cf. Fig. 2). In our experiments in Sec. 4, the collision criterion function as per Thm. B.3 is utilized as an integral component of our collision avoidance mechanisms. The results suggest that the conservativeness of our detection module does not entail false-alarm rates so high as to render the distribution-independent approach impractical. That said, whenever distributional knowledge is available, it can be converted into a less conservative criterion function: one could use our derivations as a template to generate refined criterion functions using Eq. 2 with adjusted radii reflecting the distribution at hand.

## 3 Collision Avoidance

In this section we outline the core ideas of our proposed approach to multi-agent collision avoidance. After specifying the agent’s dynamics and formalizing the notion of a single-agent plan, we define the multi-agent planning task. Then we describe how conflicts, picked-up by our collision prediction method, can be resolved. In Sec. 3.1 we describe the two coordination approaches we consider utilizing to generate conflict-free plans.

I) Model (example). We assume the system contains a set of agents indexed by . Each agent ’s associated plant has a probabilistic state trajectory following stochastic controlled -dimensional state dynamics (we consider the case ) in the continuous interval of (future) time . We desire to ask agents to adjust their policies to avoid collisions. Each policy gives rise to a stochastic belief over the trajectory resulting from executing the policy. For our method to work, all we require is that the trajectory’s mean function and covariance matrix function are evaluable for all times .

A prominent class for which closed-form moments can be easily derived are linear stochastic differential equations (SDEs). For instance, we consider the SDE

$$dx^a(t) = K^a\big(\xi^a(t) - x^a(t)\big)\,dt + B^a\,dW(t) \tag{4}$$

where $K^a, B^a$ are matrices, $x^a$ is the state trajectory and $W$ is a vector-valued Wiener process. Here, $u(x^a; \xi^a) := K^a(\xi^a - x^a)$ could be interpreted as the control policy of a linear feedback-controller parametrised by $\xi^a$. It regulates the state to track a desired trajectory $\xi^a(t) = \sum_i \zeta_i^a\,\chi_{\tau_i^a}(t)$ where $\chi_{\tau_i^a}$ denotes the indicator function of the half-open interval $\tau_i^a$ and each
$\zeta_i^a$ is a setpoint. If $K^a$ is positive definite the agent’s state trajectory is determined by setpoint sequence $p^a = \big((t_i^a, \zeta_i^a)\big)_i$ (aside from the random disturbances) which we will refer to as the agent’s *plan*.
For example, plan $p^a$ could be used to regulate agent $a$’s start state $x_0^a$ to a given *goal state* $x_f^a$ between times $t_0$ and $t_f$. For simplicity, we assume the agents are always initialized with plans of this form before coordination commences.
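To illustrate why the moments of such linear SDEs are cheap to evaluate, the sketch below propagates mean and variance of a scalar instance, $dx = k(\xi(t) - x)\,dt + b\,dW$, under a piecewise-constant setpoint signal. It assumes a scalar gain $k > 0$, a scalar noise gain $b$, and a plan given as a sorted list of (switch time, setpoint) pairs in which each setpoint is active from its switch time until the next one; these conventions and all names are hypothetical and chosen for the example only. On each segment the moments follow the well-known Ornstein-Uhlenbeck closed forms.

```python
import math


def ou_step(mean, var, setpoint, k, b, dt):
    """Closed-form moment update of dx = k (setpoint - x) dt + b dW over dt."""
    decay = math.exp(-k * dt)
    new_mean = setpoint + (mean - setpoint) * decay
    new_var = var * decay ** 2 + b ** 2 / (2.0 * k) * (1.0 - decay ** 2)
    return new_mean, new_var


def moments_at(t, plan, mean0, var0, k, b):
    """Mean and variance at time t for a sorted plan [(switch_time, setpoint), ...].

    The initial moments (mean0, var0) refer to the first switch time.
    """
    mean, var = mean0, var0
    for i, (t_i, setpoint) in enumerate(plan):
        if t <= t_i:
            break
        t_next = plan[i + 1][0] if i + 1 < len(plan) else math.inf
        mean, var = ou_step(mean, var, setpoint, k, b, min(t, t_next) - t_i)
    return mean, var


plan = [(0.0, 1.0), (2.0, 3.0)]  # track 1.0 from t=0, then 3.0 from t=2
print(moments_at(1.0, plan, mean0=0.0, var0=0.0, k=1.0, b=0.2))
print(moments_at(50.0, plan, mean0=0.0, var0=0.0, k=1.0, b=0.2))
```

Because both moments are available in closed form for any time, a criterion function built from them can be evaluated at arbitrary time instants without discretising the dynamics.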

One may interpret a setpoint as some way to alter the stochastic trajectory. Below, we will determine setpoints that modify a stochastic trajectory to reduce collision probability while maintaining low expected cost. From the vantage point of policy search, is agent ’s policy parameter that has to be adjusted to avoid collisions.

II) Task. Each agent desires to find a sequence of setpoints such that (i) it moves from its start state to its goal state along a low-cost trajectory and (ii) such that along the trajectory its plant (with diameter ) does not collide with any other agents’ plant in state space with at least a given probability .

III) Collision resolution. An agent seeks to avoid collisions by adding new setpoints to its plan until the collision probability of the resulting state trajectory drops below threshold . For choosing these new setpoints we consider two methods WAIT and FREE. In the first method the agents insert a time-setpoint pair into the previous plan . Since this aims to cause the agent to wait at its start location we will call the method WAIT. It is possible that multiple such insertions are necessary until collisions are avoided. Of course, if a higher-priority agent decides to traverse through , this method is too rigid to resolve a conflict. In the second method the agent optimizes for the time and location of the new setpoint. Let be the plan updated by insertion of time-setpoint pair . We propose to choose the candidate setpoint that minimizes a function being a weighted sum of the expected cost entailed by executing updated plan and a hinge-loss collision penalty . Here, is computed based on the assumption we were to execute and determines the extent to which collisions are penalized. Since the new setpoint can be chosen freely in time and state-space we refer to the method as FREE.
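The objective used by the FREE method can be illustrated with a small Python sketch. It assumes that the hinge-loss penalty acts on the minimum value of the collision criterion function along the candidate trajectory (zero penalty when the minimum is non-negative, linear growth otherwise), and it replaces the continuous optimisation over time and location of the new setpoint by a selection among a finite list of candidates purely to keep the example short. The function names and the `evaluate` callback are hypothetical.

```python
def free_objective(expected_cost, min_criterion, penalty_weight):
    """Weighted sum of expected plan cost and a hinge-loss collision penalty."""
    return expected_cost + penalty_weight * max(0.0, -min_criterion)


def pick_setpoint(candidates, evaluate, penalty_weight):
    """Choose the candidate (time, setpoint) pair with the lowest objective.

    evaluate(candidate) must return (expected_cost, min_criterion) for the
    plan updated by inserting the candidate.
    """
    return min(candidates, key=lambda c: free_objective(*evaluate(c), penalty_weight))


# Toy evaluation table (hypothetical numbers): cost and minimum criterion value.
table = {
    (1.0, 0.0): (2.0, -0.5),   # cheap but predicted to collide
    (1.0, 2.0): (3.0, 0.4),    # more expensive, collision-free
    (2.0, 1.0): (2.5, -0.01),  # marginal violation
}
print(pick_setpoint(table, table.get, penalty_weight=100.0))
```

A large penalty weight makes any predicted collision dominate the objective, so the search is driven towards collision-free setpoints before cost differences matter.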

### 3.1 Coordination

We will now consider how to integrate our collision detection and avoidance methods into a coordination framework that determines who needs to avoid whom and at what stage of the coordination process. Such decisions are known to significantly impact the social cost (i.e. the sum of all agents’ individual costs) of the agent collective.

Fixed-priorities (FP). As a baseline method for coordination we consider a basic fixed-priority method (e.g. [11, 3]). Here, each agent has a unique ranking (or priority) according to its index (i.e. agent 1 has the highest priority, the agent with the largest index the lowest). When all higher-ranking agents are done planning, agent $a$ is informed of their planned trajectories, which it has to avoid with a probability greater than $1 - \delta$. This can be done by repeatedly invoking the collision detection and resolution methods described above until no further collisions with higher-ranking agents are found.

Lazy Auction Protocol (AUC). While the FP method is simple and fast, the rigidity of the fixed ranking can lead to sub-optimal social cost and coordination success. Furthermore, its sequential nature does not take advantage of the parallelization that a distributed method could exploit. To alleviate this we propose to revise the ranking flexibly on a case-by-case basis. In particular, the agents are allowed to compete for the right to gain passage (e.g. across a region where a collision was detected) by submitting bids in the course of an auction. The structure of the approach is outlined in Alg. 3.

Assume an agent detects a collision at a particular time step and invites the set of agents to join an auction to decide who needs to avoid whom. In particular, the auction determines a winner who is not required to alter his plan. The losing agents need to insert a new setpoint into their respective plans designed to avoid all other agents in while keeping the plan cost function low.

The idea is to design the auction rules as a heuristic method to minimize the social cost of the ensuing solution. To this end, we define the bids such that their magnitude is proportional to a heuristic magnitude of the expected regret for losing and not gaining passage. That is, agent $a$ submits a bid $b^a = l^a - s^a$. Magnitude $l^a$ is defined as $a$’s anticipated cost for the event that the agent will not secure “the right of passage” and has to create a new setpoint (according to (III)) tailored to avoid all other agents engaged in the current auction. On the other hand, $s^a$ is the cost of the unchanged plan $p^a$. If there is a tie among multiple agents the agent with the lowest index among the highest bidders wins. Acknowledging that $s^a + \sum_{r \neq a} l^r$ is an estimated social cost (based on current beliefs of trajectories) after the auction in case agent $a$ wins, we see that the winner determination rule greedily attempts to minimize social cost:

$$\forall r: \; b^a \ge b^r \;\Longleftrightarrow\; \forall r: \; s^a + \sum_{q \neq a} l^q \le s^r + \sum_{q \neq r} l^q.$$

| | Experiment | | | Experiment | | |
|---|---|---|---|---|---|---|
| | NONE | AUC-WAIT | FP-WAIT | NONE | AUC-FREE | FP-FREE |
| A | 78 | 0 | 0 | 51 | 0 | 0 |
| B | 13.15 | 13.57 | 12.57 | 14.94 | 16.22 | 18.13 |
| C | 0.05 | 0.04 | 25.8 | 0.05 | 0.05 | 0.05 |
| D | 0 | 6 | 3 | 0 | 4 | 4 |

*sqr. distance to goal* measure. Note the discrepancies in avg. path length are relatively low due to convexity effects and the contribution of state noise to the path lengths.
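The winner determination of the lazy auction protocol (Sec. 3.1) can be illustrated with a short Python sketch. It assumes that each bid is the difference between an agent's anticipated cost when losing and the cost of its unchanged plan, that the highest bid wins, and that ties go to the lowest agent index. The function names and the dictionary-based interface are hypothetical.

```python
def auction_winner(losing_costs, current_costs):
    """Return (winner, bids) for dicts mapping agent index -> cost.

    bid = anticipated cost when losing - cost of the unchanged plan;
    the highest bid wins, ties are broken in favour of the lowest index.
    """
    bids = {a: losing_costs[a] - current_costs[a] for a in losing_costs}
    winner = min(bids, key=lambda a: (-bids[a], a))
    return winner, bids


def estimated_social_cost(winner, losing_costs, current_costs):
    """Estimated social cost if `winner` keeps its plan and all others re-plan."""
    others = sum(cost for a, cost in losing_costs.items() if a != winner)
    return current_costs[winner] + others


losing = {1: 5.0, 2: 9.0, 3: 9.5}   # hypothetical anticipated re-planning costs
current = {1: 4.0, 2: 6.0, 3: 6.5}  # hypothetical costs of the unchanged plans
winner, bids = auction_winner(losing, current)
print(winner, bids)
print({a: estimated_social_cost(a, losing, current) for a in losing})
```

Since the estimated social cost equals the sum of all losing costs minus the winner's bid, awarding passage to the highest bidder minimises this estimate, which is the greedy property stated above.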

## 4 Simulations

As a first test, we simulated three simple multi-agent scenarios, EXP1, EXP2 and EXP3. Each agent’s dynamics were an instantiation of an SDE of the form of Eq. 4. We set to achieve collision avoidance with certainty greater than . Collision prediction was based on the improved criterion function as per Thm. B.3. During collision resolution with the FREE method each agent assessed a candidate plan according to cost function . Here is a heuristic to penalize expected control energy or path length; in the second summand, penalizes expected deviation from the goal state; the third term penalizes collisions (cf. III ). The weights are design parameters which we set to and , emphasizing avoidance of mission failure and collisions. Note, if our method was to be deployed in a receding horizon fashion, the parameters could also be adapted online using standard learning techniques such as no-regret algorithms [18, 24].

EXP1. Collision resolution was done with the WAIT method to update plans. Draws from the SDEs with the initial plans
