V-Formation via Model Predictive Control

02/20/2020 · Radu Grosu et al. · Microsoft · Stony Brook University · Institute of Science and Technology Austria · TU Wien

We present recent results that demonstrate the power of viewing the problem of V-formation in a flock of birds as one of Model Predictive Control (MPC). The V-formation-MPC marriage can be understood in terms of the problem of synthesizing an optimal plan for a continuous-space and continuous-time Markov decision process (MDP), where the goal is to reach a target state that minimizes a given cost function. First, we consider ARES, an approximation algorithm for generating optimal plans (action sequences) that take an initial state of an MDP to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization, with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, the length of the horizon and the number of particles are chosen such that at least one particle reaches a next-level state. ARES can alternatively be viewed as a model-predictive control (MPC) algorithm that utilizes an adaptive receding horizon, aka Adaptive MPC (AMPC). We next present Distributed AMPC (DAMPC), a distributed version of AMPC that works with local neighborhoods. We introduce adaptive neighborhood resizing, whereby the neighborhood size is determined by the cost-based Lyapunov function evaluated over a global system state. Our experiments show that DAMPC can perform almost as well as centralized AMPC, while using only local information and a form of distributed consensus in each time step. Finally, inspired by security attacks on cyber-physical systems, we introduce controller-attacker games (CAG), where two players, a controller and an attacker, have antagonistic objectives. We formulate a special case of CAG called V-formation games, where the attacker's goal is to prevent the controller from attaining V-formation. We demonstrate how adaptation in the design of the controller helps in overcoming certain attacks.


1. Introduction

Cyber-physical systems (CPSs) composed of multiple computing agents are often highly distributed and may exhibit emergent behavior. V-formation in a flock of birds is a quintessential example of emergent behavior in a (stochastic) multi-agent system. V-formation brings numerous benefits to the flock. It is primarily known for being energy-efficient due to the upwash benefit a bird in the flock enjoys from its frontal neighbor. It also offers a clear-view benefit, as no bird’s field of vision is obstructed by another bird in the formation. Moreover, its collective spatial flock mass can be intimidating to potential predators. It is therefore not surprising that interest in V-formation is on the rise [Con17, Blo]. Because of V-formation’s intrinsic appeal, it is important to (i) understand its control-theoretic foundations, (ii) devise efficient algorithms for the problem, and (iii) identify the vulnerabilities in these approaches to cyber-attacks.

This paper brings together our recent results on V-formation that show how the problem can be formulated in terms of Model Predictive Control (MPC), both centralized and distributed. It also shows how an MPC-based formulation of V-formation can be used as a comprehensive framework for investigating cyber-attacks on this formation.

We first consider Adaptive Receding-Horizon Synthesis of Optimal Plans (ARES) [LEH17], an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of an MDP to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization (PSO), with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, a sampling technique for rare events, the length of the horizon and the number of particles are chosen such that at least one particle reaches a next-level state, that is, a state where the cost decreases by a required delta from the previous-level state. The level relation on states and the plans constructed by ARES implicitly define a Lyapunov function and an optimal policy, respectively, both of which could be explicitly generated by applying ARES to all states of the MDP, up to some topological equivalence relation.

We assess the effectiveness of ARES by statistically evaluating its rate of success in generating optimal plans that bring a flock from an arbitrary initial state to a state exhibiting a single connected V-formation. For flocks with 7 birds, ARES is able to generate a plan that leads to a V-formation in 95% of the 8,000 random initial configurations within 63 seconds, on average. ARES can be viewed as a model-predictive controller (MPC) with an adaptive receding horizon, which we also call adaptive MPC (AMPC). We provide statistical guarantees of convergence. To the best of our knowledge, our adaptive-sizing approach is the first to provide convergence guarantees in receding-horizon techniques.

We next present DAMPC [LTSG19], a distributed, adaptive-horizon and adaptive-neighborhood algorithm for solving the stochastic reachability problem in multi-agent systems; specifically, the flocking problem modeled as an MDP. In DAMPC, at each time step, every agent first calls a centralized, adaptive-horizon model-predictive control (AMPC) algorithm to obtain an optimal solution for its local neighborhood. Second, the agents derive the flock-wide optimal solution through a sequence of consensus rounds. Third, the neighborhood is adaptively resized using a flock-wide, cost-based Lyapunov function. In this way, DAMPC improves efficiency without compromising convergence. The proof of statistical global convergence is non-trivial and involves showing that the Lyapunov function follows a monotonically decreasing trajectory despite potential fluctuations in cost and neighborhood size.

We evaluate DAMPC’s performance using statistical model checking, showing that DAMPC achieves considerable speed-up over AMPC (two-fold in some cases) with only a slightly lower convergence rate. Smaller average neighborhood size and lookahead horizon demonstrate the benefits of the DAMPC approach for stochastic reachability problems involving any controllable multi-agent system that possesses a cost function.

Inspired by the emerging problem of CPS security, we lastly introduce the concept of controller-attacker games [TSE17]: a two-player stochastic game involving a controller and an attacker, which have antagonistic objectives. A controller-attacker game is formulated in terms of an MDP, with the controller and the attacker jointly determining the MDP’s transition probabilities. We also introduce V-formation games, a class of controller-attacker games where the goal of the controller is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize AMPC, giving them extraordinary power: we prove that under certain controllability conditions, an AMPC controller can attain V-formation with probability 1.

We evaluate AMPC’s performance on V-formation games using statistical model checking. Our results show that (a) as we increase the power of the attacker, the AMPC controller adapts by suitably increasing its horizon, and thus demonstrates resiliency to a variety of attacks; and (b) an intelligent attacker can significantly outperform its naive counterpart.

The rest of the paper is organized as follows. Section 2 provides background content in the form of our dynamic model of V-formation, stochastic reachability, and PSO. Sections 3-5 present the ARES algorithm, the DAMPC algorithm, and controller-attacker games for V-formation, respectively. Section 6 offers our concluding remarks.

This paper was written on the occasion of Jos Baeten’s retirement as general director of CWI and professor of theory of computing at ILLC. Jos was a highly influential collaborator of the third author (Smolka), and remains a good friend and colleague. Jos’s feedback to Smolka on the invited talk he gave on V-formation at CONQUEST 2016 was an important impetus for moving this work forward.

2. Background

This section introduces the basic concepts and techniques needed to formulate and derive our results.

2.1. Dynamic Model for V-formation

In our flocking model, each bird in the flock is modeled using four variables: a 2-dimensional vector $x_i$ denoting the position of the bird in a 2D space, and a 2-dimensional vector $v_i$ denoting the velocity of the bird. We use $s = \{x_i, v_i\}_{i=1}^{B}$ to denote a state of a flock with $B$ birds. The control actions of each bird are 2-dimensional accelerations $a_i$ and 2-dimensional position displacements $d_i$ (see the discussion of $a_i$ and $d_i$ below). Both are random variables.

Let $x_i(t)$, $v_i(t)$, $a_i(t)$, and $d_i(t)$ respectively denote the position, velocity, acceleration, and displacement of the $i$-th bird at time $t$, $1 \leq i \leq B$. The behavior of bird $i$ in discrete time is modeled as follows:

$$v_i(t+1) = v_i(t) + a_i(t), \qquad x_i(t+1) = x_i(t) + v_i(t+1) + d_i(t). \tag{1}$$

The next state of the flock is jointly determined by the accelerations and the displacements, based on the current state, following Eq. 1.

Every bird in our model [GPR14] moves in 2-dimensional space performing acceleration actions determined by a global controller. When there is no external disturbance, the displacement term is zero and the equations are:

$$v_i(t+1) = v_i(t) + a_i(t), \qquad x_i(t+1) = x_i(t) + v_i(t+1). \tag{2}$$

The controller detects the positions and velocities of all birds through sensors, and uses this information to compute an optimal acceleration for the entire flock. A bird uses its own component of the solution to update its velocity and position.
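To make the dynamics concrete, the following is a minimal Python sketch of one step of Eqs. (1)-(2). The array shapes and the example bounds are illustrative assumptions, not values taken from the papers.

import numpy as np

def flock_step(x, v, a, d=None):
    """One discrete-time step of the flock dynamics.

    x, v, a: (B, 2) arrays of positions, velocities, accelerations.
    d: optional (B, 2) array of position displacements (the disturbance
       term of Eq. 1); when d is None, we recover Eq. 2.
    """
    v_next = v + a                 # v_i(t+1) = v_i(t) + a_i(t)
    x_next = x + v_next            # x_i(t+1) = x_i(t) + v_i(t+1)
    if d is not None:
        x_next = x_next + d        # ... + d_i(t) under disturbance (Eq. 1)
    return x_next, v_next

# Example: 7 birds with random (illustrative) initial positions/velocities.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 3.0, (7, 2))
v = rng.uniform(0.25, 0.75, (7, 2))
x, v = flock_step(x, v, a=np.zeros((7, 2)))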

We extend this discrete-time dynamical model to a (deterministic) MDP by adding a cost (fitness) function based on the following metrics inspired by [YGST16]. (A classic MDP [RN10] is obtained by adding sensor/actuator or wind-gust noise, which is the case we address in follow-up work.)

  • Clear View ($CV$). A bird’s visual field is a cone with angle $\theta$ that can be blocked by the wings of other birds. We define the clear-view metric by accumulating the percentage of a bird’s visual field that is blocked by other birds. Fig. 1 (left) illustrates the calculation of the clear-view metric. Let $\alpha_{ij}$ be the part of the angle subtended by the wing of Bird $j$ on the eye of Bird $i$ that intersects with Bird $i$’s visual cone with angle $\theta$. Then, the clear view for Bird $i$, $CV_i$, is defined as $\sum_{j \neq i} \alpha_{ij}/\theta$, and the total clear view, $CV$, is defined as $\sum_i CV_i$. The optimal value in a V-formation is $CV^{*} = 0$, as all birds have a clear view. Note that the value $\alpha_{ij}$ can be computed from Bird $i$’s velocity and position, and Bird $j$’s position, using standard trigonometric functions.

  • Velocity Matching ($VM$). The accumulated differences between the velocity of each bird and all other birds, summed up over all birds in the flock, defines $VM$. Fig. 1 (middle) depicts the values of $VM$ in a velocity-unmatched flock. Formally, $VM = \sum_{i < j} \|v_i - v_j\|$. The optimal value in a V-formation is $VM^{*} = 0$, as all birds will have the same velocity (thus maintaining the V-formation).

  • Upwash Benefit ($UB$). The trailing upwash is generated near the wingtips of a bird, while downwash is generated near the center of a bird. We accumulate all birds’ upwash benefits using a Gaussian-like model of the upwash and downwash regions, as shown in Fig. 1 (right) for the right wing. Let $p_{ij}$ be the projection of the vector $x_j - x_i$ along the wing-span of Bird $j$. Similarly, let $o_{ij}$ be the projection of $x_j - x_i$ along the direction of $v_j$. Specifically, the upwash benefit for Bird $i$ coming from Bird $j$ is given by

    $$ub_{ij} = \mathrm{erf}(p_{ij})\, \mathcal{G}_{(w,\mu)}(p_{ij}, o_{ij}) - \mathcal{G}_{0}(p_{ij}, o_{ij}),$$

    where $\mathrm{erf}$ is the error function, which is a smooth approximation of the sign function, $\mathcal{G}_0$ is a 2D Gaussian with mean at the origin (modeling the downwash region), and $\mathcal{G}_{(w,\mu)}$ is a 2D Gaussian shifted so that the mean is $(w, \mu)$ (modeling the upwash region). The parameter $w$ is the wing span, and $\mu$ is the relative position where the upwash benefit is maximized. The total upwash benefit for Bird $i$ is $ub_i = \sum_{j \neq i} ub_{ij}$. The maximum upwash a bird can obtain is upper-bounded by 1. Since we are working with cost (that we want to minimize), we define $UB_i = 1 - \min(ub_i, 1)$, and $UB = \sum_i UB_i$. The optimal value for $UB$ in a V-formation is $UB^{*} = 1$, as the leader does not receive any upwash.

Finding smooth and continuous formulations of the fitness metrics is a key element of solving optimization problems. The PSO algorithm has a very low probability of finding an optimal solution if the fitness metric is not well-designed.

Figure 1. Illustration of the clear view ($CV$), velocity matching ($VM$), and upwash benefit ($UB$) metrics. Left: Bird $i$’s view is partially blocked by Birds $j$ and $k$; hence, its clear view is $CV_i > 0$. Middle: A flock with unaligned bird velocities results in a velocity-matching metric $VM > 0$; in contrast, $VM = 0$ when the velocities of all birds are aligned. Right: Illustration of the (right-wing) upwash benefit Bird $i$ receives from Bird $j$, depending on how it is positioned behind Bird $j$. Note that Bird $j$’s downwash region is directly behind it.

Let $s(t)$ be a flock configuration at time-step $t$. Given the above metrics, the overall fitness (cost) metric $J$ is a sum-of-squares combination of $VM$, $CV$, and $UB$, defined as follows:

$$J(s(t), \mathbf{a}^{h}(t), h) = \big(CV(s_{\mathbf{a}}(t+h)) - CV^{*}\big)^2 + \big(VM(s_{\mathbf{a}}(t+h)) - VM^{*}\big)^2 + \big(UB(s_{\mathbf{a}}(t+h)) - UB^{*}\big)^2, \tag{3}$$

where $h$ is the receding prediction horizon (RPH), $\mathbf{a}^{h}(t)$ is a sequence of accelerations of length $h$, and $s_{\mathbf{a}}(t+h)$ is the configuration reached after applying $\mathbf{a}^{h}(t)$ to $s(t)$. Formally, we have

$$v(t+\tau) = v(t+\tau-1) + a^{\tau}(t), \qquad x(t+\tau) = x(t+\tau-1) + v(t+\tau), \qquad \tau = 1, \ldots, h, \tag{4}$$

where $a^{\tau}(t)$ is the $\tau$-th acceleration of $\mathbf{a}^{h}(t)$, and $s_{\mathbf{a}}(t+h) = (x(t+h), v(t+h))$. As discussed further in Section 3, we allow the RPH $h$ to be adaptive in nature.

The fitness function $J$ has an optimal value of $0$ in a perfect V-formation. Thus, there is a need to perform flock-wide minimization of $J$ at each time-step $t$ to obtain an optimal plan of length $h$ of acceleration actions:

$$\mathbf{a}^{h}_{opt}(t) = \arg\min_{\mathbf{a}^{h}(t)} J(s(t), \mathbf{a}^{h}(t), h). \tag{5}$$

The optimization is subject to constraints on the maximum velocities and accelerations: $\|v_i(t)\| \leq v_{max}$ and $\|a_i(t)\| \leq \rho\,\|v_i(t)\|$ for all $i$, where $v_{max}$ is a constant and $\rho \in (0,1)$. The above constraints prevent us from using mixed-integer programming; we might, however, compare our solution to other continuous optimization techniques in the future. The initial positions and velocities of each bird are selected at random within certain ranges, and limited such that the distance between any two birds is greater than a given (collision) constant, and small enough for all birds, except for at most one, to feel the upwash.
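As an illustration, the following Python sketch computes the velocity-matching metric and combines precomputed metric values into the fitness of Eq. (3). The clear-view and upwash terms are passed in as plain numbers, since their geometric computations (visual cones, Gaussian wake models) are elided here; only the optimal values $CV^{*} = 0$, $VM^{*} = 0$, $UB^{*} = 1$ are taken from the text.

import numpy as np

def velocity_matching(v):
    """VM(s): accumulated pairwise velocity differences (optimum 0)."""
    diffs = v[:, None, :] - v[None, :, :]               # (B, B, 2) pairwise differences
    return np.linalg.norm(diffs, axis=-1).sum() / 2.0   # count each pair once

def fitness(cv, vm, ub, cv_opt=0.0, vm_opt=0.0, ub_opt=1.0):
    """Sum-of-squares combination of the three metrics (Eq. 3)."""
    return (cv - cv_opt) ** 2 + (vm - vm_opt) ** 2 + (ub - ub_opt) ** 2

v = np.array([[1.0, 0.0], [1.0, 0.1], [0.9, 0.0]])      # three unmatched velocities
print(fitness(cv=0.0, vm=velocity_matching(v), ub=1.0)) # here the fitness reduces to VM^2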

2.2. V-Formation MDP

This section defines Markov Decision Processes (MDPs) and the corresponding MDP formulated by Lukina et al. [LEH17] for the V-formation problem.

Definition 1.

A Markov decision process (MDP) $\mathcal{M}$ is a 5-tuple $(S, A, T, J, I)$ consisting of a set of states $S$, a set of actions $A$, a transition function $T : S \times A \times S \to [0,1]$, where $T(s, a, s')$ is the probability of transitioning from state $s$ to state $s'$ under action $a$, a cost function $J : S \to \mathbb{R}$, where $J(s)$ is the cost associated with state $s$, and an initial state distribution $I$.

The MDP modeling a flock of $B$ birds is defined as follows. The set of states is $S = \mathbb{R}^{4B}$, as each bird has a 2D position and a 2D velocity vector, and the flock contains $B$ birds. The set of actions is $A = \mathbb{R}^{2B}$, as each bird takes a 2D acceleration action and there are $B$ birds. The cost function $J$ is defined by Eq. 3. The transition function $T$ is defined by Eq. 1. As the acceleration vector $a_i(t)$ for bird $i$ at time $t$ is a random variable, the state vector $s(t+1)$ is also a random variable. The initial state distribution $I$ is a uniform distribution over a region of the state space where all birds have positions and velocities in a range defined by fixed lower and upper bounds.
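A compact sketch of the resulting MDP follows, with the deterministic transition of Eq. (2); the initial-state bounds are illustrative stand-ins for the fixed lower/upper bounds mentioned above.

import numpy as np
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[np.ndarray, np.ndarray]   # (positions, velocities), each of shape (B, 2)

@dataclass
class FlockMDP:
    B: int                               # number of birds
    cost: Callable[[State], float]       # J, Eq. (3)

    def initial_state(self, rng) -> State:
        # I: uniform over a box of positions and velocities (bounds illustrative)
        return (rng.uniform(0.0, 3.0, (self.B, 2)),
                rng.uniform(0.25, 0.75, (self.B, 2)))

    def transition(self, s: State, a: np.ndarray) -> State:
        # T: deterministic dynamics of Eq. (2); the action a is the random input
        x, v = s
        v_next = v + a
        return (x + v_next, v_next)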

2.3. Stochastic Reachability Problem

Given the stochasticity introduced by PSO, the V-formation problem can be formulated in terms of a reachability problem for the Markov chain induced by the composition of a Markov decision process (MDP) and a controller.

Before we can define traces, or executions, of $\mathcal{M}$, we need to fix a controller, or strategy, that determines which action from $A$ to use at any given state of the system. We focus on randomized strategies. A randomized strategy (controller) $\sigma$ over $\mathcal{M}$ is a function of the form $\sigma : S \to PD(A)$, where $PD(A)$ is the set of probability distributions over $A$. That is, $\sigma$ takes a state $s$ and returns an action consistent with the probability distribution $\sigma(s)$. Applying a strategy $\sigma$ to the MDP $\mathcal{M}$ defines the Markov chain $\mathcal{M}_{\sigma}$. We use the terms strategy and controller interchangeably.

In the bird-flocking problem, a controller would be a function that determines the accelerations for all the birds given their current positions and velocities. Once we fix a controller, we can iteratively use it to (probabilistically) select a sequence of flock accelerations. The goal is to generate a sequence of actions that takes the MDP from an initial state to a state $s^{*}$ with $J(s^{*}) \leq \varphi$, where $\varphi$ is a given cost threshold.

Definition 2.

Let $\mathcal{M} = (S, A, T, J, I)$ be an MDP, and let $G \subseteq S$ be the set of goal states of $\mathcal{M}$. The stochastic reachability problem is to design a controller $\sigma$ for $\mathcal{M}$ such that, for a given $\delta$, the probability of the underlying Markov chain $\mathcal{M}_{\sigma}$ reaching a state in $G$ in $m$ steps (for a given $m$), starting from an initial state, is at least $1 - \delta$.

We approach the stochastic reachability problem by designing a controller and quantifying its probability of success in reaching the goal states.

2.4. Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a randomized approximation algorithm for computing the value of a parameter minimizing a possibly nonlinear cost (fitness) function. Interestingly, PSO itself is inspired by bird flocking [KE95]. Hence, PSO assumes that it works with a flock of birds.

Note, however, that in our running example, these birds are “acceleration birds” (or particles), and not the actual birds in the flock. Each bird has the same goal, finding food (reward), but none of them knows the location of the food. However, every bird knows the distance (horizon) to the food location. PSO works by moving each bird preferentially toward the bird closest to food.

The work delineated in this paper uses the Matlab Toolbox function particleswarm, which performs the classical version of PSO. This PSO creates a swarm of particles, of size say $p$, uniformly at random within a given bound on their positions and velocities. Note that in our example, each particle itself represents a flock of bird-acceleration sequences $\{\mathbf{a}^{h}_{i}\}_{i=1}^{B}$, where $h$ is the current length of the receding horizon. PSO further chooses a neighborhood of a random size for each particle $k$, $1 \leq k \leq p$, and computes the fitness of each particle. Based on the fitness values, PSO stores two vectors for particle $k$: its so-far personal-best position $\mathbf{x}^{pb}_{k}(t)$, and its fittest neighbor’s position $\mathbf{x}^{nb}_{k}(t)$. The positions and velocities of each particle $k$ in the swarm are updated according to the following rule:

$$\mathbf{v}_{k}(t+1) = \omega\, \mathbf{v}_{k}(t) + y_{1}\, \mathbf{u}_{1} \otimes (\mathbf{x}^{pb}_{k}(t) - \mathbf{x}_{k}(t)) + y_{2}\, \mathbf{u}_{2} \otimes (\mathbf{x}^{nb}_{k}(t) - \mathbf{x}_{k}(t)), \qquad \mathbf{x}_{k}(t+1) = \mathbf{x}_{k}(t) + \mathbf{v}_{k}(t+1), \tag{6}$$

where $\omega$ is the inertia weight, which determines the trade-off between global and local exploration of the swarm (the value of $\omega$ is proportional to the exploration range); $y_1$ and $y_2$ are the self-adjustment and social-adjustment weights, respectively; $\mathbf{u}_1, \mathbf{u}_2$ are randomization factors drawn uniformly from $[0,1]$; and $\otimes$ is the component-wise vector product, that is, for a random vector $\mathbf{z}$: $(\mathbf{u} \otimes \mathbf{z})_{i} = u_{i} z_{i}$.

If the fitness value of $\mathbf{x}_{k}(t+1)$ is lower than that of $\mathbf{x}^{pb}_{k}(t)$, then $\mathbf{x}_{k}(t+1)$ is assigned to $\mathbf{x}^{pb}_{k}(t+1)$. The particle with the best fitness over the whole swarm becomes the global best for the next iteration. The procedure is repeated until the number of iterations reaches its maximum, the allotted time elapses, or the minimum fitness criterion is satisfied. For our bird-flock example, we obtain in this way the best accelerations.
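For concreteness, here is a minimal global-best PSO in Python implementing the update rule of Eq. (6). It is a simplified stand-in for Matlab’s particleswarm, which additionally adapts the inertia weight and uses random neighborhoods rather than the global best; all parameter values below are illustrative.

import numpy as np

def pso(fitness, dim, n_particles=50, iters=200, bounds=(-1.0, 1.0),
        w=0.7, y1=1.5, y2=1.5, seed=0):
    """Minimal global-best PSO implementing the update rule of Eq. (6)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, (n_particles, dim))        # particle positions
    v = np.zeros((n_particles, dim))                   # particle velocities
    pb = x.copy()                                      # personal bests x^pb
    pb_cost = np.array([fitness(p) for p in x])
    gb = pb[pb_cost.argmin()].copy()                   # global best (stand-in for x^nb)
    for _ in range(iters):
        u1 = rng.random((n_particles, dim))            # randomization factors
        u2 = rng.random((n_particles, dim))
        v = w * v + y1 * u1 * (pb - x) + y2 * u2 * (gb - x)   # Eq. (6)
        x = x + v
        cost = np.array([fitness(p) for p in x])
        better = cost < pb_cost
        pb[better], pb_cost[better] = x[better], cost[better]
        gb = pb[pb_cost.argmin()].copy()
    return gb, pb_cost.min()

# Example: minimize a quadratic; in ARES, fitness is J over acceleration sequences.
best, best_cost = pso(lambda z: float(np.sum(z ** 2)), dim=4)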

3. Adaptive Receding-Horizon Synthesis of Optimal Plans (ARES)

ARES [LEH17] is a general adaptive, receding-horizon synthesis algorithm that, given an MDP and one of its initial states, generates an optimal plan (action sequence) taking that state to a state whose cost is below a desired threshold. ARES implicitly defines an optimal, online policy-synthesis algorithm, assuming plan generation can be performed in real-time. ARES can alternatively be viewed as a model-predictive control (MPC) algorithm that utilizes an adaptive receding horizon, a technique we refer to as Adaptive MPC (AMPC).

ARES makes repeated use of PSO [KE95] to effectively generate a plan. This was in principle unnecessary, as one could generate an optimal plan by calling PSO only once, with a maximum plan-length horizon. Such an approach, however, is in most cases impractical, as every unfolding of the MDP adds a number of new dimensions to the search space. Consequently, to obtain adequate coverage of this space, one needs a very large number of particles, a number that is either going to exhaust available memory or require a prohibitive amount of time to find an optimal plan.

3.1. The ARES Algorithm

One could in principle solve the optimization problem defined in Sections 2.1 and 2.2 by calling PSO only once, with a horizon equaling the maximum length allowed for a plan. This approach, however, tends to lead to very large search spaces, and is in most cases intractable. Indeed, preliminary experiments with this technique applied to our running example could not generate any convergent plan.

Figure 2. Blue bars are the values of the cost function in every time step. Red dashed line is the cost-based Lyapunov function used for horizon and neighborhood adaptation. Black solid line is neighborhood resizing for the next step given the current cost.

A more tractable approach is to make repeated calls to PSO with a small horizon length $h$. The question is how small $h$ can be. The current practice in model-predictive control (MPC) is to use a fixed, small $h$ (see the outer loop of Fig. 3, where resampling and conditional branches are disregarded). Unfortunately, this forces the selection of locally-optimal plans (of size less than three) in each call, and there is no guarantee of convergence when joining them together. In fact, in our running example, we were able to find plans leading to a V-formation in only a small fraction of the random initial flocks.

Inspired by Importance Splitting (see Fig. 4 (right) and Fig. 3), we introduce the notion of a level-based horizon, where level $\ell_0$ equals the cost of the initial state, and level $\ell_m$ equals the threshold $\varphi$. Intuitively, by using an asymptotic cost-convergence function ranging from $\ell_0$ to $\ell_m$, and dividing its graph into $m$ equal segments, we can determine on the vertical axis a sequence of levels ensuring convergence.

The asymptotic cost-convergence function ARES implements is specifically tuned for each particle. Formally, if particle $k$ has previously reached level $\ell_{i}$, then its next target level is within the distance $\Delta_k = \ell_{i}/(m - i)$. In Fig. 3, after the particles pass the thresholds assigned to them, the values of the cost function in the current state are sorted in ascending order $J_1 \leq \ldots \leq J_n$. The lowest cost $J_1$ should be apart from the previous level $\ell_i$ by at least its $\Delta_1$ for the algorithm to proceed to the next level $\ell_{i+1}$.

Figure 3. Graphical representation of ARES.
1 foreach clone $M_k$, $1 \leq k \leq n$ do
2     $a_k$ ← particleswarm($M_k$, $p$, $h$);  // use PSO to determine the best next actions for the MDP clone, with RPH $h$
3     $J_k$ ← Cost($M_k$, $a_k$, $h$);  // calculate the cost after applying the sequence of optimal actions of length $h$
4     if $J_k < \ell_k - \Delta_k$ then
5         $\ell_k$ ← $J_k$;  // new level-threshold
6     end if
7 end foreach
Algorithm 1 Simulate ($n$, $h$, $p$)

The levels serve two purposes. First, they implicitly define a Lyapunov function, which guarantees convergence. If desired, this function can be explicitly generated for all states, up to some topological equivalence. Second, the levels help PSO overcome local minima (see Fig. 4 (left)). If reaching a next level requires PSO to temporarily pass over a state-cost ridge, then ARES incrementally increases the size of the horizon $h$, up to a maximum size $h_{max}$. For particle $k$, passing the threshold $\Delta_k$ means that it reaches a new level, and the definition of $\Delta_k$ ensures a smooth degradation of its threshold.

Figure 4. Left: If state $s$ has cost $\ell_i$, and its successor-state has cost less than $\ell_{i+1}$, then a horizon of length 1 is appropriate. However, if the successor-state has a local-minimum cost above $\ell_{i+1}$, one has to pass over the cost ridge in order to reach level $\ell_{i+1}$, and therefore ARES has to adaptively increase the horizon to 3. Right: The cost of the initial state defines $\ell_0$, and the given threshold $\varphi$ defines $\ell_m$. By choosing $m$ equal segments on an asymptotically converging (Lyapunov) function (where the number $m$ is empirically determined), one obtains on the vertical cost-axis the levels required for ARES to converge.

Another idea imported from IS, and shown in Fig. 3, is to maintain $n$ clones of the MDP (and its initial state) at any time $t$, and to run PSO, for a horizon $h$, on each $h$-unfolding of them. This results in an action sequence of length $h$ (see Algo. 1). This approach allows us to call PSO for each clone and desired horizon, with a very small number of particles per clone.

1 Sort the clones in ascending order by their current costs; let $I$ be the indexes of the MDPs whose costs are below the median among all the clones;
2 for $k$ ← 1 to $n$ do
3     if $k \notin I$ then
4         Sample $r$ uniformly at random from $I$; $M_k$ ← $M_r$;
5     else
6         $M_k$ ← $M_k$;  // Keep the more successful MDPs unchanged
7     end if
8 end for
Algorithm 2 Resample ($n$, $\{M_k\}$)

To check which particles have overcome their associated thresholds, we sort the clones according to their current cost, and split them into two sets: the successful set, containing the indexes whose costs are lower than the median among all clones; and the unsuccessful set, containing the remaining indexes, which are discarded. The unsuccessful ones are then replenished by sampling uniformly at random from the successful set (see Algo. 2).
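A small Python sketch of this resampling step, under the assumption that “successful” means a cost strictly below the median:

import numpy as np

def resample(costs, rng):
    """Importance-splitting style resampling (cf. Algo. 2): clones whose cost
    is in the upper half are replaced by clones sampled uniformly at random
    from the successful (below-median) half."""
    order = np.argsort(costs)
    n = len(costs)
    successful, unsuccessful = order[: n // 2], order[n // 2:]
    assignment = np.arange(n)                      # successful clones keep themselves
    assignment[unsuccessful] = rng.choice(successful, size=len(unsuccessful))
    return assignment                              # assignment[k]: clone copied into slot k

rng = np.random.default_rng(1)
print(resample(np.array([0.9, 0.1, 0.5, 0.7]), rng))  # the two costliest slots get below-median clones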

The number of particles is increased if no clone reaches a next level for any of the chosen horizons. Once this happens, we reset the horizon to one, and repeat the process. In this way, we adaptively focus our resources on escaping from local minima. From the last level, we choose the state with the minimal cost, and traverse all of its predecessor states to find an optimal plan comprised of the actions that led the MDP to this optimal state. In our running example, we select a flock in V-formation, and traverse all its predecessor flocks. The overall procedure of ARES is shown in Algo. 3.

Input : $M$, $\varphi$, $p_{start}$, $p_{inc}$, $p_{max}$, $h_{max}$, $m$, $n$
Output : $\{\mathbf{a}_i\}$ // synthesized optimal plans
1  Initialize $\ell_0$ ← Cost($s_0$); $i$ ← 0; $h$ ← 1; $p$ ← $p_{start}$; create $n$ clones $\{M_k\}$;
2  while ($\ell_i > \varphi$) and ($i < m$) do
3      // find and apply the best actions with RPH $h$
4      Simulate($n$, $h$, $p$); $J_{min}$ ← $\min_k J_k$;  // find the minimum cost among all the clones
5      if $J_{min} < \ell_i - \Delta$ then
6          $\ell_{i+1}$ ← $J_{min}$; $i$ ← $i + 1$;  // new level has been reached
7          $h$ ← 1; $p$ ← $p_{start}$;  // reset adaptive parameters
8          Resample($n$, $\{M_k\}$);
9      else
10         if $h < h_{max}$ then
11             $h$ ← $h + 1$;  // improve time exploration
12         else
13             if $p < p_{max}$ then
14                 $h$ ← 1; $p$ ← $p + p_{inc}$;  // improve space exploration
15             else
16                 break;
17             end if
18         end if
19     end if
20 end while
21 Take a clone in the state with the minimum cost $\ell_i$ at the last level $i$;
22 foreach level $j$ = $i$ down to 1 do
23     // find the predecessor state and the corresponding action
24 end foreach
Algorithm 3 ARES
Proposition 1 (Optimality and Minimality).

(1) Let $\mathcal{M}$ be an MDP. For any initial state $s_0$ of $\mathcal{M}$, ARES is able to solve the optimal-plan synthesis problem for $\mathcal{M}$ and $s_0$. (2) An optimal choice of $m$ in the dynamic-threshold function $\Delta_k$, for some particle $k$, ensures that ARES also generates the shortest optimal plan.

Sketch.

(1) The dynamic-threshold function $\Delta_k$ ensures that the initial cost $\ell_0$ is continuously decreased until it falls below $\varphi$. Moreover, for an appropriate number of clones, by adaptively determining the horizon and the number of particles needed to overcome $\Delta_k$, ARES always converges, with probability 1, to an optimal state, given enough time and memory. (2) This follows from convergence property (1), and from the fact that ARES always gives preference to the shortest horizon while trying to overcome $\Delta_k$. ∎

The optimality referred to in the title of the paper is in the sense of (1). One, however, can do even better than (1), in the sense of (2), by empirically determining the parameter $m$ in the dynamic-threshold function $\Delta_k$. Also note that ARES is an approximation algorithm, and may therefore return non-minimal plans. Even in these circumstances, however, the plans will still lead to an optimal state. This is a V-formation in our flocking example.
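The adaptation logic of Algo. 3 can be summarized in a few lines of Python. This is a schematic, not the full algorithm: the clone bookkeeping, the per-particle thresholds $\Delta_k$, and the plan reconstruction are omitted, and `simulate` stands for one call to Algo. 1.

def ares_adapt(simulate, level0, phi, h_max=5, p_start=10, p_inc=10, p_max=100):
    """Schematic of ARES's adaptive sizing: extend the horizon first, then the
    swarm; reset both whenever a new level is reached."""
    level, h, p, plan = level0, 1, p_start, []
    while level > phi:
        actions, best = simulate(h, p)       # best cost among all clones (Algo. 1)
        if best < level:                     # next level reached (threshold test elided)
            level, h, p = best, 1, p_start   # record actions, reset adaptive parameters
            plan.append(actions)             # (followed by resampling, Algo. 2)
        elif h < h_max:
            h += 1                           # improve time exploration
        elif p < p_max:
            h, p = 1, p + p_inc              # improve space exploration
        else:
            break                            # stuck in a local minimum
    return plan, level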

3.2. Evaluation of ARES

To assess the performance of our approach, we developed a simple simulation environment in Matlab. All experiments were run on an Intel Core i7-5820K CPU at 3.30 GHz with 32GB of RAM.

We performed numerous experiments with a varying number of birds. Unless stated otherwise, results refer to 8,000 experiments with 7 birds. The initial configurations were generated independently, uniformly at random, subject to the following constraints:

  1. Position constraints: the initial positions lie within fixed lower and upper bounds.

  2. Velocity constraints: the initial velocities lie within fixed lower and upper bounds.

Figure 5. Left: Example of an arbitrary initial configuration of 7 birds. Right: The V-formation obtained by applying the plan generated by ARES. In the figures, we show the wings of the birds, bird orientations, bird speeds (as scaled arrows), upwash regions in yellow, and downwash regions in dark blue.
                    Successful                             Total
No. Experiments     7573                                   8000
                Min       Max       Avg      Std       Min      Max       Avg      Std
Cost, J         –         –         –        –         –        1.4840    0.0282   0.1607
Time            23.14s    310.83s   63.55s   22.81s    23.14s   661.46s   64.85s   28.05s
Plan Length     7         20        12.80    2.39      7        20        13.13    2.71
RPH, h          1         5         1.40     0.15      1        5         1.27     0.17
Table 1. Overview of the results for 8,000 experiments with 7 birds

Table 1 gives an overview of the results with respect to the 8,000 experiments we performed with 7 birds for a maximum of 20 levels. The average fitness across all experiments is 0.0282, with a standard deviation of 0.1607. We achieved a success rate of 94.66% with respect to the fitness threshold $\varphi$. The average fitness is higher than the threshold due to the comparably high fitness of the unsuccessful experiments. When increasing the bound for the maximal plan length to 30, we achieved an even higher success rate in 1,000 experiments, at the expense of a slightly longer average execution time.

No. of birds     3        5        7        9
Avg. duration    4.58s    18.92s   64.85s   269.33s
Table 2. Average duration for 100 experiments with varying numbers of birds

The left plot in Fig. 6 depicts the resulting distribution of execution times for 8,000 runs of our algorithm. It is clear that, excluding only a few outliers from the histogram, an arbitrary configuration of birds (Fig. 5 (left)) reaches a V-formation (Fig. 5 (right)) in around 1 minute. The execution time rises with the number of birds, as shown in Table 2.

In Fig. 6, we illustrate for how many experiments the algorithm had to increase RPH (Fig. 6 (middle)) and the number of particles used by PSO (Fig. 6 (right)) to improve time and space exploration, respectively.

Figure 6. Left: Distribution of execution times for 8,000 runs. Middle: Statistics of increasing the RPH $h$. Right: Statistics of increasing the number of particles used by PSO, for 8,000 experiments.

After achieving such a high success rate of ARES for arbitrary initial configurations, we would like to demonstrate that the number of experiments performed is sufficient for high confidence in our results. This requires us to determine the appropriate number $N$ of random variables $Z_1, \ldots, Z_N$ necessary for the Monte-Carlo approximation scheme we apply to assess the efficiency of our approach. For this purpose, we use the additive approximation algorithm discussed in [GPR14]. If the sample mean $\mu_Z$ is expected to be large, then one can exploit Bernstein's inequality and fix $N$ proportional to $\ln(1/\delta)/\varepsilon^2$. This results in an additive or absolute-error $(\varepsilon, \delta)$-approximation scheme:

$$\Pr\big[\mu_Z - \varepsilon \leq \tilde{\mu}_Z \leq \mu_Z + \varepsilon\big] \geq 1 - \delta,$$

where $\tilde{\mu}_Z$ approximates $\mu_Z$ with absolute error $\varepsilon$ and probability $1 - \delta$.

In particular, we are interested in $Z$ being a Bernoulli random variable:

$$Z = \begin{cases} 1, & \text{if the plan converges to a V-formation,} \\ 0, & \text{otherwise.} \end{cases}$$

Therefore, we can use the Chernoff-Hoeffding instantiation of Bernstein's inequality, and further fix the number of samples to $N = 4\ln(2/\delta)/\varepsilon^2$, as in [HLMP04a].

Hence, for our 8,000 experiments, we achieve a success rate of 95% with an absolute error of $\varepsilon = 0.05$ and a confidence ratio of 0.99.

Moreover, considering that the average length of a plan is 13, and that each state in a plan is independent of the states in all other plans, we can roughly consider that our above estimation generated 80,000 independent states. For the same confidence ratio of 0.99, we then obtain an approximation error $\varepsilon \approx 0.016$, and for a confidence ratio of 0.999, we obtain an approximation error $\varepsilon \approx 0.02$.
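These error bounds follow directly from the Chernoff-Hoeffding instantiation above; the following snippet reproduces them:

import math

def additive_error(n, delta):
    """Absolute error eps for n Bernoulli samples at confidence 1 - delta,
    obtained by inverting N = 4 ln(2/delta) / eps^2."""
    return math.sqrt(4.0 * math.log(2.0 / delta) / n)

print(round(additive_error(8_000, 0.01), 3))    # 0.051 -> the 0.05 reported above
print(round(additive_error(80_000, 0.01), 3))   # 0.016
print(round(additive_error(80_000, 0.001), 3))  # 0.019, i.e., roughly 0.02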

4. Adaptive-Neighborhood Distributed Control

In Section 3, we introduced the concept of Adaptive-Horizon MPC (AMPC). AMPC gives controllers extraordinary power: we proved that under certain controllability conditions, an AMPC controller can attain V-formation with probability 1. We now present DAMPC [LTSG19], a distributed version of AMPC that extends it along several dimensions. First, at every time step, DAMPC runs a distributed consensus algorithm to determine the optimal action (acceleration) for every agent in the flock. In particular, each agent starts by computing the optimal actions for its local subflock. The subflocks then communicate in a sequence of consensus rounds to determine the optimal actions for the entire flock.

Secondly, DAMPC features adaptive neighborhood resizing in an effort to further improve the algorithm's efficiency. As with the adaptive prediction horizon in AMPC, neighborhood resizing utilizes the implicit Lyapunov function to guarantee eventual convergence to a minimum neighborhood size. DAMPC thus treats the neighborhood size as another controllable variable that can be dynamically adjusted for efficiency purposes. This leads to reduced communication and computation compared to the centralized solution, without sacrificing statistical guarantees of convergence such as those offered by its centralized counterpart AMPC. Statistical global convergence can be proven.

4.1. DAMPC System Model

We consider a distributed setting with the following assumptions about the system model.

  1. Birds can communicate with each other without delays. As explained below, each bird adaptively changes its communication radius. The measure of the radius is the number of birds covered, and we refer to it as bird $i$'s local neighborhood $N_i$, including bird $i$ itself.

  2. All birds use the same algorithm to satisfy their local reachability goals, i.e., to bring the local cost $J(s_{N_i})$, $1 \leq i \leq B$, below the given threshold $\varphi$.

  3. Birds move in continuous space and change accelerations synchronously at discrete time points.

  4. After executing its local algorithm, each bird broadcasts the obtained solution to its neighbors. In this manner, every bird receives several solution proposals, which differ due to the fact that each bird has its own local neighborhood. To achieve consensus, each bird takes as its best action the one with the minimal cost among the received proposals. The solutions for the birds in the considered neighborhood are then fixed. The consensus rounds repeat until all birds have fixed solutions (see the sketch after this list).

  5. At every time step, the value of the global cost function is received by all birds in the flock and checked for improvement. The neighborhood for each bird is then resized based on this global check.

  6. The upwash benefit $UB_i$ defined in Section 2.1 maintains connectivity of the flock along the computations, while our algorithm manages collision avoidance.
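As promised in item 4, here is a schematic Python sketch of the consensus rounds. It assumes that a proposal is a triple (cost, neighborhood, actions) returned by each bird's local computation; `propose` stands in for a LocalAMPC call, and already-fixed birds simply keep their solutions.

import numpy as np

def consensus_rounds(propose, B, k):
    """Each round: every unfixed bird proposes a solution for its k-neighborhood;
    the globally cheapest proposal wins and its whole subflock becomes fixed."""
    fixed = np.zeros(B, dtype=bool)
    solution = [None] * B
    while not fixed.all():
        proposals = [propose(i, k) for i in range(B) if not fixed[i]]
        cost, neighborhood, actions = min(proposals, key=lambda p: p[0])
        for i, a in zip(neighborhood, actions):
            if not fixed[i]:
                solution[i], fixed[i] = a, True   # fix the winning subflock
    return solution

Each round fixes at least the winning bird's neighborhood, so the loop terminates; this is exactly the argument of the global-consensus lemma in Section 4.6.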

4.2. The Distributed AMPC Algorithm

In this section, we solve the stochastic reachability problem in the context of V-formation control, and demonstrate that the algorithm can be used as an alternative, hill-climbing, cost-based optimization technique that avoids local minima.

$h_{max}$, $h$          maximum and current local horizon lengths
$N_i$                   neighborhood of the $i$-th bird
$k$                     number of birds in a neighborhood ($|N_i| = k$)
$T_{max}$               number of time-steps allowed by the property
$a(t)$                  sequence of synthesized accelerations for all birds at each time-step
nfy                     value of an acceleration that has not yet been fixed
$\cdot^{f}$, $\cdot^{l}$   superscripts for the first and last elements, respectively, in a horizon sequence
$a_{N_i}$, $s_{N_i}$    sequences of accelerations and corresponding states of the local horizon length, reached at time-step $t$ by bird $i$ locally in its neighborhood
$\Delta_i$              dynamical threshold defined based on the last achieved local cost in the neighborhood
$a^{l}$, $s^{l}$        accelerations and corresponding states for all birds, achieved globally as unions of the last elements of the best horizon sequences reached locally in each neighborhood
$a^{f}$, $s^{f}$        accelerations and states for all birds, achieved globally as unions of the first elements of the best horizon sequences reached locally in each neighborhood
$\ell(t)$               level achieved globally at time-step $t$ after applying $a^{f}$ to the current state
$\Delta$                dynamical threshold defined based on the last achieved global level
Table 3. Table of Notation

DAMPC (see Alg. 4) takes as input an MDP $\mathcal{M}$, a threshold $\varphi$ defining the goal states $G$, the maximum horizon length $h_{max}$, the maximum number of time steps $T_{max}$, the number of birds $B$, and a scaling factor $\beta$. It outputs a state $s^{*}$ in $G$ and a sequence of actions taking $\mathcal{M}$ from the initial state to $s^{*}$.

Input : $M$, $\varphi$, $h_{max}$, $T_{max}$, $B$, $\beta$
Output : $s^{*} \in G$, $\{a(t)\}$
1  $s(0)$ ← $s_0$; $\ell(0)$ ← $J(s_0)$; $t$ ← 0; $k$ ← $B$;
2  while ($\ell(t) > \varphi$) and ($t < T_{max}$) do
3      $a_i$ ← nfy for all $1 \leq i \leq B$;   // No bird has a fixed solution yet
4      while $\exists\, i : a_i = $ nfy do
5          for $i$ ← 1 to $B$ do in parallel
6              $N_i$ ← neighborhood($i$, $k$);   // neighbors of bird $i$
7              $\Delta_i$ ← threshold based on the last local cost of $N_i$;
8              $(a_{N_i}, s_{N_i})$ ← LocalAMPC($M$, $s_{N_i}$, $a_{N_i}$, $\Delta_i$, $h_{max}$, $\beta$);
9          end
10         $j$ ← $\arg\min_i J(s^{l}_{N_i})$;    // Best solution among the proposals
11         // Fix $j$'s neighbors' solutions
12         for $i \in N_j$ do
13             $a_i$ ← $(a_{N_j})_i$;            // The solution for bird $i$
14         end for
15     end while
16     // First action and next state
17     $a(t)$ ← $\cup_i\, a^{f}_{N_i}$; $s(t+1)$ ← $\cup_i\, s^{f}_{N_i}$;
18     if $J(s^{l}) < \ell(t) - \Delta$ then
19         $\ell(t+1)$ ← $J(s^{l})$; $t$ ← $t+1$;   // Proceed to the next level
20     end if
21     $k$ ← resize($k$, level change);           // Adjust neighborhood size
22 end while
Algorithm 4 DAMPC

The initialization step (Line 1) chooses an initial state $s_0$ from $I$, fixes the initial level $\ell(0)$ as the cost of $s_0$, and sets the initial time $t$ and the neighborhood size $k$ (initially the number of birds $B$). The outer while-loop (Lines 2-22) is active as long as $\ell(t)$ has not reached $\varphi$ and time has not expired. In each time step, DAMPC first sets the acceleration sequences of all birds $1 \leq i \leq B$ to nfy (not yet fixed), and then iterates Lines 4-15 until all birds fix their accelerations through global consensus. This happens as follows. First, all birds determine their neighborhood (subflock) and the cost decrement $\Delta_i$ that will bring them to the next level (Lines 6-7). Second, they call LocalAMPC (see Section 4.3), which takes the sequences of states and actions fixed so far and extends them such that (Line 8) the returned sequence of actions and the corresponding sequence of states decrease the cost of the subflock by $\Delta_i$. Here the superscript $l$ denotes the whole sequence including the last element (the farthest point in the future where the state of the subflock is fixed), which can differ from one neighborhood to another depending on the length of the used horizon. Note that an action sequence passed to LocalAMPC as input may contain nfy values, and the goal is to fill in the gaps in the solution sequence by means of this iterative process. In Line 10, we use the value of the cost function in the last resulting state as the criterion for choosing the best action sequence proposed among the neighbors. The acceleration sequences of all birds in this subflock are then fixed (Lines 12-14).

After all acceleration sequences are fixed, that is, all nfy values are eliminated, the first accelerations in these sequences are selected for the output (Line 17). The next action $a(t)$ is set to the union of $a^{f}_{N_i}$ over all neighborhoods, and the state of the flock after executing $a(t)$ is set to the union of $s^{f}_{N_i}$. If we found a path that eventually decreases the cost by $\Delta$, we have reached the next level, and we advance time (Lines 18-20). In that case, we optionally decrease the neighborhood size, and increase it otherwise (Line 21).

The algorithm is distributed and operates with a dynamically changing topology. Lines 4, 10, and 18 require synchronization, which can be achieved by broadcasting the corresponding information to a central hub of the network. This can be a different bird, or a different base station, at each time-step.

4.3. The Local AMPC Algorithm

LocalAMPC is a modified version of the AMPC algorithm [TSE17], as shown in Alg. 5. Its input is an MDP $\mathcal{M}$, the current state $s_{N_i}$ of a subflock $N_i$, a vector of acceleration sequences $a_{N_i}$, one sequence for each bird in the subflock, a cost decrement $\Delta_i$ to be achieved, a maximum horizon $h_{max}$, and a scaling factor $\beta$. In $a_{N_i}$, some accelerations may not be fixed yet, that is, they have value nfy.

Its output is a vector of acceleration sequences $\hat{a}_{N_i}$, one for each bird, that decreases the cost of the subflock the most, together with the state $\hat{s}_{N_i}$ of the subflock after executing all the actions.

Input : $M$, $s_{N_i}$, $a_{N_i}$, $\Delta_i$, $h_{max}$, $\beta$
Output : $\hat{a}_{N_i}$, $\hat{s}_{N_i}$
1 $p$ ← $\beta \cdot k \cdot h$;   // Initial swarm size, proportional to the input horizon
2 $h$ ← 1;   // Initial horizon
3 repeat
4     // Run PSO with local information and $p$ particles
5     $(\hat{a}_{N_i}, \hat{s}_{N_i})$ ← PSO($M$, $s_{N_i}$, $a_{N_i}$, $p$, $h$);
6     $h$ ← $h + 1$; $p$ ← $\beta \cdot p$;   // increase horizon and swarm size
7 until $J(s_{N_i}) - J(\hat{s}^{l}_{N_i}) \geq \Delta_i$ or $h > h_{max}$;
8 return $(\hat{a}_{N_i}, \hat{s}_{N_i})$;   // Return the best sequences found
Algorithm 5 LocalAMPC

LocalAMPC first initializes (Line 1) the number of particles $p$ to be used by PSO, proportionally to the input horizon, the number of birds in the subflock, and the scaling factor $\beta$. It then tries to decrease the cost of the subflock by at least $\Delta_i$, as long as the maximum horizon $h_{max}$ is not reached (Lines 3-7).

For this purpose, it calls PSO (Line 5) with an increasingly longer horizon and an increasingly larger number of particles. The idea is that the flock might have to first overcome a cost bump before it gets to a state where the cost decreases by at least $\Delta_i$. PSO extends the input sequences of fixed actions up to the desired horizon with new actions that are most successful in decreasing the cost of the subflock, and it computes from scratch the sequences of actions for the nfy entries. The result is returned in $\hat{a}_{N_i}$. PSO also returns the states $\hat{s}_{N_i}$ of the subflock after applying the whole sequence of actions. Using this information, LocalAMPC computes the actual cost decrement achieved.
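A schematic of this escalation loop in Python, where `pso_run` stands for the PSO call of Line 5 and the parameter names are illustrative:

def local_ampc(pso_run, cost_now, delta, h_max, p0, beta):
    """Grow the horizon and the swarm until the subflock's cost drops by at
    least delta, or the horizon budget runs out (cf. Alg. 5)."""
    h, p = 1, p0
    best_actions, best_cost = None, float("inf")
    while h <= h_max:
        actions, cost = pso_run(h, p)        # PSO over h-step action sequences
        if cost < best_cost:
            best_actions, best_cost = actions, cost
        if cost_now - cost >= delta:         # required decrement achieved
            break
        h, p = h + 1, int(beta * p)          # increase horizon and swarm size
    return best_actions, best_cost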

4.4. Dynamic Neighborhood Resizing

The key feature of DAMPC is that it adaptively resizes neighborhoods. This is based on the following observation: as the agents gradually converge towards a globally optimal state, they can explore smaller neighborhoods when computing actions that will improve upon the current configuration.

Adaptation works on the lookahead cost, that is, the cost that is reachable at some future time. Line 19 of DAMPC (Alg. 4) is reached (and the level is incremented) whenever we are able to decrease this lookahead cost. If the level is incremented, the neighborhood size is decremented, and it is incremented otherwise, as follows:

$$k(t+1) = \begin{cases} \max(k(t) - 1,\ k_{min}), & \text{if the level is incremented,} \\ \min(k(t) + 1,\ B), & \text{otherwise.} \end{cases} \tag{7}$$
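In code, the resizing rule is a one-liner; the value of k_min and the unit step are illustrative choices consistent with Eq. (7):

def resize_neighborhood(k, level_incremented, B, k_min=2):
    """Shrink the neighborhood after progress (a new level), grow it otherwise."""
    return max(k - 1, k_min) if level_incremented else min(k + 1, B)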

In Fig. 7, we depict a simulation-trace example, demonstrating how the levels and the neighborhood size adapt to the current value of the cost function.

Figure 7. Left: Blue bars are the values of the cost function at every time step. The red dashed line is the value of the Lyapunov function, serving as a threshold for the algorithm. The black solid line is the resizing of the neighborhood for the next step given the current cost. Right: Step-by-step evolution of the flock from an arbitrary initial configuration in the lower left corner towards a V-formation in the upper right corner of the plot.

4.5. Local Convergence

Lemma (Local convergence).

Given an MDP $\mathcal{M} = (S, A, T, J, I)$ with cost function $J$, and a nonempty set of target states $G \subseteq S$ with $G = \{s \mid J(s) \leq \varphi\}$, if the transition relation $T$ is controllable with actions in $A$ for every (local) subset of agents, then there exists a finite (maximum) horizon $h_{max}$ such that LocalAMPC is able to find the best actions that decrease the cost of a neighborhood of agents by at least a given $\Delta_i$.

Proof.

In the input to LocalAMPC, the accelerations of some birds in the subflock may be fixed (for some horizon). As a consequence, the MDP may not be fully controllable within this horizon. Beyond this horizon, however, PSO is allowed to freely choose the accelerations, that is, the MDP is fully controllable again. The result now follows from the convergence of AMPC (Theorem 1 of [TSE17]). ∎

4.6. Global Convergence and Stability

Global convergence is achieved by our algorithm, which overcomes local minima by gradually adapting the neighborhood size to proceed to the next level defined by the Lyapunov function. Since we are solving a nonlinear, nonconvex optimization problem, the cost itself may not decrease monotonically. However, the lookahead cost (the cost of some future reachable state) monotonically decreases. These costs are stored in the level variables $\ell(t)$ of Algorithm 4, and they define a Lyapunov function $L$:

$$L(t) = \ell(t), \tag{8}$$

where the levels decrease by at least a minimum dynamically defined threshold: $\ell(t+1) < \ell(t) - \Delta$.

Lemma.

$L$ defined by (8) is a valid Lyapunov function, i.e., it is positive-definite and monotonically decreases until the system reaches its goal state.

Proof.

Note that the cost function is positive by definition, and since $\ell(t)$ equals the cost of some reachable state, $L$ is nonnegative. Line 18 of Algorithm 4 guarantees that $L$ is monotonically decreasing by at least $\Delta$. ∎

Lemma (Global consensus).

Given Assumptions 1-6 in Section 4.1, all agents in the system will fix their actions in a finite number of consensus rounds.

Proof.

During the first consensus round, each agent in the system runs LocalAMPC for its own neighborhood of the current size $k$. By Lemma 4.5, there exists a horizon $h \leq h_{max}$ such that a solution, i.e., a set of action (acceleration) sequences of length $h$, will be found for all agents in the considered neighborhood $N_i$. Consequently, at the end of the round, the solutions for at least all the agents in $N_j$, where $j$ is the agent which proposed the globally best solution, will be fixed. During the next rounds, the procedure recurses. Hence, the set of all agents with nfy values is monotonically decreasing with every consensus round. ∎

Global consensus is thus reached by the system within a finite number of communication rounds. However, to achieve the global optimization goal, we must also prove that the consensus value converges to the desired property.

Definition 3.

Let $\{X_t\}_{t \geq 0}$ be a sequence of random vector-variables and let $X$ be a random or non-random vector. Then $X_t$ converges with probability one to $X$ if $$\Pr\left[\lim_{t \to \infty} X_t = X\right] = 1.$$

Lemma (Max-neighborhood convergence).

If DAMPC is run with constant neighborhood size $k = B$, then it behaves identically to centralized AMPC.

Proof.

If DAMPC uses the full flock as the neighborhood of every bird, then it behaves like centralized AMPC, because the accelerations of all birds are fixed in the first consensus round. ∎

Theorem (Global convergence). Let $\mathcal{M}$ be an MDP with a positive and continuous cost function $J$ and a nonempty set of target states $G$, with $G = \{s \mid J(s) \leq \varphi\}$. If there exists a finite horizon $h_{max}$ and a finite number of execution steps $T_{max}$, such that centralized AMPC is able to find a sequence of actions that brings $\mathcal{M}$ from a state in $I$ to a state in $G$, then DAMPC is also able to do so, with probability one.

Proof.

We illustrate the proof using our flocking example. Note that the theorem is valid in the general formulation above owing to the fact that, as the global Lyapunov function approaches zero, the local dynamical thresholds will not allow neighborhood solutions to diverge significantly from reaching the state obtained as a result of repeated consensus rounds. Owing to Lemma 4.5, after the first consensus round, Alg. 5 finds a sequence of best accelerations of length $h \leq h_{max}$ for the birds in subflock $N_j$, decreasing their cost by $\Delta_j$. In the next consensus round, the birds outside $N_j$ have to adjust the accelerations for their subflocks, while keeping the accelerations of the neighbors in $N_j$ fixed to the already-computed solutions. If a bird $i$ fails to decrease the cost of its subflock by at least $\Delta_i$ within the prediction horizon $h$, then it can explore a longer horizon up to $h_{max}$. This allows PSO to compute accelerations for the birds in $N_i$ in the horizon interval $[h, h_{max}]$, decreasing the cost of $N_i$ by $\Delta_i$. Hence, the entire flock decreases its cost by $\Delta$ (this defines the Lyapunov function $L$ in Eq. 8), ensuring convergence to a global optimum. If $h_{max}$ is reached before the cost of the flock has decreased by $\Delta$, the size of the neighborhood is increased by one, and eventually it reaches $B$. Consequently, using Theorem 1 of [TSE17], there exists a horizon that ensures global convergence. For this choice of horizon and for the maximum neighborhood size, the cost is guaranteed to decrease by $\Delta$, and we are bound to proceed to the next level in a finite number of steps. ∎