1. Introduction
Cyber-physical systems (CPSs) composed of multiple computing agents are often highly distributed and may exhibit emergent behavior. V-formation in a flock of birds is a quintessential example of emergent behavior in a (stochastic) multi-agent system. V-formation brings numerous benefits to the flock. It is primarily known for being energy-efficient due to the upwash benefit a bird in the flock enjoys from its frontal neighbor. It also offers a clear-view benefit, as no bird’s field of vision is obstructed by another bird in the formation. Moreover, its collective spatial flock mass can be intimidating to potential predators. It is therefore not surprising that interest in V-formation is on the rise [Con17, Blo]. Because of V-formation’s intrinsic appeal, it is important to (i) understand its control-theoretic foundations, (ii) devise efficient algorithms for the problem, and (iii) identify the vulnerabilities in these approaches to cyber-attacks.
This paper brings together our recent results on V-formation that show how the problem can be formulated in terms of Model Predictive Control (MPC), both centralized and distributed. It also shows how an MPC-based formulation of V-formation can be used as a comprehensive framework for investigating cyber-attacks on this formation.
We first consider Adaptive Receding-Horizon Synthesis of Optimal Plans (ARES) [LEH17], an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of an MDP to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization (PSO), with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, a sampling technique for rare events, the length of the horizon and the number of particles are chosen such that at least one particle reaches a next-level state, that is, a state where the cost decreases by a required delta from the previous-level state. The level relation on states and the plans constructed by ARES implicitly define a Lyapunov function and an optimal policy, respectively, both of which could be explicitly generated by applying ARES to all states of the MDP, up to some topological equivalence relation.
We assess the effectiveness of ARES by statistically evaluating its rate of success in generating optimal plans that bring a flock from an arbitrary initial state to a state exhibiting a single connected V-formation. For flocks with 7 birds, ARES is able to generate a plan that leads to a V-formation in 95% of the 8,000 random initial configurations within 63 seconds, on average. ARES can be viewed as a model-predictive controller (MPC) with an adaptive receding horizon, which we also call adaptive MPC (AMPC). We provide statistical guarantees of convergence. To the best of our knowledge, our adaptive-sizing approach is the first to provide convergence guarantees in receding-horizon techniques.
We next present DAMPC [LTSG19], a distributed, adaptive-horizon and adaptive-neighborhood algorithm for solving the stochastic reachability problem in multi-agent systems; specifically, the flocking problem modeled as an MDP. In DAMPC, at each time step, every agent first calls a centralized, adaptive-horizon model-predictive control (AMPC) algorithm to obtain an optimal solution for its local neighborhood. Second, the agents derive the flock-wide optimal solution through a sequence of consensus rounds. Third, the neighborhood is adaptively resized using a flock-wide cost-based Lyapunov function. In this way DAMPC improves efficiency without compromising convergence. The proof of statistical global convergence is non-trivial and involves showing that this Lyapunov function follows a monotonically decreasing trajectory despite potential fluctuations in cost and neighborhood size.
We evaluate DAMPC’s performance using statistical model checking, showing that DAMPC achieves considerable speed-up over AMPC (two-fold in some cases) with only a slightly lower convergence rate. Smaller average neighborhood size and look-ahead horizon demonstrate the benefits of the DAMPC approach for stochastic reachability problems involving any controllable multi-agent system that possesses a cost function.
Inspired by the emerging problem of CPS security, we lastly introduce the concept of controller-attacker games [TSE17]: a two-player stochastic game involving a controller and an attacker, which have antagonistic objectives. A controller-attacker game is formulated in terms of an MDP, with the controller and the attacker jointly determining the MDP’s transition probabilities. We also introduce V-formation games, a class of controller-attacker games where the goal of the controller is to maneuver the plant (a simple model of flocking dynamics) into a V-formation, and the goal of the attacker is to prevent the controller from doing so. Controllers in V-formation games utilize AMPC, giving them extraordinary power: we prove that under certain controllability conditions, an AMPC controller can attain V-formation with probability 1. We evaluate AMPC’s performance on V-formation games using statistical model checking. Our results show that (a) as we increase the power of the attacker, the AMPC controller adapts by suitably increasing its horizon, and thus demonstrates resiliency to a variety of attacks; and (b) an intelligent attacker can significantly outperform its naive counterpart.
The rest of the paper is organized as follows. Section 2 provides background content in the form of our dynamic model of V-formation, stochastic reachability, and PSO. Sections 3-5 present the ARES algorithm, the DAMPC algorithm, and controller-attacker games for V-formation, respectively. Section 7 offers our concluding remarks.
This paper was written on the occasion of Jos Baeten’s retirement as general director of CWI and professor of theory of computing of ILLC. Jos was a highly influential collaborator of the third author (Smolka), and remains a good friend and colleague. Jos’s feedback to Smolka on the invited talk he gave on Vformation at CONQUEST 2016 was an important impetus for moving the work forward.
2. Background
This section introduces the basic concepts and techniques needed to formulate and derive our results.
2.1. Dynamic Model for V-formation
In our flocking model, each bird in the flock is modeled using four variables: a 2-dimensional vector $x$ denoting the position of the bird in a 2D space, and a 2-dimensional vector $v$ denoting the velocity of the bird. We use $s = \{x_i, v_i\}_{i=1}^{B}$ to denote a state of a flock with $B$ birds. The control actions of each bird are 2-dimensional accelerations $a$ and 2-dimensional position displacements $d$ (see the discussion of $a$ and $d$ below). Both are random variables.
Let $x_i(t)$, $v_i(t)$, $a_i(t)$, and $d_i(t)$ respectively denote the position, velocity, acceleration, and displacement of the $i$-th bird at time $t$, $1 \le i \le B$. The behavior of bird $i$ in discrete time is modeled as follows:
$$x_i(t+1) = x_i(t) + v_i(t+1) + d_i(t), \qquad v_i(t+1) = v_i(t) + a_i(t) \tag{1}$$
The next state of the flock is jointly determined by the accelerations and the displacements based on the current state following Eq. 1.
Every bird in our model [GPR14] moves in 2-dimensional space performing acceleration actions determined by a global controller. When there is no external disturbance, the displacement term is zero and the equations are:
$$x_i(t+1) = x_i(t) + v_i(t+1), \qquad v_i(t+1) = v_i(t) + a_i(t) \tag{2}$$
The controller detects the positions and velocities of all birds through sensors, and uses this information to compute an optimal acceleration for the entire flock. A bird uses its own component of the solution to update its velocity and position.
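As a minimal sketch of this discrete-time update, the following Python function advances a flock by one step. It assumes a semi-implicit ordering in which each velocity is first updated by its acceleration and the position is then advanced with the new velocity plus an optional displacement. The names `Bird` and `step` are illustrative and not taken from the original implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec = Tuple[float, float]


@dataclass(frozen=True)
class Bird:
    x: Vec  # 2D position
    v: Vec  # 2D velocity


def step(flock: List[Bird], accels: List[Vec],
         displacements: Optional[List[Vec]] = None) -> List[Bird]:
    """Advance every bird by one time step (assumed order: velocity, then position)."""
    if displacements is None:
        # No external disturbance: the displacement term is zero.
        displacements = [(0.0, 0.0)] * len(flock)
    nxt = []
    for bird, a, d in zip(flock, accels, displacements):
        v_new = (bird.v[0] + a[0], bird.v[1] + a[1])
        x_new = (bird.x[0] + v_new[0] + d[0], bird.x[1] + v_new[1] + d[1])
        nxt.append(Bird(x=x_new, v=v_new))
    return nxt
```

Each bird only needs its own component of the flock-wide acceleration vector, which is why the update is written per bird.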
We extend this discrete-time dynamical model to a (deterministic) MDP by adding a cost (fitness) function based on the following metrics inspired by [YGST16] (a classic MDP [RN10] is obtained by adding sensor/actuator or wind-gust noise, which is the case we address in follow-up work):

Clear View ($CV$). A bird’s visual field is a cone with angle $\theta$ that can be blocked by the wings of other birds. We define the clear-view metric by accumulating the percentage of a bird’s visual field that is blocked by other birds. Fig. 1 (left) illustrates the calculation of the clear-view metric. Let $B_{ij}$ be the part of the angle subtended by the wing of Bird $j$ on the eye of Bird $i$ that intersects with Bird $i$’s visual cone with angle $\theta$. Then, the clear view for Bird $i$, $CV_i$, is defined as $|\bigcup_{j \neq i} B_{ij}| / \theta$, and the total clear view, $CV$, is defined as $\sum_i CV_i$. The optimal value in a V-formation is $CV^* = 0$, as all birds have a clear view. Note that the value $B_{ij}$ can be computed using Bird $i$’s velocity and position, and Bird $j$’s position using standard trigonometric functions.

Velocity Matching ($VM$). The accumulated differences between the velocity of each bird and all other birds, summed up over all birds in the flock, defines $VM$. Fig. 1 (middle) depicts the values of $VM$ in a velocity-unmatched flock. Formally, $VM = \sum_{i > j} \left( \frac{\|v_i - v_j\|}{\|v_i\| + \|v_j\|} \right)^2$. The optimal value in a V-formation is $VM^* = 0$, as all birds will have the same velocity (thus maintaining the V-formation).

Upwash Benefit ($UB$). The trailing upwash is generated near the wingtips of a bird, while downwash is generated near the center of a bird. We accumulate all birds’ upwash benefits using a Gaussian-like model of the upwash and downwash region, as shown in Fig. 1 (right) for the right wing. Let $x_{ij}$ be the projection of the vector $x_j - x_i$ along the wingspan of Bird $j$. Similarly, let $y_{ij}$ be the projection of $x_j - x_i$ along the direction of $v_j$. The upwash benefit $UB_{ij}$ for Bird $i$ coming from Bird $j$ is built from three ingredients: the error function $\mathrm{erf}$, which is a smooth approximation of the sign function; a 2D Gaussian with mean at the origin; and a 2D Gaussian with a shifted mean. Its parameters are the wing span and the relative position where the upwash benefit is maximized. The total upwash benefit, $UB_i$, for Bird $i$ accumulates $UB_{ij}$ over all other birds $j$. The maximum upwash a bird can obtain is upper-bounded by 1. Since we are working with cost (that we want to minimize), we define $UB = \sum_i (1 - \min(UB_i, 1))$. The optimal value for $UB$ in a V-formation is $UB^* = 1$, as the leader does not receive any upwash.
Finding smooth and continuous formulations of the fitness metrics is a key element of solving optimization problems. The PSO algorithm has a very low probability of finding an optimal solution if the fitness metric is not well-designed.
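To make the shape of such a metric concrete, here is a small sketch of a velocity-matching cost. It assumes one plausible normalization, the squared ratio of the pairwise velocity difference to the sum of the two speeds, summed over all unordered pairs; the function name and this exact normalization are assumptions for illustration, not a confirmed reproduction of the authors' code.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def velocity_matching(velocities: List[Vec]) -> float:
    """Pairwise velocity mismatch; 0.0 when all birds share the same velocity."""
    total = 0.0
    for i in range(len(velocities)):
        for j in range(i):  # each unordered pair once
            vi, vj = velocities[i], velocities[j]
            diff = math.hypot(vi[0] - vj[0], vi[1] - vj[1])
            denom = math.hypot(vi[0], vi[1]) + math.hypot(vj[0], vj[1])
            if denom > 0.0:  # two stationary birds contribute nothing
                total += (diff / denom) ** 2
    return total
```

The metric is smooth away from zero speeds and bounded per pair, which is the kind of well-behaved landscape a swarm optimizer benefits from.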
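Let $c(t) = \{x_i(t), v_i(t)\}_{i=1}^{B}$ be a flock configuration at time-step $t$. Given the above metrics, the overall fitness (cost) metric $J$ is a sum-of-squares combination of $VM$, $CV$, and $UB$, with optimal values $VM^*$, $CV^*$, and $UB^*$, defined as follows:
$$J(c(t), a^h(t), h) = \left(CV(c^h_a(t)) - CV^*\right)^2 + \left(VM(c^h_a(t)) - VM^*\right)^2 + \left(UB(c^h_a(t)) - UB^*\right)^2 \tag{3}$$
where $h$ is the receding prediction horizon (RPH), $a^h(t)$ is a sequence of accelerations of length $h$, and $c^h_a(t)$ is the configuration reached after applying $a^h(t)$ to $c(t)$. Formally, we have
$$c^h_a(t) = \{x^h_a(t), v^h_a(t)\} = \Big\{x(t) + \sum_{\tau=1}^{h} v(t+\tau),\; v(t) + \sum_{\tau=1}^{h} a^{\tau}(t)\Big\} \tag{4}$$
where $a^{\tau}(t)$ is the $\tau$-th acceleration of $a^h(t)$. As discussed further in Section 3, we allow RPH to be adaptive in nature.
The fitness function $J$ has an optimal value of 0 in a perfect V-formation. Thus, there is a need to perform flock-wide minimization of $J$ at each time-step $t$ to obtain an optimal plan of length $h$ of acceleration actions:
$$\mathit{opt}\text{-}a^h(t) = \arg\min_{a^h(t)} J(c(t), a^h(t), h) \tag{5}$$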
The optimization is subject to the following constraints on the maximum velocities and accelerations: $\|v_i(t)\| \le v_{max}$ and $\|a_i^h(t)\| \le \rho \|v_i(t)\|$ for all $i$, where $v_{max}$ is a constant and $\rho \in (0,1)$. The above constraints prevent us from using mixed-integer programming; we might, however, compare our solution to other continuous optimization techniques in the future. The initial positions and velocities of each bird are selected at random within certain ranges, and limited such that the distance between any two birds is greater than a (collision) constant $d_{min}$, and small enough for all birds, except for at most one, to feel the upwash benefit.
2.2. V-Formation MDP
This section defines Markov Decision Processes (MDPs) and the corresponding MDP formulated by Lukina et al. [LEH17] for the V-formation problem.
Definition 1.
A Markov decision process (MDP) is a 5-tuple $\mathcal{M} = (S, A, T, J, I)$ consisting of a set of states $S$, a set of actions $A$, a transition function $T: S \times A \times S \to [0, 1]$, where $T(s, a, s')$ is the probability of transitioning from state $s$ to state $s'$ under action $a$, a cost function $J: S \to \mathbb{R}$, where $J(s)$ is the cost associated with state $s$, and an initial state distribution $I$.
The MDP modeling a flock of $B$ birds is defined as follows. The set of states $S$ is $\mathbb{R}^{4B}$, as each bird has a 2D position and a 2D velocity vector, and the flock contains $B$ birds. The set of actions $A$ is $\mathbb{R}^{2B}$, as each bird takes a 2D acceleration action and there are $B$ birds. The cost function $J$ is defined by Eq. 3. The transition function $T$ is defined by Eq. 1. As the acceleration vector $a_i(t)$ for bird $i$ at time $t$ is a random variable, the state vector $(x_i(t+1), v_i(t+1))$ is also a random variable. The initial state distribution $I$ is a uniform distribution from a region of state space where all birds have positions and velocities in a range defined by fixed lower and upper bounds.
2.3. Stochastic Reachability Problem
Given the stochasticity introduced by PSO, the V-formation problem can be formulated in terms of a reachability problem for the Markov chain induced by the composition of a Markov decision process (MDP) and a controller.
Before we can define traces, or executions, of an MDP $\mathcal{M}$ with states $S$ and actions $A$, we need to fix a controller, or strategy, that determines which action from $A$ to use at any given state of the system. We focus on randomized strategies. A randomized strategy (controller) $\sigma$ over $\mathcal{M}$ is a function of the form $\sigma: S \to PD(A)$, where $PD(A)$ is the set of probability distributions over $A$. That is, $\sigma$ takes a state $s$ and returns an action consistent with the probability distribution $\sigma(s)$. Applying a policy $\sigma$ to the MDP $\mathcal{M}$ defines the Markov chain $\mathcal{M}_\sigma$. We use the terms strategy and controller interchangeably.
In the bird-flocking problem, a controller would be a function that determines the accelerations for all the birds given their current positions and velocities. Once we fix a controller, we can iteratively use it to (probabilistically) select a sequence of flock accelerations. The goal is to generate a sequence of actions that takes an MDP from an initial state $s$ to a state $s^*$ with $J(s^*) \le \varphi$, where $J$ is the cost function and $\varphi$ is the convergence threshold.
Definition 2.
Let $\mathcal{M} = (S, A, T, J, I)$ be an MDP, and let $G \subseteq S$ be the set of goal states of $\mathcal{M}$. The stochastic reachability problem is to design a controller $\sigma$ for $\mathcal{M}$ such that for a given $\delta$, the probability of the underlying Markov chain $\mathcal{M}_\sigma$ to reach a state in $G$ in $m$ steps (for a given $m$) starting from an initial state, is at least $1 - \delta$.
We approach the stochastic reachability problem by designing a controller and quantifying its probability of success in reaching the goal states.
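Quantifying the probability of success can be sketched as a plain Monte Carlo estimate: run the controlled chain many times and count how often a goal state is reached within the step budget. The function below is a hypothetical illustration; its name, the callable parameters, and the goal test "cost at most a threshold" are assumptions chosen to mirror the definitions in this section.

```python
import random
from typing import Callable, TypeVar

S = TypeVar("S")
A = TypeVar("A")


def estimate_reach_probability(
    sample_initial: Callable[[random.Random], S],
    controller: Callable[[S, random.Random], A],
    transition: Callable[[S, A, random.Random], S],
    cost: Callable[[S], float],
    phi: float,
    m: int,
    runs: int,
    seed: int = 0,
) -> float:
    """Fraction of runs that reach a state with cost <= phi within m steps."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(runs):
        s = sample_initial(rng)
        reached = cost(s) <= phi
        steps = 0
        while not reached and steps < m:
            a = controller(s, rng)       # randomized strategy picks an action
            s = transition(s, a, rng)    # MDP moves to the next state
            steps += 1
            reached = cost(s) <= phi
        successes += int(reached)
    return successes / runs
```

The estimate is then compared against the required probability bound of the reachability problem.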
2.4. Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a randomized approximation algorithm for computing the value of a parameter minimizing a possibly nonlinear cost (fitness) function. Interestingly, PSO itself is inspired by bird flocking [KE95]. Hence, PSO assumes that it works with a flock of birds.
Note, however, that in our running example, these birds are “acceleration birds” (or particles), and not the actual birds in the flock. Each bird has the same goal, finding food (reward), but none of them knows the location of the food. However, every bird knows the distance (horizon) to the food location. PSO works by moving each bird preferentially toward the bird closest to food.
The work described in this paper uses the Matlab toolbox function particleswarm, which performs the classical version of PSO. This PSO creates a swarm of particles, of size say $p$, uniformly at random within a given bound on their positions and velocities. Note that in our example, each particle itself represents a flock of bird-acceleration sequences $\{a_i^h\}_{i=1}^{B}$, where $h$ is the current length of the receding horizon. PSO further chooses a neighborhood of a random size for each particle $j$, $j \in \{1, \ldots, p\}$, and computes the fitness of each particle. Based on the fitness values, PSO stores two vectors for $j$: its so-far personal-best position $x_P^j(t)$, and its fittest neighbor’s position $x_G^j(t)$. The positions and velocities of each particle $j$ in the particle swarm are updated according to the following rule:
$$v^j(t+1) = \omega \cdot v^j(t) + y_1 \cdot u_1(t+1) \otimes (x_P^j(t) - x^j(t)) + y_2 \cdot u_2(t+1) \otimes (x_G^j(t) - x^j(t)), \qquad x^j(t+1) = x^j(t) + v^j(t+1) \tag{6}$$
where $\omega$ is the inertia weight, which determines the trade-off between global and local exploration of the swarm (the value of $\omega$ is proportional to the exploration range); $y_1$ and $y_2$ are self adjustment and social adjustment, respectively; $u_1, u_2 \in \mathrm{Uniform}(0,1)$ are randomization factors; and $\otimes$ is the element-wise vector product, that is, for a random vector $z$: $(z_1, \ldots, z_B) \otimes (x_1^j, \ldots, x_B^j) = (z_1 x_1^j, \ldots, z_B x_B^j)$.
If the fitness value for $x^j(t+1)$ is lower than the one for $x_P^j(t)$, then $x^j(t+1)$ is assigned to $x_P^j(t+1)$. The particle with the best fitness over the whole swarm becomes a global best for the next iteration. The procedure is repeated until the number of iterations reaches its maximum, the time limit elapses, or the minimum criterion is satisfied. For our bird-flock example, we obtain in this way the best acceleration.
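The update rule can be illustrated with a compact, self-contained PSO. This is a simplified sketch and not the Matlab particleswarm implementation: it uses a single global best instead of random neighborhoods, fixed coefficients, and no velocity clamping; the function name and default parameter values are assumptions.

```python
import random
from typing import Callable, List, Tuple


def pso_minimize(f: Callable[[List[float]], float], dim: int,
                 bounds: Tuple[float, float], n_particles: int = 20,
                 iters: int = 100, w: float = 0.7, y1: float = 1.5,
                 y2: float = 1.5, seed: int = 0) -> Tuple[List[float], float]:
    """Global-best PSO: returns the best position found and its cost."""
    rng = random.Random(seed)
    lo, hi = bounds
    xs = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]              # personal-best positions
    pbest_cost = [f(x) for x in xs]
    g = min(range(n_particles), key=lambda j: pbest_cost[j])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(iters):
        for j in range(n_particles):
            for d in range(dim):
                u1, u2 = rng.random(), rng.random()  # randomization factors
                vs[j][d] = (w * vs[j][d]
                            + y1 * u1 * (pbest[j][d] - xs[j][d])
                            + y2 * u2 * (gbest[d] - xs[j][d]))
                xs[j][d] += vs[j][d]
            c = f(xs[j])
            if c < pbest_cost[j]:            # improve the personal best
                pbest[j], pbest_cost[j] = xs[j][:], c
                if c < gbest_cost:           # and possibly the global best
                    gbest, gbest_cost = xs[j][:], c
    return gbest, gbest_cost
```

In the flocking setting, a particle position would encode the acceleration sequences of all birds over the current horizon, and `f` would be the flock cost after applying them.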
3. Adaptive Receding-Horizon Synthesis of Optimal Plans (ARES)
ARES [LEH17] is a general adaptive, receding-horizon synthesis algorithm that, given an MDP and one of its initial states, generates an optimal plan (action sequence) taking that state to a state whose cost is below a desired threshold. ARES implicitly defines an optimal, online policy-synthesis algorithm, assuming plan generation can be performed in real-time. ARES can alternatively be viewed as a model-predictive control (MPC) algorithm that utilizes an adaptive receding horizon, a technique we refer to as Adaptive MPC (AMPC).
ARES makes repeated use of PSO [KE95] to effectively generate a plan. This is in principle unnecessary, as one could generate an optimal plan by calling PSO only once, with a maximum plan-length horizon. Such an approach, however, is in most cases impractical, as every unfolding of the MDP adds a number of new dimensions to the search space. Consequently, to obtain adequate coverage of this space, one needs a very large number of particles, a number that is either going to exhaust available memory or require a prohibitive amount of time to find an optimal plan.
3.1. The ARES Algorithm
One could in principle solve the optimization problem defined in Sections 2.1 and 2.2 by calling PSO only once, with a horizon equal to the maximum length allowed for a plan. This approach, however, tends to lead to very large search spaces, and is in most cases intractable. Indeed, preliminary experiments with this technique applied to our running example could not generate any convergent plan.
A more tractable approach is to make repeated calls to PSO with a small horizon length $h$. The question is how small $h$ can be. The current practice in model-predictive control (MPC) is to use a fixed, small $h$ (see the outer loop of Fig. 3, where resampling and conditional branches are disregarded). Unfortunately, this forces the selection of locally-optimal plans (of size less than three) in each call, and there is no guarantee of convergence when joining them together. In fact, in our running example, we were able to find plans leading to a V-formation for only a fraction of the random initial flocks.
Inspired by Importance Splitting (see Fig. 4 (right) and Fig. 3), we introduce the notion of a level-based horizon, where level $\ell_0$ equals the cost of the initial state, and level $\ell_m$ equals the threshold $\varphi$. Intuitively, by using an asymptotic cost-convergence function ranging from $\ell_0$ to $\ell_m$, and dividing its graph in $m$ equal segments, we can determine on the vertical axis a sequence of levels ensuring convergence.
The asymptotic function ARES implements is specifically tuned for each particle. If a particle has previously reached a level equaling some cost, then its next target level lies within a particle-specific distance (its threshold) of that cost. In Fig. 3, after the particles pass the thresholds assigned to them, the values of the cost function in the current state are sorted in ascending order. The lowest cost must lie below the previous level by at least its own threshold for the algorithm to proceed to the next level.
The levels serve two purposes. First, they implicitly define a Lyapunov function, which guarantees convergence. If desired, this function can be explicitly generated for all states, up to some topological equivalence. Second, the levels help PSO overcome local minima (see Fig. 4 (left)). If reaching a next level requires PSO to temporarily pass over a state-cost ridge, then ARES incrementally increases the size of the horizon $h$, up to a maximum size. For a given particle, passing its threshold means that it reaches a new level, and the definition of the threshold ensures that it degrades smoothly.
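The level mechanism can be sketched numerically. The snippet below assumes, purely for illustration, that the per-level threshold is the remaining cost divided by the number of remaining levels, and that the goal threshold is zero; both the rule and the function names are assumptions rather than the authors' exact definition. Under this rule, a run that always meets its threshold exactly descends in equal segments.

```python
from typing import List


def threshold(prev_cost: float, level: int, n_levels: int) -> float:
    """Cost decrement required to move from level-1 to `level` (assumed rule)."""
    return prev_cost / (n_levels - level + 1)


def ideal_levels(initial_cost: float, n_levels: int) -> List[float]:
    """Levels reached when every step decreases the cost by exactly its threshold."""
    levels = [initial_cost]
    for i in range(1, n_levels + 1):
        levels.append(levels[-1] - threshold(levels[-1], i, n_levels))
    return levels
```

Because the threshold is recomputed from the cost actually reached, a particle that overshoots one level faces a smaller decrement at the next, which is one way to read the smooth degradation of the threshold.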
Another idea imported from Importance Splitting and shown in Fig. 3 is to maintain $n$ clones of the MDP (and its initial state) at any time $t$, and run PSO, for a horizon $h$, on each $h$-unfolding of them. This results in an action sequence of length $h$ (see Algo. 1). This approach allows us to call PSO for each clone and desired horizon, with a very small number of particles per clone.
To check which particles have overcome their associated thresholds, we sort the particles according to their current cost, and split them in two sets: the successful set, whose costs are lower than the median among all clones; and the unsuccessful set, containing the remaining indexes, which are discarded. The discarded ones are then replenished by sampling uniformly at random from the successful set (see Algo. 2).
The number of particles is increased if no clone reaches a next level, for all horizons chosen. Once this happens, we reset the horizon to one, and repeat the process. In this way, we adaptively focus our resources on escaping from local minima. From the last level, we choose the state with the minimal cost, and traverse all of its predecessor states to find an optimal plan composed of the actions that led the MDP to the optimal state. In our running example, we select a flock in V-formation, and traverse all its predecessor flocks. The overall procedure of ARES is shown in Algo. 3.
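The split-and-replenish step can be sketched as follows. This is a hypothetical helper, not the authors' Algo. 2: clones with cost below the median are kept, the rest are replaced by uniform samples from the kept set, and the fallback for the degenerate case where no clone is strictly below the median is an assumption.

```python
import random
import statistics
from typing import List, Sequence, TypeVar

C = TypeVar("C")


def resample(clones: Sequence[C], costs: Sequence[float],
             rng: random.Random) -> List[C]:
    """Keep clones with below-median cost and replenish the rest from them."""
    median = statistics.median(costs)
    successful = [c for c, j in zip(clones, costs) if j < median]
    if not successful:
        # Assumed fallback: all costs are equal, so nothing is discarded.
        return list(clones)
    n_missing = len(clones) - len(successful)
    # Discarded clones are replaced by uniform samples of successful ones.
    return successful + [rng.choice(successful) for _ in range(n_missing)]
```

The population size stays constant, so computation keeps concentrating on the clones that are closest to the next level.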
Proposition 1 (Optimality and Minimality).
(1) Let $\mathcal{M}$ be an MDP. For any initial state $s_0$ of $\mathcal{M}$, ARES is able to solve the optimal-plan synthesis problem for $\mathcal{M}$ and $s_0$. (2) An optimal choice of the parameter of the dynamic-threshold function, for some particle, ensures that ARES also generates the shortest optimal plan.
Sketch.
(1) The dynamic-threshold function ensures that the initial cost in $s_0$ is continuously decreased until it falls below the threshold $\varphi$. Moreover, for an appropriate number of clones, by adaptively determining the horizon and the number of particles needed to overcome the dynamic threshold, ARES always converges, with probability 1, to an optimal state, given enough time and memory. (2) This follows from convergence property (1), and from the fact that ARES always gives preference to the shortest horizon while trying to overcome the dynamic threshold. ∎
The optimality referred to in the title of the paper is in the sense of (1). One, however, can do even better than (1), in the sense of (2), by empirically determining the parameter of the dynamic-threshold function. Also note that ARES is an approximation algorithm, and may therefore return non-minimal plans. Even in these circumstances, however, the plans will still lead to an optimal state. This is a V-formation in our flocking example.
3.2. Evaluation of ARES
To assess the performance of our approach, we developed a simple simulation environment in Matlab. All experiments were run on an Intel Core i7-5820K CPU at 3.30 GHz with 32 GB of RAM available.
We performed numerous experiments with a varying number of birds. Unless stated otherwise, results refer to 8,000 experiments with 7 birds under a fixed parameter setting. The initial configurations were generated independently and uniformly at random, subject to constraints on the birds’ positions and velocities.
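Table 1. Overview of the results for 8,000 experiments with 7 birds (successful experiments: 7573; total experiments: 8000).

| | Successful: Min | Max | Avg | Std | Total: Min | Max | Avg | Std |
|---|---|---|---|---|---|---|---|---|
| Cost | 2.88 | 9 | 4 | 3 | 2.88 | 1.4840 | 0.0282 | 0.1607 |
| Time | 23.14s | 310.83s | 63.55s | 22.81s | 23.14s | 661.46s | 64.85s | 28.05s |
| Plan Length | 7 | 20 | 12.80 | 2.39 | 7 | 20 | 13.13 | 2.71 |
| RPH | 1 | 5 | 1.40 | 0.15 | 1 | 5 | 1.27 | 0.17 |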
Table 1 gives an overview of the results with respect to the 8,000 experiments we performed with 7 birds for a maximum of 20 levels. The average fitness across all experiments is 0.0282, with a standard deviation of 0.1607. We achieved a success rate of 94.66% (7,573 of 8,000 experiments) with respect to the fitness threshold. The average fitness is higher than the threshold due to the comparably high fitness of unsuccessful experiments. When increasing the bound for the maximal plan length to 30, we achieved a higher success rate in 1,000 experiments at the expense of a slightly longer average execution time.

Table 2. Average execution time by number of birds.

| No. of birds | 3 | 5 | 7 | 9 |
|---|---|---|---|---|
| Avg. duration | 4.58s | 18.92s | 64.85s | 269.33s |
The left plot in Fig. 6 depicts the resulting distribution of execution times for 8,000 runs of our algorithm, where it is clear that, excluding only a few outliers from the histogram, an arbitrary configuration of birds (Fig. 5 (left)) reaches V-formation (Fig. 5 (right)) in around 1 minute. The execution time rises with the number of birds as shown in Table 2.
In Fig. 6, we illustrate for how many experiments the algorithm had to increase RPH (Fig. 6 (middle)) and the number of particles used by PSO (Fig. 6 (right)) to improve time and space exploration, respectively.
After achieving such a high success rate of ARES for an arbitrary initial configuration, we would like to demonstrate that the number of experiments performed is sufficient for high confidence in our results. This requires us to determine the appropriate number $N$ of random variables $Z_1, \ldots, Z_N$ necessary for the Monte-Carlo approximation scheme we apply to assess the efficiency of our approach. For this purpose, we use the additive approximation algorithm as discussed in [GPR14]. If the sample mean $\tilde{\mu}_Z = (Z_1 + \ldots + Z_N)/N$ is expected to be large, then one can exploit Bernstein’s inequality and fix $N$ to $\Upsilon \propto \ln(1/\delta)/\varepsilon^2$. This results in an additive or absolute-error $(\varepsilon, \delta)$-approximation scheme:
$$\Pr[\mu_Z - \varepsilon \le \tilde{\mu}_Z \le \mu_Z + \varepsilon] \ge 1 - \delta$$
where $\tilde{\mu}_Z$ approximates $\mu_Z$ with absolute error $\varepsilon$ and probability $1 - \delta$.
In particular, we are interested in $Z$ being a Bernoulli random variable: $Z = 1$ if the run reaches a state whose cost is at most the threshold $\varphi$, and $Z = 0$ otherwise.
Therefore, we can use the Chernoff-Hoeffding instantiation of Bernstein’s inequality, and further fix the proportionality constant to $\Upsilon = 4\ln(2/\delta)/\varepsilon^2$, as in [HLMP04a].
Hence, for our performed 8,000 experiments, we achieve a success rate of 95% with absolute error of $\varepsilon \approx 0.05$ and confidence ratio 0.99.
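The relation between the number of experiments, the absolute error, and the confidence can be checked with a few lines of Python. The sketch assumes the Chernoff-Hoeffding sample bound in the form N = 4 ln(2/δ)/ε², where 1 − δ is the confidence ratio; the function names are illustrative.

```python
import math


def absolute_error(n_samples: int, delta: float) -> float:
    """Absolute error guaranteed by n_samples runs at confidence 1 - delta."""
    return math.sqrt(4.0 * math.log(2.0 / delta) / n_samples)


def required_samples(eps: float, delta: float) -> int:
    """Smallest number of runs giving absolute error eps at confidence 1 - delta."""
    return math.ceil(4.0 * math.log(2.0 / delta) / eps ** 2)
```

Under this assumed bound, `absolute_error(8000, 0.01)` evaluates to roughly 0.05, in line with the figures quoted for 8,000 experiments at confidence 0.99.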
Moreover, considering that the average length of a plan is 13, and that each state in a plan is independent from all other plans, we can roughly consider that our above estimation generated 80,000 independent states. For the same confidence ratio of 0.99 we then obtain an approximation error $\varepsilon \approx 0.016$, and for a confidence ratio of 0.999, we obtain an approximation error $\varepsilon \approx 0.019$.
4. Adaptive-Neighborhood Distributed Control
In Section 3, we introduced the concept of Adaptive-Horizon MPC (AMPC). AMPC gives controllers extraordinary power: we proved that under certain controllability conditions, an AMPC controller can attain V-formation with probability 1. We now present DAMPC [LTSG19], a distributed version of AMPC that extends AMPC along several dimensions. First, at every time step, DAMPC runs a distributed consensus algorithm to determine the optimal action (acceleration) for every agent in the flock. In particular, each agent starts by computing the optimal actions for its local subflock. The subflocks then communicate in a sequence of consensus rounds to determine the optimal actions for the entire flock.
Secondly, DAMPC features adaptive neighborhood resizing in an effort to further improve the algorithm’s efficiency. As with the adaptive prediction horizon in AMPC, neighborhood resizing utilizes the implicit Lyapunov function to guarantee eventual convergence to a minimum neighborhood size. DAMPC thus treats the neighborhood size as another controllable variable that can be dynamically adjusted for efficiency purposes. This leads to reduced communication and computation compared to the centralized solution, without sacrificing statistical guarantees of convergence such as those offered by its centralized counterpart AMPC. Statistical global convergence can be proven.
4.1. DAMPC System Model
We consider a distributed setting with the following assumptions about the system model.

Birds can communicate with each other without delays. As explained below, each bird adaptively changes its communication radius. The measure of the radius is the number of birds covered, and we refer to it as bird $i$’s local neighborhood $N_i$, including bird $i$ itself.

All birds use the same algorithm to satisfy their local reachability goals, i.e., to bring the local cost $J(N_i)$ of each bird $i$’s neighborhood below the given threshold $\varphi$.

Birds move in continuous space and change accelerations synchronously at discrete time points.

After executing its local algorithms, each bird broadcasts the obtained solution to its neighbors. In this manner, every bird receives solution proposals, which differ due to the fact that each bird has its own local neighborhood. To achieve consensus, each bird takes as its best action the one with the minimal cost among the received proposals. The solutions for the birds in the considered neighborhood are then fixed. The consensus rounds repeat until all birds have fixed solutions.

At every time step, the value of the global cost function is received by all birds in the flock and checked for improvement. The neighborhood for each bird is then resized based on this global check.

The upwash benefit $UB_i$ for bird $i$ defined in Section 2.1 maintains connectivity of the flock throughout the computations, while our algorithm manages collision avoidance.
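The consensus step in these assumptions can be sketched as a simple loop. The code below is a hypothetical simplification: every bird holds one proposal (a cost and an action for each member of its neighborhood, itself included), the cheapest proposal among birds that are still open is accepted, and the actions it covers are fixed. In the full algorithm, the remaining birds would recompute their proposals between rounds, which this sketch omits.

```python
from typing import Dict, Hashable, Tuple

Bird = Hashable
Action = object
Proposal = Tuple[float, Dict[Bird, Action]]  # (cost, action per neighborhood member)


def consensus(proposals: Dict[Bird, Proposal]) -> Tuple[Dict[Bird, Action], int]:
    """Fix actions neighborhood by neighborhood, cheapest proposal first."""
    fixed: Dict[Bird, Action] = {}
    rounds = 0
    while len(fixed) < len(proposals):
        # Only birds whose own action is still open may win a round.
        open_birds = [b for b in proposals if b not in fixed]
        best = min(open_birds, key=lambda b: proposals[b][0])
        for member, action in proposals[best][1].items():
            fixed.setdefault(member, action)  # never overwrite a fixed action
        rounds += 1
    return fixed, rounds
```

Termination relies on each neighborhood containing its own bird, so every round fixes at least one new action.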
4.2. The Distributed AMPC Algorithm
In this section, we solve a stochastic reachability problem in the context of V-formation control, and demonstrate that the DAMPC algorithm can be used as an alternative hill-climbing, cost-based optimization technique that avoids local minima.
Notation used in the description of the algorithm:
- maximum and current local horizon lengths
- neighborhood of bird i
- number of birds in the neighborhood
- number of time-steps allowed by the property
- sequence of synthesized accelerations for all birds for each time-step
- acceleration that has not yet been fixed
- superscripts for the first and last elements, respectively, in the horizon sequence
- sequences of accelerations and corresponding states, of the horizon length, reached at a time-step by a bird locally in its neighborhood
- dynamical threshold defined based on the last achieved local cost in the neighborhood
- accelerations and corresponding states for all birds achieved globally as unions of the last elements in the best horizon sequences reached locally in each neighborhood
- accelerations and states for all birds achieved globally as unions of the first elements in the best horizon sequences reached locally in each neighborhood
- level achieved globally at a time-step after applying the selected accelerations to the current state
- dynamical threshold defined based on the last achieved global level
(see Alg. 4) takes as input an MDP , a threshold defining the goal states , the maximum horizon length , the maximum number of time steps , the number of birds , and a scaling factor . It outputs a state in and a sequence of actions taking from to a state in .
The initialization step (Line 1) chooses an initial state from , fixes an initial level as the cost of , sets the initial time and number of birds to process . The outer whileloop (Lines 222) is active as long as has not reached and time has not expired. In each time step, first sets the sequences of accelerations for all to (not yet fixed), and then iterates lines 415 until all birds fix their accelerations through global consensus (Line 10). This happens as follows. First, all birds determine their neighborhood (subflock) and the cost decrement that will bring them to the next level (Lines 67). Second, they call LocalAMPC (see Section 4.3), which takes sequences of states and actions fixed so far and extends them such that (line 8) the returned sequence of actions and corresponding sequence of states decrease the cost of the subflock by . Here notation means the whole sequence including the last element (some number, the farthest point in the future where the state of the subflock is fixed), which can differ from one neighborhood to another depending on the length of used horizon. Note that an action sequence passed to LocalAMPC as input contains and the goal is to fill in the gaps in solution sequence by means of this iterative process. In Line 10, we use the value of the cost function in the last resulting state as a criterion for choosing the best action sequence proposed among neighbors . Then the acceleration sequences of all birds in this subflock are fixed (Lines 1214).
After all accelerations sequences are fixed, that is, all are eliminated, the first accelerations in this sequence are selected for the output (Line 17). The next state is set to the union of for all neighbors , the state of the flock after executing is set to the union of . If we found a path that eventually decreases the cost by , we reached the next level, and advance time (Lines 1820). In that case, we optionally decrease the neighborhood, and increase it otherwise (Line 21).
The algorithm is distributed and operates over a dynamically changing topology. Lines 4, 10, and 18 require synchronization, which can be achieved by broadcasting the corresponding information to a central hub of the network. This hub can be a different bird or a different base station at each time step.
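The consensus rounds described above can be sketched as follows. This is a minimal illustration and not the authors' Alg. 4: the names `consensus_rounds`, `ring_neighborhood`, `toy_propose`, and `NFY`, the ring-shaped neighborhoods, and the rule that only birds with unfixed plans make proposals are all assumptions introduced here. In each round, every unfixed bird proposes a plan for its neighborhood, the proposal with the lowest cost wins, and the winning plan is fixed for all birds in that neighborhood.

```python
from typing import Callable, Dict, List, Optional, Tuple

NFY = None  # hypothetical marker for "not yet fixed"
Plan = Optional[List[float]]
Proposal = Tuple[float, Dict[int, List[float]]]


def ring_neighborhood(i: int, k: int, n_birds: int) -> List[int]:
    """Assumed topology: bird i plus the next k-1 birds on a ring (k <= n_birds)."""
    return [(i + d) % n_birds for d in range(k)]


def toy_propose(hood: List[int], fixed: Dict[int, List[float]]) -> Proposal:
    """Stand-in for LocalAMPC: keeps fixed plans, invents the rest, returns a toy cost."""
    proposal = {j: fixed.get(j, [0.1 * j]) for j in hood}
    return float(sum(hood)), proposal


def consensus_rounds(
    n_birds: int,
    k: int,
    propose: Callable[[List[int], Dict[int, List[float]]], Proposal],
) -> Tuple[List[Plan], int]:
    """Repeat consensus rounds until no bird has an NFY plan left."""
    plans: List[Plan] = [NFY] * n_birds
    rounds = 0
    while any(p is NFY for p in plans):
        best = None
        for i in range(n_birds):
            if plans[i] is not NFY:
                continue  # only birds without a fixed plan propose
            hood = ring_neighborhood(i, k, n_birds)
            fixed = {j: plans[j] for j in hood if plans[j] is not NFY}
            cost, proposal = propose(hood, fixed)
            if best is None or cost < best[0]:
                best = (cost, hood, proposal)  # the step that needs synchronization
        _, hood, proposal = best
        for j in hood:
            if plans[j] is NFY:
                plans[j] = proposal[j]  # fix the winning neighborhood
        rounds += 1
    return plans, rounds
```

Because the proposing bird always belongs to its own neighborhood, each round fixes at least one plan, so the loop ends after at most `n_birds` rounds. Choosing `k` equal to `n_birds` fixes every plan in a single round.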
4.3. The Local AMPC Algorithm
LocalAMPC is a modified version of the AMPC algorithm [TSE17], as shown in Alg. 5. Its input is an MDP, the current state of a subflock, a vector of acceleration sequences (one sequence for each bird in the subflock), a cost decrement to be achieved, a maximum horizon, and a scaling factor. In these sequences, some accelerations may not be fixed yet, that is, they have value nfy.
Its output is a vector of acceleration sequences, one for each bird, that decreased the cost of the flock the most, together with the state of the subflock after executing all actions.
LocalAMPC first initializes (Line 1) the number of particles to be used by PSO, proportionally to the input horizon, the number of birds, and the scaling factor. It then tries to decrease the cost of the subflock by at least the required decrement, as long as the maximum horizon is not reached (Lines 3-7).
For this purpose it calls PSO (Line 5) with an increasingly longer horizon and an increasingly larger number of particles. The idea is that the flock might first have to overcome a cost bump before it reaches a state where the cost has decreased by at least the required decrement. PSO extends the input sequences of fixed actions to the desired horizon with the new actions that are most successful in decreasing the cost of the flock, and it computes from scratch the actions for the nfy entries. The result is returned as the output vector of acceleration sequences. PSO also returns the states of the flock after applying the whole sequence of actions. Using this information, LocalAMPC computes the actual cost achieved.
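The adaptive loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not Alg. 5 itself: a plain random search stands in for PSO, the dynamics are a hypothetical one-dimensional double integrator, and the names `local_ampc`, `rollout`, `cost`, `NFY`, and the particle-count formula `beta * h * n_birds` are introduced here for illustration only. The sketch keeps the fixed entries of each acceleration sequence, fills only the open entries, and lengthens the horizon until the cost drops by at least `delta` or the maximum horizon is reached.

```python
import random
from typing import List, Optional, Tuple

NFY = None  # hypothetical marker for "not yet fixed"
State = List[Tuple[float, float]]  # (position, velocity) per bird
Plan = List[Optional[float]]       # accelerations, NFY where still open


def cost(state: State) -> float:
    """Toy cost (assumption): squared distance of every bird from rest at 0."""
    return sum(x * x + v * v for x, v in state)


def rollout(state: State, plans: List[List[float]]) -> State:
    """Apply all sequences with unit-step double-integrator dynamics."""
    for t in range(len(plans[0])):
        state = [(x + v, v + p[t]) for (x, v), p in zip(state, plans)]
    return state


def local_ampc(state: State, plans: List[Plan], delta: float,
               h_max: int, beta: int, rng: random.Random):
    """Random-search stand-in for PSO with adaptive horizon and sample count.

    Assumes the fixed prefix of every plan is no longer than h_max.
    Returns (best plans, resulting state, achieved cost decrease).
    """
    base = cost(state)
    h = max(1, max(len(p) for p in plans))  # start at the fixed horizon
    best = (float("inf"), None, state)
    while h <= h_max:
        n_particles = beta * h * len(state)  # grows with the horizon
        for _ in range(n_particles):
            # Keep fixed accelerations, sample the open ones from scratch.
            cand = [[p[t] if t < len(p) and p[t] is not NFY
                     else rng.uniform(-1.0, 1.0) for t in range(h)]
                    for p in plans]
            end = rollout(state, cand)
            c = cost(end)
            if c < best[0]:
                best = (c, cand, end)
        if base - best[0] >= delta:
            break  # required decrement reached
        h += 1     # otherwise look further ahead
    _, best_plans, best_state = best
    return best_plans, best_state, base - best[0]
```

With the toy dynamics, a flock starting at rest away from the origin cannot lower its cost within a horizon of one, because the first acceleration only changes velocity. This is a small instance of the cost bump mentioned above, and it is why the sketch has to extend the horizon before any decrease becomes reachable.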
4.4. Dynamic Neighborhood Resizing
The key feature of the distributed algorithm (Alg. 4) is that it adaptively resizes neighborhoods. This is based on the following observation: as the agents gradually converge towards a globally optimal state, they can explore smaller neighborhoods when computing actions that will improve upon the current configuration.
Adaptation works on the lookahead cost, which is the cost that is reachable at some future time. Line 19 of Alg. 4 is reached (and the level is incremented) whenever we are able to decrease this lookahead cost. If the level is incremented, the neighborhood size is decremented; otherwise it is incremented, as follows:
(7) 
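The resizing rule described above can be sketched as follows. This is a hedged illustration rather than a reconstruction of the paper's equation: the function name `resize_neighborhood`, the step size of one for shrinking, and the lower bound `k_min` are assumptions made here, while growth by one and the upper bound given by the number of birds follow the surrounding text.

```python
def resize_neighborhood(k: int, level_advanced: bool,
                        n_birds: int, k_min: int = 2) -> int:
    """Return the next neighborhood size.

    Shrink by one (never below the assumed bound k_min) when the level
    advanced; otherwise grow by one (never above the number of birds).
    """
    if level_advanced:
        return max(k_min, k - 1)
    return min(n_birds, k + 1)
```

Clamping at `n_birds` means that repeated failures eventually make every neighborhood cover the whole flock.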
In Fig. 7 we depict a simulation-trace example, demonstrating how the levels and the neighborhood size adapt to the current value of the cost function.
4.5. Local Convergence
Lemma (Local convergence).
Consider an MDP with a cost function and a nonempty set of target states. If the transition relation is controllable with actions in the action set for every (local) subset of agents, then there exists a finite (maximum) horizon such that LocalAMPC is able to find the best actions that decrease the cost of a neighborhood of agents by at least a given decrement.
Proof.
In the input to LocalAMPC, the accelerations of some birds in the neighborhood may be fixed (for some horizon). As a consequence, the MDP may not be fully controllable within this horizon. Beyond this horizon, however, PSO is allowed to freely choose the accelerations, that is, the MDP is fully controllable again. The result now follows from the convergence of AMPC (Theorem 1 from [TSE17]). ∎
4.6. Global Convergence and Stability
Global convergence is achieved by our algorithm, which overcomes a local minimum by gradually adapting the neighborhood size in order to proceed to the next level defined by the Lyapunov function. Since we are solving a nonlinear, nonconvex optimization problem, the cost itself may not decrease monotonically. However, the lookahead cost, that is, the cost of some future reachable state, decreases monotonically. These costs are stored in the level variables in Alg. 4, and they define a Lyapunov function.
(8) 
where the levels decrease by at least a minimum, dynamically defined threshold.
Lemma .
The function defined by (8) is a valid Lyapunov function, i.e., it is positive-definite and monotonically decreases until the system reaches its goal state.
Proof.
Note that the cost function is positive by definition, and since each level equals the cost of some state, the function defined by (8) is nonnegative. Line 18 of Alg. 4 guarantees that it is monotonically decreasing by at least the required decrement.
∎
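The two properties used in this proof can be checked on a recorded trace of levels, as in the sketch below. It is an illustration only: the function names are hypothetical, and a single constant `min_drop` is assumed in place of the dynamically defined thresholds of the paper. The second helper makes explicit why a nonnegative quantity that drops by a fixed minimum at every level can only pass through finitely many levels.

```python
from typing import Sequence


def is_valid_level_sequence(levels: Sequence[float], min_drop: float) -> bool:
    """True if all levels are nonnegative and each drop is at least min_drop."""
    nonnegative = all(level >= 0 for level in levels)
    decreasing = all(prev - nxt >= min_drop
                     for prev, nxt in zip(levels, levels[1:]))
    return nonnegative and decreasing


def max_levels(initial_cost: float, min_drop: float) -> int:
    """Upper bound on the number of levels, counting the initial one."""
    return int(initial_cost // min_drop) + 1
```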
Lemma (Global consensus).
Given Assumptions 1-7 in Section 4.1, all agents in the system will fix their actions in a finite number of consensus rounds.
Proof.
During the first consensus round, each agent in the system runs LocalAMPC for its own neighborhood of the current size. Due to Lemma 4.5, there exists a horizon such that a solution, i.e., a set of action (acceleration) sequences of that length, will be found for all agents in the considered neighborhood. Consequently, at the end of the round, the solutions will be fixed for at least all the agents in the neighborhood of the agent that proposed the globally best solution. During the next rounds the procedure recurses. Hence, the set of all agents with nfy values shrinks with every consensus round and, since the number of agents is finite, becomes empty after finitely many rounds. ∎
Global consensus is reached by the system during the communication rounds. However, to achieve the global optimization goal, we must also prove that the consensus value converges to the desired property.
Definition 3.
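Let a sequence of random vector variables and a random or non-random limit vector be given. Then the sequence converges with probability one to the limit if the set of outcomes on which it fails to converge to the limit has probability zero.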
Lemma (Max-neighborhood convergence).
If the distributed algorithm (Alg. 4) is run with a constant neighborhood size equal to the number of birds, then it behaves identically to centralized AMPC.
Proof.
If the distributed algorithm uses a neighborhood that contains all birds, then it behaves like the centralized AMPC, because the accelerations of all birds are fixed in the first consensus round. ∎
Theorem (Global convergence).
Consider an MDP with a positive and continuous cost function and a nonempty set of target states. If there exist a finite horizon and a finite number of execution steps such that centralized AMPC is able to find a sequence of actions that brings the MDP from an initial state to a target state, then the distributed algorithm (Alg. 4) is also able to do so, with probability one.
Proof.
We illustrate the proof using our flocking example. Note that the theorem is valid in the general formulation above because, as the global Lyapunov function approaches zero, the local dynamic thresholds will not allow neighborhood solutions to diverge significantly from the state obtained as a result of repeated consensus rounds. Owing to Lemma 4.5, after the first consensus round, Alg. 5 finds a sequence of best accelerations, of some horizon length, for the birds in the winning subflock, decreasing their cost by the required decrement. In the next consensus round, birds outside this subflock have to adjust the accelerations for their own subflock, while keeping the accelerations of their neighbors in the winning subflock at the already fixed solutions. If such a bird fails to decrease the cost of its subflock by at least the required decrement within the prediction horizon, then it can explore a longer horizon, up to the maximum horizon. This allows PSO to compute accelerations for the birds in the winning subflock over the horizon interval beyond the already fixed prefix, decreasing the cost of the second subflock by its required decrement. Hence, the entire flock decreases its cost by the required decrement (this defines the Lyapunov function in Eq. 8), ensuring convergence to a global optimum. If the maximum horizon is reached before the cost of the flock has decreased by the required decrement, the size of the neighborhood is increased by one, and eventually it reaches the number of birds. Consequently, using Theorem 1 in [TSE17], there exists a horizon that ensures global convergence. For this choice of horizon and for the maximum neighborhood size, the cost is guaranteed to decrease by the required decrement, and we are bound to proceed to the next level in