I Introduction
Output tracking in dynamical systems, such as robots, flight-control systems, economic models, biological systems, and cyber-physical systems, is the practice of designing decision makers that ensure a system's output tracks a given signal [6, 14].
Well-known existing methods for nonlinear output regulation and tracking include control techniques based on nonlinear inversion [9], high-gain observers [11], and the framework of model predictive control (MPC) [1, 18]. Recently a new approach has been proposed, based on the Newton-Raphson flow for solving algebraic equations [24]. It has subsequently been tested on various applications, including control of an inverted pendulum and position control of platoons of mobile robotic vehicles [25, 20]. While perhaps not as general as the aforementioned established techniques, it holds promise of efficient computation and large domains of stability.
The successful deployment of complex control systems in real-world applications increasingly depends on their ability to operate in highly unstructured, even adversarial, settings, where a priori knowledge of the evolution of the environment is impossible to acquire. Moreover, due to the increasing interconnection between the physical and the cyber domains, control systems become more intertwined with human operators, making model-based solutions fragile in the face of unpredictable behavior. Toward that end, methods that augment low-level control techniques with intelligent decision-making mechanisms have been extensively investigated [19].
Machine learning [7, 23] offers a suitable framework to allow control systems to autonomously adapt by leveraging data gathered from their environment. To enable data-driven solutions for autonomy, learning algorithms use artificial neural networks (NNs): classes of functions that, owing to properties stemming from their neurobiological analogy, offer adaptive data representations and prediction based on external observations. NNs have been used extensively in control applications [15], both in open-loop and closed-loop fashion. In closed-loop applications, NNs have been utilized as dynamics approximators or, within the framework of reinforcement learning, to enable online solution of the Hamilton-Jacobi-Bellman equation [21]. However, the applicability of NNs to open-loop control objectives is broader, due to their ability to operate as classifiers or as nonlinear function approximators [5]. The authors of [15] introduced NN structures for system identification as well as adaptive control. Extending the identification capabilities of learning algorithms, the authors of [4] introduce a robustification term that guarantees asymptotic estimation of the state and the state derivative. Furthermore, reinforcement learning has received increasing attention since the development of methods that solve optimal control problems for continuous-time control systems online, without knowledge of the dynamics [22]. Prediction has been at the forefront of research on machine learning. Learning-based attack prediction was employed in both [26] and [2] in the context of cybersecurity, and [16] utilized NNs to solve a pursuit-evasion game by constructing both the evader's and the pursuer's strategies offline using precomputed trajectories. Recently, the authors of this paper applied NNs to online model construction in a control application [10].

This paper applies an NN technique to the pursuit-evasion problem investigated in [17], which is more challenging than the problem addressed in [10]. The strategies of both pursuers and evader are based on respective games. In Ref. [17], the pursuers know the game of the evader ahead of time, and an MPC technique is used to determine their trajectories. In this paper the pursuers do not have a priori knowledge of the evader's game or its structure, and they employ an NN in real time to identify its input-output mapping. We use our tracking-control technique [24] rather than MPC, and obtain results similar to those of [17]. Furthermore, the input to the system has a lower dimension than its output, and hence the control is underactuated. We demonstrate a way of overcoming this limitation, which may have broad scope in applications.
The rest of the paper is structured as follows. Section II describes our proposed control technique and some preliminary results on NNs, and it formulates the pursuer-evader problem. Section III describes results on model-based and learning-based strategies. Simulation results are presented in Section IV. Finally, Section V concludes the paper and discusses directions for future research.
II Preliminaries and Problem Formulation
II-A Tracking Control Technique
This subsection recounts results published in our previous work, in which prediction-based output tracking was used for fully actuated systems [24, 25, 20]. Consider a system as shown in Figure 1, with control input u(t), plant output y(t), and target reference r(t), all of the same dimension. The objective of the controller is to ensure that
(1)  lim sup_{t→∞} ||r(t) − y(t)|| ≤ ε

for a given (small) ε > 0.
To illustrate the basic idea underlying the controller, let us first assume that (i) the plant subsystem is a memoryless nonlinearity of the form
(2)  y(t) = g(u(t))
for a continuously differentiable function g, and (ii) the target reference is a constant, r(t) ≡ r, for a given r.¹ (¹Henceforth we will use the notation r(·) for a generic signal, to distinguish it from its value r(t) at a particular time t.) These assumptions will be relaxed later. In this case, the tracking controller is defined by the following equation,
(3)  u̇(t) = (∂g/∂u(u(t)))⁻¹ (r − y(t))
assuming that the Jacobian matrix ∂g/∂u is nonsingular at every point u(t) computed by the controller via (3). Observe that (3) defines the Newton-Raphson flow for solving the algebraic equation g(u) = r, and hence (see [24, 25]) the controller converges in the sense that y(t) → r. Next, suppose that the reference target r(t) is time dependent, while keeping the assumption that the plant is a memoryless nonlinearity. Suppose that r(·) is bounded, continuous, and piecewise-continuously differentiable, and that ṙ(·) is bounded. Define
(4)  η := lim sup_{t→∞} ||ṙ(t)||
then (see [25]), with the controller defined by (3), we have that
(5)  lim sup_{t→∞} ||r(t) − y(t)|| ≤ η
Note that Eqs. (2) and (3) together define the closed-loop system. Observe that the plant equation (2) is an algebraic equation while the controller equation (3) is a differential equation; hence the closed-loop system represents a dynamical system. Its stability, in the sense that u(t) is bounded whenever r(·) and ṙ(·) are bounded, is guaranteed by (5) as long as the control trajectory does not pass through a point where the Jacobian matrix is singular.
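As a minimal numerical sketch of the controller (3) for a memoryless plant and a constant reference, consider the following Euler discretization. The cubic plant g(u) = u³, the reference value, and all step sizes are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def newton_raphson_flow(g, jac_g, r, u0, dt=1e-3, steps=8000):
    """Euler discretization of the Newton-Raphson flow controller (3):
    du/dt = (dg/du)^{-1} (r - g(u)) for a constant reference r."""
    u = np.array(u0, dtype=float)
    for _ in range(steps):
        u = u + dt * np.linalg.solve(jac_g(u), r - g(u))
    return u

# Toy plant (assumed for illustration): g(u) = u^3 with reference r = 8,
# so the flow should drive u toward 2.
g = lambda u: u ** 3
jac_g = lambda u: np.atleast_2d(3.0 * u ** 2)
u_final = newton_raphson_flow(g, jac_g, np.array([8.0]), u0=[1.0])
```

Along the flow the output error obeys d/dt (r − y) = −(r − y), so convergence is exponential regardless of the (nonsingular) plant nonlinearity.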
Finally, let us dispense with the assumption that the plant subsystem is a memoryless nonlinearity. Instead, suppose that it is a dynamical system modeled by the following two equations,
(6)  ẋ(t) = f(x(t), u(t))

(7)  y(t) = h(x(t))
where the state variable x(t) is in ℝⁿ, and the functions f and h satisfy the following assumption. (i) The function f is continuously differentiable, and for every compact set A there exists a constant L > 0 such that, for every u ∈ A and every x, ||f(x, u)|| ≤ L(||x|| + 1). (ii) The function h is continuously differentiable. ∎ This assumption ensures that whenever the control signal is bounded and continuous, the state equation (6) has a unique solution on the interval [0, ∞).
In this setting, y(t) is no longer a function of u(t) alone, but rather of the state x(t), which depends on past controls. Therefore (2) is no longer valid, and hence the controller cannot be defined by (3). To get around this conundrum, we pull the feedback not from the output but from a predicted value thereof. Specifically, fix the lookahead time T > 0, and suppose that at time t the system computes a prediction of y(t + T), denoted by ỹ(t + T). Suppose also that ỹ(t + T) is a function of (x(t), u(t)), hence it can be written as ỹ(t + T) = g(x(t), u(t)), where the function g is continuously differentiable.
Now the feedback law is defined by the following equation,
(8)  u̇(t) = (∂g/∂u(x(t), u(t)))⁻¹ (r(t) − g(x(t), u(t)))
The state equation (6) and control equation (8) together define the closed-loop system. This system can be viewed as an (n + m)-dimensional dynamical system with state variable (x(t), u(t)) and input r(t). We are concerned with a variant of Bounded-Input-Bounded-State (BIBS) stability, whereby if r(·) and ṙ(·) are bounded, then (x(·), u(·)) is bounded as well. Such stability can no longer be taken for granted, as it could in the case where the plant is a memoryless nonlinearity.
We remark that a larger lookahead time T means larger prediction errors, and these translate into larger asymptotic tracking errors. On the other hand, an analysis of various second-order systems in [24] reveals that they all were unstable if T is too small, and stable if T is large enough. It can be seen that a requirement for a small prediction error can stand in contradiction with the stability requirement. This issue was resolved by speeding up the controller in the following manner. Consider a gain α ≥ 1, and modify (8) by multiplying its right-hand side by α, resulting in the following control equation:

u̇(t) = α (∂g/∂u(x(t), u(t)))⁻¹ (r(t) − g(x(t), u(t)))
It was verified in [24, 25, 20] that, regardless of the value of T, a large-enough α stabilizes the closed-loop system.² (²This statement seems to have a broad scope, and does not require the plant to be a minimum-phase system.) Furthermore, if the closed-loop system is stable, then the following bound holds,
(9)  lim sup_{t→∞} ||r(t) − y(t)|| ≤ η/α
where η is defined by (4). Thus, a large gain α can both stabilize the closed-loop system and reduce the asymptotic tracking error.
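The effect of the speed-up gain can be illustrated numerically. The sketch below, which assumes a memoryless identity plant y = u tracking r(t) = sin t (both assumptions for illustration), shows that the asymptotic error shrinks roughly in inverse proportion to the gain, in agreement with the bound (9):

```python
import numpy as np

def track_sine(alpha, dt=1e-3, t_end=30.0):
    """Memoryless plant y = g(u) = u tracking r(t) = sin(t) under the
    sped-up flow du/dt = alpha * (dg/du)^{-1} (r(t) - g(u)).
    Returns the peak tracking error after transients die out."""
    u, errs = 0.0, []
    for k in range(int(t_end / dt)):
        t = k * dt
        u += dt * alpha * (np.sin(t) - u)   # dg/du = 1 for the identity plant
        if t > t_end / 2:                   # discard the transient
            errs.append(abs(np.sin(t) - u))
    return max(errs)

e_slow, e_fast = track_sine(1.0), track_sine(10.0)
```

Here the maximum of |d/dt sin t| is 1, so the bound predicts an asymptotic error of at most 1/alpha; the simulated steady-state amplitudes (about 0.71 and 0.10) sit below it.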
II-B Problem Formulation
In an attempt to broaden the application scope of the control algorithm, we explore underactuated systems such as fixed-wing aircraft, which are widely used in aerospace engineering. The behavior of a fixed-wing aircraft at constant elevation can be approximated by a planar Dubins vehicle [12] with states (x_p, y_p, θ) and kinematics

ẋ_p(t) = v cos θ(t),  ẏ_p(t) = v sin θ(t),  θ̇(t) = u(t),
where (x_p, y_p) denotes the planar position of the vehicle, θ its heading, and u the turn-rate input, constrained as |u| ≤ u_max. The input saturation enforces a minimum turning radius equal to v/u_max. For testing the efficacy of the controller for the underactuated system, henceforth referred to as the pursuer, it is tasked with tracking an evading vehicle, modeled as a single integrator with dynamics

ẋ_e(t) = v_x(t),  ẏ_e(t) = v_y(t),
where (x_e, y_e) denotes the planar position of the evader and (v_x, v_y) is its velocity. We consider two cases: one where the evader is agnostic to the pursuer and follows a known trajectory, and another where the evader is adversarial in nature and its trajectory is not known to the pursuer. The next section provides two solutions to the problem of estimating the evader's trajectory, based on a model-based approach and a learning-based approach, respectively.
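The two vehicle models above can be sketched as simple Euler integrators; the speeds, saturation level, and step size below are illustrative assumptions:

```python
import numpy as np

def dubins_step(state, omega, v=1.0, omega_max=1.0, dt=0.01):
    """One Euler step of the planar Dubins pursuer: constant speed v,
    turn-rate input saturated at omega_max (minimum turn radius v/omega_max)."""
    x, y, theta = state
    omega = float(np.clip(omega, -omega_max, omega_max))
    return np.array([x + dt * v * np.cos(theta),
                     y + dt * v * np.sin(theta),
                     theta + dt * omega])

def evader_step(pos, vel, v_max=0.8, dt=0.01):
    """One Euler step of the single-integrator evader; the commanded
    velocity is clipped to the maximum speed v_max."""
    vel = np.asarray(vel, dtype=float)
    speed = np.linalg.norm(vel)
    if speed > v_max:
        vel = vel * (v_max / speed)
    return np.asarray(pos, dtype=float) + dt * vel
```

Note the asymmetry this encodes: the pursuer is faster but turn-rate limited, while the slower evader can change direction instantaneously.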
III Predictive Framework
III-A Model-Based Pursuit Evasion
The considered system is underactuated because the pursuer's position is two-dimensional while it is controlled by the one-dimensional variable u. This raises a problem, since the application of the proposed tracking technique requires the control variable and the system's output to have the same dimension. To get around this difficulty, we define a suitable scalar function E of the predicted positions of the pursuer and the evader at time t + T, and we apply the Newton-Raphson flow to the equation E = 0. The modified controller becomes
(10) 
Since the redefined output is a scalar, the modified algorithm works similarly to the base case.
Assume general nonlinear system dynamics as in (6), with output described by (7). The predicted state trajectory is computed by holding the input at a constant value over the prediction horizon, and is given by the following differential equation:
(11)  ξ̇(τ) = f(ξ(τ), u(t)),  τ ∈ [t, t + T]
with the initial condition ξ(t) = x(t), as shown in [24]. The predicted output at time t + T is ỹ(t + T) = h(ξ(t + T)). Furthermore, by taking the partial derivative of (11) with respect to u(t), we obtain
(12)  d/dτ (∂ξ(τ)/∂u) = (∂f/∂x)(ξ(τ), u(t)) (∂ξ(τ)/∂u) + (∂f/∂u)(ξ(τ), u(t))
with the initial condition ∂ξ(t)/∂u = 0. The above is a differential equation in ∂ξ/∂u, and (11) and (12) can be solved numerically together. Finally, the resulting predicted output and its derivative with respect to the input can be substituted into (10) to obtain the control law.
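The joint numerical solution of the prediction equation (11) and its input sensitivity (12) can be sketched as follows; the scalar-integrator plant used to verify the result is an assumption for illustration:

```python
import numpy as np

def predict_with_sensitivity(f, dfdx, dfdu, x0, u, T, dt=1e-3):
    """Euler integration of the prediction ODE (11), with the input u held
    constant over the horizon T, jointly with its sensitivity S = d(xi)/du
    from (12), which starts from the zero initial condition."""
    xi = np.asarray(x0, dtype=float)
    S = np.zeros((xi.size, 1))
    for _ in range(round(T / dt)):
        S = S + dt * (dfdx(xi, u) @ S + dfdu(xi, u))
        xi = xi + dt * f(xi, u)
    return xi, S

# Scalar integrator f(x, u) = u (assumed toy plant): the prediction should
# be x0 + u*T and the sensitivity should be T.
f = lambda x, u: np.array([u])
dfdx = lambda x, u: np.zeros((1, 1))
dfdu = lambda x, u: np.ones((1, 1))
xi_T, S_T = predict_with_sensitivity(f, dfdx, dfdu, [0.0], 0.5, T=2.0)
```

For the toy plant the closed-form answers (x0 + uT and T) let us check the integrator directly.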
In the next section, results are presented for an agnostic as well as an adversarial pursuer-evader system. However, as mentioned above, in the adversarial problem formulation the trajectory of the evader is not known in advance. This limitation can be overcome in two ways.
In the first approach, the pursuer(s) use game theory to predict the approximate direction of evasion. As mentioned in [8], in the case of a single pursuer, the evader's optimal strategy is to move along the line joining the evader's and the pursuer's positions, provided the pursuer is far enough away. When the distance between the pursuer and the evader reduces to the turning radius of the pursuer, the evader switches strategies and enters the nonholonomic constraint region of the pursuer. This can be represented as follows:

(13)
Here, φ is the expected evasion angle of the evader and d is the distance between the pursuer and the evader.
If there are multiple pursuers, it is assumed that the evader follows the same strategy by considering only the closest pursuer. Notably, this will not provide the pursuers with a correct prediction of the evader's motion, as they do not know about the evader's goal-seeking behavior. However, it gives a good enough approximation of the evader's motion that the algorithm can be used for tracking.
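A rough sketch of this heuristic evasion strategy follows. Since the exact form of (13) is not reproduced here, the switching law below (move radially away when far, dodge sideways once the separation drops to the pursuer's turning radius) is an assumed variant:

```python
import numpy as np

def evasion_direction(p_pursuer, p_evader, turn_radius):
    """Heuristic evasion angle in the spirit of (13): flee along the
    pursuer-evader line when far; cut sideways, into the pursuer's
    nonholonomic region, when cornered. The exact switching law is an
    assumption for this sketch."""
    d = np.asarray(p_evader, dtype=float) - np.asarray(p_pursuer, dtype=float)
    dist = np.linalg.norm(d)
    phi = np.arctan2(d[1], d[0])      # radially-away direction
    if dist > turn_radius:
        return phi
    return phi + np.pi / 2.0          # sideways dodge when within turn radius

far = evasion_direction([0.0, 0.0], [10.0, 0.0], turn_radius=1.0)
near = evasion_direction([0.0, 0.0], [0.5, 0.0], turn_radius=1.0)
```

With multiple pursuers, the same function would be evaluated against the closest pursuer only, per the assumption above.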
The second approach involves learning the evader's behavior over time using an NN. The pursuers take their own positions and the position of the evader as input, and after training the NN outputs the estimated evasion direction.
To showcase the efficacy of our method, we consider a pursuit-evasion problem involving multiple pursuing agents. Such problems are typically formulated as zero-sum differential games [8]. Due to the difficulty of solving the underlying Hamilton-Jacobi-Isaacs (HJI) equations [3] of this problem, we shall utilize the method described in Section II-A to approximate the desired behavior. Furthermore, we show that it is straightforward to augment the controller with learning structures in order to tackle the pursuit-evasion problem without explicit knowledge of the evader's behavior.
In order to formulate the pursuit-evasion problem, we define a global state-space system consisting of the dynamics of the pursuers and the evader. For ease of exposition, the analysis will focus on the single-pursuer, single-evader problem, since extending the results to multiple pursuers is straightforward.
The global state dynamics become,
(14) 
where the subscripts indicate the autonomous agent. For compactness, we denote the global state vector by z and the pursuers' control vector by u_p, and we let F denote the nonlinear mapping described by the right-hand side of (14). Thus, given the initial states of the agents, the evolution of the pursuit-evasion game is described by ż(t) = F(z(t), u_p(t), u_e(t)), where u_e is the evader's input. Subsequently, this zero-sum game can be described as a minimax optimization problem through the cost index,
(15) 
where d_i is the distance between the i-th pursuer and the evader, the weighting coefficients are user-defined constants, and the exponential term serves as a discount factor. The first term ensures that the pursuers remain close to the evader, while the second term encourages cooperation between the agents. The cost decreases exponentially to ensure that the integral has a finite value in the absence of equilibrium points.
Let V be a smooth function quantifying the value of the game when specific policies are followed starting from a given state. Then, we can define the corresponding Hamiltonian of the game as,
(16) 
The optimal feedback policies of this game are known to constitute a saddle point [3] such that,
(17)  
(18) 
Under the optimal policies (17),(18), the HJI equation is satisfied,
(19) 
Evaluating the optimal pursuit policies yields singular optimal solutions involving the partial derivatives of the value function with respect to the state, calculated by solving (19). To obviate the need for the bang-bang control derived from (17) and (18), we shall employ the predictive tracking technique described in Section II-A to derive approximate, easy-to-implement feedback controllers for the pursuing autonomous agents. Furthermore, by augmenting the predictive controller with learning mechanisms, the approximate controllers will have no need for explicit knowledge of the evader's policy.
The following theorem presents bounds on the optimality loss induced by the use of the lookahead controller approximation.
Theorem 1.
Let the pursuit-evasion game evolve according to the dynamics given by (14), where the evader is optimal with respect to the cost defined in Section III-A and the pursuers utilize the learning-based predictive tracking strategy given by (10). Then, the tracking error of the pursuers and the optimality loss due to the use of the predictive controller are bounded, provided the controller gain is sufficiently large relative to the partial derivatives of the game value with respect to the state components.
Proof: Consider the Hamiltonian function obtained when the approximate controller and the NN-based prediction of the evader's policy are used,
(20) 
Taking into account the nonlinear dynamics of the system (14), one can rewrite (20) in terms of the optimal Hamiltonian, i.e., the HJI equation obtained after substituting (17) and (18) into (16). Now, take the orbital derivative of the value function along the trajectories generated by the approximate controllers and substitute (20).
Hence, the corresponding sublevel set of the value function is forward invariant, which implies that the tracking error and the optimality loss over any finite horizon are bounded.
Note that we do not use optimal control or MPC to solve the pursuit-evasion problem. Instead, the controller is governed by (10), which is simple to implement and has low computational complexity. ∎
III-B Deep Learning-Based Pursuit Evasion
A deep NN, consisting of several hidden layers, describes a nonlinear mapping between its input space and its output space. Each layer receives the output of the previous layer as an input and, subsequently, feeds its own output to the next layer. Each layer's output consists of the weighted sum of its input alongside a bias term, filtered through an application-specific activation function [7]. Specifically, the output of a given layer is

y = σ(W x + b),

where x is the input vector, gathered from training data or from the outputs of previous layers, W is the layer's weight matrix, b is the bias term, and σ is the layer's activation function. We note that it is typical to write the output of layer l compactly, with a slight abuse of notation, as,
(21)  y_l = σ_l(W_l σ_{l−1}(x_{l−1}) + b_l)
where W_l and b_l are the weights and bias of layer l, and σ_{l−1} is the activation function of the previous layer, taking as input the vector x_{l−1}.
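The layer composition in (21) can be sketched as a forward pass; the layer sizes, the ReLU activation, and the linear output layer below are illustrative assumptions:

```python
import numpy as np

def layer_forward(x, W, b, sigma):
    """Output of one fully connected layer: the activation of the weighted
    sum of the input plus a bias term, as in (21)."""
    return sigma(W @ x + b)

def forward(x, params, sigma, out_sigma=lambda z: z):
    """Chains layers; the output layer's activation is linear, as is typical."""
    *hidden, last = params
    for W, b in hidden:
        x = layer_forward(x, W, b, sigma)
    W, b = last
    return out_sigma(W @ x + b)

relu = lambda z: np.maximum(z, 0.0)
# Tiny two-layer example (weights assumed for illustration):
params = [(np.eye(2), np.zeros(2)), (np.array([[1.0, 1.0]]), np.zeros(1))]
y = forward(np.array([1.0, -3.0]), params, relu)
```

Here the hidden ReLU layer maps (1, -3) to (1, 0) and the linear output layer sums the result, so the network outputs 1.0.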
It is known [13] that two-layer NNs possess the universal approximation property, according to which any smooth function can be approximated arbitrarily closely by an NN of two or more layers. Let Ω be a simply connected compact set and consider a nonlinear function defined on Ω. Then, given any ε > 0, there exists an NN structure such that,
where the approximation error is uniformly bounded by ε. We note that, typically, the activation function of the output layer is taken to be linear.
Computing the weight matrices of a network is the main concern of the field of machine learning. In this work, we employ the gradient-descent-based backpropagation algorithm. Given a collection of training data stored in input-output tuples, we denote the network's output errors accordingly. Then, the update equation for the weights at each optimization iteration is given by,

(22)
where γ denotes the learning rate. We note that the update index need not correspond to the sample index, since different update schedules leverage the gathered data in different ways [13]. It can be seen that, in order for the proposed method to compute the pursuers' control inputs, an accurate prediction of the future state of the evader is required. However, this presupposes that the pursuers themselves have access to the evader's future decisions; an assumption that is, in most cases, invalid. Thus, we augment the pursuers' controllers with an NN structure that learns to predict the actions of the evader based on past recorded data.
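A minimal sketch of the gradient-descent weight update (22) on a batch of training data follows; the linear model, the synthetic data, and the learning rate are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))        # inputs (e.g., relative positions)
Y = X @ np.array([0.5, -1.0])        # synthetic targets (toy evasion signal)

W = np.zeros(2)                      # weights to be learned
gamma = 0.1                          # learning rate
for _ in range(500):                 # batch gradient descent, as in (22):
    err = X @ W - Y                  # output errors on the batch
    grad = X.T @ err / len(X)        # gradient of the mean squared error
    W = W - gamma * grad             # W <- W - gamma * dE/dW

final_loss = float(np.mean((X @ W - Y) ** 2))
```

Since the toy targets are exactly linear in the inputs, the update drives the loss essentially to zero and recovers the generating weights (0.5, -1.0).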
Initially, we assume that the evader's strategy is computed by a feedback algorithm, given her relative position to the pursuers. This way, the unknown function we wish to approximate maps the relative displacements between each pursuer and the evader, along the X and Y axes respectively, to the evader's evasion direction. In order to train the network, we let the pursuers gather data regarding the fleet's position with respect to the evader, as well as her behavior, over a predefined time window.
Increasing the time window will allow the pursuers to gather more training data for the predictive network. However, this will not only increase the computational complexity of the learning procedure, but will also make the pursuers more inert to sudden changes in the evader's behavior. Simulation results corroborate our choice of training parameters. ∎
Subsequently, we denote the current prediction function for the evader's strategy by the network composed of the current weight estimate of the NN's output layer and the current estimate of the hidden layers, parametrized by appropriate hidden weights.
While the learning algorithm for the evader's behavior operates throughout the duration of the pursuit, thus making the approximation weights time-varying, we suppress their explicit dependence on time, since the process is open-loop in the sense that the system learns in batches rather than in a continuous fashion. ∎
IV Simulation Results
This section presents results for the problems briefly described in the previous section. First, the agnostic-evader case is considered, followed by the adversarial case. For the second case, single- and multiple-pursuer systems are considered separately. The controller is implemented on a Dubins vehicle, and for the purpose of tracking we define the system output to be the vehicle's planar position.
IV-A Single Pursuer, Agnostic Target
In this subsection, the controller is tested on a Dubins vehicle with the task of pursuing an agnostic target moving along a known trajectory. Since the vehicle has a constant speed and an input saturation is enforced, it has an inherent minimum turning radius. For this simulation, we set m/s and the input saturation is first set to rad/s and then to rad/s. The evader moves along two semicircular curves with a constant speed which is less than .
As a consequence, when the pursuer catches up to the evader, it overshoots and has to go around a full circle before it can resume tracking. Naturally, a lower turning radius translates to better tracking, as the vehicle can make “tighter” turns. This can be seen by comparing the two vehicle trajectories shown in Figure 2: for the same evader trajectory, the tracking performance is far better in the second case. Once the pursuer catches up to the target, the maximum tracking error in the second case is smaller than in the first by a factor consistent with the ratio of the turning radii, as shown in Figure 2.
IV-B Single Pursuer, Adversarial Evader
The pursuer is again modelled as a Dubins vehicle, while the evader is modelled as a single integrator with a maximum speed less than that of the pursuer. Hence, while the pursuer is faster, the evader is more agile and can instantly change its direction of motion. In this and subsequent cases, the evader is considered adversarial in nature and uses game theory to choose its evasion direction.
Consider the position vectors of the pursuer and the evader at time t. First, the pursuer estimates the optimal evasion direction based on the relative position of the evader and itself at time t using (13). Assuming this direction of evasion to be fixed over the prediction window from t to t + T gives the predicted position of the evader at all time instants in this interval. Next, the pursuer computes its own predicted position under a constant input. Finally, the tracking target is set to the predicted position of the evader, and the ensemble vector of the states of the pursuer and the evader is used to compute the input via the differential equation (10).
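The two prediction steps described above (evader frozen along its estimated evasion direction, pursuer propagated under a constant input) can be sketched as follows; speeds, horizon, and step size are illustrative assumptions:

```python
import numpy as np

def predict_positions(p_pursuer, theta, u, p_evader, phi, v_p, v_e, T, dt=1e-3):
    """Predicted positions at t + T: the evader's evasion angle phi and the
    pursuer's turn-rate input u are both held constant over the window."""
    # Evader: a straight line along the frozen evasion direction.
    pe = np.asarray(p_evader, float) + T * v_e * np.array([np.cos(phi),
                                                           np.sin(phi)])
    # Pursuer: Euler integration of the Dubins kinematics under constant u.
    x, y = map(float, p_pursuer)
    for _ in range(round(T / dt)):
        x += dt * v_p * np.cos(theta)
        y += dt * v_p * np.sin(theta)
        theta += dt * u
    return np.array([x, y]), pe

# Head-on example: both predictions land at the same point (2, 0).
pp, pe = predict_positions([0, 0], 0.0, 0.0, [1, 0], 0.0,
                           v_p=1.0, v_e=0.5, T=2.0)
```

The distance between the two predicted positions is what the modified controller (10) drives toward zero.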
Figure 3 shows the trajectories of the pursuer and the evader, with the evader's goal set to a fixed point. It can be observed that the evader moves towards the goal while the pursuer is far away, and starts evasive maneuvers when the pursuer gets close, by entering its nonholonomic region. Figure 3 displays the tracking error, defined as the distance between the pursuer and the evader, which is almost periodic. This is because the evader's maneuvers force the pursuer to circle back. The peak tracking error after the pursuer catches up is slightly more than twice the turning radius, as expected.
IV-C Multiple Pursuers, Adversarial Evader
While the previous section had only one pursuer, this simulation considers the case of two pursuers and a single evader. Having multiple pursuers means there must be cooperation between them in order to utilize resources optimally. Thus, a pursuer can no longer make decisions based solely on the position of the evader relative to itself; the positions of the other pursuers must also be factored in. We therefore redefine the cost expression to include these parameters, as shown below for the case of two pursuers. Let d_12 be the distance between the two pursuers, and let
(23) 
The first term ensures that the pursuers remain close to the evader, while the second term encourages cooperation between the agents. The last term is added to repel the pursuers from each other if they come close, as having multiple pursuers in the close vicinity of one another is suboptimal.
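A sketch of a multi-pursuer cost of the kind described for (23) follows. The quadratic proximity terms, the product-form cooperation term, the inverse-distance repulsion term, and the weights q, c, k are illustrative assumptions, since the exact expression is not reproduced here:

```python
import numpy as np

def pursuit_cost(p1, p2, pe, q=1.0, c=1.0, k=1.0):
    """Instantaneous two-pursuer cost sketch: the first term penalizes each
    pursuer's distance to the evader, the second couples the two distances
    (cooperation), and the last grows as the pursuers approach each other
    (repulsion). All functional forms and weights are assumptions."""
    d1 = np.linalg.norm(np.subtract(p1, pe))
    d2 = np.linalg.norm(np.subtract(p2, pe))
    d12 = np.linalg.norm(np.subtract(p1, p2))
    return q * (d1**2 + d2**2) + c * d1 * d2 + k / (d12 + 1e-9)

# Clustered pursuers incur a much larger repulsion penalty than spread ones.
c_near = pursuit_cost([1.0, 0.01], [1.0, -0.01], [0.0, 0.0])
c_far = pursuit_cost([1.0, 1.0], [1.0, -1.0], [0.0, 0.0])
```

The repulsion term makes clustered configurations expensive even when both pursuers are equally close to the evader, which is the suboptimality the text describes.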
Figure 5 shows the trajectories of the pursuers and the evader when the goal for the evader is set to the point . In this case, the pursuers close in on the evader and trap it away from its goal due to their cooperative behavior. The evader is forced to continuously perform evasive maneuvers as the other pursuer closes in when the first has to make a turn. This can be seen more clearly in the tracking error plot given in Figure 5. After catching up with the evader, it can be seen that when one pursuer is at its maximum distance, the other is at its minimum. The results achieved show good coordination between the pursuers and low tracking error and are qualitatively comparable to [17].
Lastly, we present the results under learning-based prediction. Figure 7 presents a comparative result for the tracking error of the model-based algorithm vis-à-vis the NN-based control, and showcases the quality of the performance of the proposed algorithm under the game-theoretic cost metric. From these figures, it can be seen that the NN structure offers fast predictive capabilities to the controller; hence the overall performance is comparable to the model-based control.
V Conclusion and Future Work
This work extends the framework of prediction-based nonlinear tracking to the context of pursuit-evasion games. We present results for vehicle pursuit of agnostic targets, modeled as moving along known trajectories, as well as adversarial target tracking, where the evader evolves according to game-theoretic principles. Furthermore, to obviate the need for explicit knowledge of the evader's strategy, we employ learning algorithms alongside the predictive controller. The overall algorithm is shown to produce results comparable to those in the literature, while it precludes the need for solving an optimal control problem.
Future work will focus on developing robustness guarantees that will allow for more realistic scenarios, where noise and external disturbances are taken into consideration.
References
 [1] (2012) Nonlinear model predictive control. Vol. 26, Birkhäuser. Cited by: §I.
 [2] (2010) Network security: a decision and game-theoretic approach. Cambridge University Press. Cited by: §I.
 [3] (1999) Dynamic noncooperative game theory. Vol. 23, SIAM. Cited by: §III-A, §III-A.
 [4] (2013) A novel actor–critic–identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49 (1), pp. 82–92. Cited by: §I.

 [5] (1995) Neural networks for pattern recognition. Oxford University Press. Cited by: §I.
 [6] (1996) Nonlinear inversion-based output tracking. IEEE Transactions on Automatic Control 41 (7), pp. 930–942. Cited by: §I.
 [7] (2009) Neural networks and learning machines. Vol. 3, Pearson Upper Saddle River. Cited by: §I, §IIIB.
 [8] (1999) Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Courier Corporation. Cited by: §IIIA, §IIIA.
 [9] (1990) Output regulation of nonlinear systems. IEEE Trans. Automat. Control 35, pp. 131–140. Cited by: §I.
 [10] (2019) Predictive learning via lookahead simulation. In AIAA SciTech 2019 Forum, San Diego, California, January 7-11. Cited by: §I, §I.
 [11] (1998) On the design of robust servomechanisms for minimum phase nonlinear systems. In Proc. 37th IEEE Conf. Decision and Control, Tampa, FL, pp. 3075–3080. Cited by: §I.
 [12] (2006) Planning algorithms. Cambridge university press. Cited by: §IIB.
 [13] (1998) Neural network control of robot manipulators and nonlinear systems. CRC Press. Cited by: §IIIB, §IIIB.
 [14] (1996) A different look at output tracking: control of a vtol aircraft. Automatica 32 (1), pp. 101–107. Cited by: §I.
 [15] (1990) Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1 (1), pp. 4–27. Cited by: §I, §I.
 [16] (1995) Synthesis of optimal strategies for differential games by neural networks. In New Trends in Dynamic Games and Applications, pp. 111–141. Cited by: §I.
 [17] (2015) Robust UAV coordination for target tracking using output-feedback model predictive control with moving horizon estimation. In American Control Conference, Chicago, Illinois, July 1-3. Cited by: §I, §IV-C.
 [18] (2017) Model predictive control: theory, computation, and design, 2nd edition. Nob Hill, LLC. Cited by: §I.
 [19] (1983) Intelligent robotic control. IEEE Transactions on Automatic Control 28 (5), pp. 547–557. Cited by: §I.
 [20] (2018) Tracking control by the Newton-Raphson flow: applications to autonomous vehicles. In European Control Conference, Naples, Italy, June 25-28. Cited by: §I, §II-A, §II-A.
 [21] (2010) Online actor–critic algorithm to solve the continuoustime infinite horizon optimal control problem. Automatica 46 (5), pp. 878–888. Cited by: §I.
 [22] (2017) Q-learning for continuous-time linear systems: a model-free infinite horizon optimal control approach. Systems & Control Letters 100, pp. 14–20. Cited by: §I.
 [23] (2013) Optimal adaptive control and differential games by reinforcement learning principles. Vol. 2, IET. Cited by: §I.
 [24] (2017) Performance regulation and tracking via lookahead simulation: preliminary results and validation. In 56th IEEE Conf. on Decision and Control, Melbourne, Australia, December 12-15. Cited by: §I, §I, §II-A, §II-A, §II-A, §III-A.
 [25] (2018) Tracking control via variable-gain integrator and lookahead simulation: application to leader-follower multiagent networks. In Sixth IFAC Conference on Analysis and Design of Hybrid Systems (ADHS 2018), Oxford, UK, July 11-13. Cited by: §I, §II-A, §II-A, §II-A.
 [26] (2009) A data mining approach to strategy prediction. In 2009 IEEE Symposium on Computational Intelligence and Games, pp. 140–147. Cited by: §I.