1 Introduction
Nowadays, the Air Traffic Management (ATM) system is based on an airspace management paradigm that leads to demand imbalances that cannot be dynamically adjusted. This entails, among other effects, a higher workload for air traffic controllers (ATCOs), which determines the maximum system capacity. To overcome these drawbacks, different initiatives, dominated by Single European Sky ATM Research (SESAR) in Europe and NextGen in the US, have promoted the transformation of the current environment towards a new trajectory-based ATM paradigm. In this Trajectory Based Operations paradigm the trajectory becomes the cornerstone upon which all ATM capabilities will rely, supporting the whole trajectory life cycle: from trajectory planning to the negotiation and agreement, execution, amendment and modification stages.
The proposed transformation requires high-fidelity aircraft trajectory planning and prediction capabilities that efficiently support the trajectory life cycle at all stages. Indeed, predictability is considered the main driver for enhancing operational key performance areas, such as capacity, efficiency for all stakeholders (i.e. airspace users, air traffic controllers, the network manager, air navigation service providers, etc.), and, of course, safety. Enhancing predictability and bringing more automation into all stages of operations emerges both as a mid-term need (the European Network Manager, EUROCONTROL, forecasts a traffic increase of 50% in 2035 compared to 2017, meaning 16 million flights across Europe) and as a long-term need (2035+).
Current trajectory predictors are based on deterministic formulations of the aircraft motion problem. Although there are sophisticated solutions that reach high levels of accuracy, all approaches are intrinsically simplifications of the actual aircraft behavior, which deliver appropriate results at a reasonable computational cost. Predictors’ outputs are generated based on a priori knowledge of the flight plan (i.e. the airline’s planned and intended trajectory filed before departure), the expected command and control strategies released by the pilot or the Flight Management System (FMS) instructions (known as Aircraft Intent [24]), a forecast of the weather conditions to be faced throughout the trajectory, and the aircraft performance. These model-based or physics-based approaches are deterministic: they always return the same trajectory prediction for a set of identical inputs. Although the use of the concept of Aircraft Intent together with very precise aircraft performance models such as Base of Aircraft Data (BADA) [2] has helped to improve prediction accuracy, the model-based approach requires a set of input data that typically are not precisely known (e.g. initial aircraft weight, pilot/FMS flight modes, etc.). In addition, accuracy varies depending on the intended prediction horizon (look-ahead time).
Recent efforts in the field of aircraft trajectory prediction have explored the application of statistical analysis and machine learning techniques to capture non-deterministic influences on aircraft trajectories. Linear regression models [20] [15] and neural networks [21] [8] [23] have improved trajectory prediction accuracy for traffic flow forecasting. Generalized Linear Models [22] have been applied to trajectory prediction in arrival management scenarios, and multiple linear regression [32] [17] to predicting estimated times of arrival (ETA). These efforts use historical surveillance data as input, together with additional supporting data required for robust and reliable trajectory predictions (e.g. flight plans, airspace structure, Air Traffic Control procedures, airline strategy, weather forecasts, etc.), depending on their objectives. However, these approaches make specific assumptions, are restricted to a specific operational/tactical phase, have a limited prediction horizon, or consider specific constraints for trajectories.
In this paper we approach the trajectory prediction problem as a data-driven imitation learning problem, where we aim to imitate the experts “shaping” the trajectory, learning models that incorporate their preferences, strategies, practices etc. (subsequently termed “policy”) in an aggregated way. Towards this goal we present a comprehensive framework that comprises the state-of-the-art Generative Adversarial Imitation Learning method, in a pipeline with trajectory clustering and classification methods. Compared to other approaches, this approach can be effective (in terms of accuracy of predictions) even with a small number of historical trajectories and, more importantly, can provide accurate long-term predictions both at the pre-tactical stage (i.e. before departure, given the departure airport and the take-off time) and at the tactical stage (i.e. while flying, given a state en route). We provide a series of experiments, using demonstrated trajectories from Barcelona to Madrid, showing the effectiveness of our approach.
The major contributions of this paper are as follows:

It formalizes the trajectory prediction problem as an imitation learning process, given a set of historical trajectories provided as “expert” demonstrations, thus considering an aggregation of the individual stakeholders’ policies “shaping” these trajectories.

It introduces a framework that is able to detect different classes (patterns) of trajectories, learns models to identify the most likely class to which a future trajectory belongs by exploiting forecast contextual features (weather conditions), and applies stochastic policies to predict trajectories, subject to their membership in a specific class.

It proposes the straightforward use of state-of-the-art deep imitation learning methods, which are able to learn trajectory models without making any assumption on the form of a cost function, in continuous state-action spaces, with no specific requirements on specifying trajectory constraints, and with minimal data preprocessing requirements.

It provides experimental results that, although they concern a specific origin-destination airport pair, show the long-horizon prediction abilities of the method at both the pre-tactical and the tactical stage of operations.
The structure of the paper is as follows: Section 2 defines raw and enriched trajectories, specifies the trajectory prediction problem in its most generic form, and provides background knowledge on imitation learning. Section 3 formalizes the data-driven aircraft trajectory prediction problem as an imitation learning process, Section 4 presents the overall prediction framework with emphasis on the Generative Adversarial Imitation Learning method used, and Section 5 presents experimental results. Finally, Section 6 presents related work and discusses the effectiveness of the proposed method w.r.t. state-of-the-art trajectory prediction methods in the aviation domain. Section 7 concludes the paper.
2 Background Knowledge
2.1 Aircraft Trajectory Prediction
In the aviation domain, a trajectory is defined as the description of movement of an aircraft both in the air and on the ground (https://ext.eurocontrol.int/lexicon/index.php/Trajectory). This description can be provided by a chronologically ordered sequence of aircraft states described by a list of state variables. The most relevant state variables are airspeeds, 3D position (determined by latitude ($\varphi$), longitude ($\lambda$) and geodetic altitude ($h$)), the bearing ($\chi$) or heading ($\psi$), and the instantaneous aircraft mass ($m$). Trajectories providing only spatiotemporal information at each state (i.e. 3D positions and timestamps) can be detected by exploiting surveillance data and are called raw trajectories.
More formally, a raw trajectory $T$ of an aircraft is defined to be a sequence of pairs $T = \langle (p_i, t_i) \rangle$, $i = 0, \dots, |T| - 1$, where $p_i$ is a point ($\varphi_i$, $\lambda_i$, $h_i$) in the 3D space (position) and $t_i$ is a timestamp. In this case a trajectory state is represented by a 4D point, and the length of a trajectory is equal to the number of states $|T|$.
Following a data-driven approach, we aim to exploit historical 4D aircraft trajectories whose states include the 3D aircraft position with timestamps, in conjunction with other contextual features that may provide useful information in the prediction process, such as weather conditions at each state, traffic, special events occurring etc. Adding variables to a trajectory state results in a trajectory with enriched points or enriched states, and thus in an enriched trajectory:
An enriched trajectory state or enriched trajectory point, corresponding to a raw trajectory point $(p_i, t_i)$, is defined to be a triplet $s_i = (p_i, t_i, v_i)$, where $v_i$ is a vector consisting of categorical and/or numerical variables annotating the raw trajectory state. An enriched trajectory $T_e$ is defined to be a sequence of enriched states $T_e = \langle s_i \rangle$, $i = 0, \dots, |T_e| - 1$.
In the aviation domain, towards implementing the Trajectory Based Operations paradigm, predictability of trajectories is of immense importance. Indeed, uncertainties occurring during a flight have an impact on multiple stakeholders, including airspace users (i.e. airlines), the air navigation service providers (ANSPs) providing services for Air Traffic Management, the air traffic controllers, as well as ground operators and, of course, passengers. Confronting uncertainties and adapting to them is costly for all. For instance, these may require assigning delays to flights, or choosing alternative routes to those planned for a flight, resulting in more fuel consumption, more workload for air traffic controllers (challenging the capacity of the ATM system), and cascading effects on the whole ATM system.
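Returning to the definitions of raw and enriched trajectories given above, the following is a minimal sketch of how such states could be represented in code. The class and function names (`RawState`, `EnrichedState`, `enrich`) and the sample values are hypothetical, and the feature vector is restricted to numerical variables for simplicity, although the definition also allows categorical ones.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RawState:
    """A raw trajectory state: a 3D position with its timestamp (a 4D point)."""
    lat: float  # latitude
    lon: float  # longitude
    alt: float  # geodetic altitude
    t: float    # timestamp


@dataclass(frozen=True)
class EnrichedState:
    """A raw state annotated with a vector of contextual variables."""
    raw: RawState
    features: Tuple[float, ...]


def enrich(raw_trajectory: Sequence[RawState],
           features: Sequence[Sequence[float]]) -> List[EnrichedState]:
    """Pair every raw state with its feature vector, keeping chronological order."""
    if len(raw_trajectory) != len(features):
        raise ValueError("one feature vector is required per raw state")
    return [EnrichedState(r, tuple(f)) for r, f in zip(raw_trajectory, features)]


# Illustrative values only: two states, each annotated with two features.
raw = [RawState(41.3, 2.1, 0.0, 0.0), RawState(41.2, 1.9, 900.0, 5.0)]
enriched = enrich(raw, [[1013.0, 4.5], [1009.0, 6.0]])
trajectory_length = len(enriched)  # length = number of states
```

Keeping the raw state as a separate, immutable component makes it straightforward to recover the raw trajectory from an enriched one.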
A predicted trajectory can be defined as the future evolution of the aircraft state as a function of (a) the current flight conditions (e.g. an initial state with actual weather conditions), (b) a forecast of contextual features (e.g. forecast weather conditions at specific positions) and (c) a description of how the aircraft is to transit among subsequent states from an initial state onwards, i.e. a “policy” on how the trajectory evolves.
Casting trajectory prediction as a data-driven problem, and assuming a set $T_E$ of historical, demonstrated enriched trajectories, the trajectory prediction problem can be defined as follows: Given $T_E$ and a “cost” function $c(s, a)$ (in the context of data-driven methods, the cost function denotes a penalty for low adherence to demonstrated data), the objective is to predict a trajectory $T^{*}$ such that
$$T^{*} = \arg\min_{T} \; \mathbb{E}_{\pi}[c(s, a)] \tag{1}$$
where $\mathbb{E}_{\pi}[c(s, a)]$ denotes the expected cumulative cost for all states generated along the trajectory by following a policy $\pi(a \mid s)$ prescribing the probability of applying an action $a$ at an enriched state $s$ (states and actions are discussed in a subsequent paragraph). According to equation (1), the ultimate objective is to find the policy $\pi$ that determines the generation of a minimal-expected-cumulative-cost predicted trajectory $T^{*}$.
The cost function may take several forms depending on how the problem is approached. For instance, considering specific trajectories (e.g. flight plans, or cluster medoids) as constraints to which the predicted trajectory must adhere (e.g. as in [12]), and measuring the adherence of predictions to these constraints, the cost function may take the form of a distance function between these trajectories and predicted trajectories. Other constraints may also be incorporated into the process, such as those depending on the aircraft type, allowable or desired states, origin and destination airports, etc. Generally, in a data-driven trajectory prediction process, the cost function shows the adherence of predictions to patterns, constraints and policies regarding historical cases. We delve into this issue further while formulating the trajectory prediction problem as an imitation process, in Section 3.
A final note on equation (1) concerns the trajectory states, additional features and actions considered. The formulation indicates separately the 4D position information with timestamps and the other variables enriching states. Indeed, additional features may be considered in the cost function, such as weather variables, traffic, airspaces crossed, etc. Also, different prediction processes may have different prediction objectives: for instance, one may predict the aircraft position at specific time instances, or predict the time instance at which a specific position will be reached, or the position together with the corresponding timestamp, or even predict some of the contextual features, such as the airspaces crossed at specific time instances, or traffic. What we aim to predict in this work is the 3D aircraft position at specific time instances. Actions executed at each state determine how the trajectory evolves towards the next state (e.g. by means of a change in speed, a change in aircraft direction (bearing), other detailed aircraft intent instructions etc.). Actions may also vary between different approaches. Let $A$ be the set of actions assumed.
2.2 Imitation Learning
Making assumptions on, or handcrafting, the cost function is crucial to the prediction process, as flown trajectories are shaped by several stakeholders, each with its own preferences, strategies and concerns. Thus, we are motivated to apply an imitation learning approach towards learning a policy that models the way stakeholders shape the evolution of trajectories, considering historical trajectories as experts’ demonstrations.
Imitation learning studies the problem of learning to perform a specific task in a setting, where the learner has access to expert demonstrations, but cannot query the expert for more samples while being trained, and has no access to a cost signal. There are two fundamental approaches to imitation learning: Behavioral Cloning (BC) and Inverse Reinforcement Learning (IRL).
Behavioral Cloning [26] addresses the imitation learning problem as a supervised learning problem over the state-action pairs of the expert demonstrations. It solves a regression problem, minimizing the error between the demonstrated actions and the policy actions over the states of the historical trajectories. This technique suffers from compounding errors and a regret bound that grows quadratically in the time horizon of the task, leading to poor performance [28, 29].
Inverse Reinforcement Learning [3, 35, 11, 10], on the other hand, aims at deriving a cost function that assigns minimal cost to trajectories demonstrated by experts and maximal cost to trajectories generated by other policies. Given that many policies may demonstrate the same trajectories, the maximum entropy inverse reinforcement learning approach aims to find the maximum entropy policy [35]. This process comprises two steps. The first one outputs a desired cost function $\tilde{c}$ according to the following formula,
$$\tilde{c} = \arg\max_{c} \; -\psi(c) + \Big( \min_{\pi \in \Pi} -\lambda H(\pi) + \mathbb{E}_{\pi}[c(s, a)] \Big) - \mathbb{E}_{\pi_E}[c(s, a)] \tag{2}$$
where $\psi$ is a convex cost function regularizer, $\pi_E$ is the expert policy (provided by the demonstrated trajectories), $\Pi$ is the set of all policies, $H$ is the entropy function and $\lambda$ its weighting parameter. The second step is to find a policy that minimizes the expected cumulative cost and maximizes the entropy by using the cost function $\tilde{c}$ in a standard reinforcement learning problem, very close to the one specified by equation (1):
$$\pi^{*} = \arg\min_{\pi \in \Pi} \; -\lambda H(\pi) + \mathbb{E}_{\pi}[\tilde{c}(s, a)] \tag{3}$$
Specific instances of this process result in apprenticeship learning methods, e.g. the one described in [3], which assume that the cost function is given by a linear combination of basis functions, resulting in feature vectors over states and actions. The linearity assumption is restrictive for complex problems, such as the trajectory prediction problem in the ATM domain. In addition, the handcrafted state features are a big engineering burden. Finally, this method is computationally expensive, as it runs a Reinforcement Learning algorithm at every cost function update to find a policy that performs optimally w.r.t. the learnt cost function.
To address linearity limitations and handcrafted state features, the Guided Cost Learning approach proposed in [11] uses neural networks to represent the cost function. It also provides a more computationally efficient approach, by applying a single gradient step for each new update of the cost function, instead of fully optimizing the learned policy in regard to every update of the cost function. It has been demonstrated in [10] that Guided Cost Learning is equivalent to a specific instantiation of the Generative Adversarial Imitation Learning (GAIL) framework [16].
GAIL [16] can imitate complex behavior, as it does not apply restrictive assumptions on the cost function and scales to large, continuous state-action spaces. GAIL directly learns the optimal policy from expert demonstrations, quite efficiently, since it does not need to derive a cost function that will be used by a Reinforcement Learning method to derive a policy. Instead, it aims to bring the distribution of the state-action pairs of the imitator as close as possible to that of the expert. GAIL uses an architecture similar to Generative Adversarial Networks to optimize the following objective:
$$\min_{\pi} \max_{D} \; \mathbb{E}_{\pi}[\log D(s, a)] + \mathbb{E}_{\pi_E}[\log(1 - D(s, a))] - \lambda H(\pi) \tag{4}$$
where $\pi$ is the imitator policy, $\pi_E$ is the expert’s policy, $D$ is a binary classifier called discriminator, which distinguishes state-action pairs generated from $\pi$ and $\pi_E$, and $H(\pi)$ is the discounted causal entropy of the policy $\pi$. As shown in [16], equation (4) provides a way to solve the two steps of the imitation process described by equations (2) and (3).
To predict aircraft trajectories via imitation learning, we use the GAIL framework. The GAIL implementation addressing the specific problem is described in Section 4.3, after the specification of the trajectory prediction problem in the next section.
3 Problem Specification
Given the abstract specification of the data-driven trajectory prediction problem in Section 2.1, and the formulation of the trajectory imitation learning problem provided in Section 2.2, we can provide a formulation of the problem we address here: the data-driven aircraft trajectory prediction problem as an imitation learning task.
Let us assume a set $T_E$ of historical, enriched aircraft trajectories generated by an expert policy $\pi_E$. These trajectories have various numbers of states, and therefore various lengths $|T_e|$. The objective is to find a policy $\pi$ that minimizes the difference between the expected cumulative cost of the predicted trajectories and of the trajectories in $T_E$, given an approximation of the cost function that penalizes any state-action pair generated by any policy in $\Pi$. As shown in [16], this objective is equivalent to finding a policy that brings the distribution of the state-action pairs generated by it as close as possible to the distribution of the state-action pairs demonstrated by trajectories in $T_E$.
As pointed out in Section 2.1, in this work we aim to predict the 3D aircraft position at specific time instants, given an initial time instant $t_0$. Specifically, we aim at determining the evolution of the trajectory in space every $\Delta t$ seconds, i.e. at time instances $t_0 + k\,\Delta t$, $k = 1, 2, \dots$, given the position of the aircraft at time instance $t_0$.
A crucial decision concerns the set of actions considered, which should adequately and unambiguously (although in a non-deterministic way) specify the evolution of the historical as well as of the predicted trajectories. In our approach, and very close to the Generative Adversarial Imitation from Observation approach described in [33], we are motivated to focus on states and on their evolution, rather than on the actual actions that may determine this state evolution. This approach is also motivated by the following considerations: (a) expert trajectories do not specify in any way the actions applied in any state, and thus these have to be determined under specific assumptions that may bring noise into the learning process, (b) there are several possible instruction combinations for evolving the aircraft state, at different levels of detail, which result in a high-dimensional state-action space and which require considering constraints between instruction combinations, (c) what we actually aim to predict is the evolution of aircraft states in the 4D space (i.e. regarding position and time), and (d) the imitation learning approach that we take aims to bring the distribution of state-action pairs of the imitator close to the corresponding distribution of the expert.
Therefore, we consider that the set $A$ contains all the possible triples ($\Delta\varphi$, $\Delta\lambda$, $\Delta h$) that specify the difference between states’ position information in 3D, given the constraint that this difference must be feasible within the constant period $\Delta t$ considered. Indeed, these actions can be determined from the demonstrated trajectories unambiguously and very efficiently, although in low-quality surveillance data space-time constraints concerning the evolution of aircraft states may be violated. This action set has three additional important effects: (a) We can tune the resolution of the predicted trajectory by changing $\Delta t$. (b) Given a specific $\Delta t$ (e.g. 5 seconds), and the evolution of the trajectory until reaching the destination airport, we can determine the estimated time of arrival (ETA), which is simply $t_0 + (|T| - 1)\,\Delta t$, given the predicted trajectory $T$. (c) The transition between positions is deterministic given an action: given position ($\varphi$, $\lambda$, $h$) and an action ($\Delta\varphi$, $\Delta\lambda$, $\Delta h$), the position in the next state is ($\varphi + \Delta\varphi$, $\lambda + \Delta\lambda$, $h + \Delta h$).
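The action set described above can be illustrated with a short sketch. It derives actions as differences between consecutive 3D positions, replays them with the deterministic transition, and computes an ETA. All function names are hypothetical, and the ETA expression assumes that states are sampled every `dt` seconds starting at `t0`, so that the last of `num_states` states occurs `(num_states - 1) * dt` seconds after `t0`.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]  # (latitude, longitude, altitude)
Action = Tuple[float, float, float]    # (d_latitude, d_longitude, d_altitude)


def actions_from_positions(positions: Sequence[Position]) -> List[Action]:
    """Derive one action per transition as the difference of consecutive positions."""
    return [
        (b[0] - a[0], b[1] - a[1], b[2] - a[2])
        for a, b in zip(positions, positions[1:])
    ]


def apply_action(position: Position, action: Action) -> Position:
    """Deterministic transition: the next position is the position plus the action."""
    return (position[0] + action[0],
            position[1] + action[1],
            position[2] + action[2])


def rollout(start: Position, actions: Sequence[Action]) -> List[Position]:
    """Replay a sequence of actions from a start position."""
    trajectory = [start]
    for action in actions:
        trajectory.append(apply_action(trajectory[-1], action))
    return trajectory


def eta(t0: float, num_states: int, dt: float) -> float:
    """Time of the last state, assuming states are dt seconds apart from t0."""
    return t0 + (num_states - 1) * dt


# Illustrative positions only.
demo = [(41.0, 2.0, 0.0), (41.5, 1.5, 1000.0), (42.0, 1.0, 2000.0)]
demo_actions = actions_from_positions(demo)
replayed = rollout(demo[0], demo_actions)
arrival = eta(t0=0.0, num_states=len(replayed), dt=5.0)
```

Because the transition is a plain addition, a demonstrated trajectory can be reproduced exactly from its first position and its derived actions, up to floating-point rounding.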
Given the above, the data-driven aircraft trajectory prediction problem as an imitation learning task is specified as follows:
Given a set $T_E$ of historical, enriched aircraft trajectories, and a time step $\Delta t$, we need to determine a policy $\pi$ which optimizes the objective specified by equation (4). This policy, given the initial state $s_0$ of the aircraft, determines the evolution of the trajectory at any time instant $t_0 + k\,\Delta t$, $k = 1, 2, \dots$. Specifically, it determines $\pi(a \mid s)$, i.e. the evolution of the aircraft position at state $s$ after $\Delta t$ seconds.
4 Trajectory Prediction Framework
This section motivates and provides a description of the overall prediction method, and presents details on the constituent steps.
Generally, given a set of trajectories between any pair of airports, one may detect different patterns of behavior, which may be due to different contextual factors that affect stakeholders’ decision making on the evolution of the trajectory. The choice of the runway when approaching the destination airport, for instance, may be due to airport weather conditions, traffic, or airline preferences, and may result in significantly different trajectories. What we need to do towards automating the data-driven trajectory prediction process is to detect distinct patterns of trajectories, identifying also the features that distinguish between them. Then, we can learn a distinct policy per class of trajectories, i.e. for those trajectories following a specific pattern of behavior. This can make the learning process much more efficient and effective than training a single model on all possible trajectories. However, to predict a single trajectory we need to know which policy to apply, and thus the mode of behavior it will most probably follow. One solution is to forecast the contextual features that may impact the evolution of the trajectory and determine the class of the future trajectory using these features. This classification step is thus restricted to those features that do distinguish between different modes of behavior and that can be forecast or known at any stage (tactical or pre-tactical) of ATM operations.
Thus, the trajectory prediction approach that we propose incorporates a trajectory clustering step, a classification step, and finally a trajectory imitation step, solving the following refinement of the above-specified data-driven trajectory prediction problem:
Given (a) the set $T_E$ of demonstrated trajectories, (b) a time step $\Delta t$ and $K$ distinct classes $C_1, \dots, C_K$ of $T_E$, and (c) a set $S_F$ of states $(p_l, t_l, v_l)$ at specific “landmark” (fixed) positions enriched with forecast contextual features, where $t_l$ denotes the forecast time instance at which the trajectory will reach the position $p_l$ ($\varphi_l$, $\lambda_l$, $h_l$), and $v_l$ denotes a vector of forecast contextual features at that point at time $t_l$, we aim to determine (i) the specific class $C^{*}$ of expert trajectories to which the future trajectory crossing the points in $S_F$ most probably belongs, and (ii) a policy $\pi$ which optimizes the objective specified by equation (4) given the demonstrated trajectories in the determined class $C^{*}$, and which specifies the evolution of the trajectory at any time instant $t_0 + k\,\Delta t$, $k = 1, 2, \dots$, given the initial state $s_0$ of the aircraft.
The subsequent subsections describe the methods used for each of the different steps.
4.1 Trajectory Clustering
First we aim to divide $T_E$ into a set of clusters $C = \{C_1, \dots, C_K\}$, in such a way that trajectories belonging to the same cluster represent a pattern of behavior that is more similar compared to the behavior of trajectories outside this group [6]. Generally, it holds that $\bigcup_{k} C_k \subseteq T_E$, considering that outliers may not be assigned to any cluster, and $C_i \cap C_j = \emptyset$ for $i \neq j$.
Towards this goal we apply agglomerative clustering, a bottom-up hierarchical strategy that initially treats each trajectory as an individual cluster (singleton) and iteratively merges similar clusters, stopping when only $K$ clusters remain. To merge two clusters we use the average distance among all their member trajectories, using the Ward merging criterion [18].
The distance measure we use is the normalized Dynamic Time Warping (DTW) measure [25], which in our case is applied to (a) find optimal matches between two trajectories, and (b) measure the trajectories’ similarity without considering their variable lengths and the variable time distances between subsequent points. In our case the following DTW formulation has been applied:
(5) 
where is the number of dimensions in the multivariate data observed. The denominator can be seen as the largest DTW distance between two trajectories, thus bounding nDTW in [0,1].
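To make the role of DTW concrete, the following is a minimal sketch of the classic, unnormalized DTW distance between two multivariate sequences of possibly different lengths. It is not the normalized formulation of equation (5): the normalization is not reproduced here, and the function name `dtw_distance` and the use of the Euclidean distance between points are assumptions made for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic dynamic-programming DTW with Euclidean point-to-point cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Two 2D sequences describing the same path, one with a repeated point.
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
path_with_repeat = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
same_shape = dtw_distance(path, path_with_repeat)
```

The example shows why DTW suits trajectories sampled at variable rates: a repeated point changes the length of the sequence but not the distance.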
Considering that the appropriate number of clusters is unknown, the problem of determining $K$ can be transferred to a silhouette coefficient maximization problem [4]. The computation of the silhouette coefficient needs only pairwise distances, so the calculation of clusters’ centroids is avoided.
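The point that only pairwise distances are needed can be seen in a small sketch of the mean silhouette coefficient computed from a precomputed distance matrix. The function name `mean_silhouette` and the toy data are hypothetical; singleton clusters are given a silhouette of 0, which is a common convention and an assumption here.

```python
from typing import Dict, List, Sequence


def mean_silhouette(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette over all items, using only pairwise distances."""
    clusters: Dict[int, List[int]] = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: silhouette taken as 0
        # a: mean distance to the other members of the own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        if max(a, b) > 0.0:
            total += (b - a) / max(a, b)
    return total / len(labels)


# Toy example: four items on a line at 0, 1, 10 and 11.
coords = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(x - y) for y in coords] for x in coords]
good = mean_silhouette(dist, [0, 0, 1, 1])
bad = mean_silhouette(dist, [0, 1, 0, 1])
```

In a model selection loop, one would compute this value for the labeling produced at each candidate number of clusters and keep the one with the highest score.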
4.2 Future Trajectory Classification
The future trajectory classification problem we consider in our case is as follows: Given a set of clusters $C = \{C_1, \dots, C_K\}$ and the set $S_F$ of forecast enriched states at specific “landmark” positions that the future trajectory will cross, we aim to determine the class of demonstrated trajectories to which the trajectory that will cross the enriched points in $S_F$ most probably belongs.
These fixed positions may be waypoints (fixes) declared in a planned trajectory (e.g. the flight plan), although in this article we consider a single point that the trajectory will certainly cross: the destination airport. Specifically, we consider the singleton $S_F = \{(p_d, t_d, v_d)\}$, with an enriched state corresponding to the destination airport, reached at $t_d$, which is equal to the estimated time of arrival. The vector $v_d$ comprises the destination airport’s forecast meteorological variables (specified in Section 5), although more features can be incorporated (e.g. traffic conditions).
The classifier used in the pipeline is the random forest classification algorithm [7], which is trained with the enriched trajectories in $T_E$ assigned to the specific clusters identified, and is called to predict to which cluster the future trajectory crossing the forecast landmark states $S_F$ most probably belongs. It must be noted that each training trajectory is enriched with all the variables corresponding to those in $v_d$, assigned the real (not forecast) values at the time of flight arrival at $p_d$.
4.3 GAIL: Learning to imitate trajectories
As specified above, we use GAIL for imitation learning. GAIL is trained per cluster, thus yielding a policy per cluster. In short, GAIL employs a generative trajectory model $G$ that models $\pi$ and a discriminative classifier $D$ that distinguishes between the distribution of data (i.e. state-action pairs) generated by the policy $\pi$ and the demonstrated data. Both $\pi$ and $D$ are represented by function approximators, with weights $\theta$ and $w$, respectively. Following the implementation described in [16], GAIL alternates between an Adam [19] gradient step on $w$ to increase equation (4) with respect to $D$, and a step on $\theta$ using the Trust Region Policy Optimization (TRPO) algorithm [30] to decrease equation (4) with respect to $\pi$. TRPO optimizes the following objective:
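$$\operatorname*{minimize}_{\theta} \; \mathbb{E}_{s \sim \rho_{\theta_{old}},\, a \sim q} \left[ \frac{\pi_{\theta}(a \mid s)}{q(a \mid s)} Q_{\theta_{old}}(s, a) \right] \quad \text{subject to} \quad \mathbb{E}_{s \sim \rho_{\theta_{old}}} \left[ D_{KL}\big(\pi_{\theta_{old}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s)\big) \right] \le \delta \tag{6}$$
where $\rho_{\theta_{old}}$ is the distribution of states generated using the prior-to-update (old) policy $\pi_{\theta_{old}}$, $q$ is an action sampling distribution that we consider equal to $\pi_{\theta_{old}}$, $\pi_{\theta}$ is the updated policy with parameters $\theta$, $Q_{\theta_{old}}$ is the state-action value function of the old policy and $\delta$ is a constant that constrains the KL divergence between $\pi_{\theta_{old}}$ and $\pi_{\theta}$, preventing the policy from changing too much due to noise in the policy gradient.
We approximately solve the TRPO optimization problem as described in [30], Appendix C, using the conjugate gradient method and a line search. In our setting we set $\lambda = 0$, so we omit $H(\pi)$ from equation (4), following the practice demonstrated in [16].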
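A subtle point in our implementation is that, instead of approximating $Q_{\theta_{old}}$, we utilize a separate critic model to approximate the state advantage, defined as $A(s, a) = Q(s, a) - V(s)$, aiming to lower the gradient variance. We follow the Generalized Advantage Estimation (GAE) approach introduced in [31], which provides a balance between low variance and a small amount of introduced bias. Formally, we estimate the advantage from the sampled state-action pairs as follows:
$$\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^{l}\, \delta_{t+l} \tag{7}$$
where $\gamma$ is the discounting factor, $\lambda$ a hyperparameter of the estimator (distinct from the entropy weight in equation (4)) and
$$\delta_t = c(s_t, a_t) + \gamma V(s_{t+1}) - V(s_t) \tag{8}$$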
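The GAE recursion of [31] can be sketched in a few lines. The version below truncates the sum at the end of a sampled episode and uses a per-step cost signal in the temporal-difference residual, in line with the cost-based formulation of this paper. The function name `gae_advantages` and the argument layout (one more value estimate than costs, for bootstrapping) are assumptions.

```python
from typing import List, Sequence


def gae_advantages(costs: Sequence[float], values: Sequence[float],
                   gamma: float, lam: float) -> List[float]:
    """Generalized Advantage Estimation over one finite episode.

    costs[t] is the per-step signal at step t; values[t] is the critic's
    estimate for state t, with one extra entry for the state after the
    last step.
    """
    if len(values) != len(costs) + 1:
        raise ValueError("values must have exactly one more entry than costs")
    advantages = [0.0] * len(costs)
    running = 0.0
    # Backward pass: A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(len(costs))):
        delta = costs[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# With a zero critic, lam = 1 gives discounted cost-to-go and lam = 0 the one-step cost.
full = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=1.0)
one_step = gae_advantages([1.0, 1.0], [0.0, 0.0, 0.0], gamma=0.5, lam=0.0)
```

The two extreme settings illustrate the trade-off mentioned above: values of the hyperparameter close to 1 rely on sampled returns (low bias, higher variance), while values close to 0 rely on the critic (lower variance, more bias).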
Algorithm 1 shows the aforementioned procedure in more detail. Specifically, we pre-train $\pi_{\theta}$ using Behavioral Cloning. Then, at each GAIL iteration, the algorithm samples from the initial state distribution and generates rollout trajectories. It uses the generated state-action samples and the samples of the historical trajectories to update the parameters $w$. $D_w$ is updated with a cross-entropy loss that pushes its output for the demonstrated state-action samples closer to 0 and for the generated state-action samples closer to 1. Next, the imitation algorithm takes a policy step using the TRPO [30] update rule and $D_w$ as the cost function approximation to update $\theta$. It must be noted that the parameter in the notation of the approximation of the state advantage in Algorithm 1 is left implicit, for simplicity of presentation.
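The discriminator update described above can be illustrated with a sketch of the cross-entropy loss under the stated label convention (demonstrated samples towards 0, generated samples towards 1). The function name `discriminator_loss` and the clipping constant are assumptions. The function only evaluates the loss on given discriminator outputs and does not perform the gradient step.

```python
import math
from typing import Sequence


def discriminator_loss(d_generated: Sequence[float], d_expert: Sequence[float],
                       eps: float = 1e-12) -> float:
    """Cross-entropy loss with target 1 for generated and 0 for expert samples.

    Inputs are discriminator outputs in [0, 1] for each state-action sample.
    Minimizing this loss is equivalent to increasing the two expectation
    terms of the GAIL objective with respect to the discriminator.
    """
    def clip(p: float) -> float:
        # Keep probabilities away from 0 and 1 so that log() stays finite.
        return min(max(p, eps), 1.0 - eps)

    generated_term = sum(math.log(clip(p)) for p in d_generated) / len(d_generated)
    expert_term = sum(math.log(1.0 - clip(p)) for p in d_expert) / len(d_expert)
    return -(generated_term + expert_term)


# An uninformed discriminator outputs 0.5 everywhere; a perfect one separates the two sets.
uninformed = discriminator_loss([0.5, 0.5], [0.5, 0.5])
perfect = discriminator_loss([1.0], [0.0])
```

As the imitator improves, generated and demonstrated samples become harder to tell apart and the loss moves towards the uninformed value.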
5 Experimental Evaluation
5.1 Experimental Setting
Datasets exploited in our experiments include radar tracks (surveillance data representing raw trajectories) for flights from Barcelona to Madrid from 1 to 24 April 2016, weather data obtained from the National Oceanic and Atmospheric Administration (NOAA), and weather reports from airports (METAR). The aim is to predict trajectories for this origin-destination pair.
Given these datasets, demonstrated trajectories are enriched with eleven (11) numerical variables corresponding to (a) 6 meteorological features at the corresponding 3D state position and time, provided by NOAA, and (b) 5 features specifying actual weather conditions at the arrival airport at the time of arrival, provided by METAR. The NOAA features are pressure surface, relative humidity isobaric, temperature isobaric, wind speed gust surface, u-component of wind isobaric, and v-component of wind isobaric. Features from METAR include wind direction, wind speed in knots, pressure altimeter in inches, visibility in miles, and wind gust in knots.
The set of raw trajectories has been preprocessed and cleaned. The preprocessing stage interpolates points in trajectories, so that two subsequent points have a temporal distance of $\Delta t$ seconds. This task calculates the average velocity of the aircraft between subsequent points in the original trajectory. Assuming a constant velocity between these points, we can calculate the position of the aircraft every $\Delta t$ seconds, and we finally reconstruct the trajectory keeping only the points occurring every $\Delta t$ seconds along the original trajectory. This is important in order to exclude the temporal dimension from actions. The cleaning task aims to detect incomplete trajectories starting or finishing away from any of the airports, as well as flights showing inconsistent behavior (e.g. covering a significant distance within an unreasonably small amount of time), due to imperfections in the raw data.
The resulting set of 528 trajectories from Barcelona to Madrid has been randomly divided into a training set of 478 trajectories and a test set of 50 trajectories. However, the clustering algorithm clusters all 528 trajectories. Doing so, we are able to measure the accuracy of the trajectory classification algorithm in conjunction with the accuracy of the trajectory imitation process.
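The interpolation step of the preprocessing stage described above can be sketched as follows. The sketch resamples a trajectory at a fixed period by linear interpolation between consecutive original points, which corresponds to assuming a constant velocity between them. The function name `resample`, the tuple layout `(t, lat, lon, alt)` and the requirement of strictly increasing timestamps are assumptions, and interpolating latitude and longitude linearly is a simplification.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float, float]  # (t, lat, lon, alt)


def resample(points: Sequence[Point], dt: float) -> List[Point]:
    """Return points every dt seconds, starting at the first timestamp.

    Positions are interpolated linearly between the two original points
    that bracket each target time (constant velocity on each segment).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if len(points) < 2:
        return list(points)
    t_start, t_end = points[0][0], points[-1][0]
    resampled: List[Point] = []
    segment = 0
    k = 0
    while t_start + k * dt <= t_end:
        t = t_start + k * dt
        # Advance to the segment whose end time is not before t.
        while segment < len(points) - 2 and points[segment + 1][0] < t:
            segment += 1
        (ta, *pa), (tb, *pb) = points[segment], points[segment + 1]
        w = (t - ta) / (tb - ta)  # fraction of the segment travelled at time t
        lat, lon, alt = (a + w * (b - a) for a, b in zip(pa, pb))
        resampled.append((t, lat, lon, alt))
        k += 1
    return resampled


# Illustrative track with irregular timestamps, resampled every 5 seconds.
track = [(0.0, 0.0, 0.0, 0.0), (2.0, 2.0, 8.0, 200.0), (10.0, 10.0, 8.0, 1000.0)]
regular = resample(track, dt=5.0)
```

After resampling, consecutive states are a fixed period apart, so an action only needs to encode the change in position.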
The clustering process resulted in two clusters with 250 and 278 trajectories respectively, taking into account all the features in the enriched trajectories. Each cluster shows a different pattern of approaching the Madrid airport, as depicted in Figure 1. Then, considering only the training trajectories (i.e. after excluding the 50 test trajectories), GAIL was trained for each of the two clusters, providing two policies corresponding to the distinct behavioral modes. Subsequently, we also provide results with a single policy, obtained by training GAIL on the whole training set without considering clusters, thus avoiding the future trajectory classification task.
During testing, in order to determine the policy to be used for prediction, we followed the classification approach described in Section 4.2, considering a fixed point with an enriched state corresponding to the destination airport, whose time is estimated from the average duration of the Madrid-Barcelona flights in the training set. The feature vector comprises the destination airport’s forecast meteorological variables, corresponding to the five METAR variables mentioned above. The classifier was trained using 5-fold cross validation, repeated for all combinations of hyperparameters. This process resulted in a classification method that samples trajectories with replacement and uses 20 trees of maximum depth 20, with leaf nodes comprising at least one trajectory and with at least two trajectories required in a node before splitting it.
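The hyperparameter selection described above (5-fold cross validation repeated for every combination of hyperparameters) can be sketched as follows. This is only an illustration in plain Python: the grid values, the parameter names `n_trees` and `max_depth`, and the `score_fn` callback (which would train a classifier on the training indices and return its accuracy on the validation indices) are assumptions, not details reported in this paper. The stand-in scorer used here is a dummy whose optimum was chosen to coincide with the reported setting, so that the search logic can be exercised without a real classifier.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search(
    n_samples: int,
    grid: Dict[str, Sequence[int]],
    score_fn: Callable[[Dict[str, int], List[int], List[int]], float],
    k: int = 5,
) -> Tuple[float, Dict[str, int]]:
    """Return (best mean validation score, best parameters) over the grid."""
    folds = k_fold_indices(n_samples, k)
    best_score, best_params = float("-inf"), {}
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for i, val_idx in enumerate(folds):
            # All folds except the i-th one form the training split.
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            scores.append(score_fn(params, train_idx, val_idx))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_score, best_params = mean_score, params
    return best_score, best_params


# Hypothetical grid; only the selected values (20 trees, depth 20) are reported.
grid = {"n_trees": [10, 20, 50], "max_depth": [10, 20]}


def dummy_score(params: Dict[str, int], train: List[int], val: List[int]) -> float:
    # Stand-in for "train a classifier and return validation accuracy".
    return -abs(params["n_trees"] - 20) - abs(params["max_depth"] - 20)


best_score, best_params = grid_search(478, grid, dummy_score)
```

In practice a library routine for grid search with cross validation would typically replace this hand-written loop.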
To implement the generative model and the discriminator in GAIL we have used two neural networks, each consisting of two dense layers of 100 nodes, each layer with an activation function. The input of the policy (generator) network corresponds to the four 3D position and temporal variables per state, together with the six meteorological features provided by NOAA. The discriminator network takes as additional input the three action variables. The policy network has a dense output layer with size equal to the number of action variables (i.e. three), while the output layer of the discriminator has one node.
The policy network outputs, for each action variable, the mean of a Gaussian distribution with logarithm of standard deviation equal to 0.9, resulting in a stochastic policy. To initialize the policy’s parameters, we use Behavioral Cloning, minimizing the Mean Square Error between the demonstrated actions and the policy actions over the training set, using Adam optimization. This has been trained with 100 epochs and 10-fold cross validation.
GAIL is trained for 1500 batches. At each round the policy generates a batch of 50000 state-action samples. The number of episodes needed to acquire this number of samples is not constant. At each episode the method randomly selects a starting point from a trajectory in the training set and uses the current policy to generate rollouts. Rollouts terminate either when a trajectory point lies within a 5 km radius from the destination airport, or when the trajectory has 1000 points, or when it lies outside the bounding box defined by the geographic (lon, lat) coordinates corresponding to the red dots in the corners of Figure 1. These 50000 samples are used for training the discriminator. Specifically, we use Adam optimization and 100 epochs to maximize equation (4) w.r.t. the discriminator’s parameters.
To evaluate the proposed approach we provide results regarding the prediction of Barcelona-Madrid trajectories in the following experimental settings: (a) using the prediction pipeline and two policies modelling the behavioral modes shown by the two clusters identified (denoted as the “MultPolicies” approach), and (b) using one policy modelling the behavior of all flights (denoted as the “OnePolicy” approach). The OnePolicy approach uses a neural network whose input is extended to include the five METAR variables used in the classification stage of the MultPolicies approach, evaluating its ability to distinguish between the detected behavioral modes. For each of the settings, and in order to show the prediction abilities of the proposed method, we evaluate the prediction accuracy in 5 cases, each taking as initial state a state some minutes after the first state of the trajectory. The results show the capacity of the method both at the pre-tactical stage (i.e. when predicting from the first state) and at any state during the tactical stage of operations.
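The three rollout termination conditions described above (proximity to the destination, maximum length, leaving the bounding box) can be sketched as a single check. The destination coordinates below are an approximate position for Madrid airport and the bounding box is a made-up placeholder, since the actual corner coordinates are only shown in Figure 1; the function names and the use of the great-circle (haversine) distance on the horizontal plane are likewise assumptions.

```python
import math
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rollout_done(
    points: List[Tuple[float, float]],
    dest: Tuple[float, float],
    bbox: Tuple[float, float, float, float],
    radius_km: float = 5.0,
    max_points: int = 1000,
) -> Optional[str]:
    """Return the reason a rollout should stop, or None to continue.

    points: (lon, lat) positions generated so far.
    bbox:   (lon_min, lat_min, lon_max, lat_max).
    """
    lon, lat = points[-1]
    lon_min, lat_min, lon_max, lat_max = bbox
    if haversine_km(lon, lat, dest[0], dest[1]) <= radius_km:
        return "arrived"
    if len(points) >= max_points:
        return "max_length"
    if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
        return "out_of_bounds"
    return None


MADRID = (-3.57, 40.49)        # approximate (lon, lat), for illustration only
BBOX = (-4.5, 39.5, 3.0, 42.5)  # hypothetical bounding box

status_start = rollout_done([(2.08, 41.30)], MADRID, BBOX)   # near Barcelona
status_arrived = rollout_done([(-3.57, 40.49)], MADRID, BBOX)
status_outside = rollout_done([(5.0, 41.0)], MADRID, BBOX)
status_long = rollout_done([(0.0, 41.0)] * 1000, MADRID, BBOX)
```

A sampling loop would call such a check after every generated state and start a new episode as soon as it returns a reason, which is why the number of episodes per batch of 50000 samples varies.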
The results reported are generated from 20 independent experiments per setting, considering each of the 50 test trajectories, thus aggregating results from 1000 experiments per setting/case combination. Specifically, we report on the trajectory prediction accuracy using the following measures: (a) Root Mean Square Error (RMSE) in meters in each of the 3 dimensions, as well as in 3D, (b) Along-Track Error (ATE), (c) Cross-Track Error (CTE), and (d) Vertical deviation (V). ATE and CTE are computed according to the methodology proposed in [13]. The along-track error is measured parallel to the predicted trajectory, while the cross-track error is measured perpendicular to the predicted course. All measures are computed for each predicted trajectory point after computing its corresponding point in the test trajectory using the DTW method. Trajectories are not segmented. Finally, we provide results on the estimated time of arrival (ETA) according to predictions, compared to the arrival time of test trajectories.
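To make the measures above concrete, the following sketch aligns a predicted and an actual trajectory with a basic DTW and then computes per-dimension RMSE together with along-track, cross-track and vertical deviations. It assumes that points are already expressed in a local metric frame (x east, y north, z up, in meters), that the course at a predicted point is the direction to the next predicted point, and that positive cross-track error means the actual point lies to the right of the predicted course. These conventions and all function names are illustrative assumptions and may differ from the methodology in [13].

```python
import math
from typing import List, Sequence, Tuple

P3 = Sequence[float]  # (x, y, z) in meters, assumed local planar frame


def dtw_path(a: List[P3], b: List[P3]) -> List[Tuple[int, int]]:
    """Align two trajectories with DTW; returns matched index pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def rmse(pred: List[P3], actual: List[P3], path: List[Tuple[int, int]]):
    """Per-axis RMSE and 3D RMSE over the aligned point pairs."""
    sq = [0.0, 0.0, 0.0]
    for i, j in path:
        for k in range(3):
            sq[k] += (pred[i][k] - actual[j][k]) ** 2
    n = len(path)
    return [math.sqrt(s / n) for s in sq], math.sqrt(sum(sq) / n)


def track_errors(pred: List[P3], actual: List[P3], path: List[Tuple[int, int]]):
    """(along-track, cross-track, vertical) deviation per aligned pair."""
    out = []
    for i, j in path:
        # Course of the predicted trajectory at point i.
        a, b = (pred[i], pred[i + 1]) if i + 1 < len(pred) else (pred[i - 1], pred[i])
        hx, hy = b[0] - a[0], b[1] - a[1]
        norm = math.hypot(hx, hy)
        if norm == 0.0:
            continue  # course undefined, skip this pair
        ux, uy = hx / norm, hy / norm
        ex, ey = actual[j][0] - pred[i][0], actual[j][1] - pred[i][1]
        ate = ex * ux + ey * uy   # parallel to the predicted course
        cte = ex * uy - ey * ux   # perpendicular, positive to the right
        out.append((ate, cte, actual[j][2] - pred[i][2]))
    return out


pred = [(0, 0, 100), (10, 0, 100), (20, 0, 100)]
actual = [(0, 1, 110), (10, 1, 110), (20, 1, 110)]
path = dtw_path(pred, actual)
per_axis, rmse_3d = rmse(pred, actual, path)
errors = track_errors(pred, actual, path)
```

In this toy example the actual trajectory runs 1 m to the left of and 10 m above the predicted one, so the along-track error is zero while the cross-track and vertical deviations are constant.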
5.2 Results
Before delving into the results provided by the imitation learning method, we need to point out that the average accuracy of the future trajectory classifier over 100 independent experiments is 0.976, with a standard deviation of 0.0094. This indicates that the classification method is suitable and, more importantly, that the destination airport’s meteorological forecast variables are informative. More fixed points and/or features can be added in future enhancements.
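Table 1: Mean RMSE (in meters) per dimension and in 3D, for the OnePolicy and MultPolicies settings.

| Case | OnePolicy Long | OnePolicy Lat | OnePolicy Alt | OnePolicy 3D | MultPolicies Long | MultPolicies Lat | MultPolicies Alt | MultPolicies 3D |
|---|---|---|---|---|---|---|---|---|
| 0 | 14350 | 8347 | 457 | 17279 | 10932 | 5577 | 333 | 12652 |
| 0.2 | 13780 | 8311 | 550 | 16825 | 10252 | 5477 | 402 | 12048 |
| 0.5 | 9726 | 8847 | 427 | 14066 | 7490 | 6679 | 324 | 10565 |
| 0.7 | 5979 | 7059 | 246 | 9916 | 4430 | 6360 | 188 | 8033 |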
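Table 2: Mean ATE, CTE and V (in meters) and mean ETA error (in seconds), for the OnePolicy and MultPolicies settings.

| Case | OnePolicy ATE | OnePolicy CTE | OnePolicy V | OnePolicy ETA | MultPolicies ATE | MultPolicies CTE | MultPolicies V | MultPolicies ETA |
|---|---|---|---|---|---|---|---|---|
| 0 | 31.6 | 577.0 | 67.0 | 245.96 | 305.8 | 154.0 | 23.8 | 274.10 |
| 0.2 | 99.4 | 808.5 | 121.3 | 288.65 | 454.5 | 391.8 | 57.6 | 268.70 |
| 0.5 | 984.9 | 1657.1 | 115.6 | 398.34 | 826.0 | 875.1 | 79.5 | 325.84 |
| 0.7 | 851.5 | 1540.7 | 25.8 | 460.56 | 1065.0 | 1133.0 | 5.1 | 369.03 |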
Table 1 shows the mean RMSE of the predicted vs. the actual (test) trajectory in meters for each of the three dimensions and in 3D, while Table 2 shows the mean ATE, mean CTE, and mean V, in meters. Table 2 also reports the mean error of the estimated time of arrival (ETA) in seconds for each case. Both tables are split into the OnePolicy and the MultPolicies settings, while the rows correspond to the different cases, i.e. the different initial states of the prediction.
Figures in Table 3 show box plots for all the measures. The y axis specifies the error measured. The horizontal lines of each box plot represent the 25th, 50th, 75th and 100th percentiles. Diamonds indicate outliers and the numbers indicate the medians. The left column provides RMSE and the right column the track errors. Again, the rows correspond to the different cases.
Regarding the results, it must be noted that the MultPolicies setting provides better results than the OnePolicy setting on most measures (all RMSE values, CTE and V in Tables 1 and 2), although not on ATE in every case, thus providing evidence of the efficacy of the proposed pipeline. This is due to the fact that the OnePolicy setting fails to model the different behavioral modes effectively, predicting trajectories that in the worst case follow a different pattern of behaviour than the actual one. Low deviations of the predicted from the actual trajectories, compared to state of the art methods (Section 6), provide evidence of the efficacy of the imitation learning approach.
Table 1 shows that the proposed method is quite effective in predicting the whole trajectory at the pre-tactical stage, while the RMSE decreases as the starting point is selected further from the origin airport, simulating the tactical stage. However, Table 2 shows that the mean along-track and cross-track errors tend to increase as the starting point moves towards the destination, which is most probably due to the complexity of the trajectories while approaching the destination airport (i.e. due to holding patterns, maneuvers, etc.). Thus, it seems that a more refined approach must be used to address the landing part of the trajectory more accurately.
6 Related Work
Reinforcement learning techniques inherently deal with trajectories, formed as policies in an action-state space. Such methods have been used in predicting aircraft trajectories [27], as well as human and vehicle trajectories in urban spaces with traffic/crowd. The DART [9] reinforcement learning approach for aircraft trajectory prediction exploits historical trajectories enriched with Aircraft Intent information. The action set in this case includes commands executed by the aircraft Flight Management System. This approach needs a model-based trajectory prediction method in the loop to predict the next aircraft position given a set of commands, incurring a significant computational cost in the whole process, while it requires discretization of state-action parameters and learning “constraints” on the valid combinations of commands. As far as we know, our approach is the first to apply deep imitation learning methods to predict trajectories in the aviation domain.
As pointed out in the introduction, recent data-driven efforts in the field of aircraft trajectory prediction have explored the application of statistical analysis and machine learning techniques. A comprehensive review of trajectory prediction methods in different domains can be found in [14]. As far as aircraft trajectory prediction is concerned, most approaches make specific assumptions concerning the types of aircraft considered (e.g. [22]), the operational phase considered (e.g. climbing, being in terminal airspace, etc.) (e.g. [15], [34]), or the look-ahead time (as in [15] and [8]), or consider specific constraints for making predictions [14]. State of the art approaches in the ATM domain that are closely related to our work are those in [5], [23] and [14].
Authors in [5] introduce a novel stochastic approach, modeling trajectories in space and time by using a set of spatio-temporal 4D joint data cubes, enriching these with aircraft motion parameters and weather conditions. This approach computes the most likely sequence of states derived by a Hidden Markov Model (HMM), which has been trained over trajectories enriched with weather variables. The algorithm computes the maximal probability of the optimal state sequence, which is best aligned with the observation sequence of the aircraft trajectory. Given that the lateral resolution of each cube is 13km and the temporal resolution is 1hr, the authors conclude that the mean value for the cross-track error (12.601km when the sign is omitted or 3.444km when signed) is within the boundaries of the spatial resolution. However, our proposed method provides a much lower along-track and cross-track error, with a very low vertical error compared to the 687.497 ft reported there, without limiting the resolution of the trajectories’ representation, while learning/predicting in a continuous action-state space.
Compared to [23], the method proposed here is much more effective in terms of predicted trajectory deviations from the actual trajectories in all dimensions, given also that the approach in [23] requires flight plans, as well as a number of actual trajectory points prior to prediction. The authors propose a tree-based matching algorithm to construct image-like feature maps from high-fidelity meteorological datasets. They then model the trajectory points as conditional Gaussian mixtures with parameters to be learned from the proposed deep generative model, which is an end-to-end convolutional recurrent neural network that consists of a long short-term memory (LSTM) encoder network and a mixture density LSTM decoder network.
Finally, the approach in [14] is a “constrained” approach, learning the deviations of trajectories from flight plans and reporting low deviations per waypoint. This is in contrast to the proposed approach, which does not exploit any information constraining the predicted trajectory, although it is generic enough to incorporate such constraints by means of forecast states. Assessing the effectiveness of incorporating such constraints in the prediction process is part of our future plans.
7 Conclusions and Future Work
In this paper we specify the data-driven trajectory prediction problem as an imitation learning task. Towards solving this problem we present a comprehensive framework comprising the state of the art Generative Adversarial Imitation Learning method, in a pipeline with trajectory clustering and classification methods. Evaluation results show the effectiveness of the method in making accurate predictions for the whole trajectory (i.e. with a prediction horizon until reaching the destination) both at the pre-tactical (i.e. starting at the departure airport at a specific time instant) and at the tactical (i.e. from any state while flying) stages, compared to state of the art approaches.
Future plans include (a) verifying the effectiveness of the method for different origin-destination airports, (b) exploiting flight plans to constrain the prediction pipeline, and (c) generalizing beyond specific origin-destination pairs.
Acknowledgements: This research is being supported by ENGAGE KTN Catalyst and PhD projects, and partially by Greek National Funds for datAcron for the year 2018. We would like to thank our colleagues C. Spatharis and K. Blekas who implemented the clustering algorithms, G.M. Santipantakis who enriched the raw trajectories, as well as our partners Boeing Research and Technology Europe and CRIDA for providing datasets.
References
 [1]
 [2] Bada, base of aircraft data. https://simulations.eurocontrol.int/solutions/badaaircraftperformancemodel/
 [3] Abbeel, P., Ng, A.Y.: Apprenticeship learning via inverse reinforcement learning. In: ICML, p. 1 (2004)
 [4] de Amorim, R.C., Hennig, C.: Recovering the number of clusters in data sets with noise features using feature rescaling factors, pp. 126–145 (2015)
 [5] Ayhan, S., Samet, H.: Aircraft trajectory prediction made easy with predictive analytics (2016)

 [6] Bishop, C.: Pattern Recognition and Machine Learning. Springer-Verlag, Berlin, Heidelberg (2006)
 [7] Breiman, L.: Machine Learning. Kluwer Academic Publishers (2001)
 [8] Cheng, T., Cui, D., Cheng, .: Data mining for air traffic flow forecasting: a hybrid model of neural network and statistical analysis. Proc. of the 2003 IEEE Intl. Conf. on Intelligent Transportation Systems 1, 211–215 (2003)
 [9] D. Scarlatti, et al.: Deliverable D2.4: Evaluation and validation of algorithms for single trajectory prediction (2018). URL http://dartresearch.eu/2018/07/10/dartfinaldeliverables/
 [10] Finn, C., Christiano, P., Abbeel, P., Levine, S.: A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. arXiv preprint arXiv:1611.03852 (2016)
 [11] Finn, C., Levine, S., Abbeel, P.: Guided cost learning: Deep inverse optimal control via policy optimization. In: ICML, pp. 49–58 (2016)

 [12] Georgiou, H.V., Pelekis, N., Sideridis, S., Scarlatti, D., Theodoridis, Y.: Semantic-aware aircraft trajectory prediction using flight plans. Intl Journal of Data Science and Analytics pp. 1–14 (2019)
 [13] Gong, C., McNally, D.: A Methodology for Automated Trajectory Prediction Analysis. DOI 10.2514/6.20044788. URL https://arc.aiaa.org/doi/abs/10.2514/6.20044788
 [14] H. Georgiou, e.a.: Moving objects analytics: Survey on future location & trajectory prediction methods (2018)
 [15] Hamed, M.G., Gianazza, D., Serrurier, M., Durand, N.: Statistical prediction of aircraft trajectory: regression methods vs point-mass model (2013)
 [16] Ho, J., Ermon, S.: Generative adversarial imitation learning. In: NIPS, pp. 4565–4573 (2016)
 [17] Hong, S., Lee, K.: Trajectory prediction for vectored area navigation arrivals. Journal of Aerospace Information Systems 12(7), 490–502 (2015)
 [18] Joe H. Ward Jr.: Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58(301), 236–244 (1963). DOI 10.1080/01621459.1963.10500845
 [19] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization (2014). URL http://arxiv.org/abs/1412.6980. Cite arxiv:1412.6980, also in: 3rd Intl Conf. for Learning Representations
 [20] Kun, W., Wei, P.: A 4d trajectory prediction model based on radar data. In: 2008 27th Chinese Control Conference, pp. 591–594. IEEE (2008)
 [21] Le Fablec, Y., Alliot, J.: Using neural networks to predict aircraft trajectories. In: ICAI, pp. 524–529 (1999)
 [22] de Leege, A., van Paassen, M., Mulder, M.: A Machine Learning Approach to Trajectory Prediction. DOI 10.2514/6.20134782. URL https://arc.aiaa.org/doi/abs/10.2514/6.20134782
 [23] Liu, Y., Hansen, M.: Predicting aircraft trajectories: A deep generative convolutional recurrent neural networks approach (2018)
 [24] López-Leonés, J., Vilaplana, M.A., Gallo, E., Navarro, F.A., Querejeta, C.: The aircraft intent description language: A key enabler for air-ground synchronization in trajectory-based operations. In: 2007 IEEE/AIAA 26th Digital Avionics Systems Conf., pp. 1–D. IEEE (2007)
 [25] Pazzani, M.J., Keogh, E.J.: Scaling up dynamic time warping for data mining applications pp. 285–289 (2000)
 [26] Pomerleau, D.A.: Efficient training of artificial neural networks for autonomous navigation. Neural computation 3(1), 88–97 (1991)
 [27] Boeing Research and Technology Europe (patent filed in the European Patent Office): Method and system for autonomously operating an aircraft (2017)
 [28] Ross, S., Bagnell, D.: Efficient reductions for imitation learning. In: AISTATS, pp. 661–668 (2010)
 [29] Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: AISTATS, pp. 627–635 (2011)
 [30] Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: ICML, pp. 1889–1897 (2015)
 [31] Schulman, J., Moritz, P., Levine, S., Jordan, M., Abbeel, P.: High-dimensional continuous control using generalized advantage estimation (2016)
 [32] Tastambekov, K., Puechmorel, S., Delahaye, D., Rabut, C.: Aircraft trajectory forecasting using local functional regression in sobolev space. Transportation Res. part C: Emerging Technologies 39, 1–22 (2014)
 [33] Torabi, F., Warnell, G., Stone, P.: Generative adversarial imitation from observation (2018)
 [34] Yang, Y., Zhang, J., Cai, K.: Terminal-area aircraft intent inference approach based on online trajectory clustering. In: TheScientificWorldJournal (2015)
 [35] Ziebart, B.D., Maas, A.L., Bagnell, J.A., Dey, A.K.: Maximum entropy inverse reinforcement learning. In: AAAI, vol. 8, pp. 1433–1438. Chicago, IL, USA (2008)