I. Introduction
Accurate prediction of aviation delay for commercial flights is a key component in improving safety, capacity, and efficiency in air traffic management and the airline business [1, 2, 3, 4, 5]. However, as a dynamic system with uncertainties, civil aviation frequently faces unexpected events that can result in flight delays or cancellations. When flight delays propagate from an initially delayed flight to downstream flights, the impact can grow exponentially if air traffic control cannot adopt an appropriate resource reallocation strategy to optimize mobility in time. Hence, estimating delay as accurately as possible within a controllable time window is critical.
Traditional methods of aviation delay prediction rely on modeling and simulation techniques, but it is hard to select modeling assumptions that ensure the quality of the analysis [6]. With the large-scale deployment of Internet of Things (IoT) devices, fast collection of ubiquitous information, including spatial and temporal data, motivates the use of big data analytics and statistical machine learning in many fields [7, 8, 9, 10, 11, 12, 13, 14]. A few studies on aviation delay prediction based on supervised machine learning have been conducted in recent years. For instance, Manna et al. [15] employ a Gradient Boosted Decision Tree to predict the average arrival and departure delays of a day. However, the outliers in the dataset are removed directly before the training process, so data completeness and model generalization are not guaranteed. Gopalakrishnan et al. [16] compare the performance of Markov Jump Linear Systems, Classification and Regression Trees, and Artificial Neural Networks for predicting delays in air traffic networks, and reveal a tradeoff between model simplicity and prediction accuracy. Although promising results are obtained by using statistical machine learning methods, the crucial temporal elements, in particular the delay propagation effect, cannot be learned by such one-shot prediction models.
Inspired by the considerable success of Recurrent Neural Networks (RNNs) and their variants in sequential event prediction tasks such as natural language processing [17], a few researchers have implemented RNNs to capture the temporal correlation of factors that may potentially influence aviation delay for more accurate prediction [6, 18, 19, 20, 21, 22]. However, many valuable attributes in the spatial and temporal domains are rarely or never taken into account in these studies. Therefore, we present a spatiotemporal data mining framework based on stacked LSTM networks for aviation delay prediction to bridge this gap. By combining spatiotemporal features from available data sets containing flight path information, airspace characteristics, and weather, our model can learn representations of spatiotemporal sequences at multiple levels of abstraction to predict the subsequent aviation delay of an airport. To alleviate overfitting, a regularization technique called Dropout [23] is applied between two adjacent stacked LSTM layers. Compared with previous work based on RNNs, this paper makes three main contributions:

Unlike other systems that predict delay before departure [20] or make an implicit assumption that the journey of each aircraft does not vary significantly [19], we fully consider the indeterminacy of the air transportation system and address the aviation delay prediction problem over a time horizon of one hour.

We present a complete workflow of data manipulation in this paper. Some work [20, 18] provides comparative analyses for delay prediction using several models including RNNs, but focuses more on non-time-series methods, especially boosting and bagging methods, whereas the input format for RNNs is far from the input format of the other algorithms. Moreover, though [6] describes a data processing pipeline in detail, it creates day-to-day sequences rather than flight-to-flight or minute-to-minute sequences as we do, which does not make sense in real-time air traffic control.

Multi-source data are integrated and extended to generate a spatiotemporal dataset with richer information in the spatial and temporal domains.
The remainder of this paper is structured as follows. Section II formulates the aviation delay prediction problem and reviews the fundamental elements of the LSTM network. Section III introduces the data sets we use and the corresponding feature engineering, and then presents our proposed LSTM-based architecture. The experimental setup and results demonstrating the accuracy of our solution are given in Section IV, while Section V concludes this paper and contemplates future work.
II. Problem Statement and LSTM Network
Aviation delay is affected by multiple factors. Some of them are unpredictable, such as military training activities, equipment failures at the airport, and extreme weather. However, there are still valuable temporal contexts that can be retrieved for delay prediction. For example, the late arrival or departure of a previous flight will affect the on-time departure and arrival of succeeding flights. Such a pattern motivates us to model aviation delay as a multivariate time series problem.
II-A. Problem Formulation
We select an airport and the flights flying to it to formulate the arrival delay estimation at that airport. Consider the prediction function $f: X \rightarrow y$ with input $X_t = (x_{t-N+1}, \ldots, x_t) \in \mathbb{R}^{N \times P}$, where (i) $N$ denotes the number of time stamps, (ii) $P$ denotes the number of variables (the feature dimension), and (iii) $y_{t+T}$ is the ground truth, where $T$ is the self-defined time interval. Therefore, our multivariate sequential data are transformed into a supervised learning problem. To be specific, if we use a sequence starting at $t-N+1$, the information in the period between $t-N+1$ and $t$ is used to predict the delay at time stamp $t+T$, as shown in Fig. 1.

II-B. LSTM
Vanilla RNNs are suitable for sequential prediction problems, but they have difficulty overcoming exploding and vanishing error flow when learning long-term dependencies [24]. Therefore, a variant of the RNN named the LSTM was designed to address these problems and has enabled significant advances in applications [25, 8]. In this paper, we feed our data to the LSTM model, whose basic cell structure is shown in Fig. 2. The gate mechanism is a key component of the LSTM structure. Gates are a track that optionally lets information (hidden state $h_t$ and cell state $C_t$) through. There are three gates in an LSTM cell, the Forget Gate, the Input Gate, and the Output Gate, and their functions are represented by the following equations:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$ (1)
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ (2)
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$ (3)
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ (4)
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ (5)
$h_t = o_t \odot \tanh(C_t)$ (6)
where $\sigma$ is the sigmoid activation function, which observes the states at the previous step and outputs a number between 0 and 1 to control the degree of remaining information flow. $i_t$ is the input gate, which is combined with the candidate value $\tilde{C}_t$ after the tanh layer to update the state; then the old cell state $C_{t-1}$, representing long-term memory, can be replaced by the new one as shown in Equation 4. $h_t$ is the final output, calculated by the multiplication of $o_t$ and $\tanh(C_t)$, and serves as the input for the next LSTM cell.

III. Methodology
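To make the gate equations concrete, here is a minimal single-cell forward pass in NumPy. This is an illustrative sketch only, not the implementation used in our experiments; the packed weight layout is an assumption made for compactness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Equations (1)-(6).
    W packs the four gate weight matrices, shape (4*H, H+D); b has shape (4*H,)."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # one joint affine transform
    f = sigmoid(z[0:H])           # forget gate, Eq. (1)
    i = sigmoid(z[H:2*H])         # input gate, Eq. (2)
    c_hat = np.tanh(z[2*H:3*H])   # candidate cell state, Eq. (3)
    c = f * c_prev + i * c_hat    # cell state update, Eq. (4)
    o = sigmoid(z[3*H:4*H])       # output gate, Eq. (5)
    h = o * np.tanh(c)            # hidden state / final output, Eq. (6)
    return h, c
```

Because every gate output lies in (0, 1) and tanh lies in (-1, 1), the hidden state stays bounded, which is part of what stabilizes the error flow.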
In this section, the data sets used for aviation delay prediction are introduced, followed by the feature engineering that constructs a representative feature set for the predictive task to improve model generalization. Then, the design of our proposed LSTM-based architecture is explained.
III-A. Data Description
In this paper, we use data from the first day of each month for the period of July 2016 through February 2017, in Coordinated Universal Time (UTC). Note that, to guarantee the spatiotemporal continuity of the data, only records whose Arrival Airport is Hartsfield-Jackson Atlanta International Airport (ATL) are selected. Outline descriptions of the features extracted from each domain are illustrated in Fig. 3.

III-A1. Flight data
The flight records, including delay information, are provided by the United States Department of Transportation. The data contains the most important information, for example, Departure/Arrival Airport, Scheduled Departure/Arrival Time, Actual Departure/Arrival Time, and Airline.
III-A2. Trajectory data
In the United States, aircraft trajectories are collected continuously by automatic dependent surveillance-broadcast (ADS-B) [27], a surveillance technology used by the Federal Aviation Administration (FAA) for air traffic control (ATC). The data from ADS-B Exchange [28] contains all flights equipped with ADS-B transponders, which covers most commercial aviation. The fields of the ADS-B data include Aircraft Identification, position information (Longitude, Latitude, Altitude), flight status (such as Aircraft Speed and Track Angle), and aircraft attributes (such as Aircraft Model and Manufacturer's Name).
III-A3. ATC data
III-A4. Weather data
The weather condition along the air route has been demonstrated to be a significant factor in the delay prediction task. Therefore, we gather Local Climatological Data (LCD) from the National Oceanic and Atmospheric Administration (NOAA). The flight-related weather data involves Temperature, Precipitation, Humidity, Sky Conditions, Wind Speed, Wind Direction, and so on.
III-B. Feature Engineering
As a preprocessing step, feature engineering refers to the process of using domain knowledge and statistical approaches to select and create representative features. We aim to construct a cleansed data set with a relatively high-dimensional feature space after feature engineering, enabling the trained model to perform better on unseen data.
III-B1. Feature selection
Features whose percentage of missing values exceeds a threshold (80%) are removed. Then, we calculate the Pearson correlation of each pair of variables and remove redundant features whose correlation exceeds a threshold (80%). Among the remaining features, the presumably useful ones are selected according to our domain knowledge and task needs.
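The two automatic filtering steps might be sketched with pandas as follows (the thresholds are the ones stated above; the data-frame layout is an assumption for illustration):

```python
import pandas as pd

def select_features(df: pd.DataFrame, missing_thresh=0.8, corr_thresh=0.8):
    """Drop features with too many missing values, then drop one feature of
    each highly correlated pair (Section III-B1 thresholds)."""
    # 1) Remove features whose fraction of missing values exceeds the threshold.
    df = df[df.columns[df.isna().mean() <= missing_thresh]]
    # 2) Remove redundant features via pairwise Pearson correlation.
    corr = df.corr().abs()
    cols = list(corr.columns)
    drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > corr_thresh and cols[j] not in drop:
                drop.add(cols[j])  # keep the first of each correlated pair
    return df.drop(columns=list(drop))
```

The final manual selection by domain knowledge happens on the columns this function returns.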
III-B2. Congestion index
The congestion indices of the ground and airspace are calculated using flight data and trajectory data, respectively. The ground congestion indices are computed by counting departures and arrivals within 10-minute bins. Congestion indices of the airport airspace (low-altitude) are obtained in a similar way, but we use ADS-B data because it contains both commercial airplanes and private jets, which makes the results more accurate. To be specific, we count the flight records whose aircraft-to-ATL distance is less than 200 km and whose altitude ranges from 1,200 to 10,000 feet above mean sea level (MSL). The distance is obtained by the Haversine formula given the longitude and latitude of ATL and an airplane:
$a = \sin^2(\Delta\phi/2) + \cos\phi_1 \cdot \cos\phi_2 \cdot \sin^2(\Delta\lambda/2)$ (7)
$c = 2 \cdot \mathrm{atan2}(\sqrt{a}, \sqrt{1-a})$ (8)
$d = R \cdot c$ (9)
where $d$ is the great-circle distance between two points on a sphere with longitude and latitude $(\lambda, \phi)$, and $R$ is the radius of the Earth. Besides, we split the airspace beginning at 18,000 feet above MSL into equal-sized sectors, and the number of aircraft in each sector per 10 minutes is defined as the congestion index of en-route airspace.
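Equations (7)-(9) translate directly into code. A small sketch (the Earth radius and the ATL coordinates are approximate values added here for illustration):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, R in Eq. (9)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points
    given in degrees, following Equations (7)-(9)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)   # Delta-phi
    dlam = math.radians(lon2 - lon1)   # Delta-lambda
    a = math.sin(dphi / 2) ** 2 + \
        math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2   # Eq. (7)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))              # Eq. (8)
    return EARTH_RADIUS_KM * c                                      # Eq. (9)

# Approximate ATL coordinates; an aircraft counts toward the low-altitude
# congestion index when this distance is below 200 km (and altitude in range).
ATL_LON, ATL_LAT = -84.4281, 33.6407
```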
III-B3. Encoding discrete and categorical features
To ensure the consistency of the sequence length, we need to summarize the information on succeeding flights because the number of flights is uncertain at every point in time. However, it is inappropriate to take the average of these features because our data set contains discrete and categorical variables. Considering the interpretability and diversity of these features, a hybrid encoding strategy is adopted in this paper. More specifically, the discrete and categorical variables are split into two types, high-cardinality and low-cardinality, based on a threshold of 50 distinct values. Then, we encode the features using frequency encoding and one-hot encoding, respectively. One-hot encoding maps the representation of a discrete feature into Euclidean space, where each value of the discrete feature corresponds to a point, so that it is more reasonable to calculate distances between values. However, encoding high-cardinality features with one-hot encoding would cause the curse of dimensionality. Hence, we use frequency encoding, which counts and sorts the occurrences of values, to address this issue.
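The hybrid strategy might be sketched with pandas as follows (column handling is illustrative; the exact encoder configuration used in the experiments may differ):

```python
import pandas as pd

CARDINALITY_THRESHOLD = 50  # boundary between low- and high-cardinality

def hybrid_encode(df: pd.DataFrame, categorical_cols, threshold=CARDINALITY_THRESHOLD):
    """One-hot encode low-cardinality features; frequency-encode
    high-cardinality features by each value's occurrence count."""
    out = df.copy()
    for col in categorical_cols:
        if out[col].nunique() <= threshold:
            # Low cardinality: one-hot maps each value to a point in Euclidean space.
            dummies = pd.get_dummies(out[col], prefix=col)
            out = pd.concat([out.drop(columns=[col]), dummies], axis=1)
        else:
            # High cardinality: replace each value by its frequency of occurrence.
            out[col] = out[col].map(out[col].value_counts())
    return out
```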
III-B4. Data fusion
Before we combine the cleansed data sets, we convert all timestamps into a single time zone, UTC. Then, we merge the weather data with the trajectory data and the flight data separately to create two data sets, a trajectory-based set and a flight-based set. Recalling Section II, here we set $T = 60$ minutes, which means that if the timestamp of the last instance in a temporal sequence is $t$, the aviation delay situation at time $t+T$ is chosen as the ground truth. However, there may be no aircraft arriving at ATL exactly at time $t+T$. Thus, we take the average delay of aircraft whose arrival times lie within an interval around $t+T$, which can be positive or negative, as the target of our prediction task. Obviously, airplanes whose estimated arrival times fall within the interval between $t$ and $t+T$ are still flying. Therefore, spatiotemporal information on these airplanes, such as how close they are to the ATL airport, their weather situations, and their speeds, is retrieved from the trajectory-based set and introduced into the flight-based set based on aircraft registration identification and time flag. Finally, we fuse the ATC data with the result, taking geolocation as the key value.
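The target construction can be illustrated as below. Note that the column names and the 10-minute averaging window are assumptions for this sketch; only the 60-minute horizon follows from the text.

```python
import pandas as pd

def delay_target(flights: pd.DataFrame, t: pd.Timestamp,
                 horizon=pd.Timedelta(minutes=60),
                 window=pd.Timedelta(minutes=10)):
    """Ground truth for a sequence ending at t: the average arrival delay
    (minutes, negative for early arrivals) of flights landing in a small
    window after t + horizon."""
    start, end = t + horizon, t + horizon + window
    in_window = (flights["actual_arrival_utc"] >= start) & \
                (flights["actual_arrival_utc"] < end)
    return flights.loc[in_window, "arrival_delay_min"].mean()
```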
III-C. LSTM-Based Architecture
The success of deep neural networks on a wide range of challenging prediction problems is commonly attributed to hierarchy and depth [30, 31]. Inspired by this, Graves et al. [32] demonstrate that RNNs can also benefit from depth in space by stacking multiple recurrent hidden layers on top of each other. Therefore, we adopt a stacked LSTM architecture for our aviation delay prediction. Fig. 5 shows the structure of our proposed framework. Given a sequence of length $N$, each output of the second LSTM layer is joined to a fully connected (FC) layer. We expect the model to look backward over historical states and capture potential temporal characteristics, since some important hidden representations may be lost in the last LSTM cell. Besides, an FC layer with more neurons has better expressivity for a complex function. During our experiments, we found that the loss converged too quickly and tended to 0, which proved to be an overfitting problem, so we add Dropout regularization between the two LSTM layers.
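A minimal NumPy sketch of this stacked design follows. The weight layout, layer sizes, and the inverted-Dropout formulation are illustrative assumptions, not the trained configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(X, W, b, H):
    """Run one LSTM layer over a sequence X of shape (N, D_in) and return
    every per-step hidden state, shape (N, H)."""
    h, c = np.zeros(H), np.zeros(H)
    outs = []
    for x_t in X:
        z = W @ np.concatenate([h, x_t]) + b
        f, i, o = _sigmoid(z[:H]), _sigmoid(z[H:2*H]), _sigmoid(z[3*H:])
        c = f * c + i * np.tanh(z[2*H:3*H])
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def stacked_lstm_forward(X, params, dropout_rate=0.2, train=False):
    """Two stacked LSTM layers with inverted Dropout between them; the
    outputs of ALL steps of the second layer feed a fully connected head."""
    H = params["H"]
    h1 = lstm_layer(X, params["W1"], params["b1"], H)
    if train:  # Dropout is active only during training
        mask = rng.random(h1.shape) >= dropout_rate
        h1 = h1 * mask / (1.0 - dropout_rate)
    h2 = lstm_layer(h1, params["W2"], params["b2"], H)
    # FC head over the concatenation of every time step's output
    return float(np.dot(params["W_fc"], h2.ravel()) + params["b_fc"])
```

Feeding every step's output to the FC head, rather than only the last, is what lets the head recover representations that would otherwise be lost in the final cell.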
IV. Experiment
In this section, we present the construction of the time series. Then we evaluate the performance and effectiveness of our LSTM-based architecture compared with other commonly used machine learning algorithms for aviation delay prediction.
IV-A. Data Preparation
The inputs for the LSTM are 3-dimensional sequential data. Fig. 6 shows the process of constructing the multivariate time series. It should be noted that our data only contains records of the first day of each month from July 2016 to February 2017 in UTC, so we first split the original data into 8 days by time matching and then slice each day separately to create a series of time blocks of length $N$. Next, the training set and test set are generated by stacking these arrays vertically, which keeps the time continuity within a time block. The order of the inputs will affect the training results, since earlier samples generally receive larger gradients, while neighboring sequences in the time series have similar distributions. Hence, we shuffle the order in which sequences are fed to the LSTM to guarantee the generalization of the trained model. Note that we do not shuffle the ordering of elements within individual sequences.
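The slicing and shuffling steps above might be sketched as follows (the array layout and target alignment are assumptions; only whole windows are permuted, never their contents):

```python
import numpy as np

def make_sequences(day_data: np.ndarray, seq_len: int):
    """Slice one day's (T, P) feature matrix into overlapping windows of
    length seq_len, preserving time continuity inside each window."""
    T = day_data.shape[0]
    return np.stack([day_data[i:i + seq_len] for i in range(T - seq_len + 1)])

def build_dataset(days, targets_per_day, seq_len, seed=0):
    """Stack windows from every day, then shuffle whole sequences; the
    ordering of elements within a sequence is never shuffled."""
    X = np.concatenate([make_sequences(d, seq_len) for d in days])
    # each window's target corresponds to its last time stamp
    y = np.concatenate([t[seq_len - 1:] for t in targets_per_day])
    idx = np.random.default_rng(seed).permutation(len(X))
    return X[idx], y[idx]
```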
IV-B. Setup
Two data sets are used for training and evaluation. The first, with 66 features, contains only flight and weather information at the airport, while the second, with 203 features, contains not only what the first data set has but also spatiotemporal information on succeeding flights. We let the sequence length $N$ be 30, 60, 90, and 120, respectively, to generate data sets with different numbers of time steps. The Mean Square Error (MSE) is chosen as the loss function for training.
Our experiment is divided into three steps:
(1) Because most studies utilize ensemble methods to predict aviation delay, the first step is to compare the performance of our model with that of other powerful ensemble algorithms: Random Forest Regression (RF), a bagging method, and Gradient Boosting Regression Tree (GBRT), a boosting method. RF and GBRT are not designed for sequential prediction, so we feed all data into these models directly instead of slicing the data according to the sequence length $N$ as we do for the LSTM. Besides, other commonly used baseline models, such as Linear Regression (LR), Support Vector Regression (SVR), Regression Tree (RT), and Multilayer Perceptron (MLP), are also used for comparison.
(2) The second step is to train our stacked LSTM model on the two data sets separately with different sequence lengths $N$. This allows us to analyze the impact of the sequence length and of the en-route spatiotemporal information on our model.
(3) We apply the model to different airports to verify its validity. The airports include large hub airports such as Los Angeles International Airport (LAX) and O'Hare International Airport (ORD), a mid-size airport, Orlando International Airport (MCO), and a small airport, Daytona Beach International Airport (DAB).
IV-C. Results
The MSE ($\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$) is used as the evaluation metric in this paper. Table I compares the performance of different model settings on the data set without information on subsequent flights, while Table II reports results on the entire data set. The best score in each table is highlighted. At a glance, it is clear that our LSTM model outperforms the other models. The second best scheme is the ensemble methods, among which GBRT works best. These conclusions can also be drawn from Table II. LR performs poorly on both data sets, which illustrates that there is no apparent linear relation between the features and the target. Generally speaking, models with richer spatiotemporal features perform better than models that take only temporal correlations into account. Besides, we observe a trend of declining test-set MSE as the length of the sliding window increases. We believe the reason for this phenomenon is that the LSTM learns more hidden patterns of prior knowledge from longer sequences. Although longer sequences may introduce noise to the model during training, the gate mechanism prevents the abuse of long-term dependencies. Fig. 7 depicts the gap between the predicted values and the ground truth over time in a more intuitive way. It can be clearly seen that there is only a very slight difference in amplitude, and their trends are basically consistent, which also verifies that our proposed model possesses the ability to predict aviation delay accurately.
Table III shows the test-set MSE values for different airports. The data distributions of different airports vary dramatically, whether from the spatiotemporal perspective (e.g., weather varies across cities) or from the flight perspective (e.g., an airline has different flight arrangements at different airports). Therefore, we use the data at each airport to regenerate time series with spatiotemporal features and retrain the LSTM-120 model. The results demonstrate the effectiveness of our proposed LSTM model and the corresponding feature engineering for large hub airports. However, the model does not perform well for mid- and small-size airports because of their relatively few flights, which makes it hard to generate representative sequences.
Table I: Test MSE on the data set without subsequent-flight information

Method          Model     Test MSE
Linear          LR        218.9779
Nonlinear       RT        111.2693
                SVR       150.2401
Ensemble        RF        84.3054
                GBRT      68.9447
Neural Network  MLP       119.0170
                LSTM-30   64.1153
                LSTM-60   62.3336
                LSTM-90   60.3336
                LSTM-120  52.7232
Table II: Test MSE on the entire data set

Method          Model     Test MSE
Linear          LR        202.3007
Nonlinear       RT        77.2012
                SVR       57.7634
Ensemble        RF        52.0770
                GBRT      43.8420
Neural Network  MLP       56.8772
                LSTM-30   43.9821
                LSTM-60   39.5061
                LSTM-90   35.7900
                LSTM-120  25.6320
Table III: Test MSE of the LSTM-120 model at different airports

Airport  Test MSE
ATL      25.6320
LAX      27.3342
ORD      46.7891
MCO      73.5672
DAB      110.6825
V. Conclusion and Future Work
In this work, we present a novel aviation delay prediction framework based on stacked LSTMs for commercial flights. We provide a complete data processing pipeline and generate a data set with richer spatiotemporal features while accounting for delay propagation and flight uncertainty. Our experiments verify that our framework can predict the arrival delay of commercial flights in the US within an acceptable error.
Significant work remains for the future. First, we should collect more data to improve the robustness of our proposed model. Moreover, airports differ in the number and frequency of flights; in particular, small airports hold only a small volume of data, which is not enough to train a high-performance but data-hungry deep learning model. This is a strong motivation to apply transfer learning to improve model performance for such airports with the help of prior knowledge.
Acknowledgment
This research was supported by the Center for Advanced Transportation Mobility (CATM), USDOT Grant #69A3551747125.
References
 [1] H. Song, R. Srinivasan, T. Sookoor, and S. Jeschke, Smart cities: foundations, principles, and applications. John Wiley & Sons, 2017.
 [2] H. Song, G. Fink, and S. Jeschke, Security and Privacy in Cyber-Physical Systems. Wiley Online Library.
 [3] Y. Sun, H. Song, A. J. Jara, and R. Bie, “Internet of things and big data analytics for smart and connected communities,” IEEE Access, vol. 4, pp. 766–773, 2016.
 [4] H. Song, D. B. Rawat, S. Jeschke, and C. Brecher, Cyber-physical systems: foundations, principles and applications. Morgan Kaufmann, 2016.
 [5] Y. Liu, X. Weng, J. Wan, X. Yue, H. Song, and A. V. Vasilakos, “Exploring data validity in transportation systems for smart cities,” IEEE Communications Magazine, vol. 55, no. 5, pp. 26–33, 2017.
 [6] Y. J. Kim, S. Choi, S. Briceno, and D. Mavris, “A deep learning approach to flight delay prediction,” in 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC). IEEE, 2016, pp. 1–6.
 [7] Z. Lv, H. Song, P. Basanta-Val, A. Steed, and M. Jo, “Next-generation big data analytics: State of the art, challenges, and future research topics,” IEEE Transactions on Industrial Informatics, vol. 13, no. 4, pp. 1891–1899, 2017.
 [8] G. Dartmann, H. Song, and A. Schmeink, Big data analytics for cyber-physical systems: machine learning for the internet of things. Elsevier, 2019.
 [9] Y. Liang, Z. Cai, J. Yu, Q. Han, and Y. Li, “Deep learning based inference of private information using embedded sensors in smart devices,” IEEE Network, vol. 32, no. 4, pp. 8–14, 2018.
 [10] Z. Cai and Z. He, “Trading private range counting over big iot data,” in 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2019, pp. 144–153.
 [11] X. Zheng and Z. Cai, “Privacy-preserved data sharing towards multiple parties in industrial iots,” IEEE Journal on Selected Areas in Communications, vol. 38, no. 5, pp. 968–979, 2020.
 [12] G. Han, S. Shen, H. Song, T. Yang, and W. Zhang, “A stratification-based data collection scheme in underwater acoustic sensor networks,” IEEE Transactions on Vehicular Technology, vol. 67, no. 11, pp. 10671–10682, 2018.
 [13] J. Tan, W. Liu, M. Xie, H. Song, A. Liu, M. Zhao, and G. Zhang, “A low redundancy data collection scheme to maximize lifetime using matrix completion technique,” EURASIP Journal on Wireless Communications and Networking, vol. 2019, no. 1, pp. 1–29, 2019.
 [14] L. A. Tawalbeh, W. Bakheder, and H. Song, “A mobile cloud computing model using the cloudlet scheme for big data applications,” in 2016 IEEE First International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE), 2016, pp. 73–77.

 [15] S. Manna, S. Biswas, R. Kundu, S. Rakshit, P. Gupta, and S. Barman, “A statistical approach to predict flight delay using gradient boosted decision tree,” in 2017 International Conference on Computational Intelligence in Data Science (ICCIDS). IEEE, 2017, pp. 1–5.
 [16] K. Gopalakrishnan and H. Balakrishnan, “A comparative analysis of models for predicting delays in air traffic networks.” ATM Seminar, 2017.
 [17] W. Yin, K. Kann, M. Yu, and H. Schütze, “Comparative study of cnn and rnn for natural language processing,” arXiv preprint arXiv:1702.01923, 2017.
 [18] G. Gui, F. Liu, J. Sun, J. Yang, Z. Zhou, and D. Zhao, “Flight delay prediction based on aviation big data and machine learning,” IEEE Transactions on Vehicular Technology, vol. 69, no. 1, pp. 140–150, 2019.

 [19] N. McCarthy, M. Karzand, and F. Lecue, “Amsterdam to Dublin eventually delayed? LSTM and transfer learning for predicting delays of low cost airlines,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019, pp. 9541–9546.
 [20] S. Ayhan, P. Costas, and H. Samet, “Predicting estimated time of arrival for commercial flights,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 33–42.
 [21] F. Kong, J. Li, B. Jiang, T. Zhang, and H. Song, “Big data-driven machine learning-enabled traffic flow prediction,” Trans. Emerg. Telecommun. Technol., vol. 30, no. 9, Sep. 2019. [Online]. Available: https://doi.org/10.1002/ett.3482

 [22] F. Kong, J. Li, B. Jiang, and H. Song, “Short-term traffic flow prediction in smart multimedia system for internet of vehicles based on deep belief network,” Future Generation Computer Systems, vol. 93, pp. 460–472, 2019. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0167739X18320326
 [23] W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv preprint arXiv:1409.2329, 2014.
 [24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
 [25] Y. Liu, L. L. Njilla, J. Wang, and H. Song, “An LSTM enabled dynamic Stackelberg game theoretic method for resource allocation in the cloud,” in 2019 International Conference on Computing, Networking and Communications (ICNC), 2019, pp. 797–801.
 [26] AVweb - Flight Safety. [Online]. Available: https://www.avweb.com/flightsafety/faaregs/airroutetrafficcontrol/
 [27] C. L. Scovel III and I. General, “FAA’s progress and challenges in advancing the next generation air transportation system,” Statement of the Honorable Calvin L Scovel III, Inspector General, US Department of Transportation before the Committee on Transportation and Infrastructure Subcommittee on Aviation United States House of Representatives, Washington DC, vol. 17, 2013.
 [28] ADS-B Exchange - World’s largest co-op of unfiltered flight data. [Online]. Available: https://www.adsbexchange.com/
 [29] 123ATC. [Online]. Available: https://123atc.com/
 [30] R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, “How to construct deep recurrent neural networks,” arXiv preprint arXiv:1312.6026, 2013.
 [31] M. Hermans and B. Schrauwen, “Training and analysing deep recurrent neural networks,” in Advances in neural information processing systems, 2013, pp. 190–198.
 [32] A. Graves, A.-r. Mohamed, and G. Hinton, “Speech recognition with deep recurrent neural networks,” in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2013, pp. 6645–6649.