Deep Learning based Urban Vehicle Trajectory Analytics

A `trajectory' refers to a trace generated by a moving object in geographical spaces, usually represented by a series of chronologically ordered points, where each point consists of a geo-spatial coordinate set and a timestamp. Rapid advancements in location sensing and wireless communication technology have enabled us to collect and store a massive amount of trajectory data. As a result, many researchers use trajectory data to analyze the mobility of various moving objects. In this dissertation, we focus on the `urban vehicle trajectory,' which refers to trajectories of vehicles in urban traffic networks, and on `urban vehicle trajectory analytics.' Urban vehicle trajectory analytics offers unprecedented opportunities to understand vehicle movement patterns in urban traffic networks, including both user-centric travel experiences and system-wide spatiotemporal patterns. The spatiotemporal features of urban vehicle trajectory data are structurally correlated with each other, and consequently, many previous researchers have used various methods to understand this structure. In particular, deep-learning models are attracting the attention of many researchers due to their powerful function approximation and feature representation abilities. As a result, the objective of this dissertation is to develop deep-learning-based models for urban vehicle trajectory analytics to better understand the mobility patterns of urban traffic networks. Particularly, this dissertation focuses on two research topics with high necessity, importance, and applicability: Next Location Prediction and Synthetic Trajectory Generation. In this study, we propose various novel models for urban vehicle trajectory analytics using deep learning.

1.1 Motivation and Objective

A "trajectory" refers to a trace generated by a moving object in geographical spaces, usually represented by a series of chronologically ordered points, where each point consists of a geo-spatial coordinate set and a timestamp [zheng2011computing]. Throughout the last few decades, many researchers utilized different types of trajectories to enhance the understandings of movement patterns of different moving objects. For example, in meteorology, many researchers tracked meteorological events such as hurricanes and typhoons for decades and analyzed them to prevent the loss from natural disasters [hubert1957hurricane, stohl1998computation]. Also, researchers in transportation engineering and urban planning are paying more attention to trajectory analytics and they analyzed patterns of pedestrians and vehicles to understand the mobility patterns of pedestrians and vehicles in cities. [boltes2013collecting, rudenko2020human].

Rapid advancements in location sensing and wireless communication technology have enabled us to collect and store a massive amount of spatial trajectory data [lee2011trajectory]. Over the last decade, considerable progress has been made in collecting, preprocessing, and analyzing spatial trajectory data. In transportation and urban planning, dealing with spatial trajectories is becoming increasingly important, because many moving objects, including pedestrians, vehicles, and drones, will be equipped with position-aware devices. Also, there will be more and more Internet of Things (IoT) devices that communicate with these position-aware devices and collect large amounts of spatial trajectory data. The collected spatial trajectory data are used to explore the patterns hidden behind the data, and the insights from these data are very useful for the planning and management of smart cities [belhadi2021deep, hu2019driving]. Spatial trajectories will be used to enhance the quality of life of people living in smart cities.

Of particular interest in transportation engineering, urban vehicle trajectory data are collected based on the location sensors installed inside vehicles or at the roadside. This high-resolution mobility data of individual users in urban transportation networks offer unprecedented opportunities to understand vehicle movement patterns in urban traffic networks. It provides rich information on both aggregated flows and disaggregated travel behaviors. The aggregated flows include the origin-destination (OD) matrix and cross-sectional link traffic volumes. The disaggregated travel behaviors include user-centric travel experiences, namely, speed profile, link-to-link route choice behavior and travel time experienced by individual vehicles, as well as system-wide spatiotemporal mobility patterns such as origin-destination pairs, routing pattern distributions, and network traffic states [kim2015spatial]. Discovering and understanding the network-wide mobility patterns from the urban vehicle trajectory data can support decision-making for both individual users and system operators.

Urban vehicle trajectories have both spatial and temporal features, which are structurally related to each other in the context of space and time. As a result, it is quite difficult to apply classical data mining techniques to urban vehicle trajectory data [wang2020deep], and many researchers have tried to come up with systematic ways to deal with the spatiotemporal features in such data. Early studies used statistical and machine-learning models [burbey2012survey, ebrahimpour2019comparison, shi2019survey, luca2020deep, xie2020urban]. These models achieved good results in solving many problems related to urban vehicle trajectories. Nowadays, researchers are paying more attention to models based on Deep Learning, motivated by the outstanding successes obtained in computer vision, speech recognition, and natural language processing [luca2020deep].

Many previous studies claim that Deep Learning has the potential to deal with complex problems in urban vehicle trajectory analytics [luca2020deep, wang2020deep]. Models based on deep learning have many advantages [goodfellow2016deep]. One of the key advantages is that deep learning models can deal more efficiently with heterogeneous and large data sources [chen2016learning, yue2020deep, guo2019ifusion]. Deep learning models have an automatic feature representation ability that extracts relevant features from the data. This ability makes it easier for deep learning models to combine raw urban vehicle trajectory data with contextual information such as weather, traffic states, traffic accidents, and census data. Also, deep learning models have a powerful function approximation ability, so they can capture complex and non-linear spatial, temporal, and sequential relationships.

Urban vehicle trajectory data are collected from GPS sensors installed in vehicles or from road-side units (RSUs) that detect vehicles passing nearby. After data collection, it is necessary to preprocess the data to smooth the effects of noise from the collection system and to reduce the size of the data so that it can be used properly. Among many preprocessing techniques, the discretization of continuous urban vehicle trajectory data is gaining much attention due to its ability not only to reduce the effects of local noise but also to improve the interpretability of the spatiotemporal features in urban vehicle trajectory data [garcia2012survey, kim2016graph]. Such discretization-based preprocessing methods include tessellation matching (using zones as a discrete set), map matching (using road links as a discrete set), and POI (Point-of-Interest) matching (using representative points as a discrete set). Particularly, map matching is considered very challenging since, unlike tessellation matching and POI matching, it cannot be modeled with simple proximity measures.

After preprocessing, the processed urban vehicle trajectory data are used in various applications. Urban vehicle trajectory analytics is gaining increasing attention from both academia and industry because of its potential to improve the performance of many applications in multiple domains. There are two widely studied research topics: next location prediction and synthetic trajectory generation. Many researchers study next location prediction due to its applicability to Location-based Services (LBS). LBS uses the location data of service users and provides user-specific information depending on the locations of the users. Typical examples of LBS are social event recommendation, location-based advertising, and location-based incident warning systems. One major advantage of next location prediction is that it provides LBS with extended resources by giving the predicted locations of the users. LBS can improve system reliability by providing more user-specific information that considers users' future locations [karimi2003predictive]. Also, synthetic trajectory generation is attracting researchers' attention as a way to address the data sparsity problem and data privacy issues. Although the sources and availability of urban trajectory data are increasing, most of the currently available trajectory datasets cover only a portion of all vehicles in the network. From network management and operations perspectives, there is a desire to infer vehicle trajectories that represent the whole population in order to have a more complete view of traffic dynamics and network performance. Moreover, urban vehicle trajectory data may contain personal information of individual drivers, which poses serious privacy concerns in relation to the disclosure of private information to the public or a third party [chow2011privacy]. The ability to generate synthetic trajectory data that can realistically reproduce the population mobility patterns is, therefore, highly desirable and expected to be increasingly beneficial to various applications in urban mobility.

Motivated by these research trends, the overall objective of this dissertation is to apply state-of-the-art deep learning solutions to resolve the issues in urban vehicle trajectory analytics. Specifically, this dissertation focuses on two research topics in urban vehicle trajectory analytics that are considered important and challenging.

  • Next location prediction can be defined as forecasting the next location of an individual vehicle based on historical data. Next location prediction has gained much attention from many researchers due to its applicability to many fields such as travel recommendation, location-based services, and location-aware advertisements.

  • Synthetic trajectory generation can be defined as generating synthetic trajectories with realistic spatiotemporal mobility patterns based on a historical trajectory dataset. Although the sources and availability of urban trajectory data are increasing, most of the currently available trajectory datasets cover only a portion of all vehicles in the networks. As a result, some kind of augmentation method is needed to represent the full population of vehicles. Also, urban vehicle trajectories contain personal information of individual drivers, such as the locations of their homes and workplaces.

1.2 Structure of Dissertation

The structure of this dissertation is organized as follows. Chapter 2 introduces the general framework of urban vehicle trajectory analytics and discusses each step in the framework by reviewing previous research related to that step. In Chapter 3, the two main research topics (next location prediction and synthetic trajectory generation), which are considered important and challenging based on the review in Chapter 2, are introduced. Chapter 3 covers the current challenges in each research topic and addresses the research approaches to resolve the issues identified in those challenges.

There are three main chapters: two for next location prediction and one for synthetic trajectory generation. Chapter 4 and Chapter 5 study deep learning applications in next location prediction. In Chapter 4, urban vehicle trajectories are summarized by clustering-based Voronoi tessellation and represented as sequences of cells (spatial tessellations), and a novel model based on recurrent neural networks (RNN) is proposed to predict the next location (next cell) of individual vehicles. In addition, Chapter 5 introduces an attention-based recurrent neural network (ARNN), which incorporates network-wide traffic states into next location prediction. Chapter 6 presents TrajGAIL for synthetic trajectory generation. TrajGAIL uses a partially observable Markov decision process (POMDP) and Generative Adversarial Imitation Learning (GAIL) to generate urban vehicle trajectories with realistic mobility patterns.

Chapter 7 provides the conclusion of the dissertation and a summary of each main research topic, with the contributions and limitations of the current study and future research directions in urban vehicle trajectory analytics. A graphical representation of the structure of this dissertation is presented in Figure 1.1.

Figure 1.1: A graphical representation of the structure of this dissertation

2.1 Urban Vehicle Trajectory Analytics

In this section, the preliminaries on urban vehicle trajectory data mining are explained. Over the last few decades, many researchers have dealt with urban vehicle trajectory data, and much of the literature follows a common general structure when doing so. Figure 2.1 shows the general framework of urban vehicle trajectory data analytics. This general framework has three steps: trajectory collection, trajectory preprocessing, and trajectory analytics.

Figure 2.1: General framework of urban vehicle trajectory data analysis

2.1.1 Urban Vehicle Trajectory Data Collection

The most common way to obtain urban vehicle trajectories is to collect the coordinates of the subject vehicle with GPS sensors installed in the vehicle. Global Navigation Satellite Systems (GNSS) are an essential source of human mobility data [luca2020deep]. The US Global Positioning System, or GPS, is one of the most well-known and widely used GNSS. GPS receivers are ubiquitous in many tools of everyday life, such as mobile phones [alessandretti2018evidence], vehicles [gallotti2016stochastic, pellungrini2017data], and vessels [praczyk2019ship]. Especially on vehicles, the GPS device automatically turns on when the vehicle starts. Also, nowadays, a mobile phone connected to the vehicle can work as a GPS device when the phone is running a navigation application. A typical GPS trace is a set of tuples (u, t, x, y), where u is a user, t is the timestamp of the measurement, and x and y are the longitude and latitude of the current position. The precision of GPS receivers varies from a few centimeters to meters, depending on the quality of the GPS receiver and the errors generated by the system [carlson2010mapping]. GPS is complex, and the errors arise from a wide variety of sources with different dependencies and characteristics. The raw GPS data are embedded with an inseparable error from the location positioning system. As a result, it is necessary to preprocess raw GPS data to mitigate these errors and extract meaningful semantics.

Another way of obtaining urban vehicle trajectory data is by using roadside units (RSUs) installed in cities. RSU-based urban vehicle trajectories are collected from infrastructure installed alongside the roads. The vehicles are equipped with communication devices, usually Bluetooth or Dedicated Short Range Communication (DSRC) devices. When a vehicle passes near the roadside infrastructure, the roadside unit records the timestamp, as the passage time of the vehicle, and the unique identifier assigned to the passing vehicle. The urban vehicle trajectories are obtained by tracing this unique identifier of the communication device. Each trajectory represents a sequence of the locations of the roadside infrastructure that a vehicle passed along its journey. The quality of RSU-based urban vehicle trajectories is highly dependent on the roadside units that detect the vehicles [michau2017bluetooth]. Ideally, roadside units would be installed at every intersection to collect a complete set of vehicle trajectories. However, due to cost constraints, many cities selectively install roadside units at major intersections. For example, Brisbane has Bluetooth scanners scattered along the road network. The coverage of these scanners is dense in the Central Business District (CBD) area but sparse in suburban areas. In areas where no scanner is installed, it is not possible to detect vehicles and analyze their movement patterns. Also, there are some problems in areas where the roadside units are densely installed. Sometimes, these roadside units have overlapping detection areas, which introduces errors in the sequence; for example, a vehicle may be detected by the downstream detector first and then by the upstream detector. Furthermore, the scanners installed in the roadside units can miss detections. According to [michau2017bluetooth], about 20% of detections are missing in the case of the Bluetooth scanners in Brisbane.

2.1.2 Urban Vehicle Trajectory Preprocessing

Both GPS-based and RSU-based urban vehicle trajectories contain meaningful spatiotemporal patterns in urban transportation networks, which can be used in a variety of applications. However, there are a number of problems to be resolved. If urban vehicle trajectories are collected at a high sampling rate, the massive amount of data leads to enormous overhead in data storage, communication, and processing. Also, as discussed in Section 2.1.1, urban vehicle trajectory data usually contain inevitable noise from the sensors and collection systems. Sometimes, the noise in the raw data reduces the effectiveness of systems that use such trajectories. As a result, many previous researchers have used different preprocessing techniques to filter out noisy points and reduce the size of the data.

One of the most common preprocessing techniques for urban vehicle trajectories is denoising by trajectory filtering. Since vehicle trajectories are not perfectly accurate due to sensor noise and other factors, various filtering techniques are applied to the trajectory to smooth the noise and potentially decrease the error in the measurements. The simplest form of filtering the noise from trajectories is using mean and median filters. For a measured point z_i, the estimate of the actual location x_i is the mean (or median) of z_i and its n-1 predecessors in time, where n is the window size. Although mean and median filters are both simple and powerful techniques for dealing with noise, one of their most significant disadvantages is that they introduce lag: if the actual location changes suddenly, the estimate from the mean or median filter cannot react as suddenly as the actual value and will only respond gradually. Kalman filters and particle filters use a measurement model and dynamic models to improve the accuracy of the estimates. Although it is not a simple task to formulate both measurement and dynamic models, Kalman filters and particle filters overcome the lag problem of mean and median filters.
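
To make the mean filter concrete, the following is a minimal sketch in Python; the window size n, the point layout, and the example coordinates are illustrative assumptions rather than values used in this dissertation.

```python
import numpy as np

def mean_filter(points, n=3):
    """Smooth a trajectory with a causal mean filter.

    points : array of shape (T, 2) with (longitude, latitude) rows.
    n      : number of points averaged (the current point and its n-1 predecessors).
    """
    points = np.asarray(points, dtype=float)
    smoothed = np.empty_like(points)
    for i in range(len(points)):
        window = points[max(0, i - n + 1): i + 1]   # current point and up to n-1 predecessors
        smoothed[i] = window.mean(axis=0)           # estimate of the actual location
    return smoothed

# Example: a noisy point at index 2 is pulled back toward its neighbors.
traj = [[153.021, -27.470], [153.022, -27.471], [153.030, -27.480], [153.024, -27.473]]
print(mean_filter(traj, n=3))
```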

Another way of preprocessing raw vehicle trajectories is to convert the continuous values of the data points into a finite set of discrete values. Discretization of continuous features is one of the common techniques in data mining. Through discretization, it is possible to smooth local noise and reduce the size of the raw data [pyle1999data]. Furthermore, a proper discretization increases the interpretability of the features extracted from the data [garcia2012survey]. The discretization process transforms quantitative data into qualitative data. In other words, it transforms continuous or numerical attributes into discrete or nominal attributes with a finite number of intervals, obtaining a non-overlapping partition of a continuous domain.

There are several types of discretization one can use when dealing with urban vehicle trajectory data. The first approach is to partition the transportation network into zones, sometimes called cells, and use the sequence of zones (or cells) instead of the raw vehicle trajectories. Matching continuous coordinates to predefined tessellations (zones) is called "tessellation matching." The second approach is to use the links in the road network as the discrete set and use the sequence of links instead of the raw vehicle trajectories. Matching continuous coordinates to road links is called "map matching." Map matching has been widely studied by many researchers due to its importance in urban vehicle trajectory analytics. The last approach is to define Points-of-Interest (POIs) first and use the POIs as the representative locations for the continuous coordinates. The process of matching coordinates to POIs is called "POI matching," and usually researchers select the nearest POI for each coordinate. There is certainly a trade-off when using a sequence of aggregated discrete zones or cells because it loses microscopic features such as the speed profile of the subject vehicle within a link. However, when dealing with a massive amount of vehicle trajectory data in large-scale urban traffic networks, it is desirable to use zones and links because it is easier to analyze the spatiotemporal patterns of the urban traffic networks.
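
As an illustration of the simplest of these schemes, the sketch below performs a naive nearest-neighbor POI matching; the toy POI list and the use of plain Euclidean distance in coordinate space are simplifying assumptions, not the matching procedure used later in this dissertation.

```python
import numpy as np

def poi_matching(trajectory, pois):
    """Map each continuous coordinate to the index of its nearest POI.

    trajectory : array of shape (T, 2) of (x, y) coordinates.
    pois       : array of shape (P, 2) of representative POI coordinates.
    Returns a discrete sequence of POI indices (one per trajectory point).
    """
    trajectory = np.asarray(trajectory, dtype=float)
    pois = np.asarray(pois, dtype=float)
    # Pairwise Euclidean distances between trajectory points and POIs.
    dists = np.linalg.norm(trajectory[:, None, :] - pois[None, :, :], axis=-1)
    return dists.argmin(axis=1)

pois = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
traj = [[0.1, 0.0], [0.8, 0.1], [0.9, 0.9]]
print(poi_matching(traj, pois))   # e.g. [0 1 2]
```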

In fact, choosing the right discretization method depends on the scale of the analytics that a researcher would like to conduct. Tessellation matching can be beneficial for preprocessing urban vehicle trajectory data for large-scale network analytics, while map matching can be beneficial for analyzing movement patterns in smaller networks. POI matching can be used at both scales, because the same process applies when deciding which POIs to use (distantly located POIs for large-scale networks, and closely located POIs for small networks). Examples of analytics using tessellation matching for large-scale networks include [krumm2006predestination, krumm2007predestination, calabrese2010human, endo2017predicting, choi2018network, choi2019attention], and examples of analytics using map matching for small networks include [horvitz2012some, ziebart2008maximum, ziebart2008navigate, choi2019real, chen2021trajvae].

2.1.3 Urban Vehicle Trajectory Analytics

Applying data analytics to urban vehicle trajectory data makes it possible to discover complex patterns in urban vehicle mobility and obtain deeper insights into travel behaviors and traffic dynamics. It also allows road operators and transit agencies to identify opportunities to improve their systems. The high-resolution mobility data of individual users in urban road networks offer unprecedented opportunities to understand vehicle movement patterns in urban traffic networks. They provide rich information on both aggregated flows and disaggregated travel behaviors. The aggregated flows include the origin-destination (OD) matrix and cross-sectional link traffic volumes. The disaggregated travel behaviors include user-centric travel experiences, namely, the speed profile, link-to-link route choice behavior, and travel time experienced by individual vehicles, as well as system-wide spatiotemporal mobility patterns, such as origin-destination pairs, routing pattern distributions, and network traffic states [kim2015spatial]. In recent years, urban vehicle trajectory analytics has gained increasing attention from both academia and industry because of its capability to analyze the mobility patterns of vehicles in cities at different scales. Urban vehicle trajectory analytics gives detailed information on vehicle mobility, covering both patterns of individual vehicles and patterns of aggregated traffic flow. In contrast to conventional traffic data analytics, which focuses on data collected at fixed locations, urban vehicle trajectory analytics includes analytics on both individual vehicles and aggregated traffic flow.

There are several examples of applications of urban vehicle trajectory analytics. One example is Location-based Services (LBS). LBS uses the location data of service users and provides user-specific information depending on their locations. Typical examples of LBS are social event recommendation, location-based advertising, and location-based incident warning systems. Location prediction can be applied to provide information predictively; for example, if a user's next location is expected to be disastrous or congested, the service informs the user to change route. Furthermore, when it is not possible to continue a service because the position of the user is lost due to sensor malfunction, predicting the locations of the user can temporarily replace the role of the positioning system and keep the service running [monreale2009wherenext, morzy2007mining]. Another example is the application to agent-based traffic simulators. Unlike traditional traffic simulators, which consider traffic demand as input, an agent-based traffic simulator requires information on individual vehicle journeys such as origin, destination, and travel routes [martinez_agent-based_2015]. The results of urban vehicle trajectory analytics can be used for real-time applications of these agent-based traffic simulators. Urban vehicle trajectory analytics can also be applied to inter-regional traffic demand forecasting. As the ride-sharing market continuously grows and Shared Autonomous Vehicles (SAVs) are expected to be on our roads in the near future, there is a strong need to predict inter-regional traffic demand so as to dispatch the proper number of SAVs to areas of high demand. A location prediction model can be used to identify demand hotspots by learning the mobility patterns of users.

There are several research topics in urban vehicle trajectory analytics; next location prediction and synthetic trajectory generation are the two most widely studied. Next location prediction aims to develop a model to predict the future location of a vehicle based on historical trajectory data. Predicting future locations based on previously visited locations has been widely studied in terms of predicting the location where the user will visit next [noulas2012mining, gambs_next_2012, mathew2012predicting], the location where the user ends the trip [krumm2006predestination, krumm2007predestination, horvitz2012some, xue2015solving, ziebart2008navigate, marmasse2002user], and the location where the user will visit in the next time interval [hawelka2015collective, alhasoun2017city, lu2013approaching, calabrese2010human, zhao2017mobility].

The first and second types understand individual trips as a sequence of locations, similar to the one explained previously. The last type predicts the location that the user will visit in the next time interval, which is usually set to an hour. This may be widely applicable since it adapts to the temporal characteristics of mobility; however, such a task requires frequent updates of the user's actual location. Also, one of the major problems is that most trips in urban areas end in less than 30 minutes to an hour, as the majority of trips are for commuting or visiting a commercial area. Therefore, it is hard to distinguish whether the users are still traveling or staying. These methods are modeled to solve more macroscopic trips than city-scale problems. For example, in [zhao2017mobility], the authors presented an N-gram model to predict the trip time and the entry and exit stations. They used the Oyster entry and exit records collected from the London Underground, Overground, and National Rail.

There are several previous studies that used machine-learning models to predict the future location or the destination of a trip. One of them, [gambs_next_2012], used a Mobility Markov Chain to predict the next location of an individual. The research was based on observations of an individual's mobility, so the model must be specified for each individual. Also, in [mathew2012predicting], a Hidden Markov Model is used to predict pedestrian movement using the GeoLife dataset. The Hidden Markov Model computes a latent state at each step of the sequence, which maximizes the likelihood of the input sequence. Usually, the number of latent states or the number of clusters is given. The Hidden Markov Model calibrates the transition matrix among the latent states and the emission probability to decode latent states into observable sequences. Some previous studies also tried to use Artificial Neural Networks (ANN) in trajectory prediction. Recent work by [de2015artificial] includes a study on the prediction of taxi destinations using a Multilayer Perceptron (MLP). They represented the destination as a linearly weighted combination of predefined destination clusters. The results showed that the overall distance error is considered negligible; however, it is pointed out that it is challenging to predict unpopular destinations.

With the recent development of deep neural networks, including RNN models, and increased computational power, there has been research in the transportation field to predict microscopic vehicle locations for autonomous vehicles [kim2017probabilistic] and to predict mobility sequences [endo2017predicting, liu2016predicting]. The research in [kim2017probabilistic] used an RNN with Long Short-Term Memory (LSTM) to predict the movement of the vehicle in front of a subject vehicle. [endo2017predicting] used an RNN to predict destinations; the trajectory sequence is represented as a sequence of locations in a discretized grid space, which is an arbitrary network partitioning. [choi2019real] used a feed-forward neural network to predict the next intersection in a grid-structured road network, where a set of intersections in Brisbane, Australia, are treated as POIs to capture link-to-link route choice behavior. [jin2019augmented] used an augmented-intention recurrent neural network model to predict locations of vehicle trajectories of individual users, incorporating additional information on individual users' historical records of frequently visited locations into a next location prediction model. The previously visited locations in historical records are represented as an edge-weighted graph, and a graph convolution network is used to incorporate this information into trajectory prediction. In [choi2018network], an urban road network is partitioned into zones based on the clustering of trajectory data points, and a prediction model based on recurrent neural networks (RNN) is proposed to predict the zone that the subject vehicle will visit. [choi2019attention] extended the idea of predicting the next zone and used network traffic state information to improve the RNN model's performance.

Synthetic trajectory generation aims to develop a model to generate synthetic (fake) trajectories with realistic spatiotemporal mobility patterns. Synthetic data generation has gained increasing importance as the data generation process plays a significant role in various research fields in a data-driven world [popic2019data]. It mainly serves two purposes. The first is to deal with the lack of real data. In many research fields, data collection is costly, and, therefore, it is often difficult to collect enough data to train and validate models properly. In this case, it is useful to generate synthetic data similar to the real observations to increase the number of training and test samples. The second purpose is to address the privacy and confidentiality issues of real data. Many types of data contain personal information, such as gender, name, and credit card usage. Synthetic data can be combined with or replace such privacy-sensitive data with a reasonable level of similarity, thereby protecting privacy while serving the intended analysis.

Urban vehicle trajectory analytics faces both challenges: data sparsity and data privacy. Although the sources and availability of urban trajectory data are increasing, most of the currently available trajectory datasets cover only a portion of all vehicles in the network. From network management and operations perspectives, there is a desire to infer vehicle trajectories representing the whole population to have a complete view of traffic dynamics and network performance. Moreover, urban vehicle trajectory data may contain personal information of individual drivers, which poses serious privacy concerns in relation to the disclosure of private information to the public or a third party [chow2011privacy]. Therefore, the ability to generate synthetic trajectory data that can realistically reproduce the population mobility patterns is highly desirable and expected to be increasingly beneficial to various applications in urban mobility.

While synthetic trajectory generation is a relatively new topic in the transportation research community, several existing research areas have addressed similar problems. One example is trajectory reconstruction. When two points in a road network are given as an initial point (treated as a sub-origin) and a target point (treated as a sub-destination), the models reconstruct the most plausible route between the two points. Trajectory reconstruction can thus be considered as generating trajectories between sub-origins and sub-destinations. Previous studies such as [chen2011discovering] and [hu2018graph] investigated discovering the most popular routes between two locations. [chen2011discovering] first constructed a directed graph to simplify the distribution of trajectory points and used a Markov chain to calculate the transfer probability to each node in the directed graph. The transfer probability is used as an indicator of how popular the node is as a destination, and the route popularity is calculated from the transfer probability of each node. [hu2018graph] also used a graph-based approach to construct popular routes. The check-in records, which contain the routes' attributes, are analyzed to divide the whole space into zones. Then, the historical probability is used to find the most plausible zone sequences. Also, [feng2015vehicle] and [rao2018origin] estimated origin-destination patterns by using trajectory reconstruction. Both studies used particle filtering to reconstruct vehicle trajectories between two points in automatic vehicle identification data. The reconstructed vehicle trajectories are then used to estimate the real OD matrix of the road network.

In fact, the existing models developed for the next location prediction problem can be applied for synthetic trajectory data generation. By sequentially applying the next location predictions, a synthetic vehicle trajectory can be generated. However, most of the existing models for next location prediction adopt a discriminative modeling approach, where the next locations are treated as labels, and the model is trained to predict only one or two next locations. The discriminative models have limitations in generating full trajectories, especially when sample trajectory data are sparse. It is only the decision boundaries between the labels that the models are trained to predict, not the underlying distributions of data that allow proper generalization for sampling realistic trajectories. As a result, it is necessary to develop a model based on the generative modeling approach to perform synthetic trajectory data generation successfully.

Recently, there have been remarkable breakthroughs in generative models based on deep learning. In particular, [goodfellow2014generative] introduced a new generative model called Generative Adversarial Networks (GAN), which addressed inherent difficulties of deep generative models associated with intractable probabilistic computations in training. GANs use an adversarial discriminator to distinguish whether a sample is from real data or from synthetic data generated by the generator. The competition between the generator and the discriminator is formulated as a minimax game. As a result, when the model has converged, the optimal generator produces synthetic sample data similar to the original data. The generative adversarial learning framework is used in many research fields such as image generation [radford2015unsupervised], audio generation [oord2016wavenet], and molecular graph generation [de2018molgan].
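
For reference, the minimax objective of the original GAN formulation [goodfellow2014generative] can be written as follows, where G is the generator, D the discriminator, p_data the data distribution, and p_z the noise prior:

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right] +
  \mathbb{E}_{z \sim p_{z}(z)}\!\left[\log\!\left(1 - D(G(z))\right)\right]
```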

GANs have also been applied in transportation engineering. [zhang2019novel] proposed a trip travel time estimation framework called T-InfoGAN based on generative adversarial networks. They used a dynamic clustering algorithm with the Wasserstein distance to form clusters of link pairs with similar travel time distributions and applied an Information Maximizing GAN (InfoGAN) to travel time estimation. [xu2020ge] proposed a Graph-Embedding GAN (GE-GAN) for road traffic state estimation. Graph embedding is applied to select the most relevant links for estimating a target link, and a GAN is used to generate the road traffic state data of the target link. In [li2020coupled], a GAN is used as a synthetic data generator for GPS data and travel mode label data. To solve the sample size problem and the label imbalance problem of a real dataset, the authors used a GAN to generate fake GPS data samples for each travel mode label to obtain a large, balanced training dataset. The generative adversarial learning framework is also used for synthetic trajectory generation. [liu2018trajgans] proposed a framework called trajGANs. Although this study does not include specific model implementations, it discusses the potential of generative adversarial learning in synthetic trajectory generation. Inspired by [liu2018trajgans], [rao2020lstm] proposed LSTM-TrajGAN with specific model implementations. The generator of LSTM-TrajGAN is similar to the RNN models adopted in next location prediction studies.

Previous studies that do not use deep neural networks can be categorized as "probabilistic" models or "pattern-matching" models. The probabilistic models, or Markov-based models, use Markov assumptions to model the probability distribution of the next location (or a trajectory); common examples are [gambs2010show, gambs_next_2012, calabrese2010human]. The pattern-matching models use tree structures to find similar patterns in the historical dataset; examples are [monreale2009wherenext, wang2013mining, xia2018decision]. Although these models have some degree of interpretability and can achieve good performance with a small amount of data, one of the major disadvantages of this approach is that they require considerable effort in feature engineering and have limited memory, making it hard for them to capture long-range temporal dependencies [sabarish2015survey]. On the other hand, recent approaches using deep learning models can overcome these disadvantages by exploiting the strong function approximation and pattern recognition abilities of deep neural networks.

Reference | Data | Preprocessing | Model | Analytics
[noulas2012mining] | Foursquare (GPS-based) | - | Mobility feature-based model | Next location prediction
[gambs_next_2012] | Phonetic, GeoLife (GPS-based) | POI matching | Mobility Markov Chain | Next location prediction
[mathew2012predicting] | GeoLife (GPS-based) | - | Hidden Markov Model | Next location prediction
[krumm2006predestination] | Microsoft Multiperson Location Survey | Square grid tessellation | Predestination | Destination prediction
[horvitz2012some] | Seattle GPS data | Map matching | Opportunistic routing | Destination prediction
[xue2015solving] | T-drive | - | SubSyn | Destination prediction
[ziebart2008maximum] | Yellow Cab Taxi data | Map matching | MaxEnt | Next location prediction
[marmasse2002user] | GPS-based | - | Bayes Classifier / Histogram Modeling / Hidden Markov Model | Next location prediction
[alhasoun2017city] | CDR (RSU-based) | POI matching | Dynamic Bayesian Networks | Next location prediction
[lu2013approaching] | GPS-based | - | Entropy-based model | Next location prediction
[calabrese2010human] | AirSage (GPS-based) | Square grid tessellation | Individual and collective behavior modeling | Next location prediction
[zhao2017mobility] | GPS-based | - | Bayesian n-gram | Next location prediction
[kim2017probabilistic] | GPS-based | Grid tessellation | Recurrent Neural Networks | Next location prediction
[endo2017predicting] | Taxi service trajectory, GeoLife (GPS-based) | Square grid tessellation | Recurrent Neural Networks | Destination prediction
[liu2016predicting] | Gowalla, GTD (GPS-based) | POI matching | Factorizing Personalized Markov Chain | Next location prediction
[jin2019augmented] | WiFi sensor (RSU-based), Foursquare (GPS-based) | - | Augmented Intent Neural Network | Next location prediction
[choi2018network] | Brisbane Bluetooth data (RSU-based) | Clustering-based Voronoi tessellation | Recurrent Neural Networks | Next location prediction
[choi2019attention] | Brisbane Bluetooth data (RSU-based) | Clustering-based Voronoi tessellation | Attention-based Recurrent Neural Networks | Next location prediction
[choi2019real] | Brisbane Bluetooth data (RSU-based) | POI matching | Multi-layer Perceptron | Next location prediction
[zhang2019novel] | Didi Chuxing (GPS-based) | Map matching | T-InfoGAN | Travel time estimation
[rao2020lstm] | New York data (GPS-based) | POI matching | LSTM-TrajGAN | Synthetic trajectory generation
[chen2021trajvae] | GAOTONG (GPS-based) | Map matching | TrajVAE | Synthetic trajectory generation
Table 2.1: Examples of research on urban vehicle trajectory analytics

3.1 Research Problems and Research Approaches

3.1.1 Next Location Prediction

Next location prediction can be defined as forecasting the next location of an individual vehicle based on the historical data. Next location prediction has gained much attention from many researchers due to its applicability to many fields such as travel recommendation, location-based services, and location-aware advertisements.

There are mainly two challenges in next location prediction, as follows:

  • Design of a dense representation of temporal and spatial characteristics of urban vehicle trajectories
    The mobility pattern in human mobility, including urban vehicle trajectories, is characterized by a high degree of regularity, which is mainly encoded in the temporal order of the visitation patterns [song2010limits]. It is necessary to design a dense representation of the temporal and spatial patterns embedded in the urban vehicle trajectory in order to predict the next location. A proper representation of these spatiotemporal patterns makes it easier to understand the structurally related features in urban vehicle trajectory data (a minimal embedding sketch is given after this list).

  • Need for combining heterogeneous data sources to model multiple factors influencing next location prediction
    Although the temporal order of the visitation patterns is mainly used for next location prediction in many previous studies [gambs_next_2012, ziebart2008navigate, choi2018network], human drivers consider other factors in deciding where to go next and which route to choose. These external factors include traffic states, trip purposes, weather conditions, and social contacts [luca2020deep]. As a result, it is necessary to combine heterogeneous data sources with the next location prediction.

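As a minimal sketch of such a dense representation, the snippet below maps discrete cell IDs to learnable dense vectors; PyTorch is used for illustration, and the vocabulary size and embedding dimension are assumed values, not those of the proposed models.

```python
import torch
import torch.nn as nn

num_cells = 500        # assumed number of cells after tessellation
embedding_dim = 64     # assumed size of the dense representation

# Learnable lookup table: each discrete cell ID maps to a dense vector.
cell_embedding = nn.Embedding(num_embeddings=num_cells, embedding_dim=embedding_dim)

# A trajectory expressed as a sequence of cell IDs (batch of one).
cell_sequence = torch.tensor([[12, 45, 45, 78, 102]])
dense_sequence = cell_embedding(cell_sequence)      # shape: (1, 5, 64)
print(dense_sequence.shape)
```
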
To address these issues in next location prediction, this dissertation proposes three research approaches as follows:

  • Spatial feature extraction via clustering-based Voronoi tessellation Given massive amounts of vehicle trajectories, there will be an infinite number of possible data points used to describe all those trajectories, as longitude and latitude coordinates are continuous in space. Also, RSU-based urban vehicle trajectories are sensitive to noise from the collection system, so the raw data must be pre-processed before being used for next location prediction. As such, we partition an urban traffic network into smaller regions or cells and express each urban vehicle trajectory in terms of a sequence of cells that it has passed. In partitioning the network into cells, we use the method based on [kim2016graph]. In this method, the data points of all trajectories are combined and clustered in space based on a desired radius, denoted by r, so that for each spatial cluster the distance between the centroid of the point cluster and its farthest member point is approximately r. The centroid of each point cluster is estimated by finding the mean of the data points within the cluster. Once the centroids of all point clusters are obtained, a Voronoi tessellation method is used to construct cell boundaries (Voronoi polygons) using the centroid points as seeds. Through clustering the data points in the urban vehicle trajectory dataset, partitioning the network and representing urban vehicle trajectories as cell sequences can be understood as a way of spatial feature extraction from urban vehicle trajectory data (a brief sketch of this step is given after this list).

  • Recurrent Neural Networks to model spatiotemporal relationship This study employs a deep learning method using Recurrent Neural Network (RNN) among various methods for sequence prediction. RNN [hochreiter1997long, cho2014learning, chung2014empirical] is a deep neural network system designed to use sequential information. Unlike other traditional deep neural network models, which assume independence among all inputs (and outputs), RNN can capture temporal dependencies in sequential data. Thus, it is suitable for performing tasks that require memories of previous inputs. As a result, RNN can be used to model spatiotemporal relationships by using cell sequences that have aggregated spatial information of urban vehicle trajectory.

  • Attention mechanism to incorporate heterogeneous data source Nowadays, drivers can easily access network traffic state data via navigation apps on smartphones. The network traffic state is one of the most important factors for drivers when planning their journeys and choosing routes. As a result, it is desirable to incorporate network-wide traffic state information into the next location prediction. Network-wide traffic state data is a heterogeneous data source compared to urban vehicle trajectories represented as cell sequences, so it is necessary to design a systematic way to structurally link this heterogeneous input to the next location prediction. The attention mechanism [bahdanau2014neural] can be used to resolve this issue. The attention mechanism allows the next location prediction model to concentrate on a certain part of the network traffic state input and use that information for the next location prediction.

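A rough sketch of the tessellation step in the first approach is given below; the greedy radius-based clustering and the use of SciPy's Voronoi routine are simplifying assumptions for illustration, not the exact procedure of [kim2016graph].

```python
import numpy as np
from scipy.spatial import Voronoi, cKDTree

def radius_clusters(points, r):
    """Greedy clustering: each point joins the first centroid within radius r,
    otherwise it seeds a new cluster (an approximation for illustration)."""
    centroids, members = [], []
    for p in points:
        if centroids:
            d = np.linalg.norm(np.asarray(centroids) - p, axis=1)
            j = d.argmin()
            if d[j] <= r:
                members[j].append(p)
                centroids[j] = np.mean(members[j], axis=0)   # update centroid
                continue
        centroids.append(p.copy())
        members.append([p])
    return np.asarray(centroids)

points = np.random.rand(1000, 2)          # stand-in for trajectory data points
seeds = radius_clusters(points, r=0.1)    # cluster centroids used as Voronoi seeds
vor = Voronoi(seeds)                      # cell boundaries (Voronoi polygons)

# Map every trajectory point to its cell by nearest seed, yielding a cell sequence.
cell_ids = cKDTree(seeds).query(points)[1]
```
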
Chapter 4 and Chapter 5 present the specific details of these three research approaches. Chapter 4 presents a Recurrent Neural Network (RNN) model for urban vehicle trajectory prediction with spatial feature extraction via clustering-based Voronoi tessellation, and analyzes the performance of the RNN model at both the sequence level and the aggregated region level. Chapter 5 presents an attention-based Recurrent Neural Network (ARNN) model which incorporates network-wide traffic state information into the RNN model developed in Chapter 4.

3.1.2 Synthetic Trajectory Generation

Synthetic trajectory generation can be defined as generating synthetic trajectories with realistic spatiotemporal mobility patterns based on a historical trajectory dataset. Although the sources and availability of urban trajectory data are increasing, most of the currently available trajectory datasets cover only a portion of all vehicles in the networks. As a result, some kind of augmentation method is needed to represent the full population of vehicles. Also, urban vehicle trajectories contain personal information of individual drivers, such as the locations of their homes and workplaces.

There are mainly two challenges in synthetic trajectory generation as follows:

  • Capturing the temporal and spatial patterns of vehicle trajectories
    Generating urban vehicle trajectories requires understanding the underlying distribution of the urban vehicle trajectories by capturing the temporal and spatial patterns in the dataset. As discussed for next location prediction, modeling urban vehicle trajectories requires a dense representation of the spatiotemporal patterns in the input dataset. Therefore, it is important to find a suitable learning structure to reflect spatiotemporal characteristics in synthetic trajectory generation.

  • Capturing both patterns as an individual and patterns as a group
    The objective of synthetic trajectory generation is to generate urban vehicle trajectories that are similar to the real vehicle travel paths observed in a road traffic network. The "similarity" between the real vehicle trajectories and the synthetic vehicle trajectories can be defined from two different perspectives. First, the trajectory-level similarity measures the similarity of an individual trajectory to a set of reference trajectories; for instance, the probability of accurately predicting the next locations (single or multiple consecutive locations, as well as the alignment of the locations) is an example of a trajectory-level similarity measure. Second, the dataset-level similarity measures the statistical or distributional similarity over a trajectory dataset. This type of measure aims to capture how closely the generated trajectory dataset matches statistical characteristics such as origin-destination (OD) and route distributions in the real vehicle trajectory dataset (a toy sketch of such a dataset-level comparison is given after this list).

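For illustration, a dataset-level comparison could be sketched as below, using the Jensen-Shannon distance between route distributions; the toy routes and the choice of metric are assumptions and may differ from the measures used later in this dissertation.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Each trajectory is a sequence of discrete locations; a "route" is the full sequence.
real = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"]]
fake = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"]]

routes = sorted({" ".join(t) for t in real + fake})          # shared support

def route_probs(trajs):
    """Empirical probability of each route in a trajectory dataset."""
    keys = [" ".join(t) for t in trajs]
    return np.array([keys.count(r) for r in routes], dtype=float) / len(keys)

# Jensen-Shannon distance between real and generated route distributions (0 = identical).
print(jensenshannon(route_probs(real), route_probs(fake)))
```
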
To address these issues in synthetic trajectory generation, this dissertation proposes three research approaches as follows:

  • Generative Adversarial Imitation Learning to learn various patterns from the given dataset
    We apply imitation learning to develop a generative model for urban vehicle trajectory data. Imitation learning is a sub-domain of reinforcement learning for learning sequential decision-making behaviors or "policies." Unlike reinforcement learning, which uses "rewards" as signals for positive and negative behavior, imitation learning learns directly from sample data, so-called "expert demonstrations," by imitating and generalizing the expert's decision-making strategy observed in the demonstrations. Nowadays, the development of many generative models has made it possible to capture the complex distribution of a dataset. In particular, deep generative models such as generative adversarial networks (GAN) [goodfellow2014generative] show outstanding performance in reproducing images [radford2015unsupervised]. Generative Adversarial Imitation Learning (GAIL) [ho2016generative] is a variant of GAN applied to an imitation learning problem. Let us consider an urban vehicle trajectory as a sequence of decisions for choosing road links along a travel path. GAIL can be applied to develop a generator that reproduces synthetic data by imitating the decision-making process (i.e., the driver's route choice behavior) demonstrated in the observed trajectory dataset.

  • Assuming a partially observable MDP to understand spatiotemporal patterns from previous locations
    GAIL, proposed by [ho2016generative], combines the idea of inverse reinforcement learning (IRL), which learns the experts' underlying reward function, with the generative adversarial framework. GAIL effectively addresses a major drawback of IRL, which is its high computational cost. However, the standard GAIL has limitations when applied to the vehicle trajectory generation problem because it is based on the IRL concept that only considers a vehicle's current position as the state in modeling its next locations [ziebart2008maximum, ziebart2008navigate, zhang2019unveiling], which is not realistic, as a vehicle's location choice depends not only on the current position but also on the previous positions. To overcome these limitations, this study proposes a new approach that combines a partially observable Markov decision process (POMDP) with the GAIL framework. The POMDP can map the sequence of location observations into a latent state, thereby allowing more generalization of the state definition and incorporating the information of previously visited locations in modeling the vehicle's next locations.

  • Performance evaluation to assess both trajectory-level similarity and distributional similarity of datasets
    This study proposes a multi-level performance evaluation that includes both trajectory-level and dataset-level performance metrics to assess the model's performance. In the trajectory-level evaluation, we measure how similar each generated vehicle trajectory is to a real trajectory. Two widely used evaluation metrics in sequence modeling are used to evaluate this trajectory-level similarity: the BLEU score [papineni_bleu:_2002] and the METEOR score [banerjee_meteor:_2005] (a small example of the BLEU computation is given after this list). The statistical similarity between a generated trajectory dataset and a real trajectory dataset is assessed in the dataset-level evaluation. Many aspects of a dataset can be considered for statistical similarity, such as the distributions of trajectory length, origin, destination, origin-destination pair, and route. Among these variables, the route distribution is the most difficult to match, since producing a similar route distribution requires matching all other variables, including the lengths, origins, and destinations of vehicle trajectories in the real dataset. As such, we use a measure of route distribution similarity to evaluate dataset-level model performance.

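As a small illustration of the trajectory-level metric, the sketch below scores a generated location sequence against a real one using NLTK's sentence-level BLEU; the toy link IDs, default weights, and smoothing choice are assumptions.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Trajectories expressed as sequences of discrete link (or cell) IDs.
reference = [["L1", "L4", "L7", "L9", "L12"]]     # real trajectory (list of references)
generated = ["L1", "L4", "L7", "L10", "L12"]      # synthetic trajectory

smooth = SmoothingFunction().method1              # avoids zero scores on short sequences
score = sentence_bleu(reference, generated, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```
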
Chapter 6 presents the specific details of these research approaches. It presents TrajGAIL, a generative adversarial imitation learning framework for generating urban vehicle trajectories. In TrajGAIL, the generation procedure of urban vehicle trajectories is formulated as an imitation learning problem based on a Partially Observable Markov Decision Process (POMDP), which can effectively deal with sequential data, and this imitation learning problem is solved using GAIL, which enables trajectory generation that can scale to large road network environments.

4.1 Introduction

Large-scale mobility data that record detailed movements or trajectories of people and vehicles have become increasingly available in recent years. A trajectory in this study refers to a sequence of locations and passage times describing the path that a vehicle follows along its journey. By applying data analytics to large-scale trajectory datasets, it is possible to discover complex patterns in human mobility and obtain deeper insights into travel behaviors and traffic dynamics, allowing road operators and transit agencies to identify opportunities to improve their systems. It is also possible to build predictive models for predicting the movement patterns of travelers. In this study, we address the problem of predicting individual trajectories in an urban network using a data-driven approach based on deep learning.

Network-wide trajectory prediction aims to predict movements of individual vehicles across the network by predicting where each vehicle may be going next at a given time based on where it is now and how it got there. Viewing each trajectory as a sequence of locations, where locations can be defined at various spatial resolutions such as link-level (e.g., links and intersections) and region-level (e.g., areas and geographic subdivisions), the trajectory prediction can be considered as a sequence prediction problem, in which we wish to predict the next location in a sequence given the previous locations visited.

Among various methods for sequence prediction, this study employs a deep learning method using Recurrent Neural Networks (RNN). RNN [hochreiter1997long, cho2014learning, chung2014empirical] is a deep neural network architecture designed to make use of sequential information. Unlike traditional deep neural network models, which assume independence among all inputs (and outputs), RNN can capture temporal dependencies in sequential data and thus is suitable for performing tasks that require memories of previous events. As such, RNN has shown great performance in learning patterns in sequential data, particularly in language modeling applications such as auto-texting, text recommendation, speech recognition, and automatic translation.

Trajectory prediction has similarities with the problem of predicting words and sentences in language modeling. In sentence prediction, a large number of sentence sets are collected and words are extracted. A language model is trained to learn word sequence patterns from sentence data and predicts a certain word that would come next given a sequence of words. In the context of movement data, we can view a set of locations in a network as a set of words in a dictionary and a trajectory (location sequence) as a sentence (word sequence), thereby linking the problem of predicting the next location in a trajectory to that of predicting the next word in a sentence. Motivated by this idea, this study applies an RNN method that has been successfully applied in sequence prediction to solve our problem of trajectory prediction by adapting it to urban movement data.

4.2 Related Research

Predicting future trips based on previously visited locations has been widely studied in terms of predicting the location a user will visit next [noulas2012mining, gambs_next_2012, mathew2012predicting], the location where the user ends the trip [krumm2006predestination, krumm2007predestination, horvitz2012some, xue2015solving, ziebart2008navigate, marmasse2002user], and the location the user will visit in the next time interval [hawelka2015collective, alhasoun2017city, lu2013approaching, calabrese2010human, zhao2017mobility]. The first two formulations treat individual trips as sequences of locations, similar to the formulation described previously. The last one predicts the location that the user will visit in the next time interval, which is usually set to an hour. This formulation is widely applicable since it captures the temporal characteristics of mobility; however, it requires frequent updates of the user’s actual location. Another major problem is that most urban trips end within 30 minutes to 1 hour, as the majority of trips are for commuting or visiting a commercial area, so it is hard to distinguish whether users are still traveling or have stopped. In fact, these methods are designed for trips at a more macroscopic scale than city-scale problems. For example, in [zhao2017mobility], the authors presented an N-gram model to predict the trip time and the entry and exit stations, using Oyster entry and exit records collected from the London Underground, Overground and National Rail.

Several previous studies used machine-learning models to predict the future location or the destination of a trip. One of them, [gambs_next_2012], used a Mobility Markov Chain to predict the next location of an individual. That research was based on observations of each individual’s mobility, so a separate model must be specified for each individual. In this research, however, we aim to build a trajectory prediction model for more general purposes in both microscopic and macroscopic perspectives, so a more generalized model trained on aggregated data is used. Also, in [mathew2012predicting], a Hidden Markov Model is used to predict pedestrian movement using the GeoLife dataset. A Hidden Markov Model computes a latent state at each sequence step that maximizes the likelihood of the observed input sequence. Usually, the number of latent states (or the number of clusters) is given, and the Hidden Markov Model calibrates the transition matrix among the latent states and the emission probabilities that decode latent states into observable sequences.

Some previous studies also used Artificial Neural Networks for trajectory prediction. Recent work by [de2015artificial] includes a study on the prediction of taxi destinations using a Multilayer Perceptron (MLP). The destination is represented as a linearly weighted combination of predefined destination clusters. The results showed a small overall distance error; however, the authors pointed out that unpopular destinations are difficult to predict.

With the recent development of deep neural networks, including RNN models, and increased computational power, there has been research in the transportation field on predicting microscopic vehicle positions for autonomous vehicles [kim2017probabilistic] as well as mobility sequences [endo2017predicting, liu2016predicting]. The research in [kim2017probabilistic] used an RNN with Long Short-Term Memory (LSTM) to predict the movement of the vehicle in front of a subject vehicle. Endo et al. [endo2017predicting] used an RNN to predict destinations, representing each trajectory as a sequence of locations in a discretized grid space, which is an arbitrary partitioning of the network. In this study, in contrast, we use a network partitioning method derived from the vehicle trajectory data itself.

4.3 Methodology

4.3.1 Representing Urban Vehicle Trajectories as Cell Sequence Data

Figure 4.1: Representing urban vehicle trajectory as cell sequence

Let Tr = {p_1, p_2, ..., p_l} represent an urban vehicle trajectory consisting of l data points, where data point p_i represents the longitude and latitude coordinates of the vehicle’s position. Typical trajectory datasets include the timestamp information for each data point, but we only consider the spatial path of each trajectory in this study, as our goal is to predict the next location given the previous path regardless of the time-of-day and travel time along the journey. Incorporating such temporal information in movement prediction will be explored in future research. Given massive amounts of vehicle trajectories, there is an effectively infinite number of possible data points used to describe all those trajectories, as longitude and latitude coordinates are continuous in space. To apply the concept of sentence prediction in language modeling to trajectory prediction, however, it is necessary to define a finite set of locations with which all the trajectories can be expressed, similar to defining a word vocabulary in sentence prediction. As such, we partition an urban network into smaller regions, or cells, and express each trajectory in terms of the sequence of cells that it has passed. In partitioning the network into cells, we use the method based on [kim2016graph]. In this method, the data points in all the trajectories are combined and clustered in space based on a desired radius, denoted by R, so that for each spatial cluster the distance between the centroid of the point cluster and its farthest member point is approximately R. The centroid of each point cluster is estimated by finding the mean of the data points within the cluster. Once the centroids of all point clusters are obtained, a Voronoi tessellation method is used to construct cell boundaries (Voronoi polygons) using the centroid points as seeds.
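As a rough illustration of this partitioning step, the following sketch clusters projected trajectory points with a simple greedy radius-based rule and builds Voronoi cells from the resulting centroids; the clustering rule, variable names, and random placeholder data are illustrative assumptions rather than the exact procedure of [kim2016graph].

import numpy as np
from scipy.spatial import Voronoi, cKDTree

def cluster_points(points, radius_m):
    # Greedy radius-based clustering (illustrative stand-in): a point joins the
    # nearest existing centroid within radius_m, otherwise it seeds a new cluster.
    centroids, counts = [], []
    for p in points:
        if centroids:
            d = np.linalg.norm(np.array(centroids) - p, axis=1)
            j = int(np.argmin(d))
            if d[j] <= radius_m:
                counts[j] += 1
                centroids[j] = centroids[j] + (p - centroids[j]) / counts[j]  # running mean
                continue
        centroids.append(p.astype(float))
        counts.append(1)
    return np.array(centroids)

points = np.random.rand(1000, 2) * 5000.0      # placeholder for projected data points (metres)
centroids = cluster_points(points, radius_m=300.0)
cells = Voronoi(centroids)                     # Voronoi polygons give the cell boundaries
cell_lookup = cKDTree(centroids)               # nearest-centroid index for mapping trajectories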

Given N cells in the network, trajectory Tr can be mapped onto the underlying cells and expressed in terms of a cell sequence C_Tr = {c_1, c_2, ..., c_m}, where c_j is the index of the jth visited cell within trajectory Tr. Figure 4.1 illustrates the process of converting a raw trajectory to a cell sequence. The length of the cell sequence, m, can be smaller than the original trajectory length, l, if two or more consecutive trajectory data points belong to the same cell (i.e., m ≤ l). Also, fewer cells are needed to represent the same trajectory as the cell radius, R, increases. In addition to the cell sequence, two special tokens, Start and End, are added to the front and the back of the cell sequence to indicate the start and the end of the trip, respectively, where the reserved value 0 is used for Start and a separate reserved code is used for End. As a result, the original trajectory Tr is converted to the following cell sequence form:

C_Tr = {Start, c_1, c_2, ..., c_m, End}    (4.1)
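For illustration, a minimal sketch of this conversion is shown below; the nearest-centroid assignment, the token codes, and the toy coordinates are assumptions made for the example.

import numpy as np
from scipy.spatial import cKDTree

START, END = 0, -1    # assumed reserved codes for the Start and End tokens

def to_cell_sequence(trajectory, centroids):
    # Map an (l, 2) array of coordinates to a cell sequence with Start/End tokens,
    # collapsing consecutive points that fall into the same cell.
    _, idx = cKDTree(centroids).query(trajectory)
    idx = idx + 1                                # reserve code 0 for the Start token
    seq = [int(idx[0])]
    for c in idx[1:]:
        if int(c) != seq[-1]:
            seq.append(int(c))
    return [START] + seq + [END]

centroids = np.array([[0., 0.], [100., 0.], [200., 0.], [300., 0.]])
traj = np.array([[5., 2.], [20., -3.], [110., 1.], [190., 4.], [210., 0.], [290., 5.]])
print(to_cell_sequence(traj, centroids))         # [0, 1, 2, 3, 4, -1]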

4.3.2 Predicting Urban Vehicle Trajectory using Recurrent Neural Networks

Figure 4.2: Structure of RNN model

Using the information from the previous locations is one of the key ideas in designing the model structure. Conventional methods that do not consider, or only partially consider, the previous locations fail to successfully predict the next location. As a result, a proper structure is required to incorporate previous locations in next location prediction. Figure 4.2 shows the overall structure of a GRU-based RNN model. The model consists of a sequence of connected GRU layers, where each layer represents a function unit called GRU that receives an input (x_i) and a hidden state or “memory” from the previous GRU layer (s_{i-1}) and produces an output (o_i), which is the prediction of the correct label (y_i). At the beginning and the end of a cell sequence, the Start and End tokens are added to indicate the start and the end of the trip, respectively. As a result, the number of GRU layers corresponds to the length of the cell sequence of interest plus one, where each layer performs the operation of predicting one cell based on its previous cells. For instance, if we consider a trajectory that visits m cells, i.e., {c_1, ..., c_m}, the RNN model is set to have m+1 GRU layers, where the input sequence (X) and the associated label sequence (Y) for training the model are coded in the following format:

X = {Start, c_1, c_2, ..., c_m},   Y = {c_1, c_2, ..., c_m, End}    (4.2)

Inside each GRU, there are two functions called the reset gate (r) and the update gate (z), where the reset gate decides how to combine the new input with the memory from previous computations and the update gate decides how much memory to keep from previous computations [chung2014empirical, noulas2012mining]. The mathematical expressions for the operations taking place within the GRU of the ith layer are presented as follows:

r_i = σ(U^r x_i + W^r s_{i-1})
z_i = σ(U^z x_i + W^z s_{i-1})
h_i = tanh(U^h x_i + W^h (s_{i-1} ⊙ r_i))
s_i = z_i ⊙ s_{i-1} + (1 − z_i) ⊙ h_i
o_i = softmax(V s_i)    (4.3)

In Eq. (4.3), ⊙ represents element-wise multiplication between two vectors and σ represents the sigmoid function, which limits the output between 0 and 1. x_i represents a one-hot vector of size N+1 (N cells + Start_code) indicating the cell visited at the ith position of the cell sequence (or indicating the start of the trip in the case of Start_code), and o_i represents a real-valued vector of size N+1 (N cells + End_code), where each value in o_i represents the probability of visiting each cell (or the probability of terminating the trip in the case of End_code) at the (i+1)th position of the sequence. s_i denotes the memory from previous computations, or the hidden state, computed in the GRU. U and W are matrices of parameters used inside the GRU units, and V is a matrix of parameters used to translate the internal hidden state into the vector of probabilities of each cell being visited in the next step, where the superscripts on U and W indicate the gates in which these parameters are used. Eq. (4.3) essentially shows a series of computations to obtain o_i from x_i, entailing computing the probability of each cell being visited in the next step based on the hidden state at the current step (s_i), where this hidden state (s_i) is in turn computed based on the current cell (x_i) and the hidden state transferred from the previous step, i.e., the memory of the previously visited cells (s_{i-1}).
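In practice, the computation in Eq. (4.3) corresponds to a standard GRU layer followed by a linear decoding layer and a softmax. The PyTorch sketch below is a minimal illustration of such a next-cell predictor; the layer sizes, the shared Start/cells/End code vocabulary, and the dummy inputs are assumptions, not the configuration used in the experiments.

import torch
import torch.nn as nn

class NextCellRNN(nn.Module):
    # GRU-based next-cell predictor. For simplicity a single shared vocabulary of
    # N+2 codes (Start=0, cells 1..N, End=N+1) is used for input and output,
    # whereas the text uses N+1-sized input (Start + cells) and output (cells + End).
    def __init__(self, n_cells, hidden_dim=64):
        super().__init__()
        self.vocab = n_cells + 2
        self.gru = nn.GRU(self.vocab, hidden_dim, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, self.vocab)

    def forward(self, x_codes):
        # x_codes: (batch, seq_len) integer cell codes
        x = torch.nn.functional.one_hot(x_codes, self.vocab).float()
        states, _ = self.gru(x)          # hidden states s_i for every step
        return self.decoder(states)      # unnormalised scores for the next-cell output o_i

model = NextCellRNN(n_cells=319)                    # e.g. the R = 1000m cell network
x = torch.tensor([[0, 12, 45, 46]])                 # Start, c_1, c_2, c_3 (dummy codes)
next_cell_probs = torch.softmax(model(x), dim=-1)   # o_i for each step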

Next, we define the loss function that calculates the loss, or error, in a model prediction, which we aim to minimize during training. The cross-entropy loss function is used in this study, as shown in Eq. (4.4). By treating the full trajectory (hence its cell sequence Y) as one training case, the cross-entropy loss (L) is calculated by summing the errors at each step (cell) in the sequence over the entire trajectory as follows:

L = − Σ_{i=0}^{m} Σ_{n=1}^{N+1} y_{i,n} log o_{i,n}    (4.4)

where m is the cell sequence length; N is the number of cells; y_{i,n} is the binary label set to 1 if cell n is visited at the (i+1)th step in the sequence or 0 otherwise; and o_{i,n} denotes the predicted probability of cell n being visited at the (i+1)th step in the sequence.

To train the RNN model, a large number of cell sequences are prepared in the format shown in Eq. (4.2) and fed into the RNN. Since the lengths of the input sequences differ, the number of GRU layers in the RNN model needs to be changed dynamically. The RNN model scans one input sequence at a time and adjusts the number of GRU layers so that the number of layers equals the length of the given input sequence plus one. As a result, every time a new input sequence with length m is fed into the model, the model constructs an RNN with m+1 GRU layers.

The model then calculates the cross-entropy loss (L) between the correct labels (Y) and the predicted label probabilities based on the current parameters. The model uses the stochastic gradient descent (SGD) method (see, e.g., [bengio2013advances] and [pascanu2013difficulty]) to update the model parameters in the direction of decreasing the loss. This process is repeated until the parameters converge. Once the model is trained and the parameters are fixed, the model can be used to predict the next cell for any given sequence of previous cells.
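A sketch of this training loop, reusing the NextCellRNN sketch above, with plain SGD and one variable-length sequence per update (the dummy sequences, learning rate, and epoch count are placeholders):

import torch

cell_sequences = [[0, 12, 45, 46, 320], [0, 7, 8, 320]]     # [Start, c_1, ..., c_m, End], End = 320
model = NextCellRNN(n_cells=319)                             # sketch defined above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss(reduction="sum")         # summed over steps, as in Eq. (4.4)

for epoch in range(5):
    for seq in cell_sequences:
        x = torch.tensor(seq[:-1]).unsqueeze(0)              # X = (Start, c_1, ..., c_m)
        y = torch.tensor(seq[1:]).unsqueeze(0)                # Y = (c_1, ..., c_m, End)
        logits = model(x)                                     # (1, m+1, N+2)
        loss = loss_fn(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()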

4.4 Model Performance Evaluation

4.4.1 Data

To evaluate the performance of the RNN-based trajectory prediction model, a case study was designed using vehicle trajectories collected from the Bluetooth sensors in Brisbane, Australia, provided by Queensland Department of Transport and Main Roads (TMR) and Brisbane City Council (BCC). The Bluetooth sensors installed in state-controlled roads and city intersections detect Bluetooth devices (e.g., in-vehicle navigation systems and mobile devices) passing the sensors and record their passage times. By tracking the identifier of each Bluetooth device, the trajectories of individual vehicles (or Bluetooth devices) can be constructed, where each trajectory represents a time-ordered sequence of Bluetooth sensor locations that a given vehicle passed. Vehicle trajectories containing a resting period (time without moving) over 1 hour are considered to have multiple trips and they are separated into multiple trajectories. For this case study, we used the data from one day on 1 March 2016 for training and the data from one day on 8 March 2016 for testing. Each dataset contains approximately 350,000 trajectories per day.

Using the partitioning method described in Section 4.3.1, the Brisbane network is partitioned into cells using three different cell sizes, with desired cell radii (R) of 300m, 500m, and 1000m. The network is represented as 5712 cells, 2204 cells, and 319 cells under R = 300m, R = 500m, and R = 1000m, respectively. For each cell network, trajectories are mapped onto the underlying cells and represented as the associated cell sequences.

Figure 4.3: Cell boundaries with different desired radii (R)

4.4.2 Basecase Model

To better understand the performance of the proposed RNN-based model, a simple statistical model was designed as a base-case model for comparison. In predicting the next cell in a cell sequence, the base-case model, which we call the Transition matrix method (TRN), relies on the transition matrix that describes the probability of going to a particular cell from the current cell. The transition probabilities are estimated from the historical data, where the transition probability from cell i to cell j, denoted by p_ij, is determined by computing the fraction of the outgoing flows from cell i to cell j over the total outgoing flows from cell i as follows:

p_ij = f_ij / Σ_{k=1}^{N} f_ik    (4.5)

where f_ij is the inter-cell flow from cell i to cell j (i.e., vehicle flows passing cell i and cell j consecutively) and N is the number of cells in the network. In predicting where a vehicle will go next given the sequence of cells the vehicle has passed so far, TRN determines the next cell based only on the last cell of the input sequence and does not use any information about the vehicle’s travel history. The key difference between RNN and TRN is thus that the former uses the memory of the previously visited cells and incorporates sequential characteristics in predicting the next cell, while the latter is memoryless and the next cell depends only on its immediate predecessor.
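A minimal sketch of this base-case model is given below, assuming the historical data are already available as lists of visited cell indices; the toy data and function names are placeholders.

import numpy as np

def build_transition_matrix(cell_sequences, n_cells):
    # Estimate p_ij = f_ij / sum_k f_ik (Eq. 4.5) from historical cell sequences.
    flows = np.zeros((n_cells + 1, n_cells + 1))            # index 0 left unused
    for seq in cell_sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            flows[a, b] += 1
    totals = flows.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, flows / totals, 0.0)

def trn_next_cell(probs, current_cell, rng=np.random.default_rng(0)):
    # TRN prediction: the next cell depends only on the row of the current cell.
    return int(rng.choice(len(probs), p=probs[current_cell]))

history = [[1, 2, 3], [1, 2, 4], [2, 3, 4]]                  # toy cell sequences
P = build_transition_matrix(history, n_cells=4)
print(trn_next_cell(P, current_cell=2))                      # 3 with prob. 2/3, 4 with prob. 1/3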

4.5 Results

4.5.1 Cross-entropy Loss and Validation

Figure 4.4: The result of cross-entropy loss (a) with training dataset (b) with testing dataset
Figure 4.5: Computation time of (a) Stochastic Gradient Descent (b) computation to predict next location

Figure 4.4 shows the cross-entropy loss calculated during model training, plotted for different cell sizes. At each iteration, the parameters of the RNN model are updated based on the training dataset and the average loss (L) is calculated for both the training and testing datasets under the given parameters, as shown in Figure 4.4 (a) and (b), respectively. The RNN parameters converge after 5 iterations in all cases and the loss decreases for both the training and testing datasets, indicating that the model is trained properly and is not overfitting to the training data. Comparing plots across different cell sizes, the case with a cell radius of 300m produces the smallest loss. This may suggest that the smaller the cell size, the better the RNN model recognizes sequential patterns and, hence, the easier it is for the model to predict the next cell. However, as shown in Figure 4.5, the computation time increases exponentially as the cell size decreases. As such, there is a trade-off between model accuracy and computation cost, and this trade-off should be considered when determining the cell size. The computing environment used for this chapter is as follows: Intel Core i7-7700 CPU @ 3.60 GHz with 64GB of RAM and an NVIDIA GeForce GTX 1080 Ti.

4.5.2 Cell Visit Count and Inter-cell Flow

One of the important applications of trajectory prediction is to anticipate how many vehicles will use a particular region in the network by predicting individual vehicles’ moving paths. For instance, when a major disruption (e.g., construction, special event) is anticipated for a certain region, road operators will want to identify who will travel to or pass through that region during the day, to identify the potential impact of the disruption and provide the relevant travelers with more personalized and targeted information. To assess model performance in this regard, this section evaluates the sequence prediction accuracy in terms of cell-level (area-level) aggregated measures, namely cell visit count and inter-cell flow. Cell visit count measures the number of unique travelers who visit a particular cell during a given day, and inter-cell flow (or cross-boundary flow) measures the total daily volume of vehicle traffic from one cell to another between neighboring cells.

We first generate 700,000 synthetic trajectories (in terms of cell sequences) using RNN and TRN models for three different cell sizes (R=300m, 500m and 1000m). For both RNN and TRN models, we only give origin cells, which are randomly sampled from the historical data (training set), and let the models determine the remaining sequences. Based on the 700,000 cell sequences generated by each model for each cell size, we compute the cell visit counts and inter-cell flows for all cells. As a ground truth, we also obtain the cell visit count and inter-cell flow measures from real data using the training set. To compare the measures across these three cases (RNN, TRN, and real data), the measures are normalized with the number of trajectories. To exclude abnormal trajectories, we impose the maximum trajectory length of 50 km, which is roughly equivalent to the longest straight-line distance covering the Brisbane network. To reflect this condition, the maximum number of cells in a sequence is set to 80 for R=300m, 50 for R=500m, and 25 for R=1000m in generating cell sequences. The historical data also contains unrealistically long cell sequences. These are presumed to be from taxi vehicles and ride-sharing vehicles that usually keep moving for a long period. Therefore, the limit is also applied to the historical data to discard long sequences.

A real-world trajectory often visits a certain cell more than once during its journey. It is important for a trajectory prediction model to mimic this behavior as realistically as possible because the number of cells in a sequence (sequence length) and the number of unique cells have different implications: the former gives information on traffic volume in a given cell while the latter gives information on the number of actual travelers who visit the cell. To measure the extent to which each cell sequence contains duplicate cells, we define cell re-visit ratio (D) as follows:

D = (m − u) / m    (4.6)

where m is the length of a cell sequence in terms of the number of cells and u is the number of unique cells excluding duplicate cells. The measure is a relative measure, and the measures of RNN and TRN are compared with the measure obtained from the real data. On average, for R=300m, 500m and 1000m, the real data show D=18.6%, 20.2%, and 17.3% with m=26.83, 17.05, and 9.29 and u=21.85, 13.60, and 7.68, respectively. For trajectories generated by RNN, on average, D=25.7%, 24.7%, and 19.9% for R=300m, 500m and 1000m, respectively (with m=18.38, 14.26, and 8.33 and u=13.66, 10.74, and 6.67). Trajectories generated by TRN produce average values of D=62.8%, 56.7%, and 45.6% for R=300m, 500m and 1000m, respectively (with m=21.61, 15.80, and 8.89 and u=8.03, 6.84, and 4.84). In terms of average sequence length (m), model-generated trajectories are on average shorter than real-world trajectories, showing that both models tend to end trips earlier than real-world trips. Between RNN and TRN, the average lengths from TRN are closer to the real-world case than those from RNN. However, TRN-generated trajectories contain a large number of duplicate cells, as can be seen from the very high cell re-visit ratios (D), resulting in an unrealistic movement tendency in which a trajectory repeatedly goes back and forth between a few cells instead of progressing toward its destination. On the other hand, RNN produces cell re-visit ratios that are similar to the real data, suggesting that the ability to incorporate “memory” of the previously visited cells in RNN can prevent such unrealistic “memoryless” behavior from happening.
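For reference, the re-visit ratio of Eq. (4.6) can be computed per cell sequence as in the small sketch below (Start/End tokens are assumed to be excluded).

def revisit_ratio(cell_seq):
    # D = (m - u) / m, where m is the sequence length and u the number of unique cells.
    m, u = len(cell_seq), len(set(cell_seq))
    return (m - u) / m

print(revisit_ratio([3, 7, 8, 7, 3]))   # 0.4: two of the five visits are re-visits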

Figure 4.6: The result in aggregated region level. (a) Unique cell visit count per trajectory (b) Inter-cell flow per trajectory

Figure 4.6 shows the estimated cell visit count and inter-cell flow for TRN (blue dots) and RNN (red dots) with respect to the corresponding real-world measures from the historical data, where all measures are normalized by the number of trajectories. Each data point in Figure 4.6(a) represents the cell visit count for one cell, and each data point in Figure 4.6(b) represents the inter-cell flow for one neighboring cell pair. Linear regression lines (dashed lines) are also shown, where the closer the regression line is to the 45-degree line (y=x), the closer the model prediction is to reality. In all cases, the regression lines from RNN are closer to the 45-degree line than those from TRN, suggesting that RNN predicts cell-level aggregated measures more accurately than TRN.

For cell visit count, we create color maps to visualize the spatial distribution of cell measures across the Brisbane network. Figure 4.7 presents nine color maps created using the data points in Figure 4.6(a), where the results can be compared across different trajectory datasets (by column) and different cell sizes (by row). From the maps, it is clear that RNN (center) reflects the flow magnitudes and spatial patterns in the real data (left) much more closely than TRN (right) does.

Figure 4.7: Spatial distribution of unique cell visit count per trajectory

4.5.3 Sequence Prediction

Figure 4.8: The result in individual sequence level (aggregated)
Figure 4.9: The result in individual sequence level (sensitivity analysis)

In this section, we focus on a more direct performance measure of cell sequence prediction, namely correct prediction probability (CPP_k), which represents the probability of correctly predicting the next k cell(s) given previously visited cells. The procedure for obtaining CPP_k measures is as follows:

For the prediction scenarios, we consider predicting the next one cell (k=1), the next two cells (k=2), and the next three cells (k=3). With three k values, two models (RNN and TRN), and three cell sequence databases (one for each cell size R), a total of 18 cases are generated, each of which produces the distribution of CPP_k over all combinations of actual sequence length (m) and given sequence length (i).

Figure 4.8 (a)-(c) present the distributions for k = 1, 2, and 3, respectively, in terms of the complementary cumulative distribution function (CCDF). The CCDF of CPP_k, denoted by F̄_{CPP_k}(x), is defined as the probability that CPP_k is greater than x, i.e., F̄_{CPP_k}(x) = 1 − F_{CPP_k}(x), where F_{CPP_k}(x) is the cumulative distribution function (CDF). For instance, in Figure 4.8 (a), the value of F̄_{CPP_1}(0.75) for 1000m RNN (blue curve) is approximately 0.375, meaning that 37.5% of the CPP_1 values produced by the procedure above are greater than 0.75, i.e., RNN predicted the next cell correctly with a probability higher than 0.75 in 37.5% of the cases. Given this interpretation, we can see that the closer the curve is toward the top right corner (1,1) and the farther the curve is away from the bottom left corner (0,0), the better the prediction performance is. The area under the curve (AUC) of F̄_{CPP_k}(x) thus provides a good summary metric quantifying the prediction performance described by F̄_{CPP_k}(x) with a single number. The value of the AUC varies from 0 to 1, where 1 occurs when the curve passes through (1,1) and 0 occurs when the curve passes through (0,0).
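A sketch of how the CCDF and its AUC can be computed from a set of CPP_k values is shown below; the empirical estimator on a fixed grid and the synthetic placeholder values are assumptions for illustration.

import numpy as np

def ccdf_auc(cpp_values, grid=np.linspace(0.0, 1.0, 101)):
    # Empirical CCDF of the CPP_k values on [0, 1] and its area under the curve.
    cpp = np.asarray(cpp_values)
    ccdf = np.array([(cpp > x).mean() for x in grid])
    auc = ((ccdf[:-1] + ccdf[1:]) / 2.0 * np.diff(grid)).sum()   # trapezoidal rule
    return ccdf, auc

cpp_rnn = np.random.beta(5, 2, size=1000)     # placeholder for measured CPP_1 values
_, auc = ccdf_auc(cpp_rnn)
print(round(auc, 3))                          # for values in [0, 1] the AUC equals their mean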

As shown in Figure 4.8, in all cases RNN performs much better than TRN, as the curves for RNN are always above those for TRN, where the curves of RNN for predicting one or two consecutive cells are convex toward (1,1) and those of TRN are concave toward (0,0). In the case of predicting three consecutive cells, although both curves for RNN and TRN are concave toward (0,0), RNN is still able to predict cell sequences correctly in a substantial fraction of cases, whereas the correct prediction probability of TRN is nearly zero at all points. For the case of predicting the next one cell in Figure 4.8 (a), the average AUC across the three cell sizes is 0.0198 for TRN and 0.5096 for RNN. When it comes to predicting multiple consecutive cells (k>1), the performance difference between RNN and TRN becomes substantial, as shown in Figure 4.8 (b)-(c). The curves for TRN exhibit a sharp drop at very low values of x, resulting in AUC values that are nearly zero. On the other hand, the curves for RNN still show relatively high probabilities of having high CPP_k. For instance, for k=3, there are still approximately 10-15% of cases where RNN predicts the next three cells correctly with a probability higher than 0.75 (see Figure 4.8 (c)). Overall, the average AUC for RNN is 0.2849 for k=2 and 0.1585 for k=3.

Next, we take a closer look at the prediction results from the RNN models, focusing on the impact of the original sequence length (m) and the given sequence length (i) on the prediction performance expressed by the AUC. Within each combination of k and cell size R, we split the associated CPP_k observations into different m and i groups. We then construct a separate CCDF curve for each group to compute the AUC. Figure 4.9 shows six groups of AUC plots, where each plot contains a set of AUC curves and each AUC curve represents a set of AUC values obtained for predicting cell sequences of a certain length m. The x-axis represents the given sequence length i in predicting the next k cells. Some observations from Figure 4.9 are summarized as follows:

  • From each AUC curve (given a fixed m), AUC increases sharply as i increases up to a certain point (e.g., i=5 or 6), but remains stable afterward. This means that, up to a certain point, giving a longer initial sequence helps predict the next cells (e.g., predicting the 3rd cell given two previous cells is easier than predicting the 2nd cell given one previous cell). Beyond that point, however, the length of the initial sequence does not significantly affect prediction performance.

  • From each AUC plot (across different m), AUC increases as m increases. This indicates that, even when the same number of cells are given, the prediction performance depends on the length of the actual cell sequence (e.g., predicting the 3rd cell in a sequence of 10 cells is easier than predicting the 3rd cell within a sequence of 5 cells).

  • Across AUC plots (across different k), AUC decreases as k increases, which is expected since predicting further steps into the future is more difficult than predicting the immediate next step.

  • Across AUC plots (across different R), AUC does not significantly change with different cell sizes. There is, however, a tendency of decreasing AUC and increasing variance in AUC values when spatial resolution is low (e.g., R=1000m).

4.6 Conclusion

The overall goal of this research is to leverage massive amounts of urban movement data, which become increasingly available nowadays, to better understand city mobility dynamics and enhance the design and operations of transportation systems. Of particular interest is the ability to predict individual vehicles’ movements—at least in terms of a sequence of aggregated spatial locations—and hence anticipate the flow of vehicles at a given location and time more accurately. This study showed a promising direction toward achieving this ability by applying deep learning with Recurrent Neural Networks (RNN) on vehicle trajectory data. As a way to represent complex vehicle trajectories as simpler location sequences, this study proposes a method to partition the network into cells so that entire vehicle movements can be expressed in terms of combinations of a finite set of cells. Mapping trajectories onto cells not only reduces computational complexity but also allows working with multi-source and multi-resolution trajectories. We test different cell sizes and provide discussions on the impacts of cell size on trajectory prediction performance. Using large amounts of Bluetooth vehicle trajectory data collected in Brisbane, Australia, this study trains an RNN model to predict cell sequences. We test the model performance by computing the probability of correctly predicting the next k consecutive cells. Compared to a base-case model that relies on a simple transition matrix, the proposed RNN model shows substantially better prediction results. We also test network-level aggregate measures such as total cell visit count and inter-cell flow and observe that the RNN model can replicate real-world traffic patterns. In summary, the contribution of this study is the development of a novel network-wide trajectory prediction framework that entails (i) transforming raw trajectories into location sequence data using the proposed cell construction method, (ii) applying the RNN model to learn and predict trajectory sequence patterns by recognizing the similarity between trajectory sequence prediction and language modeling, where RNN has shown great success, and (iii) proposing different performance measures, at both the individual sequence level (e.g., correct prediction probability of predicting next cells in a sequence) and the aggregated region level (i.e., cell visit count, inter-cell flow, network-wide usage pattern), to evaluate and demonstrate the application of the proposed model from different angles.

5.1 Introduction

Recently, with the abundance of location sensors and location-aware devices, a large amount of location data is being collected in urban spaces. These data are studied in the form of so-called moving object trajectories, where a trajectory is a trace of a moving object in geographical space represented by a sequence of chronologically ordered locations [zheng_trajectory_2015]. Of particular interest are urban vehicle trajectory data that represent vehicle movements in urban traffic networks. Such urban vehicle trajectory data offer unprecedented opportunities to understand vehicle movement patterns in urban traffic networks by providing rich information on both aggregate flows (e.g., origin-destination matrix and cross-sectional traffic volume) and disaggregate travel behaviours, including user-centric travel experiences (e.g., speed profile and travel time experienced by individual vehicles) as well as system-wide spatiotemporal mobility patterns (e.g., origin-destination pairs, routing information, and network traffic state) [kim2015spatial]. Previous studies have used urban vehicle trajectory data to perform travel pattern analysis [kim2015spatial, yildirimoglu2018identification] and to develop real-world applications such as trajectory-based bus arrival prediction [zimmerman2011field] and trajectory-based route recommendation [yuan2011t].

Among the many applications of trajectory data mining [mazimpaka2016trajectory], this study focuses on the trajectory-based location prediction problem. This problem concerns analyzing large amounts of trajectories of people and vehicles moving around a city to make predictions on their next locations [noulas2012mining, gambs_next_2012, mathew2012predicting], destinations [krumm2006predestination, krumm2007predestination, horvitz2012some, xue2015solving, ziebart2008navigate], or the occurrences of traffic-related events such as traffic jams and incidents [wang2016prediction]. In this study, we address the problem of predicting the sequence of next locations that a subject vehicle will visit, based on the previous locations visited since the origin of the current trip and a historical database representing the urban mobility patterns.

Trajectory-based location prediction is gaining increasing attention from both academia and industry because of its potential to improve the performance of many applications in multiple domains. One example is Location-Based Services (LBS). LBS use the location data of service users and provide user-specific information depending on the users’ locations. Typical examples of LBS are social event recommendation, location-based advertising, and location-based incident warning systems. Location prediction can be applied to provide information predictively; for example, if a user’s next location is expected to be affected by a disaster or congestion, the service can advise the user to change route. Furthermore, when it is not possible to continue the service because the position of the user is lost due to sensor malfunctioning, predicting the locations of the user can temporarily replace the role of the positioning system and keep the service running [monreale2009wherenext, morzy2007mining]. Another example is the application to agent-based traffic simulators. Unlike traditional traffic simulators, which take traffic demand as input, an agent-based traffic simulator requires information on individual vehicle journeys such as origin, destination, and travel routes [martinez_agent-based_2015]. The results of vehicle location prediction can be used for real-time applications of these agent-based traffic simulators. Vehicle location prediction can also be applied to inter-regional traffic demand forecasting. As the ride-sharing market continues to grow and Shared Autonomous Vehicles (SAV) are expected to be on our roads in the near future, there is a strong need to predict inter-regional traffic demand so as to dispatch the proper number of SAVs to areas of high demand. A location prediction model can be used to identify demand hotspots by learning the mobility patterns of users.

In our previous work [choi2018network], we proposed a Recurrent Neural Network (RNN) model to predict the next locations in vehicle trajectories by adopting ideas from text generation models in natural language processing, where RNN has shown great success, and adapting them for use in our problem of location sequence prediction. The RNN model in [choi2018network] considered the previously visited locations as the only input to predict the next location. Despite its simple structure, the model produced promising results. For instance, for more than 50% of all the tested trajectory samples, our RNN model showed a high prediction accuracy in that the probability of correctly predicting the next location was greater than 0.7, whereas the referenced non-RNN model (used for performance comparison) showed a similar accuracy level for less than 5% of the tested samples [choi2018network]. To further improve the model performance, this study considers additional inputs that are likely to help predictions and proposes a methodology that allows the incorporation of heterogeneous input sources into the existing RNN framework. A specific input that we consider in this study is the surrounding traffic condition of a vehicle at the time it starts its journey. Nowadays, drivers can easily observe the current traffic state in urban traffic networks and plan their journeys (choose their routes) by using various traffic information and routing services [adler2001investigating, cabannes2017impact]. As a result, the location sequences (chosen routes) of individual vehicles are expected to be influenced by the network traffic conditions at the beginning of their journeys. Inspired by this idea, this study proposes an Attention-based RNN model, which embeds an attention interface to enable the RNN model to consider the current traffic state as an additional input to location prediction. A detailed explanation is given in the Methodology section.

5.2 Methodology

5.2.1 Representing Urban Vehicle Trajectories as Cell Sequences

An urban vehicle trajectory refers to a sequence of locations and times describing the path that a vehicle follows along its journey in an urban traffic network. Various sensors collect the locations of vehicles and their passage times to form the vehicle trajectory data. The data points in vehicle trajectories are continuous in space; that is, the points are continuous-scaled coordinates of longitude and latitude. To learn movement patterns from a large amount of trajectory data, however, it is necessary to define a finite set of representative locations that are common to all the trajectories that follow similar paths. As such, the first step in building the trajectory prediction model is to discretize the vehicle trajectory data and convert each trajectory to a sequence of discretized locations. Based on previous studies [choi2018network, kim2016graph, kim2017trajectory], we partition the urban traffic network into smaller regions, or cells, so that the continuous-scaled raw vehicle trajectory data are represented as discretized cell sequence data.

Let Tr = {p_1, p_2, ..., p_l} represent a raw vehicle trajectory consisting of l data points, where data point p_i represents the longitude and latitude coordinates of the vehicle’s position. Using a large number of vehicle trajectories, the data points in all trajectories are combined and clustered in space based on a desired radius, denoted by R. Accordingly, for each spatial cluster, the distance between the centroid of the point cluster and its farthest member point is approximately R. The centroid is the mean location of the data points within the cluster, and by using the Voronoi tessellation method, the cell boundaries of the clusters (Voronoi polygons) are determined.

Given N cells in the network, a vehicle trajectory can be expressed as a cell sequence C = {c_1, c_2, ..., c_m}, where c_j is the index of the jth visited cell within trajectory Tr. Since each visited cell covers one or more trajectory data points, the length of the cell sequence (m) is always less than or equal to the length of the original vehicle trajectory (l) (i.e., m ≤ l). In addition to the cell sequence covering the original trajectory, two virtual spatial tokens, Start and End, are added to the front and the back of the cell sequence. These virtual tokens are treated as virtual cells that do not exist in the actual network but only indicate the start and the end of the trip.

The cell sequence is then separated into an input vector and an output label vector for training, validation, and testing. Given a cell sequence containing m+2 cells including the start and end tokens, the input vector X consists of the first m+1 cells and the output label vector Y consists of the m+1 elements starting from the second element:

X = {Start, c_1, c_2, ..., c_m},   Y = {c_1, c_2, ..., c_m, End}    (5.1)
Figure 5.1: Representing urban vehicle trajectory as cell sequence

5.2.2 Cell Sequence Prediction using Recurrent Neural Network

In our previous work [choi2018network], a Recurrent Neural Network (RNN) model for cell sequence prediction was developed and evaluated. This previous RNN model was designed to predict future cell sequences based purely on the previously visited cell sequence. In the training step, the model calculates the probability of each cell being visited in the next step based on the input vector X. The model structure is shown in Fig. 5.2. The model then calculates the cross-entropy loss (L) between the correct label (Y) and the predicted label probabilities based on the current parameters. A basic Long Short-Term Memory (LSTM) unit [hochreiter1997long] is used as the hidden unit in the RNN model, i.e., the RNN units in Fig. 5.2. For parameter estimation, the Adam optimizer is used to update the model parameters [kingma2014adam]. A detailed explanation of the earlier model can be found in Chapter 4.

Figure 5.2: Structure of the basic Recurrent Neural Network model (RNN) for cell sequence prediction

5.2.3 Incorporating Network Traffic State Data into Cell Sequence Prediction

Drivers can easily obtain the current traffic state in urban traffic networks and plan their journeys using various traffic information and routing services [adler2001investigating, cabannes2017impact]. For example, between routes A and B, a driver is likely to choose route A when route B is congested, and vice versa. As a result, the location sequences (chosen routes) of individual vehicles are expected to be influenced by the network traffic conditions at the beginning of their journeys. It is thus desirable to incorporate network-wide traffic state information, and the route choice behavior that depends on the prevailing traffic state, into the RNN-based cell sequence prediction model to increase the model’s prediction accuracy.

Adding additional information, which in our case is the network traffic state, to RNN models is not a straightforward task. RNN models are specialized to process sequential data considering temporal dependency across time or sequence steps. When the input data are all in the form of sequences, adding another sequence input is a straightforward extension, as an RNN model can have multiple input layers and multiple hidden features to incorporate multiple sequence inputs and combine them to calculate the output. However, when the additional input is non-sequential data, it cannot be directly represented as an input layer of the RNN model but rather should be processed outside the RNN model. The traffic state information we wish to add as an additional input to our RNN model is the network-wide traffic density level at the beginning of the sequence (i.e., the traffic information available at the origin of a given trajectory), which is non-sequential data, making the problem more challenging. It may be possible to generate network traffic state data in a sequential form by feeding the network traffic state at the time the subject vehicle visits each cell in the cell sequence. However, this requires a model that predicts the location and the visiting time simultaneously, and that is beyond the scope of the current study, which focuses on location prediction only.

One way to address this challenge is to introduce an attention mechanism. The attention mechanism can be understood as an interface between external information processed outside the RNN model and sequential inputs processed inside the RNN model, as illustrated in Fig. 5.3. The attention mechanism in neural networks was first introduced to imitate the attention mechanism in the human brain. When humans are asked to translate a sentence from one language to another, they think of words that match the alignment and meaning of each source word while also considering the global context of the sentence. Similarly, when humans are asked to write a sentence based on an image, they not only concentrate on the important parts of the image but also consider the global context of the image. Using attention mechanisms in neural networks has shown significant improvements in model performance in applications such as machine translation [vaswani2017attention] and image captioning [xu2015show].

The attention mechanism allows the cell sequence prediction model, or cell sequence generator, to concentrate on a certain part of the network traffic state input and use that information for cell sequence generation. There are two main tasks given to the attention mechanism: the first is to set the initial state of the RNN, and the second is to provide the network-wide traffic state information at each cell generation step. Usually, the initial state vector of an RNN cell is set to a zero vector, since the simplest form of RNN does not consider additional information from other models or inputs. In the case of ARNN, however, there is additional information, namely the network traffic state, and to consider this input in cell sequence generation, this information should be embedded into the model. The attention mechanism also allows the RNN to consider the traffic state in predicting the next location, or cell, at each step. The model is trained to determine which information, or which region, to consider among the network traffic state data by calculating the context vector and the attention weights.

Fig. 5.3 shows the structure of the Attention-based Recurrent Neural Network (ARNN) model for cell sequence prediction. There are two types of input data in this model: the first is the network traffic state data and the second is the cell sequence representation of the vehicle trajectory data. The model first processes the current network traffic state and calculates the initial state for the RNN unit. Then, the attention interface calculates the context vector based on the previous state vector. The context vector is used as an input to the RNN unit, together with the corresponding input vector element, to update the current state vector. The attention weights are calculated based on the context vector and the previous state vector. Each attention weight represents the probability of attending to a particular cell at that sequence step; therefore, the attention weights sum to 1 at each step.

The input cell sequence (X) is processed with a word-embedding method to represent the hidden features of the cells. In the training step, the input vector X is directly used as the input of each RNN unit in order to calculate the output vector. In the testing step, however, only the first n cell sequence elements (the given portion) are directly used. Afterward, since the output vector represents the probability of each cell being visited, random sampling from the multinomial distribution defined by these probabilities is used to extract the next cell, which is then also used as the next input vector element.

A basic Long Short-Term Memory (LSTM) cell [hochreiter1997long] is used as the RNN cell, and the model uses the Adam optimizer to update the model parameters [kingma2014adam].
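A compact PyTorch sketch of this idea is given below: additive attention over per-cell traffic-state features produces the initial state and a per-step context vector that is fed to an LSTM cell together with the embedded cell input. The layer sizes, names, and the specific attention form are assumptions for illustration, not the exact ARNN configuration.

import torch
import torch.nn as nn

class ARNNSketch(nn.Module):
    def __init__(self, n_cells, traffic_dim=10, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.vocab = n_cells + 2                               # Start, cells, End
        self.embed = nn.Embedding(self.vocab, embed_dim)
        self.traffic_enc = nn.Linear(traffic_dim, hidden_dim)  # per-cell traffic feature
        self.init_state = nn.Linear(hidden_dim, hidden_dim)    # initial state from traffic summary
        self.attn = nn.Linear(2 * hidden_dim, 1)               # additive attention score
        self.rnn_cell = nn.LSTMCell(embed_dim + hidden_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, self.vocab)

    def forward(self, x_codes, traffic):
        # x_codes: (seq_len,) cell codes; traffic: (N, traffic_dim) accumulation history
        feats = torch.tanh(self.traffic_enc(traffic))                       # (N, hidden)
        h = torch.tanh(self.init_state(feats.mean(dim=0)))[None, :]         # initial state
        c = torch.zeros_like(h)
        outputs = []
        for code in x_codes:
            score = self.attn(torch.cat([feats, h.expand(feats.size(0), -1)], dim=1))
            alpha = torch.softmax(score, dim=0)                             # weights sum to 1
            context = (alpha * feats).sum(dim=0, keepdim=True)              # context vector
            inp = torch.cat([self.embed(code)[None, :], context], dim=1)
            h, c = self.rnn_cell(inp, (h, c))
            outputs.append(self.decoder(h))
        return torch.stack(outputs, dim=1)                                  # (1, seq_len, vocab)

model = ARNNSketch(n_cells=2746)
x = torch.tensor([0, 15, 113])                     # Start, c_1, c_2 (dummy codes)
traffic = torch.rand(2746, 10)                     # normalised vehicle accumulation input
next_cell_probs = torch.softmax(model(x, traffic), dim=-1)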

Figure 5.3: Structure of the proposed Attention-based Recurrent Neural Network model (ARNN) for cell sequence prediction

5.3 Model Performance Evaluation

5.3.1 Data

Urban Vehicle Trajectory Data

The vehicle trajectory data used in this research are collected from the Bluetooth sensors in Brisbane, Australia, provided by Queensland Department of Transport and Main Roads (TMR) and Brisbane City Council (BCC). The Bluetooth sensors are installed in state-controlled roads and intersections located inside the Brisbane City, and they detect Bluetooth devices (e.g., in-vehicle navigation systems and mobile devices) passing the sensors and record their passage time. By connecting the data points containing the same identifier of the Bluetooth device (MAC ID), the vehicle trajectories of individual vehicles can be constructed. Each vehicle trajectory represents a time-ordered sequence of Bluetooth sensor locations that a subject vehicle passes. If the corresponding vehicle does not move for more than an hour, it is considered that the vehicle trip has terminated. For this case study, we used the vehicle trajectory data collected in March 2016. There are approximately 276,000 trajectories in one day, and a total of 8,556,767 vehicle trajectories are collected in March 2016. We randomly sampled 200,000 vehicle trajectories for the training dataset, 10,000 vehicle trajectories for the validation dataset (used in hyper-parameter searching), and 200,000 vehicle trajectories for the testing dataset.

The Brisbane urban traffic network is divided into “cells” using the vehicle trajectory clustering and cell partitioning method proposed in previous research [kim2016graph, kim2017trajectory]. The desired radius of the cells is set to 300m. Accordingly, a total of 5,712 cells are generated. Among them, 2,746 cells are considered active, since the rest of the cells are not visited by any vehicle in the historical vehicle trajectory data. The vehicle trajectory data are processed and transformed into cell sequence data.

Network Traffic State Data

There are several ways to represent the network traffic state, such as density and average speed. In this study, vehicle accumulation, which can be understood as the density of a cell, is used to represent the network traffic state. The vehicle accumulation of a given cell is estimated by counting the number of vehicles that are present within the cell at a given instant in time. We processed the vehicle trajectory data and calculated the vehicle accumulation of each cell for each minute. The vehicle accumulation data are normalized by dividing the vehicle accumulation by the historical maximum vehicle accumulation of each cell.

The vehicle accumulation data are used as the network traffic state input to the ARNN model. When the ARNN model is trained on a cell sequence, the model receives the vehicle accumulation data for the whole network over the 10 minutes before the start time of the sequence. As a result, the shape of the input vehicle accumulation data is [N, 10], where N is the number of cells in the study network.
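A sketch of how this [N, 10] input could be assembled is shown below, assuming per-vehicle presence records of the form (cell index, minute); here each cell is normalised by its maximum within the window rather than the historical maximum used in the study.

import numpy as np

def accumulation_input(records, n_cells, start_minute, window=10):
    # Count vehicles present in each cell for each of the `window` minutes
    # preceding `start_minute`, then normalise each cell row.
    acc = np.zeros((n_cells, window))
    for cell, minute in records:                      # one record per vehicle-cell-minute
        offset = minute - (start_minute - window)
        if 0 <= offset < window:
            acc[cell, offset] += 1
    max_per_cell = acc.max(axis=1, keepdims=True)
    return np.divide(acc, max_per_cell, out=np.zeros_like(acc), where=max_per_cell > 0)

records = [(3, 715), (3, 716), (7, 718), (3, 719)]    # (cell index, minute of day)
X_traffic = accumulation_input(records, n_cells=10, start_minute=720)
print(X_traffic.shape)                                 # (10, 10), i.e. [N, window]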

Figure 5.4: Spatial distribution of (a) vehicle accumulation, and (b) normalized vehicle accumulation at 12pm on March 1, 2016.

5.3.2 Hyperparameter and Model Training

For each model, we applied a hyperparameter search algorithm to ensure that each model is trained to achieve its maximum performance. The hyperparameter search algorithm used in this study comes from a Python package called “Scikit-optimize”. The algorithm is based on Bayesian optimization using a Gaussian Process (GP). It approximates the objective function by assuming that the function values follow a multivariate Gaussian, where the covariance of the function values is given by a GP kernel between the parameters. A smart choice of the next parameter set to evaluate can then be made using an acquisition function over the Gaussian prior, which is much quicker to evaluate than the objective itself.

There are three hyperparameters to search: the learning rate, the embedding layer dimension, and the hidden layer dimension. The learning rate determines the step size of the updates at each training step. If it is too large, the model is unlikely to converge; if it is too small, convergence is slow and the model is likely to fall into local minima. Therefore, finding an appropriate learning rate is crucial in training neural networks. The embedding layer is used to convert the cell sequence input, which is treated as a one-hot vector, into a vector in a latent space; in other words, the embedding layer extracts features of each cell input and represents them as a numeric vector. The hidden layer dimension determines the dimension of the LSTM cells and of the cell decoding layer; the LSTM cell is used to calculate the state vector, and the cell decoding layer is used to calculate the cell-visiting probabilities from the state vector.
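A sketch of such a search with Scikit-optimize’s gp_minimize is shown below; the search ranges are assumptions, and the objective is a dummy stand-in for the negative validation accuracy of a 10-epoch training run.

from skopt import gp_minimize
from skopt.space import Real, Integer

space = [
    Real(1e-5, 1e-2, prior="log-uniform", name="learning_rate"),
    Integer(64, 1024, name="embedding_dim"),
    Integer(64, 1024, name="hidden_dim"),
]

def objective(params):
    learning_rate, embedding_dim, hidden_dim = params
    # Stand-in objective: in the real search this would train the model for 10 epochs
    # with these hyperparameters and return the negative validation accuracy.
    return (learning_rate - 5e-4) ** 2 + 1e-8 * (embedding_dim - 600) ** 2

result = gp_minimize(objective, space, n_calls=30, random_state=0)
print(result.x)       # best [learning_rate, embedding_dim, hidden_dim] found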

The models are trained for 10 epochs for each hyperparameter set, and the prediction accuracy is measured by applying the trained models to the validation dataset. The resulting hyperparameters are shown in Table 5.1.

Model Learning rate Dimension of Embedding Layer Dimension of Hidden Layer
RNN 6.216234e-05 413 854
ARNN 5.842804e-04 659 574
Table 5.1: Hyperparameter result

5.4 Result

5.4.1 Score based Evaluation of Generated Cell Sequences

In this study, we use two widely used evaluation metrics in sequence modeling to evaluate the accuracy of the generated cell sequences: the BLEU score [papineni2002bleu] and the METEOR score [banerjee2005meteor]. In the previous study [choi2018network], we used the complementary cumulative distribution function of the correct prediction probability to measure how accurately the model predicts the next 1, 2, or 3 consecutive cells. While this measure is intuitive and easy to interpret, it has a drawback in that it considers element-wise prediction accuracy and does not take the whole sequence into account. The element-wise performance measure can be sensitive to small local mis-predictions and tends to underestimate the performance of the model. For example, when a model is asked to predict the remaining cells of a sequence and mis-predicts only a single cell, the prediction is considered incorrect and poor by our previous measure, even though the overall sequence is very similar to the original sequence. As such, this study employs the BLEU score and the METEOR score, which consider the whole sequence and are thus more robust and accurate performance measures for sequence modeling.

BLEU score

When a reference sequence is given, BLEU uses several component measures to evaluate the similarity between the reference sequence and the generated sequence. This metric is one of the most widely used metrics in natural language processing and sequence-to-sequence modeling. BLEU scans through the sequence and checks whether the generated sequence contains identical chunks that are found in the reference sequence. Here, BLEU uses a modified form of precision to compare a reference sequence and a candidate sequence by clipping: for the generated sequence, the count of each chunk is clipped to its maximum count in the reference sequence, so that repeatedly generating the same chunk cannot inflate the score.

P_n = [ Σ_{c ∈ C_n} min(m_c^gen, m_c^ref) ] / w_n    (5.2)

where C_n is the set of cells (or n-cell chunks) appearing in the generated sequence, m_c^gen is the count of cell (or chunk) c in the generated sequence, m_c^ref is the count of cell (or chunk) c in the reference sequence, and w_n is the total number of cells (or chunks) in the candidate sequence. When n is 1, the chunks are single cells; otherwise, we consider n consecutive cells as a chunk and calculate the precision for each n-cell unit.

The BLEU score represents the geometric mean of the P_n’s with different n’s, multiplied by a brevity penalty to prevent very short candidates from receiving too high a score:

BLEU = BP · exp( (1/N) Σ_{n=1}^{N} log P_n ),   BP = min(1, exp(1 − L_ref / L_gen))    (5.3)

where L_gen represents the length of the generated sequence and L_ref represents the length of the reference sequence.
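Because cell sequences play the role of sentences, the BLEU score can be computed with standard NLP tooling by treating cell indices as tokens. The sketch below uses NLTK; the equal 1- to 4-gram weights and the smoothing choice are assumptions, and NLTK’s brevity penalty may differ slightly in form from Eq. (5.3).

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["12", "45", "46", "88", "90"]]           # reference cell sequence as tokens
candidate = ["12", "45", "47", "88", "90"]             # generated cell sequence

score = sentence_bleu(
    reference,
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),                  # geometric mean of P_1..P_4
    smoothing_function=SmoothingFunction().method1,    # avoids zero scores on short sequences
)
print(round(score, 3))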

METEOR score

METEOR [banerjee2005meteor] first creates an alignment between the candidate cell sequence and the reference cell sequence. The alignment is a set of mappings between the most similar cells. Every cell in the candidate sequence is mapped to zero or one cell in the reference sequence. METEOR chooses the alignment with the most mappings and the fewest crossings (fewest intersections between mappings).

To calculate the METEOR score, we first define the precision P and the recall R:

P = m / w_c    (5.4)
R = m_r / w_r    (5.5)

where m refers to the number of single cells in the candidate cell sequence that are also found in the reference cell sequence, m_r refers to the sum over these cells of their maximum counts in the reference cell sequence, w_c refers to the number of cells in the candidate cell sequence, and w_r refers to the number of cells in the reference cell sequence.

Then, we calculate the weighted harmonic mean between precision and recall, where the ratio of the weights is 1:9.

F_mean = 10 P R / (R + 9 P)    (5.6)

To account for congruity with respect to longer cell segments that appear in both the reference and candidate cell sequences, the mappings are grouped based on the longest cell segments and used to compute a penalty p. The more mappings there are that are not adjacent in the reference and the candidate cell sequences, the higher the penalty will be. The penalty is calculated as follows:

$p = 0.5\left(\dfrac{c}{u_m}\right)^{3}$        (5.7)

where $c$ is the number of chunks, i.e., groups of mapped single cells that are adjacent in both the candidate and reference sequences, and $u_m$ is the number of single cells that have been mapped. The penalty reduces $F_{mean}$ by up to 50%, and the final METEOR score ($M$) is calculated as follows:

$M = F_{mean}\,(1 - p)$        (5.8)
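The following is a simplified Python sketch of the METEOR computation in Eqs. (5.4)-(5.8) for cell sequences with exact matching only. It assumes, for illustration, that cells are not repeated within a sequence (so that chunk counting by position is unambiguous); the actual METEOR implementation handles alignments more generally.

```python
from collections import Counter

def meteor_score(reference, candidate):
    """Simplified METEOR for cell sequences with exact matching only, Eqs. (5.4)-(5.8)."""
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # m: matched cells, with counts clipped to their occurrences in the reference
    m = sum(min(count, ref_counts[cell]) for cell, count in cand_counts.items())
    if m == 0:
        return 0.0
    precision = m / len(candidate)                               # Eq. (5.4)
    recall = m / len(reference)                                  # Eq. (5.5)
    f_mean = 10 * precision * recall / (recall + 9 * precision)  # Eq. (5.6)

    # Chunks: maximal runs of matched candidate cells that are also adjacent in the reference
    matched = [cell for cell in candidate if cell in ref_counts]
    chunks, i = 0, 0
    while i < len(matched):
        chunks += 1
        j = i
        while (j + 1 < len(matched)
               and reference.index(matched[j]) + 1 == reference.index(matched[j + 1])):
            j += 1
        i = j + 1
    penalty = 0.5 * (chunks / m) ** 3                            # Eq. (5.7)
    return f_mean * (1 - penalty)                                # Eq. (5.8)

# One mispredicted cell in the middle splits the matches into two chunks.
print(round(meteor_score([3, 7, 12, 15, 21, 22], [3, 7, 12, 16, 21, 22]), 3))  # ~0.807
```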

5.4.2 Score Result

For each sequence in the test dataset, the scores are calculated by the following procedure.

Let $S$ be the subject cell sequence with length $m$, which is expressed as:

$S = \{c_1, c_2, \ldots, c_m\}$        (5.9)

The subject cell sequence is divided into two parts: the given sequence ($S_{1:g}$) and the sequence to be predicted ($S_{g+1:m}$), where $g$ is the number of cells given to the models (ARNN and RNN).

$S_{1:g} = \{c_1, \ldots, c_g\}, \qquad S_{g+1:m} = \{c_{g+1}, c_{g+2}, \ldots, c_m\}$        (5.10)

Each model predicts 100 candidate cell sequences based on $S_{1:g}$, producing a set of 100 sequences for each subject cell sequence $S$. The generated candidate cell sequences are cell sequences that have the #end token at the end, representing that the trip has terminated. These candidate cell sequences may not have the same length as the original cell sequence; the length can be longer or shorter depending on when the model predicts the #end token. The sequence to be predicted ($S_{g+1:m}$) is used as the reference cell sequence to calculate the scores presented above. For each score (BLEU1, BLEU2, BLEU3, BLEU4, and METEOR), 100 score values are calculated from the 100 generated candidate cell sequences. The average value of each score is used to represent the model performance on the corresponding cell sequence ($S$).

Input: $S_{1:g}$ (given sequence),
                 $S_{g+1:m}$ (target sequence),
                 $f$ (prediction model),
                 $score(\cdot,\cdot)$ (score calculation function),
                 $N$ (number of predictions)
      Output: average score $\bar{v}$

1: $V \leftarrow [\,]$
2: for $i = 1, \ldots, N$ do
3:      $\hat{S}_i \leftarrow f(S_{1:g})$      ▷ Predict future location sequence
4:      $v_i \leftarrow score(S_{g+1:m}, \hat{S}_i)$      ▷ Calculate score
5:      Append $v_i$ to $V$      ▷ Append score to $V$
6: end for
7: $\bar{v} \leftarrow \frac{1}{N}\sum_{i=1}^{N} v_i$      ▷ Take average of $V$ as the score
8: return $\bar{v}$
Algorithm 1 Pseudo-code for score evaluation
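A minimal Python version of Algorithm 1 might look as follows, where `model.predict` is a hypothetical interface that rolls out one candidate cell sequence until the #end token and `score_fn` is one of the score functions above.

```python
def evaluate_sequence(model, score_fn, given_seq, target_seq, n_pred=100):
    """Average one score metric over n_pred candidate sequences (the loop of Algorithm 1)."""
    scores = []
    for _ in range(n_pred):
        candidate = model.predict(given_seq)        # roll out cells until the #end token
        scores.append(score_fn(target_seq, candidate))
    return sum(scores) / len(scores)
```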

The 10,000 cell sequences in the test dataset are used to calculate the five scores. Fig. 5.5 shows the score result of each model. The x-axis represents the original length of the sequence, and the y-axis represents the value of each score metric. The result of the ARNN model (red) shows better performance compared to the result of the RNN model (blue).

The results show that the ARNN model predicts short cell sequences up to 12% more accurately and long cell sequences up to 5% more accurately. The ARNN model outperforms the RNN model in terms of both the BLEU and METEOR scores. It is worth noting that the ARNN model shows a performance improvement in terms of the METEOR score, as a high METEOR score requires not only good prediction of the visited cells but also an accurate description of the cell alignment (the visiting order of the cells). The results thus confirm that the ARNN model using the attention mechanism achieves improvements in predicting both the composition of cells in the sequence and the alignment of the cells in the sequence.

The performance gap between the two models tends to decrease as the original length of the sequence increases. Fig. 5.6 shows the result in terms of the score improvement rate. The score improvement rate is defined as the ratio of the performance score of the ARNN model to the performance score of the RNN model. This performance improvement rate is measured for each number of given cells ($g$) and each original length of the cell sequence ($m$), and Fig. 5.6 shows the summarized result. Points in Fig. 5.6 represent the average performance improvement rate for each original length of the cell sequence ($m$), and the lines represent the range of this value (from the minimum value to the maximum value).

In Fig. 5.6, one interesting observation is that the performance improvement rate decreases and converges to 1 (the black lines) as the original length of the cell sequence increases. This can be because the input feature given to the ARNN model is the network traffic state at the beginning of each trajectory journey. This observation has an important implication for the influence of pre-trip information in route choice behaviors. The fact that the ARNN improves the prediction at the early stages of a journey implies that the pre-trip information indeed influences travellers’ route choice decisions and differentiates route choice patterns between different pre-trip traffic conditions. The fact that the effect of pre-trip information fades away at the later stages of the journey may be the indication of drivers’ reliance on en-route trip information instead of pre-trip information and thus indicates a need for incorporating such en-route information into the model to further improve the model performance.

Figure 5.5: Boxplot of models (ARNN, RNN) for each original length of sequence (m)
Figure 5.6: Score improvement rate for each original length of sequence (m). The points represent the average value, and the lines represent the range of the score improvement rate (from the minimum value to the maximum value)

5.5 Conclusion and Future Studies

This research studies urban vehicle trajectory prediction, one of the applications of trajectory data mining. Building on the previous work [choi2018network], we proposed a novel approach to incorporate network traffic state data into an urban vehicle trajectory prediction model. The attention mechanism is used as an interface to connect the network traffic state input data to the vehicle trajectory predictor proposed in the previous work. The ARNN model, the attention-based RNN model for cell sequence prediction, is compared with the RNN model for cell sequence prediction in terms of conventional scoring methods in sequence prediction. The results show that the ARNN model outperformed the RNN model, indicating that it is effective to use the attention mechanism to structurally connect the network traffic state input to the RNN model for predicting a vehicle's future locations. It is especially promising that the ARNN model showed a significant performance improvement in terms of METEOR, which considers not only the cells to be visited but also the alignment of the cells in the sequence. The performance improvement rates tend to decrease and converge to 1 as the original length of the cell sequence increases. For further improvement of the ARNN model, this issue should be studied so that the performance improvement can be maintained at a steady level.

There are some limitations in this study that future work should address. First, the network traffic state data were normalized using the historical maximum value of each cell. This makes it easy to represent the network traffic state, but cells with very low traffic may become overly sensitive to a small number of vehicles and register them as heavy congestion, causing the model to overreact to these cells and confusing the cell sequence prediction. For further improvement of this study, different types of normalization methods should be tested. Second, although the performance measures (BLEU and METEOR) are widely used in fields studying sequence prediction, such as natural language processing, the application of these metrics is new in the transportation domain. The interpretation and implications of these metrics in the context of traffic modeling should be further investigated.

6.1 Introduction

Rapid advancements in location sensing and wireless communication technology have enabled us to collect and store a massive amount of spatial trajectory data, which contain the geographical locations of moving objects with their corresponding passage times [lee2011trajectory]. Over the last decade, considerable progress has been made in collecting, pre-processing, and analyzing trajectory data, and trajectory data analysis has been applied in various research areas, including behavioral ecology [de2019trajectory], transportation engineering [wu2018location], and urban planning [laube2014computational].

In transportation engineering, urban vehicle trajectory data are collected based on the location sensors installed inside vehicles or at the roadside and analyzed with various methods. The high-resolution mobility data of individual users in urban road networks offer unprecedented opportunities to understand vehicle movement patterns in urban traffic networks. It provides rich information on both aggregated flows and disaggregated travel behaviors. The aggregated flows include the origin-destination (OD) matrix and cross-sectional link traffic volumes. The disaggregated travel behaviors include user-centric travel experiences, namely, speed profile, link-to-link route choice behavior and travel time experienced by individual vehicles, as well as system-wide spatio-temporal mobility patterns, such as origin-destination pairs, routing pattern distributions, and network traffic states [kim2015spatial].

Most of the studies in vehicle trajectory data analysis use machine learning methods. The recurrent neural network, for example, is used by many previous researchers due to its ability to learn sequential information in trajectory data. In machine learning, there are mainly two approaches to modeling: discriminative and generative modeling. A discriminative model learns a direct map from input $x$ to output (label) $y$, or the posterior probability $p(y \mid x)$, which is the conditional probability of each label $y$ given the input variable $x$. It only learns the decision boundaries between labels and does not care about the underlying distribution of data. In contrast, a generative model captures the underlying probability distribution, i.e., the joint probability $p(x, y)$, from which $p(y \mid x)$ can be computed. One advantage of a generative model is that we can generate new (synthetic) data similar to existing data by sampling from $p(x, y)$.

Synthetic data generation based on generative models has gained increasing importance as the data generation process plays a significant role in various research fields in an era of data-driven world [popic2019data]. It is mainly used to serve two purposes. The first purpose is to deal with the lack of real data. In many research fields, data collection is costly, and, therefore, it is often difficult to collect enough data to properly train and validate models. In this case, it is useful to generate synthetic data that are similar to the real observations to increase training and test samples. The second purpose is to address the issue with the privacy and confidentiality of real data. Many types of data contain personal information, such as gender, name, and credit card usage. Synthetic data can be combined with or replace such privacy-sensitive data with a reasonable level of similarity, thereby protecting privacy while serving the intended analysis.

Urban vehicle trajectory analysis has both challenges: data sparsity and data privacy issues. Although the sources and availability of urban trajectory data are increasing, most of the currently available trajectory datasets cover only a portion of all vehicles in the network. From network management and operations perspectives, there is a desire to infer vehicle trajectories that represent the whole population to have a more complete view of traffic dynamics and network performance. Moreover, urban vehicle trajectory data may contain personal information of individual drivers, which poses serious privacy concerns in relation to the disclosure of private information to the public or a third party [chow2011privacy]. The ability to generate synthetic trajectory data that can realistically reproduce the population mobility patterns is, therefore, highly desirable and expected to be increasingly beneficial to various applications in urban mobility.

While synthetic trajectory data generation is a relatively new topic in transportation research communities, several existing research areas have addressed similar problems. One example is trajectory reconstruction. When two points in a road network are given as an initial point (treated as a sub-origin) and a target point (treated as a sub-destination), the models reconstruct the most plausible route between the two points. Trajectory reconstruction can thus be considered as generating trajectories between sub-origins and sub-destinations. Previous studies such as [chen2011discovering] and [hu2018graph] investigated discovering the most popular routes between two locations. [chen2011discovering] first constructs a directed graph to simplify the distribution of trajectory points and uses a Markov chain to calculate the transfer probability to each node in the directed graph. The transfer probability is used as an indicator of how popular the node is as a destination, and the route popularity is calculated from the transfer probability of each node. [hu2018graph] also used a graph-based approach to construct popular routes. The check-in records, which contain the routes' attributes, are analyzed to divide the whole space into zones; then, the historical probability is used to find the most plausible zone sequences. Also, [feng2015vehicle] and [rao2018origin] estimated origin-destination patterns by using trajectory reconstruction. Both studies used particle filtering to reconstruct vehicle trajectories between two points in automatic vehicle identification data; the reconstructed vehicle trajectories are then used to estimate the real OD matrix of the road network. Another problem relevant to trajectory generation is the next location prediction problem, where the "next location" of a subject vehicle is predicted based on its previously visited locations. [monreale2009wherenext], for example, presented a decision tree to predict the next location based on the previously visited locations. Decision-tree based models, however, occasionally overfit the training dataset and lack the generalization ability to produce diverse trajectory patterns.

[gambs_next_2012] used a Mobility Markov Chain (MMC) to predict the next location among clustered points or Points-of-Interest (POIs). The POIs considered in [gambs_next_2012] are home, work, and other activity locations used to model human activity trajectories throughout the day, rather than the vehicle movement trajectories reflecting link-to-link driving behavior considered in this study. [choi2019real] used a feed-forward neural network to predict the next intersection in a grid-structured road network; a set of intersections in Brisbane, Australia is treated as POIs to capture link-to-link route choice behavior. [jin2019augmented] used an augmented-intention recurrent neural network model to predict locations of vehicle trajectories of individual users, incorporating additional information on individual users' historical records of frequently visited locations into a next location prediction model. The past visited locations in the historical records are represented as an edge-weighted graph, and a graph convolutional network is used to incorporate this information into trajectory prediction. In [choi2018network], an urban road network is partitioned into zones based on the clustering of trajectory data points, and a prediction model based on a recurrent neural network (RNN) is proposed to predict the zone that the subject vehicle will visit next. [choi2019attention] extended the idea of predicting the next zone and used network traffic state information to improve the RNN model's performance.

In fact, the existing models developed for the next location prediction problem can be applied to synthetic trajectory data generation: by sequentially applying next location predictions, a synthetic vehicle trajectory can be generated. However, most of the existing models for next location prediction adopt a discriminative modeling approach, where the next locations are treated as labels and the model is trained to predict one or two next locations. Such discriminative models have limitations in generating full trajectories, especially when sample trajectory data are sparse: the models are trained to learn only the decision boundaries between the labels, not the underlying distributions of data that allow proper generalization for sampling realistic trajectories. As a result, it is necessary to develop a model based on the generative modeling approach to successfully perform synthetic trajectory data generation.

In this paper, we apply imitation learning to develop a generative model for urban vehicle trajectory data. Imitation learning is a sub-domain of reinforcement learning for learning sequential decision-making behaviors, or "policies". Unlike reinforcement learning, which uses "rewards" as signals for positive and negative behavior, imitation learning directly learns from sample data, so-called "expert demonstrations," by imitating and generalizing the expert's decision-making strategy observed in the demonstrations. If we consider an urban vehicle trajectory as a sequence of decisions for choosing road links along a travel path, imitation learning can be applied to develop a generator that reproduces synthetic data by imitating the decision-making process (i.e., the driver's route choice behavior) demonstrated in the observed trajectory dataset. One approach to imitation learning is Inverse Reinforcement Learning (IRL), which aims to recover a reward function that explains the behavior of an expert from a set of demonstrations. Using the recovered expert reward function as feedback signals, the model can generate samples similar to the expert's decisions through reinforcement learning. [ziebart2008maximum] and [ziebart2008navigate] used maximum entropy IRL (MaxEnt) to generate synthetic trajectories similar to a given taxi dataset. One of the advantages of using IRL is that the model generates trajectories using both the current state and the expected returns of future states to determine an action, as opposed to considering only the knowledge up to the current state (e.g., previously visited locations), thereby enabling a better generalization of travel behavior along the whole trajectory.

Recently, there have been remarkable breakthroughs in generative models based on deep learning. In particular, [goodfellow2014generative] introduced a new generative model called Generative Adversarial Networks (GAN), which addressed inherent difficulties of deep generative models associated with intractable probabilistic computations in training. GANs use an adversarial discriminator to distinguish whether a sample is from real data or from synthetic data generated by the generator. The competition between the generator and the discriminator is formulated as a minimax game. As a result, when the model is converged, the optimal generator would produce synthetic sample data similar to the original data. The generative adversarial learning framework is used in many research fields such as image generation [radford2015unsupervised2], audio generation [oord2016wavenet], and molecular graph generation [de2018molgan].

GANs have also been applied in transportation engineering. [zhang2019novel] proposed a trip travel time estimation framework called T-InfoGAN based on generative adversarial networks. They used a dynamic clustering algorithm with the Wasserstein distance to make clusters of link pairs with similar travel time distributions, and they applied Information Maximizing GAN (InfoGAN) to travel time estimation. [xu2020ge] proposed Graph-Embedding GAN (GE-GAN) for road traffic state estimation. Graph embedding is applied to select the most relevant links for estimating a target link, and a GAN is used to generate the road traffic state data of the target link. In [li2020coupled], a GAN is used as a synthetic data generator for GPS data and travel mode label data. To solve the sample size problem and the label imbalance problem of a real dataset, the authors used a GAN to generate fake GPS data samples of each travel mode label to obtain a large, balanced training dataset. The generative adversarial learning framework is also used for synthetic trajectory generation. [liu2018trajgans] proposed a framework called trajGANs; although this paper does not include specific model implementations, it discusses the potential of generative adversarial learning in synthetic trajectory generation. Inspired by [liu2018trajgans], [rao2020lstm] proposed LSTM-TrajGAN with specific model implementations. The generator of LSTM-TrajGAN is similar to the RNN models adopted in the next location prediction studies.

This study proposes TrajGAIL, a generative adversarial imitation learning (GAIL) model for urban vehicle trajectory data. GAIL, proposed by [ho2016generative], uses a combination of IRL’s idea that learns the experts’ underlying reward function and the idea of the generative adversarial framework. GAIL effectively addresses a major drawback of IRL, which is high computational cost. However, the standard GAIL has limitations when applied to the vehicle trajectory generation problem because it is based on the IRL concept that only considers a vehicle’s current position as states in modeling its next locations [ziebart2008maximum, ziebart2008navigate, zhang2019unveiling], which is not realistic as a vehicle’s location choice depends on not only the current position but also the previous positions. To overcome these limitations, this study proposes a new approach that combines a partially-observable Markov decision process (POMDP) within the GAIL framework. POMDP can map the sequence of location observations into a latent state, thereby allowing more generalization of the state definition and incorporating the information of previously visited locations in modeling the vehicle’s next locations. In summary, the generation procedure of urban vehicle trajectories in TrajGAIL is formulated as an imitation learning problem based on POMDP, which can effectively deal with sequential data, and this imitation learning problem is solved using GAIL, which enables trajectory generation that can scale to large road network environments.

This paper is organized as follows. Section 6.2 describes the methodology of this paper. A detailed problem formulation is presented in Section 6.2.1, and the proposed framework of TrajGAIL is presented in Section 6.2.2. Section 6.3 describes how the performance of the proposed model is evaluated. Section 6.3.1 introduces the data used in this study, and Section 6.3.2 introduces the baseline models for performance comparison. In Section 6.3.3, the evaluation results are presented at both trajectory-level and dataset-level. Finally, Section 6.4 presents the conclusions and possible future research.

6.2 Methodology

The objective of TrajGAIL is to generate location sequences in urban vehicle trajectories that are similar to real vehicle travel paths observed in a road traffic network. Here, the "similarity" between the real vehicle trajectories and the generated vehicle trajectories can be defined from two different perspectives. First, the trajectory-level similarity measures the similarity of an individual trajectory to a set of reference trajectories. For instance, the probabilities of accurately predicting the next locations (single or multiple consecutive locations, as well as the alignment of the locations) are examples of trajectory-level similarity measures. Second, the dataset-level similarity measures the statistical or distributional similarity over a trajectory dataset. This type of measure aims to capture how closely the generated trajectory dataset matches statistical characteristics such as the OD and route distributions of the real vehicle trajectory dataset. In this section, we present the modeling framework of TrajGAIL, where the procedure of driving in a road network is formulated as a partially observable Markov decision process to generate realistic synthetic trajectories, taking into account the similarities defined above.

6.2.1 Problem Formulation

Let $Tr = \{p_1, p_2, \ldots, p_{|Tr|}\}$ be an urban vehicle trajectory, where $p_i = (x_i, y_i, t_i)$ consists of the $x$-$y$ coordinates and the timestamp of the $i$-th point of the trajectory, and let $L_{Tr}$ be the location sequence of $Tr$. When the location points are continuous latitude and longitude coordinates, it is necessary to pre-process these coordinates and match them to a predefined set of discrete locations. Previous studies used different ways of defining discrete locations. For instance, [choi2018network], [choi2019attention], and [ouyang2018non] used partitioned networks, so-called cells or zones, while [choi2019real] and [ziebart2008maximum] used road links to represent trajectories. In this paper, we represent a trajectory as a sequence of links to model link-to-link route choice behaviors in urban road networks. The location sequence of each vehicle trajectory is thus transformed into a sequence of link IDs by a link matching function $f_{link}$:

$f_{link}(L_{Tr}) = \{l_1, l_2, \ldots, l_k\}$        (6.1)

where $l_i$ is the link ID of the $i$-th visited link along the trajectory. The goal of this study is to generate the link sequence of a trajectory by modeling and learning the probability distribution $P(l_1, l_2, \ldots, l_k)$ for discrete random variables $l_i$ taking values in the set of all possible link IDs. Modeling this joint probability distribution is, however, extremely challenging, as also noted in previous studies [choi2018network, ouyang2018non]. A way to resolve this problem is to use a sequential model based on the Markov property, which decomposes the joint probability into a product of conditional probabilities as follows:

$P(l_1, l_2, \ldots, l_k) = P(l_1)\prod_{i=2}^{k} P(l_i \mid l_{i-1})$        (6.2)
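To make the decomposition in Eq. (6.2) concrete, the sketch below estimates a simple first-order Markov transition model from observed link sequences and evaluates the joint probability of a sequence as a product of conditional probabilities. The function names and the initial-link distribution `p_initial` are illustrative, and this baseline is far simpler than the generative model developed in this chapter.

```python
from collections import defaultdict

def fit_markov_model(link_sequences):
    """Estimate the conditional probabilities P(l_i | l_{i-1}) from observed link sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in link_sequences:
        for prev, curr in zip(seq[:-1], seq[1:]):
            counts[prev][curr] += 1
    return {prev: {curr: c / sum(nxt.values()) for curr, c in nxt.items()}
            for prev, nxt in counts.items()}

def sequence_probability(transitions, p_initial, seq):
    """Joint probability of a link sequence, Eq. (6.2): P(l_1) * prod_i P(l_i | l_{i-1})."""
    prob = p_initial.get(seq[0], 0.0)
    for prev, curr in zip(seq[:-1], seq[1:]):
        prob *= transitions.get(prev, {}).get(curr, 0.0)
    return prob

# Toy example with three observed trajectories over links "A"-"D"
data = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"]]
transitions = fit_markov_model(data)
print(sequence_probability(transitions, {"A": 1.0}, ["A", "B", "C"]))  # 1.0 * 1.0 * 2/3
```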

The problem of modeling vehicle trajectories using this Markov property can be formulated as a Markov Decision Process (MDP). An MDP is a discrete-time stochastic control process based on the Markov property [howard1960dynamic]. This process provides a mathematical framework for modeling the sequential decision making of an agent. An MDP is defined with four variables, $(S, A, T, R)$, where $S$ is a set of states that the agent encounters, $A$ is a set of possible actions, $T$ is a transition model determining the next state ($s_{t+1}$) given the current state ($s_t$) and action ($a_t$), and $R$ is a reward function that gives the agent the reward value (feedback signal) of its action given the current state. If the transition is stochastic, the transition model can also be denoted as $T(s_{t+1} \mid s_t, a_t)$. A policy is defined as a $\theta$-parameterized function that maps states to an action in the deterministic case, $a_t = \pi_\theta(s_t)$, or a function that calculates the probability distribution over actions, $\pi_\theta(a_t \mid s_t)$, in the stochastic case. The objective of the MDP's optimization is to find the optimal policy that maximizes the expected cumulative rewards, which is expressed as:

$\pi_{\theta^*} = \arg\max_{\pi_\theta}\ \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right], \qquad a_t \sim \pi_\theta(\cdot \mid s_t)$        (6.3)

where $\pi_{\theta^*}$ is the optimal policy with parameter $\theta^*$, $a_t$ is an action sampled from $\pi_\theta(\cdot \mid s_t)$ (in this study, we use a discrete action space, so actions are sampled from a categorical distribution), and $\gamma$ is the discount rate of future rewards.

How to define the four variables of the MDP is critical to the successful training of a policy model. The states should incorporate enough information so that the next action $a_t$ is determined based only on the current state $s_t$, and the transition model should correctly reflect the transition of states in the environment it models. Finally, the reward function should give a proper training signal to the agent to learn the optimal policy.

In TrajGAIL, the vehicle movement in a road network is formulated as an MDP. We set road segments or links as states and transitions between links as actions. In this case, the transition model can be defined as a deterministic mapping function that gives the next link $l_{t+1}$ given the current link $l_t$ and the link-to-link movement choice $a_t$, i.e., $l_{t+1} = T(l_t, a_t)$. The policy represents a driver's route choice behavior associated with selecting the next link at each intersection. This road network MDP, thus, produces vehicle trajectories, more specifically link sequences, as a result of the sequential decision making modeled by this policy.

As mentioned above, an MDP assumes that the action $a_t$ is determined based only on the current state $s_t$. However, it is likely that a vehicle's link-to-link movement choice is affected by not only the current location but also the previous locations. Moreover, vehicle movements in a road network are the result of complex interactions between a large number of drivers and the road environment, such as the generation and distribution of trips and the assignment of routes; therefore, the link choice action cannot be determined by road segment information alone as a state. The model needs more information such as origin, destination, trip purpose, and the prevailing traffic state. Incorporating all such information in the state definition, however, makes the problem intractable due to an extremely large state space. It is, thus, desirable to relax the assumption such that the action is determined based on the current state as well as some unobservable states.

Figure 6.1: Partially observable Markov Decision Process

This can be achieved by employing a partially observable MDP (POMDP). A POMDP assumes that an MDP determines the model dynamics, but the agent cannot directly observe the underlying states. Instead of directly using the states as in an MDP, a POMDP uses a surrogate state, such as a probability distribution over the set of possible states [kaelbling1998planning] or a belief state [rao2010decision]. Figure 6.1 shows the graphical model of a POMDP. It is assumed that there exist latent unobservable states $s_t$. We can only partially observe $s_t$ through the observation $o_t$. Using $o_t$, or the sequence of $o_t$'s, the latent state is estimated, and this estimated latent state, or the belief state, is denoted as $b_t$.

As a result, instead of the four variables of an MDP, a POMDP uses five variables $(S, A, T, R, \Omega)$, where $\Omega$ represents the set of possible observations. The belief state at time $t$, $b_t$, is estimated based on the sequence $(o_1, o_2, \ldots, o_t)$ representing all observations up to the current time $t$, and is assumed to be the estimate of the latent unobservable state $s_t$, as follows:

$b_t = P(s_t \mid o_1, o_2, \ldots, o_t)$        (6.4)

In TrajGAIL, the observation space, $\Omega$, is defined as the set of link IDs in the road network plus two virtual tokens representing the start and the end of a trip ($\langle start \rangle$ and $\langle end \rangle$). Actions are transitions between links. In [ziebart2008maximum], the set of actions includes all possible link transitions. However, this can lead to a very large action space even for a moderate-sized network with hundreds of links, requiring high computational cost. To reduce the computational complexity, we instead define a set of common actions that represent possible movements between two connected links, namely straight, left, right, and terminate, where straight, left, and right represent the movement direction at the end of each link (at intersections) and terminate represents the termination of a trip (i.e., a vehicle reached its destination). These four actions are sufficient for our current study as we consider a grid-structured network, where all intersections are four-way intersections. However, it is also possible to model general networks with more diverse intersection structures, such as five-way or T-shaped intersections, by applying a "mask" that allows flexibility to further define the specific actions available for each link, which would be a subset of the network-wide common action set. For instance, one can define six actions for a network with a maximum intersection size of six and specify only the subset of available actions for each link if it has fewer than six connected roads. A sketch of this action definition and masking is given below.
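The following sketch encodes the four common actions, a hypothetical next-observation look-up table for a few links, and a mask that restricts the available actions per link; the link IDs and table entries are made up purely for illustration.

```python
# Four common actions shared by all links in the grid network.
ACTIONS = ["straight", "left", "right", "terminate"]

# Hypothetical next-observation look-up table: (current link, action) -> next link or <end>.
# In practice this table is built from the map geometry (link connectivity).
NEXT_OBS = {
    ("L1", "straight"): "L2",
    ("L1", "left"): "L5",
    ("L1", "right"): "L7",
    ("L1", "terminate"): "<end>",
    # ... one entry per (link, available action) pair
}

def action_mask(link, next_obs=NEXT_OBS, actions=ACTIONS):
    """1 for actions available at this link, 0 otherwise (e.g., at a T-shaped intersection)."""
    return [1 if (link, a) in next_obs else 0 for a in actions]

print(action_mask("L1"))  # [1, 1, 1, 1]
```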

To summarize, we formulate a partially observable Markov Decision Process to develop a generative model for vehicle trajectories, which produces the optimal policy describing optimal actions given a sequence of observations.

6.2.2 Model Framework

Preliminaries and Background - Imitation Learning

In this study, the imitation learning framework is used to develop a generative model represented in POMDP formulation. Imitation learning is a learning problem that aims to train a model that can act like a given expert. Usually, demonstrations of decisions of the expert are given as a training dataset. In this study, a real vehicle trajectory dataset serves as expert demonstrations so that the model learns the decision-making process of vehicle movements in a road network observed in the given dataset. There are mainly two categories of approaches in imitation learning: behavior cloning and inverse reinforcement learning.

Behavior cloning considers the imitation learning problem as a supervised learning problem. In behavior cloning, given the expert demonstrations, the state and action sequences are divided into independent state-action pairs, and a model is trained to directly learn the relationship between input (state) and output (action) based on these sample pairs. The biggest advantage of behavior cloning is its simplicity. However, because of this simplicity, the model fails to make proper generalizations in complex tasks. Simple generative models based on Markov chains [gambs2010show] and recurrent neural networks [choi2018network, choi2019attention, liu2016predicting] can be classified into this category of imitation learning.

Inverse reinforcement learning (IRL) uses an indirect approach. The objective of IRL is to find the reward function that the agent is optimizing, given measurements of the agent's behavior and sensory inputs to the agent [russell1998learning]. It is assumed that the experts follow certain rules, known as a reward function. The main idea of IRL is to learn this reward function to imitate the experts based on the history of the experts' behaviors in certain situations. It is called "inverse" reinforcement learning because it learns the reward function that represents the experts' decisions from their states and actions, whereas reinforcement learning (RL) learns to generate states and actions from a given reward function. Some of the key papers on IRL include [ng2000algorithms, abbeel2004apprenticeship, ziebart2008maximum, wulfmeier2015deep, ho2016generative], which readers are referred to for more details on IRL.

Given an expert policy $\pi_E$, the objective of IRL is to find a reward function ($r$) that maximizes the difference between the expected rewards of the expert and of the RL agent ($\mathrm{RL}(r)$), such that the expert performs better than all other policies [ho2016generative]. Here, an expectation with respect to a policy, $\mathbb{E}_\pi[r(s,a)]$, is used to denote an expectation with respect to the trajectory it generates (i.e., the $\gamma$-discounted cumulative reward), $\mathbb{E}_\pi[r(s,a)] = \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r(s_t, a_t)\right]$. This is achieved by minimizing the expected reward of the RL agent ($\mathbb{E}_\pi[r(s,a)]$) and maximizing the expected reward of the expert ($\mathbb{E}_{\pi_E}[r(s,a)]$), while minimizing the reward regularizer ($\psi(r)$). On the other hand, when a reward function ($r$) is given, the objective of RL is to find a policy ($\pi$) that maximizes the expected reward ($\mathbb{E}_\pi[r(s,a)]$) while maximizing the entropy of the policy ($H(\pi)$).

$\mathrm{IRL}_\psi(\pi_E) = \arg\max_{r \in \mathcal{R}}\ -\psi(r) + \left(\min_{\pi \in \Pi}\ -H(\pi) - \mathbb{E}_\pi[r(s,a)]\right) + \mathbb{E}_{\pi_E}[r(s,a)]$        (6.5)

where,

$\mathrm{RL}(r) = \arg\max_{\pi \in \Pi}\ H(\pi) + \mathbb{E}_\pi[r(s,a)]$        (6.6)

where $\mathcal{R}$ is the largest possible set of reward functions ($\mathcal{R} = \mathbb{R}^{S \times A}$), $\psi$ is the convex reward function regularizer, and $H(\pi) = \mathbb{E}_\pi[-\log \pi(a \mid s)]$ is the ($\gamma$-discounted) causal entropy of the policy $\pi$ [ho2016generative].

It is interesting to investigate the relationship between $\mathrm{RL}$ and $\mathrm{IRL}_\psi$ in Eq. (6.5) and Eq. (6.6). $\mathrm{RL}$ tries to find the optimal policy that maximizes the expected rewards, and $\mathrm{IRL}_\psi$ tries to find the optimal reward function that maximizes the difference between the expert policy ($\pi_E$) and the RL agent's policy ($\pi$). In some sense, $\mathrm{RL}$ can be interpreted as a generator that creates samples based on the given reward, and $\mathrm{IRL}_\psi$ can be interpreted as a discriminator that distinguishes the expert policy from the RL agent's policy. This relationship is similar to the framework of Generative Adversarial Networks (GAN). GANs use an adversarial discriminator ($D$) that distinguishes whether a sample is from real data or from synthetic data generated by the generator ($G$). The competition between the generator and the discriminator is formulated as a minimax game. As a result, when the model converges, the optimal generator produces synthetic sample data similar to the original data. Eq. (6.7) shows the formulation of the minimax game between $G$ and $D$ in GANs.

$\min_G \max_D\ \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$        (6.7)

With a proper selection of the regularizer $\psi$ in the IRL formulation in Eq. (6.5), [ho2016generative] proposed generative adversarial imitation learning (GAIL). The formulation of the minimax game between the discriminator ($D$) and the policy ($\pi$) is shown in Eq. (6.8):

$\min_\pi \max_D\ \mathbb{E}_\pi[\log D(s,a)] + \mathbb{E}_{\pi_E}[\log(1 - D(s,a))] - \lambda H(\pi)$        (6.8)

Eq. (6.8) can be solved by finding a saddle point $(\pi, D)$. To do so, it is necessary to introduce function approximations for $\pi$ and $D$, since both are unknown functions and it is very difficult, if not impossible, to define an exact functional form for them. Nowadays, deep neural networks are widely used for function approximation. By computing the gradients of the objective function with respect to the corresponding parameters of $\pi$ and $D$, it is possible to train both the generator and the discriminator through backpropagation. In the implementation, we usually take gradient steps for $\pi$ and $D$ alternately until both networks converge.
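The sketch below illustrates one such alternating update in PyTorch-style code. The `policy` and `discriminator` modules, their interfaces, and the use of a REINFORCE-style policy step are assumptions for illustration only: the discriminator is assumed to output a probability for each state-action pair, `policy.log_prob` is assumed to return the log-probabilities of the taken actions, and the returns used to weight the policy gradient are assumed to come from discriminator-based rewards (see the value estimator described later).

```python
import torch
import torch.nn.functional as F

def gail_update_step(policy, discriminator, policy_opt, disc_opt,
                     expert_obs, expert_act, gen_obs, gen_act, gen_returns):
    """One alternating update of the discriminator and the policy (simplified).

    expert_*    : batches sampled from the real trajectory dataset
    gen_*       : batches sampled from roll-outs of the current policy
    gen_returns : discounted returns built from discriminator-based rewards
    """
    # Discriminator step: following Eq. (6.8), push D toward 1 on generated
    # state-action pairs and toward 0 on expert pairs.
    disc_opt.zero_grad()
    d_gen = discriminator(gen_obs, gen_act)
    d_exp = discriminator(expert_obs, expert_act)
    disc_loss = (F.binary_cross_entropy(d_gen, torch.ones_like(d_gen))
                 + F.binary_cross_entropy(d_exp, torch.zeros_like(d_exp)))
    disc_loss.backward()
    disc_opt.step()

    # Policy step: REINFORCE-style gradient, weighting log-probabilities of the
    # taken actions by their estimated returns.
    policy_opt.zero_grad()
    log_prob = policy.log_prob(gen_obs, gen_act)
    policy_loss = -(log_prob * gen_returns.detach()).mean()
    policy_loss.backward()
    policy_opt.step()
```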

While GAIL provides a powerful solution framework for synthetic data generation, the original GAIL model [ho2016generative] cannot be directly used for our problem of vehicle trajectory generation. From our experiments, we found that the standard GAIL tends to produce very long trajectories with many loops, indicating that vehicles are constantly circulating in the network. This is because the generator in GAIL tries to maximize the expected cumulative rewards, and creating a longer trajectory earns higher expected cumulative rewards since there is no penalty for making a trajectory longer.

There are several ways to address this issue. Possible approaches include giving a negative reward whenever a link is visited to penalize long trajectories, or using a positional embedding (the number of visited links) as part of the state. However, a better approach is to let the model know the vehicle's visit history (the sequence of links visited so far) and learn that it is unrealistic to visit the same link over and over. Our proposed TrajGAIL framework achieves this and addresses the limitation of GAIL in trajectory generation by assuming a POMDP and using an RNN embedding layer.

TrajGAIL: Generative Adversarial Imitation Learning Framework for Vehicle Trajectory Generation
Figure 6.2: The model framework of TrajGAIL

TrajGAIL uses POMDP to formulate vehicle trajectory generation as a sequential decision making problem and GAIL to perform imitation learning on this POMDP to learn patterns in observed trajectories to generate synthetic vehicle trajectories similar to real trajectories. In vehicle trajectory generation, it is important to take into account not only the previous locations of the trajectories, but also the expected future locations that the trajectory is expected to visit. By using POMDP, TrajGAIL considers how realistic the previously visited locations are. By using GAIL, TrajGAIL can also consider how realistic future locations would be because the imitation learning framework in GAIL uses an objective function to maximize the expected cumulative future rewards when generating new actions, which captures how realistic the remaining locations will be. It is noted that, in this study, TrajGAIL focuses on generating location sequences (link sequences) of trajectories without considering time components. Throughout the paper, we use the term trajectory generation to refer to the generation of link sequences representing trajectory paths for the sake of brevity. Figure 6.2 shows the model framework of TrajGAIL. As in GAIL, TrajGAIL consists of the discriminator and the generator, where the discriminator gives reward feedback to the vehicle trajectories generated by the generator until both converge. The generator works as a reinforcement learning agent, and the discriminator works as an inverse reinforcement learning agent. Below we provide more details on each of these two modules.

The Generator of TrajGAIL. The primary role of the generator is to make realistic synthetic vehicle trajectories. The generator creates trajectories by a policy roll-out, i.e., an execution of the policy from the initial state (start of trip) to the terminal state (end of trip). A trajectory starts with the virtual token $\langle start \rangle$. By sequentially applying the policy generator until the current observation reaches the other virtual token $\langle end \rangle$, the generator produces a whole vehicle trajectory. As our problem is formulated as a POMDP, we need to map the sequence of observations into the latent states. [rao2010decision] suggested that the belief state ($b_t$), the estimate of the probability distribution over latent states, can be computed recursively over time from the previous belief state. The posterior probability of the state at the $t$-th observation, denoted by $b_t(s_t)$, can be calculated as follows:

$b_t(s_t) \propto P(o_t \mid s_t) \sum_{s_{t-1}} T(s_t \mid s_{t-1}, a_{t-1})\, b_{t-1}(s_{t-1})$        (6.9)

where $s_t$ is the latent state at the $t$-th observation, including both observable and unobservable variables, $P(o_t \mid s_t)$ is the probability of observation $o_t$ given $s_t$, and $T(s_t \mid s_{t-1}, a_{t-1})$ is the transition model that maps the current state ($s_{t-1}$) and the action ($a_{t-1}$) to the next state ($s_t$).

Eq. (6.9) indicates that the current belief state vector $b_t$ is a combination of the information from the current observation $o_t$ and the feedback from the previous computation of the belief state $b_{t-1}$. In [rao2010decision], the author recognized the similarity between the structure of this equation and recurrent neural networks (RNN) and suggested using an RNN for belief state estimation. Many previous studies on the next location prediction problem also suggest that RNNs show great performance in embedding a sequence of locations into a vector [choi2018network, choi2019attention, feng2018deepmove]. Accordingly, we use an RNN embedding layer to map the sequence of observations (link IDs) to a belief state vector. Since the entire historical sequence is embedded in the current (belief) state via the RNN embedding and the actions are still determined based only on the current state, the Markov assumption of the MDP is not violated, while sequential information can be effectively captured within the model. In the implementation of the RNN embedding layer, the size of the input tensor (observation sequence) is $(N_{batch} \times L_{max})$ and the size of the output tensor (belief state vector) is $(N_{batch} \times N_{hidden})$, where $N_{batch}$ is the batch size, $L_{max}$ is the maximum observation sequence length in the batch, and $N_{hidden}$ is the number of hidden neurons.

Based on the belief state vector $b_t$, the policy generator within the TrajGAIL generator module calculates the probability of the next action, $\pi(a_t \mid b_t)$. The policy output has a size of $(N_{batch} \times N_{action})$, where $N_{action}$ is the size of the action space. The next action is sampled from a multinomial (categorical) distribution with probability $\pi(a_t \mid b_t)$. The next observation is then determined by the next-observation look-up table of the road network environment, which maps the current observation ($o_t$) and the action ($a_t$) to the next observation ($o_{t+1}$). This table should be defined based on the map geometry data, which contain information on the connections between links. This process continues until the current observation reaches the virtual token $\langle end \rangle$.
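A minimal PyTorch sketch of this roll-out procedure is shown below. The module structure (embedding, GRU, linear policy head), the integer coding of observations and actions, and all names are illustrative assumptions rather than the exact TrajGAIL implementation.

```python
import torch
import torch.nn as nn

class PolicyGenerator(nn.Module):
    """Sketch of the generator: RNN belief-state embedding followed by a policy head."""
    def __init__(self, n_obs, n_actions, n_hidden=64):
        super().__init__()
        self.embed = nn.Embedding(n_obs, n_hidden)                 # link-ID embedding
        self.rnn = nn.GRU(n_hidden, n_hidden, batch_first=True)    # belief-state estimator
        self.policy_head = nn.Linear(n_hidden, n_actions)          # pi(a | belief state)

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len) integer tensor of observation indices
        h, _ = self.rnn(self.embed(obs_seq))
        belief = h[:, -1, :]                                        # belief state at the last step
        return torch.softmax(self.policy_head(belief), dim=-1)

def rollout(generator, next_obs_table, start_token, end_token, max_len=50):
    """Generate one synthetic trajectory by sequentially sampling actions."""
    obs_seq = [start_token]
    while obs_seq[-1] != end_token and len(obs_seq) < max_len:
        probs = generator(torch.tensor([obs_seq]))                  # (1, n_actions)
        action = torch.multinomial(probs, num_samples=1).item()
        obs_seq.append(next_obs_table[(obs_seq[-1], action)])       # environment look-up
    return obs_seq
```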

In reinforcement learning, a value function is often used to calculate the expected return of the actions at the current state. Here, we use a state-action value function, $Q(b_t, a_t)$, which is estimated via the value estimator in TrajGAIL's generator. The state-action value function has a size of $(N_{batch} \times 1)$ since it represents a value scalar for each input observation sequence. The value estimator has a separate RNN embedding layer to process the sequence of observations into the belief state. Based on the processed belief state and a given action, the value estimator calculates the expected return of the action at the current belief state. The estimated value, or the expected return, is used as a coefficient when updating the policy generator: if the estimated value of a given action is large, the policy generator is reinforced to take similar actions more often. This value estimator is also modeled as a deep neural network, which is trained to minimize the value objective function, $J_{value}$, defined as follows:

$J_{value} = \mathbb{E}\!\left[\left(Q(b_t, a_t) - \sum_{k \geq t} \gamma^{\,k-t}\, r_k\right)^{2}\right]$        (6.10)

where a mean squared error (MSE) loss between the value estimate and the actual discounted return is used.
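A simple sketch of this value objective, under the assumption that the per-step rewards of a generated trajectory are already available (e.g., derived from the discriminator), is:

```python
import torch

def discounted_returns(rewards, gamma=0.95):
    """Discounted cumulative rewards along one generated trajectory."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return torch.tensor(list(reversed(returns)))

def value_loss(value_estimates, rewards, gamma=0.95):
    """MSE between value estimates and actual discounted returns, as in Eq. (6.10)."""
    targets = discounted_returns(rewards, gamma)
    return torch.mean((value_estimates - targets) ** 2)
```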

The objective of the policy update is to maximize the expected cumulative reward function as shown in Equation (6.3). We define the policy objective as $J_{policy}$, and we maximize this objective to improve the policy generator at every iteration. In order to compute the gradient of the policy objective, we use the Policy Gradient Theorem [sutton2000policy]. According to the Policy Gradient Theorem, for any differentiable $\theta$-parameterized policy $\pi_\theta$, the policy gradient of the policy objective is given as:

$\nabla_\theta J_{policy} = \mathbb{E}_{\pi_\theta}\!\left[Q^{\pi_\theta}(s,a)\, \frac{\nabla_\theta \pi_\theta(a \mid s)}{\pi_\theta(a \mid s)}\right] = \mathbb{E}_{\pi_\theta}\!\left[Q^{\pi_\theta}(s,a)\, \nabla_\theta \log \pi_\theta(a \mid s)\right]$        (6.11)

where the last equality indicates that the gradient of $J_{policy}$ is equal to the expectation, over the policy, of the gradient of $\log \pi_\theta(a \mid s)$ weighted by the state-action value $Q^{\pi_\theta}(s, a)$.