Taxi has become one of the major transportation tools in big cities nowadays. For efficient schedule and security monitoring of taxis running in a city, mobile GPS devices are widely installed in most of taxis to record and report their trajectories. Analysis of the destinations of taxi trajectories can benefit a lot of location based services for passengers, such as recommending sightseeing places, accurate ad based on destinations, etc.
Fig. 1 shows a typical use case of the destination prediction, in which a mobile TV installed in the running taxi reports its location to an ad company at fixed time intervals. Based on the trajectory of the taxi and the knowledge learned from the historical trajectories stored in the database, the company predicts that the destination of this trip may be close to a shopping mall with high probability, and thus it pushes some discount information of the products currently on sale at the mall to the mobile TV. How to precisely predict the destination of a running taxi based on historical trajectories is the key problem of this kind of location based recommendation tasks.
The most recent event related to this prediction problem is the ECML-PKDD competition , which was set up to predict the destinations of taxi trajectories in Porto. The winner of the championship 
adopted MLP (multi-layer perception) and RNN (recurrent neural network) algorithms. In addition, some recently proposed models[3, 4, 5]
predict destination by using Markov models to match query trajectories with historical records, or cluster the moving objects according to their trajectories and makes prediction according to the mobility patterns of similar objects.
Most of existing researches model trajectories as sequences of spatio-temporal points or road segments, and process them in single spatial scale. This kind of one-dimensional data structures is hard to explicitly reveal the two-dimensional spatial features of trajectories, e.g. crossings, corners, and windings, which are highly related to the structure of road networks and very useful to understanding of the state and destination of taxi trajectories. Furthermore, we observe that trajectories usually have distinct multi-scale properties, which prefer a multi-scale algorithm to improve the prediction performance. In particular, trajectories in different spatial scale can exhibit different patterns. In a micro scale, a trajectory can be observed with precise positions on the road network. However, in this scale the side-effect of noise has great impact and it is hard to discover the global trend of a large spatial scope. On the contrary, in a macro scale, the global trend of a trajectory can be easily revealed, whist its resolution is relatively low and many details are lost to support accurate prediction. In fact, combinations of the patterns in different scales may help capture the motion features more precisely and completely for better prediction.
Based on the above observation, we propose a novel prediction algorithm, T-CONV, which models trajectories from a different view by considering them as two-dimensional images, and applies multi-layer convolutional neural networks (CNN) to extract multi-scale two-dimensional trajectory features for accurate prediction. Comprehensive experiments based on real data  show that our model can achieve better precision than state-of-the-art algorithms.
The main contributions of this paper are summarized as follows:
For accurate prediction, T-CONV models trajectories as two-dimensional images and adopts multi-layer convolutional neural networks to combine multi-scale two-dimensional trajectory features. Compared with traditional one-dimensional trajectory models, T-CONV is more explicit and efficient to capture two-dimensional local patterns in different spatial scales.
To visualize the multi-scale patterns captured by T-CONV, the gradient distribution of features is analyzed, and the results show that T-CONV can 1⃝ learn small-scale trajectory patterns effectively in its lower convolutional layers, and 2⃝ combine them into large-scale patterns in higher layers. The visualization also reveals that the portions close to the start and the end locations of an input trajectory have distinct impact on the prediction of its ultimate destination.
Beyond convolving the whole input trajectory, we further integrate multiple local enhancement convolutional fields in T-CONV to further explore the specific important areas in a trajectory image, which not only can reduce the sparsity problem in trajectory datasets, but also may greatly improve the accuracy of prediction.
We conduct comprehensive experiments based on real trajectory data, the results of which show that T-CONV can achieve better accuracy than state-of-the-art methods.
The following sections are organized as follows. Section II describes some related work. Section III offers problem analysis. Section IV presents T-CONV. Section V shows experimental results to validate T-CONV. We conclude this paper in Section VI.
2 Related Work
In this section, we introduce and review some research works related to trajectory prediction.
Most of the existing research works model a trajectory as a one-dimensional spatio-temporal sequence and predict the destination by matching the query trajectory with historical trajectories. Specifically, [7, 8, 9]
represent a map as a two-dimensional grid in a unified and fixed spatial scale, and all spatial points in one cell are considered as one location identified by the id of the cell. In this way, each trajectory can be modeled as a sequence of cell ids. These algorithms apply the Bayesian inference method to measure the probability of a given destination conditioned on the observed partial trajectory according to the statistics of the historical records. However, in real deployment, the sparsity problem of trajectories often makes it hard to find the historical trajectories exactly matching the query one. Simply adopting Bayesian inference as[7, 8, 9] may return zero probability for all candidate destinations most of the time. To solve this problem, [3, 4] decompose trajectories into sub-trajectories connecting two adjacent locations, and adopt first-orders Markov model to infer the probability of all candidate destinations. 
extends the model to consider the difference between destinations and passing-by locations by adopting an absorbing Markov chain model. utilizes the road network information and divides trajectories into road segments; the research aims to predict the future path of a moving object by matching the query trajectory with historical trajectories on a road network.  expresses a personal trajectory as a sequence of cells in a grid, and organizes movement patterns in a pattern tree to support prediction.
Furthermore, clustering algorithms are commonly used techniques to overcome the sparsity problem and extract meaningful basic units in trajectories.  clusters the moving objects according to their trajectories and makes prediction using the mobility patterns of similar objects. 
clusters historical trajectories and uses the Gaussian mixture model to describe the distribution of spatial points in a trajectory cluster. Each query trajectory is assigned to multiple clusters and the mean of the destinations of these clusters is the predicted destination. uses clustering algorithms to extract some important spatial points, named the support points, which are close to crossroads on trajectories. An HMM based algorithm is adopted to establish the relationship between support points and destinations.  proposes a variant of the DBSCAN clustering algorithm to obtain stay points, and applies the variable order Markov Model to predict next locations of personal trajectories.
More recently, neural network based algorithms are receiving more attentions, as they can more accurately predict destinations of taxi trajectories; one such algorithm  actually won the championship of the ECML-PKDD competition . As reported,  considers the first k points from the start and the last k points close to the end of a query trajectory, and feeds these spatial points into neural networks to perform prediction. A lot of neural networks are tested in the research, including normal multi-layer perception, LSTM based RNN(recurrent neural network), bi-directional RNN, and memory networks. Experiments based on a large validation set show that the bi-directional RNN outperforms the other methods.
Most of above studies extract the features of trajectories in only one specific spatial scale. Specifically, in the grid based algorithms [7, 8, 9, 3], the scale is determined by the density of the grid, which is usually predefined and unified in the systems. The clustering based algorithms [6, 12, 13, 14] cluster the spatial points and trajectories through their spatial distribution, and perform destination prediction based on the clusters which form a specific kind of spatial scale and keep unchanged during the prediction. On the other hand, in the neural network based models such as , trajectories are processed in the finest granularity, and prediction is conducted based on the spatial points, each of which is represented as a tuple of latitude and longitude.
3 Problem Analysis
3.1 Problem Definition
A taxi has two different types of states: the idle state (no passengers) and the occupied state (with passengers), which can be distinguished by the signals collected from the seat occupancy sensors in the taxi. As shown in the and steps of Fig. 1, the problem considered in this paper is to predict the destination of a running taxi in the occupied state.
Formally, given a taxi , its no. trajectory is usually recorded as a sequence of GPS locations collected from the GPS devices at fixed time intervals:
is a GPS location. and are the longitude and latitude of . is the length of the trajectory . Especially, the first location means the start of the current trip, which indicates where the passenger(s) came from. The last location is the newest location of the taxi. The ultimate destination of any trajectory is denoted by .
The prediction problem can be defined as follows: given any trajectory of a running taxi , we need to predict the ultimate destination of the taxi in the current trip based on the knowledge learned from the historical trajectory set.
3.2 Multi-scale Properties of Trajectories
While trajectories are displayed on the map, a two-dimensional image such as Fig. 2 can be obtained. Similar to other two-dimensional spatial data such as weather and traffic data, trajectories also have significant multi-scale properties: in different spatial scales, trajectories can exhibit different two-dimensional patterns. Fig. 2 shows a typical example of taxi trajectories in different scales. The dash line represents the trajectory of a running taxi starting from the location marked with ’O’. Other trajectories in the figures are historical trajectories in the training set. Fig. 2 shows the original trajectories in the highest resolution. When processing trajectories in different scales, the grid is a commonly used tool to divide the map and combine the spatial points belonging to the same cell into one indivisible unit. Fig. 2 shows the micro spatial scale with a dense grid, in which trajectories are recorded with fine granularity. In this scale, the view of trajectories displays most details of their motion patterns, and the overlapping degree of the trajectories is low, which causes the sparsity problem when performing trajectory matching. As we divide the map into a sparser grid, we can illustrate the trajectories in a larger spatial scale, as shown by Fig. 2. In this scale, the overlapping of the trajectories increases and the global trend of moving objects are easier to be captured. The trajectories are grouped into the two clusters, according to which the destination of the taxi looks like to be ’x’ with a high probability. Fig. 2 shows the macro spatial scale, in which the global trend of trajectories is clearer and all trajectories overlap, but no detail about the local difference is preserved, which may nevertheless be important for accurate prediction.
In a nutshell, trajectories in small scales show micro local spatial patterns like corners, curves and crossings, but cause sparsity problem. On the other hand, in macro scales they show the global trend clearly but it is difficult to sense important local changes. A proper combination of the trajectory patterns in multiple scales may be better than the existing single scale based methods to capture both the micro and macro spatial patterns for more accurate prediction.
4 Trajectory Modeling and Prediction
4.1 Image Based Modeling of Trajectory
Firstly, we divide the map into an grid evenly as in Fig.2(b), where is a predefined constant representing the resolution of the map. We use to denote the cell in the row and column of the grid. Each GPS point can be mapped to a cell according to its latitude and longitude. The mapping relationship is denoted as:
In this way, given a trajectory we can achieve a two-dimensional image , where the value of the pixel is defined as:
The image is called the trajectory image of in this paper. Here, the function returns a non-negative value for the GPS point to indicate the location information.
4.2 Prediction Based on CNN
Inspired by the successful application of the convolutional neural networks (CNN) based algorithms [16, 17], which enable extracting multi-scale features from pixel-level raw data of images, we propose a CNN based model, T-CONV, to capture the multi-scale spatial patterns from trajectory images and combine these patterns for destination prediction.
The model is shown in Fig. 3. Given a query trajectory , it is transformed into an trajectory image and input into the model. The value of the pixel is defined as:
In order to distinguish the end point of the trajectory from other points, is set to 1 when the last point of the trajectory is in , and it is set to 0.5 for other points in the trajectory.
The input images are fed into the CNN model, which is composed of groups of convolution layers and max-pooling layers. Specifically, the convolution layerconvolves to generate the multi-channel feature map (with channels), each element of which is:
Here is the convolution parameter, is the bias parameter, indicates the size of the sliding convolution window, and. The main contribution of the convolution operations is to extract the local spatial dependency among neighboring spatial points.
The following max-pooling layer in Fig. 3 down-samples the input feature maps to generate high-level features in a larger scale. It filters out the pixels with less activation values to generate distilled feature maps , each element of which is:
Here indicates the pooling size.
The combination of the convolution and max-pooling operations can be applied to the feature maps multiple times to form a hierarchical CNN, such as the layers in Fig. 3
. Higher-level features extract the patterns in larger scales, which are calculated from the combination of lower-level features in smaller scales. In this way, multi-scale patterns are embedded into the output features of the highest level of the CNN.
The output features of
are then flattened into one-dimensional vectorand fed to the fully connected neural layer . Moreover, the meta data about date and client information are transformed as embedded vectors  and also flattened into one-dimensional vector , which is input to the layer . Each output element of is calculated as follows:
Here , , and are the parameters of , which need to be learnt.
In the final steps, similar with , we calculate the predicted desptination as the weighted sum of the frequently visited locations, which are the cluster centers achieved by the mean-shift clustering algorithm deployed on the destinations of all training trajectories. In particular, we have:
Here is the set of cluster centers, and the weight on each cluster center is determined by the softmax function of the output elements of .
4.3 Visualization and Distillation of Patterns
In order to deeply understand the semantics of the features captured by the convolutional neural network, we can measure the relationship between the output features and the input trajectory image by calculating the following gradient:
Here indicates an output feature of the layer , and is a pixel of the trajectory image. The gradient indicates the contribution of the pixel value on the feature . By visualizing the gradient distribution of all pixels in the image, we can observe what patterns captured by the feature from the raw input image. This visualization method is similar with the Deconvolution Network presented in .
We validate this idea by training the model with the real trajectory set from the ECML-PKDD competition , and visualize the patterns captured by the features in the different layers of T-CONV. Fig. 4 shows the patterns captured by the convolutional layer while processing a typical trajectory. This pattern is achieved by firstly selecting the largest output feature of which has the greatest impact on the prediction result, then calculating the gradients over each pixel by Eq. (10), and showing the gradient distribution on the map. In a similar way, as shown in Fig. 4, we select the top 6 output features of the layer and visualize the patterns captured by them.
Fig. 4 shows that layer can capture the small-scale patterns, which focus on the detailed changing in small local areas. Meanwhile, Fig. 4 shows that the higher layer can combine the lower-level small-scale patterns into effective patterns in a larger scale. Furthermore, the red and thick areas in Fig. 4 correspond to bigger gradients and indicate important portions in the patterns. From Fig. 4, it is interesting to find that the trained T-CONV model can ”recognize” that the local areas close to the start point and end point of the trajectory contribute much more to the destination prediction. Intuitively, the end part of a trajectory is important because it is closest to the destination, and it reveals the latest trend of the trajectory. Meanwhile, the start portion of a trajectory is also important, as it indicates where the customers come from and represents some inner motivation of the trip.
To further verify the uneven contribution of different portions in a trajectory, we randomly select 1,000 trajectories from the dataset  and calculate the gradient distribution of different portions of a trajectory as shown in Fig. 5. The distribution is achieved by calculating the gradients of the largest feature in over the pixels in each trajectory, and make statistics of the gradients in different portions of a trajectory. It shows clearly that the first and last portions of a trajectory have great impact on the destination prediction. This observation motivates us to design better prediction models by further exploring these two important local areas.
4.4 Local Enhancement CNN for Prediction
Based on the observation that the local portions near the start and end of a trajectory play more significant roles to the destination prediction, we can convolve these local areas deeply beyond deploying convolution over the global image. Fig. 6 shows the architecture of this model, which focuses on the two local areas centered at the start and the end of an input trajectory. Each local area is then divided into an grid and recorded as two trajectory images: latitude image and longitude image , which record the precise latitudes and longitudes of the points in the sub-trajectory belonging to the local area. Specifically, given the local area, which contains the sub-trajectory , the pixel value of its latitude image is assigned as:
. The pixel value of its longitude image is assigned as follows:
. Here means the latitude of the center of the cell , while means the longitude of the same position. denotes the normalization function to transform its input into the range [-1,1]. With the above processing, totally four images can be obtained, which are stacked together and fed into the convolutional neural network of Fig. 3.
The local enhancement convolution on the important local areas may benefit efficient extraction of important patterns for prediction. Furthermore, by only focusing on small subsets of the whole trajectory image, the overlapping degrees of input latitude images and longitude images among different trajectories are much higher than original global trajectory images. This can help overcome the sparsity problem of trajectories , and facilitate the convergence of the learning procedure.
In this section, we conduct experiments to validate the effectiveness of the T-CONV models, and compare them with the other state-of-the-art algorithms.
5.1 Experiment setup
We test the performance of T-CONV on the real trajectory dataset of the ECML-PKDD competition, which is collected from 442 taxis running in the city of Porto for a complete year (from 2013-07-01 to 2014-06-30), and has totally 1.7 millions complete trajectories. In order to validate the prediction accuracy, we use the same testing dataset as , which contains 19,770 incomplete trajectories randomly selected from the original training set. The remaining trajectories are used for training.
Two T-CONV models are implemented: the T-CONV-Basic which means the simple CNN model presented in section IV.B, and the T-CONV-LE which means the local enhancement CNN model mentioned in section IV.D. We mainly compare T-CONV with the neural networks based models, which win the champion of the ECML-PKDD competition. These models are listed as follows:
|Bidirectional RNN with window||2.60|
: a simple multi-layer perceptron model. The input includes the latitudes and longitudes of the first and last 5 points of an input trajectory.
: a bidirectional recurrent neural network which considers a trajectory as a sequence, and each GPS point in the trajectory forms a transition state of RNN.
Bidirectional RNN with window: a variant of the above Bidirectional RNN model, which uses a sliding window of 5 successive GPS points as a transition state of RNN.
Memory network: another variant of the above RNN model which encodes the trajectories into vectors with fixed length, and compares the similarity between the input trajectories and historical trajectories in this vector space.
The evaluation error to measure these models is the Haversine distance between the real destination and the predicted location. The distance is defined as:
, is the latitude, is the longitude and is the radius of the earth.
The prediction errors of our proposed models are compared with the baseline models listed in Table 1. The result shows that both T-CONV models are better than the baseline MLP models. As shown in Fig. 3, T-CONV models share similar output layers as the MLP layers, but use the multi-layer convolutional neural networks to capture trajectory patterns. The superior performance of T-CONV demonstrates that the effectiveness of this kind of multi-scale patterns extracted by T-CONV. In addition, the T-CONV models are also better than the Bidirectional RNN model and Memory network, which are also based on one spatial scale.
When looking into the two T-CONV models, T-CONV-LE is much better than T-CONV-Basic, which shows that the convolution on the important local areas can enhance the ability of CNN to capture significant patterns. As shown in Fig. 7, we also test the prediction performance of these two models given the trajectories with different completeness ratios. The completeness ratio indicates the percentage of the current observed trajectory compared to its full trajectory, and it implies how far the end point of the trajectory to destination. Fig. 7 shows that the closer of the taxi to the destination, the lower prediction error to be achieved. Furthermore, it also shows that T-CONV-LE performs much better in most of the cases.
5.3 Sensibility of parameters
In this section, we test how the change of the hyper parameters in T-CONV-LE may affect its prediction efficiency. We focus on the two parameters related to the spatial scale of trajectory processing, namely, M and W:
: As shown in Fig. 6, is the number of rows (or columns) of the grid. A Higher corresponds to a denser grid with small granularity of the cells.
: Measured in meters, is the width of each cell of the grid. Thus indicates the size of the local areas of T-CONV-LE.
We test the performance of T-CONV-LE with different combinations of various and . Fig. 8 and Fig. 9 show how the prediction error decreases during the training stage. Specifically, Fig. 8 shows the performance of the models with different while is unchanged, and Fig. 9 shows the performance of the models with different given fixed . Both Fig. 8 and Fig. 9 indicate that the performance of T-CONV-LE is not sensitive to the slight change of and . Thus, the multi-layer convolutional neural networks of the model are robust to extracting multi-scale patterns for precise prediction.
In this paper, we have presented a convolutional neural network based model, T-CONV, to predict the destinations of taxi trajectories. Different from traditional research works which process trajectories as one-dimensional sequences in one single scale, T-CONV models trajectories as two-dimensional images and combines multi-scale trajectory patterns through multi-layer convolution operations. Furthermore, inspired by the observation that different portions of a trajectory have significant difference of contribution to the final prediction, a local enhancement T-CONV model named T-CONV-LE is presented to efficiently extract the patterns of the important local areas in a trajectory. Experiments show that T-CONV-LE can achieve better prediction accuracy than state-of-the-art algorithms.
In the future, we will extend the current T-CONV-LE model, which has two local enhancement areas with fixed size, to the more general one, which has multiple local enhancement areas with tunable size. We will also extend the experiments to validate our model in more real trajectory datasets.
-  Kaggle. Kaggle competition. https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i, 2015.
-  Alexandre de Brébisson, Étienne Simon, Alex Auvolat, Pascal Vincent14, and Yoshua Bengio14. Artificial neural networks applied to taxi destination prediction. ECML-PKDD-DCs, 2015.
-  Andy Yuan Xue, Rui Zhang, Yu Zheng, Xing Xie, Jin Huang, and Zhenghua Xu. Destination prediction by sub-trajectory synthesis and privacy protection against such prediction. In Data Engineering (ICDE), 2013 IEEE 29th International Conference on, pages 254–265. IEEE, 2013.
-  Andy Yuan Xue, Jianzhong Qi, Xing Xie, Rui Zhang, Jin Huang, and Yuan Li. Solving the data sparsity problem in destination prediction. The VLDB Journal, 24(2):219–243, 2015.
-  Xiang Li, Mengting Li, Yue-Jiao Gong, Xing-Lin Zhang, and Jian Yin. T-desp: Destination prediction based on big trajectory data. IEEE Transactions on Intelligent Transportation Systems, 17(8):2344 – 2354, 2016.
-  Meng Chen, Yang Liu, and Xiaohui Yu. Predicting next locations with object clustering and trajectory clustering. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 344–356. Springer, 2015.
-  John Krumm and Eric Horvitz. Predestination: Inferring destinations from partial trajectories. In International Conference on Ubiquitous Computing, pages 243–260. Springer, 2006.
-  Ling-Yin Wei, Yu Zheng, and Wen-Chih Peng. Constructing popular routes from uncertain trajectories. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 195–203. ACM, 2012.
-  Brian D Ziebart, Andrew L Maas, Anind K Dey, and J Andrew Bagnell. Navigate like a cabbie: Probabilistic reasoning from observed context-aware behavior. In Proceedings of the 10th international conference on Ubiquitous computing, pages 322–331. ACM, 2008.
-  Sang-Wook Kim, Jung-Im Won, Jong-Dae Kim, Miyoung Shin, Junghoon Lee, and Hanil Kim. Path prediction of moving objects on road networks through analyzing past trajectories. In International Conference on Knowledge-Based and Intelligent Information and Engineering Systems, pages 379–389. Springer, 2007.
-  Ling Chen, Mingqi Lv, and Gencai Chen. A system for destination and future route prediction based on trajectory mining. Pervasive and Mobile Computing, 6(6):657–676, 2010.
-  Philippe C Besse, Brendan Guillouet, Jean-Michel Loubes, and François Royer. Destination prediction by trajectory distribution based model. arXiv preprint arXiv:1605.03027, 2016.
Juan Antonio Alvarez-Garcia, Juan Antonio Ortega, L Gonzalez-Abril, and
Trip destination prediction based on past gps log using a hidden markov model.Expert Systems with Applications, 37(12):8166–8171, 2010.
-  Jie Yang, Jian Xu, Ming Xu, Ning Zheng, and Yu Chen. Predicting next location using a variable order markov model. In Proceedings of the 5th ACM SIGSPATIAL International Workshop on GeoStreaming, pages 37–42. ACM, 2014.
-  Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
-  Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
-  Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
Matthew D Zeiler and Rob Fergus.
Visualizing and understanding convolutional networks.
European Conference on Computer Vision, pages 818–833. Springer, 2014.