1 Introduction
The prevalence of positioning technology has made it possible to track the movements of people and other objects, giving rise to a variety of location-based applications. For example, GPS tracking using positioning devices installed in vehicles is becoming a preferred method of taxi cab fleet management. In many social network applications (e.g., Foursquare), users are encouraged to share their locations with other users. Moreover, in an increasing number of cities, vehicles are photographed when they pass the surveillance cameras installed over highways and streets, and the vehicle passage records, including the license plate numbers, the time, and the locations, are transmitted to the data center for storage and further processing.
In many of these location-based applications, it is highly desirable to be able to accurately predict a moving object’s next location. Consider the following example in location-based advertising. Lily has just shared her location with her friends on a social network website. If the area she will pass by is known in advance, it is possible to push relevant information to her, such as the most popular restaurant and the products on sale in that area. As another example, if we could predict the next locations of vehicles on the road, then we would be able to forecast the traffic conditions and recommend more reasonable routes to drivers to avoid or alleviate traffic jams.
Several methods have been proposed to predict next locations, most of which fall into one of two categories: (1) methods that use only the historical trajectories of individual objects to discover individual movement patterns [12, 7], and (2) methods that use the historical trajectories of all objects to identify collective movement patterns [10, 11]. The majority of the existing methods train models based on frequent patterns and/or association rules to discover movement patterns for prediction.
However, there are a few major problems with the existing methods. First, those methods focus on either the individual patterns or the collective patterns, but very often the movements of objects reflect both individual and collective properties. Second, in some circumstances (e.g., social check-in and vehicle surveillance), the data points are very sparse; the trajectories of some objects may consist of only one record. One cannot construct meaningful frequent patterns from these trajectories. Finally, the existing methods do not give proper consideration to the time factor. Different movement patterns exist at different times. For example, suppose Bob is about to leave his house. If it is 8 a.m. on a weekday, he is most likely to go to work. But if it is 11:30 a.m., he is more likely to go to a restaurant, and he may go shopping if it is 3 p.m. on a weekend. Failing to take the time factor into account results in higher error rates in predicting the next locations.
To address those problems, we propose a Next Location Predictor with Markov Modeling (NLPMM) to predict the next locations of moving objects given past trajectory sequences. NLPMM builds upon two models: the Global Markov Model (GMM) and the Personal Markov Model (PMM). GMM utilizes all available trajectories to discover global behaviours of the moving objects based on the assumption that they often share similar movement patterns (e.g., people driving from A to B often take the same route). PMM, on the other hand, focuses on modeling the individual patterns of each moving object using its own past trajectories. The two models are combined using linear regression to produce a more complete and accurate predictor.
Another distinct feature of NLPMM lies in its treatment of the time factor. The movement patterns of objects vary from one time period to another (e.g., weekdays vs. weekends). Meanwhile, similarities also exist for different time periods (e.g., this Monday and next), and the movement patterns of moving objects tend to be cyclical. We thus propose to cluster the time periods based on the similarity in movement patterns and build a separate model for each cluster.
The performance of NLPMM is evaluated on a real dataset consisting of the vehicle passage records over a period of 31 days (1/1/2013 to 1/31/2013) in a metropolitan area (the name of the city is withheld due to the anonymity rule). The experimental results confirm the superiority of the proposed methods over existing methods.
The contributions of this paper can be summarized as follows.

We propose a Next Location Predictor with Markov Modeling to predict the next location a moving object will arrive at. To the best of our knowledge, NLPMM is the first model that takes a holistic approach and considers both individual and collective movement patterns in making predictions. It is effective even when the trajectory data is sparse.

Based on the important observation that the movement patterns of moving objects often change over time, we propose methods that can capture the relationships between the movement patterns in different time periods, and use this knowledge to build more refined models that are better suited to different time periods.

We conduct extensive experiments using a real dataset and the results demonstrate the effectiveness of NLPMM.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 gives the preliminaries of our work. Section 4 describes our approach of Markov modeling. Section 5 presents methods that take the time factor into consideration. The experimental results and performance analysis are presented in Section 6. Section 7 concludes this paper.
2 Related Work
There is a considerable body of work on knowledge discovery from trajectories, where a trajectory is defined as a sequence of locations ordered by timestamps. In what follows, we discuss three categories of studies that are most closely related to ours.
Route planning: Several studies use GPS trajectories for route planning through constructing a complete route [2, 3, 16]. Chen et al. search the BestConnected Trajectories from a database [2] and discover the most popular route between two locations [3]. Yuan et al. find the practically fastest route to a destination at a given departure time using historical taxi trajectories [16].
Long-range prediction: Long-range prediction, which tries to predict the whole future trajectory of a moving object, is studied in [8, 5]. Krumm proposes a Simple Markov Model that uses previously traversed road segments to predict routes in the near future [8]. Froehlich and Krumm use previous GPS traces to make a long-range prediction of a vehicle’s trajectory [5].
Short-range prediction: Short-range prediction, which is concerned with the prediction of only the next location, has been widely investigated [12, 7, 10, 11]. Some of these methods make predictions with only the individual movements [12, 7], while others use the historical movements of all the moving objects [10, 11]. Xue et al. construct a Probabilistic Suffix Tree (PST) for each road using the taxi traces and propose a method based on Variable-order Markov Models (VMMs) for short-term route prediction [12]. Jeung et al. present a hybrid prediction model to predict the future locations of moving objects, which combines predefined motion functions using the object’s recent movements with the movement patterns of the object [7]. Monreale et al. use the previous movements of all moving objects to build a T-pattern tree to make future location predictions [10]. Morzy uses a modified version of the PrefixSpan algorithm to discover frequent trajectories and movement rules with all the moving objects’ locations [11].
In addition to the three aforementioned categories of work, there is also work on using social-media data for trajectory mining [9, 15, 13]. Kurashima et al. recommend travel routes based on a large set of geo-tagged and time-stamped photographs [9]. Yin et al. investigate the problem of trajectory pattern ranking and diversification based on geo-tagged social media [15]. Ye et al. utilize a mixed Hidden Markov Model to predict the category of a user’s next activity and then predict a location given the category [13].
3 Preliminaries
In this section, we will explain a few terms that are required for the subsequent discussion, and define the problem addressed in this paper.
Definition 1 (Sampling Location)
A given moving object $o$ passes through a set of sampling locations, where each sampling location refers to a point or a region (in a two-dimensional area of interest) where the position of $o$ is recorded.
For example, the positions of the cameras in the traffic surveillance system can be considered as the sampling locations.
Definition 2 (Trajectory Unit)
For a given moving object $o$, a trajectory unit, denoted by $u$, is the basic component of its trajectory. Each trajectory unit can be represented by $(l, t)$, where $l$ is the id of the sampling location of the moving object at timestamp $t$.
Definition 3 (Trajectory)
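For a moving object, its trajectory $T$ is defined as a time-ordered sequence of trajectory units: $T = \langle u_1, u_2, \ldots, u_n \rangle$.
From Definition 2, $T$ can also be represented as $\langle (l_1, t_1), (l_2, t_2), \ldots, (l_n, t_n) \rangle$, where $t_i < t_{i+1}$ ($1 \le i < n$).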
Definition 4 (Candidate Next Locations)
For the sampling location $l_i$, we define a sampling location $l_j$ as a candidate next location of $l_i$ if a moving object can reach $l_j$ from $l_i$ directly.
The set of candidate next locations can be obtained either by prior knowledge (e.g., locations of the surveillance cameras combined with the road network graph), or by induction from historical trajectories of moving objects.
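As a minimal sketch of the second option (induction from historical trajectories), the Python snippet below collects every location that is observed directly after each sampling location. The function name `candidate_next_locations`, the representation of a trajectory as a plain list of location ids, and the toy data are illustrative assumptions rather than part of the method's specification.

```python
from collections import defaultdict


def candidate_next_locations(sequences):
    """Map each sampling location to the set of locations seen directly after it.

    `sequences` is an iterable of lists of location ids, one list per trajectory
    (an assumed, simplified representation).
    """
    candidates = defaultdict(set)
    for seq in sequences:
        # Pair every location with its immediate successor in the trajectory.
        for current, following in zip(seq, seq[1:]):
            candidates[current].add(following)
    return dict(candidates)


# Toy data (hypothetical camera ids).
history = [["A", "B", "C"], ["A", "C"], ["B", "C", "D"]]
for location, successors in sorted(candidate_next_locations(history).items()):
    print(location, sorted(successors))
```

Locations that never appear with a successor (here `"D"`) simply have no entry, which mirrors the fact that nothing can be induced for them from the data alone.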
Definition 5 (Sampling Location Sequence)
For a given trajectory $T$, its sampling location sequence refers to the sequence of sampling locations appearing in the trajectory, denoted as $\langle l_1, l_2, \ldots, l_n \rangle$.
Definition 6 (Prefix Set)
For a sampling location $l$ and a given set of trajectories $\mathcal{T}$, its prefix set of size $N$, denoted by $S_l^N$, refers to the set of sequences such that each sequence is a length-$N$ subsequence that immediately precedes $l$ in the sampling location sequence of some trajectory $T \in \mathcal{T}$.
4 Markov Modeling
We choose to use Markov models to solve the next location prediction problem. Specifically, a state in the Markov model corresponds to a sampling location, and state transition corresponds to moving from one sampling location to the next.
In order to take into consideration both the collective and the individual movement patterns in making the prediction, we propose two models, a Global Markov Model (GMM) to model the collective patterns, and a Personal Markov Model (PMM) to model the individual patterns and solve the problem of data sparsity. They are combined using linear regression to generate a predictor.
4.1 Global Markov Model
Using historical trajectories, we can train an order-$N$ GMM to give a probabilistic prediction over the next sampling locations for a moving object, where $N$ is a user-chosen parameter. Let $P(l)$ represent the discrete probability of a moving object arriving at sampling location $l$. The order-$N$ GMM implies that the probability distribution $P(l_{n+1})$ for the next sampling location of a given moving object $o$ is independent of all but the $N$ immediately preceding locations that $o$ has arrived at:

$$P(l_{n+1} \mid l_1, l_2, \ldots, l_n) = P(l_{n+1} \mid l_{n-N+1}, \ldots, l_n) \tag{1}$$
For a given trajectory dataset, an order-$N$ GMM for the sampling location $l$ can be trained in the following way. We first construct the prefix set of size $N$ for $l$. Next, for every prefix in this set, we compute the frequency of each distinct sampling location appearing after this prefix in the dataset. These frequencies are then normalized to get a discrete probability distribution over the next sampling location.
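The training procedure above amounts to counting and normalizing. The following Python sketch illustrates it under simplifying assumptions: trajectories are given as lists of location ids, and the model is stored as a dictionary keyed by prefix rather than as one prefix set per location. The name `train_order_n` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_order_n(sequences, n):
    """Estimate P(next location | previous n locations) by counting and normalizing."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(n, len(seq)):
            prefix = tuple(seq[i - n:i])   # the n locations immediately before seq[i]
            counts[prefix][seq[i]] += 1    # frequency of seq[i] following this prefix
    model = {}
    for prefix, counter in counts.items():
        total = sum(counter.values())
        # Normalize frequencies into a discrete probability distribution.
        model[prefix] = {loc: c / total for loc, c in counter.items()}
    return model


# Toy data (hypothetical camera ids).
sequences = [["A", "B", "C"], ["A", "B", "D"], ["A", "B", "C"], ["B", "C", "D"]]
model = train_order_n(sequences, 2)
print(model[("A", "B")])
```

In this toy dataset the prefix `("A", "B")` is followed by `"C"` twice and by `"D"` once, so the estimated distribution puts two thirds of the mass on `"C"`.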
To train a variable-order GMM, we start with a first-order GMM, followed by a second-order GMM, and so on, until the order-$N$ GMM has been obtained. In contrast to the order-$N$ GMM, the variable-order GMM learns such conditional distributions with a varying order and provides the means of capturing different orders of Markov dependencies based on the observed data. There exist many ways to utilize the variable-order GMM for prediction. Here we adopt the principle of longest match. That is, for a given sampling location sequence ending with $l_n$, we find its longest suffix that matches a sequence in the prefix set of a candidate next location $l_{n+1}$.
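The longest-match principle can be illustrated with a short Python sketch. It is a simplified reading of the description above, not the original implementation: contexts of every length from 1 up to the maximum order are stored in one dictionary, and prediction backs off from the longest context to shorter ones. The names `train_variable_order` and `predict_longest_match` and the toy data are assumptions.

```python
from collections import Counter, defaultdict


def train_variable_order(sequences, max_order):
    """Collect next-location counts for every context of length 1..max_order."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(1, len(seq)):
            for k in range(1, min(max_order, i) + 1):
                counts[tuple(seq[i - k:i])][seq[i]] += 1
    return counts


def predict_longest_match(counts, recent, max_order, top=1):
    """Predict from the longest suffix of `recent` that was seen during training."""
    for k in range(min(max_order, len(recent)), 0, -1):
        context = tuple(recent[-k:])
        if context in counts:
            total = sum(counts[context].values())
            return [(loc, c / total) for loc, c in counts[context].most_common(top)]
    return []  # no suffix matched, so this model cannot make a prediction


# Toy data (hypothetical camera ids).
training = [["A", "B", "C"], ["D", "B", "E"], ["D", "B", "E"]]
counts = train_variable_order(training, max_order=2)
print(predict_longest_match(counts, ["A", "B"], max_order=2))  # uses context (A, B)
print(predict_longest_match(counts, ["X", "B"], max_order=2))  # falls back to context (B,)
```

The second call shows the back-off: the context `("X", "B")` was never observed, so the prediction is based on the shorter context `("B",)`.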
4.2 Personal Markov Model
The majority of people’s movements are routine (e.g., commuting), and they often have their own individual movement patterns. In addition, about 73% of trajectories in our dataset contain only one point, but they also can reflect the characteristics of the moving objects’ activities. For example, someone who lives in the east part of the city is unlikely to travel to a supermarket 50 kilometers away from his home. Therefore, we propose a Personal Markov Model (PMM) for each moving object to predict next locations.
The training of PMM consists of two parts: training a variable-order Markov model for every moving object using its own trajectories of length greater than 1, and training a zero-order Markov model for every moving object using the trajectory units.
For training the variable-order Markov model, we construct the prefix set for every moving object using its own trajectories, and then we compute the probability distribution of the next sampling locations. Specifically, we iteratively train a variable-order Markov model with order ranging from 1 to $N$ using the trajectories of one moving object.
We train a zero-order Markov model using the trajectory units. For a moving object, let $n(l)$ denote the number of times a sampling location $l$ appears in the training trajectories. Let $L$ be the set of distinct sampling locations appearing in the training trajectories. Then we have

$$P(l) = \frac{n(l)}{\sum_{l' \in L} n(l')} \tag{2}$$
The zeroorder Markov model can be seamlessly integrated with the variableorder Markov model to obtain the final PMM.
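As a small illustration of the zero-order model, the Python sketch below turns the visit counts of one moving object into a probability distribution over sampling locations. The function name `train_zero_order` and the toy visit list are hypothetical; how the result is merged with the variable-order model is not shown here.

```python
from collections import Counter


def train_zero_order(visited_locations):
    """Relative visit frequency of each sampling location for one moving object."""
    counts = Counter(visited_locations)
    total = sum(counts.values())
    return {loc: c / total for loc, c in counts.items()}


# Toy data: location ids taken from the trajectory units of one object.
visits = ["home", "office", "home", "gym"]
print(train_zero_order(visits))  # {'home': 0.5, 'office': 0.25, 'gym': 0.25}
```

Because this model needs no preceding locations, it can still produce a distribution for objects whose trajectories contain a single point.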
4.3 Integration of GMM and PMM
There are many methods to combine the results from a set of predictors. For our problem, we choose to use linear regression to integrate the two models we have proposed.
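For the $i$th trajectory sequence, both GMM and PMM produce a vector of probabilities, $\mathbf{p}_i = (p_{i1}, p_{i2}, \ldots, p_{im})$ ($\mathbf{p}_i^{G}$ for GMM and $\mathbf{p}_i^{P}$ for PMM), where $m$ is the number of sampling locations, and $p_{ij}$ is the probability of location $j$ being the next sampling location. We also have a vector of indicators $\mathbf{y}_i = (y_{i1}, y_{i2}, \ldots, y_{im})$ for the $i$th trajectory sequence, where $y_{ij} = 1$ if the actual next location is $j$ and 0 otherwise. We can predict $\mathbf{y}_i$ through a linear combination of the vectors generated by GMM and PMM:

$$\hat{\mathbf{y}}_i = \beta_0 \mathbf{1} + \beta_1 \mathbf{p}_i^{G} + \beta_2 \mathbf{p}_i^{P} \tag{3}$$

where $\mathbf{1}$ is a unit vector, and $\beta_0$, $\beta_1$, and $\beta_2$ are the coefficients to be estimated.
Given a set of training trajectories, we can compute the optimal values of $\beta_0$, $\beta_1$, and $\beta_2$ through standard linear regression that minimizes $\sum_i \lVert \mathbf{y}_i - \hat{\mathbf{y}}_i \rVert^2$, where $\lVert \cdot \rVert$ is the Euclidean norm. The values thus obtained can then be used for prediction. For a particular trajectory, we can predict the top-$k$ next sampling locations by identifying the $k$ largest elements in the estimator $\hat{\mathbf{y}}$.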
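The combination step can be sketched in plain Python as an ordinary least-squares fit with an intercept and two features (the GMM and PMM probabilities), solved through the normal equations. This is an illustrative sketch, not the original implementation: the helper names `solve_linear_system`, `fit_combination`, and `top_k`, as well as the toy probabilities, are assumptions.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; the features are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_combination(p_gmm, p_pmm, y):
    """Least-squares coefficients [b0, b1, b2] for y ~ b0 + b1 * p_gmm + b2 * p_pmm.

    Each argument is a list of per-sequence vectors; all entries are pooled,
    which is equivalent to minimizing the summed squared Euclidean norms.
    """
    rows = [(1.0, g, p, t)
            for gs, ps, ts in zip(p_gmm, p_pmm, y)
            for g, p, t in zip(gs, ps, ts)]
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * r[3] for r in rows) for i in range(3)]
    return solve_linear_system(xtx, xty)


def top_k(beta, g_row, p_row, k):
    """Indices of the k locations with the largest combined score."""
    scores = [beta[0] + beta[1] * g + beta[2] * p for g, p in zip(g_row, p_row)]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]


# Toy data: two training sequences over three locations (hypothetical values).
p_gmm = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
p_pmm = [[0.5, 0.5, 0.0], [0.0, 0.2, 0.8]]
y = [[1, 0, 0], [0, 0, 1]]  # indicator of the actual next location
beta = fit_combination(p_gmm, p_pmm, y)
print(beta, top_k(beta, p_gmm[0], p_pmm[0], k=2))
```

With real data, a numerical library routine for least squares would normally replace the hand-written solver; the pure-Python version is used here only to keep the sketch self-contained.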
5 Time Factor
The movement of human beings demonstrates a great degree of temporal regularity [6, 1]. In this section, we will first discuss how the movement patterns are affected by time, and then show how to improve the predictor proposed in the preceding section by taking the time factor into consideration.
5.1 Observations and Discussions
We illustrate how time could affect people’s movement patterns through Figure 1. In this case, for a given sampling location, there are seven candidate next locations, and the distributions over those locations differ from one period to another. For instance, vehicles are most likely to arrive at the fifth location during the period from 9:00 to 10:00, whereas the most probable next location is the second for the period from 14:00 to 15:00.
Therefore, the prediction model should be made time-aware, and one way to do this is to train different models for different time periods. In what follows, we will explore a few methods to determine the suitable time periods. Here, we choose a day as the whole time span, i.e., we study how to find movement patterns within a day. However, any other unit of time, such as hour, week or month, could also be used depending on the scenario.
5.2 Time Binning
A straightforward approach is to partition the time span into a given number of equal-sized time bins and to map all trajectories to those bins according to their timestamps. A trajectory spanning more than one bin is split into smaller sub-trajectories such that the trajectory units in each sub-trajectory all fall in the same bin. We then train one independent model per time bin, using the trajectories falling in that bin. Prediction is done by choosing the right model based on the timestamp. We call this approach Time Binning (TB).
However, this approach has a limitation: all time bins have the same size, which makes it difficult to find a bin size that fits all movement patterns in the time span, as some patterns manifest themselves over longer periods and others over shorter ones. One possible improvement to TB is to start with a small bin size, and gradually merge the time bins whose distributions are considered similar by some metric. For example, in Figure 1, the distribution for the period from 11:00 to 12:00 is different from the one from 10:00 to 11:00; rather, it is similar to the one from 14:00 to 15:00 (e.g., they both have the maximal probability at the second sampling location).
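The splitting step of Time Binning can be sketched as follows. The Python snippet assumes that timestamps are given as seconds since midnight and that a day is the whole time span; the function names `bin_index` and `split_by_bin` and the sample trajectory are hypothetical.

```python
def bin_index(timestamp_s, num_bins, span_s=86400):
    """Index of the equal-sized bin that contains a timestamp (seconds since midnight)."""
    return int(timestamp_s % span_s) * num_bins // span_s


def split_by_bin(trajectory, num_bins):
    """Split [(location_id, timestamp_s), ...] into sub-trajectories within one bin each.

    Returns a list of (bin_index, sub_trajectory) pairs in temporal order.
    """
    pieces = []
    for loc, ts in trajectory:
        b = bin_index(ts, num_bins)
        if pieces and pieces[-1][0] == b:
            pieces[-1][1].append((loc, ts))   # still inside the current bin
        else:
            pieces.append((b, [(loc, ts)]))   # bin boundary crossed: start a new piece
    return pieces


# Toy trajectory crossing the 9:00 boundary, with 24 one-hour bins.
trajectory = [("c1", 8 * 3600 + 3000), ("c2", 8 * 3600 + 3300), ("c3", 9 * 3600 + 300)]
for b, piece in split_by_bin(trajectory, num_bins=24):
    print(b, [loc for loc, _ in piece])
```

Each resulting piece would then be added to the training set of the model that belongs to its bin.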
5.3 Distributions Clustering
We propose a method called Distributions Clustering (DC) to perform clustering of the time bins based on the similarities of the probability distributions in each bin. Here, the probability distribution refers to the transition probability from one location to another. Compared with TB, the trajectories having similar probability distributions are expected to be put in one cluster, leading to a clearer revelation of the movement patterns. Here, we use cosine similarity to measure the similarities between the distributions, but the same methodology still applies when other distance metrics such as the Kullback-Leibler divergence [4] are used.
For an object appearing at a given sampling location $l$ with a time point falling into the $i$th time bin, let $\mathbf{v}_i$ be an $m$-dimensional vector that represents the probabilities of moving from $l$ to another location, where $m$ is the total number of sampling locations. We measure the similarity of two time bins $i$ and $j$ (with respect to $l$) using the cosine similarity, $\cos(\mathbf{v}_i, \mathbf{v}_j) = \frac{\mathbf{v}_i \cdot \mathbf{v}_j}{\lVert \mathbf{v}_i \rVert \, \lVert \mathbf{v}_j \rVert}$. With the similarity metric defined, we can perform clustering for each sampling location on the time bins. The algorithm is detailed in Algorithm 1. The results will be a set of clusters, each containing a set of time bins, for the sampling location $l$.
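To make the similarity measure concrete, the Python sketch below computes the cosine similarity between the transition distributions of different time bins. Only the similarity computation is shown; the clustering procedure itself is not reproduced. The function name `cosine_similarity` and the three seven-element distributions are made-up illustrations in the spirit of the Figure 1 discussion, not values from the dataset.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally long probability vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for an empty bin
    return dot / (norm_u * norm_v)


# Hypothetical distributions over seven candidate next locations.
bin_10_11 = [0.10, 0.10, 0.10, 0.10, 0.50, 0.05, 0.05]
bin_11_12 = [0.10, 0.50, 0.10, 0.10, 0.10, 0.05, 0.05]
bin_14_15 = [0.05, 0.60, 0.10, 0.10, 0.05, 0.05, 0.05]

print(cosine_similarity(bin_11_12, bin_14_15))  # close to 1: similar bins
print(cosine_similarity(bin_11_12, bin_10_11))  # clearly lower: dissimilar bins
```

Bins whose distributions peak at the same location obtain a similarity close to 1 and are therefore natural candidates for the same cluster.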
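For a given location $l$, we can get $K$ clusters, defined as $C = \{c_1, c_2, \ldots, c_K\}$. Combined with the order-$N$ Markov model, the probability distribution for the next sampling location of a given moving object can be computed with the formula:

$$P(l_{n+1} \mid t, l_1, \ldots, l_n) = P(l_{n+1} \mid c_k, l_{n-N+1}, \ldots, l_n), \quad t \in c_k \tag{4}$$

where $t$ is the time at which the object is observed at $l_n$ and $c_k$ is the cluster whose time bins contain $t$.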
We then train models with the trajectories in each cluster to form a new model, NLPMM-DC (which stands for NLPMM with Distributions Clustering). In the new model, the sequence of just-passed locations and the time factor are both utilized by combining distributions clustering and the Markov model.
6 Performance Evaluation
We have conducted extensive experiments to evaluate the performance of the proposed NLPMM using a real vehicle passage dataset. In this section, we will first describe the dataset and experimental settings, followed by the evaluation metrics to measure the performance. We then show the experimental results.
6.1 Datasets and Settings
The dataset used in the experiments consists of real vehicle passage records from the traffic surveillance system of a major metropolitan area with a population of 6 million. The dataset contains 10,344,058 records during a period of 31 days (from January 1, 2013 to January 31, 2013). Each record contains three attributes: the license plate number of the vehicle, the ID of the location of the surveillance camera, and the time at which the vehicle passed the location. There are about 300 camera locations on the main roads. The average distance between a neighboring pair of camera locations is approximately 3 kilometers.
6.2 Preprocessing
We preprocess the dataset to form trajectories, resulting in a total of 6,521,841 trajectories. According to statistics, the trajectories containing only one point account for about 73% of all trajectories, which testifies to the sparsity of data sampling. We choose a total of 1,760,897 trajectories with a length greater than one to calculate the number of candidate next locations for every sampling location. Due to the sparsity of camera locations, about 86.3% of the sampling locations have more than 10 candidate next sampling locations, and the average number of candidate next locations is about 43. We predict the top-$k$ next sampling locations in the experiments.
6.3 Evaluation Metrics
Our evaluation uses the following metrics that are widely employed in multilabel classification studies [14].
Prediction Coverage: It is defined as the percentage of trajectories for which the next location can be predicted based on the model. Let $c(T)$ be 1 if the next location of trajectory $T$ can be predicted and 0 otherwise. Then $\mathit{coverage} = \frac{1}{|D|} \sum_{T \in D} c(T)$, where $|D|$ denotes the total number of trajectories in the testing dataset $D$.
Accuracy: It is defined as the frequency of the true next location occurring in the list of predicted next locations. Let $p(T)$ be 1 if it does and 0 otherwise. Then $\mathit{accuracy} = \frac{1}{|D|} \sum_{T \in D} p(T)$.
One-error: It is defined as the frequency of the top-1 predicted next location not being the same as the true next location. Let $e(T)$ be 0 if the top-1 predicted sampling location is the same as the true next location and 1 otherwise. Then $\mathit{one\text{-}error} = \frac{1}{|D|} \sum_{T \in D} e(T)$.
Average Precision: Given a list of top-$k$ predicted next locations, the average precision is defined as $\mathit{AP} = \frac{1}{|D|} \sum_{T \in D} \sum_{i=1}^{k} \frac{p_i(T)}{i}$, where $i$ denotes the position in the predicted list, and $p_i(T)$ takes the value of 1 if the predicted location at the $i$th position in the list is the actual next location and 0 otherwise.
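The four metrics can be computed together from ranked prediction lists, as in the Python sketch below. It rests on two assumptions that are not spelled out above: a trajectory without any prediction counts as a one-error, and average precision is taken as the reciprocal rank of the true next location (0 if it is absent), averaged over all test trajectories. The function name `evaluate` and the toy data are hypothetical.

```python
def evaluate(predictions, truths):
    """Compute coverage, accuracy, one-error and average precision.

    predictions: one ranked list of predicted locations per test trajectory
                 (an empty list means the model could not predict).
    truths:      the actual next location of each test trajectory.
    """
    n = len(truths)
    pairs = list(zip(predictions, truths))
    coverage = sum(1 for pred, _ in pairs if pred) / n
    accuracy = sum(1 for pred, truth in pairs if truth in pred) / n
    # Assumption: an empty prediction list is counted as an error.
    one_error = sum(1 for pred, truth in pairs if not pred or pred[0] != truth) / n
    # Assumption: reciprocal rank of the true location, 0 when it is not in the list.
    avg_precision = sum(1 / (pred.index(truth) + 1)
                        for pred, truth in pairs if truth in pred) / n
    return {"coverage": coverage, "accuracy": accuracy,
            "one_error": one_error, "average_precision": avg_precision}


# Toy data: four test trajectories with top-2 predictions.
predictions = [["A", "B"], ["C", "D"], [], ["E", "F"]]
truths = ["A", "D", "X", "Z"]
print(evaluate(predictions, truths))
# {'coverage': 0.75, 'accuracy': 0.5, 'one_error': 0.75, 'average_precision': 0.375}
```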
6.4 Evaluation of NLPMM
We evaluate the performance of NLPMM and its components, PMM and GMM. For each experiment, we perform 50 runs and report the average of the results. First, we study the effect of the order of the Markov model by varying $N$ from 1 to 6. Figure 2(a) shows that the accuracy improves noticeably for all models when the order increases from 1 to 2. The accuracy reaches its maximum when $N$ is set to 3 and remains stable as $N$ increases further. Therefore, we set $N$ to 3 in the following experiments.
Next, we evaluate the effect of $k$, the number of predicted next locations, on PMM, GMM, and NLPMM. From Figure 2(b), we can observe that the accuracy of all three models improves as $k$ increases. Furthermore, the accuracy of GMM and NLPMM is significantly better than that of PMM, and the best results are given by NLPMM. Since the average number of candidate next locations is 43 (meaning there are 43 possibilities), the accuracy of 0.88 is surprisingly good when $k$ is set to 10.
6.5 Effect of the Time Factor
We evaluate the proposed methods that take the time factor into consideration.
Figure 3(a) shows the effect of bin size on NLPMM-TB (which stands for NLPMM with Time Binning). The performance of NLPMM-TB starts to deteriorate when the bin size becomes less than 8, because when the bins get smaller, the trajectories in them become too sparse to generate a meaningful collective pattern. Figure 3(b) shows the effect of the number of clusters on NLPMM-DC (which stands for NLPMM with Distributions Clustering). When it is set to 1, the model is the same as NLPMM. The one-error rate declines and the average precision improves as the number increases from 1 to 5. When it continues to increase, the result starts to get worse. This is because having too many or too few clusters hurts either the cohesiveness or the separation of the clusters.
We evaluate the performance of NLPMM, NLPMM-TB and NLPMM-DC using one-error and average precision. The results are shown in Table 1. NLPMM-TB and NLPMM-DC perform better than NLPMM because adding the time factor yields a more refined model and more accurate predictions. NLPMM-DC performs best, validating the effectiveness of the method of distributions clustering. It will be used in the following comparison with alternative methods.
| | NLPMM | NLPMM-TB | NLPMM-DC |
|---|---|---|---|
| one-error | 53.8% | 53.0% | 52.3% |
| average precision | 60.2% | 60.5% | 60.9% |
6.6 Comparison with Existing Methods
We compare the proposed NLPMM-DC with the state-of-the-art approaches VMM [12] and WhereNext [10]. VMM uses individual trajectories to predict the next locations, whereas WhereNext uses all available trajectories to discover collective patterns. In this experiment, we predict top next sampling location. The parameters of VMM are set as follows: memory length =3, =0.3, and =1. For WhereNext, the support for constructing the T-pattern tree is set to 20. For NLPMM-DC, the order is set to 3 and the number of clusters is set to 5.
Figure 4 shows the performance comparison of NLPMM-DC, VMM and WhereNext in terms of prediction coverage and accuracy. As shown in Figure 4(a), NLPMM-DC performs the best, which can be attributed to the combination of individual and collective patterns as well as the consideration of the time factor. Figure 4(b) shows that the accuracy of each model improves as the size of the training set increases. It is worth mentioning that NLPMM-DC performs better than VMM and WhereNext in terms of accuracy for any training set size.
7 Conclusions
In this paper, we have proposed a Next Location Predictor with Markov Modeling to predict the next sampling location that a moving object will arrive at, given a trajectory sequence. The proposed NLPMM consists of two models: the Global Markov Model and the Personal Markov Model. The time factor is also added to the models, and we propose two methods to partition the whole time span into periods of finer granularity: Time Binning and Distributions Clustering. New time-aware models are trained accordingly. We have evaluated the proposed models using a real vehicle passage record dataset. The experiments show that our predictor significantly outperforms the state-of-the-art methods (VMM and WhereNext).
References
 [1] (2010) Which road do I take? A learning-based model of route-choice behavior with real-time information. Transportation Research Part A: Policy and Practice 44 (4), pp. 249–264.
 [2] (2010) Searching trajectories by locations: an efficiency study. In SIGMOD, pp. 255–266.
 [3] (2011) Discovering popular routes from trajectories. In ICDE, pp. 900–911.
 [4] (2002) A new shared nearest neighbor clustering algorithm and its applications. In SDM, pp. 105–115.
 [5] (2008) Route prediction from trip observations. SAE SP 2193, pp. 53.
 [6] (2008) Understanding individual human mobility patterns. Nature 453 (7196), pp. 779–782.
 [7] (2008) A hybrid prediction model for moving objects. In ICDE, pp. 70–79.
 [8] (2008) A Markov model for driver turn prediction. SAE SP 2193 (1).
 [9] (2010) Travel route recommendation using geotags in photo sharing sites. In CIKM, pp. 579–588.
 [10] (2009) WhereNext: a location predictor on trajectory pattern mining. In SIGKDD, pp. 637–646.
 [11] (2007) Mining frequent trajectories of moving objects for location prediction. In MLDM, pp. 667–680.
 [12] (2009) Traffic-known urban vehicular route prediction based on partial mobility patterns. In ICPADS, pp. 369–375.
 [13] (2013) What's your next move: user activity prediction in location-based social networks. In SDM, pp. 171–179.
 [14] (2011) On the semantic annotation of places in location-based social networks. In SIGKDD, pp. 520–528.
 [15] (2011) Diversified trajectory pattern ranking in geo-tagged social media. In SDM, pp. 980–991.
 [16] (2010) T-drive: driving directions based on taxi trajectories. In GIS, pp. 99–108.