Mobility Inference on Long-Tailed Sparse Trajectory

by Lei Shi, et al.
Beihang University

Analyzing urban trajectories has become an important topic in data mining. How can we model human mobility, consisting of stay and travel, from the raw trajectory data? How can we infer such a mobility model from the information in a single trajectory? How can we further generalize the mobility inference to accommodate real-world trajectory data that is sparsely sampled over time? In this paper, based on formal and rigorous definitions of the stay/travel mobility, we propose a single trajectory inference algorithm that utilizes a generic long-tailed sparsity pattern in large-scale trajectory data. The algorithm guarantees 100% precision in the stay/travel inference with a provable lower bound on the recall. Furthermore, we introduce an encoder-decoder learning architecture that admits multiple trajectories as inputs. The architecture is optimized for the mobility inference problem through customized embedding and learning mechanisms. Evaluations with three trajectory data sets of 40 million urban users validate the performance guarantees of the proposed inference algorithm and demonstrate the superiority of our deep learning model in comparison to well-known sequence learning methods. On extremely sparse trajectories, the deep learning model achieves a 2× overall accuracy improvement over the single trajectory inference algorithm, through proven scalability and generalizability to large-scale versatile training data.




I Introduction

The recent surge of metropolitan-scale human trajectory data, e.g., mobile traces [Ratti06], taxi logs [yuan2013t], and geo-referenced check-ins [GeoTweet13], paves the way for a fundamental understanding of human mobility in cities. In both theoretical and empirical studies, the urban trajectory of a person is considered as interleaving segments of stay and travel [gonzalez2008understanding][brockmann2006scaling][calabrese2010geography]. The inference of these segments from the raw trajectory data plays a pivotal role in solving many urban analytics tasks. For example, in traffic planning and optimization, the detected travels are used as the training data for travel time estimation [wang2014travel][Thiagarajan:2009:VAE]. In trade area analysis, the discovery of users' visits to business sites relies on the segmentation of stays and travels from the trajectory data [qu2013trade].

In the literature, there is a consensus that the stay segments (also known as the stops) can be defined as the part of the trajectory within a spatially constrained region for a sufficiently long time [calabrese2010geography][Phithakkitnukoon10][jiang2013review]. Algorithms have been proposed that first partition the trajectory at all the record intervals larger than a spatial threshold and infer the resulting sub-trajectories as stay by the definition [calabrese2010geography][Phithakkitnukoon10]. On the other hand, no definition of the travel segment has been formulated on the trajectory data. Existing works mostly assume a dense sampling rate in the trajectory data, i.e., seconds or a few minutes on average between the consecutive trajectory records [herring2010estimating][rahmani2013path][li2017citywide]. For such data, the real-time speed of the trajectory can be calculated, which is used to accurately detect all the travels.

Nevertheless, the metropolitan-scale measurement of human trajectories is often extremely sparse over time due to pragmatic constraints such as power consumption and user privacy. In the mobile sensing data used in this paper, the average record interval is as long as two hours, two orders of magnitude larger than that of the previously considered trajectory data. The existing mobility inference algorithms designed for dense trajectories no longer work. For instance, two consecutive records in a trajectory, reported 24 hours apart at nearby locations, will be identified as belonging to the same stay segment (Figure 1(c)). In fact, these records could be either separate stays at home or two pass-bys during the daily commute. Given only the single trajectory information, their mobility cannot be determined.

The inference of stay and travel on sparsely sampled trajectories is highly challenging. First, real-world human movement is a complex process with varied speeds and spatiotemporal patterns. How can we give a comprehensive definition of stay and travel on the human trajectory for various applications? Second, with the mobility definition, how do we know which part of the trajectory can be inferred as stay or travel using the single trajectory information? How can we design the inference algorithm to work with metropolitan-scale trajectory data with billions of records? Third, it is known that human movement exhibits strong regularity (e.g., a 93% predictability [Science10]). How can we leverage such regularity to overcome the limit of the single trajectory inference?

To answer the aforementioned questions, we make the following contributions in this work.

  • The formal definition of both stay and travel on the continuous trajectory model using a pair of spatial and temporal parameters. The linkage of this continuous mobility model to the sparse trajectory data is rigorously studied, which helps to formulate our research problem. (Section II)

  • The single trajectory inference algorithm called Slice & Doubly Sliding (SDS) designed according to a generic long-tailed sparsity pattern in our trajectory data (Section III). The algorithm is proved to guarantee a 100% inference precision and a lower-bounded recall subject to the single trajectory information. (Section IV)

  • The optimized encoder-decoder architecture that captures the regularity of human mobility at the population scale. Several improved architecture designs are introduced to cope with the mobility inference problem, including the decoder mask on unlabeled records, the attention mechanism for extra long trajectories, and the embedding of the mobility-related space and time information. (Section V)

We evaluate the proposed SDS algorithm and the encoder-decoder architecture on both the simulated trajectory data and a sparse trajectory data set characterizing the mobility of 40 million residents in three major Chinese cities (Section VI). The experiment results validate the theoretical performance of the SDS algorithm and demonstrate three key advantages of the deep learning model on mobility inference: the capability to utilize the spatiotemporal information of multiple trajectories, the scalability to large training data, and the generalizability to different sets of trajectories.

II Problem Definition

Fig. 1: Illustrative examples of Definition 1 and Definition 2: (a) continuous stay/travel segments; discrete stay/travel segments on (b) dense trajectory; (c) sparse trajectory.

We first consider the urban trajectory defined by a continuous mobility model. During a time period , a user trajectory is composed of a list of temporally continuous records by . denotes the location of a user at time .

Definition 1

Continuous mobility of a trajectory – On the continuous segment of the trajectory during a time period , denoted as , we define its mobility by:

(a) is a stay segment if: and ();

(b) is a travel segment if: does not overlap with any continuous segment satisfying (a).

Here denotes the length of a time period, is the norm that computes the spatial distance between two records. and are the temporal and spatial parameters in the mobility definition.

As shown by the red curve in Figure 1(a) where the hollow node stands for a long-time stop, Definition 1(a) models the stay segment as a sufficiently long time period () when the trajectory is kept within a circular region of radius . This definition is consistent among all the previous literature [calabrese2010geography][Phithakkitnukoon10][jiang2013review]. Note that the stay segments by definition can overlap with each other in space and time. Their enclosure is called the maximal stay segment. On the other hand, based on the ground truth that a user can either stay or travel in any time point, the segment not overlapped with any stays is defined as the travel segment (Definition 1(b)).

The continuous mobility model can not be exactly computed in the real world as the human trajectory is hardly measured continuously. In most cases, the trajectory is composed of a list of discrete records on certain time points (e.g., ) denoted by where denotes the size of the trajectory. A discrete mobility model can be defined in analogy to the continuous model.

Definition 2

Discrete mobility of a trajectory – On the discrete segment of the trajectory in a time series , denoted as , define its mobility by:

(a) is stay if: and ();

(b) is travel if: does not overlap with any discrete segment satisfying (a).

The discrete mobility model can be optimally computed by an exact algorithm (Algorithm 2). Nevertheless, the resulting mobility is not always equivalent to that of the continuous model with the full trajectory information. For example, in Figure 1(b), the stay and travel segments detected on a densely sampled trajectory by the discrete mobility model generally echo those by the continuous model (Figure 1(a)). In comparison, the detected segments shown by Figure 1(c) on the same but sparsely sampled trajectory turn out to be erroneous and largely deviate from the continuous model. The theorem below reveals the relationship between the two models.

Theorem 1

Intrinsic linkage between discrete and continuous mobility of a trajectory – Consider a discrete segment of the trajectory. Let be the maximal time interval between the consecutive records of , be the maximal movement speed in :

(a) satisfying Definition 2(a) under the parameters of and is also a stay segment by Definition 1(a) in the continuous model under the parameters of and ;

(b) satisfying Definition 2(b) under the parameters of and is also a travel segment by Definition 1(b) in the continuous model under the parameters of and .

The proof is given in Appendix A. By Theorem 1, for the discrete trajectory satisfying , i.e., having a dense sampling rate, the discrete mobility of the trajectory computed by the exact algorithm can approximate its continuous mobility with tiny parameter changes. Unfortunately, the measurement of human trajectories in big cities is often extremely sparse over time for pragmatic constraints such as the power consumption and the user privacy (e.g., the data set in Section III-A). This work studies the inference of the continuous mobility from the sparse trajectory, which can not be approximated by Theorem 1.

Problem: Mobility Inference on Sparse Trajectory

Given: (1) a set of urban users; (2) each user’s sparse trajectory that ; (3) the parameters of and that define the mobility of the trajectory.

Infer: the continuous mobility of the sparse trajectory at the time of each record, which is denoted by .

Note that the parameters of and determine the spatiotemporal scale of mobility. Unless otherwise noted, we use , to study the intra-city mobility. The parameter selection is discussed in Appendix B.

III Sparsity Analysis on Trajectories

By Theorem 1, our research problem seems intractable on sparse trajectories. In this section, we analyze a set of real-world trajectory data and discover a generic sparsity pattern that can be utilized in accurately inferring the human mobility.

III-A Data Source

The trajectory data is provided by a mobile analytics company that keeps track of billions of smart devices in China, including mobile phones, tablets, wearable devices, etc. The company’s third-party APIs are registered inside more than 100,000 types of mobile apps in a wide spectrum of domains. When a registered app is activated on a device (not necessarily being used), the API will report the location of that device to the company server. The metadata of each trajectory record is shown in Table I.

We extract the full-scale trajectory data within three major Chinese cities during a period of 90 consecutive days in 2016, as shown in Table II. The data set is very large, e.g., in Beijing it captures the trajectories of 31.8 million devices, which accounts for 50% of the city’s population. The spatial precision of each record is kept within 100 meters, by using the records collected by GPS and Wi-Fi.

Field   Description             Sample
Time    Timestamp of record     18:02:41/07/12/2016
Lon.    Longitude of location   116.523625
Lat.    Latitude of location    39.792935
Mid     Unique device ID        1370021020431
TABLE I: The metadata of each urban trajectory record.

City       #Device     #Record       Size     Length
Beijing    31849742    8407648917    738.1G   90 days
Tianjin    8011128     2858575880    206.8G   90 days
Tangshan   2786668     920364499     64.8G    90 days
TABLE II: Three trajectory data sets used in this work.

III-B The Long-Tailed Sparsity Pattern

We study the sampling statistics of the trajectory data. The time intervals between consecutive records within a trajectory average 2.5 hours in all three data sets. With such extremely sparse trajectories, it seems impossible to infer their continuous mobility. As a reference, most journeys in a city last no longer than two hours, during which fewer than one record is reported on average.

Taking a closer look, we find that the sampling pattern in our data set, though sparse, is highly skewed. Figure 2(a) depicts the distributions of the between-record time intervals, which follow power-law-like decays in the log-log scale. We call this pattern the long-tailed sparsity: most intervals are very short, while quite a few extremely long intervals contribute to the large average. Taking the Beijing data set as an example, 88.9% of the intervals are shorter than 30 minutes (a typical value of the temporal parameter in Definition 1). At the trajectory level, it is observed that most trajectories are composed of multiple densely sampled segments that are far apart from each other over time. An example trajectory is depicted in Figure 2(b).

To capture the long-tailed sparsity pattern, we define two metrics on each trajectory. These metrics are shown later to correlate with the capability for the mobility inference.

Definition 3

Sparsity metrics of a trajectory – for any trajectory observed at :

(a) global sparsity is the average time interval between the consecutive records of : ;

(b) local coverage is the ratio of the records within the dense segment: , .

Here and denote the mean function and the size of a set. Note that the global sparsity is independent of the temporal parameter , while the local coverage is related to .
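As a rough illustration of Definition 3, the sketch below computes both metrics from a sorted list of record timestamps in seconds. The function names and the `tau_s` parameter are hypothetical. In particular, reading the local coverage as the fraction of between-record intervals that do not exceed the temporal parameter is an assumption made for this example, since the exact formula is not reproduced in the text above.

```python
from statistics import mean

def _intervals(timestamps):
    """Time gaps between consecutive records (timestamps sorted, in seconds)."""
    if len(timestamps) < 2:
        raise ValueError("at least two records are needed")
    return [b - a for a, b in zip(timestamps, timestamps[1:])]

def global_sparsity(timestamps):
    """Average interval between consecutive records, as in Definition 3(a)."""
    return mean(_intervals(timestamps))

def local_coverage(timestamps, tau_s):
    """Assumed reading of Definition 3(b): share of intervals no longer than tau_s."""
    gaps = _intervals(timestamps)
    return sum(1 for g in gaps if g <= tau_s) / len(gaps)

# Three records one minute apart, then a two-hour gap.
ts = [0, 60, 120, 7320]
print(global_sparsity(ts), local_coverage(ts, tau_s=1800))
```

The example trajectory shows the long-tailed effect in miniature: a single long gap dominates the global sparsity, while most intervals remain short.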

We compute the sparsity metrics in the trajectory data of Beijing under five parameter settings ( minutes). The distribution of the metrics are depicted in Figure 3. The global sparsity in Figure 3(a) follows a power-law like decay similar to the distribution of the between-record intervals in Figure 2(a). The distribution of the local coverage in Figure 3(b) shows an exponentially increasing pattern that most of the trajectories have a high local coverage (with an average of 0.897 at minutes). This demonstrates that most of the records in the long-tailed sparse trajectory are in the densely sampled segment of the trajectory. As shown in Figure 3(c), the trajectories with extremely low and high sparsity tend to be smaller in length, i.e., the densely sampled short snippets or a few long-distance samples of a trajectory. The trajectories in both cases are therefore exempted from the subsequent analysis.

Note that the long-tailed sparsity pattern is also found in other data sets and application domains. For example, Gonzalez et al. studied the mobile phone user’s trajectory data where the location of the user is reported upon each phone call or text message [gonzalez2008understanding]. The time intervals between consecutive records follow a long-tailed power-law decay. In a recent work, Chen et al. analyzed the sparsely sampled geo-tagged social media data [ChenSparse16]. The distributions of the time interval and distance between records follow power-law decays within the space and time scale of a single trip (1 day, 500km).

Fig. 2: The long-tailed sparsity pattern: (a) the distribution of between-record time intervals; (b) an example trajectory.
Fig. 3: The distribution of the sparsity metrics in the data of Beijing: (a) global sparsity; (b) local coverage; (c) average trajectory length by global sparsity.

IV Single Trajectory Inference

Input :  (sparse trajectory), , (space, time parameters)
Output :  (mobility of each record)
1 begin
       /* into segments () at every interval larger than */
2       for  do
             /* detect all the stay segments on */
3             for  do
                   /* iterate all the records backward from */
4                   for  do
                         /* cut at the first escape outside the range of */
5                         if  then
                               /* stay segment */
6                               if  then
7                                     for  do
8                                           S
10                              Break
             /* detect all the travel records on */
13             for  && S do
                   /* find the first left record outside the range of */
14                   for  do
15                         if  then
16                               Break
                   /* find the first right record outside the range of */
18                   for  do
19                         if  then
20                               Break
22                  if  then
23                         T
26      return
Algorithm 1 SDS on long-tailed sparse trajectories.

We propose the mobility inference algorithm on the single trajectory. The main idea is to leverage the long-tailed sparsity pattern discovered in our trajectory data (Section III-B). Though the average record interval in a trajectory is too large to apply Theorem 1, each trajectory can be decomposed into multiple densely sampled segments, on which the continuous mobility can be confidently inferred.

Definition 4

Dense stay segment – a discrete segment of the trajectory defined in the time series is a dense stay segment of if:

(a) is a stay segment of by Definition 2(a);

(b) any consecutive time interval of is small enough: . is the parameter used in Definition 2(a).

Observation 1

Continuous stay assumption – Consider a dense stay segment detected from the long-tailed sparse trajectory, which is defined in the time series . For any unobserved time point , we hypothesize that and .

Observation 1 states that if a user is observed frequently in a region of diameter , any intermediate location between observations is also within a similarly constrained region. We empirically validate this observation by the experiment in Appendix B on our trajectory data set. The probability of violating the observation is below in most cases. When Observation 1 holds, we can develop two theorems that characterize the continuous mobility of stay and travel on long-tailed sparse trajectories.

Theorem 2

Continuous mobility of dense stay segments – In the long-tailed sparse trajectory :

(a) any dense stay segment satisfying Definition 4 under the parameters of and is also the stay segment by Definition 1(a) in the continuous model under the parameters of and ;

(b) the continuous mobility of any discrete segment in the time period can be inferred as stay by Definition 1(a) under the parameters of and only if defined in is the dense stay segment under the same parameters.

Theorem 3

Continuous mobility of travel records – Consider a discrete trajectory defined in the time series :

(a) any record at time is in the travel segment by the continuous model of Definition 1(b) under the parameters of and if only there exist that: 1) ; 2) ; 3) ;

(b) any record at time can be inferred as in the travel segment by Definition 1(b) under the parameters of and only if there exist that: 1) ; 2) ; 3) .

The proofs are given in Appendix A. By Theorem 2 and Theorem 3, we design a new algorithm to infer the continuous mobility of a single long-tailed sparse trajectory, called Slice & Doubly Sliding (SDS). As shown in Algorithm 1, the algorithm first slices the trajectory into multiple dense segments at all the intervals larger than (L2). On each dense segment , the stay/travel segments are detected respectively (L3 to L10, L11 to L19). In particular, the stay detection checks all the segments of with the condition in Definition 4 under the parameters of and by Theorem 2. To avoid the worst-case complexity, we introduce a doubly sliding window data structure which keeps track of the currently checked segment. The key of the algorithm lies in that, when one pair of records no closer than are found (L6), all the segments containing this pair of records will be pruned early in the detection and the sliding window will advance aggressively (L10). The travel detection follows Theorem 3. The average-case complexity of SDS is where is the average number of records in a maximal stay segment.
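The slice, stay detection, and travel detection steps described above can be sketched as follows. This is a simplified brute-force illustration, not the authors' implementation: it omits the doubly sliding window pruning, uses the same `tau` and `delta` in every test (the theorems adjust the parameters, so the precision guarantee is not claimed for this sketch), assumes planar coordinates in meters, and assumes the slicing threshold `max_gap` defaults to `tau`. All names are hypothetical.

```python
import math

def sds_labels(records, tau, delta, max_gap=None):
    """records: list of (t, x, y) sorted by t; t in seconds, x/y in meters.
    Returns one label per record: "S" (stay), "T" (travel), or None."""
    if max_gap is None:
        max_gap = tau  # assumption: slice at intervals longer than tau
    n = len(records)
    labels = [None] * n

    def dist(a, b):
        return math.hypot(records[a][1] - records[b][1],
                          records[a][2] - records[b][2])

    # Slice the trajectory into dense segments at every long interval.
    segments, start = [], 0
    for i in range(1, n):
        if records[i][0] - records[i - 1][0] > max_gap:
            segments.append((start, i - 1))
            start = i
    if n:
        segments.append((start, n - 1))

    for lo, hi in segments:
        # Stay detection: grow a window backward from each record i until
        # some pair of records in the window is at least delta apart.
        for i in range(lo, hi + 1):
            j = i
            while j > lo and all(dist(j - 1, k) < delta for k in range(j, i + 1)):
                j -= 1
            if records[i][0] - records[j][0] >= tau:
                for k in range(j, i + 1):
                    labels[k] = "S"
        # Travel detection: nearest records on both sides that are at least
        # delta away must be less than tau apart in time.
        for k in range(lo, hi + 1):
            if labels[k] == "S":
                continue
            left = next((i for i in range(k - 1, lo - 1, -1)
                         if dist(i, k) >= delta), None)
            right = next((j for j in range(k + 1, hi + 1)
                          if dist(j, k) >= delta), None)
            if (left is not None and right is not None
                    and records[right][0] - records[left][0] < tau):
                labels[k] = "T"
    return labels
```

For example, five records within a few meters of each other over 40 minutes, followed by three records moving 2 km every 5 minutes, yield five stay labels, two travel labels, and one unlabeled final record when `tau=1800` and `delta=800`.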

According to Theorem 2(a) and Theorem 3(a), the SDS algorithm guarantees a 100% precision in the mobility inference of both stay and travel. By Theorem 2(b) and Theorem 3(b), the lower bounds of the recalls in detecting the stay and travel are and respectively, where are the number of stay and travel records detected by the SDS algorithm from under the parameters of and . Note that the recall is defined on all the stay and travel records that can be detected given the single sparse trajectory, not on the continuous mobility of records given the full trajectory information. Another advantage of the SDS algorithm lies in that it also works for dense trajectories, each of which is treated as one densely sampled segment.

We applied the SDS algorithm to our data set in Beijing. 47.0% to 50.2% and 0.044% to 0.83% of the records are detected as stay and travel, depending on the parameters of and . Figure 4 shows the average stay/travel percentages by the global sparsity of a trajectory. All the curves are bell-shaped with only one peak: the highest ratio of stay is found at the global sparsity around 1.6 min (97.3% to 98.5%, Figure 4(a)); the highest ratio of travel is found at the global sparsity from 5 min to 10 min, which increases with (0.34% to 3.8%, Figure 4(b)). Before the peak of stay, the trajectory is mostly composed of fewer than 10 records (Figure 3(c)), with a time period shorter than and cannot be inferred as stay. After the peaks of stay and travel, the ratio of detected records drops due to the increased sparsity of the trajectories. This validates Theorem 1 that sparser trajectories are harder for the continuous mobility inference.

By the empirical result, the parameters of and are chosen and used throughout this work. The details are explained in Appendix B.

Fig. 4: The percentage of stay/travel on the trajectory with different global sparsity values and ( fixed to 800 m).

V Multiple Trajectory Inference

The SDS algorithm correctly infers the continuous mobility of 47.8% to 50.3% of the records in each trajectory of our data set. For most of the other records, the inference is not feasible given the single trajectory only, as proved in Theorem 2(b) and Theorem 3(b). We propose to employ a deep-learning-based inference model which trains over multiple trajectories to learn the spatiotemporal regularity in human mobility beyond the definitive rules in the SDS algorithm.

The recurrent neural network (RNN), or specifically LSTM [LSTM], is a classic model to analyze sequence data. Nevertheless, two issues arise when applying LSTM directly in our scenario: 1) the single LSTM network maps an input sequence to an output sequence of the same length. By default, only the list of records labeled by the SDS algorithm can be used as the input; 2) the trajectories of all the users are first spliced in tandem and then cut into fixed-size slices for training. The local trajectory context, instead of the per-user context, is used for inference in the single LSTM network.

V-A Trajectory Encoder and Mobility Decoder

We propose an encoder-decoder architecture for the sequence-to-sequence learning [cho2014learning][sutskever2014sequence] to overcome the limitations of the single LSTM network. As shown in Figure 5, the encoder summarizes the full trajectory of each user into the per-user context with a bidirectional LSTM network (the upper part of the figure). Based on the user context, the decoder takes records from the same trajectory and sequentially infers their mobility by another unidirectional LSTM network (the lower part of the figure).

Formally, for a trajectory of length , the encoder at the time step of are defined by


where and denote the hidden states of the forward/backward LSTM at the time step respectively, is the input record of the trajectory at using the space/time representation (Section V-B).

In the decoder side, the operation is defined by


where denotes the hidden state of the decoder LSTM, is the focused context at the time step by the attention mechanism (Section V-C), is the predicted mobility distribution at .

V-B Space/Time Representation and Embedding

Fig. 5: The encoder-decoder sequence-to-sequence learning architecture for multiple trajectory inference. Blue outlines indicate the optimized design for our problem.

In the raw trajectory data, both space and time information are recorded in high resolutions, i.e., millionths of a second (timestamp) or a degree (longitude/latitude). To cope with the neural network input, we select the appropriate spatial/temporal features, discretize their values, and compute the vector embeddings to represent them. The embeddings are updated online during the training process.

In space, we divide the territory of each city into grids of degree latitude/longitude. The location of each record is converted to the latitude and longitude indices of the grid it belongs to. Each grid index is represented by a vector of length . In time, we divide the timestamp of each record into two indices and embed them separately. The first is the absolute hour index of the timestamp. For example, the timestamp of 12:50AM on Monday has an index of . The hour index indicates the time of day and the day of the week upon the observation, which can be related to the mobility of the record. The second is the relative minute index defined as the elapsed time in minutes from the start of the current segment in the trajectory. Here the segments of a trajectory is computed by L2 of Algorithm 1. Both the hour and the minute indices have a length of in the embedding. Finally, the space and time embeddings are concatenated into a vector of length as the input to the encoder-decoder architecture, i.e., . The embedding for each grid/time index is randomized upon the initialization. We use by default.
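The discretization described above can be sketched as below. The helper names, the grid origin `lon0`/`lat0`, and the cell size `cell_deg` are hypothetical, and the week is assumed to start on Monday with hour index 0; the paper's own cell size and index origin are not reproduced in the text above. The resulting integer indices would then be looked up in trainable embedding tables.

```python
from datetime import datetime

def grid_index(lon, lat, lon0, lat0, cell_deg):
    """Longitude/latitude indices of the grid cell containing a location."""
    return int((lon - lon0) // cell_deg), int((lat - lat0) // cell_deg)

def hour_index(ts):
    """Absolute hour of the week, assuming Monday 00:00 maps to index 0."""
    return ts.weekday() * 24 + ts.hour

def minute_index(ts, segment_start):
    """Minutes elapsed since the start of the current dense segment."""
    return int((ts - segment_start).total_seconds() // 60)

# Arbitrary example record on a Tuesday evening.
ts = datetime(2016, 7, 12, 18, 2, 41)
print(grid_index(116.523625, 39.792935, lon0=116.0, lat0=39.0, cell_deg=0.01))
print(hour_index(ts), minute_index(ts, datetime(2016, 7, 12, 17, 30, 0)))
```

Separating the hour index from the relative minute index keeps the weekly periodicity and the within-segment progress as two independent inputs, which matches the two time embeddings described in the paragraph.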

V-C Attention Mechanism

In the encoder side of our architecture, the input trajectory can be too long to be summarized as a fixed-length context vector. We introduce the attention mechanism [luong2015effective], where the encoder produces an array of context vectors, one per record, and saves it in a short-term memory. The decoder builds a neural network to dynamically attend to the memory and compute the context vector used in the inference of each record.

The focused context at the time step is computed by


where is the attention matrix.
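For intuition, the sketch below computes a focused context with a bilinear score between the decoder state and each encoder state, followed by a softmax. This follows the "general" scoring form from the cited attention work and is an assumption about the exact formula used here; the function name and the plain-list representation are hypothetical, and a real model would use a tensor library.

```python
import math

def attend(decoder_state, encoder_states, W):
    """Return (attention weights, focused context vector).
    score_s = decoder_state^T W encoder_state_s, normalized by softmax."""
    def project(h):
        return [sum(W[r][c] * h[c] for c in range(len(h))) for r in range(len(W))]

    scores = [sum(d * p for d, p in zip(decoder_state, project(h)))
              for h in encoder_states]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity)
print(weights, context)
```

With an identity attention matrix, the encoder state aligned with the decoder state receives the larger weight, and the context is the weighted average of the encoder states.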

In a typical attention-based sequence-to-sequence learning architecture, e.g., neural machine translation [bahdanau2014neural], the input sequence is a sentence of 100 words at maximum. In our problem, the user trajectory can be much longer, up to 5000 records. Therefore, we further optimize the learning procedure by truncating each long trajectory into multiple fixed-length sub-trajectories. A local context vector is trained on each sub-trajectory, as proposed by Luong et al. [luong2015effective]. A record in the decoder side only attends to the records belonging to the same sub-trajectory in the encoder, which resolves the issue with extra-long trajectories.

V-D Train/Test with Decoder Masks

In the decoder side, we only have the mobility label (stay or travel) for a subset of records on each trajectory. If we only include these labeled records in the training, the test performance might degrade because of the loss of consistent context in the decoder side, as empirically shown by the performance of the baseline LSTM in Section VI-B. In our improved design, we still feed the full trajectory to the decoder side in the training, but mask out the losses for the records without labels. This allows the decoder LSTM network to consistently capture the dynamics of the entire trajectory and to be trained with supervision where available.

The final objective function for training is


where denotes the mobility label at the time step , is the set of record indices labeled by the SDS algorithm.
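The masking idea can be illustrated with a small loss function. This is a sketch under assumptions: it uses the summed negative log-likelihood over labeled records only, and the function name, the list-based inputs, and the use of `None` to mark unlabeled records are hypothetical. Whether the original objective is normalized or weighted is not recoverable from the text above.

```python
import math

def masked_nll(pred_probs, labels):
    """Sum of -log p_t(y_t) over the records that carry an SDS label.
    pred_probs[t] is the predicted distribution at step t (index 0 = stay,
    index 1 = travel); labels[t] is 0, 1, or None for an unlabeled record."""
    loss = 0.0
    for probs, y in zip(pred_probs, labels):
        if y is None:
            continue  # masked out: this step contributes no loss
        loss += -math.log(probs[y])
    return loss

preds = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
print(masked_nll(preds, [0, None, 1]))
```

The decoder still processes the unlabeled middle record, so its hidden state stays consistent along the trajectory, but the prediction at that step does not affect the loss.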

VI Evaluation

VI-A Experiment Setup

Data. We evaluate the SDS algorithm and encoder-decoder model on three types of data extracted from the raw data in Section III-A.

  • Full data is the set of randomly selected trajectories. We apply the SDS algorithm to create the stay/travel labels on each trajectory. Because the travel labels are rare () and a large percentage of short trajectories have no travel label at all, we only select the trajectories with at least 10 travel labels. Note that this criterion does not lead to a biased selection for the mobility inference. The eligible trajectories have an average global sparsity mildly smaller than the average in the whole data set. We extract groups of non-overlapping 10K, 40K, 100K trajectories for train and test respectively. They are called FU-10K, FU-40K, FU-100K. By default, the Beijing data is used. Only the labeled records on the full data can be evaluated.

  • Re-sampled data is used to evaluate the inference performance on unlabeled records. Given a trajectory in the full data, we randomly keep each record with a probability (i.e., the re-sampling rate). The re-sampled training data is re-labeled by the SDS algorithm, normally generating a smaller percentage of labels than the full data. On the re-sampled test data, we re-use the labels generated in the full data. The series of data re-sampled from FU-10K is called RE-10K. The records with labels in the full data can be evaluated.

  • Simulated data is used to evaluate the SDS algorithm itself, as no true label can be detected beyond the algorithm. The simulated data is generated using the timestamps in the full data and then re-sampled (described in Appendix C). The labels of all the records in the simulated data are known and can be evaluated.
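The re-sampling step used to build the re-sampled data can be sketched as independent thinning of the records. The function name and the optional `seed` argument are hypothetical additions for reproducibility.

```python
import random

def resample(records, rate, seed=None):
    """Keep each record independently with probability `rate`, preserving order."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < rate]

# Thin a toy trajectory of 20 record indices to roughly half its size.
print(resample(list(range(20)), rate=0.5, seed=7))
```

A rate of 1.0 returns the trajectory unchanged, while lower rates lengthen the intervals between the surviving records.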

Method. Seven mobility inference methods are compared.

  • LSTM. The unidirectional LSTM cell is used as the baseline of the sequence deep learning method. Only labeled records are fed as input in both training and testing.

  • Voting. A spatiotemporal bin is formed by combining a spatial grid and an hour index defined in Section V-B. All the labeled records in the training data are assigned to these bins and counted. The more frequent mobility type in each bin is used as the prediction for all the test records in this bin. For bins without a labeled record, a random prediction is made.

  • LG. To use traditional classifiers, we conduct the window-based feature extraction on each record. Within the dense segment generated by L2 of Algorithm 1, records are uniformly selected before and after the current record, forming a feature vector of length where the four indices in Section V-B are used to represent each record. We test through and find that achieves the best trade-off between performance and cost.

  • DT. It applies the decision tree [breiman1984classification] over the above features.

  • NB. It applies the Gaussian Naive Bayes.

  • L-SVM. It applies the linear SVM, because the kernel SVM does not scale to millions of records.

  • HMM. The stay/travel state is used as the hidden state, and the spatiotemporal offset between consecutive records is used as the observation. The prediction is computed by the Viterbi algorithm [viterbi1967error]. The method mimics the technique in the next location prediction using Markov chains.
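To make the Voting baseline concrete, a minimal Python sketch is given below. It assumes each labeled record has already been mapped to a discrete `(grid_id, hour)` bin. The function names, the `'S'`/`'T'` label encoding, and the tie-breaking rule are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

def train_voting(train_records):
    """train_records: iterable of (grid_id, hour, label), label in {'S', 'T'}.

    Returns a dict mapping each (grid_id, hour) bin to its majority label.
    Ties go to the label seen first in the bin (Counter ordering).
    """
    counts = defaultdict(Counter)
    for grid_id, hour, label in train_records:
        counts[(grid_id, hour)][label] += 1
    return {b: c.most_common(1)[0][0] for b, c in counts.items()}

def predict_voting(model, grid_id, hour, rng=random):
    """Majority label of the bin, or a random label for an unseen bin."""
    label = model.get((grid_id, hour))
    return label if label is not None else rng.choice(["S", "T"])

train = [(1, 8, "S"), (1, 8, "S"), (1, 8, "T"), (2, 9, "T")]
model = train_voting(train)
prediction = predict_voting(model, 1, 8)
```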


In each test, we measure the Precision and Recall in predicting the Stay and Travel labels separately, which are denoted as , , , , and the overall accuracy as . As the percentage of labels is unbalanced (much fewer travels than stays), we also report , the harmonic mean of the F1 measures for stay and travel prediction.
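The metrics above can be computed as in the following sketch. The label encoding (`'S'` for stay, `'T'` for travel), the function name, and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
def mobility_metrics(y_true, y_pred):
    """Per-class precision/recall, overall accuracy, and the harmonic
    mean of the stay and travel F1 measures."""
    def prf(cls):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    p_s, r_s, f_s = prf("S")
    p_t, r_t, f_t = prf("T")
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    f_harmonic = 2 * f_s * f_t / (f_s + f_t) if f_s + f_t else 0.0
    return {"P_stay": p_s, "R_stay": r_s, "P_travel": p_t,
            "R_travel": r_t, "accuracy": accuracy, "F": f_harmonic}

# An all-stay predictor keeps a high accuracy but gets a zero harmonic F.
all_stay = mobility_metrics(["S", "S", "S", "T"], ["S", "S", "S", "S"])
```

The harmonic mean penalizes a classifier that ignores the minority travel class, which the overall accuracy alone does not reveal.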

VI-B Quantitative Result

SDS algorithm. We evaluate the SDS algorithm on the simulated data over RE-10K, with re-sampling rates from 1.0 to 0.1. As shown in Figure 6, the precision of both stay and travel predictions (dashed lines) is 100%, regardless of the re-sampling rate and the speed used in the simulation. This validates the theoretical result in Section IV. As the re-sampling rate decreases, which leads to a linear increase in the global sparsity by Definition 3(a) (X axis), the recall drops at a rate slightly slower than the empirical result in Figure 4.

Fig. 6: SDS inference on simulated data: (a) stay; (b) travel.
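| Design parameter | Setting | Pr. Stay | Pr. Travel | Re. Stay | Re. Travel | Overall accuracy | Harmonic mean of F1 | Train/Test Time (s) |
|---|---|---|---|---|---|---|---|---|
| # NN Layers | 1 | 0.97 | 0.82 | 0.97 | 0.81 | 0.95 | 0.89 | 16k |
| # NN Layers | 2 | 0.95 | 0.78 | 0.96 | 0.72 | 0.93 | 0.84 | 26k |
| # NN Layers | 3 | 0.96 | 0.78 | 0.96 | 0.77 | 0.93 | 0.86 | 37k |
| Dropout Prob. | 0 | 0.97 | 0.82 | 0.97 | 0.81 | 0.95 | 0.89 | 16k |
| Dropout Prob. | 0.2 | 0.96 | 0.80 | 0.96 | 0.80 | 0.94 | 0.87 | 18k |
| Dropout Prob. | 0.5 | 0.85 | 0.00 | 1.00 | 0.00 | 0.85 | 0.00 | 18k |
| Truncate Size | 100 | 0.97 | 0.88 | 0.98 | 0.85 | 0.96 | 0.92 | 14k |
| Truncate Size | 200 | 0.97 | 0.82 | 0.97 | 0.81 | 0.95 | 0.89 | 16k |
| Truncate Size | 400 | 0.87 | 0.61 | 0.98 | 0.15 | 0.86 | 0.37 | 20k |
| Optimizer | SGD | 0.97 | 0.82 | 0.97 | 0.81 | 0.95 | 0.89 | 16k |
| Optimizer | adagrad | 0.85 | 0.00 | 1.00 | 0.00 | 0.85 | 0.00 | 16k |
| Optimizer | adadelta | 0.95 | 0.79 | 0.97 | 0.71 | 0.93 | 0.84 | 16k |
| Attention | with | 0.97 | 0.82 | 0.97 | 0.81 | 0.95 | 0.89 | 16k |
| Attention | w/o | 0.95 | 0.79 | 0.96 | 0.74 | 0.93 | 0.85 | 8k |
| GRU | with att. | 0.95 | 0.82 | 0.97 | 0.714 | 0.93 | 0.85 | 9k |
| GRU | w/o | 0.95 | 0.80 | 0.97 | 0.75 | 0.93 | 0.86 | 5k |

TABLE III: Performance with different encoder-decoder design parameters on FU-10K data.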

Encoder-decoder design. We evaluate six design choices of the encoder-decoder architecture using the FU-10K data set: # of neural network layers (1), the dropout probability (0), the truncate size on extra-long trajectories (200), the training optimizer (SGD), the use of the attention mechanism (with), and the choice of RNN cells (LSTM). The default setting is given in parentheses. The result in Table III shows that more network layers, the dropout, the other optimizers, and the newer GRU cell do not work better. The attention mechanism does help in our scenario, especially for the recall of travel (more in Appendix D). Truncating to smaller segments improves the performance and also reduces the training time.

Model comparison. On the labeled part of the FU-10K data, Table IV lists the performance of all the multiple trajectory inference methods. The encoder-decoder model (ED) is the best in most metrics, with and as high as 0.957 and 0.915. The baseline LSTM ranks second, still more than 10 percent worse than ED in . Among other classifiers, DT works the best and achieves a close to 0.7. The other models (LG, Voting, NB, HMM, L-SVM) perform poorly in predicting the travels, with a or smaller than 0.2. Though LG gets the best , it achieves that by classifying all the records as stay.

Extending to the unlabeled part of the RE-10K data, we summarize the performance comparison with different re-sampling rates in Figure 7, which also corresponds to an increasing global sparsity (refer to Figure 6). Because the stay predictions are generally good for most methods, we only depict , , , and . Note that the performance of the SDS algorithm is also plotted, serving as the upper bound that can be achieved with the single trajectory information. The encoder-decoder is still the best model in most metrics when the re-sampling rate is higher than 0.2, except that HMM has a high precision on a tiny portion of travel records () and L-SVM oscillates between all-stay and all-travel predictions.

Our ED model surpasses the SDS algorithm in from the re-sampling rate of 0.6, starting to enjoy the bonus of the multiple trajectory information. However, it is consistently below the SDS in favoring the travel prediction. According to the theory in Section IV, SDS guarantees a 100% , which is much better than the ED model. Starting from a re-sampling rate of 0.3, the travel prediction performance of the ED model drops quickly. Also, trajectory completion [li2016knowledge] does not improve the inference performance. Even worse, because the technique needs to pre-compute a junction network using spatially dense trajectories as input, the test data before completion is constrained to a 5km × 5km square region. The precision and recall of SDS on the spatially constrained data set are worse than on the randomly sampled FU-10K data.

We try to improve the mobility inference by using the densely sampled trajectory as the training data, i.e., the 100% re-sampled data in RE-10K, and testing on the sparse trajectory, i.e., RE-10K with re-sampling rates from 0.1 to 1. This is realistic in the model building. As shown in Figure 8, with denser trajectories and more labels in the training, the test performance of the ED model (straight lines in red with square symbols) is enhanced from the model with sparse input (red lines without symbols), especially in the travel prediction and at re-sampling rates below 0.3. Taking the finding one step further, we use the 100% re-sampled RE-100K data with 10 times more trajectories for training. The result is surprisingly good: the ED model outperforms the upper bound of the single trajectory inference from the re-sampling rate of 0.7. In a 10% re-sampling, ED model achieves and compared with SDS.

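| Metric | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Pr. Stay | 0.97 | 0.93 | 0.89 | 0.84 | 0.93 | 0.88 | 0.85 | 0.84 |
| Pr. Travel | 0.88 | 0.80 | 0.20 | 0.29 | 0.50 | 0.19 | 0.86 | 0.04 |
| Re. Stay | 0.98 | 0.97 | 0.50 | 1.00 | 0.87 | 0.42 | 1.00 | 1.00 |
| Re. Travel | 0.85 | 0.59 | 0.67 | 0.00 | 0.66 | 0.71 | 0.10 | 0.00 |
| Overall accuracy | 0.96 | 0.91 | 0.53 | 0.84 | 0.84 | 0.46 | 0.85 | 0.84 |
| Harmonic mean of F1 | 0.92 | 0.79 | 0.42 | 0.00 | 0.69 | 0.39 | 0.30 | 0.00 |

TABLE IV: Comparison of inference methods on FU-10K data.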
Fig. 7: The comparison of alternative inference methods on RE-10K data. Dashed lines are the proposed encoder-decoder model.
Fig. 8: Training with the 100% RE-10K data and test on the re-sampled RE-10K data. Red lines are encoder-decoder models.

Scalability and generalizability. We carry out the same experiment on the full data set with higher numbers of trajectories. As shown in Table V, the performance remains steady when training on the same FU-10K data and testing on 10K, 40K, and 100K full trajectory data (FU-10K:10K, etc.). This shows that the model trained on a small data set can be generalized to much larger data sets. Training on the larger data further improves the test performance, which is nearly optimal for FU-100K:10K (, ).

We conduct the same experiment on RE-10K with the dense trajectory input, using the data sets from Tianjin and Tangshan. Compared with Figure 8(b) for Beijing, the of the ED model in Tianjin shows a similar curve (Figure 9(a)), surpassing the SDS from a re-sampling rate of 0.5. On the other hand, the ED model does not work better than the SDS on the Tangshan data (Figure 9(b)). We hypothesize that this is because the selected train/test data in Tangshan has a much lower percentage of travel labels (5.56%) than Beijing (10.64%) and Tianjin (16.94%). The model cannot learn useful patterns from so few labels. Tangshan is also a smaller city than Beijing and Tianjin, for which we have less data (Table II).

Fig. 9: Experiments with RE-10K: (a) Tianjin; (b) Tangshan.

Implications. The experiment results demonstrate that the SDS algorithm is accurate on the single trajectory (100% and ). The optimized encoder-decoder models can learn from the multiple trajectory input to improve the single trajectory mobility inference, thanks to their generalizability to sparse trajectories and scalability to large training data. In fact, we expect the proposed model to perform even better in comparison to the SDS algorithm, because we only evaluate on the part of the trajectory labeled in the 100% re-sampled test data. For the unlabeled test data (38.9% for Beijing), it is reasonable to guess that our model performs similarly to the labeled part, while the SDS cannot infer at all. In the future, we plan to develop the re-sampling mechanism on the simulated data to test on the 100% labels of the trajectory data set.

VII Related Work

VII-A The Study of Urban Trajectory

Using the trajectory data in the city to understand the urban activity and human mobility has been a recent focus of study [zheng2015trajectory]. On the continuously measured trajectory, the detailed route information is available for analysis [LiuAR10][liu2011diverse][yuan2013t]. For instance, the trajectories of taxis can be used to classify drivers by their job performance [LiuAR10], or aggregated as time-dependent landmark graph [yuan2013t] and trajectory visualization [liu2011diverse], in order to compute the fastest route for drivers. Based on over one million bank note circulation reports in US, Brockmann et al. explained the human mobility as the combination of a scale-free jump and a heavy-tailed wait, and proposed a random-walk model to characterize these findings [brockmann2006scaling]. The group of Barabási explained the high degree of spatiotemporal regularity in human mobility by the tendency to avoid visiting new places and to return to the previously visited locations [gonzalez2008understanding][song2010modelling].

On the analysis of urban trajectories, the need for separating stay and travel has been partially met by the greedy algorithms similar to Algorithm 2 [calabrese2010geography][jiang2013review]. Nevertheless, none of these works formally define the stay/travel state of a trajectory, nor do they consider the mobility inference problem on sparse trajectories.

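| Train:Test | 10K:10K | 10K:40K | 10K:100K | 100K:10K | 100K:40K | 100K:100K |
|---|---|---|---|---|---|---|
| Pr. Stay | 0.97 | 0.97 | 0.97 | 0.99 | 0.99 | 0.99 |
| Pr. Travel | 0.88 | 0.88 | 0.88 | 0.97 | 0.97 | 0.97 |
| Re. Stay | 0.98 | 0.98 | 0.98 | 0.99 | 0.99 | 0.99 |
| Re. Travel | 0.85 | 0.85 | 0.85 | 0.96 | 0.96 | 0.96 |
| Overall accuracy | 0.96 | 0.96 | 0.96 | 0.99 | 0.99 | 0.99 |
| Harmonic mean of F1 | 0.92 | 0.91 | 0.91 | 0.98 | 0.98 | 0.98 |
| Time (s) | 17k | 18k | 18k | 150k | 150k | 157k |

TABLE V: The scalability of the Encoder-Decoder architecture.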

VII-B The Inference of Sparse Trajectory

There are two definitions of the sparse trajectory in the literature of urban data analysis. The first one considers a sequence of infrequent reports from vehicles. These trajectories are usually collected at a uniform time interval of seconds or a few minutes. We call them the temporally sparse trajectory [herring2010estimating][rahmani2013path][rahmani2012path][sanaullah2016developing][li2017citywide]. The second class is the spatially sparse trajectory data in which many road segments in a city are not covered by any trajectory, especially for a given period of time. The literature on this class of data mostly studied the travel time estimation problem [wang2014travel][sanaullah2016developing][li2017citywide].

We mainly consider the mobility inference problem on the temporally sparse trajectory, as our data set covers most of the city regions. The recent works on this topic focus on the extraction of travel paths [rahmani2013path][rahmani2012path][li2017citywide]. Typically, the problem is decomposed into two tasks: the map-matching and the path-inference. In map-matching, each location record on the trajectory is matched to a point on a particular segment of the road network [quddus2007current][ochieng2003map]. In path-inference, the matched points on the map are connected by shortest paths to form the travel path [lou2009map][he2013line].

The map-matching based techniques cannot be applied directly to our mobility inference problem. First, the trajectory data in our case encompasses not only the movement of high-speed vehicles on the ground, but also those by bikes and subways. The locations of these trajectories may not be on the road network, and thus are not appropriate for map-matching. It is also costly to evolve the technique with the fast-changing road network of modern cities. Second, we have both travel and stay in our data, while the previous approaches mostly work on the travel part of the trajectory with a temporal sparsity two orders of magnitude smaller than in our case.

The trajectory completion techniques can also be applied to compute dense trajectories from known sparse ones. In Ref. [li2016knowledge], Yang et al. proposed a geometry-based method that pre-computes the junction networks in cities and then predicts the missing part of the travel trajectory, without knowing the city map. However, their technique requires the speed and heading information of each location record, and a spatially dense trajectory data set to pre-compute the junction network. Applying the same trajectory completion technique to our problem leads to worse performance than the proposed deep learning method.

VII-C Deep Learning for Urban Analytics

Many deep learning methods have been customized to work with the urban data. Lv et al. proposed a stacked autoencoder architecture to learn the generic traffic flow features from the data collected at road-side detectors [lv2015traffic]. Zhang et al. designed standalone residual networks to model three key temporal properties of crowd flows and dynamically aggregated the network outputs to predict the inflow and outflow of traffic in city regions [zhang2017deep]. Yao et al. presented DeepSense [yao2017deepsense], a deep learning framework to resolve the data noise and extract useful features from the mobile sensing data.

Yet, the sequence-to-sequence neural network models (e.g., LSTM [LSTM]) are rarely used in the urban data analytics. The recent work by Zhao et al. on the next location recommendation from time-aware trajectory data [zhao2017time] comes the closest to our study, though there are fundamental differences. We propose an end-to-end neural network model to better understand the current trajectory and make inference at any spatiotemporal record. In contrast, Zhao et al. target the one-time prediction of the next activity in the future. Their prediction is based on a ranking function, with the deep learning method only used for embedding.

VIII Conclusion

This paper studies the problem of mobility inference over sparse trajectories. Based on the observation of a long-tailed sparsity pattern in the trajectory data, we design a single trajectory inference algorithm that detects the mobility of close to half of trajectory records with a guaranteed 100% precision. Furthermore, we propose an encoder-decoder architecture that learns the mobility pattern from multiple trajectories. The learning model significantly outperforms the traditional classifiers in all the performance metrics. In particular, by feeding with the large-scale densely sampled training data, our model achieves a near-optimal overall accuracy on the records labeled by the single trajectory inference algorithm. On unlabeled records, our model outperforms the single trajectory inference by a factor of two on extremely sparse trajectories. Experiment results also demonstrate that our model generalizes to different urban data sources and scales to large data sets.


Appendix A Proofs and the Exact Algorithm

Theorem 1: Intrinsic linkage between discrete and continuous mobility of a trajectory.

Proof.  Theorem 1(a). For the discrete stay segment in the time series , consider its corresponding continuous segment in the time period . satisfies . , we have , given that the straightline is the shortest distance between and . Here and are the closest time point in to and respectively. Because , , , we have . That is, is a stay segment by Definition 1(a) under the parameters of and .

Theorem 1(b). For the discrete travel trip in the time series , by definition, we have , there exist two time points satisfying . Consider the corresponding continuous segment in the time period , , we can find (the closest time point in no smaller than ) and (the closest time point in no larger than ), having . There exist two time points satisfying . That is, is a travel trip by Definition 1(b) under the parameters of and .

Theorem 2: Continuous mobility of dense stay segments.

Proof.  Theorem 2(a). For the dense stay segment defined in the time series , consider its corresponding continuous segment in the time period . We have because is the dense stay segment. For any two time points , denote the closest time points in the time series of to and as and (). We have by Observation 1. The conditions for the continuous model of the stay segment in Definition 1(a) then hold.

Theorem 2(b). For the discrete segment defined in the time series , consider its corresponding continuous segment in the time period . If , i.e., the unobserved time period of has a duration longer than . Observing a time period with can detect a different stay segment from the other part of the segment in . Then there can be travel trips surrounding the segment in to connect the trajectory. This possibility can not be validated or rejected given the information of the discrete segment only. Therefore, the corresponding continuous segment can not be inferred as stays, unless .

On the dense segment , if the corresponding continuous segment is the stay segment, by Definition 1(a), , . Therefore, must be a dense stay segment.

Input :  (dense trajectory), , (the space and time parameters)
Output :  (the mobility of each record)
1 begin
2       for  do
3             for  do
                   /* iterate all the candidate stay segments */
4                   if  then
5                         for  do
6                               for  do
7                                     if  then
8                                           False, Break
11                        if False then
12                               for  do
13                                     S
       /* the remaining records are travel trips */
18       for  do
19             if  S then
20                   T
22      return
Algorithm 2 The exact algorithm on dense trajectories.

Theorem 3: Continuous mobility of travel records.

Proof.  Theorem 3(a). For the record at time , consider any time period satisfying and . If the three conditions hold, the time period of satisfies and . We have or . Otherwise, we will have and , which leads to the contradiction of . For the either case of or , we have and . This contradicts to Definition 1(a). Therefore, the record at time can not be in any stay segment, and it must be in a travel trip by Definition 1(b).

Theorem 3(b). For the record at time , if the condition does not hold, satisfying , we have or .

Consider the smallest time point satisfying and . We should have because otherwise . There exists a time period of , for all observed , . Then , , will be possibly in a stay segment, without the information to reject the possibility.

Having and , using the proof by contradiction, we have satisfying , we have . Consider the largest time point satisfying , we can construct a time period of , for all the observed time point of having , we have . The distance between these observed time points is below . Then there can be a continuous segment in the time period of satisfying . We do not have any information to reject the inference of stays on this segment. Therefore, the record at can not be in any travel trip.

We introduce an exact algorithm to infer the discrete mobility (Definition 2) from densely sampled trajectories, as shown in Algorithm 2. The algorithm iterates all the candidate segments in a trajectory to decide whether they meet the condition of stays. The records not in any stay segments are travels. The algorithm has a computational complexity of ( is the number of records in a trajectory), which is computationally infeasible for the large-scale trajectory data. In our targeted scenario, we do not have the densely sampled trajectory.
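The brute-force structure of such an exact algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not a transcription of Algorithm 2: it assumes a candidate segment is a stay when it lasts at least `min_duration` and all pairwise distances between its records are at most `max_dist`, and it uses planar `(t, x, y)` records with Euclidean distance. The names `exact_mobility`, `max_dist`, and `min_duration` are hypothetical.

```python
import math

def exact_mobility(records, max_dist, min_duration):
    """records: list of (t, x, y) sorted by time. Returns 'S'/'T' per record."""
    n = len(records)
    is_stay = [False] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Skip candidate segments that are too short in time.
            if records[j][0] - records[i][0] < min_duration:
                continue
            # A candidate is a stay only if every pair of records is close.
            compact = all(
                math.dist(records[k][1:], records[l][1:]) <= max_dist
                for k in range(i, j + 1)
                for l in range(k + 1, j + 1)
            )
            if compact:
                for k in range(i, j + 1):
                    is_stay[k] = True
    # The remaining records are treated as travel.
    return ["S" if s else "T" for s in is_stay]

trajectory = [(0, 0, 0), (10, 1, 0), (20, 0, 1), (30, 100, 100), (40, 200, 200)]
labels = exact_mobility(trajectory, max_dist=5, min_duration=15)
```

This naive version checks every pair of records inside every candidate segment, so its cost grows with the fourth power of the number of records in the worst case, which illustrates why exhaustive enumeration does not scale.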

Appendix B Material for the SDS algorithm

Fig. 10: The probability for violating Observation 1, under different and , mapped by the operator.
Fig. 11: The percentage of stay/travel on the trajectory with different global sparsity values and ( fixed to 30 min).

To validate Observation 1, we conduct an experiment on the full data set in Beijing. For each record in a trajectory, we explicitly remove the record and detect all the dense stay segments from the remaining trajectory. If the record is within a dense stay segment, we check whether the record, as time , violates Observation 1. As shown in Figure 10, among 10 billion potential (record, interval) pairs for each parameter setting, the probability of violating Observation 1 is below if and .

In the mobility definition of the trajectory model, the parameters of and need to be determined. In fact, these parameters provide the flexibility to capture the multi-scale mobility in the human trajectory. Inside the city boundary, and can be minutes and meters to describe the short-term stays and travels; while in the state level, and can be days and hundreds of kilometers to characterize the stay in a city and the travel between cities.

We focus on the detection of intra-city travels because the number of travel records is much fewer than the stay and the detection of stay is relatively insensitive to the parameter change (Figure 4(a), Figure 11(a)). The goal is to detect more travels while keeping the mobility definition reasonable. According to Figure 4(b), we pick because the detected ratio of travel does not increase much when switching to and it does not impose a strict stay definition which violates Observation 1. Similarly, according to Figure 11(b), we pick which maximizes the recall of travel () and allows a mild stay definition compared with . The parameters of and are consistent with the empirical settings in Ref. [calabrese2010geography][jiang2013review].

Appendix C Generation of the simulation data

First, we apply the continuous-time random walk (CTRW) model in [brockmann2006scaling][gonzalez2008understanding] to generate artificial human trajectories. The model characterizes the human trajectory as a two-state interplay between the scale-free displacements (travel) and a long-tailed waiting time distribution (stay). The probability density functions of both the travel distance and the waiting/stay time apply the truncated power-law function observed in [gonzalez2008understanding]. The exponent parameters of the function are calibrated by our trajectory data set applying the SDS algorithm. Each trajectory starts from a random location. Within each stay period, the location at any time is computed by the stay location plus a random spatial offset smaller than . The travel between consecutive stay locations is assumed to be a straight-line, constant-speed trajectory. The parameter speed controls the ratio of the stay/travel time.

Second, each trajectory is sampled using the timestamps in the full data of Section VI-A. The generated trajectory is further re-sampled by the given re-sampling rate for the real usage in the experiment.
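The two-step generation can be sketched in Python as below. This is a simplified illustration, not the calibrated simulator: a bounded power law sampled by inverse CDF stands in for the truncated power-law function, and the exponents, value ranges, starting area, and function names are placeholder assumptions rather than the calibrated values.

```python
import math
import random

def bounded_power_law(rng, exponent, lo, hi):
    """Sample from a density proportional to x**(-exponent) on [lo, hi].

    Inverse-CDF sampling; requires exponent != 1.
    """
    a = 1.0 - exponent
    u = rng.random()
    return (lo ** a + u * (hi ** a - lo ** a)) ** (1.0 / a)

def simulate_trajectory(timestamps, speed, stay_radius, rng,
                        wait_exp=1.8, wait_range=(60.0, 86400.0),
                        jump_exp=1.75, jump_range=(100.0, 50000.0)):
    """Return [(t, x, y, label)] at the given sorted, non-empty timestamps.

    Stays and straight-line, constant-speed travels alternate, starting
    with a stay. All default parameter values are placeholders.
    """
    x, y = rng.uniform(0.0, 10000.0), rng.uniform(0.0, 10000.0)
    nx, ny = x, y                      # destination of the current travel
    state = "S"
    phase_start = timestamps[0]
    phase_end = phase_start + bounded_power_law(rng, wait_exp, *wait_range)
    out = []
    for t in timestamps:
        while t >= phase_end:          # advance to the phase containing t
            phase_start = phase_end
            if state == "S":
                d = bounded_power_law(rng, jump_exp, *jump_range)
                angle = rng.uniform(0.0, 2.0 * math.pi)
                nx, ny = x + d * math.cos(angle), y + d * math.sin(angle)
                state = "T"
                phase_end = phase_start + d / speed
            else:
                x, y = nx, ny
                state = "S"
                phase_end = phase_start + bounded_power_law(
                    rng, wait_exp, *wait_range)
        if state == "S":
            # Stay location plus a random offset smaller than stay_radius.
            r = stay_radius * rng.random()
            angle = rng.uniform(0.0, 2.0 * math.pi)
            out.append((t, x + r * math.cos(angle), y + r * math.sin(angle), "S"))
        else:
            # Linear interpolation along the straight travel path.
            f = (t - phase_start) / (phase_end - phase_start)
            out.append((t, x + f * (nx - x), y + f * (ny - y), "T"))
    return out

rng = random.Random(7)
timestamps = list(range(0, 200000, 600))   # stands in for real timestamps
simulated = simulate_trajectory(timestamps, speed=5.0, stay_radius=50.0, rng=rng)
```

Because every simulated record carries its true label, the output can be re-sampled at any rate and still be evaluated on all records.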

Appendix D The Attention Mechanism

Fig. 12: The attention matrix for a trajectory segment.

Figure 12 visualizes the attention matrix A of the encoder-decoder architecture computed by Eq. (3) in the mobility inference of one typical segment truncated from the trajectory (). The nonzero values (red grids) lie close to the diagonal of the matrix, showing the local context used by the model. At the beginning of the segment, the model requires a local context longer than 10 records for the cold start. After the model learns the global context of the trajectory, a shorter local context is used.