Trajectory mining is an active research topic because positioning devices are now used to track people, vehicles, vessels, natural phenomena, and animals. Its applications include, but are not limited to, transportation mode detection (zheng2010understanding; endo2016deep; dabiri2018inferring; xiao2017identifying; etemad2018predicting), fishing detection (de2016improving), tourism (feng2017poi2vec), and animal behaviour analysis (fossette2010spatio). Several topics in this field need further investigation, such as high-performance trajectory classification methods (endo2016deep; dabiri2018inferring; zheng2010understanding; xiao2017identifying; liu2017end), accurate trajectory segmentation methods (zheng2008understanding; soares2015grasp; grasp-semts2018), trajectory similarity and clustering (kang2009similarity), dealing with trajectory uncertainty (hwang2018segmenting; soares-2017), and semantic trajectories (parent2013semantic). These topics are highly correlated, and solving one of them requires, to some extent, exploring more than one. For example, trajectory classification must deal directly with noise and segmentation, and indirectly with the other topics mentioned above.
As one of the trajectory mining applications, transportation mode prediction is a fundamental task for decision making in smart cities and traffic management systems. Traffic policies designed on the basis of trajectory mining can save money and time for authorities and the public. They may reduce fuel consumption and commute time and, moreover, may make travel more pleasant for residents and tourists. Since a trajectory is a collection of geo-locations captured through time, extracting features that describe the behavior of a trajectory is of prime importance. The number of features that can be generated for trajectory data is large. However, some of these features are more important than others for the transportation mode prediction task. Selecting the best subset of features not only saves processing time but may also increase the performance of the learning algorithm. The feature selection problem and the trajectory classification task are the focus of this research. The contributions of this work are listed below.
Using two feature selection approaches, we investigated the best subset of features for transportation modes prediction.
Finally, we investigate the differences between two cross-validation methods used in the transportation mode prediction literature. The results show that random cross-validation yields optimistic results in comparison to user-oriented cross-validation.
2 Related works
Feature engineering is an essential part of building a learning algorithm. Some algorithms extract features using representation learning methods; other studies select a subset of handcrafted features. Both approaches have advantages such as faster learning, reduced storage space, improved learning performance, and more general models (li2017feature). They differ in two respects. First, feature extraction generates new features, while feature selection chooses a subset of existing features. Second, feature selection yields more readable and interpretable models than feature extraction (li2017feature). This work focuses on the feature selection task.
Feature selection methods can be categorized into three general groups: filter methods, wrapper methods, and embedded methods (fs_guyon2003introduction). Filter methods are independent of the learning algorithm. They select features based on the nature of the data, regardless of the learning algorithm (li2017feature). Wrapper methods, on the other hand, rely on a search strategy, such as sequential, best-first, or branch and bound, to find the subset that gives the highest score on a selected learning algorithm (li2017feature). Embedded methods combine the filter and wrapper approaches (li2017feature); decision trees are an example.
Feature selection methods can also be grouped by the type of data. Methods that assume i.i.d. (independent and identically distributed) data are conventional feature selection methods (li2017feature), such as he2005laplacian and zhao2007spectral. They are not designed to handle heterogeneous or auto-correlated data. Some feature selection methods have been introduced to handle heterogeneous data and stream data, most of them working on graph structures, such as gu2011towards. Conventional feature selection methods are categorized into four groups: similarity-based methods like he2005laplacian, information theoretical methods like peng2005feature, sparse learning methods such as li2012unsupervised, and statistical methods like liu1995chi2. Similarity-based feature selection approaches are independent of the learning algorithm, and most of them cannot handle feature redundancy or correlation between features. Likewise, statistical methods like chi-square cannot handle feature redundancy, and they need a discretization strategy. Statistical methods are also not effective in high-dimensional spaces. Since our data is not sparse and sparse learning methods must overcome the complexity of their optimization methods, they were not candidates for our experiments. On the other hand, information theoretical methods can handle both feature relevance and redundancy. Furthermore, the selected features can be generalized across learning tasks. Information gain, which is the core of information theoretical methods, assumes that samples are independently and identically distributed. Finally, the wrapper method sees only the score of the learning algorithm and tries to maximize it. Therefore, we perform two experiments, one using a wrapper method and one using an information theoretical method.
The most common evaluation metric reported in the related works is model accuracy. Therefore, we use accuracy to compare our work with theirs. Since the data is imbalanced, we report the F-score as well. Although most of the related work reports accuracy, it is calculated in different ways: random cross-validation, cross-validation that divides users, cross-validation that mixes users, and a simple division into training and test sets without cross-validation. The last is a weak method that is used only in zhu2018transportation. Random (conventional) cross-validation was applied in xiao2017identifying, liu2017end, and dabiri2018inferring. zheng2010understanding mixed the training and test sets by user, so that 70% of each user's trajectories go to the training set and the rest go to the test set. Only endo2016deep performed cross-validation by dividing users between the training and test sets. Because trajectory data has spatiotemporal dimensions, and users may belong to the same semantic hierarchical structure (such as students, workers, visitors, and teachers), conventional cross-validation can produce optimistic results, as studied in roberts2017cross. Similar to previous studies, we use the Geolife dataset and the transportation mode detection task. However, we also investigate the effects of different cross-validation techniques.
3.1 Notations and definitions
A trajectory point is a tuple $l_i = (x_i, y_i, t_i)$ with $l_i \in L$, where $x_i$ is the longitude, which varies from $0^{\circ}$ to $\pm 180^{\circ}$, $y_i$ is the latitude, which varies from $0^{\circ}$ to $\pm 90^{\circ}$, $t_i$ ($t_i < t_{i+1}$) is the capturing time of the moving object, and $L$ is the set of all trajectory points. A trajectory point can be assigned features that describe different attributes of the moving object at a specific time-stamp and location. The time-stamp and location are the two dimensions that make a trajectory point spatio-temporal data with two important properties: (i) auto-correlation and (ii) heterogeneity STDM2017. These properties make conventional cross-validation invalid roberts2017cross.
A raw trajectory, or simply a trajectory, is a sequence of trajectory points captured through time, $\tau = (l_1, l_2, \ldots, l_n)$. A sub-trajectory is one of the consecutive sub-sequences of a raw trajectory generated by splitting the raw trajectory into two or more sub-trajectories. For example, if we have one split point, $k$, and $\tau$ is a raw trajectory, then $s_1 = (l_1, \ldots, l_k)$ and $s_2 = (l_{k+1}, \ldots, l_n)$ are the two sub-trajectories generated by $\tau$. The process of generating sub-trajectories from a raw trajectory is called segmentation. We first segmented raw trajectories by day and then partitioned the data using the transportation mode annotations. This approach is also used in dabiri2018inferring and endo2016deep. The assumption that transportation modes are available for segmenting the test set is invalid, since they are what our model predicts; however, we need a controlled environment similar to dabiri2018inferring and endo2016deep to study feature selection.
A point feature is a measured value $f^{p}_i$ assigned to each trajectory point $l_i$ of a sub-trajectory $s$; $f^{p}(s) = (f^{p}_1, \ldots, f^{p}_n)$ denotes the feature for sub-trajectory $s$. For example, speed can be a point feature, since we can calculate the speed of a moving object at each trajectory point. Since we need two trajectory points to calculate speed, we assume the speed of the first trajectory point is equal to the speed of the second trajectory point.
A trajectory feature is a measured value $F(s)$ assigned to a sub-trajectory $s$. For example, mean speed can be a trajectory feature, since we can calculate the mean speed of a moving object over a sub-trajectory.
$F^{p}$ denotes all the trajectory features generated from point feature $p$. For example, $F^{speed}$ represents all the trajectory features derived from the speed point feature. Moreover, $F^{speed}_{mean}$ denotes the mean of the trajectory features derived from the speed point feature.
3.2 The framework
This section explains the eight steps of the framework (Figure 1).
The first step groups the trajectory points by user id, day, and transportation mode to create sub-trajectories (segmentation). Sub-trajectories with fewer than ten trajectory points were discarded to avoid generating low-quality trajectories.
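The grouping in step one can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the TrajLib implementation: the column names (`user_id`, `time`, `mode`), the helper `segment_points`, and the toy data are all hypothetical.

```python
import pandas as pd

MIN_POINTS = 10  # sub-trajectories with fewer points are discarded


def segment_points(points, min_points=MIN_POINTS):
    """Label each point with a sub-trajectory id per (user, day, mode) group.

    Assumed columns: 'user_id', 'time' (datetime64), 'mode'.
    """
    points = points.sort_values(["user_id", "time"]).copy()
    points["day"] = points["time"].dt.date
    # ngroup() numbers every (user, day, mode) group: one id per sub-trajectory.
    points["segment_id"] = points.groupby(["user_id", "day", "mode"]).ngroup()
    sizes = points.groupby("segment_id")["segment_id"].transform("size")
    return points[sizes >= min_points].reset_index(drop=True)


# Toy data: one user, one day, 12 'walk' points followed by 3 'bus' points.
toy = pd.DataFrame({
    "user_id": [1] * 15,
    "time": pd.date_range("2008-05-01 08:00", periods=15, freq="min"),
    "mode": ["walk"] * 12 + ["bus"] * 3,
})
segments = segment_points(toy)  # the 3-point 'bus' group is dropped
```

Note that grouping by day and mode, as described in the text, places all points of one user, day, and mode in the same sub-trajectory even when the mode was used at two separate times of that day.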
Point features including speed, acceleration, bearing, jerk, bearing rate, and the rate of the bearing rate were generated in step two. The features speed, acceleration, and bearing were first introduced in zheng2008understanding, and jerk was proposed in dabiri2018inferring. The very first point feature that we generate is duration, the time difference between two consecutive trajectory points. This feature gives us essential information, including some of the segmentation points and signal-loss points, and is needed to calculate point features such as speed and acceleration. The distance between consecutive points was calculated using the haversine formula. Having duration and distance as two point features, we calculate speed, acceleration, and jerk as $S_{i+1} = Distance_{i+1} / Duration_{i+1}$, $A_{i+1} = (S_{i+1} - S_i)/\Delta t$, and $J_{i+1} = (A_{i+1} - A_i)/\Delta t$, respectively. A function to calculate the bearing ($B$) between two consecutive points was also implemented. Two new features, the bearing rate and the rate of the bearing rate, were introduced in etemad2018predicting. We computed the bearing rate as $B_{rate(i+1)} = (B_{i+1} - B_i)/\Delta t$, where $B_i$ and $B_{i+1}$ are the bearing point feature values at points $i$ and $i+1$, and $\Delta t$ is the time difference. The rate of the bearing rate point feature is computed as $Br_{rate(i+1)} = (B_{rate(i+1)} - B_{rate(i)})/\Delta t$. Since extensive calculations are performed on trajectory points, an efficient way to evaluate these equations for each trajectory was necessary. Therefore, the code was written in a vectorized manner in Python, which is faster than other implementations available online.
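A vectorized computation of the point features might look like the sketch below. It is an illustration written with NumPy under stated assumptions, not the authors' code: the function names, the Earth radius constant, and the choice to copy the second value into the first position for every derived feature are assumptions, and timestamps are assumed to be strictly increasing.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (assumed constant)


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres; inputs are arrays in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


def bearing(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees within [0, 360) from point 1 to point 2."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    y = np.sin(lon2 - lon1) * np.cos(lat2)
    x = (np.cos(lat1) * np.sin(lat2)
         - np.sin(lat1) * np.cos(lat2) * np.cos(lon2 - lon1))
    return (np.degrees(np.arctan2(y, x)) + 360.0) % 360.0


def _pad_first(values):
    """Prepend a copy of the first value so there is one entry per point."""
    return np.concatenate(([values[0]], values))


def _rate(values, duration):
    """Difference between consecutive values divided by the time difference."""
    return _pad_first(np.diff(values) / duration[1:])


def point_features(lat, lon, t):
    """lat/lon in degrees, t in seconds; returns per-point feature arrays."""
    lat, lon, t = (np.asarray(v, dtype=float) for v in (lat, lon, t))
    duration = _pad_first(np.diff(t))
    distance = _pad_first(haversine(lat[:-1], lon[:-1], lat[1:], lon[1:]))
    brng = _pad_first(bearing(lat[:-1], lon[:-1], lat[1:], lon[1:]))
    speed = distance / duration
    acceleration = _rate(speed, duration)
    bearing_rate = _rate(brng, duration)  # plain difference, no 360-degree wrap
    return {
        "duration": duration, "distance": distance, "speed": speed,
        "acceleration": acceleration, "jerk": _rate(acceleration, duration),
        "bearing": brng, "bearing_rate": bearing_rate,
        "bearing_rate_rate": _rate(bearing_rate, duration),
    }


# Four points moving due north along the prime meridian, 10 s apart.
features = point_features(lat=[0.0, 0.001, 0.002, 0.003], lon=[0.0] * 4, t=[0, 10, 20, 30])
```

Every operation works on whole arrays, so no Python-level loop over trajectory points is needed.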
After calculating the point features for each trajectory, the trajectory features were extracted in step three. Trajectory features are divided into two types: global trajectory features and local trajectory features. Global features, such as the minimum, maximum, mean, median, and standard deviation, summarize information about the whole trajectory, while local features, such as percentiles (e.g., 10, 25, 50, 75, and 90), describe behavior related to part of a trajectory. The local trajectory features extracted in this work were the percentiles of every point feature. Five global trajectory features were used in the models tested in this work. In summary, we compute 70 trajectory features (i.e., 10 statistical measures, five global and five local, calculated for each of 7 point features) for each transportation mode sample. In step four, two feature selection approaches were performed: wrapper search and information theoretical feature importance. According to the best cross-validation accuracy, a subset of the top 20 features was selected in step five. The code implementing all these steps is available at https://github.com/metemaad/TrajLib.
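The ten statistics per point feature described in step three can be sketched as follows. The function name, the feature naming scheme, and the particular seven point-feature names in the example are assumptions made for illustration; the text does not list which seven point features were used.

```python
import numpy as np

PERCENTILES = (10, 25, 50, 75, 90)  # local trajectory features


def trajectory_features(point_features):
    """Map {'speed': array, ...} to {'speed_mean': ..., 'speed_p90': ..., ...}."""
    out = {}
    for name, values in point_features.items():
        values = np.asarray(values, dtype=float)
        # Five global features summarizing the whole sub-trajectory.
        out[f"{name}_min"] = values.min()
        out[f"{name}_max"] = values.max()
        out[f"{name}_mean"] = values.mean()
        out[f"{name}_median"] = np.median(values)
        out[f"{name}_std"] = values.std()  # population standard deviation
        # Five local features: percentiles of the point feature.
        for p in PERCENTILES:
            out[f"{name}_p{p}"] = np.percentile(values, p)
    return out


# Hypothetical names for the seven point features, each with dummy values 1..11.
names = ["distance", "speed", "acceleration", "jerk",
         "bearing", "bearing_rate", "bearing_rate_rate"]
example = {name: np.arange(1.0, 12.0) for name in names}
feats = trajectory_features(example)  # 7 point features x 10 statistics
```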
In step six, the framework optionally removes noise from the data; that is, we ran the experiments with and without this step. Finally, we normalized the features (step seven) using Min-Max normalization, since this method preserves the relationships between the values while transforming the features to the same range, and it improves the quality of the classification process (han2011data).
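Min-Max normalization rescales each feature column to the range [0, 1] via $x' = (x - \min)/(\max - \min)$. The sketch below is illustrative only; fitting the minimum and range on the training data and reusing them for other data is an assumption, since the text does not say how the scaling parameters were obtained, and the helper names are hypothetical.

```python
import numpy as np


def fit_min_max(X_train):
    """Return the per-column minimum and range of the training features."""
    X_train = np.asarray(X_train, dtype=float)
    lo = X_train.min(axis=0)
    span = X_train.max(axis=0) - lo
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return lo, span


def apply_min_max(X, lo, span):
    """Scale features with previously fitted parameters."""
    return (np.asarray(X, dtype=float) - lo) / span


X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
lo, span = fit_min_max(X_train)
X_scaled = apply_min_max(X_train, lo, span)
```

Values outside the fitted range, for example in a held-out set, fall outside [0, 1] under this scheme.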
In this section, we detail the four experiments performed in this work to investigate different aspects of our framework. We used the GeoLife dataset (zheng2008understanding). This dataset has 5,504,363 GPS records collected by 69 users and is labeled with eleven transportation modes: taxi (4.41%); car (9.40%); train (10.19%); subway (5.68%); walk (29.35%); airplane (0.16%); boat (0.06%); bike (17.34%); run (0.03%); motorcycle (0.006%); and bus (23.33%). The two primary sources of uncertainty in the GeoLife dataset are device error and human error. This inaccuracy can be categorized into two major groups: systematic errors and random errors (jun2006smoothing). A systematic error occurs when the recording device cannot find enough satellites to provide precise data. A random error can arise from atmospheric and ionospheric effects. Furthermore, the data were annotated after each tracking, as zheng2008understanding explains in the GeoLife dataset documentation. Because humans are prone to error when providing precise information, some users may forget to annotate the trajectory when they switch from one transportation mode to another. For example, the changes in the speed pattern (changes in the size of marker) might be a representation of human error.
We assume the Bayes error is the minimum possible error and that human error is close to the Bayes error ng2016nuts. Avoidable bias is defined as the difference between the training error and the human error. Achieving performance close to human performance on each task is the primary objective of the research. Recent advances in deep learning have achieved performance that even exceeds human performance on some tasks, because they use large samples and scrutinize the data to clean it thoroughly. However, “we cannot do better than Bayes error unless we are overfitting” ng2016nuts. Noise in GPS data and human error suggest that the avoidable bias is not equal to zero. This premise was our basis for including research results in our related work or excluding them.
User-oriented cross-validation and the random forest classifier were used to evaluate the transportation modes used in endo2016deep. The wrapper method was implemented to search for the best subset of our 70 features. The information theoretical feature importance method was likewise used to select the best subset of our 70 features for the transportation mode prediction task. The third experiment is a comparison between endo2016deep and our implementation. User-oriented cross-validation, the top 20 features, and random forest were applied to compare our work with endo2016deep. Random cross-validation on the top 20 features was applied to classify the transportation modes used in dabiri2018inferring using a random forest classifier.
4.1 Classifier selection
In this experiment, we investigated which of six classifiers performs best. The experimental setting uses conventional cross-validation to perform the transportation mode prediction task as shown in dabiri2018inferring (zheng2010understanding; xiao2017identifying; zhu2018transportation; etemad2018predicting). The dataset is filtered to the labels used in dabiri2018inferring (i.e., walking, train, bus, bike, driving), and no noise removal method was applied. The six classifiers (random forest, XGBoost, SVM, neural network, AdaBoost, and decision tree) were trained, and accuracy was calculated using random cross-validation similar to liu2017end, xiao2017identifying, and dabiri2018inferring. The cross-validation results, presented in Figure 2, show that the random forest performs better than the other models. The second-best model was XGBoost. A Wilcoxon signed-rank test indicated that the random forest results were not statistically significantly higher than the XGBoost results. Wilcoxon signed-rank tests indicated that the random forest results were statistically significantly higher than those of the SVM, neural network, and AdaBoost classifiers. Moreover, a Wilcoxon signed-rank test indicated that the random forest results were not statistically significantly higher than the decision tree results.
4.2 Feature selection using wrapper and information theoretical methods
The second experiment aims to select the best features for the transportation mode prediction task. We selected the wrapper feature selection method because it can be used with any classifier. Using this approach, we first defined an empty set of selected features. Then, we searched all the trajectory features one by one to find the best feature to append to the selected feature set. The maximum accuracy score was the criterion for choosing the feature to append. Afterwards, we removed the selected feature from the set of candidate features and repeated the search, evaluating the union of the selected features and each remaining candidate. We used the labels applied in endo2016deep and the same cross-validation technique. The results are shown in Figure 3 (a). They suggest that the top 20 features achieve the highest accuracy. Therefore, we selected this subset as the best subset for classification using the random forest algorithm.
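The greedy search described above corresponds to sequential forward selection. The sketch below shows one way to write it with scikit-learn, using `GroupKFold` so that no user appears in both training and validation folds. The function name, the synthetic data, the number of trees, and the number of folds are assumptions for illustration, not the settings used in the experiments.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score


def forward_selection(X, y, groups, n_select, n_splits=3):
    """Greedy wrapper search: keep adding the feature that maximizes CV accuracy."""
    selected, history = [], []
    remaining = list(range(X.shape[1]))
    cv = GroupKFold(n_splits=n_splits)  # user-oriented folds
    while remaining and len(selected) < n_select:
        scores = {}
        for j in remaining:
            clf = RandomForestClassifier(n_estimators=10, random_state=0)
            cols = selected + [j]  # union of selected features and the candidate
            scores[j] = cross_val_score(clf, X[:, cols], y, groups=groups, cv=cv).mean()
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    return selected, history


# Synthetic data (not GeoLife): only column 3 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=120)
X = rng.normal(size=(120, 5))
X[:, 3] += 4.0 * y
groups = np.repeat(np.arange(6), 20)  # six hypothetical users
selected, history = forward_selection(X, y, groups, n_select=2)
```

The search trains one model per remaining candidate in every round, so its cost grows roughly quadratically with the number of features when the full ranking is computed.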
Information theoretical feature selection is one of the methods widely used to select essential features. Random forest is a classifier with embedded feature selection based on information theoretical metrics. We calculated the feature importance using random forest. Then, the features were appended to the selected feature set one at a time, in order of importance, and the accuracy score of the random forest classifier was calculated after each addition. User-oriented cross-validation was used here, and the target labels are the same as in endo2016deep. Figure 3 shows the cross-validation results for appending features in the importance order suggested by the random forest.
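The importance-ranked procedure can be sketched as follows. scikit-learn's `feature_importances_` is an impurity-based importance; setting `criterion="entropy"` so that it reflects information gain is an assumption here, as are the function names, the synthetic data, and the choice to rank on the full data set for brevity.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score


def rank_by_importance(X, y):
    """Return feature indices sorted from most to least important."""
    forest = RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=0)
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1]


def accuracy_per_prefix(X, y, groups, ranking, n_splits=3):
    """User-oriented CV accuracy when using the top 1, 2, ..., k ranked features."""
    cv = GroupKFold(n_splits=n_splits)
    scores = []
    for k in range(1, len(ranking) + 1):
        clf = RandomForestClassifier(n_estimators=10, random_state=0)
        scores.append(cross_val_score(clf, X[:, ranking[:k]], y, groups=groups, cv=cv).mean())
    return scores


# Synthetic data (not GeoLife): only column 3 carries class information.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=120)
X = rng.normal(size=(120, 5))
X[:, 3] += 4.0 * y
groups = np.repeat(np.arange(6), 20)  # six hypothetical users
ranking = rank_by_importance(X, y)
scores = accuracy_per_prefix(X, y, groups, ranking)
```

Unlike the wrapper search, this approach fits the ranking model once and then evaluates one model per prefix length, so it needs far fewer training runs.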
4.3 Comparison with endo2016deep and dabiri2018inferring
In the third experiment, we filtered the data to the transportation modes used by endo2016deep for evaluation. We divided the data into training and test sets so that each user appears in either the training set or the test set, but not both; approximately 80% of the data was used for training and 20% for testing. The top 20 features, the best subset identified in Section 4.2, were used in this experiment. We compare our per-segment accuracy against the mean accuracy of endo2016deep, 67.9%. A one-sample Wilcoxon signed-rank test indicated that our accuracy results (69.50%) are higher than endo2016deep's results (67.9%), p=0.0431.
The label set in dabiri2018inferring's research is walking, train, bus, bike, taxi, subway, and car, where taxi and car are merged into a class called driving, and subway and train are merged into the train class. We filtered the Geolife data to obtain the same subsets as dabiri2018inferring reported. Then, we randomly selected 80% of the data as the training set and the rest as the test set; that is, we applied five-fold cross-validation. The same best subset of features as in the previous experiment was used. Running the random forest classifier with 50 estimators, using the scikit-learn implementation (scikit-learn), gives a mean accuracy of 88.5% for the five-fold cross-validation. A one-sample Wilcoxon signed-rank test indicated that our accuracy results (88.50%) are higher than dabiri2018inferring's results (84.8%), p=0.0796.
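A one-sample Wilcoxon signed-rank test of fold accuracies against a published baseline can be run with SciPy by testing the differences from the baseline. The sketch below is an assumption about how such a test might be set up; the fold accuracies are hypothetical values chosen for illustration, not the results of this work, and the helper name and the one-sided alternative are likewise assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon


def exceeds_baseline(fold_scores, baseline, alpha=0.05):
    """One-sided test of whether fold scores tend to lie above a fixed baseline."""
    diffs = np.asarray(fold_scores, dtype=float) - baseline
    result = wilcoxon(diffs, alternative="greater")
    return result.pvalue, bool(result.pvalue < alpha)


# Hypothetical fold accuracies, NOT the values obtained in the experiments.
hypothetical_scores = [0.700, 0.712, 0.694, 0.721, 0.706, 0.689, 0.715, 0.698]
p_value, significant = exceeds_baseline(hypothetical_scores, baseline=0.679)
```

With only a handful of folds the test has few attainable p-values, so the number of folds limits how small the reported p-value can be.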
We did not use the noise removal method in the above experiments because we believe that the labels of the test dataset would not be accessible in practice, so using this method would only inflate our accuracy unrealistically.
4.4 Effects of types of cross-validation
To visualize the effect of the type of cross-validation on the transportation mode prediction task, we set up a controlled experiment. We use the same classifiers and the same features to calculate the cross-validation accuracy. Only the type of cross-validation differs: one is random, and the other is user-oriented. Figure 4 shows that there is a considerable difference between the results of user-oriented and random cross-validation. The result indicates that random cross-validation provides optimistic accuracy and F-score results. Since the correlation between user-oriented cross-validation results is lower than for random cross-validation, proposing a specific cross-validation method for evaluating transportation mode prediction is a topic that needs attention.
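The mechanism behind the optimistic results can be illustrated on synthetic data. In the sketch below, which is not the GeoLife experiment, each hypothetical user has an individual feature offset and a single label, an exaggerated user effect chosen so that a model can score well simply by recognizing users. Random folds (`KFold` with shuffling) let the same user appear in training and validation data, whereas `GroupKFold` keeps users apart.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, KFold, cross_val_score

rng = np.random.default_rng(0)
n_users, per_user = 10, 20
users = np.repeat(np.arange(n_users), per_user)

# Synthetic, exaggerated user effect: one cluster centre and one label per user.
centres = rng.normal(scale=5.0, size=(n_users, 2))
X = centres[users] + rng.normal(scale=0.3, size=(users.size, 2))
y = users % 2

clf = RandomForestClassifier(n_estimators=20, random_state=0)

# Random cross-validation: points of one user land in both training and validation folds.
random_cv = KFold(n_splits=5, shuffle=True, random_state=0)
random_acc = cross_val_score(clf, X, y, cv=random_cv).mean()

# User-oriented cross-validation: every user is held out as a whole.
user_cv = GroupKFold(n_splits=5)
user_acc = cross_val_score(clf, X, y, groups=users, cv=user_cv).mean()
```

In this toy setting the labels of held-out users cannot be inferred from other users, so the gap between `random_acc` and `user_acc` reflects user leakage only; on real data the gap depends on how much users differ.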
In this work, we reviewed some recent transportation mode prediction methods and feature selection methods. The framework proposed in etemad2018predicting for transportation mode prediction was extended, and four experiments were conducted to cover different aspects of transportation mode prediction.
First, the performance of six recently used classifiers for transportation mode prediction was evaluated. The results show that the random forest classifier performs best among the evaluated classifiers. The SVM was the worst classifier, and the accuracy of XGBoost was competitive with that of the random forest classifier. In the second experiment, the effect of features was evaluated using two different approaches, the wrapper method and the information theoretical method. The wrapper method shows that we can achieve the highest accuracy using the top 20 features. Both approaches suggest that the 90th percentile of the speed (as defined in Section 3) is the most essential feature among all 70 introduced features. This feature is robust to noise, since outlier values do not contribute to the calculation of the 90th percentile. In the third experiment, the best model was compared with the results shown in endo2016deep and dabiri2018inferring. The results show that our suggested model achieved a higher accuracy. Our features are readable and interpretable in comparison to those of endo2016deep, and our model has a lower computational cost. Finally, in the fourth experiment, we investigated the effects of user-oriented and random cross-validation. The results showed that random cross-validation provides optimistic results in terms of the analyzed performance measures.
We intend to extend this work in several directions. The spatiotemporal characteristics of trajectory data are not taken into account in most works in the literature. We intend to investigate in depth the effects of cross-validation and other strategies, such as holdout, on trajectory data. Finally, space and time dependencies can also be explored to tailor features for transportation mode prediction.
- (1) Gowtham Atluri, Anuj Karpatne, and Vipin Kumar. Spatio-temporal data mining: A survey of problems and methods. arXiv:1711.04710, 2017.
- (2) Sina Dabiri and Kevin Heaslip. Inferring transportation modes from gps trajectories using a convolutional neural network. Transportation Research Part C: Emerging Technologies, 86:360–371, 2018.
- (3) Erico N de Souza, Kristina Boerder, Stan Matwin, and Boris Worm. Improving fishing pattern detection from satellite ais using data mining and machine learning. PloS one, 11(7):e0158248, 2016.
- (4) Yuki Endo, Hiroyuki Toda, Kyosuke Nishida, and Akihisa Kawanobe. Deep feature extraction from trajectories for transportation mode estimation. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 54–66. Springer, 2016.
- (5) Mohammad Etemad, Amílcar Soares Júnior, and Stan Matwin. Predicting transportation modes of gps trajectories using feature engineering and noise removal. In Advances in AI: 31st Canadian Conf. on AI, Canadian AI 2018, Toronto, ON, CA, Proc. 31, pages 259–264. Springer, 2018.
- (6) Shanshan Feng, Gao Cong, Bo An, and Yeow Meng Chee. Poi2vec: Geographical latent representation for predicting future visitors. In AAAI, 2017.
- (7) Sabrina Fossette, Victoria J Hobson, Charlotte Girard, Beatriz Calmettes, Philippe Gaspar, Jean-Yves Georges, and Graeme C Hays. Spatio-temporal foraging patterns of a giant zooplanktivore, the leatherback turtle. Journal of Marine systems, 81(3):225–234, 2010.
- (8) Quanquan Gu and Jiawei Han. Towards feature selection in network. In Proceedings of the 20th ACM ICIKM, pages 1175–1184. ACM, 2011.
- (9) Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of ML research, 3(Mar):1157–1182, 2003.
- (10) Jiawei Han, Jian Pei, and Micheline Kamber. Data mining: concepts and techniques. Elsevier, 2011.
- (11) X He, D Cai, and P Niyogi. Laplacian score for feature selection. In Advances in Neural Information Processing Systems, 2005.
- (12) Sungsoon Hwang, Cynthia VanDeMark, Navdeep Dhatt, Sai V Yalla, and Ryan T Crews. Segmenting human trajectory data by movement states while addressing signal loss and signal noise. International Journal of Geographical Information Science, pages 1–22, 2018.
- (13) Jungwook Jun, Randall Guensler, and Jennifer Ogle. Smoothing methods to minimize impact of global positioning system random error on travel distance, speed, and acceleration profile estimates. Transportation Research Record: Journal of the TRB, 1(1972):141–150, 2006.
- (14) Hye-Young Kang, Joon-Seok Kim, and Ki-Joune Li. Similarity measures for trajectory of moving objects in cellular space. In SIGAPP09, pages 1325–1330, 2009.
- (15) Jundong Li, Kewei Cheng, Suhang Wang, Fred Morstatter, Robert P Trevino, Jiliang Tang, and Huan Liu. Feature selection: A data perspective. CSUR, 50(6):94, 2017.
- (16) Zechao Li, Yi Yang, Jing Liu, Xiaofang Zhou, Hanqing Lu, et al. Unsupervised feature selection using nonnegative spectral analysis. In AAAI, volume 2, 2012.
- (17) Hongbin Liu and Ickjai Lee. End-to-end trajectory transportation mode classification using bi-lstm recurrent neural network. In Intelligent Systems and Knowledge Engineering (ISKE), 2017 12th International Conference on, pages 1–5. IEEE, 2017.
- (18) Huan Liu and Rudy Setiono. Chi2: Feature selection and discretization of numeric attributes. In Tools with Artificial Intelligence, 1995. Proceedings., Seventh International Conference on, pages 388–391. IEEE, 1995.
- (19) Andrew Ng. Nuts and bolts of building ai applications using deep learning. NIPS, 2016.
- (20) Christine Parent, Stefano Spaccapietra, Chiara Renso, Gennady Andrienko, Natalia Andrienko, Vania Bogorny, Maria Luisa Damiani, Aris Gkoulalas-Divanis, Jose Macedo, Nikos Pelekis, Yannis Theodoridis, and Zhixian Yan. Semantic trajectories modeling and analysis. ACM Comput. Surv., 45(4):42:1–42:32, August 2013.
- (21) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. MLR, 2011.
- (22) Hanchuan Peng, Fuhui Long, and Chris Ding. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence, 27(8):1226–1238, 2005.
- (23) David R Roberts, Volker Bahn, Simone Ciuti, Mark S Boyce, Jane Elith, Gurutzeta Guillera-Arroita, Severin Hauenstein, José J Lahoz-Monfort, Boris Schröder, Wilfried Thuiller, et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40(8):913–929, 2017.
- (24) A. Soares Júnior, C. Renso, and S. Matwin. Analytic: An active learning system for trajectory classification. IEEE Computer Graphics and Applications, 37(5):28–39, 2017.
- (25) A. Soares Júnior, V. Cesario Times, C. Renso, S. Matwin, and L. A. F. Cabral. A semi-supervised approach for the semantic segmentation of trajectories. In 2018 19th IEEE International Conference on Mobile Data Management (MDM), pages 145–154, June 2018.
- (26) Amílcar Soares Júnior, Bruno Neiva Moreno, Valéria Cesário Times, Stan Matwin, and Lucídio dos Anjos Formiga Cabral. Grasp-uts: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science, 29(1):46–68, 2015.
- (27) Xiao. Identifying different transportation modes from trajectory data using tree-based ensemble classifiers. ISPRS, 6(2):57, 2017.
- (28) Zheng Zhao and Huan Liu. Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th international conference on Machine learning, pages 1151–1157. ACM, 2007.
- (29) Yu Zheng, Yukun Chen, Quannan Li, Xing Xie, and Wei-Ying Ma. Understanding transportation modes based on gps data for web applications. TWEB, 4(1):1, 2010.
- (30) Yu Zheng, Quannan Li, Yukun Chen, Xing Xie, and Wei-Ying Ma. Understanding mobility based on gps data. In UbiComp 10th, pages 312–321. ACM, 2008.
- (31) Qiuhui Zhu, Min Zhu, Mingzhao Li, Min Fu, Zhibiao Huang, Qihong Gan, and Zhenghao Zhou. Transportation modes behaviour analysis based on raw gps dataset. International Journal of Embedded Systems, 10(2):126–136, 2018.