1 Introduction
An essential task for mobility data mining is trajectory segmentation. Unlike classical data mining, in trajectory data mining the attributes/features are extracted from subtrajectory parts. The partitioning is necessary because a mobility pattern, in general, does not hold for the entire trajectory, but only for subtrajectory parts. Therefore, the segmentation process becomes one of the most critical preprocessing steps for trajectory data mining.
Trajectory segmentation is the process of splitting a given trajectory into several homogeneous segments according to some criteria. This task plays a pivotal role in trajectory mining because it affects the features of each segment, which may depend on the size of the trajectory segment. This holds independently of the application domain, such as fishing detection [15], animal behavior [8, 7], tourism [6], traffic dynamics [5, 7, 16, 14], or vessel movement patterns [2].
A trajectory is a sequence of points located in space and time, and different criteria can be used to split trajectories. Several approaches can be used for trajectory segmentation, such as CBSMoT [12], SPD [17], WKMeans [10], GRASPUTS [15], TRACLUS [9], and OWS [4]. Unlike these previous approaches, where no training step is performed, we propose in this paper a supervised strategy to segment trajectory data. To the best of our knowledge, this is the first approach that actually learns partitioning positions (i.e., the last trajectory point of a segment) from trajectory data characteristics for a given application domain. The main advantage of this supervised strategy is that the transitioning characteristics of a behavior change can be learned from the training data. The model built to forecast partitioning positions is then used to segment trajectories. After that, a majority vote strategy decides the proper location to place a partitioning position.
In summary, the main contributions of this work include: (i) a method for producing training data from partitioning positions on a labeled trajectory; (ii) a method to decide when a partitioning position occurs in a trajectory; and (iii) an empirical study comparing WSII and several baselines for segmentation.
This paper is organized as follows. Section 2 presents the definitions necessary to describe our trajectory segmentation method. Section 3 describes the related work. In Section 4, we propose WSII in detail. In Section 5, we apply the proposed method and other trajectory segmentation algorithms to three datasets and report their performance results. Finally, we conclude our work in Section 6.
2 Definitions
In this section we present the basic concepts related to trajectories and used throughout this paper.
The trajectory of a moving object can be described by a time-ordered sequence of locations the object has visited. We call these locations trajectory points.
Trajectory Point
A trajectory point, $l_i^o$, is the location of object $o$ at time $i$, and is defined as
(1) $l_i^o = (x_i^o, y_i^o)$
where $x_i^o$ is the longitude of the location, which varies from $0^\circ$ to $\pm 180^\circ$, while $y_i^o$ is the latitude, which varies from $0^\circ$ to $\pm 90^\circ$.
Raw Trajectory
A raw trajectory, or simply trajectory, is a time-ordered sequence of trajectory points of some moving object $o$,
(2) $\tau_o = \langle l_1^o, l_2^o, \ldots, l_n^o \rangle$
Segment or Subtrajectory
A segment $s_k$ is a set of consecutive trajectory points belonging to a raw trajectory $\tau_o$,
(3) $s_k = \langle l_j^o, l_{j+1}^o, \ldots, l_{j+r}^o \rangle, \quad 1 \le j \le j + r \le n$
The process of generating segments from a trajectory is called Trajectory Segmentation (TS). The most common way of defining TS involves splitting a raw trajectory into a set of non-overlapping segments. More formally:
Trajectory Segmentation
Given a raw trajectory $\tau_o$, we define a sequence of segments $S = \langle s_1, s_2, \ldots, s_m \rangle$, such that
(4) $\bigcup_{k=1}^{m} s_k = \tau_o$
and
(5) $s_i \cap s_j = \emptyset, \quad \forall\, i \ne j$
Equation 6 shows the input and output of the trajectory segmentation process, where $\tau_o$ is a raw trajectory which contains $n$ trajectory points, and $S$ is the set of all segments generated from $\tau_o$ using TS.
(6) $\mathrm{TS}(\tau_o) = S, \quad |\tau_o| = n, \quad |S| = m$
In this notation, $n$ is the number of trajectory points and $m$ is the number of segments resulting from applying TS to the trajectory.
We call the trajectory point at the end of each segment a partitioning position. This means that the result of applying TS to a trajectory contains $m$ partitioning positions.
Problem Definition
Given a raw trajectory $\tau_o$, we would like to generate a sequence of segments $S$ so that each $s_k \in S$ satisfies a certain homogeneity criterion for a given application domain. To evaluate the performance of the generated $S$, we rely on the knowledge of an expert user to provide a set of semantic tuples $(s_e, c_e)$, where $s_e$ identifies a segment of a trajectory, generated by the expert user, and $c_e$ is a semantic label attached by the expert to this segment, such as a transportation mode or a status of fishing or non-fishing.
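The relationship between partitioning positions and segments can be illustrated with a short sketch. It assumes that a trajectory is stored as a list of (longitude, latitude) pairs and that partitioning positions are given as indices of the last point of each segment; the function name `split_at_partitioning_positions` is hypothetical and not part of the published method.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def split_at_partitioning_positions(
    trajectory: Sequence[Point], positions: Sequence[int]
) -> List[List[Point]]:
    """Split a trajectory into consecutive, non-overlapping segments.

    `positions` holds indices of points that close a segment. The last
    trajectory point always closes the final segment.
    """
    if not trajectory:
        return []
    last = len(trajectory) - 1
    # Keep valid indices only, and make sure the final point ends a segment.
    ends = sorted({p for p in positions if 0 <= p <= last} | {last})
    segments, start = [], 0
    for end in ends:
        segments.append(list(trajectory[start:end + 1]))
        start = end + 1
    return segments


# Illustrative trajectory of ten points with two interior partitioning positions.
trajectory = [(float(i), 0.0) for i in range(10)]
segments = split_at_partitioning_positions(trajectory, [3, 6])
```

By construction, concatenating the returned segments reproduces the input trajectory, and no point appears in two segments, which mirrors the two conditions of the segmentation definition.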
3 Related works
In this section, we give an overview of several methods for trajectory segmentation. Warped KMeans (WKMeans), which is a general-purpose segmentation algorithm based on KMeans [11], is introduced in [10]. It modifies the KMeans algorithm by minimizing a quadratic error (cost function) while imposing a sequential constraint in the segmentation step. Since WKMeans imposes a hard sequential constraint, segments can be updated as new samples arrive without greatly affecting the previous clustering configuration [10]. This algorithm receives the number of segments to be found in the data as an input parameter. Requiring such an input parameter is the main limitation of using it in domains where the number of segments is not predefined or is dynamic.
The Stay Point Detection (SPD) [17] is a simple algorithm that follows the idea that between every two movements there is a stop. SPD applies a distance threshold and a time threshold, so that a moving object that spends more than the time threshold within the distance threshold of a location belongs to a stay point. Hence, each stay point identifies a segment, and the trajectory points between two stay points form another segment.
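The stay point idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the reference implementation of [17]: points are (longitude, latitude, timestamp in seconds) tuples, distances are haversine distances in meters, and the names `haversine_m`, `stay_points`, `dist_m`, and `time_s` are hypothetical.

```python
import math
from typing import List, Tuple

Fix = Tuple[float, float, float]  # (longitude, latitude, timestamp in seconds)


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def stay_points(fixes: List[Fix], dist_m: float, time_s: float) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) pairs of detected stay points."""
    stays, i, n = [], 0, len(fixes)
    while i < n:
        j = i + 1
        # Advance j while the object remains within dist_m of the anchor point i.
        while j < n and haversine_m(fixes[i][0], fixes[i][1], fixes[j][0], fixes[j][1]) <= dist_m:
            j += 1
        # Points i..j-1 stayed close to the anchor; check how long that lasted.
        if fixes[j - 1][2] - fixes[i][2] > time_s:
            stays.append((i, j - 1))
            i = j
        else:
            i += 1
    return stays


# Illustrative data: move east, stop for nine minutes, then move east again.
fixes = (
    [(0.01 * k, 0.0, 60.0 * k) for k in range(5)]
    + [(0.05, 0.0, 300.0 + 60.0 * k) for k in range(10)]
    + [(0.05 + 0.01 * (k + 1), 0.0, 900.0 + 60.0 * k) for k in range(5)]
)
detected = stay_points(fixes, dist_m=100.0, time_s=300.0)
```

In this made-up example, the ten stationary points (indices 5 to 14) form the only stay point; the points before and after it would form the move segments.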
An extension of DBSCAN [3], CBSMoT detects stop and move segments in a trajectory [12]. The original definitions of a neighborhood and minimum points in DBSCAN are altered so that CBSMoT utilizes spatial and temporal aspects of trajectories. CBSMoT works based on the trajectory speed, and the stop points are consecutive trajectory points where the moving object has a lower speed.
TRACLUS [9] detects dense regions with the same line segment characteristics. This clustering algorithm has two steps: (i) partitioning the trajectory into line segments; and (ii) clustering these line segments. A cost function based on the Minimum Description Length (MDL) principle is applied in the first step to split a trajectory into its line segments. It considers three attributes of a trajectory segment: (i) parallel distance, (ii) perpendicular distance, and (iii) angular distance. In the next step, the line segments are clustered using DBSCAN [9].
GRASPUTS is an unsupervised trajectory segmentation algorithm that benefits from the Minimum Description Length (MDL) principle to build the most homogeneous segments. First, GRASPUTS generates random landmarks. Then, it builds homogeneous segments by swapping the trajectory points across temporally-ordered segments and adjusting the landmarks based on the value of its cost function [15]. GRASPUTS can apply additional features on top of the raw trajectories to perform trajectory segmentation.
The OWS (Octal Window Segmentation) algorithm is based on computing the error signal generated by measuring the deviation of a middle point of an octal window [4]. The intuition behind OWS is that when a moving object changes its behavior, this shift may be detected using only its geolocation over time [4]. OWS uses interpolation methods to find the estimated position of the moving object, i.e., where it is supposed to be if its behavior does not change. Then, OWS compares the real position of the moving object with the estimated one, creating an error signal. With such a procedure, it is possible to determine where the moving object changed its behavior and to use this information to create segments.
In this work, we extend the idea of OWS by using a configurable sliding window for interpolating points and a supervised strategy for deciding where partitioning positions should be placed. Unlike all previous segmentation algorithms, WSII is supervised. This means that WSII is able to learn the variations in the error signal generated by interpolation techniques which characterize partitioning positions over consecutive segments, avoiding in this way the decision of choosing an error threshold value (i.e., an epsilon value in the OWS) that relies on the characteristics of the domain where trajectories were collected.
4 The proposed method
Figure 1 shows an overview of the Wise Sliding Window Segmentation (WSII) method, which has four core procedures: Generate Error Signal, Create Training Data, Binary Classification Model, and Majority Vote. First, WSII creates the error signal from the labeled dataset, as detailed in Section 4.1. The second step is to generate the training data using the error signal, by sliding a window over its values and adding the presence or absence of a partitioning position, as detailed in Section 4.2. The third step is to train a binary classifier to recognize the partitioning positions over the sequence of error signals, as detailed in Section 4.3. Finally, unlabeled trajectories can be segmented based on the model learned in the previous step and using the majority vote, as detailed in Section 4.4.
4.1 Generating the error signal
The first step of our proposal is similar to the OWS algorithm [4], which creates a sliding window over a trajectory to compute an error signal between trajectory points. For each sliding window, the error is generated by calculating the deviation of the interpolated midpoint of the window from the actual midpoint. This process is repeated by sliding the window one point forward: when a new trajectory point is received, it is added to the window and the oldest point is removed from it. An example of this process is shown in Figure 1(a).
In Figure 1(a), the green rectangle is a sliding window of size 7, and the red and blue triangles are the interpolated positions. The red triangle is generated using extrapolation on the first three points of the window, and the blue triangle is generated using the last three points inside the window. The green dot, i.e., the middle (fourth) point of the window, is assumed to be the missing point, while the orange triangle is generated as the midpoint between the red and blue triangles. The distance between this midpoint and the missing point is called the error value of this window. In the example of Figure 1(a), the haversine distance from the estimated position to the real position is clearly visible. This may indicate that the moving object's behavior has changed at the middle point of the window.
An example of the error signal from a trajectory is shown in Figure 1(b). A raw trajectory with 26 points that forms 20 sliding windows of size 7 (generating an error value for point indices 4 to 23) is displayed in this example. The first three and last three error values are dropped. The window index is the index of the middle trajectory point in each window. Figure 1(b) illustrates a situation in which there are several trajectory points (e.g., around trajectory points 8 and 12) along the raw trajectory where the estimated positions were far from the real trajectory positions. These boundaries are considered potential partitioning positions for creating trajectory segments.
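The windowed error computation described in this section can be sketched as follows. This is a minimal sketch, assuming equally spaced samples and a simple linear extrapolation in longitude/latitude space; the experiments in this paper use a random walk kernel, which is not reproduced here. The names `haversine_m`, `extrapolate`, and `error_signal` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def haversine_m(a: Point, b: Point) -> float:
    """Great-circle distance in meters between two (lon, lat) points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(a[1]), math.radians(b[1])
    dphi = phi2 - phi1
    dlmb = math.radians(b[0] - a[0])
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(h))


def extrapolate(points: Sequence[Point], forward: bool) -> Point:
    """Extend the points linearly by one step (assumption: constant velocity)."""
    steps = len(points) - 1
    vx = (points[-1][0] - points[0][0]) / steps
    vy = (points[-1][1] - points[0][1]) / steps
    if forward:
        return (points[-1][0] + vx, points[-1][1] + vy)
    return (points[0][0] - vx, points[0][1] - vy)


def error_signal(points: Sequence[Point], window: int = 7) -> List[float]:
    """One error value per full window; `window` must be odd and at least 5."""
    half = window // 2
    errors = []
    for c in range(half, len(points) - half):
        ahead = extrapolate(points[c - half:c], forward=True)             # from the first half
        behind = extrapolate(points[c + 1:c + half + 1], forward=False)   # from the last half
        mid = ((ahead[0] + behind[0]) / 2, (ahead[1] + behind[1]) / 2)
        errors.append(haversine_m(mid, points[c]))  # deviation from the real middle point
    return errors


# Illustrative track of 26 points: heading east, then turning north.
track = [(0.001 * k, 0.0) for k in range(13)] + [(0.012, 0.001 * k) for k in range(1, 14)]
errors = error_signal(track, window=7)
```

With 26 points and a window of size 7, the sketch yields 20 error values, in line with the example of Figure 1(b). On this made-up track, the error stays near zero on the straight parts and rises around the turn.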
4.2 Creating Training Data
The second core procedure of WSII is to create a training dataset using the sequential error values extracted in the previous step. First, we create an array of error values whose length equals the sliding window size; this array is the first training sample, and we use the ground truth information (i.e., whether there was a change in behavior in this particular region) to annotate the label of this sample. If the window includes a partitioning position, it is labeled as 1, and as 0 otherwise. When a new trajectory point is received, we remove one value from the start of the window and add the new value to its end. Then we create the next sample by applying the same labeling step: 1 when a partitioning position is present in the sliding window, and 0 if it is not. This procedure is repeated until all the error values are evaluated.
To understand how the labeling process works, we show an example in Figure 1(c). In this example, the training data are created for a sliding window built with seven trajectory points over eleven slides. As can be seen in Figure 1(c), over the first slides there was no big change in the error signal (ranging from 120 to 340 meters). Then, the value of 560 characterizes a high jump in the estimated error and actually reflects a real change in the behavior of the moving object, resulting in a positive example (i.e., there is a partitioning position) in the training data. The subsequent examples are labeled as positive for as long as the partitioning position is present in the sliding window. After that, the samples are again labeled as negative examples due to the absence of a partitioning position in the window.
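The labeling procedure can be sketched as follows. This is a minimal sketch, assuming the error values and the ground-truth partitioning flags are aligned lists of equal length; the function name `make_training_samples` and the error values are hypothetical and are not taken from Figure 1(c).

```python
from typing import List, Sequence, Tuple


def make_training_samples(
    errors: Sequence[float], is_partition: Sequence[int], window: int = 7
) -> Tuple[List[List[float]], List[int]]:
    """Slide a window over the error signal and label each sample.

    A sample is labeled 1 if any point covered by the window is a
    partitioning position in the ground truth, and 0 otherwise.
    """
    samples, labels = [], []
    for start in range(len(errors) - window + 1):
        end = start + window
        samples.append(list(errors[start:end]))
        labels.append(1 if any(is_partition[start:end]) else 0)
    return samples, labels


# Made-up error values in meters, with a jump at index 9.
errors = [150.0, 210.0, 120.0, 300.0, 180.0, 340.0, 260.0, 200.0, 230.0, 560.0,
          310.0, 190.0, 240.0, 170.0, 280.0, 220.0, 130.0, 250.0, 160.0, 290.0]
is_partition = [1 if i == 9 else 0 for i in range(len(errors))]
samples, labels = make_training_samples(errors, is_partition, window=7)
```

In this toy input, the samples are negative until the window first reaches index 9, positive for the seven windows that contain it, and negative again afterwards.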
4.3 Binary Classification Model
A binary classifier is used by WSII to categorize each error signal sample as either containing a partitioning position or not. The labeled trajectory data created in the previous step is used to generate training samples for this binary classifier, so that it can classify signal samples into a class where the sliding window has a partitioning position (e.g., value 1) or a class where it does not (e.g., value 0).
It was observed that the error signal has its minimum fluctuations far from a partitioning position, and its maximum fluctuations while transitioning from one segment to a new one. Therefore, detecting the area that includes partitioning positions is an indicator that the behavior has changed. We apply the binary classifier to identify the areas of a trajectory that have the highest likelihood of containing partitioning positions. In this work, we used a Random Forest classifier [1] to benefit from its bagging power while processing long window sizes faster by limiting the number of features. However, we emphasize that any classification model can be used in this step. After forecasting these transitioning areas, we use a majority vote mechanism to decide precisely where to place a partitioning position, as explained in the next section.
4.4 Majority Vote
At this step, we use the same sliding window size to decide if a partitioning position occurred. Since the window slides point by point, each trajectory point can be part of as many sliding windows as the window size, and we classify each of these windows using the binary classifier. This means that, for every trajectory point, the binary classifier generates one output per window the point belongs to. Using a majority vote mechanism for these outputs leads us to the final decision: the trajectory point is a partitioning position if more than 50% of the sampled signals are labeled as a partitioning position.
Leveraging this feature and applying the voting technique, we obtain a more robust evaluation of whether a point is a partitioning position or not. The decision to identify a trajectory point as a partitioning position is supported by one result per window, each of which contributes equally to the final decision. This means that a single misclassification of the binary classifier weighs only as one vote among them. Although increasing the window size can make the algorithm more robust to noise, it will make it fail to identify segments that are short relative to the window size. Furthermore, the algorithm is more robust against noisy points, which may occur in trajectory data due to device collection errors.
The advantages of the majority vote mechanism are exemplified in Figure 1(d), where a window of size 7 was used. In Figure 1(d), the prediction column was forecast by the binary classifier for each sliding window. It is possible to see in Figure 1(d) that the final value of a trajectory point is decided by evaluating the seven column values (0,1,0,0,0,1,1) of the windows it belongs to. The majority vote for this point is equal to 0, since three votes are 1 and four votes are 0. For deciding the final value of the next point, the following seven lines are used. The evaluation of the set (1,0,0,0,1,1,1) through a majority vote (four votes for 1 and three votes for 0) results in the decision of 1 (i.e., a partitioning position occurred). As previously stated, such a strategy makes WSII robust against spatial jumps due to GPS error in the data collection process.
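The voting rule and the two vote sets discussed above can be reproduced with a few lines. This is a minimal sketch; the function names `majority_vote` and `vote_over_stream` are hypothetical, and the alignment between window outputs and trajectory point indices is left out.

```python
from typing import List, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Return 1 if strictly more than 50% of the votes are 1, else 0."""
    return 1 if sum(votes) > len(votes) / 2 else 0


def vote_over_stream(window_outputs: Sequence[int], window: int = 7) -> List[int]:
    """Apply the vote to every run of `window` consecutive classifier outputs."""
    return [
        majority_vote(window_outputs[k:k + window])
        for k in range(len(window_outputs) - window + 1)
    ]


# Classifier outputs covering the two vote sets discussed in the text.
outputs = [0, 1, 0, 0, 0, 1, 1, 1]
decisions = vote_over_stream(outputs, window=7)
```

The first decision uses (0,1,0,0,0,1,1), with three votes for 1, and the second uses (1,0,0,0,1,1,1), with four votes for 1, so only the second one marks a partitioning position.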
5 Experimental Evaluation
In this section, we evaluate the proposed method and compare it to state-of-the-art approaches. In Section 5.1, we describe the datasets. In Section 5.2, we detail the experimental setup, and we report the results in Section 5.3.
5.1 Datasets
We evaluate our method on three datasets. The first is a fishing dataset containing 5190 trajectory points and 153 segments, where fishing activity labels (i.e., fishing or not-fishing) were provided by specialists and used to create trajectory segments. The second is the Atlantic hurricane dataset, which contains 1990 trajectory points and 182 segments. The Saffir-Simpson scale was used to determine the type of hurricane, and the transitions from one hurricane level to another were used for creating trajectory segments. Finally, a subset of the Geolife dataset containing 12,955 trajectory points and 181 segments was used as the third dataset. For this dataset, we use the transportation mode as the ground truth for creating the segments. We did not use the full Geolife dataset because some of the segmentation algorithms, such as GRASPUTS, were not able to provide segments in a reasonable time. We also created a sample Automatic Identification System (AIS) dataset to debug our algorithm and test our code, and made it publicly available (https://github.com/metemaad/WSII).
5.2 Experimental Setup
In this work, we measure the trajectory segmentation performance using the harmonic mean of purity and coverage, introduced in [4]. The use of purity and coverage for measuring trajectory segmentation performance was originally introduced in [15]. We do not use clustering measures such as completeness and homogeneity, since the segmentation task is different from clustering. In trajectory segmentation, the order of the segments is essential, and adjacent segments can come from the same cluster. For example, an object moving to a shopping store and going back home characterizes two segments that would be in the same "walk" cluster.
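To make the metric concrete, the sketch below computes a harmonic mean of purity and coverage from per-point segment identifiers. The definitions used here are simplifying assumptions and may differ in detail from those in [15] and [4]: purity is taken as the average, over predicted segments, of the share of points coming from the dominant ground-truth segment, and coverage as the average, over ground-truth segments, of the share of points captured by the single best-overlapping predicted segment. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def _avg_dominant_share(group_ids: Sequence[Hashable], other_ids: Sequence[Hashable]) -> float:
    """Average, over groups, of the share held by the most frequent other id."""
    groups = defaultdict(list)
    for g, o in zip(group_ids, other_ids):
        groups[g].append(o)
    shares = [Counter(v).most_common(1)[0][1] / len(v) for v in groups.values()]
    return sum(shares) / len(shares)


def purity(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    return _avg_dominant_share(pred, truth)


def coverage(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    return _avg_dominant_share(truth, pred)


def harmonic_mean(p: float, c: float) -> float:
    return 0.0 if p + c == 0 else 2 * p * c / (p + c)


# Illustrative labels: two true segments, the first one over-segmented.
truth = [0] * 5 + [1] * 5
pred = [0] * 3 + [1] * 2 + [2] * 5
p, c = purity(truth, pred), coverage(truth, pred)
h = harmonic_mean(p, c)
```

In this toy case every predicted segment is pure, but the first true segment is split in two, so coverage drops while purity stays at its maximum. The harmonic mean penalizes such over-segmentation, just as it penalizes under-segmentation through lower purity.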
In each experiment, we divided the dataset into ten folds, one of which is applied to tuning/training the algorithm and the rest to testing its performance. Each fold contains different trajectories of different moving objects; therefore, we individually segment each trajectory and report the average results.
Since we divide the data into ten folds, we calculate ten values for the harmonic mean. A boxplot is used to show the visual difference between these ten values for each algorithm (Figure 3). Although the boxplot can show the difference between the performance of the algorithms, we perform a Mann-Whitney U test to show that the difference between the medians of each set is not generated randomly. Having only ten numbers, we could not establish that the data follow a normal distribution, so we did not use a t-test.
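For readers unfamiliar with the test, the following sketch computes the Mann-Whitney U statistics for two samples and a one-sided p-value using the normal approximation with a continuity correction. It is an illustration only: it applies no tie correction, it is not necessarily the exact procedure or software used in our experiments, and the sample values are made up rather than taken from our results. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y); U_x counts pairs with x_i > y_j, ties count one half."""
    u_x = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    return u_x, len(x) * len(y) - u_x


def one_sided_p_normal(x: Sequence[float], y: Sequence[float]) -> float:
    """One-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    u_x, u_y = mann_whitney_u(x, y)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (min(u_x, u_y) + 0.5 - mean) / sd       # continuity correction
    return 0.5 * math.erfc(-z / math.sqrt(2))   # standard normal CDF at z


# Made-up scores for two algorithms over ten folds, fully separated.
scores_a = [94.0 + 0.1 * k for k in range(10)]
scores_b = [88.0 + 0.1 * k for k in range(10)]
u_a, u_b = mann_whitney_u(scores_a, scores_b)
p_value = one_sided_p_normal(scores_a, scores_b)
```

When the ten values of one algorithm are all larger than the ten values of the other, the smaller U statistic is zero and this approximation gives a p-value of roughly 9e-05. For samples this small, an exact test is generally preferable where available.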
The state-of-the-art methods that we compare to our approach require some parameterization. The input parameter values for GRASPUTS were estimated using a grid search with all combinations of values reported in [15]. For the SPD algorithm, we used the parameters suggested in the original paper for the subset of the Geolife dataset, and for the rest of the datasets we used a grid search to find the best parameters. For CBSMoT, we applied a grid search to tune the parameters using the parameter tuning fold. For OWS, we tested the four kernels (i.e., random walk, kinematic, linear, and cubic) and used the same strategy reported in [4] to find the best value of $\epsilon$. We decided to report only the random walk kernel findings, since it obtained good results for all datasets. For a fair comparison between OWS and WSII, we only report the WSII results with the random walk kernel. Details regarding the input parameter value ranges for all algorithms can be found at https://github.com/metemaad/WSII.
5.3 Results and discussion
Figure 3.a displays the results of executing different segmentation algorithms on the Fishing dataset. A Mann-Whitney U test indicated that WSII produces a statistically significantly higher median harmonic mean (94.32, with an associated dispersion of 0.9) for trajectory segmentation compared to OWS with the random walk kernel (89.04, with an associated dispersion of 1.03; p = 9.133e-05). Therefore, the proposed method achieved better performance in comparison to the other trajectory segmentation methods.
A fishing activity is characterized by several ship turns. We believe that WSII had a better result when compared with the other algorithms because of its capability to analyze not only a single trajectory point, but a larger region (i.e., a larger sliding window size). By analyzing a larger region, WSII will only place a partitioning position when a partitioning position actually occurred (e.g., learned from the training data). Since a single turn is not enough to characterize a fishing activity, WSII’s strategy of analyzing a larger window is more robust in learning such behavior.
In this experiment, we compare five trajectory segmentation algorithms on the Atlantic hurricane dataset: CBSMoT, SPD, GRASPUTS, OWS with the random walk kernel, and our proposed trajectory segmentation algorithm (WSII). Figure 3.b shows that WSII performed better than all other algorithms. A Mann-Whitney U test indicated that WSII produces a statistically significantly higher median harmonic mean (94.68, with an associated dispersion of 2.23) for trajectory segmentation compared to OWS with the random walk kernel (85.67, with an associated dispersion of 0.59; p = 9.1e-05).
In this experiment, we applied all the segmentation algorithms to a subset of Geolife containing ten different users. Each user's trajectory forms a fold, and each time we use one fold to tune our algorithm. Figure 3.c depicts the results of this experiment. Moreover, a Mann-Whitney U test supports the claim that WSII produces a statistically significantly higher median harmonic mean (92.8, with an associated dispersion of 2.11) for trajectory segmentation compared to OWS with the random walk kernel (88.94, with an associated dispersion of 5.06; p = 0.00065).
In the Geolife dataset, there are two major types of movement: (1) fast movements of buses, trains, and cars; and (2) slow movements of walk and bike, which have a random nature. The selection of a random walk seems to be a reasonable decision in this dataset because as long as a moving object moves slowly, the random walk kernel seems to reproduce the random nature of the movement. On the other hand, for the moving object that travels fast, the behavior of random walk is similar to a linear interpolation kernel in terms of direction because the direction variation decreases.
6 Conclusions
In this paper, we presented a supervised method for trajectory segmentation named Wise Sliding Window Segmentation (WSII), which uses a trained model for deciding where partitioning positions should be placed. With the majority voting strategy, the method becomes more robust to noisy points and avoids unnecessary partitions. The experimental results show that WSII achieves better performance in terms of the harmonic mean of purity and coverage when compared with state-of-the-art trajectory segmentation algorithms on three datasets from different domains. One limitation of WSII, which is a limitation of all supervised learning methods, is that several domains do not have a labeled dataset from which the patterns of movement behavior change can be learned. Although there are tools in the literature that encourage and assist the user in the process of labeling trajectory data [13], most trajectory datasets still do not provide any type of ground truth for validating supervised methods. As future work, we would like to test how this algorithm performs with different sample sizes for training, i.e., how large the labeled data needs to be to obtain good results.
Acknowledgments
This work was financed by the Brazilian Agencies CNPq, CAPES (Project Big Data Analytics [CAPES/PRINT process number 88887.310782/201800]), FAPESC (Project Match co-financing of H2020 Projects, Grant 2018TR 1266); the European Union's Horizon 2020 research and innovation programme under Grant Agreement 777695 (MASTER1); and the Natural Sciences and Engineering Research Council of Canada (NSERC).
References
[1] (2001) Random forests. Machine Learning 45 (1), pp. 5–32.
[2] (2020) Uncovering vessel movement patterns from AIS data with graph evolution analysis. 2020.
[3] (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, Vol. 96, pp. 226–231.
[4] (2019) A trajectory segmentation algorithm based on interpolation-based change detection strategies. In EDBT/ICDT Workshops.
[5] (2018) Predicting transportation modes of GPS trajectories using feature engineering and noise removal. In Canadian Conference on Artificial Intelligence, pp. 259–264.
[6] (2017) POI2Vec: geographical latent representation for predicting future visitors. In Thirty-First AAAI Conference on Artificial Intelligence.
[7] (2017) ANALYTiC: an active learning system for trajectory classification. IEEE Computer Graphics and Applications 37 (5), pp. 28–39.
[8] (2018) A semi-supervised approach for the semantic segmentation of trajectories. In 19th IEEE International Conference on Mobile Data Management (MDM), pp. 145–154.
[9] (2007) Trajectory clustering: a partition-and-group framework. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pp. 593–604.
[10] (2013) Warped k-means: an algorithm to cluster sequentially-distributed data. Information Sciences 237, pp. 196–210.
[11] (1967) Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 281–297.
[12] (2008) A clustering-based approach for discovering interesting places in trajectories. In Proceedings of the 2008 ACM Symposium on Applied Computing, pp. 863–868.
[13] (2019) VISTA: a visual analytics platform for semantic annotation of trajectories. In EDBT, pp. 570–573.
[14] (2019) CRISIS: integrating AIS and ocean data streams using semantic web standards for event detection. International Conference on Military Communications and Information Systems.
[15] (2015) GRASP-UTS: an algorithm for unsupervised trajectory segmentation. International Journal of Geographical Information Science 29 (1), pp. 46–68.
[16] (2019) A network abstraction of multi-vessel trajectory data for detecting anomalies. In EDBT/ICDT Workshops.
[17] (2011) Recommending friends and locations based on individual location history. ACM Transactions on the Web (TWEB) 5 (1), pp. 5.