1 Introduction
In the last two decades, there has been a growing interest in analyzing driving data and understanding driving behavior, with researchers and practitioners finding new applications for this type of data. In the car insurance sector, measuring how a client drives is a cornerstone of the new usage-based insurance (UBI) schemes, which provide more customized pricing by taking into account driving behavior instead of external proxies such as sex and years of driving experience. In fleet management and fuel consumption optimization, studying the relationship between driving behavior and fuel consumption can improve driving performance and reduce costs. Regulators and policy makers can also leverage driving data to understand which factors are associated with accidents and improve road safety with better regulation. Finally, analyzing how self-driving cars perform can help developers understand what is working correctly in the autonomous system and which areas need to be improved.
A common approach to gaining insights about driving performance from driving data is to analyze maneuvers. The rationale is that the set and frequency of the maneuvers performed during a trip, and the way they are executed, can provide relevant information about the behavior of the driver during that trip. So far, the driving data normally used in this task consists of high-frequency telematics (also called automobile sensor data), such as GPS location, velocity and acceleration, and video recordings.
In a previous article (silva_finding_2020), we argued that using time-series motif detection algorithms to extract maneuvers from high-frequency telematics was more adaptable to small fluctuations in the data than previous methods and had the advantage of not requiring labels, which are extremely time-consuming to collect. We also noted that analyzing maneuvers through motif detection in telematics data is a promising area of research that is yet to be fully explored.
Recently, jain_masa_2019 proposed a general method to discover motifs in noisy time series and, in one of their case studies, concluded that their method was capable of identifying turn maneuvers from automobile sensor data. This work further validates our claim that motifs extracted from driving sensor data are closely related to the actual maneuvers performed.
In this paper, we expand the work done in (silva_finding_2020) by proposing TripMD, a complete motif extraction and exploration system tailored to the task of analyzing maneuvers and driving behaviors. Other authors have looked into the task of maneuver detection using time-series motifs (schwarz_time_2017; jain_masa_2019); however, none of these works proposes a full system that extracts motifs from automobile sensor data and summarizes the information in a space-efficient visualization.
In particular, our main contributions are threefold:

We present TripMD, a motif detection and summarization system that was designed to extract relevant driving patterns from a set of trips. This is the first system that not only extracts but also summarizes the main motifs of the provided trips, which allows for an easy investigation of the maneuvers being performed.

Using the UAH-DriveSet naturalistic driving dataset (romera_need_2016), we apply our system to the trips performed by a single driver and show that it is capable of extracting a rich set of driving patterns. We also show that these patterns can be used to distinguish between three different driving behaviors of that driver.

We demonstrate that, using the patterns extracted by TripMD, we can identify the driving behavior of an unknown driver from a group of drivers whose behavior we know. In other words, the association between driving patterns and driving behavior achieved with TripMD can generalize to unseen drivers. This second investigation was also done with the UAH-DriveSet dataset (romera_need_2016).
The rest of the paper is organized as follows. In Section 2, we provide an overview of time-series motifs and motif detection algorithms, which will be helpful to understand TripMD. In Section 3, we describe TripMD in detail. Section 4 presents two experiments in which we showcase our system and demonstrate its usefulness. Finally, Section 5 concludes this work and introduces some ideas for future work.
2 Preliminaries
In simple terms, a time-series motif is a repeated pattern in the time series that carries information about the underlying process that generated it. Based on this general definition, there are two main ways of defining how relevant a repeated pattern is, namely based on support or based on similarity (mueen_time_2014). In the support-based definition, the most relevant pattern is the one with the highest number of repetitions, while, in the similarity-based definition, the most relevant pattern is the one whose repetitions are most similar to each other. Therefore, the support-based definition extracts more frequent patterns and the similarity-based definition extracts more similar patterns.
There are two additional constraints that a pattern needs to meet to be considered a time-series motif (lin_finding_2002). Firstly, two subsequences that belong to the same motif cannot overlap in time. This non-overlapping constraint avoids trivial matches. Secondly, two subsequences need to be at a distance smaller than a predefined radius to be considered a match (and thus to belong to the same pattern).
Note that this second constraint is highly tailored to the use case. On the one hand, in most motif detection algorithms, the radius is a parameter that needs to be defined by the user. On the other hand, there are many distances that can be used to compute the similarity between subsequences (wang_experimental_2013; serra_empirical_2014), and the user must decide which distance is the most suitable for the specific use case.
A final constraint appears when the task is not to look for a single motif but for several motifs. In this case, based on the motif definition, it is possible to order motifs by their relevance and to extract the $K$ most important motifs, which are named the top-$K$ motifs. However, two motifs can only coexist in the list of top-$K$ motifs if their centers (the subsequences that best represent each motif) are at a distance higher than $2R$, where $R$ is the radius used to define the motifs' matches.
In terms of distance functions, the most used are the Euclidean distance (das_rule_1998) and the Dynamic Time Warping (DTW) distance (berndt_using_1994). The Euclidean distance performs a one-to-one comparison of points at the same time location and, because of this, it is very efficient. However, the sequences being compared need to have the same length (or be padded at the end), and the distance is not robust to time shifts, distortions or differences in phase. In contrast, the DTW distance is capable of dealing with variable-length sequences and other misalignments by finding an optimal time mapping between the sequences being compared. However, this flexibility comes at the cost of efficiency, which is a major concern when analyzing time-series data.
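To make this trade-off concrete, here is a minimal pure-Python sketch that compares both distances on two one-dimensional sequences containing the same bump, one shifted by a single sample. The function names `euclidean` and `dtw` are ours; the DTW is the textbook unconstrained dynamic program with a squared point-wise cost and a square root at the end, which is one common convention among several.

```python
import math


def euclidean(a, b):
    """Point-to-point comparison at identical time indices (equal lengths only)."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a, b):
    """Unconstrained dynamic time warping with squared point-wise cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # b[j-1] matched again
                                 cost[i][j - 1],      # a[i-1] matched again
                                 cost[i - 1][j - 1])  # both advance
    return math.sqrt(cost[n][m])


x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]       # same bump, one sample earlier
print(euclidean(x, y))          # 2.0: the shift is penalized
print(dtw(x, y))                # 0.0: warping realigns the bump
print(dtw(x, [0, 1, 2, 1, 0]))  # 0.0: lengths may differ
```

The nested loop over all `(i, j)` cells is what makes DTW quadratic in the sequence lengths, which is the efficiency cost mentioned above.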
Independently of the distance used, motif discovery is very computationally expensive, because it involves comparing all possible pairs of time-series subsequences. Thus, much of the work in motif detection has focused on making this search more efficient. The most used technique is to reduce the search space by converting the time series into a low-dimensional representation in which the true distance is approximately preserved. Motif candidates can then be pruned in this reduced space, and the final motifs are searched for among the remaining candidates. The Symbolic Aggregate approXimation (lin_experiencing_2007), or SAX, is the standard example of this technique. It starts by breaking the original time series into fixed-size sliding windows and then converts each window into a sequence of letters.
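The SAX recipe can be sketched in a few lines of standard-library Python. This is a minimal sketch that follows the usual formulation (z-normalize each window, average it into equal frames with Piecewise Aggregate Approximation, then map each frame mean to a letter using breakpoints that are equiprobable under a standard Gaussian); the function names are hypothetical, and the sketch assumes every window has at least `n_segments` points.

```python
from statistics import NormalDist, mean, pstdev


def znorm(seq):
    """Zero mean, unit variance; a flat window maps to all zeros."""
    mu, sigma = mean(seq), pstdev(seq)
    if sigma == 0:
        return [0.0] * len(seq)
    return [(v - mu) / sigma for v in seq]


def paa(seq, n_segments):
    """Piecewise Aggregate Approximation: mean of n_segments roughly equal frames."""
    n = len(seq)
    return [mean(seq[i * n // n_segments:(i + 1) * n // n_segments])
            for i in range(n_segments)]


def gaussian_breakpoints(alphabet_size):
    """Cut points giving each letter equal probability under N(0, 1)."""
    nd = NormalDist()
    return [nd.inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax_word(window, n_segments, alphabet_size):
    cuts = gaussian_breakpoints(alphabet_size)
    letters = "abcdefghijklmnopqrstuvwxyz"
    # letter index = number of breakpoints the frame mean lies above
    return "".join(letters[sum(1 for c in cuts if v > c)]
                   for v in paa(znorm(window), n_segments))


def sax_sliding(series, window, n_segments, alphabet_size):
    """One SAX word per sliding window, advancing one sample at a time."""
    return [sax_word(series[i:i + window], n_segments, alphabet_size)
            for i in range(len(series) - window + 1)]


print(sax_word([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4, alphabet_size=4))  # abcd
print(sax_sliding(list(range(10)), window=8, n_segments=4, alphabet_size=4))
# ['abcd', 'abcd', 'abcd']
```

Because every window is z-normalized on its own, a SAX letter describes the shape of a window relative to its own mean and spread, not its absolute level.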
Using the Matrix Profile (MP) (yeh_matrix_2016) is another common strategy to speed up the search for motifs. The MP is a meta time series that annotates the original time series with the distance to, and the index of, each fixed-size subsequence's nearest neighbor, excluding trivial matches. Note that the subsequence length is the only parameter of the method and the distance used is the (z-normalized) Euclidean distance. There are many fast implementations of the MP, both approximate and exact, and once the MP is computed, extracting similarity-based motifs is trivial.
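The definition of the MP can be spelled out with a brute-force loop. This is a minimal sketch of the quadratic definition only, not of the fast algorithms; the exclusion zone of `ceil(m / 2)` samples around each window is our choice (implementations use different widths), and the function names are hypothetical.

```python
import math


def znorm_dist(a, b):
    """z-normalized Euclidean distance between two equal-length windows."""
    def z(s):
        mu = sum(s) / len(s)
        sd = math.sqrt(sum((v - mu) ** 2 for v in s) / len(s))
        return [0.0] * len(s) if sd == 0 else [(v - mu) / sd for v in s]
    return math.dist(z(a), z(b))


def matrix_profile(series, m):
    """For every length-m window: distance to, and index of, its nearest
    neighbour outside the exclusion zone (which removes trivial matches)."""
    n_windows = len(series) - m + 1
    exclusion = math.ceil(m / 2)
    windows = [series[i:i + m] for i in range(n_windows)]
    profile, index = [], []
    for i in range(n_windows):
        best_d, best_j = math.inf, -1
        for j in range(n_windows):
            if abs(i - j) < exclusion:  # overlaps window i too much
                continue
            d = znorm_dist(windows[i], windows[j])
            if d < best_d:
                best_d, best_j = d, j
        profile.append(best_d)
        index.append(best_j)
    return profile, index


series = [0, 2, 5, 4, 1, 3, 1, 7, 6, 2, 1, 3, 1, 8, 5]
profile, index = matrix_profile(series, m=3)
# The top similarity-based motif is the pair at the profile's minimum.
best = min(range(len(profile)), key=profile.__getitem__)
print(best, index[best], profile[best])  # 4 10 0.0 (the repeated [1, 3, 1])
```

Published MP algorithms such as STOMP or SCRIMP++ compute the same profile far faster; the double loop here only makes the definition explicit.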
When working with telematics data, it is common to have data from several sensors, such as the accelerometer and the velocimeter. Even in the case of the accelerometer alone, one still has two distinct time series, namely the lateral and the longitudinal acceleration. Therefore, finding maneuvers in telematics data is a multidimensional problem and, as such, requires techniques for detecting multidimensional motifs.
In their work, tanaka_discovery_2005 suggested applying a dimensionality reduction technique to reduce the multidimensional time series to a single dimension, which simplifies the problem back to the one-dimensional case. This is a smart approach, as it can easily leverage all the existing motif detection algorithms. However, it has the drawback of information loss. If the time-series data contains relevant information in more than one dimension at the same time (which is our case when using acceleration data), then this approach cannot capture all the relevant motifs.
Another technique widely used in the multidimensional setting is to apply a one-dimensional motif detection algorithm to each dimension independently and then look for co-occurrences to extract the multidimensional motifs (minnen_detecting_2007; vahdatpour_toward_2009; balasubramanian_discovering_2016; liu_multidimensional_2017). This setup is much more accurate (since there is no information loss) and more flexible (since the search for co-occurrences can allow for asynchronous motifs and the rejection of uninformative dimensions). However, it is more computationally expensive.
Instead of using a two-step approach, other authors search for the multidimensional motifs directly by concatenating all the dimensions. For instance, minnen_detecting_2007 compute the SAX representation of each dimension and concatenate the resulting strings, while in (yeh_matrix_2017) the authors define a new Matrix Profile, the k-dimensional MP, which encodes the distance to each subsequence's closest neighbor, taking into account k dimensions.
When the goal is to analyze maneuvers, one needs to be able to extract variable-length motifs. In other words, because the same maneuver does not always take exactly the same time, it is important to have flexible methods that can extract motifs of different lengths. Even though fixed-length motifs have been explored the most so far, there are some algorithms for the variable-length case. Most authors propose applying a fixed-length algorithm over a range of window sizes and then choosing the most representative motifs according to a ranking scheme. The works of nunthanid_parameterfree_2012 and gao_exploring_2018 are two examples of this approach. Note, however, that this approach does not work for maneuvers, since a single motif cannot contain subsequences of different sizes. Lin's grammar-based method (lin_finding_2010) and Tanaka's EMD algorithm (tanaka_discovery_2005) take a different approach: they adapt the sequences' representation in the low-dimensional space to take variable-length patterns into consideration. However, this adaptation leads to algorithms that are not exact, which means that there is no guarantee that the method finds all the variable-length motifs in a given time series.
3 TripMD
TripMD is a system for extracting and analyzing maneuvers from trips performed by a single driver. To achieve this, we needed algorithms that work on multidimensional time series and, at the same time, are able to extract and analyze variable-length patterns. With this in mind, our solution has the following two components:

A motif extraction algorithm inspired by the algorithm of tanaka_discovery_2005, tailored and tuned for the maneuver detection use case. It includes a discrete variable-length representation (Variable SAX) based on the widely used SAX (lin_experiencing_2007) and an iterative pattern matching process that extracts motifs in multiple dimensions.

A motif clustering and visualization tool based on the Self-Organizing Map model that extracts the most relevant motif patterns and allows the user to quickly analyze them.
In our motif extraction algorithm, we use the support-based definition. In other words, for a certain Variable SAX (VSAX) pattern, its motif is the largest group of non-overlapping variable-length subsequences that share that VSAX representation and are all at a distance lower than a predefined radius from the motif's center. Because we are working with variable-length motifs, we use the Dynamic Time Warping (DTW) distance (berndt_using_1994) to measure the similarity between two multidimensional subsequences.
In the following subsections, we’ll go through each component in more detail.
3.1 Motif extraction
3.1.1 Variable SAX
Variable SAX (VSAX) is a time-series discretization method that transforms a time series into a sequence of symbols capturing the general behavior of the original time series. It serves two main purposes. Firstly, by discretizing the time series, it allows for a more efficient motif search and, at the same time, reduces the impact of small levels of noise. Secondly, it splits the time series into subsequences of variable length depending on the underlying behavior of the time series, which allows a simple pattern matching algorithm to find variable-length motifs.
Figure 1 illustrates the main steps using a one-dimensional time series. Initially, the time series is split into fixed-length sliding windows. The length of the window, the default letter size, is one of the parameters of VSAX. Then, the values in each window are averaged to obtain a single value for that window, which in turn is converted into a symbol based on a predefined segmentation of the time-series domain. After obtaining the fixed-size sequence of symbols, the pruning phase concatenates all consecutive windows with the same symbol into a single window. The rationale is that if two consecutive windows have similar behaviors (that is, they are transformed into the same symbol), then they should form a single window and be considered together when searching for motifs. Thus, in the end, we have a sequence of symbols that map to variable-length subsequences of the original time series and encode information about the general behavior of those subsequences.
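The four steps can be sketched for a single dimension as follows. This is a minimal, hypothetical sketch (`vsax_1d` is our name, not TripMD's API) that assumes the windows are consecutive and non-overlapping, that a trailing partial window is dropped, and that the breakpoints are passed in as a plain list; five regions map to the letters `a` to `e`.

```python
from itertools import groupby


def vsax_1d(series, letter_size, breakpoints, alphabet="abcde"):
    """Hypothetical single-dimension VSAX sketch.

    Returns (symbol, start, end) triples, end exclusive, one per
    variable-length subsequence left after pruning.
    """
    # Step 1: consecutive, non-overlapping windows of `letter_size` samples
    # (assumption: a trailing partial window is dropped).
    n_windows = len(series) // letter_size
    symbols = []
    for w in range(n_windows):
        window = series[w * letter_size:(w + 1) * letter_size]
        value = sum(window) / letter_size                  # Step 2: average
        region = sum(1 for b in breakpoints if value > b)
        symbols.append(alphabet[region])                   # Step 3: region -> symbol
    # Step 4 (pruning): merge runs of consecutive windows with the same symbol.
    pieces, start = [], 0
    for symbol, run in groupby(symbols):
        length = len(list(run)) * letter_size
        pieces.append((symbol, start, start + length))
        start += length
    return pieces


# 5 Hz signal with a 1-second default letter size -> 5 samples per window
signal = [0] * 5 + [4] * 10 + [0] * 5
print(vsax_1d(signal, letter_size=5, breakpoints=[-3, -1, 1, 3]))
# [('c', 0, 5), ('e', 5, 15), ('c', 15, 20)]
```

The two middle windows share the symbol `e`, so pruning turns them into one 10-sample subsequence, which is how variable lengths arise.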
The segmentation of the time-series domain is similar to the one used in SAX (lin_experiencing_2007). Breakpoints are determined based on the time series' values, and these breakpoints define regions of the time-series domain that map to specific symbols. In the SAX representation, breakpoints are defined so that all regions have equal probability under a Gaussian distribution. VSAX, however, uses specific percentiles of the time series' values, namely the 5th, 15th, 85th and 95th percentiles, to define five regions. In general, a driver spends less time performing maneuvers than driving without performing any maneuver and, thus, defining breakpoints that evenly distribute the time series' values among the regions does not lead to good results in the maneuver detection task. Additionally, since the percentiles are computed over all the trips, any two windows with the same symbol are guaranteed to be in the same region of the domain. This is another change compared to SAX, where the breakpoints are computed independently for each window.
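The percentile-based segmentation can be sketched as follows, assuming the values of one dimension are pooled across all trips before the percentiles are taken. The text names the percentiles but not the interpolation rule, so the linear-interpolation convention below is our assumption, and the function names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (the same convention as NumPy's default)."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def vsax_breakpoints(trips, percentiles=(5, 15, 85, 95)):
    """Pool one dimension's values over *all* trips, then cut at fixed
    percentiles, so a given symbol means the same value range in every trip."""
    pooled = sorted(v for trip in trips for v in trip)
    return [percentile(pooled, q) for q in percentiles]


trips = [list(range(0, 51)), list(range(51, 101))]  # 101 pooled values: 0..100
cuts = vsax_breakpoints(trips)
print(cuts)  # [5.0, 15.0, 85.0, 95.0]

# Samples per region: roughly 5% / 10% / 70% / 10% / 5%, so the wide central
# region absorbs ordinary "no maneuver" driving.
pooled = [v for t in trips for v in t]
shares = [sum(1 for v in pooled if sum(v > c for c in cuts) == k) for k in range(5)]
print(shares)  # [6, 10, 70, 10, 5]
```

With Gaussian equiprobable breakpoints, each of the five regions would instead receive about 20% of the samples, which is the even split the text argues against.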
Finally, for multidimensional time series, VSAX is applied separately to each one-dimensional time series, and the resulting symbols are concatenated into a tuple. For instance, a subsequence of a two-dimensional time series is mapped to a tuple of two symbols, one for each dimension. Note, however, that in the multidimensional case, the pruning phase is applied to all the dimensions at the same time. In other words, two consecutive subsequences are only merged if they have the same symbols in all the dimensions. Thus, in this case, each VSAX symbol corresponds to a single variable-length multidimensional subsequence of the original time series.
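The joint pruning rule can be sketched by symbolizing each dimension separately and then merging runs of identical symbol tuples. This is a minimal sketch with hypothetical names (`symbolize`, `vsax_multi`), using the same window assumptions as a one-dimensional VSAX: consecutive, non-overlapping windows and breakpoints supplied per dimension.

```python
from itertools import groupby


def symbolize(series, letter_size, breakpoints, alphabet="abcde"):
    """Per-window VSAX symbols for one dimension, before pruning."""
    out = []
    for w in range(len(series) // letter_size):
        value = sum(series[w * letter_size:(w + 1) * letter_size]) / letter_size
        out.append(alphabet[sum(1 for b in breakpoints if value > b)])
    return out


def vsax_multi(dimensions, letter_size, breakpoints_per_dim):
    """Tuple symbols with joint pruning: consecutive windows merge only when
    every dimension keeps the same symbol."""
    per_dim = [symbolize(s, letter_size, bps)
               for s, bps in zip(dimensions, breakpoints_per_dim)]
    tuples = list(zip(*per_dim))  # one (lateral, longitudinal, ...) tuple per window
    pieces, start = [], 0
    for symbol, run in groupby(tuples):
        length = len(list(run)) * letter_size
        pieces.append((symbol, start, start + length))
        start += length
    return pieces


lateral = [0, 0, 0, 0, 0, 0, 4, 4]
longitudinal = [0, 0, 4, 4, 4, 4, 4, 4]
cuts = [-3, -1, 1, 3]
for piece in vsax_multi([lateral, longitudinal], 2, [cuts, cuts]):
    print(piece)
# (('c', 'c'), 0, 2)
# (('c', 'e'), 2, 6)
# (('e', 'e'), 6, 8)
```

On its own, the lateral dimension would merge its first three windows, but the change in the longitudinal dimension after the first window splits that run in the joint representation.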
3.1.2 Motif search
The motif search is an iterative process that extracts the motifs of all possible sizes from a VSAX sequence. It was inspired by the motif detection algorithm proposed by tanaka_discovery_2005. At each iteration, it discovers all the motifs with a certain number of VSAX symbols (the pattern size) and then moves to the next iteration by increasing the pattern size by one. The minimum pattern size is a parameter of the method, and the iteration stops when no motifs with the current pattern size can be found.
Given a certain pattern size, the motif search applies three steps, which are summarized in Figure 2 with a concrete example. Firstly, the VSAX sequence is split into a list of pattern words of the given size. Then, for each unique pattern word, all the subsequences with that same pattern are extracted to form the pool of candidate motif members. Finally, if it exists, the motif related to those subsequences is computed and added to the list of motifs. Recall that a set of candidate subsequences can only be members of a motif under two conditions:

No two candidate subsequences overlap in time. This avoids the trivial matches discussed in the Preliminaries section.

All candidate subsequences are within a predefined radius of the motif’s center.
Thus, for each candidate subsequence, all the non-overlapping candidates that are within the motif radius $R$ of that candidate are extracted and stored. The final motif is the set of subsequences with the most members, and the motif's center is the candidate that generated that set of members. If no set with more than one member is found, then the motif for that specific pattern does not exist.
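The search described above can be sketched compactly. This is one possible reading, not TripMD's code: the VSAX sequence is given as `(symbol, start, end)` triples, members are collected greedily in candidate order (the text does not say how conflicts among mutually overlapping candidates are resolved), the radius test is strict, and DTW uses a Euclidean point cost on multidimensional samples. All names are hypothetical, and the demo uses a minimum pattern size of 2 for brevity.

```python
import math


def dtw(a, b):
    """DTW between sequences of equal-dimension tuples, Euclidean point cost."""
    prev = [0.0] + [math.inf] * len(b)
    for x in a:
        cur = [math.inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = math.dist(x, y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[-1]


def pattern_words(vsax, size):
    """Group (start, end) spans by the word of `size` consecutive VSAX symbols."""
    groups = {}
    for i in range(len(vsax) - size + 1):
        chunk = vsax[i:i + size]
        word = tuple(symbol for symbol, _, _ in chunk)
        groups.setdefault(word, []).append((chunk[0][1], chunk[-1][2]))
    return groups


def overlaps(a, b):
    return a[0] < b[1] and b[0] < a[1]


def best_motif(candidates, series, radius, dist):
    """Try each candidate as center; greedily add non-overlapping candidates
    closer than `radius`. Keep the largest set with at least two members."""
    best = None
    for center in candidates:
        members = [center]
        for other in candidates:
            if other == center or any(overlaps(other, m) for m in members):
                continue
            if dist(series[center[0]:center[1]], series[other[0]:other[1]]) < radius:
                members.append(other)
        if len(members) > 1 and (best is None or len(members) > len(best[1])):
            best = (center, members)
    return best


def motif_search(vsax, series, min_size, radius, dist=dtw):
    """Grow the pattern size until a size yields no motif."""
    motifs, size = [], min_size
    while True:
        found = []
        for word, candidates in pattern_words(vsax, size).items():
            motif = best_motif(candidates, series, radius, dist)
            if motif is not None:
                found.append((word, *motif))
        if not found:
            return motifs
        motifs.extend(found)
        size += 1


series = [(v,) for v in [0, 0, 3, 3, 0, 0, 0, 0, 3, 3, 0, 0]]
vsax = [("c", 0, 2), ("e", 2, 4), ("c", 4, 8), ("e", 8, 10), ("c", 10, 12)]
for word, center, members in motif_search(vsax, series, min_size=2, radius=1.0):
    print(word, center, members)
# ('c', 'e') (0, 4) [(0, 4), (4, 10)]
# ('e', 'c') (2, 8) [(2, 8), (8, 12)]
```

At size 3, the two `('c', 'e', 'c')` spans overlap in time, so no motif survives and the loop stops, matching the stopping rule in the text.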
3.2 Motif summarization
In a previous work (silva_exploring_2020), we proposed a new dimensionality reduction method to summarize and explore the outputs of any motif detection algorithm. The method, called DTW-SOM, is a vanilla Self-Organizing Map (kohonen_selforganizing_2001) with some adaptations to work with time-series motifs. It receives a list of variable-length multidimensional motifs and produces a clustering of the motifs' centers and a space-efficient visualization of the results.
TripMD leverages the DTW-SOM algorithm to group all the motifs found by the motif search process and to provide a visual summary of the most relevant motifs in the trips under analysis. By summarizing the extracted motifs, the user can quickly analyze the main patterns that were extracted and better interpret the maneuvers being performed.
However, there is an important step before applying this method. DTW-SOM includes two initialization routines: a random initialization, in which the DTW-SOM network is initialized with a random sample of the motifs, and an anchor initialization, in which the user provides a smaller set of the most relevant motifs. Since previous experiments indicated that the anchor initialization was more stable than the random initialization (silva_exploring_2020), TripMD uses the anchor initialization and thus includes a motif pruning step that computes the most relevant motifs, which are used as the anchors.
The pruning routine used in TripMD is based on the definition of top-$K$ motifs and the non-overlapping requirement discussed in Section 2. Given a natural ordering of all the extracted motifs, the $k$-th motif is the highest-ranking motif whose center is at a distance higher than $2R$ from the center of every $i$-th motif, for $1 \le i < k$. In this case, the ordering is defined by the MDL cost proposed by tanaka_discovery_2005. This score is based on the Minimum Description Length (MDL) principle (rissanen_stochastic_1998): the lower the score, the more relevant the motif. After pruning, the most relevant motifs are used to initialize the DTW-SOM, and all the motifs are fed into the algorithm.
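The greedy selection rule can be sketched as follows. This is a minimal sketch: `select_anchors` is a hypothetical name, the scores in the toy example are made-up placeholders (computing Tanaka's MDL cost is outside its scope), and the centers are scalars compared with the absolute difference, whereas in TripMD they would be multidimensional subsequences compared with DTW.

```python
def select_anchors(motifs, radius, dist):
    """motifs: (score, center) pairs, lower score = more relevant (e.g. an MDL
    cost). Walk them best-first and keep a motif only if its center is
    farther than 2 * radius from every center kept so far."""
    anchors = []
    for score, center in sorted(motifs, key=lambda m: m[0]):
        if all(dist(center, kept) > 2 * radius for _, kept in anchors):
            anchors.append((score, center))
    return anchors


def absdiff(a, b):
    return abs(a - b)


# Toy example: scalar "centers" with hypothetical relevance scores.
motifs = [(3.0, 0.0), (1.0, 10.0), (2.0, 10.5), (4.0, 5.0)]
kept = select_anchors(motifs, radius=1.0, dist=absdiff)
print([center for _, center in kept])  # [10.0, 0.0, 5.0]
```

The motif centered at 10.5 is dropped because it lies within $2R = 2$ of the better-ranked center at 10.0. For a true metric such as the Euclidean distance, the $2R$ gap guarantees that no subsequence can be within $R$ of two anchors at once; DTW does not satisfy the triangle inequality, so there this only holds approximately.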
TripMD also imposes a constraint on the distance computation of DTW-SOM. When the DTW distance searches for an optimal match between two subsequences, TripMD limits the maximum time warping allowed, which leads to higher distances between subsequences whose misalignment exceeds this threshold.
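The text does not name the exact form of the warping limit; a common choice is a Sakoe-Chiba band, which only allows alignments between samples at most `window` steps apart. The sketch below assumes that band (a swapped-in technique, not necessarily TripMD's), a one-dimensional absolute-difference cost, and a hypothetical function name; in TripMD's setting the window would be one second, that is, 5 samples at 5 Hz.

```python
import math


def dtw_window(a, b, window):
    """DTW with a Sakoe-Chiba band: cell (i, j) is allowed only if
    |i - j| <= window. The band is widened to |len(a) - len(b)| so that
    the end cell stays reachable."""
    n, m = len(a), len(b)
    w = max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [0, 0, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 0, 0]           # same peak, two samples later
print(dtw_window(a, b, window=10))  # 0.0: free warping absorbs the shift
print(dtw_window(a, b, window=1))   # 2.0: the shift exceeds the band
```

The constrained distance is higher exactly because the two-sample misalignment exceeds the one-sample band, which is the effect described above.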
3.3 Parameter estimation
So far, TripMD seems to have several parameters that a user needs to set beforehand. VSAX has the default letter size, the motif search has the radius $R$ used to define the motifs and the minimum pattern size that initializes the search, and DTW-SOM has the number of training epochs and the maximum warping window. However, tuning all these parameters requires time and expert knowledge, which is not user-friendly. Therefore, based on our experience with a real naturalistic driving dataset, we set sensible default values for these parameters:

default letter size: 1 second

minimum pattern size: 3 VSAX letters

motif radius: 0.5th percentile of the distances between all pairs of 3-second subsequences

number of epochs for DTW-SOM: 20

maximum warping for DTW-SOM: VSAX's default letter size (i.e., 1 second)
Thus, the only parameter that the user must provide is the sampling frequency (in Hz) of the input time series, which is trivial. TripMD then estimates all the remaining parameters. However, if the user's dataset has particularities that make the default parameters unreasonable, it is always possible to override the defaults provided by TripMD.
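Deriving the defaults from the sampling frequency can be sketched as below. Several details are our assumptions and are labeled as such: `default_params` and `estimate_radius` are hypothetical helpers, the 3-second windows are sampled without overlap by default, the percentile uses linear interpolation, and the toy radius example passes the Euclidean distance (`math.dist`) where TripMD would presumably use its own distance.

```python
import math
from itertools import combinations


def default_params(freq_hz):
    """Hypothetical helper turning the documented defaults into sample counts."""
    letter = round(freq_hz * 1.0)      # default letter size: 1 second
    return {
        "letter_size": letter,         # samples per VSAX window
        "min_pattern_size": 3,         # VSAX letters
        "radius_window": 3 * letter,   # 3-second subsequences for the radius
        "radius_percentile": 0.5,
        "som_epochs": 20,
        "max_warping": letter,         # equal to the default letter size
    }


def percentile(sorted_values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def estimate_radius(series, params, dist, step=None):
    """0.5th percentile of pairwise distances between 3-second subsequences.
    Assumption: windows are non-overlapping unless `step` says otherwise,
    which keeps the number of pairs manageable."""
    m = params["radius_window"]
    step = step or m
    windows = [series[i:i + m] for i in range(0, len(series) - m + 1, step)]
    dists = sorted(dist(a, b) for a, b in combinations(windows, 2))
    return percentile(dists, params["radius_percentile"])


params = default_params(5)  # e.g. acceleration downsampled to 5 Hz
print(params["letter_size"], params["radius_window"], params["max_warping"])  # 5 15 5

toy = [0, 0, 0, 0, 0, 0, 1, 1, 1, 5, 5, 5]
print(round(estimate_radius(toy, default_params(1), math.dist), 4))  # 0.0433
```

Tying the radius to a low percentile of the observed distances makes it scale with the units and noise level of each dataset instead of being a fixed number.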
4 Evaluation and discussion
To evaluate TripMD, we use the UAH-DriveSet (romera_need_2016), a publicly available naturalistic driving dataset including recorded trips from six different drivers who drove along two specific routes in Madrid, Spain. The authors asked the volunteers to drive along these two routes mimicking three different driving behaviors: normal, aggressive and drowsy. Using their DriveSafe app (bergasa_drivesafe:_2014; romera_realtime_2015), the authors collected raw data from the accelerometer, GPS and camera of a smartphone mounted in the car and processed these signals to enrich the final dataset.
In the first experiment, we pick a single driver and explore in detail the outputs obtained from TripMD. In particular, we perform an exploratory analysis of the motifs extracted by TripMD and showcase the visualizations provided by our system.
In the second experiment, we focus on the task of identifying driving behaviors. We apply TripMD to the entire UAH-DriveSet. Then, using the known driving behaviors of all but one driver, we assign behavior scores to each motif cluster. Finally, we use those cluster behavior scores and the motifs extracted from the left-out driver to predict the behavior in each of that driver's trips.
In both experiments, we use the two-dimensional time series of the lateral and longitudinal acceleration recordings. The recordings are already aligned with the correct car axes and denoised with a Kalman filter, which means we can use them directly. The data has a frequency of 10 Hz; however, to speed up computation and further reduce noise, we downsample the time series to 5 Hz. Additionally, we use all the default parameters of TripMD, as we found that they work well for this dataset.
The code to reproduce all the experiments is available in our repository: https://github.com/misilva73/tripMD.
4.1 Analyzing a single driver with TripMD
To showcase how our system can be used to explore the driving behavior of a single person, we run TripMD on the seven trips performed by one of the drivers in the UAH-DriveSet. This driver completed four trips on the secondary road route (two normal, one aggressive and one drowsy) and three trips on the motorway route (one for each driving behavior).
The motif detection component found 281 motifs and, of these, 17 were used to initialize the DTW-SOM model. In other words, the trips contained 17 significantly distinct driving patterns, and each of the other 264 motifs can be assigned to one of these patterns. DTW-SOM learns this assignment and provides a visualization of the clusters in a two-dimensional grid (or network) that preserves the local similarity of the data. This means that two neighboring clusters in the two-dimensional network are similar.
Figure 3 shows the first visualization provided by TripMD for the driver. It contains the lateral and longitudinal acceleration of each DTW-SOM unit, placed in the two-dimensional network. A unit here is a multidimensional subsequence that represents the cluster in a particular part of the DTW-SOM grid; thus, this plot provides a summary of the main driving patterns extracted from the driver's trips.
From this first chart, we can already see that TripMD is capable of identifying a rich set of driving patterns, with lengths ranging from 2 to 3 seconds. It includes simple maneuvers, such as unit 22, which corresponds to a simple left turn without changes in longitudinal acceleration, and more complex maneuvers, such as unit 0, which corresponds to a right turn with acceleration. Additionally, the grid preserves some structural similarity. As an example, the neighboring units 15, 16, 20 and 21 all have similar driving patterns, with a clear braking maneuver and a slightly positive lateral acceleration.
Figure 4 contains two additional visualizations provided by TripMD for the driver. These plots are classical ways of visualizing a SOM network and represent different information about each of the clusters arranged in the two-dimensional network. In both charts, the arrangement of the units in the two-dimensional grid is consistent with Figure 3.
The first chart is called U-Matrix and shows how similar each unit is to its direct neighbors in the two-dimensional network: the brighter the color, the closer a unit is to its neighbors. This visualization is helpful to understand where the major groups of clusters are within the network. For instance, the upper-right corner has a clearly defined group of four very similar clusters, which corresponds to units 15, 16, 20 and 21 with the braking maneuver discussed above.
The second chart is called Winner Matrix and provides information about the size of each unit's cluster. In particular, it displays the exact number of motifs in each cluster on top of the corresponding unit. This plot can be used to gauge how relevant each driving pattern is. For instance, unit 24 in the lower-right corner contains no motifs, which means that this pattern is not needed to summarize the driver's behavior.
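Both charts reduce to simple computations over the trained grid. The sketch below is a minimal illustration under stated assumptions: a rectangular grid with 4-connected neighbors (SOMs are often hexagonal instead), scalar prototypes with the absolute difference as the distance to keep the arithmetic visible (DTW-SOM prototypes are multidimensional subsequences compared with DTW), and hypothetical function names. Whether a small U-Matrix value is drawn bright or dark depends only on the colormap.

```python
def grid_neighbors(r, c, rows, cols):
    """4-connected neighbors of cell (r, c) on a rectangular grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc


def u_matrix(units, dist):
    """units[r][c] is the prototype at grid cell (r, c); each entry is the
    mean distance to its neighbors (small value = similar neighborhood).
    Assumes the grid has at least two cells."""
    rows, cols = len(units), len(units[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = list(grid_neighbors(r, c, rows, cols))
            u[r][c] = sum(dist(units[r][c], units[i][j]) for i, j in nbrs) / len(nbrs)
    return u


def winner_counts(units, items, dist):
    """How many items (e.g. motifs) have each unit as best-matching unit."""
    rows, cols = len(units), len(units[0])
    counts = [[0] * cols for _ in range(rows)]
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    for item in items:
        r, c = min(cells, key=lambda rc: dist(item, units[rc[0]][rc[1]]))
        counts[r][c] += 1
    return counts


def absdiff(a, b):
    return abs(a - b)


units = [[0.0, 1.0, 5.0]]  # a 1 x 3 grid of scalar prototypes
print(u_matrix(units, absdiff))                             # [[1.0, 2.5, 4.0]]
print(winner_counts(units, [0.1, 0.2, 4.9, 1.1], absdiff))  # [[2, 1, 1]]
```

A unit whose count is zero, like unit 24 in the Winner Matrix, is never the best match for any motif.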
Besides these default TripMD plots, Figure 5 contains information about the distribution of each driving behavior across the clusters extracted by our system. For each driving behavior, we compute the number of motif subsequences from the trips with that behavior that belong to each DTW-SOM cluster. Then, we divide each cluster count by the total number of motif subsequences, across all the driver's trips, that belong to that cluster to obtain the rate presented in the plots. So, for instance, 80% of the motif subsequences associated with motifs that belong to cluster 4 come from trips with drowsy behavior.
Interestingly, we can see that the three driving behaviors have very different distributions of their motif subsequences among the clusters. Clusters 11 and 17 have a clear majority of subsequences from normal trips, and these clusters relate to a "no maneuver" pattern and a soft brake, respectively.
The aggressive trips cover a higher variety of patterns, with 7 clusters showing a clear majority of subsequences from these trips. This increased representation is expected, as more motif subsequences are extracted from trips where the driver performs more maneuvers. Most of these 7 clusters contain sharp acceleration patterns, which are usually associated with aggressive driving. Examples are the right turn with a pronounced brake in unit 20, the brake-acceleration pattern in unit 6 and the quick acceleration in unit 1. These sharp acceleration maneuvers without lateral movements are especially telling, as they are associated with tailgating, which is a classical aggressive driving behavior.
Finally, the clusters with higher rates of subsequences coming from drowsy trips are located in the lower row of the DTW-SOM grid, excluding unit 24. Units 14 and 18 contain a drift pattern, which consists of two consecutive lateral movements in opposite directions. It is very interesting to see these patterns here, as they usually occur when a tired driver lets the car drift out of the lane and then quickly recovers with a sharp turn.
4.2 Identifying driving behavior with TripMD
From the first experiment, we see that TripMD can summarize the trips from a single driver so that different driving behaviors can be identified. However, to further test our system, we focus on a harder task, namely, identifying the driving behavior of an unknown driver from a set of drivers whose behavior we know.
To accomplish this, we apply TripMD to the entire UAH-DriveSet and retrieve the main driving patterns of all its trips. Then, using the known driving behavior of five drivers (the train drivers), we derive scores for all the trips of the remaining driver (the test driver). The test driver is the same driver analyzed in the first experiment. For each trip of the test driver, a single score is computed for each of the three behaviors: normal, aggressive and drowsy. To compute the score for a specific test trip and a given behavior $b$, we use the following process:

For each DTW-SOM cluster $c$:

Compute the rate $r_{c,b} = n_{c,b} / N_c$, where $n_{c,b}$ is the number of subsequences in cluster $c$ that belong to train trips with behavior $b$ and $N_c$ is the total number of subsequences in cluster $c$ that belong to train trips.

Compute $m_c$, the number of motif subsequences in cluster $c$ that belong to the test trip.

Derive the behavior score of cluster $c$ as $s_{c,b} = r_{c,b} \, m_c$.

Compute the trip's behavior score as $S_b = \sum_{c=1}^{k} s_{c,b}$, where $k$ is the number of DTW-SOM clusters.
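The scoring steps above amount to $S_b = \sum_{c} (n_{c,b} / N_c)\, m_c$. A minimal sketch, assuming the per-cluster counts are already available as dictionaries; the function names are hypothetical, the counts in the example are made up, and skipping clusters that contain no train subsequences is our assumption, since the text does not cover that case.

```python
def behavior_scores(train_counts, test_counts, behaviors):
    """train_counts[c][b]: train subsequences of behavior b in cluster c (n_cb).
    test_counts[c]: motif subsequences of the test trip in cluster c (m_c).
    Returns S_b = sum over clusters of (n_cb / N_c) * m_c, N_c = sum_b n_cb."""
    scores = {b: 0.0 for b in behaviors}
    for c, m_c in test_counts.items():
        n_c = train_counts.get(c, {})
        total = sum(n_c.values())  # N_c
        if total == 0:
            continue  # assumption: clusters without train data contribute nothing
        for b in behaviors:
            scores[b] += n_c.get(b, 0) / total * m_c  # r_cb * m_c
    return scores


def predict(scores):
    """Predicted behavior = the one with the highest score."""
    return max(scores, key=scores.get)


behaviors = ("normal", "aggressive", "drowsy")
train = {0: {"normal": 8, "aggressive": 2}, 1: {"drowsy": 3, "normal": 1}}
test = {0: 5, 1: 4}  # made-up counts for one test trip
scores = behavior_scores(train, test, behaviors)
print({b: round(s, 2) for b, s in scores.items()})
# {'normal': 5.0, 'aggressive': 1.0, 'drowsy': 3.0}
print(predict(scores))  # normal
```

Because the rates of each cluster sum to one across behaviors, a trip's three scores add up to its total number of motif subsequences (9 here), which is why each row of Table 1 adds up to a whole number, up to rounding.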
After computing the three behavior scores for a test trip, its predicted behavior is simply the behavior with the highest score. Finally, we compare the predicted behavior of each of the test driver's trips with the real behavior performed in that trip. Table 1 summarizes the results.
Route | Behavior | Aggressive score | Drowsy score | Normal score
Motorway | Normal | 94.1 | 78.3 | 95.5
Secondary | Normal | 35.5 | 41.1 | 43.3
Secondary | Normal | 47.0 | 35.9 | 48.1
Motorway | Aggressive | 103.1 | 72.6 | 96.3
Secondary | Aggressive | 91.0 | 52.2 | 77.8
Motorway | Drowsy | 97.5 | 137.9 | 120.6
Secondary | Drowsy | 69.7 | 81.9 | 82.4
The test driver performed seven trips and, of these, TripMD assigns the correct behavior to all but one. The trip where the predicted behavior does not match the real behavior is the drowsy trip on the secondary road. However, looking at the scores, we see that the normal behavior score (the predicted behavior) is very close to the drowsy behavior score (the real behavior).
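The prediction rule can be checked directly against the reported numbers. This short sketch reads the three score columns of Table 1 as the aggressive, drowsy and normal scores (the only column order under which the single error is the secondary-road drowsy trip, as described above) and applies the highest-score rule to each row.

```python
# Scores from Table 1, in the column order aggressive, drowsy, normal.
BEHAVIORS = ("aggressive", "drowsy", "normal")
TABLE_1 = [
    ("Motorway", "normal", (94.1, 78.3, 95.5)),
    ("Secondary", "normal", (35.5, 41.1, 43.3)),
    ("Secondary", "normal", (47.0, 35.9, 48.1)),
    ("Motorway", "aggressive", (103.1, 72.6, 96.3)),
    ("Secondary", "aggressive", (91.0, 52.2, 77.8)),
    ("Motorway", "drowsy", (97.5, 137.9, 120.6)),
    ("Secondary", "drowsy", (69.7, 81.9, 82.4)),
]

predictions = []
for route, real, scores in TABLE_1:
    best = scores.index(max(scores))  # highest score wins
    predictions.append((route, real, BEHAVIORS[best]))

hits = sum(real == pred for _, real, pred in predictions)
print(f"{hits} of {len(TABLE_1)} trips correct")  # 6 of 7 trips correct
for route, real, pred in predictions:
    if real != pred:
        print(route, real, "->", pred)  # Secondary drowsy -> normal
```

The misclassified trip loses by only 0.5 points (82.4 for normal against 81.9 for drowsy).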
To further illustrate these results, Figure 6 shows the behavior rates of the DTW-SOM clusters for the train drivers and the test driver. Here we can observe that the behavior distributions of the train drivers are close to the behavior distributions of the test driver. For instance, unit 14 has a high normal behavior rate in both sets of trips, units 13, 25 and 32 have similarly high aggressive behavior rates, and units 2, 8 and 21 have high drowsy behavior rates for both the test and train drivers.
5 Conclusion
In this paper, we propose a system called TripMD, which identifies the main driving patterns from sensor recordings such as acceleration and velocity. The system is made of two components. The first is a motif detection algorithm tailored to the task of maneuver detection, which extracts variable-length patterns from the original trip recordings. The second is a motif clustering and visualization model that summarizes the motifs discovered by the first component, so that the user can quickly understand the main driving patterns performed in the original trips.
Compared to previous work, our system not only extracts the time-series patterns present in trip recordings but also summarizes those patterns in a space-efficient visualization that allows for easy investigation. This feature is highlighted in our first experiment, where we demonstrate that TripMD can discover a wide range of driving patterns from the trips performed by a single driver of the UAH-DriveSet dataset. We also conclude that the three driving behaviors marked in the dataset (normal, aggressive and drowsy) have distinct distributions among the extracted driving patterns. For instance, the aggressive behavior was more prevalent in sharp acceleration patterns and the drowsy behavior was more frequent in two patterns very similar to a drift maneuver. This further supports the usefulness and consistency of the driving patterns provided by TripMD.
We also show that TripMD can determine the driving behavior of a driver from the behaviors of other drivers. In particular, we (1) apply our system to all the trips in the UAH-DriveSet (which contains trips from six drivers), (2) compute behavior rates for each driving pattern extracted by TripMD using the known behavior of five drivers and (3) derive behavior scores for the remaining driver, which are used to predict his behavior. Of the seven trips performed by the test driver, we correctly identify the driving behavior of six.
Even though the results seem promising, there are still areas for improvement. Firstly, because of the Variable SAX discretization, the motif detection algorithm used in TripMD is not exact. This means that we cannot guarantee that all the variable-length motifs are extracted. There are some recent motif detection algorithms that claim to be exact; however, we could not find one capable of extracting motifs whose member subsequences have different sizes. Thus, investigating exact motif detection algorithms that work with variable-length motifs could be an interesting line of improvement.
Additionally, we should further test TripMD on more datasets and different tasks, such as understanding whether TripMD can distinguish drivers with prior accidents from drivers without accidents and inform car insurance pricing models. Testing TripMD on different datasets would also help to further validate and fine-tune the default parameters of the system.
Acknowledgments
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.