
TripMD: Driving patterns investigation via Motif Analysis

by Maria Inês Silva, et al.

Processing driving data and investigating driving behavior have been receiving increasing interest in the last decades, with applications ranging from car insurance pricing to policy making. A common strategy to analyze driving behavior is to study the maneuvers being performed by the driver. In this paper, we propose TripMD, a system that extracts the most relevant driving patterns from sensor recordings (such as acceleration) and provides a visualization that allows for an easy investigation. Additionally, we test our system using the UAH-DriveSet dataset, a publicly available naturalistic driving dataset. We show that (1) our system can extract a rich set of driving patterns from a single driver that are meaningful to understand driving behaviors and (2) our system can be used to identify the driving behavior of an unknown driver from a set of drivers whose behavior we know.


1 Introduction

In the last two decades, there has been a growing interest in analyzing driving data and understanding driving behavior, with researchers and practitioners finding new applications for this type of data. In the car insurance sector, measuring how a client drives is a cornerstone of the new usage-based insurance (UBI) schemes, which provide more customized pricing by taking into account driving behavior instead of external proxies such as sex and years of driving experience. In fleet management and fuel consumption optimization, studying the relationship between driving behavior and fuel consumption can improve driving performance and reduce costs. Regulators and policy makers can also leverage driving data to understand which factors are associated with accidents and improve road safety with better regulation. Analyzing how self-driving cars perform can help developers understand what is working correctly with the autonomous system and which areas need to be improved.

A common approach to get insights about driving performance from driving data is to analyze maneuvers. The rationale is that the set and frequency of the maneuvers performed during a trip and the way they are executed can provide relevant information about the driving behavior of the driver during the trip. So far, the driving data normally used in this task is high-frequency telematics (also called automobile sensor data), such as GPS location, velocity and acceleration, and video recordings.

In a previous article (silva_finding_2020), we argued that using time-series motif detection algorithms to extract maneuvers from high-frequency telematics was more adaptable to small fluctuations in the data than previous methods and had the advantage of not requiring labels, which are extremely time-consuming to collect. We also noted that analyzing maneuvers through motif detection in telematics data is a promising area of research that is yet to be fully explored.

Recently, jain_masa_2019 proposed a general method to discover motifs in noisy time-series and, in one of their case-studies, they concluded that their method was capable of identifying turn maneuvers from automobile sensor data. This work further validates our claim that motifs extracted from driving sensor data are highly related to the actual maneuvers performed.

In this paper, we expand the work done in (silva_finding_2020) by proposing TripMD, a complete motif extraction and exploration system that is tailored for the task of analyzing maneuvers and driving behaviors. Other authors have looked into the task of maneuver detection using time-series motifs (schwarz_time_2017; jain_masa_2019). However, none of these works proposes a full system that extracts motifs from automobile sensor data and summarizes the information in a space-efficient visualization.

Particularly, our main contributions are threefold:

  • We present TripMD, a motif detection and summarization system that was designed to extract relevant driving patterns from a set of trips. This is the first system that not only extracts but also summarizes the main motifs of the provided trips, which allows for an easy investigation of the maneuvers being performed.

  • Using the UAH-DriveSet naturalistic driving dataset (romera_need_2016), we apply our system to the trips performed by a single driver and show that it is capable of extracting a rich set of driving patterns. We also show that these patterns can be used to distinguish between three different driving behaviors of the driver.

  • We demonstrate that using the patterns extracted by TripMD, we are capable of identifying the driving behavior of an unknown driver from a group of drivers whose behavior we know. In other words, the association between driving patterns and driving behavior achieved with TripMD can generalize to unseen drivers. This second investigation was also done with the UAH-DriveSet dataset (romera_need_2016).

The rest of the paper is organized as follows. In Section 2, we provide an overview of time-series motifs and motif detection algorithms, which will be helpful to understand TripMD. In Section 3, we describe TripMD in detail. Section 4 is reserved for two experiments where we showcase our system and demonstrate its usefulness. Finally, Section 5 concludes this work and introduces some ideas for future work.

2 Preliminaries

In simple terms, a time-series motif is a repeated pattern in the time-series that carries information about the underlying process that generated the time-series. Based on this general definition, there are two main ways of defining how relevant a repeated pattern is, namely based on support or based on similarity (mueen_time_2014). In the support-based definition, the most relevant pattern is the one with the highest number of repetitions, while, in the similarity-based definition, the most relevant pattern is the one whose repetitions are the most similar to each other. Therefore, the support-based definition extracts more frequent patterns and the similarity-based definition extracts more similar patterns.

There are two additional constraints that a pattern needs to meet to be considered a time-series motif (lin_finding_2002). Firstly, two subsequences that belong to the same motif cannot overlap in time. This non-overlapping constraint is set to avoid trivial matchings. Secondly, two subsequences need to be at a distance smaller than a predefined radius to be considered a match (and thus to belong to the same pattern).

Note that this second constraint is highly tailored to the use-case. On the one hand, in most motif detection algorithms, the radius is a parameter that needs to be defined by the user. On the other hand, there are many distances that can be used to compute similarity between subsequences (wang_experimental_2013; serra_empirical_2014) and the user must decide which distance is the most suitable to the specific use-case.

The final constraint appears when the task is to look for more than one motif rather than a single one. In this case, based on the motif definition, it is possible to order motifs by their relevance and to extract the top-$K$ most important motifs, which are named the $K$-motifs. However, any two motifs can only coexist in the list of $K$-motifs if their centers (the subsequences that best represent each motif) have a distance higher than $2R$, where $R$ is the radius used to define the motif’s matches.

In terms of the distance functions, the most used are the Euclidean distance (das_rule_1998) and the Dynamic Time Warping (DTW) distance (berndt_using_1994). The Euclidean distance performs a one-to-one comparison of single points from the same time location and, because of this, it is very efficient. However, the sequences being compared need to have the same size (or be padded at the end) and the distance is not robust to time-shifts, distortions or differences in phase. On the contrary, the DTW distance is capable of dealing with variable-length sequences and other misalignments by finding an optimal time mapping between the sequences that are being compared. However, this flexibility comes at the cost of efficiency, which is a major concern when analyzing time-series data.
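The contrast between the two distances can be sketched in a few lines. This is a minimal, unoptimized illustration and not the implementation used in TripMD; the function names and the absolute-difference point cost are assumptions made for the example.

```python
import numpy as np

def euclidean(a, b):
    """One-to-one comparison; both sequences must have the same length."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("Euclidean distance needs equal-length sequences")
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    """Textbook O(len(a) * len(b)) DTW with an absolute-difference point cost."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor cells
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]  # same bump as x, one step earlier
```

Here `euclidean(x, y)` is 2.0 because the bump is compared point by point, while `dtw(x, y)` is 0.0 because the warping path absorbs the shift. `dtw` also accepts sequences of different lengths, which `euclidean` rejects.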

Independently of the distance used, because motif discovery is a task that involves comparing all possible pairs of time-series subsequences, it is very computationally expensive. Thus, a lot of work in motif detection has been focused on making this search more efficient. The most used technique is to reduce the search space by converting the time-series into a low-dimensional representation where the true distance is approximately maintained. Then, we can prune motif candidates in this reduced space and search for the final motifs in the reduced group of candidates. The Symbolic Aggregate approXimation (lin_experiencing_2007), or SAX, is the standard example of this technique. It starts by breaking the original time-series into fixed-sized sliding windows and then converts each window into a sequence of letters.
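As a rough illustration of the SAX idea, the sketch below z-normalizes a window, averages it into a few equal-sized segments and maps each average to a letter using equiprobable Gaussian break-points. The function names, the four-letter alphabet and the requirement that the window length be divisible by the number of segments are assumptions of this example.

```python
from statistics import NormalDist
import numpy as np

def sax_word(window, n_segments, alphabet="abcd"):
    """Convert one window into a SAX word of n_segments letters."""
    window = np.asarray(window, dtype=float)
    std = window.std()
    # z-normalize; a constant window is mapped to all zeros
    z = (window - window.mean()) / std if std > 0 else np.zeros_like(window)
    # piecewise aggregate approximation: mean of each equal-sized chunk
    # (assumes len(window) is divisible by n_segments)
    paa = z.reshape(n_segments, -1).mean(axis=1)
    # break-points that split the standard normal into equiprobable regions
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / a) for i in range(1, a)]
    idx = np.searchsorted(breakpoints, paa)
    return "".join(alphabet[i] for i in idx)

def sax_sliding(series, window, n_segments):
    """One SAX word per sliding window (stride of one sample)."""
    return [sax_word(series[i:i + window], n_segments)
            for i in range(len(series) - window + 1)]
```

For instance, `sax_word([0, 0, 0, 0, 5, 5, 5, 5], 4)` gives `"aadd"`: the low half of the window falls in the lowest region and the high half in the highest one.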

Using the Matrix Profile (or MP) (yeh_matrix_2016) is another commonly used strategy to speed up the search for the motifs. The MP is a meta time-series that annotates the original time-series by providing the distance to, and the index of, each fixed-size subsequence’s nearest neighbor, excluding trivial matches. Note that the size of the subsequence is the only parameter of the method and the distance used is the Euclidean distance. There are many efficient and fast implementations of the MP, either approximate or exact, and after the MP is computed, extracting similarity-based motifs is trivial.
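To make the definition concrete, here is a brute-force sketch of a matrix profile. It is only meant to show what the two output arrays contain; real implementations are far faster. The z-normalization of each subsequence and the exclusion zone of half the subsequence length are common conventions assumed for this example, not details taken from the text above.

```python
import numpy as np

def naive_matrix_profile(series, m):
    """Return (profile, index): nearest-neighbor distance and position
    for every length-m subsequence, ignoring trivial matches."""
    series = np.asarray(series, dtype=float)
    n_sub = len(series) - m + 1
    subs = np.array([series[i:i + m] for i in range(n_sub)])
    # z-normalize each subsequence (constant ones become all zeros)
    mu = subs.mean(axis=1, keepdims=True)
    sd = subs.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0
    subs = (subs - mu) / sd
    excl = m // 2  # assumed exclusion zone around each position
    profile = np.full(n_sub, np.inf)
    index = np.full(n_sub, -1)
    for i in range(n_sub):
        d = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - excl), min(n_sub, i + excl + 1)
        d[lo:hi] = np.inf  # mask the subsequence itself and its close overlaps
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index

t = [0, 0, 1, 3, 1, 0, 0, 5, 2, 7, 0, 1, 3, 1, 0, 0]
profile, index = naive_matrix_profile(t, 4)
```

In this toy series the shape `1, 3, 1, 0` occurs at positions 2 and 11, so `index[2]` is 11 and `profile[2]` is 0.0. The lowest values of `profile` point at the best similarity-based motif.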

When working with telematic data, it is common to have data from several sensors such as the accelerometer and the velocimeter. Even in the case of the accelerometer, one still has two distinct time-series, namely the lateral and the longitudinal acceleration. Therefore, finding maneuvers in telematic data is a multidimensional problem and as such, we need to apply techniques for detecting multidimensional motifs.

In their work, tanaka_discovery_2005 suggested applying a dimensionality reduction technique in order to reduce the multidimensional time-series into a single dimension, which would simplify the problem back to the one-dimensional case. This is a smart approach as it can easily leverage all the existing motif detection algorithms. However, it has the drawback of information loss. If the time-series data contains relevant information in more than one dimension at the same time (which is our case when using acceleration data), then we won’t be able to capture all the relevant motifs with this approach.

Another technique widely used in the multidimensional setting is to apply a one-dimensional motif detection algorithm in each dimension independently and then look for co-occurrences to extract the multidimensional motifs (minnen_detecting_2007; vahdatpour_toward_2009; balasubramanian_discovering_2016; liu_multi-dimensional_2017). This setup is much more accurate (since there is no information loss) and it is more flexible (since the search for co-occurrences can be done with an allowance for asynchronous motifs and the rejection of uninformative dimensions). However, this setup is more computationally expensive.

Instead of working with a two-step approach, other authors search for the multidimensional motifs directly by concatenating all the dimensions. For instance, minnen_detecting_2007 compute the SAX representation for each dimension and concatenate their strings, while in (yeh_matrix_2017) the authors define a new Matrix Profile, the k-dimensional MP, that encodes the distance to each sequence’s closest neighbor, taking into account k dimensions.

When the goal is to analyze maneuvers, one needs to be able to extract variable-length motifs. In other words, because the same maneuver does not always take the exact same time, it is important to have flexible methods that can extract motifs of different lengths. Even though fixed-length motifs have been the most explored so far, there are some algorithms for the variable-length case. Most authors propose to apply a fixed-length algorithm in a range of window sizes and then choose the most representative motifs based on their ranking scheme. The works of nunthanid_parameter-free_2012 and gao_exploring_2018 are two examples of this approach. Note, however, that this approach does not work for maneuvers since it cannot group sequences of different sizes into the same motif. Lin’s grammar-based method (lin_finding_2010) and Tanaka’s EMD algorithm (tanaka_discovery_2005) take a different approach. They adapt the sequences’ representation in the low-dimensional space in order to take into consideration variable-length patterns. However, this adaptation leads to algorithms that are not exact, which means that there is no guarantee that the method can find all the variable-length motifs in a given time-series.

3 TripMD

TripMD is a system for extracting and analyzing maneuvers from trips performed by a single driver. To achieve this, we needed algorithms that worked on multi-dimensional time-series and, at the same time, were able to extract and analyze variable-length patterns. Having this in mind, our solution has the following two components:

  1. A motif extraction algorithm inspired by the algorithm created by tanaka_discovery_2005, which was tailored and tuned for the maneuver detection use-case. It includes a discrete variable-length representation (variable SAX) based on the widely used SAX (lin_experiencing_2007) and an iterative pattern matching process that extracts motifs in multiple dimensions.

  2. A motif clustering and visualization tool based on the Self-Organizing Map model that extracts the most relevant motif patterns and permits the user to quickly analyze them.

In our motif extraction algorithm, we use the support-based definition. In other words, for a certain variable SAX (VSAX) pattern, its motif is the biggest group of non-overlapping variable-length subsequences with that VSAX representation and in which all the subsequences have a distance lower than a predefined radius to the motif’s center. Because we are working with variable-length motifs, we use the Dynamic Time Warping (DTW) distance (berndt_using_1994) to measure similarity between two multi-dimensional subsequences.

In the following subsections, we’ll go through each component in more detail.

3.1 Motif extraction

3.1.1 Variable SAX

Variable SAX (VSAX) is a time-series discretization method that transforms a time-series into a sequence of symbols that captures the general behavior of the original time-series. It serves two main purposes. Firstly, by providing a discretization of the time-series, it allows for a more efficient motif search and, at the same time, reduces the impact of small levels of noise. Secondly, it is capable of splitting the time-series into subsequences of variable lengths depending on the underlying behavior of the time-series, which allows a simple pattern matching algorithm to find variable-length motifs.

Figure 1: Simple example of the Variable SAX representation process.

Figure 1 illustrates the main steps using a one-dimensional time-series. Initially, the time-series is split into fixed-length sliding windows. The length of the window is one of the parameters of VSAX, the default letter size. Then, the values in each sliding window are averaged to obtain a discrete value for that window, which in turn is converted to a symbol based on a predefined segmentation of the time-series domain. After obtaining the fixed-sized sequence of symbols, the pruning phase concatenates all consecutive sliding windows with the same symbol into a single window. The rationale is that if two consecutive windows have similar behaviors (which translates into being transformed into the same symbol), then they should be a single window and be considered together when searching for motifs. Thus, in the end, we have a sequence of symbols that map to variable-length subsequences of the original time-series and that encode information about the general behavior of the subsequences.

The segmentation of the time-series domain is similar to the way it is done in SAX (lin_experiencing_2007). Break-points are determined based on the time-series’ values and these break-points define regions in the time-series domain that map to specific symbols. In the SAX representation, break-points are defined so that all regions have equal probability under a Gaussian distribution. However, VSAX uses specific percentiles of the time-series’ values, namely the 5th, 15th, 85th and 95th percentiles, to define five regions. In general, a driver spends less time performing maneuvers than not performing any and, thus, defining break-points that evenly distribute time-series’ values among the regions does not lead to good results in the maneuver detection task. Additionally, since the percentiles are computed over all the trips, any two windows with the same symbol will be guaranteed to be in the same domain region. This is another change compared to SAX, where the break-points are computed independently for each window.
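The one-dimensional VSAX steps described above (windowing, averaging, percentile-based symbols and pruning) can be sketched as follows. This is an illustrative reading of the description and not the authors' code: the function names are hypothetical, and the windows are assumed to be consecutive and non-overlapping, which the text does not state explicitly.

```python
import numpy as np

def percentile_breakpoints(trips):
    """Break-points shared by all trips: 5th, 15th, 85th and 95th percentiles."""
    values = np.concatenate([np.asarray(t, dtype=float) for t in trips])
    return np.percentile(values, [5, 15, 85, 95])

def vsax_1d(series, letter_size, breakpoints, alphabet="abcde"):
    """Return (symbol, start, end) segments; `end` is exclusive."""
    series = np.asarray(series, dtype=float)
    segments = []
    # assumed: consecutive, non-overlapping windows of letter_size samples
    for start in range(0, len(series) - letter_size + 1, letter_size):
        end = start + letter_size
        mean = series[start:end].mean()
        symbol = alphabet[int(np.searchsorted(breakpoints, mean))]
        if segments and segments[-1][0] == symbol:
            # pruning phase: merge with the previous window of the same symbol
            segments[-1] = (symbol, segments[-1][1], end)
        else:
            segments.append((symbol, start, end))
    return segments

segments = vsax_1d([0, 0, 0, 0, 1.5, 1.5, 3, 3, 3, 3, 0, 0], 2, [-2, -1, 1, 2])
```

With the hand-picked break-points of the example, the six windows collapse into four variable-length segments: `('c', 0, 4)`, `('d', 4, 6)`, `('e', 6, 10)` and `('c', 10, 12)`.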

Finally, for multi-dimensional time-series, VSAX can be applied separately to each one-dimensional time-series, with the resulting symbols then concatenated in a tuple. For instance, a subsequence of a two-dimensional time-series would be mapped to a tuple of two symbols, one for each dimension. Note however that in the multi-dimensional case, the pruning phase is applied in all the dimensions at the same time. In other words, two consecutive subsequences are only merged if they have the same symbols in all the dimensions. Thus, in this case, each VSAX symbol corresponds to a single variable-length multi-dimensional subsequence of the original time-series.
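The joint pruning rule can be illustrated with a short sketch that takes the per-window symbols of each dimension and merges consecutive windows only when the whole tuple repeats. The function name and the input layout (one string of symbols per dimension) are assumptions of this example.

```python
from itertools import groupby

def merge_multidim(symbols_per_dim, letter_size):
    """Merge consecutive windows whose symbols agree in every dimension.

    symbols_per_dim: one equal-length sequence of symbols per dimension.
    Returns (symbol_tuple, start, end) segments; `end` is exclusive.
    """
    tuples = list(zip(*symbols_per_dim))  # one tuple of symbols per window
    segments, start = [], 0
    for symbol, run in groupby(tuples):
        length = len(list(run)) * letter_size
        segments.append((symbol, start, start + length))
        start += length
    return segments

lateral = "ccdde"
longitudinal = "cdddc"
segments = merge_multidim([lateral, longitudinal], letter_size=5)
```

In this example only the third and fourth windows share the same pair `('d', 'd')`, so they are the only ones merged, even though the lateral dimension alone would also merge the first two.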

3.1.2 Motif search

The motif search is an iterative process that extracts the motifs of all possible sizes from a VSAX sequence. It was inspired by the motif detection algorithm proposed by tanaka_discovery_2005. At each iteration, it discovers all the motifs with a certain number of VSAX symbols (the pattern size) and then moves to the next iteration by increasing the pattern size by one. The minimum pattern size is a parameter of the method and the iteration stops when no more motifs with the current pattern size can be found.

Figure 2: Simple example of the motif search process for a single pattern word BC.

Given a certain pattern size, the motif search applies three steps, which are summarized in Figure 2 with a concrete example. Firstly, the VSAX sequence is split into a list of pattern words of the given size. Then, for each unique pattern word, all the subsequences with that same pattern are extracted to make the pool of the motif’s candidate members. Finally, if it exists, the motif related to those subsequences is computed and added to the list of motifs. Recall that a set of candidate subsequences can only be members of a motif under two conditions:

  • All the pairs of candidate subsequences do not overlap in time. This is to avoid the trivial matchings discussed in the preliminaries section.

  • All candidate subsequences are within a predefined radius of the motif’s center.

Thus, for each candidate subsequence, all the non-overlapping candidates that are within the radius $R$ of that initial candidate are extracted and stored. The final motif is defined to be the set of subsequences with the most members and the motif’s center is the original candidate that generated that set of members. If no set with more than one member is found, then the motif for that specific pattern does not exist.
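A possible reading of this selection step, for the candidates of a single pattern word, is sketched below. The function names are hypothetical, the small DTW is a stand-in for the multi-dimensional DTW used by the system, and the greedy closest-first way of enforcing pairwise non-overlap is an assumption, since the text does not say how overlaps among members are resolved.

```python
import numpy as np

def dtw(a, b):
    """Plain DTW with absolute-difference cost (stand-in distance)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    cost = np.full((len(a) + 1, len(b) + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[-1, -1])

def overlaps(a, b):
    """True if the half-open intervals (start, end) share any sample."""
    return a[0] < b[1] and b[0] < a[1]

def find_motif(series, candidates, radius, dist):
    """Return (center, members) for the largest member set, or None."""
    best_center, best_members = None, []
    for center in candidates:
        ref = series[center[0]:center[1]]
        scored = sorted((dist(ref, series[c[0]:c[1]]), c)
                        for c in candidates if c != center)
        members = [center]
        for d, cand in scored:  # closest candidates are considered first
            if d < radius and not any(overlaps(cand, m) for m in members):
                members.append(cand)
        if len(members) > len(best_members):
            best_center, best_members = center, members
    if len(best_members) < 2:
        return None  # no motif exists for this pattern word
    return best_center, sorted(best_members)

series = np.array([0, 1, 2, 9, 9, 0, 1, 1, 2, 9, 0, 5, 9])
candidates = [(0, 3), (5, 9), (10, 13)]  # (start, end) of each occurrence
motif = find_motif(series, candidates, radius=1.0, dist=dtw)
```

In the toy data the first two candidates are the same ramp at different speeds and end up in the motif, while the third one is too far from both and is left out.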

3.2 Motif summarization

In a previous work (silva_exploring_2020), we proposed a new dimensionality reduction method to summarize and explore the outputs of any motif detection algorithm. The method, called DTW-SOM, is a vanilla Self-Organizing Map (kohonen_self-organizing_2001) with some adaptations to work with time-series motifs. It receives a list of variable-length multi-dimensional motifs and produces a clustering of the motifs’ centers and a visualization of the results that is space-efficient.

TripMD leverages the DTW-SOM algorithm to group all the motifs found by the motif search process and to provide a visual summary of the most relevant motifs in the trips under analysis. By summarizing the extracted motifs, the user is able to quickly analyze the main patterns that were extracted and can better interpret the maneuvers being performed.

However, there is an important step before applying this method. DTW-SOM includes two initialization routines, a random initialization, in which the DTW-SOM network is initialized with a random sample of the motifs, and an anchor initialization, in which the user provides a smaller set of the most relevant motifs. Since previous experiments indicated that the anchor initialization was more stable than the random initialization (silva_exploring_2020), TripMD uses the anchor initialization and thus it has to include a motif pruning step that computes the most relevant motifs, which are used as the anchors.

The pruning routine used in TripMD is based on the definition of $K$-motifs and the non-overlapping requirement discussed in Section 2. Given a natural ordering of all the extracted motifs, the $K$-th motif is the highest-ranking motif whose center has a distance higher than $2R$ to the center of each $i$-th motif, for $1 \le i < K$, where $R$ is the motif radius. In this case, the ordering is defined with the MDL cost proposed by tanaka_discovery_2005. This score is based on the Minimum Description Length (MDL) principle (rissanen_stochastic_1998) and the lower the score, the more relevant the motif is. After pruning, the most relevant motifs are used to initialize the DTW-SOM and all the motifs are fed into the algorithm.
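The pruning rule amounts to a greedy pass over the motifs sorted by cost. The sketch below assumes that the costs have already been computed (the MDL cost itself is not reproduced here) and that the separation threshold is twice the motif radius, as in the usual $K$-motif definition; the function name and arguments are illustrative.

```python
def prune_motifs(centers, costs, radius, dist):
    """Return the indices of the motifs kept as anchors, most relevant first.

    centers: one center per motif; costs: lower means more relevant.
    A motif is kept only if its center is farther than 2 * radius
    from the center of every motif already kept.
    """
    order = sorted(range(len(centers)), key=lambda i: costs[i])
    kept = []
    for i in order:
        if all(dist(centers[i], centers[j]) > 2 * radius for j in kept):
            kept.append(i)
    return kept

# toy example with scalar "centers" and an absolute-difference distance
anchors = prune_motifs(
    centers=[0.0, 0.5, 5.0, 5.2],
    costs=[3, 1, 2, 4],
    radius=1.0,
    dist=lambda a, b: abs(a - b),
)
```

Here motifs 1 and 2 are kept (`anchors == [1, 2]`); motifs 0 and 3 are dropped because each lies within `2 * radius` of a better-ranked center.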

TripMD also imposes a constraint on the distance computation of DTW-SOM. When the DTW distance is searching for an optimal match between two subsequences, TripMD limits the maximum time warping allowed, which leads to higher distances between subsequences with a misalignment higher than this threshold.
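One common way to limit warping is a band around the diagonal of the DTW cost matrix. The sketch below follows that idea; whether DTW-SOM uses exactly this kind of band is an assumption, and `dtw_banded` and `max_warp` (measured in samples) are names chosen for the example.

```python
import numpy as np

def dtw_banded(a, b, max_warp):
    """DTW where point i of `a` may only match points j of `b` with |i - j| <= max_warp."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        lo, hi = max(1, i - max_warp), min(m, i + max_warp)
        for j in range(lo, hi + 1):
            # sum of absolute differences, so rows may be scalars or vectors
            d = np.abs(a[i - 1] - b[j - 1]).sum()
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

x = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0]  # same bump, two samples earlier
```

With `max_warp=2` the two-sample shift is fully absorbed and the distance is 0.0. With `max_warp=1` the distance becomes positive, and with `max_warp=0` it degenerates to a point-by-point comparison (6.0 here). Note that in this sketch the result is infinite when the two lengths differ by more than `max_warp`.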

3.3 Parameter estimation

So far, TripMD has several parameters that a user would need to set beforehand. VSAX has the default letter size, the motif search has the radius $R$ that defines the motifs and the minimum pattern size that initializes the search, and DTW-SOM has the number of training epochs and the maximum warping window. However, tuning all these parameters requires some time and expert knowledge, which is not user-friendly. Therefore, based on our experience with a real naturalistic driving dataset, we set some sensible default values for these parameters:

  • default letter size: 1 second

  • minimum pattern size: 3 VSAX letters

  • motif radius: 0.5th percentile of the distance between all pairs of 3-second subsequences

  • number of epochs for DTW-SOM: 20

  • maximum warping for DTW-SOM: VSAX’s default letter size (or 1 second)

Thus, the only parameter that the user must provide is the frequency in Hertz of the time-series provided as input, which is trivial. Then, TripMD estimates all the remaining parameters. However, if some particularity of the dataset makes the default parameters unreasonable, the user can always override the defaults provided by TripMD.
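The defaults listed above can be derived from the sampling frequency alone, as the following sketch shows. The function and key names are hypothetical and do not mirror the TripMD code. The radius estimate also assumes that the 3-second subsequences are taken as consecutive, non-overlapping chunks and that the distance function is supplied by the caller; neither detail is specified in the text.

```python
import numpy as np

def default_parameters(freq_hz):
    """Defaults expressed in samples, given the sampling frequency in Hertz."""
    letter_size = int(round(1.0 * freq_hz))  # 1 second
    return {
        "letter_size": letter_size,
        "min_pattern_size": 3,       # VSAX letters
        "som_epochs": 20,
        "max_warping": letter_size,  # same as the default letter size
    }

def estimate_radius(series, freq_hz, dist, percentile=0.5):
    """0.5th percentile of the pairwise distances between 3-second chunks."""
    length = int(round(3 * freq_hz))
    # assumed: consecutive, non-overlapping 3-second subsequences
    subs = [series[i:i + length]
            for i in range(0, len(series) - length + 1, length)]
    dists = [dist(subs[i], subs[j])
             for i in range(len(subs)) for j in range(i + 1, len(subs))]
    return float(np.percentile(dists, percentile))

params = default_parameters(5)  # e.g. a 5 Hz time-series
radius = estimate_radius(np.arange(12), 1, lambda a, b: float(np.abs(a - b).sum()))
```

For a 5 Hz input, the letter size and the maximum warping both come out as 5 samples. Computing all pairwise distances is quadratic in the number of chunks, so a real implementation would likely sample pairs on long recordings.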

4 Evaluation and discussion

To evaluate TripMD, we use the UAH-DriveSet (romera_need_2016), a publicly available naturalistic driving dataset including recorded trips from six different drivers that traveled on two specific routes in Madrid, Spain. The authors asked the volunteers to drive on these two routes mimicking three different driving behaviors: normal, aggressive and drowsy. Using their DriveSafe app (bergasa_drivesafe:_2014; romera_real-time_2015), the authors collected raw data from the accelerometer, GPS and camera of a smartphone mounted in the car and processed these signals to enrich the final dataset.

In the first experiment, we pick a single driver and explore in detail the outputs obtained from TripMD. Particularly, we do an exploratory analysis of the motifs extracted by TripMD and showcase the visualizations provided by our system.

In the second experiment, we focus on the task of identifying driving behaviors. We apply TripMD to the entire UAH-DriveSet. Then, using the known driving behaviors of all but one driver, we assign behavior scores to each motif cluster. Finally, we use those cluster behavior scores and the motifs extracted from the left-out driver to predict the behavior of each of that driver’s trips.

In both experiments, we use the two-dimensional time-series of the lateral and longitudinal acceleration recordings. The recordings are already aligned with the correct car axis and denoised with a Kalman filter, which means we can use them directly. The data has a frequency of 10 Hz. However, in order to speed up computation and further reduce noise, we down-sample the time-series to 5 Hz. Additionally, we use all the default parameters for TripMD as we found that they work well for this dataset.
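The down-sampling method is not specified here. One simple option, shown purely as an assumption, is to average every block of consecutive samples, which halves a 10 Hz signal to 5 Hz when the factor is 2 and smooths some noise along the way.

```python
import numpy as np

def downsample_mean(series, factor):
    """Average every `factor` consecutive samples along the time axis.

    Works for shape (n,) or (n, dims); trailing samples that do not
    fill a complete block are dropped.
    """
    series = np.asarray(series, dtype=float)
    usable = (len(series) // factor) * factor
    blocks = series[:usable].reshape(-1, factor, *series.shape[1:])
    return blocks.mean(axis=1)

# lateral and longitudinal acceleration as two columns, 10 Hz -> 5 Hz
acc_10hz = np.array([[0, 0], [2, 4], [4, 8], [6, 12]])
acc_5hz = downsample_mean(acc_10hz, 2)
```

In the example the four two-dimensional samples become two, `[[1, 2], [5, 10]]`. Taking every second sample instead (`series[::2]`) would be an equally plausible choice.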

The code to reproduce all the experiments can be consulted in our repository.

4.1 Analyzing a single driver with TripMD

To showcase how our system can be used to explore the driving behavior of a single person, we run TripMD on the seven trips performed by one of the drivers in the UAH-DriveSet. This driver completed four trips in the secondary road route (two normal, one aggressive and one drowsy) and three trips in the motorway route (one for each driving behavior).

The motif detection component found 281 motifs and, from these, 17 motifs were used to initialize the DTW-SOM model. In other words, the trips contained 17 significantly distinct driving patterns and all the other 264 motifs can be assigned to one of these patterns. DTW-SOM builds an optimal assignment and provides a visualization of the clusters in a two-dimensional grid (or network) that preserves the local similarity of the data. This means that two neighboring clusters in the two-dimensional network are similar.

Figure 3: Lateral and longitudinal acceleration of the DTW-SOM’s units. They were obtained by applying TripMD to the trips of a single driver from the UAH-DriveSet. Units are placed in the DTW-SOM two-dimensional grid.

Figure 3 shows the first visualization provided by TripMD for the driver. It contains the lateral and longitudinal acceleration of each DTW-SOM unit, placed in the two-dimensional network. A unit here is a multi-dimensional subsequence that represents the cluster in a particular part of the DTW-SOM grid and thus this plot provides a summary of the main driving patterns extracted from the driver.

From this first chart, we can already see that TripMD is capable of identifying a rich set of driving patterns, with lengths ranging from 2 to 3 seconds. It includes simple maneuvers, such as unit 22 that relates to a simple left turn without changes in longitudinal acceleration, and more complex maneuvers, for instance, unit 0 that corresponds to a right turn with acceleration. Additionally, the grid maintains some structural similarity. As an example, the neighboring units 15, 16, 20 and 21 all have similar driving patterns, with a clear brake maneuver and a slightly positive lateral acceleration.

Figure 4 contains the two additional visualizations provided by TripMD for the driver. These plots are classical ways of visualizing a SOM network and represent different information about each of the clusters arranged in the two-dimensional network. In both charts, the arrangement of the units in the two-dimensional grid is consistent with Figure 3.

Figure 4: U-matrix and Winner Matrix of the DTW-SOM. The model was trained on the motifs extracted from the trips of a single driver from the UAH-DriveSet.

The first chart is called U-Matrix and it shows how similar each unit is to its direct neighbors in the two-dimensional network, where the brighter the color, the closer a unit is to its neighbors. This visualization is helpful to understand where the major groups of clusters are within the network. For instance, the upper-right corner has a clearly defined group of four clusters that are very similar, which corresponds to the units 15, 16, 20 and 21 with the brake maneuver discussed above.
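The quantity behind a U-matrix can be sketched as the average distance from each unit to its direct grid neighbors. The sketch below is a generic illustration and not the DTW-SOM plotting code: the 4-neighborhood, the function name and the caller-supplied distance are assumptions, and the grid is assumed to have at least two units.

```python
import numpy as np

def u_matrix(units, dist):
    """Mean distance between each unit and its up/down/left/right neighbors.

    units: 2-D grid (list of rows) of unit prototypes.
    """
    rows, cols = len(units), len(units[0])
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            ds = [dist(units[r][c], units[i][j])
                  for i, j in neighbors if 0 <= i < rows and 0 <= j < cols]
            out[r, c] = np.mean(ds)
    return out

# toy 2x2 grid with scalar prototypes and an absolute-difference distance
grid = [[0, 0], [0, 10]]
u = u_matrix(grid, lambda a, b: abs(a - b))
```

The outlying unit in the bottom-right corner gets the highest value (10.0), its two neighbors get 5.0 and the top-left unit gets 0.0. Low values therefore mark groups of similar units.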

The second chart is called Winner Matrix and it provides information about the cluster size of each unit. Particularly, it displays the exact number of motifs in each cluster on top of the corresponding unit. This plot can be used to gauge how relevant each driving pattern is. For instance, unit 24 in the lower right corner contains no motifs, which means that this pattern is not needed to summarize the driver’s behavior.

Besides these default TripMD plots, Figure 5 contains information about the distribution of each driving behavior in the clusters extracted by our system. For each driving behavior, we compute the number of motif subsequences from the trips with that behavior that belong to each DTW-SOM cluster. Then, we divide each cluster count by the total number of motif subsequences for all the driver’s trips that belong to that cluster to achieve the rate presented in the plots. So, for instance, 80% of the motif subsequences associated with motifs that belong to cluster 4 come from trips with a drowsy behavior.
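The rate computation reduces to two counts per cluster. A minimal sketch, assuming each motif subsequence is available as a `(cluster_id, behavior)` pair (a hypothetical layout chosen for the example), and using made-up counts:

```python
from collections import Counter

def behavior_rates(subsequences):
    """Share of each behavior among the motif subsequences of each cluster.

    subsequences: iterable of (cluster_id, behavior) pairs.
    Returns {(cluster_id, behavior): rate}.
    """
    subsequences = list(subsequences)
    per_cluster = Counter(c for c, _ in subsequences)
    per_pair = Counter(subsequences)
    return {(c, b): n / per_cluster[c] for (c, b), n in per_pair.items()}

# illustrative counts only, not taken from the dataset
subs = [(4, "drowsy")] * 4 + [(4, "normal")] + [(11, "normal")] * 2
rates = behavior_rates(subs)
```

In this toy input, four of the five subsequences in cluster 4 come from drowsy trips, so `rates[(4, "drowsy")]` is 0.8, and the rates of a cluster always add up to one.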

Figure 5: Driving behavior rates for each DTW-SOM unit. For each unit, it shows the percentage of motif subsequences that come from each driving behavior.

Interestingly, we can see that the three driving behaviors have very different distributions of their motif subsequences among the clusters. Clusters 11 and 17 have a clear majority of subsequences from normal trips and these clusters relate to a "no maneuver" pattern and a soft brake, respectively.

The aggressive trips cover a higher variety of patterns, with 7 clusters showing a clear majority of subsequences from these trips. This increase in representation is expected as more motif subsequences will be extracted from trips where the driver performs more maneuvers. Most of these 7 clusters contain sharp acceleration patterns, which are usually associated with aggressive driving. Examples are the right turn with a pronounced brake in unit 20, the brake-acceleration pattern in unit 6 and the quick acceleration in unit 1. These sharp acceleration maneuvers without lateral movements are especially telling as they are associated with tailgating behavior, which in turn is a classical aggressive driving behavior.

Finally, the clusters with higher rates of subsequences coming from drowsy trips are located in the lower row of the DTW-SOM grid, excluding unit 24. Units 14 and 18 contain a drift pattern, which is made of two consecutive lateral movements in opposite directions. It is very interesting to see these patterns here as they are usually present in cases where a tired driver lets the car deviate from a lane and then quickly recovers with a sharp turn.

4.2 Identifying driving behavior with TripMD

From the first experiment, we see that TripMD can summarize the trips from a single driver so that different driving behaviors can be identified. However, to further test our system, we focus on a harder task, namely, identifying the driving behavior of an unknown driver from a set of drivers whose behavior we know.

To accomplish this, we apply TripMD to the entire UAH-DriveSet and retrieve the main driving patterns of all those trips. Then, using the known driving behavior of five drivers (the train drivers), we derive scores for all the trips of the remaining driver (the test driver). This test driver is the same one used in the first experiment. For each trip of the test driver, a single score is computed for each of the three behaviors: normal, aggressive and drowsy. To compute the score for a specific test trip $t$ and a given behavior $b$, we use the following process:

  1. For each DTW-SOM cluster $c_i$:

    1. Compute the rate $r_{i,b} = n_{i,b} / n_i$, where $n_{i,b}$ is the number of subsequences in cluster $c_i$ that belong to train trips of the behavior $b$ and $n_i$ is the total number of subsequences in cluster $c_i$ that belong to train trips.

    2. Compute $m_{i,t}$, which is the number of motif subsequences in cluster $c_i$ that belong to the test trip $t$.

    3. Derive the behavior score of cluster $c_i$ as $s_{i,b,t} = r_{i,b} \cdot m_{i,t}$.

  2. Compute the trip’s behavior score as $S_{b,t} = \sum_{i=1}^{k} s_{i,b,t}$, where $k$ is the number of DTW-SOM clusters.
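The scoring steps above can be sketched as follows. This is only an illustration under stated assumptions: each cluster's contribution is taken to be the train-trip rate multiplied by the test trip's subsequence count, clusters without train subsequences are assumed to contribute nothing, and the function name, input dictionaries and toy counts are hypothetical.

```python
BEHAVIORS = ("normal", "aggressive", "drowsy")


def trip_behavior_scores(train_counts, test_counts, behaviors=BEHAVIORS):
    """Score one test trip against each behavior.

    train_counts: {cluster: {behavior: count}} from the train trips.
    test_counts:  {cluster: count} for the single test trip.
    Both input formats are assumptions made for this sketch.
    """
    scores = {b: 0.0 for b in behaviors}
    for cluster, n_test in test_counts.items():
        per_behavior = train_counts.get(cluster, {})
        n_train = sum(per_behavior.values())
        if n_train == 0:
            continue  # assumption: no train data for this cluster, skip it
        for b in behaviors:
            rate = per_behavior.get(b, 0) / n_train  # train-trip behavior rate
            scores[b] += rate * n_test               # cluster behavior score
    return scores


# Toy counts (made up) for two clusters.
train_counts = {
    0: {"normal": 6, "aggressive": 2, "drowsy": 2},
    1: {"normal": 1, "aggressive": 1, "drowsy": 8},
}
test_counts = {0: 5, 1: 10}
scores = trip_behavior_scores(train_counts, test_counts)
print(scores)
```

Under this reading, the rates of a cluster sum to one, so the three scores of a trip add up to the number of motif subsequences in that trip that fall in clusters seen during training (15 in the toy example).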

After computing the three behavior scores for a test trip, its predicted behavior is simply the behavior with the highest score. Finally, we compare the predicted behavior of each test driver’s trips with the real behavior performed in those trips. Table 1 summarizes the results.

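| Route | Behavior | Aggressive score | Drowsy score | Normal score |
|---|---|---|---|---|
| Motorway | Normal | 94.1 | 78.3 | **95.5** |
| Secondary | Normal | 35.5 | 41.1 | **43.3** |
| Secondary | Normal | 47.0 | 35.9 | **48.1** |
| Motorway | Aggressive | **103.1** | 72.6 | 96.3 |
| Secondary | Aggressive | **91.0** | 52.2 | 77.8 |
| Motorway | Drowsy | 97.5 | **137.9** | 120.6 |
| Secondary | Drowsy | 69.7 | 81.9 | **82.4** |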
Table 1: Behavior scores for each test trip. The score of the predicted behavior for each trip is highlighted in bold. The column named Behavior contains the real behavior of the trip.

The test driver has seven trips and TripMD assigns the correct behavior to all but one of them. The trip where the predicted behavior does not match the real behavior is the drowsy trip on the secondary road. However, looking at the scores, we see that the normal behavior score (the predicted behavior, 82.4) is very close to the drowsy behavior score (the real behavior, 81.9).
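As a quick check, the prediction rule (pick the behavior with the highest score) can be applied to the values in Table 1. The snippet below is a small sketch: the score values are copied from the table, but the column order (aggressive, drowsy, normal) is an assumption inferred from the text, since it is the only order under which six of the seven trips are predicted correctly and the secondary drowsy trip is predicted as normal.

```python
# Column order is an assumption inferred from the text, not stated in the table.
COLUMNS = ("aggressive", "drowsy", "normal")

# (route, real behavior, scores) copied from Table 1.
TABLE_1 = [
    ("Motorway", "normal", (94.1, 78.3, 95.5)),
    ("Secondary", "normal", (35.5, 41.1, 43.3)),
    ("Secondary", "normal", (47.0, 35.9, 48.1)),
    ("Motorway", "aggressive", (103.1, 72.6, 96.3)),
    ("Secondary", "aggressive", (91.0, 52.2, 77.8)),
    ("Motorway", "drowsy", (97.5, 137.9, 120.6)),
    ("Secondary", "drowsy", (69.7, 81.9, 82.4)),
]


def predict(scores):
    """Return the behavior whose score is the highest."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return COLUMNS[best]


predictions = [predict(scores) for _, _, scores in TABLE_1]
n_correct = sum(pred == real for pred, (_, real, _) in zip(predictions, TABLE_1))
print(f"{n_correct} of {len(TABLE_1)} trips predicted correctly")
# prints "6 of 7 trips predicted correctly"
```

The only mismatch is the last row, where the normal score exceeds the drowsy score by 0.5.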

Figure 6: Behavior rates for the DTW-SOM units. For each unit, it shows the percentage of motif subsequences that come from each driving behavior. The three plots at the top contain the rates of the train trips while the three plots at the bottom contain the rates of the test trips.

To further illustrate these results, Figure 6 shows the behavior rates of the DTW-SOM clusters for the train drivers and the test driver. Here we can observe that the behavior distributions of the train drivers are close to the behavior distributions of the test driver. For instance, unit 14 has a high normal behavior rate in both sets of trips, units 13, 25 and 32 have similarly high aggressive behavior rates and units 2, 8 and 21 have high drowsy behavior rates for both the test and train drivers.

5 Conclusion

In this paper, we propose a system called TripMD, which identifies the main driving patterns from sensor recordings such as acceleration and velocity. The system is made of two components. The first is a motif detection algorithm tailored for the task of maneuver detection, which extracts variable-length patterns from the original trip recordings. The second is a motif clustering and visualization model that summarizes the motifs discovered by the first component so that the user can quickly understand the main driving patterns performed in the original trips.

Compared to previous work, our system not only extracts the time-series patterns present in trip recordings but is also capable of summarizing those patterns in a space-efficient visualization that allows for an easy investigation. This feature is highlighted in our first experiment, where we demonstrate that TripMD can discover a wide range of driving patterns from the trips performed by a single driver of the UAH-DriveSet dataset. We also conclude that the three driving behaviors marked in the dataset (normal, aggressive and drowsy) have distinct distributions among the extracted driving patterns. For instance, the aggressive behavior was more prevalent in sharp acceleration patterns and the drowsy behavior was more frequent in two patterns very similar to a drift maneuver. This further supports the usefulness and consistency of the driving patterns provided by TripMD.

We also show that TripMD can determine the driving behavior of a driver from the behaviors of other drivers. In particular, we (1) apply our system to all the trips in the UAH-DriveSet (which contains six drivers), (2) compute behavior rates for each driving pattern extracted by TripMD using the known behavior of five drivers and (3) derive behavior scores for the remaining driver, which are used to predict that driver's behavior. Of the seven trips performed by the test driver, we correctly identify the driving behavior of six.

Even though the results seem promising, there are still areas of improvement. Firstly, because of the Variable SAX discretization, the motif detection algorithm used in TripMD is not an exact algorithm. This means that we cannot guarantee that all the variable-length motifs are extracted. There are some new motif detection algorithms that claim to be exact; however, we could not find one that was capable of extracting motifs whose member subsequences have different sizes. Thus, investigating exact motif detection algorithms that work with variable-length motifs could be an interesting line for improvement.

Additionally, we should further test TripMD with more datasets and different tasks, such as understanding whether TripMD can be used to distinguish drivers with prior accidents from drivers without accidents and to inform car insurance pricing models. Testing TripMD with different datasets would also be helpful to further validate and fine-tune the default parameters of the system.


This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.