# Fast Exact Dynamic Time Warping on Run-Length Encoded Time Series

Dynamic Time Warping (DTW) is a well-known similarity measure for time series. The standard dynamic programming approach to compute the dtw-distance of two length-n time series, however, requires O(n^2) time, which is often too slow in applications. Therefore, many heuristics have been proposed to speed up the dtw computation. These are often based on approximating or bounding the true dtw-distance or considering special inputs (e.g. binary or piecewise constant time series). In this paper, we present a fast and exact algorithm to compute the dtw-distance of two run-length encoded time series. This might be used for fast and accurate indexing and classification of time series in combination with preprocessing techniques such as piecewise aggregate approximation (PAA).

## 1 Introduction

Time series data is ubiquitous, appearing in essentially all scientific domains. Comparing time series requires a measure to determine the similarity of two time series. Dynamic Time Warping (DTW) [18] is an established method which is used in numerous time series mining applications [21, 3, 4, 1].

The quadratic time complexity, however, is considered to be a major drawback of DTW on very long time series, even in optimized nearest neighbor search applications that apply sophisticated pruning and lower-bounding techniques [20]. Note that in general there is not much hope of finding strongly subquadratic algorithms, since it has been shown that DTW cannot be solved in $O(n^{2-\varepsilon})$ time for any $\varepsilon > 0$ on time series over a constant-size alphabet (based on a complexity-theoretic assumption) [2, 5]. Long time series of length  occur, for example, when measuring electrical power of household appliances with a sampling rate of a few seconds collected over several months, Twitter activity data sampled in milliseconds, and human activities inferred from a smart home environment [17]. All these time series have in common that they contain long constant segments.

Recently, several specialized algorithms have been devised to cope with long time series that contain constant segments [8, 17, 12, 19, 13, 14]. The basic idea of these algorithms is to exploit the repetitions of values within a time series to speed up computation of the DTW distance. We briefly summarize the respective algorithms (see also Table 1).

• Binary DTW (BinDTW) [2]: This algorithm computes exact DTW distances on binary length- time series in  time. It has not been implemented and tested in practice.

• AWarp [17]: This algorithm is exact for binary time series (a formal proof is missing) and exploits repetitions of zeros. The running time is , where  and  are the numbers of non-zero entries in the two input time series.

• Sparse DTW (SDTW) [12]: This algorithm yields exact DTW distances for arbitrary time series in  time, where  and  are the numbers of non-zero entries in the two input series (assuming both have length ).

• Binary Sparse DTW (BSDTW) [14]: This algorithm computes exact DTW distances between two binary time series in  time, where  and  are the numbers of non-zero entries in the two input time series. In practice it is often faster than AWarp.

• Blocked DTW (BDTW) [19] (earlier introduced as Coarse-DTW [8]): This algorithm operates on run-length encoded time series. The run-length encoding represents a run of identical values (constant segment) by storing only a single value together with the length of the run. BDTW is exact on binary time series (a formal proof is missing). The running time is , where  and  are the numbers of runs in the two input time series (note that ). BDTW is faster than AWarp in practice.

Clearly, a practical limitation of BinDTW and BSDTW is that they are only applicable for binary time series. AWarp and BDTW are limited in that they only yield exact DTW distances for binary time series.

### Our Contributions.

We develop an improved algorithm that computes exact DTW distances for arbitrary time series in run-length encoding. The running time for two time series  and  of length  and  is , where  and  are the number of runs in  and  and  is a number depending on the individual lengths of runs (see Section 3 for details). Note that this yields a constant-time algorithm for time series with . In general, the running time is asymptotically faster than  if and . If , then the running time is faster even for . Notably, if all runs in at least one input time series have the same length, then our algorithm runs in  time. If both time series contain only runs of the same fixed length, then our algorithm even runs in  time and is in fact equivalent to BDTW.

## 2 Preliminaries

We give some preliminary definitions and introduce notation.

### Notation.

Let  and . An table  consists of  rows and  columns, where  denotes the entry in the -th row and -th column.

### Time Series.

A time series is an ordered finite sequence  of rationals. The run-length encoding of a time series  is the sequence  of pairs  where  is a positive integer denoting the number of consecutive repetitions (run length) of the value  in . Note that . We call  the length of  and we call  the coding length of .
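
The run-length encoding described above can be sketched in a few lines of Python. The function names `rle_encode` and `rle_decode` and the pair order (value, run length) are illustrative assumptions, not notation fixed by the paper.

```python
from itertools import groupby


def rle_encode(series):
    """Collapse each maximal run of equal values into a (value, run_length) pair."""
    return [(value, len(list(group))) for value, group in groupby(series)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the full time series."""
    return [value for value, length in runs for _ in range(length)]


x = [1, 1, 1, 2, 5, 5]
runs = rle_encode(x)          # coding length is len(runs)
assert rle_decode(runs) == x  # the run lengths sum to the length of x
```

Here the coding length is the number of pairs, while the length of the series is the sum of all run lengths.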

### Dynamic Time Warping.

The dynamic time warping distance [18] is a distance measure between time series using non-linear alignments which are defined via the concept of a warping path.

###### Definition 1.

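A warping path of order $m \times n$ is a sequence $p = (p_1, \ldots, p_L)$, $L \ge 1$, of index pairs $p_\ell = (i_\ell, j_\ell) \in \{1, \ldots, m\} \times \{1, \ldots, n\}$, $1 \le \ell \le L$, such that

1. $p_1 = (1, 1)$,

2. $p_L = (m, n)$, and

3. $(i_{\ell+1} - i_\ell,\, j_{\ell+1} - j_\ell) \in \{(1,0), (0,1), (1,1)\}$ for each $1 \le \ell \le L - 1$.

The set of all warping paths of order $m \times n$ is denoted by $\mathcal{P}_{m,n}$. A warping path $p \in \mathcal{P}_{m,n}$ defines an alignment between two time series $x = (x_1, \ldots, x_m)$ and $y = (y_1, \ldots, y_n)$ in the following way: A pair $(i, j) \in p$ aligns element $x_i$ with $y_j$, incurring a local cost of $(x_i - y_j)^2$. The DTW distance between $x$ and $y$ is defined as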

$$\mathrm{dtw}(x, y) := \min_{p \in \mathcal{P}_{m,n}} \sqrt{\sum_{(i,j) \in p} (x_i - y_j)^2}.$$

It can be computed via dynamic programming in $O(mn)$ time based on an $m \times n$ table [18].
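
For reference, the standard quadratic dynamic program can be sketched as follows. This is a minimal illustration of the textbook recurrence, assuming squared differences as local costs as in the definition above. The function name `dtw` and the extra padding row and column of the table are choices made here for convenience.

```python
from math import inf, sqrt


def dtw(x, y):
    """Classic dynamic program for the DTW distance of two plain sequences."""
    m, n = len(x), len(y)
    # D[i][j] holds the cheapest squared cost of aligning x[:i] with y[:j].
    D = [[inf] * (n + 1) for _ in range(m + 1)]
    D[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # A warping path reaches (i, j) by a vertical, horizontal, or diagonal step.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return sqrt(D[m][n])
```

Every table entry is filled in constant time, which gives the quadratic running time that the remainder of the paper aims to avoid on inputs with long runs.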

## 3 The Algorithm

In the following, let  and  be two time series with run-length encodings  and . We define , for and , for . Consider the  DTW matrix , where . Note that  can be structured into blocks , where each step inside  has local cost . The right boundary of  corresponds to column  of  and the top boundary is formed by row  of  (see Figure 1).

We show that it is sufficient to compute only certain entries on the boundaries of blocks instead of all entries in . To this end, we analyze the structure of optimal warping paths. We begin with the following simple observation.

###### Observation 3.1.

There exists an optimal warping path  such that the following holds for every block : If  moves through , then  first moves diagonally through  until it reaches a boundary of .

This is true since every step inside a block costs the same. Hence, it is optimal to maximize the number of diagonal steps (which minimizes the overall number of steps to reach a boundary of a block). Observation 3.1 implies that there exists an optimal warping path which is an alternation of diagonal and horizontal (or vertical) subpaths where the horizontal (vertical) subpaths are always on top (right) boundaries of blocks.

Now, we restrict the possible diagonals along which such an alternating optimal warping path might move. To this end, let , , denote the diagonal in  going through the upper right corner of block  (that is, through the entry ) and let  be the diagonal (corresponding to ) going through . We denote the set of all these block diagonals by  (see Figure 1). Now, our key lemma states that there always exists an optimal warping path which only moves along block boundaries and block diagonals.

###### Lemma 3.2.

There exists an optimal warping path that only moves along diagonals in  and boundaries of blocks.

###### Proof.

By definition, every warping path initially starts in  on the diagonal . Let  be an optimal warping path which alternates between diagonals and block boundaries as described above. Assume that  does not only move along diagonals in . Then, by assumption, leaves some diagonal  on a boundary (wlog horizontally on the top boundary ) of a block  and (diagonally) enters the neighboring block  before the next intersection of a diagonal  with . It then proceeds diagonally in between  and  until reaching some block boundary where it moves horizontally or vertically again. Note that  has to move horizontally or vertically again at some point since it has to reach a diagonal in  again (this holds because every warping path eventually ends up on ). Assume that  moves diagonally only until reaching the top boundary  of a block , , , where moves horizontally (analogous arguments apply if  moves vertically on a right boundary of a block in between  and ). See Figure 2 for an example. Observe that a warping path can only enter blocks from bottom (that is, from the top boundary of the block below) or left (that is, from the right boundary of the block to the left) and exit blocks from top or right boundaries.

Let  denote the number of horizontal steps of  on  and let  be the number of horizontal steps on . Let  denote the diagonal subpath of  from  to . Now, consider the warping path  obtained from  by “shifting” to the right, that is, takes  horizontal steps on  and only  horizontal steps on . Let  be the shifted diagonal subpath and note that  crosses a subset of the blocks crossed by . This is true since there cannot be an upper right corner of any block anywhere in the region between  and  (since they are neighboring diagonals from ).

Let us now consider the number of steps taken by  within each block from  to . Clearly, takes one more step inside  than . Regarding , if  enters  from bottom, then takes one step less inside . Otherwise, if  enters  from the left, then  takes the same number of steps inside  as . For every block  in between  and  which is crossed by , the following holds:

• If  crosses  from left to top, then  takes one more step.

• If  crosses  from bottom to right, then  takes one step less.

• If  crosses  from bottom to top (or from left to right), then  takes the same number of steps.

The above holds since  cannot pass through an upper right corner of a block in between  and . Note that the number of steps taken by  and  through any block differs by at most one.

Now, let  be the set of blocks where  takes more steps than  and let  be the set of blocks where  takes more steps than . Let  and . Then, the cost difference between  and  is . By optimality of , we have , that is, .

If , then also  is an optimal warping path. Thus, by analogous arguments, shifting times to the right yields an optimal warping path that does not move horizontally on  anymore. If this warping path now already moves diagonally along  (as would be the case in Figure 2 when shifting four times to the right), then this proves the claim. If this is not the case, then analogous arguments apply again for the next occurrence of a horizontal (or vertical) subpath in between  and . This finally yields an optimal warping path moving along  (or ), proving the claim.

If , then we can analogously shift  to the left to obtain a warping path . Clearly, the blocks where  takes one more step than  are exactly the blocks , and the blocks where  takes one more step than  are exactly the blocks . Hence, the cost difference between  and  is also , which contradicts the optimality of . ∎

From Lemma 3.2, it follows that , that is, , can be computed from only those entries in  which are an intersection of a block boundary and a block diagonal in  (in Figure 1 these intersections are framed in bold). Let  denote the number of these intersections and note that .
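
To get a feeling for how many intersections arise for given run lengths, the following sketch simply enumerates them. It rests on assumptions made for illustration: a block diagonal is identified by its offset (column index minus row index), the extra diagonal through the first matrix entry has offset 0, and entries are 1-indexed. The function name `count_intersections` is hypothetical, and the enumeration is a brute-force illustration rather than the sorted procedure of Lemma 3.3.

```python
from itertools import accumulate


def count_intersections(x_lengths, y_lengths):
    """Count matrix entries lying on a block diagonal and on a block boundary."""
    a = list(accumulate(x_lengths))  # rows forming top boundaries of blocks
    b = list(accumulate(y_lengths))  # columns forming right boundaries of blocks
    m, n = a[-1], b[-1]
    # One diagonal per upper right block corner, plus the main diagonal.
    offsets = {0} | {bj - ai for ai in a for bj in b}
    points = set()
    for d in offsets:
        # Entries where the diagonal crosses a top-boundary row.
        points.update((ai, ai + d) for ai in a if 1 <= ai + d <= n)
        # Entries where the diagonal crosses a right-boundary column.
        points.update((bj - d, bj) for bj in b if 1 <= bj - d <= m)
    return len(points)
```

For example, `count_intersections([2, 2], [2, 2])` returns 4, since with square blocks of equal size only the upper right block corners remain.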

In order to compute the values of  at all intersections via dynamic programming in  time, we need to compute their coordinates and store them in a sorted way on each block boundary (in order to allow constant-time lookups). The following lemma accomplishes this task.

###### Lemma 3.3.

The intersections of block diagonals with block boundaries can be computed (sorted on each boundary) in time, where  is the number of these intersections.

###### Proof.

We first determine the ordering (from top to bottom) of all diagonals in  in terms of their row index at boundary . Note that this ordering is the same on all right boundaries . Moreover, this ordering also orders the diagonals on all top boundaries  from left to right.

To start with, observe that the row index of  at column  is . Note that  is possible, in which case there is no intersection between  and . We need to sort the numbers for all . Clearly, for each , we have for all , and for each , we have for all . That is, we need to sort sorted sequences of length  (or alternatively, sorted sequences of length ). This can be done via -way merging in  time [7]. We can then easily insert the diagonal  and remove duplicate diagonals.

To compute all intersections, we now iterate over all diagonals in the ordering determined above. This ensures that we obtain all intersections already sorted on each boundary. For diagonal , the row index of the intersection with boundary  is . Clearly, if , then no intersection exists in . Analogously, the column index of the intersection of  with boundary  is . Again, if , then there is no intersection in . Thus, we can compute all intersections in constant time each (see also LABEL:alg:intersections). ∎
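
The merging step in the proof can be illustrated with Python's standard `heapq.merge`, which performs a multi-way merge of already sorted sequences. In this sketch each diagonal is represented by its offset (right-boundary column minus top-boundary row), which is an assumption made for illustration and only one of several equivalent ways to order the diagonals. The function name `sorted_block_diagonals` is hypothetical.

```python
import heapq
from itertools import accumulate


def sorted_block_diagonals(x_lengths, y_lengths):
    """Return the distinct block-diagonal offsets in increasing order."""
    a = list(accumulate(x_lengths))
    b = list(accumulate(y_lengths))
    # For a fixed row boundary ai, the offsets bj - ai increase with j,
    # so every inner list is already sorted.
    sorted_rows = [[bj - ai for bj in b] for ai in a]
    # Merge all sorted lists, together with the main diagonal (offset 0).
    merged = heapq.merge(*sorted_rows, [0])
    distinct = []
    for offset in merged:
        if not distinct or distinct[-1] != offset:  # drop duplicate diagonals
            distinct.append(offset)
    return distinct
```

Merging presorted lists avoids sorting all offsets from scratch. Duplicates appear whenever two block corners lie on the same diagonal, for instance when square blocks occur.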

An interesting question at this point is whether the ordering of the diagonals can be computed in  time. The overall running time in Lemma 3.3 would then simply be . Recall that the problem is to sort all pairs  with respect to their sum. This problem is known as “X + Y Sorting” [11, 10] (a special case of sorting under partial information) and it is in fact open whether it can be solved faster than sorting arbitrary numbers [16]. However, if at least one input time series contains only runs of equal length (which is the case, for example, when using PAA as preprocessing), then sorting can be done faster.

###### Lemma 3.4.

Let  and  be integers such that  for all  and some . Then, the tuples can be sorted with respect to the sum  in  time.

###### Proof.

We sort the tuples in increasing order of their sums and write  if . Clearly, if  and , then this implies . Hence, the first tuple is . Moreover, note that since holds for all , the following holds for all  and : If , then .

Starting with , we sort the tuples by iteratively determining the next tuple in constant time each. Here, we use the above properties regarding the partial order of tuples to show that at each point in the iteration, there are at most two candidate tuples (computable in constant time) which we need to compare. This yields a linear-time algorithm.

Assume that we already sorted all the tuples  with , where . Let  and note that . For each , let  and note that . Now, let . Then, the next tuple can either be  (if ) or one of the tuples (if  exists). More precisely, it can only be the tuple , for which holds for all . We can store this candidate tuple and update it in every iteration in constant time using a queue (see LABEL:alg:sorting). ∎

From Lemmas 3.4 and 3.3, we obtain the following corollary.

###### Corollary 3.5.

If all top boundaries (or all right boundaries) of blocks have equal length, then the intersections of block diagonals with block boundaries can be computed (sorted on each boundary) in time.

We are now ready to prove our main result.

###### Theorem 3.6.

The DTW distance between time series  and  can be computed from  and  in  time, where  is the number of intersections between block boundaries and block diagonals in the DTW matrix.

###### Proof.

By Lemma 3.2, it is sufficient to compute the values of the DTW matrix  at intersections of block boundaries and block diagonals. Given the intersection points (computable in  time by Lemma 3.3), their values can be computed block by block, for example from left to right and bottom to top (a warping path can reach a block  only via boundaries of , or ). For each block, we compute the entries on its right boundary from bottom to top, and the entries on its top boundary from left to right.

We start with block . By Observation 3.1, we can assume that an optimal warping path moves along the diagonal  until reaching a boundary of . This allows us to initialize as follows: If , then, for each intersection  with , we set . Moreover, we set the values of all intersections on  which are below  to  since our optimal warping path will not use these. We do the same for all intersections on  which are to the left of . Analogously, if , then, for each intersection  with , we set and all intersections left of  and below  to .

To compute the value of an intersection , , on the right boundary of a block  (for or ), one only has to consider two options. By Lemma 3.2, an optimal warping path reaches  either diagonally (via  if or via  otherwise) or from below on . Let , , be the next intersection below on  (if there is none, then set ). Then, the following holds:

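$$D[a_{i-1}+s,\, b_j] = \min\bigl(d,\; D[a_{i-1}+l,\, b_j] + (s-l) \cdot c_{i,j}\bigr),$$

where

$$d = \begin{cases} D[a_{i-1},\, b_j - s] + s \cdot c_{i,j} & \text{if } s \le n_j \\ D[a_{i-1}+s-n_j,\, b_{j-1}] + n_j \cdot c_{i,j} & \text{if } s > n_j \end{cases}$$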

and for all .

The computation of values on the top boundary of a block is completely analogous (see LABEL:alg:DTW). Clearly, for the upper right corner of a block there are three corresponding options since it lies on both boundaries. Hence, computing the values of the  intersections takes  time. ∎
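
The following Python sketch puts the pieces of the proof together. It is an illustration under stated assumptions rather than the paper's algorithm listing: runs are given as (value, run length) pairs, intersections are kept in a dictionary and ordered with a generic sort (so the sorting bound of Lemma 3.3 is not attained), and a virtual entry (0, 0) with value 0 replaces the explicit initialization of the first block. All function names are hypothetical. A quadratic reference implementation is included only to cross-check results.

```python
from bisect import bisect_left
from itertools import accumulate
from math import inf, sqrt


def dtw_quadratic(x, y):
    """Reference: classic dynamic program on the uncompressed series."""
    D = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    D[0][0] = 0.0
    for i, xi in enumerate(x, 1):
        for j, yj in enumerate(y, 1):
            D[i][j] = (xi - yj) ** 2 + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return sqrt(D[-1][-1])


def expand(runs):
    """Turn (value, run_length) pairs back into a plain sequence."""
    return [value for value, length in runs for _ in range(length)]


def rle_dtw(x_runs, y_runs):
    """DTW distance computed only at diagonal/boundary intersections."""
    a = [0] + list(accumulate(length for _, length in x_runs))
    b = [0] + list(accumulate(length for _, length in y_runs))
    m, n = a[-1], b[-1]
    # Block diagonals as offsets (column - row); 0 is the main diagonal.
    offsets = {0} | {bj - ai for ai in a[1:] for bj in b[1:]}
    points = set()
    for d in offsets:
        points.update((ai, ai + d) for ai in a[1:] if 1 <= ai + d <= n)
        points.update((bj - d, bj) for bj in b[1:] if 1 <= bj - d <= m)

    D = {(0, 0): 0.0}   # virtual start; all other missing entries count as inf
    last_in_col = {}    # right boundary column -> row of latest intersection
    last_in_row = {}    # top boundary row -> column of latest intersection
    for r, c in sorted(points):  # all predecessors come earlier in this order
        i, j = bisect_left(a, r), bisect_left(b, c)  # block containing (r, c)
        cost = (x_runs[i - 1][0] - y_runs[j - 1][0]) ** 2
        # Option 1: arrive diagonally from where the diagonal entered the block.
        t = min(r - a[i - 1], c - b[j - 1])
        best = D.get((r - t, c - t), inf) + t * cost
        # Option 2: move up along the right boundary from the intersection below.
        if c == b[j] and c in last_in_col:
            r0 = last_in_col[c]
            best = min(best, D[(r0, c)] + (r - r0) * cost)
        # Option 3: move right along the top boundary from the intersection to the left.
        if r == a[i] and r in last_in_row:
            c0 = last_in_row[r]
            best = min(best, D[(r, c0)] + (c - c0) * cost)
        D[(r, c)] = best
        if c == b[j]:
            last_in_col[c] = r
        if r == a[i]:
            last_in_row[r] = c
    return sqrt(D[(m, n)])


x_runs, y_runs = [(0, 2), (3, 1)], [(1, 1), (2, 3)]
assert abs(rle_dtw(x_runs, y_runs) - dtw_quadratic(expand(x_runs), expand(y_runs))) < 1e-9
```

The dictionary lookups stand in for the sorted per-boundary arrays described in the proof. Upper right block corners pass both boundary tests, which yields the three options mentioned above.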

From Corollary 3.5, we obtain the following special case.

###### Corollary 3.7.

If at least one of the time series  or  contains only runs of equal length, then the DTW distance between  and  can be computed from  and  in  time, where  is the number of intersections between block boundaries and block diagonals in the DTW matrix.

If all block sizes are equal, that is, , then since the intersections are exactly the upper right block corners. In this case, Lemma 3.2 implies that can be computed in  time.

###### Corollary 3.8.

Let and  be two time series with and , where . Then, can be computed in  time.

Note that in this special case the following holds: If an optimal warping path moves through a block , then wlog it takes exactly  steps through . Note further that the algorithm Blocked_DTW_UB [19, Algorithm 1] (and also Coarse-DTW [8, Algorithm 2] with ) uses the value  (which equals  and  in this case) for the cost of crossing block . That is, we proved that Blocked_DTW_UB (Coarse-DTW) and clearly also Blocked_DTW_LB [19, Algorithm 2] (which uses ) are exact if all blocks are squares.
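
For the square-block case, the computation reduces to a dynamic program on the run values alone. The sketch below assumes that every run in both series has the same length `run_length` and charges each visited block that many steps, as described above. The function name `dtw_equal_runs` and its parameters are hypothetical.

```python
from math import inf, sqrt


def dtw_equal_runs(x_values, y_values, run_length):
    """DTW distance of two series whose runs all have length run_length.

    x_values and y_values contain one value per run.
    """
    k, l = len(x_values), len(y_values)
    D = [[inf] * (l + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            # Crossing a square block takes run_length steps of equal cost.
            block_cost = run_length * (x_values[i - 1] - y_values[j - 1]) ** 2
            D[i][j] = block_cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return sqrt(D[k][l])


# x = [1, 1, 2, 2] and y = [1, 1, 3, 3] share the run length 2.
print(dtw_equal_runs([1, 2], [1, 3], 2))
```

The table has one entry per block, so the work is proportional to the product of the two coding lengths and independent of the run length itself.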

As regards the value of , a tight upper bound in general is . However, this bound is of course only attained in the worst case. Especially for larger values of  and  (in comparison to  and ), might be smaller in practice since not every block diagonal will intersect every boundary (depending on the specific block sizes). Also, some block diagonals might even be identical (if square blocks appear).

As a final remark, we mention that in practice the computation of intersections can be done only once if all time series in a data set have identical block sizes.

## 4 Conclusion

We presented a fast algorithm to compute exact DTW distances between run-length encoded time series. Our method might yield improved performance in practice, especially when combined with dimension reduction such as piecewise aggregate approximation [9, 22, 15, 6]. Empirical evaluation of our algorithm in experiments and comparison with other methods (Table 1) is planned as future work. It is also an interesting question whether the running time can be improved to  (or even better) in general.

## References

• Abanda et al. [2018] A. Abanda, U. Mori, and J. A. Lozano. A review on distance based time series classification. Data Mining and Knowledge Discovery, pages 1–35, 2018.
• Abboud et al. [2015] A. Abboud, A. Backurs, and V. V. Williams. Tight hardness results for LCS and other sequence similarity measures. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS ’15), pages 59–78, 2015.
• Aghabozorgi et al. [2015] S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah. Time-series clustering–a decade review. Information Systems, 53:16–38, 2015.
• Bagnall et al. [2017] A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3):606–660, 2017.
• Bringmann and Künnemann [2015] K. Bringmann and M. Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS ’15), pages 79–97, 2015.
• Chakrabarti et al. [2002] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems, 27(2):188–228, 2002.
• Cormen et al. [2001] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.
• Dupont and Marteau [2016] M. Dupont and P.-F. Marteau. Coarse-DTW for sparse time series alignment. In First ECML PKDD Workshop on Advanced Analysis and Learning on Temporal Data (AALTD ’15), pages 157–172, 2016.
• Faloutsos et al. [1997] C. Faloutsos, H. Jagadish, A. Mendelzon, and T. Milo. A signature technique for similarity-based queries. In Proceedings of the Compression and Complexity of Sequences 1997 (SEQUENCES ’97), pages 11–13. IEEE, 1997.
• Fredman [1976] M. L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, 1976.
• Harper et al. [1975] L. H. Harper, T. H. Payne, J. E. Savage, and E. Straus. Sorting X + Y. Communications of the ACM, 18(6):347–349, 1975.
• Hwang and Gelfand [2017] Y. Hwang and S. B. Gelfand. Sparse dynamic time warping. In Proceedings of the 13th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM ’17), pages 163–175, 2017.
• Hwang and Gelfand [2018] Y. Hwang and S. B. Gelfand. Constrained sparse dynamic time warping. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA ’18), pages 216–222, 2018.
• Hwang and Gelfand [2019] Y. Hwang and S. B. Gelfand. Binary sparse dynamic time warping. In Proceedings of the 15th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM ’19), 2019.
• Keogh et al. [2001] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263–286, 2001.
• Lambert [1992] J.-L. Lambert. Sorting the sums in comparisons. Theoretical Computer Science, 103(1):137–141, 1992.
• Mueen et al. [2016] A. Mueen, N. Chavoshi, N. Abu-El-Rub, H. Hamooni, and A. Minnich. AWarp: Fast warping distance for sparse time series. In 2016 IEEE 16th International Conference on Data Mining (ICDM ’16), pages 350–359, 2016.
• Sakoe and Chiba [1978] H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, 1978.
• Sharabiani et al. [2018] A. Sharabiani, H. Darabi, S. Harford, E. Douzali, F. Karim, H. Johnson, and S. Chen. Asymptotic dynamic time warping calculation with utilizing value repetition. Knowledge and Information Systems, 57(2):359–388, 2018.
• Silva et al. [2018] D. F. Silva, R. Giusti, E. Keogh, and G. Batista. Speeding up similarity search under dynamic time warping by pruning unpromising alignments. Data Mining and Knowledge Discovery, 32(4):988–1016, 2018.
• Wang et al. [2013] X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26(2):275–309, 2013.
• Yi and Faloutsos [2000] B.-K. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary norms. In Proceedings of the 26th VLDB Conference, pages 385–394, 2000.