Time series data are ubiquitous, appearing in essentially all scientific domains. Comparing time series requires a measure of the similarity of two time series. Dynamic Time Warping (DTW) [18] is an established measure used in numerous time series mining applications [21, 3, 4, 1].
The quadratic time complexity, however, is considered a major drawback of DTW on very long time series, even in optimized nearest neighbor search applications that apply sophisticated pruning and lower-bounding techniques [20]. Note that, in general, there is little hope of finding strongly subquadratic algorithms, since it has been shown that DTW cannot be solved in $O(n^{2-\varepsilon})$ time for any $\varepsilon > 0$ on time series over a constant-size alphabet (based on a complexity-theoretic assumption, the Strong Exponential Time Hypothesis) [2, 5]. Long time series occur, for example, when measuring the electrical power of household appliances at a sampling rate of a few seconds collected over several months, when sampling Twitter activity data in milliseconds, or when inferring human activities from a smart home environment. All these time series have in common that they contain long constant segments.
Recently, several specialized algorithms have been devised to cope with long time series that contain constant segments [8, 17, 12, 19, 13, 14]. The basic idea of these algorithms is to exploit the repetitions of values within a time series to speed up computation of the DTW distance. We briefly summarize the respective algorithms (see also Table 1).
Binary DTW (BinDTW) [2]: This algorithm computes exact DTW distances on binary length-$n$ time series in $O(n^{1.87})$ time. It has not been implemented and tested in practice.
AWarp [17]: This algorithm is exact for binary time series (a formal proof is missing) and exploits repetitions of zeros. The running time is $O(s \cdot t)$, where $s$ and $t$ are the numbers of non-zero entries in the two input time series.
Sparse DTW (SDTW) [12]: This algorithm yields exact DTW distances for arbitrary time series in $O(s \cdot t + n)$ time, where $s$ and $t$ are the numbers of non-zero entries in the two input time series (assuming both have length $n$).
Binary Sparse DTW (BSDTW) [14]: This algorithm computes exact DTW distances between two binary time series in $O(s \cdot t)$ time, where $s$ and $t$ are the numbers of non-zero entries in the two input time series. In practice, it is often faster than AWarp.
Blocked DTW (BDTW) [19] (earlier introduced as Coarse-DTW [8]): This algorithm operates on run-length encoded time series. The run-length encoding represents a run of identical values (a constant segment) by storing only a single value together with the length of the run. BDTW is exact on binary time series (a formal proof is missing). The running time is $O(k \cdot \ell)$, where $k$ and $\ell$ are the numbers of runs in the two input time series (note that $k \cdot \ell \le n \cdot m$). BDTW is faster than AWarp in practice.
Clearly, a practical limitation of BinDTW and BSDTW is that they are only applicable to binary time series. AWarp and BDTW are limited in that they only yield exact DTW distances for binary time series.
|Algorithm||applicable to||exact for|
|BinDTW [2]||binary||binary|
|AWarp [17]||arbitrary||binary|
|SDTW [12]||arbitrary||arbitrary|
|BSDTW [14]||binary||binary|
|BDTW [19, 8]||arbitrary||binary|
|our algorithm||arbitrary||arbitrary|
We develop an improved algorithm that computes exact DTW distances for arbitrary time series in run-length encoding. The running time for two time series $x$ and $y$ of length $n$ and $m$ is $O(I + k\ell \log \min(k, \ell))$, where $k$ and $\ell$ are the numbers of runs in $x$ and $y$ and $I \in O(k\ell(k+\ell))$ is a number depending on the individual lengths of the runs (see Section 3 for details). Note that this yields a constant-time algorithm for time series with $k, \ell \in O(1)$. In general, the running time is asymptotically faster than $O(n \cdot m)$ if $k \in o(\sqrt{n})$ and $\ell \in o(\sqrt{m})$. If $n = m$ and $k = \ell$, then the running time is faster even for $k \in o(n^{2/3})$. Notably, if all runs in at least one input time series have the same length, then our algorithm runs in $O(I)$ time. If both time series contain only runs of the same fixed length, then our algorithm even runs in $O(k \cdot \ell)$ time and is in fact equivalent to BDTW.
We give some preliminary definitions and introduce notation.
Let $\mathbb{N} := \{1, 2, \ldots\}$ and $[n] := \{1, \ldots, n\}$ for $n \in \mathbb{N}$. An $n \times m$ table $D$ consists of $n$ rows and $m$ columns, where $D[i][j]$ denotes the entry in the $i$-th row and $j$-th column.
A time series is an ordered finite sequence $x = (x_1, \ldots, x_n)$ of rational numbers. The run-length encoding of a time series $x$ is the sequence of pairs $\tilde{x} = ((\tilde{x}_1, a_1), \ldots, (\tilde{x}_k, a_k))$, where $a_i$ is a positive integer denoting the number of consecutive repetitions (the run length) of the value $\tilde{x}_i$ in $x$ (with $\tilde{x}_i \neq \tilde{x}_{i+1}$ for all $i \in [k-1]$). Note that $n = \sum_{i=1}^{k} a_i$. We call $n$ the length of $x$ and we call $k$ the coding length of $x$.
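As a concrete illustration, the run-length encoding can be computed in a single left-to-right pass. The following minimal Python sketch (the function name `run_length_encode` is ours, not from any of the cited works) returns the value/run-length pairs:

```python
def run_length_encode(x):
    """Return the run-length encoding of a time series x as a list of
    (value, run_length) pairs, e.g. [0, 0, 0, 4, 4, 0] -> [(0, 3), (4, 2), (0, 1)]."""
    encoding = []
    for value in x:
        if encoding and encoding[-1][0] == value:
            encoding[-1] = (value, encoding[-1][1] + 1)  # extend the current run
        else:
            encoding.append((value, 1))                  # start a new run
    return encoding

# The length of x is the sum of all run lengths; the coding length is the
# number of runs.
assert sum(a for _, a in run_length_encode([0, 0, 0, 4, 4, 0])) == 6
```

The coding length can be much smaller than the length, which is exactly the situation the algorithm in Section 3 exploits.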
Dynamic Time Warping.
The dynamic time warping distance [18] is a distance measure between time series based on non-linear alignments, which are defined via the concept of a warping path.
A warping path of order $n \times m$ is a sequence $p = (p_1, \ldots, p_L)$, $L \in \mathbb{N}$, of index pairs $p_l = (i_l, j_l) \in [n] \times [m]$, $1 \le l \le L$, such that
(i) $p_1 = (1, 1)$ and $p_L = (n, m)$, and
(ii) $(i_{l+1} - i_l,\; j_{l+1} - j_l) \in \{(1, 0), (0, 1), (1, 1)\}$ for each $l \in [L-1]$.
The set of all warping paths of order $n \times m$ is denoted by $\mathcal{P}_{n,m}$. A warping path $p \in \mathcal{P}_{n,m}$ defines an alignment between two time series $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_m)$ in the following way: a pair $(i, j) \in p$ aligns element $x_i$ with $y_j$, incurring a local cost of $(x_i - y_j)^2$. The DTW distance between $x$ and $y$ is defined as
$$\mathrm{dtw}(x, y) := \min_{p \in \mathcal{P}_{n,m}} \sqrt{\sum_{(i, j) \in p} (x_i - y_j)^2}.$$
It can be computed via dynamic programming in $O(n \cdot m)$ time based on an $n \times m$ table $D$, where $D[i][j]$ holds the squared DTW distance between the prefixes $(x_1, \ldots, x_i)$ and $(y_1, \ldots, y_j)$.
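For reference, the quadratic-time baseline is the following textbook dynamic program, a minimal sketch using the squared local cost and final square root from the definition above:

```python
import math

def dtw(x, y):
    """Classic O(n*m) dynamic program for the DTW distance with local
    cost (x_i - y_j)^2 and a final square root."""
    n, m = len(x), len(y)
    # D[i][j] holds the squared DTW distance of the prefixes x[:i] and y[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            # The three arguments of min correspond to the allowed
            # warping path steps (1, 0), (0, 1), and (1, 1).
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])
```

Warping paths correspond exactly to the monotone paths from $D[1][1]$ to $D[n][m]$ traced by this recurrence.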
3 The Algorithm
In the following, let $x$ and $y$ be two time series with run-length encodings $\tilde{x} = ((\tilde{x}_1, a_1), \ldots, (\tilde{x}_k, a_k))$ and $\tilde{y} = ((\tilde{y}_1, b_1), \ldots, (\tilde{y}_\ell, b_\ell))$. We define $A_i := \sum_{i' \le i} a_{i'}$ for $i \in [k]$ and $B_j := \sum_{j' \le j} b_{j'}$ for $j \in [\ell]$ (thus $A_k = n$ and $B_\ell = m$). Consider the DTW matrix $D$, where $D[i][j]$ is the squared DTW distance between the prefixes $(x_1, \ldots, x_i)$ and $(y_1, \ldots, y_j)$. Note that $D$ can be structured into blocks $\mathcal{B}_{i,j}$ of size $a_i \times b_j$, where each step inside $\mathcal{B}_{i,j}$ has local cost $(\tilde{x}_i - \tilde{y}_j)^2$. The right boundary of $\mathcal{B}_{i,j}$ corresponds to column $B_j$ of $D$ and the top boundary is formed by row $A_i$ of $D$ (see Figure 1).
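The block structure is determined entirely by the prefix sums of the run lengths. The following sketch (the names `A` and `B` are our notation for these prefix sums) computes the boundary coordinates:

```python
from itertools import accumulate

def block_boundaries(rle_x, rle_y):
    """Given run-length encodings [(value, run_length), ...] of x and y,
    return the prefix sums of the run lengths.  Block (i, j) then spans
    rows A[i-1]+1..A[i] and columns B[j-1]+1..B[j] of the DTW matrix,
    and its upper right corner is the entry (A[i], B[j])."""
    A = list(accumulate(a for _, a in rle_x))
    B = list(accumulate(b for _, b in rle_y))
    return A, B

# Example with k = 3 and l = 2 runs (the values 5, 1, 2, 7 are arbitrary):
A, B = block_boundaries([(5, 2), (1, 3), (5, 1)], [(2, 4), (7, 2)])
assert (A, B) == ([2, 5, 6], [4, 6])
```

In the example, block $(2, 1)$ spans rows 3 to 5 and columns 1 to 4, with upper right corner $(5, 4)$.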
We show that it is sufficient to compute only certain entries on the boundaries of blocks instead of all entries in $D$. To this end, we analyze the structure of optimal warping paths. We begin with the following simple observation.
There exists an optimal warping path $p$ such that the following holds for every block $\mathcal{B}_{i,j}$: If $p$ moves through $\mathcal{B}_{i,j}$, then $p$ first moves diagonally through $\mathcal{B}_{i,j}$ until it reaches a boundary of $\mathcal{B}_{i,j}$.
This is true since every step inside a block incurs the same local cost. Hence, it is optimal to maximize the number of diagonal steps (which minimizes the overall number of steps needed to reach a boundary of a block). Observation 3.1 implies that there exists an optimal warping path which is an alternation of diagonal and horizontal (or vertical) subpaths, where the horizontal (vertical) subpaths always lie on top (right) boundaries of blocks.
Now, we restrict the possible diagonals along which such an alternating optimal warping path might move. To this end, let $d_{i,j}$, $(i, j) \in [k] \times [\ell]$, denote the diagonal in $D$ going through the upper right corner of block $\mathcal{B}_{i,j}$ (that is, through the entry $D[A_i][B_j]$), and let $d_0$ be the diagonal (corresponding to $d_{0,0}$) going through $D[0][0]$. We denote the set of all these block diagonals by $\mathcal{D}$ (see Figure 1). Now, our key lemma states that there always exists an optimal warping path which only moves along block boundaries and block diagonals.
There exists an optimal warping path that only moves along diagonals in $\mathcal{D}$ and boundaries of blocks.
By definition, every warping path initially starts in $D[0][0]$ on the diagonal $d_0 \in \mathcal{D}$. Let $p$ be an optimal warping path which alternates between diagonals and block boundaries as described above. Assume that $p$ does not only move along diagonals in $\mathcal{D}$. Then, by assumption, $p$ leaves some diagonal $d \in \mathcal{D}$ on a boundary (wlog horizontally on the top boundary $T$) of a block and (diagonally) enters the neighboring block above before the next intersection of a diagonal $d' \in \mathcal{D}$ with $T$. It then proceeds diagonally in between $d$ and $d'$ until reaching some block boundary where it moves horizontally or vertically again. Note that $p$ has to move horizontally or vertically again at some point since it has to reach a diagonal in $\mathcal{D}$ again (this holds because every warping path eventually ends up in $D[n][m]$ on the diagonal $d_{k,\ell}$). Assume that $p$ moves diagonally only until reaching the top boundary $T'$ of a block, where $p$ moves horizontally (analogous arguments apply if $p$ moves vertically on a right boundary of a block in between $d$ and $d'$). See Figure 2 for an example. Observe that a warping path can only enter blocks from the bottom (that is, from the top boundary of the block below) or from the left (that is, from the right boundary of the block to the left), and it can only exit blocks over their top or right boundaries.
Let $h$ denote the number of horizontal steps of $p$ on $T$ and let $h'$ be the number of horizontal steps on $T'$. Let $q$ denote the diagonal subpath of $p$ from $T$ to $T'$. Now, consider the warping path $p'$ obtained from $p$ by “shifting” $q$ to the right, that is, $p'$ takes $h + 1$ horizontal steps on $T$ and only $h' - 1$ horizontal steps on $T'$. Let $q'$ be the shifted diagonal subpath and note that $q'$ crosses a subset of the blocks crossed by $q$. This is true since there cannot be an upper right corner of any block anywhere in the region between $d$ and $d'$ (since they are neighboring diagonals from $\mathcal{D}$).
Let us now consider the number of steps taken by $p'$ within each block from $T$ to $T'$. Clearly, $p'$ takes one more step than $p$ inside the block whose top boundary is $T$. Regarding the block whose top boundary is $T'$: if $q$ enters it from the bottom, then $p'$ takes one step less inside it; otherwise, if $q$ enters it from the left, then $p'$ takes the same number of steps inside it as $p$. For every block in between which is crossed by $q$, the following holds:
If $q$ crosses the block from its left boundary to its top boundary, then $p'$ takes one more step.
If $q$ crosses the block from its bottom boundary to its right boundary, then $p'$ takes one step less.
If $q$ crosses the block from bottom to top (or from left to right), then $p'$ takes the same number of steps.
The above holds since $q$ cannot pass through an upper right corner of a block in between $d$ and $d'$. Note that the number of steps taken by $p$ and $p'$ through any block differs by at most one.
Now, let $S_+$ be the set of blocks where $p'$ takes more steps than $p$ and let $S_-$ be the set of blocks where $p$ takes more steps than $p'$. Let $c_+ := \sum_{\mathcal{B}_{i,j} \in S_+} (\tilde{x}_i - \tilde{y}_j)^2$ and $c_- := \sum_{\mathcal{B}_{i,j} \in S_-} (\tilde{x}_i - \tilde{y}_j)^2$. Then, the cost difference between $p'$ and $p$ is $\Delta := c_+ - c_-$. By optimality of $p$, we have $\Delta \ge 0$, that is, $c_+ \ge c_-$.
If $\Delta = 0$, then $p'$ is also an optimal warping path. Thus, by analogous arguments, shifting $q$ a total of $h'$ times to the right yields an optimal warping path that does not move horizontally on $T'$ anymore. If this warping path now already moves diagonally along $d'$ (as would be the case in Figure 2 when shifting four times to the right), then this proves the claim. If this is not the case, then analogous arguments apply again for the next occurrence of a horizontal (or vertical) subpath in between $d$ and $d'$. This finally yields an optimal warping path moving along $d$ (or $d'$), proving the claim.
If $\Delta > 0$, then we can analogously shift $q$ to the left to obtain a warping path $p''$. Clearly, the blocks where $p''$ takes one more step than $p$ are exactly the blocks in $S_-$, and the blocks where $p$ takes one more step than $p''$ are exactly the blocks in $S_+$. Hence, the cost difference between $p''$ and $p$ is $c_- - c_+ = -\Delta < 0$, which contradicts the optimality of $p$. ∎
From Lemma 3.2, it follows that $\mathrm{dtw}(x, y)$, that is, $D[n][m]$, can be computed from only those entries in $D$ which are an intersection of a block boundary and a block diagonal in $\mathcal{D}$ (in Figure 1 these intersections are framed in bold). Let $I$ denote the number of these intersections and note that $I \in O(k\ell(k+\ell))$, since each of the at most $k\ell + 1$ diagonals in $\mathcal{D}$ intersects each of the $k + \ell$ boundary rows and columns at most once.
In order to compute the values of $D$ at all intersections via dynamic programming in $O(I)$ time, we need to compute their coordinates and store them in sorted order on each block boundary (in order to allow constant-time lookups). The following lemma accomplishes this task.
The intersections of block diagonals with block boundaries can be computed (sorted on each boundary) in $O(I + k\ell \log \min(k, \ell))$ time, where $I$ is the number of these intersections.
We first determine the ordering (from top to bottom) of all diagonals in $\mathcal{D}$ in terms of their row index at the rightmost column $m$ of $D$. Note that this ordering is the same on all right boundaries (columns $B_j$). Moreover, this ordering also orders the diagonals on all top boundaries (rows $A_i$) from left to right.
To start with, observe that the row index of $d_{i,j}$ at column $m$ is $A_i - B_j + m$. Note that $A_i - B_j + m > n$ is possible, in which case there is no intersection between $d_{i,j}$ and column $m$. We need to sort the numbers $A_i - B_j$ for all $(i, j) \in [k] \times [\ell]$. Clearly, for each $i \in [k]$, we have $A_i - B_j > A_i - B_{j+1}$ for all $j \in [\ell-1]$, and for each $j \in [\ell]$, we have $A_i - B_j < A_{i+1} - B_j$ for all $i \in [k-1]$. That is, we need to sort $k$ sorted sequences of length $\ell$ (or alternatively, $\ell$ sorted sequences of length $k$). This can be done via $k$-way merging in $O(k\ell \log \min(k, \ell))$ time [7]. We can then easily insert the diagonal $d_0$ and remove duplicate diagonals.
To compute all intersections, we now iterate over all diagonals in the ordering determined above. This ensures that we obtain all intersections already sorted on each boundary. For a diagonal $d_{i,j}$, the row index of its intersection with the right boundary column $B_{j'}$ is $A_i - B_j + B_{j'}$. Clearly, if $A_i - B_j + B_{j'} \notin [0, n]$, then no such intersection exists in $D$. Analogously, the column index of the intersection of $d_{i,j}$ with the top boundary row $A_{i'}$ is $B_j - A_i + A_{i'}$. Again, if $B_j - A_i + A_{i'} \notin [0, m]$, then there is no intersection in $D$. Thus, we can compute all intersections in constant time each (see also the accompanying pseudocode). ∎
An interesting question at this point is whether the ordering of the diagonals can be computed in $O(k\ell)$ time. The overall running time in Lemma 3.3 would then simply be $O(I)$ (note that $I \ge k\ell$ since every upper right block corner is an intersection). Recall that the problem is to sort all pairs $(i, j) \in [k] \times [\ell]$ with respect to the sums $A_i - B_j$. This problem is known as “X + Y sorting” [11, 10] (a special case of sorting under partial information), and it is in fact open whether it can be solved faster than sorting $k\ell$ arbitrary numbers [16]. However, if at least one input time series contains only runs of equal length (which is the case, for example, when using piecewise aggregate approximation (PAA) as preprocessing), then sorting can be done faster.
Let $a_1 \le \cdots \le a_k$ and $b_1, \ldots, b_\ell$ be integers such that $b_{j+1} - b_j = c$ for all $j \in [\ell-1]$ and some integer $c \ge 0$. Then, the tuples $(i, j) \in [k] \times [\ell]$ can be sorted with respect to the sum $a_i + b_j$ in $O(k\ell)$ time.
We sort the tuples in increasing order of their sums and write $(i, j) \preceq (i', j')$ if $a_i + b_j \le a_{i'} + b_{j'}$. Clearly, if $i \le i'$ and $j \le j'$, then this implies $(i, j) \preceq (i', j')$. Hence, the first tuple is $(1, 1)$. Moreover, note that since $b_{j+1} - b_j = c$ holds for all $j \in [\ell-1]$, the following holds for all $i, i' \in [k]$ and $j, j' \in [\ell-1]$: If $(i, j) \preceq (i', j')$, then $(i, j+1) \preceq (i', j'+1)$.
Starting with $(1, 1)$, we sort the tuples by iteratively determining the next tuple in constant time each. Here, we use the above properties regarding the partial order of the tuples to show that at each point in the iteration, there are at most two candidate tuples (computable in constant time) which we need to compare. This yields a linear-time algorithm.
Assume that we have already sorted the first $z$ tuples, where $z < k\ell$. Let $i_{\max}$ be the largest index such that the tuple $(i_{\max}, 1)$ has already been sorted (note that then all tuples $(i, 1)$ with $i \le i_{\max}$ are sorted). For each already sorted tuple $(i', j')$ with $j' < \ell$, its successor $(i', j'+1)$ is a candidate for the next tuple. Then, the next tuple can either be $(i_{\max}+1, 1)$ (if $i_{\max} < k$) or one of these successors (if one exists). More precisely, among the successors it can only be the one with the smallest sum, and by the property above, the successors appear in the same relative order as the sorted tuples they succeed. We can thus store the current candidate successor and update it in every iteration in constant time using a queue (see also the accompanying pseudocode). ∎
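Under the stated assumptions (a sorted ascending, equally spaced b-values with spacing c >= 0), the queue-based iteration of the proof boils down to a two-way merge. A minimal 0-indexed sketch (names are ours, not from the paper):

```python
from collections import deque

def sort_pairs_by_sum(a, c, num_j):
    """Sort all pairs (i, j) with 0 <= i < len(a) and 0 <= j < num_j by
    the sum a[i] + j * c in O(len(a) * num_j) time.  Requires a sorted
    ascending and c >= 0.  When a pair (i, j) is output, its successor
    (i, j + 1) is appended to a FIFO queue; since a successor's sum is
    its parent's sum plus the constant c, the queue itself stays sorted,
    so comparing its front with the next unseen pair (i, 0) suffices."""
    key = lambda pair: a[pair[0]] + pair[1] * c
    queue = deque()  # candidate successors, automatically sorted by sum
    fresh = 0        # next index i that has not been output with j = 0 yet
    result = []
    while len(result) < len(a) * num_j:
        if fresh < len(a) and (not queue or key((fresh, 0)) <= key(queue[0])):
            pair = (fresh, 0)
            fresh += 1
        else:
            pair = queue.popleft()
        result.append(pair)
        if pair[1] + 1 < num_j:
            queue.append((pair[0], pair[1] + 1))
    return result
```

The two candidates compared in each iteration are exactly the two mentioned in the proof: the next fresh tuple and the front of the queue.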
If all top boundaries (or all right boundaries) of blocks have equal length, then the intersections of block diagonals with block boundaries can be computed (sorted on each boundary) in $O(I)$ time.
We are now ready to prove our main result.
The DTW distance between two time series $x$ and $y$ can be computed from $\tilde{x}$ and $\tilde{y}$ in $O(I + k\ell \log \min(k, \ell))$ time, where $I$ is the number of intersections between block boundaries and block diagonals in the DTW matrix.
By Lemma 3.2, it is sufficient to compute the values of the DTW matrix $D$ at the intersections of block boundaries and block diagonals. Given the intersection points (computable in $O(I + k\ell \log \min(k, \ell))$ time by Lemma 3.3), their values can be computed block by block, for example from left to right and bottom to top (a warping path can reach a block $\mathcal{B}_{i,j}$ only via boundaries of $\mathcal{B}_{i-1,j}$, $\mathcal{B}_{i,j-1}$, or $\mathcal{B}_{i-1,j-1}$). For each block, we compute the entries on its right boundary from bottom to top, and the entries on its top boundary from left to right.
We start with block $\mathcal{B}_{1,1}$. By Observation 3.1, we can assume that an optimal warping path moves along the diagonal $d_0$ until reaching a boundary of $\mathcal{B}_{1,1}$. This allows us to initialize $D$ as follows: If $a_1 \le b_1$, then, for each intersection $(a_1, j')$ with $a_1 \le j' \le b_1$ on the top boundary, we set $D[a_1][j'] := j' \cdot (\tilde{x}_1 - \tilde{y}_1)^2$. Moreover, we set the values of all intersections on the right boundary column $b_1$ which are below row $a_1$ to $\infty$ since our optimal warping path will not use these. We do the same for all intersections on the top boundary row $a_1$ which are to the left of column $a_1$. Analogously, if $b_1 < a_1$, then, for each intersection $(i', b_1)$ with $b_1 \le i' \le a_1$, we set $D[i'][b_1] := i' \cdot (\tilde{x}_1 - \tilde{y}_1)^2$ and set the remaining intersections to the left and below to $\infty$.
To compute the value of an intersection $(r, B_j)$ on the right boundary of a block $\mathcal{B}_{i,j}$, one only has to consider two options. By Lemma 3.2, an optimal warping path reaches $(r, B_j)$ either diagonally (via the intersection $(r - s, B_j - s)$ of its diagonal with the bottom or left boundary of $\mathcal{B}_{i,j}$) or from below on the column $B_j$. Let $(r', B_j)$, $r' < r$, be the next intersection below $(r, B_j)$ on column $B_j$ (if there is none, then set $D[r'][B_j] := \infty$). Then, the following holds:
$$D[r][B_j] = \min\{\, D[r - s][B_j - s] + s \cdot (\tilde{x}_i - \tilde{y}_j)^2,\; D[r'][B_j] + (r - r') \cdot (\tilde{x}_i - \tilde{y}_j)^2 \,\}.$$
The computation of the values on the top boundary of a block is completely analogous (see also the accompanying pseudocode). Clearly, for the upper right corner of a block there are three corresponding options since it lies on both boundaries. Hence, computing the values of all intersections takes $O(I)$ time. ∎
From Corollary 3.5, we obtain the following special case.
If at least one of the time series $x$ or $y$ contains only runs of equal length, then the DTW distance between $x$ and $y$ can be computed from $\tilde{x}$ and $\tilde{y}$ in $O(I)$ time, where $I$ is the number of intersections between block boundaries and block diagonals in the DTW matrix.
If all block sizes are equal, that is, $a_1 = \cdots = a_k = b_1 = \cdots = b_\ell = c$ for some $c \in \mathbb{N}$, then $I = k\ell$ since the intersections are exactly the upper right block corners. In this case, Lemma 3.2 implies that $\mathrm{dtw}(x, y)$ can be computed in $O(k\ell)$ time.
Let $x$ and $y$ be two time series with $\tilde{x} = ((\tilde{x}_1, c), \ldots, (\tilde{x}_k, c))$ and $\tilde{y} = ((\tilde{y}_1, c), \ldots, (\tilde{y}_\ell, c))$, where $c \in \mathbb{N}$. Then, $\mathrm{dtw}(x, y)$ can be computed in $O(k\ell)$ time.
Note that in this special case the following holds: If an optimal warping path moves through a block $\mathcal{B}_{i,j}$, then wlog it takes exactly $c$ steps through $\mathcal{B}_{i,j}$. Note further that the algorithm Blocked_DTW_UB [19, Algorithm 1] (and also Coarse-DTW [8, Algorithm 2] with $f = \max$) uses the value $\max(a_i, b_j) \cdot (\tilde{x}_i - \tilde{y}_j)^2$ (which equals $\min(a_i, b_j) \cdot (\tilde{x}_i - \tilde{y}_j)^2$ and $c \cdot (\tilde{x}_i - \tilde{y}_j)^2$ in this case) for the cost of crossing block $\mathcal{B}_{i,j}$. That is, we proved that Blocked_DTW_UB (Coarse-DTW) and clearly also Blocked_DTW_LB [19, Algorithm 2] (which uses $\min(a_i, b_j) \cdot (\tilde{x}_i - \tilde{y}_j)^2$) are exact if all blocks are squares.
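To illustrate the exactness claim, the following sketch (our code, not taken from the cited works) runs a block-level recurrence with crossing cost $c \cdot (\tilde{x}_i - \tilde{y}_j)^2$ and checks it against the naive quadratic dynamic program on the expanded series; when every run in both series has length $c$, the two results coincide:

```python
import math

def block_dtw_square(rle_x, rle_y, c):
    """DTW via a DP over the k x l blocks, assuming every run in both
    run-length encoded series has the same length c, so that an optimal
    warping path takes exactly c steps through each block it crosses."""
    k, l = len(rle_x), len(rle_y)
    D = [[math.inf] * (l + 1) for _ in range(k + 1)]
    D[0][0] = 0.0
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            cost = c * (rle_x[i - 1][0] - rle_y[j - 1][0]) ** 2
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[k][l])

def naive_dtw(x, y):
    """Reference O(n*m) dynamic program with local cost (x_i - y_j)^2."""
    n, m = len(x), len(y)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = (x[i - 1] - y[j - 1]) ** 2 + \
                min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return math.sqrt(D[n][m])

# All runs have length c = 2: expand the encodings and compare.
rle_x, rle_y, c = [(0, 2), (3, 2)], [(1, 2), (0, 2), (2, 2)], 2
x = [v for v, a in rle_x for _ in range(a)]
y = [v for v, b in rle_y for _ in range(b)]
assert abs(block_dtw_square(rle_x, rle_y, c) - naive_dtw(x, y)) < 1e-9
```

The block-level DP touches only $k \cdot \ell$ cells instead of $n \cdot m$, which is exactly the speedup stated in the corollary above.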
As regards the value of $I$, a tight upper bound in general is $I \in O(k\ell(k+\ell))$. However, this bound is of course only attained in the worst case. Especially for larger values of $k$ and $\ell$ (in comparison to $n$ and $m$), $I$ might be smaller in practice since not every block diagonal will intersect every boundary (depending on the specific block sizes). Also, some block diagonals might even be identical (if square blocks appear).
As a final remark, we mention that in practice the computation of the intersections needs to be done only once if all time series in a data set have identical block sizes.
We presented a fast algorithm for computing exact DTW distances between run-length encoded time series. Our method might yield improved performance in practice, especially when combined with dimension reduction techniques such as piecewise aggregate approximation [9, 22, 15, 6]. An empirical evaluation of our algorithm and a comparison with the other methods (Table 1) is planned as future work. It is also an interesting question whether the running time can be improved to $O(k \cdot \ell)$ (or even better) in general.
- Abanda et al.  A. Abanda, U. Mori, and J. A. Lozano. A review on distance based time series classification. Data Mining and Knowledge Discovery, pages 1–35, 2018.
- Abboud et al.  A. Abboud, A. Backurs, and V. V. Williams. Tight hardness results for LCS and other sequence similarity measures. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS ’15), pages 59–78, 2015.
- Aghabozorgi et al.  S. Aghabozorgi, A. S. Shirkhorshidi, and T. Y. Wah. Time-series clustering–a decade review. Information Systems, 53:16–38, 2015.
- Bagnall et al.  A. Bagnall, J. Lines, A. Bostrom, J. Large, and E. Keogh. The great time series classification bake off: a review and experimental evaluation of recent algorithmic advances. Data Mining and Knowledge Discovery, 31(3):606–660, 2017.
- Bringmann and Künnemann  K. Bringmann and M. Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS ’15), pages 79–97, 2015.
- Chakrabarti et al.  K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani. Locally adaptive dimensionality reduction for indexing large time series databases. ACM Transactions on Database Systems, 27(2):188–228, 2002.
- Cormen et al.  T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001.
- Dupont and Marteau  M. Dupont and P.-F. Marteau. Coarse-DTW for sparse time series alignment. In First ECML PKDD Workshop on Advanced Analysis and Learning on Temporal Data (AALTD ’15), pages 157–172, 2016.
- Faloutsos et al.  C. Faloutsos, H. Jagadish, A. Mendelzon, and T. Milo. A signature technique for similarity-based queries. In Proceedings of the Compression and Complexity of Sequences 1997 (SEQUENCES ’97), pages 11–13. IEEE, 1997.
- Fredman  M. L. Fredman. How good is the information theory bound in sorting? Theoretical Computer Science, 1(4):355–361, 1976.
- Harper et al.  L. H. Harper, T. H. Payne, J. E. Savage, and E. Straus. Sorting X + Y. Communications of the ACM, 18(6):347–349, 1975.
- Hwang and Gelfand  Y. Hwang and S. B. Gelfand. Sparse dynamic time warping. In , pages 163–175, 2017.
- Hwang and Gelfand  Y. Hwang and S. B. Gelfand. Constrained sparse dynamic time warping. In 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA ’18), pages 216–222, 2018.
- Hwang and Gelfand  Y. Hwang and S. B. Gelfand. Binary sparse dynamic time warping. In Proceedings of the 15th International Conference on Machine Learning and Data Mining in Pattern Recognition (MLDM ’19), 2019.
- Keogh et al.  E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra. Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3(3):263–286, 2001.
- Lambert  J.-L. Lambert. Sorting the sums $(x_i + y_j)$ in $O(n^2)$ comparisons. Theoretical Computer Science, 103(1):137–141, 1992.
- Mueen et al.  A. Mueen, N. Chavoshi, N. Abu-El-Rub, H. Hamooni, and A. Minnich. AWarp: Fast warping distance for sparse time series. In 2016 IEEE 16th International Conference on Data Mining (ICDM ’16), pages 350–359, 2016.
- Sakoe and Chiba  H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1):43–49, 1978.
- Sharabiani et al.  A. Sharabiani, H. Darabi, S. Harford, E. Douzali, F. Karim, H. Johnson, and S. Chen. Asymptotic dynamic time warping calculation with utilizing value repetition. Knowledge and Information Systems, 57(2):359–388, 2018.
- Silva et al.  D. F. Silva, R. Giusti, E. Keogh, and G. Batista. Speeding up similarity search under dynamic time warping by pruning unpromising alignments. Data Mining and Knowledge Discovery, 32(4):988–1016, 2018.
- Wang et al.  X. Wang, A. Mueen, H. Ding, G. Trajcevski, P. Scheuermann, and E. Keogh. Experimental comparison of representation methods and distance measures for time series data. Data Mining and Knowledge Discovery, 26(2):275–309, 2013.
- Yi and Faloutsos  B.-K. Yi and C. Faloutsos. Fast time sequence indexing for arbitrary norms. In Proceedings of the 26th VLDB Conference, pages 385–394, 2000.