# The One-Way Communication Complexity of Dynamic Time Warping Distance

We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using O(n/α) bits for the problem of computing an α-approximation for DTW between strings x and y of length n, and we prove a lower bound of Ω(n / α) bits for the same problem. Our communication protocol works for strings over an arbitrary metric of polynomial size and aspect ratio, and we optimize the logarithmic factors depending on properties of the underlying metric, such as when the points are low-dimensional integer vectors equipped with various metrics or have bounded doubling dimension. We also consider linear sketches of DTW, showing that such sketches must have size Ω(n).

## Authors

• 36 publications
• 33 publications
• 21 publications
• 83 publications
• 52 publications
• ### Dynamic Time Warping in Strongly Subquadratic Time: Algorithms for the Low-Distance Regime and Approximate Evaluation

Dynamic time warping distance (DTW) is a widely used distance measure be...
04/22/2019 ∙ by William Kuszmaul, et al. ∙ 0

• ### Compressed Communication Complexity of Longest Common Prefixes

We consider the communication complexity of fundamental longest common p...
06/08/2018 ∙ by Philip Bille, et al. ∙ 0

• ### Compressed Communication Complexity of Hamming Distance

We consider the communication complexity of the Hamming distance of two ...
03/05/2021 ∙ by Shiori Mitsuya, et al. ∙ 0

• ### The Communication Complexity of Optimization

We consider the communication complexity of a number of distributed opti...
06/13/2019 ∙ by Santosh S. Vempala, et al. ∙ 0

• ### Detecting cliques in CONGEST networks

The problem of detecting network structures plays a central role in dist...
07/03/2018 ∙ by Artur Czumaj, et al. ∙ 0

• ### Cantor-solus and Cantor-multus Distributions

The Cantor distribution is obtained from bitstrings; the Cantor-solus di...
03/20/2020 ∙ by Steven Finch, et al. ∙ 0

• ### Random noise increases Kolmogorov complexity and Hausdorff dimension

Consider a binary string x of length n whose Kolmogorov complexity is α ...
08/14/2018 ∙ by Gleb Posobin, et al. ∙ 0

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

The Dynamic Time Warping (DTW) distance is a widely used distance measure between time series. It is particularly flexible in dealing with temporal sequences that vary in speed. To measure the distance between two sequences, each sequence is “warped” non-linearly in the time dimension (i.e., portions of each sequence are stretched by varying amounts) and the warped sequences are compared by summing up distances between corresponding elements. DTW was popularized in the speech recognition community by Sakoe and Chiba [SC78]. It was introduced in the data mining community for mining time series by Berndt and Clifford [BC94]. It has many applications include phone authentication [DLHB12], signature verification [MP99], speech recognition [MBE10], bioinformatics [AC01], cardiac medicine [CPB98], and song identification [ZS03]

. Several techniques and heuristics have been developed to speed up natural dynamic programming algorithms for it

[Hir75, SC78, KP99, KP00, Keo02, BUWK15, PFW16]. We refer the reader also to Section 2 of [ANCT09] for more references.

Distance measures on sequences and time series have been extensively studied in the literature. Given two sequences and of points in (or a metric space), one seeks to “match the points up” as closely as possible. One way of doing this is to define a “correspondence” between by considering expansions of and to produce sequences of equal length, i.e., we duplicate each point some number times (to produce ) and each point some times (to produce ), so that . Now, we define a vector with , for some underlying distance function and choose the correspondence which minimizes a certain function of . For example, minimizing leads to the Dynamic Time Warping distance. Minimizing leads to the discrete Fréchet distance. The edit distance between strings can be similarly cast in this framework. One unusual aspect of DTW (in contrast to its close cousins, edit distance and Fréchet distance) is that it does not satisfy the triangle inequality.

Edit distance and Fréchet distance have received a lot of attention in the theory community. Fundamental questions such as exact and approximation algorithms, nearest neighbor search, sketching, and communication complexity have been intensively studied. However, there are relatively few results about DTW. Similar to edit distance, DTW can be computed by a quadratic-time dynamic program. Recently, it was shown that there is no strongly subquadratic-time algorithm for DTW unless the Strong Exponential Time Hypothesis is false [BK15, ABW15]; approximation algorithms for DTW were obtained under certain assumptions about properties of the input strings [AFPY15, YPFA16]; and slightly subquadratic algorithms for DTW have also been obtained [GS16]. was studied in the context of LSH [DS17] and nearest neighbor search [SYF05, EP17]. To the best of our knowledge, until now, there has been no study of the communication complexity of this basic distance measure on sequences.

In this paper, we study the one-way communication complexity of DTW. For a distance measure such as DTW, the goal in the one-way communication model is to define a randomized function

and an estimation procedure

so that for any , given and , the output

with large probability. There are various notions of approximation, but a natural one is that

for an approximation factor . The challenge is to understand how large needs to be (for sequences of length ) in order to obtain approximation factor . A closely related notion is that of sketching, where the estimation procedure takes and and we require that with large probability. This one-way communication complexity question has been studied previously for edit distance, in the context of document exchange [BZ16, Bel15, IMS05a]. This model captures a number of applications, e.g., lower bounds in it apply to data stream algorithms and to sketching protocols. Upper bounds in it are appropriate for nearest neighbor search; indeed, the natural thing to do here is a lookup table, so a (one-way) sketch of size bits creates a table of size (see e.g., [ADBIW09]). One-way communication is one of the simplest and most natural settings in which one can study communication complexity, and it has rich connections to areas such as information theory, coding theory, on-line computing, and learning theory [KNR95].

### 1.1 Our Results

Our main result is a tight bound, up to logarithmic factors, on the one-way communication complexity of computing an -approximation to . The results are discussed in more detail below.

We present a communication protocol using bits which works for on any underlying metric space of polynomial size and aspect ratio (Theorem 4.11). We optimize the logarithmic factors in the important case when the points are natural numbers and the distance , as well as more generally when the points are low-dimensional integer vectors equipped with various metrics (Theorem 4.13); we also optimize for the important case where the underlying metric has small doubling dimension (Theorem 4.13). At the cost of an extra logarithmic factor in complexity, all of our protocols are also time-efficient, in that Alice and Bob each run in polynomial time.

Next, we turn to lower bounds. Our communication protocol is non-linear, and we show that in general linear sketches must have size (Theorem 5.6). Moreover, we prove that our upper bounds are within a polylogarithmic factor of tight, establishing a randomized one-way communication lower bound of for any underlying metric space of size at least three, for one-way communication algorithms which succeed with constant success probability (Theorem 5.2). We optimize this in several ways: (1) when the underlying metric is generalized Hamming space over a point set of polynomial size , we improve the lower bound to for algorithms which succeed with probability , and show this is optimal (Theorem 5.3); (2) for the natural numbers, we improve the lower bound to for algorithms which succeed with probability at least (Theorem A.1). We note that our lower bound of applies even to approximating DTW in the low distance regime (i.e., distinguishing versus with constant probability), and that in this regime the edit distance admits a much smaller sketching complexity [BZ16, IMS05b]. To the best of our knowledge, our result provides the first separation between the and the edit distance.

We summarize our results in Table 1.

The layout of the paper is as follows: We present preliminaries in Section 2. We give a detailed overview of our techniques and results in Section 3. In Section 4 we present our upper bounds on the one-way communication complexity of DTW. Finally, in Section 5, we presetn our lower bounds.

## 2 Preliminaries

As a convention, we say an event occurs with high probability if it happens with probability at least for a polynomial of our choice. Throughout the paper, we use to denote a finite metric space. We denote by the set of strings of length over and by the set of strings of length at most over . An important property of will be its aspect ratio, which is defined as the ratio between the diameter of and the smallest distance between distinct points in .

#### Dynamic Time Warping Distance

We study the dynamic warping distance (DTW) of strings . Before we formally define the DTW distance, we first introduce the notion of an expansion of a string.

###### Definition 2.1.

The runs of a string are the maximal substrings consisting of a single repeated letter. Any string obtained from by extending ’s runs is an expansion of .

For example, the runs of are , , , and . Given a string , we can extend a run in by further duplicating the letter which populates the run. For example, the second run in can be extended to obtain , and we say the latter string is an expansion of the first.

Using the notion of an expansion, we can now define dynamic time warping.

###### Definition 2.2.

Consider two strings . A correspondence111A related concept, traversal, is sometimes used in the literature. A traversal can be viewed as the the set of matching edges of a correspondence. between and is a pair of equal-length expansions of and . The edges in a correspondence are the pairs of letters , and the cost of an edge is given by . The cost of a correspondence is the sum of the costs of the edges between the two expansions. A correspondence between and is said to be optimal if it has the minimum attainable cost, and the resulting cost is called the dynamic time warping distance .

When discussing a correspondence , the following terms will be useful.

###### Definition 2.3.

A run in overlaps a run in if there is an edge between them. A letter is matched to a letter if the extended run containing overlaps the extended run containing .

Note that any minimum-length optimal correspondence between strings will be of length at most . In particular if in an optimal correspondence a run in and a run in have both been extended and overlap by at least one letter, then there is a shorter optimal correspondence in which the length of each run is reduced by one. Thus any minimum-length optimal correspondence has the property that every edge contains at least one letter from a run that has not been extended, thereby limiting the length of the correspondence to at most .

DTW can be defined over an arbitrary metric space , and is also well-defined when is a distance function not satisfying the triangle inequality.

Throughout our proofs, we will often refer to DTW over generalized Hamming space, denoted by . As a convention, regardless of what metric space the strings and are initially taken over, is defined to be the DTW-distance between and obtained by redefining the distance function to return on distinct inputs.

#### One-Way Communication Complexity

In this paper, we focus on the one-way communication model. In this model, Alice is given an input , Bob is given an input , and Bob wishes to recover a valid solution to a problem with some solution-set . (For convenience, we will refer to the problem by its solution set .) Alice is permitted to send Bob a single message , which may be computed in a randomized fashion using arbitrarily many public random bits. Bob must then use Alice’s message in order to compute some , which he returns as his proposed solution to .

The pair is a -accurate one-way communication protocol for the problem if for all and , the probability that Bob returns a correct answer to is at least . The protocol is said to have bit complexity at most if Alice’s message is guaranteed not to exceed in length. Moreover, the protocol is said to be efficient if both and can be evaluated in time polynomial in the length of and .

Fix a parameter , the randomized one-way communication complexity of the problem is the minimum attainable bit complexity of a -accurate one-way communication protocol for . The focus of this paper is on the one-way communication complexity of the -DTW problem, defined as follows:

###### Definition 2.4 (α-Dtw).

The problem is parameterized by an approximation parameter . The inputs are a string and a string . The goal is recover an -approximation for . In particular, the set of valid solutions is

 {t∣DTW(x,y)≤t<α⋅DTW(x,y)}.

One can also consider the decision version of this problem, in which one wishes to distinguish between distances at most and distances at greater than :

###### Definition 2.5 (Dtep).

The Decision Threshold Estimation Problem , is paramaterized by a positive threshold and an approximation parameter . The inputs to the problem are a string and a string . An output of is a valid solution if , and an output of is a valid solution of .

Notice that any algorithm for -DTW immediately gives an solution for for any . Conversely, any lower bound for the communication complexity of gives a lower bound for the communication complexity of -DTW. For both of the above two definitions, we may omit the sequence space if it is clear from the context.

## 3 Technical Overview

In this section, we present the statements and proof overviews of our main results.

#### Complexity Upper Bounds:

Our starting point is the following: suppose that for a metric space of polynomial size and aspect ratio, and further that the distances between points are always either or at least . Alice and Bob wish to construct a -accurate one-way protocol for -DTW.

Collapsing Repeated Points. Consider the strings and , formed by reducing each run of length greater than one in and to the same run of length one. If we define to be the length of the longest run in or , then . Indeed, any correspondence between and gives rise to a correspondence between and obtained by duplicating each coordinate in and a total of times. Moreover, since any correspondence between and is also a correspondence between and , it follows that .

Inefficient Protocol via Hashing. Suppose Alice and Bob are guaranteed that , and that the maximum run-length satisfies . Then it suffices for Alice and Bob to compute ; and for this it suffices for Bob to be able to reconstruct . The claim is that from a random hash of of length bits, given , Bob can reconstruct . Indeed, given that , and given that the runs in and are all of length one, one can verify that there must be an optimal correspondence between and such that is obtained from by extending at most runs. Since there are ways to choose which runs in are extended, and since there are then ways to choose the new lengths to which those runs are extended, it follows that there are only options for . Moreover, because and differ in at most positions, for a given option of there are only options for and thus for . Since starting from , there are only options for , meaning that a -bit hash allows Bob to recover with high probability.

Efficiency via Edit Distance Sketch. In addition to requiring that and , the above protocol is inefficient since Bob needs to enumerate over all possibilities of and compute the hash value of each. Exploiting the fact that and contain only runs of length one, we prove that is within a constant factor of the edit distance between and . This means that Alice can instead invoke the edit-distance communication protocol of [IMS05b] of size , which allows Bob to efficiently recover using the fact that the edit distance between and is .

Handling Heavy Hitters. The arguments presented so far require that and contain no runs of length greater than . We call such runs heavy hitters. To remove this restriction, a key observation is that there can be at most heavy hitters. Therefore Alice can communicate to Bob precisely which runs are heavy hitters in using bits. The players then proceed as before: Alice collapses her input to by removing consecutive duplicates, and Bob collapses his input to by removing consecutive duplicates. We still have since any correspondence between and is a correspondence between and . Thus, as before, Bob can reconstruct whenever . Now, though, it could be that because of the positions in and that occur more than times. However, Bob uses his knowledge of the locations and values of the heavy hitters, together with , to create a string formed from by collapsing runs of length less than , and not doing anything to runs of length at least . Now by computing , Bob obtains a -approximation for , since any correspondence between and gives rise to a correspondence between and by duplicating each letter times.

Having handled the heavy hitters, the only remaining requirement by our protocol is that the distances between letters in and be zero and one. Thus we arrive at the following:

###### Proposition 3.1 (Protocol over Hamming Space).

Consider over a metric space of polynomial size with distances zero and one. Then for , there is an efficient -accurate one-way communication protocol for -DTW over which uses bits. Moreover, for any , there is an inefficient -accurate protocol for -DTW using space for any .

Note that our protocol is constructive in that it actually allows for to build a correspondence between and satisfying the desired approximation bounds.

In generalizing to DTW over arbitrary metric spaces, we will use our protocol over Hamming Space as a primitive. Moreover, we will exploit the fact that it can be used to solve a slightly more sophisticated problem which we call bounded -DTW:

###### Definition 3.2 (Bounded α-Dtw).

In the bounded -DTW problem, Alice and Bob are given strings and in . The goal for Bob is:

• If , solve - on .

• If , either solve - on , or return “Fail”.

A crucial observation is that Proposition 3.1 continues to hold without modification if the alphabet has arbitrary distances and our goal is to solve the bounded -DTW problem.

Extending Distance Range via HSTs. The result for the bounded -DTW problem allows for Bob to either determine an -approximation for , or to determine that . As a result the algorithm can be used to distinguish between an . One issue though is that the argument cannot distinguish between larger distances, such as for example between the cases and . A key idea for resolving this issue is to first consider the problem over a -hierarchically well-separated tree metric (HST), and then use the embedding of [FRT04] to embed an arbitrary finite metric of polynomial size and aspect ratio into such a metric. A -hierarchically well-separated tree metric is defined as the shortest path metric on a tree whose nodes are elements of and whose edges have positive weights for which on any root-to-leaf path, the weights are geometrically decreasing by a factor of . Since the weights decrease geometrically, for convenience we define pairwise distances in the tree metric to be the maximum edge length on the tree path between the nodes, a notion of distance which coincides with the sum of edge lengths up to a constant factor.

Suppose the points in correspond to a -hierarchically well-separated tree metric and we wish to distinguish between whether or . A crucial idea is what we call the -simplification of a string , which replaces each character in with its highest ancestor in the tree reachable via edges of weight at most . A key property is that , since for two points in , respectively, either they each get replaced with the same point in the -simplifications of and , or the maximu-length edge on a path between and is the same before and after -simplification. Notice that if a point in is not equal to a point in , then their distance is at least , by the definition of an -simplification. Combining the preceding two observations, if , then and there is a correspondence for which and disagree in at most positions. On the other hand, since we only “collapse” edges of weight at most , we have that if , then , since the optimal correspondence has length at most .

It follows that the cases of and , correspond with the cases of and , and moreover that when , there is an optimal correspondence for which and disagree in at most positions. Thus we can use our protocol for the -bounded DTW problem to figure out which case we are in, for a given . This gives a protocol for distinguishing between whether or .

In order to obtain an -approximation for , the rough idea now is to run the above protocol multiple times in parallel as varies in powers of , and then to find the smallest value of for which the protocol declares . This works as long as points are taken from a -hierarchically well-separated tree metric. In order to extend the result to hold over arbitrary finite metrics of polynomial size and aspect ratio, the final piece is the embedding of [FRT04], which embeds any polynomial size metric into a -hierarchically well-separated tree metric for which for all , and . This “lopsided” guarantee is sufficient for us since it ensures in any correspondence the sum of distances after performing the embedding will not shrink, while for a single fixed optimal correspondence, by a Markov bound the sum of distances after performing the embedding will not increase by more than an factor with constant probability. Putting the pieces together we are able to obtain an efficient -accurate one-way communication protocol for -DTW using bits. Formally, we arrive at the following theorem:

###### Theorem 3.3 (Main Upper Bound).

Let be a metric space of size and aspect ratio polynomial in . Then there is an efficient -accurate one-way communication protocol for -DTW over with space complexity and an inefficient -accurate one-way protocol with complexity .

Optimizing in the Case of Natural Numbers. We can further optimize the logarithmic factors in our upper bound when the underlying alphabet is, for example, the natural numbers and . We handle the case as before. However, for larger values of , we take a different approach.

We first explain the case of distinguishing versus . The idea is to impose a randomly shifted grid of side length , and to round each point in and down to the nearest smaller grid point, resulting in strings and . Define a short edge in a correspondence to be an edge of cost at most , and otherwise call the edge a long edge. We assume w.l.o.g. that any correspondence has length at most .

Suppose first , and consider an optimal correspondence. We will show that the effect of rounding is such that with probability at least , . First we consider what effect rounding has on the short edges. The expected number of short edges with endpoints that get rounded to different grid points is at most

 ∑short edge length llα/4≤4DTW(x,y)α.

Each such edge has its length increased by at most after rounding, and so the expected contribution of short edges to the correspondence after rounding is at most . Since each long edge has its length increase by at most an additive , and its original length is at least , its contribution changes by at most a constant factor, so the total contribution of long edges after rounding is . Hence, when , with probability at least after rounding, we have .

Next suppose , and consider any correspondence. The total change in the cost of the correspondence that can result from the rounding procedure is at most , since there are at most edges in total. Consequently the effect of rounding is such that .

It follows that when comparing the cases of and , there is an -factor gap between in the two cases. Further, after rounding to grid points, all non-equal points have distance at least , and so if , then there is a correspondence on which they differ in at most positions. Thus our protocol for bounded -DTW can be applied to distinguish between the two cases. A similar approach can be used to distinguish between and in general, and this can then be used to solve -DTW similarly as for -hierarchically well-separated tree metrics above. We save roughly a factor here because we do not incur the factor distortion of embedding an arbitrary metric into a tree metric.

We remark that our algorithm in the -dimensional natural number case uses a similar grid snapping as used in [DS17] for their nearest neighbor search algorithm for Frechét distance. Recently, Bringmann (personal communication) obtained a sketch for Frechét distance which builds upon the ideas in [DS17] and uses bits. To the best of our knowledge, these techniques do not yield nontrivial results for Dynamic Time Warping, however.

A Unified Approach. To unify the argument for 2-hierarchically well-separated tree metrics and the natural numbers, we recall the definition of a -separable metric space. A -bounded partition of a metric space is a partition such that the diameter of each part is at most . A distribution over partitions is then called -separating if for all , the probability that and occur in different parts of the partition is at most . We say is -separable if for every , there exists a

-separating probability distribution over

-bounded partitions of . One can also define an efficient notion of this, whereby the distribution over partitions is efficiently sampleable.

By adapting our argument for the natural numbers to -separable metrics of polynomial size and aspect ratio, we obtain an efficient -accurate protocol for - with bit complexity , where the comes from minor technical subtitles. For general metrics, it is known that , while for the natural numbers, . Consequently, our result for -separable metrics captures both the result obtained using HSTs (up to a factor of ) as well as the optimization for the natural numbers. Moreover, the theorem allows for space savings over many additional metrics, such as low-dimensional integer vectors equipped with -norms, metrics with bounded doubling dimension, etc., all of which have , allowing for improvement over our result based on HSTs. The general result we arrive at is captured formally in the following theorem:

###### Theorem 3.4 (Extended Main Upper Bound).

Let be a metric space of size and aspect ratio . Suppose that is efficiently -separable for some . Then there is an efficient -accurate one-way communication protocol for -DTW with space complexity and an inefficient -accurate one-way protocol with space complexity

The proof closely follows that for the natural numbers, where instead of our randomly shifted grid, we use a random -bounded partition. If we are trying to distinguish versus , then we set . Just like for the grid, where we “snapped” points to their nearest grid point, we now snap points to a representative point in each part of the partition, obtaining two new sequences and . By using shared randomness, the representative in each part can be agreed upon without any communication. Just like in the grid case, we show that if , then for the optimal correspondence, in expectation its cost increases only by a constant factor after snapping. On the other hand, if , then we show that for every correspondence, its cost decreases only by a constant factor. The key difference is that now the expected number of short edges with endpoints occurring in different parts of the partition is at most .

#### Complexity Lower Bounds:

The simplest of our lower bounds comes from a reduction from a randomized -way communication lower bound for indexing over large alphabets [JW13]. In this problem, Alice is given a string in for some universe and length parameter , and Bob is given a character and an index . The goal is for Bob to decide if with probability at least . It is known if Alice sends a single message to Bob, then there is an lower bound. By reducing this large-alphabet indexing problem to -DTW when . To perform the reduction, Alice’s input string is mapped to the string . Bob’s inputs of and are mapped to an input string in which the character is repeated times. If , then (due to the characters of that do not get matched with an equal-value letter); otherwise (due to the fact that none of the letters in can be correctly matched). This gives a reduction to -DTW as desired. Using this we have an lower bound for -accurate -DTW, provided the alphabet size is say, at least . Thus we have the following theorem:

###### Theorem 3.5 (Tight Bound Over Hamming Space).

Consider , and consider the generalized Hamming distance over a point-set with of polynomial size . For , .

In order to obtain a nearly tight lower bound for arbitrary finite metric spaces, we construct a more intricate lower bound of which holds whenever . For convenience, we describe the argument for the case of below. The lower bound is achieved via a reduction from the Index problem in which Alice has , Bob has an , and Bob would like to output with probability at least . The randomized -way communication complexity of this problem is . We instantiate . For each , if , Alice creates a string of length consisting of s, followed by s, followed by s; and if , Alice creates a string of length consisting of s, followed by a single , followed by s. She then concatenates into a single string of length . Bob, who is given an index , creates the string ; that is, we have the length- string repeated times, then the string , followed by the string repeated times. (We call each piece of the form and a block.) Notice that if , then , since the single in can match to either the or in the block of Bob’s string . On the other hand, if , the entire run of s in has to appear somewhere in the correspondence and cannot match to the or in the -th piece of Bob’s string, without incurring a cost of . So these s must either “travel” to blocks in or blocks in . Suppose, without loss of generality, most of these s are matched to a block . This has a ripple effect, since it causes the s in the -th block to also have to travel to a block . While this is possible, it then means the s in the -st block must travel to a block even larger than , etc. Eventually, we run out of blocks to match the elements in Alice’s string to since there are blocks in her string that need to be matched to fewer than blocks in Bob’s string. This ultimately forces , completing the reduction from the Index problem to -DTW. The extension of this argument to arbitrary establishes that our upper bound for general metric spaces is optimal up to a polylogarithmic factor:

###### Theorem 3.6 (General Lower Bound).

Let be three letters with a two-point distance function , not necessarily satisfying the triangle inequality. Consider . Then CC.

Our communication protocols are non-linear, and we conclude our lower bounds by showing that linear sketches must have size .

###### Theorem 3.7 (Linear Sketching Lower Bound).

Consider . Then any -error linear sketch for - on has space complexity .

## 4 Upper Bounds

In this section, we present a near optimal one-way communication protocol for -DTW over an arbitrary finite metric space with the constraint that has at most polynomial size and aspect ratio, i.e., the ratio between the largest and the smallest distances between distinct points.

We begin in Subsection 4.1 by considering an easier problem known as bounded -DTW, in which Bob is only required to compute an -approximation for when , and he may instead return “Fail” when (recall that is the number of conflicting edges in a correspondence). Proposition 4.3 gives an efficient one-way protocol for bounded -DTW.

Building on Proposition 4.3, in Subsection 4.2 we then design an efficient one-way protocol for -DTW over well-separated tree metrics (Lemma 4.10).

Then, in Subsection 4.3, Theorem 4.11 provides an efficient one-way protocol for arbitrary finite metric spaces with polynomially bounded size and aspect ratios by first embedding the metric space into a well-separated tree metric and then using Lemma 4.10.

The protocol given by Theorem 4.11 uses bits, which we will later see is within a polylogarithmic factor of optimal. In Subsection 4.4, we prove a further generalization of Theorem 4.11 which for allows for a tighter upper bound by a logarithmic factor for certain important cases of such as when .

### 4.1 The Bounded α-DTW Problem

We define bounded -DTW over a metric space to be the following communication problem.

###### Definition 4.1 (Bounded α-Dtw).

In the bounded -DTW problem, Alice and Bob are given strings and in . The goal for Bob is:

• If , solve - on .

• If , either solve - on , or return “Fail”.

In order to design an efficient one-way communication scheme for bounded -DTW, we will use what we refer to as the -document exchange problem as a primitive. Here, Alice and Bob are given strings and . The goal for Bob is:

• If , recover the string .

• If , either recover or return “Fail”.

The -document exchange problem has been studied extensively [Orl91, Jow12, Bel15, CGK16, IMS05b]. The one-way communication protocol of [IMS05b] efficiently solves -document exchange using bits with high probability. This can be slightly improved at the cost of being no longer time-efficient using the protocol of [Orl91], which achieves accuracy for any by having Alice simply hash her string to a -bits.

The -document exchange problem concerns edit distance rather than DTW. Nonetheless, in designing a sketch for DTW, the -document exchange problem will prove useful due to a convenient relationship between edit distance and DTW over generalized Hamming space (or equivalently ).

###### Lemma 4.2 (Dtw0 Approx. Edit Dist.).

Let be strings of length at most with letters from any metric space, and suppose that neither string contains any runs of length greater than one. Then

 DTW0(x,y)≤ed(x,y)≤3DTW0(x,y).\lx@notefootnoteThiscanbeimprovedto$2DTW0(x,y)$withslightlybetterbookkeeping.
###### Proof.

We first show that . A sequence of edits between and can be thought of as a series of insertions in each of and , as well as substitutions. One can create expansions and of and , respectively, by extending runs by one in each place where the sequence of edits would have performed an insertion. The Hamming distance between and is then at most the length of the sequence of edits. Hence .

Next we show that . Consider an optimal correspondence between and . Without loss of generality, we may assume that whenever two runs in and overlap, at least one of them has length only one. (Indeed, otherwise both runs could have been reduced in size by one at no cost to DTW.) Therefore, any run of length in must overlap distinct runs in , and thus must incur at least Hamming differences. On the other hand, because the run is length , the expansion of the run can be simulated by insertions. Therefore, and can be constructed from and through at most edits. Hence, . ∎

We now present an efficient one-way communication scheme for bounded -DTW. (Note that this also implies Proposition 3.1 from Section 3.)

###### Proposition 4.3 (Protocol for Bounded DTW).

Consider over a metric space of polynomial size. Then for , there is an efficient -accurate one-way communication protocol for bounded -DTW over which uses bits. Moreover, for any , there is an inefficient -accurate protocol for bounded -DTW using space for any .

###### Proof.

We assume without loss of generality that and are integers. Let be a string given to Alice, and be a string given to Bob. Alice can construct a string by taking each run in which is of length less than and reducing its length to one. Notice that trivially and that because any correspondence between and can be turned into a correspondence between and by duplicating every letter in the original correspondence times. Thus if Alice could communicate to Bob, then Bob could solve -DTW.

In an attempt to communicate to Bob, Alice first constructs a list consisting of the pairs for which the -th run in is of length . Alice then sends to Bob. Notice that , and thus can be communicated using bits. Moreover, if Alice defines to be except with every run reduced to length one, then can be recovered from and . Therefore, if Alice could further communicate to Bob, then Bob could solve -DTW.

In an attempt to communicate to Bob, Alice invokes the one-way communication protocol of [IMS05b] for the -document exchange problem. She sends Bob the resulting sketch of size bits which is correct with probability at least . Bob, in turn, defines to be with each run reduced to length one and uses the sketch along with in order to try to recover . If Bob is able to use to recover a value for , then he can correctly solve -DTW with high probability. If, on the other hand, Bob is unable to use to recover a value for , then Bob may conclude with high probability that . Because by Lemma 4.2 and because , we have that . It follows that in this case Bob can correctly return “Fail”.

Rather than using the efficient one-way communication protocol of [IMS05b], Alice could instead invoke the protocol of [Orl91] in which she sends Bob a hash of using and Bob is able to then inefficiently recover correctly with probability at least . Thus we also obtain an inefficient -accurate protocol which uses bits. ∎

### 4.2 Protocol Over Well-Separated Tree Metrics

In this Subsection, we generalize Proposition 4.3 to obtain an efficient communication protocol for -DTW over well-separated tree metrics. Before doing so, we provide a brief background discussion for these metric spaces.

###### Definition 4.4.

Let be a tree whose nodes are the letters from the alphabet and whose edges have positive weights. Moreover, suppose that any root-to-leaf path of edges has non-increasing weights. Then define the distance between two nodes in to be the weight of the heaviest edge in the path between the two nodes. We call such a metric a well-separated tree metric.

If additionally the edges along every root-to-leaf path always decrease in weight by at least a factor of two between consecutive edges, then the metric is said to be a 2-hierarchically well-separated tree metric.

Our definition differs slightly from the standard definition, which defines the metric as simply being the graph distance metric induced by the tree on its nodes. Importantly, these two definitions are essentially equivalent (up to constant factor change in distances) for the case of 2-hierarchically well-separated tree metrics. It was shown in [FRT04] that any finite metric can be embedded into a 2-hierarchically well-separated tree metric with expected distortion . Thus well-separated tree metrics are in some sense universal.

Next we define the notion of the -simplification of a string. We will then use -simplifications to reduce -DTW to bounded -DTW over well-separated tree metrics.

###### Definition 4.5 (r-Simplification).

Let be a well-separated tree metric whose nodes form the alphabet . For a string and for , we construct a string by replacing each letter with the highest ancestor of in to be reachable from via only edges of weight at most . The string is known as ’s -simplification.

The next lemma states three important properties of -simplifications.

###### Lemma 4.6 (Simplification Preserves DTW Gap).

Let be a well-separated tree metric with distance function and whose nodes form the alphabet . Consider strings and in .

Then the following three properties of and hold:

• For every pair of distinct letters and in and , the distance is greater than .

• If then and .

• If , then .

###### Proof.

The first part of the claim follows immediately from the definition of and .

Consider a correspondence between and and define to be the analogous correspondence between and . Without loss of generality the correspondences are each of length at most .

Consider any two letters and which form an edge in the correspondence , and define and to be the corresponding letters in and . If , then we will have , meaning that . If, on the other hand, , then the heaviest edge in the path from to in the tree will also be the heaviest edge in the path from to , meaning that . Combining these two cases, it follows in general both that

 d(l′1,l′2)≤d(l1,l2), (1)

and that

 d(l1,l2)≤d(l′1,l′2)+r/4. (2)

By (1), the cost of is no larger than the cost of . Therefore, . Hence, if , then