# Comparing Temporal Graphs Using Dynamic Time Warping

The connections within many real-world networks change over time. Thus, there has been a recent boom in studying temporal graphs. Recognizing patterns in temporal graphs requires a similarity measure to compare different temporal graphs. To this end, we initiate the study of dynamic time warping (an established concept for mining time series data) on temporal graphs. We propose the dynamic temporal graph warping distance (dtgw) to determine the (dis-)similarity of two temporal graphs. Our novel measure is flexible and can be applied in various application domains. We show that computing the dtgw-distance is a challenging (NP-hard) optimization problem and identify some polynomial-time solvable special cases. Moreover, we develop a quadratic programming formulation and an efficient heuristic. Preliminary experiments indicate that the heuristic performs very well and that our concept yields meaningful results on real-world instances.


## 1 Introduction

A fundamental concept for pattern recognition is the concept of (dis)similarity between objects. For objects that are represented by numerical feature vectors, there exist many well-known (dis)similarity functions such as $\ell_p$-norms or positive semi-definite kernels.

In structural pattern recognition, objects are often more naturally represented by complex (discrete) data structures such as graphs, strings or time series. For these representations, one often cannot simply use vector-based (dis)similarity measures. Instead, one needs to define suitable domain-specific (dis)similarity functions such as the edit distance on graphs and strings or the dynamic time warping distance on time series.

Most graph (dis)similarity functions, such as the graph edit distance [20], graph kernels [6], and geometric graph distances [12], focus on static graphs. However, many complex systems are not static, as the links between entities change dynamically over time. Such temporal networks can be represented by a series of temporal edges between a fixed set of vertices. Examples are face-to-face proximity networks, flight traffic networks, temporal attack networks in computer security, or protein-protein-interaction networks in biology [15, 9, 17]. Thus, there is a steadily growing research interest in analyzing temporal networks [26]. In order to perform data mining tasks such as classification or clustering on temporal networks, one needs suitable (dis)similarity functions.

We introduce a novel (dis)similarity measure on temporal graphs based on dynamic time warping, called dynamic temporal graph warping. Thus, by combining established methods from graph-based pattern recognition and time series data mining in a nontrivial way, we obtain a suitable tool to analyze temporal network data. Beyond that, we study its computational complexity, develop efficient algorithms and study their behavior on real-world data, the latter confirming the practical usefulness.

#### Related Work.

There are numerous approaches to define graph (dis)similarity measures. A well-known example is the (NP-hard) graph edit distance [20]. Graph kernels (many of which are polynomial-time computable) are another well-studied class [6, 8]. Measuring graph distance based on vertex mappings using local vertex signatures was introduced by Jouili and Tabbone [14]. The idea of using vertex mappings can also be found in optimal assignment kernels [5, 16, 2]. Regarding (dis)similarity measures on temporal graphs, seemingly little work has been done so far. Elhesha et al. [4] recently described an approach based on vertex mappings. Their method, however, does not allow for a flexible alignment between time layers.

Dynamic Time Warping [21] is an established measure for mining time series data [19, 24, 22] which is specifically designed to cope with temporal distortion in the data via nonlinear alignment of observations. We lift this well-established concept to the domain of temporal graphs.

#### Our Contributions.

We define the dynamic temporal graph warping distance as a twofold discrete minimization problem involving the computation of an optimal vertex mapping and an optimal warping path (see Section 3). We prove that it is NP-hard to solve in general (Theorem 4.1). In contrast, we point out several polynomial-time solvable special cases, namely the case when either a vertex mapping or a warping path is fixed (Observation 3.1), the case of deciding whether the dtgw-distance is zero (Theorem 5.1), and the case when the lifetimes of the two temporal graphs differ only by a constant and the warping path length is restricted (Proposition 5.2). Moreover, we give a quadratic programming formulation (Section 5.1) and propose an efficient heuristic approach (Section 5.2).

We evaluate the heuristic in some experiments on real-world data to show its efficiency and quality of solution (Section 6).

#### Organization.

In Section 2 we introduce basic definitions. Section 3 contains our main definition of the dtgw-distance followed by some computational hardness results in Section 4 and algorithmic results in Section 5. Finally, Section 6 presents experimental results on some real-world data.

## 2 Preliminaries

For $n \in \mathbb{N}$, we define $[n] := \{1, \dots, n\}$. For a set $X$, we denote the set of all size-$k$ subsets of $X$ by $\binom{X}{k}$.

#### Temporal Graphs.

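A temporal graph $\mathcal{G} = (V, (E_1, \dots, E_T))$ consists of a vertex set $V$ and a sequence of $T$ edge sets $E_1, \dots, E_T \subseteq \binom{V}{2}$. By $G_i := (V, E_i)$, we denote the $i$th layer of $\mathcal{G}$, and we call $T$ the lifetime of $\mathcal{G}$. The underlying graph of $\mathcal{G}$ is the graph $(V, \bigcup_{i=1}^{T} E_i)$. We remark that all definitions and results in this work can easily be extended to labeled temporal graphs (with vertex and/or edge labels).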
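As a concrete illustration of these definitions, the following Python sketch stores a temporal graph as a fixed vertex set plus one edge set per layer. The class `TemporalGraph`, the helper `edge`, and all method names are hypothetical; layers are 1-indexed to match the text, and `layer_degree` is included because vertex degrees are a natural choice of vertex signature.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable, List, Set

Edge = FrozenSet[Hashable]


def edge(u: Hashable, v: Hashable) -> Edge:
    """An undirected edge {u, v}."""
    return frozenset((u, v))


@dataclass
class TemporalGraph:
    """Hypothetical container for a temporal graph (V, (E_1, ..., E_T))."""

    vertices: Set[Hashable]
    layers: List[Set[Edge]]  # layers[i - 1] holds the edge set E_i

    def __post_init__(self) -> None:
        # Every edge must be a 2-element subset of the fixed vertex set.
        for layer in self.layers:
            for e in layer:
                if len(e) != 2 or not e <= self.vertices:
                    raise ValueError(f"invalid edge {set(e)}")

    @property
    def lifetime(self) -> int:
        return len(self.layers)

    def layer_degree(self, i: int, v: Hashable) -> int:
        """Degree of v in the i-th layer G_i (1-indexed)."""
        return sum(1 for e in self.layers[i - 1] if v in e)

    def underlying_edges(self) -> Set[Edge]:
        """Edge set of the underlying graph: the union of all layers."""
        return set().union(*self.layers)
```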

#### Vertex Mapping.

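A vertex mapping between two vertex sets $V$ and $W$ is a set $M \subseteq V \times W$ containing tuples such that for all $(v, w), (v', w') \in M$ with $(v, w) \neq (v', w')$ it holds that $v \neq v'$ and $w \neq w'$. We denote the set of all vertex mappings between $V$ and $W$ by $\mathcal{M}(V, W)$. Let $V_M \subseteq V$ be the subset of vertices in $V$ that are contained in some tuple of $M$ ($W_M$ is defined analogously).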
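The matching condition on a vertex mapping can be checked directly. A small sketch with hypothetical helper names, where `covered(M, 0)` returns $V_M$ and `covered(M, 1)` returns $W_M$:

```python
from typing import Hashable, Iterable, Set, Tuple


def is_vertex_mapping(M: Iterable[Tuple[Hashable, Hashable]], V: Set, W: Set) -> bool:
    """True iff M is a subset of V x W in which no vertex appears twice."""
    pairs = list(M)
    left = [v for v, _ in pairs]
    right = [w for _, w in pairs]
    return (all(v in V and w in W for v, w in pairs)
            and len(set(left)) == len(left)
            and len(set(right)) == len(right))


def covered(M: Iterable[Tuple[Hashable, Hashable]], side: int) -> Set:
    """V_M for side=0, W_M for side=1: vertices contained in some tuple of M."""
    return {pair[side] for pair in M}
```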

#### Assignment Problem.

The assignment problem is a fundamental problem in combinatorial optimization. Given two sets $A$ and $B$ of equal size $n$ and a cost function $\kappa\colon A \times B \to \mathbb{Q}$, the goal is to find a bijection $\pi\colon A \to B$ such that $\sum_{a \in A} \kappa(a, \pi(a))$ is minimized. It is well known that the assignment problem can be described as an integer linear program and is solvable in $O(n^3)$ time [1, Theorem 12.2].
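For intuition, the objective can be evaluated by brute force over all bijections. This sketch (function name hypothetical) enumerates all $n!$ permutations and is therefore only a reference for tiny instances; it is not the $O(n^3)$ algorithm cited above. In practice one would call a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def assignment_bruteforce(cost: Sequence[Sequence[float]]) -> Tuple[float, List[int]]:
    """Minimum-cost bijection pi: [n] -> [n] for an n x n cost matrix.

    cost[a][b] is kappa(a, b); pi is returned as the list [pi(0), ..., pi(n-1)].
    """
    n = len(cost)
    best_val, best_perm = float("inf"), []
    for perm in permutations(range(n)):
        val = sum(cost[a][perm[a]] for a in range(n))
        if val < best_val:
            best_val, best_perm = val, list(perm)
    return best_val, best_perm
```

For the matrix `[[4, 1, 3], [2, 0, 5], [3, 2, 2]]`, the optimal bijection is `[1, 0, 2]` with cost 5.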

#### Dynamic Time Warping.

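The dynamic time warping distance [21] is a distance between time series. It is based on the concept of a warping path. A warping path of order $m \times n$ is a set $p = \{p_1, \dots, p_L\}$ of $L$ pairs $p_\ell = (i_\ell, j_\ell) \in [m] \times [n]$ such that

• $p_1 = (1, 1)$ and $p_L = (m, n)$, and

• $(i_{\ell+1} - i_\ell, j_{\ell+1} - j_\ell) \in \{(1, 0), (0, 1), (1, 1)\}$ for all $\ell \in [L-1]$.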

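We denote the set of all warping paths of order $m \times n$ by $\mathcal{P}_{m,n}$. For two temporal graphs $\mathcal{G}$ and $\mathcal{H}$ with lifetimes $T$ and $U$, respectively, every order-$(T \times U)$ warping path $p \in \mathcal{P}_{T,U}$ defines a warping between $\mathcal{G}$ and $\mathcal{H}$, that is, a pair $(i, j) \in p$ warps the layer $G_i$ to $H_j$.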
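The two conditions of the definition translate into a short validity check. A sketch with a hypothetical function name; since a valid warping path is monotone in both coordinates, sorting the pairs recovers their order along the path:

```python
from typing import Iterable, Tuple


def is_warping_path(pairs: Iterable[Tuple[int, int]], m: int, n: int) -> bool:
    """Check the definition of an order-(m x n) warping path (1-indexed pairs)."""
    # Valid paths are monotone in both coordinates, so sorting recovers their order.
    p = sorted(pairs)
    if not p or p[0] != (1, 1) or p[-1] != (m, n):
        return False
    steps = {(1, 0), (0, 1), (1, 1)}
    return all((b[0] - a[0], b[1] - a[1]) in steps for a, b in zip(p, p[1:]))
```

Each pair $(i, j)$ of an accepted set then warps layer $G_i$ to layer $H_j$.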

#### Parameterized Complexity.

We assume the reader to be familiar with basic concepts of computational complexity theory such as NP-completeness. In parameterized complexity theory [3], one considers running times with respect to two dimensions. One dimension is the size of the input instance $I$ and the other dimension is a parameter $k$ (usually a numerical value). An instance of a parameterized problem is a pair $(I, k)$. The class FPT contains all fixed-parameter tractable parameterized problems, that is, problems that can be solved in time $f(k) \cdot |I|^{O(1)}$ for some computable function $f$ only depending on $k$. The class XP contains all parameterized problems that can be solved in polynomial time for every constant parameter value, that is, in time $|I|^{f(k)}$ (clearly, $\text{FPT} \subseteq \text{XP}$).

## 3 Dynamic Temporal Graph Warping (DTGW)

In this section we define a temporal graph distance based on dynamic time warping that uses a vertex-signature-based graph distance as local cost function. We choose this graph distance for the following reasons. First, it is computationally tractable (in contrast to the NP-hard graph edit distance). Second, it is based on a mapping between the two vertex sets (possibly of different sizes), which might be reasonable in many temporal network applications since it enforces a mapping that is consistent over time. Third, vertex signatures allow for high flexibility since they can be chosen arbitrarily (as can the metric) in order to incorporate essential information for the application at hand (e.g. they can be used for weighted or labeled temporal graphs).

#### Graph Distance Based on Vertex Signatures.

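The following approach is due to Jouili and Tabbone [14]. For a (static) graph $G = (V, E)$, a vertex signature function $f_G\colon V \to \mathbb{R}^k$ encodes (local) information about a vertex (e.g. its degree). Let $d\colon \mathbb{R}^k \times \mathbb{R}^k \to \mathbb{R}_{\ge 0}$ be a metric.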

For two (static) graphs $G = (V, E)$ and $H = (W, F)$ with vertex signatures $f_G$ and $f_H$ and a given vertex mapping $M$ between $V$ and $W$, we define the cost of $M$ as

$$C(G, H, M) := \sum_{(u,v) \in M} d(f_G(u), f_H(v)) + \sum_{v \in V \setminus V_M} \Delta_G(v) + \sum_{v \in W \setminus W_M} \Delta_H(v),$$

where $\Delta_G(v)$ (respectively $\Delta_H(v)$) is the (predefined) cost of “deleting” vertex $v$ from $G$ (respectively $H$), since it is not mapped by $M$ to any vertex in the other vertex set. The value $\Delta_G(v)$ might, for example, depend on the vertex signature of $v$.

The vertex-signature-based distance between $G$ and $H$ is then defined as

$$D(G, H) := \min_{M \in \mathcal{M}(V, W)} C(G, H, M).$$

Depending on the application, one might normalize the distance $D(G, H)$ by some appropriate factor (typically depending on $|V|$ and $|W|$, as done by Jouili and Tabbone [14]).

Throughout this work, we assume that vertex signature functions $f_G$ are computable in polynomial time in the size of $G$ and that all metrics $d$ are polynomial-time computable. We neglect the running times for computing these values (we can actually assume that all vertex signatures are precomputed once in polynomial time).
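To make $C(G, H, M)$ and $D(G, H)$ concrete, here is a brute-force Python sketch under illustrative assumptions that the text does not prescribe: vertex degrees as signatures, $d(x, y) = |x - y|$, and the same constant deletion cost `delta` for every vertex. Enumerating all vertex mappings takes exponential time, so this only serves to check small examples; the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterator, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # static graph as adjacency sets
Mapping = List[Tuple[Hashable, Hashable]]


def cost(G: Graph, H: Graph, M: Mapping, delta: float = 1.0) -> float:
    """C(G, H, M) with degree signatures, d(x, y) = |x - y|, and a constant
    deletion cost delta for every unmapped vertex (illustrative choices)."""
    mapped = sum(abs(len(G[u]) - len(H[v])) for u, v in M)
    unmapped = (len(G) - len(M)) + (len(H) - len(M))
    return mapped + delta * unmapped


def all_mappings(V: List[Hashable], W: Set[Hashable]) -> Iterator[Mapping]:
    """Yield every vertex mapping between V and W (exponentially many)."""
    if not V:
        yield []
        return
    u, rest = V[0], V[1:]
    yield from all_mappings(rest, W)  # u stays unmapped ("deleted")
    for w in W:
        for M in all_mappings(rest, W - {w}):  # u is mapped to w
            yield [(u, w)] + M


def signature_distance(G: Graph, H: Graph, delta: float = 1.0) -> float:
    """D(G, H): minimum of C(G, H, M) over all vertex mappings (brute force)."""
    return min(cost(G, H, M, delta) for M in all_mappings(list(G), set(H)))
```

For a path on three vertices versus a single edge, the best mapping matches the two endpoints and deletes the middle vertex, so $D$ equals one deletion cost.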

#### Dynamic Time Warping Distance for Temporal Graphs.

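We transfer the concept of dynamic time warping to temporal graphs in the following way. Let $\mathcal{G} = (V, (E_1, \dots, E_T))$ and $\mathcal{H} = (W, (F_1, \dots, F_U))$ be two temporal graphs and let $f_{G_i}$, $i \in [T]$, and $f_{H_j}$, $j \in [U]$, be corresponding vertex signature functions for their layers.

We define the vertex-signature-based dynamic temporal graph warping distance (dtgw-distance) between $\mathcal{G}$ and $\mathcal{H}$ as

$$\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) := \min_{M \in \mathcal{M}(V, W)} \min_{p \in \mathcal{P}_{T,U}} \sum_{(i,j) \in p} C(G_i, H_j, M).$$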

Figure 1 depicts an example illustrating the dtgw-distance of two temporal graphs. Note that (for $T = U$) if one fixes $p = \{(i, i) \mid i \in [T]\}$, then we get a temporal graph distance without time warping.
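The objective inside the two minima can be evaluated for one fixed pair $(M, p)$, which also shows the effect of time warping compared to the diagonal path. A sketch assuming precomputed per-layer signatures (one dictionary per layer), $d(x, y) = |x - y|$ and a constant deletion cost; all names are hypothetical:

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Signatures = Dict[Hashable, float]  # vertex -> signature within one layer
Mapping = Sequence[Tuple[Hashable, Hashable]]


def layer_cost(sg: Signatures, sh: Signatures, M: Mapping, delta: float = 1.0) -> float:
    """C(G_i, H_j, M) with d(x, y) = |x - y| and constant deletion cost delta."""
    mapped = sum(abs(sg[u] - sh[v]) for u, v in M)
    return mapped + delta * ((len(sg) - len(M)) + (len(sh) - len(M)))


def warping_cost(SG: List[Signatures], SH: List[Signatures], M: Mapping,
                 p: Sequence[Tuple[int, int]], delta: float = 1.0) -> float:
    """Objective of the dtgw-distance for one fixed pair (M, p); layers are 1-indexed."""
    return sum(layer_cost(SG[i - 1], SH[j - 1], M, delta) for i, j in p)


def diagonal_cost(SG: List[Signatures], SH: List[Signatures], M: Mapping,
                  delta: float = 1.0) -> float:
    """Special case T = U with p = {(i, i)}: a distance without time warping."""
    if len(SG) != len(SH):
        raise ValueError("the diagonal path needs T = U")
    return warping_cost(SG, SH, M, [(i, i) for i in range(1, len(SG) + 1)], delta)
```

For example, with layer signatures (1, 1), (1, 1), (0, 0) versus (1, 1), (0, 0), (0, 0) and the mapping $\{(1, a), (2, b)\}$, the diagonal path costs 2, while the warping path $\{(1,1), (2,1), (3,2), (3,3)\}$ costs 0.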

The following results are easily observed and play a central role for our subsequent algorithms.

###### Observation 3.1.

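Let $\mathcal{G} = (V, (E_1, \dots, E_T))$ and $\mathcal{H} = (W, (F_1, \dots, F_U))$ be two temporal graphs and let $n := \max\{|V|, |W|\}$.

1. For a fixed vertex mapping $M$ between $V$ and $W$, an optimal warping path (and its cost) can be computed in $O(T \cdot U \cdot n)$ time.

2. For a fixed warping path $p \in \mathcal{P}_{T,U}$, an optimal vertex mapping (and its cost) can be computed in $O(n^2 \cdot |p| + n^3)$ time.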

###### Proof.
1. Let $M \in \mathcal{M}(V, W)$ be a fixed vertex mapping. Then, we need to compute

$$\min_{p \in \mathcal{P}_{T,U}} \sum_{(i,j) \in p} C(G_i, H_j, M).$$

This minimum can be computed by the well-known dynamic program for dynamic time warping in $O(T \cdot U \cdot \tau)$ time [21], where $\tau \in O(n)$ is the time required to compute $C(G_i, H_j, M)$.

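2. Let $p \in \mathcal{P}_{T,U}$ be a fixed warping path. Assume without loss of generality that $|V| \le |W|$ and let $V' := V \cup Q$, where $Q$ is a set of dummy vertices with $|Q| = |W| - |V|$. For every $(u, v) \in V' \times W$, let

$$\sigma(u, v) := \begin{cases} \sum_{(i,j) \in p} d(f_{G_i}(u), f_{H_j}(v)), & u \in V,\\ \sum_{(i,j) \in p} \Delta_{H_j}(v), & u \in Q. \end{cases}$$

Then, the minimum cost for the fixed warping path $p$ is

$$\min_{M \in \mathcal{M}(V', W)} \sum_{(u,v) \in M} \sigma(u, v).$$

Note that here the vertex mapping $M$ defines a bijection between $V'$ and $W$. Hence, computing this minimum is an assignment problem solvable in $O(n^3)$ time [1, Theorem 12.2]. Computing all values $\sigma(u, v)$ can be done in $O(n^2 \cdot |p|)$ time.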
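Both parts of the proof can be sketched in Python under illustrative assumptions: per-layer signature dictionaries, $d(x, y) = |x - y|$, and a constant deletion cost `delta`. Part 1 is the classic DTW recurrence; part 2 builds $\sigma$ with dummy vertices (represented by `None`, so real vertex labels must not be `None`) and, for brevity, solves the assignment problem by brute force instead of an $O(n^3)$ algorithm. The function names are hypothetical.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Sequence, Tuple

Signatures = Dict[Hashable, float]  # vertex -> signature within one layer
Mapping = Sequence[Tuple[Hashable, Hashable]]


def layer_cost(sg: Signatures, sh: Signatures, M: Mapping, delta: float = 1.0) -> float:
    """C(G_i, H_j, M) with d(x, y) = |x - y| and constant deletion cost delta."""
    mapped = sum(abs(sg[u] - sh[v]) for u, v in M)
    return mapped + delta * ((len(sg) - len(M)) + (len(sh) - len(M)))


def best_path_for_mapping(SG: List[Signatures], SH: List[Signatures], M: Mapping,
                          delta: float = 1.0) -> float:
    """Part 1: standard DTW dynamic program over the T x U layer costs."""
    T, U = len(SG), len(SH)
    INF = float("inf")
    D = [[INF] * (U + 1) for _ in range(T + 1)]
    D[0][0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            c = layer_cost(SG[i - 1], SH[j - 1], M, delta)
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[T][U]


def best_mapping_for_path(SG: List[Signatures], SH: List[Signatures],
                          p: Sequence[Tuple[int, int]], delta: float = 1.0) -> float:
    """Part 2 (|V| <= |W|): pad V with dummies Q and solve an assignment
    problem on the aggregated costs sigma (brute force over permutations)."""
    V, W = list(SG[0]), list(SH[0])
    if len(V) > len(W):
        raise ValueError("swap the arguments so that |V| <= |W|")
    padded = V + [None] * (len(W) - len(V))  # None stands for a dummy in Q

    def sigma(u, v) -> float:
        if u is None:  # dummy: v is deleted in every warped layer pair
            return delta * len(p)
        return sum(abs(SG[i - 1][u] - SH[j - 1][v]) for i, j in p)

    return min(sum(sigma(u, W[k]) for u, k in zip(padded, perm))
               for perm in permutations(range(len(W))))
```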

Note that Observation 3.1 implies that if we already know the vertex mapping up to a constant number of vertices, then $\operatorname{dtgw}(\mathcal{G}, \mathcal{H})$ can be computed in polynomial time (since we can try out all polynomially many possible completions of the vertex mapping).

For a given vertex signature function and metric, we refer to the decision problem of testing whether two temporal graphs have dynamic temporal graph warping distance at most some given $c \in \mathbb{Q}$ as DTGW.

Dynamic Temporal Graph Warping (DTGW)

Input: Two temporal graphs $\mathcal{G}$ and $\mathcal{H}$, and $c \in \mathbb{Q}$.

Question: Is $\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) \le c$?

By Observation 3.1, DTGW is polynomial-time solvable if one temporal graph has a constant lifetime or a constant number of vertices, since then there are only polynomially many possible warping paths or vertex mappings, respectively.
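The claim about constant lifetimes can be checked by counting warping paths. The number of order-$(T \times U)$ warping paths is the Delannoy number $D(T-1, U-1)$, which for fixed $T$ is a polynomial of degree $T - 1$ in $U$. A small dynamic-programming sketch (function name hypothetical):

```python
def count_warping_paths(T: int, U: int) -> int:
    """|P_{T,U}|: number of order-(T x U) warping paths (Delannoy numbers)."""
    # N[i][j] counts the warping paths from (1, 1) to (i, j).
    N = [[0] * (U + 1) for _ in range(T + 1)]
    N[1][1] = 1
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            if (i, j) != (1, 1):
                N[i][j] = N[i - 1][j] + N[i][j - 1] + N[i - 1][j - 1]
    return N[T][U]
```

For instance, a temporal graph with lifetime $T = 2$ admits exactly $2U - 1$ warping paths against one with lifetime $U$.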

## 4 Computational Hardness

Even though the dynamic time warping distance and the vertex-signature-based graph distance are both computable in polynomial time, their combined application to temporal graphs yields a distance measure that is generally NP-hard to compute:

###### Theorem 4.1.

DTGW is NP-complete for every metric when the vertex signatures are vertex degrees.

###### Proof.

DTGW is clearly contained in NP since, for a given vertex mapping and warping path (both having polynomial size), one can check in polynomial time whether the dtgw-distance is at most $c$ (also see Observation 3.1).

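To show NP-hardness, we give a polynomial-time reduction from 3-SAT. Let $d$ be any metric and let $\phi = C_1 \wedge \dots \wedge C_m$ be an instance of 3-SAT over the variables $x_1, \dots, x_n$. Each clause $C_j$ is then a disjunction $\ell^1_j \vee \ell^2_j \vee \ell^3_j$ of three literals and there is a function $\nu\colon [m] \times [3] \to [n]$ such that $\ell^i_j \in \{x_{\nu(j,i)}, \overline{x}_{\nu(j,i)}\}$ holds for all $i \in [3]$ and $j \in [m]$. We may assume .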

Our idea is to represent each literal by a vertex which can be mapped to either $\top_i$ (true) or $\bot_i$ (false). We then build, for each clause, a clause block gadget consisting of three consecutive layers. The choice of warping path will then, for each clause, implicitly select one of its literals, and the costs caused by each clause block will attain their minimum value if and only if that particular literal is mapped to a vertex $\top_i$.

We now give the details. Let $D$ and $D'$ be two copies of a graph consisting of disjoint edges, where for each vertex $v \in V(D)$ we denote its copy in $D'$ by $v'$. We construct two temporal graphs $\mathcal{G}$ and $\mathcal{H}$. Their vertex sets are defined as follows.

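$$\begin{aligned} V(\mathcal{G}) &:= \{x_i, \overline{x}_i \mid i \in [n]\} \cup \{C^1_j, C^2_j, C^3_j \mid j \in [m]\} \cup \{X_i, Y_i \mid i \in [4]\} \cup V(D),\\ V(\mathcal{H}) &:= \{\top_i, \bot_i \mid i \in [n]\} \cup \{{C'}^1_j, {C'}^2_j, {C'}^3_j \mid j \in [m]\} \cup \{X'_i, Y'_i \mid i \in [4]\} \cup V(D'). \end{aligned}$$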

Both temporal graphs have $2n + 4m + 22m$ layers, defined as follows. For each $i \in [n]$, we set

$$E(G_{2i-1}) := \{\{x_i, \overline{x}_i\}\}, \quad E(H_{2i-1}) := \{\{\top_i, \bot_i\}\}, \quad E(G_{2i}) := E(D), \quad E(H_{2i}) := E(D').$$

For $j \in [m]$, we set

$$\begin{aligned} E(G_{2n+4j-3}) &:= \{\{X_i, Y_i\} \mid i \in [4]\}, & E(G_{2n+4j-2}) &:= \{\{C^i_j, \ell^i_j\} \mid i \in [3]\},\\ E(G_{2n+4j-1}) &:= \{\{X_i, Y_i\} \mid i \in [4]\}, & E(G_{2n+4j}) &:= E(D), \end{aligned}$$

and

$$\begin{aligned} E(H_{2n+4j-3}) &:= \{\{{C'}^1_j, \top_{\nu(j,1)}\}, \{\top_{\nu(j,2)}, \bot_{\nu(j,2)}\}, \{\top_{\nu(j,3)}, \bot_{\nu(j,3)}\}, \{{C'}^2_j, {C'}^3_j\}\},\\ E(H_{2n+4j-2}) &:= \{\{{C'}^2_j, \top_{\nu(j,2)}\}, \{\top_{\nu(j,1)}, \bot_{\nu(j,1)}\}, \{\top_{\nu(j,3)}, \bot_{\nu(j,3)}\}, \{{C'}^1_j, {C'}^3_j\}\} \cup \{\{X'_i, Y'_i\} \mid i \in [4]\},\\ E(H_{2n+4j-1}) &:= \{\{{C'}^3_j, \top_{\nu(j,3)}\}, \{\top_{\nu(j,1)}, \bot_{\nu(j,1)}\}, \{\top_{\nu(j,2)}, \bot_{\nu(j,2)}\}, \{{C'}^1_j, {C'}^2_j\}\},\\ E(H_{2n+4j}) &:= E(D'). \end{aligned}$$

Finally, for $j \in [22m]$, we set

$$E(G_{2n+4m+j}) := \{\{X_k, Y_k\} \mid k \in [4]\}, \qquad E(H_{2n+4m+j}) := \{\{X'_k, Y'_k\} \mid k \in [4]\}.$$

We call the layers containing edges separation layers. Furthermore, for each $j \in [m]$, we say that the layers $2n+4j-3$, $2n+4j-2$, and $2n+4j-1$ form the clause block corresponding to $C_j$ (see Fig. 2 for an example).

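Let $c$ be the cost bound of the constructed instance. We claim that $\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) \le c$ if and only if $\phi$ has a satisfying assignment.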

($\Leftarrow$): Given a satisfying assignment $\beta$ of $\phi$, we define the following vertex mapping

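$$\begin{aligned} M := {} & \{(x_i, \top_i), (\overline{x}_i, \bot_i) \mid \beta(x_i) = \text{true}\} \cup \{(x_i, \bot_i), (\overline{x}_i, \top_i) \mid \beta(x_i) = \text{false}\}\\ & \cup \{(C^i_j, {C'}^i_j) \mid i \in [3], j \in [m]\} \cup \{(X_i, X'_i), (Y_i, Y'_i) \mid i \in [4]\} \cup \{(v, v') \mid v \in V(D)\}. \end{aligned}$$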

To construct a warping path, we begin by defining, for each $j \in [m]$, the following three sub-paths (see also Fig. 3):

$$\begin{aligned} \pi^1_j &:= \{(2n+4j-2, 2n+4j-3), (2n+4j-1, 2n+4j-2)\},\\ \pi^2_j &:= \{(2n+4j-2, 2n+4j-2)\},\\ \pi^3_j &:= \{(2n+4j-3, 2n+4j-2), (2n+4j-2, 2n+4j-1)\}. \end{aligned}$$

For each clause $C_j$, pick $k_j \in [3]$ such that $\ell^{k_j}_j$ is true under $\beta$. We then build the warping path $p$ as the union of all $\pi^{k_j}_j$, using the trivial warping path for all remaining layers:

$$p := \{(i, i) \mid i \in [2n+4m+22m] \setminus \{2n+4j-2 \mid j \in [m]\}\} \cup \bigcup_{j \in [m]} \pi^{k_j}_j.$$

It is then not difficult to calculate that each clause block adds cost of exactly and there are no other costs. Thus .

($\Rightarrow$): Now suppose that $\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) \le c$ and let $(M, p)$ be a pair of vertex mapping and warping path with cost at most $c$. Note that any non-separation layer contains at most eight edges. So if $p$ warps any separation layer to any non-separation layer, then the resulting cost would be at least . Thus, we may assume that every separation layer $G_i$ of $\mathcal{G}$ is only warped to layer $H_i$ of $\mathcal{H}$ and vice versa. Since the last layers of each temporal graph are all identical and are chosen to have minimal cost, we can conclude that

$$p \supseteq \{(i, i) \mid i \in [2n+4m+22m] \setminus \{2n+4j-2 \mid j \in [m]\}\}.$$

If  maps some vertex from to some vertex that is not in , then the layers each would cause cost of at least , thus exceeding  in total. Therefore, has to contain a bijection from to .

Now, consider the clause block corresponding to $C_j$. From the arguments above, it follows that $G_{2n+4j-3}$ and $G_{2n+4j-1}$ are warped to $H_{2n+4j-3}$ and $H_{2n+4j-1}$, respectively. This already costs . We distinguish three cases (corresponding to $\pi^1_j$ through $\pi^3_j$ above):

1. $G_{2n+4j-2}$ is warped to $H_{2n+4j-3}$. This causes costs of at least . Then, must be warped to or would not have minimal cost. Thus, there are additional costs of at least . This is the situation illustrated in Fig. 2(a).

2. $G_{2n+4j-2}$ is warped to $H_{2n+4j-2}$. This causes costs of at least . This is the situation illustrated in Fig. 2(b).

3. $G_{2n+4j-2}$ is warped to $H_{2n+4j-1}$. This case is symmetric to (1) and also causes costs of at least . This is the situation illustrated in Fig. 2(c).

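In summary, the costs contributed by each clause block are at least . Therefore, to meet the bound of $c$, all layers outside of clause blocks must not cause any additional cost. For each $i \in [n]$, since $G_{2i-1}$ is warped to $H_{2i-1}$, this implies that either $\{(x_i, \top_i), (\overline{x}_i, \bot_i)\} \subseteq M$ or $\{(x_i, \bot_i), (\overline{x}_i, \top_i)\} \subseteq M$.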

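Furthermore, for each $j \in [m]$, the clause block corresponding to $C_j$ must have costs of exactly . If we are in Case (1) as above, then this is only possible if $M$ maps each degree-1 vertex of $G_{2n+4j-2}$ to some degree-1 vertex of $H_{2n+4j-3}$. Thus, $(\ell^1_j, \top_{\nu(j,1)}) \in M$. Otherwise, if we are in Case (2) respectively Case (3), then analogous arguments yield that $(\ell^2_j, \top_{\nu(j,2)}) \in M$ respectively $(\ell^3_j, \top_{\nu(j,3)}) \in M$. Hence, in any case there is some $k \in [3]$ for which $(\ell^k_j, \top_{\nu(j,k)}) \in M$.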

Consequently,

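$$\beta(x_i) := \begin{cases} \text{true}, & \text{if } (x_i, \top_i) \in M,\\ \text{false}, & \text{if } (\overline{x}_i, \top_i) \in M \end{cases}$$

is a satisfying assignment for $\phi$. ∎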

Let us take a closer look at the reduction in the proof of Theorem 4.1. Note that the corresponding optimal warping path is always close to the diagonal (that is, $|i - j| \le 1$ holds for every pair $(i, j) \in p$). Hence, it lies within the so-called Sakoe-Chiba band [21] of width one. Moreover, the maximum degree in each layer is one. Finally, the number of vertices and the number of layers of both temporal graphs and the target cost $c$ are all upper-bounded linearly in the size of the 3-SAT formula, which yields a running-time lower bound based on the Exponential Time Hypothesis [10] (together with the Sparsification Lemma [11]). The Exponential Time Hypothesis asserts that 3-SAT cannot be solved in subexponential time, that is, there is no $2^{o(n+m)}$-time algorithm, where $n$ is the number of variables and $m$ is the number of clauses of the input formula. These observations are summarized in the following corollary.
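The warping path $p$ built in the ($\Leftarrow$) direction of the proof of Theorem 4.1 can be checked mechanically: for every choice of literals $k_j$ it should be a valid warping path that stays in the Sakoe-Chiba band of width one. A Python sketch with hypothetical names; it takes the layer count $2n+4m+22m$ from the construction, although the check does not depend on the number of trailing layers.

```python
from itertools import product
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def subpath(j: int, k: int, n: int) -> List[Pair]:
    """pi^k_j for clause j and chosen literal k (1-indexed layers)."""
    a = 2 * n + 4 * j
    return {1: [(a - 2, a - 3), (a - 1, a - 2)],
            2: [(a - 2, a - 2)],
            3: [(a - 3, a - 2), (a - 2, a - 1)]}[k]


def reduction_path(n: int, m: int, choice: List[int]) -> Tuple[Set[Pair], int]:
    """Diagonal everywhere except layers 2n+4j-2, plus the sub-paths pi^{k_j}_j."""
    L = 2 * n + 4 * m + 22 * m  # layer count as written in the construction
    skip = {2 * n + 4 * j - 2 for j in range(1, m + 1)}
    p = {(i, i) for i in range(1, L + 1) if i not in skip}
    for j, k in enumerate(choice, start=1):
        p |= set(subpath(j, k, n))
    return p, L


def is_warping_path(p: Set[Pair], T: int, U: int) -> bool:
    q = sorted(p)  # monotone paths are recovered by sorting
    steps = {(1, 0), (0, 1), (1, 1)}
    return (q[0] == (1, 1) and q[-1] == (T, U)
            and all((b[0] - a[0], b[1] - a[1]) in steps for a, b in zip(q, q[1:])))


def in_band(p: Set[Pair], width: int = 1) -> bool:
    """Sakoe-Chiba band: |i - j| <= width for every (i, j) in p."""
    return all(abs(i - j) <= width for i, j in p)


def check_all_choices(n: int, m: int) -> bool:
    """Try every literal choice (k_1, ..., k_m) in {1, 2, 3}^m."""
    for choice in product((1, 2, 3), repeat=m):
        p, L = reduction_path(n, m, list(choice))
        if not (is_warping_path(p, L, L) and in_band(p)):
            return False
    return True
```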

###### Corollary 4.2.

DTGW is NP-complete for every metric and vertex degrees as vertex signatures, even when the maximum degree of each layer is one and the warping path is restricted to the Sakoe-Chiba band of width one.

Moreover, this case cannot be solved in time subexponential in the number of vertices and layers unless the Exponential Time Hypothesis fails.

Due to the intrinsic hardness of DTGW, there is little hope to solve the general problem efficiently. In the following section, however, we point out two polynomial-time solvable special cases. Furthermore, we develop a mathematical programming formulation as well as a heuristic approach to compute the dtgw-distance in practice.

## 5 Algorithms

Our first algorithmic result shows that determining whether two temporal graphs with the same number of vertices have dtgw-distance zero is possible in polynomial time. In contrast, determining whether two (static) graphs have graph edit distance zero is not known to be polynomial-time solvable (as this is equivalent to the famous Graph Isomorphism problem).

###### Theorem 5.1.

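Let $\mathcal{G}$ and $\mathcal{H}$ be two temporal graphs with $n := |V| = |W|$. For all vertex signatures and all metrics, deciding whether $\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) = 0$ holds is possible in $O(n^2 \cdot (T + U) + n^3)$ time.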

###### Proof.

We will show that for distance zero, an optimal warping path can easily be determined. Polynomial-time solvability then follows from Observation 3.1.

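Let $\mathcal{G} = (V, (E_1, \dots, E_T))$ and $\mathcal{H} = (W, (F_1, \dots, F_U))$ be two temporal graphs with $V = \{v_1, \dots, v_n\}$ and $W = \{w_1, \dots, w_n\}$. For each $i \in [T]$, we define the $i$th layer signature of $\mathcal{G}$ as $f(G_i) := (f_{G_i}(v_1), \dots, f_{G_i}(v_n))$ (analogously, $f(H_j)$ for $\mathcal{H}$). Assuming $\operatorname{dtgw}(\mathcal{G}, \mathcal{H}) = 0$, it follows that there exist a vertex mapping $M$ and a warping path $p$ such that

$$\sum_{(u,v) \in M} d(f_{G_i}(u), f_{H_j}(v)) = 0$$

holds for every $(i, j) \in p$. Since $d$ is a metric, this implies that $f_{G_i}(u) = f_{H_j}(v)$ holds for every $(u, v) \in M$. That is, $f(G_i)$ is a permutation (determined by $M$) of $f(H_j)$. Let $i_1 < \dots < i_q$ be the indices such that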

$$f(G_i) \neq f(G_{i+1}) \iff i \in \{i_k \mid k \in [q]\}$$

and let $j_1 < \dots < j_r$ be the indices such that

$$f(H_j) \neq f(H_{j+1}) \iff j \in \{j_k \mid k \in [r]\}.$$

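Clearly, if $f(G_i) \neq f(G_{i+1})$, layer $G_i$ is warped to layer $H_j$, and layer $G_{i+1}$ is warped to layer $H_{j'}$, then $f(H_j) \neq f(H_{j'})$, since otherwise the cost would not be zero. By the definition of a warping path, it follows that the layers $G_1, \dots, G_{i_1}$ of $\mathcal{G}$ can only be warped to layers $H_1, \dots, H_{j_1}$ of $\mathcal{H}$, the layers $G_{i_1+1}, \dots, G_{i_2}$ can only be warped to layers $H_{j_1+1}, \dots, H_{j_2}$, and so on. Note that this is only possible if $q = r$. If this is the case, then we can assume that the warping path $p$ has the following form:

$$\begin{aligned} p = \{ & (1,1), (1,2), \dots, (1,j_1), (2,j_1), \dots, (i_1,j_1),\\ & (i_1+1, j_1+1), \dots, (i_1+1, j_2), \dots, (i_2, j_2),\\ & \dots,\\ & (i_q+1, j_q+1), \dots, (i_q+1, U), \dots, (T, U)\}. \end{aligned}$$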

By Observation 3.1, we can now check in $O(n^2 \cdot (T + U) + n^3)$ time whether there exists a vertex mapping that yields distance zero for the warping path $p$. Computing $p$ can be done in $O(n \cdot (T + U))$ time. ∎
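A Python sketch of this zero test, under two assumptions: deletion costs are positive (so a zero-cost mapping must be a bijection), and signatures are given per layer as tuples indexed by vertex position. For the fixed path $p$ it replaces the assignment step by an equivalent shortcut for cost zero: a zero-cost bijection exists if and only if the multisets of per-vertex signature sequences along $p$ coincide. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

LayerSig = Tuple[float, ...]  # signatures of vertices 0..n-1 within one layer


def runs(sigs: Sequence[LayerSig]) -> List[Tuple[int, int]]:
    """Maximal runs (start, end), 1-indexed, of identical consecutive layer signatures."""
    out, start = [], 1
    for i in range(1, len(sigs)):
        if sigs[i] != sigs[i - 1]:
            out.append((start, i))
            start = i + 1
    out.append((start, len(sigs)))
    return out


def canonical_path(sg: Sequence[LayerSig],
                   sh: Sequence[LayerSig]) -> Optional[List[Tuple[int, int]]]:
    """The warping path p from the proof, or None if the run counts differ (q != r)."""
    rg, rh = runs(sg), runs(sh)
    if len(rg) != len(rh):
        return None
    p: List[Tuple[int, int]] = []
    for (a, b), (c, d) in zip(rg, rh):
        p += [(a, j) for j in range(c, d + 1)]      # first layer of the run vs. H's run
        p += [(i, d) for i in range(a + 1, b + 1)]  # remaining layers vs. its last layer
    return p


def dtgw_is_zero(sg: Sequence[LayerSig], sh: Sequence[LayerSig]) -> bool:
    """Decide dtgw = 0 for |V| = |W| (assumes positive deletion costs)."""
    p = canonical_path(sg, sh)
    if p is None:
        return False
    n = len(sg[0])
    # Signature sequence of every vertex along p; compare them as multisets.
    traj_g = sorted(tuple(sg[i - 1][u] for i, _ in p) for u in range(n))
    traj_h = sorted(tuple(sh[j - 1][v] for _, j in p) for v in range(n))
    return traj_g == traj_h
```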

We remark that if the vertex signatures and the metric satisfy the property that every pair of different vertex signatures has distance at least $\epsilon$ for some constant $\epsilon > 0$, then DTGW parameterized by $c$ is in XP. For example, this is the case when the vertex signatures contain only integers and $d$ is induced by any $\ell_p$-norm (for $p \ge 1$). Then, every pair of different signatures has distance at least 1. The idea of the algorithm is to “guess” the tuples of a warping path which cause non-zero cost (at most $c/\epsilon$ many) and to check whether it is possible to complete the warping path without further costs. The latter can be done in polynomial time using similar arguments as for the case $c = 0$ (Theorem 5.1).

In contrast, if the dtgw-distance is normalized (e.g. divided by the number of vertices), then the differences between vertex signatures can be arbitrarily small. In that case, DTGW is NP-complete even for a constant value of $c$ (by the same reduction as in the proof of Theorem 4.1).

To overcome this hardness, in the following, we consider parameters regarding the warping path length. We assume that the lifetimes of the inputs differ only by a constant, that is, $|T - U| = t$ for some small $t \in \mathbb{N}$ (which might often be the case in practice). Note that, by definition, every warping path of order $T \times U$ has length at least $\max(T, U)$. We define the parameter $\lambda$ to be the difference between the warping path length and the lower bound $\max(T, U)$, that is, we consider only order-$(T \times U)$ warping paths of length at most $\max(T, U) + \lambda$ (in practice, long warping paths are often considered unnatural). We prove that DTGW is in XP with respect to the combined parameter $(t, \lambda)$.

###### Proposition 5.2.

For all vertex signatures and all metrics, DTGW is solvable in

$$O\big((T+\lambda)^{\lambda} \cdot T^{\lambda+t} \cdot (n^2 \cdot (T+\lambda) + n^3)\big)$$

time if $T \ge U$, $T - U = t$, and the warping paths have length at most $T + \lambda$.

###### Proof.

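Let $\mathcal{G}$ and $\mathcal{H}$ be two temporal graphs with lifetimes $T$ and $U = T - t$, and let $p = \{p_1, \dots, p_L\} \in \mathcal{P}_{T,U}$ be a warping path. The warping path $p$ contains $L - 1$ steps $p_{k+1} - p_k$ for $k \in [L-1]$. We call a step horizontal if $p_{k+1} - p_k = (1, 0)$, vertical if $p_{k+1} - p_k = (0, 1)$, and diagonal otherwise. Let $\ell$ denote the number of vertical steps in $p$. Then, $p$ also contains $\ell + t$ horizontal and $T - 1 - \ell - t$ diagonal steps, that is, $L = T + \ell$, which implies that $\ell \le \lambda$. Clearly, there are $\binom{T+\ell-1}{\ell}$ possible positions for the vertical steps. For each of these possible choices, there are again $\binom{T-1}{\ell+t}$ possible positions for the horizontal steps (the remaining steps are diagonal). Therefore, the overall number of warping paths of length at most $T + \lambda$ is

$$\sum_{\ell=0}^{\lambda} \binom{T+\ell-1}{\ell} \binom{T-1}{\ell+t} \in O\big((T+\lambda)^{\lambda} \cdot T^{\lambda+t}\big).$$

For each of these possible warping paths, we can compute an optimal vertex mapping in $O(n^2 \cdot (T+\lambda) + n^3)$ time by Observation 3.1. ∎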
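The counting argument can be sanity-checked by enumerating all warping paths for small lifetimes and comparing with the sum above (with $U = T - t$). Exhaustive enumeration is exponential, so this is only a test harness; the function names are hypothetical.

```python
from math import comb
from typing import Iterator, List, Tuple

Path = List[Tuple[int, int]]


def warping_paths(T: int, U: int) -> Iterator[Path]:
    """Enumerate all order-(T x U) warping paths by depth-first search."""
    def extend(path: Path) -> Iterator[Path]:
        i, j = path[-1]
        if (i, j) == (T, U):
            yield list(path)
            return
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            if i + di <= T and j + dj <= U:
                path.append((i + di, j + dj))
                yield from extend(path)
                path.pop()
    yield from extend([(1, 1)])


def count_short_paths(T: int, U: int, lam: int) -> int:
    """Number of warping paths of length at most max(T, U) + lam."""
    return sum(1 for p in warping_paths(T, U) if len(p) <= max(T, U) + lam)


def counting_formula(T: int, t: int, lam: int) -> int:
    """The sum from the proof, for U = T - t (l = number of vertical steps)."""
    return sum(comb(T + l - 1, l) * comb(T - 1, l + t) for l in range(lam + 1))
```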

Note that Proposition 5.2 implies polynomial-time solvability of DTGW if $t$ and $\lambda$ are constants. For unbounded $t$, however, we conjecture that DTGW is NP-hard even if the warping paths are restricted to have length $\max(T, U)$, which is the minimum possible length (that is, $\lambda = 0$). The idea is to modify the reduction in the proof of Theorem 4.1 by adding some appropriate layers to one of the temporal graphs.

We give a formalization of DTGW as a quadratic minimization problem with linear constraints (QP). This can be used to solve relatively small instances exactly with state-of-the-art QP-solvers.

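Let $\mathcal{G} = (V, (E_1, \dots, E_T))$ and $\mathcal{H} = (W, (F_1, \dots, F_U))$ be two temporal graphs. Denote the vertices in $V$ by $v_1, \dots, v_{|V|}$ and the vertices in $W$ by $w_1, \dots, w_{|W|}$. To model “vertex deletion”, we add two artificial vertices $v_{|V|+1}$ and $w_{|W|+1}$.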

We define the following variables:

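• For every $(i, j) \in [|V|+1] \times [|W|+1]$, we have a vertex mapping variable $m_{i,j} \in \{0, 1\}$, where $m_{i,j} = 1$ if and only if vertex $v_i$ is mapped to vertex $w_j$.

• For every $(s, t) \in [T] \times [U]$, we have a warping variable $w_{s,t} \in \{0, 1\}$, where $w_{s,t} = 1$ if and only if layer $G_s$ is warped to layer $H_t$.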

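Moreover, for every $(s, t, i, j) \in [T] \times [U] \times [|V|+1] \times [|W|+1]$, let $d_{s,t,i,j}$ denote the cost of matching vertex $v_i$ in layer $G_s$ to vertex $w_j$ in layer $H_t$.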
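To illustrate how the quadratic objective reproduces the cost of a pair $(M, p)$, one can encode $M$ and $p$ as the 0/1 variables above and evaluate the objective. This sketch assumes (the text here does not spell out $d_{s,t,i,j}$) that it equals $|f_{G_s}(v_i) - f_{H_t}(w_j)|$ for two real vertices, a constant deletion cost `delta` if exactly one of them is artificial, and 0 for the pair of artificial vertices. Indices are 0-based, with index $|V|$ (respectively $|W|$) playing the artificial vertex; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

LayerSig = Sequence[float]  # signatures of vertices 0..n-1 in one layer


def cost_tensor(SG: List[LayerSig], SH: List[LayerSig], delta: float = 1.0):
    """d[s][t][i][j]; index len(V) (resp. len(W)) is the artificial vertex."""
    T, U, nV, nW = len(SG), len(SH), len(SG[0]), len(SH[0])
    d = [[[[0.0] * (nW + 1) for _ in range(nV + 1)] for _ in range(U)] for _ in range(T)]
    for s in range(T):
        for t in range(U):
            for i in range(nV + 1):
                for j in range(nW + 1):
                    if i < nV and j < nW:
                        d[s][t][i][j] = abs(SG[s][i] - SH[t][j])
                    elif i < nV or j < nW:
                        d[s][t][i][j] = delta  # exactly one real vertex is deleted
    return d


def to_binary(M: List[Tuple[int, int]], p: List[Tuple[int, int]],
              nV: int, nW: int, T: int, U: int):
    """Encode M (0-based vertex index pairs) and p (1-based layer pairs) as 0/1 m, w."""
    m = [[0] * (nW + 1) for _ in range(nV + 1)]
    for i, j in M:
        m[i][j] = 1
    mapped_v = {a for a, _ in M}
    mapped_w = {b for _, b in M}
    for i in range(nV):
        if i not in mapped_v:
            m[i][nW] = 1  # v_i is mapped to the artificial vertex of W
    for j in range(nW):
        if j not in mapped_w:
            m[nV][j] = 1  # w_j is mapped to the artificial vertex of V
    w = [[0] * U for _ in range(T)]
    for s, t in p:
        w[s - 1][t - 1] = 1
    return m, w


def qp_objective(d, m, w) -> float:
    """Objective (1): sum of d[s][t][i][j] * w[s][t] * m[i][j]."""
    return sum(d[s][t][i][j] * w[s][t] * m[i][j]
               for s in range(len(w)) for t in range(len(w[0]))
               for i in range(len(m)) for j in range(len(m[0])))
```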

Then, computing $\operatorname{dtgw}(\mathcal{G}, \mathcal{H})$ is the following quadratic minimization problem. (It is also possible to convert our formulation into a linear problem by introducing further variables and constraints for replacing the product in the objective. However, we found the quadratic formulation to be more efficient in practice.)

 minimize \mathrlap∑s∈[T]∑t∈[U]∑i∈[|V|+1]∑j∈[|W|+1]ds,t,i,j⋅ws,t⋅mi,j (1) subject to ∑j∈[|W|+1]mi,j =1 ∀i∈[|V|] (1a) ∑i∈[|V|+1]mi,j =1 ∀j∈[|W|] (1b) w1,1 =1 (1c) ws,t ≤\mathrlapws+1,t+1+ws,t+1+ws+1,t ∀(s,t