A fundamental concept in pattern recognition is the (dis)similarity between objects. For objects that are represented by numerical feature vectors, there exist many well-known (dis)similarity functions such as $\ell_p$-norms or positive semi-definite kernels.
In structural pattern recognition, objects are often more naturally represented by complex (discrete) data structures such as graphs, strings or time series. For these representations, one often cannot simply use vector-based (dis)similarity measures. Instead, one needs to define suitable domain-specific (dis)similarity functions such as the edit distance on graphs and strings or the dynamic time warping distance on time series.
The majority of graph (dis)similarity functions, such as the graph edit distance, graph kernels, and geometric graph distances, focus on static graphs. However, many complex systems are not static, as the links between entities change dynamically over time. Such temporal networks can be represented by a series of temporal edges between a fixed set of vertices. Examples are face-to-face proximity networks, flight traffic networks, temporal attack networks in computer security, or protein-protein-interaction networks in biology [15, 9, 17]. Thus, there is a steadily growing research interest in analyzing temporal networks. In order to perform data mining tasks such as classification or clustering on temporal networks, one needs to find suitable (dis)similarity functions.
We introduce a novel (dis)similarity measure on temporal graphs based on dynamic time warping, called dynamic temporal graph warping. Thus, by combining established methods from graph-based pattern recognition and time series data mining in a nontrivial way, we obtain a suitable tool to analyze temporal network data. Beyond that, we study its computational complexity, develop efficient algorithms and study their behavior on real-world data, the latter confirming the practical usefulness.
There are numerous approaches to define graph (dis)similarity measures. A well-known example is the (NP-hard) graph edit distance. Graph kernels (many of which are polynomial-time computable) are another well-studied class [6, 8]. Measuring graph distance based on vertex mappings using local vertex signatures was introduced by Jouili and Tabbone. The idea of using vertex mappings can also be found in optimal assignment kernels [5, 16, 2]. Regarding (dis)similarity measures on temporal graphs, seemingly little work has been done so far. Elhesha et al. recently described an approach based on vertex mappings. Their method, however, does not allow for a flexible alignment between time layers.
We define the dynamic temporal graph warping distance as a twofold discrete minimization problem involving computation of an optimal vertex mapping and an optimal warping path (see Section 3). We prove that it is NP-hard to solve in general (Theorem 4.1). In contrast, we point out several polynomial-time solvable special cases: the case when either a vertex mapping or a warping path is fixed (Observation 3.1), the case of deciding whether the dtgw-distance is zero (Theorem 5.1), and the case when the lifetimes of the two temporal graphs differ only by a constant and the warping path length is restricted (Proposition 5.2). Moreover, we give a quadratic programming formulation (Section 5.1) and propose an efficient heuristic approach (Section 5.2).
We evaluate the heuristic experimentally on real-world data to assess its efficiency and solution quality (Section 6).
For $n \in \mathbb{N}$, we define $[n] := \{1, \ldots, n\}$. For a set $V$, we denote the set of all size-$k$ subsets of $V$ by $\binom{V}{k}$.
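A temporal graph $\mathcal{G} = (V, E_1, \ldots, E_T)$ consists of a vertex set $V$ and a sequence of $T$ edge sets $E_1, \ldots, E_T \subseteq \binom{V}{2}$. By $G_i := (V, E_i)$, we denote the $i$th layer of $\mathcal{G}$ and we call $T$ the lifetime of $\mathcal{G}$. The underlying graph of $\mathcal{G}$ is the graph $(V, \bigcup_{i=1}^{T} E_i)$. We remark that all definitions and results in this work can easily be extended to labeled temporal graphs (with vertex and/or edge labels).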
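A vertex mapping between two vertex sets $V$ and $W$ is a set $M \subseteq V \times W$ containing $\min(|V|, |W|)$ tuples such that for all $(v, w), (v', w') \in M$ it holds that $v = v' \iff w = w'$. We denote the set of all vertex mappings between $V$ and $W$ by $\mathcal{M}(V, W)$. Let $V(M)$ be the subset of vertices in $V$ that are contained in some tuple of $M$ ($W(M)$ is defined analogously).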
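As a small illustration of this definition, the following Python sketch enumerates the vertex mappings between two tiny vertex sets. It assumes that a vertex mapping matches every vertex of the smaller set to a distinct vertex of the larger set; the helper name `vertex_mappings` is hypothetical, and the enumeration is exponential, so it is only meant for toy inputs.

```python
from itertools import permutations


def vertex_mappings(V, W):
    """Yield all vertex mappings between V and W as frozensets of pairs.

    Assumption: every vertex of the smaller set is matched to a distinct
    vertex of the larger set, so no vertex occurs in two tuples.
    """
    V, W = list(V), list(W)
    if len(V) <= len(W):
        for image in permutations(W, len(V)):
            yield frozenset(zip(V, image))
    else:
        for preimage in permutations(V, len(W)):
            yield frozenset(zip(preimage, W))


mappings = list(vertex_mappings(["a", "b"], ["x", "y", "z"]))
print(len(mappings))  # 3 * 2 = 6 injective choices
```

Even for moderately sized vertex sets the number of mappings grows factorially, which is one reason why fixing or restricting the mapping matters for the algorithms discussed later.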
The assignment problem is a fundamental problem in combinatorial optimization. Given two sets $A$ and $B$ of equal size $n$ and a cost function $c \colon A \times B \to \mathbb{R}$, the goal is to find a bijection $\pi \colon A \to B$ such that
$$\sum_{a \in A} c(a, \pi(a))$$
is minimized. It is well known that the assignment problem can be described as an integer linear program and is solvable in $O(n^3)$ time [1, Theorem 12.2].
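To make the problem statement concrete, here is a minimal Python sketch that solves a tiny assignment instance by trying all bijections. This brute force is only an illustration and is not the polynomial-time algorithm referred to in [1]; the function name `solve_assignment` and the cost table are hypothetical.

```python
from itertools import permutations


def solve_assignment(A, B, cost):
    """Return (minimum total cost, bijection as dict) by trying all bijections."""
    A, B = list(A), list(B)
    assert len(A) == len(B), "the assignment problem needs sets of equal size"
    best_cost, best_map = float("inf"), None
    for image in permutations(B):
        total = sum(cost(a, b) for a, b in zip(A, image))
        if total < best_cost:
            best_cost, best_map = total, dict(zip(A, image))
    return best_cost, best_map


# Hypothetical 2 x 2 cost table.
table = {("a1", "b1"): 4, ("a1", "b2"): 1, ("a2", "b1"): 2, ("a2", "b2"): 3}
best_cost, best_map = solve_assignment(["a1", "a2"], ["b1", "b2"],
                                       lambda a, b: table[(a, b)])
print(best_cost, best_map)
```

For realistic sizes one would replace the loop over permutations by a dedicated assignment solver.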
Dynamic Time Warping.
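The dynamic time warping distance is a distance between time series. It is based on the concept of a warping path. A warping path of order $n \times m$ is a set $p = \{p_1, \ldots, p_L\}$ of $L$ pairs $p_\ell = (i_\ell, j_\ell) \in [n] \times [m]$ such that
(i) $p_1 = (1, 1)$ and $p_L = (n, m)$, and
(ii) $p_{\ell+1} - p_\ell \in \{(1, 0), (0, 1), (1, 1)\}$ for all $1 \le \ell < L$.
We denote the set of all warping paths of order $n \times m$ by $\mathcal{P}_{n,m}$. For two temporal graphs $\mathcal{G} = (V, E_1, \ldots, E_T)$, $\mathcal{H} = (W, F_1, \ldots, F_U)$, every order-$(T \times U)$ warping path $p$ defines a warping between $\mathcal{G}$ and $\mathcal{H}$, that is, a pair $(i, j) \in p$ warps the layer $G_i$ to $H_j$.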
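The two conditions on a warping path can be checked mechanically. The following Python sketch assumes the standard dynamic time warping step rule (start at $(1,1)$, end at $(n,m)$, and advance by $(1,0)$, $(0,1)$ or $(1,1)$ in each step); the function name `is_warping_path` is hypothetical.

```python
def is_warping_path(pairs, n, m):
    """Check whether a set of index pairs is a warping path of order n x m."""
    path = sorted(pairs)  # lexicographic order equals path order if valid
    if not path or path[0] != (1, 1) or path[-1] != (n, m):
        return False
    allowed = {(1, 0), (0, 1), (1, 1)}
    return all((i2 - i1, j2 - j1) in allowed
               for (i1, j1), (i2, j2) in zip(path, path[1:]))


print(is_warping_path({(1, 1), (2, 1), (3, 2)}, 3, 2))  # True
print(is_warping_path({(1, 1), (3, 2)}, 3, 2))          # False: index 2 is skipped
```

Sorting is safe here because both indices never decrease along a valid path, so an invalid set cannot pass the step check after sorting.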
We assume the reader to be familiar with basic concepts of computational complexity theory such as NP-completeness. In parameterized complexity theory one considers running times with respect to two dimensions. One dimension is the size of the input instance and the other dimension is a parameter (usually a numerical value). An instance of a parameterized problem is a pair $(x, k)$, where $x$ is the input and $k$ is the parameter. The class FPT contains all fixed-parameter tractable parameterized problems, that is, problems that can be solved in $f(k) \cdot |x|^{O(1)}$ time for some computable function $f$ depending only on $k$. The class XP contains all parameterized problems that can be solved in polynomial time for every constant parameter value, that is, in $|x|^{f(k)}$ time (clearly, FPT $\subseteq$ XP).
3 Dynamic Temporal Graph Warping (DTGW)
In this section we define a temporal graph distance based on dynamic time warping using a vertex-signature-based graph distance as local cost function. We choose this graph distance for the following reasons. First, it is computationally tractable (in comparison to the NP-hard graph edit distance). Second, it is based on a mapping between the two vertex sets (possibly of different size), which is reasonable in many temporal network applications since it makes it possible to enforce consistency over time. Third, vertex signatures allow for a high flexibility since they can be chosen arbitrarily (as can the metric) in order to incorporate essential information for the application at hand (e.g. they can be used for weighted or labeled temporal graphs).
Graph Distance Based on Vertex Signatures.
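The following approach is due to Jouili and Tabbone. For a (static) graph $G = (V, E)$, a vertex signature function $f_G$ on $V$ encodes (local) information about a vertex (e.g. its degree). Let $d$ be a metric on the set of vertex signatures.
For two (static) graphs $G = (V, E)$ and $H = (W, F)$ with vertex signatures $f_G$ and $f_H$ and a given vertex mapping $M$ between $V$ and $W$, we define the cost of $M$ as
$$C(G, H, M) := \sum_{(v, w) \in M} d(f_G(v), f_H(w)) + \sum_{v \in (V \setminus V(M)) \cup (W \setminus W(M))} \mathrm{del}(v),$$
where the second sum ranges over all vertices that are not contained in any tuple of $M$ and $\mathrm{del}(v)$ is the (predefined) cost of “deleting” vertex $v$ from $V \cup W$ since it is not mapped by $M$ to any vertex in the other vertex set. The value $\mathrm{del}(v)$ might for example depend on the vertex signature of $v$.
The vertex-signature-based distance between $G$ and $H$ is then defined as
$$D(G, H) := \min_{M \in \mathcal{M}(V, W)} C(G, H, M),$$
that is, as the minimum cost over all vertex mappings $M$ between $V$ and $W$.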
Depending on the application, one might normalize the distance by some appropriate factor (typically depending on and , e.g. Jouili and Tabbone  normalize by ).
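The following Python sketch illustrates the vertex-signature-based distance on a toy instance. It rests on several assumptions that are not fixed by the text: vertex degrees serve as signatures, the metric is the absolute difference, every unmapped vertex has deletion cost 1, and the minimum over all vertex mappings is found by brute force. All function names are hypothetical.

```python
from itertools import permutations


def degree_signatures(vertices, edges):
    """Vertex signature = degree of the vertex (one possible choice)."""
    deg = {v: 0 for v in vertices}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def mapping_cost(sig_g, sig_h, mapping, deletion_cost=1):
    """Signature distances of mapped pairs plus deletion costs of unmapped vertices."""
    matched = sum(abs(sig_g[v] - sig_h[w]) for v, w in mapping)
    unmapped = (len(sig_g) - len(mapping)) + (len(sig_h) - len(mapping))
    return matched + deletion_cost * unmapped


def signature_distance(sig_g, sig_h, deletion_cost=1):
    """Minimum mapping cost over all vertex mappings (brute force)."""
    V, W = list(sig_g), list(sig_h)
    if len(V) <= len(W):
        candidates = (list(zip(V, img)) for img in permutations(W, len(V)))
    else:
        candidates = (list(zip(pre, W)) for pre in permutations(V, len(W)))
    return min(mapping_cost(sig_g, sig_h, m, deletion_cost) for m in candidates)


# G is the path a-b-c, H is the single edge x-y.
sig_g = degree_signatures("abc", [("a", "b"), ("b", "c")])
sig_h = degree_signatures("xy", [("x", "y")])
print(signature_distance(sig_g, sig_h))
```

In this example the best mapping pairs the two endpoints of the path with the endpoints of the edge and deletes the middle vertex, so only the deletion cost remains.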
Throughout this work, we assume that vertex signature functions are computable in polynomial time in the size of the graph and that all metrics are polynomial-time computable. We neglect the running times for computing the values of the vertex signature functions and the metric (we can actually assume that all vertex signatures are precomputed once in polynomial time).
Dynamic Time Warping Distance for Temporal Graphs.
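We transfer the concept of dynamic time warping to temporal graphs in the following way. Let $\mathcal{G} = (V, E_1, \ldots, E_T)$ and $\mathcal{H} = (W, F_1, \ldots, F_U)$ be two temporal graphs and let $f_{\mathcal{G}}$ and $f_{\mathcal{H}}$ be corresponding vertex signature functions.
We define the vertex-signature-based dynamic temporal graph warping distance (dtgw-distance) between $\mathcal{G}$ and $\mathcal{H}$ as
$$\mathrm{dtgw}(\mathcal{G}, \mathcal{H}) := \min_{M \in \mathcal{M}(V, W)} \; \min_{p \in \mathcal{P}_{T,U}} \; \sum_{(i, j) \in p} C(G_i, H_j, M),$$
where the minimum is taken over all vertex mappings $M$ between $V$ and $W$ and all warping paths $p$ of order $T \times U$, and $C(G_i, H_j, M)$ denotes the cost of $M$ for the layers $G_i$ and $H_j$.
Figure 1 depicts an example illustrating the dtgw-distance of two temporal graphs. Note that if $T = U$ and one fixes $p = \{(1, 1), \ldots, (T, T)\}$, then we get a temporal graph distance without time warping.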
The following results are easily observed and play a central role for our subsequent algorithms.
Let and be two temporal graphs and let .
For a fixed vertex mapping between and , can be computed in time.
For a fixed warping path , can be computed in time.
Let be a vertex mapping. Then, it holds
The right-hand side of the above equation can be computed by a well-known dynamic program for dynamic time warping in time . Here is the time required to compute .
Let be a fixed warping path. Assume without loss of generality that and let , where is a set of dummy vertices with . For every , let
Then, we have
Note that the vertex mapping defines a bijection between and . Hence, computing is an assignment problem solvable in time [1, Theorem 12.2]. Computing all values can be done in time.
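The two cases can be sketched in Python as follows. The sketch makes simplifying assumptions: both temporal graphs have equally many vertices (so no dummy vertices or deletion costs are needed), each layer is given as a dictionary of integer vertex signatures, and the metric is the absolute difference. For a fixed mapping it uses the usual dynamic time warping recurrence; for a fixed warping path it uses a brute-force search over bijections in place of a polynomial-time assignment algorithm. All names are hypothetical.

```python
from itertools import permutations


def layer_cost(sig_g, sig_h, mapping):
    """Cost of one layer pair; equal-size vertex sets, so nothing is deleted."""
    return sum(abs(sig_g[v] - sig_h[w]) for v, w in mapping.items())


def dtgw_fixed_mapping(sigs_g, sigs_h, mapping):
    """Best warping path for a fixed vertex mapping via the DTW recurrence."""
    T, U = len(sigs_g), len(sigs_h)
    INF = float("inf")
    D = [[INF] * (U + 1) for _ in range(T + 1)]
    D[0][0] = 0
    for i in range(1, T + 1):
        for j in range(1, U + 1):
            local = layer_cost(sigs_g[i - 1], sigs_h[j - 1], mapping)
            D[i][j] = local + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[T][U]


def dtgw_fixed_path(sigs_g, sigs_h, path):
    """Best vertex mapping for a fixed warping path (brute-force assignment)."""
    V, W = list(sigs_g[0]), list(sigs_h[0])
    best_cost, best_map = float("inf"), None
    for image in permutations(W):
        mapping = dict(zip(V, image))
        total = sum(layer_cost(sigs_g[i - 1], sigs_h[j - 1], mapping)
                    for i, j in path)
        if total < best_cost:
            best_cost, best_map = total, mapping
    return best_cost, best_map


# Degree signatures per layer of two small temporal graphs (toy data).
sigs_g = [{"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 2, "c": 1}]
sigs_h = [{"x": 0, "y": 1, "z": 1}, {"x": 0, "y": 1, "z": 1},
          {"x": 1, "y": 2, "z": 1}]
print(dtgw_fixed_mapping(sigs_g, sigs_h, {"a": "x", "b": "y", "c": "z"}))
print(dtgw_fixed_path(sigs_g, sigs_h, [(1, 1), (1, 2), (2, 3)]))
```

Alternating between these two routines (fix the mapping, optimize the path, then fix the path, optimize the mapping) is a natural way to use them together, although the sketch itself does not do so.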
Note that Observation 3.1 implies that if we already know the vertex mapping up to a constant number of vertices, then the dtgw-distance can be computed in polynomial time (since we can try out all polynomially many possible completions of the vertex mapping).
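For a given vertex signature function and metric, we refer to the decision problem of testing whether two temporal graphs have dynamic temporal graph warping distance at most some given value $c$ as DTGW.
Dynamic Temporal Graph Warping (DTGW)
Input: Two temporal graphs $\mathcal{G}$ and $\mathcal{H}$, and a number $c$.
Question: Is the dtgw-distance between $\mathcal{G}$ and $\mathcal{H}$ at most $c$?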
By Observation 3.1, DTGW is polynomial-time solvable if one temporal graph has a constant lifetime or a constant number of vertices, since there are then only polynomially many possible warping paths or polynomially many vertex mappings, respectively.
4 Computational Hardness
Even though the dynamic time warping distance and the vertex-signature-based graph distance are both computable in polynomial time, their combined application to temporal graphs yields a distance measure that is generally NP-hard to compute:
DTGW is NP-complete for every metric when the vertex signatures are vertex degrees.
DTGW is clearly contained in NP since for a given vertex mapping and warping path (both having polynomial size), one can check in polynomial time whether the dtgw-distance is at most the given bound (also see Observation 3.1).
To show NP-hardness, we give a polynomial-time reduction from 3-SAT. Let be any metric and let be an instance of 3-SAT over the variables . Each clause is then a disjunction of three literals and there is a function such that holds for all . We may assume .
Our idea is to represent each literal by a vertex which can be mapped to either (true) or (false). We then build, for each clause, a clause box gadget consisting of three consecutive layers. The choice of warping path will then, for each clause, implicitly select one of its literals and the costs caused by each clause box will attain their minimum value if and only if that particular literal is mapped to .
Henceforth the details. Let and be two copies of the graph (consisting of disjoint edges), where for each vertex we denote its copy in by . We construct two temporal graphs and . Their vertex sets each contain the following vertices.
Both temporal graphs have layers defined as follows. For each , we set
For , we set
Finally, for , we set
We call the layers containing edges separation layers. Furthermore, for each we say that the layers , , and form the clause block corresponding to (see Fig. 2 for an example).
Let . We claim that if and only if has a satisfying assignment.
“⇐”: Given a satisfying assignment of , we define the following vertex mapping
To construct a warping path, we begin by defining, for each , the following three sub-paths (see also Fig. 3):
For each clause , pick such that is true. We then build the warping path as the union of all , using the trivial warping path for all remaining layers:
It is then not difficult to calculate that each clause block adds cost of exactly and there are no other costs. Thus .
“⇒”: Now suppose that and let be a pair of vertex mapping and warping path with cost . Note that any non-separation layer contains at most eight edges. So if warps any separation layer to any non-separation layer, then the resulting cost would be at least . Thus, we may assume that every separation layer of is only warped to layer of and vice versa. Since the last layers of each temporal graph are all identical and are chosen to have minimal cost, we can conclude that
If maps some vertex from to some vertex that is not in , then the layers each would cause cost of at least , thus exceeding in total. Therefore, has to contain a bijection from to .
Now, consider the clause block corresponding to . From the arguments above, it follows that and are warped to and respectively. This already costs . We distinguish three cases (corresponding to through above):
is warped to . This causes costs of at least . Then, must be warped to or would not have minimal cost. Thus, there are additional costs of at least . This is the situation illustrated in Fig. 2(a).
is warped to . This causes costs of at least . This is the situation illustrated in Fig. 2(b).
In summary, the costs contributed by each clause block are at least . Therefore, to meet the bound of , all layers outside of clause blocks must not cause any additional cost. For each , since is warped to , this implies that either or .
Furthermore, for each , the clause block corresponding to must have costs of exactly . If we are in Case (1) as above, then this is only possible if maps each degree-1 vertex of to some degree-1 vertex of . Thus, . Otherwise, if we are in Case (2) respectively Case (3), then analogous arguments yield that respectively . Hence, in any case there is some for which .
is a satisfying assignment for . ∎
Let us take a closer look at the reduction in the proof of Theorem 4.1. Note that the corresponding optimal warping path is always close to the diagonal (that is, $|i - j| \le 1$ holds for every pair $(i, j)$ in the warping path). Hence, it lies within the so-called Sakoe-Chiba band of width one. Moreover, the maximum degree in each layer is one. Finally, the number of vertices and the number of layers of both temporal graphs and the target cost are all upper-bounded linearly in the size of the 3-SAT formula, which allows us to conclude a running time lower bound based on the Exponential Time Hypothesis (together with the Sparsification Lemma). The Exponential Time Hypothesis asserts that 3-SAT cannot be solved in subexponential time, that is, there is no $2^{o(n)} \cdot (n + m)^{O(1)}$-time algorithm, where $n$ is the number of variables and $m$ is the number of clauses of the input formula. These observations are summarized in the following corollary.
DTGW is NP-complete for every metric and vertex degrees as vertex signatures even when the maximum degree of each layer is one and the warping path is restricted to the Sakoe-Chiba band of width one.
Moreover, this case cannot be solved in time unless the Exponential Time Hypothesis fails.
Due to the intrinsic hardness of DTGW, there is little hope to solve the general problem efficiently. In the following section, however, we point out two polynomial-time solvable special cases. Furthermore, we develop a mathematical programming formulation as well as a heuristic approach to compute the dtgw-distance in practice.
Our first algorithmic result shows that one can determine in polynomial time whether two temporal graphs with the same number of vertices have dtgw-distance zero. In contrast, determining whether two (static) graphs have graph edit distance zero is not known to be polynomial-time solvable (as this is equivalent to the famous Graph Isomorphism problem).
Let and be two temporal graphs with . For all vertex signatures and all metrics, deciding whether holds is possible in time.
We will show that for distance zero, an optimal warping path can easily be determined. Polynomial-time solvability then follows from Observation 3.1.
Let and be two temporal graphs with and . For each , we define the th layer signature of as (analogously, for ). Assuming , it follows that there exists a vertex mapping and a warping path such that
holds for every . Since is a metric, this implies that holds for every . That is, is a permutation (determined by ) of . Let be the indices such that
and let be the indices such that
Clearly, if and layer is warped to layer and layer is warped to layer , then since otherwise the cost will not be zero. By the definition of a warping path, it follows that the layers of can only be warped to layers of and the layers of can only be warped to layers of and so on. Note that this is only possible if . If this is the case, then we can assume that the warping path has the following form:
By Observation 3.1, we can now check whether there exists a vertex mapping that yields distance zero for the warping path in time. Computing can be done in time. ∎
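The idea of this proof can be sketched in Python as follows. The sketch assumes that both temporal graphs have the same number of vertices, that each layer is given as a dictionary of hashable vertex signatures, and that the layer signature of a layer is the vector of its vertex signatures. It collapses consecutive layers with identical layer signatures into blocks and then, as a simplification that replaces the check via a fixed warping path, compares the multisets of per-vertex signature sequences over the blocks. The function names are hypothetical.

```python
from collections import Counter


def compress(layers):
    """Drop consecutive repeats of identical layer signatures."""
    blocks = []
    for layer in layers:
        if not blocks or blocks[-1] != layer:
            blocks.append(layer)
    return blocks


def has_dtgw_distance_zero(sigs_g, sigs_h):
    """Decide whether some vertex mapping and warping path give cost zero."""
    blocks_g, blocks_h = compress(sigs_g), compress(sigs_h)
    if len(blocks_g) != len(blocks_h):
        return False  # the block sequences cannot be warped onto each other
    # Signature sequence of every vertex across the blocks.
    cols_g = Counter(tuple(b[v] for b in blocks_g) for v in blocks_g[0])
    cols_h = Counter(tuple(b[w] for b in blocks_h) for w in blocks_h[0])
    # A zero-cost bijection exists iff the two multisets coincide.
    return cols_g == cols_h


sigs_g = [{"a": 1, "b": 1, "c": 0}, {"a": 1, "b": 2, "c": 1}]
sigs_h = [{"x": 0, "y": 1, "z": 1}, {"x": 0, "y": 1, "z": 1},
          {"x": 1, "y": 2, "z": 1}]
print(has_dtgw_distance_zero(sigs_g, sigs_h))  # True
```

The comparison relies on the metric property that two signatures have distance zero only if they are equal, so zero cost forces mapped vertices to carry identical signatures in every warped pair of layers.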
We remark that if the vertex signatures and the metric satisfy the property that every pair of different vertex signatures has distance at least for some constant , then DTGW parameterized by is in XP. For example, this is the case when the vertex signatures contain only integers and is any -norm (for ). Then, every pair of different signatures has distance at least . The idea of the algorithm is to “guess” the tuples of a warping path which cause non-zero cost (at most many) and to check whether it is possible to complete the warping path without further costs. The latter can be done in polynomial time using similar arguments as for the case (Theorem 5.1).
In contrast, if the dtgw-distance is normalized (e.g. divided by the number of vertices), then the differences between vertex signatures can be arbitrarily small. In that case, DTGW is NP-complete even for a constant value of (by the same reduction as in the proof of Theorem 4.1).
To overcome this hardness, in the following, we consider parameters regarding the warping path length. We assume that the lifetimes of the inputs differ by at most a constant, that is, for some (which might often be the case in practice). Note that, by definition, every warping path of order has length at least . We define the parameter to be the difference between the warping path length and the lower bound , that is, we consider only order- warping paths of length at most (in practice, long warping paths are often considered unnatural). We prove that DTGW is in XP with respect to the combined parameter .
For all vertex signatures and all metrics, DTGW is solvable in
time if , , and the warping paths have length at most .
Let and be two temporal graphs and let be a warping path. The warping path contains steps for . We call a step horizontal if , and we call it vertical if , and otherwise we call it diagonal. Let denote the number of vertical steps in . Then, contains also horizontal and diagonal steps, that is, , which implies that . Clearly, there are possible positions for the vertical steps. For each of these possible choices, there are again possible positions for horizontal steps (the remaining steps are diagonal). Therefore, the overall number of warping paths of length at most is
For each of these possible warping paths, we can compute in time by Observation 3.1. ∎
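The enumeration used in this argument can be sketched in Python. The generator below lists all warping paths between two lifetimes whose length does not exceed a given bound, assuming the standard step rule (each step increases the first index, the second index, or both by one). The name `warping_paths` and the parameter `max_len` are hypothetical.

```python
def warping_paths(T, U, max_len):
    """Yield all warping paths from (1, 1) to (T, U) with at most max_len pairs."""
    def extend(path):
        i, j = path[-1]
        if (i, j) == (T, U):
            yield tuple(path)
            return
        if len(path) == max_len:
            return  # length budget used up before reaching (T, U)
        for di, dj in ((1, 0), (0, 1), (1, 1)):
            ni, nj = i + di, j + dj
            if ni <= T and nj <= U:
                path.append((ni, nj))
                yield from extend(path)
                path.pop()

    yield from extend([(1, 1)])


# For T = U = 3 the shortest path has length 3; allowing one extra pair
# admits the six paths that replace one diagonal step by two straight ones.
print(len(list(warping_paths(3, 3, 3))))  # 1
print(len(list(warping_paths(3, 3, 4))))  # 7
```

Each enumerated path could then be handed to a routine that computes an optimal vertex mapping for a fixed warping path, which is the structure of the algorithm in the proof.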
Note that Proposition 5.2 implies polynomial-time solvability of DTGW if and are constants. For unbounded , however, we conjecture that DTGW is NP-hard even if the warping paths are restricted to have length , which is the minimum possible length (that is, ). The idea is to modify the reduction in the proof of Theorem 4.1 by adding some appropriate layers to one of the temporal graphs.
5.1 Quadratic Programming
We give a formalization of DTGW as a quadratic minimization problem with linear constraints (QP). This can be used to solve relatively small instances exactly with state-of-the-art QP-solvers.
Let and be two temporal graphs. Denote the vertices in by and the vertices in by . To model “vertex deletion”, we add two artificial vertices .
We define the following variables:
For every , we have a vertex mapping variable , where if and only if vertex is mapped to vertex .
For every , we have a warping variable , where if and only if is warped to .
Moreover, for every , let
denote the cost of matching vertex in layer to vertex in layer .
Then, computing the dtgw-distance is the following quadratic minimization problem. (It is also possible to convert our formulation into a linear problem by introducing further variables and constraints for replacing the product in the objective. However, we found the quadratic formulation to be more efficient in practice.)