Graphs appear in many applications as abstractions of real-world phenomena, where vertices represent certain objects and edges represent their relations. Rather than being stationary, graph data obtained in applications usually change with respect to some parameter such as time. A summary of these changes in a quantifiable manner can help gain insight into the data. Persistent homology [4, 12] is a suitable tool for this goal because it quantifies the life span of topological features as the graph changes. One drawback of using standard non-zigzag persistence is that it only allows addition of vertices and edges during the change, whereas deletion may also happen in practice. For example, many complex systems such as social networks, food webs, or disease spreading are modeled by the so-called “dynamic networks” [17, 18, 24], where vertices and edges can appear and disappear at different times. A variant of the standard persistence called zigzag persistence is thus a more natural tool in such scenarios because simplices can be both added and deleted. Given a sequence of graphs possibly with additions and deletions (formally called a zigzag filtration), zigzag persistence produces a set of intervals called the zigzag barcode, in which each interval records the birth and death times of a homological feature. Figure 1 gives an example of a graph sequence in which clusters may split (birth of 0-dimensional features) or vanish/merge (death of 0-dimensional features). Moreover, addition of edges within the clusters creates 1-dimensional cycles and deletion of edges makes some cycles disappear. These births and deaths are captured by zigzag persistence.
Algorithms for both zigzag and non-zigzag persistence have a general-case time complexity of $O(m^\omega)$ [3, 12, 19, 20], where $m$ is the length of the input filtration and $\omega$ is the matrix multiplication exponent. For the special case of graph filtrations, it is well known that non-zigzag persistence can be computed in $O(m\,\alpha(m))$ time, where $\alpha(m)$ is the inverse Ackermann function, which is almost constant for all practical purposes. However, analogous faster algorithms for zigzag persistence on graphs are not known. In this paper, we present algorithms for zigzag persistence on graphs with near-linear time complexity. In particular, given a zigzag filtration of length $m$ for a graph with $n$ vertices and edges, our algorithm for dimension 0 runs in $O(m\log^2 n + m\log m)$ time, and our algorithm for dimension 1 runs in $O(m\log^4 n)$ time. Observe that the algorithm for dimension 0 works for arbitrary complexes by restricting to their 1-skeletons.
The difficulty in designing faster zigzag persistence algorithms for the special case of graphs lies in the deletion of vertices and edges. For example, besides merging into bigger ones, connected components can also split into smaller ones because of edge deletion. Therefore, one cannot simply kill the younger component during merging as in standard persistence, but rather has to pair the merge and departure events with the split and entrance events (see Section 3 for details). Similarly, in dimension one, deletion of edges may kill 1-cycles, so that one has to properly pair the creation and destruction of 1-cycles instead of simply treating all 1-dimensional intervals as infinite ones.
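For contrast with the zigzag setting, the following is a minimal sketch of the standard insertion-only computation in dimension zero, in which the younger component is killed at each merge. The function name `standard_h0_barcode`, the operation encoding, and the convention that a bar is reported as a pair of step indices (with `None` for a component that never dies) are assumptions made for illustration. The sketch uses path halving without union by rank, so it does not attain the inverse-Ackermann bound.

```python
def standard_h0_barcode(additions):
    """additions: list of ('v', a) or ('e', a, b) for an insertion-only filtration."""
    parent, birth = {}, {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for step, op in enumerate(additions):
        if op[0] == 'v':
            parent[op[1]] = op[1]
            birth[op[1]] = step          # a new component is born
        else:
            ra, rb = find(op[1]), find(op[2])
            if ra == rb:
                continue                 # edge closes a cycle; H_0 is unchanged
            if birth[ra] > birth[rb]:
                ra, rb = rb, ra          # ensure ra is the older root
            bars.append((birth[rb], step))   # the younger component dies
            parent[rb] = ra              # the older root survives as the root
    roots = {find(x) for x in parent}
    bars.extend((birth[r], None) for r in roots)  # components that never die
    return sorted(bars, key=lambda b: b[0])


ops = [('v', 'a'), ('v', 'b'), ('e', 'a', 'b'),
       ('v', 'c'), ('e', 'b', 'c'), ('e', 'a', 'c')]
print(standard_h0_barcode(ops))
```

Once edges can be deleted, a merged component may split again, so the rule "the younger root dies" is no longer sufficient; this is the gap that the pairing of merge/departure with split/entrance events addresses.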
Our solutions are as follows: in dimension zero, we find that the algorithm by Agarwal et al.  originally designed for pairing critical points of Morse functions on 2-manifolds can be utilized in our scenario. We formally prove the correctness of applying the algorithm and use a dynamic connectivity data structure  to achieve the claimed complexity. In dimension one, we observe that a positive and a negative edge can be paired by finding the earliest 1-cycle containing both edges which resides in all intermediate graphs. We further reduce the pairing to finding the max edge-weight of a path in a minimum spanning forest. Utilizing a data structure for dynamic minimum spanning forest , we achieve the claimed time complexity. Section 4 details this algorithm.
Using Alexander duality, we also extend the algorithm for -dimension to compute -dimensional zigzag for -embedded complexes. The connection between these two cases for non-zigzag persistence is well known [11, 23], and the challenge comes in adopting this duality to the zigzag setting while maintaining an efficient time budget. With the help of a dual filtration and an observation about faster void boundary reconstruction for -connected complexes , we achieve a time complexity of .
The algorithm for computing persistent homology by Edelsbrunner et al.  is a cornerstone of topological data analysis. Several extensions followed after this initial development. De Silva et al.  proposed to compute persistent cohomology instead of homology which gives the same barcode. De Silva et al.  then showed that the persistent cohomology algorithm runs faster in practice than the version that uses homology. The annotation technique proposed by Dey et al.  implements the cohomology algorithm by maintaining a cohomology basis more succinctly and extends to towers connected by simplicial maps. These algorithms run in time.
Carlsson and de Silva introduced zigzag persistence as an extension of the standard persistence, where they also presented a decomposition algorithm for computing zigzag barcodes on the level of vector spaces and linear maps. This algorithm is then adapted to zigzag filtrations at simplicial level by Carlsson et al. with a time complexity of . Both algorithms [4, 3] utilize a construct called right filtration and a birth-time vector. Maria and Oudot proposed an algorithm for zigzag persistence based on some diamond principles where an inverse non-zigzag filtration is always maintained during the process. The algorithm in  is shown to run faster in experiments than the algorithm in  though the time complexities remain the same. Milosavljević et al. proposed an algorithm for zigzag persistence based on matrix multiplication which runs in  time, giving the best asymptotic bound for computing zigzag and non-zigzag persistence in general dimensions.
The algorithms reviewed so far are all for general dimensions and many of them are based on matrix operations. Thus, it is not surprising that the best time bound achieved is  given that computing Betti numbers for a simplicial -complex of size  is as hard as computing the rank of a -matrix with  non-zero entries as shown by Edelsbrunner and Parsa. To lower the complexity, one strategy (which is adopted by this paper) is to consider special cases where matrix operations can be avoided. The work by Dey, who proposed an algorithm for non-zigzag persistence induced from height functions on -embedded complexes, is probably most related to ours in that regard.
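A zigzag module (or module for short) is a sequence of vector spaces
$$\mathcal{M}: V_0 \overset{\psi_0}{\longleftrightarrow} V_1 \overset{\psi_1}{\longleftrightarrow} \cdots \overset{\psi_{m-1}}{\longleftrightarrow} V_m$$
in which each $\psi_i$ is either a forward linear map $\psi_i: V_i \to V_{i+1}$ or a backward linear map $\psi_i: V_i \leftarrow V_{i+1}$. We assume vector spaces are over the field $\mathbb{Z}_2$ in this paper. A module $\mathcal{S}$ of the form
$$\mathcal{S}: W_0 \overset{\phi_0}{\longleftrightarrow} W_1 \overset{\phi_1}{\longleftrightarrow} \cdots \overset{\phi_{m-1}}{\longleftrightarrow} W_m$$
is called a submodule of $\mathcal{M}$ if each $W_i$ is a subspace of $V_i$ and each $\phi_i$ is the restriction of $\psi_i$. For an interval $[b,d] \subseteq [0,m]$, $\mathcal{S}$ is called an interval submodule of $\mathcal{M}$ over $[b,d]$ if $W_i$ is one-dimensional for $i \in [b,d]$ and is trivial for $i \notin [b,d]$, and $\phi_i$ is an isomorphism for $i \in [b,d-1]$. It is well known that $\mathcal{M}$ admits an interval decomposition $\mathcal{M} = \bigoplus_{\alpha \in A} \mathcal{I}^{[b_\alpha,d_\alpha]}$, which is a direct sum of interval submodules of $\mathcal{M}$. The (multi-)set of intervals $\{[b_\alpha,d_\alpha] \mid \alpha \in A\}$ is called the zigzag barcode (or barcode for short) of $\mathcal{M}$ and is denoted as $\mathsf{Pers}(\mathcal{M})$. Each interval in a zigzag barcode is called a persistence interval.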
In this paper, we mainly focus on a special type of zigzag modules:
Definition 1 (Elementary zigzag module).
A zigzag module is called elementary if it starts with the trivial vector space and all linear maps in the module are of the three forms: (i) an isomorphism; (ii) an injection with rank 1 cokernel; (iii) a surjection with rank 1 kernel.
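A zigzag filtration (or filtration for short) is a sequence of simplicial complexes
$$\mathcal{F}: K_0 \overset{\sigma_0}{\longleftrightarrow} K_1 \overset{\sigma_1}{\longleftrightarrow} \cdots \overset{\sigma_{m-1}}{\longleftrightarrow} K_m$$
in which each $K_i \overset{\sigma_i}{\longleftrightarrow} K_{i+1}$ is either a forward inclusion $K_i \hookrightarrow K_{i+1}$ with a single simplex $\sigma_i$ added, or a backward inclusion $K_i \hookleftarrow K_{i+1}$ with a single simplex $\sigma_i$ deleted. When the $\sigma_i$’s are not explicitly used, we drop them and simply denote $\mathcal{F}$ as $K_0 \leftrightarrow K_1 \leftrightarrow \cdots \leftrightarrow K_m$. For computational purposes, we sometimes assume that a filtration starts with the empty complex, i.e., $K_0 = \emptyset$ in $\mathcal{F}$. Throughout the paper, we also assume that each $K_i$ in $\mathcal{F}$ is a subcomplex of a fixed complex $K$; such a $K$, when not given, can be constructed by taking the union of every $K_i$ in $\mathcal{F}$. In this case, we call $\mathcal{F}$ a filtration of $K$.
Applying the $p$-th homology with $\mathbb{Z}_2$ coefficients on $\mathcal{F}$, we derive the $p$-th zigzag module of $\mathcal{F}$
$$\mathsf{H}_p(\mathcal{F}): \mathsf{H}_p(K_0) \overset{\varphi^p_0}{\longleftrightarrow} \mathsf{H}_p(K_1) \overset{\varphi^p_1}{\longleftrightarrow} \cdots \overset{\varphi^p_{m-1}}{\longleftrightarrow} \mathsf{H}_p(K_m)$$
in which each $\varphi^p_i$ is the linear map induced by the inclusion. In this paper, whenever $\mathcal{F}$ is used to denote a filtration, we use $\varphi^p_i$ to denote a linear map in the module $\mathsf{H}_p(\mathcal{F})$. Note that $\mathsf{H}_p(\mathcal{F})$ is an elementary module if $\mathcal{F}$ starts with an empty complex. Specifically, we call the barcode of $\mathsf{H}_p(\mathcal{F})$ the $p$-th zigzag barcode of $\mathcal{F}$.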
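To make the elementary property concrete for graphs, the following sketch computes the dimensions of the 0-th and 1-st homology of every graph in a zigzag filtration, using the number of connected components for dimension 0 and the formula (edges − vertices + components) for dimension 1. The operation encoding (`'+v'`, `'-v'`, `'+e'`, `'-e'`) and the function names are hypothetical, and the components are recomputed from scratch at every step, so this is only an illustration and not the near-linear algorithm.

```python
def betti_numbers(vertices, edges):
    """Return (beta_0, beta_1) of a simple graph given as vertex and edge sets."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), 0
    for v in vertices:
        if v in seen:
            continue
        comps += 1                      # found a new connected component
        seen.add(v)
        stack = [v]
        while stack:                    # depth-first search over the component
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return comps, len(edges) - len(vertices) + comps


def betti_sequence(ops):
    """Betti numbers of every graph in the filtration, starting from the empty graph."""
    V, E = set(), set()
    seq = [(0, 0)]
    for kind, *args in ops:
        if kind == '+v':
            V.add(args[0])
        elif kind == '-v':
            V.remove(args[0])
        elif kind == '+e':
            E.add(frozenset(args))
        elif kind == '-e':
            E.remove(frozenset(args))
        seq.append(betti_numbers(V, E))
    return seq


ops = [('+v', 1), ('+v', 2), ('+e', 1, 2), ('+v', 3), ('+e', 2, 3),
       ('+e', 1, 3), ('-e', 1, 2), ('-e', 2, 3)]
print(betti_sequence(ops))
```

In each dimension, consecutive entries differ by at most one, which matches the three map types allowed in an elementary module: an isomorphism (no change), an injection with rank 1 cokernel, or a surjection with rank 1 kernel.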
3 Zero-dimensional zigzag persistence
We present our algorithm for 0-th zigzag persistence in this section. (For brevity, we henceforth refer to $p$-dimensional zigzag persistence as $p$-th zigzag persistence.) The input is assumed to be a filtration of a graph, but note that our algorithm can be applied to any complex by restricting to its 1-skeleton. We first define the barcode graph of a zigzag filtration, a construct that our algorithm implicitly works on. In a barcode graph, nodes correspond to connected components of graphs in the filtration and edges encode the mapping between the components:
Definition 2 (Barcode graph).
For a graph and a zigzag filtration of , the barcode graph of is a graph whose vertices (preferably called nodes) are associated with a level and whose edges connect nodes only at adjacent levels. The graph is constructively described as follows:
For each in and each connected component of , there is a node in at level corresponding to this component; this node is also called a level- node.
For each inclusion in , if it is forward, then there is an edge connecting a level- node to a level- node if and only if the component of maps to the component of by the inclusion. Similarly, if the inclusion is backward, then connects to by an edge iff the component of maps to the component of .
For two nodes at different levels in , the node at the higher (resp. lower) level is said to be higher (resp. lower) than the other.
Figure 1(a) and 1(b) give an example of a zigzag filtration and its barcode graph. Note that a barcode graph is of size , where is the length of and is the number of vertices and edges of . Although we present our algorithm (Algorithm 1) by first building the barcode graph, the implementation does not do so explicitly, allowing us to achieve the claimed time complexity; see Section 3.1 for the implementation details. Introducing barcode graphs helps us justify the algorithm, and more importantly, points to the fact that the algorithm can be applied whenever such a barcode graph can be built.
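As an illustration of Definition 2, the sketch below builds a barcode graph explicitly. It relies on the observation that, since one of two consecutive graphs is a subgraph of the other, a component at one level maps to a component at the adjacent level exactly when the two share a vertex. The operation encoding and the function names are hypothetical, and the explicit construction is deliberately naive; it is not the implicit representation used to obtain the claimed complexity.

```python
def components(vertices, edges):
    """Connected components of a graph, each returned as a frozenset of vertices."""
    adj = {v: set() for v in vertices}
    for e in edges:
        a, b = tuple(e)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for v in vertices:
        if v in seen:
            continue
        comp, stack = {v}, [v]
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in comp:
                    comp.add(y)
                    stack.append(y)
        seen |= comp
        comps.append(frozenset(comp))
    return comps


def barcode_graph(ops):
    """Return (nodes, edges); a node is a pair (level, component)."""
    V, E = set(), set()
    levels = [components(V, E)]            # level 0 is the empty graph
    for kind, *args in ops:
        if kind == '+v':
            V.add(args[0])
        elif kind == '-v':
            V.remove(args[0])
        elif kind == '+e':
            E.add(frozenset(args))
        elif kind == '-e':
            E.remove(frozenset(args))
        levels.append(components(V, E))
    nodes = [(i, c) for i, comps in enumerate(levels) for c in comps]
    edges = [((i, c), (i + 1, d))
             for i in range(len(levels) - 1)
             for c in levels[i] for d in levels[i + 1]
             if c & d]                     # shared vertex <=> one maps to the other
    return nodes, edges


ops = [('+v', 'a'), ('+v', 'b'), ('+e', 'a', 'b'), ('-e', 'a', 'b'), ('-v', 'b')]
nodes, edges = barcode_graph(ops)
print(len(nodes), len(edges))
```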
Algorithm 1 (Algorithm for 0-th zigzag persistence).
Given a graph and a zigzag filtration of , we first build the barcode graph , and then apply the pairing algorithm described in  on to compute . For a better understanding, we rephrase this algorithm which originally works on Reeb graphs:
The algorithm iterates for and maintains a barcode forest , whose leaves have a one-to-one correspondence to level- nodes of . Like the barcode graph, each tree node in a barcode forest is associated with a level and each tree edge connects nodes at adjacent levels. For each tree in a barcode forest, the lowest node is the root. Initially, is empty; then, the algorithm builds from in the -th iteration. Intervals for are produced while updating the barcode forest. (Figure 1(c) illustrates such updates.)
Specifically, the -th iteration proceeds as follows: first, is formed by copying the level- nodes of and their connections to the level- nodes, into ; the copying is possible because leaves of and level- nodes of have a one-to-one correspondence; see transitions from to and from to in Figure 1(c). We further change under the following events:
One level- node in , said to be entering, does not connect to any level- node.
One level- node in , said to be splitting, connects to two different level- nodes. For the two events so far, no changes need to be made on .
One level- node in , said to be departing, does not connect to any level- node. If has splitting ancestors (i.e., ancestors which are also splitting nodes), add an interval to , where is the level of the highest splitting ancestor of ; otherwise, add an interval to , where is the level of the root of . We then delete the path from to in .
Two different level- nodes in connect to the same level- node. Tentatively, may now contain a loop and is not a tree. If are in different trees in , add an interval to , where is the level of the higher root of in ; otherwise, add an interval to , where is the level of the highest common ancestor of in . We then glue the two paths from and to their level- ancestors in , after which is guaranteed to be a tree.
If none of the above events happen, no changes are made on .
At the end, for each root in at a level , add an interval to , and for each splitting node in at a level , add an interval to .
Figure 1(c) gives examples of barcode forests constructed by Algorithm 1 for the barcode graph shown in Figure 1(b), where and introduce entering nodes, introduces a splitting node, and introduces a departing node. In , the departure event happens and the dotted path is deleted, producing an interval . In and , the merge event happens and the dotted paths are glued together, producing intervals and . Note that the glued level- nodes are in different trees in and are in the same tree in .
As mentioned, to achieve the claimed time complexity, we do not explicitly build the barcode graph. Instead, we differentiate the different events as follows: inserting (resp. deleting) a vertex in simply corresponds to the entrance (resp. departure) event, whereas inserting (resp. deleting) an edge corresponds to the merge (resp. split) event only when connected components in the graph merge (resp. split).
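The event classification described above can be sketched as follows. Connectivity is tested with a plain graph search, which stands in for the dynamic connectivity data structure; the operation encoding and the function names are assumptions made for illustration only.

```python
def connected(edges, a, b):
    """Graph search deciding whether a and b lie in the same component."""
    adj = {}
    for e in edges:
        x, y = tuple(e)
        adj.setdefault(x, set()).add(y)
        adj.setdefault(y, set()).add(x)
    seen, stack = {a}, [a]
    while stack:
        x = stack.pop()
        if x == b:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return False


def classify_events(ops):
    """Label each addition or deletion with the event it triggers."""
    E, events = set(), []
    for kind, *args in ops:
        if kind == '+v':
            events.append('entrance')
        elif kind == '-v':
            events.append('departure')
        elif kind == '+e':
            a, b = args
            merged = not connected(E, a, b)   # test before inserting the edge
            E.add(frozenset((a, b)))
            events.append('merge' if merged else 'no-change')
        elif kind == '-e':
            a, b = args
            E.remove(frozenset((a, b)))
            split = not connected(E, a, b)    # test after deleting the edge
            events.append('split' if split else 'no-change')
    return events


ops = [('+v', 1), ('+v', 2), ('+e', 1, 2), ('+v', 3), ('+e', 2, 3),
       ('+e', 1, 3), ('-e', 1, 2), ('-e', 2, 3), ('-v', 2)]
print(classify_events(ops))
```

Each search costs time linear in the current graph, so this sketch only shows which event an operation induces; replacing `connected` by a dynamic connectivity structure is what removes that cost.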
To keep track of the connectivity of vertices, we use a dynamic connectivity data structure by Holm et al. , which we denote as . Assuming that is the length of and is the number of vertices and edges of , the data structure supports the following operations:
Return the identifier of the connected component of a vertex in  time. We denote this subroutine as . (Since  maintains the connectivity information by dynamically updating the spanning forest for the current graph, the identifier of a connected component is in fact the identifier of a tree in the spanning forest.)
Insert or delete an edge, and possibly update the connectivity information, in amortized time.
We also note the following implementation details:
All vertices of are added to initially and are then never deleted. But we make sure that edges in always equal edges in as the algorithm proceeds so that still records the connectivity of .
At each iteration , we update to form according to the changes of the connected components from to . For this, we maintain a key-value map from connected components of to leaves of the barcode forest, and is initially empty.
In a barcode forest , since the level of a leaf always equals , we only record the level of a non-leaf node. Note that at iteration , a leaf in may uniquely connect to a single leaf in . In this case, we simply let the leaf in automatically become a leaf in ; see Figure 3. The size of a barcode forest is then .
Now we can present the full detail of the implementation. Specifically, for each addition and deletion in , we do the following in each case:
- Adding vertex :
Add an isolated node to the barcode forest and let equal this newly added node.
- Deleting vertex :
Let ; then, is the node in the barcode forest that is departing. Update the barcode forest as described in Algorithm 1.
- Adding edge :
Let , , , and . If , then the no-change event happens; otherwise, the merge event happens. We then add to . For the no-change event, do nothing after this. For the merge event, do the following: glue the paths from and to their ancestors as described in Algorithm 1; attach a new child to the highest glued node; update to be .
- Deleting edge :
Let , and then delete from . If after this, then the no-change event happens but we have to update to be because the identifiers of the connected components in may change after deleting the edge . Otherwise, the split event happens: we attach two new children , to in the barcode forest and set , .
Following the idea in , the barcode forest can be implemented using the mergeable trees data structure by Georgiadis et al. . Since the maximum number of nodes in a barcode forest is , the data structure supports the following operations, each of which takes amortized time:
Return the root of a node.
Return the nearest common ancestor of two leaves (in the same tree).
Glue the paths from two leaves (in the same tree) to their nearest common ancestor.
Note that while we delete the path from the departing node to its ancestor in the departure event, deletions are not supported by mergeable trees. However, path deletions are in fact unnecessary; they are described only for clarity of exposition. Hence, during implementation, we only traverse each ancestor of the departing node until an unpaired one is found, without actual deletions. (An entering or splitting node is initially unpaired when introduced and becomes paired when its level is used to produce an interval; e.g., the node  becomes paired in the departure event in Algorithm 1.) Since each node can only be traversed once, the traversal in the departure events takes  time in total. See [14, Section 5] for details of implementing the barcode forest and its operations using mergeable trees.
The time complexity of the algorithm is dominated by the operations of the dynamic connectivity and the mergeable trees data structures.
In this subsection, we justify the correctness of Algorithm 1. For each entering node in a of Algorithm 1, there must be a single node in at the level of with the same property. So we also have entering nodes in . Splitting and departing nodes in can be similarly defined.
We first prepare some standard notions and facts in zigzag persistence (Definitions 5 and 7, Proposition 9) that help with our proofs. Some notions also appear in previous works in different forms; see, e.g., .
Definition 5 (Representatives).
Let be an elementary zigzag module and be an interval. An indexed set is called a set of partial representatives for if for every , or by ; it is called a set of representatives for if the following additional conditions are satisfied:
If is forward with non-trivial cokernel, then is not in ; if is backward with non-trivial kernel, then is the non-zero element in .
If and is backward with non-trivial cokernel, then is not in ; if and is forward with non-trivial kernel, then is the non-zero element in .
Specifically, when for a zigzag filtration , we use terms -representatives and partial -representatives to emphasize the dimension .
Example 6.
Let be the filtration given in Figure 1(a), and let , be the sum of the component containing vertex 1 and the component containing vertex 2 in and . Then, is a set of 0-representatives for the interval .
Definition 7 (Positive/negative indices).
Let be an elementary zigzag module. The set of positive indices of , denoted , and the set of negative indices of , denoted , are constructed as follows: for each forward , if is an injection with non-trivial cokernel, add to ; if is a surjection with non-trivial kernel, add to . Furthermore, for each backward , if is an injection with non-trivial cokernel, add to ; if is a surjection with non-trivial kernel, add to . Finally, add copies of to .
For each in Definition 7, if , then ; similarly, if , then . Furthermore, if is an isomorphism, then and .
Note that in Definition 7 is in fact a multi-set; calling it a set should not cause any confusion in this paper though. Also note that , and every index in (resp. ) is the start (resp. end) of an interval in . This explains why we add copies of to because there are always number of intervals ending with in ; see the example in Figure 1(a) where .
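For an elementary module, the positive and negative indices can be read off from the dimensions of the vector spaces alone, as the sketch below illustrates. It assumes the closed-interval convention in which an increase in dimension from the $i$-th to the $(i+1)$-th space starts an interval at $i+1$ and a decrease ends an interval at $i$, independently of the direction of the map; the function name and the input format (a list of dimensions starting with 0) are hypothetical.

```python
def positive_negative_indices(dims):
    """dims[i] is the dimension of the i-th vector space of an elementary module."""
    if dims[0] != 0:
        raise ValueError("an elementary module starts with the trivial vector space")
    m = len(dims) - 1
    positive, negative = [], []
    for i in range(m):
        step = dims[i + 1] - dims[i]
        if step == 1:
            positive.append(i + 1)      # a class is born at index i + 1
        elif step == -1:
            negative.append(i)          # a class is last alive at index i
        elif step != 0:
            raise ValueError("dimension changes by more than one; not elementary")
    negative.extend([m] * dims[m])      # classes still alive at the end
    return positive, negative


# Dimensions of H_0 along a small graph filtration (three vertices, edges
# added and then deleted); the sequence itself is an illustrative input.
P, N = positive_negative_indices([0, 1, 2, 1, 2, 1, 1, 1, 2])
print(P, N)
```

The two returned lists always have the same length, which reflects the fact that every interval of the barcode has exactly one start and one end.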
Proposition 9.
Let be an elementary zigzag module and be a bijection. If every satisfies that and the interval has a set of representatives, then .
Proof.
For each , let be a set of representatives for . Then, define as an interval submodule of over such that is generated by if and is trivial otherwise, where denotes the -th vector space in . We claim that , which implies the proposition. To prove this, suppose that is of the form
Then, we only need to verify that for every , the set is a basis of . We prove this by induction on . For , since , is obviously a basis. So we can assume that for an , is a basis of . We have the following cases:
- an isomorphism:
In this case, and . If is forward, then . The elements in must then form a basis of because is an isomorphism. The verification for being backward is similar.
- forward, non-trivial:
In this case, and . For each such that , and by . We then have that elements in are linearly independent because is injective. Since by Definition 5, must contain linearly independent elements. The fact that the cardinality of the set equals implies that it must form a basis of .
- forward, non-trivial:
In this case, and . Let . For each such that and , and by . We then have that . Since is surjective, elements in generate , in which by Definition 5. It follows that forms a basis of because it generates and its cardinality equals .
- backward, non-trivial:
In this case, and . For each such that and , and by . We then have that elements in are linearly independent because if they are not, then their images under are also not, which is a contradiction. Note that and its cardinality equals , so it must form a basis of .
- backward, non-trivial:
In this case, and . For each such that , and by . We then have that elements in are linearly independent because their images under are. We also have that there is no non-trivial linear combination of falling in because otherwise their images under would not be linearly independent. Since is the non-zero element in by Definition 5, we have that contains linearly independent elements. Then, it must form a basis of because its cardinality equals , ∎
Now we present several propositions leading to our conclusion (Theorem 14). Specifically, Proposition 10 states that a certain path in the barcode graph induces a set of partial 0-representatives. Proposition 11 lists some invariants of Algorithm 1. Propositions 10 and 11 support the proof of Proposition 13, which together with Proposition 9 implies Theorem 14.
From now on, and always denote the input to Algorithm 1. Since each node in a barcode graph represents a connected component, we also interpret nodes in a barcode graph as 0-th homology classes throughout the paper. Moreover, a path in a barcode graph from a node to a node is said to be within level and if for each node on the path, its level satisfies ; we denote such a path as .
Proposition 10.
Let be a level- node and be a level- node in such that and there is a path in . Then, there is a set of partial 0-representatives for the interval with and .
Proof.
We can assume that is a simple path because if it were not we could always find one. For each , let be all the level- nodes on whose adjacent nodes on are at different levels. Then, let . Also, let and . It can be verified that is a set of partial 0-representatives for . See Figure 4 for an example of a simple path (the dashed one) in a barcode graph, where the solid nodes contribute to the induced partial 0-representatives and the hollow nodes are excluded. ∎
For an with , we define the prefix of as the filtration and observe that is the subgraph of induced by nodes at levels less than or equal to . We call level- nodes of as leaves and do not distinguish leaves in and because they bijectively map to each other. It should be clear from the context though which graph or forest a particular leaf is in.
Proposition 11.
For each , Algorithm 1 maintains the following invariants:
There is a bijection from trees in to connected components in containing leaves such that a leaf is in a tree of if and only if is in .
For each leaf in and each ancestor of at a level , there is a path in where is a level- node.
For each leaf in and each splitting ancestor of at a level , let be the unique level- splitting node in . Then, there is a path in .
Proof.
We only verify invariant 3 as the verification for invariant 2 is similar but easier and invariant 1 is straightforward. The verification is by induction. When , invariant 3 trivially holds. Now suppose that invariant 3 is true for an . For the no-change, entrance, and split event in Algorithm 1, it is not hard to see that invariant 3 still holds for . For the departure event, because we are only deleting a path from to form , invariant 3 also holds for . For the merge event, let be a leaf in , be a splitting ancestor of at a level , and be the unique splitting node in at level . The node may correspond to one or two nodes in , in which only one is splitting, and let be the splitting one. Note that ’s parent may correspond to one or two nodes in , and we let be the set of nodes in that ’s parent corresponds to. If is an ancestor of a node in , then by the assumption, there must be a path in . From this path, we can derive a path in . If is not an ancestor of any node of in , the fact that is an ancestor of ’s parent in implies that there must be an ancestor of a node in which corresponds to. So we have that is a gluing of two nodes from . Note that ’s parent must not be a glued node in because otherwise would have been an ancestor of a node of in ; see Figure 5 where and are the two level- nodes glued together. Let be the highest one among the nodes on the path from to that are glued in iteration . We have that must correspond to a node in which is an ancestor of . Recall that are the two leaves in which are glued, and let be the child of the glued node of in , as shown in Figure 5. From the figure, we have that must be splitting because one child of (which is not glued) descends down to and the other child of (which is glued) descends down to . The fact that is an ancestor of in implies that there is a path in . Let be the unique splitting node in at the same level with ; then, and being descendants of in implies that there are paths and in . 
Now we derive a path in by concatenating the following paths and edges: , , , , , . ∎
Proposition 13.
Each interval produced by Algorithm 1 admits a set of 0-representatives.
Proof.
Suppose that an interval is produced by the merge event at iteration . We have the following situations:
If the nodes in this event (see Algorithm 1) are in the same tree in , let be the highest common ancestor of and note that is a splitting node at level . Also note that are actually leaves in and hence can also be considered as level- nodes in . Let be the unique level- splitting node in . By invariant 3 of Proposition 11 along with Proposition 10, there are two sets of partial 0-representatives for with , , , and . We claim that is a set of 0-representatives for the interval . To prove this, we first note the following obvious facts: (i) is a set of partial 0-representatives; (ii) ; (iii) is the non-zero element in . So we only need to show that . Let be the two level- nodes in connecting to . Then, equals or and the same for . To see this, we first show that can only contain . For contradiction, suppose instead that contains a level- node with , . Let be the simple path that induces as in Proposition 10 and its proof. Then, is on the path and the two adjacent nodes of on are at level and , in which we let be the one at level . Note that because is not equal to or . Since