1.1 Background and related work
A path cover of a (directed) graph is a set of paths such that every vertex of the graph appears in some path in the set. A minimum-size path cover of a graph is called a minimum path cover (MPC). While computing a MPC is NP-hard in general, it is a classic result, dating back to Dilworth  and Fulkerson , that this can be done in polynomial time on directed acyclic graphs (DAGs). From this point onward we assume a fixed DAG admitting a MPC of size .
Dilworth’s theorem  states that the size of a MPC of a DAG equals the size of a maximum antichain of it. An antichain is a set of vertices that are pairwise non-reachable, and a maximum antichain is one of maximum size. Fulkerson’s constructive proof  of this fact shows that computing a MPC can be reduced to finding a maximum matching in a bipartite graph with vertices and edges, where is the set of edges in the transitive closure of . It follows that one can compute a MPC in time with the Hopcroft-Karp algorithm , assuming that is already computed. Another reduction to a minimum flow problem gives a different complexity: one can subdivide every vertex and add the demand 1 for each such new edge. Since minimum flows are reducible to maximum flows (see e.g. ), it follows from Orlin’s result  that the MPC problem can be solved in -time. Finally, observe that the MPC problem is in general at least as hard as the maximum bipartite matching problem, because the converse reduction also holds: it is enough to orient the edges of the bipartite graph from left to right, and obtain a maximum matching as the one-edge paths in a MPC.
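As an illustration of Fulkerson's reduction, the following Python sketch computes the size of a MPC of a small DAG by maximum bipartite matching over the transitive closure. It rests on assumptions that are not part of the original text: vertices are the integers 0..n-1, the function names are hypothetical, and the matching is found with simple augmenting paths rather than with the Hopcroft-Karp algorithm, so it does not attain the running time discussed above.

```python
from collections import defaultdict


def transitive_closure(n, edges):
    """reach[u] = set of vertices reachable from u by a proper path (assumed helper)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    reach = [set() for _ in range(n)]
    for s in range(n):
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in reach[s]:
                reach[s].add(v)
                stack.extend(adj[v])
    return reach


def min_path_cover_size(n, edges):
    """Size of a MPC = n minus a maximum matching in the closure's bipartite graph."""
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[v] = left copy currently matched to right copy v

    def try_augment(u, seen):
        # Simple augmenting-path search; NOT Hopcroft-Karp.
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    matching = sum(1 for u in range(n) if try_augment(u, set()))
    return n - matching
```

On the diamond-shaped DAG with edges (0,1), (0,2), (1,3), (2,3), the function returns 2, matching the maximum antichain {1, 2}.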
The MPC problem has applications in various fields, ranging from program testing , scheduling , to some assembly problems in bioinformatics [24, 27, 3, 22, 21]. A MPC can also act as a backbone to solve other problems. For example, Jagadish  shows that -time reachability queries can be supported given a path cover of size : it is enough to compute from it a -size table, storing, for every vertex and path in the cover, the index inside the path of the first vertex reachable from . Recently, Mäkinen et al. [17, 16] showed that using essentially the same table, other basic problems extended to a DAG can be parameterized by its width (longest increasing subsequence and longest common subsequence). The same holds also for a string matching problem extended to DAGs (co-linear chaining) [17, 16].
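The table used for reachability queries can be sketched as follows. This is an illustrative sketch rather than the construction of the cited works: the names `build_first_reachable` and `reaches` are hypothetical, a vertex is assumed to reach itself, and the DAG is assumed to be given as an adjacency dictionary together with a topological order.

```python
from math import inf


def build_first_reachable(order, adj, paths):
    """first[v][p] = index in paths[p] of the first vertex reachable from v (inf if none).

    Also returns where[v] = (p, idx) for one occurrence of v in the cover.
    """
    k = len(paths)
    first = {v: [inf] * k for v in order}
    where = {}
    for p, path in enumerate(paths):
        for idx, v in enumerate(path):
            first[v][p] = min(first[v][p], idx)
            where.setdefault(v, (p, idx))
    # Reverse topological order: out-neighbours are final before v is processed.
    for v in reversed(order):
        for w in adj.get(v, []):
            for p in range(k):
                if first[w][p] < first[v][p]:
                    first[v][p] = first[w][p]
    return first, where


def reaches(first, where, u, v):
    """Constant-time query: u reaches v iff u reaches a vertex at or before v on v's path."""
    p, idx = where[v]
    return first[u][p] <= idx
```

The query is a single table lookup because, on a fixed path, reaching any vertex at or before the position of the target implies reaching the target itself.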
In practical applications, the width of the DAG may be small. For example, in  the DAG comes from a so-called pan-genome: it has hundreds of millions of vertices, is “long” (i.e., it has transitive edges), but yet it has a small width. It is thus natural to ask whether there exists a faster MPC algorithm when the width is small. The first algorithms parameterized on are due to Chen and Chen: the first runs in time , and the second in time . Recently, Mäkinen et al. [17, 16] obtained a faster one for sparse graphs, running in time .
Despite these efforts, the fixed-parameter tractability (FPT) of the MPC problem is far from settled. This issue is also related to the recent line of research “FPT inside P”  (see e.g. the parameterizations for the maximum matching problem [9, 15]). As a first question, observe that all the existing algorithms have either a superlinear dependence on , or a quadratic dependence on , in the worst case. Thus, we can state:
Open Question 1 (Intermediate goal).
Is there an FPT algorithm running in time , for some and some function depending only on ?
However, the final goal is to obtain an FPT algorithm having a linear dependence on both and :
Open Question 2 (Final goal).
Is there an FPT algorithm running in time , for some function depending only on ?
Offering some hope in this direction, Felsner et al.  gave an -time algorithm based on an exhaustive combinatorial approach to decide whether the width of a DAG is 3. Their technical result can be regarded as a special case for the decision problem for .
We affirmatively answer Open Question 1 with the following result.
Given a DAG of width , we can compute a MPC of it in time .
Theorem 1 obtains linear dependence on (i.e., without any -multiplicative factors), and moreover, demonstrates that can be just . This algorithm is also faster than all existing MPC algorithms when is small but the graph is dense. In particular, it is faster when and but .
In order to describe our algorithm, we briefly review the -time algorithm of Mäkinen et al. . This works in two steps. In step (i) it finds an approximate path cover, namely one of size . This can be done in time, by running a greedy algorithm that iteratively takes a path covering the most uncovered vertices, at a cost of time per path. The standard set cover approximation analysis ensures that the number of paths thus obtained is . For step (ii), this solution is shrunk to the optimal size , by using the standard reduction to maximum flow (see e.g. [2, Theorem 3.9.1]). Each shrinking step takes -time, and there are such steps. We will also use this procedure as a primitive called shrinking (Lemma 1).
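The greedy step (i) can be illustrated with a short Python sketch. It is not the implementation of the cited work: the function name is hypothetical, the DAG is assumed to be given as an adjacency dictionary with a topological order, and each round recomputes a simple dynamic program over the whole graph.

```python
def greedy_path_cover(order, adj):
    """Repeatedly take a path covering the most still-uncovered vertices."""
    covered = set()
    paths = []
    while len(covered) < len(order):
        best = {}  # best[v] = max number of uncovered vertices on a path starting at v
        nxt = {}   # nxt[v] = successor of v on such a path (None if the path ends at v)
        for v in reversed(order):
            gain, succ = 0, None
            for w in adj.get(v, []):
                if best[w] > gain:
                    gain, succ = best[w], w
            best[v] = gain + (v not in covered)
            nxt[v] = succ
        v = max(order, key=lambda x: best[x])
        path = []
        while v is not None:
            path.append(v)
            v = nxt[v]
        covered.update(path)
        paths.append(path)
    return paths
```

Every round covers at least one new vertex, so the loop terminates; the paths may share vertices, which is allowed in a path cover.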
In order to obtain as additive term in the bound from Theorem 1, we resort to the same high-level idea of Jagadish : transitive edges are immaterial to reachability, and thus also to computing a MPC. More precisely, having a path cover of size , one can sparsify the DAG to have at most out-going edges per vertex: for each vertex and path , we keep only the edge from to the earliest vertex in , to have overall remaining edges. This can be done in time , while also ensuring that shrinking does not destroy the path cover (Lemma 2). However, we do not have a MPC to start with, since we are trying to compute one!
To solve this, we need to interleave computing a MPC with sparsifying the DAG as we progress. Since black-box reductions such as flow or matching cannot allow for this, we need a new approach. We first devise a new -time algorithm for the MPC problem, which, to the best of our knowledge, is the first one based on divide and conquer. This works by splitting a topological order of the vertices in half, and recursing in each half. When combining the MPCs from the two halves, we need to (i) account for the new edges between the two parts (and here we can exploit sparsification), and (ii) efficiently combine the two partial path covers into one for the entire graph (and here we use shrinking). We present this in Section 2, and in Section 4 we present the implications of Theorem 1 to the problems from  mentioned above.
Given a DAG of width , we can compute a MPC of it in parallel steps using single processors in the PRAM model.
This answers Open Question 2 in the parallel setting, using a reasonable number of parallel processors. In the canonical setting we present a linear-time FPT algorithm for solving the dual of Open Question 2, namely finding a maximum antichain.
Given a DAG of width , we can compute a maximum antichain of it in time .
Assuming is constant, this is the fastest possible algorithm for the problem. Our strategy is to traverse the graph in a topological order and have an antichain structure sweeping the graph, while performing only work per step. As such, it can also be viewed as an online algorithm receiving in every step a sink vertex and its in-coming edges.
As a first attempt, one can think of maintaining only the (unique) right-most maximum antichain (recall that all maximum antichains form a lattice ). However, it is difficult to update this antichain in time . As such, since we are allowed to spend per step, we can maintain more structure at every step, in addition to the right-most maximum antichain.
Therefore, we introduce the notion of frontier antichain. A frontier antichain is one such that there is no other antichain of the same size and “to the right” of it (see Definition 3). Hence, the largest frontier antichain is also the (unique) right-most maximum antichain, and thus gives the width of . Since any antichain can take at most one vertex from any path in a path cover, there are at most frontier antichains (Lemma 7).
In Section 3.1 we show how to maintain all frontier antichains when a new vertex is added. For example, if a frontier antichain does not reach , then is also a frontier antichain in the new graph. However, can now be to the right of an old frontier antichain of size , thus invalidating it. Moreover, we show that it is sufficient to compare old frontier antichains against the newly added frontier antichains, therefore a pair-wise comparison between frontier antichains suffices to compute them all, maintaining the complexity as a function of . Since this involves testing reachability, as another key ingredient we prove that it is enough to maintain reachability only from vertices belonging to frontier antichains to the newly added vertex. Thus, we can store information per vertex (Section 3.2).
We say that a graph is a subgraph of if and ; in this case we also say that is a supergraph of . If , then is the subgraph of induced by , defined as , where . A walk in is a sequence of vertices of , such that , for all . We say that is proper if . A path is a walk not repeating vertices. We say that is reachable from , or equivalently, that reaches , if there exists a path in , with and . We say that a set of vertices reaches a vertex if there exists , such that reaches . The transitive closure of , denoted , is the supergraph of , such that if and only if reaches in with a proper path. A partially ordered set (poset) is a set and a partial order (reflexive, transitive and antisymmetric binary relation) over . If the number of elements of a poset is finite, then there exists at least one maximal (minimal) element, and every element in the poset is comparable to a maximal (minimal) element.
We recall the following result from .
Lemma 1 (Path cover shrinking ).
Given a DAG of width , and a path cover of , we can obtain a MPC of in time .
2 Combining sparsification, divide and conquer, and shrinking
First, we formalize the concept of sparsification of a graph, which we will use intensively.
Let be a graph. We call another graph a sparsification of if and only if and .
When we obtain a sparsification of a graph we say that we sparsify . Sparsification has been studied before under the name of reduction (see  for a brief survey). Indeed, a sparsification having the minimum number of edges is called a transitive reduction.
Next, we show that the sparsification process in DAGs does not modify the size of a MPC. This is not necessarily true in general graphs, but it will let us sparsify our DAGs when computing their MPCs without changing their width.
Observation 1 (Sparsification on DAGs).
If is a DAG, and a sparsification of , then the size of a MPC of is the same as the size of a MPC of .
Since and have the same transitive closure and set of vertices, a set of vertices is an antichain in if and only if it is an antichain in . Therefore, the size of the maximum antichain is the same in both graphs. Finally, since and are DAGs, they also share the size of their MPCs, by Dilworth’s theorem. ∎
We are now ready to present the sparsification building block.
Lemma 2 (Sparsification algorithm).
Let be a graph, and be a path cover of . Then, we can sparsify to , such that is a path cover of and , in time.
First, we compute for every vertex and every path , , which is the position of vertex in path or if . We can do this in by initializing all the values to , and then scanning the paths in one by one and filling the corresponding values. Using the same scan we can compute for every vertex , , which is a list of the indexes of paths where belongs.
For every vertex we initialize an array , where is a vertex not in , such that, , for all . Then, in time we process every path , and for every edge in , we set . After that, we process the edges one by one, set , and if we set . Finally, will be the edges such that , thus in total the number of edges is at most .
Finally, is a sparsification of because if an edge is not considered in it means that there is an edge such that, . Therefore, there is a path from to , using first and then, the edges in between and . Note that contains all the edges in the paths because we initialized for every edge in path , and these are not updated during the algorithm. ∎
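The procedure in the proof of Lemma 2 can be sketched in Python as follows. This is one reading of the proof, under assumptions: the function name `sparsify` is hypothetical, `None` plays the role of the sentinel vertex outside the graph, the input is a DAG whose paths follow its edges, and no attempt is made to match the running time stated in the lemma.

```python
from math import inf


def sparsify(vertices, edges, paths):
    """Keep, for every vertex u and path i, only the edge to the earliest vertex of path i."""
    k = len(paths)
    pos = {v: [inf] * k for v in vertices}  # pos[v][i] = index of v in paths[i], inf if absent
    member = {v: [] for v in vertices}      # member[v] = indices of the paths containing v
    for i, path in enumerate(paths):
        for idx, v in enumerate(path):
            pos[v][i] = idx
            member[v].append(i)
    # first[u][i] = earliest vertex of paths[i] that u keeps an edge to (None = sentinel)
    first = {v: [None] * k for v in vertices}
    for i, path in enumerate(paths):
        for u, v in zip(path, path[1:]):
            first[u][i] = v  # path edges are installed first, so the cover survives
    for u, v in edges:
        for i in member[v]:
            cur = first[u][i]
            if cur is None or pos[v][i] < pos[cur][i]:
                first[u][i] = v
    return {(u, w) for u in vertices for w in first[u] if w is not None}
```

Each vertex keeps at most one out-going edge per path, which gives the bound on the number of remaining edges used in the lemma.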
2.2 Divide and conquer plus shrinking
In the divide-and-conquer solution we need to divide the DAG in such a way that a solution (i.e. a MPC) in each of its parts is no larger than a solution of the entire graph. We will see in the next lemma that it is sufficient to consider consecutive vertices in a topological order.
Let be a DAG, and a topological order of its vertices. Then, for all , the width of is at most the width of .
We first show that the intersection of any path of with the vertices of is a path in . Consider a path , and remove from it all the vertices from . Thus, we obtain a (possibly empty) sequence of vertices from . We say that is the intersection of with . Since is a DAG, is a sequence of consecutive vertices in (otherwise, if it is not empty, we would have a vertex of smaller (bigger) topological index that is reached by (reaches )), and therefore a path in . Finally, since only contains vertices from and is an induced subgraph, is a path also in .
Consider a MPC of . The intersection of each of those paths with forms a path cover of , whose size is greater than or equal to the size of a MPC of . ∎
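The intersection argument of this proof can be illustrated with a small Python sketch. The function name and the half-open index convention `order[i:j]` are assumptions made for the example; the internal assertion checks the claim that the surviving vertices are consecutive along each path.

```python
def restrict_cover(paths, order, i, j):
    """Intersect every path with the consecutive range order[i:j] of a topological order."""
    keep = set(order[i:j])
    restricted = []
    for path in paths:
        hits = [idx for idx, v in enumerate(path) if v in keep]
        if not hits:
            continue  # empty intersection: this path contributes nothing
        # The surviving vertices form one consecutive segment of the path.
        assert hits == list(range(hits[0], hits[-1] + 1))
        restricted.append(path[hits[0]:hits[-1] + 1])
    return restricted
```

The result covers every vertex of the range and never contains more paths than the cover it was derived from.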
Now we present our divide-and-conquer approach, which solves the MPC problem in the same time complexity as in .
Let be a DAG of width . We can compute a MPC of in time , without using the greedy approximation algorithm.
Compute a topological order of the vertices in time ; we will use this order in the following divide-and-conquer algorithm. Solve recursively in the subgraph induced by , obtaining a MPC of , and in the subgraph induced by , obtaining a MPC of . We consider the path cover of , and shrink our solution to size in time, which is , since (by Lemma 3).
To analyze this recursive algorithm consider its recursion tree. At each node of the recursion tree every vertex and edge of the respective subgraph pays . Also, since the division of the graph into two subgraphs generates disjoint graphs, every vertex and edge is considered in nodes of the recursion tree. Therefore, the cost of the algorithm is . ∎
2.3 Putting it all together
The next observation is the last one needed for proving the correctness of our final algorithm. It states that when we sparsify a subgraph, if we remove the same edges in the original graph, the result is still a sparsification of the graph.
Lemma 4 (Sparsification of a subgraph).
Let be a graph, a subgraph of , and a sparsification of . Then is a sparsification of .
Since is a sparsification of , and every two vertices connected in are also connected in . Suppose by contradiction that and are connected in by a path , but they are not connected in . Then, contains an edge disconnecting from in , but since is a sparsification of , is connected to in , which is a contradiction. ∎
With this final observation we can sparsify the subgraphs in our divide-and-conquer approach without worrying about modifying the size of the MPC, because of Observation 1. This gives rise to our final solution.
We will modify the recursive solution given in Theorem 4 by sparsifying before the shrinking.
Take a topological order of the vertices . Solve recursively in the subgraph induced by , obtaining a MPC of a sparsification of with , and in the subgraph induced by , obtaining a MPC of a sparsification of with . Note that is a sparsification of according to Lemma 4 with , where are the edges in from to . We consider the path cover of and use Lemma 2 to obtain a sparsification of in time such that . Then, we shrink our solution to size in time.
The complexity analysis will consider again the recursion tree. Note that the complexity of a recursion step is , that is, every vertex of the corresponding subgraph pays and every edge going from the left subgraph to the right subgraph pays . Since the division of the graph generates disjoint subgraphs, every vertex appears in nodes in the recursion tree, and every edge going from left to right appears in exactly one node in the recursion tree. Therefore, the total cost is . ∎
Note that in the above algorithm, from the level in the recursion tree where down, the cost of a level will be half of the cost of the previous level, so in total all these levels will cost (without considering the cost in the edges, which we count as before). The reason for this is that in these nodes of the recursion tree, the number of paths is bounded by , and since is halving in the recursion, the cost of shrinking also halves. With this refinement of the analysis, our algorithm actually costs .
Figure 1 shows a schematic example of the algorithm. Since our algorithm is based on divide and conquer, we can parallelize the work done on every sub-part of the input, and obtain a linear-time parallel algorithm for the MPC problem (Theorem 2).
We use our algorithm from Theorem 1. Since the algorithm divides the problem into two disjoint subgraphs we can easily solve each sub-part by using separate processors, and then join the solutions as explained above in . We do this subdivision until we have processors, that is, when the size of the input is . When reaching this point we run the algorithm on the inputs in parallel, running in . Finally, note that all the merges in the first levels in the recursion tree are done in parallel, adding up to . ∎
3 A linear-time FPT algorithm
In this section we present an algorithm running in time for computing the width of the DAG (Theorem 3). The core idea is to process the vertices in topological order , and maintain all frontier antichains (Definition 3) of the current subgraph . To do so, we maintain reachability from vertices in the frontier antichains to the currently processed vertex.
Definition 2 (Antichain domination).
Let and be antichains of the same size. We say dominates if and only if for all , there exists , such that reaches . We denote this by , if also , we denote it .
Note that frontier antichains can dominate only antichains of the same size, since antichains of different size are not comparable. Algorithm 1 shows a function determining whether an antichain dominates another. Although naive, it is an essential piece in our final solution, and uses the efficient reachability computation explained in Section 3.2.
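A naive domination test in the spirit of Definition 2 can be sketched as follows. This is not the paper's Algorithm 1: the function names are hypothetical, reachability is answered by a plain depth-first search instead of the efficient method of Section 3.2, and the direction of the relation (every vertex of the dominated antichain reaches some vertex of the dominating one) is an assumption consistent with frontier antichains lying “to the right”.

```python
def make_reaches(adj):
    """Return a reflexive reachability predicate backed by depth-first search."""
    def reaches(u, v):
        stack, seen = [u], {u}
        while stack:
            x = stack.pop()
            if x == v:
                return True
            for w in adj.get(x, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return False
    return reaches


def dominates(upper, lower, reaches):
    """True iff `upper` dominates `lower`: same size, and each vertex of
    `lower` reaches some vertex of `upper` (assumed direction)."""
    if len(upper) != len(lower):
        return False  # antichains of different sizes are not comparable
    return all(any(reaches(a, b) for b in upper) for a in lower)
```

With antichains of size at most the width, the test issues a number of reachability queries that is at most quadratic in the width.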
Domination is a partial order on antichains.
Clearly, domination is a reflexive and transitive relation. We argue that it is also antisymmetric: suppose and are antichains such that and . Suppose by contradiction that there exists . Since , there exists such that reaches (note ). Since , there exists such that reaches . Thus, there is a proper path from to to in . If , this implies a cycle exists in a DAG, a contradiction. If , this implies is not an antichain, a contradiction. Thus, . A symmetric argument shows , so . We conclude domination is also antisymmetric. ∎
Definition 3 (Frontier antichains).
Frontier antichains are the maximal elements of the domination partial order, i.e. those antichains that are not dominated by any other antichain.
Figure 2 shows frontier antichains of an example graph. The next lemma establishes that frontier antichains dominate all antichains of the graph, i.e., every non-frontier antichain is dominated by some frontier antichain (thus of the same size).
Let be a non-frontier antichain of . Then, there exists a frontier antichain dominating .
Since there are a finite number of antichains of , the antichains with the domination relation form a finite poset (Lemma 5), therefore every element of this poset (i.e. antichain) is less than or equal to (i.e. is dominated by) a maximal element (i.e. a frontier antichain). ∎
Now we show that the number of such antichains grows only with , so maintaining them all does not affect our complexity bound.
If is a DAG of width , then has at most frontier antichains.
Let be an MPC of . Since any antichain can take at most one vertex from each of those paths, we show that for every size- subset of paths of , there is at most one frontier antichain of size whose vertices come from those paths, and thus there are at most frontier antichains. Without loss of generality consider the subset of paths , and suppose by contradiction that there are two different size- frontier antichains and whose vertices come from . Let us label the vertices in these antichains by the path they belong to. Namely, , , with and in for all . We define the following set of vertices:
Note that if , then reaches , because and appear on the same path .
First, note that is an antichain of size . Otherwise, if there exists that reaches , then without loss of generality, suppose that and . Since , we have that reaches , and thus it reaches , which contradicts being an antichain. Second, note that , since otherwise . Finally, , since for all there exists , such that reaches . ∎
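The set of vertices used in this proof is not shown explicitly above. One reading consistent with the surrounding argument, stated here as an assumption, is that it takes from every path the later of the two antichain vertices lying on it. Under that assumption, and with a hypothetical function name, the construction can be sketched as:

```python
def later_on_each_path(paths, a, b):
    """Assumed construction: a[i] and b[i] both lie on paths[i];
    return, for every i, whichever of the two appears later on that path."""
    combined = []
    for path, ai, bi in zip(paths, a, b):
        combined.append(ai if path.index(ai) >= path.index(bi) else bi)
    return combined
```

For two disjoint paths [0, 1, 2] and [3, 4, 5] with no edges between them, the antichains [0, 4] and [1, 3] combine into [1, 4], which is reached by every vertex of both inputs.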
3.1 Maintaining frontier antichains
Our algorithm will maintain all frontier antichains of the current graph . We say that an antichain is -frontier if it is a frontier antichain in the graph . The following two lemmas will show us how these frontier antichains evolve when processing the vertices of the graph, i.e. when passing from to .
Lemma 8 (Type 1).
Let be a -frontier antichain with . Then is a -frontier antichain.
Otherwise, there would exist an antichain , such that in . Consider , which is an antichain (otherwise would reach , a contradiction, since , and is an antichain). Finally, note that in , which is a contradiction since is a -frontier antichain. ∎
Lemma 9 (Type 2).
Let be a -frontier antichain with . Then is a -frontier antichain.
Otherwise, there would exist an antichain , such that in , and also in , which is a contradiction. ∎
Looking at these two lemmas, we establish two types of -frontier antichains: the ones containing , called type 1, and the ones that are also -frontier antichains, called type 2. We handle these two cases separately. First, we find all type-1 frontier antichains, then all of type 2.
Type-1 -frontier antichains are made up of one -frontier antichain and vertex . A first requirement for a -frontier antichain, , to be a subset of a type-1 -frontier antichain is that does not reach . We now show that this is enough to ensure that is a -frontier antichain.
Let be a -frontier antichain not reaching , then is a -frontier antichain.
If , is a frontier antichain, because is a sink of . Otherwise and, by contradiction, take an antichain , such that in . Suppose that , then for all there exists , such that reaches , but since is a sink of , for all there exists , such that reaches , i.e., in , which is a contradiction. If , then every vertex of is reached by a vertex of (it cannot be reached by since it is a sink in ), and therefore take any , which , a contradiction. ∎
We use this lemma to find all type-1 -frontier antichains by testing reachability from -frontier antichains to , with reachability queries in total.
Type-2 -frontier antichains are -frontier antichains that are not dominated by any antichain in containing (this is sufficient since they are frontier in ). Moreover, by Lemma 6, if a -frontier antichain is dominated in , then it is dominated by a -frontier antichain. Therefore, type-2 -frontier antichains are -frontier antichains that are not dominated by any type-1 -frontier antichain. For every -frontier antichain we check if there exists a type-1 -frontier antichain dominating . We can do this in total reachability queries from vertices in -frontier antichains to vertices in -frontier antichains and .
Both type-1 and type-2 -frontier antichains require answering reachability queries efficiently among vertices in -frontier antichains and . In the next section we show how to maintain constant-time reachability queries among these vertices in time per vertex and edge.
3.2 Reachability between frontier antichains
To complete our algorithm, we aim to maintain reachability queries among all vertices in -frontier antichains and .
Definition 4 (Support).
Let be a DAG of width , and a topological order of its vertices. For every , we define the support of , as the set of all vertices belonging to some -frontier antichain, that is,
We say that , thus . Note that follows by Lemma 7. Also, , since is a -frontier antichain. Another interesting fact is that if a vertex exits the support in some step, then it cannot re-enter. This is formalized in the following two lemmas.
Let . If , then for all .
By induction on . The base case is the hypothesis itself. Suppose that it is true for , and suppose by contradiction that . Then , for some -frontier antichain . If , then by Lemma 9, is a -frontier antichain, and , which is a contradiction. But if , then by Lemma 8, is a -frontier antichain, and , a contradiction. ∎
Let . If , then holds for all .
If this is not true, then there exists , which contradicts and Lemma 11. ∎
We now state that it is sufficient to support reachability queries from every to to answer queries among vertices in and . Then, we show how to maintain these reachability relations in time per vertex and edge.
If we know reachability from to for all , then we can answer reachability queries among vertices in .
Let . We can answer whether reaches by doing the following. If we answer true, if we answer false. Suppose , then by Lemma 12 it holds that , and then we can use reachability from to to answer this query. ∎
Algorithm 2 shows a function deciding whether an antichain reaches a vertex, using the technique explained in Theorem 5. This function is used to implement Algorithm 1, and our final solution in Algorithm 4.
We will compute reachability from to for all incrementally when processing the vertices in topological order. That is, we assume that we have computed reachability from to for all and we want to compute reachability from to .
For this we do the following. Initially, we set reachability from to to false for all . Then, for every edge , if we set reachability from to to true, and for each (we can compute this intersection in per edge, by scanning and deciding whether by testing if by Lemma 12) such that reaches (known since ) we set reachability from to to true. After doing this, reachability is correctly computed and stored from to .
The procedure above correctly computes reachability from to .
Clearly, what the algorithm sets to true is correct. Suppose by contradiction that there exists reaching , such that reachability from to was not set to true. Since reaches , the in-neighbourhood of is not empty. Since was not set to true, in particular,