A feedback vertex set (FVS) in a graph $G$ is a vertex subset $S$ such that $G - S$ is acyclic. In the case of directed graphs, it means $G - S$ is a directed acyclic graph (DAG). In the (Directed) Feedback Vertex Set ((D)FVS) problem we are given as input a (directed) graph $G$ and a weight function $w$ on the vertices of $G$. The objective is to find a minimum weight feedback vertex set $S$. Both the directed and undirected versions of the problem are NP-complete and have been extensively studied from the perspective of approximation algorithms [1, 12], parameterized algorithms [6, 8, 19], exact exponential time algorithms [23, 29], as well as graph theory [11, 24].
In this paper we consider a restriction of DFVS, namely the Feedback Vertex Set in Tournaments (TFVS) problem, from the perspective of approximation algorithms. A tournament is a directed graph such that every pair of vertices is connected by an arc, and TFVS is simply DFVS when the input graph is required to be a tournament. We refer to the textbook of Williamson and Shmoys for an introduction to approximation algorithms. Even this restricted variant of DFVS has applications in voting systems and rank aggregation and is quite well-studied [5, 10, 15, 22, 21, 20]. It is formally defined as follows.
Feedback Vertex Set in Tournaments (TFVS)
Input: A tournament $T$ and a weight function $w$ on $V(T)$.
Output: A minimum weight FVS of $T$.
The problem has several simple $3$-approximation algorithms. It is well known that a tournament has a directed cycle if and only if it has a directed triangle. Then a $3$-approximate solution for the unweighted version of TFVS (where all the vertices have the same weight) is easily constructed as follows. If there is a directed triangle in the tournament, put all the vertices of the triangle in the solution and delete them from the tournament. Repeat this process until the tournament becomes triangle-free. (This will not, in general, give a $3$-approximation for a weighted instance.) Another simple $3$-approximation algorithm for TFVS is also known. The first algorithm with a better approximation ratio was given by Cai et al., who gave a $5/2$-approximation algorithm using the local ratio technique of Bar-Yehuda and Even. Recently, Mnich et al. gave a $7/3$-approximation algorithm using the iterative rounding technique. They observe that the approximation-preserving reduction from Vertex Cover to TFVS of Speckenmeyer implies that, assuming the Unique Games Conjecture (UGC), TFVS cannot have an approximation algorithm with factor smaller than $2$. The more general DFVS problem has a factor-$O(\min\{\log n \log\log n, \log \tau^* \log\log \tau^*\})$ approximation [25, 13], where $n$ is the number of vertices in the input digraph and $\tau^*$ is the cost of an optimal solution, and it is known that DFVS cannot have a factor-$\alpha$ approximation for any constant $\alpha > 1$ under the UGC [17, 16, 27]. A related problem is $3$-Hitting Set, or Vertex Cover in $3$-uniform hypergraphs. Here the input is a universe $U$ and a family $\mathcal{F}$ of subsets of $U$, each of size at most $3$. The goal is to find a minimum subset of the universe that intersects every set in $\mathcal{F}$. Observe that TFVS is a special case of this problem, since TFVS reduces to hitting all the directed triangles in the tournament.
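The triangle-removal procedure described above can be sketched as follows. This is only an illustrative sketch for the unweighted case, assuming a tournament is given as a collection of vertices plus a set of ordered pairs `(u, v)` for arcs from `u` to `v`; the function names are hypothetical and the triangle search is a naive cubic scan.

```python
from itertools import permutations


def find_triangle(vertices, arcs):
    """Return some directed triangle (a, b, c) with a->b, b->c, c->a, or None."""
    for a, b, c in permutations(vertices, 3):
        if (a, b) in arcs and (b, c) in arcs and (c, a) in arcs:
            return (a, b, c)
    return None


def triangle_packing_fvs(vertices, arcs):
    """Repeatedly pick a directed triangle and put all three vertices in the solution.

    Any FVS must hit each of the vertex-disjoint triangles found here at least
    once, which is where the factor 3 for the unweighted problem comes from.
    """
    remaining = set(vertices)
    solution = set()
    while True:
        triangle = find_triangle(sorted(remaining), arcs)
        if triangle is None:
            return solution
        solution.update(triangle)
        remaining.difference_update(triangle)


# Example: 0 -> 1 -> 2 -> 0 is a triangle, and every vertex has an arc to 3.
example_arcs = {(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)}
print(triangle_packing_fvs(range(4), example_arcs))
```

On this example the sketch returns all three triangle vertices while a single one of them would suffice, so the factor of three can actually be attained.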
While it is NP-hard to approximate $3$-Hitting Set better than factor $2$, under the UGC there can be no polynomial time approximation better than factor $3$. (These results actually hold for the more general problem of Vertex Cover in $k$-uniform hypergraphs.) Mnich et al. state that their algorithm “gives hope that a $2$-approximation algorithm, that would be optimal under the UGC, might be achievable (for TFVS)”. In this paper we show that this is indeed the case, by giving a (randomized) $2$-approximation algorithm for TFVS. More formally, we prove the following theorem.
Theorem 1.1. There exists a randomized algorithm that, given a tournament $T$ on $n$ vertices and a weight function $w$ on $V(T)$, runs in time polynomial in $n$ and outputs a feedback vertex set $S$ of $T$. With probability bounded below by a constant, $S$ is a $2$-approximate solution of $(T, w)$.
This algorithm can be easily derandomized in quasi-polynomial time.
Our algorithm is inspired by the methods and analysis of Fixed Parameter Tractable (FPT) algorithms. A well known technique in FPT algorithms is branching, where we try to guess whether a vertex is in the optimal solution or not. Similarly, our approximation algorithm tries to randomly sample a vertex $p$ of the tournament which is not contained in some optimal solution, and whose in-degree and out-degree are each at most a constant fraction of $n$. Assuming that the size of the optimal solution is upper bounded by a constant fraction of $n$, the random sampling succeeds with a constant probability. (When the size of the optimal solution is large, the algorithm picks a constant fraction of lowest weight vertices into the approximate solution, to obtain the reduced instance.) With the vertex $p$ in hand, we reduce the input instance into smaller instances, defined by the in-neighborhood and the out-neighborhood of $p$, which are then solved recursively. By the properties of $p$, the cardinality of the vertex set of each of these instances is upper-bounded by a constant fraction of $n$. This step is reminiscent of reduction rules that are frequently applied in FPT algorithms and kernelization. We show that we can recover a $2$-approximation for the input instance from $2$-approximate solutions of the reduced instances, with a constant probability of success. By repeated application, this process gradually decomposes the input instance into a collection of constant size instances, which are then solved by brute force. This leads to a $2$-approximation algorithm for TFVS which runs in randomized polynomial time. We believe that the connection to FPT algorithms and analysis is a key feature of our algorithm, which will be applicable to other problems.
In this paper we work with directed graphs (or digraphs) that do not contain any self loops or parallel arcs. We use $V(D)$ to denote the vertex set of a digraph $D$ and $A(D)$ to denote the set of arcs of $D$. We use the notation $uv$ to denote an arc from vertex $u$ to vertex $v$ in a digraph. Vertices $u, v$ are incident with arc $uv$. A tournament is a digraph in which there is exactly one arc between any two vertices. The set of out-neighbors of a vertex $v$ in a digraph $D$ is defined to be $N^+(v) = \{u \in V(D) : vu \in A(D)\}$, and the set of in-neighbors of $v$ in $D$ is defined to be $N^-(v) = \{u \in V(D) : uv \in A(D)\}$. For an integer $\ell$, a directed cycle of length $\ell$ in a digraph $D$ is an alternating sequence $v_1 a_1 v_2 a_2 \cdots v_\ell a_\ell$ where $\{v_1, \ldots, v_\ell\}$ is a set of distinct vertices of $D$ and $\{a_1, \ldots, a_\ell\}$ is a subset of arcs of $D$, where $a_i = v_i v_{i+1}$ for $1 \le i < \ell$ and $a_\ell = v_\ell v_1$. A digraph is acyclic if it does not contain a directed cycle. A triangle in a digraph is a directed cycle of length three. In this paper we use the term “triangle” exclusively to denote directed triangles. A topological sort of a digraph $D$ with $n$ vertices is a permutation $\pi : V(D) \to \{1, \ldots, n\}$ of the vertices of the digraph such that for all arcs $uv \in A(D)$, it is the case that $\pi(u) < \pi(v)$. Such a permutation exists for a digraph $D$ if and only if $D$ is acyclic. For an acyclic tournament, the topological sort is unique. Deleting a vertex $v$ from digraph $D$ involves removing, from $D$, the vertex $v$ and all those arcs in $A(D)$ with which $v$ is incident in $D$. We use $D - v$ to denote the digraph obtained by deleting a vertex $v$ from digraph $D$. For a vertex set $S \subseteq V(D)$ we use $D - S$ to denote the digraph obtained from digraph $D$ by deleting all the vertices of $S$.
A feedback vertex set (FVS) of a digraph $D$ is a vertex set $S \subseteq V(D)$ such that $D - S$ is acyclic. A vertex set is a feasible solution if and only if it is an FVS. Given a weight function $w$, the weight of a vertex set $S$ is $w(S) = \sum_{v \in S} w(v)$. An FVS $S$ of $T$ is an optimal solution of the instance $(T, w)$ if every other FVS $S'$ of $T$ satisfies $w(S') \ge w(S)$. An FVS $S$ of $T$ is called a $2$-approximate solution of the instance $(T, w)$ if $w(S) \le 2 \cdot w(S^*)$ for an optimal solution $S^*$ of $(T, w)$. An FVS $S$ is called $p$-disjoint for a vertex $p$ if $p \notin S$, and further, $S$ is said to be an optimal $p$-disjoint FVS of $(T, w)$ if, for every $p$-disjoint solution $S'$, we have $w(S) \le w(S')$. Note that an optimal $p$-disjoint solution of $(T, w)$ is not necessarily an optimal solution of $(T, w)$. On the other hand, if an optimal solution $S$ of $(T, w)$ happens to be $p$-disjoint then $S$ is also an optimal $p$-disjoint solution of $(T, w)$. A $p$-disjoint FVS $S$ of $T$ is called a $2$-approximate $p$-disjoint solution of the instance $(T, w)$ if $w(S) \le 2 \cdot w(S^*_p)$ for an optimal $p$-disjoint solution $S^*_p$ of $(T, w)$.
In the following we will assume that $T$ is a tournament on $n$ vertices, and $w$ is a weight function on $V(T)$. Furthermore, for any induced subgraph $H$ of $T$, we assume that $w$ defines a weight function on $H$ when restricted to $V(H)$. We will frequently make use of the following lemma, which directly follows from the fact that acyclic digraphs are closed under vertex deletions.
Lemma 1. Let $S$ be an FVS of a digraph $D$ and let $X$ be a subset of the vertex set of $D$. Then $S \setminus X$ is an FVS of the digraph $D - X$. If $S$ is an optimal solution of an instance $(T, w)$ of TFVS and $X$ is a subset of $S$, then $S \setminus X$ is an optimal solution of the instance $(T - X, w)$, of weight $w(S) - w(X)$.
We use the following lemma to prove the correctness of our algorithm in the next section.
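Lemma 2. Let $(T, w)$ be an instance of TFVS and let $p$ be a vertex of $T$.
(1) The vertex $p$ is not part of any triangle in $T$ if and only if every arc between a vertex in $N^-(p)$ and a vertex in $N^+(p)$ is of the form $uv$ with $u \in N^-(p)$ and $v \in N^+(p)$.
(2) Let $p$ be a vertex which is not part of any triangle in $T$. Let $T^-$ and $T^+$ be the subgraphs induced in $T$ by the in- and out-neighborhoods of vertex $p$, respectively. A set $S \subseteq V(T) \setminus \{p\}$ is an FVS of digraph $T$ if and only if $S \cap V(T^-)$ is an FVS of the subgraph $T^-$ and $S \cap V(T^+)$ is an FVS of the subgraph $T^+$.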
Proof. Suppose vertex $p$ is not part of any triangle in $T$. If there is an arc $uv$ in $T$ where vertex $u$ is in the out-neighborhood of vertex $p$ and vertex $v$ is in its in-neighborhood, then the vertices $p, u, v$ form a triangle containing vertex $p$, a contradiction. So every arc between $N^-(p)$ and $N^+(p)$ is directed from $N^-(p)$ to $N^+(p)$. Conversely, if vertices $p, u, v$ form a triangle and, without loss of generality, $pu$ is an arc in $T$, then we have that both $uv$ and $vp$ are arcs in $T$. Thus $u \in N^+(p)$, $v \in N^-(p)$, and arc $uv$ is not of the form required by statement (1).
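Now we prove statement (2) of the lemma. Let $S$ be an FVS of $T$ with $p \notin S$. As $T^- - S$ and $T^+ - S$ are subgraphs of $T - S$ (which is a DAG), we have that $S \cap V(T^-)$ is an FVS of $T^-$ and $S \cap V(T^+)$ is an FVS of $T^+$. Now we prove the other direction. Let $S \subseteq V(T) \setminus \{p\}$ be such that $S \cap V(T^-)$ is an FVS of $T^-$ and $S \cap V(T^+)$ is an FVS of $T^+$. Since $T^- - S$ is an acyclic tournament, there is a unique topological sort $\pi^-$ of $T^- - S$. Also, since $T^+ - S$ is an acyclic tournament, there is a unique topological sort $\pi^+$ of $T^+ - S$. Since $p$ is not part of a triangle in $T$, by statement (1) of the lemma, there is no arc from a vertex in $N^+(p)$ to a vertex in $N^-(p)$. This implies that the ordering which lists the vertices in the order $\pi^-$, then $p$, then the vertices in the order $\pi^+$, is a topological sort of $T - S$. Therefore $S$ is an FVS of $T$. ∎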
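The decomposition in the lemma can be checked by brute force on small tournaments. The sketch below is illustrative only: it assumes a tournament is given as a vertex list plus a set of ordered arc pairs, relies on the fact that a tournament is acyclic exactly when it has no directed triangle, and uses hypothetical helper names.

```python
from itertools import combinations, permutations


def has_triangle(vertices, arcs):
    """True if the sub-tournament induced on `vertices` has a directed triangle."""
    return any((a, b) in arcs and (b, c) in arcs and (c, a) in arcs
               for a, b, c in permutations(vertices, 3))


def in_triangle(p, vertices, arcs):
    """True if vertex p lies on some directed triangle p -> a -> b -> p."""
    others = [v for v in vertices if v != p]
    return any((p, a) in arcs and (a, b) in arcs and (b, p) in arcs
               for a, b in permutations(others, 2))


def decomposition_holds(p, vertices, arcs):
    """Check, for every S avoiding p, that S is an FVS of the whole tournament
    iff it is an FVS of both the in- and out-neighborhood of p."""
    ins = [v for v in vertices if (v, p) in arcs]
    outs = [v for v in vertices if (p, v) in arcs]
    for k in range(len(ins) + len(outs) + 1):
        for chosen in combinations(ins + outs, k):
            s = set(chosen)
            whole = not has_triangle([v for v in vertices if v not in s], arcs)
            parts = (not has_triangle([v for v in ins if v not in s], arcs)
                     and not has_triangle([v for v in outs if v not in s], arcs))
            if whole != parts:
                return False
    return True


# Pivot 0: in-neighbors {1, 2, 3} and out-neighbors {4, 5, 6} each form a
# triangle, and every arc between the two sides goes from in- to out-neighbor.
V = list(range(7))
A = {(1, 0), (2, 0), (3, 0), (0, 4), (0, 5), (0, 6),
     (1, 2), (2, 3), (3, 1), (4, 5), (5, 6), (6, 4)}
A |= {(i, j) for i in (1, 2, 3) for j in (4, 5, 6)}
print(in_triangle(0, V, A), decomposition_holds(0, V, A))
```

Reversing a single arc between the two sides, say from `(1, 4)` to `(4, 1)`, puts the pivot on a triangle and the equivalence then fails.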
3 The Algorithm
We begin with an informal overview. Let $D$ be a digraph and $w$ be a weight function on the vertices of $D$. If $S$ is an optimal FVS for the instance $(D, w)$ and $v$ is a vertex in $S$, then (Lemma 1) $S \setminus \{v\}$ is an optimal FVS of the instance $(D - v, w)$, and its weight is exactly $w(S) - w(v)$. Note that this need not be the case for vertices outside of $S$; deleting a vertex $v \notin S$ may not bring down the weight of an optimal FVS. As a simple example, consider the tournament $T$ on four vertices $\{a, b, c, d\}$ where (i) $\{a, b, c\}$ form a triangle, (ii) vertex $d$ has in-degree three, and (iii) all vertices have weight one. An optimum FVS of this instance consists of any one of the three vertices $a, b, c$ and has weight one. An optimum FVS of the digraph $T - d$ is also of this same form, and has weight one as well.
Thus if we are given the promise that a vertex $v$ is in some optimal FVS of $(D, w)$ then we can safely delete $v$ from $D$ and recursively find an optimal FVS of the smaller instance $(D - v, w)$, to get an optimal FVS of the original instance $(D, w)$. If we don’t know that vertex $v$ is in some optimal FVS of $(D, w)$ then we cannot safely make such a reduction.
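The four-vertex example can be checked with a small brute-force search. This is a sketch under the assumption that vertices 0, 1, 2 play the role of the triangle and vertex 3 is the vertex of in-degree three; the helper names are hypothetical, and the exhaustive search is only meant for tiny instances.

```python
from itertools import combinations, permutations


def is_acyclic(vertices, arcs):
    """A (sub)tournament is acyclic exactly when it has no directed triangle."""
    return not any((a, b) in arcs and (b, c) in arcs and (c, a) in arcs
                   for a, b, c in permutations(vertices, 3))


def min_weight_fvs(vertices, arcs, w):
    """Exhaustively find a minimum weight FVS; returns (weight, vertex set)."""
    vertices = list(vertices)
    best = None
    for k in range(len(vertices) + 1):
        for cand in combinations(vertices, k):
            rest = [v for v in vertices if v not in cand]
            if is_acyclic(rest, arcs):
                cost = sum(w[v] for v in cand)
                if best is None or cost < best[0]:
                    best = (cost, set(cand))
    return best


arcs = {(0, 1), (1, 2), (2, 0), (0, 3), (1, 3), (2, 3)}
w = {v: 1 for v in range(4)}

print(min_weight_fvs([0, 1, 2, 3], arcs, w)[0])  # optimum of the full instance
print(min_weight_fvs([0, 1, 2], arcs, w)[0])     # after deleting vertex 3
print(min_weight_fvs([1, 2, 3], arcs, w)[0])     # after deleting vertex 0
```

Deleting vertex 3, which lies in no optimal solution, leaves the optimum unchanged, whereas deleting vertex 0, which lies in one, lowers it by exactly the weight of that vertex.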
It turns out that if we are willing to accept the lesser promise of “half a vertex” being in an optimal solution then we can safely make an analogous reduction which preserves a 2-approximate solution for the TFVS instance. More precisely, suppose we are given a pair of vertices $\{q, r\}$ and the promise that some optimal solution contains at least one of $q, r$. Then (see Lemma 4, with an assumption that there is an optimal solution not containing $p$) the lighter of the two vertices, say $q$, must belong to some 2-approximate solution for the instance $(T, w)$. Indeed, if we delete $q$ from $T$ and reduce the weight of vertex $r$ by $w(q)$ to get a smaller instance, then for any 2-approximate solution $S'$ of this smaller instance, the set $S' \cup \{q\}$ is a 2-approximate solution of the original instance $(T, w)$.
So to find a 2-approximate solution for TFVS it is enough to repeatedly find pairs of vertices with the guarantee that there is an optimal solution which contains at least one of these two vertices. For this we use the observation that a tournament contains a directed cycle if and only if it contains a directed triangle. Let $T$ be a tournament and $\{p, q, r\}$ the vertex set of a directed triangle in $T$. If there is an optimal solution which does not contain vertex $p$ then $\{q, r\}$ is a pair of vertices with the required property. So it is enough to be able to repeatedly find a vertex $p$ which (i) belongs to a directed triangle, and (ii) is not part of some optimal solution. Call a vertex which has these two properties an “unimportant” vertex.
If we could consistently find an unimportant vertex with some good probability then we could solve the problem with a good probability of success. One way to do this would be to—somehow—ensure that a constant fraction—say, —of the entire vertex set is unimportant; a vertex picked uniformly at random would then be unimportant with probability . So the “bad case” is when only a very small part of the vertex set is unimportant; equivalently, when a large fraction of the vertex set—here, —is part of every optimal solution. This in turn implies that there is an optimal solution which contains a large fraction——of the vertex set. If we can—somehow—process those cases where there is an optimal solution which contains a very large fraction of the vertex set then we will be able to consistently find unimportant vertices with good probability.
Let be an optimal solution which contains more than of the vertex set of . Consider the set of the vertices of the smallest weight in . Then the weight of the vertex set is at most a quarter () of the weight of the optimum . This suggests that picking all of into a solution should not result in a solution which is heavier than the optimum by a factor of . Indeed, something stronger holds for 2-approximate solutions. We show—see Lemma 3—that there is a 2-approximate solution which contains all of . Indeed, we can delete from and modify the weights of the remaining vertices in a certain way to get an instance such that for any 2-approximate solution of , the set is a 2-approximate solution for the original instance .
We now give a high level conceptual sketch of the algorithm, hiding some details required for getting good bounds on the running time and success probability. Our algorithm has two phases. In each phase it computes a feasible solution, and at the end it returns the solution of smaller weight among these two. We prove—along the lines suggested by the above discussion—that at least one of these solutions must be a 2-approximate solution. Recall that denotes the input instance where has vertices.
Phase 1 of the algorithm computes a candidate 2-approximate solution for assuming that there is an optimum solution with . To do this the algorithm deletes the set of the vertices of the smallest weight in , modifies the weights of the remaining vertices in as specified in Lemma 3, and recursively finds a 2-approximate solution of the resulting instance . The candidate 2-approximate solution from this step is .
Phase 2 of the algorithm computes another candidate 2-approximate solution for assuming that no optimum solution has or more vertices. To do this the algorithm picks a “pivot” vertex at random. If is not part of any triangle in then the algorithm recursively finds 2-approximate solutions of the subgraphs and induced by the in- and out-neighborhoods of vertex , respectively, and sets the candidate 2-approximate solution from this phase to be . This is safe by Lemma 2.
If the pivot vertex $p$ is part of some triangle in $T$ then the algorithm assumes that $p$ is unimportant, and applies a reduction procedure to obtain an instance where vertex $p$ is not in any triangle. This procedure chooses two vertices $q, r$ which form a triangle together with $p$. It then deletes $q$ from $T$ and modifies the weight of $r$ (see Lemma 4 for the specifics) to get a new instance. The reduction procedure consists of the repeated application of this step as long as the pivot vertex is part of some triangle, and stops when it obtains a subgraph $T'$ in which vertex $p$ is not part of any triangle. Now the algorithm recurses on the in- and out-neighborhoods of $p$ in digraph $T'$ as described in the previous paragraph, to get a 2-approximate solution $S'$ of $T'$. The candidate 2-approximate solution from this phase is $S' \cup M$, where $M$ is the set of all vertices deleted from $T$ by the reduction step to get to the digraph $T'$. The algorithm outputs whichever of the two candidate solutions, from Phase 1 and Phase 2, has the smaller weight.
To prove that this recursive procedure runs in polynomial time we need to ensure that neither of the two digraphs in the recursive step is “too large”; more specifically, that the number of vertices in each of them is upper-bounded by a constant fraction of the number of vertices in the digraph given as input to Phase 2. We enforce this by picking the pivot vertex from among those vertices of $T$ whose in- and out-degrees are upper-bounded by a certain fraction of $n$.
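The pivot selection can be sketched as below. The bound `fraction` is a placeholder parameter and not the constant used in the analysis; the function name and the arc-set representation are likewise assumptions made for illustration.

```python
import random


def balanced_pivots(vertices, arcs, fraction):
    """Vertices whose in-degree and out-degree are both at most fraction * n.

    `fraction` is a hypothetical parameter standing in for the constant
    fraction mentioned in the text.
    """
    vertices = list(vertices)
    bound = fraction * len(vertices)
    candidates = []
    for v in vertices:
        out_degree = sum((v, u) in arcs for u in vertices)
        in_degree = sum((u, v) in arcs for u in vertices)
        if in_degree <= bound and out_degree <= bound:
            candidates.append(v)
    return candidates


# Transitive tournament on 5 vertices: vertex i beats every j > i.
transitive = {(i, j) for i in range(5) for j in range(i + 1, 5)}
candidates = balanced_pivots(range(5), transitive, 0.5)
pivot = random.choice(candidates)  # uniform choice among balanced vertices
print(candidates, pivot)
```

Recursing on the in- and out-neighborhoods of such a pivot gives two instances, each with at most `fraction * n` vertices.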
In the rest of this section we give a more formal description of the algorithm, prove its correctness, and show that it runs in polynomial time. We begin by proving a couple of lemmas which formalize some ideas from the above discussion. Our first lemma pertains to the case when there is an optimal solution which contains a large fraction of the vertex set.
Let be an instance of TFVS where has vertices, and which has an optimal solution that contains at least vertices of . Let be a set of vertices of the smallest weight in , ties broken arbitrarily, and let be the weight of the heaviest vertex in . Let be the weight function which assigns the weight to each vertex of . If is a -approximate solution of the reduced instance then is a -approximate solution of the instance .
Let be an optimum solution of the reduced instance . Then . From Lemma 1 we get that is a—not necessarily optimal—solution of the reduced instance . Since is an optimum solution of this instance we have that . Since holds for each vertex we get that . Since we get that . Hence .
Thus . Since the set is disjoint from the deleted set we have that holds for each vertex . Hence . Since holds for each vertex we have that . Hence
Here the last inequality follows from the fact that . ∎
The next lemma shows that given , we can safely pick a lighter weight vertex of the two vertices and into a 2-approximate -disjoint solution.
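Lemma 4. Let $(T, w)$ be an instance of TFVS and $p \in V(T)$. Let $q, r$ be two vertices such that (i) $p, q, r$ form a triangle in $T$, and (ii) $w(q) \le w(r)$. Let $w'$ be the weight function on $T - q$ defined by $w'(r) = w(r) - w(q)$ and $w'(v) = w(v)$ for all other vertices $v$. Then for every $2$-approximate $p$-disjoint solution $S'$ of the reduced instance $(T - q, w')$, we have that $S' \cup \{q\}$ is a $2$-approximate $p$-disjoint solution of the original instance $(T, w)$.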
Since and the former digraph is acyclic by assumption, we get that is a FVS in the digraph . We will show that is a -approximate -disjoint solution of . Since , is a -disjoint FVS of . Let be an optimal -disjoint solution of . Notice that . Now to complete the proof, it remains to show that . Let , that is . Now we have the following.
|since is an FVS of|
This completes the proof. ∎
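The inequality at the heart of this proof can be sketched as follows. The symbols are assumptions made here for illustration: $S'$ is a $2$-approximate $p$-disjoint solution of the reduced instance $(T - q, w')$, $O'$ is an optimal $p$-disjoint solution of the reduced instance, $O$ is an optimal $p$-disjoint solution of $(T, w)$, and $w'$ differs from $w$ only in that $w'(r) = w(r) - w(q)$.

```latex
\begin{align*}
w(S' \cup \{q\}) &= w(S') + w(q) \\
  &\le w'(S') + 2\,w(q)
     && \text{since $w$ and $w'$ differ only on $r$, by $w(q)$} \\
  &\le 2\,w'(O') + 2\,w(q)
     && \text{since $S'$ is $2$-approximate for $(T - q, w')$} \\
  &\le 2\,w'(O \setminus \{q\}) + 2\,w(q)
     && \text{since $O \setminus \{q\}$ is a $p$-disjoint FVS of $T - q$} \\
  &\le 2\,\bigl(w(O) - w(q)\bigr) + 2\,w(q) = 2\,w(O).
\end{align*}
```

The last step uses that $O$ avoids $p$ and must therefore contain $q$ or $r$: if $q \in O$ then $w'(O \setminus \{q\}) \le w(O) - w(q)$, and otherwise $r \in O$, so $w'(O) = w(O) - w(q)$.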
Recall that in Phase 2 we work under the assumption that there is an optimal solution $S^*$ of $(T, w)$ which does not contain the pivot vertex $p$. If there is an arc $uv$ such that $u \in N^+(p)$ and $v \in N^-(p)$ then the vertices $p, u, v$ form a triangle in $T$, and so at least one of the two vertices $u, v$ must be present in the solution $S^*$. Let $q$ be a vertex of the least weight among $u, v$, ties broken arbitrarily, and let $r$ be the other vertex. Then Lemma 4 applies to the tuple $(T, w, p, q, r)$.
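Procedure Reduce of Algorithm 1 implements the reduction procedure of Phase 2. It starts from the input tournament and weight function, with an empty set of deleted vertices. As long as there is an arc $uv$ among the remaining vertices such that $u \in N^+(p)$ and $v \in N^-(p)$, it finds vertices $q, r$ as described in the previous paragraph, computes a new weight function as specified in Lemma 4 as applied to the current instance together with $p, q, r$, deletes $q$ and adds it to the set of deleted vertices, and repeats. When no such arc exists the procedure outputs the set of deleted vertices and the final weight function.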
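A direct, unoptimized rendering of this loop might look as follows. It is a sketch only: the function name, the arc-set representation, and the weight dictionary are assumptions, and it rescans the neighborhoods in every round rather than using the array bookkeeping of the running-time analysis.

```python
def reduce_instance(vertices, arcs, w, p):
    """Sketch of the reduction around pivot p.

    Returns (deleted, new_w): the set of deleted vertices and the modified
    weights. On return, no arc goes from a remaining out-neighbor of p to a
    remaining in-neighbor of p.
    """
    new_w = dict(w)
    alive = set(vertices)
    deleted = set()
    while True:
        ins = [v for v in sorted(alive) if (v, p) in arcs]
        outs = [v for v in sorted(alive) if (p, v) in arcs]
        # An arc u -> v with u an out-neighbor and v an in-neighbor of p
        # closes the triangle p -> u -> v -> p.
        bad = next(((u, v) for u in outs for v in ins if (u, v) in arcs), None)
        if bad is None:
            return deleted, new_w
        q, r = sorted(bad, key=lambda x: new_w[x])  # q is the lighter endpoint
        new_w[r] -= new_w[q]                         # charge q's weight to r
        deleted.add(q)
        alive.discard(q)


# Triangle 0 -> 1 -> 2 -> 0 with pivot 0: vertex 1 is lighter than vertex 2.
arcs = {(0, 1), (1, 2), (2, 0)}
deleted, new_w = reduce_instance([0, 1, 2], arcs, {0: 5, 1: 3, 2: 4}, 0)
print(deleted, new_w[2])
```

In this example the lighter vertex 1 is deleted and the weight of vertex 2 drops from 4 to 1, after which the pivot lies on no triangle.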
Our next lemma states that procedure Reduce runs in polynomial time and correctly outputs a reduced instance. Recall that for an instance of TFVS and a vertex , a -disjoint solution of is an FVS of which does not contain vertex .
Lemma 5. Let $(T, w)$ be an instance of TFVS and $p \in V(T)$. When given $(T, w, p)$ as input, the procedure Reduce runs in time $O(n^2)$ and outputs a vertex set $M$ and a weight function $w'$ with the following properties:
(1) there are no arcs from $N^+(p)$ to $N^-(p)$ in digraph $T - M$, and
(2) for every $2$-approximate $p$-disjoint solution $S$ of $(T - M, w')$, the set $S \cup M$ is a $2$-approximate $p$-disjoint solution of $(T, w)$.
The check on line 3 of Algorithm 1 fails if and only if there are no arcs from to in the digraph for the value of at that point. Since the assignment of to on line 13 happens only if this check fails, we get that there are no arcs from to in the digraph . Let be a -approximate -disjoint solution of . Then by a simple induction on the number of iterations and Lemma 4, we obtain that is a -approximate -disjoint solution of .
To complete the proof we show that procedure Reduce runs in time where . Let . We assume that graph is given as its adjacency matrix where if is an arc in and otherwise. We assume also that the weight function is given as a array where stores the weight of vertex .
We compute the two neighborhoods and of the pivot vertex by scanning the entries of the row ; vertex if , and if and . This takes time. Let be the in- and out-degrees of vertex . We construct a array to store the neighborhood relation between the sets and , and a array to store the out-degrees of vertices in into the set . We initialize all entries of and to zeroes. Now for each pair of vertices we increment the entries and by each if and only if . Once this is done the cell holds the number of out-neighbors of vertex in the set , and if and only if is an arc in for vertices . Since all this can be done in time.
To execute the test on line 3 of Algorithm 1 we scan the list for a non-zero entry. If all entries of are zeros then there is no arc of the specified form and the test returns False. If for some then we scan the row to find an index such that . Then is a pair of vertices which satisfy the test. We use these vertices to execute lines 4 to 10 of the procedure. We effect the addition of vertex to the set on line 11 as follows: If then we set and . If then for each such that , we decrement the cells and by .
Each line of Algorithm 1, except for line 11, takes constant time. Line 11—as described above—takes time. Each execution of line 11 takes either a row or a column of which has non-zero entries and sets all these entries to zero. Since the algorithm does not increment these entries in the loop, we get that the while loop of lines 3 to 12 is executed at most times. Thus the entire procedure runs in time. ∎
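Corollary 1. On input $(T, w, p)$ the procedure Reduce runs in time $O(n^2)$ and outputs a vertex set $M$ and a weight function $w'$ such that for every FVS $S^-$ of $T[N^-(p) \setminus M]$ and every FVS $S^+$ of $T[N^+(p) \setminus M]$, we have that $S^- \cup S^+ \cup M$ is a $p$-disjoint FVS of $T$.
Further, if $S^-$ is a $2$-approximate solution of $(T[N^-(p) \setminus M], w')$ and $S^+$ is a $2$-approximate solution of $(T[N^+(p) \setminus M], w')$, then $S^- \cup S^+ \cup M$ is a $2$-approximate $p$-disjoint solution of $(T, w)$.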
The running time of procedure Reduce follows from Lemma 5. Let be an FVS of and be an FVS of . By Lemma 5, there are no arcs from to in digraph . Then by statement of Lemma 2, is not part of any triangle in . Thus, by statement of Lemma 2, is an FVS of . Therefore, by Lemma 1, is an FVS of . Moreover, since , it is a -disjoint FVS of .
Suppose is a -approximate solution of and is a -approximate solution of . Now we claim that is a -approximate -disjoint solution of . Let and be optimal solutions of and , respectively. Then we claim that is an optimal -disjoint solution of . By statement of Lemma 2, is an FVS of and clearly it does not contain . Suppose is not an optimal -disjoint solution of . Let be an optimal -disjoint solution of and . Then, either or . Consider the case when . By Lemma 2, is an FVS of . But this contradicts the assumption that is an optimal solution of . The same arguments apply to the case when . Therefore is an optimal -disjoint solution of . Since is a -approximate solution of and is a -approximate solution of , we have that . Hence, is a -approximate -disjoint solution of . Then by Lemma 5, is a -approximate -disjoint solution of . This completes the proof of the corollary. ∎
We are now ready to prove our main theorem, Theorem 1.1.
We first describe the algorithm. On input , if has at most vertices the algorithm finds an optimal solution by exhaustively enumerating and comparing all potential solutions. Otherwise the algorithm iteratively computes at most solutions of by making recursive calls. It then outputs the least weight FVS among them. We now describe the iterations and the recursive calls. Let us index the iteration by .
The first iteration is different from the other iterations. In this iteration, the algorithm sets to be the set of the vertices of smallest weight in and . Let be the weight function which assigns the weight to each vertex of . The algorithm calls itself recursively on . The recursive call returns an FVS of , the algorithm constructs the FVS of .
We do the remaining 25 iterations only when the set is non-empty. For each of these 25 iterations (which we index by ), the algorithm picks a vertex uniformly at random from the set of vertices . For each the algorithm runs the procedure Reduce on , , and and obtains a set and a weight function . It then makes two recursive calls, one on , and the other on . Let the sets returned by the two recursive calls be and respectively. The algorithm constructs the set as the FVS of G corresponding to .
Finally, the algorithm outputs the minimum weight , where the minimum is taken over as the solution. The algorithm terminates within the claimed running time, since the running time is governed by the recurrence which solves to by the Master theorem . We now prove that in each iteration, the constructed solution is indeed an FVS of , and that the same holds for the solution returned by the algorithm. We apply an induction on the number of vertices in . For there are no recursive calls made, and the returned solution is an optimal solution, since it is computed by brute force. For the returned solution is one of the ’s and so it is sufficient to prove that all ’s are in fact feedback vertex sets of . For , this follows from Corollary 1 and the induction hypothesis. And for , we know that and is a vertex subset returned by the recursive call for the instance , which is also an FVS of , by the induction hypothesis. Since and is an FVS of , clearly is an FVS of .
Finally, we will show that with probability at least , the algorithm outputs a $2$-approximate solution of $(T, w)$. We prove this by induction on $n$, the number of vertices in $T$. Suppose that is of the least weight among , for some , which is output by the algorithm. For the returned solution is optimal, so assume . Let be an optimal solution for $(T, w)$. We distinguish between two cases, either or . If then, by the induction hypothesis applied to the first iteration, the recursive call on returns a