An Experimental Study of ILP Formulations for the Longest Induced Path Problem

02/17/2020 ∙ by Markus Chimani, et al. ∙ Universität Osnabrück

Given a graph G=(V,E), the longest induced path problem asks for a maximum cardinality node subset W⊆ V such that the graph induced by W is a path. It is a long established problem with applications, e.g., in network analysis. We propose novel integer linear programming (ILP) formulations for the problem and discuss efficient implementations thereof. Comparing them with known formulations from literature, we prove that they are beneficial in theory, yielding stronger relaxations. Moreover, our experiments show their practical superiority.


1 Introduction

Let G=(V,E) be an undirected graph and W⊆V. The W-induced graph G[W] contains exactly the nodes W and those edges of G whose incident nodes are both in W. If G[W] is a path, it is called an induced path. The length of a longest induced path is also referred to as the induced detour number, which was introduced more than 30 years ago [8]. We denote the problem of finding such a path by LongestInducedPath. It is known to be NP-complete, even on bipartite graphs [17].
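The definition of an induced path can be made concrete with a minimal Python sketch. The function name `induces_path` and the edge-list representation are assumptions made for illustration only; the check uses the fact that a path on |W| nodes is a connected graph with |W|-1 edges and maximum degree two.

```python
from collections import deque


def induces_path(nodes, edges, W):
    """Return True iff the subgraph induced by W is a (simple) path."""
    W = set(W)
    if not W or not W <= set(nodes):
        return False
    # Keep exactly those edges whose endpoints both lie in W.
    adj = {v: set() for v in W}
    for u, v in edges:
        if u in W and v in W:
            adj[u].add(v)
            adj[v].add(u)
    num_edges = sum(len(a) for a in adj.values()) // 2
    # A path is a connected graph with |W|-1 edges and maximum degree 2.
    if num_edges != len(W) - 1 or any(len(a) > 2 for a in adj.values()):
        return False
    start = next(iter(W))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return seen == W
```

On a cycle with four nodes, any three consecutive nodes induce a path, whereas all four nodes induce the cycle itself and two opposite nodes induce a disconnected graph.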

The LongestInducedPath problem has applications in molecular physics, the analysis of social, telecommunication, and more general transportation networks [7, 25, 3, 32] as well as pure graph and complexity theory: It is closely related to the graph diameter—the longest among all shortest paths between any two nodes, which is a commonly analyzed communication property of social networks [29]. A longest induced path witnesses the largest diameter that may occur by the deletion of any node subset in a node failure scenario [29]. The tree-depth of a graph is the minimum depth over all of its depth-first-search trees, and constitutes an upper bound on its treewidth [6], which is a well-established measure in parameterized complexity and graph theory. Recently, it was shown that any graph class with bounded degree has bounded induced detour number iff it has bounded tree-depth [31]. Further, the enumeration of induced paths can be used to predict nuclear magnetic resonance [35].

LongestInducedPath is not only NP-complete, but also W[2]-complete [9] and does not allow a polynomial O(|V|^{1/2−ε})-approximation, unless NP = ZPP [24, 5]. On the positive side, it can be solved in polynomial time for several graph classes, e.g., those of bounded mim-width (which includes interval, bi-interval, circular arc, and permutation graphs) [26] as well as k-bounded-hole, interval-filament, and other decomposable graphs [18]. Furthermore, there are NP-complete problems, such as k-Coloring for k ≥ 5 [22] and Independent Set [28], that are polynomial-time solvable on graphs with bounded induced detour number.

Recently the first non-trivial, general algorithms to solve the LongestInducedPath problem exactly were devised by Matsypura et al. [29]. There, three different integer linear programming (ILP) formulations were proposed: the first searches for a subgraph with largest diameter; the second utilizes properties derived from the average distance between two nodes of a subgraph; the third models the path as a walk in which no shortcuts can be taken. Matsypura et al. show that the latter (see below for details) is the most effective in practice.

1.0.1 Contribution.

In Section 3, we propose novel ILP formulations based on cut and subtour elimination constraints. We obtain strictly stronger relaxations than those proposed in [29] and describe a way to strengthen them even further in Section 4. After discussing some algorithmic considerations in Section 5, we show in Section 6 that our most effective models are also superior in practice.

2 Preliminaries

2.0.1 Notation.

For , let . Throughout this paper, we consider a connected, undirected, simple graph G=(V,E) as our input. Edges are cardinality-two subsets of V. If there is no ambiguity, we may write uv for an edge {u,v}. Given a graph H, we refer to its nodes (edges) by V(H) (E(H), respectively). Given a cycle C in G, a chord is an edge connecting two nodes of V(C) that are not neighbors along C.

2.0.2 Linear programming (cf., e.g., [34]).

A linear program (LP) consists of a cost vector c ∈ ℝ^d together with a set of linear inequalities, called constraints, that define a polyhedron 𝒫 in ℝ^d. We want to find a point x ∈ 𝒫 that maximizes the objective function c^T x. This can be done in polynomial time. Unless P = NP, this is no longer true when restricting x to have integral components; the so-modified problem is an integer linear program (ILP). Conversely, the LP relaxation of an ILP is obtained by dropping the integrality constraints on the components of x. The optimal value of an LP relaxation is a dual bound on the ILP’s objective; e.g., an upper bound for maximization problems. As there are several ways to model a given problem as an ILP, one aims for models that yield small dimensions and strong dual bounds, to achieve good practical performance. This is crucial, as ILP solvers are based on a branch-and-bound scheme that relies on iteratively solving LP relaxations to obtain dual bounds on the ILP’s objective. When a model contains too many constraints, it is often sufficient to use only a reasonably sized constraint subset to achieve provably optimal solutions. This allows us to add constraints during the solving process, which is called separation. We say that model A is at least as strong as model B if, for all instances, the LP relaxation’s value of model A is no further from the ILP optimum than that of B. If there also exists an instance for which A’s LP relaxation yields a tighter bound than that of B, then A is stronger than B.

When referring to models, we use the prefix “ILP” with an appropriate subscript. When referring to their respective LP relaxations we write “LP” instead.

2.0.3 Walk-based model (state-of-the-art).

Recently, Matsypura et al. [29] proposed an ILP model, ILP_Walk, that is the foundation of the fastest known exact algorithm (called A3c therein) for LongestInducedPath. They introduce timesteps, and for every node v and timestep t they introduce a variable that is 1 iff v is visited at time t. Constraints guarantee that nodes at non-consecutive time points cannot be adjacent. We recapitulate details in Appendix 0.A. Unfortunately, ILP_Walk yields only weak LP relaxations (cf. [29] and Section 4). To achieve a practical algorithm, Matsypura et al. iteratively solve ILP_Walk for an increasing number of timesteps until the path found does not use all timesteps, i.e., a non-trivial dual bound is encountered. In contrast to [29], we consider the number of edges in the path (instead of nodes) as the objective value.

3 New Models

We aim for models that exhibit stronger LP relaxations and are practically solvable via single ILP computations. To this end, we consider what we deem a more natural variable space. We start by describing a partial model ILP_Base, which by itself is not sufficient but constitutes the core of our new models. To obtain a full model, ILP_Cut, we add constraints that prevent subtours.

For notational simplicity, we augment G to G* = (V*, E*) by adding a new node s that is adjacent to all nodes of V. Within G*, we look for a longest induced cycle through s, where we ignore induced chords incident to s. Searching for a cycle instead of a path allows us to homogeneously require that each selected edge, i.e., edge in the solution, has exactly two adjacent edges that are also selected. Let A(e) ⊆ E* denote the edges adjacent to edge e in G*. Each binary x-variable x_e is 1 iff edge e is selected. We denote the partial model below by ILP_Base:

\begin{align}
\max\ & \sum_{e \in E} x_e \tag{1a}\\
\text{s.t.}\ & \sum_{v \in V} x_{\{s,v\}} = 2 \tag{1b}\\
& 2 x_e \le \sum_{f \in A(e)} x_f \le 2 && \forall e \in E \tag{1c}\\
& x_e \in \{0,1\} && \forall e \in E^* \tag{1d}
\end{align}

Constraint (1b) requires that exactly two edges incident with the new node s are selected. To prevent chords, constraints (1c) enforce that any (original) edge e ∈ E (even if not selected itself!) is adjacent to at most two selected edges; if e is selected, precisely two of its adjacent edges need to be selected as well.
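To illustrate how the degree condition at the added node and the adjacency condition on original edges act together, the following Python sketch tests a given 0/1 edge selection against both. The function name, the edge-set representation, and the label "s" for the added node are assumptions made for illustration; this is a checker for hand-made examples, not part of any solver.

```python
def satisfies_base_model(nodes, edges, selected, s="s"):
    """Check constraints (1b) and (1c) for a 0/1 edge selection in G*."""
    E = {frozenset(e) for e in edges}
    # G* is G plus a new node s adjacent to every original node.
    E_star = E | {frozenset((s, v)) for v in nodes}
    x = {frozenset(e) for e in selected}
    if not x <= E_star:
        raise ValueError("selection contains an edge outside G*")
    # (1b): exactly two selected edges are incident with s.
    if sum(1 for e in x if s in e) != 2:
        return False
    # (1c): every original edge e is adjacent to at most two selected
    # edges, and to exactly two if e itself is selected.
    for e in E:
        adjacent_selected = sum(1 for f in x if f != e and f & e)
        lower = 2 if e in x else 0
        if not lower <= adjacent_selected <= 2:
            return False
    return True
```

On a cycle with nodes 0, 1, 2, 3, the induced path 0-1-2 closed via s passes the check. The path 0-1-2-3 closed via s fails it, because the unselected edge {3,0} is a chord and is adjacent to four selected edges.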

3.0.1 Establishing connectivity.

The above model is not sufficient: it allows for the solution to consist of multiple disjoint cycles, only one of which contains s. But still, these cycles have no chords in G, and no edge in G connects any two cycles. To obtain a longest single cycle through s, which yields the longest induced path in G, we thus have to forbid additional cycles in the solution that do not contain s. In other words, we want to enforce that the graph induced by the x-variables is connected.

There are several established ways to achieve connectivity: To stay with compact (i.e., polynomially sized) models, we could, e.g., augment ILP_Base with Miller-Tucker-Zemlin constraints (which are known to be polyhedrally weak [4]) or multi-commodity-flow formulations (ILP_Flow; cf. Appendix 0.B). However, herein we focus on augmenting ILP_Base with cut or (generalized) subtour elimination constraints, resulting in the (non-compact) model we denote by ILP_Cut, see below for details. Such constraints are a cornerstone of many algorithms for diverse problems, where they are typically superior (in particular in practice) to other known approaches [33, 16, 15]. While ILP_Cut and ILP_Flow are polyhedrally equally strong (cf. Section 4), we know from other problems that the sheer size of the latter typically nullifies the potential benefit of its compactness. Preliminary experiments show that this is indeed the case here as well.

3.0.2 Cut model (and generalized subtour elimination).

Let δ*(W) := {{u,w} ∈ E* : u ∈ W, w ∉ W} be the set of edges in the cut induced by W ⊆ V*. For notational simplicity, we may omit braces when referring to node sets of cardinality one. We obtain ILP_Cut by adding cut constraints to ILP_Base:

\begin{align}
\sum_{e \in \delta^*(W)} x_e \ge \sum_{e \in \delta^*(v)} x_e && \forall W \subseteq V,\ v \in W \tag{2a}
\end{align}

These constraints ensure that if a node v is incident to a selected edge (by (1c) there are then two such selected edges), any cut separating v from s contains at least two selected edges as well. Thus, there are (at least) two edge-disjoint paths between v and s selected. Together with the cycle properties of ILP_Base, we can deduce that all selected edges form a common cycle through s.

An alternative view leads to subtour elimination constraints for ILP_Base, which prohibit cycles not containing s via counting. It is well known that these constraints can be generalized using binary node variables y_v := ½ ∑_{e∈δ*(v)} x_e that indicate whether node v participates in the solution (in our case: in the induced path) [20]. Generalized subtour elimination constraints thus take the form

\begin{align}
\sum_{e \in E:\, e \subseteq W} x_e \le \sum_{w \in W \setminus \{v\}} y_w && \forall W \subseteq V,\ v \in W \tag{2b}
\end{align}

One expects ILP_Cut and “ILP_Base with constraints (2b)” to be equally strong, as this is well known for standard Steiner tree and other related models [21, 11, 12]. In fact, there even is a direct one-to-one correspondence between cut constraints (2a) and generalized subtour elimination constraints (2b): By substituting node-variables with their definitions in (2b), we obtain 2 ∑_{e∈E: e⊆W} x_e ≤ ∑_{w∈W∖{v}} ∑_{e∈δ*(w)} x_e. A simple rearrangement yields the corresponding cut constraint (2a).
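The rearrangement mentioned above can be spelled out as follows. This is a sketch under the assumption that the node variables are defined as $y_w = \tfrac12 \sum_{e \in \delta^*(w)} x_e$ and that $\delta^*(W)$ denotes the edges of the augmented graph with exactly one end in $W$; the shorthand $x(F) := \sum_{e \in F} x_e$ and $E(W) := \{e \in E : e \subseteq W\}$ is introduced here only for brevity.

```latex
\begin{align*}
% Summing the degrees over W counts inner edges twice and cut edges once:
\sum_{w \in W} x(\delta^*(w)) &= 2\,x(E(W)) + x(\delta^*(W)) \\
% Substituting the definition of y into (2b) and multiplying by two:
2\,x(E(W)) &\le \sum_{w \in W \setminus \{v\}} x(\delta^*(w))
            = 2\,x(E(W)) + x(\delta^*(W)) - x(\delta^*(v)) \\
% Cancelling 2 x(E(W)) on both sides gives the cut constraint (2a):
x(\delta^*(v)) &\le x(\delta^*(W))
\end{align*}
```

Every step is an equivalence, so for each pair $(W, v)$ the two constraints cut off exactly the same points.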

3.0.3 Clique constraints.

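We further strengthen our above models by introducing a set of additional inequalities. Consider any clique (i.e., complete subgraph) in G. The induced path may contain at most one of its edges:

\begin{align}
\sum_{e \in E:\, e \subseteq Q} x_e \le 1 && \forall Q \subseteq V \text{ such that } G[Q] \text{ is a clique} \tag{3}
\end{align}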

4 Polyhedral Properties of the LP Relaxations

We compare the above models w.r.t. the strength of their LP relaxations, i.e., the quality of their dual bounds. Achieving strong dual bounds is a highly relevant goal also in practice: one can expect a lower running time for the ILP solvers in case of better dual bounds since fewer nodes of the underlying branch-and-bound tree have to be explored. We defer the proofs of this section to Appendix 0.C.

Since ILP_Walk requires some upper bound T on the objective value, we can only reasonably compare this model to ours by assuming that we are also given this bound as an explicit constraint. Hence, no dual bound of any of the considered models is worse (i.e., larger) than T. As has already been observed in [29], LP_Walk in fact always yields this worst-case bound:

Proposition 1

(Proposition 5 from [29]) For every instance and every number of timesteps, LP_Walk has objective value T.

Note that Proposition 1 is independent of the graph. Given that the longest induced path of a complete graph has length 1, we also see that the integrality gap of ILP_Walk is unbounded. Furthermore, this shows that ILP_Base cannot be weaker than ILP_Walk. We show that already the partial model ILP_Base is in fact stronger than ILP_Walk. Let therefore , where OPT is the instance’s (integral) optimum value.

Proposition 2

ILP_Base is stronger than ILP_Walk. Moreover, for every  there is an infinite family of instances on which LP_Base has objective value at most  and LP_Walk has objective value at least .

Since ILP_Cut only has additional constraints compared to ILP_Base, this implies that ILP_Cut is also stronger than ILP_Walk. In fact, since constraints (2a) cut off infeasible integral points contained in LP_Base, LP_Cut is clearly even a strict subset of LP_Base. As noted before, we can show that using a multi-commodity-flow scheme (cf. Appendix 0.B) results in LP relaxations equivalent to LP_Cut:

Proposition 3

ILP_Flow and ILP_Cut are equally strong.

Let ILP_Cut^k denote ILP_Cut with clique constraints added for all cliques on at most k nodes. We show that increasing the clique sizes yields a hierarchy of ever stronger models.

Proposition 4

For any , ILP_Cut^{k+1} is stronger than ILP_Cut^k.

5 Algorithmic Considerations

5.0.1 Separation.

Since ILP_Cut contains an exponential number of cut constraints (2a), it is not practical in its full form. We follow the traditional separation pattern for branch-and-cut-based ILP solvers: We initially omit cut constraints (2a), i.e., we start with model ILP_Base. Iteratively, given a feasible solution to the LP relaxation of ILP_Base, we seek violated cut constraints and add them to ILP_Base. If no such constraints are found and the solution is integral, we have obtained a solution to ILP_Cut. Otherwise, we proceed by branching or, given a sophisticated branch-and-cut framework, by more general techniques.

Given an LP solution x̂, we call an edge e active if x̂_e > 0. Similarly, we say that a node is active if it has an active incident edge. These active graph elements yield a subgraph H of G*. For integral LP solutions, we simply compute the connected components of H and add a cut constraint for each component that does not contain s. We refer to this routine as integral separation. For a fractional LP solution, we compute the maximum flow value f_v between s and each active node v in H; the capacity of an edge e is equal to x̂_e. If f_v < ∑_{e∈δ*(v)} x̂_e, a cut constraint based on the induced minimum s-v-cut is added. We call this routine fractional separation. Both routines manage to find a violated constraint if there is any, i.e., they are exact separation routines. In fact, this shows that an optimal solution to LP_Cut can be computed in polynomial time [23]. Note that already integral separation suffices to obtain an exact, correct algorithm; we simply may need more branching steps than with fractional separation.
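The integral separation routine can be sketched in a few lines of Python. The function name, the dictionary-based representation of the LP solution, the label "s" for the added node, and the tolerance `eps` are assumptions made for illustration; the actual implementation described in Section 6 is written in C++ on top of SCIP and is not reproduced here.

```python
def integral_separation(x_hat, s="s", eps=1e-6):
    """Return node sets W that give violated cut constraints.

    x_hat maps each edge (u, v) of the augmented graph to its LP value;
    the routine is meant for integral solutions.
    """
    # Active edges have positive LP value; build the active subgraph.
    adj = {}
    for (u, v), value in x_hat.items():
        if value > eps:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    seen, violated = set(), []
    for root in adj:
        if root in seen:
            continue
        # Depth-first search collects one connected component.
        component, stack = {root}, [root]
        while stack:
            u = stack.pop()
            for w in adj[u] - component:
                component.add(w)
                stack.append(w)
        seen |= component
        # A component without s has no selected edge leaving it, so the
        # cut constraint for W = component (and any v in W) is violated.
        if s not in component:
            violated.append(component)
    return violated
```

For a solution consisting of the cycle s-0-1-2-s plus a separate triangle on the nodes 3, 4, 5, the routine reports the node set {3, 4, 5}.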

5.0.2 Relaxing variables.

As presented above, our models have |E*| binary variables, each of which may be used for branching by the ILP solver. We can reduce this number by introducing new binary variables y_v, v ∈ V, that allow us to relax the binary x_e-variables, e ∈ E*, to continuous ones. The new variables are precisely those discussed w.r.t. generalized subtour elimination, i.e., we require y_v = ½ ∑_{e∈δ*(v)} x_e. Assuming x_e to be continuous in [0,1], we have for every edge e = {v,w} ∈ E: if y_v = 0 or y_w = 0 then x_e = 0. Conversely, if y_v = y_w = 1 then x_e = 1 by (1c). Hence, requiring integrality for the y-variables (and, e.g., branching only on them) suffices to ensure integral x-values.
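The two implications used above can be verified directly. The following is a sketch under the assumptions that $y_v = \tfrac12 \sum_{f \in \delta^*(v)} x_f$ for every original node $v$, that $0 \le x_f \le 1$ for all edges, and that the adjacency constraint (1c) bounds the selected edges adjacent to an original edge $e = \{v,w\}$ by two, where $A(e)$ denotes the edges sharing a node with $e$.

```latex
\begin{align*}
% If y_v = 0, all edges at v carry value zero by non-negativity:
y_v = 0 \;&\Rightarrow\; \sum_{f \in \delta^*(v)} x_f = 0
        \;\Rightarrow\; x_e = 0 \\
% If y_v = y_w = 1, the edges adjacent to e are those at v or w except e:
\sum_{f \in A(e)} x_f
  &= \Bigl(\sum_{f \in \delta^*(v)} x_f - x_e\Bigr)
   + \Bigl(\sum_{f \in \delta^*(w)} x_f - x_e\Bigr)
   = 4 - 2x_e \\
% The upper bound of (1c) then forces x_e to its maximum:
4 - 2x_e \le 2 \;&\Rightarrow\; x_e \ge 1 \;\Rightarrow\; x_e = 1
\end{align*}
```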

5.0.3 Handling clique constraints.

We use a modified version of the Bron-Kerbosch algorithm [14] to list all maximal cliques. For each such clique we add a constraint during the construction of our model. Recall that there are up to 3^{|V|/3} maximal cliques [30], but preliminary tests show that this effort is negligible compared to solving the ILP. Thus, as our preliminary tests also show, other (heuristic) approaches of adding clique constraints to the initial model are not worthwhile.
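As an illustration of the clique enumeration step, here is a minimal Python sketch of the textbook Bron-Kerbosch algorithm with pivoting. It is not the modified variant of [14] and not the library routine used in the experiments; the function names and the adjacency-dictionary input are assumptions. The helper `clique_constraint_edges` lists the edges whose variables would appear on the left-hand side of one clique constraint.

```python
def maximal_cliques(adj):
    """Yield all maximal cliques of a graph given as {node: neighbor_set}."""
    def expand(R, P, X):
        if not P and not X:
            yield frozenset(R)
            return
        # Pivot on a node with many neighbors in P to prune branches.
        pivot = max(P | X, key=lambda u: len(adj[u] & P))
        for v in list(P - adj[pivot]):
            yield from expand(R | {v}, P & adj[v], X & adj[v])
            P = P - {v}
            X = X | {v}
    yield from expand(set(), set(adj), set())


def clique_constraint_edges(clique):
    """Edges of the clique, i.e., the variables summed in one constraint."""
    members = sorted(clique)
    return [(u, v) for i, u in enumerate(members) for v in members[i + 1:]]
```

For a triangle on 0, 1, 2 with a pendant edge {2,3}, the maximal cliques are {0,1,2} and {2,3}. The first yields a constraint over three edge variables, the second a trivial one over a single edge.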

6 Computational Experiments

6.0.1 Algorithms.

We implement the best state-of-the-art algorithm, i.e., the ILP_Walk-based one by Matsypura et al. as briefly described in Section 2 and Appendix 0.A. We denote this algorithm by “W”. For our implementations of ILP_Cut, we consider various parameter settings w.r.t. the algorithmic considerations described in Section 5. We denote the arising algorithms by “C” to which we attach sub- and superscripts defining the parameters: the subscript “” denotes that we use fractional separation in addition to integral separation. The superscript “” specifies that we introduce node variables as the sole integer variables. The superscript “” specifies that we use clique constraints. We consider all eight thereby possible implementations.

6.0.2 Hard- and software.

Our C++ (GCC 8.3.0) code uses SCIP 6.0.1 [19] as the Branch-and-Cut-Framework with CPLEX 12.9.0 as the LP solver. We use OGDF snapshot-2018-03-28 [10], in particular its push-relabel implementation, for the separation of cut constraints. We use igraph 0.7.1 [13] to calculate all maximal cliques. For W, we directly use CPLEX instead of SCIP as the Branch-and-Cut-Framework. This does not give an advantage to our algorithms, since CPLEX is more than twice as fast as SCIP [1] and we confirmed in preliminary tests that CPLEX is faster on ILP_Walk. However, we use SCIP for our algorithms, as it allows better parameterizable user-defined separation routines. We run all tests on an Intel Xeon Gold 6134 with 3.2 GHz and 256 GB RAM running Debian 9. We limit each test instance to a single thread with a time limit of minutes and a memory limit of  GB.

6.0.3 Instances.

We consider the instances proposed for LongestInducedPath in [29] as well as additional ones. Overall, our test instances are grouped into four sets: RWC, MG, BAS and BAL. The first set, denoted RWC, is a collection of 22 real-world networks, including communication and social networks of companies and of characters in books, as well as transportation, biological, and technical networks. See [29] for details on the selection. The Movie Galaxy (MG) set consists of 773 graphs representing social networks of movie characters [27]. While [29] considered only 17 of them, we use the full set here. The other two sets are based on the Barabási-Albert probabilistic model for scale-free networks [2]. In [29], only the chosen parameter values are reported, not the actual instances. Our set BAS recreates instances with the same values: 30 graphs for each choice , where is the graph’s density. As we will see, these small instances are rather easy for our models. We thus also consider a set BAL of graphs on 100 nodes; for each density  we generate 30 instances. See http://tcs.uos.de/research/lip for all instances, their sources, and detailed experimental results.

6.0.4 Comparison to the state-of-the-art.

instance     OPT  |V|   |E|        W        C        C        C        C        C        C        C        C
high-tech     13   33    91    15.40     0.90     1.11     1.44     3.15     0.51     0.81    0.41*     2.05
karate         9   34    78     2.98     1.73     1.65     2.12     1.32     1.07     3.71    0.66*     2.74
mexican       16   35   117    73.30     1.68     2.25     1.12     3.59     1.22     1.34    0.87*     0.99
sawmill       18   36    62    70.00     0.51    0.43*     0.50    0.44*     0.85     3.32     0.82     3.34
tailorS1      13   39   158    83.80     4.78     7.92     4.81     6.45    1.51*     1.87     3.29     3.55
chesapeake    16   39   170   106.00    1.84*    13.11     2.11    11.00     2.29     4.88     3.19     4.39
tailorS2      15   39   223   445.00     6.80    21.78    11.92    14.91     3.20     4.31    2.89*     3.14
attiro        31   59   128        🕒     1.76     2.57     2.48     1.75     1.20     1.75    0.89*     1.19
krebs         17   62   153   522.00     3.86    28.21    18.55    10.03    16.00    11.26     3.90    2.33*
dolphins      24   62   159        🕒     7.95    27.59    22.72    18.33    19.21    2.99*    3.01*     4.70
prison        36   67   142        🕒    13.36     5.87     1.09     1.50     3.62     4.05    1.02*    1.02*
huck           9   69   297    41.70        🕒   144.13    19.46    42.22   114.27    11.63    5.96*     7.49
sanjuansur    38   75   144        🕒    30.67     8.64    24.86    10.33     8.22    3.65*    3.79*     4.71
jean          11   77   254   121.00   464.89    52.89    16.54     9.53    81.03    14.47    3.88*     5.14
david         19   87   406        🕒   666.25   719.46    26.70    45.34    85.88    23.94    6.93*    10.35
ieeebus       47  118   179        🕒    37.10    22.35    39.82    10.60    15.69    3.13*    22.72     5.61
sfi           13  118   200    44.40    47.41     4.39     4.89     3.77    15.13     2.64     3.31    2.44*
anna          20  138   493        🕒    21.58   296.69    53.21    74.55   439.23    20.27    7.09*     7.58
usair         46  332  2126        🕒        🕒        🕒        🕒        🕒        🕒        🕒  922.94*        🕒
494bus       142  494   586        🕒        🕒   379.29        🕒   379.97        🕒  178.92*        🕒  170.74*
Table 1: Running times [s] on RWC except for yeast and 622bus (solved by none). We denote timeouts by 🕒 and mark times within a small tolerance of the row minimum with an asterisk.
(a) Running time on BAS and BAL
(b) Running time on MG
(c) Running time vs. OPT (all instances)
(d) Reduction of B&B-nodes by node var’s on commonly solved BAS and BAL
Figure 1: Comparison between different ILP models.
(a),(b): Each point is a median, where timeouts are treated as seconds. Bars in the background give the number of instances. Gray encircled markers, connected via dotted lines, show the number of solved instances (if not 100%).
(c): Whiskers mark the 20% and 80% percentile. The gray area marks timeouts.

We start with the most obvious question: Are the new models practically more effective than the state-of-the-art? See Fig. 1(a) for BAS and BAL, Fig. 1(b) for MG, and Table 1 for RWC.

We observe that, rather independently of the benchmark set, the various ILP_Cut implementations achieve the best running times and success rates. The only exceptions are the instances from MG (cf. Fig. 1(b)): there, the overhead of the stronger model, requiring an explicit separation routine, does not pay off and W yields comparable performance to the weaker of the cut-based variants. On BAS instances, the cut-based variants dominate (cf. Fig. 1(a)): while all ILP_Cut variants (see below) solve all of BAS, W can only solve a part of the instances reliably. On BAL (cf. Fig. 1(a)) W fails on virtually all instances. The cut-based model, however, allows implementations (see below for details) that solve all of these harder instances. We point out one peculiarity on the BAL instances, visible in Fig. 1(a). The instances have 100 nodes but varying density. As the density increases from 2 to 30, the median running times of all algorithmic variants increase and the median success rates decrease. However, for the subsequent increase in density (where only C is successful) the running times drop again and the success rate increases. Interestingly, the number of branch-and-bound (B&B) nodes for the denser of these two settings is only roughly 1/7 of that for the sparser one. This suggests that the denser graphs may allow fewer (near-)optimal solutions and thus more efficient pruning of the search tree.

6.0.5 Comparison of cut-based implementations.

Choosing the best among the eight ILP_Cut implementations is not as clear as the general choice of ILP_Cut over ILP_Walk. In Fig. 1(a), 1(b), and Table 1 we see that, while adding clique constraints is clearly beneficial on MG, on BAS and RWC the benefit is less clear. On BAL, we do not see a benefit and for we even see a clear benefit of not using clique constraints. Each of the graphs from BAL with has at least maximal cliques (and therefore initial clique constraints), whereas the BAL graphs for and the RWC graphs yeast and usair have at most maximal cliques and all other graphs have at most .

Probably the most surprising finding concerns the choice of the separation routine: while the fractional variant is a quite fast algorithm and yields tighter dual bounds, the simpler integral separation performs better in practice. This is in stark contrast to seemingly similar scenarios like TSP or Steiner problems, where the former is considered by default. In our case, the latter, being very fast and called more rarely, is seemingly strong enough to find effective cutting planes that allow the ILP solver to finish its computations fastest. This is particularly true when combined with the addition of node variables (see below). In fact, C is the only choice that can completely solve all large graphs in BAL.

Adding node variables (and relaxing the integrality on the edge variables) nearly always pays off significantly (cf. Fig. 1(a), 1(b)). Fig. 1(d) shows that the models without node variables require many more B&B-nodes. In fact, looking more deeply into the data, C requires roughly as few B&B-nodes as C without requiring the overhead of the more expensive separation routine. Only for BAS with , the configurations without node variables are faster; on these instances, our algorithms only require B&B-nodes (median).

(a) LP value vs. OPT. Left: MG; right: BAS and BAL.
(b) Maximal found clique size vs. LP value on MG.
Figure 2: Root LP relaxation of cut-based models. The blue line shows the median.

6.0.6 Dependency of running time on the optimal value.

Since the optimal value OPT of an instance determines the final size of the ILP_Walk instance, it is natural to expect the running time of W to heavily depend on OPT. Fig. 1(c) shows that this is indeed the case. The new models are less dependent on the solution size, as, e.g., witnessed by C in the same figure.

6.0.7 Practical strength of the root relaxations.

For our new models, we may ask how the integer optimal solution value and the value of the LP relaxation (obtained by any cut-based implementation with exact fractional separation) differ, see Fig. 2(a). The gap increases for larger values of OPT. Interestingly, we observe that the density of the instance seems to play an important role: for BAS and BAL, the plot shows obvious clusters, which, without a single exception, directly correspond to the different parameter settings as labeled. Denser graphs lead to weaker LP bounds in general.

Fig. 2(b) shows the relative improvement to the LP relaxation when adding clique constraints for MG instances. On the other hand, for every instance of BAS and BAL, the root relaxation did not change by adding clique constraints.

7 Conclusion

We propose new ILP models for LongestInducedPath and prove that they yield stronger relaxations in theory than the previous state-of-the-art. Moreover, we show that they—generally, but also in particular in conjunction with further algorithmic considerations—clearly outperform all known approaches in practice. We also provide strengthening inequalities based on cliques in the graph and prove that they form a hierarchy when increasing the size of the cliques.

It could be worthwhile to separate the proposed clique constraints (at least heuristically) to take advantage of their theoretical properties without overloading the initial model with too many such constraints. As it is unclear how to develop such a separation scheme efficiently, we leave it as future research.

References

  • [1] Achterberg, T.: SCIP: solving constraint integer programs. Math. Prog. Comput. 1(1), 1–41 (2009)
  • [2] Barabási, A.L., Albert, R.: Emergence of Scaling in Random Networks. Science 286, 509–512 (1999)
  • [3] Barabási, A.L.: Network Science. Cambridge University Press (2016)
  • [4] Bektaş, T., Gouveia, L.: Requiem for the Miller-Tucker-Zemlin subtour elimination constraints? EJOR 236(3), 820–832 (2014)
  • [5] Berman, P., Schnitger, G.: On the Complexity of Approximating the Independent Set Problem. Inf. Comput. 96(1), 77–94 (1992)
  • [6] Bodlaender, H.L., Gilbert, J.R., Hafsteinsson, H., Kloks, T.: Approximating Treewidth, Pathwidth, Frontsize, and Shortest Elimination Tree. J. Alg. 18(2), 238–255 (1995)
  • [7] Borgatti, S.P., Everett, M.G., Johnson, J.C.: Analyzing Social Networks. SAGE Publishing (2013)
  • [8] Buckley, F., Harary, F.: On longest induced paths in graphs. Chinese Quart. J. Math. 3(3), 61–65 (1988)
  • [9] Chen, Y., Flum, J.: On Parameterized Path and Chordless Path Problems. In: CCC. pp. 250–263 (2007)
  • [10] Chimani, M., Gutwenger, C., Juenger, M., Klau, G.W., Klein, K., Mutzel, P.: The Open Graph Drawing Framework (OGDF). In: Tamassia, R. (ed.) Handbook on Graph Drawing and Visualization, pp. 543–569. Chapman and Hall/CRC (2013), www.ogdf.net
  • [11] Chimani, M., Kandyba, M., Ljubić, I., Mutzel, P.: Obtaining Optimal k-cardinality Trees Fast. J. Exp. Alg. 14, 5:2.5–5:2.23 (2010)
  • [12] Chimani, M., Kandyba, M., Ljubić, I., Mutzel, P.: Strong Formulations for 2-Node-Connected Steiner Network Problems. In: COCOA. pp. 190–200. LNCS 5165 (2008)
  • [13] Csardi, G., Nepusz, T.: The igraph software package for complex network research. InterJournal, Complex Systems 1695,  1–9 (2006), http://igraph.sf.net
  • [14] Eppstein, D., Löffler, M., Strash, D.: Listing All Maximal Cliques in Sparse Graphs in Near-Optimal Time. In: ISAAC. pp. 403–414. LNCS 6506 (2010)
  • [15] Fischetti, M.: Facets of two Steiner arborescence polyhedra. Math. Prog. 51, 401–419 (1991)
  • [16] Fischetti, M., Salazar-Gonzalez, J., Toth, P.: The Generalized Traveling Salesman and Orienteering Problems. In: The Traveling Salesman Problem and Its Variations, Comb. Opt., vol. 12. Springer (2007)
  • [17] Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co. (1979)
  • [18] Gavril, F.: Algorithms for maximum weight induced paths. Inf. Process. Let. 81(4), 203–208 (2002)
  • [19] Gleixner, A., Bastubbe, M., Eifler, L., Gally, T., Gamrath, G., Gottwald, R.L., Hendel, G., Hojny, C., Koch, T., Lübbecke, M.E., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schlösser, F., Schubert, C., Serrano, F., Shinano, Y., Viernickel, J.M., Walter, M., Wegscheider, F., Witt, J.T., Witzig, J.: The SCIP Optimization Suite 6.0. ZIB-Report 18-26, Zuse Inst. Berlin (2018), https://scip.zib.de
  • [20] Goemans, M.X.: The Steiner tree polytope and related polyhedra. Math. Prog. 63, 157–182 (1994)
  • [21] Goemans, M.X., Myung, Y.S.: A Catalog of Steiner Tree Formulations. Networks 23, 19–28 (1993)
  • [22] Golovach, P.A., Paulusma, D., Song, J.: Coloring graphs without short cycles and long induced paths. Disc. Appl. Math. 167, 107–120 (2014)
  • [23] Grötschel, M., Lovász, L., Schrijver, A.: Geometric Algorithms and Combinatorial Optimization, Alg. and Comb., vol. 2. Springer (1988)
  • [24] Håstad, J.: Clique is hard to approximate within n^{1−ε}. Acta Math. 182(1), 105–142 (1999)
  • [25] Jackson, M.O.: Social and Economic Networks. Princeton University Press (2010)
  • [26] Jaffke, L., Kwon, O., Telle, J.A.: Polynomial-Time Algorithms for the Longest Induced Path and Induced Disjoint Paths Problems on Graphs of Bounded Mim-Width. In: IPEC. pp. 21:1–13. LIPIcs 89 (2017)
  • [27] Kaminski, J., Schober, M., Albaladejo, R., Zastupailo, O., Hidalgo, C.: Moviegalaxies - Social Networks in Movies. Harvard Dataverse (V3 2018)
  • [28] Lozin, V., Rautenbach, D.: Some results on graphs without long induced paths. Inf. Process. Let. 88(4), 167–171 (2003)
  • [29] Matsypura, D., Veremyev, A., Prokopyev, O.A., Pasiliao, E.L.: On exact solution approaches for the longest induced path problem. EJOR 278, 546–562 (2019)
  • [30] Moon, J.W., Moser, L.: On Cliques in Graphs. Israel J. of Math. 3(1), 23–28 (1965)
  • [31] Nesetril, J., de Mendez, P.O.: Sparsity - Graphs, Structures, and Algorithms, Alg. and Comb., vol. 28. Springer (2012)
  • [32] Newman, M.: Networks: An Introduction. Oxford University Press (2010)
  • [33] Polzin, T.: Algorithms for the Steiner problem in networks. Ph.D. thesis, Saarland University, Saarbrücken, Germany (2003)
  • [34] Schrijver, A.: Theory of linear and integer programming. Wiley-Intersci. series in disc. math. and opt., Wiley (1999)
  • [35] Uno, T., Satoh, H.: An Efficient Algorithm for Enumerating Chordless Cycles and Chordless Paths. In: Int. Conf. on Disc. Sci. pp. 313–324. LNCS 8777 (2014)

Appendix

Appendix 0.A Walk-Based Model (State-of-the-Art)

The following ILP model, denoted by ILP_Walk, was recently presented in [29]. It constitutes the foundation of the fastest known exact algorithm. It models a timed walk through the graph that prevents “short-cut” edges. Let T denote an upper bound on the length of the path, i.e., on its number of edges. For every node v ∈ V and every point in time t there is a variable that is 1 iff v is visited at time t (4g).

(4a)
s.t. (4b)
(4c)
(4d)
(4e)
(4f)
(4g)
In every step at most one node can be visited (4b); a node can be visited at most once (4c); the time points have to be used consecutively (4d); nodes visited at consecutive time points need to be adjacent (4e); and nodes at non-consecutive time points cannot be adjacent (4f).

However, yields only weak LP relaxations (cf. Section 4). To obtain a practical algorithm, the authors of [29] iteratively solve for increasing values of  until its optimal objective value becomes less than . They use the graph’s diameter as a lower bound on to avoid trivial calls. In addition, they add supplemental symmetry-breaking inequalities.

Appendix B Multi-Commodity-Flow Model

A flow formulation allows a compact, i.e., polynomially sized, model. We start with and extend it in the following way: Each node is assigned a commodity and sends, if is part of the induced path, two units of flow of this commodity from to using only selected edges, where edges have capacity one (per commodity). This ensures that each node in the solution lies on a common cycle with . Consider the bidirected arc set  that consists of a directed arc for both directions of each edge in . Let () denote the arcs of with source (resp. target) . We use variables to model the flow of commodity  over arc ; we do not actively require them to be binary. The model below, together with , forms .

(5a)
(5b)
(5c)

The capacity constraints (5a) ensure that flow is only sent over selected edges. Equations (5b) model flow preservation (up to, but not including, the sink ) and send the commodities away from their source , if is part of the solution.

Appendix C Proofs for Section 4 (Polyhedral Properties)

Proposition 1

(Proposition 5 from [29]) For every instance and every number of timesteps has objective value .

Proof

We set to for all and . It is easy to see that this solution is feasible and attains the claimed objective value. ∎

Proposition 2

is stronger than . Moreover, for every there is an infinite family of instances on which has objective value at most and has objective value at least .

Proof

By Proposition 1, will always attain value . To show the strength claim, it thus suffices to give instances where yields a strictly tighter bound.

Already a star with at least three leaves proves the claim, as guarantees a solution of optimal value . However, it can be argued that such graphs and substructures are easy to preprocess. Thus, we prove the claim with a more suitable instance class.

Choose any , start with two nodes , connect them with internally node-disjoint paths of length 2, and add new node with edge . A longest induced path in this graph contains exactly  edges: and the two edges of one of the --paths. Let denote the degree of node  in  without added star . By summing all constraints (1c) we deduce

For the double sum we see that any edge incident to or is considered times, i.e., it has adjacent edges, while the other edges are considered times. Thus . In the second sum , is the only edge with coefficient (instead of ), and we thus have . By (1b) and the variable bounds we have . Since we overall have , giving objective value . As the objective must be integral, this even yields the optimal bound  when using within an ILP solver.

We furthermore note that, to achieve strictly two-connected graphs, we could, e.g., also consider a cycle where each edge is replaced by two internally node-disjoint paths of length 2. However, in the above instance class the gap between the relaxations is larger, which is why we refrain from giving further details on the latter class. ∎
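The combinatorial claim about the instance class in the proof of Proposition 2, namely that a longest induced path consists of the pendant edge plus the two edges of one length-2 path, can be checked by brute force on small members. In the following sketch, the node labels 'a', 'b', 'p' and ('m', i), the choice to attach the pendant node to 'a', and all function names are hypothetical and serve only this illustration. The enumeration is exponential and intended for tiny graphs.

```python
from itertools import combinations


def build_instance(k):
    """Two nodes 'a', 'b' joined by k internally node-disjoint paths of
    length 2, plus a pendant node 'p' attached to 'a' (assumed labels)."""
    adj = {'a': set(), 'b': set(), 'p': set()}

    def add_edge(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    for i in range(k):
        add_edge('a', ('m', i))
        add_edge(('m', i), 'b')
    add_edge('p', 'a')
    return adj


def induces_path(adj, subset):
    """True iff the subgraph induced by subset is a path."""
    W = set(subset)
    if len(W) == 1:
        return True
    degrees = sorted(len(adj[v] & W) for v in W)
    # A path has exactly two end nodes of degree 1 and inner nodes of degree 2.
    if degrees != [1, 1] + [2] * (len(W) - 2):
        return False
    # The degree sequence alone does not exclude extra cycles, so we also
    # check connectivity with a depth-first search.
    start = next(iter(W))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & W:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == W


def longest_induced_path_edges(adj):
    """Number of edges of a longest induced path, by exhaustive search."""
    nodes = list(adj)
    for size in range(len(nodes), 0, -1):
        if any(induces_path(adj, W) for W in combinations(nodes, size)):
            return size - 1
    return 0


for k in range(2, 6):
    print(k, longest_induced_path_edges(build_instance(k)))
```

The result is independent of the number of parallel length-2 paths: any node set containing two middle nodes together with both 'a' and 'b' induces a cycle, so at most four nodes can lie on an induced path.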

Proposition 3

and are equally strong.

Proof

Let and be the polytope of and , respectively. Let be the projection of onto the -variables by ignoring the -variables. Then . We show that the projection is surjective. Clearly, it retains the objective value. We observe that by constraints (5a) for any node  there can be at most units of flow along edge  that belong to some commodity . By constraint (5b), each node  sends units of flow that have to arrive at node . Consequently, the claim—both that any solution maps to an solution and vice versa—follows directly from the duality of max-flow and min-cut. ∎
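The last step of the proof rests on the duality of max-flow and min-cut for fractional capacities. The following sketch illustrates this duality on a tiny bidirected graph: it computes a maximum flow with a textbook Edmonds-Karp routine and compares it with a minimum cut found by enumeration. The example graph, its fractional capacities, and all names are assumptions chosen for illustration; they are not taken from the models in this paper.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def max_flow(cap, s, t):
    """Edmonds-Karp on a dict {(u, v): capacity}; returns the flow value."""
    res, nodes = {}, set()
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), Fraction(0)) + Fraction(c)
        res.setdefault((v, u), Fraction(0))
        nodes.update((u, v))
    value = Fraction(0)
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in nodes:
                if w not in parent and res.get((u, w), 0) > 0:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return value
        path, w = [], t
        while parent[w] is not None:
            path.append((parent[w], w))
            w = parent[w]
        delta = min(res[e] for e in path)
        for (u, w) in path:
            res[(u, w)] -= delta
            res[(w, u)] += delta
        value += delta


def min_cut(cap, s, t):
    """Minimum s-t cut weight by enumerating all node sets containing s."""
    inner = sorted({u for e in cap for u in e} - {s, t})
    best = None
    for r in range(len(inner) + 1):
        for chosen in combinations(inner, r):
            side = set(chosen) | {s}
            weight = sum(Fraction(c) for (u, v), c in cap.items()
                         if u in side and v not in side)
            best = weight if best is None else min(best, weight)
    return best


# Fractional edge values, used as capacities in both directions.
edges = {('v', 'a'): Fraction(1, 2), ('v', 'b'): Fraction(1, 2),
         ('a', 's'): Fraction(1, 2), ('b', 's'): Fraction(1, 2),
         ('a', 'b'): Fraction(1, 4)}
arcs = {}
for (u, w), c in edges.items():
    arcs[(u, w)] = c
    arcs[(w, u)] = c

print(max_flow(arcs, 'v', 's'), min_cut(arcs, 'v', 's'))
```

Exact rational arithmetic via `Fraction` avoids rounding issues when comparing the two values, which matters because LP solutions are in general fractional.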

Proposition 4

For any , is stronger than .

Proof

is at least as strong as since we only add new constraints. Let , the complete graph on nodes. By choosing in constraint (3), has objective value .

However, allows a solution with objective value : We set for each and for each to obtain an LP feasible solution  to : Clearly, constraints (1b,1c) are satisfied. The cut constraints (2a) are satisfied since edge variables are chosen uniformly (w.r.t. the two above edge types) and the right-hand side of the constraint sums over at least as many edge variables (per type) as the left-hand side. For any clique of size at most , the left-hand side of its clique constraint (3) sums up to at most .

We note that it is straightforward to generalize , so that it contains only as a subgraph, while retaining the property of having a gap between the two considered LPs. ∎