Large-scale graph optimisation problems are at the heart of urban science, computational sustainability and human wellbeing. Examples include computing connected subgraphs to design wildlife corridors and planning methods for bike-sharing systems in New York. With much of the world population exposed to particulate matter (PM2.5) concentrations above the annual mean World Health Organization air quality guideline levels, we are motivated by the challenging problem of finding running routes that minimise exposure to air pollution in a city.
Consider a runner planning a route which starts and ends at the same location. The runner would like the route to be sufficiently long, but they do not want to run too far. However, planning a run without considering the air quality in the local area is suboptimal: air pollution has an adverse effect on the cardio-respiratory system, which can be exacerbated by increased inhalation during exercise. Moreover, air pollution in urban environments is highly localised, because factors such as transportation, industry and construction contribute substantially to poor air quality. In order to minimise their exposure to air pollution, the runner could use a mobile or web application to request and view an appropriate route. An algorithm which computes such a running route efficiently is therefore highly desirable.
With the above motivation in mind, we study the Constrained Least-cost Tour (CLT) problem. The input is an undirected graph representing the road network with edge weights (distance) and edge costs (air pollution). We are given a lower weight threshold and an upper weight threshold, together with a specially annotated vertex called the origin, representing the start and end location of the run. The objective is to minimise the total cost of a tour starting and ending at the origin such that the tour is weight-feasible: its total weight is at least the lower threshold and at most the upper threshold. In the context of a running route, weight-feasible means that the route is sufficiently long but not too far for the runner.
The CLT problem is most closely related to the family of Travelling Salesman Problems with Profits (TSP-wP). Table 1 summarises the similarities and differences between CLT and some TSP-wP family members. This family of TSPs does not require the tour to visit every vertex in the graph, and may be rooted (the tour must start and end at a given vertex) or unrooted. In a review by Feillet et al., the TSP-wP family is split into three classes: Quota TSP (Q-TSP), Selective TSP (S-TSP) and Profitable Tour Problems (PTP). The Orienteering Problem (OP) [11; 13] is a well-known S-TSP and the Prize-collecting TSP (Pc-TSP) is an important PTP. However, the TSP-wP class most similar to the CLT problem is Q-TSP.
In Q-TSP, we are given an undirected graph with a profit function on the vertices, a cost function on the edges and a quota. The goal of Q-TSP is to minimise the total cost of a tour such that the total collected profit is at least the quota. There are three key differences between the CLT problem and Q-TSP. First, the profit function in Q-TSP is on the vertices of the graph, whereas the weight function in the CLT problem is on the edges. Second, Q-TSP has no upper profit threshold, whereas the CLT problem defines an upper weight threshold. Third, the limited existing literature for Q-TSP assumes the triangle inequality holds on the cost function and obtains an approximation by doubling a k-Minimum Spanning Tree (k-MST). However, the cost function (air pollution exposure) in our real-world application does not satisfy this inequality.
TSP-wPs such as Q-TSP may ask for a weak tour (vertex repetition is not constrained) or a strong tour (vertices are repeated at most once, thus the tour is a simple cycle). We investigate both the weak and strong variants of the CLT problem. In the remainder of this paper, we refer to the weak variant as the CLT problem, and the strong variant as the Constrained Least-cost Cycle (CLC) Problem.
Our contributions are as follows. In Section 3, we prove that the CLT problem is NP-hard, even in the case when the input graph is a path. A simple reduction from the Hamiltonian Cycle problem shows that if the triangle inequality does not hold on the cost function, then the CLC problem has no α-approximation algorithm for any α ≥ 1 (assuming P ≠ NP). Thus, we focus on heuristic approaches which find sufficiently good solutions in polynomial time (Section 5). First, we introduce the Déjà Vu heuristic (DjV) to find weak tours for the CLT problem. Next, we propose Suurballe's heuristic (SH), which calls on Suurballe's algorithm to find solutions to the CLC problem. Finally, we develop the Adaptive heuristic (AH), which extends SH by exploring a greater proportion of the solution space. We analyse the performance of our heuristics on two datasets by comparing them against the continuous and connectivity relaxations of the CLT and CLC problems respectively.
2 Problem definition
A tour is a sequence of vertices starting and ending at the same vertex, where every two consecutive vertices in the sequence are adjacent to each other in the graph.
In the CLT problem, we are given an undirected graph G = (V, E), where V denotes the set of vertices and E denotes the set of edges. Each edge has a positive weight and a positive cost. We are also given an origin vertex and lower and upper weight thresholds, where the lower threshold is at most the upper threshold. The multiplicity of an edge in a tour is the number of times the tour traverses that edge. The total weight (resp. cost) of a tour is the sum, over all edges, of the edge's weight (resp. cost) multiplied by its multiplicity. We say a tour is weight-feasible if and only if its total weight is at least the lower threshold and at most the upper threshold.
The goal of the CLT problem is to minimise the total cost of a weight-feasible tour starting and ending at the origin. In the CLT Decision problem, we additionally ask whether there is a weight-feasible tour whose total cost is at most a given cost threshold. Finally, we define the Constrained Least-cost Cycle (CLC) problem, which has the same objective function and constraints as CLT, but adds the requirement that the tour must be a simple cycle, visiting each vertex at most once.
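To make the definitions concrete, the following sketch evaluates the weight, cost and weight-feasibility of a tour. The data layout (tours as vertex sequences, edge attributes keyed by unordered vertex pairs) is our own choice for illustration, not the paper's implementation.

```python
# Sketch of the CLT definitions. A tour is a vertex sequence starting and
# ending at the same vertex; each edge carries a (weight, cost) pair.

def tour_totals(tour, attrs):
    """Return (total_weight, total_cost) of a tour.

    attrs maps frozenset({u, v}) to a (weight, cost) pair; traversing an
    edge several times counts its weight and cost once per traversal.
    """
    total_w = total_c = 0.0
    for u, v in zip(tour, tour[1:]):
        w, c = attrs[frozenset((u, v))]
        total_w += w
        total_c += c
    return total_w, total_c

def is_weight_feasible(tour, attrs, lower, upper):
    """A tour is weight-feasible iff lower <= total weight <= upper."""
    total_w, _ = tour_totals(tour, attrs)
    return lower <= total_w <= upper
```

For example, the tour [0, 1, 2, 1, 0] traverses each of the edges {0,1} and {1,2} twice, so its totals are the doubled edge weights and costs.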
3 Hardness

We prove that CLT Decision is NP-hard by reducing from the unbounded variant of the subset sum problem, which is known to be NP-hard [14; 15]. For brevity, we continue to refer to this variant as Subset Sum. Starting from an instance of Subset Sum, we construct an instance of CLT Decision on a path, and show that the instance of Subset Sum is a Yes-instance if and only if our constructed instance of CLT Decision is a Yes-instance.
In the Subset Sum problem, we are given a set of items, a positive integer weight for each item and a target value, and want to decide whether there is a non-negative integer multiplicity for each item such that the total weight of the chosen items, counted with multiplicity, equals the target.
Theorem 1. The Constrained Least-cost Tour problem is NP-hard, even if the input graph is a path.
Let I be an instance of Subset Sum. We construct an instance of CLT Decision as follows. The graph is a path with one edge per item of I plus one additional last edge. Each item edge is assigned a weight and cost derived from the weight of its item, and the weight and cost of the last edge, the weight thresholds and the cost threshold are chosen so that any weight-feasible tour within the cost threshold must traverse the last edge exactly twice and encode a solution to Subset Sum. The origin is an endpoint of the path. This completes the construction of the CLT Decision instance. Clearly, the reduction is polynomial-time. We now argue that I is a Yes-instance of Subset Sum if and only if the constructed instance is a Yes-instance of CLT Decision.
Suppose that the Subset Sum instance is a Yes-instance and fix a solution vector. Consider the “natural" tour in the path starting at the origin, traversing each item edge a number of times determined by the corresponding entry of the solution vector and traversing the last edge exactly twice. A direct calculation shows that the total weight of this tour lies between the weight thresholds and that its total cost is at most the cost threshold.
Hence, we conclude that if the Subset Sum instance is a Yes-instance, then the constructed instance of CLT Decision is a Yes-instance. Conversely, suppose that the constructed instance of CLT Decision is a Yes-instance. This implies there is a tour in the path starting at the origin that is weight-feasible and has total cost at most the cost threshold.
Claim 1. The tour traverses every edge of the path at least twice and the last edge exactly twice.
Observe that in order to prove the claim, it is sufficient to prove that the tour traverses the last edge exactly twice. The tour must traverse the last edge at least twice: it cannot traverse it exactly once, since a closed tour on a path traverses every edge an even number of times, and not traversing it at all would force some item edge to be repeated so many times that the cost threshold is violated, contradicting our assumption on the tour. The tour must traverse the last edge at most twice, since otherwise it traverses it at least four times, and the resulting weight and cost contradict the thresholds.
We now describe the solution vector for the Subset Sum instance. Recall we have already proved that each item edge is traversed an even number of times, at least twice, and that the last edge is traversed exactly twice. For every item, define its multiplicity in the solution vector from the number of times the corresponding edge is traversed. These multiplicities are non-negative integers as required in the description of Subset Sum, and since the tour traverses the last edge exactly twice, the lower weight threshold implies the total weight of the chosen items is at least the target, and the upper weight threshold implies it is at most the target. This completes the proof in the converse direction.
Theorem 1 proves CLT is NP-hard when the input graph is a path, so clearly CLT is NP-hard on a general graph. The CLC problem can also be shown to be NP-hard by reducing from the Hamiltonian Cycle problem. We further note that if the cost function does not satisfy the triangle inequality, then CLC does not have an α-approximation algorithm for any α ≥ 1 (assuming P ≠ NP); see the appendix for complete proofs.
4 Relaxations

Relaxing NP-hard problems often provides useful insights about the optimal solution to the original problem, and provides a lower bound against which we can compare our heuristics. In this work, we consider two relaxations. First, we show that on continuous graphs, the CLT problem has a polynomial-time algorithm that finds the optimal solution. In a continuous graph, each edge is viewed as infinitely many vertices of degree two joined by infinitesimally small edges (formally, the continuous graph is the geometric realisation of the graph topology). This is equivalent to saying that the multiplicity of an edge can be any positive real value. Second, we give an integer programming formulation of CLC and relax the constraint that the solution must be connected.
4.1 Continuous relaxation
Let G denote a continuous graph. We define the induced graph of a tour of G as follows. A vertex is in the induced graph if the tour visits it, and an edge is in the induced graph if its multiplicity in the tour is greater than zero. If the induced graph is a path starting at the origin, then we call the final edge of the path the head of the tour, and the remaining edges the tail of the tour. We argue that in this relaxation, we may assume that the optimal tour induces a path.
Lemma. On a continuous graph, the induced graph of the optimal tour is a path.
Take an optimal tour and let e be an edge of its induced graph with the least cost per unit weight among the edges of the induced graph. Let P denote a minimum-cost path in the graph from the origin to e. On the one hand, the weight of traversing P out and back may already meet the lower threshold, in which case we can simply find another tour contained within P, of the same weight as the optimal tour and at most the same cost. On the other hand, this weight may fall short of the lower threshold; however, the cost of any subtour of the optimal tour which starts at the origin and visits e must still be at least the cost of P.
Now, consider the tour obtained by starting at the origin, traversing P to reach e, repeating e with exactly the real-valued multiplicity needed to make the tour weight-feasible, and then taking P all the way back to the origin. By construction this tour is weight-feasible, and by the choice of e and P its cost is at most that of the optimal tour.
Thus, we can compute an optimal solution to the continuous relaxation in polynomial time by running Dijkstra's algorithm from the origin, computing the multiplicity of the head for every candidate head edge, and returning the least-cost solution. We also obtain a lower bound on the cost of an optimal discrete solution:
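As a sketch of the head-multiplicity computation (the notation here is ours, since the paper's symbols were lost in extraction: ℓ and u are the weight thresholds, w and c the edge weight and cost, and C_P, W_P the cost and weight of the least-cost path P from the origin to a candidate head edge e):

```latex
% Assumed notation: \ell, u weight thresholds; w, c edge weight and cost;
% C_P, W_P the cost and weight of the least-cost path P to the head edge e.
m(e) = \max\!\left(0, \frac{\ell - 2\, W_P}{w(e)}\right),
\qquad
\mathrm{cost}(e) = 2\, C_P + m(e)\, c(e),
```

with e a feasible head only when $2\, W_P + m(e)\, w(e) \le u$; the relaxation returns the feasible head of least total cost.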
The cost of the optimal constrained least-cost tour on the continuous graph is less than or equal to the cost of the optimal constrained least-cost tour on the discrete graph.
4.2 Connectivity relaxation
We first turn our attention to an integer programming (IP) formulation for the CLC problem. We place a binary variable on each edge of the graph indicating whether the edge is in the cycle, and the objective is to minimise the total cost of the selected edges. Degree constraints enforce that the cycle starts and ends at the origin and that the tour is closed, weight constraints enforce weight-feasibility, and the sub-tour elimination constraints [4; 16] enforce connectivity. Optimally solving an IP that enforces connectivity is not practical in our application of a runner requesting a route, since the user might have to wait too long as the size of the graph and the length of the requested run increase. Thus we relax the connectivity constraint and use the resulting IP as a lower bound on the optimal solution of CLC to compare our heuristics against.
5 Heuristics

Our approach is to develop heuristics that run in polynomial time and return close-to-optimal solutions in real-world environments. Fig. 1 shows examples for each algorithm. First, we present the Déjà Vu (DjV) heuristic for the CLT problem, which exploits the continuous relaxation (the repetition of a low-cost edge is where the name “Déjà Vu" comes from). Next, we propose Suurballe's heuristic (SH), which finds low-cost cycles for the CLC problem. Finally, we introduce the Adaptive heuristic (AH), which extends SH to explore a greater proportion of the solution space.
5.1 Déjà Vu Heuristic
Our Déjà Vu (DjV) heuristic exploits the intuition that the optimal solution to CLT on a continuous graph is a good indicator of a low-cost solution to CLT on a discrete graph. DjV walks along a path to an edge with low cost, repeats this edge with some positive even multiplicity, then walks back along the same path to the origin.
DjV computes the least-cost tree rooted at the origin, storing for each vertex the parent and the cost and weight of the least-cost path from the origin to it. For each edge, DjV considers the endpoint whose least-cost path from the origin is cheaper. If walking out to that endpoint and back does not already exceed the upper weight threshold, the multiplicity of the edge in the tour is the smallest positive even integer that makes the tour weight-feasible. The time complexity of DjV is dominated by the single-source shortest-path computation.
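The DjV procedure can be sketched as follows. This is a hedged reconstruction from the description above: the graph layout, helper names and tie-breaking are our own choices, not the authors' code.

```python
import heapq

# Sketch of the Déjà Vu idea (data layout and names are our own):
# graph[u] is a list of (v, edge_weight, edge_cost) triples.

def dijkstra(graph, origin):
    """Least-COST tree from the origin; per-vertex path cost and path weight."""
    cost, weight = {origin: 0.0}, {origin: 0.0}
    heap = [(0.0, 0.0, origin)]
    while heap:
        c, w, u = heapq.heappop(heap)
        if c > cost.get(u, float("inf")):
            continue  # stale heap entry
        for v, ew, ec in graph[u]:
            if c + ec < cost.get(v, float("inf")):
                cost[v], weight[v] = c + ec, w + ew
                heapq.heappush(heap, (c + ec, w + ew, v))
    return cost, weight

def deja_vu(graph, origin, lower, upper):
    """Walk out to a low-cost edge, repeat it an even number of times, walk back.
    Returns (tour_cost, tour_weight) of the best candidate, or None."""
    cost, weight = dijkstra(graph, origin)
    best = None
    for u in graph:
        if u not in cost:
            continue
        for v, ew, ec in graph[u]:
            if cost[u] > cost.get(v, float("inf")):
                continue  # only walk out via the endpoint with the cheaper path
            out_w, out_c = 2 * weight[u], 2 * cost[u]
            need = max(0.0, lower - out_w)
            k = max(2, int(-(-need // ew)))  # ceiling division
            if k % 2:
                k += 1  # round up to a positive even multiplicity
            total_w = out_w + k * ew
            if lower <= total_w <= upper:
                cand = (out_c + k * ec, total_w)
                if best is None or cand < best:
                    best = cand
    return best
```

On a small example (a path 0-1-2 where edge {0,1} is expensive and edge {1,2} is cheap), the heuristic walks out to the cheap edge and repeats it, exactly as in the continuous relaxation.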
5.2 Suurballe’s heuristic
We now propose Suurballe’s heuristic (SH), which uses the fact that a pair of vertex-disjoint simple paths between two vertices forms a simple cycle. Suurballe’s algorithm [18; 19] solves the Shortest Pairs of disjoint paths problem: given a directed, weighted graph, find a pair of edge-disjoint paths with minimum total cost from a source vertex to a sink vertex, for every possible sink. Suurballe and Tarjan give a near-linear-time algorithm for Shortest Pairs. Their algorithm requires the input graph to be asymmetric: if an arc is in the graph, then its reverse is not. To construct a directed, asymmetric graph from our undirected graph, we use the vertex-splitting transformation described by Suurballe and Tarjan. The splitting transformation also allows us to compute vertex-disjoint paths on the undirected graph using Suurballe and Tarjan’s edge-disjoint algorithm.
Given the origin, SH runs Suurballe’s algorithm and computes the cost of the shortest pair of vertex-disjoint paths from the origin to every vertex. From each such pair, we construct a simple cycle containing the origin and that vertex, and return the least-cost simple cycle that is weight-feasible. The time complexity of the heuristic is dominated by a single run of Suurballe and Tarjan's algorithm. The weakness of SH is that it only considers a small subset of the solution space of CLC: the cycles that can be constructed by finding the least-cost pair of vertex-disjoint paths from the origin to some vertex in the graph.
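The outer loop of SH can be sketched as follows. For brevity we replace Suurballe's algorithm with a simple stand-in that finds a second path after deleting the interior vertices of the first; this greedy two-pass method is not Suurballe's algorithm and is not guaranteed to find the optimal pair. The graph layout and function names are our own.

```python
import heapq

# graph[u] is a list of (v, edge_weight, edge_cost) triples.

def shortest_path(graph, src, dst, banned=frozenset()):
    """Dijkstra by cost, avoiding `banned` vertices; returns (cost, weight, path)."""
    best = {src: (0.0, 0.0, [src])}
    heap = [(0.0, 0.0, src, [src])]
    while heap:
        c, w, u, path = heapq.heappop(heap)
        if u == dst:
            return c, w, path
        if c > best.get(u, (float("inf"),))[0]:
            continue
        for v, ew, ec in graph[u]:
            if v in banned:
                continue
            if c + ec < best.get(v, (float("inf"),))[0]:
                best[v] = (c + ec, w + ew, path + [v])
                heapq.heappush(heap, (c + ec, w + ew, v, path + [v]))
    return None

def suurballe_heuristic(graph, origin, lower, upper):
    """Best weight-feasible simple cycle built from two vertex-disjoint paths.
    Returns (cycle_cost, cycle_as_vertex_list) or None."""
    best = None
    for t in graph:
        if t == origin:
            continue
        first = shortest_path(graph, origin, t)
        if first is None:
            continue
        c1, w1, p1 = first
        # second path avoids the interior vertices of the first (stand-in,
        # NOT Suurballe's optimal pairing)
        second = shortest_path(graph, origin, t, banned=frozenset(p1[1:-1]))
        if second is None or second[2] == p1:
            continue
        c2, w2, p2 = second
        if lower <= w1 + w2 <= upper:
            cand = (c1 + c2, p1 + p2[::-1][1:])
            if best is None or cand[0] < best[0]:
                best = cand
    return best
```

On a 4-cycle with unit weights and costs, the two disjoint paths from the origin to the opposite vertex stitch together into the full cycle.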
5.3 Adaptive heuristic
The Adaptive heuristic (AH) extends SH by exploring a larger solution space: all cycles containing the origin that are formed from the least-cost pair of vertex-disjoint paths between some pair of vertices in the graph. Thus the solution space of SH is a subset of the solution space of AH. Pseudocode for AH is given by Algorithm 1. AH runs Suurballe and Tarjan's algorithm once per vertex, so its time complexity is larger than that of SH by a factor of the number of vertices; this increase is the price paid for exploring more solutions.
6 Experiments

We now present the results from running our heuristics on two datasets and comparing them against our relaxations (abbreviations: AH = Adaptive heuristic, CR = continuous relaxation, XR = connectivity relaxation, SH = Suurballe's heuristic, DjV = Déjà Vu heuristic; code available at https://patrickohara.github.io/CLT-problem/). DjV is compared against CR; SH and AH are compared against XR. Every algorithm is tested at 10 different weight thresholds and 10 random origins. The gap between the lower and upper thresholds is kept constant at 250 metres for the pollution dataset and 5 units for the Crucible dataset. Experiments for the heuristics and the continuous relaxation are computed on a Microsoft Azure virtual machine with four CPUs and 14GB of memory running Linux. The connectivity relaxation is computed using the IBM Decision Optimisation Cloud service with 10 cores and 60GB of memory; we set a one-hour time limit for the IP solver. We reduce the size of our input graph with two pre-processing steps: the first removes vertices that cannot be reached from the origin within the weight budget, and the second removes all leaves from the graph when solving the CLC problem, and so is only applied to SH, AH and XR.
Figs. 2 and 3 show the effect of increasing the weight thresholds. Figs. 4 and 5 display examples of routes computed by our heuristics for the Crucible and air quality datasets respectively. Table 2 summarises the overshoot and margin of error: the overshoot of a tour measures how far its total weight exceeds the lower weight threshold, and the margin of error of a heuristic is the relative gap between its cost and the cost of its relaxation (further details of our pre-processing, methodology and datasets are available in Section D of the appendix).
We conduct experiments on two different datasets to show our methods work in different environments. Both datasets are pre-processed as described above, reducing the size of the input graph before we execute our algorithms.
Air quality in London: The goal is to minimise the air pollution exposure of a runner in London such that the total distance of the route is within a given range. We assume people run on the London road network: vertices represent road intersections and edges represent roads. The weight of an edge is the length of a road, and the cost of an edge is the total pollution a runner is exposed to by running along the road. The air quality model of London is a non-stationary mixture of Gaussian Processes [21; 1] that predicts air quality (nitrogen dioxide) from data such as sensors, road traffic and weather; we note that our methods are not dependent on the type of model used. The output of the model is a two-dimensional grid which overlays the road network. The cost of an edge is the mean pollution of the grid squares intersecting the edge in space, multiplied by the weight of the edge; we assume the pollution (cost) is uniformly distributed along an edge.
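A toy sketch of turning a pollution grid into an edge cost follows. The grid values, cell size and sampling scheme here are made up for illustration; the paper's model is a Gaussian-process air quality map, and its exact grid-intersection routine is not given.

```python
import math

def edge_cost(p1, p2, grid, cell=1.0):
    """Cost of an edge = mean pollution of the grid squares the straight
    segment p1 -> p2 passes through, multiplied by the edge length (weight),
    assuming pollution is uniform along the edge."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    # sample the segment and collect the distinct cells it intersects
    cells = set()
    steps = 100
    for i in range(steps + 1):
        t = i / steps
        x = x1 + t * (x2 - x1)
        y = y1 + t * (y2 - y1)
        cells.add((min(int(x // cell), len(grid) - 1),
                   min(int(y // cell), len(grid[0]) - 1)))
    mean_pollution = sum(grid[r][c] for r, c in cells) / len(cells)
    return mean_pollution * length
```

For a uniform grid, the cost reduces to the pollution value times the edge length, matching the uniform-distribution assumption above.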
The Crucible: Our second application finds tours which seek a diverse variety of environment types in “The Crucible” map from the game Warcraft III. The map consists of 512 by 512 grid squares, each belonging to one of five environment types (classes): normal ground (1), shallow water (2), trees (3), water (4) and out of bounds (5). An agent moving in the environment can only traverse normal ground, so classes 2-5 are defined as impassable. Each vertex of the graph represents a grid square in the map. If a grid square is surrounded by a diverse variety of environment types, then its vertex has high entropy: we define the entropy of a vertex from the probabilities of each class appearing in a 7 by 7 grid centred on the vertex. A unit-weight edge exists between two vertices if both are passable, and the cost of an edge is defined in terms of the entropy of its endpoints.
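The vertex-entropy computation can be sketched as follows. The window extraction and boundary clipping are our own choices; the paper specifies only a 7-by-7 window over the five classes.

```python
import math
from collections import Counter

def vertex_entropy(grid, row, col, radius=3):
    """Shannon entropy of the class distribution in a (2*radius+1)^2 window
    centred on (row, col), clipped at the map boundary."""
    counts = Counter()
    for r in range(max(0, row - radius), min(len(grid), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(grid[0]), col + radius + 1)):
            counts[grid[r][c]] += 1
    total = sum(counts.values())
    # H = -sum_k p_k log p_k over classes present in the window
    return -sum((n / total) * math.log(n / total) for n in counts.values())
```

A window containing a single class has entropy zero, while a mixed window has strictly positive entropy.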
                               Air Quality      The Crucible
Continuous relaxation (CR)     0.00     0.00    0.00    0.00
Déjà Vu heuristic (DjV)        34.20    1.18    0.16    0.81
Connectivity relaxation (XR)   17.16    0.00    0.00    0.00
Adaptive heuristic (AH)        41.77    1.96    0.00    2.22
Suurballe’s heuristic (SH)     120.98   4.57    1.98    4.68
CLT: DjV yields consistently low-cost tours with small error on both datasets (Table 2) when compared to the CR lower bound. Fig. 4 demonstrates the algorithm traversing a path to a low-cost edge, repeating this edge 18 times (the long, flat line on the right of Fig. 4), before returning to the origin. DjV traverses every edge with even multiplicity, so (assuming integer edge weights) the total weight of every tour will be even. Thus DjV does not consider low-cost solutions with odd weight, and if the thresholds admit only odd total weights, DjV will not return a solution. However, its low time complexity means DjV is fast (right of Figs. 2 and 3) and the algorithm works well in practice.
CLC: AH significantly outperforms SH on both datasets because it explores a larger proportion of the solution space. SH also significantly overshoots the lower weight threshold compared to AH (Table 2), resulting in SH traversing more weight and thus (in general) more cost. However, the trade-off for lower-cost solutions is the higher time complexity of AH compared to SH. The right of Figs. 2 and 3 clearly shows this difference in running time, and also highlights how long the integer program (IP) takes to optimally solve XR. Indeed, the drop-off in cost in Fig. 2 (left) for XR correlates with the point at which the IP stops solving instances optimally because the running time is cut off at one hour (Fig. 2, right). In Fig. 3 (right), the IP quickly hits the cut-off time limit because the Crucible graph is bigger, so the IP returns only a bound and not the optimal solution. Thus, beyond these thresholds, the true XR lower bound is larger than the reported value, and so we are slightly over-estimating the error for AH and SH.
7 Final remarks
We have introduced the Constrained Least-cost Tour problem: an NP-hard routing problem with the motivating application of finding running routes that minimise air pollution exposure in a city (see Fig. 5). We have derived relaxations and proposed heuristics for weak tours (CLT) and strong tours (CLC). Experiments on both datasets show our algorithms perform competitively when compared to our derived lower bounds. Finally, the motivating application of “running from air pollution" has a rich problem structure that we plan to exploit further: multiple pollutants, varying human sensitivities to different pollutants, and uncertainty in the forecasting models.
Patrick O’Hara and Theodoros Damoulas are funded by the Lloyds Register Foundation programme on Data Centric Engineering through the London Air Quality project. This work was furthermore supported by The Alan Turing Institute for Data Science and AI under EPSRC grant EP/N510129/1 in collaboration with the Greater London Authority. Patrick O’Hara was previously supported by the Warwick Impact Fund. We would like to thank Oliver Hamelijnck (The Alan Turing Institute) for providing the air quality predictions of London, and further thank the anonymous reviewers of IJCAI for their useful feedback.
- Aglietti et al.  V. Aglietti, T. Damoulas, and E. Bonilla. Efficient inference in multi-task Cox process models. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, 2019.
- Awerbuch et al.  B. Awerbuch, Y. Azar, A. Blum, and S. Vempala. Improved approximation guarantees for minimum-weight k-trees and prize-collecting salesmen. Proceedings of the Twenty-seventh Annual ACM Symposium on Theory of Computing, pages 277–283, 1995.
- Balas  E. Balas. The prize collecting traveling salesman problem. Networks, 19(6):621–636, 1989.
- Bauer et al.  P. Bauer, J. Linderoth, and M. Savelsbergh. A branch and cut approach to the cardinality constrained circuit problem. Mathematical Programming, 91(2):307–348, 2002.
- Dijkstra  E. W. Dijkstra. A note on two problems in connexion with graphs. Numer. Math., 1(1):269–271, 1959.
- Dilkina and Gomes  B. Dilkina and C. P. Gomes. Solving connected subgraph problems in wildlife conservation. Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems, pages 102–116, 2010.
- Feillet et al.  D. Feillet, P. Dejax, and M. Gendreau. Traveling salesman problems with profits. Transportation Science, 39(2):188–205, 2005.
- Freund et al.  D. Freund, S. G. Henderson, and D. B. Shmoys. Bike sharing. In Sharing Economy: Making Supply Meet Demand, volume 6, pages 435–459. Springer International Publishing, 2019.
- Garg  N. Garg. Saving an epsilon: A 2-approximation for the k-mst problem in graphs. In Proceedings of the Thirty-seventh Annual ACM Symposium on Theory of Computing, pages 396–402, New York, NY, USA, 2005.
- Giles and Koehle  L. V. Giles and M. S. Koehle. The health effects of exercising in air pollution. Sports Medicine, 44(2):223–249, 2014.
- Golden et al.  B. L. Golden, L. Levy, and R. Vohra. The orienteering problem. Naval Research Logistics (NRL), 34(3):307–318, 1987.
- Gomes  C. P. Gomes. Computational sustainability: Computational methods for a sustainable environment, economy, and society. The Bridge, 39(4):5–13, 2009.
- Gunawan et al.  A. Gunawan, H. C. Lau, and P. Vansteenwegen. Orienteering problem: A survey of recent variants, solution approaches and applications. European Journal of Operational Research, 255(2):315 – 332, 2016.
- Karp  R. M. Karp. Reducibility among combinatorial problems. In Proceedings of a symposium on the Complexity of Computer Computations, pages 85–103, 1972.
- Kellerer et al.  H. Kellerer, U. Pferschy, and D. Pisinger. Introduction to np-completeness of knapsack problems. In Knapsack Problems, pages 483–493. Springer Berlin Heidelberg, 2004.
- Laporte  G. Laporte. Generalized subtour elimination constraints and connectivity constraints. The Journal of the Operational Research Society, 37(5):509–514, 1986.
- Sturtevant  N. R. Sturtevant. Benchmarks for grid-based pathfinding. IEEE Transactions on Computational Intelligence and AI in Games, 4(2):144–148, 2012.
- Suurballe  J. W. Suurballe. Disjoint paths in a network. Networks, 4(2):125–145, 1974.
- Suurballe and Tarjan  J. W. Suurballe and R. Tarjan. A quick method for finding shortest pairs of disjoint paths. Networks, 14:325–336, 1984.
- Vardoulakis and Kassomenos  S. Vardoulakis and P. Kassomenos. Sources and factors affecting pm10 levels in two european cities: Implications for local air quality management. Atmospheric Environment, 42(17):3949 – 3963, 2008.
- Wilson et al.  A. G. Wilson, D. A. Knowles, and Z. Ghahramani. Gaussian process regression networks. Proceedings of the 29th International Conference on Machine Learning, pages 1139–1146, 2012.
- World Health Organization  World Health Organization. Exposure to ambient air pollution from particulate matter, 2016.
Appendix A NP-hardness
For completeness, we give full proofs of Theorem 1, the NP-hardness of CLC and the inapproximability of CLC.
A.1 The CLT Problem
We expand upon the proof of Theorem 1 in the paper.
In particular, we expand on Claim 1.
Theorem 1. The Constrained Least-cost Tour Problem is NP-hard, even when the input graph is a path.
Recall from the paper that we construct an instance of CLT Decision on a path from an instance of Subset Sum. We argue that the Subset Sum instance is a Yes-instance if and only if the constructed instance is a Yes-instance of CLT Decision.
Suppose that the Subset Sum instance is a Yes-instance and fix a solution vector. Consider the “natural" tour in the path starting at the origin: each item edge is traversed a number of times determined by the corresponding entry of the solution vector, and the last edge is traversed exactly twice. A direct calculation shows that the total weight of this tour lies between the weight thresholds and that its total cost is at most the cost threshold. Hence, we conclude that if the Subset Sum instance is a Yes-instance, then the constructed instance of CLT Decision is a Yes-instance.
Conversely, suppose that the constructed instance of CLT Decision is a Yes-instance. This implies we have a tour in the path starting at the origin that is weight-feasible and has total cost at most the cost threshold.
Claim 1. The tour traverses every edge of the path at least twice and the last edge exactly twice.
Observe that in order to prove the claim, it is sufficient to prove that the tour traverses the last edge exactly twice. This is because the graph is a path: any closed tour containing the origin and the last edge must traverse every edge of the path at least once, and any closed tour on a path must traverse every edge an even number of times. Consequently, we now focus on proving that the tour traverses the last edge exactly twice.
We first show that the tour traverses the last edge at least twice. Suppose not; then the tour does not traverse the last edge at all. Since the tour is a solution for the instance of CLT Decision, its total weight is at least the lower threshold. Then, by the pigeonhole principle, there must be an item edge whose multiplicity in the tour is large enough that the total cost of the tour exceeds the cost threshold. This contradicts our assumption on the cost of the tour. Hence we conclude that the tour traverses the last edge at least twice.
It remains to argue that the tour traverses the last edge at most twice. Suppose that this is not the case and the last edge occurs at least three times in the tour. Then, since every edge is traversed an even number of times, it must appear at least four times. In this case, the total cost of the tour exceeds the cost threshold, which is a contradiction, since the cost threshold was chosen to equal the cost of traversing the last edge exactly twice. Hence we conclude that the tour traverses the last edge at most twice. ∎
We now describe the solution vector for the Subset Sum instance. Recall we have already proved that each item edge is traversed an even number of times, at least twice, and that the last edge is traversed exactly twice. For every item, define its multiplicity in the solution vector from the number of times the corresponding edge is traversed; these multiplicities are non-negative integers as required in the description of Subset Sum. It remains to argue that the total weight of the chosen items equals the target. Since the tour traverses the last edge exactly twice, the lower weight threshold implies the total is at least the target, and the upper weight threshold implies the total is at most the target.
This completes the proof in the converse direction, and thus the proof of the theorem. ∎
A.2 The CLC Problem
Given an undirected graph, the Hamiltonian Cycle (HC) problem is to find a cycle that visits every vertex in the graph exactly once.
Given a graph with a weight function and cost function on the edges, a start vertex, two weight thresholds and a cost threshold, the CLC-Decision problem asks whether there is a strong tour starting and ending at the start vertex that is weight-feasible and has total cost at most the cost threshold.
The Constrained Least-cost Cycle (CLC) Problem is NP-hard.
To prove CLC-Decision is NP-hard, we give a polynomial-time reduction from HC, which is NP-complete. Let I be an instance of HC. We construct an instance of CLC-Decision as follows: let the cost and the weight of every edge be 1, set both weight thresholds and the cost threshold equal to the number of vertices, and pick any vertex as the start vertex. The reduction is clearly polynomial. We prove that I is a Yes-instance of HC if and only if the constructed instance of CLC-Decision is a Yes-instance.
Suppose the HC instance is a Yes-instance. Then there is a simple cycle that visits every vertex in the graph (including the start vertex) exactly once. This cycle is a strong tour that traverses exactly as many edges as there are vertices. Thus, on the constructed instance of CLC-Decision, its total weight and total cost both equal the number of vertices, meeting the thresholds. Hence we conclude the constructed instance is a Yes-instance.
Now suppose the constructed instance of CLC-Decision is a Yes-instance. Then we have a strong tour that starts at the start vertex, visits every vertex at most once, and whose total weight and total cost meet the thresholds. We need to show that the tour visits every vertex exactly once. Suppose otherwise; then there is a vertex not visited by the tour, so the tour is a simple cycle on fewer vertices than the graph. Since edges have unit weight and unit cost, such a cycle cannot reach the required total weight. Thus we conclude that the tour visits every vertex exactly once, and the strong tour is a Hamiltonian cycle. ∎
A.3 Approximation of CLC
Let OPT be the cost of the optimal solution to a problem and let α ≥ 1. An algorithm is an α-approximation algorithm if and only if, for every instance of the problem, it returns a solution with cost within a factor α of OPT. When referring to an α-approximation algorithm, we shall mean that the algorithm must run in polynomial time.
For CLC without the triangle inequality assumption, there does not exist an α-approximation algorithm for any α ≥ 1, provided P ≠ NP.
Given an instance of HC on a graph with n vertices, construct an instance of CLC on the complete graph over the same vertex set as follows. Every edge of the original graph keeps unit weight and unit cost; every pair of vertices not adjacent in the original graph is joined by an edge with unit weight and a cost chosen large enough that using even one such edge makes the solution more than α times costlier than a Hamiltonian cycle. Both weight thresholds are set to n. Assume there exists an α-approximation algorithm (Apx) for CLC. We show that such an algorithm can be used to solve HC in polynomial time.
First suppose there exists a Hamiltonian cycle in $G$. Then the optimal solution OPT for CLC will have cost $n$ and weight $n$, so Apx returns a solution of cost at most $\rho n$. Now suppose there does not exist a Hamiltonian cycle in $G$. Then OPT must use at least one edge not in $E$ with cost $\rho n + 1$. The cost of OPT will therefore be greater than $\rho n$.
Hence we conclude that $G$ has a Hamiltonian cycle if and only if the cost of the solution returned by Apx is at most $\rho n$.
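This gap construction can be sketched as follows (a hypothetical `hc_to_clc_gap` helper using networkx; `rho` is the assumed approximation factor):

```python
import networkx as nx

def hc_to_clc_gap(G, rho):
    """Complete the graph: edges of G keep cost 1, non-edges get
    cost rho*n + 1, and every edge has unit weight. A tour avoiding
    non-edges costs n; a tour using any non-edge costs more than rho*n."""
    n = G.number_of_nodes()
    K = nx.complete_graph(G.nodes())
    for u, v in K.edges():
        K[u][v]["weight"] = 1
        K[u][v]["cost"] = 1 if G.has_edge(u, v) else rho * n + 1
    return K
```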
Appendix B An Integer Programming Formulation of the CLC Problem
Recall that $x_e$ is a 0-1 variable placed on each edge $e$ of the graph $G$, $\delta(v)$ is the set of edges adjacent to vertex $v$, and $x(F) = \sum_{e \in F} x_e$ is the sum of the variables on an edge set $F$.
Subtour elimination constraints (6)
There are several ways to define the subtour elimination constraint. We give one such formulation using cutsets:
where the constraints range over the set of minimal edge cuts separating the origin $s$ from each vertex $v$. That is, each such cut is a minimal set of edges that, if removed, would disconnect $s$ and $v$ in the graph. The connectivity relaxation uses the objective function (1) with constraints (2)-(5) from the IP formulation above, but relaxes constraint (6).
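In a branch-and-cut setting, violated cut constraints of this kind are typically separated with a minimum-cut computation on the fractional solution. A minimal sketch, assuming the current LP values are stored on each edge in an attribute named `x` (our naming, not the paper's implementation):

```python
import networkx as nx

def min_cut_value(G, s, v):
    """Value of a minimum s-v edge cut, with the LP values x_e used
    as capacities. If this value falls below the right-hand side
    required by constraint (6), the cut inequality is violated."""
    return nx.minimum_cut_value(G, s, v, capacity="x")
```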
Appendix C Suurballe’s Algorithm
In the CLT/CLC problem, we are given an undirected graph $G = (V, E)$ with vertex set $V$ and edge set $E$. Edges have a weight function $w$ and a cost function $c$. However, Suurballe's algorithm requires a directed input graph that is asymmetric; that is, if $(u, v)$ is an arc in the graph, then $(v, u)$ is not in the graph.
We construct a directed, asymmetric graph $G'$ from the undirected graph $G$ as follows. For each vertex $v \in V$, split $v$ into two vertices $v_{\mathrm{in}}$ and $v_{\mathrm{out}}$ and add them to $G'$. Add a directed split arc from $v_{\mathrm{in}}$ to $v_{\mathrm{out}}$ in $G'$ with zero cost and zero weight. For every undirected edge $\{u, v\}$ adjacent to $v$ in $G$, add a directed arc from $v_{\mathrm{out}}$ to $u_{\mathrm{in}}$ in $G'$ with the same weight and cost as $\{u, v\}$ in $G$. The construction requires $O(|V| + |E|)$ time and space.
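The construction can be sketched in Python with networkx. The node encoding `(v, "in")` / `(v, "out")` and the helper name are our own choices for illustration:

```python
import networkx as nx

def split_vertices(G):
    """Build the directed, asymmetric graph: each vertex v becomes a
    split arc (v, 'in') -> (v, 'out') with zero weight and cost, and
    each undirected edge {u, v} becomes the arcs (u, 'out') -> (v, 'in')
    and (v, 'out') -> (u, 'in') carrying the original weight and cost."""
    D = nx.DiGraph()
    for v in G.nodes():
        D.add_edge((v, "in"), (v, "out"), weight=0, cost=0)
    for u, v, data in G.edges(data=True):
        D.add_edge((u, "out"), (v, "in"), **data)
        D.add_edge((v, "out"), (u, "in"), **data)
    return D
```

The result has $2|V|$ vertices and $|V| + 2|E|$ arcs, and no arc appears together with its reverse, as Suurballe's algorithm requires.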
Appendix D Experiments
In this section, we expand upon the pre-processing algorithms, methodology and datasets used for the experiments. To run our algorithms on the same datasets, please refer to our GitHub repository: https://patrickohara.github.io/CLT-problem/.
We use two pre-processing methods to reduce the size of the graph. The time taken for pre-processing is not included when timing the algorithms on the right of Figures 2 and 3. The first pre-processing algorithm removes vertices that cannot be reached from the origin within the upper weight threshold: if the weight of the shortest path from the origin to a vertex $v$ exceeds the threshold, then $v$ is removed from the graph (see Algorithm 2). The second removes vertices with degree one (leaves) and is given in Algorithm 3. It uses a recursive depth-first search to remove leaves from the graph.
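The two steps can be sketched as follows. This is a simplified iterative version under our own naming (`preprocess`, threshold `beta`, edge attribute `weight`), not the recursive implementation of Algorithms 2 and 3:

```python
import networkx as nx

def preprocess(G, origin, beta):
    """Prune the input graph: first drop vertices whose shortest-path
    weight from the origin exceeds beta, then repeatedly strip
    degree-one vertices, which cannot lie on any tour."""
    H = G.copy()
    dist = nx.single_source_dijkstra_path_length(H, origin, weight="weight")
    H.remove_nodes_from([v for v in list(H) if dist.get(v, float("inf")) > beta])
    leaves = [v for v in H if H.degree(v) == 1]
    while leaves:
        H.remove_nodes_from(leaves)
        leaves = [v for v in H if H.degree(v) == 1]
    return H
```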
We compare our heuristics on two different datasets to show that our methods can be applied in different contexts. The contrast between the air quality (AQ) dataset and the Crucible dataset is also interesting. Firstly, the structure of the graph in the Crucible is a grid, whereas the AQ graph is a road network. Running heuristics on different types of graphs can often highlight strengths or weaknesses, although in our experiments there were no such notable strengths or weaknesses. Secondly, the cost function on edges in the two datasets is spatially distributed in a very different way. In the Crucible, there are large areas of space in which all edges have uniform cost. Compare this to AQ, in which there are large areas of space with relatively low air pollution (cost) and localised peaks of highly polluted air. Despite these differences, our heuristics have a similar margin of error on both datasets when compared to their respective relaxations (Table 2).
D.1 Air quality in London
The road network of London was downloaded from the Ordnance Survey (https://www.ordnancesurvey.co.uk). The air quality predictions shown in Fig. 6 are a snapshot from an AQ model of London. The model is currently under development at the Alan Turing Institute. We emphasise that our algorithms are not dependent upon the model of air quality. The road network was pruned using an SQL PostGIS query to return all roads that intersect with the prediction area of the AQ model.
Fig. 7 shows the Crucible dataset, which can be downloaded from Moving AI (https://www.movingai.com/benchmarks/wc3maps512/index.html). The aim of the agent is to find a tour that visits a diverse range of environment types. The original dataset is shown in Fig. 7(a) and the diversity of environment types is shown in Fig. 7(b). The agent will seek the darker areas of Fig. 7(b). The darker areas show borders between different environment types, for example where the normal ground (class 1) meets trees (class 3).
D.3 Additional details
The heuristics and relaxations are coded in Python. Specifically, we use the networkx library to store the graph data structure. The machines used to compute the results are given in the paper: the first was an Azure virtual machine (VM) for the heuristics and the continuous relaxation (CR); the second was an IBM Cloud machine running the CPLEX library for the connectivity relaxation (XR). Because a more powerful machine was used for the XR, the time taken to compute its results is not directly comparable to the time taken by the heuristics and CR. However, Figs. 2 and 3 [right] show that even with a more powerful machine, XR still takes substantially more time to find a solution than any of the heuristics or CR. Further, the important algorithms to compare in terms of time are the Adaptive heuristic and Suurballe's heuristic, since they are the two competing algorithms for the CLC problem.
For each dataset, 10 vertices were chosen uniformly at random, and each dataset was given 10 weight thresholds. Each heuristic and each relaxation was tested on every configuration. For each solution to a relaxation, we record the total weight, total cost, number of vertices in the solution, the overshoot, the number of vertices and edges in the pre-processed input graph, and the time taken in seconds. In addition to the above quantities, we calculate the margin of error for each heuristic compared to the appropriate relaxation.
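To make the last quantity concrete, one plausible definition of the margin of error is the relative gap between the heuristic's cost and the lower bound given by the relaxation (our own formula for illustration; the paper's exact definition may differ):

```python
def margin_of_error(heuristic_cost, relaxation_cost):
    """Relative gap between a heuristic solution's cost and the
    lower bound given by a relaxation (e.g. CR or XR)."""
    return (heuristic_cost - relaxation_cost) / relaxation_cost
```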