1 Introduction
Sixty years ago, the route of a delivery truck would have been fixed well before the truck departed the warehouse. Today, thanks to the availability of real-time traffic information and the option to transmit instructions to the driver to add or remove delivery locations on the fly, the route is no longer fixed. Nevertheless, minimizing the length or duration of the route remains an important problem. This is an instance of the Traveling Salesperson Problem (TSP), and one of a growing number of practical applications that require solving combinatorial optimization problems in real time. In these problems, there is often a cost attributed to waiting for an optimal solution, or hard deadlines at which decisions must be taken. For example, the driver cannot wait for a new solution to be computed – they may miss their deliveries, or the traffic conditions may change again. There is therefore a pressing need for combinatorial optimization algorithms that produce high-quality solutions with minimal computation time. This remains challenging for current algorithms, as most research has focused on finding optimal solutions to static problems with little regard for computation time.

Contributions We present a hybrid data-driven approach for approximately solving the Euclidean TSP based on Graph Neural Networks (GNNs) and Guided Local Search (GLS), which we demonstrate converges to optimal solutions more quickly than three recent learning-based approaches and one non-learning GLS algorithm.
We provide the following contributions:


Our GLS algorithm is guided by the global regret of each edge in the problem graph. This allows the algorithm to differentiate edges that are very costly to include in the solution from ones that are not so costly, whether or not they are part of the optimal solution. Thus, using regret allows us to find high-quality, rather than optimal, solutions very quickly. We are the first to use a measure of global regret, which we define as the cost of enforcing a decision relative to the cost of an optimal solution.

We make this computationally tractable by approximating regret with a learned representation. Our GNN-based model operates on the line graph of the problem graph, which allows us to build a network without node features, focusing only on edge weight.

We present an improved evaluation criterion for the TSP that emphasizes the trade-off between solution quality and computation time. Notably, our approach finds optimal solutions to 96% of the 50-node problem set, 7% more than the next best benchmark, and to 20% of the 100-node problem set, 4.5× more than the next best benchmark. When generalizing from 20-node problems to the 100-node problem set, our approach finds solutions with an average optimality gap of 2.5%, a 10× improvement over the next best learning-based benchmark.
2 Related Work
The Operations Research (OR) community is responsible for the majority of research in algorithms for solving combinatorial optimization problems. Concorde (Applegate et al., 2006) is widely regarded as the best exact TSP solver. As an exact solver, it can guarantee the level of optimality of the solutions it finds. It uses cutting plane algorithms (Dantzig et al., 1954; Padberg and Rinaldi, 1990; Applegate et al., 2003) and branch-and-cut methods to iteratively prune away parts of the search space that will not contain an optimal solution. LKH-3 (Helsgaun, 2017), and its predecessor, LKH-2 (Helsgaun, 2009), are approximate TSP and Capacitated Vehicle Routing Problem (CVRP) solvers based on the $k$-opt heuristic (Lin and Kernighan, 1973). While these solvers do not provide any guarantees, experimentation has demonstrated that they are extremely effective. Nevertheless, these methods focus on finding optimal solutions with little regard for computation time, making them ill-adapted for solving the TSP in real time. With this in mind, Arnold and Sörensen (2019a) introduce KGLS, a GLS algorithm for the CVRP that is guided by three engineered features. They demonstrate that their algorithm can find solutions almost as good as those found by state-of-the-art heuristics in a fraction of the time.

As noted by Bengio et al. (2021) and Cappart et al. (2021), classical combinatorial optimization approaches solve each problem instance in isolation, overlooking the fact that in practice, problems and their solutions stem from related data distributions. Machine learning offers a way to exploit this observation. Neural network-based methods for solving combinatorial optimization problems have existed for more than three decades
(Smith, 1999), yet new neural architectures, especially GNNs, have driven a surge of activity in this area. We classify learning-based approaches for solving the TSP into the three categories identified by Bengio et al. (2021); our approach belongs to the second category, described below.

ML alone provides a solution. These approaches use a machine learning model to output solutions directly from the problem definition. Vinyals et al. (2015) propose Ptr-Net, a sequence-to-sequence model based on LSTMs, which iteratively constructs a solution by outputting a permutation of the input nodes. They train their model using expert solutions from Concorde. They use beam search (Reddy, 1977) to produce better solutions than those sampled from their model directly. Bello et al. (2016) present a method of training Ptr-Net without supervision using policy gradients. Their “active search” method resembles early neural network-based approaches to combinatorial optimization, in which a model was trained online to solve a particular problem instance (Smith, 1999). This method effectively solves each problem in isolation, and thus does not leverage expertise extracted from the distribution of training problem instances. Kool et al. (2018) take a similar approach to Bello et al. (2016) but use a Transformer architecture (Vaswani et al., 2017) rather than an LSTM. Their model can also be seen as a Graph Attention Network (GAT) (Veličković et al., 2017) applied to a complete graph. Joshi et al. (2019) train a model based on Graph Convolutional Networks (GCNs) (Kipf and Welling, 2016) to predict which edges in the TSP graph are likely to be part of the optimal solution, and reconstruct valid tours using beam search. Similar to Vinyals et al. (2015), they train their model using expert solutions from Concorde.
ML provides information to an OR algorithm. Machine learning may not be suitable to solve a combinatorial optimization problem alone. Instead, machine learning can provide information to augment a combinatorial optimization algorithm. Deudon et al. (2018) use a model based on the Transformer architecture to construct solutions iteratively and train it using policy gradients (they take a similar approach to Kool et al., 2018). They apply a 2-opt local search heuristic to solutions sampled from their model. In contrast to prior work, we are the first to use a machine learning model to approximate regret. Furthermore, previous work in this category has only leveraged heuristics, whereas we use a model to provide information to a metaheuristic (GLS).
ML makes decisions within an OR algorithm. Finally, a machine learning model can be embedded inside a combinatorial optimization algorithm. In this paradigm, a master algorithm controls the high-level procedure while repeatedly calling a machine learning model to make lower-level decisions. Dai et al. (2017) present a model that uses a GNN to learn an embedding of the current partial solution and a DQN (Mnih et al., 2015) to iteratively select nodes to insert using a cheapest insertion heuristic. They also use an “active search” method when solving the TSP. Wu et al. (2020) and da Costa et al. (2020) present policy gradient algorithms to learn policies that apply 2-opt operations to existing feasible solutions. Wu et al. (2020) use a Transformer architecture. da Costa et al. (2020) use a combination of GCN and RNN modules, and achieve better results. These approaches can either be seen as end-to-end learning approaches belonging to the first category, or as local search procedures where an ML model decides where to apply an operator.
3 Preliminaries
Traveling Salesperson Problem We focus on the Euclidean TSP, although our approach can be applied to other routing problems, such as the CVRP. A problem with $n$ cities, typically denoted TSP$n$, is represented as a complete, undirected, weighted graph $G$ with $n$ nodes. The edges represent connections between cities and are weighted by the Euclidean distance between the adjacent nodes. A solution, or tour, is a Hamiltonian cycle: a closed path through the graph that visits every node exactly once. An optimal solution is a cycle of minimum weight.
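A tour and its cost can be made concrete with a short sketch. The following Python helper is our own illustration (not part of the paper's implementation): it computes the weight of a Hamiltonian cycle over 2D city coordinates.

```python
import math

def tour_length(nodes, tour):
    """Total weight of a closed tour (Hamiltonian cycle) over 2D city coordinates."""
    n = len(tour)
    return sum(
        math.dist(nodes[tour[i]], nodes[tour[(i + 1) % n]])
        for i in range(n)
    )

# A 4-city instance on the corners of the unit square: the perimeter tour has length 4.
cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tour_length(cities, [0, 1, 2, 3]))  # → 4.0
```

A crossing tour such as `[0, 2, 1, 3]` uses both diagonals and is strictly longer, which is the kind of suboptimality the local search operators of Section 4.3 remove.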
Regret Regret measures the future cost of an action that also yields an immediate reward, and is typically used to make greedy combinatorial optimization algorithms less myopic. Typically, regret is computed over a fixed horizon. For example, Potvin and Rousseau (1993) evaluate the cost of inserting a node in the best versus second-best position when constructing a CVRP solution. Hassin and Keinan (2008) solve the TSP using regret, allowing a greedy construction heuristic to remove the most recently inserted node. It is not computationally feasible to calculate regret over a global horizon, for example, over all possible insertion sequences. However, if it were possible, a greedy algorithm could compute an optimal solution by selecting the lowest-regret decision at each step.
Local Search Local Search (LS) is a general improvement heuristic. Starting from an initial solution, local search iteratively moves to neighboring solutions that are lower cost than the current solution according to the objective function $g$. Neighboring solutions are solutions that are reachable from a given solution by applying a certain function, or operator. The set of all solutions reachable from another solution by applying an operator defines the neighborhood of that operator. The algorithm terminates when all neighboring solutions are inferior to the current solution, meaning the search has reached a local optimum.
Guided Local Search Guided Local Search (GLS) is a metaheuristic that sits on top of LS and allows it to escape local optima (Voudouris and Tsang, 1996). To apply GLS, the designer must define some aspects of a solution. When trapped in a local optimum, the algorithm penalizes certain aspects of the current solution which are considered to be unfavorable. The underlying LS procedure searches using an objective function that is augmented by these penalties, thus it is incentivized to remove heavily penalized aspects from the solution. The augmented objective function is
$h(s) = g(s) + \lambda \sum_{i=1}^{M} p_i I_i(s)$    (1)

where $s$ is a solution, $g$ is the objective function, $\lambda$ is a scaling parameter, $i$ indexes the aspects, $M$ is the number of aspects, $p_i$ is the current number of penalties assigned to aspect $i$, and $I_i(s)$ is an indication of whether $s$ exhibits aspect $i$, i.e.

$I_i(s) = \begin{cases} 1 & \text{if } s \text{ exhibits aspect } i \\ 0 & \text{otherwise} \end{cases}$    (2)
For the TSP, aspects of the solution are often defined as the edges in the problem graph. Therefore, $I_i(s)$ indicates if edge $e_i$ is in the solution $s$. Upon reaching a local optimum, which aspects are penalized is determined by a utility function. The utility of penalizing aspect $i$, $\mathrm{util}(s_*, i)$, is defined as
$\mathrm{util}(s_*, i) = I_i(s_*) \dfrac{c_i}{1 + p_i}$    (3)
where $s_*$ is the solution at a local optimum, $I_i(s_*)$ indicates if the solution exhibits aspect $i$, and $c_i$ is the cost of the aspect. The cost of an aspect measures how unfavorable it is. The higher the cost, the greater the utility of penalizing that aspect. In the context of the TSP, the cost can be the weight of the edge (Voudouris and Tsang, 1999), or a combination of various features (Arnold and Sörensen, 2019a). Conversely, the more penalties assigned to that aspect, the lower the utility of penalizing it again. The aspects with the maximum utility are always penalized, which means increasing $p_i$ by one. This penalization mechanism distributes the search effort in the search space, favoring areas that show promise (Voudouris and Tsang, 1996).
We use a variation of the classical GLS algorithm (see Voudouris and Tsang, 1999) that applies alternating optimization and perturbation phases (see Arnold and Sörensen, 2019a). During an optimization phase, the local search procedure is guided by the original objective function $g$. During a perturbation phase, it is guided by the augmented objective function $h$. After an edge is penalized, the local search is applied only on the penalized edge. That is, only operations that would remove the penalized edge are considered. This differs from the local search during the optimization phase, which considers all solutions in the neighborhood of the given operator. The perturbation phase continues until a fixed number of improving operations (operations that improve the solution according to the augmented objective function $h$) have been applied to the current solution. These operations perturb the solution enough for the local search to escape a local minimum. The alternating phases continue until the stopping condition is met.
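The penalization mechanism above can be sketched in a few lines. The following Python snippet is an illustrative simplification of our own (edges as the penalized aspects, helper names of our choosing): it implements the augmented objective of Eq. (1) and the utility-based penalty update of Eq. (3).

```python
import math

def edges_of(tour):
    """Undirected edges (the aspects) present in a tour."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def augmented_cost(tour, nodes, penalties, lam):
    """Augmented objective h(s) = g(s) + lambda * sum_i p_i * I_i(s) (Eq. 1),
    with the edges of the tour as the penalized aspects."""
    n = len(tour)
    g = sum(math.dist(nodes[tour[i]], nodes[tour[(i + 1) % n]]) for i in range(n))
    return g + lam * sum(penalties.get(e, 0) for e in edges_of(tour))

def penalize(tour, cost, penalties):
    """One perturbation step: increment p_i for the aspect(s) of maximum
    utility util(i) = I_i(s) * c_i / (1 + p_i) (Eq. 3)."""
    utilities = {e: cost(e) / (1 + penalties.get(e, 0)) for e in edges_of(tour)}
    best = max(utilities.values())
    for e, u in utilities.items():
        if u == best:
            penalties[e] = penalties.get(e, 0) + 1

cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
weight = lambda e: math.dist(*(cities[i] for i in e))  # edge weight as the aspect cost
pens = {}
penalize([0, 1, 2, 3], weight, pens)  # all four unit edges tie for maximum utility
print(augmented_cost([0, 1, 2, 3], cities, pens, lam=0.1))  # g(s) = 4 plus 4 penalties
```

On this symmetric instance every edge of the perimeter tour has the same cost, so all four are penalized at once; the augmented cost then exceeds the true tour length by $\lambda$ times the penalty count.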
4 Method
Our hybrid method, shown in Figure 1, combines a machine learning model and a metaheuristic. Our GNN-based model learns an approximation of the global regret of including each edge of the problem graph in the solution. The metaheuristic, GLS, uses this learned regret in conjunction with the original problem graph to quickly find high-quality solutions. The learned regret allows the algorithm to differentiate between edges which are costly to include in the solution and ones that are not so costly, thus improving its ability to steer the underlying local search procedure out of local minima and towards promising areas of the solution space.
4.1 Global Regret
We define global regret as the cost of requiring a certain decision to be part of the solution relative to the cost of a globally optimal solution. Unlike previous heuristics using regret, which calculate the cost of a decision relative to some fixed number of alternatives (for example, the next best, or two next best, options), our regret is measured relative to a globally optimal solution. Decisions that are part of an optimal solution have zero regret, and all other decisions have positive regret. Mathematically,

$r_d = \dfrac{g(s^*_d)}{g(s^*)} - 1$    (4)

where $r_d$ is the regret of decision $d$, $g$ is the objective function, $s^*_d$ is an optimal solution with $d$ fixed, and $s^*$ is a globally optimal solution. With perfect information, a greedy algorithm could construct an optimal solution by sequentially selecting the lowest-regret decisions.
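On tiny instances, global regret can be computed exactly by brute force, which makes the definition concrete. The sketch below is our own illustration (intractable beyond a handful of nodes): it enumerates all tours to evaluate the regret of each edge as the ratio between the best tour forced to contain that edge and the globally optimal tour.

```python
import math
from itertools import permutations

def brute_force_regret(nodes):
    """Global regret of each edge: r_e = g(s*_e) / g(s*) - 1, where s*_e is the
    best tour forced to contain edge e. Brute force; tractable only for tiny n."""
    n = len(nodes)

    def length(tour):
        return sum(math.dist(nodes[tour[i]], nodes[tour[(i + 1) % n]]) for i in range(n))

    tours = [(0,) + p for p in permutations(range(1, n))]  # fix the origin city
    best = min(length(t) for t in tours)
    regret = {}
    for a in range(n):
        for b in range(a + 1, n):
            best_with_e = min(
                length(t) for t in tours
                if any({t[i], t[(i + 1) % n]} == {a, b} for i in range(n))
            )
            regret[(a, b)] = best_with_e / best - 1.0
    return regret

# On a unit square, the four perimeter edges belong to the optimal tour (zero
# regret); the two diagonals do not (positive regret).
r = brute_force_regret([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
print(r[(0, 1)], r[(0, 2)])
```

The diagonal edge $(0, 2)$ forces a crossing tour of length $2 + 2\sqrt{2}$ against an optimum of $4$, giving a regret of roughly $0.21$, while every perimeter edge has regret exactly zero.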
In the TSP, decisions correspond to which edges are included in the solution, thus regret is defined as the cost of requiring that a certain edge be part of the solution. We posit that using regret is preferable to directly classifying which edges are part of the optimal solution (which is the approach taken by Joshi et al., 2019). Where classification can only produce a probability that an edge is part of the optimal solution, regret can differentiate between edges that are very costly to have as part of the solution and ones that are not so costly, whether or not they are part of the optimal solution. Thus, using regret furthers our goal of finding high-quality, rather than optimal, solutions with minimal computation time.

4.2 Regret Approximation Model
Calculating the global regret of an edge in the TSP graph requires solving the TSP itself, which is computationally intractable. Instead, we aim to learn a function that approximates the regret $r_e$ of an edge $e$. We use GNNs to achieve this, as they are expressive function approximators that operate on graph-structured data.
Input transformation Typically, GNNs aggregate messages and store states on the nodes of a graph (Gilmer et al., 2017). Therefore, most approaches to edge-property prediction predict the properties of an edge as a function of the surrounding node states. Instead, we input the line graph of the original problem graph into our model. The line graph $L(G)$ of an undirected graph $G$ is a graph such that there exists a node in $L(G)$ for every edge in $G$, and for every two edges in $G$ that share a node, there exists an edge between their corresponding nodes in $L(G)$. Figure 2 illustrates this transformation for a simple graph. The result is that our model aggregates messages and stores states on the edges of the problem graph (the nodes of the line graph). This allows us to build models with no node features, which is advantageous as the edge weights, not the specific node locations, are relevant when solving the TSP. For a complete, undirected graph $G$ with $n$ nodes, there are $n(n-1)/2$ nodes and $n(n-1)(n-2)/2$ edges in $L(G)$. Thus, although $G$ is complete, $L(G)$ can be very sparse.
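The line graph transformation is easy to state in code. The sketch below is our own helper (for the complete graph only): it builds $L(K_n)$ explicitly and shows how sparse it is relative to a complete graph on the same number of nodes.

```python
from itertools import combinations

def line_graph_of_complete(n):
    """Line graph L(G) of the complete graph G = K_n: one node per edge of G,
    with two nodes adjacent iff the corresponding edges of G share an endpoint."""
    nodes = list(combinations(range(n), 2))  # the edges of G become nodes of L(G)
    edges = [
        (e, f) for e, f in combinations(nodes, 2)
        if set(e) & set(f)                   # the two edges of G share an endpoint
    ]
    return nodes, edges

nodes, edges = line_graph_of_complete(10)
print(len(nodes), len(edges))  # 45 360
```

For $n = 10$, $L(G)$ has $45$ nodes and $360$ edges, far fewer than the $990$ edges a complete graph on $45$ nodes would have, which illustrates the sparsity claim above.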
Model architecture Our model consists of an embedding layer, several GNN layers, and an output layer. The embedding layer is an edge-wise fully connected layer that computes $d$-dimensional edge embeddings from the raw edge features. Node features, if used, are concatenated onto the feature set of the adjacent edges. The forward pass of the embedding layer is written as

$h_i^{(0)} = W_0 x_i + b_0$    (5)

where $h_i^{(0)}$ is the initial embedding of edge $e_i$, $W_0$ is a learnable weight matrix, $x_i$ are the input features of edge $e_i$ (including any node features), and $b_0$ is a set of learnable biases. The edge embeddings are updated using a series of message passing layers. Inspired by the encoder layer of Kool et al. (2018), each layer consists of multiple sublayers. The forward pass of layer $\ell$ is given by
$\hat{h}_i^{(\ell)} = \mathrm{BN}\left(h_i^{(\ell-1)} + \mathrm{GAT}^{(\ell)}\left(h^{(\ell-1)}\right)_i\right)$    (6)

$h_i^{(\ell)} = \mathrm{BN}\left(\hat{h}_i^{(\ell)} + \mathrm{FF}^{(\ell)}\left(\hat{h}_i^{(\ell)}\right)\right)$    (7)

where $\mathrm{GAT}^{(\ell)}$ is a multi-headed GAT layer (Veličković et al., 2017), $\mathrm{FF}^{(\ell)}$ is a feed-forward layer, $\mathrm{BN}$ is batch normalization (Ioffe and Szegedy, 2015), and $\hat{h}_i^{(\ell)}$ is a hidden state. The layers do not share parameters. The GAT layer uses multiple attention heads, and the FF layer uses one hidden sublayer with dimensionality 512 and ReLU activation. Finally, the output layer is a single edge-wise fully connected layer that computes a one-dimensional output from the $d$-dimensional node embeddings computed by the final message passing layer. This is written as

$y_i = W_{\mathrm{out}} h_i^{(L)} + b_{\mathrm{out}}$    (8)

where $y_i$ is the output for edge $e_i$ and $h_i^{(L)}$ is the final embedding of that edge.
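The shapes involved can be checked with a toy forward pass. The NumPy sketch below is a deliberate simplification of our own: it replaces the multi-headed GAT sublayer with plain mean aggregation over line-graph neighbours, omits batch normalization, and uses random weights, a random adjacency, and an arbitrary embedding dimension. It illustrates only the data flow of Eqs. (5)-(8), not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # illustrative embedding dimension (an assumption, not the paper's value)

def embed(x, W0, b0):
    """Eq. (5): edge-wise linear embedding of the per-edge input features."""
    return x @ W0 + b0

def message_pass(h, adj, W):
    """One simplified message passing layer: mean over line-graph neighbours
    plus a skip connection (a stand-in for the GAT sublayer of Eqs. (6)-(7))."""
    deg = adj.sum(axis=1, keepdims=True)
    return h + (adj @ h) / np.maximum(deg, 1) @ W

def readout(h, Wout, bout):
    """Eq. (8): edge-wise linear layer mapping final embeddings to one output."""
    return h @ Wout + bout

m = 45                                     # edges of a 10-node TSP graph = nodes of L(G)
x = rng.random((m, 1))                     # single input feature per edge: its weight
adj = (rng.random((m, m)) < 0.2).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T                          # symmetric random stand-in for the L(G) adjacency
h = embed(x, rng.normal(size=(1, d)), np.zeros(d))
h = message_pass(h, adj, rng.normal(size=(d, d)) / np.sqrt(d))
y = readout(h, rng.normal(size=(d, 1)), np.zeros(1))
print(y.shape)  # (45, 1): one predicted regret per edge of the problem graph
```

The key point the sketch demonstrates is that every stage is edge-wise: the model produces one scalar per node of $L(G)$, i.e., per edge of the original problem graph.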
4.3 RegretGuided Local Search
We adapt GLS to use regret when solving the TSP; this affects how the initial solution is built, the local search procedure, and the perturbation strategy. Our GLS uses alternating optimization and perturbation phases. During an optimization phase, the local search procedure greedily accepts changes to the solution until it reaches a local minimum. During a perturbation phase, the algorithm penalizes and attempts to remove edges in the current solution with high regret, thus allowing it to escape local minima while simultaneously guiding it towards promising areas of the solution space, i.e., those with low regret. Effectively, the regret predictions produced by our model allow our GLS algorithm to undo costly decisions made during the greedy optimization phase.
Initial solution We use a greedy nearest neighbor algorithm to construct an initial solution. Beginning from the origin node, we iteratively select the lowest-regret edge leading to an unvisited node, until all nodes have been visited.
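As an illustration, the construction can be written as follows (the function name and the regret lookup are our own; in the paper's setting the regret values would come from the model's predictions).

```python
def nearest_neighbor_by_regret(n, regret, origin=0):
    """Greedy initial tour: from the origin, repeatedly follow the lowest-regret
    edge to an unvisited node. `regret` maps unordered node pairs to values."""
    tour, visited = [origin], {origin}
    while len(tour) < n:
        cur = tour[-1]
        nxt = min(
            (v for v in range(n) if v not in visited),
            key=lambda v: regret[frozenset((cur, v))],
        )
        tour.append(nxt)
        visited.add(nxt)
    return tour

# Toy regret values on a 4-city square: perimeter edges have zero regret,
# diagonals have positive regret, so the construction follows the perimeter.
regret = {frozenset(e): 0.0 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
regret.update({frozenset(e): 0.21 for e in [(0, 2), (1, 3)]})
print(nearest_neighbor_by_regret(4, regret))  # → [0, 1, 2, 3]
```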
Local Search neighborhoods Our LS procedure uses two solution operators for the TSP, relocate and 2-opt. It alternates between the two operators, and uses a “best improvement” strategy, meaning that it exhaustively searches the neighborhood corresponding to the current operator and accepts the solution that improves the objective function the most, before continuing with the other operator. The algorithm terminates when no improvement can be found in either neighborhood. The relocate operator simply changes the position of a single node in the tour. The 2-opt operator selects two nodes in the tour to swap. This divides the tour into three segments: an initial segment, an intermediate segment, and a final segment. The tour is reassembled beginning with the initial segment, followed by the intermediate segment in reverse order, and then the final segment. It is a special case of the $k$-opt operator (Lin and Kernighan, 1973), although it was introduced earlier (Croes, 1958).
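The two operators and the best-improvement strategy can be sketched as follows. These are illustrative helpers of our own; a production implementation would evaluate incremental cost deltas rather than recomputing full tour lengths.

```python
import math

def two_opt(tour, i, j):
    """2-opt: reverse the intermediate segment tour[i..j], reassembling the tour
    as initial segment + reversed intermediate segment + final segment."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def relocate(tour, i, j):
    """Relocate: move the node at position i to position j."""
    node = tour[i]
    rest = tour[:i] + tour[i + 1:]
    return rest[:j] + [node] + rest[j:]

def best_improvement(tour, cost, moves):
    """Exhaustively search one neighborhood and return the single best
    improving solution, or None if the tour is locally optimal for it."""
    best, best_cost = None, cost(tour)
    for candidate in moves(tour):
        c = cost(candidate)
        if c < best_cost:
            best, best_cost = candidate, c
    return best

# A crossing tour on the unit square is uncrossed by a single 2-opt move.
cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
cost = lambda t: sum(math.dist(cities[t[k]], cities[t[(k + 1) % len(t)]]) for k in range(len(t)))
moves = lambda t: (two_opt(t, i, j) for i in range(1, len(t) - 1) for j in range(i + 1, len(t)))
print(best_improvement([0, 2, 1, 3], cost, moves))  # → [0, 1, 2, 3]
```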
Perturbation strategy We define the cost of an edge as the predicted regret of that edge. The utility of penalizing edge $e_i$, $\mathrm{util}(s_*, e_i)$, is therefore defined as

$\mathrm{util}(s_*, e_i) = I_i(s_*) \dfrac{\hat{r}_{e_i}}{1 + p_i}$    (9)

where we remind the reader that $s_*$ is the solution at a local optimum, $I_i(s_*)$ indicates if the solution contains edge $e_i$, $\hat{r}_{e_i}$ is the regret of edge $e_i$ predicted by our model, and $p_i$ is the number of penalties assigned to that edge. The edges of maximum utility are penalized. Afterward, the local search is applied only on the penalized edges. That is, only operations that would remove a penalized edge are considered.
5 Experiments
We train our model using supervised learning. It uses a single input feature, edge weight (i.e., the Euclidean distance between adjacent nodes), to predict the regret of the edges in the problem graph. Further details of our implementation are found in Appendix A. While further input features could be considered, they come at a computational cost, as seen in Table 3. In Appendix C we compare 11 input features and demonstrate that edge weight is the most important feature when predicting the regret of an edge.

Evaluation We evaluate the trade-off between solution quality and computation time when solving a set of TSP instances by fixing the computation time and measuring the solution quality, in terms of the median optimality gap and the number of problems solved optimally. Previous works leave both variables unfixed, meaning that approaches cannot be compared to one another.
We compare our approach to the following:


Local Search (Baseline)

Attention, Learn to Solve Routing Problems! (Kool et al., 2018)

An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem (Joshi et al., 2019)

Learning 2-opt Heuristics for the Traveling Salesman Problem via Deep Reinforcement Learning (da Costa et al., 2020)

Knowledge-Guided Local Search for the Vehicle Routing Problem (Arnold and Sörensen, 2019a)
Where available, we use publicly available implementations and pretrained models. We modified the implementations of Kool et al. (2018), Joshi et al. (2019), and da Costa et al. (2020) to stop when the computation time limit is reached, rather than after a fixed number of iterations. We also modified these implementations to report more precise tour lengths, as these are necessary to accurately report the number of problems solved optimally. Joshi et al. (2019) use beam search, which is not an anytime algorithm. Therefore, we modified their implementation to use an anytime sampling approach similar to Kool et al. (2018). There is no publicly available implementation of Arnold and Sörensen (2019a), so we use the GLS implementation from our approach with the guides described in their paper. We use the local search procedure from our approach as a baseline.
We evaluate three problem sets, TSP20, TSP50, and TSP100, consisting of one thousand 20-, 50-, and 100-node 2D Euclidean TSP instances, respectively. We generate the problem instances by sampling node locations uniformly at random in the unit square $[0, 1]^2$, in line with the methods used by Kool et al. (2018), Joshi et al. (2019), and da Costa et al. (2020). We allow up to 10 seconds to solve each problem, including the time required to calculate input features, evaluate a model, or construct an initial solution. We evaluate the various approaches one at a time, to ensure they do not compete for computational resources. While the evaluated approaches can all be parallelized, either on the CPU, on the GPU, or both, not all of the publicly available implementations are parallelized. We aim to evaluate the performance of the algorithms, not their implementations, therefore we allow each approach to solve the problem instances one at a time (i.e., with a batch size of one). Based on approximately two hundred thousand TSP solutions, we determined that a small absolute threshold on the value of the objective function is sufficient to separate optimal and suboptimal solutions.
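The instance generation and the reported metric can be reproduced with a short sketch (function names are ours; the optimality gap compares a found tour length against the optimal length).

```python
import random

def generate_instance(n, seed=None):
    """A random Euclidean TSP instance: n node locations sampled uniformly
    at random from the unit square [0, 1]^2."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]

def optimality_gap(found, optimal):
    """Relative gap (%) between a found tour length and the optimal tour length."""
    return 100.0 * (found / optimal - 1.0)

instance = generate_instance(100, seed=42)
print(len(instance), optimality_gap(4.2, 4.0))  # 100 nodes; a 5% gap
```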
Results The performance of each approach at the computation time limit is shown in Table 1. Figure 3 shows the performance as a function of time when solving the 100-node problem set. Similar plots depicting the performance as a function of time for all problem sets are found in Appendix B. All approaches achieve reasonably good results on the 20-node problem set; however, performance decreases as problem size increases. Our approach outperforms the others on all problem sets. Notably, our approach finds optimal solutions to 96% of the TSP50 problem set, 7% more than the next best benchmark (Arnold and Sörensen, 2019a), which finds 90%. When solving the TSP100 problem set, our approach finds optimal solutions to 20.5% of the problems, 4.5× more than the next best benchmark (Arnold and Sörensen, 2019a), and achieves an average optimality gap of 0.45%, 2.9× better than the next best benchmark (also Arnold and Sörensen, 2019a), which achieves an average optimality gap of 1.59%.
Table 1: Performance of each approach at the computation time limit.

Problem | Method | Optimality gap, median (%) | Optimality gap, IQR (%) | Optimal solutions (%)
TSP20 | Local Search (Baseline) | 0.0000 | 0.6951 | 42.6426
TSP20 | Kool et al. | 0.0000 | 0.0000 | 81.2813
TSP20 | Joshi et al. + Sampling | 0.0000 | 0.0000 | 98.1982
TSP20 | de O. da Costa et al. | 0.0000 | 0.0000 | 99.2993
TSP20 | Arnold and Sörensen | 0.0000 | 0.0000 | 100.0000
TSP20 | Ours | 0.0000 | 0.0000 | 100.0000
TSP50 | Local Search (Baseline) | 2.5284 | 2.9722 | 7.4074
TSP50 | Kool et al. | 0.4279 | 0.9597 | 18.9189
TSP50 | Joshi et al. + Sampling | 0.0000 | 0.1278 | 67.0671
TSP50 | de O. da Costa et al. | 0.0297 | 0.3502 | 46.6466
TSP50 | Arnold and Sörensen | 0.0000 | 0.0000 | 90.1902
TSP50 | Ours | 0.0000 | 0.0000 | 96.1962
TSP100 | Local Search (Baseline) | 5.1069 | 2.7396 | 0.0000
TSP100 | Kool et al. | 2.8101 | 1.7887 | 0.0000
TSP100 | Joshi et al. + Sampling | 13.2537 | 15.5596 | 1.1011
TSP100 | de O. da Costa et al. | 1.3148 | 1.4757 | 1.1011
TSP100 | Arnold and Sörensen | 1.5911 | 1.6178 | 4.5045
TSP100 | Ours | 0.4513 | 1.0585 | 20.5205
Table 2: Generalization performance at the computation time limit. Model denotes the problem set a model was trained on; Problem denotes the problem set being solved.

Model | Problem | Method | Optimality gap, median (%) | Optimality gap, IQR (%) | Optimal solutions (%)
TSP20 | TSP50 | Kool et al. | 2.5270 | 2.1080 | 0.5005
TSP20 | TSP50 | Joshi et al. + Sampling | 28.2036 | 5.7565 | 0.0000
TSP20 | TSP50 | de O. da Costa et al. | 3.2976 | 2.7561 | 0.5005
TSP20 | TSP50 | Ours | 0.0000 | 0.0618 | 73.1732
TSP20 | TSP100 | Kool et al. | 26.5113 | 4.6053 | 0.0000
TSP20 | TSP100 | Joshi et al. + Sampling | 89.1632 | 7.4764 | 0.0000
TSP20 | TSP100 | de O. da Costa et al. | 32.9608 | 12.1003 | 0.0000
TSP20 | TSP100 | Ours | 2.5129 | 2.1176 | 1.2012
TSP50 | TSP100 | Kool et al. | 4.0025 | 1.6081 | 0.0000
TSP50 | TSP100 | Joshi et al. + Sampling | 56.7009 | 6.6427 | 0.0000
TSP50 | TSP100 | de O. da Costa et al. | 2.2806 | 1.8278 | 0.0000
TSP50 | TSP100 | Ours | 1.1577 | 1.4869 | 7.6076
We evaluate the performance of our approach and the other learning-based approaches when generalizing from smaller problem instances to larger ones. The performance of each approach at the computation time limit is reported in Table 2. All approaches perform worse when using models trained on smaller problems. Our approach performs the best, notably finding optimal solutions to 73% of the problems in the TSP50 problem set when solving them using a model trained on 20-node problems, while the next best approaches find only 0.5% (Kool et al., 2018; da Costa et al., 2020). Our model trained on 20-node problems achieves an average optimality gap of 2.5% when solving the TSP100 problem set, a 10× improvement over the next best benchmark (Kool et al., 2018), which achieves an average optimality gap of 26.5%.
6 Discussion
Our experiments demonstrate that our approach converges to optimal solutions at a faster rate than three recent learningbased approaches (Kool et al., 2018; Joshi et al., 2019; da Costa et al., 2020) and one recent nonlearning GLS algorithm (Arnold and Sörensen, 2019a) when solving the Euclidean TSP. The results presented in Table 1 and Table 2 align with the results presented by Joshi et al. (2019) for their own method and the method of Kool et al. (2018), despite the difference in evaluation criteria. Using our evaluation criterion we achieve slightly worse results in generalization than those reported by Kool et al. (2018) in their paper, which may be attributed to evaluating the model for a fixed computation time, which we do here, rather than a fixed number of iterations: models evaluated on larger problems are slower to run, thus find fewer solutions in a fixed computation time.
Our approach generalizes to larger problems better than the other learning-based approaches, which may be a result of the unique input transformation we apply to the problem graph. We input the line graph $L(G)$ of the problem graph $G$, which allows the model to aggregate messages and store states on edges rather than nodes. This allows us to build models without node features, which is advantageous as the edge weights, not the specific node positions, are relevant when solving the TSP. Including the node positions as features, as done by Kool et al. (2018), Joshi et al. (2019), and da Costa et al. (2020), may hinder the learned policy’s ability to generalize. Deudon et al. (2018) apply PCA to the node positions before inputting them into their model so that they are rotation invariant, yet they do not report generalization performance. For a Euclidean TSP problem graph $G$ with $n$ nodes, $L(G)$ has $n(n-1)/2$ nodes, meaning that although $G$ is complete, $L(G)$ can be very sparse, which may help learning (Cappart et al., 2021). However, the model consumes more GPU memory than models using the problem graph $G$.
Many approaches that learn construction heuristics (for example Kool et al., 2018) treat the tour as a permutation of the input nodes. By considering the solution as a sequence, these approaches miss out on its symmetry: a TSP solution is invariant to the origin city. Outputting the correct sequence, even when augmented by beam search, becomes increasingly difficult as the number of cities increases. Our model learns the regret of including a given edge in the solution based on its features and those of its neighbors. This learning task implicitly assumes the symmetry of a TSP solution and leverages the strengths of GNNs.
Our approach is not specific to the TSP; however, one disadvantage is that it relies on an explicitly designed local search algorithm. To apply our framework to another problem, the local search procedure must be modified with operators that are appropriate for that problem. For example, a local search procedure for the CVRP requires intra-route operators (such as 2-opt and relocate) and inter-route operators. In future work, our regret approximation model could be combined with a general learned local search algorithm and trained end-to-end.
7 Conclusion
We present a hybrid data-driven approach for solving the TSP based on Graph Neural Networks (GNNs) and Guided Local Search (GLS). To inform GLS, we use a GNN-based model to compute fast, generalizable approximations of the global regret of each edge. Our model operates on the line graph of the TSP graph, which allows us to build a network that uses edge weight alone as an input feature. This approach allows our algorithm to escape local minima and find high-quality solutions with minimal computation time. Finally, we present an improved evaluation criterion for algorithms solving combinatorial optimization problems that emphasizes the trade-off between solution quality and computation time. Using this criterion, we demonstrate that our approach converges to optimal solutions at a faster rate than three learning-based approaches and one non-learning GLS algorithm for the TSP.
Acknowledgements
We gratefully acknowledge the support of the European Research Council (ERC) Project 949940 (gAIa).
References
 Implementing the DantzigFulkersonJohnson algorithm for large traveling salesman problems. Mathematical Programming 97 (1), pp. 91–153. Cited by: §2.
 The traveling salesman problem: a computational study. Princeton University Press, Princeton, NJ. Cited by: §2.
 Knowledgeguided local search for the vehicle routing problem. Computers & Operations Research 105, pp. 32–46. Cited by: §2, §3, §3, 5th item, §5, §5, §6.
 What makes a VRP solution good? The generation of problemspecific knowledge for heuristics. Computers & Operations Research 106, pp. 280–288. Cited by: Table 3.

Neural combinatorial optimization with reinforcement learning
. arXiv preprint arXiv:1611.09940. Cited by: §2.  Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research 290 (2), pp. 405–421. Cited by: §2.
 Combinatorial optimization and reasoning with graph neural networks. arXiv preprint arXiv:2102.09544. Cited by: §2, §6.
 A method for solving traveling-salesman problems. Operations Research 6 (6), pp. 791–812. Cited by: §4.3.
 Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. arXiv preprint arXiv:2004.01608. Cited by: §2, 4th item, §5, §5, §5, §6, §6.
 Learning combinatorial optimization algorithms over graphs. In International Conference on Neural Information Processing Systems, pp. 6351–6361. Cited by: §2.
 Solution of a large-scale traveling-salesman problem. Journal of the Operations Research Society of America 2 (4), pp. 393–410. Cited by: §2.

 Learning heuristics for the TSP by policy gradient. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research, pp. 170–181. Cited by: §2, §6.
 Neural message passing for quantum chemistry. In International Conference on Machine Learning, pp. 1263–1272. Cited by: §4.2.
 Metamodel-based sensitivity analysis: polynomial chaos expansions and Gaussian processes. In Handbook of Uncertainty Quantification, pp. 1–37. Cited by: Appendix C.
 Greedy heuristics with regret, with application to the cheapest insertion algorithm for the TSP. Operations Research Letters 36 (2), pp. 243–246. Cited by: §3.
 General k-opt submoves for the Lin–Kernighan TSP heuristic. Mathematical Programming Computation 1 (2), pp. 119–163. Cited by: §2.
 An extension of the Lin-Kernighan-Helsgaun TSP solver for constrained traveling salesman and vehicle routing problems. Technical report, Roskilde University. Cited by: §2.
 Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456. Cited by: §4.2.
 An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227. Cited by: §2, §4.1, 3rd item, §5, §5, §6, §6.
 Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: Appendix A.
 Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Cited by: §2.
 Attention, learn to solve routing problems!. arXiv preprint arXiv:1803.08475. Cited by: §2, §2, §4.2, 2nd item, §5, §5, §5, §6, §6, §6.
 An effective heuristic algorithm for the traveling-salesman problem. Operations Research 21 (2), pp. 498–516. Cited by: §2, §4.3.
 A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21 (2), pp. 239–245. Cited by: Appendix C.
 Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. Cited by: §2.
 A branch-and-cut algorithm for the resolution of large-scale symmetric traveling salesman problems. SIAM Review 33 (1), pp. 60–100. Cited by: §2.
 Emulation of physical processes with emukit. In Second Workshop on Machine Learning and the Physical Sciences, Cited by: Appendix C.
 PyTorch: an imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems, pp. 8026–8037. Cited by: Appendix A.
 A parallel route building algorithm for the vehicle routing and scheduling problem with time windows. European Journal of Operational Research 66 (3), pp. 331–340. Cited by: §3.
 Shortest connection networks and some generalizations. The Bell System Technical Journal 36 (6), pp. 1389–1401. Cited by: Table 3.
 Speech understanding systems: summary of results of the five-year research effort at Carnegie-Mellon University. Technical report, Carnegie-Mellon University. Cited by: §2.
 Making best use of model evaluations to compute sensitivity indices. Computer Physics Communications 145 (2), pp. 280–297. Cited by: Appendix C.
 Neural networks for combinatorial optimization: a review of more than a decade of research. INFORMS Journal on Computing 11 (1), pp. 15–34. Cited by: §2, §2.
 Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §2.
 Graph attention networks. arXiv preprint arXiv:1710.10903. Cited by: §2, §4.2.
 Pointer networks. arXiv preprint arXiv:1506.03134. Cited by: §2.
 Partial constraint satisfaction problems and guided local search. Technical report, Department of Computer Science, University of Essex. Cited by: §3, §3.
 Guided local search and its application to the traveling salesman problem. European Journal of Operational Research 113 (2), pp. 469–499. Cited by: §3, §3.
 Deep graph library: a graph-centric, highly performant package for graph neural networks. arXiv preprint arXiv:1909.01315. Cited by: Appendix A.
 Learning improvement heuristics for solving routing problems. arXiv preprint arXiv:1912.05784. Cited by: §2.
Appendix A Implementation details
We train our model on sets of one hundred thousand 20-, 50-, and 100-node 2D Euclidean TSP instances. We generate all instances by sampling node locations uniformly at random in the unit square. We calculate the target output, the regret of including each edge in the solution, according to Eq. 4. We use Concorde to find an optimal solution and LKH-3 with 100 trials and 10 runs to find an optimal solution with a given edge fixed. We empirically validate that this configuration of LKH-3 is sufficient to find optimal solutions for problems with 100 nodes or fewer. We scale the input features and target outputs to a fixed range. We train our model by minimizing the loss to the target output using the Adam optimizer (Kingma and Ba, 2014) with a learning rate that decays exponentially with the epoch. We train the model for 100 epochs with early stopping. Due to limited GPU memory, we use a smaller training batch size for the 100-node training set than for the 20- and 50-node training sets. Our model uses message passing layers and our GLS algorithm uses perturbation moves. Our approach is implemented using Python 3.8.11, PyTorch 1.9.0 (Paszke et al., 2019), DGL 0.7.1 (Wang et al., 2019), and CUDA 10.2, and is open-source.
We conduct all computational experiments on virtual machines with three cores of an Intel Xeon E5-2650 v4 processor, 24 GB of RAM, and a Tesla P100 GPU with 16 GB of VRAM.
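For intuition, the global regret target can be computed exactly on tiny instances by brute force: the relative extra cost of the best tour forced to contain a given edge. The sketch below is illustrative only (the paper uses Concorde and LKH-3 for this; the function names are our own):

```python
import math
from itertools import permutations

def best_tour(pts, required_edge=None):
    """Optimal closed-tour length by brute force; optionally force one edge."""
    n = len(pts)
    best = math.inf
    for perm in permutations(range(1, n)):  # fix node 0 as the start
        tour = (0,) + perm
        edges = {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}
        if required_edge is not None and frozenset(required_edge) not in edges:
            continue
        length = sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % n]])
                     for i in range(n))
        best = min(best, length)
    return best

def edge_regret(pts, edge):
    """Cost of enforcing `edge`, relative to the cost of an optimal solution."""
    opt = best_tour(pts)
    return (best_tour(pts, required_edge=edge) - opt) / opt
```

Edges in an optimal tour have zero regret; the regret of other edges grows with how costly they are to include.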
Appendix B Performance as a function of time
We evaluate the solution quality, in terms of median optimality gap and number of problems solved optimally, as a function of computation time rather than at the computation time limit. Figure 4 depicts this for our approach and several recent approaches, for models trained and evaluated on 20-, 50-, and 100-node problem instances. Computation time required to compute input features, evaluate a model, or construct an initial solution is visible as a gap between the trace and the vertical axis, and is especially noticeable in Figure 3(c). Note that in Figure 3(a), the traces for the median optimality gap are not visible for some methods as these methods find many optimal solutions almost instantly. This is reflected in the plot on the right, depicting the number of optimal solutions found as a function of computation time. Figure 5 depicts the performance of our approach and the learning-based approaches when generalizing from smaller problems to larger ones.
Appendix C Analysis of additional input features
We evaluate the effect of adding additional features to our regret approximation model. While more features can help produce better predictions, they come at the cost of additional computation time. We consider a total of ten additional features, described in Table 3. We use a Gaussian process-based surrogate model to conduct global sensitivity analysis (Gratiet et al., 2016) of the input features on the validation loss of the model after training. We assume that better regret predictions will equate to better guidance by the GLS algorithm, ultimately resulting in better performance on the problem sets.
Name  Description
Node width  Perpendicular distance from a node to the line from the depot node through the centroid of the nodes (Arnold and Sörensen, 2019b).
Edge width  Absolute difference between the widths of an edge's two nodes.
Depot weight  Weight of the edge from a node to the depot node.
Neighbor rank  Rank of one endpoint of an edge among the nearest neighbors of the other. There are two such features for a given edge, one for each endpoint.
30% NN graph  Whether the edge is part of the nearest neighbor graph, with the neighbor fraction set to 0.3.
20% NN graph  Whether the edge is part of the nearest neighbor graph, with the neighbor fraction set to 0.2.
10% NN graph  Whether the edge is part of the nearest neighbor graph, with the neighbor fraction set to 0.1.
MST  Whether the edge is part of the minimum spanning tree, calculated using Prim’s algorithm (Prim, 1957).
NN solution  Whether the edge is part of a solution constructed by the nearest neighbor heuristic.
NI solution  Whether the edge is part of a solution constructed by the nearest insertion heuristic.
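For concreteness, a few of the geometric features above can be computed directly from the node coordinates. In the sketch below, the function names and the choice of k as a fraction of n − 1 with symmetric edge membership are our own assumptions:

```python
import math

def node_width(pts, depot=0):
    """Perpendicular distance of each node to the line depot -> centroid."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    dx, dy = cx - pts[depot][0], cy - pts[depot][1]  # line direction
    norm = math.hypot(dx, dy)
    widths = []
    for p in pts:
        vx, vy = p[0] - pts[depot][0], p[1] - pts[depot][1]
        widths.append(abs(vx * dy - vy * dx) / norm)  # |cross product| / |dir|
    return widths

def in_knn_graph(pts, i, j, frac=0.3):
    """Whether edge (i, j) is in the k-NN graph, k = frac * (n - 1)."""
    k = max(1, round(frac * (len(pts) - 1)))
    def knn(a):
        others = sorted((b for b in range(len(pts)) if b != a),
                        key=lambda b: math.dist(pts[a], pts[b]))
        return set(others[:k])
    return j in knn(i) or i in knn(j)
```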
We semi-randomly sample 100 input feature sets using Latin hypercube sampling (McKay et al., 1979). We train a model using each of these feature sets for 35 epochs without early stopping and record the final validation loss. Using these results, we fit a Gaussian process to emulate the mapping between a binary feature set, indicating which features are used, and the validation loss achieved by our model after training. We then compute Monte Carlo estimates of the main and total effects of each input feature on the model's validation loss (Saltelli, 2002). Our implementation uses Emukit (Paleyes et al., 2019). Figure 6 depicts the estimated total effect for each input feature. Edge weight is the most important feature, followed by neighbor rank, depot weight, edge width, and node width. The remaining features have little to no importance. We train a model using these top five features on 20-node problems and compare its performance to the equivalent model using edge weight as the only feature when solving the 20-, 50-, and 100-node problem sets. The performance of both models at the computation time limit is shown in Table 4. While the model using additional input features produces slightly more accurate regret predictions, any benefit is cancelled out by the additional time required to compute these features. Note that the results differ slightly from those reported in Table 2 due to different training hyperparameters.
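Latin hypercube sampling stratifies each dimension so that every equal-width band is sampled exactly once; rounding the samples then yields binary on/off feature sets. A minimal sketch (the exact stratification and rounding used here may differ):

```python
import random

def latin_hypercube(n_samples, dim, rng=None):
    """n_samples points in [0, 1]^dim, one per stratum along each axis."""
    rng = rng or random.Random(0)
    cols = []
    for _ in range(dim):
        # One draw from each of the n equal-width strata, in shuffled order.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        cols.append(col)
    return [tuple(col[s] for col in cols) for s in range(n_samples)]

def binary_feature_sets(n_samples, n_features):
    """Round LHS samples to 0/1 indicators for feature on/off."""
    return [tuple(int(x >= 0.5) for x in p)
            for p in latin_hypercube(n_samples, n_features)]
```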
Model  Problem  Method  Median gap (%)  IQR gap (%)  Optimal solutions (%)
TSP20  TSP20  Edge weight only  0.0000  0.0000  100.0000
TSP20  TSP20  Top 5 features  0.0000  0.0000  99.8999
TSP20  TSP50  Edge weight only  0.0000  0.0000  82.0821
TSP20  TSP50  Top 5 features  0.0000  0.0000  83.1832
TSP20  TSP100  Edge weight only  1.9454  1.9661  2.2022
TSP20  TSP100  Top 5 features  2.0625  2.0009  2.2022