NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem

We present NeuroLKH, a novel algorithm that combines deep learning with the strong traditional Lin-Kernighan-Helsgaun (LKH) heuristic for solving the Traveling Salesman Problem. Specifically, we train a Sparse Graph Network (SGN) with supervised learning for edge scores and unsupervised learning for node penalties, both of which are critical for improving the performance of LKH. Based on the output of SGN, NeuroLKH creates the edge candidate set and transforms the edge distances to guide the searching process of LKH. Extensive experiments firmly demonstrate that, by training one model on a wide range of problem sizes, NeuroLKH significantly outperforms LKH and generalizes well to much larger sizes. Also, we show that NeuroLKH can be applied to other routing problems such as the Capacitated Vehicle Routing Problem (CVRP), the Pickup and Delivery Problem (PDP), and CVRP with Time Windows (CVRPTW).


1 Introduction

The Traveling Salesman Problem (TSP) is an important NP-hard combinatorial optimization problem with extensive industrial applications in various domains. Exact methods have exponential worst-case computational complexity, which renders them impractical for solving large-scale problems in practice, even for highly optimized solvers such as Concorde. In contrast, although lacking optimality guarantees and non-trivial theoretical analysis, heuristic solvers search for near-optimal solutions with much lower complexity. They are usually preferable for real-life applications where statistically better performance is the goal.

Traditional heuristic methods are manually designed based on expert knowledge, which is usually human-interpretable. However, supported by the recent development of deep learning technology, modern methods train powerful deep neural networks to learn complex patterns from TSP instances generated from specific distributions Vinyals et al. (2015); Bello et al. (2016); Dai et al. (2017); Kool et al. (2019); Joshi et al. (2019); Xin et al. (2020); Wu et al. (2021); Xin et al. (2021). These works have steadily improved the performance of deep learning models for solving TSP, but the results are still far worse than those of strong traditional heuristic solvers and are generally limited to relatively small problem sizes.

We believe that learning-based methods should be combined with strong traditional heuristic algorithms, as also suggested by Bengio et al. (2020). In this way, the method can learn complex patterns from data samples while still exploiting efficient heuristics that researchers have highly optimized over decades, especially for problems such as TSP that are well studied due to their importance.

The Lin-Kernighan-Helsgaun (LKH) algorithm Helsgaun (2000, 2009) is generally considered a very strong heuristic for solving TSP and is developed based on the Lin-Kernighan (LK) heuristic Lin and Kernighan (1973). LKH iteratively searches for $\lambda$-opt moves to improve the existing solution, where $\lambda$ edges of the tour are exchanged for another $\lambda$ edges to form a shorter tour. To save searching time, the edges to add are limited to a small edge candidate set, which is created before the search. One of the most significant contributions of LKH is to generate the edge candidate set based on the Minimum Spanning Tree, rather than using the nearest-neighbor method of the LK heuristic. Furthermore, LKH applies penalty values to the nodes, which are iteratively optimized using subgradient optimization (detailed in Section 3). The optimized node penalties are used by LKH to transform the edge distances for the $\lambda$-opt searching process and to improve the quality of the edge candidate set, both of which help find better solutions.

However, the edge candidate set generation in LKH is still guided by hand-crafted rules, which could limit the quality of edge candidates and hence the search performance. Moreover, the iterative optimization of node penalties is time-consuming, especially for large-scale problems. To address these limitations, we propose NeuroLKH, a novel learning-based method featuring a Sparse Graph Network (SGN) combined with the highly efficient $\lambda$-opt local search of LKH. SGN outputs the edge scores and node penalties simultaneously, which are trained by supervised learning and unsupervised learning, respectively. NeuroLKH transforms the edge distances based on the node penalties learned inductively from training instances, instead of performing iterative optimization for each instance, therefore saving a significant amount of time. More importantly, at the same time the edge scores are used to create the edge candidate set, leading to substantially better sets than those created by LKH. NeuroLKH trains one single network on TSP instances across a wide range of sizes and generalizes well to substantially larger problems with minutes of unsupervised offline fine-tuning to adjust the node penalty scales for different sizes.

Like existing works on deep learning models for solving TSP, NeuroLKH aims to learn complex patterns from data samples to find better solutions for instances following specific distributions. Following the evaluation process in these works, we perform extensive experiments. Results show that NeuroLKH improves the baseline algorithms by large margins, not only across the wide range of training problem sizes, but also on much larger problem sizes not used in training. Furthermore, NeuroLKH trained with instances of relatively simple distributions generalizes well to traditional benchmarks with various node distributions such as TSPLIB Reinelt (1991). Also, we show that NeuroLKH can be applied to guide the extension of LKH Helsgaun (2017) for more complicated routing problems such as the Capacitated Vehicle Routing Problem (CVRP), the Pickup and Delivery Problem (PDP) and CVRP with Time Windows (CVRPTW), using generated test datasets and traditional benchmarks Solomon (1987); Uchoa et al. (2017).

2 Related works

Till now, for routing problems such as TSP, most works focus on learning construction heuristics, where deep neural networks are trained to sequentially select the nodes to visit with supervised learning Vinyals et al. (2015); Hottung et al. (2021) or reinforcement learning Bello et al. (2016); Dai et al. (2017); Nazari et al. (2018); Kool et al. (2019); Kwon et al. (2020). Similarly, networks are trained to pick edges in Joshi et al. (2019); Kool et al. (2021). In another line of works Chen and Tian (2019); Wu et al. (2021); Hottung and Tierney (2020); Hao Lu (2020); da Costa et al. (2020), researchers employ deep learning models to learn the actions for improving existing solutions, such as picking regions and rules or selecting nodes for the 2-opt heuristic. However, the performance of these works is still quite far from the strong non-learning heuristics such as LKH. In addition, they focus only on relatively small-sized problems (up to hundreds of nodes).

A recent work Fu et al. (2021) generalizes a network pre-trained on fixed-size small graphs to larger problems by sampling small sub-graphs for inference and merging the results. This interesting idea can be applied to very large graphs; however, the performance is still inferior to LKH and deteriorates rapidly as the problem size increases.

In a concurrent work Zheng et al. (2021), a VSR-LKH method is proposed which also applies a learning method in combination with LKH. However, very different from our method, VSR-LKH applies traditional reinforcement learning during the searching process for each instance, instead of learning patterns for a class of instances. Moreover, VSR-LKH aims to guide the decision on edge selections within the edge candidate set, which is generated using the original procedure of LKH. NeuroLKH significantly outperforms VSR-LKH by large margins in all the settings of our experiments on testing instances following the training distributions, especially when the time limits are short. Even more impressively, NeuroLKH achieves performance similar to VSR-LKH on the traditional benchmark TSPLIB Reinelt (1991), whose various node distributions are very different from the training distributions for NeuroLKH.

3 Preliminaries: LKH algorithm

The Lin-Kernighan-Helsgaun (LKH) algorithm Helsgaun (2000, 2009) is a local optimization algorithm developed based on the $\lambda$-opt move Lin (1965), where $\lambda$ edges in the current tour are exchanged for another set of $\lambda$ edges to achieve a shorter tour. While solving one instance, the LKH algorithm can conduct multiple trials to find better solutions. In each trial, starting from a randomly initialized tour, it iteratively searches for $\lambda$-opt exchanges that improve the tour, until no such exchange can be found. In each iteration, the $\lambda$-opt exchanges are searched in ascending order of the variable $\lambda$, and the tour is replaced once an exchange is found that reduces the tour distance.

One central rule is that the $\lambda$-opt searching process is restricted and directed by an edge candidate set, which is created before the search based on the $\alpha$-measure derived from a sensitivity analysis of the Minimum Spanning Tree. Here we briefly introduce the related concepts. A TSP instance can be viewed as an undirected graph $G=(V,E)$ with $V$ as the set of nodes and $E$ as the set of edges weighted by distances. A spanning tree of $G$ is a connected subgraph with $|V|-1$ edges from $E$ and no cycles, in which any pair of nodes is connected by a path. A 1-tree of $G$ is a spanning tree over the node set $V \setminus \{1\}$ combined with two edges in $E$ connected to node 1, an arbitrary special node in $V$. A minimum 1-tree is the 1-tree with minimum length. The $\alpha$-measure of an edge $(i,j)$ for graph $G$ is defined as $\alpha(i,j)=L^{+}(i,j)-L$, where $L$ is the length of the Minimum 1-Tree and $L^{+}(i,j)$ is the length of the Minimum 1-Tree required to include the edge $(i,j)$. The $\alpha$-measure of an edge can thus be viewed as the extra length of the Minimum 1-Tree needed to include this edge.

The edge candidate set consists of the $k$ edges with the smallest $\alpha$-measures connected to each node ($k=5$ as default). During the $\lambda$-opt searching process, the edges to be included into the new tour are limited to the edges in this candidate set, and edges with smaller $\alpha$-measures have higher priority to be searched over. Therefore this candidate set not only restricts but also directs the search.

Moreover, the quality of the $\alpha$-measures can be improved significantly by a subgradient optimization method. If we add a penalty $\pi_i$ to each node $i$ and transform the original distance $d_{ij}$ of the edge $(i,j)$ to a new distance $d'_{ij} = d_{ij} + \pi_i + \pi_j$, the optimal tour for the TSP will stay the same but the Minimum 1-Tree usually will change. By definition, a Minimum 1-Tree with node degrees all equal to 2 is an optimal solution for the corresponding TSP instance. With the length of the Minimum 1-Tree resulting from the penalties $\pi$ denoted as $L(T_{\pi})$, the quantity $w(\pi) = L(T_{\pi}) - 2\sum_{i}\pi_i$ is a lower bound of the optimal tour distance for the original TSP instance. LKH applies subgradient optimization Held and Karp (1971) to iteratively maximize this lower bound for multiple steps until convergence by applying $\pi^{t+1} = \pi^{t} + s_t (d^{t} - 2)$ at step $t$, where $s_t$ is the scalar step size and $d^{t}$ is the vector of node degrees in the Minimum 1-Tree with penalties $\pi^{t}$. Therefore, the node degrees are pushed towards 2. The $\alpha$-measures after this optimization substantially improve the quality of the edge candidate set. Furthermore, the transformed edge distances after this optimization help find better solutions when used during the searching process for $\lambda$-opt exchanges.
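For concreteness, the following is a minimal NumPy sketch of the subgradient procedure described above on a dense distance matrix; the 1-tree construction via Prim's algorithm and the decaying step-size schedule are illustrative simplifications rather than the exact LKH implementation.

import numpy as np

def minimum_1_tree_degrees(dist):
    """Node degrees of a minimum 1-tree: an MST over nodes 1..n-1 (Prim's
    algorithm) plus the two cheapest edges incident to the special node 0."""
    n = dist.shape[0]
    deg = np.zeros(n, dtype=int)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True                        # node 0 is excluded from the spanning-tree part
    in_tree[1] = True                        # grow the spanning tree from node 1
    best_cost = dist[1].copy()               # cheapest connection of each node to the tree
    best_from = np.full(n, 1)
    for _ in range(n - 2):                   # add the remaining n-2 nodes
        j = int(np.argmin(np.where(in_tree, np.inf, best_cost)))
        deg[j] += 1
        deg[best_from[j]] += 1
        in_tree[j] = True
        closer = dist[j] < best_cost
        best_cost = np.where(closer, dist[j], best_cost)
        best_from = np.where(closer, j, best_from)
    nearest_two = np.argsort(dist[0, 1:])[:2] + 1   # two cheapest edges at the special node
    deg[0] += 2
    deg[nearest_two] += 1
    return deg

def subgradient_penalties(dist, steps=100, step_size=1.0):
    """Push the 1-tree node degrees towards 2 by iteratively adjusting the
    penalties pi, thereby maximizing the Held-Karp lower bound w(pi)."""
    pi = np.zeros(dist.shape[0])
    for _ in range(steps):
        transformed = dist + pi[:, None] + pi[None, :]   # d'_ij = d_ij + pi_i + pi_j
        subgrad = minimum_1_tree_degrees(transformed) - 2
        if not subgrad.any():                            # all degrees equal 2: already a tour
            break
        pi = pi + step_size * subgrad
        step_size *= 0.95                                # simple decaying schedule
    return pi

# Usage on a random uniform instance:
# coords = np.random.rand(100, 2)
# dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
# pi = subgradient_penalties(dist)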

Figure 1: NeuroLKH algorithm and the original LKH algorithm.

4 The proposed NeuroLKH algorithm

The subgradient optimization in LKH can substantially improve the quality of the edge candidate set based on the $\alpha$-measures, and it transforms the edge distances effectively to achieve reasonably good performance. However, it still has major limitations: the optimization is performed iteratively on each instance until convergence, which costs a large amount of time, especially for large-scale problems. Moreover, even after subgradient optimization, some critical patterns could still be missed by the relatively straightforward sensitivity analysis of the spanning tree. Therefore, the quality of the edge candidate set could be further improved by large margins, which would in turn improve the overall performance.

We propose the NeuroLKH algorithm, which employs a Sparse Graph Network to learn the complex patterns associated with the TSP instances generated from a distribution. Concretely, the network will learn the edge scores and node penalties simultaneously with a multi-task training process. The edge scores are trained with supervised learning for creating the edge candidate set, while the node penalties are trained with unsupervised learning for transforming the edge distances. The architecture of NeuroLKH is presented in Figure 1, along with the original LKH algorithm. We will detail the Sparse Graph Network, the training process and the proposed NeuroLKH algorithm in the following.

4.1 Sparse Graph Network

For the Sparse Graph Network (SGN), we format the TSP instance as a sparse directed graph $G_s = (V, E_s)$ containing the node set $V$ and a sparse edge set $E_s$, which only includes the $\gamma$ shortest edges directed from each node, as shown in the leftmost green box in Figure 1, where the circles represent the nodes and the diamonds represent the directed edges. Sparsification of the graph is crucial for effectively training the deep learning model on large TSP instances and for generalizing to even larger sizes. Note that an edge $(i,j)$ belonging to $E_s$ does not necessarily mean that the opposite-direction edge $(j,i)$ belongs to $E_s$. The node inputs are the node coordinates and the edge inputs are the edge distances. Although we focus on 2-dimensional TSP with Euclidean distance, as in other deep learning literature such as Kool et al. (2019), the model can be applied to other kinds of TSP.
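As an illustration, the sparse directed input graph can be built as in the sketch below; the default value of $\gamma$ and the flat array layout are assumptions of this sketch, not necessarily those of the released implementation.

import numpy as np

def build_sparse_graph(coords, gamma=20):
    """For each node, keep only the gamma shortest outgoing edges (excluding the node itself).
    Returns the directed edge endpoints and edge distances, flattened to shape (n * gamma,)."""
    n = coords.shape[0]
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                     # never point an edge to the node itself
    neighbors = np.argsort(dist, axis=1)[:, :gamma]    # gamma nearest destinations per node
    src = np.repeat(np.arange(n), gamma)               # edge (src -> dst) is directed,
    dst = neighbors.reshape(-1)                        # so (i, j) in E_s does not imply (j, i) in E_s
    edge_dist = dist[src, dst]
    return src, dst, edge_dist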

The SGN consists of 1) one encoder embedding the edge and node inputs into the corresponding feature vectors, and 2) two decoders for the edge scores and node penalties, respectively.

Encoder. The encoder first linearly projects the node inputs and the edge inputs into feature vectors of dimension $D$. Then the node and edge features are embedded with a stack of Sparse Graph Convolutional Layers, which are defined formally as follows:

(1)
(2)
(3)
(4)

where $\odot$ and $\oslash$ represent the element-wise multiplication and the element-wise division, respectively; $l$ is the layer index; the projection matrices and biases are trainable parameters; Eqs. (2) and (4) each consist of a skip-connection layer He et al. (2016) and a batch normalization layer Ioffe and Szegedy (2015); and the idea of element-wise attention in Eq. (1) is adopted from Bresson and Laurent (2017). As the input graph is directed and sparse, edges with different directions are embedded separately. However, the embedding of an edge $(i,j)$ should benefit from knowing whether its opposite-direction counterpart $(j,i)$ is also in the graph and, if so, from its embedding, which motivates our design of Eqs. (3) and (4).
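Since Eqs. (1)-(4) are specific to SGN, the following PyTorch sketch only illustrates the general shape of such a layer: a gated, residual, batch-normalized update over a sparse directed edge list in the spirit of Bresson and Laurent (2017). The exact SGN formulation, including its treatment of opposite-direction edges, differs, and all module and variable names here are ours.

import torch
import torch.nn as nn

class SparseGraphConvLayer(nn.Module):
    """Illustrative residual gated layer over a sparse directed edge list
    (src[k] -> dst[k]); not the exact SGN equations."""
    def __init__(self, d):
        super().__init__()
        self.w_node = nn.Linear(d, d)
        self.w_msg = nn.Linear(d, d)
        self.w_edge_self = nn.Linear(d, d)
        self.w_edge_src = nn.Linear(d, d)
        self.w_edge_dst = nn.Linear(d, d)
        self.bn_node = nn.BatchNorm1d(d)
        self.bn_edge = nn.BatchNorm1d(d)

    def forward(self, v, e, src, dst):
        # Edge features act as element-wise attention gates over incoming messages.
        gate = torch.sigmoid(e)                                    # (num_edges, d)
        msg = gate * self.w_msg(v[src])                            # message along src -> dst
        agg = torch.zeros_like(v).index_add_(0, dst, msg)          # sum of gated messages at dst
        norm = torch.zeros_like(v).index_add_(0, dst, gate) + 1e-6
        v = v + torch.relu(self.bn_node(self.w_node(v) + agg / norm))   # skip-connection + BN
        e = e + torch.relu(self.bn_edge(
            self.w_edge_self(e) + self.w_edge_src(v[src]) + self.w_edge_dst(v[dst])))
        return v, e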

Decoders. The edge decoder takes the edge embeddings from the encoder and embeds them with two layers of linear projection, each followed by a ReLU activation. Then the edge scores are calculated as follows:

(5)

Similarly, the node decoder first embeds the node embeddings with two layers of linear projection and ReLU activation. Then the node penalties are calculated as follows:

(6)

where the projection weights are trainable parameters; the output activation is used to keep the node penalties in the bounded range $(-C, C)$ (the choice of $C$ is discussed in Appendix Section A).
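A hedged sketch of the two decoder heads follows: the edge head normalizes scores over each node's $\gamma$ outgoing edges (our reading of Eq. (5)), and the node head bounds the penalties with a scaled tanh (our reading of Eq. (6)); the layer sizes, $\gamma$, and the scale constant are assumptions.

import torch
import torch.nn as nn

class SGNDecoders(nn.Module):
    def __init__(self, d, gamma=20, penalty_scale=10.0):
        super().__init__()
        # Two linear+ReLU layers followed by a final scalar projection, per head.
        self.edge_head = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                       nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))
        self.node_head = nn.Sequential(nn.Linear(d, d), nn.ReLU(),
                                       nn.Linear(d, d), nn.ReLU(), nn.Linear(d, 1))
        self.gamma, self.penalty_scale = gamma, penalty_scale

    def forward(self, v, e):
        # Edge scores: normalize over the gamma outgoing edges of each node.
        logits = self.edge_head(e).view(-1, self.gamma)            # (n, gamma)
        edge_scores = torch.softmax(logits, dim=1)
        # Node penalties: bounded to (-penalty_scale, penalty_scale) by a scaled tanh.
        penalties = self.penalty_scale * torch.tanh(self.node_head(v)).squeeze(-1)
        return edge_scores, penalties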

4.2 Training process

We train the network to learn the edge scores with supervised learning. The edge loss is defined as follows:

(7)

where the binary label of each edge indicates whether it belongs to the optimal tour. Effectively, we increase the scores of the edges that belong to the optimal tour and decrease the scores of the others.

The node penalties are trained by unsupervised learning. Similar to the goal of subgradient optimization in LKH, we aim to transform the Minimum 1-Tree generated from the TSP graph so that it becomes closer to a tour in which all nodes have a degree of 2. An important distinction from LKH is that we are learning the patterns for a class of TSP instances following a distribution, instead of optimizing the penalties for a specific TSP instance. The node loss is defined as follows:

(8)

where $d_i$ is the degree of node $i$ in the Minimum 1-Tree induced by the penalties $\pi$. The penalties are increased for nodes with degrees larger than 2 and decreased for nodes with smaller degrees. The SGN is trained to output the edge scores and node penalties simultaneously with the loss function $\mathcal{L} = \mathcal{L}_{edge} + \eta \, \mathcal{L}_{node}$, where $\eta$ is the coefficient for balancing the two losses.
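A training step might then combine the two losses as in the sketch below; since Eqs. (7) and (8) are not reproduced here, the cross-entropy form of the edge loss, the degree-based node loss, and the balancing coefficient are assumptions consistent with the description above.

import torch

def sgn_loss(edge_scores, optimal_edge_label, penalties, one_tree_degrees, eta=1.0):
    """edge_scores: (n, gamma) normalized scores over each node's outgoing edges.
    optimal_edge_label: (n, gamma), 1 if the edge is in the optimal tour, else 0.
    one_tree_degrees: (n,) node degrees of the minimum 1-tree under the current penalties."""
    # Supervised edge loss: raise the scores of optimal edges, lowering the others (cf. Eq. (7)).
    edge_loss = -(optimal_edge_label * torch.log(edge_scores + 1e-10)).sum(dim=1).mean()
    # Unsupervised node loss: its gradient increases penalties of nodes with degree > 2
    # and decreases them for degree < 2 (cf. Eq. (8)); degrees are treated as constants.
    node_loss = -(penalties * (one_tree_degrees.float() - 2.0).detach()).mean()
    return edge_loss + eta * node_loss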

4.3 NeuroLKH algorithm

Input: TSP instance, number of trials
Output: TSP solution

1:  Convert the TSP instance to the SGN input graph $G_s$
2:  Calculate the edge scores $\beta$ and the node penalties $\pi$ with Eqs. (5) and (6)
3:  $d'$ = TransformEdgeDistance($d$, $\pi$)
4:  $\mathcal{C}$ = CreateEdgeCandidateSet($\beta$)
5:  $solution$ = LKHSearchingTrials($d'$, $\mathcal{C}$, number of trials)
6:  return $solution$
Algorithm 1 NeuroLKH Algorithm

The process of using NeuroLKH to solve one instance is shown in Algorithm 1. Firstly, the TSP instance is converted to the sparse directed graph $G_s$. Then the SGN encoder embeds the nodes and edges in $G_s$ into feature embeddings, based on which the decoders output the node penalties $\pi$ and the edge scores $\beta$. Afterwards, NeuroLKH creates a powerful edge candidate set and transforms the distance of each edge effectively, which further guides NeuroLKH to conduct multiple LKH trials to find good solutions. We detail each part as follows.

Transform Edge Distance. Based on the node penalties $\pi$, the original edge distances $d_{ij}$ are transformed into new distances $d'_{ij} = d_{ij} + \pi_i + \pi_j$, which are used in the search process. With such a transformation, the optimal solution tour will stay the same. The tour distance calculated with the transformed edge distances is reduced by $2\sum_{i}\pi_i$ to restore the tour distance for the original TSP.
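In matrix form, the transformation and the restoration of the original tour length amount to the following small sketch (a dense distance matrix is assumed):

import numpy as np

def transform_edge_distance(dist, penalties):
    """d'_ij = d_ij + pi_i + pi_j; the optimal tour is unchanged by this transformation."""
    return dist + penalties[:, None] + penalties[None, :]

def restore_tour_distance(transformed_tour_length, penalties):
    """Every node appears exactly twice as an edge endpoint in a tour, so the
    transformation adds 2 * sum(pi) to any tour length."""
    return transformed_tour_length - 2.0 * penalties.sum()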

Create Edge Candidate Set. For each node, the edge scores of its outgoing edges are sorted, and the edges with the top-$k$ largest scores are included in the edge candidate set. Edges with larger scores have higher priorities in the candidate set and will be tried first for adding in an exchange during the LKH search process. Note that neither the original LKH nor NeuroLKH can guarantee that all the edges in the optimal tour are included in the edge candidate set. However, optimal solutions are still likely to be found during the multiple trials.
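Creating the candidate set from the SGN edge scores is then a simple per-node top-$k$ selection, sketched below ($k=5$ follows the LKH default mentioned in Section 3; the array layout is an assumption):

import numpy as np

def create_edge_candidate_set(edge_scores, neighbors, k=5):
    """edge_scores: (n, gamma) scores of each node's outgoing sparse edges.
    neighbors: (n, gamma) destination node of each outgoing sparse edge.
    Returns, for each node, its k candidate neighbors ordered by decreasing score."""
    order = np.argsort(-edge_scores, axis=1)[:, :k]        # top-k scores per node
    candidates = np.take_along_axis(neighbors, order, axis=1)
    return candidates                                      # higher-score edges are tried first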

LKH Searching Trials. To solve one TSP instance, LKH conducts multiple trials to find better solutions. In each trial, one tour is initialized randomly, and iterations of LKH search for $\lambda$-opt exchanges are conducted until the tour can no longer be improved by such exchanges. In each iteration, LKH searches in ascending order of $\lambda$ for $\lambda$-opt exchanges that reduce the tour length, which are applied once found.

Based on the trained SGN network, NeuroLKH infers the edge distance transformation and the candidate set to guide the LKH trials, which only requires a forward pass through the model. This is much faster than the corresponding procedure in the original LKH, which employs subgradient optimization on each instance iteratively until convergence and is therefore time-consuming, especially for large-scale problems. More importantly, rather than using the hand-crafted rules based on sensitivity analysis in the original LKH, NeuroLKH learns to create edge candidate sets of much higher quality with the powerful deep model, leading to significantly better performance.

5 Experiments

In this section, we conduct extensive experiments on TSP with various sizes and show the effective performance of NeuroLKH compared to the baseline algorithms. Our code is publicly available at https://github.com/liangxinedu/NeuroLKH.

Dataset distribution. Closely following existing works such as Kool et al. (2019), we experiment with 2-dimensional TSP instances in the Euclidean distance space where both coordinates of each node are generated independently from a unit uniform distribution. We train only one network using TSP instances ranging from 101 to 500 nodes. Since the amount of supervision and feedback during training is linearly related to the number of nodes, we generate a number of instances for each size that is inversely proportional to the size, resulting in approximately 780,000 instances in total. Therefore the amounts of supervision and feedback are kept similar across different sizes. We use Concorde (https://www.math.uwaterloo.ca/tsp/concorde) to get the optimal edges for the supervised training of edge scores. For testing, we generate 1000 instances for each testing problem size.

Hyperparameters. We set the number of directed edges $\gamma$ pointed from each node in the sparse edge set such that only 0.01% of the edges in the optimal tours are missed in $E_s$ for the training dataset; we also conduct experiments to justify this choice in Appendix Section A. The hidden dimension $D$ of the Sparse Graph Convolutional Layers and the node penalty coefficient $\eta$ in the loss function are kept the same in all experiments. The network is trained by the Adam optimizer Kingma and Ba (2014) with a learning rate of 0.0001 for 16 epochs, which takes approximately 4 days. The deep learning models are trained and evaluated with one RTX-2080Ti GPU. The other parts of the experiments without deep models, for NeuroLKH and the other baselines, are conducted with random seed 1234 on an Intel(R) Core(TM) i9-10940X CPU unless stated otherwise. Hyperparameters for the LKH searching process are consistent with the example script for TSP provided with LKH (http://akira.ruc.dk/%7Ekeld/research/LKH-3/LKH-3.0.6.tgz) and with those used in Zheng et al. (2021).

Method | TSP100: Time(s) Obj Gap(‱) | TSP200: Time(s) Obj Gap(‱) | TSP500: Time(s) Obj Gap(‱)
Concorde 207 *7.753246 0.000 1072 *10.701303 0.000 17022 *16.541830 0.000
LKH (1 trial) 33 7.755071 2.353 80 10.707043 5.364 338 16.556733 9.009
VSR-LKH 7.754980 2.236 10.706739 5.080 16.557297 9.350
NeuroLKH 7.753332 0.111 10.701873 0.533 16.543197 0.826
LKH (10 trials) 43 7.754177 1.200 111 10.703724 2.263 445 16.548017 3.740
VSR-LKH 7.754184 1.209 10.703997 2.518 16.549591 4.692
NeuroLKH 7.753311 0.083 10.701623 0.299 16.542880 0.634
LKH (100 trials) 127 7.753450 0.263 368 10.701755 0.423 1147 16.543707 1.134
VSR-LKH 7.753407 0.207 10.701687 0.359 16.543085 0.759
NeuroLKH 7.753270 0.030 10.701381 0.073 16.542163 0.201
LKH (1000 trials) 938 7.753254 0.010 2805 10.701351 0.045 7527 16.542125 0.178
VSR-LKH 7.753322 0.097 10.701336 0.031 16.541934 0.063
NeuroLKH 7.753247 0.000 10.701303 0.000 16.541847 0.010
Table 1: Comparative results on training sizes

5.1 Comparative study on TSP

Here, we compare NeuroLKH with the original LKH algorithm Helsgaun (2009) and the recently proposed VSR-LKH algorithm Zheng et al. (2021). We do not compare with other deep learning based methods here because their performances are rather inferior to LKH, and most of them can hardly generalize to problems with more than 100 nodes. One exception is the method in Fu et al. (2021), which is tested on large problems but the performances are still far worse than LKH.

All algorithms are run once for each testing instance as we find running multiple times only provides very marginal improvement. For each testing problem size, we run the original LKH for 1, 10, 100, and 1000 trials, and record the total amounts of time in solving the 1000 instances. Then we impose the same amounts of time as time limits to NeuroLKH and VSR-LKH for solving the same 1000 instances for fair comparison. Note that for NeuroLKH, the solving time is the summation of the inference time of SGN on GPU and LKH searching time on CPU. In the following tables, for each size and time limit, we report the average performance (tour distance) and the total solving time for the 1000 testing instances.

Comparison on training sizes. In Table 1, we report the performances of LKH, VSR-LKH and NeuroLKH on three testing datasets with 100, 200 and 500 nodes, which are within the size range of instances used in training. Note that we train only one SGN Network on a wide range of problem sizes and here we use these three sizes to demonstrate the testing performances. We also use the exact solver Concorde on these instances to obtain the optimal solutions and compute the optimality gap for each method. As shown in this table, it is clear that NeuroLKH outperforms both LKH and VSR-LKH significantly and consistently across different problem sizes and with different time limits. Notably, the optimality gaps are reduced by at least an order of magnitude for most of the cases, which is a significant improvement.

Generalization analysis on larger sizes. We further show the generalization ability of NeuroLKH on much larger graph sizes of 1000, 2000 and 5000 nodes. Note that while the edge scores in SGN generalize well without any modification, it is hard for the node penalties to generalize directly. This is because they are trained unsupervisedly and SGN does not have any knowledge about how to penalize the nodes for larger TSP instances. Nevertheless, this can be resolved by a simple fine-tuning step. As the learned node embeddings are very powerful, we only fine-tune the very small number of parameters in the SGN node decoder and keep the other parameters fixed, as sketched below. Specifically, for each of the large sizes, we fine-tune the node decoder for 100 iterations with a small batch size, which takes less than one minute for each of the sizes 1000, 2000 and 5000. This fast fine-tuning process is performed for TSPs of one size generated from the distribution rather than for specific instances, and may be viewed as adjusting the scale of the penalties for larger sizes. The generalization results are summarized in Table 2. Note that we do not run Concorde here due to the prohibitively long running time, and the gaps are with respect to the best value found by all methods. Clearly, NeuroLKH generalizes well to substantially larger problem sizes and the improvement of NeuroLKH over the baselines is significant and consistent across all the settings.
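A hedged PyTorch sketch of this fine-tuning step is given below; the optimizer choice, the data interface, and the helper computing the 1-tree degrees are assumptions, and only the node-decoder parameters receive gradient updates.

import torch

def finetune_node_decoder(encoder, node_decoder, batches, iterations=100, lr=1e-4):
    """Freeze the encoder (and thus the edge scores) and adjust only the
    node-penalty head so that the penalty scale fits the larger problem size."""
    for p in encoder.parameters():
        p.requires_grad_(False)
    optimizer = torch.optim.Adam(node_decoder.parameters(), lr=lr)
    for _, batch in zip(range(iterations), batches):
        node_emb = encoder(batch)                        # frozen embeddings (assumed interface)
        penalties = node_decoder(node_emb)
        degrees = batch.one_tree_degrees(penalties)      # hypothetical helper: degrees under pi
        loss = -(penalties * (degrees.float() - 2.0).detach()).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()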

Further discussion. The inference time of SGN in NeuroLKH for the 1000 instances of 100, 200, 500, 1000, 2000 and 5000 nodes is 3s, 6s, 16s, 33s, 63s and 208s, respectively, which grows approximately linearly with the number of nodes $n$. In contrast, the subgradient optimization in LKH and VSR-LKH needs 20s, 51s, 266s, 1028s, 4501s and 38970s, which grows superlinearly with $n$ and is much longer than the SGN inference, especially for large-scale problems. For NeuroLKH, the saved time is used to conduct more trials, which effectively helps to find better solutions. This effect is more salient with short time limits. Meanwhile, the number of trials is also small for short time limits and the algorithm only searches a small number of solutions, in which case the guidance of the edge candidate set is more important. Due to these two reasons, the improvement of NeuroLKH over the baselines is particularly substantial for short time limits. This is a desirable property, especially for time-critical applications and for solving large-scale problems, for which large numbers of trials are not feasible.

Method | TSP1000: Time(s) Obj Gap(‱) | TSP2000: Time(s) Obj Gap(‱) | TSP5000: Time(s) Obj Gap(‱)
LKH (1 trial) 1183 23.155916 10.593 4843 32.483851 11.264 40048 51.025519 12.284
VSR-LKH 23.154946 10.173 32.485551 11.788 51.025539 12.288
NeuroLKH 23.133494 0.899 32.449752 0.755 50.965382 0.484
LKH (10 trials) 1414 23.143435 5.197 5322 32.466953 6.056 41523 50.998721 7.026
VSR-LKH 23.143347 5.159 32.467997 6.377 51.000093 7.295
NeuroLKH 23.133066 0.714 32.449519 0.683 50.965219 0.452
LKH (100 trials) 2567 23.135427 1.735 7371 32.455454 2.512 47884 50.976677 2.700
VSR-LKH 23.134426 1.302 32.454427 2.195 50.979317 3.218
NeuroLKH 23.132258 0.365 32.448666 0.420 50.964677 0.345
LKH (1000 trials) 12884 23.132216 0.347 25613 32.448954 0.509 103885 50.965233 0.455
VSR-LKH 23.131658 0.105 32.447953 0.200 50.965300 0.468
NeuroLKH 23.131414 0.000 32.447304 0.000 50.962916 0.000
Table 2: Comparative results on generalization sizes

In Figure 2, we plot the performance of the LKH, VSR-LKH and NeuroLKH algorithms for solving the testing datasets with different numbers of nodes against different running time to visualize the improvement process (the resulting objective values after each trial). The time limits are set to the longest ones used in Table 1 and Table 2, which are the running time of LKH with 1000 trials. Clearly, NeuroLKH outperforms both LKH and VSR-LKH significantly and consistently across different problem sizes and with different time limits. In particular, NeuroLKH is superior as it not only reaches good solutions fast but also converges to better solutions eventually. With the same performance (i.e. objective value), NeuroLKH considerably reduces the computational time. We can also conclude that when the time limit is short, the improvement of NeuroLKH over baselines is particularly substantial. In addition, we show that the subgradient optimization is necessary for LKH and VSR-LKH. As exhibited in Figure 2, the performances of both LKH and VSR-LKH are much worse without subgradient optimization (w/o SO). More impressively, even ignoring the preprocessing time (IPT) used for subgradient optimization (pertaining to LKH and VSR-LKH) and Sparse Graph Network inferring (pertaining to NeuroLKH), NeuroLKH still outstrips both LKH and VSR-LKH. Note that this comparison is unfair for NeuroLKH as LKH and VSR-LKH consume much longer preprocessing time which is unavoidable.

For the results reported in Table 1 and Table 2, almost all the improvements of NeuroLKH over LKH and VSR-LKH on different sizes and with different time limits are statistically significant with confidence levels larger than 99%. The only exception is TSP with 100 nodes under the time limit of LKH with 1000 trials, where the confidence levels are 90.5% and 97.6% for the two improvements, respectively.

In the Appendix Section A, we also show that NeuroLKH substantially outperforms other deep learning based methods Kool et al. (2019, 2021); Wu et al. (2021); Joshi et al. (2019); Hottung et al. (2021); Fu et al. (2021); Kwon et al. (2020); da Costa et al. (2020).

(a) TSP with 100 nodes
(b) TSP with 200 nodes
(c) TSP with 500 nodes
(d) TSP with 1000 nodes
(e) TSP with 2000 nodes
(f) TSP with 5000 nodes
Figure 2: Performances of LKH, VSR-LKH and NeuroLKH for solving TSP with different sizes against different running time

Generalization to TSPLIB benchmark. Besides generalization to larger sizes, generalization to different distributions remains a crucial challenge for deep learning based methods in existing works. The TSPLIB benchmark contains instances with various node distributions, making it extremely hard for such methods. We test on all the 72 TSPLIB instances with Euclidean distances and fewer than 10000 nodes. The number of trials is set to the number of nodes and the algorithms are run 10 times for each instance, following the convention for TSPLIB in Helsgaun (2000); Zheng et al. (2021). Given the various unknown node distributions, we do not fine-tune the model for the node penalties and only use the edge scores in NeuroLKH. For the 24 instances labeled as hard in Zheng et al. (2021), which the original LKH fails to solve optimally in at least one of the 10 runs, NeuroLKH trained with uniformly distributed data is able to find optimal solutions 6.13 times on average, which is much better than LKH (3.75 times). VSR-LKH, which learns online during the search for each instance, finds optimal solutions 6.42 times on average, slightly better than NeuroLKH. While NeuroLKH improves the results on most hard instances, it can generalize poorly on instances with certain special patterns, such as those where most nodes are located along several horizontal lines, causing it to fail to solve 11 of the 48 easy instances optimally in some runs.

With the same training dataset size, we trained another model NeuroLKH_M using a mixture of instances with uniformly distributed nodes, clustered nodes with 3-8 clusters, half uniform and half clustered nodes following Uchoa et al. (2017). NeuroLKH_M finds optimal solutions 6.79 times on average for the hard instances and fails to solve only 5 easy instances optimally for some runs, better than the NeuroLKH trained with only uniformly distributed instances. For all the 72 instances, NeuroLKH_M finds optimal solutions 8.74 times on average, which is much better than LKH (7.92 times) but slightly worse than VSR-LKH (8.78 times). Detailed results of each instance are listed in the Appendix Section B.

5.2 Experiments on other routing problems

Finally, we show that NeuroLKH can be easily extended to solve much more complicated routing problems such as the Capacitated Vehicle Routing Problem (CVRP), the Pickup and Delivery Problem (PDP) and CVRP with Time Windows (CVRPTW). We briefly introduce the problems in the Appendix Section C. Different from TSP, the node penalties do not apply to these problems. Therefore, NeuroLKH only learns the edge candidate set. As these three problems are very hard to solve and the optimal solutions are not available in a reasonable amount of time, we use LKH with 10000 trials to obtain solutions as training labels. The demands, the capacities, and the start and end times of the time windows are taken as node inputs along with the coordinates. For PDP, we add connections between each pair of pickup and delivery nodes and assign weight matrices for these connections in Eq. (2). For PDP and CVRPTW, the edge directions affect the tour feasibility; therefore the model learns in-direction edge scores and out-direction edge scores for each node with Eq. (5).

The node coordinates are also generated uniformly from the unit square for all three problems, following Kool et al. (2019); Li et al. (2021). For CVRP, the demands of customers are generated uniformly from the integers {1..9}, with the capacity fixed to the value used for the largest CVRP (100 nodes) studied in Kool et al. (2019). For CVRPTW, we use the same way to generate demands, capacity, serving time and time windows as Falkner and Schmidt-Thieme (2020). A training dataset for CVRP with 101-500 nodes (about 180,000 instances in total) is used to train the SGN for 10 epochs. PDP and CVRPTW are harder to solve, therefore we use a training dataset with 41-200 nodes. The other hyperparameters of SGN are the same as for TSP, and those for the LKH searching process are consistent with the example scripts given by LKH for CVRP, PDP and CVRPTW (with the SPECIAL hyperparameter).

In Table 3, we show the performance of NeuroLKH and the original LKH on testing datasets with 1000 instances for the smallest and largest graph sizes (number of customers) used in training, as well as a much larger generalization size. We use the solving time of LKH with 100, 1000 and 10000 trials as the time limits. For 100 trials, both methods fail to find feasible solutions for less than 1% of the PDP and CVRPTW test instances with 300 nodes. Whenever this happens, we push the infeasible visits to the end to obtain feasible solutions. The inference time of SGN is 1s, 3s, 7s, 10s, 19s and 40s in total for the 1000 instances in the testing datasets with 40, 100, 200, 300, 500 and 1000 nodes, respectively, which is a tiny fraction of the LKH searching time. As shown in Table 3, NeuroLKH significantly improves the solution quality compared with the original LKH, which is a very strong heuristic solver for all three problems, showing its potential in handling various types of routing problems.

Method | Time(s) Obj Gap(%) for the smallest training size | Time(s) Obj Gap(%) for the largest training size | Time(s) Obj Gap(%) for the generalization size (CVRP: 100/500/1000 customers; PDP and CVRPTW: 40/200/300)
CVRP LKH (100 trials) 485 15.8363 1.675 2043 42.1621 5.394 4607 58.1372 9.750
NeuroLKH 15.7770 1.295 41.7311 4.316 56.6469 6.937
LKH (1000 trials) 4520 15.6483 0.468 15812 40.6103 1.515 30133 54.3412 2.584
NeuroLKH 15.6295 0.348 40.4974 1.233 54.0499 2.034
LKH (10000 trials) 45435 15.5823 0.044 166875 40.0670 0.157 319368 53.1093 0.259
NeuroLKH 15.5754 0.000 40.0043 0.000 52.9723 0.000
PDP LKH (100 trials) 115 6.2495 0.819 2832 13.8390 5.535 7939 17.0913 6.916
NeuroLKH 6.2241 0.409 13.6246 3.899 16.7867 5.011
LKH (1000 trials) 845 6.2088 0.163 21216 13.2850 1.310 55643 16.2447 1.620
NeuroLKH 6.2041 0.087 13.2443 0.999 16.1857 1.251
LKH (10000 trials) 7989 6.1998 0.018 195220 13.1387 0.194 515377 16.0119 0.163
NeuroLKH 6.1988 0.000 13.1132 0.000 15.9857 0.000
CVRPTW LKH (100 trials) 147 9.3051 1.081 813 26.1757 7.124 1746 34.2301 8.798
NeuroLKH 9.2606 0.597 25.4000 3.949 32.9676 4.786
LKH (1000 trials) 1017 9.2276 0.239 4525 24.9770 2.218 7820 32.2671 2.559
NeuroLKH 9.2207 0.164 24.7857 1.435 32.0224 1.781
LKH (10000 trials) 9624 9.2073 0.018 45509 24.5338 0.405 75481 31.5719 0.350
NeuroLKH 9.2056 0.000 24.4350 0.000 31.4620 0.000
Table 3: Comparative results for other routing problems

Performance on traditional benchmarks. To show the effectiveness of NeuroLKH on complicated routing problems with various distributions, we perform experiments on CVRPLIB Uchoa et al. (2017) and Solomon Solomon (1987) benchmark datasets. CVRPLIB Uchoa et al. (2017) contains various sized CVRP instances with a combination of 3 depot positioning, 3 customer positioning and 7 demand distributions. Solomon benchmark Solomon (1987) contains CVRPTW instances with 100 customers and various distributions of time windows. We detail the benchmarks, the training datasets and the results for each instance in the Appendix Section D. In summary, tested on the 43 instances with 100-300 nodes in CVRPLIB Uchoa et al. (2017), NeuroLKH improves the average performances on 38, 38 and 31 instances when the time limits are set to the time of LKH with 100, 1000 and 10000 trials, respectively. On the 11 Solomon R2-type instances, NeuroLKH outperforms LKH almost consistently with all the settings (32 out of the 33).

6 Conclusion

In this paper, we propose NeuroLKH, an algorithm that combines the power of deep learning models with a strong traditional heuristic for TSP. Specifically, one Sparse Graph Network is trained to predict the edge scores and the node penalties for generating the edge candidate set and transforming the edge distances, respectively. As shown in the extensive experiments, the improvement of NeuroLKH over the baseline algorithms within different time limits is consistent and significant, and NeuroLKH generalizes well to instances with much larger graph sizes than the training sizes and to traditional benchmarks with various node distributions. Also, we use CVRP, PDP and CVRPTW to demonstrate that NeuroLKH effectively applies to other routing problems. NeuroLKH can effectively learn the routing patterns for TSP, which generalize well to much larger sizes and different distributions of nodes. However, for other complicated routing problems such as CVRP and CVRPTW, although NeuroLKH generalizes well to larger sizes, it is hard to directly generalize to other distributions of demands and time windows without training, which is a limitation of NeuroLKH and is left for future research. In addition, NeuroLKH can be further combined with other learning-based techniques, such as sparsifying the TSP graph Sun et al. (2021), and with other strong traditional algorithms, such as the Hybrid Genetic Search Vidal et al. (2012).

This work was supported by the A*STAR Cyber-Physical Production System (CPPS) – Towards Contextual and Intelligent Response Research Program, under the RIE2020 IAF-PP Grant A19C1a0018, and Model Factory@SIMTech, in part by the National Natural Science Foundation of China under Grant 61803104 and Grant 62102228, and in part by the Young Scholar Future Plan of Shandong University under Grant 62420089964188.

References

  • [1] I. Bello, H. Pham, Q. V. Le, M. Norouzi, and S. Bengio (2016) Neural combinatorial optimization with reinforcement learning. In Proceedings of International Conference on Learning Representations (ICLR)., Cited by: §1, §2.
  • [2] Y. Bengio, A. Lodi, and A. Prouvost (2020) Machine learning for combinatorial optimization: a methodological tour d’horizon. European Journal of Operational Research. Cited by: §1.
  • [3] X. Bresson and T. Laurent (2017) Residual gated graph convnets. arXiv preprint arXiv:1711.07553. Cited by: §4.1.
  • [4] X. Chen and Y. Tian (2019) Learning to perform local rewriting for combinatorial optimization. In Advances in Neural Information Processing Systems, pp. 6278–6289. Cited by: §2.
  • [5] P. R. d. O. da Costa, J. Rhuggenaath, Y. Zhang, and A. Akcay (2020) Learning 2-opt heuristics for the traveling salesman problem via deep reinforcement learning. In Asian Conference on Machine Learning, pp. 465–480. Cited by: Table S.1, §2, §5.1.
  • [6] H. Dai, E. Khalil, Y. Zhang, B. Dilkina, and L. Song (2017) Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, pp. 6348–6358. Cited by: §1, §2.
  • [7] J. K. Falkner and L. Schmidt-Thieme (2020) Learning to solve vehicle routing problems with time windows through joint attention. arXiv preprint arXiv:2006.09100. Cited by: §5.2.
  • [8] Z. Fu, K. Qiu, and H. Zha (2021) Generalize a small pre-trained model to arbitrarily large TSP instances. In Proceedings of the AAAI Conference on Artificial Intelligence. Cited by: Table S.1, Appendix A, §2, §5.1, §5.1.
  • [9] H. Lu, X. Zhang, and S. Yang (2020) A learning-based iterative method for solving vehicle routing problems. In Proceedings of International Conference on Learning Representations (ICLR). Cited by: §2.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §4.1.
  • [11] M. Held and R. M. Karp (1971) The traveling-salesman problem and minimum spanning trees: part ii. Mathematical programming 1 (1), pp. 6–25. Cited by: §3.
  • [12] K. Helsgaun (2000) An effective implementation of the lin–kernighan traveling salesman heuristic. European Journal of Operational Research 126 (1), pp. 106–130. Cited by: Appendix B, §1, §3, §5.1.
  • [13] K. Helsgaun (2009) General k-opt submoves for the lin–kernighan tsp heuristic. Mathematical Programming Computation 1 (2-3), pp. 119–163. Cited by: §1, §3, §5.1.
  • [14] K. Helsgaun (2017) An extension of the lin-kernighan-helsgaun tsp solver for constrained traveling salesman and vehicle routing problems. Roskilde: Roskilde University. Cited by: §1.
  • [15] A. Hottung, B. Bhandari, and K. Tierney (2021) Learning a latent search space for routing problems using variational autoencoders. In Proceedings of International Conference on Learning Representations (ICLR). Cited by: Table S.1, §2, §5.1.
  • [16] A. Hottung and K. Tierney (2020) Neural large neighborhood search for the capacitated vehicle routing problem. In European Conference on Artificial Intelligence, Cited by: §2.
  • [17] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, pp. 448–456. Cited by: §4.1.
  • [18] C. K. Joshi, T. Laurent, and X. Bresson (2019) An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227. Cited by: Table S.1, §1, §2, §5.1.
  • [19] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. In Proceedings of International Conference on Learning Representations (ICLR)., Cited by: §5.
  • [20] W. Kool, H. van Hoof, J. Gromicho, and M. Welling (2021) Deep policy dynamic programming for vehicle routing problems. arXiv preprint arXiv:2102.11756. Cited by: Table S.1, §2, §5.1.
  • [21] W. Kool, H. van Hoof, and M. Welling (2019) Attention, learn to solve routing problems!. In Proceedings of International Conference on Learning Representations (ICLR)., Cited by: Table S.1, §1, §2, §4.1, §5.1, §5.2, §5.
  • [22] Y. Kwon, J. Choo, B. Kim, I. Yoon, Y. Gwon, and S. Min (2020) POMO: policy optimization with multiple optima for reinforcement learning. In Advances in Neural Information Processing Systems, Vol. 33. Cited by: Table S.1, §2, §5.1.
  • [23] J. Li, L. Xin, Z. Cao, A. Lim, W. Song, and J. Zhang (2021) Heterogeneous attentions for solving pickup and delivery problem via deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems. Cited by: §5.2.
  • [24] S. Lin and B. W. Kernighan (1973) An effective heuristic algorithm for the traveling-salesman problem. Operations research 21 (2), pp. 498–516. Cited by: §1.
  • [25] S. Lin (1965) Computer solutions of the traveling salesman problem. Bell System Technical Journal 44 (10), pp. 2245–2269. Cited by: §3.
  • [26] M. Nazari, A. Oroojlooy, L. Snyder, and M. Takác (2018) Reinforcement learning for solving the vehicle routing problem. In Advances in Neural Information Processing Systems, pp. 9839–9849. Cited by: §2.
  • [27] G. Reinelt (1991) TSPLIB—a traveling salesman problem library. ORSA journal on computing 3 (4), pp. 376–384. Cited by: §1, §2.
  • [28] M. M. Solomon (1987) Algorithms for the vehicle routing and scheduling problems with time window constraints. Operations research 35 (2), pp. 254–265. Cited by: Appendix D, §1, §5.2.
  • [29] Y. Sun, A. Ernst, X. Li, and J. Weiner (2021) Generalization of machine learning for problem reduction: a case study on travelling salesman problems. OR Spectrum 43 (3), pp. 607–633. Cited by: §6.
  • [30] E. Uchoa, D. Pecin, A. Pessoa, M. Poggi, T. Vidal, and A. Subramanian (2017) New benchmark instances for the capacitated vehicle routing problem. European Journal of Operational Research 257 (3), pp. 845–858. Cited by: Appendix B, Appendix D, §1, §5.1, §5.2.
  • [31] T. Vidal, T. G. Crainic, M. Gendreau, N. Lahrichi, and W. Rei (2012) A hybrid genetic algorithm for multidepot and periodic vehicle routing problems. Operations Research 60 (3), pp. 611–624. Cited by: §6.
  • [32] O. Vinyals, M. Fortunato, and N. Jaitly (2015) Pointer networks. In Advances in Neural Information Processing Systems, Vol. 28. Cited by: §1, §2.
  • [33] Y. Wu, W. Song, Z. Cao, J. Zhang, and A. Lim (2021) Learning improvement heuristics for solving routing problems. IEEE Transactions on Neural Networks and Learning Systems. Cited by: Table S.1, §1, §2, §5.1.
  • [34] L. Xin, W. Song, Z. Cao, and J. Zhang (2020) Step-wise deep learning models for solving routing problems. IEEE Transactions on Industrial Informatics 17 (7), pp. 4861–4871. Cited by: §1.
  • [35] L. Xin, W. Song, Z. Cao, and J. Zhang (2021) Multi-decoder attention model with embedding glimpse for solving vehicle routing problems. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, pp. 12042–12049. Cited by: §1.
  • [36] J. Zheng, K. He, J. Zhou, Y. Jin, and C. Li (2021) Combining reinforcement learning with lin-kernighan-helsgaun algorithm for the traveling salesman problem. In Proceedings of the AAAI Conference on Artificial Intelligence, Cited by: Appendix B, §2, §5.1, §5.1, §5.

Appendix A Experiments for TSP

To verify the quality of the edge candidate set learned by NeuroLKH, we report two metrics for the edge candidate sets obtained by different methods, namely the average ranking of the optimal edges and the percentage of optimal edges missed in the set. For the candidate sets created by the LKH algorithm (sensitivity analysis of the Minimum Spanning Tree with subgradient optimization), 0.68% and 0.67% of the optimal edges are missed for TSP100 and TSP500, respectively, and the average rankings of the optimal edges are 1.670 and 1.681. The ideal average ranking would be 1.5, since the two optimal edges of each node would then rank first and second. NeuroLKH reduces the average ranking to 1.557 and 1.597, with only 0.05% and 0.09% of the optimal edges missed, which justifies the effectiveness of NeuroLKH in learning desirable edge candidates.
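The two metrics can be computed as in the following sketch, assuming the candidate lists are stored per node in priority order and the optimal tour is given as a node sequence:

import numpy as np

def candidate_set_metrics(candidates, optimal_tour):
    """candidates: per-node candidate neighbor lists, in priority order.
    optimal_tour: sequence of node indices describing the optimal tour.
    Returns (average rank of the optimal edges, fraction of optimal edges missed)."""
    ranks, missed = [], 0
    n = len(optimal_tour)
    for idx, node in enumerate(optimal_tour):
        for neighbor in (optimal_tour[idx - 1], optimal_tour[(idx + 1) % n]):
            cand = list(candidates[node])
            if neighbor in cand:
                ranks.append(cand.index(neighbor) + 1)   # 1-based rank in the candidate list
            else:
                missed += 1
    return float(np.mean(ranks)), missed / (2 * n)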

For TSP, we choose the number of directed edges $\gamma$ pointed from each node in the sparse edge set to include most of the edges in the optimal tours in the sparse graph, which results in only 0.01% of the optimal edges missed for the training dataset. In our experiments with several alternative values of $\gamma$ (trained with 20% of the training samples to save time), 0.643%, 0.209% and 0.208% of the optimal edges are missed in the candidate set, with average rankings of the optimal edges of 1.653, 1.646 and 1.640 for TSP500, respectively. With larger $\gamma$ (i.e., more edges per node), the average ranking improves only marginally with similar percentages of missed optimal edges, but the computational time increases noticeably. We find similar results for the other routing problems and therefore use the same $\gamma$ for consistency. We also find that the network can hardly give a high edge score to an edge with a considerably large Euclidean distance and include it in the candidate set. Therefore a larger $\gamma$ is not needed, and the choice does not impact the performance much as long as $\gamma$ is not too small (e.g., less than 20).

The model outputs the node penalties within the range $(-C, C)$. In the original LKH algorithm, a subgradient optimization process is used to optimize the node penalties iteratively until convergence for each instance. Running this process on the training instances, whose coordinates are always between 0 and 1, we find that the penalties are usually between -10 and 10 (for different sizes). When testing on instances with different coordinate ranges, we scale the instances so that the coordinates lie between 0 and 1; the aspect ratio is kept fixed so that the objective value is only scaled by a constant. Therefore, we use $C=10$ in our experiments.

In Table S.1, we compare NeuroLKH with other recently proposed Deep Learning based methods on TSP100. Notably, most of them can hardly handle problems with more than 100 nodes. One exception is the method in [8], which is tested on large problems but the performance deteriorates rapidly with the increase of problem size and is still inferior to LKH. We adopt the results from their original works where the datasets tested on might be different but are sampled from the same distribution. Therefore the optimality gap is a more important measure than the objective value. The running time is reported for solving 1000 instances in total with the assumption that it is linearly related to the number of instances. Apparently, NeuroLKH significantly outperforms other methods with a short running time. And more importantly, as shown in Table 1 and Table 2, NeuroLKH generalizes well to large TSP with up to 5000 nodes.

Method Time(s) Gap(‱) Method Time(s) Gap(‱) Method Time(s) Gap(‱)
GCN greedy [18] 36 838.000 AM Greedy [21] 0.6 453.000 AM sampling [21] 360 226.000
Wu [33] 720 142.000 GCN bs [18] 240 139.000 CVAE-Opt-RS [15] 50500 135.000
da Costa [5] 246 87.000 CVAE-Opt-DE [15] 55100 34.000 POMO [22] 6 14.000
Fu [8] 90 4.000 DPDP 10k [20] 456 0.900 DPDP 100k [20] 990 0.400
NeuroLKH 33 0.111 NeuroLKH 127 0.030 NeuroLKH 938 0.000
Table S.1: Comparative results on TSP100. Here we report three results of NeuroLKH with different time limits from Table 1.

Appendix B Experiments for TSPLIB

NeuroLKH is trained using only instances with nodes generated from the uniform distribution; in the tables below we denote this model NeuroLKH_R. With the same training dataset size, we trained another model, NeuroLKH_M, using a mixture of instances with uniformly distributed nodes, clustered nodes with 3-8 clusters, and half uniform and half clustered nodes following [30]. Following the convention for TSPLIB in [12, 36], the number of trials is set to the number of nodes and the algorithms are run 10 times for each instance. During each run, the algorithm stops as soon as the optimal solution is found, and the number of trials actually conducted is reported. Here we show the results of LKH, VSR-LKH, NeuroLKH_R and NeuroLKH_M for each instance in Table S.2, Table S.3 and Table S.4. The optimal tour distance is shown under the instance name. We report the number of runs in which the optimal solution is found, the best performance (tour distance) over the runs, the average performance, the average running time (seconds) and the average number of trials actually conducted. The results of LKH are the same as reported in [36] (except the running time, as we run all the algorithms on our machine for a fair comparison), while the results of VSR-LKH are slightly different due to behaviour not controlled by the random seed in the code.

Appendix C Experiments for Other Routing Problems

Here we briefly introduce the Capacitated Vehicle Routing Problem (CVRP), the Pickup and Delivery Problem (PDP) and CVRP with Time Windows (CVRPTW). For PDP, the customers contain pairs of pickup and delivery nodes. The vehicle starts from the depot, visits each customer node once and returns to the depot with the constraint that the pickup node must be visited before the corresponding delivery node. For CVRP, multiple routes can be planned. In each route, the vehicle starts from the depot, visits some customers and returns to the depot. The total demand of the customers in each route cannot exceed the vehicle capacity and each customer must be visited once. CVRPTW generalizes CVRP with an additional constraint that each customer must be visited within the corresponding time window. The time will be spent on traveling between the nodes and serving the customers. The goal of all three problems is to minimize the tour distance.

Similarly, we plot the performance of the LKH and NeuroLKH algorithms for solving CVRP, PDP and CVRPTW in Figure S.1, which shows similar trends as those in Figure 2. The time limits are set to the longest ones used in Table 3, i.e., the running time of LKH algorithm with 10000 trials.

For the results reported in Table 3, almost all the improvements of NeuroLKH over LKH on different sizes and with different time limits are statistically significant with confidence levels larger than 99%. The only exceptions are the smallest size of each problem with the longest time limit (the running time of LKH with 10000 trials), where the confidence levels are 98.7%, 98.9% and 77.9% for CVRP100, PDP40 and CVRPTW40, respectively. The confidence level for CVRPTW40 with the time limit of LKH with 10000 trials is relatively low because, with such a long time limit, LKH already solves CVRPTW with 40 nodes nearly to optimality, so there is little room left for NeuroLKH to improve.

Appendix D Experiments on CVRPLIB and Solomon Benchmark

CVRPLIB [30] contains CVRP instances of various sizes with a combination of 3 depot positionings, 3 customer positionings and 7 demand distributions. We train one network using CVRP instances ranging from 101 to 300 nodes. The instances are generated from the mixture of distributions proposed in [30], and we generate instances for each size in the training dataset, resulting in approximately 120,000 instances in total.

The Solomon benchmark [28] contains CVRPTW instances with 100 customers and various distributions of time windows. An additional requirement of this benchmark is to minimize the number of routes; therefore, the goal is to minimize the tour distance while using the minimum number of routes. We choose the R2-type instances as the testbed in our experiment. We generate a training dataset of instances with 100 customers. The node coordinates are generated independently from the uniform distribution ranging from 0 to 80. The demands are generated from a Gaussian distribution with mean 15 and standard deviation 10, and the capacity is fixed at 1000. The serving time for each customer is fixed at 10. The center of the time window for each node is generated from a uniform distribution over an interval determined by the distance between that node and the depot, and the width of the time window is generated from a Gaussian distribution with the mean and standard deviation set to 115 and 35, 240 and 0, 350 and 160, 150 and 380, and 470 and 70, respectively. For each of the first two parameter settings, four different instance types are generated, with 0%, 25%, 50% and 100% of the customers receiving time windows. For the last three parameter settings, all customers receive time windows, resulting in 11 types of instances in total. We generate 5000 instances for each type in the training dataset. Please refer to the code for more details.
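The sketch below restates this generator in code. The stated parameters (uniform coordinates in [0, 80], Gaussian demands with mean 15 and standard deviation 10, capacity 1000, service time 10, the five width settings and the window fractions) come from the description above; the depot location, the scheduling horizon and the exact interval for the window center are labeled assumptions, and the released code should be consulted for the precise rules.

```python
import random

WIDTH_PARAMS = [(115, 35), (240, 0), (350, 160), (150, 380), (470, 70)]
# The first two width settings are combined with four window fractions and the
# last three apply to all customers: 2 * 4 + 3 = 11 instance types in total.
TYPES = ([(w, f) for w in WIDTH_PARAMS[:2] for f in (0.0, 0.25, 0.5, 1.0)]
         + [(w, 1.0) for w in WIDTH_PARAMS[2:]])

DEPOT = (40.0, 40.0)      # assumption: depot at the center of the square
HORIZON = 1000.0          # assumption: overall scheduling horizon

def sample_cvrptw_instance(width_params, window_fraction, n=100):
    """Sample one CVRPTW training instance of the given type."""
    coords = [(random.uniform(0, 80), random.uniform(0, 80)) for _ in range(n)]
    demands = [max(1, round(random.gauss(15, 10))) for _ in range(n)]
    mean_w, std_w = width_params
    windows = []
    for (x, y) in coords:
        if random.random() >= window_fraction:
            windows.append((0.0, HORIZON))        # customer without a time window
            continue
        d = ((x - DEPOT[0]) ** 2 + (y - DEPOT[1]) ** 2) ** 0.5
        # Assumption: the window center is uniform in [d, HORIZON - d] so the
        # window is always reachable from the depot within the horizon.
        center = random.uniform(d, HORIZON - d)
        width = max(0.0, random.gauss(mean_w, std_w))
        windows.append((max(0.0, center - width / 2), center + width / 2))
    return {"coords": coords, "demands": demands, "capacity": 1000,
            "service_time": 10, "windows": windows}

# Training set sketch: 5000 instances per type, 11 types.
# dataset = [sample_cvrptw_instance(w, f) for (w, f) in TYPES for _ in range(5000)]
```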

As the running times are all relatively short, we run both LKH and NeuroLKH 100 times on each instance. The results of LKH and NeuroLKH are shown in Table S.5, Table S.6 and Table S.7, where the time limits are set to the running time of LKH with 100, 1000 and 10000 trials. The optimal tour distance is shown under the instance name. We report the average running time (in seconds), the best tour distance over the runs, the average tour distance, and the number of runs in which the optimal solution is found.

Figure S.1: Performances of LKH and NeuroLKH for solving CVRP (100, 500 and 1000 nodes), PDP (40, 200 and 300 nodes) and CVRPTW (40, 200 and 300 nodes) against different running times

-3cm Method Name Success Best Average Time Trials Name Success Best Average Time Trials LKH kroB150 2/10 26130 26131.6 0.32 128.4 rat195 9/10 2323 2323.5 0.22 55 VSR-LKH 4/10 26130 26131.2 0.21 106.3 9/10 2323 2323.5 0.36 69.5 NeuroLKH_R 26130 10/10 26130 26130 0.07 9.8 2323 10/10 2323 2323 0.11 8.4 NeuroLKH_M 10/10 26130 26130 0.12 22.1 10/10 2323 2323 0.06 3.9 LKH pr299 9/10 48191 48194.3 0.4 51.7 d493 6/10 35002 35002.8 4.71 219.6 VSR-LKH 10/10 48191 48191 0.43 13.6 10/10 35002 35002 0.5 8.8 NeuroLKH_R 48191 10/10 48191 48191 0.25 10.1 35002 6/10 35002 35032.2 6.73 320.5 NeuroLKH_M 10/10 48191 48191 0.22 13.2 10/10 35002 35002 0.67 27.5 LKH rat575 2/10 6773 6773.8 3.23 526.9 pr1002 8/10 259045 259045.6 4.53 549 VSR-LKH 6/10 6773 6773.4 3.2 310.6 10/10 259045 259045 0.72 16 NeuroLKH_R 6773 9/10 6773 6773.1 1.91 179 259045 10/10 259045 259045 8.46 330.6 NeuroLKH_M 7/10 6773 6773.3 3.87 345.3 10/10 259045 259045 1.05 34 LKH u1060 5/10 224094 224107.5 101.76 663.3 vm1084 3/10 239297 239372.6 46.16 824.1 VSR-LKH 10/10 224094 224094 3.52 19.1 7/10 239297 239312.6 49.41 474.8 NeuroLKH_R 224094 10/10 224094 224094 35.07 206.9 239297 1/10 239297 239379.5 23.4 1028.9 NeuroLKH_M 10/10 224094 224094 10.05 75.4 7/10 239297 239315.1 21.29 439.7 LKH pcb1173 4/10 56892 56895 5.37 844 rl1304 3/10 252948 253156.4 18.28 1170 VSR-LKH 8/10 56892 56893 7.07 436.9 10/10 252948 252948 1.44 17.9 NeuroLKH_R 56892 9/10 56892 56892.5 5.32 410.4 252948 9/10 252948 252953.1 9.26 370.8 NeuroLKH_M 8/10 56892 56893 6.48 378.2 8/10 252948 252958.2 11.36 600.6 LKH rl1323 6/10 270199 270219.6 12.57 718.8 nrw1379 6/10 56638 56640 9.84 759.3 VSR-LKH 10/10 270199 270199 9.08 189.7 9/10 56638 56638.5 12.84 253.7 NeuroLKH_R 270199 7/10 270199 270247.9 16.59 742.2 56638 9/10 56638 56638.5 15.28 372.4 NeuroLKH_M 8/10 270199 270204.4 11.13 538.5 10/10 56638 56638 7.85 260.8 LKH fl1400 1/10 20127 20160.3 2703.75 1372.9 fl1577 0/10 22254 22260.6 965.98 1577 VSR-LKH 1/10 20127 20160.3 3323.31 1380.6 0/10 22254 22255.8 3095.13 1577 NeuroLKH_R 20127 0/10 20165 20235.5 356.77 1400 22249 1/10 22249 22256.6 652.75 1445.8 NeuroLKH_M 0/10 20164 20169.4 754.03 1400 0/10 22254 22302.8 522.49 1577 LKH vm1748 9/10 336556 336557.3 17.62 1007.9 u1817 1/10 57201 57251.1 63.28 1817 VSR-LKH 10/10 336556 336556 5.42 37.8 7/10 57201 57212 159.43 967 NeuroLKH_R 336556 5/10 336556 336628 38.16 1282.9 57201 2/10 57201 57227.3 238.86 1803.4 NeuroLKH_M 10/10 336556 336556 13.65 460.2 2/10 57201 57225.2 126.01 1691.5 LKH rl1889 0/10 316549 316549.8 59.31 1889 d2103 0/10 80454 80462 111.69 2103 VSR-LKH 4/10 316536 316569 143.58 1393.9 0/10 80454 80454.2 619.38 2103 NeuroLKH_R 316536 0/10 316638 316648.7 141.23 1889 80450 4/10 80450 80452.4 339.12 1560.3 NeuroLKH_M 3/10 316536 316619.4 81.93 1485.6 3/10 80450 80454.6 213 1614.7 LKH u2152 3/10 64253 64287.7 88.79 1614 pcb3038 4/10 137694 137701.2 79.22 2078.6 VSR-LKH 7/10 64253 64270.1 178.54 1334.7 7/10 137694 137695.5 214.24 1422.2 NeuroLKH_R 64253 9/10 64253 64258.7 56.63 520.9 137694 8/10 137694 137695 151.91 1104 NeuroLKH_M 8/10 64253 64255.2 66.85 878.1 8/10 137694 137695 99.23 1084.6 LKH fl3795 0/10 28813 28813.7 34045.95 3795 fnl4461 9/10 182566 182566.5 31.89 923.1 VSR-LKH 0/10 28831 28831 75405 3795 10/10 182566 182566 19.94 89.1 NeuroLKH_R 28772 0/10 28999 29010.6 80797.24 3795 182566 10/10 182566 182566 27.91 171.5 NeuroLKH_M 0/10 29488 29495.3 1329.72 3795 10/10 182566 182566 19.26 151.5 LKH rl5915 0/10 565544 565581.2 221.29 5915 rl5934 0/10 556136 556309.8 371.79 5934 VSR-LKH 1/10 565530 
565580.8 896.59 5354.9 4/10 556045 556099.6 923.66 4804.7 NeuroLKH_R 565530 0/10 565585 565969.9 658.32 5915 556045 8/10 556045 556059.5 376.57 3470.2 NeuroLKH_M 1/10 565530 565579.5 365.82 5352.9 10/10 556045 556045 143.34 1529.8

Table S.2: TSPLIB results for each hard instance

-2.5cm Method Name Success Best Average Time Trials Name Success Best Average Time Trials LKH eil51 10/10 426 426 0 1 berlin52 10/10 7542 7542 0.01 0 VSR-LKH 10/10 426 426 0 1 10/10 7542 7542 0.02 0 NeuroLKH_R 426 10/10 426 426 0 1 7542 10/10 7542 7542 0.02 0 NeuroLKH_M 10/10 426 426 0 1 10/10 7542 7542 0.02 0 LKH st70 10/10 675 675 0.01 1 eil76 10/10 538 538 0 1 VSR-LKH 10/10 675 675 0.01 1 10/10 538 538 0 1 NeuroLKH_R 675 10/10 675 675 0.01 1 538 10/10 538 538 0 1 NeuroLKH_M 10/10 675 675 0.01 1 10/10 538 538 0 1 LKH pr76 10/10 108159 108159 0.02 1 rat99 10/10 1211 1211 0 1 VSR-LKH 10/10 108159 108159 0.02 1 10/10 1211 1211 0 1 NeuroLKH_R 108159 10/10 108159 108159 0.02 1 1211 10/10 1211 1211 0.01 1 NeuroLKH_M 10/10 108159 108159 0.02 1 10/10 1211 1211 0 1 LKH kroA100 10/10 21282 21282 0.02 1 kroB100 10/10 22141 22141 0.03 1.2 VSR-LKH 10/10 21282 21282 0.01 1 10/10 22141 22141 0.04 2.5 NeuroLKH_R 21282 10/10 21282 21282 0.01 1 22141 10/10 22141 22141 0.03 1 NeuroLKH_M 10/10 21282 21282 0.01 1 10/10 22141 22141 0.03 1 LKH kroC100 10/10 20749 20749 0.01 1 kroD100 10/10 21294 21294 0.02 1.8 VSR-LKH 10/10 20749 20749 0.02 1 10/10 21294 21294 0.02 1 NeuroLKH_R 20749 10/10 20749 20749 0.02 1 21294 10/10 21294 21294 0.02 1 NeuroLKH_M 10/10 20749 20749 0.02 1 10/10 21294 21294 0.02 1 LKH kroE100 10/10 22068 22068 0.03 3.2 rd100 10/10 7910 7910 0 1 VSR-LKH 10/10 22068 22068 0.06 8.5 10/10 7910 7910 0 1 NeuroLKH_R 22068 10/10 22068 22068 0.03 1 7910 10/10 7910 7910 0.01 1 NeuroLKH_M 10/10 22068 22068 0.04 4.8 10/10 7910 7910 0.01 1 LKH eil101 10/10 629 629 0 1 lin105 10/10 14379 14379 0 1 VSR-LKH 10/10 629 629 0 1 10/10 14379 14379 0 1 NeuroLKH_R 629 10/10 629 629 0 1 14379 10/10 14379 14379 0 1 NeuroLKH_M 10/10 629 629 0 1 10/10 14379 14379 0 1 LKH pr107 10/10 44303 44303 0.13 1 pr124 10/10 59030 59030 0.04 1 VSR-LKH 10/10 44303 44303 0.13 1 10/10 59030 59030 0.04 1 NeuroLKH_R 44303 10/10 44303 44303 0.14 1.1 59030 10/10 59030 59030 0.07 1 NeuroLKH_M 10/10 44303 44303 0.13 1 10/10 59030 59030 0.06 1 LKH bier127 10/10 118282 118282 0.01 1 ch130 10/10 6110 6110 0.03 1 VSR-LKH 10/10 118282 118282 0.02 1 10/10 6110 6110 0.07 7.3 NeuroLKH_R 118282 4/10 118282 118300.6 0.13 102.5 6110 10/10 6110 6110 0.02 1.1 NeuroLKH_M 10/10 118282 118282 0.01 1 10/10 6110 6110 0.03 2.1 LKH pr136 10/10 96772 96772 0.08 1 pr144 10/10 58537 58537 0.37 1 VSR-LKH 10/10 96772 96772 0.08 1 10/10 58537 58537 0.43 1 NeuroLKH_R 96772 10/10 96772 96772 0.15 4.5 58537 1/10 58537 58584.7 2.6 131.8 NeuroLKH_M 10/10 96772 96772 0.11 1 2/10 58537 58614 2.31 122.3 LKH ch150 10/10 6528 6528 0.04 1.7 kroA150 10/10 26524 26524 0.05 3.8 VSR-LKH 10/10 6528 6528 0.02 1 10/10 26524 26524 0.04 1 NeuroLKH_R 6528 10/10 6528 6528 0.02 1.1 26524 10/10 26524 26524 0.04 2.6 NeuroLKH_M 10/10 6528 6528 0.02 1.1 10/10 26524 26524 0.02 1 LKH pr152 10/10 73682 73682 0.48 29.4 u159 10/10 42080 42080 0.01 1 VSR-LKH 8/10 73682 73709.2 0.69 47 10/10 42080 42080 0.01 1 NeuroLKH_R 73682 8/10 73682 73709.2 1.44 59.6 42080 10/10 42080 42080 0.01 1 NeuroLKH_M 9/10 73682 73695.6 0.87 38.7 10/10 42080 42080 0.01 1

Table S.3: TSPLIB results for each easy instance

-2.5cm Method Name Success Best Average Time Trials Name Success Best Average Time Trials LKH d198 10/10 15780 15780 0.57 1 kroA200 10/10 29368 29368 0.06 1.7 VSR-LKH 10/10 15780 15780 0.43 1 10/10 29368 29368 0.06 1.5 NeuroLKH_R 15780 0/10 15789 15825 2.54 198 29368 10/10 29368 29368 0.05 1 NeuroLKH_M 10/10 15780 15780 0.87 1 10/10 29368 29368 0.04 1 LKH kroB200 10/10 29437 29437 0.02 1 ts225 10/10 126643 126643 0.04 1 VSR-LKH 10/10 29437 29437 0.03 1 10/10 126643 126643 0.02 1 NeuroLKH_R 29437 10/10 29437 29437 0.02 1 126643 10/10 126643 126643 0.06 1 NeuroLKH_M 10/10 29437 29437 0.02 1 10/10 126643 126643 0.06 1 LKH tsp225 10/10 3916 3916 0.06 1 pr226 10/10 80369 80369 0.08 1 VSR-LKH 10/10 3916 3916 0.07 1 10/10 80369 80369 0.1 13.3 NeuroLKH_R 3916 10/10 3916 3916 0.06 1 80369 6/10 80369 80381.7 1.34 146.2 NeuroLKH_M 10/10 3916 3916 0.06 1 10/10 80369 80369 0.22 5.9 LKH gil262 10/10 2378 2378 0.14 10.6 pr264 10/10 49135 49135 0.24 14.4 VSR-LKH 10/10 2378 2378 0.05 1.7 10/10 49135 49135 0.19 1 NeuroLKH_R 2378 10/10 2378 2378 0.13 8 49135 10/10 49135 49135 0.13 6.2 NeuroLKH_M 10/10 2378 2378 0.05 2.2 10/10 49135 49135 0.09 2.4 LKH a280 10/10 2579 2579 0.03 1 lin318 10/10 42029 42029 0.23 27.9 VSR-LKH 10/10 2579 2579 0.02 1 10/10 42029 42029 0.09 1.8 NeuroLKH_R 2579 10/10 2579 2579 0.02 1 42029 10/10 42029 42029 0.18 3.6 NeuroLKH_M 10/10 2579 2579 0.03 1 10/10 42029 42029 0.15 5.9 LKH rd400 10/10 15281 15281 0.23 33 fl417 10/10 11861 11861 2.69 7.3 VSR-LKH 10/10 15281 15281 0.23 11.6 10/10 11861 11861 1.91 3.7 NeuroLKH_R 15281 10/10 15281 15281 0.11 3.9 11861 5/10 11861 11867.6 16.64 337.2 NeuroLKH_M 10/10 15281 15281 0.12 4.7 9/10 11861 11861.1 16.7 51.7 LKH pr439 10/10 107217 107217 0.59 39.5 pcb442 10/10 50778 50778 0.16 8.2 VSR-LKH 10/10 107217 107217 0.44 22.1 10/10 50778 50778 0.07 3 NeuroLKH_R 107217 3/10 107217 107267.4 1.64 320.1 50778 10/10 50778 50778 0.11 3.8 NeuroLKH_M 9/10 107217 107224.2 0.71 90.3 10/10 50778 50778 0.18 6.9 LKH u574 10/10 36905 36905 0.8 149.9 p654 10/10 34643 34643 7.04 22.9 VSR-LKH 10/10 36905 36905 0.39 29.2 10/10 34643 34643 4.28 9 NeuroLKH_R 36905 10/10 36905 36905 0.2 3.8 34643 1/10 34643 34765.8 40.27 619 NeuroLKH_M 10/10 36905 36905 0.11 1.9 10/10 34643 34643 2.63 7 LKH d657 10/10 48912 48912 0.48 33.5 u724 10/10 41910 41910 1.53 125.4 VSR-LKH 10/10 48912 48912 0.44 21 10/10 41910 41910 0.85 23.3 NeuroLKH_R 48912 5/10 48912 48912.5 6.65 511.5 41910 10/10 41910 41910 0.94 46.6 NeuroLKH_M 10/10 48912 48912 0.39 10 10/10 41910 41910 0.64 16.8 LKH rat783 10/10 8806 8806 0.08 4.2 d1291 10/10 50801 50801 6.27 192.1 VSR-LKH 10/10 8806 8806 0.11 3.9 10/10 50801 50801 2.51 39.5 NeuroLKH_R 8806 10/10 8806 8806 0.14 4.2 50801 9/10 50801 50803.4 5.64 274.4 NeuroLKH_M 10/10 8806 8806 0.21 12.2 7/10 50801 50808.2 9.46 437.4 LKH u1432 10/10 152970 152970 0.43 5.3 d1655 10/10 62128 62128 5.44 176 VSR-LKH 10/10 152970 152970 0.55 5 10/10 62128 62128 0.94 9.8 NeuroLKH_R 152970 10/10 152970 152970 0.56 7.1 62128 8/10 62128 62128.2 22.22 870.4 NeuroLKH_M 10/10 152970 152970 0.43 3.8 10/10 62128 62128 7.86 214.1 LKH u2319 10/10 234256 234256 0.46 3.1 pr2392 10/10 378032 378032 0.4 5.8 VSR-LKH 10/10 234256 234256 0.89 3.9 10/10 378032 378032 0.78 8.7 NeuroLKH_R 234256 10/10 234256 234256 0.67 3.5 378032 10/10 378032 378032 1.22 25.5 NeuroLKH_M 10/10 234256 234256 0.37 2.6 10/10 378032 378032 1.31 25.9

Table S.4: TSPLIB results for each easy instance (continued)

-2.5cm LKH with 100 trials as time limit LKH with 1000 trials as time limit LKH with 10000 trials as time limit Name Method Time Best Avg Suc Time Best Avg Suc Time Best Avg Suc X-n101-k25 LKH 1.2 27744 28214.5 0 13 27591 27794.3 6 131 27591 27667.0 33 27591 NeuroLKH 27665 28146.6 0 27591 27790.3 5 27591 27669.5 30 X-n106-k14 LKH 0.9 26495 26730.1 0 10 26426 26557.6 0 105 26381 26438.3 0 26362 NeuroLKH 26447 26712.8 0 26396 26528.5 0 26381 26428.5 0 X-n110-k13 LKH 0.4 14971 15216.2 2 3 14971 15073.3 31 29 14971 15020.4 53 14971 NeuroLKH 14971 15207.3 2 14971 15074.2 32 14971 15022.3 58 X-n115-k10 LKH 0.2 12750 12838.3 0 2 12747 12778.3 14 17 12747 12770.3 46 12747 NeuroLKH 12747 12837.6 1 12747 12783.9 14 12747 12771.8 40 X-n120-k6 LKH 0.3 13332 13547.4 1 2 13332 13394.3 10 21 13332 13358.6 40 13332 NeuroLKH 13333 13519.9 0 13332 13389.7 5 13332 13352.9 33 X-n125-k30 LKH 3.1 56167 56690.8 0 31 55733 56041.8 0 335 55546 55813.0 0 55539 NeuroLKH 56011 56624.7 0 55645 55981.7 0 55539 55779.7 1 X-n129-k18 LKH 0.8 29173 29635.5 0 8 28967 29257.5 0 86 28948 29108.8 0 28940 NeuroLKH 29160 29566.1 0 28948 29224.3 0 28948 29081.3 0 X-n134-k13 LKH 1.1 11024 11215.7 0 10 10931 11048.8 0 94 10916 10994.8 1 10916 NeuroLKH 11023 11194.9 0 10941 11044.6 0 10916 10987.1 1 X-n139-k10 LKH 0.4 13670 13894.9 0 3 13605 13713.6 0 33 13590 13660.4 5 13590 NeuroLKH 13672 13871.1 0 13605 13702.6 0 13590 13657.0 6 X-n143-k7 LKH 0.5 15765 16186.5 0 5 15737 15910.4 0 50 15711 15812.4 0 15700 NeuroLKH 15781 16208.1 0 15726 15885.9 0 15726 15787.3 0 X-n148-k46 LKH 0.9 43833 44382.4 0 9 43485 43819.2 0 89 43448 43635.2 18 43448 NeuroLKH 43809 44283.0 0 43514 43818.1 0 43448 43634.7 19 X-n153-k22 LKH 1.7 21328 21559.2 0 15 21236 21326.8 0 156 21225 21263.6 0 21220 NeuroLKH 21298 21493.7 0 21240 21311.1 0 21225 21272.1 0 X-n157-k13 LKH 0.5 16903 17008.7 0 4 16876 16911.0 8 40 16876 16893.4 40 16876 NeuroLKH 16900 17006.8 0 16876 16904.9 14 16876 16889.0 52 X-n162-k11 LKH 0.3 14179 14362.6 0 3 14156 14225.2 0 26 14138 14196.8 6 14138 NeuroLKH 14190 14388.8 0 14161 14245.3 0 14138 14213.9 2 X-n167-k10 LKH 0.6 20826 21319.8 0 7 20583 20863.2 0 65 20557 20749.5 1 20557 NeuroLKH 20687 21270.5 0 20592 20857.8 0 20557 20740.3 1 X-n172-k51 LKH 1.2 46141 46679.2 0 11 45742 46078.0 0 122 45607 45840.5 5 45607 NeuroLKH 46134 46533.1 0 45707 45994.7 0 45607 45783.9 3 X-n176-k26 LKH 3.6 48035 48819.7 0 33 47930 48273.6 0 353 47840 48090.3 0 47812 NeuroLKH 48147 48726.3 0 47950 48279.7 0 47812 48098.9 1 X-n181-k23 LKH 0.5 25677 25829.7 0 4 25611 25691.2 0 42 25582 25645.3 0 25569 NeuroLKH 25691 25822.8 0 25603 25685.9 0 25577 25641.2 0 X-n186-k15 LKH 1.0 24297 24882.6 0 10 24227 24528.3 0 104 24149 24359.6 0 24145 NeuroLKH 24511 24911.5 0 24178 24523.0 0 24147 24361.7 0 X-n190-k8 LKH 0.9 17187 17418.0 0 8 17065 17275.4 0 84 16993 17155.2 0 16980 NeuroLKH 17160 17410.0 0 17041 17259.8 0 16985 17145.1 0 X-n195-k51 LKH 1.4 44911 45594.9 0 11 44437 44799.6 0 117 44334 44558.1 0 44225 NeuroLKH 44685 45244.5 0 44422 44688.0 0 44237 44524.8 0 X-n200-k36 LKH 4.3 59329 59984.3 0 39 58919 59174.2 0 405 58643 58927.5 0 58578 NeuroLKH 59229 59803.9 0 58844 59104.6 0 58694 58937.4 0

Table S.5: CVRPLIB results

-2.5cm LKH with 100 trials as time limit LKH with 1000 trials as time limit LKH with 10000 trials as time limit Name Method Time Best Avg Suc Time Best Avg Suc Time Best Avg Suc X-n204-k19 LKH 0.6 19795 20159.5 0 5 19718 19880.5 0 49 19662 19777.9 0 19565 NeuroLKH 19794 20076.7 0 19692 19857.7 0 19583 19776.3 0 X-n209-k16 LKH 0.9 31259 31648.1 0 9 30818 31214.9 0 93 30700 31028.9 0 30656 NeuroLKH 31163 31555.8 0 30864 31140.2 0 30722 30969.3 0 X-n214-k11 LKH 2.6 11727 12131.2 0 23 11147 11487.5 0 229 10974 11182.9 0 10856 NeuroLKH 11702 12128.5 0 11235 11498.2 0 10988 11214.2 0 X-n219-k73 LKH 1.4 117821 118242.7 0 10 117595 117790.2 1 101 117595 117684.3 3 117595 NeuroLKH 117046 117998.3 0 117622 117733.2 0 117595 117654.8 4 X-n223-k34 LKH 1.4 41250 41880.6 0 12 40766 41087.1 0 127 40560 40818.7 0 40437 NeuroLKH 41066 41662.6 0 40641 41022.8 0 40563 40821.3 0 X-n228-k23 LKH 1.6 26051 26541.4 0 15 25863 26037.7 0 150 25781 25910.4 0 25742 NeuroLKH 26067 26614.9 0 25835 26030.5 0 25791 25907.7 0 X-n233-k16 LKH 0.5 19615 19885.4 0 4 19379 19599.1 0 39 19305 19477.2 0 19230 NeuroLKH 19499 19831.7 0 19381 19597.4 0 19324 19473.2 0 X-n237-k14 LKH 0.8 27381 27829.6 0 7 27164 27406.5 0 65 27050 27276.5 0 27042 NeuroLKH 27324 27789.5 0 27124 27402.8 0 27042 27240.0 1 X-n242-k48 LKH 2.2 84353 85218.4 0 19 83419 83826.9 0 198 83045 83401.3 0 82751 NeuroLKH 84090 84685.6 0 83299 83743.7 0 83042 83357.2 0 X-n247-k50 LKH 2.8 37681 38206.5 0 26 37353 37701.6 0 280 37289 37457.1 0 37274 NeuroLKH 37629 38118.3 0 37326 37638.8 0 37292 37454.3 0 X-n251-k28 LKH 1.3 39394 39831.8 0 11 39010 39274.9 0 117 38838 39067.3 0 38684 NeuroLKH 39277 39720.2 0 38988 39259.8 0 38887 39069.3 0 X-n256-k16 LKH 2.4 19931 20953.7 0 17 19150 19519.9 0 148 18926 19164.9 0 18839 NeuroLKH 19681 20730.2 0 19046 19433.9 0 18889 19143.3 0 X-n261-k13 LKH 1.2 27395 27891.3 0 13 26966 27367.3 0 150 26686 27104.9 0 26558 NeuroLKH 27174 27746.3 0 26749 27308.2 0 26661 27074.4 0 X-n266-k58 LKH 4.1 77457 78371.5 0 35 76117 76718.3 0 359 75803 76193.4 0 75478 NeuroLKH 76864 77879.7 0 76175 76582.7 0 75876 76187.2 0 X-n270-k35 LKH 1.7 35999 36580.2 0 14 35513 35870.2 0 142 35407 35613.1 0 35291 NeuroLKH 35808 36425.9 0 35509 35817.5 0 35424 35598.2 0 X-n275-k28 LKH 0.7 21455 21784.3 0 5 21304 21524.7 0 50 21245 21422.8 1 21245 NeuroLKH 21515 21715.7 0 21320 21512.0 0 21281 21424.8 0 X-n280-k17 LKH 2.2 34230 34932.1 0 22 33790 34218.8 0 229 33633 33943.8 0 33503 NeuroLKH 34071 34844.4 0 33699 34178.3 0 33632 33943.0 0 X-n284-k15 LKH 0.7 20917 21194.0 0 7 20580 20862.8 0 76 20381 20655.2 0 20215 NeuroLKH 20903 21199.4 0 20609 20849.8 0 20455 20639.5 0 X-n289-k60 LKH 5.9 97877 99666.8 0 53 96381 97129.8 0 557 95687 96226.0 0 95151 NeuroLKH 97731 99084.0 0 96163 96998.4 0 95754 96154.4 0 X-n294-k50 LKH 2.1 48490 49351.2 0 17 47575 48009.5 0 176 47381 47644.4 0 47161 NeuroLKH 48093 48990.2 0 47550 47914.8 0 47354 47616.2 0 X-n298-k31 LKH 1.7 35568 36543.4 0 12 34732 35199.4 0 121 34343 34764.9 0 34231 NeuroLKH 35380 36292.7 0 34656 35113.6 0 34320 34763.9 0

Table S.6: CVRPLIB results (continued)

Columns are grouped by time limit, namely the running time of LKH with 100, 1000 and 10000 trials; times are in seconds, NeuroLKH uses the same time limits as LKH, and the optimal tour distance is given next to the instance name.

| Name (Opt.) | Method | Time | Best | Avg | Suc | Time | Best | Avg | Suc | Time | Best | Avg | Suc |
| R201 (1252372) | LKH | 0.6 | 1252372 | 1275464.5 | 1 | 4 | 1252372 | 1258897.3 | 8 | 35 | 1252372 | 1254027.4 | 21 |
| | NeuroLKH | | 1253210 | 1271983.1 | 0 | | 1252372 | 1257114.3 | 5 | | 1252372 | 1253575.6 | 11 |
| R202 (1191698) | LKH | 4.5 | 1195297 | 1234362.3 | 0 | 33 | 1191698 | 1207016.3 | 19 | 283 | 1191698 | 1197334.8 | 82 |
| | NeuroLKH | | 1193776 | 1221507.1 | 0 | | 1191698 | 1204530.3 | 31 | | 1191698 | 1193964.5 | 87 |
| R203 (939504) | LKH | 2.1 | 947357 | 964214.4 | 0 | 16 | 941996 | 948987.2 | 0 | 143 | 939504 | 943864.1 | 6 |
| | NeuroLKH | | 943363 | 957044.9 | 0 | | 941405 | 947506.7 | 0 | | 939504 | 943832.2 | 3 |
| R204 (825510) | LKH | 4.9 | 836241 | 879723.0 | 0 | 37 | 829440 | 846041.2 | 0 | 320 | 825510 | 838430.7 | 2 |
| | NeuroLKH | | 838945 | 875614.2 | 0 | | 825510 | 846814.1 | 1 | | 825510 | 837939.1 | 8 |
| R205 (994429) | LKH | 1.4 | 994429 | 1046294.2 | 1 | 11 | 994429 | 1024682.4 | 8 | 95 | 994429 | 1014571.8 | 40 |
| | NeuroLKH | | 1003685 | 1038416.6 | 0 | | 994429 | 1022870.4 | 6 | | 994429 | 1009598.9 | 45 |
| R206 (906145) | LKH | 1.5 | 913333 | 942722.6 | 0 | 12 | 909820 | 926079.9 | 0 | 104 | 906145 | 918597.5 | 19 |
| | NeuroLKH | | 913333 | 940668.2 | 0 | | 906145 | 925617.6 | 2 | | 906145 | 918009.4 | 24 |
| R207 (890608) | LKH | 6.0 | 908532 | 965102.0 | 0 | 51 | 894793 | 929064.0 | 0 | 445 | 890608 | 915756.4 | 1 |
| | NeuroLKH | | 903583 | 956950.3 | 0 | | 893384 | 924560.7 | 0 | | 890608 | 913160.8 | 5 |
| R208 (726817) | LKH | 2.0 | 726817 | 751164.7 | 2 | 15 | 726817 | 736114.7 | 9 | 125 | 726817 | 731161.9 | 16 |
| | NeuroLKH | | 727258 | 744832.5 | 0 | | 726817 | 733925.3 | 6 | | 726817 | 730790.3 | 9 |
| R209 (909158) | LKH | 1.7 | 918711 | 946581.2 | 0 | 14 | 913141 | 927854.0 | 0 | 113 | 909158 | 920110.0 | 7 |
| | NeuroLKH | | 914609 | 935769.5 | 0 | | 909158 | 922974.0 | 2 | | 909158 | 917506.1 | 9 |
| R210 (939373) | LKH | 1.7 | 951624 | 979061.5 | 0 | 13 | 939373 | 959573.6 | 1 | 114 | 939373 | 953584.1 | 20 |
| | NeuroLKH | | 939373 | 967980.8 | 1 | | 939373 | 955815.6 | 9 | | 939373 | 950722.1 | 39 |
| R211 (890930) | LKH | 5.1 | 910853 | 963151.8 | 0 | 44 | 893168 | 926350.7 | 0 | 378 | 890930 | 914120.6 | 2 |
| | NeuroLKH | | 909830 | 956837.2 | 0 | | 892988 | 923050.2 | 0 | | 890930 | 912125.8 | 2 |

Table S.7: Solomon benchmark results