1 Introduction
The Traveling Salesperson Problem (TSP) is an intriguing, fundamental, and well-studied hard optimization problem. Given a complete graph, the TSP asks for a Hamiltonian cycle of minimum length, i.e., a round-trip salesperson tour that visits each node exactly once before returning to the start node. In the Euclidean TSP (ETSP), nodes are associated with point coordinates in the Euclidean space and pairwise (symmetric) inter-node distances are given by the Euclidean distance; the ETSP remains NP-hard.
Since its introduction in 1930, a substantial body of knowledge has been built around the TSP. As a consequence, a plethora of methods has been developed, ranging from sophisticated exact solvers (which guarantee to find an optimum) to fast heuristic approaches with no performance guarantees at all. In the domain of exact TSP solving, the branch-and-cut based Concorde solver by [1] is the state of the art. However, even though instances with hundreds of nodes can be solved within seconds [7], no guarantees of reasonable runtime can be given for large instances.
For the ETSP, distances adhere to the triangle inequality induced by the Euclidean metric. This property can be leveraged to design approximation algorithms. For a minimization problem, an $\alpha$-approximation algorithm guarantees that the tour length $A(I)$ it produces on a problem instance $I$ is at most $\alpha \cdot \mathrm{OPT}(I)$, where $\mathrm{OPT}(I)$ is the optimal tour length. Christofides [6] introduced an algorithm that achieves an approximation factor of $\alpha = 3/2$, which is the best constant approximation factor known for the ETSP.
Celebrated work by Arora [2] and – independently – by Mitchell [19] introduced a Polynomial Time Approximation Scheme (PTAS) for the ETSP which guarantees to produce, in polynomial time, solutions of length at most $(1+\varepsilon) \cdot \mathrm{OPT}$ for each constant $\varepsilon > 0$. However, the algorithms are highly sophisticated and, to the best of our knowledge, no practical implementation of this PTAS is available. Moreover, a PTAS naturally suffers from impractical runtimes if the demanded accuracy is increased – in other words, a reduction of $\varepsilon$ goes hand in hand with a dramatic increase of the polynomial degree.
In recent years, tremendous advances have been made in heuristic TSP solving, where no formal performance guarantees can be given. Nevertheless, the two best-performing heuristics, LKH by [9] (based on sophisticated k-opt moves and the Lin-Kernighan heuristic) and EAX by [20] (a genetic algorithm adopting the eponymous edge assembly crossover operator and sophisticated diversity preservation), solve instances with thousands of nodes to optimality within reasonable time limits [11]. The respective restart versions LKH+r and EAX+r, which trigger a restart once the internal stopping conditions of the respective vanilla versions are satisfied, pose the state of the art in inexact TSP solving [14, 13]. Recent endeavours by [22] extend both solvers with a sophisticated crossover operator – the generalized partition crossover (GPX2) – which has shown superior performance over the vanilla version of EAX on large instances with node numbers in the five-digit range.

In the field of per-instance algorithm selection (AS) – see, e.g., the survey of [12] for further details – the goal is to build a model which automatically selects the best-performing algorithm from a portfolio of algorithms with complementary performances. In the case of the ETSP, the complementary behavior of EAX(+r) and LKH(+r) across a wide range of problem instances has been leveraged in several studies in recent years [14, 13]. Both of these works focused on optimality of the found solutions, i.e., the runtime until a solver found a tour of optimal (i.e., minimal) length was measured, and runs which did not succeed within the given time of one hour were penalized.
Despite their success in solving ETSP instances to optimality, little is known about the empirical approximation qualities, i.e., the anytime behaviour, of LKH+r and EAX+r, building on the concept of empirical runtime distributions [10]. Our work will thus shed light on the relationship between runtimes and the respective approximation qualities. This is conceptually similar to the commonly accepted benchmarking practice in single-objective continuous optimization on the Black-Box Optimization Benchmark (BBOB, [8]). It should be noted, however, that despite the simultaneous consideration of runtime and solution quality, the herein considered analysis of the anytime behavior of a solver (w.r.t. its solution quality) differs substantially from the multi-objective approach that was taken in [5]. Our analysis is supported by an extensive study on a wide range of TSP instances from the literature with 500, 1,000 and 2,000 nodes, respectively (cf. the instance sizes in Tabs. 1 and 2). More precisely, we pursue three research questions in this work:

R1: Given different approximation factors $\varepsilon$ and time limits $T$, what is the probability of calculating a $(1+\varepsilon)$-approximate solution with different variants of LKH+r and EAX+r on a large and diverse set of ETSP instances within time $T$?

R2: Which approximation quality can we expect given a time limit $T$?

R3: How can automated algorithm selection approaches make use of information on the anytime behaviour of TSP solvers?
These questions are addressed in the remainder of the paper, which is organized as follows. Section 2 introduces the methodology underlying the experimental study, including problem instances and algorithms. Results are presented and analyzed in Section 3, and the paper concludes with a discussion of promising future research directions in Section 4.
2 Methodology
In this section we detail the setup of our experimental study.
2.1 Problem instances
Several AS studies revealed – in line with intuition – that characteristics of TSP instances, such as clustering properties or the depth of a minimum spanning tree, may have a strong impact on the running time until an optimal solution is found [13]. It is legitimate to assume that this will also hold for the approximation case. However, feature impacts and relations may change. Thus, to investigate – and ideally support – our assumptions empirically, we conducted an experimental study across a wide range of different ETSP instances. For better comparability of our results, our setup is aligned with previous studies [14, 13, 16] and hence covers the following TSP sets:

rue: Classical Random Uniform Euclidean (rue) instances, where point coordinates are spread uniformly at random within a bounded square of the Euclidean plane.
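As a minimal illustration, rue instances can be sampled as follows; the function name and the unit square $[0, 1]^2$ as the bounded region are our own assumptions:

```python
import numpy as np

def rue_instance(n, seed=None):
    """Sample n city coordinates uniformly at random in the unit square.

    The unit square is an assumed bound; any bounded region works the same way.
    """
    rng = np.random.default_rng(seed)
    return rng.uniform(0.0, 1.0, size=(n, 2))
```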

netgen: This type of strongly clustered instance was proposed in [17]. For a given number of clusters, the respective cluster centers are placed well-spread in the Euclidean plane by Latin Hypercube Sampling (LHS). Subsequently, cities are sampled around each cluster center, ensuring cluster segregation.
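The construction can be sketched as follows; this is not the original generator – the Gaussian spread, the clipping to the unit square, and the function name are our own simplifying assumptions (the original additionally enforces cluster segregation):

```python
import numpy as np
from scipy.stats import qmc

def netgen_like_instance(n, n_clusters, seed=None, spread=0.02):
    """Clustered instance: LHS-spread cluster centers, cities sampled around them."""
    rng = np.random.default_rng(seed)
    # Latin Hypercube Sampling places the cluster centers well-spread in [0,1]^2.
    centers = qmc.LatinHypercube(d=2, seed=seed).random(n=n_clusters)
    sizes = np.full(n_clusters, n // n_clusters)
    sizes[: n % n_clusters] += 1  # distribute the remainder over clusters
    # Sample cities around each center (assumed Gaussian spread, clipped to [0,1]^2).
    return np.vstack([
        np.clip(c + rng.normal(scale=spread, size=(m, 2)), 0.0, 1.0)
        for c, m in zip(centers, sizes)
    ])
```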

morphed: A morphed instance originates from combining a rue with a netgen instance of equal size. First, an optimal weighted point matching is calculated between the point coordinates of both instances. Next, the matched points are used to calculate new points by convex combination of their coordinates. This approach was introduced in [18] and later improved in [17].
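The two steps – optimal point matching, then convex combination – can be sketched as follows; the function name and the use of a linear assignment on Euclidean costs as the "optimal weighted point matching" are our own assumptions:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def morph(pts_a, pts_b, alpha=0.5):
    """Morph two equal-size instances: match points optimally, then take
    the convex combination of the matched coordinates."""
    cost = cdist(pts_a, pts_b)                # pairwise Euclidean distances
    rows, cols = linear_sum_assignment(cost)  # min-cost perfect matching
    return alpha * pts_a[rows] + (1.0 - alpha) * pts_b[cols]
```

With `alpha = 0.5` each new city lies halfway between its two matched originals; sweeping `alpha` from 1 to 0 interpolates from the first instance to the second.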

tspgen: These instances are generated by sequential application of mutations to an initial rue instance, as proposed by [4]. For each mutation, a random subset of points is selected and rearranged by means of “creative” mutation operators, e.g., a grid mutation, which aligns a random subset of points in a grid structure. These operators are inspired by observations on real-world instances (e.g., from Very Large Scale Integration, VLSI) and are meant to produce structurally heterogeneous instances.
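To give an idea of such an operator, a grid mutation could look as follows; this is an illustrative sketch, not the operator from [4] – placing the grid inside the subset's bounding box and the function name are our own assumptions:

```python
import numpy as np

def grid_mutation(points, k, seed=None):
    """Rearrange a random subset of k points into a regular grid placed
    inside the subset's bounding box (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pts = points.copy()
    idx = rng.choice(len(pts), size=k, replace=False)
    side = int(np.ceil(np.sqrt(k)))           # smallest grid covering k points
    lo, hi = pts[idx].min(axis=0), pts[idx].max(axis=0)
    xs = np.linspace(lo[0], hi[0], side)
    ys = np.linspace(lo[1], hi[1], side)
    grid = np.array([(x, y) for y in ys for x in xs])[:k]
    pts[idx] = grid                           # snap the subset onto the grid
    return pts
```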

evolved: These TSP instances were evolved by means of an evolutionary algorithm which minimizes the ratio of Penalized Average Runtime (PAR, [3]) scores^1 of the solvers EAX+r and LKH+r, producing instances that are easy for one and hard(er) for the competitor. The set of evolved instances considered within this work is taken from [4].

^1 The PAR score is the average running time of independent solver runs until an optimal solution is found. Runs which are not successful within time $T$ are penalized with $f \cdot T$, where $f$ is a penalty factor usually set to 10.
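The PAR score described in the footnote amounts to a one-liner; a minimal sketch (function name and argument layout are our own):

```python
def par_score(runtimes, successes, T, f=10):
    """Penalized average runtime: a failed run (no optimal tour within
    time T) counts as f * T instead of its actual runtime."""
    penalized = [t if ok else f * T for t, ok in zip(runtimes, successes)]
    return sum(penalized) / len(penalized)
```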
For the instance sets rue, netgen, morphed and tspgen we consider 150 instances of each size; the 150 netgen instances per size are split into subsets of 30 instances sharing a common number of clusters. The evolved instances – taken from [4] – are restricted to 500 nodes^2; there are 100 instances each which are easy for EAX+r and LKH+r, respectively. Summing up, our benchmark set comprises 2,000 ETSP instances in total. Note that we intentionally do not include instances from the well-known TSPLIB [21] benchmark set: to make well-founded statements about the research questions addressed in this work, we require a large and systematic set of instances of equal size from different classes, whereas TSPLIB instances are very heterogeneous in both size and structure, which does not allow for a proper evaluation.

^2 Those instances were generated by an evolutionary algorithm where a single fitness function evaluation requires (1) a call of the exact Concorde solver and (2) multiple runs of LKH+r and EAX+r, respectively. This becomes computationally very expensive for larger instances.
2.2 Considered Algorithms
In total, our study considers six different solvers for the ETSP. The first four are restart variants of LKH [9], while the latter two are restart variants of EAX [20]^3. Several of these variants incorporate the generalized partition crossover (GPX2) into the respective algorithms [22].

^3 Restart variants trigger a cold restart once the internal stopping conditions are hit. This modification of LKH and EAX was introduced in [14].

LKH variants: The LKH algorithm is an iterated local search algorithm that heuristically generates k-opt moves. A powerful improvement of LKH was the introduction of multi-trial LKH, where several solutions originating from soft restarts of the Lin-Kernighan heuristic are recombined by a partition crossover operator named Iterative Partial Transcription (IPT). A recent proposal replaces IPT with the alternative crossover operator GPX2. Additionally, LKH v2.0.9 allows using both IPT and GPX2 in sequence. Therefore, we consider the four restart variants LKH+r (IPT), i.e., the vanilla version of LKH+r, LKH+r (GPX), LKH+r (IPT+GPX) and LKH+r (GPX+IPT).

EAX variants: EAX is a powerful genetic algorithm which uses the Edge Assembly Crossover (EAX) operator to combine two parents. The operator is designed to keep as many edges from the parents as possible and introduces only a few short edges to complete the tour. The EAX algorithm employs a sophisticated diversity preservation technique based on edge entropy to prevent premature convergence. We use the restart version EAX+r and additionally consider a modified version in which individuals created by EAX+r are further recombined by applying GPX2. It should be noted that our modification is more straightforward than the different variants introduced in [22].
2.3 Estimation of Probabilities
Next, we describe the process of probability estimation. Given an approximation factor $\varepsilon \geq 0$, a time limit $T > 0$, a stochastic algorithm $A$, and an instance $I$ with optimal tour length $\mathrm{OPT}(I)$, we denote the probability of reaching a solution of the desired quality within the time limit as $p(A, I, \varepsilon, T)$. Given that $A$ is stochastic and the trajectories of $k$ independent runs of that algorithm on instance $I$ are available, the probability can be estimated by the relative number of runs that succeeded in finding a $(1+\varepsilon)$-approximation within time $T$, i.e.,

$$\hat{p}(A, I, \varepsilon, T) = \frac{1}{k} \sum_{j=1}^{k} \mathbb{1}\left( A_j(I, T) \leq (1+\varepsilon) \cdot \mathrm{OPT}(I) \right).$$

Here, $A_j(I, T)$ is the (length of the) incumbent solution of $A$ on $I$ in the $j$-th run after time $T$, and $\mathbb{1}(\cdot)$ is the indicator function evaluating to $1$ if its argument is true. Given a set of instances $\mathcal{I}$, an estimator for the success probability on the set is the average probability over all instances in $\mathcal{I}$, i.e.,

$$\hat{p}(A, \mathcal{I}, \varepsilon, T) = \frac{1}{|\mathcal{I}|} \sum_{I \in \mathcal{I}} \hat{p}(A, I, \varepsilon, T).$$
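A minimal sketch of these two estimators, assuming each run's trajectory is available as a time-sorted list of (time, incumbent tour length) pairs; the data layout and function names are our own:

```python
import numpy as np

def success_prob(trajectories, opt, eps, T):
    """Fraction of runs whose incumbent tour length after time T is at
    most (1 + eps) * opt.  Each trajectory is a time-sorted list of
    (time, incumbent_length) pairs."""
    hits = 0
    for traj in trajectories:
        # Incumbent at time T = best value logged at or before T.
        lengths = [length for t, length in traj if t <= T]
        if lengths and min(lengths) <= (1.0 + eps) * opt:
            hits += 1
    return hits / len(trajectories)

def avg_success_prob(instances, eps, T):
    """Average estimated success probability over a set of instances,
    each given as a (trajectories, optimal_tour_length) pair."""
    return np.mean([success_prob(tr, opt, eps, T) for tr, opt in instances])
```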
3 Experiments
3.1 Experimental Setup
Each of the six algorithms was run on each instance $k$ times with different random seeds in order to account for stochasticity. Throughout those experiments, we used a cutoff time of $T = 3{,}600$ seconds (i.e., one hour). The algorithms log the incumbent, i.e., best-so-far, solution in every run.
As all results for the TSP sets netgen, morphed and tspgen were qualitatively similar, we combined the respective information into a single set called “structured”.
3.2 Perspective 1: First hitting times
In practical applications, an upper bound on solver performance is often desired in order to realistically assess the worst case. Therefore, it is of interest how long it will take a solver at most to find a $(1+\varepsilon)$-approximation of the true optimum. The chosen solvers were executed for a variety of approximation gaps $\varepsilon$ and for each combination of TSP set and instance size. Note that in this part of our study, we analyze the algorithms’ performances across a wide range of approximation factors. Thereby, we get a comprehensive overview of the behavior of the different algorithms, and can later zoom in on the more relevant areas. We take a pessimistic perspective and estimate the first hitting time as the maximum time needed by each algorithm to find a solution of the corresponding quality for the first time, across all instances and runs of each instance set and size, respectively.
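This pessimistic estimate can be sketched as follows, again assuming time-sorted (time, incumbent tour length) trajectories per run; the data layout and function name are our own:

```python
def max_first_hitting_time(all_runs, opt_lengths, eps):
    """Worst case over all instances and runs of the earliest time a
    (1+eps)-approximate tour was found.  all_runs[i] is the list of
    trajectories (time-sorted (time, length) pairs) for instance i;
    returns float('inf') if any run never reaches the target quality."""
    worst = 0.0
    for runs, opt in zip(all_runs, opt_lengths):
        target = (1.0 + eps) * opt
        for traj in runs:
            # First logged time at which the incumbent meets the target.
            hit = next((t for t, length in traj if length <= target),
                       float('inf'))
            worst = max(worst, hit)
    return worst
```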
As depicted in Fig. 1, both EAX variants are extremely fast for large values of $\varepsilon$, i.e., their solutions at early stages of the runs are already of very good quality. In fact, for large approximation gaps, EAX+r (GPX) had the lowest first hitting times across all eight considered scenarios, which supports the effectiveness of the sophisticated crossover operator GPX. More precisely, on small and medium-sized instances, all runs of EAX+r (GPX) found a tour of the corresponding quality within less than a second; on the larger instances, this optimizer needed just slightly more than a second. Yet, it is also observable that the classical EAX+r performs better than its GPX counterpart for decreasing $\varepsilon$ values and even outperforms it for the smallest approximation gaps.
Further noticeable findings can be derived for LKH and its variants. First, the trajectories of all four LKH variants are almost identical across all investigated scenarios, implying that no substantial differences regarding latest first hitting times could be detected among the considered versions. Moreover, with the exception of the medium-sized and large structured TSP instances, LKH performs at least as well as EAX within the mid-range approximation factors. However, for the small approximation factors, LKH is again inferior to EAX – except for the problems that were specifically tailored in favor of LKH.
Thus, depending on the desired approximation quality, one can provide a threefold recommendation: if the acceptable approximation gap is rather large, EAX+r (ideally its GPX-enhanced version) is preferable; for mid-range values of $\varepsilon$ one should rather use one of the LKH variants; and for very small approximation factors one should consider the classical EAX+r. However, it has to be kept in mind that these findings solely focus on the worst-case scenario.
3.3 Perspective 2: Probability of Success
While the previous subsection focused on the latest first hitting time, i.e., a worst-case analysis, we are now interested in the solvers’ average-case performances. Of course, previous studies already addressed the average case as well, but usually with a focus on (penalized) aggregations of runtimes. Unfortunately, those runtimes only considered whether an algorithm found a tour of optimal length – which corresponds to $\varepsilon = 0$. Further information on a solver’s performance, such as whether it failed (to find an optimal tour) by orders of magnitude or just by an infinitesimal deviation, was neglected. In the following, we overcome this “knowledge gap” by comparing the performances of a total of six versions of EAX+r and LKH+r across different TSP sets and approximation factors. More precisely, within this subsection we investigate the change of an algorithm’s average success probability over time for different approximation gaps, where the considered range of $\varepsilon$ values differs between the small instances and the larger ones. This choice of values was considered sufficient, as for the largest values in those intervals the average success probability of most algorithms already converged to 1.0 within the investigated time.
The results for the TSP instances with 500 nodes are depicted in Fig. 2 and also listed in Tab. 1. As the performances of all four LKH variants are very similar, only the results of LKH+r’s default version, i.e., LKH+r (IPT), are provided as a representative. Interestingly, LKH+r outperforms both EAX variants on the TSP set “easy for LKH+r” across all considered combinations of time steps and approximation gaps, although those instances were generated for just one specific pair of maximum runtime and approximation quality, as detailed in [4]. These findings are reflected by the median probability curves depicted in Fig. 2.
As shown in the second row of this plot, the curve of LKH+r is located in the very top-left corner, implying that it achieved a high success probability (w.r.t. the respective approximation quality) within seconds. In contrast, the “tubes” associated with the EAX+r variants are rather broad and spread diagonally across all four images. That is, their median success probabilities show a high variance, and both solvers require much more time to achieve success probabilities close to 1.00 on those instances. The superiority of LKH+r over the two EAX+r versions is also confirmed by pairwise Wilcoxon tests (as indicated in the stat columns of Tab. 1). On the other hand, the two EAX versions are clearly superior to LKH+r on the “easy for EAX+r” problems. In fact, the tube of LKH+r basically covers the majority of each of the images in the first row of Fig. 2.
We further noticed that, while EAX+r and its variant perform well on all sets except for “easy for LKH+r”, LKH+r exhibits clear preferences – ranging from very poor performance on the EAX-tailored instances, via mediocre behavior on the structured instances, up to good performance on rue and (of course) the LKH-tailored problems. This clearly indicates that the structural properties of an instance, i.e., its node alignment, strongly affect the optimization behavior of LKH.
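The pairwise Wilcoxon comparisons reported in the stat columns can be reproduced in spirit as follows; a sketch on per-instance success probabilities, where the one-sided direction, the significance level of 0.05, and the function name are our own assumptions:

```python
from scipy.stats import wilcoxon

def pairwise_better(probs_a, probs_b, alpha=0.05):
    """Paired one-sided Wilcoxon signed-rank test on per-instance success
    probabilities; True if solver A is significantly better than B at
    level alpha (assumed 0.05 here)."""
    stat, p_value = wilcoxon(probs_a, probs_b, alternative="greater")
    return bool(p_value < alpha)
```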
EAX+r (1)  EAX+r (GPX) (2)  LKH+r (IPT) (3)  
Group  max  mean  std  stat  max  mean  std  stat  max  mean  std  stat  
500  10  1.00  0.00  ,  0  0.99  0.04  0.00  0.55  0.46  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.66  0.44  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.69  0.42  
500  10  1.00  0.00  ,  0  0.99  0.04  0.00  0.42  0.46  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.54  0.45  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.59  0.44  
500  10  1.00  0.00  ,  0  0.99  0.06  0.00  0.27  0.40  
500  50  1.00  0.00  0  1.00  0.01  0.00  0.38  0.43  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.43  0.43  
500  10  1.00  0.00  ,  0  0.99  0.06  0.00  0.24  0.38  
500  50  1.00  0.00  0  1.00  0.01  0.00  0.35  0.41  
easy for EAX+r  500  100  1.00  0.00  0  1.00  0.00  0.00  0.40  0.41  
500  10  0.77  0.34  0  0.68  0.37  0.00  1.00  0.01  ,  
500  50  0.90  0.20  0  0.85  0.28  0.00  1.00  0.00  ,  
500  100  0.95  0.14  0  0.92  0.18  0.00  1.00  0.00  ,  
500  10  0.57  0.39  0  0.48  0.38  0.00  1.00  0.01  ,  
500  50  0.81  0.27  0  0.73  0.31  0.00  1.00  0.00  ,  
500  100  0.89  0.21  0  0.86  0.22  0.00  1.00  0.00  ,  
500  10  0.32  0.31  0  0.26  0.27  0.00  1.00  0.01  ,  
500  50  0.67  0.32  0  0.57  0.32  0.00  1.00  0.00  ,  
500  100  0.79  0.27  0  0.73  0.29  0.00  1.00  0.00  ,  
500  10  0.26  0.24  0  0.21  0.22  0.00  1.00  0.01  ,  
500  50  0.64  0.31  0  0.53  0.32  0.00  1.00  0.00  ,  
easy for LKH+r  500  100  0.78  0.27  0  0.69  0.31  0.00  1.00  0.00  ,  
500  10  1.00  0.01  ,  0  0.99  0.04  0.00  0.99  0.07  
500  50  1.00  0.00  0  1.00  0.00  0.00  1.00  0.03  
500  100  1.00  0.00  0  1.00  0.00  0.00  1.00  0.00  
500  10  1.00  0.01  ,  0  0.99  0.05  0.00  0.99  0.07  
500  50  1.00  0.00  0  1.00  0.00  0.00  1.00  0.03  
500  100  1.00  0.00  0  1.00  0.00  0.00  1.00  0.00  
500  10  1.00  0.01  ,  0  0.99  0.05  0.00  0.98  0.09  
500  50  1.00  0.00  0  1.00  0.00  0.00  1.00  0.04  
500  100  1.00  0.00  0  1.00  0.00  0.00  1.00  0.01  
500  10  1.00  0.01  ,  0  0.98  0.06  0.00  0.98  0.09  
500  50  1.00  0.00  0  1.00  0.00  0.00  1.00  0.04  
rue  500  100  1.00  0.00  0  1.00  0.00  0.00  1.00  0.01  
500  10  1.00  0.01  ,  0  1.00  0.03  0.01  0.91  0.22  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.97  0.13  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.98  0.11  
500  10  1.00  0.01  ,  0  0.99  0.04  0.01  0.89  0.24  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.96  0.15  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.98  0.13  
500  10  1.00  0.01  ,  0  0.99  0.04  0.01  0.87  0.26  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.96  0.16  
500  100  1.00  0.00  0  1.00  0.00  0.00  0.97  0.13  
500  10  1.00  0.01  ,  0  0.99  0.04  0.01  0.87  0.26  
500  50  1.00  0.00  0  1.00  0.00  0.00  0.96  0.16  
structured  500  100  1.00  0.00  0  1.00  0.00  0.00  0.97  0.13 
Tab. 1: Mean success probabilities, standard deviations (std) and results of pairwise Wilcoxon tests (stat) for EAX+r, EAX+r (GPX) and LKH+r (IPT). An entry $i$ in a stat column indicates that the results of the respective algorithm are statistically significantly better than those of algorithm $(i)$. Results are split by instance group, approximation gap and time, for instances of size 500. Best mean values per row are highlighted in bold face.

In contrast to our a priori expectations, we could not detect strong differences in a solver’s success probabilities across the different approximation qualities for the small instances at hand. However, for larger instances with 1,000 and 2,000 nodes the picture changes slightly, as shown in Fig. 3 and Tab. 2. First, with increasing instance size and decreasing $\varepsilon$, the variant EAX+r (GPX) is more and more outperformed by its contenders. This effect is reflected by the large number of entries in the stat columns of Tab. 2 which indicate a significantly better performance of the respective solver compared to EAX+r (GPX). Secondly, across all solvers one can observe an increase in the variation of the success probabilities (i.e., wider tubes), as well as slightly increasing runtimes (i.e., a shift to the right) for decreasing approximation factors $\varepsilon$. However, those findings are quite intuitive, as diminishing gaps correspond to more accurate results – which are harder to find.
Observing Tab. 2, another pattern becomes visible: EAX+r performs significantly better than LKH+r on the structured instances. An even more interesting pattern can be observed for the (unstructured) rue instances: if the accepted approximation factor is rather small and a sufficiently large amount of time is given, EAX+r performs significantly better than LKH+r. However, if the instance size is large and only a short amount of time is given, LKH+r is superior to both (considered) versions of EAX.
As a result, the ideal choice of (inexact) TSP optimization heuristic depends on a combination of (i) the magnitude of the allowed approximation gap, (ii) the size of the TSP instance, (iii) the given cutoff time for the solver, as well as (iv) the structure (i.e., node placement) of the instance itself. With the exception of (i), these findings are in line with the automated algorithm selection studies of [14, 13], in which the authors showed (for $\varepsilon = 0$) that an instance’s structural information can be efficiently exploited to select the best-performing optimization algorithm for the instance at hand.
EAX+r (1)  EAX+r (GPX) (2)  LKH+r (IPT) (3)  
Group  max  mean  std  stat  max  mean  std  stat  max  mean  std  stat  
1000  10  1.00  0.00  0.00  0.92  0.13  0.00  1.00  0.00  
1000  50  1.00  0.00  0.00  1.00  0.00  0.00  1.00  0.00  
1000  100  1.00  0.00  0.00  1.00  0.00  0.00  1.00  0.00  
1000  10  0.97  0.08  0.00  0.27  0.24  0.00  0.93  0.18  
1000  50  1.00  0.00  ,  0.00  0.82  0.24  0.00  0.98  0.09  
1000  100  1.00  0.00  ,  0.00  0.93  0.16  0.00  0.99  0.07  
1000  10  0.89  0.18  0.00  0.16  0.20  0.00  0.84  0.27  
1000  50  1.00  0.02  ,  0.00  0.66  0.32  0.00  0.96  0.15  
1000  100  1.00  0.00  ,  0.00  0.81  0.26  0.00  0.97  0.11  
1000  10  0.88  0.19  0.00  0.15  0.20  0.00  0.83  0.27  
1000  50  1.00  0.03  ,  0.00  0.64  0.32  0.00  0.96  0.15  
1000  100  1.00  0.00  ,  0.00  0.79  0.27  0.00  0.97  0.11  
2000  10  0.43  0.17  0.04  0.00  0.00  0.00  1.00  0.00  ,  
2000  50  1.00  0.00  0.00  0.05  0.09  0.00  1.00  0.00  
2000  100  1.00  0.00  0.00  0.49  0.21  0.00  1.00  0.00  
2000  10  0.18  0.16  0.04  0.00  0.00  0.00  0.59  0.30  ,  
2000  50  0.99  0.05  ,  0.00  0.00  0.00  0.00  0.89  0.18  
2000  100  1.00  0.01  ,  0.00  0.01  0.03  0.00  0.95  0.12  
2000  10  0.06  0.10  0.04  0.00  0.00  0.00  0.34  0.28  ,  
2000  50  0.83  0.23  ,  0.00  0.00  0.00  0.00  0.71  0.32  
2000  100  0.94  0.13  ,  0.00  0.00  0.01  0.00  0.83  0.28  
2000  10  0.04  0.08  0.04  0.00  0.00  0.00  0.31  0.27  ,  
2000  50  0.78  0.25  ,  0.00  0.00  0.00  0.00  0.67  0.32  
rue  2000  100  0.91  0.17  ,  0.00  0.00  0.01  0.00  0.80  0.29  
1000  10  1.00  0.00  ,  0.00  0.99  0.06  0.03  0.97  0.11  
1000  50  1.00  0.00  0.00  1.00  0.00  0.00  0.99  0.06  
1000  100  1.00  0.00  0.00  1.00  0.00  0.00  1.00  0.04  
1000  10  0.99  0.07  ,  0.00  0.66  0.31  0.03  0.80  0.29  
1000  50  1.00  0.01  ,  0.00  0.96  0.13  0.00  0.94  0.17  
1000  100  1.00  0.01  ,  0.00  0.98  0.09  0.00  0.97  0.13  
1000  10  0.95  0.14  ,  0.00  0.48  0.34  0.03  0.68  0.35  
1000  50  1.00  0.03  ,  0.00  0.88  0.22  0.00  0.87  0.26  
1000  100  1.00  0.03  ,  0.00  0.94  0.17  0.00  0.92  0.22  
1000  10  0.94  0.15  ,  0.00  0.45  0.34  0.03  0.65  0.36  
1000  50  1.00  0.04  ,  0.00  0.87  0.24  0.00  0.86  0.27  
1000  100  1.00  0.03  ,  0.00  0.93  0.18  0.00  0.91  0.23  
2000  10  0.78  0.26  0.06  0.00  0.00  0.09  0.89  0.21  ,  
2000  50  1.00  0.00  ,  0.01  0.45  0.35  0.04  0.99  0.06  
2000  100  1.00  0.00  ,  0.01  0.87  0.19  0.04  1.00  0.03  
2000  10  0.51  0.28  ,  0.06  0.00  0.00  0.09  0.40  0.33  
2000  50  0.99  0.06  ,  0.01  0.03  0.10  0.04  0.80  0.29  
2000  100  1.00  0.02  ,  0.01  0.15  0.18  0.04  0.88  0.24  
2000  10  0.32  0.26  ,  0.06  0.00  0.00  0.09  0.19  0.25  
2000  50  0.93  0.16  ,  0.01  0.01  0.05  0.04  0.60  0.35  
2000  100  0.98  0.09  ,  0.01  0.04  0.10  0.04  0.72  0.34  
2000  10  0.28  0.25  ,  0.06  0.00  0.00  0.09  0.15  0.24  
2000  50  0.90  0.18  ,  0.01  0.01  0.05  0.04  0.55  0.36  
structured  2000  100  0.97  0.11  ,  0.01  0.03  0.09  0.04  0.68  0.35 
4 Conclusion and Future Work
Taking an “anytime perspective” in TSP solver benchmarking, i.e., addressing research questions R1 and R2 stated in the introduction, results in detailed insights into solver performances together with the respective approximation speed, and indicates structural relationships with instance properties. Specifically, our sophisticated evolutionary approach [4] for generating instances which are very hard for EAX+r but easy for LKH+r proves to perform extremely well, reflecting the huge impact of instance properties on problem hardness.
Next steps will incorporate the insights into the anytime performance of TSP solvers gained from our empirical study into automated algorithm selection models (cf. research question R3). Building on existing high-performing approaches conditioned on the necessity of solving the problem to optimality (e.g., [13]), an extension to the anytime scenario would be highly desirable. For this purpose, several ingredients need to be developed, such as instance features characterizing anytime performance. Here, initial work on so-called probing features exists [14] and needs to be extended. Moreover, one needs an anytime performance indicator, which is not straightforward to design as it has to incorporate different aspects of quality and hence is multi-objective in nature. Interesting concepts from the context of automated algorithm configuration [15] will be tested and improved upon.
Secondly, the derived insights offer very promising potential in terms of hybridization of inexact TSP solvers, as performance rankings differ along the optimization runs with increasing approximation quality. Comparing, e.g., maximum first hitting times, EAX+r generates substantial improvements very fast, is then overtaken by LKH+r (or the respective LKH variants), while it clearly is the single-best solver with respect to final approximation quality. Of course, this behaviour depends on the instances’ structural properties, such that a hybridized TSP solver variant capable of processing instance features could vastly improve both approximation speed and final quality.
Acknowledgment
P. Kerschke and H. Trautmann acknowledge support by the European Research Center for Information Systems (ERCIS). J. Bossek was supported by the Australian Research Council (ARC) through grant DP160102401.
References
 [1] (2007) The Traveling Salesman Problem: A Computational Study. Princeton University Press.
 [2] (1998) Polynomial Time Approximation Schemes for Euclidean Traveling Salesman and Other Geometric Problems. Journal of the ACM (JACM) 45(5), pp. 753 – 782.
 [3] (2016) ASlib: A Benchmark Library for Algorithm Selection. Artificial Intelligence Journal (AIJ) 237, pp. 41 – 58.
 [4] (2019) Evolving Diverse TSP Instances by Means of Novel and Creative Mutation Operators. In Proc. of the 15th ACM/SIGEVO Workshop on Foundations of Genetic Algorithms (FOGA), pp. 58 – 71.
 [5] (2020) A Multi-Objective Perspective on Performance Assessment and Automated Selection of Single-Objective Optimization Algorithms. Applied Soft Computing (ASOC) 88, 105901.
 [6] (1976) The Vehicle Routing Problem. RAIRO Recherche opérationnelle 10(1), pp. 55 – 70.
 [7] (2012) In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation. Princeton University Press.
 [8] (2016) COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting. arXiv:1603.08785.
 [9] (2009) General k-opt Submoves for the Lin-Kernighan TSP Heuristic. Mathematical Programming Computation 1(2 – 3), pp. 119 – 163.
 [10] (2004) Stochastic Local Search: Foundations and Applications. Elsevier.
 [11] (2015) Algorithm Runtime Prediction: Methods & Evaluation. In Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 4197 – 4201.
 [12] (2019) Automated Algorithm Selection: Survey and Perspectives. Evolutionary Computation (ECJ) 27, pp. 3 – 45.
 [13] (2018) Leveraging TSP Solver Complementarity through Machine Learning. Evolutionary Computation (ECJ) 26, pp. 597 – 620.
 [14] (2015) Improving the State of the Art in Inexact TSP Solving Using Per-Instance Algorithm Selection. In Learning and Intelligent Optimization (LION), C. Dhaenens, L. Jourdan, and M. Marmion (Eds.), pp. 202 – 217.
 [15] (2014) Automatically Improving the Anytime Behaviour of Optimisation Algorithms. European Journal of Operational Research (EJOR) 235(3), pp. 569 – 582.
 [16] (2019) Rigorous Performance Analysis of State-of-the-Art TSP Heuristic Solvers. In Evolutionary Computation in Combinatorial Optimization (EvoCOP), A. Liefooghe and L. Paquete (Eds.), pp. 99 – 114.
 [17] (2015) Evaluation of a Multi-Objective EA on Benchmark Instances for Dynamic Routing of a Vehicle. In Proc. of the 17th Genetic and Evolutionary Computation Conference (GECCO), pp. 425 – 432.
 [18] (2013) A Novel Feature-Based Approach to Characterize Algorithm Performance for the Traveling Salesperson Problem. Annals of Mathematics and Artificial Intelligence 69(2), pp. 151 – 182.
 [19] (1999) Guillotine Subdivisions Approximate Polygonal Subdivisions: A Simple Polynomial-Time Approximation Scheme for Geometric TSP, k-MST, and Related Problems. SIAM Journal on Computing 28(4), pp. 1298 – 1309.
 [20] (2013) A Powerful Genetic Algorithm Using Edge Assembly Crossover for the Traveling Salesman Problem. INFORMS Journal on Computing 25(2), pp. 346 – 363.
 [21] (1991) TSPLIB – A Traveling Salesman Problem Library. ORSA Journal on Computing 3(4), pp. 376 – 384.
 [22] (2017) Building a Better Heuristic for the Traveling Salesman Problem: Combining Edge Assembly Crossover and Partition Crossover. In Proc. of the 19th Genetic and Evolutionary Computation Conference (GECCO), pp. 329 – 336.