 # Solving the Clustered Traveling Salesman Problem via TSP methods

The Clustered Traveling Salesman Problem (CTSP) is a variant of the popular Traveling Salesman Problem (TSP) arising from a number of real-life applications. In this work, we explore an uncharted solution approach that solves the CTSP by transforming it into the well-studied TSP. For this purpose, we first investigate a technique to convert a CTSP instance into a TSP instance and then apply popular TSP solvers (both exact and heuristic) to solve the resulting TSP instance. We want to answer the following questions: How do state-of-the-art TSP solvers perform on clustered instances converted from the CTSP? Do state-of-the-art TSP solvers compete well with the best performing methods specifically designed for the CTSP? To answer them, we present extensive computational experiments on various CTSP benchmark instances.


## 1 Introduction

The Clustered Traveling Salesman Problem (CTSP), originally proposed by Chisman, is an extension of the classic Traveling Salesman Problem (TSP) in which the cities are grouped into clusters and the cities of each cluster must be visited contiguously. Formally, the problem is defined on a symmetric complete weighted graph $G=(V,E)$ with a set of vertices $V=\{1,2,\dots,n\}$ and a set of edges $E$. The vertex set $V$ is partitioned into $m$ disjoint clusters $V_1, V_2, \dots, V_m$. Let $C=(c_{ij})$ be an $n \times n$ symmetric distance matrix such that $c_{ij}$ represents the travel cost between the two vertices $i$ and $j$ and satisfies the triangle inequality. The objective of the CTSP is to find a minimum-cost Hamiltonian circuit over all the vertices, where the vertices of each cluster must be visited consecutively.

Figure 1: A feasible solution for an instance of the CTSP

Fig. 1 shows a feasible solution for a CTSP instance, where the solution corresponds to a Hamiltonian cycle such that the vertices of each cluster are visited contiguously.

The CTSP can be formulated as the following integer programming model, where, without loss of generality, the salesman is assumed to leave the origin city 1 and return to city 1.

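$$\min f = \sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}x_{ij} \tag{1}$$

subject to

$$\sum_{j=1}^{n} x_{ij} = 1, \quad \forall i \in V \tag{2}$$

$$\sum_{i=1}^{n} x_{ij} = 1, \quad \forall j \in V \tag{3}$$

$$u_i - u_j + (n-1)\,x_{ij} \le n-2, \quad 2 \le i \ne j \le n \tag{4}$$

$$\sum_{i \in V_k}\sum_{j \in V_k} x_{ij} = |V_k| - 1, \quad \forall V_k \subset V,\ |V_k| \ge 1,\ k = 1,2,\dots,m \tag{5}$$

$$x_{ij} \in \{0,1\}, \quad \forall i,j \in V \tag{6}$$

$$u_i \ge 0, \quad 2 \le i \le n \tag{7}$$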

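where $x_{ij} = 1$ if city $j$ is visited immediately after city $i$, and $x_{ij} = 0$ otherwise; the variables $u_i$ ($2 \le i \le n$) are the auxiliary ordering variables of the Miller-Tucker-Zemlin subtour elimination constraints (4).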

Objective function (1) minimizes the total distance traveled by the salesman. Constraints (2) and (3) ensure that each city is left exactly once and entered exactly once. Constraints (4) eliminate subtours, while constraints (5) guarantee that the cities of each cluster are visited contiguously. Constraints (6) and (7) define the domains of the decision variables.
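To make the feasibility conditions concrete, here is a minimal Python sketch that evaluates objective (1) for a given tour and checks the property that constraints (5) enforce, namely that each cluster occupies one contiguous block of the tour. It does not model the linear constraints themselves; the names `tour_cost` and `is_cluster_contiguous`, the `cluster_of` list (cluster index of each city), and the numbering of cities from 0 rather than 1 are illustrative assumptions.

```python
from typing import Sequence


def tour_cost(tour: Sequence[int], c: Sequence[Sequence[float]]) -> float:
    """Objective (1): total cost of the closed tour, including the return edge."""
    n = len(tour)
    return sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))


def is_cluster_contiguous(tour: Sequence[int], cluster_of: Sequence[int]) -> bool:
    """Check the property enforced by constraints (5): each cluster is one block.

    Along a closed tour, the inter-cluster edges separate the maximal runs of
    same-cluster cities. With m >= 2 clusters there is at least one run per
    cluster, so the tour is cluster-contiguous exactly when it uses m
    inter-cluster edges (and trivially so when m == 1).
    """
    n = len(tour)
    if sorted(tour) != list(range(n)):
        raise ValueError("tour must visit every city exactly once")
    m = len(set(cluster_of))
    crossings = sum(
        cluster_of[tour[k]] != cluster_of[tour[(k + 1) % n]] for k in range(n)
    )
    return crossings == (m if m > 1 else 0)
```

For example, with clusters `[0, 0, 1, 1, 2, 2]`, the tour `[1, 2, 3, 4, 5, 0]` is accepted: cluster 0 wraps around the end of the sequence, which is still a contiguous arc of the cycle.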

The CTSP reduces to the TSP when there is a single cluster or when each cluster contains exactly one vertex. Since it generalizes the TSP, the CTSP is NP-hard and thus computationally challenging in the general case. From a practical perspective, the CTSP is a versatile modeling tool for several operations research applications arising in a wide variety of areas, including automated warehouse routing, emergency vehicle dispatching, production planning, disk defragmentation, and commercial transactions with supermarkets, shops and grocery suppliers. Effective solution methods for the CTSP can therefore help to solve these practical problems. The computational challenge and the wide range of applications of the problem have motivated a variety of approaches, which are reviewed in Section 2. However, unlike for the classic TSP, for which many powerful methods have been introduced over the past decades, studies on the CTSP are still quite limited.

In this work, we investigate a problem transformation approach, mentioned in a 1975 study, which converts the CTSP to the TSP, and we assess the interest of popular modern TSP solvers for solving the converted instances. To our knowledge, this is the first large computational study testing modern TSP solvers on the CTSP. The work is motivated by the following considerations. First, this transformation has only been tested in two early studies (1979 and 1985), and many powerful modern TSP solvers have never been tested on the CTSP. Second, intensive research on the TSP has led to the development of very powerful solvers, so it is interesting to know whether these solvers can be used to solve the CTSP effectively. Third, the TSP instances converted from the CTSP are characterized by their cluster structure, which makes them interesting test cases for existing TSP solvers. This work thus aims to answer the following questions.

• How do state-of-the-art exact TSP solvers perform on clustered instances converted from the CTSP?

• How do state-of-the-art inexact (heuristic) TSP solvers perform on clustered instances converted from the CTSP?

• Do state-of-the-art TSP solvers compete well with the best performing methods specifically designed for the CTSP?

Answering these questions helps to enrich the state of the art in solving the CTSP and provides new knowledge on using powerful TSP methods to solve new problems.

The remainder of this paper is organized as follows. Section 2 reviews existing solution methods for the CTSP. Section 3 presents the transformation of the CTSP to the TSP and three popular TSP methods (solvers). Section 4 shows computational studies of the TSP solvers applied to the clustered instances and comparisons with existing algorithms dedicated to the CTSP. Finally, concluding remarks are provided in Section 5.

## 2 Literature review on existing solution methods

There are several dedicated solution algorithms for solving the CTSP that are based on exact, approximation, and metaheuristic approaches.

Along with the introduction of the CTSP, Chisman  proposed a branch-and-bound algorithm to solve the integer programming model presented in Section 1. Jongens and Volgenant  developed an algorithm based on the 1-tree relaxation to provide lower bounds as well as a heuristic to find satisfactory upper bounds. Mestria et al.  used the mathematical formulation of  and IBM Parallel CPLEX solver (version 11.2) to obtain lower bounds for medium CTSP instances ().

Various approximation algorithms [5, 10, 12] have been developed for the CTSP. These approximation algorithms require as inputs either the starting and ending vertices in each cluster or a prespecified order for visiting the clusters in the tour, and they solve the inter-cluster and intra-cluster problems independently. Recently, Bao and Liu presented an improved approximation algorithm that does not require the starting and ending vertices to be specified.

Given that the CTSP is an NP-hard problem, a number of heuristic and metaheuristic algorithms have also been investigated. They aim to provide high-quality solutions in acceptable computation time, but without any guarantee of optimality. For example, Laporte et al. presented a tabu search algorithm for a particular case of the CTSP in which the clusters are visited in a prespecified order. Potvin and Guertin developed a genetic algorithm for the CTSP that first finds inter-cluster paths and then intra-cluster paths. Later, Ding et al. proposed a two-level genetic algorithm for the CTSP. In the first level, a genetic algorithm is used to find the shortest Hamiltonian cycle for each cluster. In the second level, a modified genetic algorithm is applied to merge the Hamiltonian cycles of all the clusters into a complete tour.

In addition to these early heuristic algorithms, Mestria et al. investigated GRASP (Greedy Randomized Adaptive Search Procedure) with path relinking. Among the six proposed heuristics, one corresponds to the traditional GRASP procedure, whereas the others include different path-relinking procedures. Mestria also studied a hybrid heuristic based on a combination of GRASP, Iterated Local Search (ILS) and Variable Neighborhood Descent (VND). More recently, Mestria presented another complex hybrid algorithm (VNRDGILS), which mixes GRASP, ILS, and Variable Neighborhood Random Descent to explore several neighborhoods. According to the computational results reported in [23, 24, 25], these GRASP-based algorithms are among the best performing heuristics specifically designed for the CTSP currently available in the literature.

In this work, we explore the uncharted problem transformation approach that converts the CTSP to the conventional TSP and employs popular (exact and inexact) TSP solvers to solve the TSP instances converted from the CTSP benchmark instances.

## 3 Solving the CTSP via TSP methods

### 3.1 Transformation of the CTSP to the TSP

The basic idea of this transformation of the CTSP to the TSP is to add a large artificial cost to all inter-cluster edges in order to force the salesman to visit all the cities within each cluster before leaving it.

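Given a CTSP instance $G=(V,E)$ with distance matrix $C=(c_{ij})$, we define a TSP instance $G'=(V',E')$ with distance matrix $C'=(c'_{ij})$ as follows.

• Define $V' = V$ and $E' = E$.

• Define the travel distance in $G'$ by

$$c'_{ij} = \begin{cases} c_{ij} + M, & \text{if } i \text{ and } j \text{ belong to different clusters},\\ c_{ij}, & \text{otherwise,} \end{cases}$$

where $M$ is a large positive constant.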

Obviously, if the value of $M$ is sufficiently large, then the best Hamiltonian cycle in $G'$ is a feasible CTSP solution in $G$, in which the vertices of each cluster are visited contiguously.
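The transformation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: integer costs stored as a list of lists, a hypothetical `cluster_of` list giving the cluster of each vertex, and a penalty chosen by the helper `safe_big_m` as $n \cdot \max_{i,j} c_{ij} + 1$, which exceeds the original length of any tour. That rule is our own conservative reading of "sufficiently large", not a value prescribed in this section. The writer `to_tsplib_full_matrix` emits the EXPLICIT/FULL_MATRIX layout of the TSPLIB format, which TSP codes such as Concorde and LKH read; very large penalties may approach a solver's integer limits, so they should be checked against the solver's documentation.

```python
from typing import List, Sequence


def ctsp_to_tsp(c: Sequence[Sequence[int]], cluster_of: Sequence[int], big_m: int) -> List[List[int]]:
    """Build C' by adding big_m to every inter-cluster cost (Section 3.1)."""
    n = len(c)
    return [
        [c[i][j] + big_m if cluster_of[i] != cluster_of[j] else c[i][j] for j in range(n)]
        for i in range(n)
    ]


def safe_big_m(c: Sequence[Sequence[int]]) -> int:
    """A conservative penalty: a tour has n edges, each costing at most max(c),
    so n * max(c) + 1 is larger than the original length of every tour."""
    n = len(c)
    return n * max(max(row) for row in c) + 1


def to_tsplib_full_matrix(name: str, c: Sequence[Sequence[int]]) -> str:
    """Serialise an integer cost matrix as a TSPLIB EXPLICIT/FULL_MATRIX file."""
    lines = [
        f"NAME: {name}",
        "TYPE: TSP",
        f"DIMENSION: {len(c)}",
        "EDGE_WEIGHT_TYPE: EXPLICIT",
        "EDGE_WEIGHT_FORMAT: FULL_MATRIX",
        "EDGE_WEIGHT_SECTION",
    ]
    lines += [" ".join(str(w) for w in row) for row in c]
    lines.append("EOF")
    return "\n".join(lines) + "\n"
```

For instance, `to_tsplib_full_matrix("toy", ctsp_to_tsp(c, cluster_of, safe_big_m(c)))` produces a file that can be handed to a TSP solver; the CTSP value of the returned tour is its length minus $m \times M$.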

Property. An optimal solution to the TSP instance $G'$ corresponds to an optimal solution to the original CTSP instance $G$.

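Proof. Let $S'$ and $S$ be optimal solutions of the TSP instance $G'$ and the original CTSP instance $G$, respectively, and let $m$ be the number of clusters of $G$ (the case $m = 1$ is simply the TSP, so assume $m \ge 2$). A tour that visits every cluster contiguously contains exactly $m$ inter-cluster edges, whereas a tour that splits some cluster contains more than $m$. Hence, for $M$ sufficiently large, the minimum-cost tour $S'$ contains only $m$ inter-cluster edges and is therefore a feasible CTSP solution for $G$. Since every feasible CTSP tour, including $S$ and $S'$, costs exactly $m \times M$ more in $G'$ than in $G$, the optimality of $S'$ in $G'$ and of $S$ in $G$ gives the following relation, where $f(S')$ is measured with $C'$ and $f(S)$ with $C$:

$$f(S') = f(S) + m \times M$$

Therefore, $S'$ is also an optimal CTSP solution for $G$, and its CTSP objective value is obtained from $f(S')$ by subtracting the constant $m \times M$.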
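The proof leaves "sufficiently large" implicit. Assuming nonnegative costs, $m \ge 2$ and $c_{\max} = \max_{i,j} c_{ij}$, one sufficient bound follows from counting inter-cluster edges. Along a closed tour $T$, its $q(T)$ inter-cluster edges separate the maximal runs of same-cluster vertices, and every cluster contributes at least one run, so $q(T) \ge m$ with equality exactly when $T$ is cluster-contiguous. Writing $f'$ for the cost under $C'$:

$$
\begin{aligned}
f'(T) &= f(T) + q(T)\,M && \text{for every tour } T,\\
f'(S) &= f(S) + m\,M \le n\,c_{\max} + m\,M && \text{for the optimal CTSP tour } S,\\
f'(U) &\ge f(U) + (m+1)\,M \ge (m+1)\,M && \text{for every non-contiguous tour } U,\\
M > n\,c_{\max} &\;\Longrightarrow\; f'(S) < (m+1)\,M \le f'(U).
\end{aligned}
$$

So any $M > n\,c_{\max}$ guarantees that an optimal tour of $G'$ never splits a cluster; smaller values may also work on a particular instance.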

### 3.2 Solution methods for the TSP

There are numerous solution methods for the TSP. In this work, we adopt three very popular TSP solvers whose source code is publicly available: one exact solver (Concorde) and two inexact (heuristic) solvers (LKH-2 and GA-EAX).

Notice that a TSP instance converted from a CTSP instance has a particular feature: the vertices are grouped into clusters, the distance between two vertices of the same cluster is generally small, and the distance between two vertices from different clusters is large because it includes the penalty $M$. Along with the presentation of the TSP solvers, we discuss their suitability for such clustered instances where appropriate.

#### 3.2.1 Exact Concorde solver

Concorde is an advanced exact solver for the symmetric TSP based on branch-and-bound and problem-specific cutting-plane methods. It makes use of a specifically designed linear programming solver, QSopt. Concorde is widely reported to be the best performing exact algorithm for the TSP. Previous experiments have shown that Concorde can solve TSPLIB benchmark instances with up to 1000 vertices to optimality within a reasonable computation time, and that it can also solve large TSP instances at the cost of long computation times.

The run time behavior of Concorde has been investigated mainly on uniform random instances. For instance, Applegate et al. investigated the run time required by Concorde for solving uniform random instances and indicated that it increases as an exponential function of the instance size $n$. Hoos and Stützle further demonstrated that, on the widely studied class of uniform random TSP instances, the median run time required by Concorde scales with the instance size according to a root-exponential model of the form $a \cdot b^{\sqrt{n}}$. To our knowledge, no study has been reported on the behavior of Concorde on sharply clustered instances; the current study therefore provides useful information on this issue.

#### 3.2.2 Lin-Kernighan based heuristic solver

According to the TSP literature, most of the best performing TSP heuristics are based on the Lin-Kernighan (LK) heuristic and its extensions. The LK heuristic is a variable-depth $k$-opt local search procedure in which the $k$-opt neighborhood is partially searched with a smart pruning strategy. LK explores the most promising neighbors within the $k$-opt neighborhood, that is, the set of feasible tours obtained by removing $k$ edges and adding $k$ other edges such that the resulting tour is feasible. Several improved versions of the basic LK heuristic have been introduced within the iterated local search framework (e.g., [4, 13, 14, 26]).
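As a point of reference for the $k$-opt idea, the sketch below implements plain 2-opt local search in Python, the fixed-depth case $k = 2$: it removes two tour edges and reconnects the tour by reversing the segment between them whenever this shortens the tour. It only illustrates a $k$-opt move under our own simplifications (first improvement, full neighborhood scan, no candidate lists); it is not the variable-depth LK search or any part of LKH, and the function name `two_opt` is hypothetical.

```python
from typing import List, Sequence


def two_opt(tour: List[int], c: Sequence[Sequence[float]]) -> List[int]:
    """First-improvement 2-opt on a symmetric cost matrix c.

    Edges (a, b) and (d, e) are replaced by (a, d) and (b, e) by reversing
    the tour segment from b to d, as long as some such move shortens the tour.
    """
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                d, e = tour[j], tour[(j + 1) % n]
                if a == e:  # the two edges are adjacent across the wrap-around
                    continue
                delta = c[a][d] + c[b][e] - c[a][b] - c[d][e]
                if delta < -1e-12:  # strict improvement guarantees termination
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour
```

LK differs in that it chains such exchanges into sequences of variable depth, guided by the cumulative gain, and restricts the candidate edges to short neighbor lists.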

Among these iterated LK algorithms, Helsgaun's LKH [13, 14] is widely regarded as the state-of-the-art heuristic TSP solver. Helsgaun first developed an iterated version of LK together with an efficient implementation of the LK algorithm, known as the Lin-Kernighan-Helsgaun (LKH-1) heuristic, in which a 5-opt move is used as the basic move to broaden the search, and an $\alpha$-measure based on sensitivity analysis of minimum spanning trees is used to restrict the search to relatively few of the $\alpha$-nearest neighbors of a vertex, which speeds up the search. Later, Helsgaun further extended LKH-1 by developing a highly effective implementation of the $k$-opt procedure (called LKH-2), which eliminated many of the limitations and shortcomings of LKH-1. Furthermore, LKH-2 extended the data structures of LKH-1 to solve very large TSP instances. The main features of LKH-2 include (1) sequential and non-sequential $k$-opt moves, (2) several partitioning procedures that split a large TSP instance into smaller subproblems, (3) a tour merging procedure that generates a better solution from two or more locally optimal solutions, and (4) a backbone-guided search that biases the local perturbations. LKH-2 is considered to be one of the most effective heuristic methods for finding very high-quality solutions for various large TSP instances.

However, LK and LK-based algorithms are ill-suited to clustered TSP instances, because they require much longer running times on such instances than on uniformly distributed instances. The main reason why the LK heuristic stumbles on clustered instances is that the relatively long inter-cluster edges serve as bait edges: when such a bait edge is removed, the LK heuristic is tricked into long and often fruitless searches. More precisely, each time an edge bridging two clusters is removed, the cumulative gain rises enormously, and the procedure is encouraged to perform very deep searches. To alleviate this problem, a cluster compensation technique was proposed for the Lin-Kernighan heuristic to limit its performance degradation. Helsgaun also showed that LKH-2 performs significantly worse on sharply clustered instances than on uniform random instances, but no effective remedy for this difficulty was proposed in that work.

#### 3.2.3 Edge assembly crossover based genetic algorithm

Population-based evolutionary algorithms are another well-known approach for the TSP. A popular example is the powerful genetic algorithm introduced by Nagata and Kobayashi. This algorithm (called GA-EAX, see Algorithm 1) is characterized by its powerful edge assembly crossover (EAX) operator, an efficient implementation of this operator, and a cost-effective selection strategy for maintaining population diversity.

The key EAX operator generates one offspring tour from two high-quality parent tours by first inheriting edges from the parents to construct disjoint subtours and then connecting the subtours with new edges in a greedy fashion (similar to building a minimum spanning tree). Let $A$ and $B$ be the parents; EAX operates as follows (see Fig. 2 for an example):

1. Generate an undirected multigraph defined as $G_{AB} = (V, E_A \cup E_B)$, where $E_A$ and $E_B$ are the sets of edges of parents $A$ and $B$, respectively.

2. Extract all AB-cycles from $G_{AB}$. An AB-cycle is defined as a cycle in $G_{AB}$ such that edges of $E_A$ and edges of $E_B$ are alternately linked.

3. Construct an E-set by selecting AB-cycles according to a given selection strategy (e.g., single, k-multiple, block and block2), where an E-set is a set of AB-cycles.

4. Copy parent $A$ to an intermediate solution. Then, remove the edges of $E_A$ in the E-set from the intermediate solution and add those of $E_B$ in the E-set to it. This leads to an intermediate solution with one or more subtours.

5. Connect all the subtours in with new short edges to generate a complete tour (a feasible offspring solution) by using a greedy heuristic.
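Steps 1 to 4 can be sketched in Python as follows. This is a simplified illustration, not Nagata and Kobayashi's implementation: edges shared by both parents are discarded up front (they would only form two-edge AB-cycles that leave the offspring unchanged), AB-cycles are traced in an arbitrary deterministic order, each E-set is a single AB-cycle (the single strategy), and the greedy repair of step 5 is omitted. All function names are our own.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def tour_edges(tour: List[int]) -> Set[Edge]:
    """Undirected edges of a closed tour, each stored as (smaller, larger)."""
    edges = set()
    for k in range(len(tour)):
        u, v = tour[k], tour[(k + 1) % len(tour)]
        edges.add((min(u, v), max(u, v)))
    return edges


def ab_cycles(tour_a: List[int], tour_b: List[int]) -> List[List[Tuple[Edge, str]]]:
    """Steps 1-2: split the edges that differ between A and B into AB-cycles.

    Each cycle is a list of (edge, parent) pairs whose labels alternate A, B.
    """
    ea, eb = tour_edges(tour_a), tour_edges(tour_b)
    adj: Dict[str, Dict[int, Set[int]]] = {"A": {}, "B": {}}
    for label, edges in (("A", ea - eb), ("B", eb - ea)):
        for u, v in edges:
            adj[label].setdefault(u, set()).add(v)
            adj[label].setdefault(v, set()).add(u)

    cycles: List[List[Tuple[Edge, str]]] = []
    for start in list(adj["A"]):
        while adj["A"][start]:
            path, labels, need = [start], [], "A"
            while True:
                u = path[-1]
                v = adj[need][u].pop()  # consume an unused edge of parent `need`
                adj[need][v].discard(u)
                path.append(v)
                labels.append(need)
                need = "B" if need == "A" else "A"
                # v closes an AB-cycle if it already occurs an even number of edges back.
                for j in range(len(path) - 3, -1, -2):
                    if path[j] == v:
                        cyc = path[j:]
                        cyc_edges = [(min(x, y), max(x, y)) for x, y in zip(cyc, cyc[1:])]
                        cycles.append(list(zip(cyc_edges, labels[j:])))
                        del path[j + 1 :]
                        del labels[j:]
                        break
                if len(path) == 1:  # back at the start vertex with no open path
                    break
    return cycles


def apply_e_set(tour_a: List[int], e_set: List[List[Tuple[Edge, str]]]) -> Set[Edge]:
    """Step 4: copy A's edges, then swap in the B-edges of every AB-cycle in the E-set."""
    edges = tour_edges(tour_a)
    for cycle in e_set:
        for edge, label in cycle:
            if label == "A":
                edges.remove(edge)
            else:
                edges.add(edge)
    return edges


def count_subtours(edges: Set[Edge]) -> int:
    """Number of connected components (subtours) of a 2-regular edge set."""
    adj: Dict[int, List[int]] = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen: Set[int] = set()
    count = 0
    for s in adj:
        if s not in seen:
            count += 1
            stack = [s]
            seen.add(s)
            while stack:
                for w in adj[stack.pop()]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
    return count
```

Applying every AB-cycle at once turns $E_A$ into $E_B$, which is a convenient sanity check; applying a single cycle yields one or more subtours that step 5 would then reconnect.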

Note that different versions of EAX can be developed by using different strategies for selecting the AB-cycles that form E-sets. The GA-EAX algorithm employs the single and block2 strategies to generate offspring solutions from parent solutions. To maintain a healthy population diversity, GA-EAX also uses an edge entropy measure to select the solution that replaces a parent in the population.
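The edge entropy used for diversity can be illustrated with a short Python function. It assumes the population-level definition $H = -\sum_{e} \frac{F(e)}{N_{pop}} \log \frac{F(e)}{N_{pop}}$, where $F(e)$ is the number of tours in the population that contain edge $e$ and $N_{pop}$ is the population size; the rule GA-EAX uses to turn this measure into a replacement decision is not reproduced here, and the function name is hypothetical.

```python
import math
from collections import Counter
from typing import List


def edge_entropy(population: List[List[int]]) -> float:
    """Population edge entropy H = -sum_e F(e)/N * log(F(e)/N).

    F(e) counts the tours that contain undirected edge e and N is the
    population size. H is 0 when all tours are identical and grows as the
    population spreads over more distinct edges.
    """
    n_pop = len(population)
    freq: Counter = Counter()
    for tour in population:
        n = len(tour)
        for k in range(n):
            freq[frozenset((tour[k], tour[(k + 1) % n]))] += 1
    return -sum(f / n_pop * math.log(f / n_pop) for f in freq.values())
```

For two 4-city tours that share two of their four edges, the four unshared edges each contribute $\frac{1}{2}\log 2$, giving $H = 2\log 2$.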

Other studies have also indicated the usefulness of edge-assembly-like crossovers for solving clustered TSP instances. As shown in the next section, the EAX-based genetic algorithm performs remarkably well on all the clustered instances transformed from the CTSP.

## 4 Computational experiments

In this section, we evaluate the ability of the TSP solvers presented in Section 3.2 to solve the CTSP via its transformation to the TSP. For this purpose, we examine their solution quality and run time efficiency on various CTSP benchmark instances and compare them with the best dedicated CTSP algorithms in the literature.

### 4.1 Benchmark instances

Our computational assessments are based on 45 CTSP benchmark instances, divided into three sets, with 101 to 5000 vertices. Sets 1 and 2 include 20 medium instances () and 15 large instances (), which are classical and widely used in the CTSP literature (e.g., [23, 24, 25]). Set 3 is a new set of 10 very large instances with 3000 and 5000 vertices.

Sets 1 and 2 (35 instances): These instances belong to the following six types: (1) instances taken from the TSPLIB, where the clusters are generated by a k-means clustering algorithm; (2) instances created from a selection of classic TSP instances, where the clusters are created by grouping the vertices around geometric centers; (3) instances generated with the Concorde interface; (4) instances generated using a previously proposed layout; (5) instances similar to type 2, but generated with different parameters; (6) instances adapted from the TSPLIB, where the rectangular floor plan is divided into several quadrilaterals and each quadrilateral corresponds to a cluster.

Set 3 (10 instances): These instances were created from 10 very large TSP instances with 3000 and 5000 vertices. Following previous work, $m$ geometric centers are selected for each instance, with coordinates drawn uniformly from the interval [0,1000), and the clusters are created by grouping the vertices around these centers, where $m$ is the number of clusters.

All these instances are available at https://github.com/lyldft/ctsp.

### 4.2 TSP solvers and experimental protocol

For our study, we employed three popular TSP solvers presented in Section 3.2, which are among the most powerful methods for the TSP in the literature.

• Exact Concorde TSP solver: We used version Concorde-03.12.19 and ran the solver with its default parameter settings and a cutoff time of 24 CPU hours per instance.

• Inexact LKH-2 TSP solver: LKH-2 is an iterated local search procedure that typically terminates after a fixed number of iterations (by default, $n$, the number of vertices). We observed that LKH-2 with this default stopping condition becomes too time-consuming on our clustered instances (see the discussion in Section 3.2.2). In our experiments, we therefore used smaller numbers of iterations, $0.1n$ and $0.2n$, while keeping the default values for the other parameters of LKH-2.

• Inexact GA-EAX TSP solver: We used GA-EAX with its default parameter setting given in : , and GA-EAX terminates if the difference between the average tour length and the shortest tour length in the population is less than .

The experiments were carried out on a computer running Windows 7 with an Intel Core i7-4790 processor (3.60 GHz) and 8 GB of RAM. Given the stochastic nature of LKH-2 and GA-EAX, we ran each of them 10 times on each instance, while the deterministic Concorde TSP solver was run once on each instance.

### 4.3 Computational results and comparison of popular TSP solvers

Our computational studies aim to answer the following questions: How do state-of-the-art exact TSP solvers perform on clustered instances converted from the CTSP? How do state-of-the-art inexact (heuristic) TSP solvers perform on clustered instances converted from the CTSP?

The results of the three TSP solvers (Concorde, LKH-2, GA-EAX) on the 20 medium and 15 large CTSP benchmark instances are summarized in Tables 1 and 2. Columns 1 to 3 show the basic information of each instance: the instance name (Instance), the number of vertices ($n$) and the number of clusters ($m$). Column 4 gives the optimal objective value reported by the exact Concorde TSP solver, followed by the required run time in seconds. For both the LKH-2 and GA-EAX solvers, we show the best (B-Err) and average (A-Err) results over 10 independent runs as percentage gaps to the optimal solution, as well as the average run time in seconds. If the best solution over 10 independent runs equals the optimal solution obtained with the exact Concorde TSP solver, the corresponding cell in column B-Err shows '=' along with the number of runs that found the optimal solution. Finally, row 'Avg.' provides the average run time in seconds for each approach, and the average gap between the average objective values obtained with LKH-2/GA-EAX and the optimal values obtained with the Concorde TSP solver.
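The B-Err and A-Err entries can be computed as in the Python sketch below, assuming the usual definition of the percentage gap, $100 \times (f - f^*)/f^*$, where $f^*$ is Concorde's optimal value (or, as in Table 3, its best upper bound). The helper names are illustrative and not taken from the authors' scripts.

```python
from statistics import mean
from typing import Sequence, Tuple


def percentage_gap(value: float, reference: float) -> float:
    """Percentage gap of a tour length to a reference value (optimum or upper bound)."""
    return 100.0 * (value - reference) / reference


def summarize_runs(run_values: Sequence[float], reference: float) -> Tuple[float, float, int]:
    """Return (B-Err, A-Err, hits) for the runs of one solver on one instance."""
    b_err = percentage_gap(min(run_values), reference)
    a_err = percentage_gap(mean(run_values), reference)
    hits = sum(v == reference for v in run_values)  # runs that reached the reference value
    return b_err, a_err, hits
```

A negative gap then means that a run beat the reference value, which can only happen when the reference is an upper bound rather than a proven optimum.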

From Tables 1-2, we can make the following observations.

First, the exact Concorde TSP solver performs very well for these 35 instances and is able to solve all of them exactly. Specifically, the 20 medium instances can be solved easily in a short run time (an average of about 30 seconds). The 15 large instances are more difficult and the run time needed to solve these instances increases considerably (an average of 1553 seconds, reaching 9663 seconds for the most difficult instance).

Second, the inexact LKH-2 TSP solver does not perform as well as Concorde. With the stopping condition of $0.1n$ iterations, LKH-2 misses 2 and 8 optimal solutions for the medium and large instances, respectively, with average run times of 49.9 and 371.6 seconds. LKH-2 obtains improved results (optimal solutions for one more medium instance and 3 more large instances) with the relaxed condition of $0.2n$ iterations. However, in this case, LKH-2 roughly doubles its run time.

Third, the GA-EAX solver performs remarkably well by attaining the optimal values for all 35 instances. For the 20 medium instances, GA-EAX consistently hits the optimal solution in each of its 10 runs (except for one instance, for which it hits the optimum in 9 out of 10 runs). For the 15 large instances, except for 3 cases, GA-EAX hits the optimum of each instance at least 6 times out of 10 runs. The average run time is only 5.6 seconds for the medium instances and 30.3 seconds for the large instances. Compared to the Concorde and LKH-2 TSP solvers, the GA-EAX algorithm is thus extremely time-efficient. Moreover, unlike the Concorde and LKH-2 solvers, GA-EAX requires a computation time that remains very stable across the instances of the same set, indicating a high robustness and scalability of this solver.

Table 3 presents the results of the three TSP solvers on the 10 new very large CTSP instances of Set 3. Notice that if an instance cannot be solved exactly by the Concorde TSP solver, the percentage gaps (B-Err and A-Err) are calculated using the Concorde’s best upper bound. In this case, column ‘Opt.’ corresponds to the best upper bound from Concorde, and a negative (positive) gap indicates a better (worse) result compared to this bound.

From Table 3, we can make the following observations. First, Concorde manages to optimally solve 7 out of these 10 very large instances with a run time ranging from 1100 seconds to more than 25000 seconds. For these 7 instances, LKH-2 attains the optimal solutions for 6 instances while GA-EAX reaches all optimal solutions. Second, for the three instances that cannot be solved exactly by Concorde, both LKH-2 and GA-EAX report better results than the best upper bounds of Concorde. However, LKH-2 has a worse performance both in terms of solution quality and computation time compared with GA-EAX. Third, GA-EAX has an excellent time efficiency across the instances of this set and scales very well with the increase of instance sizes. These observations are consistent with those from Tables 1-2.

In summary, the exact Concorde TSP solver is very efficient for CTSP instances with up to 1000 vertices but becomes time-consuming for larger instances. The inexact LKH-2 TSP solver has difficulty solving these clustered instances, which is consistent with previous studies such as [14, 28]. The EAX-based genetic algorithm performs remarkably well in terms of both solution quality and computational efficiency and scales well with instance size.

To deepen our computational study, we use performance profiles, an analytic tool introduced by Dolan and Moré for evaluating the performance of multiple optimization algorithms. A performance profile is a cumulative distribution function of a performance metric, such as run time, objective function value or number of iterations. For a given metric, the performance profile $\rho_s(\tau)$ associated with an algorithm $s$ indicates the probability that algorithm $s$ attains results within a factor $\tau \ge 1$ of the best result attained by all compared algorithms over a set of problem instances. A higher probability indicates better algorithmic performance under the given metric. The value of $\rho_s(1)$ is the probability that algorithm $s$ wins over the rest of the compared algorithms.
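The profile can be computed directly from its definition. The Python sketch below follows Dolan and Moré's construction for a metric where smaller is better (run time or average tour length), assuming strictly positive values and a result for every solver on every instance: it computes the ratio $r_{p,s}$ of solver $s$'s value on instance $p$ to the best value on that instance, and reports $\rho_s(\tau)$ as the fraction of instances with $r_{p,s} \le \tau$. It is an illustration, not the 'perprof-py' implementation, and the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def performance_profile(
    results: Dict[str, Sequence[float]], taus: Sequence[float]
) -> Dict[str, List[float]]:
    """rho_s(tau) = fraction of instances p with r_{p,s} = t_{p,s} / min_s t_{p,s} <= tau."""
    solvers = list(results)
    n_instances = len(results[solvers[0]])
    best = [min(results[s][p] for s in solvers) for p in range(n_instances)]
    profile = {}
    for s in solvers:
        ratios = [results[s][p] / best[p] for p in range(n_instances)]
        profile[s] = [sum(r <= tau for r in ratios) / n_instances for tau in taus]
    return profile
```

Ties count as wins for every tied solver, so the values $\rho_s(1)$ of the compared solvers can sum to more than 1.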

To make a fair and meaningful comparison with this tool, we focus on the two inexact solvers LKH-2 and GA-EAX, each run 10 times on each of the 45 instances. We use the software 'perprof-py' to draw the performance profiles (see Figure 3), where performance is measured by the average objective value and the average run time. These performance profiles show a clear dominance of GA-EAX over LKH-2 in terms of both solution quality and run time efficiency.

### 4.4 TSP solvers vs. state-of-the-art CTSP heuristics

In Section 4.3, we identified GA-EAX as the most suitable method for solving clustered instances converted from the CTSP. We now answer the following question: Does GA-EAX compete well with state-of-the-art CTSP heuristics specially designed for the problem?

For this purpose, we adopt the three best performing CTSP heuristics: VNRDGILS, HHGILS, and GPR1R2. Indeed, according to the experimental studies reported in [23, 24, 25], these three heuristics perform best among the recent CTSP heuristics available in the literature (see Table 4). This comparison is based on the 35 medium and large instances of Sets 1 and 2 (no results of the three CTSP heuristics are available for the 10 very large instances of Set 3).

Table 5 summarizes the results of the GA-EAX TSP solver along with those reported for the three CTSP algorithms on the medium and large instances. For each instance and algorithm, three columns show, respectively, the best objective value over 10 independent runs, the average objective value and the average run time in seconds. To determine whether there is a statistically significant difference in performance between the GA-EAX TSP solver and each CTSP algorithm in terms of best and average results, the $p$-values of the Wilcoxon signed-rank tests are given in the last row of the tables. Entries marked "-" indicate that the corresponding results are not available in the literature. Best objective values are shown in bold when they attain the optimal solution. Notice that the results of the CTSP algorithms (VNRDGILS, HHGILS and GPR1R2) correspond to 10 executions per instance on a computer with a 2.83 GHz Intel Core 2 CPU and 8 GB RAM, with a time limit per run of 720 seconds for medium instances and 1080 seconds for large instances.

From Table 5, we observe that the GA-EAX TSP solver consistently attains the optimal solutions for all 35 medium and large CTSP instances, whereas among the three CTSP algorithms only HHGILS reaches an optimal result, and for only 1 instance. Although the experimental platforms differ, we further observe that the GA-EAX TSP solver is more than an order of magnitude faster than the CTSP algorithms while reporting much better results.

Figure 4 provides boxplot graphs to compare the distribution and range of the average results for each compared algorithm, except GPR1R2 for the medium instances since its results on several medium instances are not available. In this figure, the average objective value of a given algorithm is normalized according to the relation , where is the optimal value. The plots in Figure 4 show clear differences in the distributions of the average results between GA-EAX and each compared CTSP heuristic, which further confirms the efficiency of the GA-EAX TSP solver with respect to these dedicated CTSP heuristics.

Table 6 summarizes the statistical results for each compared algorithm on the two sets of medium and large instances. The first row indicates the number of optimal solutions found by each approach. The average percentage gap of the best/average result from the optimal result is provided in row ‘Average /’. Finally, row ‘Average time (s)’ provides the average run time in seconds for each algorithm. From Table 6, we observe that the GA-EAX solver significantly outperforms the three CTSP algorithms on the medium and large instances in terms of both the best and the average results. For the large instance set, the improvement gaps between the results of GA-EAX and those of the CTSP methods are very high, ranging from 10.39% to 15.49%. Furthermore, in terms of the average run time, GA-EAX is about 30 to 130 times faster than the CTSP algorithms. The above results thus indicate that the GA-EAX TSP solver has a strong dominance over current best performing CTSP approaches in the literature. Finally, the results of the Concorde TSP solver and the LKH-2 solver reported in Section 4.3 indicate that these TSP solvers also dominate the current best CTSP algorithms in the literature.

## 5 Conclusions

This paper presents the first large computational study testing modern TSP solvers on the CTSP. Based on the computational results of the exact Concorde TSP solver and the inexact LKH-2 and GA-EAX TSP solvers on two sets of medium and large CTSP benchmark instances from the literature (with up to 2000 vertices) and a new set of very large CTSP instances (with up to 5000 vertices), we draw the following conclusions.

• The exact Concorde TSP solver can optimally solve all medium and large CTSP instances, but fails to solve three very large instances with 5000 vertices in 24 hours. Its solution time increases considerably with the instance sizes.

• Due to the clustering nature of the transformed instances, the powerful inexact LKH-2 TSP solver does not perform well. LKH-2 reports a worse performance both in terms of solution quality and computation time, compared with GA-EAX.

• The GA-EAX solver performs remarkably well in terms of both solution quality and computational efficiency, with very high scalability. It consistently attains the optimal solutions for all medium and large CTSP instances available in the literature within a short time.

• All the TSP solvers dominate the current best-performing CTSP heuristics specifically designed for the problem. This is particularly true for the GA-EAX solver, which finds much better results while being 30 to 130 times faster than the state-of-the-art CTSP heuristics.
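These conclusions rest on converting each CTSP instance into a TSP instance that an unmodified TSP solver can handle. A minimal sketch of one classical conversion, usually attributed to Chisman (1975), follows. It adds a large constant M to the cost of every inter-cluster edge, which also explains why the transformed instances are strongly clustered. We assume the conversion has this form. The choice M = n·max d + 1 is our own illustrative choice. The helper `brute_force_tsp` is a hypothetical stand-in for Concorde, LKH-2 or GA-EAX and is usable only on tiny inputs.

```python
import itertools
import math


def cluster_labels(n, clusters):
    """Map each vertex to the index of its cluster."""
    label = [-1] * n
    for c, members in enumerate(clusters):
        for v in members:
            label[v] = c
    return label


def ctsp_to_tsp(dist, clusters):
    """Add a large constant M to every inter-cluster edge (assumed Chisman-style transformation)."""
    n = len(dist)
    label = cluster_labels(n, clusters)
    # M exceeds the length of any Hamiltonian cycle under the original costs, so an
    # optimal TSP tour uses as few inter-cluster edges as possible (one per cluster),
    # which forces every cluster to be visited contiguously.
    big_m = n * max(max(row) for row in dist) + 1
    tsp = [[dist[i][j] + (big_m if label[i] != label[j] else 0) for j in range(n)]
           for i in range(n)]
    return tsp, big_m


def tour_length(tour, dist):
    """Length of the closed tour under the given distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def is_cluster_contiguous(tour, clusters):
    """True if every cluster occupies a single consecutive block of the cyclic tour."""
    label = cluster_labels(len(tour), clusters)
    changes = sum(label[tour[i]] != label[tour[(i + 1) % len(tour)]]
                  for i in range(len(tour)))
    return len(clusters) < 2 or changes == len(clusters)


def brute_force_tsp(dist):
    """Exact TSP by enumeration (hypothetical stand-in for a real TSP solver)."""
    n = len(dist)
    best = None
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(tour, dist)
        if best is None or length < best[0]:
            best = (length, list(tour))
    return best


# Hypothetical 6-vertex instance with 3 clusters whose members lie far apart.
points = [(0, 0), (4, 0), (1, 1), (5, 1), (2, 0), (6, 0)]
clusters = [[0, 1], [2, 3], [4, 5]]
dist = [[math.dist(p, q) for q in points] for p in points]

tsp, big_m = ctsp_to_tsp(dist, clusters)
length, tour = brute_force_tsp(tsp)
ctsp_cost = length - len(clusters) * big_m  # remove the k inter-cluster penalties
print(tour, is_cluster_contiguous(tour, clusters), round(ctsp_cost, 3))
```

A feasible CTSP tour with k ≥ 2 clusters crosses between clusters exactly k times, so subtracting k·M from the TSP tour length recovers its CTSP cost.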

Finally, this study also indicates that the existing CTSP benchmark instances in the literature are not challenging for modern TSP solvers, even though they remain difficult for the current CTSP algorithms.

## Acknowledgment

This work is partially supported by the National Natural Science Foundation Program of China [Grant No. 71401059, 71771099] and the China Postdoctoral Science Foundation [Grant No. 2019M662649].

## References

•  Applegate, D., Bixby, R., Chvatal, V., & Cook, W. (2001). Concorde TSP solver. http://www.math.uwaterloo.ca/tsp/concorde/index.html.
•  Applegate, D., Bixby, R., & Chvatal, V. (2007). Concorde TSP Solver. William Cook, School of Industrial and Systems Engineering, Georgia Tech.
•  Applegate, D., Bixby, R., Chvatal, V., & Cook, W. (2006). The traveling salesman problem: a computational study. Princeton University Press.
•  Applegate, D., Cook, W., & Rohe, A. (2003). Chained Lin-Kernighan for large traveling salesman problems. INFORMS Journal on Computing, 15(1), 82-92.
•  Anily, S., Bramel, J., & Hertz, A. (1999). A 5/3-approximation algorithm for the clustered traveling salesman tour and path problems. Operations Research Letters, 24(1-2), 29-35.
•  Bao, X., & Liu, Z. (2012). An improved approximation algorithm for the clustered traveling salesman problem. Information Processing Letters, 112(23), 908-910.
•  Chisman, J. A. (1975). The clustered traveling salesman problem. Computers & Operations Research, 2(2), 115-119.
•  Ding, C., Cheng, Y., & He, M. (2007). Two-level genetic algorithm for clustered traveling salesman problem with application in large-scale TSPs. Tsinghua Science and Technology, 12(4), 459-465.
•  Dolan, E. D., & Moré, J. J. (2002). Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2), 201-213.
•  Gendreau, M., Laporte, G., & Hertz, A. (1997). An approximation algorithm for the traveling salesman problem with backhauls. Operations Research, 45(4), 639-641.
•  Ghaziri, H., & Osman, I. H. (2003). A neural network algorithm for the traveling salesman problem with backhauls. Computers & Industrial Engineering, 44(2), 267-281.
•  Guttmann-Beck, N., Hassin, R., Khuller, S., & Raghavachari, B. (2000). Approximation algorithms with bounded performance guarantees for the clustered traveling salesman problem. Algorithmica, 28(4), 422-437.
•  Helsgaun, K. (2000). An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research, 126(1), 106-130.
•  Helsgaun, K. (2009). General k-opt submoves for the Lin-Kernighan TSP heuristic. Mathematical Programming Computation, 1(2-3), 119-163.
•  Hains, D., Whitley, D., & Howe, A. (2012). Improving Lin-Kernighan-Helsgaun with crossover on clustered instances of the TSP. In International Conference on Parallel Problem Solving from Nature (pp. 388-397). Springer, Berlin, Heidelberg.
•  Hoos, H. H., & Stützle, T. (2014). On the empirical scaling of run-time for finding optimal solutions to the travelling salesman problem. European Journal of Operational Research, 238(1), 87-94.
•  Johnson, D. S., & McGeoch, L. A. (2007). Experimental analysis of heuristics for the STSP. In The Traveling Salesman Problem and its Variations (pp. 369-443). Springer, Boston, MA.
•  Jongens, K., & Volgenant, T. (1985). The symmetric clustered traveling salesman problem. European Journal of Operational Research, 19(1), 68-75.
•  Laporte, G., & Palekar, U. (2002). Some applications of the clustered travelling salesman problem. Journal of the Operational Research Society, 53(9), 972-976.
•  Laporte, G., Potvin, J. Y., & Quilleret, F. (1997). A tabu search heuristic using genetic diversification for the clustered traveling salesman problem. Journal of Heuristics, 2(3), 187-200.
•  Lin, S., & Kernighan, B. W. (1973). An effective heuristic algorithm for the traveling-salesman problem. Operations Research, 21(2), 498-516.
•  Lokin, F. C. J. (1979). Procedures for travelling salesman problems with additional constraints. European Journal of Operational Research, 3(2), 135-141.
•  Mestria, M., Ochi, L. S., & de Lima Martins, S. (2013). GRASP with path relinking for the symmetric Euclidean clustered traveling salesman problem. Computers & Operations Research, 40(12), 3218-3229.
•  Mestria, M. (2016). A hybrid heuristic algorithm for the clustered traveling salesman problem. Pesquisa Operacional, 36(1), 113-132.
•  Mestria, M. (2018). New hybrid heuristic algorithm for the clustered traveling salesman problem. Computers & Industrial Engineering, 116, 1-12.
•  Martin, O., Otto, S. W., & Felten, E. W. (1991). Large-step Markov chains for the traveling salesman problem. Oregon Graduate Institute of Science and Technology, Department of Computer Science and Engineering.
•  Nagata, Y., & Kobayashi, S. (1997). Edge assembly crossover: A high-power genetic algorithm for the traveling salesman problem. In Proc. 7th Internat. Conf. Genetic Algorithms (pp. 450-457). Morgan Kaufmann, San Francisco.
•  Neto, D. (1999). Efficient Cluster Compensation For Lin-Kernighan Heuristics. PhD thesis, University of Toronto.
•  Nagata, Y., & Kobayashi, S. (2013). A powerful genetic algorithm using edge assembly crossover for the traveling salesman problem. INFORMS Journal on Computing, 25(2), 346-363.
•  Potvin, J. Y., & Guertin, F. (1996). The clustered traveling salesman problem: A genetic approach. In Meta-Heuristics (pp. 619-631). Springer, Boston, MA.
•  Siqueira, A. S., da Silva, R. C., & Santos, L. R. (2016). Perprof-py: A python package for performance profile of mathematical optimization software. Journal of Open Research Software, 4(1).
•  Reinelt, G. (1991). TSPLIB-A traveling salesman problem library. ORSA Journal on Computing, 3(4), 376-384.
•  Weintraub, A., Aboud, J., Fernandez, C., Laporte, G., & Ramirez, E. (1999). An emergency vehicle dispatching system for an electric utility in Chile. Journal of the Operational Research Society, 50(7), 690-696.
•  Whitley, D., Hains, D., & Howe, A. (2010). A hybrid genetic algorithm for the traveling salesman problem using generalized partition crossover. In International Conference on Parallel Problem Solving from Nature (pp. 566-575). Springer, Berlin, Heidelberg.
•  Watson, J., Ross, C., Eisele, V., Denton, J., Bins, J., Guerra, C., Whitley, D., & Howe, A. (1998). The traveling salesman problem, edge assembly crossover, and 2-opt. In International Conference on Parallel Problem Solving from Nature (pp. 823-832). Springer, Berlin, Heidelberg.