1 Introduction
In the last few decades, Machine Learning (ML) [Bishop2006] has progressively replaced humans and expert systems for solving numerous tasks. For instance, the first algorithms in computer vision were based on handcrafted features. Nowadays, such algorithms are learned end-to-end using Deep Learning (DL) [LeCun, Bengio, and Hinton2015] and outperform all the traditional approaches. Similar examples are also present in speech recognition, machine translation, and many other tasks [Bahdanau, Cho, and Bengio2014, Chorowski et al.2015, Silver et al.2017]. Combinatorial Optimization Problems (COP) [Parker and Rardin2014] have also recently been addressed by several ML-based approaches.
Traditional methods dedicated to solving COPs can be classified into two main categories. The first, exact methods (such as integer programming or constraint programming), are based on a clever exploration of a search tree and provide the optimal solution if the algorithm is allowed to run to completion. Their drawback is a prohibitive execution cost, which makes them unsuitable for large instances. The second, heuristics, are algorithms that often find solutions quickly, but cannot provide any theoretical guarantee on their quality.
Although DL has also been combined with exact methods [Khalil et al.2016, Cappart et al.2018, Gasse et al.2019], its most popular use in combinatorial optimization is the design of heuristics. As for more traditional tasks, the holy grail is a model able to learn, end-to-end, a heuristic that solves a specific NP-hard problem.
The most famous and widely studied problem in combinatorial optimization is the well-known Travelling Salesman Problem (TSP); even its simplest version, defined on a 2D Euclidean graph, has been proven to be NP-hard [Karp1972]. Despite the theoretical complexity of this problem, the Operations Research (OR) community has managed to build efficient algorithms for solving it [Applegate et al.2006]. In the last few years, the TSP has also been tackled by many DL approaches that leverage different DL architectures and algorithms; however, they remain far below the performance of the traditional OR approaches.
By definition, ML infers knowledge from data in order to transfer it to unseen but similar situations. The difficulty of reaching state-of-the-art performance seems to indicate that resorting only to learned knowledge is not enough to obtain near-optimal solutions for the TSP. Most ML solutions to the TSP thus also rely on a search procedure, characterized by a more costly execution time. A search procedure is the backbone of all combinatorial optimization algorithms (branch-and-bound, constraint programming, local search, etc.). This brings us to a fundamental question:
What is the importance of learning versus searching in ML-based approaches to combinatorial optimization?
This paper attempts to provide the first answer to this question by proposing a new evaluation metric,
ratio of optimal decisions (ROD), based on a fair comparison of learning approaches with a parametrized oracle that is able to predict the individual decisions of a COP with a prescribed level of accuracy. Intuitively, a model that demonstrates performance similar to that of a parametrized oracle with poor accuracy is a sign that there is still room to improve the learning phase. Conversely, if its performance equals that of a highly accurate oracle, improvements to the overall method will most likely come from a better search procedure.
Based on this idea, the technical contributions of this paper are as follows: (1) a new metric, ROD, for evaluating ML approaches dedicated to solving a COP, which assesses the accuracy of the learning component in isolation from the search component; (2) the application of the metric for re-evaluating the state-of-the-art ML models for the TSP. The results show that even if the optimality gap is far worse than that of the traditional OR approaches, the performance of the learning component of published ML approaches nevertheless equals that of highly accurate oracles; (3) empirical evidence that the design of the search procedure has a tremendous impact on the performance of an ML approach; (4) the open-source release of the metric in order to help the development of future ML models for COPs.
This paper is structured as follows. The next section presents the most influential and recent approaches dedicated to solving the TSP. Next, the shortcomings of the optimality gap for evaluating ML models are described. This motivates the use of our new metric, ROD, which is presented thereafter and constitutes the core contribution of the paper. Finally, experiments showing the application of the metric to recent TSP models are carried out in the last section.
2 Literature Review
The Travelling Salesman Problem (TSP) is a traditional COP that has been extensively studied in the literature. Given a weighted graph, the goal is to find the shortest possible tour that visits each vertex exactly once. Finding the optimal tour is NP-hard [Karp1972]. This is also true for the 2D Euclidean TSP [Papadimitriou1977], which considers fully connected graphs where the edges are weighted by the Euclidean distances between the vertices. In practice, traditional TSP solvers rely on handcrafted heuristics to guide the search procedure towards high-quality solutions. Efficient approaches exist, both for exact methods and heuristics.
At the time of writing, the state-of-the-art approaches for solving TSPs are as follows. For exact methods, it is the well-known Concorde solver [Applegate et al.2006], which is able to solve, and prove optimality for, instances of up to 109,399 nodes (http://www.math.uwaterloo.ca/tsp/uk/index.html), albeit with a prohibitive computation time of months. On the heuristic side, the most efficient approach is a variant of the Lin-Kernighan-Helsgaun algorithm (LKH) [Lin and Kernighan1973, Helsgaun2000], which has been successively refined across the years [Helsgaun2009, Taillard and Helsgaun2019]. It is able to find solutions to large instances with a duality gap of 0.584% relative to the Held-Karp lower bound [Held and Karp1970].
As machine learning has gained popularity, especially with the rise of deep learning [LeCun, Bengio, and Hinton2015], the TSP has been of particular interest for DL practitioners: they not only have the ambition to learn new heuristics end-to-end for this problem, but are also eager to show that ML can play an important role in solving COPs. As far as we know, the TSP is the NP-hard problem that has been most frequently considered for evaluating new ML models. It thus serves as a reference, in a similar way as the MNIST dataset [LeCun et al.1998], which is still used as a baseline for evaluating classification models.
The first notable ML approach dealing with the TSP was introduced by [Hopfield and Tank1985], who solved small instances (up to 30 nodes) by means of a Hopfield network. More recently, new approaches resurfaced, beginning with [Vinyals, Fortunato, and Jaitly2015], who introduced the Pointer Network (PN) architecture, dedicated to outputting a permutation of an input sequence. In a case study, they apply the PN to solving the Euclidean TSP. Training is done in a supervised manner, and a beam-search procedure is used to construct the final solution. The PN was then reused by [Bello et al.2016], who replaced the supervised training by reinforcement learning (RL) [Sutton and Barto2018] through policy gradient methods [Williams1992] and a variant of the A3C algorithm [Mnih et al.2016]. The method is further improved with two search strategies, sampling and active search. Later, [Deudon et al.2018] proposed another encoder-decoder architecture, enriched with an attention mechanism [Vaswani et al.2017]. It is noteworthy that theirs is also the first approach applying the standard 2-OPT local search procedure on top of the model.
Moreover, [Khalil et al.2017] propose to leverage another DL architecture for tackling combinatorial optimization problems over graphs, such as the TSP. This architecture, called structure2vec [Dai, Dai, and Song2016], is dedicated to embedding the vertices of a graph into features while keeping information on the structure of the graph. The problem is solved using neural fitted Q-learning [Riedmiller2005]. The TSP tours are constructed step-by-step thanks to a greedy insertion method that places each new vertex in the locally optimal position within the partially formed tour. [Kool, van Hoof, and Welling2018] combine the ideas of a graph embedding, an encoder-decoder architecture, and a graph attention network [Veličković et al.2017] with the REINFORCE RL algorithm [Williams1992]; sampling is then used for the decoding.
Finally, [Joshi, Laurent, and Bresson2019] came back to supervised learning and make use of residual gated graph convolutional networks [Bresson and Laurent2017]. Unlike the other approaches, the model does not output a valid TSP tour, but rather a probability, for each edge, of being part of the tour. The final TSP tour is computed afterwards using a greedy search or a beam-search procedure.
Despite the increasing performances, no ML model competes with Concorde or the LKH algorithm. They are nevertheless able to find a feasible solution far more quickly by leveraging learned knowledge. Once the model has been trained, [Kool, van Hoof, and Welling2018] reported that approximate solutions with an average optimality gap of 4.53% can be found in 6 seconds for a dataset of 10,000 Euclidean TSPs of 100 nodes. For comparison, Concorde solved the same instances to optimality in three minutes. By integrating a sampling selection, [Kool, van Hoof, and Welling2018] could reduce the optimality gap to 2.26%, but at the expense of an execution time of one hour.
Interestingly, these recent ML approaches make use of various construction techniques (beam-search, active search, sampling, 2-OPT, etc.), but without discussing how they balanced the trade-off between performance and execution time. While spending more time in the search procedure will improve the performance, it will also increase the execution time. Finding an appropriate balance between the two is only mentioned by [Joshi, Laurent, and Bresson2019], who identify it as a future challenge. Moreover, none of these papers discuss the accuracy of their learning component.
That being said, the development of ML approaches for the TSP has put a strong focus on improving the learning phase. The search phase has been handled only by simple methods, although there exists a myriad of more refined search procedures in the literature, such as simulated annealing [Kirkpatrick, Gelatt, and Vecchi1983], tabu search [Glover and Laguna1998], large neighbourhood search [Pisinger and Ropke2010], and many others [Aarts and Lenstra2003].
Based on these observations, our motivation is to provide tools that help researchers make more informed choices when designing ML approaches for tackling COPs in general. This paper does so by proposing a new metric for evaluating the pure learning component of ML models. This metric is complementary to the optimality gap, which, while commonly used, suffers from some shortcomings.
3 Shortcomings of the Optimality Gap Metric
The optimality gap, defined as the relative distance between an approximate solution and the optimal solution, is a standard and widely used metric for evaluating approaches that solve COPs. Its main advantage is its simplicity, combined with its practical use: the performance of a model on a specific problem is summarized into a single value. It gives a good sense of how far we are from the optimal solution and how efficient the model is at finding the best one. When the problem is still open and the optimal solution has not yet been proven, a relative gap can be computed using the best known solution as a baseline, or using a dual bound such as the linear relaxation or, for the TSP, the Held-Karp lower bound. The focus of such metrics is on the quality of the solution, where a solution is abstracted as a sequence of decisions.
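As a reference for the discussion that follows, the optimality gap can be computed in a few lines. This is a minimal sketch; the function name is ours, not from the paper:

```python
def optimality_gap(found_cost: float, optimal_cost: float) -> float:
    """Relative distance between an approximate solution and the optimum."""
    return (found_cost - optimal_cost) / optimal_cost
```

For instance, a tour of length 10.9 against an optimum of length 10.0 yields a gap of 9%.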
Generally speaking, the performance of an ML model is evaluated through a comparison with a known ground truth, often obtained beforehand from human experts. For classification and regression tasks, this ground truth is given by a labelled test set. For more complex tasks, the evaluation is done by metrics comparing the model output with references produced by human experts. For instance, the ROUGE metric is often used for evaluating text summarization [Lin2004], the BLEU metric for translation [Papineni et al.2002], and the Elo rating for reinforcement learning agents playing games [Silver et al.2017]. However, only the optimality gap has been used so far as the main evaluation metric for ML approaches tackling the TSP. Although the optimality gap also provides a comparison with a ground truth (i.e., the algorithm providing the optimal solution), we argue that this metric is not sufficient to evaluate the performance of ML approaches to COPs, as it measures the learning and search components together. If poor results are achieved, one does not really know whether the issues come from an insufficient learning ability or a weak search mechanism.
This problem is illustrated in Fig. 1. The left figure presents the optimal solution of a TSP (non-labelled edges have a weight of 1). On the right, a sub-optimal solution is proposed. Although only one non-optimal decision has been made (the second misplaced edge being forced in order to complete the tour), the optimality gap is huge. Considering only the optimality gap, one might believe that a new DL architecture could help. However, almost all of the edges were guessed correctly, and a simple search heuristic would have fixed the problem. For this reason, we advocate the use of a second metric for evaluating ML models dedicated to solving COPs: a metric able to compare the accuracy of the learning component against a ground truth, independently of the search component.
4 ROD: Evaluating only the learning component
This section describes the ratio of optimal decisions (ROD), a new metric we introduce for evaluating the learning component of ML approaches dedicated to solving COPs. Its focus is not on the quality of the solution, as the optimality gap is, but rather on the quality of the individual decisions, which better reflects how well a model performs against a ground truth.
When solving a COP, one has to assign a specific value to a set of variables in order to find a feasible assignment that minimizes (resp. maximizes) an objective function. A simple and general way to model such a problem is to use a Dynamic Programming (DP) formulation. The idea is to break the problem down into a sequence of decision steps. At each step, a new variable is selected and assigned a value, until all the variables have been set. A cost is incurred after each assignment, and the total cost, once all decisions are taken, corresponds to the value of the objective function. A fundamental property of DP is the so-called principle of optimality, introduced by [Bellman1966]: a sequence of optimal decisions taken at each step yields the optimal solution of the complete problem.
In ML terminology, an optimal oracle can be seen as a model with perfect knowledge: it never takes sub-optimal decisions. Let us instead assume that we have a parametrized oracle that takes each optimal decision with a certain accuracy. Based on this idea, the ROD metric we introduce is defined as follows.
Definition 1 (Ratio of optimal decisions (ROD))
Let $\mathcal{Q}$ be a COP, $m$ be a model dedicated to solving $\mathcal{Q}$, and $O_\rho$ be a parametrized oracle that takes each optimal decision with accuracy $\rho$. The ROD of $m$ regarding $\mathcal{Q}$ is defined as the ratio $\rho$ of optimal decisions that is required by $O_\rho$ in order to equal the optimality gap of $m$ on $\mathcal{Q}$.
Example 1
Let us consider the TSP of Fig. 1, which contains 13 decisions, and let us assume that the solution of Fig. 1(b) has been obtained using a parametrized oracle. Models having an optimality gap of 27.3% will have a ROD of $12/13 \approx 92.3\%$ regarding this oracle, because they have the same performance as a parametrized oracle that has made 12 optimal decisions among 13.
The goal of ROD is to measure the break-even ratio of a parametrized oracle, i.e., the ratio at which its performance equals that of the model we want to evaluate. This ratio indicates that the model performs as well as an oracle with the corresponding level of knowledge. By doing so, the model is directly compared with a ground truth only, without folding in the difficulty of the problem. Indeed, if both make the same bad decision, the resulting increase of the optimality gap is the same for both.
The last remaining question is how ROD can be computed. It is tightly related to the construction of the parametrized oracle, which requires two design choices: (1) given a ratio, how do we select the decisions that will be optimal; and (2) when a non-optimal decision is made, which one should be selected? These questions are discussed in the next section.
5 Construction of the Parametrized Oracle
The proposed formulation is generic, in the sense that it can be used for any COP. Let us consider a COP $\mathcal{Q} = \langle X, D, C, f \rangle$, where $X$ is the set of variables, $D$ the set of domains restricting the values that the variables can take, $C$ the set of constraints, and $f$ the objective function.
The first step is to design a DP model associated with the COP. It requires adequately defining the tuple $\langle S, A, T, c \rangle$, where $S$ is the set of all the possible states that can be generated, $A$ is the set of possible actions, $T$ is the transition function making the system move from one state to another given the action taken, and $c$ is the cost function returning the cost of an action taken in a given state. The DP model of the parametrized oracle is defined as follows.
- State: Let $n$ be the number of variables and $\langle x_1, \dots, x_n \rangle$ be an arbitrarily ordered sequence of these variables. A state $s_i$ corresponds to the first non-assigned variable $x_i$, with $i \in \{1, \dots, n\}$, or to a terminal marker $\perp$ if all of them are assigned. The initial state is then $s_1$, and a state is terminal when it equals $\perp$.
- Action: An action done at state $s_i$ is defined as the selection of a value $v$ for its assignment to the variable $x_i$ referenced by $s_i$. The action is valid if and only if (1) the value is in the domain of the variable ($v \in D(x_i)$), and (2) its assignment does not violate any constraint in $C$.
- Transition: The transition function is defined as $T(s_i, v) = \mathit{assign}(x_i, v)$, where $\mathit{assign}$ is a function assigning the value $v$ to the variable $x_i$ and returning the next non-assigned variable from the sequence.
- Cost Function: The cost function $c(s_i, v)$ corresponds to the increase of the objective value caused by the assignment of the value $v$ to the variable associated with $s_i$.
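The tuple above maps naturally to code. The following sketch is ours (class and method names are not from the paper), and it assumes permutation-style constraints where a value cannot be reused, as in the TSP formulation considered later:

```python
class DPModel:
    """Generic DP view of a COP: ordered variables, domains, incremental cost."""

    def __init__(self, variables, domains, cost):
        self.variables = variables  # arbitrarily ordered sequence x_1, ..., x_n
        self.domains = domains      # domains[x]: candidate values for variable x
        self.cost = cost            # cost(assignment, x, v): incremental cost

    def valid_actions(self, assignment, x):
        """Values of D(x) whose assignment keeps the partial solution feasible
        (here: values not already used, a permutation-style constraint)."""
        return [v for v in self.domains[x] if v not in assignment.values()]

    def rollout(self, policy):
        """Assign the variables one by one following a policy; return the
        complete assignment and its total cost."""
        assignment, total = {}, 0.0
        for x in self.variables:             # state: first non-assigned variable
            v = policy(self, assignment, x)  # action: value selected for x
            total += self.cost(assignment, x, v)
            assignment[x] = v                # transition to the next state
        return assignment, total
```

Any decision rule, from a greedy heuristic to the parametrized oracle of the next section, can then be plugged in as the `policy` argument.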
Once defined, the DP model has to be solved by means of a policy: for each state, we must decide which action to perform. To do so, we make use of a function $\pi_\rho$, which corresponds to the decision suggested by the parametrized oracle introduced in the previous section. This function predicts the optimal action for a state with probability $\rho$; otherwise, it selects a relatively good action by sampling it randomly from the set of the remaining actions. The sampling is done using a weighted distribution that favors actions generating low cost increases. By doing so, the parametrized oracle's behaviour is more similar to that of ML models: if a sub-optimal decision is suggested, it is more likely that a good action will be chosen rather than a poor one. This weighted distribution $\mathcal{D}$ over the set $A(s)$ of valid actions in a state $s$ is defined as follows:

$$\mathcal{D}(a \mid s) = \frac{c(s,a)^{-1}}{\sum_{a' \in A(s)} c(s,a')^{-1}} \quad \forall a \in A(s) \qquad (1)$$

The negative exponent weights the actions by their inverse cost. Then, $\pi_\rho$ is defined in Eq. (2), where $u \sim U([0,1])$ denotes a uniform sample and $a \sim \mathcal{D}$ a sample following the distribution of Eq. (1), restricted to the non-optimal actions:

$$\pi_\rho(s) = \begin{cases} a^\star(s) & \text{if } u \le \rho, \ u \sim U([0,1]) \\ a \sim \mathcal{D}\big(\cdot \mid s\big), \ a \ne a^\star(s) & \text{otherwise} \end{cases} \qquad (2)$$

where $a^\star(s)$ denotes the optimal action in state $s$.
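Eq. (1) and Eq. (2) can be sketched as follows. This is our own illustration, not the authors' code; it assumes strictly positive incremental costs and at least one sub-optimal action to fall back on:

```python
import random

def inverse_cost_sample(actions, costs):
    """Sample an action with probability proportional to 1/cost (Eq. 1).
    Costs are assumed strictly positive."""
    weights = [1.0 / c for c in costs]  # low-cost actions are favoured
    return random.choices(actions, weights=weights, k=1)[0]

def oracle_policy(rho, optimal_action, actions, costs, rng=random):
    """With probability rho take the optimal action; otherwise sample a
    sub-optimal one following the inverse-cost distribution (Eq. 2)."""
    if rng.random() < rho:
        return optimal_action
    remaining = [(a, c) for a, c in zip(actions, costs) if a != optimal_action]
    acts, cs = zip(*remaining)  # assumes at least one sub-optimal action
    return inverse_cost_sample(list(acts), list(cs))
```

With `rho = 1` the policy reduces to the perfect oracle; with `rho = 0` every decision is sampled from the weighted distribution.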
Finally, the complete oracle is a function taking as input a COP $\mathcal{Q}$ and a policy $\pi_\rho$ parametrized by $\rho$, and outputting the cost of a solution of $\mathcal{Q}$ obtained according to $\pi_\rho$. The ROD of an ML model $m$ that we want to evaluate on $\mathcal{Q}$ corresponds to the probability $\rho$ yielding an oracle having, on average, the same optimality gap as $m$. In practice, one can also compute the ROD of a model evaluated on many instances of the same COP.
An important note is that the parametrized oracle does not resort to any kind of search procedure: no look-aheads are allowed and decisions cannot be undone. This design choice ensures a fair comparison with a pure ML model that does not use local search corrections.
Using the parametrized oracle, ROD is computed by increasing $\rho$ step-by-step until the oracle and the model $m$ return the same optimality gap on the data set of instances we want to evaluate. The computation of ROD is shown in Alg. 1. As long as the current oracle has a lower performance than $m$, the ratio is increased; otherwise, it is returned and corresponds to the ROD value. The exact optimality gap can be obtained by using the perfect oracle ($\rho = 1$) to compute the optimal solution of each instance.
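The loop described by Alg. 1 can be sketched as follows. The step size, function names, and the assumption that `oracle_gap` already averages over instances and random seeds are ours:

```python
def compute_rod(model_gap, oracle_gap, step=0.001):
    """Increase rho until the parametrized oracle matches the model's
    average optimality gap; the break-even rho is the ROD value.

    model_gap:  average optimality gap of the ML model on the test set.
    oracle_gap: function rho -> average optimality gap of the oracle
                with accuracy rho on the same test set.
    """
    rho = 0.0
    while rho < 1.0 and oracle_gap(rho) > model_gap:
        rho += step
    return min(rho, 1.0)
```

In practice `oracle_gap` is stochastic, so each call would average several rollouts of the parametrized oracle over the whole test set.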
6 Case Study: the Travelling Salesman Problem
Two sets of experiments are carried out. First, we use ROD to re-evaluate state-of-the-art ML models for solving the TSP on 2D Euclidean graphs. Then, we propose and analyze several standard search procedures that can be used to improve the performance of each model.
6.1 Problem Definition
Definition 2 (Travelling Salesman Problem (TSP))
Let $G = (V, E)$ be a simple weighted graph of $n$ vertices. A tour of $G$ is a permutation of the vertices such that it is possible to go through every vertex $v \in V$ exactly once by following the edges $e \in E$ and come back to the initial vertex. The cost associated with a tour corresponds to the sum of the weights of the edges that are followed. The TSP consists in finding a tour of minimal cost.
In the 2D Euclidean variant, the graph is fully connected and the edges are weighted by the Euclidean distances between the vertices. The TSP can be formalized as a COP $\langle X, D, C, f \rangle$ as follows. Each variable $x_i \in X$ corresponds to a vertex $v_i$ of the graph and indicates the next vertex to visit after leaving $v_i$; its domain contains the other vertices of the graph. The only constraint is that the final tour must be a permutation (i.e., a circuit) of the set of vertices. In the literature, it is often expressed through the circuit constraint $\mathit{circuit}(x_1, \dots, x_n)$ [Beldiceanu and Contejean1994]. Finally, the objective function $f$ is to minimize the cost of the tour. As described in the previous section, such a COP has an associated DP formulation $\langle S, A, T, c \rangle$, which drives the parametrized oracle.
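Under this 2D Euclidean variant, the tour-cost objective described above amounts to summing the edge weights of the circuit. A minimal helper (ours, not from the authors' code) makes this concrete:

```python
import math

def tour_cost(points, tour):
    """Cost of a TSP tour: sum of Euclidean edge weights, closing the circuit.

    points: list of (x, y) coordinates; tour: permutation of vertex indices.
    """
    n = len(tour)
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % n]])
               for i in range(n))
```

For four vertices on the unit square, visiting them in order yields a cost of 4, while crossing the diagonals yields $2 + 2\sqrt{2}$.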
6.2 ML Models and Search Procedures Considered
The models we selected are summarized in Table 1. While all of them are dedicated to the TSP or close variants, they differ in their neural architecture and the learning algorithm considered. Graph Convolutional Networks (GCN) [Dai, Dai, and Song2016, Bresson and Laurent2017] and Graph Attention Networks (GAT) [Veličković et al.2017] are two standard and popular architectures for problems with a graph structure. In regards to the training, the trend is towards RL approaches that use either Deep Q-learning (DQN) or policy gradient methods like REINFORCE, although supervised training is also considered.
Different search procedures have also been considered for improving the solution obtained by the model. Many neural networks output probability distributions that give insight into which edges should be used to construct an optimal solution. Greedy decoding consists in taking the edges having the highest probability, whereas sampling consists in selecting some of them randomly, based on the inferred distribution; the best solution found is then returned. Beam-search is a different construction approach where, at each step, only the $k$ best partial solutions are kept, $k$ being a parameter referred to as the beam-width. This procedure has been improved by [Joshi, Laurent, and Bresson2019], who use a shortest-path heuristic for closing the TSP tour.
On top of that, one can also integrate perturbative search heuristics that modify a complete solution in order to improve it. Famous examples are the 2-OPT, 3-OPT, and Lin-Kernighan (LK) heuristics [Lin and Kernighan1973], which swap edges of the current solution to reduce the tour cost. Perturbative search procedures have been less considered in the previous ML models, with the exception of [Deudon et al.2018], who use the 2-OPT heuristic.
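As an illustration of the perturbative heuristics just mentioned, a bare-bones 2-OPT pass can be sketched as follows. This is our own simplified version, not the implementation used in the cited papers; it recomputes the full tour cost for clarity rather than speed:

```python
import math

def two_opt(points, tour):
    """Perturbative 2-OPT: repeatedly reverse a tour segment whenever the
    swap shortens the circuit, until no improving move remains."""
    def cost(t):
        return sum(math.dist(points[t[i]], points[t[(i + 1) % len(t)]])
                   for i in range(len(t)))

    best, improved = list(tour), True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j] and keep it if it helps.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if cost(candidate) < cost(best) - 1e-12:
                    best, improved = candidate, True
    return best
```

Applied to a self-crossing tour on four points of the unit square, a single segment reversal recovers the optimal circuit.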
Table 1. ML models considered, with their architecture, learning algorithm, and search procedures.

| Approach | Model | Learning | Search |
|---|---|---|---|
| Khalil et al. 2017 | GCN | DQN | Greedy |
| Deudon et al. 2018 | GAT | REINFORCE | Sampling, 2-OPT |
| Kool et al. 2018 | GAT | REINFORCE | Greedy, Sampling |
| Joshi et al. 2019 | GCN | Supervised | Greedy, Beam-search, Shortest tour |
6.3 Experimental Protocol
In order to analyze the models of Table 1, we resort to the source code the authors provided with the related papers: https://github.com/HanjunDai/graph_comb_opt, https://github.com/MichelDeudon/encode-attend-navigate, https://github.com/wouterkool/attention-learn-to-route, and https://github.com/chaitjo/graph-convnet-tsp.
For [Kool, van Hoof, and Welling2018, Joshi, Laurent, and Bresson2019], the models had already been trained by the authors, so we used them as is. Otherwise, we retrained them using the procedure described by the authors.
At first, as we are interested in analyzing the quality of the learned model, we modified the code so as to only allow a greedy decoding.
For the evaluation, we implemented the ROD construction as presented in Alg. 1. Two test sets of 1000 random 2D Euclidean graphs, with 50 and 100 vertices, are considered. The instances are generated by uniformly sampling the vertices on the unit 2D square. All the vertices are then connected, using the Euclidean norm for weighting the edges. The optimal tours of the instances are computed using the Concorde solver [Applegate et al.2006]. Concerning the search procedures, we reused an open-source implementation of the 2-OPT, 3-OPT, and LK heuristics (https://gitlab.com/Soha/localtsp), as well as the search mechanisms already integrated in the ML models.
For the reproducibility of results and to further help research in this field, the implementation of ROD is available online (https://github.com/qcappart/ROD_oracle). Finally, the neural network computations were carried out on a single Tesla V100 PCIe 32GB GPU, while the rest of the operations were done on Intel Xeon Silver 4116 CPUs.
6.4 Application of ROD
Table 2. Optimality gap (%) and ROD (%) of the learning components, per number of vertices used for training (20, 50, or 100).

Evaluation on TSPs of 50 vertices:

| Approach | gap (20) | gap (50) | gap (100) | ROD (20) | ROD (50) | ROD (100) |
|---|---|---|---|---|---|---|
| Khalil et al. (2017) | 11.01 | 9.29 | 8.74 | 96.4 | 97.1 | 97.3 |
| Deudon et al. (2018) | 9.18 | 4.87 | 7.46 | 97.1 | 98.3 | 97.6 |
| Kool et al. (2018) | 4.35 | 1.69 | 4.22 | 98.5 | 99.5 | 98.6 |
| Joshi et al. (2019) | 42.58 | 4.41 | 38.85 | 85.0 | 98.6 | 86.5 |
| Concorde Solver | 0 | 0 | 0 | — | — | — |

Evaluation on TSPs of 100 vertices:

| Approach | gap (20) | gap (50) | gap (100) | ROD (20) | ROD (50) | ROD (100) |
|---|---|---|---|---|---|---|
| Khalil et al. (2017) | 10.66 | 10.27 | 10.74 | 97.9 | 98.0 | 97.9 |
| Deudon et al. (2018) | 23.03 | 9.03 | 8.38 | 95.4 | 98.3 | 98.4 |
| Kool et al. (2018) | 16.11 | 4.89 | 4.35 | 96.8 | 99.1 | 99.2 |
| Joshi et al. (2019) | 70.64 | 52.92 | 8.61 | 83.9 | 88.2 | 98.3 |
| Concorde Solver | 0 | 0 | 0 | — | — | — |
Table 2 reports the optimality gap and ROD on the two test sets for the learning component of the ML models previously described. Three training configurations are considered (instances of 20, 50, and 100 vertices). First of all, we notice that the optimality gaps obtained are consistent with the results published by the authors. When considering only the learning component of the models, [Kool, van Hoof, and Welling2018] is the most efficient. It is also noteworthy that the ROD metric is consistent with the optimality gap: an increase of the former is characterised by a decrease of the latter. Standard analyses that were done using the optimality gap as a baseline can still be performed with ROD. For instance, the approach of [Joshi, Laurent, and Bresson2019] seems to suffer from overfitting: the performance decreases drastically when the training and the evaluation are done with instances of different sizes.
In any case, ROD provides information that remained hidden with the optimality gap metric. First, we observe that in all situations the optimality gap is far from what is achieved by Concorde, which provides optimal solutions. One could thus think that there is still room for improving the learning component of these models. However, ROD indicates that most of the models already achieve high-quality performance, matching parametrized oracles that take, on average, more than 95% of the optimal decisions during the construction of the solution. Given such performance, designing better learning components would not be a trivial task, and improvements are more likely to come from the search procedure instead.
Following the same idea, we can also notice that ROD remains stable when the evaluation is done on instances of 100 vertices instead of 50. For [Kool, van Hoof, and Welling2018], the difference is less than 2% in the worst case, which indicates a relatively good generalization. In contrast, the related optimality gap, when the model is trained on 20 nodes, increases by more than 10%. This observation gives another indication that the optimality gap is not suited for evaluating the learning ability of ML models for COPs.
6.5 Impact of the Search Procedures
The previous set of experiments showed that further improvements may not come from a better learning ability but, rather, from the search procedure. The goal of this second set of experiments is to analyse the impact of different standard search procedures performed during the evaluation. To do so, greedy construction, sampling, and beam-search are considered for constructing the first solution from the learned model, and 2-OPT, 3-OPT, and the Lin-Kernighan heuristic (LK) are added to improve it. The impact of these search procedures is analyzed in Table 3, which reports the optimality gap of the ML models from Table 1 together with the improvements achieved by the search procedures. The best results among the ML models are highlighted. The training and the evaluation are done on TSPs of 100 vertices, and the test set contains 1000 instances. 16 iterations are considered for sampling, as well as a beam-width of 16 for the beam-search (BS). For the special case of [Joshi, Laurent, and Bresson2019], their shortest-path heuristic (BS*) is also analyzed. The optimality gap, the impact of the search procedure on the optimality gap (Δ: the reduction of the optimality gap compared to the situation without the search procedure), and the average execution time are reported. As a baseline, the performance of the nearest neighbour (NN) heuristic, which repeatedly selects the closest unvisited city, is also reported.
At first glance, we can observe that all the methods benefit from a search procedure, both when constructing the solution (greedy, sampling, and beam-search) and when improving it (2-OPT, 3-OPT, and LK). The approaches of [Khalil et al.2017] and [Joshi, Laurent, and Bresson2019] seem to be the ones that benefit the most from a search procedure (Δ value). For instance, the optimality gap of the greedy variant of [Joshi, Laurent, and Bresson2019] can be reduced from 8.61% to 0.24% by adding beam-search and LK. However, resorting to more elaborate search procedures comes at the expense of a more prohibitive execution time. In this last example, the execution time is roughly 18 times greater (from 91 ms to 1.7 s). There is thus a trade-off between solution quality and execution time when resorting to a search procedure. In practice, one can also run a search procedure until the available time is exhausted and then return the best solution found.
Interestingly, even if the approach of [Kool, van Hoof, and Welling2018] has the best performance when no search procedure is considered, it is beaten by [Joshi, Laurent, and Bresson2019] when search is integrated. This empirically shows that the models do not behave in the same way under a similar search procedure. Finally, it is noteworthy that, only by plugging ML models into standard search procedures from the literature, we can achieve state-of-the-art results: the optimality gap is reduced to 0.24% when a beam-width of 16 and the LK heuristic are combined with the approach of [Joshi, Laurent, and Bresson2019]. This advocates the use of hybrid methods, combining learning and searching, when dealing with NP-hard problems.
Table 3: Optimality gap and execution time of the ML models combined with standard search procedures (Δ: reduction of the optimality gap compared to the variant without local search).

| Algorithm | Construction | No LS: gap (%) | No LS: time (s) | 2-OPT: Δ (%) | 2-OPT: gap (%) | 2-OPT: time (s) | 3-OPT: Δ (%) | 3-OPT: gap (%) | 3-OPT: time (s) | LK: Δ (%) | LK: gap (%) | LK: time (s) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NN | – | 14.82 | 0.035 | 9.30 | 5.52 | 0.063 | 12.39 | 2.43 | 83.0 | 12.97 | 1.85 | 29.4 |
| Khalil et al. | Greedy | 10.74 | 0.075 | 2.74 | 8.00 | 0.093 | 7.86 | 2.88 | 90.2 | 9.43 | 1.31 | 23.8 |
| Deudon et al. | Greedy | 8.38 | 0.036 | 2.78 | 5.60 | 0.840 | 4.98 | 3.40 | 70.3 | 6.32 | 2.06 | 21.4 |
| Deudon et al. | S (16 it.) | 7.48 | 0.046 | 3.22 | 4.26 | 1.196 | 4.41 | 3.07 | 79.1 | 5.40 | 2.08 | 29.0 |
| Kool et al. | Greedy | 4.35 | 0.023 | 0.98 | 3.37 | 0.046 | 2.31 | 2.04 | 38.6 | 2.89 | 1.46 | 2.5 |
| Kool et al. | S (16 it.) | 3.18 | 0.205 | 0.68 | 2.50 | 0.167 | 1.58 | 1.60 | 32.9 | 1.93 | 1.25 | 33.1 |
| Joshi et al. | Greedy | 8.61 | 0.091 | 6.87 | 1.74 | 0.063 | 4.98 | 3.63 | 25.6 | 8.24 | 0.37 | 2.2 |
| Joshi et al. | BS (16) | 7.05 | 0.095 | 5.80 | 1.25 | 0.058 | 4.38 | 2.67 | 21.8 | 6.81 | 0.24 | 1.7 |
| Joshi et al. | BS* (16) | 6.35 | 0.151 | 5.15 | 1.20 | 0.068 | 4.57 | 1.78 | 21.4 | 6.09 | 0.26 | 1.8 |
7 Conclusion
A search procedure is the backbone of traditional algorithms dealing with Combinatorial Optimization Problems (COP). More recently, a new paradigm, based on learning, has been considered for solving COPs. Despite the apparent good results of learning approaches, their performance is still far from what can be achieved by algorithms based on a specialized search procedure. This observation indicates that resorting only to learned knowledge may not be enough to obtain near-optimal solutions when solving COPs. For this reason, state-of-the-art ML approaches also rely on a search procedure, at the cost of a higher execution time. This brings us back to the question: what is the importance of learning versus searching when designing ML approaches for solving COPs?
The goal of this paper was to provide first answers to this question. To do so, we targeted the well-known Travelling Salesman Problem, which is, to the best of our knowledge, the NP-hard problem most studied by learning approaches for COPs.
Firstly, we introduced a new evaluation metric, ROD, that can be used for evaluating the learning component of any ML approach dedicated to solving COPs. We applied this metric to the four state-of-the-art ML models from the literature, and the results showed that, even if the models do not compete with traditional approaches, their learning ability is nevertheless of high quality. This result suggests that further improvements may come from the search procedure, and not from the learning component itself. Future work on the TSP should therefore be dedicated to improving the search procedure. In order to ease such new developments, we made ROD open-source.
Secondly, we experimentally showed that all four ML approaches that we tested benefit from a search procedure. Simply by combining ML approaches with standard search procedures, the optimality gap obtained by [Joshi, Laurent, and Bresson2019] could be reduced from 7.05% to 0.24% for TSPs of 100 vertices. However, this comes at the expense of a higher execution time. Finding the most suitable search procedure for a given ML model is still an open question.
So far, we have used ROD only on the TSP. In future work, we plan to use it for evaluating ML models dedicated to solving other COPs, such as the Knapsack Problem [Bello et al.2016] or the Maximum Independent Set Problem [Khalil et al.2017]. Moreover, we also plan to consider more elaborate search procedures (simulated annealing, tabu search, large neighbourhood search, etc.) in the evaluation.
References
 [Aarts and Lenstra2003] Aarts, E., and Lenstra, J. K. 2003. Local search in combinatorial optimization. Princeton University Press.
 [Applegate et al.2006] Applegate, D. L.; Bixby, R. E.; Chvatal, V.; and Cook, W. J. 2006. The traveling salesman problem: a computational study. Princeton university press.
 [Bahdanau, Cho, and Bengio2014] Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
 [Beldiceanu and Contejean1994] Beldiceanu, N., and Contejean, E. 1994. Introducing global constraints in chip. Mathematical and computer Modelling 20(12):97–123.
 [Bellman1966] Bellman, R. 1966. Dynamic programming. Science 153(3731):34–37.
 [Bello et al.2016] Bello, I.; Pham, H.; Le, Q. V.; Norouzi, M.; and Bengio, S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint arXiv:1611.09940.
 [Bishop2006] Bishop, C. M. 2006. Pattern recognition and machine learning. Springer.
 [Bresson and Laurent2017] Bresson, X., and Laurent, T. 2017. Residual gated graph convnets. arXiv preprint arXiv:1711.07553.
 [Cappart et al.2018] Cappart, Q.; Goutierre, E.; Bergman, D.; and Rousseau, L.-M. 2018. Improving optimization bounds using machine learning: decision diagrams meet deep reinforcement learning. arXiv preprint arXiv:1809.03359.
 [Chorowski et al.2015] Chorowski, J. K.; Bahdanau, D.; Serdyuk, D.; Cho, K.; and Bengio, Y. 2015. Attention-based models for speech recognition. In Advances in neural information processing systems, 577–585.
 [Dai, Dai, and Song2016] Dai, H.; Dai, B.; and Song, L. 2016. Discriminative embeddings of latent variable models for structured data. In International conference on machine learning, 2702–2711.
 [Deudon et al.2018] Deudon, M.; Cournut, P.; Lacoste, A.; Adulyasak, Y.; and Rousseau, L.-M. 2018. Learning heuristics for the TSP by policy gradient. In International Conference on the Integration of Constraint Programming, Artificial Intelligence, and Operations Research, 170–181. Springer.
 [Gasse et al.2019] Gasse, M.; Chételat, D.; Ferroni, N.; Charlin, L.; and Lodi, A. 2019. Exact combinatorial optimization with graph convolutional neural networks. arXiv preprint arXiv:1906.01629.
 [Glover and Laguna1998] Glover, F., and Laguna, M. 1998. Tabu search. In Handbook of combinatorial optimization. Springer. 2093–2229.
 [Held and Karp1970] Held, M., and Karp, R. M. 1970. The traveling-salesman problem and minimum spanning trees. Operations Research 18(6):1138–1162.
 [Helsgaun2000] Helsgaun, K. 2000. An effective implementation of the Lin–Kernighan traveling salesman heuristic. European Journal of Operational Research 126(1):106–130.
 [Helsgaun2009] Helsgaun, K. 2009. General k-opt submoves for the Lin–Kernighan TSP heuristic. Mathematical Programming Computation 1(2–3):119–163.
 [Hopfield and Tank1985] Hopfield, J. J., and Tank, D. W. 1985. “neural” computation of decisions in optimization problems. Biological cybernetics 52(3):141–152.
 [Joshi, Laurent, and Bresson2019] Joshi, C. K.; Laurent, T.; and Bresson, X. 2019. An efficient graph convolutional network technique for the travelling salesman problem. arXiv preprint arXiv:1906.01227.
 [Karp1972] Karp, R. M. 1972. Reducibility among combinatorial problems. In Complexity of computer computations. Springer. 85–103.
 [Khalil et al.2016] Khalil, E. B.; Le Bodic, P.; Song, L.; Nemhauser, G.; and Dilkina, B. 2016. Learning to branch in mixed integer programming. In Thirtieth AAAI Conference on Artificial Intelligence.
 [Khalil et al.2017] Khalil, E.; Dai, H.; Zhang, Y.; Dilkina, B.; and Song, L. 2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 6348–6358.
 [Kirkpatrick, Gelatt, and Vecchi1983] Kirkpatrick, S.; Gelatt, C. D.; and Vecchi, M. P. 1983. Optimization by simulated annealing. Science 220(4598):671–680.
 [Kool, van Hoof, and Welling2018] Kool, W.; van Hoof, H.; and Welling, M. 2018. Attention, learn to solve routing problems! arXiv preprint arXiv:1803.08475.
 [LeCun, Bengio, and Hinton2015] LeCun, Y.; Bengio, Y.; and Hinton, G. 2015. Deep learning. Nature 521(7553):436.
 [LeCun et al.1998] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P.; et al. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11):2278–2324.
 [Lin and Kernighan1973] Lin, S., and Kernighan, B. W. 1973. An effective heuristic algorithm for the traveling-salesman problem. Operations Research 21(2):498–516.
 [Lin2004] Lin, C.-Y. 2004. ROUGE: A package for automatic evaluation of summaries. In Text summarization branches out, 74–81.
 [Mnih et al.2016] Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, 1928–1937.
 [Papadimitriou1977] Papadimitriou, C. H. 1977. The Euclidean travelling salesman problem is NP-complete. Theoretical Computer Science 4(3):237–244.
 [Papineni et al.2002] Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, 311–318. Association for Computational Linguistics.
 [Parker and Rardin2014] Parker, R. G., and Rardin, R. L. 2014. Discrete optimization. Elsevier.
 [Pisinger and Ropke2010] Pisinger, D., and Ropke, S. 2010. Large neighborhood search. In Handbook of metaheuristics. Springer. 399–419.
 [Riedmiller2005] Riedmiller, M. 2005. Neural fitted q iteration–first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, 317–328. Springer.
 [Silver et al.2017] Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. 2017. Mastering the game of go without human knowledge. Nature 550(7676):354.
 [Sutton and Barto2018] Sutton, R. S., and Barto, A. G. 2018. Reinforcement learning: An introduction. MIT press.
 [Taillard and Helsgaun2019] Taillard, É. D., and Helsgaun, K. 2019. POPMUSIC for the travelling salesman problem. European Journal of Operational Research 272(2):420–429.
 [Vaswani et al.2017] Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A. N.; Kaiser, Ł.; and Polosukhin, I. 2017. Attention is all you need. In Advances in neural information processing systems, 5998–6008.
 [Veličković et al.2017] Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; and Bengio, Y. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903.
 [Vinyals, Fortunato, and Jaitly2015] Vinyals, O.; Fortunato, M.; and Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems, 2692–2700.
 [Williams1992] Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4):229–256.