I Introduction
In the past decades, search methods have become major approaches for tackling various computationally hard problems. Most, if not all, established search methods, from specialized heuristic algorithms tailored for a particular problem class (e.g., the Lin-Kernighan (LK) heuristic for the Traveling Salesman Problem (TSP) [20]) to general algorithmic frameworks (e.g., Evolutionary Algorithms), share a common feature: they are parameterized algorithms, i.e., they involve parameters that need to be configured by users before the algorithm is applied to a problem.
Although theoretical analyses of many parameterized algorithms have offered worst-case or average-case bounds on their performance, their actual performance in practice is in many cases highly sensitive to the parameter settings [7, 14, 17, 10]. More importantly, finding the optimal configuration, i.e., parameter setting, requires knowledge of both the algorithm and the problem to be solved, and cannot easily be done manually. Hence, much effort has been devoted to developing parameter tuning methods, also referred to as algorithm configuration methods [13, 14, 2, 1, 25, 6, 10], which automatically identify the most appropriate parameter setting for an algorithm. These methods essentially involve a high-level iterative generate-and-test process. To be specific, given a set of instances from the target problem class, different configurations are iteratively generated and tested on the instance set. Upon termination, the process outputs the configuration that performs best on the instance set.
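The generate-and-test loop described above can be sketched as follows. This is a hypothetical minimal tuner using plain random sampling (real configurators such as SMAC use model-based search instead); `evaluate` stands in for running the algorithm under a configuration on one instance, and smaller scores are assumed to be better, as throughout this paper:

```python
import random

def tune(configuration_space, instances, evaluate, budget=100, seed=0):
    """Generic generate-and-test tuning loop (an illustrative sketch, not any
    specific tuner): repeatedly sample a configuration, test it on the whole
    training instance set, and keep the best-performing one."""
    rng = random.Random(seed)
    best_conf, best_score = None, float("inf")
    for _ in range(budget):
        # "Generate": plain random sampling; real tuners use smarter strategies.
        conf = {name: rng.choice(values)
                for name, values in configuration_space.items()}
        # "Test": aggregate performance over the training instances.
        score = sum(evaluate(conf, s) for s in instances) / len(instances)
        if score < best_score:
            best_conf, best_score = conf, score
    return best_conf, best_score
```

Upon termination, the loop returns the configuration with the best aggregate score on the given instance set.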
From the practical point of view, a parameter tuning method is expected to identify a configuration that generalizes well, i.e., performs well not only on the instance set used during the tuning phase, but also on unseen instances of the same problem class. The reason is that intelligent systems, which incorporate parameterized search algorithms as a module, are seldom built to address a few specific problem instances, but rather a whole target problem class, and it is unlikely that the exact problem instances a system will encounter in practice are known in advance. The need for generalization requires the instance set used for parameter tuning to be sufficiently large that it contains good representatives of all instances of the target problem class. Unfortunately, parameter tuning in a real-world scenario is more likely than not to face the small-sample-size challenge. In other words, the available instance set is not only small, but also may not represent the target problem class well. For example, the widely studied TSP benchmark suite TSPLIB [30] consists of a few hundred TSP instances, while there could be millions of possible concrete TSP instances even for a fixed number of cities. In consequence, the more powerful a parameter tuning method is, the higher the risk that the obtained configuration will overfit the instances involved in the tuning process.
This paper suggests that the pursuit of generalizable configurations for parameterized search algorithms could be modelled as a coevolutionary system, in which two internal populations, representing the configurations and the problem instances, respectively, compete with each other during the course of evolution. The evolution of the latter promotes exploration of the instance space of the target problem class to generate synthetic instances that exploit the weaknesses of the former. The former, on the other hand, improves itself by identifying configurations that could better handle the latter. In this way, the configuration population is encouraged to evolve towards achieving good performance on as many instances of the target problem class as possible, i.e., towards better generalization. Specifically, the contributions of this paper include:

A novel parameter tuning framework, namely Co-Evolution of Parameterized Search (CEPS), is proposed. It is also shown that CEPS is a process that minimizes an upper bound, i.e., a tractable surrogate, of the generalization performance.

To demonstrate the implementation details of CEPS as well as to assess its potential, concrete instantiations are also presented for two hard optimization problems, i.e., TSP and the Vehicle Routing Problem with Simultaneous Pickup–Delivery and Time Windows (VRPSPDTW) [36]. Computational studies confirm that CEPS is able to obtain configurations with better generalization performance.

The proposal of CEPS extends the realm of Co-Evolution, for the first time, to evolving algorithm configurations and problem instances. Since CEPS does not invoke domain-specific knowledge, its potential applications go beyond optimization problems, even to planning and learning problems.
The rest of the paper is organized as follows. In Section II, we introduce the challenge of seeking generalizable configurations for a parameterized search algorithm, as well as the theoretical insight behind CEPS. Section III first presents the CEPS framework, followed by its instantiations for TSP and VRPSPDTW. Computational studies on these two problems are presented in Section IV. Finally, Section V concludes the paper with discussions.
II Parameterized Solvers Made Generalizable
II-A Notations and Problem Description
Assume a solver is to be built for a problem class (e.g., TSP), for which an instance of the problem class is denoted as $s$, and the set of all possible instances is denoted as $\mathcal{S}$. Generally speaking, the solver can be any concrete computational process, e.g., a traditional heuristic search process such as the LK heuristic for TSP, or even a neural network [27, 5, 18], that outputs a solution for a given instance of the target problem class. Consider a parameterized algorithm whose parameter(s) $\theta$ need to be assigned values. A solver is a specification of the algorithm with a concrete value assignment of $\theta$. For the sake of brevity, we refer to each concrete value assignment as a configuration of the algorithm in this paper. Since a configuration fully defines a solver, seeking a good (say generalizable) solver is equivalent to seeking a good configuration. Hence, the quality of a configuration $\theta$ on a given instance $s$ is directly denoted as $f(\theta, s)$, which is a performance indicator of the corresponding solver on $s$. This indicator could concern many aspects, e.g., the quality of the obtained solution, the CPU time required to achieve a solution above a given quality threshold, or it could even be stated in a multi-objective form. Following the above definitions, optimizing the generalization performance of a solver can be stated as (assuming the smaller the better):
$\theta^{*} = \arg\min_{\theta} \int_{s \in \mathcal{S}} f(\theta, s)\, p(s)\, \mathrm{d}s$ (1)
where $p(s)$ stands for the prior distribution of instances over $\mathcal{S}$. Since in practice the prior distribution is usually unknown, a uniform distribution can be assumed without loss of generality. Eq. (1) can then be simplified to Eq. (2) by omitting a normalization constant:

$\theta^{*} = \arg\min_{\theta} \sum_{s \in \mathcal{S}} f(\theta, s)$ (2)
The challenge with Eqs. (1) and (2) is that in practice they cannot be directly optimized since the set $\mathcal{S}$ is, more often than not, unavailable. Instead, only a set of "training" instances, i.e., a subset $T \subset \mathcal{S}$, is given for the purpose of tuning $\theta$. In fact, it has been observed that the size of the training instance set is often rather limited [15, 33, 22], leading to the small-sample-size phenomenon. Even worse, given a $T$ collected from the real world, it is non-trivial (if not impossible) to verify whether it is a good representative of $\mathcal{S}$. In case the training instance set is too small, or is not a good representative of the whole problem class, the best configuration obtained with it may fail to generalize.
II-B Enhancing Generalization with Synthetic Instances
A natural idea to tackle the small-sample-size challenge is to augment $T$ with a set of synthetic instances, say $T_{syn}$, such that the configuration obtained with $T \cup T_{syn}$ would generalize better than that obtained with $T$ alone. This idea is generally valid because if the size of $T_{syn}$ continues to grow, $T \cup T_{syn}$ will eventually approach $\mathcal{S}$. Hence, the key question is how a generalizable configuration could be obtained with a sufficiently small $T_{syn}$. This question can be restated as: how should synthetic training instances be generated such that the generalization performance of the obtained configuration improves as much as possible with a $T_{syn}$ of (say predefined) small size?
Given a parameterized search algorithm, suppose a configuration $\theta_{old}$ has been obtained as the best-performing configuration on $T$. A synthetic instance set $T_{syn}$ is to be generated, with the aim that a new configuration $\theta_{new}$ obtained with $T \cup T_{syn}$ would outperform $\theta_{old}$ in terms of generalization as much as possible. Ignoring the inner optimization/tuning process with which $\theta_{old}$ and $\theta_{new}$ are obtained, generating a high-quality $T_{syn}$ could be more formally stated as another optimization problem, as in Eq. (3):
$\min_{T_{syn}} \left[ \sum_{s \in \mathcal{S}} f(\theta_{new}, s) - \sum_{s \in \mathcal{S}} f(\theta_{old}, s) \right]$ (3)
Since $\theta_{new}$ is obtained with $T \cup T_{syn}$, we further assume that $f(\theta_{new}, s) \le f(\theta_{old}, s)$ for any $s$. Although this is a rather restrictive assumption, it will be shown in Section II-C that it can be fulfilled for a large class of search algorithms. Substituting this assumption into Eq. (3) leads to:
$\sum_{s \in \mathcal{S}} f(\theta_{new}, s) - \sum_{s \in \mathcal{S}} f(\theta_{old}, s) \le \sum_{s \in T \cup T_{syn}} \left[ f(\theta_{new}, s) - f(\theta_{old}, s) \right]$ (4)
Inequality (4) gives an upper bound on the generalization gap between $\theta_{new}$ and $\theta_{old}$. If its right-hand side is smaller than 0, then $\theta_{new}$ generalizes better than $\theta_{old}$. More importantly, neither $\sum_{s \in \mathcal{S}} f(\theta_{new}, s)$ nor $\sum_{s \in \mathcal{S}} f(\theta_{old}, s)$ can be precisely measured in practice, so minimizing a measurable surrogate, i.e., the upper bound, is a good alternative for improving the generalization performance. Therefore, given a training instance set $T$ and a configuration $\theta_{old}$ obtained with $T$, an improved configuration $\theta_{new}$ (in terms of generalization performance) could be obtained with a two-step strategy:

identify the $T_{syn}$ for which $\sum_{s \in T_{syn}} f(\theta_{old}, s)$ is maximized;

identify the $\theta_{new}$ that minimizes $\sum_{s \in T \cup T_{syn}} f(\theta_{new}, s)$ (note that, once $T_{syn}$ is generated, the term $\sum_{s \in T \cup T_{syn}} f(\theta_{old}, s)$ can be viewed as a constant and can be omitted).
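One iteration of this two-step, upper-bound-minimizing strategy can be sketched as follows. This is a toy illustration only: the maximization and minimization above are themselves hard search problems, approximated here by picking the best out of finite, hypothetical candidate pools:

```python
def improve_once(theta_old, T, f, candidate_instances, candidate_configs, k=5):
    """One iteration of the two-step strategy (illustrative sketch).

    f(theta, s) is the performance indicator, smaller is better."""
    # Step 1: pick the k candidate instances on which theta_old performs
    # worst, i.e. maximize the sum of f(theta_old, s) over T_syn.
    T_syn = sorted(candidate_instances,
                   key=lambda s: f(theta_old, s), reverse=True)[:k]
    # Step 2: pick the configuration minimizing performance on T union T_syn;
    # the f(theta_old, .) term of the bound is a constant at this point.
    training = list(T) + T_syn
    theta_new = min(candidate_configs,
                    key=lambda th: sum(f(th, s) for s in training))
    return theta_new, T_syn
```

In practice both steps are carried out by evolutionary search rather than exhaustive enumeration, as described in Section II-C.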
II-C Enhancing Generalization via Competitive Coevolution
The two steps derived in Section II-B naturally serve as the core building block of an iterative process that gradually seeks configurations with better generalization performance. There could be many ways to design such an iterative process. Among them, Competitive Coevolution [31] provides a readily available framework. Concretely, one can maintain an instance population and a configuration population. In each iteration, the two populations alternately evolve and compete with each other, i.e., the instance population evolves to identify $T_{syn}$ and the configuration population evolves to identify $\theta_{new}$.
In addition to this intuitive fit, coevolution also offers two important technical advantages for implementing the two-step improvement strategy. First, a common challenge in implementing the strategy is how to find the optimal $T_{syn}$ and $\theta_{new}$ in each iteration, each of which is itself a hard optimization problem. For this reason, encouraging sufficient exploration of the instance and configuration spaces when seeking $T_{syn}$ and $\theta_{new}$ is important for obtaining satisfactory improvement in each iteration. By maintaining a population rather than a single individual, coevolution is more likely to find sufficiently good synthetic instances and configurations in each iteration.
Second, and possibly more importantly, recall that the two-step improvement strategy is derived from the assumption that $f(\theta_{new}, s) \le f(\theta_{old}, s)$ for any $s$. Maintaining a population of configurations provides a guarantee for this assumption to hold. To be concrete, the configuration population can be directly employed to construct a Parallel Algorithm Portfolio (PAP) [8, 12, 23, 28, 24]. A PAP solver involves multiple component solvers, each of which is itself a solver for the target problem. Given a problem instance, all the component solvers are run independently, typically in parallel, to obtain multiple solutions. Then, the best solution is taken as the output of the PAP solver. Suppose a PAP is constructed with the configuration population, i.e., $P = \{\theta_1, \dots, \theta_K\}$. By definition, the performance of $P$ on an instance $s$, denoted as $f(P, s)$, is:
$f(P, s) = \min_{i \in \{1, \dots, K\}} f(\theta_i, s)$ (5)
which is never larger than any individual $f(\theta_i, s)$. Hence, perturbing any individual configuration is equivalent to starting from a PAP with $K-1$ component solvers and trying to improve it with a new component solver, for which the above-mentioned assumption always holds.
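Eq. (5) is a min-aggregation over the component solvers, which is why adding a component can never worsen a PAP. A minimal sketch:

```python
def pap_performance(portfolio, s, f):
    """Performance of a PAP on instance s (Eq. (5)): all component solvers
    run independently, so the PAP achieves the best (minimum) indicator
    value among its components."""
    return min(f(theta, s) for theta in portfolio)
```

Since the minimum over a superset can only stay the same or decrease, extending a portfolio with one more component solver can never degrade its performance on any instance.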
III Coevolution of Parameterized Search
III-A The General Framework
By incorporating the two-step improvement strategy into a coevolution process, we arrive at the proposed CEPS framework, as presented in Algorithm 1. Given the advantages of a PAP over a single solver, as discussed in Section II-C, the whole final configuration population is used as the output by default, to exploit the full power of CEPS. It should be noted that Algorithm 1 could easily be adapted to output only the best configuration in the final population. In general, CEPS consists of two major phases, i.e., an initialization phase (lines 2-7) and a coevolution phase (lines 8-27), which can be further subdivided into alternating between the evolution of the configuration population (lines 10-15) and the evolution of the instance population (lines 17-26) for MaxIte iterations in total. These modules are detailed as follows.
III-A1 Initialization
Given an initial training instance set $T$, a simple greedy strategy is adopted to initialize a configuration population of size $K$. First, a set of candidate configurations is randomly sampled from the configuration space and tested on the training set (line 2). Then, starting from an empty set (line 3), the configuration population is built iteratively (lines 4-7). At each iteration, the candidate configuration whose inclusion leads to the best-performing PAP is selected (line 5) and inserted into the population (line 6). The process terminates when $K$ configurations have been selected.
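The greedy construction could be sketched along these lines (an illustrative sketch; `candidates` plays the role of the randomly sampled configuration set, and the mean of Eq. (5) over the training set is assumed as the PAP score, smaller being better):

```python
def greedy_init(candidates, T, f, K):
    """Greedy PAP initialization (sketch): repeatedly add the candidate
    configuration whose inclusion yields the best-performing PAP on the
    training set T."""
    def pap_score(pap):
        # PAP performance on an instance is the min over components (Eq. (5));
        # aggregate here by the mean over the training set.
        return sum(min(f(th, s) for th in pap) for s in T) / len(T)

    population = []
    pool = list(candidates)
    while len(population) < K and pool:
        best = min(pool, key=lambda c: pap_score(population + [c]))
        population.append(best)
        pool.remove(best)
    return population
```

A design note: because the selection criterion scores the whole portfolio rather than each configuration in isolation, the greedy step naturally favors complementary configurations over individually strong but redundant ones.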
III-A2 Evolution of the Configuration Population
Given a population of parent configurations, offspring configurations are first generated by mutating randomly selected configurations in the parent population, leading to new PAPs (lines 10-14). The offspring configuration that leads to the best-performing PAP replaces its parent (line 15). An existing automatic algorithm configuration tool, namely SMAC [14], is employed as the mutation operator. More specifically, to mutate a configuration, SMAC is used to search the configuration space for a new configuration whose inclusion into the PAP minimizes the predefined performance indicator of the resultant PAP on the training set (line 12).
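This mutate-and-select step can be sketched as follows. The `mutate_config` callback is a hypothetical stand-in for the SMAC-based mutation, and the sketch keeps the best offspring PAP found, broadly matching the replace-parent-with-best behaviour described above:

```python
import random

def evolve_configs(pop, T, f, mutate_config, n_offspring, seed=0):
    """Evolution of the configuration population (illustrative sketch).
    Each offspring replaces one component of the PAP; the variant whose
    PAP performs best on the training set T is kept."""
    rng = random.Random(seed)

    def pap_score(pap):  # mean of Eq. (5) over T; smaller is better
        return sum(min(f(th, s) for th in pap) for s in T) / len(T)

    best_pap, best_score = list(pop), pap_score(pop)
    for _ in range(n_offspring):
        i = rng.randrange(len(pop))               # random parent
        child = mutate_config(pop[i], rng)        # offspring configuration
        trial = pop[:i] + [child] + pop[i + 1:]   # offspring PAP
        score = pap_score(trial)
        if score < best_score:
            best_pap, best_score = trial, score
    return best_pap
```

Note that an offspring is evaluated by the score of the whole resulting PAP, not by its standalone performance, so the population is pushed towards complementary components.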
III-A3 Evolution of the Instance Population
In this phase, CEPS first creates a backup of the training set (line 18) that will be restored to the training set at the end of instance generation (line 26), and then evolves the training set (lines 19-25). Since the aim of this evolution is to identify instances that are hard for the current PAP, each instance in the set is assigned a fitness equal to the performance of the PAP on it (line 19); the worse the performance, the higher the fitness. In each generation, CEPS first randomly selects an instance as the parent and mutates it to generate an offspring instance (line 21), which is then evaluated against the configuration population (line 22). Finally, CEPS uses the offspring to randomly replace an instance with lower fitness (lines 23-24). In this way, as the number of generations increases, the average fitness of the instances gradually increases, meaning that the instances become harder and harder for all individuals in the configuration population. Note that in the last iteration (i.e., the MaxIte-th iteration) of the coevolution phase, the evolution of the instance population is skipped (line 17), because there is no need to generate more instances once the final algorithm configurations have been fully tuned.
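The generational loop above can be sketched as follows (an illustrative sketch; `pap_fitness` stands in for evaluating the current PAP on an instance via Eq. (5), and `mutate` for the domain-specific instance mutation operator, with the backup/restore of the training set omitted):

```python
import random

def evolve_instances(T, pap_fitness, mutate, generations, seed=0):
    """Evolution of the instance population (illustrative sketch).

    pap_fitness(s) is the performance of the current PAP on instance s;
    since worse PAP performance means a harder instance, it is used
    directly as the (to-be-maximized) fitness."""
    rng = random.Random(seed)
    T = list(T)
    fitness = [pap_fitness(s) for s in T]
    for _ in range(generations):
        parent = rng.randrange(len(T))
        child = mutate(T[parent], rng)
        child_fit = pap_fitness(child)
        # Replace a randomly chosen lower-fitness (easier) instance, if any.
        weaker = [i for i, fit in enumerate(fitness) if fit < child_fit]
        if weaker:
            i = rng.choice(weaker)
            T[i], fitness[i] = child, child_fit
    return T
```

Because an offspring only ever displaces a lower-fitness instance, the average hardness of the population is non-decreasing over generations.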
III-B Instantiations for TSP and VRPSPDTW
Algorithm 1 is a rather generic framework, since the representations of the two populations depend on the target parameterized search method and the target problem class, respectively. The mutation operator for the instance population, as well as the fitness function, also depends on the target problem class. In this paper, two instantiations of CEPS, namely CEPS-TSP and CEPS-VRPSPDTW, have been developed for the TSP and VRPSPDTW problems, respectively. These two target problem classes are chosen because, as a classic NP-hard problem, TSP is one of the most widely investigated benchmark problems in academia, while VRPSPDTW is a much more complex routing problem that takes real-world requirements into account. The significant difference between these two problems provides a good context for assessing CEPS.
Since TSP and VRPSPDTW are hard combinatorial optimization problems, specialized search methods as well as instance mutation operators are needed to instantiate CEPS-TSP and CEPS-VRPSPDTW. On the other hand, the main purpose of CEPS-TSP and CEPS-VRPSPDTW is to assess the potential of CEPS, and elaborating the full details of the two algorithms might dilute the major focus of this paper. Therefore, below we briefly introduce the three main modules, i.e., the parameterized search methods, the instance mutation operators, and the fitness functions (performance indicators) of the two algorithms. Interested readers may refer to the original publications [9, 36] for implementation details of the search methods used, and to the Appendices for details of the instance mutation operators.

III-B1 CEPS-TSP
Given a list of cities and the distances between each pair of cities, the goal of TSP is to find the shortest route that visits each city and returns to the origin city. Specifically, the symmetric TSP with distances in a two-dimensional Euclidean space is considered here. The adopted parameterized search method is Helsgaun's Lin-Kernighan Heuristic (LKH) [9], version 2.0.7 (with 23 parameters), one of the state-of-the-art inexact solvers for such TSPs. Each TSP instance is represented by a list of coordinates, with each coordinate representing a city. An operator widely used for generating TSP instances (see [34] for example) is employed as the instance mutation operator of CEPS-TSP, as detailed in Appendix A.
For TSP, the quality of a configuration $\theta$ on an instance $s$, i.e., the performance indicator $f(\theta, s)$, is the runtime needed by the solver specified by $\theta$ to solve $s$. More specifically, the search algorithm is terminated as soon as it finds the optimal solution of $s$, or after running for a cutoff time of 10 seconds. In the first case, the run is considered successful and $f(\theta, s)$ is exactly the recorded runtime; in the second case, the run is considered a timeout and $f(\theta, s)$ is the cutoff time multiplied by a penalty factor of 10, i.e., 10 seconds x 10 = 100 seconds. With this definition of $f(\theta, s)$, the performance of a PAP solver $P$ on an instance $s$, i.e., $f(P, s)$, is defined as in Eq. (5), which is exactly the fitness function used in the evolution of the instance population in CEPS-TSP (lines 19 and 22 of Algorithm 1). Finally, the performance of a solver (a single configuration or a PAP) is measured by the average runtime over an instance set, referred to as the Penalized Average Runtime-10 (PAR10) score [14], which is directly used for fitness evaluation in CEPS-TSP to compare PAPs constructed with the configuration population (line 15 of Algorithm 1). The smaller the score, the better.
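The PAR10 score described above can be computed as follows (a straightforward sketch, with `None` marking a timed-out run):

```python
def par10(runtimes, cutoff=10.0, penalty_factor=10):
    """Penalized Average Runtime-10 (PAR10): a timeout counts as
    cutoff x penalty_factor (here 10 s x 10 = 100 s), while a successful
    run counts as its actual runtime. `runtimes` holds the measured
    runtime per instance, with None marking a timeout."""
    scores = [cutoff * penalty_factor if t is None else t for t in runtimes]
    return sum(scores) / len(scores)
```

For example, two successful runs of 1 s and 3 s plus one timeout yield a PAR10 score of (1 + 100 + 3) / 3 seconds.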
III-B2 CEPS-VRPSPDTW
Given a number of customers who require both pickup service and delivery service within a certain time window, the goal of VRPSPDTW [36] is to send out a fleet of capacitated vehicles, stationed at a depot, to meet the customer demands with minimum total cost. The depot and the customers are located in a two-dimensional Euclidean space, and the distance between two nodes is the same in both directions. VRPSPDTW has five constraints: 1) the number of used vehicles must be smaller than the number of available ones; 2) each customer must be served exactly once; 3) the vehicles must not be overloaded during transportation; 4) a vehicle may only depart after the start of the depot's time window, and must return to the depot before the end of the depot's time window; 5) the service of a vehicle to each customer must be performed within that customer's time window. The detailed problem formulation of VRPSPDTW can be found in Appendix B.
The adopted parameterized search method for VRPSPDTW is a powerful co-evolutionary genetic algorithm (CoGA) proposed in [36] (with 12 parameters). Each VRPSPDTW instance is represented by a list of coordinates, with each coordinate representing a customer (or the depot), and each coordinate is associated with several attributes, i.e., a time window, a pickup demand and a delivery demand. In comparison with TSP, VRPSPDTW is a much more complicated problem. Since no readily available operator for generating VRPSPDTW instances could be found in the literature, a specialized instance mutation operator is proposed for CEPS-VRPSPDTW, as detailed in Appendix B.

For VRPSPDTW, the quality of a configuration $\theta$ on an instance $s$, i.e., the performance indicator $f(\theta, s)$, is the quality of the obtained solution, in terms of its cost. More specifically, the search algorithm specified by $\theta$ is terminated as soon as it has run for a cutoff time of 150 seconds. Assume it successfully finds a feasible solution of cost $c$ to $s$. Since the scales of the solution costs may vary significantly across different VRPSPDTW instances, the "normalized cost" is introduced to replace $c$:
$f(\theta, s) = c / \bar{d}_s$ (6)
where $\bar{d}_s$ is the mean of the distances between all pairs of customers in instance $s$. In case $\theta$ fails to find a feasible solution to $s$ within the cutoff time, the corresponding run is considered a timeout and $f(\theta, s)$ is set to a large penalty value, i.e., 2000. As in CEPS-TSP, the fitness function used in CEPS-VRPSPDTW for the evolution of the instance population is defined by Eq. (5), based on the definition of $f(\theta, s)$ described above. Similarly, the performance of a solver (a single configuration or a PAP) is measured by the average of $f$ over an instance set, referred to as the Penalized Average Normalized Cost (PANC) score, which is directly employed for fitness evaluation in CEPS-VRPSPDTW to compare PAPs constructed with the configuration population (line 15 of Algorithm 1). The smaller the score, the better.
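Under the reading of Eq. (6) given above (solution cost divided by the mean pairwise customer distance), the normalized cost and the PANC score can be sketched as:

```python
from itertools import combinations
from math import dist

def normalized_cost(cost, customers):
    """Normalized cost of a feasible VRPSPDTW solution (sketch of Eq. (6)):
    the raw cost divided by the mean pairwise distance between the
    customers of the instance, making scores comparable across
    instances of different scales. `customers` is a list of (x, y)."""
    pairs = list(combinations(customers, 2))
    mean_dist = sum(dist(a, b) for a, b in pairs) / len(pairs)
    return cost / mean_dist

def panc(normalized_costs, penalty=2000):
    """Penalized Average Normalized Cost: timeout runs (None) are replaced
    by the penalty value 2000 before averaging."""
    scores = [penalty if c is None else c for c in normalized_costs]
    return sum(scores) / len(scores)
```

The large timeout penalty plays the same role as the factor of 10 in PAR10: it makes failing to produce any feasible solution clearly worse than producing an expensive one.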
IV Computational Studies
Table I: Summary of the experimental settings.

| Problem | Instance Sets | #Solvers in PAP | Performance Indicator | Parameterized Method |
| TSP | 500 instances generated by 10 different generators; training/testing split: 30/470 | 4 | Runtime needed to find the optima of the instances; PAR10 score with cutoff time 10 seconds (see Section III-B1) | LKH version 2.0.7 [9] (with 23 parameters) |
| VRPSPDTW | 233 instances obtained from a real-world application; training/testing split: 14/219 | 4 | Cost of the found solutions; PANC score with cutoff time 150 seconds (see Section III-B2) | CoGA [36] (with 12 parameters) |
To assess the potential of CEPS, computational studies have been conducted with CEPS-TSP and CEPS-VRPSPDTW. The experiments mainly aim to address two questions:

whether CEPS could better tackle the small-sample-size challenge, i.e., build generalizable solvers with limited training instances, compared to state-of-the-art parameter tuning methods;

whether coevolution, i.e., alternately evolving the configuration population and the instance population, is as effective as expected at enhancing the generalization of the resultant solver.
To answer these two questions, two instance sets were first generated for TSP and VRPSPDTW, respectively. The TSP instance set consists of 500 instances and the VRPSPDTW instance set consists of 233 instances. It should be noted that these instances were generated as our testbed; to avoid bias towards CEPS, they were not generated in the same way that CEPS evolves the instance population. After the benchmark sets were generated, each of them was randomly split into a training set and a testing set, containing 6% and 94% of the whole set, respectively. To reduce the effect of the random splitting on the experimental results, the split was performed three times, leading to three unique pairs of training and testing sets for TSP and VRPSPDTW, denoted as TSP_1/2/3 and VRPSPDTW_1/2/3, respectively. Throughout the experiments, the testing instances are only used to estimate the generalization performance of the solvers obtained by CEPS and the compared methods, in our case the PAPs constructed with the final configuration populations. Only the training instances are used for parameter tuning, regardless of the method used. The TSP/VRPSPDTW instance sets, the compared methods, and the experimental protocol are further elaborated below.
IV-A Benchmark Instances
For TSP, we collected 10 different instance generators from the literature, including the portgen generator from the 8th DIMACS Implementation Challenge [16], the ClusteredNetwork generator from the R package netgen [4] and 8 TSP instance mutators from the R package tspgen [3]. The details of the generators used are presented in Appendix C. Considering the rather different instance-generation mechanisms underlying them, they are expected to generate highly diverse TSP instances. We used each of them to generate 50 instances, which finally gave us a set of 500 TSP instances. The problem size (i.e., number of cities) of all these instances is 800.
For VRPSPDTW, we obtained data from a real-world application of the JD logistics company. Specifically, the data contain customer requests that occurred during a period of time in a city. The total number of customers is 3000, of which 400 require service per day. Therefore, to generate a VRPSPDTW instance, we randomly selected 400 of the 3000 customers, and the pickup/delivery demands of each customer were randomly selected from all the demands that the customer had during this period. We repeated this process 500 times, obtaining a set of 500 VRPSPDTW instances. After that, a VRPSPDTW solver [36] was used to determine whether the generated instances have feasible solutions, and those without feasible solutions were discarded. Finally, we obtained a set of 233 VRPSPDTW instances.
IV-B Compared Methods
As explained in Section III-B, CEPS-TSP and CEPS-VRPSPDTW employ the whole final configuration population to construct PAPs, which are more likely to outperform solvers constructed with a single configuration. Hence, to make a fair comparison, the two instantiations are compared with state-of-the-art PAP tuning methods, namely GLOBAL [21], PARHYDRA [37, 21] and PCIT [23]. It should be noted that none of these methods involves an instance evolution mechanism, i.e., the given training instances are assumed to sufficiently represent the target problem class. Hence, given our experimental settings, the comparison between CEPS and these approaches evaluates whether CEPS could better tackle the small-sample-size challenge.
To address research question 2) raised at the beginning of Section IV, i.e., the role of coevolution in achieving better generalization performance, a baseline method, named Evolution of Parameterized Search (EPS), was also included in the comparison. EPS differs from CEPS in that it conducts instance mutation and PAP tuning in two isolated phases, rather than alternately. Given the same training instance set as CEPS, EPS first applies the instance mutation operator to generate an augmented set of instances, whose size is kept the same as the number of instances generated during the whole CEPS procedure. Then, a PAP is evolved with the same approach as in CEPS, but using the union of the initial and the augmented training instance sets as the input.
IV-C Experimental Protocol
We set the number of component solvers in a PAP, i.e., $K$, to 4, since 4-core machines are widely available now. The parameters of the compared methods were set following suggestions in the literature. The number of iterations of the coevolution phase of CEPS, i.e., MaxIte, and the number of offspring configurations in the evolution of the configuration population were set to 4 and 10, respectively. The termination condition for the evolution of the instance population in CEPS was the exhaustion of a predefined time budget. In the experiments, the total CPU time consumed by each compared method was kept almost the same. The detailed time settings of each method are presented in Appendix D. The above experimental settings, as well as the parameterized search methods and performance indicators used (see Section III-B), are summarized in Table I.
For each pair of training and testing sets, i.e., TSP_1/2/3 and VRPSPDTW_1/2/3, we applied each considered method to tune a PAP on the training set and then tested the resulting PAP on the testing set. For each testing instance, the PAP was run 3 times and the median of the three runs was recorded as the performance of the solver on that instance. The performance of a PAP on the testing instances was then aggregated to obtain the total number of timeouts (#TOs), the PAR10 score (for TSP solvers) and the PANC score (for VRPSPDTW solvers) on the testing set. For two different PAPs, we further performed a Wilcoxon signed-rank test to determine whether the difference between their PAR10/PANC scores on the testing set was significant.
All the experiments were conducted on a cluster of 3 Intel Xeon machines, each with 128 GB RAM and 24 cores (2.20 GHz, 30 MB cache), running CentOS 7.5. The entire set of experiments took almost 3 months to complete.
IV-D Results and Analysis
Table II: Testing results of the PAPs tuned by each method on the six testing sets.

| Method | TSP_1 | | TSP_2 | | TSP_3 | | VRPSPDTW_1 | | VRPSPDTW_2 | | VRPSPDTW_3 | |
| | #TOs | PAR10 | #TOs | PAR10 | #TOs | PAR10 | #TOs | PANC | #TOs | PANC | #TOs | PANC |
| GLOBAL | 10 | 3.85 | 15 | 5.18 | 10 | 3.67 | 3 | 249 | 3 | 248 | 4 | 257 |
| PCIT | 6 | 2.75 | 4 | 2.31 | 5 | 2.51 | 4 | 258 | 2 | 240 | 6 | 274 |
| PARHYDRA | 9 | 3.55 | 4 | 2.19 | 5 | 2.36 | 1 | 233 | 5 | 265 | 3 | 248 |
| EPS | 6 | 3.35 | 6 | 2.81 | 8 | 2.81 | 2 | 237 | 2 | 236 | 1 | 229 |
| CEPS | 6 | – | 1 | – | 3 | – | 1 | – | 1 | – | 2 | 236 |
Table II presents the testing results of the PAPs tuned by each considered method. In Table II, the PAR10/PANC score of a PAP is underlined if it is the best score, and shown in bold face if it is not significantly different from the best score on the corresponding testing set. Note that for PAR10/PANC, performance is better when the score is smaller. Two important observations can be made from Table II. First, the PAP obtained by CEPS has the smallest number of timeouts, meaning it has the highest success rate in solving the testing instances among all tested PAPs. In terms of PAR10 and PANC, the PAP output by CEPS achieved the best scores: in 5 out of the 6 experiments it achieved the best PAR10/PANC score (in the remaining experiment the gap between CEPS and the best method was not statistically significant), and in 4 out of the 6 experiments it was significantly better than the others. Furthermore, recall that the training/testing splits differed across experiments; compared to the other approaches, CEPS performed more stably over all 6 experiments. For instance, the #TOs of PCIT and PARHYDRA fluctuate over different training/testing sets on the VRPSPDTW problem. In summary, CEPS is not only the best-performing method, but is also less sensitive to the training data, i.e., it could better tackle the small-sample-size challenge.
Second, although EPS also involves instance generation, it was outperformed by methods that do not generate synthetic instances in several cases, e.g., by PCIT and PARHYDRA on TSP_2. This observation indicates that isolating instance generation from PAP tuning may have negative effects. On the other hand, the fact that CEPS performed better than EPS shows the effectiveness of integrating instance generation into the coevolutionary framework.
To further verify the effectiveness of the coevolution phase in CEPS, we tested the initial PAPs in CEPS (the PAPs built by the initialization phase) on the testing sets. Figure 1 illustrates the comparison between the testing performance (in terms of PAR10/PANC scores) of the initial PAPs and that of the final PAPs obtained by CEPS. On all 6 testing sets, the final PAP performs better than the initial PAP, with an average improvement rate of 29%. These results indicate that the coevolution in CEPS is as effective as expected at enhancing the generalization of the PAP solvers.
IV-E Assessing Generalization on Existing Benchmarks
Instance Type  #instances  #better  #not-worse  #worse
RCdp (small)  9  0 (0%)  9 (100%)  0 (0%)
Rdp  23  4 (17%)  17 (74%)  6 (26%)
Cdp  17  0 (0%)  9 (53%)  8 (47%)
RCdp  16  6 (38%)  8 (50%)  8 (50%)
count  65  10 (15%)  43 (66%)  22 (34%)
To further investigate the generalization ability of CEPS, the PAP tuned by CEPS in the VRPSPDTW_1 experiment was additionally tested on the existing VRPSPDTW benchmark suite [36], which is widely used for VRPSPDTW. Note that, compared to the benchmarks, the training instances in VRPSPDTW_1 were obtained from a different source (i.e., a real-world application) and may have quite different problem characteristics, e.g., customer number and node distribution. Hence, if the PAP tuned on the VRPSPDTW_1 training set still performs well on the VRPSPDTW benchmarks, it can be said to generalize well to totally unseen data. Table III presents the comparisons between the solutions found by the PAP and the best known solutions reported in the literature [36, 35, 11, 29, 32] (up to May 2019), regardless of which algorithm found them. Table III shows that, overall, the PAP generalizes well to the existing benchmarks: on 43 out of 65 (66%) instances, the solutions found by the PAP are not worse than the best currently known solutions, and notably, on 10 instances the PAP found new best solutions. Another observation is that the PAP did not perform well on the "Cdp"-type instances, in which the locations of customers are clustered (see [36] for details). We speculate that this is because the parameterized solver used for tuning the PAPs has an inherent deficiency in handling this type of instance, which in turn indicates that highly parameterized solvers with flexible solving capabilities are important to fully exploit the power of CEPS on a specific problem class.
V Conclusion
In this work, a coevolutionary approach, CEPS, is proposed for tuning parameterized search methods to obtain good generalization performance. By coevolving the training instance set and the candidate configurations, CEPS gradually steers the search for configurations towards instances on which the current configurations fail to perform well, and thus leads to configurations that generalize better. From a theoretical point of view, the evolution of the instance set is essentially a greedy instance-augmentation mechanism that drives the generalization performance of the resultant solver to improve as much as possible. As a result, CEPS is particularly effective in cases where only a limited number of problem instances are available for parameter tuning. We believe such a scenario is, more often than not, the reality when building real-world systems for tackling hard optimization problems. Two concrete instantiations, CEPS-TSP and CEPS-VRPSPDTW, are also presented. Their performance on the TSP and VRPSPDTW problems supports the effectiveness of CEPS in the sense that, in comparison with state-of-the-art parameter tuning approaches, the solvers obtained by CEPS achieve better generalization performance.
Since CEPS is a generic framework, some discussion of issues that are of practical significance is in order. First, although this work assumes CEPS takes a set of initial training instances as input, such training instances need not be real-world instances but could be generated randomly. In other words, CEPS could be used in a fully cold-start (a.k.a. zero-shot) setting, i.e., when no real-world instances are available for the target problem class. Further, CEPS could be run either offline or online, i.e., it could accommodate new real instances whenever they become available.
Second, an instantiation of CEPS involves specifying the instance mutation operator, the parameterized search method, and the parameter tuning method. Hence, as demonstrated by CEPS-TSP and CEPS-VRPSPDTW, one first needs to define these three modules, either based on previous literature on the target problem class or from scratch. Among the three, the instance generators can be adapted from one problem to another relatively easily, and most parameter tuning methods are generic and applicable to a broad family of problems; the parameterized search method is therefore arguably the most crucial module. Although CEPS is not restricted to a specific type of search method, the PAP-type method adopted in this work is suggested as a default choice, not only for its advantages in terms of solution quality (compared to a single solver) on a large variety of problems [37, 28, 21, 23, 24] and the theoretical guarantee it offers as elaborated in Section II-B, but also because a PAP allows exploiting modern high-performance computing facilities in an extremely simple way. This merit is sometimes even more important than solution quality, since wall-clock runtime is always a crucial performance indicator for real-world optimization systems.
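The computational appeal of a PAP mentioned above, namely that its component solvers simply run in parallel and the best result is kept, can be sketched in a few lines. The `(cost, solution)` interface and the toy solvers below are hypothetical, not the solver interface used in this work:

```python
from concurrent.futures import ThreadPoolExecutor

def run_portfolio(solvers, instance):
    # Each component solver maps an instance to a (cost, solution) pair;
    # the portfolio's answer is the best pair across all components.
    with ThreadPoolExecutor(max_workers=len(solvers)) as pool:
        results = list(pool.map(lambda solve: solve(instance), solvers))
    return min(results, key=lambda pair: pair[0])

# Toy "configurations" of a trivial solver (hypothetical interface).
greedy = lambda xs: (sum(xs), "greedy")
lazy = lambda xs: (max(xs) * len(xs), "lazy")
best = run_portfolio([greedy, lazy], [3, 1, 2])
```

With one core per component, the wall-clock time of the portfolio is that of a single solver run, which is the merit referred to above.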
Finally, it is interesting to note that the emerging topic of learning to optimize, which explores utilizing machine learning techniques, e.g., reinforcement learning, to build neural networks for solving optimization problems [27, 5, 18, 26, 19], could also be combined with CEPS. In this case, the implementation of CEPS would be able to leverage gradient descent methods to tune/evolve the configurations (i.e., to train the weights of a network).

Appendix A Instance Mutation Operator for TSP
For TSP defined on the two-dimensional Euclidean plane, each instance is represented by a list of coordinates, with each coordinate representing a city. As illustrated in Algorithm 2, the instance mutation operator in CEPS-TSP works as follows. Let $x_{\min}$ and $x_{\max}$, and $y_{\min}$ and $y_{\max}$, be the minimum and the maximum of the $x$ values and the $y$ values across all coordinates of a given instance $s$, respectively. When applying mutation to $s$, for each coordinate $(x, y)$ in $s$: with probability 0.9, $x$ and $y$ are offset by step sizes sampled from $\mathcal{N}(0, \sigma_x^2)$ and $\mathcal{N}(0, \sigma_y^2)$, respectively; with probability 0.1, $x$ and $y$ are replaced by new values sampled from $\mathcal{N}_{[x_{\min}, x_{\max}]}$ and $\mathcal{N}_{[y_{\min}, y_{\max}]}$, respectively. Here $\mathcal{N}(\mu, \sigma^2)$ refers to the normal distribution with mean $\mu$ and variance $\sigma^2$, and $\mathcal{N}_{[a, b]}$ refers to a normal distribution truncated to the closed interval $[a, b]$.

Appendix B Instance Mutation Operator for VRPSPDTW
B-A Problem Description of VRPSPDTW
VRPSPDTW is defined on a complete graph $G = (V, A)$, with $V = \{0, 1, \dots, N\}$ as the node set and $A$ as the arc set defined between each pair of nodes, i.e., $A = \{(i, j) \mid i, j \in V, i \neq j\}$. For convenience, the depot is denoted as 0 and the customers are denoted as $1, \dots, N$. In this paper we consider the VRPSPDTW [36, 35] defined on the two-dimensional Euclidean plane. That is, each node $i$ has a coordinate, and the distance between nodes $i$ and $j$, denoted as $c_{ij}$, is the Euclidean distance between them. The travel speed between nodes is 1, which means the travel time between node $i$ and node $j$ equals $c_{ij}$.
In addition to its coordinate, each customer $i$ is associated with 5 attributes: a delivery demand $d_i$, a pickup demand $p_i$, a time window $[e_i, l_i]$ and a service time $s_i$. $d_i$ represents the amount of goods to be delivered from the depot to customer $i$, and $p_i$ represents the amount of goods to be picked up from customer $i$ and delivered to the depot. $e_i$ and $l_i$ define the start and the end of the time window in which the customer receives service. The time windows are treated as hard constraints: arrival of a vehicle at customer $i$ before $e_i$ results in a wait before service can begin, while arrival after $l_i$ is infeasible. Finally, $s_i$ is the time spent by the vehicle loading/unloading goods at customer $i$. A fleet of $K$ identical vehicles, each with a capacity $Q$ and dispatching cost $u$, is initially located at the depot. Each vehicle starts at the depot, serves customers, and finally returns to the depot. For convenience, the depot 0 is also associated with the same 5 attributes, in which $e_0$ and $l_0$ are the earliest time the vehicles can depart from the depot and the latest time the vehicles can return to it, respectively, and $d_0 = p_0 = s_0 = 0$.
A solution to VRPSPDTW can be represented by a set of vehicle routes $X = \{r_1, \dots, r_m\}$, in which each route consists of a sequence of nodes that the vehicle visits, i.e., $r = \langle 0, v_1, \dots, v_{len(r)}, 0 \rangle$, where $v_h$ is the $h$-th node visited in $r$ and $len(r)$ is the length of $r$. Let $TD(r)$ denote the total travel distance of $r$, and let $ML(r)$ denote the highest load on the vehicle that occurs along $r$. Let $AT(v_h)$ and $DT(v_h)$ denote the time of arrival at and the time of departure from $v_h$, respectively.
The total cost of $X$ consists of two parts: the dispatching cost of the used vehicles, which is $u \cdot m$ (with $u$ the dispatching cost of a single vehicle and $m$ the number of routes in $X$), and the transportation cost, which is the total travel distance in $X$, i.e., $\sum_{r \in X} TD(r)$, multiplied by the unit transportation cost $\rho$. The objective of the VRPSPDTW problem is to find routes for the vehicles that serve all the customers at a minimal cost, as presented in Eq. (7):

$\min_{X}\; C(X) = u \cdot m + \rho \cdot \sum_{r \in X} TD(r)$  (7)
where the constraints are: 1) the number of used vehicles must not exceed the number of available ones; 2) each customer must be served exactly once; 3) the vehicle must never be overloaded during transportation; 4) the vehicles may only start serving after the start of the depot's time window, and must return to the depot before the end of that window; 5) the service of each customer must be performed within that customer's time window.
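Under these definitions, evaluating a candidate solution amounts to walking each route while tracking elapsed time and vehicle load. A minimal sketch of such an evaluator follows, using a hypothetical dictionary encoding of the instance (not the paper's data format); infeasible routes are given infinite cost:

```python
import math

def route_distance(route, nodes, Q, speed=1.0):
    """Return the travel distance of one route (0, c1, ..., ck, 0),
    or None if the route violates a capacity or time-window constraint.
    nodes[i] is a dict with keys coord, delivery, pickup, tw, service."""
    load = sum(nodes[c]["delivery"] for c in route[1:-1])  # leave depot carrying all deliveries
    if load > Q:
        return None
    dist, time = 0.0, nodes[0]["tw"][0]  # depart at the depot's earliest time e_0
    for prev, cur in zip(route, route[1:]):
        leg = math.dist(nodes[prev]["coord"], nodes[cur]["coord"])
        dist += leg
        time = max(time + leg / speed, nodes[cur]["tw"][0])  # wait if arriving early
        if time > nodes[cur]["tw"][1]:                       # arriving after l_i is infeasible
            return None
        time += nodes[cur]["service"]
        load += nodes[cur]["pickup"] - nodes[cur]["delivery"]
        if load > Q:                                         # simultaneous pickup/delivery load check
            return None
    return dist

def total_cost(routes, nodes, Q, u, rho):
    """Eq. (7): dispatching cost u per used vehicle plus rho per unit distance."""
    dists = [route_distance(r, nodes, Q) for r in routes]
    if any(d is None for d in dists):
        return math.inf
    return u * len(routes) + rho * sum(dists)

# Tiny 2-customer example: depot at the origin, generous time windows.
NODES = {
    0: {"coord": (0, 0), "delivery": 0, "pickup": 0, "tw": (0, 100), "service": 0},
    1: {"coord": (3, 4), "delivery": 2, "pickup": 1, "tw": (0, 50), "service": 1},
    2: {"coord": (3, 0), "delivery": 1, "pickup": 2, "tw": (0, 50), "service": 1},
}
```

Note that the load check runs both at departure and after every customer, since with simultaneous pickup and delivery the load along a route is not monotone.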
B-B Instance Mutation Operator
We consider a practical application scenario from the JD logistics company, in which a VRPSPDTW solver needs to solve a VRPSPDTW instance every day. The company has about 3000 customers in total in the city, but only about 13% of them (i.e., roughly 400) require service on any given day. Therefore, the different VRPSPDTW instances the solver faces are related as follows: 1) the location and the time window of the depot are unchanged, and the vehicle fleet is unchanged; 2) the locations of the customers change; 3) the time windows of the customers change; 4) the delivery and pickup demands of the customers change.
Based on the above observations, we design a specialized mutation operator for VRPSPDTW, as presented in Algorithm 3. First, the coordinate mutation used in CEPS-TSP is also used here. Moreover, the pickup demand $p_i$ and the delivery demand $d_i$ of each customer are replaced by new values sampled from $\mathcal{N}_{[p_{\min}, p_{\max}]}$ and $\mathcal{N}_{[d_{\min}, d_{\max}]}$, respectively, where $p_{\min}$ and $p_{\max}$, and $d_{\min}$ and $d_{\max}$, are the minimum and the maximum of the $p$ values and the $d$ values across all customers of $s$, respectively. Finally, for the time window of each customer, $e_i$ and $l_i$ are offset by step sizes sampled from a zero-mean normal distribution whose scale is determined by $e_0$ and $l_0$, i.e., the earliest time the vehicles can depart from the depot and the latest time the vehicles can return to it.
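The steps above can be sketched as follows for a single customer. The 0.9/0.1 coordinate split mirrors the CEPS-TSP operator; the step-size scale `sigma` and the dictionary encoding are illustrative placeholders, and uniform resampling stands in for the truncated-normal sampling described above:

```python
import random

def mutate_customer(cust, bounds, depot_tw, sigma=0.025, rng=random):
    """One hypothetical mutation step for a single customer.

    bounds = (x_min, x_max, y_min, y_max, p_min, p_max, d_min, d_max),
    observed over all customers of the current instance."""
    x_min, x_max, y_min, y_max, p_min, p_max, d_min, d_max = bounds
    x, y = cust["coord"]
    if rng.random() < 0.9:   # small Gaussian offset, scaled to the bounding box
        x += rng.gauss(0.0, sigma * (x_max - x_min))
        y += rng.gauss(0.0, sigma * (y_max - y_min))
    else:                    # occasional full resample inside the bounding box
        x, y = rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)
    cust["coord"] = (x, y)
    # Demands are resampled within the ranges observed across all customers.
    cust["pickup"] = rng.uniform(p_min, p_max)
    cust["delivery"] = rng.uniform(d_min, d_max)
    # Time windows drift but are kept inside the depot window [e_0, l_0].
    e0, l0 = depot_tw
    e, l = cust["tw"]
    shift = rng.gauss(0.0, sigma * (l0 - e0))
    e = min(max(e + shift, e0), l0)
    l = min(max(l + shift, e), l0)
    cust["tw"] = (e, l)
    return cust
```

Clamping the mutated window into the depot window keeps every generated instance well-formed with respect to constraint 4) of Eq. (7).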
Appendix C TSP Instance Generators
The 10 adopted TSP generators include the portgen generator from the 8th DIMACS Implementation Challenge [16], the ClusteredNetwork generator from the R package netgen [4], and 8 TSP instance mutators, namely explosion, implosion, cluster, rotation, linear projection, expansion, compression and grid mutation, all from the R package tspgen [3].

The portgen generator generates an instance by placing points uniformly at random. The generated instances are called "uniform" instances.

The ClusteredNetwork generator generates an instance by placing points around different central points. The number of clusters was set to 4, 5, 6, 7, and 8; for each setting, 10 instances were generated.

The explosion mutation operator generates an instance by tearing holes into the city points of a “uniform” instance, with all points within the explosion range pushed out of the explosion area.

The implosion mutation operator generates an instance by driving the city points of a “uniform” instance towards an implosion center.

The cluster mutation operator generates an instance by randomly sampling a cluster centroid in a “uniform” instance, and then moving a randomly selected set of points into the cluster region.

The rotation mutation operator generates an instance by rotating a subset of points of a "uniform" instance by a randomly selected angle.

The linear projection mutation operator generates an instance by projecting a subset of points of a "uniform" instance onto a linear function.

The expansion mutation operator generates an instance by placing a tube around a linear function in the points of a "uniform" instance, and then orthogonally pushing all points within that tube out of that region.

The compression mutation operator generates an instance by squeezing a set of randomly selected points of a “uniform” instance from within a tube (surrounding a linear function) towards the tube’s central axis.

The grid mutation operator generates an instance by randomly relocating a "box" of city points of a "uniform" instance.
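To make the operator descriptions above concrete, here is a short sketch of a portgen-style uniform generator and an explosion-style mutation. The projection rule and parameters are illustrative stand-ins, not tspgen's actual implementation:

```python
import math
import random

def uniform_instance(n, rng):
    """Portgen-style "uniform" instance: n cities placed uniformly at random."""
    return [(rng.random(), rng.random()) for _ in range(n)]

def explosion(points, radius, rng, center=None):
    """Explosion-style mutation sketch: pick a blast center and push every
    city strictly inside `radius` out to the blast boundary."""
    cx, cy = center if center is not None else rng.choice(points)
    out = []
    for x, y in points:
        d = math.hypot(x - cx, y - cy)
        if 0 < d < radius:  # inside the blast area: project onto its boundary
            x = cx + (x - cx) * radius / d
            y = cy + (y - cy) * radius / d
        out.append((x, y))
    return out
```

After the mutation, the blast area is empty of cities, which is the "hole" structure the explosion operator is meant to introduce.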
Appendix D Detailed Time Settings of Compared Methods
TSP

Method  $t_1$  $t_2$  $t_3$  $t_4$  Total CPU Time
CEPS  1.5h  0.5h  1.5h  8h  320h
GLOBAL  7.5h  1h  –  –  340h
PCIT  7.5h  1h  –  –  340h
PARHYDRA  2h  1h  –  –  300h

VRPSPDTW

Method  $t_1$  $t_2$  $t_3$  $t_4$  Total CPU Time
CEPS  6h  2h  6h  32h  1312h
GLOBAL  30h  4h  –  –  1360h
PCIT  30h  4h  –  –  1360h
PARHYDRA  8h  4h  –  –  1200h
The most time-consuming parts of PAP tuning methods are the runs of the configurations on the problem instances, whose computational costs account for the vast majority of the total cost. For CEPS, configurations are run in the initialization phase (line 5), in the evolution of the configuration population (lines 12 and 15), and in the evolution of the instance population (line 22). Therefore, for each of these four procedures we set a corresponding wall-clock time budget, denoted $t_1$, $t_2$, $t_3$ and $t_4$, to control the overall computational cost of CEPS. The total CPU time consumed by CEPS can then be estimated from these budgets together with the population sizes and the number of generations, which in this paper are set to 4, 4 and 10, respectively. The total CPU time consumed by GLOBAL and PCIT can be estimated analogously from their budgets, as can PARHYDRA's (see [23, 37, 21] for how these estimates are derived). Note that for different methods the budgets may be set to different values. The detailed time budget setting for each PAP tuning method is given in Table IV; overall, the total CPU time consumed by each method is kept almost the same.
References

[1] (2015) Model-based genetic algorithms for algorithm configuration. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI'2015, Buenos Aires, Argentina, pp. 733–739.
[2] (2009) A gender-based genetic algorithm for the automatic configuration of algorithms. In Proceedings of the 15th International Conference on Principles and Practice of Constraint Programming, CP'2009, Lisbon, Portugal, pp. 142–157.
[3] (2019) Evolving diverse TSP instances by means of novel and creative mutation operators. In Proceedings of the 15th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, FOGA'2019, Potsdam, Germany, pp. 58–71.
[4] (2015) netgen: Network generator for combinatorial graph problems. https://github.com/jakobbossek/netgen
[5] (2019) Learning to perform local rewriting for combinatorial optimization. In Proceedings of the 33rd Annual Conference on Neural Information Processing Systems, NeurIPS'2019, Vancouver, Canada, pp. 6278–6289.
[6] (2015) Tuning optimization algorithms under multiple objective function evaluation budgets. IEEE Transactions on Evolutionary Computation 19 (3), pp. 341–358.
[7] (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation 1 (1), pp. 19–31.
[8] (2001) Algorithm portfolios. Artificial Intelligence 126 (1–2), pp. 43–62.
[9] (2000) An effective implementation of the Lin–Kernighan traveling salesman heuristic. European Journal of Operational Research 126 (1), pp. 106–130.
[10] (2020) A survey of automatic parameter tuning methods for metaheuristics. IEEE Transactions on Evolutionary Computation 24 (2), pp. 201–216.
[11] (2016) Vehicle routing problem with simultaneous pickup and delivery and time windows based on improved global artificial fish swarm algorithm. Computer Engineering and Applications 52 (21), pp. 21–29.
[12] (1997) An economics approach to hard computational problems. Science 275 (5296), pp. 51–54.
[13] (2009) ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36 (1), pp. 267–306.
[14] (2011) Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, LION'2011, Rome, Italy, pp. 507–523.
[15] (2007) Automatic algorithm configuration based on local search. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, AAAI'2007, Vancouver, Canada, pp. 1152–1157.
[16] (2007) Experimental analysis of heuristics for the STSP. In The Traveling Salesman Problem and Its Variations, G. Gutin and A. P. Punnen (Eds.), pp. 369–443.
[17] (2015) Parameter control in evolutionary algorithms: trends and challenges. IEEE Transactions on Evolutionary Computation 19 (2), pp. 167–187.
[18] (2019) Attention, learn to solve routing problems! In Proceedings of the 7th International Conference on Learning Representations, ICLR'2019, New Orleans, LA.
[19] (2016) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In Proceedings of the 30th Annual Conference on Neural Information Processing Systems, NIPS'2016, Barcelona, Spain, pp. 3675–3683.
[20] (1973) An effective heuristic algorithm for the traveling-salesman problem. Operations Research 21 (2), pp. 498–516.
[21] (2017) Automatic construction of parallel portfolios via algorithm configuration. Artificial Intelligence 244, pp. 272–290.
[22] (2020) On performance estimation in automatic algorithm configuration. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI'2020, New York, NY, pp. 2384–2391.
[23] (2019) Automatic construction of parallel portfolios via explicit instance grouping. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI'2019, Honolulu, HI, pp. 1560–1567.
[24] (2020) Generative adversarial construction of parallel portfolios. IEEE Transactions on Cybernetics, to be published.
[25] (2016) The irace package: iterated racing for automatic algorithm configuration. Operations Research Perspectives 3, pp. 43–58.
[26] (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533.
[27] (2018) Reinforcement learning for solving the vehicle routing problem. In Proceedings of the 32nd Annual Conference on Neural Information Processing Systems, NIPS'2018, Quebec, Canada, pp. 9861–9871.
[28] (2010) Population-based algorithm portfolios for numerical optimization. IEEE Transactions on Evolutionary Computation 14 (5), pp. 782–800.
[29] (2018) An evolutionary ant colony algorithm for a vehicle routing problem with simultaneous pickup and delivery and hard time windows. In Proceedings of the 30th Chinese Control and Decision Conference, CCDC'2018, Shenyang, China, pp. 6499–6503.
[30] (1991) TSPLIB: a traveling salesman problem library. INFORMS Journal on Computing 3 (4), pp. 376–384.
[31] (1997) New methods for competitive coevolution. Evolutionary Computation 5 (1), pp. 1–29.
[32] (2018) An efficient tabu search based procedure for simultaneous delivery and pickup problem with time window. IFAC-PapersOnLine 51 (11), pp. 241–246.
[33] (2015) Generating new test instances by evolving in instance space. Computers & Operations Research 63, pp. 102–113.
[34] (2006) Evolving combinatorial problem instances that are difficult to solve. Evolutionary Computation 14 (4), pp. 433–462.
[35] (2015) A parallel simulated annealing method for the vehicle routing problem with simultaneous pickup–delivery and time windows. Computers & Industrial Engineering 83, pp. 111–122.
[36] (2012) A genetic algorithm for the simultaneous delivery and pickup problems with time window. Computers & Industrial Engineering 62 (1), pp. 84–95.
[37] (2010) Hydra: automatically configuring algorithms for portfolio-based selection. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI'2010, Atlanta, GA, pp. 210–216.