In the past decades, search methods have become major approaches for tackling various computationally hard problems. Most, if not all, established search methods, from specialized heuristic algorithms tailored for a particular problem class, e.g., the Lin–Kernighan (LK) heuristic for the Traveling Salesman Problem (TSP) 
, to general algorithmic frameworks, e.g., Evolutionary Algorithms, share a common feature. That is, they are parameterized algorithms, which means they involve parameters that need to be configured by users before the algorithm is applied to a problem.
Although theoretical analysis for many parameterized algorithms have offered worst or average bounds on their performance, their actual performance in practice is in many cases highly sensitive to the settings of parameters [7, 14, 17, 10]. More importantly, finding the optimal configuration, i.e., parameter setting, requires knowledge of both the algorithm and the problem to solve, which cannot be done manually with ease. Hence, a lot of efforts have been made to develop parameter tuning, also referred to as algorithm configuration, methods [13, 14, 2, 1, 25, 6, 10] to automatically identify the most appropriate parameter setting for an algorithm. These methods essentially involve a high-level iterative generate-and-test process. To be specific, given a set of instances from the target problem class, different configurations are iteratively generated and tested on the instance set. Upon termination, the process outputs the configuration that performs the best on the instance set.
From the practical point of view, a parameter tuning method is expected to identify configuration that generalizes well, i.e., performs well not only on the instance set used during the tuning phase, but also on unseen instances of the same problem class. The reason is that intelligent systems, which incorporate parameterized search algorithms as a module, are seldom built to address a few specific problem instances, but for a whole target problem class, and it is unlikely to know in advance the exact problem instances that a system will encounter in practice. The need for generalization requires the instance set used for parameter tuning to be sufficiently large such that it consists of good representatives of all instances of the target problem class. Unfortunately, parameter tuning in a real-world scenario is more likely than not to face the small-sample-size challenge. In other words, the available instance set is not only of small size, but also may not well represent the target problem class. For example, the widely studied TSP benchmark suites (i.e., TSPlib ) consists of a few hundred TSP instances, while there could be millions of possibilities for concrete TSP instances even if only considering a fixed number of cities. In consequence, the more powerful of a parameter tuning method, the higher risk that the obtained configuration will over-fit the instances involved in the tuning process.
This paper suggests that the pursuit of generalizable configurations for parameterized search algorithms could be modelled as a co-evolutionary system, in which two internal populations, representing the configurations and the problem instances, respectively, compete with each other during the evolution course. The evolution of the latter promotes exploration in the instance space of the target problem class to generate synthetic instances that exploit the weakness of the former. The former, on the other hand, improves itself by identifying configurations that could better handle the former. In this way, the configuration population is encouraged to evolve towards achieving good performance on as many instances of the target problem class as possible, i.e., towards better generalization. Specifically, contributions of this paper include:
A novel parameter tuning framework, namely Co-Evolution of Parameterized Search (CEPS), is proposed. It is also shown that CEPS is a process that minimizes the upper bound, i.e., a tractable surrogate, of the generalization performance.
To demonstrate the implementation details of CEPS as well as to assess its potential, concrete instantiations are also presented for two hard optimization problems, i.e., TSP and the Vehicle Routing Problem with Simultaneous Pickup–Delivery and Time Windows (VRPSPDTW) . Computational studies confirm that CEPS is able to obtain configurations with better generalization performance.
The proposal of CEPS extends the realm of Co-Evolution, for the first time, to evolving algorithm configurations and problem instances. Since CEPS does not invoke domain-specific knowledge, its potential applications can go beyond optimization problems, even to planning and learning problems.
The rest of the paper is organized as follows. In Section II, we introduce the challenge of seeking generalizable configurations for a parameterized search algorithm, as well as the theoretical insight behind CEPS. Section III firstly presents the CEPS framework, followed by its instantiations for TSP and VRPSPDTW. Computational studies on these two problems are presented in Section IV. Finally, Section V concludes the paper with discussions.
Ii Parameterized Solvers Made Generalizable
Ii-a Notations and Problem Description
Assume a solver is to be built for a problem class (e.g., TSP), for which an instance of the problem class is denoted as , and the set of all possible is denoted as
. Generally speaking, the solver can be any concrete computational process, e.g., a traditional heuristic search process such as the LK Heuristic for TSP or even a neural network[27, 5, 18], that outputs a solution for a given instance of the target problem class. Given a parameterized algorithm for which the parameter(s) needs to be assigned with some values. A solver is a specification of the algorithm with a concrete value assignment of . For the sake of brevity, we refer each concrete value assignment as a configuration of the algorithm in this paper. Since a configuration fully defines a solver, to seek a good (say generalizable) solver, is equivalent to seek a good configuration. Hence, the quality of a configuration on a given instance is directly denoted as , which is a performance indicator of the corresponding solver on a given instance . This indicator could concern many aspects, e.g., the quality of the obtained , the CPU time required to achieve a solution above a given quality threshold, or even be stated in a multi-objective form.
Following the above definitions, optimizing the generalization performance of a solver can be stated as (assuming the smaller the better):
where stands for the prior distribution of
. Since in practice the prior distribution is usually unknown, a uniform distribution can be assumed without loss of generality. Eq. (1) can be then simplified to Eq. (2) by omitting a normalization constant:
The challenge with Eqs. (1) and (2) is that in practice they cannot be directly optimized since the set is, more often than not, unavailable. Instead, only a set of “training” instances, i.e., a subset , is given for the purpose of tuning . In fact, it has been observed that the size of the training instance set is rather limited [15, 33, 22], leading to the small-sample-size phenomenon. Even worse, given a collected from the real world, it is non-trivial (if not impossible) to know how to verify whether it is a good representative of . In case the training instance set is too small, or is not a good representative of the whole problem class, the best configuration obtained with it would fail to generalize.
Ii-B Enhancing Generalization with Synthetic Instances
A natural idea to tackle the small-sample-size challenge is to augment with a set of synthetic instances, say , such that the configuration obtained with would generalize better than that obtained with . This idea is generally valid because if the size of continues to grow, will eventually approach . Hence, the key question is how a generalizable configuration could be obtained with a sufficiently small . This question can be re-stated as: how to generate synthetic training instances, such that the generalization performance of the obtained configuration could be improved as much as possible with a of (say predefined) small size.
Given a parameterized search algorithm, suppose a configuration has been obtained as the best-performing configuration on . A synthetic instance set is to be generated, with the aim that a new configuration obtained with would outperform in terms of generalization as much as possible. Ignoring the inner optimization/tuning process with which and are obtained, generating high-quality could be more formally stated as another optimization problem as in Eq. (3):
Since is obtained with , we further assume that for any , . Although this is a rather restrictive assumption, it will be shown in Section II-C that it could be fulfilled for a large class of search algorithms. Substituting this assumption into Eq. (3) leads to:
Inequality (4) gives an upper bound of . If the right-hand side of it is smaller than 0, then can generalize better than . More importantly, neither nor can be precisely measured in practice, thus minimizing a measurable surrogate, i.e., the upper bound, would be a good alternative to improve the generalization performance. Therefore, given a training instance set and a configuration obtained with , an improved configuration (in terms of generalization performance) could be obtained with a strategy with two steps:
identify the for which is maximized;
identify the that minimizes (note that, once is generated, the term can be viewed as a constant and can be omitted).
Ii-C Enhancing Generalization via Competitive Co-evolution
The two steps derived in Section II-B naturally serve as the core building-block of an iterative process that gradually seek configurations with better generalization performance. There could be many ways to design such an iterative process. Among them Competitive Co-evolution  provides a readily available framework. Concretely, one can maintain an instance population and a configuration population. In each iteration, the two populations alternately evolve and compete with each other, i.e., the instance population evolves to identify and the configuration population evolves to identify the .
In addition to the intuitive coincidence, co-evolution also offers two important technical advantages for implementing the two-step improvement strategy. First, a common challenge for implementing the two-step improvement strategy is how to find the optimal and in each iteration, each of which is itself a hard optimization problem. For this reason, encouraging sufficient exploration in the instance and configuration space when seeking and would be important for obtaining satisfactory improvement in each iteration. By maintaining a population rather than an individual, co-evolution would be more likely to successfully find sufficiently good synthetic instances and configurations in each iteration.
Second, and possibly more importantly, recall that the two-step improvement strategy is derived from the assumption that for any . Maintaining a population of configurations provides a guarantee for the assumption to hold. To be concrete, the configuration population could be directly employed to construct a Parallel Algorithm Portfolio (PAP) [8, 12, 23, 28, 24]. A PAP solver involves multiple component solvers, each of which is readily a solver for the target problem. Given a problem instance, all the component solvers are run independently, typically in parallel, to get multiple solutions. Then, the best solution will be taken as the output of the PAP solver. Suppose a PAP is constructed with the configuration population, i.e., . By definition, the performance of on an instance s, denoted as , is:
which is always not larger than . Hence, the perturbation to any individual configuration is equivalent to starting from a PAP with component solvers and try to improve it with a new component solver, for which the above mentioned assumption always holds.
Iii Co-evolution of Parameterized Search
Iii-a The General Framework
By incorporating the two-steps improvement strategy into a co-evolution process, we arrive at the proposed CEPS framework, as demonstrated in Algorithm 1. For the advantages of PAP over a single solver as discussed in Section II-C, the whole final configuration population is used as the output by default to exploit the full power of CEPS. It should be noted that Algorithm 1 could be easily adapted to output the best configuration in the final population. In general, CEPS consists of two major phases, i.e., an initialization phase (lines 2-7), and a co-evolution phase (lines 8-27) which could be further subdivided into alternating between the evolution of the configuration population (lines 10-15) and the evolution of the instance population (lines 17-26) for MaxIte iterations in total. These modules are detailed as follows.
Given an initial training instance set , a simple greedy strategy is adopted to initialize a configuration population of size . First, a set of candidate configurations are randomly sampled from the configuration space and tested on the training set (line 2). Then, starting from an empty set (line 3), the configuration population is built iteratively (lines 4-7). At each iteration, the configuration whose inclusion leads to the best-performing PAP is selected from (line 5) and inserted into the population (line 6). The process terminates when configurations have been selected.
Iii-A2 Evolution of the Configuration Population
Given a population of parent configurations, offspring configurations are first generated by mutating a randomly selected configuration in the parent population, leading to new PAP (lines 10-14). The offspring configuration that leading to the best-performing PAP will replace its parent (line 15). An existing automatic algorithm configuration tool, namely SMAC , is employed as the mutation operator. More specifically, to mutate a configuration , SMAC is used to search in the configuration space to find a new configuration with the target that the inclusion of into leads to the minimization of the performance of the resultant PAP on the training set, in terms of the predefined performance indicator (line 12).
Iii-A3 Evolution of the Instance Population
In this phase CEPS first creates a backup of the training set (line 18) that will be restored to the training set at the end of instance generation (line 26), and then evolves (lines 19-25). Since the aim of the evolution of is to identify instances that are hard for , each instance in is assigned with a fitness as the performance of on it (line 19) — the worse the performance, the higher the fitness. In each generation of the evolution of , CEPS first randomly selects an instance from as the parent and mutates it to generate an offspring (line 21), which is then evaluated against the configuration population (line 22). Finally, CEPS uses to randomly replace an instance in that has lower fitness than (lines 23-24). In this way, as the number of generations increases, the average fitness of instances in will gradually increase, meaning that the instances in will be harder and harder for all individuals in the configuration population. Note in the last iteration (i.e., the MaxIte-th iteration) of the co-evolution phase, evolution of the instance population is skipped (line 17) because there is no need to generate more instances since the final algorithm configurations has been tuned completely.
Iii-B Instantiations for TSP and VRPSPDTW
Algorithm 1 is a rather generic framework since the representations of both populations depend on the target parameterized search method and the target problem class, respectively. The mutation operator for the instance population, as well as the fitness function also depend on target problem class. In this paper, two instantiations of CEPS, namely CEPS-TSP and CEPS-VRPSPDTW, have been developed for the TSP and VRPSPDTW problems, respectively. These two target problem classes are chosen because, as a classic NP-hard problem, TSP is one of the most widely investigated benchmarking problems in academia. In comparison, VRPSPDTW is a much more complex routing problem that takes real-world requirements into account. The significant difference between these two problems could provide a good context for assessing CEPS.
Since TSP and VRPSPDTW are hard combinatorial optimization problems, specialized search methods as well as instance mutation operators are needed to instantiate CEPS-TSP and CEPS-VRPSPDTW. On the other hand, the main purpose of CEPS-TSP and CEPS-VRPSPDTW is to assess the potential of CEPS. Elaborating the full details of the two algorithms might dilute the major focus of this paper. Therefore, below we briefly introduce the features of the three main modules, i.e., the parameterized search methods, the instance mutation operators and the fitness functions (performance indicators) of the two algorithms. Interested readers may refer to the original publications[9, 36] for implementation details of the search methods used, and to the Appendices for the details of the instance mutation operators.
Given a list of cities and the distances between each pair of cities, the target of TSP is to find the shortest route that visits each city and returns to the origin city. Specifically, the symmetric TSP with distances in a two-dimensional Euclidean space is considered here. The adopted parameterized search method is the Helsgaun’s Lin-Kernighan Heuristic (LKH)  version 2.0.7 (with 23 parameters), one of the state-of-the-art inexact solver for such TSP. Each TSP instance is represented by a list of coordinates with each coordinate as a city. An operator widely used for generating TSP instances (see  for example), is employed as the instance mutation operator of CEPS-TSP, as detailed in Appendix A.
For TSP, the quality of a configuration on an instance , i.e., the performance indicator , is the runtime needed by to solve . More specifically, the search algorithm specified by would be terminated as soon as finding the optimal solution of or running for a cutoff time of 10 seconds. In the first case, the run is considered successful and is exactly the recorded runtime; in the second case, the run is considered timeout and is the cutoff time multiplied by a penalty factor 10, i.e., 10 seconds 10 = 100 seconds. With the definition of , the performance of a PAP solver on an instance , i.e., , is defined as in Eq. (5), which is exactly the fitness function used in the evolution of the instance population in CEPS-TSP (line 19 and line 22 in Algorithm 1). Finally, the performance of a solver (a single configuration or a PAP solver) is measured by the average runtime over an instance set, referred to as Penalized Average Runtime–10 (PAR-10) score , which is directly used for fitness evaluation in CEPS-TSP to compare PAPs constructed with the configuration population (line 15 of Algorithm 1). The smaller the score, the better.
Given a number of customers who require both pickup service and delivery service within a certain time window, the target of VRPSPDTW  is to send out a fleet of capacitated vehicles, which are stationed at a depot, to meet the customer demands with the minimum total cost. The depot and the customers are located in a two-dimensional Euclidean space, and the distance between two nodes is the same in each opposite direction. VRPSPDTW has five constraints: 1) the number of used vehicles must be smaller than the number of available ones; 2) each customer must be served exactly once; 3) the vehicles cannot be overloaded during transportation; 4) the vehicles can only departure after the start of the time window of the depot, and must return to the depot before the end of the time window of the depot; 5) the service of the vehicle to each customer must be performed within that customer’s time window. The detailed problem formulation of VRPSPDTW could be found in Appendix B.
The adopted parameterized search method for VRPSPDTW is a powerful co-evolutionary genetic algorithm (Co-GA) proposed by (with 12 parameters). Each VRPSPDTW instance is represented by a list of coordinates with each coordinate as a customer (or the depot), and each coordinate is associated with several attributes, i.e., time windows , pickup demand and delivery demand . In comparison with TSP, VRPSPDTW is a much more complicated problem. Since no readily available operator for generating VRPSPDTW instances could be found in the literature, a specialized instance mutation operator is proposed for CEPS-VRPSPDTW, as detailed in Appendix B.
For VRPSPDTW, the quality of a configuration on an instance , i.e., the performance indicator , is the quality of the solution, in terms of the cost of the solution. More specifically, the search algorithm specified by would be terminated as soon as has run for a cutoff time of 150 seconds. Assume successfully finds a feasible solution of cost to . Considering for different VRPSPDTW instances, the scales of the solution costs may vary significantly, thus the “normalized cost” is introduced to replace :
where is the mean value of the distances between all pairs of customers in instance . In case that fails to find a feasible solution to within the cutoff time, the corresponding run is considered timeout and will be set to a large penalty value, i.e., 2000. Same as CEPS-TSP, in CEPS-VRPSPDTW the fitness function used in the evolution of the instance population is defined in Eq. (5), which is further based on the definition of as described above. Similar to the case of TSP, the performance of a solver (a single configuration or a PAP solver) is measured by the average of on an instance set, referred to as Penalized Average Normalized Cost (PANC) score, which is directly employed for fitness evaluation in CEPS-VRPSPDTW to compare PAPs constructed with the configuration population (line 15 of Algorithm 1). The smaller the score, the better.
Iv Computational Studies
|Instance Sets||#solvers in PAP||Performance Indicator||Parameterized Method|
|TSP||500 instances generated by 10 different generators. Training/Testing split: 30/470||4||Runtime needed to find the optima of the instances. In particular, PAR-10 score with cutoff time 10 seconds was used (see Section III-B1)||LKH version 2.0.7  with 23 parameters|
|VRPSPDTW||233 instances obtained from real-world application. Training/Testing split: 14/219||4||Cost of the found solutions. In particular, PANC score with cutofftime 150 seconds was used (see Section III-B2)||Co-GA  (with 12 parameters)|
To assess the potential of CEPS, computational studies have been conducted with CEPS-TSP and CEPS-VRPSPDTW. The experiments mainly aim to address two questions:
whether CEPS could better tackle the small-sample-size challenge, i.e., build generalizable solvers with limited training instances, compared to the state-of-the-art parameter tuning methods;
whether co-evolution, i.e., alternately evolving the configuration population and the instance population, is effective as expected at enhancing the generalization of the resultant solver.
To answer these two questions, two instance sets were firstly generated for TSP and VRPSPDTW, respectively. The TSP instance set consists of 500 instances and the VRPSPDTW instance set consists of 233 instances. It should be noted that, these instances are generated as our testbed. To avoid bias towards CEPS, these instances should not be generated in the same way that CEPS evolves the instance population. After the benchmark sets were generated, each of them was then randomly split into a training and a testing set, the size of which is 6% and 94% of the whole set, respectively. To reduce the effect of the random splitting on the experimental results, the split was conducted for 3 times, leading to 3 unique pairs of training and testing sets for TSP and VRPSPDTW, denoted as TSP_1/2/3 and VRPSPDTW_1/2/3, respectively. Throughout the experiments, testing instances are only used to approximate the generalization performance of the solver obtained by CEPS and compared methods, in our cases the PAP constructed with the final configuration population. Only the training instances are used for parameter tuning, regardless of the methods used. The TSP/VRPSPDTW instance set, the compared methods, and experimental protocol are further elaborated below.
Iv-a Benchmark Instances
For TSP, we collected 10 different instance generators from the literature, including the portgen generator from the 8th DIMACS Implementation Challenge , the ClusteredNetwork generator from the R-package netgen  and 8 TSP instance mutators from the R-package tspgen . The details of the used generators are presented in Appendix C. Considering the rather different instance-generation mechanisms underlying them, they are expected to generate highly diverse TSP instances. We used each of them to generate 50 instances, which finally gave us a set of 500 TSP instances. The problem sizes (i.e., city number) of all these instances are 800.
For VRPSPDTW, we obtained data from a real-world application of the JD logistics company. Specifically, the data contain customer requests that occurred during a period of time in a city. The total number of customers is 3000, of which 400 customers require service per day. Therefore, to generate a VRPSPDTW instance, we randomly select 400 customers from the 3000 customers, and the pickup/delivery demands of each customer are randomly selected from all the demands that the customer has during this period of time. We repeated this process for 500 times, thus obtaining a set of 500 VRPSPDTW instances. After that, a VRPSPDTW solver  was used to determine whether the generated instances have feasible solutions and those without feasible solutions were discarded. Finally, we obtained a set of 233 VRPSPDTW instances.
Iv-B Compared Methods
As explained in Section III-B, CEPS-TSP and CEPS-VRPSPDTW employ the whole final configuration population to construct PAPs, which are more likely to outperform solvers constructed with a single configuration. Hence, to make a fair comparison, the two instantiations are compared with the state-of-the-art PAP tuning methods, namely GLOBAL , PARHYDRA [37, 21] and PCIT . It should be noted that all these methods involve no instance evolution mechanism, i.e., the given training instances are assumed to sufficiently represent the target problem class. Hence, given our experimental settings, comparison between CEPS and these approaches aims to evaluate whether CEPS could better tackle the small-sample-size challenge.
To address research question 2) raised at the beginning of Section IV, i.e., the role of co-evolution for achieving better (if any) generalization performance, a baseline method, named Evolution of Parameterized Search (EPS), was also adopted in the comparison. EPS differs from CEPS in that it conducts instance mutation and PAP tuning in two isolated phases, rather than alternately. Given the same training instance set as CEPS, EPS first applies the instance mutation operator to generate an augmented set of instances. The size of this augmented set is kept the same as the number of instances generated during the whole procedure of CEPS. Then, a PAP is evolved with the same approach as in CEPS, but using the union of the initial and the augmented training instance sets as the input.
Iv-C Experimental Protocol
We set the number of component solvers in PAP, i.e., , to 4, since 4-core machines are widely available now. The parameters of the compared methods were set following suggestions in the literature. The number of iterations in the co-evolution phase of CEPS, i.e., MaxIte, and the number of offspring configurations in the evolution of the configuration population, i.e., , were set to 4 and 10, respectively. The termination condition for the evolution of the instance population in CEPS was the predefined time budget being exhausted. In the experiments the total CPU time consumed by each compared method was kept almost the same. The detailed time settings of each method are presented in Appendix D. The above-described experimental settings, as well as the used parameterized search methods and performance indicators (see Section III-B), are summarized in Table I.
For each pair of training and testing sets, i.e., TSP_1/2/3 and VRPSPDTW_1/2/3, we applied each considered method to tune a PAP on the training set and then tested the resulting PAP on the testing set. For each testing instance, the PAP was applied for 3 runs and the median of those three runs was recorded as the performance of the solver on the instance. The performance of a PAP on different testing instances were then aggregated to obtain the number of total timeouts (#TOs), PAR-10 score (for TSP solver) and PANC score (for VRPSPDTW solver) on the testing set. For two different PAPs, we further performed a Wilcoxon signed-rank test (with significance level ) to determine whether the difference between their PAR-10/PANC scores on the testing set was significant.
All the experiments were conducted on a cluster of 3 Intel Xeon machines with 128 GB RAM and 24 cores each (2.20 GHz, 30 MB Cache), running Centos 7.5. The entire experiments took almost 3 months to complete.
Iv-D Results and Analysis
Table II presents the testing results of the PAPs tuned by each considered method. In Table II the PAR-10/PANC score of a PAP is indicated by underline if it achieved the best score, and is indicated in bold face if it was not significantly different from the best score on the corresponding testing set. Note for PAR-10/PANC, the performance is better if the score is smaller. One could make two important observations from Table II. First, the PAP obtained by CEPS has the smallest number of timeouts, which means that it has the highest success rate for solving the testing instances among all the tested PAPs. In terms of PAR-10 and PANC, the PAP output by CEPS achieved the best scores. In particular, in 5 out of the 6 experiments, it achieved the best PAR-10/PANC scores (note in the remaining 1 experiment the gap between CEPS and the best method was not statistically significant), and in 4 out of the 6 experiments it was significantly better than others. Furthermore, recall that in different experiments the training/testing splits were different, compared to other approaches, CEPS performed more stably over all 6 experiments. For instance, the #TOs of PCIT and PARHYDRA fluctuates over different training/testing sets on VRPSPDTW problem. In summary, CEPS is not only the best-performing method, but also is less sensitive to the training data, i.e., could better tackle the small-sample-size challenge.
Second, EPS also involves instance generation, while was outperformed by methods that do not generate synthetic instances in several cases, e.g., compared to PCIT and PARHYDRA on TSP_2. This observation indicates that isolating instance generation from PAP tuning may have negative effects. On the other hand, the fact CEPS performed better than EPS shows the effectiveness of integrating instance generation into the co-evolving framework.
To further verify the effectiveness of the co-evolution phase in CEPS, we tested the initial PAPs in CEPS (the PAPs built by the initiliation phase) on the testing sets. Figure 1 illustrates the comparisons between the testing performances (in terms of PAR-10/PANC scores) of the initial PAPs and the final PAPs obtained by CEPS. On all of the 6 testing sets, the performance of the final PAP is better than the performance of the initial PAP, and the average improvement rate is 29%. These results indicate that the co-evolution in CEPS is effective as expected at enhancing the generalization of the PAP solvers.
Iv-E Assessing Generalization on Existing Benchmarks
|RCdp (small)||9||0 (0%)||9 (100%)||0 (0%)|
|Rdp||23||4 (17%)||17 (57%)||6 (26%)|
|Cdp||17||0 (0%)||9 (53%)||8 (47%)|
|RCdp||16||6 (13%)||8 (38%)||8 (50%)|
|count||65||10 (15%)||43 (66 %)||22 (34%)|
To further investigate the generalization ability of CEPS, the PAP tuned by CEPS in the VRPSPDTW_1 experiment have been tested on existing VRPSPDTW benchmarks , which is a widely used benchmark for VRPSPDTW. Note that compared to the benchmarks, the training instances in VRPSPDTW_1 were obtained from different sources (i.e., real-world application), and may have quite different problem characteristics, e.g., customer number and node distribution. Hence, the PAP tuned by CEPS could be said to generalize well to totally unseen data if it was tuned using the VRPSPDTW_1 training set while still performs well on the VRPSPDTW benchmarks. Table III presents the comparisons between the solutions found by the PAP and the best known solutions reported in the literature [36, 35, 11, 29, 32] (up to May 2019), regardless of what algorithm was used. Table 3 shows that overall the PAP could generalize well to the existing benchmarks. On 43 out of 65 (66%) instances, the solutions found by the PAP are not worse than the best solutions currently known. It is notable that on 10 instances the PAP found new best solutions. Another observation is that the PAP performed not very well on the “cdp” type instances, in which the locations of customers are clustered (see  for details). We speculate that this is because the parameterized solver used for tuning PAPs has an inherent deficiency when handling this type of problem, which on the other hand indicates that highly-parameterized solvers with flexible solving capacities are important to fully exploit the power of CEPS on a specific problem class.
In this work, a co-evolutionary approach, i.e., CEPS, is proposed for tuning parameterized search methods to obtain good generalization performance. By co-evolving the training instance set and the candidate configurations, CEPS gradually guides the search of configurations towards instances on which the current configurations fail to perform well, and thus leads to configurations that could generalize better. From a theoretical point of view, the evolution of instance set is essentially a greedy mechanism for instance augmentation that guarantees the generalization performance of the resultant solver to improve as much as possible. As a result, CEPS is particularly effective in case that only a limited number of problem instances is available for parameter tuning. We believe such a scenario is, more often than not, true when building real-world systems for tackling hard optimization problems. Two concrete instantiations, i.e., CEPS-TSP and CEPS-VRPSPDTW, are also presented. The performance of the two instantiations on TSP and VRPSPDTW problems support the effectiveness of CEPS in the sense that, in comparison with state-of-the-art parameter tuning approaches, the solvers obtained by CEPS achieves better generalization performance.
Since CEPS is a generic framework, some discussions would help elaborating issues that are of significance in practice. First, although this work assumes CEPS takes a set of initial training instances as the input, such training instances are not necessarily real-world instances but could be generated randomly. In other words, CEPS could be used in a fully cold-start setting (a.k.a. zero-shot), i.e., no real-world instances are available for the target problem class. Further, CEPS could either be run offline or online, i.e., it could accommodate new real instances whenever available.
Second, an instantiation of CEPS involves specification of instance mutation operator, the parameterized search method, as well as the parameter tuning method. Hence, as demonstrated by CEPS-TSP and CEPS-VRPSPDTW, one needs to first define these three modules according to previous literature on the target problem class, or from scratch. Among these three modules, the instance generators could be adapted from one problem to another more easily and most parameter tuning methods are generic methods that are applicable to a broad family of problems. Hence, the parameterized search method might be the most crucial one among the three modules. Although CEPS is not restricted to a specific type of search methods, the PAP-type method adopted in this work is suggested as a default choice, not only for its advantages in terms of solution quality (compared to a single solver) on a large variety of problems [37, 28, 21, 23, 24] and the theoretical guarantee it offers as elaborated in Section II-B, but also because PAP allows exploiting modern high-performance computing facilities in an extremely simple way. This merit is sometimes even more important than solution quality, since the wall-clock runtime is always a crucial performance indicator for real-world optimization systems.
Finally, it is interesting to note that the emerging topic of learn to optimize, which explores utilizing machine learning techniques, e.g., reinforcement learning, to build neural networks for solving optimization problems[27, 5, 18, 26, 19], could also be combined with CEPS. In this case, the implementation of CEPS would be able to leverage on gradient descent methods to tune/evolve the configurations (i.e., training the weights of a network).
Appendix A Instance Mutation Operator for TSP
For TSP defined on two-dimensional Euclidean space, each instance is represented by a list of coordinates with each coordinate as a city. As illustrated in Algorithm 2, the instance mutation operator in CEPS-TSP works as follows. Let and , and , be the minimum and the maximum of the “” values and the “” values across all coordinates of a given instance , respectively. When applying mutation to , for each coordinate in , and
are offset with probability 0.9 by the step sizes sampled fromand , respectively, and with probability 0.1, and are replaced by new values sampled from and , respectively.
refers to normal distribution with mean
and variance, and refers to normal distribution defined on closed interval .
Appendix B Instance Mutation Operator for VRPSPDTW
B-a Problem Description of VRPSPDTW
VRPSPDTW is defined on a complete graph with as the node set and as the arc set defined between each pair of nodes, i.e., . For convenience, the depot is denoted as 0 and the customers are denoted as . In this paper we consider the VRPSPDTW [36, 35] defined on two-dimensional Euclidean space. That is, each node has a coordinate and the distance between and , denoted as , is the Euclidean distance. The travel speed between nodes is 1, which means the travel time between node and node equals to .
In addition to the coordinate, each customer is associated with 5 attributes, i.e., a delivery demand , a pickup demand , a time window and a service time . represents the amount of goods to deliver from the depot to customer and represents the amount of goods to pick up from customer to be delivered to the depot. and define the start and the end of the time window in which the customer receives service. The time windows are treated as hard constraints. That is, arrival of a vehicle at the customer before results in a wait before service can begin; while arrival after is infeasible. Finally, is the time spent by the vehicle to load/unload goods at customer . A fleet of identical vehicles, each with a capacity of and dispatching cost , is initially located at the depot. Each vehicle starts at the depot and then serve the customers, and finally returns to the depot. For convenience, the depot 0 is also associated with 5 attributes, in which and are the earliest time the vehicles can depart from the depot and the latest time the vehicles can return to the depot, respectively, and .
A solution to VRPSPDTW could be represented by a set of vehicle routes, i.e., , in which each route consists of a sequence of nodes that the vehicle visits, i.e., , where is the -th node visited in , and is the length of . Let denote the total travel distance in , and let denote the highest load on the vehicle that occurs in . Let and denote the time of arrival at and the time of departure from , respectively.
The total cost of consists of two parts: the dispatching cost of the used vehicles, which is , and the transportation cost, which is the total travel distance in multiplied by unit transportation cost . The objective of the VRPSPDTW problem is to find routes for vehicles that serve all the customers at a minimal cost, as presented in Eq. (7):
where the constraints are: 1) the number of used vehicles must be smaller than the number of available ones; 2) each customer must be served exactly once; 3) the vehicle cannot be overloaded during transportation; 4) the vehicles can only serve after the start of the time window of the depot, and must return to the depot before the end of the time window of the depot; 5) the service of the vehicle to each customer must be performed within that customer’s time window.
B-B Instance Mutation Operator
We consider a practical application scenario from the JD logistics company. Consider a VRPSPDTW solver that needs to solve a VRPSPDTW instance every day. The company has about 3000 customers in total in the city, but only about 13% of its customers (i.e., 400) require service per day. Therefore, for the solver, the different VRPSPDTW instances it faces have the following connections: 1) the location and the time window of the depot are unchanged, and the vehicle fleet is unchanged; 2) the locations of the customers will change; 3) the time windows of the customers will change; 4) the delivery and pickup demands of the customers will change.
Based on the above observation, we design a specialized mutation operator for VRPSPDTW, as presented in Algorithm 3. First, the coordinate mutation as used in CEPS-TSP is also used here. Moreover, for the pickup demand and the delivery demand of each customer, they are replaced by new values sampled from and , respectively, where and , and , are the minimum and the maximum of the “” value and the “” values across all customers of , respectively. For the time window of each customer, and are offset by the step sizes sampled from , where and are the earliest time the vehicles can depart from the depot and the latest time the vehicles can return to the depot.
Appendix C TSP Instance Generators
The adopted 10 TSP generators include the portgen generator from the 8th DIMACS Implementation Challenge , the ClusteredNetwork generator from the R-package netgen  and 8 TSP instance mutators, namely explosion, implosion, cluster, rotation, linearprojection, expansion, compression and gridmutation, from the R-package tspgen .
The portgen generator generates an instance by uniform randomly placing the points. The generated instances are called “uniform” instances.
The ClusteredNetwork generator generates an instance by placing points around different central points. The number of the clusters were set to 4,5,6,7, and 8, for each of which 10 instances were generated.
The explosion mutation operator generates an instance by tearing holes into the city points of a “uniform” instance, with all points within the explosion range pushed out of the explosion area.
The implosion mutation operator generates an instance by driving the city points of a “uniform” instance towards an implosion center.
The cluster mutation operator generates an instance by randomly sampling a cluster centroid in a “uniform” instance, and then moving a randomly selected set of points into the cluster region.
The rotation mutation operator generates an instance by rotating a subset points of a “uniform” instance with a randomly selected angle.
The linearprojection mutation operator generates an instance by projecting a subset points of a “uniform” instance to a linear function.
The expansion mutation operator generates an instance by placing a tube around a linear function in the points of a “uniform” instance, and then orthogonally pushes all points within that tube out of that region.
The compression mutation operator generates an instance by squeezing a set of randomly selected points of a “uniform” instance from within a tube (surrounding a linear function) towards the tube’s central axis.
The gridmutation mutation operator generates an instance by randomly relocating a “box” of city points of a “uniform” instance.
Appendix D Detailed Time Settings of Compared Methods
The most time-consuming parts of PAP tuning methods are the runs of the configurations on the problem instances, and the incurred computational costs account for the vast majority of the total costs. For CEPS, the configurations would be run in the initialization phase (line 5), in the evolution of the configuration population (line 12 and line 15) and in the evolution of the instance population (line 22). Therefore for each of these four procedures we set the corresponding wall-clock time budget, i.e., , , and
, to control the overall computational costs of CEPS. Then the total CPU time consumed by CEPS could be estimated by. In this paper, , and are set to 4, 4 and 10, respectively.
The total CPU time consumed by GLOBAL and PCIT could be estimated by , while for PARHYDRA it could be estimated by (see [23, 37, 21] for how these results are derived). Note for different methods , and could be set to different values. The detailed setting of the time budget for each PAP tuning method is given in Table IV. Overall the total CPU time consumed by each method is kept almost the same.
Model-based genetic algorithms for algorithm configuration.
Proceedings of the 24th International Joint Conference on Artificial Intelligence, IJCAI’2015, Buenos Aires, Argentina, pp. 733–739. Cited by: §I.
-  (2009-09) A gender-based genetic algorithm for the automatic configuration of algorithms. In Proceedings of the 15th International Conference on Principles and Practice of Constraint Programming, CP’2009, Lisbon, Portugal, pp. 142–157. Cited by: §I.
-  (2019-08) Evolving diverse TSP instances by means of novel and creative mutation operators. In Proceedings of the 15th ACM/SIGEVO Conference on Foundations of Genetic Algorithms, FOGA’2019, Potsdam, Germany, pp. 58–71. Cited by: Appendix C, §IV-A.
-  (2015) netgen: Network generator for combinatorial graph problems. Note: https://github.com/jakobbossek/netgen Cited by: Appendix C, §IV-A.
-  (2019-12) Learning to perform local rewriting for combinatorial optimization. In Proceedings of the 32ed Annual Conference on Neural Information Processing Systems, NeurIPS’2019, Vancouver, Canada, pp. 6278–6289. Cited by: §II-A, §V.
Tuning optimization algorithms under multiple objective function evaluation budgets.
IEEE Transactions on Evolutionary Computation19 (3), pp. 341–358. Cited by: §I.
-  (2011) Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation 1 (1), pp. 19–31. Cited by: §I.
-  (2001) Algorithm portfolios. Artificial Intelligence 126 (1-2), pp. 43–62. Cited by: §II-C.
-  (2000) An effective implementation of the Lin-Kernighan traveling salesman heuristic. European Journal of Operational Research 126 (1), pp. 106–130. Cited by: §III-B1, §III-B, TABLE I.
-  (2020) A survey of automatic parameter tuning methods for metaheuristics. IEEE Transactions on Evolutionary Computation 24 (2), pp. 201–216. Cited by: §I.
-  (2016) Vehicle routing problem with simultaneous pick-up and delivery and time-windows based on improved global artificial fish swarm algorithm. Computer Engineering and Applications 52 (21), pp. 21–29. Cited by: §IV-E.
-  (1997) An economics approach to hard computational problems. Science 275 (5296), pp. 51–54. Cited by: §II-C.
-  (2009) ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36 (1), pp. 267–306. Cited by: §I.
-  (2011-01) Sequential model-based optimization for general algorithm configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, LION’2011, Rome, Italy, pp. 507–523. Cited by: §I, §III-A2, §III-B1.
-  (2007-07) Automatic algorithm configuration based on local search. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, AAAI’2007, Vancouver, Canada, pp. 1152–1157. Cited by: §II-A.
-  (2007) Experimental analysis of heuristics for the STSP. In The Traveling Salesman Problem and Its Variations, G. Gutin and A. P. Punnen (Eds.), pp. 369–443. Cited by: Appendix C, §IV-A.
-  (2015) Parameter control in evolutionary algorithms: trends and challenges. IEEE Transactions on Evolutionary Computation 19 (2), pp. 167–187. Cited by: §I.
-  (2019-05) Attention, Learn to solve routing problems!. In Proceedings of the 7th International Conference on Learning Representations, ICLR’2019, New Orleans, LA. Cited by: §II-A, §V.
-  (2016-12) Hierarchical deep reinforcement learning: integrating temporal abstraction and intrinsic motivation. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems, NIPS’2016, Barcelona, Spain, pp. 3675–3683. Cited by: §V.
-  (1973) An effective heuristic algorithm for the traveling-salesman Problem. Operations Ressarch 21 (2), pp. 498–516. Cited by: §I.
-  (2017) Automatic construction of parallel portfolios via algorithm configuration. Artificial Intelligence 244, pp. 272–290. Cited by: Appendix D, §IV-B, §V.
-  (2020-02) On performance estimation in automatic algorithm configuration. In Proceedings of the 34th AAAI Conference on Artificial Intelligence, AAAI’ 2020, New York, NY, pp. 2384–2391. Cited by: §II-A.
-  (2019-01) Automatic construction of parallel portfolios via explicit instance grouping. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, AAAI’ 2019, Honolulu, HI, pp. 1560–1567. Cited by: Appendix D, §II-C, §IV-B, §V.
-  (2020) Generative adversarial construction of parallel portfolios. Note: IEEE Transactions on Cyberneticsto be published Cited by: §II-C, §V.
-  (2016) The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives 3, pp. 43–58. Cited by: §I.
-  (2015) Human-level control through deep reinforcement learning. Nature 518 (7540), pp. 529–533. Cited by: §V.
-  (2018-12) Reinforcement learning for solving the vehicle routing problem. In Proceedings of the 31st Annual Conference on Neural Information Processing Systems, NIPS’2018, Quebec, Canada, pp. 9861–9871. Cited by: §II-A, §V.
-  (2010) Population-based algorithm portfolios for numerical optimization. IEEE Transactions on Evolutionary Computation 14 (5), pp. 782–800. Cited by: §II-C, §V.
-  (2018-06) An evolutionary ant colony algorithm for a vehicle routing problem with simultaneous pick-up and delivery and hard time windows. In Proceedings of the 30th Chinese Control and Decision Conference, CCDC’2018, Shenyang, China, pp. 6499–6503. Cited by: §IV-E.
-  (1991) TSPLIB - A traveling salesman problem library. INFORMS Journal on Computing 3 (4), pp. 376–384. Cited by: §I.
-  (1997) New methods for competitive coevolution. Evolutionary Computation 5 (1), pp. 1–29. Cited by: §II-C.
-  (2018) An efficient tabu search based procedure for simultaneous delivery and pick-up problem with time window. IFAC-PapersOnLine 51 (11), pp. 241–246. Cited by: §IV-E.
-  (2015) Generating new test instances by evolving in instance space. Computers & Operations Research 63, pp. 102–113. Cited by: §II-A.
-  (2006) Evolving combinatorial problem instances that are difficult to solve. Evolutionary Computation 14 (4), pp. 433–462. Cited by: §III-B1.
-  (2015) A parallel simulated annealing method for the vehicle routing problem with simultaneous pickup-delivery and time windows. Computers & Industrial Engineering 83, pp. 111–122. Cited by: §B-A, §IV-E.
-  (2012) A genetic algorithm for the simultaneous delivery and pickup problems with time window. Computers & Industrial Engineering 62 (1), pp. 84–95. Cited by: §B-A, item 2, §III-B2, §III-B2, §III-B, §IV-A, §IV-E, TABLE I.
-  (2010-07) Hydra: Automatically configuring algorithms for portfolio-based selection. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, AAAI’2010, Atlanta, GA, pp. 210–216. Cited by: Appendix D, §IV-B, §V.