1 Introduction
Metaheuristics such as evolutionary algorithms [16, 14] represent a class of computational problem solvers subject to stochastic behavior, determined in part by the values of user-defined parameters. These parameters are responsible for determining the global-local exploration profile, solution quality, and efficiency of the algorithm when searching for solutions in the objective space. Poor choices of parameter values can result in low performance of the method, even if the implementation is done properly, while well-chosen values can lead the algorithm to consistently return high-quality solutions. Moreover, good parameter configurations are often problem-dependent [6], which limits the utility of looking for one-size-fits-all configurations and requires the development of efficient strategies for tuning parameters based on a limited sample of representative instances of the problem class of interest.
Assuming that parameters can take several (sometimes infinitely many) values, a possibly very large number of combinations of parameter values (called here candidate configurations, or simply configurations) can be considered for an algorithm when solving a given problem. There are two sources of random variation in the observed performance of an algorithm (equipped with a given configuration) when solving instances of a given problem class: the uncertainty due to the instance being solved, which gives rise to an across-instances variance; and the uncertainty due to the stochastic behavior of the metaheuristic itself, which results in a within-instance variance [6]. Due to these random influences on the observed performance of a given algorithm configuration, several researchers have proposed strategies for recommending candidate configurations based on statistical concepts, in a process commonly referred to as parameter tuning [15, 17].
This work is focused on the application of statistical modeling to the development of tuning approaches. More specifically, we present a modular framework for implementing parameter tuning methods, which is based on concepts drawn from Sequential Model-Based Optimization (SMBO) strategies [2, 22]. The proposed framework is aimed not only at returning algorithm configurations that are well-adjusted for particular problem classes, but also at providing statistical models capable of supporting further investigations on the relative relevance of algorithm parameters and interaction effects, as well as estimations of expected algorithm performance given new sets of parameter values.
The remainder of this manuscript is organized as follows. We start by formally stating the Algorithm Tuning Problem that we are attempting to solve (Section 2), and briefly reviewing the most widely used parameter tuning methods (Section 3). The proposed tuning framework is introduced in Section 4. To illustrate its use, Section 5 considers the problem of tuning six parameters of an evolutionary multiobjective optimization algorithm, and contrasts the results obtained by the proposed method against those returned by Iterated Racing [27]. Finally, conclusions and directions for future work are discussed in Section 6.
2 The Algorithm Tuning Problem
In this work we are interested in tuning algorithm parameters for a given problem class of interest, i.e., finding the combination of parameter values that results in the best expected performance of a given algorithm on instances belonging to a given family of problems. Here we present a formalization of this problem, based on a description originally presented by Birattari [5, 6].
Assume that we have an algorithm containing $m$ free parameters to be set by the user, and let $\theta$ denote a list of length $m$ containing specific parameter values for that algorithm. We refer to $\theta$ as a candidate configuration for the algorithm under study, with $\Theta$ representing the set of all possible parameter configurations for that algorithm. (For the sake of simplicity, in the remainder of this work we refer to the algorithm equipped with a given set of parameter values as simply a configuration.) Similarly, let $\gamma$ denote a given problem instance belonging to a problem class of interest, denoted by $\Gamma$. Also, let $\phi(\theta,\gamma)$ be a random variable representing the performance of a candidate configuration $\theta$ on a given instance $\gamma$, measured according to a given indicator of choice (in the remainder of this work, we assume the use of indicators for which larger values represent better performance), with $f(\theta,\gamma)$ denoting a statistical parameter of $\phi(\theta,\gamma)$ that can be used to quantify the general quality of configuration $\theta$ as a solver of instance $\gamma$, e.g., the mean or median of $\phi(\theta,\gamma)$.
Let $F(\theta) = \{f(\theta,\gamma) : \gamma \in \Gamma\}$ denote the set of quality values of a candidate configuration $\theta$ for all instances belonging to problem class $\Gamma$; and let $\psi(\theta)$ denote a statistical parameter of $F(\theta)$ which is of interest when comparing different configurations, e.g., the mean or the median performance across all instances belonging to $\Gamma$. Under these definitions, the algorithm tuning problem tackled in this work can be defined as:
$$\theta^{*} = \arg\max_{\theta \in \Theta} \psi(\theta) \qquad (1)$$
that is, the problem of finding the configuration that maximizes the performance of a given algorithm for a given class of instances. Automated approaches for addressing this problem generally try to obtain $\theta^{*}$ using information from a finite subset of problem instances.
An important point to be aware of is that the instances used for tuning are usually not the ones that are relevant in practice: an underlying assumption of methods that attempt to solve the problem defined above is that the instances used for tuning can be regarded as a representative sample of the problem class of interest, and can therefore be used for modeling and inference of the expected behavior of the algorithm for that problem class.
It must also be highlighted that, under this definition of the algorithm tuning problem, the independent observations to be used in any statistical modeling or inferential procedure are the individual estimates of performance of a given configuration on a given instance, i.e., one quality value per configuration-instance pair. Repeated runs of a given configuration on the same instance are useful for improving the accuracy of these estimates, but cannot count as independent degrees of freedom for the statistical procedures. Failure to account for this fact would result in pseudoreplication [18, 26], a violation of the independence assumption underlying the statistical approaches used in most tuning procedures, which leads to inflated type-I error rates in inferential tests and to artificially reduced standard errors in descriptive models.
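The aggregation rule described above can be illustrated with a minimal sketch. The data layout, the function names, and the choice of the mean (within instance) and the median (across instances) are assumptions made for illustration only; the point is that the number of independent observations equals the number of instances, not the number of runs.

```python
from statistics import mean, median

def quality_per_instance(runs):
    """Collapse repeated runs into one value per (configuration, instance)."""
    return {key: mean(values) for key, values in runs.items()}

def summarize_configuration(quality, config):
    """Return the across-instance summary and the number of independent observations."""
    values = [q for (c, _), q in quality.items() if c == config]
    return median(values), len(values)

# Hypothetical data: nine runs in total, but only three instances.
runs = {
    ("cfg_a", "inst_1"): [0.80, 0.90, 0.85],
    ("cfg_a", "inst_2"): [0.40, 0.50],
    ("cfg_a", "inst_3"): [0.60, 0.70, 0.65, 0.65],
}
quality = quality_per_instance(runs)
summary, n_independent = summarize_configuration(quality, "cfg_a")
print(summary, n_independent)
```

Treating the nine runs as nine observations would understate the standard error of the summary; the sketch keeps three.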
In the next section we review some of the most common approaches used to tackle the algorithm configuration problem. While in most cases the problem is not explicitly stated as above, the workings of these methods indicate that in most cases this is the problem (or at least one of the problems) they attempt to solve. After briefly discussing the existing approaches, we will present our proposed tuning framework in Section 4.
3 Overview of Parameter Tuning Methods
A variety of different tuning methods have been proposed over the years to determine the best configurations of algorithms when solving a given problem. Based on their working mechanisms and design principles, it is possible to group these methods in three major categories: racing methods, SMBO methods, and hyperheuristics. In this section we review the most widely used methods from each category.
3.1 Racing Methods
The basic concepts of racing methods were initially proposed in the machine learning literature for solving the model selection problem [6]. The basic idea of these methods [28, 31] is that the search for the best model structure can be sped up by discarding inferior candidate models as soon as sufficient statistical evidence is gathered against them. A similar concept is used by racing methods for parameter tuning: discard candidate configurations as soon as they are detected as inferior according to some statistical criteria.
The most relevant methods in this class are all based on concepts originally introduced in the form of F-Race [7]. The main concept behind F-Race is to iteratively evaluate a given set of candidate configurations on a finite number of instances, gradually building statistical evidence until it is possible to conclude, at a predefined level of confidence, that one or more candidate configurations are significantly worse than the others. Once this is determined, those inferior configurations are eliminated and the process continues with the remaining ones. F-Race stops when a given termination criterion is met, e.g., the maximum computational budget is used or the number of remaining configurations falls below a given threshold. At each iteration this method employs the Friedman test [40] as its main inferential procedure, followed (if statistically significant differences are detected) by post-hoc nonparametric pairwise comparisons between the estimated best configuration and all others. Configurations whose median performance is detected as significantly worse than that of the best one are discarded from the race. The F-Race method then proceeds by evaluating the remaining configurations on more instances, iteratively increasing the statistical power of the tests and enabling the detection of smaller differences in median performance. The method stops when only a single configuration remains, a given number of instances have been sampled, or a predefined computational budget has been exhausted.
Improvements to F-Race were proposed in the form of the Iterated F-Race (I/F-Race) method [1], later generalized as Iterated Racing [27]. I/F-Race works by iteratively applying F-Race, generating new candidate configurations at each iteration by sampling from a multivariate random distribution of parameter values that is biased by the best configurations returned in the previous iterations [8]. This biased sampling drives the search process towards obtaining candidate configurations that are similar to the best ones observed up to a given iteration. Iterated Racing allows the use of different statistical tests in place of the Friedman test, prevents premature convergence of the tuning method by means of soft restart rules, and includes elitist options to force the preservation of high-quality candidate configurations.
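As a rough illustration of the inferential step used in racing, the sketch below computes the Friedman rank statistic with instances as blocks and configurations as treatments. It is a simplified sketch, not the F-Race implementation: ties receive average ranks but no tie correction is applied, the data are hypothetical, and the decision threshold is the chi-square critical value for two degrees of freedom at the 5% level (5.991), an approximation that is coarse for so few instances.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos..end
        for i in order[pos:end + 1]:
            ranks[i] = shared
        pos = end + 1
    return ranks

def friedman_statistic(results):
    """results[i][j] is the performance of configuration j on instance i."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return stat, rank_sums

# Hypothetical results: 4 instances (rows) x 3 configurations (columns),
# larger values meaning better performance.
results = [
    [0.90, 0.50, 0.10],
    [0.80, 0.60, 0.20],
    [0.70, 0.40, 0.30],
    [0.95, 0.55, 0.15],
]
stat, rank_sums = friedman_statistic(results)
significant = stat > 5.991  # chi-square critical value, df = k - 1 = 2, alpha = 0.05
print(stat, rank_sums, significant)
```

When the test is significant, a race would follow up with pairwise comparisons against the configuration with the largest rank sum and discard those found to be worse.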
3.2 SMBO Methods
Tuning methods based on the sequential model-based optimization (SMBO) approach are motivated by results from the literature on statistical modeling and black-box optimization methods. From an initial set of observations of performance over the space of configurations, SMBO methods fit one or more response surfaces, which are then used to determine which new configurations should be sampled. These new results are then added to the existing sample, and used to update the response surfaces. As iterations progress, SMBO methods tend to generate models that are increasingly biased towards those regions of the parameter space which contain configurations with good performance. The three most widely known tuning methods based on SMBO are Sequential Parameter Optimization (SPO) [4], BONESA [15], and Sequential Model-Based Algorithm Configuration (SMAC) [20], which are briefly discussed below.
Sequential Parameter Optimization was proposed in 2005 [4, 3], and is based on a strategy of iteratively improving a prediction model to reveal the relationship between parameter values and algorithm performance. This model is then used to select the most promising values for the parameters. In the first iteration of SPOT a few candidate configurations are generated using Latin Hypercube Sampling (LHS) [29, 44] over the space of algorithm parameters. These candidate configurations are evaluated on a problem instance, and this information is used to fit a statistical prediction model. The standard initial model used by SPOT is a second-order linear regression model, but regression trees and Kriging have also been employed [2]. Based on the candidate configuration with the best observed performance and on the response surface, new candidate configurations are generated so as to maximize the probability, conditional on the available information, that they will present good performance values. These new points are evaluated and an updated model is fit, in a process that iterates until a predefined termination criterion is reached. At each cycle, the number of evaluations of each candidate configuration on the problem instance is increased, obtaining more accurate estimations of average performance. Besides searching for the best configuration, SPOT also allows the user to analyze the variation of algorithm behavior with its parameter values using the statistical models generated, thereby enabling deeper experimental investigations and experiment-driven algorithm development.
BONESA [15] is a tuning method based on learning and searching loops. These two modules continuously exchange information as iterations progress: the learning loop uses a prediction model to compare candidate configurations, while the searching loop is responsible for sampling new candidate configurations based on the results of the learning module. The distinguishing feature of this method is its multiobjective approach: to select the best parameter values for a given problem class, BONESA uses a Pareto strength approach [15] and attempts to simultaneously maximize the performance of the algorithm for all problem instances used in the tuning effort.
In the first iteration, BONESA randomly samples a number of candidate configurations, evaluating them once for each available tuning instance. The learning loop uses this information to predict the utility values for new candidate configurations, using an approach based on the weighted average of the utilities of the nearest neighbors of the proposed configurations. These predicted utilities are then used for comparing the candidate configurations using a criterion based on Pareto dominance and an adaptation of Welch’s t-test [30]. The results of the tests are then aggregated and used to calculate the Pareto strength of each candidate configuration [15] and to generate new configurations (based on the best ones), for which the Pareto strength is also calculated. Then, those with the highest Pareto strength values are selected to compose the new set of configurations to be evaluated on the tuning instances. The method iterates until a given stopping criterion is reached.
Finally, the Sequential Model-Based Algorithm Configuration (SMAC) method [20, 19] was, like SPO, initially designed for tuning algorithm parameters on a single problem instance. (Both methods can, however, be adapted for tuning algorithms for problem classes.) The method generates an initial set of candidate configurations and evaluates their performance on the instance. Based on this information, it fits a predictive model of performance over the space of parameter values, and then performs a multi-start search for finding the candidate configuration that maximizes an expected positive improvement function. This new candidate configuration is then evaluated and added to the pool of candidate configurations, and the process is repeated. SMAC has been used with different types of prediction model, including Gaussian Processes and Random Forests; and different search strategies, including DIRECT and CMA-ES.
3.3 Hyperheuristics
The term hyperheuristics is used here to classify those tuning methods which consist in the application of metaheuristics for obtaining the best parameter values of algorithms, trying to solve the algorithm configuration problem by directly tackling its optimization formulation, discussed in Section 2. While in principle any optimization approach could be used to solve the algorithm tuning problem, knowledge about the characteristics of this problem has motivated the development of specific strategies. Two of the most common ones are REVAC [32, 33] and ParamILS [21], as presented below.
Nannen and Eiben proposed a parameter tuning method for Evolutionary Algorithms called Relevance Estimation and Value Calibration (REVAC) [32, 33], which aims to answer questions related to two aspects of algorithm design and configuration: (i) which of the free parameters of a given method are in fact relevant, i.e., effectively influence the performance of the algorithm; (ii) for those parameters that are in fact relevant, which values lead to the best performance of the algorithm.
REVAC is itself configured as an evolutionary strategy. The method begins with a population of randomly generated candidate configurations, which are evaluated according to a performance function, and new candidate configurations are obtained using usual recombination and mutation operators [33]. At each iteration the marginal probability density functions for each parameter of the algorithm are estimated from the population of candidate configurations. The Shannon entropy of these distributions is used to estimate the relevance of each parameter. Parameters for which entropy decreases quickly as iterations progress need little information to be tuned, and are therefore considered more relevant to the performance of the EA. Conversely, those for which entropy does not decrease are considered less relevant, and may be discarded or receive arbitrary values. The method iterates until predefined stopping criteria are reached.
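The entropy-based relevance idea can be sketched as follows. This is a simplified illustration that estimates each marginal distribution with a fixed-width histogram over a normalized range; the estimator actually used by REVAC differs, and the function name, bin count, and data are assumptions.

```python
import math

def marginal_entropy(values, bins=10, low=0.0, high=1.0):
    """Shannon entropy (in bits) of a histogram estimate of one parameter's marginal."""
    counts = [0] * bins
    for v in values:
        idx = min(int((v - low) / (high - low) * bins), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

# Hypothetical populations for two parameters, both normalized to [0, 1]:
spread = [0.05 + 0.1 * i for i in range(10)]    # values scattered over the whole range
focused = [0.41 + 0.01 * i for i in range(9)]   # values concentrated in one region

h_spread = marginal_entropy(spread)
h_focused = marginal_entropy(focused)
print(h_spread, h_focused)
```

A parameter whose population collapses into a narrow region (low entropy) would be read as relevant, while one whose values stay scattered (entropy close to the maximum of log2 of the bin count) would be read as having little influence.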
ParamILS [21] is a framework of tuning methods based on Iterated Local Search (ILS). Starting from a given initial candidate configuration, at each iteration the incumbent configuration is perturbed and undergoes a first-improvement local search, generating a new candidate configuration that replaces the incumbent one if it presents better performance. The neighborhood of a given configuration is the set of all configurations that differ from it in a single parameter, and the determination of whether a candidate configuration is better than the incumbent one is performed using statistical tests, with problem instances as a blocking factor [21, 30]. Variants of this basic algorithm include [21] FocusedILS, which adaptively selects the number of training instances; and Adaptive Capping of Algorithm Runs, which controls the cutoff time for each run of the candidate configurations.
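The single-parameter neighborhood described above can be written compactly, assuming that every parameter has been given a finite list of admissible values. The parameter names and domains below are hypothetical and serve only to show the construction.

```python
def one_exchange_neighbors(config, domains):
    """All configurations that differ from `config` in exactly one parameter."""
    neighbors = []
    for name, current in config.items():
        for value in domains[name]:
            if value != current:
                neighbor = dict(config)  # copy, then change a single entry
                neighbor[name] = value
                neighbors.append(neighbor)
    return neighbors

# Hypothetical discretized domains and incumbent configuration.
domains = {"pop_size": [50, 100, 200], "crossover": ["sbx", "de"]}
incumbent = {"pop_size": 100, "crossover": "sbx"}

neighbors = one_exchange_neighbors(incumbent, domains)
for n in neighbors:
    print(n)
```

A first-improvement search would scan these neighbors in random order and move to the first one judged better than the incumbent.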
4 Proposed Tuning Framework
In this section we propose a modular structure for tackling the algorithm configuration problem presented in Section 2. The proposed framework, which we will refer to as MetaTuner, can be used to instantiate distinct tuning approaches through the adoption of specific methods for each of its components, depending on the nature of the tuning process at hand. This modular approach results not only in a greater flexibility for the framework, but is also useful for faster development and testing of proposed improvements.
The proposed approach is based on a common assumption in the design of computer experiments [38], that if the number of instances and of candidate configurations is sufficient, enough information will be gathered so that the resulting response surfaces are somehow representative of the expected performance landscape of the algorithm for the problem class of interest. Under this assumption, optimizing these surfaces will tend to drive the method towards regions of the parameter space containing good candidate configurations, allowing the method to iteratively concentrate its efforts on those regions of the parameter space with the highest average performance.
The general aspects of the proposed framework can be easily explained from the structure presented in Algorithm 1. (An open-source implementation is available in the form of the R package MetaTuner: https://github.com/fcampelo/MetaTuner) The method starts by sampling a few configurations, which are evaluated on a randomly sampled initial set of tuning instances. The performance results obtained are then used for fitting a regression model of the expected performance of configurations on the problem class of interest. The regression model is then subject to perturbations (e.g., by perturbing the fitted parameters), resulting in a number of additional response surfaces. For each surface (including the unperturbed one) an optimization process is executed, returning a new candidate configuration which maximizes the estimated average performance for that model. These new candidate configurations are then evaluated on all instances sampled so far, and added to an archive. Finally, the archive is truncated to a given size, maintaining only the candidate configurations with the best expected performance value for the problem class. The whole process then iterates by sampling a few more instances (if available), and proceeds until a predefined stopping condition is reached.
In the remainder of this section we detail the implementation of an initial instantiation of the proposed framework, aimed at tuning continuous parameters. (The tuning of categorical or hierarchical parameters is not considered in the present work.)
4.1 Generation of Initial Candidate Configurations
Considering the importance of gathering enough information for generating a reasonable first set of regression models, it is particularly important to ensure that the initial candidates (line 3 of Algorithm 1) are well-distributed in the parameter space, so that the method has the chance to investigate different regions of that space.
There are a few strategies that guarantee a well-spread initial sampling in continuous spaces. Some of the most widely known include Latin Hypercube Sampling (LHS) [29, 45], low-discrepancy sequences of points (LDSP) [25], and uniform designs (UD) [36]. Since LHS is possibly the one most widely used in computational experiments [39], the version of MetaTuner described here uses this particular sampling scheme for generating its initial set of candidate configurations.
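For reference, a basic Latin Hypercube sample on the unit hypercube can be generated as sketched below. This is a plain textbook construction, not the routine used by the MetaTuner package; the function name, the sample size, and the seed are arbitrary choices.

```python
import random

def latin_hypercube(n_samples, n_dims, rng=None):
    """Draw n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    rng = rng or random.Random()
    columns = []
    for _ in range(n_dims):
        # One point drawn uniformly inside each of the n_samples equal-width strata...
        points = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # ...then shuffled so that strata are paired at random across dimensions.
        rng.shuffle(points)
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Eight initial configurations for six parameters on a normalized scale.
initial = latin_hypercube(8, 6, random.Random(42))
for point in initial:
    print(["%.3f" % v for v in point])
```

Projecting the sample onto any single parameter yields exactly one point in each of the eight intervals, which is the spreading property the text relies on.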
Before proceeding, it is important to understand that performance degradation can occur if the parameters being tuned can assume values on very different scales. In the case of polynomial mutation [13], for example, the rate parameter lies in the interval $[0,1]$, while the distribution index can in principle assume any nonnegative value. This is a well-known issue in the regression and machine learning literature [43], which can be avoided when tuning numerical parameters by simply rescaling all parameters to a common scale, e.g., $[0,1]$:
$$\theta'_{j} = \frac{\theta_{j} - \theta_{j}^{\min}}{\theta_{j}^{\max} - \theta_{j}^{\min}} \qquad (2)$$
where $\theta_{j}$ is the value of the $j$th component of candidate configuration $\theta$, and $\theta_{j}^{\min}$ and $\theta_{j}^{\max}$ denote the lower and upper allowed values for the $j$th parameter being tuned. Notice that this requires all parameters to have upper and lower limits, which is generally not a problem: even for parameters that are theoretically unbounded, it is generally possible to define reasonable bounds based on theory or previous experience.
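The rescaling step and its inverse amount to a per-parameter min-max transformation, sketched below. The bounds used in the example, in particular the upper limit of 100 for the second parameter, are hypothetical values chosen only to show two very different native scales.

```python
def rescale(config, bounds):
    """Map each parameter value from [lower, upper] to [0, 1]."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(config, bounds)]

def restore(scaled, bounds):
    """Map values in [0, 1] back to the original parameter ranges."""
    return [lo + s * (hi - lo) for s, (lo, hi) in zip(scaled, bounds)]

# Hypothetical bounds: a probability in [0, 1] and an index bounded to [1, 100].
bounds = [(0.0, 1.0), (1.0, 100.0)]
config = [0.25, 25.75]

scaled = rescale(config, bounds)
print(scaled)
print(restore(scaled, bounds))
```

The tuner would work on the rescaled values throughout and apply the inverse map only when a configuration has to be run.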
4.2 Evaluation of Candidate Configurations and Estimation of Quality Value
Given the possibly heterogeneous nature of the tuning instances and the expected variations of performance of different configurations, it is possible that the distributions of the performance of candidate configurations on the instances exist on very different scales. While some regression models, particularly quantile regression [24], can deal with these differences of scale relatively well, most have their performance heavily degraded in the presence of such large scale differences and heterogeneity of variances. To alleviate these problems, the performance of candidate configurations on the tuning instances (lines 5, 10 and 20 of the algorithm) is calculated by running the configurations on those instances and transforming the output (i.e., the value of the quality indicator used) to a common scale.
Let $\phi_{\theta,\gamma}$ denote a single observation of the performance of configuration $\theta$ on instance $\gamma$. The observed performance in this case is calculated by linearly scaling $\phi_{\theta,\gamma}$ to the interval $[0,1]$:
$$\phi'_{\theta,\gamma} = \frac{\phi_{\theta,\gamma} - \phi_{\gamma}^{\min}}{\phi_{\gamma}^{\max} - \phi_{\gamma}^{\min}} \qquad (3)$$
where $\phi_{\gamma}^{\min}$ and $\phi_{\gamma}^{\max}$ denote the smallest and largest values observed so far for instance $\gamma$, across all configurations already evaluated. Once these values are calculated for all instances visited by a given configuration $\theta$, the summary performance estimator is calculated as the sample average of the $\phi'_{\theta,\gamma}$ values associated with that configuration. Notice that this average can be the simple mean, trimmed mean, median, or any other indicator of location. In the version used here, the median is employed due to its robustness to outliers and distributional asymmetries, an important characteristic when dealing with possibly heterogeneous tuning instances.
Notice from Algorithm 1 that, at each iteration, configurations in the elite archive are evaluated on the new instances, while configurations that were not selected are not, even though all are used for modeling the average behavior. This is done to increase the accuracy of estimation of the average performance of the most promising configurations, and can be used, for instance, to attribute weights to each observation in the regression modeling. At each iteration, the new configurations generated by optimizing the estimated response surfaces are also evaluated on all instances visited so far, since they are expected to yield good average performances.
Finally, it must be highlighted that the scaled performance values need to be recalculated at each iteration for all configurations, since the normalizing bounds can change across iterations.
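The per-instance scaling, the median summary, and the need to recompute scaled values when the bounds move can be sketched as follows. The data layout and names are assumptions, as is the value 0.5 assigned when all observations on an instance coincide (a case the text does not cover).

```python
from statistics import median

def scale_by_instance(raw):
    """Scale raw[(config, instance)] to [0, 1] using each instance's observed min and max."""
    lo, hi = {}, {}
    for (_, inst), v in raw.items():
        lo[inst] = min(v, lo.get(inst, v))
        hi[inst] = max(v, hi.get(inst, v))
    scaled = {}
    for (cfg, inst), v in raw.items():
        span = hi[inst] - lo[inst]
        # Assumption: a degenerate instance (all values equal) maps to 0.5.
        scaled[(cfg, inst)] = (v - lo[inst]) / span if span > 0 else 0.5
    return scaled

def summary(scaled, cfg):
    """Median of the scaled values of one configuration across its instances."""
    return median(v for (c, _), v in scaled.items() if c == cfg)

# Hypothetical observations on three instances with very different native scales.
raw = {
    ("a", "i1"): 10.0, ("b", "i1"): 30.0,
    ("a", "i2"): 0.002, ("b", "i2"): 0.001,
    ("a", "i3"): 5.0, ("b", "i3"): 7.0,
}
scaled = scale_by_instance(raw)
print(summary(scaled, "a"), summary(scaled, "b"))

# A new configuration widens the bounds of instance i1, so the previously
# computed scaled values for i1 are no longer valid and must be recomputed.
raw[("c", "i1")] = 50.0
rescaled = scale_by_instance(raw)
print(scaled[("b", "i1")], rescaled[("b", "i1")])
```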
4.3 Regression Modeling
The role of regression modeling in the proposed tuning framework is to enable predictions of the expected performance of a given configuration, conditional on its parameter values. For this, modeling strategies ideally need to be i) reasonably accurate; ii) capable of working with relatively few data points; iii) computationally inexpensive (at least relative to the cost of evaluating the configurations); and iv) parsimonious in terms of the number of coefficients in the model. Another desirable trait is that the regression models scale reasonably well with the number of parameters, at least up to the number of free parameters that a user can reasonably be expected to adjust.
For continuous parameters, usual models include linear regression with ordinary [30] or weighted [41] least squares estimators; quantile regression [23]; and ridge or lasso regression [42], among others. In this work we opted for using the latter, which is briefly introduced below.
4.3.1 Ridge and Lasso Regression
There are two reasons why ordinary least squares (OLS) regression is often inadequate [42], namely prediction accuracy and interpretation. Poor prediction accuracy can be caused by the low bias but large variance of OLS estimates, whereas interpretation is often challenging given the large number of coefficients often used when fitting models with more than a couple of variables.
As a way to remedy both these issues, some methods employ shrinkage techniques to remove coefficients that do not contribute to the explanatory power of a given model. Shrinking coefficients can be achieved, e.g., by including a penalty term in the problem of minimizing the squared errors. Considering the predictor of $y$ as a linear function of the form $\hat{y} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j$, the problem becomes [37]:
$$\min_{\beta_0, \beta_1, \dots, \beta_p} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Big)^{2} + \lambda \sum_{j=1}^{p} |\beta_j|^{q} \qquad (4)$$
where $y_i$ and $x_{ij}$ are the response and the $j$th regressor of the $i$th of $n$ observations, $\lambda \geq 0$ is a regularization parameter, and $q$ regulates the order of the norm used for the penalization term. Two special cases of this definition are the lasso ($q = 1$) and ridge ($q = 2$) regressions. The minimization of (4) becomes more aggressive at shrinking coefficients towards zero (i.e., removing their associated terms from the model) as larger values of $\lambda$ are used. This regression approach can be useful in the presence of complex models with many terms, particularly when there is a large difference in the relevance of each term, as is often the case of algorithm parameters [32, 33]. In these cases, shrinkage will reduce all coefficient values, leading those least relevant to zero and amplifying the differences between them, simplifying the model and facilitating interpretation.
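To make the difference between the two penalties concrete, the sketch below minimizes a residual sum of squares plus a penalty of the form lam * sum(|beta_j|**q) by cyclic coordinate descent, for q = 1 (lasso) and q = 2 (ridge), with an unpenalized intercept. It is a didactic sketch on hypothetical data, not the routine used in this work; the function names and the fixed number of sweeps are assumptions.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly zero inside [-t, t]."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def penalized_fit(X, y, lam, q=1, sweeps=200):
    """Coordinate descent for sum of squared errors + lam * sum(|beta_j|**q), q in {1, 2}."""
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(sweeps):
        # Unpenalized intercept: mean of the residuals left by the slopes.
        beta0 = sum(y[i] - sum(beta[k] * X[i][k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            # Inner product of column j with the residual that excludes term j.
            rho = sum(
                X[i][j] * (y[i] - beta0 - sum(beta[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n)
            )
            z = sum(X[i][j] ** 2 for i in range(n))
            if q == 1:
                beta[j] = soft_threshold(rho, lam / 2.0) / z   # lasso update
            else:
                beta[j] = rho / (z + lam)                      # ridge update
    return beta0, beta

# Hypothetical data: y = 3 + 2*x1 + 0.1*x2, so x2 is nearly irrelevant.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]
y = [0.9, 1.1, 4.9, 5.1]

lasso0, lasso = penalized_fit(X, y, lam=1.0, q=1)
ridge0, ridge = penalized_fit(X, y, lam=1.0, q=2)
print(lasso0, lasso)
print(ridge0, ridge)
```

On this example the lasso sets the weak coefficient exactly to zero while the ridge only shrinks it, which is the behavior that makes the lasso attractive for identifying the relevant parameters.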
4.4 Generating Perturbed Models
The generation of response surfaces to be optimized at each iteration is performed in two steps: firstly, a regression model is fit using a modeling technique of choice. Secondly, the model obtained is perturbed several times, generating new response surfaces (line 16 of Algorithm 1). To generate the perturbed models, all (nonzero) coefficients of the model are subject to uniform noise. The range of this noise is defined by the standard errors of each coefficient, which can be obtained either analytically (e.g., in the case of linear regression models using OLS) or by resampling methods.
For ridge and lasso regression models, standard errors are obtained using a leave-one-out (LOO) strategy. After fitting a model using the approach described in the previous section, all coefficients that were shrunk down to zero are removed from the model. The resulting polynomial is then used as a basis for fitting new models, each of which is fitted on a dataset obtained by ignoring the information regarding a single configuration. Based on the coefficients fit for each of these leave-one-out models, the standard error of each coefficient is estimated as the sample standard deviation of the values obtained for that coefficient on all LOO models.
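The leave-one-out estimate and the perturbation step can be sketched as below. A one-variable least-squares line stands in for the regression model, and the noise is drawn uniformly within plus or minus one standard error of each nonzero coefficient; both choices, like the data and names, are assumptions made to keep the example small.

```python
import random
from statistics import stdev

def fit_line(data):
    """Least-squares intercept and slope for (x, y) pairs (stand-in for the real model)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sum((x - mx) * (y - my) for x, y in data) / sxx
    return [my - slope * mx, slope]

def loo_standard_errors(data, fit):
    """Sample standard deviation of each coefficient across the leave-one-out fits."""
    fits = [fit(data[:i] + data[i + 1:]) for i in range(len(data))]
    return [stdev([f[j] for f in fits]) for j in range(len(fits[0]))]

def perturb(coefs, ses, rng):
    """Add uniform noise within +/- one standard error to every nonzero coefficient."""
    return [c + rng.uniform(-s, s) if c != 0.0 else 0.0 for c, s in zip(coefs, ses)]

# Hypothetical (parameter value, summary performance) observations.
data = [(0.0, 1.0), (0.25, 1.4), (0.5, 2.1), (0.75, 2.4), (1.0, 3.1)]
coefs = fit_line(data)
ses = loo_standard_errors(data, fit_line)

rng = random.Random(7)
surfaces = [perturb(coefs, ses, rng) for _ in range(4)]
print(coefs, ses)
for s in surfaces:
    print(s)
```

Each perturbed coefficient vector defines one additional response surface to be optimized alongside the unperturbed model.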
4.5 Model Optimization
Considering that the response surfaces represent preliminary models of the average behavior of the algorithm conditional on its parameter values, optimizing them should yield a set of new candidate configurations with expected good performance. As the iterations progress and more candidate configurations are evaluated on more instances, it is expected that the resulting models become increasingly accurate.
The main concept of the proposed tuning framework is to iteratively fit regression models of average behavior as a function of parameter values, using increasing amounts of information, and to optimize the resulting response surfaces (and perturbed versions of them, obtained by incorporating estimation uncertainties of the model coefficients; see lines 14–18 of Algorithm 1) to search for more promising parameter values. The new candidate configurations returned by optimizing these models are evaluated on all instances already visited by the method (line 20) and added to the archive (lines 21–22).
The optimization approach to be used depends on the nature of the regression models, which may provide, e.g., analytical gradients or guarantees of unimodality. For more general or complex models, fast heuristics can be employed. In this work we opted for using the Nelder-Mead Simplex method [35, 34] to optimize the response surfaces.
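A compact version of the simplex search is sketched below for illustration. It is a simplified variant (reflection, expansion, inside contraction, and shrink, with the usual coefficients 1, 2, 0.5, and 0.5, and a fixed iteration count instead of a convergence test), not the implementation used in the experiments. Since the method minimizes, a response surface is maximized by minimizing its negative; the quadratic surface and its optimum are hypothetical.

```python
def nelder_mead(f, start, step=0.1, iters=200):
    """Minimize f from `start` using a simplified Nelder-Mead simplex search."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        # Centroid of all vertices except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflected) < f(best):
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway towards the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, vtx)] for vtx in simplex[1:]
                ]
    return min(simplex, key=f)

def surface(x):
    """Hypothetical response surface with its maximum at (0.3, 0.7)."""
    return -((x[0] - 0.3) ** 2) - (x[1] - 0.7) ** 2

best = nelder_mead(lambda x: -surface(x), [0.5, 0.5])
print(best)
```

The sketch does not enforce the parameter bounds; a practical version would clip or penalize points that leave the normalized search range.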
5 Experimental Results
To illustrate the use of the proposed approach, we performed the tuning of six parameters of the MOEA/D algorithm [47, 10] for the hypothetical problem class represented by the 2-objective problems of the UF benchmark set [48, 9], with dimensions between 3 and 40. Dimensions that are multiples of 3 were reserved for testing (91 test instances), while all others (175 instances) were available for the tuning effort. The quality of the solutions returned by the algorithm on each problem instance was computed using the IGD indicator [49].
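The reported split sizes can be checked by enumeration. The sketch below assumes that the problem class comprises the seven two-objective UF problems (UF1 to UF7) and that the test dimensions are the multiples of 3; under these assumptions the counts match the 91 test and 175 tuning instances stated above.

```python
# Assumption: UF1..UF7 are the two-objective problems of the UF set.
problems = [f"UF{i}" for i in range(1, 8)]
dimensions = range(3, 41)  # dimensions 3 to 40, inclusive

test_set = [(p, d) for p in problems for d in dimensions if d % 3 == 0]
tuning_set = [(p, d) for p in problems for d in dimensions if d % 3 != 0]

print(len(test_set), len(tuning_set))
```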
The complete specification of the MOEA/D algorithm and its fixed and tunable parameters is shown in Table 1. A detailed description of the MOEA/D algorithm and its component-based modeling can be found in the available literature [47, 11].
| Module | Type | Parameter (value / range) |
|---|---|---|
| Decomposition method | SLD | |
| Scalar aggregation function | PBI | |
| Objective scaling | None | – |
| Neighborhood assignment | In the variable space | |
| Variation Operators | SBX recombination | |
| | Polynomial mutation | |
| | Simple truncation | – |
| Update strategy | Restricted | |
| Termination criteria | Function evaluations | |
For this tuning problem, the proposed framework was set up as follows: the initial sampling was performed with a set of configurations generated using Latin Hypercube sampling and evaluated on instances randomly drawn from the tuning set. At each iteration, new configurations were generated, and new tuning instances were added to the pool. The summary function used for calculating the expected performance of each configuration was the median. Regression models were fit using lasso regression, with a polynomial model as the basis. (The penalization parameter was selected automatically by cross-validation.) All models were fit using the implementation available in the R package hqreg [46]. Perturbed models were generated using standard errors obtained by leave-one-out resampling, and the optimization of response surfaces was performed using the Nelder-Mead Simplex algorithm. A fixed computational budget of algorithm runs was defined for the tuning effort.
To obtain a comparison baseline, we used the official implementation of Iterated Racing, available in the R package irace [27]. The standard configuration was used, and the same computational budget was imposed. Thirty repetitions of the tuning effort were performed using the set of tuning instances defined earlier. For each repetition, the best configuration returned was used to solve the instances in the test set, and its performance on each test instance was calculated as the mean IGD of 14 runs. The resulting observations were used to compare the performance of the two parameter tuning approaches.
The proposed approach and Iterated Racing were then compared using a paired t-test [30] at a significance level, with instance as the pairing variable (repeated runs were mean-aggregated to prevent pseudoreplication). The test assumptions were validated using graphical analysis of the residuals. No significant differences in mean performance were found (; ). This test not only indicates the lack of statistically significant differences in mean performance, but also suggests (given the width of the resulting confidence interval) that this experiment had adequate statistical power to detect any differences greater than about in mean IGD [12].
Figure 1 illustrates the estimated mean (with standard error bars) IGD values obtained by the configurations returned by each methodology on each instance. As expected, the performance of configurations returned by both methods tends to degrade as the problem dimension increases, but both follow similar patterns in all cases, which is reflected in the absence of statistically significant differences.
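The paired comparison described above, with instance as the pairing variable and repeated runs averaged first, can be sketched as follows. The function names and data layout are hypothetical. The sketch returns only the t statistic, the degrees of freedom and the mean difference; a p-value or confidence interval would additionally need Student's t quantiles, which are not in the Python standard library.

```python
import math
import statistics
from collections import defaultdict


def mean_by_instance(observations):
    """Average repeated runs so that each instance contributes a single value."""
    grouped = defaultdict(list)
    for instance, value in observations:
        grouped[instance].append(value)
    return {inst: statistics.fmean(vals) for inst, vals in grouped.items()}


def paired_t_statistic(obs_a, obs_b):
    """Return (t, degrees of freedom, mean difference), pairing by instance."""
    a, b = mean_by_instance(obs_a), mean_by_instance(obs_b)
    instances = sorted(set(a) & set(b))
    diffs = [a[i] - b[i] for i in instances]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # needs n >= 2 and non-constant diffs
    return mean_diff / std_err, n - 1, mean_diff
```

Aggregating before testing matters because the individual runs on a given instance are not independent observations of the method-level difference; treating them as such would inflate the apparent sample size.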
Figure 2 shows the variability of the parameter values (with the tuning range normalized to the interval ) of the best configurations returned by the tuning methods. Parameters , , and show a large variability within the search range defined in the tuning effort, which suggests a relative insensitivity of the MOEA/D performance to the values selected for these parameters, at least in terms of main (marginal) effects. Parameter shows a bias towards the upper half of the search range (i.e., values greater than ), while the polynomial mutation parameter shows a very strong tendency to assume values very close to the lower bound of the tuning range (i.e., values close to ), which suggests a strong effect of this parameter on the IGD performance of the MOEA/D for the problem class used in this example. It is interesting to observe that both the proposed method and Iterated Racing exhibit the same general trends in terms of the variability in the parameter values returned.
While no statistically significant differences have been found between the proposed method and Iterated Racing, one advantage of the approach presented in this work is the explicit statistical modeling of the expected performance of the algorithm as a function of its parameter values, which allows us to investigate the conjectures raised by the analysis of Figure 2. Arbitrarily selecting the results of the first replicate of the proposed method as a representative example (for reference, the configuration returned by this particular execution was , , , , , and ), the lasso regression (which automatically shrinks coefficients with very small effects towards zero) returned a model of the expected IGD performance of the MOEA/D on the problem class represented by the tuning instances as:
(5) 
where represents the expected IGD of the algorithm on the problem class represented by the tuning instances. This result not only highlights the importance of the parameters identified in Figure 2 as possibly relevant, i.e., and , quantifying the effects of increasing or decreasing these parameters on the expected IGD performance, but also suggests two others ( and ) which, while not critical in terms of their main effects, contribute to the performance of the algorithm through interactions with the two most relevant parameters.
Finally, since the lasso regression [42, 46] tends to shrink coefficients towards zero based on the explanatory value of their respective terms (which is conditional on the sample used for fitting the model), it can be interesting to investigate which terms were generally preserved across repeated runs of the method. Figure 3 illustrates the model terms that appeared more frequently in the models returned by the proposed tuning method, as well as their usual (nonzero) values.
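The kind of summary described here, how often each model term survives the lasso shrinkage across repeated runs and what its typical nonzero value is, can be computed as in the sketch below. The function name, the tolerance and the term labels (`x1`, `x1:x2`) are hypothetical placeholders and do not correspond to the actual MOEA/D parameters or fitted coefficients.

```python
import statistics
from collections import defaultdict


def term_summary(models, tol=1e-12):
    """Map each term to (fraction of models where it is nonzero, median nonzero value)."""
    nonzero = defaultdict(list)
    for coefs in models:
        for term, value in coefs.items():
            if abs(value) > tol:  # treat shrunken-to-zero coefficients as absent
                nonzero[term].append(value)
    return {
        term: (len(values) / len(models), statistics.median(values))
        for term, values in nonzero.items()
    }


# Hypothetical coefficient sets from three repeated tuning runs.
models = [
    {"x1": -0.5, "x2": 0.0},
    {"x1": -0.3, "x1:x2": 0.1},
    {"x1": -0.4, "x2": 0.0},
]
summary = term_summary(models)
```

Terms with a high retention fraction and a consistent sign are the ones whose effect is least dependent on the particular sample of instances used to fit each model.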
From this figure it is clear that the main effects of and were generally detected by the modeling approach as being particularly relevant, as well as and a few other interaction terms. These results point to some interesting insights regarding the influence of these parameters on the performance of the MOEA/D. For instance, larger values of (which are associated with the selection bias of the MOEA/D towards greater diversity in the space of objectives) tend to predict a significant improvement (i.e., reduction) in the expected IGD. Further parametric analyses can also be performed, but this detailed exploration falls outside the scope of the current paper.
6 Conclusions
In this work we present a new parameter tuning framework based on concepts from Sequential Model Based Optimization (SMBO) methods. The proposed framework is centered on the sequential optimization of perturbed regression models of expected algorithm performance conditional on parameter values, and on the sequential evaluation of new problem instances on the most promising candidate configurations.
The proposed framework was instantiated into a method for tuning numeric parameters, and a case study was presented using the Iterated Racing approach as a comparison baseline. The results suggest that, in terms of raw performance, the proposed framework was able to return configurations that matched the performance of Iterated Racing, which is generally considered a particularly efficient tuning approach. Moreover, the proposed method is also designed to provide insights on the relevance of the parameters being tuned, as well as predictive models of expected performance conditional on the parameter values, which can inform further development of the algorithm or the definition of standard values for practical implementations, or suggest
While the results presented in this work suggest that the proposed approach may be an interesting alternative for parameter tuning, and possibly for the investigation of design aspects in algorithm research, further testing and development are needed to effectively establish its power and limitations. For instance, while both the lasso regression and the Nelder-Mead Simplex algorithm used in the tuning method presented in this work can be effectively used in high-dimensional situations, the limitations of the proposed framework in terms of the number of parameters that can be tuned remain to be investigated. Moreover, the adaptation of the principles described here to the tuning of categorical or hierarchical parameters has not yet been investigated, and represents a natural continuation of this work.
References
 [1] Balaprakash, P., Birattari, M., Stützle, T.: Improvement strategies for the F-Race algorithm: Sampling design and iterative refinement. In: International Workshop on Hybrid Metaheuristics, pp. 108–122. Springer (2007)
 [2] Bartz-Beielstein, T.: Sequential parameter optimization. In: Dagstuhl Seminar Proceedings 09181. Sampling-based Optimization in the Presence of Uncertainty (2009)
 [3] Bartz-Beielstein, T., Lasarczyk, C., Zaefferer, M.: Sequential parameter optimization. In: Proceedings Congress on Evolutionary Computation 2005 (CEC’05). Edinburgh, Scotland (2005). URL http://www.spotseven.de/wpcontent/papercitedata/pdf/blp05.pdf
 [4] Bartz-Beielstein, T., Lasarczyk, C.W.G., Preuss, M.: Sequential parameter optimization. Evolutionary Computation 1, 773–780 (2005). DOI 10.1109/CEC.2005.1554761
 [5] Birattari, M.: On the estimation of the expected performance of a metaheuristic on a class of instances. How many instances, how many runs? Tech. Rep. TR/IRIDIA/2004001, IRIDIA, Université Libre de Bruxelles, Belgium (2004)
 [6] Birattari, M.: Tuning Metaheuristics: A Machine Learning Perspective, 1st ed. Springer-Verlag Berlin Heidelberg (2005). DOI 10.1007/978-3-642-00483-4
 [7] Birattari, M., Stützle, T., Paquete, L., Varrentrapp, K.: A racing algorithm for configuring metaheuristics. In: W.B. Langdon et al. (eds.) Proc. Genetic and Evolutionary Computation Conference, GECCO 2002, pp. 11–18. Morgan Kaufmann Publishers, San Francisco, CA (2002)
 [8] Birattari, M., Yuan, Z., Balaprakash, P., Stützle, T.: F-Race and Iterated F-Race: An overview. In: T. Bartz-Beielstein, M. Chiarandini, L. Paquete, M. Preuss (eds.) Experimental Methods for the Analysis of Optimization Algorithms, pp. 311–336. Springer (2010)
 [9] Bossek, J.: smoof: Single- and multi-objective optimization test functions. The R Journal (2017). URL https://journal.r-project.org/archive/2017/RJ-2017-004/index.html
 [10] Campelo, F., Aranha, C.: MOEADr: Component-Wise MOEA/D Implementation (2018). URL https://cran.R-project.org/package=MOEADr. R package version 1.1.0
 [11] Campelo, F., Batista, L.S., Aranha, C.: The MOEADr Package: A Component-Based Framework for Multiobjective Evolutionary Algorithms Based on Decomposition. ArXiv e-prints (2018). URL https://arxiv.org/abs/1807.06731 (Accepted for publication, Journal of Statistical Software).
 [12] Campelo, F., Takahashi, F.: Sample size estimation for power and accuracy in the experimental comparison of algorithms. ArXiv e-prints (2018). URL https://arxiv.org/abs/1808.02997 (Under review, Journal of Heuristics).
 [13] Deb, K., Agrawal, S.: A niched-penalty approach for constraint handling in genetic algorithms. In: Artificial Neural Nets and Genetic Algorithms, pp. 235–243. Springer-Verlag Science + Business Media (1999)
 [14] Eiben, A., Smith, J.: Introduction to Evolutionary Computing. Springer (2003)
 [15] Eiben, A.E., Smit, S.K.: Parameter tuning for configuring and analyzing evolutionary algorithms. Swarm and Evolutionary Computation 1(1), 19–31 (2011)
 [16] Gendreau, M., Potvin, J.Y. (eds.): Handbook of Metaheuristics. Springer (2010)
 [17] Hoos, H.H.: Automated algorithm configuration and parameter tuning. Springer (2012)
 [18] Hurlbert, S.H.: Pseudoreplication and the design of ecological field experiments. Ecological Monographs 54(2), 187–211 (1984)
 [19] Hutter, F., Hoos, H., Leyton-Brown, K.: An evaluation of sequential model-based optimization for expensive black-box functions. In: Proc. Genetic and Evolutionary Computation Conference (2013)
 [20] Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Proc. 5th International Conference on Learning and Intelligent Optimization, LION’05, pp. 507–523. Springer-Verlag, Berlin, Heidelberg (2011)
 [21] Hutter, F., Hoos, H.H., Leyton-Brown, K., Stützle, T.: ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36, 267–306 (2009)
 [22] Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13(1), 455–492 (1998)
 [23] Koenker, R.: Quantile regression. In: S. Fienberg, J. Kadane (eds.) International Encyclopedia of the Social Sciences. Gale (2000)
 [24] Koenker, R., Hallock, K.F.: Quantile regression. Journal of Economic Perspectives 15(4), 143–156 (2001)
 [25] Kuipers, L., Niederreiter, H.: Uniform distribution of sequences. Dover Publications (2005)
 [26] Lazic, S.E.: The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience 5(11) (2010)
 [27] López-Ibáñez, M., Dubois-Lacoste, J., Cáceres, L.P., Birattari, M., Stützle, T.: The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives 3, 43–58 (2016)
 [28] Maron, O., Moore, A.W.: Hoeffding races: Accelerating model selection search for classification and function approximation. In: Advances in neural information processing systems, pp. 59–66 (1994)
 [29] McKay, M.D., Beckman, R.J., Conover, W.J.: A comparison of three methods for selecting value of input variables in the analysis of output from a computer code. Technometrics 21(2), 239–245 (1979)
 [30] Montgomery, D.C.: Design and Analysis of Experiments, 5th ed. John Wiley & Sons, New York, NY (2012)
 [31] Moore, A., Lee, M.S.: Efficient algorithms for minimizing cross validation error. In: Proc. 11th International Conference on Machine Learning. Morgan Kaufmann (1994)
 [32] Nannen, V., Eiben, A.E.: A method for parameter calibration and relevance estimation in evolutionary algorithms. In: GECCO’06, Genetic and Evolutionary Computation Conference (2006)
 [33] Nannen, V., Eiben, A.E.: Relevance estimation and value calibration of evolutionary algorithm parameters. International Joint Conference on Artificial Intelligence (2007)
 [34] Nash, J.C., Varadhan, R., Brothedieck, G.: General-Purpose Optimization (2017). URL http://stat.ethz.ch/R-manual/R-devel/library/stats/html/optim.html
 [35] Nelder, J.A., Mead, R.: A simplex method for function minimization. The Computer Journal 7(4), 308–313 (1965)
 [36] Ning, J.H., Fang, K.T., Zhou, Y.D.: Uniform design for experiments with mixtures. Communications in Statistics - Theory and Methods 40(10), 1734–1742 (2011)
 [37] Owen, A.B.: A robust hybrid of lasso and ridge regression. Tech. rep., Stanford University (2006)
 [38] Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer experiments. Statistical Science 4(4), 409–423 (1989)
 [39] Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Experiments. Springer New York (2003)
 [40] Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures, 5th ed. Chapman & Hall/CRC (2011)
 [41] Strutz, T.: Data fitting and uncertainty: A practical introduction to weighted least squares and beyond. Vieweg and Teubner (2010)
 [42] Tibshirani, R.: Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological) 58(1), 267–288 (1996)
 [43] Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann (2016)
 [44] Wyss, G.D., Jorgensen, K.H.: A User’s Guide to LHS: Sandia’s Latin Hypercube Sampling Software. Risk Assessment and Systems Modeling Department, Sandia National Laboratories (1998)
 [45] Ye, K.Q.: Orthogonal column latin hypercubes and their application in computer experiments. Journal of the American Statistical Association 93(444), 1430–1439 (1998)
 [46] Yi, C.: hqreg: Regularization Paths for Lasso or Elastic-Net Penalized Huber Loss Regression and Quantile Regression (2017). URL https://CRAN.R-project.org/package=hqreg. R package version 1.4
 [47] Zhang, Q., Li, H.: MOEA/D: A multiobjective evolutionary algorithm based on decomposition. IEEE Trans. Evolutionary Computation 11(6), 712–731 (2007)
 [48] Zhang, Q., Zhou, A., Zhao, S., Suganthan, P., Liu, W., Tiwari, S.: Multiobjective optimization test instances for the CEC 2009 special session and competition. Tech. Rep. CES887, University of Essex, UK (2008). URL http://dces.essex.ac.uk/staff/zhang/moeacompetition09.htm. (revised on 20/04/2009)
 [49] Zitzler, E., Thiele, L., Deb, K.: Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation 8(2), 173–195 (2000)