The performance of many algorithms is highly dependent on parameter configurations tuned according to the user's preferences or performance criteria (e.g. the quality of the solution obtained within a given CPU budget, the smallest CPU cost to reach a given solution quality, or the probability of reaching a given quality within given thresholds). This configuration task can be considered as a second-layer optimisation problem relevant to the fields of optimisation, machine learning and AI in general. It is a field of study that becomes increasingly critical as the application of such methods grows more widespread. Over the years, a range of automatic parameter tuners have been proposed, leaving the configuration to a computer rather than manually searching for performance-optimised settings across a set of problem instances. These tuning environments can save time and achieve better results.
Among such automated algorithm configuration (AAC) tools, we cite GGA, ParamILS, SPOT and irace. These methods have been successfully applied to (pre-tuned) state-of-the-art solvers in various problem domains, such as mixed integer programming, AI planning, machine learning, or propositional satisfiability solving. Figure 1 illustrates the abstract standard architecture adopted by these tools.
However, the outcomes of these tools are static (or feature-independent), meaning that an algorithm configuration derived by any of these tools does not change depending on the instance of the target optimisation problem at hand. This is a significant issue, as theoretical and empirical studies on various algorithms and problems have shown that the parameters of an algorithm are highly dependent on the features of the specific instance of the target problem, such as the problem size [7, 33].
A possible solution to this issue is to cluster problem instances into multiple sub-groups by their size (and other potential features), and then use curve fitting to map features to parameters. A similar approach first partitions problem instances based on the values of their landscape features and then selects an appropriate configuration for a new problem instance based on its closeness to the partitions. However, the former approach does not scale well to multiple features and parameters, and the latter faces over-fitting issues due to the nature of the partitioning approach, which makes it difficult to assign an unseen instance to a specific group.
There are also model-based tools that consider instance features to define parameter values, applying machine learning techniques to build the model. However, an issue with these approaches is the low explainability of the outcome. For instance, while machine learning techniques such as random forests and neural networks can map the parameters to problem features with high accuracy, they are considered black boxes, i.e., the outcome is virtually impossible to understand or interpret. Explainability is an important concept: not only does it allow us to understand the relationships between input and output, but in the context of parameter tuning it can also provide an outcome that can be used to inspire fundamental research [16, 17].
To tackle these issues, we propose an offline algorithm tuning approach that extracts relationships between problem features and algorithm parameters using a genetic programming algorithm framework. We will refer to this approach as MATE, which stands for Model-based Algorithm Tuning Engine. The main contributions in this work are as follows:
We formulate the model-based parameter tuning problem as a symbolic regression problem, where knowledge about the problem is taken into account in the form of problem features;
We implement an efficient Genetic Programming (GP) algorithm that configures parameters in terms of problem features; and
In our empirical investigation, we rediscover asymptotically-correct theoretical results for two algorithms (1+1-EA and RLS) and four problems (OneMax, LeadingOnes, BinValue and Jump). In these experiments, MATE shows its potential in algorithm parameter configuration to produce models based on instance features.
Several methods have tried to tackle the dependence between problem features and algorithm parameters. Per Instance Algorithm Configuration (PIAC), for example, can learn a mapping between features and the best parameter configuration by building an Empirical Performance Model (EPM) that predicts the performance of the algorithm from sample (instance, algorithm configuration) pairs. The PIAC methodology has been applied to several combinatorial problems [19, 34, 24] and continuous domains.
Sequential Model-based Algorithm Configuration (SMAC) is also an automated algorithm configuration tool; it uses a model, usually a random forest, to capture the relationship between a performance metric (the algorithm runtime) and parameter values. In addition, SMAC can include problem features within the tuning process as a subset of the input variables. Unlike PIAC, instance features are only considered for training instances. SMAC presents some drawbacks, as it is restricted to runtime as the performance metric. In addition, the method to extract instance features is not implemented for all NP-hard problems.
Table 1 presents a summary of some state-of-the-art methods, including the approach proposed in this paper. The term feature-independent means that the corresponding approach does not consider instance features. Model-based approaches use a trained model (e.g. machine learning, regression, etc.) to derive parameter configurations. Model-free approaches generally rely on an experimental design methodology or an optimisation method to find parameter settings of an algorithm that optimise a cost metric on a given instance set.
| Tool | Method | Characteristics |
|---|---|---|
| GGA | Genetic Algorithm | Feature-independent, model-free |
| ParamILS | Iterated Local Search | Feature-independent, model-free |
| irace | Racing procedure | Feature-independent, model-free |
| SPOT | Classical regression, tree-based, random forest and Gaussian process | Feature-independent, model-based |
| PIAC | Regression methods | Feature-dependent, model-based |
| SMAC | Random Forest | Feature-dependent, model-based |
| MATE | Genetic Programming | Feature-dependent, model-based, explainable |
The main differences between MATE and the other related approaches are:
The framework is designed to cover any algorithm, suitable for both continuous and combinatorial optimisation;
A transparent machine learning method (GP) is utilised to ensure that the outcome is explainable and easily reusable, deducing human-readable configurations (in contrast to, e.g., random forests, neural networks, etc.); and
The training phase is done on one specific algorithm and one specific problem in our approach: the model is less instance-focused and more problem-domain focused (by abstracting via the use of features). For example, the AAC experiments behind [16, 17] have guided the creation of new heavy-tailed mutation operators that outperformed the state of the art. Similarly, the AAC and PIAC experiments in  showed model dependencies on easily-deducible instance features.
Lastly, the present paper is closely aligned with the recently founded research field "Data Mining Algorithms Using/Used-by Optimisers (DUO)". There, data miners can generate models explored by optimisers, and optimisers can adjust the control parameters of a data miner.
3 The MATE Framework
3.1 Problem Formulation and Notation
Let us denote an optimisation problem by $P$, whose instances are characterised by the problem-specific features $f_1, \dots, f_m$. A target algorithm $A$ with its parameters $\theta_1, \dots, \theta_k$ is given to address the problem $P$. A set $\{I_1, \dots, I_n\}$ of instances of the problem $P$ and a matrix $F$, whose element $F_{i,j}$ represents the $j$th feature value of the $i$th problem instance, are given.
Under this setting, we define the model-based parameter tuning problem as the problem of deriving a list of mappings $G = (g_1, \dots, g_k)$, where each mapping $g_j$, which we will refer to as a parameter expression, returns a value for the $j$th parameter of $A$ given the feature values of an instance of the problem $P$. Specifically, the objective of the problem is to find a parameter expression set $G$ such that the performance of the algorithm $A$ across all the problem instances is optimised.
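To make the formulation concrete, a parameter expression set can be pictured as a small set of functions from feature values to parameter values. A minimal Python sketch (all names, and the 1/n rule used as an example, are illustrative, not outputs of MATE):

```python
# Hypothetical parameter expressions: each maps an instance's
# feature values to a value for one algorithm parameter.
def mutation_rate(features):
    # e.g. the classical 1/n rule, with n the problem size
    return 1.0 / features["n"]

def mutation_strength(features):
    # a constant expression is also a valid mapping
    return 1

# The tuning problem searches over such expression sets so that the
# target algorithm, configured per instance, performs well on average.
expression_set = {"p": mutation_rate, "k": mutation_strength}

instance_features = {"n": 100}
config = {name: g(instance_features) for name, g in expression_set.items()}
# config == {"p": 0.01, "k": 1}
```

The point of this representation is that each expression is a human-readable formula over features, rather than a fixed constant.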
3.2 Architecture Overview
In this section, we introduce our approach for parameter tuning based on problem features. Figure 2 illustrates the architecture of the MATE tuning engine. In contrast to static methods, we consider the features of the problem. These features are used in the training phase in addition to the instances, the target algorithm and the parameter specifications. Once the training is finished, the model can be used on unseen instances to set the parameters of the algorithm in terms of the problem feature values of the instance.
For example, a desired outcome of applying the MATE framework can be:
Mutation probability of an evolutionary algorithm in terms of the problem size;
Perturbation strength in an iterated local search algorithm in terms of the ruggedness of the instance and the problem size; and
Population size of an evolutionary algorithm in terms of the problem size.
Note that all the examples include the problem size as a problem feature. In both theory and practice, the problem size is among the most important problem features, and it is usually known prior to the optimisation, without any need for a pre-processing step. More importantly, an extensive number of theoretical studies showed that the optimal choice of parameters is usually expressed in terms of the problem size (see, e.g. [7, 11, 33]).
3.3 The Tuning Algorithm
We use a tree-based Genetic Programming system as the tuning algorithm. It starts with a random population of trees, where each tree represents a potential parameter expression. The population is then evolved to optimise the GP score defined in the following subsection. Without loss of generality, we assume that the target problem is always a maximisation problem.
3.3.1 The Score Function and Bias Reduction
The score function is expressed as the weighted sum of the objective values obtained on each instance in the training set. Using the notation previously introduced, the score of a parameter expression set $G$ is defined in Equation (1):

$$S(G) \;=\; \sum_{i=1}^{n} w_i \, q(A, G, I_i), \qquad w_i = \frac{1}{b_i}, \tag{1}$$

where $S$ is the GP score function, $q(A, G, I_i)$ is a function measuring the goodness of applying the algorithm $A$ with the parameter values produced by $G$ to instance $I_i$, and $b_i$ is the best known objective value for instance $I_i$.
The weights are used as a form of normalisation to reduce the bias some instances might induce. A natural choice would be to normalise by the optimal value or a tight upper bound. However, since we assume that such values are unknown (the problem itself can be unknown), we use the best known objective value as a reference instead. To ensure that the score is always well contained, the reference values are constantly updated whenever possible during the tuning process.
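One plausible reading of this normalised score, as a self-contained sketch (the instance and solver representations are illustrative, not MATE's actual interfaces):

```python
def gp_score(expr, instances, best_known, run_algorithm):
    """Score a candidate parameter expression: weighted sum of per-instance
    objective values, with weights 1 / best-known objective value."""
    total = 0.0
    for iid, features in instances.items():
        quality = run_algorithm(expr(features), features)
        # update the reference value whenever a new best is observed
        best_known[iid] = max(best_known.get(iid, quality), quality)
        total += quality / best_known[iid]
    return total / len(instances)

# Toy demo: a stub solver that always reaches the reference value,
# so every instance contributes exactly 1 and the score is 1.0.
instances = {"i1": {"n": 10}, "i2": {"n": 1000}}
best_known = {"i1": 10.0, "i2": 1000.0}
score = gp_score(lambda f: 1.0 / f["n"], instances, best_known,
                 lambda p, f: f["n"])
# score == 1.0
```

Normalising by the best known value keeps large-valued instances (here "i2") from dominating the sum.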
3.3.2 Replacement Strategy – Statistical Significance and Bloat Control
As the target algorithm can be stochastic, it is mandatory to perform multiple runs to ensure statistical significance (refer to Table 3). Thus, the replacement of trees is done based on the Wilcoxon rank-sum test.
Another aspect to take into account during the replacement process is bloat control. In our implementation, we use a simple bloat minimisation method based on the size of the tree (number of nodes).
Given a newly generated tree $T_{new}$, we compare it against each tree $T$ in the current population, starting from the ones with the lowest scores, using the following rules:
If $T_{new}$ is deemed to be significantly better than $T$ (using the Wilcoxon test), then we replace $T$ with $T_{new}$, irrespective of the tree sizes.
If there is no statistical significance between $T_{new}$ and $T$, but $T_{new}$ has a smaller size than $T$, then we replace $T$ with $T_{new}$.
Otherwise, we do not perform the replacement.
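A minimal sketch of this replacement loop (the tree representation, the normal-approximation rank-sum test without tie correction, and the 1.96 threshold are illustrative simplifications, not MATE's actual implementation):

```python
import math

def ranksum_z(x, y):
    """z-statistic of the Wilcoxon rank-sum test comparing samples x and y
    (normal approximation, no tie correction)."""
    combined = sorted((v, i) for i, v in enumerate(x + y))
    ranks = [0.0] * len(combined)
    for rank, (_, idx) in enumerate(combined, start=1):
        ranks[idx] = rank
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])                          # rank sum of sample x
    mu = n1 * (n1 + n2 + 1) / 2.0                 # mean under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (r1 - mu) / sigma

def try_replace(new_tree, population, z_crit=1.96):
    """Apply the two replacement rules: replace a tree that new_tree
    significantly beats, or an equally performing but larger tree."""
    for old in sorted(population, key=lambda t: sum(t["scores"])):
        z = ranksum_z(new_tree["scores"], old["scores"])
        if z > z_crit:                            # significantly better
            population[population.index(old)] = new_tree
            return True
        if abs(z) <= z_crit and new_tree["size"] < old["size"]:
            population[population.index(old)] = new_tree   # bloat control
            return True
    return False
```

Because candidates are compared against the lowest-scoring trees first, a significantly better tree displaces the weakest member it can beat, while the size rule steadily shrinks trees that bring no measurable improvement.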
4 Computational Study
4.0.1 Experimental Setting
To evaluate our framework, we consider two target algorithms, the (1+1) EA($p$) and RLS($k$). The (1+1) EA($p$) is a simple hill-climber which uses standard bit mutation with mutation rate $p$. RLS($k$) differs from the (1+1) EA($p$) only in that it uses a mutation operator that always flips exactly $k$ uniformly chosen, pairwise different bits. That is, the mutation strength is deterministic in RLS, whereas it is binomially distributed, Bin($n$, $p$), in the case of the (1+1) EA($p$), where $n$ is the number of bits.
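The two mutation schemes can be sketched as follows (a minimal Python illustration with OneMax as the example objective; function names and the greedy acceptance rule are standard for these algorithms):

```python
import random

def onemax(x):
    # OneMax: count of one-bits
    return sum(x)

def one_plus_one_ea(n, p, budget, f=onemax, rng=random):
    """(1+1) EA(p): flip each bit independently with probability p."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(budget):
        y = [1 - b if rng.random() < p else b for b in x]
        if f(y) >= f(x):          # accept if not worse
            x = y
    return x

def rls(n, k, budget, f=onemax, rng=random):
    """RLS(k): flip exactly k uniformly chosen, pairwise different bits."""
    x = [rng.randint(0, 1) for _ in range(n)]
    for _ in range(budget):
        y = list(x)
        for i in rng.sample(range(n), k):   # k distinct positions
            y[i] = 1 - y[i]
        if f(y) >= f(x):
            x = y
    return x
```

For example, `rls(20, 1, 5000, rng=random.Random(0))` comfortably reaches the OneMax optimum within the budget, since RLS(1) needs only on the order of n log n steps in expectation.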
Applying the MATE framework, the two algorithms are configured for four different problems with different time budgets, as summarised in Table 2. The table also presents the features of the problems used to tune the algorithm parameters and the different feature values chosen to generate problem instances. These problems have been chosen because they are among the best-studied benchmark problems in the theory of evolutionary algorithms. The details of our GP implementation for the experiments are presented in Table 3.
| Problem | Instance features |
|---|---|
| OneMax | number of bits |
| BinValue | number of bits |
| LeadingOnes | number of bits |
| Jump | width of region with bad fitness values; number of bits |
| GP setting | Value |
|---|---|
| Number of GP generations | |
| Initialisation | grow and full methods |
| Mutation operator | random mutations |
| Crossover operator | sub-tree gluing |
| Number of independent runs of target algorithm | |
| p-value for the Wilcoxon rank-sum test | |
It is worth noting that we are focusing in this paper on tuning algorithms with a single parameter. This is done to deliver a first prototype that is validated on algorithms and problems extensively studied by the EA theory community. An extension to tuning several algorithm parameters forms an important direction for future work.
For example, given a budget of , it is known that the (1+1) EA optimises the OneMax function, as well as any other linear function, with decent probability. It is also known that this is asymptotically optimal. Note, though, that such fixed-budget results are still very sparse, since the EA theory community largely focuses on expected optimisation times. Since these can nevertheless give some insight into optimal parameter settings, we note the following:
OneMax and BinValue: the (1+1)EA optimises every linear function in expected time , and no parameter configuration has smaller expected running time, apart from possible lower order terms . For RLS, it is not difficult to see that yields an expected optimisation time of , and that this is the optimal (static) mutation strength;
LeadingOnes: on average, RLS(1) needs  steps to optimise LeadingOnes. This choice also minimises the expected optimisation time. For the (1+1) EA,  minimises the expected optimisation time, which is around  for this setting. The standard mutation rate requires  evaluations, on average, to locate the optimum of the LeadingOnes function. For LeadingOnes, it is known that the optimal parameter setting drastically depends on the available budget. This can be inferred from the proofs in [7, 13]; and
Jump: mutation rate minimises the expected optimisation time of the (1+1) EA on Jump, which is nevertheless .
4.0.2 Training Phase
The experimental study is conducted by running MATE ten times on each algorithm, problem and budget combination (refer to Table 4 for the list of budgets). This results in an elite population of individuals for each setting, from which we select the best expressions. These results are then merged and, based on the frequency of each expression, the best expressions are selected.
In the current implementation, expression types (integers and non-integers) are not taken into account during the evolution. Therefore, the resulting expressions are converted into integers in the case of RLS by merging all real numbers using (e.g. will be replaced by ). On the other hand, expressions are simplified for EA by eliminating additive constants (e.g. is replaced by ).
4.0.3 Evaluation Phase I
To assess the performance of MATE, we evaluate, for each (problem, budget) combination, each of the top most frequent expressions by running them  independent times on each training dimension. We then normalise the outputs as in Equation (1). The results are shown in the box plots in Table 4.
The x-axis shows the best found expressions with their frequencies in square brackets, and the y-axis represents the normalised fitness.
Comparison amongst the top 3 configurations. When comparing the medians of the top 3 ranked configurations, we observe the following from Table 4.
OneMax: For (1+1) EA, , which ranked second for budgets and and first for budget performs better than ; while for RLS, the expression appears at least on , providing the best results;
BinValue:  represents  on  for (1+1) EA experiments, with a similar performance to  and ; while in the  case the expression  provides better results than  and ; in the same way, the expression  corresponds to  of the cases on RLS with the budget of , with a better performance than  and ;
LeadingOnes: is the most frequent expression among all considered budgets on (1+1) EA and presents the best performance amongst the top 3 expressions for all budget cases; represents on RLS cases with iterations and performs better than and for both considered budgets.
Jump: and present similar results for both budget cases; appears on and of the cases on (1+1) EA on the considered budgets respectively, and performs worse than the other two configurations; for RLS experiments is the most frequent expression and performs better than and .
Comparison of top 3 configurations against other parameter settings. For a fair assessment of our results, we add to this comparison some expressions that were not ranked in the top 3. These are with for (1+1) EA for OneMax and LeadingOnes. For readability purposes, the top expressions are complemented with of these additional expressions in the same order they are shown. We can observe that these additional expressions present low frequencies, being the highest case with with the budget , while expressions and are the lowest cases among the considered budgets. Note that the frequencies do not necessarily sum up to as other expressions not reported here might occur.
Comparison with theoretical results. As mentioned at the beginning of this section, one should be careful when comparing against theoretical results that have been derived either in terms of running time or in terms of asymptotic convergence analysis, as typically done in runtime analysis. It is well known that optimal parameter settings for concrete (typically, comparatively small) dimensions can differ from the asymptotically optimal ones [8, 9]. We nevertheless see that those configurations that minimise the expected running times (again, in the classical, asymptotic sense) also show up in the top 3 ranked configurations. We highlight the asymptotically optimal best possible running time by an asterisk * in Table 4. Budgets exceeding this bound are marked by two asterisks **. As for the individual problems, we note the following:
OneMax: It is interesting to note here that the performance is not monotonic in the mutation strength, i.e., some settings perform worse than both smaller and larger ones. This is caused by a phenomenon described in [10, Section 4.3.1], which states that, regardless of the starting point, the expected progress is always maximised by an odd mutation strength. MATE correctly identifies this and suggests odd mutation strengths in almost all cases.
BinValue: We observe that it is very difficult here to distinguish the performance of the different configurations. This is in the nature of BinValue, as setting the first bit correctly already ensures 50% of the optimal fitness value. We see this effect very drastically in the recommendation to use the mutation strength equal to the number of bits for the RLS cases. With this configuration, the algorithm evaluates only two points, the random initial point and its bitwise complement, regardless of the budget. As can be seen in Table 4, this simple strategy is quite efficient and hard to beat.
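This degenerate behaviour is easy to check empirically: flipping all pairwise different bit positions maps a point to its bitwise complement, so only two distinct points are ever evaluated, however large the budget. A small self-contained sketch (names are illustrative):

```python
import random

def binvalue(x):
    # BinValue: interpret the bit string as a binary number
    return int("".join(map(str, x)), 2)

def rls_visited(n, k, budget, rng):
    """Run RLS(k) on BinValue and record every distinct point evaluated."""
    x = tuple(rng.randint(0, 1) for _ in range(n))
    visited = {x}
    for _ in range(budget):
        y = list(x)
        for i in rng.sample(range(n), k):   # k distinct positions
            y[i] = 1 - y[i]
        y = tuple(y)
        visited.add(y)
        if binvalue(y) >= binvalue(x):
            x = y
    return visited

# With k = n, every mutation produces the bitwise complement, so the
# search oscillates between (at most) two points regardless of the budget.
points = rls_visited(16, 16, 500, random.Random(1))
# len(points) == 2
```

With any smaller k the set of visited points would typically grow with the budget; k = n collapses the search to the initial point and its complement.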
LeadingOnes: As mentioned earlier, for the (1+1) EA, the optimal mutation rate in terms of minimising the expected running time is around . We see that , which did not show up in the top 3 ranked configurations, performs better than any of the suggestions made by MATE.
Jump: as discussed, mutation rate minimises the expected optimisation time. MATE recognises it as a good configuration in some of the runs. However, we see that , which equals for 5 out of our 12 training sets, shows comparable performance, and in the budget case even slightly better performance.
4.0.4 Evaluation Phase II
To properly assess the performance of MATE, we conducted experiments on OneMax and LeadingOnes instances of larger sizes that were not considered in the training phase. The goal of this experiment is to empirically demonstrate that our approach generalises well to large and unseen instances. These results are presented in Table 5, where runs were performed for OneMax with  and LeadingOnes with . We can observe the following:
There is less overlap amongst the confidence intervals, especially for smaller budgets, which means there is a higher level of separability amongst the performances of the different expressions.
By comparing these results with the ones from Table 4, we can observe that the results of the top 3 expressions on large instances are statistically better in the majority of cases.
OneMax: For (1+1) EA, in contrast to the results in Table 4 where and show a similar performance, here performs better than the other expressions. For RLS, the best performing expression is , which was ranked first.
LeadingOnes: For (1+1) EA the best expressions are , which was ranked second, and , which was not ranked among the top 3 expressions. For RLS, , ranked first and second, is the best performing expression.
5 Conclusions and Future Directions
With this article, we have presented MATE as a model-based algorithm tuning engine, which produces human-readable models that map feature values of problem instances to algorithm parameters. Our experiments showed that our first prototype implementation can find asymptotic relationships between the feature values and algorithm parameters, known from EA theory.
In the future, we intend to explore, among others, the following two avenues. First, the design of MATE itself will be subject to extensions, e.g. to better handle performance differences between instances via ranks or racing. Second, while our proof-of-concept study here was motivated by theoretical insights, we will investigate more realistic problems for which instance features are readily available, such as NK-landscapes, the travelling salesperson problem, or the knapsack problem.
References

- (2018) Better software analytics via "DUO": data mining algorithms using/used-by optimizers.
- (2009) A gender-based genetic algorithm for the automatic configuration of algorithms. In International Conference on Principles and Practice of Constraint Programming, pp. 142–157.
- (2020) High-performance simulation-based optimization. Springer.
- (2010) SPOT: a toolbox for interactive and automatic tuning in the R environment. In Proceedings, Vol. 20, pp. 264–273.
- (2016) Feature-based algorithm configuration: a case study with differential evolution. In Proc. of Parallel Problem Solving from Nature (PPSN'16), pp. 156–166.
- (2017) Per instance algorithm configuration of CMA-ES with limited budget. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'17), pp. 681–688.
- (2010) Optimal fixed and adaptive mutation rates for the LeadingOnes problem. In Proc. of Parallel Problem Solving from Nature (PPSN'10), LNCS, Vol. 6238, pp. 1–10.
- (2019) Maximizing drift is not optimal for solving OneMax. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'19), pp. 425–426. Full version available at http://arxiv.org/abs/1904.07818.
- Fitness probability distribution of bit-flip mutation. Evolutionary Computation 23 (2), pp. 217–248.
- (2020) Optimal parameter choices via precise black-box analysis. Theor. Comput. Sci. 801, pp. 1–34.
- (2017) Fast genetic algorithms. In Proc. of Genetic and Evolutionary Computation Conference (GECCO'17), pp. 777–784.
- (2020) Theory of evolutionary computation: recent developments in discrete optimization. Springer.
- Analyzing randomized search heuristics via stochastic domination. Theor. Comput. Sci. 773, pp. 115–137.
- (2018) Efficiently solving the traveling thief problem using hill climbing and simulated annealing. Information Sciences 432, pp. 231–244.
- (2011) FD-Autotune: domain-specific configuration using Fast Downward. In ICAPS 2011 Workshop on Planning and Learning, pp. 13–17.
- (2018) Heavy-tailed mutation operators in single-objective combinatorial optimization. In Parallel Problem Solving from Nature (PPSN XV), pp. 134–145.
- (2018) Escaping large deceptive basins of attraction with heavy-tailed mutation operators. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'18), pp. 293–300.
- (2012) Programming by optimization. Commun. ACM 55 (2), pp. 70–80.
- (2006) Performance prediction and automated tuning of randomized and parametric algorithms. In International Conference on Principles and Practice of Constraint Programming, pp. 213–228.
- (2009) ParamILS: an automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36, pp. 267–306.
- (2010) Automated configuration of mixed integer programming solvers. In International Conference on Integration of AI and OR Techniques in Constraint Programming, pp. 186–202.
- (2011) Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization, pp. 507–523.
- (2017) The configurable SAT solver challenge (CSSC). Artificial Intelligence 243, pp. 1–25.
- (2014) Algorithm runtime prediction: methods & evaluation. Artificial Intelligence 206, pp. 79–111.
- (2020) Analysing stochastic search heuristics operating on a fixed budget. In Theory of Evolutionary Computation: Recent Developments in Discrete Optimization, pp. 249–270.
- (2015) Fixed budget performance of the (1+1) EA on linear functions. In Proc. of the 2015 ACM Conference on Foundations of Genetic Algorithms (FOGA'15), pp. 52–61.
- (2002) Learning the empirical hardness of optimization problems: the case of combinatorial auctions. In International Conference on Principles and Practice of Constraint Programming, pp. 556–572.
- (2017) Towards landscape-aware automatic algorithm configuration: preliminary experiments on neutral and rugged landscapes. In European Conference on Evolutionary Computation in Combinatorial Optimization, pp. 215–232.
- (2016) The irace package: iterated racing for automatic algorithm configuration. Operations Research Perspectives 3, pp. 43–58.
- (2020) Explainable AI: from black box to glass box. Journal of the Academy of Marketing Science 48 (1), pp. 137–141.
- (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pp. 2951–2959.
- (2019) Predicting good configurations for GitHub and Stack Overflow topic models. In Proceedings of the 16th International Conference on Mining Software Repositories (MSR'19), pp. 84–95.
- (2008) SATzilla: portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research 32, pp. 565–606.