Introduction
Many algorithms in the field of artificial intelligence rely crucially on good parameter settings to yield strong performance; prominent examples include solvers for many hard combinatorial problems (e.g., the propositional satisfiability problem SAT [Hutter et al. 2017] or AI planning [Fawcett et al. 2011]) as well as a wide range of machine learning algorithms (in particular deep neural networks [Snoek, Larochelle, and Adams 2012] and automated machine learning frameworks [Feurer et al. 2015]). To overcome the tedious and error-prone task of manually tuning the parameters of a given algorithm, algorithm configuration (AC) procedures automatically determine a parameter configuration with low cost (e.g., runtime) on a given benchmark set. General algorithm configuration procedures fall into two categories: model-free approaches, such as ParamILS [Hutter et al. 2009], irace [López-Ibáñez et al. 2016] or GGA [Ansótegui, Sellmann, and Tierney 2009], and model-based approaches, such as SMAC [Hutter, Hoos, and Leyton-Brown 2011] or GGA++ [Ansótegui et al. 2015].

Even though model-based approaches learn to predict the cost of different configurations on the benchmark instances at hand, so far all AC procedures start their configuration process from scratch when presented with a new set of benchmark instances. Compared with the way humans exploit information from past benchmark sets, this is obviously suboptimal. Inspired by the human ability to learn across different tasks, we propose to use performance measurements of an algorithm on previous benchmark sets in order to warmstart its configuration on a new benchmark set. As we will show in our experiments, our new warmstarting methods can substantially speed up AC procedures, by up to a factor of 165. In our experiments, this amounts to spending less than 20 minutes to obtain performance comparable to what could previously be obtained within two days.
Preliminaries
Algorithm configuration (AC).
Formally, given a target algorithm $\mathcal{A}$ with configuration space $\Theta$, a distribution $\mathcal{D}$ across problem instances, as well as a cost metric $c$ to be minimized, the algorithm configuration (AC) problem is to determine a parameter configuration $\theta^* \in \Theta$ with low expected cost on instances drawn from $\mathcal{D}$:

$\theta^* \in \arg\min_{\theta \in \Theta} \mathbb{E}_{\pi \sim \mathcal{D}}\left[c(\theta, \pi)\right]$  (1)

In practice, $\mathcal{D}$ is typically approximated by a finite set of instances $\Pi$ drawn from $\mathcal{D}$. An example AC problem is to set a SAT solver's parameters to minimize its average runtime on a given benchmark set of formal verification instances. Throughout the paper, we refer to algorithms for solving the AC problem as AC procedures. They execute the target algorithm $\mathcal{A}$ with different parameter configurations $\theta \in \Theta$ on different instances $\pi \in \Pi$ and measure the resulting costs $c(\theta, \pi)$.
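As a concrete illustration, the expectation in Eq. (1) is replaced in practice by an average over a finite instance set. The following minimal Python sketch shows this empirical cost estimate; `run_target_algorithm` is a hypothetical stand-in for actually executing and timing the target algorithm, with a toy cost formula:

```python
from statistics import mean

def run_target_algorithm(config, instance):
    # Hypothetical: execute the target algorithm with `config` on `instance`
    # and measure its cost (e.g., runtime). Toy formula for illustration only.
    return config["speed"] * instance["hardness"]

def estimated_cost(config, instances):
    """Empirical estimate of E_pi[c(config, pi)] over a finite instance sample."""
    return mean(run_target_algorithm(config, pi) for pi in instances)

instances = [{"hardness": h} for h in (1.0, 2.0, 4.0)]
print(estimated_cost({"speed": 0.5}, instances))  # average cost over the sample
```

An AC procedure searches for the configuration minimizing exactly this kind of empirical average.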
Empirical performance models (EPMs).
A core ingredient of model-based approaches to AC is a probabilistic regression model $\hat{c}$ that is trained on the cost values observed thus far and can be used to predict the cost of new parameter configurations on new problem instances. Since this regression model predicts empirical algorithm performance (i.e., its cost), it is known as an empirical performance model (EPM) [Leyton-Brown et al. 2009; Hutter et al. 2014b]. Random forests have been established as the best-performing type of EPM and are thus used in all current model-based AC approaches.

For the purposes of this regression model, the instances are characterized by instance features. These features range from simple ones (such as the number of clauses and variables of a SAT formula) to more complex ones (such as statistics gathered by briefly running a probing algorithm). Nowadays, informative instance features are available for most hard combinatorial problems (e.g., SAT [Nudelman et al. 2004], mixed integer programming [Hutter et al. 2014b], AI planning [Fawcett et al. 2014], and answer set programming [Hoos, Lindauer, and Schaub 2014]).
Model-based algorithm configuration.
The core idea of sequential model-based algorithm configuration is to iteratively fit an EPM on the cost data observed so far and to use it to guide the search for well-performing parameter configurations. Algorithm 1 outlines the model-based algorithm configuration framework, similar to the one introduced by Hutter, Hoos, and Leyton-Brown (2011) for the AC procedure SMAC, but also encompassing the GGA++ approach of Ansótegui et al. (2015). We now discuss this algorithmic framework in detail, since our warmstarting extensions will adapt its various elements.
First, in Line 1, a model-based AC procedure runs the algorithm to be optimized with configurations in a so-called initial design, keeping track of their costs and of the best configuration seen so far (the so-called incumbent). It also keeps track of a runhistory $\mathcal{H}$, which contains tuples $\langle \theta, \pi, c(\theta, \pi) \rangle$ of the cost obtained when evaluating configuration $\theta$ on instance $\pi$. To obtain good anytime performance, by default SMAC only executes a single run of a user-defined default configuration $\theta_{def}$ on a randomly chosen instance as its initial design and uses $\theta_{def}$ as its initial incumbent. GGA++ samples a set of configurations as its initial generation and races them against each other on a subset of the instances.
In Lines 2–5, the AC procedure performs the model-based search. While a user-specified configuration budget (e.g., a number of algorithm runs or an amount of wallclock time) is not exhausted, it fits a random-forest-based EPM on the existing cost data in $\mathcal{H}$ (Line 3), aggregates the EPM's predictions over the instances to obtain marginal cost predictions for each configuration, and then uses these predictions to select a set of promising configurations, either to challenge the incumbent (Line 4; SMAC) or to generate well-performing offspring (GGA++). For this step, a so-called acquisition function trades off exploitation of promising areas of the configuration space against exploration of areas for which the model is still uncertain; common choices are expected improvement [Jones, Schonlau, and Welch 1998], upper confidence bounds [Srinivas et al. 2010] and entropy search [Hennig and Schuler 2012].
To determine a new incumbent configuration, in Line 5 the AC procedure races these challengers against the current incumbent by evaluating them on individual instances and adding the observed data to $\mathcal{H}$. Since these evaluations can be computationally costly, the race only evaluates as many instances as needed per configuration and terminates slow runs early [Hutter et al. 2009].
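The overall loop can be sketched as follows. This is an illustrative toy version of the framework, not SMAC's actual implementation: a nearest-neighbor predictor stands in for the random-forest EPM, greedy cost minimization stands in for the acquisition function, and the race simply evaluates challengers on all instances (omitting adaptive capping):

```python
import random

def toy_epm(history):
    """Nearest-neighbor cost predictor: a stand-in for the random-forest EPM."""
    def predict(theta):
        nearest = min(history, key=lambda rec: abs(rec[0] - theta))
        return nearest[2]
    return predict

def run(theta, instance):
    # Hypothetical target-algorithm run with a toy cost function: the best
    # configuration is theta = 0.3, and harder instances cost more.
    return (theta - 0.3) ** 2 + 0.1 * instance

def smbo(instances, budget, seed=0):
    rng = random.Random(seed)
    incumbent = rng.random()                          # Line 1: initial design
    history = [(incumbent, instances[0], run(incumbent, instances[0]))]
    for _ in range(budget):                           # Line 2: budget loop
        predict = toy_epm(history)                    # Line 3: fit EPM
        challengers = [rng.random() for _ in range(10)]
        challenger = min(challengers, key=predict)    # Line 4: select promising
        for pi in instances:                          # Line 5: race challenger
            history.append((challenger, pi, run(challenger, pi)))
        def avg(theta):
            return sum(run(theta, pi) for pi in instances) / len(instances)
        if avg(challenger) < avg(incumbent):
            incumbent = challenger
    return incumbent

print(smbo(instances=[0, 1, 2], budget=20))  # a configuration near 0.3
```

With more budget, the model concentrates evaluations around promising configurations, which is exactly the behavior the warmstarting methods below try to accelerate.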
Warmstarting Approaches for AC
In this section, we discuss how the efficiency of model-based AC procedures (as described in the previous section) can be improved by warmstarting the search with data generated in previous AC runs. We assume that the algorithm to be optimized and its configuration space are the same in all runs, but that the set of instances can change between runs. To warmstart a new AC run, we consider the following data from previous AC runs on previous instance sets $\Pi_1, \dots, \Pi_n$:

Sets of optimized configurations $\Theta^*_i$ found in previous AC runs on $\Pi_i$; potentially, multiple runs were performed on the same instance set to return the result with the best training performance, such that $\Theta^*_i$ contains the final incumbents of each of these runs;

We denote the union of the previous instance sets as $\Pi_{1:n} = \bigcup_{i=1}^{n} \Pi_i$; analogous notation applies to other set superscripts.

Runhistory data $\mathcal{H}_i$ of all AC runs on previous instance sets $\Pi_i$. (If the set of instances and the runhistory are not indexed, we always refer to those of the current AC run.)
To design warmstarting approaches, we consider the following desired properties:

When the performance data gathered on previous instance sets is informative about performance on the current instance set, it should speed up our method.

When said performance data is misleading, our method should stop using it and should not be much slower than without it.

The runtime overhead generated by using the prior data should be fairly small.
In the following subsections, we describe different warmstarting approaches that satisfy these properties.
Warmstarting Initial Design (INIT)
The first approach we consider for warmstarting our model-based AC procedure is to adapt its initial design (Line 1 of Algorithm 1) to start from configurations that performed well in the past. Specifically, we include the incumbent configurations $\Theta^* = \bigcup_{i=1}^{n} \Theta^*_i$ from all previous AC runs as well as the user-specified default $\theta_{def}$.
Evaluating all previous incumbents in the initial design can be inefficient (contradicting Property 3), particularly if they are very similar. This can happen when the previous instance sets are quite similar, or when multiple runs were performed on a single instance set.
To obtain a complementary set of configurations that covers all previously optimized instances well but is not redundant, we propose a two-step approach. First, we determine the best configuration $\hat{\theta}_i$ for each previous instance set $\Pi_i$:

$\hat{\theta}_i \in \arg\min_{\theta \in \Theta^*} \frac{1}{|\Pi_i|} \sum_{\pi \in \Pi_i} c(\theta, \pi)$  (2)
Second, we use an iterative, greedy forward search to select a complementary set of configurations across all previous instance sets, inspired by the per-instance selection procedure Hydra [Xu, Hoos, and Leyton-Brown 2010]. Specifically, for the second step we define the mincost of a set of configurations $\tilde{\Theta}$ on the union of all previous instances as

$mincost(\tilde{\Theta}) := \frac{1}{|\Pi_{1:n}|} \sum_{\pi \in \Pi_{1:n}} \min_{\theta \in \tilde{\Theta}} c(\theta, \pi)$  (3)

We start with $\tilde{\Theta} = \emptyset$ and, at each iteration, add to $\tilde{\Theta}$ the configuration $\hat{\theta}_i$ that minimizes $mincost(\tilde{\Theta} \cup \{\hat{\theta}_i\})$. Because mincost is a supermodular set function, this greedy algorithm is guaranteed to select a set of configurations whose mincost is within a factor of $(1 - 1/e)$ of optimal among sets of the same size [Krause and Golovin 2012].
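The greedy selection over Eq. (3) can be sketched as follows; the cost table is a toy stand-in for the EPM-predicted costs used in practice:

```python
def mincost(thetas, instances, cost):
    """Eq. (3): average over instances of the best cost any selected
    configuration achieves on that instance."""
    return sum(min(cost[t][pi] for t in thetas) for pi in instances) / len(instances)

def greedy_select(candidates, instances, cost, k):
    """Iteratively add the candidate configuration that most reduces mincost."""
    selected = []
    for _ in range(k):
        remaining = [t for t in candidates if t not in selected]
        best = min(remaining, key=lambda t: mincost(selected + [t], instances, cost))
        selected.append(best)
    return selected

# Toy cost table (hypothetical values; in practice: EPM predictions).
# theta_c is robust across all instances and is picked first; theta_a, which
# excels on pi1/pi2, then complements it, while theta_b is never added.
cost = {
    "theta_a": {"pi1": 1.0, "pi2": 1.0, "pi3": 9.0},
    "theta_b": {"pi1": 8.0, "pi2": 8.0, "pi3": 1.0},
    "theta_c": {"pi1": 2.0, "pi2": 2.0, "pi3": 2.0},
}
print(greedy_select(list(cost), ["pi1", "pi2", "pi3"], cost, k=2))
# ['theta_c', 'theta_a']
```

Note how the second pick is chosen for complementarity with the first, not for its standalone average cost.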
Since we do not necessarily know the empirical cost of every $\theta \in \Theta^*$ on every $\pi \in \Pi_{1:n}$, we use an EPM as a plug-in estimator to predict these costs. We train this EPM on the runhistory data $\mathcal{H}_{1:n}$ of all previous AC runs. To enable this, the benchmark sets of all previous AC runs have to be characterized by the same set of instance features.

In SMAC, we evaluate this set of complementary configurations in the initial design using the same racing function as for comparing challengers to the incumbent (Line 5 of Algorithm 1) to obtain the initial incumbent; to avoid rejecting challengers too quickly, a challenger is compared on a minimum number of instances before it can be rejected. In GGA++, these configurations can be included in the first generation of configurations.
Data-Driven Model-Warmstarting (DMW)
Since model-based AC procedures are guided by their EPM, an obvious idea is to warmstart this EPM by including all cost data gathered in previous AC runs in its training data. In the beginning, the predictions of this EPM would mostly rely on the previous data $\mathcal{H}_{1:n}$; as more data is acquired on the current benchmark, the new data would increasingly affect the model.
However, this approach has two disadvantages:

When a lot of warmstarting data is available, many evaluations on the current instance set are required to affect the model's predictions. If the previous data is misleading, this would violate our desired Property 2.

Fitting the EPM on $\mathcal{H}_{1:n} \cup \mathcal{H}$ will be expensive even in early iterations, because $\mathcal{H}_{1:n}$ will typically contain many observations. Even when using SMAC's mechanism of investing at least as much time in Lines 3 and 4 as in Line 5, in preliminary experiments this slowed down SMAC substantially (violating Property 3).
For these two reasons, we do not use this approach for warmstarting but propose an alternative. Specifically, to avoid the computational overhead of refitting a very large EPM in each iteration, and to allow our model to discard misleading previous data, we propose to fit an individual EPM $\hat{c}_i$ for each $\mathcal{H}_i$ once and to combine their predictions with those of an EPM $\hat{c}_{\mathcal{H}}$ fitted on the newly gathered cost data $\mathcal{H}$. This relates to stacking in ensemble learning [Wolpert 1992]; however, in our case, each constituent EPM is trained on a different dataset. Hence, in principle, we could even use different instance features for each instance set.
To aggregate the predictions of the individual EPMs, we propose to use a linear combination:

$\hat{c}(\theta, \pi) := w_0 \cdot \hat{c}_{\mathcal{H}}(\theta, \pi) + \sum_{i=1}^{n} w_i \cdot \hat{c}_i(\theta, \pi)$  (4)

where the weights $w_0, \dots, w_n$ are fitted with stochastic gradient descent (SGD) to minimize the combined model's root mean squared error (RMSE). To avoid overfitting of the weights, we randomly split the current runhistory $\mathcal{H}$ into a training and a validation set, use the training set to fit $\hat{c}_{\mathcal{H}}$, and then compute predictions of $\hat{c}_{\mathcal{H}}$ and of each $\hat{c}_i$ on the validation set, which are used to fit the weights $w_i$. Finally, we refit $\hat{c}_{\mathcal{H}}$ on all data in $\mathcal{H}$ to obtain a maximally informed model.

In the beginning of a new AC run, with few data points in $\mathcal{H}$, $\hat{c}_{\mathcal{H}}$ will not be very accurate, causing its weight $w_0$ to be low, such that the previous models substantially influence the cost predictions. As more data is gathered in $\mathcal{H}$, the predictive accuracy of $\hat{c}_{\mathcal{H}}$ improves and the predictions of the previous models become less important.
Besides weighting based on the accuracy of the individual models, the weights serve a second purpose: scaling the individual models' predictions appropriately. These scales reflect the different hardness of the instance sets the models were trained on, and by setting the weights to minimize the RMSE of the combined model on the current instances, they automatically normalize for scale.
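A minimal sketch of fitting the stacking weights of Eq. (4), using plain per-sample SGD on the squared error (illustrative only; the actual base EPMs are random forests, and the train/validation handling follows the description above):

```python
def fit_weights(preds, targets, lr=0.01, epochs=500):
    """Fit linear stacking weights by SGD to minimize squared error.

    preds: list of validation samples, each a list with one prediction per EPM.
    targets: observed costs on the validation samples.
    """
    k = len(preds[0])
    w = [1.0 / k] * k                     # start from a uniform combination
    for _ in range(epochs):
        for x, y in zip(preds, targets):  # per-sample gradient step
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

# Toy data: EPM 0 predicts the target perfectly; EPM 1 (as if trained on an
# unrelated benchmark) is uninformative, so its weight should shrink to zero.
preds = [[1.0, 3.0], [2.0, 0.5], [3.0, 2.5], [4.0, 1.0]]
targets = [1.0, 2.0, 3.0, 4.0]
print(fit_weights(preds, targets))  # close to [1.0, 0.0]
```

This mirrors the intended behavior of DMW: models whose predictions match the current benchmark keep high weight, while unrelated models are automatically down-weighted.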
The performance predictions of DMW can be used in any modelbased AC procedure, such as SMAC and GGA++.
Combining INIT and DMW (IDMW)
Importantly, the two methods we propose are complementary. A warmstarted initial design (INIT) can easily be combined with data-driven model-warmstarting (DMW), because the two approaches affect different parts of model-based algorithm configuration: where to start from, and how to integrate the full performance data from the current and previous benchmarks to decide where to sample next. In fact, the two warmstarting methods can even synergize to yield more than the sum of their parts: by evaluating strong configurations from previous AC runs in the initial design through INIT, the weights of the stacked model in DMW can be fitted on these important observations early on, improving the accuracy of its predictions even in early iterations.
Experiments
We evaluated how our three warmstarting approaches improve the state-of-the-art AC procedure SMAC. (The source code of GGA++ is not publicly available; thus, we could not run experiments with GGA++.) In particular, we were interested in the following research questions:
Q1: Can warmstarted SMAC find better-performing configurations within the same configuration budget?

Q2: Can warmstarted SMAC find well-performing configurations faster than default SMAC?

Q3: What is the effect of using warmstarting data from related and unrelated benchmarks?
Experimental Setup
To answer these questions, we ran SMAC (version 0.5.0) and our warmstarting variants (code and data are publicly available at http://www.ml4aad.org/smac/) on twelve well-studied AC tasks from the Configurable SAT Solver Challenge [Hutter et al. 2017], which are publicly available in the algorithm configuration library [Hutter et al. 2014a]. Since our warmstarting approaches have to generalize across different instance sets rather than across algorithms, we considered AC tasks for the highly flexible and robust SAT solver SparrowToRiss on these twelve instance sets. SparrowToRiss is a combination of two well-performing solvers: Riss [Manthey 2014] is a tree-based solver that performs well on industrial and handcrafted instances; Sparrow [Balint et al. 2011] is a local-search solver that performs well on random, satisfiable instances. SparrowToRiss first runs Sparrow for a parametrized amount of time and then runs Riss if Sparrow has not found a satisfying assignment. Thus, SparrowToRiss can be applied to a large variety of SAT instances. Riss, Sparrow and SparrowToRiss have also won several medals in international SAT competitions. Furthermore, configuring SparrowToRiss is a challenging task, because it has a very large configuration space with conditional dependencies between its parameters.
To study warmstarting on different categories of instances, the AC tasks cover SAT instances from applications with a lot of internal structure, handcrafted instances with some internal structure, and randomly generated SAT instances with little structure. We ran SparrowToRiss on

- application instances from bounded model checking (BMC), hardware verification (IBM) and fuzz testing based on circuits (CF);

- handcrafted instances from graph isomorphism (GI), low-autocorrelation binary sequences (LABS) and N-Rooks instances (NRooks);

- randomly generated instances, specifically, 3-SAT instances at the phase transition from the ToughSAT instance generator (3cnf), a mix of satisfiable and unsatisfiable 3-SAT instances at the phase transition (K3), and unsatisfiable 5-SAT instances from a generator used in the SAT Challenge and SAT Competition (UNSATk5); and on

- randomly generated satisfiable instances, specifically, 3-SAT instances (3SAT1k), 5-SAT instances (5SAT500) and 7-SAT instances (7SAT90).
Further details on these instances are given in the description of the Configurable SAT Solver Challenge [Hutter et al. 2017]. The instances were split into a training set used for configuration and a test set used to validate the performance of the configured SparrowToRiss on unseen instances.
For each configuration run on a benchmark set in one of the categories, our warmstarting methods had access to observations on the other two benchmark sets in the category. For example, warmstarted SMAC optimizing SparrowToRiss on IBM had access to the observations and final incumbents of SparrowToRiss on CF and BMC.
As the cost metric, we chose the commonly used penalized average runtime metric (PAR10, i.e., counting each timeout as 10 times the runtime cutoff). To avoid a constant inflation of the PAR10 values, we removed post hoc all test instances that were never solved by any configuration in our experiments (this affected the CF, IBM, BMC, GI, LABS and 3cnf sets).
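For concreteness, a minimal PAR10 computation looks as follows (the runtimes and cutoff value are illustrative):

```python
def par10(runtimes, cutoff):
    """Penalized average runtime: timeouts count as 10x the cutoff."""
    return sum(10 * cutoff if t >= cutoff else t for t in runtimes) / len(runtimes)

# Two solved runs and one timeout at a cutoff of 300 seconds:
print(par10([10.0, 50.0, 300.0], cutoff=300))  # (10 + 50 + 3000) / 3 = 1020.0
```

The heavy penalty on timeouts explains why a single never-solved instance would inflate every configuration's score by the same constant, motivating the post-hoc removal described above.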
On each AC task, we ran 10 independent SMAC runs with a configuration budget of two days each. All runs were executed on a compute cluster whose nodes are equipped with two Intel Xeon E5-2630 v4 CPUs and run CentOS 7.
Table 1: Median PAR10 test scores of default SMAC and speedups of the warmstarting variants over default SMAC.

             PAR10  |     Speedup over default SMAC
             SMAC   |   AAF    INIT    DMW    IDMW
  CF         326.5  |   0.1    0.5     0.7    2.7
  IBM        150.6  |   3.9    16.2    1.4    9
  BMC        421.5  |   1.2    1       11     29.3
  GI         314.1  |   25.6   0.6     7.1    19.4
  LABS       330.1  |   0.8    0.8     0.8    0.8
  NRooks     116.7  |   0.4    0.4     0.4    0.5
  3cnf       890.5  |   10.7   1       1      8.4
  K3         152.8  |   0.9    0.9     1.8    1.8
  UNSATk5    151.9  |   1      1       1      1
  3SAT1k     104.4  |   3.1    2.1     2.1    3.8
  5SAT500    3000   |   6      0.7     0.7    0.8
  7SAT90     52.3   |   53.5   2.3     0.5    165.3
  geo. mean         |   2.4    1.1     1.3    4.3
Baselines
As baselines, we ran (I) the user-specified default configuration of SparrowToRiss, to show the effect of algorithm configuration; (II) SMAC without warmstarting; and (III) a state-of-the-art warmstarting approach for hyperparameter optimizers proposed by Wistuba, Schilling, and Schmidt-Thieme (2016), which we abbreviate as "adapted acquisition function" (AAF). The goal of AAF is to bias the acquisition function (Line 4 in Algorithm 1) towards previously well-performing regions of the configuration space. (We note that combining AAF and INIT is not effective, because evaluating the incumbents of INIT would nullify the acquisition-function bias of AAF.) To generalize AAF to algorithm configuration, we use its predictions marginalized across all instances.

Q1: Same Configuration Budget
The left part of Table 1 shows the median PAR10 test scores of the finally returned configurations across the 10 SMAC runs. Default SMAC nearly always improved the PAR10 scores of SparrowToRiss substantially compared to the SparrowToRiss default, with its largest speedup on UNSATk5. Warmstarted SMAC performed significantly better still on 4 of the 12 AC tasks (BMC, 3cnf, 5SAT500 and 7SAT90), with the largest additional speedup on 5SAT500. On two of the crafted instance sets (LABS and NRooks), the warmstarting approaches performed worse than default SMAC; we discuss the details later.
Overall, the best results were achieved by the combination of our approaches, IDMW. It yielded the best performance of all approaches in 6 of the 12 scenarios (with sometimes substantial improvements over default SMAC) and results not statistically significantly different from the best approach in 3 of the scenarios. Notably, IDMW performed better on average than its individual components INIT and DMW, and clearly outperformed AAF.
Q2: Speedup
The right part of Table 1 shows how much faster warmstarted SMAC reached the PAR10 performance that default SMAC reached with the full configuration budget. (A priori, it is not clear how to define a speedup metric for comparing algorithm configurators across several runs. To take the noise across our runs into account, we performed a permutation test to determine the first time point from which onwards there was no statistical evidence that default SMAC with a full budget would perform better. To take early convergence/stagnation of default SMAC into account, we computed the speedup of default SMAC relative to itself and divided all speedups by it.) The warmstarting methods outperformed default SMAC in almost all cases (again except LABS and NRooks), with up to 165-fold speedups. The most consistent speedups were achieved by the combination of our warmstarting approaches, IDMW, with a geometric-average 4.3-fold speedup. We note that our baseline AAF also yielded good speedups (geometric average of 2.4), but its final performance was often quite poor (see the left part of Table 1).
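A two-sample permutation test of the kind underlying this analysis can be sketched as follows (the test statistic, sample values and number of permutations are illustrative; the paper's procedure additionally scans over time points):

```python
import random

def permutation_test(a, b, n_perm=10000, seed=0):
    """p-value for H0: both samples come from the same distribution,
    using the absolute difference of means as the test statistic."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                       # random relabeling
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_perm

# Hypothetical PAR10 samples from two configurators: clearly different means.
p = permutation_test([100, 110, 105, 95], [300, 310, 290, 305])
print(p)  # small p-value: strong evidence of a performance difference
```

As long as the p-value stays large, there is no statistical evidence that one configurator is better, which is exactly the criterion used to pick the earliest "comparable performance" time point.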
Figure 2 illustrates the anytime test performance of all SMAC variants. (Since Figure 2 shows performance on unseen test instances, it is not guaranteed to improve monotonically: a new best configuration on the training instances might not generalize well to the test instances.) In Figure 1(a), AAF, INIT and IDMW improved the performance of SparrowToRiss very early on, but only the DMW variants performed well in the long run.
To study our worst results, Figures 1(b) and 1(c) show the anytime performance on NRooks and LABS, respectively. Figure 1(b) shows that warmstarted SMAC performed better in the beginning, but that default SMAC performed slightly better in the end; the better initial performance is not captured by our quantitative analysis in Table 1. In contrast, Figure 1(c) shows that on LABS, warmstarted SMAC was initially misled and then started improving like default SMAC, but with a time lag. We observed this pattern only on LABS and conclude that configurations found on NRooks and GI do not generalize to LABS.
Q3: Warmstarting Influence
To study how our warmstarting methods learn from previous data, in Figure 3 we show how the weights of the DMW approach changed over time. Figure 2(a) shows a representative plot: the weights were similar in the beginning (i.e., all EPMs contributed similarly to cost predictions) and over time, the weights of the previous models decreased, with the weight of the current EPM dominating. When optimizing on IBM, the EPM trained on observations from CF was the most important EPM in the beginning.
In contrast, Figure 2(b) shows a case in which the previous performance data acquired on benchmarks K3 and 3cnf did not help for cost predictions on UNSATk5. (This was to be expected, because 3cnf comprises only satisfiable instances, K3 a mix of satisfiable and unsatisfiable instances, and UNSATk5 only unsatisfiable instances.) As the figure shows, our DMW approach briefly used the data from the mixed K3 benchmark (blue curves), but quickly focused only on data from the current benchmark. These two examples illustrate that our DMW approach indeed successfully used data from related benchmarks and quickly ignored data from unrelated ones.
Related Work
The most closely related work comes from the field of hyperparameter optimization (HPO) of machine learning algorithms. HPO, when cast as the optimization of (cross-)validation error, is a special case of AC. This special case does not require the concept of problem instances, does not require modelling the runtimes of randomized algorithms, does not need to adaptively terminate slow algorithm runs and handle the resulting censored algorithm runtimes, and typically deals with fairly low-dimensional, all-continuous (hyper)parameter configuration spaces. Work on HPO therefore does not directly transfer to the general AC problem.
Several warmstarting approaches exist for HPO. A prominent approach is to learn surrogate models across datasets [Swersky, Snoek, and Adams 2013; Bardenet et al. 2014; Yogatama and Mann 2014]. All of these works are based on Gaussian process models, whose computational complexity scales cubically in the number of data points; therefore, all of them were limited to hundreds or at most thousands of data points. Our DMW approach generalizes them to the AC setting (which, on top of the differences to HPO stated above, also needs to handle up to a million cost measurements for an algorithm).
Another approach for warmstarting HPO is to adapt the initial design. Feurer, Springenberg, and Hutter (2015) proposed to initialize HPO in the automatic machine learning framework Auto-sklearn with well-performing configurations from previous datasets. They had optimized configurations from different machine learning datasets available as warmstarting data and chose which of these to use for a new dataset based on its characteristics; specifically, they used the optimized configurations from the most similar datasets. This approach could be adapted to AC warmstarting in cases where many AC benchmarks are available. However, one disadvantage of the approach is that, unlike our INIT approach, it does not aim for complementarity in the selected configurations. Wistuba, Schilling, and Schmidt-Thieme (2015) proposed another approach for warmstarting the initial design, which does not depend on instance features and is not limited to configurations returned in previous optimization experiments. They combined surrogate predictions from previous runs and used gradient descent to determine promising configurations. This approach is limited to continuous (hyper)parameters and thus does not apply to the general AC setting.
A related variant of algorithm configuration is the problem of configuring on a stream of problem instances that changes over time. The ReACT approach [Fitzgerald et al. 2014] targets this setting, keeping track of configurations that worked well on previous instances. If the characteristics of the instances change over time, it also adapts the current configuration by combining observations on previous instances with observations on new instances. In contrast to our setting, ReACT does not return a single configuration for an instance set and requires parallel compute resources to run a parallel portfolio at all times.
Discussion & Conclusion
In this paper, we introduced several methods to warmstart model-based algorithm configuration (AC) with observations from previous AC experiments on different benchmark instance sets. As we showed in our experiments, warmstarting can speed up the configuration process by up to 165-fold and can also improve the configurations finally returned.
While we focused on the state-of-the-art configurator SMAC in our experiments, our methods are also applicable to other model-based configurators, such as GGA++, and our warmstarted initial design approach is even applicable to model-free configurators, such as ParamILS and irace. We expect that our results would generalize similarly to these.
A practical limitation of our DMW approach (and thus also of IDMW) is that memory consumption grows substantially with each additional EPM (at least when using random forests fitted on hundreds of thousands of observations). We also tried warmstarting SMAC for optimizing SparrowToRiss on all instance sets except the one at hand, but unfortunately, the memory consumption exceeded 12 GB of RAM. Therefore, one possible direction for future work is to reduce memory consumption by using instance features to select a subset of EPMs constructed on similar instances.
Another direction for future work is to combine warmstarting with parameter importance analysis [Hutter, Hoos, and Leyton-Brown 2014; Biedenkapp et al. 2017], e.g., for determining important parameters on previous instance sets and focusing the search on these parameters for a new instance set. Finally, a promising future direction is to integrate warmstarting into iterative configuration procedures, such as Hydra [Xu, Hoos, and Leyton-Brown 2010], ParHydra [Lindauer et al. 2017], or Cedalion [Seipp et al. 2015], which construct portfolios of complementary configurations in an iterative fashion using multiple AC runs.
Acknowledgements
The authors acknowledge funding by the DFG (German Research Foundation) under Emmy Noether grant HU 1900/2-1 and support by the state of Baden-Württemberg through bwHPC and the DFG through grant no. INST 39/963-1 FUGG.
References

[Ansótegui et al. 2015] Ansótegui, C.; Malitsky, Y.; Sellmann, M.; and Tierney, K. 2015. Model-based genetic algorithms for algorithm configuration. In Yang, Q., and Wooldridge, M., eds., Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI'15), 733–739.

[Ansótegui, Sellmann, and Tierney 2009] Ansótegui, C.; Sellmann, M.; and Tierney, K. 2009. A gender-based genetic algorithm for the automatic configuration of algorithms. In Gent, I., ed., Proceedings of the Fifteenth International Conference on Principles and Practice of Constraint Programming (CP'09), volume 5732 of Lecture Notes in Computer Science, 142–157. Springer-Verlag.
[Balint et al. 2011] Balint, A.; Frohlich, A.; Tompkins, D.; and Hoos, H. 2011. Sparrow2011. In Proceedings of SAT Competition 2011.

[Bardenet et al. 2014] Bardenet, R.; Brendel, M.; Kégl, B.; and Sebag, M. 2014. Collaborative hyperparameter tuning. In Dasgupta, S., and McAllester, D., eds., Proceedings of the 30th International Conference on Machine Learning (ICML'13), 199–207. Omnipress.

[Biedenkapp et al. 2017] Biedenkapp, A.; Lindauer, M.; Eggensperger, K.; Fawcett, C.; Hoos, H.; and Hutter, F. 2017. Efficient parameter importance analysis via ablation with surrogates. In Proceedings of the Thirty-First Conference on Artificial Intelligence (AAAI'17), 773–779.

[Bonet and Koenig 2015] Bonet, B., and Koenig, S., eds. 2015. Proceedings of the Twenty-Ninth Conference on Artificial Intelligence (AAAI'15). AAAI Press.

[Fawcett et al. 2011] Fawcett, C.; Helmert, M.; Hoos, H.; Karpas, E.; Röger, G.; and Seipp, J. 2011. FD-Autotune: Domain-specific configuration using Fast Downward. In Helmert, M., and Edelkamp, S., eds., Working Notes of the Twenty-First International Conference on Automated Planning and Scheduling (ICAPS'11), Workshop on Planning and Learning.

[Fawcett et al. 2014] Fawcett, C.; Vallati, M.; Hutter, F.; Hoffmann, J.; Hoos, H.; and Leyton-Brown, K. 2014. Improved features for runtime prediction of domain-independent planners. In Chien, S.; Minh, D.; Fern, A.; and Ruml, W., eds., Proceedings of the Twenty-Fourth International Conference on Automated Planning and Scheduling (ICAPS'14). AAAI.

[Feurer et al. 2015] Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J. T.; Blum, M.; and Hutter, F. 2015. Efficient and robust automated machine learning. In Cortes, C.; Lawrence, N.; Lee, D.; Sugiyama, M.; and Garnett, R., eds., Proceedings of the 29th International Conference on Advances in Neural Information Processing Systems (NIPS'15).

[Feurer, Springenberg, and Hutter 2015] Feurer, M.; Springenberg, T.; and Hutter, F. 2015. Initializing Bayesian hyperparameter optimization via meta-learning. In Bonet and Koenig (2015), 1128–1135.

[Fitzgerald et al. 2014] Fitzgerald, T.; O'Sullivan, B.; Malitsky, Y.; and Tierney, K. 2014. ReACT: Real-time algorithm configuration through tournaments. In Edelkamp, S., and Barták, R., eds., Proceedings of the Seventh Annual Symposium on Combinatorial Search (SOCS'14). AAAI Press.

[Hennig and Schuler 2012] Hennig, P., and Schuler, C. 2012. Entropy search for information-efficient global optimization. Journal of Machine Learning Research 13:1809–1837.

 [Hoos, Lindauer, and Schaub2014] Hoos, H.; Lindauer, M.; and Schaub, T. 2014. claspfolio 2: Advances in algorithm selection for answer set programming. Theory and Practice of Logic Programming 14:569–585.
 [Hutter et al.2009] Hutter, F.; Hoos, H.; Leyton-Brown, K.; and Stützle, T. 2009. ParamILS: An automatic algorithm configuration framework. Journal of Artificial Intelligence Research 36:267–306.
 [Hutter et al.2014a] Hutter, F.; López-Ibáñez, M.; Fawcett, C.; Lindauer, M.; Hoos, H.; Leyton-Brown, K.; and Stützle, T. 2014a. AClib: A benchmark library for algorithm configuration. In Pardalos, P., and Resende, M., eds., Proceedings of the Eighth International Conference on Learning and Intelligent Optimization (LION’14), Lecture Notes in Computer Science, 36–40. Springer-Verlag.
 [Hutter et al.2014b] Hutter, F.; Xu, L.; Hoos, H.; and Leyton-Brown, K. 2014b. Algorithm runtime prediction: Methods and evaluation. Artificial Intelligence 206:79–111.
 [Hutter et al.2017] Hutter, F.; Lindauer, M.; Balint, A.; Bayless, S.; Hoos, H.; and Leyton-Brown, K. 2017. The configurable SAT solver challenge (CSSC). Artificial Intelligence 243:1–25.
 [Hutter, Hoos, and Leyton-Brown2011] Hutter, F.; Hoos, H.; and Leyton-Brown, K. 2011. Sequential model-based optimization for general algorithm configuration. In Coello, C., ed., Proceedings of the Fifth International Conference on Learning and Intelligent Optimization (LION’11), volume 6683 of Lecture Notes in Computer Science, 507–523. Springer-Verlag.
 [Hutter, Hoos, and Leyton-Brown2014] Hutter, F.; Hoos, H.; and Leyton-Brown, K. 2014. An efficient approach for assessing hyperparameter importance. In Xing, E., and Jebara, T., eds., Proceedings of the 31st International Conference on Machine Learning (ICML’14), 754–762. Omnipress.
 [Jones, Schonlau, and Welch1998] Jones, D.; Schonlau, M.; and Welch, W. 1998. Efficient global optimization of expensive black-box functions. Journal of Global Optimization 13:455–492.
 [Krause and Golovin2012] Krause, A., and Golovin, D. 2012. Submodular function maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press.
 [Leyton-Brown, Nudelman, and Shoham2009] Leyton-Brown, K.; Nudelman, E.; and Shoham, Y. 2009. Empirical hardness models: Methodology and a case study on combinatorial auctions. Journal of the ACM 56(4).
 [Lindauer et al.2017] Lindauer, M.; Hoos, H.; Leyton-Brown, K.; and Schaub, T. 2017. Automatic construction of parallel portfolios via algorithm configuration. Artificial Intelligence 244:272–290.
 [López-Ibáñez et al.2016] López-Ibáñez, M.; Dubois-Lacoste, J.; Cáceres, L. P.; Birattari, M.; and Stützle, T. 2016. The irace package: Iterated racing for automatic algorithm configuration. Operations Research Perspectives 3:43–58.
 [Manthey2014] Manthey, N. 2014. Riss 4.27. In Belov, A.; Diepold, D.; Heule, M.; and Järvisalo, M., eds., Proceedings of SAT Competition 2014: Solver and Benchmark Descriptions, volume B-2014-2 of Department of Computer Science Series of Publications B, 65–67. University of Helsinki.
 [Nudelman et al.2004] Nudelman, E.; Leyton-Brown, K.; Devkar, A.; Shoham, Y.; and Hoos, H. 2004. Understanding random SAT: Beyond the clauses-to-variables ratio. In Wallace, M., ed., Proceedings of the 10th International Conference on Principles and Practice of Constraint Programming (CP’04), volume 3258 of Lecture Notes in Computer Science, 438–452. Springer-Verlag.
 [Seipp et al.2015] Seipp, J.; Sievers, S.; Helmert, M.; and Hutter, F. 2015. Automatic configuration of sequential planning portfolios. In Bonet and Koenig (2015), 3364–3370.
 [Snoek, Larochelle, and Adams2012] Snoek, J.; Larochelle, H.; and Adams, R. P. 2012. Practical Bayesian optimization of machine learning algorithms. In Bartlett, P.; Pereira, F.; Burges, C.; Bottou, L.; and Weinberger, K., eds., Proceedings of the 26th International Conference on Advances in Neural Information Processing Systems (NIPS’12), 2960–2968.
 [Srinivas et al.2010] Srinivas, N.; Krause, A.; Kakade, S.; and Seeger, M. 2010. Gaussian process optimization in the bandit setting: No regret and experimental design. In Fürnkranz, J., and Joachims, T., eds., Proceedings of the 27th International Conference on Machine Learning (ICML’10), 1015–1022. Omnipress.
 [Swersky, Snoek, and Adams2013] Swersky, K.; Snoek, J.; and Adams, R. 2013. Multi-task Bayesian optimization. In Burges, C.; Bottou, L.; Welling, M.; Ghahramani, Z.; and Weinberger, K., eds., Proceedings of the 27th International Conference on Advances in Neural Information Processing Systems (NIPS’13), 2004–2012.

 [Wistuba, Schilling, and Schmidt-Thieme2015] Wistuba, M.; Schilling, N.; and Schmidt-Thieme, L. 2015. Learning hyperparameter optimization initializations. In Proceedings of the International Conference on Data Science and Advanced Analytics (DSAA), 1–10. IEEE.
 [Wistuba, Schilling, and Schmidt-Thieme2016] Wistuba, M.; Schilling, N.; and Schmidt-Thieme, L. 2016. Hyperparameter optimization machines. In Proceedings of the International Conference on Data Science and Advanced Analytics (DSAA), 41–50. IEEE.
 [Wolpert1992] Wolpert, D. 1992. Stacked generalization. Neural Networks 5(2):241–259.
 [Xu, Hoos, and Leyton-Brown2010] Xu, L.; Hoos, H.; and Leyton-Brown, K. 2010. Hydra: Automatically configuring algorithms for portfolio-based selection. In Fox, M., and Poole, D., eds., Proceedings of the Twenty-Fourth National Conference on Artificial Intelligence (AAAI’10), 210–216. AAAI Press.

 [Yogatama and Mann2014] Yogatama, D., and Mann, G. 2014. Efficient transfer learning method for automatic hyperparameter tuning. In Kaski, S., and Corander, J., eds., Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), volume 33 of JMLR Workshop and Conference Proceedings, 1077–1085.