1 Introduction
Algorithm selection and hyperparameter optimization are core parts of automatic machine learning (AutoML). Previous AutoML approaches often define the search space as the algorithm selection space [Brazdil et al.2003, Adankon and Cheriet2009, Biem2003], the hyperparameter space [Hu et al.2018, Hu et al.2019], or the joint of both spaces (the CASH problem) [Feurer et al.2015, Thornton et al.2013]. While the joint space allows a more thorough search that could cover potentially better configurations, its huge size is a barrier to effective search in limited time. Moreover, the joint space can be quite redundant when considering any single algorithm, since the hyperparameters of the other algorithms are irrelevant. Therefore, the joint space contains redundancy and can even be misleading.
The cascaded algorithm selection has two levels [Jamieson and Talwalkar2016]. The first level is the hyperparameter optimization, which only needs to focus on the selected algorithm, not the hyperparameters of all algorithms. The second level is the algorithm selection. However, previous methods of this kind commonly carry out a full hyperparameter optimization on each candidate algorithm, resulting in slow and expensive algorithm evaluations.
In this paper, we propose a cascaded algorithm selection approach to avoid a full-space hyperparameter optimization. Hyperparameter optimization usually employs some stepping search method, which can be paused after every search step and can also be resumed. The selection level receives feedback and allocates the next search step to one of the algorithms. Thus, the cascaded algorithm selection is naturally modeled as a multi-armed bandit problem [Auer et al.2002]. However, most of the classical bandits maximize the average feedback. In AutoML, in contrast, only the best feedback matters. A variant of the bandit, the extreme bandit [Carpentier and Valko2014], can model this situation: it tries to identify the arm with the maximal (or equivalently minimal) feedback value. However, as the extreme bandit relies on extreme value distributions, it is not only unstable but often requires knowing the distribution type, making the extreme bandit approach impractical.
In this paper, we propose the extreme-region UCB bandit (ER-UCB), which focuses on the extreme region of the feedback distributions. Unlike the extreme bandit, ER-UCB considers a region instead of the extreme point, which leads to a better mathematical condition. Moreover, in machine learning, where the test data is commonly different from the training data, the extreme region can be more robust for generalization. With $K$ arms and $n$ trials, our analysis proves that ER-UCB has an $O(\ln n)$ regret upper bound, which is of the same order as the classical UCB strategy. The experiments on synthetic and real AutoML tasks reveal that ER-UCB can find the best algorithm precisely, and exploit it with the majority of the trial budget.
The remaining sections present the background & related works, the extreme-region UCB bandit, the experiments, and the conclusion.
2 Background & Related Works
We consider the algorithm selection and hyperparameter optimization on classification tasks. Let $D_{train}$ and $D_{test}$ denote the training and testing datasets. Let $\mathcal{A} = \{A_1, A_2, \dots, A_K\}$ denote the algorithm set with $K$ candidates. For $A_i \in \mathcal{A}$, $\theta \in \Theta_i$ denotes a hyperparameter setting, where $\Theta_i$ is the hyperparameter space of $A_i$. Let $f(A_i, \theta)$ denote a performance criterion for a configuration $(A_i, \theta)$, e.g., accuracy, AUC score, etc. The AutoML problem can be formulated as follows:

$(A^*, \theta^*) = \operatorname{arg\,max}_{A_i \in \mathcal{A},\, \theta \in \Theta_i} f(A_i, \theta),$   (1)

where $f$ is evaluated by training on $D_{train}$ and testing on $D_{test}$, and the joint search space is $\mathcal{A} \times \Theta_1 \times \Theta_2 \times \cdots \times \Theta_K$. This coincides with the CASH problem formulation [Feurer et al.2015].
Because of its non-convex, non-continuous, and non-differentiable properties, derivative-free optimization [Yu et al.2016, Hu et al.2017] is usually applied to solve it. For example, a tree-structured Bayesian optimization (SMAC) [Hutter et al.2011] is employed by Auto-WEKA [Thornton et al.2013] and AUTO-SKLEARN [Feurer et al.2015], the popular open-source AutoML tools. Derivative-free optimization explores the search space by sampling and evaluating, but the high time cost restrains the total number of evaluations in AutoML. With limited trials, the performance of derivative-free optimization is very sensitive to the search space. However, in the above formulation, the search space is $\mathcal{A} \times \Theta_1 \times \cdots \times \Theta_K$. Obviously, it is redundant, because the best configuration is only relevant to the hyperparameter space of the best algorithm.
Hence, we consider an easier formulation, i.e., optimizing the hyperparameters of the $K$ algorithms separately:

$\theta_i^* = \operatorname{arg\,max}_{\theta \in \Theta_i} f(A_i, \theta) \ \ \text{for } i = 1, \dots, K, \qquad A^* = \operatorname{arg\,max}_{A_i \in \mathcal{A}} f(A_i, \theta_i^*).$   (2)
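The separate formulation can be sketched in a few lines: run random search inside each algorithm's own space, then pick the best algorithm. The two candidate objectives and their spaces below are hypothetical placeholders, not from the paper:

```python
import random

def random_search(objective, space, n_trials, rng):
    """Uniformly sample hyperparameters and keep the best feedback."""
    best = float("-inf")
    for _ in range(n_trials):
        theta = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        best = max(best, objective(theta))
    return best

# Hypothetical candidates: each algorithm has its own space and objective.
rng = random.Random(0)
candidates = {
    "A1": ({"lr": (0.01, 1.0)}, lambda t: 1.0 - (t["lr"] - 0.3) ** 2),
    "A2": ({"c": (0.1, 10.0)}, lambda t: 0.8 - 0.01 * abs(t["c"] - 5.0)),
}
# Optimize each algorithm separately, then select the best one.
scores = {name: random_search(obj, space, 100, rng)
          for name, (space, obj) in candidates.items()}
best_algo = max(scores, key=scores.get)
```

Note that each per-algorithm search never touches the other algorithm's space, which is exactly the redundancy that the joint formulation suffers from.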
The $K$ hyperparameter optimization processes can be seen as $K$ arms, so the algorithm selection level is a multi-armed bandit problem. The bandit is a classical formulation of the resource allocation problem. In [Felício et al.2017], the authors formulated cold-start user recommendation as a multi-armed bandit problem, in which user information was unavailable at the beginning and the feedback of users had to be obtained by trials. In this situation, the bandit concerns the average feedback of the arms. In [Cicirello and Smith2005], the authors proposed the max k-armed bandit, which focuses on the maximum feedback of the trials. But it assumed that the reward distribution was Gaussian, and it was designed for heuristic search, in which more than one arm can be selected at a trial step.
In this paper, we customize the extreme-region UCB (ER-UCB) bandit for AutoML problems.
3 Extreme-Region UCB Bandit
In this section, we present the details of ER-UCB: the bandit formulation for AutoML, the deduction of the ER-UCB strategy, and the theoretical analysis of the ER-UCB strategy.
3.1 Bandit formulation for AutoML
In the classical multi-armed bandit, the feedbacks of an arm obey an underlying distribution. In this paper, we employ random search for the hyperparameter optimization. A trial on an algorithm $A_i$ uniformly samples a hyperparameter setting from $\Theta_i$, and its performance is the feedback of this trial. Thus, $v \sim F_i$, where $v$ denotes the feedback of a trial on $A_i$, and $F_i$ is the underlying performance distribution of $A_i$. Because of the random search, $F_i$ is fixed. With $K$ algorithm candidates, let $\mathcal{F} = \{F_1, \dots, F_K\}$ denote the performance distribution set. The $K$-armed bandit formulation for AutoML is: at the $t$-th trial, an algorithm $A_{i_t}$ is selected from the $K$ candidates, and a feedback is obtained independently from $F_{i_t}$.
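The arm abstraction above can be sketched as follows; the hyperparameter space and the objective function are illustrative assumptions:

```python
import random

class HyperparameterArm:
    """One arm = a random-search hyperparameter optimization process.

    A pull uniformly samples a hyperparameter setting from the arm's space
    and returns the evaluation feedback; under random search the feedback
    distribution F_i is fixed. (Illustrative objective, not from the paper.)
    """
    def __init__(self, space, objective, seed=0):
        self.space = space
        self.objective = objective
        self.rng = random.Random(seed)
        self.feedbacks = []

    def pull(self):
        theta = {k: self.rng.uniform(lo, hi) for k, (lo, hi) in self.space.items()}
        v = self.objective(theta)
        self.feedbacks.append(v)
        return v

# A single arm with one hyperparameter; feedbacks land in [0.75, 1.0].
arm = HyperparameterArm({"lr": (0.0, 1.0)}, lambda t: 1.0 - (t["lr"] - 0.5) ** 2)
for _ in range(50):
    arm.pull()
```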
3.2 Deduction
In AutoML tasks, the selected algorithm is required to achieve the maximum performance. For this requirement, we present the extreme-region target for the proposed bandit. Then, we show the deduction details of the extreme-region UCB strategy.
3.2.1 Extreme-region target
The target of the hyperparameter optimization is to find the hyperparameters that achieve the maximum performance. In the bandit, with a fixed threshold $m$, we want the probability $P_{v \sim F_i}(v \geq m)$ to be as large as possible. With the Chebyshev inequality $P(|v - p_i| \geq \epsilon) \leq \sigma_i^2 / \epsilon^2$, where $p_i$ and $\sigma_i^2$ denote the expectation and variance of $F_i$, let $\epsilon = m - p_i$:

$P_{v \sim F_i}(v \geq m) \leq \frac{\sigma_i^2}{(m - p_i)^2}.$   (3)
In other words, with the same fixed probability upper bound $\zeta$, i.e., $\sigma_i^2/(m_i - p_i)^2 = \zeta$, each arm reaches the bound at $m_i = p_i + \sigma_i / \sqrt{\zeta}$, and the best arm selection is:

$i^* = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left( p_i + \frac{1}{\sqrt{\zeta}}\, \sigma_i \right).$   (4)
With the given $p_i$ and $\sigma_i$, the ground-truth selection strategy is (4). But, when facing unknown distributions, we have to estimate the expectation and variance from the observations. With the Markov inequality, it is easy to relate the expectation $p_i$ with its estimation. But for the variance, it is hard to find such a relationship. With the variance definition:

$\sigma_i^2 = \mathbb{E}_{v \sim F_i}[v^2] - p_i^2.$   (5)
Because $\mathbb{E}_{v \sim F_i}[v^2]$ is the expectation of the random variable $v^2$, the Markov inequality can be applied to it easily. And $\mathbb{E}_{v \sim F_i}[v^2]$ can partly represent $\sigma_i^2$ according to (5). Thus, we try to replace $\sigma_i$ with $\mathbb{E}_{v \sim F_i}[v^2]$:

$i^\dagger = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left( p_i + \frac{1}{\sqrt{\zeta}}\, \mathbb{E}_{v \sim F_i}[v^2] \right).$   (6)
Comparing with (4), (6) magnifies the effect of the expectation item on the selection strategy, since $\mathbb{E}[v^2] = \sigma_i^2 + p_i^2$ contains the extra term $p_i^2$. To tackle this issue, we introduce a hyperparameter $\theta$ and construct a new random variable $u = (v - \theta)^2$. Furthermore, let $p_i = \mathbb{E}_{v \sim F_i}[v]$ and $q_i = \mathbb{E}_{v \sim F_i}[(v - \theta)^2]$. Thus, the extreme-region target is:

$i^\dagger = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left( p_i + \frac{1}{\sqrt{\zeta}}\, q_i \right).$   (7)
We prove that introducing $\theta$ into $q_i$ reduces the effect of the expectation on the algorithm selection:
Proof.
According to the definitions of $u$, $p_i$, and $q_i$,

$p_i + \frac{1}{\sqrt{\zeta}}\, q_i = p_i + \frac{1}{\sqrt{\zeta}} \left( \sigma_i^2 + (p_i - \theta)^2 \right).$   (8)

Comparing with (6), because $(p_i - \theta)^2 \leq p_i^2$ for $0 \leq \theta \leq 2 p_i$, the item of expectation is reduced, but the item of variance stays the same. It concludes the proof. ∎
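A small numeric sketch of why $\theta$ matters, using two hypothetical arms: the naive target (6) (i.e., $\theta = 0$) is dominated by the squared mean and prefers the high-mean arm, while the target (7) with $\theta$ placed near the means lets the variance decide, recovering the variance-driven ranking of (4). All numbers below are illustrative assumptions:

```python
import math

def chebyshev_target(p, sigma, zeta):
    # Ground-truth selection score (4): p + sigma / sqrt(zeta)
    return p + sigma / math.sqrt(zeta)

def extreme_region_target(p, sigma, zeta, theta):
    # Score (7) with q = sigma^2 + (p - theta)^2, via the variance definition (5)
    q = sigma ** 2 + (p - theta) ** 2
    return p + q / math.sqrt(zeta)

# Two hypothetical arms: arm B has a lower mean but a much higher variance.
arms = {"A": (0.80, 0.05), "B": (0.70, 0.40)}
zeta = 0.1
cheb = {k: chebyshev_target(p, s, zeta) for k, (p, s) in arms.items()}
# Naive score (6) corresponds to theta = 0: the p^2 term dominates, picking A.
naive = {k: extreme_region_target(p, s, zeta, theta=0.0) for k, (p, s) in arms.items()}
# With theta near the means, (p - theta)^2 shrinks and the variance decides.
tuned = {k: extreme_region_target(p, s, zeta, theta=0.75) for k, (p, s) in arms.items()}
```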
3.2.2 Extreme-region UCB strategy
We apply the upper confidence bound (UCB) strategy to the extreme-region target. In this paper, we assume that the feedback random variables satisfy the following moment condition: there exists a convex function $\psi$ on the reals such that, for all $\lambda \geq 0$,

$\ln \mathbb{E}\left[ e^{\lambda (v - \mathbb{E}[v])} \right] \leq \psi(\lambda) \quad \text{and} \quad \ln \mathbb{E}\left[ e^{\lambda (\mathbb{E}[v] - v)} \right] \leq \psi(\lambda).$   (9)

If we let $\psi(\lambda) = \frac{\lambda^2 (b - a)^2}{8}$ and $v \in [a, b]$, (9) is known as Hoeffding’s lemma. We apply this assumption to construct an upper bound for the estimated expectations at a fixed confidence level. Let $\psi^*(\epsilon) = \sup_{\lambda \geq 0} \left( \lambda \epsilon - \psi(\lambda) \right)$ denote the Legendre-Fenchel transform of $\psi$. With $n$ observations of arm $i$, let $\hat{p}_{i,n}$ and $\hat{q}_{i,n}$ denote the estimated expectations of $v$ and $u$. For a fixed $\epsilon > 0$, using the Markov inequality:

$P\left( p_i - \hat{p}_{i,n} \geq \epsilon \right) \leq e^{-n \psi^*(\epsilon)}.$   (10)
The same deduction applies to $q_i$. Since $(\psi^*)^{-1}$ is a monotonically increasing function, setting the right-hand side of (10) to $\delta$ gives, each with probability at least $1 - \delta$:

$p_i < \hat{p}_{i,n} + (\psi^*)^{-1}\!\left( \frac{\ln(1/\delta)}{n} \right), \qquad q_i < \hat{q}_{i,n} + (\psi^*)^{-1}\!\left( \frac{\ln(1/\delta)}{n} \right).$   (11)
Because $\frac{1}{\sqrt{\zeta}} > 0$, and with the union bound, we combine the bounds on $p_i$ and $q_i$ as follows: with probability at least $1 - 2\delta$,

$p_i + \frac{1}{\sqrt{\zeta}} q_i < \hat{p}_{i,n} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n} + \left( 1 + \frac{1}{\sqrt{\zeta}} \right) (\psi^*)^{-1}\!\left( \frac{\ln(1/\delta)}{n} \right).$   (12)
Let $\delta = t^{-\alpha}$. With probability at least $1 - 2 t^{-\alpha}$,

$p_i + \frac{1}{\sqrt{\zeta}} q_i < \hat{p}_{i,n} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n} + \left( 1 + \frac{1}{\sqrt{\zeta}} \right) (\psi^*)^{-1}\!\left( \frac{\alpha \ln t}{n} \right).$   (13)
Within $t$ total trials, let $n_i$ denote the number of times that the $i$-th arm has been selected, so $\sum_{i=1}^{K} n_i = t$. The ER-UCB strategy is:

$i_{t+1} = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left[ \hat{p}_{i,n_i} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n_i} + \left( 1 + \frac{1}{\sqrt{\zeta}} \right) (\psi^*)^{-1}\!\left( \frac{\alpha \ln t}{n_i} \right) \right].$   (14)
$\hat{p}_{i,n_i} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n_i}$ and $\left( 1 + \frac{1}{\sqrt{\zeta}} \right) (\psi^*)^{-1}\!\left( \frac{\alpha \ln t}{n_i} \right)$ are the exploitation and exploration items. With Hoeffding’s lemma, taking $\psi(\lambda) = \frac{\lambda^2}{8}$ for feedbacks bounded in $[0, 1]$, then $\psi^*(\epsilon) = 2\epsilon^2$. And let $\alpha = 4$. The exploration item can be rewritten as:

$\left( 1 + \frac{1}{\sqrt{\zeta}} \right) \sqrt{\frac{2 \ln t}{n_i}}.$   (15)
Thus, the Hoeffding’s ER-UCB strategy is:

$i_{t+1} = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left[ \hat{p}_{i,n_i} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n_i} + \left( 1 + \frac{1}{\sqrt{\zeta}} \right) \sqrt{\frac{2 \ln t}{n_i}} \right].$   (16)
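As a quick numeric check of the Hoeffding instantiation (assuming feedbacks in $[0,1]$, $\psi(\lambda) = \lambda^2/8$, and $\alpha = 4$), the generic exploration item indeed reduces to the classical UCB radius scaled by $1 + 1/\sqrt{\zeta}$:

```python
import math

def psi_star_inv(y):
    # Legendre-Fenchel transform of psi(x) = x^2/8 is psi*(eps) = 2*eps^2,
    # so its inverse is sqrt(y / 2).
    return math.sqrt(y / 2.0)

def exploration(t, n_i, zeta, alpha=4.0):
    # Exploration item: (1 + 1/sqrt(zeta)) * (psi*)^{-1}(alpha * ln t / n_i)
    return (1.0 + 1.0 / math.sqrt(zeta)) * psi_star_inv(alpha * math.log(t) / n_i)

# With alpha = 4, the inner term equals the classical UCB radius sqrt(2 ln t / n_i).
t, n_i = 1000, 25
classical = math.sqrt(2.0 * math.log(t) / n_i)
```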
Because the feedbacks are bounded in $[0, 1]$ on AutoML, the exploitation item is often much smaller than the exploration item. To further trade off exploration and exploitation, we introduce a hyperparameter $\lambda$. The practical Hoeffding’s ER-UCB strategy is:

$i_{t+1} = \operatorname{arg\,max}_{i \in \{1, \dots, K\}} \left[ \lambda \left( \hat{p}_{i,n_i} + \frac{1}{\sqrt{\zeta}} \hat{q}_{i,n_i} \right) + \left( 1 + \frac{1}{\sqrt{\zeta}} \right) \sqrt{\frac{2 \ln t}{n_i}} \right].$   (17)
The cascaded algorithm selection and hyperparameter optimization with the ER-UCB bandit is presented in Algorithm 1. Lines 2 and 7 are the procedures of uniformly sampling hyperparameters for the selected algorithm and obtaining the feedbacks. Lines 1 to 4 are the initialization steps. In the main loop (lines 5 to 10), the algorithm is selected by the ER-UCB strategy (line 6). Lines 7 to 9 update the exploitation item for the selected algorithm.
We now discuss the hyperparameters of the ER-UCB bandit, i.e., $\zeta$, $\lambda$, and $\theta$. $\zeta$ controls the size of the extreme region. It is usually a small real number, e.g., 0.1 or 0.01. $\lambda$ is the exploration-and-exploitation trade-off hyperparameter. In AutoML tasks, $\lambda$ is used to magnify the exploitation item, so it is usually a big number such as 10 or 20. $\theta$ is applied to reduce the impact of the expectation item in the selection strategy. It should be tuned according to the task. In the experiments, we investigate them empirically.
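The procedure above can be sketched as follows, assuming feedbacks in $[0,1]$ and illustrative arm objectives; the hyperparameter values follow the ranges just discussed, and the selection rule is the practical strategy (17):

```python
import math
import random

def er_ucb(arms, budget, zeta=0.01, lam=10.0, theta=0.5):
    """Sketch of the cascaded loop with the practical Hoeffding's ER-UCB rule.

    `arms` maps arm names to zero-argument callables that perform one
    random-search step and return a feedback in [0, 1]; the settings of
    zeta, lam, and theta here are illustrative.
    """
    names = list(arms)
    stats = {k: {"n": 0, "p": 0.0, "q": 0.0} for k in names}

    def update(k, v):
        s = stats[k]
        s["n"] += 1
        s["p"] += (v - s["p"]) / s["n"]                   # running estimate of p_i
        s["q"] += ((v - theta) ** 2 - s["q"]) / s["n"]    # running estimate of q_i

    for k in names:                                       # initialization: pull each arm once
        update(k, arms[k]())
    for t in range(len(names) + 1, budget + 1):           # main loop
        def score(k):
            s = stats[k]
            exploit = s["p"] + s["q"] / math.sqrt(zeta)
            explore = (1 + 1 / math.sqrt(zeta)) * math.sqrt(2 * math.log(t) / s["n"])
            return lam * exploit + explore
        i_t = max(names, key=score)                       # ER-UCB selection
        update(i_t, arms[i_t]())                          # one search step on the arm
    return stats

# Hypothetical arms: "good" has both a higher mean and a larger extreme region.
rng = random.Random(1)
arms = {"good": lambda: rng.uniform(0.7, 1.0), "bad": lambda: rng.uniform(0.0, 0.4)}
stats = er_ucb(arms, budget=200)
```

In this sketch the strategy quickly concentrates the budget on the better arm while still paying a logarithmic exploration cost on the other one.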
3.3 Theoretical Analysis
We present the analysis of the extreme-region regret upper bound for the ER-UCB strategy (14) and the Hoeffding’s ER-UCB strategy (16). For an arbitrary arm $i$ and a fixed $\zeta$, we define $e_i = p_i + \frac{1}{\sqrt{\zeta}} q_i$. Thus, $i^\dagger = \operatorname{arg\,max}_i e_i$. According to (7), let $\Delta_i = e_{i^\dagger} - e_i$; thus $\Delta_{i^\dagger} = 0$, and $\Delta_i > 0$ for $i \neq i^\dagger$. We assume the best arm $i^\dagger$ is unique by choosing an appropriate $\theta$. The extreme-region regret is given in Definition 1.
Definition 1 (Extreme-region regret).
At the $t$-th trial, let $A_t$ denote the number of times that a feedback falls into the extreme region when the best arm $i^\dagger$ is selected at every trial, and let $B_t$ denote the number of times that a feedback falls into the extreme region with a given strategy. The extreme-region regret is: $R_t = \mathbb{E}[A_t] - \mathbb{E}[B_t]$.
Introducing $\Delta_i$ and $n_i$, the extreme-region regret can be rewritten as:

$R_t = \sum_{i \neq i^\dagger} \Delta_i\, \mathbb{E}[n_i].$   (18)
We can prove the following simple upper regret bound for the ER-UCB strategy:
Theorem 1 (Regret of ER-UCB).
Assume the feedback distribution of every arm satisfies (9). With $\alpha > 2$, the ER-UCB strategy (14) satisfies:

$R_t \leq \sum_{i \neq i^\dagger} \Delta_i \left( \frac{\alpha \ln t}{\psi^*\!\left( \Delta_i \big/ \left( 2 \left( 1 + 1/\sqrt{\zeta} \right) \right) \right)} + \frac{\alpha}{\alpha - 2} \right).$
Due to space limitations, we present the proof details in the supplementary material. Based on Theorem 1, we can easily prove the extreme-region regret of the Hoeffding’s ER-UCB strategy:
Corollary 1 (Regret of Hoeffding’s ER-UCB).
Assume the feedback distribution of every arm satisfies (9) with $\psi(\lambda) = \frac{\lambda^2}{8}$. With $\alpha = 4$, the Hoeffding’s ER-UCB strategy (16) satisfies:

$R_t \leq \sum_{i \neq i^\dagger} \left( \frac{8 \left( 1 + 1/\sqrt{\zeta} \right)^2 \ln t}{\Delta_i} + 2 \Delta_i \right).$
According to the theoretical analysis, the ER-UCB bandit has an $O(\ln t)$ upper bound on the extreme-region regret.
4 Experiments
In this section, we empirically investigate the effectiveness of the ER-UCB bandit on synthetic and real-world AutoML tasks. Some state-of-the-art bandit strategies are selected as compared methods, including the classical UCB (CUCB) [Bubeck et al.2012], $\epsilon$-greedy [Sutton and Barto2018], the Softmax strategy [Tokic and Palm2011], and the random strategy, which allocates the budget by selecting arms randomly. In addition, we apply random search on the joint hyperparameter space of all algorithms (Joint) to compare with the cascaded hyperparameter optimization.
4.1 Synthetic problem
We construct a 7-armed bandit problem in this section. The feedbacks of the seven arms obey Gaussian distributions with different expectations and variances. The best arm is not only related to the expectation, but also influenced by the variance: the arm whose distribution has the largest extreme region is the most likely to yield the best feedback, even if its expectation is not the largest. We first study the three hyperparameters of ER-UCB, and then compare ER-UCB with the other methods.
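Since the concrete Gaussian settings are elided above, a two-arm sketch with assumed parameters illustrates the point that the arm with the best maximum feedback need not be the arm with the best mean:

```python
import random

# Assumed (mean, std) pairs for illustration only, not the paper's settings:
# "high_var" has a worse mean but a larger extreme region than "high_mean".
rng = random.Random(0)
arms = {"high_mean": (0.70, 0.02), "high_var": (0.65, 0.10)}
samples = {k: [rng.gauss(mu, sd) for _ in range(1000)]
           for k, (mu, sd) in arms.items()}

best_mean = max(samples, key=lambda k: sum(samples[k]) / len(samples[k]))
best_max = max(samples, key=lambda k: max(samples[k]))
```

An average-oriented strategy would settle on `high_mean`, while the extreme-region view correctly prefers `high_var`.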
4.1.1 Hyperparameter study
We investigate $\zeta$, $\lambda$, and $\theta$ for ER-UCB. Fixing two of them, we study the remaining one: with $\lambda$ and $\theta$ fixed, we study $\zeta$; with $\zeta$ and $\theta$ fixed, we study $\lambda$; with $\zeta$ and $\lambda$ fixed, we study $\theta$. For every hyperparameter, we evenly sample 1000 settings from its setting region. The core problem we care about is how the methods allocate the budget to the arms. Let $r_i = n_i / t$ define the exploitation rate of arm $i$; a large $r_i$ means that arm $i$ is selected in a large number of trials. The trial budget is set to 1000. The experiment for every hyperparameter setting is repeated 3 times independently, and the average results are presented.
Figures 1:a.1, a.2, and a.3 show the study results of $\zeta$, $\lambda$, and $\theta$. The best arm is marked by the green line; thus, the larger its exploitation rate, the better. For $\zeta$ (Figure 1:a.1), the green line approaches 1 as $\zeta$ nears 0, so in practice $\zeta$ should be set to a small value. For $\lambda$ (Figure 1:a.2), when $\lambda$ is small, the exploitation rates of the arms are similar, and the green line increases as $\lambda$ increases. It means that a small $\lambda$ encourages exploration and a large $\lambda$ encourages exploitation of the observations. For $\theta$ (Figure 1:a.3), the exploitation rates are sensitive to $\theta$ when $\theta$ is around the expectations of the reward distributions. Thus, $\theta$ should be carefully tuned for different tasks.
Table 1: Results on the synthetic problem (averaged over 3 independent runs).

Methods  Col. 1     Col. 2  Col. 3  Col. 4
ER-UCB   0.02       1,1,1   1,1,1   0.01
CUCB     0.94±0.01  1,7,6   7,7,7   0.01±0.01
Greedy   0.98±0.04  6,1,1   6,1,6   0.31±0.42
Softmax  1.01±0.01  1,1,1   7,7,1   0.18±0.01
Random   1.00±0.05  1,1,1   4,1,6   0.15±0.01
4.1.2 Investigation with compared methods
According to the hyperparameter study results, we set $\zeta$, $\lambda$, and $\theta$ for ER-UCB accordingly, and compare it with CUCB, $\epsilon$-greedy, the Softmax strategy, and the random selection strategy. The trial budget is 1000. Every experiment is repeated 3 times independently. The average performances are presented in Table 1.
Table 1 shows that ER-UCB outperforms the compared methods. Furthermore, ER-UCB can find the best arm and allocate most of the budget to it. Because CUCB depends only on the mean of the observations to make decisions, it wrongly allocates budget to the arm with the best average feedback rather than the best extreme region. The behavior of $\epsilon$-greedy is very unstable, which means $\epsilon$-greedy cannot find the best arm effectively. In general, ER-UCB can effectively discover the best arm and reasonably split the budget between exploration and exploitation in this synthetic problem.
4.2 Real-world AutoML tasks
We apply ER-UCB to real-world classification tasks. We select 10 frequently-used algorithms from SKLEARN [Pedregosa et al.2011] as candidates, including DecisionTree (DT), AdaBoost (Ada), QuadraticDiscriminantAnalysis (QDA), GaussianNB (GNB), BernoulliNB (BNB), KNeighbors (KN), ExtraTree (ET), PassiveAggressive (PA), RandomForest (RF), and SGD. And 12 classification datasets from UCI are selected as AutoML tasks. The evaluation criterion of each configuration is the accuracy score. The compared methods are CUCB, $\epsilon$-greedy, the Softmax strategy, the random strategy, and Joint. The trial budget is 1000. We use the same $\zeta$ and $\lambda$ for ER-UCB on all datasets; $\theta$ is set according to the task, as shown in Table 2. For each method and each dataset, we run every experiment 3 times independently and report the average performance. In addition, we apply random search with 1000 trials to explore every algorithm candidate; according to (2), this identifies the ground-truth best algorithm for each dataset.
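The candidate set and the per-trial random sampling can be sketched as follows; the listed hyperparameter ranges are illustrative assumptions, not the paper's actual search spaces:

```python
import random

# Hypothetical hyperparameter spaces for a few of the SKLEARN candidates;
# the ranges are illustrative, not the paper's actual search spaces.
SPACES = {
    "DecisionTree": {"max_depth": ("int", 1, 20), "min_samples_split": ("int", 2, 20)},
    "AdaBoost":     {"n_estimators": ("int", 10, 200), "learning_rate": ("float", 0.01, 2.0)},
    "KNeighbors":   {"n_neighbors": ("int", 1, 30)},
}

def sample_config(name, rng):
    """One random-search trial: uniformly sample a configuration for `name`."""
    config = {}
    for param, (kind, lo, hi) in SPACES[name].items():
        config[param] = rng.randint(lo, hi) if kind == "int" else rng.uniform(lo, hi)
    return config

rng = random.Random(0)
cfg = sample_config("AdaBoost", rng)
```

Each sampled configuration would then be trained and scored (e.g., by accuracy) to produce the feedback for the selected arm.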
Table 2: Results on the 12 real-world AutoML tasks (averaged over 3 independent runs). VEval: validation accuracy; r: exploitation rate of the reported algorithm; B. Alg.: selected best algorithm; TEval: test accuracy.

Dataset 1:
  Methods  VEval  r      B. Alg.  TEval
  ER-UCB                 SGD
  CUCB     .8924  .1067  PA       .8339
  Greedy   .8931  .0693  SGD      .8227
  Softmax  .9004  .1287  SGD      .8809
  Random   .8978  .1097  SGD      .8597
  Joint    .8978  –      SGD      .8994

Dataset 2:
  ER-UCB                 ET
  CUCB     .8690  .1173  RF       .6416
  Greedy   .8630  .0277  RF       .6657
  Softmax  .8620  .1163  DT       .6551
  Random   .8628  .1053  RF       .6839
  Joint    .8619  –      RF       .8604

Dataset 3:
  ER-UCB                 Ada      .8515
  CUCB     .9414  .1693  Ada
  Greedy   .9492  .3593  Ada      .7510
  Softmax  .9414  .1290  PA       .8229
  Random   .9464  .1070  PA       .6999
  Joint    .9457  –      SGD      .8479

Dataset 4:
  ER-UCB                 ET
  CUCB     .7172  .1117  ET       .5534
  Greedy   .6528  .0443  ET       .5006
  Softmax  .6866  .0977  ET       .5749
  Random   .6977  .0990  ET       .5534
  Joint    .6531  –      QDA      .5687

Dataset 5:
  ER-UCB                 RF
  CUCB     .8745  .2440  RF       .8333
  Greedy   .8129  .3023  RF       .8809
  Softmax  .8728  .1630  RF       .8809
  Random   .8695  .1037  RF       .8762
  Joint    .8549  –      RF       .8714

Dataset 6:
  ER-UCB                 RF
  CUCB     .7265  .1520  RF       .6370
  Greedy   .7163  .0017  RF       .6148
  Softmax  .7197  .1180  RF       .6148
  Random   .7247  .1027  RF       .6666
  Joint    .7087  –      RF       .6444

Dataset 7:
  ER-UCB                 SGD      .7330
  CUCB     .7297  .1133  SGD      .7272
  Greedy   .7362  .0157  SGD      .7272
  Softmax  .7402  .1163  SGD
  Random   .7406  .1027  SGD      .7330
  Joint    .7399  –      SGD      .7316

Dataset 8:
  ER-UCB                 Ada
  CUCB     .7871  .1370  RF       .6688
  Greedy   .7201  .0607  Ada      .6457
  Softmax  .8010  .1150  Ada      .6238
  Random   .8039  .1100  Ada      .6631
  Joint    .7884  –      Ada      .6304

Dataset 9:
  ER-UCB                 Ada      .9449
  CUCB     .9298  .1757  Ada      .9457
  Greedy   .9311  .7333  Ada
  Softmax  .9298  .1253  Ada      .9471
  Random   .9306  .1057  Ada      .9500
  Joint    .9290  –      RF       .9428

Dataset 10:
  ER-UCB                 RF
  CUCB     .9779  .1950  RF       .9696
  Greedy   .9790  .8047  RF       .9703
  Softmax  .9768  .1397  RF       .9660
  Random   .9776  .1203  RF       .9696
  Joint    .9793  –      RF       .9464

Dataset 11:
  ER-UCB                 Ada      .9681
  CUCB     .9808  .1397  Ada      .9710
  Greedy   .9816  .8757  Ada      .9681
  Softmax  .9794  .1207  Ada
  Random   .9794  .1060  Ada
  Joint    .9794  –      Ada      .9594

Dataset 12:
  ER-UCB                 RF
  CUCB     .9820  .1200  RF       .9320
  Greedy   .9813  .3567  Ada      .9427
  Softmax  .9819  .1097  RF       .9267
  Random   .9821  .1103  RF       .9347
  Joint    .9821  –      DT       .9320
The average performances of the compared methods on all 12 datasets are presented in Table 2, from which we can draw the following empirical conclusions:

“No free lunch” is confirmed again in these experiments. The best-performing algorithms differ across datasets. In particular, tree-based ensemble algorithms, e.g., AdaBoost and RandomForest, show outstanding performance on most of the datasets. It indicates that algorithm selection is a necessary step for making the hyperparameter search easier.

The cascaded algorithm selection and hyperparameter optimization make the search problem easier to solve. Comparing the random strategy with Joint, the random strategy beats Joint on most of the datasets (8/12). It indicates that the larger joint search space makes the optimization more difficult.

Selecting algorithms only according to the average performance can mislead the strategy. In Table 2, the random strategy is not always the worst across datasets. The strategies that focus on the average performance, such as CUCB, $\epsilon$-greedy, and Softmax, easily select wrong algorithms whose average performances are good but whose extreme performances are not.

The proposed ER-UCB bandit strategy can effectively find the best-performing algorithm (B. Alg. matches the ground-truth algorithm on 9/12 datasets), and reasonably allocates the trial budget to the best algorithm (ER-UCB achieves the highest exploitation rate on 12/12 datasets).
5 Conclusion
This paper proposes the extreme-region upper confidence bound (ER-UCB) bandit for cascaded algorithm selection and hyperparameter optimization. We employ random search at the hyperparameter optimization level. The algorithm selection level is formulated as a multi-armed bandit problem, and bandit strategies are applied to allocate the limited search budget to the hyperparameter optimization processes of the algorithm candidates. However, the algorithm selection focuses on the algorithm with the maximum performance, not the average performance. To tackle this, we propose the extreme-region UCB (ER-UCB) strategy, which selects the arm with the largest extreme region of the underlying distribution. The theoretical study shows that ER-UCB has an $O(\ln t)$ extreme-region regret upper bound, which is of the same order as the classical UCB strategy. The experiments on synthetic and real-world AutoML problems empirically verify that ER-UCB can precisely discover the algorithm with the best performance, and reasonably allocate the trial budget among the algorithm candidates.
References
 [Adankon and Cheriet2009] Mathias M Adankon and Mohamed Cheriet. Model selection for the LS-SVM: Application to handwriting recognition. Pattern Recognition, 42(12):3264–3270, 2009.
 [Auer et al.2002] Peter Auer, Nicolo Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235–256, 2002.
 [Biem2003] Alain Biem. A model selection criterion for classification: Application to hmm topology optimization. In Proceedings of the 7th International Conference on Document Analysis and Recognition, pages 104–108, 2003.
 [Brazdil et al.2003] Pavel B Brazdil, Carlos Soares, and Joaquim Pinto Da Costa. Ranking learning algorithms: Using IBL and metalearning on accuracy and time results. Machine Learning, 50(3):251–277, 2003.
 [Bubeck et al.2012] Sébastien Bubeck, Nicolo Cesa-Bianchi, et al. Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.
 [Carpentier and Valko2014] Alexandra Carpentier and Michal Valko. Extreme bandits. In Advances in Neural Information Processing Systems, pages 1089–1097, 2014.

 [Cicirello and Smith2005] Vincent A Cicirello and Stephen F Smith. The max k-armed bandit: A new model of exploration applied to search heuristic selection. In Proceedings of the 20th AAAI Conference on Artificial Intelligence, pages 1355–1361, 2005.
 [Felício et al.2017] Crícia Z Felício, Klérisson VR Paixão, Celia AZ Barcelos, and Philippe Preux. A multi-armed bandit model selection for cold-start user recommendation. In Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, pages 32–40. ACM, 2017.
 [Feurer et al.2015] Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems, pages 2962–2970, 2015.
 [Hu et al.2017] YiQi Hu, Hong Qian, and Yang Yu. Sequential classificationbased optimization for direct policy search. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, pages 2029–2035, 2017.
 [Hu et al.2018] YiQi Hu, Yang Yu, and ZhiHua Zhou. Experienced optimization with reusable directional model for hyperparameter search. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, pages 2276–2282, 2018.
 [Hu et al.2019] YiQi Hu, Yang Yu, WeiWei Tu, Qiang Yang, Yuqiang Chen, and Wenyuan Dai. Multifidelity automatic hyperparameter tuning via transfer series expansion. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019.
 [Hutter et al.2011] Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. LION, 5:507–523, 2011.

 [Jamieson and Talwalkar2016] Kevin G Jamieson and Ameet Talwalkar. Non-stochastic best arm identification and hyperparameter optimization. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pages 240–248, 2016.
 [Pedregosa et al.2011] Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Edouard Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
 [Sutton and Barto2018] Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. MIT press, 2018.
 [Thornton et al.2013] Chris Thornton, Frank Hutter, Holger H Hoos, and Kevin Leyton-Brown. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 847–855, 2013.
 [Tokic and Palm2011] Michel Tokic and Günther Palm. Valuedifference based exploration: adaptive control between epsilongreedy and softmax. In Annual Conference on Artificial Intelligence, pages 335–346. Springer, 2011.
 [Yu et al.2016] Yang Yu, Hong Qian, and YiQi Hu. Derivativefree optimization via classification. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, pages 2286–2292, 2016.