Practical and sample efficient zero-shot HPO

by Fela Winkelmolen, et al.

Zero-shot hyperparameter optimization (HPO) is a simple yet effective use of transfer learning for constructing a small list of hyperparameter (HP) configurations that complement each other: for any given dataset, at least one of them is expected to perform well. Current techniques for obtaining this list are computationally expensive because they rely on running training jobs on a diverse collection of datasets with a large collection of randomly drawn HP configurations. This cost is especially problematic in environments where the HP space changes regularly, for example due to new algorithm versions or changing deep network architectures. We provide an overview of available approaches and introduce two novel techniques to handle the problem. The first is based on a surrogate model and adaptively chooses which (dataset, configuration) pairs to query. The second, intended for settings where finding, tuning, and testing a surrogate model is problematic, is a multi-fidelity technique combining HyperBand with submodular optimization. We benchmark our methods experimentally on five tasks (XGBoost, LightGBM, CatBoost, MLP, and AutoML) and show significant improvement in accuracy compared to standard zero-shot HPO with the same training budget. In addition to contributing new algorithms, we provide an extensive study of the zero-shot HPO technique, resulting in (1) default hyperparameters for popular algorithms that would benefit the community using them, and (2) massive lookup tables to further research on hyperparameter tuning.
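To make the idea of a complementary configuration list concrete, the following is a minimal sketch of the standard greedy approach to building one, assuming a fully evaluated performance table is already available. It is not the adaptive surrogate-based or HyperBand-based method introduced in the paper; the function name `greedy_portfolio`, the matrix layout, and the toy scores are all illustrative assumptions. The objective (mean over datasets of the best score achieved by any selected configuration) is monotone submodular, which is why a greedy selection is a natural fit.

```python
import numpy as np


def greedy_portfolio(perf, k):
    """Greedily pick up to k complementary configurations.

    perf[d, c] is the score (higher is better) of configuration c on
    dataset d. This layout is an assumption made for illustration.
    """
    perf = np.asarray(perf, dtype=float)
    n_datasets, n_configs = perf.shape
    available = np.ones(n_configs, dtype=bool)
    best_so_far = np.full(n_datasets, -np.inf)  # best score per dataset so far
    chosen = []
    for _ in range(min(k, n_configs)):
        # Per-dataset best score if each candidate were added to the list.
        candidate_best = np.maximum(best_so_far[:, None], perf)
        totals = candidate_best.mean(axis=0)
        totals[~available] = -np.inf  # never pick the same configuration twice
        pick = int(np.argmax(totals))
        chosen.append(pick)
        available[pick] = False
        best_so_far = candidate_best[:, pick]
    return chosen


# Toy table: 3 datasets (rows) x 3 configurations (columns), made-up scores.
toy_perf = [
    [0.90, 0.20, 0.60],
    [0.10, 0.80, 0.60],
    [0.85, 0.30, 0.60],
]
print(greedy_portfolio(toy_perf, 2))  # prints [0, 1]
```

In the toy table, configuration 2 has a higher average score than configuration 1, yet the greedy step prefers configuration 1 as the second pick because it covers the one dataset where configuration 0 does poorly. That is the sense in which the selected configurations complement each other.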



