Tuning Mixed Input Hyperparameters on the Fly for Efficient Population Based AutoRL

06/30/2021
by Jack Parker-Holder, et al.

Despite a series of recent successes in reinforcement learning (RL), many RL algorithms remain sensitive to hyperparameters. As such, there has recently been interest in the field of AutoRL, which seeks to automate design decisions to create more general algorithms. Recent work suggests that population based approaches may be effective AutoRL algorithms, by learning hyperparameter schedules on the fly. In particular, the PB2 algorithm is able to achieve strong performance in RL tasks by formulating online hyperparameter optimization as a time-varying GP-bandit problem, while also providing theoretical guarantees. However, PB2 is designed to work only with continuous hyperparameters, which severely limits its utility in practice. In this paper we introduce a new (provably) efficient hierarchical approach for optimizing both continuous and categorical variables, using a new time-varying bandit algorithm specifically designed for the population based training regime. We evaluate our approach on the challenging Procgen benchmark, where we show that explicitly modelling dependence between data augmentation and other hyperparameters improves generalization.
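To make the mixed-input setting concrete, the following is a minimal Python sketch of one population based exploit/explore step in which a categorical hyperparameter is picked first and a continuous one is set afterwards. It is not the paper's algorithm: the categorical choice uses a plain EXP3-style bandit as a stand-in for the paper's time-varying bandit, and the continuous value is randomly perturbed from a parent rather than chosen by a GP-bandit as in PB2. All names (`exp3_probs`, `exp3_update`, `pbt_step`, the `"aug"` and `"lr"` keys, the 0.8/1.2 perturbation factors) are hypothetical and chosen only for illustration.

```python
import math
import random


def exp3_probs(weights, gamma):
    """Mix the normalised weights with a uniform distribution (EXP3-style)."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Importance-weighted update for the chosen arm; reward assumed in [0, 1]."""
    k = len(weights)
    estimate = reward / probs[arm]
    weights[arm] *= math.exp(gamma * estimate / k)
    # Rescale so the largest weight is 1, which avoids overflow over many steps.
    largest = max(weights)
    for i in range(k):
        weights[i] /= largest


def pbt_step(population, scores, weights, categories, bounds, gamma=0.2, rng=random):
    """Replace the bottom quartile of agents, in place, and return the population.

    population: list of dicts such as {"aug": "crop", "lr": 1e-3} (assumed layout).
    scores:     one score per agent, higher is better.
    """
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    n_cut = max(1, len(population) // 4)
    bottom, top = ranked[:n_cut], ranked[-n_cut:]
    probs = exp3_probs(weights, gamma)
    low, high = bounds
    for i in bottom:
        parent = population[rng.choice(top)]
        # Step 1: choose the categorical value with the bandit.
        arm = rng.choices(range(len(categories)), weights=probs)[0]
        # Step 2: set the continuous value (random perturbation stands in
        # for a GP-bandit suggestion), clipped to the allowed range.
        lr = min(high, max(low, parent["lr"] * rng.choice([0.8, 1.2])))
        population[i] = {"aug": categories[arm], "lr": lr}
    return population
```

In a full training loop, `exp3_update` would be called after each interval with a reward derived from the replaced agent's score change, scaled to [0, 1]; how that reward is defined is left open here.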

research
07/19/2022

Bayesian Generational Population-Based Training

Reinforcement learning (RL) offers the potential for training generally ...
research
09/03/2020

Sample-Efficient Automated Deep Reinforcement Learning

Despite significant progress in challenging problems across various doma...
research
01/17/2021

Cost-Efficient Online Hyperparameter Optimization

Recent work on hyperparameters optimization (HPO) has shown the possibil...
research
02/06/2020

One-Shot Bayes Opt with Probabilistic Population Based Training

Selecting optimal hyperparameters is a key challenge in machine learning...
research
07/11/2020

An Asymptotically Optimal Multi-Armed Bandit Algorithm and Hyperparameter Optimization

The evaluation of hyperparameters, neural architectures, or data augment...
research
03/09/2023

A Framework for History-Aware Hyperparameter Optimisation in Reinforcement Learning

A Reinforcement Learning (RL) system depends on a set of initial conditi...
research
09/28/2021

Faster Improvement Rate Population Based Training

The successful training of neural networks typically involves careful an...
