Online Continuous Hyperparameter Optimization for Contextual Bandits

02/18/2023
by Yue Kang, et al.

In stochastic contextual bandit problems, an agent sequentially selects actions from a time-dependent action set based on past experience so as to minimize the cumulative regret. Like many other machine learning algorithms, the performance of bandit algorithms depends heavily on several hyperparameters, and theoretically derived parameter values may lead to unsatisfactory results in practice. Moreover, it is infeasible to use offline tuning methods such as cross-validation to choose hyperparameters in the bandit setting, since decisions must be made in real time. To address this challenge, we propose the first online continuous hyperparameter tuning framework for contextual bandits, which learns the optimal parameter configuration within a search space on the fly. Specifically, we use a double-layer bandit framework named CDT (Continuous Dynamic Tuning) and formulate hyperparameter optimization as a non-stationary continuum-armed bandit, where each arm represents a hyperparameter configuration and the corresponding reward is the performance of the tuned algorithm under that configuration. For the top layer, we propose the Zooming TS algorithm, which uses Thompson Sampling (TS) for exploration and a restart technique to handle the switching environment. The proposed CDT framework can be readily used to tune contextual bandit algorithms without any pre-specified candidate set for hyperparameters. We further show that CDT achieves sublinear regret in theory and consistently outperforms existing methods on both synthetic and real datasets in practice.
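The full algorithm is not reproduced on this page, but the two-layer idea can be illustrated with a small sketch. Below is a minimal, hypothetical Python example in the spirit of CDT: the top layer runs a simplified Gaussian Thompson Sampling bandit over a continuous hyperparameter interval, with a crude zooming step and periodic restarts, while the bottom layer is a standard LinUCB learner whose exploration rate is chosen by the top layer each round. All constants, names, and the zooming/restart heuristics are assumptions made for the example, not the paper's actual Zooming TS algorithm.

```python
"""
Illustrative sketch only (not the authors' code): a two-layer tuning loop.
Top layer: Gaussian Thompson Sampling over a continuous hyperparameter range,
with a crude zooming step and periodic restarts.
Bottom layer: LinUCB, whose exploration rate is set by the top layer each round.
"""
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                      # context dim, arms per round, horizon
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unknown true parameter

# ---- bottom layer: LinUCB whose exploration rate is supplied per round ----
A = np.eye(d)                              # ridge-regression design matrix
b = np.zeros(d)

def linucb_step(alpha):
    """Play one LinUCB round with exploration rate `alpha`; return the reward."""
    global A, b
    X = rng.normal(size=(K, d)) / np.sqrt(d)          # time-dependent action set
    theta_hat = np.linalg.solve(A, b)
    A_inv = np.linalg.inv(A)
    width = np.sqrt(np.einsum("ki,ij,kj->k", X, A_inv, X))
    x = X[np.argmax(X @ theta_hat + alpha * width)]
    r = x @ theta_star + 0.1 * rng.normal()           # noisy linear reward
    A += np.outer(x, x)
    b += r * x
    return r

# ---- top layer: TS over a continuous range, with zooming and restarts -----
alpha_lo, alpha_hi = 0.0, 2.0              # hyperparameter search interval
restart_every = 500                        # restarts cope with non-stationarity

def fresh_grid(lo, hi, m=8):
    """Initial candidate points covering [lo, hi]; refined ('zoomed') later."""
    return list(np.linspace(lo, hi, m))

total = 0.0
arms, stats = fresh_grid(alpha_lo, alpha_hi), {}
for t in range(T):
    if t % restart_every == 0:                         # restart technique
        arms, stats = fresh_grid(alpha_lo, alpha_hi), {}
    # Gaussian Thompson Sampling over the currently active candidate arms
    samples = []
    for arm in arms:
        n, s = stats.get(arm, (0, 0.0))
        samples.append(rng.normal(s / n if n else 0.0, 1.0 / np.sqrt(n + 1)))
    alpha_t = arms[int(np.argmax(samples))]
    r = linucb_step(alpha_t)                           # bottom layer plays
    n, s = stats.get(alpha_t, (0, 0.0))
    stats[alpha_t] = (n + 1, s + r)
    total += r
    # crude zooming: occasionally add points near the best-performing arm
    if t % 100 == 99:
        best = max(arms, key=lambda a: stats.get(a, (0, 0.0))[1]
                                       / max(stats.get(a, (0, 0.0))[0], 1))
        for cand in (best - 0.05, best + 0.05):
            if alpha_lo <= cand <= alpha_hi and cand not in arms:
                arms.append(cand)

print(f"average reward over {T} rounds: {total / T:.3f}")
```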


Related research

06/05/2021 · Syndicated Bandits: A Framework for Auto Tuning Hyper-parameters in Contextual Bandit Algorithms
The stochastic contextual bandit problem, which models the trade-off bet...

08/05/2017 · Efficient Contextual Bandits in Non-stationary Worlds
Most contextual bandit algorithms minimize regret to the best fixed poli...

01/21/2019 · Parallel Contextual Bandits in Wireless Handover Optimization
As cellular networks become denser, a scalable and dynamic tuning of wir...

05/23/2018 · Learning Contextual Bandits in a Non-stationary Environment
Multi-armed bandit algorithms have become a reference solution for handl...

07/05/2023 · Meta-Learning Adversarial Bandit Algorithms
We study online meta-learning with bandit feedback, with the goal of imp...

03/21/2016 · Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization
Performance of machine learning algorithms depends critically on identif...

03/23/2023 · Adaptive Endpointing with Deep Contextual Multi-armed Bandits
Current endpointing (EP) solutions learn in a supervised framework, whic...
