Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation

03/08/2022
by Yu-Heng Hung, et al.

Reward-biased maximum likelihood estimation (RBMLE) is a classic principle in the adaptive control literature for tackling explore-exploit trade-offs. This paper studies the stochastic contextual bandit problem with general bounded reward functions and proposes NeuralRBMLE, which adapts the RBMLE principle by adding a bias term to the log-likelihood to enforce exploration. NeuralRBMLE leverages the representation power of neural networks and encodes exploratory behavior directly in the parameter space, without constructing confidence intervals for the estimated rewards. We propose two variants of NeuralRBMLE: the first obtains the RBMLE estimator directly by gradient ascent, and the second reduces RBMLE to a simple index policy through an approximation. We show that both algorithms achieve 𝒪(√T) regret. Through extensive experiments, we demonstrate that the NeuralRBMLE algorithms achieve empirical regret comparable to or better than that of state-of-the-art methods on real-world datasets with non-linear reward functions.
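
To make the idea of biasing the log-likelihood concrete, here is a minimal sketch in a much simpler setting than the paper's: a non-contextual bandit with unit-variance Gaussian rewards and no neural network. Under that assumption, maximizing the log-likelihood plus a bias term alpha * theta_a for a candidate arm a gives an index of the form "empirical mean + alpha / (2 * pulls)". The function names, the Gaussian reward model, and the bias schedule alpha(t) = sqrt(t) are all illustrative assumptions, not the NeuralRBMLE algorithm itself.

```python
import math
import random


def rbmle_indices(means, counts, alpha):
    """Toy reward-biased index for each arm (assumes unit-variance Gaussian rewards).

    Maximizing -0.5 * n * (theta - mean)**2 + alpha * theta over theta gives
    theta = mean + alpha / n, and ranking arms by the resulting objective value
    is equivalent to ranking them by mean + alpha / (2 * n).
    """
    return [m + alpha / (2.0 * n) for m, n in zip(means, counts)]


def run_toy_bandit(true_means, horizon, seed=0):
    """Simulate the toy index policy and return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull every arm once so all counts are positive
        else:
            alpha = math.sqrt(t)  # assumed bias schedule, chosen for illustration
            means = [s / n for s, n in zip(sums, counts)]
            indices = rbmle_indices(means, counts, alpha)
            arm = max(range(k), key=indices.__getitem__)
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Hypothetical arm means, used only to exercise the sketch.
    print(run_toy_bandit([0.1, 0.5, 0.9], horizon=2000))
```

The bonus alpha / (2 * n) shrinks as an arm is pulled more often, so exploration comes from the biased estimate itself rather than from an explicit confidence interval, which is the behavior the abstract describes.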

research
10/08/2020

Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

Modifying the reward-biased maximum likelihood method originally propose...
research
07/02/2019

Bandit Learning Through Biased Maximum Likelihood Estimation

We propose BMLE, a new family of bandit algorithms, that are formulated ...
research
11/16/2020

Reward Biased Maximum Likelihood Estimation for Reinforcement Learning

The principle of Reward-Biased Maximum Likelihood Estimate Based Adaptiv...
research
03/12/2018

Semiparametric Contextual Bandits

This paper studies semiparametric contextual bandits, a generalization o...
research
06/25/2018

Asymptotic Properties of Recursive Maximum Likelihood Estimation in Non-Linear State-Space Models

Using stochastic gradient search and the optimal filter derivative, it i...
research
10/12/2022

Maximum entropy exploration in contextual bandits with neural networks and energy based models

Contextual bandits can solve a huge range of real-world problems. Howeve...
research
10/01/2021

Relative Contagiousness of Emerging Virus Variants: An Analysis of SARS-CoV-2 Alpha and Delta Variants

We propose a simple dynamic model for estimating the relative contagious...
