Reward-Biased Maximum Likelihood Estimation for Linear Stochastic Bandits

10/08/2020
by Yu-Heng Hung, et al.

Modifying the reward-biased maximum likelihood method originally proposed in the adaptive control literature, we propose novel learning algorithms that handle the explore-exploit trade-off in linear bandit problems as well as generalized linear bandit problems. We develop novel index policies that we prove achieve order-optimal regret, and we show in extensive experiments that their empirical performance is competitive with state-of-the-art benchmark methods. For linear bandits, the new policies achieve this with low computation time per pull, resulting in both favorable regret and computational efficiency.
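The abstract does not spell out the index itself, so the sketch below is only an illustration of an RBMLE-style index policy for a linear stochastic bandit. The quadratic bias bonus, the bias schedule alpha(t) proportional to sqrt(t), the ridge-regularized least-squares estimate, and all names (rbmle_linear_bandit, get_reward, lam) are assumptions made for this example, not details confirmed by the paper.

```python
import numpy as np

def rbmle_linear_bandit(arms, get_reward, horizon, lam=1.0):
    """Sketch of an RBMLE-style index policy for a linear bandit.

    arms:       (K, d) array, one feature vector per arm
    get_reward: callable returning a noisy reward for a chosen arm
    horizon:    number of pulls
    lam:        ridge-regularization strength (assumed)
    """
    d = arms.shape[1]
    V = lam * np.eye(d)     # regularized Gram matrix of pulled arms
    b = np.zeros(d)         # running sum of reward-weighted arm features
    total_reward = 0.0
    for t in range(1, horizon + 1):
        theta_hat = np.linalg.solve(V, b)   # ridge estimate of the unknown parameter
        alpha = np.sqrt(t)                  # assumed reward-bias schedule
        V_inv = np.linalg.inv(V)
        # Index = estimated mean reward + reward-bias bonus (quadratic form in V^{-1})
        bonus = 0.5 * alpha * np.einsum('ij,jk,ik->i', arms, V_inv, arms)
        x = arms[np.argmax(arms @ theta_hat + bonus)]
        r = get_reward(x)
        V += np.outer(x, x)                 # rank-one update of the Gram matrix
        b += r * x
        total_reward += r
    return total_reward

# Hypothetical usage on synthetic data
rng = np.random.default_rng(0)
theta_star = rng.normal(size=5)
arm_set = rng.normal(size=(20, 5))
noisy_reward = lambda x: x @ theta_star + rng.normal(scale=0.1)
print(rbmle_linear_bandit(arm_set, noisy_reward, horizon=1000))
```

In this sketch the index is a single quadratic form per arm computed from the current ridge estimate, which is consistent with the abstract's claim of low per-pull computation; the bias term plays the exploration role that a confidence bonus plays in UCB-style methods.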


Related research

07/02/2019 · Bandit Learning Through Biased Maximum Likelihood Estimation
We propose BMLE, a new family of bandit algorithms, that are formulated ...

03/08/2022 · Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation
Reward-biased maximum likelihood estimation (RBMLE) is a classic princip...

04/29/2020 · Whittle index based Q-learning for restless bandits with average reward
A novel reinforcement learning algorithm is introduced for multiarmed re...

11/16/2020 · Reward Biased Maximum Likelihood Estimation for Reinforcement Learning
The principle of Reward-Biased Maximum Likelihood Estimate Based Adaptiv...

02/11/2019 · Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits
We improve the efficiency of algorithms for stochastic combinatorial sem...

01/28/2023 · (Private) Kernelized Bandits with Distributed Biased Feedback
In this paper, we study kernelized bandits with distributed biased feedb...

10/24/2022 · Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
We study the problem of representation learning in stochastic contextual...
