A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits

06/02/2023
by Mohammad Ghavamzadeh, et al.

Algorithms for offline bandits must optimize decisions in uncertain environments using only offline data. A compelling and increasingly popular objective in offline bandits is to learn a policy which achieves low Bayesian regret with high confidence. An appealing approach to this problem, inspired by recent offline reinforcement learning results, is to maximize a form of lower confidence bound (LCB). This paper proposes a new approach that directly minimizes upper bounds on Bayesian regret using efficient conic optimization solvers. Our bounds build on connections among Bayesian regret, Value-at-Risk (VaR), and chance-constrained optimization. Compared to prior work, our algorithm attains superior theoretical offline regret bounds and better results in numerical simulations. Finally, we provide some evidence that popular LCB-style algorithms may be unsuitable for minimizing Bayesian regret in offline bandits.
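To make the LCB-style baseline concrete, here is a minimal illustrative sketch of pessimistic arm selection in an offline multi-armed bandit: each arm gets a Gaussian posterior from the offline data, and the policy picks the arm maximizing the posterior mean minus a multiple of the posterior standard deviation. All names, priors, and the pessimism parameter `alpha` are hypothetical choices for illustration, not the paper's algorithm or its conic-optimization approach.

```python
import numpy as np

def lcb_policy(counts, reward_sums, alpha=1.0,
               prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Pick the arm maximizing a lower confidence bound (LCB):
    posterior mean minus alpha posterior standard deviations.
    Illustrative sketch only; parameters and priors are assumptions."""
    counts = np.asarray(counts, dtype=float)
    reward_sums = np.asarray(reward_sums, dtype=float)
    # Conjugate normal posterior for each arm's mean reward.
    post_var = 1.0 / (1.0 / prior_var + counts / noise_var)
    post_mean = post_var * (prior_mean / prior_var + reward_sums / noise_var)
    lcb = post_mean - alpha * np.sqrt(post_var)
    return int(np.argmax(lcb))

# Arm 0: well estimated (100 samples, empirical mean 0.55).
# Arm 1: poorly estimated (3 samples, empirical mean 0.80).
print(lcb_policy([100, 3], [55.0, 2.4]))  # prints 0: pessimism favors the well-estimated arm
```

The example shows the pessimism effect the abstract alludes to: with scarce data, the LCB penalty can override a higher empirical mean, which is exactly the behavior whose suitability for Bayesian regret minimization the paper questions.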
