Improving Offline Contextual Bandits with Distributional Robustness

11/13/2020
by Otmane Sakhi, et al.

This paper extends the Distributionally Robust Optimization (DRO) approach for offline contextual bandits. Specifically, we leverage this framework to introduce a convex reformulation of the Counterfactual Risk Minimization principle. Besides relying on convex programs, our approach is compatible with stochastic optimization, and can therefore be readily adapted to the large data regime. Our approach relies on the construction of asymptotic confidence intervals for offline contextual bandits through the DRO framework. By leveraging known asymptotic results of robust estimators, we also show how to automatically calibrate such confidence intervals, which in turn removes the burden of hyper-parameter selection for policy optimization. We present preliminary empirical results supporting the effectiveness of our approach.
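
To make the idea of a DRO-based confidence bound concrete, here is a minimal sketch rather than the authors' method. It assumes a KL-divergence ball of radius `epsilon` around the empirical distribution (the abstract does not name the divergence) and uses the standard dual form of the worst-case expectation, inf over γ > 0 of γ·ε + γ·log E[exp(ℓ/γ)], minimized over a grid of γ values. The function names, the grid, and the importance-weighted loss are illustrative assumptions.

```python
import numpy as np

def ips_losses(rewards, target_probs, logging_probs):
    """Per-sample importance-weighted losses (negated rewards).

    Assumption: losses are -reward * pi(a|x) / pi0(a|x), the usual
    inverse-propensity-scoring form for offline contextual bandits.
    """
    weights = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return -np.asarray(rewards, dtype=float) * weights

def kl_robust_risk(losses, epsilon, gammas=None):
    """Upper bound on the risk over a KL ball of radius epsilon.

    Evaluates the dual objective gamma*epsilon + gamma*log(mean(exp(loss/gamma)))
    on a grid of gamma values and returns the smallest value found.
    The grid search is a simplification chosen for readability; the dual is
    convex in gamma, so any 1-D convex solver could be used instead.
    """
    losses = np.asarray(losses, dtype=float)
    if gammas is None:
        gammas = np.logspace(-3, 4, 400)  # hypothetical default grid
    m = losses.max()
    # Shift by the max loss so every exponent is <= 0 (numerically stable).
    scaled = (losses[None, :] - m) / gammas[:, None]
    soft_max = m + gammas * np.log(np.exp(scaled).mean(axis=1))
    objective = gammas * epsilon + soft_max
    return float(objective.min())

# Usage on synthetic logged data (all values are made up for illustration).
rng = np.random.default_rng(0)
n = 500
rewards = rng.binomial(1, 0.3, size=n)
logging_probs = rng.uniform(0.2, 0.8, size=n)
target_probs = rng.uniform(0.2, 0.8, size=n)
losses = ips_losses(rewards, target_probs, logging_probs)
robust_bound = kl_robust_risk(losses, epsilon=0.05)
```

With `epsilon = 0` the bound collapses to the empirical risk, and it grows toward the largest observed loss as `epsilon` increases, so the radius plays the role of the confidence level that the abstract says can be calibrated automatically.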


research
05/21/2022

Pessimism for Offline Linear Contextual Bandits using ℓ_p Confidence Sets

We present a family {π̂}_p≥ 1 of pessimistic learning rules for offline ...
research
11/11/2021

Offline Contextual Bandits for Wireless Network Optimization

The explosion in mobile data traffic together with the ever-increasing e...
research
10/11/2016

Statistics of Robust Optimization: A Generalized Empirical Likelihood Approach

We study statistical inference and robust solution methods for stochasti...
research
10/24/2022

PAC-Bayesian Offline Contextual Bandits With Guarantees

This paper introduces a new principled approach for offline policy optim...
research
06/02/2023

A Convex Relaxation Approach to Bayesian Regret Minimization in Offline Bandits

Algorithms for offline bandits must optimize decisions in uncertain envi...
research
05/23/2023

Optimal Learning via Moderate Deviations Theory

This paper proposes a statistically optimal approach for learning a func...
research
07/15/2020

Upper Counterfactual Confidence Bounds: a New Optimism Principle for Contextual Bandits

The principle of optimism in the face of uncertainty is one of the most ...
