Online learning in bandits with predicted context

07/26/2023

by Yongyi Guo, et al.

We consider the contextual bandit problem where, at each round, the agent observes only a noisy version of the context together with the error variance (or an estimator of this variance). This setting is motivated by a wide range of applications in which the true context for decision-making is unobserved, and only a prediction of the context by a potentially complex machine learning algorithm is available. When the context error is non-diminishing, classical bandit algorithms fail to achieve sublinear regret. We propose the first online algorithm in this setting with sublinear regret guarantees against the appropriate benchmark. The key idea is to extend the measurement error model from classical statistics to the online decision-making setting, which is nontrivial because the policy depends on the noisy context observations.
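The classical measurement-error idea the abstract refers to can be illustrated with a minimal sketch (not the paper's algorithm): when rewards are linear in the true context but only a noisy context is observed, naive least squares is attenuated toward zero, while subtracting the known noise contribution from the Gram matrix recovers an approximately unbiased estimate. All variable names and the linear-Gaussian setup below are illustrative assumptions.

```python
# Sketch: measurement-error correction for linear rewards with noisy contexts.
# Assumes additive Gaussian context noise with known variance sigma_e^2.
import numpy as np

rng = np.random.default_rng(42)
d, n, sigma_e = 3, 20000, 0.7
theta = np.array([1.0, -2.0, 0.5])              # true reward parameter (toy)
X = rng.normal(size=(n, d))                     # true contexts (unobserved)
y = X @ theta + 0.1 * rng.normal(size=n)        # rewards depend on true context
X_obs = X + sigma_e * rng.normal(size=(n, d))   # agent sees only noisy contexts

# Naive OLS on noisy contexts: biased (attenuated) estimate.
theta_naive = np.linalg.lstsq(X_obs, y, rcond=None)[0]

# Measurement-error correction: subtract the expected noise Gram bias
# E[X_obs' X_obs] = X'X + n * sigma_e^2 * I, so remove the second term.
G = X_obs.T @ X_obs - n * sigma_e**2 * np.eye(d)
theta_corr = np.linalg.solve(G, X_obs.T @ y)

err_naive = np.linalg.norm(theta_naive - theta)
err_corr = np.linalg.norm(theta_corr - theta)
```

In the online setting the paper considers, the difficulty is that the chosen actions (and hence the observed data) depend on the noisy contexts themselves, so this offline correction cannot be applied verbatim.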


research
06/27/2012

Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret

Online learning algorithms are designed to learn even when their input i...
research
06/09/2023

Strategic Apple Tasting

Algorithmic decision-making in high-stakes domains often involves assign...
research
11/13/2019

Context-aware Dynamic Assets Selection for Online Portfolio Selection based on Contextual Bandit

Online portfolio selection is a sequential decision-making problem in fi...
research
07/09/2018

Online Scoring with Delayed Information: A Convex Optimization Viewpoint

We consider a system where agents enter in an online fashion and are eva...
research
01/31/2023

Improved Algorithms for Multi-period Multi-class Packing Problems with Bandit Feedback

We consider the linear contextual multi-class multi-period packing probl...
research
10/15/2020

Decision Making Problems with Funnel Structure: A Multi-Task Learning Approach with Application to Email Marketing Campaigns

This paper studies the decision making problem with Funnel Structure. Fu...
research
12/12/2022

Autoregressive Bandits

Autoregressive processes naturally arise in a large variety of real-worl...
