A Unified Analysis of Nonstochastic Delayed Feedback for Combinatorial Semi-Bandits, Linear Bandits, and MDPs

05/15/2023
by   Dirk van der Hoeven, et al.

We derive a new analysis of Follow The Regularized Leader (FTRL) for online learning with delayed bandit feedback. By separating the cost of delayed feedback from that of bandit feedback, our analysis allows us to obtain new results in three important settings. On the one hand, we derive the first optimal (up to logarithmic factors) regret bounds for combinatorial semi-bandits with delay and adversarial Markov decision processes with delay (and known transition functions). On the other hand, we use our analysis to derive an efficient algorithm for linear bandits with delay achieving near-optimal regret bounds. Our novel regret decomposition shows that FTRL remains stable across multiple rounds under mild assumptions on the Hessian of the regularizer.
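The abstract centers on Follow The Regularized Leader (FTRL). As a minimal illustrative sketch (not the paper's delayed-feedback algorithm), the snippet below shows one FTRL update over the probability simplex with a negative-entropy regularizer, in which case the minimizer has the closed form of exponential weights; the loss values and learning rate are arbitrary choices for the toy run.

```python
import numpy as np

def ftrl_entropy_step(loss_history, eta):
    """One FTRL step with a negative-entropy regularizer over the simplex.

    With this regularizer, the FTRL minimizer has the closed form of
    exponential weights: x_t is proportional to exp(-eta * cumulative_loss).
    """
    cum = loss_history.sum(axis=0)
    w = np.exp(-eta * (cum - cum.min()))  # shift exponent for numerical stability
    return w / w.sum()

# Toy run (illustrative only): 3 arms, arm 1 consistently cheapest.
rng = np.random.default_rng(0)
losses = np.zeros((0, 3))
for t in range(200):
    x = ftrl_entropy_step(losses, eta=0.5)
    loss = np.array([0.9, 0.1, 0.8]) + 0.05 * rng.standard_normal(3)
    losses = np.vstack([losses, loss])
print(x)  # mass concentrates on arm 1
```

In the full bandit setting the true losses are replaced by importance-weighted estimates built from the observed (possibly delayed) feedback; the sketch above only shows the full-information update rule.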


Related research

06/16/2021
Banker Online Mirror Descent
We propose Banker-OMD, a novel framework generalizing the classical Onli...

02/20/2023
A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
Best-of-both-worlds algorithms for online learning which achieve near-op...

07/21/2022
Delayed Feedback in Generalised Linear Bandits Revisited
The stochastic generalised linear bandit is a well-understood model for ...

12/29/2020
Learning Adversarial Markov Decision Processes with Delayed Feedback
Reinforcement learning typically assumes that the agent observes feedbac...

05/18/2022
Slowly Changing Adversarial Bandit Algorithms are Provably Efficient for Discounted MDPs
Reinforcement learning (RL) generalizes bandit problems with additional ...

12/17/2020
Experts with Lower-Bounded Loss Feedback: A Unifying Framework
The most prominent feedback models for the best expert problem are the f...

03/17/2015
Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits
We propose a sample-efficient alternative for importance weighting for s...
