Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

01/31/2022
by   Tiancheng Jin, et al.

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. In practice, however, feedback is often observed with delay. This paper studies online learning in episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs, and unrestricted delayed bandit feedback. More precisely, the feedback for the agent in episode k is revealed only at the end of episode k + d^k, where the delay d^k may change across episodes and is chosen by an oblivious adversary. We present the first algorithms that achieve near-optimal √(K + D) regret, where K is the number of episodes and D = ∑_{k=1}^K d^k is the total delay, significantly improving upon the best known regret bound of (K + D)^{2/3}.
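To make the feedback protocol concrete, here is a minimal sketch (illustrative only, not the paper's algorithm) of a delayed-feedback loop: the cost incurred in episode k becomes observable only at the end of episode k + d^k, and the total delay D = ∑_k d^k is the quantity that enters the √(K + D) regret bound. The function name and data layout are assumptions made for this example.

```python
# Sketch of the delayed bandit feedback protocol described in the abstract.
# Each episode k incurs a cost, but the learner observes it only at the end
# of episode k + d[k], where d[k] is chosen by an oblivious adversary.

def run_delayed_feedback(costs, delays):
    """Replay K episodes and return (observation log, total delay D).

    Feedback scheduled past the final episode is simply never delivered
    within the horizon, mirroring the unrestricted-delay setting.
    """
    K = len(costs)
    # Map: episode index at which feedback is revealed -> list of (origin, cost).
    pending = {}
    for k in range(K):
        reveal_at = k + delays[k]
        pending.setdefault(reveal_at, []).append((k, costs[k]))

    observed = []  # tuples (episode observed at, originating episode, cost)
    for k in range(K):
        for (src, c) in pending.get(k, []):
            observed.append((k, src, c))

    D = sum(delays)  # total delay; regret scales as sqrt(K + D)
    return observed, D

# Example: 4 episodes with adversarially chosen delays [0, 2, 1, 0].
obs, D = run_delayed_feedback(costs=[1.0, 0.5, 0.2, 0.9], delays=[0, 2, 1, 0])
# Episode 0's cost is seen immediately; episodes 1, 2, 3 are all revealed
# together at the end of episode 3.
```

Note that with adversarial delays several episodes' feedback can arrive in the same batch, which is what makes the √(K + D) dependence on the total delay (rather than the maximum delay) meaningful.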


