Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

12/03/2019
by   Tiancheng Jin, et al.

We consider the problem of learning in episodic finite-horizon Markov decision processes with unknown transition function, bandit feedback, and adversarial losses. We propose an efficient algorithm that achieves Õ(L|X|^2√(|A|T)) regret with high probability, where L is the horizon, |X| is the number of states, |A| is the number of actions, and T is the number of episodes. To the best of our knowledge, ours is the first algorithm to ensure sublinear regret in this challenging setting. Our key technical contribution is an optimistic loss estimator that is inversely weighted by an upper occupancy bound.
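The estimator idea from the abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: the function name, the array-based interface, and the implicit-exploration constant gamma are assumptions made for the sketch. The core point it shows is that, with bandit feedback and an unknown transition function, the true visitation probability of each (state, action) pair is unavailable, so each observed loss is divided by an upper bound on that probability instead.

```python
import numpy as np

def optimistic_loss_estimate(visited, observed_losses, upper_occupancy, gamma=0.1):
    """Sketch of an inverse-weighted optimistic loss estimator.

    visited         : list of (state, action) pairs traversed in the episode
    observed_losses : losses seen on those pairs (bandit feedback)
    upper_occupancy : array of shape (|X|, |A|) upper-bounding the probability
                      that the current policy visits each (state, action) pair,
                      computed from a confidence set over the unknown transitions
    gamma           : small implicit-exploration constant (an assumption here)
    """
    estimate = np.zeros_like(upper_occupancy)
    for (x, a), loss in zip(visited, observed_losses):
        # Standard importance weighting would divide by the true visitation
        # probability, which is unknown when the transition function is unknown.
        # Dividing by an upper bound instead biases the estimate downward
        # (optimistically) while keeping its variance under control.
        estimate[x, a] = loss / (upper_occupancy[x, a] + gamma)
    return estimate
```

A policy-update routine (for example, online mirror descent over occupancy measures) would then consume these estimates each episode; that outer loop is omitted from the sketch.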


Related research

02/17/2021 · Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation
We study the reinforcement learning for finite-horizon episodic Markov d...

12/29/2020 · Learning Adversarial Markov Decision Processes with Delayed Feedback
Reinforcement learning typically assumes that the agent observes feedbac...

06/08/2021 · The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
We consider the best-of-both-worlds problem for learning an episodic Mar...

01/31/2021 · Online Markov Decision Processes with Aggregate Bandit Feedback
We study a novel variant of online finite-horizon Markov Decision Proces...

03/12/2013 · Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions
We study the problem of learning Markov decision processes with finite s...

05/26/2022 · Follow-the-Perturbed-Leader for Adversarial Markov Decision Processes with Bandit Feedback
We consider regret minimization for Adversarial Markov Decision Processe...

08/09/2014 · Bandit Algorithms for Tree Search
Bandit based methods for tree search have recently gained popularity whe...
