Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

03/12/2013
by   Yasin Abbasi-Yadkori, et al.

We study the problem of learning Markov decision processes with finite state and action spaces when the transition probability distributions and loss functions are chosen adversarially and are allowed to change with time. We introduce an algorithm whose regret with respect to any policy in a comparison class grows as the square root of the number of rounds of the game, provided the transition probabilities satisfy a uniform mixing condition. Our approach is efficient as long as the comparison class is polynomial and we can compute expectations over sample paths for each policy. Designing an efficient algorithm with small regret for the general case remains an open problem.
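To make the square-root regret claim concrete, here is a minimal sketch of the standard exponentially weighted forecaster (Hedge) run over a finite comparison class. It is not taken from the paper and is not presented as the authors' algorithm: it assumes each policy's per-round loss is handed to the learner directly, with losses in [0, 1], and it ignores state dynamics and the mixing condition entirely. The names `hedge_weights`, `run_hedge`, and `demo` are hypothetical.

```python
import math
import random


def hedge_weights(cumulative_losses, eta):
    """Exponential-weights distribution over policies given cumulative losses."""
    # Subtract the minimum before exponentiating for numerical stability.
    m = min(cumulative_losses)
    w = [math.exp(-eta * (c - m)) for c in cumulative_losses]
    total = sum(w)
    return [x / total for x in w]


def run_hedge(loss_rounds, eta):
    """Return (expected loss of the forecaster, loss of the best fixed policy)."""
    n = len(loss_rounds[0])
    cum = [0.0] * n
    expected_loss = 0.0
    for losses in loss_rounds:
        # Choose the distribution before this round's losses are revealed.
        p = hedge_weights(cum, eta)
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return expected_loss, min(cum)


def demo(T=2000, n_policies=8, seed=0):
    """Assumed toy setup: i.i.d. uniform losses stand in for an adversary."""
    rng = random.Random(seed)
    loss_rounds = [[rng.random() for _ in range(n_policies)] for _ in range(T)]
    # Standard tuning for losses in [0, 1] and a known horizon T.
    eta = math.sqrt(8 * math.log(n_policies) / T)
    alg, best = run_hedge(loss_rounds, eta)
    # Classical guarantee for this tuning: regret <= sqrt(T * ln(N) / 2).
    bound = math.sqrt(T * math.log(n_policies) / 2)
    return alg - best, bound
```

Calling `demo()` returns the realized regret and the classical bound `sqrt(T ln(N) / 2)`. The per-round cost is linear in the number of policies, which illustrates why a polynomial-size comparison class matters for efficiency.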

research
11/04/2019

On Online Learning in Kernelized Markov Decision Processes

We develop algorithms with low regret for learning episodic Markov decis...
research
02/21/2017

Fast rates for online learning in Linearly Solvable Markov Decision Processes

We study the problem of online learning in a class of Markov decision pr...
research
12/03/2019

Learning Adversarial MDPs with Bandit Feedback and Unknown Transition

We consider the problem of learning in episodic finite-horizon Markov de...
research
09/09/2019

An Efficient Algorithm for Multiple-Pursuer-Multiple-Evader Pursuit/Evasion Game

We present a method for pursuit/evasion that is highly efficient and...
research
05/04/2020

Stateful Posted Pricing with Vanishing Regret via Dynamic Deterministic Markov Decision Processes

In this paper, a rather general online problem called dynamic resource a...
research
01/31/2021

Online Markov Decision Processes with Aggregate Bandit Feedback

We study a novel variant of online finite-horizon Markov Decision Proces...
research
06/29/2014

Thompson Sampling for Learning Parameterized Markov Decision Processes

We consider reinforcement learning in parameterized Markov Decision Proc...
