Efficient Learning in Non-Stationary Linear Markov Decision Processes

10/24/2020
by Ahmed Touati, et al.

We study episodic reinforcement learning in non-stationary linear (a.k.a. low-rank) Markov Decision Processes (MDPs), i.e., both the reward and the transition kernel are linear with respect to a given feature map and are allowed to evolve either slowly or abruptly over time. For this problem setting, we propose OPT-WLSVI, an optimistic model-free algorithm based on weighted least squares value iteration, which uses exponential weights to smoothly forget data that are far in the past. We show that our algorithm, when competing against the best policy at each time, achieves a regret that is upper bounded by 𝒪(d^{7/6} H^2 Δ^{1/3} K^{2/3}), where d is the dimension of the feature space, H is the planning horizon, K is the number of episodes, and Δ is a suitable measure of non-stationarity of the MDP. This is the first regret bound for non-stationary reinforcement learning with linear function approximation.
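To make the idea of "exponential weights that smoothly forget old data" concrete, the following is a minimal sketch of an exponentially weighted ridge regression, the kind of estimator a weighted least squares value iteration step could build on. It is not the OPT-WLSVI algorithm itself: the optimism bonus and the backward value iteration over the horizon are omitted, and the function names, the discount symbol `eta`, and the regularizer `lam` are assumptions made for illustration rather than notation taken from the abstract.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def discounted_ridge(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    eta: float,
    lam: float,
) -> List[float]:
    """Minimise sum_t eta^(n-1-t) * (y_t - <phi_t, w>)^2 + lam * ||w||^2.

    eta in (0, 1] is a hypothetical forgetting factor: the most recent
    sample has weight 1 and a sample that is k steps old has weight eta^k.
    """
    n = len(targets)
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    moment = [0.0] * d
    for t, (phi, y) in enumerate(zip(features, targets)):
        weight = eta ** (n - 1 - t)
        for i in range(d):
            moment[i] += weight * phi[i] * y
            for j in range(d):
                gram[i][j] += weight * phi[i] * phi[j]
    return solve_linear_system(gram, moment)
```

As a usage example, take a one-dimensional constant feature with targets that jump from 1 to 3 halfway through twenty samples. With `eta=1.0` and `lam=0.0` the estimate is the plain average of the targets, whereas with `eta=0.5` it lands close to 3 because the samples collected before the change carry almost no weight. This illustrates the trade-off the abstract points to: a smaller forgetting factor tracks abrupt changes faster but effectively uses fewer samples.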


