Non-Stationary Markov Decision Processes, a Worst-Case Approach using Model-Based Reinforcement Learning

04/22/2019
by Erwan Lecarpentier, et al.

This work tackles the problem of robust zero-shot planning in non-stationary stochastic environments. We study Markov Decision Processes (MDPs) evolving over time and consider Model-Based Reinforcement Learning algorithms in this setting. We make two hypotheses: 1) the environment evolves continuously and its evolution rate is bounded; 2) a current model is known at each decision epoch, but not its future evolution. Our contribution is fourfold. First, we define this specific class of MDPs, which we call Non-Stationary MDPs (NSMDPs), and introduce the notion of regular evolution through a hypothesis of Lipschitz continuity of the transition and reward functions w.r.t. time. Second, we consider a planning agent that uses the current model of the environment but is unaware of its future evolution; this leads us to a worst-case method in which the environment is treated as an adversarial agent. Third, following this approach, we propose the Risk-Averse Tree-Search (RATS) algorithm, a zero-shot Model-Based method similar to minimax search. Finally, we illustrate the benefits brought by RATS empirically and compare its performance with reference Model-Based algorithms.
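To make the worst-case idea concrete, below is a minimal, hedged Python sketch of a RATS-style search; it is not the authors' implementation. The snapshot model dictionaries P and R, the Lipschitz constants L_p and L_r, and the fixed drift penalties are illustrative assumptions only.

# A minimal sketch (not the authors' implementation) of a RATS-style worst-case
# tree search. Assumptions: a finite snapshot MDP is given as dictionaries
# P[s][a] = {next_state: probability} and R[s][a] = reward, valid at the
# current decision epoch, together with hypothetical Lipschitz constants
# L_p and L_r bounding how far the transition and reward functions may drift
# per time step. Drift is folded in as a pessimistic penalty on rewards and
# backed-up values rather than as an explicit adversarial model choice.

def worst_case_q(s, a, depth, P, R, actions, L_p, L_r,
                 gamma=0.95, horizon=3, v_max=1.0):
    """Pessimistic Q-value: snapshot model minus the worst admissible drift."""
    # Under the Lipschitz hypothesis, the true reward at planning depth `depth`
    # can differ from the snapshot reward by at most L_r * depth.
    q = R[s][a] - L_r * depth
    if depth + 1 >= horizon:
        return q
    # Expected value of the best next action under the snapshot transitions ...
    expected = sum(
        prob * max(worst_case_q(s2, a2, depth + 1, P, R, actions,
                                L_p, L_r, gamma, horizon, v_max)
                   for a2 in actions)
        for s2, prob in P[s][a].items()
    )
    # ... lowered by the largest change a transition drift of L_p * depth
    # (in total variation) could induce on a value bounded by v_max.
    return q + gamma * (expected - L_p * depth * v_max)

def rats_action(s, P, R, actions, **kwargs):
    """Zero-shot control: replan from the current snapshot at every step."""
    return max(actions,
               key=lambda a: worst_case_q(s, a, 0, P, R, actions, **kwargs))

# Toy usage with two states, two actions, and small drift bounds.
P = {0: {0: {0: 0.9, 1: 0.1}, 1: {1: 1.0}},
     1: {0: {0: 1.0}, 1: {1: 1.0}}}
R = {0: {0: 0.1, 1: 1.0}, 1: {0: 0.0, 1: 0.2}}
print(rats_action(0, P, R, actions=[0, 1], L_p=0.05, L_r=0.05))

In the paper's full algorithm, the adversary explicitly selects the worst admissible transition and reward functions at each node of a minimax-style tree; the fixed penalties above only stand in for that adversarial step in this simplified sketch.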
