Online Policy Optimization for Robust MDP

09/28/2022
by Jing Dong, et al.

Reinforcement learning (RL) has exceeded human performance in many synthetic settings such as video games and Go. However, real-world deployment of end-to-end RL models is less common, as RL models can be very sensitive to slight perturbations of the environment. The robust Markov decision process (MDP) framework, in which the transition probabilities belong to an uncertainty set around a nominal model, provides one way to develop robust models. While previous analyses show that RL algorithms are effective when given access to a generative model, it remains unclear whether RL can be efficient in the more realistic online setting, which requires a careful balance between exploration and exploitation. In this work, we consider the online robust MDP setting, in which the agent learns by interacting with an unknown nominal system. We propose a robust optimistic policy optimization algorithm that is provably efficient. To address the additional uncertainty caused by an adversarial environment, our model features a new optimistic update rule derived via Fenchel conjugates. Our analysis establishes the first regret bound for online robust MDPs.
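
To make the robust MDP framing concrete, the following is a minimal sketch of a single robust Bellman backup, in which the adversary picks the worst transition distribution inside an uncertainty set around the nominal model. It is not the algorithm proposed in the paper: the L1-ball uncertainty set, the radius `rho`, the known nominal kernel `P0`, and the function names `worst_case_expectation` and `robust_backup` are all assumptions made purely for illustration. The online setting studied here would additionally have to estimate `P0` from interaction.

```python
import numpy as np


def worst_case_expectation(p0, v, rho):
    """Minimize p @ v over distributions p with ||p - p0||_1 <= rho.

    Assumed uncertainty set: an L1 ball of radius rho around the nominal
    distribution p0 (an illustrative choice, not taken from the paper).
    The minimizer moves up to rho / 2 probability mass from the
    highest-value states onto the lowest-value state.
    """
    p = np.array(p0, dtype=float)
    v = np.asarray(v, dtype=float)
    lo = int(np.argmin(v))
    # Mass that can be shifted is capped by the radius and by what is
    # still available outside the lowest-value state.
    budget = min(rho / 2.0, 1.0 - p[lo])
    p[lo] += budget
    # Remove the same amount of mass, starting from the highest values.
    for s in np.argsort(v)[::-1]:
        if budget <= 0:
            break
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
    return float(p @ v)


def robust_backup(P0, R, V_next, rho):
    """One robust backup: Q[s, a] = R[s, a] + min_p p @ V_next.

    P0 has shape (S, A, S), R has shape (S, A), V_next has shape (S,).
    The minimum is taken independently for every (s, a) pair.
    """
    S, A = R.shape
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = R[s, a] + worst_case_expectation(P0[s, a], V_next, rho)
    return Q


if __name__ == "__main__":
    nominal = [0.5, 0.3, 0.2]
    values = [1.0, 2.0, 3.0]
    print("nominal:", float(np.dot(nominal, values)))
    print("robust :", worst_case_expectation(nominal, values, rho=0.2))
```

In this toy example the nominal expectation is 1.7, while the worst case within radius 0.2 drops to about 1.5, which is the kind of pessimism a robust policy has to plan against.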

research
05/11/2023

On Practical Robust Reinforcement Learning: Practical Uncertainty Set and Double-Agent Algorithm

We study a robust reinforcement learning (RL) with model uncertainty. Gi...
research
12/31/2021

Robust Entropy-regularized Markov Decision Processes

Stochastic and soft optimal policies resulting from entropy-regularized ...
research
03/13/2022

Policy Learning for Robust Markov Decision Process with a Mismatched Generative Model

In high-stake scenarios like medical treatment and auto-piloting, it's r...
research
06/09/2023

Robust Reinforcement Learning via Adversarial Kernel Approximation

Robust Markov Decision Processes (RMDPs) provide a framework for sequent...
research
05/23/2018

Representation Balancing MDPs for Off-Policy Policy Evaluation

We study the problem of off-policy policy evaluation (OPPE) in RL. In co...
research
05/31/2022

Graph Backup: Data Efficient Backup Exploiting Markovian Transitions

The successes of deep Reinforcement Learning (RL) are limited to setting...
research
06/15/2021

Fundamental Limits of Reinforcement Learning in Environment with Endogeneous and Exogeneous Uncertainty

Online reinforcement learning (RL) has been widely applied in informatio...
