First-order Policy Optimization for Robust Policy Evaluation

07/29/2023 ∙ by Yan Li, et al.
We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision processes (MDPs) with s-rectangular ambiguity sets. The developed method, named first-order policy evaluation (FRPE), provides the first unified framework for robust policy evaluation in both the deterministic (offline) and stochastic (online) settings, with either tabular representation or generic function approximation. In particular, we establish linear convergence in the deterministic setting, and Õ(1/ε^2) sample complexity in the stochastic setting. FRPE also extends naturally to evaluating the robust state-action value function with (s, a)-rectangular ambiguity sets. We discuss the application of the developed results to stochastic policy optimization of large-scale robust MDPs.
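To make the robust evaluation problem concrete, below is a minimal Python sketch, not the paper's FRPE algorithm. As a simplifying assumption of this sketch (not taken from the abstract), the s-rectangular ambiguity is restricted to the rewards: the adversary may perturb the reward at each state within an ℓ2 ball of radius beta. Under that choice the inner adversarial minimization has the closed form min over ||d||_2 <= beta of <pi(.|s), d> = -beta * ||pi(.|s)||_2, so robust evaluation reduces to value iteration with a penalized reward.

import numpy as np

def robust_policy_evaluation(P, r, pi, gamma, beta, tol=1e-8, max_iter=10_000):
    """P: (S, A, S) transition tensor; r: (S, A) rewards; pi: (S, A) policy.

    Sketch of robust policy evaluation with a per-state l2 reward
    ambiguity set of radius beta (an illustrative assumption, not the
    paper's setting); transitions are treated as fixed.
    """
    S, A = r.shape
    # Worst-case expected reward per state: nominal term minus the
    # closed-form adversarial penalty -beta * ||pi(.|s)||_2.
    r_pi = (pi * r).sum(axis=1) - beta * np.linalg.norm(pi, axis=1)
    # State-to-state transition matrix induced by the policy.
    P_pi = np.einsum("sa,sat->st", pi, P)
    V = np.zeros(S)
    for _ in range(max_iter):
        V_next = r_pi + gamma * P_pi @ V   # robust Bellman update
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
    return V

# Example usage on a random 3-state, 2-action MDP (illustrative only):
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(3), size=(3, 2))   # shape (S, A, S)
r = rng.uniform(size=(3, 2))
pi = np.full((3, 2), 0.5)                    # uniform policy
V_rob = robust_policy_evaluation(P, r, pi, gamma=0.9, beta=0.1)

Because the robust Bellman update above is a gamma-contraction, the iteration converges linearly, which mirrors the deterministic-setting rate stated in the abstract; in the stochastic setting the exact expectations are replaced by samples, which is where an Õ(1/ε^2)-type sample complexity arises.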


