Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation

07/17/2023
by   Ruida Zhou, et al.

We study robust reinforcement learning (RL) with the goal of determining a well-performing policy that is robust against model mismatch between the training simulator and the testing environment. Previous policy-based robust RL algorithms mainly focus on the tabular setting under uncertainty sets that facilitate robust policy evaluation, but are no longer tractable when the number of states scales up. To this end, we propose two novel uncertainty set formulations, one based on double sampling and the other on an integral probability metric. Both make large-scale robust RL tractable even when one only has access to a simulator. We propose a robust natural actor-critic (RNAC) approach that incorporates the new uncertainty sets and employs function approximation. We provide finite-time convergence guarantees for the proposed RNAC algorithm to the optimal robust policy within the function approximation error. Finally, we demonstrate the robust performance of the policy learned by our proposed RNAC approach in multiple MuJoCo environments and a real-world TurtleBot navigation task.
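As a rough illustration of what robustness to model mismatch means for policy evaluation, the sketch below computes the worst-case value of a fixed policy over a small, finite set of candidate transition models in a tabular problem. This is not the RNAC algorithm and does not use the double-sampling or integral-probability-metric uncertainty sets from the abstract; the function name `robust_policy_evaluation`, the two-state example, and the finite set of candidate models are all assumptions made here for illustration.

```python
import numpy as np

def robust_policy_evaluation(P_candidates, r, gamma=0.9, iters=500):
    """Worst-case value of a fixed policy over a finite set of models.

    P_candidates: array-like of shape (K, S, S); each entry is a
        row-stochastic state-to-state transition matrix under the policy
        (a hypothetical, hand-picked uncertainty set).
    r: array-like of shape (S,), per-state reward under the policy.
    """
    P = np.asarray(P_candidates, dtype=float)
    r = np.asarray(r, dtype=float)
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        # Expected next-state value under each candidate model: shape (K, S).
        next_vals = P @ V
        # Worst case is taken independently in each state
        # (a rectangularity assumption on the uncertainty set).
        V = r + gamma * next_vals.min(axis=0)
    return V

# Toy two-state example (illustrative numbers, not from the paper).
r = np.array([1.0, 0.0])
P0 = np.array([[0.9, 0.1], [0.1, 0.9]])  # nominal "simulator" model
P1 = np.array([[0.7, 0.3], [0.1, 0.9]])  # perturbed "test" model

V_nominal = robust_policy_evaluation([P0], r)
V_robust = robust_policy_evaluation([P0, P1], r)
print("nominal:", V_nominal)
print("robust: ", V_robust)
```

With a single candidate model the iteration reduces to ordinary policy evaluation, and enlarging the set can only lower the resulting values. Enumerating models like this is only feasible for tiny state spaces, which is the tractability problem the abstract's new uncertainty sets and function approximation are meant to address.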

