User-Oriented Robust Reinforcement Learning

02/15/2022
by Haoyi You, et al.

Improving the robustness of policies across different environments has recently attracted increasing attention in the reinforcement learning (RL) community. Existing robust RL methods mostly aim for max-min robustness, optimizing the policy's performance in the worst-case environment. In practice, however, a user of an RL policy may have different preferences over its performance across environments, and max-min robustness is often too conservative to satisfy those preferences. In this paper, we therefore integrate user preference into policy learning in robust RL and propose a novel User-Oriented Robust RL (UOR-RL) framework. Specifically, we define a new User-Oriented Robustness (UOR) metric for RL, which allocates different weights to the environments according to user preference and generalizes the max-min robustness metric. To optimize the UOR metric, we develop two UOR-RL training algorithms, one for the scenario in which the environment distribution is known a priori and one for the scenario in which it is not. Theoretically, we prove that our UOR-RL training algorithms converge to near-optimal policies even with inaccurate or no knowledge of the environment distribution. Furthermore, we carry out extensive experimental evaluations on four MuJoCo tasks. The experimental results demonstrate that UOR-RL is comparable to the state-of-the-art baselines under the average and worst-case performance metrics and, more importantly, establishes new state-of-the-art performance under the UOR metric.
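The abstract does not give the exact form of the UOR metric, so the following is only an illustrative sketch of the idea it describes: environments receive different weights according to user preference, and max-min robustness appears as a special case. The function name `uor_score`, the grouping of environments into equally sized performance levels, and the example returns are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np


def uor_score(returns, level_weights):
    """Hypothetical user-weighted robustness score (not the paper's exact metric).

    returns:       per-environment returns of one policy.
    level_weights: non-negative user weights, one per performance level,
                   ordered from the worst level to the best level.
    """
    sorted_returns = np.sort(np.asarray(returns, dtype=float))  # worst first
    weights = np.asarray(level_weights, dtype=float)
    if weights.ndim != 1 or not 0 < weights.size <= sorted_returns.size:
        raise ValueError("need between 1 and len(returns) level weights")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    # Assumption: split the ranked environments into near-equal performance levels.
    levels = np.array_split(sorted_returns, weights.size)
    level_means = np.array([level.mean() for level in levels])
    return float(np.dot(weights, level_means))


returns = [10.0, 2.0, 6.0, 4.0]
worst_case = uor_score(returns, [1, 0, 0, 0])   # all weight on the worst environment
average = uor_score(returns, [1, 1, 1, 1])      # uniform weights
risk_averse = uor_score(returns, [3, 1])        # more weight on the worse half
```

In this sketch, placing all weight on the lowest level with one environment per level recovers the worst-case return, and uniform weights recover the average return. Intermediate weightings express preferences between those two extremes.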

research
03/18/2021

Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

In real-world tasks, reinforcement learning (RL) agents frequently encou...
research
06/12/2023

Robust Reinforcement Learning through Efficient Adversarial Herding

Although reinforcement learning (RL) is considered the gold standard for...
research
01/26/2023

Policy Optimization with Robustness Certificates

We present a policy optimization framework in which the learned policy c...
research
06/04/2021

Robustifying Reinforcement Learning Policies with ℒ_1 Adaptive Control

A reinforcement learning (RL) policy trained in a nominal environment co...
research
11/07/2022

Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

In the field of reinforcement learning, because of the high cost and ris...
research
10/21/2022

Group Distributionally Robust Reinforcement Learning with Hierarchical Latent Variables

One key challenge for multi-task Reinforcement learning (RL) in practice...
research
02/06/2023

Robust Subtask Learning for Compositional Generalization

Compositional reinforcement learning is a promising approach for trainin...
