Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification

11/07/2022
by Takumi Tanabe, et al.

In reinforcement learning, policies are often trained in a simulation environment and then transferred to the corresponding real-world environment, because training in the real world is costly and risky. However, the simulation environment does not perfectly mimic the real-world environment, leading to model misspecification. Multiple studies report significant deterioration of policy performance in a real-world environment. In this study, we focus on scenarios involving a simulation environment with uncertainty parameters and the set of their possible values, called the uncertainty parameter set. The aim is to optimize the worst-case performance over the uncertainty parameter set to guarantee the performance in the corresponding real-world environment. To obtain a policy for this optimization, we propose an off-policy actor-critic approach called the Max-Min Twin Delayed Deep Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min optimization problem using a simultaneous gradient ascent-descent approach. Experiments in multi-joint dynamics with contact (MuJoCo) environments show that the proposed method exhibits worst-case performance superior to that of several baseline approaches.
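To make the phrase "simultaneous gradient ascent-descent" concrete, the following is a minimal sketch on a toy scalar problem, not the M2TD3 algorithm itself. It assumes a hypothetical objective f(x, omega) = -(x - 1)^2 + x*omega + omega^2, where x stands in for the policy parameters (maximized) and omega for an uncertainty parameter (minimized, restricted to an assumed interval playing the role of the uncertainty parameter set). The function name, step size, and bounds are illustrative choices and do not come from the paper.

```python
def simultaneous_gda(x=0.0, omega=0.0, lr=0.1, steps=500,
                     omega_bounds=(-1.0, 1.0)):
    """Toy simultaneous gradient ascent-descent for max_x min_omega f.

    Assumed objective (illustrative only):
        f(x, omega) = -(x - 1)**2 + x * omega + omega**2
    It is concave in x and convex in omega, so it has a unique saddle point.
    """
    lo, hi = omega_bounds
    for _ in range(steps):
        # Both gradients are evaluated at the current point (x, omega).
        grad_x = -2.0 * (x - 1.0) + omega   # df/dx
        grad_omega = x + 2.0 * omega        # df/domega

        # Ascent step for the maximizing variable.
        x_new = x + lr * grad_x
        # Descent step for the minimizing variable, then projection
        # onto the assumed uncertainty interval [lo, hi].
        omega_new = min(max(omega - lr * grad_omega, lo), hi)

        # Simultaneous update: neither step sees the other's new value.
        x, omega = x_new, omega_new
    return x, omega


if __name__ == "__main__":
    x_star, omega_star = simultaneous_gda()
    # Iterates approach the analytic saddle point (0.8, -0.4).
    print(x_star, omega_star)
```

Setting both partial derivatives to zero gives the saddle point x = 0.8, omega = -0.4, which the iterates approach for this step size. In a deep actor-critic setting the gradients would instead come from a learned critic, and convergence of simultaneous updates is not guaranteed in general.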


