Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization

12/20/2021
by Yufei Kuang, et al.

Deep reinforcement learning algorithms can perform poorly in real-world tasks because of the discrepancy between source and target environments. This discrepancy is commonly viewed as a disturbance in the transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control over the simulator. However, these algorithms can fail when the disturbance in the target environment is unknown or intractable to model in a simulator. To tackle this problem, we propose a novel model-free actor-critic algorithm, state-conservative policy optimization (SCPO), that learns robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to a disturbance in state space and then approximates it with a simple gradient-based regularizer. SCPO is simple to implement and requires neither additional knowledge about the disturbance nor specially designed simulators. Experiments on several robot control tasks demonstrate that SCPO learns policies that are robust to disturbance in transition dynamics.
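
To make the idea of a gradient-based regularizer concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the worst-case value of a function over a small L2 ball of radius epsilon around a state is approximated to first order by subtracting epsilon times the norm of the state gradient. The names `numerical_gradient`, `conservative_value`, and `toy_value`, the finite-difference gradient, and the radius of 0.1 are all hypothetical choices for illustration.

```python
import math
from typing import Callable, List, Sequence


def numerical_gradient(
    f: Callable[[Sequence[float]], float],
    state: Sequence[float],
    step: float = 1e-5,
) -> List[float]:
    """Central-difference estimate of the gradient of f with respect to the state."""
    grad = []
    for i in range(len(state)):
        up = list(state)
        down = list(state)
        up[i] += step
        down[i] -= step
        grad.append((f(up) - f(down)) / (2.0 * step))
    return grad


def conservative_value(
    f: Callable[[Sequence[float]], float],
    state: Sequence[float],
    epsilon: float,
) -> float:
    """First-order approximation of the minimum of f over an L2 ball of radius
    epsilon around the state: f(s) - epsilon * ||grad_s f(s)||_2.

    Assumption: this penalty form is an illustrative stand-in for a
    gradient-based regularizer, not the exact objective used by SCPO.
    """
    grad = numerical_gradient(f, state)
    grad_norm = math.sqrt(sum(g * g for g in grad))
    return f(state) - epsilon * grad_norm


def toy_value(state: Sequence[float]) -> float:
    """Hypothetical linear value function used only for this demonstration."""
    return 3.0 * state[0] + 4.0 * state[1]


state = [1.0, 2.0]
nominal = toy_value(state)
robust = conservative_value(toy_value, state, epsilon=0.1)
print(nominal, robust)
```

For this linear toy function the first-order approximation is exact up to floating-point error: the gradient norm is 5, so the conservative value is close to 11 - 0.1 * 5 = 10.5. In an actor-critic setting the gradient would typically come from automatic differentiation of a learned network rather than finite differences.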

research
12/16/2021

Sample-Efficient Reinforcement Learning via Conservative Model-Based Actor-Critic

Model-based reinforcement learning algorithms, which aim to learn a mode...
research
07/18/2019

Transfer Learning Across Simulated Robots With Different Sensors

For a robot to learn a good policy, it often requires expensive equipmen...
research
06/01/2020

Robust Reinforcement Learning with Wasserstein Constraint

Robust Reinforcement Learning aims to find the optimal policy with some ...
research
09/23/2022

Quantification before Selection: Active Dynamics Preference for Robust Reinforcement Learning

Training a robust policy is critical for policy deployment in real-world...
research
10/27/2021

Dream to Explore: Adaptive Simulations for Autonomous Systems

One's ability to learn a generative model of the world without supervisi...
research
12/06/2022

ISAACS: Iterative Soft Adversarial Actor-Critic for Safety

The deployment of robots in uncontrolled environments requires them to o...
research
02/29/2020

Contextual Policy Reuse using Deep Mixture Models

Reinforcement learning methods that consider the context, or current sta...
