Policy Transfer with Strategy Optimization

10/12/2018
by Wenhao Yu, et al.

Computer simulation provides an automatic and safe way to train robotic control policies for complex tasks such as locomotion. However, a policy trained in simulation usually does not transfer directly to real hardware because of differences between the two environments. Transfer learning using domain randomization is a promising approach, but it usually assumes that the target environment is close to the distribution of the training environments, and so relies heavily on accurate system identification. In this paper, we present a different approach that leverages domain randomization for transferring control policies to unknown environments. The key idea is that, instead of learning a single policy in simulation, we simultaneously learn a family of policies that exhibit different behaviors. When tested in the target environment, we directly search for the best policy in the family based on task performance, without the need to identify the dynamic parameters. We evaluate our method on five simulated robotic control problems with different discrepancies between the training and testing environments, and demonstrate that our method can overcome larger modeling errors than training a robust policy or an adaptive policy.
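The test-time step described in the abstract, searching a family of policies by task performance alone, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy one-dimensional environment, the names `rollout_return` and `search_strategy`, and the use of plain random search as a stand-in optimizer (the abstract does not say which search method the authors use). The policy family is reduced to a single feedback gain `mu`, and the target's drag coefficient is never estimated.

```python
import random


def rollout_return(mu, target_drag, steps=50):
    """Hypothetical target environment: a 1-D point pushed toward the origin.

    `mu` indexes one member of the policy family (here just a feedback gain).
    Returns the negative accumulated squared distance, so higher is better.
    """
    x, v = 1.0, 0.0
    total = 0.0
    for _ in range(steps):
        action = -mu * x                              # policy selected by mu
        v = (1.0 - target_drag) * v + 0.1 * action    # unknown drag acts here
        x = x + 0.1 * v
        total -= x * x
    return total


def search_strategy(evaluate, low, high, n_samples=200, seed=0):
    """Random search over the family index, scored only by task return.

    This is a stand-in optimizer chosen for brevity, not the paper's method.
    """
    rng = random.Random(seed)
    best_mu, best_ret = None, float("-inf")
    for _ in range(n_samples):
        mu = rng.uniform(low, high)
        ret = evaluate(mu)
        if ret > best_ret:
            best_mu, best_ret = mu, ret
    return best_mu, best_ret


# The target drag is hidden inside the evaluation; the search never sees it.
def evaluate(mu):
    return rollout_return(mu, target_drag=0.3)


best_mu, best_ret = search_strategy(evaluate, low=0.0, high=20.0)
print(best_mu, best_ret)
```

The point of the sketch is the interface: the search only calls `evaluate`, so the dynamic parameters of the target never need to be identified. A more sample-efficient optimizer could replace the random search without changing that interface.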

research
02/09/2023

AdaptSim: Task-Driven Simulation Adaptation for Sim-to-Real Transfer

Simulation parameter settings such as contact models and object geometry...
research
07/01/2021

Policy Transfer across Visual and Dynamics Domain Gaps via Iterative Grounding

The ability to transfer a policy from one environment to another is a pr...
research
07/13/2021

Distributionally Robust Policy Learning via Adversarial Environment Generation

Our goal is to train control policies that generalize well to unseen env...
research
09/23/2022

Quantification before Selection: Active Dynamics Preference for Robust Reinforcement Learning

Training a robust policy is critical for policy deployment in real-world...
research
05/19/2022

Concurrent Policy Blending and System Identification for Generalized Assistive Control

In this work, we address the problem of solving complex collaborative ro...
research
11/03/2020

Policy Transfer via Kinematic Domain Randomization and Adaptation

Transferring reinforcement learning policies trained in physics simulati...
research
09/28/2021

Not Only Domain Randomization: Universal Policy with Embedding System Identification

Domain randomization (DR) cannot provide optimal policies for adapting t...
