SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

by Yifeng Jiang, et al.

As learning-based approaches progress towards automating robot controller design, transferring learned policies to new domains with different dynamics (e.g. sim-to-real transfer) still demands manual effort. This paper introduces SimGAN, a framework that tackles domain adaptation by identifying a hybrid physics simulator whose simulated trajectories match those from the target domain, using a learned discriminative loss to address the limitations of manual loss design. Our hybrid simulator combines neural networks and traditional physics simulation to balance expressiveness and generalizability, and alleviates the need for a carefully selected parameter set in system identification (System ID). Once the hybrid simulator is identified via adversarial reinforcement learning, it can be used to refine policies for the target domain without collecting more data. We show that our approach outperforms multiple strong baselines on six robotic locomotion tasks for domain adaptation.
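
To make the idea of a hybrid simulator scored by a learned discriminator more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the 1-D point mass, the friction parameter, and the names physics_step, param_fn, hybrid_step, discriminator_score and adversarial_reward are all hypothetical, and a small linear-plus-softplus function stands in for the neural network.

```python
import math

def physics_step(pos, vel, force, friction, dt=0.01, mass=1.0):
    """Analytic part: one semi-implicit Euler step for a 1-D point mass
    with viscous friction (a toy stand-in for a physics engine)."""
    acc = (force - friction * vel) / mass
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

def param_fn(pos, vel, weights):
    """Learned part (hypothetical stand-in for a neural network):
    maps the current state to a positive friction coefficient."""
    w_pos, w_vel, bias = weights
    z = w_pos * pos + w_vel * vel + bias
    return math.log1p(math.exp(z))  # softplus keeps the coefficient positive

def hybrid_step(pos, vel, force, weights):
    """Hybrid simulator: the learned function picks a physical parameter,
    and the analytic simulator advances the state with it."""
    friction = param_fn(pos, vel, weights)
    return physics_step(pos, vel, force, friction)

def discriminator_score(transition, d_weights):
    """Hypothetical logistic discriminator: estimated probability that a
    transition (state, action, next state) came from the target domain."""
    z = sum(w * x for w, x in zip(d_weights, transition))
    return 1.0 / (1.0 + math.exp(-z))

def adversarial_reward(score, eps=1e-8):
    """Assumed reward for the simulator side: larger when the discriminator
    rates a simulated transition as target-like."""
    return math.log(score + eps)

# Usage: simulate one step and score the resulting transition.
pos, vel, force = 0.0, 1.0, 0.0
next_pos, next_vel = hybrid_step(pos, vel, force, weights=(0.0, 0.0, 0.0))
transition = (pos, vel, force, next_pos, next_vel)
reward = adversarial_reward(discriminator_score(transition, (0.0,) * 5))
```

In this sketch the learned function only chooses a parameter of the analytic step, so the next state always follows the physics update. This is one simple way to combine expressiveness with physical structure. Treating the discriminator output as a reward is one way to train such a simulator with reinforcement learning, which does not require differentiating through the physics step.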




Learning Sampling Policies for Domain Adaptation

We address the problem of semi-supervised domain adaptation of classific...

EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

Sample complexity and safety are major challenges when learning policies...

Domain Adversarial Reinforcement Learning for Partial Domain Adaptation

Partial domain adaptation aims to transfer knowledge from a label-rich s...

Fisher Deep Domain Adaptation

Deep domain adaptation models learn a neural network in an unlabeled tar...

Compound Domain Adaptation in an Open World

Existing works on domain adaptation often assume clear boundaries betwee...

Stochastic Grounded Action Transformation for Robot Learning in Simulation

Robot control policies learned in simulation do not often transfer well ...

Learning and Deploying Robust Locomotion Policies with Minimal Dynamics Randomization

Training deep reinforcement learning (DRL) locomotion policies often req...