EPOpt: Learning Robust Neural Network Policies Using Model Ensembles

10/05/2016
by   Aravind Rajeswaran, et al.
0

Sample complexity and safety are major challenges when learning policies with reinforcement learning for real-world tasks, especially when the policies are represented using rich function approximators like deep neural networks. Model-based methods where the real-world target domain is approximated using a simulated source domain provide an avenue to tackle the above challenges by augmenting real data with simulated data. However, discrepancies between the simulated source domain and the target domain pose a challenge for simulated training. We introduce the EPOpt algorithm, which uses an ensemble of simulated source domains and a form of adversarial training to learn policies that are robust and generalize to a broad range of possible target domains, including unmodeled effects. Further, the probability distribution over source domains in the ensemble can be adapted using data from target domain and approximate Bayesian methods, to progressively make it a better approximation. Thus, learning on a model ensemble, along with source domain adaptation, provides the benefit of both robustness and learning/adaptation.

READ FULL TEXT

page 6

page 12

page 13

research
11/12/2020

Learning causal representations for robust domain adaptation

Domain adaptation solves the learning problem in a target domain by leve...
research
04/04/2023

MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds

Unsupervised domain adaptation (UDA) addresses the problem of distributi...
research
01/15/2021

SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

As learning-based approaches progress towards automating robot controlle...
research
05/19/2018

Learning Sampling Policies for Domain Adaptation

We address the problem of semi-supervised domain adaptation of classific...
research
08/27/2020

Domain Adaptation Through Task Distillation

Deep networks devour millions of precisely annotated images to build the...
research
06/24/2020

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers

We propose a simple, practical, and intuitive approach for domain adapta...
research
05/28/2023

Cross-Domain Policy Adaptation via Value-Guided Data Filtering

Generalizing policies across different domains with dynamics mismatch po...

Please sign up or login with your details

Forgot password? Click here to reset