Distributionally Robust Policy Learning via Adversarial Environment Generation

07/13/2021
by   Allen Z. Ren, et al.

Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 2D/3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
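The latent-space gradient ascent described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a hypothetical differentiable cost predictor over latent variables (supplied here through its gradient, `cost_grad`), and it replaces the Wasserstein-ball constraint with a quadratic penalty `gamma * ||z - z0||^2` that keeps the perturbed latent close to the original one. The names `perturb_latent`, `cost_grad`, `gamma`, `step_size`, and `num_steps` are illustrative only.

```python
from typing import Callable, List

Vector = List[float]


def perturb_latent(
    z0: Vector,
    cost_grad: Callable[[Vector], Vector],
    gamma: float = 1.0,
    step_size: float = 0.1,
    num_steps: int = 50,
) -> Vector:
    """Gradient ascent on: predicted_cost(z) - gamma * ||z - z0||^2.

    z0        : latent code of an environment from the training set
    cost_grad : gradient of a (hypothetical) cost predictor w.r.t. z
    gamma     : penalty weight; larger values keep z closer to z0
                (assumed stand-in for the Wasserstein-ball constraint)
    """
    z = list(z0)
    for _ in range(num_steps):
        g = cost_grad(z)
        # Ascend the predicted cost while being pulled back toward z0.
        z = [
            zi + step_size * (gi - 2.0 * gamma * (zi - z0i))
            for zi, gi, z0i in zip(z, g, z0)
        ]
    return z


# Toy usage: a linear cost predictor c(z) = w . z, whose gradient is w.
w = [1.0, -2.0]


def toy_cost_grad(z: Vector) -> Vector:
    return list(w)


z_adv = perturb_latent([0.0, 0.0], toy_cost_grad, gamma=1.0,
                       step_size=0.1, num_steps=200)
```

For this toy linear predictor the penalized objective has the closed-form maximizer `z0 + w / (2 * gamma)`, so the loop can be checked against it. In the setting the abstract describes, the perturbed latent would then be decoded by the generative model into a new environment for further policy training; that decoding step is omitted here.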


research
10/12/2018

Policy Transfer with Strategy Optimization

Computer simulation provides an automatic and safe way for training robo...
research
11/16/2021

Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data

We are motivated by the problem of learning policies for robotic systems...
research
10/24/2022

Learning Robust Real-World Dexterous Grasping Policies via Implicit Shape Augmentation

Dexterous robotic hands have the capability to interact with a wide vari...
research
08/05/2020

Generalization Guarantees for Multi-Modal Imitation Learning

Control policies from imitation learning can often fail to generalize to...
research
06/25/2021

Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning

Our goal is to perform out-of-distribution (OOD) detection, i.e., to det...
research
09/18/2022

Towards Robust Off-Policy Evaluation via Human Inputs

Off-policy Evaluation (OPE) methods are crucial tools for evaluating pol...
research
05/07/2022

Gaussian Process Self-triggered Policy Search in Weakly Observable Environments

The environments of such large industrial machines as waste cranes in wa...
