SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

06/17/2021
by   Linxi Fan, et al.

Generalization has been a long-standing challenge for reinforcement learning (RL). Visual RL, in particular, can be easily distracted by irrelevant factors in high-dimensional observation space. In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift. We propose SECANT, a novel self-expert cloning technique that leverages image augmentation in two stages to decouple robust representation learning from policy optimization. Specifically, an expert policy is first trained by RL from scratch with weak augmentations. A student network then learns to mimic the expert policy by supervised learning with strong augmentations, making its representation more robust against visual variations compared to the expert. Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs include DeepMind Control (+26.5%) and autonomous driving (+47.7%). Code release and video are available at https://linxifan.github.io/secant-site/.
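The second stage described in the abstract (a student imitating the expert under strong augmentation) can be illustrated with a toy sketch. This is not the authors' implementation: the linear policies, the vector observations, the noise-based `weak_augment` and `strong_augment` functions, and all hyperparameters are assumptions chosen only to keep the example small and runnable. The expert is a fixed stand-in for a policy that stage 1 would have trained with RL.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 8, 2

# Stand-in for stage 1: an expert assumed to be already trained by RL
# under weak augmentation. Here it is only a fixed linear map (hypothetical).
W_expert = rng.normal(size=(OBS_DIM, ACT_DIM))

def expert_policy(obs):
    return obs @ W_expert

def weak_augment(obs):
    """Hypothetical weak augmentation: a very small additive perturbation."""
    return obs + rng.normal(scale=0.01, size=obs.shape)

def strong_augment(obs):
    """Hypothetical strong augmentation: random per-sample scaling plus noise."""
    scale = rng.uniform(0.5, 1.5, size=(obs.shape[0], 1))
    return obs * scale + rng.normal(scale=0.1, size=obs.shape)

def train_student(steps=500, batch=64, lr=0.05):
    """Stage 2: supervised imitation of the expert on strongly augmented inputs."""
    W_student = np.zeros((OBS_DIM, ACT_DIM))
    losses = []
    for _ in range(steps):
        obs = rng.normal(size=(batch, OBS_DIM))
        target = expert_policy(weak_augment(obs))  # expert labels, weak view
        aug = strong_augment(obs)                  # student input, strong view
        err = aug @ W_student - target
        losses.append(float(np.mean(err ** 2)))
        # Gradient of the mean squared error with respect to W_student.
        grad = 2.0 * aug.T @ err / (batch * ACT_DIM)
        W_student -= lr * grad
    return W_student, losses

if __name__ == "__main__":
    W_student, losses = train_student()
    print(f"first loss {losses[0]:.3f}, mean of last 50 losses {np.mean(losses[-50:]):.3f}")
```

The point of the sketch is the data flow: the expert's action labels come from lightly perturbed observations, while the student only ever sees heavily perturbed versions of the same observations, so policy optimization and robustness to visual variation are handled in separate stages.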


