Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

06/04/2021
by Bogdan Mazoure, et al.

A highly desirable property of a reinforcement learning (RL) agent, and a major difficulty for deep RL approaches, is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training. Many promising approaches to this challenge treat RL as a process of training two functions simultaneously: a complex nonlinear encoder that maps high-dimensional observations to a latent representation space, and a simple linear policy over this space. We posit that a superior encoder for zero-shot generalization in RL can be trained using solely an auxiliary self-supervised learning (SSL) objective, provided the training process encourages the encoder to map behaviorally similar observations to similar representations; a reward-based signal, by contrast, can cause the encoder to overfit (Raileanu et al., 2021). We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations by applying a novel SSL objective to pairs of trajectories from the agent's policies. CTRL can be viewed as having the same effect as inducing a pseudo-bisimulation metric but, crucially, it avoids the use of rewards and the associated overfitting risks. Our experiments ablate various components of CTRL and demonstrate that, in combination with PPO, it achieves better generalization performance on the challenging Procgen benchmark suite (Cobbe et al., 2020).
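
The encoder/policy decomposition mentioned in the abstract can be illustrated with a minimal sketch. This is not the CTRL architecture or its SSL objective, neither of which the abstract specifies; it only shows the general shape of "nonlinear encoder plus linear policy over the latent space". All sizes, weight names, and the two-layer MLP standing in for the encoder are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
OBS_DIM, HIDDEN_DIM, LATENT_DIM, NUM_ACTIONS = 64, 32, 8, 4

# Nonlinear encoder parameters (a small MLP stands in for a deep network).
W1 = rng.normal(scale=0.1, size=(OBS_DIM, HIDDEN_DIM))
W2 = rng.normal(scale=0.1, size=(HIDDEN_DIM, LATENT_DIM))
# Linear policy head over the latent space.
W_pi = rng.normal(scale=0.1, size=(LATENT_DIM, NUM_ACTIONS))


def encode(obs):
    """Map observations (batch, OBS_DIM) to latents (batch, LATENT_DIM)."""
    return np.tanh(np.tanh(obs @ W1) @ W2)


def policy(latent):
    """Linear logits over the latent, normalized with a softmax."""
    logits = latent @ W_pi
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


obs = rng.normal(size=(5, OBS_DIM))   # a batch of random stand-in observations
latents = encode(obs)
action_probs = policy(latents)        # one action distribution per observation
```

Because the policy head is linear, observations that the encoder maps to nearby latents receive nearby action logits. This is why the quality of the encoder's representation matters so much in this setup.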


