Cross-Trajectory Representation Learning for Zero-Shot Generalization in RL

by   Bogdan Mazoure, et al.

A highly desirable property of a reinforcement learning (RL) agent – and a major difficulty for deep RL approaches – is the ability to generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training. Many promising approaches to this challenge consider RL as a process of training two functions simultaneously: a complex nonlinear encoder that maps high-dimensional observations to a latent representation space, and a simple linear policy over this space. We posit that a superior encoder for zero-shot generalization in RL can be trained by using solely an auxiliary SSL objective if the training process encourages the encoder to map behaviorally similar observations to similar representations, as reward-based signal can cause overfitting in the encoder (Raileanu et al., 2021). We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations by applying a novel SSL objective to pairs of trajectories from the agent's policies. CTRL can be viewed as having the same effect as inducing a pseudo-bisimulation metric but, crucially, avoids the use of rewards and associated overfitting risks. Our experiments ablate various components of CTRL and demonstrate that in combination with PPO it achieves better generalization performance on the challenging Procgen benchmark suite (Cobbe et al., 2020).


page 1

page 2

page 3

page 4


SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Generalization has been a long-standing challenge for reinforcement lear...

Improving Zero-shot Generalization in Offline Reinforcement Learning using Generalized Similarity Functions

Reinforcement learning (RL) agents are widely used for solving complex s...

Unified State Representation Learning under Data Augmentation

The capacity for rapid domain adaptation is important to increasing the ...

Domain Adaptation In Reinforcement Learning Via Latent Unified State Representation

Despite the recent success of deep reinforcement learning (RL), domain a...

Tackling Visual Control via Multi-View Exploration Maximization

We present MEM: Multi-view Exploration Maximization for tackling complex...

Continuous Coordination As a Realistic Scenario for Lifelong Learning

Current deep reinforcement learning (RL) algorithms are still highly tas...

Stabilizing Off-Policy Deep Reinforcement Learning from Pixels

Off-policy reinforcement learning (RL) from pixel observations is notori...

Please sign up or login with your details

Forgot password? Click here to reset