Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

06/01/2020
by Anoopkumar Sonar, et al.

A fundamental challenge in reinforcement learning is to learn policies that generalize beyond the operating domain experienced during training. In this paper, we approach this challenge through the following invariance principle: an agent must find a representation such that there exists an action-predictor built on top of this representation that is simultaneously optimal across all training domains. Intuitively, the resulting invariant policy enhances generalization by finding causes of successful actions. We propose a novel learning algorithm, Invariant Policy Optimization (IPO), that explicitly enforces this principle and learns an invariant policy during training. We compare our approach with standard policy gradient methods and demonstrate significant improvements in generalization performance on unseen domains for Linear Quadratic Regulator (LQR) problems and our own benchmark in the MiniGrid Gym environment.
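
The invariance principle stated above can be illustrated with a small numerical check. The sketch below is not the IPO algorithm from the paper. It is a simplified supervised surrogate that assumes a fixed representation, a linear action-predictor `w`, and a squared-error loss against known target actions. Under those assumptions, `w` is simultaneously optimal across domains exactly when the loss gradient with respect to `w` vanishes in every domain separately. The function names, the tolerance, and the toy data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def domain_gradient_norms(features_by_domain, actions_by_domain, w):
    """Norm of the mean squared-error gradient w.r.t. w, one value per domain."""
    norms = []
    for phi, actions in zip(features_by_domain, actions_by_domain):
        residual = phi @ w - actions
        grad = 2.0 * phi.T @ residual / len(actions)
        norms.append(float(np.linalg.norm(grad)))
    return norms


def is_simultaneously_optimal(features_by_domain, actions_by_domain, w, tol=1e-8):
    """True if w is a stationary point of the loss in every domain at once."""
    norms = domain_gradient_norms(features_by_domain, actions_by_domain, w)
    return all(g <= tol for g in norms)


# Toy data (assumed): the target action depends only on a causal feature x0.
rng = np.random.default_rng(0)
x0_a = rng.normal(size=(50, 1))
x0_b = rng.normal(size=(50, 1))
actions = [2.0 * x0_a[:, 0], 2.0 * x0_b[:, 0]]

# Representation 1 keeps the causal feature in both domains.
causal_features = [x0_a, x0_b]

# Representation 2 keeps a spurious feature whose relation to the action
# flips sign between the two domains.
spurious_features = [actions[0][:, None], -actions[1][:, None]]

causal_ok = is_simultaneously_optimal(causal_features, actions, np.array([2.0]))
spurious_ok = is_simultaneously_optimal(spurious_features, actions, np.array([1.0]))
```

In this toy setting the predictor on the causal representation is optimal in both domains, while the predictor fitted to the spurious feature in the first domain fails in the second, which is the kind of representation the invariance principle is meant to rule out.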

research
02/20/2021

Decoupling Value and Policy for Generalization in Reinforcement Learning

Standard deep reinforcement learning algorithms use a shared representat...
research
06/07/2023

Generalization Across Observation Shifts in Reinforcement Learning

Learning policies which are robust to changes in the environment are cri...
research
04/07/2021

Unsupervised Visual Attention and Invariance for Reinforcement Learning

Vision-based reinforcement learning (RL) is successful, but how to gener...
research
11/06/2021

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

In reinforcement learning, continuous time is often discretized by a tim...
research
06/26/2018

Deictic Image Maps: An Abstraction For Learning Pose Invariant Manipulation Policies

In applications of deep reinforcement learning to robotics, it is often ...
research
06/05/2023

Explore to Generalize in Zero-Shot RL

We study zero-shot generalization in reinforcement learning - optimizing...
research
03/19/2020

Exchangeable Input Representations for Reinforcement Learning

Poor sample efficiency is a major limitation of deep reinforcement learn...
