Generalization Across Observation Shifts in Reinforcement Learning

06/07/2023
by Anuj Mahajan, et al.

Learning policies that are robust to changes in the environment is critical for real-world deployment of reinforcement learning agents, and is also necessary for good generalization across environment shifts. We focus on bisimulation metrics, which provide a powerful means of abstracting the task-relevant components of an observation and learning a succinct representation space in which to train the agent with reinforcement learning. In this work, we extend the bisimulation framework to also account for context-dependent observation shifts. Specifically, we focus on the simulator-based learning setting and use alternate observations to learn a representation space that is invariant to observation shifts, using a novel bisimulation-based objective. This allows us to deploy the agent under varying observation settings at test time and to generalize to unseen scenarios. We further provide novel theoretical bounds on simulator fidelity and performance-transfer guarantees for applying a learnt policy to unseen shifts. Empirical analysis on high-dimensional image-based control domains demonstrates the efficacy of our method.
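
To make the idea of a bisimulation-based representation objective more concrete, here is a minimal sketch in plain Python. It is not the objective proposed in this paper, which the abstract does not spell out. It assumes the commonly used formulation in which the latent distance between two states is regressed onto their reward difference plus a discounted Wasserstein distance between predicted next-latent distributions (modelled here as diagonal Gaussians). The extra invariance term, which pulls together the embeddings of the same state under an original and an alternate observation, is an illustrative assumption, as are all function names and the choice of discount.

```python
import math


def l1_distance(u, v):
    """L1 distance between two latent vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(u, v))


def gaussian_w2(mu_a, sigma_a, mu_b, sigma_b):
    """2-Wasserstein distance between two diagonal Gaussians.

    sigma_* are standard deviations, not variances.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_a, mu_b))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_a, sigma_b))
    return math.sqrt(mean_term + std_term)


def bisimulation_target(r_a, r_b, mu_a, sigma_a, mu_b, sigma_b, gamma=0.99):
    """Target distance: reward gap plus discounted next-latent distance."""
    return abs(r_a - r_b) + gamma * gaussian_w2(mu_a, sigma_a, mu_b, sigma_b)


def bisimulation_loss(z_a, z_b, target):
    """Squared error between the latent L1 distance and the target."""
    return (l1_distance(z_a, z_b) - target) ** 2


def shift_invariance_loss(z, z_alt):
    """Hypothetical invariance term (an assumption, not from the paper):
    embeddings of the same underlying state under the original and an
    alternate observation should coincide."""
    return l1_distance(z, z_alt)


if __name__ == "__main__":
    # Two states with different rewards and predicted next-latent Gaussians.
    target = bisimulation_target(
        1.0, 0.0, [0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0], gamma=0.5
    )
    print("target distance:", target)
    print("bisimulation loss:", bisimulation_loss([0.0, 0.0], [1.0, 2.0], target))
    print("invariance loss:", shift_invariance_loss([1.0, 2.0], [1.0, 2.0]))
```

In a simulator-based setting, the alternate observation passed to the invariance term would be a re-rendering of the same simulator state under a different observation context. In practice the latent vectors would come from a trained encoder, and both losses would be minimized by gradient descent.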
