Environment Probing Interaction Policies

07/26/2019
by Wenxuan Zhou, et al.

A key challenge in reinforcement learning (RL) is environment generalization: a policy trained to solve a task in one environment often fails to solve the same task in a slightly different test environment. A common approach to improve inter-environment transfer is to learn policies that are invariant to the distribution of testing environments. However, we argue that instead of being invariant, the policy should identify the specific nuances of an environment and exploit them to achieve better performance. In this work, we propose the 'Environment-Probing' Interaction (EPI) policy, a policy that probes a new environment to extract an implicit understanding of that environment's behavior. Once this environment-specific information is obtained, it is used as an additional input to a task-specific policy that can now perform environment-conditioned actions to solve a task. To learn these EPI-policies, we present a reward function based on transition predictability. Specifically, a higher reward is given if the trajectory generated by the EPI-policy can be used to better predict transitions. We experimentally show that EPI-conditioned task-specific policies significantly outperform commonly used policy generalization methods on novel testing environments.
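The abstract describes the probing reward only qualitatively: a trajectory earns more reward when it makes transitions easier to predict. The sketch below shows one way to make that idea concrete. It assumes the reward is the drop in next-state prediction error when a predictor is given an embedding of the probing trajectory. Everything else is hypothetical and chosen only for illustration: the toy 1-D dynamics `s' = s + k * a` with a hidden per-environment gain `k`, the closed-form least-squares embedding, and the names `rollout`, `embed`, `epi_reward`, and `task_policy`. It is not the authors' implementation, and it does not reproduce their learned prediction models or their learned EPI policy.

```python
from typing import Callable, List, Tuple

# (state, action, next_state) for a hypothetical 1-D environment
Transition = Tuple[float, float, float]


def rollout(k: float, actions: List[float], s0: float = 0.0) -> List[Transition]:
    """Simulate toy dynamics s' = s + k * a, where k is a hidden per-environment gain."""
    traj, s = [], s0
    for a in actions:
        s_next = s + k * a
        traj.append((s, a, s_next))
        s = s_next
    return traj


def embed(probe_traj: List[Transition]) -> float:
    """Hypothetical embedding: least-squares estimate of k from the probing trajectory."""
    num = sum(a * (s_next - s) for s, a, s_next in probe_traj)
    den = sum(a * a for _, a, _ in probe_traj)
    return num / den if den > 0 else 1.0  # no information: fall back to the default gain


def mse(predict: Callable[[float, float], float], transitions: List[Transition]) -> float:
    """Mean squared error of a next-state predictor over a set of transitions."""
    return sum((predict(s, a) - s_next) ** 2 for s, a, s_next in transitions) / len(transitions)


def epi_reward(probe_traj: List[Transition], eval_transitions: List[Transition]) -> float:
    """Assumed reward form: prediction error without the embedding minus error with it."""
    z = embed(probe_traj)

    def baseline(s: float, a: float) -> float:
        return s + 1.0 * a  # environment-agnostic predictor assumes k = 1

    def conditioned(s: float, a: float) -> float:
        return s + z * a  # predictor conditioned on the probing embedding

    return mse(baseline, eval_transitions) - mse(conditioned, eval_transitions)


def task_policy(s: float, goal: float, z: float) -> float:
    """Hypothetical environment-conditioned task policy: pick a so that s + z * a = goal."""
    return (goal - s) / z


# Usage: an environment whose hidden gain differs from the default assumed by the baseline.
k_true = 2.5
eval_data = rollout(k_true, [0.5, -1.0, 1.5, 0.2])
active = epi_reward(rollout(k_true, [1.0, -1.0]), eval_data)  # informative probe
passive = epi_reward(rollout(k_true, [0.0, 0.0]), eval_data)  # probe that does nothing
z = embed(rollout(k_true, [1.0, -1.0]))
s_after = rollout(k_true, [task_policy(0.0, 1.0, z)])[0][2]  # one step toward goal = 1.0
```

In this toy setting, the passive probe earns zero reward because it reveals nothing about `k`. The active probe recovers `k` exactly, which lets both the conditioned predictor and the conditioned task policy act correctly in the new environment.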
