Fast Task Inference with Variational Intrinsic Successor Features

06/12/2019
by Steven Hansen, et al.

It has been established that diverse behaviors spanning the controllable subspace of a Markov decision process can be trained by rewarding a policy for being distinguishable from other policies [gregor2016variational, eysenbach2018diversity, warde2018unsupervised]. However, one limitation of this formulation is the difficulty of generalizing behaviors beyond the finite set being explicitly learned, as is needed for use on subsequent tasks. Successor features [dayan93improving, barreto2017successor] provide an appealing solution to this generalization problem, but require defining the reward function as linear in some grounded feature space. In this paper, we show that these two techniques can be combined, and that each method solves the other's primary limitation. To do so we introduce Variational Intrinsic Successor FeatuRes (VISR), a novel algorithm which learns controllable features that can be leveraged to provide enhanced generalization and fast task inference through the successor feature framework. We empirically validate VISR on the full Atari suite, in a novel setup wherein the rewards are only exposed briefly after a long unsupervised phase. VISR achieves human-level performance on 14 games and beats all baselines; we believe it represents a step towards agents that rapidly learn from limited feedback.
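To make the "reward linear in features" assumption concrete, here is a minimal NumPy sketch of how task inference can work in the general successor feature framework. It is not the paper's implementation: the features `phi`, the successor features `psi`, the task vector `w_true`, and all sizes are hypothetical placeholders filled with random values. Under the linearity assumption, rewards satisfy r(s) = phi(s) · w, so a task vector can be recovered by linear regression from a small batch of observed rewards, and action values follow as Q(s, a) = psi(s, a) · w.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes, chosen only for illustration.
n_states, n_actions, d = 50, 4, 5

# Hypothetical state features phi(s), normalized to unit length.
phi = rng.normal(size=(n_states, d))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)

# Hypothetical ground-truth task vector, unknown to the agent.
w_true = rng.normal(size=d)
w_true /= np.linalg.norm(w_true)

# Brief reward exposure: the agent sees rewards at a few visited states.
# Linearity assumption: r(s) = phi(s) . w_true (noiseless in this sketch).
visited = rng.integers(0, n_states, size=200)
rewards = phi[visited] @ w_true

# Task inference: solve the least-squares problem phi w = r for w.
w_hat, *_ = np.linalg.lstsq(phi[visited], rewards, rcond=None)

# Hypothetical successor features psi(s, a); random placeholders here,
# whereas in practice they would be learned during the unsupervised phase.
psi = rng.normal(size=(n_states, n_actions, d))

# Action values under the inferred task: Q(s, a) = psi(s, a) . w_hat.
q_values = psi @ w_hat
greedy_actions = q_values.argmax(axis=1)
```

Because the inference step is a single regression rather than a round of policy learning, it needs only a small number of reward observations, which is the sense in which task inference can be fast in this framework.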


