Unsupervised Control Through Non-Parametric Discriminative Rewards

11/28/2018
by David Warde-Farley, et al.

Learning to control an environment without hand-crafted rewards or expert data remains a challenge at the frontier of reinforcement learning research. We present an unsupervised learning algorithm that trains agents to achieve perceptually specified goals using only a stream of observations and actions. Our agent simultaneously learns a goal-conditioned policy and a goal-achievement reward function that measures how similar a state is to the goal state. This dual optimization forms a cooperative game and gives rise to a learned reward function that reflects similarity in the controllable aspects of the environment rather than distance in observation space. We demonstrate that our agent learns, in an unsupervised manner, to reach a diverse set of goals in three domains: Atari, the DeepMind Control Suite, and DeepMind Lab.
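
The abstract does not give the form of the learned reward, so the following is only a sketch of one way a discriminative goal-achievement reward could be computed, not the authors' implementation. It assumes states and goals have already been mapped to embedding vectors by some learned encoder (not shown), and it scores a state by how confidently the intended goal can be picked out from a set of candidate goals. The cosine similarity, the softmax, the `temperature` value, and all function names are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def goal_probabilities(
    state_emb: Sequence[float],
    candidate_goal_embs: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not taken from the paper
) -> List[float]:
    """Softmax over similarities between the state and each candidate goal."""
    logits = [cosine(state_emb, g) / temperature for g in candidate_goal_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def goal_reward(
    state_emb: Sequence[float],
    candidate_goal_embs: Sequence[Sequence[float]],
    goal_index: int,
    temperature: float = 0.1,
) -> float:
    """Hypothetical reward: probability mass assigned to the intended goal."""
    probs = goal_probabilities(state_emb, candidate_goal_embs, temperature)
    return probs[goal_index]


if __name__ == "__main__":
    state = [1.0, 0.0]
    goals = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # intended goal plus two distractors
    for i in range(len(goals)):
        print(i, goal_reward(state, goals, goal_index=i))
```

In this sketch the reward is high only when the reached state is closer to the intended goal than to the distractors in embedding space. If the encoder is trained together with the policy, similarity can then track what the agent is able to control rather than raw observation distance.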


