Goal-Conditioned Generators of Deep Policies

07/04/2022
by Francesco Faccio, et al.

Goal-conditioned Reinforcement Learning (RL) aims to learn optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in the form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a desired expected return," our NN generators combine powerful exploration of parameter space with generalization across commands to iteratively find better and better policies. A form of weight-sharing HyperNetworks and policy embeddings scales our method to generate deep NNs. Experiments show how a single learned policy generator can produce policies that achieve any return seen during training. Finally, we evaluate our algorithm on a set of continuous control tasks, where it exhibits competitive performance. Our code is public.
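The core idea of a return-conditioned policy generator can be illustrated with a small sketch. This is not the authors' implementation: it replaces the weight-sharing HyperNetwork and policy embeddings with a plain two-layer generator, omits training entirely, and every name and size in it (`make_generator`, `generate_policy`, `act`, `OBS_DIM`, and so on) is a hypothetical chosen for illustration. It only shows the data flow: a scalar command (the desired return) goes in, a flat parameter vector comes out, and that vector is reshaped into the weights of a policy network that maps observations to actions.

```python
import math
import random

# Hypothetical sizes for a tiny policy network (assumptions, not from the paper).
OBS_DIM, HIDDEN, ACT_DIM = 3, 4, 2
N_POLICY_PARAMS = HIDDEN * OBS_DIM + HIDDEN + ACT_DIM * HIDDEN + ACT_DIM


def init_linear(n_in, n_out, rng, scale=0.1):
    """Return (weights, biases) of a dense layer as nested Python lists."""
    w = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return w, b


def linear(x, w, b):
    """Compute w @ x + b for list-based weights."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def make_generator(rng, gen_hidden=8):
    """Untrained generator: scalar return command -> flat policy parameters."""
    return (init_linear(1, gen_hidden, rng),
            init_linear(gen_hidden, N_POLICY_PARAMS, rng))


def generate_policy(generator, desired_return):
    """Map the command 'achieve this return' to a flat vector of policy weights."""
    (w1, b1), (w2, b2) = generator
    hidden = [math.tanh(v) for v in linear([desired_return], w1, b1)]
    return linear(hidden, w2, b2)


def unflatten(flat):
    """Split the flat vector into the policy's two weight matrices and biases."""
    i = 0
    w1 = [flat[i + r * OBS_DIM: i + (r + 1) * OBS_DIM] for r in range(HIDDEN)]
    i += HIDDEN * OBS_DIM
    b1 = flat[i: i + HIDDEN]
    i += HIDDEN
    w2 = [flat[i + r * HIDDEN: i + (r + 1) * HIDDEN] for r in range(ACT_DIM)]
    i += ACT_DIM * HIDDEN
    b2 = flat[i: i + ACT_DIM]
    return w1, b1, w2, b2


def act(flat_params, obs):
    """Run the generated policy: observation -> action in [-1, 1]."""
    w1, b1, w2, b2 = unflatten(flat_params)
    hidden = [math.tanh(v) for v in linear(obs, w1, b1)]
    return [math.tanh(v) for v in linear(hidden, w2, b2)]


rng = random.Random(0)
generator = make_generator(rng)
params_low = generate_policy(generator, desired_return=1.0)
params_high = generate_policy(generator, desired_return=2.0)
action = act(params_high, [0.1, -0.2, 0.3])
```

In a trained system, the generator would be fitted so that the policy produced for a given command actually achieves roughly that return, and the command would then be raised step by step to search for better policies. The sketch leaves that loop out.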

research
07/04/2022

General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

Learning to evaluate and improve policies is a core problem of Reinforce...
research
04/23/2021

DisCo RL: Distribution-Conditioned Reinforcement Learning for General-Purpose Policies

Can we use reinforcement learning to learn general-purpose policies that...
research
07/05/2019

Self-supervised Learning of Distance Functions for Goal-Conditioned Reinforcement Learning

Goal-conditioned policies are used in order to break down complex reinfo...
research
10/07/2022

Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules

Work on fast weight programmers has demonstrated the effectiveness of ke...
research
11/01/2022

Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (RL) is a promising direction fo...
research
10/24/2022

Dichotomy of Control: Separating What You Can Control from What You Cannot

Future- or return-conditioned supervised learning is an emerging paradig...
research
02/23/2022

Learning Relative Return Policies With Upside-Down Reinforcement Learning

Lately, there has been a resurgence of interest in using supervised lear...
