Skill-Based Reinforcement Learning with Intrinsic Reward Matching

10/14/2022
by Ademi Adeniji, et al.

While unsupervised skill discovery has shown promise in autonomously acquiring behavioral primitives, there is still a large methodological disconnect between task-agnostic skill pretraining and downstream, task-aware finetuning. We present Intrinsic Reward Matching (IRM), which unifies these two phases of learning via the skill discriminator, a pretraining model component that is often discarded during finetuning. Conventional approaches finetune pretrained agents directly at the policy level and typically rely on expensive environment rollouts to determine the optimal skill empirically. However, the most concise yet complete description of a task is often the reward function itself, and skill learning methods already learn, via the discriminator, an intrinsic reward function that corresponds to the skill policy. We propose to leverage the skill discriminator to match the intrinsic and downstream task rewards and thereby determine the optimal skill for an unseen task without environment samples, which makes finetuning more sample-efficient. Furthermore, we generalize IRM to sequence skills and solve more complex, long-horizon tasks. We demonstrate that IRM is competitive with previous skill selection methods on the Unsupervised Reinforcement Learning Benchmark and enables us to utilize pretrained skills far more effectively on challenging tabletop manipulation tasks.
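
To make the idea of reward matching concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical intrinsic reward `intrinsic_reward(states, z)` standing in for the discriminator-derived reward, a known task reward `task_reward(states)`, and a batch of states drawn from a fixed sampling distribution rather than from environment rollouts. The Pearson-based distance is an illustrative choice of reward distance that ignores scale and shift; the abstract does not specify which distance IRM uses.

```python
import numpy as np


def pearson_distance(a, b):
    """Distance in [0, 1] that is invariant to positive scaling and shifts."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 1.0  # a constant reward carries no ranking information
    rho = float(a @ b / denom)
    return float(np.sqrt(max(0.0, (1.0 - rho) / 2.0)))


def select_skill(intrinsic_reward, task_reward, states, skills):
    """Pick the skill whose intrinsic reward best matches the task reward.

    No environment interaction is needed: both rewards are only evaluated
    on the given batch of states.
    """
    target = task_reward(states)
    distances = [pearson_distance(intrinsic_reward(states, z), target)
                 for z in skills]
    return skills[int(np.argmin(distances))], distances


# Toy setup (all of it hypothetical): 2D states, skills are unit directions,
# and the stand-in intrinsic reward prefers states that lie along the skill.
def intrinsic_reward(states, z):
    return states @ z


def task_reward(states):
    goal = np.array([2.0, 0.0])  # task: get close to a goal on the +x axis
    return -np.linalg.norm(states - goal, axis=1)


rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(2000, 2))  # sampled, not rolled out
angles = np.arange(8) * np.pi / 4
skills = np.stack([np.cos(angles), np.sin(angles)], axis=1)

best_skill, distances = select_skill(intrinsic_reward, task_reward,
                                     states, skills)
print("selected skill direction:", best_skill)
```

In this toy case the selected skill is the one pointing toward the goal, and the choice is made purely by comparing reward functions on sampled states. The selected skill would then serve as the starting point for finetuning.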
