Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

09/08/2023
by David Yunis, et al.

Exploration in sparse-reward reinforcement learning is difficult because long, coordinated sequences of actions are required to achieve any reward. Moreover, continuous action spaces contain an infinite number of possible actions, which further increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Typically, such methods require a lengthy pretraining phase, especially in continuous action spaces, to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill-generation with two components. First, we discretize the action space through clustering; second, we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. This method outperforms baselines for skill-generation in several challenging sparse-reward domains, and requires orders-of-magnitude less computation in skill-generation and online rollouts.
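
A minimal, self-contained sketch of this two-stage idea is given below, assuming k-means for the clustering step and byte-pair-encoding-style merges for the tokenization step. Function names and hyperparameters are illustrative assumptions, not the authors' implementation.

import numpy as np
from collections import Counter

def discretize_actions(trajectories, k=8, iters=50, seed=0):
    """Stage 1 (assumed k-means): cluster continuous actions and map each
    action to the index of its nearest centroid."""
    rng = np.random.default_rng(seed)
    actions = np.concatenate(trajectories, axis=0)
    centroids = actions[rng.choice(len(actions), size=k, replace=False)]
    for _ in range(iters):
        # assign each action to its closest centroid
        dists = np.linalg.norm(actions[:, None, :] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centroids, keeping the old one if a cluster is empty
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = actions[labels == j].mean(axis=0)
    discrete = []
    for traj in trajectories:
        d = np.linalg.norm(traj[:, None, :] - centroids[None], axis=-1)
        discrete.append(list(d.argmin(axis=1)))
    return discrete, centroids

def bpe_skills(discrete_trajs, num_merges=20):
    """Stage 2 (BPE-style tokenization): repeatedly merge the most frequent
    adjacent pair of tokens; each merged token is a 'skill', i.e. a fixed
    sequence of primitive discrete actions."""
    vocab = {i: (i,) for traj in discrete_trajs for i in traj}
    seqs = [list(t) for t in discrete_trajs]
    for _ in range(num_merges):
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        new_id = max(vocab) + 1
        vocab[new_id] = vocab[a] + vocab[b]  # skill = concatenated primitives
        merged = []
        for s in seqs:
            out, i = [], 0
            while i < len(s):
                if i + 1 < len(s) and s[i] == a and s[i + 1] == b:
                    out.append(new_id)
                    i += 2
                else:
                    out.append(s[i])
                    i += 1
            merged.append(out)
        seqs = merged
    return vocab  # maps skill id -> tuple of primitive action indices

The resulting vocabulary of skills, together with the cluster centroids, defines a small discrete action space of temporally extended actions on which a downstream policy can be trained.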
