Skill-Critic: Refining Learned Skills for Reinforcement Learning

06/14/2023
by Ce Hao, et al.

Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Skills, i.e., sequences of primitive actions, have shown promising results in sparse-reward environments. Typically, a skill latent space and a low-level policy are learned from offline data, but the resulting low-level policy can be unreliable when the demonstrations have low coverage or the distribution shifts. As a solution, we propose fine-tuning the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low- and high-level policies; both policies are also initialized and regularized by the latent space learned from offline demonstrations, which guides the joint policy optimization. We validate our approach in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for optimal performance. Images and videos are available at https://sites.google.com/view/skill-critic. We plan to open-source the code with the final version.
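The two-level structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the objective, so the KL penalty toward a demonstration-derived prior is an assumed form of the regularization, and every name here (`gaussian_kl`, `regularized_objective`, `rollout`, `alpha`, `horizon`) is hypothetical.

```python
import math


def gaussian_kl(mu_p, sigma_p, mu_q, sigma_q):
    """KL(N(mu_p, sigma_p^2) || N(mu_q, sigma_q^2)) for 1-D Gaussians."""
    return (math.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
            - 0.5)


def regularized_objective(q_value, policy, prior, alpha):
    """Critic estimate minus a KL penalty toward the prior.

    ASSUMPTION: a KL penalty is one common way to regularize a policy
    toward a prior learned from demonstrations; the abstract does not
    state the exact form used by Skill-Critic.
    `policy` and `prior` are (mean, std) pairs.
    """
    return q_value - alpha * gaussian_kl(*policy, *prior)


def rollout(high_policy, low_policy, env_step, state, horizon, num_skills):
    """Run a two-level policy: one skill latent z per `horizon` actions."""
    trajectory = []
    for _ in range(num_skills):
        z = high_policy(state)          # high level: choose a skill latent
        for _ in range(horizon):
            a = low_policy(state, z)    # low level: action from (state, z)
            state = env_step(state, a)  # hypothetical environment transition
            trajectory.append((z, a, state))
    return trajectory


# Toy usage on a 1-D state where each action is simply added to the state.
steps = rollout(
    high_policy=lambda s: 1.0,
    low_policy=lambda s, z: 0.5 * z,
    env_step=lambda s, a: s + a,
    state=0.0,
    horizon=3,
    num_skills=2,
)
```

In this sketch the high-level policy acts once per `horizon` primitive steps, which is the temporal abstraction the abstract refers to. Fine-tuning the low-level policy would correspond to updating `low_policy` with its own regularized objective rather than keeping it frozen after offline training.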

research
10/26/2022

Leveraging Demonstrations with Latent Space Priors

Demonstrations provide insight into relevant state or action space regio...
research
06/13/2019

Sub-policy Adaptation for Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning is a promising approach to long-hori...
research
10/04/2021

Skill Induction and Planning with Latent Language

We present a framework for learning hierarchical policies from demonstra...
research
09/08/2023

Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

Exploration in sparse-reward reinforcement learning is difficult due to ...
research
10/27/2020

COG: Connecting New Skills to Past Experience with Offline Reinforcement Learning

Reinforcement learning has been applied to a wide variety of robotics pr...
research
11/20/2018

Model Learning for Look-ahead Exploration in Continuous Control

We propose an exploration method that incorporates look-ahead search ove...
research
02/09/2022

Bayesian Nonparametrics for Offline Skill Discovery

Skills or low-level policies in reinforcement learning are temporally ex...
