Reset-Free Lifelong Learning with Skill-Space Planning

by Kevin Lu et al.

The objective of lifelong reinforcement learning (RL) is to optimize agents that can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards, and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
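To make the core idea concrete, here is a minimal sketch of planning over skill latents with a learned dynamics model. All names (`dynamics_model`, `plan_skills`) and the toy model itself are illustrative assumptions, not the authors' implementation: the "learned" model is stubbed as a simple displacement in state space, and the planner is plain random shooting over short skill sequences in a receding-horizon loop.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamics_model(state, skill):
    # Stand-in for a learned skill-conditioned model: the skill latent is
    # treated as a displacement in state space (assumption for illustration).
    return state + skill

def reward(state, goal):
    # Negative distance to a goal state: higher is better.
    return -np.linalg.norm(state - goal)

def plan_skills(state, goal, horizon=5, n_candidates=256, skill_dim=2):
    # Random-shooting planner in skill space: sample candidate skill
    # sequences, roll each out with the model, and score cumulative reward.
    candidates = rng.normal(size=(n_candidates, horizon, skill_dim))
    returns = np.zeros(n_candidates)
    for i, seq in enumerate(candidates):
        s = state.copy()
        for z in seq:
            s = dynamics_model(s, z)
            returns[i] += reward(s, goal)
    best = candidates[np.argmax(returns)]
    return best[0]  # execute only the first skill, then replan

state = np.zeros(2)
goal = np.array([3.0, 0.0])
first_skill = plan_skills(state, goal)
```

The key design choice the abstract points at is that the planner's action space is the space of skill latents rather than raw actions, so each planning step corresponds to a temporally extended behavior; a real implementation would learn both the skills (via intrinsic rewards) and the dynamics model from data.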

