Reset-Free Lifelong Learning with Skill-Space Planning

12/07/2020
by Kevin Lu, et al.

The objective of lifelong reinforcement learning (RL) is to optimize agents which can continuously adapt and interact in changing environments. However, current RL approaches fail drastically when environments are non-stationary and interactions are non-episodic. We propose Lifelong Skill Planning (LiSP), an algorithmic framework for non-episodic lifelong RL based on planning in an abstract space of higher-order skills. We learn the skills in an unsupervised manner using intrinsic rewards and plan over the learned skills using a learned dynamics model. Moreover, our framework permits skill discovery even from offline data, thereby reducing the need for excessive real-world interactions. We demonstrate empirically that LiSP successfully enables long-horizon planning and learns agents that can avoid catastrophic failures even in challenging non-stationary and non-episodic environments derived from gridworld and MuJoCo benchmarks.
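The idea of planning over skills with a dynamics model can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D environment, the three fixed-action "skills", the reward, the exhaustive search over skill sequences, and every name below (`skill_policy`, `learned_model`, `plan_skills`, `run`) are assumptions made for illustration. In LiSP the skills and the model are learned, and the planner may differ from the brute-force enumeration used here.

```python
from itertools import product

NUM_SKILLS = 3          # hypothetical discrete skill set {0, 1, 2}
STEPS_PER_SKILL = 2     # each skill is held for this many low-level steps
GOAL = 3.0              # hypothetical target position


def skill_policy(state, skill):
    """Stand-in for a learned skill-conditioned policy.

    Assumption: each skill maps to a fixed action (left, stay, right).
    """
    return {0: -1.0, 1: 0.0, 2: 1.0}[skill]


def learned_model(state, action):
    """Stand-in for a learned dynamics model (here a known toy system)."""
    return state + 0.5 * action


def reward(state):
    """Hypothetical task reward: negative distance to the goal."""
    return -abs(state - GOAL)


def rollout_return(state, skills):
    """Simulate a skill sequence inside the model and sum the rewards."""
    total = 0.0
    for skill in skills:
        for _ in range(STEPS_PER_SKILL):
            state = learned_model(state, skill_policy(state, skill))
            total += reward(state)
    return total


def plan_skills(state, horizon=4):
    """Exhaustively score every skill sequence of length `horizon`."""
    best_plan, best_return = None, float("-inf")
    for candidate in product(range(NUM_SKILLS), repeat=horizon):
        ret = rollout_return(state, candidate)
        if ret > best_return:
            best_plan, best_return = candidate, ret
    return best_plan, best_return


def run(start=0.0, replans=5):
    """Non-episodic loop: replan, execute only the first skill, repeat."""
    state = start
    for _ in range(replans):
        plan, _ = plan_skills(state)
        for _ in range(STEPS_PER_SKILL):
            # The toy model doubles as the "real" environment here.
            state = learned_model(state, skill_policy(state, plan[0]))
    return state
```

Calling `run()` replans from the current state at every step and never resets the environment, which mirrors the reset-free setting only in spirit. The planner searches over a short sequence of skills rather than over every low-level action, so each planning decision covers several environment steps.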

