Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning

In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as humans do. Consequently, lifelong learning comes with a plethora of research problems, such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with because they are continuous in nature. Exploration strategies and learning methods are therefore required that can track the steady domain shifts and adapt to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy accordingly. To this end, we conduct experiments to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. As a result, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at: https://github.com/ml-jku/reactive-exploration.


