Block Contextual MDPs for Continual Learning

10/13/2021
by Shagun Sodhani, et al.

In reinforcement learning (RL), defining a Markov Decision Process (MDP) implicitly assumes that the environment dynamics are stationary. This assumption, while simplifying, can be unrealistic in many scenarios. In continual reinforcement learning, the sequence of tasks is another source of nonstationarity. In this work, we propose to examine the continual reinforcement learning setting through the block contextual MDP (BC-MDP) framework, which enables us to relax the assumption of stationarity. This framework challenges RL algorithms to handle both nonstationarity and rich-observation settings and, by additionally leveraging smoothness properties, enables us to study generalization bounds for this setting. Finally, we take inspiration from adaptive control to propose a novel algorithm that addresses the challenges introduced by this more realistic BC-MDP setting, allows for zero-shot adaptation at evaluation time, and achieves strong performance on several nonstationary environments.

research
02/28/2022

Avalanche RL: a Continual Reinforcement Learning Library

Continual Reinforcement Learning (CRL) is a challenging setting where an...
research
07/22/2019

Efficient Policy Learning for Non-Stationary MDPs under Adversarial Manipulation

A Markov Decision Process (MDP) is a popular model for reinforcement lea...
research
12/06/2019

Observational Overfitting in Reinforcement Learning

A major component of overfitting in model-free reinforcement learning (R...
research
12/13/2021

Continual Learning In Environments With Polynomial Mixing Times

The mixing time of the Markov chain induced by a policy limits performan...
research
10/21/2022

Continual Reinforcement Learning with Group Symmetries

Continual reinforcement learning (RL) aims to learn a sequence of tasks ...
research
10/06/2022

Learning Algorithms for Intelligent Agents and Mechanisms

In this thesis, we research learning algorithms for optimal decision mak...
research
04/18/2020

Time Adaptive Reinforcement Learning

Reinforcement learning (RL) allows to solve complex tasks such as Go oft...
