Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments

02/04/2023
by Pouya Hamadanian, et al.

We study online Reinforcement Learning (RL) in non-stationary input-driven environments, where a time-varying exogenous input process affects the environment dynamics. Online RL is challenging in such environments because of catastrophic forgetting (CF): the agent tends to forget prior knowledge as it trains on new experiences. Prior approaches to mitigating CF either assume task labels, which are often unavailable in practice, or use off-policy methods, which can suffer from instability and poor performance. We present Locally Constrained Policy Optimization (LCPO), an on-policy RL approach that combats CF by anchoring policy outputs on old experiences while optimizing the return on current experiences. To perform this anchoring, LCPO locally constrains policy optimization using samples from experiences that lie outside the current input distribution. We evaluate LCPO in two gym and computer systems environments with a variety of synthetic and real input traces, and find that it outperforms state-of-the-art on-policy and off-policy RL methods in the online setting, while achieving results on par with an offline agent pre-trained on the whole input trace.
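
The anchoring idea described above can be illustrated with a minimal sketch. The abstract does not state the form of the constraint, how out-of-distribution experiences are detected, or which optimizer is used, so everything below is an assumption made for illustration: a linear softmax policy, a mean-KL anchor on old states, a simple distance threshold on the exogenous input, a backtracking step search, and all function names (policy_log_probs, select_anchor_states, anchored_update, and so on). It is not the authors' implementation.

```python
import numpy as np


def policy_log_probs(theta, states):
    """Log-probabilities of a linear softmax policy (assumed policy class).

    theta has shape (state_dim, n_actions); states has shape (n, state_dim).
    """
    logits = states @ theta
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))


def mean_kl(theta_old, theta_new, states):
    """Mean KL(pi_old || pi_new) over a batch of states."""
    logp = policy_log_probs(theta_old, states)
    logq = policy_log_probs(theta_new, states)
    return float(np.mean(np.sum(np.exp(logp) * (logp - logq), axis=1)))


def select_anchor_states(past_inputs, past_states, current_inputs, threshold):
    """Hypothetical out-of-distribution test on the exogenous input.

    Keeps past states whose recorded input lies farther than `threshold`
    from the mean of the inputs currently being observed.
    """
    center = current_inputs.mean(axis=0)
    dist = np.linalg.norm(past_inputs - center, axis=1)
    return past_states[dist > threshold]


def policy_gradient(theta, states, actions, advantages):
    """REINFORCE-style gradient estimate on current (on-policy) experience."""
    probs = np.exp(policy_log_probs(theta, states))
    onehot = np.eye(theta.shape[1])[actions]
    # Gradient of log pi(a|s) w.r.t. theta is outer(s, onehot - probs).
    return states.T @ ((onehot - probs) * advantages[:, None]) / len(states)


def anchored_update(theta, grad, anchor_states, max_kl,
                    step=1.0, shrink=0.5, max_tries=20):
    """Ascend the return on current data while keeping policy outputs on the
    anchor states within `max_kl` of the old policy (backtracking search)."""
    for _ in range(max_tries):
        candidate = theta + step * grad
        if len(anchor_states) == 0:
            return candidate  # nothing to anchor on yet
        if mean_kl(theta, candidate, anchor_states) <= max_kl:
            return candidate
        step *= shrink
    return theta  # no acceptable step found; keep the old policy
```

In this sketch the gradient is computed only from current experiences, while the constraint is evaluated only on old states recorded under different inputs, which mirrors the split between "optimizing the return on current experiences" and "anchoring policy outputs on old experiences" in the abstract.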


