Tackling Unbounded State Spaces in Continuing Task Reinforcement Learning

06/02/2023
by Brahma S. Pavse et al.

While deep reinforcement learning (RL) algorithms have been successfully applied to many tasks, their inability to extrapolate and their strong reliance on episodic resets inhibit their applicability to many real-world settings. For instance, in stochastic queueing problems, the state space can be unbounded, and the agent may have to learn online without the system ever being reset to states the agent has seen before. In such settings, we show that deep RL agents can diverge into unseen states from which they can never recover because there are no resets, especially in highly stochastic environments. To overcome this divergence, we introduce a Lyapunov-inspired reward shaping approach that encourages the agent to first learn to be stable (i.e., to achieve bounded cost) and then to learn to be optimal. We show theoretically that our reward shaping technique reduces the agent's rate of divergence, and we find empirically that it prevents divergence. We further combine our reward shaping approach with two additional techniques, a weight annealing scheme that gradually introduces optimality and a log-transform of state inputs, and find that together these enable deep RL algorithms to learn high-performing policies when learning online in unbounded state space domains.
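
The three ingredients named in the abstract (a stability-first shaped reward, an annealed weight on optimality, and log-transformed state inputs) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the quadratic Lyapunov function over queue lengths, the linear annealing schedule, the `log1p` transform, and all function and parameter names.

```python
import math


def lyapunov(queue_lengths):
    """Assumed Lyapunov function: sum of squared queue lengths."""
    return sum(q * q for q in queue_lengths)


def annealing_weight(step, anneal_steps):
    """Assumed linear schedule: weight on the optimality term grows from 0 to 1."""
    return min(1.0, step / anneal_steps)


def shaped_reward(state, next_state, cost, step, anneal_steps=10_000):
    """Reward a decrease in the Lyapunov function (stability) and,
    with a gradually increasing weight, penalize the true cost (optimality)."""
    drift = lyapunov(next_state) - lyapunov(state)  # negative drift is stabilizing
    beta = annealing_weight(step, anneal_steps)
    return -drift - beta * cost


def log_transform(state):
    """Compress unbounded non-negative state inputs before feeding a network."""
    return [math.log1p(q) for q in state]


# Hypothetical two-queue transition: one job leaves the first queue.
early = shaped_reward([3, 1], [2, 1], cost=3.0, step=0)
late = shaped_reward([3, 1], [2, 1], cost=3.0, step=10_000)
```

Early in training the shaped reward in this sketch depends only on the Lyapunov drift; once the weight reaches 1, the true cost is counted in full alongside it.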


