Approximate information state based convergence analysis of recurrent Q-learning

06/09/2023
by Erfan Seyedsalehi, et al.

Despite the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent grows over time, so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network, leading to an agent state that is non-Markovian. This paper shows that, despite the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, the quality of the converged limit depends on the quality of the representation, which is quantified in terms of an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. The performance of RQL over time is shown to correlate strongly with the loss associated with the AIS representation.
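
As a rough illustration of the tabular setting described in the abstract, the sketch below runs ordinary Q-learning on an agent state formed by a finite window of recent observations, which is non-Markovian in general. It is not the paper's algorithm or experimental setup: the toy two-state POMDP, the names `step` and `q_learning_on_agent_state`, and all hyperparameters are assumptions chosen purely for illustration.

```python
import random
from collections import defaultdict


def step(state, action, rng, flip=0.1, noise=0.15):
    """Hypothetical toy POMDP: reward 1 when the action matches the hidden state."""
    reward = 1.0 if action == state else 0.0
    # The hidden state flips with probability `flip`, independent of the action.
    next_state = 1 - state if rng.random() < flip else state
    # The observation reports the new hidden state, corrupted with probability `noise`.
    obs = next_state if rng.random() >= noise else 1 - next_state
    return next_state, obs, reward


def q_learning_on_agent_state(steps=50_000, window=2, gamma=0.9,
                              alpha=0.05, eps=0.1, seed=0):
    """Tabular Q-learning where the agent state z is the last `window` observations."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # Q[z][a], one row per agent state
    state = rng.randrange(2)
    z = (0,) * window  # arbitrary initial agent state (assumption)
    for _ in range(steps):
        # Epsilon-greedy action selection on the agent state, not the hidden state.
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda act: q[z][act])
        state, obs, r = step(state, a, rng)
        z_next = z[1:] + (obs,)  # slide the observation window
        # Standard Q-learning update, applied as if z were a Markov state.
        td_target = r + gamma * max(q[z_next])
        q[z][a] += alpha * (td_target - q[z][a])
        z = z_next
    return dict(q)


if __name__ == "__main__":
    for z, values in sorted(q_learning_on_agent_state().items()):
        print(z, [round(v, 2) for v in values])
```

Increasing `window` gives the agent state more of the history to work with, at the cost of a table that grows exponentially in the window length. A recurrent network would replace the sliding-window update with a learned compression of the history.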

research
11/06/2022

On learning history based policies for controlling Markov decision processes

Reinforcement learning (RL) folklore suggests that history-based function approx...
research
05/15/2001

Market-Based Reinforcement Learning in Partially Observable Worlds

Unlike traditional reinforcement learning (RL), market-based RL is in pr...
research
03/06/2023

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

Partially Observable Markov Decision Processes (POMDPs) are useful tools...
research
11/15/2021

The Partially Observable History Process

We introduce the partially observable history process (POHP) formalism f...
research
04/19/2022

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to ma...
research
08/06/2022

Recurrent networks, hidden states and beliefs in partially observable environments

Reinforcement learning aims to learn optimal policies from interaction w...
research
11/20/2017

Is prioritized sweeping the better episodic control?

Episodic control has been proposed as a third approach to reinforcement ...
