On learning history based policies for controlling Markov decision processes

11/06/2022
by Gandharv Patil, et al.

Reinforcement learning (RL) folklore suggests that history-based function approximation methods, such as recurrent neural nets or history-based state abstraction, perform better than their memory-less counterparts, because function approximation in a Markov decision process (MDP) can be viewed as inducing a partially observable MDP (POMDP). However, there has been little formal analysis of such history-based algorithms, as most existing frameworks focus exclusively on memory-less features. In this paper, we introduce a theoretical framework for studying the behaviour of RL algorithms that learn to control an MDP using history-based feature abstraction mappings. Furthermore, we use this framework to design a practical RL algorithm, and we numerically evaluate its effectiveness on a set of continuous control tasks.
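
To make the idea of history-based function approximation concrete, the following is a minimal sketch (assuming PyTorch) of a recurrent policy that conditions its action distribution on the observation history rather than on the current observation alone. The class name, dimensions, and network sizes are illustrative placeholders, not the authors' algorithm.

import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    """Maps a sequence of observations to a Gaussian action distribution."""
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 64):
        super().__init__()
        # The GRU summarises the observation history into a fixed-size state,
        # playing the role of a history-based feature abstraction.
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.mean_head = nn.Linear(hidden_dim, act_dim)
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim); hidden carries the history summary.
        summary, hidden = self.rnn(obs_seq, hidden)
        mean = self.mean_head(summary)
        dist = torch.distributions.Normal(mean, self.log_std.exp())
        return dist, hidden

# Usage: sample an action conditioned on a 10-step observation history.
policy = HistoryPolicy(obs_dim=8, act_dim=2)
obs_seq = torch.randn(1, 10, 8)
dist, h = policy(obs_seq)
action = dist.sample()[:, -1]  # act on the summary of the full history

A memory-less counterpart would replace the GRU with a feed-forward network applied to the latest observation only; the paper's framework is concerned with analysing the former kind of mapping.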


Related research

Reinforcement Learning under Partial Observability Guided by Learned Environment Models (06/23/2022)
In practical applications, we can rarely assume full observability of a ...

Approximate information state based convergence analysis of recurrent Q-learning (06/09/2023)
In spite of the large literature on reinforcement learning (RL) algorith...

Refined Analysis of FPL for Adversarial Markov Decision Processes (08/21/2020)
We consider the adversarial Markov Decision Process (MDP) problem, where...

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models (03/06/2023)
Partially Observable Markov Decision Processes (POMDPs) are useful tools...

A2: Extracting Cyclic Switchings from DOB-nets for Rejecting Excessive Disturbances (11/01/2019)
Reinforcement Learning (RL) is limited in practice by its gray-box natur...

Reinforcement Learning with History-Dependent Dynamic Contexts (02/04/2023)
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a no...

A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes (02/07/2010)
Adaptive control problems are notoriously difficult to solve even in the...
