When Is Partially Observable Reinforcement Learning Not Scary?

04/19/2022
by Qinghua Liu, et al.

Partial observability is ubiquitous in applications of Reinforcement Learning (RL): agents must learn to make a sequence of decisions without complete information about the latent states of the controlled system. Partially observable RL can be notoriously difficult. Well-known information-theoretic results show that learning partially observable Markov decision processes (POMDPs) requires an exponential number of samples in the worst case. Yet this does not rule out the existence of large subclasses of POMDPs over which learning is tractable. In this paper we identify such a subclass, which we call weakly revealing POMDPs. This family rules out the pathological instances of POMDPs in which observations are so uninformative that learning becomes hard. We prove that for weakly revealing POMDPs, a simple algorithm combining optimism and Maximum Likelihood Estimation (MLE) is sufficient to guarantee polynomial sample complexity. To the best of our knowledge, this is the first provably sample-efficient result for learning from interactions in overcomplete POMDPs, where the number of latent states can be larger than the number of observations.
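The abstract describes weakly revealing POMDPs only informally. The following is a minimal sketch, assuming the undercomplete form of the condition: at every step h, the O×S emission matrix O_h (entry [o, s] is the probability of observing o in latent state s) has its S-th largest singular value at least some α > 0, so that different distributions over latent states induce distinguishable distributions over observations. Overcomplete POMDPs (O < S) can never satisfy this one-step version, because O_h then has rank at most O < S. Handling them requires a multi-step generalization that this sketch does not implement. The function names `sigma_S` and `is_weakly_revealing` are hypothetical.

```python
import numpy as np


def sigma_S(emission: np.ndarray) -> float:
    """Return the S-th largest singular value of an O x S emission matrix.

    Assumed convention (illustrative): emission[o, s] = P(observation o | latent state s),
    so each column sums to 1. If O < S the matrix has rank at most O < S,
    so its S-th singular value is 0.
    """
    num_obs, num_states = emission.shape
    if num_obs < num_states:
        return 0.0
    # With compute_uv=False, svd returns only the singular values, in descending order.
    singular_values = np.linalg.svd(emission, compute_uv=False)
    return float(singular_values[num_states - 1])


def is_weakly_revealing(emissions, alpha: float) -> bool:
    """Check the assumed one-step condition sigma_S(O_h) >= alpha at every step h."""
    return all(sigma_S(O_h) >= alpha for O_h in emissions)
```

For example, `np.array([[0.9, 0.1], [0.1, 0.9]])` has σ_S = 0.8 up to floating-point error. A matrix with two identical columns has σ_S ≈ 0, because the observation distribution then carries no information about which latent state produced it; this is the uninformative case the condition excludes.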
