Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

07/06/2023
by   Jiacheng Guo, et al.
0

This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called “multiple observations in hindsight”, where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: multi-observation revealing POMDPs and distinguishable POMDPs. Both subclasses generalize and substantially relax revealing POMDPs – a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be different instead of linearly independent as required in revealing POMDPs.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/19/2022

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to ma...
research
02/08/2022

Provable Reinforcement Learning with a Short-Term Memory

Real-world sequential decision making problems commonly involve partial ...
research
02/02/2023

Lower Bounds for Learning in Revealing POMDPs

This paper studies the fundamental limits of reinforcement learning (RL)...
research
09/29/2022

Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

Partial Observability – where agents can only observe partial informatio...
research
06/02/2022

Sample-Efficient Reinforcement Learning of Partially Observable Markov Games

This paper considers the challenging tasks of Multi-Agent Reinforcement ...
research
05/07/2020

Reinforcement Learning with Feedback Graphs

We study episodic reinforcement learning in Markov decision processes wh...
research
03/09/2021

Sequential Importance Sampling With Corrections For Partially Observed States

We consider an evolving system for which a sequence of observations is b...

Please sign up or login with your details

Forgot password? Click here to reset