Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information

06/14/2023
by Ming Shi, et al.

Partially observable Markov decision processes (POMDPs) have been widely used to model many real-world applications. However, existing theoretical results show that learning in general POMDPs can be intractable, and the main challenge lies in the lack of latent state information. A key fundamental question is therefore how much hindsight state information (HSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full HSI, obtaining an ϵ-optimal policy for POMDPs requires a sample complexity that scales exponentially. Nonetheless, building on key insights from our lower-bound construction, we show that there exist important tractable classes of POMDPs even with partial HSI. In particular, for two novel classes of POMDPs with partial HSI, we provide new algorithms and prove that they are near-optimal by establishing new upper and lower bounds.
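For reference, the ϵ-optimality criterion in the abstract is conventionally stated as below, usually in a PAC form where the learner must succeed with probability at least 1 − δ. This is a sketch of the standard definition only; the symbols V, π̂, π*, H, r_h and δ are our assumed notation and not necessarily the paper's.

```latex
% Assumed notation (not taken from the paper):
%   \hat{\pi}: policy returned by the learner after its episodes of interaction
%   \pi^*:     an optimal policy; in a POMDP, policies may depend on the
%              observation-action history rather than on the latent state
%   V^{\pi}:   expected cumulative reward of \pi over an episode of horizon H
V^{\pi} = \mathbb{E}_{\pi}\Big[\sum_{h=1}^{H} r_h\Big],
\qquad
\Pr\Big[\, V^{\pi^*} - V^{\hat{\pi}} \le \epsilon \,\Big] \ge 1 - \delta .
```

Under this reading, the sample complexity is the number of episodes needed to meet the guarantee, which is the quantity the lower bound describes as scaling exponentially without full HSI.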
