Partially Observable RL with B-Stability: Unified Structural Condition and Sharp Sample-Efficient Algorithms

09/29/2022
by Fan Chen et al.

Partial Observability – where agents observe only partial information about the true underlying state of the system – is ubiquitous in real-world applications of Reinforcement Learning (RL). Theoretically, learning a near-optimal policy under partial observability is known to be hard in the worst case due to an exponential sample complexity lower bound. Recent work has identified several tractable subclasses that are learnable with polynomial samples, such as Partially Observable Markov Decision Processes (POMDPs) satisfying certain revealing or decodability conditions. However, this line of research is still in its infancy, where (1) unified structural conditions enabling sample-efficient learning are lacking; (2) existing sample complexities for known tractable subclasses are far from sharp; and (3) fewer sample-efficient algorithms are available than in fully observable RL. This paper advances all three aspects above for Partially Observable RL in the general setting of Predictive State Representations (PSRs). First, we propose a natural and unified structural condition for PSRs called B-stability. B-stable PSRs encompass the vast majority of known tractable subclasses, such as weakly revealing POMDPs, low-rank future-sufficient POMDPs, decodable POMDPs, and regular PSRs. Next, we show that any B-stable PSR can be learned with a number of samples polynomial in the relevant problem parameters. When instantiated in the aforementioned subclasses, our sample complexities improve substantially over the current best ones. Finally, our results are achieved by three algorithms simultaneously: Optimistic Maximum Likelihood Estimation, Estimation-to-Decisions, and Model-Based Optimistic Posterior Sampling. The latter two algorithms are new for sample-efficient learning of POMDPs/PSRs.
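
As background for the abstract's terminology, the display below sketches the standard observable-operator form of a PSR; the symbols \mathbf{B}_{o,a}, \mathbf{q}_0, \mathbf{q}_\infty, and \Lambda are illustrative and need not match the paper's exact notation. A PSR factorizes trajectory probabilities through observation-action operators,

    \mathbb{P}(o_{1:H} \mid a_{1:H}) \;=\; \mathbf{q}_\infty^\top \, \mathbf{B}_{o_H, a_H} \cdots \mathbf{B}_{o_1, a_1} \, \mathbf{q}_0 ,

and B-stability, schematically, requires that partial products of these operators stay controlled: for every step h and every vector x reachable by the dynamics at step h,

    \max_{\pi} \; \sum_{(o,a)_{h+1:H}} \pi\big((o,a)_{h+1:H}\big) \, \Big| \mathbf{q}_\infty^\top \, \mathbf{B}_{o_H, a_H} \cdots \mathbf{B}_{o_{h+1}, a_{h+1}} \, x \Big| \;\le\; \Lambda \, \|x\| ,

with the stability parameter \Lambda entering the sample complexity polynomially. This conveys the flavor of the condition rather than its precise statement, which is given in the paper itself.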


research
04/19/2022

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to ma...
research
09/29/2022

Optimistic MLE – A Generic Model-based Algorithm for Partially Observable Sequential Decision Making

This paper introduces a simple, efficient learning algorithm for general...
research
09/23/2022

Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning

Finding unified complexity measures and algorithms for sample-efficient ...
research
06/14/2023

Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information

Partially observable Markov decision processes (POMDPs) have been widely...
research
02/02/2023

Lower Bounds for Learning in Revealing POMDPs

This paper studies the fundamental limits of reinforcement learning (RL)...
research
07/06/2023

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

This paper studies the sample-efficiency of learning in Partially Observ...
research
07/01/2023

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

The general sequential decision-making problem, which includes Markov de...
