Provable Reinforcement Learning with a Short-Term Memory

02/08/2022
by Yonathan Efroni, et al.

Real-world sequential decision-making problems commonly involve partial observability, which requires the agent to maintain a memory of history in order to infer the latent states, plan, and make good decisions. Coping with partial observability in general is extremely challenging, as a number of worst-case statistical and computational barriers are known in learning Partially Observable Markov Decision Processes (POMDPs). Motivated by the problem structure in several physical applications, as well as a commonly used technique known as "frame stacking", this paper proposes to study a new subclass of POMDPs, whose latent states can be decoded from the most recent history of a short length m. We establish a set of upper and lower bounds on the sample complexity for learning near-optimal policies for this class of problems in both tabular and rich-observation settings (where the number of observations is enormous). In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially with the short length m rather than the problem horizon, and is independent of the number of observations. Our results show that a short-term memory suffices for reinforcement learning in these environments.
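To make the "frame stacking" idea concrete, here is a minimal Python sketch of a length-m memory window. It is an illustration only, not the paper's algorithm: the class name `ShortTermMemory`, its methods, and the choice to store the intervening actions alongside the observations are assumptions made for this example.

```python
from collections import deque
from typing import Deque, Hashable, Tuple

Window = Tuple[Tuple[Hashable, ...], Tuple[Hashable, ...]]


class ShortTermMemory:
    """Hypothetical helper: keeps the m most recent observations and the
    (at most m - 1) actions taken between them."""

    def __init__(self, m: int) -> None:
        if m < 1:
            raise ValueError("m must be at least 1")
        self.m = m
        # deque(maxlen=k) silently drops the oldest item once k items are held.
        self._obs: Deque[Hashable] = deque(maxlen=m)
        self._acts: Deque[Hashable] = deque(maxlen=m - 1)

    def reset(self, first_obs: Hashable) -> Window:
        """Start a new episode with its first observation."""
        self._obs.clear()
        self._acts.clear()
        self._obs.append(first_obs)
        return self.window()

    def step(self, action: Hashable, next_obs: Hashable) -> Window:
        """Record the action just taken and the observation that followed."""
        self._acts.append(action)
        self._obs.append(next_obs)
        return self.window()

    def window(self) -> Window:
        """The short history a policy would condition on instead of the
        full episode history."""
        return (tuple(self._obs), tuple(self._acts))


memory = ShortTermMemory(m=2)
w0 = memory.reset("o0")
w1 = memory.step("a0", "o1")
w2 = memory.step("a1", "o2")  # "o0" and "a0" have now been forgotten
```

Under the decodability assumption described in the abstract, a window like `w2` would be enough to identify the current latent state, so a policy can treat it as its state. The number of distinct windows grows with m rather than with the episode length, which is the intuition behind a sample complexity that depends on m instead of the horizon.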


research
06/22/2020

Sample-Efficient Reinforcement Learning of Undercomplete POMDPs

Partial observability is a common challenge in many reinforcement learni...
research
04/19/2022

When Is Partially Observable Reinforcement Learning Not Scary?

Applications of Reinforcement Learning (RL), in which agents learn to ma...
research
07/06/2023

Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

This paper studies the sample-efficiency of learning in Partially Observ...
research
07/01/2023

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

The general sequential decision-making problem, which includes Markov de...
research
06/14/2023

Theoretical Hardness and Tractability of POMDPs in RL with Partial Hindsight State Information

Partially observable Markov decision processes (POMDPs) have been widely...
research
06/16/2021

How memory architecture affects performance and learning in simple POMDPs

Reinforcement learning is made much more complex when the agent's observ...
research
11/02/2012

Learning classifier systems with memory condition to solve non-Markov problems

In the family of Learning Classifier Systems, the classifier system XCS ...
