Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

11/11/2016
by   Kamyar Azizzadenesheli, et al.
0

Designing effective exploration-exploitation algorithms in Markov decision processes (MDPs) with large state-action spaces is the main challenge in reinforcement learning (RL). In fact, the learning performance degrades with the number of states and actions in the MDP. However, MDPs often exhibit a low-dimensional latent structure in practice, where a small hidden state is observable through a possibly large number of observations. In this paper, we study the setting of rich-observation Markov decision processes (), where hidden states are mapped to observations through an injective mapping, so that an observation can be generated by only one hidden state. While this mapping is unknown a priori, we introduce a spectral decomposition method that consistently estimates how observations are clustered in the hidden states. The estimated clustering is then integrated into an optimistic algorithm for RL (UCRL), which operates on the smaller clustered space. The resulting algorithm proceeds through phases and we show that its per-step regret (i.e., the difference in cumulative reward between the algorithm and the optimal policy) decreases as more observations are clustered together and finally, matches the (ideal) performance of an RL algorithm running directly on the hidden MDP.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/31/2022

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

We present BRIEE (Block-structured Representation learning with Interlea...
research
06/23/2020

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Modern tasks in reinforcement learning are always with large state and a...
research
08/24/2023

Optimal data pooling for shared learning in maintenance operations

This paper addresses the benefits of pooling data for shared learning in...
research
05/07/2017

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

We propose a new reinforcement learning algorithm for partially observab...
research
06/09/2009

Feature Reinforcement Learning: Part I: Unstructured MDPs

General-purpose, intelligent, learning agents cycle through sequences of...
research
06/23/2022

Recursive Reinforcement Learning

Recursion is the fundamental paradigm to finitely describe potentially i...
research
01/25/2019

Provably efficient RL with Rich Observations via Latent State Decoding

We study the exploration problem in episodic MDPs with rich observations...

Please sign up or login with your details

Forgot password? Click here to reset