Reinforcement Learning in Rich-Observation MDPs using Spectral Methods

11/11/2016

Designing effective exploration-exploitation algorithms in Markov decision processes (MDPs) with large state-action spaces is the main challenge in reinforcement learning (RL). In fact, the learning performance degrades with the number of states and actions in the MDP. In practice, however, MDPs often exhibit a low-dimensional latent structure, where a small hidden state space is observed through a possibly large number of observations. In this paper, we study the setting of rich-observation Markov decision processes (ROMDP), where hidden states are mapped to observations through an injective mapping, so that each observation can be generated by only one hidden state. While this mapping is unknown a priori, we introduce a spectral decomposition method that consistently estimates how observations are clustered in the hidden states. The estimated clustering is then integrated into an optimistic algorithm for RL (UCRL), which operates on the smaller clustered space. The resulting algorithm proceeds through phases, and we show that its per-step regret (i.e., the difference in cumulative reward between the algorithm and the optimal policy) decreases as more observations are clustered together and eventually matches the (ideal) performance of an RL algorithm running directly on the hidden MDP.
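The structural assumption above can be illustrated with a minimal sketch. It is not the spectral method of the paper: here the model is assumed known and exact, whereas the paper estimates the clustering from samples. All names (`build_romdp`, `observation_transitions`, `cluster_by_rows`) and the sizes used are hypothetical choices for illustration. The sketch relies only on the fact that, when each observation is generated by a single hidden state, two observations of the same hidden state induce the same distribution over next observations.

```python
import numpy as np


def build_romdp(num_states=3, obs_per_state=4, num_actions=2, seed=0):
    """Build a toy model in which each observation belongs to one hidden state."""
    rng = np.random.default_rng(seed)
    num_obs = num_states * obs_per_state
    # decoder[y] is the unique hidden state that can generate observation y.
    decoder = np.repeat(np.arange(num_states), obs_per_state)
    # emission[x, y] = P(y | x); rows have disjoint supports by construction.
    emission = np.zeros((num_states, num_obs))
    for x in range(num_states):
        support = np.flatnonzero(decoder == x)
        emission[x, support] = rng.dirichlet(np.ones(len(support)))
    # transition[a, x, x'] = P(x' | x, a), drawn at random for illustration.
    transition = rng.dirichlet(np.ones(num_states), size=(num_actions, num_states))
    return decoder, emission, transition


def observation_transitions(decoder, emission, transition, action):
    """P(y' | y, a) = sum over x' of P(x' | g(y), a) * P(y' | x')."""
    return transition[action][decoder] @ emission


def cluster_by_rows(obs_transitions, atol=1e-9):
    """Group observations whose next-observation distributions coincide."""
    representatives = []
    clusters = np.empty(len(obs_transitions), dtype=int)
    for y, row in enumerate(obs_transitions):
        for k, rep in enumerate(representatives):
            if np.allclose(row, rep, rtol=0.0, atol=atol):
                clusters[y] = k
                break
        else:
            # No existing cluster matches: open a new one.
            representatives.append(row)
            clusters[y] = len(representatives) - 1
    return clusters


decoder, emission, transition = build_romdp()
obs_p = observation_transitions(decoder, emission, transition, action=0)
clusters = cluster_by_rows(obs_p)
print("true decoder:      ", decoder)
print("recovered clusters:", clusters)
```

In this toy setting the recovered clusters coincide with the hidden states, so a learner could work on the 3 clusters instead of the 12 observations. A single action may fail to separate two hidden states whose transition rows happen to be equal under that action; the sketch ignores this case.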

Related Research

Efficient Reinforcement Learning in Block MDPs: A Model-free Representation Learning Approach

Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Optimal data pooling for shared learning in maintenance operations

Experimental results : Reinforcement Learning of POMDPs using Spectral Methods

Feature Reinforcement Learning: Part I: Unstructured MDPs

Recursive Reinforcement Learning

Provably efficient RL with Rich Observations via Latent State Decoding