Representation Learning for Online and Offline RL in Low-rank MDPs

by Masatoshi Uehara et al.

This work studies representation learning in RL: how can we learn a compact, low-dimensional representation on top of which we can run RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here we need to learn the representation of the low-rank MDP. We study both the online and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample complexity from O( A^9 d^7 / (ϵ^10 (1-γ)^22) ) for FLAMBE to O( A^4 d^4 / (ϵ^2 (1-γ)^3) ), where d is the rank of the transition matrix (equivalently, the dimension of the ground-truth representation), A is the number of actions, and γ is the discount factor. Notably, REP-UCB is simpler than FLAMBE: it directly balances the interplay between representation learning, exploration, and exploitation, whereas FLAMBE is an explore-then-commit style approach that has to perform reward-free exploration step by step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm can compete against any policy, as long as that policy is covered by the offline distribution.
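The low-rank assumption above can be made concrete with a small numerical sketch (our own illustration, not code from the paper): if the transition probability factorizes as P(s' | s, a) = φ(s, a) · μ(s'), then the transition matrix, viewed as a (S·A) × S array, has rank at most d, the dimension of the ground-truth representation φ.

```python
import numpy as np

# Toy low-rank MDP (illustrative only; variable names are our own).
# P[(s,a), s'] = phi(s,a) . mu(s'), so rank(P) <= d.
rng = np.random.default_rng(0)
S, A, d = 20, 5, 3  # number of states, number of actions, rank

phi = rng.random((S * A, d))             # ground-truth features phi(s, a)
mu = rng.random((d, S))
mu /= mu.sum(axis=1, keepdims=True)      # rows of mu are measures over next states

# Rescale each phi(s, a) so every row of P is a valid probability distribution.
phi /= (phi @ mu).sum(axis=1, keepdims=True)

P = phi @ mu                             # transition matrix, shape (S*A, S)
assert np.allclose(P.sum(axis=1), 1.0)   # each row sums to one
assert np.linalg.matrix_rank(P) <= d     # rank bounded by representation dim
```

Representation learning in this setting means recovering (an approximation of) φ from sampled transitions, without ever observing the factors φ and μ directly; REP-UCB interleaves that estimation step with UCB-style exploration bonuses rather than separating the two phases as FLAMBE does.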







FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs

In order to deal with the curse of dimensionality in reinforcement learn...

Provably Efficient Representation Learning in Low-rank Markov Decision Processes

The success of deep reinforcement learning (DRL) is due to the power of ...

Model-free Representation Learning and Exploration in Low-rank MDPs

The low rank MDP has emerged as an important model for studying represen...

Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage

We study model-based offline Reinforcement Learning with general functio...

Exploratory State Representation Learning

Not having access to compact and meaningful representations is known to ...

Exploring compact reinforcement-learning representations with linear regression

This paper presents a new algorithm for online linear regression whose e...

Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity

Reinforcement learning (RL) is empirically successful in complex nonline...