Representation Learning for Online and Offline RL in Low-rank MDPs

10/09/2021
by Masatoshi Uehara, et al.

This work studies the question of representation learning in RL: how can we learn a compact, low-dimensional representation on top of which we can run RL procedures such as exploration and exploitation in a sample-efficient manner? We focus on low-rank Markov Decision Processes (MDPs), where the transition dynamics correspond to a low-rank transition matrix. Unlike prior works that assume the representation is known (e.g., linear MDPs), here the representation of the low-rank MDP must be learned. We study both the online RL and offline RL settings. For the online setting, operating with the same computational oracles used in FLAMBE (Agarwal et al.), the state-of-the-art algorithm for learning representations in low-rank MDPs, we propose an algorithm REP-UCB (Upper Confidence Bound driven Representation learning for RL), which significantly improves the sample complexity from O(A^9 d^7 / (ϵ^10 (1-γ)^22)) for FLAMBE to O(A^4 d^4 / (ϵ^2 (1-γ)^3)), where d is the rank of the transition matrix (equivalently, the dimension of the ground-truth representation), A is the number of actions, and γ is the discount factor. Notably, REP-UCB is simpler than FLAMBE: it directly balances the interplay between representation learning, exploration, and exploitation, whereas FLAMBE is an explore-then-commit style approach that must perform reward-free exploration step-by-step forward in time. For the offline RL setting, we develop an algorithm that leverages pessimism to learn under a partial coverage condition: our algorithm can compete against any policy that is covered by the offline distribution.
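
As background for the model class studied here, a minimal sketch of the standard low-rank MDP definition (paraphrased for clarity, not quoted from the abstract): a rank-d MDP assumes the transition kernel factorizes as

    T(s' | s, a) = ⟨ φ*(s, a), μ*(s') ⟩,   with φ*(s, a) ∈ R^d,

where φ* is the unknown ground-truth representation that algorithms such as REP-UCB must learn from data, and μ* maps next states to d-dimensional (signed) measures; a linear MDP is the special case in which φ* is given to the learner in advance.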


Related research

FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs (06/18/2020)
In order to deal with the curse of dimensionality in reinforcement learn...

Provably Efficient Representation Learning in Low-rank Markov Decision Processes (06/22/2021)
The success of deep reinforcement learning (DRL) is due to the power of ...

Provably Efficient Algorithm for Nonstationary Low-Rank MDPs (08/10/2023)
Reinforcement learning (RL) under changing environment models many real-...

Provable Benefit of Multitask Representation Learning in Reinforcement Learning (06/13/2022)
As representation learning becomes a powerful technique to reduce sample...

Representation Learning in Low-rank Slate-based Recommender Systems (09/10/2023)
Reinforcement learning (RL) in recommendation systems offers the potenti...

Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage (07/13/2021)
We study model-based offline Reinforcement Learning with general functio...

Reinforcement Learning in Low-Rank MDPs with Density Features (02/04/2023)
MDPs with low-rank transitions – that is, the transition matrix can be f...
