Matrix Estimation for Offline Reinforcement Learning with Low-Rank Structure

05/24/2023
by Xumei Xi, et al.

We consider offline Reinforcement Learning (RL), where the agent does not interact with the environment and must rely on offline data collected using a behavior policy. Previous works provide policy evaluation guarantees when the target policy to be evaluated is covered by the behavior policy, that is, when state-action pairs visited by the target policy are also visited by the behavior policy. We show that when the MDP has a latent low-rank structure, this coverage condition can be relaxed. Building on the connection to weighted matrix completion with non-uniform observations, we propose an offline policy evaluation algorithm that leverages the low-rank structure to estimate the values of uncovered state-action pairs. Our algorithm does not require a known feature representation, and our finite-sample error bound involves a novel measure quantifying the discrepancy between the behavior and target policies in the spectral space. We provide concrete examples where our algorithm achieves accurate estimation while existing coverage conditions are not satisfied. Building on this evaluation algorithm, we further design an offline policy optimization algorithm and provide non-asymptotic performance guarantees.
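
To make the idea of estimating uncovered state-action pairs concrete, here is a minimal toy sketch in Python. It is not the algorithm from the paper: it swaps in a simple noiseless anchor-row/anchor-column (CUR-style) reconstruction, and the function name `complete_from_anchors`, the matrix sizes, and the choice of anchors are all assumptions made purely for illustration. The hypothetical behavior policy covers only two states (rows) and two actions (columns) of a rank-2 value matrix, yet every uncovered entry is recovered from the low-rank structure.

```python
import numpy as np


def complete_from_anchors(Q_obs, mask, anchor_rows, anchor_cols):
    """Toy completion of a low-rank matrix from fully observed anchor rows/columns.

    Illustrative assumption only: entries are noiseless and the anchor block
    has the same rank as the full matrix.
    """
    # The anchors must be fully covered by the (hypothetical) behavior data.
    assert mask[anchor_rows, :].all() and mask[:, anchor_cols].all()
    C = Q_obs[:, anchor_cols]                      # observed columns, shape (S, k)
    R = Q_obs[anchor_rows, :]                      # observed rows, shape (k, A)
    W = Q_obs[np.ix_(anchor_rows, anchor_cols)]    # anchor block, shape (k, k)
    # For an exactly rank-k matrix with a rank-k anchor block, C W^+ R equals Q.
    return C @ np.linalg.pinv(W) @ R


rng = np.random.default_rng(0)
S, A, r = 8, 6, 2                                  # states, actions, rank (toy sizes)
Q_true = rng.normal(size=(S, r)) @ rng.normal(size=(r, A))

anchor_rows = [0, 1]
anchor_cols = [0, 1]
mask = np.zeros((S, A), dtype=bool)                # True where behavior data exists
mask[anchor_rows, :] = True
mask[:, anchor_cols] = True

Q_obs = np.where(mask, Q_true, 0.0)                # uncovered entries are unknown
Q_hat = complete_from_anchors(Q_obs, mask, anchor_rows, anchor_cols)

# Worst-case error on entries the behavior policy never visited.
max_err_uncovered = np.abs(Q_hat - Q_true)[~mask].max()
print("uncovered entries:", int((~mask).sum()))
print("max error on uncovered entries:", max_err_uncovered)
```

In this toy setting a coverage-based estimator has no data for the 24 uncovered entries, while the low-rank reconstruction fills them in. With noisy, non-uniformly sampled observations, as in the setting the abstract describes, an exact reconstruction like this is no longer appropriate and a weighted matrix estimation procedure would be needed instead.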

research
11/29/2022

Behavior Estimation from Multi-Source Data for Offline Reinforcement Learning

Offline reinforcement learning (RL) has received rising interest due to...
research
06/21/2021

OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation

We consider the offline reinforcement learning (RL) setting where the ag...
research
06/07/2022

Overcoming the Long Horizon Barrier for Sample-Efficient Reinforcement Learning with Latent Low-Rank Structure

The practicality of reinforcement learning algorithms has been limited d...
research
07/13/2021

Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage

We study model-based offline Reinforcement Learning with general functio...
research
03/11/2021

On Finite-Sample Analysis of Offline Reinforcement Learning with Deep ReLU Networks

This paper studies the statistical theory of offline reinforcement learn...
research
05/24/2023

Provable Offline Reinforcement Learning with Human Feedback

In this paper, we investigate the problem of offline reinforcement learn...
research
02/04/2023

Reinforcement Learning in Low-Rank MDPs with Density Features

MDPs with low-rank transitions – that is, the transition matrix can be f...
