Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs

03/20/2023
by Yuan Cheng, et al.

In reward-free reinforcement learning (RL), an agent first explores the environment without any reward information, so that it can subsequently achieve certain learning goals for any given reward function. In this paper we focus on reward-free RL under low-rank MDP models, in which both the representation and the linear weight vectors are unknown. Although various algorithms have been proposed for reward-free low-rank MDPs, their sample complexity remains far from satisfactory. In this work, we first establish the first known sample complexity lower bound that holds for any algorithm under low-rank MDPs. This lower bound implies that finding a near-optimal policy is strictly harder under low-rank MDPs than under linear MDPs. We then propose a novel model-based algorithm, coined RAFFLE, and show that it can both find an ϵ-optimal policy and achieve ϵ-accurate system identification via reward-free exploration, with a sample complexity that significantly improves upon previous results. This sample complexity matches our lower bound in its dependence on ϵ, and also on K in the large-d regime, where d and K denote the representation dimension and the action-space cardinality, respectively. Finally, we provide a planning algorithm (requiring no further interaction with the true environment) by which RAFFLE learns a near-accurate representation; this is the first known representation learning guarantee under this setting.
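For context, below is a minimal sketch of the standard low-rank MDP factorization assumed in this line of work; the notation (φ*, μ*, horizon index h) is illustrative and not taken verbatim from the paper.

```latex
% Standard low-rank MDP model (illustrative notation): at each step h, the
% transition kernel factors through an unknown d-dimensional representation
% \phi_h^* and unknown weight functions \mu_h^*.
\[
  P_h(s' \mid s, a) \;=\; \big\langle \phi_h^*(s,a),\, \mu_h^*(s') \big\rangle,
  \qquad \phi_h^*(s,a) \in \mathbb{R}^d, \quad \mu_h^*(s') \in \mathbb{R}^d .
\]
% In a linear MDP the representation \phi_h^* is known to the learner and only
% \mu_h^* must be estimated; in a low-rank MDP both factors are unknown, which is
% why representation learning is required and why the lower bound can be strictly
% larger than in the linear-MDP case. Here d is the representation dimension and
% K = |A| is the action-space cardinality referenced in the abstract.
```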


