Near-Optimal Reward-Free Exploration for Linear Mixture MDPs with Plug-in Solver

10/07/2021
by   Xiaoyu Chen, et al.

Although model-based reinforcement learning (RL) approaches are considered more sample efficient, existing algorithms usually rely on a sophisticated planning algorithm that is tightly coupled with the model-learning procedure. Hence the learned models may not be easily re-used with more specialized planners. In this paper we address this issue and provide approaches to learn an RL model efficiently without the guidance of a reward signal. In particular, we take a plug-in solver approach, where we focus on learning a model in the exploration phase and require that any planning algorithm run on the learned model yields a near-optimal policy. Specifically, we focus on the linear mixture MDP setting, where the probability transition matrix is an (unknown) convex combination of a set of existing models. We show that, by establishing a novel exploration algorithm, the plug-in approach learns a model with Õ(d^2H^3/ϵ^2) interactions with the environment, such that any ϵ-optimal planner on the learned model yields an O(ϵ)-optimal policy in the original model. This sample complexity matches lower bounds for non-plug-in approaches and is therefore statistically optimal. We achieve this result by leveraging a careful maximum total-variance bound based on the Bernstein inequality, together with properties specific to linear mixture MDPs.
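To make the setting concrete, below is a minimal sketch (not the paper's exploration algorithm) of a tabular linear mixture MDP whose transition kernel is an unknown convex combination of d known base models, together with a simple plug-in planner (plain value iteration) run on an estimated mixture parameter. All names here (num_states, base_models, theta_hat, etc.) are illustrative assumptions, not notation from the paper.

```python
import numpy as np

# Minimal sketch: a linear mixture MDP with transition kernel
# P_theta(s' | s, a) = sum_i theta_i * P_i(s' | s, a),
# where the base kernels P_1, ..., P_d are known and theta is unknown.
rng = np.random.default_rng(0)
num_states, num_actions, d = 5, 3, 4

# Known base transition kernels of shape (d, S, A, S), rows summing to 1.
base_models = rng.random((d, num_states, num_actions, num_states))
base_models /= base_models.sum(axis=-1, keepdims=True)

# Unknown mixture weights on the simplex (what the exploration phase estimates).
theta_true = rng.dirichlet(np.ones(d))

def mixture_kernel(theta):
    """Return P_theta(s' | s, a) = sum_i theta_i * P_i(s' | s, a), shape (S, A, S)."""
    return np.tensordot(theta, base_models, axes=1)

def plug_in_value_iteration(theta_hat, reward, horizon):
    """Finite-horizon value iteration on the *learned* model P_{theta_hat}.

    Any off-the-shelf planner could be plugged in here; value iteration is
    just the simplest illustrative choice.
    """
    P = mixture_kernel(theta_hat)
    V = np.zeros(num_states)
    policy = np.zeros((horizon, num_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + P @ V          # Q(s, a), shape (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

# Plan with a noisy estimate of theta (standing in for the output of the
# reward-free exploration phase) against a reward revealed only afterwards.
theta_hat = np.clip(theta_true + 0.01 * rng.standard_normal(d), 0, None)
theta_hat /= theta_hat.sum()
reward = rng.random((num_states, num_actions))
policy, V = plug_in_value_iteration(theta_hat, reward, horizon=10)
print("greedy first-step actions:", policy[0])
```

The point of the sketch is the separation the abstract describes: the exploration phase only needs to produce a good estimate of theta; the reward and the planner can be chosen afterwards and simply plugged into the learned model.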


Related research

03/17/2023  Optimal Horizon-Free Reward-Free Exploration for Linear Mixture MDPs
  We study reward-free reinforcement learning (RL) with linear function ap...

03/20/2023  Improved Sample Complexity for Reward-free Reinforcement Learning under Low-rank MDPs
  In reward-free reinforcement learning (RL), an agent explores the enviro...

10/12/2021  Reward-Free Model-Based Reinforcement Learning with Linear Function Approximation
  We study the model-based reward-free reinforcement learning with linear ...

10/03/2022  Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation
  We study the problem of deployment efficient reinforcement learning (RL)...

10/23/2019  Sample Complexity of Reinforcement Learning using Linearly Combined Model Ensembles
  Reinforcement learning (RL) methods have been shown to be capable of lea...

06/18/2012  Near-Optimal BRL using Optimistic Local Transitions
  Model-based Bayesian Reinforcement Learning (BRL) allows a found formali...

06/28/2022  Safe Exploration Incurs Nearly No Additional Sample Complexity for Reward-free RL
  While the primary goal of the exploration phase in reward-free reinforce...
