Computationally Efficient Horizon-Free Reinforcement Learning for Linear Mixture MDPs

05/23/2022
by Dongruo Zhou, et al.

Recent studies have shown that episodic reinforcement learning (RL) is no more difficult than contextual bandits, even with a long planning horizon and unknown state transitions. However, these results are limited to either tabular Markov decision processes (MDPs) or computationally inefficient algorithms for linear mixture MDPs. In this paper, we propose the first computationally efficient horizon-free algorithm for linear mixture MDPs, which achieves the optimal Õ(d√K + d²) regret up to logarithmic factors. Our algorithm adapts a weighted least-squares estimator for the unknown transition dynamics, where the weight is both variance-aware and uncertainty-aware. When applying our weighted least-squares estimator to heterogeneous linear bandits, we obtain an Õ(d√(∑_{k=1}^{K} σ_k²) + d) regret in the first K rounds, where d is the dimension of the context and σ_k² is the variance of the reward in the k-th round. This also improves upon the best-known algorithms in this setting when the σ_k² are known.
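The abstract does not give the weight formula, so the following is only an illustrative sketch under stated assumptions. It implements online weighted ridge regression for a heterogeneous linear bandit, θ̂ = Σ⁻¹b with Σ = λI + ∑_k w_k x_k x_kᵀ and b = ∑_k w_k r_k x_k, using a hypothetical weight w_k = 1/σ̄_k² with σ̄_k = max(σ_k, α, γ·√‖x_k‖_{Σ⁻¹}). Here σ̄_k grows with the known noise level σ_k (variance-aware) and with the elliptical uncertainty of x_k (uncertainty-aware), so noisy or poorly explored rounds are down-weighted. The class name `WeightedRidgeEstimator`, the helper `simulate`, the parameters λ, α, γ, and this exact form of σ̄_k are assumptions for illustration, not the paper's algorithm. Σ⁻¹ is maintained with the Sherman-Morrison identity so the example needs only the Python standard library.

```python
import math
import random


class WeightedRidgeEstimator:
    """Online weighted ridge regression (hypothetical illustrative sketch).

    Maintains Sigma = lam * I + sum_k w_k x_k x_k^T and b = sum_k w_k r_k x_k,
    and returns theta_hat = Sigma^{-1} b. Sigma^{-1} is updated in place with
    the Sherman-Morrison identity, so no explicit matrix inversion is needed.
    """

    def __init__(self, d, lam=1.0, alpha=0.1, gamma=1.0):
        self.d = d
        self.alpha = alpha  # assumed floor on sigma_bar (caps any weight at 1/alpha^2)
        self.gamma = gamma  # assumed scale of the uncertainty term
        self.sigma_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)]
                          for i in range(d)]
        self.b = [0.0] * d

    def _apply_sigma_inv(self, x):
        # Matrix-vector product Sigma^{-1} x.
        return [sum(self.sigma_inv[i][j] * x[j] for j in range(self.d))
                for i in range(self.d)]

    def uncertainty(self, x):
        # Elliptical norm ||x||_{Sigma^{-1}} = sqrt(x^T Sigma^{-1} x).
        v = self._apply_sigma_inv(x)
        return math.sqrt(max(0.0, sum(xi * vi for xi, vi in zip(x, v))))

    def weight(self, x, sigma):
        # Hypothetical weight 1 / sigma_bar^2: sigma_bar is large when the
        # known noise level sigma is large (variance-aware) or when x lies in
        # a poorly explored direction (uncertainty-aware).
        sigma_bar = max(sigma, self.alpha,
                        self.gamma * math.sqrt(self.uncertainty(x)))
        return 1.0 / sigma_bar ** 2

    def update(self, x, r, sigma):
        # The weight is computed from Sigma before this round is added.
        w = self.weight(x, sigma)
        v = self._apply_sigma_inv(x)
        denom = 1.0 + w * sum(xi * vi for xi, vi in zip(x, v))
        # Sherman-Morrison for symmetric Sigma^{-1}:
        # (Sigma + w x x^T)^{-1} = Sigma^{-1} - w v v^T / (1 + w x^T v), v = Sigma^{-1} x.
        for i in range(self.d):
            for j in range(self.d):
                self.sigma_inv[i][j] -= w * v[i] * v[j] / denom
        for i in range(self.d):
            self.b[i] += w * r * x[i]

    def theta(self):
        # Current weighted ridge estimate Sigma^{-1} b.
        return self._apply_sigma_inv(self.b)


def simulate(num_rounds=2000, seed=0):
    # Heterogeneous linear model r = <theta_star, x> + noise, where each
    # round's noise standard deviation is known to the learner.
    rng = random.Random(seed)
    theta_star = [0.5, -0.2, 0.3]
    est = WeightedRidgeEstimator(d=len(theta_star))
    for _ in range(num_rounds):
        x = [rng.uniform(-1.0, 1.0) for _ in theta_star]
        sigma = rng.choice([0.01, 1.0])
        r = sum(t * xi for t, xi in zip(theta_star, x)) + rng.gauss(0.0, sigma)
        est.update(x, r, sigma)
    return theta_star, est


if __name__ == "__main__":
    theta_star, est = simulate()
    print("true:     ", theta_star)
    print("estimated:", [round(t, 3) for t in est.theta()])
```

With a mix of low-noise (σ = 0.01) and high-noise (σ = 1) rounds, the estimate should land close to θ*: low-noise rounds receive weights up to 1/α² = 100, while the floor α keeps the weights bounded, and the uncertainty term tempers the weight of rounds whose features point in directions that have not yet been explored.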

research
05/15/2023

Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs

Recent studies have shown that episodic reinforcement learning (RL) is n...
research
11/05/2021

Improved Regret Analysis for Variance-Adaptive Linear Bandits and Horizon-Free Linear Mixture MDPs

In online learning problems, exploiting low variance plays an important ...
research
01/29/2021

Variance-Aware Confidence Set: Variance-Dependent Bound for Linear Bandits and Horizon-Free Bound for Linear Mixture MDP

We show how to construct variance-aware confidence sets for linear bandi...
research
06/08/2020

A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret

Recently, model-free reinforcement learning has attracted research atten...
research
08/31/2020

Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL

Reinforcement learning (RL) in episodic, factored Markov decision proces...
research
06/09/2022

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data...
research
07/06/2022

Model Selection in Reinforcement Learning with General Function Approximations

We consider model selection for classic Reinforcement Learning (RL) envi...