Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

06/23/2022
by Pihe Hu, et al.

We study reinforcement learning with linear function approximation, where the transition probability and reward functions are linear with respect to a feature mapping ϕ(s,a). Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP) and propose a novel, computationally efficient algorithm, LSVI-UCB^+, which achieves an Õ(Hd√(T)) regret bound, where H is the episode length, d is the feature dimension, and T is the number of steps. LSVI-UCB^+ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism on elliptical potentials and a refined analysis of the correction term. To the best of our knowledge, this is the first algorithm for linear MDPs that is minimax optimal up to logarithmic factors; it closes the √(Hd) gap between the best known upper bound of Õ(√(H^3d^3T)) in <cit.> and the lower bound of Ω(Hd√(T)) for linear MDPs.
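
The two ingredients named in the abstract, weighted ridge regression and an optimistic exploration bonus, can be illustrated with a minimal sketch. This is not the LSVI-UCB^+ algorithm itself: the bonus below is a generic elliptical bonus rather than the paper's Bernstein-type one, and the function names (`solve`, `weighted_ridge`, `optimistic_value`) and the parameters `lam`, `beta`, and `variances` are assumptions made for illustration only.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def weighted_ridge(features, targets, variances, lam=1.0):
    """Weighted ridge regression (illustrative, not the paper's exact estimator).

    Lam = lam * I + sum_k phi_k phi_k^T / var_k
    w   = Lam^{-1} sum_k phi_k y_k / var_k
    Samples with a smaller variance estimate get a larger weight.
    """
    d = len(features[0])
    Lam = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for phi, y, var in zip(features, targets, variances):
        for i in range(d):
            rhs[i] += phi[i] * y / var
            for j in range(d):
                Lam[i][j] += phi[i] * phi[j] / var
    return solve(Lam, rhs), Lam

def optimistic_value(phi, w, Lam, beta):
    """Point estimate phi^T w plus the bonus beta * sqrt(phi^T Lam^{-1} phi)."""
    u = solve(Lam, phi)  # u = Lam^{-1} phi
    bonus = beta * math.sqrt(sum(p * ui for p, ui in zip(phi, u)))
    return sum(p * wi for p, wi in zip(phi, w)) + bonus

# Hypothetical toy data: two orthogonal features, unit variance estimates.
w, Lam = weighted_ridge([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [1.0, 1.0])
q = optimistic_value([1.0, 0.0], w, Lam, beta=1.0)
```

In this sketch the bonus shrinks along directions of feature space that have been observed often or with low variance, which is the general mechanism by which optimism drives exploration in this family of methods.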


research
12/15/2020

Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes

We study reinforcement learning (RL) with linear function approximation ...
research
02/15/2021

Nearly Minimax Optimal Regret for Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation

We study reinforcement learning in an infinite-horizon average-reward se...
research
10/08/2020

Nonstationary Reinforcement Learning with Linear Function Approximation

We consider reinforcement learning (RL) in episodic Markov decision proc...
research
09/12/2022

Statistical Estimation of Confounded Linear MDPs: An Instrumental Variable Approach

In a Markov decision process (MDP), unobservable confounders may exist ...
research
02/11/2022

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved...
research
02/11/2022

Rate-matching the regret lower-bound in the linear quadratic regulator with unknown dynamics

The theory of reinforcement learning currently suffers from a mismatch b...
research
02/25/2023

Exponential Hardness of Reinforcement Learning with Linear Function Approximation

A fundamental question in reinforcement learning theory is: suppose the ...
