Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes

12/12/2022
by   Jiafan He, et al.

We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs), whose transition dynamics can be parameterized as a linear function of a given feature mapping, we propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret Õ(d√(H^3K)), where d is the dimension of the feature mapping, H is the planning horizon, and K is the number of episodes. Our algorithm is based on a weighted linear regression scheme with a carefully designed weight, which depends on a new variance estimator that (1) directly estimates the variance of the optimal value function, (2) monotonically decreases with respect to the number of episodes to ensure better estimation accuracy, and (3) uses a rare-switching policy to update the value function estimator to control the complexity of the estimated value function class. Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
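To make the idea of a weighted linear regression scheme concrete, the following is a minimal sketch of generic variance-weighted ridge regression, in which each sample is weighted by the inverse of a variance estimate. It is not the paper's algorithm: the variance estimator, the rare-switching update rule, and the exploration bonus are all omitted, and the names `weighted_ridge`, `solve_linear_system`, `variances`, and `lam` are hypothetical, introduced only for illustration.

```python
from typing import List, Sequence


def solve_linear_system(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        residual = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = residual / m[i][i]
    return x


def weighted_ridge(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    variances: Sequence[float],
    lam: float = 1.0,
) -> List[float]:
    """Return w minimizing sum_k (phi_k . w - y_k)^2 / var_k + lam * ||w||^2.

    Assumption: `variances` are positive per-sample variance estimates
    supplied by the caller; how they are estimated is outside this sketch.
    """
    d = len(features[0])
    # Weighted Gram matrix: lam * I + sum_k phi_k phi_k^T / var_k.
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    # Weighted moment vector: sum_k phi_k * y_k / var_k.
    moment = [0.0] * d
    for phi, y, var in zip(features, targets, variances):
        weight = 1.0 / var
        for i in range(d):
            moment[i] += weight * phi[i] * y
            for j in range(d):
                gram[i][j] += weight * phi[i] * phi[j]
    return solve_linear_system(gram, moment)
```

With all variances equal to 1 this reduces to ordinary ridge regression. Lowering the variance of a sample increases its influence on the estimate, which is the general mechanism by which a variance-dependent weight can tighten the regression.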


