KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal

05/27/2022
by Tadashi Kozuno, et al.

In this work, we analyze the sample complexity of model-free reinforcement learning with a generative model. In particular, we study mirror descent value iteration (MDVI) by Geist et al. (2019) and Vieillard et al. (2020a), which uses Kullback-Leibler (KL) divergence and entropy regularization in its value and policy updates. Our analysis shows that MDVI is nearly minimax-optimal for finding an ε-optimal policy when ε is sufficiently small. This is the first theoretical result demonstrating that a simple model-free algorithm without variance reduction can be nearly minimax-optimal in this setting.
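
To make the KL-entropy-regularized updates concrete, below is a minimal tabular sketch of an MDVI-style iteration with a generative model. The function and parameter names (sample_next_state, lambda_kl, tau_ent, num_samples) are illustrative assumptions rather than the paper's notation, and the sketch omits the specific parameter choices and refinements on which the paper's minimax analysis relies.

import numpy as np

# A minimal, illustrative sketch of a tabular MDVI-style iteration with a
# generative model. Names and defaults here are assumptions made for
# illustration; they do not reproduce the paper's exact algorithm.
def mdvi_sketch(sample_next_state, reward, num_states, num_actions,
                gamma=0.99, lambda_kl=0.1, tau_ent=0.01,
                num_iters=100, num_samples=10):
    """KL-entropy-regularized value iteration with sampled Bellman backups.

    sample_next_state(s, a, n) should return n next states drawn from the
    generative model; reward[s, a] is the expected immediate reward.
    """
    v = np.zeros(num_states)
    log_pi = np.full((num_states, num_actions), -np.log(num_actions))  # uniform policy

    beta = 1.0 / (lambda_kl + tau_ent)   # inverse "temperature"
    alpha = lambda_kl * beta             # weight on the previous policy

    for _ in range(num_iters):
        # Model-free Q estimate: average sampled one-step backups per (s, a).
        q = np.empty((num_states, num_actions))
        for s in range(num_states):
            for a in range(num_actions):
                next_states = sample_next_state(s, a, num_samples)
                q[s, a] = reward[s, a] + gamma * np.mean(v[next_states])

        # Closed-form KL + entropy regularized greedy step:
        #   pi_{k+1}(a|s) proportional to pi_k(a|s)^alpha * exp(beta * q(s, a))
        logits = alpha * log_pi + beta * q
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        new_log_pi = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

        # Regularized evaluation: expected Q minus KL and entropy penalties.
        pi = np.exp(new_log_pi)
        v = (pi * (q - lambda_kl * (new_log_pi - log_pi)
                   - tau_ent * new_log_pi)).sum(axis=1)
        log_pi = new_log_pi

    return np.exp(log_pi), v

In this sketch, setting lambda_kl = 0 reduces the update to entropy-only (soft) value iteration, while setting tau_ent = 0 keeps only the KL term that ties successive policies together.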
