The Effect of Q-function Reuse on the Total Regret of Tabular, Model-Free, Reinforcement Learning

03/07/2021
by Volodymyr Tkachuk, et al.

Some reinforcement learning methods suffer from high sample complexity, which makes them impractical in real-world settings. Q-function reuse, a transfer learning method, is one way to reduce the sample complexity of learning, potentially improving the usefulness of existing algorithms. Prior work has demonstrated the empirical effectiveness of Q-function reuse in various environments when applied to model-free algorithms. To the best of our knowledge, however, there is no theoretical work characterizing the regret of Q-function reuse in the tabular, model-free setting. We aim to bridge the gap between theory and practice by providing theoretical insight into the effectiveness of Q-function reuse when applied to the Q-learning with UCB-Hoeffding algorithm. Our main contribution is showing that, in a specific case, Q-function reuse applied to Q-learning with UCB-Hoeffding yields total regret that is independent of the size of the state and action spaces. We also provide empirical results supporting our theoretical findings.
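To make the setting concrete, the sketch below is a minimal implementation of tabular episodic Q-learning with UCB-Hoeffding exploration bonuses, optionally warm-started from a reused source Q-function. This is an illustration of the general technique, not the paper's exact algorithm or analysis: the environment interface (`reset`/`step`), the `source_Q` dictionary format, and the constant `c` are assumptions made for the example.

```python
import math
from collections import defaultdict

def q_ucb_hoeffding(env, S, A, H, K, c=1.0, source_Q=None):
    """Episodic tabular Q-learning with UCB-Hoeffding bonuses.

    If source_Q (a dict mapping (h, s, a) -> value) is given, unvisited
    entries are initialized from it instead of the optimistic value H --
    a simple form of Q-function reuse (transfer learning).
    """
    p = 0.05                              # failure probability (illustrative)
    iota = math.log(S * A * H * K / p)    # log factor from the regret analysis
    Q = {}                                # learned estimates, keyed by (h, s, a)
    N = defaultdict(int)                  # visit counts per (h, s, a)

    def q(h, s, a):
        # Current estimate: learned value, else reused source value, else H.
        if (h, s, a) in Q:
            return Q[(h, s, a)]
        return source_Q.get((h, s, a), H) if source_Q else H

    def value(h, s):
        # State value at step h, truncated at H (rewards are in [0, 1]).
        return 0.0 if h == H else min(H, max(q(h, s, a) for a in range(A)))

    total_reward = 0.0
    for _ in range(K):                    # K episodes of horizon H
        s = env.reset()
        for h in range(H):
            a = max(range(A), key=lambda a: q(h, s, a))   # greedy w.r.t. optimistic Q
            s2, r, done = env.step(a)
            total_reward += r
            N[(h, s, a)] += 1
            t = N[(h, s, a)]
            alpha = (H + 1) / (H + t)                     # stage-dependent learning rate
            bonus = c * math.sqrt(H ** 3 * iota / t)      # Hoeffding-style bonus
            Q[(h, s, a)] = (1 - alpha) * q(h, s, a) + alpha * (r + value(h + 1, s2) + bonus)
            s = s2
            if done:
                break
    return Q, total_reward
```

A warm start simply replaces the optimistic default H with the source task's estimates, so the learner needs fewer visits to shrink its uncertainty on state-action pairs the source Q-function already covers well.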


Related research:

- Is Q-learning Provably Efficient? (07/10/2018)
- Generalized Policy Improvement Algorithms with Theoretically Supported Sample Reuse (06/28/2022)
- Regret-Optimal Model-Free Reinforcement Learning for Discounted MDPs with Short Burn-In Time (05/24/2023)
- On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning (10/12/2021)
- Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation (12/03/2019)
- Cautiously Optimistic Policy Optimization and Exploration with Linear Function Approximation (03/24/2021)
- Is Q-Learning Provably Efficient? An Extended Analysis (09/22/2020)
