Provably More Efficient Q-Learning in the Full-Feedback/One-Sided-Feedback Settings

06/30/2020
by   Xiao-Yue Gong, et al.

We propose two new Q-learning algorithms, Full-Q-Learning (FQL) and Elimination-Based Half-Q-Learning (HQL), that are more efficient than existing Q-learning algorithms in the full-feedback and one-sided-feedback settings, respectively. We establish that FQL incurs regret Õ(H^2 √T) and HQL incurs regret Õ(H^3 √T), where H is the length of each episode and T is the total number of time periods. Neither bound depends on the size of the state or action space, which may be very large. Numerical experiments on the classical inventory control problem demonstrate the superior efficiency of FQL and HQL and show the potential of tailoring reinforcement learning algorithms to richer feedback models, which are prevalent in many natural problems.
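To make the notion of full feedback concrete, here is a minimal sketch on a single-period inventory (newsvendor) problem. It is an illustration only, not the paper's FQL algorithm: the cost function, the `holding` and `backlog` parameters, and the function names are all assumptions introduced for this example. The point it shows is that one observed demand reveals the cost of every candidate stock level, so every action's estimate can be updated from a single sample.

```python
import random


def newsvendor_cost(level, demand, holding=1.0, backlog=2.0):
    """Cost of stocking `level` units once `demand` is observed (assumed cost model)."""
    overage = max(level - demand, 0)    # unsold units incur a holding cost
    underage = max(demand - level, 0)   # unmet demand incurs a backlog cost
    return holding * overage + backlog * underage


def full_feedback_update(avg_cost, count, demand):
    """Update the running-average cost of EVERY stock level from one demand sample.

    With bandit feedback only the chosen level could be updated; here the
    observed demand lets us evaluate all levels at once.
    """
    count += 1
    for level in range(len(avg_cost)):
        sample = newsvendor_cost(level, demand)
        avg_cost[level] += (sample - avg_cost[level]) / count
    return count


# Hypothetical usage: demand drawn uniformly from {0, ..., 5}.
rng = random.Random(0)
avg_cost = [0.0] * 6
count = 0
for _ in range(200):
    count = full_feedback_update(avg_cost, count, rng.randint(0, 5))
best_level = min(range(len(avg_cost)), key=lambda lvl: avg_cost[lvl])
```

Because no estimate has to wait for its own action to be tried, the amount of data needed in this toy setting does not grow with the number of stock levels, which is the intuition behind regret bounds that do not depend on the size of the action space.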


research
11/15/2021

Delayed Feedback in Episodic Reinforcement Learning

There are many provably efficient algorithms for episodic reinforcement ...
research
02/13/2021

Improved Corruption Robust Algorithms for Episodic Reinforcement Learning

We study episodic reinforcement learning under unknown adversarial corru...
research
06/16/2023

Understanding the Role of Feedback in Online Learning with Switching Costs

In this paper, we study the role of feedback in online learning with swi...
research
06/04/2013

(More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient learning algorithms introduce optimism about poo...
research
08/13/2020

Reinforcement Learning with Trajectory Feedback

The computational model of reinforcement learning is based upon the abil...
research
08/18/2021

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

We consider the problem of controlling an unknown linear quadratic Gauss...
research
03/09/2020

Robust Learning from Discriminative Feature Feedback

Recent work introduced the model of learning from discriminative feature...
