Provably Efficient Exploration in Quantum Reinforcement Learning with Logarithmic Worst-Case Regret

02/21/2023
by Han Zhong, et al.

While quantum reinforcement learning (RL) has attracted a surge of attention recently, its theoretical understanding remains limited. In particular, it is unclear how to design provably efficient quantum RL algorithms that address the exploration-exploitation trade-off. To this end, we propose a novel UCRL-style algorithm that takes advantage of quantum computing for tabular Markov decision processes (MDPs) with S states, A actions, and horizon H, and establish an 𝒪(poly(S, A, H, log T)) worst-case regret bound for it, where T is the number of episodes. Furthermore, we extend our results to quantum RL with linear function approximation, which can handle problems with large state spaces. Specifically, we develop a quantum algorithm based on value target regression (VTR) for linear mixture MDPs with a d-dimensional linear representation and prove that it enjoys 𝒪(poly(d, H, log T)) regret. Our algorithms are variants of the UCRL/UCRL-VTR algorithms in classical RL that additionally leverage a novel combination of lazy updating mechanisms and quantum estimation subroutines. This combination is the key to breaking the Ω(√T) regret barrier in classical RL. To the best of our knowledge, this is the first work to study online exploration in quantum RL with provable logarithmic worst-case regret.
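
As rough intuition for the gap between the Ω(√T) classical barrier and logarithmic regret, the following is a heuristic sketch and not the paper's proof. It assumes, purely for illustration, that the regret in episode t scales with the estimation error after about t samples: roughly 1/√t for classical estimators and roughly 1/t (up to logarithmic factors) for quantum estimation subroutines. The constant C is hypothetical and stands in for polynomial factors in the problem parameters.

```latex
% Heuristic only: C is a hypothetical constant absorbing poly(S, A, H) factors,
% and equating per-episode regret with estimation error is an assumption.
\begin{align*}
\text{classical:}\quad & \sum_{t=1}^{T} \frac{C}{\sqrt{t}} \;\le\; 2C\sqrt{T}, \\
\text{quantum:}\quad   & \sum_{t=1}^{T} \frac{C}{t} \;\le\; C\,(1 + \ln T).
\end{align*}
```

Under these assumptions, the quadratically faster error decay turns a cumulative √T term into a log T term, which is consistent with the poly(S, A, H, log T) form of the stated bound.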

research
10/19/2021

Locally Differentially Private Reinforcement Learning for Linear Mixture Markov Decision Processes

Reinforcement learning (RL) algorithms can be used to provide personaliz...
research
02/10/2020

Provable Self-Play Algorithms for Competitive Reinforcement Learning

Self-play, where the algorithm learns by playing against itself without ...
research
02/16/2023

Quantum Computing Provides Exponential Regret Improvement in Episodic Reinforcement Learning

In this paper, we investigate the problem of episodic reinforcement lear...
research
07/13/2020

Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

We study the exploration-exploitation dilemma in the linear quadratic re...
research
10/25/2021

Can Q-Learning be Improved with Advice?

Despite rapid progress in theoretical reinforcement learning (RL) over t...
research
01/01/2019

Tighter Problem-Dependent Regret Bounds in Reinforcement Learning without Domain Knowledge using Value Function Bounds

Strong worst-case performance bounds for episodic reinforcement learning...
research
01/06/2023

Provable Reset-free Reinforcement Learning by No-Regret Reduction

Real-world reinforcement learning (RL) is often severely limited since t...