
Q-learning with Logarithmic Regret
This paper presents the first non-asymptotic result showing that a model...

A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems
We revisit the Thompson sampling algorithm to control an unknown linear ...

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints
We study reinforcement learning (RL) with linear function approximation ...

Near-optimal Representation Learning for Linear Bandits and Linear RL
This paper studies representation learning for multi-task linear bandits...

An Elementary Proof that Q-learning Converges Almost Surely
Watkins' and Dayan's Q-learning is a model-free reinforcement learning a...

Gap-Dependent Bounds for Two-Player Markov Games
As one of the most popular methods in the field of reinforcement learnin...

No-Regret Reinforcement Learning with Value Function Approximation: a Kernel Embedding Approach
We consider the regret minimisation problem in reinforcement learning (R...
Logarithmic Regret for Reinforcement Learning with Linear Function Approximation
Reinforcement learning (RL) with linear function approximation has received increasing attention recently. However, existing work has focused on obtaining √(T)-type regret bounds, where T is the number of steps. In this paper, we show that logarithmic regret is attainable under two recently proposed linear MDP assumptions, provided that there exists a positive suboptimality gap for the optimal action-value function. Specifically, under the linear MDP assumption (Jin et al. 2019), the LSVI-UCB algorithm can achieve Õ(d^3H^5/gap_min·log(T)) regret; and under the linear mixture MDP assumption (Ayoub et al. 2020), the UCRL-VTR algorithm can achieve Õ(d^2H^5/gap_min·log^3(T)) regret, where d is the dimension of the feature mapping, H is the length of an episode, and gap_min is the minimum suboptimality gap. To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.