Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

01/06/2021
by Tianhao Wang, et al.

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models: the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for linear Markov decision processes. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an Õ(√(d^3H^3T) + dHT/B) regret, where d is the dimension of the feature mapping, H is the episode length, T is the number of interactions, and B is the number of batches. Our result suggests that it suffices to use only √(T/(dH)) batches to obtain Õ(√(d^3H^3T)) regret. For the rare policy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an Õ(√(d^3H^3T·[1+T/(dH)]^(dH/B))) regret, which implies that dH log T policy switches suffice to obtain the Õ(√(d^3H^3T)) regret. Our algorithms achieve the same regret as the LSVI-UCB algorithm (Jin et al., 2019), yet with a substantially smaller amount of adaptivity.
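As a quick sanity check on the stated batch counts (a back-of-the-envelope sketch, not part of the abstract itself), both claims follow by balancing terms in the regret bounds above, ignoring logarithmic factors and using only the symbols d, H, T, B already defined:

\[
\text{Batch model: with } B = \sqrt{T/(dH)}, \quad
\frac{dHT}{B} = dHT\sqrt{\frac{dH}{T}} = \sqrt{d^3H^3T},
\]
so the second term matches the first and the total regret stays Õ(√(d^3H^3T)).

\[
\text{Rare switch model: with } B = dH\log\!\bigl(1+\tfrac{T}{dH}\bigr) \asymp dH\log T, \quad
\Bigl[1+\tfrac{T}{dH}\Bigr]^{dH/B}
= \exp\!\Bigl(\tfrac{dH}{B}\log\bigl(1+\tfrac{T}{dH}\bigr)\Bigr) = e,
\]
a constant factor, so the bound again reduces to Õ(√(d^3H^3T)).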
