Provably Efficient Q-Learning with Low Switching Cost

05/30/2019
by   Yu Bai, et al.
3

We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is, algorithms that change its exploration policy as infrequently as possible during regret minimization. This is motivated by the difficulty of running fully adaptive algorithms in real-world applications (such as medical domains), and we propose to quantify adaptivity using the notion of local switching cost. Our main contribution, Q-Learning with UCB2 exploration, is a model-free algorithm for H-step episodic MDP that achieves sublinear regret whose local switching cost in K episodes is O(H^3SA K), and we provide a lower bound of Ω(HSA) on the local switching cost for any no-regret algorithm. Our algorithm can be naturally adapted to the concurrent setting, which yields nontrivial results that improve upon prior work in certain aspects.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/02/2021

A Provably Efficient Algorithm for Linear Markov Decision Process with Low Switching Cost

Many real-world applications, such as those in medical domains, recommen...
research
02/13/2022

Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

We study the problem of reinforcement learning (RL) with low (policy) sw...
research
07/09/2020

Multinomial Logit Bandit with Low Switching Cost

We study multinomial logit bandit with limited adaptivity, where the alg...
research
11/28/2019

Understand Dynamic Regret with Switching Cost for Online Decision Making

As a metric to measure the performance of an online method, dynamic regr...
research
02/24/2023

Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs

In many real-life reinforcement learning (RL) problems, deploying new po...
research
08/24/2020

Algorithms and Lower Bounds for the Worker-Task Assignment Problem

We study the problem of assigning workers to tasks where each task has d...
research
11/17/2018

The Impatient May Use Limited Optimism to Minimize Regret

Discounted-sum games provide a formal model for the study of reinforceme...

Please sign up or login with your details

Forgot password? Click here to reset