Sample-Efficient Reinforcement Learning with loglog(T) Switching Cost

02/13/2022
by   Dan Qiao, et al.

We study reinforcement learning (RL) with low policy switching cost, a problem motivated by real-life RL applications in which deploying a new policy is costly and the number of policy updates must be kept small. We propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of O(√(H^4S^2AT)) while requiring a switching cost of O(HSA loglog T). This is an exponential improvement over the best-known switching cost of O(H^2SA log T) among existing methods with O(poly(H,S,A)√(T)) regret. Here S and A denote the numbers of states and actions in an H-horizon episodic Markov Decision Process with unknown transitions, and T is the number of steps. We also prove an information-theoretic lower bound showing that a switching cost of Ω(HSA) is required for any no-regret algorithm. As a byproduct, our new algorithmic techniques yield a reward-free exploration algorithm with an optimal switching cost of O(HSA).
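To get a feel for the gap between the two switching-cost bounds, the short sketch below evaluates HSA loglog T and H^2SA log T numerically. It is only an illustration: the constants hidden by the O(·) notation are dropped, natural logarithms are assumed, and the function name `switching_cost_bounds` and the values of H, S, A and T are hypothetical choices, not taken from the paper.

```python
import math


def switching_cost_bounds(H, S, A, T):
    """Return (new, previous) switching-cost bounds with constants dropped.

    new      ~ H * S * A * log(log(T))   (bound stated for the proposed algorithm)
    previous ~ H^2 * S * A * log(T)      (best-known bound it is compared against)
    Natural logs and unit constants are assumptions made for illustration.
    """
    new_bound = H * S * A * math.log(math.log(T))
    previous_bound = H**2 * S * A * math.log(T)
    return new_bound, previous_bound


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to show how the bounds scale in T.
    for T in (10**4, 10**6, 10**8, 10**12):
        new, prev = switching_cost_bounds(H=10, S=20, A=5, T=T)
        print(f"T=10^{round(math.log10(T)):<2d}  "
              f"HSA loglog T ~ {new:10.0f}   H^2SA log T ~ {prev:10.0f}")
```

Squaring T doubles the log T term but only adds log 2 to the loglog T term, which is the sense in which the dependence on T is exponentially smaller.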

research
02/24/2023

Logarithmic Switching Cost in Reinforcement Learning beyond Linear MDPs

In many real-life reinforcement learning (RL) problems, deploying new po...
research
10/03/2022

Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation

We study the problem of deployment efficient reinforcement learning (RL)...
research
12/13/2021

A Benchmark for Low-Switching-Cost Reinforcement Learning

A ubiquitous requirement in many practical reinforcement learning (RL) a...
research
02/11/2020

Learning to Switch Between Machines and Humans

Reinforcement learning algorithms have been mostly developed and evaluat...
research
05/30/2019

Provably Efficient Q-Learning with Low Switching Cost

We take initial steps in studying PAC-MDP algorithms with limited adapti...
research
02/08/2023

Near-Optimal Adversarial Reinforcement Learning with Switching Costs

Switching costs, which capture the costs for changing policies, are rega...
research
08/26/2021

When should agents explore?

Exploration remains a central challenge for reinforcement learning (RL)....
