A Benchmark for Low-Switching-Cost Reinforcement Learning

12/13/2021
by   Shusheng Xu, et al.

A ubiquitous requirement in many practical reinforcement learning (RL) applications, including medical treatment, recommendation systems, education, and robotics, is that the deployed policy that actually interacts with the environment cannot change frequently. This setting is called low-switching-cost RL: the goal is to achieve the highest reward while reducing the number of policy switches during training. Despite the recent trend of theoretical studies aiming to design provably efficient RL algorithms with low switching costs, none of the existing approaches have been thoroughly evaluated in popular RL testbeds. In this paper, we systematically study a wide collection of policy-switching approaches, including theoretically guided criteria, policy-difference-based methods, and non-adaptive baselines. Through extensive experiments on a medical treatment environment, Atari games, and robotic control tasks, we present the first empirical benchmark for low-switching-cost RL and report novel findings on how to decrease the switching cost while maintaining sample efficiency similar to that of the case without the low-switching-cost constraint. We hope this benchmark can serve as a starting point for developing more practically effective low-switching-cost RL algorithms. We release our code and complete results at https://sites.google.com/view/low-switching-cost-rl.
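To make the setting concrete, the following is a minimal sketch of how switching cost might be counted: an online policy is updated at every step, while a separate deployed copy is replaced only when a switching criterion fires. The abstract does not specify the criteria used in the paper, so everything here is an assumption for illustration: the names `SwitchingTracker`, `fixed_interval`, `parameter_drift`, and `run` are hypothetical, the "policy" is a plain list of floats, and the learning update is a stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List

# A criterion takes (step, deployed_params, online_params) and returns True to switch.
Criterion = Callable[[int, List[float], List[float]], bool]


@dataclass
class SwitchingTracker:
    """Holds the deployed parameters and counts how often they are replaced."""
    deployed: List[float]
    num_switches: int = 0

    def maybe_switch(self, step: int, online: List[float], criterion: Criterion) -> None:
        if criterion(step, self.deployed, online):
            self.deployed = list(online)  # deploy a copy of the online policy
            self.num_switches += 1


def fixed_interval(interval: int) -> Criterion:
    """Non-adaptive baseline (assumed form): switch every `interval` steps."""
    return lambda step, deployed, online: step % interval == 0


def parameter_drift(threshold: float) -> Criterion:
    """Illustrative policy-difference rule (an assumption, not the paper's):
    switch once any parameter has drifted more than `threshold`."""
    return lambda step, deployed, online: (
        max(abs(d - o) for d, o in zip(deployed, online)) > threshold
    )


def run(criterion: Criterion, total_steps: int = 1000) -> int:
    """Run a toy training loop and return the number of policy switches."""
    online = [0.0, 0.0]
    tracker = SwitchingTracker(deployed=list(online))
    for step in range(1, total_steps + 1):
        online = [w + 0.01 for w in online]  # stand-in for a learning update
        tracker.maybe_switch(step, online, criterion)
    return tracker.num_switches
```

Under these assumptions, `run(fixed_interval(1))` corresponds to the unconstrained case in which the deployed policy changes at every step, and larger intervals or drift thresholds trade deployment freshness for fewer switches.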

