Balancing policy constraint and ensemble size in uncertainty-based offline reinforcement learning

03/26/2023
by Alex Beeson, et al.

Offline reinforcement learning agents seek optimal policies from fixed data sets. With environmental interaction prohibited, agents face significant challenges in preventing errors in value estimates from compounding and subsequently causing the learning process to collapse. Uncertainty estimation using ensembles compensates for this by penalising high-variance value estimates, allowing agents to learn robust policies based on data-driven actions. However, the requirement for large ensembles to facilitate sufficient penalisation results in significant computational overhead. In this work, we examine the role of policy constraints as a mechanism for regulating uncertainty, and the corresponding balance between the level of constraint and ensemble size. By incorporating behavioural cloning into policy updates, we show empirically that sufficient penalisation can be achieved with a much smaller ensemble size, substantially reducing computational demand while retaining state-of-the-art performance on benchmarking tasks. Furthermore, we show how such an approach can facilitate stable online fine-tuning, allowing for continued policy improvement while avoiding severe performance drops.
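The abstract describes two levers: an ensemble-based penalty on uncertain value estimates and a behavioural cloning term in the policy update. The following is a minimal NumPy sketch of how the two might be combined in an actor objective. It assumes a penalty of the form "ensemble mean minus β times ensemble standard deviation" and a TD3+BC-style weighting λ = α / mean|Q|; the abstract does not state the authors' exact objective, and `pessimistic_q`, `actor_loss`, `alpha` and `beta` are hypothetical names chosen for illustration.

```python
import numpy as np


def pessimistic_q(q_ensemble: np.ndarray, beta: float = 1.0) -> np.ndarray:
    """Penalise disagreement between critics (assumed penalty form).

    q_ensemble: shape (n_members, batch), one row of Q-values per critic.
    Returns mean minus beta * std across members, shape (batch,).
    """
    return q_ensemble.mean(axis=0) - beta * q_ensemble.std(axis=0)


def actor_loss(q_ensemble: np.ndarray,
               policy_actions: np.ndarray,
               data_actions: np.ndarray,
               alpha: float = 2.5,
               beta: float = 1.0) -> float:
    """Loss to minimise: -lambda * pessimistic Q + behavioural cloning term.

    policy_actions, data_actions: shape (batch, action_dim).
    """
    q = pessimistic_q(q_ensemble, beta)
    # TD3+BC-style normalisation so alpha sets the Q vs. BC trade-off
    # independently of the scale of the returns (assumption, not from the abstract).
    lam = alpha / (np.abs(q).mean() + 1e-8)
    # Squared distance between policy actions and dataset actions.
    bc = ((policy_actions - data_actions) ** 2).sum(axis=-1)
    return float(-(lam * q).mean() + bc.mean())


# Usage: two critics, batch of two states, 3-D actions.
q_ens = np.array([[1.0, 2.0], [1.0, 4.0]])
print(pessimistic_q(q_ens))  # [1. 2.]
acts = np.zeros((2, 3))
print(actor_loss(q_ens, acts + 0.5, acts))
```

Shrinking the ensemble (the first axis of `q_ensemble`) makes the standard deviation a noisier estimate of uncertainty, while the BC term independently pulls policy actions towards the data. Shifting weight between the two is, schematically, the balance the abstract examines.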
