Deploying Offline Reinforcement Learning with Human Feedback

03/13/2023
by Ziniu Li, et al.

Reinforcement learning (RL) has shown promise for decision-making tasks in real-world applications. One practical framework trains parameterized policy models on an offline dataset and then deploys them in an online environment. This approach can be risky: offline training may be imperfect, so a deployed RL model may perform poorly and even take dangerous actions. To address this issue, we propose an alternative framework in which a human supervises the RL models and provides additional feedback during the online deployment phase. We formalize this online deployment problem and develop two approaches. The first uses model selection with the upper confidence bound algorithm to adaptively choose which model to deploy from a candidate set of trained offline RL models. The second fine-tunes the model during the online deployment phase whenever a supervision signal arrives. We demonstrate the effectiveness of both approaches empirically on robot locomotion control and traffic light control tasks.
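The idea of choosing among candidate models with an upper confidence bound can be illustrated with a generic UCB1 bandit loop. This is a minimal sketch and not the paper's algorithm, whose details the abstract does not give. It assumes each candidate model is a bandit arm and that human feedback arrives as a scalar in [0, 1] after each deployment round. The names `ucb_select`, `run_deployment`, and `make_feedback`, the exploration constant `c`, and the simulated Bernoulli feedback are all hypothetical.

```python
import math
import random


def ucb_select(counts, means, t, c=2.0):
    """Return the index of the candidate with the highest UCB1 score.

    counts[i]: how many rounds candidate i has been deployed.
    means[i]:  running mean of feedback received by candidate i.
    t:         current round number (1-based).
    c:         exploration constant (assumed value, not from the paper).
    """
    # Deploy every candidate once before trusting the confidence bounds.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    scores = [m + math.sqrt(c * math.log(t) / n) for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_deployment(feedback_fns, rounds, seed=0):
    """Simulate online deployment over a fixed candidate set.

    feedback_fns[i](rng) stands in for deploying candidate i for one round
    and receiving a human feedback score in [0, 1] (hypothetical interface).
    """
    rng = random.Random(seed)
    counts = [0] * len(feedback_fns)
    means = [0.0] * len(feedback_fns)
    for t in range(1, rounds + 1):
        i = ucb_select(counts, means, t)
        reward = feedback_fns[i](rng)
        counts[i] += 1
        # Incremental update of the running mean for candidate i.
        means[i] += (reward - means[i]) / counts[i]
    return counts, means


def make_feedback(p):
    """Simulated supervisor: approves (1.0) with probability p, else 0.0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


# Three hypothetical candidates whose approval rates are 0.2, 0.5, and 0.8.
candidates = [make_feedback(0.2), make_feedback(0.5), make_feedback(0.8)]
counts, means = run_deployment(candidates, rounds=2000)
print("deployments per candidate:", counts)
```

In this sketch the candidate with the highest approval rate ends up deployed in most rounds, while weaker candidates are tried only often enough to keep their confidence bounds honest. A real deployment would replace the simulated feedback with the supervisor's actual signal.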

research
06/12/2023

Ensemble-based Offline-to-Online Reinforcement Learning: From Pessimistic Learning to Optimistic Exploration

Offline reinforcement learning (RL) is a learning paradigm where an agen...
research
03/20/2023

Data Might be Enough: Bridge Real-World Traffic Signal Control Using Offline Reinforcement Learning

Applying reinforcement learning (RL) to traffic signal control (TSC) has...
research
07/01/2021

Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble

Recent advance in deep offline reinforcement learning (RL) has made it p...
research
07/23/2021

Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings

Reinforcement learning (RL) can be used to learn treatment policies and ...
research
11/21/2022

Improving TD3-BC: Relaxed Policy Constraint for Offline Learning and Stable Online Fine-Tuning

The ability to discover optimal behaviour from fixed data sets has the p...
research
07/18/2023

REX: Rapid Exploration and eXploitation for AI Agents

In this paper, we propose an enhanced approach for Rapid Exploration and...
research
06/06/2023

Boosting Offline Reinforcement Learning with Action Preference Query

Training practical agents usually involve offline and online reinforceme...
