Query-Policy Misalignment in Preference-Based Reinforcement Learning

05/27/2023
by Xiao Hu, et al.

Preference-based reinforcement learning (PbRL) provides a natural way to align RL agents' behavior with human-desired outcomes, but is often constrained by costly human feedback. To improve feedback efficiency, most existing PbRL methods focus on selecting queries that maximally improve the overall quality of the reward model, but, counter-intuitively, we find that this does not necessarily lead to improved performance. To unravel this mystery, we identify a long-neglected issue in the query selection schemes of existing PbRL studies: query-policy misalignment. We show that the seemingly informative queries selected to improve the overall quality of the reward model may not actually align with the RL agent's interests, thus offering little help for policy learning and eventually resulting in poor feedback efficiency. We show that this issue can be effectively addressed via near on-policy queries and a specially designed hybrid experience replay, which together enforce bidirectional query-policy alignment. Simple yet elegant, our method can be easily incorporated into existing approaches by changing only a few lines of code. We showcase in comprehensive experiments that our method achieves substantial gains in both human feedback and RL sample efficiency, demonstrating the importance of addressing query-policy misalignment in PbRL tasks.
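The abstract names two mechanisms, near on-policy query selection and hybrid experience replay, but gives no implementation details. The sketch below is one plausible reading of those two ideas, not the authors' code: the class names NearOnPolicyQuerySelector and HybridReplayBuffer, and all parameters, are hypothetical.

```python
# Hedged sketch (not the paper's implementation) of the two ideas named in the
# abstract: (1) near on-policy query selection, (2) hybrid experience replay.
# All names and hyperparameters here are illustrative assumptions.
import random
from collections import deque


class NearOnPolicyQuerySelector:
    """Draw preference queries from recently collected segments only,
    so the queried behaviors stay close to the current policy."""

    def __init__(self, recent_capacity=2000):
        self.recent_segments = deque(maxlen=recent_capacity)

    def add_segment(self, segment):
        self.recent_segments.append(segment)

    def select_query_pairs(self, num_pairs):
        # Pair up segments from the recent (near on-policy) pool instead of
        # ranking the full history by reward-model uncertainty.
        # Assumes the pool already holds at least two segments.
        pairs = []
        for _ in range(num_pairs):
            seg_a, seg_b = random.sample(list(self.recent_segments), 2)
            pairs.append((seg_a, seg_b))
        return pairs


class HybridReplayBuffer:
    """Mix recent transitions with older off-policy ones when sampling,
    so policy updates emphasize the regions the reward model was just queried on."""

    def __init__(self, capacity=1_000_000, recent_capacity=10_000, recent_ratio=0.5):
        self.all_transitions = deque(maxlen=capacity)
        self.recent_transitions = deque(maxlen=recent_capacity)
        self.recent_ratio = recent_ratio

    def add(self, transition):
        self.all_transitions.append(transition)
        self.recent_transitions.append(transition)

    def sample(self, batch_size):
        # Assumes both buffers hold enough transitions for the requested batch.
        n_recent = int(batch_size * self.recent_ratio)
        batch = random.sample(list(self.recent_transitions), n_recent)
        batch += random.sample(list(self.all_transitions), batch_size - n_recent)
        return batch
```

The key assumption in this sketch is that queries come from a small buffer of recent segments and that policy updates oversample recent transitions; that is one way the "bidirectional" query-policy alignment described in the abstract could be realized.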
