Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

10/17/2022
by Mudit Verma, et al.

Specifying rewards for reinforcement learning (RL) agents is challenging. Preference-based RL (PbRL) mitigates this challenge by inferring a reward from feedback over sets of trajectories. However, the effectiveness of PbRL is limited by the amount of feedback needed to reliably recover the structure of the target reward. We present the PRIor Over Rewards (PRIOR) framework, which incorporates priors about the structure of the reward function and the preference feedback into the reward learning process. Imposing these priors as soft constraints on the reward learning objective halves the amount of feedback required and improves overall reward recovery. Additionally, we demonstrate that computing the priors over an abstract state space further improves reward learning and the agent's performance.
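The abstract does not give the exact form of PRIOR's priors, so the following is only a minimal sketch of the general idea of "a prior imposed as a soft constraint on the reward learning objective". It assumes a tabular reward over discrete states, the Bradley-Terry preference likelihood commonly used in PbRL, and a squared-error penalty pulling the learned reward toward a given prior reward vector. The penalty form, the weight `lam`, and every function name here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    # tanh form avoids overflow warnings for large |x|
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def preference_loss(reward, seg_a, seg_b, prefs):
    """Mean Bradley-Terry negative log-likelihood of the preference labels.

    reward: (n_states,) tabular reward estimate (assumed tabular setting)
    seg_a, seg_b: (n_pairs, seg_len) integer arrays of visited state indices
    prefs: (n_pairs,) 1.0 if segment a was preferred, 0.0 if segment b was
    """
    logits = reward[seg_a].sum(axis=1) - reward[seg_b].sum(axis=1)
    # -log sigmoid(x) = logaddexp(0, -x); -log(1 - sigmoid(x)) = logaddexp(0, x)
    nll = prefs * np.logaddexp(0.0, -logits) + (1.0 - prefs) * np.logaddexp(0.0, logits)
    return nll.mean()


def prior_penalty(reward, prior_reward):
    """Hypothetical soft constraint: mean squared deviation from a prior reward."""
    return np.mean((reward - prior_reward) ** 2)


def objective(reward, seg_a, seg_b, prefs, prior_reward, lam):
    # Preference fit plus a weighted prior term (soft, not hard, constraint)
    return preference_loss(reward, seg_a, seg_b, prefs) + lam * prior_penalty(reward, prior_reward)


def objective_grad(reward, seg_a, seg_b, prefs, prior_reward, lam):
    logits = reward[seg_a].sum(axis=1) - reward[seg_b].sum(axis=1)
    # Derivative of the mean NLL with respect to each pair's logit
    coef = (sigmoid(logits) - prefs) / len(prefs)
    grad = np.zeros_like(reward)
    # Each visit to a state in segment a adds +coef, each visit in segment b adds -coef
    np.add.at(grad, seg_a, np.broadcast_to(coef[:, None], seg_a.shape))
    np.add.at(grad, seg_b, np.broadcast_to(-coef[:, None], seg_b.shape))
    # Gradient of lam * mean((reward - prior)^2)
    grad += lam * 2.0 * (reward - prior_reward) / reward.size
    return grad


def fit_reward(n_states, seg_a, seg_b, prefs, prior_reward, lam=0.1, lr=0.1, steps=500):
    """Plain gradient descent on the combined objective, starting from zero reward."""
    reward = np.zeros(n_states)
    for _ in range(steps):
        reward -= lr * objective_grad(reward, seg_a, seg_b, prefs, prior_reward, lam)
    return reward


# Usage: one labeled pair in which a segment visiting state 0 beats one visiting state 1
seg_a = np.array([[0, 0]])
seg_b = np.array([[1, 1]])
prefs = np.array([1.0])
prior = np.zeros(3)
learned = fit_reward(3, seg_a, seg_b, prefs, prior, lam=0.1)
```

In this sketch, `lam` controls how strongly the prior shapes the learned reward. A larger value keeps the estimate closer to `prior_reward` when feedback is scarce, while the preference term can still override the prior once enough labels accumulate. This trade-off is what makes the constraint "soft".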
