The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

by Gaurav R. Ghosal, et al.
University of Wisconsin-Madison
berkeley college

When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Many existing works have opted to fix this coefficient regardless of the type, or quality, of human feedback. However, in some settings, giving a demonstration may be much more difficult than answering a comparison query. In this case, we should expect to see more noise or suboptimality in demonstrations than in comparisons, and should interpret the feedback accordingly. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in experiments with both simulated feedback and a user study. We find that when learning from a single feedback type, overestimating human rationality can have dire effects on reward accuracy and regret. Further, we find that the rationality level affects the informativeness of each feedback type. Surprisingly, demonstrations are not always the most informative; when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Moreover, when the robot gets to decide which feedback type to ask for, it gains a large advantage from accurately modeling the rationality level of each type. Ultimately, our results emphasize the importance of paying attention to the assumed rationality level, not only when learning from a single feedback type, but especially when agents actively learn from multiple feedback types.
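To make the role of the rationality coefficient concrete, here is a minimal sketch assuming the common Boltzmann form of the noisy-rational model, in which the probability of a choice is proportional to exp(beta * reward). The abstract does not state the exact formula, so this form, the function names, and the toy reward hypotheses are all illustrative assumptions and not the authors' implementation.

```python
import math

def boltzmann_choice_probs(rewards, beta):
    """Assumed noisy-rational model: P(choice i) is proportional to exp(beta * reward_i)."""
    m = max(rewards)  # subtract the max for numerical stability
    weights = [math.exp(beta * (r - m)) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

def posterior_over_hypotheses(hypotheses, prior, observed_choice, beta):
    """Bayesian update over candidate reward hypotheses after one observed choice."""
    unnormalized = []
    for rewards, p in zip(hypotheses, prior):
        likelihood = boltzmann_choice_probs(rewards, beta)[observed_choice]
        unnormalized.append(p * likelihood)
    z = sum(unnormalized)
    return [u / z for u in unnormalized]

# Two hypothetical reward hypotheses over the same two options (A, B).
hypotheses = [[1.0, 0.0], [0.0, 1.0]]
prior = [0.5, 0.5]

# The human picks option A (index 0); the learner's belief depends on the assumed beta.
for beta in (0.1, 1.0, 10.0):
    post = posterior_over_hypotheses(hypotheses, prior, 0, beta)
    print(f"beta={beta}: P(hypothesis 0 | choice A) = {post[0]:.3f}")
```

In this toy setting, a larger assumed beta makes the learner treat a single choice as stronger evidence. If the human is in fact noisier than assumed, that confidence is unwarranted, which is one way to read the abstract's warning about overestimating rationality.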




