DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

07/22/2023
by Ellen Novoseller, et al.

In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown, and humans cannot reliably craft one that correctly captures the desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways: training an autoencoder, seeding reinforcement learning (RL) training batches with demonstration data, and inferring preferences over behaviors to learn a reward function that guides RL. We evaluate DIP-RL on a tree-chopping task in Minecraft. Results suggest that the method can guide an RL agent to learn a reward function that reflects human preferences and that DIP-RL performs competitively relative to baselines. DIP-RL is inspired by our previous work on combining demonstrations and pairwise preferences in Minecraft, which was awarded a research prize at the 2022 NeurIPS MineRL BASALT competition, Learning from Human Feedback in Minecraft. Example trajectory rollouts of DIP-RL and baselines are located at https://sites.google.com/view/dip-rl.
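
To make the third use of demonstrations more concrete, the following is a minimal sketch of learning a reward function from inferred pairwise preferences. It is not the authors' implementation. It assumes that each demonstration segment is preferred over a paired agent rollout segment, that preferences follow a Bradley-Terry model, and that the reward is linear in a small feature vector. The abstract specifies none of these details, and the two toy features, the helper names, and the hyperparameters are all hypothetical.

```python
import math
import random


def segment_features(segment):
    """Sum the per-step feature vectors of a segment (a list of lists)."""
    dim = len(segment[0])
    return [sum(step[i] for step in segment) for i in range(dim)]


def segment_return(w, segment):
    """Predicted return of a segment under a linear reward r(s) = w . s."""
    return sum(wi * fi for wi, fi in zip(w, segment_features(segment)))


def preference_prob(w, preferred, other):
    """Bradley-Terry probability that `preferred` is chosen over `other`."""
    diff = segment_return(w, preferred) - segment_return(w, other)
    # Numerically stable sigmoid.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def train_reward(pairs, dim, lr=0.1, epochs=200):
    """Fit w by gradient descent on the negative log-likelihood.

    Each pair is (demo_segment, agent_segment); the demonstration is
    ASSUMED to be the preferred element of the pair.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for demo, agent in pairs:
            p = preference_prob(w, demo, agent)
            fd, fa = segment_features(demo), segment_features(agent)
            # d/dw of -log p is -(1 - p) * (fd - fa); step against it.
            for i in range(dim):
                w[i] += lr * (1.0 - p) * (fd[i] - fa[i])
    return w


# Toy data with two hypothetical features per step:
# feature 0 is high in demonstrations, feature 1 is high in agent rollouts.
random.seed(0)


def make_segment(mean, length=5):
    return [[random.gauss(m, 0.1) for m in mean] for _ in range(length)]


pairs = [(make_segment([1.0, 0.0]), make_segment([0.0, 1.0]))
         for _ in range(20)]
w = train_reward(pairs, dim=2)
print("learned reward weights:", w)
```

On this toy data, the learned weights should reward the feature that is prominent in demonstrations and penalize the one prominent in agent rollouts. In an RL loop, such a learned reward would stand in for the unknown environment reward.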

research
07/25/2020

Human Preference Scaling with Demonstrations For Deep Reinforcement Learning

The current reward learning from human preferences could be used for res...
research
02/14/2020

RL agents Implicitly Learning Human Preferences

In the real world, RL agents should be rewarded for fulfilling human pre...
research
08/30/2023

Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification

A well-defined reward function is crucial for successful training of an ...
research
04/12/2017

Deep Q-learning from Demonstrations

Deep reinforcement learning (RL) has achieved several high profile succe...
research
10/04/2019

If MaxEnt RL is the Answer, What is the Question?

Experimentally, it has been observed that humans and animals often make ...
research
08/09/2022

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience

This paper addresses the problem of inverse reinforcement learning (IRL)...
research
01/02/2020

Joint Goal and Strategy Inference across Heterogeneous Demonstrators via Reward Network Distillation

Reinforcement learning (RL) has achieved tremendous success as a general...
