Language Instructed Reinforcement Learning for Human-AI Coordination

04/13/2023
by Hengyuan Hu, et al.

One of the fundamental quests of AI is to produce agents that coordinate well with humans. This problem is challenging, especially in domains that lack high-quality human behavioral data, because multi-agent reinforcement learning (RL) often converges to equilibria different from the ones that humans prefer. We propose a novel framework, instructRL, that enables humans to specify what kind of strategies they expect from their AI partners through natural language instructions. We use pretrained large language models to generate a prior policy conditioned on the human instruction and use the prior to regularize the RL objective. As a result, the RL agent converges to equilibria that are aligned with human preferences. We show that instructRL converges to human-like policies that satisfy the given instructions in a proof-of-concept environment as well as on the challenging Hanabi benchmark. Finally, we show that knowing the language instruction significantly boosts human-AI coordination performance in human evaluations in Hanabi.
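
To make the idea of regularizing RL with a language-model prior concrete, here is a minimal sketch. It is not taken from the paper; the abstract does not state the exact form of the regularizer. It assumes one plausible form, in which the agent acts greedily on its action values plus a weighted log-probability under the prior. The function name `regularized_greedy_action`, the weight `lam`, and all numbers are hypothetical; `prior_probs` stands in for probabilities that a pretrained language model would assign to each action given the instruction.

```python
import math


def regularized_greedy_action(q_values, prior_probs, lam=1.0, eps=1e-8):
    """Return argmax_a [ Q(a) + lam * log p_prior(a) ].

    q_values:    action values estimated by the RL agent (hypothetical).
    prior_probs: action probabilities from an instruction-conditioned
                 prior, e.g. produced by a language model (assumed here).
    lam:         regularization weight; lam = 0 recovers plain greedy RL.
    eps:         small constant that keeps log() finite for zero probabilities.
    """
    if len(q_values) != len(prior_probs):
        raise ValueError("q_values and prior_probs must have the same length")
    scores = [q + lam * math.log(p + eps) for q, p in zip(q_values, prior_probs)]
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up numbers: action 1 has a slightly higher value, but the
# instruction-conditioned prior strongly prefers action 0.
q_values = [1.0, 1.1, 0.2]
prior_probs = [0.70, 0.05, 0.25]

plain = regularized_greedy_action(q_values, prior_probs, lam=0.0)
guided = regularized_greedy_action(q_values, prior_probs, lam=0.5)
print(plain, guided)
```

With `lam = 0` the prior is ignored and the agent simply takes its highest-valued action. A positive `lam` breaks near-ties between actions in favor of the one the instruction-conditioned prior prefers, which is the intuition behind steering RL toward equilibria that match the instruction.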


research
02/18/2023

Natural Language-conditioned Reinforcement Learning with Inside-out Task Language Development and Translation

Natural Language-conditioned reinforcement learning (RL) enables the age...
research
06/01/2023

Thought Cloning: Learning to Think while Acting by Imitating Human Thinking

Language is often considered a key aspect of human thinking, providing u...
research
05/19/2020

Human Instruction-Following with Deep Reinforcement Learning via Transfer-Learning from Text

Recent work has described neural-network-based agents that are trained w...
research
02/10/2023

The Wisdom of Hindsight Makes Language Models Better Instruction Followers

Reinforcement learning has seen wide success in finetuning large languag...
research
01/18/2021

Interpretable Policy Specification and Synthesis through Natural Language and RL

Policy specification is a process by which a human can initialize a robo...
research
12/03/2018

Generating Diverse Programs with Instruction Conditioned Reinforced Adversarial Learning

Advances in Deep Reinforcement Learning have led to agents that perform ...
