Collaborating with Humans without Human Data

by DJ Strouse et al.

Collaborating with humans requires rapidly adapting to their individual strengths, weaknesses, and preferences. Unfortunately, most standard multi-agent reinforcement learning techniques, such as self-play (SP) or population play (PP), produce agents that overfit to their training partners and do not generalize well to humans. Alternatively, researchers can collect human data, train a human model using behavioral cloning, and then use that model to train "human-aware" agents ("behavioral cloning play", or BCP). While such an approach can improve the generalization of agents to new human co-players, it involves the onerous and expensive step of collecting large amounts of human data first. Here, we study the problem of how to train agents that collaborate well with human partners without using human data. We argue that the crux of the problem is to produce a diverse set of training partners. Drawing inspiration from successful multi-agent approaches in competitive domains, we find that a surprisingly simple approach is highly effective. We train our agent partner as the best response to a population of self-play agents and their past checkpoints taken throughout training, a method we call Fictitious Co-Play (FCP). Our experiments focus on a two-player collaborative cooking simulator that has recently been proposed as a challenge problem for coordination with humans. We find that FCP agents score significantly higher than SP, PP, and BCP agents when paired with novel agent and human partners. Furthermore, humans also report a strong subjective preference for partnering with FCP agents over all baselines.
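The recipe in the abstract — run several independent self-play training runs, keep every checkpoint (not just the final policy) to get partners at a range of skill levels, then train a single best-response agent against partners sampled from that frozen pool — can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the "policies" below are stand-in records, and the function names (`train_self_play_agent`, `build_fcp_pool`, `fcp_training_episodes`) are invented for this sketch.

```python
import random

def train_self_play_agent(seed, n_checkpoints=3):
    """Toy stand-in for one self-play training run: returns a list of
    'checkpoints' saved along the way. In the real method each checkpoint
    is a frozen copy of the policy at that point in training; here it is
    just a record with a pretend skill level."""
    rng = random.Random(seed)
    checkpoints, skill = [], 0.0
    for step in range(n_checkpoints):
        skill += rng.random()  # pretend training improves the policy
        checkpoints.append({"seed": seed, "step": step, "skill": skill})
    return checkpoints

def build_fcp_pool(n_agents=4, n_checkpoints=3):
    """FCP partner pool: N independent self-play runs, keeping ALL
    checkpoints so the pool spans low- to high-skill partners."""
    pool = []
    for seed in range(n_agents):
        pool.extend(train_self_play_agent(seed, n_checkpoints))
    return pool

def fcp_training_episodes(pool, n_episodes, seed=0):
    """Best-response training loop: each episode pairs the learner with a
    partner drawn uniformly from the frozen pool. In a real setup you
    would roll out the episode and update only the learner's policy."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        yield rng.choice(pool)

pool = build_fcp_pool()                              # 4 runs x 3 checkpoints
partners = list(fcp_training_episodes(pool, n_episodes=5))
```

Keeping past checkpoints is the key design choice: it forces the best-response agent to coordinate with partners of varying competence rather than overfitting to a single fully-trained partner, which is what the paper argues makes it robust to novel humans.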

