Human-AI Coordination via Human-Regularized Search and Learning

10/11/2022
by Hengyuan Hu, et al.

We consider the problem of building AI agents that collaborate well with humans in partially observable, fully cooperative environments, given datasets of human behavior. Inspired by piKL, a human-data-regularized search method that improves upon a behavioral cloning policy without diverging far from it, we develop a three-step algorithm that achieves strong performance in coordinating with real humans on the Hanabi benchmark. First, we use a regularized search algorithm and behavioral cloning to produce a better human model that captures diverse skill levels. Second, we integrate the policy regularization idea into reinforcement learning to train a human-like best response to the human model. Finally, we apply regularized search on top of the best response policy at test time to handle out-of-distribution challenges when playing with humans. We evaluate our method in two large-scale experiments with humans. In the first, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams. In the second, we show that our method beats a vanilla best-response-to-behavioral-cloning baseline by having experts play repeatedly with the two agents.
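To make the regularization idea concrete, here is a minimal sketch of a single-decision, KL-regularized policy update in the spirit of piKL. It assumes the commonly written form in which the new policy is proportional to anchor(a) * exp(Q(a) / lam), where the anchor is the behavioral cloning policy. The function name, the parameter names, and the toy numbers are hypothetical, and this is not the authors' implementation; it only illustrates how the regularization strength trades off human-likeness against estimated value.

```python
import math


def kl_regularized_policy(anchor_probs, q_values, lam):
    """Return pi(a) proportional to anchor(a) * exp(Q(a) / lam).

    anchor_probs: action probabilities from a behavioral cloning policy
                  (hypothetical input; must sum to 1).
    q_values:     estimated value of each action (hypothetical input).
    lam:          regularization strength; a large lam stays close to the
                  anchor, a small lam approaches the greedy action.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    if len(anchor_probs) != len(q_values):
        raise ValueError("anchor_probs and q_values must have equal length")

    # Work in log space; actions with zero anchor probability stay at zero.
    logits = [
        math.log(p) + q / lam if p > 0 else float("-inf")
        for p, q in zip(anchor_probs, q_values)
    ]
    max_logit = max(logits)
    if max_logit == float("-inf"):
        raise ValueError("anchor policy assigns zero probability to all actions")

    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp(l - max_logit) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example: the anchor prefers action 0, the value estimate prefers action 1.
bc_policy = [0.7, 0.2, 0.1]
q_estimates = [1.0, 2.0, 0.0]
close_to_human = kl_regularized_policy(bc_policy, q_estimates, lam=1e6)
close_to_greedy = kl_regularized_policy(bc_policy, q_estimates, lam=0.01)
```

With a very large `lam` the result is nearly identical to the behavioral cloning policy, while a very small `lam` puts almost all probability on the highest-value action that the anchor still supports.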


