Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

10/02/2021
by Erdem Bıyık, et al.

When humans collaborate, they often make decisions by observing others and considering the consequences their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to collaborate effectively in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-Armed Bandit (MAB) problem with coupled rewards as an abstraction of more general multi-agent collaboration. We demonstrate that naïve extensions of single-agent optimal MAB algorithms fail when applied to decentralized bandit teams. Instead, we propose a Partner-Aware strategy for joint sequential decision-making that extends the well-known single-agent Upper Confidence Bound algorithm. We show analytically that our proposed strategy achieves logarithmic regret, and we provide extensive experiments involving human-AI and human-robot collaboration to validate our theoretical findings. Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest that humans prefer to collaborate with AI agents implementing our partner-aware strategy.
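For readers unfamiliar with the single-agent baseline that the abstract builds on, the following is a minimal sketch of the standard UCB1 rule on Bernoulli arms. It is not the paper's Partner-Aware strategy, which this page does not specify. The function name `ucb1`, the arm means, the horizon, and the seed are all illustrative assumptions.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard single-agent UCB1 on Bernoulli arms (illustrative sketch).

    arm_means: hypothetical true success probabilities, one per arm.
    Returns (pull counts per arm, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms      # number of times each arm was pulled
    sums = [0.0] * n_arms      # total observed reward per arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Pick the arm with the largest upper confidence bound:
            # empirical mean plus an exploration bonus that shrinks
            # as the arm is pulled more often.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Expected regret of this pull relative to the best arm.
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.1, 0.9], horizon=5000)
    print("pulls per arm:", pulls)
    print("cumulative expected regret:", round(total_regret, 2))
```

In this single-agent setting the exploration bonus makes suboptimal pulls grow only logarithmically with the horizon. The abstract's claim is that running such a rule independently per agent breaks down once rewards are coupled across teammates, which is what motivates modeling the partner.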

research
10/20/2020

Bayesian Algorithms for Decentralized Stochastic Bandits

We study a decentralized cooperative multi-agent multi-armed bandit prob...
research
12/21/2018

Human-AI Learning Performance in Multi-Armed Bandits

People frequently face challenging decision-making problems in which out...
research
04/07/2021

On the Critical Role of Conventions in Adaptive Human-AI Collaboration

Humans can quickly adapt to new partners in collaborative tasks (e.g. pl...
research
10/07/2019

An Option and Agent Selection Policy with Logarithmic Regret for Multi Agent Multi Armed Bandit Problems on Random Graphs

Existing studies of the Multi Agent Multi Armed Bandit (MAMAB) problem, ...
research
05/27/2022

Private and Byzantine-Proof Cooperative Decision-Making

The cooperative bandit problem is a multi-agent decision problem involvi...
research
02/06/2023

Learning Complementary Policies for Human-AI Teams

Human-AI complementarity is important when neither the algorithm nor the...
research
06/30/2019

Reinforcement Learning with Fairness Constraints for Resource Distribution in Human-Robot Teams

Much work in robotics and operations research has focused on optimal res...
