Cooperative-Competitive Reinforcement Learning with History-Dependent Rewards

10/15/2020
by Keyang He, et al.

Consider a typical organization whose worker agents seek to cooperate collectively for its general betterment. At the same time, each individual agent seeks to secure a larger share than its co-workers of the annual increment in compensation, which usually comes from a fixed pot. As such, the individual agent in the organization must both cooperate and compete. Another feature of many organizations is that a worker receives a bonus, which is often a fraction of the previous year's total profit. As such, the agent derives a reward that also depends partly on historical performance. How should the individual agent decide to act in this context? A few methods for the mixed cooperative-competitive setting have been presented in recent years, but these are challenged by problem domains whose reward functions do not depend on the current state and action only. Recent deep multi-agent reinforcement learning (MARL) methods using long short-term memory (LSTM) may be used, but these adopt a joint perspective on the interaction or require explicit exchange of information among the agents to promote cooperation, which may not be possible under competition. In this paper, we first show that the agent's decision-making problem can be modeled as an interactive partially observable Markov decision process (I-POMDP) that captures the dynamics of a history-dependent reward. We present an interactive advantage actor-critic method (IA2C^+), which combines the independent advantage actor-critic network with a belief filter that maintains a belief distribution over other agents' models. Empirical results show that IA2C^+ learns the optimal policy faster and more robustly than several baselines, including one that uses an LSTM, even when attributed models are incorrect.
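To make the idea of a belief filter over other agents' models concrete, here is a minimal sketch of a single Bayesian update over a finite set of candidate models. It is not the IA2C^+ implementation. The function name `update_belief`, the discrete action and observation spaces, and the noisy-observation table `obs_likelihood` are all assumptions introduced for illustration; the abstract does not specify any of them.

```python
def update_belief(belief, model_action_probs, obs_likelihood, observation):
    """One Bayesian filtering step over candidate models of another agent.

    All names and shapes here are hypothetical, chosen for illustration:
      belief[m]                 prior probability of candidate model m
      model_action_probs[m][a]  P(other agent takes action a | model m)
      obs_likelihood[a][o]      P(we observe o | other agent took action a)
      observation               index of the observation actually received
    """
    # Likelihood of the observation under each model:
    # P(o | m) = sum_a P(a | m) * P(o | a)
    likelihoods = [
        sum(p_a * obs_likelihood[a][observation]
            for a, p_a in enumerate(action_probs))
        for action_probs in model_action_probs
    ]
    # Bayes rule: posterior is proportional to prior times likelihood.
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # The observation is impossible under every model; keep the prior.
        return list(belief)
    return [u / total for u in unnormalized]


# Usage: two candidate models, two actions, two noisy observations.
prior = [0.5, 0.5]
model_action_probs = [[0.9, 0.1],   # model 0 mostly picks action 0
                      [0.2, 0.8]]   # model 1 mostly picks action 1
obs_likelihood = [[0.8, 0.2],       # action 0 is usually seen as observation 0
                  [0.2, 0.8]]       # action 1 is usually seen as observation 1
posterior = update_belief(prior, model_action_probs, obs_likelihood, 0)
```

In this example, receiving observation 0 shifts probability mass toward model 0, since that model makes the observation more likely. In an actor-critic setting, such a belief vector could be given to the policy network alongside the agent's own observation, though how IA2C^+ does this is described in the full paper rather than in the abstract.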


