Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning

02/07/2023
by Lukas Schäfer, et al.

Cooperative multi-agent reinforcement learning (MARL) requires agents to explore in order to learn to cooperate. Existing value-based MARL algorithms commonly rely on random exploration, such as ϵ-greedy, which is inefficient at discovering multi-agent cooperation. Additionally, the environment in MARL appears non-stationary to any individual agent because the other agents are trained simultaneously, leading to high-variance and thus unstable optimisation signals. In this work, we propose ensemble value functions for multi-agent exploration (EMAX), a general framework to extend any value-based MARL algorithm. EMAX trains ensembles of value functions for each agent to address the key challenges of exploration and non-stationarity: (1) The uncertainty of value estimates across the ensemble is used in a UCB policy to guide the exploration of agents to parts of the environment which require cooperation. (2) Average value estimates across the ensemble serve as target values. These targets exhibit lower variance than commonly applied target networks, and we show that they lead to more stable gradients during optimisation. We instantiate three value-based MARL algorithms with EMAX (independent DQN, VDN and QMIX) and evaluate them in 21 tasks across four environments. Using ensembles of five value functions, EMAX improves sample efficiency and final evaluation returns of these algorithms by 53%, averaged across all 21 tasks.
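The two uses of the ensemble described in the abstract can be illustrated with a minimal NumPy sketch for a single agent. This is not the authors' implementation: the function names `ucb_action` and `ensemble_target`, the exploration weight `beta`, the use of the standard deviation as the uncertainty measure, and the exact form of the bootstrapped target are all assumptions made for illustration.

```python
import numpy as np


def ucb_action(q_ensemble, beta=1.0):
    """Pick an action with a UCB rule over an ensemble of value estimates.

    q_ensemble: array of shape (K, num_actions), one row per ensemble member.
    beta: hypothetical exploration weight (an assumption, not from the source).
    """
    q = np.asarray(q_ensemble, dtype=float)
    mean = q.mean(axis=0)  # average value estimate per action
    std = q.std(axis=0)    # disagreement across the ensemble per action
    return int(np.argmax(mean + beta * std))


def ensemble_target(reward, next_q_ensemble, gamma=0.99, done=False):
    """One-step target that bootstraps from the ensemble-average value.

    Assumed form: r + gamma * max_a mean_k Q_k(s', a).
    """
    next_q = np.asarray(next_q_ensemble, dtype=float)
    mean_next = next_q.mean(axis=0)  # average over the K ensemble members
    bootstrap = 0.0 if done else gamma * float(mean_next.max())
    return float(reward) + bootstrap


# Five ensemble members, two actions. The members agree on action 0
# and disagree strongly on action 1.
q = np.array([
    [1.0, 0.0],
    [1.0, 2.0],
    [1.0, 0.0],
    [1.0, 2.0],
    [1.0, 0.0],
])

greedy = ucb_action(q, beta=0.0)       # ignores uncertainty
exploratory = ucb_action(q, beta=1.0)  # rewards disagreement
target = ensemble_target(1.0, q, gamma=0.5)
```

With `beta=0.0` the rule reduces to acting greedily on the mean estimate, while a positive `beta` favours the action on which the ensemble members disagree, which is the intuition behind using ensemble uncertainty to direct exploration.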


research
07/23/2021

Cooperative Exploration for Multi-Agent Deep Reinforcement Learning

Exploration is critical for good results in deep reinforcement learning ...
research
12/07/2020

Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

Cooperative multi-agent tasks require agents to deduce their own contrib...
research
01/05/2023

Self-Motivated Multi-Agent Exploration

In cooperative multi-agent reinforcement learning (CMARL), it is critica...
research
12/29/2019

Individual specialization in multi-task environments with multiagent reinforcement learners

There is a growing interest in Multi-Agent Reinforcement Learning (MARL)...
research
07/16/2020

Mixture of Step Returns in Bootstrapped DQN

The concept of utilizing multi-step returns for updating value functions...
research
09/22/2022

MUI-TARE: Multi-Agent Cooperative Exploration with Unknown Initial Position

Multi-agent exploration of a bounded 3D environment with unknown initial...
research
07/16/2021

Robust Risk-Sensitive Reinforcement Learning Agents for Trading Markets

Trading markets represent a real-world financial application to deploy r...
