Options as responses: Grounding behavioural hierarchies in multi-agent RL

We propose a novel hierarchical agent architecture for multi-agent reinforcement learning with concealed information. The hierarchy is grounded in the concealed information about other players, which resolves the "chicken or the egg" nature of option discovery. We factorise the value function over a latent representation of the concealed information and then re-use this latent space to factorise the policy into options. Low-level policies (options) are trained to respond to particular states of other agents, grouped by the latent representation, while the top level (meta-policy) learns to infer the latent representation from its own observation and thereby select the right option. This grounding facilitates credit assignment across the levels of the hierarchy. We show that it helps generalisation (performance against a held-out set of pre-trained competitors when training in self- or population-play) and the resolution of social dilemmas in self-play.
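The two-level structure described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a discrete latent with a fixed number of values, linear function approximators, and a value estimate formed as a posterior-weighted sum of per-latent values. All names (`HierarchicalAgent`, `latent_posterior`, `option_policy`) are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, obs):
    """One output per weight row: the dot product of that row with obs."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weights]


class HierarchicalAgent:
    """Hypothetical agent: a meta-policy over latents plus one option per latent."""

    def __init__(self, obs_dim, num_latents, num_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Meta-policy: infers the latent (state of other agents) from own observation.
        self.meta_w = init(num_latents, obs_dim)
        # One low-level policy (option) per latent value.
        self.option_w = [init(num_actions, obs_dim) for _ in range(num_latents)]
        # One value head per latent value (assumed form of the factorisation).
        self.value_w = init(num_latents, obs_dim)

    def latent_posterior(self, obs):
        """Distribution over latent values given the agent's own observation."""
        return softmax(linear(self.meta_w, obs))

    def option_policy(self, k, obs):
        """Action distribution of option k."""
        return softmax(linear(self.option_w[k], obs))

    def value(self, obs):
        """Assumed factorised value: sum_k q(z=k | obs) * V_k(obs)."""
        q = self.latent_posterior(obs)
        v = linear(self.value_w, obs)
        return sum(qk * vk for qk, vk in zip(q, v))

    def act(self, obs, rng):
        """Sample an option from the meta-policy, then an action from that option."""
        q = self.latent_posterior(obs)
        k = rng.choices(range(len(q)), weights=q)[0]
        probs = self.option_policy(k, obs)
        a = rng.choices(range(len(probs)), weights=probs)[0]
        return k, a


agent = HierarchicalAgent(obs_dim=4, num_latents=3, num_actions=5)
obs = [0.5, -1.0, 0.25, 2.0]
option, action = agent.act(obs, random.Random(1))
```

Because the meta-policy and the value heads share the same latent index, a value error can be attributed to a specific latent and hence to a specific option. This is one plausible reading of how grounding the hierarchy eases credit assignment across levels, not a confirmed detail of the method.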


