Policy Evaluation in Decentralized POMDPs with Belief Sharing

02/08/2023
by Mert Kayaalp, et al.

Most work on multi-agent reinforcement learning focuses on scenarios where the state of the environment is fully observable. In this work, we consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly. Instead, agents have access only to noisy observations and to belief vectors. It is well known that finding global posterior distributions in multi-agent settings is generally NP-hard. As a remedy, we propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network. In addition to exchanging beliefs, agents also use the communication network to exchange value function parameter estimates. We show analytically that the proposed strategy allows information to diffuse over the network, which in turn keeps the difference between the agents' parameters and those of a centralized baseline bounded. A multi-sensor target tracking application is considered in the simulations.
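To make the idea of "individual updates" followed by "localized interactions" concrete, the sketch below shows one possible belief-sharing step over a finite state space. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the combination rule, so the weighted geometric averaging used here, along with the names `local_update`, `combine`, `network_step`, and the weight matrix `A`, is hypothetical.

```python
import math

def local_update(belief, transition, likelihood):
    """Individual update: propagate the belief through the transition
    model, then apply Bayes' rule with the agent's own observation."""
    n = len(belief)
    # Prediction: predicted[j] = sum_i belief[i] * transition[i][j]
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correction: reweight by the likelihood of the local observation
    unnorm = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def combine(beliefs, weights):
    """Localized interaction: weighted geometric average of neighbors'
    beliefs. This rule is an assumption made for illustration.
    Beliefs with nonzero weight must be strictly positive."""
    n = len(beliefs[0])
    log_mix = [sum(w * math.log(b[j])
                   for w, b in zip(weights, beliefs) if w > 0)
               for j in range(n)]
    peak = max(log_mix)  # subtract the max for numerical stability
    unnorm = [math.exp(v - peak) for v in log_mix]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def network_step(beliefs, transition, likelihoods, A):
    """One round for all agents. A[l][k] is the weight agent k gives to
    agent l (zero for non-neighbors); each column of A sums to 1."""
    intermediate = [local_update(b, transition, lk)
                    for b, lk in zip(beliefs, likelihoods)]
    K = len(beliefs)
    return [combine(intermediate, [A[l][k] for l in range(K)])
            for k in range(K)]

# Two agents, two states, uniform priors, static state (identity transition).
beliefs = [[0.5, 0.5], [0.5, 0.5]]
transition = [[1.0, 0.0], [0.0, 1.0]]
likelihoods = [[0.9, 0.1], [0.6, 0.4]]  # each agent's observation likelihood
A = [[0.5, 0.5], [0.5, 0.5]]            # fully connected, equal weights
updated = network_step(beliefs, transition, likelihoods, A)
```

With equal weights on a fully connected two-agent network, both agents end the round with the same belief, which leans toward the state that both observations favor. Sparser choices of `A` would restrict each agent to its immediate neighbors, so that information spreads over several rounds.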

research
10/17/2018

Multi-Agent Fully Decentralized Value Function Learning with Linear Convergence Rates

This work develops a fully decentralized multi-agent algorithm for polic...
research
04/27/2023

Decentralized Inference via Capability Type Structures in Cooperative Multi-Agent Systems

This work studies the problem of ad hoc teamwork in teams composed of ag...
research
05/13/2019

Multi-Agent Image Classification via Reinforcement Learning

We investigate a classification problem using multiple mobile agents tha...
research
01/21/2023

Decentralized Multi-agent Filtering

This paper addresses the considerations that come along with adopting d...
research
07/21/2021

Multi-Agent Belief Sharing through Autonomous Hierarchical Multi-Level Clustering

Coordination in multi-agent systems is challenging for agile robots such...
research
06/26/2019

Reasoning about Hypothetical Agent Behaviours and their Parameters

Agents can achieve effective interaction with previously unknown other a...
research
03/03/2022

SMA-NBO: A Sequential Multi-Agent Planning with Nominal Belief-State Optimization in Target Tracking

In target tracking with mobile multi-sensor systems, sensor deployment i...
