Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation

12/20/2022
by Hiroaki Shinkawa, et al.

Photonic reinforcement learning, which exploits the physical nature of light to accelerate computation, has been studied extensively in recent years. Previous studies used quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment in which the agent's actions do not influence the reward probabilities. This study extends the conventional approach to more general multi-agent reinforcement learning, targeting the grid world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment in which the rewards change as a result of the agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem, and the amount by which a Q-value is updated is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through quantum interference of light, and quantum principles ensure that the agents' state-action pair selections are conflict-free. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents.
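
The idea of treating state-action pairs as slot machines can be illustrated with a minimal classical sketch. It is not the authors' algorithm or their photonic system: the grid size, the epsilon-greedy bandit rule, the optimistic initial estimates, and all function and parameter names (`step`, `pick_arms`, `train`, `greedy_path`) are assumptions made for illustration. In particular, conflict avoidance is imitated here by simply choosing distinct arms, whereas the abstract attributes it to quantum interference of light.

```python
import random

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action, n, goal):
    """Deterministic grid move; bumping into a wall leaves the agent in place."""
    r = min(max(state[0] + action[0], 0), n - 1)
    c = min(max(state[1] + action[1], 0), n - 1)
    nxt = (r, c)
    return nxt, (1.0 if nxt == goal else 0.0)


def pick_arms(estimate, n_agents, eps, rng):
    """Choose n_agents distinct arms (state-action pairs) by epsilon-greedy.

    Distinctness is a classical stand-in (assumption) for conflict-free selection.
    """
    ranked = sorted(estimate, key=estimate.get, reverse=True)
    chosen = []
    for _ in range(n_agents):
        pool = [arm for arm in ranked if arm not in chosen]
        arm = rng.choice(pool) if rng.random() < eps else pool[0]
        chosen.append(arm)
    return chosen


def train(n=4, goal=(3, 3), n_agents=1, steps=20000,
          alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning where a bandit decides which pair to update next."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(n) for c in range(n) if (r, c) != goal]
    arms = [(s, a) for s in states for a in range(len(ACTIONS))]
    q = {arm: 0.0 for arm in arms}
    estimate = {arm: 1.0 for arm in arms}  # optimistic start so every arm gets tried
    for _ in range(steps):
        # Pairs are picked anywhere in the grid, not along a continuous trajectory.
        for s, a in pick_arms(estimate, n_agents, eps, rng):
            nxt, reward = step(s, ACTIONS[a], n, goal)
            future = 0.0 if nxt == goal else max(
                q[(nxt, b)] for b in range(len(ACTIONS)))
            delta = alpha * (reward + gamma * future - q[(s, a)])
            q[(s, a)] += delta
            estimate[(s, a)] = abs(delta)  # bandit reward = size of the Q-value update
    return q


def greedy_path(q, start, n, goal, max_len=50):
    """Follow the highest-valued action from start until the goal or max_len."""
    path = [start]
    while path[-1] != goal and len(path) < max_len:
        s = path[-1]
        a = max(range(len(ACTIONS)), key=lambda b: q[(s, b)])
        path.append(step(s, ACTIONS[a], n, goal)[0])
    return path


if __name__ == "__main__":
    q_table = train()
    print(greedy_path(q_table, (0, 0), 4, (3, 3)))
```

Calling `train(n_agents=3)` lets three hypothetical agents update three different state-action pairs per step. Whether that speeds up learning in the way the abstract reports depends on details of the paper that this sketch does not reproduce.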

research
05/21/2019

Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem

We define and analyze a multi-agent multi-armed bandit problem in which ...
research
11/01/2022

Reinforcement Learning in Education: A Multi-Armed Bandit Approach

Advances in reinforcement learning research have demonstrated the ways i...
research
02/27/2020

A Visual Communication Map for Multi-Agent Deep Reinforcement Learning

Multi-agent learning distinctly poses significant challenges in the effo...
research
02/15/2023

On-Demand Communication for Asynchronous Multi-Agent Bandits

This paper studies a cooperative multi-agent multi-armed stochastic band...
research
01/01/2022

Modelling Cournot Games as Multi-agent Multi-armed Bandits

We investigate the use of a multi-agent multi-armed bandit (MA-MAB) sett...
research
05/19/2022

Parallel bandit architecture based on laser chaos for reinforcement learning

Accelerating artificial intelligence by photonics is an active field of ...
research
07/28/2023

Conflict-free joint decision by lag and zero-lag synchronization in laser network

With the end of Moore's Law and the increasing demand for computing, pho...
