Arm order recognition in multi-armed bandit problem with laser chaos time series

05/26/2020
by Naoki Narisawa, et al.

By exploiting ultrafast and irregular time series generated by lasers with delayed feedback, we have previously demonstrated a scalable algorithm that solves multi-armed bandit (MAB) problems using time-division multiplexing of laser chaos time series. Although that algorithm detects the arm with the highest reward expectation, it cannot correctly recognize the order of the arms in terms of reward expectations. Here, we present an algorithm in which the degree of exploration is adaptively controlled based on confidence intervals that represent the estimation accuracy of the reward expectations. We demonstrate numerically that our approach significantly improves arm order recognition accuracy and reduces dependence on the reward environment, while the total reward is almost maintained compared with conventional MAB methods. This study is applicable to sectors where order information is critical, such as the efficient allocation of resources in information and communications technology.
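
The idea of steering exploration with confidence intervals can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it assumes Bernoulli arms, replaces the laser chaos time series with a software pseudo-random generator, and uses a Hoeffding-style interval half-width as a stand-in for the paper's confidence intervals. The names `confidence_radius` and `run_bandit` are hypothetical. An arm keeps being explored only while its interval overlaps that of a neighbor in the current ranking; once all neighbors are separated, the best arm is exploited.

```python
import math
import random


def confidence_radius(pulls, total, scale=1.0):
    """Hoeffding-style half-width (an assumption, not the paper's formula)."""
    if pulls == 0:
        return math.inf  # an arm that was never pulled is fully uncertain
    return scale * math.sqrt(math.log(max(total, 2)) / (2 * pulls))


def run_bandit(probs, horizon, seed=0):
    """Simulate Bernoulli arms; return (estimated order, total reward, pulls)."""
    rng = random.Random(seed)  # pseudo-random stand-in for laser chaos
    n = len(probs)
    pulls = [0] * n
    wins = [0] * n
    total_reward = 0

    def estimates():
        return [wins[i] / pulls[i] if pulls[i] else 0.0 for i in range(n)]

    for t in range(1, horizon + 1):
        means = estimates()
        radii = [confidence_radius(pulls[i], t) for i in range(n)]
        order = sorted(range(n), key=lambda i: means[i], reverse=True)

        # Collect arms whose interval overlaps the next arm in the ranking.
        ambiguous = set()
        for a, b in zip(order, order[1:]):
            if means[a] - radii[a] <= means[b] + radii[b]:
                ambiguous.update((a, b))

        if ambiguous:
            # Explore: pull the ambiguous arm with the widest interval.
            arm = max(ambiguous, key=lambda i: radii[i])
        else:
            # Exploit: the ranking is resolved, so pull the best arm.
            arm = order[0]

        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total_reward += reward

    means = estimates()
    final_order = sorted(range(n), key=lambda i: means[i], reverse=True)
    return final_order, total_reward, pulls


if __name__ == "__main__":
    order, reward, pulls = run_bandit([0.9, 0.5, 0.1], horizon=2000)
    print("estimated order:", order)
    print("total reward:", reward, "pulls per arm:", pulls)
```

In this sketch the exploration budget shrinks on its own as the intervals tighten, which reflects the trade-off described above: order recognition is resolved first, and most of the remaining plays go to the best arm.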

research
03/26/2018

Scalable photonic reinforcement learning by time-division multiplexing of laser chaos

Reinforcement learning involves decision making in dynamic and uncertain...
research
07/14/2020

Quantum exploration algorithms for multi-armed bandits

Identifying the best arm of a multi-armed bandit is a central problem in...
research
06/12/2023

Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where...
research
11/07/2016

Reinforcement-based Simultaneous Algorithm and its Hyperparameters Selection

Many algorithms for data analysis exist, especially for classification p...
research
05/12/2022

Controlling chaotic itinerancy in laser dynamics for reinforcement learning

Photonic artificial intelligence has attracted considerable interest in ...
research
08/05/2017

Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems

The multi-armed bandit problem forms the foundation for solving a wide r...
research
04/12/2018

Entangled photons for competitive multi-armed bandit problem: achievement of maximum social reward, equality, and deception prevention

The competitive multi-armed bandit (CMAB) problem is related to social i...
