Actively Tracking the Optimal Arm in Non-Stationary Environments with Mandatory Probing

05/20/2022
by Gourab Ghatak et al.

We study a novel multi-armed bandit (MAB) setting in which the agent is required to probe all the arms periodically in a non-stationary environment. In particular, we develop an algorithm that balances the regret guarantees of classical Thompson sampling (TS) with broadcast probing (BP) of all the arms simultaneously in order to actively detect a change in the reward distributions. Once a system-level change is detected, the changed arm is identified by an optional subroutine called group exploration (GE), which scales as log₂(K) for a K-armed bandit. We characterize the probability of missed detection and the probability of false alarm in terms of the environment parameters. The change-detection latency is upper bounded by √T, and within a period of √T rounds all the arms are probed at least once. We highlight the conditions under which the regret guarantee of the proposed algorithm outperforms those of state-of-the-art algorithms. Furthermore, unlike existing bandit algorithms, the proposed algorithm can be deployed for applications such as timely status updates, critical control, and wireless energy transfer, which are essential features of next-generation wireless communication networks. We demonstrate its efficacy by employing it in an industrial internet-of-things (IIoT) network designed for simultaneous wireless information and power transfer (SWIPT).
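The Python sketch below illustrates the three ingredients named in the abstract. It is a minimal illustration under stated assumptions and is not the authors' algorithm. It assumes Bernoulli rewards with Beta(1, 1) priors for TS. It models a broadcast probe as observing every arm in a single round and fixes the probing period at ⌈√T⌉ rounds, which is a hypothetical choice. It also stands in for the GE subroutine with a binary-splitting search driven by an assumed noiseless group test `group_changed`. The change-detection test that would trigger GE is omitted, and all function names are hypothetical.

```python
import math
import random


def probing_period(T):
    """Hypothetical schedule: one broadcast probe every ceil(sqrt(T)) rounds,
    so any window of that many consecutive rounds contains a probe."""
    return math.ceil(math.sqrt(T))


def thompson_pick(alpha, beta, rng):
    """Classical Beta-Bernoulli Thompson sampling: draw one sample from each
    arm's posterior and play the arm with the largest sample."""
    samples = [rng.betavariate(a, b) for a, b in zip(alpha, beta)]
    return max(range(len(samples)), key=samples.__getitem__)


def run(T, means, seed=0):
    """Interleave TS rounds with broadcast-probe rounds (stationary rewards,
    no change detection). Returns posterior counts and number of probes."""
    rng = random.Random(seed)
    K = len(means)
    alpha, beta = [1] * K, [1] * K
    period = probing_period(T)
    probes = 0
    for t in range(T):
        if t % period == 0:
            # Assumed BP model: every arm is observed in this round.
            arms = range(K)
            probes += 1
        else:
            arms = [thompson_pick(alpha, beta, rng)]
        for k in arms:
            reward = 1 if rng.random() < means[k] else 0
            alpha[k] += reward
            beta[k] += 1 - reward
    return alpha, beta, probes


def group_explore(num_arms, group_changed):
    """Binary-splitting search for a single changed arm.

    group_changed(arms) is an assumed group test reporting whether any arm
    in `arms` has changed. Uses at most ceil(log2(num_arms)) tests.
    Returns (changed_arm, number_of_group_tests)."""
    candidates = list(range(num_arms))
    tests = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        tests += 1
        if group_changed(candidates[:mid]):
            candidates = candidates[:mid]   # change lies in the first half
        else:
            candidates = candidates[mid:]   # otherwise it is in the rest
    return candidates[0], tests
```

For example, `group_explore(16, lambda g: 11 in g)` returns `(11, 4)`: the changed arm is isolated with four group tests, matching log₂(16). Halving the candidate set at each test is what gives the logarithmic scaling in K, whereas probing the arms one at a time would need up to K − 1 tests.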

Related research

05/30/2021

Kolmogorov-Smirnov Test-Based Actively-Adaptive Thompson Sampling for Non-Stationary Bandits

We consider the non-stationary multi-armed bandit (MAB) framework and pr...

09/06/2020

A Change-Detection Based Thompson Sampling Framework for Non-Stationary Bandits

We consider a non-stationary two-armed bandit framework and propose a ch...

02/22/2018

Regional Multi-Armed Bandits

We consider a variant of the classic multi-armed bandit problem where th...

05/24/2023

An Evaluation on Practical Batch Bayesian Sampling Algorithms for Online Adaptive Traffic Experimentation

To speed up online testing, adaptive traffic experimentation through mul...

02/18/2020

Intelligent and Reconfigurable Architecture for KL Divergence Based Online Machine Learning Algorithm

Online machine learning (OML) algorithms do not need any training phase ...

02/28/2019

Constrained Thompson Sampling for Wireless Link Optimization

Wireless communication systems operate in complex time-varying environme...

10/23/2021

The Countable-armed Bandit with Vanishing Arms

We consider a bandit problem with countably many arms, partitioned into ...