Piecewise-Stationary Multi-Objective Multi-Armed Bandit with Application to Joint Communications and Sensing

02/10/2023
by Amir Rezaei Balef, et al.

We study a multi-objective multi-armed bandit problem in a dynamic environment. A decision-maker sequentially selects an arm from a given set; each selected arm yields a reward vector whose elements follow piecewise-stationary Bernoulli distributions. The agent aims to choose arms from the Pareto-optimal set so as to minimize its regret. We propose a Pareto generic upper confidence bound (UCB)-based algorithm with change detection to solve this problem. By developing the essential inequalities for multi-dimensional spaces, we establish that our proposal guarantees a regret bound of order γ_T log(T/γ_T) when the number of breakpoints γ_T is known. Without this assumption, the regret bound of our algorithm is of order γ_T log(T). Finally, we formulate an energy-efficient waveform design problem in an integrated communication and sensing system as a toy example. Numerical experiments on this toy example and on synthetic and real-world datasets demonstrate the efficiency of our policy compared with existing methods.
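The abstract describes the approach only at a high level: a per-objective UCB index, selection among the Pareto-optimal arms, and a change detector that restarts an arm's statistics after a breakpoint. The Python snippet below is a minimal sketch of that general idea under simplifying assumptions; the class name ParetoUCBChangeDetect, the sliding-window mean-shift detector, and the window/threshold parameters are illustrative choices, not the paper's actual detector or confidence bounds.

import numpy as np

class ParetoUCBChangeDetect:
    """Illustrative Pareto UCB policy with a naive restart-based change detector."""

    def __init__(self, n_arms, n_objectives, window=50, threshold=0.1):
        self.n_arms = n_arms
        self.n_objectives = n_objectives
        self.window = window        # sliding-window length for the change detector (assumed)
        self.threshold = threshold  # mean-shift magnitude that triggers a restart (assumed)
        self.t = 0
        self.counts = np.zeros(n_arms)
        self.sums = np.zeros((n_arms, n_objectives))
        self.recent = [[] for _ in range(n_arms)]

    def select_arm(self):
        self.t += 1
        # Play every arm once before computing UCB indices.
        for a in range(self.n_arms):
            if self.counts[a] == 0:
                return a
        means = self.sums / self.counts[:, None]
        bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)[:, None]
        ucb = means + bonus  # optimistic estimate, one index per objective
        # Keep arms whose UCB vector is not Pareto-dominated by any other arm's.
        pareto = [a for a in range(self.n_arms)
                  if not any(np.all(ucb[b] >= ucb[a]) and np.any(ucb[b] > ucb[a])
                             for b in range(self.n_arms))]
        return int(np.random.choice(pareto))  # uniform tie-breaking on the Pareto set

    def update(self, arm, reward_vec):
        r = np.asarray(reward_vec, dtype=float)
        self.counts[arm] += 1
        self.sums[arm] += r
        self.recent[arm].append(r)
        if len(self.recent[arm]) > self.window:
            self.recent[arm].pop(0)
        # Crude change detection: if the recent-window mean of any objective drifts
        # away from the long-run mean, restart this arm's statistics.
        if self.counts[arm] >= 2 * self.window:
            recent_mean = np.mean(self.recent[arm], axis=0)
            overall_mean = self.sums[arm] / self.counts[arm]
            if np.any(np.abs(recent_mean - overall_mean) > self.threshold):
                self.counts[arm] = 0
                self.sums[arm] = 0.0
                self.recent[arm] = []

# Toy usage on a two-objective Bernoulli bandit with one breakpoint.
rng = np.random.default_rng(0)
probs = np.array([[0.7, 0.2], [0.4, 0.6], [0.5, 0.5]])
policy = ParetoUCBChangeDetect(n_arms=3, n_objectives=2)
for step in range(2000):
    if step == 1000:
        probs = probs[::-1].copy()  # abrupt change in all reward distributions
    a = policy.select_arm()
    policy.update(a, rng.binomial(1, probs[a]))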
