Cooperation Speeds Surfing: Use Co-Bandit!

01/23/2019
by Anuja Meetoo Appavoo, et al.

In this paper, we explore the benefit of cooperation in adversarial bandit settings. As a motivating example, we consider the problem of wireless network selection: mobile devices must choose the right network to associate with for good performance, which is non-trivial. The excellent theoretical properties of EXP3, a leading multi-armed bandit algorithm, suggest that it should work well for this type of problem, yet it performs poorly in practice. A major limitation is its slow rate of stabilization. Bandit-style algorithms perform better when global knowledge is available, i.e., when devices receive feedback about all networks after each selection, but communicating full information to all devices is expensive. We therefore address the question of how much information is adequate to achieve better performance. We propose Co-Bandit, a novel cooperative bandit approach that allows devices to occasionally share their observations and to forward feedback received from neighbors; hence, feedback may arrive with a delay. Devices perform network selection based on their own observations and on feedback from neighbors, thereby speeding up one another's rate of learning. We prove that Co-Bandit is regret-minimizing and retains the convergence property of multiplicative weight update algorithms with full information. Through simulation, we show that a very small amount of information, even when delayed, is adequate to nudge devices toward the right network and yields significantly faster stabilization at the optimal state (about 630x faster than EXP3).
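To make the mechanism concrete, below is a minimal Python sketch of an EXP3-style learner with a hook for occasionally shared neighbor feedback. The EXP3 weight update shown is the standard one; the cooperative loop, the 10% sharing probability, and the random reward model are illustrative assumptions, not the paper's actual Co-Bandit rules (which also cover forwarding feedback onward and handling delay).

```python
import math
import random

class EXP3:
    """Minimal EXP3 learner over K arms (here: K candidate networks)."""

    def __init__(self, num_arms, gamma=0.1):
        self.k = num_arms
        self.gamma = gamma               # exploration rate
        self.weights = [1.0] * num_arms

    def probabilities(self):
        # Mix the normalized weights with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        return random.choices(range(self.k), weights=probs)[0]

    def update(self, arm, reward, prob):
        # Importance-weighted estimate keeps the reward estimator unbiased,
        # followed by the standard multiplicative weight update.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.k)
        # Renormalize to avoid numeric overflow over long horizons;
        # this does not change the selection probabilities.
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

# Hypothetical cooperative loop: each device occasionally shares its
# (arm, reward, probability) tuple, and neighbors fold the shared
# observation into their own weights with the same update rule.
devices = [EXP3(num_arms=3) for _ in range(4)]
for t in range(500):
    shared = []
    for i, dev in enumerate(devices):
        arm = dev.select()
        prob = dev.probabilities()[arm]
        reward = random.random()          # stand-in for observed throughput
        dev.update(arm, reward, prob)
        if random.random() < 0.1:         # share occasionally, not every round
            shared.append((i, arm, reward, prob))
    for i, dev in enumerate(devices):
        for sender, arm, reward, prob in shared:
            if sender != i:               # skip feedback we generated ourselves
                dev.update(arm, reward, prob)

for i, dev in enumerate(devices):
    print(f"device {i} probabilities: {dev.probabilities()}")
```

The importance weighting is what makes cooperation cheap to bolt on: because the reward estimate is unbiased with respect to the sender's selection probability, a neighbor's observation can be incorporated with the same multiplicative update a device applies to its own feedback.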

