Congested Bandits: Optimal Routing via Short-term Resets

01/23/2023
by Pranjal Awasthi et al.

For traffic routing platforms, the choice of which route to recommend to a user depends on the congestion on those routes; indeed, an individual's utility depends on the number of people using the recommended route at the same time. Motivated by this, we introduce the problem of Congested Bandits, where each arm's reward is allowed to depend on the number of times it was played in the past Δ timesteps. This dependence on the past history of actions leads to a dynamical system in which an algorithm's present choices also affect its future payoffs, and hence requires the algorithm to plan ahead. We study the congestion-aware formulation in the multi-armed bandit (MAB) setup and in the contextual bandit setup with linear rewards. For the multi-armed setup, we propose a UCB-style algorithm and show that its policy regret scales as Õ(√(KΔT)). For the linear contextual bandit setup, our algorithm, based on an iterative least squares planner, achieves policy regret Õ(√(dT) + Δ). From an experimental standpoint, we corroborate the no-regret properties of our algorithms via a simulation study.
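To make the setting concrete, below is a minimal Python sketch of a congested bandit environment together with a simple UCB-style baseline. Everything here is an illustrative assumption rather than the paper's construction: the names CongestedBanditEnv and congestion_ucb are hypothetical, the linear congestion penalty with a decay parameter and Gaussian noise are one possible reward model, and the baseline simply keeps a separate UCB index for each (arm, congestion level) pair instead of planning over the dynamics as the paper's algorithm does.

```python
import numpy as np
from collections import deque


class CongestedBanditEnv:
    """Toy congested bandit: an arm's mean reward decreases with the
    number of times it was pulled in the last `delta` steps.
    The linear penalty and Gaussian noise are illustrative choices,
    not the paper's exact reward model."""

    def __init__(self, base_means, delta, decay=0.1, seed=0):
        self.base_means = np.asarray(base_means, dtype=float)
        self.delta = delta
        self.decay = decay
        self.history = deque(maxlen=delta)  # sliding window of past actions
        self.rng = np.random.default_rng(seed)

    def congestion(self, arm):
        # Number of pulls of `arm` in the past `delta` timesteps.
        return sum(1 for a in self.history if a == arm)

    def pull(self, arm):
        mean = self.base_means[arm] - self.decay * self.congestion(arm)
        self.history.append(arm)
        return mean + self.rng.normal(0.0, 0.1)


def congestion_ucb(env, n_arms, horizon):
    """UCB baseline with a separate estimate per (arm, congestion level)
    pair -- a simplification, not the paper's planning-based algorithm."""
    counts = np.zeros((n_arms, env.delta + 1))
    sums = np.zeros((n_arms, env.delta + 1))
    total = 0.0
    for t in range(1, horizon + 1):
        cong = [env.congestion(k) for k in range(n_arms)]
        ucb = np.empty(n_arms)
        for k in range(n_arms):
            c = cong[k]
            if counts[k, c] == 0:
                ucb[k] = np.inf  # force exploration of unseen states
            else:
                ucb[k] = sums[k, c] / counts[k, c] + np.sqrt(
                    2.0 * np.log(t) / counts[k, c])
        arm = int(np.argmax(ucb))
        reward = env.pull(arm)
        counts[arm, cong[arm]] += 1
        sums[arm, cong[arm]] += reward
        total += reward
    return total


env = CongestedBanditEnv(base_means=[0.9, 0.8, 0.5], delta=5)
print(congestion_ucb(env, n_arms=3, horizon=2000))
```

Note the key difference from the paper: policy regret is measured against the best policy over the congestion-induced dynamical system, not the best fixed arm, so the greedy index rule above ignores how today's pull raises tomorrow's congestion; the paper's UCB-style algorithm plans for that effect to obtain the Õ(√(KΔT)) bound.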


Related research

05/24/2019 · OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits
We consider the stochastic linear (multi-armed) contextual bandit proble...

01/05/2018 · Nonparametric Stochastic Contextual Bandits
We analyze the K-armed bandit problem where the reward for each arm is a...

05/28/2021 · Asymptotically Optimal Bandits under Weighted Information
We study the problem of regret minimization in a multi-armed bandit setu...

08/08/2018 · Nonparametric Gaussian mixture models for the multi-armed contextual bandit
The multi-armed bandit is a sequential allocation task where an agent mu...

03/03/2017 · Contextual Multi-armed Bandits under Feature Uncertainty
We study contextual multi-armed bandit problems under linear realizabili...

09/14/2020 · Dual-Mandate Patrols: Multi-Armed Bandits for Green Security
Conservation efforts in green security domains to protect wildlife and f...

10/21/2019 · Multi-User MABs with User Dependent Rewards for Uncoordinated Spectrum Access
Multi-user multi-armed bandits have emerged as a good model for uncoordi...
