Thompson Sampling for Combinatorial Semi-bandits with Sleeping Arms and Long-Term Fairness Constraints

05/14/2020
by Zhiming Huang, et al.

We study the combinatorial sleeping multi-armed semi-bandit problem with long-term fairness constraints (CSMAB-F). To address it, we adopt Thompson Sampling (TS) to maximize the total reward and use virtual queue techniques to handle the fairness constraints. The resulting algorithm is called TS with beta priors and Bernoulli likelihoods for CSMAB-F (TSCSF-B). We prove that TSCSF-B satisfies the fairness constraints and that its time-averaged regret is upper bounded by N/(2η) + O(√(mNT ln T)/T), where N is the total number of arms, m is the maximum number of arms that can be pulled simultaneously in each round (the cardinality constraint), T is the number of rounds, and η is the parameter trading off fairness for reward. When the fairness constraints are relaxed (i.e., η→∞), the first term vanishes and the bound reduces to the first problem-independent bound for TS algorithms on combinatorial sleeping multi-armed semi-bandit problems. Finally, we perform numerical experiments and use a high-rating movie recommendation application to show the effectiveness and efficiency of the proposed algorithm.
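The abstract names three ingredients: Beta priors with Bernoulli rewards, a cardinality limit m on sleeping arms, and virtual queues for fairness. The Python sketch below shows one way these could fit together. It is not the authors' TSCSF-B pseudocode. The function name `tscsf_b_sketch`, the scoring rule `queue / eta + theta`, the queue update, and the i.i.d. availability model `avail_prob` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def tscsf_b_sketch(mu, min_frac, m, eta, horizon, avail_prob=0.8, seed=0):
    """Illustrative TS + virtual-queue loop (assumed form, not the paper's code).

    mu        : true Bernoulli means, used only to simulate rewards
    min_frac  : assumed per-arm minimum long-term selection fractions
    m         : maximum number of arms pulled per round
    eta       : fairness/reward trade-off (larger = less weight on fairness)
    """
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float)
    min_frac = np.asarray(min_frac, dtype=float)
    n = len(mu)
    alpha = np.ones(n)   # Beta posterior: 1 + observed successes
    beta = np.ones(n)    # Beta posterior: 1 + observed failures
    queue = np.zeros(n)  # virtual queues tracking the fairness "debt"
    pulls = np.zeros(n)
    total_reward = 0.0

    for _ in range(horizon):
        # Sleeping arms: each arm is awake independently (assumption).
        awake = rng.random(n) < avail_prob
        # Thompson step: one posterior sample per arm.
        theta = rng.beta(alpha, beta)
        # Assumed score: queue length scaled by 1/eta plus sampled mean.
        score = queue / eta + theta
        score[~awake] = -np.inf
        k = min(m, int(awake.sum()))
        chosen = np.argsort(score)[::-1][:k]  # top-k awake arms

        # Semi-bandit feedback: observe a reward for each pulled arm.
        obs = rng.random(k) < mu[chosen]
        alpha[chosen] += obs
        beta[chosen] += 1 - obs
        total_reward += obs.sum()

        played = np.zeros(n)
        played[chosen] = 1.0
        pulls += played
        # Queue grows by the required fraction, shrinks when the arm is pulled.
        queue = np.maximum(queue + min_frac - played, 0.0)

    return pulls / horizon, total_reward / horizon
```

Calling `tscsf_b_sketch([0.9, 0.8, 0.2, 0.1], [0.1, 0.1, 0.3, 0.3], m=2, eta=10.0, horizon=2000)` returns the empirical selection fraction of each arm and the average reward per round. In this sketch a larger `eta` shrinks the queue term, so the selection behaves more like plain Thompson Sampling, which mirrors the η→∞ relaxation mentioned above.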

research
01/15/2019

Combinatorial Sleeping Bandits with Fairness Constraints

The multi-armed bandit (MAB) model has been widely adopted for studying ...
research
02/24/2020

Fair Bandit Learning with Delayed Impact of Actions

Algorithmic fairness has been studied mostly in a static setting where t...
research
07/24/2017

Combinatorial Multi-armed Bandit with Probabilistically Triggered Arms: A Case with Bounded Regret

In this paper, we study the combinatorial multi-armed bandit problem (CM...
research
01/17/2021

TSEC: a framework for online experimentation under experimental constraints

Thompson sampling is a popular algorithm for solving multi-armed bandit ...
research
01/17/2023

A Combinatorial Semi-Bandit Approach to Charging Station Selection for Electric Vehicles

In this work, we address the problem of long-distance navigation for bat...
research
04/15/2017

Asynchronous Parallel Empirical Variance Guided Algorithms for the Thresholding Bandit Problem

This paper considers the multi-armed thresholding bandit problem -- iden...
research
11/14/2019

Unreliable Multi-Armed Bandits: A Novel Approach to Recommendation Systems

We use a novel modification of Multi-Armed Bandits to create a new model...
