Cost-aware Cascading Bandits

05/22/2018
by Ruida Zhou, et al.

In this paper, we propose a cost-aware cascading bandits model, a new variant of multi-armed bandits with cascading feedback that accounts for the random cost of pulling arms. In each step, the learning agent chooses an ordered list of items and examines them sequentially until a certain stopping condition is satisfied. Our objective is to maximize the expected net reward in each step, i.e., the reward obtained in each step minus the total cost incurred in examining the items, by deciding the ordered list of items as well as when to stop the examination. We study both the offline and online settings, depending on whether the state and cost statistics of the items are known beforehand. For the offline setting, we show that the Unit Cost Ranking with Threshold 1 (UCR-T1) policy is optimal. For the online setting, we propose a Cost-aware Cascading Upper Confidence Bound (CC-UCB) algorithm and show that its cumulative regret scales in O(log T). We also provide a lower bound for all α-consistent policies, which scales in Ω(log T) and matches our upper bound. The performance of the CC-UCB algorithm is evaluated with both synthetic and real-world data.
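To make the setup concrete, here is a minimal, illustrative sketch of a cost-aware cascading UCB learner. It is a simplification, not the paper's exact CC-UCB pseudocode: the expected examination costs are assumed known, the threshold-1 inclusion rule loosely mirrors the UCR-T1 idea of keeping only items whose optimistic reward-to-cost ratio exceeds 1, and the stop-at-first-good-item cascade rule is an assumption.

```python
import math
import random


class CCUCB:
    """Illustrative cost-aware cascading UCB learner.

    A simplified sketch, not the paper's exact CC-UCB pseudocode:
    expected examination costs are assumed known, and examination
    stops at the first item found in a "good" state.
    """

    def __init__(self, n_arms, costs):
        self.n = n_arms
        self.costs = costs            # assumed known expected examination costs
        self.counts = [0] * n_arms    # number of examinations per arm
        self.means = [0.0] * n_arms   # empirical state means
        self.t = 0                    # current round, set by the caller

    def ucb(self, i):
        # Optimistic index: empirical mean plus a confidence bonus.
        if self.counts[i] == 0:
            return float("inf")
        bonus = math.sqrt(1.5 * math.log(self.t + 1) / self.counts[i])
        return self.means[i] + bonus

    def select(self):
        # Threshold-1 rule (loosely mirroring UCR-T1): keep arms whose
        # optimistic reward-to-cost ratio exceeds 1, ranked by index.
        ranked = sorted(
            (i for i in range(self.n) if self.ucb(i) / self.costs[i] > 1.0),
            key=self.ucb, reverse=True)
        return ranked

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def run(learner, probs, horizon, rng):
    """Simulate Bernoulli item states with cascading feedback."""
    for t in range(horizon):
        learner.t = t + 1
        for arm in learner.select():
            reward = 1 if rng.random() < probs[arm] else 0
            learner.update(arm, reward)
            if reward == 1:
                break  # cascade: stop examining once a good item is found


rng = random.Random(0)
learner = CCUCB(2, costs=[0.5, 0.5])
run(learner, probs=[0.9, 0.2], horizon=2000, rng=rng)
print("estimated means:", [round(m, 2) for m in learner.means])
```

In this toy instance the learner quickly concentrates its examinations on the high-probability item, and the low-probability item drifts out of the selected list once its confidence bonus no longer lifts its reward-to-cost ratio above the threshold.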



Related research:

- Stochastic Bandits with Linear Constraints (06/17/2020): We study a constrained contextual linear bandit setting, where the goal ...
- Cost-Aware Learning and Optimization for Opportunistic Spectrum Access (04/11/2018): In this paper, we investigate cost-aware joint learning and optimization...
- Online List Labeling: Breaking the log^2n Barrier (03/05/2022): The online list labeling problem is an algorithmic primitive with a larg...
- Distributed Thompson Sampling (12/03/2020): We study a cooperative multi-agent multi-armed bandits with M agents and...
- Linear Bandits in High Dimension and Recommendation Systems (01/08/2013): A large number of online services provide automated recommendations to h...
- Cascading Bandits: Learning to Rank in the Cascade Model (02/10/2015): A search engine usually outputs a list of K web pages. The user examines...
- When Combinatorial Thompson Sampling meets Approximation Regret (02/22/2023): We study the Combinatorial Thompson Sampling policy (CTS) for combinator...
