Double-Linear Thompson Sampling for Context-Attentive Bandits

10/15/2020
by Djallel Bouneffouf, et al.

In this paper, we analyze and extend an online learning framework known as the Context-Attentive Bandit, motivated by various practical applications, from medical diagnosis to dialog systems, where, due to observation costs, only a small subset of a potentially large number of context variables can be observed at each iteration; however, the agent is free to choose which variables to observe. We derive a novel algorithm, called Context-Attentive Thompson Sampling (CATS), which builds upon the Linear Thompson Sampling approach, adapting it to the Context-Attentive Bandit setting. We provide a theoretical regret analysis and an extensive empirical evaluation demonstrating the advantages of the proposed approach over several baseline methods on a variety of real-life datasets.
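
As a rough illustration of the idea, and not the paper's exact CATS algorithm, the Python sketch below pairs a heuristic Thompson-sampled feature-selection step with per-arm Linear Thompson Sampling on the observed sub-context. The class name ContextAttentiveTS, the Beta-posterior feature scoring, the zero-filling of unobserved coordinates, and all hyperparameters are illustrative assumptions.

import numpy as np

class ContextAttentiveTS:
    """Minimal sketch of a context-attentive linear Thompson Sampling agent.

    Each round: (1) sample per-feature relevance scores and observe only the
    top-k context variables; (2) run per-arm Linear Thompson Sampling on the
    observed sub-context. The feature-selection rule and all constants are
    illustrative, not the exact CATS algorithm from the paper.
    """

    def __init__(self, n_arms, n_features, k_observed, v=0.25):
        self.n_arms = n_arms
        self.d = n_features
        self.k = k_observed
        self.v = v  # posterior scale for Thompson sampling
        # Per-arm Bayesian linear regression statistics (ridge prior).
        self.B = np.array([np.eye(n_features) for _ in range(n_arms)])
        self.f = np.zeros((n_arms, n_features))
        # Beta posteriors over each feature's usefulness (assumed heuristic).
        self.feat_success = np.ones(n_features)
        self.feat_failure = np.ones(n_features)

    def select_features(self):
        # Thompson sample a relevance score per feature, keep the top-k.
        scores = np.random.beta(self.feat_success, self.feat_failure)
        return np.sort(np.argsort(scores)[-self.k:])

    def select_arm(self, observed_idx, observed_values):
        # Build a masked context: unobserved coordinates are zeroed out.
        x = np.zeros(self.d)
        x[observed_idx] = observed_values
        sampled_rewards = np.empty(self.n_arms)
        for a in range(self.n_arms):
            B_inv = np.linalg.inv(self.B[a])
            mu_hat = B_inv @ self.f[a]
            theta = np.random.multivariate_normal(mu_hat, self.v ** 2 * B_inv)
            sampled_rewards[a] = x @ theta
        return int(np.argmax(sampled_rewards)), x

    def update(self, arm, x, reward, observed_idx):
        # Standard Linear Thompson Sampling posterior update for the chosen arm.
        self.B[arm] += np.outer(x, x)
        self.f[arm] += reward * x
        # Heuristic feedback on how useful the observed features were.
        if reward > 0:
            self.feat_success[observed_idx] += 1
        else:
            self.feat_failure[observed_idx] += 1


if __name__ == "__main__":
    # Toy simulation with a synthetic linear-reward environment.
    rng = np.random.default_rng(0)
    d, n_arms, k, T = 10, 5, 3, 2000
    true_theta = rng.normal(size=(n_arms, d))
    agent = ContextAttentiveTS(n_arms, d, k)
    total_reward = 0.0
    for t in range(T):
        full_context = rng.normal(size=d)
        idx = agent.select_features()
        arm, x = agent.select_arm(idx, full_context[idx])
        reward = float(true_theta[arm] @ full_context + 0.1 * rng.normal())
        agent.update(arm, x, reward, idx)
        total_reward += reward
    print(f"average reward over {T} rounds: {total_reward / T:.3f}")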

