Dynamical Linear Bandits

11/16/2022
by Marco Mussi, et al.

In many real-world sequential decision-making problems, an action is not immediately reflected in the feedback; instead, its effects spread over a long time frame. For instance, in online advertising, investing in a platform produces an increase in awareness, but the actual reward, i.e., a conversion, might occur far in the future. Furthermore, whether a conversion takes place depends on how fast awareness grows, how quickly it vanishes, and the synergy or interference with other advertising platforms. Previous work has investigated the Multi-Armed Bandit framework with delayed and aggregated feedback, but without imposing structure on how an action propagates into the future, thus disregarding possible dynamical effects. In this paper, we introduce a novel setting, Dynamical Linear Bandits (DLB), an extension of linear bandits characterized by a hidden state. When an action is performed, the learner observes a noisy reward whose mean is a linear function of the hidden state and of the action. Then, the hidden state evolves according to linear dynamics, also affected by the performed action. We start by introducing the setting, discussing the notion of optimal policy, and deriving an expected regret lower bound. Then, we provide an any-time optimistic regret minimization algorithm, Dynamical Linear Upper Confidence Bound (DynLin-UCB), which suffers an expected regret of order O(c d √T), where c is a constant depending on the properties of the linear dynamical evolution and d is the dimension of the action vector. Finally, we conduct a numerical validation on a synthetic environment and on real-world data to show the effectiveness of DynLin-UCB in comparison with several baselines.
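To make the interaction protocol concrete, here is a minimal Python sketch of a DLB environment as described in the abstract: the learner observes a noisy reward that is linear in the hidden state and the action, and the hidden state then evolves under linear dynamics driven by that same action. The matrices A, B and parameter vectors omega, theta are illustrative placeholders chosen for this sketch, not values from the paper.

```python
# Minimal sketch of a Dynamical Linear Bandit (DLB) environment, assuming a
# hidden-state recursion x_{t+1} = A x_t + B a_t and a noisy reward
# y_t = omega^T x_t + theta^T a_t + noise. All numeric values below are
# illustrative, not taken from the paper.
import numpy as np

class DLBEnvironment:
    def __init__(self, A, B, omega, theta, noise_std=0.1, rng=None):
        self.A, self.B = A, B                   # hidden-state dynamics
        self.omega, self.theta = omega, theta   # reward parameters
        self.noise_std = noise_std
        self.rng = rng or np.random.default_rng(0)
        self.x = np.zeros(A.shape[0])           # hidden state, starts at zero

    def step(self, action):
        """Play `action`, observe a noisy reward, then evolve the hidden state."""
        mean_reward = self.omega @ self.x + self.theta @ action
        reward = mean_reward + self.noise_std * self.rng.standard_normal()
        self.x = self.A @ self.x + self.B @ action  # the action also drives the state
        return reward

# Example: a 2-dimensional hidden state with stable (vanishing) dynamics.
A = np.array([[0.8, 0.1],
              [0.0, 0.7]])   # spectral radius < 1: past actions fade out
B = np.eye(2)
env = DLBEnvironment(A, B, omega=np.array([0.5, 0.3]), theta=np.array([0.2, 0.1]))
reward = env.step(np.array([1.0, 0.0]))  # reward reflects current and past actions
```

A stable dynamics matrix (spectral radius below one) captures the "vanishing effects" mentioned above: a past action's contribution to the reward decays geometrically over time rather than disappearing after a fixed delay, which is what distinguishes this setting from plain delayed-feedback bandits.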


