Learning and Optimization with Seasonal Patterns

05/16/2020
by Ningyuan Chen, et al.

Seasonality is a common form of non-stationarity in the business world. We study a decision maker who tries to learn the optimal decision over time when the environment is unknown and evolves with seasonality. We consider a multi-armed bandit (MAB) framework in which the mean rewards are periodic. The unknown periods of the arms can differ from one another and can scale polynomially with the length of the horizon $T$. We propose a two-stage policy that combines Fourier analysis with a confidence-bound-based learning procedure to learn the periods and minimize the regret. In stage one, the policy correctly estimates the periods of all arms with high probability. In stage two, the policy uses the periods estimated in stage one to explore the mean reward of each arm in each phase, and it exploits the optimal arm in the long run. We show that our policy achieves a regret of $\tilde{O}\big(\sqrt{T\sum_{k=1}^{K} T_k}\big)$, where $K$ is the number of arms and $T_k$ is the period of arm $k$. This rate matches the optimal regret rate $O(\sqrt{TK})$ of the classic MAB problem if each phase of an arm is regarded as a separate arm, because doing so yields $\sum_{k=1}^{K} T_k$ arms in total.
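To make the two stages concrete, here is a minimal Python sketch of the idea. It is written under assumptions of our own and is not the paper's algorithm. We assume Gaussian reward noise with known standard deviation `sigma` and a known upper bound `p_max` on every period. In stage one, each arm is pulled for `n = lcm(1, ..., p_max)` consecutive rounds, so every harmonic of its period lands exactly on a DFT bin. The period is then read off as `n / gcd(n, significant bins)`. In stage two, a UCB1-style index is kept for each (arm, phase) pair, which may differ from the paper's confidence-bound procedure. The names `estimate_period` and `run_policy`, the `4 * n * log(n)` threshold, and the simulated environment are all hypothetical.

```python
import math
from functools import reduce

import numpy as np


def estimate_period(x, sigma):
    """Estimate the period of a noisy periodic sequence from its DFT.

    Assumes len(x) is a multiple of the true period, so each harmonic sits
    exactly on a DFT bin (no spectral leakage).
    """
    n = len(x)
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))  # drop the DC component
    # Keep bins whose magnitude clearly exceeds the Gaussian noise floor.
    threshold = sigma * math.sqrt(4 * n * math.log(n))
    significant = [j for j in range(1, len(spectrum)) if spectrum[j] > threshold]
    # Harmonics of period p lie at multiples of n // p, so the gcd of n and the
    # significant bins equals n // p (no significant bins -> period 1).
    g = reduce(math.gcd, significant, n)
    return n // g


def run_policy(means, horizon, p_max, sigma=0.1, seed=0):
    """Two-stage sketch: period estimation, then UCB over (arm, phase)."""
    rng = np.random.default_rng(seed)
    K = len(means)

    def pull(k, t):
        # Hypothetical environment: periodic mean plus Gaussian noise.
        pattern = means[k]
        return pattern[t % len(pattern)] + sigma * rng.standard_normal()

    # Stage 1 block length: a multiple of every candidate period 1..p_max.
    n = reduce(lambda a, b: a * b // math.gcd(a, b), range(1, p_max + 1), 1)
    if horizon < K * n:
        raise ValueError("horizon too short for stage one")

    t, total, periods = 0, 0.0, []
    for k in range(K):
        x = np.array([pull(k, t + j) for j in range(n)])
        t += n
        total += x.sum()
        periods.append(estimate_period(x, sigma))

    # Stage 2: one UCB index per (arm, phase), with phase = t mod period.
    counts = [np.zeros(p) for p in periods]
    sums = [np.zeros(p) for p in periods]
    while t < horizon:
        best_k, best_index = 0, -math.inf
        for k in range(K):
            ph = t % periods[k]
            c = counts[k][ph]
            if c == 0:
                index = math.inf  # force one pull of every unseen phase
            else:
                index = sums[k][ph] / c + math.sqrt(2 * math.log(t + 1) / c)
            if index > best_index:
                best_k, best_index = k, index
        ph = t % periods[best_k]
        r = pull(best_k, t)
        counts[best_k][ph] += 1
        sums[best_k][ph] += r
        total += r
        t += 1
    return periods, total


if __name__ == "__main__":
    arms = [[0.1, 0.9, 0.5, 0.3, 0.7], [0.5, 0.6, 0.4], [0.2, 0.8]]
    periods, total = run_policy(arms, horizon=10_000, p_max=10)
    print(periods)  # expected: [5, 3, 2]
```

The lcm block length keeps the Fourier step exact, but it grows very quickly with `p_max`. It therefore suits only small illustrative periods, not the polynomially growing periods that the analysis allows.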

research
02/28/2022

Restless Multi-Armed Bandits under Exogenous Global Markov Process

We consider an extension to the restless multi-armed bandit (RMAB) probl...
research
06/28/2021

Dynamic Planning and Learning under Recovering Rewards

Motivated by emerging applications such as live-streaming e-commerce, pr...
research
06/28/2023

Allocating Divisible Resources on Arms with Unknown and Random Rewards

We consider a decision maker allocating one unit of renewable and divisi...
research
10/23/2021

The Countable-armed Bandit with Vanishing Arms

We consider a bandit problem with countably many arms, partitioned into ...
research
04/11/2023

: Fair Multi-Armed Bandits with Guaranteed Rewards per Arm

Classic no-regret online prediction algorithms, including variants of th...
research
12/11/2017

Optimal Odd Arm Identification with Fixed Confidence

The problem of detecting an odd arm from a set of K arms of a multi-arme...
research
05/21/2015

Regulating Greed Over Time

In retail, there are predictable yet dramatic time-dependent patterns in...