We consider the offline reinforcement learning problem, where the aim is...
In the classical multi-armed bandit problem, instance-dependent algorith...
We consider the general (stochastic) contextual bandit problem under the...
This work is motivated by a practical concern from our retail partner. W...
This paper investigates the impact of pre-existing offline data on onlin...
We consider the classical stochastic multi-armed bandit problem with a c...