We consider a novel dynamic pricing and learning setting where in additi...
We study the problem of allocating T sequentially arriving items among n...
We consider the Scale-Free Adversarial Multi-Armed Bandit (MAB) problem, ...
We consider a novel formulation of the dynamic pricing and demand learni...
In the classical optimal stopping problem, a player is given a sequence ...
Integer programming (IP) is a general optimization framework widely appl...
We study dynamic mechanisms for optimizing revenue in repeated auctions,...
We consider a stochastic inventory control problem under censored demand...
In this work, we show that discretizing the action space for continuous cont...
We propose to improve trust region policy search with normalizing flows ...
In , the goal is to select k items in a randomly ordered input so as to ...
We introduce Implicit Policy, a general class of expressive policies tha...
We propose a framework based on distributional reinforcement learning an...
We study revenue optimization in a repeated auction between a single sel...
We study the bandits with delayed anonymous feedback problem, a variant ...
We consider a contextual version of the multi-armed bandit problem with glob...
Thompson Sampling is one of the oldest heuristics for multi-armed bandit...
Thompson Sampling is one of the oldest heuristics for multi-armed bandit...