
Scale Free Adversarial Multi Armed Bandits
We consider the Scale-Free Adversarial Multi Armed Bandit (MAB) problem, ...

Dynamic Pricing and Learning under the Bass Model
We consider a novel formulation of the dynamic pricing and demand learni...

On optimal ordering in the optimal stopping problem
In the classical optimal stopping problem, a player is given a sequence ...

Reinforcement Learning for Integer Programming: Learning to Cut
Integer programming (IP) is a general optimization framework widely appl...

Dynamic First Price Auctions Robust to Heterogeneous Buyers
We study dynamic mechanisms for optimizing revenue in repeated auctions,...

Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management
We consider a stochastic inventory control problem under censored demand...

Discretizing Continuous Action Space for On-Policy Optimization
In this work, we show that discretizing action space for continuous cont...

Boosting Trust Region Policy Optimization by Normalizing Flows Policy
We propose to improve trust region policy search with normalizing flows ...

Submodular Secretary Problem with Shortlists
In the submodular k-secretary problem, the goal is to select k items in a randomly ordered input so as to ...

Implicit Policy for Reinforcement Learning
We introduce Implicit Policy, a general class of expressive policies tha...

Exploration by Distributional Reinforcement Learning
We propose a framework based on distributional reinforcement learning an...

Robust Repeated Auctions under Heterogeneous Buyer Behavior
We study revenue optimization in a repeated auction between a single sel...

Bandits with Delayed Anonymous Feedback
We study the bandits with delayed anonymous feedback problem, a variant ...

An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives
We consider a contextual version of the multi-armed bandit problem with glob...

Further Optimal Regret Bounds for Thompson Sampling
Thompson Sampling is one of the oldest heuristics for multi-armed bandit...

Thompson Sampling for Contextual Bandits with Linear Payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit...
Shipra Agrawal