On the Hardness of Inventory Management with Censored Demand Data

10/16/2017
by Gábor Lugosi, et al.

We consider a repeated newsvendor problem in which the inventory manager has no prior information about the demand and can access only censored (sales) data. As in multi-armed bandit problems, the manager must simultaneously "explore" and "exploit" with her inventory decisions in order to minimize the cumulative cost. We make no probabilistic assumptions (importantly, neither independence nor time stationarity) about the mechanism that generates the demand sequence. Our goal is to shed light on the hardness of the problem and to develop policies that perform well with respect to the regret criterion, that is, the difference between the cumulative cost of a policy and that of the best fixed action (static inventory decision) in hindsight, uniformly over all feasible demand sequences. We show that a simple randomized policy, the Exponentially Weighted Forecaster, combined with a carefully designed cost estimator, achieves the optimal scaling of the expected regret (up to logarithmic factors) with respect to all three key primitives: the number of time periods, the number of available inventory decisions, and the demand support. This result yields an important insight: both the benefit from "information stalking" and the cost of censoring are negligible in this dynamic learning problem, at least with respect to the regret criterion. Furthermore, we modify the proposed policy so that it performs well in terms of the tracking regret, that is, using as benchmark the best sequence of inventory decisions that switches a limited number of times. Numerical experiments suggest that the proposed approach outperforms existing ones (which are tailored to, or rely on, time stationarity) on nonstationary demand models. Finally, we extend the proposed approach and its analysis to a "combinatorial" version of the repeated newsvendor problem.
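The abstract does not spell out the policy, so what follows is only a minimal Python sketch of the general idea: an Exponentially Weighted Forecaster over inventory levels that observes nothing but sales. It assumes integer inventory levels 0..K, integer demands, a per-period cost h·(y - d)^+ + b·(d - y)^+, and a small uniform exploration rate; the function names and the parameters `eta`, `gamma`, `h`, `b` are hypothetical. The importance-weighted estimator is one simple construction chosen for illustration, not necessarily the "carefully designed cost estimator" of the paper. It relies on one observation: since the cost increment c(j, d) - c(j - 1, d) equals h if d < j and -b otherwise, and the sales min(y, d) reveal whether d < j for every level j ≤ y, stocking y gives feedback about all levels up to y. The estimates are shifted by the common term b·d, which does not affect regret.

```python
import math
import random


def newsvendor_cost(y, d, h, b):
    # Holding cost h per leftover unit, lost-sales cost b per unmet unit.
    return h * max(y - d, 0) + b * max(d - y, 0)


def estimate_shifted_costs(y, sales, p, h, b):
    """Hypothetical importance-weighted estimate of c(k, d) - b*d for every
    level k, using only the censored observation sales = min(y, d)."""
    n = len(p)
    tail = [0.0] * (n + 1)  # tail[j] = P(chosen level >= j)
    for j in range(n - 1, -1, -1):
        tail[j] = tail[j + 1] + p[j]
    est, acc = [0.0] * n, 0.0
    for j in range(1, n):
        # For j <= y, 1{d < j} is observable: if sales < y then d = sales,
        # otherwise d >= y >= j. Unobserved levels j > y contribute 0.
        ind_hat = 1.0 / tail[j] if (j <= y and sales < j) else 0.0
        acc += (h + b) * ind_hat - b  # estimated increment c(j,d) - c(j-1,d)
        est[j] = acc
    return est


def ewf_censored(demands, K, h=1.0, b=2.0, eta=0.05, gamma=0.05, seed=0):
    """Exponentially weighted forecaster over levels {0, ..., K} with
    uniform exploration. Returns (chosen levels, realized total cost)."""
    rng = random.Random(seed)
    n = K + 1
    cum_est = [0.0] * n  # cumulative estimated (shifted) cost per level
    choices, total = [], 0.0
    for d in demands:
        m = min(cum_est)  # shift exponents for numerical stability
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [(1 - gamma) * wi / s + gamma / n for wi in w]
        y = rng.choices(range(n), weights=p)[0]
        sales = min(y, d)  # the only feedback the policy uses
        choices.append(y)
        total += newsvendor_cost(y, d, h, b)  # incurred, not observed
        est = estimate_shifted_costs(y, sales, p, h, b)
        cum_est = [c + e for c, e in zip(cum_est, est)]
    return choices, total


def regret(demands, K, total, h=1.0, b=2.0):
    # Compare against the best fixed inventory level in hindsight.
    best = min(sum(newsvendor_cost(y, d, h, b) for d in demands)
               for y in range(K + 1))
    return total - best


if __name__ == "__main__":
    demands = [2] * 500 + [8] * 500  # a two-phase, nonstationary sequence
    choices, total = ewf_censored(demands, K=10)
    print(regret(demands, K=10, total=total))
```

Averaging the estimate over the random choice of y recovers the true shifted cost of every level, which is the property that lets the forecaster treat censored feedback almost like full information.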


08/10/2020

Lenient Regret for Multi-Armed Bandits

We consider the Multi-Armed Bandit (MAB) problem, where the agent sequen...
10/27/2011

The multi-armed bandit problem with covariates

We consider a multi-armed bandit problem in a setting where each arm pro...
08/18/2011

Doing Better Than UCT: Rational Monte Carlo Sampling in Trees

UCT, a state-of-the-art algorithm for Monte Carlo tree sampling (MCTS), ...
02/10/2022

Adaptively Exploiting d-Separators with Causal Bandits

Multi-armed bandit problems provide a framework to identify the optimal ...
06/02/2021

MNL-Bandit with Knapsacks

We consider a dynamic assortment selection problem where a seller has a ...
12/02/2018

Quick Best Action Identification in Linear Bandit Problems

In this paper, we consider a best action identification problem in the s...
07/17/2018

Continuous Assortment Optimization with Logit Choice Probabilities under Incomplete Information

We consider assortment optimization in relation to a product for which a...