On optimal ordering in the optimal stopping problem
In the classical optimal stopping problem, a player is given a sequence of random variables X_1, ..., X_n with known distributions. After observing the realization of X_i, the player can either accept the observed reward from X_i and stop, or reject it and continue to observe the next variable X_{i+1} in the sequence. Under any fixed ordering of the random variables, an optimal stopping policy, one that maximizes the player's expected reward, is given by the solution of a simple dynamic program. In this paper, we investigate the less studied question of selecting the order in which the random variables should be observed so as to maximize the expected reward at the stopping time. To demonstrate the benefits of order selection, we prove a novel prophet inequality showing that, when the support of each random variable has size at most 2, the optimal ordering can achieve an expected reward that is within a factor of 1.25 of the expected hindsight maximum; this is an improvement over the corresponding factor of 2 for the worst-case ordering. We also provide a simple O(n^2) algorithm for finding an optimal ordering in this case. Perhaps surprisingly, we demonstrate that finding an optimal ordering in a slightly more general case, where each random variable X_i is restricted to have 3-point support of the form {0, m_i, 1}, is NP-hard, and we provide an FPTAS for that case.
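The dynamic program for a fixed ordering can be sketched by backward induction. The following is a minimal illustration, not the paper's algorithm: it assumes independent, non-negative, finitely supported variables and a reward of 0 if the player never stops. The names `expected_reward` and `best_ordering_brute_force` are hypothetical, and the brute-force search over permutations takes exponential time; it stands in only to show that the ordering matters and is not the O(n^2) method mentioned above.

```python
from itertools import permutations


def expected_reward(ordering):
    """Expected reward of the optimal stopping policy for a fixed ordering.

    ordering: list of distributions, each a list of (value, probability) pairs.
    Assumes independence, non-negative values, and reward 0 if nothing is accepted.
    """
    continuation = 0.0  # value of continuing past the last variable
    # Backward induction: accept a realization v exactly when v >= continuation.
    for dist in reversed(ordering):
        continuation = sum(p * max(v, continuation) for v, p in dist)
    return continuation


def best_ordering_brute_force(dists):
    """Try every permutation (exponential time; for illustration only)."""
    best = max(
        permutations(range(len(dists))),
        key=lambda perm: expected_reward([dists[i] for i in perm]),
    )
    return list(best), expected_reward([dists[i] for i in best])


# Hypothetical two-variable instance with supports of size at most 2.
constant_one = [(1.0, 1.0)]            # always 1
risky_two = [(2.0, 0.5), (0.0, 0.5)]   # 2 or 0 with equal probability

value_constant_first = expected_reward([constant_one, risky_two])
value_risky_first = expected_reward([risky_two, constant_one])
best_order, best_value = best_ordering_brute_force([constant_one, risky_two])
```

In this instance, observing the risky variable first lets the player fall back on the certain reward, so that ordering yields a strictly higher expected reward than the reverse.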