 # Opportunity costs in the game of best choice

The game of best choice, also known as the secretary problem, is a model for sequential decision making with many variations in the literature. Notably, the classical setup assumes that the sequence of candidate rankings is uniformly distributed over time and that there is no expense associated with the candidate interviews. Here, we weight each ranking permutation according to the position of the best candidate in order to model costs incurred from conducting interviews with candidates that are ultimately not hired. We compare our weighted model with the classical (uniform) model via a limiting process. It turns out that imposing even infinitesimal costs on the interviews results in a probability of success of about 28%, as opposed to $1/e \approx 37\%$ in the classical case.


## 1. Introduction

The game of best choice, or secretary problem, is a model for sequential decision making. In the simplest variant, an interviewer evaluates a pool of $N$ candidates one by one. After each interview, the interviewer ranks the current candidate against all of the candidates interviewed so far, and decides whether to accept the current candidate (ending the game) or to reject the current candidate (in which case, they cannot be recalled later). The goal of the game is to hire the best candidate out of $N$. It turns out that the optimal strategy for large $N$ is to reject an initial set of $N/e$ candidates and hire the next candidate who is better than all of them (or the last candidate if no subsequent candidate is better). The probability of hiring the best candidate out of $N$ with this strategy also approaches $1/e$. See [GM66] for an introduction to these results. Many other variations and some history have been given in [Fer89] and [Fre83].

We model interview orderings as permutations. The permutation $\pi$ of $N$ is expressed in one-line notation as $[\pi_1 \pi_2 \cdots \pi_N]$ where the $\pi_i$ consist of the elements $1, 2, \ldots, N$ (so each element appears exactly once). In the best choice game, $\pi_i$ is the rank of the $i$th candidate interviewed in reality, where rank $N$ is best and $1$ is worst. What the player sees at each step, however, are relative rankings. For example, corresponding to the interview order , the player sees the sequence of permutations and must use only this information to determine when to accept a candidate, thereby ending the game.

Let $S_N$ be the set of all permutations of size $N$. Given some statistic $c : S_N \to \mathbb{N}$ and a positive real number $\theta$, we define a discrete probability distribution on $S_N$ via

$$f(\pi) = \frac{\theta^{c(\pi)}}{\sum_{\pi \in S_N} \theta^{c(\pi)}}.$$

Given a sequence of $n$ distinct integers, we define its flattening to be the unique permutation of $\{1, 2, \ldots, n\}$ having the same relative order as the sequence. Given a permutation $\pi$, define the $i$th prefix flattening, denoted $\pi|_{[i]}$, to be the permutation obtained by flattening the sequence $\pi_1, \pi_2, \ldots, \pi_i$. In the weighted game of best choice, introduced in [Jon19], some $\pi \in S_N$ is chosen randomly, with probability $f(\pi)$, and each prefix flattening $\pi|_{[1]}, \pi|_{[2]}, \ldots$ is presented sequentially to the player. If the player stops at value $N$, they win; otherwise, they lose. We are interested in calculating the win probability, under optimal play, for finite $N$ as well as in the limit as $N \to \infty$.
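The flattening operation can be made concrete with a short Python sketch. The function names `flatten` and `prefix_flattenings` are ours and are introduced only for illustration; the sample interview order is likewise a hypothetical input, not one taken from the text.

```python
def flatten(seq):
    """Return the permutation of 1..len(seq) with the same relative order as seq.

    Assumes the entries of seq are distinct.
    """
    rank = {value: i + 1 for i, value in enumerate(sorted(seq))}
    return [rank[value] for value in seq]


def prefix_flattenings(perm):
    """Return the list of prefix flattenings of perm, shortest prefix first."""
    return [flatten(perm[:i]) for i in range(1, len(perm) + 1)]


if __name__ == "__main__":
    # Hypothetical interview order on four candidates (4 is the best).
    for step in prefix_flattenings([2, 4, 1, 3]):
        print(step)
```

Each printed list is what the player would observe after one more interview: only the relative order of the candidates seen so far, never their true ranks.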

In this note, we follow a suggestion by the first author to let $c(\pi)$ be the position of the largest element in $\pi$, indexed starting from $0$; that is, $c(\pi) = \pi^{-1}(N) - 1$. Equivalently, this is the number of “wasted” interviews required before we can hire the best candidate. Setting $\theta < 1$ has the effect of imposing a multiplicative cost of $\theta$ on each wasted interview. For example, the best candidate being hired immediately will contribute $1$ (before normalization) to the win probability, whereas each failed interview reduces the contribution of an eventually successful hire by a factor of $\theta$. This weighted model is relevant when the interviews themselves are costly, or if time spent interviewing detracts from the time spent working productively, such as when the position being filled is only for a limited term or requires a substantial training investment. Also, observe that when $\theta = 1$, we recover the complete uniform distribution on $S_N$, corresponding to the classical model.
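To see the weighting in action for small cases, here is a minimal Python sketch that builds the distribution by brute force, taking the statistic to be the zero-indexed position of the largest element as described above. The names `wasted_interviews` and `weighted_distribution` are hypothetical, and exact rational arithmetic is used purely for readability of the output.

```python
from fractions import Fraction
from itertools import permutations


def wasted_interviews(perm):
    """Zero-indexed position of the largest entry, i.e. the number of wasted interviews."""
    return perm.index(max(perm))


def weighted_distribution(N, theta):
    """Map each permutation of 1..N to its probability theta^c / sum(theta^c)."""
    theta = Fraction(theta)
    weights = {p: theta ** wasted_interviews(p)
               for p in permutations(range(1, N + 1))}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


if __name__ == "__main__":
    for perm, prob in sorted(weighted_distribution(3, Fraction(1, 2)).items()):
        print(perm, prob)
```

With a weight parameter of one half and three candidates, orders that present the best candidate first are four times as likely as those that present the best candidate last; with a parameter of one, every order is equally likely.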

We obtain some interesting behavior vis-à-vis the classical model. The optimal strategy is still positional, for which we reject about $0.43/(1-\theta)$ initial candidates and select the next best candidate. As $N \to \infty$ and $\theta \to 1^-$, however, this strategy succeeds about 28% of the time even though we have a $1/e$ success rate at $\theta = 1$. That is, the asymptotically optimal strategy does not vary continuously with the parameter $\theta$, which seems to limit the durability of any “policy advice” derived from the classical model (such as [SV99]). We found a similar discontinuity in the optimal strategy for the Mallows model in [Jon19], although the success probability there still approached $1/e$. In the present model, both the strategy and probability of success are discontinuous. Evidently, there is a “price” of about 9 percentage points in the asymptotic success rate for imposing any wasted interview penalty, no matter how small.

Although there is an established “full-information” version of the game in which the player observes values from a given distribution, it seems that only a few papers have considered nonuniform rank distributions for the secretary problem. Pfeifer [Pfe89] considers the case where interview ranks are independent but have cumulative distribution functions containing parameters determined by the interview positions. The paper [RF88] considers an explicit continuous probability distribution that allows for dependencies between nearby arrival ranks via a single parameter. Inspired by approximation theory, the paper [KKN15] studies some general properties of non-uniform rank distributions in the secretary problem. Our work also fits into a recent stream of asymptotic results for random permutations by researchers in algebraic combinatorics such as [MP14, CDE18, ABNP16].

## 2. The Model

The left-to-right maxima in a permutation $\pi$ consist of elements $\pi_j$ that are larger in value than every element $\pi_i$ to the left (i.e. for $i < j$). In the game of best choice, it is never optimal to select a candidate that is not a left-to-right maximum. A positional strategy for the game of best choice is one in which the interviewer transitions from rejection to hiring based only on the position of the interview (as opposed to adjusting the transition based on the prefix flattenings that are encountered). More precisely, the interviewer may play the $r$-positional strategy on a permutation by rejecting $r$ candidates and then accepting the next left-to-right maximum thereafter. We say that a particular interview rank order is $r$-winnable if transitioning from rejection to hiring after the $r$th interview captures the best candidate. For example, is $r$-winnable for , and . It is straightforward to verify that a permutation $\pi$ is $r$-winnable precisely when position $r$ lies between the last two left-to-right maxima in $\pi$.
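The characterization of winnable permutations can be checked mechanically. The Python sketch below plays the positional strategy directly and compares the outcome with the "between the last two left-to-right maxima" criterion. All function names are hypothetical, and "between" is read here as: the second-to-last left-to-right maximum sits at a position no later than the number of rejections, while the last one sits strictly later. That reading is our assumption.

```python
from itertools import permutations


def left_to_right_maxima(perm):
    """Return the 1-indexed positions of the left-to-right maxima of perm."""
    positions, best = [], 0
    for position, value in enumerate(perm, start=1):
        if value > best:
            positions.append(position)
            best = value
    return positions


def is_r_winnable(perm, r):
    """Reject the first r candidates, accept the next left-to-right maximum,
    and report whether that candidate is the best one overall."""
    best_rejected = max(perm[:r], default=0)
    for value in perm[r:]:
        if value > best_rejected:
            return value == len(perm)
    return False  # the best candidate was among those rejected


def between_last_two_maxima(perm, r):
    """Position criterion: second-to-last maximum <= r < last maximum."""
    maxima = left_to_right_maxima(perm)
    previous = maxima[-2] if len(maxima) > 1 else 0
    return previous <= r < maxima[-1]


if __name__ == "__main__":
    sample = (2, 5, 1, 3, 6, 4)  # hypothetical interview order, 6 is best
    print([r for r in range(len(sample)) if is_r_winnable(sample, r)])
```

For the sample order the left-to-right maxima sit at positions 1, 2 and 5, so the two tests agree on exactly those rejection counts that lie between positions 2 and 5.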

It follows from the results in [Jon19, Section 3] that the optimal strategy in our game of best choice is positional. (Briefly, our statistic is essentially prefix equivariant [Jon19, Definition 3.2] in the sense that for all eligible prefixes . This is enough to obtain the results in [Jon19, Theorem 3.4] and subsequently [Jon19, Theorem 3.7].) We let

$$W_N(r) = \sum_{r\text{-winnable } \pi \in S_N} \theta^{\pi^{-1}(N) - 1}.$$
###### Theorem 2.1.

We have the recurrence

$$W_N(r) = (N-1)\,W_{N-1}(r) + r\,(N-2)!\,\theta^{N-1}$$

with initial conditions and for all .

###### Proof.

There are two cases for the $r$-winnable permutations $\pi$ of $N$. If $N$ does not lie in the last position, then we may view the initial segment of $\pi$ uniquely as an $r$-winnable permutation of $N-1$ by flattening. Since there are $N-1$ possible values for the last position, this case contributes $(N-1)\,W_{N-1}(r)$ to $W_N(r)$. If $N$ lies in the last position, then $\pi$ will be winnable if and only if $N-1$ lies in one of the first $r$ positions of $\pi$. For each of these $r$ choices, we may permute the remaining entries in $(N-2)!$ ways, so these contribute $r\,(N-2)!\,\theta^{N-1}$ all together. ∎

###### Corollary 2.2.

We have

$$W_N(r) = \begin{cases} (N-1)! & \text{if } r = 0 \\ (N-1)!\; r \sum_{i=r}^{N-1} \dfrac{\theta^i}{i} & \text{if } 1 \leq r \leq N-1 \end{cases}$$
###### Proof.

This follows from Theorem 2.1 by induction. ∎
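As a sanity check on the recurrence and the closed form, the following Python sketch enumerates all permutations for small sizes and compares the brute-force weighted count with both formulas. The helper names are ours, and the weight parameter is passed as an exact `Fraction` so that equality tests are meaningful.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def is_r_winnable(perm, r):
    """Reject r candidates, accept the next left-to-right maximum, win iff it is N."""
    best_rejected = max(perm[:r], default=0)
    for value in perm[r:]:
        if value > best_rejected:
            return value == len(perm)
    return False


def w_brute(N, r, theta):
    """Sum of theta^(position of N, zero-indexed) over r-winnable permutations."""
    total = Fraction(0)
    for p in permutations(range(1, N + 1)):
        if is_r_winnable(p, r):
            total += theta ** p.index(N)
    return total


def w_closed(N, r, theta):
    """Closed form from Corollary 2.2 (empty sums are treated as zero)."""
    if r == 0:
        return Fraction(factorial(N - 1))
    return factorial(N - 1) * r * sum(theta ** i / Fraction(i) for i in range(r, N))


if __name__ == "__main__":
    theta = Fraction(2, 3)
    for N in range(2, 7):
        print(N, [w_brute(N, r, theta) == w_closed(N, r, theta) for r in range(N)])
```

The enumeration grows factorially, so this is only practical for a handful of candidates; it is meant as a check on the algebra rather than as a way to compute the weighted counts.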

###### Theorem 2.3.

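Fix some positive $\theta \neq 1$. The probability of winning the game of best choice using the strategy that rejects $r$ initial candidates is

$$P_r(N, \theta) = \frac{r\,(1-\theta) \sum_{i=r}^{N-1} \frac{\theta^i}{i}}{1 - \theta^N}$$

if $r \geq 1$ and is $\frac{1-\theta}{1-\theta^N}$ if $r = 0$.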

###### Proof.

By definition, the probability of winning is

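$$\frac{\sum_{r\text{-winnable } \pi \in S_N} \theta^{c(\pi)}}{\sum_{\pi \in S_N} \theta^{c(\pi)}} = \frac{W_N(r)}{(N-1)!\,(1 + \theta + \theta^2 + \cdots + \theta^{N-1})}.$$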

The result then follows from the previous corollary. ∎
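The win probability formula is easy to evaluate directly. The Python sketch below does so and searches for the best number of initial rejections at a given pool size; the names `win_probability` and `best_rejection_count` are hypothetical, and the weight parameter is assumed to be positive and different from one.

```python
from fractions import Fraction


def win_probability(r, N, theta):
    """P_r(N, theta) from Theorem 2.3, for theta != 1 and 0 <= r <= N - 1."""
    if r == 0:
        return (1 - theta) / (1 - theta ** N)
    series = sum(theta ** i / i for i in range(r, N))
    return r * (1 - theta) * series / (1 - theta ** N)


def best_rejection_count(N, theta):
    """Number of initial rejections that maximizes the win probability."""
    return max(range(N), key=lambda r: win_probability(r, N, theta))


if __name__ == "__main__":
    # Exact values for a tiny pool.
    print([win_probability(r, 3, Fraction(1, 2)) for r in range(3)])
    # Floating-point search for a larger pool.
    r = best_rejection_count(100, 0.95)
    print(r, win_probability(r, 100, 0.95))
```

Passing a `Fraction` gives exact answers for small pools, while ordinary floats are adequate for exploring larger ones.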

## 3. Results

Suppose now that $\theta < 1$, and take the limit as $N \to \infty$. Then the probability of success for the strategy that initially rejects $r$ candidates becomes

$$P_r(\theta) = r\,(1-\theta) \sum_{i=r}^{\infty} \frac{\theta^i}{i}.$$

We obtain a curve for each nonnegative value of $r$ (interpreting $P_0$ as $1 - \theta$), the first several of which we have plotted in Figure 1. For each value of $\theta$, one of the curves is maximal, yielding the optimal strategy and probability of success. For example,

| $r$ | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| is optimal for $\theta \in$ | (0, 0.6321] | [0.6321, 0.7968] | [0.7968, 0.8609] | [0.8609, 0.8945] |
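The regimes listed above can be reproduced numerically. This Python sketch sums the series term by term until the terms are negligible and then picks the rejection count with the largest asymptotic win probability. The function names, the truncation tolerance, and the search bound `r_max` are all our own assumptions.

```python
def asymptotic_win_probability(r, theta, tol=1e-16):
    """P_r(theta) = r (1 - theta) sum_{i >= r} theta^i / i for 0 < theta < 1."""
    if r == 0:
        return 1 - theta
    total, i, power = 0.0, r, theta ** r
    while power / i > tol:  # stop once the remaining terms are negligible
        total += power / i
        power *= theta
        i += 1
    return r * (1 - theta) * total


def optimal_r(theta, r_max=50):
    """Rejection count in 0..r_max with the largest asymptotic win probability."""
    return max(range(r_max + 1), key=lambda r: asymptotic_win_probability(r, theta))


if __name__ == "__main__":
    for theta in (0.5, 0.7, 0.82, 0.88):
        r = optimal_r(theta)
        print(theta, r, asymptotic_win_probability(r, theta))
```

The direct summation slows down as the weight parameter approaches one, since more terms are needed before the tail becomes negligible, so a larger `r_max` should be paired with some patience.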
###### Lemma 3.1.

For each $r \geq 1$, the intersection of $P_{r-1}$ and $P_r$ coincides with the maximum value of $P_r$.

###### Proof.

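To see this, the derivative of $P_r$ with respect to $\theta$ is

$$r\,(1-\theta) \sum_{i=r}^{\infty} \theta^{i-1} - r \sum_{i=r}^{\infty} \frac{\theta^i}{i} = r\,(1-\theta)\,\frac{\theta^{r-1}}{1-\theta} - r \sum_{i=r}^{\infty} \frac{\theta^i}{i} = r \left( \theta^{r-1} - \sum_{i=r}^{\infty} \frac{\theta^i}{i} \right),$$

whereas the successive differences $P_{r-1} - P_r$ are

$$(1-\theta) \left( \big((r-1) - r\big) \sum_{i=r}^{\infty} \frac{\theta^i}{i} \right) + (r-1)(1-\theta)\,\frac{\theta^{r-1}}{r-1} = (1-\theta) \left( \theta^{r-1} - \sum_{i=r}^{\infty} \frac{\theta^i}{i} \right).$$

Hence,

$$P_{r-1} - P_r = \frac{1-\theta}{r}\,\frac{dP_r}{d\theta}$$

so the successive differences and derivatives have the same zeros. ∎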

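The first intersection occurs at $\theta = 1 - 1/e$ with value $1/e$. Subsequent intersections can be estimated numerically but have no elementary closed form:

| $r$ | $\frac{d}{d\theta} P_r$ | solution for $\frac{d}{d\theta} P_r = 0$ | value of $P_r$ |
|---|---|---|---|
| 1 | $\ln(1-\theta) + 1$ | $\theta = 1 - 1/e \approx 0.63212$ | $1/e \approx 0.36788$ |
| 2 | $4\theta + 2\ln(1-\theta)$ | $\theta \approx 0.796812$ | $0.323805$ |
| 3 | $\tfrac{9}{2}\theta^2 + 3\theta + 3\ln(1-\theta)$ | $\theta \approx 0.860917$ | $0.309256$ |
| 4 | $\tfrac{16}{3}\theta^3 + 2\theta^2 + 4\theta + 4\ln(1-\theta)$ | $\theta \approx 0.894457$ | $0.302113$ |
| 5 | $\tfrac{25}{4}\theta^4 + \tfrac{5}{3}\theta^3 + \tfrac{5}{2}\theta^2 + 5\theta + 5\ln(1-\theta)$ | $\theta \approx 0.915009$ | $0.297883$ |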
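The numerical estimates above can be recovered with a simple bisection. The Python sketch below is one way to do it, not necessarily the method used to produce the table. It writes the tail of the series as the logarithm minus its first few terms, which is adequate for small rejection counts but loses precision for large ones. The bracket `[0.5, 1)` and all function names are our assumptions.

```python
import math


def tail_series(r, theta):
    """sum_{i >= r} theta^i / i, using -ln(1 - theta) = sum_{i >= 1} theta^i / i."""
    return -math.log1p(-theta) - sum(theta ** i / i for i in range(1, r))


def asymptotic_win_probability(r, theta):
    """P_r(theta) for r >= 1 and 0 < theta < 1."""
    return r * (1 - theta) * tail_series(r, theta)


def crossover(r, lo=0.5, hi=1 - 1e-12, iterations=100):
    """Bisection for the zero of theta^(r-1) - sum_{i >= r} theta^i / i in (lo, hi).

    The bracketed function is positive at lo and negative at hi for r = 1..5.
    """
    def g(theta):
        return theta ** (r - 1) - tail_series(r, theta)

    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for r in range(1, 6):
        theta = crossover(r)
        print(r, round(theta, 6), round(asymptotic_win_probability(r, theta), 6))
```

The printed rows can be compared against the tabulated solutions and values.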

Thus, the optimal strategy and probability of success is determined by $\max_r P_r(\theta)$, the maximum function in the regime determined by $\theta$. By Lemma 3.1, this maximum is monotonically decreasing in $\theta$ and bounded below. Hence, there is a limiting value as $\theta \to 1^-$. However, the limit is clearly bounded away from $1/e$, which is the value at $\theta = 1$ according to the classical analysis. Our goal in this section is to determine this limit more precisely.

Recall the exponential integral

$$E_1(x) = \int_x^{\infty} \frac{e^{-t}}{t}\, dt$$

which we view as a function of a positive real variable $x$ (see e.g. [OLBC10]). This is a standard special function implemented in many mathematical software systems.

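For our main result, we consider the maximum value attained by the related function $c\,E_1(c)$ on $(0, \infty)$; see Figure 2 for a plot. Although there is no elementary form for this maximum, it occurs where $E_1(c) = e^{-c}$, since the derivative of $c\,E_1(c)$ is $E_1(c) - e^{-c}$, so it can be estimated numerically to arbitrary precision. Solving $E_1(c) = e^{-c}$ numerically gives $c \approx 0.4348$, and the corresponding maximum value is $c\,E_1(c) = c\,e^{-c} \approx 0.2815$.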
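A minimal Python sketch of this numerical estimate follows. To stay dependency-free it evaluates the exponential integral through its standard power series (Euler's constant, a logarithm, and an alternating sum), which is accurate for the small arguments needed here. Libraries such as SciPy provide a ready-made `exp1`, but that is not used below. The function names and the bisection bracket are our assumptions.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_integral_e1(x, terms=60):
    """E1(x) for small x > 0 via -gamma - ln(x) - sum_{k >= 1} (-x)^k / (k * k!)."""
    total = -EULER_GAMMA - math.log(x)
    term = 1.0  # holds (-x)^k / k! after k loop iterations
    for k in range(1, terms + 1):
        term *= -x / k
        total -= term / k
    return total


def maximize_c_e1(lo=0.1, hi=2.0, iterations=100):
    """Locate the maximum of c * E1(c) by bisecting E1(c) - exp(-c) on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if exp_integral_e1(mid) - math.exp(-mid) > 0:
            lo = mid
        else:
            hi = mid
    c = (lo + hi) / 2
    return c, c * exp_integral_e1(c)


if __name__ == "__main__":
    c, value = maximize_c_e1()
    print(c, value)
```

The bisection relies on the derivative of the product being positive to the left of the maximum and negative to the right, which holds on the chosen bracket.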

We are now in a position to give our main result.

###### Theorem 3.2.

As $\theta$ approaches $1$ from the left, the optimal strategy in our asymptotic weighted game of best choice approaches a positional strategy that rejects $c/(1-\theta)$ initial candidates, where $c \approx 0.4348$ maximizes $c\,E_1(c)$, and selects the next candidate better than all of them. This strategy has a success probability of $c\,E_1(c) \approx 0.2815$.

###### Proof.

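We would like to optimize $P_r(\theta)$ for large $r$ and $\theta$ chosen appropriately close to $1$. We estimate the series by viewing it as a left or right sum for the corresponding integrals:

$$\int_{t=r}^{\infty} \frac{\theta^t}{t}\, dt < \sum_{i=r}^{\infty} \frac{\theta^i}{i} < \int_{t=r}^{\infty} \frac{\theta^{t-1}}{t-1}\, dt = \int_{t=r-1}^{\infty} \frac{\theta^t}{t}\, dt.$$

Hence, we may approximate $P_r(\theta)$ by $r\,(1-\theta) \int_{t=r}^{\infty} \frac{\theta^t}{t}\, dt$ with error less than

$$r\,(1-\theta)\,\frac{\theta^{r-1}}{r-1} < 4\,(1-\theta)\,\theta^r$$

since the integrand is decreasing, $r/(r-1) \leq 2$ for $r \geq 2$, and $\theta^{-1} < 2$ for $\theta > 1/2$.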

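Next, we change variables from $r$ to $c = r\,(1-\theta)$, and from $t$ to $u = t\,(1-\theta)$ in the integral. We obtain $dt/t = du/u$ so

$$\tilde{P}_c(\theta) = c \int_{u=c}^{\infty} \frac{\left(\theta^{1/(1-\theta)}\right)^u}{u}\, du$$

and our error estimate for $\tilde{P}_c(\theta)$ becomes $4\,(1-\theta)\,\theta^{c/(1-\theta)}$.

Now, we are in a position to take the limit as $\theta \to 1^-$, using $\lim_{\theta \to 1^-} \theta^{1/(1-\theta)} = 1/e$. This forces the approximation error to $0$ by our error estimate, and

$$\tilde{P}_c \to c \int_{u=c}^{\infty} \frac{e^{-u}}{u}\, du = c\,E_1(c).$$

Optimizing this function for $c$ then determines the asymptotically optimal positional strategy (where we reject $c/(1-\theta)$ initial candidates) and probability of success. ∎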
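The limiting behaviour can also be observed numerically, without any special functions. The Python sketch below evaluates the asymptotic success probability for every rejection count at once by accumulating the tail of the series from the far end, and reports the best count together with the count rescaled by one minus the weight parameter. The function name and the truncation rule (`cutoff_factor`) are hypothetical choices made for this illustration.

```python
def best_strategy(theta, cutoff_factor=50):
    """Return (best r, best probability) for r (1 - theta) sum_{i >= r} theta^i / i.

    The series is truncated at about cutoff_factor / (1 - theta) terms, beyond
    which theta^i is negligible; r = 0 corresponds to probability 1 - theta.
    """
    cutoff = int(cutoff_factor / (1 - theta))
    tail = 0.0
    best_r, best_p = 0, 1 - theta
    for r in range(cutoff, 0, -1):
        tail += theta ** r / r  # tail now equals the sum over i >= r (truncated)
        p = r * (1 - theta) * tail
        if p > best_p:
            best_r, best_p = r, p
    return best_r, best_p


if __name__ == "__main__":
    for theta in (0.9, 0.99, 0.999):
        r, p = best_strategy(theta)
        print(theta, r, r * (1 - theta), p)
```

As the weight parameter moves toward one, the rescaled rejection count and the success probability printed in the last two columns should settle near the maximizer and maximum of the limiting function.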

We can also solve the model when $\theta > 1$. One interpretation here is that there is some “trend” in the candidate pool (e.g. due to changes in general economic conditions such as unemployment or interest rates) that is amplifying the probability of seeing the best candidate later. Once again, we find that including even an infinitesimal trend completely changes the optimal asymptotic strategy.

###### Theorem 3.3.

If $\theta > 1$, the probability of success for the strategy that initially rejects $N/\lambda$ candidates approaches $1/\lambda$, as $N \to \infty$. Hence, the asymptotic model does not depend on $\theta$.

###### Proof.

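Recall Theorem 2.3; we claim

$$\lim_{N \to \infty} P_{N/\lambda}(N, \theta) = \lim_{N \to \infty} \frac{N\,(1-\theta)}{\lambda\,(1-\theta^N)} \sum_{i=N/\lambda}^{N-1} \frac{\theta^i}{i} = \frac{1}{\lambda}.$$

To see why, consider the “almost telescoping” sum

$$(1-\theta) \sum_{i=N/\lambda}^{N-1} \frac{\theta^i}{i} = \frac{1}{N/\lambda}\,\theta^{N/\lambda} - \frac{1}{N-1}\,\theta^N - \sum_{i=(N/\lambda)+1}^{N-1} \frac{1}{i\,(i-1)}\,\theta^i.$$

If we divide by $1 - \theta^N$ and take the limit term by term with the prefactor $N/\lambda$, we find that only the leading (i.e. middle) term survives. Hence, our limit is $1/\lambda$. ∎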
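A quick numerical check of this limit is sketched below in Python. Because powers of a weight parameter above one overflow quickly, the formula from Theorem 2.3 is rescaled by dividing numerator and denominator by the largest power; this rescaling, the function name, and the rounding of the rejection count to an integer are our own choices.

```python
def win_probability_large_theta(N, lam, theta):
    """P_r(N, theta) with r = floor(N / lam), for theta > 1.

    Uses (1 - theta) theta^i / (1 - theta^N)
       = (theta - 1) theta^(i - N) / (1 - theta^(-N))
    so that only non-positive exponents appear.
    """
    r = int(N / lam)
    series = sum(theta ** (i - N) / i for i in range(r, N))
    return r * (theta - 1) * series / (1 - theta ** (-N))


if __name__ == "__main__":
    for theta in (1.5, 3.0):
        for lam in (2, 4):
            print(theta, lam, win_probability_large_theta(600, lam, theta))
```

For a pool of a few hundred candidates the computed probabilities should already sit close to the reciprocal of the rejection fraction parameter, regardless of the weight parameter.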

Thus, for large $N$ we find that it is optimal to choose the last candidate (and win almost all of the time!), obtaining another discontinuity with the $\theta < 1$ and classical models.

Although we were unaware of it until after our work was accepted for publication, the paper [Ras75] solves a very similar problem to the one we are considering. More specifically, we compute in Theorem 2.3 the probability of winning the game of best choice under a non-uniform distribution whereas Rasmussen–Pliska compute in their Equation (2.6) the expected value of a random variable representing the non-uniform payoff for the game played on a uniform distribution. For any particular and , these problems are dual to each other in the sense that their corresponding formulas are off by the multiplicative constant . Since our weights form a probability distribution, we believe our model facilitates a clearer comparison with the classical secretary problem.

When $\theta < 1$, Rasmussen–Pliska obtain an asymptotic estimate for the optimal strategy that agrees with ours, using different methods. They also note that, for fixed $\theta$, their expected payoff tends to $0$ as $N$ tends to infinity (because their denominator does not scale with $\theta$). By contrast, our model has nonzero probabilities, as $N$ tends to infinity, given by the value of $P_r(\theta)$ where $r$ is optimal for the fixed $\theta$.

## Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant Number 1560151. It was initiated during the summer of 2018 at a research experience for undergraduates (REU) site at James Madison University mentored by the second and fourth authors.

## References

• [ABNP16] Nicolas Auger, Mathilde Bouvel, Cyril Nicaud, and Carine Pivoteau. Analysis of algorithms for permutations biased by their number of records. In Proceedings of the 27th International Conference on Probabilistic, Combinatorial and Asymptotic Methods for the Analysis of Algorithms—AofA’16, page 12. Jagiellonian Univ., Dep. Theor. Comput. Sci., Kraków, 2016.
• [CDE18] Harry Crane, Stephen DeSalvo, and Sergi Elizalde. The probability of avoiding consecutive patterns in the Mallows distribution. Random Structures Algorithms, 53(3):417–447, 2018.
• [Fer89] Thomas S. Ferguson. Who solved the secretary problem? Statist. Sci., 4(3):282–296, 1989. With comments and a rejoinder by the author.
• [Fre83] P. R. Freeman. The secretary problem and its extensions: a review. Internat. Statist. Rev., 51(2):189–206, 1983.
• [GM66] John P. Gilbert and Frederick Mosteller. Recognizing the maximum of a sequence. J. Amer. Statist. Assoc., 61:35–73, 1966.
• [Jon19] Brant Jones. Weighted games of best choice. arXiv:1902.10163 [math.CO], 2019.
• [KKN15] Thomas Kesselheim, Robert Kleinberg, and Rad Niazadeh. Secretary problems with non-uniform arrival order. In STOC’15: Proceedings of the 2015 ACM Symposium on Theory of Computing, pages 879–888. ACM, New York, 2015.
• [MP14] Sam Miner and Igor Pak. The shape of random pattern-avoiding permutations. Adv. in Appl. Math., 55:86–130, 2014.
• [OLBC10] Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert, and Charles W. Clark, editors. NIST handbook of mathematical functions. U.S. Department of Commerce, National Institute of Standards and Technology, Washington, DC; Cambridge University Press, Cambridge, 2010. With 1 CD-ROM (Windows, Macintosh and UNIX).
• [Pfe89] Dietmar Pfeifer. Extremal processes, secretary problems and the $1/e$ law. J. Appl. Probab., 26(4):722–733, 1989.
• [Ras75] W. T. Rasmussen and S. R. Pliska. Choosing the maximum from a sequence with a discount function. Appl. Math. Optimization, 2:279–289, 1975.
• [RF88] J. H. Reeves and V. F. Flack. A generalization of the classical secretary problem: dependent arrival sequences. J. Appl. Probab., 25(1):97–105, 1988.
• [SV99] Dimitris A. Sardelis and Theodoros M. Valahas. Decision making: a golden rule. Amer. Math. Monthly, 106(3):215–226, 1999.