Weighted games of best choice

The game of best choice (also known as the secretary problem) is a model for sequential decision making with a long history and many variations. The classical setup assumes that the sequence of candidate rankings is uniformly distributed. Given a statistic on permutations, one can generalize the uniform distribution on the symmetric group by weighting each permutation according to an exponential function in the statistic. We play the game of best choice on the Ewens and Mallows distributions that are obtained in this way from the number of left-to-right maxima and number of inversions in the permutation, respectively. For each of these, we give the optimal strategy and probability of winning. We also introduce a general class of permutation statistics that always produces games of best choice whose optimal strategy is positional. Specializing this result produces a new proof of a foundational result from the literature on the secretary problem.

03/05/2019

1. Introduction

The game of best choice (or secretary problem) is a model for sequential decision making. In the simplest variant, an interviewer evaluates N candidates one by one. After each interview, the interviewer ranks the current candidate against all of the candidates interviewed so far, and decides whether to accept the current candidate (ending the game) or to reject the current candidate (in which case, they cannot be recalled later). The goal of the game is to hire the best candidate out of the N. It turns out that the optimal strategy is to reject an initial set of candidates, of size about N/e when N is large, and use them as a training set by hiring the next candidate who is better than all of them (or the last candidate if no subsequent candidate is better). The probability of hiring the best candidate out of N with this strategy also approaches 1/e.
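As a quick illustration (our own sketch, not part of the paper), the success probability of this positional strategy in the uniform model can be computed exactly and maximized over the rejection cutoff; the value N = 100 and the helper name win_probability are our choices:

```python
from fractions import Fraction

# Exact success probability of the strategy that rejects the first k of N
# uniformly random candidates and then accepts the next record:
#   P(N, 0) = 1/N  and  P(N, k) = (k/N) * sum_{i=k}^{N-1} 1/i  for k >= 1.
def win_probability(N, k):
    if k == 0:
        return Fraction(1, N)
    return Fraction(k, N) * sum(Fraction(1, i) for i in range(k, N))

N = 100
best_k = max(range(N), key=lambda k: win_probability(N, k))
print(best_k, float(win_probability(N, best_k)))  # best_k is close to N/e ~ 36.8
```

Maximizing over k recovers a cutoff near N/e and a success probability near 1/e, as described above.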

The classical model assumes that all interview rank orders are equally likely, which we believe is mathematically expedient but unrealistic. Over the period that the player is conducting the interviews, there may exist extrinsic trends in the candidate pool. Also, as the interviewer ranks the candidates at each step, they acquire information about the domain that allows them to hone the pool to include more relevant candidates. Overall, this results in candidate ranks that are improving over time rather than uniform. We call this process intrinsic learning. As a first step towards understanding such mechanisms, we are interested in studying how different assumptions for the distribution of interview rank orders change the optimal strategy and probability of success in the game of best choice. Continuing work from [FJ18] and [Jon18], we establish in this paper a class of weighted models that generalize the uniform game in a natural way.

We model interview orderings as permutations. A permutation π of size N is expressed in one-line notation as π = π₁π₂⋯π_N, where the π_i consist of the elements 1, 2, …, N (so each element appears exactly once). In the best choice game, π_i is the rank of the ith candidate interviewed in reality, where rank N is best and rank 1 is worst. What the player sees at each step, however, are relative rankings. For example, corresponding to the interview order 2516374, the player sees the sequence of permutations

 1, 12, 231, 2314, 24153, 241536, 2516374

and must use only this information to determine when to accept a candidate, ending the game. The left-to-right maxima of π consist of the entries that are larger in value than all entries lying to their left. The inversions of π consist of pairs (π_i, π_j) with i < j where π_i > π_j.

Now, let c be a statistic on the symmetric group S_N of permutations of size N. Then we can weight each permutation π by θ^{c(π)}, where θ is a positive real number, to obtain a discrete probability distribution on S_N. When θ = 1, we recover the uniform model. Distributions of this form were introduced by Mallows, where c(π) represented some measure of distance from a fixed permutation, typically the identity. More recently, these distributions have been used by researchers in combinatorics; see [CDE18, ABNP16, ABT03] for example. In our first result, Theorem 3.7, we show that for a large class of statistics the optimal strategy in the weighted game of best choice has the same form as that for the classical model: to reject an initial set of candidates and accept the next left-to-right maximum thereafter. This seems mildly surprising given that the full history of relative rankings is available at each step to guide a strategy; evidently, however, the positions of the left-to-right maxima are the only information required to play these games optimally. Also, it seems remarkable that even for θ = 1, where this result is reported to be well-known, we have not encountered reasoning along the lines we have given here. Hence, our proof gives a new perspective on an important result from the literature on the secretary problem; see Remark 3.8.

In our setting, we interpret θ as a goal for intrinsic learning by the interviewer, over the course of the interviews. The first model is based on the Ewens distribution, where c(π) is the number of left-to-right maxima in π. When θ > 1, this has the effect of amplifying the probability of experiencing candidates whose rank exceeds that of all earlier candidates. The second model is based on what has become known as the Mallows distribution, where c(π) is the number of inversions in π. When θ < 1, this has the effect of dampening the probability of experiencing “disappointing pairs,” where an earlier candidate ranks higher than a later candidate. In terms of permutation patterns, the Mallows model weights each permutation π by the number of 21-instances in π, which facilitates comparison with results in [FJ18] and [Jon18] for the pattern-avoiding model.

Once we know by Theorem 3.7 that some positional strategy is optimal for each value of θ, we can define the strategy function κ_N(θ) for a weighted game of best choice to be the number of candidates that we initially reject in the optimal strategy for value θ. Hence, κ_N is some function from the positive reals to {0, 1, …, N−1}. In Corollary 4.6, we describe this function precisely for the Ewens model.

When we perform the asymptotic analysis for the Ewens model, we find that the optimal strategy depends on θ, with the optimal number of initial rejections being approximately N e^{−1/θ}. Remarkably though, the probability of success is always 1/e, independent of θ, neither better nor worse than the classical case.

The Mallows model is more subtle. When θ < 1, the optimal strategy is to reject all but the last j candidates, for a value j depending on θ but not on N, and select the next left-to-right maximum thereafter. This “right-justified” strategy succeeds with probability approaching jθ^{j−1}(1−θ). When θ > 1, the optimal strategy is “left-justified,” rejecting the first k candidates for some k depending on θ but independent of N. Consequently, this shows that the classical model (as embedded in the Mallows model) is highly unstable, another reason we find it to be unrealistic: even an infinitesimal change away from θ = 1, where the asymptotic optimal strategy rejects about N/e of the candidates, results in an optimal strategy that asymptotically rejects either almost none or almost all of the candidates. This suggests a goal for future work of finding models where the asymptotic optimal strategy varies continuously with parameterizations of the distribution.

We now mention some earlier work in this area. Martin Gardner’s 1960 Scientific American column popularized what he called “the game of googol,” although the problem has roots which predate this. His article has been reprinted in [Gar95]. One of the first papers to systematically study the game of best choice in detail is [GM66]. Many other variations and some history have been given in [Fer89] and [Fre83]. Recently, researchers (e.g. [BIKR08]) have begun applying the best-choice framework to online auctions where the “candidate rankings” are bids (that may arrive and expire at different times) and the player must choose which bid to accept, ending the auction.

Only a few papers have previously considered nonuniform distributions for the secretary problem. Pfeifer [Pfe89] considers the case where interview ranks are independent but have cumulative distribution functions containing parameters determined by the position of the interview. The paper [RF88] considers an explicit continuous probability distribution that allows for dependencies between nearby arrival ranks via a single parameter. Inspired by approximation theory, the paper [KKN15] studies some general properties of non-uniform rank distributions for the secretary problem.

We now outline the rest of the paper. In Section 2 we review the form of the optimal strategy for games of best choice, and show in Section 3 that any sufficiently local statistic will generate a game with an optimal strategy having the same form as the classical model. In Sections 4 and 5, we obtain precise and asymptotic results for the Ewens model. The Mallows model is treated in Section 6.

2. The weighted game of best choice

Fix a discrete probability distribution f on the symmetric group S_N, where f(π) is the probability of the permutation π. This defines a game of best choice as follows.

Definition 2.1.

Given a sequence of distinct integers, we define its flattening to be the unique permutation having the same relative order as the elements of the sequence. Given a permutation π ∈ S_N, define the ith prefix flattening to be the permutation obtained by flattening the sequence π₁, π₂, …, π_i. In the game of best choice, some π ∈ S_N is chosen randomly, with probability f(π), and each prefix flattening is presented sequentially to the player. If the player stops at a position i with π_i = N, they win; otherwise, they lose.

In this work, we are primarily interested in probability distributions obtained from weighting by some statistic c and a positive real number θ via

 f(π) = θ^{c(π)} / ∑_{π∈S_N} θ^{c(π)}.

For example, we may take c(π) to be the number of left-to-right maxima in π, obtaining the Ewens distribution; if c(π) is the number of inversions then we obtain the Mallows distribution. When θ = 1, we recover the uniform distribution.
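These weighted distributions are easy to tabulate for small N. The following sketch (our own, with N = 3 and θ = 2 chosen arbitrarily for illustration) builds them by brute force and confirms that θ = 1 recovers the uniform distribution:

```python
from itertools import permutations
from fractions import Fraction

def lr_maxima(pi):
    # number of entries larger than everything to their left
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def inversions(pi):
    return sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def weighted_distribution(N, statistic, theta):
    # f(pi) = theta^{c(pi)} / sum over S_N of theta^{c(pi)}
    weights = {pi: theta ** statistic(pi) for pi in permutations(range(1, N + 1))}
    total = sum(weights.values())
    return {pi: w / total for pi, w in weights.items()}

ewens = weighted_distribution(3, lr_maxima, Fraction(2))     # Ewens at theta = 2
mallows = weighted_distribution(3, inversions, Fraction(1))  # theta = 1: uniform
```

With θ = 2 the identity permutation 123 (three left-to-right maxima) receives weight 8 out of the total {3}! = 2·3·4 = 24.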

A strike strategy for a game of best choice is defined by a collection of prefixes that we call the strike set. To play the strategy on a particular interview ordering π, compare the prefix flattenings of π to the strike set at each step. As soon as the ith prefix flattening occurs in the strike set, accept the candidate at position i and end the game. Otherwise, the strike strategy rejects the candidate at position i to continue playing. It follows directly from the definitions that any strategy (including the optimal strategy) for a game of best choice can be represented as a strike strategy because the player has only the relative ranking information captured in the prefix flattenings to guide them as they play. We can visualize the set of all possible prefixes as a tree, partially ordered by containment of prefix flattenings, which we call the prefix tree. See Figure 1 for a small example.

To find the optimal strike set, we define several conditional probabilities. For brevity, we say that π is p-prefixed if it contains p as a prefix flattening, and we say that π is p-winnable if including p in the strike set would win the interview order π. Explicitly, for p of size i, we have that π is p-winnable if π is p-prefixed and π_i = N. For each prefix p, the strike probability S(p) is the probability of winning the game if the prefix p is included in the strike set, restricted to those interview rank orders having p as a prefix. Since each denominator of f is just a normalizing constant, we may cancel it, obtaining

 S(p) = (∑_{p-winnable π∈S_N} θ^{c(π)}) / (∑_{p-prefixed π∈S_N} θ^{c(π)}).

In particular, S(p) is zero unless p ends in a left-to-right maximum; for this reason, we refer to a prefix as eligible if it ends in a left-to-right maximum or has size N (included for completeness). Hence, it suffices to restrict our attention to strike sets

1. consisting of prefixes that are eligible, and

2. having no pair of elements such that one contains the other as a prefix flattening, and

3. such that every permutation in S_N contains some element of the strike set as a prefix flattening.

Given this setup, we let S(p∘) be the “open” probability of a win if we play optimally, using any strike set consisting of prefixes that contain (but are not equal to) p. Similarly, let S(p̄) be the “closed” probability of a win if we play optimally, using any strike set consisting of prefixes that contain (and may include) p. Let each of these be conditional probabilities with standard denominator ∑_{p-prefixed π∈S_N} θ^{c(π)}. Then, it follows directly from the definitions that

 S(p̄) = max(S(p), S(p∘)).

This formula can be used to recursively determine the globally optimal probability of a win. To also keep track of the globally optimal strategy, let us say that p is positive if S(p) ≥ S(p∘) and negative otherwise. The positive prefixes represent locally optimal strikes.

Theorem 2.2.

With the setup given above, a globally optimal strike set for a weighted game of best choice consists of the subset of positive prefixes that are minimal when partially ordered by prefix-containment. The probability of winning is the ⊕-sum of the strike probabilities of these prefixes, where we define x ⊕ y to be the probability of the disjoint union of the corresponding events and use the standard denominator ∑_{π∈S_N} θ^{c(π)} for all strike probabilities.

Proof.

We first show that all of the probabilities are determined. The prefixes of size N are positive, which serves as a base case for induction on the size of a prefix. Given the probabilities for prefixes of sizes greater than i, the probability S(p∘) for each prefix p of size i can be obtained as the ⊕-sum of the probabilities S(q̄) over the children q of p, and then the probability S(p̄) can be determined from the formula. (The ⊕ operation represents the probability of a disjoint union of events.) This process is essentially a discrete version of “backwards induction.”

The positive prefixes are locally optimal by definition. If a positive prefix has no (proper) prefix flattenings that are positive, then striking at it must be globally optimal as well. ∎

Example 2.3.

Consider the tree shown in Figure 1. The strike probabilities are illustrated in the figure. Comparing S(p) with S(p∘) at each eligible prefix determines which prefixes are positive and which are negative. The optimal strike set then consists of the minimal positive prefixes, together with all of the prefixes of size N that aren’t already related to one of these. The optimal probability of winning is the ⊕-sum of all of these contributions.

3. Prefix equivariance and positional strategies

While any game of best choice has an optimal strike strategy, the classical game (where f is uniform) is optimized by a positional strategy in which the player rejects the first k candidates and accepts the next left-to-right maximum thereafter. From our point of view, this seems to be a minor miracle, as there are many antichains in the partial order of prefix-containment that would serve as potentially optimal strike sets but only N possible positional strategies. In this section, we give a concrete explanation for this and generalize it to a class of weighted games.

The key idea is to use a fundamental bijection in order to transport structure around the prefix tree. To define it, let T(p∘) be the “open” subforest of prefixes containing (but not equal to) p, and let T(p̄) be the “closed” subtree of prefixes containing p, including p itself.

Definition 3.1.

Suppose that q has size k, and let σ_q be the permutation that rearranges the increasing prefix 12⋯k to give the prefix q. We can extend this to an action on permutations, denoted σ_q⋅π, by similarly permuting the first k entries and fixing the last N − k entries of π, where N is the size of π. Then σ_q gives a bijection from the permutations having 12⋯k as a prefix flattening to those having q as a prefix flattening.

Definition 3.2.

Suppose that the statistic c satisfies c(σ_q⋅π) = c(π) + c(q) − c(12⋯k) for all prefixes q of size k and all permutations π having 12⋯k as a prefix flattening. Then, we say that c is a prefix equivariant statistic.

This condition essentially says that the change in the statistic that results from permuting the first k entries of π is the same as the change that would result if we restricted to those k entries alone. Hence, statistics that count sufficiently local phenomena in permutations will be prefix equivariant.

Example 3.3.

It is straightforward to check that the number of left-to-right maxima in π and the number of inversions in π are each prefix equivariant statistics. Explicitly, if π has the form of an increasing block of size k followed by an arbitrary block, then we may observe that rearranging the first block may change the number of left-to-right maxima within that block, but it cannot change the left-to-right maximal status of any entry in the second block. Similarly, rearranging the first block may change the number of inversions within that block, but it cannot add or remove an inversion pair in which the later entry lies in the second block. On the other hand, counting the instances of a longer pattern need not be prefix equivariant, because such an instance may straddle the two blocks.
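The prefix equivariance condition can be checked exhaustively for small cases. In this sketch (our own; N = 5 and k = 3 are arbitrary choices), act plays the role of σ_q acting on permutations whose first k entries are increasing:

```python
from itertools import permutations

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def inversions(pi):
    return sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def act(q, pi):
    # sigma_q: rearrange the (increasing) first k entries of pi into the pattern q
    k = len(q)
    head = sorted(pi[:k])
    return tuple(head[v - 1] for v in q) + pi[k:]

N, k = 5, 3
identity = tuple(range(1, k + 1))
prefix_increasing = [pi for pi in permutations(range(1, N + 1))
                     if list(pi[:k]) == sorted(pi[:k])]
# c(sigma_q . pi) - c(pi) should equal c(q) - c(12...k) for both statistics
ok = all(stat(act(q, pi)) - stat(pi) == stat(q) - stat(identity)
         for stat in (lr_maxima, inversions)
         for q in permutations(range(1, k + 1))
         for pi in prefix_increasing)
```

Every one of the 2 × 3! × 20 checks passes, matching the discussion above.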

Theorem 3.4.

Any statistic that is prefix equivariant yields strike probabilities that are preserved under the restricted bijections of Definition 3.1: S(σ_q⋅p) = S(p) for every prefix p of size greater than k, where k is the size of q. If q is eligible, then these probabilities are also preserved for p of size k.

Consequently, for p of size greater than k (and additionally for p of size k if q is eligible), we have that

• the probabilities S(p∘) are preserved by σ_q, and

• the probabilities S(p̄) are preserved by σ_q, and

• if p and σ_q⋅p are eligible, we have that p is positive if and only if σ_q⋅p is positive.

Proof.

Fix q to be any prefix of size k, and let p be a prefix of size j ≥ k whose first k entries are increasing. Then, c(σ_q⋅π) = c(π) + c(q) − c(12⋯k) for all π having p as a prefix flattening.

Since the weights satisfy θ^{c(σ_q⋅π)} = θ^{c(q)−c(12⋯k)} θ^{c(π)} for all such π, we have

 S(σ_q⋅p) = (∑_{(σ_q⋅p)-winnable π∈S_N} θ^{c(π)}) / (∑_{(σ_q⋅p)-prefixed π∈S_N} θ^{c(π)}) = (∑_{p-winnable π∈S_N} θ^{c(σ_q⋅π)}) / (∑_{p-prefixed π∈S_N} θ^{c(σ_q⋅π)}),

which is S(p) after the common factor of θ^{c(q)−c(12⋯k)} cancels, unless p has size exactly k and q is ineligible, in which case S(σ_q⋅p) = 0.

The probability S(p∘) is an ⊕-sum of strike probabilities, say

 S(p∘) = S(r₁) ⊕ S(r₂) ⊕ ⋯ ⊕ S(r_n),

for some prefixes r₁, …, r_n. Then since σ_q is a strike-probability preserving bijection, we have

 S((σ_q⋅p)∘) = S(σ_q⋅r₁) ⊕ S(σ_q⋅r₂) ⊕ ⋯ ⊕ S(σ_q⋅r_n) = S(r₁) ⊕ S(r₂) ⊕ ⋯ ⊕ S(r_n),

which is S(p∘).

The other consequences follow directly from the definitions. ∎

Thus, it suffices to restrict our attention to subtrees lying under increasing prefixes. To avoid clutter in the remainder of the results, we abuse notation to let S(p), S(p∘), and S(p̄) each refer to their numerators over the standard denominator ∑_{π∈S_N} θ^{c(π)}. To ensure that this is valid, we avow that all of our equalities will occur between quantities whose implied denominators agree.

Theorem 3.5.

For the increasing prefixes p = 12⋯k and q = 12⋯(k+1), we have

 S(p∘) = S(q̄) + S(q∘) ∑_r θ^{c(r)−c(q)}, and

 S(p) = S(q) ∑_r θ^{c(r)−c(q)},

where each sum runs over the nontrivial permutations r of q having p as a prefix.
Proof.

There are k + 1 children of p in the prefix tree; they are distinguished by their value in the last position. The prefix q itself is an eligible child, so S(q̄) is the optimal probability under this subtree. The subtrees under each of the other children of p are isomorphic to the open subtree under q via the bijection σ_r, where r is the corresponding nontrivial permutation of q having p as a prefix. For each π that is won under q, we have c(σ_r⋅π) = c(π) + c(r) − c(q) by prefix equivariance, and the first result follows.

The second result is similar. First, observe that none of the q-prefixed π are p-winnable. Each of the p-winnable permutations arises by applying one of the σ_r to a q-winnable permutation π. This has the effect of placing the value N into position k, as desired. Prefix equivariance produces a factor of θ^{c(r)−c(q)} for each choice of r. ∎

Corollary 3.6.

For any increasing prefixes p = 12⋯k and q = 12⋯(k+1), we have that if q is negative then p is negative.

Proof.

Suppose q is negative, so S(q̄) = S(q∘) > S(q). Then, by Theorem 3.5 we have

 S(p∘) = S(q̄) + S(q∘) ∑_r θ^{c(r)−c(q)} > S(q) (1 + ∑_r θ^{c(r)−c(q)}) = S(q) + S(p) ≥ S(p),

where the sums run over the nontrivial permutations r of q having p as a prefix, so p is negative as well. ∎

Theorem 3.7.

For a weighted game of best choice defined using a prefix equivariant statistic, the optimal strategy is positional.

Proof.

By Corollary 3.6, there exists some k such that all of the increasing prefixes with size less than or equal to k are negative, and all of the increasing prefixes with size greater than k are positive. Applying the isomorphisms from Theorem 3.4, the same k also serves to separate the positive and negative eligible prefixes in the rest of the tree. Hence, the optimal strike strategy coincides with the positional strategy that rejects the first k candidates and accepts the next left-to-right maximum thereafter. ∎
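Theorem 3.7 can be verified numerically for small games. The following brute-force sketch (ours; N = 4 and θ = 2 with the Ewens statistic are arbitrary choices) computes the optimal win probability by backwards induction over prefix flattenings, as in Theorem 2.2, and compares it with the best positional strategy:

```python
from itertools import permutations
from fractions import Fraction

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def flatten(seq):
    # pattern of a sequence of distinct numbers, as a tuple
    return tuple(sorted(seq).index(x) + 1 for x in seq)

N, theta = 4, Fraction(2)
perms = list(permutations(range(1, N + 1)))
weight = {pi: theta ** lr_maxima(pi) for pi in perms}

def optimal_value(i, group):
    # optimal conditional win probability after seeing i candidates, where
    # group holds the permutations consistent with what the player has seen
    total = sum(weight[pi] for pi in group)
    accept = (sum(weight[pi] for pi in group if pi[i - 1] == N) / total
              if i > 0 else Fraction(0))
    if i == N:
        return accept  # the last candidate must be taken if reached
    children = {}
    for pi in group:
        children.setdefault(flatten(pi[:i + 1]), []).append(pi)
    keep_going = sum(sum(weight[pi] for pi in g) / total * optimal_value(i + 1, g)
                     for g in children.values())
    return max(accept, keep_going)

def positional_value(k):
    # reject the first k candidates, accept the next left-to-right maximum
    def wins(pi):
        best = max(pi[:k]) if k else 0
        for x in pi[k:]:
            if x > best:
                return x == N
        return False
    return sum(weight[pi] for pi in perms if wins(pi)) / sum(weight.values())

best_strike = optimal_value(0, perms)
best_positional = max(positional_value(k) for k in range(N))
```

The two optima agree exactly, as the theorem predicts for this prefix equivariant statistic.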

Remark 3.8.

It is interesting to compare this proof at θ = 1 with arguments from the literature on the secretary problem. The paper of Gilbert–Mosteller [GM66] is often cited (e.g. by [Fer89]) as a proof of the result that, among all potential strategies, the optimal one must be positional. However, their proof only considers strategies that always stop at a single particular position and does not address the possibility that a player may stop at various positions depending on the relative rankings encountered in the game. The paper of Kadison [Kad94] does provide a complete proof of the desired result (along different lines than ours), but it does not seem to have been widely cited.

Example 3.9.

In Figure 1 we have illustrated the prefix tree with ineligible prefixes shown in gray and strike probabilities given below each eligible prefix.

4. Precise results for Ewens distribution

When we weight by the number of left-to-right maxima in π, we obtain the Ewens distribution. In this section, we work out the optimal best choice strategy for all θ > 0.

Definition 4.1.

Let {N}! be the polynomial in θ defined by

 {N}!=θ(θ+1)(θ+2)⋯(θ+(N−1)).

The following result justifies our “θ-analogue” notation.

Lemma 4.2.

We have

 {N}! = ∑_{π∈S_N} θ^{#(left-to-right maxima in π)}.

Hence, the coefficients of {N}! are the (unsigned) Stirling numbers of the first kind (the type used to count permutations by number of cycles).

Proof.

This is straightforward to prove using induction, since we may extend each permutation π′ of size N−1 by placing one of the values 1, …, N−1 in the last position and arranging the complementary values according to π′ (this does not create a new left-to-right maximum, so contributes (N−1){N−1}! to {N}!), or by simply appending the value N to the last position of π′ (which does create a new left-to-right maximum, so contributes θ{N−1}! to {N}!).

The equivalence between the number of cycles and number of left-to-right maxima (attributed to Rényi) is accomplished by writing the cycle notation for a permutation using the maximum element in a cycle as the starting point and then arranging the cycles with increasing maximum elements. ∎
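A brute-force check of Lemma 4.2 for one small case (our own sketch; N = 5 and θ = 3 are arbitrary choices):

```python
from itertools import permutations

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def rising_factorial(theta, N):
    # {N}! = theta (theta + 1) ... (theta + N - 1)
    prod = 1
    for j in range(N):
        prod *= theta + j
    return prod

N, theta = 5, 3
lhs = sum(theta ** lr_maxima(pi) for pi in permutations(range(1, N + 1)))
```

Here lhs agrees with {5}! = 3·4·5·6·7 evaluated at θ = 3.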

We are primarily interested in

 W(N,k) = ∑_{k-winnable π∈S_N} θ^{#(left-to-right maxima in π)}.

Here, we say that π is k-winnable if it would be won by the positional strategy that rejects the first k candidates and accepts the next left-to-right maximum thereafter. Some examples of these polynomials are given in Figure 2. Our next result provides a recursive description for them.

Theorem 4.3.

We have

 W(N,k) = (N−1) W(N−1,k) + ((N−2)!/(k−1)!) θ {k}!

with initial conditions W(k,k) = 0 and W(k+1,k) = θ{k}!.

Proof.

We have two cases for the k-winnable permutations π ∈ S_N.

• If the last position contains one of the values 1, …, N−1, then it is not a left-to-right maximum and we may view the complementary values as some k-winnable permutation of size N−1. Hence, these contribute (N−1) W(N−1,k) to W(N,k).

• If the last position contains N, then it is a left-to-right maximum and the value N−1 must lie in one of the first k positions in order for π to be k-winnable. We can choose which k−1 of the remaining values join N−1 among the first k positions in (N−2 choose k−1) ways and then permute them, keeping track of the number of left-to-right maxima with {k}!. For each of these, we may also then permute the rest of the entries in positions k+1, …, N−1 in (N−k−1)! ways. All together, these contribute

 θ¹ (N−2 choose k−1) {k}! (N−k−1)! = ((N−2)!/(k−1)!) θ {k}!

to W(N,k).

The initial conditions are immediate. ∎
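The recurrence can be confirmed by enumeration for small parameters (our own sketch; θ = 2, N = 6, k = 2 are arbitrary choices):

```python
from itertools import permutations
from math import factorial

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def k_winnable(pi, k):
    # won by: reject the first k candidates, accept the next left-to-right maximum
    best = max(pi[:k]) if k else 0
    for x in pi[k:]:
        if x > best:
            return x == len(pi)
    return False

def W(N, k, theta):
    return sum(theta ** lr_maxima(pi)
               for pi in permutations(range(1, N + 1)) if k_winnable(pi, k))

def brace(theta, k):
    # {k}! = theta (theta + 1) ... (theta + k - 1)
    prod = 1
    for j in range(k):
        prod *= theta + j
    return prod

theta, N, k = 2, 6, 2
lhs = W(N, k, theta)
rhs = (N - 1) * W(N - 1, k, theta) + factorial(N - 2) // factorial(k - 1) * theta * brace(theta, k)
```

Both sides agree, as does the initial condition W(k+1,k) = θ{k}!.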

Now, let ΔW(N,k) = W(N,k+1) − W(N,k). The zeros of these polynomials will determine the intervals of θ that produce games for which a given positional strategy is optimal. We begin by translating the recurrence.

Corollary 4.4.

We have

 ΔW(N,k) = (N−1) ΔW(N−1,k) + θ² ((N−2)!/k!) {k}!

with initial conditions ΔW(k+1,k) = −θ{k}!.

Proof.

By Theorem 4.3, we have

 ΔW(N,k) = W(N,k+1) − W(N,k) = (N−1)(W(N−1,k+1) − W(N−1,k)) + θ(N−2)! ({k+1}!/k! − {k}!/(k−1)!)
  = (N−1) ΔW(N−1,k) + θ ((N−2)!/k!) {k}! ((θ+k) − k)

yielding the result.

The initial conditions follow by subtracting W(k+1,k) = θ{k}! from W(k+1,k+1) = 0. ∎

It turns out that we can solve this recurrence.

Theorem 4.5.

We have

 ΔW(N,k) = c₁(N,k) ((∑_{i=k+1}^{N−1} 1/i) θ − 1) θ {k}!

for some c₁(N,k) which is constant in θ. Hence, ΔW(N,k) has only real roots. Moreover, the only positive root of ΔW(N,k) occurs at

 θ = 1 / (∑_{i=k+1}^{N−1} 1/i).
Proof.

We fix k and argue by induction on N. The base case N = k+1 matches the initial conditions in Corollary 4.4 with c₁(k+1,k) = 1, since the sum in the statement is then empty.

Now suppose the result holds for N−1, with c₁(N−1,k) = (N−2)!/k!. Then, by Corollary 4.4,

 ΔW(N,k) = (N−1) c₁(N−1,k) ((∑_{i=k+1}^{N−2} 1/i) θ − 1) θ {k}! + θ² ((N−2)!/k!) {k}!
  = ((N−1) c₁(N−1,k) ((∑_{i=k+1}^{N−2} 1/i) θ − 1) + θ (N−2)!/k!) θ {k}!.

Now, if we let c₁(N,k) = (N−1) c₁(N−1,k) = (N−1)!/k!, we may rewrite the linear term as

 ((N−1) c₁(N−1,k) ∑_{i=k+1}^{N−2} 1/i + (N−2)!/k!) θ = c₁(N,k) (∑_{i=k+1}^{N−2} 1/i + 1/(N−1)) θ = c₁(N,k) (∑_{i=k+1}^{N−1} 1/i) θ,

obtaining ΔW(N,k) = c₁(N,k)((∑_{i=k+1}^{N−1} 1/i)θ − 1)θ{k}!, as desired. ∎

Corollary 4.6.

We have

 κ_N(θ) = 0 if 0 < θ ≤ (∑_{i=1}^{N−1} 1/i)^{−1},
 κ_N(θ) = k if (∑_{i=k}^{N−1} 1/i)^{−1} < θ ≤ (∑_{i=k+1}^{N−1} 1/i)^{−1} for 1 ≤ k ≤ N−2,
 κ_N(θ) = N−1 if θ > N−1.
Proof.

Since, for fixed N, the positive roots from Theorem 4.5 are unique and increasing in k, we find that the strategy function κ_N is increasing as well. ∎
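Corollary 4.6 can be cross-checked against a brute-force computation of W(N,k) (our own sketch; N = 5 with θ = 1 and θ = 3 are arbitrary choices):

```python
from fractions import Fraction
from itertools import permutations

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def k_winnable(pi, k):
    best = max(pi[:k]) if k else 0
    for x in pi[k:]:
        if x > best:
            return x == len(pi)
    return False

def kappa(N, theta):
    # Corollary 4.6: smallest k with theta <= 1 / sum_{i=k+1}^{N-1} 1/i
    for k in range(N - 1):
        if theta <= 1 / sum(Fraction(1, i) for i in range(k + 1, N)):
            return k
    return N - 1

def W(N, k, theta):
    return sum(theta ** lr_maxima(pi)
               for pi in permutations(range(1, N + 1)) if k_winnable(pi, k))

N, theta = 5, Fraction(3)
brute_best = max(range(N), key=lambda k: W(N, k, theta))
```

At θ = 1 the cutoffs recover the classical value κ₅(1) = 2, and at θ = 3 the formula matches the enumerated optimum.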

This completely determines the optimal strategy precisely for all N and θ. Some of the cutoff values are illustrated in Figure 3. We have highlighted the optimal range containing θ = 1, corresponding to the classical uniform case.

5. Asymptotic results for Ewens distribution

To facilitate a comparison with the classical case, we can also solve the Ewens model asymptotically.

5.1. Optimal strategy

For fixed θ and large N, the optimal k is given by solving

 θ^{−1} = ∑_{i=k}^{N−1} 1/i = ∑_{i=k}^{N−1} (1/(i/(N−1))) · (1/(N−1))

for k. The latter is a Riemann sum approximation for the integral ∫_x^1 (1/t) dt, where t = i/(N−1) and x = k/(N−1).

Therefore, as N → ∞ we obtain

 θ^{−1} = ∫_x^1 (1/t) dt = −ln x,

which we can solve for x = e^{−1/θ}. Thus, the optimal number of initial rejections is approximately N e^{−1/θ} for N sufficiently large. A plot is shown in Figure 4.
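Numerically, the exact cutoffs from Corollary 4.6 approach this asymptotic answer quickly (our own sketch; N = 2000 and the sample values of θ are arbitrary):

```python
from math import exp

def kappa(N, theta):
    # smallest k with sum_{i=k+1}^{N-1} 1/i <= 1/theta (float version of Cor. 4.6)
    s = sum(1.0 / i for i in range(1, N))
    k = 0
    while k < N - 1 and s > 1.0 / theta:
        k += 1
        s -= 1.0 / k  # s is now sum_{i=k+1}^{N-1} 1/i
    return k

N = 2000
ratios = {theta: kappa(N, theta) / N for theta in (0.5, 1.0, 2.0)}
```

Each ratio κ_N(θ)/N is already within a fraction of a percent of e^{−1/θ}.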

5.2. Optimal probability of success

Reviewing the previous section, we find that we can solve for W(N,k) explicitly. Each polynomial ΔW(N,k) is just a constant in θ (that depends on N and k) times the polynomial ((∑_{i=k+1}^{N−1} 1/i)θ − 1)θ{k}!.

Theorem 5.1.

For all N and 1 ≤ k ≤ N−1, we have

 W(N,k) = θ {k}! ((N−1)!/(k−1)!) ∑_{i=k}^{N−1} 1/i.

When k = 0, we have W(N,0) = θ(N−1)!.

Proof.

This is straightforward to prove by induction from Theorem 4.3. ∎
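The closed form matches a direct enumeration for small parameters (our own sketch; θ = 3/2, N = 6, k = 2 are arbitrary choices):

```python
from itertools import permutations
from fractions import Fraction
from math import factorial

def lr_maxima(pi):
    best = count = 0
    for x in pi:
        if x > best:
            best, count = x, count + 1
    return count

def k_winnable(pi, k):
    best = max(pi[:k]) if k else 0
    for x in pi[k:]:
        if x > best:
            return x == len(pi)
    return False

def brace(theta, k):
    # {k}! = theta (theta + 1) ... (theta + k - 1)
    prod = 1
    for j in range(k):
        prod *= theta + j
    return prod

theta, N, k = Fraction(3, 2), 6, 2
brute = sum(theta ** lr_maxima(pi)
            for pi in permutations(range(1, N + 1)) if k_winnable(pi, k))
closed = (theta * brace(theta, k) * Fraction(factorial(N - 1), factorial(k - 1))
          * sum(Fraction(1, i) for i in range(k, N)))
```

The comparison is exact since all of the arithmetic is done with rational numbers.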

Since the optimal k satisfies ∑_{i=k}^{N−1} 1/i ≈ θ^{−1}, we can cancel it and the optimal probability simplifies to

 W(N,k)/{N}! = ({k}!/(k−1)!) ((N−1)!/{N}!).

When θ is an integer, this is just a ratio of binomial coefficients, but for arbitrary positive real θ we use gamma functions (see e.g. [OLBC10]). By iterating the recurrence Γ(x+1) = xΓ(x), we obtain {k}! = Γ(θ+k)/Γ(θ). Hence,

 ({k}!/(k−1)!) ((N−1)!/{N}!) = (Γ(θ+k)/(Γ(θ)(k−1)!)) (Γ(θ)(N−1)!/Γ(θ+N)) = (Γ(θ+k)/Γ(k)) (Γ(N)/Γ(θ+N)).

Using Γ(x+θ)/Γ(x) ∼ x^θ as x → ∞, we get

 lim_{N→∞} (Γ(θ+k)/Γ(k)) (Γ(N)/Γ(θ+N)) = lim_{N→∞} k^θ/N^θ = lim_{N→∞} (k/N)^θ = (e^{−1/θ})^θ = 1/e.

Remarkably, this probability of success is independent of θ.
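The gamma-function limit can be checked numerically (our own sketch using math.lgamma; N = 10⁴ and the sample values of θ are arbitrary):

```python
from math import lgamma, exp

def success_probability(N, theta):
    # Gamma(theta+k) Gamma(N) / (Gamma(k) Gamma(theta+N)) at the optimal
    # cutoff k ~ N e^(-1/theta)
    k = max(1, round(N * exp(-1.0 / theta)))
    return exp(lgamma(theta + k) - lgamma(k) + lgamma(N) - lgamma(theta + N))

probs = [success_probability(10 ** 4, t) for t in (0.5, 1.0, 2.0)]
```

All three values are already close to 1/e ≈ 0.3679, independent of θ.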

6. Results for Mallows distribution

We now turn to the Mallows distribution defined by the number of inversions in π. We begin by working out the standard “θ-analogue” for this statistic.

Definition 6.1.

Let [N] be the polynomial in θ defined by [N] = 1 + θ + θ² + ⋯ + θ^{N−1}. Let [N]! be the polynomial in θ defined by

 [N]!=[N][N−1]⋯[1].
Lemma 6.2.

We have

 [N]! = ∑_{π∈S_N} θ^{#(inversions in π)}.
Proof.

This is straightforward to prove using induction, since we may extend each permutation π′ of size N−1 by placing one of the values i ∈ {1, 2, …, N} in the last position and arranging the complementary values according to π′. This creates N − i new inversions, so summing over i contributes [N][N−1]! to [N]!. ∎
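A quick check of Lemma 6.2 for one small case (our own sketch; N = 5 and θ = 2 are arbitrary choices):

```python
from itertools import permutations

def inversions(pi):
    return sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def q_factorial(theta, N):
    # [N]! = [N][N-1]...[1], with [m] = 1 + theta + ... + theta^(m-1)
    prod = 1
    for m in range(1, N + 1):
        prod *= sum(theta ** a for a in range(m))
    return prod

N, theta = 5, 2
lhs = sum(theta ** inversions(pi) for pi in permutations(range(1, N + 1)))
```

Here lhs agrees with [5]! = 1·3·7·15·31 evaluated at θ = 2.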

Let us redefine

 W(N,k) = ∑_{k-winnable π∈S_N} θ^{#(inversions in π)}

for the Mallows distribution. Our next result provides a recursive description for these polynomials.

Theorem 6.3.

We have

 W(N,k) = θ[N−1] W(N−1,k) + θ^{N−k−1} [k] [N−2]!

with initial conditions W(k,k) = 0 and W(k+1,k) = [k]!.

Proof.

We have two cases for the k-winnable permutations π ∈ S_N.

• If the last position contains one of the values i ∈ {1, …, N−1}, then it contributes N−i to the inversion count, and we may view the complementary values as some k-winnable permutation of size N−1. Hence, these contribute (θ + θ² + ⋯ + θ^{N−1}) W(N−1,k) = θ[N−1] W(N−1,k) to W(N,k).

• If the last position contains N, then the value N−1 must lie in one of the first k positions in order for π to be k-winnable. These choices for the position of value N−1 contribute θ^{N−k−1}[k], since placing N−1 in position j creates N−1−j inversions. For each of these, we choose a permutation of size N−2 to fill in the remaining positions, keeping track of the inversions with [N−2]!.

The initial conditions are immediate. ∎

We can solve this recurrence.

Corollary 6.4.

We have

 W(N,k) = θ^{N−k−1} [N−1]! ∑_{i=k}^{N−1} [k]/[i]

for all N > k ≥ 1.

Proof.

This is straightforward to prove by induction on N from Theorem 6.3. ∎
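Corollary 6.4 also matches a direct enumeration (our own sketch; θ = 2, N = 6, k = 2 are arbitrary choices):

```python
from itertools import permutations
from fractions import Fraction

def inversions(pi):
    return sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])

def k_winnable(pi, k):
    best = max(pi[:k]) if k else 0
    for x in pi[k:]:
        if x > best:
            return x == len(pi)
    return False

def bracket(theta, m):
    # [m] = 1 + theta + ... + theta^(m-1)
    return sum(theta ** a for a in range(m))

def q_factorial(theta, N):
    prod = 1
    for m in range(1, N + 1):
        prod *= bracket(theta, m)
    return prod

theta, N, k = Fraction(2), 6, 2
brute = sum(theta ** inversions(pi)
            for pi in permutations(range(1, N + 1)) if k_winnable(pi, k))
closed = (theta ** (N - k - 1) * q_factorial(theta, N - 1)
          * sum(bracket(theta, k) / bracket(theta, i) for i in range(k, N)))
```

Again the comparison is exact because the arithmetic is rational.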

For this distribution, the precise transition values of θ for each N seem to be inaccessible, being roots of polynomials (with complex solutions) that require many repeated root extractions, as opposed to the rational numbers we obtained in the Ewens case. However, we obtain some interesting asymptotic results. Figure 5 shows a plot of the optimal success probability for various values of θ, based on the following theorem.

Theorem 6.5.

If θ < 1, then the optimal strategy as N becomes large is to reject the first N − j candidates, where j = max(1/ln(1/θ), 1) rounded to a nearby integer, and select the next left-to-right maximum thereafter. This strategy succeeds with probability approaching jθ^{j−1}(1−θ). If θ ≤ 1/e, this probability of success simplifies to 1 − θ.

Proof.

Rewriting the probability from Corollary 6.4 gives

 θ^{N−k−1} ([k]/[N]) ∑_{i=k}^{N−1} 1/[i] = (1−θ) θ^{N−k−1} ((1−θ^k)/(1−θ^N)) ∑_{i=k}^{N−1} 1/(1−θ^i)
  = ((θ^N − θ^{N−1} − θ^{N−k} + θ^{N−k−1})/(1−θ^N)) ∑_{i=k}^{N−1} 1/(1−θ^i).

For fixed j = N − k, as N becomes large, the fraction tends to θ^{j−1}(1−θ) and the terms in the series become close to 1, so the probability reduces to jθ^{j−1}(1−θ). By continuity, this sequence converges to a positive value if and only if the sequence N − k converges to a finite value.

So we let j = N − k, obtaining the limiting probability jθ^{j−1}(1−θ). Differentiating with respect to j and setting the result equal to zero, we solve to obtain the optimal j = 1/ln(1/θ). For θ < 1/e, we have 1/ln(1/θ) < 1, but we cannot reject more than N − 1 candidates, so the max appears in the expression. ∎
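The finite-N probabilities from Corollary 6.4 already exhibit this limiting behavior (our own sketch; θ = 0.6 and N = 60 are arbitrary choices, for which 1/ln(1/θ) ≈ 1.96, so the asymptotically optimal strategy rejects all but the last two candidates):

```python
from math import log

def bracket(m, theta):
    # [m] = (1 - theta^m) / (1 - theta)
    return (1.0 - theta ** m) / (1.0 - theta)

def success(N, k, theta):
    # finite-N success probability of "reject the first k", from Corollary 6.4
    return (theta ** (N - k - 1) * bracket(k, theta) / bracket(N, theta)
            * sum(1.0 / bracket(i, theta) for i in range(k, N)))

theta, N = 0.6, 60
best_k = max(range(1, N), key=lambda k: success(N, k, theta))
j_star = 1.0 / log(1.0 / theta)  # continuous optimizer of j * theta^(j-1) * (1-theta)
```

The enumerated optimum is k = N − 2, and the resulting probability is very close to the predicted 2θ(1−θ) = 0.48.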

The case where θ > 1 is less interesting from our intrinsic learning perspective, but we sketch the behavior of these models for mathematical completeness. Taken together, the results also prove that the asymptotically optimal strategy does not vary continuously with the parameter θ, which would seem to limit the durability of any “policy advice” derived from the classical model (such as, e.g., [SV99]).