## 1 Introduction

In sequential selection problems a decision maker examines a sequence of observations which appear in random order over some horizon. Each observation can be either accepted or rejected, and these decisions are irrevocable. The objective is to select an element of this sequence so as to optimize a given criterion. A classical example is the so-called secretary problem, in which the objective is to maximize the probability of selecting the element of the sequence that ranks highest. The existing literature contains numerous settings and formulations of such problems; see, e.g., GiMo, freeman, gnedin-book, ferguson, samuels and F2008. To make more concrete connections, we defer further references to the subsequent section, where we formulate the class of problems more precisely.

Sequential selection problems are typically solved using the principles of dynamic programming, relying heavily on structure that is problem-specific, and focusing on theoretical properties of the optimal solution; cf. GiMo, gnedin-book and F2008. Consequently, it has become increasingly difficult to discern commonalities among the multitude of problem variants and their solutions. Moreover, the resulting optimal policies are often viewed as difficult to implement, and focus is placed on deriving sub–optimal policies and various asymptotic approximations; see, e.g., mucci-a, FrSa, krieger-ester, and Arlotto, among many others.

In this paper we demonstrate that a wide class of such problems can be solved optimally and in a unified manner. This class includes, but is not limited to, sequential selection problems with no–information, rank–dependent rewards, and allows for fixed or random horizons. The proposed solution methodology covers both problems that have been worked out in the literature, albeit in an instance-specific manner, and several problems whose solution, to the best of our knowledge, is not known to date. We refer to Section 2 for details. The unified framework we develop is based on the fact that various sequential selection problems can be reduced, via a conditioning argument, to a problem of optimal stopping for a sequence of independent random variables that are constructed in a special way. The latter is an instance of a more general class of problems, referred to as sequential stochastic assignments, first formulated and solved by DLR (some extensions are given in albright). The main idea of the proposed framework was briefly sketched in GZ [Section 4]; in this paper it is fully fleshed out and adapted to the range of problems alluded to above.

The approach we take is operational, insofar as it supports exact and efficient computation of the optimal policies and corresponding optimal values, as well as various other performance metrics. In the words of robbins70, we “put the problem on a computer.” Optimal stopping rules that result from our approach belong to the class of memoryless threshold policies and hence have a relatively simple structure. In particular, the proposed reduction constructs a new sequence of independent random variables, and the optimal rule is to stop at the first time instant when the current “observation” exceeds a given threshold. The threshold computation is predicated on the structure of the policy in sequential stochastic assignment problems à la DLR and albright (as part of the pursued unification, these problems are also extended in the present paper to the case of a random time horizon). The structure of the optimal stopping rule we derive allows us to explicitly compute probabilistic characteristics and various performance metrics of the stopping time, which, outside of special cases, are completely absent from the literature.

The rest of the paper is structured as follows. Section 2 provides the formulation for the various problem instances that are covered by the proposed unified framework. Section 3 describes the class of sequential stochastic assignment problems first formulated in DLR that are central to our solution approach. Section 4 formulates the auxiliary stopping problem and explains its solution via the mapping to a stochastic assignment problem. It then explains the details of the reduction and the structure of the algorithm that implements our proposed stopping rule. Section 5 presents the implementation of said algorithm for the various sequential selection problems surveyed in Section 2. We close with a few concluding remarks in Section 6.

## 2 Sequential selection problems

Let us introduce some notation and terminology. Let

be an infinite sequence of independent identically distributed continuous random variables defined on a probability space

. Let be the relative rank of and be the absolute rank of among the first observations (which we also refer to as the problem horizon). Note that with this notation the largest observation has absolute rank one, and for any . Let and denote the σ–fields generated by and , respectively; and are the corresponding filtrations. In general, the class of all stopping times of a filtration will be denoted ; i.e., if for all .

Sequential selection problems are classified according to the information available to the decision maker and the structure of the reward function. The settings in which only relative ranks

are observed are usually referred to as no–information problems, whereas full information refers to the case when the random variables themselves are available. In this paper we mainly consider the class of problems with no–information and rank–dependent reward. The prototypical sequential selection problem with no–information and rank–dependent reward is formulated as follows; see, e.g., gnedin-krengel.

Problem (A1). Let be a fixed positive integer, and let be a reward function. The average reward of a stopping rule is

and we want to find the rule such that

We are naturally also interested in the computation of the optimal value .

Depending on the reward function we distinguish among the following types of sequential selection problems.

##### Best–choice problems.

The settings in which the reward function is an indicator are usually referred to as best–choice stopping problems. Of special note are the following.

(P1). Classical secretary problem corresponds to the case . Here we want to maximize the probability of selecting the best alternative over all stopping times from . It is well known that the optimal policy passes on approximately the first observations and selects the first observation subsequent to that which is superior to all previous ones, if such an observation exists; otherwise the last element in the sequence is selected. The limiting optimal value is [Dyn, GiMo]. We refer to ferguson, where the history of this problem is reviewed in detail.
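
The finite-horizon version of this policy is easy to evaluate exactly: a cutoff policy that passes the first r−1 observations and then takes the first relative best wins with probability ((r−1)/n)·Σ_{k=r}^{n} 1/(k−1). The sketch below (in Python, with a function name of our choosing) maximizes this expression over r.

```python
def secretary_value(n):
    """Return (optimal cutoff r, success probability) for horizon n.
    A cutoff-r policy passes observations 1..r-1, then accepts the first
    observation that is best so far; its success probability is
    ((r - 1) / n) * sum_{k=r}^{n} 1 / (k - 1)."""
    best_r, best_p = 1, 1.0 / n           # r = 1: accept the first observation
    for r in range(2, n + 1):
        p = (r - 1) / n * sum(1.0 / (k - 1) for k in range(r, n + 1))
        if p > best_p:
            best_r, best_p = r, p
    return best_r, best_p
```

As n grows, the optimal success probability decreases toward 1/e, in line with the limiting value cited above.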

(P2). Selecting one of the best values. The problem is usually referred to as the Gusein–Zade stopping problem [GuZa, FrSa]. Here , and the problem is to maximize with respect to . The optimal policy was characterized in GuZa. It is determined by natural numbers and proceeds as follows: pass the first observations and among the subsequent ones choose the first best observation; if no such observation exists, then among the set of observations choose one of the two best, and so on. GuZa studied the limiting behavior of the numbers as the problem horizon grows large, and showed that . Exact results for the case are given in QuLaw. The above optimal policy requires determination of , which is computationally challenging for general and ; exact values of are not reported in the literature. Based on general asymptotic results of mucci-a, FrSa computed numerically for a range of different values of . The recent paper DiLaRi studies some approximate policies.
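
Although closed-form thresholds are unavailable, the value of this problem is computable by a short backward induction: conditional on relative rank r at time i, the absolute rank A is negative hypergeometric, P(A = a | R_i = r) = C(a−1, r−1)C(n−a, i−r)/C(n, i). The sketch below (our own naming) computes the optimal probability of selecting an item of absolute rank at most s.

```python
from math import comb

def top_s_value(n, s):
    """Optimal probability of stopping on an item of absolute rank <= s
    when only relative ranks are observed (horizon n), via backward
    induction over time; the relative rank at time i is uniform on 1..i."""
    def q(i, r):
        # P(absolute rank <= s | relative rank r at time i); math.comb
        # returns 0 when the top argument is exceeded, so terms vanish
        return sum(comb(a - 1, r - 1) * comb(n - a, i - r)
                   for a in range(1, s + 1)) / comb(n, i)
    W = s / n                              # at time n the relative rank is absolute
    for i in range(n - 1, 0, -1):          # W = optimal value from time i onward
        W = sum(max(q(i, r), W) for r in range(1, i + 1)) / i
    return W
```

For s = 1 this reproduces the classical secretary values; for larger s it gives the exact finite-horizon Gusein–Zade values whose asymptotics are discussed above.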

(P3). Selecting the th best alternative. In this problem , i.e., we want to maximize the probability of selecting the th best candidate. The problem was explicitly solved for by Rose and Vanderbei2012; the latter paper coined the name the postdoc problem for this setting. The optimal policy for is to reject the first observations and then select the one which is second best relative to this previous observation set, if it exists; otherwise the last element in the sequence is selected. The optimal value is . An optimal stopping rule for the case and some results on the optimal value were reported recently in Yao. We are not aware of results on the optimal policy and exact computation of the optimal values for general and . Recently, approximate policies were developed in Bruss-2016. The problem of selecting the median value, where the horizon is odd, was considered in Rose-2.

##### Expected rank type problems.

To this category we attribute problems whose reward function is not an indicator.

(P4). Minimization of the expected rank. In this problem the goal is to minimize with respect to . If we put then

(1) |

This problem was discussed heuristically by lindley and solved by chow. It was shown there that

. The corresponding optimal stopping rule is given by backward induction relations. A simple suboptimal stopping rule, which is close to the optimal one, was proposed in krieger-ester.

(P5). Minimization of the expected squared rank. Based on chow, Robbins-91 developed the optimal policy and computed the asymptotic optimal value in the problem of minimization of with respect to . In particular, he showed that for the optimal stopping rule

Robbins-91 also discussed the problem of minimization of over and mentioned that the optimal stopping rule and optimal value are unknown. As we demonstrate below, optimal policies for any problem of this type can be easily derived, and the corresponding optimal values are straightforwardly calculated for any fixed .
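
To illustrate that such exact computations are routine, here is a sketch (our own, not taken from the cited papers) of the backward induction for minimizing the expected m-th power of the absolute rank: stopping at time i with relative rank r costs E[A^m | R_i = r], computable from the negative hypergeometric law of the absolute rank, and the continuation value is averaged over the uniform relative rank.

```python
from math import comb

def expected_rank_value(n, power=1):
    """Optimal expected value of (absolute rank)**power for horizon n
    (power=1: expected-rank problem; power=2: expected squared rank),
    by backward induction over relative ranks."""
    def stop_cost(i, r):
        # E[A**power | relative rank r at time i], using
        # P(A = a | R_i = r) = C(a-1, r-1) C(n-a, i-r) / C(n, i)
        return sum(a ** power * comb(a - 1, r - 1) * comb(n - a, i - r)
                   for a in range(r, n - i + r + 1)) / comb(n, i)
    W = sum(stop_cost(n, r) for r in range(1, n + 1)) / n   # must stop at n
    for i in range(n - 1, 0, -1):
        W = sum(min(stop_cost(i, r), W) for r in range(1, i + 1)) / i
    return W
```

For power = 1 and small n the recursion reproduces the classical expected-rank values of chow (e.g., 3/2 for a horizon of two, 5/3 for three), and the values increase toward the known limiting constant as the horizon grows.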

##### Problems with a random horizon.

The standard assumption in sequential selection problems is that the problem horizon is fixed beforehand, and optimal policies depend critically on this assumption. However, in practical situations the horizon may be unknown. This fact motivates settings in which it is assumed to be a random variable independent of the observations. A general sequential selection problem with no–information, rank–dependent reward and random horizon can be formulated as follows.

Problem (A2). Let be a positive integer random variable with distribution , , , where may be infinite. Assume that is independent of the sequence . Let be a reward function, and let the reward for stopping at time be provided that . The performance of a stopping rule is measured by

We want to find the stopping rule such that

We are also interested in computation of the optimal value .

The problems (P1)–(P5) discussed above can all be considered under the assumption that the observation horizon is random. Below we discuss two such problem instances.

(P6). Classical secretary problem with random horizon. The classical secretary problem with random horizon was studied in PS1972. In Problem (P1), where the horizon is fixed, the stopping region is an interval of the form for some integer . In contrast to (P1), PS1972 show that for general distributions of the horizon the optimal policy can involve “islands,” i.e., the stopping region can be a union of several disjoint intervals. The paper derives some sufficient conditions under which the stopping region is a single interval and presents specific examples satisfying these conditions. In particular, it is shown that in the case of the uniform distribution on , i.e., , , the stopping region is of the form with , as . The characterization of optimal policies for general horizon distributions is not available in the existing literature.

(P7). Minimization of the expected rank over a random horizon. Consider a variant of Problem (P4) under the assumption that the horizon is a random variable with known distribution. In this setting the loss (the negative reward) for stopping at time is the absolute rank on the event ; otherwise, the absolute rank of the last available observation is received. We want to minimize the expected loss over all stopping rules . This problem was considered in Gianini-Pettitt. In particular, it was shown there that if the horizon is uniformly distributed over then the expected loss tends to infinity as . On the other hand, for distributions which are more “concentrated” around , the optimal value coincides asymptotically with that of Problem (P4). Below we demonstrate that this problem can be naturally formulated and solved for general horizon distributions using our proposed unifying framework; the details are given in Section 5.

##### Multiple choice problems.

The proposed framework is also applicable for some multiple choice problems. We review some of these settings below.

(P8). Maximizing the probability of selecting the best observation with choices. Assume that one can make selections, and the reward function equals one if the best observation belongs to the selected subset and zero otherwise. Formally, the problem is to maximize the probability over stopping times from . This problem was considered in GiMo, who gave numerical results for up to ; see also Haggstrom for theoretical results for .

(P9). Minimization of the expected average rank. Assume that choices are possible, and the goal is to minimize the expected average rank of the selected subset. Formally, the problem is to minimize over stopping times of . For related results we refer to Megiddo, kep-jap, kep-aap1 and Nikolaev-Sofronov.

##### Miscellaneous problems.

The proposed framework extends beyond problems with rank–dependent rewards and no–information. The next two problem instances demonstrate such extensions.

(P10). Moser’s problem with random horizon. Let be a sequence of independent, identically distributed random variables with distribution and expectation . Let be a positive integer-valued random variable with distribution , , , where . Assume that is independent of the sequence . We observe the sequence , and the reward for stopping at time is provided that ; otherwise the reward is . Formally, we want to maximize

with respect to all stopping times of the filtration . The formulation with fixed and the ’s uniformly distributed on corresponds to the classical problem of Moser.
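
For the classical fixed-horizon uniform case the backward recursion is explicit: with k observations left, the continuation value satisfies v_k = E[max(U, v_{k−1})] = (1 + v_{k−1}²)/2, and one accepts the current observation exactly when it exceeds the value of continuing. A brief sketch (our own naming):

```python
def moser_values(n):
    """Continuation values for Moser's problem with X_i i.i.d. Uniform(0,1)
    and fixed horizon n.  v[k-1] = optimal expected reward with k
    observations left; accept the current X iff X >= v[k-2] (k > 1).
    Uses E[max(U, c)] = (1 + c**2) / 2 for U ~ Uniform(0,1), 0 <= c <= 1."""
    v = [0.5]                       # one observation left: accept it, mean 1/2
    for _ in range(n - 1):
        v.append((1 + v[-1] ** 2) / 2)
    return v                        # v[-1] is the optimal value for horizon n
```

The values increase with the number of remaining observations and stay below 1, reflecting that a longer horizon can only help.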

(P11). Bruss’ Odds–Theorem. Bruss considered the following optimal stopping problem. Let be independent Bernoulli random variables with success probabilities , respectively. We observe the sequence sequentially and want to stop at the time of the last success; i.e., the problem is to find a stopping time such that the probability is maximized. The Odds–Theorem [Bruss, Theorem 1] states that it is optimal to stop at the first time instance such that

with and . This statement has been used in various settings for finding optimal stopping policies. In what follows we will demonstrate that Bruss’ Odds–Theorem can be easily derived using the proposed framework.
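
The statement translates directly into a few lines of code. The sketch below (our naming; it assumes all success probabilities are strictly less than one) computes the critical index s as the largest index at which the summed odds r_j = p_j/(1−p_j), j = s..n, still reach at least 1, and returns the win probability (∏_{j≥s} q_j)(Σ_{j≥s} r_j) of stopping at the first success from s onward.

```python
def odds_policy(p):
    """Bruss' Odds-Theorem policy for success probabilities p = [p_1..p_n]
    (all p_j < 1).  Returns (s, win_prob): stop at the first success at a
    1-based index >= s; s = 1 if the summed odds never reach 1."""
    n = len(p)
    odds = [pj / (1 - pj) for pj in p]
    tail = 0.0
    s = 1
    for j in range(n - 1, -1, -1):       # accumulate odds from the end
        tail += odds[j]
        s = j + 1
        if tail >= 1:                    # largest index where the sum reaches 1
            break
    q_prod = 1.0
    for j in range(s - 1, n):
        q_prod *= 1 - p[j]
    return s, q_prod * sum(odds[s - 1:])
```

For example, with three fair coins the summed odds already reach 1 at the last trial, so the policy waits for the final toss and wins with probability 1/2.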

## 3 Sequential stochastic assignment problems

The unified framework we propose leverages the sequential assignment model toward the solution of the problems presented in Section 2. In this section we consider two formulations of the stochastic sequential assignment problem: the first is the classical formulation introduced by DLR, while the second one is an extension for random horizon.

### 3.1 Sequential assignment problem with fixed horizon

The formulation below follows the terminology used by DLR. Suppose that jobs arrive sequentially in time; we refer henceforth to the latter as the problem horizon. The th job, , is identified with a random variable which is observed. The jobs must be assigned to persons who have known “values” . Exactly one job must be assigned to each person, and after an assignment the person becomes unavailable for subsequent jobs. If the th job is assigned to the th person, then a reward of is obtained. The goal is to maximize the expected total reward.

Formally, assume that are integrable independent random variables defined on a probability space , and let be the distribution function of for each . Let denote the σ–field generated by : , . Suppose that is a random permutation of defined on . We say that is an assignment policy (or simply a policy) if for every and . That is, is a policy if it is non–anticipating relative to the filtration , so that the th job is assigned on the basis of the information in . Denote by the set of all policies associated with the filtration .

Now consider the following sequential assignment problem.

Problem (AP1). Given a vector , with , we want to maximize the total expected reward with respect to . The policy is called optimal if .

In the sequel the following representation will be useful:

here the random variables , are given by the one-to-one correspondence , , . In words, denotes the index of the job to which the th person is assigned.

The structure of the optimal policy is given by the following statement.

###### Theorem 1 (DLR)

Consider Problem (AP1) with horizon . There exist real numbers , such that on the first step, when a random variable with distribution is observed, the optimal policy is . The numbers do not depend on and are determined by the following recursive relationship

where and are defined to be . At the end of the first stage the assigned is removed from the feasible set, and the process repeats with the next observation, where the above calculation is performed relative to the distribution , and so on. Note that , , i.e., is the expected value of the job which is assigned to the th person.
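
As an illustration of this recursive relationship, the sketch below (function names ours) computes the interval breakpoints for i.i.d. Uniform(0,1) jobs, for which the one-step expectation E[min(max(U, lo), hi)] has the closed form lo² + (hi² − lo²)/2 + hi(1 − hi); clipping the outer breakpoints −∞ and +∞ to [0, 1] is harmless because the support is [0, 1].

```python
def dlr_breakpoints(n):
    """Interior breakpoints that partition the line into the n intervals
    used by the DLR policy for the first of n i.i.d. Uniform(0,1) jobs:
    the job falling in the i-th interval is assigned to the i-th person."""
    def e_clamp(lo, hi):
        # E[min(max(U, lo), hi)] for U ~ Uniform(0,1), 0 <= lo <= hi <= 1
        return lo * lo + (hi * hi - lo * lo) / 2 + hi * (1 - hi)
    b = []                               # one job remaining: no breakpoints
    for _ in range(n - 1):
        ext = [0.0] + b + [1.0]          # -inf / +inf clipped to the support
        b = [e_clamp(ext[i], ext[i + 1]) for i in range(len(ext) - 1)]
    return b
```

For two jobs the single breakpoint is the mean 1/2; for three jobs the breakpoints are 0.375 and 0.625, symmetric about 1/2 as expected for a symmetric distribution.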

### 3.2 Stochastic sequential assignment problems with random horizon

In practical situations the horizon, or number of available jobs, is often unknown. Under these circumstances the optimal policy of DLR is not applicable. This provides motivation for the setting with a random number of jobs. Jacobson considered the sequential assignment problem with a random horizon and showed that the optimal solution can be derived from the solution to an auxiliary assignment problem with dependent job sizes. Here we demonstrate that the problem with random horizon is in fact equivalent to a certain version of the sequential assignment problem with fixed horizon and independent job sizes.

Problem (AP2). Let be a positive integer-valued random variable with distribution , , , where can be infinite. Let be an infinite sequence of integrable independent random variables with distributions , independent of . Given real numbers the objective is to maximize the expected total reward over all policies .

In the following statement we show that Problem (AP2) is equivalent to a version of the standard sequential assignment problem with fixed horizon.

###### Theorem 2

In Problem (AP2) assume that and let , . For any one has , and the optimal policy in Problem (AP2) coincides with the optimal policy in Problem (AP1) associated with fixed horizon and job sizes .

Proof: For any we have , and

where we have used the fact that is –measurable and is independent of . Therefore . Note that are independent random variables, and the σ-fields and are identical. This implies the stated result.

###### Remark 1

To the best of our knowledge, the relation between Problems (AP2) and (AP1) established in Theorem 2 is new. It is worth noting that Jacobson developed an optimal policy by reduction of the problem to an auxiliary one with dependent random variables. In contrast, Theorem 2 shows that the problem with random number of jobs is equivalent to the standard sequential assignment problem with independent random variables which is solved by the procedure of DLR.

###### Remark 2

In Theorem 2 we assume that is finite. Under suitable assumptions on the weights and job sizes one can construct –optimal policies for the problem with infinite . However, we do not pursue this direction here.

## 4 A unified approach for solving sequential selection problems

### 4.1 An auxiliary optimal stopping problem

Consider the following auxiliary problem of optimal stopping.

Problem (B). Let be a sequence of integrable independent real-valued random variables with corresponding distributions . For a stopping rule define . The objective is to find the stopping rule such that

Problem (B) is a special case of the stochastic sequential assignment problem of DLR, and Theorem 1 has immediate implications for Problem (B). The following statement is a straightforward consequence of Theorem 1.

###### Corollary 1

Consider Problem (B). Let be the sequence of real numbers defined recursively by

(2) |

Let

(3) |

then
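
The recursion (2) and rule (3) can be put on a computer directly. The Monte Carlo sketch below is our own (names ours); it assumes the standard backward form v_k = E[max(X_k, v_{k+1})] with v_{n+1} = −∞, under which the optimal rule stops at the first k with X_k at least the continuation value, and it works for arbitrary distributions from which one can sample.

```python
import random

def stopping_values(samplers, m=100_000):
    """Monte Carlo sketch of the backward recursion for optimally stopping
    independent X_1, ..., X_n to maximize E[X_tau].  samplers[k-1]() draws
    one copy of X_k.  Returns the list v with v[k-1] = E[max(X_k, v_k+1)];
    the optimal rule stops at the first k < n with X_k >= v[k], and always
    stops at k = n."""
    n = len(samplers)
    v = [0.0] * n
    nxt = float("-inf")                  # v_{n+1} = -inf: must stop at time n
    for k in range(n, 0, -1):
        draw = samplers[k - 1]
        v[k - 1] = sum(max(draw(), nxt) for _ in range(m)) / m
        nxt = v[k - 1]
    return v
```

For i.i.d. Uniform(0,1) observations the estimates reproduce the exact values 1/2, 5/8, 0.6953… of the classical recursion, which serves as a convenient sanity check.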

### 4.2 Reduction to the auxiliary stopping problem

Problems (A1) and (A2) of Section 2 can be reduced to the optimal stopping of a sequence of independent random variables [Problem (B)]. In order to demonstrate this relationship we use well-known properties of the relative and absolute ranks, briefly recalled in the next paragraph.

Let , and let denote the set of all permutations of ; then for all and all . The random variables are independent, and for all . For any and

(4) |

and

(5) |

Now we are in a position to establish a relationship between Problems (A1) and (B).

##### Fixed horizon.

Let

(6) |

It follows from (5) that . Define

(7) |

By independence of the relative ranks, is a sequence of independent random variables.

The relationship between stopping problems (A1) and (B) is given in the next theorem.

###### Theorem 3

Proof: First we note that for any stopping rule one has , where . Indeed,

where we have used the fact that . This implies that . To prove the theorem it suffices to show that

(8) |

Clearly,

(9) |

Because are independent random variables, and , we have that for any with

(10) |

The statement (8) follows from (9), (10) and Theorem 5.3 of chow-rob-sieg. In fact, (8) is a consequence of the well-known fact that randomization does not increase rewards in stopping problems [chow-rob-sieg, Chapter 5]. This concludes the proof.

##### Random horizon.

Next, we establish a correspondence between Problems (A2) and (B). Let

(11) |

where is given in (6), and . Below in the proof of Theorem 4 we show that

Define also

(12) |

###### Theorem 4

Proof: (i). In Problem (A2) the reward for stopping at time is . The expectation of the reward conditional on the observations up to time is

(16) |

where we have used (4) and (5) with fixed horizon , and the independence of and . Together with (12) this implies that for any . The remainder of the proof proceeds along the lines of the proof of Theorem 3.

(ii). Let be the minimal integer such that

(17) |

The existence of follows from (13). In view of (16) and (17) for any stopping rule we have , and

This implies (14). In order to prove (15) we note that if is the optimal stopping rule in Problem (A2) then by (14) and definition of

which proves the upper bound in (15). On the other hand, in view of (14)

This concludes the proof.

###### Remark 3

### 4.3 Specification of the optimal stopping rule for Problems (A1) and (A2)

Now, using Theorems 3 and 4, we specialize the result of Corollary 1 to the solution of Problems (A1) and (A2). For this purpose we require the following notation:

Note that in Problem (A2) we put for distributions with a finite right endpoint ; otherwise , where is defined in the proof of Theorem 4. With this notation, Problem (B) is associated with the independent random variables for .

Let denote the distinct points of the set , . The distribution of the random variable is supported on the set and is given by

(18) |

(19) |

The following statement is an immediate consequence of Corollary 1 and formulas (18)–(19).

###### Corollary 2

Let , where the sequence is given by

(20) |