In the classical secretary problem, secretaries are interviewed one after the other as candidates for one job position. At the moment of each interview, the decision maker (DM) acquires information about a candidate’s competence (or quality), which allows her to rank the candidates examined so far and to decide when to stop the process by selecting the last candidate interviewed. The DM has no knowledge of who will come later on, and her decision after interviewing each candidate must be immediate and irrevocable. This describes a Sequential Selection Problem (SSP). (Depending on the context, the last letter of the abbreviations SSP and MSSP, the latter introduced herein, may refer to the respective selection ‘Problems’ or the associated selection ‘Processes’.) The class of SSPs is attractive for theoretical analysis as well as for practical use, due to its generality and evident relevance to online selection under realistic constraints. As in our work, SSPs are usually presented in the intuitive recruitment context.
The goal of the original problem is to select none but the best among the list of candidates, while in each interview the DM only learns the relative quality of the examined candidate, that is, his relative rank. The classical algorithm is a cutoff-based approach which comprises two phases: the rejection phase, where a number of candidates are automatically rejected, and the selection phase, where the first candidate ranked above the best one recorded during the first phase is hired (or the last candidate, when the overall best one appeared during the first phase). In essence, the former phase learns a threshold that is subsequently used in the latter to spot the first candidate that beats it. A significant advantage of cutoff-based strategies is that they are intuitive and easy to implement. The length of each phase is determined by a cutoff value and is subject to an exploration-exploitation trade-off that depends on the objective function to optimize. This aspect makes the problem interesting and intriguing. For instance, the optimal strategy that maximizes the probability of finding the best candidate requires a cutoff of approximately $n/e$, where $n$ is the number of candidates. Besides, when having access to the actual quality score of a candidate, rather than a relative quality such as his rank among the already examined candidates, one may be interested in maximizing the expected score of the selected candidate. For this objective function the optimal cutoff is approximately $\sqrt{n}$. Note that the multi-choice problem is a natural extension of the above (see Sec. 2).
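The two-phase cutoff rule described above can be sketched in a few lines. This is a minimal illustration for the classical single-job setting only; the function name `cutoff_select` and the example scores are assumptions made for this sketch and do not come from the paper.

```python
import math


def cutoff_select(scores, cutoff):
    """Return the index of the candidate hired by a cutoff rule.

    scores: candidate scores in order of arrival (higher is better)
    cutoff: number of candidates rejected during the first phase
    """
    # Rejection phase: only record the best score seen among the first `cutoff`.
    best_seen = max(scores[:cutoff], default=float("-inf"))
    # Selection phase: hire the first candidate that beats the recorded best.
    for i in range(cutoff, len(scores)):
        if scores[i] > best_seen:
            return i
    # Nobody beat the threshold: the last candidate is hired by default.
    return len(scores) - 1


# Hypothetical example with n = 8 candidates and the classical n/e cutoff.
scores = [0.3, 0.7, 0.5, 0.9, 0.8, 0.1, 0.6, 0.2]
cutoff = math.floor(len(scores) / math.e)
hired = cutoff_select(scores, cutoff)
```

Here the first two candidates are rejected, and the first later candidate that exceeds the best of them is hired. When the overall best candidate falls inside the rejection phase, the rule ends up with the last candidate, which is the failure case mentioned in the text.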
Motivation and contribution. Two important limitations that, to the best of our knowledge, characterize the existing SSPs in the literature are that: i) they operate in a single round, where only one sequence of candidates is processed, and ii) they consider a cold-start initialization, where no jobs are assigned at the beginning of the round. Our motivation derives from real-world recruitment processes that take place in large organizations or companies that aim to adapt dynamically to their operating environments. Typically, an organization already has many employees, and the DM has the challenging task of keeping the personnel as competitive as possible at any moment in time. Moreover, the DM has to ensure that jobs are always assigned to employees. It is easy to see why such an organization requires constant recruitment processes in parallel to its operation cycle, a setting that goes beyond existing SSPs. For this purpose we introduce a new online-within-online problem, the Multi-round Sequential Selection Problem (MSSP). Specifically, the selection rounds are launched one after the other, each one having at hand a preselection set which is the output of the previous round. In each round, new candidates are interviewed sequentially, and the DM wants to improve the preselection set in the best possible way. The MSSP brings two new features regarding the preselection: its availability, as employees are allowed to quit their jobs just before the beginning of a round, and its relative quality w.r.t. the new candidates. We propose a novel algorithm for the MSSP, called Cutoff-Based Regret Minimization (CRM), that manages the same jobs in each round. As suggested by its name, it is based on a cutoff rule, so a cutoff value is required as input.
The objective set function that CRM minimizes, termed regret, is the sum of the ranks of the selected individuals and, as such, it can be applied to any score distribution and performs equally efficiently. In our technical contributions, we derive a complete analytical formula for the expectation of the regret. Furthermore, we infer the analytical optimal cutoff, given the number of jobs, the number of candidates, and the number of resignations at the beginning of the round, for the specific case where the quality of the preselection is set to be 1/2. The latter result is then complemented by an empirically generated table of cutoffs that can be plugged into CRM for any arbitrary quality.
The rest of the paper is organized as follows. Sec. 2 mentions related works; Sec. 3 introduces the novel MSSP setting; Sec. 4 details the proposed CRM algorithm and a study of the expectation of its main parameters; Sec. 5 contains our experimental results that validate our analytical results with simulations highlighting the superior performance of our approach compared to existing strategies; Sec. 6 provides all the technical proofs and discussion; finally, our conclusions and future work are included in Sec. 7.
2 Related work
Various extensions of the basic secretary problem have been investigated; see the non-exhaustive surveys in [12, 11]. It is important to note that a change in the setting or in the objective function also changes the optimal cutoff. In some scenarios, the DM can not only compute the relative rank of an interviewed candidate among those examined earlier, but also assess the candidate’s true quality score. This score can be thought of as a random variable associated with each candidate. When the objective is to maximize the expected score of the selected candidate, the optimal cutoff becomes approximately $\sqrt{n}$, where $n$ is the number of candidates. On the other hand, Robbins’ problem seeks to minimize the expected rank of the selected candidate (note: low ranks are better). However, the analytical solution to this problem remains unknown, even when the score distribution of the candidates is known.
Notable variants are those related to multiple stopping, or simply $k$-choice, where the DM has to select $k$ candidates [15, 13, 4, 2, 16, 1, 5, 18]. In that case, the objective set function can be modular (i.e. equivalent to adding up the application of the function to the set elements independently), submodular, supermodular, or sometimes be subject to matroid constraints [9, 10, 20]. Non-modularity introduces interesting set evaluation aspects, such as the complementarity or mutual enhancement among the selected candidates, which are however out of the scope of this work. Regarding modular objective functions, the $k$-choice problem has been studied with the objective of maximizing the sum of scores of the selected candidates, without assuming prior knowledge of the score distribution. An interesting finding is that the optimal cutoff for that setting does not depend on $k$.
A very limited number of papers study algorithmic notions related to repeated selections, as well as the human capacity to learn the right cutoff after reviewing multiple independent candidate sets [14, 4]. However, one such work develops a non-cutoff-based strategy that is evaluated with respect to two distinct aims: to maximize the probability of selecting the best, or to maximize the expected score of the selected candidate. Its conclusion states that learning the score distribution contributes to the efficiency of the selection only w.r.t. the second aim. An experimental comparison of simpler and intuitive non-cutoff-based heuristics has also been reported. More sophisticated adaptive strategies worth mentioning are those based on Bruss’ odds theorem. A rather different scenario concerns a startup company (or a new ambitious business unit) which is initially founded by a handful of people but is about to grow larger. The so-called hiring problem refers to the SSP process that aims at driving the optimal growth of personnel using an adaptive selection threshold based on the already employed individuals. Among heuristics such as hiring above the worst or the best employee, hiring above the mean employee score has been shown to be the best performing strategy.
3 The Multi-round SSP
3.1 General setting
The environment of the problem is set on a large population of job-seekers. Each individual has some qualitative skills that are quantified by a single-valued score, and a status of availability; both may vary through time. The Multi-round Sequential Selection Problem (MSSP) is entrusted to a decision maker (DM) who is responsible for managing a limited budget of non-distinguishable job positions, for which she has the authority to hire or fire employees. For each sample of candidates, she launches a series of interviews. Upon arriving, a candidate reveals his score. (For simplicity, interval notation over indices refers to the set of all integers contained in the interval.) Essentially, each round constitutes a separate Sequential Selection Process (SSP). Unlike traditional cold-start SSPs that build a selection set from scratch, a round of the MSSP starts with the existing job-to-employee assignment decided in the previous round. At that point, however, this assignment may be partially obsolete due to resignations. This preselection offers information for a warm start of the current SSP, but comes with constraints on how jobs can be managed. More specifically, our setting i) allows each position to be (re-)assigned at most once in each round; ii) follows an “only fire on hire” logic, where an employee is dismissed only when he can be immediately replaced by a better one.
The environment is considered to be fixed during each SSP round. However, the MSSP setting allows changes to occur between any two rounds regarding the score of each individual and his availability, since any employed subject can resign. The process may have an arbitrary number of SSP rounds. Therefore, the challenge for the DM is to adapt or improve the personnel in the course of a multi-round process: at the end of any round, the aim is to have selected the best individuals she could have chosen while respecting all the above management constraints. Formally, a single SSP round employed in the MSSP model is described in Definition 1.
Round of the MSSP: The sequential selection process that takes place at a given round is described by a tuple whose elements are:
the sample drawn from the underlying population at that round, which contains the ids of the candidates of the round in order of appearance;
the set of quality scores, indexed by the order of the candidates and given by a mapping from candidates to scores;
a collection of information available to the DM that describes the state of the multi-round process;
the set of all possible actions the DM can take after seeing a candidate, together with the specific sequence of decisions taken;
the regret function, described below (see Eq. 2).
According to Definition 1, the DM needs to determine sequentially the decisions (reject or accept, for each candidate) that yield minimal regret at the end of the round, with the regret defined in Sec. 3.3. Note that, although the score of an individual may vary in time, we let that time dependence be carried by its time-dependent input. Thus, our simplified notation merely represents the score measured for the individual who appears as a given candidate of a given round. Finally, the collection of observable information may contain elements related to the general multi-round process and is discussed in the next section.
When a cutoff-based approach (see Sec. 4.1) is considered in each round of the MSSP, an exploration-exploitation trade-off emerges on how to adjust the length of the rejection phase before starting to select from the candidates. In the MSSP setting, the behavior of the trade-off in each SSP round is complex and interesting, since the selection is conditioned by the preselected items and the constraints. The derivation of the cutoff value in the MSSP setting (see Sec. 4.2) is among the major contributions of this work.
3.2 Observable information
At the beginning of a selection round, the DM has at her disposal some information regarding the status of the overall multi-round process. According to Definition 1, the collection may contain variables of multiple types. In this work we consider that it specifically contains:
The number of candidates to appear, which renders the SSP a finite-horizon process. The knowledge of the sample size may be justified by considering that the DM has limited capacity or time to conduct the interviews.
The preselection: a set of previously selected individuals (see Definition 2). In addition, the DM has access to the scores and the availability of that set, i.e. which of the employees have just resigned. It is assumed that resignations occur just before the beginning of an SSP round, so that the currently known scores of resigned individuals are still accurate. We also refer to the subset of previously selected individuals that have not resigned, and to the number of resignations between two consecutive rounds.
The quality of the preselection: an estimate of the relative quality of the preselection compared to the candidates that are going to be interviewed at this round (see definition below).
Preselection: The preselection for a given round is composed of the individuals that were selected as the output of the previous round. For convenience, it is considered sorted in descending score order.
The relevant contribution of the preselection is the fact that, even when all listed individuals have resigned at the current round, their scores still provide information about the previous round, and specifically about the top quantile of the score distribution over the underlying population. Essentially, the ‘goodness’ of the preselection for the current round, which we call quality of preselection, reflects a “small distance” between the respective score distributions of the two rounds, and can therefore express how valuable the prior knowledge carried by the preselection is for the current round. Defining and measuring this distance directly is impractical due to lack of information, as well as complicated, since the selection process may be affected by: i) changes in the shape of the score distribution that leave the population ranking intact, ii) changes in the ranking while the score distribution remains intact, iii) a combination of the previous two.
In this work we reduce the complications by assuming that this distance is “sufficiently small” and by defining the true quality through a rank-based evaluation of the preselection w.r.t. the candidates. Let us suppose that the scores of both sets were known, and hence that their joint ranking could be computed. Then, the true relative quality is defined as follows.
True rank-based relative quality of preselection: For a given SSP round, this is the average rank of the individuals that compose the preselection, taken in the joint ranking of the preselection and the candidates, and normalized by the maximum rank.
Here the ranking function ranks the preselection and the candidates jointly. Since low ranks are better, a low value of the quality refers to a highly-skilled preselection.
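As an illustration of this definition, the rank-based relative quality can be computed as below. This is a sketch under stated assumptions: scores are distinct, rank 1 denotes the best individual, and the "maximum rank" is taken to be the total number of jointly ranked individuals. The function name `relative_quality` is hypothetical.

```python
def relative_quality(preselection_scores, candidate_scores):
    """Average joint rank of the preselection, normalized by the maximum rank.

    Assumes distinct scores; rank 1 is the best (highest) score.
    The maximum rank is assumed to be the size of the joint set.
    """
    joint = sorted(list(preselection_scores) + list(candidate_scores), reverse=True)
    # Map each score to its rank in the joint ranking.
    rank = {score: r for r, score in enumerate(joint, start=1)}
    average_rank = sum(rank[s] for s in preselection_scores) / len(preselection_scores)
    return average_rank / len(joint)


# A preselection that dominates the candidates gets a low value.
strong = relative_quality([10, 9], [1, 2, 3])
# A preselection dominated by the candidates gets a value close to 1.
weak = relative_quality([1, 2], [3, 4, 5])
```

With this convention, a preselection drawn from the same distribution as the candidates has a quality close to 1/2 on average, which matches the special case analyzed later in the paper.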
Estimating the quality. Having a good estimate of the true quality is crucial for the aims of the DM. In particular, as we discuss in detail in the next sections, this quantity is one of the factors that affect the rejection phase of a cutoff-based SSP (see Sec. 4.2). The main focus of this work is the investigation of this dependency; however, we believe that advanced statistical machine learning methods can be employed for the challenging problem of estimating the quality in the multi-round setting.
3.3 Regret function
During an SSP round, the DM stores the sequence of actions taken: each entry corresponds to one candidate and records whether he is accepted or rejected. A policy should optimize an objective function. Instead of focusing on the actual selected scores, here we choose to minimize the expectation of the following rank-based regret function, which is more robust to highly skewed or changing score distributions:
where the first quantity is the number of new hires up to a given step, and the second is the preselection set without the resigned individuals. The last term is the minimal cost obtainable using an offline strategy and is detailed in Sec. 4.2. Without loss of generality, we assume that the output of the SSP will always be a set of employees whose size equals the number of jobs. If some new hires have been decided among the candidates, then only the best of the preselection, as many as the jobs left, will remain employees. Eq. 15 subtracts the minimal regret from the average true rank of the selected individuals. The reason is that we seek a policy that performs as close as possible to the offline case, where the DM would have known the best scores within the sample and would also have had direct access to the respective candidates to select them. For simplicity, we henceforth refer to a single SSP at a given round and drop the round index in our variable notations.
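To make the rank-based regret concrete, here is a small sketch. It simplifies the setting in ways that are assumptions of this example and not taken from the paper: ranks are computed only over the individuals that are actually available (non-resigned preselection and candidates), scores are distinct, and the regret is the sum of the ranks of the final selection minus the sum of the best attainable ranks. The name `rank_regret` is hypothetical.

```python
def rank_regret(selected_scores, available_scores):
    """Sum of ranks of the selection minus the best attainable sum of ranks.

    selected_scores:  scores of the individuals employed at the end of the round
    available_scores: scores of everyone who could have been employed
    Assumes distinct scores; rank 1 is the best (highest) score.
    """
    ordered = sorted(available_scores, reverse=True)
    rank = {score: r for r, score in enumerate(ordered, start=1)}
    b = len(selected_scores)  # number of jobs
    achieved = sum(rank[s] for s in selected_scores)
    # An offline strategy would pick the b best individuals: ranks 1, ..., b.
    offline = b * (b + 1) // 2
    return achieved - offline


available = [0.95, 0.9, 0.6, 0.5, 0.4, 0.3, 0.2]
perfect = rank_regret([0.95, 0.9, 0.6], available)
poor = rank_regret([0.9, 0.5, 0.3], available)
```

Because only ranks enter the computation, the value does not change if all scores are transformed by any strictly increasing function, which is the robustness property the text refers to.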
4 An algorithm for the Multi-round SSP
4.1 Proposed policy
There is a series of issues that trouble the DM in this setting. First, she should devise an algorithmic way to incorporate the preselection into the SSP round, mainly because its size brings a combinatorial aspect to comparing and updating that list. Ideally, we should not only seek the best reassignment per job, but rather the altogether best set of job reassignments that optimizes the ranks of our selection (see Sec. 3.3). Second, the preselection might not represent well the quality of the new sample of candidates (see Sec. 3.2), hence the need to learn from the initially incoming candidates. Inspired by the secretary problem, we develop the Cutoff-based Regret Minimization (CRM) policy:
Cutoff-based Regret Minimization (CRM): A two-phase SSP strategy that proceeds as follows: First, a rejection phase learns the modalities of the sample by rejecting the first candidates. A subsequent selection phase selects immediately and irrevocably an incoming candidate whose score exceeds a threshold which is adjusted using the information collected during the rejection phase. The cutoff-rule should be adjusted with a proper value so as to minimize the regret of the final selection (see Eq. 15).
Given the sequence of decisions for the candidates of the sample, the cutoff rule formally implies that every candidate arriving during the rejection phase is rejected. Recall also that the number of selections decided up to a given step is the sum of the acceptance decisions taken so far; as there are no selections during the rejection phase, it is zero up to the cutoff.
The new information that becomes available to the DM during the rejection phase of the SSP gets incorporated into what we call the reference set (see Definition 5).
Reference set: The set composed of the best individuals known by the DM after a given step of an SSP round, as many as there are jobs. At the beginning of the round, the set is initialized with the preselection; thereafter, it is updated during the sequential examination of candidates up to the cutoff, where it takes its final form. After having seen a given number of candidates, the following properties hold:
The reference set contains the best known individuals so far, no matter if they were already employees or rejected candidates of the first phase. The CRM policy computes a threshold value at any given step that a candidate needs to exceed to be accepted. The threshold may vary along the process depending on the candidates seen so far and the decisions that were made.
Quality threshold: Score value to beat at step of the SSP when the CRM policy is applied with cutoff value :
where with being the number of resigned preselected (change of availability since the end of the previous round) and being the number of rejected candidates that have been added in the reference set. We also define the threshold in terms of ranks, , similar to except that the scores are replaced by their corresponding ranks using the function that ranks jointly the preselection and the candidates. Finally, we define its expectation as:
After the conclusion of the rejection phase, no job reassignments have been decided yet. The CRM policy uses the final form of the reference set in order to define a list of thresholds to beat regarding each job individually, and thereby guarantees that every job either remains filled by the employee who was determined in the previous round (if he has not resigned and is still at his post), or is reassigned to a strictly better candidate.
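The mechanics of such a round can be sketched as follows. This is a deliberately simplified illustration and not the paper's exact policy: the quality threshold of Definition 6 is replaced by the assumption that each new hire must beat the lowest remaining score of the final reference set, and a hire is forced when the remaining candidates are only just enough to fill the vacancies left by resignations. The function name `crm_round` and its signature are hypothetical.

```python
def crm_round(preselection, resigned, candidates, cutoff):
    """Simplified CRM-style round on raw scores (higher is better).

    preselection: scores of the previously selected employees
    resigned:     indices (into preselection) of employees who resigned
    candidates:   scores of the new candidates, in order of arrival
    cutoff:       number of candidates rejected in the first phase
    Returns the scores of the employees at the end of the round.
    """
    b = len(preselection)  # number of jobs
    # Employees still in place, worst first.
    staff = sorted(s for i, s in enumerate(preselection) if i not in resigned)
    vacancies = b - len(staff)

    # Reference set: the b best scores known after the rejection phase,
    # taken from the preselection (resigned included) and rejected candidates.
    known = list(preselection) + list(candidates[:cutoff])
    thresholds = sorted(sorted(known, reverse=True)[:b])  # lowest first

    hired = []
    n = len(candidates)
    for i in range(cutoff, n):
        if len(hired) == b:
            break  # each job is (re-)assigned at most once per round
        score = candidates[i]
        beats = score > thresholds[0]
        forced = vacancies >= n - i  # too few candidates left to stay picky
        if not (beats or forced):
            continue
        if beats:
            thresholds.pop(0)  # this threshold has been used up
        if vacancies > 0:
            vacancies -= 1  # fill a vacant job first
        else:
            staff.pop(0)  # "only fire on hire": replace the worst employee
        hired.append(score)
    return sorted(staff + hired, reverse=True)


# No resignation: two candidates beat the thresholds and replace two employees.
improved = crm_round([0.9, 0.5, 0.3], set(), [0.4, 0.6, 0.95, 0.2], cutoff=1)
# One resignation and weak candidates: the last candidate is hired by force.
forced = crm_round([0.9, 0.5], {1}, [0.1, 0.2, 0.3], cutoff=1)
```

In this sketch, consuming thresholds from the lowest upwards means that any employee who is replaced is replaced by a strictly better candidate, in line with the guarantee stated above. The forced hire in the second example is the kind of failure that the experimental section attributes to vacant jobs at the end of a round.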
4.2 Optimal stopping time for a single SSP round
The key variable that needs to be specified for a CRM policy (see Alg. 1) is the cutoff value, hence each policy in the set of all possible CRM policies is indexed by its cutoff. A given policy proceeds as follows: an incoming candidate is hired if his score is better than the current threshold and if fewer candidates than there are jobs have been hired so far; or if no competitive candidate has been found throughout the sample so far and some ‘extra’ jobs (i.e. due to resignations) are still vacant and would otherwise be lost. More formally, a new hire is defined by:
The goal is to find the best CRM policy s.t.:
The set of candidates is a random variable, hence so are the ranks of the candidates, the ranks of the preselection and the regret computed on them. For simplicity, we write . The expectation of the regret is given in Proposition 4 when taking into account the policy’s specifications. In order to find it, we decompose the process and study each variable involved.
The expectation of the rank of the threshold at the beginning of an SSP, is a function of the quality of the preselection :
Let be the number of resignations from a round to the subsequent one. The expectation of the rank of the -th individual in the preselection that has not resigned, is:
Proofs of Proposition 1 and Proposition 2 are given in Sec. 6. From now on we exclusively study the situation where the true relative quality of the preselection is equal to 1/2, i.e. the preselection and the candidates are drawn from the same distribution. Indeed, the analytical computation of the main variables of the problem is extremely challenging when the quality differs from 1/2. Nevertheless, we conduct simulations for every value of the quality. Since candidates are uniformly sampled from the population, a candidate has equal probability of having any of the possible relative ranks when the quality is 1/2 (see Lemma 1).
Consider an SSP round with candidates of size , and a preselection of size with true relative quality . The probability for a candidate to have a rank of given that is smaller than the threshold , is given by:
Following Lemma 1, we define the probability for a candidate to have a rank smaller than the threshold :
The probability for the number of new hires, decided up to step , to be smaller than can be approximated by:
where and .
The expected minimal cost an offline algorithm can achieve , is given by:
where the last term is the expected number of employees that resigned and that belong to the best individuals.
The proof of Proposition 4 is given in Sec. 6. It gives a general statement of the expectation of the regret when using CRM to deal with a single SSP. It turns out that this latter equation holds for any value of the relative quality, provided that the quantity derived for quality 1/2 is replaced by its counterpart that depends on the quality. Moreover, the variable that gives the current threshold to be beaten also depends on the relative quality of the preselection. We found that when the quality differs from 1/2, the difficulty of finding an analytical solution increases enormously, which is why we performed the theoretical analysis on the specific case where the quality equals 1/2.
At the end of the rejection phase, i.e. after having interviewed the candidates up to the cutoff without hiring any of them, and before the selection phase, the DM is given a threshold (see Definition 6) below which she will not hire any incoming candidate. Its expectation in terms of ranks is given by:
We are now able to write Eq. 15 entirely as a function of the problem parameters when the quality equals 1/2. We should recall that one of the goals of this paper is to find the cutoff that verifies Eq. 7, which is equivalent to finding the minimizer of the expected regret in Eq. 15. Unfortunately, this equation is analytically intractable. However, the optimal cutoff can easily be derived numerically (see Fig. 2).
5.1 Optimal stopping time for a single SSP round
So far we have been analyzing a single SSP round. Let us consider the realizations of the regret at a specific round of an MSSP test, one realization per test. The empirical average of the regret is simply the mean of these realizations over the tests. Thanks to the strong law of large numbers, this average converges almost surely to the expected regret as the number of tests grows. In order to verify the accuracy of our analytical formula in Eq. 15, we simulate each SSP a large number of times, for a fixed number of candidates and a fixed preselection quality. The top row of Fig. 3 displays a heatmap of the empirical average regret (simulated) w.r.t. the number of jobs (x-axis) and the value of the cutoff (y-axis). The solid white line in each heatmap follows the path of the lowest simulated value of the heatmap, i.e. the empirically optimal cutoff. These plots should be compared with those in the bottom row, which show the heatmaps of the expected regret according to our analysis. The dashed white line again follows the path of the lowest heatmap value, i.e. the analytically optimal cutoff (see Eq. 7). The respective plot w.r.t. the number of candidates and the preselection size is given in Fig. 2. From Fig. 3 it is clear that the simulations agree with the analysis, as expected from the law of large numbers: the lemmas and propositions of Sec. 4.2 are consistent with these experiments.
Note that the optimal cutoff increases with the number of jobs up to a certain turning point and later decreases. This phenomenon can be explained by the following trade-off: the algorithm has to ensure that the acceptance threshold is high enough so that the accepted candidates are competitive with the rest of the sample; on the other hand, when there are many jobs compared to the number of candidates, the DM does not have to be that demanding and hence should not risk rejecting a good candidate.
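The empirical procedure used in this section, averaging simulated regrets and reading off the cutoff with the lowest average, can be illustrated on a much simpler problem. The sketch below is not the CRM simulation: it assumes a single job, no preselection, and a regret equal to the rank of the hired candidate minus one. The function names and the parameter values are hypothetical.

```python
import random


def simulate_regret(ranks, cutoff):
    """Rank-based regret of the classical cutoff rule on one sample.

    ranks: true ranks of the candidates in order of arrival (1 is best)
    """
    # Best (lowest) rank seen in the rejection phase; n + 1 if the phase is empty.
    best_seen = min(ranks[:cutoff], default=len(ranks) + 1)
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r - 1  # hired a candidate of rank r; the offline optimum is 1
    return ranks[-1] - 1  # forced to take the last candidate


def best_cutoff_by_simulation(n, num_tests, seed=0):
    """Empirical average regret per cutoff and the cutoff minimizing it."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_tests):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)  # uniformly random arrival order
        samples.append(perm)
    # The same samples are reused for every cutoff to reduce noise.
    averages = [
        sum(simulate_regret(p, c) for p in samples) / num_tests for c in range(n)
    ]
    best = min(range(n), key=averages.__getitem__)
    return best, averages


best, averages = best_cutoff_by_simulation(n=20, num_tests=4000, seed=0)
```

A cutoff of zero always hires the first candidate, so its average regret is close to (n - 1) / 2. Any reasonable cutoff does considerably better, and the minimizer of `averages` plays the role of the solid white line in the heatmaps described above.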
5.2 Multi-Round SSP
In the previous section we focused on the optimal stopping of a single SSP with a given number of jobs, a number of candidates, and a relative quality of preselection. In this section we intend to plug those results into the multi-round setting (MSSP). For the simulations of this section we use the following parametrization. Firstly, each multi-round simulation considers a fixed population of individuals, and the number of candidates is fixed for all rounds. Secondly, the probability for any employee to resign in the time between two subsequent rounds is set equal for all, and is considered to be known in advance by the DM, so that it can be set directly. Lastly, we use the same stationary score distribution during a simulation in order to generate the realizations of the score random variable. Recall that the score vector contains the scores of the candidates within the sample of a round. We test, though, various score distributions for different simulations.
In Fig. 4, regardless of the cutoff definition, the CRM algorithm effectively reduces the regret through the iterations for the three considered score distributions: uniform, normal, and exponential. Furthermore, our proposed cutoff (red curves) outperforms other proposals from the general SSP literature, as well as simple heuristics. These results show that the rank-based CRM has consistent and similar behavior across different score distributions. For that reason, we simply use the uniform score distribution in the rest of our simulations.
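One reason a rank-based criterion behaves similarly across score distributions is that ranks are unchanged by any strictly increasing transformation of the scores. The short check below illustrates this with uniform scores mapped to exponential scores by inverse transform sampling. The helper `ranks` and the sample size are assumptions made for this example.

```python
import math
import random


def ranks(scores):
    """Rank of each score within the list (1 is the highest score)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


rng = random.Random(0)
uniform_scores = [rng.random() for _ in range(10)]
# Inverse transform sampling: -log(1 - u) is exponentially distributed
# and strictly increasing in u, so the ordering of the sample is preserved.
exponential_scores = [-math.log(1.0 - u) for u in uniform_scores]
same_ranks = ranks(uniform_scores) == ranks(exponential_scores)
```

Since a policy that only compares scores makes the same decisions on both samples, its rank-based regret is identical in the two cases.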
CRM has two essential parameters, namely the cutoff and the threshold at each step of the process. Fig. 4 provides empirical support that our proposed cutoff is a good choice. As for the second parameter, we use a changing threshold, given in Definition 6, which is adapted to each hiring decision individually as the round proceeds. In Fig. 5 we simulate the MSSP using CRM and compare the adapted threshold (solid lines) against a fixed threshold that corresponds to the score of the worst available employee of the preselection (see Proposition 5). The latter option is shown to be clearly suboptimal, especially as the difference between the number of candidates and the number of employees gets smaller.
Our final experiments investigate the role of resignations, which until now were considered improbable so that we could focus on other aspects of the MSSP. As presented, the MSSP allows preselected individuals to resign from their job at the beginning of a round with a given probability. Fig. 6 displays the average regret w.r.t. the round number for different resignation probabilities. Notice that, under ‘usual conditions’ (i.e. when resignations are rare), the previous simulations (see the top row of Fig. 4) showed that an alternative cutoff is a decent substitute for our proposed one, although it fails to reduce the regret when resignations become frequent (see Fig. 6(c)). A large number of resignations can occur when the environment changes abruptly (e.g. the company’s future, changes in the job market, etc.), or when the time interval between two subsequent rounds is very long and more employees may happen to resign. Another observation on this scenario is that CRM seems to struggle to make the regret converge towards zero. This effect is a consequence of being forced to select the last candidate(s) in order to assign all vacant jobs. This is a known deficiency of most SSP settings, and additional efforts should be made to find ways to reduce these ‘failures’ with a more adaptive strategy.
6 Technical proofs
For convenience, we state again lemmas and propositions in this self-contained technical section. Recall that SSP’s regret that takes into account resignations is formally defined as follows:
where is the number of new hires up to step , is the preselection set without the resigned individuals s.t. , , and finally, is the expected minimal cost for the selection process, i.e. the average sum of the -best ranks when no employee has resigned from round to round . As a matter of fact, some of the -best candidates that were also part of the preselection might have resigned, therefore, the optimal (minimal) cost is .
True rank-based relative quality of preselection: For SSP, that is the average rank of the individuals of preselection compared to the candidates, normalized by the maximum rank:
where is the function that ranks jointly preselection and candidates. Note that for a highly-skilled preselection.
To provide analytical results that are as accurate as possible, in what follows we take the estimate of the relative quality of the preselection at a given SSP to be precisely equal to its true value. For clarity, we focus on a single SSP round and hereafter drop the round subscript.
The variable stipulates if the -th interviewed candidate, , is accepted or not:
Less formally, is hired if its score is better than the current quality threshold and there are still unassigned jobs (i.e. less than hires have been decided so far, or due to resignations) or if no competitive candidate has been found throughout the selection and there are still vacant positions due to resignations. It therefore includes the failures whereas does not, .
Quality threshold: Score value to beat at step of the SSP for candidate to get hired, when the CRM policy is applied with cutoff value :
where with being the number of resigned preselected (change of availability since the end of the previous round) and being the number of rejected candidates that have been added in the reference set. We define the threshold also in terms of ranks, , which is similar to except that scores are replaced by the corresponding ranks using the function that ranks jointly the preselection and the candidates. Finally we define its expectation as:
The expectation of the rank-based threshold at the beginning of an SSP, , is a function of the relative quality of the preselection :
If we denote as the -th realization of the variable , its empirical average is:
Each realization is independent. Thanks to the law of large numbers we get: . The series is increasing and . is randomly sampled from s.t. , thus it is uniformly distributed in , …, . Hence:
We have and (note that the variable does not exist), therefore:
For instance, if , then:
Similarly we get:
Recursively we have:
Then, for the expectation:
Let be the number of resignations from a given round to the subsequent one. The expectation of the rank of the -th non-resigned individual in the preselection, , is given by:
We assume that any resignation from the preselection at the beginning of the round is equiprobable. The method is similar to that of Eq. 8: we first seek the expectation of the rank of the highest-ranked preselected individual. This is then computed recursively by considering the probability of each of the possible values it can take:
Then each expected rank is computed using: , . ∎
From now on we exclusively study the case where the true relative quality of the preselection is equal to 1/2, i.e. the preselection and the candidates are drawn from the same distribution. Indeed, the analytical computation of the main variables of the problem is particularly challenging when the quality differs from 1/2. Nevertheless, we conduct simulations for every value of the preselection quality.
Consider an SSP round with candidate set of size , and a preselection of size with true relative quality s.t. follows Eq. 8. The probability for a candidate to have a relative rank smaller than the threshold is given by:
The second equality comes from the assertion that , . The random variable