The secretary problem, also known as the marriage problem, consists in choosing the best candidate when only the relative ranks with respect to the previously interviewed candidates are known and rejected candidates are lost for good. The number of candidates is known before the interviews start. After each interview we have to decide whether to accept the candidate or not. Our goal is to choose the best candidate, i.e. we have to decide when the recruitment process should be stopped. In a more general situation we want to choose not only the single best candidate but the best members who form a team.
In the simplest case there are no restrictions on the recruitment process or on the relationships inside the team. In this paper we focus our attention on a restriction of the recruitment process, namely we can choose only those candidates who are not dependent on the candidates rejected earlier in the current recruitment.
Such a recruitment process requires a precise explanation of the meaning of the phrase “independent of previously rejected candidates”. As the next step we have to determine the stopping rule that yields the optimal stopping time. The main aim of this paper is to formulate a sufficiently general but practically useful structure of dependence.
Finding the optimal solution of the problem described above seems impossible in the general case. Therefore we study optimal algorithms only in some particular, but apparently useful, cases.
The paper is organised as follows. In Section 2 the classical secretary problem is introduced. Next, the variant of this problem in which accepted candidates must be independent of the rejected ones is presented. Section 3 introduces the best-known independence structures: matroids and their generalisation, greedoids. At the end of that section, the problem for a general greedoid is formulated. In Section 4 some particular, selected models are introduced. For the simpler models, solutions are given. For the more complicated models, only some connections between known results (for example from random graph theory) and problems of optimal stopping in such models are discussed.
2 Secretary problem
2.1 Classical secretary problem
In the classical secretary problem there are $n$ linearly ordered elements. They are observed in a random order $x_1, x_2, \dots, x_n$, all $n!$ orders being equally likely. At the moment $t$ the observer knows only the relative ranks of the elements $x_1, \dots, x_t$ examined so far. Once rejected, an element cannot be recalled.
The aim of the observer is to choose the currently examined element in such a way that the probability that it is the best of all $n$ elements is maximal.
This problem is well known and solved. Dynkin (1963) showed that for large $n$ it is approximately optimal to wait until a fraction $1/e$ of the elements has appeared and then to select the next relatively best one. The probability of success is also approximately $1/e$. More precisely, we can present this result as follows.
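Let us assume that the selection algorithm has the following form.
Reject the first $k$ elements, for some $k < n$.
For $t > k$, accept the element examined at the moment $t$ if it is better than all previously examined elements, or reject it otherwise. The rejection is irrevocable.
The process stops when an element is accepted or when all $n$ elements have been examined.
Theorem 1. If $n \to \infty$ and $k/n \to 1/e$, then the probability of selecting the best element is asymptotically maximal and tends to $1/e$.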
The easy proof of Theorem 1 is a good pattern for the considerations used in the more general models in the next parts of this article. Therefore, this proof is presented in more detail than is required in this particular case.
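Proof (see Ferguson (1989)). Assume that the first $k$ elements are rejected and let $x^*$ be the best among them. Next, select the first subsequent element that is better than $x^*$. For an arbitrary $k \ge 1$, the best element is selected exactly when it appears at some position $j > k$ and the best of the first $j-1$ elements lies among the first $k$. Hence the probability that the best element is selected is
$$P_n(k) = \sum_{j=k+1}^{n} \frac{1}{n}\cdot\frac{k}{j-1} = \frac{k}{n}\sum_{j=k+1}^{n}\frac{1}{j-1}.$$
Letting $n \to \infty$ with $k/n \to x \in (0,1)$, the sum becomes a Riemann sum and
$$P_n(k) \to x\int_x^1 \frac{dt}{t} = -x\ln x.$$
The maximum of $-x\ln x$ is achieved for $x = 1/e$, and its value is $1/e$. ∎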
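The exact probability in the proof and its asymptotic maximiser are easy to check numerically. The following Python sketch (function names are ours, for illustration only) evaluates the success probability of the cutoff rule for every $k$, picks the best cutoff, and cross-checks it with a Monte Carlo run of the rule; the elements are encoded as the integers $0, \dots, n-1$, with $n-1$ the best one.

```python
import math
import random


def success_probability(n: int, k: int) -> float:
    """Probability that the cutoff rule picks the best of n elements
    when the first k elements are rejected unconditionally."""
    if k == 0:
        return 1.0 / n  # the first element is always accepted
    # best element at position j > k and best of the first j - 1 among the first k
    return (k / n) * sum(1.0 / (j - 1) for j in range(k + 1, n + 1))


def optimal_cutoff(n: int) -> tuple[int, float]:
    """Cutoff k maximising the success probability, and that probability."""
    best_k = max(range(n), key=lambda k: success_probability(n, k))
    return best_k, success_probability(n, best_k)


def simulate(n: int, k: int, trials: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        order = rng.sample(range(n), n)  # random arrival order
        threshold = max(order[:k], default=-1)  # best of the rejected prefix
        chosen = next((x for x in order[k:] if x > threshold), None)
        wins += chosen == n - 1
    return wins / trials


k, p = optimal_cutoff(100)
print(k, round(p, 4), round(100 / math.e, 2))  # 37 0.371 36.79
```

For $n = 100$ the best cutoff is $k = 37$, close to $n/e \approx 36.8$, with success probability about $0.371$.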
See Ferguson (1989) for a brief historical review of this classical secretary problem.
An important generalisation of this problem is known as the multiple choice secretary problem (see Hajiaghayi et al. (2004), Kleinberg (2005), Girdhar and Dudek (2009)). The objective of this problem is to select a group of at most $k$ secretaries from a pool of $n$ applicants with a combined value as large as possible.
2.2 Secretary problem and independence
Our generalisation keeps the linear order but assumes an additional combinatorial structure on the set of elements. In the language of the optimal choice of a candidate for a position (the secretary problem), our problem can be described as follows.
Candidates arrive one after another. We can reject the current candidate and then consider a new one. A rejected candidate is irretrievably lost. Every new candidate is compared with the previously rejected candidates. If a new candidate is dependent on the previously rejected ones, such a candidate is also rejected. If the candidate is not dependent, then, as a result of the comparison, we can reject or accept him or her.
The main aim of the article is to study stopping criteria when the random variables are indexed by the elements of a finite structure and the admissible choices are limited by this structure. Assume tentatively that an element $x$ is independent of a set $A$ if it does not belong to the closure of $A$. The notion of ‘closure’ needs a definition, which will be given in the next sections. Our basic assumptions are:
in the structure, a closure operator and a family of closed sets are specified,
if a new element belongs to the closure of previously rejected elements, then it also has to be rejected,
if it does not belong to the closure, the new element can be accepted.
Let us consider a simple but illustrative example. The structure in this example is known as a “linear structure”, which is a special case of a “strictly hierarchical structure” (see Klimesch (1994), p. 46). First we formulate the following simple combinatorial result.
Denote and let be fixed. The number of permutations such that
For the number of permutations fulfilling (1) is equal
For all we obtain
which completes the proof. ∎
Let $X = \{x_1, \dots, x_n\}$ be the set of candidates. We make the following assumptions: every secretary $x$ has two features, qualification (weight) $w(x)$ and position in the hierarchical organisation (i.e. rank) $r(x)$. All weights are pairwise different, and so are all ranks.
If $A$ is the set of candidates rejected so far, $y$ is a new candidate and $r(y) < r(x)$ for some $x \in A$, then $y$ has to be rejected even if $w(y) > w(x)$. In other words, having rejected the boss we must not employ the subordinate. More formally, the element $y$ is independent of the set $A$ if $r(y) > r(x)$ for all $x \in A$. Note, however, that at the moment of the interview we do not know the values $w(y)$ and $r(y)$; we can only verify whether the inequalities $w(y) > w(x)$ and $r(y) > r(x)$ hold for $x \in A$.
In this example we consider two completely different cases. The first, ideal case: the orders given by weights and by ranks coincide, i.e. $w(x) < w(y)$ if and only if $r(x) < r(y)$.
Then we can identify ranks with weights. This case coincides with the classical secretary problem.
The second, most haphazard case: weight and rank are independent random variables¹. Then we can assume that $w(x_i) = i$ but $r(x_i) = \pi(i)$, where $\pi$ is a random permutation of $\{1, \dots, n\}$.
¹ Any similarity to actual events is purely coincidental.
In this case let us try to pick the best candidate in the same way as in the classical problem. First we examine and reject a fraction of the candidates (say the first $k$), and in the subsequent steps we pick the first candidate whose rank and weight are both higher than those of all candidates rejected so far, i.e. $r(x_t) > r(x)$ and $w(x_t) > w(x)$ for every rejected $x$.
Let us denote the most valuable candidate by and the second most valuable by . If , and moreover , the selected candidate is the best. Therefore the probability that the randomly chosen permutation fulfils (1) for given and for some fixed is equal
where $\gamma \approx 0.5772$ is the Euler constant.
Continuing this example for the second, haphazard case, let us assume that . In such a case, let and . The candidate is eligible if for all . From
and from Equation (3) we obtain
It seems better to take another value as the cutoff. Then we have
and instead of (5) we obtain
Note that in the ludicrous situation² in which, for any pair of candidates, the better qualified one always holds the lower position in the hierarchy, the optimal strategy is to choose the first candidate: every next candidate will be either worse or dependent. Of course, with high probability this leads to the lack of any real choice, so this case can be neglected.
² See footnote 1.
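The selection rule of this example can be simulated directly. The Python sketch below is an illustration under our own encoding, not a reproduction of the formulas above: weights and ranks are permutations of $0, \dots, n-1$, a larger number meaning a better qualified candidate or a higher position. The hypothetical helper `select_with_hierarchy` rejects the first $k$ candidates and then accepts the first one whose rank and weight both exceed those of every candidate rejected so far; `success_rate` estimates how often the most valuable candidate is chosen, either in the ideal case (ranks ordered like weights) or in the haphazard case (independent random ranks).

```python
import random


def select_with_hierarchy(weights, ranks, k):
    """Index accepted by the cutoff rule of the linear-hierarchy example, or None.

    weights[t] and ranks[t] describe the candidate arriving at step t; larger is better/higher.
    """
    max_w = max_r = float("-inf")
    for t, (w, r) in enumerate(zip(weights, ranks)):
        if t >= k and w > max_w and r > max_r:
            return t  # independent of every rejected candidate and better than all of them
        # rejected (in the first phase, as dependent, or as not better): it constrains later ones
        max_w, max_r = max(max_w, w), max(max_r, r)
    return None


def success_rate(n, k, correlated, trials=20000, seed=1):
    """Fraction of runs in which the most valuable candidate (weight n - 1) is accepted."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        weights = rng.sample(range(n), n)  # random arrival order of the distinct weights
        ranks = list(weights) if correlated else rng.sample(range(n), n)
        t = select_with_hierarchy(weights, ranks, k)
        hits += t is not None and weights[t] == n - 1
    return hits / trials


for correlated in (True, False):
    print("ideal" if correlated else "haphazard", success_rate(50, 18, correlated))
```

Scanning `k` in the haphazard case gives an empirical way to compare cutoffs other than the classical one.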
As a third case in this example we can consider the situation in which the correlation between ranks and weights is positive (usually substantially greater than zero) but smaller than one. Such a case requires more precise assumptions and probabilistic considerations, hence it is omitted in this paper.
3 Matroids and greedoids
As mentioned previously, we need precise definitions of the ‘closure’ of a set and of an element ‘independent’ of a set. Useful tools for such definitions are the structures known as matroids and, more generally, greedoids. In the next two subsections we provide the necessary definitions and results from matroid and greedoid theory.
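Let $E$ be a finite set. A nonempty family $\mathcal{F}$ of subsets of $E$ is the family of independent sets of a matroid if the following conditions hold:
if $A \in \mathcal{F}$ and $B \subseteq A$, then $B \in \mathcal{F}$,
if $A, B \in \mathcal{F}$ and $|A| > |B|$, then there exists $x \in A \setminus B$ such that $B \cup \{x\} \in \mathcal{F}$.
A basis is every maximal independent set. All bases have the same number of elements. The rank $\rho(A)$ of a set $A \subseteq E$ is the number of elements of a maximal independent subset of $A$. The closure $\sigma(A)$ of a set $A$ is the maximal set containing $A$ with the same rank as $A$, i.e. $\sigma(A) = \{x \in E : \rho(A \cup \{x\}) = \rho(A)\}$. The set $A$ is closed if $\sigma(A) = A$. The closure operator of a matroid fulfils the following properties:
$A \subseteq \sigma(A)$,
if $A \subseteq B$ then $\sigma(A) \subseteq \sigma(B)$,
$\sigma(\sigma(A)) = \sigma(A)$.
Using the definition of a matroid, we can interpret the independence of an element $x$ of a set $A$ as $x \notin \sigma(A)$. Comparing this interpretation with the example in Section 2.2, we see that such a notion of independence is not suitable, because the matroid closure has the exchange property:
if $y \in \sigma(A \cup \{x\}) \setminus \sigma(A)$, then $x \in \sigma(A \cup \{y\})$.
In the hierarchy of Section 2.2, rejecting a boss makes every subordinate dependent, whereas rejecting a subordinate does not make the boss dependent; such an asymmetric dependence violates the exchange property.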
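For small ground sets, rank and closure can be computed by brute force directly from these definitions. The Python sketch below (helper names are ours) does this from an explicit family of independent sets and checks the exchange property on the graphic matroid of a triangle, in which every set of at most two edges is independent.

```python
from itertools import combinations


def rank(A, independent):
    """rho(A): size of a largest independent subset of A (brute force)."""
    return max(len(I) for I in independent if I <= A)


def closure(A, ground, independent):
    """sigma(A) = {x : rho(A | {x}) == rho(A)}."""
    r = rank(A, independent)
    return frozenset(x for x in ground if rank(A | {x}, independent) == r)


def has_exchange_property(ground, independent):
    """Check: y in sigma(A + x) minus sigma(A) implies x in sigma(A + y), for all A, x, y."""
    for size in range(len(ground) + 1):
        for A in map(frozenset, combinations(ground, size)):
            base = closure(A, ground, independent)
            for x in ground:
                for y in closure(A | {x}, ground, independent) - base:
                    if x not in closure(A | {y}, ground, independent):
                        return False
    return True


# Graphic matroid of a triangle with edges a, b, c: every set of at most two edges is a forest.
E = frozenset("abc")
F = {frozenset(s) for size in range(3) for s in combinations(E, size)}
print(sorted(closure(frozenset("a"), E, F)), sorted(closure(frozenset("ab"), E, F)))  # ['a'] ['a', 'b', 'c']
print(has_exchange_property(E, F))  # True
```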
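An operator $\sigma$ on the subsets of $E$ is the closure operator of a matroid if and only if it is extensive ($A \subseteq \sigma(A)$), monotone ($A \subseteq B$ implies $\sigma(A) \subseteq \sigma(B)$) and idempotent ($\sigma(\sigma(A)) = \sigma(A)$) and has the exchange property. Extensivity, monotonicity and idempotence alone do not imply the exchange property; therefore they do not characterise matroids.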
3.2.1 Basic definitions and properties
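The hierarchical structure of dependence in Example 1 does not fulfil the exchange property. Therefore we have to use a more general structure than matroids. A pair $(E, \mathcal{F})$, where $E$ is a finite set and $\mathcal{F}$ is a family of subsets of $E$ with $\emptyset \in \mathcal{F}$, is a greedoid if the following condition holds:
if $A, B \in \mathcal{F}$ and $|A| > |B|$, then there exists $x \in A \setminus B$ such that $B \cup \{x\} \in \mathcal{F}$.
Note that the conditions for greedoids are the conditions for matroids with the exception of the hereditary condition (every subset of an independent set is independent). The sets in $\mathcal{F}$ are called feasible. The family $\mathcal{F}$ is called accessible if the following condition holds:
if $\emptyset \neq A \in \mathcal{F}$, then there exists $x \in A$ such that $A \setminus \{x\} \in \mathcal{F}$.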
A set system whose family of feasible sets is accessible is called an accessible system. Every greedoid is an accessible system. Matroids are also greedoids, with the independent sets as feasible sets. Clearly, accessibility is weaker than the hereditary condition: not every subset of a feasible set has to be feasible, but every nonempty feasible set contains a feasible subset with exactly one element fewer.
A basis is every maximal feasible set. All bases have the same number of elements. The rank $\rho(A)$ of a set $A \subseteq E$ is the number of elements of a maximal feasible subset of $A$. The closure of a set $A$ is the maximal set containing $A$ with the same rank as $A$, i.e. (see Korte and Lovász (1983) or Korte et al. (1991))
$$\sigma(A) = \{x \in E : \rho(A \cup \{x\}) = \rho(A)\}. \qquad (8)$$
The closure defined by (8) is extensive ($A \subseteq \sigma(A)$) and idempotent ($\sigma(\sigma(A)) = \sigma(A)$), but it is not necessarily monotone (see Korte et al. (1991), Example on p. 69, Fig. 6). However, one can define the monotone closure operator:
It is easy to see that the monotone closure is extensive, monotone and idempotent, but a greedoid is not uniquely determined by its monotone closure operator (see Korte et al. (1991), p. 63).
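Non-monotonicity already occurs in a very small greedoid. In the Python sketch below (our own example, not the one from Korte et al.), the ground set consists of the two edges `ra` and `ab` of a path r, a, b, and the feasible sets are the edge sets of subtrees containing the root r. Computing the closure (8) by brute force gives $\sigma(\emptyset) = \{ab\}$ but $\sigma(\{ra\}) = \{ra\}$, so $\emptyset \subseteq \{ra\}$ does not imply $\sigma(\emptyset) \subseteq \sigma(\{ra\})$.

```python
def greedoid_rank(A, feasible):
    """rho(A): size of a largest feasible subset of A."""
    return max(len(F) for F in feasible if F <= A)


def greedoid_closure(A, ground, feasible):
    """sigma(A) = {x : rho(A | {x}) == rho(A)}, as in (8)."""
    r = greedoid_rank(A, feasible)
    return frozenset(x for x in ground if greedoid_rank(A | {x}, feasible) == r)


# Path r - a - b: feasible sets are the edge sets of subtrees that contain the root r.
E = frozenset({"ra", "ab"})
F = [frozenset(), frozenset({"ra"}), frozenset({"ra", "ab"})]

small, large = frozenset(), frozenset({"ra"})
print(greedoid_closure(small, E, F))  # frozenset({'ab'})
print(greedoid_closure(large, E, F))  # frozenset({'ra'})
```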
If a greedoid fulfils the antiexchange property
if , , then
then we call such a greedoid an antimatroid.
Theorem 2 (Korte and Vygen (2012), Th. 14.4).
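If $(E, \mathcal{F})$ is an antimatroid with the family of feasible sets $\mathcal{F}$, then
$$\tau(A) = \bigcap\{X \subseteq E : A \subseteq X,\ E \setminus X \in \mathcal{F}\}$$
is a closure operator, i.e. it is extensive, monotone and idempotent.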
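The structure of Example 1 is an antimatroid if, numbering the candidates by increasing rank in the hierarchy ($x_1$ the lowest, $x_n$ the highest), we take as closed sets all the sets of the form $\{x_1, \dots, x_j\}$, where $j = 0, 1, \dots, n$. The feasible sets are then the complements $\{x_{j+1}, \dots, x_n\}$ for $j = 0, 1, \dots, n$.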
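The closed and feasible sets of Example 1 can be checked against the operator of Theorem 2 with a few lines of Python. In this sketch (our own encoding) the candidate with hierarchical rank $i$ is the integer $i$, so $n$ is the top boss, the feasible sets are the top segments, and the hypothetical helper `tau` intersects all sets containing $A$ whose complements are feasible. The last line shows that the exchange property fails.

```python
def tau(A, ground, feasible):
    """Intersection of all X with A <= X whose complement ground - X is feasible."""
    result = frozenset(ground)
    for F in feasible:
        X = frozenset(ground) - F
        if A <= X:
            result &= X
    return result


n = 5
ground = frozenset(range(1, n + 1))  # candidate i has hierarchical rank i (n is the boss)
feasible = [frozenset(range(j + 1, n + 1)) for j in range(n + 1)]  # top segments

print(sorted(tau(frozenset({2, 4}), ground, feasible)))  # [1, 2, 3, 4]
# Exchange fails: 1 lies in tau({3}) but not in tau(empty set), while 3 is not in tau({1}).
print(1 in tau(frozenset({3}), ground, feasible), 3 in tau(frozenset({1}), ground, feasible))  # True False
```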
Let be an antimatroid. Suppose that the sequence is such that
for every pair . Then the sequence is linearly ordered.
In the remaining parts of this section we give some examples of greedoids. An exhaustive review of examples of greedoids can be found in Goecke et al. (1989). Here we give only some simplified examples that are useful for our aim.
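Let $T$ be a tree with the root $v_0$ and the set of vertices $V$. The distance from the root to a vertex $v$ is called the height $h(v)$ of $v$; then $h(v_0) = 0$. The height of the tree is the maximum height of its leaves.
Let $\mathcal{F}$ be the family of all vertex sets $A \subseteq V$ such that $A$ induces a subtree of $T$ and $v_0 \in A$. Let
$$\sigma(A) = \{v \in V : \text{the path from } v_0 \text{ to } v \text{ contains a vertex of } A\}. \qquad (12)$$
Then $(V, \mathcal{F})$ is a greedoid of feasible sets, and $\sigma$ defined by (12) is the closure operator, which fulfils the antiexchange property. Therefore $(V, \mathcal{F})$ is an antimatroid. Such an antimatroid can be considered as an example of a hierarchical organisation. Note that the hierarchical structure of dependence in Example 1 is a trivial example of a tree (with the candidate of the highest rank as the root), and it is a very simple example of an antimatroid.
Every closed set in the given greedoid is a union of disjoint maximal subtrees $T_1, \dots, T_m$ with the set of their roots $R = \{r_1, \dots, r_m\}$, where $r_i$ is the vertex of $T_i$ closest to the root $v_0$.
The set $R$ is the unique minimal spanning set of the closed set $C = T_1 \cup \dots \cup T_m$, i.e. $R$ is the unique minimal set such that $\sigma(R) = C$.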
In Fig. 1, for example, the sets of vertices and belong to the family of feasible sets (they are subtrees rooted at the root), but the sets , and do not (they are either not subtrees, or subtrees not rooted at the root). The set
is closed and with the minimal spanning set .
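Assuming, consistently with the description of closed sets above, that the closure of a set $A$ of rejected vertices consists of all vertices whose path to the root meets $A$, both the closure and the minimal spanning set are easy to compute. The Python sketch below uses a small hypothetical hierarchy given by a parent map; the helper names are ours.

```python
def closure(A, parent):
    """All vertices whose path to the root contains a vertex of A (A and everything below it)."""
    closed = set()
    for v in parent:
        u = v
        while u is not None:
            if u in A:
                closed.add(v)
                break
            u = parent[u]
    return closed


def minimal_spanning_set(C, parent):
    """Roots of the maximal subtrees making up a closed set C: members whose parent is outside C."""
    return {v for v in C if parent[v] not in C}


# Hypothetical hierarchy: r is the root, a and b report to r, c and d to a, e to b.
parent = {"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "b"}
C = closure({"a", "d"}, parent)
print(sorted(C), minimal_spanning_set(C, parent))  # ['a', 'c', 'd'] {'a'}
```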
3.2.3 Acyclic digraphs
Let $D$ be a directed acyclic graph with the root $v_0$ and the set of arcs $R$. A rooted subgraph of $D$ is connected (directionally connected) if for each of its vertices $v$ there exists a path from $v_0$ to $v$. Let $\mathcal{F}$ be the family of all sets of arcs of connected subgraphs rooted at $v_0$.
Then $(R, \mathcal{F})$ is a greedoid of feasible sets, and the closure operator is defined by (13) (see Korte et al. (1991), p. 26). Such a greedoid can be considered as an example of a hierarchical organisation with multiple dependencies.
Note that if every vertex of the digraph has indegree at most one, then its line graph is a tree. Therefore, in such a case, the greedoid is isomorphic to the greedoid presented in 3.2.2.
In Fig. 2 for example the set of arcs , and belong to (they are connected and rooted in ) but does not belong to (it is not rooted in ) and does not belong to either (it is not connected, so it is not rooted in ).
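Membership in the family of feasible arc sets can be tested by a simple search from the root. The Python sketch below (our own helper, applied to a small hypothetical digraph) repeatedly uses arcs whose tail has already been reached; an arc set is feasible exactly when all its arcs get used.

```python
def is_feasible(arcs, root):
    """True if every arc in `arcs` is reachable from `root` using only arcs of the set."""
    reached = {root}
    remaining = set(arcs)
    progress = True
    while progress:
        progress = False
        for u, v in list(remaining):
            if u in reached:  # the tail is reachable, so this arc extends the rooted subgraph
                reached.add(v)
                remaining.discard((u, v))
                progress = True
    return not remaining


# Hypothetical rooted acyclic digraph with root "r".
print(is_feasible({("r", "a"), ("a", "b")}, "r"))  # True
print(is_feasible({("a", "b")}, "r"))              # False: not rooted at r
print(is_feasible({("r", "a"), ("b", "c")}, "r"))  # False: not connected
```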
3.3 Secretary problem in greedoids
Now we formulate the problem in the most general way, for any greedoid. Let $(E, \mathcal{F})$, $|E| = n$, be a greedoid with closure operator $\sigma$. On the set $E$ a weight function $w$ is defined. We want to choose the element with the greatest weight under the following conditions.
The structure $(E, \mathcal{F})$ and the weights $w(x)$, $x \in E$, are defined but not known.
The elements of $E$ arrive sequentially at the moments $t = 1, 2, \dots, n$.
At the moment $t$ we know which element arrives (say the element $x_t$), and we can observe its weight $w(x_t)$ and the closure restricted to the elements observed so far, i.e. $\sigma(A) \cap \{x_1, \dots, x_t\}$ for $A \subseteq \{x_1, \dots, x_t\}$.
For any two subsets, the possible inclusions are known.
Let $A_t$ be the set of elements which arrived before the moment $t$. If $x_t \in \sigma(A_t)$, then $x_t$ is rejected irrevocably.
If $x_t \notin \sigma(A_t)$, then we can accept $x_t$ if it fulfils the acceptance condition of the stopping rule, or reject it otherwise. The rejection is irrevocable.
The process is stopped if an element is accepted or if there are no further elements to observe.
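The proposed algorithm is similar to the classical algorithm for the secretary problem.
At each step the observer knows the weight of the current element and performs the actions below:
Fix the family of closed test sets, or at least their rank $k$.
Reject all subsequent elements as long as the set of rejected elements has rank smaller than $k$.
Reject each next element if it belongs to the closure of the elements rejected so far or if its weight is smaller than the weight of some rejected element.
If the next element does not belong to the closure of the rejected elements and its weight is greater than the weights of all rejected elements, accept it and stop the process.
Thus the criterion is given by closed sets (subspaces) of an appropriately chosen rank $k$: the elements are rejected unconditionally until the set of rejected elements reaches rank $k$, and afterwards the first element that is independent of all rejected elements and better than each of them is accepted. To solve this problem we need to determine the distribution of the random moment at which the rejected elements reach rank $k$.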
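The rule above can be sketched in Python as follows. As a simplification of the model (where only the closure restricted to the observed elements is visible), we assume that the structure is accessed through two hypothetical oracle functions `rank` and `closure` on frozensets of elements; the uniform matroid oracles are included only to give a runnable example.

```python
def greedoid_secretary(order, weight, rank, closure, k):
    """Run the selection rule on the arrival sequence `order`.

    `rank` and `closure` are oracles on frozensets of elements (hypothetical helpers);
    `weight` maps elements to distinct numbers. Returns the accepted element or None.
    """
    rejected = frozenset()
    for x in order:
        if rank(rejected) < k:  # first phase: the rejected elements have not reached rank k
            rejected |= {x}
            continue
        independent = x not in closure(rejected)
        better = all(weight[x] > weight[y] for y in rejected)
        if independent and better:
            return x  # accept and stop
        rejected |= {x}  # dependent or not better: irrevocable rejection
    return None


def uniform_matroid(ground, m):
    """Rank and closure oracles of the uniform matroid with independent sets of size at most m."""
    ground = frozenset(ground)

    def rank(A):
        return min(len(A), m)

    def closure(A):
        return ground if len(A) >= m else frozenset(A)

    return rank, closure


weights = {x: x for x in range(5)}  # element x has weight x
print(greedoid_secretary([3, 1, 4, 0, 2], weights, *uniform_matroid(range(5), 5), k=2))  # 4
print(greedoid_secretary([3, 1, 4, 0, 2], weights, *uniform_matroid(range(5), 2), k=2))  # None
```

When every set is independent ($m = n$) the rule reduces to the classical one; when the test rank equals $m$, the rejected set spans everything after the first phase, so nothing can be accepted.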
The presented model in the matroid case differs from the Matroid Secretary Problem introduced by Babaioff et al. (2008). Their model generalises the multiple choice secretary problem by the additional condition that the chosen set has to be independent. In such a model the accepted elements do not have to be independent of the previously rejected elements. Soto (2013) gives an exhaustive review of known results and presents some new ones.
4 Special cases
4.1 Uniform matroid
Uniform matroid $U_{m,n}$, where $|E| = n$ and the independent sets are all subsets of $E$ with at most $m$ elements:
$$\mathcal{F} = \{A \subseteq E : |A| \le m\}.$$
Obviously there must be . Assume . Then the best choice is with probability:
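As an illustration of the general rule in this matroid, and not as a reproduction of the formula above, consider a uniform matroid on $n$ elements whose independent sets have at most $m$ elements, with test rank $k$, $1 \le k < m$ (our assumption). The rejected set spans only itself as long as it has fewer than $m$ elements, so the best element is accepted exactly when it arrives at a position $j$ with $k < j \le m$ and the best of the first $j-1$ elements is among the first $k$. Under these assumptions the success probability is $\frac{k}{n}\sum_{j=k+1}^{m}\frac{1}{j-1}$; the Python sketch below evaluates this expression and compares it with a simulation.

```python
import random


def exact_success(n, m, k):
    """P(best element accepted) for independent sets of size <= m, test rank k, 1 <= k < m <= n."""
    return k / n * sum(1 / (j - 1) for j in range(k + 1, m + 1))


def simulated_success(n, m, k, trials=20000, seed=2):
    """Monte Carlo run of the rule; element labels are the weights, n - 1 is the best."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        rejected = []
        chosen = None
        for x in rng.sample(range(n), n):
            if len(rejected) < k:  # rank of the rejected set equals its size while below m
                rejected.append(x)
                continue
            independent = len(rejected) < m  # closure is the set itself below m, else everything
            if independent and x > max(rejected):
                chosen = x
                break
            rejected.append(x)
        wins += chosen == n - 1
    return wins / trials


best_k = max(range(1, 20), key=lambda k: exact_success(30, 20, k))
print(best_k, exact_success(30, 20, best_k), simulated_success(30, 20, best_k))
```

For $m = n$ the expression is the classical probability from the proof of Theorem 1.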
4.2 Binary trees
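A binary tree with $n$ vertices is an empty tree if $n = 0$, or a triple $(v, T_L, T_R)$, where $v$ is the root of the tree, $T_L$ (left subtree) is a binary tree with $l$ vertices and $T_R$ (right subtree) is a binary tree with $n - l - 1$ vertices, where $0 \le l \le n - 1$. For nonempty $T_L$, the root of $T_L$ is called the left child of $v$, and for nonempty $T_R$, the root of $T_R$ is called the right child of $v$. If both $T_L$ and $T_R$ are empty, then $v$ is a leaf.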
A complete binary tree is a binary tree in which all nodes other than the leaves have two children. If moreover all leaves have the same height, the binary tree is complete and full.
The number of leaves in a complete and full binary tree of height $h$ is $2^h$. Thus $2^{h+1} - 1$ is the number of vertices of such a tree. The sequence is linear if .
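A quick check of these counts in Python, using the usual heap numbering (our own convention here) in which vertex $i$ has children $2i$ and $2i+1$ and the root is vertex 1:

```python
def perfect_binary_tree(h):
    """Children lists of the complete and full binary tree of height h, in heap order:
    vertices are 1, ..., 2**(h + 1) - 1 and vertex i has children 2i and 2i + 1."""
    n = 2 ** (h + 1) - 1
    return {i: [c for c in (2 * i, 2 * i + 1) if c <= n] for i in range(1, n + 1)}


def height(v):
    """Distance from the root (vertex 1) in heap numbering."""
    return v.bit_length() - 1


for h in range(5):
    tree = perfect_binary_tree(h)
    leaves = [v for v, ch in tree.items() if not ch]
    print(h, len(tree), len(leaves), {height(v) for v in leaves})  # e.g. "3 15 8 {3}" for h = 3
```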
Similarly to Example 1 we will consider two different cases. First, let us consider the case . Therefore the root has the maximal weight and leaves have the minimal weights .
In the second case we assume that
the set of weights has exactly values,
exactly vertices have the value ,
values are equally likely distributed on all vertices.
not linear or it is linear and .
The element is accepted if both of the above conditions are fulfilled.
4.3 Graphical matroids
4.3.1 Graphical model of secretary problem
Let $G = (V, E)$ be an undirected graph, where $V$ is the set of vertices and $E$ is the set of edges. An independent set is any set of edges which does not contain any cycle, i.e. an independent set forms a forest. In this section only the case $G = K_n$, where $K_n$ is the complete graph on $n$ vertices, is considered.
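In the graphic matroid, an edge $uv$ lies in the closure of an edge set $A$ exactly when $u$ and $v$ are already joined by a path in $A$, so dependence on previously rejected edges can be tested by a graph search. The Python sketch below (helper names are ours) uses breadth-first search for this test.

```python
from collections import defaultdict, deque


def connected(u, v, edges):
    """Breadth-first search: is there a path from u to v using only the given edges?"""
    adjacency = defaultdict(list)
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def in_closure(edge, edges):
    """An edge uv lies in the closure of an edge set A iff u and v are joined by a path in A."""
    u, v = edge
    return connected(u, v, edges)


def is_forest(edges):
    """Independent sets of the graphic matroid: no edge may close a cycle with earlier ones."""
    kept = []
    for e in edges:
        if in_closure(e, kept):
            return False
        kept.append(e)
    return True


print(is_forest([(0, 1), (1, 2), (2, 3)]), is_forest([(0, 1), (1, 2), (0, 2)]))  # True False
```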
The random graph introduced by Erdős and Rényi (1960) is constructed by connecting nodes randomly. Since that time many monographs and textbooks have been devoted to the theory of random graphs. Among others we refer the reader to the following books: Bollobás (2001), Janson et al. (2000) and van der Hofstad (2016).
In this paper we consider the so-called “random graph process” (see Janson et al. (2000), p. 4). Let $n$, the number of vertices, be fixed, and let $G$ be a fixed graph with $n$ vertices and $N$ edges (here $G = K_n$ and $N = \binom{n}{2}$). The random graph process is a stochastic process which begins with no edges at time $0$ and adds new edges one at a time; each new edge is selected uniformly at random among all edges not present so far. At the moment $t$, $0 \le t \le N$, the random graph $G_t$ has $t$ edges.
Let us consider the asymptotic case where $n \to \infty$. To simplify the notation, we use the abbreviation a.a.s. (asymptotically almost surely) instead of the phrase “with probability tending to 1 as $n \to \infty$”. If the number of edges $t = t(n) \to \infty$ but $t/n \to 0$, then a.a.s. the random graph has no cycles, i.e. its set of edges forms an independent set (see Janson et al. (2000), p. 104). This means that the beginning of such a process is similar to the beginning of the process without dependence restrictions. Nevertheless, the number of tested elements is then too small to obtain a reasonable decision.
In order to move to the range of numbers of edges that gives sufficient information for an optimal decision, we have to consider the case where the number of tested edges is large enough and, furthermore, both the number of independent tested edges and the number of rejected edges are large enough. Such a situation is described by the following fundamental result, proved by Erdős and Rényi in their famous paper Erdős and Rényi (1960) (see Janson et al. (2000) and van der Hofstad (2016)).
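If the number of edges is $t = \frac{n}{2}(\log n + c_n)$, where $c_n \to c \in \mathbb{R}$, then the random graph a.a.s. consists of one giant component and isolated vertices.
The number $X$ of isolated vertices has asymptotically the Poisson distribution with mean $e^{-c}$.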
has Poisson distribution with the mean.
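The quantities in this discussion can be observed in a small simulation. The Python sketch below runs the random graph process on $K_n$ with a union-find structure of the components and counts the edges that arrive already dependent (they close a cycle, i.e. lie in the closure of the earlier edges), the rank of the current edge set ($n$ minus the number of components) and the isolated vertices. The class and function names, $n = 500$ and the choice $t = \frac{n}{2}(\log n + c)$ with $c = 0$ are ours, for illustration only.

```python
import math
import random


class DisjointSets:
    """Union-find over the vertices 0..n-1, used to test whether an edge closes a cycle."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.components = n

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, u, v):
        """Merge the components of u and v; return False if they were already joined."""
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return False  # the edge uv lies in the closure of the earlier edges
        self.parent[ru] = rv
        self.components -= 1
        return True


def random_graph_process(n, t, seed=0):
    """Insert t distinct uniformly random edges of K_n one at a time.

    Returns (rank, dependent, isolated) for the resulting graph G_t.
    """
    rng = random.Random(seed)
    all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    ds = DisjointSets(n)
    degree = [0] * n
    dependent = 0
    for u, v in rng.sample(all_edges, t):  # uniformly random ordered selection of t edges
        degree[u] += 1
        degree[v] += 1
        if not ds.union(u, v):
            dependent += 1
    isolated = sum(1 for d in degree if d == 0)
    return n - ds.components, dependent, isolated


n, c = 500, 0.0
t = round(n / 2 * (math.log(n) + c))
print(t, random_graph_process(n, t))
```

Repeating the run with different seeds gives an empirical view of the number of isolated vertices, which can be compared with the Poisson mean $e^{-c}$.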
For large $n$ we can obtain a better balance between the number of tested elements (given by Eq. (14)) and the number of edges possible to choose, i.e. edges which do not belong to the giant component. From Theorem 4, the giant component has a.a.s. elements (edges) and the rank . Because every new edge a.a.s. joins an isolated vertex with the giant component, we can choose an optimal set of elements from elements.
From rule we have approximately . To obtain we should a.a.s. test at least
elements plus perhaps an additional next elements.
Table 2 shows the number of testing steps needed to obtain a set of the rank given above.
Note that after rejecting approximately next edges after the moment , , the process will be finished. If all values are different for , then it is clear that the probability of choosing the optimal solution (the edge of maximal weight or the set of edges with maximal sum of weights) rapidly tends to zero.
4.3.2 Linearly decreasing number of linearly ordered weights
Let us assume that there exist only values of weights of edges in the -vertex graph. In this section we restrict ourselves to the case , i.e. to the choice of only one element, the best one. Without loss of generality one can assume that for all edges of the graph . As in Example 1, we consider three completely different cases.
Let be the set of vertices and . For and let .
Let be the set of vertices and . For and let .
Every value appears approximately times and these values are distributed equally likely.
First, let us consider Case 1. In this case we have only one best element, but the worst element. If the maximal element belongs to the giant component, then the optimal solution does not exist. In Case 2 we have the best elements, but only one worst element. If the maximal element belongs to the giant component, then the optimal solution does not exist.
5 Prospective application: cloud computing
Clearly, the simplest model associated with the name “secretary problem” is very far from real applications. In this section we describe a simplified but more realistic model of cloud computing, which can be used as an example of an application to computer networks³. Below we briefly describe the model.
³ This application was inspired by problems that arose during the realisation of the grant “Research on cloud based distribution and management technology of software and licenses for research and science units”.
Cloud computing is one of the fastest developing technologies in the IT sector, and solutions of this kind become more popular year by year. The idea is implemented in many different models. Regardless of which of them is used, the general idea is the same: most of the duties related to IT infrastructure maintenance are moved from the user (customer) to the service provider. In other words, a classical element (i.e. a server or the software running on it) becomes just a service, available to the user over the computer network. A user who has a task to perform simply orders the resources needed for that particular time. This solution is very convenient for the user, since more efficient usage of resources also brings economic benefits. Since in a typical cloud computing service many different users share limited hardware and software resources, optimisation of their utilisation is the key problem.
Let us consider the situation where the user has some computing task to be performed in the shortest possible time. To do this job, a virtual machine with the required hardware resources (computing cores, RAM etc.) must be rented. Then a significant amount of data required for the computation has to be delivered. This operation is strictly related to the transfer time. Some parameters of the virtual machine are simple to compare (results of popular benchmarks, user estimates based on the declared hardware parameters). In a real environment, some other parameters, often difficult to forecast, should also be considered. One of them is the actually available throughput of the computer network between the client host and the computing node. While the bandwidth can be considered constant, the throughput depends directly on the current utilisation of the network. Therefore the transfer time can be estimated no sooner than after sending a few TCP segments and receiving the acknowledgements. When the transmission speed is classified as unsatisfactory, the transfer can be interrupted and the next location can be considered. However, it is very important that once a given service provider is rejected, its resources can be assigned to other tasks and are no longer available. Moreover, there can be some relationships between individual service providers: their hardware resources can be located in the same network segments. Therefore, the rejection of one or more service providers should also result in the elimination of other nodes located in the same network location and dependent on the rejected nodes.
Assuming that the system tries to choose the best node at each step, our model (a matroid or, more generally, a greedoid) can be applied as a model of activities in the cloud. Certainly, an accurate choice needs deeper consideration and verification with real networks and their management.
We presented a model of optimal choice among objects which are connected by different dependencies. Our aim is to choose an object or a set of objects in such a way that the chosen objects are independent in some sense. The independence in the model is described in terms of greedoids and, as special cases, matroids, antimatroids and more specific structures, for example rooted trees and random graphs. As a first step we try to apply such models to a more realistic problem, namely to the problem of operations in cloud computing.
Acknowledgements. The author thanks Piotr Nadybski, who helped to formulate the problem of operations in a cloud and wrote most of Section 5.
- Babaioff et al.  M. Babaioff, N. Immorlica, D. Kempe, and R. Kleinberg. Online auctions and generalized secretary problems. SIGecom Exch., 7(2):7:1–7:11, June 2008. ISSN 1551-9031. doi: 10.1145/1399589.1399596. URL http://doi.acm.org/10.1145/1399589.1399596.
- Bollobás  B. Bollobás. Random Graphs. Academic Press, London, 2001.
- Dynkin  E. B. Dynkin. The optimum choice of the instant for stopping a Markov process. Soviet Math. Dokl., 4:627–629, 1963.
- Erdős and Rényi  P. Erdős and A. Rényi. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5:17–61, 1960.
- Ferguson  T. S. Ferguson. Who solved the secretary problem? Statistical Science, 4(3):282–289, 1989.
- Girdhar and Dudek  Y. Girdhar and G. Dudek. Optimal online data sampling or how to hire the best secretaries. In Canadian Conference on Computer and Robot Vision, pages 292–298, 2009.
- Goecke et al.  O. Goecke, B. Korte, and L. Lovász. Examples and algorithmic properties of greedoids. In B. Simeone, editor, Combinatorial optimization, volume 1403 of Lecture Notes in Mathematics, pages 113–161. Springer, 1989.
- Hajiaghayi et al.  M. T. Hajiaghayi, R. Kleinberg, and D. C. Parkes. Adaptive limited-supply online auctions. In EC’04: Proceedings of the 5th ACM Conference on Electronic Commerce, pages 71–80, New York, 2004. ACM Press.
- Janson et al.  S. Janson, T. Łuczak, and A. Ruciński. Random Graphs. Wiley, New York, 2000.
- Kleinberg  R. Kleinberg. A multiple-choice secretary algorithm with applications to online auctions. In Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, SODA ’05, pages 630–631, Philadelphia, PA, USA, 2005. Society for Industrial and Applied Mathematics. ISBN 0-89871-585-7. URL http://dl.acm.org/citation.cfm?id=1070432.1070519.
- Klimesch  W. Klimesch. The Structure of Long-term Memory: A Connectivity Model of Semantic Processing. Lawrence Erlbaum Associates, Inc., Publishers, 1994.
- Korte and Lovász  B. Korte and L. Lovász. Structural properties of greedoids. Combinatorica, 3(3–4):359–374, 1983.
- Korte and Vygen  B. Korte and J. Vygen. Combinatorial Optimization, volume 21 of Algorithms and Combinatorics. Springer-Verlag, Berlin Heidelberg, 5 edition, 2012.
- Korte et al.  B. Korte, L. Lovász, and R. Schrader. Greedoids, volume 4 of Algorithms and Combinatorics. Springer-Verlag, Berlin Heidelberg, 1991.
- Morayne  M. Morayne. Partial order analogue of the secretary problem: the binary tree case. Discrete Math., 184:165–181, 1998.
- Oxley  J. G. Oxley. Matroid Theory. Oxford University Press, Oxford, 2 edition, 2011.
- Soto  J. A. Soto. Matroid secretary problem in the random-assignment model. SIAM J. Comput., 42(1):178–211, 2013.
- van der Hofstad  R. van der Hofstad. Random Graphs and Complex Networks, volume I. 2016. URL https://www.win.tue.nl/~rhofstad/NotesRGCN.pdf.
- Welsh  D. J. A. Welsh. Matroid Theory. Academic Press, London, 1976.
- Wilson  R. J. Wilson. Introduction to Graph Theory. Prentice Hall, 5 edition, 2010.