## 1 Introduction

In the classic secretary problem, there is a single position to be filled, and candidates arrive one by one in uniformly random order. Upon arrival, each candidate has to be rejected or accepted immediately and irrevocably based only on *ordinal* information on the candidates seen so far, that is, their relative ranks. The goal is to maximize the probability that the best candidate is selected. The origin of this problem is unclear; for a discussion, we refer to Ferguson’s survey [16]. It has been well known since the 1960s [26, 13] that a probability of $1/e$ can be achieved by selecting the first candidate that is better than the first $n/e$ candidates and that this is the best-possible probability under the typical assumption $n \to \infty$. Many extensions of this problem have since been considered, especially in recent years, partially due to relations to beyond-the-worst-case analyses of online algorithms (e.g., [20, 1, 18]) and to mechanism design (e.g., [24, 4]).

There is extensive work on multiple-choice variants of the secretary problem. Few of these works consider an ordinal setting [8, 19, 29]; the majority consider the *value* setting in which each arriving candidate (or item) is revealed along with a value and must be rejected or accepted immediately and irrevocably so that the set of accepted items obeys some combinatorial constraint. The goal is to obtain an algorithm with a (strong) competitive ratio , i.e., that constructs a solution such that , the sum of values of accepted items, is in expectation at least where is the best solution that could have been constructed.
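For concreteness, the classical threshold rule mentioned above (observe roughly n/e candidates, then accept the first candidate that beats all of them) might be sketched as follows. The function names, the rounding of the sample length, and the Monte Carlo helper are illustrative assumptions and are not taken from the cited works.

```python
import math
import random


def classic_secretary(values, sample_size=None):
    """Return the index of the accepted candidate, or None if nobody is accepted.

    Only comparisons between values are used, so the rule is ordinal.
    """
    n = len(values)
    if sample_size is None:
        sample_size = int(n / math.e)  # assumed rounding: floor of n/e
    # Best value seen during the observation (sampling) phase.
    threshold = max(values[:sample_size], default=float("-inf"))
    for i in range(sample_size, n):
        if values[i] > threshold:
            return i  # first candidate beating the whole sample
    return None  # the overall best candidate was in the sample


def estimate_success(n=100, trials=20000, seed=0):
    """Monte Carlo estimate of the probability of picking the best candidate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = list(range(n))  # distinct values; n - 1 is the best
        rng.shuffle(values)      # uniformly random arrival order
        chosen = classic_secretary(values)
        hits += chosen is not None and values[chosen] == n - 1
    return hits / trials
```

For moderately large n, `estimate_success` should return a value in the vicinity of 1/e.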

Whereas the results for the standard secretary problem carry over to the value setting, even relatively simple variants are not completely understood in that setting. This is arguably due to the sheer amount of conceivable strategies. For instance, the precise competitive ratio achievable in the -secretary problem, the variant in which two positions are to be filled, is *not* known—only that it is strictly larger than in the much-better-understood ordinal “counterpart”, sometimes called the -secretary problem [8, 9].

The secretary variant that has probably received most attention is the matroid secretary problem [5], an extension of the -secretary problem [24] (in which positions are to be filled) to any matroid constraint, see, e.g., the state-of-the-art result [25, 15] and the survey by Dinitz [12]. An orthogonal and also well-known extension of -secretary is the *knapsack* secretary problem in which items additionally have sizes and the total size of accepted items must not exceed some given capacity [4, 23, 2, 28, 21]. While this line of work has improved the competitive ratio from to , no impossibility beyond has been found. For some secretary versions, e.g., the bipartite-matching variant [22], it is known that this ratio can in fact be matched.

Our paper may raise hope that the ratio of can in fact be matched for knapsack secretary. First, we consider the --knapsack problem. Here, items have sizes either or and the capacity is . We develop a -competitive algorithm. To us, this result is both surprising and significant because the problem generalizes both the classic secretary problem, which severely restricts the set of candidate algorithms, and the not-entirely-understood -secretary problem.
We also consider the problem with sizes either or and large, for which we show initial results, namely that is precisely the competitive ratio that can be achieved by *ordinal* algorithms. These are algorithms that only use the relative rank of the items and disregard the actual values.

### 1.1 Related Work

Kleinberg [24] first considers -secretary as introduced above, gives an algorithm with competitive ratio , and shows that this ratio is asymptotically best possible. This result is reproduced by Kesselheim et al. [23] in the more general context of packing LPs. Buchbinder et al. [8] consider the -secretary problem in the ordinal setting in which items can be selected and the goal is to maximize the expected ratio of elements selected from the top items. They also state the algorithm-design problems as linear programs, which they can only solve for small values of and , but Chan et al. [9] can solve them for larger values. Any guarantee for the -secretary problem carries over to the -secretary problem, but Chan et al. [9] rule out the other direction. More specifically, Chan et al.’s results include an optimal algorithm for -secretary with guarantee approximately and a (not necessarily optimal) algorithm for -secretary with guarantee approximately . Albers and Ladewig [3] revisit the problem and give simple algorithms with improved (albeit non-optimal) competitive ratios for many fixed values of .

The knapsack secretary problem is introduced by Babaioff et al. [4] who give a -competitive algorithm, which was subsequently improved by Kesselheim et al. [23] to and by Albers, Khan, and Ladewig [2] to . Essentially all known -competitive algorithms for the knapsack secretary problem are somewhat wasteful in the competitive ratio, presumably at least partially for the sake of a simpler analysis, in that they randomize between different algorithms that are tailored to respective item sizes. It seems that qualitative progress can only be made by a more fine-grained analysis avoiding such case distinctions.

A variant of the knapsack secretary problem that has recently been considered is the fractional variant in which an item can also be packed fractionally, avoiding situations in which an arriving item cannot be selected at all, even when there is space. The currently best known achievable competitive ratio is [17], also achieved by a blended approach.

It is not difficult to see that no constant competitive ratio can be achieved when the items do not arrive in random but in adversarial order, even in the unit-value case [27]. Starting from this problem, problems in which other assumptions than the order are relaxed are considered as well. For instance, Zhou et al. [30] consider the version in which each item has a small size; Böckenhauer et al. [6] and Boyar et al. [7] introduce advice and untrusted predictions, respectively, to the problem.

Lower bounds for secretary problems in the value setting are rare. For some related problems [10, 11, 14], the rich class of strategies can be handled by, for any strategy, identifying an infinite set of values (using Ramsey theory) on which it is much better behaved. It is, however, not clear how such an approach could be applied, e.g., for knapsack secretary since it seems one would need to control how the values in the support are spread out, a property that is irrelevant in the other settings.

### 1.2 Our Contribution

The special case --knapsack is not only arguably the simplest special case that exhibits features of the knapsack problem distinguishing it from the matroid secretary problem. Since the problem generalizes both the standard secretary problem and -secretary, we believe that settling it in terms of the achievable competitive ratio is also interesting per se.

A good starting point for tackling --knapsack seems to be the extended secretary algorithm, which is -competitive in the slightly more general case when all items have size larger than [2]. This algorithm simply ignores the item sizes, samples some prefix of length for some optimized constant , and afterwards selects all items that surpass the largest value from the sampling phase and that can still be feasibly packed. It is, however, easy to see that this approach cannot achieve : Achieving in an instance where the optimal solution consists of a large item requires setting . The resulting algorithm will, however, not be -competitive in an instance where the optimal solution consists of two small items of equal value, but there are many large items, each slightly more valuable than the individual small items, making sure that the small items are (almost) never selected by the algorithm. In this case, the competitive ratio of the algorithm will be essentially half the probability that the algorithm selects a (large) item, that is, . We denote two instances of the above forms by and , respectively, in the following. Clearly, it is possible to choose so as to balance between and . As a small side result, we show that a ratio of approximately can be achieved that way.
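The trade-off between the two instance types can be illustrated numerically. The sketch below rests on two assumptions stated here rather than taken from the formal analysis: on an instance whose optimum is a single large item, the ratio is the standard asymptotic success probability c·ln(1/c) of the threshold rule with sampling fraction c; and on an instance of the second type, the ratio is half the probability 1 − c that any item is selected at all. The function names are hypothetical, and the computed balance point is a heuristic illustration, not a reproduction of the constants proved in the paper.

```python
import math


def ratio_large_optimum(c):
    """Asymptotic probability that the threshold rule with sample fraction c picks the best item."""
    return -c * math.log(c)


def ratio_two_small_optimum(c):
    """Half the probability that some item is selected (nothing is selected iff the best item is sampled)."""
    return (1.0 - c) / 2.0


def balanced_fraction(lo=0.01, hi=1.0 / math.e, iters=60):
    """Bisection for the sample fraction at which both ratios coincide.

    On [lo, hi] the difference of the two ratios is increasing and changes sign.
    """
    def gap(c):
        return ratio_large_optimum(c) - ratio_two_small_optimum(c)

    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    c_star = balanced_fraction()
    print("balanced sample fraction:", c_star)
    print("balanced ratio:", ratio_large_optimum(c_star))
    print("second-type ratio at c = 1/e:", ratio_two_small_optimum(1.0 / math.e))
```

Under these assumptions, both the balanced ratio and the second-type ratio at c = 1/e stay strictly below 1/e, which is the gap that boosting is meant to close.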

The key observation leading to our -competitive algorithm is that keeping and internally multiplying (*boosting*) values of small items with a suitable constant factor prior to running the extended secretary algorithm may handle both and : While this is clear for when the ranking of values does not change through boosting, a small item may overtake the most valuable (large) item. This however means that this small item has relatively large (actual) value. Using that the algorithm also accepts the second-best item with a significant probability (), we can show that, with the right choice of , we still extract enough value from the small and large items to cover . In , the small items would overtake the large items, significantly improving the expected value achieved by the algorithm; conversely, if they did not overtake, they would not have been harmfully valuable in the first place—again with the right choice of . To sum up, “ type” instances impose an upper bound on , and “ type” instances impose a lower bound on . We show that the algorithm is -competitive if *and only if* where the upper bound comes essentially from the above consideration for . Note that therefore, in particular, our boosting *is* different from ordering the items by their “bang for the buck” ratios.

We note that, while -boosting seems reminiscent of -filtering [9] (for ), applying -filtering to the extended secretary algorithm will not yield a -competitive algorithm. The extended secretary algorithm would be adapted by ignoring items with a value less than times the highest value seen so far. Note that, indeed, for a “ type” instance where all but the most valuable item have a similar small value, one would have to choose again, independently of . But such an algorithm would again only be -competitive on .

The crux of our analysis is distinguishing all possible cases beyond those covered by and in a smart way. To bound the algorithm’s value in each of these cases, we precisely characterize the probabilities with which the algorithm selects an item depending on its size and its position in the (boosted) order of values, significantly extending observations made by Albers and Ladewig [3].

Before tackling the general case and understanding potentially complicated knapsack configurations, we propose considering a clean special case called --knapsack where items have sizes either or , and is large. One may be tempted to think that this special case is difficult in that selecting a small item early on may lead to a blocked knapsack and a horribly inefficient use of capacity, e.g., because all other items are large. On the other hand, when is large, one can easily avoid such situations by sampling. We do not give a conclusive answer on whether can be matched in this case, but we give some preliminary results.

Unfortunately, a competitive ratio of for --knapsack cannot be achieved with our boosting approach. The same consideration we made for earlier (for --knapsack) to get an upper bound of on still works; in contrast, a generalization of rules out any constant boosting factor.

We then give another algorithm for -knapsack which can be viewed as a linear interpolation between the classic secretary algorithm and the algorithm by Kleinberg [24] for -secretary. We show that it is -competitive. This algorithm turns out to be *ordinal*, that is, its decisions only depend on the item sizes and the relative order of their values. Remarkably, we are able to show that is the best-possible guarantee such algorithms can achieve. We do so by generalizing the factor-revealing linear program due to Buchbinder et al. [8] by adding variables and constraints. Arguing that the LP indeed models our problem becomes more difficult because, in contrast to the setting of Buchbinder et al., at any time, even the size of the next item is random. We do so by showing reductions between our model and an auxiliary batched-arrival model.

## 2 Preliminaries

We use the following notation. Let be the set of items (also called elements), where each item is specified by a profit and a size . Moreover, we are given a knapsack of capacity . The goal is to find a maximum-profit packing, i.e., a subset of items such that and is maximized. Without loss of generality, we assume that all elements have distinct values and that . This way, the name of an item corresponds to the (global) rank in .

Throughout the following sections, an important subclass of the knapsack problem arises where each item has either size 1 or .

###### Definition 1 (1--knapsack).

We call the special case of the knapsack problem where all items have size or the 1--knapsack problem. Items of size are called small and items of size are called large.

Within the context of 1--knapsack, we use the following further notation. Let be the set of small items. For any small item , let denote its rank among the small items. Note that is at most the global rank of this item. Further, let denote the global rank of the small item that satisfies . When we use just the word “rank”, we refer to the global rank.
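In this special case the offline optimum has a simple form: it is either the single most valuable item or the most valuable small items that fit together. A minimal sketch, assuming that large items have size equal to the capacity, that profits are nonnegative, and that items are given as hypothetical `(profit, size)` pairs:

```python
def offline_optimum(items, capacity):
    """Value of an optimal packing when every size is either 1 or `capacity`.

    items: list of (profit, size) pairs with size in {1, capacity}.
    """
    # Option 1: a single item (any item fits on its own).
    best_single = max((profit for profit, _ in items), default=0.0)
    # Option 2: the `capacity` most valuable small items.
    small = sorted((profit for profit, size in items if size == 1), reverse=True)
    best_small_set = sum(small[:capacity])
    return max(best_single, best_small_set)
```

For example, `offline_optimum([(10, 2), (6, 1), (5, 1), (1, 1)], 2)` compares the large item of profit 10 against the two best small items of total profit 11.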

Let be an optimal offline algorithm. For any algorithm , we overload the notation and use the same symbol also for the packing returned by the algorithm. Further, we denote by the total profit of the packing returned by . We are particularly interested in *online* algorithms, i.e., algorithms that are initially only given and are presented with the items one by one. Upon arrival of any item, an online algorithm has to irrevocably decide whether it includes the item or not. A special class of algorithms we consider are *ordinal* algorithms. These algorithms only have access to the item sizes and the *relative order* of item values.

We say that an online algorithm is -competitive if for all instances, where the expectation is taken over a uniformly random arrival order (and possibly internal randomization that the algorithm uses). In general, we assume for our bounds. Note that, for a fixed number of items, we can achieve a guarantee that is arbitrarily close to the guarantee for by adding a sufficient amount of virtual dummy items.

Finally, throughout the paper, we use the notation for any .

## 3 Matching for 1-2-Knapsack

In this section, we develop an optimal algorithm for 1-2-knapsack. For this purpose, we first propose a natural algorithm for 1--knapsack, based on the size-oblivious approach from [2]. Here, items are accepted whenever their profit exceeds a certain threshold, similar to the optimal algorithm for the classic secretary problem. Therefore, we call it the extended secretary algorithm. From an initial sampling phase of length , where is a parameter of the algorithm, the best item is used as a reference element. Subsequently, any item beating the reference element is packed if it still fits. A formal description is given in Algorithm 1.
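The verbal description above can be sketched as follows. This is an illustration under stated assumptions rather than the formal listing: items are hypothetical `(profit, size)` pairs in arrival order, the sampling fraction is passed as `c`, and the sample length is rounded down.

```python
import math


def extended_secretary(items, capacity, c=1.0 / math.e):
    """Size-oblivious threshold rule; returns the indices of the packed items.

    items: list of (profit, size) pairs in arrival order.
    c: sampling fraction (a parameter of the algorithm).
    """
    n = len(items)
    sample_len = int(c * n)  # assumed rounding: floor
    # Reference element: the most profitable item of the sampling phase.
    reference = max((profit for profit, _ in items[:sample_len]),
                    default=float("-inf"))
    packed, used = [], 0
    for i in range(sample_len, n):
        profit, size = items[i]
        # Pack every item that beats the reference and still fits.
        if profit > reference and used + size <= capacity:
            packed.append(i)
            used += size
    return packed
```

Note that the threshold ignores item sizes entirely; sizes only matter for the feasibility check.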

In the following, we denote Algorithm 1 by and set

Thus, is the probability that the algorithm packs item at all, while is the probability that it is packed as the first item. We first state some results on the values , which have essentially been investigated in [3]. Indeed, the following results follow from that work and some simple observations.

###### Lemma 1.

For , it holds that

###### Proof.

Let . The extended secretary algorithm packs as the first item if and only if the single-ref algorithm from [3] with and packs as the first item. Hence, the probability can be derived from [3] as follows: If , item is a dominating item in the terminology of [3] and Lemma 6 of [3] gives . In the case , item is a non-dominating item in the terminology of [3]. Here, Lemma 4 of [3] gives and Lemma 5 of [3] and gives , that is, turns out to be the probability that the dominating item is accepted as the -th item by the single-ref algorithm. Again, the claim follows from Lemma 6 of [3]. ∎

Furthermore, observe that, since increasing the profit of an item cannot decrease its probability of being selected, we have for all . Note that accepts no item if and only if the best item is in the sampling phase. Therefore, we have the following observation.

###### Observation 1.

It holds that

In the following subsection, we identify relations between the probabilities and .

### 3.1 Structural Lemma

In this subsection, we show the following lemma connecting the probabilities to the probabilities from Lemma 1. The analysis showing the -competitiveness of our algorithm is crucially based on this result. Note that we only use it for but it holds for all .

###### Lemma 2.

The probability that packs element is

if element is large, | (1) | ||||

if element is small, | (2) |

with and .

Observe that (1) follows immediately: Any large element can only be packed when the knapsack is empty, i.e., as the first element. The proof of (2) requires a bit more work.

###### Definition 2.

Let be the event that the small elements and are packed as the -th and -th items, respectively.

Note that the event that any item is packed as -th item, where , can be partitioned according to the item packed first. Therefore, for any and ,

(3) |

We have the following technical lemmata.

###### Lemma 3.

Let be any small item and . For , it holds that .

###### Proof.

The first step is to show that at least elements are accepted in total, if element is accepted first. Since element has rank among the small elements, there are small elements that are more valuable. Their position in the input sequence cannot be in the sampling phase, nor before element if it is packed first. So there are at least small elements that can be packed subsequently. Therefore, for , a small element is packed as -th item. The claim follows by partitioning the event that is packed first according to the item packed as -th item. ∎

###### Lemma 4.

For any two small elements and any , we have .

###### Proof.

Consider any input sequence of and the sequence resulting from swapping the elements and . Since both elements are not part of the sample, the reference element is not changed by the swap. Therefore, no element that was previously accepted will be rejected and none that was previously rejected will be accepted. Only the order of selection changes. ∎

###### Lemma 5.

For any small items with and , it holds that .

###### Proof.

Consider any input sequence from . Since applies, element lies behind the element with rank in the sequence. If both are selected (see in Figure 1), this also applies after they have been swapped (see Lemma 4 and in Figure 1). If previously only element of the two is packed, only element (of the two) is selected after their swapping ( and in Figure 1), since in this case, nothing changes in the reference element either. Therefore applies.

Now consider any input sequence from . We show that the element lies behind the element in the sequence since an element is packed as -th item, where . Assuming this did not apply and is in the sample, then there would be at most small elements that can be packed.

In the case that it occurs in the sequence after the sampling phase, but before element , there must be a more valuable element in the sample (because was not packed) and therefore there are again at most small elements that can be selected. In particular, in both cases, no element is packed as -th item for . This is a contradiction to the fact that we consider an input sequence in . Now, using the same argumentation as in the first case, it follows that , which completes the proof. ∎

###### Proof of Lemma 2.

The following corollary is an immediate consequence of Lemma 2 for .

###### Corollary 1.

For , the probability that packs element is

where, if the second most valuable small item does not exist, we set .

### 3.2 First approach: Without Boosting

In this subsection, we study Algorithm 1 (as is) for 1-2-knapsack. Unfortunately, there are two instances such that it is impossible to choose the parameter so that Algorithm 1 is -competitive on both instances.

###### Lemma 6.

For 1-2-knapsack, the competitive ratio of is at most , assuming .

###### Proof.

Let be a constant. We define two instances and . In the first instance , all items are large and only one item has substantial profit. Formally, let , for , and for all . Then, for instance ,

(4) |

In the second instance , most items are large and essentially of the same profit. However, the optimal packing contains two small items that appear at ranks and . Formally, set for , , and for all . As item never beats any reference item, we have . Hence, the algorithm selects only items from with positive probability, and always at most one item. For instance , we get

(5) |

As a small side result, we show that this bound is almost tight. The techniques are similar to those used for our main result and are presented in the full version of the paper.

###### Proposition 1.

For --knapsack, the competitive ratio of is , setting and assuming .

### 3.3 Optimal algorithm through -Boosting

The proof of Lemma 6 reveals the bottleneck of Algorithm 1: If the optimal solution consists of two elements having a high rank, the probability of selecting those items is small. This problem can be resolved by the concept of -boosting.

###### Definition 3 (-boosting).

Let be the boosting factor. For any item , we define its boosted profit to be

In the following, we investigate Algorithm 1 enhanced by the concept of -boosting, denoted by . This algorithm works exactly as given in the description of Algorithm 1, but works with the boosted profit instead of the actual profit for any item . Note that the unboosted algorithm analyzed in Proposition 1 is . For the remainder of this subsection, we fix . In particular, this implies and according to Lemma 1.
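A minimal sketch of the boosted variant, assuming capacity 2, sizes 1 and 2, and hypothetical `(profit, size)` pairs in arrival order. Decisions are based on boosted profits, whereas the returned value is computed from actual profits. The default sampling fraction and the boosting factor used in the example are assumptions chosen for illustration.

```python
import math


def boosted_profit(profit, size, alpha):
    """Boost small items (size 1) by the factor alpha; large items keep their profit."""
    return alpha * profit if size == 1 else profit


def boosted_extended_secretary(items, alpha, capacity=2, c=1.0 / math.e):
    """Run the threshold rule on boosted profits; return (packed indices, actual value)."""
    n = len(items)
    sample_len = int(c * n)  # assumed rounding: floor
    keys = [boosted_profit(p, s, alpha) for p, s in items]
    reference = max(keys[:sample_len], default=float("-inf"))
    packed, used = [], 0
    for i in range(sample_len, n):
        size = items[i][1]
        if keys[i] > reference and used + size <= capacity:
            packed.append(i)
            used += size
    # The objective counts actual (unboosted) profits.
    value = sum(items[i][0] for i in packed)
    return packed, value


if __name__ == "__main__":
    arrival = [(10, 2), (3, 1), (9, 1), (10.5, 2), (9.5, 1), (2, 1)]
    print(boosted_extended_secretary(arrival, alpha=1.0, c=0.34))
    print(boosted_extended_secretary(arrival, alpha=1.5, c=0.34))
```

In this particular arrival order, the unboosted run packs one large item, while the boosted run lets the two valuable small items overtake it.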

So far, we did not specify the boosting factor . However, the following intuitive reasoning already shows that should be bounded from above and below: If is too large, we risk that packs small items with high probability, even when they are not part of the optimal packing. On the other hand, by the result of Proposition 1 we know that cannot achieve an optimal competitive ratio. The following theorem provides lower and upper bounds on such that is -competitive.

###### Theorem 1.

For --knapsack, algorithm is -competitive if and only if and , assuming .

###### Proof.

For any item , let denote the global rank of after boosting. On a high level, we need to consider two cases.

In the first case, the optimal packing contains a single item . If , we immediately obtain . Now, suppose . Let and be the items such that and , respectively. Hence,

We note that is small, as otherwise . Moreover, for , item is large: If was small, it would follow that and therefore , contradicting the assumption that the optimal packing contains a single item. Therefore, is small and is large, implying and . Hence,

(6) | ||||

where the latter inequality holds for . Note that, when , , and for all other items , Inequality (6) becomes satisfied with equality as . Therefore, is not -competitive when .

In the remainder of the proof, we consider the case where the optimal packing contains two small items and , where we assume without loss of generality. We set and , where . Now, let and denote the items appearing before and between and , respectively, in the ordered sequence of boosted profits:

We observe that neither items nor items can be small: Otherwise, the profit of such an item would be strictly larger than , and as any two small items fit together, this item should be in the optimal packing instead of . Therefore, we have for all and for all .

Now, we can bound the expected profit of as follows:

(7) | ||||

where we use Corollary 1 for the first equality.

If we immediately get . Therefore, we assume in the following. By Chebyshev’s sum inequality, it holds that . Therefore, the competitive ratio is

(8) |

If , it follows that and therefore Equation (8) resolves to

which holds independently of . For , is -competitive by Equation (8) if

It remains to show for all and with . For this purpose, we first show

(9) |

Since is decreasing in , the inequality in Equation (9) follows immediately if we can show for large-enough . This inequality is easily verified for , as , for large-enough . For , note that , again for large-enough , which is equivalent to . Using Observation 1, we obtain . Combining both inequalities yields Equation (9).

By computing the last term in Equation (9) for , we obtain the upper bounds on given in Table 1, up to additive terms. Note that the maximum value is 1.400382. For , we obtain from Equation (9) together with for all that

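| | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| Upper bound | 1.3475 | 1.3962 | 1.400382 | 1.3988 | 1.3968 | 1.3952 | 1.3941 | 1.3934 |

Table 1: Upper bounds obtained from the last term in Equation (9).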

For the lower bound of approximately on , first note that for and , it holds indeed that

Next, note that setting ,,,, and all equal to and for all other items makes Inequality (7) as well as Inequality (8) tight as . Therefore, the above arguments imply that if and only if is -competitive. This completes the proof. ∎

## 4 Ordinal Algorithms for 1--Knapsack

In this section, we consider ordinal algorithms for 1--knapsack with large. Recall that ordinal algorithms have access to both item sizes and the relative order on item values (of previously arrived items) but not to the actual item values. We show the following theorem.

###### Theorem 2.

There is an ordinal -competitive algorithm for the --knapsack problem, and every ordinal algorithm has a competitive ratio of at most for this problem.

We first discuss the lower bound, i.e., the algorithm. Note that, while the input is any combination of large and small items, the optimal solution still consists of either the single most valuable item or of a set of up to small items . Our algorithm can be viewed as a linear combination of (near-)optimal algorithms and against the respective cases. In particular, is the -competitive algorithm [16] for the standard secretary problem and is run with probability ; is the -competitive algorithm for -secretary by Kleinberg [24] and is run with probability . The competitive ratio follows by a simple case distinction. A small subtlety that we need to take care of is that these subroutines require the number of items as input. To deal with this problem, we introduce dummy items. In the following, we make this idea formal.

###### Proof (Algorithm).

The algorithm treats all items as if they were large and then applies the standard secretary algorithm [26, 13]. For the algorithm , whenever a large item arrives, we pretend that a small dummy item with value arrives. These dummy items can be accepted and take up space in the capacity constraint, but they do not contribute to the solution value. On this adapted instance, we apply an optimal algorithm for the multiple-choice secretary problem, e.g., that of Kleinberg [24] or Kesselheim et al. [23]. Clearly, for both algorithms, any solution for the respective adapted instance can be translated back to a solution with equal value for the original instance. Also, both of these algorithms are ordinal.

For every input instance, our algorithm chooses with probability and otherwise. To analyze the competitive ratio, we distinguish two cases. If , we use that the algorithm chooses with probability and, conditioned on that, achieves an expected value of [26, 13], yielding an unconditional expected value of . Otherwise, i.e., if , we use that is run with probability , which achieves, as , an expected value of , resulting in an unconditional expected value of . ∎
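The case distinction above amounts to a balancing computation, which can be sketched as follows. The symbol $p$ for the probability of running the first subroutine is notation chosen here. The sketch assumes a guarantee of $1/e$ for the standard secretary algorithm when the optimum is the single most valuable item, a guarantee tending to $1$ for the multiple-choice algorithm when the optimum consists of small items, and it ignores the contribution of the respective other subroutine.

```latex
\[
  \mathbb{E}[\text{value}] \;\ge\;
  \begin{cases}
    p \cdot \tfrac{1}{e} \cdot v(\mathrm{OPT})
      & \text{if the optimum is the single most valuable item},\\[2pt]
    (1-p)\,\bigl(1-o(1)\bigr) \cdot v(\mathrm{OPT})
      & \text{if the optimum consists of small items}.
  \end{cases}
\]
\[
  \frac{p}{e} = 1-p
  \;\Longleftrightarrow\;
  p = \frac{e}{e+1},
  \qquad\text{so both cases yield a ratio of}\qquad
  \frac{1}{e+1} - o(1).
\]
```

Any other choice of $p$ lowers the guarantee in one of the two cases, so under these assumptions the balanced choice is the natural one.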

We now discuss the upper bound, i.e., the impossibility. In our construction, there are large and small items. All items have different values, and each large item is more valuable than each small item. The adversary chooses between two ways of setting the values: The first option is to make the solution consisting of *all* small items much more valuable than any single large item; the second option is to make a single large item much more valuable than any other solution.

Ideally, we would like to analyze algorithms in the following setting: In each of rounds, the algorithm is presented with both a uniformly random small and a uniformly random large item out of the items not presented thus far. Upon presentation of any such two items, the algorithm has to choose whether to select all small items from now on or to select the current large item. While the actual setting, in which all items arrive in uniformly random order, is clearly different, we show below that working with this idealized setting incurs only a -factor loss in the impossibility, using reductions between our problem and an auxiliary batched-arrival model.

Assuming the latter setting, we can write a linear program similar to that of Buchbinder et al. [8]. Like in that approach, each LP solution corresponds to an algorithm and vice versa. More specifically, our LP uses two variables (rather than one) for every time step, corresponding to the probabilities that the algorithm accepts a large item or the first small item, respectively. In addition, there is a variable representing the competitive ratio, and there are two upper bounds (rather than one) on that variable, representing the two instances the adversary can choose. A feasible dual solution then yields the desired impossibility. We formalize these ideas in the following.
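For orientation, the one-variable-per-step LP for the classic secretary problem, in the spirit of [8], can be explored numerically. As recalled here (an assumption of this sketch, with notation chosen for illustration), the variable p_i is the probability of selecting the candidate at position i, the constraints read i·p_i ≤ 1 − Σ_{j<i} p_j, and the objective is (1/n)·Σ_i i·p_i. The code builds the LP point corresponding to a threshold policy and evaluates it; it does not solve the LP.

```python
def threshold_policy_lp_solution(n, r):
    """LP point for 'skip the first r candidates, then take the first best-so-far'.

    Returns a list p with p[i] for i = 1..n (p[0] is unused).
    """
    p = [0.0] * (n + 1)
    selected_before = 0.0  # sum of p[j] for j < i
    for i in range(1, n + 1):
        if i > r:
            # Candidate i is best-so-far with probability 1/i; the constraint is tight.
            p[i] = (1.0 - selected_before) / i
        selected_before += p[i]
    return p


def lp_objective(p):
    """Probability of selecting the overall best candidate: (1/n) * sum of i * p[i]."""
    n = len(p) - 1
    return sum(i * p[i] for i in range(1, n + 1)) / n


def is_feasible(p, tol=1e-12):
    """Check p[i] >= 0 and i * p[i] <= 1 - sum_{j < i} p[j] for all i."""
    prefix = 0.0
    for i in range(1, len(p)):
        if p[i] < -tol or i * p[i] > 1.0 - prefix + tol:
            return False
        prefix += p[i]
    return True
```

Calling `lp_objective(threshold_policy_lp_solution(100, 37))` evaluates the threshold policy with 37 skipped candidates out of 100; the two-variable LP described above extends this shape with a second family of variables and a second bound on the objective.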

###### Proof (Impossibility).

Consider the following two instances that are treated identically by ordinal algorithms. There are items where items are large and items are small. In one instance, the item values are for and for . In the other instance, the values are the same except for . So, for both instances, the rank of item is indeed , for all . The two optimal solutions are and . The adversary decides which of the two instances is the actual instance.

We consider the following batched-arrival setting parameterized with some constant and assume that divides . The items still arrive in uniformly random order, but the algorithm does not always have to make a decision upon the arrival of an item. More specifically, for any , upon the arrival of the -th item, the algorithm may make a decision about all items that have arrived in the current batch, i.e., after the -th item. Clearly, any upper bound on the competitive ratio achievable in this setting is also an upper bound on the competitive ratio achievable in the original setting.

Note that the expected number of items of each type, i.e., small and large, in each batch is . Let be some constant. As follows from a standard concentration (e.g., Chernoff) bound, when , the probability that the number of items from each type is between and approaches . From the union bound over all batches it then follows that also the probability that the number of items of each type *in each batch* is within the given range approaches . We may therefore assume that this is indeed the case at an arbitrarily small loss in our impossibility.

To analyze the algorithm in the batched-arrival setting, we write a linear program similar to that of Buchbinder et al. [8]. The LP encodes a probability distribution for the decisions that an algorithm makes against the pair of instances. The variable represents the probability that the algorithm selects the best large item from the -th batch. Similarly, the variable represents the probability that the algorithm selects all small items from both the -th batch and forthcoming batches. Note that the algorithm may make any such decision, i.e., selecting the best large item or starting to select small items from a batch, for at most a single batch. Hence, we obtain as a constraint for our LP for all . Further, observe that we may assume that the algorithm only selects a large item when the best large item so far is in the current batch. In batch , the probability for this to happen is at most . As , we obtain

for all , another constraint of the LP.