1 Introduction
The problem we consider is a basic convex optimization problem in the online setting: items appear one by one. Each item/element has a multidimensional size and a value, which are both revealed to us when the item arrives. We must either accept or reject an item when it arrives, before seeing the future items. If we accept a certain subset of the items, we get their total value, but incur a production cost, where the cost function is nondecreasing and convex. Optionally, we may also be given a downwards-closed family of subsets, and now the accepted set of elements must lie in this family. More formally, we want to solve
(1.1) 
This question arises, e.g., when we are selling some service that depends on commodities, where the value is the amount of money a customer is willing to pay for the service, and the size vector is the amount of resources she will require. The cost function captures our operating expenses; its convexity models diseconomies of scale that arise when dealing with scarce commodities. In particular, it can capture multidimensional knapsack constraints, by setting the cost to zero up to the knapsack size, and to infinity afterwards. When the cost function is linear, we want to pick a max-weight subset from the feasible family using item weights, which is tractable/approximable when this family is a matroid, etc.
Blum et al. [BGMS11] defined this problem in the adversarial model, and gave posted-price algorithms for “low-degree” separable cost functions, that is, costs that are sums of 1-dimensional functions. This result was tightened by Huang and Kim [HK15], still for separable functions with additional growth control. More recently, Azar et al. [ABC16] studied this problem for more general supermodular nonseparable convex functions (see also [EF16]). A differentiable function is supermodular if its gradient is monotone in the appropriate sense; equivalently, if the function is twice-differentiable, it is supermodular if all its mixed second partial derivatives are nonnegative, i.e., increasing the consumption of a resource cannot decrease the marginal cost for another. However, to handle the worst-case ordering, Azar et al. also require the cost functions to be essentially low-degree.
Can we do better by going beyond the worst-case model? In this paper, we focus on the random-order or “secretary” setting, where the set of items is fixed by an adversary but they arrive in random order. In the single-dimensional case, it is easy to see that a solution that learns a “good” threshold and picks all further items with density at least this threshold essentially gives a constant approximation, much like in the secretary and knapsack secretary problems [Fre83, BIKK07]. The multidimensional case is much more challenging. This was studied by Barman et al. [BUCM12], again assuming a separable cost function. They give a competitive algorithm for the unconstrained case, and another for the problem with a downward-closed constraint set, with guarantees depending on the competitive ratio for the secretary problem. Their main idea is to perform a clever decomposition of the value of each item into “subvalues” for each of the coordinate cost functions; this effectively decomposes the problem into 1-dimensional problems with the subvalues as values and the coordinate functions as costs. Unfortunately, since their solution explicitly relies on the decomposability of the cost function, it is unclear how to extend it to general supermodular functions. We note that when the cost function is supermodular, the profit function is a submodular set function (Section 2.1). However, the profit can take negative values, and then existing algorithms for submodular maximization break down.² (²For example, we can model set packing (which is hard) by defining, for a subcollection of sets, a profit function that is submodular and whose maximizer is a largest set packing.)
Our work is then motivated by trying to better understand the multidimensional nature of this problem, and provide a more principled algorithmic approach.
1.1 Our Results
We use techniques from convex duality to reinterpret, simplify, and improve the existing results. First, we obtain the first approximation for nonseparable supermodular cost functions. (We omit some mild regularity conditions for brevity; see Section 3 for full details.)
Theorem 1.1 (Unconstrained & Supermodular).
For the unconstrained problem with supermodular convex cost functions , we give an competitive randomized algorithm in the randomorder model.
This result generalizes the approximation of Barman et al. [BUCM12] to the nonseparable case. The factor seems unavoidable, since our problem inherits the (offline) hardness of the dimensional knapsack, assuming [DGV05].
Next, we consider the constrained case. For simplicity, we focus on the most interesting case where is a matroid constraint; more general results can be obtained from the results and techniques in Section 5.
Theorem 1.2 (Constrained & Separable).
For the constrained problem with being a matroid constraint, and the cost function being separable, we get an competitive randomized algorithm in the randomorder model.
This improves by a factor of the approximation given by [BUCM12]. Finally, we give a general reduction that takes an algorithm for separable functions and produces an algorithm for supermodular functions, both with respect to a matroid constraint. This implies:
Theorem 1.3 (Constrained & Supermodular).
For the constrained problem with being a matroid constraint, and the cost function being supermodular, we get an competitive randomized algorithm in the randomorder model.
Our conceptual contributions are in bringing techniques from convex duality to obtain, in a principled way, threshold-based algorithms for nonlinear secretary problems. Since this is a classical and heavily used algorithmic strategy for secretary problems [Fre83, BIKK07, Kle05, AWY14, MR14], we hope that the perspectives used here will find use in other contexts.
1.2 Other Related Work
There is a vast literature on secretary problems [Fre83]. Closest to our setting, Agrawal and Devanur study an online convex optimization problem in the random order model, and give a powerful result showing strong regret bounds in this setting [AD15]. They extend this result to give algorithms for online packing LPs with “large” right-hand sides. However, it is unclear how to use their algorithm to obtain results in our setting. Other algorithms solving packing LPs with large right-hand sides appear in [AWY14, DJSW11, MR14, KRTV14, GM16, ESF14].
Feldman and Zenklusen [FZ15] show how to transform any algorithm for (linear) matroid secretary into one for submodular matroid secretary. They give an algorithm for the latter, based on results of [Lac14, FSZ15]. All these algorithms critically assume the submodular function is nonnegative everywhere, which is not the case for us, since picking too large a set may cause the profit function to go negative. Indeed, one technical contribution is a procedure for making the profit function nonnegative while preserving submodularity (Section 4.1), which allows us to use these results as part of our solution.
1.3 Structure of the paper
Section 3 develops the convex duality perspective used in the paper for the offline version of the unconstrained case, hopefully in a manner accessible to nonexperts. Section 4 gives the small changes required to extend this to the constrained case. Section 5 shows how to transform these into online algorithms. Section 6 shows how to convert an algorithm for separable functions into one for supermodular functions, both subject to matroid constraints. To improve the presentation, we make convenient assumptions throughout, which are discharged in Appendix C.
Since some familiarity with convex functions and conjugates will be useful, we give basic facts about them and some probabilistic inequalities in Appendix A.
2 Preliminaries
Problem Formulation.
Elements from a universe of size are presented in random order. Each element has value and size . We are given a convex cost function . On seeing each element we must either accept or discard it. A downwards-closed collection of feasible sets is also given. When , we call it the unconstrained problem. The goal is to pick a subset to maximize the profit
(2.2) 
We often use vectors in to denote subsets of ; denotes the indicator vector for set . Hence, is a down-ideal on the Boolean lattice, and we can succinctly write our problem as
(2.3) 
where columns of are the item sizes. Let denote the optimal value. For a subset , and denote and respectively.
Definition 2.1 (Exceptional).
Item is exceptional if .
Definition 2.2 (Marginal Function).
Given , define the marginal function as , where is the standard unit vector.
2.1 Supermodular Functions
While supermodular functions defined over the Boolean lattice are widely considered, one can define supermodularity for all real-valued functions. Omitted proofs are presented in Appendix B.1.
Definition 2.3 (Supermodular).
Let be a lattice. A function is supermodular if for all , where and are the componentwise minimum and maximum operations.
This corresponds to the usual definition of (discrete) supermodularity when the lattice is the Boolean lattice. For a proof of the lemma below and other equivalent definitions, see, e.g., [Top98].
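As a concrete sanity check of Definition 2.3, the sketch below tests the meet/join inequality numerically. The cost c(x) = (Σᵢ xᵢ)², used here purely for illustration, is convex with nonnegative mixed second partials and hence supermodular; the concave counterexample √(Σᵢ xᵢ) violates the inequality. All names in this snippet are our own, not the paper's.

```python
import random

def meet(x, y):
    """Componentwise minimum (lattice meet)."""
    return tuple(min(a, b) for a, b in zip(x, y))

def join(x, y):
    """Componentwise maximum (lattice join)."""
    return tuple(max(a, b) for a, b in zip(x, y))

def is_supermodular_on(f, points):
    """Check f(x ∧ y) + f(x ∨ y) >= f(x) + f(y) for all pairs (float slack)."""
    return all(
        f(meet(x, y)) + f(join(x, y)) >= f(x) + f(y) - 1e-9
        for x in points for y in points
    )

# Illustrative cost: c(x) = (sum_i x_i)^2 is convex and supermodular.
c = lambda x: sum(x) ** 2

random.seed(0)
pts = [tuple(random.uniform(0, 1) for _ in range(3)) for _ in range(50)]
print(is_supermodular_on(c, pts))                        # → True
print(is_supermodular_on(lambda x: sum(x) ** 0.5, pts))  # → False
```

The second check fails because for a concave function of the total mass, spreading the argument toward the meet/join endpoints decreases the sum of function values.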
Lemma 2.4 (Supermodularity and Gradients).
A convex function is supermodular if and only if any of the following are true.

is increasing in each coordinate, if is differentiable.

for all , if is twicedifferentiable.
Lemma 2.5 (Superadditivity).
If is differentiable, convex, and supermodular, then for such that , . In particular, if , setting gives
Corollary 2.6 (Subadditivity of profit).
The profit function is subadditive.
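Lemma 2.5 can likewise be checked numerically. The sketch below tests superadditivity of the illustrative cost c(x) = (Σᵢ xᵢ)² (convex, supermodular, and zero at the origin) on random nonnegative pairs; the cost and all names are our own assumptions.

```python
import random

def superadditive_on(c, pairs):
    """Check c(x + y) >= c(x) + c(y), up to float slack, on the given pairs."""
    return all(
        c(tuple(a + b for a, b in zip(x, y))) >= c(x) + c(y) - 1e-9
        for x, y in pairs
    )

c = lambda x: sum(x) ** 2  # illustrative: convex, supermodular, c(0) = 0

random.seed(1)
pairs = [(tuple(random.uniform(0, 1) for _ in range(2)),
          tuple(random.uniform(0, 1) for _ in range(2))) for _ in range(100)]
print(superadditive_on(c, pairs))  # → True
```

Superadditivity of the cost is exactly what makes the profit (value minus cost) subadditive, as Corollary 2.6 states.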
The next fact shows that the cost is also supermodular when seen in a discrete way.
Fact 2.7 (Continuous vs. Discrete Supermodularity).
Given a convex supermodular function and items with sizes , define the function as . Then is a (discrete) supermodular function.
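Fact 2.7 can be verified by brute force on a small instance. The sketch below lifts a continuous cost to sets as C(S) = c(Σ_{i∈S} sᵢ) and checks the increasing-marginals characterization of discrete supermodularity; the quadratic cost and the item sizes are made-up examples.

```python
from itertools import combinations

def discrete_cost(c, sizes):
    """Lift continuous cost c to sets: C(S) = c(sum of item size vectors in S)."""
    def C(S):
        total = [0.0] * len(sizes[0])
        for i in S:
            total = [t + s for t, s in zip(total, sizes[i])]
        return c(tuple(total))
    return C

def is_discretely_supermodular(C, n):
    """Increasing marginals: C(B+e) - C(B) >= C(A+e) - C(A) for all A ⊆ B, e ∉ B."""
    items = range(n)
    for r in range(n + 1):
        for B in combinations(items, r):
            for e in items:
                if e in B:
                    continue
                for q in range(r + 1):
                    for A in combinations(B, q):
                        if (C(set(B) | {e}) - C(set(B))
                                < C(set(A) | {e}) - C(set(A)) - 1e-9):
                            return False
    return True

c = lambda x: sum(x) ** 2  # illustrative convex supermodular cost
sizes = [(0.3, 0.1), (0.2, 0.4), (0.5, 0.2), (0.1, 0.1)]
C = discrete_cost(c, sizes)
print(is_discretely_supermodular(C, len(sizes)))  # → True
```

Replacing the quadratic cost by a concave one (e.g., square root of the total mass) makes the same check fail, since marginals then decrease as the set grows.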
3 The Offline Unconstrained Problem
We first present an offline algorithm for supermodular functions in the unconstrained case (where ). We focus on the main techniques and defer some technicalities and all computational aspects for now. Just for this section, we assume item sizes are “infinitesimal”. We make the following assumptions on the cost function and the elements.
Assumption 3.1.
We assume that the cost function is nonnegative, strictly convex, closed, and differentiable. We assume it is supermodular, and that its gradients go to infinity along every positive direction. We assume elements are in general position³ (³there are no nontrivial linear dependencies; see Lemma C.2 for a formal definition), and that there are no exceptional items. We also assume that every individual item has small profit relative to the optimum. (See Section C.2 on how to remove these assumptions on elements.)
Classifiers.
The offline algorithm will be based on linear classifiers, where a set of weights is used to aggregate the multidimensional size of an item into a scalar, and the algorithm picks all items that have a high-enough value/aggregated-size ratio.

Definition 3.2 (Classifiers and Occupancy).
Given a vector (a “classifier”), define the set of items picked by as . Let denote the multidimensional occupancy induced by choosing items in .
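As a concrete reading of Definition 3.2, a classifier can be implemented as below. Since the paper's symbols were lost in extraction, the exact acceptance rule (value at least the λ-aggregated size) is our assumption, as are all names.

```python
def picked_by(lam, items):
    """Items whose value beats the lambda-aggregated size: v_i >= <lam, s_i>.
    (Reconstruction of Definition 3.2; the exact inequality is an assumption.)"""
    return [i for i, (v, s) in enumerate(items)
            if v >= sum(l * sj for l, sj in zip(lam, s))]

def occupancy(lam, items):
    """Multidimensional occupancy: total size vector of the picked items."""
    T = picked_by(lam, items)
    d = len(items[0][1])
    return [sum(items[i][1][k] for i in T) for k in range(d)]

items = [(1.0, (0.5, 0.2)), (0.3, (0.4, 0.4)), (0.9, (0.1, 0.6))]
lam = (1.0, 1.0)
print(picked_by(lam, items))                           # → [0, 2]
print([round(y, 3) for y in occupancy(lam, items)])    # → [0.6, 0.8]
```

Note that raising λ coordinatewise can only shrink the picked set (sizes are nonnegative), which is the nesting property exploited along the monotone curve later in this section.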
To understand the importance of classifier-based solutions it is instructive to consider the problem with single-dimensional size. A little thought shows that an optimal solution is to pick items in decreasing order of value density. Adding these items causes the total occupancy—and hence the incurred cost—to increase, so we stop when the value density of the current item becomes smaller than the derivative of the cost function at the current utilization. That is, we find a density threshold where these two quantities cross, and take all the high-density items. Thus, the optimal solution is one based on this threshold as a classifier.
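The single-dimensional rule just described can be sketched as follows, on a toy instance with a quadratic cost (an illustrative choice, not the paper's). With discrete rather than infinitesimal items the stopping point is only approximate.

```python
def density_threshold_pick(items, c_prime):
    """Greedy for the 1-d case: take items in decreasing density v/s while the
    density still exceeds the marginal cost c'(current occupancy)."""
    picked, y = [], 0.0
    for v, s in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if v / s <= c_prime(y):
            break  # current density no longer beats the marginal cost
        picked.append((v, s))
        y += s
    return picked, y

# Example: cost c(y) = y^2, so c'(y) = 2y (hypothetical instance).
c_prime = lambda y: 2 * y
items = [(4.0, 1.0), (3.0, 1.0), (1.0, 1.0), (0.5, 1.0)]
sel, occ = density_threshold_pick(items, c_prime)
print(sel)  # → [(4.0, 1.0), (3.0, 1.0)]
```

On this instance the greedy stops at occupancy 2 with profit 7 − 4 = 3; taking the next item would lower the profit to 8 − 9 = −1, matching the crossing rule.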
To see that this holds in the multidimensional case, express the cost in terms of its linearizations
(3.4) 
where is its Fenchel dual. (Note we are maximizing over positive classifiers; Lemma C.1 shows this is WLOG.) Then our unconstrained problem (2.2) becomes a minimax problem:
Consider an optimal pair ; i.e., a pair that is a saddlepoint solution, so neither nor can be improved keeping the other one fixed. This saddlepoint optimality implies:

Since , it is the right linearization of at and thus (see Claim A.3).

is such that if and if , with being the column of and the size of the item.
From part (b) we see that the optimal solution is essentially the one picked by the classifier (ignoring coordinates with the “0 marginal value” ). Moreover, the converse also holds.
Claim 3.3.
For a classifier , let be the items picked by it. If we have , then is an optimal solution.
Proof.
For any solution ,
where the second inequality holds since, by definition, maximizes . ∎
Restricting the Set of Classifiers.
The existence of such good classifiers is not enough, since we need to find them online. This is difficult not only because of the many degrees of freedom and the lack of control over the magnitude of the values/sizes (to be exploited in concentration inequalities), but also because picking too few or too many items could lead to low profits.
So we restrict the set of candidate classifiers to a monotone⁴ (⁴a curve is monotone if for every pair of points on it, one is coordinatewise smaller than the other) 1-dimensional curve, satisfying additional properties given below. The main motivation is that it imposes a total ordering on the sets of items picked by the classifiers: given two classifiers on such a curve, the sets of items they pick are nested. This allows us to select a “minimally good” classifier on the curve in a robust way, avoiding classifiers that select too many items.
To design the curve so it contains a classifier with profit , we relax the condition from Claim 3.3 (too much to ask) and require the existence of satisfying:

(don’t pick too many items) .

(partial gradient equality) There is a coordinate where .

(balanced curve) (see also Claim A.6).
Property (P1) enforces half of the equality in Claim 3.3, and (P2) guarantees that equality holds for some coordinate. Now for property (P3): the optimality proof of Claim 3.3 does not go through as is, since the two sides of the inequality in (P1) need no longer be equal. As we prove later, the difference between these terms can be controlled (see Figure .1 for an illustration), using the superadditivity of the cost (see Claim A.7). Property (P3) is used to control this sum, by charging it to the coordinate where we know we have “the right linearization” (by property (P2)). Reinterpreting the construction of [BUCM12] in our setting, we then define the curve as any monotone curve where every point satisfies (P3).
Lemma 3.4.
The curve exists and contains a satisfying properties (P1)(P3).
Proof.
We first show existence, that is, that the set contains a monotone curve. Notice that this set is the union of the box (range of slopes where we can swivel around ) and a monotone curve , where is the unique vector satisfying ; uniqueness follows from the fact that the function stays at value zero in an initial interval, but after that is strictly increasing due to its convexity, and monotonicity of this curve also follows from monotonicity of the coordinate functions. Thus, the curve is this curve plus any monotone curve extending it to the origin.
To see that the curve satisfies properties (P1) and (P2), we note that since the coordinate derivatives are increasing and not identically 0, the curve is unbounded in all coordinates. Thus, a sufficiently large classifier on it satisfies (P1), and we can start with such a classifier and move down the curve (decreasing in each coordinate) until we obtain one where equality holds at some coordinate, since the cost has increasing gradients. (The equality in this final step uses the assumption that item sizes are infinitesimal, which we made for simplicity in this section.) ∎
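The “start high and move down the curve” step can be mimicked numerically. The sketch below assumes a hypothetical parametrization λ(t) = t·(1,…,1) of the curve and a reconstructed reading of (P1) as “λ dominates ∇c at the induced occupancy”, and bisects on t; the quadratic cost and all names are illustrative.

```python
def find_balanced_lambda(items, grad_c, t_hi=100.0, iters=60):
    """Bisection sketch of Lemma 3.4's argument on the hypothetical curve
    lambda(t) = t * (1, ..., 1): move down until the reconstructed (P1)
    is about to fail. grad_c maps an occupancy vector to the cost gradient."""
    d = len(items[0][1])

    def satisfies_p1(t):
        lam = [t] * d
        picked = [s for v, s in items
                  if v >= sum(l * sj for l, sj in zip(lam, s))]
        y = [sum(s[k] for s in picked) for k in range(d)]
        g = grad_c(y)
        return all(lam[k] >= g[k] for k in range(d))

    lo, hi = 0.0, t_hi  # a large t picks no items, so (P1) holds at t_hi
    for _ in range(iters):
        mid = (lo + hi) / 2
        if satisfies_p1(mid):
            hi = mid  # still safe: keep moving down the curve
        else:
            lo = mid
    return [hi] * d

grad_c = lambda y: [2 * yk for yk in y]  # gradient of c(y) = ||y||^2 (made up)
items = [(1.0, (0.5, 0.2)), (0.8, (0.3, 0.3)), (0.2, (0.4, 0.4))]
lam = find_balanced_lambda(items, grad_c)
print(all(l >= 0 for l in lam))  # → True
```

By construction the returned classifier still satisfies the reconstructed (P1), while any slightly smaller one on the curve would pick an extra item and violate it; with discrete items the tightness at a coordinate is only approximate, matching the caveat in the proof.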
Making the above discussion formal, we show that has a highvalue classifier. Recall that is the set of items picked by (Definition 3.2).
Theorem 3.5.
Given Assumption 3.1, let be a classifier in satisfying properties (P1)(P3). Then for all we have .
Proof.
Let be the solution picked by the classifier , and note that . Let be the linearization of at some slope . From (3.4) we know for all . Since is optimal for the linearization (because iff ), we have
(3.5) 
Now we relate the true profit to this linearized value. Observe that
(by Claim A.2)  
(3.6) 
where the inequality uses that by property (P1) and . The first term is nonnegative because we only pick items for which . The second term is nonnegative due to Claim A.5(a). We can now prove three lemmas that imply the theorem.
Lemma 3.6.
For any ,
Lemma 3.7.
Proof.
Using the superadditivity of and Claim A.7 we get . Now from property (P3) of the classifier , all the terms in the sum are equal. ∎
Lemma 3.8.
Proof.
This completes the proof of Theorem 3.5. ∎
4 The Offline Constrained Case
Having built up tools and intuition in the unconstrained case, we turn to the case where there is a downwardsclosed constraint , and the goal is to maximize the profit subject to . We again work with Assumption 3.1, but do not assume anything about items sizes. We discuss computational aspects at the end of this section.
The general idea is again to use classifiers, and only consider items with “high-enough” value. However, because of the constraints we may no longer be able to pick all these items. Thus, we need to consider the most profitable solution from the feasible family within this filtered set (whose quality is harder to analyze).
Again we restrict to the 1-dimensional curve defined in the previous section; however, it only satisfies slightly modified versions of properties (P1)–(P2), since we do not assume the item sizes to be infinitesimal anymore. To make this precise, define the “open” set of items picked with strict inequality. Under the assumption of items being in general position, there is at most one “threshold” item, picked with equality. Now a “good” classifier is one that satisfies the following:

For all binary with and , .

There exists a binary with and , and index such that (Note that if , then by property (P1’) the above inequality holds at equality; else contains the unique element in .)

This is the same as before: .
The arguments of Lemma 3.4 show the following.
Lemma 4.1.
Given Assumption 3.1, the curve defined in the previous section contains a satisfying properties (P1’)(P3’).
Next, we show that for a good classifier , the maximum profit solution from contained within essentially gives an approximation.
Theorem 4.2 (Offline Approach).
Suppose Assumption 3.1 holds. Let be a classifier in satisfying properties (P1’)–(P3’). Then the better of the two solutions: (a) the maximum profit solution in containing elements only from , and (b) the optimal single element in , has profit at least for any vector .
Proof.
The idea is to follow the development in Theorem 3.5. There, the same solution satisfied the value lower bounds of Lemmas 3.6 and 3.8; to satisfy the first lemma, we needed the solution to be optimal for the linearization with the classifier as the “slope”; to satisfy the second, we needed it to satisfy (P2). Here, we construct two solutions in the relevant intersection to satisfy these lemmas separately:
Since property (P1’) and (P3’) holds for , Lemmas 3.6 and 3.7 hold essentially unchanged, and thus for any vector we have
(4.7) 
The solution may not belong to the set , since it may contain the threshold item, if it exists (let its characteristic vector be the all-0’s vector if it does not exist). Let .
Lemma 4.3.
These solutions satisfy
Proof.
Property (P1’) gives , and Property (P2’) implies is at least at some coordinate . Since is convex and differentiable, the gradients are continuous [HUL01, Remark D.6.2.6], so there is where the vector satisfies and for some coordinate . Due to these properties, the proof of Lemma 3.8 holds for and shows .
The assumption of no exceptional items bounds the profit of the dropped threshold item. The claim then follows from subadditivity of the profit. This concludes the proof. ∎
Picking the most profitable singleton is trivial offline, and well-approximable online by the secretary algorithm [Fre83]. Moreover, we need to approximately optimize the submodular profit function (Fact 2.7) over the feasible sets containing only filtered elements. For several constraint structures (e.g., matroids and their generalizations), there are known algorithms for approximately optimizing nonnegative (and sometimes also monotone) submodular functions. Unfortunately, our profit function may take negative values, so we cannot directly use these algorithms. Simply considering the truncated function does not work because it may be non-submodular. In the next section, when the cost is separable, we introduce a way of making our profit function nonnegative everywhere, while maintaining submodularity and preserving the values at the region of interest.
4.1 Making the Profit Function Nonnegative
We first show that already satisfies the desired properties over the sets in .
Lemma 4.4.
The profit function is nonnegative monotone over .
Proof.
Since the profit of the empty set is zero, it suffices to show monotonicity. Consider a feasible set and let a vector be the indicator of an item in it. Comparing the costs with and without this item we have
Since , we have and thus , i.e., monotonicity. ∎
However, to run algorithms that approximately optimize over the feasible family in a black-box fashion, nonnegativity over the feasible sets is not enough, even if the algorithm only probes these sets, since their proofs of correctness may require this property outside of feasible sets. Thus, we need to modify the profit function to ensure nonnegativity outside of the feasible family.
For that, the idea is to truncate the gradient of the cost so that the bound in Property (P1’) holds for all subsets; this was the crucial element for the monotonicity (and hence nonnegativity) proof above. Notice that since Property (P1’) already guarantees this bound for all feasible sets, the truncation does not change the value of the profit over these points. The proof of the lemma is given in Appendix B.
Lemma 4.5.
If is separable, there is a submodular function satisfying the following:

is nonnegative and monotone over all subsets of , and

for every .
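For intuition on the truncation behind Lemma 4.5 in the separable case, here is a single-coordinate sketch: beyond the point where the 1-d cost's derivative reaches the classifier's coordinate, the cost continues linearly with that slope. The cap value, the bisection for the crossover point, and all names are illustrative assumptions.

```python
def truncated_cost_1d(c_k, cprime_k, lam_k, y_max=1e6, iters=60):
    """Cap the derivative of a 1-d convex cost at lam_k: beyond the point tau
    where cprime_k(tau) = lam_k, continue linearly with slope lam_k.
    (A reconstruction of the truncation behind Lemma 4.5.)"""
    lo, hi = 0.0, y_max
    for _ in range(iters):  # bisection for tau; cprime_k is nondecreasing
        mid = (lo + hi) / 2
        if cprime_k(mid) < lam_k:
            lo = mid
        else:
            hi = mid
    tau = (lo + hi) / 2

    def c_hat(y):
        return c_k(y) if y <= tau else c_k(tau) + lam_k * (y - tau)
    return c_hat

# Example: c_k(y) = y^2 with derivative 2y, capped at lam_k = 1, so tau = 0.5.
c_hat = truncated_cost_1d(lambda y: y * y, lambda y: 2 * y, 1.0)
print(round(c_hat(0.25), 6))  # → 0.0625  (unchanged below tau)
print(round(c_hat(2.0), 6))   # → 1.75    (0.25 + 1·(2 − 0.5))
```

The truncated cost agrees with the original below the crossover and grows only linearly above it, which is what keeps the modified profit monotone (hence nonnegative) on every subset while leaving its values on the region of interest untouched.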
4.2 The Offline Algorithm: Wrapup
Using this nonnegativization procedure, we get an approximation offline algorithm for constrained profit maximization for separable cost functions; this is an offline analog of Theorem 1.2. For the unconstrained case, Lemma 4.4 implies that the profit function is itself monotone, so we get an approximation offline algorithm for the supermodular case. In the next section we show how to convert these algorithms into online algorithms.
One issue we have not discussed is the computational cost of finding satisfying (P1’)–(P3’). In the full version of the paper, we show that for any we can efficiently find a satisfying (P1’), (P2’), and a slightly weaker condition: for all . Using this condition in Theorem 4.2 means we get a profit of at least ; the running time depends on so we can make this loss negligible.
5 The Online Algorithm
In the previous sections we were working offline: in particular, in computing the “good” classifier , we assumed knowledge of the entire element set. We now present the online framework for the setting where elements come in random order. Recall the definition of the curve from §3, and the fact that there is a total order among all . Recall that for simplicity we restrict the constraints to be matroid constraints.
For a subset of elements , let and denote the integer and fractional optimal profit for , the feasible solutions restricted to elements in . Note that in the fractional case this means the best solution in the convex hull . Clearly, . We use and to denote and for the entire instance .
Again we work under Assumption 3.1. We will also make use of any algorithm for maximizing submodular functions over in the randomorder model satisfying the following.
Assumption 5.1.
Algorithm SubmodMS takes a nonnegative monotone submodular function with , and a number . When run on a sequence of elements presented in random order, it returns a (random) subset with expected value . Moreover, it only evaluates the function on feasible sets.
Our algorithm is very simple:
Note that denotes the set of items in the sample picked by (Definition 3.2). In Step 2, we can use the Ellipsoid method to find , i.e., to maximize the concave profit function over the matroid polytopes, within negligible error. Moreover, we must do this for several sets and pick the largest one using a binary-search procedure. We defer the technical details to the full version of the paper.
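The overall sample-then-commit structure of the algorithm can be sketched as follows. All four callbacks are stand-ins for the paper's actual subroutines: the classifier learning of Step 2 (convex program plus binary search) and the feasibility checks are stubbed out, and every name here is our own.

```python
def online_profit_max(stream_len, get_item, pick_threshold, eval_profit):
    """Skeleton of the sample-then-commit pattern: observe the first half
    without accepting, learn a classifier from the sample, then accept
    later items that pass it (a real run also checks feasibility)."""
    sample, accepted = [], []
    for t in range(stream_len):
        item = get_item(t)
        if t < stream_len // 2:
            sample.append(item)           # observation phase: reject everything
            continue
        if t == stream_len // 2:
            lam = pick_threshold(sample)  # learn classifier from the sample
        v, s = item
        if v >= sum(l * sj for l, sj in zip(lam, s)):
            accepted.append(item)         # passes the learned classifier
    return accepted, eval_profit(accepted)

items = [(0.5, (1.0,)), (0.2, (1.0,)), (0.9, (1.0,)),
         (1.5, (1.0,)), (0.4, (1.0,)), (2.0, (1.0,))]
acc, profit = online_profit_max(
    6, items.__getitem__,
    pick_threshold=lambda sample: (1.0,),  # stub classifier
    eval_profit=lambda A: sum(v for v, _ in A)
                          - 0.5 * sum(s[0] for _, s in A) ** 2)
print([v for v, _ in acc], profit)  # → [1.5, 2.0] 1.5
```

On this toy stream the first three items are only observed, and of the remaining three only the two passing the stub classifier are taken, yielding profit 3.5 − 0.5·2² = 1.5.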
5.1 Analysis
To analyze the algorithm, we need to show that the classifier learned in Step 2 is large enough that we do not waste space with useless items, but low enough that we admit enough useful items. Along the way we frequently use the concentration bound from Fact A.9. For this we need the profit function to satisfy a Lipschitz-type condition (A.10) on the optimal solutions of any given subinstance. To facilitate this, let us record a useful lemma, proved in Appendix B. For a vector , and a subset , define to be the same as on , and zero outside .
Claim 5.2.
Consider any , and let be an optimal fractional solution on (so ). Then for any with , we have , where is an upper bound on the profit from any single item.
From Section 4, recall is a classifier that satisfies properties (P1’)–(P3’).
Lemma 5.3 (Goldilocks Lemma).
Proof sketch.
(See Appendix B for full proof.) For the first part, we show that the classifier satisfies the properties needed in Line 2 with probability ; since is the largest such vector, we get . Using Theorem 4.2 and the assumption that no item has large profit, we have . Moreover, the sample obtains at least half of this profit in expectation, i.e., . Then using concentration (Fact A.9) with the Lipschitz property of Claim 5.2 and the no-high-profit-item assumption, we have (which is at least ) with probability at least . Thus, with this probability satisfies the properties needed in Line 2 of the algorithm, as desired.
For part (b) of the lemma, notice that for each scenario , since feasible solutions for the sample are feasible for the whole instance. Next, by definition of , . Finally, if is the fractional optimal solution on with , then , since is superadditive. Again using the concentration bound Fact A.9, the profit is at least with probability at least . Of course, . Chaining these inequalities, with this probability. ∎
In view of Theorem 4.2, we show the filtered out-of-sample instance behaves like .
Lemma 5.4.
The filtered out-of-sample instance satisfies the following w.p. :

For all , .

For all with such that , .

.
Proof.
By Lemma 5.3(a), threshold with probability . When that happens, . Since the first two properties hold for , they also hold for , and by downwardclosedness, also for .
For the third part, let be the largest threshold in such that . From Lemma 5.3(b), with good probability we have . Since is a smaller threshold, the instance is contained in the instance , which implies that for every scenario . Next we will show that with good probability , and hence get the same lower bound for . If is the optimal fractional solution for , then is feasible for with . Moreover, using the concentration bound again, we get that with probability at least . Finally, by the assumption of general position, there is at most one item in . Dropping this item from the solution to get reduces the value by at most ; here we use subadditivity of the profit, and that there are no exceptional items. Hence, with probability at least :