# Online Revenue Maximization for Server Pricing

Efficient and truthful mechanisms to price resources on remote servers/machines have been the subject of much work in recent years due to the importance of the cloud market. This paper considers revenue maximization in the online stochastic setting with non-preemptive jobs and a unit capacity server. One agent/job arrives at every time step, with parameters drawn from an underlying unknown distribution. We design a posted-price mechanism which can be efficiently computed, and is revenue-optimal in expectation and in retrospect, up to additive error. The prices are posted prior to learning the agent's type, and the computed pricing scheme is deterministic, depending only on the length of the allotted time interval and on the earliest time the server is available. If the distribution of agents' types is only learned from observing the jobs that are executed, we prove that a polynomial number of samples is sufficient to obtain a near-optimal truthful pricing strategy.


## 1 Introduction

Designing mechanisms for a desired outcome with strategic and selfish agents is an extensively studied problem in economics, with classical work by Myerson [17], and Vickrey-Clarke-Groves [24], emphasizing truthful mechanisms. The advent of online interaction and e-commerce has added an efficiency constraint on the mechanisms, going so far as to prioritize computational efficiency over classical objectives: e.g. implementing simple approximate mechanisms when optimal mechanisms are computationally difficult, or even impossible. Beginning with Nisan and Ronen [18], the theoretical computer science community has contributed greatly to the field, in both fundamental problems and specific applications. In addition to designing truthful mechanisms for the maximization of welfare and revenue, this body of work has also focused on learning distributions of agent types, menu complexity, and dynamic mechanisms (e.g., [5, 9].)

We consider this question in the setting of selling computational resources on remote servers or machines (cf. [23, 2]). This is arguably one of the fastest growing markets on the Internet. The goods (resources) are assigned non-preemptively and thus have strong complementarities. Furthermore, since the supply (server capacity) is limited, any mechanism trades immediate revenue for future supply. Finally, mechanisms must be incentive-compatible, as non-truthful, strategic behaviour from the agents can skew the performance of a mechanism from its theoretical guarantees. This leads us to the following question:

#### 1.0.1 Question.

Can we design an efficient, truthful, and revenue-maximizing mechanism to sell time-slots non-preemptively on a single server?

We design a posted-price mechanism which maximizes expected revenue up to additive error, for agents/buyers arriving online, with parameters of value, length and maximum delay, drawn from an underlying, unknown, distribution.

Three key aspects distinguish our problem from standard online scheduling: (i) In our setting, as time progresses, the server clears up, allowing longer jobs to be scheduled in the future if no smaller jobs are scheduled until then. (ii) Scheduling the jobs is not exclusively to the discretion of the mechanism designer, but must also be desired by the job itself, while also producing sufficient revenue. (iii) As the mechanism designer, we do not have access to job parameters in an incentive-compatible way before deciding on a posted price menu. These three features lie at the core of the difficulty of our problem. Our focus will be on devising online mechanisms in the Bayesian setting.

In our online model, time on the server is discrete. At every time step, an agent arrives on the server, with a value $v$, length requirement $\ell$, and maximum delay $d$. These parameters are drawn from a common unknown distribution, i.i.d. across jobs. The job wishes to be scheduled for at least $\ell$ consecutive time slots, no more than $d$ time units after its arrival, and wishes to pay no more than $v$. Jobs are assumed to have quasi-linear utility in money, and so prefer the least-price interval within their constraints. The mechanism designer never learns the parameters of the job. Instead, she posts a price menu of (length, price) pairs, and the minimum available delay $s$. The job accepts to be scheduled so long as $s \le d$, and there is some (length, price) pair in the menu of length at least $\ell$ and price at most $v$. We note that the pricing scheme can be dynamic, changing through time. If, at time epoch $t$, an agent chooses an option from the menu, then she pays its price and her job is allocated to an interval of the chosen length beginning at the earliest available time. She will choose the option which minimizes the price she pays. Throughout this paper we assume that the random variables $L$, $V$, and $D$ are discrete, and have finite support.
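The job's decision rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `choose_option`, the dictionary representation of the menu, the acceptance test `state <= max_delay`, and the tie-breaking toward shorter intervals are all choices made here, not details taken from the paper.

```python
from typing import Optional


def choose_option(menu: dict[int, float], state: int,
                  length: int, value: float, max_delay: int) -> Optional[int]:
    """Return the menu length a job buys, or None if it declines.

    menu maps an interval length to its posted price (hypothetical layout).
    state is the posted minimum available delay.
    """
    # Assumption: the job declines if the server is free too late for it.
    if state > max_delay:
        return None
    # Options that are long enough to run the job and that it can afford.
    feasible = {l: p for l, p in menu.items() if l >= length and p <= value}
    if not feasible:
        return None
    # Quasi-linear utility: take the cheapest feasible option.
    # Ties are broken toward the shorter interval (an assumption).
    return min(feasible, key=lambda l: (feasible[l], l))
```

With a menu whose prices are non-decreasing in length, the cheapest feasible option is always the job's own length, which is why monotone menus make truthful reporting the job's best choice.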

### 1.1 Summary of Our Results

1. We model the problem of finding a revenue-maximizing pricing strategy as a Markov Decision Process (MDP). Given a price menu (length, price) and a state (minimum available delay) $s_t$ at time $t$, the probability of transition to any other state at time $t+1$ is obtained from the distribution of the job's parameters. The revenue-maximizing pricing strategy can be efficiently computed via backwards induction. We also present, in appendix 0.C.2, an approximation scheme for the case where values are continuously distributed.

2. We prove that the optimal pricing strategy is monotone in length under a natural distributional assumption (log-concave/MHR). This implies the existence of an optimal truthful pricing mechanism in the finite horizon setting when the distributions are known. In appendix 0.C.1, this is extended to the infinite discounted horizon setting, incurring a small additive error.

3. We analyze the performance of the proposed pricing strategy when the distribution is only known from samples collected through the jobs' decisions. We provide a truthful posted-price $\varepsilon$-approximate mechanism if the number of samples is polynomial in $1/\varepsilon$ and the complexity of the distribution.

### 1.2 Related Work

Much recent work has focused on designing efficient mechanisms for pricing cloud resources. Chawla et al. [7] recently studied “time-of-use” pricing mechanisms, to match demand to supply with deadlines and online arrivals. Their result assumes large-capacity servers, and seeks to maximize welfare. [1] provides a mechanism for preemptive scheduling with deadlines, maximizing the total value of completed jobs. Another possible objective for the design of incentive-compatible scheduling mechanisms is the total value of completed jobs, which have release times and deadlines. [19] solves this problem in an online setting, [6] in the offline setting for parallel machines, and [22] in the online competitive setting with uncertain supply. [12] focuses on social welfare maximization for non-preemptive scheduling on multiple servers, and obtains a constant competitive ratio as the number of servers increases. Our work differs from these by considering stochastic job types, and revenue maximization. [13] addresses computing a price menu for revenue maximization with different machines. Finally, [2] proposes a system architecture for scheduling and pricing in cloud computing.

Posted price mechanisms (PPM) have been introduced by [21] and have gained attention due to their simplicity, robustness to collusion, and their ease of implementation in practice. One of the first theoretical results concerning PPM’s is an asymptotic comparison to classical single-parameter mechanisms [4]. They were later studied by [8] for the objective of revenue maximization, and further strengthened by [14] and [10]. [11] shows that sequential PPM’s can -approximate social welfare for XOS valuation functions, if the price for an item is equal to the expected contribution of the item to the social welfare.

Sample complexity for revenue maximization has recently been studied in [9], showing that a polynomial number of samples is sufficient to obtain near-optimal Bayesian auction mechanisms. An approach based on statistical learning, which learns mechanisms with expected revenue arbitrarily close to optimal from a polynomial number of samples, has been proposed in [15]. The problem of learning simple auctions from samples has been studied in [16].

### 1.3 Structure of the Paper

In §2 we describe the model of the problem as a Markov Decision Process. In §3 we present an efficient algorithm for computing optimal policies for the finite time horizon. This is extended to other settings in appendix 0.C. In §3.3, we demonstrate that the optimal policy is monotone, and §4.1 gives the learning algorithm and error bounds for computing the pricing policies with only (partial) sample access to the job distribution. In §4.2 we describe the concentration bounds on the revenue of a pricing policy. Finally, §5 is devoted to describing and summarizing the final result and future directions of research.

#### 1.3.1 Proof Details.

Detailed proofs are not included in the body of the text, but rather in appendix 0.B. If the appendix is not present, a full version of this document may be found on the arXiv.

## 2 Model

#### 2.0.1 Notation.

In what follows, the variables $t$, $\ell$ or $L$, $v$ or $V$, and $d$ or $D$ are reserved for describing the parameters of a job that wishes to be scheduled. Respectively, they represent the arrival time, required length, value, and maximum allowed delay. The lowercase variables represent fixed values, whereas the uppercase represent random variables. Script-uppercase letters $\mathcal{L}$, $\mathcal{V}$, and $\mathcal{D}$ represent the supports of the distributions on $L$, $V$, and $D$, respectively; and the bold-uppercase letters $\mathbf{L}$, $\mathbf{V}$, and $\mathbf{D}$ represent the maximum values in these respective sets. Finally, $\pi$ is reserved for pricing policy, whereas $p$ is reserved for probabilities.

#### 2.0.2 Single-Machine, Non-Preemptive, Job Scheduling.

A sequence of random jobs wishes to be scheduled on a server, non-preemptively, for a sufficiently low price, within a time constraint. Formally, a job with parameters $(\ell, v, d)$ arriving at time $t$ wishes to be scheduled for a price of at most $v$, in an interval of at least $\ell$ consecutive time slots that begins no more than $d$ time units after $t$. There is an underlying distribution $Q$ over the space $\mathcal{L} \times \mathcal{V} \times \mathcal{D}$ from which we sample the parameters of each new job.

Our goal is to design a take-it-or-leave-it, posted-price mechanism which maximizes expected revenue. At each time period $t$, the mechanism posts a “price menu” and an earliest-available-time $s_t$, indicating that the next $s_t$ time slots have already been scheduled. ($s_t$ will henceforth be referred to as the state of the server.) We let $\mathcal{S}$ be the set of all possible states. The state of the server at a given time is naturally a random variable which depends on the earlier jobs and on the adopted policy $\pi$. As before, we will denote with $s$ or $s_t$ the fixed value, and with $S$ or $S_t$ the corresponding random variable. The price menu will be given by the function $\pi_t$, i.e., if we are at time $t$ and the server is in state $s$, then the price of an interval of length $\ell$ is set to $\pi_t(s, \ell)$. The reported pair is computed by the scheduler's strategy, which we determine in this paper. Once this is posted, a job is then sampled i.i.d. from the underlying distribution $Q$.

If $v \ge \pi_t(s, \ell')$ for some $\ell' \ge \ell$, and $d \ge s$, then the job accepts the schedule, and reports the length which minimizes the price. Otherwise, the job declines and is not scheduled. To guarantee truthfulness, it suffices to have $\pi_t(s, \ell)$ be monotonically non-decreasing in $\ell$ for every state $s$: the agent would not want a longer interval since it costs more, and would not want one of the shorter intervals since they cannot run the job. Since we are assuming that jobs are non-preemptive, a false-name strategy for buying multiple small intervals would expose them to the risk of being assigned a discontinuous interval.

It should be clear that the mechanism’s strategy is to always report monotone non-decreasing prices, as a decrease in the price menu will only cause more utilization of the server, without accruing more revenue. The main technical challenge in this paper, then, is to show that under some assumptions, the optimal strategy is monotone non-decreasing, and efficiently computable.

#### 2.0.4 Revenue Objective.

Revenue can be measured in either a finite or an infinite discounted horizon. In the former (finite) case, only the time periods $0, \dots, T$ will occur, and we seek to maximize the expected sum of revenue over these periods. In the infinite-horizon setting, future revenue is discounted at an exponentially decaying rate. Formally, revenue at time $t$ is worth a $\gamma^t$ fraction of revenue at time 0, for some fixed $\gamma \in (0, 1)$. See appendix 0.C.1. Recall that the job parameters are drawn independently at random from the underlying distribution, so the scheduler can only base their “price menu” on the state of the system and the current time. Thus, the only realistic strategy is to fix a state-and-time-dependent pricing policy $\pi$, written $\pi_t(s, \ell)$, giving the price posted at time $t$ in state $s$ for an interval of length $\ell$.

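Let $X$ be the random sequence of jobs arriving, sampled i.i.d. from the underlying distribution. Let $\pi$ be the pricing policy. We denote by $\mathrm{Rev}_t(X, \pi)$ the revenue earned at time $t$ with policy $\pi$ and sequence $X$. If the job arriving at time $t$ does not buy, then $\mathrm{Rev}_t(X, \pi) = 0$, and otherwise, it is equal to the price the job pays. We denote by $\mathrm{CmlRev}_T(X, \pi)$ the total (cumulative) revenue earned over the $T$ periods. Thus,

$$\mathrm{CmlRev}_T(X, \pi) := \sum_{t=0}^{T} \mathrm{Rev}_t(X, \pi). \tag{1}$$

We will also need the expected-future-revenue, given a current time and server state, which we will denote as follows:

$$U^{\pi}_t(s) := \mathbb{E}_{X_{\ge t}}\left[\, \sum_{t'=t}^{T} \mathrm{Rev}_{t'}(X, \pi) \;\middle|\; S_t = s \right]. \tag{2}$$

The subscript of the expectation denotes that we consider only jobs arriving from time $t$ onward. Our objective is to find the pricing policy $\pi$ which maximizes $U^{\pi}_0(0)$. Call this $\pi^*$, and denote the expected revenue under $\pi^*$ as $U^*_t(s)$.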

## 3 Bayes-optimal Strategies for Server Pricing

In this section we seek to compute an optimal monotone pricing policy which maximizes revenue in expectation over jobs sampled i.i.d. from an underlying known distribution $Q$. This is extended to the infinite-horizon, discounted setting in appendix 0.C.1.

We first model the problem of maximizing the revenue in online server pricing as a Markov Decision Process that admits an efficiently computable, optimal pricing strategy. The main contribution of this section is to show that, under a natural assumption on the distribution $Q$, the optimal policy is monotone. We recall that this allows us to derive truthful Bayes-optimal mechanisms.

### 3.1 Markov Decision Processes.

We show that the theory of Markov Decision Processes is well suited to model our problem. A Markov Decision Process is, in its essence, a Markov Chain whose transition probabilities depend on the action chosen at each state, and where each transition is assigned a reward. A policy is then a function mapping states to actions. In our setting, the states are the states of the system outlined in Section 2.0.3 (i.e., the possible delays before the earliest available time on the server), and the actions are the “price menus.” At every state $s$, a job of a random length $\ell$ arrives, and with some probability, chooses to be scheduled, given the choice of prices. The next state is either $s - 1$, if the job does not choose to be scheduled (since we have moved forward in time), or $s + \ell - 1$, if a job of length $\ell$ is scheduled, since we have occupied $\ell$ more units. The transition probabilities depend on the distribution of job lengths, and the probability that a job accepts to be scheduled given the pricing policy (action). Formally,

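$$\Pr[s_{t+1} = s_t + \ell - 1] = \begin{cases} \Pr[L_t = \ell,\; V_t \ge \pi_t(s_t, \ell),\; D_t \ge s_t + \ell] & \text{if } \ell \ge 1, \\ 1 - \sum_{k \ge 0} \Pr[s_{t+1} = s_t + k] & \text{if } \ell = 0. \end{cases} \tag{3}$$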

(Transitions to state “$-1$” should be read as transitions to state “$0$”.) Note that a job of length $\ell$ may choose to purchase an interval of length greater than $\ell$, which would render these transition probabilities incorrect. However, this may only happen if the larger interval is more affordable. It is therefore in the scheduler's interest to guarantee that $\pi_t(s, \ell)$ is monotone non-decreasing in $\ell$, which incentivizes truthfulness, since this increases the amount of server-time available, without affecting revenue. Thus we restrict ourselves to this case.

It remains to define the transition rewards. They are simply the revenue earned. Formally, a transition from state $s$ to $s + \ell - 1$ incurs a reward of $\pi_t(s, \ell)$, whereas a transition from state $s$ to $s - 1$ incurs 0 reward. We wish to compute a policy $\pi$ in such a way as to maximize the expected cumulative revenue, given as the (possibly discounted) sum of all transition rewards in expectation.
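The state dynamics and rewards just described can be sketched as a single simulation step. The names `step`, `menu`, and `job` are hypothetical, and the sketch assumes a menu that is non-decreasing in length (so a job buys exactly its own length) and the acceptance test "maximum delay at least the current state"; the source states the delay condition slightly differently in different places, so treat that test as an assumption.

```python
def step(state: int, menu: dict[int, float],
         job: tuple[int, float, int]) -> tuple[int, float]:
    """Advance the server by one time step; return (next_state, reward).

    state: current minimum available delay.
    menu:  posted price per interval length (assumed monotone in length).
    job:   (length, value, max_delay) of the arriving job.
    """
    length, value, max_delay = job
    price = menu.get(length)
    accepts = price is not None and value >= price and max_delay >= state
    if accepts:
        # The job occupies `length` more slots, and one time step elapses.
        return state + length - 1, price
    # No sale: one time step elapses; state "-1" is read as state 0.
    return max(state - 1, 0), 0.0
```

Summing the rewards returned by repeated calls over one sampled job sequence gives that sequence's cumulative revenue under the posted menus.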

### 3.2 Solving for the Optimal Policy with Distributional Knowledge

In this section, we present a modified MDP whose optimal policies can be efficiently computed, and show that these policies are optimal for the original MDP. Here we assume that the mechanism designer is given access to the underlying distribution $Q$. This is not in line with our model, which assumes the contrary. However, in the following sections, we will show that if the distribution $Q$ is estimated from samples, then solving the MDP on this estimated distribution is sufficient to ensure good revenue guarantees.

Since the problem has been modelled as a Markov Decision Process (MDP), we may rely on the wealth of literature available on MDP solutions; in particular, we will leverage the backwards induction algorithm (BIA) of [20, §4.5], included in appendix 0.B as Algorithm 1. We will however need to ensure that this standard algorithm (i) runs efficiently, and (ii) returns a monotone pricing policy.

Note that past prices do not contribute to future revenue insofar as the current state remains unchanged. Thus, to compute optimal current prices, we need only know the current state and expected future revenue. This allows us to use the BIA. The idea is to compute the optimal time-dependent policy, and the incurred expected reward, for shorter horizons, then use this to recursively compute the optimal policies for longer horizons.

The total runtime of the BIA is . Note that the dependence on $T$ is unavoidable, since any optimal policy must be time-dependent. Recall that and denote the maximum values that and can take, respectively, and is the set of possible values that can take. Denote . If we define the action space naïvely, we have , and . Thus, a naïve definition of the MDP bounds the runtime at , which is far from efficient. Requiring monotonicity only affects lower-order terms.

#### 3.2.1 Modified MDP.

To avoid this exponential dependence, we can be a little clever about the definition of the state space: instead of states being the possible server states, we define our state space as possible (state, length) pairs. Thus, when the MDP is in state $(s, \ell)$, the server is in state $s$, and a job of length $\ell$ has been sampled from the distribution. Our action space is then simply the possible values of $\pi_t(s, \ell)$, and the transition probabilities and rewards become:

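$$\Pr[(s, \ell) \to (s', \ell') \mid \pi] = \begin{cases} \Pr[V \ge \pi_t(s, \ell),\; D \ge s \mid L = \ell]\, \Pr[L' = \ell'] & \text{if } s' = s + \ell - 1, \\ \Pr[V < \pi_t(s, \ell) \text{ or } D < s \mid L = \ell]\, \Pr[L' = \ell'] & \text{if } s' = s - 1, \end{cases}$$

with reward $\pi_t(s, \ell)$ on a transition to $s' = s + \ell - 1$, and reward $0$ on a transition to $s' = s - 1$.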

Therefore, we get , and . Thus, the runtime of the algorithm becomes . A full description of the procedure is given in appendix 0.B as Algorithm 2. It remains to prove that it is correct. We begin by claiming that these two MDPs are equivalent in the following sense:
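To make the backward-induction idea concrete, here is a minimal sketch over (state, length) pairs. It is not the paper's Algorithm 2: the function name, the dictionary encoding of the job distribution, the restriction of candidate prices to the support of the values, the acceptance test "delay at least state", and the cap on the state space are all assumptions made for illustration. It also does not enforce monotonicity; the text argues separately that the optimum is monotone under its distributional assumption.

```python
from collections import defaultdict


def backward_induction(dist, horizon):
    """dist maps (length, value, max_delay) -> probability.

    Returns (u0, policy): u0[s] is the optimal expected revenue from time 0
    in server state s, and policy[(t, s, l)] is the chosen price.
    """
    lengths = sorted({l for l, _, _ in dist})
    prices = sorted({v for _, v, _ in dist})  # candidate prices (assumption)
    # States above the largest delay never sell, so this cap is sufficient.
    max_state = max(d for _, _, d in dist) + max(lengths) - 1

    p_len = defaultdict(float)
    for (l, _, _), pr in dist.items():
        p_len[l] += pr

    def accept(l, price, s):
        # Pr[V >= price, D >= s | L = l]
        joint = sum(pr for (l2, v, d), pr in dist.items()
                    if l2 == l and v >= price and d >= s)
        return joint / p_len[l]

    u_next = [0.0] * (max_state + 1)  # no revenue after the horizon
    policy = {}
    for t in range(horizon, -1, -1):
        u_now = [0.0] * (max_state + 1)
        for s in range(max_state + 1):
            stay = u_next[max(s - 1, 0)]  # no sale: state "-1" reads as 0
            for l in lengths:
                # Clamping only matters when the sale probability is zero.
                go = u_next[min(s + l - 1, max_state)]
                best_price, best_val = prices[0], float("-inf")
                for p in prices:
                    a = accept(l, p, s)
                    val = a * (p + go) + (1.0 - a) * stay
                    if val > best_val:
                        best_price, best_val = p, val
                policy[(t, s, l)] = best_price
                u_now[s] += p_len[l] * best_val
        u_next = u_now
    return u_next, policy
```

The inner recursion matches the one-step form "probability of sale times (price plus future revenue after the sale), plus probability of no sale times future revenue after idling", averaged over the sampled length. The repeated scan in `accept` keeps the sketch short; a real implementation would precompute those tail probabilities.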

###### Lemma 1

For any fixed pricing policy $\pi$,

$$U^{\pi}_t(s) = \mathbb{E}_L\left[u^{\pi}_t(s, L)\right], \quad \forall t \in T,\ s \in \mathcal{S},$$

where the $U$'s are as in (2), and the $u$'s are from the modified MDP.

(See appendix 0.B for a proof.) This lemma, however, does not suffice on its own, as agents may behave strategically by over-reporting their length, if the prices are not increasing. This would alter the transition probabilities, breaking the analysis. We will see that under a mild assumption, this cannot happen, as the optimal policy for non-strategic agents will be monotone, and therefore truthful.

### 3.3 Monotonicity of the Optimal Pricing Policies

Recall that the solution of the more efficient MDP formulation is only correct if we can show that it is always monotone without considering the strategic behaviour of agents, ensuring incentive-compatibility of the optimal policy.

An optimal monotone strategy cannot be obtained for every distribution over job parameters. As an example, for any distribution where a job's value is a deterministic function of its length, the optimal policy is to price-discriminate by length. If this function is not monotone, the optimum will not be either. We wish to show monotonicity under the assumption below.

#### 3.3.1 Assumption 1.

The quantity is monotone non-decreasing as grows, for any state and fixed.

This is not an immediately intuitive assumption, but we show that it is satisfied by all “-parametrized” log-concave random variables, where the parametrization captures a sense of positive correlation between length and value.

###### Lemma 2

Let, denote the marginal random variable conditioned on and . Let be a continuously-supported random variable, and . If is distributed like , , , or , then the assumption is satisfied if is log-concave, or if the ’s are independent of .

A discussion of log-concave random variables and a proof of this fact is given in appendix 0.A. Many standard (discrete) distributions are (discrete) log-concave random variables, including the uniform, Gaussian, logistic, exponential, Poisson, binomial, etc. These can be proved to be log-concave from the discussion in appendix 0.A. In the above, the terms represent a notion of spread or shifting, parametrized by the length, indicating some amount of positive correlation.

It remains to show price monotonicity under the above assumption. First, we begin with the following, which holds for arbitrary distributions.

###### Lemma 3

Let $U^*_t(s)$ be the expected future revenue earned starting at time $t$ in state $s$, for the optimal policy computed by Algorithm 2. Then the function $U^*_t(s)$ is monotone non-increasing in $s$ for any fixed $t$.

See appendix 0.B for the proof. This lemma ensures that over-selling time on the server can only hurt the mechanism. This allows us to conclude:

###### Lemma 4

If the distribution on job parameters satisfies the above assumption, then for all , we have .

###### Proof (Sketch.)

As usual, a full proof may be found in appendix 0.B. The idea is to show that, for any price less than the optimum , the difference in revenue between charging and to jobs of length is less than the difference in revenue between the same prices for jobs of length . This is achieved by applying the assumption to the recursive definition of future revenue, along with the previous lemma. Thus, we can conclude that the optimal price must be greater than . ∎

With Lemma 4 and the results of Appendix 0.C, we finally have:

###### Theorem 3.1

The online server pricing problem admits an optimal monotone pricing strategy when the variables $L$, $V$, and $D$ satisfy Assumption 1. Also,

1. In the finite horizon setting, when is finitely supported, an exact optimum can be computed in time .

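2. In the infinite horizon setting, when $V$ is finitely supported, for all $\varepsilon > 0$, an $\varepsilon$-additive-approximate policy can be computed in time

$$O\!\left(K^3 \log_{\gamma}\!\left(\frac{\varepsilon(1-\gamma)}{\mathbf{V}}\right)\right) \le O\!\left(\frac{K^3}{1-\gamma} \ln\!\left(\frac{\mathbf{V}}{\varepsilon(1-\gamma)}\right)\right)$$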
3. In the finite horizon setting, when is continuously supported, for all , an -additive-approximate policy can be computed in time .
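The inequality in item 2 can be checked directly. The derivation below reads the left-hand side as $\log_\gamma(\varepsilon(1-\gamma)/\mathbf{V})$; the fraction bars appear to have been lost in the displayed formula, so this reading is an assumption, as are the conditions $0 < \gamma < 1$ and $\varepsilon(1-\gamma) < \mathbf{V}$.

$$\log_{\gamma}\frac{\varepsilon(1-\gamma)}{\mathbf{V}} \;=\; \frac{\ln\frac{\mathbf{V}}{\varepsilon(1-\gamma)}}{\ln(1/\gamma)} \;\le\; \frac{1}{1-\gamma}\,\ln\frac{\mathbf{V}}{\varepsilon(1-\gamma)}, \qquad \text{since } \ln(1/\gamma) = -\ln\gamma \ge 1-\gamma .$$

One natural interpretation, not stated explicitly in this section, is that truncating the discounted problem after $H$ steps loses at most $\gamma^{H}\mathbf{V}/(1-\gamma)$ in revenue, which is at most $\varepsilon$ exactly when $H \ge \log_{\gamma}(\varepsilon(1-\gamma)/\mathbf{V})$.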

## 4 Performance with distribution learned from samples

In the previous section, we have shown how to compute the optimal pricing policy in expectation, given the distribution. However, our problem does not allow access to this distribution. We describe here how one might learn a distribution $\hat{Q}$ which approximates $Q$ in our setting. We then show that the Bayes-optimal policy computed over $\hat{Q}$ performs well in expectation and with high probability.

### 4.1 Learning the Underlying Distribution from Samples

When a job arrives, we only learn its length, and only if it agrees to be scheduled. Thus, we are not given full samples of $Q$, complicating the learning procedure. We propose here a simple sampling method allowed within the model, which efficiently computes an estimate $\hat{Q}$ with good error bounds. This in turn lets us bound the error on the estimated future revenues, and show that the revenue of any given policy with jobs from $\hat{Q}$ is similar to the revenue for this same policy with jobs from $Q$. Thus, a policy which is optimal with respect to $\hat{Q}$ will be close-to-optimal with respect to $Q$.

Let $X_1, \dots, X_n$ be an i.i.d. sample of $n$ jobs from the underlying distribution $Q$. Note that the expectation of an indicator is the probability of the indicated event. Fix a length $\ell$, a state $s$, and a value $v$. As a consequence of Hoeffding's inequality, with probability $1 - \delta$,

$$\left| \frac{1}{n} \sum_{k=1}^{n} \mathbb{I}[L_k = \ell,\; V_k \ge v,\; D_k \ge s] - \Pr[L = \ell,\; V \ge v,\; D \ge s] \right| \le \sqrt{\frac{\log(2/\delta)}{2n}} \tag{6}$$
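Inverting the Hoeffding bound gives the number of samples needed for a single estimate to be within a target error. The helper below is a small sketch of that arithmetic; the name `samples_needed` and the symbols `eps` (target additive error) and `delta` (failure probability) are introduced here for illustration, and the count covers one fixed choice of length, state, and value, before any union bound.

```python
import math


def samples_needed(eps: float, delta: float) -> int:
    """Smallest n with sqrt(log(2/delta) / (2n)) <= eps."""
    if not (0.0 < eps and 0.0 < delta < 1.0):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
```

The count grows like $1/\varepsilon^2$ but only logarithmically in $1/\delta$, which is why spreading the failure probability over many estimates with a union bound is cheap.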

#### 4.1.1 Sampling Procedure.

We wish to estimate the value for all choices of , , and . Fixing and , we may repeatedly post prices and declare that the earliest available time is , then record (i) which job accepts to be scheduled, and (ii) the length of each scheduled job. Let and , then by (6), the sample-average of each value will have error at most with probability , for any one choice of .

Repeating this process for all choices of and gives us estimates for each. Now, if we want to have the estimate hold over all choices of , it suffices to take the union bound over all values (incl. ), and scaling accordingly. If we take samples for each of the choices of and , then simultaneously for all , , and , the quantity in (6) is at most . In an abuse of notation, we will denote this “”.

It should be noted that, for this sampling procedure, if a job is scheduled, we may have to wait for it to finish before taking the next sample, so as to clear the buffer. This blows up the sampling time by a factor of at most the maximum job length.

Later, we will need to estimate the value $\Pr[L = \ell,\ (V < v \text{ or } D < s)]$, that is, the probability that the job has length $\ell$, but either cannot afford price $v$, or cannot be scheduled $s$ slots in the future. This is equal to

$$\Pr[L = \ell] - \Pr[L = \ell,\; V \ge v,\; D \ge s].$$

The left-hand term is equal to $\Pr[L = \ell,\; V \ge 0,\; D \ge 0]$, and so we have access to both terms. The estimation error is additive, so the deviation is at most twice the error of a single estimate.
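The two estimates above can be sketched from raw observations. In this illustration, one sampling round is a list of outcomes recorded while posting a fixed price and a fixed state: each entry is the length of the job that accepted, or `None` if the job declined. The function names and this encoding are assumptions made here, as is the premise that an accepting job reveals its true length.

```python
from collections import Counter
from typing import Optional


def estimate_accept_probs(outcomes: list[Optional[int]]) -> dict[int, float]:
    """Empirical Pr[L = l, job accepts] for each observed length l."""
    n = len(outcomes)
    counts = Counter(o for o in outcomes if o is not None)
    return {l: c / n for l, c in counts.items()}


def estimate_reject_prob(p_len: dict[int, float],
                         p_accept: dict[int, float], length: int) -> float:
    """Empirical Pr[L = length, job declines] as a difference of estimates.

    p_len comes from a round in which every job accepts (price 0, state 0);
    p_accept comes from the round at the price and state of interest.
    """
    return p_len.get(length, 0.0) - p_accept.get(length, 0.0)
```

Because the result is a difference of two sample averages, its error is at most the sum of their individual errors, matching the additive argument in the text.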

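Denote $p^{\ell}_{t,s} := \Pr[V \ge \pi_t(s, \ell),\; D \ge s \mid L = \ell]$, and recall

$$U^{\pi}_t(s) := \sum_{\ell \in \mathcal{L}} \Pr[L = \ell] \Big( p^{\ell}_{t,s} \big( \pi_t(s, \ell) + U^{\pi}_{t+1}(s + \ell - 1) \big) + \big(1 - p^{\ell}_{t,s}\big)\, U^{\pi}_{t+1}(s - 1) \Big), \tag{7}$$

the expected revenue from time $t$ onwards, conditioning on $S_t = s$. Let $\hat{U}^{\pi}_t$ be the same as $U^{\pi}_t$, but where the variables are distributed as $\hat{Q}$. As before, let $U^*_t$ be $U^{\pi}_t$ for $\pi = \pi^*$, the Bayes-optimal policy returned by Algorithm 2, and let $\hat{U}^*_t$ be defined similarly but with respect to $\hat{Q}$. We will show that $\hat{U}$ is a good estimate for $U$.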

###### Lemma 5

Let , , and , be as above. In the finite horizon, for all , if , we have that with probability , for all . In the infinite horizon, if , we have that with probability , for all .

As usual, the proof is in Appendix 0.B.3. It is a consequence of being small, which is a consequence of Höffding’s inequality for the right choice of .

### 4.2 Concentration Bounds on Revenue for Online Scheduling

In the previous section, we showed that the performance of any fixed policy, including $\hat{\pi}$, can be well estimated given sufficient samples from $Q$. In this section, we show that the revenue of arbitrary policies concentrates around their mean. This will allow us to argue later that, if we first learn $\hat{Q}$ from samples, then execute Algorithm 2 given the sample distribution $\hat{Q}$, then the output policy will perform well with respect to $Q$, both in expectation, and with high probability.

To show this concentration, we will consider the Doob or exposure martingale of the cumulative revenue function, introduced in Section 2. Define

$$R^{\pi}_i := \mathbb{E}\left[\mathrm{CmlRev}_T(\pi, X) \mid X_1, \dots, X_i\right] \tag{8}$$

where the ’s are jobs in the sequence and the expected value is taken with respect to . Thus, is the expected cumulative revenue, and is the random cumulative revenue. To formally describe this martingale sequence, we will introduce some notation, and formalize some previous notation. Recall that is a sequence of jobs sampled i.i.d. from an underlying distribution . Fix a pricing policy . Note that the state at time is a random variable depending on both the (deterministic) pricing policy and the (random) . We denote it , or for short. Formally, suppose , then if either or , and otherwise . Furthermore, let be equal to 0 in the first case above (the -th job is not scheduled), and otherwise. Thus, and are functions of the random values for fixed. Note that implicitly depends on . Let and . Recalling that , we have

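$$\begin{aligned} R^{\pi}_i &= \mathbb{E}\left[\, \sum_{t=0}^{T} \mathrm{Rev}_t(\pi, X) \;\middle|\; X_1, \dots, X_i \right] && \text{(9a)} \\ &= \left( \sum_{t=0}^{i} \mathrm{Rev}_t(\pi, X_{\le t}) \right) + U^{\pi}_{i+1}\big(S_{i+1}(\pi, X_{\le i})\big) && \text{(9b)} \end{aligned}$$

We wish to show that the cumulative revenue concentrates around its mean. Since $R^{\pi}_0$ is the expected revenue due to $\pi$, and $R^{\pi}_T$ is the (random) revenue observed, it suffices to show that $|R^{\pi}_T - R^{\pi}_0|$ is small, which we will do by applying Azuma's inequality, after showing the bounded-differences property. This gives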

###### Theorem 4.1

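Let $X$ be a finite sequence of jobs sampled from $Q$, and let $\pi$ be any monotone policy. Then, with probability $1 - \delta$,

$$\left| \mathrm{CmlRev}_T(X, \pi) - \mathbb{E}_{X'}\left[\mathrm{CmlRev}_T(X', \pi)\right] \right| \le \mathbf{V} \cdot \sqrt{2 \log(2/\delta)\,(T+1)}$$

in the finite horizon, and in the infinite-horizon-discounted,

$$\left| \mathrm{CmlRev}_\infty(X, \pi) - \mathbb{E}_{X'}\left[\mathrm{CmlRev}_\infty(X', \pi)\right] \right| \le \mathbf{V} \cdot \sqrt{2 \log(2/\delta) / (1 - \gamma^2)}.$$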
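To get a feel for these deviation terms, the sketch below evaluates them numerically. The finite-horizon form used here, $\mathbf{V}\sqrt{2\log(2/\delta)(T+1)}$, is the one that appears in Theorem 4.2 with $\delta$ replaced by $\delta/4$; the function names and argument names are hypothetical.

```python
import math


def deviation_finite(v_max: float, horizon: int, delta: float) -> float:
    """V * sqrt(2 log(2/delta) (T+1)): finite-horizon deviation term."""
    return v_max * math.sqrt(2.0 * math.log(2.0 / delta) * (horizon + 1))


def deviation_discounted(v_max: float, gamma: float, delta: float) -> float:
    """V * sqrt(2 log(2/delta) / (1 - gamma^2)): discounted deviation term."""
    return v_max * math.sqrt(2.0 * math.log(2.0 / delta) / (1.0 - gamma ** 2))
```

The finite-horizon term grows like $\sqrt{T}$, while cumulative revenue can grow linearly in $T$, so the relative deviation shrinks as the horizon lengthens. With `gamma = 0` the discounted term coincides with the single-period (`horizon = 0`) finite term.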

### 4.3 Performance of the Computed Policy

We combine here the results of the two previous subsections to analyze the performance of the policy output by Alg. 2. By the estimation of revenue, the best policy in estimated-expectation is near-optimal in expectation. Since revenues from arbitrary policies concentrate, we get near-optimal revenue in hindsight.

Formally, for , Lemma 5 gives us that if the sample-distribution is computed on samples, then with probability over the samples, . Note that is exactly the expected cumulative revenue of the optimal policy. For clarity of notation, denote

$$\mathrm{ECRev}_T(\pi \mid Q) := \mathbb{E}_{X \sim Q}\left[\mathrm{CmlRev}_T(X, \pi)\right] \tag{10}$$

We have shown that for sufficient samples, , with probability . This observation allows us to then conclude

###### Theorem 4.2

Let be the underlying distribution of jobs, let and , and let be the sampled distribution as in section 4.1, learned from samples. Let be the -optimal policy returned by Algorithm 2 given . Then with probability , for any arbitrary policy ,

$$\mathrm{CmlRev}_T(X, \hat{\pi}) \ge \mathrm{CmlRev}_T(X, \pi) - 2\mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon.$$

In the infinite horizon, the error term is and needs samples.

###### Proof

We have chosen . Let be the optimal policy for the true distribution . By Theorem 4.1, we have with probability for both and . Furthermore, by Lemma 5, with probability , for both and . This is because from the point of view of , is the true distribution, and is the estimate. Taking the union bound over all four events above, and recalling that maximizes , and maximizes , we get the following with probability :

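$$\begin{aligned} \mathrm{CmlRev}_T(X, \hat{\pi}) &\ge \mathrm{ECRev}_T(\hat{\pi} \mid Q) - \mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} && \text{(concentration)} \\ &\ge \mathrm{ECRev}_T(\hat{\pi} \mid \hat{Q}) - \mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon/2 && \text{(sample error)} \\ &\ge \mathrm{ECRev}_T(\pi^* \mid \hat{Q}) - \mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon/2 && \text{(optimality)} \\ &\ge \mathrm{ECRev}_T(\pi^* \mid Q) - \mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon && \text{(sample error)} \\ &\ge \mathrm{ECRev}_T(\pi \mid Q) - \mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon && \text{(optimality)} \\ &\ge \mathrm{CmlRev}_T(X, \pi) - 2\mathbf{V}\sqrt{2 \log(8/\delta)\,(T+1)} - \varepsilon && \text{(concentration)} \end{aligned}$$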

as desired. The proof for infinite-horizon is identical. ∎

## 5 Conclusions

In summary, we propose to price time on a server by first learning the distribution over jobs from samples, then computing the Bayes-optimal policy from the estimated distribution. Our learning algorithm is very simple: we sample the distribution by observing jobs at artificially fixed prices and server-states, and learn the job parameters according to whether they agree to be scheduled. Using these observations, we build an observed distribution . Given the observed distribution , we run Algorithm 2 and compute an optimal policy for . We are guaranteed that is a policy with prices monotone non-decreasing in job length (due to Lemma 3), and therefore it is incentive compatible, which implies that our computations estimating its revenue are correct.
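The monotonicity property mentioned above is easy to check on a tabulated policy. The sketch below assumes, purely for illustration, that a policy is stored as a dictionary from `(server_state, job_length)` pairs to posted prices; this representation and the name `is_monotone_in_length` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (server_state, job_length) -> posted price.
Policy = Dict[Tuple[int, int], float]


def is_monotone_in_length(policy: Policy) -> bool:
    """Return True if, for every server state, prices never decrease with job length."""
    by_state: Dict[int, List[Tuple[int, float]]] = {}
    for (state, length), price in policy.items():
        by_state.setdefault(state, []).append((length, price))
    for entries in by_state.values():
        entries.sort()  # order by job length within one server state
        for (_, shorter), (_, longer) in zip(entries, entries[1:]):
            if longer < shorter:
                return False
    return True


example = {(0, 1): 1.0, (0, 2): 1.5, (1, 1): 0.5, (1, 2): 0.5}
print(is_monotone_in_length(example))
```

A check of this kind only verifies the shape of a computed price table; it says nothing about whether the table is optimal.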

We conclude with the following theorem that combines the results of Theorems 3.1, 4.1, and 4.2, when the policy is the optimal policy .

###### Theorem 5.1 (Finite Horizon)

Let be the underlying distribution over jobs. Let , and . Then in time , we may compute a policy which is monotone in length, and therefore incentive compatible, such that for any policy , with probability ,

$$\mathrm{CmlRev}_T(X,\hat\pi) \ge \mathrm{CmlRev}_T(X,\pi) - 2V\sqrt{2\log(8/\delta)(T+1)} - \varepsilon .$$

Furthermore, if the distribution over values is continuous rather than discrete, we may compute in time a monotone policy such that for any policy , with probability ,

$$\mathrm{CmlRev}_T(X,\hat\pi) \ge \mathrm{CmlRev}_T(X,\pi) - 2V\sqrt{2\log(8/\delta)(T+1)} - \varepsilon - \eta T .$$

This policy is computed by learning from samples as in Section 4.1, and running Algorithm 2 for the estimated distribution. When is continuously distributed, choose prices which are multiples of between 0 and , as is outlined in appendix 0.C.2.
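The discretization step for continuous values can be sketched as below. The symbols for the grid spacing and the upper limit are not legible in the text above; the sketch assumes the spacing is the $\eta$ of the $-\eta T$ term and takes the upper limit as a free parameter `upper` (plausibly $V$). The names `price_grid` and `round_down_to_grid` are hypothetical.

```python
import math
from typing import List


def price_grid(eta: float, upper: float) -> List[float]:
    """Prices 0, eta, 2*eta, ... that do not exceed `upper`."""
    if eta <= 0 or upper < 0:
        raise ValueError("need eta > 0 and upper >= 0")
    # Small tolerance guards against floating-point quotients like 2.9999999.
    steps = int(math.floor(upper / eta + 1e-9))
    return [k * eta for k in range(steps + 1)]


def round_down_to_grid(price: float, eta: float) -> float:
    """Largest multiple of eta that is at most `price`; moves the price by less than eta."""
    return math.floor(price / eta + 1e-9) * eta


print(price_grid(0.25, 1.0))
print(round_down_to_grid(0.6, 0.25))
```

Restricting to a grid keeps the number of candidate prices finite, at the cost of the extra $\eta T$ term in the bound.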

###### Theorem 5.2 (Infinite Horizon, Discounted)

Let be the underlying distribution over jobs. Let , and . Then we may compute a policy in time , which is monotone, and thus incentive compatible, such that for any policy , with probability ,

$$\mathrm{CmlRev}_\infty(X,\hat\pi) \ge \mathrm{CmlRev}_\infty(X,\pi) - 2V\sqrt{2\log(8/\delta)/(1-\gamma^2)} - 2\varepsilon .$$

Furthermore, if the distribution over values is continuous rather than discrete, we may compute in time a monotone policy such that for any , with probability ,

$$\mathrm{CmlRev}_\infty(X,\hat\pi) \ge \mathrm{CmlRev}_\infty(X,\pi) - 2V\sqrt{2\log(8/\delta)/(1-\gamma^2)} - 2\varepsilon - \eta/(1-\gamma) .$$

As above, this policy is computed by learning from samples as in Section 4.1, then running the modified Algorithm 2 for the estimated distribution. In case is continuously distributed, we restrict ourselves to prices which are multiples of between 0 and .
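As a numerical illustration of the discounted bound, the sketch below evaluates $2V\sqrt{2\log(8/\delta)/(1-\gamma^2)} + 2\varepsilon + \eta/(1-\gamma)$. The function name `discounted_gap` and the sample values are assumptions made for this example; setting `eta=0` recovers the discrete-value case.

```python
import math


def discounted_gap(V: float, gamma: float, delta: float, eps: float, eta: float = 0.0) -> float:
    """Additive term of the infinite-horizon discounted bound (hypothetical helper)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    concentration = 2.0 * V * math.sqrt(2.0 * math.log(8.0 / delta) / (1.0 - gamma ** 2))
    return concentration + 2.0 * eps + eta / (1.0 - gamma)


# The bound loosens as the discount factor approaches 1.
for g in (0.5, 0.9, 0.99):
    print(g, discounted_gap(V=1.0, gamma=g, delta=0.05, eps=0.1, eta=0.01))
```

The factor $1/(1-\gamma^2)$ plays the role that $T+1$ plays in the finite horizon, so a discount factor close to 1 behaves like a long horizon.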

#### 5.0.1 Future Work.

This work clearly opens up several future research directions. It is a challenging problem to extend our results to the multi-server setting, for servers. A second interesting open problem is concerned with the case in which the jobs can be scheduled with some delay after the first time the machine is available. We finally mention the more complex combinatorial setting in which the jobs require various quantities of different resources, e.g. storage and computation.

## References

• [1] Azar, Y., Kalp-Shaltiel, I., Lucier, B., Menache, I., Naor, J., Yaniv, J.: Truthful online scheduling with commitments. In: EC (2015)
• [2] Babaioff, M., Mansour, Y., Nisan, N., Noti, G., Curino, C., Ganapathy, N., Menache, I., Reingold, O., Tennenholtz, M., Timnat, E.: Era: A framework for economic resource allocation for the cloud. In: Proceedings of the 26th WWW Companion. pp. 635–642 (2017). https://doi.org/10.1145/3041021.3054186
• [3] Bagnoli, M., Bergstrom, T.: Log-concave probability and its applications. Economic theory 26(2), 445–469 (2005)
• [4] Blumrosen, L., Holenstein, T.: Posted prices vs. negotiations: an asymptotic analysis. In: Proceedings of the 9th EC. pp. 49–49. ACM (2008)
• [5] den Boer, A.V.: Dynamic pricing and learning: historical origins, current research, and new directions. Surveys in O.R. and management science 20(1), 1–18 (2015)
• [6] Carroll, T.E., Grosu, D.: An incentive-compatible mechanism for scheduling non-malleable parallel jobs with individual deadlines. In: Proceedings of the 2008 ICPP. pp. 107–114 (2008). https://doi.org/10.1109/ICPP.2008.27
• [7] Chawla, S., Devanur, N.R., Holroyd, A.E., Karlin, A.R., Martin, J.B., Sivan, B.: Stability of service under time-of-use pricing. In: Procs. of the 49th Annual ACM SIGACT Symp. on Theory of Computing. pp. 184–197. ACM (2017)
• [8] Chawla, S., Hartline, J.D., Malec, D.L., Sivan, B.: Multi-parameter mechanism design and sequential posted pricing. In: Proc’s of the 42nd STOC. ACM (2010)
• [9] Cole, R., Roughgarden, T.: The sample complexity of revenue maximization. In: Proceedings of the 46th STOC. pp. 243–252. ACM (2014)
• [10] Dütting, P., Kleinberg, R.: Polymatroid prophet inequalities. In: Proceedings of the 23rd Annual European Symposium on Algorithms (ESA). pp. 437–449 (2015)
• [11] Feldman, M., Gravin, N., Lucier, B.: Combinatorial auctions via posted prices. In: Proceedings of the 26th SODA. pp. 123–135. ACM-SIAM (2015)
• [12] Jain, N., Menache, I., Naor, J., Yaniv, J.: A truthful mechanism for value-based scheduling in cloud computing. Theory of Computing Systems 54, 388–406 (2013)
• [13] Kilcioglu, C., Rao, J.M.: Competition on price and quality in cloud computing. In: WWW (2016)
• [14] Kleinberg, R., Weinberg, S.M.: Matroid prophet inequalities. In: Proceedings of the 44th STOC. pp. 123–136. ACM (2012)
• [15] Morgenstern, J., Roughgarden, T.: The pseudo-dimension of near-optimal auctions. In: Proceedings of the 28th NeurIPS. MIT Press, Cambridge, MA, USA (2015)
• [16] Morgenstern, J., Roughgarden, T.: Learning simple auctions. In: 29th COLT. Proceedings of Machine Learning Research, vol. 49, pp. 1298–1318 (2016)
• [17] Myerson, R.B.: Optimal auction design. Math’s. of O.R. 6(1), 58–73 (1981)
• [18] Nisan, N., Ronen, A.: Algorithmic mechanism design (extended abstract). In: Proceedings of the 31st STOC. pp. 129–140. ACM (1999)
• [19] Porter, R.: Mechanism design for online real-time scheduling. In: Proc’s. of the 5th EC. pp. 61–70. EC ’04, ACM (2004). https://doi.org/10.1145/988772.988783
• [20] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley Series in Probability and Statistics). Wiley-Interscience (2005)
• [21] Sandholm, T., Gilpin, A.: Sequences of Take-It-or-Leave-It Offers: Near-Optimal Auctions Without Full Valuation Revelation, pp. 73–91. Springer (2004)
• [22] Ströhle, P., Gerding, E.H., de Weerdt, M., Stein, S., Robu, V.: Online mechanism design for scheduling non-preemptive jobs under uncertain supply and demand. In: Int’l. conference on AAMAS 2014. pp. 437–444 (2014)
• [23] Tang, X., Li, X., Fu, Z.: Budget-constraint stochastic task sched. on heterogeneous cloud systems. Concurrency and Comp.: Practice and Experience 29(19) (2017)
• [24] Vickrey, W.: Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance 16(1), 8–37 (1961)

## Appendix 0.A Log-Concave Distributions

In Section 3.3, we sought to show that if the value of a random job has a log-concave distribution, then the optimal policy will be monotone. We present here a discussion of log-concavity, both for continuous and discrete random variables, and give the proof of the monotonicity of the prices.

Formally, a function is log-concave if for any and , and for any , . Equivalently, . For a discretely supported , we replace this condition with , emulating the continuous definition with . We further require that the support of be “connected”.

###### Definition 1

A continuous random variable with density function is said to be log-concave if is log-concave. A discrete random variable with probability mass function is said to be log-concave if is discretely log-concave.
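For a concrete handle on the discrete definition, the sketch below tests a probability mass function given on consecutive integers. It assumes the standard discrete condition $p_y^2 \ge p_{y-1}\,p_{y+1}$ (the exact formula is not legible in the text above) together with the connected-support requirement. The name `is_discretely_log_concave` is hypothetical, and `Fraction` is used so that comparisons are exact.

```python
from fractions import Fraction
from typing import Sequence


def is_discretely_log_concave(pmf: Sequence[Fraction]) -> bool:
    """pmf[i] is the mass at the i-th of a run of consecutive integers; mass elsewhere is 0."""
    support = [i for i, p in enumerate(pmf) if p > 0]
    if not support:
        return False
    first, last = support[0], support[-1]
    # Connected support: no zero mass strictly between two positive masses.
    if any(pmf[i] == 0 for i in range(first, last + 1)):
        return False
    padded = [Fraction(0)] + list(pmf) + [Fraction(0)]
    # Assumed condition: p_y^2 >= p_{y-1} * p_{y+1} at every point.
    return all(
        padded[i] ** 2 >= padded[i - 1] * padded[i + 1]
        for i in range(1, len(padded) - 1)
    )


triangular = [Fraction(w, 9) for w in (1, 2, 3, 2, 1)]
bimodal = [Fraction(w, 10) for w in (4, 1, 5)]
gapped = [Fraction(1, 2), Fraction(0), Fraction(0), Fraction(1, 2)]
print(is_discretely_log_concave(triangular), is_discretely_log_concave(bimodal), is_discretely_log_concave(gapped))
```

The `gapped` example shows why connectedness is imposed separately: with two consecutive zeros the product inequality holds at every point, yet the mass function is clearly not single-peaked.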

A well-known fact is that log-concave random variables also have log-concave cumulative density/mass functions. We present here a quick proof of this fact, for completeness.

###### Claim

If is a log-concave continuous r.v., then , and are log-concave functions of . If is a log-concave discrete r.v. supported on , then and are discretely log-concave functions of .

###### Proof

The continuous case is well-documented in the literature. See for example [3]. For the discrete case, observe first that since a mass function is non-negative, and we have assumed contiguous support, the function must be single-peaked, i.e. quasi-concave, as any local minimum would contradict the definition. Furthermore, the definition of log-concavity is equivalent to . Repeatedly applying this, and rearranging, we get

$$p_y\,p_{y+k} \ge p_{y-1}\,p_{y+k+1} \qquad \forall\, y,k\in\mathbb{Z},\ k\ge 0 .$$

It remains to show that is log-concave. We have

$$\begin{aligned}
P(y)P(y) &= P(y-1)P(y) + \sum_{k=-\infty}^{y} p_k\,p_y \\
&\ge P(y-1)P(y) + \sum_{k=-\infty}^{y} p_{k-1}\,p_{y+1} = P(y-1)P(y+1)
\end{aligned}$$

as desired. The same technique applies for the upper-sum. ∎
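The discrete part of the claim can be sanity-checked numerically. The sketch below builds the lower and upper cumulative sums of a small log-concave mass function and verifies $P(y)^2 \ge P(y-1)P(y+1)$ at every point of the support, using exact `Fraction` arithmetic. The helper name `cumulative_is_log_concave` and the example mass function are assumptions made for illustration; a check on one example is of course not a proof.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Sequence


def cumulative_is_log_concave(pmf: Sequence[Fraction]) -> bool:
    """Check P(y)^2 >= P(y-1) * P(y+1) for the running sums P of `pmf`.

    Assumes `pmf` sums to 1, so P equals 0 before the support and 1 after it.
    """
    running: List[Fraction] = list(accumulate(pmf))
    extended = [Fraction(0)] + running + [Fraction(1)]
    return all(
        extended[i] ** 2 >= extended[i - 1] * extended[i + 1]
        for i in range(1, len(extended) - 1)
    )


pmf = [Fraction(w, 9) for w in (1, 2, 3, 2, 1)]  # log-concave example
lower_ok = cumulative_is_log_concave(pmf)
# Reversing the order turns upper sums into running sums; log-concavity is
# unaffected by reflecting the index, so the same check applies.
upper_ok = cumulative_is_log_concave(list(reversed(pmf)))
print(lower_ok, upper_ok)
```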

This will allow us to then conclude:

(Lemma 2, p.2) Let, denote the marginal r.v. conditioned on and . Let be a continuously-supported random variable, and . If is distributed like , , , or , then assumption 1 is satisfied if is log-concave, or if the ’s are independent of .

###### Proof

First, observe that

$$\Pr[V\ge\mu,\ D\ge s \mid L=\ell] = \Pr[V\ge\mu \mid D\ge s,\ L=\ell]\cdot\Pr[D\ge s \mid L=\ell],$$

and since we are taking ratios for fixed, we can replace the joint cumulatives on and in the assumption, with the marginals on just .

Now, if the ’s are independent of , then the ratio remains unchanged as changes, satisfying assumption 1. Otherwise, we begin by analyzing the distributions given by and . Let , noting that and , for the two cases, respectively. Note that we wish to show is increasing, which is equivalent to increasing.

For , observe that for and , we have

$$\log\bar F(x-\gamma) - \log\bar F(x'-\gamma) \ge \log\bar F(x-\gamma') - \log\bar F(x'-\gamma')$$

since is a non-increasing and concave function, by assumption. Also

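$$\begin{aligned}
\log\bar F(x/\gamma) - \log\bar F(x'/\gamma) &\ge \log\bar F(x/\gamma') - \log\bar F\big(x/\gamma' + (x'-x)/\gamma\big) \\
&\ge \log\bar F(x/\gamma') - \log\bar F(x'/\gamma')
\end{aligned}$$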

where the first inequality follows as in the previous equation, and the second is by monotonicity. This completes the continuous case.

For , we note that if . So the probability is . Similarly, for , is . Thus, if we assume that and are integers, the calculations above go through, as desired. ∎

We present a final fact that justifies the use of -type random variables:

###### Lemma 6

If is a discrete log-concave random variable, then there exists a continuous log-concave such that .

###### Proof

Let be the right-hand cumulative mass function for . Then, it suffices to have for all integers . Let be the piecewise-linear function such that , , and for all . Since is a discretely concave and non-increasing function, must be concave and non-increasing. We can then set to be the random variable whose density is given by . ∎

## Appendix 0.B Detailed Proofs

We present in this section the detailed proofs of the lemmas and theorems from the text. Section 0.B.1 gives the pseudocode for the dynamic programs that compute the optimal pricing policies outlined in Section 3; 0.B.2 gives the proofs of the monotonicity of the pricing policies, along with the discussion of log-concave random variables from Appendix 0.A; and 0.B.3 gives the computations of the error bounds from Section 4.