Revenue maximization in online advertising is an important development direction for leading Internet companies (such as real-time ad exchanges, search engines, and social networks), in which a large part of ad inventory is sold via widely applicable second-price auctions [19, 32], including their generalizations GSP and VCG. Revenue in these auctions is mostly controlled by means of reserve prices, whose proper setting is studied both by game-theoretical methods [35, 28] and by machine learning approaches [36, 8, 41, 32, 31, 15]. A large number of online auctions in, for example, ad exchanges involve only a single buyer [2, 33, 3, 15], and, in this case, a second-price auction with reserve reduces to a posted-price auction, where the seller sets a reserve price for a good (e.g., an advertisement space) and the buyer decides whether to accept or reject it (i.e., to bid above or below the price).
In our study, we focus on a scenario in which the seller repeatedly interacts through a posted-price mechanism with the same strategic buyer, who holds a fixed private valuation for a good and seeks to maximize his cumulative surplus. At each round of this game, the seller is able to choose the price based on the previous decisions of the buyer: he applies a deterministic online learning algorithm announced to the buyer in advance. While previous studies on this scenario [2, 33, 15] provide the seller with pricing algorithms that guarantee lower bounds on his cumulative revenue for any buyer valuation (via worst-case strategic regret minimization), we search for pricing algorithms that exactly maximize the expectation of the seller’s cumulative revenue over a given distribution of buyer valuations. The cumulative utilities (surplus for the buyer and revenue for the seller) are considered as discounted sums of the corresponding instant utilities gained at each round, which allows us to cover a wide range of games (including the ones with an infinite number of rounds and finite games without discounting).
We start our study by addressing the case when both the seller and the buyer have the same discount. We show that the constant pricing algorithm with the Myerson price $p^* = \arg\max_p\, p\,(1 - F(p))$, where $F$ is the cumulative distribution function of the valuation distribution, maximizes our optimization objective (see Theorem 1). This result tells us that any dynamic learning of prices based on previous decisions of the buyer cannot increase the expected cumulative revenue of the seller with respect to a much simpler approach that offers the optimal constant price over all rounds. Further, we also show that the above-mentioned optimal pricing is not unique. Namely, there exists an optimal pricing algorithm (referred to as “big deal”) that proposes the following choice to the buyer: pay a large price at the first round and get all goods in the subsequent rounds for free, or otherwise get nothing (see Prop. 1). The same discount for both participants of the game means that we do not give either of them any advantage over the other. However, in many real applications, there exists an imbalance between the sides in the patience to wait for utility. This asymmetry is often modeled by different discounts for them [2, 3, 33]. In our work, we address both the case of a less patient seller and the case of a less patient buyer.
First, in the case when the buyer’s discount rate is larger than the seller’s one, we find that the algorithm “big deal” with a specific price at the first round can still be effectively applied by the seller (i.e., with optimal outcome). Namely, it allows the seller to “accumulate” all his revenue at the first round and, in this way, to avoid the unfavorable discounting in the future rounds; this discounting makes the constant algorithm with Myerson’s price suboptimal (see Sec. 5). Second, in the inverse case, when the buyer’s discount rate is lower than the seller’s one, the optimization problem becomes surprisingly more complicated. In this case, we reduce it to the optimization of a bilinear form (see Theorem 2). This functional constitutes a multivariate analogue of the one-dimensional function widely used in static auctions to find the optimal pricing. Our reduction does not admit a closed-form solution in general, but allows one to find the optimal algorithm by means of state-of-the-art numerical optimization techniques (e.g., gradient-based ones). In contrast to the previous cases, the optimal algorithm in this case of a less patient buyer is non-trivial, and its prices depend on both the valuation distribution and the discounts. Finally, we numerically solve the above-mentioned reduced problem for a series of representative discounts and analyze properties of the obtained optimal algorithms (see Sec. 6). In this way, we show, in particular, that an optimal algorithm may be non-consistent (a consistent algorithm never sets prices lower (higher) than earlier accepted (rejected, resp.) ones) and provides revenue larger than the constant algorithm with Myerson’s price.
The most important conclusion is the following. Only in the case of equal discounts is the seller unable to benefit from changing prices dynamically (i.e., from learning them) relative to the static approach. In contrast, both in the case when the seller is far more ready to wait for revenue than the buyer and, more surprisingly, in the inverse case, the seller can boost his revenue w.r.t. the one obtained by the optimal constant algorithm. Overall, the above-described thorough study of optimal pricing algorithms for repeated auctions with different discounts constitutes the main contribution of our work. The ideas behind our techniques of theoretical analysis are simple and, to the best of our knowledge, novel; they might thus be used for future foundations of repeated auctions, e.g., the ones with multiple buyers.
2 Preliminaries, problem statement and related work
2.1 Setup of repeated posted-price auctions
We consider the following standard mechanism of repeated posted-price auctions [2, 33, 10, 15, 16]. The seller repeatedly proposes goods (e.g., advertisement spaces) to a single buyer over a sequence of rounds (one good per round). The buyer holds a fixed private valuation $v$ for a good, i.e., the valuation is unknown to the seller and is equal for goods offered in all rounds. At each round $t$, the seller offers a price $p_t$ for a good, and the buyer makes his allocation decision $a_t \in \{0, 1\}$: to buy the currently offered good ($a_t = 1$), or not ($a_t = 0$). In our setting, the seller’s price $p_t$ depends on the previous answers $a_1, \dots, a_{t-1}$ of the buyer (a.k.a. the history up to the round $t$), i.e., the seller uses a pricing algorithm $\mathcal{A}$ to set prices in the deterministic online learning manner [2, 33, 15]. The sequence of the buyer’s answers is denoted by $\mathbf{a} = \{a_t\}_{t \ge 1}$ and is referred to as a buyer strategy.
Hence, given an algorithm $\mathcal{A}$ and a strategy $\mathbf{a}$, the price sequence $\{p_t\}_{t \ge 1}$ is uniquely determined. The instant surplus $a_t (v - p_t)$ and the instant revenue $a_t p_t$ are thus gained by the buyer and the seller, respectively, at each round $t$. An instant surplus (or revenue) obtained in different rounds may contribute differently to the total (cumulative) profit of the buyer (or the seller, respectively). We model this by discount factors $\gamma^{B}_t$ and $\gamma^{S}_t$ at each round $t$ and get the total discounted surplus and the total discounted revenue of the following form:
$$\mathrm{Sur}_{\gamma^{B}}(\mathcal{A}, v, \mathbf{a}) = \sum_{t \ge 1} \gamma^{B}_t\, a_t\, (v - p_t), \qquad \mathrm{Rev}_{\gamma^{S}}(\mathcal{A}, \mathbf{a}) = \sum_{t \ge 1} \gamma^{S}_t\, a_t\, p_t.$$
We assume that the discount sequences $\gamma^{S} = \{\gamma^{S}_t\}_{t \ge 1}$ and $\gamma^{B} = \{\gamma^{B}_t\}_{t \ge 1}$ are non-negative and that the corresponding series converge. We also assume that there are no zeros between positive numbers in the sequences $\gamma^{S}$ and $\gamma^{B}$. Note that discounts allow us to consider a general setting, which covers a wide range of cases including finite games without discounting (i.e., $\gamma_t = \mathbb{I}_{\{t \le T\}}$ for some horizon $T$, where $\mathbb{I}_{C}$ denotes the indicator of the condition $C$, i.e., $\mathbb{I}_{C} = 1$ when $C$ holds, and $\mathbb{I}_{C} = 0$ otherwise) and infinite games with discounts that decrease geometrically (i.e., $\gamma_t = \gamma^{t-1}$ for some $\gamma \in (0, 1)$).
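As a concrete illustration of the two discounted sums, the sketch below evaluates them for one finite play of the game. The function names (`discounted_utilities`, `geometric`) and the sample prices, decisions, and discount rates are illustrative assumptions, not part of the setup above.

```python
def discounted_utilities(prices, decisions, valuation, gamma_buyer, gamma_seller):
    """Return (buyer surplus, seller revenue) for one finite play of the game.

    prices[t] is the price offered at round t, decisions[t] is 1 for "buy"
    and 0 for "reject"; both discount sequences are lists of equal length.
    """
    surplus = sum(g * a * (valuation - p)
                  for g, a, p in zip(gamma_buyer, decisions, prices))
    revenue = sum(g * a * p
                  for g, a, p in zip(gamma_seller, decisions, prices))
    return surplus, revenue


def geometric(rate, horizon):
    """Geometric discount 1, rate, rate^2, ... truncated at the horizon."""
    return [rate ** t for t in range(horizon)]


# Hypothetical three-round play: the buyer buys in rounds 1 and 3.
prices = [0.5, 0.4, 0.6]
decisions = [1, 0, 1]
surplus, revenue = discounted_utilities(
    prices, decisions, valuation=0.8,
    gamma_buyer=geometric(0.5, 3), gamma_seller=geometric(0.9, 3))
print(surplus, revenue)
```

In this example the buyer is less patient than the seller (rate 0.5 versus 0.9), so the purchase in the third round weighs more in the seller's revenue than in the buyer's surplus.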
Both the seller and the buyer may have the same discount ($\gamma^{S} = \gamma^{B}$), which is a reasonable assumption since it does not give any privilege to either party over the other. For instance, money inflation, a common interpretation of the discount factor, equally affects the preferences of both participants for current gains versus future ones. The case when the discounts are different ($\gamma^{S} \ne \gamma^{B}$) is important for real applications as well. The discounting can also be considered as a model for the uncertainty of the participants about the total number of rounds of their interaction (i.e., the factor $\gamma_t$ is the a priori probability that the repeated auctions will last exactly $t$ rounds).
Following a standard assumption in mechanism design, which matches the practice in ad exchanges, the pricing algorithm used by the seller is announced to the buyer in advance [2, 15]. In this case, the buyer is able to act strategically against this algorithm, i.e., to choose an optimal strategy, which maximizes his total discounted surplus over the set of all possible strategies (we show the existence of the maximum in Appendix A.1; if there is a tie, i.e., more than one optimal strategy, the buyer selects one of them arbitrarily, as in [35, 28]). This leads us to the definition of the strategic revenue of a pricing algorithm that faces the strategic buyer with a given valuation: it is the total discounted revenue that the seller receives when the buyer follows this optimal strategy.
2.2 Notation and auxiliary definitions
Following [26, 33, 15], we associate a deterministic pricing algorithm with a complete infinite binary tree in which each vertex is labeled with a price. The algorithm offers the price from the current node (starting from the root) and moves to the left (right) child of the node if the buyer rejects (accepts, respectively) the offer, i.e., answers $\mathtt{0}$ ($\mathtt{1}$, respectively). Clearly, buyer decisions over a finite number of rounds bijectively encode paths from the root to tree nodes and, thus, the nodes themselves. Hence, we use a short notation for the nodes by means of finite strings over the alphabet $\{\mathtt{0}, \mathtt{1}\}$: the root is the empty string, its left child is $\mathtt{0}$, the right one is $\mathtt{1}$, the right child of $\mathtt{0}$ is $\mathtt{01}$, etc. (e.g., $\mathtt{0}^k$ denotes the string of $k$ zeros). Similarly, to save space, we denote buyer strategies by infinite strings over the same alphabet (e.g., the buyer that follows $\mathtt{1}\mathtt{0}^\infty$ accepts the price at the first round and rejects all remaining ones). We purposely typeset zero and one differently to distinguish their use in numerical expressions from their use in strings that encode nodes or strategies. Overall, the set of pricing algorithms is equivalent to the set of mappings from the nodes to prices, and we thus use them interchangeably. The price that an algorithm $\mathcal{A}$ offers at a node $n$ is denoted by $\mathcal{A}(n)$.
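The tree view and the notion of a strategic buyer can be sketched for a finite horizon as follows. This is a minimal brute-force illustration, not the authors' method: the helper names (`play`, `best_response`, `strategic_revenue`), the two-round horizon, and the prices of `toy_algorithm` are assumptions made for the example, and ties are broken by enumeration order, which is one arbitrary choice among the optimal strategies.

```python
from itertools import product


def play(algorithm, strategy):
    """Prices offered along the tree path induced by a strategy."""
    node, prices = "", []
    for a in strategy:
        prices.append(algorithm(node))
        node += str(a)  # "1": accept (right child), "0": reject (left child)
    return prices


def best_response(algorithm, valuation, gamma_buyer):
    """Strategy with the largest discounted surplus (brute force)."""
    def surplus(strategy):
        prices = play(algorithm, strategy)
        return sum(g * a * (valuation - p)
                   for g, a, p in zip(gamma_buyer, strategy, prices))
    return max(product((0, 1), repeat=len(gamma_buyer)), key=surplus)


def strategic_revenue(algorithm, valuation, gamma_buyer, gamma_seller):
    """Discounted revenue collected from a surplus-maximizing buyer."""
    strategy = best_response(algorithm, valuation, gamma_buyer)
    prices = play(algorithm, strategy)
    return sum(g * a * p for g, a, p in zip(gamma_seller, strategy, prices))


def toy_algorithm(node):
    """Hypothetical tree: 0.5 at the root, 0.6 after accept, 0.3 after reject."""
    if node == "":
        return 0.5
    return 0.6 if node[0] == "1" else 0.3


gamma = [1.0, 0.5]
print(best_response(toy_algorithm, 0.55, gamma))
print(best_response(toy_algorithm, 0.7, gamma))
print(strategic_revenue(toy_algorithm, 0.55, gamma, gamma))
```

With valuation 0.55 the buyer rejects the first price 0.5 although it is below his valuation, because the rejection unlocks the lower price 0.3 in the second round; this is exactly the strategic behavior that the seller has to anticipate.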
2.3 Problem statement
Let the possible buyer valuations be distributed according to some distribution $D$, i.e., the buyer valuation $v$ (fixed over all rounds) is a realization of a random variable $V \sim D$. Following a standard assumption in classical auction theory [36, 28], the valuation distribution is known to the seller. We also assume that the distribution has finite expectation and is continuous; these assumptions are standard in auction theory as well [35, 28]. So, we consider the problem of finding a pricing algorithm that maximizes the expected strategic revenue, i.e., the expectation of the strategic revenue over the valuation distribution. Note that, in repeated auctions, revenue is usually compared to the one that would have been earned by offering the buyer’s valuation if it were known in advance to the seller, resulting in the notion of the strategic regret. Regret is a powerful instrument to obtain lower bounds on revenue [26, 2, 15], but, in our setup, minimization of the expected strategic regret is equivalent to our problem.
From a game-theoretic view, we consider a two-player non-zero-sum repeated game with incomplete information and unlimited supply in which the seller commits to the pricing (since he announces the algorithm before the auctions take place). An attentive reader may also note that, due to the commitment and the presence of only one buyer, our setting can be formalized as a two-stage game. The common knowledge here consists of the discounts of both players and the prior distribution of the private valuation, while the realization of the valuation is known only to the buyer. At the first stage, the seller picks a pricing algorithm, and his choice is announced to the buyer; at the second stage, the buyer picks a buyer strategy. The buyer’s utility is the surplus, and the seller’s one is the expected revenue (see Eq. (1)). Thus, if some pricing algorithm is a solution to our problem, then this algorithm paired with the buyer’s optimal response is an equilibrium of the above-described game.
Note that both an optimal buyer strategy and an optimal algorithm will remain optimal if the seller’s or the buyer’s discount is multiplied by any positive constant. Hence, from here on in our paper, we assume w.l.o.g. that $\gamma^{S}_1 = 1$ and $\gamma^{B}_1 = 1$.
2.4 Related work
Optimization of seller revenue in auctions was generally reduced to a selection of proper reserve prices for buyers (e.g., in VCG, GSP, and other auctions). Of course, there are other options to optimize revenue, like quality scores for advertisements in ad auctions, but they are significantly less popular; and, surely, revenue optimization was also considered in other contexts such as trade-offs between auction stakeholders or between auction properties (e.g., simplicity, expressivity, and revenue monotonicity). In such setups, the reserve prices usually depend on distributions of buyer bids or valuations, which were in turn estimated by machine learning techniques [19, 41, 37], while alternative approaches learned reserve prices directly [32, 31]. In contrast to these works, we consider an online deterministic learning framework for repeated auctions.
Revenue optimization for repeated auctions was mainly concentrated on algorithmic reserve prices that are updated in an online fashion over time, and was also known as dynamic pricing; see the extensive survey on this field. On the one hand, dynamic pricing was studied from a game-theoretic view in the context of different aspects such as budget constraints [6, 5], mean field equilibria [23, 6], strategic buyer behavior [11, 29], multi-period contracts, etc. A series of studies [40, 14, 22] close to ours considered repeated sales where the seller does not commit to its pricing policy (in contrast to our setting), which thus required special approaches (such as the concept of perfect Bayesian equilibrium) to address the revenue optimization problem. Those studies showed that the seller earns less in settings without commitment than with it. Another line of works like [38, 24] studied auction environment settings of a general form and aimed to find revenue-optimal mechanisms that are incentive compatible (truthful). In contrast to these studies, we consider a specific mechanism of repeated posted-price auctions and do not require its truthfulness (e.g., the algorithms in Sec. 6.2 and 6.3). Finally, our work can be considered as a further development of classical auction theory [36, 28]: in particular, in the case of a more patient seller, to address the optimal pricing problem we derive a multidimensional optimization functional, defined in Eq. (12), which is a multivariate analogue of the classical one used to determine the optimal reserve price in static auctions. Overall, the optimal pricing in our scenario of repeated posted-price auctions with different discounts for the seller and the buyer, to the best of our knowledge, was never considered in existing studies, and we believe that the key ideas behind our analysis may be used for future foundations of repeated auctions.
On the other hand, revenue optimization in dynamic pricing was considered from algorithmic and learning perspectives: as bandit problems [1, 43, 30] (e.g., UCB-like pricing, bandit feedback models); from the buyer side (valuation learning [23, 42], competition between buyers and optimal bidding [21, 42], interaction with several sellers, etc.); from the seller side against several buyers [8, 25, 39, 17]; and from the seller side against a single buyer with stochastic valuation (myopic [26, 9] and strategic buyers [2, 3, 33, 10], feature-based pricing [3, 12], limited supply).
The most relevant studies from these works on online learning are [2, 33, 15, 16], where our scenario of the strategic buyer with a fixed private valuation is considered. Amin et al. proposed to seek algorithms that have the lowest possible upper bound on the strategic regret for the worst-case buyer valuation over a finite game horizon. This problem was recently solved: the algorithm PRRFES with a tight regret bound was proposed. Some extensions of this algorithm were proposed subsequently. In contrast to these studies, first, we search for a pricing algorithm that maximizes the strategic revenue expected over buyer valuations, which matches the practice of ad exchanges and the optimization goals of classical auction theory. Second, our revenue optimization problem is solved exactly (not approximately and not via optimization of lower/upper bounds). Third, our study considers a more general setup in which not only the buyer’s surplus is discounted over rounds, but also the seller’s revenue is.
3 Constant pricing algorithms
We start our investigation of the problem with a study of constant algorithms, i.e., algorithms that propose a single price over all rounds independently of the buyer’s decisions.
A pricing algorithm is said to be constant, if there exists a price s.t., at each node , the algorithm’s price equals . This price is referred to as the algorithm price and is denoted by . The set of all constant algorithms is denoted by .
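Note that, since a constant algorithm offers a price that is independent of buyer decisions, the buyer has no incentive to lie and thus behaves truthfully. Hence, depending on whether his valuation $v$ is lower than the algorithm price $p$ or not, the buyer either rejects the price in all rounds or accepts it in all rounds (in our notation, applies the strategy $\mathtt{0}^\infty$ or $\mathtt{1}^\infty$, resp.). Since the former strategy yields zero revenue and the latter yields $p \sum_{t \ge 1} \gamma^{S}_t$, the expectation of the strategic revenue of the constant algorithm is
$$\Big(\sum_{t \ge 1} \gamma^{S}_t\Big)\, p\, \big(1 - F(p)\big),$$
where $F$ is the cumulative distribution function of the valuation distribution.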
It is easy to see that a constant algorithm is optimal among constant algorithms if its price is a global maximum point of the function $H(p) = p\,(1 - F(p))$, which is well known in the theory of non-repeated auctions [35, 36, 28]. The existence of a global maximum point of $H$ for our distribution is shown in Appendix A.2, and we refer to the leftmost one of them as the Myerson price $p^*$. Note that this price can be found via the first-order necessary condition $1 - F(p) - p f(p) = 0$ when the distribution has a continuous probability density $f$ ($F$ is its cumulative distribution function).
The constant algorithm with the price equal to the Myerson price of the distribution is called the optimal constant algorithm and is denoted by .
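When no closed form is at hand, the Myerson price can be approximated by a direct search over the one-round revenue p(1 - F(p)). The sketch below is only an illustration under stated assumptions: the helper name `myerson_price`, the grid resolution, and the choice of the uniform distribution on [0, 1] as the example are all hypothetical.

```python
def myerson_price(cdf, grid):
    """Leftmost grid maximizer of p * (1 - F(p)) and the maximal value."""
    best_price = grid[0]
    best_value = grid[0] * (1.0 - cdf(grid[0]))
    for p in grid[1:]:
        value = p * (1.0 - cdf(p))
        if value > best_value:  # strict inequality keeps the leftmost maximizer
            best_price, best_value = p, value
    return best_price, best_value


def uniform_cdf(p):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(p, 0.0), 1.0)


grid = [i / 1000 for i in range(1001)]
price, value = myerson_price(uniform_cdf, grid)
print(price, value)
```

For the uniform distribution the first-order condition 1 - 2p = 0 gives the price 1/2 with one-round revenue 1/4, which the grid search reproduces.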
4 Equal discounts of the seller and the buyer
In this section, we study the case when the seller and the buyer discount their utilities equally, i.e., , and we use the following notation for the strategic revenue: . First of all, we summarize some useful properties of surplus and revenue as functions of the valuation .
Let a pricing algorithm and the discount sequence $\gamma$ be given. For simplicity, we write $S_{\mathbf{a}}(v)$ for the surplus w.r.t. a strategy $\mathbf{a}$ and $S(v)$ for the strategic (optimal) surplus, both considered as mappings from the valuation domain, for which the following hold:
for each strategy $\mathbf{a}$, the surplus w.r.t. this strategy is a linear function of $v$ of the form $S_{\mathbf{a}}(v) = Q_{\mathbf{a}}\, v - R_{\mathbf{a}}$, where $Q_{\mathbf{a}} = \sum_t \gamma_t a_t$ is the discounted quantity of purchased goods and $R_{\mathbf{a}} = \sum_t \gamma_t a_t p_t$ is the discounted revenue of the seller;
the strategic (optimal) surplus is convex as a function of $v$, because it is the maximum of a set of linear functions: $S(v) = \max_{\mathbf{a}} S_{\mathbf{a}}(v)$ (by definition);
the strategic surplus is non-negative for any $v$ since, for the strategy $\mathtt{0}^\infty$, we have $S_{\mathtt{0}^\infty}(v) = 0$, which implies in turn that $S(v) \ge 0$;
the derivative $S'(v)$ exists for almost all $v$ (i.e., the set where it does not exist has Lebesgue measure zero), because $S$ is convex and is thus absolutely continuous.
For any pricing algorithm , the strategic revenue is increasing on the valuation domain , it starts from zero (i.e., ), and the random variable has thus finite non-negative expectation (i.e., ).
We prove only the first claim since the utilized technique will be useful further. The other claims are quite simple and are deferred to Appendix A.3 due to space constraints. For any two valuations and s.t. , and two corresponding optimal strategies and , i.e., such that , (using the notations from Remark 2), we have
Therefore, since are linear, they either coincide (then ), or have an intersection point in . In the latter case, one gets , which implies when . Hence, we obtain for any . ∎
Similarly to the optimal surplus function and the strategic revenue one , we introduce the strategic purchased quantity as a map from the valuation domain, i.e., , where . Note that , for each .
Assume that, for a given , the derivative exists. Then, is uniquely defined and equals to for any optimal strategy of the buyer that holds the valuation .
For almost all valuations, the strategic revenue is uniquely defined, i.e., it is the same for any optimal strategy of the buyer that holds the valuation (recall that, in general, the strategic revenue may not be uniquely defined; see the note on ties near the definition of the strategic revenue).
The strategic purchased quantity is defined almost everywhere and is non-decreasing on its domain, since it coincides with the derivative of the strategic surplus, which is also defined almost everywhere and is non-decreasing because the strategic surplus is convex on its domain (note that this fact can also be proved directly as in Lemma 1). Also, the strategic purchased quantity is bounded by the sum of the discounts by definition and, thus, its expectation is finite.
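The properties collected in this section (the optimal surplus is a non-negative convex upper envelope of lines, and the strategic revenue does not decrease with the valuation) can be checked numerically on a small example. The sketch below is an illustration only: the four-round horizon, the geometric discount with rate 0.8, and the randomly drawn pricing tree are assumptions, and the buyer's best response is found by brute force.

```python
import random
from itertools import product

HORIZON = 4
GAMMA = [0.8 ** t for t in range(HORIZON)]  # common discount of both players

# Hypothetical pricing tree: one arbitrary price per node (history string).
rng = random.Random(0)
NODES = ["".join(bits) for depth in range(HORIZON)
         for bits in product("01", repeat=depth)]
PRICE = {node: rng.random() for node in NODES}


def quantity_and_revenue(strategy):
    """Discounted purchased quantity and discounted revenue of one strategy."""
    node, quantity, revenue = "", 0.0, 0.0
    for g, a in zip(GAMMA, strategy):
        quantity += g * a
        revenue += g * a * PRICE[node]
        node += str(a)
    return quantity, revenue


# Each strategy corresponds to the line v -> quantity * v - revenue.
LINES = [quantity_and_revenue(s) for s in product((0, 1), repeat=HORIZON)]


def strategic_surplus(v):
    """Upper envelope of the surplus lines."""
    return max(q * v - r for q, r in LINES)


def strategic_revenue(v):
    """Revenue of a strategy that attains the upper envelope at v."""
    _, revenue = max(LINES, key=lambda line: line[0] * v - line[1])
    return revenue


grid = [i / 100 for i in range(101)]
revenues = [strategic_revenue(v) for v in grid]
surpluses = [strategic_surplus(v) for v in grid]
print(all(b >= a - 1e-9 for a, b in zip(revenues, revenues[1:])))
```

The monotonicity check mirrors the intersection argument of the proof above: a buyer with a higher valuation picks a line with a slope that is at least as large, and such a line can only be optimal if its revenue term is at least as large.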
4.1 Optimality of the constant algorithm with the Myerson price
We use notations for the distribution functions: and .
For the mappings , , and the following identity holds:
The proof is rather technical, relies on the properties of , , and established in the above statements, and is thus deferred to Appendix A.5.
Assume the valuation and the discount sequence satisfy the aforementioned conditions (see Sec. 2). Then the expected strategic revenue of an arbitrary pricing algorithm is not greater than the one of the optimal constant algorithm :
Proof of Theorem 1.
Consider an arbitrary algorithm and use the notations , , and introduced above. From Lemma 3, we have
where the latter identity of Eq. (4) holds due to the facts that is absolutely continuous on its domain (see Remark 2), thus, , and that almost everywhere (see Lemma 2). By definition, we have , and, hence, Eq. (4) implies that can be upper bounded by the expression
where is bounded by its maximum , the first identity is due to the fact that is non-decreasing on , and non-negative is bounded by for all (see Remark 3). Finally, recall that the expected strategic revenue of the optimal constant algorithm equals the right-hand side of Eq. (5) (see Sec. 3). ∎
Theorem 1 states that the optimal constant algorithm is, in fact, optimal among all pricing algorithms.
4.2 Non-uniqueness of the optimal algorithm: “big deal" pricing
It appears that the optimal constant algorithm is not the unique optimal one. We provide an example of applying a general technique for building optimal algorithms of certain form.
Let the game have at least 2 rounds (i.e., ). If an algorithm sets the first price equal to and sets all further prices either , if the buyer accepts the first offer, , or , otherwise; then the algorithm is optimal.
First, note that the buyer has no incentive to lie after the first round since the algorithm prices do not depend on his decisions . Hence, possible candidates for optimal strategies are , , , and . It is easy to see that the optimal buyer strategy in response to is for the case and for . Indeed, if the buyer accepts , further offers are for free goods that will be accepted. If the buyer rejects , then, for any strategy s.t. , we have
Thus, if , then is an optimal strategy, and, if , then Eq. (6) implies optimality of . Finally, note that implies . Hence, the expected strategic revenue of is
The key idea behind the algorithm is quite simple. Roughly speaking, the seller “accumulates” all his revenue at the first round by proposing the buyer a “big deal”: to pay a large price at the first round and get all goods in the subsequent rounds for free, or, otherwise, get nothing. (A similar pricing was proposed for a class of mechanism environments with multiplicative separability and zero production cost: that mechanism charges an up-front payment (before the rounds start) and posts a zero price in each round, thus obtaining truthfulness. In contrast to that study, the “big deal” pricing posts a large price at the first round (our setup does not allow an up-front payment) and is not truthful, since this price is accepted by every strategic buyer whose valuation is at least the Myerson price, and not only by a buyer whose valuation is at least the posted price.) Note that this optimal pricing algorithm depends on both the discounting and the valuation distribution: the first price is calculated based on the knowledge of the total discounted revenue that is earned by the constant algorithm with the Myerson price from selling all goods. An attentive reader may note that the idea of the aforementioned technique allows one, in fact, to build more variants of optimal algorithms by “spreading” the revenue in a certain way along the rightmost path of the tree. In Sections 5 and 6, we show that the “big deal” algorithm may remain optimal in the cases when the constant algorithm is no longer optimal.
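The equality of the two expected revenues can be checked by simulation in a small finite game. The sketch below rests on several assumptions that are not taken from the text: a three-round game with a common geometric discount of rate 0.5, valuations uniform on [0, 1] (so the Myerson price is 0.5), a first price equal to the Myerson price times the sum of the discounts, and, after a rejection, a price of 2.0 that no buyer accepts (one simple way to implement "get nothing").

```python
from itertools import product

HORIZON = 3
GAMMA = [0.5 ** t for t in range(HORIZON)]  # common discount, first factor is 1
TOTAL = sum(GAMMA)                          # discounted number of rounds
MYERSON = 0.5                               # Myerson price for uniform [0, 1]
TOO_HIGH = 2.0                              # assumed price that nobody accepts


def constant(node):
    """Offer the Myerson price at every node."""
    return MYERSON


def big_deal(node):
    """Large first price; then free goods after accept, nothing after reject."""
    if node == "":
        return TOTAL * MYERSON
    return 0.0 if node[0] == "1" else TOO_HIGH


def strategic_revenue(algorithm, valuation):
    """Revenue from a buyer who maximizes discounted surplus (brute force)."""
    best = None
    for strategy in product((0, 1), repeat=HORIZON):
        node, surplus, revenue = "", 0.0, 0.0
        for g, a in zip(GAMMA, strategy):
            p = algorithm(node)
            surplus += g * a * (valuation - p)
            revenue += g * a * p
            node += str(a)
        if best is None or surplus > best[0]:
            best = (surplus, revenue)
    return best[1]


def expected_revenue(algorithm, samples=1000):
    """Average over midpoints of an even partition of [0, 1]."""
    valuations = [(i + 0.5) / samples for i in range(samples)]
    return sum(strategic_revenue(algorithm, v) for v in valuations) / samples


print(expected_revenue(constant), expected_revenue(big_deal))
```

In both cases exactly the buyers with valuation above 0.5 buy everything and pay 0.875 in discounted terms; the two algorithms differ only in when the payment is collected.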
5 Less patient seller
Now we are ready to study the cases when the seller and the buyer discounts are different. Further, we argue that the constant algorithm is no longer optimal among all algorithms in these cases.
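We start our investigation with a seller who is less patient than the buyer in his willingness to wait for revenue. We consider the case when $\gamma^{S} \le \gamma^{B}$ (i.e., $\gamma^{S}_t \le \gamma^{B}_t$ for all $t$); e.g., when the discounts decrease geometrically: $\gamma^{S}_t = \gamma_S^{t-1}$ and $\gamma^{B}_t = \gamma_B^{t-1}$, where $0 < \gamma_S \le \gamma_B < 1$.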
Let , then the following upper bound for its expected strategic revenue holds:
Let , then, using the independence of on the seller’s discount, we get Finally, where Theorem 1 is applied with to infer the latter inequality. ∎
Let and be the seller and the buyer discounts, respectively, s.t. . Then the algorithm from Proposition 1 with set to (i.e., with ) is optimal in .
Since the optimal strategy is independent of the seller’s discount, the beginning of the proof is similar to the one of Prop. 1 up to Eq. (7), where the seller’s discount is used for the first time. In our case of different discounts, the identity Eq. (7) on the expected strategic revenue will have the form where we used (see Remark 1). We see that achieves the upper bound of Lemma 4 and is thus optimal. ∎
The relative expected revenue of the optimal algorithm w.r.t. the optimal constant one equals the ratio of the sum of the buyer’s discounts to the sum of the seller’s discounts, which is greater than 1 when the discounts differ; i.e., the optimal revenue is larger than the one obtained by offering the Myerson price constantly (in contrast to the equal discount case). For instance, for geometric discounts, this revenue improvement ratio grows without bound as the buyer’s discount rate tends to 1 for a fixed seller’s discount rate. Moreover, the algorithm provides exactly the same expected revenue as if the seller played the game with the same discount as the buyer’s one. This result is quite surprising, because the dominance of the buyer’s discount over the seller’s one suggests a hypothesis that the seller should earn less than with the buyer’s discount (e.g., see the revenue of the optimal constant algorithm). But the ability of the seller to apply the trick of “accumulating” all his revenue at the first round (see Sec. 4.2) allows him to get the payments for all goods, discounted by the buyer’s discount, at the first round and thus to boost his revenue over the constant pricing.
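For geometric discounts the improvement ratio can be worked out explicitly. The display below is a reconstruction under assumed notation ($\gamma_S$ and $\gamma_B$ for the geometric rates of the seller and the buyer, $\Gamma^{S}$ and $\Gamma^{B}$ for the sums of the two discount sequences, normalized so that the first discount equals 1, and $F$ for the valuation CDF); it uses only the statement above that the “big deal” algorithm earns what the seller would earn with the buyer’s discount.

```latex
\[
\frac{\Gamma^{B}\, p^{*}\,\bigl(1 - F(p^{*})\bigr)}{\Gamma^{S}\, p^{*}\,\bigl(1 - F(p^{*})\bigr)}
= \frac{\Gamma^{B}}{\Gamma^{S}}
= \frac{\sum_{t \ge 1} \gamma_B^{\,t-1}}{\sum_{t \ge 1} \gamma_S^{\,t-1}}
= \frac{1 - \gamma_S}{1 - \gamma_B} \;\ge\; 1,
\qquad 0 < \gamma_S \le \gamma_B < 1 .
\]
```

For example, $\gamma_S = 0.5$ and $\gamma_B = 0.9$ give a ratio of $0.5 / 0.1 = 5$, and the ratio is unbounded as $\gamma_B \to 1$ with $\gamma_S$ fixed.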
6 Less patient buyer
In contrast to the previous cases, finding an optimal pricing here is a much more difficult problem, since the technique used in Sec. 4 and 5 to upper bound the expected strategic revenue is no longer applicable (because it relies on the condition that the seller’s discounts do not exceed the buyer’s ones). As we will see further, in the studied case, the obtained optimal algorithms are not trivial and require the derivation of a multivariate analogue of the classical single-price revenue functional, which has to be optimized in a multidimensional space. We obtain this functional in Sec. 6.1 and use it to provide an extensive analysis of optimal algorithms in Sec. 6.2 and 6.3.
For a discount sequence $\gamma$, we define the discount rate sequence as the sequence of the ratios of consecutive components of $\gamma$: its $t$-th component is $\gamma_{t+1} / \gamma_t$ when $\gamma_t > 0$, and $0$ when $\gamma_t = 0$ (recall that if $\gamma_t = 0$ then $\gamma_{t'} = 0$ for any $t' > t$, i.e., $\gamma$ has no zeros between positive components (see Sec. 2.1); hence, the discount rate sequence has no zeros between positive components as well).
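Let $\gamma^{S}$ and $\gamma^{B}$ be some discount sequences. Then, the condition that the discount rates of $\gamma^{S}$ are not less than the ones of $\gamma^{B}$ at every round is equivalent to the condition that the sequence of ratios $\gamma^{S}_t / \gamma^{B}_t$ is non-decreasing (with a suitable formal convention for the ratio when both discounts are zero). The proof of this statement straightforwardly follows from Definition 3.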
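The equivalence between the rate-wise comparison and the monotonicity of the ratio sequence can be checked on finite sequences. The sketch below is a minimal illustration with hypothetical helper names; it assumes that the discount rate is the ratio of consecutive discounts (and 0 once the sequence has reached zero), and the ratio check is written for strictly positive discounts only.

```python
def discount_rates(gamma):
    """Ratios of consecutive discounts; 0 where the current discount is 0."""
    return [nxt / cur if cur > 0 else 0.0
            for cur, nxt in zip(gamma, gamma[1:])]


def seller_locally_more_patient(gamma_seller, gamma_buyer):
    """True if every seller's discount rate is at least the buyer's one."""
    return all(rs >= rb for rs, rb in zip(discount_rates(gamma_seller),
                                          discount_rates(gamma_buyer)))


def ratio_is_non_decreasing(gamma_seller, gamma_buyer):
    """True if seller/buyer discount ratios never decrease (positive discounts)."""
    ratios = [s / b for s, b in zip(gamma_seller, gamma_buyer)]
    return all(x <= y for x, y in zip(ratios, ratios[1:]))


patient = [0.9 ** t for t in range(5)]
impatient = [0.5 ** t for t in range(5)]
print(seller_locally_more_patient(patient, impatient),
      ratio_is_non_decreasing(patient, impatient))
print(seller_locally_more_patient(impatient, patient),
      ratio_is_non_decreasing(impatient, patient))
```

Both checks agree in either direction, as expected: comparing consecutive ratios of the two sequences is the same inequality as comparing consecutive terms of the ratio sequence, only rearranged.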
From here on in this section, we consider discounts such that the discount rates of the seller are not less than the ones of the buyer at every round. This condition means that the seller is more patient than the buyer locally at each round (see Remark 4). In particular, it implies that the seller’s discounts are not less than the buyer’s ones, i.e., the seller is globally more patient than the buyer as well, but the inverse implication is not true. A typical example of the studied case is a pair of geometric discounts: $\gamma^{S}_t = \gamma_S^{t-1}$ and $\gamma^{B}_t = \gamma_B^{t-1}$, where $0 < \gamma_B \le \gamma_S < 1$. We believe that the studied case covers a large variety of discount sequences (e.g., the geometric ones) that describe a more patient seller. Nonetheless, the study of the case when the seller is globally, but not locally, more patient than the buyer is interesting and is left for future work. A possible direction to study this case consists in our following insight: if the buyer is locally more patient than the seller at some round, then a trick similar to the one used in the “big deal” algorithm can be applied at this round to get an optimal algorithm.
Let be a discount sequence, then an algorithm is said to be completely active for , if for any strategy there exists a valuation such that , where and are defined in Remark 2, i.e., the surplus function is tangent to the optimal surplus function . We denote the set of all completely active algorithms for by .
In the next subsection, we will obtain the central results of our study. We do it for the case of a finite number of rounds, but, in Sec. 6.3, we show how to use these results to obtain approximately optimal algorithms for the case of the infinite number of rounds.
6.1 Finite games: multivariate optimization functional
In this section, we consider the case of the game with a finite time horizon : in particular, in this case, seller algorithms, buyer strategies, and all discounts (including ) are considered as their -length variants (they can be defined in a natural way similarly to their infinite analogues). For simplicity of presentation, we assume that all discounts are positive (i.e., ) in all rounds.
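A discount sequence $\gamma$ is said to be regular if $\langle \gamma, \mathbf{a} \rangle \ne \langle \gamma, \mathbf{a}' \rangle$ for any pair of distinct strategies $\mathbf{a} \ne \mathbf{a}'$, i.e., any buyer strategy results in a unique discounted quantity of purchased goods (the reasons to introduce this class of discounts are discussed in Remark 6). Here we used the short notation for the scalar product: $\langle \gamma, \mathbf{a} \rangle = \sum_{t=1}^{T} \gamma_t a_t$.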
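Regularity of a finite discount sequence can be tested directly by enumerating all strategies. The sketch below is a brute-force illustration; the helper name `is_regular` and the numerical tolerance are assumptions made for the example.

```python
from itertools import product


def is_regular(gamma, tol=1e-12):
    """True if all strategies yield pairwise distinct discounted quantities."""
    quantities = sorted(sum(g * a for g, a in zip(gamma, strategy))
                        for strategy in product((0, 1), repeat=len(gamma)))
    # After sorting, it suffices to compare neighboring quantities.
    return all(y - x > tol for x, y in zip(quantities, quantities[1:]))


print(is_regular([1.0, 1.0, 1.0]))    # no discounting over three rounds
print(is_regular([1.0, 0.5, 0.25]))   # geometric discount with rate 0.5
```

The undiscounted three-round game is not regular, since buying only in the first round and buying only in the second round give the same quantity, whereas the geometric discount with rate 0.5 is regular because distinct strategies correspond to distinct binary expansions.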
In the following important proposition we show that any algorithm can be transformed to a completely active one for the discount with no loss in the expected strategic revenue.
In a -round game, let be discounts s.t. and is a regular one. Then, for any pricing algorithm , there exists a completely active algorithm s.t.
For a given algorithm and a given discount , we will use the notation for any (similarly to Remark 2, but indicating explicitly the seller’s discount). The main idea of the proof consists in the following technique. We will consider all strategies s.t. (referred to as non-active), and, consequently, for each of them denoted by , we apply the following procedure of modifying the source algorithm : define a transformation that does not change for , moves to the left until it is tangent to in some , decreases , and does not decrease for . That will imply that the expected strategic revenue of the transformed algorithm is no lower than the one of the source algorithm . In this way, we will (one-by-one) make all strategies active.
Let us consider the set of all non-active strategies. If it is empty, then and Eq. (9) holds. Otherwise, note that the “always-reject" strategy is always active, since . Hence, one can order all non-active strategies by “the last index" .
We take a non-active strategy with the smallest , denoting and the node , and construct a new algorithm based on the source one in the following way. Set and transform the prices as follows:
decrease until the function is tangent to the function in some ;
if , increase for in such a way that
Since we have chosen with the smallest among non-active strategies, the price obtained in step 1 is non-negative (and, thus, this step is correct). Indeed, substitute the -th component in by and denote the obtained strategy by . Due to the selection of , the strategy is active. Therefore, assume is decreased to , then the function becomes equal to by the definition. Since is tangent to , the increase of its slope by will result in intersection with . This means that will be tangent to before reaches .
Now let us prove that the transformation satisfies the properties announced at the beginning of the proof. Let . Step 2 implies that the transformation does not change . For a strategy that does not come through the node , the revenue remains the same, since the algorithm prices that contribute to are not altered. For that comes through the node , let us prove that can only increase. Since there is a round where . Let s.t. this is the first round of acceptance after reaching the node , and let us denote the node where this acceptance takes place by . Therefore, one can write the following expression for the increment of : where we used Eq. (10) to obtain the first equation and used to obtain the last inequality. So, can only increase for .
Finally, since becomes tangent to , which is convex (see Remark 2), the function either equals to exactly in one point or coincides with for some . The latter case is impossible since the function has a different slope for each strategy , because of the regularity of . Therefore, the optimal strategy does not change for the buyer with any valuation except the only one s.t. , and the strategic revenue expectation is not affected by the decrease of (due to continuity of the valuation distribution ). Thus, and the number of non-active strategies of is reduced by one w.r.t. . After that, we repeatedly apply the above-described transformation to until the resulting algorithm has no non-active strategies. In this way, we get that satisfies Eq. (9). ∎
An attentive reader may note that the finiteness of the game is crucially used in the assumption that any (non-active) strategy has “the last index”. This is certainly untrue for infinite strategies, since some of them accept the offer in an infinite number of rounds. Therefore, we consider the validity of the statement of Prop. 3 (or of its analogue) for the infinite game as an open research question that could be considered as a possible direction for future work.
In a -round game, let be discounts s.t. and is a regular one. If there exists an optimal pricing algorithm , then there exists an optimal completely active algorithm . Thus,