# On consistency of optimal pricing algorithms in repeated posted-price auctions with strategic buyer

We study revenue optimization learning algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation for a good and seeks to maximize his cumulative discounted surplus. For this setting, we first propose a novel algorithm that never decreases offered prices and has a tight strategic regret bound in Θ(log log T) under some mild assumptions on the buyer surplus discounting. This result closes the open research question on the existence of a no-regret horizon-independent weakly consistent pricing. The proposed algorithm is inspired by our observation that a double decrease of offered prices in a weakly consistent algorithm is enough to cause a linear regret. This motivates us to construct a novel transformation that maps a right-consistent algorithm to a weakly consistent one that never decreases offered prices. Second, we improve the previously known strategic regret upper bound of the algorithm PRRFES, where the improvement is achieved by means of a finer constant factor C of the principal term C log log T in this upper bound. Finally, we generalize results on strategic regret previously known for geometric discounting of the buyer's surplus to discounting of other types, namely: the optimality of the pricing PRRFES is extended to the case of geometrically concave decreasing discounting, and the linear lower bound on the strategic regret of a wide range of horizon-independent weakly consistent algorithms is extended to the case of arbitrary discounts.


## 1 Introduction

Revenue maximization in online advertising is one of the most important development directions in leading Internet companies (such as real-time ad exchanges [26, 12], search engines [51, 3, 56, 27, 15], social networks [1], etc.), where a large part of advertisement inventory is sold via widely applicable second-price auctions [27, 37], including their generalizations such as GSP [51, 36, 49, 15] and Vickrey-Clarke-Groves (VCG) [52, 53] auctions. Optimal revenue here is mostly controlled by means of reserve prices, whose proper setting is studied both by game-theoretic methods [42, 33] and by machine learning approaches [43, 14, 27, 5, 29, 37, 55, 49, 39, 38, 46, 48, 47, 22]. A large number of online auctions run, for instance, by ad exchanges involve only a single bidder [5, 39, 22], and, in this case, a second-price auction with reserve is equivalent to a posted-price auction [32] in which the seller sets a reserve price for a good (e.g., an ad space) and the buyer decides whether to accept or reject this price (i.e., to bid above or below it).

In this work, we focus on a scenario where the seller repeatedly interacts through a posted-price mechanism with the same strategic buyer that holds a fixed private valuation v for a good and seeks to maximize his cumulative discounted surplus [5]. At each round of this game, the seller is able to choose the price based on previous decisions of the buyer, i.e., to apply a deterministic online learning (discrete) algorithm. The seller's goal is to maximize his cumulative revenue over a finite number of rounds T (the time horizon), which is generally reduced to regret minimization (in our study, the regret is the difference between the revenue that would have been earned by offering the buyer's valuation and the seller's actual revenue; it is optimized for the worst-case buyer valuation, see Sec. 2.1 and [32, 39, 22]). The seller thus seeks a no-regret pricing algorithm, i.e., one with regret sublinear in T [5, 39, 6, 40, 17, 22].

For this setting, the algorithm PRRFES with a tight strategic regret bound in Θ(log log T) was recently proposed for the case when the buyer's cumulative surplus is geometrically discounted [22]. This algorithm is horizon-independent and right-consistent (i.e., it never proposes prices lower than earlier accepted ones). However, its key peculiarity is its ability to decrease an offered price after a rejection, but then to revise this decision and, moreover, to propose prices higher than the rejected one in subsequent rounds (thus violating left consistency); such behavior of the algorithm may be confusing to a buyer. Although no no-regret horizon-independent algorithm with the fully consistent property (both right and left) exists, the question of the existence of such an algorithm that is consistent in the weak sense remains open [22].

The primary research goal of our study is, first, to find a no-regret weakly consistent pricing algorithm and thus resolve the open research question. Second, we aim to improve the currently best known upper bounds on strategic regret (since these bounds are tight [22], we aim to improve the constant factor of their principal term log log T) and to generalize results on them to families of buyer discount sequences that are wider than geometric ones.

We propose a novel algorithm that never decreases offered prices and can be applied against strategic buyers with a tight regret bound in Θ(log log T) under some mild assumptions on the discounting of the buyer's surplus (Th. 2). This result constitutes the first contribution of our work and closes the open research question on the existence of a no-regret horizon-independent weakly consistent pricing. The key idea of this algorithm is based on our observation that a double decrease of offered prices by a weakly consistent algorithm is enough to cause a linear regret (Lemma 2). This motivates us to propose a novel transformation that, being applied to a right-consistent algorithm, results in a weakly consistent one that never decreases offered prices (Lemma 3).

The second contribution is a novel strategic regret upper bound for the algorithm PRRFES that outperforms the previously known one from [22]. This is achieved by obtaining a finer expression for the constant factor of the principal term of this upper bound, which can be optimized by adjusting the algorithm's parameters (Th. 1). Finally, our work also contributes the generalization of the tight strategic regret bound of the pricing PRRFES to the case of geometrically concave decreasing discounting of the buyer surplus (Th. 1) and the generalization of the previously known linear lower bound on the strategic regret of a wide range of horizon-independent weakly consistent algorithms to the case of arbitrary discounts (Lemma 2 and Cor. 1).

## 2 Preliminaries

### 2.1 Setup of repeated posted-price auctions

We consider the following scenario of repeated posted-price auctions [5, 39, 22]. The seller repeatedly proposes goods (e.g., advertisement spaces) to a single buyer over T rounds (the time horizon): one good per round. The buyer holds a fixed private valuation v for a good, i.e., the valuation v is unknown to the seller and is equal for goods offered in all rounds. At each round t, a price p_t is offered by the seller, and an allocation decision a_t is made by the buyer: a_t = 1 when the buyer accepts to buy a currently offered good at that price, and a_t = 0 otherwise. Thus, the seller applies a (pricing) algorithm A that sets prices in response to buyer decisions a = {a_t}_t, referred to as a (buyer) strategy. We consider the deterministic online learning case, when the price p_t at a round t can depend only on the buyer's actions a_{1:t−1} during the previous rounds (we use the notation a_{1:s} := (a_1, …, a_s) for a part of a strategy a). Following [22], we study algorithms that do not depend on the horizon, since it is very natural in practice (e.g., in ad exchanges) that the seller does not know in advance the number of rounds T that the buyer wants to interact with him. Let 𝒜 denote the set of such algorithms.

Hence, given a pricing algorithm A, a buyer strategy a uniquely defines the corresponding price sequence {p_t}_{t=1}^{T}, which, in turn, determines the seller's total revenue ∑_{t=1}^{T} a_t p_t. This revenue is usually compared to the revenue that would have been earned by offering the buyer's valuation v if it was known in advance to the seller [32, 5, 39, 22]. This leads to the definition of the regret of the algorithm A that faced a buyer with the valuation v following the (buyer) strategy a over T rounds as

$$\mathrm{Reg}(T, A, v, a) := \sum_{t=1}^{T} (v - a_t p_t).$$

Following a standard assumption in mechanism design that matches the practice in ad exchanges [39], the pricing algorithm A used by the seller is announced to the buyer in advance. In this case, the buyer can act strategically against this algorithm: we assume that the buyer follows the optimal strategy a^Opt(T, A, v, γ) that maximizes the buyer's γ-discounted surplus [5]:

$$\mathrm{Sur}_{\gamma}(T, A, v, a) := \sum_{t=1}^{T} \gamma_t a_t (v - p_t),$$

i.e., a^Opt(T, A, v, γ) := argmax_a Sur_γ(T, A, v, a), where γ = {γ_t}_{t=1}^{∞} is the discount sequence, which is assumed positive, γ_t > 0, with a convergent sum, ∑_{t=1}^{∞} γ_t < ∞. Thus, we define the strategic regret of the algorithm A that faced a strategic buyer with valuation v over T rounds as

$$\mathrm{SReg}(T, A, v, \gamma) := \mathrm{Reg}(T, A, v, a^{\mathrm{Opt}}(T, A, v, \gamma)).$$

Hence, we consider a two-player non-zero-sum repeated game with incomplete information and unlimited supply, introduced by Amin et al. [5] and further considered in [39, 22]: the buyer seeks to maximize his surplus, while the seller's objective is to minimize his strategic regret (i.e., maximize his revenue). Note that only the buyer's objective is discounted over time (not the seller's), which is motivated by the observation that sellers are far more willing to wait for revenue than buyers are willing to wait for goods in important real-world markets like online advertising [5, 39].

In our setting, following [32, 5, 6, 39, 40, 22], we are interested in algorithms that attain sublinear strategic regret o(T) (i.e., the average regret goes to zero as T → ∞) for the worst-case valuation v; namely, we say that an algorithm A is no-regret when sup_v SReg(T, A, v, γ) = o(T). Further, we seek algorithms that have the lowest possible strategic regret upper bound of the form O(f(T)) and treat their optimality in terms of the rate f with the slowest growth as T → ∞ (the average regret thus has the best rate of convergence to zero).
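As a toy illustration of the quantities defined in this subsection, the revenue, regret, and discounted surplus can be computed directly for fixed price and decision sequences. This is a minimal sketch in Python; the function names are ours, not the paper's.

```python
# Illustrative sketch (not from the paper): seller's revenue and regret, and
# the buyer's gamma-discounted surplus, for fixed sequences p_t and a_t.
def revenue(prices, decisions):
    return sum(p * a for p, a in zip(prices, decisions))

def regret(prices, decisions, v):
    # Benchmark: offering the (unknown) valuation v at every round.
    return v * len(prices) - revenue(prices, decisions)

def surplus(prices, decisions, v, gammas):
    # Sur_gamma(T, A, v, a) = sum_t gamma_t * a_t * (v - p_t)
    return sum(g * a * (v - p) for g, a, p in zip(gammas, decisions, prices))

# Example: v = 0.8, geometric discounting gamma_t = 0.9**t.
prices = [0.5, 0.75, 0.875, 0.8125]
decisions = [1, 1, 0, 0]
gammas = [0.9 ** t for t in range(1, len(prices) + 1)]
print(round(regret(prices, decisions, v=0.8), 4))  # prints 1.95
```

The regret here is 0.8 · 4 − (0.5 + 0.75) = 1.95: the seller loses both by selling below v and by failing to sell at all.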

### 2.2 Notations and auxiliary definitions

Similarly to [22], a deterministic pricing algorithm A can be associated with an infinite complete binary tree T_A [32, 39] (since we consider horizon-independent algorithms). Each node n (for simplicity, if n is a node of a tree T, we write n ∈ T) is labeled with the price p_n offered by A. The right and left children of n are denoted by r(n) and l(n) respectively. The left (right) subtree rooted at the node l(n) (r(n), resp.) is denoted by L(n) (R(n), resp.). The operators l and r sequentially applied s times to a node n are denoted by l^s(n) and r^s(n) respectively, s ∈ ℕ. The root node of a tree T is denoted by root(T).

So, the algorithm's workflow is the following: it starts at the root of the tree by offering the first price to the buyer; at each step, if a price p_n is accepted, the algorithm moves to the right child r(n) and offers the price p_{r(n)}; in the case of a rejection, it moves to the left child l(n) and offers the price p_{l(n)}; this process repeats until reaching the time horizon T. The pseudo-code of this process is in Alg. C.1. The round at which the price of a node n is offered is denoted by t_n (it is equal to the node's depth + 1). Note that each node n uniquely determines the buyer decisions up to the round t_n. Thus, each buyer strategy is bijectively mapped to a T-length path in the tree that starts from the root and goes to a T-depth node (and the strategy prices are the ones in the nodes lying along this path).
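To make the tree view concrete, here is a minimal sketch (our own hypothetical encoding, not the paper's pseudo-code from Alg. C.1) that stores the first levels of a pricing tree as a dictionary and follows the path induced by the buyer's decisions:

```python
# Sketch (hypothetical encoding): the first levels of a pricing tree stored as
# a dict mapping decision histories (root = empty tuple) to node prices.
tree = {
    (): 0.50,
    (0,): 0.25, (1,): 0.75,
    (0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.625, (1, 1): 0.875,
}

def play(tree, buyer_decisions):
    """Follow the path induced by the buyer's decisions: a_t = 1 moves to the
    right child r(n), a_t = 0 to the left child l(n)."""
    node, prices = (), []
    for a in buyer_decisions:
        prices.append(tree[node])
        node = node + (a,)
    return prices

print(play(tree, [1, 0]))  # offered prices: [0.5, 0.75]
```

Each decision history corresponds to exactly one node, mirroring the bijection between buyer strategies and root-to-node paths described above.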

We define, for a pricing tree T, the set of its prices P(T) := {p_n | n ∈ T} and denote by P(A) := P(T_A) all prices that can be offered by an algorithm A. We say that two infinite complete trees T_1 and T_2 are price equivalent (and write T_1 ≅ T_2) if the trees have the same node labeling when we naturally match the nodes between the trees (starting from the roots): i.e., following the same strategy in both trees, the buyer receives the same sequence of prices.

### 2.3 Background on pricing algorithms

First of all, we recall several classes (sets) of algorithms that were introduced in [39, 22] and include the definitions of pricing consistency of different types, which are actively used in our work. After that, we briefly overview pricing algorithms from existing studies [32, 5, 39, 22].

Notion of consistency. Since the buyer holds a fixed valuation, we could expect that a smart online pricing algorithm should work as follows: after an acceptance (a rejection), it should set only prices no lower (no higher, resp.) than the offered one. Formally, this leads to the following definition:

###### Definition 1.

An algorithm A is said to be consistent [39] if, for any node n ∈ T_A, p_m ≥ p_n for every node m in the right subtree R(n) and p_m ≤ p_n for every node m in the left subtree L(n).

The key idea behind a consistent algorithm A is clear [22]: it explores the valuation domain by means of a feasible search interval (initialized by [0, 1]) targeted to locate the valuation v. At each round, A offers a price from the current interval and, depending on the buyer's decision, reduces the interval either to the right subinterval (after an acceptance) or to the left one (after a rejection); at any moment, the interval's lower bound is thus always the last accepted price (or 0), while its upper bound is the last rejected price (or 1). The most famous example of a consistent algorithm is the binary search.
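The interval-tracking behavior described above can be sketched as follows (our own sketch, assuming, as in the text, that the feasible interval is initialized to [0, 1] and the buyer is truthful):

```python
# Sketch of a consistent pricing rule (binary search): against a truthful
# buyer with valuation v, the feasible interval always contains v and halves
# at every round.
def consistent_search(v, T):
    lo, hi = 0.0, 1.0          # lo: last accepted price; hi: last rejected price
    offered = []
    for _ in range(T):
        p = (lo + hi) / 2
        offered.append(p)
        if p <= v:             # truthful buyer: accept iff p <= v
            lo = p             # reduce to the right subinterval [p, hi]
        else:
            hi = p             # reduce to the left subinterval [lo, p]
    return offered, (lo, hi)

offered, (lo, hi) = consistent_search(v=0.7, T=10)
assert lo <= 0.7 <= hi and hi - lo == 2 ** -10
```

After T rounds the interval has width 2^(−T), so the offered prices converge to v exponentially fast, which is what makes the doubly-logarithmic regret rates discussed below attainable against truthful buyers.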

###### Definition 2.

An algorithm A is said to be weakly consistent [22] if, for any node n ∈ T_A, (a) p_m ≥ p_n for every node m ∈ R(n) whenever p_{r(n)} ≠ p_n, and (b) p_m ≤ p_n for every node m ∈ L(n) whenever p_{l(n)} ≠ p_n.

Weakly consistent algorithms are similar to consistent ones, but they are additionally able to offer the same price several times before making a final decision on which of the subintervals to continue with. The subclass of WC algorithms that can also delay the subinterval decision, but whose pricing will be the same no matter when the decision is made, is the following.

###### Definition 3.

A weakly consistent algorithm is said to be regular [22] ( in the class ) if, for any node :

• when ,

• when ,

• when ,

###### Definition 4.

An algorithm A is said to be right-consistent [22] if, for any node n ∈ T_A, p_m ≥ p_n for every node m ∈ R(n).

Right-consistent algorithms never offer a price lower than the last accepted one, but may offer a price higher than a rejected one (in contrast to consistent algorithms). These classes are related to each other in the following way: every consistent algorithm is both regular weakly consistent and right-consistent, and every regular weakly consistent algorithm is weakly consistent.

We will also use the following definitions [22]. A buyer strategy is said to be locally non-losing w.r.t. the valuation v if prices higher than v are never accepted (i.e., p_t > v implies a_t = 0). Note that the optimal strategy of a strategic buyer may not satisfy this property: it is easy to imagine an algorithm that offers the price 1 at the first round and, if it is accepted, offers the price 0 all remaining rounds. An algorithm A is said to be dense if the set of its prices P(A) is dense in [0, 1].

Background. Consistency represents a quite reasonable property when the buyer is myopic (truthful, i.e., a_t = 1 iff p_t ≤ v), because a reported buyer decision correctly locates v in the feasible search interval. Kleinberg et al. [32] showed that the regret of any pricing algorithm against a myopic buyer is lower bounded by Ω(log log T) and proposed a horizon-dependent consistent algorithm, known as Fast Search (FS), that has a tight regret bound in Θ(log log T) against such buyers.

A strategic buyer, incited by surplus maximization, may mislead the seller's consistent algorithm [6, 39]. To overcome this, Mohri et al. [39] proposed to inject so-called penalization rounds (see Def. 5) after each rejection into the algorithm FS and got, in this way, the algorithm PFS, whose strategic regret bound outperforms that of the algorithm "Monotone" [5]. Both algorithms are horizon-dependent and are not optimal.

###### Definition 5.

Nodes n_1, …, n_r ∈ T_A are said to be an (r-length) penalization sequence [39, 22] if

$$n_{i+1} = l(n_i), \quad p_{n_{i+1}} = p_{n_i}, \quad \text{and} \quad R(n_{i+1}) \cong R(n_i), \qquad i = 1, \ldots, r-1.$$

It is easy to see that, when the discount sequence is decreasing, a strategic buyer either accepts the price at the first node of the sequence or rejects this price in all of them.

An optimal pricing was found in [22], where horizon-independent algorithms were studied and the causes of a linear regret in different classes of consistent algorithms were analyzed step-by-step. First, the algorithm FES [22] was proposed as a modification of FS by injecting exploitation rounds after each rejection, yielding a consistent horizon-independent algorithm with a tight regret bound in Θ(log log T) against a truthful buyer. Second, this pricing was upgraded to the algorithm PRRFES [22] to act against strategic buyers. Namely, it was shown that there is no no-regret pricing in the class of regular weakly consistent algorithms, which comprises, in particular, all consistent horizon-independent algorithms even when modified by penalization rounds. This led to a guess that possibly the left-consistency requirement should be relaxed. This guess succeeded in the building of the optimal right-consistent algorithm PRRFES with a tight strategic regret bound in Θ(log log T), while the research question on the existence of a no-regret horizon-independent algorithm in the class of weakly consistent algorithms remained open.

As stated at the beginning of this paper, our research goals comprise (a) the closing of that open research question; (b) the improvement of the best known upper bounds on strategic regret by finding a finer constant factor of their principal term log log T; and (c) the generalization of the above-mentioned results to strategic buyers whose discounting is not necessarily a geometric progression.

### 2.4 Related work

Most studies on online advertising auctions lie in the field of game theory [33, 43]: a large part of them focused on characterizing different aspects of equilibria, and recent ones were devoted (but not limited) to: position auctions [51, 52, 53, 15], different generalizations of second-price auctions [3, 13], efficiency [2], mechanism expressiveness [23], competition across auction platforms [8], buyer budgets [1], experimental analysis [45, 50, 44], etc.

Studies on revenue maximization were devoted both to the seller revenue solely [56, 27] and to different sorts of trade-offs, either between several auction stakeholders [26, 25, 10] or between auction characteristics (like revenue monotonicity [25], expressivity, and simplicity [41]). The optimization problem was generally reduced to a selection of proper quality scores for advertisements (for auctions with several advertisers [56, 27]) or reserve prices for buyers (e.g., in VCG [42], GSP [36], and others [26, 46]). The reserve prices, in such setups, usually depend on distributions of buyer bids or valuations, which were in turn estimated by machine learning techniques [27, 49, 46], while alternative approaches learned reserve prices directly [37, 38, 48]. In contrast to these works, we use an online deterministic learning approach for repeated auctions.

Revenue optimization for repeated auctions has mainly concentrated on algorithmic reserve prices that are updated in an online fashion over time, also known as dynamic pricing. An extensive survey of this field is presented in [21]. Dynamic pricing was studied: under a game-theoretic view (MFE [30, 12], budget constraints [12, 11], strategic buyer behavior [18], dynamic mechanisms [34, 7], etc.); as bandit problems [4, 57, 35] (e.g., UCB-like pricing [9], bandit feedback models [54]); from the buyer side (valuation learning [30, 54], competition between buyers and optimal bidding [29, 54], interaction with several sellers [28], etc.); from the seller side against several buyers [14, 55, 31, 47, 24]; and against a single buyer with stochastic valuation (myopic/truthful [32, 19, 16] and strategic buyers [5, 6, 40, 17, 22], feature-based pricing [6, 20], limited supply [9], etc.). The parts of these works most relevant to ours are [32, 5, 39, 22], where our scenario with a fixed private valuation is considered and whose algorithms are discussed in more detail in Sec. 2.3. First, in contrast to [32], we study strategic buyer behavior, whose cumulative surplus may be discounted non-geometrically (unlike in [32, 5, 39, 22]). Second, in contrast to [5, 39], we propose and analyze algorithms that have a tight strategic regret bound in Θ(log log T), and, unlike in [22], one of these algorithms is weakly consistent and never decreases offered prices. Finally, we reduce the factor of the principal term in the strategic regret upper bound from [22] for the algorithm PRRFES.

## 3 Optimizing right-consistent optimal pricing

In this section, first, we show that the algorithm PRRFES [22] is able to retain its tight strategic regret bound in Θ(log log T) even against strategic buyers whose surplus is not necessarily discounted geometrically. Second, we provide a finer upper bound for the PRRFES's strategic regret, which allows optimizing the constant factor of the principal term of this upper bound by adjusting the number of penalization rounds used in the pricing algorithm. This result allows us to obtain a more favorable regret upper bound than in [22].

For the convenience of readers, we give a short description of the algorithm PRRFES in Appendix B and its pseudo-code in Alg. C.3. We begin our regret analysis for discount sequences of general form by proving an analogue of [22, Prop. 2], which was stated for geometric discounting. Let δ_{l(n)} := p_n − p_{l(n)} be the left increment of a node n [39, 22]; then the following proposition holds.

###### Proposition 1.

Let γ be a decreasing discount sequence (whose sum converges), A be a pricing algorithm, n be the starting node of an r-length penalization sequence (see Def. 5), and r be s.t. γ_{t_n} − ∑_{s=t_n+r}^{∞} γ_s > 0. If the price p_n offered by the algorithm A at the node n is rejected by the strategic buyer, then the following inequality on his valuation v holds:

$$v - p_n < \zeta_{r,\gamma,t_n}\,\delta_{l(n)}, \quad \text{where} \quad \zeta_{r,\gamma,t} := \frac{\sum_{s=t+r}^{\infty} \gamma_s}{\gamma_t - \sum_{s=t+r}^{\infty} \gamma_s}. \tag{1}$$

The proof is presented in Appendix A.1.1 and is based on ideas similar to the ones in [22, Prop. 2]. Note that [22, Prop. 2] is a particular case of Proposition 1, when γ_t = γ^t is a geometric discounting for some γ ∈ (0, 1) (then ζ_{r,γ,t} becomes the constant γ^r/(1 − γ − γ^r) from [22, Prop. 2]). For this case of geometric discounting, the condition on r from Prop. 1 becomes γ^r < 1 − γ. The important property of the latter condition consists in its independence of the time (i.e., round, depth) t_n of the starting penalization node n. This independence property, namely, the existence of a constant ϰ s.t. ζ_{r,γ,t} ≤ ϰ for all t, does not hold for an arbitrary discount sequence γ. But, in the following lemma, we show that if a discount sequence is geometrically concave, then the above-mentioned independence property holds (see the proof in Appendix A.1.2).
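The independence claim for geometric discounting can be checked numerically. The sketch below (ours, with hypothetical helper names) evaluates the tail ratio ζ_{r,γ,t} = (∑_{s≥t+r} γ_s)/(γ_t − ∑_{s≥t+r} γ_s) for γ_s = g^s and compares it with the closed form g^r/(1 − g − g^r):

```python
# Numeric sketch: for geometric discounting gamma_s = g**s, the quantity
# zeta_{r,gamma,t} does not depend on t and equals g**r / (1 - g - g**r).
def zeta(r, g, t):
    tail = g ** (t + r) / (1 - g)      # sum_{s=t+r}^infty g**s
    return tail / (g ** t - tail)

g, r = 0.9, 25                         # need g**r < 1 - g so that zeta > 0
values = [zeta(r, g, t) for t in (1, 5, 50)]
closed_form = g ** r / (1 - g - g ** r)
assert all(abs(z - closed_form) < 1e-9 for z in values)
```

Cancelling g^t in numerator and denominator shows why the dependence on t disappears, which is exactly the property that fails for general discount sequences.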

###### Lemma 1.

Let a decreasing discount sequence γ be geometrically concave, i.e., let the ratio γ_{t+1}/γ_t be non-increasing in t; then (a) there exists r ∈ ℕ s.t. sup_{t∈ℕ} ζ_{r,γ,t} < ∞; (b) moreover, for any ϰ > 0, there exists r ∈ ℕ s.t. ζ_{r,γ,t} ≤ ϰ for all t ∈ ℕ.

For geometrically convex discount sequences, the properties in both claims of this lemma may not hold. For instance, consider the telescoping discount γ_t = 1/t − 1/(t+1) (see Appendix A.1.3).

For a right-consistent algorithm (and, thus, for the PRRFES as well [22]), the increment δ_{l(n)} in Prop. 1 is bounded by the difference between the current node's price and the last accepted price before reaching this node. Hence, the inequality in Eq. (1) provides a no-lie guarantee at a particular round for certain valuations: the closer an offered price is to the last accepted price, the smaller the interval of possible valuations holding which the strategic buyer may lie on this offer; i.e., the buyer may lie at the t_n-th round only if his valuation is located in (p_n, p_n + ζ_{r,γ,t_n} δ_{l(n)}). Using this insight, we can obtain the following theorem, whose proof is presented in Appendix A.1.4.

###### Theorem 1.

Let γ be a decreasing discount sequence (whose sum converges) for which there exist r ∈ ℕ and ϰ > 0 s.t. ζ_{r,γ,t} ≤ ϰ for all t ∈ ℕ (the definition of ζ_{r,γ,t} is from Eq. (1)). If A is the pricing algorithm PRRFES with r penalization rounds and an appropriately chosen exploitation rate, then, for any valuation v and any T, the strategic regret is upper bounded:

$$\mathrm{SReg}(T, A, v, \gamma) \le C_{r,\varkappa}\,(\log_2 \log_2 T + 2), \quad \text{where} \quad C_{r,\varkappa} := r v + \frac{(2 + \varkappa)^2 - 1}{2}. \tag{2}$$

First, combining Theorem 1 and Lemma 1, one concludes that the pricing PRRFES can be effectively applied (with a tight regret bound in Θ(log log T)), in particular, against strategic buyers with a geometrically concave discount sequence. Second, the result [22, Th. 5] represents a corollary of Th. 1, when we consider a geometric discounting. But, more importantly, Th. 1 provides a novel regret upper bound (even for a geometric discounting) that can be adjusted, e.g., to reduce the number of penalization rounds or to optimize the constant factor C_{r,ϰ}. Let γ_t = γ^t, γ ∈ (0, 1); then one can easily derive the following dependence between r and ϰ in order to satisfy the conditions of Th. 1: γ^r ≤ ϰ(1 − γ)/(1 + ϰ). Hence, Th. 1 allows us to keep the upper bound in Eq. (2) valid and reduce the number of penalization rounds down to the smallest r satisfying this condition. Note that this value is in fact the lower bound for the number of penalization rounds that satisfies Prop. 1 and [22, Prop. 2]. Depending on γ, the number of penalization rounds may thus be reduced by a sizable fraction.
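The rearranged condition ζ_{r,γ,t} ≤ ϰ, i.e., γ^r ≤ ϰ(1 − γ)/(1 + ϰ), can be turned into a minimal numeric sketch computing the smallest admissible number of penalization rounds (the helper below is ours, not part of the PRRFES pseudo-code):

```python
import math

# Sketch: smallest r with g**r / (1 - g - g**r) <= kappa, i.e., equivalently
# g**r <= kappa * (1 - g) / (1 + kappa) (our rearrangement of the condition).
def min_penalization_rounds(g, kappa):
    threshold = kappa * (1 - g) / (1 + kappa)
    return math.ceil(math.log(threshold) / math.log(g))

r = min_penalization_rounds(g=0.9, kappa=1.0)
# r is the first integer whose geometric tail falls below the threshold:
assert 0.9 ** r <= 0.05 and 0.9 ** (r - 1) > 0.05
```

Larger ϰ loosens the constant C_{r,ϰ} in Eq. (2) but permits fewer penalization rounds, which is precisely the trade-off Theorem 1 lets one optimize.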

In order to analyze the extent of the possible improvement in the optimal factor C_{r,ϰ}, let us bound v by 1 and r by the smallest value admitted by the condition above; then the resulting upper bound on C_{r,ϰ} has a first-order condition w.r.t. ϰ with only one solution. This solution monotonically depends on the discount rate γ, and evaluating the bound on the factor C_{r,ϰ} at the optimal ϰ reduces it, relative to the choice made in [22], for all considered values of γ. Thus, we conclude that Theorem 1 outperforms the strategic regret upper bound of [22, Th. 5].

## 4 Weakly consistent pricing

In this section, we first generalize the result [22, Th. 4] on the absence of a no-regret algorithm in the class of regular weakly consistent pricings to any discount sequence of the buyer surplus and, moreover, show that any weakly consistent algorithm with a double decrease of offered prices has a linear regret. This motivates us to hypothesize that there exists a no-regret pricing that only increases offered prices. Second, we propose a novel transformation of pricing algorithms and apply it to the algorithm PRRFES, obtaining a weakly consistent pricing. Finally, we argue that this algorithm is a no-regret one and, moreover, has a tight strategic regret bound in Θ(log log T).

### 4.1 Weakly consistent algorithms with linear regret

First of all, we isolate the main cause of a linear regret of a wide range of weakly consistent algorithms and formalize it in the following lemma, whose proof is deferred to Appendix A.2.1.

###### Lemma 2.

Let γ be a discount sequence and A be a horizon-independent weakly consistent pricing algorithm whose first offered price lies in (0, 1). If there exists a path in the tree T_A with the corresponding price sequence {p̃_t}_{t∈ℕ} s.t.

$$\exists\, \tilde t_0, \tilde t_1 \in \mathbb{N}: \ \tilde t_0 \le \tilde t_1 \ \ \text{and} \ \ \tilde p_{\tilde t_1 + 1} < \tilde p_{\tilde t_0},$$

then there exists a valuation v s.t. SReg(T, A, v, γ) = Ω(T).

Note that this lemma holds for any discount sequence γ and has the following corollary, which is the generalization of [22, Th. 4] to any discounting and whose proof is presented in Appendix A.2.2.

###### Corollary 1.

For any horizon-independent regular weakly consistent pricing algorithm A and any discount sequence γ, there exists a valuation v s.t. SReg(T, A, v, γ) = Ω(T).

The key intuition behind Lemma 2 is the following: the strategic buyer can lie a few times to decrease offered prices and, due to (even weak) consistency, receive prices at least some fixed margin lower than his valuation in all the remaining rounds. Note that the buyer is able to mislead a wide range of weakly consistent algorithms: the set of algorithms that satisfy the conditions of Lemma 2 is significantly larger than the set of regular weakly consistent ones. But what if the buyer cannot apply this intuition? After all, there are weakly consistent algorithms that never decrease offered prices. We thus hypothesize that there may exist such an algorithm with a sublinear regret. In Sec. 4.3, this hypothesis is confirmed.
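This lying intuition can be illustrated by a toy simulation (ours, not from the paper; the lying buyer below is a crude illustration rather than the surplus-optimal strategy): against plain binary search, k false rejections pin all later prices below 2^(−k), making the regret linear in T.

```python
# Toy simulation of the intuition behind Lemma 2: against consistent binary
# search, a buyer who falsely rejects the first k rounds drives all later
# prices below 2**-k and then buys everything at a huge discount.
def binary_search_revenue(buyer, T):
    lo, hi, rev = 0.0, 1.0, 0.0
    for t in range(1, T + 1):
        p = (lo + hi) / 2
        if buyer(p, t):
            rev += p
            lo = p
        else:
            hi = p
    return rev

v, k, T = 0.9, 5, 1000
lying_buyer = lambda p, t: 0 if t <= k else int(p <= v)
regret = v * T - binary_search_revenue(lying_buyer, T)
assert regret > 0.8 * T   # linear in T: every price after round k is <= 2**-5
```

After the k-th false rejection the feasible interval is [0, 2^(−k)], so the seller's revenue is at most T · 2^(−k), and the regret is at least (v − 2^(−k)) T.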

### 4.2 Transformation pre

Let us consider a special transformation, referred to as pre, which transforms any pricing algorithm into another one. First, we define this transformation for labeled binary trees.

###### Definition 6.

Given a non-negative real number p_0 and a labeled binary tree T, the transformation pre is such that the labels (i.e., prices) of the transformed tree pre(T, p_0) are defined recursively in the following way, starting from the root node of the tree:

$$p'_{\mathrm{root}(T)} := p_0, \qquad p'_{l(n)} := p'_{n}, \qquad p'_{r(n)} := p_{n}, \tag{4}$$

where p_n and p'_n denote the labels of a node n in the source tree T and in the transformed tree pre(T, p_0), respectively.

Second, since each pricing algorithm A is associated with a complete binary tree T_A, the transformation pre is thus correctly defined for pricing algorithms: namely, pre(A, p_0) is the pricing algorithm associated with the tree pre(T_A, p_0). In Algorithm C.2, for better understanding, we provide the reader with pseudo-code that applies the pricing pre(A, p_0) for a given p_0 and a source pricing A (we put Alg. C.2 side-by-side with Alg. C.1 in order to show the difference between the workflow of the transformed pricing and that of the source pricing). Informally speaking, this transformation tracks the nodes in the source algorithm's tree, but, being in a current node n, it offers the price from the preceding node where the buyer purchased a good last time (or p_0 if he never purchased), instead of offering the price p_n of the current node. From the buyer's point of view, the choice between the pricing of the subtrees L(n) and R(n) of a node n is made at the round previous to the one where the price p_n would be offered. Overall, these intuitions can be used to obtain the following lemma (see the proof in Appendix A.3.1).
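Our reading of this informal description can be sketched as follows (a hypothetical implementation, assuming the transformed algorithm offers the source price of the node where the buyer last accepted, and p_0 before any acceptance):

```python
# Sketch (our reading, not the paper's Alg. C.2): pre(A, p0) walks the source
# tree with the buyer's decisions but offers the source price of the node of
# the last acceptance, lagging one step behind; p0 is offered until the first
# acceptance.
def pre_transform(source_price, p0):
    """source_price: decision-history tuple -> price of the source algorithm.
    Returns the pricing function of the transformed algorithm."""
    def transformed(history):
        price = p0
        for i, a in enumerate(history):
            if a == 1:                       # accepted at source node history[:i]
                price = source_price(history[:i])
        return price
    return transformed

def bs_price(history):                       # source: plain binary search
    lo, hi = 0.0, 1.0
    for a in history:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if a else (lo, mid)
    return (lo + hi) / 2

pre_bs = pre_transform(bs_price, p0=0.0)
# accept, accept, reject: the transformed prices never decrease
print([pre_bs(h) for h in [(), (1,), (1, 1), (1, 1, 0)]])  # [0.0, 0.5, 0.75, 0.75]
```

If the source algorithm is right-consistent, every source price reachable after an acceptance at n is at least p_n, which is why the transformed price sequence is non-decreasing, in line with Lemma 3.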

###### Lemma 3.

Let A be a right-consistent pricing algorithm and p_0 := inf P(A) be the infimum of the algorithm's prices; then the transformed pricing algorithm pre(A, p_0) is both right-consistent and weakly consistent.

Note that the transformed algorithm pre(A, p_0) is only able to increase prices starting from p_0, and it never decreases them regardless of any buyer strategy (see Appendix A.3.1).

### 4.3 Weakly consistent optimal pricing

Let us apply the transformation pre to the pricing algorithm PRRFES and refer to the transformed algorithm as prePRRFES. Formally, the algorithm prePRRFES works in phases initialized by the phase index l := 1, the first offered price of the current phase, and the iteration parameter; at each phase l, it sequentially offers increasing prices with step ϵ_l (exploration; in contrast to PRRFES, it starts from the last accepted price), where

$$\epsilon_l := \epsilon^{2^{l-1}} = 2^{-2^{l}}, \qquad N_l := \epsilon_{l-1}/\epsilon_l = \epsilon_{l-1}^{-1} = 2^{2^{l-1}}, \qquad l \in \mathbb{N}; \tag{5}$$

if an explored price is rejected, (1) it offers this price for r penalization rounds (if one of them is accepted, prePRRFES continues offering prices following Definition 5), (2) it offers a fixed price for the exploitation rounds (buyer decisions made at them do not affect further pricing), and (3) prePRRFES goes to the next phase by setting l := l + 1. The pseudo-code of prePRRFES is presented in Alg. C.4, where the lines that differ from the ones of PRRFES (see Alg. C.3) are highlighted in blue. Since PRRFES is a right-consistent algorithm, Lemma 3 implies that prePRRFES is both a right-consistent and a weakly consistent algorithm.
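Under our reading of Eq. (5), assuming ε = 1/4 (the value that makes the two closed forms in Eq. (5) agree, so that ε_l = 2^(−2^l) and ε_0 = 1/2), the phase schedule can be tabulated as a quick sanity check:

```python
# Sketch (our reading of Eq. (5), assuming eps = 1/4): the exploration step
# epsilon_l squares at every phase, while the phase length N_l is its inverse.
def eps(l):
    return 2.0 ** -(2 ** l)          # epsilon_l = 2**(-2**l); eps(0) = 1/2

def N(l):
    return eps(l - 1) / eps(l)       # N_l = eps_{l-1} / eps_l

print([eps(l) for l in range(4)])    # [0.5, 0.25, 0.0625, 0.00390625]
print([N(l) for l in range(1, 4)])   # [2.0, 4.0, 16.0]
assert all(N(l) == 1 / eps(l - 1) for l in range(1, 6))   # N_l = eps_{l-1}^{-1}
```

The doubly-exponential shrinkage of ε_l is what yields O(log log T) phases up to horizon T, matching the principal term of the regret bounds above.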

In this subsection, we show that prePRRFES, being properly configured, is, in fact, a no-regret pricing and, moreover, is optimal with a tight strategic regret bound in Θ(log log T). To show this, we follow the methodology of establishing the optimality of the algorithm PRRFES; however, this is not straightforward and requires additional statements (see Prop. 3) not needed for PRRFES.

In order to simplify further analysis, we assume from here on in this subsection that the discounting is geometric (an analysis for non-geometric discount sequences could be done in a way similar to Theorem 1). First, let us consider an analogue of Proposition 1 that will be useful to upper bound the strategic regret of the algorithm prePRRFES.

###### Proposition 2.

Let the discount sequence be geometric with rate γ ∈ (0,1), let n be a starting node of an r-length penalization sequence of a pricing algorithm (see Def. 5), and let all prices offered after rejections be no lower than p_{r(n)}. If the price p_n offered by the algorithm at the node n is rejected by the strategic buyer, then the following inequality on his valuation holds:

 v − p_{r(n)} < η_{r,γ} (p_{r(n)} − p_n),  where  η_{r,γ} := (γ^r + γ − 1)/(1 − γ^2 − γ^r). (6)

The proof is given in Appendix A.4.1. Note that, similarly to Eq. (1), the inequality in Eq. (6) bounds the deviation of the buyer's valuation from the price at some node by some increment. But, in contrast to Eq. (1), this bounding occurs when the buyer rejects the price offered previously to the one that is used as the reference price of the valuation's deviation.

As we show in the proof of Theorem 2, Prop. 2 allows us to obtain an upper bound on the number of exploring steps at each phase of the algorithm prePRRFES (as Prop. 1 is used for the algorithm PRRFES). However, this is not enough to directly apply the methodology of the proofs of [22, Th.3, Th.5] and Theorem 1 to bound the strategic regret, because, in contrast to PRRFES, during exploitation rounds the algorithm prePRRFES offers a price that has not been accepted earlier by the strategic buyer (hence, there is no evidence to guarantee his acceptance during the exploitation). Namely, since the buyer's decision made at an exploitation round does not affect the algorithm's pricing in the subsequent rounds, the strategic buyer acts truthfully at this round, i.e., accepts the offered price if and only if it does not exceed his valuation. For PRRFES, we knew that the exploitation price had been accepted in a previous round, but, for prePRRFES, one has to specially guarantee the acceptance of the price at the exploitation rounds; this is done via the following proposition.

###### Proposition 3.

Let the discount sequence be geometric with rate γ, and let n be a starting node of an r-length penalization sequence of a pricing algorithm (see Def. 5) that is followed by exploitation rounds offering the price p_n. If the numbers of penalization and exploitation rounds are properly chosen, and the buyer valuation v is higher than p_n and lower than any price in the right subtree of the node n, then the strategic buyer rejects the price at the round t_n.

The proof is presented in Appendix A.4.2. In addition to the claim of Proposition 3, note that, from the definitions of penalization and exploitation rounds, it follows that, if the strategic buyer rejects the price at the round t_n, he rejects this price at all the penalization rounds as well and accepts it at the exploitation rounds. Note also that the conditions of Prop. 3 make it meaningful only in the case of γ > 1/2. This is consistent with the clear intuition that, for γ ≤ 1/2, the discount at a round is no lower than the sum of the discounts over all possible subsequent rounds, and the strategic buyer thus prefers to purchase a good at the round t_n rather than many goods at no lower prices in all subsequent rounds.
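The γ > 1/2 threshold discussed above is easy to verify directly: with geometric discounting, the weight γ^t of round t dominates the total weight of all later rounds, Σ_{s>t} γ^s = γ^{t+1}/(1−γ), exactly when γ ≤ 1/2. A small illustrative check (function names are ours):

```python
def tail_weight(gamma, t):
    # total discount weight of all rounds strictly after t:
    # sum_{s > t} gamma^s = gamma^(t+1) / (1 - gamma)
    return gamma ** (t + 1) / (1.0 - gamma)

def current_round_dominates(gamma, t=0):
    # True iff the discount of round t alone outweighs all future rounds
    return gamma ** t >= tail_weight(gamma, t)
```

For γ ≤ 1/2 the current round dominates, so a strategic buyer has no incentive to forgo a profitable purchase now for future price reductions; only for γ > 1/2 does the trade-off in Prop. 3 become non-trivial.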

In order to use both Prop. 2 and Prop. 3, the number of penalization rounds r is required to lie in a certain interval. We restrict the discount rate γ so that the length of this interval is larger than 1 and thus guarantee the existence of a natural number in it. This restriction implies that γ should be larger than (√5 − 1)/2 ≈ 0.618. For such discount rates, the following lemma (with the proof in Appendix A.4.3) provides values for the numbers of penalization and exploitation rounds s.t. Prop. 2 and Prop. 3 hold and η_{r,γ} (from Eq. (6)) is bounded by some positive number ϰ.

###### Lemma 4.

Let the discount rate γ be admissible, let ϰ > 0 be a constant, and let the number of penalization rounds and the number of exploitation rounds be determined by r_{γ,ϰ} and G_{γ,ϰ}, respectively, where

 r_{γ,ϰ} := log_γ((1−γ)·(1+ϰ)/(1+ϰγ))  and  G_{γ,ϰ} := log_γ((1 − ((1+ϰ)/(1+ϰγ))^{−1})/γ^{−1}). (7)

Then the conditions of Prop. 2 and Prop. 3 hold, and η_{r,γ} ≤ ϰ.
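A numeric sanity check of the η_{r,γ} ≤ ϰ claim, under our reconstruction of the garbled displays (6)–(7) (the formulas below are our reading of those equations, not verbatim from the paper; function names are ours):

```python
import math

def r_star(gamma, kappa):
    # r_{gamma,kappa} := log_gamma((1-gamma)(1+kappa)/(1+kappa*gamma)), Eq. (7)
    return math.log((1 - gamma) * (1 + kappa) / (1 + kappa * gamma), gamma)

def eta(r, gamma):
    # eta_{r,gamma} := (gamma^r + gamma - 1) / (1 - gamma^2 - gamma^r), Eq. (6)
    return (gamma ** r + gamma - 1) / (1 - gamma ** 2 - gamma ** r)
```

For admissible rates (γ larger than roughly 0.62), rounding r_{γ,ϰ} up to an integer number of penalization rounds keeps η positive and below ϰ in our spot checks.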

Now we are ready to obtain an upper bound for prePRRFES by proving the following theorem.

###### Theorem 2.

Let the discount sequence be geometric with an admissible rate γ, let ϰ > 0 be a constant, and let the constants r_{γ,ϰ} and G_{γ,ϰ} be from Eq. (7). If the pricing algorithm is prePRRFES with the number of penalization rounds r = ⌈r_{γ,ϰ}⌉ and the corresponding exploitation rate, then, for any valuation v, the strategic regret is upper bounded:

 SReg(T, A, v, γ) ≤ (r·v + (1+ϰ)^2 (2 + max{2, ⌈G_{γ,ϰ}⌉} + ϰ)) (log_2 log_2 T + 2) + ⌈G_{γ,ϰ}⌉^2 − 1. (8)

The proof of this theorem is presented in Appendix A.4.4 and is based on the methodology of the proof of Th. 1, but requires the special modifications discussed above. Theorem 2 confirms our hypothesis on the existence of a no-regret algorithm in the considered class and thus closes the corresponding open research question [22]. An attentive reader may note that the pricing prePRRFES has the following drawback: applied against a myopic (truthful) buyer, this algorithm incurs a linear regret (in contrast to the source PRRFES). But we feel that this is the price we have to pay in order to construct a horizon-independent optimal algorithm that offers prices in a consistent manner (i.e., one that never revises prices that were previously reduced, as PRRFES did).
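The doubly-logarithmic character of the bound is easy to see numerically: squaring the horizon T increases log_2 log_2 T by exactly one, so the bound grows by exactly its constant factor. The sketch below uses our reconstruction of Eq. (8); parameter values are illustrative, not from the paper.

```python
import math

def regret_bound(T, r, v, kappa, G):
    # C * (log2 log2 T + 2) + ceil(G)^2 - 1, with C the constant factor
    # of the principal term (our reading of Eq. (8))
    C = r * v + (1 + kappa) ** 2 * (2 + max(2, math.ceil(G)) + kappa)
    return C * (math.log2(math.log2(T)) + 2) + math.ceil(G) ** 2 - 1
```

Going from T = 2^4 to T = 2^16 (squaring twice) thus adds only 2C to the bound, while the horizon grows by a factor of 4096.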

## 5 Conclusions

We studied horizon-independent online learning (discrete) pricing algorithms in the scenario of repeated posted-price auctions with a strategic buyer that holds a fixed private valuation. First, we closed the open research question on the existence of a no-regret horizon-independent weakly consistent algorithm by proposing a novel algorithm that never decreases offered prices and can be applied against strategic buyers with a tight regret bound in Θ(log log T). Second, we provided an upper bound on the strategic regret of the algorithm PRRFES that allows optimizing the constant factor of its principal term, thus outperforming the previously best known upper bounds. Finally, we generalized the previously known lower and upper bounds on strategic regret to classes of discount sequences that are wider than geometric progressions.

## Appendix A Missed proofs

### A.1 Missed proofs from Section 3

#### A.1.1 Proof of Proposition 1

###### Proof.

For each node m of the algorithm's tree, let S(m) be the surplus obtained by the buyer when playing an optimal strategy against the algorithm after reaching the node m. Since the price p_n is rejected at the node n by the strategic buyer, the following inequality on surpluses holds:

 γ_{t_n}(v − p_n) + S(r(n)) < S(l(n)). (A.1)

First, we show that rejection of the price p_n at the node n implies rejection of this price at the subsequent penalization nodes l^s(n) as well. Indeed, let us assume the contrary: the strategic buyer accepts the price at the node l^s(n) for some s (let s be the smallest such index). Hence, the optimal strategy passes through the right subtree r(l^s(n)), which is price equivalent to the tree r(n) (by Def. 5 of a penalization sequence). Let the part of this optimal strategy within the subtree r(l^s(n)) have the corresponding sequences of decisions {ã_t} and prices {p̃_t}; then we consider the strategy in r(n) with the same sequence of decisions {â_t}, but made after acceptance of the price at the node n. Since the subtrees are price equivalent, the corresponding prices are the same: p̂_t = p̃_{t+s}. Hence, we have

 S(l(n)) = S(l^s(n)) = γ_{t_n+s}(v − p_{l^s(n)}) + S(r(l^s(n))) = γ_{t_n+s}(v − p_{l^s(n)}) + Σ_{t=t_n+s+1}^{∞} γ_t ã_t(v − p̃_t) = γ_{t_n+s}(v − p_n) + Σ_{t=t_n+1}^{∞} γ_{t+s} â_t(v − p̂_t) < γ_{t_n}(v − p_n) + Σ_{t=t_n+1}^{∞} γ_t â_t(v − p̂_t) ≤ γ_{t_n}(v − p_n) + S(r(n)), (A.2)

where we used, in the first inequality, the fact that the discount sequence is decreasing and, in the second one, the fact that the considered strategy generates a surplus of at most S(r(n)). In Eq. (A.2), we obtain a contradiction to Eq. (A.1). Therefore, the following inequality holds:

 γ_{t_n}(v − p_n) + S(r(n)) < S(l^r(n)). (A.3)

The surplus S(r(n)) is lower bounded by zero, while the left subtree's surplus S(l^r(n)) can be upper bounded as follows (using the fact that the prices in this subtree are no lower than p_n − δ_n^l):

 S(l^r(n)) ≤ Σ_{t=t_n+r}^{T} γ_t (v − p_n + δ_n^l) < Σ_{t=t_n+r}^{∞} γ_t (v − p_n + δ_n^l).

We plug these bounds in Eq. (A.3) and obtain

 (v − p_n)(γ_{t_n} − Σ_{t=t_n+r}^{∞} γ_t) < Σ_{t=t_n+r}^{∞} γ_t δ_n^l,

which implies Eq. (1), since the discount sequence admits r and a constant C < 1 s.t. Σ_{t=t_n+r}^{∞} γ_t ≤ C γ_{t_n}. ∎
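For the geometric case γ_t = γ^t, the tail appearing in the last step has the closed form Σ_{t ≥ t_n+r} γ^t = γ^{t_n+r}/(1−γ), and the division is legitimate whenever γ^r < 1 − γ. A quick numeric illustration (our own check, not part of the proof):

```python
def geometric_tail(gamma, start, terms=10_000):
    # partial sum approximating sum_{t >= start} gamma^t
    return sum(gamma ** t for t in range(start, start + terms))
```

With γ = 0.5 and r = 3 we have γ^r = 0.125 < 1 − γ = 0.5, so the factor γ_{t_n} − Σ_{t ≥ t_n+r} γ_t on the left-hand side is strictly positive, as the proof requires.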

#### A.1.2 Proof of Lemma 1

###### Proof.

Let α_t := γ_t/γ_{t−1}; then, from the geometric non-convexity, the sequence {α_t} is non-increasing. In particular, α_t ≤ α_2 for t ≥ 2, since {α_t} is decreasing. So, for any t' ∈ ℕ,

 γ_{t+t′} = α_{t+t′} α_{t+t′−1} ⋯ α_{t+1} γ_t ≤ α_t^{t′} γ_t ≤ α_2^{t′} γ_t.

Hence,

 Σ_{s=t+r}^{∞} γ_s ≤ Σ_{s=t+r}^{∞} α_2^{s−t} γ_t = (α_2^r/(1−α_2)) γ_t.

Choosing r and the corresponding constant accordingly, we obtain both claims of the lemma. ∎
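The tail bound above can be spot-checked numerically on a concrete geometrically concave sequence; we take γ_t = 0.9^t / t!, whose ratios α_t = 0.9/t are decreasing (our example, not the paper's; function names are ours).

```python
import math

def gamma_seq(t):
    # a geometrically concave discount sequence: ratios 0.9 / t decrease
    return 0.9 ** t / math.factorial(t)

def tail(t, r, terms=100):
    # partial sum approximating sum_{s >= t+r} gamma_s
    return sum(gamma_seq(s) for s in range(t + r, t + r + terms))

def lemma1_bound(t, r):
    # the Lemma 1 upper bound: alpha_2^r / (1 - alpha_2) * gamma_t
    alpha2 = gamma_seq(2) / gamma_seq(1)  # = 0.45 here
    return alpha2 ** r / (1 - alpha2) * gamma_seq(t)
```

The bound is loose for such fast-decaying sequences, which is expected: α_2 dominates all later ratios.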

#### A.1.3 The telescoping discount sequence γ_t = 1/(t(t+1)) does not satisfy Prop. 1

Note that this discount sequence is geometrically convex, i.e., the ratio γ_{t+1}/γ_t = t/(t+2) is increasing in t.

Let us show that the property "there exist r ∈ ℕ and C < 1 s.t. Σ_{t′=t+r}^{∞} γ_{t′} ≤ C γ_t for all t" does not hold for this sequence. Indeed, assume the contrary: let r and C be such that the inequality holds. We have
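The failure is easy to see numerically: since 1/(s(s+1)) = 1/s − 1/(s+1), the tail telescopes exactly to Σ_{s ≥ t+r} γ_s = 1/(t+r), so the ratio tail/γ_t = t(t+1)/(t+r) grows without bound in t for any fixed r, and no constant C can work. A small check with exact rationals (our illustration; function names are ours):

```python
from fractions import Fraction

def gamma_tel(t):
    # the telescoping discount sequence gamma_t = 1 / (t(t+1))
    return Fraction(1, t * (t + 1))

def tail_closed_form(t, r):
    # sum_{s >= t+r} 1/(s(s+1)) = 1/(t+r) by telescoping
    return Fraction(1, t + r)

def tail_partial(t, r, terms):
    # finite partial sum, telescoping to 1/(t+r) - 1/(t+r+terms)
    return sum(gamma_tel(s) for s in range(t + r, t + r + terms))
```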