# Reserve Pricing in Repeated Second-Price Auctions with Strategic Bidders

We study revenue-optimization learning algorithms for repeated second-price auctions with reserve, where a seller interacts with multiple strategic bidders, each of whom holds a fixed private valuation for a good and seeks to maximize his expected future cumulative discounted surplus. We propose a novel algorithm that has a strategic regret upper bound of $O(\log\log T)$ for worst-case valuations. This pricing is based on our novel transformation that upgrades an algorithm designed for the single-buyer setup to the multi-buyer case. We provide theoretical guarantees on the ability of a transformed algorithm to learn the valuation of a strategic buyer who is uncertain about the future due to the presence of rivals.

## 1 Introduction

Revenue maximization is one of the fundamental development directions of major Internet companies that have their own online advertising platforms 2014-WWW-Gomes ; 2015-ManagSci-Balseiro ; 2014-KDD-Agarwal ; 2017-WWW-Drutsa ; 2018-IJGT-Hummel . Most ad inventory is sold via widely applicable second-price auctions 2013-IJCAI-He ; 2014-ICML-Mohri and their generalizations like GSP 2007-IJIO-Varian ; 2009-AER-Varian ; 2014-AER-Varian ; 2014-ECRA-Sun . Adjustment of reserve prices plays a central role in revenue optimization here: their proper setting is studied both by game-theoretical methods 1981-MOR-Myerson ; agrawal2018robust and by machine learning approaches 2007-Book-Nisan ; 2013-SODA-Cesa-Bianchi ; 2014-ICML-Mohri ; 2016-WWW-Paes .

In our work, we focus on a scenario where the seller repeatedly interacts through a second-price auction with strategic bidders (also referred to as buyers). Each buyer participates in each round of this game, holds a fixed private valuation for a good (e.g., an ad space), and seeks to maximize his expected future discounted surplus given his beliefs about the behaviors of the other bidders. The seller applies a deterministic online learning algorithm, which is announced to the buyers in advance and, in each round, selects individual reserve prices based on the previous bids of the buyers. The seller's goal is to maximize her revenue over a finite horizon $T$ through regret minimization for worst-case valuations of the bidders 2014-NIPS-Mohri ; 2018-ICML-Drutsa . Thus, the seller seeks a no-regret pricing algorithm.

To the best of our knowledge, no existing study has investigated worst-case regret-optimizing algorithms that set reserve prices in repeated second-price auctions with strategic bidders whose valuations are private but fixed over all rounds. However, our setting constitutes a natural generalization of the well-studied 1-buyer setup of repeated posted-price auctions (RPPA) 2013-NIPS-Amin ; 2014-NIPS-Mohri to the scenario of multiple buyers in a second-price auction. (In particular, when there is only one bidder, our auction in a round reduces to a posted-price one: the bidder has no rivals and his decision is thus binary, i.e., to accept or to reject the currently offered price.) In the RPPA setting, there are optimal algorithms 2017-WWW-Drutsa ; 2017-ArXiV-Drutsa ; 2018-ICML-Drutsa that have a tight strategic regret bound of $\Theta(\log\log T)$. This bound follows from the ability of the seller to upper bound the buyer's valuation even if he lies when rejecting a price (2017-WWW-Drutsa, Prop. 2). This ability strongly exploits the fact that the buyer knows in advance the outcomes of the current and all future rounds, since he has complete information due to the absence of rivals. In our multi-bidder scenario, this does not hold: a bidder has incomplete information and is thus uncertain about the future. Hence, the theoretical guarantees cannot be directly ported to our scenario by straightforwardly applying the optimal 1-buyer RPPA algorithms.

In our study, we propose a novel algorithm that can be applied against our strategic buyers with a regret upper bound of $O(\log\log T)$ (Th. 1); it constitutes the main contribution of our work. We also introduce a novel transformation that maps a RPPA algorithm to a multi-buyer pricing and is based on a simple but crucial idea of cyclic elimination of all bidders except one in each round (Sec. 3). The construction and analysis of the proposed algorithm and transformation required the introduction of novel techniques, which are contributed by our work as well. They include (a) a method to locate the valuation of a strategic buyer in a played round under his uncertainty about the future (Prop. 1); (b) the decomposition of strategic regret into the regret of learning the individual valuations and the deviation regret of learning which bidder has the maximal valuation (Lemma 1); and (c) an approach to learn the highest-valuation bidder with deviation regret of $O(1)$ w.r.t. $T$ (Lemma 3).

## 2 Preliminaries: setup, background, and overview of results

Setup of Repeated Second-Price Auctions. We study the following mechanism of repeated second-price auctions. The auctioneer repeatedly proposes goods (e.g., advertisement opportunities) to $M$ bidders (whose set is denoted by $\mathbb{M}$) over $T$ rounds: one good per round. From here on, the following terminology is used as well: the seller for the auctioneer, a buyer for a bidder, and the time horizon for the number of rounds $T$. Each bidder $m \in \mathbb{M}$ holds a fixed private valuation $v^m \in [0,1]$ for a good, i.e., the valuation $v^m$ is the same for the goods offered in all rounds and is unknown to the seller. The vector of valuations of all bidders is denoted by $\mathbf{v}$.

In each round $t$, for each bidder $m$, the seller sets a personal reserve price $p^m_t$, and the buyer (knowing $p^m_t$) submits a sealed bid $b^m_t$. Given the reserve prices and the bids, the standard allocation and payment rules of a second-price auction are applied 2016-WWW-Paes : (a) for each bidder $m$, we check whether he bids at least his reserve price, $b^m_t \ge p^m_t$, obtaining the set $\mathbb{M}_t$ of actual bidder-participants; (b) if $\mathbb{M}_t \neq \emptyset$, the good is allocated to the winning bidder, i.e., the participant with the highest bid (ties are broken randomly), who pays the seller the maximum of his reserve price and the highest bid among the other participants; (c) if $\mathbb{M}_t = \emptyset$, the current good disappears and no payment is transferred. Further, we use dedicated notations for the allocation indicators $\bar a^m_t$, the payments $\bar p^m_t$, and their vectors. A summary of all notations is in App. C.
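The allocation and payment rules (a) to (c) can be illustrated with a minimal Python sketch. The function name `second_price_round`, the deterministic tie-breaking by lowest index (the text breaks ties randomly), and the exact payment rule (the larger of the winner's reserve and the best rival participating bid) are assumptions made for illustration only.

```python
from typing import Optional, Sequence, Tuple


def second_price_round(
    bids: Sequence[float], reserves: Sequence[float]
) -> Tuple[Optional[int], float]:
    """Run one round; return (winner index or None, payment to the seller)."""
    # (a) A bidder participates only if his bid is at least his personal reserve.
    participants = [m for m, (b, p) in enumerate(zip(bids, reserves)) if b >= p]
    # (c) Nobody cleared a reserve: the good disappears and nothing is paid.
    if not participants:
        return None, 0.0
    # (b) The highest participating bid wins. Ties go to the lowest index here,
    # an assumption made for determinism (the text breaks ties randomly).
    winner = max(participants, key=lambda m: (bids[m], -m))
    rival_bids = [bids[m] for m in participants if m != winner]
    # Assumed payment: max of the winner's reserve and the best rival bid.
    payment = max([reserves[winner]] + rival_bids)
    return winner, payment
```

For instance, a very high reserve for one bidder removes him from the set of participants, so the remaining bidder simply faces his own reserve as a posted price.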

Thus, the seller applies a (pricing) algorithm $\mathcal{A}$ that sets reserve prices in response to the buyers' bids. We consider the deterministic online learning case, in which the reserve price $p^m_t$ for a bidder $m$ in a round $t$ can depend only on the bids of all bidders during the previous rounds and, possibly, on the horizon $T$. Hence, given a pricing algorithm $\mathcal{A}$ of this kind, the buyers' bids uniquely define the corresponding price sequence, which, in turn, determines the seller's total revenue. This revenue is usually compared to the revenue that would have been earned by offering the highest valuation $\bar v := \max_m v^m$ if the valuations were known in advance to the seller 2013-NIPS-Amin ; 2017-WWW-Drutsa . This leads to the notion of the regret of the algorithm $\mathcal{A}$, i.e., the gap between the benchmark revenue $T\bar v$ and the seller's total revenue over the $T$ rounds.

Following a standard assumption in mechanism design that matches the practice in ad exchanges 2014-NIPS-Mohri ; 2018-ICML-Drutsa , the seller's pricing algorithm is announced to the buyers in advance. A bidder can then act strategically against this algorithm. In the case of one bidder, the buyer can find an optimal behavior in advance, and the repeated mechanism thus reduces to a two-stage game 2013-NIPS-Amin ; 2014-NIPS-Mohri ; 2017-WWW-Drutsa . In contrast, in our setting, a bidder has incomplete information since he may not know the valuations and behaviors of the other bidders. Therefore, in order to model strategic buyer behavior under this uncertainty, we assume that, in each round $t$, each buyer optimizes his utility on the subgame of future rounds given the available history of previous rounds and his beliefs about the other buyers.

Formally, in a round $t$, given the seller's pricing algorithm $\mathcal{A}$, a strategic buyer $m$ observes a history $h^m_t$ available to him and derives his optimal bid from a (possibly mixed) strategy $\sigma$ that maximizes his future $\gamma_m$-discounted surplus (a buyer strategy is a map from any history available in a round to a bid):

$$\mathrm{Sur}_{t:T}(\mathcal{A},\gamma_m,v^m,h^m_t,\beta^m,\sigma) = \mathbb{E}\Big[\sum_{s=t}^{T}\gamma_m^{s-1}\,\bar a^m_s\,(v^m-\bar p^m_s)\;\Big|\;h^m_t,\sigma,\beta^m\Big], \tag{1}$$

where $\gamma_m$ is the discount rate of the bidder $m$. Note that only buyer utilities are discounted over time, which is motivated by real-world markets such as online advertising, where sellers are far more willing to wait for revenue than buyers are willing to wait for goods 2014-NIPS-Mohri ; 2018-ICML-Drutsa . The expectation in Eq. (1) is taken over all possible continuations of the history $h^m_t$ w.r.t. a strategy $\sigma$ of the buyer and his beliefs $\beta^m$ about the strategies of the other bidders. So, $\sigma$ and $\beta^m$ determine the future outcomes $\bar a^m_s$ and $\bar p^m_s$, which are thus random variables. The buyer assumes that the other bidders are strategic in the sense described above as well, which is taken into account in the beliefs $\beta^m$. In our setup, we do not require that the strategies actually used by the buyers match the buyer $m$'s beliefs (an equilibrium requirement), because our results hold without this requirement. When $T$ rounds have been played, let $\mathring{\mathbf{b}}_{1:T}(T,\mathcal{A},\mathbf{v},\boldsymbol{\gamma},\boldsymbol{\beta})$ be the optimal bids, which depend on the horizon, the algorithm, the valuations, the vector of discounts $\boldsymbol{\gamma}$, and the beliefs $\boldsymbol{\beta}$ of all buyers. We define the strategic regret of the algorithm $\mathcal{A}$ that faced strategic buyers with valuations $\mathbf{v}$, discounts $\boldsymbol{\gamma}$, and beliefs $\boldsymbol{\beta}$ over $T$ rounds as

$$\mathrm{SReg}(T,\mathcal{A},\mathbf{v},\boldsymbol{\gamma},\boldsymbol{\beta}) := \mathrm{Reg}\big(T,\mathcal{A},\mathbf{v},\mathring{\mathbf{b}}_{1:T}(T,\mathcal{A},\mathbf{v},\boldsymbol{\gamma},\boldsymbol{\beta})\big).$$
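To make the discounted surplus of Eq. (1) concrete, the following Python sketch evaluates the sum inside the expectation for one realized sequence of outcomes. It does not model beliefs or mixed strategies; the function name `discounted_surplus` and its parameters are hypothetical.

```python
from typing import Sequence


def discounted_surplus(
    valuation: float,
    gamma: float,
    allocations: Sequence[int],
    payments: Sequence[float],
    start_round: int = 1,
) -> float:
    """Sum gamma^(s-1) * a_s * (v - p_s) over consecutive rounds from start_round."""
    total = 0.0
    for offset, (a, p) in enumerate(zip(allocations, payments)):
        s = start_round + offset  # absolute round index, 1-based as in Eq. (1)
        total += gamma ** (s - 1) * a * (valuation - p)
    return total
```

With valuation 1.0, discount 0.5, and a good bought at price 0.5 in rounds 1 and 3, the surplus is 0.5 + 0.25 * 0.5 = 0.625, which shows how later purchases contribute less.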

In our setting, following 2013-NIPS-Amin ; 2014-NIPS-Mohri ; 2017-WWW-Drutsa ; 2018-ICML-Drutsa , we seek algorithms that attain $o(T)$ strategic regret for the worst-case valuations $\mathbf{v}$. Formally, an algorithm is said to be a no-regret one when its worst-case strategic regret is sublinear in $T$ in our multi-buyer case. The optimization goal is to find algorithms with the lowest possible strategic regret upper bound, i.e., a bound that has the slowest growth as $T \to \infty$ or, equivalently, an averaged regret that has the best rate of convergence to zero.

Background on pricing algorithms. To the best of our knowledge, no existing work has studied worst-case regret-optimizing algorithms that set reserve prices in repeated second-price auctions with strategic bidders whose valuations are private but fixed over all rounds. However, in the case of one bidder, the bidder has no rivals, and, thus, the second-price auction in a round reduces to a posted-price auction, where the buyer's decision reduces to a binary action: to accept or to reject the currently offered price. Consider the subclass of 1-bidder algorithms such that each reserve price depends only on the past binary decisions of the buyer to get or not to get a good for a posted reserve price. For this subclass, our whole strategic setting of repeated second-price auctions reduces to the setup of repeated posted-price auctions (RPPA) introduced earlier in 2013-NIPS-Amin .

Pricing algorithms in the strategic setup of RPPA with a fixed private valuation and worst-case regret optimization have been well studied in recent years 2013-NIPS-Amin ; 2014-NIPS-Mohri ; 2017-WWW-Drutsa ; 2018-ICML-Drutsa . It is known that, if the discount rate $\gamma = 1$, any algorithm has linear strategic regret, i.e., the regret has a lower bound of $\Omega(T)$ 2013-NIPS-Amin , while, for the other cases $\gamma \in (0,1)$, the lower bound of $\Omega(\log\log T)$ holds 2003-FOCS-Kleinberg ; 2014-NIPS-Mohri . The first algorithm with the optimal strategic regret bound of $\Theta(\log\log T)$ was found in 2017-WWW-Drutsa . It is Penalized Reject-Revising Fast Exploiting Search (PRRFES), which is horizon-independent and is based on Fast Search 2003-FOCS-Kleinberg modified to act against a strategic buyer. The modifications include penalizations (see Def. 1). A strategic buyer either accepts the price at the first node or rejects this price in the subsequent penalization ones 2014-NIPS-Mohri ; 2017-WWW-Drutsa . PRRFES is also a right-consistent algorithm: a RPPA algorithm is right-consistent if it never offers a price lower than the last accepted one 2017-WWW-Drutsa . PRRFES was further modified by a transformation to obtain an algorithm that never decreases offered prices and has a tight strategic regret bound of $\Theta(\log\log T)$ as well 2018-ICML-Drutsa .

The workflow of a RPPA algorithm is usually described by a labeled binary tree  2014-NIPS-Mohri ; 2017-WWW-Drutsa ; 2018-ICML-Drutsa : initialize the tracking node to the root ; in each round, the label is offered as a price; if it is accepted (rejected), move the tracking node to the right child (the left child , resp.); and go to the next round. The left (right) subtrees rooted at the node (, resp.) are denoted by (, resp.). When trees and have the same node labeling, we write .
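The tree-based workflow described above can be sketched in Python as follows. The `Node` class, the `run_rppa` helper, and the `accepts` callback (which stands in for the buyer's decision and is not necessarily strategic) are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    price: float                    # label of the node, offered as a price
    left: Optional["Node"] = None   # child followed after a rejection
    right: Optional["Node"] = None  # child followed after an acceptance


def run_rppa(root: Node, accepts: Callable[[float], bool], rounds: int) -> List[float]:
    """Walk the tree for at most `rounds` rounds; return the offered prices."""
    offered: List[float] = []
    node: Optional[Node] = root  # the tracking node starts at the root
    for _ in range(rounds):
        if node is None:  # this finite sketch ran out of tree
            break
        offered.append(node.price)
        node = node.right if accepts(node.price) else node.left
    return offered
```

A real RPPA algorithm corresponds to an infinite (or horizon-deep) tree; the finite tree here only shows how acceptances and rejections steer the tracking node.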

###### Definition 1.

For a RPPA algorithm , nodes are said to be a (-length) penalization sequence if , , and .

Overview of our results. We cannot directly apply the optimal RPPA algorithms 2017-WWW-Drutsa ; 2018-ICML-Drutsa , because our bidders have incomplete information in the game, while the proofs of optimality of these algorithms strongly rely on complete information. This completely different information structure of the multi-buyer game results in very complicated bidder behavior even in the absence of reserve prices bikhchandani1988reputation . Hence, it is challenging to find, in the multi-buyer case, a pricing algorithm whose regret upper bound has the same asymptotic behavior as the best one in the 1-buyer RPPA setting. Our research goal is to resolve the question of whether such algorithms exist.

First, we propose a novel technique to transform a RPPA algorithm to our setup, which is based on cyclic elimination of all bidders except one by means of high enough prices (Sec. 3). Playing separately with each buyer removes his uncertainty about the outcome of the current round; and, despite the remaining uncertainty about future rounds, this is enough to construct a tool to locate his valuation (Prop. 1). Second, we transform PRRFES in this way and show that its regret is affected by two learning processes: one learns the bidder valuations and the other learns which bidders have the maximal valuation (Sec. 4). The former is controlled by the design of the source PRRFES, while the latter is achieved by a special stopping rule that excludes bidders from the suspected ones. A proper combination of parameters for the source pricing and the stopping rule provides an algorithm with strategic regret in $O(\log\log T)$, see Th. 1.

Related work. Several studies maximized the revenue of auctions in an offline/batch learning fashion: either via estimating or fitting distributions of buyer valuations/bids to set reserve prices 2013-IJCAI-He ; 2014-ECRA-Sun ; 2016-WWW-Paes , or via direct learning of reserve prices 2014-ICML-Mohri ; 2015-UAI-Mohri ; 2016-WWW-Rudolph ; 2017-NIPS-Medina . In contrast to them, we set prices in repeated auctions by an online deterministic learning approach. Revenue optimization for repeated auctions has mainly concentrated on algorithmic reserve prices, which are updated online over time, and is also known as dynamic pricing fudenberg2006behavior ; 2015-SORMS-den-Boer . Dynamic pricing was considered: from a game-theoretic view leme2012sequential ; 2015-EC-Chen ; 2016-EC-Balseiro ; 2016-EC-Ashlagi ; mirrokni2018optimal ; from the bidder side 2011-ECOMexch-Iyer ; 2016-JMLR-Weed ; 2016-ICML-Heidari ; 2017-NIPS-Baltaoglu ; in experimental studies list1999price ; 2012-RIO-Carare ; 2014-KDD-Yuan ; as bandit problems 2011-COLT-Amin ; 2015-NIPS-Lin ; cesa2018dynamic ; and from other aspects 2016-EC-Roughgarden ; 2016-NIPS-Feldman ; 2016-SODA-Chawla ; 2018-IJGT-Hummel . Repeated auctions with contextual information about the good in a round were considered in 2014-NIPS-Amin ; 2016-EC-Cohen ; 2018-NIPS-Mao ; 2018-NIPS-Leme . The studies 1993-JET-Schmidt ; hart1988contract ; 2015-SODA-Devanur ; 2017-EC-Immorlica ; 2018-ArXiV-Vanunts elaborated on setups of repeated posted-price auctions with a strategic buyer holding a fixed valuation, but maximized expected revenue for a given prior distribution of valuations, while we optimize regret w.r.t. worst-case valuations without knowing their distribution.

There are studies on reserve price optimization in repeated second-price auctions, but they considered scenarios different to ours. Non-strategic bidders are considered in 2013-SODA-Cesa-Bianchi . Kanoria et al. 2014-SSRN-Kanoria studied strategic buyers (similarly to our work), but maximized expected revenue w.r.t. a prior distribution of valuations. Our setup can be considered as a special case of repeated Vickrey auctions in 2018-NIPS-Huang , but their regret upper bound is in and holds only when selling several goods in a round. However, the most relevant works to ours are 2013-NIPS-Amin ; 2014-NIPS-Mohri ; 2017-WWW-Drutsa ; 2018-ICML-Drutsa , where our strategic setup with fixed private valuation is considered, but for the case of one bidder, . The most important results of these works are discussed above in this section (see “Background on pricing algorithms").

## 3 Dividing algorithms and div-transformation

Barrage pricing. In our setting, a pricing algorithm is able to set personal (individual) reserve prices for each bidder and is hence able to “eliminate” particular bidders from particular rounds. Namely, in a round $t$, an algorithm can set a reserve price for a strategic bidder $m$ such that he, independently of his valuation, will never accept it, i.e., will always bid lower than this price; such a price is referred to as a barrage reserve price. From here on, we use a barrage price chosen so that accepting it even once results in a negative surplus for the buyer. We use the phrase “the bidder $m$ is eliminated from participation in the round $t$” to describe this case. Note that (a) formally, all bidders participate in all rounds (see Sec. 2), and (b) if a bidder is not eliminated, this does not mean that he is among the actual bidder-participants of the round (he may bid below his reserve price, which can be a non-barrage one). So, the word “elimination” is purposely associated with barrage pricing in order to refer to this case.

Dividing algorithms. In this subsection, we introduce a subclass of the algorithms that is referred to as the class of dividing algorithms (the name stems from the Latin “Divide et impera”). A dividing algorithm works in periods and tracks a feasible set of suspected bidders, aiming to find the bidder (or bidders) with the maximal valuation $\bar v$. Namely, it starts with all bidders being suspected in the first period, which thus lasts $M$ rounds. In each period, the algorithm iterates over the currently suspected bidders: in the current round, it picks a suspected bidder $m$, sets a non-barrage reserve price for the bidder $m$, sets a barrage reserve price for all other bidders, and goes to the next round within the period by picking the next buyer from the suspected set. Thus, the algorithm meaningfully interacts with only one bidder in each round through elimination of all other bidders by means of barrage pricing. After each period, the algorithm identifies (by some rule) which of the currently suspected bidders should remain suspected in the next period.
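The round structure of one period of a dividing algorithm can be sketched as follows. The constant `BARRAGE` (infinity is used as a stand-in, since any price that no bidder would ever accept serves the purpose), the function `period_reserves`, and the dictionary-based representation are hypothetical choices made for illustration.

```python
from typing import Dict, List

# Hypothetical stand-in for a barrage reserve price: no bid can reach it.
BARRAGE = float("inf")


def period_reserves(
    suspected: List[int], all_bidders: List[int], active_price: Dict[int, float]
) -> List[Dict[int, float]]:
    """Return one reserve-price vector per round of a period.

    In the round devoted to bidder m, only m receives a non-barrage price
    (active_price[m]); every other bidder is eliminated by a barrage price.
    """
    rounds: List[Dict[int, float]] = []
    for m in suspected:
        reserves = {k: BARRAGE for k in all_bidders}
        reserves[m] = active_price[m]
        rounds.append(reserves)
    return rounds
```

A period therefore has exactly as many rounds as there are currently suspected bidders, and a bidder who is no longer suspected receives barrage prices only.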

When the game has been played with the dividing algorithm , one can split all the rounds into periods: . Each period consists of rounds (the last one of ). Let denote the round of a period in which a bidder is not eliminated by the seller algorithm (i.e., receives a non-barrage reserve price). Thus, are all such rounds of the bidder and is referred to as the subhorizon of the bidder (the number of periods where he participates). Note that (a) and depend on the bids of all buyers ; (b) the following identities hold: and .

So, in a round , the algorithm eliminates the bidders (i.e., sets the reserves ), while the reserve price set for the buyer is determined only by his bids during the previous rounds where he has not been eliminated: i.e., . Hence, the algorithm ’s interaction with the bidder in the rounds can be encoded by a -buyer algorithm from , which sets prices in the rounds instead of . We denote this algorithm by and refer to it as the subalgorithm of against the buyer . Let be the regret of the subalgorithm for given bids of the buyer in the rounds . The lemma holds (the trivial proof is in App. A.1.1).

###### Lemma 1.

Let be a dividing algorithm, , be its subalgorithms (as described above), and be optimal bids of the strategic buyers . Then, for any , and , the strategic regret of can be decomposed into two parts , where is the individual part of the regret and is the deviation part of the regret.

Informally, this lemma states that the regret consists of the individual regrets against each buyer in his rounds and the deviation of the buyer valuations from the maximal one $\bar v$. So, we see a clear intuition: a good algorithm should (1) learn the valuations of the buyers (minimizing the individual regrets) and (2) learn which buyers have the highest valuation (minimizing the deviation regret).
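The intuition behind this decomposition can be checked numerically with a small sketch. It assumes that every round is devoted to exactly one non-eliminated bidder and that the revenue of a round is the offered price if it is accepted and zero otherwise; the function `regret_decomposition` and its input format are hypothetical.

```python
from typing import List, Sequence, Tuple


def regret_decomposition(
    valuations: Sequence[float], rounds: List[Tuple[int, int, float]]
) -> Tuple[float, float, float]:
    """rounds: list of (active bidder m, acceptance flag a in {0, 1}, price p).

    Returns (total regret, individual part, deviation part).
    """
    v_max = max(valuations)
    # Per-round regret against the benchmark of selling at the highest valuation.
    total = sum(v_max - a * p for _, a, p in rounds)
    # Regret of each round measured against the active bidder's own valuation.
    individual = sum(valuations[m] - a * p for m, a, p in rounds)
    # Loss caused by playing with a bidder whose valuation is not maximal.
    deviation = sum(v_max - valuations[m] for m, _, _ in rounds)
    return total, individual, deviation
```

Since each round contributes (v_max - a*p) = (v_m - a*p) + (v_max - v_m), the total always equals the sum of the two parts.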

div-transformation. Let $\mathcal{A}_1$ be a 1-buyer RPPA algorithm. An algorithm is said to be a div-transformation of $\mathcal{A}_1$ with a stopping rule $\rho$ when it is a dividing algorithm such that all its subalgorithms are $\mathcal{A}_1$ and the stopping rule $\rho$ determines which bidders are no longer suspected after a period. Namely, first, the algorithm tracks the state of each buyer in the tree of the RPPA algorithm $\mathcal{A}_1$ (see Sec. 2) by means of a personal (individual) tracking node. In each period, the current state (i.e., the history of previous actions) of a suspected buyer $m$ is encoded by his tracking node; in particular, in his round of the period, he receives the reserve price of this node (while the other bidders get a barrage reserve price). If a buyer is no longer suspected in a period, his tracking node is fixed by a formal convention. Second, after each period, the stopping decision is based on the past binary actions of the buyers, which are encoded by their tracking nodes in the binary tree: if the stopping rule holds for a buyer, then he is excluded from the suspected bidders. The pseudo-code of the div-transformation of a RPPA algorithm is in Appendix B.1.

For a right-consistent RPPA algorithm $\mathcal{A}_1$ with penalization rounds, let $\langle\mathcal{A}_1\rangle$ denote the transformation of $\mathcal{A}_1$ that is equal to $\mathcal{A}_1$, except that each penalization sequence of nodes (see Def. 1) is reinforced in the following way: all the prices in the nodes that follow the starting node of the sequence are replaced by $1$ (the maximal value of the valuation domain); the sequence and the corresponding rounds are then referred to as reinforced penalization ones. After this, a strategic buyer will certainly either accept the price at the starting node or reject the prices in all the nodes of the sequence, even in the case of uncertainty about the future. Let $\delta^l_n$ be the left increment 2014-NIPS-Mohri ; 2017-WWW-Drutsa of a node $n$.

In order to obtain upper bounds on strategic regret, it is important to have a tool that allows one to locate the valuation of a strategic bidder. Such a tool can be obtained for div-transformed right-consistent RPPA algorithms with reinforced penalization rounds based on the following proposition, which is an analogue of (2017-WWW-Drutsa, Prop. 2) in our case with buyer uncertainty about the future.

###### Proposition 1.

Let , be a RPPA right-consistent pricing algorithm, be a starting node in a -length penalization sequence (see Def. 1), , be a stopping rule, and the -transformation be used by the seller for setting reserve prices. If, in a round, the node is reached and the price is rejected by a strategic buyer (i.e., he bids lower than ), then the following inequality on holds:

$$v^m - p(n) < \zeta_{r,\gamma_m}\,\delta^l_n, \quad \text{where } \zeta_{r,\gamma} := \frac{\gamma^r}{1-\gamma-\gamma^r}. \tag{2}$$

###### Proof sketch.

The full proof is in App.A.1.2. Let be the round in which the bidder reaches the node and rejects his reserve price . In particular, it is the round where he is the non-eliminated buyer and for some period . Since the buyers are divided and , w.l.o.g., any strategy can be treated as a map to binary decisions . Let be the optimal strategy used by the buyer ; be the continuation of the current history by a binary decision , while denote an optimal strategy among all possible strategies in which the binary buyer decision is ; and be the future expected surplus when following a strategy . Rejection of the price when following the optimal strategy easily implies: . Let us bound each side of this inequality. First,

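$$S^m_t(\hat\sigma_1) = \gamma_m^{t-1}\big(v^m - p(n)\big) + \mathrm{Sur}_{t+1:T}(\mathcal{A},\gamma_m,v^m,h^m_t;1,\beta^m,\hat\sigma_1) \ge \gamma_m^{t-1}\big(v^m - p(n)\big), \tag{3}$$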

where we used the facts (i) that if the bidder accepts the price , then he necessarily gets the good since all other bidders are eliminated by a barrage price in this round ; and (ii) that the expected surplus in rounds is at least non-negative, because the subalgorithm is right-consistent. Second, where we (i) used the fact that if the bidder rejects the price , then the future rounds will be reinforced penalization ones (the strategic bidder will reject in all of them); and (ii) upper bounded the surplus in remaining rounds by assuming that only this bidder will get remaining goods for the lowest reserve price from the left subtree . We unite these bounds on and get what implies Eq. (2), since . ∎

We emphasize that the dividing structure of the algorithm is crucially exploited in the proof of Prop. 1. Namely, the fact that all other bidders are eliminated by a barrage price in the round is used (a) to guarantee obtaining of the good at price by the buyer and (b) to lower bound thus the future surplus in the case of acceptance in Eq. (A.4). If we dealt with a non-dividing algorithm, then another bidder might win the good or make the payment of the bidder higher than his reserve price ; in both cases, could only be lower bounded by in a general situation, what would result in an useless inequality instead of Eq. (2).

For a right-consistent algorithm, the left increment $\delta^l_n$ is bounded by the difference between the current node's price $p(n)$ and the last price accepted by the buyer before reaching this node. Hence, Prop. 1 provides us with a tool to locate the valuation $v^m$ even though the strategic buyer does not myopically report its position (similarly to (2017-WWW-Drutsa, Prop. 2)). Namely, if the buyer bids no lower than $p(n)$, then $v^m \ge p(n)$; if he bids lower than $p(n)$, then $v^m < p(n) + \zeta_{r,\gamma_m}\delta^l_n$, and the closer the offered price is to the last accepted price, the smaller the location interval of possible valuations.
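As a small numeric illustration of Eq. (2), the sketch below computes the constant $\zeta_{r,\gamma}$ and the resulting upper bound on the valuation after a rejection. The function names `zeta` and `valuation_upper_bound` are hypothetical; the guard on the denominator reflects that the expression is only meaningful when $1-\gamma-\gamma^r > 0$.

```python
def zeta(r: int, gamma: float) -> float:
    """Compute zeta_{r,gamma} = gamma^r / (1 - gamma - gamma^r) from Eq. (2)."""
    denom = 1.0 - gamma - gamma ** r
    if denom <= 0:
        # The bound is vacuous unless the penalization length r is large enough.
        raise ValueError("need 1 - gamma - gamma**r > 0")
    return gamma ** r / denom


def valuation_upper_bound(price: float, left_increment: float, r: int, gamma: float) -> float:
    """Upper bound on the valuation implied by a rejection of `price` (Eq. (2))."""
    return price + zeta(r, gamma) * left_increment
```

For example, with r = 2 and discount 0.5 the constant equals 1, so a rejection of the price 0.5 at a node with left increment 0.125 implies that the valuation is below 0.625. Longer penalization sequences shrink the constant and therefore tighten the bound.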

## 4 divPRRFES algorithm

In this section, we show that an optimal algorithm from the setting of repeated posted-price auctions can be used to obtain an algorithm for our multi-bidder setting whose upper bound on strategic regret has the same asymptotics. Namely, let us div-transform PRRFES 2017-WWW-Drutsa , further denoted as $\mathcal{A}_1$.

Since a -transformation of PRRFES (with penalization reinforcement) individually tracks position of each buyer in the binary tree , we adapt the key notations of PRRFES 2017-WWW-Drutsa to our case of multiple bidders and periods. Against a buyer , PRRFES works in phases initialized by the phase index , the last accepted price before the current phase , and the iteration parameter ; at each phase , it sequentially offers prices (exploration rounds), with ; if a price is rejected, setting , (1) it offers the price for reinforced penalization rounds (if one of them is accepted, will be offered in all remaining rounds), (2) it offers the price for exploitation rounds, and (3) PRRFES goes to the next phase by setting and . Individual tracking of bidders by the -transformed PRRFES implies that different buyers can be in different phases in the same period . Hence, let denote the current phase of a buyer in the round of a period , and let in all subsequent periods (when the buyer is no more suspected). In particular, is the last accepted price by the buyer before the phase in the period . We rely on the decomposition from Lemma 1 in order to bound the strategic regret of a -transformed PRRFES.

Upper bound for individual regrets. Before specifying a particular stopping rule, let us obtain an upper bound on individual strategic regret . This regret is not equal to since, in the latter case, the -bidder game does not depend on behavior of the other bidders (while, in the former case, does). In other words, the rounds do not constitute the -round -buyer game of the RPPA setting considered in 2013-NIPS-Amin ; 2017-WWW-Drutsa , because the subhorizon and exact rounds (they determine the used discount factors: ) are unknown in advance and depend on actions of the other bidders. Hence, this does not allow to straightforwardly utilize the result on the strategic regret for PRRFES proved in (2017-WWW-Drutsa, , Th.5) for the setting of RPPA. So, we have to prove the bound for our case with buyer uncertainty about the future. Let us introduce the notation: .

###### Lemma 2.

Let , be the PRRFES algorithm with and the exploitation rate , and be a stopping rule. Then, if , the individual regret of the -transformed PRRFES against the buyer is upper bounded:

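$$\mathrm{Reg}^m\big(I^m,\langle\mathcal{A}_1\rangle,v^m,\mathring{b}^m_{1:T}\big) \le (r v^m + 4)\big(\log_2\log_2 I^m + 2\big) \quad \forall\gamma_m\in(0,\gamma_0]\ \ \forall v^m\in[0,1], \tag{4}$$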

where are optimal bids of the strategic buyers .

###### Proof sketch.

Decompose the individual regret over the rounds into the sum of the phases’ regrets: , where is the number of phases conducted by the algorithm against the buyer . For : where the terms correspond to the accepted exploration rounds, the reject-penalization ones, and the exploitation ones. PRRFES and each rejected price satisfy the conditions of Prop. 1, what implies (since for and ). Hence, (since and PRRFES is right-consistent) and the number of exploration rounds is thus bounded: . All further steps are similar to (2017-WWW-Drutsa, , Th.5): ; for each phase , we get that ; and the number of phases . The full proof is in Appendix A.2.1 of Supp. Materials. ∎

Upper bound for deviation regret. Prop. 1 provides us with the tool that locates the valuation of a bidder at least in the segment right after a period (see the proof [sketch] of Lemma 2), when . This means: if, after playing a period , the upper bound of the valuation of a bidder is lower that the lower bound of the valuation of another bidder , i.e., , then the bidder does definitely have non-maximal valuation (i.e., ) and needs not to be suspected in the period and subsequent ones. For given parameters and of the PRRFES algorithm , any state of the algorithm can be mapped to the current phase and the last accepted price before the phase . Thus, we define the stopping rule: , where

$$\rho(m,\mathbf{l},\mathbf{q}) := \exists\,\hat m\in\mathbb{M}^{-m}:\ q^m + 2\epsilon_{l^m-1} < q^{\hat m}. \tag{5}$$

The div-transformation of the PRRFES algorithm with the stopping rule defined in Eq. (5) is referred to as the dividing Penalized Reject-Revising Fast Exploiting Search (divPRRFES). The pseudo-code of divPRRFES is presented in Appendix B.2 of the Supp. Materials.
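The stopping rule can be sketched in Python under the following reading of the text: a bidder stops being suspected once the upper end of his location interval falls below the last accepted price (a lower bound on the valuation) of some other bidder. The function names, the dictionary-based state, and in particular the doubly exponential schedule used for the iteration parameter in `eps_default` are assumptions made for illustration, not details confirmed by this section.

```python
from typing import Callable, Dict


def eps_default(l: int) -> float:
    """Hypothetical iteration parameter; a doubly exponential schedule is assumed."""
    return 2.0 ** (-(2 ** l))


def should_stop(
    m: int,
    phase: Dict[int, int],
    last_accepted: Dict[int, float],
    eps: Callable[[int], float] = eps_default,
) -> bool:
    """True if bidder m can be excluded from the suspected bidders."""
    # Upper end of the location interval of bidder m's valuation.
    upper_m = last_accepted[m] + 2.0 * eps(phase[m] - 1)
    # Excluded if some other bidder's lower bound already exceeds it.
    return any(upper_m < last_accepted[k] for k in last_accepted if k != m)
```

With both bidders in phase 2 and last accepted prices 0.1 and 0.9, the first bidder's interval ends at 0.1 + 2 * 0.25 = 0.6 < 0.9, so he is excluded, while the second bidder stays suspected.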

###### Lemma 3.

Let , the discounts , and the seller uses the divPRRFES pricing algorithm with the number of penalization rounds , with the exploitation rate , and with the stopping rule defined in Eq. (5). Then, for a bidder with non-maximal valuation, i.e., , his subhorizon is bounded:

$$I^m \le 24(\bar v - v^m)^{-1} + r\Big(1 + \log_2\log_2\big(4(\bar v - v^m)^{-1}\big)\Big) < (24 + 5r)(\bar v - v^m)^{-1}. \tag{6}$$

###### Proof sketch.

Let be a buyer with the maximal valuation . Note that, in any period , the location intervals and must intersect (otherwise, the stopping rule has eliminated the buyer before the period , and, hence, ). In particular, in the period , holds for either or (not exclusively) , where . From the definition of the iteration parameter , i.e.