## I Introduction

Due to the rapid growth of the internet and online markets such as Amazon, eBay, Airbnb, and Uber, online platforms now influence many of our daily decisions. In addition, online platforms play a major role in revenue management and supply-demand matching through their pricing mechanisms. For instance, when a customer searches for a specific product (e.g., a TV) in an online marketplace (e.g., Amazon.com), the platform decides which list of products to display to the customer. Similarly, when a customer books a room through Hotels.com or Orbitz.com, the platform offers a list of available rooms from different hotels based on the search data. As a result, depending on which items are offered, the online platform derives different revenues from potential buyers while meeting their demands using available items. In fact, according to the global survey [13], the total market value of online platforms is growing rapidly and exceeds 4.3 trillion dollars worldwide.

In general, one can identify two types of online platforms in terms of their operational freedom [35]. The first, known as the *full control model*, gives the platform full control over offered assortments, prices, and the underlying supply-demand matching algorithms. An example is Uber, which not only offers rides to passengers but also sets the prices based on a revenue-maximizing optimization algorithm. In the second, known as the *discriminatory control model*, the platform controls only the set of displayed items, and the prices or potential matches are determined endogenously based on customers’ choices. Airbnb is perhaps a good example of a discriminatory control model: it only displays a list of available rooms to customers, and the rental prices are determined by the competition among the property owners.

In this paper, we analyze a discriminatory control model for the marketplace problem under both online and offline settings. More precisely, in the online setting we consider a model in which each indivisible item has a certain quality and a limited inventory that is owned by a seller. As each item is owned by exactly one seller, we often refer to items and sellers interchangeably. A platform decides which set of sellers to display to each arriving buyer. This induces a competition among the displayed sellers to set their items’ prices, where the prices are given by the Nash equilibrium of the Bertrand game induced by the set of displayed sellers. Given these equilibrium prices, the buyer selects an item based on the multinomial logit (MNL) model or decides to leave the market without any purchase. We are therefore interested in devising an efficiently computable online policy for the platform that maximizes the aggregate revenue of the sellers over the entire horizon. In particular, we want our online algorithm to be competitive with respect to the best in-hindsight policy that knows the total number of buyers and the inventory levels a priori.

In the offline setting, we assume that all the buyers are already available in the market (rather than joining sequentially) and that buyers can have heterogeneous evaluations of items’ qualities. As a result, the equilibrium prices of the sellers are now given by a *generalized* Bertrand game, which is induced by a two-sided market with buyers and sellers on different sides. Here we are interested in the existence of a pure Nash equilibrium in the generalized Bertrand game. This generalizes the equilibrium existence result of [35] from the single-buyer Bertrand game to the multi-buyer Bertrand game over general bipartite graphs. In particular, we consider the problem of optimal market segmentation over the generalized Bertrand game, where the goal is to segment the entire market into smaller pools such that the aggregate revenue obtained at the equilibrium of the submarkets improves on the revenue obtained at the equilibrium of the original market.

### I-A Related Works

The design of an online marketplace with different objectives has been studied in the past literature [24, 3, 6, 26]. However, unlike prior works [6, 3] that use monotone distribution functions to capture supply and demand curves, we use the well-known multinomial logit (MNL) model [25, 12, 15] to capture buyers’ choice behavior. This, in turn, allows us to describe buyers’ demands in terms of purchase probabilities of substitutable items. Using such demand functions, the competition among displayed sellers is captured using the Bertrand game, which has been extensively studied in the economics literature [32, 33] for modeling oligopolistic competition in actual markets. Our work is perhaps most related to [36] and [35], which studied the static version of the problem that we consider here, i.e., when there is only one buyer with no inventory constraints. It was shown in [35] that the revenue function for a single-buyer one-shot market possesses a nice quasi-convex structure, which allows one to find the optimal assortment efficiently. In particular, the authors raised the question of designing an optimal dynamic mechanism, where the buyers arrive and depart, and the sellers have limited capacity for products. In this paper, we answer this question by devising a constant-factor competitive algorithm that can be implemented efficiently in real time.

This work is also closely related to dynamic online assortment [20, 14, 34, 21, 16, 29] and dynamic revenue management [30, 18, 11, 19, 9], where, broadly speaking, the goal is to dynamically choose assortments/prices in order to maximize the aggregate revenue obtained by selling items subject to inventory constraints. Such problems have been extensively studied under various settings such as the stochastic demand arrival model [30, 19, 9], the adversarial demand model where the sequence of buyers’ types can be chosen adversarially [20], and reusable items where an allocated product can be returned to the inventory after some random time [21, 16]. We refer to [8] for a comprehensive survey on recent advances in dynamic pricing and inventory management problems.

On the other hand, except for a handful of results [17, 23, 34], the effect of competition or externalities in dynamic assortment problems has not been fully addressed before. In fact, almost all the earlier results on dynamic online assortment (see, e.g., [14, 30, 20]) consider fixed-revenue items, meaning that the revenue obtained from selling an item is fixed and does not depend on the offered assortments. This makes linear programming (LP) methods such as *dual-fitting* quite amenable to designing competitive online algorithms, and such methods have been used in several earlier works [16, 20]. Unfortunately, the dependence of items’ revenues on the offered assortments creates extra externalities that make the use of conventional methods such as dual-fitting or dynamic programming formulations more complex. To handle such externalities, in this work we take a different approach and use a novel charging argument which allows us to compare the revenue obtained from our online algorithm with that of the optimal offline benchmark.

It is worth mentioning that our work is also related to online learning for LPs with packing constraints [2, 10, 22]. This is because one can use an LP with packing constraints to upper bound the revenue of the optimal offline algorithm. As a result, generating a feasible solution to such an LP in an online fashion, with an objective value that is competitive with that of the offline LP, automatically delivers a competitive online algorithm for our problem. Leveraging this idea, a typical approach is to dynamically learn/estimate the dual variables of the offline LP using some black-box online learning algorithm (e.g., the *multiplicative weight* update rule [22]), and use the estimated dual variables to guide the primal solution generated by the algorithm. However, the LP formulation in our setting has exponentially many variables that are associated with all the feasible assortments. Therefore, it is not clear, without extra effort, how this approach can be used to devise a polynomial-time competitive algorithm for our online assortment problem.

Finally, the generalized Bertrand game that we consider in this paper is an extension of the single-buyer Bertrand game given in [35]. To the best of our knowledge, Bertrand games over general bipartite graphs with MNL demands have not been addressed in the past literature. We should mention that other oligopolistic competitions, such as Cournot competition over general bipartite graphs, have been studied in [5, 1]. However, a major challenge in analyzing the generalized Bertrand game compared to the network Cournot game is the restriction of the players’ strategy spaces. More precisely, in the network Cournot game, a player can control its supply to each of its buyers separately, while in the generalized Bertrand game a player can only set one price and the demands are computed endogenously. Moreover, the existence of a pure Nash equilibrium in network Cournot games is typically established by reduction to potential games [27] or submodular games [31], a property that does not seem to hold in our generalized Bertrand game.

### I-B Organization

The rest of the paper is organized as follows. In Section II, we formally define the online assortment problem. In Section III, we provide some preliminary results and establish several lemmas for our main analysis. In Section IV, we establish our main result for the online setting, namely a constant-competitive algorithm for the online assortment problem. In Section V, we introduce the offline Bertrand game over general bipartite networks, establish the existence of its pure Nash equilibria, and provide an approximation algorithm for its optimal segmentation. We conclude the paper by identifying some future directions of research in Section VI.

## II Problem Formulation

In this section, we introduce the online assortment problem and postpone the offline setting to Section V. We consider an online assortment problem with sellers, where the th seller holds item with an initial inventory . As each item is assigned to exactly one seller, throughout this paper we use the terms sellers and items interchangeably. Each item has a fixed quality , and without loss of generality, we assume that the items are sorted according to their qualities, i.e., . We also define a “no-purchase” item with to represent the case where a buyer decides to leave the market without any purchase.

We consider a sequence of *homogeneous* buyers that arrive at discrete time instances , for some unknown integer . As in [35] and [26], we adopt the random utility model for purchase probability of the buyers. In the random utility model the buyer has a private random preference about the th item. Given a vector of nonnegative prices on the items,^{1} the buyer derives utility from purchasing item . Therefore, the buyer purchases item with the highest utility, i.e., , where ties are broken arbitrarily. In particular, under the assumption that are i.i.d. random variables with Gumbel distribution, we obtain the well-known multinomial logit (MNL) purchase probabilities [25, 12, 15]. Thus, for every we have

(1)

^{1} By convention, we assume that the price of is normalized to .

The probability can also be viewed as the *expected demand* of a buyer for item given posted prices .
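As a concrete illustration of the MNL purchase probabilities in (1), the following minimal Python sketch assumes the standard form in which a buyer's deterministic utility for an item is its quality minus its price, and the no-purchase option has quality and price normalized to zero. The function name `mnl_probabilities` and its argument names are hypothetical and chosen only for this example.

```python
import math

def mnl_probabilities(qualities, prices):
    """Return (purchase probabilities, no-purchase probability) under MNL.

    Assumption: the deterministic utility of item i is
    qualities[i] - prices[i], and the no-purchase option has utility 0.
    """
    weights = [math.exp(q - p) for q, p in zip(qualities, prices)]
    denom = 1.0 + sum(weights)  # the 1.0 is exp(0) for the no-purchase option
    return [w / denom for w in weights], 1.0 / denom

# Two items whose prices exactly offset their qualities.
probs, no_purchase = mnl_probabilities([1.0, 0.5], [1.0, 0.5])
print(probs, no_purchase)
```

In this usage example every option has zero net utility, so each of the two items and the no-purchase option is chosen with probability 1/3.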

In this paper, we consider a discriminatory control model where, upon arrival of a buyer, the online platform can only decide which subset of items to display to the buyer. The platform has no control over the posted prices, which are determined endogenously by the competition among the displayed sellers. To capture the competition among the sellers, we use the Bertrand competition game [35, 33, 32], where the seller of each item sets a price to maximize its own revenue. More precisely, let be the set of items that are displayed by the online platform to a specific buyer. Consider a Bertrand game between the sellers (players) in , where an action for seller is to set a price on its item. Given the profile of prices , the revenue of seller is given by the expected amount of item that is sold based on (1) when restricted to the set (rather than ), multiplied by the price of that item, i.e., . Assuming that in the induced Bertrand game each seller maximizes its own revenue, the *unique* pure Nash equilibrium prices can be computed in closed form as [36, Theorems 1 & 2]:

(2) |

where is the MNL demand probability (1) computed at the equilibrium prices . Substituting these probabilities from (1) into (2) and solving for equilibrium prices , one can obtain a closed-form solution for the equilibrium prices and the equilibrium demands in terms of “no-purchase” probability as [36, Theorem 2]:

(3) |

Here is a strictly increasing and concave function given by the unique solution of , and is given by the unique solution of the equation

(4) |

Thus, by offering assortment , the expected revenue that seller derives at the Nash equilibrium equals

(5) |

where is given by (3). Finally, we define

to be the total expected revenue obtained by offering the assortment . It is worth noting that from (5), the revenue of an item depends on the assortment that is offered. This makes the analysis of assortment dynamics much more complicated than the case where each item has a *fixed* revenue , regardless of the offered assortment . In particular, the equilibrium revenues (5) create externalities, as the revenue of an item also depends on the items that are offered in the same assortment as .
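To make the set dependence of the equilibrium revenues concrete, here is a minimal numerical sketch. It assumes the standard first-order condition for MNL Bertrand competition with zero marginal cost, under which a displayed seller with equilibrium purchase probability q posts the price 1/(1 - q), and the no-purchase probability is found from a scalar fixed-point equation in the spirit of (4). The exact functional forms below, and the names `V` and `equilibrium`, are assumptions made for illustration and should be checked against (2)-(5) and [36, Theorem 2].

```python
import math

def V(x, tol=1e-13):
    """Solve v * exp(1 / (1 - v)) = x for v in [0, 1) by bisection.

    Assumed form of the increasing function in the equilibrium demands.
    The comparison is done in logs to avoid overflow when v is close to 1.
    """
    if x <= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.log(mid) + 1.0 / (1.0 - mid) < math.log(x):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def equilibrium(qualities, tol=1e-13):
    """Equilibrium of the Bertrand game among the displayed items.

    Returns (no-purchase probability, shares, prices, revenues).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        q0 = 0.5 * (lo + hi)
        # Total probability mass implied by the candidate no-purchase value.
        total = q0 + sum(V(q0 * math.exp(t)) for t in qualities)
        if total < 1.0:
            lo = q0
        else:
            hi = q0
    q0 = 0.5 * (lo + hi)
    shares = [V(q0 * math.exp(t)) for t in qualities]
    prices = [1.0 / (1.0 - q) for q in shares]
    revenues = [p * q for p, q in zip(prices, shares)]
    return q0, shares, prices, revenues

alone = equilibrium([1.0])
pair = equilibrium([1.0, 0.5])
print(alone[3][0], pair[3][0])
```

In this sketch the revenue of the quality-1 item is lower when the second item is displayed alongside it, which is the externality described above.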

###### Remark 1

Unlike [35] that assumes , we allow . This extension captures more realistic markets where a negative (low) quality item captures the case that a buyer strongly prefers to leave the market rather than purchasing that item. In fact, for nonnegative qualities, using (3) and (4) one can show that offering any item alone still forces a buyer to purchase it with a large probability of , which may not always be true.

The online optimization problem that is faced by the platform is to select a sequence of assortments subject to inventory constraints so as to maximize the aggregate expected revenue given by

(6) |

where are random variables denoting the assortments offered by the platform at times . Here, the offered assortments must satisfy the inventory constraints, meaning that an item can be included in at time only if it is still available at that time, i.e., if its units are not fully sold to buyers . It is worth noting that the available inventory of an item at a time depends not only on the assortments offered by the platform up to that time but also on the buyers’ choice realizations up to time . Therefore, an online algorithm must satisfy the inventory constraints for any realized sample path of buyers’ choices.

Finally, we note that the platform can use either a deterministic or a randomized online algorithm, where by an *online* algorithm we refer to an algorithm that does not know the total number of buyers and the initial inventory of the sellers , nor does it have access to the random choices realized by buyers. In other words, the only information that an online algorithm has access to before it makes a decision at time is the quality of items and whether or not an item is available at that time. On the other hand, an *offline* algorithm knows all the parameters , and a priori, but not the realization of the buyers’ random choices. To evaluate the performance of an online algorithm, we use the notion of competitive ratio as defined next.

###### Definition 1

Given , an online algorithm for the dynamic assortment problem is called -competitive if for every instance it achieves in expectation an -fraction of the total revenue obtained by any offline algorithm that knows the number of buyers and inventory levels a priori.

## III Preliminary Results

In this section, we state and prove several preliminary results which will be used later to establish our main results. To devise a competitive algorithm, we first derive an upper bound for the expected revenue that any offline algorithm can obtain and use it as a benchmark against which to compare the performance of our online algorithm. This can be done by writing an LP whose optimal objective value is no less than the expected revenue of any feasible offline algorithm. This step is fairly standard in devising competitive algorithms and has been used in particular in [20, 14], and [16] for online assortment problems. To this end, let us assume that the number of buyers and inventory levels are known and consider an LP whose optimal objective value (OPT) is given by

(7) | ||||

s.t. | (8) | |||

(9) |

Here is the column vector of initial inventories, and is the column vector of equilibrium purchasing probabilities, where is obtained from (3) and is the equilibrium demand for item given that assortment is offered. Moreover, for each , the variable vector belongs to the probability simplex , where can be viewed as the probability that the algorithm offers assortment to the buyer at time . Thus, the first constraints in (7) capture the inventory constraints written in vector form. We note that in the online assortment problem we restrict our attention to identical buyers such that and do not depend on . This is because if can also depend on the type of buyers, then, as shown in the following example, no online algorithm can achieve a constant competitive ratio of . We shall address a market with heterogeneous buyers in the offline setting in Section V.

###### Example 1

Consider a single item with unit inventory and an adversary who first selects the number of buyers and their quality evaluations (types) and then reveals them sequentially over time. First, we note that any competitive online algorithm must offer the item to the first arriving buyer. Otherwise, for instances that the adversary selects and , the revenue obtained by the online algorithm is while the expected revenue of the optimal offline algorithm that knows is . This implies that the competitive ratio is . Similarly, any competitive online algorithm must offer the item to the second buyer given that it was not purchased by the first buyer. Otherwise, for all the instances that the adversary chooses and , the competitive ratio of the online algorithm is at most

Thus for large the competitive ratio is arbitrarily close to . By repeating the same argument, one can see that a competitive algorithm must offer the item to any arriving buyer given that it is not sold in previous steps. Now for all the instances that the adversary chooses a large and selects , we know that the online algorithm will sell the item up to time with probability at least (for very small ) and achieves an expected revenue of at most . On the other hand, an offline algorithm that offers the item only to the last buyer achieves an expected revenue of . Therefore, the competitive ratio of the algorithm over such instances is at most , as .

Next, we show that any online algorithm generates a feasible solution to the LP (7) with an objective value that is equal to the expected revenue of that algorithm. To see this, let be random variables denoting the assortments offered by the algorithm at time steps . Moreover, let be a random variable denoting the item that is purchased by the buyer at time . Then, for every realized sample path , , where is the indicator function. Taking the expectation of this inequality over all sample paths, we can write

(10) |

Thus, setting forms a feasible solution to the LP (7) whose objective value is equal to the expected revenue of the algorithm, that is . This shows that OPT provides an upper bound for the revenue obtained by any offline algorithm.
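For reference, the benchmark LP and the feasibility argument above can be written out as follows. The symbols used here ($m$ for the number of buyers, $c$ for the inventory vector, $R(S)$ and $q(S)$ for the equilibrium revenue and demand vector of assortment $S$, $y^t_S$ for the LP variables, $\Delta$ for the probability simplex over assortments, and $S_t$, $X_t$ for the offered assortment and the purchased item at time $t$) are notation assumed for this sketch and may differ from the notation used elsewhere in the text.

```latex
\begin{align*}
\mathrm{OPT} \;=\; \max_{y}\quad & \sum_{t=1}^{m} \sum_{S} R(S)\, y^{t}_{S} \\
\text{s.t.}\quad & \sum_{t=1}^{m} \sum_{S} q(S)\, y^{t}_{S} \;\le\; c, \\
& y^{t} \in \Delta \quad \text{for all } t = 1,\dots,m .
\end{align*}
% Feasibility of an algorithm's offer probabilities:
% on every sample path, item i is sold at most c_i times, and
% conditioned on S_t = S the buyer purchases item i with probability q_i(S).
\begin{align*}
\sum_{t=1}^{m} \mathbf{1}\{X_t = i\} \le c_i
\;\;\Longrightarrow\;\;
\sum_{t=1}^{m} \sum_{S} \Pr[S_t = S]\, q_i(S) \le c_i ,
\end{align*}
% so the choice y^t_S = Pr[S_t = S] satisfies the inventory constraints,
% and its objective value is the expected revenue of the algorithm.
```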

In the rest of this section, we prove several useful lemmas for our later analysis. We start with the following so-called *substitutability* property.

###### Lemma 1

For any assortment and two items , we have and .

First note that offering more items reduces the probability of not purchasing any item at the equilibrium. This is because if denotes the no-purchase probability at the equilibrium, then , where is a strictly increasing function. Now if we offer a larger assortment , we must have . Otherwise, offering can only increase the right-hand side of the former equality while decreasing its left-hand side, implying . This contradiction shows that . Now by monotonicity of and using (3) we have,

In other words, offering more items in the assortment reduces the market share for the existing ones. This also implies that the revenue derived from an item if it is offered in a larger set is less than when it is offered in a smaller set, as

###### Definition 2

An item is called *heavy* if offering it alone obtains at least half of the market share, i.e., . An item is *heavier* than item if . We denote the set of heavy items, if any, by , and refer to any other item as a *light* item.

Note that, having access to the items’ qualities, an online algorithm can use (3) and (4) to easily compute the expected demands a priori. Therefore, we may assume without loss of generality that an online algorithm knows the heaviness of all the items. It is worth noting that the choice of in Definition 2 can be replaced by any other constant in , which then leads to a different competitive ratio for our devised online algorithm. However, to keep the analysis simple, in this paper we set the threshold to and do not seek to optimize the competitive ratio over different thresholds.
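As an illustration of how heaviness can be computed from qualities alone, the sketch below solves for the equilibrium market share of an item offered by itself and compares it with the one-half market share threshold of Definition 2. It assumes the standard MNL Bertrand equilibrium with zero marginal cost (price 1/(1 - q) at share q), under which the singleton share q solves q·exp(1/(1 - q)) = (1 - q)·exp(quality). This form and the function names `singleton_share` and `is_heavy` are assumptions made for illustration.

```python
import math

def singleton_share(quality, tol=1e-13):
    """Equilibrium purchase probability of an item that is offered alone.

    Assumption: the share q solves
        log(q) + 1 / (1 - q) - log(1 - q) = quality,
    whose left-hand side is strictly increasing in q on (0, 1).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lhs = math.log(mid) + 1.0 / (1.0 - mid) - math.log(1.0 - mid)
        if lhs < quality:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def is_heavy(quality, threshold=0.5):
    """Heavy: the item alone captures at least `threshold` of the market."""
    return singleton_share(quality) >= threshold

for theta in (0.0, 1.0, 3.0):
    print(theta, round(singleton_share(theta), 4), is_heavy(theta))
```

Under the assumed form, the share equals 1/2 exactly when the quality equals 2, since both sides of the defining equation then reduce to 2. In this sketch, heaviness therefore amounts to a simple threshold test on the quality, up to numerical tolerance.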

###### Remark 2

As the items are sorted according to their qualities , using (3) and monotonicity of one can see that the same order must hold on the heaviness of the items, i.e., .

###### Lemma 2

Given an assortment , let be a heavy item that is at least as heavy as any other item in . Then offering alone obtains at least half of the expected revenue of offering , i.e., .

As is a heavy item, . Now let us first assume , meaning that is the heaviest item in .

Case I: If , by , we have . Thus , and

(11) | ||||

(12) |

where the third inequality holds because , and the last inequality holds because by the substitutability property (Lemma 1).

Case II: If , since is the heaviest item in , this means that for every other , . As a result,

(13) | ||||

(14) |

where the last inequality holds because .

Finally, if , then either does not contain any heavy item in which case , and the same chain of inequalities in (13) holds, or contains at least one heavy item. In the latter case, let be the heaviest item in . Now using the first part of the proof . Since by the assumption is heavier than , , implying .

The next lemma shows that in the absence of heavy items, a greedy algorithm achieves a constant competitive ratio.

###### Lemma 3

Assume that every item in the online assortment problem is light. Then an online algorithm that at each time offers all the available items is at least -competitive.

Consider a *virtual* online assortment problem with identical initial inventory and purchase probabilities as in the original online assortment problem, except that the revenue of an item is now *fixed* and equals some constant . Using a similar argument as in Section III, the optimal offline revenue of the virtual problem is upper bounded by the optimal value of the following LP,

(15) | ||||

s.t. | (16) | |||

(17) |

It has been shown in [20, Example 1] that a greedy online algorithm that at time offers the maximizing assortment

(18) |

is -competitive with respect to OPT, where denotes the set of available items at time . Since both the original and virtual assortment problems (as well as their offline LP benchmarks (7) and (15)) share the same inventory constraints, any feasible online algorithm for one is also feasible for the other one. The only difference is in their expected objective revenues given by and , respectively. From Lemma 1 we know that for any assortment , . Thus without heavy items . Denoting the expected revenue of the greedy algorithm on the original and virtual problems by and , respectively, a simple coupling shows that

(19) |

where and denote -dimensional column vectors with all entries being 1 and 2, respectively. Finally, at each time the greedy rule (18) with offers the assortment

where by (4) the right-hand side is maximized if .

## IV A Competitive Algorithm for Online Assortment Problem

In this section, we first describe an online assortment algorithm and then prove its performance guarantee. The proposed algorithm is very simple and requires only sorting the items, with an overall computation of . The algorithm consists of two phases. In the first phase we take care of the heavy items (if any) by offering them alone until either they are fully sold or no buyer is left. We then take care of the light items by offering them all together in larger bundles. A formal description of this hybrid algorithm, which we refer to as “Alg”, is summarized below.
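The two phases described above can be sketched in a few lines of Python. This is an illustrative simulation rather than a verbatim transcription of the algorithm: it assumes that heavy items are offered one at a time in the order supplied by the caller, that the heavy/light classification is given as input, and that a hypothetical callable `shares(assortment)` returns the equilibrium purchase probability of each displayed item.

```python
import random

def run_two_phase(heavy, light, inventory, num_buyers, shares, rng):
    """Simulate the two-phase policy on one sample path.

    heavy, light : lists of item ids (heavy items in their offering order)
    inventory    : dict mapping item id -> initial number of units
    shares       : hypothetical callable, tuple of displayed items ->
                   dict of purchase probabilities for those items
    rng          : random.Random instance driving the buyers' choices
    Returns (log of (assortment, purchased item or None), final stock).
    """
    stock = dict(inventory)
    log = []
    for _ in range(num_buyers):
        available_heavy = [i for i in heavy if stock[i] > 0]
        if available_heavy:
            # Phase 1: offer a single heavy item alone.
            assortment = (available_heavy[0],)
        else:
            # Phase 2: offer all light items that are still in stock.
            assortment = tuple(i for i in light if stock[i] > 0)
        choice = None
        if assortment:
            probs = shares(assortment)
            u, acc = rng.random(), 0.0
            for i in assortment:
                acc += probs[i]
                if u < acc:
                    choice = i
                    break
        if choice is not None:
            stock[choice] -= 1
        log.append((assortment, choice))
    return log, stock

# Deterministic toy demand: the first displayed item is always purchased.
always_first = lambda S: {i: (1.0 if i == S[0] else 0.0) for i in S}
log, stock = run_two_phase([0], [1, 2], {0: 2, 1: 1, 2: 1}, 4,
                           always_first, random.Random(0))
print(log)
```

With the toy demand above, the first two buyers are shown the heavy item alone until its two units are sold, and the remaining buyers are shown the light items that are still in stock. Note that the policy only inspects current availability, so it needs neither the number of buyers nor the initial inventory levels in advance; `num_buyers` is used here only to drive the simulation.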

Note that since an online algorithm does not know the number of buyers and the initial inventory levels ahead of time, a competitive online algorithm must carefully balance a trade-off between two scenarios: 1) increasing the chance of selling items by offering larger assortments (hence reducing prices) when there are few buyers and many items, and 2) reducing the competition among the sellers by offering smaller assortments (hence increasing prices) when there are many buyers and few items. In fact, balancing these two cases is the main reason behind the choice of each phase in Algorithm 1. It is worth noting that Algorithm 1 can be implemented in a fully online fashion and does not need to know anything about the number of buyers and the initial inventory levels . At each time it only needs to know whether there is an arriving buyer and whether an item is still available at that time. Finally, we note that due to the random realization of buyers’ choices, the time at which Phase 1 in Algorithm 1 terminates is a random variable , which can be at most . The following lemma shows that for all sample paths on which Algorithm 1 has a chance to enter its second phase (i.e., ), the expected revenue obtained during Phase 2 is within a constant factor of the total revenue that light items contribute to the offline benchmark (7).

###### Lemma 4

Let be any sample path of length that may be realized during Phase 1 of Algorithm 1. Conditioned on , the expected revenue obtained in Phase 2 is at least -fraction of the total revenue that light items contribute to OPT after time .

Let be the optimal offline solution to LP (7) over . The total contribution of light items to the revenue of OPT after time is

Now let us consider

(20) | ||||

s.t. | (21) | |||

(22) |

and notice that (20) is precisely the LP relaxation upper bound for any online algorithm that can be used during Phase 2 with constant revenues . On the other hand, from the second inequality in (19) (Lemma 3), we know that the greedy algorithm used during Phase 2 obtains at least -fraction of the optimal value in (20). Thus, if there exists a feasible solution to (20) with an objective value of at least , then we can conclude that the revenue obtained during Phase 2 is at least , completing the proof. Therefore, in the rest of the proof we proceed to construct a feasible solution to (20) with an objective value of at least .

Let us fix an arbitrary and for simplicity we drop the time superscript . Define , for , and consider the following auxiliary LP:

(23) | ||||

s.t. | (24) | |||

(25) |

Note that (23) is a feasible LP. This is because for any and using substitutability property we have,

(26) |

and hence is feasible to (23).

Next we show that the optimal value of (23) is zero. To derive a contradiction, let be an optimal solution to (23) with strictly positive objective value and let be an item for which . This means that there exists such that and . Note that must contain at least another item other than . Otherwise if , reducing by a small positive amount and increasing by the same amount will give us another feasible solution to (23) with strictly smaller objective value, contradicting the optimality of . Now let us partition the items in into and , where

(27) |

By definition of , this means that there exists a positive number such that reducing to , i.e., setting (by abuse of notation), will preserve the feasibility of all the constraints associated with items . Note that such a change can only affect the feasibility of the constraints and has no influence on constraints (as ). Unfortunately, this update violates the constraints such that in the new solution . However, we will show that one can sequentially redistribute the -mass that was removed from to nested subsets of and again satisfy all the constraints in with equality.

Let , and note that by the substitutability property and since , we have . Therefore, by returning amount from the -mass to , i.e., setting , each constraint in becomes “more” feasible, with at least one constraint (namely the one that achieves ) satisfied with equality. Let be all the items in that are not satisfied with equality after this update, i.e.,

and define . Again by substitutability and since , we have

and thus . Therefore, by relocating amount of the leftover mass to by setting , every constraint in becomes more feasible, with at least one constraint being tight. Repeating this argument inductively, one can see that due to the substitutability property we always have enough leftover mass to make one more constraint in tight, so that at the end of this process all the constraints in are satisfied with equality. Finally, denoting the leftover mass at the end of this process by , we can relocate it to the empty set by setting . This last step does not affect the feasibility of any constraint and only guarantees that mass conservation is preserved so that . Thus, at the end of this process we obtain a feasible solution to (23) with a strictly smaller objective value than the initial optimal solution , a contradiction. Therefore, the optimal value of (23) is zero and there exists such that

As the above argument holds for any , we obtain a feasible solution to (20) that consumes the exact same amount from each resource that is consumed by the optimal offline solution . In particular, the objective value of (20) for equals

(28) |

where the inequality holds by the fact that for every light item , . This completes the proof.

###### Definition 3

We let denote the set of all sample paths that can be realized during the execution of Phase 1 of Algorithm 1. We define to include sample paths for which all the heavy items are fully sold during Phase 1, and to be all sample paths for which the algorithm does not even get a chance to enter its second phase.

###### Lemma 5

Let denote the expected revenue of Algorithm 1 over all the sample paths in . Then, we have .

Let be a random variable denoting the first time that item is fully sold during Phase 1 of the algorithm, where we note that . Then . We first derive a lower bound for . Given an arbitrary sample path , the revenue obtained during Phase 1 equals . This is because over this sample path all the units of item are fully sold at an equilibrium price of (as heavy items are offered individually). Thus,

(29) |

Now given any sample path , using Lemma 2, we can upper bound the revenue of OPT up to time as

(30) |

Also, the total contribution of item to the value of OPT is:

(31) |

where the first inequality is by the substitutability property, and the second inequality is because is a feasible solution to (7). As we have accounted for the total contribution of item to OPT, we can safely remove item from all the assortments offered by OPT and upper bound the remaining revenue of OPT over as

(32) | ||||

(33) | ||||
