1. Introduction
Members of a society often leverage the experiences of others to inform their own choices. A high school student contemplating colleges might ask former high school students about their decision-making processes: what schools were considered, why the chosen one was selected over the others, and so on. A tourist contemplating restaurants might read reviews of former diners on review websites: what was the overall impression of the restaurant, what aspects contributed to this impression, etc. This sort of communal learning is highly valuable. By listening to experiences from the past, society can avoid repeating the mistakes of the past. However, this benefit comes at a potentially significant cost. As each individual optimizes her own choice, she might naturally select a tried-and-true option over an unknown one. As a result, unknown options, which could have great potential, are never discovered.
The classic field of social learning, initiated by (Banerjee; Bikhchandrietal), models this problem in the following way. Agents arrive sequentially and choose an action (e.g., search strategy) that results in some payoff based on the state of the world (e.g., the values of the items). Each agent receives a private signal correlated with the state of the world (e.g., priors on item values) and also observes chosen actions of past agents. The literature derives conditions under which the society may "herd" on a suboptimal action, ignoring private signals and instead exploiting information gleaned from the history of actions.
The nascent field of incentivizing exploration tries to use information design or mechanism design to prevent the exploitation of historical information. In this line of work, there is a principal (the review website in the case of restaurants, for instance), a sequence of agents (tourists in the restaurant example), and a set of options (the restaurants themselves). Each option has an inherent value, which is common to all agents, and the goal is to discover the highest-value option as quickly as possible. The principal either explicitly pays agents to try new options (FrazierKKK14), or sends signals that cleverly obfuscate the past experiences in a way that convinces agents to try new options (KremerMP13; MansourSS15). While these approaches successfully mitigate the dangers of learning from the past, they require the existence of a principal to coordinate society's search.
In this paper, we study decentralized exploration. Our central hypothesis is that societies perform sufficient search, even without a coordinating principal, when preferences are sufficiently diverse. Intuitively, this diversity should be 1) high enough that agents occasionally explore even when past agents had great experiences, and 2) low enough that there is still value in observing the past. In that case, society can perhaps learn to select good options, both by learning from each individual agent as she investigates her options before making a choice, and by learning from the sequence of agents who make different choices than their peers. We ask: is there a simple model and a large range of parameters in which we can observe this effect?
We explore our hypothesis in the context of a standard model of consumer search, iterated over time. In a single round of consumer search, an agent is presented with a set of items. The agent knows the distribution of her value for each item and may pay a fixed cost to learn its instantiation. The resulting utility of the agent is her value for the item she selects minus the costs she paid to learn her values. For example, a high school student contemplating colleges might know her approximate value for each and, based on this information, pick a subset to visit prior to selecting one. In our iterated version of this model, what an agent knows about her values depends on what options previous agents considered. More precisely, before performing her search, each agent updates her priors based on the values observed by previous agents during their searches. In the college choice problem, this corresponds to a high school student listening to the trip reports of students in prior years before selecting which subset of schools to visit.
The degree to which historical information impacts an agent's search depends on the amount of correlation, or diversity, in the agents' preferences. In order to study the impact of diversity on average utility, we introduce a model in which each option has a common value component, drawn IID from a zero-mean Gaussian with variance 1 − σ², and each agent/option pair has a private value component, also drawn IID from a zero-mean Gaussian with variance σ². Thus the total value of each option for each agent, the sum of the common and private value components, is also a zero-mean Gaussian with a fixed variance of 1. The parameter σ can be thought of as the diversity level. For σ = 0, all the value of an option comes from the common value component, whereas for σ = 1, all the value comes from the private value component. Since the distribution of option values remains constant as we change the diversity level, we are able to compare the average value obtained by societies of different diversity levels.
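As a concrete illustration, this value model can be simulated directly. The following sketch is our own (the function name and parameters are illustrative, not from the paper): it samples common and private components for a given diversity level σ.

```python
import numpy as np

def sample_values(n_items, n_agents, sigma, seed=None):
    """Sample agents' values under the diversity model:
    each item has a common quality ~ N(0, 1 - sigma^2), shared by all agents,
    and each agent/item pair has a private score ~ N(0, sigma^2).
    Returns (qualities, values) where values[t, i] is agent t's value for item i;
    every entry of values is marginally N(0, 1) regardless of sigma."""
    rng = np.random.default_rng(seed)
    qualities = rng.normal(0.0, np.sqrt(1.0 - sigma**2), size=n_items)
    private = rng.normal(0.0, sigma, size=(n_agents, n_items))
    return qualities, qualities[None, :] + private
```

At σ = 0 all rows of `values` coincide (perfectly correlated preferences); at σ = 1 they are independent.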
Our theoretical and simulation results show that diversity helps in this simple model. In particular, intermediate levels of diversity always yield higher average utilities than the extremes of zero diversity (σ = 0), in which preferences are perfectly correlated, or total diversity (σ = 1), in which preferences are independent across agents. We further investigate the comparison between intermediate levels of diversity. Our main results, Theorems 4.3 and 4.4, together with simulations, identify three distinct behaviors: For very short time horizons, less diverse societies outperform more diverse ones, as the benefits of the added exploration have yet to take effect. For very long time horizons, less diverse societies again outperform more diverse ones, as even the less diverse societies eventually perform sufficient exploration. But for a very large range of intermediate time horizons, more diverse societies outperform less diverse ones. This is because the less diverse societies tend to exploit historical information at the expense of exploration, similar to the phenomenon of herding in the social learning literature.
In addition to the aforementioned results, in Section 5 we extend our results to a more general setting that allows the quality of an item to take a very high value with tiny probability. This setting models situations where, even though the society might be very diverse and agents have almost no common value, in the unlikely event that some clearly superior option appears, they all agree that its value is high. We use this model to capture settings where the item values are not concentrated, as Gaussians are, but have heavier probability tails. These arise in many natural situations where breakthroughs with significantly high value occur and the whole society agrees that they are valuable. We show that in such settings, high diversity is extremely beneficial for the social good, as it increases incentives for exploration and breakthroughs occur faster. The literature on incentivizing exploration (FrazierKKK14; HanKQ15) usually refers to these settings as "diamonds in the rough" and often uses them to provide lower bounds. In contrast, we show that diverse societies excel in these settings.
Techniques
To establish our results, we need to describe the optimal behavior of agents and bound the society's average expected utility. As we argue, the optimal behavior of the agents is characterized in (Weitzman79). As shown in that paper, agents define an index for each item and then search items in decreasing order of index. Unfortunately, this index is defined implicitly, making it difficult to explicitly determine the utilities obtained by the agents. A crucial result that allows us to perform our analyses is that the average utility is approximately equal to the maximum quality of an item that agents will have explored by the end of the time horizon. This result enables us to focus on bounding the maximum quality of explored items. To provide an upper bound on this value, we consider an alternative, simpler scenario where agents, instead of following the optimal policy of computing indices for the inferred value distributions, use a plain threshold stopping rule. We show that there exists a coupling of the two processes such that the simpler process always explores more items, and we then proceed to bound the highest quality achievable when every agent uses the simple threshold stopping rule. We also give a matching lower bound by bounding the number of items that agents with a given diversity level could explore and then bounding the expectation of the maximum value among that many normally distributed items.
We show that our bounds are asymptotically tight. This tightness is required for comparisons across diversity levels: even a constant-factor approximation of the expected average utility would not suffice to show a clear separation, since average utilities at different diversity levels can be multiplicatively very close. To prove these sharp bounds, we rely on strong concentration results for Gaussian distributions.
2. Related Work
Our work is closely related to the line of work on Bayesian social learning. The classic results of (Banerjee; Bikhchandrietal) show that in social learning with homogeneous payoffs, agents may ignore their private signals in favor of information they infer from previous agents. When this happens, society herds on suboptimal actions. A recent strand of literature, including (Acemoglu11; Acemoglu17; Goereeetal; Lobel15a; Lobel15b), explores the impact of heterogeneous preferences on social learning, generally observing that heterogeneous preferences help learning when the society is well-connected. The primary difference between these works and our paper is the structure of the actions and information sets. With the exception of (Goereeetal), these works assume that the action of each agent is binary and the state of the world is binary. In contrast, in our paper, the action of each agent is a search strategy and the state of the world (the qualities of items) is sampled from independent Gaussian distributions. This choice allows us to study the impact of diversity in a specific setting of interest, as motivated by the literature on consumer search initiated by (Weitzman79). As for the dispersion of information, (Acemoglu11; Lobel15a; Lobel15b) assume agents observe a subset of past actions based on the social network, and (Acemoglu17; Goereeetal) assume agents observe the full history or summary statistics of the reviews of past agents, where a review is a discrete-valued function of an agent's utility. In our paper, we assume agents observe the actions of all previous agents and the qualities/values of items explored by all previous agents.
Incentivizing exploration has been studied in the multi-armed bandit setting. In this line of research, myopic agents arrive sequentially and each pulls one arm. As each agent only wants to maximize her own reward, sufficient exploration might not occur without a principal coordinating it. (KremerMP13; MansourSS15) study the setting where the outcomes of actions are observed only by the principal, who convinces myopic agents to explore by sending signals containing partial information about past outcomes. (FrazierKKK14) considers a different setting where the myopic agents observe all previous outcomes and the principal is allowed to use monetary transfers to incentivize exploration. A great amount of follow-up work has been done in various settings, including (CheH15; HanKQ15; MansourSSW16; KannanKMPRVW17; Papanastasiou18).
There is an emerging line of work studying "greedy algorithms" in multi-armed bandits (bastani2017exploiting; kannan2018smoothed; externalitiescolt18). Greedy algorithms can be viewed as a form of social learning, and these papers show that near-optimal performance can be achieved under certain diversity conditions.
(SR17) considers myopic agents with diverse preferences in the multi-armed bandit setting. They show that if subjective preferences are known to agents beforehand, myopic agents will explore all the arms even without a principal coordinating exploration. They consider a setting of qualities and subjective preferences such that the difference between any two qualities is smaller than 1 and the subjective preferences take random binary values. For each arm, this setting guarantees the existence of agents who prefer the arm no matter what its quality is. Exploration is therefore guaranteed in the long run.
(AnalytisSGM17) provides simulation results (without theoretical guarantees) showing that a certain amount of preference diversity can lead to an increase in average utility in a model closely related to ours. The main difference from our model is that they assume each agent searches items in order of the items' popularity and stops searching once an item's value exceeds some threshold. In contrast, we assume each agent performs the optimal search.
3. Model
We consider a setting with infinitely many items of unknown value. (This assumption allows us to avoid edge cases in our analyses; our results carry over to settings with finitely many items so long as there are sufficiently many of them, in particular, more than the logarithm of the time horizon.) Agents arrive sequentially in rounds. The value an agent has for an item is the sum of two terms: an objective score for the item and an individual subjective score. The objective scores are the same for all agents and are drawn from a Gaussian with mean zero and variance 1 − σ² for some σ ∈ [0, 1]. We refer to this score as the quality of the item. The subjective scores vary among agents and are drawn independently from a Gaussian with mean zero and variance σ². The total value of an item is thus distributed according to a Gaussian with mean zero and variance one.
Definition 3.1 (Diversity Level).
We define the diversity of a population of agents to be the parameter σ (the standard deviation of the subjective score).
When the diversity is 0, all the value of an item is derived from its objective score, and so agents' values are perfectly correlated. When the diversity is 1, all the value of an item is derived from its subjective score, and so agents' values are independent. Intermediate diversities interpolate between these two extremes.
Initially, an agent only knows a prior over her value for each item, which is a function of the initial Gaussian priors and her observations from past agents' searches. The agent may learn the objective and subjective scores of an item, and hence its value, by paying a search cost c, a fixed constant chosen small enough that agents are incentivized to explore at least one item. A search strategy of an agent is a mapping from observations to either a choice from among previously explored items, or an item to explore next. The utility of the agent for a given search strategy and priors is her expected value for the chosen item minus her total expected search cost, i.e., c times the expected number of items she explores. The chosen item must be an explored item.
The optimal search strategy for an agent in this setting was characterized by (Weitzman79) in a slightly more general setting called the Pandora's box problem. This strategy assigns a score, called the Weitzman index, to every item and then explores items with positive indices in decreasing order of their indices until the maximum observed value exceeds the next item's index.
Definition 3.2 (Weitzman Index).
The Weitzman index I of an item with value distribution F is defined as the solution to c = E[max(v − I, 0)], where v ∼ F.
We assume that all agents use the optimal strategy and explore items in decreasing order of Weitzman index. We write I₀ for the Weitzman index of an item that has not been explored previously, i.e., the solution to c = E[max(v − I₀, 0)] with v ∼ N(0, 1). The choice of c guarantees that I₀ > 0, and thus every agent will explore at least one item.
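Since the index is defined only implicitly, it is natural to compute it numerically. The sketch below is ours, for illustration: it solves the index equation for a Gaussian prior N(mu, s²) by bisection, using the closed form E[(v − I)⁺] = (mu − I)·Φ((mu − I)/s) + s·φ((mu − I)/s).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def expected_excess(mu, s, idx):
    """E[(v - idx)^+] for v ~ N(mu, s^2), s > 0 (closed form)."""
    d = (mu - idx) / s
    return (mu - idx) * Phi(d) + s * phi(d)

def weitzman_index(mu, s, c, tol=1e-10):
    """Solve c = E[(v - I)^+] for I by bisection; the excess is decreasing in I."""
    lo, hi = mu - 40 * s, mu + 40 * s
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if expected_excess(mu, s, mid) > c:
            lo = mid  # excess still above the cost: the index lies higher
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For a standard Gaussian prior and a small search cost the index is positive, so every agent inspects at least one item; e.g., `weitzman_index(0, 1, 0.1)` is roughly 0.9.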
Once an item is explored (but not necessarily chosen) by some agent, agents in subsequent rounds obtain information about that item. There are multiple choices we can make about exactly what information agents learn; the right choice is dictated by the setting and whether agents can infer each other's subjective biases. In this paper, we focus on a "revealed quality" model, where, after an item is explored by some agent, its quality is revealed to all agents in future rounds. In this model, agents update their prior for a previously explored item to a Gaussian centered at the revealed quality with variance σ² before computing its Weitzman index. We also consider a "revealed value" model, where, after an item is explored by some agent, only its value to that agent is revealed to future agents. In this model, the optimal strategy of the agents becomes more complicated. Given past agents' values for a given item, the current agent forms a posterior distribution over the item's quality. This posterior is a Gaussian whose mean and variance depend on the average of the past observed values and the number of times the item has been explored. We can show similar results in the "revealed value" model; we omit them due to space limits.
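The posterior described for the "revealed value" model is the standard conjugate Gaussian update. A sketch (ours; the names are illustrative) under the paper's variances, with prior quality ∼ N(0, 1 − σ²) and observation noise variance σ²:

```python
def quality_posterior(observed_values, sigma):
    """Posterior (mean, variance) over an item's quality q ~ N(0, 1 - sigma^2)
    after observing past agents' values v_j = q + eps_j, eps_j ~ N(0, sigma^2) iid.
    Standard conjugate update: precisions add, the mean shrinks toward the prior."""
    k = len(observed_values)
    tau2 = 1.0 - sigma**2                 # prior variance of the quality
    if k == 0 or tau2 == 0.0:
        return 0.0, tau2                  # no data, or quality is identically 0
    denom = sigma**2 + k * tau2
    post_var = tau2 * sigma**2 / denom
    post_mean = (k * tau2 / denom) * (sum(observed_values) / k)
    return post_mean, post_var
```

With more observations the posterior variance shrinks toward zero, so the revealed-value model approaches the revealed-quality model for frequently explored items.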
Our goal is to understand how the expected average utility changes with the diversity level of the society. The impact of diversity varies with the time spent searching, and so we define the average utility with respect to a time horizon T. The history of the first t rounds indicates the information obtained about the items explored in those rounds and induces the priors of the agent arriving in round t + 1.
Definition 3.3 (Average Utility).
We define the average utility to be the average expected utility over the agents in the first T rounds when the diversity level is σ.
3.1. Justification of Our Model
Here we provide the rationale behind the model choices.
Decomposition into Common and Private Values The assumption that agent values decompose into independent common and private components is commonly used in the empirical auction literature. Going back to the work of (LiV98), who showed how to perform the decomposition based on deconvolution, it has been shown to predict agent behavior quite accurately on real-world data (Krasnokutskaya11; Asker10).
Normally Distributed Values The choice of normal distributions for the common and private values enables expressing a wide range of diversity levels with a single parameter. The constant sum of variances allows for fair comparisons, as in all cases agents have the same distribution of values for every option. Normal distributions also appear in practice; e.g., the distribution of metacritic scores for movies on Rotten Tomatoes closely fits a normal distribution, and the distribution of the common value in US offshore oil and gas auctions has been shown econometrically to fit a normal distribution (SyrgkanisTZ18).
4. How Different Diversity Levels Compare
There are two opposing forces that affect average utility: exploration and exploitation. When the society is not very diverse, agents tend to exploit previously explored options, as it is quite costly for them to search for new alternatives that could potentially give them higher value. Diverse societies, on the other hand, tend to explore more, but cannot exploit as much, since information collected in the past is less valuable to them.
Focusing solely on exploration or exploitation can come at a significant cost. In the extreme case where the society is not at all diverse (i.e., σ = 0) and focuses on exploitation, we show that the society's average expected utility can be upper bounded by a constant. In particular, it does not depend on the number of rounds T.
Lemma 4.1.
For any number of rounds T, the average expected utility of a society with diversity level 0 is at most a constant that depends only on the search cost, not on T.
To prove Lemma 4.1, we first bound by a constant the Weitzman index of an unsearched item, i.e., one whose quality is not yet revealed. We then argue that once some item of sufficiently high quality has been identified, no other item will be explored in future rounds. We conclude the proof by upper-bounding the expected quality of such an item conditional on its quality exceeding the threshold. This directly bounds the average utility of the agents. The proof of Lemma 4.1 is given in Appendix B.
In the other extreme case, where agents are fully diverse, the society's average expected utility is even worse. We show that a fully diverse population always obtains lower utility than a non-diverse population, so in particular its utility is also bounded by a constant independent of the number of rounds.
Lemma 4.2.
For any number of rounds T, the average expected utility of a society with diversity level 1 is at most that of a society with diversity level 0.
To prove Lemma 4.2, we observe that the first agent in any society performs the same search no matter the diversity level and so achieves the same expected utility. When the diversity level is 0, the expected utility improves across rounds, while when the diversity level is 1, the expected utility stays the same across rounds, implying the result. The proof of Lemma 4.2 is given in Appendix B.
In both of these extreme cases, the average expected utility is bounded by a constant that depends on the search cost but not on the number of rounds. However, as we show, any intermediate diversity level can be significantly better than these extremes, since the average expected utility at intermediate levels grows with the number of rounds. Thus, societies with intermediate diversity levels achieve higher average expected utility than those with extreme diversity levels.
Our main result is a characterization of how different diversity levels compare for different numbers of rounds T. We identify three regions, depending on the number of rounds, that determine which of two diversity levels is better.
When the number of rounds is small, the smaller diversity level achieves higher average expected utility. This is because the benefit of broader past exploration is less significant than knowing the values of a smaller subset of items more accurately. We show numerically in Section 4.1 that this is indeed the case and that, for very few rounds, almost no diversity is preferable.
When T takes intermediate values, exploring a larger number of options becomes more valuable than higher accuracy on fewer options, and thus the higher diversity level is preferred. Theorem 4.3 shows the existence of this region under a technical assumption requiring a small gap between the two diversity levels.
Theorem 4.3 (Intermediate T).
Let σ₁ < σ₂ be two diversity levels with a sufficient gap between them. There exists an interval [T₁, T₂] such that for all T ∈ [T₁, T₂] the average expected utility at σ₂ exceeds that at σ₁, i.e., the large diversity level outperforms the small diversity level. Here T₁ and T₂ depend on σ₁ and σ₂.
When T is very large, the smaller diversity level again becomes preferable to the higher level: in both cases a comparably large number of options has been explored, and it is again more beneficial to know the values of the explored items more accurately. This region is described in Theorem 4.4.
Theorem 4.4 (Very large values of T).
Let σ₁ < σ₂ be two diversity levels. There exists a threshold T₃ such that for all T ≥ T₃ the average expected utility at σ₁ exceeds that at σ₂, i.e., the small diversity level outperforms the large diversity level. Here T₃ depends on σ₁ and σ₂.
Theorems 4.3 and 4.4 show that a larger diversity level is preferable to a smaller one for an intermediate number of rounds, but as the number of rounds increases, the smaller diversity level becomes more socially beneficial. The proofs of the theorems are presented in Section 4.2.
We note that even though the smaller diversity level is guaranteed to overtake the larger one in the third region, this happens only at extremely large values of T. In fact, in all simulations presented in the next subsection, run for very large numbers of rounds and several diversity values, we did not observe any transition from the second region, where the larger level is ahead, to the third region, where the smaller level is ahead. This is because the behavior of the average expected utility is tightly characterized by the upper bound we show in Proposition 4.5, whose dependence on T is doubly logarithmic; thus extremely large values of T are needed to reach the third region. For all practical purposes, we observe in simulations that the optimal diversity level increases monotonically with the number of rounds.
4.1. Behavior for Small T: Numerical Simulations
Before proceeding with the proofs of Theorems 4.3 and 4.4, we present simulations showing the evolution of the society's average expected utility with the number of rounds in the "revealed quality" model. Figure 1 shows plots for numbers of rounds up to 100 and several diversity levels. We present two plots, corresponding to two different search costs. For each scenario, we average over many independent runs.
As can be seen from the figure, all diversity levels start at the same point in the first round. The fully diverse case remains constant at that value, achieving the lowest utility among all diversity levels, as every agent searches from scratch each time without reusing any information from the past. The case without diversity (σ = 0), on the other hand, initially has increasing average social utility but, as shown in Lemma 4.1, approaches a constant value after a few rounds.
The intermediate diversity levels achieve lower expected utility than the no-diversity case in the first rounds but eventually outperform it, as described by Theorem 4.3. The lowest of these levels manages to do so fastest but is itself overtaken by higher diversity levels in later rounds; one such crossing occurs beyond the range shown in the graph. Two diversity levels that are close together, however, do not appear to separate even after simulating for many more rounds. This is an indication that the gap between diversity levels required by Theorem 4.3 is necessary for the statement to hold.
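A compact simulation of the revealed-quality model can reproduce these qualitative curves. The sketch below is our own simplification: it supports diversity levels strictly between 0 and 1, treats the infinite item pool as an always-available fresh option, and runs one society, returning the realized average utility.

```python
import math, random

def phi(x): return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
def Phi(x): return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def index(mu, s, c):
    """Weitzman index for a N(mu, s^2) prior: solve c = E[(v - I)^+] by bisection."""
    lo, hi = mu - 40 * s, mu + 40 * s
    for _ in range(200):
        mid = (lo + hi) / 2.0
        d = (mu - mid) / s
        if (mu - mid) * Phi(d) + s * phi(d) > c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def run_society(sigma, T, c=0.1, rng=None):
    """Realized average utility of T agents, revealed-quality model, 0 < sigma < 1."""
    rng = rng or random.Random()
    tau = math.sqrt(1.0 - sigma**2)      # std of item qualities
    i0 = index(0.0, 1.0, c)              # index of any unexplored item
    qualities = []                       # qualities revealed so far
    total = 0.0
    for _ in range(T):
        # Previously explored items: quality known, own private score still unknown.
        cands = sorted(((index(q, sigma, c), q) for q in qualities), reverse=True)
        best, opened, k = -float("inf"), 0, 0
        while True:
            use_known = k < len(cands) and cands[k][0] > i0
            nxt = cands[k][0] if use_known else i0
            if best >= nxt:              # Weitzman stopping rule
                break
            opened += 1
            if use_known:
                v = cands[k][1] + rng.gauss(0.0, sigma)
                k += 1
            else:                        # open a fresh item; quality becomes public
                q = rng.gauss(0.0, tau)
                v = q + rng.gauss(0.0, sigma)
                qualities.append(q)
            best = max(best, v)
        total += best - c * opened
    return total / T
```

Averaging `run_society` over many seeds for several σ values reproduces the crossing pattern: low diversity leads in early rounds, and intermediate diversity overtakes it as rounds accumulate.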
4.2. Behavior for Intermediate and Large Values of T
We now move on to prove Theorems 4.3 and 4.4. The main technical contribution driving these proofs is a sharp characterization of the average expected utility for any fixed diversity level σ. Proposition 4.5 gives a closed-form expression for the average expected utility that is tight up to additive constants.
Proposition 4.5.
For any diversity level σ ∈ (0, 1) and any number of rounds T, the average expected utility equals, up to additive constants, the expected maximum quality among the items explored in the first T rounds; this quantity scales as √(2(1 − σ²) ln n) when n items are explored, with n logarithmic in T.
The proof of Proposition 4.5 can be found in Appendix B. The main observation in establishing the statement is that the expected utility is well approximated by the maximum quality of the revealed items after T rounds (Lemma B.3). Using this observation, we obtain the result by giving upper and lower bounds on the best item revealed in the end, which in turn is closely related to the number of revealed items. For the lower bound (Lemma B.4), we bound the rate at which new items are revealed by bounding the probability that all available options are unsatisfactory for a given agent, i.e., that her true value is negative for all currently explored items. For the upper bound (Lemma B.5), we show that the probability that a new item is explored is exponentially small in the number of high-quality items already known. Thus, as we show, the number of items explored after T rounds is logarithmic in T. We obtain the bound on the maximum quality of a revealed item by noting that the maximum of n Gaussian random variables with variance 1 − σ² is approximately √(2(1 − σ²) ln n). This gives the approximate expression for the expected utility. Given the closed-form expression of Proposition 4.5, we can easily obtain Theorems 4.3 and 4.4 as corollaries.
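The √(2 · variance · ln n) growth of the maximum can be checked empirically. A quick Monte Carlo sanity check (ours, not from the paper); the asymptotic formula slightly overshoots the finite-sample mean:

```python
import numpy as np

def mean_max_gaussians(n, variance, trials=5000, seed=0):
    """Monte Carlo estimate of E[max of n iid N(0, variance) draws]."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(0.0, np.sqrt(variance), size=(trials, n))
    return float(draws.max(axis=1).mean())

n, var = 1000, 0.75
estimate = mean_max_gaussians(n, var)
prediction = (2.0 * var * np.log(n)) ** 0.5   # asymptotic sqrt(2 * variance * ln n)
```

For n = 1000 the estimate is roughly 85-90% of the asymptotic prediction, and the ratio tends to 1 as n grows.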
Proof of Theorem 4.3:
By Proposition 4.5, the average expected utility of each diversity level is, up to an additive constant, an explicit increasing function of the number of rounds. We choose the endpoints T₁ and T₂ so that, for every T ∈ [T₁, T₂], the upper bound on the utility at the smaller diversity level σ₁ lies strictly below the lower bound on the utility at the larger diversity level σ₂: in this range, the larger level has explored sufficiently many more items to compensate for its smaller quality variance 1 − σ₂². The gap condition between σ₁ and σ₂ ensures that the resulting interval is nonempty.
Proof of Theorem 4.4:
For any fixed diversity level, the expression of Proposition 4.5 grows proportionally to √(1 − σ²) times a factor that is doubly logarithmic in T and comparable across diversity levels. Since the coefficient √(1 − σ₁²) of the smaller diversity level is larger, its utility eventually dominates. This immediately implies that for any σ₁ < σ₂ there exists a large enough T₃ such that the claim holds for all T ≥ T₃, as required by Theorem 4.4.
5. An Extension with Rare but High Reward Items
We extend our results to a setting where agents' values are correlated in a more structured way. In this extension, we introduce an additional objective term in the quality of an item: the quality is the sum of a rare bonus, which takes a very high value with a tiny probability and 0 otherwise, and a Gaussian component sampled as before. We call an item that receives the bonus a "diamond" due to its rarity and high value. We assume that the diamond value times its probability is small enough that the exploration cost is larger than the expected value of an item.
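A sampler for the diamond model (our sketch; the parameter names are ours) makes the structure explicit:

```python
import numpy as np

def sample_diamond_qualities(n_items, sigma, diamond_value, diamond_prob, seed=None):
    """Item quality = Gaussian component N(0, 1 - sigma^2) plus a rare bonus
    equal to diamond_value with probability diamond_prob (else 0).
    Items receiving the bonus are the 'diamonds'."""
    rng = np.random.default_rng(seed)
    base = rng.normal(0.0, np.sqrt(1.0 - sigma**2), size=n_items)
    is_diamond = rng.random(n_items) < diamond_prob
    return base + diamond_value * is_diamond
```

The assumption that the search cost exceeds an item's expected value corresponds to keeping `diamond_value * diamond_prob` below the cost.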
In this diamond model, diversity helps significantly in early rounds. Our main result in this model is that the existence of a diamond can significantly change the behavior and average utilities of agents at different diversity levels.
Theorem 5.1.
Consider any constant diversity level σ > 0. If the number of rounds is sufficiently large relative to the inverse of the diamond probability, then the average expected utility grows as the diamond value increases.
In contrast to the previous section, where exploration in the fully diverse setting is meaningless since future agents cannot benefit from the experience of their predecessors, Theorem 5.1 shows that if there is even a tiny probability that a very desirable and socially beneficial alternative exists, diversity helps significantly. Therefore, in societies where even fully diverse agents can agree on a clearly superior outcome that occurs with very small probability, larger diversity levels work in society's favor, as they help identify this option faster.
The full proof of Theorem 5.1 can be found in Appendix B. Here we show some important steps of the proof.
To prove Theorem 5.1, we first show that when the diversity level is 1 and the number of rounds is sufficiently large, the expected average utility is close to the diamond value.
Lemma 5.2.
When the diversity level is 1 and the number of rounds is sufficiently large relative to the inverse of the diamond probability, the average expected utility is close to the diamond value.
The main idea behind the proof of Lemma 5.2 is to show that at least one new item is searched in each round before a diamond is discovered, and thus within a number of rounds proportional to the inverse of the diamond probability, a diamond will be found. The proof of Lemma 5.2 can be found in Appendix B.
Next we proceed to bound the expected average utility for smaller diversity levels. We start with the extreme case of diversity level 0. The proof is similar to that of Lemma 4.1 and is given in Appendix B.
Lemma 5.3.
When the diversity level is 0, the average expected utility is at most a constant independent of the number of rounds.
The case of intermediate diversity levels is more complicated. We give an upper bound on the average utility in the appendix (Lemma B.9).
Appendix A Tools for Gaussian Distributions
In this section, we list several facts and lemmas about Gaussian distributions. Throughout this document, we use Φ to refer to the CDF of the standard Gaussian and φ to refer to its density function.
Fact A.1.
Let Φ be the CDF of the standard Gaussian distribution. Then for x > 0, φ(x) · x / (1 + x²) ≤ 1 − Φ(x) ≤ φ(x) / x.
Fact A.2 ((Baricz08)).
Let Φ be the CDF and φ the density of the standard Gaussian distribution. For any x ≥ 0, 2φ(x) / (x + √(x² + 4)) ≤ 1 − Φ(x) ≤ 2φ(x) / (x + √(x² + 8/π)).
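The Gaussian tail estimates invoked throughout can be sanity-checked numerically. The concrete Mills-ratio bounds below are the classical ones (our choice of form, which may differ in constants from the exact statements intended here):

```python
import math

def phi(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def check_tail_bounds(x):
    """Verify two classical sandwiches for the Gaussian tail 1 - Phi(x), x > 0:
    (i)  phi(x) * x / (1 + x^2) <= 1 - Phi(x) <= phi(x) / x
    (ii) 2 phi(x) / (x + sqrt(x^2 + 4)) <= 1 - Phi(x)
                                        <= 2 phi(x) / (x + sqrt(x^2 + 8/pi))
    """
    tail = 1.0 - Phi(x)
    ok1 = phi(x) * x / (1 + x * x) <= tail <= phi(x) / x
    ok2 = (2 * phi(x) / (x + math.sqrt(x * x + 4)) <= tail
           <= 2 * phi(x) / (x + math.sqrt(x * x + 8 / math.pi)))
    return ok1 and ok2
```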
The following two lemmas can be proved by standard techniques for Gaussian distributions. We omit their proofs due to space limits.
Lemma A.1.
For any n ≥ 2, let X₁, …, Xₙ be sampled independently from a Gaussian with mean zero and variance s². Then E[maxᵢ Xᵢ] ≤ s√(2 ln n).
Lemma A.2.
For any n ≥ 2, let X₁, …, Xₙ be sampled independently from a Gaussian with mean zero and variance s². Then E[maxᵢ Xᵢ] ≥ c₀ · s√(ln n) for an absolute constant c₀ > 0.
Appendix B Omitted Proofs
B.1. Omitted Proofs of Lemma 4.1 and Lemma 4.2
Lemma B.1 (Restatement of Lemma 4.1).
For any number of rounds T, the average expected utility of a society with diversity level 0 is at most a constant that depends only on the search cost, not on T.
Proof.
The proof plan is to first bound by a constant the Weitzman index of an unsearched item, i.e., one whose quality is not yet revealed. We then argue that once some item of sufficiently high quality has been identified, no other item will be explored in future rounds. We conclude the proof by upper-bounding the expected quality of such an item conditional on its quality exceeding the threshold. This directly bounds the average utility of the agents.
We start by bounding the Weitzman index of an unsearched item. For an unsearched item, the value is distributed as a standard Gaussian, and the index is the solution of the index equation under this prior. Applying the Gaussian tail bound of Fact A.1 to this equation yields a constant upper bound on the Weitzman index of an unsearched item.
When an item has been searched with observed quality , since the diversity level is 0, the values of this item for future agents are deterministically . Therefore, the Weitzman index of this item is simply .
Once some item has quality , its Weitzman index is larger than , which means it will be searched before any unsearched item. Once this item is searched in some round, its realized value is , which is larger than . Therefore, no more unsearched items will be searched.
Let be the quality of the first item with quality larger than . We have . Starting from the round in which this item is searched, each agent’s utility is at most , and before this item is ever searched, each agent’s utility is at most . Therefore, we can bound the average utility as follows:
∎
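The reservation-value (Weitzman index) computation used in the proof above can be checked numerically. The sketch below is an illustration, not the paper's construction: it assumes a standard Gaussian value distribution and a hypothetical search cost `c`, and uses the closed form $E[(X-z)^+] = \phi(z) - z(1-\Phi(z))$ for $X \sim N(0,1)$.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_pdf(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def weitzman_index(c, lo=-10.0, hi=10.0, iters=100):
    """Solve E[(X - z)^+] = c for X ~ N(0,1) by bisection.

    E[(X - z)^+] = phi(z) - z * (1 - Phi(z)) is decreasing in z,
    so bisection finds the reservation value z.
    """
    def excess(z):
        return std_normal_pdf(z) - z * (1.0 - std_normal_cdf(z))
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if excess(mid) > c:
            lo = mid  # expected excess too large: index must be higher
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example with a hypothetical search cost of 0.1.
z = weitzman_index(0.1)
```

As the search cost shrinks, the index grows, matching the intuition that cheap search makes unexplored items more attractive.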
Lemma B.2 (Restatement of Lemma 4.2).
For any , .
Proof.
First notice that when the diversity level is 1, values are completely independent of the history and identically distributed across agents. Therefore, all agents have the same expected utility, and equals the expected utility of the first agent.
When the diversity level is 0, consider the following strategy for every agent other than the first: search only the item chosen by the first agent and select it. Under this strategy, every agent obtains at least the utility of the first agent. By the optimality of the Weitzman index strategy, is at least the expected utility of the first agent.
Finally, notice that the expected utility of the first agent does not depend on the diversity level. This implies . ∎
B.2. Omitted Proof of Proposition 4.5
We proceed to show Proposition 4.5. We first relate the average expected utility to the expectation of the positive part of the highest quality explored during the first rounds when the diversity level is , denoted by . Here the positive part of means . In Lemma B.3, we show that is sandwiched between and . Subsequently, we provide sharp bounds on and show that and are very close for large .
Lemma B.3 ().
For any diversity level and any ,
Proof.
Let us first prove the left-hand side. For each agent after round , a suboptimal strategy is to explore and pick the item with the highest quality searched before the end of round , and then leave. This gives utility in expectation. By the optimality of the Weitzman index strategy, agents after round get at least in expectation. Again by the optimality of the Weitzman index strategy, agents in the first rounds get utility at least 0 in expectation. Thus, we have .
Now let us consider the right-hand side. The main idea is to upper bound the expected utility of the last round by .
Let be the number of items explored in the first rounds, with qualities . In round , the agent’s expected utility is at least as large as the expected utility of agents in previous rounds, since more information has been revealed. Let be the item at which the th agent stops his search. We fix and in this argument, and expectations are taken over the randomness of the subjective scores of the th agent. The th agent’s value for any previously explored option is less than , the Weitzman index of item , as he chose to explore item . Moreover, since he decided to stop at , his value at , must be higher than , the Weitzman index of the next item to search in round . After the search, the th agent might take item or some other item with value smaller than . Thus, his expected utility is at most , where is sampled from conditioned on .
Using Lemma A.2, we can upper bound the expected utility of the agent in round by
Recall that is the Weitzman index of an unexplored item. If item has been searched before round , we have , and we immediately get
If item hasn’t been searched before round , we bound the difference between and . The definition of the Weitzman index gives us
Note that the last expression is strictly less than when , implying that . Then we get
To sum up, whether or not item has been searched before round , the expected utility of the agent in round is at most . Since the agent in round has expected utility at least as high as the expected utilities of agents in previous rounds, we conclude that . ∎
In Lemma B.4, we give a lower bound on . The main idea of the proof is to lower bound the number of searched items and then show that is roughly the expected value of the maximum of that many independent Gaussians. The lower bound on the number of searched items is obtained by considering events (denoted in the proof) in which all previously searched items have low values (caused by very negative subjective scores).
Lemma B.4 ().
For any diversity level , and any ,
Proof.
To prove the statement, we fix a sufficiently large parameter and show that, with high probability, after a sufficiently large number of rounds , two events happen:
(1) the maximum quality among the first items is at least , and
(2) at least one agent explores an item with quality higher than .
We first bound the probability that the maximum quality of the first items is less than . As the qualities of all items are independent, this is a standard calculation about the maximum of independent Gaussian random variables. Setting , we have:
(from Fact A.1)
Assuming that there is some item with quality higher than among the first , we now bound the probability that some agent explores it. Let be the indicator of the event that in round , for all items with , . It is clear that if for some , then for all items with quality less than , so the agent keeps exploring until he finds the item with quality higher than .
We lower bound the probability that . We have that:
Setting for , we get that . Thus, the probability that no agent finds the high-quality item is at most .
Thus, overall we have that if
for  
Thus, setting so that , we get the required bound. ∎
In addition to this lower bound, Lemma B.5 provides an upper bound on . The main idea of the proof is to consider items with good qualities and show that the chance of searching a new item decreases exponentially in the number of such items. We perform the analysis via a coupling argument with an imaginary setting in which a larger number of items is always explored.
Lemma B.5 ().
For any diversity level ,
Proof.
Let be the random variable denoting the number of explored items after the agent in round has finished exploring; label the items in the order in which they were first explored, and let be the random variables denoting the qualities of the items explored in the first rounds. In this notation, , the expected positive part of the maximum quality of the explored items at the end of round , is .
To bound this expression, we define the following alternative exploration procedure, which is significantly easier to analyze. We will show that there is a coupling between the original and the alternative process under which at least as many items are explored in the latter. Let be the Weitzman index that satisfies when . In the alternative process, every agent goes through every item starting from and, whenever , flips a coin that comes up heads with probability . If the coin comes up heads, he stops; otherwise, he continues to the next item . Let be the maximum number of items that have been explored by the agents in the first rounds of this alternative procedure.
We define a coupling such that . In the original process, every agent has value for item . In the alternative process, if , we couple the coin with the event that , which also has probability . Assuming inductively that , we show that . This follows immediately if , since then . Otherwise, when , the original agent in round must have explored new item(s). Therefore, he must have for all items for which , as he must have explored all of those before reaching item . By the definition of the coupling, the alternative agent does not stop at any item such that , and thus .
Thus, . We complete the proof by bounding in Lemma B.6. ∎
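The alternative process in the proof above can be simulated. The sketch below uses a hypothetical stopping probability `p` (the actual probability in the proof is tied to the Weitzman index condition, whose symbols are elided here): each agent scans items in order and stops at each item independently with probability p. The number of distinct items ever reached then grows only logarithmically in the number of rounds, mirroring the union bound used in Lemma B.6.

```python
import random

def simulate_alternative_process(rounds, p, seed=0):
    """Each agent scans items 1, 2, ... and stops at each item with
    probability p; returns the max item index any agent reached.

    Since P(an agent reaches item k) = (1-p)**(k-1), a union bound
    gives P(max >= k) <= rounds * (1-p)**(k-1): the max grows only
    logarithmically in the number of rounds.
    """
    rng = random.Random(seed)
    max_reached = 0
    for _ in range(rounds):
        item = 1
        while rng.random() >= p:  # tails: continue to the next item
            item += 1
        max_reached = max(max_reached, item)
    return max_reached

# Illustrative parameters: 1000 rounds, stopping probability 1/2.
k = simulate_alternative_process(rounds=1000, p=0.5)
```

With p = 1/2 and 1000 rounds, the union bound predicts the maximum stays in the low tens, far below the number of rounds.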
Lemma B.6 ().
Let be as defined in the proof of Lemma B.5. Then
Proof.
For notational convenience, define . Letting denote the expected maximum of random variables distributed according to , we have that . From Lemma A.2, we have that , and thus we get that
for all .
We now bound , i.e., the probability that items with qualities higher than are explored during the rounds. Note that this can happen only if, in at least one round, an agent gets coin flips to land tails before stopping the exploration. Each agent achieves this with probability at most , and thus, over all rounds, the probability that items with value higher than are explored is at most , by a union bound. We thus get that:
Setting we have that
and we obtain
∎
Proof of Proposition 4.5: Let us first prove the upper bound on . By Lemmas B.3 and B.5, we have
Now let us prove the lower bound on . Notice that if ,
and therefore the lower bound is trivial. We only need to prove the lower bound when .