I Introduction
It is becoming increasingly common for companies to use machine learning (ML) predictions in their services [1, 2, 3, 4]. When several companies on the market offer similar ML-based services, each customer chooses only the one service they prefer most, arguably the highest-quality service within their budget. In this user selection process, the customer pays a fee to the company that provides the service, which naturally creates competition among companies on the market. As a result, competing companies strive to produce high-quality ML predictions to attract more customers, which leads them to buy customer data or subscribe to data marketplaces [5].
For example, competing companies in the U.S. auto insurance market, namely State Farm, Progressive, and Allstate, use ML predictions to analyze customer data, assess risk, and adjust premiums [6]. They offer Pay-How-You-Drive insurance, which has cheaper premiums than regular auto insurance on the condition that the insurer monitors driving patterns such as rapid acceleration or oscillations in speed [7, 8]. That is, auto insurance companies provide financial benefits to customers in order to collect the customers' driving pattern data. Using the purchased data, the companies develop better ML models for their business (e.g., updating the insurance recommendation model or reassessing risks) while competing with each other.
Analyzing the effects of data purchase in competition could have practical implications, but it has received little attention in the ML literature. The effects of data acquisition have been considered extensively in active learning (AL), the problem of finding effective data to label [9, 10]. However, since AL usually assumes a single-agent setting, it does not straightforwardly extend to competitive systems with more than one competitor. Recently, [11] examined the impacts of competition by modeling an environment where several predictors compete for user data. They showed that competition pushes competing predictors to focus on a small subset of the population and helps users find high-quality predictions. While that work describes interesting implications of competition, it does not investigate the impact of data purchase, which is the main focus of our work. Our proposed environment can represent a general case in which users change their choice when a company is willing to offer a financial benefit (see Figure 1). Related works are further discussed in Section V.
Contributions
In this paper, we study what happens when competing predictors can purchase customer data to improve their ML models. We propose a novel environment that can simulate various real-world competitions. Our environment allows ML predictors to purchase labeled data via AL algorithms within a finite budget while competing against each other (Section II). Surprisingly, our results show that when competing predictors purchase data, the quality of the predictions selected by each user can decrease even as the competing ML predictors get better (Section III-A). We explain this counterintuitive finding by demonstrating that data purchase makes competing predictors similar to each other (Section III-B). To support our empirical findings, we theoretically analyze how the diversity of users' available options affects the user experience (Section IV).
II A general environment for competition and data purchase
This section formally introduces a new and general competitive environment that allows competing predictors to acquire data points within a finite budget. In our environment, competition is represented by a series of interactions between a sequence of users and a fixed number of competing predictors, where each interaction is modeled as a supervised learning task. To be more specific, we first introduce some notation.
Notations
For $t \in [T]$, we denote a user query by $X_t \in \mathcal{X}$ and its associated user label by $Y_t \in \mathcal{Y}$. Throughout this paper, we focus on classification problems, i.e., $\mathcal{Y}$ is finite, although our environment easily extends to regression. We denote a user stream by $\{(X_t, Y_t)\}_{t=1}^{T}$ and assume each user datum is an independent and identically distributed (i.i.d.) sample from a distribution $P_{X,Y}$. We call $P_{X,Y}$ the user distribution. On the predictor side, we suppose there are $M$ competing predictors in the market. For $i \in [M]$, each ML predictor is described as a tuple $(n^{(i)}, B^{(i)}, f^{(i)}, \pi^{(i)})$, where $n^{(i)}$ is the number of i.i.d. seed data points from $P_{X,Y}$, $B^{(i)}$ is a budget, $f^{(i)}$ is an ML model, and $\pi^{(i)}$ is a buying strategy.
We assume that predictor $i$ initially owns $n^{(i)}$ data points and can additionally purchase user data within a budget $B^{(i)}$. We assume the price of one data point is one, i.e., predictor $i$ can purchase up to $B^{(i)}$ data points from the sequence of users. A predictor produces a prediction using its ML model $f^{(i)}$ and decides whether to buy the user data with its buying strategy $\pi^{(i)}$. We take the utility of predictor $i$ to be the classification accuracy of $f^{(i)}$ with respect to the user distribution $P_{X,Y}$. Lastly, $f^{(i)}$ and $\pi^{(i)}$ may be updated during the competition rounds; that is, companies strive to provide good-quality predictions and keep improving their ML models. In the following, we elaborate on the details of the competition dynamics.
Competition dynamics
Before the first round, all the competing predictors independently train their models on their seed data points. After this initialization, at each round $t \in [T]$, a user sends a query $X_t$ to all the predictors $i \in [M]$, and each predictor decides whether to buy the user data. We describe this decision using the buying strategy $\pi^{(i)}$. If predictor $i$ judges that the labeled datum would be worth one unit of budget, we write $\pi^{(i)}(X_t) = 1$; otherwise, $\pi^{(i)}(X_t) = 0$. For $\pi^{(i)}$, ML predictors can use any stream-based AL algorithm [12, 13]. For instance, a predictor can use the uncertainty-based AL rule [14]: $\pi^{(i)}$ considers purchasing user data if the current prediction is uncertain, i.e., if the Shannon entropy of $\hat{p}^{(i)}_t$ is higher than a predefined threshold $\tau^{(i)}$:
$$\pi^{(i)}(X_t) = \mathbb{1}\{H(\hat{p}^{(i)}_t) > \tau^{(i)}\}, \qquad H(p) = -\sum_{y \in \mathcal{Y}} p_y \log p_y,$$
where $\hat{p}^{(i)}_t$ is the probability estimate of $f^{(i)}$ at the $t$-th round. In brief, predictor $i$ shows purchase intent if its remaining budget is greater than zero and $\pi^{(i)}(X_t) = 1$. If the remaining budget is zero or $\pi^{(i)}(X_t) = 0$, then predictor $i$ provides a prediction $\hat{Y}^{(i)}_t = f^{(i)}(X_t)$ to the user instead of showing purchase intent.
We now explain how a user selects one predictor. At every round $t$, the user selects exactly one predictor based on both the purchase intents and the prediction information received from the $M$ predictors. If there is more than one buyer, we assume that the user selects one of the buyers uniformly at random. This assumption reflects that users prefer a financial advantage (e.g., discounts or coupons) to the model's prediction quality, and thus select one of the companies that show purchase intent. Once a buyer is selected, only its budget is reduced by one; all other predictors' budgets stay the same because they are not selected and do not have to provide financial benefits. If no predictor shows purchase intent, the user receives the prediction information and chooses a predictor at random. We assume that higher-quality predictions are more likely to be selected, with probability
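$$\mathbb{P}(W_t = i) = \frac{\exp\big(\beta\, q(Y_t, \hat{Y}^{(i)}_t)\big)}{\sum_{j=1}^{M} \exp\big(\beta\, q(Y_t, \hat{Y}^{(j)}_t)\big)}, \qquad i \in [M], \qquad (1)$$
where $\beta$ denotes a temperature parameter and $q$ is a predefined quality function that measures the similarity between the user label and an ML prediction (e.g., $q(Y_t, \hat{Y}^{(i)}_t) = \mathbb{1}\{Y_t = \hat{Y}^{(i)}_t\}$). We assume that users are rational in that higher-quality predictions are never less likely to be selected, i.e., $\beta \geq 0$.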
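To make the selection rule concrete, the following minimal Python sketch computes the probabilities in Equation (1) for a single round when no predictor shows purchase intent. It assumes the correctness quality from the example above and takes the per-predictor qualities as input; the function name `selection_probabilities` is ours and is not part of the environment's specification.

```python
import math


def selection_probabilities(qualities, beta):
    """Softmax selection rule of Equation (1).

    qualities[i] is q(Y_t, Yhat_t^(i)) for predictor i and beta >= 0 is the
    temperature parameter. Returns the list of P(W_t = i).
    """
    if beta < 0:
        raise ValueError("beta must be non-negative (rational users)")
    logits = [beta * q for q in qualities]
    # Subtracting the largest logit avoids overflow and leaves the
    # normalized probabilities unchanged.
    top = max(logits)
    weights = [math.exp(z - top) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Correctness quality: predictors 0 and 2 are right, predictor 1 is wrong.
label = 1
predictions = [1, 0, 1]
qualities = [1.0 if p == label else 0.0 for p in predictions]
print(selection_probabilities(qualities, beta=0.0))  # uniform choice
print(selection_probabilities(qualities, beta=2.0))  # correct ones favored
```

With $\beta = 0$ every predictor is equally likely, and increasing $\beta$ shifts probability mass toward the correct predictions.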
We denote the index of the selected predictor by $W_t$. The selected predictor obtains the user label $Y_t$ and updates its model by training on the new datum $(X_t, Y_t)$. The other predictors $j \neq W_t$ stay the same.
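The round-level logic described above (purchase intent, a uniform choice among buyers, the budget decrement, and otherwise Equation (1)) can be sketched as follows. This is a hypothetical illustration rather than the authors' implementation: it assumes one shared entropy threshold `tau`, takes each predictor's class-probability estimate as given, and assumes the offered prediction is the most probable class.

```python
import math
import random


def entropy(probs):
    """Shannon entropy H(p) = -sum_y p_y log p_y, using 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def shows_purchase_intent(probs, remaining_budget, tau):
    """Purchase intent: some budget is left and H(p) exceeds the threshold tau."""
    return remaining_budget > 0 and entropy(probs) > tau


def play_round(prob_estimates, budgets, tau, label, beta, rng):
    """Simulate the user's choice W_t in one competition round.

    prob_estimates[i]: predictor i's class-probability estimate for X_t.
    budgets[i]: predictor i's remaining budget (the chosen buyer's is decremented).
    Returns (W_t, purchased), where purchased tells whether W_t paid for the datum.
    """
    buyers = [i for i, p in enumerate(prob_estimates)
              if shows_purchase_intent(p, budgets[i], tau)]
    if buyers:
        # The user picks one buyer uniformly at random; only that buyer pays.
        w = rng.choice(buyers)
        budgets[w] -= 1
        return w, True
    # No purchase intent: each predictor offers argmax_y p_y, and the user
    # chooses via Eq. (1) with the correctness quality 1{Y_t == Yhat^(i)}.
    preds = [max(range(len(p)), key=p.__getitem__) for p in prob_estimates]
    weights = [math.exp(beta * float(y_hat == label)) for y_hat in preds]
    w = rng.choices(range(len(preds)), weights=weights, k=1)[0]
    return w, False


budgets = [2, 0, 2]
probs = [[0.5, 0.5], [0.5, 0.5], [0.95, 0.05]]
w, purchased = play_round(probs, budgets, tau=0.5 * math.log(2), label=0,
                          beta=1.0, rng=random.Random(0))
# Predictor 0 is the only buyer (predictor 1 has no budget left and predictor 2
# is confident), so w == 0, purchased is True and budgets becomes [1, 0, 2].
```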
Example 1 (Auto insurance in Section I).
$X_t$ includes the $t$-th driver's demographic information, insurance claim history, and desired price range, and $Y_t$ is the driver's preferred insurance plan within their budget constraints. Each predictor is an insurance company (e.g., State Farm, Progressive, or Allstate). If a company finds the $t$-th driver's data worth buying because its database contains few similar data points, it can offer this driver discounts to attract her and collect her driving pattern data. When no company shows purchase intent, a regular competition takes place, and each company offers the auto insurance plan it predicts to be most suitable for this driver. The driver then chooses the company whose offered plan is closest to $Y_t$. Afterwards, the acquired data can be used to improve the company's future predictions.
Characteristics of our environment
It is noteworthy that a user selects only one predictor at each round. This makes user labels a limited resource in the market, so competition among ML predictors naturally arises. This setting was first considered by [11], and we significantly extend it to incorporate AL-based data purchase systems.
Our environment simplifies data purchase and real-world competition, which usually take much more complicated forms, yet it has several powerful characteristics. First, our environment is realistic in that it represents the rational preferences of customers and companies in data purchase. Customers are likely to choose the best service within their budget after comparing options, but they can change their selection if a company offers financial benefits, such as promotional coupons, discounts, or free services [15, 16, 17]. Although this buying process can be costly for companies, it enables them to collect user data effectively within finite budgets using AL algorithms. Second, our environment is flexible and accommodates various competition situations: we make no assumptions about the number of competing predictors $M$, the budgets $B^{(i)}$, the algorithms for the predictors $f^{(i)}$ or the buying strategies $\pi^{(i)}$, or the user distribution $P_{X,Y}$.
III Experiments
Using the proposed competition environment, we investigate the impacts of data purchase on the quality and diversity of ML predictions across various user distributions. Our experiments reveal that data purchase can decrease the quality of the predictor selected by a user, even when the quality of the predictors improves on average (Section III-A). In addition, we demonstrate that data purchase makes ML predictors similar to each other: it reduces the effective variety of options, and it keeps predictors from specializing to a small subset of the population (Section III-B). In the Appendix, we provide additional experimental results with various competition settings to show the robustness of our findings.
Metrics
To quantitatively measure the effects of data purchase, we introduce evaluation metrics. First, we define the overall quality as follows.
$$\text{Overall quality} := \mathbb{E}\Big[\frac{1}{M}\sum_{i=1}^{M} q\big(Y, f^{(i)}(X)\big)\Big]$$
where the expectation is taken over the user distribution $P_{X,Y}$. The overall quality represents the average quality that the competing predictors provide in the market. Another quality metric is the quality of experience (QoE), the quality of the predictor selected by a user. The QoE is defined as
$$\text{QoE} := \mathbb{E}\big[q\big(Y, f^{(W)}(X)\big)\big]$$
Here, $W$ is a random variable for the selected index defined in Equation (1), and the expectation is taken over the random variables $(X, Y, W)$. Since a user selects one predictor according to Equation (1), QoE can be regarded as the utility of users. Note that the overall quality and QoE capture different aspects of prediction quality; they coincide when users select one predictor uniformly at random, i.e., when $\beta = 0$ (see Lemma 1).
Next, we define the diversity to quantify how variable the ML predictions are. Specifically, for $y \in \mathcal{Y}$, we define the proportion of predictors whose prediction is $y$ as $p_y(X) := \frac{1}{M}\sum_{i=1}^{M} \mathbb{1}\{f^{(i)}(X) = y\}$, and the diversity is defined as
$$\text{Diversity} := \mathbb{E}\Big[-\sum_{y \in \mathcal{Y}} p_y(X) \log p_y(X)\Big]$$
where the expectation is taken over the marginal distribution $P_X$ and we use the convention $0 \log 0 = 0$. Note that the diversity is the expected Shannon entropy of the competing ML predictions. When a user has many different options to choose from, the diversity tends to be large.
In our experiments, we run $T$ competition rounds, and to capture the effects of data purchase in the competition dynamics, we evaluate the three metrics after the $T$ rounds. That is, we do not perform the data purchase procedure during evaluation. Since exact expectations are difficult to compute, we estimate the three metrics on held-out test data that are not used during the competition rounds.
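As a sketch of how the three metrics can be estimated on held-out data, the Python function below computes plug-in averages from a matrix of test predictions. It assumes the correctness quality, and, as a design choice of this sketch, it averages the exact selection probabilities of Equation (1) instead of sampling $W$, which targets the same expectation with less Monte Carlo noise. The function name `evaluate_metrics` is hypothetical.

```python
import math
from collections import Counter


def evaluate_metrics(preds, labels, beta):
    """Plug-in estimates of (overall quality, QoE, diversity).

    preds[i][k] is predictor i's prediction on test point k and labels[k] is
    its label. Uses the correctness quality q(y, yhat) = 1{y == yhat}.
    """
    m, n = len(preds), len(labels)
    overall = qoe = diversity = 0.0
    for k, y in enumerate(labels):
        q = [1.0 if preds[i][k] == y else 0.0 for i in range(m)]
        overall += sum(q) / m
        # Expected quality of the selected predictor under Eq. (1).
        weights = [math.exp(beta * qi) for qi in q]
        qoe += sum(w * qi for w, qi in zip(weights, q)) / sum(weights)
        # Entropy of the empirical distribution of the M predictions at X_k;
        # classes that nobody predicts are skipped, i.e. 0 log 0 = 0.
        counts = Counter(preds[i][k] for i in range(m))
        diversity -= sum((c / m) * math.log(c / m) for c in counts.values())
    return overall / n, qoe / n, diversity / n


# Three predictors, two test points.
print(evaluate_metrics([[0, 1], [0, 0], [1, 1]], [0, 1], beta=0.0))
```

With `beta=0.0` the two quality estimates coincide, and identical predictors give zero diversity.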
Implementation protocol
Our experiments consider three real datasets to describe various user distributions $P_{X,Y}$, namely the Insurance [18], Adult [19], and Postures [20] datasets. To minimize the variance caused by other factors, we consider a homogeneous setting: in each competition, all predictors have the same number of seed data $n^{(i)}$ and budget $B^{(i)}$, the same classification algorithm for $f^{(i)}$, and the same AL algorithm for $\pi^{(i)}$. We use either a logistic model or a neural network with one hidden layer for $f^{(i)}$, and the standard entropy-based AL rule [14] for $\pi^{(i)}$. Throughout the experiments, we fix the total number of competition rounds to , the number of predictors to , and the quality function to the correctness function, i.e., $q(y, \hat{y}) = \mathbb{1}\{y = \hat{y}\}$ for all $y, \hat{y} \in \mathcal{Y}$. We use a small number of seed data points $n^{(i)}$, between 100 and 200 depending on the dataset, to keep the models from saturating before the first round. We consider various competition situations by varying the budget $B$¹ and the temperature parameter $\beta$. All results are based on 30 independent experiments. In the Appendix, we provide detailed information on the implementation.
¹ In Section III, for notational convenience, we often suppress the predictor ID in the superscript when the context is clear. For example, we use $B$ instead of $B^{(i)}$.
III-A Effects of data purchase on quality
We first study how data purchase affects the overall quality and QoE in various competition settings. Figure 2 illustrates how QoE changes with respect to (w.r.t.) the overall quality. When , data purchase increases the overall quality as $B$ increases across all datasets. This can be explained as follows. Since a predictor buys user data using a stream-based AL algorithm (e.g., it buys when its prediction is highly uncertain), active data acquisition reduces the model's uncertainty and increases the quality of the individual model. As a result, data purchase increases the overall quality. In particular, when and the dataset is Postures, the overall quality is on average when , but it increases to and when and , which correspond to and increases, respectively. For QoE, however, data purchase mostly decreases QoE as $B$ increases. For example, when the user distribution is Insurance and , QoE is when , but it decreases to and when and , which correspond to 1% and 7% reductions, respectively.
Implications of data purchase on quality
As Figure 2 shows, surprisingly, QoE decreases in most cases even when the overall quality increases. In other words, the quality that competing predictors provide generally improves, but this does not necessarily mean that users are more satisfied with the quality of the predictions. Although this result might sound counterintuitive, we argue that it can happen if data purchase restricts the variety of options available to users and thus decreases the probability of finding high-quality predictions as $B$ increases. To verify this, in the next section, we investigate how data purchase affects the diversity.
III-B Effects of data purchase on diversity
We now study how data purchase affects the diversity. Figure 3 illustrates the diversity as a function of the budget $B$ in various competition settings. When , the diversity monotonically decreases w.r.t. $B$ across all datasets. That is, the competing predictors become more similar as larger budgets are allowed. In particular, when and the dataset is Adult, the diversity is on average when , but it decreases to and , which correspond to and reductions, when and , respectively.
When $\beta = 0$, the diversity does not vary much w.r.t. $B$ across all datasets. This is because the competing ML predictors are mainly trained on a large number of i.i.d. user data: when $\beta = 0$, users select a predictor uniformly at random during regular competition rounds. Even though competing predictors can actively buy user data when $B > 0$, the i.i.d. user data from the seed dataset and the regular competitions are plentiful, so the purchased data points do not affect the ML predictions much.
IV Theoretical analysis on competition
In this section, we first establish a simple representation for QoE when the quality function is the correctness function. Based on this finding, we theoretically analyze how a diversity-like quantity affects QoE. Proofs are provided in the Appendix.
Lemma 1 (A simple representation for QoE).
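Suppose there are $M$ predictors and the quality function is the correctness function, i.e., $q(y, \hat{y}) = \mathbb{1}\{y = \hat{y}\}$ for all $y, \hat{y} \in \mathcal{Y}$. Let $A(X, Y) := \frac{1}{M}\sum_{i=1}^{M} q(Y, f^{(i)}(X))$ be the average quality for a user $(X, Y)$. For any $\beta \geq 0$, we have
$$\text{QoE} = \mathbb{E}\left[\frac{e^{\beta} A(X, Y)}{1 + (e^{\beta} - 1)\, A(X, Y)}\right] \geq \mathbb{E}[A(X, Y)] = \text{Overall quality}, \qquad (2)$$
where the equality holds when $\beta = 0$ and the expectation is taken over $(X, Y)$.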
Lemma 1 establishes a relationship between QoE and the overall quality: QoE is never smaller than the overall quality when $\beta \geq 0$. Furthermore, it shows that QoE can be expressed as a function of the average quality $A$ when the quality function is the correctness function. When $q$ is not the correctness function, QoE does not have an explicit representation; we present upper and lower bounds for QoE in the Appendix. Using Lemma 1, we characterize when QoE decreases in the following theorem.
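The representation in Lemma 1 follows from Equation (1): with the correctness function, if a fraction $a$ of the $M$ predictors is correct, the correct ones each carry weight $e^{\beta}$ and the others weight one. The short Python check below is an illustration, not a proof; it compares this closed form with a direct evaluation of Equation (1) for a hypothetical $M = 10$ and also checks the pointwise inequality behind Equation (2).

```python
import math


def qoe_given_average(a, beta):
    """Closed form: P(selected predictor is correct | a fraction a is correct)."""
    return a * math.exp(beta) / (1.0 + (math.exp(beta) - 1.0) * a)


def qoe_direct(num_correct, m, beta):
    """Same probability from the softmax weights of Equation (1) with q in {0, 1}."""
    weights = [math.exp(beta)] * num_correct + [1.0] * (m - num_correct)
    return sum(weights[:num_correct]) / sum(weights)


m = 10
for beta in (0.0, 0.5, 3.0):
    for k in range(m + 1):
        a = k / m
        assert abs(qoe_given_average(a, beta) - qoe_direct(k, m, beta)) < 1e-12
        # Pointwise version of Equation (2): the closed form is never below a.
        assert qoe_given_average(a, beta) >= a - 1e-12
```

Taking the expectation of the pointwise inequality over $(X, Y)$ gives the bound in Equation (2).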
Theorem 1 (Comparison of two competition dynamics).
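Suppose there are two sets of predictors, $\mathcal{C}_1$ and $\mathcal{C}_2$. Without loss of generality, assume that the overall quality for $\mathcal{C}_1$ is larger than that for $\mathcal{C}_2$. For the correctness function $q$, we define $A_1$ and $A_2$ as in Lemma 1. If $\mathbb{E}[A_1] - \mathbb{E}[A_2] < c_1$ and $\mathrm{Var}(A_1) - \mathrm{Var}(A_2) > c_2$ for some explicit constants $c_1$ and $c_2$, then QoE for $\mathcal{C}_1$ is smaller than that for $\mathcal{C}_2$.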
Theorem 1 compares two competition dynamics and gives a condition under which QoE for $\mathcal{C}_1$ is smaller than that for $\mathcal{C}_2$ even though the associated overall quality is larger. For ease of understanding, we can regard $\mathcal{C}_1$ (resp. $\mathcal{C}_2$) as a set of ML predictors with a large (resp. small) budget $B$. Theorem 1 implies that QoE decreases when $\mathrm{Var}(A_1)$ is sufficiently greater than $\mathrm{Var}(A_2)$. Our results in Figure 3 show that data purchase makes competing predictors similar to each other when $B$ is large enough, so the average quality is more likely to be close to zero or one. This increases the variance, because a random variable taking values in $[0, 1]$ with a given mean has the largest variance when it is concentrated on $\{0, 1\}$. Theorem 1 thus supports our experimental results by explaining why QoE decreases after data purchase as $B$ increases.
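The mechanism behind Theorem 1 can be seen in a small numerical example. The Python sketch below uses two hypothetical distributions of the average quality, chosen for illustration only and not taken from our experiments: for the set labeled `c2`, every user sees the same intermediate average quality, while for `c1` the predictors are either all correct or all wrong. It evaluates both quality metrics through the representation of Lemma 1; it illustrates the phenomenon and does not compute the constants of the theorem.

```python
import math


def g(a, beta):
    """Lemma 1: P(selected predictor is correct | average quality a)."""
    return a * math.exp(beta) / (1.0 + (math.exp(beta) - 1.0) * a)


def summarize(dist, beta):
    """dist: list of (value of A, probability). Returns (overall, QoE, Var(A))."""
    mean = sum(p * a for a, p in dist)
    var = sum(p * (a - mean) ** 2 for a, p in dist)
    qoe = sum(p * g(a, beta) for a, p in dist)
    return mean, qoe, var


beta = 2.0
# Hypothetical diverse predictors: A equals 0.6 for every user.
c2 = [(0.6, 1.0)]
# Hypothetical similar predictors: all right (A = 1) or all wrong (A = 0).
c1 = [(1.0, 0.62), (0.0, 0.38)]
for name, dist in (("c1", c1), ("c2", c2)):
    overall, qoe, var = summarize(dist, beta)
    print(f"{name}: overall={overall:.3f}  QoE={qoe:.3f}  Var(A)={var:.3f}")
```

Here `c1` has the higher overall quality but, because its average quality has a much larger variance, a lower QoE than `c2`.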
V Related works
This work builds on and extends the recent paper [11], which studied the impacts of competition. We extend this setting by incorporating an AL-based data purchase procedure into the competition system. Note that the setting of [11] is a special case of ours when $B^{(i)} = 0$ for all $i \in [M]$. Our environment enables us to study the impacts of data acquisition in competition, which was not considered in the previous work. Whereas the previous work showed that competing predictors become too focused on subpopulations, our work suggests that this specialization can be beneficial in that it provides a variety of different options and better quality of the predictors selected by users.
A related field is stream-based AL, where the problem is to learn an algorithm that finds effective data points to label from a stream of data points [9, 10]. Our competition environment has unique features that cannot be described by AL. In AL, since there is only one agent, competition cannot arise. In addition, while an agent in AL collects data only through label queries, competing predictors in our environment can obtain data through data purchase as well as regular competition. These differences create a unique competition environment, and this work studies the impacts of data purchase in competitive systems.
Competition has been studied in multi-agent reinforcement learning (MARL), which considers a setting where a group of agents in a common environment interact with each other and with the environment [21, 22]. Competing agents in MARL maximize their own objectives, which may conflict with those of others. This setting is often characterized by zero-sum Markov games and has been applied to video games such as Pong and StarCraft II [23, 24, 25]. We refer to [26] for a complementary literature survey of MARL.
There are some similarities between MARL and our environment, but in our environment, user selection and data purchase uniquely define the interaction between users and ML predictors. In MARL, all agents observe information (e.g., state and reward) drawn from the shared environment and use it to update their policy functions. In contrast, in our environment, only the selected predictor obtains the label and updates its model. In addition, ML predictors can collect data points through data purchase. These settings have not been considered in the MARL literature.
VI Conclusion
In this paper, to characterize the nature of competition and data purchase, we propose a new competitive environment that allows predictors to actively acquire user labels. Our results show that even though data purchase improves the quality that predictors provide, it can decrease the quality that users experience. We explain this counterintuitive finding by demonstrating that data purchase makes competing predictors similar to each other.
References
[1] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: Item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.
[2] P. Covington, J. Adams, and E. Sargin, "Deep neural networks for YouTube recommendations," in Proceedings of the 10th ACM Conference on Recommender Systems, 2016, pp. 191–198.
[3] B. Marr, "The amazing ways eBay is using artificial intelligence to boost business success," https://www.forbes.com/sites/bernardmarr/2019/04/26/the-amazing-ways-ebay-is-using-artificial-intelligence-to-boost-business-success, 2019, posted April 26, 2019; retrieved May 19, 2021.
[4] J. Nanduri, Y. Jia, A. Oka, J. Beaver, and Y.-W. Liu, "Microsoft uses machine learning and optimization to reduce e-commerce fraud," INFORMS Journal on Applied Analytics, vol. 50, no. 1, pp. 64–79, 2020.
[5] J. Meierhofer, T. Stadelmann, and M. Cieliebak, "Data products," in Applied Data Science. Springer, 2019, pp. 47–61.
[6] K. Sennaar, "How America's top 4 insurance companies are using machine learning," https://emerj.com/ai-sector-overviews/machine-learning-at-insurance-companies, 2019, posted February 26, 2020; retrieved May 19, 2021.
 [7] S. Arumugam and R. Bhargavi, “A survey on driving behavior analysis in usage based insurance using big data,” Journal of Big Data, vol. 6, no. 1, pp. 1–21, 2019.
 [8] Y. Jin and S. Vasserman, “Buying data from consumers: The impact of monitoring programs in us auto insurance,” Unpublished manuscript. Harvard University, Department of Economics, Cambridge, MA, 2019.
[9] B. Settles, "Active learning literature survey," University of Wisconsin-Madison Department of Computer Sciences, Tech. Rep., 2009.
[10] P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, and X. Wang, "A survey of deep active learning," arXiv preprint arXiv:2009.00236, 2020.
[11] T. Ginart, E. Zhang, Y. Kwon, and J. Zou, "Competing AI: How does competition feedback affect machine learning?" in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1693–1701.
[12] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, "Selective sampling using the query by committee algorithm," Machine Learning, vol. 28, no. 2, pp. 133–168, 1997.
[13] I. Žliobaitė, A. Bifet, B. Pfahringer, and G. Holmes, "Active learning with drifting streaming data," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 1, pp. 27–39, 2013.
[14] B. Settles and M. Craven, "An analysis of active learning strategies for sequence labeling tasks," in Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008, pp. 1070–1079.
[15] J. Rowley, "Promotion and marketing communications in the information marketplace," Library Review, 1998.
 [16] M. Familmaleki, A. Aghighi, and K. Hamidi, “Analyzing the influence of sales promotion on customer purchasing behavior,” International Journal of Economics & management sciences, vol. 4, no. 4, pp. 1–6, 2015.
 [17] I. Reimers and B. R. Shiller, “The impacts of telematics on competition and consumer behavior in insurance,” The Journal of Law and Economics, vol. 62, no. 4, pp. 613–632, 2019.
[18] P. Van Der Putten and M. van Someren, "CoIL Challenge 2000: The insurance company case," Technical Report 2000–09, Leiden Institute of Advanced Computer Science, 2000.
[19] D. Dua and C. Graff, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml
[20] A. Gardner, C. A. Duncan, J. Kanno, and R. Selmic, "3D hand posture recognition from small unlabeled point sets," in 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2014, pp. 164–169.
[21] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, P. Abbeel, and I. Mordatch, "Multi-agent actor-critic for mixed cooperative-competitive environments," in Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.
[22] J. Foerster, N. Nardelli, G. Farquhar, T. Afouras, P. H. Torr, P. Kohli, and S. Whiteson, "Stabilising experience replay for deep multi-agent reinforcement learning," in International Conference on Machine Learning. PMLR, 2017, pp. 1146–1155.
[23] M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Machine Learning Proceedings 1994. Elsevier, 1994, pp. 157–163.
[24] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, "Multiagent cooperation and competition with deep reinforcement learning," PLoS ONE, vol. 12, no. 4, p. e0172395, 2017.
[25] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., "Grandmaster level in StarCraft II using multi-agent reinforcement learning," Nature, vol. 575, no. 7782, pp. 350–354, 2019.
[26] K. Zhang, Z. Yang, and T. Başar, "Multi-agent reinforcement learning: A selective overview of theories and algorithms," arXiv preprint arXiv:1911.10635, 2019.
[27] C.-C. Chang and C.-J. Lin, "LIBSVM: A library for support vector machines," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1–27, 2011.
[28] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms," arXiv preprint arXiv:1708.07747, 2017.
[29] Y. LeCun, C. Cortes, and C. Burges, "MNIST handwritten digit database," AT&T Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
[30] A. Krizhevsky, G. Hinton et al., "Learning multiple layers of features from tiny images," 2009.
 [31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
Implementation details
In this section, we provide implementation details. We explain the user distribution, ML predictors, and the proposed environment.
Datasets (user distribution)
As for the datasets (user distribution $P_{X,Y}$), we use seven datasets throughout the paper: Insurance [18], Adult [19], Postures [20], Skin-nonskin [27], Fashion-MNIST [28], MNIST [29], and CIFAR-10 [30]. For all datasets, we first split the dataset into competition and evaluation datasets: the competition dataset is used during the competition rounds, and the evaluation dataset is used to evaluate the metrics after the competition. For Fashion-MNIST, MNIST, and CIFAR-10, we use the original training and test datasets as the competition and evaluation datasets, respectively. For Insurance, Adult, Postures, and Skin-nonskin, we randomly sample 5,000 data points from the original dataset to form the evaluation dataset and use the remaining data points as the competition dataset. At each round of competition, we randomly sample one data point from the competition dataset. After the competition rounds, we randomly sample points from the evaluation dataset and evaluate the metrics (the overall quality, QoE, and diversity). Note that all experimental results are based on the evaluation dataset. Table I summarizes the seven datasets used in our experiments.
As for preprocessing, we standardize Skin-nonskin to have zero mean and unit standard deviation. For the two image datasets, MNIST and CIFAR-10, we apply channel-wise standardization. We do not apply any preprocessing to the other datasets. To reflect the randomness in customers' selections, we perturb the original labels: for every dataset, each label is replaced by a random label with probability 30%. This random perturbation is applied to both the competition and evaluation datasets.
Table I: Summary of the datasets used in our experiments.
Dataset | Size of competition dataset | Size of evaluation dataset | Input dimension | # of classes
Insurance | 13823 | 5000 | 16 | 2
Adult | 43842 | 5000 | 108 | 2
Postures | 69975 | 5000 | 15 | 5
Skin-nonskin | 239057 | 5000 | 3 | 2
Fashion-MNIST | 60000 | 10000 | 784 | 10
MNIST | 60000 | 10000 | 784 | 10
CIFAR-10 | 50000 | 10000 | 3072 | 10
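The data split and label perturbation described above can be sketched as follows. Two details are assumptions of this sketch rather than statements from the text: the evaluation points are drawn without replacement with Python's `random` module, and a perturbed label is drawn uniformly from all classes, so it may coincide with the original label. Both function names are hypothetical.

```python
import random


def split_competition_evaluation(data, eval_size=5000, seed=0):
    """Hold out eval_size random points for evaluation; keep the rest for competition."""
    rng = random.Random(seed)
    eval_idx = set(rng.sample(range(len(data)), eval_size))
    evaluation = [data[i] for i in sorted(eval_idx)]
    competition = [data[i] for i in range(len(data)) if i not in eval_idx]
    return competition, evaluation


def perturb_labels(labels, num_classes, noise_rate=0.3, seed=0):
    """With probability noise_rate, replace a label by a uniformly drawn class."""
    rng = random.Random(seed)
    return [rng.randrange(num_classes) if rng.random() < noise_rate else y
            for y in labels]
```

Under this uniform-draw assumption, `perturb_labels(labels, num_classes=2)` changes about 15% of binary labels in expectation, since half of the random draws reproduce the original label.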
ML predictors
We fix the number of predictors to throughout our experiments. For each dataset, which defines one competition environment, we consider a homogeneous setting, i.e., all predictors have the same number of seed data $n$, budget $B$, model $f$, and buying strategy $\pi$. As for the buying strategy, we fix $\pi(x) = \mathbb{1}\{H(\hat{p}(x)) > \tau\}$, where $H(\hat{p}(x))$ is the Shannon entropy of $\hat{p}(x)$ and $\hat{p}(x)$ is the corresponding probability estimate for $x$. That is, if the entropy is higher than the predefined threshold $\tau$, a predictor decides to buy the user data. Note that $\log |\mathcal{Y}|$ is the Shannon entropy of the uniform distribution on $\mathcal{Y}$, the largest value $H$ can take.
Table II summarizes the seed data and the model for each dataset. Every ML predictor is initially trained on its seed data points. For all experiments, we use the Adam optimizer [31] with the specified learning rate and number of epochs. The batch size is fixed to . If a predictor is selected, its ML model is updated with one iteration on the newly obtained data point, and the model is retrained whenever 'retrain period' new samples have been obtained.
Table II: A summary of hyperparameters related to ML predictors by dataset. We consider homogeneous predictors unless otherwise specified. Logistic denotes a logistic model and NN denotes a neural network with one hidden layer.
Dataset | Seed data | Model | # of hidden nodes | Epochs | Learning rate | Retrain period
Insurance | 100 | Logistic | – | 10 |  | 50
Adult | 100 | Logistic | – | 10 |  | 50
Postures | 200 | Logistic | – | 10 |  | 50
Skin-nonskin | 50 | Logistic | – | 10 |  | 50
Fashion-MNIST | 50 | NN | 400 | 30 |  | 150
MNIST | 50 | NN | 400 | 30 |  | 150
CIFAR-10 | 100 | NN | 400 | 30 |  | 150
Proposed environment
In Environment 1, we describe our proposed environment.
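For readers who prefer code, the following is a minimal, self-contained Python sketch of the competition loop described in Section II. It rests on several simplifying assumptions of our own: a hypothetical count-based classifier over discrete inputs (`CountModel`) replaces the logistic and neural network models, all predictors share one entropy threshold `tau`, models are updated incrementally instead of on the retrain schedule of Table II, and `sample_user` is a made-up user distribution. It illustrates the dynamics and is not the implementation used in the experiments.

```python
import math
import random
from collections import Counter, defaultdict


class CountModel:
    """Hypothetical toy model for discrete inputs: Laplace-smoothed label
    frequencies per input value, standing in for the logistic/NN models."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.counts = defaultdict(Counter)

    def update(self, x, y):
        self.counts[x][y] += 1

    def predict_proba(self, x):
        c = self.counts[x]
        total = sum(c.values()) + self.num_classes
        return [(c[y] + 1) / total for y in range(self.num_classes)]


def entropy(p):
    return -sum(v * math.log(v) for v in p if v > 0)


def run_competition(stream, seed_sets, budget, tau, beta, num_classes, rng):
    """Run the competition rounds for M = len(seed_sets) homogeneous predictors."""
    models = []
    for seed_data in seed_sets:               # initialization on seed data
        model = CountModel(num_classes)
        for x, y in seed_data:
            model.update(x, y)
        models.append(model)
    budgets = [budget] * len(models)
    for x, y in stream:                       # one user (X_t, Y_t) per round
        probs = [mdl.predict_proba(x) for mdl in models]
        buyers = [i for i, p in enumerate(probs)
                  if budgets[i] > 0 and entropy(p) > tau]
        if buyers:                            # a buyer is chosen uniformly
            w = rng.choice(buyers)
            budgets[w] -= 1
        else:                                 # Eq. (1) with correctness quality
            preds = [max(range(num_classes), key=p.__getitem__) for p in probs]
            weights = [math.exp(beta * float(pred == y)) for pred in preds]
            w = rng.choices(range(len(models)), weights=weights, k=1)[0]
        models[w].update(x, y)                # only W_t learns from (X_t, Y_t)
    return models, budgets


def sample_user(rng):
    """Hypothetical user distribution: 5 input values, label x mod 2 w.p. 0.8."""
    x = rng.randrange(5)
    y = x % 2 if rng.random() < 0.8 else 1 - x % 2
    return x, y


rng = random.Random(0)
stream = [sample_user(rng) for _ in range(500)]
seed_sets = [[sample_user(rng) for _ in range(3)] for _ in range(4)]
models, budgets = run_competition(stream, seed_sets, budget=20,
                                  tau=0.9 * math.log(2), beta=2.0,
                                  num_classes=2, rng=rng)
print("remaining budgets:", budgets)
```

Swapping `CountModel` for any object with `update` and `predict_proba` methods keeps the loop unchanged, which mirrors the environment's lack of assumptions about the predictors' algorithms.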
Additional numerical experiments
In this section, we provide additional experimental results to demonstrate the robustness of our conclusions against (i) different distributions in the homogeneous setting (subsection A) and (ii) different modeling assumptions in the heterogeneous setting. As for the heterogeneous setting, we consider different buying strategies (subsection B), budgets (subsection C), and the number of competing predictors (subsection D).
A Additional results in the homogeneous setting
In Figure 4, we compare QoE and the overall quality for various user distributions: the Skin-nonskin [27], MNIST [29], Fashion-MNIST [28], and CIFAR-10 [30] datasets. As in Figure 2, the quality that users experience can decrease while the overall quality increases as the budget increases. In addition, Figure 5 shows the diversity as a function of the budget for the four datasets, as in Figure 3.
Implications of data purchase on diversity
We also compare the class-specific qualities of competing predictors. In Figure 6, we illustrate heatmaps of the difference , where is the class-specific quality defined as
for and , and for is its average over predictors, defined as . We use the Insurance and Adult datasets. When , regular competition takes place, and the Adult heatmap shows that predictors 1 and 5 specialize so heavily in class 2 prediction that they sacrifice predictive power on class 1 relative to the other predictors. However, when , all predictors have similar levels of class-specific quality. Data purchase makes competing ML predictors more similar and prevents them from focusing too heavily on a subgroup of the population.
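The formulas for the class-specific quality and the difference plotted in Figure 6 were lost in extraction. Assuming the class-specific quality of predictor i on class c is its average quality over samples with label c, and using the average over predictors described above, the heatmap entries could be computed as in this hypothetical NumPy helper (`class_quality_gap` and the quality-matrix layout are our own choices).

```python
import numpy as np


def class_quality_gap(Q, labels, n_classes):
    """Hypothetical helper for a Figure 6 style heatmap.

    Q[i, n] is the quality of predictor i on sample n; labels[n] is the class of sample n.
    Returns D[i, c] = q[i, c] - mean_i q[i, c], where q[i, c] averages predictor i's
    quality over the samples whose label is c (every class must appear at least once).
    """
    Q = np.asarray(Q, dtype=float)
    labels = np.asarray(labels)
    # q has shape (number of predictors, number of classes)
    q = np.stack([Q[:, labels == c].mean(axis=1) for c in range(n_classes)], axis=1)
    # Subtract the per-class average over predictors
    return q - q.mean(axis=0, keepdims=True)


# Usage: predictor 0 is good only on class 0, predictor 1 only on class 1
D = class_quality_gap([[1, 1, 0, 0], [0, 0, 1, 1]], labels=[0, 0, 1, 1], n_classes=2)
print(D)  # [[ 0.5 -0.5]
          #  [-0.5  0.5]]
```

Each column of the result sums to zero, so a heatmap of near-zero entries indicates predictors with similar class-specific quality, while large positive and negative entries in the same column indicate specialization.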
[11] showed that competing ML predictors overly specialize in subpopulations when , which is also observed in Figure 6; our results further show that this specialization can be alleviated when predictors purchase user data. However, as discussed in Section III-A, data purchase can hurt the quality of the predictor selected by a user. To explain this phenomenon, we show that the reduction in diversity increases the probability of encountering low-quality predictions. Figure 7 illustrates the probability density functions of the average quality near zero.
It clearly shows that the probability that the average quality is near zero increases as more budget is used: the areas for (colored in yellow) are clearly larger than those for (colored in red). That is, as predictions become similar, it is more likely that all ML predictions are poor at the same time, and thus the probability that users are not satisfied with the predictions increases.
B Different buying strategies
We use the same setting as the homogeneous setting of Section III, but with different buying strategies across predictors. Specifically, we consider three types of buying strategies obtained by varying the threshold of the uncertainty-based AL method. For constants , we consider the buying strategy models , where is Shannon's entropy function and is the probability estimate given . We assume there are predictors for each buying strategy , , and . This assumption models three groups with different levels of sensitivity to data purchase. For instance, in our setting, is the most conservative data buyer.
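A minimal sketch of the heterogeneous buying strategies, assuming each strategy buys a point when the Shannon entropy (in nats) of the predictor's probability estimate exceeds a group-specific threshold. The three threshold values below are hypothetical placeholders, since the constants used in the experiments are not recoverable from this text; the sketch only shows that a larger threshold yields a more conservative buyer on the same stream of estimates.

```python
import numpy as np


def shannon_entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


def buys(probs, threshold):
    """Uncertainty-based buying rule: buy the point if the entropy exceeds the threshold."""
    return shannon_entropy(probs) > threshold


# Hypothetical thresholds for three groups; a larger threshold means a more conservative buyer
thresholds = {"aggressive": 0.1, "moderate": 0.4, "conservative": 0.65}
# Binary probability estimates for five users (entropies ~0.69, 0.67, 0.61, 0.33, 0.06)
stream = [[p, 1 - p] for p in (0.5, 0.6, 0.7, 0.9, 0.99)]
counts = {name: sum(buys(pr, t) for pr in stream) for name, t in thresholds.items()}
print(counts)  # {'aggressive': 4, 'moderate': 3, 'conservative': 2}
```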
Figure 8 shows the relationship between QoE and overall quality on the seven datasets when heterogeneous competing predictors use different buying strategies. As in the homogeneous case (Figure 2), the overall quality increases but QoE can decrease. Similarly, Figure 9 shows that diversity decreases as the budget increases. These results demonstrate the robustness of our findings to different environment settings. In addition, we provide probability density plots of the average quality near zero; Figure 10 shows a trend similar to that of the homogeneous setting in Figure 7.
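The mechanism invoked for Figures 7 and 10, namely that similar predictors tend to fail on the same users, can be seen in a deliberately simple toy model that is not one of the paper's experiments: each of M predictors has quality 1 (correct) with probability p and 0 otherwise. Independent predictors rarely all fail together, whereas identical predictors fail together whenever any one of them fails. The function name and the binary-quality assumption are hypothetical.

```python
from math import comb


def prob_avg_quality_at_most(eps, M, p, identical):
    """Toy model: each of M predictors has quality 1 w.p. p and 0 otherwise (eps >= 0).

    Returns P(average quality over the M predictors <= eps).
    """
    if identical:
        # All predictors make the same prediction: the average is 0 w.p. 1 - p, 1 w.p. p
        return (1 - p) * (eps >= 0) + p * (eps >= 1)
    # Independent predictors: the number of correct predictors is Binomial(M, p)
    return sum(comb(M, k) * p**k * (1 - p) ** (M - k)
               for k in range(M + 1) if k / M <= eps)


# Probability that every predictor fails on a user (average quality equal to zero)
indep = prob_avg_quality_at_most(0.0, M=5, p=0.8, identical=False)
ident = prob_avg_quality_at_most(0.0, M=5, p=0.8, identical=True)
```

With five predictors that are each correct 80% of the time, the chance that every predictor fails on a given user is 0.2^5 = 0.00032 under independence but 0.2 when all predictors are identical, even though each predictor's individual accuracy is unchanged.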
C Different budgets
Here, we use the homogeneous setting but with different budgets, on the Insurance, Adult, and Skin-nonskin datasets. For , we assume that the first predictors have budgets, while the last predictors have budgets. That is, half of the predictors have half the budget of the other group. Figure 11 shows that the main trends persist even when budgets differ. This again shows the robustness of our results to different modeling assumptions.
D Different number of competing predictors
In this section, we show that our findings are not affected by the number of competing predictors in a market. All the experiments in Section III consider . Here, we consider the homogeneous setting but with a different number of competing predictors, or . As Figures 12 and 13 show, the main trends persist when the number of predictors is changed.
Proofs and additional theoretical results
We provide proofs of Lemma 1 and Theorem 1 in subsection E and analyze QoE for a general quality function in subsection F.
E Proofs
Proof of Lemma 1.
For notational convenience, we set for .
(3) 
where for ,
Since , for , we have
and
Since and is an increasing function, this completes the proof. ∎
Proof of Theorem 1.
For and , let , , and . Note that
Thus, we have
For , we have
Therefore, since , we have
(4) 
Let and . From the inequalities (4), we have
The last equality is due to . Therefore, QoE decreases if
where
Therefore, if there is a constant such that , the proof is complete.
By definition of , it is positive when .
Setting completes the proof.
∎
F QoE for a general quality function
The following theorem shows the upper and lower bounds of QoE for a general quality function.
Theorem 2.
Suppose there is a set of prediction models . For any nonnegative function and , we have the following upper and lower bounds.
where denotes the selected index.
Proof of Theorem 2.
We use the same notation as in the proof of Lemma 1. We first show that QoE is an increasing function of . From representation (3), we have
where . From the last equality, we have
The nonnegativity follows from the Cauchy-Schwarz inequality. We now prove the upper bound. Note that
and the equality holds when . Therefore, taking expectations on both sides gives the upper bound. As for the lower bound, due to representation (3), it suffices to show that
Since QoE is an increasing function, plugging in gives . This completes the proof. ∎
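The monotonicity argument can be checked numerically. The formulas did not survive extraction, so this sketch assumes that a user selects predictor j with softmax probability proportional to exp(β q_j) for an inverse temperature β ≥ 0. Under that assumption, the derivative of the softmax-weighted average of the qualities with respect to β equals their weighted variance, which is nonnegative, and the two extremes are the average over predictors at β = 0 and the maximum as β grows. The function `qoe` and the uniform random qualities are illustrative assumptions, not the paper's setup.

```python
import numpy as np


def qoe(Q, beta):
    """Expected quality of the selected predictor under an assumed softmax choice rule.

    Q[n, j] is the (nonnegative) quality of predictor j for user n.
    """
    Q = np.asarray(Q, dtype=float)
    logits = beta * Q
    # Subtract the row maximum before exponentiating for numerical stability
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return float((w * Q).sum(axis=1).mean())


rng = np.random.default_rng(0)
Q = rng.uniform(size=(1000, 5))     # illustrative qualities for 1000 users, 5 predictors
betas = [0.0, 0.5, 1.0, 2.0, 5.0, 10.0, 50.0]
values = [qoe(Q, b) for b in betas]
lower = Q.mean()                    # average quality over predictors (beta = 0)
upper = Q.max(axis=1).mean()        # expected best quality (limit as beta grows)
```

Under this assumed choice rule, `values` is nondecreasing in β, starts at `lower`, and stays below `upper`, mirroring the structure of the argument above.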