Competition over data: how does data purchase affect users?

01/26/2022
by Yongchan Kwon, et al.
Stanford University

As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems which has not been well-studied before. We find that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience, i.e., the accuracy of the predictor selected by each user, can decrease even as the individual predictors get better. We show that this phenomenon naturally arises due to a trade-off whereby competition pushes each predictor to specialize in a subset of the population while data purchase has the effect of making predictors more uniform. We support our findings with both experiments and theory.


I Introduction

It is becoming increasingly common for companies to make use of machine learning (ML) predictions in their services [1, 2, 3, 4]. When several companies on the market offer similar ML-based services, customers choose the single service they prefer the most, arguably the highest-quality service within their budget. In this user selection process, the customer pays a fee to the company that provides the service, which naturally creates competition among companies on the market. As a result, competing companies strive to produce high-quality ML predictions to attract more customers, which leads them to buy customer data or subscribe to data marketplaces [5].

For example, the competing companies in the U.S. auto insurance market, namely State Farm, Progressive, and AllState, use ML predictions to analyze customer data, assess risk, and adjust premiums [6]. They offer a type of insurance called Pay-How-You-Drive, which provides cheaper premiums than regular auto insurance on the condition that the insurer monitors driving patterns such as rapid acceleration or oscillations in speed [7, 8]. That is, auto insurance companies provide financial benefits to customers for the purpose of collecting the customers' driving pattern data. Using the purchased data, the companies develop better ML models for their business (e.g., updating the insurance recommendation model or reassessing risks) while competing with each other.

Analyzing the effects of data purchase under competition could have practical implications, but it has not been studied much in the ML literature. The effects of data acquisition have been considered extensively in active learning (AL), the problem of finding effective data to label [9, 10]. However, since AL usually assumes a single-agent setting, it is not straightforward to use it to build competing systems, which require more than one competitor. Recently, [11] examined the impacts of competition by modeling an environment where several predictors compete for user data. They showed that competition pushes competing predictors to focus on a small subset of the population and helps users find high-quality predictions. Their work describes interesting implications of competition, but it does not investigate the impact of data purchase, which is the main focus of our work. Our proposed environment can represent a general case in which users change their choice when a company is willing to offer a financial benefit (see Figure 1). Related works are further discussed in Section V.

Fig. 1: Illustrations of our competition environment (left) when a company shows purchase intent and (right) when no company shows purchase intent. In step 1, each predictor receives a user query and decides whether to buy the user data. In step 2, (left) if a company thinks the data is worth buying to improve the predictability of its ML models, it shows purchase intent; we suppose the user prefers the financial benefit and selects the buyer. (Right) If no company thinks the user data is worth buying, regular competition proceeds: the user selects one company based on the received predictions. In step 3, only the selected predictor gets the user label and updates its model. We provide details on the environment in Section II.

Contributions

In this paper, we study what happens when competing predictors can purchase customer data to improve their ML models. We propose a novel environment that can simulate various real-world competitions. Our environment allows ML predictors to purchase labeled data via AL algorithms within a finite budget while competing against each other (Section II). Surprisingly, our results show that when competing predictors purchase data, the quality of the predictions selected by each user can decrease even as the competing ML predictors get better (Section III-A). We explain this counterintuitive finding by demonstrating that data purchase makes competing predictors similar to each other (Section III-B). To support our empirical findings, we theoretically analyze how the diversity of users' available options can affect the user experience (Section IV).

II A general environment for competition and data purchase

This section formally introduces a new and general competitive environment that allows competing predictors to acquire data points within a finite budget. In our environment, competition is represented by a series of interactions between a sequence of users and a fixed number of competing predictors, where each interaction is modeled as a supervised learning task. To be precise, we first define some notation.

Notations

For $t \in \{1, \dots, T\}$, we denote a user query by $X_t \in \mathcal{X}$ and its associated user label by $Y_t \in \mathcal{Y}$. Throughout this paper, we focus on classification problems, i.e., $\mathcal{Y}$ is finite, while our environment easily extends to regression. We denote a user stream by the set $\{(X_t, Y_t)\}_{t=1}^{T}$ and assume each user datum is an independent and identically distributed (i.i.d.) sample from a distribution $\mathcal{P}$. We call $\mathcal{P}$ the user distribution. As for the predictor side, we suppose there are $M$ competing predictors in the market. For $j \in \{1, \dots, M\}$, each ML predictor is described as a tuple $C_j = (n_j, B_j, f_j, \phi_j)$, where $n_j$ is the number of i.i.d. seed data points from $\mathcal{P}$, $B_j$ is a budget, $f_j$ is an ML model, and $\phi_j$ is a buying strategy.

We consider that predictor $j$ initially owns the $n_j$ seed data points and can additionally purchase user data within its budget $B_j$. We assume the price of one data point is one unit, i.e., predictor $j$ can purchase up to $B_j$ data points from the sequence of users. A predictor produces a prediction $f_j(X_t)$ using its ML model and determines whether to buy the user data with the buying strategy $\phi_j$. We take the utility function for predictor $j$ to be the classification accuracy of $f_j$ with respect to the user distribution $\mathcal{P}$. Lastly, $f_j$ and $\phi_j$ are allowed to be updated during the competition rounds. That is, companies strive to provide good-quality predictions and keep improving their ML models. In the following, we elaborate on the details of the competition dynamics.
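To make the tuple concrete, the following minimal Python sketch represents one predictor. The class and attribute names are our own illustration, not the authors' released code.

from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Predictor:
    n_seed: int                # n_j: number of i.i.d. seed data points
    budget: int                # B_j: remaining purchase budget
    model: Any                 # f_j: classifier exposing fit/predict/predict_proba
    buying_strategy: Callable  # phi_j: maps (model, x) to 0 (skip) or 1 (buy)

    def wants_to_buy(self, x) -> bool:
        # Purchase intent requires both a positive remaining budget
        # and phi_j(x) = 1, as described in the text.
        return self.budget > 0 and self.buying_strategy(self.model, x) == 1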

Competition dynamics

Before the first round, all the competing predictors independently train their models with their seed data points. After this initialization, at each round $t$, a user sends a query $X_t$ to all the predictors, and each predictor determines whether to buy the user data. We describe this decision with the buying strategy $\phi_j$: if predictor $j$ thinks the labeled datum would be worth one unit of budget, we denote this by $\phi_j(X_t) = 1$; otherwise, $\phi_j(X_t) = 0$. As for $\phi_j$, ML predictors can use any stream-based AL algorithm [12, 13]. For instance, a predictor can use the uncertainty-based AL rule [14]: predictor $j$ considers purchasing the user data if its current prediction is uncertain, namely if the Shannon entropy of $\hat{p}_{j,t}(\cdot \mid X_t)$ is higher than some predefined threshold $\tau$, where $\hat{p}_{j,t}$ is the probability estimate at the $t$-th round. In brief, predictor $j$ shows purchase intent if its remaining budget is greater than zero and $\phi_j(X_t) = 1$. If the remaining budget is zero or $\phi_j(X_t) = 0$, then predictor $j$ provides the prediction $f_j(X_t)$ to the user instead of showing purchase intent.
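As an illustration of the entropy-based rule above, the following Python sketch implements a buying strategy $\phi_j$. The scikit-learn-style predict_proba interface and the threshold value are assumptions for illustration.

import numpy as np

def entropy_buying_strategy(model, x, threshold):
    # Uncertainty-based rule [14]: buy when the Shannon entropy of the
    # model's probability estimate p_hat(. | x) exceeds the threshold tau.
    p = model.predict_proba(np.asarray(x).reshape(1, -1))[0]
    entropy = -np.sum(p * np.log(p + 1e-12))  # guard against log(0)
    return int(entropy > threshold)

With the Predictor sketch above, one would pass, e.g., lambda m, x: entropy_buying_strategy(m, x, threshold=0.5) as the buying strategy; the value 0.5 is only a placeholder.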

We now explain how a user selects one predictor. At every round $t$, the user selects only one predictor based on both the purchase intents and the prediction information received from the $M$ predictors. If there is more than one buyer, we assume that the user selects one of the buyers uniformly at random. This assumption reflects that users prefer the financial advantage (e.g., discounts or coupons) to the model's prediction quality, and thus select one of the companies showing purchase intent. Once selected, only the selected predictor's budget is reduced by one; every other predictor's budget stays the same, since it is not selected and does not have to provide financial benefits. If no predictor shows purchase intent, then the user receives prediction information and chooses a predictor randomly. We assume that higher-quality predictions are more likely to be selected, with probability defined as

$$\mathbb{P}(a_t = j \mid X_t, Y_t) = \frac{\exp\left( q(Y_t, f_j(X_t)) / \lambda \right)}{\sum_{k=1}^{M} \exp\left( q(Y_t, f_k(X_t)) / \lambda \right)}, \qquad (1)$$

where $\lambda > 0$ denotes a temperature parameter and $q$ is a predefined quality function that measures the similarity between the user label and the ML prediction (e.g., the correctness function $q(y, \hat{y}) = \mathbb{1}(y = \hat{y})$). We assume that users are rational in that they are more likely to select high-quality predictions, i.e., $\lambda < \infty$.

We denote the index of the selected predictor by $a_t$. The selected predictor receives the user label $Y_t$ and updates its model by training on the new datum $(X_t, Y_t)$. The other predictors $j \neq a_t$ stay the same.
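The following Python sketch illustrates the regular-competition selection step, i.e., sampling $a_t$ from the softmax rule in Equation (1); function and variable names are illustrative.

import numpy as np

def select_predictor(y_t, predictions, lam, q, rng):
    # Sample the selected index a_t from Equation (1): a softmax over the
    # prediction qualities q(Y_t, f_j(X_t)) with temperature lam.
    scores = np.array([q(y_t, y_hat) for y_hat in predictions]) / lam
    probs = np.exp(scores - scores.max())  # numerically stabilized softmax
    probs /= probs.sum()
    return rng.choice(len(predictions), p=probs)

# Example with the correctness quality function q(y, y_hat) = 1{y = y_hat}:
rng = np.random.default_rng(0)
a_t = select_predictor(1, [0, 1, 1], lam=0.5, q=lambda y, z: float(y == z), rng=rng)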

Example 1 (Auto insurance in Section I).

$X_t$ includes the $t$-th driver's demographic information, insurance claim history, and desired price range, and $Y_t$ is the driver's preferred insurance plan within the user's budget constraints. Each predictor is one insurance company (e.g., State Farm, Progressive, or AllState). If a company finds the $t$-th driver's data worth buying, say because similar data are infrequent in its database, it can offer this driver discounts to attract her and collect her driving pattern data. When no company shows purchase intent, the regular competition begins and each company offers the auto insurance plan it predicts to be most suitable for this driver. The driver then chooses the company whose offered plan is closest to $Y_t$. The acquired data can then be used to improve the company's future predictions.

Characteristics of our environment

It is noteworthy that a user selects only one predictor at each round. This makes user labels a limited resource in the market, so competition among ML predictors naturally arises. This setting was first considered by [11], but we significantly extend it to incorporate AL-based data purchase systems.

Our environment simplifies data purchase and real-world competition, which usually exist in much more complicated forms, yet it has several powerful characteristics. First, our environment is realistic in that it represents the rational preferences of customers and companies in data purchase. Customers are likely to choose the best service within their budget after comparing options, but they can change their selection if a company offers financial benefits such as promotional coupons, discounts, or free services [15, 16, 17]. Although this buying process can be costly for the companies, it enables them to effectively collect user data within finite budgets using AL algorithms. Second, our environment is flexible and takes into account various competition situations. Note that we make no assumptions about the number of competing predictors $M$, the budgets $B_j$, the algorithms for the predictors $f_j$ or the buying strategies $\phi_j$, or the user distribution $\mathcal{P}$.

III Experiments

Using the proposed competition environment, we investigate the impacts of data purchase on the quality and diversity of ML predictions across various user distributions. Our experiments show an interesting phenomenon: data purchase can decrease the quality of the predictor selected by a user even when the quality of the predictors improves on average (Section III-A). In addition, we demonstrate that data purchase makes ML predictors similar to each other: it reduces the effective variety of options, although predictors can avoid specializing in a small subset of the population (Section III-B). In the Appendix, we provide additional experimental results with various competition settings to show the robustness of our findings.

Metrics

To quantitatively measure the effects of data purchase, we introduce three evaluation metrics. First, we define the overall quality as

$$\mathrm{OQ} := \frac{1}{M} \sum_{j=1}^{M} \mathbb{E}\left[ q(Y, f_j(X)) \right], \qquad \text{(Overall quality)}$$

where the expectation is taken over the user distribution $\mathcal{P}$. The overall quality represents the average quality that competing predictors provide in the market. Another type of quality metric is the quality of experience (QoE), the quality of the predictor selected by a user. The QoE is defined as

$$\mathrm{QoE} := \mathbb{E}\left[ q(Y, f_a(X)) \right]. \qquad \text{(QoE)}$$

Here, $a$ is the random variable for the selected index defined in Equation (1), and the expectation is considered over the random variables $(X, Y, a)$. Considering that a user selects one predictor based on Equation (1), QoE can be regarded as the utility of users. Note that the overall quality and QoE capture different aspects of prediction quality, and they are equal only when users select a predictor uniformly at random, i.e., when $\lambda \to \infty$ (see Lemma 1).

Next, we define the diversity to quantify how variable the ML predictions are. To be more specific, for $y \in \mathcal{Y}$, we define the proportion of predictors whose prediction is $y$ as $\rho_y(X) := \frac{1}{M} \sum_{j=1}^{M} \mathbb{1}(f_j(X) = y)$, and the diversity is defined as

$$\mathrm{Div} := \mathbb{E}\left[ -\sum_{y \in \mathcal{Y}} \rho_y(X) \log \rho_y(X) \right], \qquad \text{(Diversity)}$$

where the expectation is taken over the marginal distribution of $X$ and we use the convention $0 \log 0 = 0$. Note that the diversity is the expected Shannon entropy of the competing ML predictions. When there are many different options a user can choose from, the diversity is likely to be large.

In our experiments, we run $T$ competition rounds and, to capture the effects of data purchase in the competition dynamics, evaluate the three metrics after the rounds end. That is, we do not perform the data purchase procedure during evaluation. Since it is difficult to compute the exact expectations, we estimate the three metrics with held-out test data that are not used during the competition rounds.
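As a concrete reference, the following Python sketch estimates the three metrics on held-out data, assuming classifiers with a scikit-learn-style predict method and the correctness quality function. It is a minimal sketch, not the authors' evaluation code.

import numpy as np

def evaluate_metrics(models, X_eval, y_eval, lam, rng):
    M, n = len(models), len(y_eval)
    preds = np.stack([m.predict(X_eval) for m in models])   # shape (M, n)
    correct = (preds == y_eval[None, :]).astype(float)      # q(Y, f_j(X))

    overall_quality = correct.mean()                        # average over j and users

    # QoE: for each user, sample the selected predictor via Equation (1).
    logits = correct / lam
    probs = np.exp(logits - logits.max(axis=0, keepdims=True))
    probs /= probs.sum(axis=0, keepdims=True)
    selected = np.array([rng.choice(M, p=probs[:, i]) for i in range(n)])
    qoe = correct[selected, np.arange(n)].mean()

    # Diversity: expected Shannon entropy of the prediction proportions.
    diversity = 0.0
    for i in range(n):
        _, counts = np.unique(preds[:, i], return_counts=True)
        rho = counts / M                                    # rho_y(X), nonzero entries
        diversity += -(rho * np.log(rho)).sum()
    diversity /= n
    return overall_quality, qoe, diversity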

Implementation protocol

Our experiments consider three real datasets to describe various user distributions $\mathcal{P}$, namely the Insurance [18], Adult [19], and Postures [20] datasets. To minimize the variance caused by other factors, we consider a homogeneous setting: for each competition, all predictors have the same number of seed data points $n$, the same budget $B$, the same classification algorithm for $f_j$, and the same AL algorithm for $\phi_j$. We use either a logistic model or a neural network with one hidden layer for $f_j$ and the standard entropy-based AL rule for $\phi_j$ [14]. Throughout the experiments, we fix the total number of competition rounds $T$ and the number of predictors $M$, and we set the quality function to the correctness function, i.e., $q(y, \hat{y}) = \mathbb{1}(y = \hat{y})$. We set a small number of seed data points, between 100 and 200 depending on the dataset, to prevent models from being saturated before the first round. We consider various competition situations by varying the budget $B$ and the temperature parameter $\lambda$. (In Section III, for notational convenience, we often suppress the predictor index in the superscript when the context is clear; for example, we write $B$ instead of $B_j$.) All results are based on 30 independent experiments. In the Appendix, we provide detailed information on the implementation.

Fig. 2: QoE as a function of the overall quality at various levels of the budget $B$ and the temperature $\lambda$ on the three datasets. Different colors indicate different $\lambda$, and point size indicates the budget $B$: the larger the budget, the larger the point. In several settings, the overall quality increases as more budget is used, but QoE decreases.
Fig. 3: Diversity as a function of the budget $B$ for various $\lambda$ on the three datasets. Each color indicates a different $\lambda$. We show a 99% confidence band based on 30 independent runs. Competing ML predictors become similar in the sense that the diversity decreases as the budget increases.

III-A Effects of data purchase on quality

We first study how data purchase affects the overall quality and the QoE in various competition settings. Figure 2 illustrates how QoE changes with respect to (w.r.t.) the overall quality. When the temperature $\lambda$ is finite, data purchase increases the overall quality as the budget $B$ increases across all datasets. This can be explained as follows. Given that a predictor buys user data using a stream-based AL algorithm (e.g., it buys when its prediction is highly uncertain), the active data acquisition reduces the model's uncertainty and increases the quality of the individual model. As a result, data purchase increases the overall quality; the effect is particularly pronounced on the Postures dataset, where the overall quality increases steadily with the budget. As for QoE, however, data purchase mostly decreases QoE as $B$ increases. For example, when the user distribution is Insurance, QoE decreases by 1% and 7% at the two larger budget levels, respectively, compared with no budget.

Implications of data purchase on quality

As Figure 2 shows, in most cases QoE surprisingly decreases even when the overall quality increases. In other words, the quality that competing predictors provide generally improves, but this does not necessarily mean that users are more satisfied with the quality of the predictions they select. Although this result might sound counterintuitive, we argue that it can happen when data purchase restricts users from choosing among varied options, decreasing the probability of finding a high-quality prediction as $B$ increases. To verify this, in the next section we investigate how data purchase affects the diversity.

III-B Effects of data purchase on diversity

We now study how data purchase affects the diversity. Figure 3 illustrates the diversity as a function of the budget $B$ in various competition settings. When the temperature $\lambda$ is finite, the diversity monotonically decreases w.r.t. $B$ across all datasets. That is, the competing predictors become similar as larger budgets are allowed. The effect is particularly clear on the Adult dataset, where the diversity drops substantially as the budget increases.

When $\lambda \to \infty$, the diversity does not vary much w.r.t. $B$ across the datasets. This is because the competing ML predictors are mainly trained on a sufficiently large number of i.i.d. user data. Note that when $\lambda \to \infty$, users select a predictor uniformly at random during regular competition rounds. Even though competing predictors can actively buy user data, the i.i.d. user data from the regular competitions together with the seed dataset are large enough that the purchased data points do not affect the ML predictions much.

IV Theoretical analysis on competition

In this section, we first establish a simple representation of QoE when the quality function is the correctness function. Based on this, we theoretically analyze how a diversity-like quantity can affect QoE. Proofs are provided in the Appendix.

Lemma 1 (A simple representation for QoE).

Suppose there are $M$ predictors and the quality function is the correctness function, i.e., $q(y, \hat{y}) = \mathbb{1}(y = \hat{y})$. Let $\bar{q} := \frac{1}{M} \sum_{j=1}^{M} \mathbb{1}(f_j(X) = Y)$ be the average quality for a user $(X, Y)$. For any $\lambda > 0$, we have

$$\mathrm{QoE} = \mathbb{E}\left[ \frac{\bar{q}\, e^{1/\lambda}}{\bar{q}\, e^{1/\lambda} + (1 - \bar{q})} \right] \;\geq\; \mathbb{E}\left[ \bar{q} \right] = \mathrm{OQ}, \qquad (2)$$

where the equality holds when $\lambda \to \infty$ and the expectation is considered over $(X, Y)$.

Lemma 1 establishes a relationship between QoE and the overall quality: the QoE is always greater than the overall quality when $\lambda < \infty$. Furthermore, it shows that QoE simplifies to a function of the average quality $\bar{q}$ when the quality function is the correctness function. When $q$ is not the correctness function, QoE does not have an explicit representation; we present upper and lower bounds for QoE in the Appendix. Using Lemma 1, we give a condition under which QoE decreases in the following theorem.
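As a sanity check on our reconstruction of Equation (2), the following Monte Carlo sketch compares the closed form against direct simulation of the softmax selection. The setup (five predictors, a uniform grid of average qualities) is our own illustration.

import numpy as np

rng = np.random.default_rng(1)
M, lam, n = 5, 0.5, 50_000
a = np.exp(1.0 / lam)

# With a correctness quality, q_bar is the fraction of correct predictors
# for a user, so it lies on the grid {0, 1/M, ..., 1}.
q_bar = rng.integers(0, M + 1, size=n) / M

# Closed form from (our reconstruction of) Lemma 1.
closed_form = np.mean(q_bar * a / (q_bar * a + (1 - q_bar)))

# Direct simulation: softmax selection among M predictors, of which
# round(M * q_bar) are correct; record whether the selected one is correct.
hits = np.empty(n)
for i, qb in enumerate(q_bar):
    k = int(round(M * qb))                  # number of correct predictors
    w = np.where(np.arange(M) < k, a, 1.0)  # softmax weights exp(q / lam)
    j = rng.choice(M, p=w / w.sum())
    hits[i] = float(j < k)

print(closed_form, hits.mean())  # approximately equal; both >= q_bar.mean()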

Theorem 1 (Comparison of two competition dynamics).

Suppose there are two sets of predictors, $\mathcal{A}$ and $\mathcal{B}$. Without loss of generality, the overall quality for $\mathcal{A}$ is larger than that for $\mathcal{B}$. For the correctness function $q$, define the average qualities $\bar{q}_{\mathcal{A}}$ and $\bar{q}_{\mathcal{B}}$ as in Lemma 1. If $\mathbb{E}[\bar{q}_{\mathcal{A}}] - \mathbb{E}[\bar{q}_{\mathcal{B}}] \leq c_1$ and $\mathrm{Var}(\bar{q}_{\mathcal{A}}) - \mathrm{Var}(\bar{q}_{\mathcal{B}}) \geq c_2$ for some explicit constants $c_1$ and $c_2$, then the QoE for $\mathcal{A}$ is smaller than that for $\mathcal{B}$.

Theorem 1 compares two competition dynamics and gives a condition under which the QoE for $\mathcal{A}$ is smaller than that for $\mathcal{B}$ even though the associated overall quality is larger. For ease of understanding, we can regard $\mathcal{A}$ (resp. $\mathcal{B}$) as the set of ML predictors with a positive budget (resp. with no budget). Theorem 1 implies that QoE decreases when $\mathrm{Var}(\bar{q}_{\mathcal{A}})$ is sufficiently greater than $\mathrm{Var}(\bar{q}_{\mathcal{B}})$. Given our results in Figure 3 that data purchase makes competing predictors similar to each other when $B$ is large enough, the average quality $\bar{q}$ is more likely to be close to zero or one. This increases the variance, because the variance of a $[0, 1]$-valued random variable is maximized when its mass concentrates on the endpoints $\{0, 1\}$. Theorem 1 thus supports our experimental results by explaining why QoE decreases after data purchase alongside an increase in the variance of $\bar{q}$.

V Related works

This work builds on and extends the recent paper [11], which studied the impacts of competition. We extend that setting by incorporating an AL-based data purchase procedure into the competition system; the setting of [11] is the special case of ours with $B_j = 0$ for all $j$. Our environment enables us to study the impacts of data acquisition under competition, which the previous work did not consider. Whereas the previous work showed that competing predictors become overly focused on sub-populations, our work suggests that such specialization can benefit users: it preserves a variety of options and improves the quality of the predictor each user selects.

A field related to our work is stream-based AL, where the problem is to find effective data points to label from a stream of data [9, 10]. Our competition environment has unique features that AL cannot describe. In AL, there is only one agent, so no competition can arise. In addition, while an agent in AL collects data only through label queries, competing predictors in our environment obtain data from regular competition as well as from data purchase. These differences create a unique competition environment, and this work studies the impacts of data purchase in such competitive systems.

Competition has been studied in multi-agent reinforcement learning (MARL), which considers a setting where a group of agents in a common environment interact with each other and with the environment [21, 22]. Competing agents in MARL maximize their own objectives, which may conflict with those of other agents. This setting is often characterized by zero-sum Markov games and has been applied to video games such as Pong or StarCraft II [23, 24, 25]. We refer to [26] for a complementary survey of MARL.

There are similarities between MARL and our environment, but in ours, user selection and data purchase uniquely define the interaction between users and ML predictors. In MARL, all agents observe information (e.g., states and rewards) drawn from the shared environment and use it to update their policy functions. In contrast, in our environment, only the selected predictor obtains the label and updates its model. In addition, ML predictors can collect data points through data purchase. These settings have not been considered in the MARL literature.

VI Conclusion

In this paper, characterizing the nature of competition and data purchase, we propose a new competitive environment that allows predictors to actively acquire user labels. Our results show that even though data purchase improves the quality that predictors provide, it can decrease the quality that users experience. We explain this counterintuitive finding by demonstrating that data purchase makes competing predictors similar to each other.

References

  • [1] G. Linden, B. Smith, and J. York, “Amazon.com recommendations: Item-to-item collaborative filtering,” IEEE Internet Computing, vol. 7, no. 1, pp. 76–80, 2003.
  • [2] P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommendations,” in Proceedings of the 10th ACM conference on recommender systems, 2016, pp. 191–198.
  • [3] B. Marr, “The amazing ways eBay is using artificial intelligence to boost business success,” https://www.forbes.com/sites/bernardmarr/2019/04/26/the-amazing-ways-ebay-is-using-artificial-intelligence-to-boost-business-success, 2019, posted April 26, 2019; retrieved May 19, 2021.
  • [4] J. Nanduri, Y. Jia, A. Oka, J. Beaver, and Y.-W. Liu, “Microsoft uses machine learning and optimization to reduce e-commerce fraud,” INFORMS Journal on Applied Analytics, vol. 50, no. 1, pp. 64–79, 2020.
  • [5] J. Meierhofer, T. Stadelmann, and M. Cieliebak, “Data products,” in Applied Data Science. Springer, 2019, pp. 47–61.
  • [6] K. Sennaar, “How america’s top 4 insurance companies are using machine learning,” https://emerj.com/ai-sector-overviews/machine-learning-at-insurance-companies, 2019, posted February 26, 2020; Retrieved May 19, 2021.
  • [7] S. Arumugam and R. Bhargavi, “A survey on driving behavior analysis in usage based insurance using big data,” Journal of Big Data, vol. 6, no. 1, pp. 1–21, 2019.
  • [8] Y. Jin and S. Vasserman, “Buying data from consumers: The impact of monitoring programs in us auto insurance,” Unpublished manuscript. Harvard University, Department of Economics, Cambridge, MA, 2019.
  • [9] B. Settles, “Active learning literature survey,” University of Wisconsin-Madison Department of Computer Sciences, Tech. Rep., 2009.
  • [10] P. Ren, Y. Xiao, X. Chang, P.-Y. Huang, Z. Li, X. Chen, and X. Wang, “A survey of deep active learning,” arXiv preprint arXiv:2009.00236, 2020.
  • [11] T. Ginart, E. Zhang, Y. Kwon, and J. Zou, “Competing ai: How does competition feedback affect machine learning?” in International Conference on Artificial Intelligence and Statistics.   PMLR, 2021, pp. 1693–1701.
  • [12] Y. Freund, H. S. Seung, E. Shamir, and N. Tishby, “Selective sampling using the query by committee algorithm,” Machine learning, vol. 28, no. 2, pp. 133–168, 1997.
  • [13] I. Žliobaitė, A. Bifet, B. Pfahringer, and G. Holmes, “Active learning with drifting streaming data,” IEEE transactions on neural networks and learning systems, vol. 25, no. 1, pp. 27–39, 2013.
  • [14] B. Settles and M. Craven, “An analysis of active learning strategies for sequence labeling tasks,” in Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, 2008, pp. 1070–1079.
  • [15] J. Rowley, “Promotion and marketing communications in the information marketplace,” Library review, 1998.
  • [16] M. Familmaleki, A. Aghighi, and K. Hamidi, “Analyzing the influence of sales promotion on customer purchasing behavior,” International Journal of Economics & management sciences, vol. 4, no. 4, pp. 1–6, 2015.
  • [17] I. Reimers and B. R. Shiller, “The impacts of telematics on competition and consumer behavior in insurance,” The Journal of Law and Economics, vol. 62, no. 4, pp. 613–632, 2019.
  • [18] P. Van Der Putten and M. van Someren, “Coil challenge 2000: The insurance company case,” Technical Report 2000–09, Leiden Institute of Advanced Computer Science, Tech. Rep., 2000.
  • [19] D. Dua and C. Graff, “UCI Machine Learning Repository,” 2017. [Online]. Available: http://archive.ics.uci.edu/ml
  • [20] A. Gardner, C. A. Duncan, J. Kanno, and R. Selmic, “3d hand posture recognition from small unlabeled point sets,” in 2014 IEEE International Conference on Systems, Man, and Cybernetics (SMC).   IEEE, 2014, pp. 164–169.
  • [21] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. P. Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” in Advances in Neural Information Processing Systems, 2017, pp. 6379–6390.
  • [22] J. Foerster, N. Nardelli, G. Farquhar, T. Afouras, P. H. Torr, P. Kohli, and S. Whiteson, “Stabilising experience replay for deep multi-agent reinforcement learning,” in International Conference on Machine Learning.   PMLR, 2017, pp. 1146–1155.
  • [23] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Machine learning proceedings 1994.   Elsevier, 1994, pp. 157–163.
  • [24] A. Tampuu, T. Matiisen, D. Kodelja, I. Kuzovkin, K. Korjus, J. Aru, J. Aru, and R. Vicente, “Multiagent cooperation and competition with deep reinforcement learning,” PloS one, vol. 12, no. 4, p. e0172395, 2017.
  • [25] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev et al., “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, no. 7782, pp. 350–354, 2019.
  • [26] K. Zhang, Z. Yang, and T. Başar, “Multi-agent reinforcement learning: A selective overview of theories and algorithms,” arXiv preprint arXiv:1911.10635, 2019.
  • [27] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 2, no. 3, pp. 1–27, 2011.
  • [28] H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017.
  • [29] Y. LeCun, C. Cortes, and C. Burges, “MNIST handwritten digit database,” ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, vol. 2, 2010.
  • [30] A. Krizhevsky, G. Hinton et al., “Learning multiple layers of features from tiny images,” 2009.
  • [31] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

Implementation details

In this section, we provide implementation details. We explain the user distribution, ML predictors, and the proposed environment.

Datasets (user distribution)

As for the datasets (user distribution $\mathcal{P}$), we used the following seven datasets throughout the paper: Insurance [18], Adult [19], Postures [20], Skin-nonskin [27], FashionMNIST [28], MNIST [29], and CIFAR10 [30]. For every dataset, we first split the data into competition and evaluation datasets: the competition dataset is used during the competition rounds, and the evaluation dataset is used to evaluate the metrics after the competition. For FashionMNIST, MNIST, and CIFAR10, we use the original training and test datasets as the competition and evaluation datasets, respectively. For Insurance, Adult, Postures, and Skin-nonskin, we randomly sample 5,000 data points from the original dataset to form the evaluation dataset and use the remaining data points as the competition dataset. At each round of competition, we randomly sample one data point from the competition dataset. After the competition rounds, we sample points from the evaluation dataset and evaluate the three metrics (the overall quality, QoE, and diversity). Note that all experimental results are based on the evaluation dataset. Table I shows a summary of the seven datasets used in our experiments.

As for preprocessing, we standardize the Skin-nonskin features to zero mean and unit standard deviation. For the image datasets MNIST and CIFAR10, we apply channel-wise standardization. We apply no other preprocessing to the remaining datasets. To reflect the customers' randomness in their selection, we apply random noise to the original labels: for every dataset, each label is replaced by a random label with probability 30%. This random perturbation is applied to both the competition and evaluation datasets.
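A minimal sketch of this label perturbation follows; whether the random replacement may return the original class is our assumption, as the text only specifies a 30% perturbation rate.

import numpy as np

def perturb_labels(y, num_classes, noise_rate=0.3, seed=0):
    # Replace each label with a uniformly random class with probability
    # noise_rate (30% in our experiments). The draw may return the
    # original class; excluding it is a possible alternative reading.
    rng = np.random.default_rng(seed)
    y = np.asarray(y).copy()
    flip = rng.random(len(y)) < noise_rate
    y[flip] = rng.integers(0, num_classes, size=flip.sum())
    return y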

Dataset         Competition set size   Evaluation set size   Input dimension   # of classes
Insurance       13,823                 5,000                 16                2
Adult           43,842                 5,000                 108               2
Postures        69,975                 5,000                 15                5
Skin-nonskin    239,057                5,000                 3                 2
Fashion-MNIST   60,000                 10,000                784               10
MNIST           60,000                 10,000                784               10
CIFAR10         50,000                 10,000                3,072             10
TABLE I: A summary of the datasets used in our experiments.

ML predictors

We fix the number of predictors $M$ throughout our experiments. For each dataset, which defines one competition environment, we consider a homogeneous setting, i.e., all predictors have the same number of seed data points $n$, budget $B$, model $f$, and buying strategy $\phi$. As for the buying strategy, we fix $\phi(X_t) = \mathbb{1}\{ H(\hat{p}_t(\cdot \mid X_t)) > \tau \}$, where $H$ is the Shannon entropy and $\hat{p}_t(\cdot \mid X_t)$ is the corresponding probability estimate for $X_t$. That is, if the entropy is higher than the predefined threshold $\tau$, a predictor decides to buy the user data. Note that $\log |\mathcal{Y}|$ is the Shannon entropy of the uniform distribution on $\mathcal{Y}$, which gives a natural scale for $\tau$.

Table II summarizes the seed data and the model for each dataset. Every ML predictor is initially trained on its seed data points. For all experiments, we use the Adam optimizer [31] with the specified learning rate and number of epochs, and a fixed batch size. If a predictor is selected, its ML model is updated with one iteration on the newly obtained data point, and we retrain the model whenever 'retrain period' new samples have been obtained.

Dataset         Seed data   Model      # of hidden nodes   Epochs   Retrain period
Insurance       100         Logistic   –                   10       50
Adult           100         Logistic   –                   10       50
Postures        200         Logistic   –                   10       50
Skin-nonskin    50          Logistic   –                   10       50
Fashion-MNIST   50          NN         400                 30       150
MNIST           50          NN         400                 30       150
CIFAR10         100         NN         400                 30       150
TABLE II: A summary of hyperparameters for the ML predictors by dataset. We consider homogeneous predictors unless otherwise noted. Logistic denotes a logistic model and NN denotes a neural network with one hidden layer.

Proposed environment

In Environment 1, we describe our proposed environment.

Input: number of competition rounds $T$; user distribution $\mathcal{P}$; number of predictors $M$; competing predictors $C_j = (n_j, B_j, f_j, \phi_j)$ for $j \in \{1, \dots, M\}$.
Procedure:
For all $j$, train the model $f_j$ using the $n_j$ seed data points
for $t = 1, \dots, T$ do
      $(X_t, Y_t)$ is drawn from $\mathcal{P}$ and the set of buyers is initialized to $\mathcal{S}_t = \emptyset$
      for $j = 1, \dots, M$ do
          if ($B_j > 0$) and ($\phi_j(X_t) = 1$) then
               $\mathcal{S}_t \leftarrow \mathcal{S}_t \cup \{j\}$
          else
               Predict $f_j(X_t)$
          end if
      end for
      if $\mathcal{S}_t \neq \emptyset$ then
          A user selects one predictor $a_t$ from $\mathcal{S}_t$ uniformly at random
          $B_{a_t} \leftarrow B_{a_t} - 1$
      else
          A user selects one predictor $a_t$ based on (1) using the received predictions
      end if
      Predictor $a_t$ receives the user label $Y_t$ and updates $f_{a_t}$
end for
Environment 1: A general environment for competition and data purchase
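For readers who prefer code, the following Python sketch mirrors Environment 1, reusing the Predictor and select_predictor sketches from Section II; the update method and interfaces are assumptions for illustration, not the authors' implementation.

import numpy as np

def run_competition(predictors, stream, lam, q, rng):
    # One pass over Environment 1. `predictors` are objects with .model,
    # .budget, and .wants_to_buy(x); `stream` yields (x_t, y_t) pairs.
    for x_t, y_t in stream:
        buyers = [j for j, c in enumerate(predictors) if c.wants_to_buy(x_t)]
        if buyers:
            # A buyer exists: the user takes the financial benefit, picking
            # uniformly among buyers; the winner pays one unit of budget.
            a_t = rng.choice(buyers)
            predictors[a_t].budget -= 1
        else:
            # Regular competition: softmax selection per Equation (1).
            scores = np.array([q(y_t, c.model.predict([x_t])[0])
                               for c in predictors]) / lam
            p = np.exp(scores - scores.max())
            a_t = rng.choice(len(predictors), p=p / p.sum())
        # Only the selected predictor observes the label and updates,
        # e.g., via one optimization step (an assumed method).
        predictors[a_t].update(x_t, y_t)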

Additional numerical experiments

In this section, we provide additional experimental results to demonstrate the robustness of our conclusions against (i) different distributions in the homogeneous setting (Subsection A) and (ii) different modeling assumptions in the heterogeneous setting. For the heterogeneous setting, we consider different buying strategies (Subsection B), different budgets (Subsection C), and different numbers of competing predictors (Subsection D).

A Additional results in the homogeneous setting

In Figure 4, we compare QoE and the overall quality under additional user distributions: the Skin-nonskin [27], MNIST [29], Fashion-MNIST [28], and CIFAR10 [30] datasets. As in Figure 2, the quality that users experience can decrease while the overall quality increases as the budget grows. In addition, Figure 5 shows the diversity as a function of the budget for these four datasets, as in Figure 3.

Fig. 4: QoE as a function of the overall quality at various levels of $B$ and $\lambda$ on the four additional datasets. Different colors indicate different $\lambda$, and point size indicates the budget $B$: the larger the budget, the larger the point. In several settings, the overall quality increases as more budget is used, but QoE decreases.
Fig. 5: Diversity as a function of the budget $B$ for various $\lambda$ on the four additional datasets. Each color indicates a different $\lambda$. We show a 99% confidence band based on 30 independent runs. Competing ML predictors become similar in the sense that the diversity decreases as the budget increases.
Fig. 6: Heatmaps of the class-specific quality differences $q_{j,y} - \bar{q}_y$ for the (left) Insurance and (right) Adult datasets, for several budgets $B$ and temperatures $\lambda$. The heatmaps in each row represent a different $\lambda$ but share the same color scale. For each heatmap, the horizontal axis indicates the predictor ID $j$ and the vertical axis indicates the class $y \in \mathcal{Y}$. A cell colored red (resp. blue) indicates a class-specific quality higher (resp. lower) than average, and a white cell indicates the average. As the budget increases, the diversity decreases and predictors produce similar class-specific quality.

Implications of data purchase on diversity

We also compare the class-specific qualities of the competing predictors. In Figure 6, we illustrate heatmaps of the difference $q_{j,y} - \bar{q}_y$, where $q_{j,y}$ is the class-specific quality defined as $q_{j,y} := \mathbb{P}(f_j(X) = y \mid Y = y)$ for $j \in \{1, \dots, M\}$ and $y \in \mathcal{Y}$, and $\bar{q}_y := \frac{1}{M} \sum_{j=1}^{M} q_{j,y}$ is its average over predictors. We use the Insurance and Adult datasets. When $B = 0$, only regular competitions happen, and the Adult heatmap shows that predictors 1 and 5 specialize so heavily in predicting class 2 that they sacrifice prediction power on class 1 compared to the other predictors. When $B > 0$, however, all predictors attain similar levels of class-specific quality. Data purchase makes competing ML predictors similar and keeps predictors from focusing too narrowly on a subgroup of the population.
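The following sketch estimates the class-specific quality matrix under our reconstruction $q_{j,y} = \mathbb{P}(f_j(X) = y \mid Y = y)$; the scikit-learn-style predict interface is an assumption.

import numpy as np

def class_specific_quality(models, X_eval, y_eval):
    # Returns the matrix Q with Q[j, c] = fraction of class-y_c evaluation
    # points that predictor j labels as y_c, plus its average over predictors.
    classes = np.unique(y_eval)
    Q = np.zeros((len(models), len(classes)))
    for j, m in enumerate(models):
        pred = m.predict(X_eval)
        for c, y in enumerate(classes):
            mask = y_eval == y
            Q[j, c] = (pred[mask] == y).mean()  # per-class recall of f_j
    return Q, Q.mean(axis=0)                    # q_{j,y} and bar{q}_y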

Fig. 7: Probability density plots of the average quality $\bar{q}$ near zero for different budgets $B$ at a fixed temperature $\lambda$. Different colors indicate different $B$. As $B$ increases, the average quality is more likely to be close to zero. That is, the probability that all ML predictors produce low-quality predictions at the same time increases, and users may be left unsatisfied with the ML predictions after the competing predictors purchase data.

[11] showed that competing ML predictors over-specialize on sub-populations when $B = 0$, which is also visible in Figure 6, and our results further show that this specialization is alleviated when predictors purchase user data. However, as discussed in Section III-A, data purchase can hurt the quality of the predictor selected by a user. To explain this phenomenon, we demonstrate that the probability of encountering only low-quality predictions increases due to the reduction in diversity. We illustrate the probability density functions of the average quality near zero in Figure 7. The figure clearly shows that the probability that the average quality is near zero increases as more budget is used: the areas for the largest budgets (colored yellow) are clearly larger than those for the smallest (colored red). That is, as predictions become similar, it is more likely that all ML predictions are poor at the same time, and thus the probability that users are not satisfied with the predictions increases.

Fig. 8: QoE as a function of the overall quality when heterogeneous predictors have different buying strategies. Different colors indicate different $\lambda$, and point size indicates the budget $B$: the larger the budget, the larger the point. In several settings, the overall quality increases as more budget is used, but QoE decreases.
Fig. 9: Diversity as a function of the budget $B$ when heterogeneous predictors have different buying strategies. Different colors indicate different $\lambda$. We show a 99% confidence band based on 30 independent runs. As the budget increases, the diversity decreases.

B Different buying strategies

We use the same setting as the homogeneous setting in Section III, but with different buying strategies among predictors. To be more specific, we consider three types of buying strategies obtained by varying the threshold of the uncertainty-based AL method. For constants $\tau_1 < \tau_2 < \tau_3$, we consider the buying strategies $\phi^{(i)}(X_t) = \mathbb{1}\{ H(\hat{p}_t(\cdot \mid X_t)) > \tau_i \}$ for $i \in \{1, 2, 3\}$, where $H$ is the Shannon entropy function and $\hat{p}_t(\cdot \mid X_t)$ is the probability estimate given $X_t$. We assume there is an equal number of predictors for each buying strategy $\phi^{(1)}$, $\phi^{(2)}$, and $\phi^{(3)}$. This models a situation where three groups have different levels of sensitivity to data purchase; in our setting, $\phi^{(3)}$, with the highest threshold, is the most conservative data buyer.

Figure 8 shows the relationship between QoE and the overall quality on the seven datasets when heterogeneous predictors compete with different buying strategies. Similar to the homogeneous case in Figure 2, the overall quality increases while QoE can decrease. Likewise, Figure 9 shows that the diversity decreases as the budget increases, demonstrating the robustness of our findings to different environment settings. In addition, we provide probability density plots of the average quality near zero: Figure 10 shows a trend similar to the homogeneous setting in Figure 7.

Fig. 10: Probability density plots of the average quality near zero when heterogeneous predictors have different buying strategies. Different colors indicate different budgets $B$. As the budget increases, the probability that the average quality is near zero increases.

C Different budgets

Here, we use the same setting as the homogeneous setting but with different budgets, on the Insurance, Adult, and Skin-nonskin datasets. For a given budget $B$, we assume that the first $M/2$ predictors have budget $B$ while the last $M/2$ predictors have budget $B/2$; that is, half of the predictors have half the budget of the other group. Figure 11 shows that the main trends appear again even when different budgets are used, again demonstrating the robustness of our results to different modeling assumptions.

Fig. 11: Main figures when different budgets are used. The results are similar to the homogeneous setting, showing the robustness of our main results.

D Different numbers of competing predictors

In this section, we show that our findings are not affected by the number of competing predictors in the market. All the experiments in Section III fix the number of predictors $M$; here, we consider the homogeneous setting with a smaller or a larger number of competing predictors. As Figures 12 and 13 show, the main trends appear again when the number of predictors changes.

Fig. 12: Main figures with an alternative number of competing predictors. The results are similar to those in the main setting, showing the robustness of our main results.
Fig. 13: Main figures with another alternative number of competing predictors. The results are similar to those in the main setting, showing the robustness of our main results.

Proofs and additional theoretical results

We provide proofs for Lemma 1 and Theorem 1 in Subsection E and present QoE bounds for a general quality function in Subsection F.

E Proofs

Proof of Lemma 1.

For notational convenience, set $w_j := \exp(q(Y, f_j(X))/\lambda)$ for $j \in \{1, \dots, M\}$. By the definition of QoE and Equation (1),

$$\mathrm{QoE} = \mathbb{E}\left[ \sum_{j=1}^{M} \frac{w_j}{\sum_{k=1}^{M} w_k}\, q(Y, f_j(X)) \right]. \qquad (3)$$

Since $q$ is the correctness function, for a user $(X, Y)$ with average quality $\bar{q}$, exactly $M \bar{q}$ predictors are correct with weight $w_j = e^{1/\lambda}$, and the remaining $M(1 - \bar{q})$ predictors are incorrect with weight $w_j = 1$. Hence the inner sum equals

$$\frac{M \bar{q}\, e^{1/\lambda}}{M \bar{q}\, e^{1/\lambda} + M(1 - \bar{q})} = \frac{\bar{q}\, e^{1/\lambda}}{\bar{q}\, e^{1/\lambda} + (1 - \bar{q})}.$$

Since $e^{1/\lambda} \geq 1$ and $x \mapsto \bar{q} x / (\bar{q} x + 1 - \bar{q})$ is an increasing function of $x$, this quantity is at least $\bar{q}$, which concludes the proof. ∎

Proof of Theorem 1.

For $i \in \{\mathcal{A}, \mathcal{B}\}$, let $\mu_i := \mathbb{E}[\bar{q}_i]$, $\sigma_i^2 := \mathrm{Var}(\bar{q}_i)$, and $g(x) := x e^{1/\lambda} / (x e^{1/\lambda} + 1 - x)$, so that $\mathrm{QoE}_i = \mathbb{E}[g(\bar{q}_i)]$ by Lemma 1. Note that, with $a := e^{1/\lambda}$,

$$g'(x) = \frac{a}{(1 + (a-1)x)^2} > 0 \quad \text{and} \quad g''(x) = \frac{-2a(a-1)}{(1 + (a-1)x)^3} < 0,$$

so $g$ is increasing and concave on $[0, 1]$ with $-2a(a-1) \leq g''(x) \leq -2(a-1)/a^2$. For $i \in \{\mathcal{A}, \mathcal{B}\}$, a second-order Taylor expansion of $g$ around $\mu_i$ gives

$$g(\mu_i) - a(a-1)\,\sigma_i^2 \;\leq\; \mathrm{QoE}_i \;\leq\; g(\mu_i) - \frac{a-1}{a^2}\,\sigma_i^2. \qquad (4)$$

Let $\Delta_\mu := \mu_{\mathcal{A}} - \mu_{\mathcal{B}} \geq 0$. From the inequalities (4) and the Lipschitz bound $g(\mu_{\mathcal{A}}) - g(\mu_{\mathcal{B}}) \leq g'(0)\,\Delta_\mu = a \Delta_\mu$, we have

$$\mathrm{QoE}_{\mathcal{A}} - \mathrm{QoE}_{\mathcal{B}} \;\leq\; a \Delta_\mu - \frac{a-1}{a^2}\,\sigma_{\mathcal{A}}^2 + a(a-1)\,\sigma_{\mathcal{B}}^2.$$

Therefore, QoE is decreased if

$$\sigma_{\mathcal{A}}^2 - a^3 \sigma_{\mathcal{B}}^2 \;>\; \frac{a^3}{a-1}\,\Delta_\mu,$$

which holds whenever $\Delta_\mu \leq c_1$ and $\sigma_{\mathcal{A}}^2 - \sigma_{\mathcal{B}}^2 \geq c_2$ for suitable constants $c_1$ and $c_2$ depending on $\lambda$ and the baseline variance $\sigma_{\mathcal{B}}^2$. Setting such $c_1$ and $c_2$ concludes the proof. ∎

F QoE for a general quality function

The following theorem gives upper and lower bounds on QoE for a general quality function.

Theorem 2.

Suppose there is a set of prediction models $\{f_1, \dots, f_M\}$. For any non-negative quality function $q$ and $\lambda > 0$, we have the following upper and lower bounds:

$$\frac{1}{M} \sum_{j=1}^{M} \mathbb{E}\left[ q(Y, f_j(X)) \right] \;\leq\; \mathbb{E}\left[ q(Y, f_a(X)) \right] \;\leq\; \mathbb{E}\left[ \max_{j \in \{1, \dots, M\}} q(Y, f_j(X)) \right],$$

where $a$ denotes the selected index from Equation (1).

Proof of Theorem 2.

We use the same notation as in the proof of Lemma 1 and write $q_j := q(Y, f_j(X))$ and $\beta := 1/\lambda$. We first show that QoE is an increasing function w.r.t. $\beta$. From the representation (3), we have

$$\mathrm{QoE}(\beta) = \mathbb{E}\left[ \sum_{j=1}^{M} \frac{e^{\beta q_j}}{\sum_{k=1}^{M} e^{\beta q_k}}\, q_j \right],$$

and differentiating the inner term w.r.t. $\beta$ gives

$$\frac{\left( \sum_j e^{\beta q_j} q_j^2 \right)\left( \sum_k e^{\beta q_k} \right) - \left( \sum_j e^{\beta q_j} q_j \right)^2}{\left( \sum_k e^{\beta q_k} \right)^2} \;\geq\; 0,$$

where the non-negativity follows from the Cauchy-Schwarz inequality. We now prove the upper bound. Note that

$$\sum_{j=1}^{M} \frac{e^{\beta q_j}}{\sum_{k} e^{\beta q_k}}\, q_j \;\leq\; \max_{j} q_j,$$

and the equality holds when all the $q_j$ are equal. Taking expectations on both sides provides the upper bound. As for the lower bound, due to the representation (3), it is enough to show that the inner term is at least $\frac{1}{M} \sum_j q_j$. Since QoE is increasing in $\beta$, plugging in $\beta = 0$ gives the uniform average $\frac{1}{M} \sum_j q_j$, which yields the lower bound. This concludes the proof. ∎