Competition over data: how does data purchase affect users?

by Yongchan Kwon, et al.
Stanford University

As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases arising from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems that has not been well studied before. We find that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We show that this phenomenon arises naturally from a trade-off: competition pushes each predictor to specialize in a subset of the population, while data purchase makes predictors more uniform. We support our findings with both experiments and theory.
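To make the setup concrete, the following is a minimal toy sketch of such an environment. It is not the authors' implementation. Every name in it (`Predictor`, `simulate`, `true_label`), the 1-nearest-neighbour model, the distance-based uncertainty rule, the ordering of purchase and selection, and the rule that users pick at random among correct predictors are assumptions made purely for illustration. It separates the two quantities the abstract contrasts: each predictor's overall accuracy on the whole population, and the quality experienced by users.

```python
import random


def true_label(x):
    """Assumed ground truth: labels alternate over four equal intervals of [0, 1)."""
    return int(x * 4) % 2


class Predictor:
    """Hypothetical 1-nearest-neighbour predictor with a label-purchase budget."""

    def __init__(self, budget):
        self.data = []        # (x, y) pairs this predictor has acquired
        self.budget = budget  # number of labels it may still purchase

    def predict(self, x):
        if not self.data:
            return 0
        return min(self.data, key=lambda p: abs(p[0] - x))[1]

    def uncertainty(self, x):
        # Assumed proxy for uncertainty: distance to the nearest stored point.
        if not self.data:
            return 1.0
        return min(abs(px - x) for px, _ in self.data)

    def maybe_buy(self, x, y, threshold):
        # Assumed active-learning rule: buy the label only when uncertain and within budget.
        if self.budget > 0 and self.uncertainty(x) > threshold:
            self.budget -= 1
            self.data.append((x, y))
            return True
        return False


def simulate(n_predictors=3, n_rounds=500, budget=0, threshold=0.05, seed=0):
    rng = random.Random(seed)
    predictors = [Predictor(budget) for _ in range(n_predictors)]
    for p in predictors:  # a few seed points per predictor
        for _ in range(3):
            x = rng.random()
            p.data.append((x, true_label(x)))

    user_hits = 0
    for _ in range(n_rounds):
        x = rng.random()
        y = true_label(x)
        # Purchase phase (assumed to happen before the user chooses).
        bought = [p.maybe_buy(x, y, threshold) for p in predictors]
        # Selection phase: the user picks at random among correct predictors,
        # or among all predictors if none is correct.
        correct = [i for i, p in enumerate(predictors) if p.predict(x) == y]
        chosen = rng.choice(correct or list(range(n_predictors)))
        user_hits += predictors[chosen].predict(x) == y
        # Competition feedback: only the chosen predictor observes the label for free.
        if not bought[chosen]:
            predictors[chosen].data.append((x, y))

    # Overall accuracy is measured on a fixed grid covering the whole population.
    grid = [(i + 0.5) / 200 for i in range(200)]
    overall = [sum(p.predict(x) == true_label(x) for x in grid) / len(grid)
               for p in predictors]
    return {
        "user_quality": user_hits / n_rounds,
        "overall_accuracy": overall,
        "remaining_budget": [p.budget for p in predictors],
    }


if __name__ == "__main__":
    for b in (0, 20, 100):
        print(b, simulate(budget=b))
```

Running `simulate` with different `budget` values lets one compare the two metrics side by side. Whether this toy reproduces the decrease in user-experienced quality reported in the abstract depends on the assumed model and parameters, so it should be read as scaffolding for experimentation rather than as evidence.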




Competing AI: How does competition feedback affect machine learning?

This paper studies how competition affects machine learning (ML) predic...

Algorithms with Prediction Portfolios

The research area of algorithms with predictions has seen recent success...

Massive MIMO Channel Prediction: Kalman Filtering vs. Machine Learning

This paper focuses on channel prediction techniques for massive multiple...

Multiaccuracy: Black-Box Post-Processing for Fairness in Classification

Machine learning predictors are successfully deployed in applications ra...

Active Learning for Network Traffic Classification: A Technical Study

Network Traffic Classification (NTC) has become an important feature in ...

IDP-PGFE: An Interpretable Disruption Predictor based on Physics-Guided Feature Extraction

Disruption prediction has made rapid progress in recent years, especiall...

Approximability and Generalisation

Approximate learning machines have become popular in the era of small de...
