Competition over data: how does data purchase affect users?

01/26/2022
by Yongchan Kwon, et al.

As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems that has not been well studied before. We find that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience (i.e., the accuracy of the predictor selected by each user) can decrease even as the individual predictors get better. We show that this phenomenon arises naturally from a trade-off: competition pushes each predictor to specialize in a subset of the population, while data purchase makes predictors more uniform. We support our findings with both experiments and theory.
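
The distinction between a predictor's overall performance and the quality that users experience can be made concrete with a small worked example. The sketch below is not the environment from the paper; it is a toy illustration under loud assumptions: the function names `overall_accuracy` and `user_quality`, the ten-user population, and the idealized rule that each user selects a predictor that is correct for them whenever one exists are all hypothetical. The `before` and `after` tables are hand-built to show how the two metrics can move in opposite directions, not measured results.

```python
# Toy illustration (hypothetical, not the paper's environment).
# correct[m][i] is True when predictor m is right for user i.

def overall_accuracy(correct):
    """Accuracy of each predictor measured over the whole population."""
    return [sum(row) / len(row) for row in correct]

def user_quality(correct):
    """Fraction of users who are served correctly.

    Assumption: each user picks a predictor that is right for them
    whenever at least one such predictor exists (idealized selection).
    """
    n_users = len(correct[0])
    served = sum(any(row[i] for row in correct) for i in range(n_users))
    return served / n_users

# Before data purchase: two specialized predictors, each correct only
# on its own half of a 10-user population.
before = [
    [True] * 5 + [False] * 5,
    [False] * 5 + [True] * 5,
]

# After data purchase: both predictors are better overall but more
# uniform, and they now fail on the same two users.
after = [
    [True] * 8 + [False] * 2,
    [True] * 8 + [False] * 2,
]

print("overall before:", overall_accuracy(before))
print("user quality before:", user_quality(before))
print("overall after:", overall_accuracy(after))
print("user quality after:", user_quality(after))
```

In this constructed case each predictor's overall accuracy rises from 0.5 to 0.8, while the fraction of users served correctly falls from 1.0 to 0.8, because the specialized predictors covered complementary subsets and the uniform ones do not.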


