Data Pricing in Machine Learning Pipelines

08/18/2021
by Zicun Cong, et al.

Machine learning is disruptive. At the same time, it can succeed only through collaboration among many parties across multiple steps that naturally form pipelines in an ecosystem: collecting data for possible machine learning applications, collaboratively training models among multiple parties, and delivering machine learning services to end users. Data is critical and pervades the whole machine learning pipeline. Because machine learning pipelines involve many parties and, to be successful, must form a constructive and dynamic ecosystem, marketplaces and data pricing are fundamental in connecting and facilitating those parties. In this article, we survey the principles and the latest research developments of data pricing in machine learning pipelines. We start with a brief review of data marketplaces and pricing desiderata. Then, we focus on pricing in three important steps of machine learning pipelines. To understand pricing in the training data collection step, we review the pricing of raw data sets and data labels. We also investigate pricing in the collaborative training of machine learning models, and give an overview of pricing machine learning models for end users in the deployment step. Finally, we discuss a series of possible future directions.

research
09/09/2020

A Survey on Data Pricing: from Economics to Data Science

Data are invaluable. How can we assess the value of data objectively, sy...
research
11/08/2019

Collaborative Machine Learning Markets with Data-Replication-Robust Payments

We study the problem of collaborative machine learning markets where mul...
research
06/08/2023

Modern Data Pricing Models: Taxonomy and Comprehensive Survey

Data play an increasingly important role in smart data analytics, which ...
research
10/10/2021

Algorithmic collusion: A critical review

The prospect of collusive agreements being stabilized via the use of pri...
research
05/01/2016

Directional Statistics in Machine Learning: a Brief Review

The modern data analyst must cope with data encoded in various forms, ve...
research
01/28/2021

tf.data: A Machine Learning Data Processing Framework

Training machine learning models requires feeding input data for models ...
research
06/12/2020

dagger: A Python Framework for Reproducible Machine Learning Experiment Orchestration

Many research directions in machine learning, particularly in deep learn...
