Try Before You Buy: A practical data purchasing algorithm for real-world data marketplaces

Data trading is becoming increasingly popular, as evidenced by the appearance of scores of Data Marketplaces (DMs) in the last few years. Pricing digital assets is particularly complex since, unlike physical assets, digital ones can be replicated at zero cost and stored and transmitted almost for free. In most DMs, data sellers are invited to indicate a price, together with a description of their datasets. For data buyers, however, deciding whether paying the requested price makes sense can only be done after having used the data with their AI/ML algorithms. Theoretical works have analysed the problem of which datasets to buy, and at what price, in the context of full-information models, in which the performance of algorithms over any of the O(2^N) possible subsets of N datasets is known a priori, together with the value functions of buyers. Such information is, however, difficult to compute, let alone make public, in the context of real-world DMs. In this paper, we show that if a DM provides potential buyers with a measure of the performance of their AI/ML algorithm on individual datasets, then they can select which datasets to buy with an efficacy that approximates that of a complete-information model. We call the resulting algorithm Try Before You Buy (TBYB) and demonstrate over synthetic and real-world datasets how TBYB can lead to near-optimal buying performance with only O(N) instead of O(2^N) information released by a marketplace.
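To make the O(N) versus O(2^N) contrast concrete, the following is a minimal sketch and not the TBYB algorithm itself, whose details the abstract does not give. It assumes a toy accuracy model with independent, diminishing returns, a linear buyer value function (`value_per_accuracy`), and a hypothetical `evaluate` callback standing in for however accuracy on a bundle is measured. All names, prices, and quality scores here are illustrative assumptions.

```python
from itertools import combinations
from math import prod


def make_evaluator(solo_quality):
    """Toy accuracy model (an assumption, not from the paper): each dataset
    helps independently, with diminishing returns. Also counts evaluations."""
    calls = {"n": 0}

    def evaluate(bundle):
        calls["n"] += 1
        return 1.0 - prod(1.0 - solo_quality[d] for d in bundle)

    return evaluate, calls


def greedy_by_solo_score(solo_quality, price, value_per_accuracy, evaluate):
    """Rank datasets by their individual score (the O(N) information) and
    buy in that order while the value gained exceeds the asking price."""
    order = sorted(solo_quality, key=solo_quality.get, reverse=True)
    bought, current = [], 0.0
    for d in order:
        candidate = evaluate(bought + [d])
        gain = value_per_accuracy * (candidate - current)
        if gain <= price[d]:
            break  # stop at the first dataset that does not pay for itself
        bought.append(d)
        current = candidate
    return bought


def exhaustive_best(solo_quality, price, value_per_accuracy, evaluate):
    """Full-information baseline: score every one of the 2^N bundles."""
    names = list(solo_quality)
    best, best_profit = [], float("-inf")
    for k in range(len(names) + 1):
        for bundle in combinations(names, k):
            cost = sum(price[d] for d in bundle)
            profit = value_per_accuracy * evaluate(bundle) - cost
            if profit > best_profit:
                best, best_profit = list(bundle), profit
    return best


# Hypothetical marketplace with three datasets.
solo_quality = {"A": 0.5, "B": 0.4, "C": 0.1}
price = {"A": 10.0, "B": 10.0, "C": 10.0}

greedy_eval, greedy_calls = make_evaluator(solo_quality)
full_eval, full_calls = make_evaluator(solo_quality)

bought = greedy_by_solo_score(solo_quality, price, 100.0, greedy_eval)
best = exhaustive_best(solo_quality, price, 100.0, full_eval)

print("greedy bundle:", bought, "evaluations:", greedy_calls["n"])
print("optimal bundle:", best, "evaluations:", full_calls["n"])
```

In this toy setting the greedy buyer needs at most N evaluations, while the exhaustive baseline needs 2^N. Whether the two agree on real data depends on how datasets interact, which is the question the paper studies empirically.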


