Private Training Set Inspection in MLaaS

05/15/2023
by Mingxue Xu, et al.

Machine Learning as a Service (MLaaS) is a popular cloud-based solution for customers who want to use an ML model but lack the training data, computation resources, or ML expertise to build one themselves. In this setting, the training datasets are typically the private property of the ML or data companies and are inaccessible to the customers, yet the customers still need a way to confirm that those datasets meet their expectations and fulfil regulatory requirements such as fairness. No existing work addresses these customers' concerns. This work is the first attempt to solve this problem, taking data origin as an entry point. We first define an origin membership measurement and, based on it, define diversity and fairness metrics that address the customers' concerns. We then propose a strategy to estimate the values of these two metrics over the inaccessible training dataset, combining shadow training techniques from membership inference with an efficient featurization scheme from multiple instance learning. The evaluation covers a text review polarity classification application based on the BERT language model. Experimental results show that our solution achieves up to 0.87 accuracy for membership inspection and up to 99.3% on distribution inspection.
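The abstract does not spell out the metric definitions, but the overall flow can be sketched. As a rough illustration only, assume the origin inspector (e.g. a shadow-model membership attack) outputs a predicted origin per training sample, diversity is the Shannon entropy of the inferred origin distribution, and fairness is the largest deviation of any origin's share from a customer-specified target share. All function names and the example origins below are hypothetical, not taken from the paper:

```python
import math
from collections import Counter

def origin_distribution(predicted_origins):
    """Empirical distribution over inferred data origins.

    `predicted_origins` would come from a membership-style inspector
    (e.g. shadow-model inference) run against the opaque training set.
    """
    counts = Counter(predicted_origins)
    total = sum(counts.values())
    return {origin: c / total for origin, c in counts.items()}

def diversity(dist):
    """Shannon entropy of the origin distribution (higher = more diverse)."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def fairness_gap(dist, target):
    """Worst-case deviation from a customer-specified target share per origin."""
    return max(abs(dist.get(o, 0.0) - t) for o, t in target.items())

# Hypothetical inspector output: inferred origin of each training sample.
preds = ["news", "news", "reviews", "forums", "reviews", "news"]
dist = origin_distribution(preds)
print(diversity(dist))                  # entropy in bits
print(fairness_gap(dist, {"news": 0.34, "reviews": 0.33, "forums": 0.33}))
```

A customer could compare these estimated values against thresholds in a service agreement, even though the underlying dataset itself stays private.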

