To share or not to share: What risks would laypeople accept to give sensitive data to differentially-private NLP systems?

07/13/2023
by Christopher Weiss, et al.

Although the NLP community has adopted central differential privacy as a go-to framework for privacy-preserving model training and data sharing, the choice and interpretation of its key parameter, the privacy budget ε, which governs the strength of privacy protection, remain largely arbitrary. We argue that determining the value of ε should not rest solely with researchers or system developers, but must also take into account the actual people who share their potentially sensitive data. In other words: would you share your instant messages for an ε of 10? We address this research gap by designing, implementing, and conducting a behavioral experiment (311 lay participants) to study how people behave in uncertain decision-making situations that threaten their privacy. Framing risk perception in terms of two realistic NLP scenarios and using a vignette behavioral study helps us determine what ε thresholds would make laypeople willing to share sensitive textual data. To our knowledge, this is the first study of its kind.
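For readers wondering what an ε of 10 means in practice, here is a minimal, illustrative sketch (not taken from the paper) of the classic Laplace mechanism, where ε directly sets the noise scale: noise drawn from Laplace(0, sensitivity/ε) is added to a query answer, so a smaller ε gives stronger privacy at the cost of a noisier result. The query value and budgets below are invented for illustration.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with epsilon-differential privacy via the Laplace mechanism.

    The noise scale is sensitivity / epsilon, so a smaller privacy budget
    (stronger protection) injects more noise into the released value.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Illustrative only: release a count query (sensitivity 1) under three budgets.
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_value=42.0, sensitivity=1.0, epsilon=eps)
    print(f"epsilon = {eps:>4}: noisy count = {noisy:7.2f}")
```

With ε = 10, the noise scale for a sensitivity-1 query is only 0.1, so the released count is almost exact; this is why large budgets are commonly regarded as weak protection, and why the question of what ε laypeople would actually accept is substantive.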


Related research

10/13/2021 · "I need a better description": An Investigation Into User Expectations For Differential Privacy
Despite recent widespread deployment of differential privacy, relatively...

02/24/2022 · How reparametrization trick broke differentially-private text representation learning
As privacy gains traction in the NLP community, researchers have started...

02/10/2021 · Privacy-Preserving Graph Convolutional Networks for Text Classification
Graph convolutional networks (GCNs) are a powerful architecture for repr...

03/01/2023 · What Are the Chances? Explaining the Epsilon Parameter in Differential Privacy
Differential privacy (DP) is a mathematical privacy notion increasingly ...

08/23/2018 · Privacy-Preserving Synthetic Datasets Over Weakly Constrained Domains
Techniques to deliver privacy-preserving synthetic datasets take a sensi...

04/18/2018 · When the signal is in the noise: The limits of Diffix's sticky noise
Finding a balance between privacy and utility, allowing researchers and ...

01/04/2021 · Covert Embodied Choice: Decision-Making and the Limits of Privacy Under Biometric Surveillance
Algorithms engineered to leverage rich behavioral and biometric data to ...
