Differential Privacy, Linguistic Fairness, and Training Data Influence: Impossibility and Possibility Theorems for Multilingual Language Models

08/17/2023
by   Phillip Rust, et al.

Language models such as mBERT, XLM-R, and BLOOM aim to achieve multilingual generalization or compression to facilitate transfer to a large number of (potentially unseen) languages. However, these models should ideally also be private, linguistically fair, and transparent, by relating their predictions to training data. Can these requirements be simultaneously satisfied? We show that multilingual compression and linguistic fairness are compatible with differential privacy, but that differential privacy is at odds with training data influence sparsity, an objective for transparency. We further present a series of experiments on two common NLP tasks and evaluate multilingual compression and training data influence sparsity under different privacy guarantees, exploring these trade-offs in more detail. Our results suggest that we need to develop ways to jointly optimize for these objectives in order to find practical trade-offs.
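The central tension in the abstract is between differentially private training and other desiderata. In practice, the privacy guarantee typically comes from DP-SGD-style training: per-example gradients are clipped and Gaussian noise is added before each update. The sketch below illustrates that mechanism in plain NumPy; the function name and parameters are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def privatize_gradients(per_example_grads, clip_norm=1.0,
                        noise_multiplier=1.0, rng=None):
    """Illustrative DP-SGD-style update: clip each per-example gradient
    to L2 norm `clip_norm`, average, then add Gaussian noise calibrated
    to the clipping bound."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose norm exceeds the clipping bound,
        # so one example's influence on the update is bounded.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Larger noise_multiplier => stronger privacy (smaller epsilon),
    # but noisier updates, which is the source of the utility trade-off.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return mean_grad + noise

grads = [np.array([3.0, 4.0]), np.array([0.3, 0.4])]
noisy_update = privatize_gradients(grads, clip_norm=1.0)
```

Because each example's contribution is clipped and then obscured by noise, no single training point can dominate an update, which is exactly why sparse, attributable training data influence becomes hard to achieve under differential privacy.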


