Quantifying and Mitigating Privacy Risks of Contrastive Learning

02/08/2021
by Xinlei He, et al.

Data has been the key factor driving the development of machine learning (ML) over the past decade. However, high-quality data, in particular labeled data, is often hard and expensive to collect. To leverage large-scale unlabeled data, self-supervised learning, represented by contrastive learning, was introduced. The objective of contrastive learning is to map different views derived from the same training sample (e.g., through data augmentation) closer in the representation space, while mapping views derived from different samples farther apart. In this way, a contrastive model learns to generate informative representations of data samples, which are then used to perform downstream ML tasks. Recent research has shown that machine learning models are vulnerable to various privacy attacks. However, most current efforts concentrate on models trained with supervised learning. Meanwhile, the informative representations learned with contrastive learning may cause severe privacy risks as well. In this paper, we perform the first privacy analysis of contrastive learning through the lens of membership inference and attribute inference. Our experimental results show that contrastive models are less vulnerable to membership inference attacks but more vulnerable to attribute inference attacks than supervised models. The former is due to the fact that contrastive models are less prone to overfitting, while the latter is caused by contrastive models' capability of representing data samples expressively. To remedy this situation, we propose the first privacy-preserving contrastive learning mechanism, Talos, which relies on adversarial training. Empirical results show that Talos can successfully mitigate attribute inference risks for contrastive models while maintaining their membership privacy and model utility.
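The contrastive objective described above (pulling augmented views of the same sample together and pushing views of different samples apart) is commonly instantiated as an NT-Xent/InfoNCE loss, as in SimCLR. The sketch below is a minimal PyTorch illustration of that general idea, not the paper's exact implementation; the function name nt_xent_loss and the temperature default are assumptions.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z_i, z_j, temperature=0.5):
    """Hypothetical NT-Xent contrastive loss over two augmented views.

    z_i, z_j: [batch, dim] representations of two views of the same batch.
    """
    batch_size = z_i.size(0)
    z = F.normalize(torch.cat([z_i, z_j], dim=0), dim=1)   # [2B, dim], unit norm
    sim = torch.mm(z, z.t()) / temperature                 # pairwise cosine similarities

    # A view must never count as its own positive or negative.
    self_mask = torch.eye(2 * batch_size, dtype=torch.bool, device=z.device)
    sim.masked_fill_(self_mask, float("-inf"))

    # For row k, the positive is the other augmented view of the same sample.
    pos_idx = torch.cat([torch.arange(batch_size, 2 * batch_size),
                         torch.arange(0, batch_size)]).to(z.device)

    # Cross-entropy pulls each positive pair together and pushes
    # views of other samples in the batch apart.
    return F.cross_entropy(sim, pos_idx)
```

The abstract states that Talos relies on adversarial training but gives no further detail here. The following is a generic adversarial-training sketch under the assumption of a gradient-reversal adversary head that tries to predict a sensitive attribute from the learned representation; GradReverse, adversarial_step, and lamb are hypothetical names, not the paper's API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

def adversarial_step(encoder: nn.Module, adversary: nn.Module, x, sensitive_attr, lamb=1.0):
    """One hypothetical training step: the adversary learns to infer the
    sensitive attribute from the representation, while the reversed gradient
    pushes the encoder to make that inference fail."""
    z = encoder(x)
    logits = adversary(GradReverse.apply(z, lamb))
    return F.cross_entropy(logits, sensitive_attr)
```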


