Honest-but-Curious Nets: Sensitive Attributes of Private Inputs can be Secretly Coded into the Entropy of Classifiers' Outputs

05/25/2021
by Mohammad Malekzadeh, et al.

It is known that deep neural networks, trained for the classification of a non-sensitive target attribute, can reveal sensitive attributes of their input data through the features of different granularity that the classifier extracts. Taking a step forward, we show that deep classifiers can be trained to secretly encode a sensitive attribute of users' input data, at inference time, into the classifier's outputs for the target attribute. This attack works even when users have a white-box view of the classifier and can keep all internal representations hidden, releasing only the classifier's estimation of the target attribute. We introduce an information-theoretical formulation of such adversaries and, based on this formulation, present efficient empirical implementations for training honest-but-curious (HBC) classifiers: deep models that are accurate in predicting the target attribute while also exploiting their outputs to secretly encode a sensitive attribute. Our evaluations on several tasks over real-world datasets show that a semi-trusted server can build a classifier that is not only perfectly honest but also accurately curious. Our work highlights a vulnerability that malicious machine learning service providers can exploit to attack their users' privacy in several seemingly safe scenarios, such as encrypted inference, computation at the edge, or private knowledge distillation. We conclude by showing the difficulty of distinguishing between standard and HBC classifiers, and by discussing potential proactive defenses against this vulnerability of deep classifiers.
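
To make the threat model concrete, below is a minimal PyTorch sketch of what an honest-but-curious training objective could look like: a classifier is trained for the target attribute while a jointly trained decoder learns to recover a sensitive attribute from nothing but the released softmax vector. The architectures, the trade-off weight LAMBDA, and the helper hbc_step are illustrative placeholders, not the authors' implementation; the paper's formulation is information-theoretical and, as the title suggests, codes the sensitive attribute into the entropy of the outputs, of which decoding from the full output vector is one realization.

```python
# Minimal sketch of an honest-but-curious (HBC) objective, assuming a
# 784-dim flattened input, N_TARGET target classes, and a binary sensitive
# attribute. All architectures and hyperparameters are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_TARGET, N_SENSITIVE, LAMBDA = 10, 2, 1.0

# Honest head: predicts the target attribute from the input.
classifier = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                           nn.Linear(256, N_TARGET))
# Curious head: trained to read the sensitive attribute out of the
# released probability vector alone.
decoder = nn.Sequential(nn.Linear(N_TARGET, 64), nn.ReLU(),
                        nn.Linear(64, N_SENSITIVE))

opt = torch.optim.Adam(list(classifier.parameters()) +
                       list(decoder.parameters()), lr=1e-3)

def hbc_step(x, y_target, s_sensitive):
    logits = classifier(x)
    # The server only ever releases this softmax vector to the user...
    probs = F.softmax(logits, dim=1)
    # ...yet the decoder recovers the sensitive attribute from it.
    s_hat = decoder(probs)
    honest_loss = F.cross_entropy(logits, y_target)    # accuracy on target
    curious_loss = F.cross_entropy(s_hat, s_sensitive) # leakage via output
    loss = honest_loss + LAMBDA * curious_loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    return honest_loss.item(), curious_loss.item()
```

LAMBDA trades off how honest the classifier stays on the target attribute against how accurately the sensitive attribute can later be decoded from its outputs.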

Related research

01/23/2022 · Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models
Increasing use of machine learning (ML) technologies in privacy-sensitiv...

03/16/2023 · Image Classifiers Leak Sensitive Attributes About Their Classes
Neural network-based image classifiers are powerful tools for computer v...

07/08/2022 · Probing Classifiers are Unreliable for Concept Removal and Detection
Neural network models trained on text data have been found to encode und...

02/16/2021 · Evaluating Fairness of Machine Learning Models Under Uncertain and Incomplete Information
Training and evaluation of fair classifiers is a challenging problem. Th...

12/08/2022 · Vicious Classifiers: Data Reconstruction Attack at Inference Time
Privacy-preserving inference via edge or encrypted computing paradigms e...

05/13/2018 · AttriGuard: A Practical Defense Against Attribute Inference Attacks via Adversarial Machine Learning
Users in various web and mobile applications are vulnerable to attribute...

11/09/2022 · QuerySnout: Automating the Discovery of Attribute Inference Attacks against Query-Based Systems
Although query-based systems (QBS) have become one of the main solutions...
