Eavesdrop the Composition Proportion of Training Labels in Federated Learning

10/14/2019 ∙ by Lixu Wang, et al. ∙ Zhejiang University, Northwestern University

Federated learning (FL) has recently emerged as a new form of collaborative machine learning, where a common model can be learned while keeping all the training data on local devices. Although it is designed to enhance data privacy, we demonstrate in this paper a new direction of inference attacks in the context of FL, where valuable information about the training data can be obtained by adversaries with very limited power. In particular, we propose three new types of attacks to exploit this vulnerability. The first type, Class Sniffing, can detect whether a certain label appears in training. The other two types determine the quantity of each label: the Quantity Inference attack determines the composition proportion of the training labels owned by the selected clients in a single round, while the Whole Determination attack determines that of the whole training process. We evaluate our attacks on a variety of tasks and datasets with different settings, and the corresponding results show that our attacks work well in general. Finally, we analyze the impact of major hyper-parameters on our attacks and discuss possible defenses.


I Introduction

The emergence of federated learning (FL) enables multiple devices to learn a common model while keeping all the training data on their own devices. It allows for less resource consumption on the cloud and ensures privacy at the same time. Multiple applications have benefited from FL, including mobile phones [1, 2, 3], wearable devices [4, 5], autonomous vehicles [6, 7], etc. In standard federated learning, all participants are required to train their local models. In each round, a random subset of clients is selected to upload their gradient updates to the central server. Similar FL architectures can be found in [8, 9, 10, 11, 12, 13].
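
For concreteness, the round structure just described can be sketched as follows; this is a minimal illustration with our own function and variable names (e.g., federated_round, local_train), not the paper's implementation.

import random
import numpy as np

def federated_round(global_weights, client_datasets, local_train, select_frac=0.1):
    # One FL round as described above: sample a random subset of clients,
    # collect their local updates, and average them into the global model.
    # `local_train(weights, data)` is assumed to return a weight update of the
    # same shape as `global_weights`.
    k = max(1, int(select_frac * len(client_datasets)))
    selected = random.sample(client_datasets, k)
    updates = [local_train(global_weights, data) for data in selected]
    aggregated = np.mean(updates, axis=0)   # the server only sees this aggregate
    return global_weights + aggregated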

One interesting question here concerns the security and privacy implications of the FL training process. Any characteristic of clients' private data needs to be protected carefully, since it may reveal important private information about the training data; e.g., the distribution of labels might show the diversity of participants. Similarly, what the training data consists of is also what attackers want to explore, i.e., can they determine the quantity proportion of different labels in the whole training dataset during the training process? This problem may pose serious threats to FL security. For instance, an attacker can acquire information about the morbidity of a particular disease if the government is training an online disease diagnosis system. A malicious store can figure out the relation between the supply and demand of a certain product when a new commodity registration system is trained with an FL approach, and it can then adjust its price accordingly to gain an unfair advantage.

In the literature, there are mainly two areas of research on attacking FL models: division and aggregation, which correspond to the two main roles in FL (distributed devices and the central server). The former assumes that the attacker compromises some participating devices and uses them to achieve malicious intentions, e.g., implanting backdoors into FL [14, 15], adversarial poisoning [16, 17, 18], membership or property inference [19, 20, 21, 22], and reconstruction attacks [23, 24, 25]. Attacks in the latter area are relatively less studied. In [26], the authors assume that the central server is malicious and train a GAN to reproduce data samples similar to the clients' private data. A similar idea can also be seen in [27]. Please see Section VI for a more comprehensive discussion of related work.

These previously studied attacks on FL, e.g., membership inference or reconstruction attacks, did not lay much emphasis on the quantity information in training, as they usually focus on existential information, i.e., whether a certain sample exists in the training data. Another drawback of these approaches is that they all need the individual updates that clients send to the server. However, under the secure aggregation protocol [28] or differential privacy techniques [29, 30], neither the participants nor the server can acquire the individual updates in plain form, which makes most of these attacks difficult. Therefore, more practical and applicable attacks should be based on the assumption that the observation of individual updates is not available.

The aforementioned issues motivated us to consider attacks that do not require individual updates. In this paper, we propose three new inference attacks with high success rates and without the need for any gradient updates from individual clients. In addition, our attacks concentrate on the quantity information of training data in FL, which could lead to serious consequences but, to the best of our knowledge, has never been studied in prior works. We conducted extensive experiments to evaluate the effectiveness and generality of our approaches, and the results confirm the vulnerability of FL to quantity privacy leakage.

Our contributions.

In this paper, we take the first step towards quantity estimation attacks in federated learning. Specifically:

  • We propose a new attack surface in the context of federated learning, i.e., inferring the quantity composition proportion of different labels in the training process. For instance, an attacker may learn how many data samples of each label are used in the training of a certain learning model, which may pose considerable privacy threats to the practical application of FL.

  • We design three general attacks against FL without the need to observe any individual updates. This enables adversaries to launch our attacks successfully in FL even with secure aggregation protocols or under the protection of differential privacy. Our attacks are passive, which means they do not impose any influence on the training process, and thus they can work covertly without being detected by many intrusion detection techniques.

  • Our technique can infer the labels' quantity composition proportion of a single training round or of the whole training process. The former aims at stealing the quantity information of the training data owned by the selected clients, while the latter targets the quantity proportion of all participants at different training stages.

II Threat Model

II-A Problem

The advancement of deep learning techniques has received significant interest in recent years. Deep learning also has a wide range of applications on different types of devices, which gives FL a bright stage to show its merits in convenience, privacy protection, and resource utilization. In fact, FL has shown great promise not only in smartphone-based applications (e.g., human activity recognition [1], heart rate monitoring [4], and keyboard prediction [2, 3]), but also in other fields, such as the healthcare industry (e.g., online expert systems for disease diagnosis and medical insurance registration [31]) and transportation systems (vehicular networking technology [6, 7]).

FL is designed to preserve the private data of individuals, and any information or characteristic of that data should be protected seriously. In the FL architecture, the training data owned by clients comes from various sources, so the quantity of samples among different labels might be unbalanced, which reflects the clients' overall characteristics. It is a potential source of information leakage if these quantities are illegally obtained by malicious attackers. For instance, suppose an organization wants to build an online disease prediction system with an FL structure among thousands of hospitals. Each hospital trains its local model with its own data, and the organization obtains a global model that is able to predict the trend of many diseases, not just the small group of diseases that emerge in a single hospital. Then, a malicious player who wants to know how many hospitals have treated a particular disease can raise the corresponding treatment expenses and even estimate the approximate spread of this disease. This is a simple example of how attackers may try to learn the composition proportion of training data for their own advantage, and many other applications could have similar concerns.

Thus, in this work, the attacker's goal is defined as inferring the quantity information of particular training labels, especially the composition proportion of training labels in a single training round and in the whole training process.

II-B Assumptions

Unlike prior inference attacks, the application setting here is based on more realistic scenarios: in every training epoch, the central aggregation server chooses a set of clients randomly from thousands of participants, which we call the selection process, and collects the gradient updates they generate by training local models with their own data. After such collection, we assume a secure aggregation algorithm, which is an important characteristic of FL, is executed so that the server cannot observe the individual updates sent by clients in plain-text form but can only acquire the aggregated value.

According to the property inference attack [19, 32, 23], we know that particular batches, or particular properties of the training data, can result in gradient changes on the corresponding neurons while having little effect on other neurons. As we know, each training label is a unity of multiple features. Given that we sum up all property inferences over the feature set of a particular label, is it possible to infer some information about the label itself rather than just its properties? As we discovered, the answer is yes: an adversary can infer important information about a training label by analyzing gradient changes in the training process. Here, without loss of generality, we assume that the same labels possessed by different clients result in similar local gradient changes. If we can determine how many such local changes the global updates consist of, then the number of clients who own the same label can be obtained.

Fig. 1: The basic workflow of our inference attacks. The server collects the gradient updates from the selected clients and aggregates them into the current global model. The attacker downloads the current global model, trains on the samples of different labels separately on top of the same model to obtain the corresponding updates, and then estimates the quantity proportion of labels in the training data by analyzing these updates.

II-C Attacker Capacity

One of the key features of our attacks is that they do not require any observation of individual gradient updates, which makes them much easier to launch than previous attack models. The other basic requirements are similar to those of other attacks, as discussed below.

The attacker should obtain some control of a legal participant in FL; specifically, he should be able to read the content of messages from the aggregation server, comprehend the structure of the local model, and modify or change the training data with full freedom. He also needs some prior knowledge about the training process, i.e., the average number of labels owned by each participant and the probable number of data samples per label. Such information can be estimated by collecting the data of a few participants and performing simple statistical analysis. Finally, the attacker should know the approximate number of clients selected by the server in a single training round.

II-D Attack Overview

We propose three original label inference attacks in the FL environment:

  1. Class Sniffing. In a single training round, the adversary is able to infer whether a particular class of training data appears.

  2. Quantity Inference. In a single training round, the attacker can judge whether a certain training label is owned by a small group of clients or a large one, and predict how many clients own this label.

  3. Whole Determination. The malicious participant aims to obtain the composition proportion of the dataset labels of the current global model.

The inferred information about training labels can be applied to many fields. We list three possible scenarios here:

  1. Use rare 'labels' to identify clients, since these labels are usually owned by extremely few people. Specifically, if such labels are detected in training by our attacks, the attacker can learn who participates in the training.

  2. Apply this approach to detect the intrusion of malicious participants. Intrusion phenomena, e.g., backdoor and poisoning attacks in FL, were studied in prior works. For instance, Fung et al. [16] applied cosine similarity to detect sybils, and Bagdasaryan et al. [14] proposed a powerful backdoor attack under the FL scenario. They all mentioned that the updates provided by malicious attackers are different from those of benign clients. Thus, we can regard adversarial data as a type of unique label that is only owned by malicious clients, and detect it in the training process.

  3. Obtain the composition proportion of labels in the training process. We may use other techniques (such as data augmentation or focal loss [33]) to train the learning model better if we find that the training labels are unbalanced.

III Design

III-A Background

In supervised learning, we denote the loss function as

$\mathcal{L} = \frac{1}{C}\sum_{i=1}^{C} D\!\left(y_i, \hat{y}_i\right)$ (1)

where

$y = G(x)$ (2)
$\hat{y} = F(x)$ (3)

Here $D(\cdot,\cdot)$ could take distinct forms under different scenarios, e.g., Mean Square Error (MSE) or Mean Absolute Error (MAE); $C$ is the number of label classes, and $y_i$, $\hat{y}_i$ denote the $i$-th components of the target and predicted label vectors. $G$ is the mapping from inputs $x$ to the target label $y$; $F$ is the learning model that maps the inputs $x$ to the prediction label $\hat{y}$.

The objective of the training process is to minimize the loss of the network, and here we choose the popular stochastic gradient descent (SGD) method as the network's optimizer. SGD decides how to modify the network parameters $w$ in each training iteration. Specifically, it calculates the direction opposite to the gradient of the loss function with respect to every member of $w$, combines it with the learning rate $\eta$, and updates $w$ to the next state. When the value of the loss function shrinks to a relatively low bound, the training process stops. The calculation of gradients is implemented by the back-propagation operation from the last to the first layer of the whole network, and the standard update formula of SGD is $w_{t+1} = w_t - \eta \nabla_{w}\mathcal{L}(w_t)$.
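
As a quick worked example of this update rule (ours, not from the paper), the loop below minimizes a one-parameter squared error with plain SGD.

def sgd_step(w, grad, lr=0.01):
    # One SGD update: move opposite to the loss gradient, scaled by the learning rate.
    return w - lr * grad

# Minimize (w * x - y)^2 for a single sample; the optimum is w = y / x = 3.0.
x, y = 2.0, 6.0
w = 0.0
for _ in range(200):
    grad = 2 * (w * x - y) * x      # d/dw of the squared error
    w = sgd_step(w, grad)
print(w)                             # converges toward 3.0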

III-B Overview

The basic process of our attack is presented in Figure 1. There are one or more observers, in other words adversarial attackers, in the training process. At each training iteration, they download the current global model, i.e., the detailed parameter information of the network (hence FL problems are always in the white-box form). Next, the attackers train local models with an auxiliary dataset to obtain relatively standard gradient changes, and then compare them against the global updates to determine whether a particular label appears in the current training round. Furthermore, based on a deeper comparison between the magnitudes of the local and global updates, the quantity information, i.e., how many clients own a particular training label, can also be acquired. One thing to note is that if these observers/attackers are selected as training clients, their contribution to the global model needs to be removed when making this comparison. We name the former type of attack (determining whether a particular label appears) Class Sniffing, and the latter type (acquiring quantity information) Quantity Inference. These two types of attack operate from the perspective of a single training round. We also propose another label inference attack, Whole Determination, which can determine the composition proportion of training labels in the whole training process.

Fig. 2: The positions of the output-layer neuron weights we are interested in for each label.

III-C Class Sniffing

Like most prior work, we build these attack models on a supervised classification task. We utilize a feed-forward neural network whose output size equals the total number of classes. The position of the output neurons for each training label is shown in Figure 2. We discovered a phenomenon that is similar to the basis of the property inference attack [32]. More specifically, in our experiments we observed that using a particular label in training makes the inputting weights (the network connection weights shown in Figure 2) of the corresponding output neuron grow significantly, while the weight vectors of other neurons decrease slightly. This observation motivated our design of the Class Sniffing attack.

We use the updates of each label's inputting weight set as our signal. Both the weight set and its update exist in vector form, with size equal to the number of neurons in the layer before the output layer. For example, when we train a model on the MNIST dataset, the average increase for a trained label is noticeably larger in magnitude than the average decrease. The worst case happens when there is no sample of a particular label in the training data: its corresponding inputting weights then receive all the negative impact without any positive benefit. This case can be simulated with our auxiliary data by restricting this particular label from appearing in training, so that the weight updates of its corresponding neurons match the worst case. The inputting weight update vector in this worst case can be regarded as a threshold. In a particular round, if the updates of the inputting weights corresponding to a label are higher than this threshold, the label appears in training; if the weight changes are approximately equal to this threshold, the label can be considered absent in the training round. The detailed process of acquiring such thresholds is shown in Algorithm 1.

Input: Attacker's auxiliary data samples of different training labels; approximate number of labels owned by selected clients in a training round; selection proportion of clients in a training round; approximate number of whole participants; inputting weight positions of the output layer for each label.
Output: The threshold that indicates the existence of a certain label in a training round.
  Begin:
  Receive from server
  for  = 1 to len(do
      = []
     for  in  do
         = Local_train(, ) -
         = acquire(, ) acquire updates from ;
        .append()
     end for
      = delete(, ) delete the updates of on ;
      = * * * mean()
  end for
  Local_train( , ):
   use to train local model
   return local model
Algorithm 1 Threshold acquisition for Class Sniffing
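
Because the symbols in Algorithm 1 were lost, the Python sketch below records how we read the procedure: train the downloaded global model on each auxiliary label separately, collect the resulting updates on the target label's output-layer weights when that label is absent, and scale their mean into a per-label threshold. All names (class_sniffing_thresholds, local_train, get_updates, etc.) are ours, and the final scaling by the number of selected clients and labels per client is our best guess at the algorithm's last step.

import numpy as np

def class_sniffing_thresholds(global_model, aux_data_by_label, out_weight_pos,
                              labels_per_client, select_frac, num_participants,
                              local_train, get_updates):
    # Estimate, for each label, the "worst-case" update its output-layer weights
    # receive in a round where the label never appears in anyone's training data.
    thresholds = {}
    k_selected = select_frac * num_participants
    for label, positions in out_weight_pos.items():
        decreases = []
        for other_label, samples in aux_data_by_label.items():
            if other_label == label:
                continue          # simulate the worst case: `label` is absent
            local_model = local_train(global_model, samples)
            delta = get_updates(local_model, global_model)    # full weight update
            decreases.append(delta[positions])                # updates on `label`'s weights
        # Scaling by (selected clients x labels per client) is our assumption.
        thresholds[label] = k_selected * labels_per_client * np.mean(decreases, axis=0)
    return thresholds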

III-D Quantity Inference

Similar to the workflow of Class Sniffing, in Quantity Inference the malicious attacker trains his local model using auxiliary data, in particular using data samples of a single label at a time, and thereby obtains several local updates, each corresponding to one label. We denote the increase of a label's inputting weights as the change observed when the local model is trained with samples of that label, and the decreases on the same weights as the changes observed when the local model is fed samples of other labels; both increases and decreases are vectors. Their magnitudes are different, i.e., the extent of the increase is much higher than that of the decrease. The specific magnitudes of the weight updates may change across training rounds. Nevertheless, the magnitude information for each round can be obtained by training local models on the current global model, just as the attacker does in Class Sniffing.

As it happens, the positive effect of the increase can be offset by the accumulated impact of the other decreases, and this phenomenon appears when a label is possessed by only a small number of clients. However, we can still launch the following attack despite the existence of this phenomenon. The details of the Quantity Inference attack are described in Algorithm 2 and explained below.

The changes of the inputting neuron weights do reflect the quantity information about the training data, but not all of them reflect it clearly: some weights increase less than the rest, and sometimes they could even decrease when the corresponding label appears in training. This set of weights is prone to the aforementioned 'Offset' phenomenon, which could make the attack fail. Hence, the first question from the attacker's perspective is how to remove them from the original set of inputting neuron weights. When we train the network with the data of a certain label, its presence makes the corresponding inputting neuron weights grow while the inputting neuron weights of other labels decrease. Following simple superposition rules, the higher the ratio between these two magnitudes, the more easily the 'Offset' phenomenon emerges. Thus, we can set a ratio threshold and, for each inputting weight in the weight vector, compare its increase with the decreases on the same weight in the other local updates. If there is an outlier whose corresponding ratio is higher than the threshold, we delete it from the original set and obtain a new set, as shown in Algorithm 2 from Line 9 to Line 19.

Next, let us take one label as an example. After the local training process using the auxiliary data, the attacker holds the full set of local updates. Correspondingly, the original updates of this label's inputting weights form a vector whose size equals the number of neurons in the layer before the output. We regard the members of this vector taken from the label's own local update as the increases, and the update vectors of the same weights taken from the other local updates as the decreases, and then delete the aforementioned outliers to obtain a filtered set. Each remaining increase denotes the growth observed when the label is owned by a single client, and the averaged value of the corresponding decreases indicates the negative impact of the other labels. With these two quantities, we can calculate all possible numbers of clients who own the label by using each inputting weight change in the filtered set. The client number calculation formula is (5), which is derived from the simple average aggregation shown in (4).

(4)
(5)

Here, the average number of labels owned by the selected clients is the same quantity as in Algorithm 1, and each inputting weight change yields one predicted number of clients. However, there are still abnormal weight changes whose corresponding predictions are unreasonable. For instance, given the number of clients selected in a particular round, some predictions could be larger than that number or smaller than one (the latter is regarded as an outlier since we assume the label has already been shown by Class Sniffing to be present in training). Thus, we also need to remove them from the current weight change set to obtain a final version, which is shown in Figure 3; the detailed steps can be seen in Algorithm 2 from Line 20 to Line 27. The final number of clients who own the label is determined by the mean of the predictions corresponding to this final set. Another point worth mentioning is that the standard deviation of these predictions should ideally be small; however, occasionally it is large, in which case we abort the Quantity Inference attack. Such a scenario happens at an extremely low frequency (below 1% of the whole training process).
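
Since the symbols around (4) and (5) were lost, the LaTeX sketch below only illustrates the kind of inversion we understand the attack to perform; $K$, $\bar{L}$, $P_k$, $N_k$, $\Delta W_k$, and $n$ are our own notation, and the exact form used in the paper may differ.

% Illustrative only; notation is ours, not the paper's.
% K: clients selected this round,   \bar{L}: average labels per client,
% P_k: increase on weight k contributed by one owner of the label,
% N_k: average decrease on weight k caused by one other label,
% \Delta W_k: aggregated update observed on weight k,   n: owners of the label.
\begin{align*}
\Delta W_k &\approx \frac{n\,P_k + (K\bar{L}-n)\,N_k}{K\bar{L}}, \\
n &\approx \frac{K\bar{L}\,\bigl(\Delta W_k - N_k\bigr)}{P_k - N_k}.
\end{align*}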

Fig. 3: The inputting weight updates of the output layer (its preceding layer has 50 neurons) when the model is trained with the samples of label 0. The updates circled by the red ellipse form the final weight change set used for estimation.
Input: Attacker's auxiliary data samples of different training labels; approximate number of labels owned by selected clients in a training round; selection proportion of clients in a training round; approximate number of whole participants; inputting weight positions of the output layer for each label; the label of interest; the ratio threshold.
Output: The number of clients who own the label of interest.
1:  Begin:
2:  Receive from server
3:   = []
4:  for  in  do
5:      = Local_train(, ) -
6:      = acquire(, ) acquire updates from ;
7:     .append()
8:  end for
9:   = []
10:   = []
11:   = delete(, ) delete the updates of ;
12:  for each in  do
13:      = [][]
14:      = mean()[]
15:      = /
16:     if  then
17:        .append()
18:     end if
19:  end for
20:  for each in  do
21:      = [][]
22:      = mean()[]
23:     
24:     if  or  then
25:        delete(, ) delete from ;
26:     end if
27:  end for
28:   =
29:  for each in  do
30:      = [][]
31:      = mean()[]
32:     
33:     .append()
34:  end for
35:   = mean()
Algorithm 2 Quantity Inference Attack
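
As with Algorithm 1, the listing above lost its symbols, so the following Python sketch reflects our reading of it: filter out weights prone to the 'Offset' effect with a ratio test, invert the averaging on each remaining weight, discard implausible per-weight estimates, and average the rest (aborting when the estimates disagree too much). The names and the concrete threshold values are illustrative assumptions.

import numpy as np

def quantity_inference(inc, dec, agg, k_selected, labels_per_client,
                       ratio_threshold=0.3, std_limit=2.0):
    # inc[i]: increase on weight i when one owner of the label trains (from aux data)
    # dec[i]: average decrease on weight i caused by one other label
    # agg[i]: aggregated update observed on weight i in this round
    # The threshold values here are illustrative, not the paper's.
    total = k_selected * labels_per_client        # label-updates averaged per round
    estimates = []
    for i in range(len(inc)):
        # Drop weights prone to the 'Offset' effect: decrease too large vs. increase.
        if abs(dec[i]) > ratio_threshold * abs(inc[i]):
            continue
        n_i = total * (agg[i] - dec[i]) / (inc[i] - dec[i])
        if 1 <= n_i <= k_selected:                # keep only plausible estimates
            estimates.append(n_i)
    if not estimates or np.std(estimates) > std_limit:
        return None                               # abort this round, as the paper does
    return int(round(np.mean(estimates)))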

III-E Whole Determination

If the attacker is not sensitive to time immediacy and is patient enough, i.e., he cares about the composition proportion of the entire training data over a long training span rather than just one or a few training rounds, we can propose another new attack. This attack relies on the overfitting characteristic of a learning model when the training process continues for a long time, which suits FL application scenarios well.

Let us describe an example case. In the training process of a deep neural network, there is a set of labels appearing frequently (the number of samples is large) and another set appearing only occasionally (the number of samples is small); we call them frequent and occasional labels, respectively. As in the former attacks, the attacker downloads the current global model and trains his local model with auxiliary data fed in label by label, and eventually he obtains the corresponding local gradient updates of each label. When we investigated the inputting weight changes of one particular frequent label and one occasional label, we observed an interesting phenomenon: their absolute values within the other labels' local updates present a huge difference. That is to say, the absolute values of the frequent label's inputting weight changes within the other frequent labels' gradient updates are much higher than those of the occasional label's inputting weight changes within the same gradient updates, and a similarly large difference can be seen within the occasional label's own gradient updates. In various experiments, such a phenomenon can be easily observed; for instance, it appears after 10 epochs in the MNIST classifier training process, as shown in Figure 4 and Figure 5. The attacker is then able to analyze this phenomenon to access information about the composition proportion of training labels.

Fig. 4: The ratio change of a frequent label across training rounds.
Fig. 5: The ratio change of an occasional label across training rounds.

III-E1 Explanation

Let us give a possible explanation of this phenomenon here. The connection between the inputting weight updates of the output layer and the corresponding labels is strong enough that we can regard these neuron weights as the feature set of each label. Then, (2) is changed to

(6)
(7)

We assume that the features embedded in the neurons of the output layer are highly independent of each other after the extraction and filtering of the front layers. This independence means they are irrelevant to each other, hence their derivatives with respect to each other are zero:

(8)
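
The symbols in (6)-(13) did not survive extraction; as far as the prose allows us to reconstruct it, the independence assumption stated in (8) amounts to the following, where $f_i$ denotes the $i$-th feature carried by the output layer (our notation).

% Our notation: f_i is the i-th output-layer feature.
\frac{\partial f_i}{\partial f_j} = 0 \qquad \text{for } i \neq j .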

The output of a classifier is a probability vector, and the prediction for a particular input is the label corresponding to the highest dimension of that probability vector. We know that the quantities of data samples of different labels differ, hence their proportions in the target label differ. We thus hypothesize that this proportion can be measured by calculating the derivative with respect to the features of the output layer: if a label has more samples, its proportion will be greater, and vice versa. This is presented as

(9)

The specific form of the updates then corresponds to the aforementioned phenomenon from the perspective of a single label.

The new model here denotes the next version of the ideal model obtained when the current global model accepts the input of one label's training samples; hence it is natural that, to some extent, the only difference between these two versions lies in the derivatives with respect to that label. In other words, the differences of the other derivatives can be neglected:

(10)

Since the former labels are frequent, the latter is occasional, and the current global model is the same in both cases, we can obtain that

(11)
(12)

Combining the above results, we can conclude that

(13)

III-E2 Attack Approach

Based on the phenomenon above (13), we can conclude that the ratio of occasional labels is larger than that of frequent labels, and we leverage this conclusion to determine the quantity relation between different labels. As in the former attacks, when the adversary launches the Whole Determination attack, he trains his local model on the basis of the current global model with auxiliary data and correspondingly obtains the local updates. He can then calculate all the ratios from these updates. Next, these ratios are formed into one vector per label. Finally, these vectors can be clustered into different groups by an unsupervised algorithm; vectors falling in the same group indicate that their corresponding labels have approximately the same number of data samples in training, while the quantities can present huge differences if labels belong to different clusters. Here, the unsupervised algorithm we adopt is Hierarchical Clustering, which classifies the given data into different clusters with the Euclidean Distance metric. Attackers may also choose other clustering approaches.
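
A minimal sketch of this final clustering step, assuming the per-label ratio vectors have already been computed: we use scikit-learn's AgglomerativeClustering (Ward linkage, which operates on Euclidean distance) as a stand-in for the hierarchical clustering mentioned above, and the distance threshold is purely illustrative.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

def group_labels_by_quantity(ratio_vectors, distance_threshold=1.0):
    # Cluster the per-label ratio vectors; labels landing in the same cluster are
    # inferred to have roughly the same number of training samples.
    X = np.asarray(ratio_vectors)                 # shape: (num_labels, vector_dim)
    clustering = AgglomerativeClustering(
        n_clusters=None,                          # let the distance cut decide the groups
        distance_threshold=distance_threshold,    # illustrative value
        linkage="ward",                           # Ward linkage uses Euclidean distance
    )
    return clustering.fit_predict(X)              # one cluster id per label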

IV Evaluation

All experiments were conducted on a workstation running Ubuntu 18.04 LTS equipped with a 2.10GHz Intel Xeon(R) Gold 6130 CPU, 64GB RAM, and an NVIDIA TITAN RTX GPU card. We construct the models mainly in PyTorch [34], and use SciPy/scikit-learn [35] to implement some machine learning models.

IV-A Experiment Setting

IV-A1 Auxiliary Data

Like many other inference attacks on machine learning, the attacker needs auxiliary data, consisting of some data samples of the labels he wants to infer. In practice, such data samples are often not difficult to acquire. The number of data samples should be close to the average quantity owned by participants. If the samples are not enough, the attacker may try reproduction techniques such as GANs to construct more similar samples.

IV-A2 Network Structure

The main structure is based on the standard construction of federated learning [36], with some modifications for practical purposes, e.g., participating clients are able to run several epochs locally rather than just a single epoch before sending their updates to the aggregation server [37]. The symbols of the major hyper-parameters are defined in Table I.

Symbols Description
Local model training batch size
Local model learning rate
Local model training epochs
Selection proportion of clients in a training round
Selected models to accomplish learning task
Approximate number of whole participants
TABLE I: Hyper-parameters of FL in evaluation

IV-A3 Datasets

The dataset information (number of training labels and corresponding training model) is presented in Table II. We choose these datasets because they are close to our concerns about privacy in daily life. For instance, Fer2013 is relevant to face recognition, while HAM10000 aims at diagnosing several skin cancers. Both of them contain private information owned by different people.

Datasets #Records #Labels Model
MNIST MLP & CNN
CIFAR10 LeNet5 & Resnet18
Fer2013 Resnet18
HAM10000 Resnet18
TABLE II: Datasets and relevant information in our experiments
  • MNIST. As one of the most popular and classical datasets in machine learning, MNIST includes 10 labels, each of which corresponds to approximately 6,000 32×32 gray handwritten digit images; 5,000 of them are training data, while 1,000 are for testing. Because of its simplicity and the small total amount of training data, it is not easy for deep and complicated models to achieve high performance. Hence we choose a standard but simple MLP (multi-layer perceptron) and a CNN model for it, both of which are able to achieve accuracy. The MLP model contains an input layer followed by two fully-connected hidden layers of size 256 and 64, with a dropout operation between the hidden layers, and finally an output layer with size of 64 (a PyTorch sketch of this MLP is given after this list). We use the rectified linear unit (ReLU) as the activation function for all layers. Other settings are , , . The CNN model consists of two spatial convolution layers with 10 and 20 filters (kernel size 5×5), max pooling layers with size set to 2, a dropout layer, a fully-connected layer with size 320, and finally an output layer whose size is 50. The activation function is ReLU, and the other settings are the same as for the MLP.

  • CIFAR10. CIFAR10 consists of 10 classes containing 6,000 32×32 RGB images each, which can likewise be divided into 5,000 for training and 1,000 for testing. The training labels cover common objects in daily life, making the dataset suitable for object identification tasks on smartphones. For the classification model, we select two commonly used networks, LeNet5 [38] and ResNet18. The former can achieve accuracy, while the latter can achieve . LeNet5 consists of two convolution layers with 6 and 16 filters respectively (kernel size 5×5), pooling layers with size set to 2, two fully-connected linear layers with sizes 400 and 120, an output layer, and a softmax layer. Its parameter setting is , , . The specific network structure of ResNet18 can be found in [39]. The parameter setting on ResNet18 is , , .

  • Fer2013. Fer2013 [40] originates from a Kaggle competition, the Facial Expression Recognition Challenge 2013, which aims to build a learning model that recognizes human expressions automatically. It contains approximately 30,000 facial RGB images of different expressions with size restricted to 48×48, and its main labels can be divided into 7 types: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral. The Disgust expression has the smallest number of images (600), while the other labels have nearly 5,000 samples each. We randomly select of the samples for each label as the training data and use the rest for testing. We choose ResNet18 as the learning model for Fer2013. It is trained under the setting , , . The testing accuracy is .

  • HAM10000. HAM10000 [41] is a large collection of multi-source dermatoscopic images of pigmented lesions. There are nearly 37,000 records about skin lesions, classified into 7 labels, i.e., 0=Melanocytic nevi, 1=Melanoma, 2=Benign keratosis-like lesions, 3=Basal cell carcinoma, 4=Actinic keratoses, 5=Vascular lesions, 6=Dermatofibroma. Each label corresponds to approximately 5,000 images. Similarly, we randomly divide the data into 4,500 for training and 500 for testing. HAM10000 is also trained on ResNet18, and its parameters are , , . The testing accuracy is .
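
The PyTorch sketch below shows how we read the MLP description for MNIST; the input size (32×32), the placement and rate of dropout, and especially the interpretation of "output layer with size of 64" as a 64-to-10 classification layer are our assumptions.

import torch.nn as nn

class MnistMLP(nn.Module):
    # Sketch of the MLP described above: hidden layers of 256 and 64 with ReLU,
    # dropout between them, and a 10-way output; the dropout rate is our assumption.
    def __init__(self, in_dim=32 * 32, num_classes=10, p_drop=0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)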

IV-B Class Sniffing

In order to simulate more practical application scenarios of FL, we allocate the training dataset samples randomly. Let us take MNIST as an example. We create a setting with 100 participants. In each training round, the server selects 10 clients randomly and collects their gradient updates to form the aggregated global model. In our setting, each participant possesses 3, 4, or 5 main labels and a small number of other labels, and the number of data samples per main label is much larger than that of the other labels. All participants select their own main labels randomly. The data allocations of the other datasets are similar to that of MNIST, which we believe simulates practical scenarios to some extent.

Dataset Model Success Rate(%)
Labels 0 1 2 3 4 5 6 7 8 9
MNIST MLP 94 97 95 98 99 93 94 96 97 95
CNN 96 97 98 93 96 95 95 96 98 94
CIFAR10 LeNet5 92 94 97 99 98 93 96 98 99 96
Resnet18 93 97 97 94 97 95 96 93 98 97
Fer2013 Resnet18 99 94 95 94 97 98 98 - - -
HAM10000 Resnet18 93 97 98 98 98 96 95 - - -
TABLE III: The success rate of Class Sniffing on experiment datasets
Dataset Model Success Rate(%)
Labels 0 1 2 3 4 5 6 7 8 9
MNIST MLP 91 91 94 93 93 94 92 97 96 92
CNN 95 96 98 98 100 95 95 100 100 96
CIFAR10 LeNet5 96 94 98 96 96 92 94 97 94 94
Resnet18 92 95 94 93 95 92 98 93 95 96
Fer2013 Resnet18 97 92 100 100 98 95 98 - - -
HAM10000 Resnet18 95 93 94 95 93 94 97 - - -
TABLE IV: The success rate of Quantity Inference on experiment datasets

The goal of Class Sniffing is to predict whether a certain label appears in a training round, hence the evaluation metric here is the success rate of this prediction. That is, if over several training rounds we correctly detect the existence of a label x times and fail y times, then the success rate is x/(x+y). We perform this attack on all datasets with their own standard models for 100 training rounds, and the results are presented in Table III. As shown in the table, the success rate is relatively high (at least 92%) for all datasets, which demonstrates the effectiveness of Class Sniffing.

IV-C Quantity Inference

The Class Sniffing attack is designed to detect the existence of a particular label, while Quantity Inference aims to acquire the quantity information of a label in a single training cycle. Because the settings of Class Sniffing are already relatively practical, there is no need to change them here. That is, the number of FL participants is still set to 100, the randomly selected fraction is 0.1, and the allocation strategy for dataset samples is the same.

IV-C1 Metrics

Considering both the threat model and the problem we want to solve, we need to define a new metric to evaluate the attack here. The main idea is to set an error bound and evaluate how often the attacker can estimate the number of clients possessing a particular label within that bound. Specifically, in a particular training round, assume there are n clients possessing a label, and the attacker launches Quantity Inference in this round and obtains an estimate n' of the number of clients who possess that label. We regard an attack as successful if |n' - n| falls within the error bound, and as failed otherwise, where the error bound controls the accuracy requirement; we fix this bound in our experimental evaluation. Then, by recording the number of times the attacker's estimate stays within the error bound and the number of times it exceeds the bound, we can calculate the success rate.
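
Written out as a small helper (with our own names, and an illustrative default for the error bound, whose actual value is not recoverable from the text), the metric is:

def quantity_success_rate(true_counts, estimated_counts, error_bound=1):
    # Fraction of rounds where the estimated number of owning clients is within
    # `error_bound` of the true number; the default bound here is illustrative.
    hits = sum(abs(n_est - n_true) <= error_bound
               for n_true, n_est in zip(true_counts, estimated_counts))
    return hits / len(true_counts)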

IV-C2 Results

We evaluate the Quantity Inference attack on all datasets with their own standard models for 100 training rounds, and the results are shown in Table IV. We can see that the success rate is high for all datasets, i.e., between 91% and 100%, which shows the effectiveness of our Quantity Inference attack. Moreover, the results are consistently high across the four datasets, which shows the broad applicability of our approach.

IV-C3 Impact of Hyper-parameters

We also study how Quantity Inference is affected by hyper-parameters, in particular the local model training batch size and the local model training epochs. We choose MNIST with the CNN for this study, and fix the other settings as in Sec. IV-A3 when we evaluate a particular parameter. The results are shown in Figure 6 and Figure 7.

For the impact of different batch sizes shown in Figure 6, we can observe that the success rate hits its bottom when the batch size is set relatively small, and reaches a relatively high level with a batch size of 20. As we know, the batch size in training should not be set too small, as that leads to more calculation iterations and can make the model perform poorly. Hence, we think the attack should be effective under common batch size settings.

We can also clearly observe the impact of local training epochs in Figure 7, where the success rate decreases slightly as the number of local epochs rises. To our knowledge, the standard application of FL usually allows participants to train their local model for only one epoch. The reason is that, considering the limited computation capacity of local devices, more epochs drastically increase the computation load and may affect the normal operation of those devices. Recently, some researchers have proposed that local devices can shoulder more computation tasks [37], which means more local training epochs are possible; however, more than 10 epochs would still be quite demanding. Thus, we believe that Quantity Inference should work well in most circumstances, even when there are multiple local training epochs (as long as there are not too many).

Fig. 6: The success rate of Quantity Inference with different training batch sizes among 100 training rounds.
Fig. 7: The success rate of Quantity Inference with different local training epochs among 100 training rounds.

IV-C4 Quantity of Participants

To further investigate the practicality of Quantity Inference, we consider changing the selection proportion and the overall number of participants. In our former default settings, the selection proportion is 0.1 and the number of participants is 100. In this study, we explore a range of values, and the results are shown in Figure 8 and Figure 9.

We first fix the overall number of participants to 100 and change the selection fraction from 0.1 to 0.5 in steps of 0.1. If we still use the original metric from Sec. IV-C1, denoted as the absolute error bound in the figures, we can see that the success rate shows a moderate decline as the proportion increases. Such a trend is reasonable: the more clients there are in each round, the harder it is to achieve the same success rate under an absolute error bound. However, we can also consider a new metric based on a relative error bound that depends on the number of selected clients, defined as the difference between the real number of clients possessing a label and the estimated number with respect to an error bound proportional to the number of selected clients rather than a fixed constant. Under the relative error bound, the success rate always stays at a high level.

Then, we keep the selection proportion unchanged at 0.1 and increase the number of overall participants from 100 to 1000. We can observe that the success rate decreases slightly as the number of participants increases. However, if we use the metric based on the relative error bound, the success rate stays at a high level. Overall, both figures demonstrate the effectiveness of our Quantity Inference attack over a wide range of numbers of overall participants and selection proportions.

Fig. 8: The success rate of Quantity Inference with different selection proportions among 100 participants.
Fig. 9: The success rate of Quantity Inference with different numbers of participants with selection proportion .

IV-D Whole Determination

Whole Determination is an attack aimed at well-developed models, which have been trained with considerable data samples and perform well on their given tasks. Hence, for its evaluation, we choose to launch the attack in the middle and late stages of the training process, when the model is near convergence. This does not mean, however, that Whole Determination cannot work at more advanced stages. Before the attack, all the datasets above are trained under their own default settings. When the loss of the model decreases to a relatively small value, the attacker uses Whole Determination to obtain the composition proportion of the training labels.

IV-D1 Dataset Allocation

In the previous experiments, the number of data samples of each label was decided via random selection, but here we cannot apply this strategy. We need to make the numbers of data samples belonging to different labels differ; otherwise we cannot evaluate the performance of Whole Determination. To start with, we should figure out the connection between the magnitude of the ratio difference and the proportion difference between labels. We conduct experiments by changing the number of samples belonging to a certain label and recording the corresponding ratio difference. The results are presented in Figure 10. As shown in the figure, an obvious ratio difference can be observed when there is a four-fold difference in the number of samples owned by the two labels. As a result, we divide all the labels into 3 groups randomly and ensure that labels in different groups are allocated markedly different numbers of data samples. These groups are used to train the learning model, and we evaluate whether our approach can detect this composition proportion.

Fig. 10: The ratio difference between two labels. The index of the horizontal axis indicates the multiple by which the number of data samples owned by one label exceeds that owned by the other.

IV-D2 Results

We conduct the experiments on all datasets. For each dataset, we train the model 20 times and launch the Whole Determination attack in every training run. The middle stage of training is defined as the rounds when the testing accuracy reaches an intermediate level, while the late stage is when the testing accuracy exceeds a higher threshold. Clearly, the dataset allocation is different in each training process because of the random allocation. We also use the success rate as our metric, and only consider an attack successful when the clustering results are exactly the same as the data allocation before training, including the number of clusters and the specific labels in each cluster; otherwise we regard it as a failure.

The results are presented in Table V. We can see that the average success rate is very high (almost 95%), which shows that the Whole Determination attack is effective under such circumstances. The success rate in the middle stage is a little worse than that in the late stage. We think the reason could be that the direction of gradient exploration is relatively more random in the middle stage of training.

Dataset Model Success Rate(%)
Stages Middle Late
MNIST MLP 95 100
CNN 95 100
CIFAR10 LeNet5 90 95
Resnet18 95 100
Fer2013 Resnet18 95 95
HAM10000 Resnet18 95 95
TABLE V: The results of Whole Determination on experiment datasets

V Discussion

V-A Network Layers

One could ask why we consider the output layer and whether a similar phenomenon exists in other layers, e.g., the hidden layers. The main task of the front network layers is typically to extract and filter the features of the training data, which means the objects they process are various features. The emergence of a particular feature leads to corresponding gradient updates, while its absence has no influence [32]. It is interesting to note that a certain label usually possesses many different features; in other words, it is a unity of multiple features. Moreover, different labels may share the same features, e.g., cats and dogs have similar fur features. What is more, in some neural networks the features embedded in the front layers are not practically explainable or interpretable to human analysts, especially in convolution operations. These characteristics make the front layers inapplicable to our case, where we want to obtain quantity information about training labels. However, if we cannot access the output layer but only several front layers, we could try to apply Explainable Machine Learning techniques to extract the key features of each class, such as LIME [42] for linear classifiers and LEMNA [43] for deep neural networks. Explainable Machine Learning aims at figuring out why a particular input sample is classified into its corresponding label and at providing relatively interpretable reasons for users, especially debuggers and computer security practitioners. These reasons, namely the key features, can serve as identifiers for different labels, and our method could possibly be built on them.

V-B Defense

The three attacks we propose share similarities with the property inference attack (albeit our focus switches from existential information to quantity information). Thus, some defenses designed against property inference may be leveraged to mitigate our attacks. Here we discuss two possible defenses.

V-B1 Compress the gradient updates

As mentioned in [44], there is no need to share the whole set of network parameters in collaborative learning. In other words, compressing or distilling the significant neuron updates can still make the global model converge and achieve good performance. As for our attacks, the adversary does not need to observe any individual updates; what he needs is only the global model parameters. Hence there could be some impact on the attacks if a participant cannot acquire the whole global model but only part of it.

We simulate such a defense in our experimental setting by keeping the weights whose gradient updates are relatively large and making the other weights invisible to participants. Specifically, if we set the compression rate (CR) in advance, each client selects the top fraction of inputting weight updates according to CR to be uploaded to the server. This simple compression operation might slow the convergence of the global model to some extent, but its influence on the performance of the model is not great. Under such circumstances, we conduct experiments on Quantity Inference under the default setting (MNIST on CNN). The results can be seen in Table VI.

Compression Rate Success Rate(%) Aborting Rate(%)
0 96 0
0.1 95 3
0.2 96 10
0.3 97 20
0.4 94 28
0.5 95 37
TABLE VI: The results of compression defenses on MNIST

We launch Quantity Inference for 100 rounds under each CR setting. From the results, we can see that the success rate of our attack is not noticeably affected by this defense. However, the aborting rate increases considerably, since Quantity Inference is designed to abort any round whose corresponding standard deviation of the per-weight estimates (Sec. III-D) is high. Thus, the aborting operation indirectly makes the attack less effective.
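
A minimal sketch of the simulated defense, under our reading that CR is the fraction of update entries withheld (so CR = 0 matches the uncompressed baseline row of Table VI); the exact rule used in the paper may differ.

import numpy as np

def compress_update(update, compression_rate=0.3):
    # Keep only the largest-magnitude (1 - CR) fraction of a client's update and
    # zero out the rest before uploading; CR = 0 leaves the update untouched.
    update = np.asarray(update, dtype=float)
    if compression_rate <= 0:
        return update.copy()
    k_keep = max(1, int(round((1 - compression_rate) * update.size)))
    threshold = np.partition(np.abs(update).ravel(), -k_keep)[-k_keep]
    return np.where(np.abs(update) >= threshold, update, 0.0)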

V-B2 Dropout

Another possible defense is adding a Dropout layer to the neural network, which is also an effective approach to mitigate overfitting. Because of its random removal of features during the training process, the dropout operation may make the gradient updates of clients more different from each other, which could have some impact on our attacks. However, in our initial evaluation experiments, especially on the MNIST dataset, both the MLP and the CNN (Sec. IV-A3) have dropout layers, and the success rate of our three attacks is still extremely high. Thus, we are not sure whether the dropout technique is able to defend against our attack methods, and we plan to conduct a deeper analysis in the future.

VI Related Work

VI-A Privacy-preserving Federated Learning

Federated learning is an evolved form of distributed learning: it enables training data to stay local while a collaborative global model is learned. Existing work that considers privacy can be classified into Differential Privacy (DP) mechanisms and secure Multi-Party Computation (MPC). Geyer et al. [29] take the perspective of the client and realize differential privacy protection by adding Gaussian noise to the local updates in settings with a large number of participants; similar work can also be seen in [30]. Hamm et al. [45] apply knowledge transfer techniques to aggregate multiple models trained on individual devices with a DP guarantee.

Bonawitz et al. [28] design secure multi-party aggregation techniques, pertinent to federated learning, that enable participants to encrypt their updates so that the central server cannot observe individual gradient updates in plain form and can only perform the aggregation operation. Mohassel and Zhang [46] enable two servers to train a global model with multi-party encrypted data, and the training process is protected by MPC techniques.

VI-B Inference Attack

Different types of inference attacks in the collaborative setting emerge frequently. Hitaj et al. [23] create a GAN structure to imitate the output probability distributions and use reverse learning to infer the training data. Hayes et al. [25] note the privacy leakage in machine-learning-as-a-service applications and also train several GANs to detect overfitting characteristics of input-output pairs. Truex et al. [20] propose a membership inference threat on the surface of FL, but they assume that FL is deployed as machine-learning-as-a-service and that adversaries can sniff the output probability distributions of all other clients rather than model parameter updates, which we think is not an inference attack on standard FL. Melis et al. [19] lay emphasis on unintended feature leakage in the collaborative learning setting by training a shadow attack model to infer information about the training data, and their threat model has been simplified to some extent. Different from the above works, Wang et al. [26] assume that the aggregation server in FL is malicious; they combine the main task of the global model, identity distinguishing, and a traditional authentication task to form the mixed discriminator of a GAN that is able to track particular victims and reconstruct their private training data. Similar to the aggregation feature of FL, in the field of aggregate location data, Pyrgelis et al. [21] use a challenge game to distinguish victims from other participants and then track the location information of particular victims.

There is also some related work on property inference attacks against traditional machine learning, in both the white-box and black-box settings. For instance, in the black-box setting, Salem et al. [24] use a GAN to achieve reconstruction and then sniff information about the training data between different versions of a learning model based on the updates of the output results. Ateniese et al. [32] use different properties of the training data to obtain several meta-models and combine them to sense the existence of a particular label. Ganju et al. [47] construct an inference attack against fully-connected neural networks, realized by applying post-training techniques to a white-box model.

VI-C Other Attacks and Defenses

Attack. Federated learning is a fertile research field for security problems, and there have been several other interesting attacks recently. Bagdasaryan et al. [14] create a backdoor approach in the FL setting, which can pose a backdoor threat with high target-class accuracy after only a few rounds of attack. Baruch et al. [15] propose a poisoning attack whose impact is profound and which can escape prevalent anomaly detection; they realize it by spreading the abnormal contribution that would concentrate in a few neurons across a large number of neurons, and they also investigate how much abnormality current detection approaches can tolerate. Bhagoji et al. [17] explore the threat of model poisoning attacks on FL launched by a single, non-colluding malicious agent, where the adversarial objective is to cause the model to misclassify a set of chosen inputs with high confidence.

Defense. Shen et al. [48] apply a clustering operation to individual parameter updates before aggregation to detect malicious participants in the distributed learning setting. Blanchard et al. [49] use Euclidean distance to measure the contribution of clients to the global model and design a selection strategy to tolerate gradient contributions from Byzantine attackers. Fung et al. [16] present the impact of sybil attacks in FL and design a detection algorithm that compares the cosine similarity between gradient updates.

VII Conclusion

In this paper, we proposed three original inference attacks against federated learning. The attack target includes the quantity composition proportion of training labels, a new consideration in FL security. Specifically, Class Sniffing can detect the existence of a particular label in a single training round; Quantity Inference is able to determine how many clients own a certain label from the perspective of a single iteration; and finally, Whole Determination aims to infer the quantity information among different labels for the whole training process. All of them work in a passive way and do not impose any influence on the FL structure, hence it is difficult for prevalent intrusion detection techniques to detect our attacks. Besides, none of the three attacks requires the observation of any individual gradient updates from participants, which enables attackers to apply them in more practical scenarios.

We have conducted extensive experiments that demonstrate the effectiveness of our attacks, with evaluation settings made as practical as we can. All three attacks are shown to be very effective, with their success rates staying at a relatively high level (typically around 95%). Moreover, we also investigated the impact of major hyper-parameters, e.g., batch size, local epochs, and the overall number of participants. The results demonstrate the broad applicability of our approaches.

References