
Quantification of the Leakage in Federated Learning

by   Zhaorui Li, et al.

With the growing emphasis on users' privacy, federated learning has become increasingly popular, and many architectures have been proposed to improve its security. Most of these architectures work on the assumption that gradients computed on the data cannot leak information about that data. Recently, however, some works have shown that such gradients may leak the training data. In this paper, we discuss the leakage of a federated approximated logistic regression model and show that the gradients can leak the complete training data when all elements of the inputs are either 0 or 1.




1 Introduction

Federated learning, where multiple parties construct a joint model with each party's own data, has become increasingly popular as the emphasis on privacy has grown. Many architectures have been proposed (e.g., McMahan et al. (2016) and Konečnỳ et al. (2016)) to trade off efficiency against security. However, some works have analyzed the leakage in federated learning (e.g., Hitaj et al. (2017)), and many defenses have been proposed in response (e.g., Bonawitz et al. (2017) and Geyer et al. (2017)). One of the defenses is to use additive homomorphic encryption to encrypt model parameters for secure aggregation during the update procedure (Aono et al. (2017)).

However, this technique might not fully mitigate the leakage in federated learning. Melis et al. (2018) showed that an honest-but-curious participant can obtain the gradients computed by others from the differences of the global joint model and thus infer unintended features of the training data. Zhu et al. (2019) "steal" the training data pixel-wise from gradients. However, these methods only work on complicated networks (e.g., convolutional neural networks) with huge numbers of parameters, and do not apply to simple models like the logistic regression model.

For the logistic regression model, the loss function cannot be directly encrypted by additive homomorphic encryption due to the incompatibility of the sigmoid function with additive homomorphic encryption. Thus, many approximation works (e.g., Aono et al. (2016)) have been proposed, whose goals are to approximate the sigmoid function for additive homomorphic encryption while preserving accuracy.
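For concreteness, one common choice in such works is a low-degree polynomial in place of the sigmoid. The sketch below (our own numpy illustration, not the authors' code) uses the degree-1 Taylor expansion $\sigma(z) \approx \frac{1}{2} + \frac{z}{4}$ and shows the gradient it induces for a batch $(X, y)$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def approx_sigmoid(z):
    # Degree-1 Taylor expansion around 0: sigma(z) ~= 1/2 + z/4.
    # Polynomials are compatible with additive homomorphic encryption,
    # unlike the exact sigmoid.
    return 0.5 + z / 4.0

def approx_gradient(X, y, w):
    # Gradient of the Taylor-approximated logistic loss over batch (X, y):
    #   g = (1/n) X^T (approx_sigmoid(X w) - y)
    #     = (1/n) (1/4 X^T X w - X^T (y - 1/2)),
    # i.e. g is a *linear* function of the model w.
    n = X.shape[0]
    return X.T @ (approx_sigmoid(X @ w) - y) / n
```

Note that, unlike the exact logistic gradient, this gradient is linear in $w$; the leakage analysis below rests on this linearity.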

In this paper, we discuss the leakage of the federated approximated logistic regression model in the case where all elements of the input are binary, a widely used encoding in gene data analysis (e.g., Uhlerop et al. (2013)) and risk analysis. The loss function of the model is approximated in the way proposed by Aono et al. (2016). We will show that the training data can be completely inferred by an honest-but-curious participant.

2 Related Work

Aono et al. (2017) showed that part of the training data leaks in collaborative learning when a batch contains only a single sample. Melis et al. (2018) proved that features of the training data that are unrelated to the model's target leak through the update procedure. Zhu et al. (2019) use an optimization method to infer the whole training data from the gradients leaked during the update procedure. However, this optimization method does not apply to the approximated logistic regression model, since gradients computed on different batches can be identical. In addition, none of these works consider the case where the gradient is computed over multiple batches.

3 Leakage Analysis

3.1 Preliminary

Consider the two-party case of federated learning, where two parties (Alice and Bob) share the same logistic regression model. The parameters of the federated model are $w \in \mathbb{R}^{d}$ and the feature number of the inputs is $d$. Consider the case where all elements of the input data are either 0 or 1, which has practical application in domains like gene analysis (e.g., Uhlerop et al. (2013)) and risk analysis.

The way of training is the same as the horizontal case in Yang et al. (2019). In each iteration: (1) Alice (Bob) computes local training gradients $g_A$ ($g_B$) by Stochastic Gradient Descent (SGD) and sends $g_A$ ($g_B$) to the server. (2) The server uses the secure aggregation proposed by Aono et al. (2017) to aggregate $g = g_A + g_B$. (3) The server sends the aggregated gradient $g$ back to Alice and Bob. (4) Alice (Bob) updates the local model by $w \leftarrow w - \eta g$.

The loss function of this model is approximated in the same way as Aono et al. (2016), so that the secure aggregation step of training can operate on it. The gradient of the loss function in matrix form is $g = \frac{1}{n} X^{\top}\big(\frac{1}{4} X w - (y - \frac{1}{2}\mathbf{1})\big)$, where $X$ denotes the batch of inputs, $y$ the corresponding labels, and $n$ the batch size.

Notice that $g_B = g - g_A$. Suppose Alice is an honest-but-curious participant; thus $g_B$ is leaked to Alice. We assume Bob uses the same data for all iterations and that Alice can obtain as many pairs $(w, g_B)$ as possible. There are two common methods to compute the training gradients in the first step of training: synchronized and asynchronized. In the following sections we omit the superscript and the subscript and discuss what Alice can infer from the obtained set of pairs $(w, g)$, separately for each method.
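Because the approximated gradient is linear in $w$, collecting enough $(w, g)$ pairs reduces recovery of its coefficients to solving linear systems. The sketch below (our own numpy illustration with hypothetical dimensions, assuming the batch size $n$ is known to the observer) recovers $X^{\top}X$ and $X^{\top}(y - \frac{1}{2}\mathbf{1})$ row by row from $d+1$ pairs:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6
X = rng.integers(0, 2, size=(n, d)).astype(float)   # Bob's private binary batch
y = rng.integers(0, 2, size=n).astype(float)

def grad(w):
    # leaked gradient: g = (1/n)(1/4 X^T X w - X^T(y - 1/2)) = M w - c
    return (0.25 * X.T @ X @ w - X.T @ (y - 0.5)) / n

# Alice observes d+1 pairs (w, g); row i of g = M w - c is linear in the
# unknowns M[i, :] (d values) and c[i] (1 value), so d+1 pairs with
# linearly independent w's determine them exactly.
ws = [rng.standard_normal(d) for _ in range(d + 1)]
gs = [grad(w) for w in ws]

A = np.array([np.append(w, -1.0) for w in ws])       # (d+1) x (d+1) system
M = np.zeros((d, d))
c = np.zeros(d)
for i in range(d):
    sol = np.linalg.solve(A, np.array([g[i] for g in gs]))
    M[i], c[i] = sol[:d], sol[d]

Q = 4 * n * M            # recovered X^T X
r = n * c                # recovered X^T (y - 1/2)
```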

3.2 Synchronized

We first consider the synchronized case, where Bob calculates the gradient based on his own data in the current batch. Without loss of generality, we assume Bob only has one batch of data $(X, y)$ with $X \in \{0,1\}^{n \times d}$ and $y \in \{0,1\}^{n}$. Let $Q$ denote $X^{\top}X$ and $r$ denote $X^{\top}(y - \frac{1}{2}\mathbf{1})$, and thus $g = \frac{1}{n}(\frac{1}{4} Q w - r)$. Notice that $n g_i = \frac{1}{4} Q_i w - r_i$ (the subscript $i$ denotes the $i$th row of a matrix) is a linear equation in the unknowns $Q_i$ and $r_i$. Hence $Q$ and $r$ can be solved out if Alice obtains $d+1$ pairs $(w, g)$ with linearly independent $w$'s. Therefore $X^{\top}X$ and $X^{\top}y$ are leaked. If the batch contains $n$ samples, Alice can obtain:

$$X^{\top}X = Q, \qquad (1)$$

where $X \in \{0,1\}^{n \times d}$ is Bob's private data. Furthermore, Alice can change this knotty quadratic equation set into a linear program and solve out $X$.


According to Equation (1), $Q_{ij} = \sum_{k=1}^{n} x_{ki} x_{kj}$. Since $x_{ki} \in \{0,1\}$, $x_{ki}^{2} = x_{ki}$. Thus we have $Q_{ii} = \sum_{k=1}^{n} x_{ki}$. For $i \neq j$, let $z_{kij}$ denote $x_{ki} x_{kj}$; we have $Q_{ij} = \sum_{k=1}^{n} z_{kij}$. Given that $z_{kij} = 1$ if and only if $x_{ki} = x_{kj} = 1$, and otherwise $z_{kij} = 0$, we can rewrite this relation as if-then constraints: if $x_{ki} + x_{kj} \geq 2$ then $z_{kij} \geq 1$, and if $x_{ki} + x_{kj} \leq 1$ then $z_{kij} \leq 0$.

Consider the first constraint: if $x_{ki} + x_{kj} \geq 2$ then $z_{kij} \geq 1$. This if-then constraint is equivalent to the linear constraint $z_{kij} \geq x_{ki} + x_{kj} - 1$ (when $x_{ki} + x_{kj} = 2$, the inequality is the original constraint; when $x_{ki} + x_{kj} \leq 1$, the inequality always holds since $z_{kij} \geq 0$). Similarly, we can get the equivalent linear constraint form of the second constraint: $2 z_{kij} \leq x_{ki} + x_{kj}$.

Hence Alice can change Equation (1) to the following linear constraints:

$$\sum_{k=1}^{n} x_{ki} = Q_{ii}, \qquad \sum_{k=1}^{n} z_{kij} = Q_{ij} \ (i \neq j),$$
$$z_{kij} \geq x_{ki} + x_{kj} - 1, \qquad 2 z_{kij} \leq x_{ki} + x_{kj}, \qquad x_{ki}, z_{kij} \in \{0, 1\}.$$

This linear programming problem can be easily solved by the revised simplex method or an interior point method, as shown in Table 1. Furthermore, even if the data has non-binary features, this method can separate them from the binary features and reveal the latter.
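To see why Equation (1) pins down the batch, recovery can also be illustrated without an LP solver: the brute-force sketch below (a toy stand-in for the pulp-based linear program of Table 1, feasible only for tiny $n$ and $d$, on a hypothetical batch) enumerates every binary batch whose Gram matrix equals the leaked $Q$:

```python
import itertools
import numpy as np

# A hypothetical private batch: n = 3 samples, d = 3 binary features.
X = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0]])
n, d = X.shape
Q = X.T @ X              # what Alice recovers in Section 3.2

# X^T X is invariant under row permutation, so batches are compared as
# multisets of rows (combinations_with_replacement enumerates multisets).
rows = list(itertools.product([0, 1], repeat=d))
matches = set()
for cand in itertools.combinations_with_replacement(rows, n):
    C = np.array(cand)
    if np.array_equal(C.T @ C, Q):
        matches.add(tuple(sorted(cand)))
```

For this batch the match is unique: up to the order of its rows, only Bob's data is consistent with $Q$, which is also why Table 1 sorts each batch before comparison.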

3.3 Asynchronized

In this section we discuss the leakage in the asynchronized case, where Bob uses several batches to calculate the gradients. For each batch $(X_i, y_i)$, Bob calculates its gradient $g_i$ based on the current local model $w_i$ and then updates the local model by $w_{i+1} = w_i - \eta g_i$. After all batches have been used, Bob pushes the difference $\Delta = w_1 - w_{t+1}$ between the original global model and the current local model to the server. We first give the mathematical form of $\Delta$.

Theorem 1.

Suppose Bob uses $t$ batches in the asynchronized case. Let $(X_i, y_i)$ denote the $i$th batch and its corresponding labels, let $n_i$ denote its size, and let $P_i$ denote $I - \frac{\eta}{4 n_i} X_i^{\top} X_i$, where $\eta$ is the learning rate. Then

$$\Delta = \Big(I - \prod_{i=t}^{1} P_i\Big) w_1 - \eta \sum_{i=1}^{t} \Big(\prod_{j=t}^{i+1} P_j\Big) \frac{1}{n_i} X_i^{\top}\Big(y_i - \frac{1}{2}\mathbf{1}\Big),$$

where $\prod_{i=t}^{1} P_i$ denotes the continuous multiplication of matrices, i.e., $P_t P_{t-1} \cdots P_1$.


Proof. For each batch $(X_i, y_i)$, the gradient is computed as $g_i = \frac{1}{n_i}\big(\frac{1}{4} X_i^{\top} X_i w_i - X_i^{\top}(y_i - \frac{1}{2}\mathbf{1})\big)$, so the local update is $w_{i+1} = w_i - \eta g_i = P_i w_i + \frac{\eta}{n_i} X_i^{\top}(y_i - \frac{1}{2}\mathbf{1})$, and the theorem is true when $t = 1$. Now assume the theorem is true for any starting model and any $t$ batches of data, and consider the case of $t+1$ batches.

Notice that the last $t$ batches form an asynchronized update starting from $w_2 = P_1 w_1 + \frac{\eta}{n_1} X_1^{\top}(y_1 - \frac{1}{2}\mathbf{1})$. By the induction hypothesis,

$$w_{t+2} = \Big(\prod_{i=t+1}^{2} P_i\Big) w_2 + \eta \sum_{i=2}^{t+1} \Big(\prod_{j=t+1}^{i+1} P_j\Big) \frac{1}{n_i} X_i^{\top}\Big(y_i - \frac{1}{2}\mathbf{1}\Big).$$

Substituting $w_2$ gives

$$w_{t+2} = \Big(\prod_{i=t+1}^{1} P_i\Big) w_1 + \eta \sum_{i=1}^{t+1} \Big(\prod_{j=t+1}^{i+1} P_j\Big) \frac{1}{n_i} X_i^{\top}\Big(y_i - \frac{1}{2}\mathbf{1}\Big),$$

which is exactly the claimed form of $\Delta = w_1 - w_{t+2}$. Hence the theorem is true for all $t$. ∎

By Theorem 1, $\Delta$ is linear in $w_1$, so $\prod_{i=t}^{1} P_i$ and $\eta \sum_{i=1}^{t} \big(\prod_{j=t}^{i+1} P_j\big) \frac{1}{n_i} X_i^{\top}(y_i - \frac{1}{2}\mathbf{1})$ are leaked, in the same way as the leakage of $Q$ and $r$ in Section 3.2.
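Theorem 1 can be checked numerically. The sketch below (our own illustration with randomly drawn hypothetical batches) runs Bob's sequential updates, compares the result against the product form, and confirms that the pushed difference is linear in the starting model:

```python
import numpy as np

rng = np.random.default_rng(1)
d, eta, t = 3, 0.1, 4
batches = [(rng.integers(0, 2, size=(5, d)).astype(float),
            rng.integers(0, 2, size=5).astype(float)) for _ in range(t)]
w0 = rng.standard_normal(d)

# Bob's sequential local updates (Section 3.3).
w = w0.copy()
for X, y in batches:
    n = len(y)
    g = (0.25 * X.T @ X @ w - X.T @ (y - 0.5)) / n
    w = w - eta * g

# Closed form of Theorem 1: with P_i = I - eta/(4 n_i) X_i^T X_i,
#   w_final = (P_t ... P_1) w0 + eta * sum_i (P_t ... P_{i+1}) X_i^T(y_i - 1/2)/n_i
I = np.eye(d)
prod = I.copy()
acc = np.zeros(d)
for X, y in batches:
    n = len(y)
    P = I - eta / (4 * n) * X.T @ X
    prod = P @ prod
    acc = P @ acc + eta * X.T @ (y - 0.5) / n
w_closed = prod @ w0 + acc

delta = w0 - w   # what Bob pushes: linear in w0 via (I - prod) and acc
```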

Theorem 2.

The solution of $\prod_{i=t}^{1}\big(I - \frac{\eta}{4 n_i} X_i^{\top} X_i\big) = M$ is infinite for $t \geq 2$ under the constraint that each $X_i^{\top} X_i$ is symmetric. (The constraint holds because $(X_i^{\top} X_i)^{\top} = X_i^{\top} X_i$.)


Proof. When treating the matrix equation as an equation set, each $X_i^{\top} X_i$ contributes only $\frac{d(d+1)}{2}$ variables to determine due to its symmetry. Therefore we have $\frac{t\,d(d+1)}{2}$ variables with $d^2$ equations. When $t \geq 2$, $\frac{t\,d(d+1)}{2} \geq d(d+1) > d^2$. Thus the original equation has myriad solutions. ∎

According to Theorem 2, $t \geq 2$ leads to myriad possibilities for the batches matching the obtained $\prod_{i=t}^{1} P_i$. Notice that the equation set still contains more variables than equations even when the leaked sum term is also used, which means it cannot help to reduce the possibilities. In this circumstance, it is unclear what further information about the target's data, beyond $\prod_{i=t}^{1} P_i$ and the leaked sum term, could be inferred. It is interesting for future work to formally justify the leakage when infinite solutions are found, e.g., how much additional information is sufficient to reduce the solution space to a few plausible solutions or even a single one.

3.4 Multi-party

In the synchronized case where there are $m$ ($m \geq 2$) parties in the federated learning, the global model is updated as $w \leftarrow w - \frac{\eta}{m} \sum_{p=1}^{m} g^{(p)}$. Thus an honest-but-curious party, say Alice, can obtain $\sum_{p \neq A} g^{(p)}$ from the update (the superscript $(p)$ is used to denote the $p$th party).

Notice that the vertical concatenation $X = [X^{(1)}; \dots; X^{(m)}]$ of the other parties' batches is one solution of the equation set induced by $\sum_{p \neq A} g^{(p)}$. Therefore increasing the number of parties in the synchronized case is equivalent to increasing the batch size. An honest-but-curious party can infer all other participants' data, only without knowing which samples belong to which party.
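This equivalence can be verified directly. In the sketch below (our own illustration, assuming equal batch sizes and averaging aggregation), the average of the parties' gradients coincides with the gradient computed on the vertically stacked data:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, m = 4, 5, 3        # m parties, each holding n binary samples
parties = [(rng.integers(0, 2, size=(n, d)).astype(float),
            rng.integers(0, 2, size=n).astype(float)) for _ in range(m)]
w = rng.standard_normal(d)

def grad(X, y, w):
    # approximated logistic gradient, as in Section 3.1
    return (0.25 * X.T @ X @ w - X.T @ (y - 0.5)) / len(y)

# Aggregated update seen by an honest-but-curious party ...
g_avg = sum(grad(X, y, w) for X, y in parties) / m

# ... equals the gradient over one big batch of all parties' data.
X_all = np.vstack([X for X, _ in parties])
y_all = np.concatenate([y for _, y in parties])
g_all = grad(X_all, y_all, w)
```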

However, for the asynchronized case, $\sum_{p \neq A} \Delta^{(p)}$ has no such simple form for $t \geq 2$. Hence increasing the number of parties in the asynchronized case is not equivalent to increasing the batch size. We do not further analyze how to reduce the multi-party case to the two-party case, since Alice can only infer some constraints even in the two-party case for now.

4 Defense

4.1 Batch size

time (s)    d=5       d=10      d=15      d=20
n=3        0.805     0.795     0.866     0.928
n=5        0.812     0.87      1.032     1.517
n=8        0.83      7.231     3.751     4.43
n=9          *      17.659    74.469    89.852
n=11         *      39.295   665.628  1634.821
Table 1: $n$ represents the batch size, $d$ represents the feature number, and * represents that multiple batches match the same gradient. The linear programs are solved by the pulp library written in Python on an Intel i5-8500 CPU (3.00 GHz), and each batch is sorted in alphabetical order to eliminate the impact of the order of the data within a batch.

In Section 3.2 we used a linear programming method to recover the whole batch of data from the computed gradient, but the number of constraints in this linear program grows with both the batch size and the feature number. As shown in Table 1, the more constraints the linear program has, the longer it takes to solve. Table 1 also shows that, as the batch size increases, multiple batches come to correspond to the same gradient, even under the constraint that all elements are binary.

4.2 Batch gradient

As discussed in Section 3.3, Alice cannot solve out Bob's data based on the pushed model difference alone. Thus avoiding leaking the per-batch gradient would be an effective defense. The simplest way is to avoid using the synchronized method to compute the training gradients. Another method is to obscure the original gradient: for instance, Bob may shuffle the order of his data to obfuscate Alice, or push the gradient selectively as proposed by Shokri and Shmatikov (2015).
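A minimal sketch of selective gradient sharing, assuming a simple keep-the-largest-magnitudes rule in the spirit of Shokri and Shmatikov (2015) (the fraction and the selection criterion here are our hypothetical choices, not the paper's):

```python
import numpy as np

def selective_update(g, frac=0.1):
    # Upload only the frac of coordinates with the largest |gradient|;
    # the withheld coordinates are reported as zero.
    k = max(1, int(np.ceil(frac * g.size)))
    idx = np.argsort(np.abs(g))[-k:]
    masked = np.zeros_like(g)
    masked[idx] = g[idx]
    return masked
```

Sharing fewer coordinates per round gives Alice fewer linear equations per observed pair $(w, g)$, which directly weakens the recovery of Section 3.2.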

5 Conclusion

In this paper, we discussed the leakage in federated learning of an approximated logistic regression model. We first showed how an honest-but-curious participant can easily infer the whole training data of the other participants in the synchronized case. We then quantified the leakage in the asynchronized case and showed that an honest-but-curious participant can infer nothing beyond certain constraints on the other participants' training data. We also analyzed how the hyperparameters of the learning, the batch size and the number of participants, affect the leakage, and further proposed several plausible defenses.


References

  • Y. Aono, T. Hayashi, L. Trieu Phong, and L. Wang (2016) Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 142–144.
  • Y. Aono, T. Hayashi, L. Wang, S. Moriai, et al. (2017) Privacy-preserving deep learning: revisited and enhanced. In International Conference on Applications and Techniques in Information Security, pp. 100–110.
  • K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191.
  • R. C. Geyer, T. Klein, and M. Nabi (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557.
  • B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618.
  • J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
  • H. B. McMahan, E. Moore, D. Ramage, and B. A. y Arcas (2016) Federated learning of deep networks using model averaging. CoRR abs/1602.05629.
  • L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov (2018) Exploiting unintended feature leakage in collaborative learning. arXiv preprint arXiv:1805.04049.
  • R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321.
  • C. Uhlerop, A. Slavković, and S. E. Fienberg (2013) Privacy-preserving data sharing for genome-wide association studies. The Journal of Privacy and Confidentiality 5(1), pp. 137.
  • Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10(2), pp. 12.
  • L. Zhu, Z. Liu, and S. Han (2019) Deep leakage from gradients. CoRR abs/1906.08935.