Federated learning, in which multiple parties jointly train a model while each keeps its own data, has grown increasingly popular as privacy receives more emphasis. Many architectures have been proposed (e.g. McMahan et al. (2016) and Konečný et al. (2016)) to trade off efficiency against security. However, several works have analyzed the leakage in federated learning (e.g. Hitaj et al. (2017)), and many defenses have been proposed in response (e.g. Bonawitz et al. (2017) and Geyer et al. (2017)). One such defense uses additive homomorphic encryption to encrypt model parameters for secure aggregation during the update procedure (Aono et al. (2017)).
However, this technique might not mitigate the leakage in federated learning. Melis et al. (2018) showed that an honest-but-curious participant can obtain the gradients computed by others from the difference between successive global models, and can thus infer unintended features of the training data. Zhu et al. (2019) 'steal' the training data pixel-wise from gradients. But these methods only work on complicated networks (e.g. convolutional neural networks) owing to their large number of parameters, and do not apply to simple models such as logistic regression.
For the logistic regression model, the loss function cannot be evaluated directly under additive homomorphic encryption because the sigmoid function is incompatible with it. Thus many approximation schemes (e.g. Aono et al. (2016)) have been proposed, whose goal is to approximate the sigmoid function in a form suitable for additive homomorphic encryption while preserving accuracy.
In this paper, we discuss the leakage of the federated approximated logistic regression model in the case where all elements of the input are binary, an encoding widely used in gene data analysis (e.g. Uhler et al. (2013)) and risk analysis. The loss function of the model is approximated in the way proposed by Aono et al. (2016). We show that the training data can be completely inferred by an honest-but-curious participant.
2 Related Work
Aono et al. (2017) showed that part of the training data is leaked in collaborative learning when a batch contains only a single sample. Melis et al. (2018) showed that features of the training data that are unrelated to the model target leak through the update procedure. Zhu et al. (2019) use an optimization method to infer the whole training data from the gradient leaked by the update procedure. However, this optimization method does not apply to the approximated logistic regression model, since gradients computed on different batches can be identical. In addition, none of these works considers the case where the gradient is computed over multiple batches.
3 Leakage Analysis
Consider the two-party case of federated learning, where two parties (Alice and Bob) share the same logistic regression model. The parameters of the federated model are and the number of input features is . We consider the case where all elements of the input data are either 0 or 1, which has practical applications in domains such as gene analysis (e.g. Uhler et al. (2013)) and risk analysis.
Training follows the horizontal case in Yang et al. (2019). In each iteration: (1) Alice (Bob) computes local training gradients ( ) by Stochastic Gradient Descent (SGD) and sends ( ) to the server. (2) The server uses the secure aggregation proposed by Aono et al. (2017) to aggregate the two gradients. (3) The server sends the aggregated gradient back to Alice and Bob. (4) Alice (Bob) updates her (his) local model by .
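The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the aggregation is done in the clear instead of under additive homomorphic encryption, and summing (rather than averaging) the two gradients is an assumption.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def aggregate(gradients: Sequence[Vector]) -> Vector:
    """Server step: element-wise sum of the submitted gradients.

    Assumption: plain summation in the clear; the protocol in the text
    performs this step under additive homomorphic encryption.
    """
    return [sum(values) for values in zip(*gradients)]


def apply_update(theta: Vector, aggregated: Vector, lr: float) -> Vector:
    """Party step: gradient-descent update with the aggregated gradient."""
    return [t - lr * g for t, g in zip(theta, aggregated)]


def federated_round(theta: Vector, grad_alice: Vector, grad_bob: Vector,
                    lr: float) -> Tuple[Vector, Vector]:
    """One iteration: aggregate both gradients, then update the shared model."""
    aggregated = aggregate([grad_alice, grad_bob])
    return apply_update(theta, aggregated, lr), aggregated


new_theta, agg = federated_round([1.0, 2.0], [0.5, -1.0], [1.5, 1.0], lr=0.5)
print(new_theta, agg)
```

Both parties end the round holding the same updated model, which is what later allows one of them to reason about the other's contribution.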
The loss function of this model follows the approximation of Aono et al. (2016), so that it is compatible with the secure aggregation step of training. The gradient of the loss function in the matrix form is .
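To illustrate why a polynomial approximation of the logistic loss yields a gradient that is affine in the parameters, consider the second-order Taylor expansion of the loss around zero. This is a worked example under our own assumptions: Aono et al. (2016) fit their own polynomial coefficients, so the constants may differ from those below, and the symbols $X$ (batch matrix with rows $x_i^\top$), $y$ (labels in $\{-1, 1\}$), $\theta$ (parameters) and $n$ (batch size) are notation chosen here for illustration.

```latex
\begin{align*}
\log\!\left(1 + e^{-z}\right) &\approx \log 2 - \tfrac{1}{2} z + \tfrac{1}{8} z^{2},
  \qquad z_i = y_i\, \theta^{\top} x_i, \quad y_i^{2} = 1, \\
L(\theta) &\approx \frac{1}{n} \sum_{i=1}^{n}
  \left[ \log 2 - \tfrac{1}{2}\, y_i\, \theta^{\top} x_i
         + \tfrac{1}{8} \left(\theta^{\top} x_i\right)^{2} \right], \\
\nabla_{\theta} L(\theta) &\approx \frac{1}{n}
  \left( \tfrac{1}{4}\, X^{\top} X\, \theta - \tfrac{1}{2}\, X^{\top} y \right).
\end{align*}
```

Under this approximation the gradient depends on the batch only through $X^{\top} X$ and $X^{\top} y$, and it is an affine function of $\theta$.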
Notice that . Suppose Alice is an honest-but-curious participant and thus is leaked to Alice. We assume Bob uses the same data for all iterations and Alice can obtain as many pairs (, ) as she needs. There are two common methods to compute the training gradients in the first step of training: synchronized and asynchronized. In the following sections we omit the superscript and the subscript and discuss, separately for each method, what Alice can infer from the obtained set .
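To make the leakage concrete, the following sketch shows how Alice could back out Bob's gradient from two consecutive global models. It assumes the update rule theta_new = theta_old - lr * (g_alice + g_bob); the function name, the summation rule and the known learning rate are assumptions for illustration.

```python
from typing import List

Vector = List[float]


def recover_other_gradient(theta_old: Vector, theta_new: Vector,
                           own_gradient: Vector, lr: float) -> Vector:
    """Recover the other party's gradient from the change in the global model.

    Assumes theta_new = theta_old - lr * (own_gradient + other_gradient).
    """
    aggregated = [(old - new) / lr for old, new in zip(theta_old, theta_new)]
    return [a - own for a, own in zip(aggregated, own_gradient)]


bob_gradient = recover_other_gradient(
    theta_old=[1.0, 2.0], theta_new=[0.0, 2.0],
    own_gradient=[0.5, -1.0], lr=0.5)
print(bob_gradient)
```

Encrypting the individual gradients in transit does not prevent this, because the subtraction only uses values that Alice legitimately holds.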
We first consider the synchronized case, where Bob calculates gradients based on his own data in the current batch. Without loss of generality we assume Bob only has one batch of data . Let denote and denote and thus . Notice that (the subscript denotes the th row of a matrix) is a linear equation. Hence and can be solved for if . Therefore and are leaked. If the batch contains samples, Alice can obtain:
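The linear-equation argument can be sketched as follows. The code assumes that the leaked gradient is an affine function of the parameters, G = M theta + c, with M and c fixed by Bob's batch, and that Alice has collected d + 1 pairs whose parameter vectors are affinely independent. The names `solve_linear` and `recover_affine_map`, and the exact number of pairs, are assumptions for illustration.

```python
from fractions import Fraction


def solve_linear(A, b):
    """Solve the square system A x = b by Gauss-Jordan elimination (exact)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: parameter vectors are dependent")
        M[col], M[pivot] = M[pivot], M[col]
        pivot_value = M[col][col]
        M[col] = [v / pivot_value for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def recover_affine_map(thetas, grads):
    """Recover (M, c) from d + 1 observed pairs (theta, G) with G = M theta + c.

    Row i of M and entry i of c satisfy one linear equation per observed pair,
    so each row is obtained from its own (d + 1) x (d + 1) linear system.
    """
    d = len(thetas[0])
    design = [list(theta) + [1] for theta in thetas]
    M_rows, c = [], []
    for i in range(d):
        solution = solve_linear(design, [g[i] for g in grads])
        M_rows.append(solution[:d])
        c.append(solution[d])
    return M_rows, c


# Hypothetical observations generated from M = [[2, 1], [1, 3]], c = [-1, 4].
thetas = [[0, 0], [1, 0], [0, 1]]
grads = [[-1, 4], [1, 5], [0, 7]]
print(recover_affine_map(thetas, grads))
```

Exact rational arithmetic is used so that the recovered entries can be compared against integer counts without rounding concerns.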
where is Bob’s private data. Furthermore, Alice can convert this knotty set of quadratic equations into a linear programming problem and solve it:
According to Equation (1), . Since , . Thus we have . For , let denote , we have . Given that if and only if , , otherwise , we can rewrite this relation as an if-then constraint: if then and if then .
Consider the first constraint: if then . The inequality is equivalent to the constraint (when , the inequality is the original constraint; when , the inequality always holds). Hence . Similarly, we can obtain the equivalent linear form of the second constraint: .
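As a sanity check on this kind of rewriting, the sketch below verifies by enumeration one standard linearisation of a product of two binary variables: z = x * y holds exactly when z <= x, z <= y and z >= x + y - 1. This is a textbook formulation chosen for illustration; the exact inequalities used in the paper are not reproduced here and may differ.

```python
from itertools import product


def satisfies_linear(x: int, y: int, z: int) -> bool:
    """Linear constraints intended to encode z = x * y for binary x, y, z."""
    return z <= x and z <= y and z >= x + y - 1


def linearisation_is_exact() -> bool:
    """Check every binary assignment: constraints hold iff z equals x * y."""
    return all(
        satisfies_linear(x, y, z) == (z == x * y)
        for x, y, z in product((0, 1), repeat=3)
    )


print(linearisation_is_exact())
```

Because each quadratic term in binary unknowns can be replaced by such an auxiliary variable, the quadratic system becomes a set of linear constraints over binary variables.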
Hence Alice can change Equation (1) to the following linear constraints:
This linear programming problem can be solved easily with the revised simplex method or an interior-point method, as shown in Table 1. Furthermore, even if the data has non-binary features, this method can separate them from the binary features and reveal the latter.
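For very small batches, the claim that binary data can be recovered from the leaked quantities can be checked by exhaustive search. The sketch below is a brute-force stand-in, not the linear programming formulation described above. It assumes that the leaked quantity is the Gram matrix X^T X of Bob's binary batch X and that the batch size is known; the function names and these assumptions are ours.

```python
from itertools import combinations_with_replacement, product


def gram(rows):
    """Gram matrix X^T X of a batch given as a sequence of equal-length rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def candidate_batches(target_gram, n):
    """All binary batches of n rows (as multisets) whose Gram matrix matches.

    Row order is not identifiable from X^T X, so batches are enumerated
    as multisets of rows rather than as ordered matrices.
    """
    d = len(target_gram)
    all_rows = list(product((0, 1), repeat=d))
    return [
        batch
        for batch in combinations_with_replacement(all_rows, n)
        if gram(batch) == target_gram
    ]


# Hypothetical private batch held by Bob: 3 samples, 3 binary features.
secret = [(1, 0, 1), (0, 1, 1), (1, 1, 0)]
found = candidate_batches(gram(secret), n=len(secret))
print(found)
```

Candidates are reported up to row order. For some Gram matrices more than one candidate survives: with 4 samples and 3 features, the batches {111, 100, 010, 001} and {110, 101, 011, 000} share the same Gram matrix, so the search returns both.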
In this section we discuss the leakage in the asynchronized case, where Bob uses several batches to calculate the gradients. For each batch , Bob calculates its gradients based on the current local model and then updates the local model by . After all batches have been used, Bob pushes the difference between the current local model and the original global model to the server. We first give the mathematical form of .
Suppose Bob uses batches in the asynchronized case. Let denote the th batch, denote the corresponding labels, denote and denote . Therefore . (Footnote 1: denotes the continuous multiplication of matrices, i.e. ( is the learning rate).)
Since for each batch , its gradient is computed as , the theorem is true when . Now we assume that the theorem is true for any and any batches of data and consider the case .
Notice that . Let denote and thus .
Since , we can obtain that
Hence the theorem is true for all . ∎
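The structure of this result can be checked numerically. The sketch below assumes that each batch contributes an affine gradient A theta - b (with A and b fixed by that batch) and that Bob applies plain gradient steps with learning rate lr; under these assumptions the model after all batches is P theta0 + q, where P is a product of the matrices (I - lr * A). The function names and the sign convention are assumptions, not the paper's notation.

```python
from fractions import Fraction


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def local_updates(theta, batches, lr):
    """Sequentially apply theta <- theta - lr * (A theta - b) for each batch."""
    for A, b in batches:
        grad = [g - bi for g, bi in zip(matvec(A, theta), b)]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def composed_affine_map(batches, lr, d):
    """Return (P, q) such that the final model equals P theta0 + q."""
    P = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]
    q = [Fraction(0)] * d
    for A, b in batches:
        step = [[Fraction(int(i == j)) - lr * A[i][j] for j in range(d)]
                for i in range(d)]
        P = matmul(step, P)  # newest factor multiplies on the left
        q = [s + lr * bi for s, bi in zip(matvec(step, q), b)]
    return P, q


lr = Fraction(1, 2)
batches = [([[2, 0], [0, 2]], [1, 1]), ([[1, 1], [1, 1]], [0, 2])]
theta0 = [Fraction(1), Fraction(3)]
P, q = composed_affine_map(batches, lr, d=2)
closed_form = [p + s for p, s in zip(matvec(P, theta0), q)]
print(local_updates(theta0, batches, lr) == closed_form)
```

The value Bob pushes, the final model minus theta0, is then (P - I) theta0 + q, so Alice only observes the batches through the composed quantities P and q.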
The equation has infinitely many solutions under the constraint that , is symmetric. (Footnote 2: The constraint is because .)
When treating the equation as an equation set, we have variables to determine due to its symmetry. Therefore we have variables with equations. When , . Thus the original equation has infinitely many solutions. ∎
According to Theorem 2, leads to infinitely many possible batches matching the obtained . Notice that the equation set contains variables with equations even when is known, which means cannot help to reduce the possibilities. In this circumstance, it is unclear what further information beyond and about the target’s data could be inferred. It is interesting for future work to formally characterize the leakage when infinitely many solutions are found, e.g., how much additional information is sufficient to reduce the solution space to a few plausible candidates or even a single one.
In the synchronized case where there are () parties in the federated learning, the global model is updated as . Thus (the superscript is used to denote the th party).
Notice is one solution of the equation . Therefore increasing the number of parties in the synchronized case is equivalent to increasing the batch size. An honest-but-curious party can thus infer all other participants’ data, but cannot tell which sample belongs to which participant.
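The equivalence between more parties and a larger batch can be illustrated with a small check. Assuming each party's contribution depends on its data through the Gram matrix X^T X (an assumption made here for illustration), the sum of the per-party Gram matrices equals the Gram matrix of the stacked data. The party name Carol and the sample values are hypothetical.

```python
def gram(rows):
    """Gram matrix X^T X of a batch given as a list of equal-length rows."""
    d = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]


def mat_add(A, B):
    """Element-wise sum of two matrices of the same shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


bob = [(1, 0, 1), (0, 1, 1)]
carol = [(1, 1, 0)]

# Summing per-party statistics gives the statistic of the concatenated batch.
print(gram(bob + carol) == mat_add(gram(bob), gram(carol)))
```

Since the aggregated statistic is identical to that of one larger batch, an attack that recovers a batch up to row order recovers the union of the other parties' samples without attributing them to their owners.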
However, for the asynchronized case, has no simple solution for . Hence increasing the number of parties in the asynchronized case is not equivalent to increasing the batch size. We do not further analyze how to reduce the multi-party case to the two-party case, since for now Alice can only infer some constraints in the two-party case.
4.1 Batch size
In Section 3.2 we use a linear programming method to obtain the whole batch from the computed gradient, and this linear program has constraints. As shown in Table 1, the more constraints the linear program has, the longer it takes to solve. Table 1 also shows that, as increases, multiple batches come to correspond to the same gradient, even under the constraint that all elements are binary.
4.2 Batch gradient
As discussed in Section 3.3, Alice cannot solve for Bob’s data based on the gradients alone. Thus avoiding leaking the batch gradient would be an effective defense. The simplest way is to avoid using the synchronized method to compute the training gradients. Another is to obscure the original gradient. For instance, Bob may shuffle the order of his data to confuse Alice, or push the gradient selectively as proposed by Shokri and Shmatikov (2015).
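Selective pushing can be sketched as follows. The rule used here, keeping only the k entries of largest magnitude and zeroing the rest, is one possible selection criterion assumed for illustration; the function name and the choice of k are hypothetical, and the sketch makes no claim about how much leakage this removes.

```python
from typing import List


def select_top_k(gradient: List[float], k: int) -> List[float]:
    """Keep the k entries with the largest absolute value and zero the others."""
    ranked = sorted(range(len(gradient)),
                    key=lambda i: abs(gradient[i]), reverse=True)
    keep = set(ranked[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(gradient)]


print(select_top_k([0.1, -2.0, 0.5, 1.5], k=2))
```

With only part of each gradient visible, the pairs Alice collects no longer give her a complete linear system in the unknown batch statistics.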
In this paper, we discussed the leakage in federated learning of the approximated logistic regression model. We first showed how an honest-but-curious participant can easily infer the whole training data of other participants in the synchronized case. We then quantified the leakage in the asynchronized case and illustrated that an honest-but-curious participant can infer nothing beyond certain constraints on the other participants’ training data. We also analyzed how the hyperparameters of the learning, namely batch size and number of participants, affect the leakage, and proposed several plausible defenses.
References
- Aono et al. (2016). Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pp. 142–144.
- Aono et al. (2017). Privacy-preserving deep learning: revisited and enhanced. In International Conference on Applications and Techniques in Information Security, pp. 100–110.
- Bonawitz et al. (2017). Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191.
- Geyer et al. (2017). Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557.
- Hitaj et al. (2017). Deep models under the GAN: information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 603–618.
- Konečný et al. (2016). Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492.
- McMahan et al. (2016). Federated learning of deep networks using model averaging. CoRR abs/1602.05629.
- Melis et al. (2018). Exploiting unintended feature leakage in collaborative learning. arXiv preprint arXiv:1805.04049.
- Shokri and Shmatikov (2015). Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pp. 1310–1321.
- Uhler et al. (2013). Privacy-preserving data sharing for genome-wide association studies. The Journal of Privacy and Confidentiality 5 (1), pp. 137.
- Yang et al. (2019). Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 12.
- Zhu et al. (2019). Deep leakage from gradients. CoRR abs/1906.08935.