A Quasi-Newton Method Based Vertical Federated Learning Framework for Logistic Regression

Kai Yang, et al. ∙ December 1, 2019

Abstract

Data privacy and security have become major concerns in building machine learning models from data held by different providers. Federated learning shows promise by leaving data at the providers and exchanging only encrypted information. This paper studies the vertical federated learning structure for logistic regression, where the data sets at two parties share the same sample IDs but own disjoint subsets of features. Existing frameworks adopt the first-order stochastic gradient descent algorithm, which requires a large number of communication rounds. To address this communication challenge, we propose a quasi-Newton method based vertical federated learning framework for logistic regression under the additively homomorphic encryption scheme. Our approach considerably reduces the number of communication rounds at a small additional communication cost per round. Numerical results demonstrate the advantages of our approach over the first-order method.


1 Introduction

With the surge of artificial intelligence (AI) driven services such as recommender systems and natural language processing, data privacy and security have raised worldwide concerns [yang2019federated]. Increasingly stringent data privacy and security requirements are an emerging trend in laws and regulations across the world; a well-known example is the General Data Protection Regulation (GDPR) of the European Union [albrecht2016gdpr]. Traditional AI service providers usually collect data instances and transfer them from one party to another, after which a machine learning model is trained at a cloud data center on the fused data set. However, this practice faces the risks of data breaches and violations of data protection laws and regulations [wiki:facebookdatabreach].

Federated learning [yang2019federated; konevcny2016federated; mcmahan2017communication] has recently emerged as a frontier field studying privacy-preserving collaborative machine learning in which data instances remain with their providers. A line of works [konevcny2016federated; mcmahan2017communication; yang2018federated] focuses on the horizontal structure, in which each node holds a subset of data instances with the complete set of attributes. There are also many studies of the vertical federated learning structure, where the data set is vertically partitioned and owned by different data providers; that is, each provider holds a disjoint subset of attributes for all data instances. The goal is to learn a machine learning model collaboratively without transferring any data from one provider to another. In particular, [cheng2019secureboost] proposes SecureBoost, a privacy-preserving tree-boosting system, and [hardy2017private] proposes a logistic regression framework for vertically partitioned data.

Communication is one of the main bottlenecks in federated learning, since network conditions between data providers are much worse than those inside a cloud computing center [konevcny2016federated]. To address the communication challenge in horizontal federated learning, structured updates are considered in [konevcny2016federated] to reduce the communication cost per round, and an iterative model averaging algorithm is proposed in [mcmahan2017communication] to reduce the number of communication rounds. For the vertical federated learning structure, [hardy2017private] considers a two-party (denoted by party A and party B) logistic regression problem and proposes a stochastic gradient descent (SGD) based privacy-preserving framework. Due to the slow convergence of first-order algorithms, it requires a large number of communication rounds. This work proposes a quasi-Newton method based vertical federated learning system that uses sub-sampled Hessian information to reduce the number of communication rounds.

Related Works

The second-order Newton's method is known to converge faster than first-order gradient based methods. To avoid the high cost of inverting the Hessian matrix, the well-recognized limited-memory BFGS (L-BFGS) quasi-Newton algorithm [nocedal2006numerical] directly approximates the inverse Hessian. A number of works [schraudolph2007stochastic; byrd2016stochastic; moritz2016linearly] develop stochastic quasi-Newton algorithms for problems with large amounts of data. However, the inverse Hessian estimated in [schraudolph2007stochastic] may not be stable for small batch sizes, and the algorithm in [moritz2016linearly] requires computing the full gradient, which would double the communication cost per epoch compared with SGD. This paper develops a communication-efficient vertical federated learning framework based on the stochastic quasi-Newton method of [byrd2016stochastic].

2 Problem Statement

Consider a typical logistic regression problem with vertically partitioned data [hardy2017private]. Let $\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ be the data set consisting of $n$ data samples, where each instance $\mathbf{x}_i \in \mathbb{R}^{d}$ has $d$ features. The class attribute information, i.e., the labels, is given by $\{y_i\}_{i=1}^{n}$ with $y_i \in \{-1, 1\}$. The data set is vertically partitioned and distributed over two honest-but-curious private parties: party A (the host data provider, holding features only) and party B (the guest data provider, holding features and labels). Let $\mathcal{D}_A = \{\mathbf{x}_i^A\}_{i=1}^{n}$ be the data owned by party A and $\mathcal{D}_B = \{(\mathbf{x}_i^B, y_i)\}_{i=1}^{n}$ the data owned by party B. Each party owns a disjoint subset of features over a common set of sample IDs, with $\mathbf{x}_i = (\mathbf{x}_i^A, \mathbf{x}_i^B)$, $\mathbf{x}_i^A \in \mathbb{R}^{d_A}$, $\mathbf{x}_i^B \in \mathbb{R}^{d_B}$, and $d_A + d_B = d$. Only party B has access to the labels. The target of logistic regression is to train a linear model for classification by solving

$$\min_{\boldsymbol{\theta} \in \mathbb{R}^d} \; \ell(\boldsymbol{\theta}) := \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + e^{-y_i \boldsymbol{\theta}^\top \mathbf{x}_i}\right) \qquad (1)$$

where $\boldsymbol{\theta} \in \mathbb{R}^d$ denotes the model parameters, $\mathbf{x}_i$ is the $i$-th data instance, and $y_i$ is the corresponding label. The per-sample negative log-likelihood loss is given by $\ell_i(\boldsymbol{\theta}) = \log(1 + e^{-y_i \boldsymbol{\theta}^\top \mathbf{x}_i})$. In this paper, we suppose that party A and party B hold the model parameters corresponding to their respective features, denoted as $\boldsymbol{\theta} = (\boldsymbol{\theta}_A, \boldsymbol{\theta}_B)$, where $\boldsymbol{\theta}_A \in \mathbb{R}^{d_A}$ and $\boldsymbol{\theta}_B \in \mathbb{R}^{d_B}$.

[hardy2017private] proposes a stochastic gradient descent (SGD) based vertical logistic regression framework that computes gradients by exchanging encrypted intermediate values at each iteration. Specifically, party A and party B collaboratively compute the vertically partitioned encrypted gradients $[[\nabla \ell(\boldsymbol{\theta}_A)]]$ and $[[\nabla \ell(\boldsymbol{\theta}_B)]]$, which are decrypted by a third party. To achieve secure computation without transferring data between parties, additively homomorphic encryption is adopted. Additively homomorphic schemes such as Paillier [paillier1999public] allow any party to encrypt its data with a public key, while the private key for decryption is held by the third party, i.e., the coordinator. Denoting encryption by $[[\cdot]]$, such schemes support the sum of two encrypted numbers, $[[u]] + [[v]] = [[u + v]]$, as well as the product of an unencrypted number and an encrypted one, $v \cdot [[u]] = [[v u]]$. Unfortunately, the logistic loss function and its gradient cannot be computed directly under additively homomorphic encryption. To address this issue, we adopt the Taylor approximation of the loss function proposed in [hardy2017private; aono2016scalable]:

$$\ell(\boldsymbol{\theta}) \approx \frac{1}{n} \sum_{i=1}^{n} \left( \log 2 - \frac{1}{2} y_i \boldsymbol{\theta}^\top \mathbf{x}_i + \frac{1}{8} \left(\boldsymbol{\theta}^\top \mathbf{x}_i\right)^2 \right) \qquad (2)$$
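To make the encrypted arithmetic concrete, the following is a minimal sketch using the open-source python-paillier ("phe") library; it checks the two homomorphic identities above and compares the Taylor approximation (2) against the exact logistic loss. The key length and variable names are illustrative choices, not taken from the paper's released code.

```python
# A minimal sketch of the homomorphic operations used above, based on the
# open-source python-paillier ("phe") library; key length and variable names
# are illustrative choices, not from the paper's released code.
import math
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Additive homomorphism: [[u]] + [[v]] = [[u + v]]
u, v = 0.75, -0.25
enc_sum = public_key.encrypt(u) + public_key.encrypt(v)
assert abs(private_key.decrypt(enc_sum) - (u + v)) < 1e-9

# Product with an unencrypted scalar: c * [[u]] = [[c * u]]
enc_scaled = 4.0 * public_key.encrypt(u)
assert abs(private_key.decrypt(enc_scaled) - 4.0 * u) < 1e-9

# Taylor approximation (2): log(1 + exp(-z)) ~ log 2 - z/2 + z^2/8 near z = 0,
# a polynomial that additively homomorphic encryption can evaluate.
def taylor_logistic_loss(z):
    return math.log(2.0) - 0.5 * z + 0.125 * z ** 2

print(taylor_logistic_loss(0.1), math.log(1.0 + math.exp(-0.1)))
```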

3 A Quasi-Newton Method Based Vertical Federated Learning Framework

In federated learning, communication between parties is much more expensive than communication inside a cloud computing center, since the data providers are usually located in distant data centers, on different networks, or even in wireless environments with limited bandwidth [mcmahan2017communication]. Communication therefore becomes one of the main bottlenecks for efficient model training. For this reason, we develop a communication-efficient vertical federated learning framework that incorporates second-order information [byrd2016stochastic] to reduce the number of communication rounds between parties, as illustrated in Fig. 1.

The gradient and the Hessian of the Taylor loss in equation (2) with respect to the $i$-th data instance are respectively given by $\nabla \ell_i(\boldsymbol{\theta}) = \left(\frac{1}{4} \boldsymbol{\theta}^\top \mathbf{x}_i - \frac{1}{2} y_i\right) \mathbf{x}_i$ and $\nabla^2 \ell_i(\boldsymbol{\theta}) = \frac{1}{4} \mathbf{x}_i \mathbf{x}_i^\top$. In the $t$-th iteration, the classical L-BFGS algorithm uses the history of the last $M$ iterations, differencing gradients and model parameters between every two consecutive iterates, to obtain an estimated inverse Hessian matrix $H_t$. However, this leads to unstable curvature estimates when mini-batches are used instead of the full data. To obtain a stable estimate of $H_t$, we use sub-sampled Hessian information as suggested by [byrd2016stochastic]. Moreover, the curvature information is updated only every $L$ iterations, which reduces the communication overhead and improves the stability of the quasi-Newton algorithm. The details of computing the key ingredients of our system are introduced in the following parts.
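As a plain, unencrypted illustration of these quantities, here is a NumPy sketch of the per-sample Taylor gradient and of a sub-sampled Hessian-vector product that never forms the $d \times d$ Hessian explicitly; all names and sizes are assumptions for the example, not the paper's code.

```python
# A NumPy sketch (not the paper's released code) of the per-sample Taylor
# gradient and a sub-sampled Hessian-vector product.
import numpy as np

def taylor_grad(theta, x_i, y_i):
    # gradient of log2 - (1/2) y_i theta^T x_i + (1/8)(theta^T x_i)^2
    return (0.25 * x_i @ theta - 0.5 * y_i) * x_i

def subsampled_hvp(X_H, s):
    # (1/|S_H|) sum_i (1/4) x_i x_i^T s, without materializing the Hessian
    return 0.25 * X_H.T @ (X_H @ s) / X_H.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = rng.choice([-1.0, 1.0], size=100)
theta = rng.normal(size=5)
g = np.mean([taylor_grad(theta, X[i], y[i]) for i in range(100)], axis=0)
v = subsampled_hvp(X[:30], rng.normal(size=5))
print(g.shape, v.shape)  # (5,) (5,)
```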

Computing Loss and Gradient at Party A&B

Let $\mathcal{S}$ be the index set of the chosen mini-batch of data instances. The corresponding loss and gradient are given by $\ell_{\mathcal{S}}(\boldsymbol{\theta}) = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \ell_i(\boldsymbol{\theta})$ and $\nabla \ell_{\mathcal{S}}(\boldsymbol{\theta}) = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \nabla \ell_i(\boldsymbol{\theta})$. Denoting $u_i^A = \boldsymbol{\theta}_A^\top \mathbf{x}_i^A$ for party A (similarly $u_i^B = \boldsymbol{\theta}_B^\top \mathbf{x}_i^B$ for party B), $u_i = u_i^A + u_i^B$, and $d_i = \frac{1}{4} u_i - \frac{1}{2} y_i$, the encrypted loss and gradient can be computed by transmitting $[[u_i^A]]$ and $[[(u_i^A)^2]]$ from party A to party B, and transmitting $[[d_i]]$ from B to A, following

$$[[\ell_{\mathcal{S}}]] = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \left( \log 2 - \frac{1}{2} y_i [[u_i]] + \frac{1}{8} [[u_i^2]] \right) \qquad (3)$$
$$[[\nabla \ell_{\mathcal{S}}(\boldsymbol{\theta}_A)]] = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} [[d_i]] \, \mathbf{x}_i^A, \qquad [[\nabla \ell_{\mathcal{S}}(\boldsymbol{\theta}_B)]] = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} [[d_i]] \, \mathbf{x}_i^B \qquad (4)$$
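Below is a hedged two-party sketch of the gradient exchange in equation (4), again with python-paillier. The masking steps and the exact message layout of [hardy2017private] are omitted, and all identifiers (X_A, theta_B, and so on) are illustrative.

```python
# A two-party sketch of the gradient exchange in equation (4) using
# python-paillier; masking/obfuscation steps are omitted for brevity.
import numpy as np
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)  # held by coordinator

rng = np.random.default_rng(1)
X_A = rng.normal(size=(8, 3))        # party A's features for one minibatch
X_B = rng.normal(size=(8, 2))        # party B's features
y = rng.choice([-1.0, 1.0], size=8)  # labels, visible only to party B
theta_A, theta_B = np.zeros(3), np.zeros(2)

# Party A -> party B: encrypted partial scores [[u_i^A]]
enc_u_A = [pub.encrypt(float(v)) for v in X_A @ theta_A]

# Party B: [[d_i]] = (1/4)[[u_i]] - (1/2) y_i, with u_i = u_i^A + u_i^B
u_B = X_B @ theta_B
enc_d = [0.25 * (ea + float(ub)) - 0.5 * float(yi)
         for ea, ub, yi in zip(enc_u_A, u_B, y)]

# Party B -> party A: [[d_i]]; party A forms its encrypted gradient block,
# [[grad_A_j]] = (1/|S|) sum_i [[d_i]] x_{ij}^A, as in equation (4).
m = len(enc_d)
enc_grad_A = []
for j in range(X_A.shape[1]):
    acc = enc_d[0] * float(X_A[0, j])
    for i in range(1, m):
        acc = acc + enc_d[i] * float(X_A[i, j])
    enc_grad_A.append(acc * (1.0 / m))

grad_A = np.array([priv.decrypt(g) for g in enc_grad_A])  # done by the coordinator
print(grad_A)
```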

Computing Updates for Estimating Curvature Information at Party A&B

To keep the additionally introduced communication cost low, the curvature information is updated only every $L$ iterations at the coordinator, by collecting the encrypted vector $[[\mathbf{y}_t]]$ from parties A and B. Specifically, every $L$ iterations we compute the difference of averaged model parameters

$$\mathbf{s}_t = \bar{\boldsymbol{\theta}}_t - \bar{\boldsymbol{\theta}}_{t-1}, \qquad \bar{\boldsymbol{\theta}}_t = \frac{1}{L} \sum_{j=(t-1)L+1}^{tL} \boldsymbol{\theta}_j \qquad (5)$$

at party A and party B. Then the product of the sub-sampled Hessian and the averaged model difference is given by

$$\mathbf{y}_t = \nabla^2 \ell_{\mathcal{S}_H}(\bar{\boldsymbol{\theta}}_t) \, \mathbf{s}_t = \frac{1}{|\mathcal{S}_H|} \sum_{i \in \mathcal{S}_H} \frac{1}{4} \mathbf{x}_i \mathbf{x}_i^\top \mathbf{s}_t \qquad (6)$$

The sub-sampled Hessian is calculated with respect to a randomly chosen subset of data $\mathcal{S}_H$. Under additively homomorphic encryption, $[[\mathbf{y}_t]]$ can be computed following

$$[[\mathbf{y}_t^A]] = \frac{1}{4 |\mathcal{S}_H|} \sum_{i \in \mathcal{S}_H} [[z_i]] \, \mathbf{x}_i^A, \qquad [[\mathbf{y}_t^B]] = \frac{1}{4 |\mathcal{S}_H|} \sum_{i \in \mathcal{S}_H} [[z_i]] \, \mathbf{x}_i^B \qquad (7)$$

where $z_i = \mathbf{x}_i^\top \mathbf{s}_t = (\mathbf{x}_i^A)^\top \mathbf{s}_t^A + (\mathbf{x}_i^B)^\top \mathbf{s}_t^B$. By transmitting $[[(\mathbf{x}_i^A)^\top \mathbf{s}_t^A]]$ from party A to party B, and transmitting $[[(\mathbf{x}_i^B)^\top \mathbf{s}_t^B]]$ from B to A, the component $[[\mathbf{y}_t^A]]$ can be computed at party A and $[[\mathbf{y}_t^B]]$ is computed at party B privately.
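The curvature computation (5)-(7) can be sketched in plaintext as follows, under the assumption that encryption is stripped away to expose the arithmetic; the function and variable names are illustrative. The only per-sample quantity that crosses the party boundary is the scalar $z_i$.

```python
# A plaintext sketch of equations (5)-(7); encryption is intentionally
# omitted here so that only the arithmetic is visible.
import numpy as np

def curvature_pair(window_recent, window_prev, X_H_A, X_H_B):
    # Equation (5): difference of parameter averages over two windows of L iterates
    s = np.mean(window_recent, axis=0) - np.mean(window_prev, axis=0)
    d_A = X_H_A.shape[1]
    s_A, s_B = s[:d_A], s[d_A:]
    # z_i = x_i^T s = (x_i^A)^T s_A + (x_i^B)^T s_B, assembled from both parties
    z = X_H_A @ s_A + X_H_B @ s_B
    # Equations (6)-(7): each party forms its own block of y_t from the shared z_i
    y_A = 0.25 * X_H_A.T @ z / len(z)
    y_B = 0.25 * X_H_B.T @ z / len(z)
    return s, np.concatenate([y_A, y_B])

rng = np.random.default_rng(2)
w1 = [rng.normal(size=5) for _ in range(10)]  # iterates from two windows, L = 10
w2 = [rng.normal(size=5) for _ in range(10)]
s_t, y_t = curvature_pair(w2, w1, rng.normal(size=(30, 3)), rng.normal(size=(30, 2)))
```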

Computing Descent Direction at the Coordinator

After collecting the encrypted loss $[[\ell_{\mathcal{S}}]]$, gradient $[[\nabla \ell_{\mathcal{S}}]]$, and curvature vector $[[\mathbf{y}_t]]$ from parties A and B, the coordinator determines a descent direction $\mathbf{p}_t$ for updating $\boldsymbol{\theta}_A$ and $\boldsymbol{\theta}_B$, i.e., $\boldsymbol{\theta}_{t+1} = \boldsymbol{\theta}_t + \mathbf{p}_t$. Given an estimated inverse Hessian $H_t$, the descent direction is given by $\mathbf{p}_t = -\eta H_t \nabla \ell_{\mathcal{S}}(\boldsymbol{\theta}_t)$, where $\eta$ is the learning rate. Every $L$ iterations, $\mathbf{s}_t$ and $\mathbf{y}_t$ are stored in two queues of length $M$. $H_t$ is determined by successively computing

$$H^{(j)} = \left(I - \rho_j \mathbf{s}_j \mathbf{y}_j^\top\right) H^{(j-1)} \left(I - \rho_j \mathbf{y}_j \mathbf{s}_j^\top\right) + \rho_j \mathbf{s}_j \mathbf{s}_j^\top, \qquad \rho_j = \frac{1}{\mathbf{y}_j^\top \mathbf{s}_j} \qquad (8)$$

over the stored pairs, from the initial point $H^{(0)} = \frac{\mathbf{s}_t^\top \mathbf{y}_t}{\mathbf{y}_t^\top \mathbf{y}_t} I$. It should be noted that $\mathbf{s}_t$ can be computed locally at the coordinator from the descent directions it has already produced, without any additional transmissions. The overall quasi-Newton method based vertical federated learning framework is illustrated in Fig. 1. The source code will be released in an upcoming version of the FATE framework [webankfate].
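A sketch of the coordinator-side recursion (8) follows. A dense matrix $H$ is kept for clarity only; a practical implementation would use the L-BFGS two-loop recursion to avoid materializing $H$, and the queue handling here is an assumed design, not the released FATE code.

```python
# A sketch of the coordinator-side update (8): starting from a scaled
# identity, the stored (s_j, y_j) pairs are applied successively.
import numpy as np
from collections import deque

def estimate_inverse_hessian(s_queue, y_queue, dim):
    s_t, y_t = s_queue[-1], y_queue[-1]
    H = (s_t @ y_t) / (y_t @ y_t) * np.eye(dim)  # initial point H^(0)
    I = np.eye(dim)
    for s, y in zip(s_queue, y_queue):           # successive updates (8)
        rho = 1.0 / (y @ s)
        V = I - rho * np.outer(s, y)
        H = V @ H @ V.T + rho * np.outer(s, s)
    return H

s_q, y_q = deque(maxlen=5), deque(maxlen=5)      # queues of length M = 5
# ... (s_t, y_t) pairs are appended every L iterations, then:
# H = estimate_inverse_hessian(s_q, y_q, dim)
# p = -eta * H @ grad   # descent direction returned to parties A and B
```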

At each iteration, the communication cost of SGD consists of $O(|\mathcal{S}|)$ encrypted numbers exchanged between party A and party B, and $O(d)$ encrypted numbers sent from parties A and B to the coordinator. With our quasi-Newton framework, the per-iteration costs are of the same order, plus an extra $O(|\mathcal{S}_H|)$ encrypted numbers between party A and party B and an extra $O(d)$ encrypted numbers to the coordinator every $L$ iterations for the curvature update. With an appropriate choice of $L$ and $|\mathcal{S}_H|$, the presented quasi-Newton method introduces only a small additional communication cost per communication round compared with [hardy2017private].

Figure 1: A Quasi-Newton Framework for Vertical Federated Learning

4 Experiments and Conclusion

We conduct numerical experiments on two credit scoring data sets to validate the advantages of our system over the mini-batch SGD based system in [hardy2017private]: 1) Credit 1 [dataset:defaultcredit], the default-of-credit-card-clients data set, with 30,000 data instances of 23 attributes each; 2) Credit 2 [dataset:givecredit], the give-me-some-credit data set, with 150,000 data instances of 10 attributes each. Each data set is split vertically into two parts, so that each party holds a subset of the features and party B also holds the labels. We randomly choose a portion of the data instances as the training set and use the remainder as the test set. The memory size $M$ and update interval $L$ are fixed in all simulations, and each algorithm stops when the change in loss between two consecutive epochs falls below a given threshold. The number of epochs, the training loss, and the area under the receiver operating characteristic (ROC) curve (AUC) on the test set are shown in Table 1. The numerical results demonstrate that the proposed system requires fewer epochs, and hence less communication overhead, than the first-order SGD based framework.

| Batch Size | Method   | Credit 1 Epochs | Credit 1 Loss | Credit 1 AUC | Credit 2 Epochs | Credit 2 Loss | Credit 2 AUC |
|-----------|----------|-----------------|---------------|--------------|-----------------|---------------|--------------|
| 1000      | SGD      | 12              | 0.496218      | 0.7224       | 12              | 0.314555      | 0.7033       |
| 1000      | Proposed | 3               | 0.496600      | 0.7222       | 4               | 0.314643      | 0.7061       |
| 3000      | SGD      | 18              | 0.496194      | 0.7219       | 14              | 0.314648      | 0.6982       |
| 3000      | Proposed | 12              | 0.496317      | 0.7225       | 6               | 0.314490      | 0.7077       |

Table 1: Numerical Results on Two Public Data Sets

In this paper, we considered the communication challenge in vertical federated learning with two data providers collaboratively learning a logistic regression model. We proposed a quasi-Newton method to reduce the number of communication rounds. Under the additively homomorphic encryption scheme, the two data providers compute an encrypted gradient by exchanging encrypted intermediate values, plus one additional encrypted vector every $L$ iterations for updating the curvature information. Numerical experiments demonstrate that our method considerably reduces the number of communication rounds at a small additional communication cost per round.


Appendix A

We provide details of the proposed vertical federated learning framework in Algorithm 1.

Input: vertically partitioned data $\mathcal{D}_A$, $\mathcal{D}_B$; learning rate $\eta$; memory size $M$; curvature update interval $L$
Output: model parameters $\boldsymbol{\theta}_A$, $\boldsymbol{\theta}_B$
Set $t \leftarrow 0$, $H \leftarrow I$
for each round $k = 1, 2, \ldots$ do
      Choose a minibatch $\mathcal{S}$
      if the curvature queues are still empty then
            Party A&B: compute $[[\ell_{\mathcal{S}}]]$ and $[[\nabla \ell_{\mathcal{S}}]]$ as in equations (3), (4)
            Coordinator: update $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \mathbf{p}$, where $\mathbf{p} = -\eta \, \nabla \ell_{\mathcal{S}}$
      else
            Party A&B: choose a Hessian minibatch $\mathcal{S}_H$
            Party A&B: compute $[[\ell_{\mathcal{S}}]]$, $[[\nabla \ell_{\mathcal{S}}]]$, and $[[\mathbf{y}_t]]$ as in equations (3), (4), (6)
            Coordinator: update $\boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \mathbf{p}$, where $\mathbf{p} = -\eta \, H \, \nabla \ell_{\mathcal{S}}$
            if $k \bmod L = 0$ then
                  $t \leftarrow t + 1$; Coordinator: compute $\mathbf{s}_t$ locally and push $(\mathbf{s}_t, \mathbf{y}_t)$ into the queues of length $M$
                  Set $H \leftarrow \frac{\mathbf{s}_t^\top \mathbf{y}_t}{\mathbf{y}_t^\top \mathbf{y}_t} I$
                  for each stored pair $(\mathbf{s}_j, \mathbf{y}_j)$ do
                        Update $H$ via equation (8)
                  end for
            end if
      end if
end for
Algorithm 1: A Quasi-Newton Framework for Vertical Federated Learning
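For concreteness, the following is a minimal single-process sketch of Algorithm 1, with the assumption that all encrypted exchanges between the parties and the coordinator are replaced by plaintext arithmetic so that only the control flow (SGD warm-up, periodic curvature updates, quasi-Newton steps) is shown; names such as train, eta, and batch are illustrative and not from the released FATE code.

```python
# A plaintext, single-process sketch of Algorithm 1; encryption and the
# party/coordinator split are intentionally omitted.
import numpy as np
from collections import deque

def train(X, y, eta=0.5, L=10, M=5, batch=64, rounds=300, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = np.zeros(d)
    s_q, y_q = deque(maxlen=M), deque(maxlen=M)
    window, prev_avg, H = [], None, None
    for k in range(1, rounds + 1):
        S = rng.choice(n, size=batch, replace=False)
        grad = ((0.25 * X[S] @ theta - 0.5 * y[S]) @ X[S]) / batch  # Taylor gradient
        p = -eta * grad if H is None else -eta * H @ grad           # descent direction
        theta = theta + p
        window.append(theta.copy())
        if k % L == 0:                                              # curvature update
            avg, window = np.mean(window, axis=0), []
            if prev_avg is not None:
                s = avg - prev_avg                                  # equation (5)
                S_H = rng.choice(n, size=batch, replace=False)
                y_t = 0.25 * X[S_H].T @ (X[S_H] @ s) / batch        # equations (6)-(7)
                s_q.append(s); y_q.append(y_t)
                H, I = (s @ y_t) / (y_t @ y_t) * np.eye(d), np.eye(d)
                for sj, yj in zip(s_q, y_q):                        # recursion (8)
                    rho = 1.0 / (yj @ sj)
                    V = I - rho * np.outer(sj, yj)
                    H = V @ H @ V.T + rho * np.outer(sj, sj)
            prev_avg = avg
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = np.sign(X @ rng.normal(size=10) + 0.1 * rng.normal(size=2000))
theta = train(X, y)
```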