## 1 Introduction

With the surge of artificial intelligence (AI) driven services, including recommender systems and natural language processing, data privacy and security have raised worldwide concerns yang2019federated. Increasingly stringent data privacy and security requirements are an emerging trend in laws and regulations across the world. A well-known example is the General Data Protection Regulation (GDPR) of the European Union albrecht2016gdpr. Traditional AI service providers usually collect data instances and transfer them from one party to another. A machine learning model is then trained at a cloud data center on the fused data set. However, this approach faces the challenges of data breaches and violations of data protection laws and regulations wiki:facebookdatabreach.

Federated learning yang2019federated; konevcny2016federated; mcmahan2017communication has recently emerged as a frontier field that studies privacy-preserving collaborative machine learning while leaving data instances with their providers locally. One line of work konevcny2016federated; mcmahan2017communication; yang2018federated focuses on the horizontal structure, in which each node holds a subset of the data instances with complete data attributes. Many other works study the vertical federated learning structure, where the data set is vertically partitioned and owned by different data providers. That is, each data provider holds a disjoint subset of attributes for all data instances. The target is to learn a machine learning model collaboratively without transferring any data from one data provider to another. In particular, cheng2019secureboost proposes a privacy-preserving tree-boosting system, SecureBoost, and hardy2017private proposes a logistic regression framework for vertically partitioned data.

Communication is one of the main bottlenecks in federated learning, because network conditions are much worse than those within a cloud computing center konevcny2016federated. To address the communication challenge in horizontal federated learning, konevcny2016federated considers structured updates to reduce the communication cost per round, and mcmahan2017communication proposes an iterative model averaging algorithm to reduce the number of communication rounds. For the vertical federated learning structure, hardy2017private considers a two-party (denoted by party A and party B) logistic regression problem and proposes a privacy-preserving framework based on stochastic gradient descent (SGD). Due to the slow convergence of first-order algorithms, it requires a large number of communication rounds. This work proposes a quasi-Newton method based vertical federated learning system that uses sub-sampled Hessian information to reduce the number of communication rounds.

### Related Works

Newton's method, a second-order method, is known to converge faster than first-order gradient-based methods. To avoid the high cost of inverting the Hessian matrix, the well-recognized quasi-Newton method limited-memory BFGS (L-BFGS) nocedal2006numerical directly approximates the inverse Hessian matrix. A number of works schraudolph2007stochastic; byrd2016stochastic; moritz2016linearly focus on developing stochastic quasi-Newton algorithms for problems with large amounts of data. However, the inverse Hessian estimated by schraudolph2007stochastic may be unstable for small batch sizes, and the algorithm in moritz2016linearly requires computing the full gradient, which would double the communication cost in each epoch compared with SGD. This paper develops a communication-efficient vertical federated learning framework based on the stochastic quasi-Newton method proposed in byrd2016stochastic.

## 2 Problem Statement

Consider a typical logistic regression problem with vertically partitioned data hardy2017private. Let $X \in \mathbb{R}^{n \times T}$ be the data set consisting of $n$ data samples, where each instance has $T$ features. The class attribute information, i.e., the label of the data, is given by $y \in \{-1, +1\}^n$. The data set is vertically partitioned and distributed over two honest-but-curious private parties, A (the host data provider, with only features) and B (the guest data provider, with features and labels). Let $X^A \in \mathbb{R}^{n \times T_A}$ be the data set owned by party A and $X^B \in \mathbb{R}^{n \times T_B}$ the data set owned by party B. Each party owns a disjoint subset of the data features over common sample IDs, with $X = [X^A, X^B]$. In addition, only party B has access to the labels $y$. The target of logistic regression is to train a linear model for classification by solving

$$\min_{w \in \mathbb{R}^{T}} \; \frac{1}{n} \sum_{i=1}^{n} l(w; x_i, y_i), \tag{1}$$

where $w$ denotes the model parameters, $x_i$ is the $i$-th data instance, and $y_i \in \{-1, +1\}$ is the corresponding label. The negative log-likelihood loss function is given by $l(w; x_i, y_i) = \log\left(1 + \exp(-y_i w^T x_i)\right)$. In this paper, we suppose that party A and party B each hold the model parameters corresponding to their own features, which can be denoted as $w = (w^A, w^B)$, where $w^A$ and $w^B$ correspond to the features held by party A and party B, respectively.

hardy2017private proposes a stochastic gradient descent (SGD) based vertical logistic regression framework that computes gradients by exchanging encrypted intermediate values at each iteration. Specifically, party A and party B collaboratively compute the vertically partitioned encrypted gradients $[\![g^A]\!]$ and $[\![g^B]\!]$, which can be decrypted by the third party. To achieve secure computation without transferring data from one party to another, additively homomorphic encryption is adopted. Additively homomorphic encryption schemes such as Paillier paillier1999public allow any party to encrypt its data with a public key, while the private key for decryption is owned by the third party, i.e., the coordinator. With additively homomorphic encryption, we can compute the sum of two encrypted numbers as well as the product of an unencrypted number and an encrypted one, which can be denoted as $[\![a]\!] + [\![b]\!] = [\![a + b]\!]$ and $a \cdot [\![b]\!] = [\![a b]\!]$, using $[\![\cdot]\!]$ as the encryption operation. Unfortunately, the loss function and its gradient cannot be computed directly with additively homomorphic encryption. To address this issue, we adopt the Taylor approximation of the loss function proposed in hardy2017private; aono2016scalable:

$$l(w; x_i, y_i) \approx \log 2 - \frac{1}{2} y_i w^T x_i + \frac{1}{8} \left(w^T x_i\right)^2. \tag{2}$$
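To give a feel for the quality of this approximation, the following minimal sketch compares the exact logistic loss $\log(1 + e^{-z})$ with its second-order Taylor expansion around $z = 0$, namely $\log 2 - z/2 + z^2/8$, where $z = y\, w^T x$ is the margin and $y \in \{-1, +1\}$ (so that $z^2 = (w^T x)^2$). The function names are illustrative and not taken from the original work.

```python
import math

def logistic_loss(z):
    """Exact negative log-likelihood log(1 + exp(-z)) for margin z = y * w^T x."""
    return math.log1p(math.exp(-z))

def taylor_loss(z):
    """Second-order Taylor expansion of log(1 + exp(-z)) around z = 0."""
    return math.log(2.0) - 0.5 * z + 0.125 * z * z

# Compare both losses on a few margins close to zero.
for z in (-1.0, -0.5, 0.0, 0.5, 1.0):
    exact, approx = logistic_loss(z), taylor_loss(z)
    print(f"z={z:+.1f}  exact={exact:.4f}  taylor={approx:.4f}  "
          f"error={abs(exact - approx):.4f}")
```

The expansion only involves additions and multiplications by known scalars, which is what makes it compatible with additively homomorphic encryption. The approximation degrades as the margin grows in magnitude, so it is best suited to normalized features and moderate parameter values.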

## 3 A Quasi-Newton Method Based Vertical Federated Learning Framework

In federated learning, communication between different parties is much more expensive than communication within a cloud computing center, since the data providers are usually located in distant data centers, on different networks, or even in a wireless environment with limited bandwidth mcmahan2017communication. Communication therefore becomes one of the main bottlenecks for efficient model training. For this reason, we develop a communication-efficient vertical federated learning framework that incorporates second-order information byrd2016stochastic to reduce the number of communication rounds between parties, as illustrated in Fig. 1.

The gradient and the Hessian of the Taylor loss in equation (2) with respect to the $i$-th data instance are respectively given by $\nabla l(w; x_i, y_i) \approx \left(\frac{1}{4} w^T x_i - \frac{1}{2} y_i\right) x_i$ and $\nabla^2 l(w; x_i, y_i) \approx \frac{1}{4} x_i x_i^T$. In the $k$-th iteration, the classical L-BFGS algorithm uses the history information of the last $M$ iterations, differencing the gradients and the model parameters between every two consecutive iterations, to obtain an estimated inverse Hessian matrix $H$. However, this leads to an unstable curvature estimate if mini-batch data is used instead of the full data. To obtain a stable estimate of $H$, we use the sub-sampled Hessian information, as suggested by byrd2016stochastic. Moreover, the curvature information can be updated only every $L$ iterations, which reduces the communication overhead and improves the stability of the quasi-Newton algorithm. The details of computing the key ingredients of our system are introduced below.
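As a sanity check on the per-instance derivatives, the sketch below assumes the Taylor loss $\log 2 - \frac{1}{2} y\, w^T x + \frac{1}{8}(w^T x)^2$ with $y \in \{-1, +1\}$, whose gradient is $(\frac{1}{4} w^T x - \frac{1}{2} y)\, x$ and whose Hessian is $\frac{1}{4} x x^T$, and compares the analytic gradient with a central finite difference. All names and the sample values are illustrative.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def taylor_loss(w, x, y):
    """Taylor-approximated logistic loss for one instance, y in {-1, +1}."""
    u = dot(w, x)
    return math.log(2.0) - 0.5 * y * u + 0.125 * u * u

def taylor_grad(w, x, y):
    """Analytic gradient: (w^T x / 4 - y / 2) * x."""
    coeff = 0.25 * dot(w, x) - 0.5 * y
    return [coeff * xj for xj in x]

def taylor_hessian(x):
    """Analytic Hessian: x x^T / 4, independent of w and y."""
    return [[0.25 * xi * xj for xj in x] for xi in x]

def numeric_grad(w, x, y, eps=1e-5):
    """Central finite-difference gradient, used only for verification."""
    grad = []
    for j in range(len(w)):
        up, down = list(w), list(w)
        up[j] += eps
        down[j] -= eps
        grad.append((taylor_loss(up, x, y) - taylor_loss(down, x, y)) / (2 * eps))
    return grad

w, x, y = [0.2, -0.4, 0.1], [1.0, 2.0, -1.5], -1
analytic = taylor_grad(w, x, y)
numeric = numeric_grad(w, x, y)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print("analytic gradient:", analytic)
print("max gap to finite difference:", max_gap)
```

Because the Hessian depends only on the features and not on the labels or the current parameters, curvature information for this loss can be formed from inner products of feature vectors alone.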

### Computing Loss and Gradient at Party A&B

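Let $S$ be the index set of the chosen mini-batch of data instances. The corresponding loss and gradient are given by $F(w) = \frac{1}{|S|} \sum_{i \in S} l(w; x_i, y_i)$ and $g = \frac{1}{|S|} \sum_{i \in S} \nabla l(w; x_i, y_i)$. By denoting $u_i^A = (w^A)^T x_i^A$ for party A (similarly $u_i^B = (w^B)^T x_i^B$ for party B) and $d_i = \frac{1}{4}\left(u_i^A + u_i^B\right) - \frac{1}{2} y_i$, the encrypted loss and gradient can be computed by transmitting $\{[\![u_i^A]\!], [\![(u_i^A)^2]\!]\}_{i \in S}$ from party A to party B, and transmitting $\{[\![d_i]\!]\}_{i \in S}$ from B to A, following

$$[\![F(w)]\!] \approx \frac{1}{|S|} \sum_{i \in S} \left( [\![\log 2]\!] - \frac{1}{2} y_i \left( [\![u_i^A]\!] + [\![u_i^B]\!] \right) + \frac{1}{8} \left( [\![(u_i^A)^2]\!] + 2 u_i^B [\![u_i^A]\!] + [\![(u_i^B)^2]\!] \right) \right), \tag{3}$$

$$[\![g]\!] \approx \frac{1}{|S|} \sum_{i \in S} [\![d_i]\!]\, x_i = \left( \frac{1}{|S|} \sum_{i \in S} [\![d_i]\!]\, x_i^A,\; \frac{1}{|S|} \sum_{i \in S} [\![d_i]\!]\, x_i^B \right). \tag{4}$$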
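The two operations that this step relies on, adding two ciphertexts and multiplying a ciphertext by a plaintext number, can be illustrated with a toy Paillier implementation. The sketch below is an assumption-laden illustration, not the authors' code: it uses tiny fixed primes (completely insecure), integer plaintexts only, and hypothetical helper names. It shows how a party holding plaintext features could form an encrypted sum of encrypted residuals times features without ever decrypting the residuals.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier key pair from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid for generator n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """Encrypt integer m with generator n + 1 and a random blinding factor r."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (1 + n * (m % n)) * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """Decrypt and map the result back to a signed integer."""
    lam, mu = priv
    m = (pow(c, lam, n * n) - 1) // n * mu % n
    return m - n if m > n // 2 else m

def add(n, c1, c2):
    """Ciphertext of the sum of the two plaintexts."""
    return c1 * c2 % (n * n)

def scalar_mul(n, c, k):
    """Ciphertext of k times the plaintext, for a plaintext integer k."""
    return pow(c, k % n, n * n)

rng = random.Random(0)
n, priv = keygen()

d = [3, -2, 5]                      # residuals, known only in encrypted form
x_a = [[1, 2], [0, -1], [4, 1]]     # plaintext features of the computing party
enc_d = [encrypt(n, di, rng) for di in d]

# Encrypted sum over instances of d_i * x_i, one ciphertext per feature.
enc_grad = []
for j in range(2):
    acc = encrypt(n, 0, rng)
    for c, row in zip(enc_d, x_a):
        acc = add(n, acc, scalar_mul(n, c, row[j]))
    enc_grad.append(acc)

grad = [decrypt(n, priv, c) for c in enc_grad]  # done by the key holder only
print(grad)
```

In a real deployment, a vetted Paillier library with large keys would be used, and real-valued quantities would need a fixed-point encoding before encryption. Only the holder of the private key (the coordinator in the protocol described here) can recover the plaintext sums.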

### Computing Updates for Estimating Curvature Information at Party A&B

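To keep the additional communication cost low, the curvature information is updated only every $L$ iterations at the coordinator, by collecting the encrypted vector $[\![v_t]\!]$ from party A and party B. Specifically, every $L$ iterations we compute the difference of the average model parameters as

$$s_t = \bar{w}_t - \bar{w}_{t-1}, \qquad \bar{w}_t = \frac{1}{L} \sum_{i=k-L+1}^{k} w_i, \tag{5}$$

at party A and party B, where $k$ is the current iteration. Then the product of the sub-sampled Hessian and the average model difference is given by

$$v_t = \nabla^2 F(\bar{w}_t)\, s_t. \tag{6}$$

The sub-sampled Hessian $\nabla^2 F(\bar{w}_t)$ is calculated with respect to a randomly chosen subset of data $S_H$. Under additively homomorphic encryption, $[\![v_t]\!]$ can be computed following

$$[\![v_t]\!] \approx \frac{1}{|S_H|} \sum_{i \in S_H} \frac{1}{4} [\![h_i]\!]\, x_i, \tag{7}$$

where $h_i = x_i^T s_t = (x_i^A)^T s_t^A + (x_i^B)^T s_t^B$. By transmitting $\{[\![(x_i^A)^T s_t^A]\!]\}_{i \in S_H}$ from party A to party B, and transmitting $\{[\![(x_i^B)^T s_t^B]\!]\}_{i \in S_H}$ from B to A, the component $[\![v_t^A]\!]$ can be computed at party A and the component $[\![v_t^B]\!]$ at party B privately.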
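The structure of this computation can be sketched in plaintext. Assuming the Hessian of the Taylor loss for one instance is $\frac{1}{4} x_i x_i^T$, the Hessian-vector product over a subset of $m$ instances is $\frac{1}{4m} \sum_i (x_i^T s)\, x_i$, and the inner product $x_i^T s$ splits into one partial sum per party. The sketch below (hypothetical names, no encryption) computes the product from the two partial sums and checks it against a dense reference.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def split_hessian_vector_product(x_a, x_b, s_a, s_b):
    """Return (v_a, v_b) for v = (1 / (4 m)) * sum_i (x_i^T s) x_i.

    x_a, x_b: feature blocks of the two parties for the same m instances.
    s_a, s_b: the matching blocks of the model difference s.
    """
    m = len(x_a)
    h_a = [dot(row, s_a) for row in x_a]   # computed locally by party A
    h_b = [dot(row, s_b) for row in x_b]   # computed locally by party B
    h = [a + b for a, b in zip(h_a, h_b)]  # exchanged in encrypted form in the protocol
    v_a = [sum(h[i] * x_a[i][j] for i in range(m)) / (4 * m) for j in range(len(s_a))]
    v_b = [sum(h[i] * x_b[i][j] for i in range(m)) / (4 * m) for j in range(len(s_b))]
    return v_a, v_b

def dense_hessian_vector_product(x, s):
    """Reference: build the averaged Hessian explicitly, then multiply by s."""
    m, t = len(x), len(s)
    hess = [[sum(0.25 * row[j] * row[k] for row in x) / m for k in range(t)]
            for j in range(t)]
    return [dot(hess_row, s) for hess_row in hess]

x_a = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
x_b = [[1.0], [2.0], [-1.0]]
s_a, s_b = [0.1, -0.2], [0.3]

v_a, v_b = split_hessian_vector_product(x_a, x_b, s_a, s_b)
x_full = [ra + rb for ra, rb in zip(x_a, x_b)]
v_ref = dense_hessian_vector_product(x_full, s_a + s_b)
print(v_a + v_b)
print(v_ref)
```

The split form never materializes the Hessian matrix and only exchanges one scalar per sampled instance in each direction, which is the property the protocol exploits.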

### Computing Descent Direction at the Coordinator

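After collecting the encrypted loss $[\![F(w)]\!]$, gradient $[\![g]\!]$, and $[\![v_t]\!]$ from party A&B, the coordinator determines a descent direction $\tilde{g}$ for updating $w^A$ and $w^B$, i.e., $\tilde{g} = (\tilde{g}^A, \tilde{g}^B)$. Given an estimated inverse Hessian $H$, the descent direction is given by $\tilde{g} = -\eta H g$, where $\eta$ is the learning rate. Every $L$ iterations, $s_t$ and $v_t$ are stored in two queues of length $M$. $H$ is determined by successively computing

$$H \leftarrow \left(I - \rho\, s v^T\right) H \left(I - \rho\, v s^T\right) + \rho\, s s^T, \qquad \rho = \frac{1}{v^T s}, \tag{8}$$

over the stored pairs $(s, v)$, from the initial point $H = \frac{v_t^T s_t}{v_t^T v_t} I$. It should be noted that $s_t$ can be computed locally at the coordinator, from the descent directions it has already determined, without any additional transmissions. The overall quasi-Newton method based vertical federated learning framework is illustrated in Fig. 1. The source code will be released in an upcoming version of the FATE framework webankfate.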
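The recursion can be sketched in a few lines of plain Python. The sketch assumes the standard BFGS inverse-Hessian update used in stochastic quasi-Newton methods, $H \leftarrow (I - \rho s v^T) H (I - \rho v s^T) + \rho s s^T$ with $\rho = 1/(v^T s)$, started from a scaled identity built from the most recent pair. The function names, the example pairs, and the learning rate are illustrative assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def mat_vec(m, x):
    return [dot(row, x) for row in m]

def mat_mul(a, b):
    cols = list(zip(*b))
    return [[dot(row, col) for col in cols] for row in a]

def bfgs_update(h, s, v):
    """One update H <- (I - rho s v^T) H (I - rho v s^T) + rho s s^T."""
    n = len(s)
    rho = 1.0 / dot(v, s)
    left = [[(1.0 if i == j else 0.0) - rho * s[i] * v[j] for j in range(n)]
            for i in range(n)]
    right = [[(1.0 if i == j else 0.0) - rho * v[i] * s[j] for j in range(n)]
             for i in range(n)]
    h = mat_mul(mat_mul(left, h), right)
    return [[h[i][j] + rho * s[i] * s[j] for j in range(n)] for i in range(n)]

def estimate_inverse_hessian(pairs):
    """Run the recursion over stored (s, v) pairs, oldest first."""
    s, v = pairs[-1]
    n = len(s)
    gamma = dot(s, v) / dot(v, v)  # scaled-identity starting point
    h = [[gamma if i == j else 0.0 for j in range(n)] for i in range(n)]
    for s, v in pairs:
        h = bfgs_update(h, s, v)
    return h

# Two curvature pairs with v = A s for A = [[2, 0.5], [0.5, 1]].
pairs = [([1.0, 0.0], [2.0, 0.5]), ([0.0, 1.0], [0.5, 1.0])]
h = estimate_inverse_hessian(pairs)

s_last, v_last = pairs[-1]
grad = [0.3, -0.1]
eta = 0.5
direction = [-eta * c for c in mat_vec(h, grad)]
print("H v_last:", mat_vec(h, v_last))
print("direction:", direction)
```

After the final update, the estimate satisfies the secant condition $H v = s$ for the most recent pair, and it stays symmetric positive definite as long as every stored pair has $v^T s > 0$, so the resulting direction is a descent direction.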

At each iteration, the communication costs of SGD are encrypted numbers between party A and party B, and encrypted numbers between party A&B and the coordinator. With our quasi-Newton framework, the communication costs become encrypted numbers between party A and party B, and encrypted numbers between party A&B and the coordinator. By choosing , the presented quasi-Newton method introduces no more than additional communication costs per communication round compared with hardy2017private.

## 4 Experiments and Conclusion

We conduct numerical experiments on two credit scoring data sets to test the advantages of our system over the mini-batch SGD based system in hardy2017private: 1) Credit 1 dataset:defaultcredit : It consists of data instances and each instance has attributes; 2) Credit 2 dataset:givecredit : It contains data instances and each with attributes. By splitting each data set into two parts vertically, each party holds a subset of features and party B also holds the labels. We randomly choose data instances as the training set and the remaining as the test set. We choose and in all simulations and each algorithm stops when the loss between two consecutive epochs is less than . The number of epochs, the training loss, and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve on the test set are shown in Table 1. The numerical results show that the proposed system reaches a comparable loss and AUC in fewer epochs than the first-order SGD based framework, and thus requires less communication overhead.

| Batch Size | Method | Credit 1 Epochs | Credit 1 Loss | Credit 1 AUC | Credit 2 Epochs | Credit 2 Loss | Credit 2 AUC |
|---|---|---|---|---|---|---|---|
| 1000 | SGD | 12 | 0.496218 | 0.7224 | 12 | 0.314555 | 0.7033 |
| 1000 | Proposed | 3 | 0.496600 | 0.7222 | 4 | 0.314643 | 0.7061 |
| 3000 | SGD | 18 | 0.496194 | 0.7219 | 14 | 0.314648 | 0.6982 |
| 3000 | Proposed | 12 | 0.496317 | 0.7225 | 6 | 0.314490 | 0.7077 |

In this paper, we consider the communication challenges in the vertical federated learning problem, where two data providers collaboratively learn a logistic regression model. We propose to use a quasi-Newton method to reduce the number of communication rounds. With the additively homomorphic encryption scheme, the two data providers compute an encrypted gradient by exchanging encrypted intermediate values, plus an additional vector every $L$ iterations (the curvature update interval) for updating the curvature information. Numerical experiments demonstrate that our method considerably reduces the number of communication rounds at the cost of a small additional communication overhead per round.

## References

- [1] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 10(2):12, 2019.
- [2] Jan Philipp Albrecht. How the GDPR will change the world. Eur. Data Prot. L. Rev., 2:287, 2016.
- [3] Wiki. Data breach — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Data_breach&oldid=912247856, 2019.
- [4] Jakub Konečný, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
- [5] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282, 2017.
- [6] Kai Yang, Tao Jiang, Yuanming Shi, and Zhi Ding. Federated learning via over-the-air computation. arXiv preprint arXiv:1812.11750, 2018.
- [7] Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, and Qiang Yang. Secureboost: A lossless federated learning framework. arXiv preprint arXiv:1901.08755, 2019.
- [8] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677, 2017.
- [9] Jorge Nocedal and Stephen Wright. Numerical optimization. Springer Science & Business Media, 2006.
- [10] Nicol N Schraudolph, Jin Yu, and Simon Günter. A stochastic quasi-newton method for online convex optimization. In Artificial intelligence and statistics, pages 436–443, 2007.
- [11] Richard H Byrd, Samantha L Hansen, Jorge Nocedal, and Yoram Singer. A stochastic quasi-newton method for large-scale optimization. SIAM Journal on Optimization, 26(2):1008–1031, 2016.
- [12] Philipp Moritz, Robert Nishihara, and Michael Jordan. A linearly-convergent stochastic L-BFGS algorithm. In Artificial Intelligence and Statistics, pages 249–258, 2016.
- [13] Pascal Paillier. Public-key cryptosystems based on composite degree residuosity classes. In International Conference on the Theory and Applications of Cryptographic Techniques, pages 223–238. Springer, 1999.
- [14] Yoshinori Aono, Takuya Hayashi, Le Trieu Phong, and Lihua Wang. Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 142–144. ACM, 2016.
- [15] WeBank. FATE: An industrial grade federated learning framework. https://fate.fedai.org, 2018.
- [16] UCI Machine Learning Repository. default of credit card clients data set. https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients, 2017.
- [17] Give me some credit. Give me some credit. https://www.kaggle.com/c/GiveMeSomeCredit/data, 2011.

## Appendix A

We provide details of the proposed vertical federated learning framework in Algorithm 1.
