Secure Federated Transfer Learning

12/08/2018 ∙ by Yang Liu, et al. ∙ The Hong Kong University of Science and Technology

Machine learning relies on the availability of vast amounts of data for training. However, in reality, data are often scattered across different organizations and cannot be easily integrated under many legal and practical constraints. In this paper, we introduce a new technique and framework, known as federated transfer learning (FTL), to improve statistical models under a data federation. The federation allows knowledge to be shared without compromising user privacy, and enables complementary knowledge to be transferred across the network. As a result, a target-domain party can build more flexible and powerful models by leveraging rich labels from a source-domain party. A secure transfer cross-validation approach is also proposed to guard the FTL performance under the federation. The framework requires minimal modifications to the existing model structure and provides the same level of accuracy as the non-privacy-preserving approach. The framework is very flexible and can be effectively adapted to various secure multi-party machine learning tasks.








Introduction

Recent Artificial Intelligence (AI) achievements have depended on the availability of massive amounts of labeled data. AlphaGo [Silver et al.2016] uses 30 million moves from 160,000 actual games. The ImageNet dataset [Deng et al.2009] has over 14 million images. However, across various industries, many fields of application have only small or poor-quality data. Labeling data is very expensive, especially in fields that require human expertise and domain knowledge. In addition, the data needed for a specific task may not all be kept in one place. Many organizations may have only unlabeled data, while others may have a very limited number of labels. It has also become increasingly difficult for organizations to combine their data. For example, the General Data Protection Regulation (GDPR) [EU2016], a new bill introduced by the European Union, enforces many terms that protect user privacy and data security and prohibit organizations from exchanging data directly. How to enable the large number of businesses and applications that have only small data (few samples and features) or weak supervision (few labels) to build effective and accurate AI models while meeting data privacy, security and regulatory requirements is a major challenge. To overcome these challenges, Google first introduced a federated learning (FL) system [McMahan et al.2016] in which a global machine learning model is updated by a federation of distributed participants while their data are kept locally. Their framework requires all contributors to share the same feature space. On the other hand, secure machine learning with data partitioned in the feature space has also been studied [Karr et al.2004, Sanil et al.2004, Gascón et al.2016, Du, Han, and Chen2004, Wan et al.2007, Hardy et al.2017, Nock et al.2018]. These existing approaches are applicable only to data with either common features or common samples under a federation. In reality, however, the set of common entities could be small, making a federation less attractive and leaving the majority of the non-overlapping data underutilized.
In this paper, we propose a possible solution to these challenges: Federated Transfer Learning (FTL), which leverages transfer learning techniques [Pan et al.2010] to provide solutions for the entire sample and feature space under a federation. Our main contributions are the following:
- We introduce federated transfer learning in a privacy-preserving setting to provide solutions for federation problems beyond the scope of existing federated learning approaches;
- We provide an end-to-end solution to the proposed FTL problem and show that convergence and accuracy of the proposed approach is comparable to the non-privacy-preserving approach;

- We provide a novel approach for adopting additively homomorphic encryption (HE) in multi-party computation (MPC) with neural networks, such that only minimal modifications to the neural network are required and the accuracy is almost lossless, whereas most existing secure deep learning frameworks suffer a loss of accuracy when adopting privacy-preserving techniques.

Related Work

Federated Learning and Secure Deep Learning

The past several years have seen a surge of studies on encrypted machine learning. For example, Google introduced a secure aggregation scheme to protect the privacy of aggregated user updates under its federated learning framework [Bonawitz et al.2017]. CryptoNets [Dowlin et al.2016] adapted neural network computations to work with data encrypted via Homomorphic Encryption (HE) [Rivest, Adleman, and Dertouzos1978]. CryptoDL [Hesamifard, Takabi, and Ghasemi2017] approximates the activation functions in neural networks with low-degree polynomials to achieve less precision loss in prediction. DeepSecure [Rouhani, Riazi, and Koushanfar2017] uses Yao's Garbled Circuit protocol instead of HE for data encryption. All of these frameworks are designed for making encrypted predictions with a server-side model and are therefore applicable to inference only. SecureML [Mohassel and Zhang2017] is a multi-party computing scheme which uses secret sharing [Rivest, Shamir, and Tauman1979] and Yao's Garbled Circuits for encryption and supports collaborative training for linear regression, logistic regression and neural networks; it was recently extended to three-party computation by [Mohassel and Rindal2018]. Differential Privacy [Dwork2008] is another line of work on privacy-preserving training. Its weakness is that the raw data may still be exposed, and it cannot be used to make inferences on a single entity.

Transfer Learning

Transfer learning is a powerful technique for providing solutions to applications with small datasets or weak supervision. In recent years there has been a tremendous amount of research on adapting transfer learning techniques to various fields such as image classification [Zhu et al.2010] and sentiment analysis [Pan et al.2010, Li et al.2017]. The performance of transfer learning depends on how related the domains are. Intuitively, parties in the same data federation are usually organizations from the same or related industries, and are therefore more amenable to knowledge propagation.

Problem Definition

Consider a source-domain dataset $\mathcal{D}_A := \{(x_i^A, y_i^A)\}_{i=1}^{N_A}$, where $y_i^A$ is the $i$th label, and a target domain $\mathcal{D}_B := \{x_j^B\}_{j=1}^{N_B}$. $\mathcal{D}_A$ and $\mathcal{D}_B$ are held separately by two private parties and cannot be exposed to each other. We also assume that there exists a limited set of co-occurring samples $\mathcal{D}_{AB} := \{(x_i^A, x_i^B)\}_{i=1}^{N_{AB}}$ and a small set of labels for B's data in party A: $\mathcal{D}_c := \{(x_i^B, y_i^A)\}_{i=1}^{N_c}$, where $N_c$ is the number of available target labels. Without loss of generality, we assume all labels are in party A, but all the deduction here can be adapted to the case where the labels exist in party B. One can find the commonly shared sample ID set in a privacy-preserving setting by masking data IDs with encryption techniques such as the RSA scheme. Here we assume that A and B have already found, and both know, their commonly shared sample IDs. Given the above setting, the objective is for the two parties to build a transfer learning model to predict labels for the target-domain party as accurately as possible without exposing data to each other.
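The ID-alignment step mentioned above can be sketched in a few lines. This is an illustrative stand-in only: it uses salted hashing rather than the RSA-based blinding the text alludes to (salted hashes remain vulnerable to dictionary attacks on low-entropy IDs), and the function names are ours, not the paper's.

```python
import hashlib

def blind_id(sample_id: str, salt: str) -> str:
    """Hash an ID with a shared salt so raw IDs are never exchanged."""
    return hashlib.sha256((salt + sample_id).encode()).hexdigest()

def find_common_ids(ids_a, ids_b, salt="shared-secret-salt"):
    """Each party hashes its own IDs locally; only hashes are compared,
    so the parties learn the intersection and (ideally) nothing else."""
    hashed_a = {blind_id(i, salt): i for i in ids_a}
    hashed_b = {blind_id(i, salt) for i in ids_b}
    return sorted(hashed_a[h] for h in hashed_a.keys() & hashed_b)

common = find_common_ids(["u1", "u2", "u3"], ["u2", "u3", "u4"])
print(common)  # ['u2', 'u3']
```

In a real deployment each party would blind its IDs under its own key (e.g., RSA blind signatures) so that no shared secret salt is needed.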

Security Definition

In our security definition, all parties are honest-but-curious. We assume a threat model with a semi-honest adversary who can corrupt at most one of the two data clients. The security definition is that, for a protocol $P$ performing $(O^A, O^B) = P(I^A, I^B)$, where $O^A$ and $O^B$ are party A's and B's outputs and $I^A$ and $I^B$ are their inputs, $P$ is secure against A if there exist infinitely many pairs $(I'^B, O'^B)$ such that $(O^A, O'^B) = P(I^A, I'^B)$. Such a security definition has been adopted in [Du, Han, and Chen2004]. It provides a practical way to control information disclosure, as compared to complete zero-knowledge security.

Proposed Approach

In this section, we first introduce the transfer learning model, and then propose a federated framework. In recent years, deep neural networks have been widely adopted in transfer learning to find implicit transfer mechanisms [Oquab et al.2014]. Here we explore a general scenario where the hidden representations of A and B are produced by two neural networks $u_i^A = \mathrm{Net}^A(x_i^A)$ and $u_i^B = \mathrm{Net}^B(x_i^B)$, where $u^A \in \mathbb{R}^{N_A \times d}$, $u^B \in \mathbb{R}^{N_B \times d}$, and $d$ is the dimension of the hidden representation layer. To label the target domain, a general approach is to introduce a prediction function $\varphi(u_j^B)$. Without much loss of generality, we assume $\varphi$ is linearly separable, that is, $\varphi(u_j^B) = \Phi^A \mathcal{G}(u_j^B)$. For example, [Shu et al.2015] used a translator function $\varphi(u_j^B) = \frac{1}{N_A}\sum_i^{N_A} y_i^A u_i^A (u_j^B)'$, where $\Phi^A = \frac{1}{N_A}\sum_i^{N_A} y_i^A u_i^A$ and $\mathcal{G}(u_j^B) = (u_j^B)'$. We can then write the training objective using the available labeled set:

$$\mathop{\mathrm{argmin}}_{\Theta^A, \Theta^B} \; \mathcal{L}_1 = \sum_{i=1}^{N_c} \ell_1\!\left(y_i^A, \varphi(u_i^B)\right) \tag{1}$$

where $\Theta^A$, $\Theta^B$ are the training parameters of $\mathrm{Net}^A$ and $\mathrm{Net}^B$, respectively. Let $L_A$ and $L_B$ be the number of layers of $\mathrm{Net}^A$ and $\mathrm{Net}^B$, respectively; then $\Theta^A = \{\theta_l^A\}_{l=1}^{L_A}$ and $\Theta^B = \{\theta_l^B\}_{l=1}^{L_B}$, where $\theta_l^A$ and $\theta_l^B$ are the training parameters for the $l$th layer.

$\ell_1$ denotes the loss function. For logistic loss, $\ell_1(y, \varphi) = \log(1 + e^{-y\varphi})$.
In addition, we also wish to minimize the alignment loss between A and B:

$$\mathop{\mathrm{argmin}}_{\Theta^A, \Theta^B} \; \mathcal{L}_2 = \sum_{i=1}^{N_{AB}} \ell_2\!\left(u_i^A, u_i^B\right) \tag{2}$$

where $\ell_2$ denotes the alignment loss. Typical alignment losses are $-u_i^A (u_i^B)'$ or $\lVert u_i^A - u_i^B \rVert_F^2$. For simplicity, we assume it can be expressed in the form $\ell_2(u_i^A, u_i^B) = \ell_2^A(u_i^A) + \ell_2^B(u_i^B) + \kappa\, u_i^A (u_i^B)'$, where $\kappa$ is a constant.

The final objective function is:

$$\mathop{\mathrm{argmin}}_{\Theta^A, \Theta^B} \; \mathcal{L} = \mathcal{L}_1 + \gamma \mathcal{L}_2 + \frac{\lambda}{2}\left(\mathcal{L}_3^A + \mathcal{L}_3^B\right) \tag{3}$$

where $\gamma$ and $\lambda$ are the weight parameters, and $\mathcal{L}_3^A = \sum_l^{L_A} \lVert \theta_l^A \rVert_F^2$, $\mathcal{L}_3^B = \sum_l^{L_B} \lVert \theta_l^B \rVert_F^2$ are the regularization terms.

We now focus on obtaining the gradients for updating $\Theta^A$ and $\Theta^B$ in back propagation. For $i \in \{A, B\}$, we have

$$\frac{\partial \mathcal{L}}{\partial \theta_l^i} = \frac{\partial \mathcal{L}_1}{\partial \theta_l^i} + \gamma \frac{\partial \mathcal{L}_2}{\partial \theta_l^i} + \lambda\, \theta_l^i \tag{4}$$

Under the assumption that A and B are not allowed to expose their raw data, a privacy-preserving approach needs to be developed here to compute (3) and (4).

Additively Homomorphic Encryption

Additively homomorphic encryption [Acar et al.2018] and polynomial approximations have been widely used for privacy-preserving machine learning, and the trade-offs between efficiency and privacy introduced by such approximations have been discussed intensively [Aono et al.2016, Kim et al.2018, Phong et al.2017]. Here we use a second-order Taylor approximation for the loss and gradient computations:

$$\ell_1(y, \varphi) \approx \ell_1(y, 0) + \frac{1}{2} C(y)\, \varphi + \frac{1}{8} D(y)\, \varphi^2 \tag{5}$$

where

$$\frac{\partial \ell_1}{\partial \varphi} = \frac{1}{2} C(y) + \frac{1}{4} D(y)\, \varphi \tag{6}$$

For logistic loss, $C(y) = -y$ and $D(y) = y^2$. Applying equations (5) and (6) and additively homomorphic encryption, denoted as $[[\cdot]]$, we can finally obtain the encrypted gradients for $\Theta^A$ (equation (7)), the encrypted gradients for $\Theta^B$ (equation (8)) and the encrypted loss $\mathcal{L}$ (equation (9)), which are used in Algorithm 1.
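The approximation step can be checked numerically. The sketch below compares the exact logistic loss against its second-order Taylor expansion around zero, which for $y \in \{-1, +1\}$ reads $\log 2 - \frac{1}{2}y\varphi + \frac{1}{8}\varphi^2$; the function names are ours, for illustration only.

```python
import math

def logistic_loss(y, u):
    """Exact logistic loss log(1 + exp(-y*u)) for labels y in {-1, +1}."""
    return math.log(1.0 + math.exp(-y * u))

def taylor_loss(y, u):
    """Second-order Taylor expansion around u = 0:
    log 2 - (1/2)*y*u + (1/8)*u**2  (using y**2 = 1 for y in {-1, +1}).
    Being a polynomial in u, it can be evaluated on additively
    encrypted values, unlike the exact log/exp form."""
    return math.log(2.0) - 0.5 * y * u + 0.125 * u * u

for y in (-1, 1):
    for u in (-0.5, 0.0, 0.25, 0.5):
        exact, approx = logistic_loss(y, u), taylor_loss(y, u)
        print(f"y={y:+d} u={u:+.2f} exact={exact:.4f} taylor={approx:.4f}")
```

Near $u = 0$ the two agree to three or four decimal places, which is why the only accuracy cost of the secure protocol comes from this final-layer approximation.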

Input: learning rate $\eta$, weight parameters $\gamma$, $\lambda$, max iterations $T$, tolerance $\epsilon$
Output: model parameters $\Theta^A$, $\Theta^B$
A and B each initialize their parameters ($\Theta^A$, $\Theta^B$) and create an encryption key pair, and they send their public keys to each other.
$t = 0$; while $t \le T$  do
       A computes $u_i^A$ for $i \in \mathcal{D}_A$, encrypts the intermediate components needed for B's gradient computations, and sends them to B;
       B computes $u_i^B$ for $i \in \mathcal{D}_B$, encrypts the intermediate components needed for A's gradient and loss computations, and sends them to A;
       A creates a random mask $m^A$;
       A computes the masked encrypted gradients $[[\partial\mathcal{L}/\partial\theta_l^A + m^A]]_B$ and the encrypted loss $[[\mathcal{L}]]_B$ by equations (7) and (9), and sends them to B;
       B creates a random mask $m^B$;
       B computes the masked encrypted gradients $[[\partial\mathcal{L}/\partial\theta_l^B + m^B]]_A$ by equation (8), and sends them to A;
       B decrypts $\partial\mathcal{L}/\partial\theta_l^A + m^A$ and $\mathcal{L}$, and sends them to A;
       A decrypts $\partial\mathcal{L}/\partial\theta_l^B + m^B$, and sends it to B;
       A unmasks and updates $\theta_l^A \leftarrow \theta_l^A - \eta\,\partial\mathcal{L}/\partial\theta_l^A$;
       B unmasks and updates $\theta_l^B \leftarrow \theta_l^B - \eta\,\partial\mathcal{L}/\partial\theta_l^B$;
       if $\mathcal{L}$ has converged (its change is below $\epsilon$) then
              A sends a stop signal to B;
       end if
       $t = t + 1$;
end while
Algorithm 1 Federated Transfer Learning: Training
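The encrypt–mask–decrypt exchange at the heart of Algorithm 1 can be sketched with a textbook Paillier cryptosystem. The primes below are toy-sized and the whole construction is insecure; it is only meant to show that B can aggregate values encrypted under A's key, add an encrypted random mask, let A decrypt, and then remove the mask locally. All variable and function names are ours.

```python
import math, random

# --- textbook Paillier with toy primes (NOT secure; illustration only) ---
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)  # valid because we take g = n + 1

def encrypt(m):
    r = random.randrange(2, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(1 + n, m, n2) * pow(r, n, n2)) % n2  # c = g^m * r^n mod n^2

def decrypt(c):
    return (((pow(c, lam, n2) - 1) // n) * mu) % n

def he_add(c1, c2):
    return (c1 * c2) % n2  # ciphertext product = plaintext sum

# B holds two gradient components encrypted under A's key; B sums them
# homomorphically and adds a random mask before letting A decrypt.
g1, g2 = 17, 25                # toy integer-encoded gradient components
mask = random.randrange(1000)  # B's random mask m^B
masked_ct = he_add(he_add(encrypt(g1), encrypt(g2)), encrypt(mask))
masked_plain = decrypt(masked_ct)  # A decrypts; sees only g1 + g2 + mask
gradient = masked_plain - mask     # B unmasks locally
print(gradient)  # 42
```

In the real protocol the gradients are fixed-point-encoded vectors and the key is 1024 bits, but the algebra is exactly this: additions on ciphertexts, one decryption of a masked aggregate, and a local unmasking step.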
Input: model parameters $\Theta^A$, $\Theta^B$; an unlabeled sample $x_j^B$
B computes $u_j^B$, encrypts $[[\mathcal{G}(u_j^B)]]_B$ and sends it to A;
A creates a random mask $m^A$;
A computes the masked encrypted prediction $[[\varphi(u_j^B) + m^A]]_B$ and sends it to B;
B decrypts $\varphi(u_j^B) + m^A$ and sends it to A;
A unmasks to get $\varphi(u_j^B)$ and the label, and sends the label to B.
Algorithm 2 Federated Transfer Learning: Prediction

Federated Transfer Learning

With equations (7), (8) and (9), we can now design a federated algorithm for solving the transfer learning problem; see Algorithm 1. Let $[[\cdot]]_A$ and $[[\cdot]]_B$ denote homomorphic encryption with party A's and party B's public key, respectively. Party A and party B initialize and run their independent neural networks $\mathrm{Net}^A$ and $\mathrm{Net}^B$ locally to obtain the hidden representations $u_i^A$ and $u_i^B$. Party A then computes, encrypts and sends to B the components needed to calculate the gradients of $\Theta^B$. Similarly, B computes, encrypts and sends to A the components needed to calculate the gradients of $\Theta^A$ and the loss $\mathcal{L}$. Recently, a large number of works have discussed the potential risks associated with indirect leakage through such intermediate results as gradients [Hitaj, Ateniese, and Pérez-Cruz2017, Bonawitz et al.2017, Shokri and Shmatikov2015, McSherry2016, Phong et al.2018]. To prevent either side from learning the other's gradients, A and B further mask each gradient with an encrypted random value. A and B then send the encrypted masked gradients and loss to each other and obtain the decrypted values. A can send a termination signal to B once the loss convergence condition is met. Otherwise, A and B unmask the gradients, update their weight parameters with their respective gradients, and move to the next iteration. Once the model is trained, we can provide predictions for unlabeled data in party B. Specifically, for each unlabeled sample, B computes $u_j^B$ with the trained network parameters $\Theta^B$ and sends the encrypted $[[\mathcal{G}(u_j^B)]]_B$ to A; A evaluates $[[\varphi(u_j^B)]]_B$, masks it with a random value and sends the encrypted, masked result to B; B decrypts it and sends it back to A; A unmasks it, obtains $\varphi(u_j^B)$ and the label, and sends the label to B.
Notice that the only source of performance loss over the secure FTL process is the second-order Taylor approximation of the final loss function, rather than approximations at every non-linear activation layer of the neural network as in [Hesamifard, Takabi, and Ghasemi2017]; the computations inside the networks are unaffected. As demonstrated in the Experiments section, the errors in the loss and gradient calculations, as well as the loss in accuracy from adopting our approach, are minimal. The approach is therefore scalable and flexible to changes of the neural network structure.

Input: model, number of folds $K$
Output: model performance
Split party A's labeled data into $K$ folds;
for $k = 1, 2, \dots, K$ do
       train with the $K-1$ remaining folds by Algorithm 1;
       predict labels for B by Algorithm 2;
       combine the predicted labels with the training data;
       retrain with the combined data by Algorithm 1;
       predict labels for party A's reserved fold by Algorithm 2;
       evaluate against the reserved fold's true labels;
end for
Algorithm 3 Federated Transfer Learning: Cross Validation
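The control flow of Algorithm 3 can be sketched end to end in plain Python. The majority-class "model" and every function name here are our own illustrative stand-ins for Algorithms 1 and 2, not the paper's implementation; only the fold/retrain/evaluate orchestration mirrors the algorithm.

```python
def train(labeled):
    """Stand-in for Algorithm 1: a majority-class 'model' over {-1, +1}."""
    return 1 if sum(y for _, y in labeled) >= 0 else -1

def predict(model, samples):
    """Stand-in for Algorithm 2: label every sample with the model."""
    return [(x, model) for x in samples]

def transfer_cross_validation(labeled_a, unlabeled_b, k, score):
    """Split party A's labels into k folds; per fold, train on the rest,
    pseudo-label B's data, retrain on rest + pseudo-labels, and
    evaluate on the held-out fold."""
    folds = [labeled_a[i::k] for i in range(k)]
    performances = []
    for i in range(k):
        held = folds[i]
        rest = [s for j, f in enumerate(folds) if j != i for s in f]
        model = train(rest)                      # Algorithm 1
        pseudo = predict(model, unlabeled_b)     # Algorithm 2
        retrained = train(rest + pseudo)         # Algorithm 1 again
        performances.append(score(retrained, held))
    return performances  # in practice, keep the best-performing fold's model

accuracy = lambda m, held: sum(m == y for _, y in held) / len(held)
labeled = [(x, 1 if x % 2 else -1) for x in range(20)]
print(transfer_cross_validation(labeled, list(range(20, 30)), 5, accuracy))
```

Swapping the two stand-in functions for the secure training and prediction protocols gives the full TrCV procedure; the loop itself never touches the other party's raw data.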

Transfer Cross Validation

For model validation, we also propose a secure transfer cross-validation approach (TrCV), inspired by [Zhong et al.2010]; see Algorithm 3. First, we split the labeled data in the source domain into $K$ folds and each time reserve one fold as our test set. We use the remaining data to build a model by Algorithm 1 and conduct label predictions by Algorithm 2. Next, we combine the predicted labels with the original dataset, retrain the model by Algorithm 1, and evaluate it on the reserved dataset. Decrypting the predictions and comparing them with the true labels, we obtain the performance of the $k$th fold. Finally, the model with the best fold performance is selected as the optimal model.
Notice that TrCV performs validation using source-domain labels, which can be advantageous in situations where target labels are difficult to obtain. A self-learning supervised model is also built to provide a safeguard against negative transfer [Kuzborskij and Orabona2013, Zhong et al.2010]. In the scenario where the labels are in the source-domain party, the self-learning reduces to a feature-based federated learning problem. Otherwise, the target-domain party builds the self-learning model itself. In cases where the transfer learning model is inferior to the self-learning model, knowledge need not be transferred.

Security Analysis

Theorem 1.

The protocol in Algorithm 1 and 2 is secure under our security definition, provided that the underlying additively homomorphic encryption scheme is secure.


Proof. The training and prediction protocols in Algorithms 1 and 2 do not reveal any information, because all that A and B learn are the masked gradients. In each iteration, A and B create new random masks; the randomness and secrecy of these masks guarantee the security of the information against the other party [Du, Han, and Chen2004]. During training, party A learns its own gradients at each step, but this is not enough for A to learn any information from B, based on the inability of solving $n$ equations in more than $n$ unknowns [Du, Han, and Chen2004, Vaidya and Clifton2002]. In other words, there exist infinitely many inputs from B that would produce the same gradients for A. Similarly, party B cannot learn any information from A. Therefore, as long as the encryption scheme is secure, the protocol is secure. During evaluation, party A learns the predicted result for each sample from B, which is a scalar product, from which A cannot learn B's information; B learns only the label, from which B cannot learn A's information. ∎

At the end of the training process, each party (A or B) remains oblivious to the data structure of the other party, and each obtains the model parameters associated only with its own features. At inference time, the two parties need to collaboratively compute the prediction results. Note that the protocol does not deal with a malicious party. If party A fakes its inputs and submits only one non-zero input, it may tell the value of $u_i^B$ at that input's position. It still cannot tell $x_i^B$ or $\Theta^B$, and neither party will obtain correct results.

In summary, the proposed FTL framework provides both data security and a performance gain. Data security is provided because the raw data $\mathcal{D}_A$ and $\mathcal{D}_B$, as well as the local models $\mathrm{Net}^A$ and $\mathrm{Net}^B$, are never exposed, and only the encrypted common hidden representations are exchanged. In each iteration, the only non-encrypted values party A and party B receive are the gradients of their own model parameters, which are aggregated over all samples. The performance gain is provided by the combination of transfer learning, transfer cross-validation, and the self-learning safeguard.

Figure 1: Comparison of learning loss (left) and weighted F1 score (right) when using the logistic loss versus its Taylor approximation in federated transfer learning, with two-layer and one-layer networks, respectively.

Table 1: Comparison of weighted F1 scores of transfer learning with Taylor loss (TLT) and with logistic loss (TLL), versus self-learning with logistic regression (LR), support vector machines (SVMs), and stacked auto-encoders (SAEs), on three one-vs-all tasks (water vs. other, person vs. other, sky vs. other) at two training-sample sizes each.


Experiments

In this section, we conduct experiments on multiple public datasets: 1) the NUS-WIDE dataset [seng Chua et al.2009] and 2) Kaggle's Default-of-Credit-Card-Clients dataset [Kaggle] ("Default-Credit"), to validate our proposed approach and study its effectiveness and scalability with respect to various key impacting factors, including the number of overlapping samples, the dimension of the hidden common representations, and the number of features. The NUS-WIDE dataset consists of hundreds of low-level features from Flickr images as well as their associated tags and ground-truth labels. There are in total 81 ground-truth labels. We use the top 1,000 tags as text features and combine all the low-level features, including color histograms and color correlograms, as image features. Here we consider a data federation between party A and party B, where A has the text tag features and image labels and B has the low-level image features, and a one-vs-all classification problem. For each experiment we randomly sample from the negative set to maintain a balanced ratio between positive and negative samples. We consider networks of stacked auto-encoders, where each layer computes

$$u^{(l+1)} = \sigma\!\left(W^{(l)} u^{(l)} + b^{(l)}\right)$$

where $u^{(l)}$ denotes the $l$th layer of the stacked auto-encoder and $\sigma$ denotes the sigmoid activation function. In our experiments, we train the stacked auto-encoders for each party separately and minimize the encoder loss together with the supervised federated transfer loss as in Algorithm 1. The Default-Credit dataset consists of credit card records, including users' demographic features, history of payments, bill statements, etc., with the user's default payment as the label. After applying one-hot encoding to the categorical features, we obtain a dataset with 33 features and 30,000 samples. We then split the dataset in both the feature space and the sample space to simulate a two-party federation problem. We assign all the labels to party A. We also assign each sample to party A, party B or both, so that there is a small number of overlapping samples between A and B. We used one-layer SAEs with 32 neurons in this case. We separate the features so that the demographic features are on one side, separated from the payment and balance features. Such a segregation can be found in industrial scenarios where businesses such as retail and car rental leverage banking data for users' credibility prediction and customer segmentation. Many such businesses (party B) have only users' demographic data and possibly a limited set of data about users' financing behavior, whereas banks usually have reliable labels. However, such collaborations with banking data (party A) are currently rare due to data privacy constraints; federated transfer learning provides a possibility of bridging data from different industries. In our experiments, party A has six months of payment and bill balance data, whereas party B has users' profile data such as education, marriage, age, and sex. We adopt the translator function as in [Shu et al.2015], the logistic loss function, and an alignment loss.
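The linear translator function adopted above (the label-weighted-average form the text attributes to [Shu et al.2015]) and a squared-difference alignment loss can be sketched in a few lines of NumPy. The dimensions, random data, and variable names here are our own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N_A, N_B, d = 100, 80, 32           # sample counts and hidden dimension

u_A = rng.normal(size=(N_A, d))     # party A's hidden representations
y_A = rng.choice([-1.0, 1.0], N_A)  # party A's labels in {-1, +1}
u_B = rng.normal(size=(N_B, d))     # party B's hidden representations

# Translator: Phi_A is the label-weighted average of A's representations;
# the prediction for a B sample is its inner product with Phi_A.
Phi_A = (y_A[:, None] * u_A).mean(axis=0)   # shape (d,)
scores = u_B @ Phi_A                         # shape (N_B,)
labels_B = np.sign(scores)                   # predicted labels for B

# Alignment loss on the overlapping pairs (illustratively, the first 50
# rows of each), using the squared-difference variant from the text.
overlap = 50
align_loss = np.sum((u_A[:overlap] - u_B[:overlap]) ** 2)
print(labels_B.shape, float(align_loss) > 0)
```

In the secure protocol, `Phi_A` never leaves party A in the clear; only its encrypted interaction with `u_B` is exchanged, as in Algorithm 2.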

Figure 2: Performance of FTL and LR with respect to the number of overlapping samples.
Figure 3: Comparison of performance using TrCV, CV and LR for various numbers of folds k.
Figure 4: Running time per iteration as a function of the dimension of the commonly shared hidden representation (left); the number of features in B (middle); and the number of commonly shared samples (right).

Impact of Taylor approximation

We studied the effect of the Taylor approximation by monitoring and comparing the training-loss decay and the prediction performance. Here we test the convergence and precision of the algorithm using the NUS-WIDE data and neural networks of different depths. In the first case, the two networks each have one auto-encoder layer with 64 neurons. In the second case, they each have two auto-encoder layers with 128 and 64 neurons. In both cases, we used 500 training samples and 1,396 overlapping pairs. We summarize the results in Figure 1. We find that the loss decays at a similar rate when using the Taylor approximation as when using the full logistic loss, and the weighted F1 score of the Taylor approximation is also comparable to that of the full logistic approach. The loss converges to a different minimum in both cases, similar to [Hardy et al.2017]. As we increase the depth of the neural networks, the convergence and the performance of the model do not decay. Most existing secure deep learning frameworks suffer accuracy loss when adopting privacy-preserving techniques. For example, SecureML [Mohassel and Zhang2017] reports more than 1% accuracy loss when training with two hidden layers of 128 neurons each on the MNIST dataset using MPC techniques. In CryptoDL [Hesamifard, Takabi, and Ghasemi2017], an inference-only secure deep learning protocol, it is shown that a degree-2 Taylor approximation can result in over 50% accuracy loss in a two-layer convolutional neural network. Using only a low-degree Taylor approximation, the drop in accuracy in our approach is much smaller than in state-of-the-art secure neural networks with similar approximations; our approach therefore adapts well to deeper neural networks.

Transfer learning vs self-learning

In this section, we evaluate the performance of the proposed transfer learning approach by comparing it to self-learning approaches. We tested the transfer learning approach with both the Taylor loss (TLT) and the logistic loss (TLL). For the self-learning approach, we picked three algorithms: logistic regression (LR), SVMs, and stacked auto-encoders (SAEs). The SAEs have the same structure as the ones we used for transfer learning and are connected to a logistic layer for classification. We picked three of the most frequently occurring labels in the NUS-WIDE dataset. For each experiment, the number of co-occurring samples used is half of the total number of samples in that category. We varied the size of the training sample set and conducted three tests for each experiment with different random partitions of the samples. For each experiment, the parameters $\gamma$ and $\lambda$ are optimized by cross validation. The results are summarized in Table 1, which shows that TLT and TLL yield comparable performance across all tests. The proposed transfer learning approach outperforms the baseline self-learning approaches using only a small set of training samples in almost all the experiments conducted. In addition, the performance improves as the number of training samples increases. These results validate the robustness of the algorithm.

Effect of overlapping samples

Figure 2 shows the effect of varying the number of overlapping samples on the performance of transfer learning. The overlapping sample pairs are used to bridge the hidden representations of the two parties, so the performance of federated transfer learning improves as the availability of overlapping pairs increases.

Transfer Cross Validation

We evaluate the performance of the transfer cross-validation (TrCV) technique by comparing it to a plain cross-validation (CV) approach using the Default-Credit dataset. The experiments are conducted using 200 training samples and 6,000 overlapping samples. The results are shown in Figure 3, where the TrCV approach outperforms the CV approach at various values of $k$ (folds).


Scalability

To evaluate the scalability of the proposed algorithm, we conduct experiments simulating three-party computation on a single Intel i5 machine with 8 GB of memory. In these experiments, the parties communicate via an XML-RPC protocol. We use the Paillier additively homomorphic encryption scheme [Paillier1999] implemented in Python. The key size we adopted for encryption is 1024 bits. We study how the running time scales with the number of overlapping samples and the number of target-domain features, as well as with the dimension of the domain-invariant hidden representations, denoted as $d$. From equations (8) and (9), the communication cost for B sending encrypted information to A is proportional to the number of ciphertexts sent, where each ciphertext has a fixed size and $n$ denotes the number of samples sent. The same level of communication cost applies when sending encrypted information from A to B. With the parameters above, the per-sample communication is about 1 MB. We show how the running time per iteration scales with the various key factors in Figure 4. As expected from the above analysis, as we increase the dimension of the hidden representation $d$, the growth of the running time accelerates across the different numbers of overlapping samples tested. On the other hand, the running time grows linearly with respect to the number of target-domain features, as well as with the number of shared samples. The communication time is included in the overall runtime reported. With 10 hidden dimensions, the communication time is roughly 40 percent of the total runtime; the computation of encrypted gradients accounts for about 50 percent, and the rest is consumed by encryption and decryption operations. For the simulations we use a Python implementation of the XML-RPC protocol with HTTP connections and a localhost server proxy. Efficiency and scalability remain a challenge; although we have not used distributed and asynchronous computing techniques or high-powered GPUs for efficiency improvements, the proposed algorithm is parallelizable and friendly to high-performance machine learning platforms such as TensorFlow.
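A back-of-envelope check of the communication cost can be made under stated assumptions: a Paillier ciphertext occupies roughly twice the key size (ciphertexts live modulo $n^2$), and if the per-sample payload is on the order of $d^2$ ciphertexts, then a 1024-bit key with $d = 64$ hidden units lands near the ~1 MB-per-sample figure quoted above. The $d^2$ payload shape and the constants here are our assumptions for illustration, not the paper's exact formula.

```python
KEY_BITS = 1024
CIPHERTEXT_BYTES = 2 * KEY_BITS // 8   # Paillier ciphertext is mod n^2

def per_sample_bytes(d):
    """Assumed payload: on the order of d**2 ciphertexts per sample."""
    return d * d * CIPHERTEXT_BYTES

for d in (10, 32, 64):
    mb = per_sample_bytes(d) / 2**20
    print(f"d={d:3d}: {per_sample_bytes(d):>9,} bytes (~{mb:.2f} MiB)")
```

A quadratic-in-$d$ payload is also consistent with the observation above that the running time accelerates as $d$ grows while scaling only linearly in the number of samples and features.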


Conclusions

In this paper we proposed the Federated Transfer Learning (FTL) framework and expanded the scope of existing secure federated learning to broader real-world applications. We demonstrated that, in contrast to existing secure deep learning approaches, which usually suffer from accuracy loss, the proposed secure FTL is as accurate as the non-privacy-preserving approach and has superior performance over the non-federated self-learning approach. We also introduced a scalable and flexible approach for adapting additively homomorphic encryption to neural networks with minimal modifications to the existing network structure. The proposed framework is a complete privacy-preserving solution that includes training, evaluation, and cross validation. The framework is not limited to any specific learning model, but is rather a general framework for privacy-preserving transfer learning. That said, the current solution does have limitations. For example, it requires the parties to exchange encrypted intermediate results from only the common representation layers, and is therefore not applicable to all transfer mechanisms. Future work on FTL may include exploring and adapting the methodology to other deep learning systems where privacy-preserving data collaboration is needed, continuing to improve the efficiency of the algorithms using distributed computing techniques, and finding less expensive encryption schemes.


Acknowledgments

We are especially grateful to Professor Chaoping Xing from Nanyang Technological University and Professor Cunsheng Ding from the Hong Kong University of Science and Technology for their suggestions on designing secure protocols.


References

  • [Acar et al.2018] Acar, A.; Aksu, H.; Uluagac, A. S.; and Conti, M. 2018. A survey on homomorphic encryption schemes: Theory and implementation. ACM Comput. Surv. 51(4):79:1–79:35.
  • [Aono et al.2016] Aono, Y.; Hayashi, T.; Trieu Phong, L.; and Wang, L. 2016. Scalable and secure logistic regression via homomorphic encryption. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, CODASPY ’16, 142–144. New York, NY, USA: ACM.
  • [Bonawitz et al.2017] Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H. B.; Patel, S.; Ramage, D.; Segal, A.; and Seth, K. 2017. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS ’17, 1175–1191. New York, NY, USA: ACM.
  • [Deng et al.2009] Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09.
  • [Dowlin et al.2016] Dowlin, N.; Gilad-Bachrach, R.; Laine, K.; Lauter, K.; Naehrig, M.; and Wernsing, J. 2016. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. Technical report.
  • [Du, Han, and Chen2004] Du, W.; Han, Y. S.; and Chen, S. 2004. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In SDM.
  • [Dwork2008] Dwork, C. 2008. Differential privacy: A survey of results. In Proceedings of the 5th International Conference on Theory and Applications of Models of Computation, TAMC’08, 1–19. Berlin, Heidelberg: Springer-Verlag.
  • [EU2016] EU. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation). Available at: https://eur-lex.europa.eu/legal-content/EN/TXT.
  • [Gascón et al.2016] Gascón, A.; Schoppmann, P.; Balle, B.; Raykova, M.; Doerner, J.; Zahur, S.; and Evans, D. 2016. Secure linear regression on vertically partitioned datasets. IACR Cryptology ePrint Archive 2016:892.
  • [Hardy et al.2017] Hardy, S.; Henecka, W.; Ivey-Law, H.; Nock, R.; Patrini, G.; Smith, G.; and Thorne, B. 2017. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR abs/1711.10677.
  • [Hesamifard, Takabi, and Ghasemi2017] Hesamifard, E.; Takabi, H.; and Ghasemi, M. 2017. Cryptodl: Deep neural networks over encrypted data. CoRR abs/1711.05189.
  • [Hitaj, Ateniese, and Pérez-Cruz2017] Hitaj, B.; Ateniese, G.; and Pérez-Cruz, F. 2017. Deep models under the GAN: information leakage from collaborative deep learning. CoRR abs/1702.07464.
  • [Kaggle] Kaggle. Default of credit card clients dataset. Available online.
  • [Karr et al.2004] Karr, A. F.; Lin, X. S.; Sanil, A. P.; and Reiter, J. P. 2004. Privacy-preserving analysis of vertically partitioned data using secure matrix products.
  • [Kim et al.2018] Kim, M.; Song, Y.; Wang, S.; Xia, Y.; and Jiang, X. 2018. Secure logistic regression based on homomorphic encryption: Design and evaluation. JMIR Med Inform 6(2):e19.
  • [Kuzborskij and Orabona2013] Kuzborskij, I., and Orabona, F. 2013. Stability and hypothesis transfer learning. In Dasgupta, S., and McAllester, D., eds., Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, 942–950. Atlanta, Georgia, USA: PMLR.
  • [Li et al.2017] Li, Z.; Zhang, Y.; Wei, Y.; Wu, Y.; and Yang, Q. 2017. End-to-end adversarial memory network for cross-domain sentiment classification. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, 2237–2243.
  • [McMahan et al.2016] McMahan, H. B.; Moore, E.; Ramage, D.; and y Arcas, B. A. 2016. Federated learning of deep networks using model averaging. CoRR abs/1602.05629.
  • [McSherry2016] McSherry, F. 2016. Deep learning and differential privacy. Available online.
  • [Mohassel and Rindal2018] Mohassel, P., and Rindal, P. 2018. ABY3: A mixed protocol framework for machine learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, 35–52. New York, NY, USA: ACM.
  • [Mohassel and Zhang2017] Mohassel, P., and Zhang, Y. 2017. SecureML: A system for scalable privacy-preserving machine learning. IACR Cryptology ePrint Archive 2017:396.
  • [Nock et al.2018] Nock, R.; Hardy, S.; Henecka, W.; Ivey-Law, H.; Patrini, G.; Smith, G.; and Thorne, B. 2018. Entity resolution and federated learning get a federated resolution. CoRR abs/1803.04035.
  • [Oquab et al.2014] Oquab, M.; Bottou, L.; Laptev, I.; and Sivic, J. 2014. Learning and transferring mid-level image representations using convolutional neural networks. In CVPR.
  • [Paillier1999] Paillier, P. 1999. Public-key cryptosystems based on composite degree residuosity classes. In Advances in Cryptology, EUROCRYPT ’99, 223–238. Springer-Verlag.
  • [Pan et al.2010] Pan, S. J.; Ni, X.; Sun, J.-T.; Yang, Q.; and Chen, Z. 2010. Cross-domain sentiment classification via spectral feature alignment. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10, 751–760. New York, NY, USA: ACM.
  • [Phong et al.2017] Phong, L. T.; Aono, Y.; Hayashi, T.; Wang, L.; and Moriai, S. 2017. Privacy-preserving deep learning via additively homomorphic encryption. Cryptology ePrint Archive, Report 2017/715.
  • [Phong et al.2018] Phong, L. T.; Aono, Y.; Hayashi, T.; Wang, L.; and Moriai, S. 2018. Privacy-preserving deep learning via additively homomorphic encryption. Trans. Info. For. Sec. 13(5):1333–1345.
  • [Rivest, Adleman, and Dertouzos1978] Rivest, R. L.; Adleman, L.; and Dertouzos, M. L. 1978. On data banks and privacy homomorphisms. Foundations of Secure Computation, Academia Press 169–179.
  • [Rivest, Shamir, and Tauman1979] Rivest, R. L.; Shamir, A.; and Tauman, Y. 1979. How to share a secret. Communications of the ACM 22(11):612–613.
  • [Rouhani, Riazi, and Koushanfar2017] Rouhani, B. D.; Riazi, M. S.; and Koushanfar, F. 2017. DeepSecure: Scalable provably-secure deep learning. CoRR abs/1705.08963.
  • [Sanil et al.2004] Sanil, A. P.; Karr, A. F.; Lin, X.; and Reiter, J. P. 2004. Privacy preserving regression modelling via distributed computation. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’04, 677–682. New York, NY, USA: ACM.
  • [Chua et al.2009] Chua, T.-S.; Tang, J.; Hong, R.; Li, H.; Luo, Z.; and Zheng, Y. 2009. NUS-WIDE: A real-world web image database from National University of Singapore. In CIVR 2009.
  • [Shokri and Shmatikov2015] Shokri, R., and Shmatikov, V. 2015. Privacy-preserving deep learning. In Proceedings of the 22Nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, 1310–1321. New York, NY, USA: ACM.
  • [Shu et al.2015] Shu, X.; Qi, G.-J.; Tang, J.; and Wang, J. 2015. Weakly-shared deep transfer networks for heterogeneous-domain knowledge propagation. In Proceedings of the 23rd ACM International Conference on Multimedia, MM ’15, 35–44. New York, NY, USA: ACM.
  • [Silver et al.2016] Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of go with deep neural networks and tree search. Nature 529:484–503.
  • [Vaidya and Clifton2002] Vaidya, J., and Clifton, C. 2002. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’02, 639–644. New York, NY, USA: ACM.
  • [Wan et al.2007] Wan, L.; Ng, W. K.; Han, S.; and Lee, V. C. S. 2007. Privacy-preservation for gradient descent methods. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, 775–783. New York, NY, USA: ACM.
  • [Zhong et al.2010] Zhong, E.; Fan, W.; Yang, Q.; Verscheure, O.; and Ren, J. 2010. Cross validation framework to choose amongst models and datasets for transfer learning. In Balcázar, J. L.; Bonchi, F.; Gionis, A.; and Sebag, M., eds., Machine Learning and Knowledge Discovery in Databases, 547–562. Berlin, Heidelberg: Springer Berlin Heidelberg.
  • [Zhu et al.2010] Zhu, Y.; Chen, Y.; Lu, Z.; Pan, S. J.; Xue, G.-R.; Yu, Y.; and Yang, Q. 2010. Heterogeneous transfer learning for image classification. In Special Track on AI and the Web, associated with The Twenty-Fourth AAAI Conference on Artificial Intelligence.