Standard machine learning pipelines often rely on large amounts of sensitive data to achieve a high level of performance. However, preparing a central repository of data is laborious, making secure collaborative training expensive. Outsourcing the data to a central server that performs model training for the users is a potential solution, but is often not feasible in privacy-sensitive settings. Secure aggregation of data using secure multiparty computation (MPC) frameworks yao1982protocols; mood2016frigate; perry2014systematizing; di2014practical; gupta2016using; lindell2020secure; goldreich1998secure; goldreich2019play has been explored in recent works, but it significantly impacts efficiency due to the added computational overhead phong. Moreover, centralized aggregation creates a single point of failure that can compromise the security and privacy of the training data if the server is malicious or is subject to adversarial attacks by colluding participants chen2021robust; kairouz2021advances.
A recently proposed alternative for privacy-preserving training, without data outsourcing, is Federated Learning (FL) mcmahan2016communication. FL has emerged as a promising approach to collaboratively train a model by exchanging model parameters with a central aggregator (or server), instead of the actual training data. However, parameter exchange may still leak a significant amount of private data zhu. Several approaches have been proposed to overcome this leakage problem, based on differential privacy (DP) shokri2015privacy; papernot2018scalable, MPC bonawitz2017practical; ryffel2018generic, homomorphic encryption (HE) truex2019hybrid, etc. While DP-based learning aims to mitigate inference attacks, it significantly degrades model utility, as training accurate models requires high privacy budgets jayaraman2019evaluating. Cryptographic techniques provide improved privacy protection but remain too slow for practical use due to their extensive cryptographic operations. There is thus a need for a secure, decentralized FL framework that protects user privacy while allowing seamless training of ML models. This requires strong cryptographic protection of the intermediate model updates during model aggregation, as well as of the final model weights.
In this work, we propose Scotch, a practical framework that enables secure $m$-party aggregation in a distributed $n$-server setting. It provides end-to-end protection of the parties’ training data, intermediate model weights, and the final resulting model by combining secure multiparty computation (MPC) primitives based on secure outsourced computation and secret sharing to enable decentralized FL. Our contributions are described in detail in the following section.
In this paper, we introduce a one-of-its-kind framework for privacy-preserving federated learning with primitives from conventional machine learning and multiparty computation (MPC). Specifically,
We propose Scotch, a simple, fast, and efficient federated learning framework that allows for decentralized gradient aggregation using secure outsourced computation and secret sharing while ensuring strict privacy guarantees of the training data mohassel2017secureml; wagh2019securenn.
We evaluate the efficiency of our proposed secret-sharing-based FL protocol against existing state-of-the-art frameworks. To the best of our knowledge, Scotch is the decentralized privacy-preserving FL approach with the lowest cryptographic computational overhead: only $\mathcal{O}(m \cdot n)$ crypto-related operations are required in each training round, where $m$ is the number of participants and $n$ is the number of aggregators (see Table 1).
We implement Scotch and perform extensive experiments on multiple standard datasets such as MNIST, EMNIST, and FMNIST, with promising results: Scotch achieves efficiency improvements in both training time and communication cost while providing model performance and privacy guarantees similar to other approaches.
For ease of access, all of our code and experiments are available at: https://github.com/arupmondal-cs/SCOTCH.
FL mcmahan2016communication is a distributed ML approach that enables model training on a large corpus of decentralized data with myriad participants. It is an example of the more general approach of “bring code to data, not data to code”. In FL, each party trains a model locally and exchanges only model parameters with an FL server or aggregator, instead of the private training data.
The participants in the training process are the parties and the FL server, which is a cloud-based distributed service. Devices announce to the server that they are ready to run an FL task for a given FL population. An FL population is specified by a globally unique name that identifies the learning problem or application being worked on. An FL task is a specific computation for an FL population, such as training with given hyperparameters, or evaluation of trained models on local device data. After finishing the computation on its local dataset, each device sends its model updates (e.g., the weights of a neural network) to the FL server. The server incorporates these updates into the global model.
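The FL flow described above can be sketched, without any of the privacy machinery introduced later, as follows. This is a minimal illustration; the function names are ours and not part of any FL library:

```python
import numpy as np

def local_update(weights, grads, lr=0.01):
    """One plain SGD step on a client's local data (stand-in for local training)."""
    return [w - lr * g for w, g in zip(weights, grads)]

def federated_average(client_weights):
    """Server-side aggregation: element-wise mean of the clients' parameter lists."""
    return [np.mean(np.stack(layers), axis=0) for layers in zip(*client_weights)]
```

Each client would run local_update on its own data and send only the resulting weights; the server never sees the training examples themselves.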
Secure Multiparty Computation.
Secure multiparty computation (MPC) yao1982protocols; mood2016frigate; perry2014systematizing; di2014practical; gupta2016using; lindell2020secure; goldreich1998secure; goldreich2019play is a general-purpose cryptographic functionality that allows any function to be computed obliviously by a group of mutually distrustful parties. A number of different techniques exist for MPC (e.g., garbled circuits, functional encryption, and homomorphic encryption). In this work, we consider MPC based on secret sharing shamir1979share.
In cryptography, secret sharing shamir1979share; blakley1979safeguarding refers to the process of splitting a secret $s$ among $n$ parties such that no party learns anything about the whole secret from the share it holds. The secret can be reconstructed only if a certain minimum number of parties, greater than or equal to a threshold $t$, combine their shares. The scheme is known as a threshold scheme or $t$-out-of-$n$ secret sharing. In this work, we use additive secret sharing, which uses addition as the way to combine shares. We use the notation $[s]_i$ to denote the $i$-th share of a secret $s$.
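As a concrete illustration, an $n$-out-of-$n$ additive scheme over the ring $\mathbb{Z}_{2^{32}}$ can be sketched as follows. The helper names are ours, chosen to mirror the protocol description later in the paper:

```python
import secrets

RING = 2**32  # all arithmetic is modulo 2^32

def split_secret_shares(x, n):
    """Split x into n shares that are individually uniform but sum to x mod RING."""
    shares = [secrets.randbelow(RING) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % RING)  # last share fixes the sum
    return shares

def reconstruct(shares):
    """Only the combination of all n shares reveals the secret."""
    return sum(shares) % RING
```

Any $n - 1$ of the shares are jointly uniform, so no strict subset of share-holders learns anything about $x$.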
In this section, we describe the proposed framework Scotch, an efficient distributed secure-computation approach for secure outsourced aggregation based on MPC primitives. Algorithm 1 briefly describes one iteration of the distributed federated averaging protocol; the steps of this algorithm are illustrated in Figure 1.
We assume a passively secure threat model. A passive (honest-but-curious) adversary follows the protocol specification but may try to learn information about the private input data by inspecting the shared inputs. Both kinds of participants, the data owners (or clients) and the aggregators (or servers), are honest-but-curious. Scotch ensures that an aggregator, even when colluding with a subset of participants and other aggregators, cannot learn any information about the private inputs of the honest participants. Similarly, it ensures that any subset of colluding participants cannot learn any information about the private inputs or outputs of the honest participants by inspecting the messages exchanged with the aggregators or the final model. We also assume that any encryption broadcast to the network in Algorithm 1 is re-randomized, so that two consecutive broadcasts do not leak information about the parties’ confidential data; we omit this operation in Algorithm 1 for clarity. Finally, attacks that aim to cause denial of service or inject malicious model updates are beyond the scope of this short paper.
We assume a set of $n$ honest-but-curious aggregators $S = \{S_1, \dots, S_n\}$ and a set of $m$ clients $C = \{C_1, \dots, C_m\}$, where each client $C_i$ holds its own private dataset $D_i$. We defer further details about the threat model and the security of the framework to the ‘Threat Model’ and ‘Privacy Guarantees’ sections. The clients in $C$ agree upon a model architecture $M$ for local training prior to the runtime of the framework. The underlying concept in this framework is $n$-out-of-$n$-additive-secret-sharing-based MPC, which provides protocols for $n$ aggregators and is secure against a passive adversary that corrupts at most $m - 1$ clients.
At the beginning of every iteration, the function local_training is invoked by each client $C_i$ in $C$ with input $D_i$. This function allows clients to train local models on their private datasets using the pre-decided model architecture $M$. In the first iteration, initial weights are randomly sampled; for subsequent iterations, the aggregated weights from the previous iteration are used as initial weights. Each client samples a randomly permuted (without replacement) subset $D'_i$ of its dataset $D_i$ in each iteration; the functions permute_indices and choose_subset help with this. In each iteration, each client trains a model $W_i$ on $D'_i$ using $M$ and the current initial weights. The clients then split the model weights into $n$-out-of-$n$ additive secret shares by invoking split_secret_shares. These shares are then sent to the aggregators.
Having received a total of $m$ shares, one from each client in $C$, each server $S_j$ adds its local shares and divides the sum by the number of clients, $m$, to obtain the value $[W_{avg}]_j$ by invoking federated_sum. One can observe that $[W_{avg}]_j$ is an $n$-out-of-$n$ additive secret share of the federated average of the local models of the clients. Each server then sends $[W_{avg}]_j$ to the clients in $C$ so that they can obtain the final model.
Computing the Final Model.
Having obtained the $n$ additive secret shares of the federated average from the aggregators, each client locally adds up the shares to obtain the federated average $W_{avg}$ of the models by invoking the function compute_final_model. Clients set the global-weights variable to the federated average obtained in this iteration. If the current iteration is the final one, this value is returned as the final output; if not, it is used as the initial weights in local_training for the subsequent iteration.
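Putting these steps together, one aggregation iteration can be simulated in the clear on single-integer “weights”. This is an illustrative sketch only: for simplicity the division by $m$ is deferred until after reconstruction, whereas the protocol performs it share-wise on each server, which is where the fixed-point truncation machinery comes in:

```python
import secrets

RING = 2**32

def split_secret_shares(x, n):
    """n-out-of-n additive sharing over the ring Z_{2^32}."""
    shares = [secrets.randbelow(RING) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % RING)
    return shares

def aggregate_round(client_weights, n_servers):
    """Simulate one Scotch round: m clients, n servers, one integer weight each."""
    m = len(client_weights)
    # each client splits its weight into one share per server
    per_client = [split_secret_shares(w, n_servers) for w in client_weights]
    # server j sums the j-th share received from every client
    server_sums = [sum(shares[j] for shares in per_client) % RING
                   for j in range(n_servers)]
    # clients add the broadcast server values and average over the m clients
    return (sum(server_sums) % RING) // m
```

No server ever holds more than one share of any client's weight, yet the clients recover exactly the average of all local models.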
To enable seamless integration between machine learning primitives (which generally use floating-point arithmetic) and MPC primitives (which generally use integers), we use integer ring arithmetic in our implementation. To enable conversions between the float and integer realms, we use the functions float_to_int, int_to_float, and truncate, based on primitives provided in mohassel2017secureml. After training its local model, each client embeds its weights onto the integer ring by invoking float_to_int. (float_to_int converts a floating-point value into an $\ell$-bit integer by allocating $\ell - f - 1$ bits to the integer part, $f$ bits to the fractional part, and one bit to the sign of the value; $f$ represents the maximum precision of the value. Refer to wagh2019securenn for further details.) The rest of the operations are performed in the integer-ring realm. Wherever two values in the integer ring are multiplied, the product is truncated by invoking truncate. Finally, at the end of every iteration, the aggregated weights are converted back to float by invoking int_to_float in order to facilitate further local training on them.
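A possible sketch of this fixed-point encoding, assuming a 32-bit ring and $f = 13$ fractional bits (the concrete parameters in mohassel2017secureml; wagh2019securenn may differ), is shown below. Truncation is illustrated here on a cleartext product, whereas the protocols apply it share-wise:

```python
RING = 2**32   # assumed ring size (ell = 32 bits)
F = 13         # assumed fractional precision f

def float_to_int(x, f=F):
    """Embed a float in the ring as a two's-complement fixed-point value."""
    return int(round(x * (1 << f))) % RING

def int_to_float(v, f=F):
    """Map a ring element back to a float; the upper half encodes negatives."""
    if v >= RING // 2:
        v -= RING
    return v / (1 << f)

def truncate(prod, f=F):
    """Drop the extra f fractional bits after a fixed-point multiplication."""
    signed = prod - RING if prod >= RING // 2 else prod
    return (signed >> f) % RING
```

After multiplying two encoded values, the raw product carries $2f$ fractional bits, so truncate shifts off $f$ of them to restore the original encoding.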
Algorithm 1: Secure Outsourced Aggregation
Input: Each client $C_i$ for $i \in [m]$ holds its private dataset $D_i$.
Output: Each client $C_i$ for $i \in [m]$ obtains the final aggregated global model $W_{avg}$.
Each client $C_i$ for $i \in [m]$ trains a local model $W_i$ on a random subset of its private dataset $D_i$. Note that all the clients use the same model architecture $M$.
Each client $C_i$ for $i \in [m]$ creates $n$ additive secret shares of its model and sends each share $[W_i]_j$ for $j \in [n]$ to server $S_j$.
Each server $S_j$ for $j \in [n]$ adds up the received shares from all clients and divides the sum by $m$ to obtain $[W_{avg}]_j$, and sends $[W_{avg}]_j$ to all the clients.
Each client locally computes $W_{avg} = \sum_{j=1}^{n} [W_{avg}]_j$.
Algorithm 2: Scotch Framework
Input: Each client $C_i$ in $C$ possesses private dataset $D_i$ for $i \in [m]$; $T$ is the total number of global iterations for aggregation; len($D_i$) represents the number of data points in the dataset $D_i$; $n$ is the total number of aggregators.
Output: Clients obtain the final aggregated model stored in $W_{glob}$.
Procedure local_training ($D_i$, $W_{glob}$):
$W_{glob}$ ← random_init(); // randomly sample initial weights in the first iteration.
$D'_i$ ← choose_subset($D_i$, len($D_i$) / $T$);
$[W_i]_1, \dots, [W_i]_n$ ← split_secret_shares($W_i$, $n$); // split the model into $n$-out-of-$n$ additive secret shares.
Table 1 describes the complexity of the secure aggregation protocol (refer to Algorithm 1). Since Scotch is a secure aggregation framework, the complexity of functions local_training and compute_final_model can be considered offline. As a result, we only consider federated_sum() as the online phase of the protocol.
Table 1: Complexity of the secure aggregation protocol for data owners and aggregator servers.
Scotch achieves data privacy guarantees under a semi-honest adversary model with any subset of colluding aggregators. Scotch’s infrastructure uses a multi-input secret-sharing-based MPC protocol to calculate the federated average of the model gradients shared by the participating clients. Private training data is never sent: participating entities only receive split “shares” of the model gradients or of the generated averaged model, neither of which can be used to reconstruct sensitive information about the training datasets used. The security of these shares is guaranteed by standard MPC theorems goldreich1998secure, which hold regardless of the actual computation performed within the MPC setup (the setup can perform arbitrary computations and is agnostic in that sense).
We simulate Scotch using socket, a low-level networking interface accessible from Python. We rely on the TensorFlow library for the training and inference of machine learning models. All our experiments are performed on a local Linux machine with an Intel i7-9700K CPU @ 3.60 GHz, a GeForce RTX 2070 GPU, and 32 GB RAM. All clients and servers run as independent nodes and are connected via a virtual network.
Dataset and Model Configuration
The MNIST mnist dataset comprises 60,000 training images of handwritten digits, along with 10,000 testing images. The data is pre-processed by resizing each image and one-hot encoding the labels. Each client uses a three-layer multi-layer perceptron (MLP) to train on its local dataset. The architecture of the MLP is outlined in Figure 2.
The Extended MNIST (digits) emnist dataset contains 240,000 and 40,000 handwritten digit images for training and testing purposes, respectively. The data is pre-processed by resizing each image and one-hot encoding the labels. We use the same MLP architecture as for MNIST to train each local model.
Fashion-MNIST fmnist is a dataset of Zalando’s article images that contains 60,000 training images and 10,000 testing images. The data is pre-processed by resizing each image, and one-hot encoding the labels.
Scotch’s framework incorporates secure aggregation via secure outsourced computation. Each client takes part in federated learning by (a) locally training on its private data, and (b) sharing its gradients with the servers via secret sharing. Each server receives partial shares from the clients, which it aggregates and propagates back to all clients. This allows each client to recompute the global model gradients by combining the shares received from the servers.
We evaluate Scotch in terms of three indicators: (a) performance of the generated model with different numbers of clients and servers, (b) impact of varying the secret-sharing precision on model performance, and (c) communication complexity (see Table 1).
We evaluate Scotch’s performance on three standard datasets, MNIST, EMNIST, and FMNIST (see the Dataset and Model Configuration section), with varying numbers of clients in a 3-server setting. For each dataset, we use the three-layer MLP whose architecture is outlined in the Dataset and Model Configuration section. We use a standard 70-30 train-test split for each dataset, and the training data is divided equally amongst the clients. Each client locally trains on its individual dataset for 3-4 epochs with a learning rate of 0.01. The results are summarized in Table 3.
To test the effect of precision on training the global model, we compare the results of Scotch on the MNIST dataset with 16 and 32 bits of precision. The test accuracy comparison between the two is shown in Table 2. To support decimal arithmetic in an integer ring, we use the solution proposed by mohassel2017secureml. As we observe from our experiments, allotting 32 bits of precision yields a significant improvement in test accuracy over 16 bits. We therefore observe a direct correlation between the precision of the floating-point numbers involved in network training and the quality of the resulting model. To further understand the effects of precision, we trained a centralized FL server with a constraint: we round each weight update of the ML model to 32 bits of precision (restricting values to 5 decimal places). We observed a considerable decrease in model accuracy with decreasing precision, which underscores the importance of precision when training machine learning models. We summarize our observations in Table 4 (for further details, please refer to the Impact of Precision Length section).
We observe a decrease in accuracy with an increasing number of clients because of the compounding errors in the float_to_int() and int_to_float() conversions that result from limited precision. These errors can be offset by an increase in precision. We plan to scale our existing framework to larger numbers of clients and servers with the help of a reasonable increase in precision size in the near future.
Impact of Precision Length
Most protocols in secure multiparty computation operate in integer rings. However, machine learning algorithms require computations on decimal numbers. To bridge this gap, we use a mapping between fixed-point decimals and the integer ring (as used by state-of-the-art MPC frameworks such as SecureML mohassel2017secureml), in which the integer part of a decimal number is represented by the high-order bits of an $\ell$-bit ring element and the fractional part by the $f$ low-order bits. To evaluate the effect of precision on our training and testing accuracy, we replicate the precision settings used in SecureML mohassel2017secureml for logistic and linear regression. Comparing our tests with SecureML helps us understand the effects of precision length when training different machine learning models. SecureML’s experiments were restricted to 13–16 bits of precision, but they used much simpler models (e.g., logistic regression), a simpler dataset (1,000 to 1M samples of MNIST), and a simpler objective (binary classification). Through our experiments, we observe that multi-class classification via multi-layer perceptrons on a much smaller dataset (70% of the 60,000 MNIST images) performs better as we increase the precision (refer to Table 4). Precision makes a considerable difference to the gradient updates when performing gradient descent in neural networks, and this difference plays a role in our experiments as well.
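The role of precision length can be seen by round-tripping a small, gradient-sized value through the fixed-point encoding at different fractional bit-widths. This is a toy illustration, not the paper's experiment, and quantize is a hypothetical helper:

```python
def quantize(x, f):
    """Round-trip a float through f fractional bits of fixed-point precision."""
    scale = 1 << f
    return round(x * scale) / scale

# a small update of the kind produced late in training
g = 0.0001234
low  = abs(quantize(g, 13) - g)   # coarse encoding loses most of the value
high = abs(quantize(g, 26) - g)   # finer encoding preserves it far better
```

At 13 fractional bits the smallest representable step is $2^{-13} \approx 1.2 \times 10^{-4}$, on the same order as the update itself, so much of the gradient signal is rounded away; doubling the fractional bits shrinks the rounding error accordingly.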
The existing privacy-preserving machine learning (PPML) works focus exclusively on training (generalized) linear models. They rely on centralized solutions where the learning task is securely outsourced to a server, notably using homomorphic encryption (HE) techniques. As such, these works do not solve the problem of privacy-preserving distributed ML, where multiple parties collaboratively train an ML model on their data. To address the latter, several works propose multi-party computation (MPC) yao1982protocols; mood2016frigate; perry2014systematizing; di2014practical; gupta2016using; lindell2020secure; goldreich1998secure; goldreich2019play solutions where tasks such as clustering and regression are distributed among 2, 3, or 4 servers mohassel2017secureml; wagh2019securenn; blaze; ramachandran2021s++; wagh2021falcon; riazi2018chameleon; demmler2015aby; payman2018aby3. Such approaches, however, limit the number of parties among which the trust is split, often assume an honest majority among the computing servers, and require parties to communicate (i.e., secret-share) their data outside their premises. This might not be acceptable due to privacy and confidentiality requirements and strict data protection regulations.
A recently proposed alternative for privacy-preserving training – without data outsourcing – is federated learning (FL) mcmahan2016communication. FL has emerged as a promising approach to collaboratively train a model by exchanging model parameters with a central aggregator, instead of the actual training data. However, parameter exchange may still leak a significant amount of private data. Several approaches have been proposed to overcome this leakage problem based on differential privacy (DP) shokri2015privacy; papernot2018scalable, MPC bonawitz2017practical; ryffel2018generic, HE truex2019hybrid; sav2020poseidon, Trusted Execution Environment mondal2021flatee; mondal2021poster, etc. Furthermore, in those settings, the aggregator is a central player, which also potentially represents a single point of failure kairouz2021advances and due to the extensive use of cryptographic operations, these frameworks remain too slow for practical use. Finally, other works combine MPC with DP techniques to achieve better privacy guarantees truex2019hybrid; xu2019hybridalpha; pettai2015combining. While DP-based learning aims to mitigate inference attacks, it significantly degrades model utility, as training accurate models requires high privacy budgets jayaraman2019evaluating. Therefore, a practical distributed privacy-preserving federated learning approach requires strong cryptographic protection of the intermediate model updates during the model aggregation and the final model weights.
We propose Scotch, a decentralized $m$-party, $n$-server secure-computation framework for federated aggregation that utilizes MPC primitives. The protocol provides strict privacy guarantees against honest-but-curious aggregators and colluding data owners, and it offers the lowest communication overhead among existing state-of-the-art privacy-preserving federated learning frameworks on standard datasets. In the near future, we plan to extend the framework to provide security against malicious servers and clients, scale it to larger numbers of clients and servers, and deploy it via open-source channels for academic and industrial use cases.