Federated learning (FL) (fl) is an emerging machine learning technique that allows participating clients to collaboratively train a joint global machine learning model without sharing their local training data. FL reduces privacy risks for local training data, which may be highly sensitive, relating to personal finances, political views, health, etc. It is therefore widely used in industry, since it helps companies comply with regulations on how personal data is handled and processed, such as the EU’s General Data Protection Regulation (GDPR) (gdpr). The core idea of FL is that each client trains a local model rather than sharing its training data with a centralized training system deployed in an untrusted environment, e.g., a public cloud. In each iteration, the clients send their local training parameters to the central system, which trains a global model that benefits from the local training data of all clients. Typically, the central system aggregates the local training parameters from the clients and sends the aggregated parameters back to them. This process is repeated until training converges or the global model reaches a desired accuracy. A real-life example of FL is several hospitals collaborating to develop a shared machine learning model, based on their patient data, that detects a disease at an early stage. Each hospital trains on its data locally, shares the local model with the central training system, and receives the global model in each iteration.
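The iteration described above can be illustrated with a minimal sketch. It assumes a sample-weighted average as the aggregation rule and a toy two-parameter linear model; the function names (`local_train`, `aggregate`, `federated_round`) are hypothetical and are not taken from any FL framework.

```python
from typing import List, Sequence, Tuple

Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_train(global_w: Sequence[float], data: Dataset,
                lr: float = 0.05, epochs: int = 5) -> List[float]:
    """Fit y = w[0] + w[1] * x with plain SGD, starting from the global weights."""
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            err = (w[0] + w[1] * x) - y
            w[0] -= lr * err
            w[1] -= lr * err * x
    return w


def aggregate(updates: Sequence[Sequence[float]],
              num_samples: Sequence[int]) -> List[float]:
    """Average the client parameters, weighted by local dataset size (assumed rule)."""
    total = sum(num_samples)
    return [sum(u[i] * n for u, n in zip(updates, num_samples)) / total
            for i in range(len(updates[0]))]


def federated_round(global_w: Sequence[float],
                    client_data: Sequence[Dataset]) -> List[float]:
    """One iteration: clients train locally, the central system aggregates."""
    updates = [local_train(global_w, d) for d in client_data]  # raw data never leaves a client
    return aggregate(updates, [len(d) for d in client_data])
```

Only the parameter vectors cross the client boundary in this sketch; calling `federated_round` repeatedly corresponds to the repeated training process in the text.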
While promising at first glance, the FL paradigm suffers from several vulnerabilities. First, an attacker with privileged/root access can easily obtain the training models (➊). The attacker can also compromise the privacy of individuals in the training data by inferring it from the parameters of the global model (deeplearning-DP). Therefore, the training models need to be protected at rest, in transit, and in use. Second, a large number of malicious clients may collude to reveal the local data and local models of the remaining clients (mugunthan2020privacyfl) (➋). Last but not least, these malicious clients can tamper with their local training data or with the parameter updates forwarded to the central training system in order to corrupt the global model (xie2020fall; fang2020local) (➌).
To handle issues ➊ and ➋, state-of-the-art solutions rely on a privacy-preserving mechanism such as differential privacy or secure multiparty computation (MPC). The disadvantage of differential privacy is that it reduces the utility or accuracy of the global model. Meanwhile, solutions based on secure multiparty computation incur significant overhead (securetf; tensorscone). To cope with issue ➌, several Byzantine-robust federated learning mechanisms have been proposed (blanchard2017machine; xie2020fall; fang2020local; bhagoji2019analyzing). The core idea behind these mechanisms is to reduce the impact of statistical outliers during model updates in the federated learning system. However, recent works (xie2020fall; fang2020local; bhagoji2019analyzing) show that this mitigation is still not enough to protect the utility of the global model. Malicious clients can still degrade the accuracy of a global model trained with a Byzantine-robust mechanism by carefully tampering with the model parameters they send to the central training system (cao2021provably).
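As an illustration of reducing the impact of statistical outliers, the sketch below compares a plain mean with a coordinate-wise median. The median is used here only as one well-known robust aggregation rule; it is an assumption made for illustration, and the cited mechanisms may use different rules.

```python
from statistics import mean, median
from typing import List, Sequence


def aggregate_mean(updates: Sequence[Sequence[float]]) -> List[float]:
    """Plain averaging: a single extreme update can shift every coordinate."""
    return [mean(coord) for coord in zip(*updates)]


def aggregate_median(updates: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise median: extreme values in a minority of updates are ignored."""
    return [median(coord) for coord in zip(*updates)]


# Three honest updates close to [1.0, 1.0] and one tampered update (made-up numbers).
honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]
tampered = [[100.0, -100.0]]
updates = honest + tampered

mean_result = aggregate_mean(updates)      # pulled far away from the honest values
median_result = aggregate_median(updates)  # stays close to the honest values
```

The median only bounds the effect of values that look like outliers. A tampered update that stays inside the range of honest values still moves the result, which is consistent with the limitation described above.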
In this work, we overcome these limitations by building a confidential federated learning system called SecFL using trusted execution environments (TEEs), e.g., Intel SGX. TEE technologies, such as Intel SGX, have gained much attention in industry (AzureSGX; IBMCloudSGX; singh2021enclaves) as well as in academia (scone; costan2016intel; tsai2017graphene; sgx-pyspark; ozga2021perun; perun2; securetf; singh2020enclaves; teemon; TSR; avocado; weles). To ensure the confidentiality and integrity of applications, TEEs execute their code and data inside an encrypted memory region called an enclave. Adversaries with privileged access cannot read or interfere with this memory region, and only the processor can decrypt and execute the application inside an enclave. In addition, TEEs such as Intel SGX provide a mechanism for users to verify that the TEE is genuine and that an adversary has not altered their application running inside the enclave. This verification process is called Remote Attestation (costan2016intel) and allows users to establish trust in their application running inside an enclave on a remote host.
We leverage TEEs to handle issue ➊ by providing end-to-end encryption in SecFL. SecFL encrypts the input training data and code (e.g., Python code) and performs all training computations, both local and global, inside TEE enclaves. SecFL sends all model updates over TLS connections between the clients' enclaves and the enclaves of the central training computation. Thus, attackers with privileged access cannot violate the integrity and confidentiality of the input training data, code, and models. SecFL also ensures the freshness of the input training data, code, and models by applying an advanced asynchronous monotonic counter service (adam-cs). We tackle issues ➋ and ➌ by developing in SecFL a Security Policy Manager component based on the remote attestation mechanism supported by TEEs (palaemon; intel-remote-attestation). The component ensures the integrity of the input data and training code, i.e., it makes sure that training computations run with the correct code and correct input data and have not been modified by anyone, e.g., an attacker or a malicious client. This component also monitors and attests to the compliance of participating clients with the pre-defined agreement before they collaborate to train the global machine learning model. In addition, SecFL can clone the global training computation and randomly sample clients for it. This helps to detect outliers in utility, which in turn helps to address issue ➌. Our preliminary evaluation shows that SecFL can ensure the confidentiality and integrity of federated learning computations while maintaining the same utility/accuracy of the training computations.
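The freshness guarantee can be illustrated with a simplified rollback check. This is a sketch under stated assumptions, not the protocol of the cited counter service: `MonotonicCounter` is a hypothetical in-memory stand-in for a trusted counter, and `seal`/`unseal` are hypothetical helpers that bind each stored state to a counter value with an HMAC.

```python
import hashlib
import hmac
import json


class MonotonicCounter:
    """Hypothetical stand-in for a trusted counter service; it only moves forward."""

    def __init__(self) -> None:
        self._value = 0

    def increment(self) -> int:
        self._value += 1
        return self._value

    def read(self) -> int:
        return self._value


def seal(key: bytes, counter: MonotonicCounter, payload: dict) -> dict:
    """Bind the payload to a fresh counter value and authenticate both."""
    body = json.dumps({"version": counter.increment(), "payload": payload},
                      sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, counter: MonotonicCounter, sealed: dict) -> dict:
    """Reject state that was modified or that is older than the latest sealed version."""
    expected = hmac.new(key, sealed["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        raise ValueError("integrity check failed")
    record = json.loads(sealed["body"])
    if record["version"] != counter.read():
        raise ValueError("stale state: possible rollback")
    return record["payload"]
```

In this sketch, an attacker who replaces the latest model checkpoint with an older, validly authenticated one is detected because the stored version no longer matches the counter.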
2. Confidential Federated Learning
Figure 1 shows the architecture of SecFL. The main goal of SecFL is not only to ensure the confidentiality, integrity, and freshness of input data, code, and machine learning models, but also to enable multiple clients (who do not necessarily trust each other) to benefit from collaborative training without revealing their local training data. In SecFL, each client also performs its local training inside TEE enclaves to make sure that no one tampers with the input data or training code during the computation. To govern and coordinate the collaborative training computation between clients, we design in SecFL a trusted management component called Security Policy Manager. It maintains security policies, based on the agreement among all clients, that define access control over the global training computation, the global training model, and the code and input data used for local training at each client. Security Policy Manager automatically and transparently performs remote attestation to make sure that the local computations run the correct code, on the correct input data, and on the correct platforms, as specified in the agreement. It only allows clients to participate in the global training after they have successfully passed remote attestation. It also attests the enclaves that execute the global training in the cloud, to ensure that no one on the cloud provider side modifies the global aggregation computation. SecFL encrypts the training code, and Security Policy Manager only provides the decryption key inside enclaves after remote attestation. Secrets in each policy, including encryption/decryption keys, are generated by Security Policy Manager, which itself runs inside Intel SGX enclaves, and cannot be seen by any human or client. Examples of the policies can be found in (sconedocs; palaemon).
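The attestation-gated key release described above can be sketched as follows. This is a deliberately simplified model: real remote attestation relies on hardware-signed evidence, whereas here the "measurement" is just a SHA-256 hash of the code, and `PolicyManager`, `register_policy`, and `release_key` are hypothetical names rather than the interface of the actual component.

```python
import hashlib
import hmac
import secrets
from typing import Dict, Tuple


def measure(code: bytes) -> str:
    """Simplified measurement: a hash of the training code (assumption)."""
    return hashlib.sha256(code).hexdigest()


class PolicyManager:
    """Hypothetical policy store that releases keys only to attested computations."""

    def __init__(self) -> None:
        # policy name -> (expected measurement, generated secret key)
        self._policies: Dict[str, Tuple[str, bytes]] = {}

    def register_policy(self, name: str, expected_measurement: str) -> None:
        # The key is generated by the manager itself and never shown to clients.
        self._policies[name] = (expected_measurement, secrets.token_bytes(32))

    def release_key(self, name: str, reported_measurement: str) -> bytes:
        expected, key = self._policies[name]
        if not hmac.compare_digest(expected, reported_measurement):
            raise PermissionError("attestation failed: unexpected code measurement")
        return key


agreed_code = b"def train(): ..."  # placeholder for the training code agreed by all clients
manager = PolicyManager()
manager.register_policy("local-training", measure(agreed_code))
key = manager.release_key("local-training", measure(agreed_code))
```

A computation whose code differs from the agreed version reports a different measurement, so `release_key` raises `PermissionError` and the computation never obtains the key needed to decrypt the training code or data.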
After receiving the agreed security policies from clients, Security Policy Manager strictly enforces them. It only passes secrets and configuration to applications (i.e., training computations) after attesting them. The training computations are executed inside Intel SGX enclaves and are associated with policies provided and pre-agreed by the clients. A training computation is identified by a secure hash and by the content of the files (input data) it can access. Secrets can be passed to applications as command-line arguments or environment variables, or can be injected into files. The files can contain environment variables referring to the names of secrets defined in the security policy. These variables are transparently replaced by the value of the secret when an application that is permitted to access the secret reads the file. We design Security Policy Manager so that its management can be delegated to an untrusted party, e.g., a cloud provider, while clients can still trust that the security policies protecting their assets are safely maintained and well protected. In SecFL, clients can attest the Security Policy Manager component, i.e., they can verify that it runs the expected unmodified code on a correct platform, before uploading security policies.
We implement SecFL using Intel OpenFL (intel-openfl), a distributed federated machine learning framework. We run the local and global training computations inside SGX enclaves using SCONE (scone), a shielded execution framework that enables unmodified applications to run inside SGX enclaves. In the SCONE platform, the source code of an application is recompiled against a modified standard C library (SCONE libc) to facilitate the execution of system calls. The address space of the application stays within an enclave.
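The injection of secrets into files can be illustrated with a small sketch. It uses Python's `string.Template` with `$NAME` placeholders purely for illustration; the placeholder syntax, the `read_with_secrets` helper, and the example secret name are assumptions and do not reflect the actual file format or interface used by the system.

```python
from string import Template
from typing import Mapping


def read_with_secrets(file_text: str, policy_secrets: Mapping[str, str],
                      attested: bool) -> str:
    """Return the file content as seen by the reading application.

    Only an attested application sees secret values; any other reader gets the
    raw text with the placeholders left in place.
    """
    if not attested:
        return file_text
    # safe_substitute leaves placeholders untouched if the policy defines no such secret.
    return Template(file_text).safe_substitute(policy_secrets)


config = "model_key=$MODEL_KEY\nmode=$MODE"
policy_secrets = {"MODEL_KEY": "s3cr3t"}  # hypothetical secret defined in a policy

seen_by_enclave = read_with_secrets(config, policy_secrets, attested=True)
seen_by_others = read_with_secrets(config, policy_secrets, attested=False)
```

The design point this sketch tries to capture is that the secret value never appears in the stored file; substitution happens only at read time and only for a permitted application.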
In SecFL, the input training data and code are encrypted using the file system shield of SCONE, and are decrypted and processed only inside SGX enclaves, which cannot be accessed even by strong attackers with root access. We rely on our previous works (palaemon; sconedocs) to implement Security Policy Manager. A demo of SecFL is publicly available at (secfl-demo).