Federated learning (FL) allows users to collaboratively train a machine learning model without sharing their data, thereby protecting their privacy. The training is typically coordinated by a central server. The main idea that enables decentralized training without sharing data is that each user trains a local model using its own dataset and the global model maintained by the server. The users then share only their local models with the server, which updates the global model and pushes it back to the users for the next training round until convergence. Recent studies, however, showed that sharing the local models still breaches the privacy of the users through inference or inversion attacks, e.g., [10, 19, 30, 11]. To overcome this challenge, secure aggregation protocols were developed to ensure that the server only learns the global model without revealing the local models [4, 24, 13, 29, 9, 2]. FL protocols commonly rely on synchronous training, which suffers from stragglers since the server waits for the updates of a sufficient number of users at each round. Asynchronous FL tackles this by incorporating the updates of the users as soon as they arrive at the server [27, 26, 6, 7]. While asynchronous FL handles stragglers efficiently, it is not compatible with the secure aggregation protocols designed for synchronous FL. This is because these protocols securely aggregate many local models together each time the global model is updated, and hence they are not suitable for asynchronous FL, in which each single local model updates the global model. Another approach that can be applied in asynchronous FL to protect the privacy of the users is local differential privacy (LDP). In this approach, each user adds noise to the local model before sharing it with the server. This approach, however, degrades the training accuracy.
An asynchronous aggregation protocol known as FedBuff has been proposed to mitigate stragglers and enable secure aggregation jointly. FedBuff enables secure aggregation through trusted execution environments (TEEs) such as Intel Software Guard Extensions (SGX). Specifically, the individual updates are not incorporated by the server as soon as they arrive. Instead, the server keeps the received local models in a TEE-enabled secure buffer whose size is a tunable parameter. The server then updates the global model when the buffer is full. This idea has been shown to be significantly faster than conventional synchronous FL schemes.
Contributions. Since TEEs have limited memory, which limits the buffer size, and are inefficient compared to untrusted hardware, we instead develop a buffered asynchronous secure aggregation protocol that does not rely on TEEs. The main challenge of leveraging conventional secure aggregation protocols in the buffered asynchronous setting is that the pairwise masks may not cancel out. This is because the asynchronous nature of this setting may result in local models of different rounds residing in the buffer, while the pairwise masks cancel out only if they belong to the same round. This requires a careful design of the masks such that they can be cancelled even if they do not correspond to the same round. Specifically, our contributions are as follows.
We propose a buffered asynchronous secure aggregation protocol, BASecAgg, that extends a recently proposed synchronous secure aggregation protocol known as LightSecAgg to the buffered asynchronous setting. The key idea of BASecAgg is that we design the masks such that they cancel out even if they correspond to different training rounds.
We extend the prior convergence analysis to the case where the local updates are quantized, which is necessary for secure aggregation protocols to protect the privacy of the local updates.
2 Related Works
Secure aggregation protocols typically rely on exchanging pairwise random seeds and secret sharing them to tolerate users' dropouts [4, 24, 13, 2]. The running time of such approaches, however, increases significantly with the number of dropped users, since the server needs to reconstruct the mask of each dropped user. Recently, a secure aggregation protocol known as LightSecAgg has been proposed to address this challenge. In LightSecAgg, unlike the prior works, the server does not reconstruct the pairwise random seeds of each dropped user. Instead, the server directly reconstructs the aggregate of the masks of all surviving users. This one-shot reconstruction of the masks of all surviving users results in much faster training. It is also worth noting that a related protocol is based on the same one-shot reconstruction idea, but it requires a trusted third party, unlike LightSecAgg.
Prior secure aggregation protocols [4, 24, 13, 2] are designed for synchronous FL algorithms such as FedAvg, which suffer from stragglers. Asynchronous FL handles this problem by updating the global model as soon as the server receives any local model [27, 26, 6, 7]. The larger the staleness of a local model, the greater the error when updating the global model. To address this staleness problem, an asynchronous protocol known as FedAsync has been developed that updates the global model through staleness-aware weighted averaging of the old global model and the received local model. Another asynchronous protocol, known as FedAt, bridges the gap between synchronous FL and asynchronous FL by developing a semi-synchronous protocol that groups the users, synchronously updates the model of each group, and then asynchronously updates the global model across groups. Similarly, another semi-synchronous FL protocol has been developed to handle the staleness problem while also mitigating Byzantine users.
Asynchronous FL, however, is not compatible with secure aggregation. A potential approach to ensure privacy then is through DP approaches that add noise to the local models before sharing them with the server. A similar approach has also been leveraged to develop a privacy-preserving protocol for a limited class of learning problems, such as linear regression, logistic regression, and least-squares support vector machines, in the vertically partitioned (VP) asynchronous decentralized FL setting. Adding noise, however, degrades the training accuracy. An asynchronous aggregation protocol known as FedBuff has been proposed to mitigate stragglers while ensuring privacy. The key idea of FedBuff is that the server stores the local models in a TEE-enabled secure buffer until the buffer is full and then securely aggregates them. Due to the memory limitations of TEEs, this approach is only feasible when the buffer is small. This motivates us in this work to develop a buffered asynchronous secure aggregation protocol without TEEs.
3 Synchronous Secure Aggregation
In this section, we provide an overview of secure aggregation in synchronous FL.
The goal in FL is to collaboratively learn a global model $x$ of dimension $d$, using the local datasets of $N$ users without sharing them. This problem can be formulated as minimizing a global loss function as follows
$$\min_{x} F(x) = \sum_{i=1}^{N} p_i F_i(x), \qquad (1)$$
where $F_i$ is the local loss function of user $i$ and the $p_i \geq 0$ are weight parameters that indicate the relative impact of the users, selected such that $\sum_{i=1}^{N} p_i = 1$.
This problem is solved iteratively. At each round, the server sends the current global model to the users. Some of the users may drop out due to various reasons such as poor wireless connectivity, and we assume that at most a fixed number of users may drop out in any round. Each surviving user updates the global model by carrying out local stochastic gradient descent (SGD) steps. The goal of the server is to obtain the sum of the local models of the surviving users to update its global model, which it then sends to the users for the next round. While the users do not share their data with the server and only share their local models, the local models still reveal significant information about their datasets [10, 19, 30, 11]. To address this challenge, a secure aggregation protocol known as SecAgg was developed to ensure that the server does not learn anything about the local models beyond their aggregate. Specifically, we assume that a bounded number of users can collude with each other as well as with the server to reveal the local models of other users. The secure aggregation protocol must then ensure that nothing is revealed beyond the aggregate model despite such collusions.
3.1 Overview of SecAgg
We now provide an overview of SecAgg. In this discussion, we omit the round index for simplicity since the procedure is the same at each round. SecAgg ensures privacy against a bounded number of colluding users and resiliency against a bounded number of dropped users, provided that sufficiently many users survive.
In SecAgg, the users mask their models before sharing them with the server using random keys. Specifically, each pair of users $i, j$ agree on a pairwise random seed $a_{i,j}$. Moreover, user $i$ also uses a private random seed $b_i$, which handles the case where the update of this user is delayed but eventually reaches the server. The model $x_i$ of user $i$ is then masked as follows
$$\tilde{x}_i = x_i + \mathrm{PRG}(b_i) + \sum_{j:\, i < j} \mathrm{PRG}(a_{i,j}) - \sum_{j:\, i > j} \mathrm{PRG}(a_{i,j}),$$
where $\mathrm{PRG}$ is a pseudo-random generator. The server then reconstructs the private random seed of each surviving user and the pairwise random seeds of each dropped user, and recovers the aggregate model of the surviving users as follows
$$\sum_{i \in \mathcal{U}} x_i = \sum_{i \in \mathcal{U}} \tilde{x}_i - \sum_{i \in \mathcal{U}} \mathrm{PRG}(b_i) + \sum_{i \in \mathcal{D}} \Big( \sum_{j \in \mathcal{U}:\, j > i} \mathrm{PRG}(a_{i,j}) - \sum_{j \in \mathcal{U}:\, j < i} \mathrm{PRG}(a_{i,j}) \Big),$$
where $\mathcal{U}$ and $\mathcal{D}$ denote the sets of surviving and dropped users, respectively.
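To make the cancellation concrete, the following is a toy numeric sketch of SecAgg-style pairwise masking over a small prime field. All names and parameters here are our own illustrative choices, and the private per-user seeds $b_i$ are omitted for brevity:

```python
import numpy as np

q = 2**13 - 1          # small prime field for illustration
d = 4                  # model dimension
n_users = 3

def prg(seed, size):
    """Pseudo-random vector in F_q derived from a seed (stand-in for a PRG)."""
    return np.random.default_rng(seed).integers(0, q, size=size)

rng = np.random.default_rng(0)
models = [rng.integers(0, q, size=d) for _ in range(n_users)]

# Each pair (i, j), i < j, agrees on a pairwise seed a[i][j].
pair_seed = {(i, j): 1000 + 10 * i + j
             for i in range(n_users) for j in range(i + 1, n_users)}

def mask(i, x):
    y = x.copy()
    for j in range(n_users):
        if i < j:
            y = (y + prg(pair_seed[(i, j)], d)) % q   # smaller index adds PRG(a_ij)
        elif i > j:
            y = (y - prg(pair_seed[(j, i)], d)) % q   # larger index subtracts it
    return y

masked = [mask(i, x) for i, x in enumerate(models)]
# With no dropouts, all pairwise masks cancel in the aggregate:
agg = sum(masked) % q
true = sum(models) % q
assert np.array_equal(agg, true)
```

If any user dropped out, its leftover pairwise masks would have to be reconstructed from secret shares, which is exactly the per-dropout cost that LightSecAgg avoids.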
3.2 Overview of LightSecAgg
Next, we provide an overview of LightSecAgg. LightSecAgg has three design parameters: $T$, which represents the privacy guarantee; $D$, which represents the dropout guarantee; and $U$, which represents the targeted number of surviving users. These parameters must be selected such that $N - D \geq U > T$. In LightSecAgg, user $i$ selects a random mask $z_i$ and partitions it into $U - T$ sub-masks denoted by $[z_i]_1, \dots, [z_i]_{U-T}$. User $i$ also selects another $T$ random masks denoted by $[n_i]_{U-T+1}, \dots, [n_i]_{U}$. These partitions are then encoded through an $(N, U)$ Maximum Distance Separable (MDS) code as follows
$$[\tilde{z}_i]_j = \big([z_i]_1, \dots, [z_i]_{U-T}, [n_i]_{U-T+1}, \dots, [n_i]_{U}\big)\, W_j,$$
where $W_j$ is the $j$-th column of a $U \times N$ Vandermonde matrix $W$. After that, user $i$ sends $[\tilde{z}_i]_j$ to user $j$. User $i$ then masks its model as
$$\tilde{x}_i = x_i + z_i.$$
The goal of the server now is to recover the aggregate model $\sum_{i \in \mathcal{U}} x_i$, where $\mathcal{U}$ is the set of surviving users in this phase. To do so, each surviving user $j$ sends $\sum_{i \in \mathcal{U}} [\tilde{z}_i]_j$ to the server. The server then directly recovers $\sum_{i \in \mathcal{U}} [z_i]_k$ for $k \in \{1, \dots, U-T\}$ through MDS decoding when it receives at least $U$ messages from the surviving users. We denote this subset of the surviving users by $\mathcal{U}_2$, where $|\mathcal{U}_2| \geq U$. Finally, the server recovers the aggregate model as $\sum_{i \in \mathcal{U}} x_i = \sum_{i \in \mathcal{U}} \tilde{x}_i - \sum_{i \in \mathcal{U}} z_i$.
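The one-shot recovery hinges on aggregation commuting with linear MDS encoding: summing users' encoded shares equals encoding the summed sub-masks. A toy check of this property under simplified assumptions of our own (two users, short segments, an illustrative Vandermonde matrix):

```python
import numpy as np

q = 8191               # prime field size (illustrative)
N, U, T = 5, 3, 1      # users, target survivors, privacy parameter
seg = 2                # length of each sub-mask segment

rng = np.random.default_rng(1)
# U x N Vandermonde matrix over F_q: W[u, j] = alpha_j ** u (mod q)
alphas = range(1, N + 1)
W = np.array([[pow(a, u, q) for a in alphas] for u in range(U)])

def encode(parts):
    """parts: U segments (U-T mask sub-masks + T random sub-masks), each of
    length seg. Returns one encoded share per user j."""
    P = np.stack(parts)                      # U x seg
    return [(P.T @ W[:, j]) % q for j in range(N)]

# Two users' sub-masks (U-T real segments + T random segments each)
parts_a = [rng.integers(0, q, seg) for _ in range(U)]
parts_b = [rng.integers(0, q, seg) for _ in range(U)]

shares_a, shares_b = encode(parts_a), encode(parts_b)
# Sum of encoded shares == encoding of summed sub-masks (linearity mod q):
sum_shares = [(sa + sb) % q for sa, sb in zip(shares_a, shares_b)]
shares_sum = encode([(pa + pb) % q for pa, pb in zip(parts_a, parts_b)])
assert all(np.array_equal(x, y) for x, y in zip(sum_shares, shares_sum))
```

This linearity is what lets each user forward a single aggregated share, so the server decodes the aggregate of all masks at once instead of one mask per user.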
4 Buffered Asynchronous Secure Aggregation
In this section, we provide a brief overview of FedBuff. Then, we illustrate the incompatibility of conventional secure aggregation with asynchronous FL in Section 4.1. Later, in Section 4.2, we introduce BASecAgg.
In asynchronous FL, the updates of the users are not synchronized, while the goal is the same as in synchronous FL: minimizing the global loss function in (1). In the buffered asynchronous setting, the server stores each local model that it receives in a fixed-size buffer and updates the global model when the buffer is full. In our setting, this buffer is not a secure buffer. Hence, our goal is to design a secure aggregation protocol in which users send masked updates so that the server can aggregate the local updates, while the server (and potential colluding users) learns no information about the local updates beyond the aggregate of the updates stored in the buffer.
FedBuff. Before presenting our protocol, BASecAgg, we first provide an overview of the buffered asynchronous aggregation framework FedBuff and describe the challenges that render SecAgg incompatible with it. The key intuition of FedBuff is to introduce a new design parameter, the buffer size at the server, so that FedBuff has two degrees of freedom, the buffer size and the concurrency, while synchronous FL frameworks have only one degree of freedom, the concurrency. The concurrency is the number of users training concurrently, and it provides a trade-off between training time and data inefficiency. Synchronous FL speeds up training by increasing the concurrency, but higher concurrency results in data inefficiency. In FedBuff, however, high concurrency coupled with a proper buffer size results in fast training. In other words, the additional degree of freedom allows the server to update the global model more frequently than the concurrency alone would dictate, which enables FedBuff to achieve data efficiency at high concurrency.
At each round, users locally train the model by carrying out $E$ local SGD steps. When the local update is done, user $i$ sends the difference between the downloaded global model and the updated local model to the server. The local update of user $i$ sent to the server at round $t$ is given by
$$\Delta_i^{(t)} = x^{(t_i)} - x_i^{(t_i, E)}, \qquad (5)$$
where $t_i$ is the latest round index at which the global model was downloaded by user $i$ and $t$ is the round index at which the local update is sent to the server; hence the staleness of user $i$ is given by $\tau_i = t - t_i$. Here, $x_i^{(t_i, E)}$ denotes the local model after $E$ local SGD steps, and the local model at user $i$ is updated as
$$x_i^{(t_i, e+1)} = x_i^{(t_i, e)} - \eta_\ell\, g_i\big(x_i^{(t_i, e)}; \xi_i\big), \qquad (6)$$
for $e \in \{0, \dots, E-1\}$, where $x_i^{(t_i, 0)} = x^{(t_i)}$ and $\eta_\ell$ denotes the learning rate of the local updates. $g_i(x; \xi_i)$ denotes the stochastic gradient with respect to the random sampling $\xi_i$ on user $i$, and we assume $\mathbb{E}_{\xi_i}\big[g_i(x; \xi_i)\big] = \nabla F_i(x)$ for all $x$, where $F_i$ is the local loss function of user $i$ defined in (1). The server stores the received local updates in a buffer of fixed size. When the buffer is full, the server updates the global model by subtracting the aggregate of all local updates from the current global model. Specifically, the global model at the server is updated as
$$x^{(t+1)} = x^{(t)} - \eta_g \sum_{i \in \mathcal{S}^{(t)}} s(\tau_i)\, \Delta_i^{(t)}, \qquad (7)$$
where $\mathcal{S}^{(t)}$ is the index set of the users whose local models are in the buffer at round $t$ and $\eta_g$ is the learning rate of the global updates. $s(\tau)$ is a function that compensates for the staleness, satisfying $s(0) = 1$ and monotonically decreasing as $\tau$ increases. There are many functions that satisfy these two properties, and we consider a polynomial function $s(\tau) = (1 + \tau)^{-\alpha}$ as it shows similar or better performance than other choices, e.g., the Hinge or Constant staleness functions.
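As an illustration, one staleness-compensation function satisfying both properties is the polynomial decay sketched below; the exponent `alpha` is a hypothetical tuning knob of our own choosing:

```python
# A minimal sketch of a staleness weight satisfying s(0) = 1 and monotone
# decay; the polynomial form s(tau) = (1 + tau) ** -alpha is one such choice.
def staleness_weight(tau: int, alpha: float = 0.5) -> float:
    return (1 + tau) ** (-alpha)

assert staleness_weight(0) == 1.0                    # fresh update: full weight
assert staleness_weight(4) < staleness_weight(1)     # staler updates weigh less
```

Larger `alpha` discounts stale updates more aggressively, trading robustness to staleness against the amount of information retained from slow users.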
Privacy and Dropout Model. We assume that at most $D$ users may drop out in any round, and we consider a threat model in which the users and the server are honest-but-curious: they follow the protocol but try to infer the local updates of the other users. The secure aggregation protocol must guarantee that nothing beyond the aggregate of the local updates is revealed, even if up to $T$ users collude with the server. We consider information-theoretic privacy: for every subset of users $\mathcal{T}$ of size at most $T$, we must have the mutual information $I\big(\{\Delta_i^{(t)}\}_{i \in \mathcal{S}^{(t)}};\, M^{(t)}, M_{\mathcal{T}}^{(t)} \,\big|\, \sum_{i \in \mathcal{S}^{(t)}} \Delta_i^{(t)}\big) = 0$, where $\mathcal{S}^{(t)}$ denotes the users whose updates are aggregated at round $t$, and $M^{(t)}$ and $M_{\mathcal{T}}^{(t)}$ are the collections of information at the server and at the users in $\mathcal{T}$ at round $t$, respectively.
4.1 Incompatibility of SecAgg with Buffered Asynchronous FL
Recall that in SecAgg, each pair of users agree on a pairwise random seed and generate a random vector by running a PRG based on that seed to mask the local update. This additive structure has the unique property that these pairwise random vectors cancel out when the server aggregates the masked models, because the user with the smaller index adds the vector to its model while the user with the larger index subtracts it.
In buffered asynchronous FL, however, the cancellation of the pairwise random masks based on the key agreement protocol is not guaranteed, due to the mismatch in staleness between users. Specifically, at round $t$, user $i$ sends the masked model
$$\tilde{\Delta}_i^{(t)} = \Delta_i^{(t)} + \mathrm{PRG}(b_i) + \sum_{j:\, i < j} \mathrm{PRG}\big(a_{i,j}^{(t_i)}\big) - \sum_{j:\, i > j} \mathrm{PRG}\big(a_{i,j}^{(t_i)}\big)$$
to the server, where $\Delta_i^{(t)}$ is the local update defined in (5) and $a_{i,j}^{(t_i)}$ is the pairwise seed generated for round $t_i$. When $t_i \neq t_j$, the pairwise random vectors in $\tilde{\Delta}_i^{(t)}$ and $\tilde{\Delta}_j^{(t)}$ are not canceled out, as $\mathrm{PRG}\big(a_{i,j}^{(t_i)}\big) \neq \mathrm{PRG}\big(a_{i,j}^{(t_j)}\big)$. We note that the staleness of each user is not known a priori, hence each pair of users cannot coordinate on the same pairwise random seed.
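The failure mode can be seen numerically: pairwise masks derived from seeds of different rounds leave a non-zero residue in the aggregate. The seed-derivation scheme below is our own illustrative stand-in, not the protocol's key agreement:

```python
import numpy as np

q, d = 8191, 4  # illustrative field size and model dimension

def prg(seed, size):
    """Stand-in PRG: pseudo-random vector in F_q from a seed."""
    return np.random.default_rng(seed).integers(0, q, size=size)

def seed_round(i, j, t):
    """Illustrative round-dependent pairwise seed for users (i, j) at round t."""
    return hash((i, j, t)) % (2**32)

# If user 1 masked with the round-2 seed while user 2 masked with the
# round-5 seed (different staleness), the +/- pairwise vectors no longer
# match, and a random residue survives aggregation:
residue = (prg(seed_round(1, 2, 2), d) - prg(seed_round(1, 2, 5), d)) % q
assert residue.any()   # leftover randomness corrupts the aggregate
```

The residue is indistinguishable from noise, so the server cannot subtract it without reconstructing per-round seeds, which is precisely what BASecAgg's round-robust mask design avoids.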
4.2 The Proposed BASecAgg Protocol
To address the challenge of asynchrony in the buffered asynchronous secure aggregation, we propose BASecAgg by modifying the idea of one-shot recovery leveraged in LightSecAgg  to our setting. We provide a brief overview of LightSecAgg in Section 3.2. Our key intuition is to encode the local masks in a way that the server can recover the aggregate of masks from the encoded masks via a one-shot computation even though the masks are generated in different training rounds.
BASecAgg has three phases. First, each user generates a random mask to protect the privacy of the local update, and further creates encoded masks via a Maximum Distance Separable (MDS) code that provides privacy against a bounded number of colluding users. Each user sends one of the encoded masks to each of the other users for the purpose of one-shot recovery. Second, each user trains a local model and converts it from the domain of real numbers to the finite field, as generating random masks and MDS encoding must be carried out in the finite field to provide information-theoretic privacy. Then, the quantized model is masked by the random mask generated in the first phase and sent to the server. The server stores the masked update in the buffer. Third, when the buffer is full, the server aggregates the masked updates in the buffer. To remove the randomness in the aggregate of the masked updates, the server reconstructs the aggregated masks of the users in the buffer. To do so, each surviving user sends the aggregate of its encoded masks to the server. After receiving a sufficient number of aggregated encoded masks, the server reconstructs the aggregate of the masks and hence the aggregate of the local updates. We now describe these three phases in detail.
4.2.1 Offline Encoding and Sharing of Local Masks
User $i$ generates a random mask $z_i^{(t_i)}$ uniformly at random from the finite field $\mathbb{F}_q^d$, where $t_i$ is the global round index when user $i$ downloads the global model from the server. The mask is partitioned into $U - T$ sub-masks denoted by $[z_i^{(t_i)}]_1, \dots, [z_i^{(t_i)}]_{U-T}$, where $U$ denotes the targeted number of surviving users and $T$ the privacy guarantee. User $i$ also selects another $T$ random masks denoted by $[n_i^{(t_i)}]_{U-T+1}, \dots, [n_i^{(t_i)}]_{U}$. These partitions are then encoded through an $(N, U)$ Maximum Distance Separable (MDS) code as follows
$$[\tilde{z}_i^{(t_i)}]_j = \big([z_i^{(t_i)}]_1, \dots, [z_i^{(t_i)}]_{U-T}, [n_i^{(t_i)}]_{U-T+1}, \dots, [n_i^{(t_i)}]_{U}\big)\, W_j, \qquad (9)$$
where $W_j$ is the $j$-th column of a $U \times N$ Vandermonde matrix $W$. After that, user $i$ sends $[\tilde{z}_i^{(t_i)}]_j$ to user $j$. At the end of this phase, each user $j$ has $[\tilde{z}_i^{(t_i)}]_j$ from all users $i \in [N]$.
4.2.2 Training, Quantizing, Masking, and Uploading of Local Updates
Each user trains the local model as in (5) and (6). The user then quantizes its local update from the domain of real numbers to the finite field, as masking and MDS encoding must be carried out in the finite field to provide information-theoretic privacy. The field size is assumed to be large enough to avoid any wrap-around during secure aggregation.
The quantization is a challenging task as it should be performed in a way that ensures the convergence of the global model. Moreover, the quantization should allow the representation of negative integers in the finite field, and enable computations to be carried out in the quantized domain. Therefore, we cannot utilize well-known gradient quantization techniques that represent the sign of a negative number separately from its magnitude. BASecAgg addresses this challenge with a simple stochastic quantization strategy combined with the two's complement representation, as described next. For any positive integer $c$, we first define a stochastic rounding function as
$$Q_c(x) = \begin{cases} \dfrac{\lfloor cx \rfloor}{c} & \text{with probability } 1 - (cx - \lfloor cx \rfloor), \\[4pt] \dfrac{\lfloor cx \rfloor + 1}{c} & \text{with probability } cx - \lfloor cx \rfloor, \end{cases} \qquad (10)$$
where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$, and this rounding function is unbiased, i.e., $\mathbb{E}\big[Q_c(x)\big] = x$. The parameter $c$ is a design parameter that determines the number of quantization levels. The variance of $Q_c(x)$ decreases as the value of $c$ increases, which will be described in Lemma 1 in Appendix A in detail. We then define the quantized update
$$\overline{\Delta}_i^{(t)} = \phi\big(c_\ell \cdot Q_{c_\ell}\big(\Delta_i^{(t)}\big)\big), \qquad (11)$$
where the function $Q_{c_\ell}$ from (10) is carried out element-wise, and $c_\ell$ is a positive integer parameter that determines the quantization level of the local updates. The mapping function $\phi: \mathbb{R} \to \mathbb{F}_q$ is defined to represent a negative integer in the finite field by using the two's complement representation,
$$\phi(x) = \begin{cases} x & \text{if } x \geq 0, \\ q + x & \text{if } x < 0. \end{cases}$$
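A minimal sketch of the stochastic rounding and field mapping described above, with our own variable names and an illustrative field size:

```python
import numpy as np

q = 32749              # a prime field size (illustrative)

def stochastic_round(x, c, rng):
    """Unbiased rounding of x onto the grid {k / c}: E[Q_c(x)] = x."""
    scaled = np.asarray(x, dtype=float) * c
    low = np.floor(scaled)
    # round up with probability equal to the fractional part
    up = rng.random(scaled.shape) < (scaled - low)
    return (low + up) / c

def phi(z):
    """Map (possibly negative) integers into F_q, two's-complement style."""
    return np.asarray(z, dtype=np.int64) % q

def phi_inv(z):
    """Inverse map: field elements in the upper half represent negatives."""
    z = np.asarray(z, dtype=np.int64)
    return np.where(z < (q - 1) // 2, z, z - q)

rng = np.random.default_rng(0)
x = np.array([-0.37, 0.82, 1.25])
c = 1024  # power of two keeps the scaling exact in floating point
quantized = phi((c * stochastic_round(x, c, rng)).astype(np.int64))
recovered = phi_inv(quantized) / c
assert np.allclose(recovered, x, atol=1.0 / c)   # error below one grid step
```

The round trip through the field recovers each entry to within one quantization step, matching the rounding error bound implied by (10).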
To protect the privacy of the local updates, user $i$ masks the quantized update in (11) as
$$\tilde{\Delta}_i^{(t)} = \overline{\Delta}_i^{(t)} + z_i^{(t_i)},$$
where $z_i^{(t_i)}$ is the random mask generated in the offline phase, and sends the pair $\big(\tilde{\Delta}_i^{(t)}, t_i\big)$ to the server. The local round index $t_i$ will be used in two cases: (1) when the server identifies the staleness of each local update and compensates for it, and (2) when the users aggregate the encoded masks for one-shot recovery, which will be explained in Section 4.2.3.
4.2.3 One-shot Aggregate-update Recovery and Global Model Update
The server stores the received pairs $\big(\tilde{\Delta}_i^{(t)}, t_i\big)$ in the buffer, and when the buffer is full the server aggregates the masked local updates. In this phase, the server intends to recover
$$\sum_{i \in \mathcal{S}^{(t)}} s(\tau_i)\, \Delta_i^{(t)},$$
where $\Delta_i^{(t)}$ is the local update in the real domain defined in (5), $\mathcal{S}^{(t)}$ is the index set of the users whose local updates are stored in the buffer and aggregated by the server at round $t$, and $s(\tau)$ is the staleness function defined in (7). To do so, the first step is to reconstruct $\sum_{i \in \mathcal{S}^{(t)}} s(\tau_i)\, z_i^{(t_i)}$. This is challenging as the decoding must be performed in the finite field, but the value of $s(\tau_i)$ is a real number. To address this problem, we introduce a quantized staleness function $\bar{s}: \mathbb{N} \to \mathbb{F}_q$,
$$\bar{s}(\tau) = c_s\, Q_{c_s}\big(s(\tau)\big), \qquad (15)$$
where $Q_{c_s}$ is the stochastic rounding function defined in (10), and $c_s$ is a positive integer that determines the quantization level of the staleness function. Then, the server broadcasts the identities of the selected users in $\mathcal{S}^{(t)}$ and their round indices to all surviving users. After identifying the selected users in $\mathcal{S}^{(t)}$, the local round indices and the corresponding staleness, user $j$ aggregates its encoded sub-masks as $\sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, [\tilde{z}_i^{(t_i)}]_j$ and sends the result to the server for the purpose of one-shot recovery. The key difference between BASecAgg and LightSecAgg is that in BASecAgg the time stamps $t_i$ of the encoded masks can differ across users, hence user $j$ must aggregate each encoded mask with the proper round index. Due to the commutative property of coding and linear operations, each aggregate $\sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, [\tilde{z}_i^{(t_i)}]_j$ is an encoded version of $\sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, [z_i^{(t_i)}]_k$ for $k \in \{1, \dots, U-T\}$ using the MDS (Vandermonde) matrix defined in (9). Thus, after receiving the results of any $U$ surviving users, the server reconstructs $\sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, [z_i^{(t_i)}]_k$ for $k \in \{1, \dots, U-T\}$ via MDS decoding. By concatenating these aggregated sub-masks, the server recovers $\sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, z_i^{(t_i)}$. Finally, the server obtains the desired global update as follows
$$\Delta^{(t)} = \frac{1}{c_\ell c_s}\, \phi^{-1}\Big( \sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, \tilde{\Delta}_i^{(t)} - \sum_{i \in \mathcal{S}^{(t)}} \bar{s}(\tau_i)\, z_i^{(t_i)} \Big),$$
where $\overline{\Delta}_i^{(t)}$ is defined in (11) and $\phi^{-1}: \mathbb{F}_q \to \mathbb{R}$ is the demapping function defined as follows
$$\phi^{-1}(\bar{x}) = \begin{cases} \bar{x} & \text{if } 0 \leq \bar{x} < \frac{q-1}{2}, \\ \bar{x} - q & \text{if } \frac{q-1}{2} \leq \bar{x} < q. \end{cases}$$
Finally, the server updates the global model as $x^{(t+1)} = x^{(t)} - \eta_g\, \Delta^{(t)}$, which is equivalent to
$$x^{(t+1)} = x^{(t)} - \eta_g \sum_{i \in \mathcal{S}^{(t)}} Q_{c_s}\big(s(\tau_i)\big)\, Q_{c_\ell}\big(\Delta_i^{(t)}\big), \qquad (18)$$
where $Q_{c_\ell}$ and $Q_{c_s}$ are the stochastic rounding function defined in (10) with respect to the quantization parameters $c_\ell$ and $c_s$, respectively.
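The one-shot decoding step amounts to solving a Vandermonde system over the finite field. The snippet below is a simplified version of our own (scalar sub-masks rather than vectors) that recovers the aggregated sub-masks from any $U$ of the $N$ shares:

```python
import numpy as np

q = 8191          # illustrative prime field size
N, U = 5, 3       # users and target number of survivors

def modsolve(A, b, q):
    """Solve A x = b over F_q by Gauss-Jordan elimination (A square, invertible)."""
    M = [[int(v) % q for v in row] + [int(bi) % q] for row, bi in zip(A, b)]
    n = len(M)
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col])     # nonzero pivot
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], -1, q)                          # modular inverse
        M[col] = [v * inv % q for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(v - f * w) % q for v, w in zip(M[r], M[col])]
    return [row[-1] for row in M]

alphas = list(range(1, N + 1))
rng = np.random.default_rng(2)
# Aggregated sub-masks the server wants (scalars here for simplicity):
segments = [int(s) for s in rng.integers(0, q, U)]
# Each user j holds the Vandermonde combination evaluated at alpha_j:
shares = [sum(segments[u] * pow(a, u, q) for u in range(U)) % q for a in alphas]

# Any U of the N shares suffice; suppose users {1, 3, 4} (0-indexed) respond.
idx = [1, 3, 4]
V = [[pow(alphas[j], u, q) for u in range(U)] for j in idx]
recovered = modsolve(V, [shares[j] for j in idx], q)
assert recovered == segments
```

Because any $U \times U$ submatrix of a Vandermonde matrix with distinct evaluation points is invertible, the choice of which $U$ users respond does not matter, which is what makes the recovery one-shot and dropout-tolerant.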
5 Convergence Analysis
In this section, we provide the convergence guarantee of BASecAgg in the smooth and non-convex setting. For simplicity, we consider the constant staleness function $s(\tau) = 1$ for all $\tau$ in (18). Then, the global update equation of BASecAgg is given by
$$x^{(t+1)} = x^{(t)} - \eta_g \sum_{i \in \mathcal{S}^{(t)}} Q_{c_\ell}\big(\Delta_i^{(t)}\big), \qquad (19)$$
where $Q_{c_\ell}$ is the stochastic rounding function defined in (10), $c_\ell$ is the positive integer that determines the quantization level, and $\Delta_i^{(t)}$ is the local update of user $i$ defined in (5). We first introduce our assumptions, which are commonly made in analyzing FL algorithms [16, 20, 22, 23].
(Lipschitz gradient). The local loss functions $F_1, \dots, F_N$ in (1) are all $L$-smooth: for all $x, y \in \mathbb{R}^d$ and $i \in [N]$, $\|\nabla F_i(x) - \nabla F_i(y)\| \leq L\, \|x - y\|$.
(Bounded variance of local and global gradients). The variance of the stochastic gradients at each user is bounded, i.e., $\mathbb{E}_{\xi_i}\big\|g_i(x; \xi_i) - \nabla F_i(x)\big\|^2 \leq \sigma_\ell^2$ for all $x$ and $i \in [N]$. For the global loss function $F$ defined in (1), $\frac{1}{N}\sum_{i=1}^{N} \big\|\nabla F_i(x) - \nabla F(x)\big\|^2 \leq \sigma_g^2$ holds.
(Bounded gradient). For all $i \in [N]$ and $x \in \mathbb{R}^d$, $\|\nabla F_i(x)\|^2 \leq G$.
In addition, we make an assumption on the staleness of the local updates under asynchrony.
(Bounded staleness). For each global round index $t$ and all users $i$, the staleness $\tau_i = t - t_i$ is not larger than a certain threshold $\tau_{\max}$, where $t_i$ is the latest round index at which the global model was downloaded by user $i$.
Now, we state our main result for the convergence guarantee of BASecAgg.
By selecting the constant local and global learning rates $\eta_\ell$ and $\eta_g$ to be sufficiently small, the global model iterates in (19) achieve an ergodic convergence rate of the same order as that of FedBuff, where the only difference is that the variance bound of the local updates is increased by the quantization noise of the stochastic rounding in (10).
Remark 1. Theorem 1 shows that the convergence rates of BASecAgg and FedBuff (see Corollary 1 therein) are the same except for the increased variance of the local updates due to the quantization noise in BASecAgg. This increase in variance is negligible for a large quantization parameter $c_\ell$, which will be demonstrated in our experiments in Section 6.
6 Experiments
In this section, we demonstrate the convergence performance of BASecAgg compared to the buffered asynchronous FL scheme FedBuff. We measure the performance in terms of the model accuracy evaluated over the test samples with respect to the global round index $t$.
Datasets and network architectures. We consider image classification on the MNIST and CIFAR-10 datasets. For the CIFAR-10 dataset, we train a convolutional neural network (CNN). These network architectures are sufficient for our needs, as our goal is to compare the various schemes, not to achieve the best accuracy. More details about the hyperparameters are provided in Appendix B.
Setup. We consider a buffered asynchronous FL setting with a single server whose buffer has a fixed size. For the IID data distribution, the training samples are shuffled and evenly partitioned across the users. For asynchronous training, we assume the staleness of each user is uniformly distributed over a fixed range. We set the field size $q$ to the largest prime within the chosen bit width.
Implementations. We implement two schemes, FedBuff and BASecAgg. The key difference between the two schemes is that in BASecAgg the local updates are quantized and converted into the finite field to protect the privacy of the individual local updates, while in FedBuff all operations are carried out over the domain of real numbers. For both schemes, to compensate for the staleness of the local updates, we employ two strategies for the weighting function: a constant function $s(\tau) = 1$ and a polynomial function $s(\tau) = (1 + \tau)^{-\alpha}$.
Empirical results. In Figures 1(a) and 1(b), we demonstrate that BASecAgg has almost the same performance as FedBuff on both the MNIST and CIFAR-10 datasets, even though BASecAgg adds quantization noise to protect the privacy of the individual local updates of the users. This is because the quantization noise in BASecAgg is negligible, as explained in Remark 1. To compensate for the staleness of the local updates over the finite field in BASecAgg, we implement the quantized staleness function defined in (15), which mitigates staleness as effectively as the original staleness function carried out over the domain of real numbers.
Performance with various quantization levels. To investigate the impact of quantization, we measure the performance with various values of the quantization parameter $c_\ell$ on the MNIST and CIFAR-10 datasets in Fig. 2. We observe that an intermediate value of $c_\ell$ gives the best performance, while very small or very large values of $c_\ell$ perform poorly. This is because the value of $c_\ell$ provides a trade-off between two sources of quantization noise: 1) the rounding error from the stochastic rounding function defined in (10), and 2) the wrap-around error when modulo operations are carried out in the finite field. When $c_\ell$ is small, the rounding error is dominant, while the wrap-around error is dominant when $c_\ell$ is large. To find a proper value of $c_\ell$, one can utilize an auto-tuning algorithm.
7 Conclusion
In this paper, we have proposed a buffered asynchronous secure aggregation protocol (BASecAgg) that does not rely on TEEs. Independence from TEEs allows BASecAgg to support any buffer size, unlike FedBuff. The crux of BASecAgg is that it designs the masks of the users such that they cancel out in the buffer even if they belong to different training rounds. Our convergence analysis and experiments show that BASecAgg has almost the same convergence guarantees as FedBuff.
-  Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic. QSGD: Communication-efficient SGD via gradient quantization and encoding. In Advances in Neural Information Processing Systems, pages 1709–1720, 2017.
-  James Henry Bell, Kallista A Bonawitz, Adrià Gascón, Tancrède Lepoint, and Mariana Raykova. Secure single-server aggregation with (poly) logarithmic overhead. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, pages 1253–1269, 2020.
-  Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for federated learning on user-held data. arXiv preprint arXiv:1611.04482, 2016.
-  Keith Bonawitz, Vladimir Ivanov, Ben Kreuter, Antonio Marcedone, H Brendan McMahan, Sarvar Patel, Daniel Ramage, Aaron Segal, and Karn Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 1175–1191, 2017.
-  Keith Bonawitz, Fariborz Salehi, Jakub Konečnỳ, Brendan McMahan, and Marco Gruteser. Federated learning with autotuned communication-efficient secure aggregation. In 2019 53rd Asilomar Conference on Signals, Systems, and Computers, pages 1222–1226. IEEE, 2019.
-  Zheng Chai, Yujing Chen, Liang Zhao, Yue Cheng, and Huzefa Rangwala. FedAt: A communication-efficient federated learning method with asynchronous tiers under non-iid data. arXiv preprint arXiv:2010.05958, 2020.
-  Yujing Chen, Yue Ning, Martin Slawski, and Huzefa Rangwala. Asynchronous online federated learning for edge devices with non-iid data. In 2020 IEEE International Conference on Big Data (Big Data), pages 15–24. IEEE, 2020.
-  Victor Costan and Srinivas Devadas. Intel SGX explained. IACR Cryptol. ePrint Arch., 2016(86):1–118, 2016.
-  Ahmed Roushdy Elkordy and A Salman Avestimehr. Secure aggregation with heterogeneous quantization in federated learning. arXiv preprint arXiv:2009.14388, 2020.
-  Matt Fredrikson, Somesh Jha, and Thomas Ristenpart. Model inversion attacks that exploit confidence information and basic countermeasures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1322–1333, 2015.
-  Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients–how easy is it to break privacy in federated learning? arXiv preprint arXiv:2003.14053, 2020.
-  Bin Gu, An Xu, Zhouyuan Huo, Cheng Deng, and Heng Huang. Privacy-preserving asynchronous vertical federated learning algorithms for multiparty collaborative learning. IEEE Transactions on Neural Networks and Learning Systems, 2021.
-  Swanand Kadhe, Nived Rajaraman, O Ozan Koyluoglu, and Kannan Ramchandran. Fastsecagg: Scalable secure aggregation for privacy-preserving federated learning. arXiv preprint arXiv:2009.11248, 2020.
-  Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
-  Yann LeCun. The MNIST database of handwritten digits. http://yann. lecun. com/exdb/mnist/, 1998.
-  Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of fedavg on non-iid data. In International Conference on Learning Representations, 2019.
-  Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of error correcting codes, volume 16. Elsevier, 1977.
-  H Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Int. Conf. on Artificial Int. and Stat. (AISTATS), pages 1273–1282, 2017.
-  Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy (SP), pages 739–753. IEEE, 2019.
-  John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Michael Rabbat, Mani Malek Esmaeili, and Dzmitry Huba. Federated learning with buffered asynchronous aggregation. arXiv preprint arXiv:2106.06639, 2021.
-  Jungwuk Park, Dong-Jun Han, Minseok Choi, and Jaekyun Moon. Sself: Robust federated learning against stragglers and adversaries. 2020.
-  Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečnỳ, Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295, 2020.
-  Jinhyun So, Ramy E Ali, Basak Guler, Jiantao Jiao, and Salman Avestimehr. Securing secure aggregation: Mitigating multi-round privacy leakage in federated learning. arXiv preprint arXiv:2106.03328, 2021.
-  Jinhyun So, Başak Güler, and A Salman Avestimehr. Turbo-aggregate: Breaking the quadratic aggregation barrier in secure federated learning. IEEE Journal on Selected Areas in Information Theory, 2(1):479–489, 2021.
-  Stacey Truex, Ling Liu, Ka-Ho Chow, Mehmet Emre Gursoy, and Wenqi Wei. Ldp-fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, pages 61–66, 2020.
-  Marten van Dijk, Nhuong V Nguyen, Toan N Nguyen, Lam M Nguyen, Quoc Tran-Dinh, and Phuong Ha Nguyen. Asynchronous federated learning with reduced number of rounds and with differential privacy from less aggregated gaussian noise. arXiv preprint arXiv:2007.09208, 2020.
-  Cong Xie, Sanmi Koyejo, and Indranil Gupta. Asynchronous federated optimization. arXiv preprint arXiv:1903.03934, 2019.
-  Chien-Sheng Yang, Jinhyun So, Chaoyang He, Songze Li, Qian Yu, and Salman Avestimehr. LightSecAgg: Rethinking secure aggregation in federated learning. arXiv preprint arXiv:2109.14236, 2021.
-  Yizhou Zhao and Hua Sun. Information theoretic secure aggregation with user dropouts. arXiv preprint arXiv:2101.07750, 2021.
-  Ligeng Zhu and Song Han. Deep leakage from gradients. In Federated Learning, pages 17–31. Springer, 2020.
Appendix A Theoretical Guarantees of BASecAgg: Proof of Theorem 1
The proof of Theorem 1 follows directly from the following useful lemma, which shows that unbiasedness and bounded variance still hold for the quantized gradient estimator for any quantization parameter.
Lemma 1 (Unbiasedness). Given the stochastic rounding function $Q_c$ in (10) and any random variable $X$, it follows that
$$\mathbb{E}\big[Q_c(X)\big] = \mathbb{E}[X],$$
from which we obtain the desired unbiasedness condition.
Now, the update equation of BASecAgg is equivalent to the update equation of FedBuff, except that BASecAgg has an additional source of randomness, the stochastic quantization $Q_{c_\ell}$, which also satisfies unbiasedness and bounded variance. One can show the convergence rate of BASecAgg presented in Theorem 1 by replacing the stochastic gradient and its variance bound in the FedBuff analysis with the quantized stochastic gradient and its correspondingly larger variance bound, respectively.
Appendix B Experiment Details
In this appendix, we provide more details about the experiments of Section 6.
Hyperparameters. For all experiments, we tune the hyperparameters based on the validation accuracy for each dataset by setting aside a fraction of the training samples as a validation dataset. We use mini-batch SGD for all tasks. We select the best parameters for the global learning rate $\eta_g$, the local learning rate $\eta_\ell$, the regularization parameter, and the staleness exponent $\alpha$ by sweeping over a range of candidate values. We have found that the best values of $\eta_g$, $\eta_\ell$, and the regularization parameter are the same for both the MNIST and CIFAR-10 datasets, while the best value of $\alpha$ differs between the two datasets.