Federated learning (FL) [1, 2] has attracted considerable attention as a means of performing intelligent big-data analysis on client-generated data from mobile devices while preserving client privacy. FL is a distributed machine-learning (ML) technique that enables clients to collaboratively learn a shared ML model while keeping their local data on their devices. Specifically, each client trains an ML model locally and uploads the model parameters, rather than its local data, to a central server. The central server aggregates the model parameters to form a global model. In FL, clients are not required to expose their private data to the central server, which benefits client privacy. This is unlike conventional ML, which collects and stores the training data at central servers, where clients' private data can be exposed.
However, despite these promising benefits, FL faces the following two challenges. First, uploading model parameters requires connecting many clients over a wireless multiple-access channel with limited radio resources, resulting in considerable upload latency [1, 2]. Second, although FL prevents the direct exposure of clients' private data, it still faces the risk of privacy leakage. This is because the model parameters uploaded by clients may still carry information about their private data [3, 4, 5], and such data may be inferred from the aggregated global model parameters by malicious central servers. These challenges call for the design of low-latency multiple-access schemes that also preserve client privacy, in the sense that clients' private data cannot be recovered through such inference attacks.
Regarding the first issue of upload latency over wireless multiple-access channels, over-the-air computation (AirComp) has emerged as a promising approach [6, 7, 8, 9, 10]. AirComp computes the desired functions by exploiting the superposition property of the wireless channel, which arises when analog-modulated signals are transmitted simultaneously. This simultaneous transmission dramatically reduces latency compared with conventional orthogonal access techniques. In the FL context, AirComp can be used to aggregate the model parameters transmitted by clients. Typically, existing studies on AirComp-based FL aim to minimize the noise perturbation injected into the computed global model parameters. For example, this goal can be achieved by enhancing the received signal-to-noise ratio (SNR) through user scheduling, or by minimizing the mean squared error (MSE) between the desired computation results and the received symbols through transmit power control, analog beamforming, multiple-input–multiple-output (MIMO) beamforming, or phase shifting by reflecting surfaces. However, minimizing the noise perturbation, and thereby estimating the desired computation results accurately, still leaves AirComp-based FL exposed to the aforementioned inference attacks performed by malicious central servers at base stations (BSs) or access points.
To tackle these two challenges jointly, we aim to design an AirComp-based FL scheme that is secure against the aforementioned inference attacks. This goal is achieved by harnessing the perturbation from the inherent receiver noise, instead of minimizing it, thereby preserving differential privacy. In a nutshell, differential privacy is a privacy notion quantified by the difference between the possible outcomes of a data aggregation (e.g., calculating an arithmetic mean) performed with and without each individual's data. A smaller difference indicates that each individual's contribution is smaller, which makes it harder for adversaries to infer the data of each individual; hence, a smaller difference is interpreted as a higher privacy level. In the FL context, designing a differentially private model aggregation with a higher privacy level makes it harder for malicious model aggregators to infer clients' local models, thereby protecting clients against the aforementioned inference attacks [12, 13, 14, 15]. Recent studies designed differentially private FL for non-AirComp settings by injecting artificial noise into the aggregated global model parameters. Unlike these studies, our objective is to design AirComp-based FL that preserves differential privacy by using the inherent receiver noise. In this setting, the differential privacy level depends on configurable wireless parameters, e.g., transmit powers, and hence these parameters must be designed; this is an essential difference from designing differentially private FL in non-AirComp settings.
The main contributions of this paper are as follows:
We design an AirComp-based FL scheme that is secure against the aforementioned privacy inference attacks, based on two ideas. The first idea is to preserve a higher level of differential privacy by harnessing the perturbation from the receiver noise inherently injected into the aggregated global model. However, utilizing receiver noise makes it challenging to inject the noise perturbation level needed for a desired privacy level, because the variance of the receiver noise is often uncontrollable. Motivated by this, the second idea is to intentionally adjust the received signal level by controlling the transmit powers, thereby effectively controlling the perturbation level so that the desired privacy level is achieved. Through numerical evaluations, we demonstrate that the designed transmit power control achieves a higher privacy level than a conventional power control scheme while exhibiting comparable training performance when sufficiently many clients participate.
To obtain a fuller understanding of the performance of the designed AirComp-based FL, we derive a closed-form expression that relates two essential performance metrics: the received SNR and the differential privacy level. The analytical results reveal two facts: (i) there is an inherent tradeoff between the received SNR and the differential privacy level; (ii) when a higher privacy level is required, the number of participating clients is the major configurable parameter for enhancing the received SNR. The analytical results are verified through numerical evaluations.
The remainder of this paper is organized as follows. In Section II, we describe the system model. In Section III, we design a transmit power control scheme that preserves differential privacy with a target privacy level in AirComp-based FL model aggregation. In Section IV, we analytically derive the received SNR and investigate the tradeoff between the received SNR and the privacy level. In Section V, we present a numerical evaluation to verify our analytical results. In Section VI, we present our concluding remarks.
II System Model
II-A Federated Learning
We consider an FL system comprising multiple clients and one BS. A shared model, represented by its parameters, is trained cooperatively by all clients, each using its own local dataset indexed by the client it belongs to. The objective of this learning is to train the parameters
to minimize the model error, termed the loss function, thereby obtaining a good approximation of the target labels:
In (1), the per-sample loss function quantifies the error between the model output for an input sample and the corresponding target label. The straightforward approach is to gather the local datasets at the BS and minimize the loss function with respect to the parameters using, for example, stochastic gradient descent. However, this exposes the local datasets to the BS, which is undesirable owing to privacy concerns.
Alternatively, FL learns the parameters in a distributed manner while keeping the local datasets on the clients. Specifically, each client minimizes its local loss function, defined over its own local dataset, and the BS integrates the learned parameter updates, for example, by taking their weighted average. The detailed procedure is as follows. First, the BS distributes the shared parameters to all clients. Subsequently, each client computes parameter updates that minimize its local loss function. Finally, the BS integrates the parameter updates uploaded by the clients, for example, by taking a weighted average:
where each weight is associated with one client. Typically, each client's weight is set to the size of its local dataset. This procedure is termed a round and is iterated until a termination condition is satisfied, e.g., the model performance converges or a predefined number of rounds is reached.
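The weighted-average aggregation in (2) can be sketched in plain Python as follows. Representing each local update as a flat list of floats and using dataset sizes as weights are assumptions for illustration, and `weighted_average` is a hypothetical helper name.

```python
from typing import Sequence


def weighted_average(updates: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> list:
    """Aggregate local parameter updates as a weighted average.

    updates[k] is the flattened update vector of client k and weights[k]
    is its aggregation weight (e.g., the local dataset size).
    """
    if not updates or len(updates) != len(weights):
        raise ValueError("need at least one client and one weight per client")
    total = float(sum(weights))
    dim = len(updates[0])
    aggregated = [0.0] * dim
    for update, weight in zip(updates, weights):
        if len(update) != dim:
            raise ValueError("all updates must have the same length")
        for i, value in enumerate(update):
            aggregated[i] += weight * value / total
    return aggregated


# Three clients whose weights are their (hypothetical) local dataset sizes.
local_updates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dataset_sizes = [100, 300, 600]
global_update = weighted_average(local_updates, dataset_sizes)
print(global_update)
```

Clients with more data pull the aggregate toward their own update, which is the intended effect of size-based weights.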
II-B AirComp for FL Model Aggregation
In AirComp-based FL, the BS performs model aggregation by exploiting the superposition property of the wireless signals transmitted by all clients, as illustrated in Fig. 1. Consider the local update computed by each client in a typical round. We assume that transmission time is slotted, as in a time division multiple access (TDMA) channel in a cellular network, and that one element of the local update is transmitted in each time slot. Each client modulates its symbols using analog amplitude modulation, and each modulated symbol is held constant over a time slot. We also assume that all clients achieve slot-level synchronization through a synchronization channel, as with timing advance in LTE systems [16]. All clients then transmit their modulated signals simultaneously to exploit the superposition property of the wireless signals. The modulated signal of each client is given by
Here, the modulated signal is characterized by its central frequency and by a power control policy that prevents the transmit power from exceeding a predefined maximum value. Hereinafter, we consider the computation for one time slot and omit the time-slot index for ease of notation.
We assume that the modulated signal is transmitted over a Rayleigh fading channel with additive white Gaussian noise and is detected by a matched filter for the transmitted signal in (3). Given the distance between each client and the BS, the product of the transmit and receive antenna gains, and the path loss exponent, the received symbol is given by
where the remaining quantities are the path loss for a reference unit distance, the receiver noise, and the fading channel gain of each client.
The client-side transmit power strategy is designed such that the desired computation can be obtained from the received symbol. A solution is channel-inversion preprocessing under the maximum power constraint. In channel inversion, each client's transmit coefficient inverts its own channel and is scaled by a constant that is common to all clients and is termed the power-scaling factor. Because the maximum transmit power constraint must be satisfied by all clients, the power-scaling factor must satisfy
With channel inversion, the received symbol at the BS becomes
The BS takes the real part of the received symbol and multiplies it by the inverse of the common scaling term to obtain an estimate of the desired aggregated model update:
where the noise term stems from the real part of the receiver noise.
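One AirComp time slot with channel inversion can be simulated as below. The notation is hypothetical and may differ from the paper's: `x[k]` is client k's weighted update element, `g[k]` its large-scale power gain, `h[k]` its complex Rayleigh coefficient with unit mean power, `rho` the power-scaling factor, and the complex receiver noise has variance `noise_var`. The sketch assumes the transmit coefficient sqrt(rho)/(sqrt(g_k) h_k) and picks the largest `rho` that keeps every client's transmit power at or below `p_max`.

```python
import math
import random


def rayleigh_coefficient(rng):
    """Complex Gaussian fading coefficient with E[|h|^2] = 1."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def aircomp_slot(x, g, h, p_max, noise_var, weight_sum, rng):
    """Simulate one slot; returns (estimate of weighted average, rho)."""
    # Largest rho with rho * x_k^2 / (g_k |h_k|^2) <= p_max for every client.
    ratios = [p_max * gk * abs(hk) ** 2 / xk ** 2
              for xk, gk, hk in zip(x, g, h) if xk != 0.0]
    if not ratios:
        raise ValueError("at least one client must send a non-zero symbol")
    rho = min(ratios)

    received = 0j
    for xk, gk, hk in zip(x, g, h):
        bk = math.sqrt(rho) / (math.sqrt(gk) * hk)  # channel-inversion coefficient
        received += math.sqrt(gk) * hk * bk * xk    # signals superpose over the air
    std = math.sqrt(noise_var / 2.0)               # per-dimension noise std
    received += complex(rng.gauss(0.0, std), rng.gauss(0.0, std))

    # Keep the real part and undo the common scaling sqrt(rho) * weight_sum.
    estimate = received.real / (math.sqrt(rho) * weight_sum)
    return estimate, rho


rng = random.Random(0)
weights = [100.0, 300.0, 600.0]          # hypothetical local dataset sizes
local = [0.02, -0.01, 0.04]              # one element of each local update
x = [a * d for a, d in zip(weights, local)]
g = [1e-9, 1e-9, 1e-9]                   # hypothetical large-scale gains
h = [rayleigh_coefficient(rng) for _ in x]
estimate, rho = aircomp_slot(x, g, h, p_max=0.1, noise_var=1e-13,
                             weight_sum=sum(weights), rng=rng)
print(estimate, sum(x) / sum(weights))   # noisy estimate vs. ideal average
```

Because the weakest channel relative to its symbol sets `rho`, a single deeply faded client lowers the received signal level for everyone, which in turn scales up the effective noise in the estimate.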
III Transmit Power Control in AirComp FL for Differential Privacy
We design the transmit power control for AirComp-based FL to achieve differential privacy. Before providing details, we recall the definition of differential privacy [11], which characterizes randomized mechanisms that output a desired value computed over a dataset and perturbed with random noise. The definition is as follows:
Definition 1 (Differential Privacy): A randomized mechanism is (ε, δ)-differentially private if, for any pair of adjacent datasets¹ and any subset of possible outcomes, we obtain
In the above definition, the range of the mechanism is the set of all its possible outcomes. The parameters ε and δ represent the similarity between the distributions of the outcomes of the randomized mechanism performed over two adjacent datasets, and are interpreted as the privacy level: lower ε and δ account for a higher privacy level.
¹ In the FL context, "adjacent" means that one dataset can be formed by adding or removing all examples associated with a single client from the other dataset.
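For reference, the standard form of (ε, δ)-differential privacy from [11] can be written as follows. The symbols (mechanism M, adjacent datasets D and D′, outcome set S) are chosen here for illustration and may differ from the notation of the paper's own equation.

```latex
\Pr\!\left[\mathcal{M}(\mathcal{D}) \in \mathcal{S}\right]
  \le e^{\epsilon}\,\Pr\!\left[\mathcal{M}(\mathcal{D}') \in \mathcal{S}\right] + \delta,
\qquad \forall\, \mathcal{S} \subseteq \mathrm{Range}(\mathcal{M}).
```

With δ = 0, this reduces to pure ε-differential privacy; δ > 0 allows the bound to fail with a small probability.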
The computation in (7) is also a randomized mechanism, and its noise perturbation can be controlled via the power-scaling factor. Hence, an appropriate design of the power-scaling factor realizes (ε, δ)-differential privacy. In the following discussion, we set the power-scaling factor such that differential privacy with the target privacy level (ε, δ) is preserved. Note that because the power-scaling factor and the transmit power are proportional to each other, we use “setting the power-scaling factor” and “transmit power control” interchangeably hereinafter.
III-A Update Clipping
In the computation discussed above, the query sensitivity, which must be bounded to preserve differential privacy, has no limit. The sensitivity generally refers to the maximum effect that the participation of a single data holder can have on the desired computation result. In the case of the computation in (7), the query sensitivity is given by
A solution to this issue is update clipping, which has been proposed for non-AirComp FL [12, 13]. In update clipping, the weighted local update of each client is bounded by a threshold as follows:
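The sketch below illustrates why clipping bounds the sensitivity. It assumes, as a simplification, that each time slot carries one scalar weighted update element that is clipped to [−C, C]; a vector-wise L2-norm variant, common in differentially private FL, is included for comparison. All function names are hypothetical.

```python
import math


def clip_scalar(value, threshold):
    """Clip one weighted update element to [-threshold, threshold]."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return max(-threshold, min(threshold, value))


def clip_l2(vector, threshold):
    """Rescale a whole update vector so that its L2 norm is at most threshold."""
    norm = math.sqrt(sum(v * v for v in vector))
    scale = 1.0 if norm <= threshold else threshold / norm
    return [v * scale for v in vector]


def clipped_sum(values, threshold):
    """Sum of per-client contributions after scalar clipping."""
    return sum(clip_scalar(v, threshold) for v in values)


# Removing any single client changes the clipped sum by at most `threshold`,
# however large that client's raw update is.
threshold = 1.0
raw = [0.3, -2.5, 4.0, 0.9]
full = clipped_sum(raw, threshold)
changes = [abs(full - clipped_sum(raw[:i] + raw[i + 1:], threshold))
           for i in range(len(raw))]
print(max(changes))
print(clip_l2([3.0, 4.0], threshold))  # norm 5 is rescaled to norm 1
```

Without clipping, the client with the raw value 4.0 would shift the sum by 4.0; after clipping, no client can shift it by more than the threshold, which is what makes a finite noise level sufficient.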
III-B Power-Scaling Factor Constraint for Differential Privacy
Based on the aforementioned update clipping, we adaptively choose the power-scaling factor so that the computation in (7) is (ε, δ)-differentially private for the target privacy level (ε, δ). First, we provide a condition under which the computation in (7) is (ε, δ)-differentially private:
Lemma 1 (Power-scaling factor constraint for differential privacy): Given the target privacy level (ε, δ), the computation in (7) is (ε, δ)-differentially private if
Proof: From the reproductive property of the Gaussian distribution, the second term in (7) follows a zero-mean Gaussian distribution whose variance is determined by the power-scaling factor. Given that the query sensitivity is bounded through update clipping, the computation in (7) is (ε, δ)-differentially private if:
By solving this condition for the power-scaling factor, we obtain (11). ∎
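As a sketch of how a condition of the form (11) can arise, suppose (hypothetical notation) that (7) reads ŵ = (1/A) Σ_k x_k + n_R / (√ρ A), where A is the sum of the aggregation weights (treated as a fixed, known normalizer), x_k is client k's clipped weighted update with |x_k| ≤ C, ρ is the power-scaling factor, and n_R ~ N(0, σ²/2) is the real part of a complex receiver noise with variance σ². Applying the classical Gaussian mechanism of [11], which assumes 0 < ε < 1, gives the following; the exact constants in (11) may differ.

```latex
\begin{aligned}
\Delta &\le \frac{C}{A}
  && \text{(one client's clipped term, } |x_k| \le C\text{)} \\
\sigma_{\mathrm{eff}} &= \frac{\sigma}{\sqrt{2\rho}\,A}
  \ge \frac{C}{A}\,\frac{\sqrt{2\ln(1.25/\delta)}}{\epsilon}
  && \text{(Gaussian mechanism, } 0<\epsilon<1\text{)} \\
\Longleftrightarrow\; \rho &\le \frac{\epsilon^{2}\sigma^{2}}{4\,C^{2}\ln(1.25/\delta)} .
\end{aligned}
```

Under these assumptions, the normalizer A cancels, so the admissible power-scaling factor depends only on ε, δ, σ², and C: a stricter target (smaller ε or δ) forces a smaller ρ, i.e., lower transmit power and relatively more noise.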
By setting the power-scaling factor to satisfy both (5) and (11), we can achieve (ε, δ)-differential privacy in the computation in (7). Note that the power-scaling factor is set to the maximum value satisfying these two constraints, to enhance the received SNR.
(Transmit power control preserving differential privacy) If the power-scaling factor is set as
then the computation in (7) is (ε, δ)-differentially private.
III-C Modified Transmit Power Control Anonymizing Client Updates
When the power-scaling factor is set as in (12) to perform channel inversion, each client requires information on the contributions of all clients, because the power-scaling factor in (12) depends on their weighted local updates. However, this contradicts our objective of anonymizing the clients' contributions.
Motivated by this, we modify the strategy for setting the power-scaling factor under the maximum power constraint (5) such that information on the clients' weighted local updates is not required. Update clipping ensures that the magnitude of every client's weighted local update is at most the clipping threshold, so the corresponding term in (5) can be bounded using the threshold alone. Hence, if the power-scaling factor satisfies the following condition, the constraint (5) is satisfied for all clients:
(Transmit power control preserving differential privacy without information on the weighted local updates of all clients) If the power-scaling factor is set as
then the computation in (7) is (ε, δ)-differentially private.
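Both power-scaling rules can be sketched as the minimum of a maximum-power term and a differential-privacy term. The notation is hypothetical: `gains[k]` is client k's effective channel power gain (large-scale gain times fading power), `x[k]` its clipped weighted update, and `clip_threshold` the clipping threshold C. The DP term assumes the classical-Gaussian-mechanism form ε²σ²/(4C² ln(1.25/δ)), which may differ from the exact constants in (11); all function names and numeric values are hypothetical.

```python
import math


def rho_dp(eps, delta, noise_var, clip_threshold):
    """Largest rho meeting the assumed Gaussian-mechanism condition."""
    if not (0.0 < eps < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classical Gaussian mechanism assumes 0 < eps, delta < 1")
    return eps ** 2 * noise_var / (4.0 * clip_threshold ** 2 * math.log(1.25 / delta))


def rho_with_updates(x, gains, p_max, eps, delta, noise_var, clip_threshold):
    """Analogous to the rule in (12): needs every client's clipped update."""
    # Clients sending 0 transmit nothing and impose no power constraint.
    rho_max = min((p_max * gk / xk ** 2 for xk, gk in zip(x, gains) if xk != 0.0),
                  default=math.inf)
    return min(rho_max, rho_dp(eps, delta, noise_var, clip_threshold))


def rho_anonymous(gains, p_max, eps, delta, noise_var, clip_threshold):
    """Uses only the threshold, since |x[k]| <= clip_threshold after clipping."""
    rho_max = min(p_max * gk / clip_threshold ** 2 for gk in gains)
    return min(rho_max, rho_dp(eps, delta, noise_var, clip_threshold))


gains = [2e-10, 5e-10, 1e-9]   # hypothetical effective gains g_k * |h_k|^2
x = [0.5, -1.0, 0.2]           # hypothetical clipped updates with C = 1.0
params = dict(p_max=0.01, eps=0.9, delta=1e-5, noise_var=1e-9, clip_threshold=1.0)
rho_u = rho_with_updates(x, gains, **params)
rho_a = rho_anonymous(gains, **params)
print(rho_u, rho_a)  # the anonymous rule is never less conservative
```

Replacing each x_k² by C² can only shrink the maximum-power term, so the anonymous rule still respects every client's power limit; the price is a possibly smaller power-scaling factor, and hence a lower received SNR, when the power limit is the binding constraint.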
IV Analysis of Tradeoff between SNR and Privacy Level
Under the designed transmit power control, there is a tradeoff between the received SNR and the privacy level (ε, δ), which we derive analytically. Because a higher received SNR generally leads to better FL model performance, this analytical result provides insight into the difficulty of enhancing the performance of models learned with the designed AirComp-based FL while preserving differential privacy with a higher privacy level.
The tradeoff between the received SNR and the privacy level can be obtained by deriving an upper bound on the received SNR under the designed transmit power control, as follows:
Proposition 2 (SNR-privacy-level tradeoff): The received SNR under the designed transmit power control is bounded by
Proof: See the Appendix. ∎
The SNR bound in (15) decreases monotonically as the target privacy level increases (i.e., as ε and δ decrease), which quantifies the tradeoff between the received SNR and the privacy level.
It is also notable that the number of clients is the dominant factor for enhancing the received SNR while preserving a higher privacy level, i.e., smaller values of ε and δ. This can be shown using a first-order Taylor approximation of the upper bound of the received SNR in (15). In the high-privacy regime, the upper bound of the received SNR can be approximated by:
For a given target privacy level, this approximation depends only on the number of clients and not on other wireless parameters, such as the maximum transmit power. Hence, the number of clients is a key factor for realizing a higher SNR while preserving a higher privacy level.
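To make this behavior concrete, the sketch below evaluates a bound derived under a hypothetical model, not necessarily the exact expressions in (15) and (16). It assumes K clients with clipped updates |x_k| ≤ C, a received SNR of 2ρ(Σ_k x_k)²/σ² ≤ 2ρK²C²/σ² (signal against the real part of a complex noise with variance σ²), a power-scaling factor ρ = min(ρ_DP, P_max Z / C²) with ρ_DP = ε²σ²/(4C² ln(1.25/δ)), and Z = min_k g_k|h_k|² exponentially distributed with mean μ = 1/Σ_k(1/g_k). Averaging over Z gives (2K²P_max μ/σ²)(1 − exp(−ε²σ²/(4 ln(1.25/δ) P_max μ))), and the first-order expansion 1 − e^(−x) ≈ x reduces it to K²ε²/(2 ln(1.25/δ)). All numeric values are hypothetical.

```python
import math


def snr_bound(k, eps, delta, p_max, mu, noise_var):
    """Average received-SNR bound under the assumed model (linear scale)."""
    x = eps ** 2 * noise_var / (4.0 * math.log(1.25 / delta) * p_max * mu)
    return 2.0 * k ** 2 * p_max * mu / noise_var * (-math.expm1(-x))  # 1 - e^(-x)


def snr_bound_taylor(k, eps, delta):
    """First-order approximation: depends on k, eps, and delta only."""
    return k ** 2 * eps ** 2 / (2.0 * math.log(1.25 / delta))


def to_db(value):
    return 10.0 * math.log10(value)


eps, delta, noise_var = 0.1, 1e-5, 1e-13   # hypothetical values
g_common = 1e-9                            # hypothetical equal large-scale gain
for k in (5, 100):
    mu = g_common / k                      # 1 / sum_k (1 / g_k) with equal gains
    for p_max in (0.01, 1.0):              # 10 dBm and 30 dBm in watts
        exact_db = to_db(snr_bound(k, eps, delta, p_max, mu, noise_var))
        approx_db = to_db(snr_bound_taylor(k, eps, delta))
        print(k, p_max, round(exact_db, 2), round(approx_db, 2))
```

In this model, raising the maximum transmit power by 20 dB leaves the bound essentially unchanged, whereas going from 5 to 100 clients raises it by 20·log10(20) ≈ 26 dB, mirroring the qualitative claim above.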
V Numerical Evaluation
Clients and BS setting: The evaluation is performed with all clients placed 100 m away from the BS, i.e., at the same distance for every client.
Both the clients and the BS are equipped with omnidirectional antennas; hence, the product of the transmit and receive antenna gains is 0 dBi.
The central frequency is 5.0 GHz.
The variance of the receiver noise is dBm.
The path loss exponent is .
The path loss for a reference unit distance is dB.
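These parameters combine into the large-scale power gain used by channel inversion. The sketch below computes a free-space reference loss at 1 m for 5.0 GHz and the resulting mean gain at 100 m. The paper's reference path loss, path loss exponent, and noise variance are not reproduced here, so the free-space reference and the exponent of 3.0 are hypothetical choices.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def free_space_loss_db(freq_hz, distance_m=1.0):
    """Free-space path loss 20*log10(4*pi*d / wavelength) in dB."""
    wavelength = SPEED_OF_LIGHT / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


def large_scale_gain_db(antenna_gain_db, ref_loss_db, distance_m, exponent):
    """Antenna gain product minus reference loss minus distance-dependent loss."""
    return antenna_gain_db - ref_loss_db - 10.0 * exponent * math.log10(distance_m)


ref_loss = free_space_loss_db(5.0e9)  # about 46.4 dB at 1 m for 5.0 GHz
gain_db = large_scale_gain_db(0.0, ref_loss, 100.0, exponent=3.0)  # hypothetical exponent
gain_linear = 10.0 ** (gain_db / 10.0)  # linear power gain of one client
print(round(ref_loss, 2), round(gain_db, 2), gain_linear)
```

Because all clients are at the same distance in this setup, they share the same large-scale gain, and only their fading differs.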
Data set: We use the well-known MNIST dataset, which consists of handwritten digits from 10 categories, “0” to “9”. The MNIST dataset contains 60000 training samples, which are randomly partitioned into equal shares so that each client holds the same number of training samples.
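A random equal partition of this kind can be sketched as follows. It is an IID split over sample indices; the helper name and seed are hypothetical, and the number of clients is a free parameter (100 is one of the values considered in the evaluation).

```python
import random


def partition_equally(num_samples, num_clients, seed=0):
    """Randomly split sample indices into equal, disjoint shares (IID split)."""
    if num_samples % num_clients != 0:
        raise ValueError("this sketch assumes num_samples divisible by num_clients")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)  # random assignment of samples to clients
    share = num_samples // num_clients
    return [indices[i * share:(i + 1) * share] for i in range(num_clients)]


shares = partition_equally(60000, num_clients=100)
print(len(shares), len(shares[0]))  # 100 600
```

With five clients, the same call yields five shares of 12000 samples each.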
The classifier is a two-layer fully connected neural network with 512 units in each layer, and the last fully connected layer is followed by a softmax output layer. Training minimizes the categorical cross-entropy loss. We use the Adam optimizer with a learning rate of , decay rate parameters and , and a batch size of 32. The model aggregation is performed per epochs for stochastic gradient descent steps. The clipping threshold is set as . The privacy level is set as .
Validity of the analytical received SNR bound: We validate the analytical upper bound of the received SNR. Fig. 2 shows the SNR bound in (15) for five and 100 clients, together with the received SNR measured in the simulation, for . The maximum transmit power is set as dBm. Fig. 2 validates the analysis in the sense that the derived bound is indeed an upper bound of the received SNR measured in the simulation. Moreover, Fig. 2 demonstrates that a higher target privacy level results in a lower received SNR, which agrees with the behavior of the analytical upper bound. This result also supports our statement that there is a tradeoff between the received SNR and the differential privacy level.
We validate the analytical result in (16), i.e., that for a higher target privacy level, the number of clients participating in the computation has a significantly larger impact on the received SNR than other wireless factors. Fig. 3 shows the received SNR in the simulations, along with the analytical result in (16), under the condition of . As an example, Fig. 3 provides the received SNR for two maximum transmit powers, 10 dBm and 30 dBm. As the number of clients increases, the received SNR increases, following the analytical SNR approximation in (16). By contrast, increasing the maximum transmit power has no noticeable impact on the received SNR. These results support our statement that, for a higher target privacy level, the number of clients is the key factor for enhancing the received SNR.
Training performance: We demonstrate that our designed power control achieves a higher differential privacy level than a conventional power control, which sets the power-scaling factor to the maximum value allowed by the constraint (5), while exhibiting comparable training performance when sufficiently many clients participate. Fig. 4 shows the training performance and differential privacy level of the designed and conventional power control² for two different numbers of clients, with the maximum transmit power set as dBm. Note that in the designed power control, the target privacy level is set as . The conventional power control does not achieve the target privacy level of the designed power control, which shows the superiority of the designed power control over the conventional one in terms of the differential privacy level. Regarding training performance, when the number of clients is small, the designed power control performs much worse than the conventional one owing to the aforementioned tradeoff, i.e., the lower received SNR needed to achieve the target privacy level. Meanwhile, when the number of clients is large, the training performance of the designed power control approaches that of the conventional one, owing to the enhanced received SNR.
² The privacy level of the conventional power control is calculated with Lemma 1 by substituting the average power-scaling factor into (11) and then solving for the privacy level.
VI Conclusion
We designed AirComp-based FL that preserves differential privacy with a desired privacy level to protect clients' local data against privacy inference attacks. To this end, we exploited the inherent receiver noise and designed a transmit power control that adjusts the level of noise perturbation in the aggregated global model so that a desired privacy level is achieved. To gain insight into the challenges of achieving a higher privacy level, we derived a closed-form expression of the SNR with respect to the privacy level and quantified the tradeoff between these two metrics. Moreover, the analytical results demonstrate that the number of participating clients is the major factor for enhancing the SNR under this tradeoff, particularly when a higher privacy level is desired. These analytical results were verified through numerical evaluations. The evaluation results also demonstrated that the designed power control can achieve training performance comparable to that of a conventional power control, given a sufficient number of clients, while achieving a higher differential privacy level.
Appendix: Proof of Proposition 2
From (6), the received SNR is given by
where the inequality stems from the update clipping in (10), which bounds the magnitude of each client's weighted local update by the clipping threshold.
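In what follows, we derive the remaining term of this bound. To this end, we first prove the following fact: given that each client's fading power gain follows an exponential distribution with unit mean, the minimum over all clients of these gains, each scaled by a positive client-specific constant, follows an exponential distribution whose mean is the reciprocal of the sum of the reciprocals of those constants. The proof is obtained by deriving the complementary cumulative distribution function (CCDF) of this minimum. For any nonnegative threshold, we have
which is exactly the CCDF of an exponential distribution with the stated mean.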
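This fact can be checked numerically. With independent unit-mean exponentials E_k and positive constants c_k (our notation), the CCDF of the minimum is the product of the individual CCDFs, P(min_k c_k E_k > z) = Π_k exp(−z/c_k) = exp(−z Σ_k 1/c_k), so the mean is 1/Σ_k(1/c_k). The Monte Carlo sketch below compares that mean with a simulated one; the constants and trial count are arbitrary.

```python
import random


def mean_of_scaled_min(scales, trials, seed=0):
    """Monte Carlo mean of min_k scales[k] * E_k with E_k ~ Exp(1) i.i.d."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(c * rng.expovariate(1.0) for c in scales)
    return total / trials


scales = [1.0, 2.0, 4.0]                      # arbitrary positive constants
theory = 1.0 / sum(1.0 / c for c in scales)   # 1 / (1 + 1/2 + 1/4) = 4/7
estimate = mean_of_scaled_min(scales, trials=200_000)
print(theory, estimate)
```

The client with the smallest constant dominates the sum of reciprocals, so the mean of the minimum is always below the smallest individual mean.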
This work was supported in part by JSPS KAKENHI Grant Numbers JP17H03266 and JP18H01442.
[1] H. B. McMahan, E. Moore, D. Ramage, S. Hampson et al., “Communication-efficient learning of deep networks from decentralized data,” in Proc. AISTATS 2017, Fort Lauderdale, FL, USA, Apr. 2017, pp. 1–11.
[2] P. Kairouz, H. B. McMahan, B. Avent, A. Bellet, M. Bennis, A. N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings et al., “Advances and open problems in federated learning,” arXiv preprint arXiv:1912.04977, Dec. 2019.
[3] N. Carlini, C. Liu, J. Kos, Ú. Erlingsson, and D. Song, “The secret sharer: Measuring unintended neural network memorization & extracting secrets,” arXiv preprint arXiv:1802.08232, Jul. 2018.
[4] L. Melis, C. Song, E. De Cristofaro, and V. Shmatikov, “Exploiting unintended feature leakage in collaborative learning,” in Proc. IEEE S&P 2019, San Francisco, CA, USA, May 2019, pp. 691–706.
[5] M. Fredrikson, S. Jha, and T. Ristenpart, “Model inversion attacks that exploit confidence information and basic countermeasures,” in Proc. ACM SIGSAC 2015, Denver, CO, USA, Oct. 2015, pp. 1322–1333.
[6] G. Zhu, Y. Wang, and K. Huang, “Broadband analog aggregation for low-latency federated edge learning,” IEEE Trans. Wireless Commun., vol. 19, no. 1, pp. 491–506, Oct. 2019.
[7] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” IEEE Trans. Wireless Commun., vol. 19, no. 3, pp. 2022–2035, Mar. 2020.
[8] X. Cao, G. Zhu, J. Xu, and K. Huang, “Optimal power control for over-the-air computation,” in Proc. IEEE GLOBECOM 2019, Waikoloa, HI, USA, Dec. 2019, pp. 1–6.
[9] D. Wen, G. Zhu, and K. Huang, “Reduced-dimension design of MIMO AirComp for data aggregation in clustered IoT networks,” in Proc. IEEE GLOBECOM 2019, Waikoloa, HI, USA, Dec. 2019, pp. 1–6.
[10] T. Jiang and Y. Shi, “Over-the-air computation via intelligent reflecting surfaces,” in Proc. IEEE GLOBECOM 2019, Waikoloa, HI, USA, Dec. 2019, pp. 1–6.
[11] C. Dwork, A. Roth et al., “The algorithmic foundations of differential privacy,” Found. Trends Theor. Comput. Sci., vol. 9, no. 3–4, pp. 211–407, Aug. 2014.
[12] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang, “Learning differentially private recurrent language models,” in Proc. ICLR 2018, Vancouver, Canada, Apr. 2018, pp. 1–14.
[13] R. C. Geyer, T. Klein, and M. Nabi, “Differentially private federated learning: A client level perspective,” in Proc. NeurIPS Workshops 2017, Long Beach, CA, USA, Dec. 2017, pp. 1–7.
[14] M. Hao, H. Li, G. Xu, S. Liu, and H. Yang, “Towards efficient and privacy-preserving federated deep learning,” in Proc. IEEE ICC 2019, Shanghai, China, May 2019, pp. 1–6.
[15] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, R. Zhang, and Y. Zhou, “A hybrid approach to privacy-preserving federated learning,” in Proc. ACM AISec 2019, London, UK, Nov. 2019, pp. 1–11.
[16] “Timing in advance.” [Online]. Available: http://4g5gworld.com/blog/timing-advance-ta-lte
[17] A. Goldsmith, Wireless Communications. Cambridge University Press, 2005.
[18] I. Sutskever, J. Martens, G. Dahl, and G. Hinton, “On the importance of initialization and momentum in deep learning,” in Proc. ICML 2013, Atlanta, GA, USA, Jun. 2013, pp. 1139–1147.