Wireless Federated Learning with Local Differential Privacy

02/12/2020 ∙ by Mohamed Seif, et al.

In this paper, we study the problem of federated learning (FL) over a wireless channel, modeled by a Gaussian multiple access channel (MAC), subject to local differential privacy (LDP) constraints. We show that the superposition nature of the wireless channel provides a dual benefit of bandwidth efficient gradient aggregation, in conjunction with strong LDP guarantees for the users. We propose a private wireless gradient aggregation scheme, which shows that when aggregating gradients from K users, the privacy leakage per user scales as O(1/√(K)) compared to orthogonal transmission in which the privacy leakage scales as a constant. We also present analysis for the convergence rate of the proposed private FL aggregation algorithm and study the tradeoffs between wireless resources, convergence, and privacy.

1 Introduction

This work was supported by US NSF through grants CAREER 1651492, CNS 1715947, and by the Keysight Early Career Professor Award.

Federated learning (FL) [1] is a framework that enables multiple users to jointly train a learning model. In prototypical FL, a central server interacts with multiple users to train an ML model iteratively: users compute gradients for the model on their local datasets, and these gradients are then exchanged for model updates. Several factors motivate the surging popularity of FL: a) centralized approaches can be inefficient in terms of storage and computation, whereas FL naturally parallelizes training and can leverage the increasing computational power of devices; and b) local data at each user is never shared; only gradient computations from each user are collected. Although local data is never shared in FL, even exchanging gradients in raw form can leak information, as shown in recent works [2, 3, 4].

Motivated by these factors, there has been a recent surge in designing FL algorithms with rigorous privacy guarantees. Differential privacy (DP) [5] has been adopted as a de facto standard notion for private data analysis and aggregation. Within the context of FL, the notion of local differential privacy (LDP), in which a user locally perturbs the data before disclosing it to an untrusted data curator/aggregator [6], is more suitable. LDP has already been adopted in current applications, including Google’s RAPPOR [7] for website browsing history aggregation, and by Microsoft for privately collecting telemetry data [8]. There have been several research efforts to design FL algorithms satisfying LDP [9, 10, 11, 12, 13, 14, 15]. While LDP provides stronger privacy guarantees (compared to a centralized solution), this comes at the cost of lower utility. In particular, to achieve the same level of privacy attained by a centralized solution, a significantly higher amount of noise/perturbation is needed [16, 17, 18, 19, 20].

Another parallel recent trend is to study the feasibility of FL over wireless channels. As the prototypical computation for FL training involves gradient aggregation from multiple users, the superposition property of the wireless channel can naturally support this operation much more efficiently. This has led to several recent works [21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31] under the umbrella of FL at the wireless edge, where distributed users interact with a parameter server (PS) over a shared wireless medium for training ML models. Several methodologies have been proposed to study wireless FL, which can be broadly categorized into either digital or analog aggregation schemes. In digital schemes, quantized gradients from each user are individually transmitted to the PS using orthogonal transmission. For analog schemes, on the other hand, the gradient computations are rescaled and transmitted directly over the air by all users simultaneously. The superposition nature of the wireless medium makes analog schemes more bandwidth efficient compared to digital ones.

In this paper, we focus on the following question: Can the superposition property of the wireless channel also be beneficial for privacy? If yes, how can we optimally utilize the wireless resources, and what are the tradeoffs between convergence of FL training, wireless resources, and privacy?

Main Contributions: In this paper, we consider the problem of FL training over a flat-fading Gaussian multiple access channel (MAC), subject to LDP constraints. We propose and study analog aggregation schemes, in which each user transmits a linear combination of a) local gradients and b) artificial Gaussian noise, subject to power constraints. The local gradients are processed as a function of the channel gains to align the resulting gradients at the PS, whereas the artificial noise parameters are selected to satisfy the privacy constraints. We show that the privacy leakage per user scales as $O(1/\sqrt{K})$, compared to orthogonal transmission in which the privacy leakage scales as a constant. We also provide the privacy-convergence trade-offs for smooth and convex loss functions through convergence analysis of the distributed gradient descent algorithm. We show that the training error decreases as the number of users increases and converges to that of the centralized algorithm in which all data points are available at the PS. To the best of our knowledge, this is the first result on wireless FL with LDP constraints.

2 System Model & Problem Statement

Figure 1: Illustration of the private wireless FL framework. Users collaborate with the PS to jointly train a machine learning model over a Gaussian MAC. The interaction between the users and the PS must satisfy local differential privacy (LDP) constraints for each user.

Wireless Channel Model: We consider a single-antenna wireless FL system with $K$ users and a central PS as shown in Fig. 1. The input-output relationship at time $t$ is

$y_t = \sum_{k=1}^{K} h_k x_{k,t} + m_t$, (1)

where $x_{k,t}$ is the signal transmitted by user $k$ at time $t$, and $y_t$ is the received signal at the PS. Here, $h_k$ is the complex-valued channel coefficient between the $k$-th user and the PS, and $m_t$ is the independent zero-mean, unit-variance additive white Gaussian noise (AWGN). The channel coefficients are assumed to be time invariant, and each user $k$ can transmit subject to a maximum power constraint of $P_k$. Each user is assumed to know its local channel gain, whereas we assume that the PS has global channel state information.

Federated Learning Problem: Each user has a private local dataset of size data points, denoted as , where is the -th data point and is the corresponding label at user . Users communicate with the PS through the Gaussian MAC described above in order to train a model by minimizing the loss function , i.e.,

where

is the parameter vector to be optimized,

is the loss function for user , and denotes the entire dataset used for training. The minimization of is carried out iteratively through a distributed gradient descent (GD) algorithm. More specifically, in the -th training iteration, the PS broadcasts the global parameter vector from the last iteration to all users. Each user computes his local gradient over the local data points, i.e., and sends back the computed gradient to the PS. For the scope of this paper, we assume that , therefore . The global parameter is updated according to

(2)

where is the learning rate of the distributed GD algorithm at iteration . The iteration process continues until convergence.
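The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the squared-error loss, the constant learning rate, and the names `local_gradient` and `distributed_gd` are hypothetical choices made only for this example. It shows that, with equal dataset sizes, averaging the users' local gradients reproduces the gradient over the pooled data.

```python
import numpy as np

def local_gradient(w, X, y):
    """Gradient of 0.5 * mean((X w - y)^2) on one user's local data (assumed loss)."""
    return X.T @ (X @ w - y) / len(y)

def distributed_gd(user_data, dim, num_iters=500, lr=0.1):
    """Each iteration: broadcast w, average the users' local gradients, update w."""
    w = np.zeros(dim)
    for _ in range(num_iters):
        grads = [local_gradient(w, X, y) for X, y in user_data]
        # Equal dataset sizes, so the global gradient is the plain average.
        w = w - lr * np.mean(grads, axis=0)
    return w

# Hypothetical synthetic setup: K users, noiseless linear labels.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
K, n_per_user = 4, 50
user_data = []
for _ in range(K):
    X = rng.normal(size=(n_per_user, 3))
    user_data.append((X, X @ w_true))

w_hat = distributed_gd(user_data, dim=3)
```

No channel noise or privacy perturbation is modeled here; those enter through the transmission scheme of Section 3.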

In addition, the gradient descent (GD) algorithm for wireless FL should also satisfy local differential privacy (LDP) constraints for each user, as defined next.

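Definition 1.

($(\epsilon, \delta)$-LDP [32]) A randomized mechanism $\mathcal{M}$ is $(\epsilon, \delta)$-LDP if for any pair of inputs $x, x'$ and any measurable subset $\mathcal{O}$ of the output range of $\mathcal{M}$, we have

$\Pr(\mathcal{M}(x) \in \mathcal{O}) \leq e^{\epsilon} \Pr(\mathcal{M}(x') \in \mathcal{O}) + \delta$. (3)

The case of $\delta = 0$ is called pure $\epsilon$-LDP.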

Problem Statement. The main goal of this paper is to explore the benefits of wireless gradient aggregation for privacy in FL. In addition, we investigate tradeoffs between the convergence rate of GD, wireless channel conditions and resources (such as power, SNR), subject to the privacy budgets of the users.

3 Main Results & Discussions

In this section, we present a general gradient aggregation scheme for wireless FL, in which each user transmits a linear combination of its local gradients and artificial noise. We then specialize this scheme so that the gradient-carrying part of each transmission is aligned at the PS. We analyze this scheme and obtain the privacy leakage under LDP for each user as a function of the wireless channel conditions and the transmission parameters. Finally, we present the convergence rate of the private FL algorithm and maximize it by optimizing each user's local privacy perturbation.

3.1 FL Transmission Scheme over Gaussian MAC

The overall FL scheme consists of training iterations, where each iteration comprises of uses of the wireless channel described in (1). At each iteration , each user transmits the computed gradient vector together with additive Gaussian noise for privacy. In particular, the transmitted signal of user at iteration is given as:

(4)

Here, each user performs local phase correction (i.e., input is multiplied by ) so that the received channel coefficient is non-negative, i.e., . We assume that the gradient vectors have a bounded norm, i.e., , and normalize the gradient vector by . Here, denotes the fraction of power dedicated to the gradient vector , whereas is the fraction of power dedicated to artificial Gaussian noise , whose elements are i.i.d., and drawn from . These parameters satisfy so that the maximum power constraint of is satisfied. From (1) and (4), the received signal at the PS can be written as:

(5)

where is the independent Gaussian noise, whose elements are i.i.d. drawn from

. In order to carry out the summation of the local gradients over-the-air, and receive an unbiased estimate of the true aggregated gradient, all users pick the coefficients

s in order to align their transmitted local gradient estimates. Specifically, user picks so that

(6)

where is a constant. From (6), we obtain , and using the fact that , for all , we can upper bound the constant as follows: . To maximize the signal power of the aligned gradient, we choose to match this upper bound, i.e.,

(7)

Plugging this back in (6), we obtain the choice of as

(8)

The above choice shows that alignment of gradients is effectively limited by the user with the worst effective SNR, i.e., . For the alignment scheme described above, the received signal by the PS in iteration in (5) simplifies to:

(9)

The PS subsequently performs post-processing on as follows:

(10)

where is the effective noise at the PS, and . Thus, we can write . As is zero mean, is an unbiased estimate of , with variance of being equal to .
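The alignment and post-processing steps can be illustrated with a small simulation. This is a sketch under assumptions, since the symbols of (4)-(10) are not reproduced here: `h_abs` stands for the non-negative channel gains after phase correction, `P` for the power budgets, `alpha` and `beta` for the power fractions given to the gradient and to the artificial noise, `L` for the gradient-norm bound, and `c` for the common aligned amplitude. All names are hypothetical and the signals are real-valued for simplicity.

```python
import numpy as np

def align_power_fractions(h_abs, P):
    """Return the aligned amplitude c and each user's gradient power fraction."""
    c = np.min(h_abs * np.sqrt(P))   # limited by the weakest effective channel
    alpha = c**2 / (h_abs**2 * P)    # solves h_abs * sqrt(alpha * P) = c
    return c, alpha

def over_the_air_round(grads, h_abs, P, L, beta, noise_std, rng):
    """Simulate one aggregation round; return the PS estimate of the mean gradient."""
    K, d = grads.shape
    c, alpha = align_power_fractions(h_abs, P)
    assert np.all(alpha + beta <= 1 + 1e-12), "power budget exceeded"
    # Each user sends its normalized gradient plus artificial Gaussian noise.
    tx = (np.sqrt(alpha * P)[:, None] * grads / L
          + np.sqrt(beta * P)[:, None] * rng.normal(size=(K, d)))
    # The channel superimposes all transmissions and adds receiver noise.
    rx = (h_abs[:, None] * tx).sum(axis=0) + noise_std * rng.normal(size=d)
    # Post-processing: undo the common scaling and average over the K users.
    return rx * L / (c * K)

rng = np.random.default_rng(1)
K, d, L = 5, 3, 1.0
h_abs = np.array([0.5, 1.0, 1.5, 0.8, 2.0])
P = np.full(K, 4.0)
grads = rng.normal(size=(K, d))
grads /= np.maximum(1.0, np.linalg.norm(grads, axis=1, keepdims=True) / L)  # enforce norm <= L
c, alpha = align_power_fractions(h_abs, P)
g_hat = over_the_air_round(grads, h_abs, P, L, beta=1 - alpha, noise_std=1.0, rng=rng)
```

In this example the user with the smallest value of `h_abs * sqrt(P)` spends its whole budget on the gradient, while the stronger users have power left over for artificial noise.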

3.2 Local Differential Privacy Analysis

We next analyze the privacy level achieved by the transmission scheme for each user, as per the definition of LDP. Recall that the local perturbation noise is drawn from a Gaussian distribution. This well-known technique, known as the Gaussian mechanism, can provide rigorous privacy guarantees based on LDP, as defined next.

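Definition 2.

(Gaussian Mechanism - Appendix A of [32]) Suppose a user wants to release a function $f(x)$ of an input $x$ subject to $(\epsilon, \delta)$-LDP. The Gaussian release mechanism is defined as:

$\mathcal{M}(x) = f(x) + \mathcal{N}(0, \sigma^2 I)$. (11)

If the sensitivity of the function is bounded by $\Delta_f$, i.e., $\|f(x) - f(x')\|_2 \leq \Delta_f$, $\forall x, x'$, then for any $\delta \in (0, 1]$, the Gaussian mechanism satisfies $(\epsilon, \delta)$-LDP, where

$\epsilon = \frac{\Delta_f}{\sigma} \sqrt{2 \log\left(\frac{1.25}{\delta}\right)}$. (12)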

In the next theorem, we make use of the above result and present the per-user privacy achieved by the proposed wireless FL scheme as a function of the users' noise power allocation parameters, transmit powers, and channel coefficients.

Theorem 1.

For each user , the proposed transmission scheme achieves -LDP per iteration, where

(13)
Proof.

The final received signal at the PS from (9) can be expressed as: . We first observe that the variance of the effective Gaussian noise, i.e., variance of is . In order to invoke the result of the Gaussian mechanism, we next obtain a bound on the sensitivity for user . To bound the local sensitivity of , consider any two different local datasets and at user , while fixing the datasets (and thus the gradients) of the remaining users. The local sensitivity of user can then be bounded as

(14)

where in step (a), we used the fact that , and (b) follows from (7). Hence, using the sensitivity bound in (14) together with the variance in (12), we arrive at the proof of Theorem 1.

Remark 1.

From Theorem 1, we can observe the privacy benefits of wireless gradient aggregation. We can further upper bound the achievable in Theorem 1 as follows:

which shows that asymptotically, the per-user privacy level behaves like $O(1/\sqrt{K})$. In contrast, privacy achieved by orthogonal transmission can be shown to be:

(15)

which scales as a constant, and does not decay with $K$.
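The scaling in Remark 1 can be checked numerically with the standard Gaussian-mechanism formula. This is a simplified sketch, not the expression of Theorem 1: it assumes every user contributes the same received artificial-noise variance, and the names `signal_amp` (the aligned received gradient amplitude), `user_noise_var`, and `channel_noise_var` are hypothetical.

```python
import math

def gaussian_mechanism_epsilon(sensitivity, noise_std, delta):
    """Gaussian mechanism: epsilon = (sensitivity / noise_std) * sqrt(2 ln(1.25 / delta))."""
    return (sensitivity / noise_std) * math.sqrt(2.0 * math.log(1.25 / delta))

def per_user_epsilon(K, signal_amp, user_noise_var, channel_noise_var, delta,
                     superposed=True):
    """Per-user epsilon when all users add the same received noise variance (assumption)."""
    # With superposition the PS only sees the sum, so all K noises protect each user.
    # With orthogonal transmission only the user's own noise does.
    contributing = K if superposed else 1
    total_std = math.sqrt(contributing * user_noise_var + channel_noise_var)
    # Changing one user's data moves its normalized gradient term by at most 2 * signal_amp.
    return gaussian_mechanism_epsilon(2.0 * signal_amp, total_std, delta)

for K in (4, 16, 64):
    eps_air = per_user_epsilon(K, 1.0, 1.0, 0.0, 1e-5)
    eps_orth = per_user_epsilon(K, 1.0, 1.0, 0.0, 1e-5, superposed=False)
    print(K, round(eps_air, 3), round(eps_orth, 3))
```

Under these assumptions, quadrupling the number of users halves the superposed leakage, while the orthogonal value is unchanged.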

Remark 2.

While Theorem 1 shows the per-iteration leakage, we can use advanced composition results for LDP using the Gaussian mechanism to obtain the total privacy leakage when the wireless FL algorithm is used for iterations. Using existing results in [33], it can be readily shown that the total leakage over iterations (per-user) of the proposed scheme is -LDP for where,

(16)

Fig. 2 illustrates the total per-user privacy leakage as a function of the number of users $K$ for various numbers of training iterations. As is clearly evident, the leakage of wireless FL goes asymptotically to $0$ as $K \to \infty$.
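To give a feel for how per-iteration leakage accumulates, the sketch below applies the textbook advanced composition bound for differential privacy. This is an assumption for illustration only: it is not necessarily the bound (16) taken from [33], and the names `advanced_composition` and `delta_slack` are hypothetical.

```python
import math

def advanced_composition(eps, delta, T, delta_slack):
    """Total (epsilon, delta) after T adaptive uses of an (eps, delta) mechanism."""
    eps_total = (math.sqrt(2.0 * T * math.log(1.0 / delta_slack)) * eps
                 + T * eps * math.expm1(eps))   # expm1(eps) = e^eps - 1
    delta_total = T * delta + delta_slack
    return eps_total, delta_total

# Smaller per-iteration leakage (for example from more users) lowers the total.
for eps in (0.2, 0.1, 0.05):
    total, _ = advanced_composition(eps, delta=1e-6, T=100, delta_slack=1e-5)
    print(eps, round(total, 3))
```

For small per-iteration leakage the first term dominates, so the total grows roughly with the square root of the number of iterations rather than linearly.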

Figure 2: Total per-user privacy leakage as a function of the number of users $K$, for different numbers of training iterations.

3.3 Convergence rate of private FL

We next analyze the performance of private wireless FL under the assumption that the global loss function is smooth and strongly convex. Due to privacy requirements and noisy nature of wireless channel, the convergence rate is penalized as shown in the following Theorem.

Theorem 2.

Suppose the loss function is -strongly convex and -smooth with respect to . Then, for a learning rate and a number of iterations , the convergence rate of the private wireless FL algorithm is

(17)

Theorem 2 is proved in Appendix I. We next show that artificial noise parameters can be optimized to maximize the convergence rate in (17) while satisfying a desired privacy level -LDP at each user.

Theorem 3.

The optimized convergence rate of the private wireless FL algorithm is given as follows:

(18)

where where , ,
, and .

Proof.

Maximizing the convergence rate in (17) is equivalent to minimizing the term that depends on . Therefore, we solve the following optimization problem:

For given target privacy levels , this is feasible when

We design as follows:

(19)

where , , and . As seen in Fig. 3, we first rank the left-over powers from the users after aligning the gradients, i.e., in an ascending order. We then allocate the powers such that a subset of users satisfies , to satisfy privacy constraints. This completes the proof of Theorem 3. ∎

Figure 3: An example for the iterative solution: , .

4 Simulation Results

In this section, we provide some simulation results to assess the performance of the private wireless FL model. We consider a linear regression task on a synthetic dataset. The regularized loss function at the $k$-th user is given as:

(20)

Our synthetic dataset consists of 3000 i.i.d. samples drawn from , where , and . We assume that each user has data points. For the GD algorithm, the regularization parameter is and training iterations. The channel coefficients are drawn from , and the channel noise variance is set to . Also, we assume that each user requires the same privacy level -LDP.

In Fig. 4(a), we show the impact of the number of users on the training loss for dBm for all . As we increase the number of users, the training loss decays faster with . In Fig. 4(b), we compare with the private orthogonal scheme for iterations and dBm for all . Interestingly, the non-orthogonal scheme is more efficient in terms of the bandwidth and accuracy. In Fig. 4(c), we show the impact of the transmit power on the training loss where the error decays faster with as we increase the transmit power.

Figure 4: Impact of a) number of users, b) orthogonal vs non-orthogonal transmission, and c) transmit power, on the training loss as a function of iterations. As we see from the figures, as increases, the variance term due to the local privacy perturbation and the noisy channel becomes dominant.

5 Conclusion & Future Directions

We studied the problem of wireless federated learning subject to local differential privacy (LDP) constraints. We showed that the wireless channel provides a dual benefit of bandwidth efficiency together with strong LDP guarantees. Using the proposed wireless aggregation scheme, privacy leakage was shown to scale as $O(1/\sqrt{K})$, compared to orthogonal transmission in which the privacy leakage scales as a constant. We also analyzed and optimized the convergence rate of the proposed private FL training algorithm and studied the tradeoffs between wireless resources, convergence, and privacy.

There are several interesting directions for future work, such as generalization to multiple antennas at the users and the PS. In the proposed scheme, all users align their gradients, so the effective SNR is limited by the user with the worst channel conditions. A possible direction would be to explore generalizations of this scheme that select and align gradients from smaller subsets of users.

Appendix I: Proof of Theorem 2

To prove the convergence rate of the proposed algorithm, we recall that the gradient estimate at the PS in (10) satisfies: (a) Unbiasedness, i.e.,

, since the total additive noise is zero mean; and (b) Bounded second moment,

, which we prove as follows:

(21)

where (a) follows from the fact that , (b) follows from Cauchy-Schwarz inequality, and (c) from the assumption that , i.e., the Lipschitz constant . We next invoke standard results [34] on convergence of SGD for -smooth and -strongly convex loss, which states

(22)

Plugging from (21) in (22), we arrive at Theorem 2.

References