Federated Generative Adversarial Learning

05/07/2020
by   Chenyou Fan, et al.

This work studies training generative adversarial networks under the federated learning setting. Generative adversarial networks (GANs) have achieved advances in various real-world applications, such as image editing, style transfer, scene generation, etc. However, like other deep learning models, GANs suffer from limited data in real cases. To boost the performance of GANs on target tasks, collecting as many images as possible from different sources becomes not only important but also essential. For example, to build a robust and accurate biometric verification system, huge amounts of images might be collected from surveillance cameras and/or uploaded from cellphones by users who accept the relevant agreements. In an ideal case, utilizing all the data uploaded from public and private devices for model training is straightforward. Unfortunately, in real scenarios this is hard for a few reasons. First, some data face serious leakage concerns, and it is therefore prohibitive to upload them to a third-party server for model training. Second, images collected by different kinds of devices likely have distinctive biases due to various factors, e.g., collector preferences and geo-location differences, a problem also known as "domain shift". To handle these problems, we propose a novel generative learning scheme that utilizes a federated learning framework. Following the configuration of federated learning, we conduct model training and aggregation on one center and a group of clients. Specifically, our method learns distributed generative models on the clients, while the models trained on each client are fused into one unified and versatile model at the center. We perform extensive experiments to compare different federation strategies, and empirically examine the effectiveness of federation under different levels of parallelism and data skewness.


1 Introduction

Figure 1: The task of generative learning under the federated learning scheme. To preserve data privacy, remote devices periodically exchange only model weights with a central server to learn a global model. No data are exchanged at any stage of communication.

Traditional machine learning methods require gathering training data into a central database and performing centralized training. However, as more and more edge devices such as smartphones, wearable devices, sensors, and cameras connect to the World Wide Web, the data for training a model may be spread across various equipment. Due to privacy concerns, it might not be possible to upload all the needed data to a central node through public communications. How to safely access data on these heterogeneous devices to effectively train models has become an open research problem. Recently, federated learning has become a rapidly developing topic in the research community [19, 33, 16], as it provides a new way of learning models over a collection of highly distributed devices while still preserving data privacy and communication efficiency. Federated learning has witnessed many successful applications in distributed use cases such as smartphone keyboard input prediction [7], health monitoring [21, 29], IoT [32], and blockchain [13].

Although federated learning has been applied successfully to discriminative model learning, how to apply it to generative learning is still under exploration. The Generative Adversarial Network (GAN) [6] is a typical generative model which aims to gain generative capacity based on game theory and deep learning techniques. Under the traditional machine learning framework, GANs have achieved huge successes in applications such as realistic image/video generation [6, 25, 12], face editing [8], and style transfer [28]. Though many efforts [19, 33, 18] have been made to evaluate the performance of classification tasks with federated learning, there is little work assessing whether the existing federated learning framework works for generative learning. However, we observe that in many real-world cases data are naturally distributed for generative learning, e.g., hand-written digits and signatures are stored on billions of mobile devices, facial images are stored on edge devices and IoT participants, etc. It is urgent and necessary to understand whether the federated learning scheme is suitable for learning GANs.

In this paper, we propose a novel method of using the federated learning framework for GAN training and discuss four strategies for synchronizing the local models and the central model. We quantitatively evaluate the effectiveness of each strategy. Furthermore, we extensively study GAN training quality under different data distribution scenarios and examine whether federated GAN training is robust to non-IID data distributions. Our contributions include:

  • We formulate federated generative adversarial learning with full algorithmic details; to the best of our knowledge, this is the first work in this direction.

  • We propose and compare four synchronization strategies for unifying local Generators and Discriminators into central models.

  • We extensively study the training quality with different data distributions on different datasets under our framework.

2 Related work

Recently, federated learning has become a rapidly developing topic in the research community [19, 33, 16], as it provides a new way of learning models over a collection of highly distributed devices while still preserving data privacy and communication efficiency. Federated learning has witnessed many successful applications in distributed use cases such as smartphone keyboard input prediction [7], health monitoring [21, 29], IoT [32], and blockchain [13].

Model averaging has been widely used in distributed machine learning [30, 5, 31]. In distributed settings, each client minimizes a shared learning target (in most cases, a loss function) on its local data, while the server aggregates clients' models by computing a uniform or bootstrap average of local model weights to produce the global model. McMahan et al. [19] extended this learning strategy to the federated learning setting, in which data could be non-IID and communications between clients and servers could be constrained. They proposed the FedAvg method to fuse local client models into a central model, and demonstrated its robustness when applied to deep learning models such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) with IID and moderately non-IID data. Zhao et al. [33] showed that FedAvg may suffer from weight divergence on highly skewed data, and several other works have attempted to make federated learning robust in such cases [24, 17]. Recent work [18] also discussed how to further improve the safety of communications during federated training using Additively Homomorphic Encryption [1].

The Generative Adversarial Network (GAN) [6] aims to learn generative models based on game theory and deep learning techniques. Since its origin, GANs have witnessed huge successes in computer vision applications such as generating realistic images [6, 12, 2] and videos [25]. The Conditional GAN (cGAN) [20] is a natural extension of the GAN which aims to generate images with given labels or attributes, such as human genders [23], image categories [11], and image styles [12].

To the best of our knowledge, the only similar work is a technical report [3] which conceptually mentioned the possibility of using federated learning ideas for generative tasks; however, no further details were provided. Besides, a seemingly related work [9] studied a type of adversarial attack in a collaborative learning environment. Its main purpose was to demonstrate that, by manipulating local training data, attackers could generate adversarial training samples that harm the learning objective of a normal user. This is entirely different from the federated learning setting, in which no unsafe local data exchange happens during client-client or client-server communication.

3 Approach

We consider distributed GAN training on one center and a group of clients within the common communication-efficient federated learning framework [19, 15]. Typically, each client device possesses its own local data with a (usually) biased data distribution; e.g., personal devices are mostly used to take portraits, while surveillance cameras often monitor street views. We aim to train a unified central GAN model that combines the generative capacities of the client models. Yet we prohibit transferring any client data to the center, as such communication is costly and unsafe. In the following sections, we (1) investigate four types of synchronization strategies that arise naturally for federated GAN training, (2) briefly introduce the conditional GAN model's objective function and architecture, and (3) summarize our proposed algorithm.

3.1 Synchronization strategies

FedAvg [19] is a widely used federated learning framework which fuses client models into a central model by averaging the model weights. With FedAvg, clients are required to periodically upload their models (but not data) to the central server for model fusion. FedAvg does not specify whether the central model should be synchronized back to the client models. Very recently, FedMA [26] was proposed to improve FedAvg by asking clients to download the updated global model and reconstruct their local models at the beginning of the next training round.

FedAvg and FedMA considered tasks of using federated learning to train a single CNN or RNN model for classification. In our study, however, training federated GANs is more complicated, as two models (generator and discriminator) exist at the center and at the clients. How to effectively synchronize the central D and G back to the clients, as FedMA does, becomes an open question. We propose four types of synchronization strategies during communications: Sync D&G synchronizes both the central D and G models to each client; Sync G synchronizes the central G model to each client; Sync D synchronizes the central D model to each client; Sync None synchronizes neither G nor D from the center to the clients (i.e., FedAvg). Please see Fig 1 for an illustration of the overall process. These approaches maintain the independence of each client during the local training stage, while enabling information propagation across clients during synchronization between the server and client models.
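To make the four strategies concrete, the following is a minimal PyTorch-style sketch (not the authors' released code) of the server-side federating step; `client_Ds`, `client_Gs`, and the `strategy` flag are hypothetical names, and `fed_avg` performs plain FedAvg-style weight averaging.

```python
import copy
import torch

def fed_avg(state_dicts):
    """Parameter-wise average of a list of model state_dicts (plain FedAvg)."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        avg[key] = stacked.mean(dim=0).to(avg[key].dtype)
    return avg

def federate(central_D, central_G, client_Ds, client_Gs, strategy="sync_dg"):
    """One federating step: fuse the uploaded client models, then optionally sync back.

    strategy is one of "sync_dg", "sync_g", "sync_d", "sync_none".
    """
    # Clients upload only model weights (never data); the center averages them.
    central_D.load_state_dict(fed_avg([d.state_dict() for d in client_Ds]))
    central_G.load_state_dict(fed_avg([g.state_dict() for g in client_Gs]))

    # Synchronize the central models back to the clients according to the strategy.
    if strategy in ("sync_dg", "sync_d"):
        for d in client_Ds:
            d.load_state_dict(central_D.state_dict())
    if strategy in ("sync_dg", "sync_g"):
        for g in client_Gs:
            g.load_state_dict(central_G.state_dict())
```

Under this sketch, Sync None simply skips both broadcast branches, which corresponds to plain FedAvg aggregation without any download of the fused models.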

Figure 2: Architecture of a Conditional GAN with Generator G and Discriminator D. For G, the sampled noise vector and a class label embedding vector are fed into a deep neural network to generate a new image. For D, an image together with its label-spanned mask are fed into a deep neural network to predict whether the input image is real or fake.

3.2 Conditional GAN (cGAN) model

In our paper, one important mission is to analyze how the data distribution affects GAN training in the federated setting. Therefore, we study conditional GANs (cGANs) [20], which can manipulate the class distribution of generated images, e.g., generate "horse" images given the horse label. We simulate different class distributions in the clients' training data, then evaluate the training status of the cGANs and analyze their robustness against skewed data distributions.

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x \mid y)\big] + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big]   (1)

In a cGAN, the discriminator (D) and generator (G) play the minimax game with the objective function shown in Equation 1. Intuitively, D learns to criticize the fidelity of given images, while G learns to generate fake images with given labels that are as realistic as possible. In Figure 2, we demonstrate the typical architecture of D and G with Convolutional Neural Network structures. G takes a noise vector and an additional label to conditionally generate an image with the given label. For hand-written digit generation, the given labels indicate the digits from 0 to 9 to be generated. For natural image generation, the provided labels indicate the image classes, e.g., on CIFAR-10, the classes are plane, car, bird, cat, deer, dog, frog, horse, ship, and truck. D takes an image and its conditional label to predict whether it is real or fake. The conditional label is expanded to the image size and attached along the image channels.
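To make this conditioning concrete, the following is a minimal PyTorch-style sketch of how the conditional inputs can be built; it is our own simplification under stated assumptions (a 100-dimensional noise vector, a 10-dimensional label embedding, and 32x32 images), not the paper's exact implementation.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 10  # e.g., digits 0-9 on MNIST or the 10 CIFAR-10 classes

class LabelConditioning(nn.Module):
    """Builds the conditional inputs for G and D as described above."""
    def __init__(self, noise_dim=100, embed_dim=10, img_size=32):
        super().__init__()
        self.embed = nn.Embedding(NUM_CLASSES, embed_dim)  # label embedding for G
        self.img_size = img_size

    def generator_input(self, z, labels):
        # Concatenate the noise vector with the label embedding: (B, noise_dim + embed_dim)
        return torch.cat([z, self.embed(labels)], dim=1)

    def discriminator_input(self, images, labels):
        # Expand the label to a one-hot "mask" of image size and attach it along
        # the channel dimension: (B, C + NUM_CLASSES, H, W)
        b = images.size(0)
        onehot = torch.zeros(b, NUM_CLASSES, device=images.device)
        onehot.scatter_(1, labels.unsqueeze(1), 1.0)
        mask = onehot.view(b, NUM_CLASSES, 1, 1).expand(-1, -1, self.img_size, self.img_size)
        return torch.cat([images, mask], dim=1)
```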

Our implementations of the D and G networks follow DCGAN's [23] designs: D consists of four convolutional layers with BatchNorm [10] and LeakyReLU [27]; G consists of four transposed convolutional layers with BatchNorm and LeakyReLU, followed by a Tanh function to map features into normalized pixel values between -1 and 1. We alternately update D by ascending its stochastic gradient

\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \Big[\log D(x_i \mid y_i) + \log\big(1 - D(G(z_i \mid \tilde{y}_i) \mid \tilde{y}_i)\big)\Big]   (2)

and update G by descending its stochastic gradient

\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\big(1 - D(G(z_i \mid \tilde{y}_i) \mid \tilde{y}_i)\big)   (3)

in which \{x_i\}_{i=1}^m is a sampled batch of real images with true labels \{y_i\}_{i=1}^m, and \{z_i\}_{i=1}^m, \{\tilde{y}_i\}_{i=1}^m are sampled noise vectors and labels. Intuitively, D is distinguishing real images from fake images conditioned on the given labels, while G is attempting to fool D by producing images as realistic as possible given the designated labels.
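For concreteness, a minimal PyTorch-style sketch of one alternating client-side update implementing Eqs. (2) and (3) is given below; `D`, `G`, their optimizers, and the `cond` conditioning helper (an instance of the `LabelConditioning` sketch above) are assumptions, and D is assumed to end with a sigmoid so that it outputs probabilities.

```python
import torch

def local_update(D, G, cond, opt_D, opt_G, real_imgs, real_labels,
                 noise_dim=100, num_classes=10, eps=1e-8):
    """One alternating update: D ascends Eq. (2), G descends Eq. (3), on one local batch."""
    b = real_imgs.size(0)
    device = real_imgs.device

    # ---- Update D by ascending Eq. (2): log D(x|y) + log(1 - D(G(z|y~)|y~)) ----
    z = torch.randn(b, noise_dim, device=device)
    fake_labels = torch.randint(0, num_classes, (b,), device=device)
    fake_imgs = G(cond.generator_input(z, fake_labels)).detach()

    d_real = D(cond.discriminator_input(real_imgs, real_labels))  # assumed probabilities in (0, 1)
    d_fake = D(cond.discriminator_input(fake_imgs, fake_labels))
    loss_D = -(torch.log(d_real + eps) + torch.log(1 - d_fake + eps)).mean()  # ascend = minimize the negative
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # ---- Update G by descending Eq. (3): log(1 - D(G(z|y~)|y~)) ----
    z = torch.randn(b, noise_dim, device=device)
    fake_labels = torch.randint(0, num_classes, (b,), device=device)
    d_fake = D(cond.discriminator_input(G(cond.generator_input(z, fake_labels)), fake_labels))
    loss_G = torch.log(1 - d_fake + eps).mean()  # descended directly, following Eq. (3)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```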

3.3 Algorithm outline

We summarize our algorithm of federated GAN learning as follows. At each communication round, a subset of clients is randomly selected. Each client in the subset trains an updated GAN model with Eqs. (2) and (3) based on its local data. After an epoch of training, the updated parameters of G and D are sent to the server via network communications. The server aggregates the client models by weight averaging (or other model fusion techniques) to construct an improved central model. Finally, according to the chosen synchronization strategy described in Section 3.1, each client pulls back the global model to reconstruct its local model. Each client then performs the next round of local model training. The above steps are repeated until convergence or some stopping criterion is met. The details of the algorithm are shown in Algorithm 1.

We implemented our algorithm and cGANs in PyTorch [22]. The network parameters are updated by the Adam solver [14] with batch size 64 and a fixed learning rate. For each experimental setting, we train the cGANs for at least 60 epochs, with the federating step (communication between the center and clients) happening at the end of every epoch. We will release our source code to facilitate further research.
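As a rough illustration of this setup, the sketch below builds the Adam optimizers with batch size 64; the stand-in G and D modules and the 2e-4 learning rate are assumed placeholders, since the exact fixed learning rate is not stated in this text.

```python
import torch
import torch.nn as nn

# Stand-in G and D only to illustrate the optimizer setup; the actual models follow
# the DCGAN-style cGAN architectures described in Section 3.2.
G = nn.Sequential(nn.Linear(110, 28 * 28), nn.Tanh())
D = nn.Sequential(nn.Linear(28 * 28 + 10, 1), nn.Sigmoid())

BATCH_SIZE = 64
LR = 2e-4  # assumed placeholder; the exact fixed learning rate is not given here

opt_G = torch.optim.Adam(G.parameters(), lr=LR)
opt_D = torch.optim.Adam(D.parameters(), lr=LR)
```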

Input: a global GAN model with parameters (θ_d, θ_g) for the Discriminator (D) and Generator (G) on the central server; local GAN models with parameters (θ_d^k, θ_g^k) on the clients; local private data X_k on each client k; Sync_FLAG indicating whether to synchronize the central G and/or D back to the clients.
Output: a fully trained global GAN model (θ_d, θ_g).
for each communication round do
      Select K random clients C from all clients
      for each client k in C in parallel do
            Update the discriminator of client k:
                  • sample a batch of real images with true labels from X_k
                  • sample a batch of noise vectors and labels
                  • update θ_d^k by ascending the stochastic gradient in Eq. (2)
            Update the generator of client k:
                  • sample a batch of noise vectors and labels
                  • update θ_g^k by descending the stochastic gradient in Eq. (3)
      Update the central model (θ_d, θ_g) by averaging the client weights
      if Sync_D&G or Sync_D then
            for each client k in C in parallel do
                  θ_d^k ← θ_d
      if Sync_D&G or Sync_G then
            for each client k in C in parallel do
                  θ_g^k ← θ_g
return (θ_d, θ_g)
Algorithm 1 Federated Generative Learning algorithm.
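Below is a minimal sketch of one communication round of Algorithm 1, assuming the `local_update` and `federate` helpers sketched in Sections 3.2 and 3.1 are available; client selection, data loading, and model bookkeeping are simplified.

```python
import random

def communication_round(central_D, central_G, client_Ds, client_Gs, client_loaders,
                        conds, opt_Ds, opt_Gs, num_selected, strategy="sync_dg"):
    """One round: local training on the selected clients, then fuse and synchronize."""
    selected = random.sample(range(len(client_Ds)), num_selected)

    # Each selected client trains on its own local data for one epoch.
    for k in selected:
        for real_imgs, real_labels in client_loaders[k]:
            local_update(client_Ds[k], client_Gs[k], conds[k],
                         opt_Ds[k], opt_Gs[k], real_imgs, real_labels)

    # Clients upload weights; the center averages them and (optionally) syncs back.
    federate(central_D, central_G,
             [client_Ds[k] for k in selected],
             [client_Gs[k] for k in selected],
             strategy=strategy)
```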

4 Experiments

We demonstrate federated GAN training results on the MNIST and CIFAR-10 benchmark datasets. We first visualize samples of generated images qualitatively. Then we introduce the metrics used to quantitatively evaluate GAN training results. After that, we conduct experiments to evaluate the performance of the different synchronization strategies proposed in Section 3.1. By simulating IID and various non-IID data distributions, we further investigate the efficiency of model training under different data skewness levels. This enables us to probe the robustness of GAN training under the federated learning framework.

Figure 3: Samples of generated hand-written digits with different synchronization strategies. From (a) to (d), we (a) synchronize both the central D and G models to each client, (b) synchronize only the G model to each client while their individual D models are retained, (c) synchronize only the D model to each client while their G models are retained, (d) synchronize neither the central D nor G to any client.

4.1 Visualization

In Fig 3, we show samples of digits generated by GANs trained with the different synchronization strategies of Section 3.1. Specifically, in the federating step, a sampled collection of clients upload their Gs and Ds to the center. The center fuses their weights to form the central G and D models with FedAvg or any other federated learning framework. As a quick reminder, strategy (a) Sync D&G synchronizes both the central D and G models to each client, (b) Sync G synchronizes the central G model back to each client, (c) Sync D synchronizes the central D model to each client, and (d) Sync None synchronizes neither the central D nor G to any client. Obviously, strategies (a) and (b) are visually better than (c) and (d), while (a) and (b) are comparable in image quality. In Fig 4, we show samples of images generated for the CIFAR-10 classes with the Sync D&G and Sync G strategies. We again found that these two strategies are comparable in visual quality. Curiously, the overall image quality is not as good as the generated digits in Fig 3. This is because of the fewer training samples in CIFAR-10 and the more complex patterns in natural images. However, how to improve GAN training with more data or a more capable neural network architecture is out of the scope of this paper. We focus on how federated learning settings affect GAN training in the rest of the paper.

Figure 4: Generated images of CIFAR-10 classes - from top to bottom - plane, car, bird, cat, deer, dog, frog, horse, ship, truck. We show results of the Sync D&G and Sync G strategies.

4.2 Metrics

We use two metrics for measuring the performance of cGANs on the image generation task. 1. Classification score (Score) measures the "reality" of a generator by using a pre-trained strong classifier to classify generated images. In practice, we trained classifiers on MNIST and CIFAR-10, which yield 99.6% and 90% accuracy on the testing sets, respectively. We utilize the classifier as an oracle and apply it to generated samples to provide pseudo ground truth labels. Then we compare the pseudo ground truth labels with the conditional labels which were used to generate those images. The consensus between the pseudo ground truth labels and the conditional labels is taken as the classification score. Intuitively, the more realistic and faithful the generated images are, the higher the score they will get. 2. Earth Mover's Distance (EMD). Also known as the Wasserstein distance [2], it measures the distance between the distribution of real data P_r and that of generated data P_g. In practice, EMD is approximated by comparing the average softmax scores of samples drawn from the real data against the generated data such that

\mathrm{EMD}(P_r, P_g) \approx \Big\| \frac{1}{n}\sum_{i=1}^{n} f(x_i) - \frac{1}{n}\sum_{j=1}^{n} f(\hat{x}_j) \Big\|   (4)

in which \{x_i\} are real data samples, \{\hat{x}_j\} are generated data samples, and f is the oracle classifier mentioned above. EMD measures a relative distance between real data and fake data. Obviously, a better generator should have a lower EMD by producing realistic images closer to the real images.
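Both metrics can be sketched as follows; `oracle` is the pre-trained classifier described above (assumed to return raw logits), and the L1 norm used for the EMD approximation is our assumption, as is the exact form of Eq. (4) reconstructed here.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def classification_score(oracle, fake_imgs, cond_labels):
    """Fraction of generated images whose oracle prediction matches the conditioning label."""
    preds = oracle(fake_imgs).argmax(dim=1)
    return (preds == cond_labels).float().mean().item()

@torch.no_grad()
def emd_approx(oracle, real_imgs, fake_imgs):
    """Approximate EMD: distance between average oracle softmax scores of real vs. generated samples."""
    p_real = F.softmax(oracle(real_imgs), dim=1).mean(dim=0)
    p_fake = F.softmax(oracle(fake_imgs), dim=1).mean(dim=0)
    return torch.norm(p_real - p_fake, p=1).item()  # L1 norm assumed
```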

4.3 Result of different training strategies on IID data

In an ideal case, data across the federated clients are independent and identically distributed (IID). We assume the IID condition and consider two federated clients. In Figure 5(a), we show the training results of all four synchronization strategies in the two-client case on the MNIST dataset. Congruent with the visual intuitions from Fig 3, Sync D&G (purple line) and Sync G (green line) are much better than Sync D (blue line) and Sync None (red line). The Scores and EMDs for the former two strategies are around 0.99 and 0.05, respectively. Scores for the latter two are at or below 0.8, while their EMDs are above 0.4.

Figure 5: Illustrations of training quality influenced by different synchronization strategies. We show the Scores (the higher the better) and EMDs (the lower the better) on the MNIST and CIFAR-10 datasets.

In Figure 5(b), we show the training results on the CIFAR-10 dataset and observe a similar trend: Sync D&G significantly outperforms the other methods. Sync G comes second, but is still much better than Sync D and Sync None. A question arises naturally: why does Sync D perform worse than Sync G? Our explanation is that by synchronizing the central D across clients, the discriminative capacity of each client model grows rapidly. Unless we also synchronize G (as Sync D&G does), the capacity of D exceeds that of G and D thus rejects more samples of generated images. This harms and can even stop the learning of G. A similar observation has been reported in [2, 4], where the generator stops training if the discriminator reaches its optimum too early. Sync D&G or Sync G avoids this pitfall. From another perspective, by synchronizing G instead of both G and D, the communication costs can be reduced by about half in both the upload and download streams. The trade-off between reducing communication costs and increasing training quality should be considered case by case. For real-world applications where communication costs are high, such as on edge devices, we recommend synchronizing only G to reduce costs while sacrificing some generative capacity. Otherwise, we recommend synchronizing both D and G. In the following experiments, we synchronize both D and G by default unless otherwise stated.

4.4 Result of training GAN on different numbers of clients with IID data

In this section, we investigate federated GAN training with IID data on different numbers of clients. We build the training set of each client by randomly choosing 50% of the total training samples with replacement to simulate IID data. We conduct three sets of experiments with k = 2, 4, and 6 federated clients. We also compare federated learning with a baseline method (k = 1) that trains a GAN on a single client with the same amount of training data, simulating the situation in which each client trains on its own data without federation. We show the results in Table 1 on both the MNIST and CIFAR-10 datasets.

Workers Num k    MNIST                              CIFAR-10
                 Optimal Rounds   Score    EMD      Optimal Rounds   Score    EMD
k = 1 (Local)    35               0.975    0.023    40               0.40     0.51
k = 2 (Fed)      25               0.990    0.004    25               0.428    0.475
k = 4 (Fed)      25               0.993    0.002    30               0.432    0.471
k = 6 (Fed)      30               0.994    0.002    35               0.456    0.457
Table 1: Results of different numbers of federated workers on IID training data. The "Optimal Rounds" column indicates how many communication rounds are needed for the central models to reach their optimum. Best performing numbers are highlighted for each column.

First, training GANs on federated clients (Fed) always outperforms training on a single worker (Local) with the same amount of local data. Moreover, we found that as the number of clients k increases, the metrics improve slightly in terms of both Score and EMD on both the MNIST (Scores: 0.990 vs. 0.993 vs. 0.994) and CIFAR-10 datasets (Scores: 0.428 vs. 0.432 vs. 0.456). This indicates that GAN training benefits from more federated workers, given IID training samples across the clients. However, we also observed that training with more workers leads to slower convergence, as a trade-off for performance. On CIFAR-10, it took 25 communication rounds for the central models to reach their optimum when k = 2, while it took 35 rounds when k = 6.

4.5 Result of training GAN with non-IID data

Recent research [33, 17] observed that common federated learning methods such as FedAvg are not robust to non-IID data. In this section, we verify the performance of GAN training on non-IID data at different data skewness levels. Suppose a dataset has a number of classes and there are k clients in the federation. To simulate non-IID data across clients, we first sort the data by class. For each class, we randomly choose one client and allocate to it a fraction p of the total training samples of that class, and then randomly allocate the remaining (1 - p) fraction of samples to the other clients. This mimics a realistic scenario in which the data distribution is skewed across the clients, and the skewness is adjustable by p: a larger p indicates a higher degree of data skewness. We examine the training quality under different data skewness levels with different numbers of clients.
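A minimal sketch of this skewed partitioning follows, under our own assumptions about details the text leaves open (the remaining (1 - p) fraction is spread evenly over the other clients, and at least two clients are assumed).

```python
import random
from collections import defaultdict

def skewed_partition(labels, num_clients, p, seed=0):
    """Split sample indices across clients with skewness p (assumes num_clients >= 2).

    For each class, one randomly chosen "major" client receives a fraction p of that
    class's samples; the remaining (1 - p) fraction is spread evenly over the others.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[int(y)].append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for y, idxs in by_class.items():
        rng.shuffle(idxs)
        major = rng.randrange(num_clients)
        cut = int(p * len(idxs))
        client_indices[major].extend(idxs[:cut])
        rest = [c for c in range(num_clients) if c != major]
        for i, idx in enumerate(idxs[cut:]):
            client_indices[rest[i % len(rest)]].append(idx)
    return client_indices
```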

Workers Num k    CIFAR-10 (p = 0.7)                 CIFAR-10 (p = 0.9)
                 Optimal Rounds   Score    EMD      Optimal Rounds   Score    EMD
k = 2 (Fed)      30               0.40     0.50     30               0.37     0.52
k = 4 (Fed)      35               0.44     0.45     40               0.35     0.57
k = 6 (Fed)      40               0.42     0.48     30               0.31     0.58
Table 2: Results of training cGANs on non-IID data with data skewness levels p = 0.7 and p = 0.9. We only show the results on CIFAR-10 due to page limits.

In Table 2, we present the experimental results on CIFAR-10 with p = 0.7 and p = 0.9. Results for different k are shown in different rows, and the two skewness levels are shown in separate columns. Obviously, the overall performance with p = 0.7 is better than with p = 0.9, as the Scores for p = 0.7 are at or above 0.40 while the Scores for p = 0.9 are below 0.40. This indicates that the more skewed the data distribution, the less effective the federated training of GANs. We also found that a larger number of federated clients is more affected by a skewed data distribution. For example, in the IID case of Table 1, as well as in Table 2 (p = 0.7), a larger number of clients generally outperforms a smaller one (IID and moderately non-IID). In contrast, in Table 2 (p = 0.9), we found that k = 6 performs worse than k = 4 in Score (0.30 vs. 0.35, the higher the better) and EMD (0.60 vs. 0.55, the lower the better) with highly non-IID (p = 0.9) data. This accuracy drop can be explained by the weight divergence theory proposed in [33], whereby more clients lead to faster divergence of model weights with non-IID training data. We would like to encourage researchers to tackle the problem of federated learning of GANs with non-IID data in the future.

5 Conclusion

We presented a comprehensive study of training GANs with different federation strategies, and found that synchronizing both the discriminator and generator across clients yields the best results on two different tasks. We also observed empirically that federated learning is generally robust to the number of clients with IID and moderately non-IID training data. However, for highly skewed data distributions, existing federated learning schemes such as FedAvg perform anomalously due to weight divergence. Future work could further improve GAN training by studying more effective and robust model fusion methods, especially for highly skewed data distributions.

References

  • [1] A. Acar, H. Aksu, A. S. Uluagac, and M. Conti (2018) A survey on homomorphic encryption schemes: theory and implementation. ACM Computing Surveys (CSUR). Cited by: §2.
  • [2] M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein GAN. In ICML, Cited by: §2, §4.2, §4.3.
  • [3] S. Augenstein (2019) Federated learning, diff privacy, and generative models. Note: https://inst.eecs.berkeley.edu/~cs294-163/fa19/slides/federated-learning-in-practice.pdf[Accessed: 03-15-2020] Cited by: §2.
  • [4] D. Bang and H. Shim (2018) Improved training of generative adversarial networks using representative features. In ICML, Cited by: §4.3.
  • [5] J. Chen, X. Pan, R. Monga, S. Bengio, and R. Jozefowicz (2016) Revisiting distributed synchronous sgd. arXiv preprint arXiv:1604.00981. Cited by: §2.
  • [6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In NeurIPS, Cited by: §1, §2.
  • [7] A. Hard, K. Rao, R. Mathews, S. Ramaswamy, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage (2018) Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604. Cited by: §1, §2.
  • [8] Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen (2019) Attgan: facial attribute editing by only changing what you want. IEEE Transactions on Image Processing. Cited by: §1.
  • [9] B. Hitaj, G. Ateniese, and F. Perez-Cruz (2017) Deep models under the gan: information leakage from collaborative deep learning. In ACM CCS, Cited by: §2.
  • [10] S. Ioffe and C. Szegedy (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. ICML. Cited by: §3.2.
  • [11] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017) Image-to-image translation with conditional adversarial networks. In CVPR, Cited by: §2.
  • [12] T. Karras, S. Laine, and T. Aila (2019) A style-based generator architecture for generative adversarial networks. In CVPR, Cited by: §1, §2.
  • [13] H. Kim, J. Park, M. Bennis, and S. Kim (2019) Blockchained on-device federated learning. IEEE Communications Letters. Cited by: §1, §2.
  • [14] D. P. Kingma and J. Ba (2015) Adam: a method for stochastic optimization. In ICLR, Cited by: §3.3.
  • [15] J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. In NIPS Workshop on Private Multi-Party Machine Learning, Cited by: §3.
  • [16] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2019) Federated learning: challenges, methods, and future directions. arXiv preprint arXiv:1908.07873. Cited by: §1, §2.
  • [17] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of fedavg on non-iid data. In ICLR, Cited by: §2, §4.5.
  • [18] Y. Liu, T. Chen, and Q. Yang (2018) Secure federated transfer learning. arXiv preprint arXiv:1812.03337. Cited by: §1, §2.
  • [19] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. (2017) Communication-efficient learning of deep networks from decentralized data. In AISTATS, Cited by: §1, §1, §2, §2, §3.1, §3.
  • [20] M. Mirza and S. Osindero (2014) Conditional generative adversarial nets. In arXiv preprint arXiv:1411.1784, Cited by: §2, §3.2.
  • [21] A. Pantelopoulos and N. G. Bourbakis (2009) A survey on wearable sensor-based systems for health monitoring and prognosis. IEEE Transactions on Systems, Man, and Cybernetics. Cited by: §1, §2.
  • [22] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer (2017) Automatic differentiation in pytorch. Cited by: §3.3.
  • [23] A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: §2, §3.2.
  • [24] F. Sattler, S. Wiedemann, K. Müller, and W. Samek (2019) Robust and communication-efficient federated learning from non-iid data. IEEE transactions on neural networks and learning systems. Cited by: §2.
  • [25] C. Vondrick, H. Pirsiavash, and A. Torralba (2016) Generating videos with scene dynamics. In NeurIPS, Cited by: §1, §2.
  • [26] H. Wang, M. Yurochkin, Y. Sun, D. Papailiopoulos, and Y. Khazaeni (2020) Federated learning with matched averaging. arXiv preprint arXiv:2002.06440. Cited by: §3.1.
  • [27] B. Xu, N. Wang, T. Chen, and M. Li (2015) Empirical evaluation of rectified activations in convolutional network. arXiv preprint arXiv:1505.00853. Cited by: §3.2.
  • [28] Z. Yang, Z. Hu, C. Dyer, E. P. Xing, and T. Berg-Kirkpatrick (2018) Unsupervised text style transfer using language models as discriminators. In NeurIPS, Cited by: §1.
  • [29] H. Zhang, J. Li, K. Kara, D. Alistarh, J. Liu, and C. Zhang (2017) Zipml: training linear models with end-to-end low precision, and a little bit of deep learning. In ICML, Cited by: §1, §2.
  • [30] S. Zhang, A. E. Choromanska, and Y. LeCun (2015) Deep learning with elastic averaging sgd. In NeurIPS, Cited by: §2.
  • [31] Y. Zhang, J. C. Duchi, and M. J. Wainwright (2013) Communication-efficient algorithms for statistical optimization. In JMLR, Cited by: §2.
  • [32] Y. Zhao, J. Zhao, L. Jiang, R. Tan, and D. Niyato (2019) Mobile edge computing, blockchain and reputation-based crowdsourcing iot federated learning: a secure, decentralized and privacy-preserving system. arXiv preprint arXiv:1906.10893. Cited by: §1, §2.
  • [33] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra (2018) Federated learning with non-iid data. arXiv preprint arXiv:1806.00582. Cited by: §1, §1, §2, §2, §4.5, §4.5.