Brainstorming Generative Adversarial Networks (BGANs): Towards Multi-Agent Generative Models with Distributed Private Datasets

02/02/2020, by Aidin Ferdowsi, et al.

To achieve a high learning accuracy, generative adversarial networks (GANs) must be fed by large datasets that adequately represent the data space. However, in many scenarios, the available datasets may be limited and distributed across multiple agents, each of which is seeking to learn the distribution of the data on its own. In such scenarios, the local datasets are inherently private and agents often do not wish to share them. In this paper, to address this multi-agent GAN problem, a novel brainstorming GAN (BGAN) architecture is proposed, using which multiple agents can generate real-like data samples while operating in a fully distributed manner and preserving their data privacy. BGAN allows the agents to gain information from other agents without sharing their real datasets, but by "brainstorming" via the sharing of their generated data samples. Therefore, the proposed BGAN yields a higher accuracy compared with a standalone GAN model, and its architecture is fully distributed and does not need any centralized controller. Moreover, BGANs are shown to be scalable and not dependent on the hyperparameters of the agents' deep neural networks (DNNs), thus enabling the agents to have different DNN architectures. Theoretically, the interactions between BGAN agents are analyzed as a game whose unique Nash equilibrium is derived. Experimental results show that BGAN can generate real-like data samples with higher quality compared to other distributed GAN architectures.


1 Introduction

Generative adversarial networks (GANs) are deep neural network (DNN) architectures that can learn a dataset distribution and generate realistic data points similar to this dataset (Goodfellow et al., 2014). In GANs, a DNN called the generator generates data samples while another DNN called the discriminator tries to discriminate between the generator's data and the actual data. The interaction between the generator and discriminator optimizes the DNN weights such that the generator's samples resemble the real data. Recently, GANs have been adopted in several applications such as image synthesis (Li and Wand, 2016), anomaly detection (Ferdowsi and Saad, 2019), text-to-image translation (Reed et al., 2016), speech processing (Pascual et al., 2017), and video generation (Vondrick et al., 2016).

Similar to many deep learning algorithms, GANs require large datasets to execute their associated tasks. Conventionally, such datasets are collected from the end-users of an application and stored at a data center to be then used by a central workstation or cloud to learn a task. However, relying on a central workstation requires powerful computational capabilities and can cause large delays. On the other hand, such central data storage units are vulnerable to external attacks. Furthermore, in many scenarios such as health and financial applications, the datasets are private and distributed across multiple agents (e.g., end-users) who do not intend to share them. Such challenges motivate parallelism and the need for distributed, multi-agent learning for GANs.

In a distributed learning architecture, multiple agents can potentially learn the GAN task in a decentralized fashion by sharing some sort of information with each other while preserving the privacy of their datasets. The goal of each agent in a distributed GAN would be to learn how to generate high-quality, real-like data samples. Distributed GAN learning schemes can also reduce the communication and computational limitations of centralized GAN models, making them more practical for large-scale scenarios with many agents.

1.1 Related Works

For deep learning models, several distributed architectures have been proposed to facilitate parallelism using multiple computational units (Dean and Ghemawat, 2008; Low et al., 2012; Dean et al., 2012; Konečnỳ et al., 2016). The authors in (Dean and Ghemawat, 2008) introduced the MapReduce architecture, in which different agents aim at mapping the data into a new space and reducing the data size. Moreover, the GraphLab abstraction was proposed in (Low et al., 2012) to facilitate graph computation across multiple workstations. Furthermore, model parallelism was introduced in (Dean et al., 2012), in which the different layers and weights of a DNN structure are distributed across several agents. Recently, federated learning (FL) was introduced in (Konečnỳ et al., 2016) as an effective distributed learning mechanism that allows multiple agents to train a global model independently on their own datasets and communicate training updates to a central server, which aggregates the agent-side updates to train the global model. However, the works in (Dean and Ghemawat, 2008; Low et al., 2012; Dean et al., 2012; Konečnỳ et al., 2016), as well as follow-ups on FL, focus on inference models and do not deal with generative models or GANs.

Recently, in (Hoang et al., 2017; Durugkar et al., 2016; Ghosh et al., 2017; Hardy et al., 2019; Yonetani et al., 2019), the authors investigated distributed architectures that take into account the GAN's unique structure, which contains two separate DNNs (a generator and a discriminator). In (Hoang et al., 2017) and (Durugkar et al., 2016), multiple generators or discriminators are used to stabilize the learning process, but not to learn from multiple datasets. In (Ghosh et al., 2017), a single discriminator is connected to multiple generators in order to learn multiple modalities of a dataset and to address the mode collapse problem. In (Hardy et al., 2019), the notion of privacy-preserving GAN agents was studied for the first time, using two architectures: a) a multi-discriminator GAN (MDGAN), which contains multiple discriminators, one located at every agent that owns private data, and a central generator that generates the data and communicates it to each agent; and b) an adaptation of FL called FLGAN, in which every agent trains a global GAN on its own data, using a single per-agent discriminator and a per-agent generator, and communicates the training updates to a central aggregator that learns a global GAN model. In (Yonetani et al., 2019), analogously to MDGAN, a forgiver-first update (F2U) GAN is proposed, in which every agent owns a discriminator and a central node hosts the generator. However, unlike in MDGAN, at each training step, the generator's parameters are updated using the output of the most forgiving discriminator.

However, the works in (Dean and Ghemawat, 2008; Low et al., 2012; Dean et al., 2012; Hoang et al., 2017; Durugkar et al., 2016) consider a centralized dataset accessed by all of the agents. Moreover, the solutions in (Konečnỳ et al., 2016) and (Hardy et al., 2019) can cause communication overhead since they require the agents to communicate, at every iteration, the DNN trainable parameters to a central node. Also, none of the GAN architectures in (Hoang et al., 2017; Durugkar et al., 2016; Ghosh et al., 2017; Hardy et al., 2019; Yonetani et al., 2019) is fully distributed and they all require either a central generator or a central discriminator. In addition, the distributed GAN solutions in (Hoang et al., 2017; Durugkar et al., 2016; Ghosh et al., 2017; Hardy et al., 2019; Yonetani et al., 2019) do not consider heterogeneous computation and storage capabilities for the agents.

1.2 Contributions

The main contribution of this paper is a novel brainstorming GAN (BGAN) architecture that enables privacy-preserving agents to learn a data distribution in a fully distributed fashion. In the BGAN architecture, every agent contains a single generator and a single discriminator and owns a private dataset. At each step of training, the agents share their ideas, i.e., their generated data samples, with their neighbors in order to communicate some information about their datasets without sharing the actual data samples. As such, the proposed approach enables the GAN agents to collaboratively brainstorm in order to generate high-quality, real-like data samples, analogously to how humans brainstorm ideas to come up with solutions. To the best of our knowledge, this is the first work that proposes a fully distributed, multi-agent learning architecture for GANs that does not require a central controller. In particular, the proposed BGAN has the following key features:

  1. The BGAN architecture is fully distributed and does not require any centralized controller.

  2. It preserves data privacy for the agents, since they do not share their owned data with one another nor with any central server.

  3. It significantly reduces the communication overhead compared to previous distributed GAN models such as MDGAN, FLGAN, and F2U.

  4. It allows defining different DNN architectures for different agents depending on their computational and storage capabilities.

To characterize the performance of BGAN, we define a game between the BGAN agents and we analytically derive its Nash equilibrium (NE). We prove the uniqueness of the derived NE for the defined game. Moreover, we analyze each agent's connection structure with neighboring agents and characterize the minimum connectivity requirements that enable each agent to gain information from all of the other agents. We compare the performance of our proposed BGAN with other state-of-the-art architectures such as MDGAN, FLGAN, and F2U and show that BGAN outperforms them in terms of Jensen-Shannon divergence (JSD), Fréchet Inception distance (FID), and communication requirements, in addition to the fact that, unlike the other models, BGAN is fully distributed and allows different DNN architectures across agents.

The rest of the paper is organized as follows. Section 2 describes the multi-agent learning system model. In Section 3, the BGAN architecture is proposed and the analytical results are derived. Experimental results are presented in Section 4 and conclusions are drawn in Section 5.

2 System Model

Consider a set of $N$ agents such that every agent $i$ owns a dataset with a distribution $p_{\text{data}_i}$. We also define the set of total available data across all agents, which follows a distribution $p_{\text{data}}$. For each agent $i$, $p_{\text{data}_i}$ is a data distribution that does not span the entire data space as well as $p_{\text{data}}$ does. In this model, every agent $i$ tries to learn a generator distribution $p_{g_i}$ over its available dataset that is as close as possible to $p_{\text{data}}$. To learn $p_{g_i}$ at every agent $i$, we define a prior input noise $z$ with distribution $p_z$ and a mapping $G_i(z)$ from this random variable to the data space, where $G_i$ is a DNN with a vector of parameters $\theta_{g_i}$. For every agent $i$, we also define another DNN, called the discriminator $D_i$, with a vector of parameters $\theta_{d_i}$, which takes a data sample as input and outputs a value between 0 and 1. When the output of the discriminator is closer to 1, the received data sample is deemed real, and when the output is closer to 0, the received data sample is deemed fake. In our BGAN architecture, the goal is to find the distribution of the total data under the constraint that no agent shares its available dataset or its DNN parameters $\theta_{g_i}$ and $\theta_{d_i}$ with other agents. In contrast to MDGAN, FLGAN, and F2U, we propose an architecture that is fully distributed and does not require any central controller. In BGANs, the agents only share their ideas about the data distribution with the other agents, where an idea is defined as the output of $G_i(z)$ at every epoch of the training phase. The agents then use the shared ideas to brainstorm and collaboratively learn the required data distribution.

While every agent's generator DNN tries to generate data samples close to the real data, the discriminator at every agent aims at discriminating the fake data samples from the real data samples that it owns. Hence, we model these interactions between the generators and discriminators of the agents using a game-theoretic framework. For a standalone agent $i$ that does not communicate with other agents, one can define a zero-sum game between its generator and discriminator whose local value function is (Goodfellow et al., 2014):

$$V_i(G_i, D_i) = \mathbb{E}_{x \sim p_{\text{data}_i}}\left[\log D_i(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D_i(G_i(z))\right)\right] \qquad (1)$$
Figure 1: The BGAN architecture.

In (1), the first term forces the discriminator to produce values equal to 1 for the real data. On the other hand, the second term penalizes the data samples generated by the generator. Therefore, the agent's generator aims at minimizing the value function while its discriminator tries to maximize it. It has been proven in (Goodfellow et al., 2014) that the NE of this game occurs when $p_{g_i} = p_{\text{data}_i}$ and $D_i = \frac{1}{2}$. At the NE, the discriminator cannot distinguish between the generated samples and agent $i$'s real data. Although a standalone GAN can learn the representation of its own dataset, if the dataset owned by an agent is not representative of the entire data space, then the standalone agent will learn a distribution that does not exactly match the actual data representation. For instance, if an agent has a limited number of data samples that do not span the entire data space, then the learned distribution will be inaccurate (Goodfellow et al., 2016). In order to cope with this problem, we next introduce BGAN, in which every agent only shares its ideas, i.e., its generated points, with neighboring agents without actually sharing its private dataset.
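
As a point of reference, the two expectations in (1) can be estimated from mini-batches of real samples and noise. The sketch below is a minimal, illustrative TensorFlow version, not the authors' code: the model and variable names are assumptions, and the discriminator is assumed to end with a sigmoid output.

```python
import tensorflow as tf

def standalone_value_terms(generator, discriminator, real_batch, noise_batch):
    """Monte Carlo estimate of the two terms of the standalone value function (1).

    The discriminator ascends (term_real + term_fake); the generator descends term_fake.
    """
    d_real = discriminator(real_batch)               # D(x), values in (0, 1)
    d_fake = discriminator(generator(noise_batch))   # D(G(z))
    term_real = tf.reduce_mean(tf.math.log(d_real + 1e-8))        # E[log D(x)]
    term_fake = tf.reduce_mean(tf.math.log(1.0 - d_fake + 1e-8))  # E[log(1 - D(G(z)))]
    return term_real, term_fake
```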

3 Brainstorming Generative Adversarial Networks Architecture

Let $\mathcal{N}_i$ be the set of neighboring agents from whom agent $i$ receives ideas, let $\mathcal{O}_i$ be the set of neighboring agents to whom agent $i$ sends ideas, and consider the directed graph of connections between the agents shown in Figure 1. Here, a neighboring agent of agent $i$ is defined as an agent that is connected to agent $i$ in the connection graph via a direct link. For our BGAN architecture, we propose to modify the classical GAN value function in (1) into a brainstorming value function that integrates the generated data samples (ideas) received from other agents, as follows:

$$V_i^b(G_i, D_i) = \mathbb{E}_{x \sim \tilde{p}_{\text{data}_i}}\left[\log D_i(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D_i(G_i(z))\right)\right] \qquad (2)$$

where $\tilde{p}_{\text{data}_i}$ is a mixture distribution of agent $i$'s owned data and the data that agent $i$ receives from all of its neighboring agents. Formally, $\tilde{p}_{\text{data}_i} = \pi_{ii} p_{\text{data}_i} + \sum_{j \in \mathcal{N}_i} \pi_{ji} p_{g_j}$, where $\pi_{ii} + \sum_{j \in \mathcal{N}_i} \pi_{ji} = 1$. Here, $\pi_{ii}$ and $\pi_{ji}$ represent, respectively, the importance of agent $i$'s own data and of neighbor $j$'s generated data in the process of brainstorming. Such values can be assigned proportionally to the number of real data samples each agent owns, since an agent having more data samples has more information about the data space. From (2), we can see that the brainstorming value functions of all agents are interdependent. Therefore, in order to find the optimal generators and discriminators, we define a multi-agent game between the discriminators and generators of the agents. In this game, the generators collaboratively aim at generating real-like data to fool all of the discriminators, while the discriminators try to distinguish between the generated and real data samples. To this end, we define the total utility function as follows:

$$V = \sum_{i=1}^{N} V_i^b(G_i, D_i) \qquad (3)$$

In our BGAN, the generators aim at minimizing the total utility function defined in (3), while the discriminators try to maximize this value. Therefore, the optimal solutions for the discriminators and generators can be derived as follows:

$$\{G_i^*\}_{i=1}^{N}, \{D_i^*\}_{i=1}^{N} = \arg \min_{\{G_i\}} \max_{\{D_i\}} V \qquad (4)$$

where, for notational simplicity, we omit the arguments of $V$. In what follows, we derive the NE of the defined game and characterize the optimal values for the generators and discriminators. At such an NE, no agent can obtain a higher value from the game by changing its generator and discriminator while the other agents keep their NE generators and discriminators.

Proposition 1.

For any given set of generators $\{G_i\}_{i=1}^{N}$, the optimal discriminator of every agent $i$ is:

$$D_i^*(x) = \frac{\tilde{p}_{\text{data}_i}(x)}{\tilde{p}_{\text{data}_i}(x) + p_{g_i}(x)} \qquad (5)$$
Proof.

For any given set of generators, we can derive the probability distribution functions of the generators, $p_{g_i}$. Thus, we can write the total utility function as:

(6)

Next, to find the maximum of (6) with respect to all of the discriminators, we can separate the terms of the summation in (6), because each term contains the discriminator of only a single agent $i$. Thus, the optimal value $D_i^*$ is the solution of the following problem:

(7)

In order to find the $D_i$ that maximizes (7), we can find the value of $D_i$ that maximizes the integrand of (7), which is given in (5). ∎

Having found the maximizing values for the discriminators, we can move to the minimization part of (4). To this end, using (5), we can rewrite (4) as follows:

(8)

Now, we can express the resulting generator objective as follows:

(9)

Next, we derive the global minimum of this objective.

Theorem 1.

The global minimum of the generator objective can be achieved at the solution of the following equation:

(10)

where the first matrix in (10) collects the brainstorming weights, with element $\pi_{ji}$ at every row $j$ and column $i$, the second is a diagonal matrix whose $i$-th diagonal element is $\pi_{ii}$, and the remaining vectors stack, respectively, the agents' generator distributions and real data distributions.

Proof.

We can rewrite the generator objective as follows:

(11)

where each term of the summation includes the JSD between $\tilde{p}_{\text{data}_i}$ and $p_{g_i}$, which attains its minimum of zero when the two distributions are equal. Therefore, the generator objective is bounded from below by the value it attains when every JSD term vanishes. Now, assume that this minimum occurs when $p_{g_i} = \tilde{p}_{\text{data}_i}$ for every agent $i$. In this case, we will have:

(12)

which can be simplified to (10) by moving the generator terms to the left side of the equation. In this case, every JSD term vanishes, and the objective simplifies to:

(13)

By comparing (11) with (13), we can see that the solution of (10) drives every JSD term in (11) to zero and, thus, minimizes the generator objective. ∎

Theorem 1 shows that, in order to find the optimal generator distributions, we need to solve (10). In the following, we prove that the solution of (10) is unique and is the only minimum of the generator objective and, thus, that the game defined in (4) has a unique NE.

1:  Initialize the generator and discriminator parameters of every agent.
2:  Repeat:
3:   Parallel for every agent $i$:
4:    Generate samples using the prior noise and the generator $G_i$.
5:    For every out-neighbor $j \in \mathcal{O}_i$:
6:     Send a share of the generated samples (ideas) to agent $j$.
7:    Sample a batch of real data samples from agent $i$'s own dataset.
8:    Update the discriminator parameters $\theta_{d_i}$ by ascending the following gradient:
(14)
9:    Generate samples using the prior noise and the generator $G_i$.
10:    Update the generator parameters $\theta_{g_i}$ by descending the following gradient:
(15)
11:  Until convergence to the NE
Algorithm 1 BGAN training.
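
To make Algorithm 1 concrete, the following is a minimal per-agent training step sketched in TensorFlow. It is an illustration under stated assumptions rather than the authors' implementation: `generator` and `discriminator` are assumed to be Keras models with a sigmoid discriminator output, `neighbor_ideas` is assumed to already hold the samples received from the in-neighbors at this epoch, and the brainstorming weights are realized implicitly through how many own samples versus neighbor ideas are placed in the mixed batch.

```python
import tensorflow as tf

def bgan_agent_step(generator, discriminator, g_opt, d_opt,
                    real_batch, neighbor_ideas, noise_dim):
    """One brainstorming step for a single agent (lines 4-10 of Algorithm 1)."""
    batch_size = tf.shape(real_batch)[0]

    # Mixture batch: the agent's own real samples plus the ideas received from neighbors.
    mixed_batch = tf.concat([real_batch, neighbor_ideas], axis=0)

    # Discriminator update: ascend E[log D(x)] + E[log(1 - D(G(z)))], i.e. descend its negative.
    z = tf.random.normal([batch_size, noise_dim])
    with tf.GradientTape() as tape:
        d_real = discriminator(mixed_batch, training=True)
        d_fake = discriminator(generator(z, training=True), training=True)
        d_loss = -(tf.reduce_mean(tf.math.log(d_real + 1e-8)) +
                   tf.reduce_mean(tf.math.log(1.0 - d_fake + 1e-8)))
    d_grads = tape.gradient(d_loss, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(d_grads, discriminator.trainable_variables))

    # Generator update: descend E[log(1 - D(G(z)))].
    z = tf.random.normal([batch_size, noise_dim])
    with tf.GradientTape() as tape:
        g_loss = tf.reduce_mean(
            tf.math.log(1.0 - discriminator(generator(z, training=True), training=True) + 1e-8))
    g_grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(g_grads, generator.trainable_variables))
    return d_loss, g_loss
```
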
Corollary 1.

The game defined between the generators and discriminators of our BGAN has a unique NE at which every discriminator outputs $\frac{1}{2}$ and the generator distributions are given by the unique solution of (10).

Proof.

Since every agent assigns a strictly positive weight $\pi_{ii}$ to its own data, the matrix in (10) is a diagonally dominant matrix. In other words, for every row of this matrix, the magnitude of the diagonal element is larger than or equal to the sum of the magnitudes of all the non-diagonal elements in that row. Therefore, using the Levy-Desplanques theorem (Horn and Johnson, 2012), we can show that the matrix is non-singular and, thus, that (10) always has a unique solution. Moreover, since the solution of (10) results in $p_{g_i} = \tilde{p}_{\text{data}_i}$ for every agent, it minimizes every term of the summation in (11) and, thus, is the unique minimum point of the generator objective. Also, using (5), the solution of (10) yields $D_i^* = \frac{1}{2}$, which completes the proof. ∎

Corollary 1 shows that the defined game between the discriminators and generators has a unique NE. At this NE, the agents can find the optimal value of the total utility function defined in (4). However, one key goal of our proposed BGAN is to show that, using the brainstorming approach, each generator can integrate the data distributions of the other agents into its own generator distribution. To this end, in the following, we prove that, in order for each agent to derive a generator that is a function of all of the agents' datasets, the graph of connections between the agents must be strongly connected.

Definition 1.

An agent can reach another agent (i.e., the latter is reachable from the former) if there exists a sequence of neighboring agents that starts with the former and ends with the latter.

Definition 2.

The connection graph is called strongly connected if every agent is reachable from every other agent.

Theorem 2.

BGAN agents can integrate the real data distributions of all agents into their generators if their connection graph is strongly connected.

Proof.

From Corollary 1, we know that (10) always admits a unique solution. In order to derive this solution, we can use the iterative Jacobi method. In this method, starting from an initial guess of the solution, the solution is obtained iteratively via:

(16)

where the iterate at step $k$ is the $k$-th approximation of the solution. Expanding this recursion, we will have:

(17)

Therefore, we have:

(18)

From (18), we can see that, in order for agent $i$'s generator distribution to be a function of agent $j$'s real data distribution, there must exist a power of the weight matrix whose entry at row $i$ and column $j$ is non-zero. This entry is non-zero if agent $i$ is reachable from agent $j$ in that number of steps in the connection graph. Therefore, in order to receive information from all of the agents' datasets, every agent must be reachable from every other agent, which completes the proof. ∎
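
To illustrate the Jacobi iteration invoked above, the sketch below solves a generic diagonally dominant linear system of the kind produced by (10), using Python/NumPy. The three-agent ring and the weight values 0.6/0.4 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def jacobi(A, b, num_iters=100):
    """Jacobi iteration for A x = b, assuming A is diagonally dominant."""
    D = np.diag(A)                     # diagonal part of A
    R = A - np.diagflat(D)             # off-diagonal remainder
    x = np.zeros_like(b, dtype=float)  # initial guess
    for _ in range(num_iters):
        x = (b - R @ x) / D            # x_{k+1} = D^{-1}(b - R x_k)
    return x

# Example: 3 agents in a directed ring, each keeping weight 0.6 on its own data
# and receiving ideas from one neighbor with weight 0.4 (illustrative numbers).
Pi_in = np.array([[0.0, 0.4, 0.0],
                  [0.0, 0.0, 0.4],
                  [0.4, 0.0, 0.0]])
A = np.eye(3) - Pi_in                  # diagonally dominant since 1 > 0.4
b = 0.6 * np.ones(3)                   # own-data contribution
print(jacobi(A, b))                    # converges to the unique solution of A x = b
```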

Theorem 2 shows that, in order to have a BGAN architecture that enables the sharing of data (ideas) between all of the agents, the connection graph must be strongly connected. In this case, the optimal solution for each agent's generator is a linear mixture of the agents' real data distributions:

(19)

where the mixture coefficients are positive-valued constants that can be derived by solving (10). However, for the case in which the connection graph is not strongly connected, the following corollary (whose proof is similar to that of Theorem 2) shows how information is shared between the agents.

Corollary 2.

A BGAN agent can receive information from every other agent from which it is reachable.

Therefore, in BGANs, an agent can share information about its dataset with every agent that is reachable from it in the connection graph. In practice, one agent is reachable from another if there is a communication path between them on the connection graph. Therefore, a BGAN agent can receive information (ideas) from every agent to which it is connected via such a communication path. In this case, the optimal generator distribution of each agent will be a mixture of the data distributions of the agents that can reach it in the graph, with mixture coefficients obtained from the solution of (10).

In order to implement the BGAN architecture, one important step is to integrate each agent's mixture model into the brainstorming step. At each training episode, out of the total batch, every agent receives from each neighbor a number of generated samples proportional to that neighbor's brainstorming weight. This guarantees that each neighbor contributes to the brainstorming phase in proportion to its importance relative to the other neighbors. Algorithm 1 summarizes the steps needed to implement the proposed BGAN architecture. Our BGAN architecture enables a fully distributed learning scheme in which every agent can gain information from all of the other agents without sharing its real data. Next, we showcase the key properties of BGAN by conducting extensive experiments.
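
As a concrete illustration of this proportional split, the following sketch computes how many samples an agent draws from its own dataset and how many ideas it requests from each neighbor for a given batch; the weight values and function name are illustrative assumptions.

```python
import numpy as np

def split_batch(batch_size, own_weight, neighbor_weights):
    """Split a batch between own data and neighbor ideas, proportionally to the weights."""
    weights = np.array([own_weight] + list(neighbor_weights), dtype=float)
    weights = weights / weights.sum()          # normalize so the split covers the whole batch
    counts = np.floor(weights * batch_size).astype(int)
    counts[0] += batch_size - counts.sum()     # give any rounding remainder to the own data
    return counts[0], counts[1:]

own, per_neighbor = split_batch(batch_size=128, own_weight=0.5,
                                neighbor_weights=[0.25, 0.25])
print(own, per_neighbor)   # e.g. 64 own samples and 32 ideas from each of two neighbors
```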

4 Experiments

We empirically evaluate the proposed BGAN architecture on data samples drawn from multidimensional distributions as well as image datasets. Our goal here is to show how our BGAN architecture can improve the performance of agents, compared to standalone GAN scenarios, by integrating a brainstorming mechanism. Moreover, we perform extensive experiments to show the impact of architecture hyperparameters such as the number of agents, the number of connections, and the DNN architecture. In addition, we compare our proposed BGAN architecture with MDGAN, FLGAN, and F2U in terms of the quality of the generated data as well as the required communication resources.

4.1 Implementation details

For every GAN, we use two simple multi-layer perceptrons with only two dense layers, as shown in Figure 2. Note that our main goal is to show that our proposed BGAN architecture is fully distributed and preserves the privacy of the agents' datasets. Thus, we do not use sophisticated DNNs as in (Radford et al., 2015; He et al., 2016; Gulrajani et al., 2017). However, since the proposed BGAN does not impose any restrictions on the GAN architecture, architectures such as those in (Radford et al., 2015; He et al., 2016; Gulrajani et al., 2017) can naturally be used to achieve higher accuracy in the generated data. In order to train our BGAN, we used TensorFlow and 8 Tesla P100 GPUs, which helped expedite the extensive experiments. We used multiple values for the batch size, and the reported values are the ones that yielded the best performance. Moreover, we distributed an equal number of samples among the agents and assigned equal brainstorming weights, unless otherwise stated.

Figure 2: DNN architecture for the generator and discriminator of every GAN agent. The blue annotations describe the generator, the red ones describe the discriminator, and the black ones are common to both DNNs.
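
For readers who want a starting point, the following is a minimal Keras sketch of the kind of two-dense-layer MLPs used here; the layer widths, activations, and input/output dimensions are illustrative assumptions rather than the exact architecture of Figure 2.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(noise_dim=8, data_dim=2, hidden=32):
    return tf.keras.Sequential([
        layers.Dense(hidden, activation="relu", input_shape=(noise_dim,)),
        layers.Dense(data_dim)                        # generated data sample
    ])

def build_discriminator(data_dim=2, hidden=32):
    return tf.keras.Sequential([
        layers.Dense(hidden, activation="relu", input_shape=(data_dim,)),
        layers.Dense(1, activation="sigmoid")         # probability that the input is real
    ])

generator = build_generator()
discriminator = build_discriminator()
```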

4.2 Datasets

For our experiments, we use two types of data samples. The first type, which we call the ring dataset, contains two-dimensional samples whose coordinates are a nonlinear combination of a gamma-distributed and a uniformly distributed random variable, whose parameters are chosen differently for the different experiments. This dataset constitutes a ring shape in two-dimensional space, as shown in Figure 3, and was used since: a) the points generated by the GAN agents can be visually evaluated, b) the dimensions of each sample have a nonlinear relationship with each other, and c) the stochastic behavior of the data samples is known, since they are drawn from a gamma distribution and a uniform distribution, and thus the JSD between the actual data and the generated data samples can be calculated. We use JSD as the quality measure for the BGAN architecture since, as shown in (Goodfellow et al., 2014) and Theorem 1, the JSD between the generated data samples and the actual data must be minimized at the NE. The second type of dataset that we use is the well-known MNIST dataset (LeCun and Cortes, 2010). This dataset is used to compare BGAN's generated samples with those of the other distributed GAN architectures, such as MDGAN, FLGAN, and F2U.

Figure 3: An illustration of the used data samples, drawn from a nonlinear combination of gamma and uniform distributions.
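
As an illustration, the sketch below builds a ring-shaped dataset from a gamma-distributed radius and a uniformly distributed angle and estimates the JSD between two sample sets via histograms; the construction and all numerical choices are assumptions for illustration, not the exact parameters used in the paper.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def ring_samples(n, shape=7.5, scale=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    r = rng.gamma(shape, scale, size=n)           # radius ~ Gamma(shape, scale)
    theta = rng.uniform(0.0, 2 * np.pi, size=n)   # angle ~ Uniform(0, 2*pi)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)

def histogram_jsd(samples_a, samples_b, bins=30):
    """Discretize both sample sets on a common 2-D grid and compare the histograms."""
    lo = np.minimum(samples_a.min(0), samples_b.min(0))
    hi = np.maximum(samples_a.max(0), samples_b.max(0))
    edges = [np.linspace(lo[d], hi[d], bins + 1) for d in range(2)]
    p, _ = np.histogramdd(samples_a, bins=edges)
    q, _ = np.histogramdd(samples_b, bins=edges)
    p, q = p.ravel() / p.sum(), q.ravel() / q.sum()
    return jensenshannon(p, q) ** 2               # squared distance = JS divergence

real = ring_samples(1000)
fake = ring_samples(1000, shape=6.0)              # stand-in for generated samples
print(histogram_jsd(real, fake))
```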

4.3 Effect of the number of agents and data samples

GANs can provide better results if they are fed large datasets. Therefore, using the ring dataset, we first find the minimum number of training samples that is enough for a standalone GAN to learn the distribution of the data samples and achieve a minimum JSD from the dataset. From Figure 5, we can see that the JSD remains constant after 1000 data samples and, hence, in order to showcase the benefits of brainstorming, we will assign fewer than 1000 samples to each agent to see if brainstorming can reduce the JSD.

For implementing BGAN, we consider a connection graph in which each agent receives data (ideas) from only one neighbor and the graph is strongly connected, as shown in Figure 4. We implement BGAN with different numbers of agents and different numbers of data samples per agent. Figure 6 shows the JSD between the generated points and the actual dataset for BGAN agents and for a conventional standalone (single) GAN agent for different numbers of data samples. Figure 6 shows that, by increasing the number of data samples, the JSD decreases and reaches its minimum value. More importantly, BGAN agents can compensate for the lack of data samples of a standalone GAN by brainstorming with other agents. For instance, a standalone agent with 10 data samples has a JSD of 24; however, when the same agent participates in the brainstorming process with 9 other agents (10 agents in total), it can achieve a JSD of 13.

Figure 4: The connection graph with only one neighbor per agent, which satisfies the strong connectivity property.
Figure 5: JSD as a function of the number of available samples for a standalone GAN.
Figure 6: Effect of the number of available samples for each BGAN agent as well as number of agents on the system’s JSD.

Next, in Figure 7, we show the points generated by standalone and BGAN agents for different numbers of available data samples. When an agent has access to a small dataset, the GAN parameters will be underfitted; however, brainstorming can still, in some sense, increase the size of the dataset by adding the generated samples of neighboring agents into the training set. Therefore, as seen from Figure 7, by participating in the brainstorming process, a BGAN agent with a limited dataset can generate data samples that are closer to the actual data distribution in Figure 3. This demonstrates that our BGAN architecture can significantly improve the learning and generation performance of the agents.

Figure 7: Normalized generated samples of standalone GAN and BGAN with 5 and 10 agents where each agent has access to 10, 50, 100, 500, or 1000 data samples.

4.4 Effect of the connection graph

In order to study the effect of the connection graph on the performance of BGAN agents, we consider two scenarios: 1) a strongly connected graph in which every agent has an equal number of neighbors, as shown in Figure 8, and 2) a string in which every agent has only one neighbor except for the agent at the end of the string, which does not have any neighbors to receive data from but sends data to one other agent, as shown in Figure 9.

In the first scenario, we consider 10 agents and implement the BGAN architecture while varying the number of neighbors of each agent between 1 and 9. We also consider 10, 20, 50, 100, and 1000 data samples for each agent. Figure 10 shows that, for cases in which the number of samples is too small (10, 20, and 50), having more neighbors reduces the average JSD of the BGAN agents. However, when the number of samples is large enough, adding more neighbors does not affect the JSD. We explain this phenomenon from (10) and (19), where the optimal generator distribution is shown to be related to the graph structure. For instance, Table 1 shows the dependence of each agent's generator distribution on all of the agents' datasets for the cases with 1 and 9 neighbors. From Table 1, we can see that, in the 1-neighbor scenario, as an agent gets farther away from another agent, it affects the other agent's generator distribution less, since the corresponding mixture coefficient decreases as the hop distance increases. However, in the 9-neighbor scenario, the coefficients stay constant across agents. Therefore, when the agents have small datasets and a higher number of neighbors, they can gain information almost equally from the other agents and can span the entire data space. As such, as shown in Figure 10, the JSD of an agent that owns only 10 data samples improves as the number of neighbors increases. However, when the number of data samples is large, every agent's local data distribution already spans the data space. In this case, from (19), the resulting generator distribution is accurate irrespective of the graph structure.

Table 1: Effect of graph structure on the generator distribution: mixture coefficients as a function of the hop distance between agents (0, 1, 2, ..., 9) for the 1-neighbor and 9-neighbor scenarios.
Figure 8: A strongly connected graph with more than 1 neighbor for each agent.
Figure 9: A string graph. Each agent receives data from only one other agent; however, the last agent does not have any neighbors to receive data from.
Figure 10: Effect of the number of connections on JSD.

In the second scenario, we again consider 10 agents. Now, agent 10 sends data to agent 9, agent 9 sends data to agent 8, and so on until agent 2 sends data to agent 1. However, unlike the first scenario, agent 1 does not close the loop and does not send any information to agent 10. Figure 11 shows the generated samples and JSD of every agent. From Figure 11, we can see that the samples generated by agents 1 to 5 are close to the ring dataset and they have a small JSD. However, as we move closer to the end of the string (agent 10), the generated samples diverge from the actual data distribution and the JSD increases. Therefore, in order for each agent to get information from the datasets of all of the agents, the graph should be strongly connected.

Figure 11: Normalized generated samples of agents of a disconnected BGAN.
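
The difference between the two topologies can also be checked programmatically: the sketch below builds the 10-agent ring and string graphs as adjacency matrices and tests strong connectivity by accumulating powers of the adjacency matrix, mirroring the reachability argument around (18). The construction is illustrative.

```python
import numpy as np

def strongly_connected(adj):
    """Check strong connectivity by accumulating powers of the adjacency matrix."""
    n = adj.shape[0]
    reach = np.eye(n, dtype=int)
    power = np.eye(n, dtype=int)
    for _ in range(n - 1):
        power = (power @ adj > 0).astype(int)  # pairs connected by one more hop
        reach = reach | power                  # accumulate reachability
    return bool((reach > 0).all())             # every agent reachable from every other

n = 10
ring = np.zeros((n, n), dtype=int)
string = np.zeros((n, n), dtype=int)
for i in range(n - 1):
    ring[i + 1, i] = 1                         # agent i+1 sends ideas to agent i
    string[i + 1, i] = 1
ring[0, n - 1] = 1                             # closing the loop makes the graph strongly connected

print(strongly_connected(ring))                # True
print(strongly_connected(string))              # False
```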

4.5 Effect of non-overlapping data samples

Next, we show that BGAN agents can learn non-overlapping portions of the other agents' data distributions. To this end, we consider two BGAN agents, whereby agent 1 has access to data points from only one range of angles on the ring while agent 2 owns the data from the complementary range. Figure 12 shows the available portions of the ring dataset for each agent as well as their generated points after brainstorming. As can be seen from Figure 12, the agents can generate points from the portions of the ring dataset that they do not own. This showcases the fact that brainstorming helps the agents exchange information without sharing their private datasets. In contrast, a standalone agent can at best learn to mimic the portion of data that it owns and cannot generate points from the part of the data space to which it does not have access.

Figure 12: BGAN agents which own nonoverlapping data samples can learn about the other agents’ data samples.

4.6 Effect of different architectures

We now show how the proposed BGAN allows different agents to have different DNN architectures by considering a BGAN with 5 agents whose DNNs differ only in the size of their dense layers. In other words, each agent has a different number of neurons. In Figure 13, we compare the output of the agents in both the standalone and BGAN scenarios. Figure 13 demonstrates that, in the standalone case, agents with denser DNNs achieve a lower JSD compared to agents with a smaller number of trainable parameters at the dense layer. However, by participating in brainstorming, all of the GAN agents reduce their JSD and improve the quality of their generated points. This allows agents with lower computational capability to brainstorm with other agents and improve their learned data distribution. Note that this capability is not possible with the other baselines such as FLGAN.

Figure 13: Comparison between standalone and BGAN agents with different numbers of trainable parameters.

4.7 Comparison with FLGAN, MDGAN, and F2U

In order to compare the points generated by BGAN agents with those of other state-of-the-art distributed GAN architectures, we use the ring and MNIST datasets. For the ring dataset, we run experiments with 2 to 10 agents, where each agent owns only 100 data samples. We compare the average JSD of BGAN agents with those resulting from FLGAN, MDGAN, F2U, and standalone GAN agents. We consider a standalone agent that has access to all of the data samples as the upper-bound performance of the distributed agents, and a standalone agent with the same 100 samples as a single distributed agent as the lower-bound performance indicator. In other words, no distributed GAN agent can perform better than a standalone GAN that has access to all data samples. On the other hand, distributed architectures should always have a lower JSD compared to a standalone agent with the same number of available data samples.

Figure 14 shows the average JSD resulting from the various distributed GAN architectures. We can see from Figure 14 that BGAN agents always have a lower JSD compared to the other distributed GAN agents; for the two-agent case, BGAN can achieve a JSD as low as that of a standalone agent with 200 data samples. Furthermore, all distributed agents yield a better performance than a standalone agent with 100 samples; however, they cannot achieve a JSD lower than that of a standalone agent that has access to all of the samples. In addition, adding more agents reduces the JSD for MDGAN and F2U agents, while the JSD of FLGAN and BGAN agents stays constant. For BGAN, we have already seen this fact in Figure 6, where the JSD reaches a minimum value for a particular number of data samples, such that adding more agents does not further improve the performance of the BGAN agents.

Figure 14: JSD comparison between BGAN, FLGAN, MDGAN, and F2U agents.
Figure 15: Images generated by BGAN agents.
Figure 16: FID comparison between BGAN, FLGAN, MDGAN, and F2U agents.

Furthermore, we compare the performance of BGAN agents with the other distributed GANs using the MNIST dataset. In order to show the information flow between the agents, we consider 10 agents, each of which owns only images of a single digit. Figure 15 shows the images generated by BGAN. From Figure 15, we can see that all BGAN agents can not only generate digits similar to the dataset that they own, but can also generate digits similar to their neighbors' datasets. This is a valuable result since, similar to the experiments on the non-overlapping datasets, we can see that BGAN enables agents to transfer information among themselves while preserving the privacy of their own datasets. In addition, we calculate the FID between the generated samples of BGAN, MDGAN, FLGAN, and F2U during the training process. Figure 16 shows that BGAN outperforms the other architectures in terms of the FID value (normalized with respect to the maximum achieved FID) and has a more stable performance, since its average FID does not fluctuate as much as that of the other architectures.

One of the benefits of BGAN is its low communication requirements, particularly for cases with very deep architectures. Table 2 summarizes the communication requirements of the different distributed GAN architectures. Every BGAN agent receives a share of a batch of generated samples from its neighbors at every training epoch. Therefore, at every training step, the communication resources (e.g., bandwidth/power) needed to transmit data between the agents are proportional to the number of transmitted samples and the size of each data sample. In contrast, MDGAN, FLGAN, and F2U require larger amounts of communication resources. Hence, BGAN can significantly reduce the communication overhead of distributed GANs. Further, BGANs do not require a central unit that aggregates information from multiple agents, which makes them more robust than other distributed architectures in scenarios where an agent fails to communicate with its neighbors. In contrast, since FLGAN, MDGAN, and F2U rely on a central unit, any failure of the central unit will disrupt the GAN output at all of the agents.

Table 2: Communication requirements of the different distributed GAN architectures (BGAN, MDGAN, FLGAN, and F2U).

5 Conclusion

In this paper, we have proposed a novel BGAN architecture that enables agents to learn a data distribution in a distributed fashion and generate real-like data samples. We have formulated a game between the BGAN agents and derived its unique NE, at which agents integrate data information from other agents without sharing their real data and can thus preserve the privacy of their datasets. We have shown that the proposed BGAN architecture is fully distributed and does not require a central controller. Furthermore, the proposed BGAN architecture significantly reduces the communication overhead compared to other state-of-the-art distributed GAN architectures. In addition, unlike other distributed GAN architectures, BGAN agents can have different DNN designs, which enables even computationally limited agents to participate in brainstorming and gain information from other agents. Experimental results have shown that the proposed BGAN architecture achieves lower JSD and FID on multiple data distributions compared to other state-of-the-art distributed GAN architectures.

References

  • J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, et al. (2012) Large scale distributed deep networks. In Advances in neural information processing systems, pp. 1223–1231. Cited by: §1.1, §1.1.
  • J. Dean and S. Ghemawat (2008) MapReduce: simplified data processing on large clusters. Communications of the ACM 51 (1), pp. 107–113. Cited by: §1.1, §1.1.
  • I. Durugkar, I. Gemp, and S. Mahadevan (2016) Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673. Cited by: §1.1, §1.1.
  • A. Ferdowsi and W. Saad (2019) Generative adversarial networks for distributed intrusion detection in the Internet of Things. In Proceedings of IEEE Global Communications Conference, Waikoloa, HI, USA. Cited by: §1.
  • A. Ghosh, V. Kulharia, V. Namboodiri, P. H. Torr, and P. K. Dokania (2017) Multi-agent diverse generative adversarial networks. arXiv preprint arXiv:1704.02906. Cited by: §1.1, §1.1.
  • I. Goodfellow, Y. Bengio, and A. Courville (2016) Deep learning. MIT Press. Cited by: §2.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (Eds.), pp. 2672–2680. Cited by: §1, §2, §2, §4.2.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Advances in neural information processing systems, pp. 5767–5777. Cited by: §4.1.
  • C. Hardy, E. Le Merrer, and B. Sericola (2019) MD-GAN: multi-discriminator generative adversarial networks for distributed datasets. In IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 866–877. Cited by: §1.1, §1.1.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §4.1.
  • Q. Hoang, T. D. Nguyen, T. Le, and D. Phung (2017) Multi-generator generative adversarial nets. arXiv preprint arXiv:1708.02556. Cited by: §1.1, §1.1.
  • R. A. Horn and C. R. Johnson (2012) Matrix analysis. Cambridge university press. Cited by: §3.
  • J. Konečnỳ, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §1.1, §1.1.
  • Y. LeCun and C. Cortes (2010) MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. Cited by: §4.2.
  • C. Li and M. Wand (2016) Precomputed real-time texture synthesis with markovian generative adversarial networks. In European Conference on Computer Vision, pp. 702–716. Cited by: §1.
  • Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein (2012) Distributed graphlab: a framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5 (8), pp. 716–727. Cited by: §1.1, §1.1.
  • S. Pascual, A. Bonafonte, and J. Serra (2017) SEGAN: speech enhancement generative adversarial network. arXiv preprint arXiv:1703.09452. Cited by: §1.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: §4.1.
  • S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, and H. Lee (2016) Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396. Cited by: §1.
  • C. Vondrick, H. Pirsiavash, and A. Torralba (2016) Generating videos with scene dynamics. In Advances In Neural Information Processing Systems, pp. 613–621. Cited by: §1.
  • R. Yonetani, T. Takahashi, A. Hashimoto, and Y. Ushiku (2019) Decentralized learning of generative adversarial networks from multi-client non-iid data. arXiv preprint arXiv:1905.09684. Cited by: §1.1, §1.1.