1 Introduction
1.1 Generative Adversarial Networks
Generative adversarial networks (GANs) are a powerful framework for learning generative models. They have been applied successfully in a wide range of fields, including image synthesis [1, 2], image super-resolution [3, 4], and anomaly detection [5]. A GAN maintains two deep neural networks: the discriminator and the generator. The generator aims to produce samples that resemble the data distribution, while the discriminator aims to distinguish the generated samples from the data samples. Mathematically, the standard GAN training aims to solve the following optimization problem:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_d}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \tag{1}$$
The global optimum is reached when the generated distribution $p_g$, which is the distribution of $G(z)$ given $z \sim p_z$, is equal to the data distribution. This optimum is derived under the assumption that the discriminator and generator are jointly optimized. Practical training of GANs, however, may not satisfy this assumption. In some training procedures, instead of ideal joint optimization, the discriminator and generator seek the best response in turns; that is, the discriminator (resp. generator) is updated with the generator (resp. discriminator) held fixed.
Another conventional family of training methods is based on a gradient descent form of GAN optimization. These methods simultaneously take small gradient steps in both the generator and discriminator parameters in each training iteration [6]. There have been several studies of the convergence behavior of gradient-based training. The local convergence behavior has been studied in [7, 8]. Gradient-based optimization is proved to converge assuming that the discriminator and the generator are convex over the network parameters [9]. The inherent connection between gradient-based training and primal-dual subgradient methods for solving convex optimization problems is established in [10].
Despite the promising practical applications, many works still report a lack of convergence in training GANs. Two common failure modes are oscillation and mode collapse, where the generator only produces a small family of samples [6, 11, 12]. One important observation in [13] is that such non-convergence behaviors stem from the fact that each generator update step is a partial collapse towards a delta function, which is the best response to the objective function. This motivates the study in this paper of the dynamics of best-response training and the proposal of a novel training method to address these convergence issues.
1.2 Contributions
In this paper, we view GANs as a two-player zero-sum game and the training process as a repeated game. For the optimal solution to Eq. (1), the corresponding generated distribution and discriminator are shown to be the unique Nash equilibrium of the game. Inspired by the well-established fictitious play mechanism in game theory, we propose a novel training algorithm to resolve the convergence issue and find this Nash equilibrium.
The proposed training algorithm is referred to as Fictitious GAN, where the discriminator (resp. generator) is updated based on the mixed outputs from the sequence of historically trained generators (resp. discriminators). The previously trained models carry important information and can be utilized for the updates of the new model. We prove that Fictitious GAN achieves the optimal solution to Eq. (1). In particular, the discriminator outputs converge to the optimum discriminator function and the mixed output from the sequence of trained generators converges to the data distribution.
Moreover, Fictitious GAN can be regarded as a meta-algorithm that can be applied on top of existing GAN variants. Both synthetic data and real-world image datasets are used to demonstrate the improved performance due to the fictitious training mechanism.
2 Related Works
The idea of training with multiple GAN models has been considered in other works. In [14, 15], the mixed outputs of multiple generators are used to approximate the data distribution. Multiple generators with a modified loss function have been used to alleviate the mode collapse problem [16]. In [13], the generator is updated based on a sequence of unrolled discriminators. In [17], dual discriminators are used to combine the Kullback-Leibler (KL) divergence and the reverse KL divergence into a unified objective function. Using an ensemble of discriminators or GAN models has shown promising performance [18, 19]. One distinguishing difference between the above-mentioned methods and our proposed method is that in our method only a single deep neural network is trained at each training iteration, while multiple generators (resp. discriminators) only provide inputs to a single discriminator (resp. generator) at each training stage. Moreover, the outputs from multiple networks are simply uniformly averaged and serve as input to the target training network, while other works need to train the optimal weights to average the network models. The proposed method thus has a much lower computational complexity.
The use of historical models has been proposed as a heuristic method to increase the diversity of generated samples [20], but a theoretical convergence guarantee is lacking. Game-theoretic approaches have been utilized to achieve a resource-bounded Nash equilibrium in GANs [21]. Another work closely related to this paper is the recent work [22], which applies the Follow-the-Regularized-Leader (FTRL) algorithm to train GANs. In that work, the historical models are also utilized for online learning. Our work differs in at least two respects. First, we borrow the idea of fictitious play from game theory to prove convergence to the Nash equilibrium for any GAN architecture, assuming that the networks have enough capacity, while [22] only proves convergence for semi-shallow architectures. Second, we prove that a single discriminator, instead of a mixture of multiple discriminators, asymptotically converges to the optimal discriminator. This provides an important design guideline for training: asymptotically, only a single discriminator needs to be maintained. (Footnote 1: Due to space constraints, all the proofs in the paper are omitted and can be found in the supplementary material.)
3 Toy Examples
In this section, we use two toy examples to show that both the best-response approach and the gradient-based training approach may oscillate for simple minimax optimization problems.
Take the GAN framework as an instance. In the best-response training approach, the discriminator and the generator are updated to the optimum point at each iteration. Mathematically, the discriminator and the generator are alternately updated according to the following rules:
$$D_{n+1} = \arg\max_D \; \mathbb{E}_{x \sim p_d}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G_n(z)))] \tag{2}$$
$$G_{n+1} = \arg\min_G \; \mathbb{E}_{z \sim p_z}[\log(1 - D_{n+1}(G(z)))] \tag{3}$$
Example 1
Let the data follow the Bernoulli distribution $p_d \sim$ Bernoulli$(a)$, where $0 < a < 1$. Suppose the initial generated distribution is $p_{g,0} \sim$ Bernoulli$(b)$, where $b \neq a$. We show that in the best-response training process, the generated distribution oscillates between Bernoulli$(0)$ and Bernoulli$(1)$.
Minimizing (3) is equivalent to finding $p_{g,n}$ such that $\mathbb{E}_{x \sim p_{g,n}}[\log(1 - D_n(x))]$ is minimized. At each iteration, the output distribution of the updated generator therefore concentrates all the probability mass at $x = 0$ if $D_n(0) > D_n(1)$, or at $x = 1$ if $D_n(0) < D_n(1)$. Suppose $p_{g,n}(x) = \mathbb{1}\{x = 0\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function. Then, by solving (2), the discriminator at the next iteration is updated as
$$D_{n+1}(x) = \frac{p_d(x)}{p_d(x) + p_{g,n}(x)} \tag{4}$$
which yields $D_{n+1}(1) = 1$ and $D_{n+1}(0) = (1 - a)/(2 - a) < 1$. Therefore, the generated distribution at the next iteration becomes $p_{g,n+1}(x) = \mathbb{1}\{x = 1\}$. The oscillation between Bernoulli$(0)$ and Bernoulli$(1)$ continues by induction. A similar phenomenon can be observed for Wasserstein GAN.
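The oscillation in Example 1 can be reproduced numerically. The following is a minimal sketch, assuming the discriminator's best response is $D(x) = p_d(x) / (p_d(x) + p_g(x))$ on the support $\{0, 1\}$ and that the generator's best response puts all mass on the point where $D$ is largest. The function names and the tie-breaking rule are illustrative choices, not part of the original example.

```python
def best_response_discriminator(p_d, p_g):
    """Return D(x) = p_d(x) / (p_d(x) + p_g(x)) for x in {0, 1}."""
    d = []
    for x in (0, 1):
        total = p_d[x] + p_g[x]
        # If neither distribution puts mass on x, D(x) is unconstrained; use 0.5.
        d.append(p_d[x] / total if total > 0 else 0.5)
    return d


def best_response_generator(d):
    """Put all mass where D is largest, which minimizes E[log(1 - D(x))]."""
    x_star = 0 if d[0] > d[1] else 1  # ties go to x = 1 (arbitrary assumption)
    return [1.0 if x == x_star else 0.0 for x in (0, 1)]


def simulate_best_response(a, b, iterations):
    """Track P(x = 1) under the generated distribution across rounds."""
    p_d = [1.0 - a, a]  # data ~ Bernoulli(a)
    p_g = [1.0 - b, b]  # initial generator ~ Bernoulli(b)
    history = []
    for _ in range(iterations):
        d = best_response_discriminator(p_d, p_g)
        p_g = best_response_generator(d)
        history.append(p_g[1])
    return history
```

Calling `simulate_best_response(0.3, 0.6, 6)` returns a sequence that alternates between 0.0 and 1.0, so the generated distribution never settles at Bernoulli(0.3).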
The first toy example implies that the oscillation behavior is a fundamental problem of iterative best-response training. In practical training of GANs, instead of finding the best response, the discriminator and generator are updated based on gradient descent towards the best response of the objective function. However, the next example, adapted from [23], demonstrates the failure of convergence in a simple minimax problem using a gradient-based method.
Example 2
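Consider the following minimax problem:
$$\min_{-10 \le y \le 10} \; \max_{-10 \le x \le 10} \; xy \tag{5}$$
Consider the gradient-based training approach with step size $\Delta$. The update rule of $x$ and $y$ is:
$$\begin{bmatrix} x_{n+1} \\ y_{n+1} \end{bmatrix} = \begin{bmatrix} 1 & \Delta \\ -\Delta & 1 \end{bmatrix} \begin{bmatrix} x_n \\ y_n \end{bmatrix} \tag{6}$$
By using the eigenvalues and eigenvectors of the update matrix, we can obtain
$$\begin{bmatrix} x_n \\ y_n \end{bmatrix} = \rho^n \begin{bmatrix} c_1 \cos(n\theta) + c_2 \sin(n\theta) \\ c_2 \cos(n\theta) - c_1 \sin(n\theta) \end{bmatrix} \tag{7}$$
where $\rho = \sqrt{1 + \Delta^2}$, $\theta = \arctan \Delta$, and $c_1$ and $c_2$ are constants depending on the initial $(x_0, y_0)$. As $n \to \infty$, since $\rho > 1$, the process will not converge.
Figure 1 shows the performance of the gradient-based approach from a fixed initial value with step size 0.01. It can be seen that both players' actions do not converge. This toy example shows that even the gradient-based approach with an arbitrarily small step size may not converge.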
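The divergence in Example 2 can be checked with a few lines of code. The sketch below assumes that $x$ takes a gradient ascent step and $y$ a gradient descent step on $xy$ simultaneously, and it ignores any bound on the strategy space; the function name and arguments are illustrative. Under these assumptions the distance to the origin is multiplied by $\sqrt{1 + \Delta^2}$ at every step.

```python
import math


def simultaneous_gradient_steps(x0, y0, step, iterations):
    """Iterate x <- x + step * y and y <- y - step * x; return the radius history."""
    x, y = x0, y0
    radii = [math.hypot(x, y)]
    for _ in range(iterations):
        # Both players move at the same time, using the other's current value.
        x, y = x + step * y, y - step * x
        radii.append(math.hypot(x, y))
    return radii
```

Even with a step size of 0.01, the radius grows monotonically, which is consistent with the outward spiral described in the text.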
We will revisit the convergence behavior in the context of game theory. A well-established learning mechanism in game theory naturally leads to a training algorithm that resolves the non-convergence issues of these two toy examples.
4 Nash Equilibrium in Zero-Sum Games
In this section, we introduce the two-player zero-sum game and describe the learning mechanism of fictitious play, which provably achieves a Nash equilibrium of the game. We will show that the minimax optimization of GAN can be formulated as a two-player zero-sum game, where the optimal solution corresponds to the unique Nash equilibrium in the game. In the next section we will propose a training algorithm which simulates the fictitious play mechanism and provably achieves the optimal solution.
4.1 Zero-Sum Games
We start with some definitions in game theory. A game consists of a set of $n$ players, who are rational and take actions to maximize their own utilities. Each player $i$ chooses a pure strategy $s_i$ from the strategy space $S_i = \{s_{i,1}, \ldots, s_{i,m_i}\}$. Here player $i$ has $m_i$ strategies in her strategy space. A utility function $u_i(s_i, s_{-i})$, which is defined over all players' strategies, indicates the outcome for player $i$, where the subscript $-i$ stands for all players excluding player $i$. There are two kinds of strategies, pure and mixed. A pure strategy provides a specific action that a player will follow for any possible situation in a game, while a mixed strategy $\mu_i = (p_i(s_{i,1}), \ldots, p_i(s_{i,m_i}))$ for player $i$ is a probability distribution over the $m_i$ pure strategies in her strategy space with $\sum_j p_i(s_{i,j}) = 1$. The set of possible mixed strategies available to player $i$ is denoted by $\Delta S_i$. The expected utility of mixed strategy $(\mu_i, \mu_{-i})$ for player $i$ is
$$\mathbb{E}\{u_i(\mu_i, \mu_{-i})\} = \sum_{s_i \in S_i} \sum_{s_{-i} \in S_{-i}} u_i(s_i, s_{-i}) \, p_i(s_i) \, p_{-i}(s_{-i}) \tag{8}$$
For ease of notation, we write $u_i(\mu_i, \mu_{-i})$ for $\mathbb{E}\{u_i(\mu_i, \mu_{-i})\}$ in the following. Note that a pure strategy can be expressed as a mixed strategy that places probability 1 on a single pure strategy and probability 0 on the others. A game is referred to as a finite game or a continuous game if the strategy space is finite or nonempty and compact, respectively. In a continuous game, the mixed strategy indicates a probability density function (pdf) over the strategy space.
Definition 1
For player $i$, a strategy $\mu_i^*$ is called a best response to others' strategy $\mu_{-i}$ if $u_i(\mu_i^*, \mu_{-i}) \ge u_i(\mu_i, \mu_{-i})$ for any $\mu_i \in \Delta S_i$.
Definition 2
A set of mixed strategies $\mu^* = (\mu_1^*, \ldots, \mu_n^*)$ is a Nash equilibrium if, for every player $i$, $\mu_i^*$ is a best response to the strategies $\mu_{-i}^*$ played by the other players in this game.
Definition 3
A zero-sum game is one in which each player's gain or loss is exactly balanced by the others' loss or gain, so that the sum of the players' payoffs is always zero.
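To make Definitions 1 to 3 concrete, the following sketch checks whether a pair of mixed strategies is a Nash equilibrium in a finite two-player zero-sum game. It assumes a payoff matrix `A` in which the row player receives `A[i][j]` and the column player receives `-A[i][j]`; the function names and the tolerance are illustrative. Because expected utility is linear in each player's mixed strategy, it suffices to test deviations to pure strategies.

```python
def expected_utility(A, p, q):
    """Expected payoff to the row player under mixed strategies p (rows), q (columns)."""
    return sum(p[i] * A[i][j] * q[j] for i in range(len(p)) for j in range(len(q)))


def is_nash_zero_sum(A, p, q, tol=1e-9):
    """Return True if neither player can gain by deviating to any pure strategy."""
    value = expected_utility(A, p, q)
    # Best payoff the row player can reach against q with a pure strategy.
    best_row = max(sum(A[i][j] * q[j] for j in range(len(q))) for i in range(len(p)))
    # Lowest payoff the column player can force against p with a pure strategy.
    best_col = min(sum(p[i] * A[i][j] for i in range(len(p))) for j in range(len(q)))
    return best_row <= value + tol and best_col >= value - tol
```

For matching pennies, `A = [[1, -1], [-1, 1]]`, the uniform pair `([0.5, 0.5], [0.5, 0.5])` passes the check, while the pure pair `([1, 0], [1, 0])` does not, since the column player would switch columns.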
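Now we focus on a continuous two-player zero-sum game. In such a game, given the strategy pair $(\mu_1, \mu_2)$, player 1 has a utility of $u(\mu_1, \mu_2)$, while player 2 has a utility of $-u(\mu_1, \mu_2)$. In the framework of GAN, the training objective (1) can be regarded as a two-player zero-sum game, where the generator and discriminator are two players whose utility functions are the negative of the objective in (1) and the objective itself, respectively. Both of them aim to maximize their utility and the sum of their utilities is zero.
Knowing that the opponent always seeks to maximize its utility, players 1 and 2 choose strategies according to
$$\mu_1^* = \arg\max_{\mu_1} \min_{\mu_2} u(\mu_1, \mu_2) \tag{9}$$
$$\mu_2^* = \arg\min_{\mu_2} \max_{\mu_1} u(\mu_1, \mu_2) \tag{10}$$
Define $\underline{v} = \max_{\mu_1} \min_{\mu_2} u(\mu_1, \mu_2)$ and $\bar{v} = \min_{\mu_2} \max_{\mu_1} u(\mu_1, \mu_2)$ as the lower value and upper value of the game, respectively. Generally, $\underline{v} \le \bar{v}$. Sion [24] showed that these two values coincide under some regularity conditions:
Theorem 4.1 (Sion's Minimax Theorem [24])
Let $X$ and $Y$ be convex, compact spaces, and $u: X \times Y \to \mathbb{R}$. If for any $x \in X$, $u(x, \cdot)$ is upper semicontinuous and quasi-concave on $Y$, and for any $y \in Y$, $u(\cdot, y)$ is lower semicontinuous and quasi-convex on $X$, then $\inf_{x \in X} \sup_{y \in Y} u(x, y) = \sup_{y \in Y} \inf_{x \in X} u(x, y)$.
Hence, in a zero-sum game, if the utility function satisfies the conditions in Theorem 4.1, then $\underline{v} = \bar{v}$. We refer to $v = \underline{v} = \bar{v}$ as the value of the game. We further show that a Nash equilibrium of the zero-sum game achieves the value of the game.
Corollary 1
In a two-player zero-sum game with the utility function satisfying the conditions in Theorem 4.1, if a strategy $(\mu_1^*, \mu_2^*)$ is a Nash equilibrium, then $u(\mu_1^*, \mu_2^*) = v$.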
Corollary 1 implies that if we have an algorithm that achieves a Nash equilibrium of a zerosum game, we may utilize this algorithm to optimally train a GAN. We next describe a learning mechanism to achieve a Nash equilibrium.
4.2 Fictitious Play
Suppose the zero-sum game is played repeatedly between two rational players; then each player may try to infer her opponent's strategy. Let $s_i^n$ denote the action taken by player $i$ at time $n$. At time $n$, given the previous actions $\{s_2^0, s_2^1, \ldots, s_2^{n-1}\}$ chosen by player 2, one good hypothesis is that player 2 is using a stationary mixed strategy and chooses strategy $s_2^t$, $0 \le t \le n-1$, with probability $\frac{1}{n}$. Here we use the empirical frequency to approximate the probability in mixed strategies. Under this hypothesis, the best response for player 1 at time $n$ is to choose the strategy $\mu_1^n$ satisfying:
$$\mu_1^n = \arg\max_{\mu_1} u(\mu_1, \mu_2^n) \tag{11}$$
where $\mu_2^n$ is the empirical distribution of player 2's historical actions. Similarly, player 2 can choose the best response assuming player 1 is choosing its strategy according to the empirical distribution of the historical actions.
Notice that the expected utility is a linear combination of utilities under different pure strategies; hence for any hypothesis $\mu_{-i}^n$, player $i$ can find a pure strategy $s_i^n$ as a best response. Therefore, we further assume each player plays the best pure response at each round. In game theory this learning rule is called fictitious play, proposed by Brown [25].
Danskin [26] showed that for any continuous zero-sum game with any initial strategy profile, fictitious play will converge. This important result is summarized in the following theorem.
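Theorem 4.2
Let $u(x, y)$ be a continuous function defined on the direct product of two compact sets $X$ and $Y$. The pure strategy sequences $\{x_n\}$ and $\{y_n\}$ are defined as follows: $x_0$ and $y_0$ are arbitrary, and
$$x_n \in \arg\max_{x \in X} \frac{1}{n} \sum_{k=0}^{n-1} u(x, y_k), \qquad y_n \in \arg\min_{y \in Y} \frac{1}{n} \sum_{k=0}^{n-1} u(x_k, y) \tag{12}$$
then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} u(x_n, y_k) = \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} u(x_k, y_n) = v \tag{13}$$
where $v$ is the value of the game.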
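Fictitious play is easy to simulate on a finite game. The sketch below is an illustration on a payoff matrix rather than the continuous setting of Theorem 4.2: each player repeatedly plays a best pure response to the opponent's empirical action counts. The matrix convention (row player maximizes, column player minimizes), the initial actions, and the tie-breaking by lowest index are assumptions made for this example.

```python
def fictitious_play(A, iterations):
    """Run simultaneous fictitious play; return empirical mixed strategies."""
    n_rows, n_cols = len(A), len(A[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions: both players start with their first strategy.
    row_counts[0] += 1
    col_counts[0] += 1
    for _ in range(iterations - 1):
        # Payoff of each pure strategy against the opponent's action history.
        row_payoffs = [sum(A[i][j] * col_counts[j] for j in range(n_cols))
                       for i in range(n_rows)]
        col_payoffs = [sum(A[i][j] * row_counts[i] for i in range(n_rows))
                       for j in range(n_cols)]
        i_star = max(range(n_rows), key=row_payoffs.__getitem__)  # row maximizes
        j_star = min(range(n_cols), key=col_payoffs.__getitem__)  # column minimizes
        row_counts[i_star] += 1
        col_counts[j_star] += 1
    return ([c / iterations for c in row_counts],
            [c / iterations for c in col_counts])
```

On matching pennies, `fictitious_play([[1, -1], [-1, 1]], 10000)`, the individual actions keep cycling, but the empirical frequencies of both players approach the uniform equilibrium strategy.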
4.3 Effectiveness of Fictitious Play
In this section, we show that fictitious play enables the convergence of learning to the optimal solution for the two counterexamples in Section 3.
Example 1: Fig. 2 shows the performance of the best-response approach for Bernoulli-distributed data and a Bernoulli initial generated distribution. It can be seen that the generated distribution based on best responses oscillates between Bernoulli$(0)$ and Bernoulli$(1)$.
Assuming best response at each iteration $n$, under fictitious play, the discriminator is updated as the best response to the uniform mixture of the previously generated distributions, and the generated distribution is updated as the best response to the previously trained discriminators. Fig. 2 shows the change of the discriminator output and the empirical mean of the generated distributions as training proceeds. Although the best-response generated distribution at each iteration oscillates as in Fig. 1(a), the learning mechanism of fictitious play makes the empirical mean converge to the data distribution.
Example 2: At each iteration $n$, player 1 chooses $x_n = \arg\max_x \, x \cdot \frac{1}{n} \sum_{i=0}^{n-1} y_i$, which is equal to $10 \, \mathrm{sign}\left(\sum_{i=0}^{n-1} y_i\right)$. Similarly, player 2 chooses $y_n$ according to $y_n = -10 \, \mathrm{sign}\left(\sum_{i=0}^{n-1} x_i\right)$. Hence, regardless of the initial condition, both players will only choose 10 or $-10$ at each iteration. Consequently, as the iteration index goes to infinity, the empirical mixed strategy only places density on 10 and $-10$. It is proved in the supplementary material that the mixed strategy in which both players choose 10 and $-10$ with probability $\frac{1}{2}$ each is a Nash equilibrium for this game. Fig. 3 shows that under fictitious play, both players' empirical mixed strategies converge to the Nash equilibrium and the expected utility for each player converges to 0.
One important observation is that fictitious play can provide the Nash equilibrium if the equilibrium is unique in the game. However, if there exist multiple Nash equilibria, different initializations may yield different solutions. In the above example, it is easy to check that $(0, 0)$ is also a Nash equilibrium, which means both players always choose 0, but fictitious play can lead to this solution only when the initialization is $(0, 0)$. As we show in the next section, due to the special structure of GAN (the utility function is linear over the generated distribution), fictitious play can help us find the desired Nash equilibrium.
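The behavior described for Example 2 can be sketched as follows, assuming the utility $xy$ on $[-10, 10]^2$ with player 1 maximizing and player 2 minimizing. Each player best-responds to the running sum of the opponent's past actions, which has the same sign as the empirical mean. The function names, and the choice of 0 as the response when the sum is exactly zero, are assumptions made for this illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fictitious_play_xy(x0, y0, iterations, bound=10.0):
    """Fictitious play on u(x, y) = x * y over [-bound, bound]^2."""
    xs, ys = [x0], [y0]
    sum_x, sum_y = x0, y0
    for _ in range(iterations):
        # Player 1 maximizes x * mean(y); player 2 minimizes mean(x) * y.
        x_next = bound * sign(sum_y)
        y_next = -bound * sign(sum_x)
        xs.append(x_next)
        ys.append(y_next)
        sum_x += x_next
        sum_y += y_next
    return xs, ys
```

Starting from a nonzero point such as `(1.0, 2.0)`, every later action is 10 or -10 and the empirical means drift towards 0, whereas starting from `(0.0, 0.0)` both players stay at 0, matching the remark on multiple equilibria.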
5 Fictitious GAN
5.1 Algorithm Description
As discussed in the last section, the competition between the generator and discriminator in GAN can be modeled as a two-player zero-sum game. The following theorem, proved in the supplementary material, shows that the optimal solution of (1) is in fact the unique Nash equilibrium of the game.
Theorem 5.1
Consider (1) as a two-player zero-sum game. The optimal solution of (1), with $p_g^* = p_d$ and $D^*(x) = 1/2$, is the unique Nash equilibrium of this game, and the value of the game is $-\log 4$.
By relating GAN to the two-player zero-sum game, we can design a training algorithm that simulates fictitious play such that the training outcome converges to the Nash equilibrium.
Fictitious GAN, as described in Algorithm 1, adapts the fictitious play learning mechanism to train GANs. We use two queues $\mathcal{D}$ and $\mathcal{G}$ to store the historically trained models of the discriminator and the generator, respectively. At each iteration, the discriminator (resp. generator) is updated according to the best response, assuming that the generator (resp. discriminator) chooses a historical strategy uniformly at random. Mathematically, the discriminator and generator are updated according to (14) and (15), where the outputs of the generator and the discriminator are mixed uniformly at random from the previously trained models. Note that backpropagation is still performed on a single neural network at each training step. Different from standard training approaches, we perform multiple gradient descent updates when training the discriminator and the generator in order to approach the best response. In practical learning, the queues $\mathcal{D}$ and $\mathcal{G}$ are maintained with a fixed size. The oldest model is discarded if the queue is full when we update the discriminator or the generator.
$$D_n = \arg\max_D \; \mathbb{E}_{x \sim p_d}[\log D(x)] + \mathbb{E}_{z \sim p_z}\left[\frac{1}{n} \sum_{i=0}^{n-1} \log(1 - D(G_i(z)))\right] \tag{14}$$
$$G_n = \arg\min_G \; \mathbb{E}_{z \sim p_z}\left[\frac{1}{n} \sum_{i=1}^{n} \log(1 - D_i(G(z)))\right] \tag{15}$$
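The queue-based mixing can be illustrated with a small sketch. It is not the authors' Algorithm 1: models are represented by plain Python callables, expectations are replaced by batch averages, and the names `make_model_queue`, `discriminator_objective` and `generator_objective` are hypothetical. The sketch only shows how a fixed-size history and a uniform average over historical models enter the two objectives.

```python
import math
from collections import deque


def make_model_queue(max_models):
    """Fixed-size history; appending to a full queue discards the oldest model."""
    return deque(maxlen=max_models)


def discriminator_objective(d, real_batch, noise_batch, generator_queue):
    """Batch estimate of E_x[log D(x)] + mean over G_i of E_z[log(1 - D(G_i(z)))]."""
    real_term = sum(math.log(d(x)) for x in real_batch) / len(real_batch)
    fake_term = 0.0
    for g in generator_queue:
        fake_term += sum(math.log(1.0 - d(g(z))) for z in noise_batch) / len(noise_batch)
    return real_term + fake_term / len(generator_queue)


def generator_objective(g, noise_batch, discriminator_queue):
    """Batch estimate of the mean over D_i of E_z[log(1 - D_i(G(z)))]."""
    total = 0.0
    for d in discriminator_queue:
        total += sum(math.log(1.0 - d(g(z))) for z in noise_batch) / len(noise_batch)
    return total / len(discriminator_queue)
```

In an actual implementation the discriminator would take gradient steps to increase the first objective and the generator would take gradient steps to decrease the second, with only the network being trained receiving gradients while the queued models stay frozen.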
The following theorem provides the theoretical convergence guarantee for Fictitious GAN. It shows that, assuming best response at each update in Fictitious GAN, the distribution of the mixture outputs from the generators converges to the data distribution. The intuition of the proof is that fictitious play achieves a Nash equilibrium in two-player zero-sum games. Since the optimal solution of GAN is a unique equilibrium in the game, Fictitious GAN achieves the optimal solution.
Theorem 5.2
Suppose the discriminator and the generator are updated according to the best-response strategy at each iteration in Fictitious GAN, then
$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} p_{g,i}(x) = p_d(x) \tag{16}$$
$$\lim_{n \to \infty} D_n(x) = \frac{1}{2} \tag{17}$$
where $D_n(x)$ is the output from the $n$th trained discriminator model and $p_{g,n}$ is the generated distribution due to the $n$th trained generator.
5.2 Fictitious GAN as a Meta-Algorithm
One advantage of Fictitious GAN is that it can be applied on top of existing GANs. Consider the following minimax problem:
$$\min_G \max_D \; \mathbb{E}_{x \sim p_d}[f_0(D(x))] + \mathbb{E}_{z \sim p_z}[f_1(D(G(z)))] \tag{18}$$
where $f_0$ and $f_1$ are some quasi-concave functions depending on the GAN variants. Table 1 shows the family of f-GAN [10, 9] and Wasserstein GAN.
We can model these GAN variants as two-player zero-sum games, and the training algorithms for these variants follow by simply changing $f_0$ and $f_1$ in the updating rule of Algorithm 1 accordingly. Following the proof of Theorem 5.2, we can show that the time average of the generated distributions will converge to the data distribution and the discriminator will converge to the optimal discriminator shown in Table 1.
6 Experiments
Our Fictitious GAN is a meta-algorithm that can be applied on top of existing GANs. To demonstrate the merit of using Fictitious GAN, we apply our meta-algorithm to DCGAN [27] and its extension, conditional DCGAN. Conditional DCGAN allows DCGAN to use external label information to generate images of particular classes. We evaluate the performance on a synthetic dataset and three widely adopted real-world image datasets. Our experimental results show that Fictitious GAN can improve the visual quality of both DCGAN and conditional DCGAN models.
Image datasets. (1) MNIST: contains 60,000 labeled images of 28 × 28 grayscale digits. (2) CIFAR-10: consists of colored natural scene images of 32 × 32 pixels. There are 50,000 training images and 10,000 test images in 10 classes. (3) CelebA: a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations.
Parameter settings. We used TensorFlow for our implementation. Due to GPU memory limitations, we limit the number of historical models to 5 in the real-world image dataset experiments. More architecture details are included in the supplementary material.
6.1 2D Mixture of Gaussians
Fig. 4 shows the performance of Fictitious GAN for a mixture of 8 Gaussians on a circle in 2-dimensional space. We use the network structure in [13] to evaluate the performance of our proposed method. The data is sampled from a mixture of 8 Gaussians uniformly located on a circle of radius 1.0. Each component has a standard deviation of 0.02. The input noise samples are vectors of 256 independent and identically distributed (i.i.d.) Gaussian variables with zero mean and unit standard deviation.
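For reference, the synthetic data described above can be generated with a few lines of code. This is a sketch under stated assumptions: the 8 modes are taken to be equally spaced on the circle, the mixture weights are taken to be equal, and each component is isotropic; the function name and the seed handling are illustrative.

```python
import math
import random


def sample_ring_mixture(n_samples, n_modes=8, radius=1.0, std=0.02, seed=0):
    """Draw 2D points from Gaussians centered at equally spaced points on a circle."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        k = rng.randrange(n_modes)  # pick a mixture component uniformly
        angle = 2.0 * math.pi * k / n_modes
        cx, cy = radius * math.cos(angle), radius * math.sin(angle)
        samples.append((rng.gauss(cx, std), rng.gauss(cy, std)))
    return samples
```

Because the standard deviation (0.02) is small compared with the spacing between neighboring modes, each sample can be assigned unambiguously to its mode, which makes mode coverage easy to count.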
While the original GANs experience mode collapse [17, 13], Fictitious GAN is able to generate samples over all 8 modes, even with a single discriminator asymptotically.
6.2 Qualitative Results for Image Generation
We show the visual quality of samples generated by DCGAN and conditional DCGAN trained with the proposed Fictitious GAN. We train DCGAN on the CelebA dataset, and conditional DCGAN on MNIST and CIFAR-10. In Fig. 5, the first row shows generated samples, and the second row shows, for each generated image in the same grid position, its nearest neighbor in the training dataset computed by Euclidean distance. The samples are randomly drawn without cherry-picking, so they are representative of the model output distribution.
On CelebA, we can generate face images with various genders, skin colors and hairstyles. On MNIST, every generated digit has a visually almost identical nearest neighbor in the training set, and the digit images have diverse shapes and fonts. CIFAR-10 is more challenging, since images of each object class have large variance in visual appearance. We observe some visual and label consistency between the generated images and their nearest neighbors, especially in the categories of airplane, horse and ship. Note that although we theoretically proved that Fictitious GAN improves the robustness of training under the best-response strategy, the visual quality still depends on the baseline GAN architecture and loss design, which in our case is conditional DCGAN.
6.3 Quantitative Results
In this section, we show quantitatively that DCGAN models trained with Fictitious GAN improve over traditional training methods. Better performance may also be obtained by applying Fictitious GAN to other existing GAN models. The results of the comparison methods are copied directly from the original reports.
Metric. The visual quality of generated images is measured by the widely used Inception score [20]. It measures the visual objectness of generated images and correlates well with human scoring of the realism of generated images. Following the evaluation setup of [20], we generate 50,000 images from our model to compute the score.
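The Inception score is commonly written as $\exp\left(\mathbb{E}_x[\mathrm{KL}(p(y|x) \,\|\, p(y))]\right)$, where $p(y|x)$ is the class distribution predicted by a pretrained classifier and $p(y)$ is its average over the generated images. The sketch below computes this quantity from a list of predicted class-probability vectors. It takes the probabilities as given and does not run an Inception network or average over splits, so it illustrates the formula rather than the full evaluation pipeline; the function name is an assumption.

```python
import math


def inception_score(class_probs):
    """exp of the mean KL divergence between each p(y|x) and the marginal p(y)."""
    n = len(class_probs)
    k = len(class_probs[0])
    # Marginal class distribution over all generated samples.
    marginal = [sum(p[c] for p in class_probs) / n for c in range(k)]
    mean_kl = 0.0
    for p in class_probs:
        # Terms with p(y|x) = 0 contribute nothing to the KL divergence.
        mean_kl += sum(pc * math.log(pc / marginal[c])
                       for c, pc in enumerate(p) if pc > 0)
    return math.exp(mean_kl / n)
```

The score is 1 when every prediction equals the marginal (no confident or diverse predictions) and reaches the number of classes when predictions are one-hot and evenly spread over the classes.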
As shown in Table 2, our method outperforms recent state-of-the-art methods. Specifically, Fictitious GAN improves the Inception score of both the baseline DCGAN and the conditional DCGAN model. This sheds light on the advantage of training with the proposed learning algorithm. Note that, in order to highlight the performance improvement gained from Fictitious GAN, our reproduced DCGAN baseline, which has an Inception score of 6.72, was obtained without the training tricks of [20]. Also, we did not use any regularization terms such as conditional loss and entropy loss to train DCGAN, as in [28]. We expect a higher Inception score when more training tricks are used in addition to Fictitious GAN.
6.4 Ablation studies
One hyperparameter that affects the performance of Fictitious GAN is the number of historical generator (discriminator) models. We evaluate the performance of Fictitious GAN with different numbers of historical models, and report the Inception scores at the 150th epoch on the CIFAR-10 dataset in Fig. 6. We keep the number of historical discriminators the same as the number of historical generators. We observe a trend of improved performance with an increasing number of historical models in both baseline GAN models. The mean Inception score drops slightly for the Jensen-Shannon divergence metric when the number of historical models is 4, due to random initialization and random noise generation in training.
7 Conclusion
In this paper, we relate the minimax game of GAN to the two-player zero-sum game. This relation enables us to leverage the mechanism of fictitious play to design a novel training algorithm, referred to as Fictitious GAN. In the training algorithm, the discriminator (resp. generator) is alternately updated as the best response to the mixed output of the stale generator (resp. discriminator) models. This training algorithm can resolve the oscillation behavior caused by the pure best-response strategy and, in some cases, the non-convergence of gradient-based training. Experiments on real-world image datasets show that applying Fictitious GAN on top of the existing DCGAN models yields a performance gain of up to 8%.
8 Appendix
8.1 Proof of (7)
8.2 Proof of Corollary 1
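Suppose $(\mu_1^*, \mu_2^*)$ is a Nash equilibrium, then from the definition, we know
$$u(\mu_1, \mu_2^*) \le u(\mu_1^*, \mu_2^*) \le u(\mu_1^*, \mu_2) \tag{22}$$
for any $\mu_1$ and $\mu_2$. Hence we have
$$\bar{v} = \min_{\mu_2} \max_{\mu_1} u(\mu_1, \mu_2) \tag{23}$$
$$\le \max_{\mu_1} u(\mu_1, \mu_2^*) \tag{24}$$
$$= u(\mu_1^*, \mu_2^*) \tag{25}$$
$$= \min_{\mu_2} u(\mu_1^*, \mu_2) \tag{26}$$
$$\le \max_{\mu_1} \min_{\mu_2} u(\mu_1, \mu_2) \tag{27}$$
$$= \underline{v} \tag{28}$$
Since $\underline{v} \le \bar{v}$, we obtain $u(\mu_1^*, \mu_2^*) = \underline{v} = \bar{v} = v$.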
8.3 Proof of Nash Equilibrium for Example 2
Now, we show that the mixed strategy $(\mu_1^*, \mu_2^*)$ in which both players choose 10 and $-10$ with probability $\frac{1}{2}$ each is a Nash equilibrium for the minimax game.
Take player 1 for instance. Given $\mu_2^*$, for any possible mixed strategy $\mu_1$ with density $p(x)$, where $p(x)$ indicates the probability that she chooses value $x$, the expected utility for her is:
$$u(\mu_1, \mu_2^*) = \int_{-10}^{10} p(x) \left[\tfrac{1}{2}(10x) + \tfrac{1}{2}(-10x)\right] dx = 0 \tag{29}$$
Hence no matter what her strategy is, the expected utility is always 0, and therefore player 1 has no incentive to deviate from strategy $\mu_1^*$ given $\mu_2^*$. Similarly, we can show that player 2 has no incentive to deviate from $\mu_2^*$ given $\mu_1^*$. Thus, the mixed strategy $(\mu_1^*, \mu_2^*)$ is a Nash equilibrium.
8.4 Proof of Theorem 5.1
Let be as defined in (37). A Nash equilibrium in a zerosum game is a mixed strategy with corresponding pdf of such that
(30)  
(31) 
Define . Note that is also a valid probability density over . We have
(32)  
(33)  
(34)  
(35) 
Hence (30) can be rewritten as:
(36) 
and given , the optimal strategy for the discriminator is to choose with probability 1, which means the best response is a pure strategy.
Therefore, at any Nash equilibrium, the generator generates data following a pure distribution $p_g$, while the discriminator chooses a pure response $D$. Moreover, $p_g = p_d$ is the only solution to (31), i.e., the generator has no incentive to deviate. Consequently, the only possible Nash equilibrium is $p_g = p_d$ and $D(x) = \frac{1}{2}$ for any $x$.
8.5 Proof of Theorem 5.2
Proof
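For the minimax game of (1), let $p_g$ be the generated distribution. We rewrite the optimization problem as
$$\min_{p_g} \max_D \; u(D, p_g) = \int_x \left[p_d(x) \log D(x) + p_g(x) \log(1 - D(x))\right] dx \tag{37}$$
With $p_g$ fixed, $u$ is semicontinuous and quasi-concave in $D$; and with $D$ fixed, $u$ is semicontinuous and quasi-convex in $p_g$. Thus, the utility function satisfies the conditions in Theorem 4.1.
Moreover, by Theorem 5.1, the unique Nash equilibrium of the game is shown to be $p_g^* = p_d$ and $D^*(x) = \frac{1}{2}$ for all $x$. The value of the game is $v = -\log 4$.
In Fictitious GAN, the discrimination function of the $n$th model satisfies $D_n = \arg\max_D \frac{1}{n} \sum_{i=0}^{n-1} u(D, p_{g,i})$. Let $\bar{p}_{g,n} = \frac{1}{n} \sum_{i=0}^{n-1} p_{g,i}$. It is easy to see that $\bar{p}_{g,n}$ is also a valid pdf. Then we have
$$\frac{1}{n} \sum_{i=0}^{n-1} u(D, p_{g,i}) = \int_x \left[p_d(x) \log D(x) + \frac{1}{n} \sum_{i=0}^{n-1} p_{g,i}(x) \log(1 - D(x))\right] dx \tag{38}$$
$$= u(D, \bar{p}_{g,n}) \tag{39}$$
Therefore, the optimal discrimination function of the $n$th model is calculated as
$$D_n(x) = \frac{p_d(x)}{p_d(x) + \bar{p}_{g,n}(x)} \tag{40}$$
Thus, $u(D_n, \bar{p}_{g,n}) = -\log 4 + 2 \, \mathrm{JSD}(p_d \,\|\, \bar{p}_{g,n})$, where $\mathrm{JSD}(p_d \,\|\, \bar{p}_{g,n})$ is the Jensen-Shannon divergence between $p_d$ and $\bar{p}_{g,n}$ as defined in [6]. By Theorem 4.2, we have $\lim_{n \to \infty} u(D_n, \bar{p}_{g,n}) = v = -\log 4$, which implies that $\mathrm{JSD}(p_d \,\|\, \bar{p}_{g,n})$ tends to zero. Since $\mathrm{JSD}(p_d \,\|\, \bar{p}_{g,n}) = 0$ if and only if $\bar{p}_{g,n} = p_d$, (16) is established. Combining (40) and (16) yields (17).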
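The identity used in this proof, namely that the GAN objective evaluated at the optimal discriminator $D(x) = p_d(x) / (p_d(x) + p_g(x))$ equals $-\log 4 + 2\,\mathrm{JSD}(p_d \,\|\, p_g)$, can be checked numerically on discrete distributions. The sketch below is such a check; the function names are illustrative and the distributions are represented as plain lists of probabilities.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence between discrete distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    """Jensen-Shannon divergence between discrete distributions p and q."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def gan_value_at_optimal_discriminator(p_d, p_g):
    """Sum of p_d log D + p_g log(1 - D) with D = p_d / (p_d + p_g)."""
    total = 0.0
    for pd, pg in zip(p_d, p_g):
        if pd + pg == 0:
            continue  # points outside both supports contribute nothing
        d = pd / (pd + pg)
        if pd > 0:
            total += pd * math.log(d)
        if pg > 0:
            total += pg * math.log(1.0 - d)
    return total
```

When the two distributions coincide, the divergence is zero and the value reduces to $-\log 4$, the value of the game stated in the proof.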
8.6 Network Architectures and Parameters
All architectures are chosen as recommended by a publicly available implementation. (Footnote 2: https://github.com/carpedm20/DCGAN-tensorflow)
Experiment on synthetic data:
The generator has two hidden layers of size 128 with ReLU activation. The last layer is a linear projection to two dimensions. The discriminator has one hidden layer of size 128 with ReLU activation, followed by a fully connected layer with a sigmoid activation. All the biases are initialized to zero and the weights are initialized via the "Xavier" initialization [29]. The training updates the discriminator and the generator using 3 sub-iterations. The Adam optimizer is used to train both networks, with a learning rate of 2e-4 for the discriminator. The mini-batch size is 64.
Experiment on MNIST, CIFAR-10, CelebA: GAN networks were trained using the Adam optimizer [30] with batches of size 64, for around 150K generator iterations in the case of CIFAR-10 and 100K for MNIST.
References

[1] Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., Lee, H.: Generative adversarial text to image synthesis. In: International Conference on Machine Learning. (2016) 1060–1069
[2] Shrivastava, A., Pfister, T., Tuzel, O., Susskind, J., Wang, W., Webb, R.: Learning from simulated and unsupervised images through adversarial training. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Volume 3. (2017) 6
[3] Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision, Springer (2016) 694–711
[4] Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). (2017) 105–114
[5] Zhai, S., Cheng, Y., Lu, W., Zhang, Z.: Deep structured energy based models for anomaly detection. In: International Conference on Machine Learning. (2016) 1100–1109
[6] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. (2014) 2672–2680
[7] Nagarajan, V., Kolter, J.Z.: Gradient descent GAN optimization is locally stable. In: Advances in Neural Information Processing Systems. (2017) 5585–5595
[8] Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: Advances in Neural Information Processing Systems. (2017) 6626–6637
[9] Nowozin, S., Cseke, B., Tomioka, R.: f-GAN: Training generative neural samplers using variational divergence minimization. In: Advances in Neural Information Processing Systems. (2016) 271–279
[10] Chen, X., Wang, J., Ge, H.: Training generative adversarial networks via primal-dual subgradient methods: A Lagrangian perspective on GAN. In: Proc. Int. Conf. Learn. Representations. (2018)
[11] Li, J., Madry, A., Peebles, J., Schmidt, L.: Towards understanding the dynamics of generative adversarial networks. arXiv preprint arXiv:1706.09884 (2017)
[12] Che, T., Li, Y., Jacob, A.P., Bengio, Y., Li, W.: Mode regularized generative adversarial networks. In: Proc. Int. Conf. Learn. Representations. (2017)
[13] Metz, L., Poole, B., Pfau, D., Sohl-Dickstein, J.: Unrolled generative adversarial networks. In: Proc. Int. Conf. Learn. Representations. (2017)
[14] Arora, S., Ge, R., Liang, Y., Ma, T., Zhang, Y.: Generalization and equilibrium in generative adversarial nets (GANs). In: International Conference on Machine Learning. (2017) 224–232
[15] Hoang, Q., Nguyen, T.D., Le, T., Phung, D.: Multi-generator generative adversarial nets. arXiv preprint arXiv:1708.02556 (2017)
[16] Ghosh, A., Kulharia, V., Namboodiri, V., Torr, P.H., Dokania, P.K.: Multi-agent diverse generative adversarial networks. arXiv preprint arXiv:1704.02906 (2017)
[17] Nguyen, T., Le, T., Vu, H., Phung, D.: Dual discriminator generative adversarial nets. In: Advances in Neural Information Processing Systems. (2017) 2667–2677
[18] Durugkar, I., Gemp, I., Mahadevan, S.: Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673 (2016)
[19] Tolstikhin, I.O., Gelly, S., Bousquet, O., Simon-Gabriel, C.J., Schölkopf, B.: AdaGAN: Boosting generative models. In: Advances in Neural Information Processing Systems. (2017) 5424–5433
[20] Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., Chen, X.: Improved techniques for training GANs. In: Advances in Neural Information Processing Systems. (2016) 2234–2242
[21] Oliehoek, F.A., Savani, R., Gallego, J., van der Pol, E., Gross, R.: Beyond local Nash equilibria for adversarial networks. arXiv preprint arXiv:1806.07268 (2018)
[22] Grnarova, P., Levy, K.Y., Lucchi, A., Hofmann, T., Krause, A.: An online learning approach to generative adversarial networks. In: Proc. Int. Conf. Learn. Representations. (2018)
[23] Goodfellow, I.: NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160 (2016)
[24] Sion, M.: On general minimax theorems. Pacific Journal of Mathematics 8(1) (1958) 171–176
[25] Brown, G.W.: Iterative solution of games by fictitious play. Activity Analysis of Production and Allocation 13(1) (1951) 374–376
[26] Danskin, J.M.: Fictitious play for continuous games. Naval Research Logistics (NRL) 1(4) (1954) 313–320
[27] Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. In: Proc. Int. Conf. Learn. Representations. (2016)
[28] Huang, X., Li, Y., Poursaeed, O., Hopcroft, J., Belongie, S.: Stacked generative adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Volume 2. (2017) 4
[29] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. (2010) 249–256
[30] Kingma, D.P., Ba, J.L.: Adam: A method for stochastic optimization. In: Proc. 3rd Int. Conf. Learn. Representations. (2014)