I Introduction
Generative Adversarial Networks (GANs) are a class of generative models originally proposed by Goodfellow et al. in 2014 [26]. GANs have gained wide attention in recent years due to their potential to model high-dimensional, complex real-world data, and they quickly became a promising research direction [35]. As a type of generative model, GANs do not minimize a single training criterion; instead, they are used for estimating the probability distribution of real data. A GAN usually comprises two neural networks, a discriminator and a generator, which are trained simultaneously via an adversarial learning concept. In summary, GANs are powerful in both feature learning and representation [15]. The discriminator attempts to differentiate between real data samples and fake samples made by the generator, while the generator tries to create realistic samples that cannot be distinguished by the discriminator [45]. In particular, GAN models do not rely on any assumptions about the data distribution and can generate infinitely many realistic new samples from the latent space [35]. This feature has enabled GANs to be successfully applied to various applications, ranging from image synthesis, computer vision, and video and animation generation, to speech and language processing, and cybersecurity [3].
The core idea of GANs is inspired by the two-player zero-sum minimax game between the discriminator and the generator, in which the total utility of the two players is zero, and each player's gain or loss is exactly balanced by the loss or gain of the other player. GANs are designed to reach a Nash equilibrium at which neither player can increase its gain without reducing the other player's gain [26, 68]. Despite the significant success of GANs in many domains, applying these generative models to real-world problems has been hindered by several challenges. The most significant problems are that GANs are hard to train and suffer from instability issues such as mode collapse, non-convergence, and vanishing gradients. A GAN needs to converge to the Nash equilibrium during the training process, but this convergence has been proved to be challenging [70, 72].
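To see why convergence is difficult, consider a minimal toy example (our illustration, not taken from the cited works): on the bilinear zero-sum game $V(\theta, \psi) = \theta\psi$, whose unique Nash equilibrium is $(0, 0)$, simultaneous gradient descent-ascent spirals outward rather than converging.

```python
# Toy illustration (not from the surveyed papers): simultaneous gradient
# descent-ascent on the bilinear zero-sum game V(theta, psi) = theta * psi.
# The minimizing player updates theta, the maximizing player updates psi;
# the unique Nash equilibrium is (0, 0).

def gda(theta, psi, lr=0.1, steps=100):
    for _ in range(steps):
        grad_theta = psi   # dV/dtheta
        grad_psi = theta   # dV/dpsi
        theta, psi = theta - lr * grad_theta, psi + lr * grad_psi
    return theta, psi

theta, psi = gda(1.0, 1.0)
# The squared distance from the equilibrium grows at every step
# (each update multiplies it by exactly 1 + lr**2), so training never settles.
print(theta**2 + psi**2)
```

This cycling behavior is one simple face of the non-convergence problem that the game-theoretic GAN literature surveyed below tries to repair.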
Since 2014, GANs have been widely studied, and numerous methods have been proposed to address their challenges. However, to produce high-quality generated samples, it is necessary to improve the GAN theory, as shortcomings in this area are among the most important obstacles to developing better GANs [7]. Since the basic principle of GANs is rooted in game theory and the data distribution is learned via a game between the generator and discriminator, exploiting game-theoretic techniques has become one of the most discussed topics, attracting considerable research effort in recent years.
I-A Motivation and Contribution
The biggest motivation behind this survey is the absence of any other review paper that particularly focuses on game-theoretic advances in GANs. Many comprehensive surveys have investigated GANs in detail with different focuses (e.g. [28, 68, 8, 33, 25, 35, 6, 82, 55, 43, 57, 24, 60, 30, 38]), but, to the best of our knowledge, this work is the first to explore GAN advancements from a game-theoretical perspective. Hence, in this paper, we attempt to provide the reader with the recent advances in GANs based on game theory by classifying and surveying recently proposed works.
Our survey first introduces some background and key concepts in this field. Then we classify the recently proposed game-of-GANs models into three major categories: modified game model, modified architecture in terms of the number of agents, and modified learning method. Each category is further divided into several subcategories. We review the main contributions of each work in each subcategory. We also point out some existing problems in the discussed context and forecast potential future research topics.
I-B Paper Structure and Organization
The rest of this paper is organized as follows. Section II presents some background on game theory and GANs, including the basic idea, learning method, and challenges. In Section III we glance at the other surveys conducted in the field of GANs. We provide our proposed taxonomy in Section IV and review the research models in each of its categories. The final section is devoted to discussion and conclusion.
II Background and Preliminaries
Before presenting our taxonomy and discussing the works applying game-theoretic methods to GANs, we need to present some preliminary concepts in the fields of game theory and GANs. Here, we start with an overview of game theory and then move on to GANs. Table I lists the acronyms used in this paper and their definitions.
Acronym  Full Name
GAN  Generative Adversarial Network
WGAN  Wasserstein GAN
MADGAN  Multi-Agent Diverse GAN
MADGAN  Multi-agent Distributed GAN
MPM GAN  Message Passing Multi-Agent GAN
SeqGAN  Sequence GAN
LGAN  Latent-space GAN
FedGAN  Federated GAN
ORGAN  Objective-Reinforced GAN
CSGAN  Cyclic-Synthesized GAN
SCHGAN  Semi-supervised Cross-modal Hashing GAN
MolGAN  Molecular GAN
RNN  Recurrent Neural Network
RL  Reinforcement Learning
IRL  Inverse Reinforcement Learning
NE  Nash Equilibrium
JSD  Jensen-Shannon Divergence
KL  Kullback-Leibler
AE  Auto-Encoder
DDPG  Deep Deterministic Policy Gradient
ODE  Ordinary Differential Equation
OCR  Optical Character Recognition
SS  Self-Supervised task
MS  Multi-class minimax game based Self-supervised tasks
SPE  Subgame Perfect Equilibrium
SNEP  Stochastic Nash Equilibrium Problem
SVI  Stochastic Variational Inequality
SRFB  Stochastic Relaxed Forward-Backward
aSRFB  Averaging over Decision Variables
SGD  Stochastic Gradient Descent
NAS  Neural Architecture Search
IID  Independent and Identically Distributed
DDL  Discriminator Discrepancy Loss
II-A Game Theory
Game theory aims to model situations in which several decision makers interact with each other. The interaction between these decision makers is called a "game" and the decision makers are called "players". In each turn of the game, players have available actions, and the set of these actions is called the strategy set. It is usually assumed that players are rational, meaning that each agent tries to maximize its utility by choosing the action that maximizes its payoff. Each player's action is chosen with respect to the other players' actions, and therefore each agent needs a belief system about the other players [53].
Several solution concepts have been introduced for analyzing games, and finding the Nash equilibrium is one of them. A "Nash equilibrium" is a state in which no player can increase its payoff by unilaterally changing its strategy. In other words, a Nash equilibrium is a state where no player regrets its choice, given the others' strategies and its own payoff [62]. In the situation where the players assign a probability distribution over their strategy sets instead of choosing a single strategy, the Nash equilibrium is called a "mixed Nash equilibrium" [62]. Constant-sum games are two-player games in which the sum of the two players' utilities is the same constant in every state; when this constant equals zero, the game is called a zero-sum game [62]. Another pair of solution concepts are the maximin and minimax strategies. With the maximin strategy, the decision maker maximizes its worst-case payoff, which occurs when all other players cause as much harm as they can to the decision maker. With the minimax strategy, the decision maker wants to cause harm to the others; to rephrase, it wants to minimize the other players' maximum payoff [62]. The value a player obtains under the minimax or maximin strategy is called the minmax (minimax) or maxmin (maximin) value, respectively. In [51], von Neumann proved that in any finite two-player zero-sum game, all Nash equilibria coincide with the minimax and maximin strategies of the players; also, the minmax value and maxmin value are equal to the Nash equilibrium utility.
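Von Neumann's result can be checked numerically on a small example (ours, for illustration): in matching pennies, a finite two-player zero-sum game, the maximin and minimax values over mixed strategies coincide at the game value 0, attained by mixing uniformly.

```python
# Toy check (an assumed example, not from the text) of von Neumann's result:
# in a finite two-player zero-sum game, the maximin and minimax values over
# mixed strategies coincide. Game: matching pennies, row player's payoffs.
A = [[1, -1],
     [-1, 1]]

def row_payoff(p, q):
    # Expected payoff to the row player under mixed strategies (p, 1-p), (q, 1-q).
    return sum(A[i][j] * pi * qj
               for i, pi in enumerate((p, 1 - p))
               for j, qj in enumerate((q, 1 - q)))

grid = [k / 100 for k in range(101)]  # coarse grid over mixing probabilities
maximin = max(min(row_payoff(p, q) for q in grid) for p in grid)
minimax = min(max(row_payoff(p, q) for p in grid) for q in grid)
print(maximin, minimax)  # both equal the game value 0, at p = q = 1/2
```

The grid search stands in for the linear program that solves such games exactly; on this 2x2 example the uniform mixture is on the grid, so both values come out exactly at the game value.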
II-B Generative Adversarial Networks
We provide some preliminaries on GANs in order to facilitate the understanding of the basic and key concepts of this generative model. In particular, we first briefly review generative models. Then, we give a brief description of the GAN by reviewing its basic idea, learning method, and challenges.
II-B1 Generative Models
A generative model is a model whose purpose is to simulate the distribution of the training set. Generative models can be divided into three types. The first type receives a training set with distribution $p_{data}$ (unknown to the model) and tries to build a distribution $p_{model}$ which approximates $p_{data}$. The second type is only capable of producing samples from $p_{model}$, and the third type can do both. GANs, as one kind of generative model, concentrate mainly on producing samples [28].
II-B2 GAN Fundamentals
In 2014, Goodfellow et al. [27] introduced GANs as a framework in which two players play a game against each other. The result of the game is a generative model that can produce samples similar to the training set. In the introduced game, the players are named the generator $G$ and the discriminator $D$. The generator is the player which should ultimately produce the samples, and the discriminator's aim is to distinguish training-set samples from the generator's samples. The more indistinguishable the produced samples are, the better the generative model is [27]. Any differentiable function, such as a multilayer neural network, can represent the generator and the discriminator. The generator $G$ takes as input noise drawn from a prior distribution $p_z(z)$ and maps it to an approximation of the training data distribution $p_{data}$. The discriminator $D$ maps an input sample to a real number in the interval $[0, 1]$, which is the probability that the sample is a real sample rather than a fake one (a sample produced by the generator) [25].
II-B3 GAN Learning Models
The generator and discriminator can be trained through an iterative process of optimizing an objective function. The objective function stated in (1) was introduced by Goodfellow et al. [27]:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \quad (1)$$

where the discriminator and generator terms can be replaced according to Table II, depending on the divergence metric. The first proposed GAN uses the Jensen-Shannon metric.
Divergence metric  Game Value
Kullback-Leibler  
Reverse KL  
Jensen-Shannon  $\mathbb{E}_{x \sim p_{data}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]$
WGAN  $\mathbb{E}_{x \sim p_{data}}[D(x)] - \mathbb{E}_{z \sim p_z}[D(G(z))]$
To train the simple model shown in Fig. 1, we first fix $G$ and optimize $D$ to discriminate optimally. Next, we fix $D$ and train $G$ to minimize the objective function. The generator operates optimally when the discriminator cannot distinguish between real and fake data; for the Jensen-Shannon metric, this happens when $D(x)$ equals $1/2$. If both discriminator and generator work optimally, the game reaches the Nash equilibrium, and the minmax and maxmin values are equal. As shown in Table II, for the Jensen-Shannon metric this value is equal to $-\log 4$.
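As a quick sanity check on this equilibrium value (an illustrative computation, not from the source): substituting the optimal discriminator output $D(x) = 1/2$ into both expectations of Eq. (1) gives $\log\frac{1}{2} + \log\frac{1}{2} = -\log 4$.

```python
import math

# At the Jensen-Shannon optimum the discriminator outputs D(x) = 1/2 for
# every input, so each expectation in Eq. (1) reduces to a constant.
d_opt = 0.5
value = math.log(d_opt) + math.log(1 - d_opt)
print(value)         # -1.3862943611...
print(-math.log(4))  # the claimed equilibrium value, identical
```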
III Related Surveys
As GANs become increasingly popular, the number of works in this field, and consequently the number of review articles, is also increasing. By now, many surveys on GANs have been presented (about 40), which can be classified into three categories. The works in the first category [28, 68, 8, 33, 25, 35, 6, 82, 55, 43, 57, 24, 60, 30, 38] explore a relatively broad scope of GANs, including key concepts, algorithms, applications, and different variants and architectures. In contrast, the surveys in the second group [48, 3, 72, 44, 54] focus solely on a specific segment or issue in GANs (e.g. regularization methods or loss functions) and review how researchers deal with that problem. And, in the third category, a plethora of survey studies [7, 70, 74, 69, 64, 67, 2, 39, 58, 77, 78, 24, 10, 20, 18, 61] summarize the applications of GANs in a specific field, from computer vision and image synthesis to cybersecurity and anomaly detection. In the following, we briefly review the surveys in each category and express how our paper differs from the others.
III-A GAN General Surveys
Goodfellow in his tutorial [28] answers the most frequent questions in the context of GANs. Wang et al. [68] survey theoretical and implementation models of GANs, their applications, and the advantages and disadvantages of this generative model. Creswell et al. [8] provide an overview of GANs, especially for the signal processing community, by characterizing different methods for training and constructing GANs, as well as challenges in theory and applications. In [24] Ghosh et al. present a comprehensive summary of the progression and performance of GANs along with their various applications. Saxena et al. in [60] conduct a survey of the advancements in GAN design and the optimization solutions proposed to handle GAN challenges. Kumar et al. [43] present state-of-the-art related work on GANs, their applications, evaluation metrics, challenges, and benchmark datasets. In [57] two new deep generative models, including the GAN, are compared, and the most remarkable GAN architectures are categorized and discussed by Salehi et al. Gui et al. in [30] provide a review of various GAN methods from the perspectives of algorithms, theory, and applications. [38] surveys different GAN variants, applications, and several training solutions. Hitawala in [33] presents different versions of GANs and provides a comparison between them in aspects such as learning, architecture, gradient updates, objective, and performance metrics. In a similar manner, Gonog et al. in [25] review the extensional variants of GANs and classify them with regard to how they optimize the original GAN or change its basic structure, as well as their learning methods. In [35] Hong et al. discuss the details of GANs from the perspective of various objective functions and architectures, together with the theoretical and practical issues of training GANs. The authors also enumerate the GAN variants applied in different domains. Bissoto et al. in [6] conduct a review of GAN advancements on six fronts: architectural contributions, conditional techniques, normalization and constraint contributions, loss functions, image-to-image translations, and validation metrics. Zhang et al., in their review paper [82], survey twelve extended GAN models and classify them in terms of the number of game players. Pan et al. in [55] analyze the differences among generative models, classify them from the perspective of architecture and objective function optimization, discuss training tricks and evaluation metrics, and present GAN applications and challenges.
III-B GAN Challenges
In a different manner, Lucic et al. in [48] conduct an empirical comparison of GAN models, with a focus on unconditional variants. As another survey in the second category, Alqahtani et al. [3] mainly focus on potential applications of GANs in different domains; their paper attempts to identify the advantages, disadvantages, and major challenges of successfully implementing GANs in different application areas. As another specific review paper, Wiatrak et al. in [72] survey current approaches for stabilizing the GAN training procedure, categorizing various techniques and key concepts. More specifically, in [44], Lee et al. review the regularization methods used in the stable training of GANs and classify them into several groups by their operating principles. In contrast, [54] surveys the loss functions used in GANs and analyzes the pros and cons of these functions. As differentially private GAN models provide a promising direction for generating private synthetic data, Fan et al. in [13] survey the existing approaches presented for this purpose.
III-C GAN Applications
As mentioned before, GANs have been successfully applied to numerous applications, and several review articles survey these advances. The authors in [7, 70, 74, 69, 64, 67, 2, 39, 58, 77] review different aspects of GAN progress in the fields of computer vision and image synthesis. Cao et al. [7] review recently proposed GAN models and their applications in computer vision, comparing classical and state-of-the-art GAN algorithms in terms of mechanism, visual results of generated samples, and so on. Wang et al. structure their review [70] around practical challenges relevant to computer vision, discussing the most popular architecture-variant and loss-variant GANs for tackling these challenges. Wu et al. in [74] present a survey of image synthesis and editing, and video generation, with GANs. They cover recent papers that leverage GANs in image applications including texture synthesis, image inpainting, image-to-image translation, and image editing, as well as video generation. In the same way, [69] introduces the recent research on GANs in the field of image processing, categorized into four areas: image synthesis, image-to-image translation, image editing, and cartoon generation. Studies such as [2] and [39] focus on reviewing recent techniques for incorporating GANs into the problem of text-to-image synthesis. In [2], Agnese et al. propose a taxonomy that summarizes GAN-based text-to-image synthesis papers into four major categories: Semantic Enhancement GANs, Resolution Enhancement GANs, Diversity Enhancement GANs, and Motion Enhancement GANs. Different from the other surveys in this field, Sampath et al. [58] examine the most recent developments of GAN techniques for addressing imbalance problems in image data; the real-world challenges and implementations of GAN-based synthetic image generation are covered in that survey.
In [77, 64, 67], the authors deal with the medical applications of image synthesis by GANs. Yi et al. in [77] describe the promising applications of GANs in medical imaging and identify some remaining challenges that need to be solved. As another paper on this subject, [64] reviews GAN applications in image denoising and reconstruction in radiology. Tschuchnig et al. in [67] summarize existing GAN architectures in the field of histological image analysis.
As another application of GANs, [78] and [24] structure reviews of GANs in cybersecurity. Yinka et al. [78] survey studies where the GAN plays a key role in the design of a security system or adversarial system. Ghosh et al. [24] focus on the various ways in which GANs have been used to provide both security advances and attack scenarios in order to bypass detection systems.
Di Mattia et al. [10] survey the principal GAN-based anomaly detection methods. Georges et al. [20] review the published literature on observational health data to uncover the reasons for the slow adoption of GANs in this area. Gao et al. in [18] address the practical applications and challenges relevant to spatio-temporal applications such as trajectory prediction, event generation, and time-series data imputation. The recently proposed GAN-based user mobility synthesis schemes are summarized in [61].
According to the classification provided for review papers, our survey falls into the second category. We focus specifically on the recent progress in applying game-theoretical approaches to address GAN challenges. While several surveys on GANs have been presented to date, to the best of our knowledge, ours is the first to address this topic. Although the authors in [72] presented a few game-model GANs, they did not conduct a comprehensive survey of this field, and many new pieces of research were not covered. We hope that our survey will serve as a reference for researchers interested in this subject.
IV Game of GANs: A Taxonomy
In this section, we present our taxonomy, which summarizes the reviewed papers into three categories focusing on how these works extend the original GAN. The taxonomy is organized in terms of (1) modified game model, (2) architecture modification, and (3) modified learning algorithms, as shown in Fig. 2. Based on these primary classes, we further classify each category into several subsets (Fig. 2). In the following sections, we introduce each category and discuss the recent advances in each group.
IV-A Modified Game Model
The core of every GAN is a competition between a generator and a discriminator, which is modeled as a game; therefore, game theory plays a key role in this context. However, while most GANs rely on the basic model, formulated as a two-player zero-sum (minimax) game, some research utilizes other game variants to tackle the challenges in this field. In this section, we review this literature, classifying the works under this category into three subcategories. Section IV-A1 presents research that casts the training process as a stochastic game. The works presented in Section IV-A2 apply the leader-follower idea of the Stackelberg game to GANs. Finally, Section IV-A3 presents GAN models based on a biaffine game. A summary of the reviewed research in the modified game model category is shown in Table III.
IV-A1 Stochastic game
One of the main issues with GANs is that these neural networks are very hard to train because of convergence problems. Franci et al. in [17] addressed this problem by casting the training procedure as a stochastic Nash equilibrium problem (SNEP). The SNEP is then recast as a stochastic variational inequality (SVI) whose solutions are the stochastic Nash equilibria (SNE). The advantage of this approach is that many algorithms exist for finding the solution of an SVI, such as the forward-backward algorithm, also known as gradient descent. Franci et al. proposed a stochastic relaxed forward-backward (SRFB) algorithm and a variant with an additional step of averaging over the decision variables (aSRFB) for the GAN training process. To prove convergence to a solution, monotonicity of the pseudogradient mapping is needed, which is defined in Equation (2), where $J_g$ and $J_d$ are the payoff functions of the generator and the discriminator:

$$\mathbb{F}(x) = \begin{bmatrix} \mathbb{E}\left[\nabla_{x_g} J_g(x_g, x_d)\right] \\ \mathbb{E}\left[\nabla_{x_d} J_d(x_g, x_d)\right] \end{bmatrix} \quad (2)$$
If the pseudogradient mapping of the game is monotone and an increasing number of samples is available, the algorithm converges to the exact solution; with only a finite, fixed mini-batch of samples and the averaging technique, it converges to a neighborhood of the solution.
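To make the forward-backward machinery concrete, here is a simplified sketch (our construction; the step size and relaxation parameter are assumed values, and the actual algorithm in [17] additionally involves projections and stochastic mini-batch estimates of the pseudogradient): on the monotone bilinear game $\min_x \max_y xy$, the plain forward-backward (gradient) step spirals away from the equilibrium, while the relaxed iteration that averages the decision variables contracts toward it.

```python
# Simplified relaxed forward-backward (SRFB-style) sketch on the monotone
# bilinear game min_x max_y x*y, whose pseudogradient is F(x, y) = (y, -x).
# Step size gamma and relaxation delta are assumed values chosen for
# illustration; they are not taken from [17].

def relaxed_fb(z, steps=300, gamma=0.5, delta=0.5):
    zbar = z  # running average of the decision variables
    for _ in range(steps):
        x, y = z
        fx, fy = y, -x  # pseudogradient F(z)
        zbar = ((1 - delta) * zbar[0] + delta * x,
                (1 - delta) * zbar[1] + delta * y)
        z = (zbar[0] - gamma * fx, zbar[1] - gamma * fy)
    return z

x, y = relaxed_fb((1.0, 1.0))
print(x * x + y * y)  # contracts toward the equilibrium (0, 0)

# With delta = 1 the averaging disappears, leaving the plain
# forward-backward step, which diverges on this game.
x1, y1 = relaxed_fb((1.0, 1.0), delta=1.0)
print(x1 * x1 + y1 * y1)
```

The contrast between the two runs illustrates why the relaxation step matters for monotone but non-strongly-monotone games such as this one.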
IV-A2 Stackelberg game
Another main issue for GANs is the convergence of the training algorithm. Farnia et al. in [14] showed that "GAN zero-sum games may not have any local Nash equilibria" by presenting theoretical and numerical examples of standard GAN problems. Based on the naturally sequential structure of GANs, where the generator moves first as the leader and the discriminator follows, the problem can instead be considered as a Stackelberg game, focusing on the subgame perfect equilibrium (SPE). To address the convergence issue, the authors sought an equilibrium called the proximal equilibrium, which enables traversing the spectrum between Stackelberg and Nash equilibria. In a proximal equilibrium, as shown in Equation (3), the discriminator $\tilde{D}$ is allowed to locally optimize within a norm-ball around the primary discriminator $D$. To keep $\tilde{D}$ close to $D$, the distance between the two functions is penalized with coefficient $\lambda$; as $\lambda$ goes from zero to infinity, the equilibria change from Stackelberg to Nash:

$$V_\lambda^{\mathrm{prox}}(G, D) := \max_{\tilde{D}} \; V(G, \tilde{D}) - \frac{\lambda}{2}\big\|\tilde{D} - D\big\|^2 \quad (3)$$
Farnia et al. also proposed proximal training, which optimizes the proximal objective instead of the original one and can be applied to any two-player GAN. Zhang et al. in [80] also modeled GAN training with this game and presented the Stackelberg GAN to tackle the instability of the GAN training process. The Stackelberg GAN uses a multi-generator architecture, and the competition is between the generators (followers) and the discriminator (leader). We discuss the architecture details in Section IV-B1.
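For intuition about Equation (3), consider a scalar toy problem (our illustration, not from [14]): with the bilinear value $V(g, d) = gd$, the inner maximization has the closed form $\max_{\tilde d}\,[g\tilde d - \frac{\lambda}{2}(\tilde d - d)^2] = gd + g^2/(2\lambda)$. As $\lambda \to \infty$ the penalty pins $\tilde d$ to $d$ and the proximal objective recovers the Nash objective $V(g, d)$, while small $\lambda$ lets the discriminator best-respond freely, approaching the Stackelberg setting.

```python
# Toy proximal objective (our illustration, not from [14]) for the bilinear
# value V(g, d) = g * d with a scalar discriminator. The inner maximization
# of Eq. (3) is solved by gradient ascent and compared with the closed form
# g*d + g**2 / (2*lam). Parameter values are arbitrary.

def prox_value(g, d, lam, lr=0.01, steps=5000):
    d_tilde = d
    for _ in range(steps):
        # gradient ascent on g*d' - (lam/2)*(d' - d)**2
        d_tilde += lr * (g - lam * (d_tilde - d))
    return g * d_tilde - 0.5 * lam * (d_tilde - d) ** 2

g, d, lam = 2.0, 1.0, 4.0
numeric = prox_value(g, d, lam)
closed = g * d + g * g / (2 * lam)
print(numeric, closed)  # both 2.5: the inner maximizer is d' = d + g/lam
```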
IV-A3 Biaffine game
Hsieh et al. in [37] examine the training of GANs by reconsidering the problem formulation from the mixed NE perspective. In the absence of convexity, the existing theory focuses only on local convergence, and even this local theory can break down if intuitions from convex optimization are blindly applied. In [37] it is shown that the mixed Nash equilibria of GANs are, in fact, the global optima of infinite-dimensional biaffine games, and finite-dimensional biaffine games are applied to find mixed NE of GANs. It is also shown that all current GAN objectives can be relaxed into their mixed strategy forms. Finally, the article experimentally demonstrates that this method achieves better or comparable performance relative to popular baselines such as SGD, Adam, and RMSProp.
Reference  Convergence  Methodology and Contributions  Pros  Cons
Stackelberg Game: Subsection IV-A2
Stackelberg GAN [80]  SPNE  Models multi-generator GANs as a Stackelberg game  Can be built on top of standard GANs; proved convergence  
[14]  SPE  Theoretical examples of standard GANs with no NE; proximal equilibrium as a solution; proximal training for GANs  Can be applied to any two-player GAN; allows the discriminator to locally optimize  Focuses only on the pure strategies of zero-sum GANs in non-realizable settings
Stochastic Game: Subsection IV-A1
[17]  SNE  Casts the problem as an SNEP and recasts it as an SVI; SRFB and aSRFB solutions  Proved convergence  Needs monotonicity of the pseudogradient mapping and an increasing number of samples to reach an equilibrium
Biaffine Game: Subsection IV-A3
[37]  Mixed NE  Tackles GAN training by reconsidering the problem formulation from the mixed Nash equilibria perspective  Shows that all GANs can be relaxed to mixed strategy forms; flexibility  
IV-B Modified Architecture
As mentioned in Section II, GANs are a framework for producing a generative model through a two-player minimax game; however, recent works extend the idea of a single generator-discriminator pair to the multi-agent setting, transforming the two-player game into multiple games or a multi-agent game.
In this section, we review literature in which the proposed GAN variants modify the architecture so that we have GANs with a mixture of generators and/or discriminators, and we show how such methods can provide better convergence properties and prevent mode collapse. While the majority of the works in this category focus on introducing a larger number of generators and/or discriminators, some papers leave the number of generators and discriminators unchanged and instead add another agent, converting the problem to a multi-agent one.
In Section IV-B1, we discuss GAN variants that extend the basic structure from a single generator to many generators. In Section IV-B2, we review articles that deal with the problem of mode collapse by increasing the number of discriminators in order to force the generator to produce different modes. Section IV-B3 is dedicated to works that develop GANs with multiple generators and multiple discriminators. The articles reviewed in Sections IV-B4 and IV-B5 extend the architecture by adding another agent, namely a classifier (Section IV-B4) or an RL agent (Section IV-B5), and show the benefits of adding these agents to GANs. The methodology and contributions, as well as the pros and cons, of the reviewed papers are summarized in Table IV.
IV-B1 Multiple generators, One discriminator
The minimax gap is smaller in GANs with a multi-generator architecture, and more stable training performance is observed in these GANs [80]. As mentioned in Section IV-A2, Zhang et al. in [80] tackle the instability of GAN training that results from the gap between the minimax and maximin objective values. To mitigate this issue, they design a multi-generator architecture and model the competition among agents as a Stackelberg game. Results show that the minimax duality gap decreases as the number of generators increases. The mode collapse issue is also investigated, and this architecture is shown to effectively alleviate it. One significant advantage of this architecture is that it can be applied to all variants of GANs, e.g., Wasserstein GAN, vanilla GAN, etc. Additionally, under an extra condition on the expressive power of the generators, it is shown that the Stackelberg GAN can achieve an approximate equilibrium as the number of generators grows [80].
Furthermore, Ghosh et al. in [22] proposed a multi-generator, single-discriminator architecture for GANs named Multi-Agent Diverse Generative Adversarial Networks (MADGAN). In this design, different generators capture diverse, high-probability modes, and the discriminator is designed such that, along with separating real and fake samples, it identifies the generator that produced a given fake sample [22]. It is shown that the global optimum is achieved at convergence, with the mixture of the k generators matching the data distribution, where k is the number of generators.
Comparing the models presented in [22] and [80]: MADGAN [22] combines multiple generators under the assumption that the generators and the discriminator have infinite capacity, whereas the Stackelberg GAN [80] makes no assumption on model capacity. Also, in MADGAN [22] the generators share common network parameters, while in the Stackelberg GAN [80] various sampling schemes beyond the mixture model are allowed, and each generator has free parameters.
The assumption that adding generators will cover the whole data space is not valid in practice. Hence Hoang et al. in [34], in contrast to [22], approximate the data distribution by forcing each generator to capture a subset of the data modes independently of the others, rather than forcing the generators apart by separating their samples. They establish a minimax formulation among a classifier, a discriminator, and a set of generators. The classifier determines which generator produced a sample by performing multi-class classification, and the interaction between the generators and the classifier encourages each generator to generate data separable from that of the other generators. In this model, the generators create samples, and one of them is randomly picked as the final output, similar to the mechanism of a probabilistic mixture model. The authors theoretically prove that, at the equilibrium, the Jensen-Shannon divergence (JSD) between the final output and the data distribution is minimal, while the JSD among the generators' distributions is maximal; hence the mode collapse problem is effectively avoided. Moreover, the computational cost added to the standard GAN is minimal in the suggested model thanks to parameter sharing, and the model scales efficiently to large-scale datasets.
Ke et al. propose another architecture, the multi-agent distributed GAN (MADGAN), in [41]. In this framework, the discriminator is considered a leader and the generators are considered followers. The design draws on social group wisdom and the influence of the network structure on agents. MADGAN can have a multi-generator, multi-discriminator architecture (e.g., two discriminators and four generators) as well as the multiple-generator, single-discriminator architecture discussed in this section. One of the vital contributions of MADGAN is that it can train multiple generators simultaneously, with consistent training results across all generators.
Furthermore, the Message Passing Multi-Agent GAN [23] proposes that better image generation can be achieved with two generators and one discriminator that communicate through message passing. The paper introduces two objectives: competing and conceding. The competing objective is based on the generators competing with each other to obtain better scores for their generations from the discriminator. The conceding objective, in contrast, makes the two generators guide each other toward better scores, ensuring that the message-sharing mechanism leads the other generator to generate better than itself. Overall, the paper presents innovative architectures and objectives for training multi-agent GANs.
IV-B2 One generator, Multiple discriminators
The multiple discriminators are constructed with homogeneous network architectures and trained for the same task on the same training data. In addition to introducing a multi-discriminator schema, Durugkar et al. in [11] show, from the perspective of game theory, that because of these similarities the discriminators behave alike and thus converge to similar decision boundaries; in the worst case, they may even converge to a single discriminator. To address this, Jin et al. in [40] introduce a discriminator discrepancy loss (DDL); their multi-player minimax game unifies the optimization of DDL and the GAN loss, seeking an optimal trade-off between the accuracy and diversity of the multiple discriminators. Compared to [11], Hardy et al. in [32] distribute the discriminators over multiple servers, so training can proceed over datasets that are spread across numerous servers.
In FakeGAN, proposed in [1], Aghakhani et al. use two discriminators and one generator. The discriminators use the Monte Carlo search algorithm to evaluate and pass the intermediate action-value as the reinforcement learning (RL) reward to the generator, which is modeled as a stochastic policy agent in RL [1]. Instead of the single batch in [40], Mordido et al. in [50] divide the generated samples into multiple micro-batches. Each discriminator's task is then to discriminate between samples coming from its assigned fake micro-batch and samples from the micro-batches assigned to the other discriminators, together with the real samples.
Unlike [11], Nguyen et al. in [52] combine the Kullback-Leibler (KL) and reverse KL divergences (measures of how one probability distribution differs from a second) into a unified objective function. Combining these two measures exploits their complementary statistical properties to diversify the estimated density and capture multiple modes effectively. From the game-theoretic perspective of [52], two discriminators and one generator play, by analogy, a three-player minimax game: two pairs of players play two minimax games simultaneously. In one game, a discriminator rewards high scores for samples from the data distribution (reverse KL divergence) (4), while in the other, a discriminator conversely rewards high scores for samples from the generator, and the generator produces data to fool both discriminators (KL divergence) (5).
(4)  $\max_{D_1} \; \alpha\,\mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D_1(x)] + \mathbb{E}_{z \sim p_z}[-D_1(G(z))]$

(5)  $\max_{D_2} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[-D_2(x)] + \beta\,\mathbb{E}_{z \sim p_z}[\log D_2(G(z))]$

The generator is trained to minimize the sum of both objectives, and the hyperparameters $0 < \alpha, \beta \leq 1$ are used to control and stabilize the learning method.
Minimizing the Kullback-Leibler (KL) divergence between the data and model distributions covers multiple modes but may produce completely unseen and potentially undesirable samples. Optimizing toward the reverse KL divergence criterion instead mimics a mode-seeking process, in which the model concentrates on a single mode of the data distribution while ignoring the other modes.
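This mode-covering versus mode-seeking contrast can be checked numerically. The following toy NumPy sketch (our illustration, with assumed discrete distributions) shows that the forward KL explodes when the model misses a data mode, while the reverse KL explodes when the model puts mass where the data has none.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log((p + eps) / (q + eps)))

# Bimodal "data" distribution and two candidate "models".
p_data  = np.array([0.5, 0.0, 0.5])
p_model = np.array([1.0, 0.0, 0.0])   # mode-seeking candidate
p_broad = np.array([1/3, 1/3, 1/3])   # mode-covering candidate

# Forward KL(data || model) is huge when the model misses a data mode,
# so minimizing it favors broad, mode-covering fits ...
print(kl(p_data, p_model), kl(p_data, p_broad))

# ... while reverse KL(model || data) is huge when the model puts mass
# off the data modes, so minimizing it favors mode-seeking fits.
print(kl(p_model, p_data), kl(p_broad, p_data))
```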
IV-B3 Multiple generators, Multiple discriminators
The existence of equilibrium has always been one of the open theoretical problems in this game between generator and discriminator. Arora et al. in [4] turn to infinite mixtures of generator deep nets in order to investigate the existence of equilibria; unsurprisingly, an equilibrium exists in an infinite mixture. They then show that a mixture of a finite number of generators and discriminators can approximate the min-max solution in GANs, which implies that an approximate equilibrium can be achieved with a mixture of not too many generators and discriminators. In [4], a heuristic approximation to this mixture idea is proposed as a new training framework called MIX+GAN: use a mixture of T components, where T is as large as GPU memory allows (usually T ≤ 5). In fact, a mixture of T generators and T discriminators is trained; they share the same network architecture but have their own trainable parameters. Maintaining the mixture amounts to maintaining a weight for each generator, corresponding to the probability of selecting its output, and these weights are updated by backpropagation. This heuristic can be applied to existing methods like DCGAN, WGAN, etc. Experiments show that the MIX+GAN protocol improves the quality of several existing GAN training methods and can lead to more stable training.
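A minimal sketch of the bookkeeping such a mixture requires, under our own simplifying assumptions (the weights are held fixed here rather than learned by backpropagation as in MIX+GAN):

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

# Mixture weights over T generators, kept as trainable logits
# (updated by backpropagation in MIX+GAN; fixed here for illustration).
T = 5
logits = np.zeros(T)          # start uniform
w = softmax(logits)

# Entropy of the mixture: a regularizer penalizing low entropy
# discourages the mixture from collapsing onto a single component.
entropy = -np.sum(w * np.log(w))

# Sampling: pick which generator produces the next output.
rng = np.random.default_rng(0)
component = rng.choice(T, p=w)

print(w, entropy)             # uniform weights, entropy = log(T)
```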
As mentioned earlier, one of the significant challenges in GAN algorithms is convergence. According to [56], this challenge results from the fact that the cost functions may not converge using gradient descent in the minimax game between the discriminator and the generator. Convergence is also one of the considerable challenges in federated learning, and the problem becomes even harder when data at different sources are not independent and identically distributed. Therefore, [56] proposes Federated Generative Adversarial Network (FedGAN), an algorithm with a multi-generator, multi-discriminator architecture for training a GAN across distributed sources of non-iid data. The algorithm uses local generators and discriminators, which are periodically synced via an intermediary that averages and broadcasts the generator and discriminator parameters. In fact, Rasouli et al. connect results from stochastic approximation for GAN convergence and communication-efficient SGD for federated learning to address FedGAN's convergence. Notably, FedGAN achieves performance similar to general distributed GAN while converging and also reducing communication complexity.
In [41], the multi-agent distributed GAN (MADGAN) framework is proposed based on social group wisdom and the influence of the network structure on agents, in which the discriminator and the generator are regarded as the leader and the follower, respectively. MADGAN addresses the multi-agent cognitive consistency problem in large-scale distributed networks. In fact, [41] presents the conditions for consensus in a multi-generator, multi-discriminator distributed GAN by analyzing the existence of a stationary distribution for the Markov chain of agent states. The experimental results show that the generation quality of the generators trained by MADGAN is comparable to that of a generator trained by standard GAN. More importantly, MADGAN can train multiple generators simultaneously, and the training results of all generators are consistent.
IV-B4 One generator, One discriminator, One classifier
One of the issues that GANs face is catastrophic forgetting in the discriminator network. Self-supervised (SS) tasks were designed to handle this issue; however, they allow a severely mode-collapsed generator to excel at the SS tasks. Tran et al. in [66] propose new SS tasks, called multi-class minimax game based self-supervised tasks (MS), which are based on a multi-class minimax game involving a discriminator, a generator, and a classifier. The SS task is a 4-way classification task of recognizing one of four image rotations (0, 90, 180, 270 degrees). The discriminator's SS task is to train the classifier C to predict the rotation applied to real samples, and the generator's SS task is to train the generator G to produce fake samples that maximize classification performance. The SS task helps the generator learn the data distribution and generate diverse samples by closing the gap between supervised and unsupervised image classification. Theoretical and experimental analysis shows improved convergence for this approach.
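The 4-way rotation task itself is easy to sketch. The toy NumPy code below (our illustration, not the paper's implementation) builds the kind of self-supervised batch the classifier C would be trained on.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rotation_batch(images):
    """Build the 4-way self-supervised task: rotate each image by
    0/90/180/270 degrees and label it with the rotation class."""
    xs, ys = [], []
    for img in images:
        k = rng.integers(4)              # rotation class 0..3
        xs.append(np.rot90(img, k))      # k quarter-turns
        ys.append(k)
    return np.stack(xs), np.array(ys)

batch = rng.random((8, 32, 32))          # toy stand-in "images"
x, y = make_rotation_batch(batch)
print(x.shape, y.shape)                  # (8, 32, 32) (8,)
```

The classifier would be trained to predict `y` from `x` on real samples, while the generator is trained so its fakes support the same classification task.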
Li et al. in [45] also use a classifier to generate categorized text. The authors propose a new framework, CSGAN, which uses GAN, RNN, and RL to generate better sentences. The classifier's role is to ensure that the generated text contains the label information, and the RNN serves as a character predictor, because the model is built at the character level to limit the large search space. The generation process can be divided into two steps: first, category information is added to the model so that it generates category-specific sentences; then, the category information is combined in the GAN to generate labeled sentences. CSGAN performs strongly in supervised learning, especially on multi-category datasets.
IV-B5 One generator, One discriminator, One RL agent
With an RL agent, we can have fast and robust control over the GAN's output or input. This architecture can also be used to optimize the generation process by adding an arbitrary (not necessarily differentiable) objective function to the model.
In [9], Cao et al. use this architecture for generating molecules for drug discovery. The authors encode the molecules in their original graph-based representation, which has no overhead compared to similar approaches like SMILES [71] that generate a text sequence from the original graph. For training, the authors were not only interested in generating chemically valid compounds but also tried to optimize the generation process toward some non-differentiable metrics (e.g., how likely the new molecule is to be water-soluble or fat-soluble) using an RL agent. In Molecular GAN (MolGAN), external software computes the RL loss for each molecule, and the generator is trained with a linear combination of the RL loss and the WGAN loss.
The authors of ORGAN [31] tackled the same problem. Compared to [9], they encode the molecules as text sequences using SMILES [71], the string representation of a molecule, rather than the original graph-based form. Their Objective-Reinforced Generative Adversarial Network (ORGAN) is built on SeqGAN [79], and its RL agent uses REINFORCE [73], a gradient-based approach, instead of the deep deterministic policy gradient (DDPG) [47], an off-policy actor-critic algorithm that Cao et al. use in [9]. MolGAN achieves better chemical property scores than ORGAN, but it suffers from mode collapse because neither the GAN nor the RL objective encourages diverse outputs; ORGAN's REINFORCE-based agent, in contrast, optimizes a uniqueness score that penalizes non-unique outputs.
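For intuition on the REINFORCE estimator that ORGAN's agent relies on, here is a self-contained toy sketch (our illustration; the 3-action softmax policy and the reward table are assumptions, not the paper's setup). The reward can be any external, non-differentiable scorer, which is precisely why a policy-gradient estimator is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy REINFORCE loop: a softmax policy over 3 "token" choices, rewarded
# by an external scorer (a hypothetical lookup table here).
theta = np.zeros(3)
rewards = np.array([0.1, 0.9, 0.2])      # assumed per-action rewards

for _ in range(500):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)
    r = rewards[a]
    baseline = probs @ rewards           # variance-reducing baseline
    grad_logp = -probs
    grad_logp[a] += 1.0                  # d/dtheta of log softmax(theta)[a]
    theta += 0.5 * (r - baseline) * grad_logp

print(softmax(theta).argmax())           # policy concentrates on action 1
```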
An RL agent can also be used to control the generator. Sarmad et al. in [59] present RL-GAN-Net, a real-time completion framework for point cloud shapes. Their architecture combines an autoencoder (AE), a reinforcement learning (RL) agent, and a latent-space generative adversarial network (l-GAN). Based on the pre-trained AE, the RL agent selects the proper seed for the generator. This idea of controlling the GAN's output can open up new possibilities for overcoming the fundamental instabilities of current deep architectures.
Reference | Methodology and Contributions | Pros | Cons
Multiple generators, One discriminator: Subsection IV-B1
Stackelberg GAN [80] | Tackling the instability problem in the training procedure with a multi-generator architecture | More stable training; alleviates mode collapse |
MADGAN [41] | A multi-agent distributed GAN framework based on social group wisdom | Simultaneous training of multiple generators; consistent training results across generators |
MAD-GAN [22] | A multi-agent diverse GAN architecture | Captures diverse modes while producing high-quality samples | Assumes infinite capacity for players; global optimum not practically reachable
MGAN [34] | Encouraging generators to generate separable data via a classifier | Overcomes mode collapse; diversity |
[23] | An innovative message-passing model, where messages are passed among generators | Improved image generation; valuable representations from the message generator |
One generator, Multiple discriminators: Subsection IV-B2
DDL-GAN [40] | Using DDL | Diversity | Only applicable to multiple discriminators
D2GAN [52] | Combining KL and reverse KL divergences | Quality and diversity; scalable to large-scale datasets | Not as powerful as autoencoder/GAN combinations
GMAN [11] | Multiple discriminators | Robust to mode collapse | Complexity; discriminators converge to the same outputs
microbatchGAN [50] | Using micro-batches | Mitigates mode collapse |
MD-GAN [32] | Parallel computation, distributed data | Lower communication cost and computation complexity |
FakeGAN [1] | Text classification | |
Multiple generators, Multiple discriminators: Subsection IV-B3
[4] | Tackling generalization and equilibrium in GANs | Improves the quality of several existing GAN training methods | Aggregates losses with an extra regularization term that discourages weights from drifting far from uniform
MADGAN [41] | Addressing the multi-agent cognitive consistency problem in large-scale distributed networks | Simultaneous training of multiple generators; consistent training results across generators |
FedGAN [56] | A multi-generator, multi-discriminator architecture for training a GAN over distributed sources | Performance similar to general distributed GAN with reduced communication complexity |
One generator, One discriminator, One classifier: Subsection IV-B4
CSGAN [45] | Combining RNN, GAN, and RL; using a classifier to validate the category; a character-level model | Generates sentences based on category; limits the action space |
[66] | Multi-class minimax game based self-supervised tasks | Improved convergence; can be integrated into GAN models |
One generator, One discriminator, One RL agent: Subsection IV-B5
MolGAN [9] | Using original graph-structured data; RL objective to target specific chemical properties | Better chemical property scores; no representation overhead | Susceptible to mode collapse
ORGAN [31] | Encoding molecules as text sequences; controlling properties of generated samples with RL; Wasserstein distance as loss function | Better results than RNNs trained via MLE or SeqGAN | Representation overhead; works only on sequential data
RL-GAN-Net [59] | Using RL to find the correct input for the GAN; combining AE, RL, and l-GAN | Real-time point cloud shape completion; lower complexity |
IV-C Modified Learning Algorithm
This category covers methods whose proposed improvements involve modifications of the learning method. In this section, we turn our attention to the literature that combines other learning approaches, such as fictitious play and reinforcement learning, with GANs.
The GAN variants surveyed in IV-C1 study the GAN training process as a regret minimization problem, instead of the popular view that seeks to minimize the divergence between the real and generated distributions. As another learning method, subsection IV-C2 covers work that utilizes fictitious play to simulate the training algorithm of GANs. IV-C3 reviews GAN models that use a federated learning framework, training across distributed sources to overcome the data limitations of GANs. Research in IV-C4 seeks to make a connection between GANs and RL. Table V summarizes the contributions, pros, and limitations of the literature reviewed in this category.
IV-C1 No-regret learning
Best-response algorithms for GANs are often computationally intractable, do not lead to convergence, and exhibit cycling behavior even in simple games; a simple remedy in that case is to average the iterates. Regret minimization is a more suitable way to think about GAN training dynamics. In [42], Kodali et al. propose studying GAN training dynamics as a repeated game in which both players use no-regret algorithms. The authors also show that the convex-concave case of the GAN game has a unique solution: if G and D have enough capacity in the non-parametric limit and updates are made in function space, the GAN game is convex-concave, and convergence (of the averaged iterates) can be guaranteed using no-regret algorithms. With standard arguments from the game theory literature, the authors show that the discriminator does not need to be optimal at each step.
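The benefit of averaging the iterates can be seen on the classic bilinear game. This toy sketch (our illustration) runs simultaneous gradient updates on min_x max_y xy: the last iterates cycle and slowly drift outward, while the averaged iterates approach the unique equilibrium at the origin.

```python
import numpy as np

# min_x max_y f(x, y) = x * y has a unique equilibrium at (0, 0).
x, y, lr = 1.0, 1.0, 0.01
xs, ys = [], []

for _ in range(20000):
    gx, gy = y, x                      # df/dx = y, df/dy = x
    x, y = x - lr * gx, y + lr * gy    # simultaneous gradient updates
    xs.append(x); ys.append(y)

last_dist = np.hypot(x, y)                       # cycles, drifts outward
avg_dist  = np.hypot(np.mean(xs), np.mean(ys))   # -> near the equilibrium
print(last_dist, avg_dist)
```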
In contrast to [42], many recent developments [28] are based on the unrealistic assumption that the discriminator plays optimally, which corresponds to at least one player using a best-response algorithm. In the practical case with neural networks, these convergence results do not hold because the game objective is non-convex, and in non-convex games global regret minimization and equilibrium computation are computationally hard. Kodali et al. in [42] also analyze the convergence of GAN training from this point of view to understand mode collapse. They show that mode collapse arises from undesirable local equilibria in this non-convex game, accompanied by sharp gradients of the discriminator function around some real data points. Furthermore, the authors show that a gradient penalty scheme can avoid mode collapse by regularizing the discriminator to constrain its gradients in the ambient data space.
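As an illustration of such a penalty, the following sketch (our own, with a hypothetical discriminator and finite-difference gradients in place of the autodiff a real framework would use; `lam`, `k`, and the noise scale are assumed values) computes a DRAGAN-style term that penalizes sharp discriminator gradients around perturbed real points.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_discriminator(x):
    """Stand-in for D: a hypothetical smooth scalar function."""
    return np.tanh(x @ np.array([2.0, -3.0]))

def grad_norm(f, x, h=1e-5):
    """Finite-difference gradient norm (autodiff in a real framework)."""
    g = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                  for e in np.eye(len(x))])
    return np.linalg.norm(g)

def dragan_penalty(f, real_batch, lam=10.0, k=1.0, sigma=0.1):
    """Penalize discriminator gradient norms far from k around noisy
    copies of real data points (assumed default hyperparameters)."""
    pen = 0.0
    for x in real_batch:
        x_hat = x + sigma * rng.standard_normal(x.shape)
        pen += (grad_norm(f, x_hat) - k) ** 2
    return lam * pen / len(real_batch)

batch = rng.standard_normal((4, 2))      # toy "real" samples in R^2
pen = dragan_penalty(toy_discriminator, batch)
print(pen)                               # non-negative penalty value
```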
Compared to [42], Grnarova et al. in [29] also use regret minimization, but they provide a method that provably converges to a mixed Nash equilibrium. Since the minimax value of a pure strategy for the generator is always higher than the minimax value of its mixed equilibrium strategy, mixed strategies are more suitable for the generator. This convergence holds for semi-shallow GAN architectures, in which the generator is an arbitrary network and the discriminator is a single-layer network, when every player uses a regret minimization procedure, even though the game induced by such architectures is not convex-concave. Furthermore, they show that the generator's equilibrium strategy is optimal for the minimax objective.
IV-C2 Fictitious play
GAN is a two-player zero-sum game whose training process is a repeated game. If a zero-sum game is played repeatedly between two rational players, each tries to increase its payoff. Let $a_i(t)$ denote the action taken by player $i$ at time $t$, and $a_i(0), \ldots, a_i(t-1)$ the previous actions chosen by player $i$. Player $j$ can then choose the best response, assuming player $i$ picks its strategy according to the empirical distribution of $a_i(0), \ldots, a_i(t-1)$. Since the expected utility is a linear combination of the utilities under different pure strategies, we can assume that each player plays the best pure response at each round. In game theory, this learning rule is called fictitious play, and it can help find the Nash equilibrium: fictitious play achieves a Nash equilibrium in two-player zero-sum games if the game's equilibrium is unique. However, if multiple Nash equilibria exist, different initializations may yield different solutions.
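Fictitious play is easy to demonstrate on matching pennies, a zero-sum game whose unique equilibrium is mixed. In this minimal NumPy sketch (our illustration), each player best-responds to the opponent's empirical action distribution, and the empirical frequencies converge toward (1/2, 1/2).

```python
import numpy as np

# Matching pennies: payoff matrix for the row player;
# the column player receives the negative.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

counts_row = np.ones(2)   # empirical action counts (smoothed start)
counts_col = np.ones(2)

for _ in range(5000):
    emp_col = counts_col / counts_col.sum()
    emp_row = counts_row / counts_row.sum()
    # Each player plays a best pure response to the opponent's
    # empirical distribution of past actions.
    a_row = np.argmax(A @ emp_col)
    a_col = np.argmax(-(emp_row @ A))
    counts_row[a_row] += 1
    counts_col[a_col] += 1

print(counts_row / counts_row.sum())   # -> close to [0.5, 0.5]
```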
By relating GAN to the two-player zero-sum game, Ge et al. in [19] design a training algorithm that simulates fictitious play on GAN and provide a theoretical convergence guarantee. They show that, assuming the best response at each update in Fictitious GAN, the distribution of the mixture outputs from the generators converges to the data distribution, and the discriminator outputs converge to the optimal discriminator function. The authors in [19] use two queues, D and G, to store the historically trained models of the discriminator and the generator. They also show that Fictitious GAN can effectively resolve some convergence issues that the standard training approach cannot, and that it can be applied on top of existing GAN variants.
IV-C3 Federated learning
Data limitation is a common drawback of deep learning models like GANs. It can be addressed by using distributed data from multiple sources, but this is difficult for reasons such as users' privacy concerns, communication efficiency, and statistical heterogeneity. This motivates the use of federated learning in GANs [56, 12].
Rasouli et al. in [56] propose a federated approach to GANs that trains over distributed sources of non-iid data. In this model, every K time steps of local gradient updates, agents send their local discriminator and generator parameters to an intermediary and receive back the synchronized parameters. Owing to the reduced average communication per round per agent, FedGAN is more efficient than general distributed GAN, and experiments show that FedGAN remains robust as K increases. To prove the convergence of this model, the authors connect the convergence of the GAN to the convergence of an ordinary differential equation (ODE) representation of the parameter updates [49], under equal or two-time-scale updates of generators and discriminators. They show that the ODE representation of FedGAN's parameter updates asymptotically follows the ODE representing the parameter updates of a centralized GAN; hence, by the existing results for centralized GANs, FedGAN also converges.
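The periodic averaging step can be sketched on a toy problem (our illustration; the quadratic local objectives stand in for local GAN losses and are an assumption). Each agent runs K local gradient steps toward its own target, then an intermediary averages and broadcasts the parameters; the synchronized parameters converge to the average of the local optima, playing the role of the centralized solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# FedGAN-style synchronization sketch: n_agents hold local parameters,
# run K local gradient steps, then an intermediary averages + broadcasts.
n_agents, K, rounds, lr = 4, 5, 40, 0.1

# Toy non-iid setup: each agent pulls its parameters toward a different
# local target; the mean target acts as the centralized optimum.
targets = rng.standard_normal((n_agents, 3))
params = np.zeros((n_agents, 3))

for _ in range(rounds):
    for _k in range(K):                      # K local gradient steps
        params -= lr * (params - targets)
    params[:] = params.mean(axis=0)          # intermediary: average, broadcast

print(np.allclose(params[0], targets.mean(axis=0), atol=1e-3))
```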
Fan et al. in [12] also propose a generative learning model using a federated learning framework, aiming to train a unified central GAN model from the combined generative models of each client. They examine four synchronization strategies: synchronizing the central models of both D and G to every client (Sync D&G), synchronizing only the generator or only the discriminator (Sync G or Sync D), or neither (Sync None). In situations where communication costs are high, they recommend Sync G at the price of some generative potential; otherwise, both D and G should be synchronized. Their results show that federated learning is generally robust to the number of agents with independent and identically distributed (IID) and moderately non-IID training data. However, for highly skewed data distributions, the model performs abnormally due to weight divergence [12].
IV-C4 Reinforcement learning
Cross-modal hashing maps different types of multimedia data into a common Hamming space, enabling fast and flexible retrieval across modalities. It has two weaknesses: (1) it depends on large-scale labeled cross-modal training data, and (2) it ignores the rich information contained in the large amounts of unlabeled data across modalities. Zhang et al. in [81] therefore propose semi-supervised cross-modal hashing GAN (SCH-GAN), which exploits a large amount of unlabeled data to improve hashing learning. Given a query of one modality, the generator tries to pick margin examples of another modality from the unlabeled data, taking the correlation score predicted by the discriminator as a reward; the discriminator tries to predict the correlation between the query and the examples chosen by the generator, and the generator is trained with reinforcement learning.
An agent trained with RL can only achieve the single task specified by its reward function. Florensa et al. in [16] therefore propose the Goal Generative Adversarial Network (Goal GAN). This method allows an agent to automatically discover the range of tasks at the appropriate level of complexity in its environment, with no prior knowledge about the environment or the tasks being performed, and to generate its own reward functions. The goal discriminator is trained to evaluate whether a goal is at the appropriate level of difficulty for the current policy, and the goal generator is trained to generate goals that meet this criterion.
GANs have limitations when the goal is to generate sequences of discrete tokens. First, it is hard to pass the gradient update from the discriminator to the generator when the outputs are discrete. Second, the discriminator can only reward an entire sequence after it has been generated; for a partially generated sequence, it is non-trivial to balance how good the sequence is now and how good it will be once complete. Yu et al. in [79] propose Sequence GAN (SeqGAN), which models the data generator as a stochastic policy in reinforcement learning (RL). The RL reward signal comes from the discriminator judging a complete sequence and is passed back to the intermediate state-action steps using Monte Carlo search. The method thus cares about the long-term reward at every timestep, considering not only the fitness of previous tokens but also the resulting future outcome. "This is similar to playing games such as Go or Chess, where players sometimes give up immediate interests for the long-term victory" [63].
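The Monte Carlo rollout trick can be sketched in a few lines. In this toy illustration (ours, not the paper's code), the two-token vocabulary, the uniform rollout policy, and the token-counting discriminator are all assumptions; the point is only that a partial sequence earns an intermediate reward by averaging the discriminator's scores over completed rollouts.

```python
import random

random.seed(0)
VOCAB = ["a", "b"]

def policy(prefix):
    """Stand-in generator policy: uniform over the vocabulary."""
    return random.choice(VOCAB)

def discriminator(seq):
    """Hypothetical discriminator: scores COMPLETE sequences only.
    Here it 'likes' sequences with many 'a' tokens."""
    return seq.count("a") / len(seq)

def rollout_reward(prefix, seq_len, n_rollouts=64):
    """SeqGAN-style intermediate reward: complete the partial sequence
    with Monte Carlo rollouts under the policy and average the
    discriminator's scores on the finished sequences."""
    total = 0.0
    for _ in range(n_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(policy(seq))
        total += discriminator(seq)
    return total / n_rollouts

# A prefix the discriminator favors earns a higher intermediate reward.
print(rollout_reward(["a", "a"], 8) > rollout_reward(["b", "b"], 8))
```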
The main problem in [79] is that the classifier's reward cannot accurately reflect the novelty of the text. In comparison to [79], the authors of [75] assign a low reward to repeatedly generated text and a high reward to "novel" and fluent text, encouraging the generator to produce diverse and informative text, and they propose a novel language-model-based discriminator that can better distinguish novel text from repeated text without the saturation problem. The generator's reward consists of two parts: a sentence-level reward and a word-level reward. To train the discriminator, the authors maximize the reward of real text and minimize the reward of fake text. Minimizing the reward of generated text ensures that text repeatedly produced by the generator is identified by the discriminator and receives a lower reward; maximizing the reward of real-world data ensures not only that uncommon text in the generated data can receive a high reward, but also that the discriminator can punish low-quality text to some extent.
The same notion as SeqGAN can be applied in domains such as image captioning, whose aim is to describe an image with words. Former approaches to image captioning, like maximum likelihood training, suffer from the so-called exposure bias problem, which occurs when the model produces a sequence of tokens conditioned on its own previous tokens: the model may generate tokens that were never seen in the training data [5]. Yan et al. in [76] used the idea of SeqGAN to address exposure bias. In this scheme, the image captioning model serves as the generator in the GAN framework, whose aim is to describe the images. The discriminator has two duties: to distinguish real descriptions from generated ones, and to determine whether a description is related to the image. To deal with the discreteness of the generated text, the discriminator is treated as an agent that produces a reward for the generator. The lack of intermediate rewards is a further problem, solved with the Monte Carlo rollout strategy, the same as in SeqGAN.

Finding new chemical compounds and generating molecules are also challenging tasks in a discrete setting. [31] and [9] tackled this problem and proposed two models that rely on SeqGAN; their main difference is the RL component added to the basic GAN architecture, as discussed in subsection IV-B5.
The idea behind SeqGAN has also been applied to generating sentences with certain labels. Li et al. in [45] introduced CSGAN, which consists of a generator and a descriptor (discriminator and classifier). In this model, the generator takes an action, and the descriptor's task is to identify the sentence category and return the reward. Details of this model are explained in subsection IV-B4.
Aghakhani et al. in [1] introduce a system that for the first time applies GANs to a text classification task, specifically detecting deceptive reviews (FakeGAN). Previous models for text classification have limitations: (1) bias problems, as in recurrent neural networks, where later words in a text carry more weight than earlier words; and (2) dependence on the window size, as in CNNs. Unlike a standard GAN with a single generator and discriminator, FakeGAN uses two discriminators and one generator. The authors model the generator as a stochastic policy agent in reinforcement learning (RL) and use the Monte Carlo search algorithm for the discriminators to estimate and pass the intermediate action-value as the RL reward to the generator. One discriminator tries to distinguish between truthful and deceptive reviews, whereas the other tries to distinguish between fake and real reviews.
Ghosh et al. in [21] use GANs to learn the handwriting of an entity and combine them with reinforcement learning techniques to achieve faster learning. The generator can generate words that look similar to the reference word, and the discriminator network can be used as an OCR (optical character recognition) system. Reinforcement learning comes into play when letters need to be joined to form words, for instance in the spacing between characters and the strokes from one note to another, providing suitable rewards or penalties so the generator learns the handwriting with greater accuracy.
The optimized generation of sequences with particular desired goals is challenging in sequence generation tasks. Most current work mainly learns to generate outputs that are close to the real distribution; however, many applications need generated data that are both similar to the real data and possess specific properties or attributes. Hossam et al. in [36] introduce the first GAN-controlled generative model for sequences that addresses the diversity issue in a principled way. The authors combine the benefits of GAN and RL policy learning while avoiding the drawbacks of mode collapse and high variance. They show that if pure RL is applied with a GAN-based objective alone, the realistic quality of the output may be sacrificed to achieve a higher reward; in text generation, for example, the model could achieve a similar quality score by generating sentences in which a few words are repeated all the time. Hence, combining a GAN-based objective with RL keeps the RL optimization process close to the actual data distribution. This model can be applied to any GAN model to let it directly optimize the desired goal for the given task.
A novel RL-based neural architecture search (NAS) methodology for GANs is proposed in [65] by Tian et al. The article redefines neural architecture search for GANs as a Markov decision process, which yields a more effective RL-based search algorithm with better global optimization. This formulation also improves data efficiency by better facilitating off-policy RL training [65]. Most previously proposed RL-based GAN architecture search methods use on-policy RL, which may lead to significantly long training times because of limited data efficiency. Off-policy RL algorithms let agents learn more efficiently by reusing past experience; however, using off-policy data can make policy network training unstable, because these training samples differ systematically from the on-policy ones [65]. The new formulation in [65] better supports the off-policy strategy and lessens this instability problem.

Reference | Methodology and Contributions | Pros | Cons
Noregret learning: Subsection IVC1  
DRAGAN [42]  Applying noregret algorithm, new regularizer  High stability across objective functions, mitigates mode collapse   
Chekhov GAN [29]  Online learning algorithm for semi concave game  Converge to mixed NE for semi shallow discriminator   
Fictitious play: Subsection IVC2  
Fictitious GAN [19]  Fictitious (historical models)  Solve the oscillation behavior, solve divergence issue on some cases, applicable  Applied only on 2 player zerosum games 
Federated learning: Subsection IVC3  
FedGAN [56]  Communicationefficient distributed GAN subject to privacy constraints, connect the convergence of GAN to ODE  Prove the convergence, less communication complexity compare to general distributed GAN   
[12]  Using a federated learning framework  Robustness to the number of clients with IID and moderately nonIID data  Performs anomaly for highly skewed data distribution, accuracy drops with nonIID data 
Reinforcement learning: Subsection IVC4  
[16]  Generate diverse appropriate level of difficulty set of goal     
Diversitypromoting GAN [75]  New objective function, generate text  Diversity and novelty   
[82]  Using GAN for crossmodel hashing  Extract rich information from unlabeled data   
SeqGAN [79]  Extending GANs to generate sequence of discrete tokens  Solve the problem of discrete data   
FakeGAN [1]  Text classification     
CSGAN [45]  Combine RL, GAN, RNN  More realistic, faster   
[21]  Handwriting recognition     
ORGAN [31]  RL agent + SeqGAN  Better result than RNN trained via MLE or SeqGAN  Works only on sequential data 
MolGAN [9]  RL agent + SeqGAN  optimizing nondifferentiable metrics by RL & Faster training time  Susceptible to mode collapse 
OptiGAN [36]  Combining MLE and GAN  Used for different models and goals   
[65]  Redefining the issue of neural architecture search for GANs by applying Markov decision process formulation  More effective RLbased search algorithm, smoother architecture sampling   
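The data-efficiency contrast between on-policy and off-policy training noted above can be illustrated with a minimal sketch. This is not code from [65]: the one-state task, the `ReplayBuffer` class, and the tabular update rule are illustrative assumptions. The point is only that an off-policy learner stores transitions and revisits them, so each environment interaction feeds many updates.

```python
import random
from collections import deque

# Minimal sketch (not from [65]): off-policy learning reuses past experience
# via a replay buffer, whereas an on-policy method would discard each
# transition after a single update.

class ReplayBuffer:
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        pool = list(self.buffer)
        return random.sample(pool, min(batch_size, len(pool)))

random.seed(0)
q = [0.0, 0.0]   # value estimates for two actions in a one-state task
alpha = 0.5      # learning rate
buf = ReplayBuffer()

for step in range(50):
    a = random.randint(0, 1)   # behavior policy: uniform exploration
    r = float(a)               # action 1 pays reward 1, action 0 pays 0
    buf.push((a, r))
    # Off-policy update: learn from a replayed minibatch of old transitions,
    # not only from the freshest sample.
    for a_i, r_i in buf.sample(8):
        q[a_i] += alpha * (r_i - q[a_i])

print(q[1] > q[0])  # the rewarding action ends up with the higher value
```

Each of the 50 interactions here contributes to up to eight updates, which is the sense in which [65]'s off-policy formulation buys data efficiency; the instability it must then manage comes from those replayed samples being drawn from older behavior.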
V Discussion, Conclusion and Future Work
Although various studies have explored different aspects of GANs, several challenges remain that should be investigated. In this section, we discuss these challenges, particularly within the subject of this survey, the game of GANs, and propose future research directions to tackle them.
VA Open Problems and Future Directions
While GANs achieve state-of-the-art performance and compelling results on various generative tasks, these results come with several challenges, especially the difficulty of training GANs. The training procedure suffers from instability problems. While seeking a Nash equilibrium, the generator and discriminator each try to minimize their own cost function regardless of the other. This can cause non-convergence and instability, because minimizing one player's cost can maximize the other's. Another main problem of GANs that needs to be addressed is mode collapse. This problem becomes more critical for unbalanced data sets or when the number of classes is high. On the other hand, when the discriminator becomes too good at distinguishing samples, the generator's gradients vanish. This problem, called the vanishing gradient, should also be considered. Compared with other generative models, the evaluation of GANs is more difficult. This is partially due to the lack of appropriate metrics: most evaluation metrics are qualitative rather than quantitative, and qualitative evaluation, such as human examination of samples, is an arduous task whose outcome depends on the examiner.
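The non-convergence behavior described above already appears in the simplest possible setting. The following toy example (ours, not from the surveyed works) runs simultaneous gradient descent-ascent on the bilinear zero-sum game f(x, y) = xy, whose unique Nash equilibrium is (0, 0): each player updates against its own cost while ignoring the other, and the iterates spiral away from the equilibrium.

```python
# Toy illustration of GAN-style non-convergence: simultaneous gradient
# descent-ascent on f(x, y) = x * y. The minimizer (generator-like player)
# descends in x, the maximizer (discriminator-like player) ascends in y.
# Each step multiplies the squared distance to the equilibrium (0, 0)
# by (1 + eta**2), so the iterates spiral outward instead of converging.
eta = 0.1
x, y = 1.0, 1.0
for _ in range(100):
    gx, gy = y, x                       # partial derivatives of f(x, y)
    x, y = x - eta * gx, y + eta * gy   # each player ignores the other
print(x * x + y * y > 2.0)  # True: farther from (0, 0) than the start
```

This is exactly the pathology that motivates the stabilization techniques surveyed earlier (regularizers, historical averaging, fictitious play): naive per-player cost minimization need not approach the equilibrium even when one exists.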
More specifically, as the authors in [7] expressed, one of the most important future directions is to improve the theoretical underpinnings of GANs in order to solve problems such as mode collapse, non-convergence, and training difficulties. Although there have been many works on the theoretical aspects, most current training strategies are based on optimization theory, whose scope is restricted to local convergence due to non-convexity, and the utilization of game-theoretic techniques is still in its infancy. At present, game-theoretic GAN variants are limited; many of them are highly restrictive and rarely directly applicable. Hence, there is much room for research on game-based GANs involving other game models.
From the convergence viewpoint, most current training methods converge to a local Nash equilibrium, which can be far from an actual, global NE. While there is a vast literature on GAN training, only a few works, such as [37], formulate the training procedure from the mixed-NE perspective, and mixed Nash equilibria of GANs should be examined in more depth. On the other hand, the existence of an equilibrium does not imply that it can be easily found by a simple algorithm. In particular, training GANs requires finding Nash equilibria in non-convex games, and computing equilibria in such games is computationally hard. In the future, we expect to see more solutions that make GAN training more stable and converge to an actual NE.
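To give a concrete sense of why the mixed-NE perspective matters: matching pennies has no pure Nash equilibrium, yet fictitious play, in which each player best-responds to the opponent's empirical action frequencies, drives play toward the mixed NE (1/2, 1/2). The sketch below is a hypothetical illustration of that classical dynamic, not an algorithm from the surveyed works.

```python
# Hypothetical sketch: fictitious play on matching pennies. The matcher wins
# when both coins agree; the mismatcher wins when they differ. No pure NE
# exists, but best-responding to the opponent's empirical mixture steers the
# empirical action frequencies toward the mixed NE (1/2, 1/2).
counts_a = [1, 1]  # matcher's action counts (heads, tails), seeded at 1
counts_b = [1, 1]  # mismatcher's action counts

for _ in range(10000):
    a = 0 if counts_b[0] >= counts_b[1] else 1  # matcher copies the likelier coin
    b = 0 if counts_a[1] >= counts_a[0] else 1  # mismatcher plays the opposite
    counts_a[a] += 1
    counts_b[b] += 1

p_heads = counts_a[0] / sum(counts_a)
print(abs(p_heads - 0.5) < 0.05)  # empirical frequency near the mixed NE
```

This is the same intuition behind Fictitious GAN [19] and the mixed-NE formulation in [37]: when no pure equilibrium exists, averaging over historical play (mixed strategies) can still converge where single-strategy dynamics oscillate.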
Multi-agent models such as [80, 22, 34, 23, 11, 40, 32, 1, 50, 52, 4, 56, 41] are computationally more complex and expensive than two-player models, and this factor should be taken into account when developing such variants. Moreover, in multi-generator structures, divergence among the generators should be encouraged so that they do not all generate the same samples.
Another direction in which we expect to witness innovation is the integration of GANs with other learning methods. There are a variety of methods in the multi-agent learning literature that should be explored, as they may be useful when applied to multi-agent GANs. In addition, much more research on the relationship and combination between GANs and currently applied learning approaches such as RL is still required; this, too, will be a promising research direction in the next few years. Moreover, the GAN was proposed as an unsupervised learning model, but adding a certain number of labels, especially in practical applications, can substantially improve its generative capability. Therefore, how to combine GANs and semi-supervised learning is also a potential future research topic.
As a final note, the GAN is a relatively new model with significant recent progress, so the landscape of possible applications remains open for exploration. Advances in solving the above challenges can be decisive for GANs to be employed more widely in real-world scenarios.
VB Conclusion
We have conducted this review of recent progress in GANs through the lens of game theory so that it can serve as a reference for future research. Compared with other reviews in the literature, and considering the many published works dealing with GAN challenges, we emphasize the theoretical aspects. This is done by taking a game-theoretic perspective based on our proposed taxonomy. In this survey, we first provided detailed background on game theory and GANs. To present a clear roadmap, we then introduced our taxonomy, which has three major categories: game model, architecture, and learning approach. Following the proposed taxonomy, we discussed each category separately in detail and presented the GAN-based solutions in each subcategory. We hope this paper is beneficial for researchers interested in this field.
References
 [1] (May 2018) Detecting deceptive reviews using generative adversarial networks. In 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, pp. 89–95. Cited by: Fig. 2, Fig. 2, §IVB2, §IVC4, TABLE IV, TABLE V, §VA.
 [2] (February 2020) A survey and taxonomy of adversarial neural networks for text-to-image synthesis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10 (4), pp. e1345. Cited by: §IIIC, §IIIC, §III.
 [3] (March 2021) Applications of generative adversarial networks (GANs): an updated review. Archives of Computational Methods in Engineering 28 (3), pp. 525–552. Cited by: §I, §IIIB, §III.
 [4] (2017) Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573. Cited by: Fig. 2, §IVB3, TABLE IV, §VA.
 [5] (September 2015) Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems  Volume 1, NIPS’15, Cambridge, MA, USA, pp. 1171–1179. Cited by: §IVC4.
 [6] (2019) The six fronts of the generative adversarial networks. arXiv preprint arXiv:1910.13076. Cited by: §IA, §IIIA, §III.
 [7] (December 2018) Recent advances of generative adversarial networks in computer vision. IEEE Access 7, pp. 14985–15006. Cited by: §I, §IIIC, §III, §VA.
 [8] (2018) Generative adversarial networks: an overview. IEEE Signal Processing Magazine 35 (1), pp. 53–65. Cited by: §IA, §IIIA, §III.
 [9] (2018) MolGAN: an implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973. Cited by: Fig. 2, Fig. 2, §IVB5, §IVB5, §IVC4, TABLE IV, TABLE V.
 [10] (2019) A survey on GANs for anomaly detection. arXiv preprint arXiv:1906.11632. Cited by: §IIIC, §III.
 [11] (2016) Generative multiadversarial networks. arXiv preprint arXiv:1611.01673. Cited by: Fig. 2, §IVB2, §IVB2, TABLE IV, §VA.
 [12] (2020) Federated generative adversarial learning. arXiv preprint arXiv:2005.03793. Cited by: Fig. 2, §IVC3, §IVC3, TABLE V.

 [13] (February 2020) A survey of differentially private generative adversarial networks. In The AAAI Workshop on Privacy-Preserving Artificial Intelligence, New York, NY, USA. Cited by: §IIIB.
 [14] (2020) GANs may have no Nash equilibria. arXiv preprint arXiv:2002.09124. Cited by: Fig. 2, §IVA2, TABLE III.
 [15] (2017) Many paths to equilibrium: GANs do not need to decrease a divergence at every step. arXiv preprint arXiv:1710.08446. Cited by: §I.

 [16] (July 2018) Automatic goal generation for reinforcement learning agents. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, Stockholm, Sweden, pp. 1515–1528. Cited by: Fig. 2, §IVC4, TABLE V.
 [17] (July 2020) Generative adversarial networks as stochastic Nash games. arXiv preprint arXiv:2010.10013. Cited by: Fig. 2, §IVA1, TABLE III.
 [18] (2020) Generative adversarial networks for spatiotemporal data: a survey. arXiv preprint arXiv:2008.08903. Cited by: §IIIC, §III.
 [19] (September 2018) Fictitious GAN: training GANs with historical models. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, pp. 119–134. Cited by: TABLE II, Fig. 2, §IVC2, TABLE V.
 [20] (November 2020) Synthetic observational health data with GANs: from slow adoption to a boom in medical research and ultimately digital twins? Cited by: §IIIC, §III.
 [21] (2016) Handwriting profiling using generative adversarial networks. arXiv preprint arXiv:1611.08789. Cited by: Fig. 2, §IVC4, TABLE V.

 [22] (June 2018) Multi-agent diverse generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 8513–8521. Cited by: Fig. 2, §IVB1, §IVB1, §IVB1, TABLE IV, §VA.
 [23] (2016) Message passing multi-agent GANs. arXiv preprint arXiv:1612.01294. Cited by: Fig. 2, §IVB1, TABLE IV, §VA.
 [24] (July 2020) A survey on the progression and performance of generative adversarial networks. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, pp. 1–8. Cited by: §IA, §IIIA, §IIIC, §III.
 [25] (June 2019) A review: generative adversarial networks. In 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi’an, China, pp. 505–510. Cited by: §IA, §IIB2, §IIIA, §III.
 [26] (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661. Cited by: §I, §I.
 [27] (December 2014) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems  Volume 2, NIPS’14, Cambridge, MA, USA, pp. 2672–2680. Cited by: §IIB2, §IIB3.
 [28] (2016) NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160. Cited by: §IA, §IIB1, §IIIA, §III, Fig. 2, §IVC1.
 [29] (2017) An online learning approach to generative adversarial networks. arXiv preprint arXiv:1706.03269. Cited by: Fig. 2, §IVC1, TABLE V.
 [30] (2020) A review on generative adversarial networks: algorithms, theory, and applications. arXiv preprint arXiv:2001.06937. Cited by: §IA, §IIIA, §III.
 [31] (2017) Objectivereinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843. Cited by: Fig. 2, Fig. 2, §IVC4, TABLE IV, TABLE V.
 [32] (May 2019) MD-GAN: multi-discriminator generative adversarial networks for distributed datasets. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, pp. 866–877. Cited by: Fig. 2, §IVB2, TABLE IV, §VA.
 [33] (2018) Comparative study on generative adversarial networks. arXiv preprint arXiv:1801.04271. Cited by: §IA, §IIIA, §III.
 [34] (February 2018) MGAN: training generative adversarial nets with multiple generators. In International Conference on Learning Representations, Vancouver, Canada. Cited by: Fig. 2, §IVB1, TABLE IV, §VA.
 [35] (February 2019) How generative adversarial networks and their variants work: an overview. ACM Computing Surveys (CSUR) 52 (1), pp. 1–43. Cited by: §IA, §I, §I, §IIIA, §III.
 [36] (2020) OptiGAN: generative adversarial networks for goal optimized sequence generation. arXiv preprint arXiv:2004.07534. Cited by: Fig. 2, §IVC4, TABLE V.
 [37] (June 2019) Finding mixed Nash equilibria of generative adversarial networks. In International Conference on Machine Learning, Long Beach, CA, USA, pp. 2810–2819. Cited by: Fig. 2, §IVA3, TABLE III, §VA.
 [38] (2020) A survey on generative adversarial networks: variants, applications, and training. arXiv preprint arXiv:2006.05132. Cited by: §IA, §IIIA, §III.
 [39] (July 2020) Generative adversarial training and its utilization for text to image generation: a survey and analysis. Journal of Critical Reviews 7 (8), pp. 1455–1463. Cited by: §IIIC, §IIIC, §III.
 [40] (July 2020) A multiplayer minimax game for generative adversarial networks. In 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, pp. 1–6. Cited by: Fig. 2, §IVB2, §IVB2, TABLE IV, §VA.
 [41] (October 2020) Consistency of multiagent distributed generative adversarial networks. IEEE Transactions on Cybernetics, early access, pp. 1–11. Cited by: Fig. 2, Fig. 2, §IVB1, §IVB3, TABLE IV, §VA.
 [42] (2017) On convergence and stability of GANs. arXiv preprint arXiv:1705.07215. Cited by: Fig. 2, §IVC1, §IVC1, §IVC1, TABLE V.
 [43] (March 2021) Generative adversarial networks: a survey on applications and challenges. International Journal of Multimedia Information Retrieval 10 (1), pp. 1–24. Cited by: §IA, §IIIA, §III.
 [44] (2020) Regularization methods for generative adversarial networks: an overview of recent studies. arXiv preprint arXiv:2005.09165. Cited by: §IIIB, §III.
 [45] (June 2018) A generative model for category text generation. Information Sciences 450, pp. 301–315. Cited by: §I, Fig. 2, Fig. 2, §IVB4, §IVC4, TABLE IV, TABLE V.
 [46] (January 2020) Sequence generative adversarial networks for wind power scenario generation. IEEE Journal on Selected Areas in Communications 38 (1), pp. 110–118. Cited by: Fig. 2.
 [47] (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Cited by: §IVB5.
 [48] (2017) Are GANs created equal? a largescale study. arXiv preprint arXiv:1711.10337. Cited by: §IIIB, §III.
 [49] (December 2017) The numerics of GANs. In Advances in Neural Information Processing Systems, Vol. 30, Red Hook, NY, USA, pp. 1825–1835. Cited by: §IVC3.
 [50] (March 2020) microbatchGAN: stimulating diversity with multi-adversarial discrimination. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, pp. 3050–3059. Cited by: Fig. 2, §IVB2, TABLE IV, §VA.
 [51] (December 1928) Zur theorie der gesellschaftsspiele. Mathematische annalen 100 (1), pp. 295–320. Cited by: §IIA.
 [52] (December 2017) Dual discriminator generative adversarial nets. Advances in neural information processing systems (NIPS 2017) 30, pp. 2670–2680. Cited by: Fig. 2, §IVB2, TABLE IV, §VA.
 [53] (2004) An introduction to game theory. Vol. 3, Oxford university press New York. Cited by: §IIA.
 [54] (August 2020) Loss functions of generative adversarial networks (GANs): opportunities and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence 4 (4). Cited by: §IIIB, §III.
 [55] (2019) Recent progress on generative adversarial networks (GANs): a survey. IEEE Access 7, pp. 36322–36333. Cited by: §IA, §IIIA, §III.
 [56] (2020) FedGAN: federated generative adversarial networks for distributed data. arXiv preprint arXiv:2006.07228. Cited by: Fig. 2, Fig. 2, §IVB3, §IVC3, §IVC3, TABLE IV, TABLE V, §VA.
 [57] (2020) Generative adversarial networks (GANs): an overview of theoretical model, evaluation metrics, and recent developments. arXiv preprint arXiv:2005.13178. Cited by: §IA, §IIIA, §III.
 [58] (January 2021) A survey on generative adversarial networks for imbalance problems in computer vision tasks. Journal of Big Data 8 (1), pp. 27. Cited by: §IIIC, §IIIC, §III.
 [59] (June 2019) RL-GAN-Net: a reinforcement learning agent controlled GAN network for real-time point cloud shape completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, pp. 5898–5907. Cited by: Fig. 2, §IVB5, TABLE IV.
 [60] (2020) Generative adversarial networks (GANs): challenges, solutions, and future directions. arXiv preprint arXiv:2005.00065. Cited by: §IA, §IIIA, §III.
 [61] (February 2020) User mobility synthesis based on generative adversarial networks: a survey. In 2020 22nd International Conference on Advanced Communication Technology (ICACT), Phoenix Park, Korea (South), pp. 94–103. Cited by: §IIIC, §III.
 [62] (2008) Multiagent systems: algorithmic, gametheoretic, and logical foundations. Cambridge University Press. Cited by: §IIA, §IIA.
 [63] (January 2016) Mastering the game of go with deep neural networks and tree search. Nature 529 (7587), pp. 484–489. Cited by: §IVC4.
 [64] (August 2020) Creating artificial images for radiology applications using generative adversarial networks (GANs)–a systematic review. Academic Radiology 27, pp. 1175–1185. Cited by: §IIIC, §IIIC, §III.
 [65] (August 2020) Off-policy reinforcement learning for efficient and effective GAN architecture search. In European Conference on Computer Vision (ECCV), pp. 175–192. Cited by: Fig. 2, §IVC4, TABLE V.
 [66] (2019) Self-supervised GAN: analysis and improvement with multi-class minimax game. Advances in Neural Information Processing Systems (NeurIPS 2019) 32, pp. 13253–13264. Cited by: Fig. 2, §IVB4, TABLE IV.
 [67] (September 2020) Generative adversarial networks in digital pathology: a survey on trends and future potential. Patterns 1 (6), pp. 100089. Cited by: §IIIC, §IIIC, §III.
 [68] (2017) Generative adversarial networks: introduction and outlook. IEEE/CAA Journal of Automatica Sinica 4 (4), pp. 588–598. Cited by: §IA, §I, Fig. 1, §IIIA, §III.
 [69] (March 2020) A stateoftheart review on image synthesis with generative adversarial networks. IEEE Access 8, pp. 63514–63537. Cited by: §IIIC, §III.
 [70] (2019) Generative adversarial networks in computer vision: a survey and taxonomy. arXiv preprint arXiv:1906.01529. Cited by: §I, §IIIC, §III.
 [71] (February 1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 28 (1), pp. 31–36. Cited by: §IVB5, §IVB5.
 [72] (2019) Stabilizing generative adversarial network training: a survey. arXiv preprint arXiv:1910.00927. Cited by: §I, §IIIB, §IIIC, §III.
 [73] (May 1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (3), pp. 229–256. Cited by: §IVB5.
 [74] (December 2017) A survey of image synthesis and editing with generative adversarial networks. Tsinghua Science and Technology 22 (6), pp. 660–674. Cited by: §IIIC, §III.

 [75] (November 2018) Diversity-promoting GAN: a cross-entropy based generative adversarial network for diversified text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3940–3949. Cited by: Fig. 2, §IVC4, TABLE V.
 [76] (August 2018) Image captioning using adversarial networks and reinforcement learning. In 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, pp. 248–253. Cited by: Fig. 2, §IVC4.
 [77] (December 2019) Generative adversarial network in medical imaging: a review. Medical image analysis (MEDIA) 58, pp. 101552. Cited by: §IIIC, §IIIC, §III.
 [78] (March 2020) A review of generative adversarial networks and its application in cybersecurity. Artificial Intelligence Review 53 (3), pp. 1721–1736. Cited by: §IIIC, §III.
 [79] (February 2017) SeqGAN: sequence generative adversarial nets with policy gradient. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, pp. 2852–2858. Cited by: Fig. 2, §IVB5, §IVC4, §IVC4, TABLE V.
 [80] (2018) Stackelberg GAN: towards provable minimax equilibrium via multigenerator architectures. arXiv preprint arXiv:1811.08010. Cited by: Fig. 2, Fig. 2, §IVA2, §IVB1, §IVB1, TABLE III, TABLE IV, §VA.
 [81] (February 2018) SCH-GAN: semi-supervised cross-modal hashing by generative adversarial network. IEEE Transactions on Cybernetics 50 (2), pp. 489–502. Cited by: Fig. 2, §IVC4.
 [82] (July 2018) Recent advance on generative adversarial networks. In 2018 International Conference on Machine Learning and Cybernetics (ICMLC), Chengdu, China, pp. 69–74. Cited by: §IA, §IIIA, §III, TABLE V.