The Generative Adversarial Network (GAN) is a class of generative models originally proposed by Goodfellow et al. in 2014 . GANs have gained wide attention in recent years due to their potential to model high-dimensional, complex real-world data, and they quickly became a promising research direction .
As a type of generative model, GANs do not minimize a single training criterion; they are used to estimate the probability distribution of real data. A GAN usually comprises two neural networks, a discriminator and a generator, which are trained simultaneously through an adversarial learning process. The discriminator attempts to differentiate between real data samples and the fake samples made by the generator, while the generator tries to create realistic samples that cannot be distinguished by the discriminator . This adversarial setup makes GANs powerful in both feature learning and representation.
In particular, GAN models do not rely on any assumptions about the form of the data distribution and can generate unlimited realistic new samples from the latent space . This property has enabled GANs to be successfully applied to various applications, ranging from image synthesis, computer vision, and video and animation generation to speech and language processing and cybersecurity.
The core idea of GANs is inspired by the two-player zero-sum minimax game between the discriminator and the generator, in which the total utility of the two players is zero and each player’s gain or loss is exactly balanced by the loss or gain of the other. GANs are designed to reach a Nash equilibrium, at which neither player can increase its gain without reducing the other player’s gain [26, 68]. Despite the significant success of GANs in many domains, applying these generative models to real-world problems has been hindered by several challenges. The most significant problems of GANs are that they are hard to train and suffer from instability issues such as mode collapse, non-convergence, and vanishing gradients. A GAN needs to converge to the Nash equilibrium during the training process, but it has been proved that this convergence is challenging [70, 72].
Since 2014, GANs have been widely studied, and numerous methods have been proposed to address their challenges. However, to produce high-quality generated samples, it is necessary to improve GAN theory, as weaknesses in the theoretical foundations are among the most important obstacles to developing better GANs . Since the basic principle of GANs rests on game theory and the data distribution is learned via a game between the generator and the discriminator, exploiting game-theoretic techniques has become one of the most discussed topics and has attracted considerable research effort in recent years.
I-A Motivation and Contribution
The main motivation behind this survey is the absence of any other review paper that focuses specifically on game-theoretic advances in GANs. Many comprehensive surveys have investigated GANs in detail with different focuses (e.g., [28, 68, 8, 33, 25, 35, 6, 82, 55, 43, 57, 24, 60, 30, 38]), but, to the best of our knowledge, this work is the first to explore GAN advancements from a game-theoretic perspective. Hence, in this paper, we attempt to provide readers with the recent advances in GANs that build on game theory, by classifying and surveying recently proposed works.
Our survey first introduces some background and key concepts in this field. Then we classify the recently proposed game-of-GANs models into three major categories: modified game model, modified architecture in terms of the number of agents, and modified learning method. Each category is further divided into several subcategories. We review the main contributions of each work in each subcategory. We also point out some existing problems in the discussed context and forecast potential future research topics.
I-B Paper Structure and Organization
The rest of this paper is organized as follows. Section II presents some background on game theory and on GANs, including their basic idea, learning method, and challenges. In Section III, we take a brief look at the other surveys conducted in the field of GANs. We present our proposed taxonomy in Section IV and review the research models in each category there. The final section is devoted to discussion and conclusion.
II Background and Preliminaries
Before presenting our taxonomy and discussing the works that apply game-theoretic methods to GANs, we need to introduce some preliminary concepts from game theory and GANs. Here, we start with an overview of game theory and then move on to GANs. Table I lists the acronyms used in this paper and their definitions.
|Acronym||Definition|
|GAN||Generative Adversarial Network|
|MAD-GAN||Multi-Agent Diverse GAN|
|MADGAN||Multiagent Distributed GAN|
|MPM GAN||Message Passing Multi-Agent GAN|
|SCH-GAN||Semi-supervised Cross-modal Hashing GAN|
|RNN||Recurrent Neural Network|
|IRL||Inverse Reinforcement Learning|
|DDPG||Deep Deterministic Policy Gradient|
|ODE||Ordinary Differential Equation|
|OCR||Optical Character Recognition|
|MS||Multi-class minimax game based Self-supervised tasks|
|SPE||Subgame Perfect Equilibrium|
|SNEP||Stochastic Nash Equilibrium Problem|
|SVI||Stochastic Variational Inequality|
|SRFB||Stochastic Relaxed Forward-Backward|
|aSRFB||Averaging over Decision Variables|
|SGD||Stochastic Gradient Descent|
|NAS||Neural Architecture Search|
|IID||Independent and Identically Distributed|
|DDL||Discriminator Discrepancy Loss|
II-A Game Theory
Game theory aims to model situations in which several decision makers interact with each other. The interaction between these decision makers is called a "game", and the decision makers are called "players". In each turn of the game, players have available actions, and the set of these actions is called the strategy set. Players are usually assumed to be rational, which means that each agent tries to maximize its utility by choosing the action that maximizes its payoff. Each player's action is chosen with respect to the other players' actions, and because of this, each agent should maintain a belief system about the other players .
Several solution concepts have been introduced for analyzing games, and finding the Nash equilibrium is one of them. A "Nash equilibrium" is a state in which no player can increase its payoff by changing its strategy. In other words, a Nash equilibrium is a state in which no player regrets its choice, given the other players' strategies and its own payoff .
When the players assign a probability distribution over their strategy sets instead of choosing a single strategy, the resulting equilibrium is called a "mixed Nash equilibrium". Constant-sum games are two-player games in which the sum of the two players' utilities is the same constant in every state; when this constant equals zero, the game is called a zero-sum game .
Other solution concepts are the maximin and minimax strategy methods. In the maximin strategy method, the decision maker maximizes its worst-case payoff, which occurs when all other players cause as much harm as they can to the decision maker. In the minimax strategy method, the decision maker wants to limit the others: it aims to minimize the other players' maximum payoff . The value a player obtains under the minimax or maximin strategy method is called the minimax or maximin value, respectively. In , von Neumann proved that in any finite two-player zero-sum game, all Nash equilibria coincide with the minimax and maximin strategies of the players, and the minimax and maximin values are equal to the Nash equilibrium utility.
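As a concrete illustration (a toy payoff matrix of our own, not taken from the survey), the following sketch checks von Neumann's statement on a small zero-sum game that happens to have a pure-strategy saddle point:

```python
# Illustrative zero-sum game: entry A[i][j] is the row player's payoff
# (and the column player's loss). Row player maximizes, column minimizes.
A = [
    [3, 1, 4],
    [2, 0, 1],
]

# Maximin: the row player's best worst-case payoff.
maximin = max(min(row) for row in A)       # row worst cases: 1, 0 -> 1

# Minimax: the column player limits the row player's best case.
cols = list(zip(*A))
minimax = min(max(col) for col in cols)    # column best cases: 3, 1, 4 -> 1

# With mixed strategies the two values coincide in any finite zero-sum
# game (von Neumann); here they already coincide in pure strategies.
print(maximin, minimax)  # -> 1 1
```

Here the saddle point is the row player choosing the first row and the column player choosing the second column, and the common value 1 is the equilibrium utility.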
II-B Generative Adversarial Networks
We provide some preliminaries on GANs in order to facilitate understanding of the basic and key concepts of this generative model. In particular, we first briefly review generative models in general. Then, we give a brief description of the GAN by reviewing its basic idea, learning method, and challenges.
II-B1 Generative Models
A generative model is a model whose purpose is to simulate the distribution of a training set. Generative models can be divided into three types. Models of the first type receive a training set drawn from a distribution p_data (unknown to the model) and try to build a distribution p_model that approximates p_data. Models of the second type are solely capable of producing samples from p_model, and models of the third type can do both. GANs, which are one kind of generative model, mainly concentrate on producing samples .
II-B2 GAN Fundamentals
In 2014, Goodfellow et al.  introduced GANs as a framework in which two players play a game against each other. The outcome of the game is a generative model that can produce samples similar to the training set. In this game, the players are named the generator and the discriminator . The generator is the one that ultimately produces samples, and the discriminator's aim is to distinguish training-set samples from the generator's samples. The more indistinguishable the produced samples are, the better the generative model is . Any differentiable function, such as a multi-layer neural network, can represent the generator and the discriminator. The generator, G, takes as input noise drawn from a prior distribution p_z and maps it to p_g, an approximation of the training data distribution p_data. The discriminator, D, maps an input sample to a real number in the interval [0, 1], namely the probability that the input is a real sample rather than a fake one (a sample the generator produces) .
II-B3 GAN Learning Models
The GAN objective can be written as the minimax game min_G max_D V(D, G) = E_{x~p_data}[f(D(x))] + E_{z~p_z}[g(D(G(z)))], where f and g can be replaced according to TABLE II based on the chosen divergence metric. The first proposed GAN uses the Jensen-Shannon metric, for which f(t) = log(t) and g(t) = log(1 - t).
|Divergence metric||Game Value|
To train the simple model shown in Fig. 1, we first fix G and optimize D to discriminate optimally. Next, we fix D and try to minimize the objective function with respect to G. The generator operates optimally when the discriminator cannot distinguish between real and fake data; for the Jensen-Shannon metric, this happens when D(x) equals 1/2 everywhere. If both the discriminator and the generator act optimally, the game reaches the Nash equilibrium, and the min-max and max-min values are equal. As shown in TABLE II, for the Jensen-Shannon metric this value is equal to -log 4.
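As a numerical illustration (our own sketch, using the standard optimal discriminator D*(x) = p_data(x) / (p_data(x) + p_g(x)) rather than anything specific to this survey), one can integrate the Jensen-Shannon objective on a grid and confirm that the value equals -log 4 exactly when p_g = p_data:

```python
import math

def gaussian(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def value_at_optimal_D(mu_data, mu_gen, lo=-10.0, hi=10.0, n=4000):
    """Grid-integrate E_pdata[log D*] + E_pg[log(1 - D*)] for 1-D Gaussians."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        p = gaussian(x, mu_data)   # data density p_data(x)
        q = gaussian(x, mu_gen)    # generator density p_g(x)
        d_star = p / (p + q)       # optimal discriminator
        total += (p * math.log(d_star) + q * math.log(1.0 - d_star)) * dx
    return total

print(value_at_optimal_D(0.0, 0.0))  # p_g = p_data: D* = 1/2, value = -log 4
print(value_at_optimal_D(0.0, 3.0))  # mismatched distributions: value > -log 4
```

When the two densities match, D* is 1/2 everywhere and the integral evaluates to 2 log(1/2) = -log 4 ≈ -1.386; any mismatch raises the value, since it equals 2·JSD(p_data || p_g) - log 4.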
III Related Surveys
As GANs become increasingly popular, the number of works in this field, and consequently the number of review articles, is also increasing. By now, many surveys of GANs have been presented (about 40), which can be classified into three categories. The works in the first category [28, 68, 8, 33, 25, 35, 6, 82, 55, 43, 57, 24, 60, 30, 38] explore a relatively broad scope of GANs, including key concepts, algorithms, applications, and different variants and architectures. In contrast, the surveys in the second group [48, 3, 72, 44, 54] focus solely on a specific segment of or issue in GANs (e.g., regularization methods or loss functions) and review how researchers deal with that problem. In the third category, a plethora of survey studies [7, 70, 74, 69, 64, 67, 2, 39, 58, 77, 78, 24, 10, 20, 18, 61] summarize the applications of GANs in a specific field, from computer vision and image synthesis to cybersecurity and anomaly detection. In the following, we briefly review the surveys in each category and explain how our paper differs from the others.
III-A GAN General Surveys
Goodfellow in his tutorial  answers the most frequent questions in the context of GANs. Wang et al.  survey the theoretical and implementation models of GANs and their applications, as well as the advantages and disadvantages of this generative model. Creswell et al.  provide an overview of GANs, especially for the signal processing community, by characterizing different methods for training and constructing GANs and the challenges in theory and applications. In , Ghosh et al. present a comprehensive summary of the progression and performance of GANs along with their various applications. Saxena et al. in  survey the advancements in GAN design and the optimization solutions proposed to handle GAN challenges. Kumar et al.  present state-of-the-art related work on GANs, their applications, evaluation metrics, challenges, and benchmark datasets. In , two new deep generative models, including the GAN, are compared, and the most remarkable GAN architectures are categorized and discussed by Salehi et al.
Gui et al. in  provide a review of various GAN methods from the perspectives of algorithms, theory, and applications.  surveys different GAN variants, applications, and several training solutions. Hitawala in  presents different versions of GANs and compares them on aspects such as learning, architecture, gradient updates, objective, and performance metrics. In a similar manner, Gonog et al. in  review the extensional variants of GANs and classify them by how they optimize the original GAN or change its basic structure, as well as by their learning methods. In , Hong et al. discuss the details of GANs from the perspective of various objective functions and architectures, together with the theoretical and practical issues of training GANs. The authors also enumerate the GAN variants applied in different domains. Bissoto et al. in  review GAN advancements on six fronts: architectural contributions, conditional techniques, normalization and constraint contributions, loss functions, image-to-image translations, and validation metrics. Zhang et al., in their review paper, survey twelve extended GAN models and classify them in terms of the number of game players. Pan et al. in  analyze the differences among generative models, classify them from the perspective of architecture and objective function optimization, discuss training tricks and evaluation metrics, and describe GAN applications and challenges.
III-B GAN Challenges
In a different manner, Lucic et al. in  conduct an empirical comparison of GAN models, with a focus on unconditional variants. As another survey in the second category, Alqahtani et al.  mainly focus on potential applications of GANs in different domains; their paper attempts to identify the advantages, disadvantages, and major challenges for the successful implementation of GANs in different application areas. As another specific review paper, Wiatrak et al. in  survey current approaches for stabilizing the GAN training procedure, categorizing various techniques and key concepts. More specifically, in , Lee et al. review the regularization methods used in the stable training of GANs and classify them into several groups by their operating principles. In contrast,  surveys the loss functions used in GANs and analyzes the pros and cons of these functions. As differentially private GAN models provide a promising direction for generating private synthetic data, Fan et al. in  survey the existing approaches presented for this purpose.
III-C GAN Applications
As mentioned before, GANs have been successfully applied to numerous applications, and some review articles survey these advances. The authors in [7, 70, 74, 69, 64, 67, 2, 39, 58, 77] review different aspects of GAN progress in the field of computer vision and image synthesis. Cao et al.  review recently proposed GAN models and their applications in computer vision. Cao et al. in  compare the classical and state-of-the-art GAN algorithms in terms of their mechanisms, the visual quality of generated samples, and so on. Wang et al. structure a review  around practical challenges relevant to computer vision; they discuss the most popular architecture-variant and loss-variant GANs for tackling these challenges. Wu et al. in  present a survey of image synthesis and editing, and video generation, with GANs; they cover recent papers that leverage GANs in image applications including texture synthesis, image inpainting, image-to-image translation, and image editing, as well as video generation. In the same way,  introduces recent research on GANs in the field of image processing, categorized into four areas: image synthesis, image-to-image translation, image editing, and cartoon generation.
Studies such as  and  review recent techniques for incorporating GANs into the problem of text-to-image synthesis. In , Agnese et al. propose a taxonomy that summarizes GAN-based text-to-image synthesis papers into four major categories: Semantic Enhancement GANs, Resolution Enhancement GANs, Diversity Enhancement GANs, and Motion Enhancement GANs. Different from the other surveys in this field, Sampath et al.  examine the most recent developments in GAN techniques for addressing imbalance problems in image data; the real-world challenges and implementations of synthetic image generation based on GANs are covered in this survey.
In [77, 64, 67], the authors deal with the medical applications of image synthesis by GANs. Yi et al. in  describe the promising applications of GANs in medical imaging and identify some remaining challenges that need to be solved. As another paper on this subject,  reviews GAN applications in image denoising and reconstruction in radiology. Tschuchnig et al. in  summarize existing GAN architectures in the field of histological image analysis.
As another application of GANs,  and  structure reviews of GANs in cybersecurity. Yinka et al.  survey studies in which the GAN plays a key role in the design of a security system or adversarial system. Ghosh et al.  focus on the various ways in which GANs have been used both to provide security advances and to craft attack scenarios that bypass detection systems.
Di Mattia et al.  survey the principal GAN-based anomaly detection methods. Georges et al.  review the published literature on observational health data to uncover the reasons for the slow adoption of GANs on this subject. Gao et al. in  address the practical applications and challenges relevant to spatio-temporal applications such as trajectory prediction, event generation, and time-series data imputation. Recently proposed user mobility synthesis schemes based on GANs are summarized in .
According to the classification provided for review papers, our survey falls into the second category. We focus specifically on the recent progress in applying game-theoretic approaches to address GAN challenges. While several surveys of GANs have been presented to date, to the best of our knowledge, ours is the first to address this topic. Although the authors in  presented a few game-model GANs, they did not conduct a comprehensive survey of this field, and many new pieces of research were not covered. We hope that our survey will serve as a reference for researchers interested in this subject.
IV Game of GANs: A Taxonomy
In this section, we present our taxonomy, which groups the reviewed papers into three categories according to how they extend the original GAN. The taxonomy is organized in terms of 1) modified game model, 2) architecture modification, and 3) modified learning algorithms, as shown in Fig. 2. Based on these primary classes, we further divide each category into several subsets (Fig. 2). In the following sections, we introduce each category and discuss the recent advances in each group.
IV-A Modified Game Model
The core of all GANs is a competition between a generator and a discriminator, which is modeled as a game; therefore, game theory plays a key role in this context. Most GANs rely on the basic model, formulating the competition as a two-player zero-sum (minimax) game, but some research has utilized other game variants to tackle the challenges in this field. In this section, we review this literature. We classify the works under this category into three subcategories. Section IV-A1 presents research that casts the training process as a stochastic game. The works presented in Section IV-A2 apply the leader-follower idea of the Stackelberg game to GANs. Finally, Section IV-A3 presents GAN models cast as bi-affine games. A summary of the reviewed research in the modified game model category is shown in Table III.
IV-A1 Stochastic game
One of the main issues with GANs is that these neural networks are very hard to train because of convergence problems. Franci et al. in  addressed this problem by casting the training procedure as a stochastic Nash equilibrium problem (SNEP). The SNEP is recast as a stochastic variational inequality (SVI), targeting those SVI solutions that are SNEs. The advantage of this approach is that many algorithms exist for finding the solution of an SVI, such as the forward-backward algorithm, also known as gradient descent. Franci et al. proposed a stochastic relaxed forward-backward (SRFB) algorithm and a variant with an additional step that averages over the decision variables (aSRFB) for the GAN training process. Proving convergence to a solution requires monotonicity of the pseudogradient mapping, which is defined by Equation (2) in terms of the gradients of the payoff functions of the generator and the discriminator.
If the pseudogradient mapping of the game is monotone and an increasing number of samples is available, the algorithm converges to the exact solution; with only a finite, fixed mini-batch of samples, using the averaging technique, it converges to a neighborhood of the solution.
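For intuition, the toy sketch below applies a relaxed (reflected) forward-backward update to the bilinear game min_x max_y xy, whose pseudogradient F(x, y) = (y, -x) is monotone. This is our own simplification, not the exact SRFB algorithm of Franci et al. (which uses stochastic samples and an averaging step): plain simultaneous gradient steps spiral away from the equilibrium (0, 0), while the relaxed update converges.

```python
import math

# Monotone pseudogradient of the bilinear game min_x max_y x*y:
# descend in x, ascend in y.
def F(x, y):
    return (y, -x)

lam = 0.1      # step size
steps = 3000

# Plain simultaneous gradient step: spirals away from (0, 0).
x, y = 1.0, 1.0
for _ in range(steps):
    gx, gy = F(x, y)
    x, y = x - lam * gx, y - lam * gy
plain_dist = math.hypot(x, y)

# Relaxed (reflected) forward-backward: evaluate F at the extrapolated
# point 2*z_k - z_{k-1}; this stabilizes monotone (bilinear) games.
x, y = 1.0, 1.0
xp, yp = x, y                       # previous iterate
for _ in range(steps):
    gx, gy = F(2 * x - xp, 2 * y - yp)
    xp, yp = x, y
    x, y = x - lam * gx, y - lam * gy
relaxed_dist = math.hypot(x, y)

print(plain_dist > 1.0, relaxed_dist < 1e-3)  # -> True True
```

The contrast illustrates why relaxation-type forward-backward schemes are attractive for adversarial training: the naive method fails even on the simplest monotone game.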
IV-A2 Stackelberg game
One of the main issues for GANs is the convergence of the training algorithm. Farnia et al. in  showed that "GAN zero-sum games may not have any local Nash equilibria" by presenting theoretical and numerical examples of standard GAN problems. Given the naturally sequential structure of GANs, in which one player commits first and the other best-responds, the problem can be considered a Stackelberg game, and the authors focus on the subgame perfect equilibrium (SPE). To address the convergence issue, they seek an equilibrium notion called the proximal equilibrium, which enables traversing the spectrum between Stackelberg and Nash equilibria. In a proximal equilibrium, as shown in Equation (3), the discriminator is allowed to optimize locally within a norm-ball around the primary discriminator. To keep the perturbed discriminator close to the original one, the distance between the two functions is penalized by a coefficient lambda; as lambda goes from zero to infinity, the equilibria change from Stackelberg to Nash.
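Equation (3) is not reproduced in this excerpt; in our own notation, a proximal objective of the kind described takes roughly the following form (a hedged sketch, not the authors' exact formula):

```latex
% Sketch (our notation): the proximal objective lets the discriminator
% re-optimize locally around D, with the deviation penalized by \lambda.
V_{\lambda}^{\mathrm{prox}}(G, D) \;=\; \max_{\tilde{D}}\;
    V(G, \tilde{D}) \;-\; \frac{\lambda}{2}\,\bigl\|\tilde{D} - D\bigr\|^{2}.
% As \lambda \to \infty the penalty pins \tilde{D} = D (Nash-like play);
% as \lambda \to 0 the full inner maximization is recovered (Stackelberg-like).
```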
Farnia et al. also proposed proximal training, which optimizes the proximal objective instead of the original objective and can be applied to any two-player GAN. Zhang et al. in  also modeled GANs with this game and presented the Stackelberg GAN to tackle the instability of the GAN training process. The Stackelberg GAN uses a multi-generator architecture, and the competition is between the generators (followers) and the discriminator (leader). We discuss the architectural details in Section IV-B1.
IV-A3 Bi-affine game
Hsieh et al. in  examine the training of GANs by reconsidering the problem formulation from the mixed-NE perspective. In the absence of convexity, the existing theory focuses only on local convergence, and even this local theory can break down if intuitions from convex optimization are applied blindly. In , it is shown that the mixed Nash equilibria of GANs are, in fact, the global optima of infinite-dimensional bi-affine games. Finite-dimensional bi-affine games are then applied to find mixed NEs of GANs. It is also shown that all current GAN objectives can be relaxed into their mixed-strategy forms. Finally, the article experimentally shows that this method achieves better or comparable performance relative to popular baselines such as SGD, Adam, and RMSProp.
|Reference||Convergence||Methodology and Contributions||Pros||Cons|
|Stackelberg Game: Subsection IV-A2|
|Stackelberg GAN ||SPNE||Models multi-generator GANs as a Stackelberg game||Can be built on top of standard GANs & Proved the convergence||-|
|||SPE||Theoretical examples of standard GANs with no NE. & proximal equilibrium as a solution, proximal training for GANs||Can apply to any two-player GANs, allow the discriminator to locally optimize||Focus only on the pure strategies of zero-sum GANs in non-realizable settings|
|Stochastic Game: Subsection IV-A1|
|||SNE||Cast the problem as SNEP and recast it to SVI, SRFB and aSRFB solutions||Proved the convergence||Need monotonicity on the pseudogradient mapping, increasing number of samples to reach an equilibrium|
|Bi-affine Game: Subsection IV-A3|
|||Mixed NE||Tackling the training of GANs by reconsidering the problem formulation from the mixed Nash Equilibria perspective||Showing that all GANs can be relaxed to mixed strategy forms, flexibility||-|
IV-B Modified Architecture
As we mentioned in Section II, GANs are a framework for producing a generative model through a two-player minimax game; however, recent works extend the idea of a single generator-discriminator pair to the multi-agent setting, transforming the two-player game into multiple games or a multi-agent game.
In this section, we review the literature in which the proposed GAN variants modify the architecture so that a mixture of generators and/or discriminators is used, and we show how such methods can provide better convergence properties and prevent mode collapse. The majority of the works in this category focus on introducing a larger number of generators and/or discriminators; however, in some papers, the number of generators and discriminators is unchanged and another agent is added instead, which converts the problem into a multi-agent one.
In Section IV-B1, we discuss GAN variants that extend the basic structure from a single generator to many generators. In Section IV-B2, we review articles that deal with the problem of mode collapse by increasing the number of discriminators in order to force the generator to produce different modes. Section IV-B3 is dedicated to works that develop GANs with multiple generators and multiple discriminators. The articles reviewed in Sections IV-B4 and IV-B5 extend the architecture by adding another agent, a classifier (Section IV-B4) or an RL agent (Section IV-B5), and show the benefits of adding these agents to GANs. The methodology and contributions, as well as the pros and cons, of the reviewed papers are summarized in Table IV.
IV-B1 Multiple generators, One discriminator
The minimax gap is smaller in GANs with a multi-generator architecture, and more stable training is observed in these GANs . As we mentioned in Section IV-A2, Zhang et al. in  tackle the instability of GAN training caused by the gap between the minimax and maximin objective values. To mitigate this issue, they design a multi-generator architecture and model the competition among the agents as a Stackelberg game. Their results show that the minimax duality gap decreases as the number of generators increases. The mode collapse issue is also investigated, and this architecture is shown to effectively alleviate it. One significant advantage of this architecture is that it can be applied to all variants of GANs, e.g., Wasserstein GAN, vanilla GAN, etc. Additionally, under an extra condition on the expressive power of the generators, the Stackelberg GAN is shown to achieve an approximate equilibrium given sufficiently many generators .
Furthermore, Ghosh et al. in  proposed a multi-generator, single-discriminator architecture for GANs named the Multi-Agent Diverse Generative Adversarial Network (MAD-GAN). In this model, different generators capture diverse, high-probability modes, and the discriminator is designed so that, along with separating real and fake samples, it identifies which generator produced a given fake sample . It is shown that the global optimum value is achieved at convergence; this value depends on the number of generators k.
Comparing the models presented in  and : in MAD-GAN , multiple generators are combined under the assumption that the generators and the discriminator have infinite capacity, whereas the Stackelberg GAN  makes no assumption on the model capacity. Also, in MAD-GAN  the generators share common network parameters, while in the Stackelberg GAN  various sampling schemes beyond the mixture model are allowed and each generator has its own free parameters.
The assumption that increasing the number of generators will cover the whole data space is not valid in practice. Hence, Hoang et al. in , in contrast with , approximate the data distribution by forcing each generator to capture a subset of the data modes independently of the others, rather than by merely separating their samples. They establish a minimax formulation among a classifier, a discriminator, and a set of generators. The classifier performs multi-class classification to determine which generator produced a given sample. Owing to the interaction between the generators and the classifier, each generator is encouraged to generate data separable from those produced by the other generators. In this model, the generators create samples, and one of them is randomly picked as the final output, similar to the mechanism of a probabilistic mixture model. The authors theoretically prove that, at equilibrium, the Jensen-Shannon divergence (JSD) between the final output and the data distribution is minimal, while the JSD among the generators' distributions is maximal; hence the mode collapse problem is effectively avoided. Moreover, through parameter sharing, the computational cost added to the standard GAN is minimal, and the proposed model can efficiently scale to large-scale datasets.
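The "randomly pick one generator" mechanism is exactly sampling from a probabilistic mixture. The toy sketch below (1-D Gaussians standing in for trained generators; entirely our own illustration, not the paper's model) shows that the mixture output covers every mode that the individual generators capture:

```python
import random

random.seed(0)

# Toy stand-ins for k trained generators, each specialized on one data
# mode (here: Gaussians centred on different means).
generator_means = [-4.0, 0.0, 4.0]

def mixture_sample():
    """Mixture-style output: pick a generator uniformly, then sample it."""
    mu = random.choice(generator_means)
    return random.gauss(mu, 0.5)

samples = [mixture_sample() for _ in range(3000)]

# Count how many samples land near each mode.
coverage = [sum(1 for s in samples if abs(s - mu) < 2.0) for mu in generator_means]
print(coverage)  # every mode receives a substantial share of samples
```

A single collapsed generator would put all 3000 samples near one mean; the mixture spreads them roughly evenly, which is the diversity effect the multi-generator schemes aim for.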
Ke et al. propose a new architecture, the multiagent distributed GAN (MADGAN), in . In this framework, the discriminator is considered the leader and the generators are considered followers. The design is also based on social group wisdom and the influence of the network structure on the agents. MADGAN can have a multi-generator, multi-discriminator architecture (e.g., two discriminators and four generators) as well as the multiple-generator, single-discriminator architecture discussed in this section. One vital contribution of MADGAN is that it can train multiple generators simultaneously, with consistent training results across all generators.
Furthermore, the Message Passing Multi-Agent GAN (MPM GAN)  proposes that better image generation can be achieved with two generators and one discriminator that communicate through message passing. The paper introduces two objectives: competing and conceding. The competing objective is based on the generators competing with each other to obtain better scores for their generations from the discriminator. The conceding objective is based on the two generators guiding each other toward better scores from the discriminator, ensuring that the message-sharing mechanism guides the other generator to generate better than itself. Overall, the paper presents innovative architectures and objectives for training multi-agent GANs.
IV-B2 One generator, Multiple discriminators
In , the multiple discriminators are constructed with homogeneous network architectures and trained for the same task on the same training data. In addition to introducing this multi-discriminator schema, Durugkar et al. in  show, from a game-theoretic perspective, that because of these similarities the discriminators behave alike and converge to similar decision boundaries; in the worst case, they may even converge to a single discriminator. Hence, Jin et al. in  introduce a discriminator discrepancy loss (DDL); their multiplayer minimax game unifies the optimization of the DDL and the GAN loss, seeking an optimal trade-off between the accuracy and the diversity of the multiple discriminators. Compared to , Hardy et al. in  distribute the discriminators over multiple servers; thus, they can train over datasets that are spread across numerous servers.
In FakeGAN, proposed in , Aghakhani et al. use two discriminators and one generator. The discriminators use the Monte Carlo search algorithm to evaluate the intermediate action-value and pass it as the reinforcement learning (RL) reward to the generator, which is modeled as a stochastic policy agent in RL . Instead of one batch as in , Mordido et al. in  divide the generated samples into multiple micro-batches and then set each discriminator's task to discriminating between different samples: samples coming from its assigned fake micro-batch, samples from the micro-batches assigned to the other discriminators, and the real samples.
Unlike , Nguyen et al. in  combined the Kullback-Leibler (KL) and reverse KL divergences (measures of how one probability distribution differs from a second) into a unified objective function. Combining these two measures exploits their complementary statistical properties to diversify the estimated density and capture multiple modes effectively. From the game-theoretic perspective of , there are two discriminators and one generator, by analogy with a three-player minimax game. In this case, two pairs of players play two minimax games simultaneously. In one game, the discriminator rewards high scores for samples from the data distribution (reverse KL divergence) (4), while in the other the discriminator conversely rewards high scores for samples from the generator, and the generator produces data to fool both discriminators (KL divergence) (5).
Hyperparameters are used to control and stabilize the learning method.
Minimizing the KL divergence between the data and model distributions covers multiple modes but may produce completely unseen and potentially undesirable samples. Optimizing toward the reverse KL divergence criterion, by contrast, mimics a mode-seeking process in which the model concentrates on a single mode of the data distribution while ignoring the other modes.
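To make the contrast concrete, the following short numerical sketch (with toy discrete distributions chosen purely for illustration, not taken from any of the surveyed papers) shows the two behaviors: forward KL punishes a model that drops a mode of the data, while reverse KL punishes a model that spreads mass where the data has none.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Discrete KL divergence KL(p || q), smoothed to avoid log(0)."""
    p, q = np.asarray(p, float) + eps, np.asarray(q, float) + eps
    return float(np.sum(p * np.log(p / q)))

# A bimodal "data" distribution over 4 bins.
p = np.array([0.5, 0.0, 0.0, 0.5])

# Model A spreads mass over both modes (mode covering, with spillover).
q_cover = np.array([0.3, 0.2, 0.2, 0.3])
# Model B concentrates on a single mode (mode seeking).
q_seek = np.array([0.98, 0.01, 0.005, 0.005])

# Forward KL(p || q) strongly punishes the mode-dropping model B ...
print(kl(p, q_cover), kl(p, q_seek))
# ... while reverse KL(q || p) strongly punishes the spillover of model A.
print(kl(q_cover, p), kl(q_seek, p))
```

This is why combining both divergences, as in the unified objective above, can balance sample quality against mode coverage.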
IV-B3 Multiple generators, Multiple discriminators
The existence of equilibrium has always been one of the open theoretical problems in this game between generator and discriminator. Arora et al. in  turn to infinite mixtures of generator deep nets in order to investigate the existence of equilibria. Unsurprisingly, equilibrium exists in an infinite mixture. They then showed in  that a mixture of a finite number of generators and discriminators can approximate the min-max solution in GANs. This implies that an approximate equilibrium can be achieved with a (not too large) mixture of generators and discriminators. In this article, a heuristic approximation to the mixture idea is proposed as a new training framework called MIX+GAN: use a mixture of T components, where T is as large as GPU memory allows (usually T ≤ 5). In fact, a mixture of T generators and T discriminators is trained; they share the same network architecture but have their own trainable parameters. Maintaining the mixture means maintaining a weight for each generator corresponding to the probability of selecting that generator's output. These weights are updated by backpropagation. This heuristic can be applied to existing methods such as DCGAN, WGAN, etc. Experiments show that the MIX+GAN protocol improves the quality of several existing GAN training methods and can lead to more stable training.
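The mixture-weight bookkeeping can be sketched as follows. This is a minimal stand-in, not MIX+GAN itself: the scores are hypothetical, and the single gradient-ascent step on the softmax logits merely mimics the backpropagation update the paper performs jointly with the network parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 5  # mixture size; in practice bounded by GPU memory (usually small)

# One trainable logit per generator; softmax over the logits gives the
# probability of selecting that generator's output for a sample.
logits = np.zeros(T)

def selection_probs(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = selection_probs(logits)
k = int(rng.choice(T, p=probs))      # generator chosen for this sample

# One exact gradient-ascent step on the expected score E[s] = sum_i p_i s_i,
# whose gradient w.r.t. logit_i is p_i * (s_i - E[s]).  The per-generator
# scores below are hypothetical, for illustration only.
scores = np.array([0.1, 0.2, 0.9, 0.3, 0.1])
logits += 0.5 * probs * (scores - probs @ scores)
new_probs = selection_probs(logits)
# The selection weight of the best-scoring generator (index 2) increases.
```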
As mentioned earlier, one of the significant challenges in GAN algorithms is convergence. According to , this challenge results from the fact that the cost functions may not converge under gradient descent in the minimax game between the discriminator and the generator. Convergence is also a considerable challenge in federated learning, and it becomes even harder when data at different sources are not independent and identically distributed. Therefore,  proposed an algorithm with a multi-generator and multi-discriminator architecture for training a GAN over distributed sources of non-independent-and-identically-distributed data, named Federated Generative Adversarial Network (FedGAN). The algorithm uses local generators and discriminators that are periodically synced via an intermediary that averages and broadcasts the generator and discriminator parameters. In fact, Rasouli et al. connect results from stochastic approximation for GAN convergence with communication-efficient SGD for federated learning to address FedGAN's convergence. One notable result in  is that FedGAN achieves performance similar to a general distributed GAN while converging and reducing communication complexity.
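The synchronization scheme can be sketched as below. This is only an illustration of periodic parameter averaging, not the paper's exact algorithm: the agent count, the choice of K, and the toy `local_step` are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def fedgan_round(agents, K, local_step):
    """One communication round: every agent performs K local updates on
    its generator/discriminator parameters, then an intermediary averages
    the parameters and broadcasts them back to all agents."""
    for a in agents:
        for _ in range(K):
            local_step(a)
    avg = {name: np.mean([a[name] for a in agents], axis=0)
           for name in agents[0]}
    for a in agents:
        for name in a:
            a[name] = avg[name].copy()
    return agents

# Three agents with toy generator (G) and discriminator (D) weights.
agents = [{"G": rng.normal(size=4), "D": rng.normal(size=4)} for _ in range(3)]

def local_step(a):  # stand-in for a local SGD step on (non-IID) local data
    a["G"] += 0.01 * rng.normal(size=4)
    a["D"] += 0.01 * rng.normal(size=4)

agents = fedgan_round(agents, K=5, local_step=local_step)
# After synchronization, every agent holds identical parameters.
```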
In , the multi-agent distributed GAN (MADGAN) framework is proposed based on social group wisdom and the influence of the network structure on agents, in which the discriminator and the generator are regarded as the leader and the follower, respectively. MADGAN addresses the multi-agent cognitive consistency problem in large-scale distributed networks. In fact, the paper derives consensus conditions for a multi-generator, multi-discriminator distributed GAN by analyzing the existence of a stationary distribution of the Markov chain over agent states. The experimental results show that the generation quality of the generators trained by MADGAN is comparable to that of a generator trained by GAN. More importantly, MADGAN can train multiple generators simultaneously, with consistent training results across all generators.
IV-B4 One generator, One discriminator, One classifier
One of the issues that GANs face is catastrophic forgetting in the discriminator network. Self-supervised (SS) tasks were proposed to handle this issue; however, these methods allow a severely mode-collapsed generator to pass the SS tasks. Tran et al. in  proposed new SS tasks, called multi-class minimax game based self-supervised tasks (MS), based on a multi-class minimax game  that includes a discriminator, a generator, and a classifier. The SS task is a 4-way classification task of recognizing one of four image rotations (0, 90, 180, 270 degrees). The discriminator's SS task is to train the classifier C to predict the rotation applied to real samples, and the generator's SS task is to train the generator G to produce fake samples that maximize classification performance. The SS task helps the generator learn the data distribution and generate diverse samples by closing the gap between supervised and unsupervised image classification. Theoretical and experimental analysis showed improved convergence for this approach.
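The data side of the 4-way rotation task can be sketched as follows; the batch shape and random-rotation scheme are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rotation_batch(images, rng):
    """The 4-way self-supervised task: rotate every image by a random
    multiple of 90 degrees (0/90/180/270) and keep that multiple as the
    classification label the classifier C must predict."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k) for img, k in zip(images, labels)])
    return rotated, labels

images = rng.normal(size=(8, 32, 32))  # toy stand-in batch of "real" samples
rotated, labels = make_rotation_batch(images, rng)
```

The generator's SS task then feeds fake samples through the same pipeline and maximizes the classifier's performance on them.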
Li et al. in  also used a classifier for generating categorized text. The authors proposed a new framework, Cyclic-Synthesized GAN (CS-GAN), which uses GAN, RNN, and RL to generate better sentences. The classifier's role is to ensure that the generated text contains the label information, and the RNN is a character-level predictor, because the model is built at the character level to limit the large search space. The generation process can be divided into two steps: first, category information is added to the model so that it generates category-specific sentences; then, category information is combined in the GAN to generate labeled sentences. CS-GAN performs strongly in supervised learning, especially on multi-category datasets.
IV-B5 One generator, One discriminator, One RL agent
With an RL agent, we can have fast and robust control over the GAN's output or input. This architecture can also be used to optimize the generation process by adding an arbitrary (not necessarily differentiable) objective function to the model.
In , Cao et al. used this architecture for generating molecules and for drug discovery. The authors encoded the molecules in their original graph-based representation, which has no overhead compared to similar approaches like SMILES , which generates a text sequence from the original graph. For training, the authors were not only interested in generating chemically valid compounds, but also tried to optimize the generation process toward some non-differentiable metrics (e.g., how likely the new molecule is to be water-soluble or fat-soluble) using an RL agent. In Molecular GAN (MolGAN), external software computes the RL loss for each molecule, and the generator is trained on a linear combination of the RL loss and the WGAN loss.
Weininger et al. in  tackled the same problem. Compared to , they encoded the molecules as text sequences using SMILES, the string representation of a molecule, rather than the original graph-based one. They presented Objective-Reinforced Generative Adversarial Networks (ORGAN), which is built on SeqGAN ; their RL agent uses REINFORCE , a gradient-based approach, instead of the deep deterministic policy gradient (DDPG) , an off-policy actor-critic algorithm that Cao et al. used in . MolGAN attains better chemical property scores than ORGAN, but it suffers from mode collapse because neither the GAN objective nor the RL objective encourages generating diverse outputs; in ORGAN, by contrast, the RL agent relies on REINFORCE, and a uniqueness score is optimized that penalizes non-unique outputs.
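The combined generator objective used in MolGAN can be sketched as a plain linear combination; the trade-off weight `lam` below is a hypothetical value for illustration, not the paper's tuned setting.

```python
def generator_loss(wgan_loss, rl_loss, lam=0.25):
    """MolGAN-style generator objective (sketch): a linear combination of
    the adversarial (WGAN) loss and the RL loss that scores the
    non-differentiable chemical properties of a generated molecule."""
    return lam * wgan_loss + (1.0 - lam) * rl_loss
```

Annealing `lam` shifts the generator between adversarial realism and property optimization.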
An RL agent can also be used to control the generator. Sarmad et al. in  presented RL-GAN-Net, a real-time completion framework for point cloud shapes. Their suggested architecture combines an auto-encoder (AE), a reinforcement learning (RL) agent, and a latent-space generative adversarial network (l-GAN). Based on the pre-trained AE, the RL agent selects the proper seed for the generator. This idea of controlling the GAN's output can open up new possibilities for overcoming the fundamental instabilities of current deep architectures.
|Reference||Methodology and Contributions||Pros||Cons|
|Multiple generators, One discriminator: Subsection IV-B1|
|Stackelberg GAN||Tackling the instability problem in the training procedure with multi-generator architecture||More stable training performances, alleviate the mode collapse||-|
|MADGAN||A multiagent distributed GAN framework based on the social group wisdom||Simultaneously training of multiple generators, consistency of all generators’ training results||-|
|MAD-GAN ||A multi-agent diverse GAN architecture||Capturing diverse modes while producing high-quality samples||Assumption of infinite capacity for players, global optimum is not practically reachable|
|MGAN ||Encouraging generators to generate separable data by classifier||Overcoming the mode collapsing, diversity||-|
|||An innovative message passing model, where messages being passed among generators||Improvement in image generation, valuable representations from message generator||-|
|One generator, Multiple discriminators: Subsection IV-B2|
|DDL-GAN ||Using DDL||Diversity||Only applicable to multiple discriminators|
|D2GAN ||Combining KL and reverse KL divergences||Quality and diversity, scalable to large-scale datasets||Not as powerful as autoencoder-GAN combinations|
|GMAN ||Multiple discriminators||Robust to mode collapse||Complexity, converge to same outputs|
|microbatch GAN ||Using microbatch||Mitigate mode collapse||-|
|MD-GAN ||Parallel computation, distributed data||Less communication cost, computation complexity||-|
|FakeGAN ||Text classification||-||-|
|Multiple generators, Multiple discriminators: Subsection IV-B3|
|||Tackling generalization and equilibrium in GANs||Improve the quality of several existing GAN training methods||Aggregation of losses with an extra regularization term, discourages the weights being too far away from uniform|
|MADGAN||Address the multiagent cognitive consistency problem in large-scale distributed network||Simultaneously training of multiple generators, consistency of all generators’ training results||-|
|FedGAN||A multi-generator and multi-discriminator architecture for training a GAN with distributed sources||Similar performance to general distributed GAN with reduction in communication complexity||-|
|One generator, One discriminator, One classifier: Subsection IV-B4|
|CS-GAN ||Combine RNN, GAN, and RL; use a classifier to validate the category; a character-level model||Generates sentences based on category, limits the action space||-|
|||Multi-class minimax game based self-supervised tasks,||Improve convergence, can integrate into GAN models||-|
|One generator, One discriminator, One RL agent: Subsection IV-B5|
|MolGAN ||Use Original graph-structured data,use RL objective to generate specific chemical property||Better chemical property scores, no overhead in representation||Susceptible to mode collapse|
|ORGAN ||Encode molecules as text sequences, control properties of generated samples with RL, use Wasserstein distance as loss function||Better result than trained RNNs via MLE or SeqGAN||Overhead in representation, works only on sequential data|
|RL-GAN-Net ||Use RL to find correct input for GAN, Combine AE, RL and l-GAN||A real time point cloud shape completion, less complexity||-|
IV-C Modified Learning Algorithm
This category covers methods whose proposed improvements involve modifications to the learning method. In this section, we turn our attention to the literature that combines other learning approaches, such as fictitious play and reinforcement learning, with GANs.
The GAN variations surveyed in IV-C1 study the GAN training process as a regret minimization problem, instead of the popular view that seeks to minimize the divergence between the real and generated distributions. As another learning method, the works in subsection IV-C2 utilize fictitious play to simulate the training algorithm of GANs. IV-C3 reviews proposed GAN models that use a federated learning framework, training across distributed sources to overcome the data limitations of GANs. The research in IV-C4 seeks to make a connection between GANs and RL. Table V summarizes the contributions, pros, and limitations of the literature reviewed in this category.
IV-C1 No-regret learning
The best-response algorithms for GANs are often computationally intractable; they do not lead to convergence and exhibit cycling behavior even in simple games. A simple remedy in that case is to average the iterates. Regret minimization is a more suitable way to think about GAN training dynamics. In , Kodali et al. propose studying GAN training dynamics as a repeated game in which both players use no-regret algorithms. The authors also show that the convex-concave case of the GAN game has a unique solution: if G and D have enough capacity in the non-parametric limit and updates are made in the function space, the GAN game is convex-concave, and convergence (of the averaged iterates) can be guaranteed using no-regret algorithms. Using standard arguments from the game theory literature, the authors show that the discriminator does not need to be optimal at each step.
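The "average the iterates" remedy can be demonstrated on the simplest adversarial game. The sketch below (a toy bilinear game, not a GAN) runs projected simultaneous gradient steps on min_x max_y x·y over the box [-1, 1]^2, whose unique equilibrium is (0, 0): the raw iterates cycle and never settle, while their running average, the quantity no-regret analysis controls, approaches the equilibrium.

```python
import numpy as np

# Projected simultaneous gradient descent/ascent on the bilinear
# zero-sum game min_x max_y x*y over [-1, 1]^2.
eta, steps = 0.1, 5000
x, y = 1.0, 0.5
xs, ys = [], []
for _ in range(steps):
    x, y = x - eta * y, y + eta * x              # descent on x, ascent on y
    x, y = float(np.clip(x, -1, 1)), float(np.clip(y, -1, 1))
    xs.append(x)
    ys.append(y)

x_avg, y_avg = np.mean(xs), np.mean(ys)
# The last iterate is still far from (0, 0); the average is close to it.
```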
In contrast to , many of the recent developments  are based on the unrealistic assumption that the discriminator plays optimally, which corresponds to at least one player using the best-response algorithm. In the practical case with neural networks, however, these convergence results do not hold because the game objective is non-convex, and in non-convex games global regret minimization and equilibrium computation are computationally hard. Kodali et al. in  also analyze the convergence of GAN training from this point of view to understand mode collapse. They show that mode collapse arises from undesirable local equilibria in this non-convex game, accompanied by sharp gradients of the discriminator function around some real data points. Furthermore, the authors show that a gradient penalty scheme can avoid mode collapse by regularizing the discriminator to constrain its gradients in the ambient data space.
Compared to , although Grnarova et al. in  also use regret minimization, they provide a method that provably converges to a mixed Nash equilibrium. Because the minimax value of a pure strategy for the generator is always higher than the minimax value of the generator's mixed equilibrium strategy, mixed strategies are more suitable for the generator. This convergence holds for semi-shallow GAN architectures, in which the generator is an arbitrary network and the discriminator consists of a single-layer network, when every player uses a regret minimization procedure, even though the game induced by such architectures is not convex-concave. Furthermore, they show that the generator's equilibrium strategy is optimal for the minimax objective.
IV-C2 Fictitious play
GAN is a two-player zero-sum game whose training process is a repeated game. If a zero-sum game is played repeatedly between two rational players, each tries to increase its payoff. Let a_t denote the action taken by a player at time t, and let a_1, …, a_{t−1} be that player's previous actions. A player can then choose its best response under the assumption that the opponent chooses its strategy according to the empirical distribution of its past actions. The expected utility is then a linear combination of the utilities under different pure strategies, so we can assume that each player plays the best pure response at each round. In game theory, this learning rule is called fictitious play, and it can help find the Nash equilibrium. Fictitious play achieves a Nash equilibrium in two-player zero-sum games if the game's equilibrium is unique; however, if multiple Nash equilibria exist, different initializations may yield different solutions.
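The learning rule above can be sketched on a classic two-player zero-sum game. The example below (matching pennies, chosen for illustration; it is unrelated to any specific GAN) has a unique mixed Nash equilibrium at (1/2, 1/2), and under fictitious play the empirical play frequencies approach it even though each round's play is a pure best response.

```python
import numpy as np

# Matching pennies: row player's payoff matrix; the column player gets -A.
A = np.array([[1, -1],
              [-1, 1]])

counts1 = np.array([1.0, 0.0])   # arbitrary initial action counts
counts2 = np.array([0.0, 1.0])
for _ in range(20000):
    emp1 = counts1 / counts1.sum()       # empirical distributions of play
    emp2 = counts2 / counts2.sum()
    a1 = int(np.argmax(A @ emp2))        # best response to opponent history
    a2 = int(np.argmax(-(emp1 @ A)))     # column player maximizes -A
    counts1[a1] += 1
    counts2[a2] += 1

freq1 = counts1 / counts1.sum()
freq2 = counts2 / counts2.sum()
# Both frequency vectors approach the mixed equilibrium (0.5, 0.5).
```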
By relating GANs to the two-player zero-sum game, Ge et al. in  design a training algorithm that simulates fictitious play on GANs and provide a theoretical convergence guarantee. They also show that, assuming the best response at each update in Fictitious GAN, the distribution of the mixture outputs from the generators converges to the data distribution, and the discriminator outputs converge to the optimal discriminator function. The authors in  use two queues, D and G, to store the historically trained models of the discriminator and the generator. They also show that Fictitious GAN can effectively resolve some convergence issues that the standard training approach cannot, and that it can be applied on top of existing GAN variants.
IV-C3 Federated learning
Data limitation is a common drawback in deep learning models such as GANs. It can be addressed by using distributed data from multiple sources, but this is difficult for reasons such as users' privacy concerns, communication efficiency, and statistical heterogeneity. This motivates the idea of using federated learning in GANs to address these issues [56, 12].
Rasouli et al. in  proposed a federated approach to GANs that trains over distributed sources with non-independent-and-identically-distributed data. In this model, every K time steps of local gradient updates, agents send their local discriminator and generator parameters to an intermediary and receive back the synchronized parameters. Owing to the reduced average communication per round per agent, FedGAN is more efficient than a general distributed GAN, and experiments also showed that FedGAN remains robust as K increases. To prove the convergence of this model, the authors connect the convergence of the GAN to the convergence of an ordinary differential equation (ODE) representation of the parameter updates  under equal or two-time-scale updates of generators and discriminators. Rasouli et al. showed that the FedGAN ODE representation of the parameter updates asymptotically follows the ODE representing the parameter updates of the centralized GAN; hence, by the existing results for centralized GANs, FedGAN also converges.
Fan et al. in  also proposed a generative learning model using a federated learning framework. The aim is to train a unified central GAN model from the combined generative models of each client. Fan et al. examine four kinds of synchronization strategies: synchronizing the central models of both D and G to every client (Sync D&G), synchronizing only the generator or only the discriminator (Sync G or Sync D), or neither (Sync None). In situations where communication costs are high, they recommend Sync G, at the price of losing some generative capacity; otherwise, both D and G should be synchronized. Their results showed that federated learning is generally robust to the number of agents with independent and identically distributed (IID) and moderately non-IID training data. However, for highly skewed data distributions their model performed abnormally, due to weight divergence.
IV-C4 Reinforcement learning
Cross-modal hashing maps different multimedia data into a common Hamming space, enabling fast and flexible retrieval across different modalities. It has two weaknesses: (1) it depends on large-scale labeled cross-modal training data; and (2) it ignores the rich information contained in the large amount of unlabeled data across different modalities. Zhang et al. in  therefore propose the Semi-supervised Cross-modal Hashing GAN (SCH-GAN), which exploits a large amount of unlabeled data to improve hashing learning. The generator takes the correlation score predicted by the discriminator as a reward and tries to pick margin examples of one modality from the unlabeled data given a query from another modality. The discriminator tries to predict the correlation between the query and the examples chosen by the generator, and is trained using reinforcement learning.
An agent trained with RL can only achieve the single task specified by its reward function. Florensa et al. in  therefore propose the Goal Generative Adversarial Network (Goal GAN). This method allows an agent to automatically discover the range of tasks at the appropriate level of complexity in its environment, with no prior knowledge about the environment or the tasks being performed, and allows the agent to generate its own reward functions. The goal discriminator is trained to evaluate whether a goal is at the appropriate level of difficulty for the current policy, and the goal generator is trained to generate goals that meet this criterion.
GANs have limitations when the goal is to generate sequences of discrete tokens. First, it is hard to pass the gradient update from the discriminator to the generator when the outputs are discrete. Second, the discriminator can only reward an entire sequence after it has been generated; for a partially generated sequence, it is non-trivial to balance how good it is now against how good it will be as a complete sequence. Yu et al. in  proposed Sequence GAN (SeqGAN), modeling the data generator as a stochastic policy in reinforcement learning (RL). The RL reward signal comes from the discriminator judging a complete sequence and, using Monte Carlo search, is passed back to the intermediate state-action steps, so the method accounts for the long-term reward at every timestep. The authors consider not only the fitness of previous tokens but also the resulting future outcome. ”This is similar to playing the games such as Go or Chess, where players sometimes give up the immediate interests for the long-term victory” .
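The Monte Carlo search step can be sketched as follows. This toy version uses a uniform rollout policy and a hypothetical discriminator scoring rule (in SeqGAN, the generator itself completes the sequence and a learned discriminator scores it); only the structure of the estimate is the point.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, SEQ_LEN = 4, 6

def rollout_policy(prefix):
    """Stand-in rollout policy: complete the sequence uniformly at random
    (SeqGAN instead rolls out with the generator's own policy)."""
    tail = rng.integers(0, VOCAB, size=SEQ_LEN - len(prefix))
    return list(prefix) + list(tail)

def discriminator(seq):
    """Toy discriminator: 'real' sequences contain many 0-tokens
    (a hypothetical scoring rule, for illustration only)."""
    return seq.count(0) / len(seq)

def mc_reward(prefix, n_rollouts=64):
    """Monte Carlo search: estimate the value of a *partial* sequence by
    completing it n times and averaging the discriminator's scores on the
    finished sequences -- the intermediate action value that is passed
    back to the generator as its RL reward."""
    return float(np.mean([discriminator(rollout_policy(prefix))
                          for _ in range(n_rollouts)]))
```

A prefix that already looks "real" (many 0-tokens here) receives a higher intermediate reward than one that does not.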
The main problem in  is that the classifier's reward cannot accurately reflect the novelty of text. So, in , in comparison to , the authors assign a low reward to repeatedly generated text and a high reward to ”novel” and fluent text, encouraging the generator to produce diverse and informative text, and propose a novel language-model based discriminator that better distinguishes novel text from repeated text without the saturation problem. The generator's reward consists of two parts: a sentence-level reward and a word-level reward. To train the discriminator, the authors maximize the reward of real text and minimize the reward of fake text. The reason for minimizing the reward of generated text is that text repeatedly generated by the generator can be identified by the discriminator and receives a lower reward. The motivation for maximizing the reward of real-world data is that not only can uncommon text in the generated data obtain a high reward, but the discriminator can also punish low-quality text to some extent.
The same notion as SeqGAN can be applied in domains such as image captioning, whose aim is to describe an image with words. Former approaches to image captioning, such as maximum likelihood methods, suffer from the so-called exposure bias problem, which occurs when the model produces a sequence of tokens conditioned on its own previous tokens: the model may generate tokens that were never seen in the training data. Yan et al. in  used the idea of SeqGAN to address exposure bias. In this scheme, the image captioning generator is the generator in the GAN framework, whose aim is to describe the images. The discriminator has two duties: first, to distinguish real descriptions from generated ones; and second, to determine whether a description is related to the image. To deal with the discreteness of the generated text, the discriminator is treated as an agent that produces a reward for the generator. The lack of an intermediate reward is another problem, which is solved using the Monte Carlo roll-out strategy, the same as in SeqGAN.
Finding new chemical compounds and generating molecules are also challenging tasks in a discrete setting.  and  tackled this problem and proposed two models that rely on SeqGAN. The main difference is the addition of an RL component to the basic GAN architecture, as discussed in subsection IV-B5.
The idea behind SeqGAN has also been applied to generating sentences with certain labels. Li et al. in  introduced CS-GAN, which consists of a generator and a descriptor (a discriminator and a classifier). In this model, the generator takes an action, and the descriptor's task is to identify the sentence category and return the reward. Details of this model are explained in subsection IV-B4.
Aghakhani et al. in  introduce a system that, for the first time, applies GANs to a text classification task, specifically detecting deceptive reviews (FakeGAN). Previous models for text classification have limitations: (1) bias problems, as in recurrent NNs, where later words in a text carry more weight than earlier words; and (2) correlation with the window size, as in CNNs. Unlike a standard GAN with a single generator and discriminator, FakeGAN uses two discriminators and one generator. The authors modeled the generator as a stochastic policy agent in reinforcement learning (RL) and used the Monte Carlo search algorithm for the discriminators to estimate the intermediate action-value and pass it as the RL reward to the generator. One discriminator tries to distinguish between truthful and deceptive reviews, whereas the other tries to distinguish between fake and real reviews.
Ghosh et al. in  use GANs to learn the handwriting of an entity and combine them with reinforcement learning techniques to achieve faster learning. The generator can generate words that look similar to the reference word, and the discriminator network can be used as an OCR (optical character recognition) system. Reinforcement learning comes into play when letters need to be joined to form words, such as in the spacing between characters and the strokes from one letter to another, providing suitable rewards or penalties for the generator to learn the handwriting with greater accuracy.
The optimized generation of sequences with particular desired goals is challenging in sequence generation tasks. Most current work mainly learns to generate outputs that are close to the real distribution; in many applications, however, we need to generate data that are both similar to real data and have specific properties or attributes. Hossam et al. in  introduce the first GAN-controlled generative model for sequences that addresses the diversity issue in a principled way. The authors combine the benefits of GAN and RL policy learning while avoiding the drawbacks of mode collapse and high variance. They show that if pure RL alone is applied with the GAN-based objective, the realistic quality of the output might be sacrificed for the sake of achieving a higher reward; in text generation, for example, the model could achieve a similar quality score by generating sentences in which a few words are repeated all the time. Hence, combining a GAN-based objective with RL keeps the RL optimization process close to the actual data distribution. This model can be applied to any GAN model to enable it to directly optimize the desired goal of the given task.
A novel RL-based neural architecture search (NAS) methodology for GANs is proposed by Tian et al. in . A Markov decision process formulation is applied to reformulate neural architecture search for GANs, yielding a more effective RL-based search algorithm with more global optimization. Additionally, this formulation improves data efficiency by better facilitating off-policy RL training. Most formerly proposed search methods for RL-based GAN architecture search use on-policy RL, which can entail significantly long training times because of limited data efficiency. Off-policy RL algorithms enable agents to learn more accurately by reusing past experience; however, using off-policy data can make policy network training unstable, because these training samples differ systematically from on-policy ones . The new formulation in  better supports the off-policy strategy and lessens this instability problem.
|Reference||Methodology and Contributions||Pros||Cons|
|No-regret learning: Subsection IV-C1|
|DRAGAN ||Applying no-regret algorithm, new regularizer||High stability across objective functions, mitigates mode collapse||-|
|Chekhov GAN ||Online learning algorithm for semi-concave games||Converges to mixed NE for semi-shallow architectures||-|
|Fictitious play: Subsection IV-C2|
|Fictitious GAN ||Fictitious play with historical models||Solves the oscillation behavior, solves divergence issues in some cases, applicable on top of existing GAN variants||Applies only to 2-player zero-sum games|
|Federated learning: Subsection IV-C3|
|FedGAN ||Communication-efficient distributed GAN subject to privacy constraints, connect the convergence of GAN to ODE||Prove the convergence, less communication complexity compare to general distributed GAN||-|
|||Using a federated learning framework||Robustness to the number of clients with IID and moderately non-IID data||Performs abnormally for highly skewed data distributions, accuracy drops with non-IID data|
|Reinforcement learning: Subsection IV-C4|
|||Generates a diverse set of goals at the appropriate level of difficulty||-||-|
|Diversity-promoting GAN ||New objective function, generate text||Diversity and novelty||-|
|||Using GAN for cross-model hashing||Extract rich information from unlabeled data||-|
|SeqGAN ||Extending GANs to generate sequence of discrete tokens||Solve the problem of discrete data||-|
|FakeGAN ||Text classification||-||-|
|CS-GAN ||Combine RL, GAN, RNN||More realistic, faster||-|
|ORGAN ||RL agent + SeqGAN||Better result than RNN trained via MLE or SeqGAN||Works only on sequential data|
|MolGAN ||RL agent + SeqGAN||Optimizing non-differentiable metrics by RL, faster training time||Susceptible to mode collapse|
|OptiGAN ||Combining MLE and GAN||Used for different models and goals||-|
|||Redefining the issue of neural architecture search for GANs by applying Markov decision process formulation||More effective RL-based search algorithm, smoother architecture sampling||-|
V Discussion, Conclusion and Future Work
Although various studies have explored different aspects of GANs, several challenges remain to be investigated. In this section, we discuss such challenges, especially within the subject discussed here, the game of GANs, and propose future research directions to tackle these problems.
V-a Open Problems and Future Directions
While GANs achieve state-of-the-art performance and compelling results on various generative tasks, these results come with challenges, most notably the difficulty of training GANs. The training procedure suffers from instability problems. While trying to reach a Nash equilibrium, the generator and the discriminator each minimize their own cost function regardless of the other. This can cause non-convergence and instability, because minimizing one player's cost can maximize the other's. Another main problem of GANs that needs to be addressed is mode collapse. This problem becomes more critical for unbalanced data sets or when the number of classes is high. On the other hand, when the discriminator becomes too good at distinguishing samples, the generator's gradients vanish; this problem, called vanishing gradients, should also be considered. Compared with other generative models, the evaluation of GANs is more difficult, partially due to the lack of appropriate metrics. Most evaluation metrics are qualitative rather than quantitative, and qualitative evaluation, such as human examination of samples, is an arduous task and depends on the subject.
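The non-convergence problem can be seen even in the simplest adversarial setting. The following toy sketch, our own illustration rather than an example from the surveyed works, runs simultaneous gradient descent-ascent on the bilinear game min_x max_y xy, whose unique Nash equilibrium is (0, 0); the iterates spiral away from the equilibrium instead of converging.

```python
import numpy as np

# Bilinear game f(x, y) = x * y: the generator-like player minimizes
# over x while the discriminator-like player maximizes over y.
x, y = 1.0, 1.0
lr = 0.1
radii = [np.hypot(x, y)]              # distance from the NE at (0, 0)
for _ in range(100):
    gx, gy = y, x                     # df/dx = y, df/dy = x
    x, y = x - lr * gx, y + lr * gy   # simultaneous descent-ascent
    radii.append(np.hypot(x, y))
# Each step multiplies the distance by sqrt(1 + lr**2) > 1, so the
# iterates spiral outward instead of converging to the equilibrium.
```

The same qualitative behavior, oscillation around the equilibrium rather than convergence to it, is what many of the stabilization techniques surveyed above are designed to suppress.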
More specifically, as the authors in  expressed, one of the most important future directions is to improve the theoretical aspects of GANs to solve problems such as mode collapse, non-convergence, and training difficulties. Although there have been many works on the theoretical side, most current training strategies are based on optimization theory, whose scope is restricted to local convergence due to non-convexity, and the utilization of game-theoretic techniques is still in its infancy. At present, the game theory variants of GANs are limited; many of them are highly restrictive and rarely directly applicable. Hence, there is much room for research on game-based GANs involving other game models.
From the convergence viewpoint, most current training methods converge to a local Nash equilibrium, which can be far from the actual global NE. While there is a vast literature on GAN training, only a few studies such as  formulate the training procedure from the mixed NE perspective, and the search for mixed NE of GANs should be examined in more depth. On the other hand, the existence of an equilibrium does not imply that it can easily be found by a simple algorithm. In particular, training GANs requires finding Nash equilibria in non-convex games, and computing equilibria in such games is computationally hard. In the future, we expect to see more solutions that try to make GAN training more stable and converge to an actual NE.
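As a small concrete picture of mixed equilibria, consider fictitious play on matching pennies (a toy game of our own choosing, not a GAN): no pure-strategy equilibrium exists, but when each player best-responds to the opponent's empirical action frequencies, the same idea underlying Fictitious GAN, those frequencies converge to the mixed NE (0.5, 0.5).

```python
import numpy as np

# Row player's payoff matrix for matching pennies (zero-sum):
# the column player's payoff is the negative of this.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])

row_hist = np.array([1.0, 1.0])    # counts of the row player's actions
col_hist = np.array([1.0, 1.0])    # counts of the column player's actions
for _ in range(5000):
    p = row_hist / row_hist.sum()  # empirical strategy of the row player
    q = col_hist / col_hist.sum()  # empirical strategy of the column player
    row_action = np.argmax(A @ q)  # best response to the opponent's history
    col_action = np.argmin(p @ A)  # column player minimizes row's payoff
    row_hist[row_action] += 1
    col_hist[col_action] += 1

freq_row = row_hist / row_hist.sum()   # approaches (0.5, 0.5)
freq_col = col_hist / col_hist.sum()
```

The individual best responses keep oscillating forever; only the time-averaged (mixed) strategies settle down, which is precisely why averaging over historical models helps stabilize GAN training.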
Multi-agent models such as [80, 22, 34, 23, 11, 40, 32, 1, 50, 52, 4, 56, 41] are computationally more complex and expensive than two-player models, and this factor should be taken into account in the development of such variants. Moreover, in multi-generator structures, divergence among the generators should be considered so that they do not all generate the same samples.
Another direction in which we expect to witness innovation in the future is integrating GANs with other learning methods. There is a variety of methods in the multi-agent learning literature that should be explored, as they may be useful when applied to multi-agent GANs. In addition, much more research on the relationship and combination between GANs and currently applied learning approaches such as RL is still required, and it will be a promising research direction in the next few years. Moreover, GAN was proposed as an unsupervised learning model, but adding a certain number of labels, especially in practical applications, can substantially improve its generative capability. Therefore, how to combine GANs with semi-supervised learning is also a potential future research topic.
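To make the GAN-RL combination concrete, the sketch below shows the core idea of SeqGAN-style methods in miniature: the generator is treated as a policy and the discriminator's score as a reward, so the generator can be updated with REINFORCE even though sampling discrete tokens blocks ordinary backpropagation. Everything here (the three-token vocabulary, the stand-in discriminator, the learning rate) is our own hypothetical setup, not the published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generator: a softmax policy over a toy 3-token vocabulary.
logits = np.zeros(3)

def policy(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def discriminator_reward(token):
    # Stand-in discriminator: token 2 looks "real", the others look fake.
    return 1.0 if token == 2 else 0.1

lr = 0.5
for _ in range(200):
    p = policy(logits)
    token = rng.choice(3, p=p)          # sample a discrete token
    r = discriminator_reward(token)
    # REINFORCE: grad of log pi(token) w.r.t. logits is (one_hot - p),
    # scaled by the reward supplied by the discriminator.
    grad = -p * r
    grad[token] += r
    logits += lr * grad

p_final = policy(logits)   # mass concentrates on the "realistic" token
```

The discriminator never needs to be differentiable with respect to the generator's samples here, which is exactly what makes this policy-gradient view attractive for discrete data such as text.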
As a final note, GAN is a relatively new model with significant recent progress, so the landscape of possible applications remains open for exploration. Advances in solving the above challenges can be decisive for GANs to be employed more widely in real scenarios.
We conducted this review of recent progress in GANs through the lens of game theory, which can serve as a reference for future research. Compared to other reviews in the literature, and considering the many published works that deal with GAN challenges, we emphasized the theoretical aspects, all from a game-theoretic perspective based on our proposed taxonomy. In this survey, we first provided detailed background information on game theory and GANs. To present a clear roadmap, we introduced our taxonomy, which has three major categories: the game model, the architecture, and the learning approach. Following the proposed taxonomy, we discussed each category separately in detail and presented the GAN-based solutions in each subcategory. We hope this paper is beneficial for researchers interested in this field.
-  (May 2018) Detecting deceptive reviews using generative adversarial networks. In 2018 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, pp. 89–95. Cited by: Fig. 2, Fig. 2, §IV-B2, §IV-C4, TABLE IV, TABLE V, §V-A.
-  (February 2020) A survey and taxonomy of adversarial neural networks for text-to-image synthesis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 10 (4), pp. e1345. Cited by: §III-C, §III-C, §III.
-  (March 2021) Applications of generative adversarial networks (GANs): an updated review. Archives of Computational Methods in Engineering 28 (3), pp. 525–552. Cited by: §I, §III-B, §III.
-  (2017) Generalization and equilibrium in generative adversarial nets (GANs). arXiv preprint arXiv:1703.00573. Cited by: Fig. 2, §IV-B3, TABLE IV, §V-A.
-  (September 2015) Scheduled sampling for sequence prediction with recurrent neural networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS’15, Cambridge, MA, USA, pp. 1171–1179. Cited by: §IV-C4.
-  (2019) The six fronts of the generative adversarial networks. arXiv preprint arXiv:1910.13076. Cited by: §I-A, §III-A, §III.
-  (December 2018) Recent advances of generative adversarial networks in computer vision. IEEE Access 7, pp. 14985–15006. Cited by: §I, §III-C, §III, §V-A.
-  (2018) Generative adversarial networks: an overview. IEEE Signal Processing Magazine 35 (1), pp. 53–65. Cited by: §I-A, §III-A, §III.
-  (2018) MolGAN: an implicit generative model for small molecular graphs. arXiv preprint arXiv:1805.11973. Cited by: Fig. 2, Fig. 2, §IV-B5, §IV-B5, §IV-C4, TABLE IV, TABLE V.
-  (2019) A survey on GANs for anomaly detection. arXiv preprint arXiv:1906.11632. Cited by: §III-C, §III.
-  (2016) Generative multi-adversarial networks. arXiv preprint arXiv:1611.01673. Cited by: Fig. 2, §IV-B2, §IV-B2, TABLE IV, §V-A.
-  (2020) Federated generative adversarial learning. arXiv preprint arXiv:2005.03793. Cited by: Fig. 2, §IV-C3, §IV-C3, TABLE V.
-  A survey of differentially private generative adversarial networks. In The AAAI Workshop on Privacy-Preserving Artificial Intelligence, New York, NY, USA. Cited by: §III-B.
-  (2020) GANs may have no Nash equilibria. arXiv preprint arXiv:2002.09124. Cited by: Fig. 2, §IV-A2, TABLE III.
-  (2017) Many paths to equilibrium: GANs do not need to decrease a divergence at every step. arXiv preprint arXiv:1710.08446. Cited by: §I.
-  (2018) Automatic goal generation for reinforcement learning agents. In Proceedings of the 35th International Conference on Machine Learning, Vol. 80, Stockholm, Sweden, pp. 1515–1528. Cited by: Fig. 2, §IV-C4, TABLE V.
-  (July 2020) Generative adversarial networks as stochastic Nash games. arXiv preprint arXiv:2010.10013. Cited by: Fig. 2, §IV-A1, TABLE III.
-  (2020) Generative adversarial networks for spatio-temporal data: a survey. arXiv preprint arXiv:2008.08903. Cited by: §III-C, §III.
-  (September 2018) Fictitious GAN: training GANs with historical models. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, pp. 119–134. Cited by: TABLE II, Fig. 2, §IV-C2, TABLE V.
-  (November 2020) Synthetic observational health data with GANs: from slow adoption to a boom in medical research and ultimately digital twins? Cited by: §III-C, §III.
-  (2016) Handwriting profiling using generative adversarial networks. arXiv preprint arXiv:1611.08789. Cited by: Fig. 2, §IV-C4, TABLE V.
-  (2018) Multi-agent diverse generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, Utah, pp. 8513–8521. Cited by: Fig. 2, §IV-B1, §IV-B1, §IV-B1, TABLE IV, §V-A.
-  (2016) Message passing multi-agent GANs. arXiv preprint arXiv:1612.01294. Cited by: Fig. 2, §IV-B1, TABLE IV, §V-A.
-  (July 2020) A survey on the progression and performance of generative adversarial networks. In 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, pp. 1–8. Cited by: §I-A, §III-A, §III-C, §III.
-  (June 2019) A review: generative adversarial networks. In 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi’an, China, pp. 505–510. Cited by: §I-A, §II-B2, §III-A, §III.
-  (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661. Cited by: §I, §I.
-  (December 2014) Generative adversarial nets. In Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2, NIPS’14, Cambridge, MA, USA, pp. 2672–2680. Cited by: §II-B2, §II-B3.
-  (2016) NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160. Cited by: §I-A, §II-B1, §III-A, §III, Fig. 2, §IV-C1.
-  (2017) An online learning approach to generative adversarial networks. arXiv preprint arXiv:1706.03269. Cited by: Fig. 2, §IV-C1, TABLE V.
-  (2020) A review on generative adversarial networks: algorithms, theory, and applications. arXiv preprint arXiv:2001.06937. Cited by: §I-A, §III-A, §III.
-  (2017) Objective-reinforced generative adversarial networks (ORGAN) for sequence generation models. arXiv preprint arXiv:1705.10843. Cited by: Fig. 2, Fig. 2, §IV-C4, TABLE IV, TABLE V.
-  (May 2019) MD-GAN: multi-discriminator generative adversarial networks for distributed datasets. In 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, pp. 866–877. Cited by: Fig. 2, §IV-B2, TABLE IV, §V-A.
-  (2018) Comparative study on generative adversarial networks. arXiv preprint arXiv:1801.04271. Cited by: §I-A, §III-A, §III.
-  (February 2018) MGAN: training generative adversarial nets with multiple generators. In International Conference on Learning Representations, Vancouver Canada. Cited by: Fig. 2, §IV-B1, TABLE IV, §V-A.
-  (February 2019) How generative adversarial networks and their variants work: an overview. ACM Computing Surveys (CSUR) 52 (1), pp. 1–43. Cited by: §I-A, §I, §I, §III-A, §III.
-  (2020) OptiGAN: generative adversarial networks for goal optimized sequence generation. arXiv preprint arXiv:2004.07534. Cited by: Fig. 2, §IV-C4, TABLE V.
-  (June 2019) Finding mixed Nash equilibria of generative adversarial networks. In International Conference on Machine Learning, Long Beach, CA, USA, pp. 2810–2819. Cited by: Fig. 2, §IV-A3, TABLE III, §V-A.
-  (2020) A survey on generative adversarial networks: variants, applications, and training. arXiv preprint arXiv:2006.05132. Cited by: §I-A, §III-A, §III.
-  (July 2020) Generative adversarial training and its utilization for text to image generation: a survey and analysis. Journal of Critical Reviews 7 (8), pp. 1455–1463. Cited by: §III-C, §III-C, §III.
-  (July 2020) A multi-player minimax game for generative adversarial networks. In 2020 IEEE International Conference on Multimedia and Expo (ICME), London, UK, pp. 1–6. Cited by: Fig. 2, §IV-B2, §IV-B2, TABLE IV, §V-A.
-  (October 2020) Consistency of multiagent distributed generative adversarial networks. IEEE Transactions on Cybernetics, early access, pp. 1–11. Cited by: Fig. 2, Fig. 2, §IV-B1, §IV-B3, TABLE IV, §V-A.
-  (2017) On convergence and stability of GANs. arXiv preprint arXiv:1705.07215. Cited by: Fig. 2, §IV-C1, §IV-C1, §IV-C1, TABLE V.
-  (March 2021) Generative adversarial networks: a survey on applications and challenges. International Journal of Multimedia Information Retrieval 10 (1), pp. 1–24. Cited by: §I-A, §III-A, §III.
-  (2020) Regularization methods for generative adversarial networks: an overview of recent studies. arXiv preprint arXiv:2005.09165. Cited by: §III-B, §III.
-  (June 2018) A generative model for category text generation. Information Sciences 450, pp. 301–315. Cited by: §I, Fig. 2, Fig. 2, §IV-B4, §IV-C4, TABLE IV, TABLE V.
-  (January 2020) Sequence generative adversarial networks for wind power scenario generation. IEEE Journal on Selected Areas in Communications 38 (1), pp. 110–118. Cited by: Fig. 2.
-  (2015) Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Cited by: §IV-B5.
-  (2017) Are GANs created equal? a large-scale study. arXiv preprint arXiv:1711.10337. Cited by: §III-B, §III.
-  (December 2017) The numerics of GANs. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Red Hook, NY, USA, pp. 1825–1835. Cited by: §IV-C3.
-  (March 2020) microbatchGAN: stimulating diversity with multi-adversarial discrimination. In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass, CO, USA, pp. 3050–3059. Cited by: Fig. 2, §IV-B2, TABLE IV, §V-A.
-  (December 1928) Zur theorie der gesellschaftsspiele. Mathematische annalen 100 (1), pp. 295–320. Cited by: §II-A.
-  (December 2017) Dual discriminator generative adversarial nets. Advances in neural information processing systems (NIPS 2017) 30, pp. 2670–2680. Cited by: Fig. 2, §IV-B2, TABLE IV, §V-A.
-  (2004) An introduction to game theory. Vol. 3, Oxford university press New York. Cited by: §II-A.
-  (August 2020) Loss functions of generative adversarial networks (GANs): opportunities and challenges. IEEE Transactions on Emerging Topics in Computational Intelligence 4 (4). Cited by: §III-B, §III.
-  (2019) Recent progress on generative adversarial networks (GANs): a survey. IEEE Access 7, pp. 36322–36333. Cited by: §I-A, §III-A, §III.
-  (2020) FedGAN: federated generative adversarial networks for distributed data. arXiv preprint arXiv:2006.07228. Cited by: Fig. 2, Fig. 2, §IV-B3, §IV-C3, §IV-C3, TABLE IV, TABLE V, §V-A.
-  (2020) Generative adversarial networks (GANs): an overview of theoretical model, evaluation metrics, and recent developments. arXiv preprint arXiv:2005.13178. Cited by: §I-A, §III-A, §III.
-  (January 2021) A survey on generative adversarial networks for imbalance problems in computer vision tasks. Journal of Big Data 8 (1), pp. 27. Cited by: §III-C, §III-C, §III.
-  (June 2019) RL-GAN-Net: a reinforcement learning agent controlled GAN network for real-time point cloud shape completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, pp. 5898–5907. Cited by: Fig. 2, §IV-B5, TABLE IV.
-  (2020) Generative adversarial networks (GANs): challenges, solutions, and future directions. arXiv preprint arXiv:2005.00065. Cited by: §I-A, §III-A, §III.
-  (February 2020) User mobility synthesis based on generative adversarial networks: a survey. In 2020 22nd International Conference on Advanced Communication Technology (ICACT), Phoenix Park, Korea (South), pp. 94–103. Cited by: §III-C, §III.
-  (2008) Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press. Cited by: §II-A, §II-A.
-  (January 2016) Mastering the game of go with deep neural networks and tree search. Nature 529 (7587), pp. 484–489. Cited by: §IV-C4.
-  (August 2020) Creating artificial images for radiology applications using generative adversarial networks (GANs)–a systematic review. Academic Radiology 27, pp. 1175–1185. Cited by: §III-C, §III-C, §III.
-  (August 2020) Off-policy reinforcement learning for efficient and effective GAN architecture search. In European Conference on Computer Vision (ECCV), pp. 175–192. Cited by: Fig. 2, §IV-C4, TABLE V.
-  (2019) Self-supervised GAN: analysis and improvement with multi-class minimax game. Advances in Neural Information Processing Systems (NeurIPS 2019) 32, pp. 13253–13264. Cited by: Fig. 2, §IV-B4, TABLE IV.
-  (September 2020) Generative adversarial networks in digital pathology: a survey on trends and future potential. Patterns 1 (6), pp. 100089. Cited by: §III-C, §III-C, §III.
-  (2017) Generative adversarial networks: introduction and outlook. IEEE/CAA Journal of Automatica Sinica 4 (4), pp. 588–598. Cited by: §I-A, §I, Fig. 1, §III-A, §III.
-  (March 2020) A state-of-the-art review on image synthesis with generative adversarial networks. IEEE Access 8, pp. 63514–63537. Cited by: §III-C, §III.
-  (2019) Generative adversarial networks in computer vision: a survey and taxonomy. arXiv preprint arXiv:1906.01529. Cited by: §I, §III-C, §III.
-  (February 1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of Chemical Information and Computer Sciences 28 (1), pp. 31–36. Cited by: §IV-B5, §IV-B5.
-  (2019) Stabilizing generative adversarial network training: a survey. arXiv preprint arXiv:1910.00927. Cited by: §I, §III-B, §III-C, §III.
-  (May 1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (3), pp. 229–256. Cited by: §IV-B5.
-  (December 2017) A survey of image synthesis and editing with generative adversarial networks. Tsinghua Science and Technology 22 (6), pp. 660–674. Cited by: §III-C, §III.
-  (2018) Diversity-promoting GAN: a cross-entropy based generative adversarial network for diversified text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, pp. 3940–3949. Cited by: Fig. 2, §IV-C4, TABLE V.
-  (August 2018) Image captioning using adversarial networks and reinforcement learning. In 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, pp. 248–253. Cited by: Fig. 2, §IV-C4.
-  (December 2019) Generative adversarial network in medical imaging: a review. Medical image analysis (MEDIA) 58, pp. 101552. Cited by: §III-C, §III-C, §III.
-  (March 2020) A review of generative adversarial networks and its application in cybersecurity. Artificial Intelligence Review 53 (3), pp. 1721–1736. Cited by: §III-C, §III.
-  (February 2017) SeqGAN: sequence generative adversarial nets with policy gradient. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, pp. 2852–2858. Cited by: Fig. 2, §IV-B5, §IV-C4, §IV-C4, TABLE V.
-  (2018) Stackelberg GAN: towards provable minimax equilibrium via multi-generator architectures. arXiv preprint arXiv:1811.08010. Cited by: Fig. 2, Fig. 2, §IV-A2, §IV-B1, §IV-B1, TABLE III, TABLE IV, §V-A.
-  (February 2018) SCH-GAN: semi-supervised cross-modal hashing by generative adversarial network. IEEE transactions on cybernetics 50 (2), pp. 489–502. Cited by: Fig. 2, §IV-C4.
-  (July 2018) Recent advance on generative adversarial networks. In 2018 International Conference on Machine Learning and Cybernetics (ICMLC), Chengdu, China, pp. 69–74. Cited by: §I-A, §III-A, §III, TABLE V.