Generative models are trained to learn the true data distribution from a training set and, once training is complete, are capable of generating new data points. In recent years, they have been successfully applied to a wide range of applications, including image generation, stylization, semi-supervised classification, and natural language generation [3, 20, 29]. In this paper, we tackle the emerging task of text generation, which is typically modeled as a sequential discrete data generation process. Such tasks play a pivotal role in many real-world applications, such as machine translation [19, 22] and dialogue systems [33, 18].
The training of sequential text generation models has largely relied on teacher forcing [5]. However, generative models trained with teacher forcing suffer from exposure bias: at inference time the models are fed their own predicted tokens rather than the ground-truth data, and the accumulated error causes them to generate poor samples. To address the exposure bias issue, a major line of ongoing research on text generation centers on utilizing adversarial training techniques to derive better text generation models. Generally, such attempts can be classified into two strands: the first line of approaches combines generative adversarial networks (GAN) with reinforcement learning (RL), denoted as RL-based; the second line solely plays the two-player adversarial game without using RL, denoted as RL-free.
Both RL-based and RL-free text generation approaches suffer from mode collapse, a notoriously difficult challenge for training GAN-based models. That is, as the adversarial training progresses, the generated distribution tends to contract towards a subset of the data modes. As a result, the generator outputs repeated sentences and no longer expressively represents the data generating distribution. This effect has been quantitatively evaluated in a recent study, which shows that the entropy of the generator's output distribution experiences a clear drop when moving from the MLE training phase to the adversarial training phase. To derive better text generation models with GAN-based techniques, one critical requirement is a better quality-diversity trade-off achieved by efficiently slowing down the mode collapse of the adversarial generator, i.e., letting the generator obtain abundant gradient information from the adversarial update to make its output more realistic (improve quality) while suffering only a small mode collapse effect (lose little diversity). However, only a limited number of existing RL-based or RL-free approaches explicitly deal with the mode collapse of GAN training. In this work, we propose a cooperative training mechanism which explicitly tackles the challenge of mode collapse in adversarial training, resulting in an improved text generation model.
Overall, the contributions of this paper are three-fold. Firstly, we propose a novel cooperative training approach that utilizes a language model to efficiently shape the output distribution of the adversarial text generator. Our approach efficiently slows down the mode collapse of the adversarial text generator and thus leads text generation towards a better quality-diversity trade-off. Secondly, to optimize the cooperative training loss for the generator, we propose a novel meta-learning mechanism, in which the cooperative training task serves as a meta task and the adversarial training serves as a base task. Our approach thereby ensures that the generator parameters remain resistant to mode collapse after the adversarial update. Thirdly, we conduct extensive experiments on synthetic and real-world datasets to demonstrate that our approach produces better text generation models in terms of both quality and diversity.
2 Related Work
Besides the conventional approaches of training language models with teacher forcing, today's approaches for text generation can generally be classified as RL-based or RL-free. Most RL-based approaches formulate text generation as a Markov Decision Process (MDP). Often, the generator is updated by the policy gradient algorithm or its variants, using reward signals derived from the GAN discriminator. Prominent examples of this type of approach include SeqGAN, RankGAN, LeakGAN and MaskGAN. The noisy reward signals derived from the discriminator give such RL-based models high-variance gradients for updating the generator's parameters. Besides high-variance gradients, the RL-based approaches also face difficulties from partial sequence evaluation, slow learning, and sensitive hyperparameters. Considering these challenges, our proposed method resides in, but is not restricted to, the category of RL-free approaches for text generation. Prominent examples of RL-free approaches include TextGAN, FM-GAN, GSGAN, and RelGAN. Such approaches feed the generator low-variance gradients and often lead to more stable training.
Most adversarial text generation models are first pretrained by MLE and then continuously optimized by adversarial training under either an RL-based or RL-free mechanism. When switched from the MLE training phase to the adversarial training phase, the generator models of both RL-based and RL-free approaches suffer from mode collapse. In this work, our core intuition is to utilize a cooperatively trained language model to decelerate the mode collapse of adversarial training. This intuition of utilizing a language model to facilitate adversarial text generation aligns with the works proposed in [35, 24]. In one of these works, the discriminator for adversarial training is modeled as a language model, which maximizes the probability of real data and minimizes that of generated data; furthermore, the output of the language model is adopted as a reward signal to promote generation diversity under an RL-based set-up. Our work is most closely related to the cooperative training method CoT, where a language model is trained online to offer a target distribution for minimizing the Jensen-Shannon divergence between the real data distribution and the generated distribution. In our work, we adopt a similar strategy to train the language model, but the cooperative training for the generator model differs from CoT. Furthermore, we propose a distinct meta-learning set-up to optimize the cooperative training loss for the generator. To the best of our knowledge, our work is the first attempt to apply meta-learning to text generation GANs.
The task of text generation is typically modelled as a sequential discrete data generation process. Let $x \sim p_d$ denote data points drawn from an underlying data generating distribution $p_d$. Each data point is represented as a sequence of discrete tokens $x = (x_1, \dots, x_T)$, where $x_t$ denotes the $t$-th token and $T$ denotes the length of the sequence. Let $G_\theta$ denote the generator model parameterized by $\theta$. Conventional text generation approaches typically train a language model with maximum likelihood estimation (MLE) as follows:
$$\mathcal{L}_{\mathrm{MLE}}(\theta) = -\,\mathbb{E}_{x \sim p_d}\big[\log G_\theta(x)\big],$$
where the probability of each sequence $x$ is represented in an autoregressive manner:
$$G_\theta(x) = \prod_{t=1}^{T} G_\theta(x_t \mid x_{1:t-1}),$$
with $x_{1:t-1}$ denoting the sequence of previous tokens $(x_1, \dots, x_{t-1})$.
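As a concrete sketch, the MLE objective above reduces to a standard teacher-forcing cross-entropy loss. The snippet below is illustrative rather than the authors' implementation; `generator` is a hypothetical stand-in for any autoregressive model that maps right-shifted ground-truth tokens to per-step logits (PyTorch assumed):

```python
import torch
import torch.nn.functional as F

def mle_loss(generator, x):
    """Teacher-forcing MLE loss: the mean of -log G_theta(x_t | x_{<t}).

    `generator` is assumed to map right-shifted tokens of shape (batch, T-1)
    to per-step logits of shape (batch, T-1, vocab).
    """
    inputs, targets = x[:, :-1], x[:, 1:]      # condition on x_{<t}, predict x_t
    logits = generator(inputs)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),   # merge batch and time dimensions
        targets.reshape(-1),
    )
```

Minimizing this loss over samples from $p_d$ is exactly the autoregressive factorization above, averaged per token.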
The approaches utilizing GANs for text generation play a two-player game between the generator $G_\theta$ and a discriminator $D_\phi$ parameterized by $\phi$. Under the adversarial set-up, the generator is trained to generate realistic sentences, and the discriminator attempts to distinguish between $G_\theta$'s generating distribution $p_g$ and the real data distribution $p_d$. The above process can be formulated as an adversarial training mechanism as follows:
$$\min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim p_d}\big[\log D_\phi(x)\big] + \mathbb{E}_{x \sim p_g}\big[\log\big(1 - D_\phi(x)\big)\big], \tag{1}$$
where the generator and discriminator attempt to minimize and maximize the function, respectively. We denote the adversarial loss in (1) in terms of the generator model and the discriminator model as $\mathcal{L}^{G}_{adv}$ and $\mathcal{L}^{D}_{adv}$, respectively.
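The two sides of this game can be sketched with binary cross-entropy on the discriminator's logits. This is a generic GAN sketch, using the common non-saturating generator surrogate rather than the strict minimax form, and is not necessarily the exact loss variant used in the paper:

```python
import torch
import torch.nn.functional as F

def adversarial_losses(d_real_logits, d_fake_logits):
    """Discriminator and generator losses for the two-player game.

    The discriminator maximizes log D(x) + log(1 - D(x_fake)); here that is
    written as minimizing binary cross-entropy against targets 1 and 0. The
    generator uses the non-saturating surrogate: maximize log D(x_fake).
    """
    d_loss = (F.binary_cross_entropy_with_logits(
                  d_real_logits, torch.ones_like(d_real_logits))
              + F.binary_cross_entropy_with_logits(
                  d_fake_logits, torch.zeros_like(d_fake_logits)))
    g_loss = F.binary_cross_entropy_with_logits(
                 d_fake_logits, torch.ones_like(d_fake_logits))
    return d_loss, g_loss
```

The non-saturating form is standard practice because the strict minimax generator loss vanishes when the discriminator is confident.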
With the autoregressive generation process, the $t$-th token $x_t$ is generated by sampling from the generator's output distribution, conditioned on its previous tokens $x_{1:t-1}$. Performing such sampling makes it considerably difficult for the generator to utilize the discriminator's prediction outcome. That is, the backpropagation route for the adversarial loss, i.e., $\partial \mathcal{L}^{G}_{adv} / \partial \theta$, becomes non-differentiable w.r.t. the generator's parameters $\theta$, since the derivative of the sampling operation would be zero. To overcome this issue, the RL-based approaches mostly rely on the REINFORCE algorithm or its variants to derive the gradient for optimizing the generator, where the discriminator's predictions are utilized to derive reward signals. The RL-free approaches often relax the non-differentiable sampling function by continuous approximations, such as soft-argmax or gumbel-softmax. In this paper, our approach adopts the gumbel-softmax relaxation, which models the effect of sampling as adding noise to the logits so that the outputs become continuous and differentiable. Specifically, the noise is modeled by a Gumbel distribution, formed as follows:
$$g_t = -\log(-\log(u_t)), \quad u_t \sim \mathrm{Uniform}(0, 1),$$
where $g_t$ denotes the Gumbel noise to be applied to the $t$-th logits. With the Gumbel noise, the token for the next step is derived in a deterministic manner:
$$x_{t+1} = \mathrm{one\_hot}\Big(\operatorname*{argmax}_{1 \le i \le V}\big(o_t^{(i)} + g_t^{(i)}\big)\Big),$$
where $o_t$ denotes the logits output by the generator for sampling token $x_{t+1}$, and $V$ denotes the vocabulary size. To make the discriminator's loss differentiable, the argmax operator is replaced by a softmax function, i.e., $\hat{x}_{t+1} = \mathrm{softmax}\big(\beta(o_t + g_t)\big)$, where $\beta > 0$ is a real-valued (inverse) temperature hyperparameter.
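A minimal NumPy sketch of one such Gumbel-softmax sampling step, for a single token position and treating `beta` as the inverse temperature, might look as follows:

```python
import numpy as np

def gumbel_softmax(logits, beta=1.0, rng=None):
    """Gumbel-softmax relaxation for one sampling step.

    g_i = -log(-log(u_i)) with u_i ~ Uniform(0, 1); argmax(logits + g) is an
    exact sample from the categorical softmax(logits), while
    softmax(beta * (logits + g)) is its differentiable surrogate.
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-12, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    y = logits + g
    z = np.exp(beta * (y - y.max()))        # numerically stabilized softmax
    soft = z / z.sum()
    return soft, int(np.argmax(y))          # relaxed sample, hard token id
```

As `beta` grows, the soft sample approaches the one-hot vector of the hard token, which is the sense in which the relaxation recovers discrete sampling.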
Language generators trained with an adversarial training mechanism (both RL-based and RL-free approaches) suffer from mode collapse when switched from teacher forcing to the adversarial training phase. In this section, we introduce a novel meta cooperative training algorithm to overcome this challenge. Overall, our objective is to achieve a better quality-diversity trade-off for the language generators by decelerating the mode collapse of their adversarial training. That is, the algorithm allows the generator to obtain abundant gradient information from the adversarial training for increasing generation quality, while sacrificing little generation diversity. To this end, we engage a language model to decelerate the mode collapse of the generator's output distribution. The language model is cooperatively trained with the generator during adversarial training, and its output over samples from the real data distribution is utilized to shape the generator's output distribution. Furthermore, this supervision is formulated within a meta optimization setup.
4.1 Cooperative Training Formulation
We introduce a cooperative training paradigm that engages an interleaved training procedure for an adversarial generator $G_\theta$, an adversarial discriminator $D_\phi$, and a language model $M_\psi$, where $\psi$ denotes the parameters of the language model. Figure 1 depicts a high-level overview of the proposed cooperative training procedure. When the generator is trained by the adversarial loss, its generation diversity progressively decreases as generation quality increases, due to the mode collapse issue. To overcome this challenge, we cooperatively train a language model $M_\psi$, which imposes supervision on $G_\theta$'s output distribution towards preserving a desirable generation probability for the real data.
During the cooperative training process, the language model is consistently optimized by the MLE loss. To offer a smoothly changing target distribution for the generator, it is trained with data from a mixture distribution with balanced samples from real data and generated data, i.e., $p_m = \frac{1}{2}(p_d + p_g)$. Formally, the cooperative training loss for updating the language model with MLE is defined as:
$$\mathcal{L}^{M}_{coop}(\psi) = -\,\tfrac{1}{2}\,\mathbb{E}_{x \sim p_d}\big[\log M_\psi(x)\big] - \tfrac{1}{2}\,\mathbb{E}_{x \sim p_g}\big[\log M_\psi(x)\big]. \tag{2}$$
It can be interpreted as minimizing the KL divergence between an optimal mixture density model, whose distribution is $\frac{1}{2}(p_d + p_g)$, and the language model $M_\psi$.
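One update of this language-model training step on a balanced real/generated mixture can be sketched as follows; `TinyLM` is a hypothetical stand-in for the paper's relational-memory architecture, used only to make the sketch self-contained:

```python
import torch
import torch.nn.functional as F

class TinyLM(torch.nn.Module):
    """Minimal stand-in language model: embedding plus a linear head per step."""
    def __init__(self, vocab, dim=16):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, tokens):              # (batch, T-1) -> (batch, T-1, vocab)
        return self.head(self.emb(tokens))

def lm_cooperative_update(lm, lm_opt, x_real, x_fake):
    """One MLE step for the mediator LM on a balanced real/generated batch."""
    x = torch.cat([x_real, x_fake], dim=0)  # batch drawn from (p_d + p_g) / 2
    inputs, targets = x[:, :-1], x[:, 1:]
    logits = lm(inputs)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    lm_opt.zero_grad()
    loss.backward()
    lm_opt.step()
    return loss.item()
```

Feeding the LM equal halves of real and generated sequences is what realizes the mixture $\frac{1}{2}(p_d + p_g)$ in practice.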
Consistently updating the language model with samples from the real data distribution under the teacher forcing loss means it experiences only a mild mode collapse effect. Thus, its output predictions can offer effective supervision over the generator $G_\theta$'s output distribution for decelerating mode collapse. Moreover, updating $M_\psi$ with the mixture distribution, compared to using only the real data distribution, offers a target distribution that changes smoothly with the generator's updates, which turns out to be more beneficial. Formally, the cooperative training loss for the generator model is proposed as follows:
$$\mathcal{L}^{G}_{coop}(\theta) = \mathbb{E}_{x \sim p_d}\Big[\sum_{t=1}^{T} \mathrm{KL}\big(M_\psi(\cdot \mid x_{1:t-1}) \,\big\|\, G_\theta(\cdot \mid x_{1:t-1})\big)\Big], \tag{3}$$
where $x_t$ is the $t$-th token of the sequence $x$. Thus, the KL-loss distills the output distribution given by the language model into the generator [12, 28, 36]. When considering mode collapse, we are only interested in preserving the distribution of the real data from $p_d$, rather than that of samples from $p_g$. Therefore, when optimizing (3), we only adopt samples from the real data distribution to compute the KL-loss. With the above cooperative training loss, the gradient for updating the generator's parameters is derived as follows:
$$\nabla_\theta \mathcal{L}^{G}_{coop} = -\,\mathbb{E}_{x \sim p_d}\Big[\sum_{t=1}^{T} \sum_{i=1}^{V} M_\psi(i \mid x_{1:t-1})\, \nabla_\theta \log G_\theta(i \mid x_{1:t-1})\Big].$$
As such, the effect of applying cooperative training on the generator is equivalent to increasing the density of the real data in a weighted manner.
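The generator's cooperative loss, i.e., the per-step KL divergence from the language model's distribution to the generator's evaluated on real data only, could be sketched as follows (the model interfaces are hypothetical; both callables map right-shifted tokens to per-step logits):

```python
import torch
import torch.nn.functional as F

def cooperative_kl_loss(generator, lm, x_real):
    """Per-step KL(M || G) on real sequences, distilling the language
    model's output distribution into the generator (a sketch in the
    spirit of the cooperative training loss, not the authors' code).
    """
    inputs = x_real[:, :-1]
    with torch.no_grad():                   # the LM only provides targets
        lm_logp = F.log_softmax(lm(inputs), dim=-1)
    gen_logp = F.log_softmax(generator(inputs), dim=-1)
    # KL(p_lm || p_gen) summed over the vocabulary at each position
    kl = (lm_logp.exp() * (lm_logp - gen_logp)).sum(dim=-1)
    return kl.mean()
```

Because the expectation runs only over real sequences, minimizing this term raises the generator's density on real data, matching the weighted-density interpretation above.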
4.2 Meta Cooperative Optimization
In this section, we introduce a meta-learning paradigm to interleave the optimization of the adversarial training loss and the cooperative training loss for the generator model parameters. Unlike conventional meta-learning approaches that aim at faster learning, task generalization, or deriving adaptive models, our intuition is to preserve the generative distribution of the adversarial text generator model in order to decelerate its mode collapse.
To this end, optimizing the adversarial loss is modelled as the base task, and optimizing the cooperative training loss is modelled as the meta task. With this setting, the meta optimization scheme ensures that after the generator parameters are optimized with the adversarial training loss to increase generation quality, the resulting parameters demonstrate considerable resistance to mode collapse, i.e., generation quality increases while considerable generation diversity is preserved.
Formally, we first perform one gradient update on the generator parameters by optimizing the base task loss:
$$\theta' = \theta - \alpha \nabla_\theta \mathcal{L}^{G}_{adv}(\theta),$$
where $\alpha$ is the inner-loop learning rate. Then, we obtain new samples from the real data distribution, $x \sim p_d$, and evaluate the meta-loss $\mathcal{L}^{G}_{coop}(\theta')$ for the real samples on the updated parameters $\theta'$. The meta gradient is weighted by a coefficient $\lambda$ and added to the base task gradient to update the parameters $\theta$. Finally, the adversarial update under our proposed meta cooperative training paradigm can be formulated as below:
$$\theta \leftarrow \theta - \eta\,\big(\nabla_\theta \mathcal{L}^{G}_{adv}(\theta) + \lambda \nabla_\theta \mathcal{L}^{G}_{coop}(\theta')\big),$$
where $\eta$ is the learning rate for the adversarial update.
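This two-step meta update can be sketched on a toy flat parameter vector; `adv_loss_fn` and `coop_loss_fn` are hypothetical stand-ins for the adversarial and cooperative losses, and the step sizes are illustrative, not the paper's hyperparameters:

```python
import torch

def meta_cooperative_step(theta, adv_loss_fn, coop_loss_fn,
                          alpha=0.1, eta=0.1, lam=1.0):
    """One meta update on a flat parameter tensor `theta` (toy sketch).

    Base task: an inner gradient step on the adversarial loss. Meta task:
    the cooperative loss evaluated at the updated parameters; its gradient
    flows back through the inner step (second-order), so the parameters
    produced by the adversarial update are steered to resist mode collapse.
    """
    base_grad = torch.autograd.grad(adv_loss_fn(theta), theta,
                                    create_graph=True)[0]
    theta_prime = theta - alpha * base_grad         # base-task (inner) update
    meta_grad = torch.autograd.grad(coop_loss_fn(theta_prime), theta)[0]
    with torch.no_grad():
        theta = theta - eta * (base_grad + lam * meta_grad)
    return theta.requires_grad_(True)
```

The `create_graph=True` call is what makes the meta gradient second-order: differentiating the meta loss at $\theta'$ back to $\theta$ passes through the inner adversarial step.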
The full algorithm for meta cooperative training is presented in Algorithm 1.
We denote our proposed meta cooperative training generative adversarial network as Meta-CoTGAN. In the experiment section, we first compare our proposed algorithm with its closest cooperative training counterpart, CoT, on the synthetic dataset. Then we compare our method against several RL-based and RL-free approaches on two commonly used real-world text generation datasets: COCO Image Captions and EMNLP 2017 WMT News (http://statmt.org/wmt17/translation-task.html).
We implement our proposed algorithm on top of RelGAN, an RL-free adversarial text generation model that is among the state-of-the-art approaches. Specifically, RelGAN adopts a relational memory to model the long-distance dependencies among the input tokens, and a gumbel-softmax relaxation to overcome the non-differentiability issue in generator training. The relational memory adopts 1 memory slot and multi-head attention with 2 heads, and the attention key size is set to 512. The language model for cooperative training adopts the same network architecture as the generator, and the generator's parameter weights are assigned to the language model after pretraining. The discriminator adopts multiple embedded representations, each of size 64. We adopt Adam as the optimization algorithm for updating all model parameters. The source code of our framework is based on the PaddlePaddle (https://www.paddlepaddle.org.cn/) platform.
For comparison, we evaluate the models in terms of sample quality and sample diversity simultaneously. Following most of today's text generation works (e.g., [37, 23]), sample quality is evaluated by BLEU score metrics on the real datasets, and by the NLL_oracle loss on the synthetic dataset. The NLL_oracle loss is defined as the negative log likelihood assigned by the target LSTM model to the data generated by $G_\theta$. Sample diversity is evaluated in terms of the NLL_gen loss, which takes the following form:
$$\mathrm{NLL}_{gen} = -\,\mathbb{E}_{x \sim p_d}\big[\log G_\theta(x)\big],$$
where the density of the real data is evaluated on the generator model. Thus, models with better sample diversity have broader coverage of the real data space and attain a lower NLL_gen loss, whereas models that suffer from severe mode collapse no longer represent the real data well and attain a higher NLL_gen loss.
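The diversity metric described above can be computed directly from the generator's probabilities of the ground-truth tokens on held-out real sequences; the array interface below is an illustrative assumption:

```python
import numpy as np

def nll_gen(per_token_probs):
    """Diversity metric: average negative log-likelihood that the trained
    generator assigns to held-out real sequences.

    `per_token_probs[i, t]` is the generator's probability of the
    ground-truth token x_t of real sequence i, given its prefix. Lower is
    better: a collapsed generator puts little mass on most real sequences.
    """
    per_seq_nll = -np.log(per_token_probs).sum(axis=1)   # -log G_theta(x)
    return float(per_seq_nll.mean())
```

For example, a generator assigning uniform probability 0.5 to every token of a length-4 sequence scores $4 \log 2 \approx 2.77$, while one concentrating more mass on the real tokens scores lower.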
To evaluate the efficiency of our proposed approach, we consider MLE as well as the RL-based baselines SeqGAN, RankGAN and LeakGAN. We also compare with the most related RL-free baseline, RelGAN. During evaluation, we follow the temperature settings proposed in RelGAN and present the results for our method when evaluated with temperature values of 100 and 1000, respectively.
5.1 Synthetic Dataset
Our first evaluation domain is the synthetic oracle dataset, first proposed in SeqGAN. The experiment engages a randomly initialized LSTM model as the target model to simulate real-world sequences and generate data from the real data distribution. The synthetic experiments are conducted with the sequence length set to 20. The objective of experimenting in this domain is to compare our proposed method with its closest cooperative training counterpart, CoT. While the two models train the language model in the same way, we investigate the efficiency of the respective cooperative training losses each method applies to the generator model.
We show the learning curves for the NLL_oracle loss in Figure 2. Note that CoT takes no pretraining stage and its loss decreases progressively, whereas our method takes a pretraining stage and the loss decreases in both the pretraining stage and the adversarial training stage. Upon convergence, the NLL_oracle loss of our method is significantly lower than that of CoT, demonstrating that the cooperative training mechanism proposed by CoT is not comparable to our method in terms of sample quality. We also present the evaluation scores for NLL_oracle and NLL_gen in Table 1. On NLL_gen, our method achieves a much lower loss than CoT, which demonstrates that our algorithm conveys greater efficiency in preserving sample diversity. Overall, considering the inferior performance and long training time of CoT, we do not consider it further in the following real-world dataset experiments.
Table 2: Results on the COCO Image Captions dataset (sampling temperature in parentheses).

| Model | BLEU-2 | BLEU-3 | BLEU-4 | BLEU-5 | NLL_gen |
|---|---|---|---|---|---|
| RelGAN (100) | 0.849 ± 0.030 | 0.687 ± 0.047 | 0.502 ± 0.048 | 0.331 ± 0.044 | 0.756 ± 0.054 |
| RelGAN (1000) | 0.814 ± 0.012 | 0.634 ± 0.020 | 0.455 ± 0.023 | 0.303 ± 0.020 | 0.655 ± 0.048 |
| Meta-CoTGAN (100) | 0.858 ± 0.003 | 0.692 ± 0.005 | 0.518 ± 0.007 | 0.363 ± 0.009 | 0.578 ± 0.036 |
| Meta-CoTGAN (1000) | 0.842 ± 0.011 | 0.675 ± 0.019 | 0.502 ± 0.026 | 0.349 ± 0.024 | 0.583 ± 0.028 |

Table 3: Results on the EMNLP2017 WMT News dataset (sampling temperature in parentheses).

| Model | BLEU-2 | BLEU-3 | BLEU-4 | BLEU-5 | NLL_gen |
|---|---|---|---|---|---|
| RelGAN (100) | 0.881 ± 0.013 | 0.705 ± 0.019 | 0.501 ± 0.023 | 0.319 ± 0.018 | 2.482 ± 0.031 |
| RelGAN (1000) | 0.837 ± 0.012 | 0.654 ± 0.010 | 0.435 ± 0.011 | 0.265 ± 0.011 | 2.285 ± 0.025 |
| Meta-CoTGAN (100) | 0.882 ± 0.014 | 0.734 ± 0.017 | 0.542 ± 0.016 | 0.358 ± 0.015 | 2.299 ± 0.011 |
| Meta-CoTGAN (1000) | 0.868 ± 0.015 | 0.703 ± 0.014 | 0.500 ± 0.016 | 0.318 ± 0.016 | 2.205 ± 0.053 |
5.2 COCO Image Captions Dataset
Our second evaluation domain is the COCO Image Captions dataset. We follow the pre-processing method proposed in prior work. The training and testing sets each consist of 10,000 sentences. The sentences in COCO have a minimum length of 7 and a maximum length of 37. The vocabulary size is 4,682.
We present the BLEU-2 to BLEU-5 scores for measuring sample quality, and the NLL_gen score for measuring sample diversity, in Table 2. Overall, our method demonstrates a significant advantage on all sample quality/diversity metrics. Notably, our method attains an NLL_gen loss significantly lower than the other baseline approaches. This indicates that our method provides efficient control over the mode collapse of adversarial training and eventually leads to superior sample diversity. While decelerating mode collapse, the cooperative training also results in a model with better sample quality.
Table 4: Ablation results on the COCO Image Captions dataset (sampling temperature in parentheses).

| Model | BLEU-2 | BLEU-3 | BLEU-4 | BLEU-5 | NLL_gen |
|---|---|---|---|---|---|
| RelGAN (100) | 0.849 ± 0.030 | 0.687 ± 0.047 | 0.502 ± 0.048 | 0.331 ± 0.044 | 0.756 ± 0.054 |
| Meta-CoTGAN (100) | 0.858 ± 0.003 | 0.692 ± 0.005 | 0.518 ± 0.007 | 0.363 ± 0.009 | 0.578 ± 0.036 |
| Meta-CoTGAN (100) | 0.824 ± 0.011 | 0.647 ± 0.022 | 0.466 ± 0.028 | 0.315 ± 0.022 | 0.580 ± 0.031 |
| Meta-CoTGAN (100) | 0.835 ± 0.013 | 0.661 ± 0.016 | 0.487 ± 0.016 | 0.338 ± 0.014 | 0.587 ± 0.019 |
| RelGAN (1000) | 0.814 ± 0.023 | 0.634 ± 0.020 | 0.455 ± 0.023 | 0.303 ± 0.020 | 0.655 ± 0.048 |
| Meta-CoTGAN (1000) | 0.842 ± 0.011 | 0.675 ± 0.019 | 0.502 ± 0.026 | 0.349 ± 0.024 | 0.583 ± 0.028 |
| Meta-CoTGAN (1000) | 0.824 ± 0.007 | 0.643 ± 0.009 | 0.497 ± 0.013 | 0.324 ± 0.015 | 0.582 ± 0.017 |
| Meta-CoTGAN (1000) | 0.817 ± 0.021 | 0.638 ± 0.027 | 0.465 ± 0.025 | 0.319 ± 0.018 | 0.589 ± 0.022 |
To further validate this, Figure 3 presents the learning curves for the sample diversity metric NLL_gen and for BLEU-5 as a representative sample quality metric. We observe that the NLL_gen loss for RelGAN rises quickly, a sign of mode collapse, whereas that of Meta-CoTGAN increases rather slowly. This shows that our proposed method efficiently decelerates mode collapse and keeps the NLL_gen loss from exploding. For the sample quality metric, the BLEU-5 score of RelGAN initially rises faster than that of Meta-CoTGAN, but eventually our model achieves a significantly higher score than RelGAN. We also observe that when the NLL_gen loss of RelGAN explodes (e.g., after 400 epochs), the repetition rate is rather high and the generator becomes practically useless, whereas our method preserves much better diversity. Moreover, inspecting the generated sentences shows that our model can generate quite long sentences, where most GAN models fall short.
5.3 EMNLP2017 WMT News Dataset
Our third evaluation domain is the EMNLP2017 WMT News dataset. This dataset is much larger than COCO Image Captions, with a training set of 270,000 sentences and a testing set of 10,000 sentences. The sentences have a maximum length of 51, and the vocabulary size is 5,255.
The results for the EMNLP dataset are presented in Table 3. Our proposed method consistently outperforms all baselines on all BLEU metrics and on NLL_gen. Under the temperature setting of 100, our method outperforms the strong RelGAN baseline on BLEU-4/BLEU-5. Noticeably, the best BLEU scores for our method are obtained while the NLL_gen loss stays at a significantly lower level than RelGAN's. This indicates that by conducting cooperative training, we can derive a generator model with better sample quality and sample diversity simultaneously, and that our method performs robustly on challenging and diverse real-world datasets such as EMNLP. The performance of our method is also quite robust across settings, consistently outperforming RelGAN under both temperature settings on all evaluation metrics. Inspecting the generated samples, we observe that the generated sentences convey rather diverse semantics and include considerably long sentences, unlike conventional adversarial text generators, which soon fall into the phase of generating short and repeated sentences.
5.4 Ablation Study
5.4.1 Impact of Cooperative Training Language Model
We demonstrate the impact of using an online-updated language model in our proposed cooperative training process. A direct comparison is to use a pretrained language model that is not updated by cooperative training; we denote this baseline as Meta-CoTGAN with a fixed language model. We show the results on the COCO Image Captions dataset in Table 4. We observe that when the online update of the language model is turned off, the model still preserves comparable sample diversity in terms of NLL_gen, since the cooperative training loss is still employed on the real data. However, under both temperature settings, the sample quality metrics fall short of the full version of the proposed method. This shows that it is beneficial to update the language model jointly with the generator so that it offers a smoothly changing target distribution to the generator.
5.4.2 Impact of Meta Optimization
We also evaluate the impact of the meta optimization setup. To this end, we compare our approach with a principled alternative for engaging the cooperative training loss in optimizing the generator parameters, namely a weighted linear combination of the adversarial loss and the cooperative training loss, i.e., $\mathcal{L}^{G}_{adv} + \lambda \mathcal{L}^{G}_{coop}$. We denote this baseline as Meta-CoTGAN with linear loss combination. The results are shown in Table 4. Overall, this baseline obtains comparable NLL_gen scores. However, its performance on the sample quality metrics is still much inferior to the full solution. Thus, we conclude that meta optimization is an important ingredient for balancing the quality-diversity trade-off. Intuitively, our meta optimization set-up offers an efficient way to ensure that the generator parameters after the adversarial update resist mode collapse, which is critical for deriving the superior performance.
6 Conclusion and Discussion
We propose a meta cooperative training approach to facilitate the training of adversarial text generation models. Our method utilizes a cooperatively trained language model to effectively decelerate the mode collapse of adversarial training by distilling the language model's prediction output distribution over the real data into the adversarial generator model. We evaluate our proposed method on a synthetic dataset and two real-world datasets with sequence lengths ranging from 7 to 51. Our method consistently outperforms the baseline algorithms on the sample quality metrics and the sample diversity metric simultaneously. Our approach is general and is promising to combine with different RL-based or RL-free adversarial text generation algorithms, as long as they face the issue of mode collapse. Our future work is to apply meta cooperative training to more emerging RL-based and RL-free GAN models.
- (2018) Continuous adaptation via meta-learning in nonstationary and competitive environments. In ICLR.
- (2017) Wasserstein generative adversarial networks. In ICML, pp. 214–223.
- (2015) Neural machine translation by jointly learning to align and translate. In ICLR.
- (2018) Language GANs falling short. arXiv preprint arXiv:1811.02549.
- (2017) Maximum-likelihood augmented discrete generative adversarial networks. arXiv preprint arXiv:1702.07983.
- (2018) Adversarial text generation via feature-mover's distance. In NeurIPS, pp. 4666–4677.
- (2015) Microsoft COCO captions: data collection and evaluation server. arXiv preprint arXiv:1504.00325.
- (2018) MaskGAN: better text generation via filling in the _. In ICLR.
- (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pp. 1126–1135.
- (2014) Generative adversarial nets. In NIPS, pp. 2672–2680.
- (2018) Long text generation via adversarial training with leaked information. In AAAI, pp. 5141–5148.
- (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.
- (2017) Categorical reparameterization with gumbel-softmax. In ICLR.
- (2015) Adam: a method for stochastic optimization. In ICLR.
- (2016) GANs for sequences of discrete elements with the gumbel-softmax distribution. arXiv preprint arXiv:1611.04051.
- (2016) Professor forcing: a new algorithm for training recurrent networks. In NIPS, pp. 4601–4609.
- (2018) Learning to generalize: meta-learning for domain generalization. In AAAI, pp. 3490–3497.
- (2016) Deep reinforcement learning for dialogue generation. In EMNLP, pp. 1192–1202.
- (2017) Deep recurrent generative decoder for abstractive text summarization. In EMNLP, pp. 2091–2100.
- (2019) Multi-agent discussion mechanism for natural language generation. In AAAI, pp. 6096–6103.
- (2017) Adversarial ranking for language generation. In NIPS, pp. 3155–3165.
- Query-oriented multi-document summarization via unsupervised deep learning. In AAAI.
- (2019) CoT: cooperative training for generative modeling of discrete data. In ICML, pp. 4164–4172.
- (2018) Neural text generation: past, present and beyond. arXiv preprint arXiv:1803.07133.
- (2019) RelGAN: relational generative adversarial networks for text generation. In ICLR.
- (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
- (2017) Adversarial generation of natural language. arXiv preprint arXiv:1705.10929.
- (2016) Policy distillation. In ICLR.
- (2018) Logician and orator: learning from the duality between language and knowledge in open domain. In EMNLP, pp. 2119–2130.
- (2014) Sequence to sequence learning with neural networks. In NIPS, pp. 3104–3112.
- (2000) Policy gradient methods for reinforcement learning with function approximation. In NIPS, pp. 1057–1063.
- (2016) Instance normalization: the missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.
- (2015) Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In EMNLP, pp. 1711–1721.
- (1992) Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8(3–4), pp. 229–256.
- (2018) DP-GAN: diversity-promoting generative adversarial network for generating informative and diversified text. arXiv preprint arXiv:1802.01345.
- (2017) Knowledge transfer for deep reinforcement learning with hierarchical experience replay. In AAAI, pp. 1640–1646.
- (2017) SeqGAN: sequence generative adversarial nets with policy gradient. In AAAI, pp. 2852–2858.
- (2017) Adversarial feature matching for text generation. In ICML, pp. 4006–4015.
- (2018) Texygen: a benchmarking platform for text generation models. In SIGIR, pp. 1097–1100.