Creative GANs for generating poems, lyrics, and metaphors

Generative models for text have substantially contributed to tasks like machine translation and language modeling, using maximum likelihood optimization (MLE). However, for creative text generation, where multiple outputs are possible and originality and uniqueness are encouraged, MLE falls short. Methods optimized for MLE lead to outputs that can be generic, repetitive and incoherent. In this work, we use a Generative Adversarial Network framework to alleviate this problem. We evaluate our framework on poetry, lyrics and metaphor datasets, each with widely different characteristics, and report better performance of our objective function over other generative models.



1 Introduction and related work

Language models can be optimized to recognize syntax and semantics with great accuracy Radford et al. (2019). However, the generated output can be repetitive and generic, leading to monotonous or uninteresting responses (e.g., “I don’t know”) regardless of the input Li et al. (2015). While attention Bahdanau et al. (2014); Vaswani et al. (2017) and advanced decoding mechanisms like beam search and variation sampling Holtzman et al. (2019) have shown improvements, they do not solve the underlying problem. In creative text generation, the objective is not strongly bound to the ground truth; instead, the objective is to generate diverse, unique or original samples. We attempt to do this through a discriminator that gives feedback to the generative model through a cost function that encourages sampling of creative tokens. The contribution of this paper is the use of a GAN framework to generate creative pieces of writing. Our experiments suggest that generative text models, while very good at encapsulating semantic, syntactic and domain information, perform better with external feedback from a discriminator when fine-tuned for objectiveless decoding tasks such as creative text generation. We show this by evaluating our model on three very different creative datasets containing poetry, metaphors and lyrics.

Previous work on handling the shortcomings of MLE includes length-normalizing sentence probability Wu et al. (2016), future cost estimation Schmaltz et al. (2016), diversity-boosting objective functions Shao et al. (2017); Li et al. (2015), and penalizing repeated tokens Paulus et al. (2017). For poetry generation with generative text models, Zhang and Lapata (2014), Yi et al. (2016) and Wang et al. (2016) use language modeling to generate Chinese poems. However, none of these methods provide feedback on the quality of the generated sample, and hence they do not address the qualitative objective required for creative decoding. For the task of text generation, MaskGAN Fedus et al. (2018) uses a reinforcement learning signal from the discriminator, while FMD-GAN Chen et al. (2018) uses an optimal transport mechanism as its objective function. GumbelGAN Jang et al. (2016) uses the Gumbel-Softmax distribution, which replaces the non-differentiable sample from a categorical distribution with a differentiable sample to propagate stronger gradients. Li et al. (2015) use a discriminator for a diversity-promoting objective. Yu et al. (2017) use SeqGAN to generate poetry and comment on the performance of SeqGAN over MLE in human evaluations, encouraging our study of GANs for creative text generation. However, these studies do not focus solely on creative text.
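
The Gumbel-Softmax relaxation mentioned above can be illustrated with a minimal sketch. It is not the GumbelGAN implementation used in our experiments; the function name, the temperature value and the toy distribution are assumptions chosen for illustration. Each relaxed sample is a softmax over log-probabilities perturbed by Gumbel(0, 1) noise and divided by a temperature.

```python
import math
import random
from typing import List


def gumbel_softmax_sample(probs: List[float], temperature: float,
                          rng: random.Random) -> List[float]:
    """Draw one relaxed (continuous) sample from a categorical distribution.

    probs must contain strictly positive probabilities. The function name
    and signature are hypothetical, chosen only for this sketch.
    """
    perturbed = []
    for p in probs:
        # Gumbel(0, 1) noise: g = -log(-log(u)), with u ~ Uniform(0, 1).
        u = max(rng.random(), 1e-12)  # guard against log(0)
        g = -math.log(-math.log(u))
        perturbed.append((math.log(p) + g) / temperature)
    # Numerically stable softmax over the perturbed log-probabilities.
    top = max(perturbed)
    exps = [math.exp(x - top) for x in perturbed]
    total = sum(exps)
    return [e / total for e in exps]


# Toy categorical distribution over four tokens (illustrative values).
relaxed = gumbel_softmax_sample([0.1, 0.2, 0.3, 0.4], temperature=1.0,
                                rng=random.Random(0))
```

As the temperature approaches zero the relaxed sample approaches a one-hot vector, while larger temperatures give smoother vectors.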

2 GANs for creative text generation

Using GANs, we can train generative models in a two-player game setting between a discriminator and a generator, where the discriminator (a binary classifier) learns to distinguish between real and fake data samples and the generator tries to fool the discriminator by generating authentic and high-quality output Goodfellow et al. (2014). GANs have been shown to be successful in image generation tasks Denton et al. (2015), and recently some progress has been observed in text generation Chen et al. (2018); Fedus et al. (2018); Yu et al. (2017). Our generator is a language model trained using backpropagation through time Mozer (1995). During the pre-training phase we optimize for MLE, and during the GAN training phase we optimize on the creativity reward from the discriminator. The discriminator’s encoder has the same architecture as the generator’s encoder module, with the addition of a pooled decoder layer. The decoder contains blocks and an additional layer. The discriminator decoder takes the hidden state at the last time step of a sequence, concatenated with both the max-pooled and mean-pooled representations of the hidden states Howard and Ruder (2018), and outputs a scalar score in a bounded range. The difficulty of using GANs in text generation comes from the discrete nature of text, which makes the sampling step non-differentiable; hence, we update the parameters of the generator with policy gradients as described in Yu et al. (2017).
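
The pooled input of the discriminator decoder described above can be sketched as follows. This is an illustrative sketch only: the function name and the toy hidden states are assumptions, and the decoder layers that map these features to a score are omitted.

```python
from typing import List


def concat_pool(hidden_states: List[List[float]]) -> List[float]:
    """Concatenate the last hidden state with max- and mean-pooled states.

    hidden_states is a (time steps x hidden size) list of per-step vectors.
    The name concat_pool is hypothetical, used only for this sketch.
    """
    if not hidden_states:
        raise ValueError("need at least one time step")
    steps = len(hidden_states)
    last = list(hidden_states[-1])
    # zip(*hidden_states) iterates over hidden dimensions, so each pooled
    # value summarizes one dimension across all time steps.
    max_pooled = [max(dim) for dim in zip(*hidden_states)]
    mean_pooled = [sum(dim) / steps for dim in zip(*hidden_states)]
    return last + max_pooled + mean_pooled


# Toy sequence: 3 time steps, hidden size 2 (values are illustrative only).
toy_states = [[1.0, -2.0], [3.0, 0.0], [2.0, 4.0]]
features = concat_pool(toy_states)
```

The resulting feature vector is three times the hidden size, regardless of sequence length.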

We utilize AWD-LSTM Merity et al. (2017) and Transformer-XL Dai et al. (2019) based language models. For model hyperparameters, please refer to Supplementary Section Table 2. We use the Adam optimizer Kingma and Ba (2014) with settings similar to Howard and Ruder (2018) and a batch size of 50. Other practices for LM training were the same as Dai et al. (2019) and Merity et al. (2017) for Transformer-XL and AWD-LSTM respectively. We refer to our proposed GAN as Creative-GAN and compare it to a baseline (a language model equivalent to our pre-trained generator) and a GumbelGAN model Jang et al. (2016) across all proposed datasets. We use three creative English datasets with distinct linguistic characteristics: (1) a corpus of classical and contemporary English poems, (2) a corpus of metaphor sentences retrieved from a metaphor database website and (3) a corpus of song lyrics ranging across genres. The mix of linguistic styles within this corpus offers the potential for interesting variation during the generation phase. We use the same pre-processing as in earlier work Howard and Ruder (2018); Howard and others (2018). We reserve 10% of our data for the test set and another 10% for the validation set.
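
The 80/10/10 split described above could be produced along the following lines. This is a sketch under stated assumptions: the text does not say whether the data were shuffled or how the split was seeded, so the shuffling, the seed and the function name are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_corpus(lines: Sequence[str], test_frac: float = 0.10,
                 valid_frac: float = 0.10,
                 seed: int = 0) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, valid, test) partitions of a corpus.

    Shuffling with a fixed seed is an assumption made for this sketch.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_valid = round(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


# Toy corpus of 100 placeholder lines.
toy_corpus = [f"line {i}" for i in range(100)]
train, valid, test = split_corpus(toy_corpus)
```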

We first pre-train our generator on the Gutenberg dataset Lahiri (2014) for the number of epochs listed in Supplementary Section Table 3 and then fine-tune it Howard and Ruder (2018) on each of our target datasets with a language modeling objective. The discriminator’s encoder is initialized to the same weights as our fine-tuned language model. Once we have fine-tuned encoders for each target dataset, we train in an adversarial manner. The discriminator’s objective here is to score the quality of the creative text. The discriminator is trained for a fixed number of iterations for every iteration of the generator, a practice seen in previous work Arjovsky et al. (2017). Creative-GAN uses the reward from the discriminator Fedus et al. (2018); Yu et al. (2017) for backpropagation. We follow a similar training procedure for GumbelGAN. For all methods, outputs are generated by sampling from a multinomial distribution rather than by picking the most likely token under the log-likelihood probabilities, as sampling has been shown to produce better output quality Holtzman et al. (2019). Please refer to Supplementary Section Table 3 for the training parameters of each dataset and Table 2 for the hyperparameters of each encoder. We picked these values after experimentation with our validation set. Training and output generation code can be found online.
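
To make the interaction between multinomial sampling and the discriminator reward concrete, the following is a minimal REINFORCE-style sketch on a single-step toy policy. It is not our training code: the reward function stands in for the discriminator, and the learning rate, vocabulary size and all names are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List


def softmax(logits: List[float]) -> List[float]:
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits: List[float], reward_fn: Callable[[int], float],
                   rng: random.Random, lr: float = 0.5) -> List[float]:
    """One policy-gradient update of the logits of a categorical policy."""
    probs = softmax(logits)
    # Multinomial sampling instead of taking the arg-max token.
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    reward = reward_fn(token)
    # For a softmax policy, d log p(token) / d logit_i = 1[i == token] - p_i.
    return [
        logit + lr * reward * ((1.0 if i == token else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def toy_reward(token: int) -> float:
    # Hypothetical stand-in for a discriminator score: token 2 is "creative".
    return 1.0 if token == 2 else 0.0


rng = random.Random(0)
logits = [0.0, 0.0, 0.0, 0.0]  # uniform policy over a 4-token vocabulary
for _ in range(200):
    logits = reinforce_step(logits, toy_reward, rng)
final_probs = softmax(logits)
```

Because the update only needs the sampled token and its reward, no gradient has to flow through the discrete sampling step itself.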

3 Evaluation and conclusion

Evaluating creative generation tasks is both critical and complex Potash et al. (2018). Along the lines of previous research on evaluating text generation tasks Potash et al. (2018), we report the perplexity scores of the evaluated models on our test set in the Supplementary Section, Table 1. Our model shows improvements over the baseline and GumbelGAN. Common computational metrics like BLEU Papineni et al. (2002) and perplexity are at best heuristics and not strong indicators of good performance in text generation models Theis et al. (2016). In particular, since these scores use target sequences as a reference, they have the same pitfalls as relying on MLE. The advantage of this approach lies in the discriminator’s ability to influence the generator to explore other possibilities. Sample outputs for our model can be found on our website.
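
For reference, perplexity as reported in Table 1 is the exponential of the average negative log-likelihood per token. The sketch below assumes natural-log token probabilities are already available from a model; the function name and the toy values are illustrative assumptions.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from natural-log probabilities of the reference tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigns probability 0.25 to every reference token is as
# uncertain as a uniform choice among four tokens.
toy_ppl = perplexity([math.log(0.25)] * 5)
```

Since the score depends only on the probability assigned to the reference tokens, a model can generate diverse and plausible text and still receive a poor perplexity.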


  • [1] M. Arjovsky, S. Chintala, and L. Bottou (2017-01) Wasserstein GAN. arXiv e-prints, pp. arXiv:1701.07875. External Links: 1701.07875 Cited by: §2.
  • [2] D. Bahdanau, K. Cho, and Y. Bengio (2014) Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations, External Links: Document, 1409.0473, Link Cited by: §1.
  • [3] L. Chen, S. Dai, C. Tao, D. Shen, Z. Gan, H. Zhang, Y. Zhang, and L. Carin (2018) Adversarial Text Generation via Feature-Mover’s Distance. In 32nd Conference on Neural Information Processing Systems, External Links: 1809.06297, Link Cited by: §1, §2.
  • [4] Z. Dai, Z. Yang, Y. Yang, W. W. Cohen, J. Carbonell, Q. V. Le, and R. Salakhutdinov (2019) Transformer-xl: attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860. Cited by: §2.
  • [5] E. Denton, S. Chintala, A. Szlam, and R. Fergus (2015) Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks. In Neural Information Processing Systems, pp. 1486–1494. External Links: Document, 1506.05751, ISBN 1505.05770, ISSN 10495258, Link Cited by: §2.
  • [6] W. Fedus, I. Goodfellow, and A. M. Dai (2018) MaskGAN: Better Text Generation via Filling in the______. In International Conference on Learning Representations, External Links: 1801.07736, ISBN 9085591708, Link Cited by: §1, §2, §2.
  • [7] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative Adversarial Nets. In Advances in neural information processing systems, pp. 2672–2680. External Links: 1406.2661v1, Link Cited by: §2.
  • [8] A. Holtzman, J. Buys, M. Forbes, and Y. Choi (2019) The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751. Cited by: §1, §2.
  • [9] J. Howard et al. (2018) Fastai. GitHub. Cited by: §2.
  • [10] J. Howard and S. Ruder (2018) Fine-tuned language models for text classification. CoRR abs/1801.06146. External Links: Link, 1801.06146 Cited by: §2, §2, §2.
  • [11] E. Jang, S. Gu, and B. Poole (2016) Categorical reparameterization with gumbel-softmax. arXiv preprint arXiv:1611.01144. Cited by: §1, §2.
  • [12] D. P. Kingma and J. L. Ba (2014) Adam: a method for stochastic optimization. In Proc. 3rd Int. Conf. Learn. Representations, Cited by: §2.
  • [13] S. Lahiri (2014-04) Complexity of Word Collocation Networks: A Preliminary Structural Analysis. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, pp. 96–105. External Links: Link Cited by: §2.
  • [14] J. Li, M. Galley, C. Brockett, J. Gao, and B. Dolan (2015) A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055. Cited by: §1, §1.
  • [15] S. Merity, N. S. Keskar, and R. Socher (2017) Regularizing and optimizing lstm language models. arXiv preprint arXiv:1708.02182. Cited by: §2.
  • [16] M. C. Mozer (1995) A focused backpropagation algorithm for temporal. Backpropagation: Theory, architectures, and applications 137. Cited by: §2, Table 2.
  • [17] K. Papineni, S. Roukos, T. Ward, and W. Zhu (2002) BLEU : a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pp. 311–318. External Links: Link Cited by: §3.
  • [18] R. Paulus, C. Xiong, and R. Socher (2017) A deep reinforced model for abstractive summarization. CoRR abs/1705.04304. External Links: Link, 1705.04304 Cited by: §1.
  • [19] P. Potash, A. Romanov, and A. Rumshisky (2018) Evaluating creative language generation: the case of rap lyric ghostwriting. In Proceedings of the Second Workshop on Stylistic Variation, pp. 29–38. Cited by: §3.
  • [20] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever (2019) Language models are unsupervised multitask learners. OpenAI Blog 1 (8). Cited by: §1.
  • [21] A. Schmaltz, A. M. Rush, and S. M. Shieber (2016) Word Ordering Without Syntax. In Empirical Methods in Natural Language Processing, pp. 2319–2324. External Links: Document, 1604.08633, ISBN 10.1890/07-1864.1, ISSN 0012-9658, Link Cited by: §1.
  • [22] L. Shao, S. Gouws, D. Britz, A. Goldie, B. Strope, and R. Kurzweil (2017) Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models. In Empirical Methods in Natural Language Processing, pp. 2210–2219. External Links: Document, 1701.03185, ISBN 0000344001, ISSN 0148396X, Link Cited by: §1.
  • [23] L. Theis, A. van den Oord, and M. Bethge (2016) A note on the evaluation of generative models. In International Conference on Learning Representations, External Links: arXiv:1511.01844v3, Link Cited by: §3.
  • [24] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008. Cited by: §1.
  • [25] Q. Wang, T. Luo, D. Wang, and C. Xing (2016) Chinese song iambics generation with neural attention-based model. CoRR abs/1604.06274. External Links: Link, 1604.06274 Cited by: §1.
  • [26] Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. Johnson, X. Liu, L. Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, W. Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, O. Vinyals, G. Corrado, M. Hughes, and J. Dean (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. CoRR abs/1609.08144. External Links: Link, 1609.08144 Cited by: §1.
  • [27] X. Yi, R. Li, and M. Sun (2016) Generating chinese classical poems with RNN encoder-decoder. CoRR abs/1604.01537. External Links: Link, 1604.01537 Cited by: §1.
  • [28] L. Yu, W. Zhang, J. Wang, and Y. Yu (2017) SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient. In Association for the Advancement of Artificial Intelligence, pp. 2852–2858. External Links: Document, 1609.05473, ISBN 1581138285, ISSN 21686106, Link Cited by: §1, §2, §2.
  • [29] X. Zhang and M. Lapata (2014) Chinese Poetry Generation with Recurrent Neural Networks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 670–680. External Links: Document, arXiv:1303.5778v1, ISBN 9781937284961, ISSN 1000-9825, Link Cited by: §1.

4 Supplementary Material

In this section, we report our results on computational metrics, hyperparameters and training configurations for our models. Table 1 shows the perplexity scores of the evaluated models, Table 2 shows the hyperparameters for each encoding method and Table 3 shows our training parameters. In Table 3, the values for the Gutenberg dataset in the GumbelGAN and Creative-GAN columns are empty, as we only pre-train our LMs on the Gutenberg dataset.

                AWD-LSTM                       Transformer-XL
Model           Poetry   Metaphor  Lyrics      Poetry   Metaphor  Lyrics
LM              50.73    63.59     20.08       47.46    62.76*    16.11
GumbelGAN       55.03    68.72     22.19       46.27    63.43     12.58
Creative-GAN    49.40*   51.84*    17.11*      42.45*   65.35     9.02*
Table 1: Perplexity scores; * denotes best performance
Model           W. Emb. Size  Layers  Hidden  Backprop through time [16]
AWD-LSTM        400           3       1150    70
Transformer-XL  410           12      2100    150
Table 2: Encoder Hyperparameters
            LM              GumbelGAN       Creative-GAN
Dataset     Epochs  LR      Epochs  LR      Epochs  LR
Poems       8               10              10
Metaphors   8               10              10
Lyrics      15              12              12
Gutenberg   20
Table 3: Training Parameters