1 Introduction and related work
Language models can be optimized to recognize syntax and semantics with great accuracy (Radford et al., 2019). However, the generated output can be repetitive and generic, leading to monotonous or uninteresting responses (e.g., "I don't know") regardless of the input (Li et al., 2015). While attention (Bahdanau et al., 2014; Vaswani et al., 2017) and advanced decoding mechanisms such as beam search and variation sampling (Holtzman et al., 2019) have shown improvements, they do not solve the underlying problem. In creative text generation, the objective is not strongly bound to the ground truth; instead, the objective is to generate diverse, unique, or original samples. We attempt to achieve this through a discriminator that gives feedback to the generative model via a cost function encouraging the sampling of creative tokens. The contribution of this paper is the use of a GAN framework to generate creative pieces of writing. Our experiments suggest that generative text models, while very good at encapsulating semantic, syntactic, and domain information, perform better with external feedback from a discriminator when fine-tuned for objectiveless decoding tasks such as creative text. We show this by evaluating our model on three very different creative datasets containing poetry, metaphors, and lyrics.
Previous work on handling the shortcomings of MLE includes length-normalizing sentence probability (Wu et al., 2016), future cost estimation (Schmaltz et al., 2016), diversity-boosting objective functions (Shao et al., 2017; Li et al., 2015), and penalizing repeated tokens (Paulus et al., 2017). For poetry generation with generative text models, Zhang and Lapata (2014), Yi et al. (2016), and Wang et al. (2016) use language modeling to generate Chinese poems. However, none of these methods provides feedback on the quality of the generated sample, and hence they do not address the qualitative objective required for creative decoding. For the task of text generation, MaskGAN (Fedus et al., 2018) uses a reinforcement learning signal from the discriminator, while FMD-GAN (Chen et al., 2018) uses an optimal transport mechanism as an objective function. GumbelGAN (Jang et al., 2016) uses the Gumbel-Softmax distribution, which replaces the non-differentiable sample from a categorical distribution with a differentiable sample in order to propagate stronger gradients. Li et al. (2015) use a discriminator for a diversity-promoting objective. Yu et al. (2017) use SeqGAN to generate poetry and report that SeqGAN outperforms MLE in human evaluations, encouraging our study of GANs for creative text generation. However, these studies do not focus solely on creative text.
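The Gumbel-Softmax trick mentioned above can be illustrated with a minimal NumPy sketch (function and variable names are ours, not from GumbelGAN's implementation): Gumbel(0, 1) noise is added to the logits before a temperature-scaled softmax, yielding a "soft" differentiable sample that approaches a one-hot vector as the temperature decreases.

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, rng=None):
    """Draw a differentiable 'soft' sample from a categorical distribution
    parameterized by `logits`, as in Jang et al. (2016).

    tau: temperature; lower values make the sample closer to one-hot.
    """
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    gumbel_noise = -np.log(-np.log(u))   # Gumbel(0, 1) samples
    y = (logits + gumbel_noise) / tau
    y = y - y.max()                      # shift for numerical stability
    exp_y = np.exp(y)
    return exp_y / exp_y.sum()           # soft one-hot over the vocabulary

# A relaxed sample over a 3-token vocabulary.
probs = gumbel_softmax_sample(np.array([2.0, 1.0, 0.1]), tau=0.5)
```

In a real model, the soft sample multiplies the embedding matrix so gradients flow back through the sampling step.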
2 GANs for creative text generation
Using GANs, we can train generative models in a two-player game setting between a discriminator and a generator, where the discriminator (a binary classifier) learns to distinguish between real and fake data samples and the generator tries to fool the discriminator by generating authentic, high-quality output (Goodfellow et al., 2014). GANs have been shown to be successful in image generation tasks (Denton et al., 2015), and recently some progress has been observed in text generation (Chen et al., 2018; Fedus et al., 2018; Yu et al., 2017).
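The two-player objective can be written as a pair of binary cross-entropy losses. The sketch below (illustrative only; it is not the paper's training code) shows the standard discriminator loss and the non-saturating generator loss from Goodfellow et al. (2014), operating on discriminator scores in (0, 1):

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss for the discriminator: push scores for
    real samples toward 1 and scores for generated samples toward 0."""
    return float(-(np.log(d_real) + np.log(1.0 - d_fake)).mean())

def generator_loss(d_fake):
    """Non-saturating generator loss: maximize log D(G(z))."""
    return float(-np.log(d_fake).mean())

# Toy scores: the discriminator rates two real and two generated samples.
d_loss = discriminator_loss(np.array([0.9, 0.8]), np.array([0.2, 0.1]))
g_loss = generator_loss(np.array([0.2, 0.1]))
```

As the discriminator's scores on generated samples rise, the generator loss falls, which is the signal our method repurposes as a creativity reward.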
Our generator is a language model trained using backpropagation through time (Mozer, 1995). During the pre-training phase we optimize for MLE, and during the GAN training phase we optimize on the creativity reward from the discriminator. The discriminator's encoder has the same architecture as the generator's encoder module, with the addition of a pooled decoder layer. The decoder consists of several blocks and an additional output layer. The discriminator's decoder takes the hidden state at the last time step of a sequence, concatenated with both the max-pooled and mean-pooled representations of the hidden states (Howard and Ruder, 2018), and outputs a score in the range [0, 1]. The difficulty of using GANs in text generation comes from the discrete nature of text, which makes the model non-differentiable; hence, we update the generator's parameters with policy gradients as described in Yu et al. (2017).
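The concat-pooling step fed to the discriminator's decoder can be sketched as follows (a minimal NumPy sketch in the style of Howard and Ruder (2018); names are ours, not the paper's):

```python
import numpy as np

def concat_pooling(hidden_states):
    """hidden_states: (seq_len, hidden_dim) array of encoder outputs.

    Returns the concatenation of the last time step, the max-pooled, and
    the mean-pooled hidden states: a (3 * hidden_dim,) feature vector.
    """
    h_last = hidden_states[-1]           # hidden state at the final step
    h_max = hidden_states.max(axis=0)    # element-wise max over time
    h_mean = hidden_states.mean(axis=0)  # element-wise mean over time
    return np.concatenate([h_last, h_max, h_mean])

# Toy encoder output: 10 time steps with a 64-dimensional hidden state.
H = np.random.default_rng(0).standard_normal((10, 64))
features = concat_pooling(H)  # shape (192,), input to the decoder head
```

The pooled vector then passes through the decoder layers to produce the scalar score described above.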
We experiment with Transformer-XL- and AWD-LSTM-based language models. For model hyperparameters, please refer to Supplementary Section, Table 2. We use the Adam optimizer (Kingma and Ba, 2014) with hyperparameters similar to Howard and Ruder (2018) and a batch size of 50. Other practices for language-model training follow Dai et al. (2019) and Merity et al. (2017) for Transformer-XL and AWD-LSTM, respectively. We refer to our proposed GAN as Creative-GAN and compare it to a baseline (a language model equivalent to our pre-trained generator) and a GumbelGAN model (Jang et al., 2016) across all proposed datasets. We use three creative English datasets with distinct linguistic characteristics: (1) a corpus of classical and contemporary English poems, (2) a corpus of metaphor sentences retrieved from a metaphor database website (http://metaphors.iath.virginia.edu/), and (3) a corpus of song lyrics ranging across genres. The mix of linguistic styles within this corpus offers the potential for interesting variation during the generation phase. We use the same pre-processing as in earlier work (Howard and Ruder, 2018; Howard et al., 2018). We reserve 10% of our data for the test set and another 10% for the validation set.
We first pre-train our generator on the Gutenberg dataset (Lahiri, 2014) and then fine-tune it (Howard and Ruder, 2018) on our target datasets with a language modeling objective. The discriminator's encoder is initialized with the same weights as our fine-tuned language model. Once we have fine-tuned encoders for each target dataset, we train in an adversarial manner. The discriminator's objective here is to score the quality of the creative text. The discriminator is trained for several iterations for every iteration of the generator, a practice seen in previous work (Arjovsky et al., 2017). Creative-GAN relies on the reward from the discriminator (Fedus et al., 2018; Yu et al., 2017) for backpropagation. We follow a similar training procedure for GumbelGAN. For all methods, outputs are generated by sampling from a multinomial distribution rather than decoding greedily from the log-likelihood probabilities, as sampling has been shown to produce better output quality (Holtzman et al., 2019). Please refer to Supplementary Section, Table 3 for the training parameters of each dataset and Table 2 for the hyperparameters of each encoder. We picked these values after experimentation on our validation set. Training and output generation code can be found online at https://github.com/Machine-Learning-Tokyo/Poetry-GAN.
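The multinomial decoding step can be sketched as below (an illustrative NumPy sketch, not the released training code; names and the temperature parameter are ours): instead of taking the argmax token, we sample the next token id from the softmax distribution over the vocabulary.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a token id from the temperature-scaled softmax over `logits`
    rather than taking the argmax, which tends to produce repetitive text."""
    rng = rng or np.random.default_rng()
    z = logits / temperature
    z = z - z.max()                      # numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
    return int(rng.choice(len(probs), p=probs))

# Toy 3-token vocabulary; logits strongly favor token 0, but any token
# with non-zero probability can still be drawn.
rng = np.random.default_rng(0)
token = sample_next_token(np.array([5.0, 1.0, 0.0]), rng=rng)
```

Lower temperatures sharpen the distribution toward greedy decoding; higher temperatures flatten it and increase diversity.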
3 Evaluation and conclusion
Evaluating creative generation tasks is both critical and complex (Potash et al., 2018). Following previous research on evaluating text generation tasks (Potash et al., 2018), we report the perplexity scores of our test set on the evaluated models in the Supplementary Section, Table 1. Our model shows improvements over the baseline and GumbelGAN. Common computational metrics such as BLEU (Papineni et al., 2002) and perplexity are at best heuristics and not strong indicators of good performance in text generation models (Theis et al., 2016). In particular, since these scores use target sequences as a reference, they share the same pitfalls as relying on MLE. The advantage of our approach lies in the discriminator's ability to influence the generator to explore other possibilities. Sample outputs for our model can be found on our website (https://www.ai-fragments.com).
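As a reminder of what the reported metric measures, perplexity is the exponential of the mean negative log-likelihood the model assigns to the target tokens (a minimal sketch; the helper name is ours):

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity of a sequence given the model's probability for each
    target token: exp(mean negative log-likelihood)."""
    nll = -np.log(np.array(token_probs, dtype=float))
    return float(np.exp(nll.mean()))

# A model assigning probability 0.25 to every target token is as
# "confused" as a uniform choice among 4 options.
ppl = perplexity([0.25, 0.25, 0.25])
```

This dependence on the ground-truth targets is exactly why perplexity inherits the pitfalls of MLE noted above.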
References

- Arjovsky et al. (2017). Wasserstein GAN. arXiv preprint arXiv:1701.07875.
- Bahdanau et al. (2014). Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations.
- Chen et al. (2018). Adversarial text generation via Feature-Mover's Distance. In 32nd Conference on Neural Information Processing Systems.
- Dai et al. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860.
- Denton et al. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. In Neural Information Processing Systems, pp. 1486–1494.
- Fedus et al. (2018). MaskGAN: Better text generation via filling in the ______. In International Conference on Learning Representations.
- Goodfellow et al. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680.
- Holtzman et al. (2019). The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751.
- Howard et al. (2018). fastai. GitHub. https://github.com/fastai/fastai.
- Howard and Ruder (2018). Fine-tuned language models for text classification. CoRR abs/1801.06146.
- Jang et al. (2016). Categorical reparameterization with Gumbel-Softmax. arXiv preprint arXiv:1611.01144.
- Kingma and Ba (2014). Adam: A method for stochastic optimization. In Proc. 3rd Int. Conf. on Learning Representations.
- Lahiri (2014). Complexity of word collocation networks: A preliminary structural analysis. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, Gothenburg, Sweden, pp. 96–105.
- Li et al. (2015). A diversity-promoting objective function for neural conversation models. arXiv preprint arXiv:1510.03055.
- Merity et al. (2017). Regularizing and optimizing LSTM language models. arXiv preprint arXiv:1708.02182.
- Mozer (1995). A focused backpropagation algorithm for temporal pattern recognition. In Backpropagation: Theory, Architectures, and Applications, p. 137.
- Papineni et al. (2002). BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311–318.
- Paulus et al. (2017). A deep reinforced model for abstractive summarization. CoRR abs/1705.04304.
- Potash et al. (2018). Evaluating creative language generation: The case of rap lyric ghostwriting. In Proceedings of the Second Workshop on Stylistic Variation, pp. 29–38.
- Radford et al. (2019). Language models are unsupervised multitask learners. OpenAI Blog 1(8).
- Schmaltz et al. (2016). Word ordering without syntax. In Empirical Methods in Natural Language Processing, pp. 2319–2324.
- Shao et al. (2017). Generating high-quality and informative conversation responses with sequence-to-sequence models. In Empirical Methods in Natural Language Processing, pp. 2210–2219.
- Theis et al. (2016). A note on the evaluation of generative models. In International Conference on Learning Representations.
- Vaswani et al. (2017). Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- Wang et al. (2016). Chinese song iambics generation with neural attention-based model. CoRR abs/1604.06274.
- Wu et al. (2016). Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR abs/1609.08144.
- Yi et al. (2016). Generating Chinese classical poems with RNN encoder-decoder. CoRR abs/1604.01537.
- Yu et al. (2017). SeqGAN: Sequence generative adversarial nets with policy gradient. In Association for the Advancement of Artificial Intelligence, pp. 2852–2858.
- Zhang and Lapata (2014). Chinese poetry generation with recurrent neural networks. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 670–680.
4 Supplementary Material
In this section, we report our results on computational metrics, hyperparameters, and training configurations for our models. Table 1 shows the perplexity scores of the evaluated models, Table 2 shows the hyperparameters for each encoding method, and Table 3 shows our training parameters. In Table 3, the values for the Gutenberg dataset in the GumbelGAN and Creative-GAN columns are empty, as we only pre-train our LMs on the Gutenberg dataset.
| Model | W. Emb. Size | Layers | Hidden | Backprop through time |