Generating natural language sentences is a long-standing goal of natural language processing and has a broad range of real-life applications Gatt and Krahmer (2018). Mainstream methods for natural language generation usually generate a sentence from scratch. Recently, Guu et al. (2018) presented pioneering work on a new paradigm for sentence generation: a generative model that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Compared to traditional models that generate from scratch, either left-to-right or by first sampling a latent sentence vector, the prototype-then-edit model improves perplexity on language modeling and generates higher-quality outputs according to human evaluation.
| Source sentence: The noodles and pork belly was my favourite . |
| → The pork belly was my favourite . |
| → The pork was very good . |
| → The staff was very good . |
| → The staff is very friendly . |
| Target sentence: Love how friendly the staff is ! |
We introduce a novel natural language generation task, termed text morphing, which aims to generate intermediate sentences that are fluent and change smoothly between the two input sentences. We show a concrete example of the text morphing task in Table 1. Given a source sentence and a target sentence, our goal is to edit the source sentence step by step toward the target sentence, where the source sentence is “The noodles and pork belly was my favourite .” and the target sentence is “Love how friendly the staff is !”. At the first step, we remove the term “noodles” from the source sentence, as it does not appear in the target sentence. After the deletion operation, the resulting sentence is closer to the target sentence in terms of lexical similarity. The generated sentence is then treated as the input of the second step. After two more editing operations, the editing process terminates once a generated sentence is close enough to the target sentence. Furthermore, the editing path is smooth, because every editing operation modifies only a small part of the input sentence.
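The notion of lexical similarity invoked above can be made concrete. As an illustration (our choice of metric here; the paper's formal similarity metric is left abstract in Section 3), Jaccard similarity over word sets already captures why deleting “noodles” moves the source sentence closer to the target:

```python
def jaccard(s1: str, s2: str) -> float:
    """Jaccard similarity between the word sets of two sentences."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

source = "the noodles and pork belly was my favourite ."
step1  = "the pork belly was my favourite ."   # after deleting "noodles and"
target = "love how friendly the staff is !"

# The first edit shrinks the union with the target, so similarity rises.
closer = jaccard(step1, target) > jaccard(source, target)
```

Under this metric the first editing step strictly increases similarity to the target, matching the intuition in the example above.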
To this end, we present an end-to-end neural network model for generating morphing sentences between the source sentence and the target sentence. It consists of two parts: the editing vector generation network and the sentence editing network. The editing vector generation network generates editing vectors with a recurrent neural network from the lexical gap between the source sentence and the target sentence. The sentence editing network then generates a new sentence from the current editing vector and the sentence generated in the previous step. The two models are jointly trained and optimized.
Text morphing provides a new direction for sentence generation. Different from traditional sentence generation, which generates sentences from scratch, and from text editing, which generates sentences from a prototype sentence Guu et al. (2018), text morphing generates sentences from two anchor sentences, namely the source sentence and the target sentence. We conduct experiments with 10 million text morphing sequences extracted from the Yelp data set Yelp (2017), which consists of 30 million review sentences from Yelp (https://www.yelp.com). Experiment results show the effectiveness of the models. We also discuss directions and opportunities for future research on text morphing.
2 Related Work
Our work is related to text editing. Guu et al. (2018) propose a new generative model that first samples a prototype sentence from the training corpus and then edits it into a new sentence. Experiments on the Yelp review corpus Yelp (2017) and the One Billion Word Language Model Benchmark Chelba et al. (2013) show that the prototype-then-edit model improves perplexity on language modeling and generates higher-quality outputs according to human evaluation. Grangier and Auli (2018) propose a framework for computer-assisted text editing that applies to translation post-editing and paraphrasing. A human editor marks the tokens they would like the system to change, and the system then generates a new sentence that reformulates the initial sentence while avoiding the marked words. They demonstrate the advantage of their approach on translation post-editing and paraphrasing. Zeldes (2018) describes how to use word embeddings trained with word2vec Mikolov et al. (2013) and the A* search algorithm to morph between words.
This work is also related to image morphing, which has been widely studied in the image processing community. Morphing between two images is a special effect in motion pictures and animations that changes (or morphs) one image or shape into another through a seamless transition. Most often it is used to depict one person turning into another through technological means or as part of a fantasy or surreal sequence. We refer readers to Wolberg (1998) for a survey of image morphing. Our work focuses on morphing between two sentences, which is different from image morphing.
3 Problem Statement
Our goal is to learn a generative model for text morphing. In this section, we formulate the task mathematically, clearly defining its input, output, and requirements.
Suppose that we have a data set , where is a sentence sequence that represents an existing path of changing one sentence into another through a seamless transition. The sequence satisfies the conditions , , and , where is an arbitrary text similarity metric. We wish the transition to be smooth, so is introduced to control the degree of each sentence change. In addition, each change should move toward the target sentence and away from the source sentence ; hence the last two conditions. Furthermore, is a readable sentence free of grammatical errors.
With such a dataset , our goal is to learn a model capable of performing sentence morphing from an arbitrary sentence to . During this process, we require that the generated morphing path satisfy the above conditions.
4 Morphing Networks
4.1 Model Overview
The goal of text morphing is to generate intermediate sentences that are fluent and change smoothly between the two input sentences. In practice, we design a morphing network model that consists of editing vector generation and sentence editing. Figure 1 depicts the iterative process of our model:
1. Editing vector generation: as the lexical gap between the start sentence and the end sentence is large, we must determine which words will be edited at the current step, and then encode the information of these words into an editing vector. Two factors play a role in editing vector generation: one is the lexical difference between and , and the other is the editing vector of the previous step.
2. Sentence editing: we then edit the source sentence with the editing vector and obtain a sentence . After that, the next iteration begins with as the source sentence and edits it into another new sentence. After editing steps, we obtain a fluent and smooth morphing path.
Given a source sentence and a target sentence , our model generates a sentence sequence after editing steps. An overview of our model is shown in Figure 2. We first compute the editing vector based on the differing words between and and the previous editing vector . We then build our sentence editing model on a left-to-right sequence-to-sequence model with attention, which integrates the editing vector into the decoder. In the following, we introduce the details of editing vector generation and sentence editing.
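The iterative edit-then-check loop described above can be sketched as follows. The function names `edit_vector_fn` and `edit_sentence_fn` stand in for the two networks detailed below, and the similarity-based stopping rule is our assumption, consistent with the stopping criteria mentioned in Section 5.4:

```python
def morph(source, target, edit_vector_fn, edit_sentence_fn,
          similarity_fn, stop_threshold=1.0, max_steps=10):
    """Iterative morphing loop: compute an editing vector from the gap
    between the current sentence and the target, apply it, and repeat
    until the current sentence is close enough to the target."""
    path = [source]
    current, edit = source, None
    for _ in range(max_steps):
        edit = edit_vector_fn(edit, current, target)   # editing vector step
        current = edit_sentence_fn(current, edit)      # sentence editing step
        path.append(current)
        if similarity_fn(current, target) >= stop_threshold:
            break
    return path

# Toy demonstration: "edit" one differing word per step toward the target.
def toy_edit_vector(prev_edit, current, target):
    return None  # a real model would encode the lexical gap here

def toy_edit_sentence(current, edit, target="a b c"):
    words, goal = current.split(), target.split()
    for i, (w, g) in enumerate(zip(words, goal)):
        if w != g:
            words[i] = g
            break
    return " ".join(words)

toy_sim = lambda a, b: 1.0 if a == b else 0.0
path = morph("x y c", "a b c", toy_edit_vector, toy_edit_sentence, toy_sim)
```

Each intermediate sentence becomes the input of the next step, so the path is built one small edit at a time.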
4.2 Editing Vector Generation
Given a source sentence and a target sentence , our model needs editing steps to generate a morphing path , which means that we prepare editing vectors accordingly. For each editing vector, there are two important factors to consider. One is the set of words that differ between the source sentence and the target sentence, which indicates which words may be edited at the current step. The other is the editing vector of the previous step, which carries information about which words have already been edited. We leverage an RNN structure to generate the editing vector at each step, which captures information about which words have been edited in previous time steps.
We first compute an insertion word set , in which each element appears in but not in , and a deletion word set , in which each element appears in but not in . We look up an embedding table to transform the words in and into dense vectors, forming an editing table , where is the -th insertion word embedding and is the -th deletion word embedding. We use to denote the editing table of and .
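Computing the insertion and deletion word sets is plain set arithmetic over the two sentences (a minimal sketch; the function and variable names are ours):

```python
def edit_word_sets(source: str, target: str):
    """Insertion set: words in the target but not the source.
    Deletion set: words in the source but not the target."""
    s, t = set(source.split()), set(target.split())
    return t - s, s - t

insert_set, delete_set = edit_word_sets(
    "the noodles and pork belly was my favourite .",
    "love how friendly the staff is !")
```

Words shared by both sentences appear in neither set, so only the lexical gap is encoded.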
Given a sentence , we learn its representation with a GRU-based encoder, which reads the input sentence into hidden vectors:
where is the -th word of and is the hidden state at time . The hidden states of the words in sentence are .
After that, we apply an attention mechanism to generate a diff vector . Specifically, the weight of the -th word in is computed by
where and are parameters, is the weight of the -th word in , and is the last hidden state of the encoder. The weight of in the insertion word set is obtained with a similar process and is denoted as . Subsequently, we take weighted averages of the word embeddings to construct an insertion vector and a deletion vector separately, and then concatenate them to form a diff vector , which is formulated as
where denotes concatenation. Equation 4 plays an important role in our morphing model, because only a subset of the insertion/deletion word sets is used in each step. The larger the weight, the greater the role the word plays in the editing process. For instance, if a word in the deletion set is assigned a large weight, it is likely to be deleted in this step.
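The construction of the diff vector, attention-weighted averages of the insertion and deletion embeddings concatenated together, can be sketched in numpy (shapes and names are our assumptions; the attention logits would come from the learned scoring function above):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(x - x.max())
    return e / e.sum()

def diff_vector(ins_emb, del_emb, ins_scores, del_scores):
    """Attention-weighted insertion and deletion vectors, concatenated.

    ins_emb: (n_ins, d) embeddings of insertion words
    del_emb: (n_del, d) embeddings of deletion words
    *_scores: unnormalized attention logits for each word
    """
    a_ins = softmax(ins_scores)            # attention over insertion words
    a_del = softmax(del_scores)            # attention over deletion words
    v_ins = a_ins @ ins_emb                # insertion vector, shape (d,)
    v_del = a_del @ del_emb                # deletion vector, shape (d,)
    return np.concatenate([v_ins, v_del])  # diff vector, shape (2d,)
```

With uniform logits this reduces to a plain average of the word embeddings; the learned attention skews it toward the words to be edited at the current step.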
We employ a recurrent neural network structure to compute the editing vector ; specifically, we use the gated recurrent unit (GRU) Chung et al. (2014) as the recurrent unit. When the prototype sentence is and denotes the representation of the prototype sentence , the editing vector is defined as
where denotes concatenation and is the previous editing vector. , and are parameters. Editing vector generation thus leverages the attention mechanism to determine which words are encoded into the editing vector, and the GRU structure to capture which words have been edited in previous time steps.
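The recurrence over editing vectors can be sketched with a minimal numpy GRU cell, where the input at each step is the concatenation of the diff vector and the prototype sentence representation, and the hidden state is the editing vector itself (weight names, shapes, and initialization are our assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell (Chung et al., 2014), used here to update the
    editing vector z_t given input x_t = [diff vector; sentence repr]."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        shape_w, shape_u = (hidden_dim, input_dim), (hidden_dim, hidden_dim)
        self.Wr, self.Ur = rng.normal(0, 0.1, shape_w), rng.normal(0, 0.1, shape_u)
        self.Wu, self.Uu = rng.normal(0, 0.1, shape_w), rng.normal(0, 0.1, shape_u)
        self.Wh, self.Uh = rng.normal(0, 0.1, shape_w), rng.normal(0, 0.1, shape_u)

    def step(self, x, h):
        r = sigmoid(self.Wr @ x + self.Ur @ h)            # reset gate
        u = sigmoid(self.Wu @ x + self.Uu @ h)            # update gate
        h_tilde = np.tanh(self.Wh @ x + self.Uh @ (r * h))
        return (1.0 - u) * h + u * h_tilde                # new editing vector

# One morphing step: diff vector concatenated with the sentence representation.
cell = GRUCell(input_dim=12, hidden_dim=8)
z = np.zeros(8)          # initial editing vector
x = np.ones(12)          # stand-in for [diff vector; sentence representation]
z = cell.step(x, z)
```

The gating lets each step blend the previous editing vector with the new lexical-gap information, which is how the model remembers which words were already edited.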
4.3 Sentence Editing
We build our sentence editing model on a sequence-to-sequence model with an attention mechanism, which integrates the editing vector into the decoder.
The decoder takes the encoder hidden states and the editing vector as input and generates a new sentence with a GRU language model with attention. The hidden state of the decoder is computed by
where is the decoder hidden state at the previous step, and the input is the concatenation of the embedding of the -th word and the editing vector .
Then we compute a context vector , which is a weighted sum of the encoder hidden states Luong, Pham, and Manning (2015):
where is given by
are parameters. The generative probability distribution is given by
where and are parameters. We append the editing vector to every input embedding of the decoder in Equation 1, so the editing information can be utilized throughout the generation process.
We aim to maximize the likelihood of the generated sentences in the training data . We learn our model by minimizing the negative log-likelihood (NLL); the loss is computed by:
where is and is .
5 Experiments
5.1 Dataset
We extract morphing sequences for training from the Yelp data set Yelp (2017), which comprises 30 million review sentences from Yelp (https://www.yelp.com). Before extracting morphing sequences, we tokenize these sentences and replace named entities with their NER tags using spaCy (https://honnibal.github.io/spaCy). Subsequently, we construct the morphing dataset with the process shown in Algorithm 1.
We aim to collect 10 million morphing instances. For each source sentence, we find at most possible morphing sequences, since we wish the model to be capable of learning different strategies for editing a sentence. is set to , which ensures that the morphing path is smooth enough. Given a sentence, we use MinHash to search for its similar sentences (those whose Jaccard similarity is larger than ). An open-source tool named datasketch (https://github.com/ekzhu/datasketch) is employed to index sentences and find similar ones. The number-of-permutations hyperparameter is chosen as , which is a good trade-off between accuracy and efficiency. and are and , respectively.
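The MinHash estimate underlying this similarity search can be sketched in pure Python (the paper uses the datasketch library for indexing; this stdlib-only sketch, with our own function names, only illustrates the estimator):

```python
import hashlib

def minhash_signature(tokens, num_perm=64):
    """MinHash signature: for each of num_perm seeded hash functions,
    keep the minimum hash value over the token set."""
    sig = []
    for seed in range(num_perm):
        best = min(
            int(hashlib.md5(f"{seed}:{t}".encode()).hexdigest(), 16)
            for t in tokens)
        sig.append(best)
    return sig

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates the
    Jaccard similarity of the underlying token sets."""
    same = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return same / len(sig_a)
```

Indexing signatures with locality-sensitive hashing (as datasketch's `MinHashLSH` does) then makes retrieving all sentences above a similarity threshold sublinear in corpus size.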
After removing duplicates, we collect 10 million morphing sequences, whose average sequence length is . The average sentence length is 7.22, indicating that it is easier to find similar sentences for shorter texts. We randomly select for training, for validation, and for testing. We denote this test set as test set 1.
Apart from the validation and test sets mentioned above, we randomly select sentences as source sentences and sentences as target sentences from the 30 million-sentence Yelp dataset to construct another test set. This test set differs from the former one in that a morphing sequence cannot be obtained with a retrieval strategy. We denote it as test set 2. The dataset is available at https://1drv.ms/u/s!AmcFNgkl1JIngn4-tpg1yMYmh3bi.
5.2 Evaluation Metrics
We evaluate this task in two respects: fluency and smoothness. We train a 2-layer GRU-based language model with 512 units on the 30 million-sentence Yelp dataset as a measure of fluency. Given a morphing sequence, the metric reflects how fluent the sentences in the sequence are, and is formulated as
where denotes the negative log-likelihood of sentence under the language model. We average the fluency scores of the morphing sequences as the final score.
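The fluency score can be sketched as the average per-sentence negative log-likelihood over a morphing sequence. Here a toy unigram model stands in for the paper's 2-layer GRU language model (all names and the unseen-word floor are our assumptions):

```python
import math
from collections import Counter

def train_unigram(corpus):
    """Maximum-likelihood unigram model from whitespace-tokenized sentences."""
    counts = Counter(w for sent in corpus for w in sent.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sentence_nll(model, sentence, floor=1e-6):
    """Negative log-likelihood of a sentence; unseen words get a small floor."""
    return -sum(math.log(model.get(w, floor)) for w in sentence.split())

def fluency(model, morphing_sequence):
    """Average NLL over the sentences in a morphing sequence
    (lower means more fluent under the model)."""
    return sum(sentence_nll(model, s) for s in morphing_sequence) / len(morphing_sequence)
```

In the paper the per-sentence score comes from the GRU language model rather than unigram counts, but the aggregation over the sequence is the same averaging shown here.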
An ideal morphing sequence is smooth, meaning that two adjacent sentences ( and ) are lexically similar. Jaccard distance is employed to calculate the smoothness of an editing operation. For a morphing sequence, we define two metrics, and , to indicate its smoothness, as follows:
5.3 Baselines
We compare our method with the following baselines:
SVAE: A variational autoencoder (VAE) Kingma and Welling (2013) allows us to generate sentences from a continuous space Bowman et al. (2015). For natural language morphing, we linearly interpolate sentences between the source sentence and the target sentence , where is a hyper-parameter. Specifically, we first represent and with two latent vectors and ; the -th latent vector is then obtained by
Subsequently, a sentence is decoded from each interpolated vector with a GRU-based decoder. In practice, we implement this baseline with the open-source code at https://github.com/timbmg/Sentence-VAE, in which KL cost annealing is applied to prevent the decoder from ignoring the latent vector and settling into an undesirable equilibrium with the KL cost term at zero. SVAE is trained on the 30 million-sentence Yelp data.
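The SVAE baseline's interpolation between the two latent codes can be sketched in numpy (the encoder and decoder are elided; the spacing formula z_i = z_s + i/(n+1) * (z_t - z_s) is our reading of the linear interpolation described above):

```python
import numpy as np

def interpolate_latents(z_source, z_target, n):
    """Return n evenly spaced latent vectors strictly between
    z_source and z_target, to be decoded into morphing sentences."""
    return [z_source + (i / (n + 1)) * (z_target - z_source)
            for i in range(1, n + 1)]
```

Unlike the morphing network, which edits sentences directly, this baseline commits to a fixed number of interpolation points n before decoding.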
5.4 Implementation details
We use PyTorch to implement our model. The GRU hidden size is 512, the word embedding size is 300, the editing vector size is 256, and the attention vector size is 512. The vocabulary size is chosen as . We optimize the objective function with back-propagation, and the parameters are updated by stochastic gradient descent with the Adam algorithm Kingma and Ba (2014). The initial learning rate is , and the Adam parameters and are and , respectively. We employ early stopping as a regularization strategy. Models are trained in mini-batches with a batch size of . At test time, we stop the editing process when , or , or .
5.5 Experiment Results
The evaluation results are shown in Tables 2 and 3. The Morphing Network is significantly better than the SVAE method in terms of the smoothness metrics, indicating that the morphing network is able to transform one sentence into another through a series of small changes, whereas SVAE sometimes modifies a sentence massively within a morphing sequence. This is because our model attends to a subset of words at each step, preserving the remaining content of the sentence. However, SVAE generates more fluent sentences, because SVAE is essentially a language model and thus pays more attention to fluency. More experimental results for SVAE with different interpolation numbers N can be found in Table 5 in the Appendix. As the average number of inserted sentences is 3.6 for the Morphing Network model, we report evaluation results for SVAE with N = 4 in Tables 2 and 3.
5.6 Case Study
| Source sentence: their tuna sandwich quality depends the location . |
| → i am a fan of the tuna sandwich . |
| → i am a big fan of the pita jungle . |
| Target sentence: i am a big fan of the pita jungle . |
| Source sentence: i opted for the wagyu filet . |
| → i opted for the filet mignon . |
| → i loved the filet mignon . |
| → my friend loved the filet mignon . |
| Target sentence: my friend loved the gluten free crust . |
| Source sentence: the hot dishes were served piping hot . |
| → the hot dishes were served hot |
| → the hot and hot dogs were hot and delicious . |
| → the hot dogs were hot and delicious . |
| → the hot dogs were hot and delicious , the service was great . |
| → the hot dogs were ok , the service was great . |
| Target sentence: the service was great , and the food was ok . |
Table 4 shows some text morphing examples generated by our model. Our model is capable of transforming one sentence into another through a sequence of plausible intermediate sentences. In addition, our model dynamically controls the length of a morphing sequence rather than fixing it with a hyper-parameter as SVAE does. The three examples complete the text morphing with different numbers of editing steps. The first example completes morphing with two revisions, and its editing attention heat map is depicted in Figure 3.
When we regard the source sentence as an input, “I”, “a”, “of” and “fan” get large weights in the insertion word attention, and “their”, “location” and “depends” are the top three in the deletion word attention. Consequently, the source sentence is transformed into , where the words with large weights are inserted into or deleted from the source sentence. Subsequently, the insertion and deletion word sets are updated according to the differences between and the target sentence. The attention heat map of the second round of editing is shown in Figure 3(b). We can see that the weights of the inserted words are relatively uniform, so all of these words are inserted into . The weight of “tuna” dominates the deletion attention distribution, but “sandwich” and “quality” are deleted from as well. This is mainly because the decoder language model is forced to insert the phrase “pita jungle”, so the word “sandwich” has to be deleted together with “tuna” to guarantee the fluency of the generated sentence.
6 Conclusion and Future Work
In this paper, we introduce the Text Morphing task for natural language generation. It aims to generate intermediate sentences that are fluent and change smoothly between the two input sentences. We present the Morphing Networks, which consist of two parts. The editing vector generation network uses a recurrent neural network to generate editing vectors from the lexical gap between the source sentence and the target sentence; the sentence editing network then iteratively generates new sentences from the current editing vector and the sentence generated in the previous step. Experimental results on 10 million morphing sequences from the Yelp review dataset illustrate the effectiveness of the proposed models.
The work presented in this paper can be advanced from different perspectives. First, it would be interesting to apply ideas from AlphaGo Silver et al. (2016, 2017) in designing the sequential editing networks and the morphing networks: policy networks and value networks could be learned to guide and control the strategy for generating editing vectors during text morphing. Second, on the application side, we are interested in using morphing networks for the quantitative evaluation and identification of the literary creativity of writings, such as literary masterpieces, articles, and news. The morphing probability from sentences in a piece of writing to sentences in the existing literature could serve as a metric of literary creativity. Moreover, a more general and open question is: “Are any two sentences reachable through morphing with trainable morphing models?” We leave these as future work on text morphing.
- Bowman et al. (2015) Bowman, Samuel R, Luke Vilnis, Oriol Vinyals, Andrew M Dai, Rafal Jozefowicz, and Samy Bengio. 2015. Generating sentences from a continuous space. arXiv preprint arXiv:1511.06349.
- Chelba et al. (2013) Chelba, Ciprian, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, and Phillipp Koehn. 2013. One billion word benchmark for measuring progress in statistical language modeling. CoRR, abs/1312.3005.
- Chung et al. (2014) Chung, Junyoung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. NIPS 2014 Deep Learning and Representation Learning Workshop.
- Gatt and Krahmer (2018) Gatt, Albert and Emiel Krahmer. 2018. Survey of the state of the art in natural language generation: Core tasks, applications and evaluation. J. Artif. Intell. Res., 61:65–170.
- Grangier and Auli (2018) Grangier, David and Michael Auli. 2018. Quickedit: Editing text & translations via simple delete actions. NAACL.
- Guu et al. (2018) Guu, K., T. B. Hashimoto, Y. Oren, and P. Liang. 2018. Generating sentences by editing prototypes. Transactions of the Association for Computational Linguistics (TACL).
- Kingma and Ba (2014) Kingma, Diederik P and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- Kingma and Welling (2013) Kingma, Diederik P and Max Welling. 2013. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
- Luong, Pham, and Manning (2015) Luong, Minh-Thang, Hieu Pham, and Christopher D Manning. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
- Mikolov et al. (2013) Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 26. Curran Associates, Inc., pages 3111–3119.
- Silver et al. (2016) Silver, David, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489.
- Silver et al. (2017) Silver, David, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. 2017. Mastering the game of go without human knowledge. Nature, 550:354–359.
- Wolberg (1998) Wolberg, George. 1998. Image morphing: a survey. The Visual Computer, 14(8):360–372.
- Yelp (2017) Yelp. 2017. Yelp dataset challenge.
- Zeldes (2018) Zeldes, Yoel. 2018. Word morphing.