Imitating input text is a popular natural language generation task. One way of achieving this is by manually writing a context-free grammar, which often results in a relatively small and predictable generative space because of the large effort it takes to create such a grammar. Markov chains are a useful tool for learning to imitate input sequences automatically. These usually work best for generating short texts, as they tend to derail quickly due to their limited memory. A textual Markov chain learner calculates how often a certain word follows the previous $n$ words. One consideration designers of such systems have is choosing this order $n$: a high number enforces grammatically more sensible sentences but leads to more plagiarism of the source text, whereas a low number leads to more random but more original output. In this paper, we propose two potential solutions for this trade-off when generating short, philosophical statements: one algorithm that generates more locally coherent text, and another algorithm that mimics the global structure of a philosophical statement. We demonstrate our algorithms by training on the tweets and columns of the previous KU Leuven rector, Rik Torfs, and deploying them as a Twitterbot called TorfsBot.
2.1 Interpolated Markov Model
A first solution to partially mitigate the classic trade-off between originality and syntactic correctness is to interpolate between several different Markov models. That is, the algorithm trains multiple Markov models with different orders $n$ (e.g. with $n \in \{2, 3, 4\}$) and gives a weight $\lambda_n$ to every Markov model. When the interpolated Markov model chooses a next word, the count of every possible candidate is multiplied by the weight of the Markov model it originates from. It then selects the next word from all weighted candidates using roulette wheel selection on the summed counts of all candidates. In this step, additional heuristic functions $f$ could further influence the weights, e.g. by preferring certain topical words, rhymes etc. In summary, the model does not only look at the previous two words to determine the probability of the next word, but also at the previous three and previous four words. It sums the probabilities of all possible words according to these models, but gives a higher probability to a word predicted by a model that looks at more previous words.
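Thus, the new formula to calculate the probability of candidate words can be derived from the classic Markov model by adding some extensions. The classic Markov model of order $n$, which looks back $n$ words to predict the probability of word $w_i$ from vocabulary $V$ given the previous words $w_{i-n}, \ldots, w_{i-1}$, uses the following formula:

$$P(w_i \mid w_{i-n}, \ldots, w_{i-1}) = \frac{c(w_{i-n}, \ldots, w_{i-1}, w_i)}{\sum_{w \in V} c(w_{i-n}, \ldots, w_{i-1}, w)} \tag{1}$$

with $c(w_{i-n}, \ldots, w_{i-1}, w_i)$ the number of times $w_i$ occurred after $w_{i-n}, \ldots, w_{i-1}$ in the training corpus, divided by the number of times $w_{i-n}, \ldots, w_{i-1}$ occurred followed by any word.

We first extend this with a normalisation function $\eta$, increasing the variety while decreasing the size of the resulting automaton. This function maps the words to lowercase and removes any non-alphanumeric character, so that the counts are taken over normalised contexts:

$$P_n(w_i \mid w_{i-n}, \ldots, w_{i-1}) = \frac{c(\eta(w_{i-n}), \ldots, \eta(w_{i-1}), w_i)}{\sum_{w \in V} c(\eta(w_{i-n}), \ldots, \eta(w_{i-1}), w)} \tag{2}$$

The following formula combines evidence from multiple such Markov models with maximum order $N$ by weighting every model of order $n$ using weight $\lambda_n$, creating an interpolated Markov model:

$$P(w_i \mid w_{i-N}, \ldots, w_{i-1}) \propto \sum_{n=1}^{N} \lambda_n \, P_n(w_i \mid w_{i-n}, \ldots, w_{i-1}) \tag{3}$$

These weights can then be further influenced by a function $f$, which increases the weight if the new word fulfils certain conditions, such as being phonetically close to previously generated words, being contextually related to them, or combinations thereof:

$$P(w_i \mid w_1, \ldots, w_{i-1}) \propto \sum_{n=1}^{N} \lambda_n \, f(w_i, w_1, \ldots, w_{i-1}) \, P_n(w_i \mid w_{i-n}, \ldots, w_{i-1}) \tag{4}$$

The algorithm uses any start of a sentence in the training data as $w_1, \ldots, w_n$ to kick-start the generative process using Equation 4, until an ending token is obtained as $w_i$. After generating a candidate, the algorithm performs several post-processing steps to improve the quality of the final generated texts.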
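The generation loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `train`, `candidate_weights`, `generate` and `no_bonus`, the boundary tokens `START` and `END`, and the choice to pad sentence starts with `START` tokens are all hypothetical. The sketch also sums weighted relative frequencies per order, which is one possible reading of the weighting step.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"  # hypothetical sentence boundary tokens


def normalise(word):
    """Lowercase a word and drop non-alphanumeric characters."""
    if word in (START, END):
        return word
    return "".join(ch for ch in word.lower() if ch.isalnum())


def train(sentences, orders):
    """Build {order: {normalised context: Counter of following words}}."""
    models = {n: defaultdict(Counter) for n in orders}
    for sentence in sentences:
        for n in orders:
            tokens = [START] * n + sentence.split() + [END]
            for i in range(n, len(tokens)):
                context = tuple(normalise(w) for w in tokens[i - n:i])
                models[n][context][tokens[i]] += 1
    return models


def no_bonus(word, history):
    """Default weight influencer: leaves every weight unchanged."""
    return 1.0


def candidate_weights(models, weights, history, bonus=no_bonus):
    """Sum, over all orders, the weighted relative frequency of each candidate."""
    scores = Counter()
    for n, model in models.items():
        padded = [START] * n + list(history)
        context = tuple(normalise(w) for w in padded[-n:])
        counts = model.get(context)
        if not counts:
            continue  # this order has never seen the context
        total = sum(counts.values())
        for word, count in counts.items():
            scores[word] += weights[n] * bonus(word, history) * count / total
    return scores


def generate(models, weights, rng, bonus=no_bonus, max_words=50):
    """Roulette wheel selection of next words until the end token appears."""
    history = []
    while len(history) < max_words:
        scores = candidate_weights(models, weights, history, bonus)
        if not scores:
            break
        words = list(scores)
        word = rng.choices(words, weights=[scores[w] for w in words])[0]
        if word == END:
            break
        history.append(word)
    return " ".join(history)
```

Calling `generate(train(corpus, orders=(2, 3, 4)), {2: 1.0, 3: 2.0, 4: 4.0}, random.Random())` would mirror the setup with orders two to four; the weight values here are placeholders, not values from the paper.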
Since philosophical statements and tweets tend to be better when short, the first post-processing step of the algorithm shortens the generated text if it is longer than a specific threshold. For our purposes, we set this threshold to the previous Twitter limit of 140 characters. The algorithm achieves this shortening by removing sentences between the first and last sentence, leaving some middle sentences in place if there is enough space left. Because the Markov chains enforce that both the beginnings and the endings truly occur as beginnings and endings in the training data, they tend to feel like introductions and conclusions, respectively. This pattern of skipping intermediary sentences, and thus going straight from introduction to conclusion, is a property also generally present in philosophical statements and tweets. Another way the algorithm achieves snappy texts is by first generating multiple candidates and then preferring shorter ones, using roulette wheel selection with the inverse text length as the main factor of the weight.
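As a rough sketch of this shortening step, the following Python code keeps the first and last sentence and re-adds middle sentences while they fit. The sentence splitter, the function names `shorten` and `pick_short`, and the exact weight `1 / length` are assumptions made for illustration.

```python
import random
import re

MAX_LENGTH = 140  # the former Twitter character limit mentioned in the text


def split_sentences(text):
    """Naive splitter on '.', '!' and '?' (an assumption; real text needs more care)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def shorten(text, max_length=MAX_LENGTH):
    """Keep the first and last sentence, then re-add middle sentences that fit."""
    sentences = split_sentences(text)
    if len(text) <= max_length or len(sentences) < 3:
        return text
    first, middle, last = sentences[0], sentences[1:-1], sentences[-1]
    kept = []
    used = len(first) + 1 + len(last)  # one space between the two sentences
    for sentence in middle:
        if used + 1 + len(sentence) <= max_length:
            kept.append(sentence)
            used += 1 + len(sentence)
    return " ".join([first, *kept, last])


def pick_short(candidates, rng):
    """Roulette wheel selection with weight 1 / length, favouring short texts."""
    weights = [1.0 / max(len(c), 1) for c in candidates]
    return rng.choices(candidates, weights=weights)[0]
```

Note that this sketch does not handle the case where the first and last sentence together already exceed the limit; such a candidate would have to be rejected elsewhere.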
2.1.2 Punctuation fixing
As a second post-processing step, the algorithm fixes the brackets and quotation marks by adding complementary brackets and quotation marks at the beginnings or endings of clauses or sentences. If no good position for the other bracket can be found, the bracket or quotation mark is deleted. This step is necessary because Markov models are only able to keep track of the last several generated words. Contrary to many neural network text generators [4, 10], these Markov models are thus usually unaware of the current nesting level of brackets during generation. As such, this post-processing step fixes several obvious mistakes in the generated texts.
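A simplified version of this repair step could look as follows. The sketch is an assumption-laden illustration: it only closes an unmatched opening mark before the next clause-ending punctuation mark, and simply deletes unmatched closing brackets instead of searching for a position for the opening one. The helper names are hypothetical.

```python
import re

CLAUSE_END = re.compile(r"[,;:.!?]")


def _close_or_delete(chars, index, closing):
    """Close the mark opened at `index` before the next clause end, else delete it."""
    for j in range(index + 2, len(chars)):
        if CLAUSE_END.fullmatch(chars[j]):
            chars[j] = closing + chars[j]
            return
    chars[index] = ""


def fix_brackets(text, opening="(", closing=")"):
    """Balance one bracket pair in a generated text."""
    chars = list(text)
    unmatched = []  # indices of opening brackets still waiting for a partner
    for i, ch in enumerate(chars):
        if ch == opening:
            unmatched.append(i)
        elif ch == closing:
            if unmatched:
                unmatched.pop()
            else:
                chars[i] = ""  # closing bracket without partner: drop it
    for i in unmatched:
        _close_or_delete(chars, i, closing)
    return "".join(chars)


def fix_quotes(text, quote='"'):
    """If the number of quotation marks is odd, close or delete the last one."""
    chars = list(text)
    positions = [i for i, ch in enumerate(chars) if ch == quote]
    if len(positions) % 2 == 1:
        _close_or_delete(chars, positions[-1], quote)
    return "".join(chars)
```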
2.1.3 News insertion
Since tweets are often bound to events happening when the tweet was written, using them as training data might result in references to past dates, events and news. To mitigate this, the third post-processing step links specifics in the generated text to the present and to current events, as this makes a text feel more relevant and thus improves its quality. To achieve this, the algorithm searches for occurrences of full dates, months and years using regular expressions, and replaces them with (parts of) a random date in the near future; we chose one to four days. It also searches the text for named entities, as these are either tied to events from the time the text was written, or serve as good candidates for adding references to current events. It finds these by searching for title-cased words in unexpected spots, as this is a well-working proxy for named entity recognition in our target languages. The system filters out named entities that occur more often than a specific threshold in the training corpus, such that it does not replace names that are archetypal references for the author the system imitates. The system then crawls the front page of a news website and selects a news article based on unigram similarity to the training corpus, where the score is calculated in a way similar to the scores of the naive Bayes algorithm. It uses this article to replace the filtered names in the generated text with the named entities of the chosen news article, giving priority to the most frequent named entities in the article. If the generated text contains any quote that is longer than three words (and thus a proper quote rather than word emphasis), it also replaces the quote with a quote from the article if there is one. All these replacements help the generated text feel more grounded in the present and in current affairs.
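The date and named-entity parts of this step can be sketched as below. Everything specific here is an assumption for illustration: English month names (the system in the text targets Dutch), the `day month year` date pattern, the threshold `max_count`, and the function names. Crawling and selecting the news article is left out; `news_entities` is assumed to be given, sorted by descending frequency.

```python
import datetime
import random
import re

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
FULL_DATE = re.compile(rf"\b(\d{{1,2}}) ({'|'.join(MONTHS)}) (\d{{4}})\b")


def replace_dates(text, today, rng):
    """Replace full dates by a random date one to four days after `today`."""
    def substitute(_match):
        future = today + datetime.timedelta(days=rng.randint(1, 4))
        return f"{future.day} {MONTHS[future.month - 1]} {future.year}"
    return FULL_DATE.sub(substitute, text)


def find_named_entities(text):
    """Title-cased words that do not start a sentence (a rough NER proxy)."""
    entities = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for word in sentence.split()[1:]:
            stripped = word.strip(".,;:!?\"'()")
            if stripped.istitle():  # caveat: also matches the English pronoun "I"
                entities.append(stripped)
    return entities


def replace_entities(text, corpus_counts, news_entities, max_count=5):
    """Swap rare named entities for the most frequent entities of a news article."""
    replacements = iter(news_entities)
    for entity in dict.fromkeys(find_named_entities(text)):
        if corpus_counts.get(entity, 0) > max_count:
            continue  # archetypal name for the imitated author: keep it
        new = next(replacements, None)
        if new is None:
            break
        text = re.sub(rf"\b{re.escape(entity)}\b", lambda _m, new=new: new, text)
    return text
```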
2.1.4 Originality check
The final post-processing step checks whether the generated text is original enough compared to the training corpus. While there are clever extensions to create a Markov automaton that checks for originality while generating [9], we opted for a post-processing step because it is less memory-intensive and because the generated texts are of limited length. The check normalises both the generated text and the training corpus by converting them to lower case and removing non-alphanumeric characters, and then tests whether the normalised generated text occurs fully in, or as a part of, an element of the normalised training corpus. If there is such overlap, the algorithm restarts the full process and generates a new text.
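This check is simple enough to sketch directly. The function names and the `max_attempts` safeguard are assumptions for illustration; the normalisation follows the description above, so whitespace is removed as well.

```python
def normalise_text(text):
    """Lowercase and keep only alphanumeric characters (spaces are dropped too)."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def is_original(candidate, corpus):
    """False if the normalised candidate occurs inside any normalised corpus text."""
    needle = normalise_text(candidate)
    return not any(needle in normalise_text(document) for document in corpus)


def generate_original(generate, corpus, max_attempts=1000):
    """Call `generate()` until it yields a text that is not copied from the corpus."""
    for _ in range(max_attempts):
        candidate = generate()
        if is_original(candidate, corpus):
            return candidate
    raise RuntimeError("no original text found within max_attempts")
```

For a large corpus, normalising every document on each call is wasteful; precomputing the normalised corpus once would be the obvious refinement.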
2.2 Dynamic Templates
Templates are often used for generating text, and are especially popular in computational humour [1, 6, 12, 13]. A template is essentially a sentence with slots, where every slot is a variable that a data source can fill. Context-free grammars allow designers to generate texts using specified templates, and have been used to create thousands of Twitterbots [3], e.g. by using the popular Tracery tool [2]. However, creating such a grammar by hand is not only tedious, but also tends to caricaturise the target because of the limited number of templates.
We propose to avoid creating templates by hand by dynamically extracting templates from base texts: knowing what types of content we want to insert, we identify good slots for replacing words by identifying the key words of a chosen sentence. This way, the global structure of the statement is retained. These base texts with yet-to-be-defined slots are what we call “dynamic templates”.
To generate a text using dynamic templates, the algorithm first receives a corpus of base texts $B$, a corpus of content text $C$ and a unigram model $U$ as input. The algorithm then picks several consecutive lines from $C$ and uses their words as a proxy for finding a set of related context words. It then creates a mapping from their part-of-speech tags to every present word. For the dynamic template itself, it selects a random text from the base text corpus $B$ and analyses all the present part-of-speech tags. The algorithm then proposes a list of possible replacements based on matching part-of-speech tags and the mapping to content text words. The part-of-speech tags do not only contain information about the type, but also more specific information, e.g. the tense of a verb. This ensures that replaced words generally still have the correct syntactic relation to other words in the sentence. Words that do not have matching part-of-speech tags in the selected lines from $C$ are not replaced. Some types of words, such as auxiliary verbs and other structural words, are also not allowed to be replaced, as this would break the grammatical correctness of the sentence.

The possible replacements are sorted by ascending frequency using the unigram model $U$, to prefer rare words, as a proxy for the key contextual words. The algorithm must replace a certain minimum number of words, proportional to the length of the string, since we do not want short sentences with too common words without replacements, or long sentences with only one word being replaced. The replacement process continues until the words to be replaced are more common than a minimum percentile in $U$, modelling the word commonness distribution. Similar to the interpolated Markov model, it then also performs the news insertion step to tie the generated texts more closely to current events. The pseudocode for this algorithm can be found in Algorithm 1.
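The replacement procedure can be sketched in Python as follows. This is an illustrative sketch, not the paper's Algorithm 1: it assumes the texts arrive already tokenised and part-of-speech tagged as `(word, tag)` pairs, uses a hypothetical set of protected tags, replaces the percentile criterion with a fixed count threshold `max_count`, and always picks the rarest unused context word with a matching tag.

```python
import math
from collections import defaultdict

PROTECTED_TAGS = {"AUX", "DET", "ADP", "PRON", "PUNCT"}  # assumed structural tags


def fill_template(template, context, unigram, min_ratio=0.2, max_count=50):
    """template/context: lists of (word, tag); unigram: lowercase word -> count."""
    by_tag = defaultdict(list)
    for word, tag in context:
        if word not in by_tag[tag]:
            by_tag[tag].append(word)
    # Rare context words first: they serve as a proxy for key words.
    for options in by_tag.values():
        options.sort(key=lambda w: unigram.get(w.lower(), 0))

    # Replaceable slots, rarest template words first.
    slots = [i for i, (word, tag) in enumerate(template)
             if tag not in PROTECTED_TAGS and by_tag.get(tag)]
    slots.sort(key=lambda i: unigram.get(template[i][0].lower(), 0))

    minimum = math.ceil(min_ratio * len(template))
    words = [word for word, _ in template]
    chosen = {}  # (template word, tag) -> replacement, so repeats stay consistent
    replaced = 0
    for i in slots:
        word, tag = template[i]
        too_common = unigram.get(word.lower(), 0) > max_count
        if replaced >= minimum and too_common:
            break  # remaining slots hold even more common words
        key = (word.lower(), tag)
        if key not in chosen:
            used = set(chosen.values())
            options = [w for w in by_tag[tag] if w not in used]
            if not options:
                continue
            chosen[key] = options[0]
        words[i] = chosen[key]
        replaced += 1
    return " ".join(words)
```

In practice the tagged input would come from a part-of-speech tagger for the target language, and the tag names would follow that tagger's tag set.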
The algorithm thus inserts context words into a base text that is used as a template, dynamically responding to the content that needs to be inserted. This mix of context words into a structure used in a different context can often cause a bisociation, a jump between two frames of reference usually present in creative works, between the original narrative and the new context [5]. As an example, consider one of the recent tweets by our Twitterbot employing this algorithm, which we translated from Dutch to English. By using an original tweet by Rik Torfs, namely
“Are there also atheists that don’t believe in atheism?” (https://twitter.com/torfsrik/status/620144837719932928)
and combining this with a fragment of a column by the same author as context text, namely
“They see the fact that the former Supreme Being is not trying to deny this newly acquired insight as proof of them being right. Even with the Church, things are not going well. Norse popes.” (http://www.standaard.be/cnt/s73709av)
we get the following result after executing our algorithm and finding the replacements “atheists” → “popes” and “atheism” → “the supreme being” (note that “Supreme Being” in Dutch is only one word, namely “opperwezen”, and that “atheism” in Dutch has an article, thus providing “the”):
“Are there also popes that do not believe in the supreme being?” (https://twitter.com/TorfsBot/status/1101507600095633410)
2.3 Automatic Replying
There are several methods for making the text generation algorithms interactive, such that the Twitterbot does not only send stand-alone tweets into the world but also responds to user replies. One way would be to bias the interpolated Markov model towards words related to the words of the conversation, by incorporating a factor in $f$ of Equation 4 that gives a higher weight to relevant context words. This method could however lead the generator to obsessively use certain rare words, which it would not do in normal circumstances. Another way of adapting to a user text is inserting relevant context words that have been used by the user, as in our dynamic templates approach. This is dangerous because it inserts potentially out-of-corpus words from an untrusted source, which attackers might use to make the bot behave offensively and break platform guidelines.
We devised a new method for generating replies using any text generation algorithm without modifying the algorithm. Since philosophical statements stereotypically tend to be vague and only somewhat related to previous text, the approach first generates philosophical statements as it normally does, and then picks an optimal one to use as a reply in the conversation. The algorithm first analyses the conversation so far by building a weighted unigram of the words present in it. The weight of a word in the conversation unigram is multiplied by a factor depending on the author and on how recent the text is. The weight factor is zero for any reply that occurred more than ten replies ago, and one if the reply came from the bot. For the replies of the user, the most recent one has a high weight factor, which decreases linearly for every previous reply, with a minimum total weight factor of one. The algorithm then generates thousands of random candidates using the interpolated Markov model, as this is a very efficient generator. It ends by picking the best candidate, namely the text that has the most rare words in common with the weighted unigram of the conversation. It computes this by summing the weights of the words the text has in common with the conversation, each divided by the count of that word in the unigram model. The score of a candidate is also inversely proportional to the difference in length between the last reply and the candidate, such that the bot prefers short replies if the user gives short answers, and engages with long arguments if the user is also replying with long answers. This helps the bot incorporate topics from earlier in the conversation and follow the tone of the conversation. (Some (Dutch) examples can be found on https://twitter.com/TorfsBot/status/….)
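The reply selection can be sketched as below. The concrete numbers and names are assumptions for illustration: the highest user weight factor `max_user_weight`, a decrease of one per older user reply, the `1 / (1 + length difference)` form of the length penalty, and a default count of one for words missing from the unigram model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase words, treating every non-alphanumeric character as a separator."""
    return "".join(ch if ch.isalnum() else " " for ch in text.lower()).split()


def conversation_unigram(conversation, max_user_weight=5.0, window=10):
    """conversation: list of (author, text), oldest first; author is 'bot' or 'user'."""
    weights = Counter()
    user_rank = 0  # 0 for the most recent user reply
    for author, text in reversed(conversation[-window:]):
        if author == "bot":
            factor = 1.0
        else:
            factor = max(max_user_weight - user_rank, 1.0)
            user_rank += 1
        for word in tokenize(text):
            weights[word] += factor
    return weights


def score(candidate, weights, unigram, last_reply):
    """Rare shared words score high; a large length difference lowers the score."""
    shared = [w for w in set(tokenize(candidate)) if w in weights]
    overlap = sum(weights[w] / unigram.get(w, 1) for w in shared)
    return overlap / (1 + abs(len(candidate) - len(last_reply)))


def best_reply(candidates, conversation, unigram):
    """Pick the pre-generated candidate that best fits the conversation."""
    weights = conversation_unigram(conversation)
    last_reply = conversation[-1][1]
    return max(candidates, key=lambda c: score(c, weights, unigram, last_reply))
```

Because the candidates come from the unmodified generator, no word from the user's text is ever inserted into the reply; the conversation only influences which candidate is chosen.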
3.1 Qualitative Performance
As mentioned earlier, we implemented a Twitterbot called TorfsBot (https://twitter.com/TorfsBot), which imitates tweets made by previous KU Leuven rector Rik Torfs. This bot uses both the interpolated Markov model and dynamic templates to generate tweets, and uses the interpolated Markov model as text generator for the reply algorithm. For the interpolated Markov model algorithm, the bot uses both tweets and columns written by Torfs as training data, putting more emphasis on the former by multiplying the weight of the n-grams from this source by ten. For the dynamic templates, we used the tweets as base texts and the columns as content texts. TorfsBot has more than 850 followers, making it one of the most popular Dutch Twitterbots (https://botwiki.org/?s=dutch). We compare the algorithms and their average interactions in Table 1. We also compare a subset of the replies, namely those not from a conversation with one single outlier user, who is responsible for 1883 total replies. From these results, we can see that the locally coherent algorithm (namely the interpolated Markov model) usually produces more preferred content than the globally coherent algorithm (the dynamic template algorithm). One reason for this result could be that, compared to spotting errors in complicated grammatical structures, it is easier to detect words that cannot follow each other, which then drastically lowers the credence in the wisdom of the philosophical statement.
Table 1
| Algorithm | Tweets | Average interactions |
|---|---|---|
| Interpolated Markov Model | 2375 | 1.11 |
| Replies without outlier user | 2388 | 0.39 |
3.2 Applicability to Other Domains and Languages
We also implemented a similar bot using the interpolated Markov models for generating poetry based on the works of Belgian poet Maarten Inghels, called InghelsBot (https://twitter.com/InghelsBot). This system uses the weight-influencing function $f$ of Equation 4 to prefer similar-sounding words next to each other, using rhyming dictionaries and Levenshtein distance. It also uses the additional normalisation function on some of its internal Markov models to normalise words to just their part-of-speech tag. This bot shows that the interpolated Markov model works for multiple languages (namely English and Dutch) as well as for other text types (namely poetry) without many modifications.
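A weight influencer based on Levenshtein distance could be sketched as follows. The function `sound_bonus`, its `strength` parameter and the comparison with only the previously generated word are assumptions for illustration; the rhyming dictionaries mentioned above are not modelled, and edit distance on spelling is only a crude stand-in for phonetic similarity.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + (char_a != char_b)  # substitution
            ))
        previous = current
    return previous[-1]


def sound_bonus(word, history, strength=2.0):
    """Hypothetical weight influencer: boost candidates close to the previous word."""
    if not history:
        return 1.0
    distance = levenshtein(word.lower(), history[-1].lower())
    longest = max(len(word), len(history[-1]), 1)
    similarity = 1.0 - distance / longest
    return 1.0 + strength * similarity
```

Such a function takes the candidate word and the words generated so far, and returns a multiplier of at least one, so it only ever raises the weight of similar-looking candidates.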
5 Future Work
There are several possible improvements to further develop the discussed algorithms. One large improvement would be using better methods for discovering related words, e.g. by training word embeddings on the training data [7]. This would allow the dynamic template algorithm to replace the words of the dynamic template with words that all have similar difference vectors, such that all context words of the resulting sentence are analogous to the context of the base text. The embeddings could also help the reply generator estimate the relatedness of words better than by plainly comparing the words themselves.
Another interesting improvement would be interpolating the interpolated Markov model with other sequential generators, such as LSTMs or fine-tuned versions of GPT-2 [10], as these are able to keep track of the previously generated text for much longer and thus produce texts with better grammar.
Since our bots are deployed on Twitter and are thus able to receive constant feedback from users, it would also be interesting to use this feedback mechanism to improve the perceived quality of generated texts. For example, the system might be able to learn which contexts or which base texts work well in dynamic templates.
We designed and implemented two different algorithms for imitating input text, and made them interactive. We then showed that the texts generated by our systems are well appreciated by users, and found that the interpolated Markov model got the most positive user feedback, implying that local coherence might be more important than global structure when generating philosophical statements.
This work was partially supported by Research Foundation - Flanders (project G.0428.15).
- [1] Binsted, K., Ritchie, G.: An implemented model of punning riddles. CoRR abs/cmp-lg/9406022 (1994)
- [2] Compton, K., Kybartas, B., Mateas, M.: Tracery: An author-focused generative text tool. In: Schoenau-Fog, H., Bruni, L.E., Louchart, S., Baceviciute, S. (eds.) Interactive Storytelling. pp. 154–161. Springer International Publishing, Cham (2015)
- [3] Compton, K., Pagnutti, J., Whitehead, J.: A shared language for creative communities of artbots. In: Proceedings of the Eighth International Conference on Computational Creativity, Atlanta, US. ACC (2017)
- [4] Karpathy, A., Johnson, J., Li, F.: Visualizing and understanding recurrent networks. CoRR abs/1506.02078 (2015)
- [5] Koestler, A.: The Act of Creation. An Arkana book: psychology/psychiatry, Arkana (1964), https://books.google.be/books?id=tJC5pDXFY8oC
- [6] Manurung, R., Ritchie, G., Pain, H., Waller, A., Mara, D., Black, R.: The construction of a pun generator for language skills development. Applied Artificial Intelligence 22(9), 841–869 (2008). https://doi.org/10.1080/08839510802295962
- [7] Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
- [8] Norris, J.R.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics, Cambridge University Press (1997). https://doi.org/10.1017/CBO9780511810633
- [9] Papadopoulos, A., Roy, P., Pachet, F.: Avoiding plagiarism in Markov sequence generation. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence. pp. 2731–2737. AAAI’14, AAAI Press (2014), http://dl.acm.org/citation.cfm?id=2892753.2892930
- [10] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I.: Language models are unsupervised multitask learners. OpenAI Blog 1(8) (2019)
- [11] Winters, T.: Generating Dutch punning riddles about current affairs. 29th Meeting of Computational Linguistics in the Netherlands (CLIN 2019): Book of Abstracts (01 2019)
- [12] Winters, T., Nys, V., De Schreye, D.: Automatic joke generation: Learning humor from examples. In: Distributed, Ambient and Pervasive Interactions: Technologies and Contexts. vol. 10922 LNCS, pp. 360–377. Streitz, Norbert, Springer International Publishing (2018)
- [13] Winters, T., Nys, V., De Schreye, D.: Towards a general framework for humor generation from rated examples. Proceedings of the 10th International Conference on Computational Creativity pp. 274–281 (2019)