Discovering Textual Structures: Generative Grammar Induction using Template Trees

09/09/2020 ∙ by Thomas Winters, et al. ∙ 1

Natural language generation provides designers with methods for automatically generating text, e.g. for creating summaries, chatbots and game content. In practice, text generators are often either learned and hard to interpret, or created by hand using techniques such as grammars and templates. In this paper, we introduce a novel grammar induction algorithm for learning interpretable grammars for generative purposes, called Gitta. We also introduce the novel notion of template trees to discover latent templates in corpora to derive these generative grammars. By using existing human-created grammars, we found that the algorithm can reasonably approximate these grammars using only a few examples. These results indicate that Gitta could be used to automatically learn interpretable and easily modifiable grammars, and thus provide a stepping stone for human-machine co-creation of generative models.





Text generation is a prominent tool within computational creativity, since many creative fields, e.g. poetry and humor, use text as their primary medium. As such, many computational creative systems have applied a large variety of text generation methods. Methods like templates and grammars generally grant the model designer more control over the output, but tend to be labor-intensive to create. Neural text generation approaches (e.g. RNNs, transformer models), on the other hand, can create impressive language generators, but at the cost of predictability, interpretability and ease of steering their outputs in a directed way. In this paper, we explore a new technique for learning grammars for generative purposes by discovering latent templates, such that the grammar is easily interpretable and modifiable by designers of generative textual models for creative purposes.



Using templates is a popular approach for generating text. In this context, a template is a piece of text with several slots that are later filled in using a particular data source. While they have obvious merits for e.g. chatbots and front-end web development, templates are also popular in computational creative applications. For these creative purposes, templates are often paired with schemas that provide sensible content to the templates and internally encode how the template slot values relate to each other [Binsted and Ritchie1994, Winters, Nys, and De Schreye2018]. For example, the expansion of non-terminal $S$ in Figure 1 could be seen as a template, where a schema (not in the figure) would create sensible pairings for $T$ and $F$. There have been several efforts to automatically learn such templates and schemas from single examples by analysing linguistic relationships and properties [Hong and Ong2009, Winters2019a].

Generative Grammars

Grammars are another popular way of generating text. A context-free grammar (CFG) is a four-element tuple $(V, \Sigma, R, S)$, where $V$ is a finite set of non-terminals, $\Sigma$ a finite set of terminals, $R$ a set of production rules that map elements of $V$ to $(V \cup \Sigma)^*$ and $S \in V$ the start symbol. While generative grammars were initially mainly used for generating text according to a particular need, they are also used for creative purposes. Tracery is a popular language among casual creators for designing generative grammars. Such grammars usually extend CFGs, e.g. by adding stored assignments and rule weights [Compton, Kybartas, and Mateas2015, Winters2019b]. A prominent design pattern in these grammars is specifying production rules that map non-terminals either to templates, or to a list of possible values for a particular template slot (Figure 1). The grammar then fills the templates with randomly generated slot value combinations.

    S -> I like putting <T> on my <F>
    T -> cheese | pineapple | soy sauce
    F -> pizza | salad | muesli | sushi
Figure 1: An example grammar capable of generating 12 different sentences specifying (odd) dish toppings.
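The count of 12 sentences follows from 3 toppings times 4 dishes. As a minimal sketch, the grammar of Figure 1 can be written as a Python dictionary and expanded exhaustively. The dictionary layout and the `expand` helper are illustrative assumptions, not Tracery's or Gitta's actual API, and the helper only terminates for non-recursive grammars.

```python
import re
from itertools import product

# Figure 1 grammar as a dict from non-terminal to its alternatives (assumed layout).
GRAMMAR = {
    "S": ["I like putting <T> on my <F>"],
    "T": ["cheese", "pineapple", "soy sauce"],
    "F": ["pizza", "salad", "muesli", "sushi"],
}

SLOT_PATTERN = re.compile(r"<(\w+)>")


def expand(symbol, grammar):
    """Return every string derivable from `symbol` (non-recursive grammars only)."""
    results = []
    for rule in grammar[symbol]:
        slots = SLOT_PATTERN.findall(rule)
        # Every combination of expansions for the slots occurring in this rule.
        options = [expand(slot, grammar) for slot in slots]
        for combo in product(*options):
            values = iter(combo)
            # Fill the slots left to right with the chosen values.
            results.append(SLOT_PATTERN.sub(lambda match: next(values), rule))
    return results


sentences = expand("S", GRAMMAR)
print(len(sentences))  # 12
```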

Learning Grammars

There are many different algorithms for inducing CFGs, usually designed for a particular class of grammar. The most popular type of grammar induction induces part-of-speech tag structures from treebanks or plain text. Another popular type of grammar induction is discovering repetitive structures to help encode the input text efficiently, inducing grammars where each non-terminal has only a single production rule [Nevill-Manning and Witten1997]. The grammars induced by these algorithms are however shaped differently from typical generative grammars with their template-like production rules. The latter generally avoid recursive production rules, as generating texts of unbounded length is usually undesirable for the creative goal. Non-recursive grammars are thus tools for compactly specifying a finite space of interesting texts. In this paper, we introduce an algorithm that can learn such non-recursive context-free grammars using a template-focused approach, which can thus easily be interpreted and adapted by generative grammar creators.

Template Trees

We create and define the notion of template trees as an intermediary step for inducing a generative grammar, and propose an algorithm for learning template trees from input text.

Template Tree Definition

A template tree is a connected acyclic directed graph where each node represents a template that is more general than the template of all its child nodes, thus defining a partial ordering. The leaves of a template tree are templates without slots, i.e. the input sentences used to learn this tree. A template slot maps to zero or more other template elements (i.e. slots and/or word tokens). A simple template tree can be seen in Figure 2.





Figure 2: A template tree example, with leaf nodes "hello world", "hello people", "hi world" and "hi people".

Learning Template Trees

Merging Templates

To create a template for the parent node of two template nodes, we use a longest common subsequence algorithm on their word tokens to create the most specific template that is more general than its children. More specifically, we adapted the Wagner-Fischer algorithm and use the displacement matrix to insert slots when tokens differ [Wagner and Fischer1974]. When there are multiple longest common subsequences, the algorithm discards candidate templates that are longer than the original templates, to avoid overgeneralisation, and prefers templates with fewer slots.
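The merging step can be illustrated with a plain longest-common-subsequence table over word tokens. This is a simplified sketch and not the authors' adapted Wagner-Fischer implementation: the `SLOT` marker and the `merge_templates` name are hypothetical, and the tie-breaking between multiple longest subsequences described above is not implemented.

```python
SLOT = "<slot>"  # hypothetical slot marker


def merge_templates(a, b):
    """Merge two token lists: keep one longest common subsequence and
    insert a single slot wherever the tokens differ."""
    n, m = len(a), len(b)
    # lcs[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    merged = []

    def add_slot():
        # Adjacent slots are collapsed into one.
        if not merged or merged[-1] != SLOT:
            merged.append(SLOT)

    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            if a[i] == SLOT:
                add_slot()
            else:
                merged.append(a[i])
            i += 1
            j += 1
        else:
            add_slot()
            # Skip a token on the side that preserves the longer subsequence.
            if lcs[i + 1][j] >= lcs[i][j + 1]:
                i += 1
            else:
                j += 1
    if i < n or j < m:
        add_slot()  # leftover tokens on either side become a slot
    return merged


print(merge_templates("hello world".split(), "hello people".split()))
```

Merging "hello world" with "hello people" under these assumptions yields the tokens `hello` followed by one slot, which is more general than both inputs.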

Template Distance

The distance between two templates is defined in terms of the merged template causing the lowest distance to templates and :

where is the number of non-slot elements of a template , and the number of slots in .

Learning Algorithm

Algorithm 1 shows how a template tree is learned from a set of input texts. Initially, all pairs of input texts are stored in a priority queue, sorted by their distance as defined above. The algorithm keeps track of the active templates, i.e. templates without a parent, in a list. As long as this list contains more than one template, the algorithm takes all minimally distant pairs of templates from the queue in which both templates are still active. Every such minimally distant pair is merged, making both templates inactive and the new merged template active. All new templates are then paired up with the other active templates, and these pairs are added to the queue. Once the list contains only one template, this template becomes the root of the template tree. The template tree is reconstructed by adding all templates that were once merged into a particular template as children of the node of this template.

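Input: input texts
Initialise: active list containing all input texts; priority queue containing all pairs of input texts, sorted by distance
  while the active list contains more than one template do
     for all minimally distant pairs in the queue whose templates are both active do
        merge the pair; make both templates inactive and the merged template active
     end for
     for all newly merged templates do
        pair the template with every other active template and add the pairs to the queue
     end for
  end while
Algorithm 1 Calculating template tree from input texts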
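The loop described in this section can be sketched in Python with a heap as the priority queue. Several parts are assumptions made only to keep the example runnable: the `merge` function is a stand-in based on `difflib.SequenceMatcher` rather than the adapted Wagner-Fischer algorithm, the `distance` function is a placeholder (tokens lost by merging plus slots in the merged template) and not the paper's definition, and `Node`, `SLOT` and `learn_template_tree` are hypothetical names.

```python
import heapq
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from itertools import combinations, count

SLOT = "<slot>"  # hypothetical slot marker


@dataclass(eq=False)  # identity-based equality: equal templates stay distinct nodes
class Node:
    template: tuple
    children: list = field(default_factory=list)


def merge(a, b):
    """Stand-in merge: shared token runs are kept, differing runs become one slot."""
    merged = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        tokens = a[i1:i2] if tag == "equal" else (SLOT,)
        for tok in tokens:
            if tok == SLOT and merged and merged[-1] == SLOT:
                continue  # collapse adjacent slots
            merged.append(tok)
    return tuple(merged)


def distance(a, b):
    """Placeholder distance (an assumption, not the paper's formula)."""
    merged = merge(a, b)
    kept = sum(tok != SLOT for tok in merged)
    return (len(a) - kept) + (len(b) - kept) + (len(merged) - kept)


def learn_template_tree(texts):
    """Build a template tree from a non-empty list of input texts."""
    active = [Node(t) for t in dict.fromkeys(tuple(x.split()) for x in texts)]
    tie = count()  # tie-breaker so the heap never has to compare Node objects
    queue = [(distance(a.template, b.template), next(tie), a, b)
             for a, b in combinations(active, 2)]
    heapq.heapify(queue)
    while len(active) > 1:
        # Drop pairs that refer to templates which are no longer active.
        while not (queue[0][2] in active and queue[0][3] in active):
            heapq.heappop(queue)
        best = queue[0][0]
        new = []
        # Merge every minimally distant pair whose templates are both active.
        while queue and queue[0][0] == best:
            _, _, a, b = heapq.heappop(queue)
            if a in active and b in active:
                active.remove(a)
                active.remove(b)
                new.append(Node(merge(a.template, b.template), [a, b]))
        old = list(active)
        active.extend(new)
        # Pair each new template with every other active template.
        for i, node in enumerate(new):
            for other in old + new[i + 1:]:
                dist = distance(node.template, other.template)
                heapq.heappush(queue, (dist, next(tie), node, other))
    return active[0]


root = learn_template_tree(["hello world", "hello people", "hi world", "hi people"])
print(root.template)
for child in root.children:
    print("  ", child.template, [leaf.template for leaf in child.children])
```

With this placeholder distance, the four example sentences are first grouped by their shared first word, and the two resulting templates are then merged into a root consisting of a single slot. A different distance function may group the sentences differently.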

Gitta: Template Tree to Grammar

We introduce a new grammar induction algorithm named Gitta (Grammar Induction using a Template Tree Approach). Gitta aims to induce a non-recursive CFG, thus compactly representing a finite number of similar finite strings. While any finite language of size $n$ can trivially be represented by a simple grammar with $n$ production rules, having fewer production rules implies that patterns have been induced. This allows the grammar to potentially generate unseen examples from the language, and also makes it more easily modifiable. Gitta converts the template tree into a grammar by assuming independence between slot values and simplifying the template tree. The resulting slot values and root template then specify the grammar.

Pruning Template Tree

Gitta first prunes redundant children of template tree nodes. A child is redundant if all its descendant leaves are reachable through the other children. For each level, nodes are checked in ascending order of number of descendants, pruning nodes with less general templates first.
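A minimal sketch of this pruning rule is shown below. The `Node` class and function names are hypothetical, templates are plain strings for brevity, and children are ordered by their number of descendant leaves as an approximation of the number of descendants.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality between nodes
class Node:
    template: str
    children: list = field(default_factory=list)


def leaves(node):
    """Set of leaf templates reachable from `node`."""
    if not node.children:
        return {node.template}
    return set().union(*(leaves(child) for child in node.children))


def prune_redundant_children(node):
    """Remove children whose leaves are all reachable through their siblings."""
    # Check children with the fewest descendant leaves first.
    for child in sorted(node.children, key=lambda c: len(leaves(c))):
        others = [c for c in node.children if c is not child]
        covered = set().union(*(leaves(c) for c in others))
        if leaves(child) <= covered:
            node.children.remove(child)
    for child in node.children:
        prune_redundant_children(child)


hello_world, hello_people, hi_world = Node("hello world"), Node("hello people"), Node("hi world")
root = Node("<A> <B>", [
    Node("hello <B>", [hello_world, hello_people]),
    Node("<A> world", [hello_world, hi_world]),
    Node("hello world (duplicate branch)", [hello_world]),
])
prune_redundant_children(root)
print([child.template for child in root.children])
```

In this toy tree the third child only leads to "hello world", which is already reachable through both siblings, so it is the only child removed.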

Merging Slots

To convert the template tree into a grammar, Gitta assumes that all possible slot values are independent from all other slot values of the template. For every slot, all possible slot values are extracted from the templates of the children of the nodes having a template with this slot.

After finding all slot values for every slot , the algorithm merges similar slots if , where is determined by the user. Lower values of thus require slots to have less overlap in slot values in order to be merged. Gitta also removes if there is a slot such that and . If , the slot will also be removed from the slot values. If , then is replaced with . For example, for the tree in Figure 2, the algorithm would discover that has the same slot values as , and thus should be replaced by . This process continues until there is an iteration without any update.
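The threshold-based merging of slots can be sketched as follows. The similarity measure used here is the Jaccard index of the two value sets, which is an assumption made for illustration; the slot names, the `merge_similar_slots` function and the dictionary layout are likewise hypothetical, and the removal rules for contained slots are not covered.

```python
def jaccard(a, b):
    """Overlap between two value sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_similar_slots(slot_values, threshold):
    """Repeatedly merge slots whose value sets overlap at least `threshold`.

    Returns the merged slot values and a mapping from replaced slot to its replacement.
    """
    values = {slot: set(vals) for slot, vals in slot_values.items()}
    replaced = {}
    changed = True
    while changed:  # continue until an iteration makes no update
        changed = False
        names = sorted(values)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                if jaccard(values[first], values[second]) >= threshold:
                    values[first] |= values.pop(second)
                    # Redirect earlier replacements that pointed at the removed slot.
                    for key, target in replaced.items():
                        if target == second:
                            replaced[key] = first
                    replaced[second] = first
                    changed = True
                    break
            if changed:
                break
    return values, replaced


merged_values, replacements = merge_similar_slots(
    {"A": {"world", "people"}, "B": {"people", "world"}, "C": {"hello", "hi"}},
    threshold=0.5,
)
print(merged_values, replacements)
```

Here slot `B` has the same values as slot `A` and is replaced by it, while slot `C` shares no values with either and stays separate.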

Collapsing Template Tree

Using the merged slots, several simplifications are made to the template tree. First, the replacement mapping reduces the number of different slots of the template tree. Second, knowing the slot values for a slot helps reduce the number of nodes in the tree. For a node with template , and a child with template that contains a slot of and for which the template can be obtained by filling in other slots with known slot values into , then this child node is redundant and can be pruned. All children of are then added as direct children of . For example, for Figure 2, if the root template would be “”, the four middle nodes would collapse into their parent, leaving only “” as parent of the four leaf nodes. After collapsing the template tree using knowledge of the slot values, the template of each node is recalculated, which leads to the aforementioned new root node template of Figure 2. This process of simplification of the template tree and recalculation of the templates keeps repeating until the template tree is unchanged after an iteration. The resulting grammar is derived by mapping from start symbol to the root template of the template tree, and using slot values mappings as production rules.

Experiment: Reverse-engineering Grammars

To measure the performance of the algorithm, we test how well it can induce grammars from generations of a human-made grammar. We use Tracery grammars to generate a fixed number of sentences that serve as examples for Gitta. The algorithm also receives the depth of the original grammar as a parameter to limit the height of the template tree. After inducing a grammar, we compare how many of the elements it can generate are in the language defined by the original grammar, and how many are not. We also compare the sizes of the grammars, i.e. the number of production rules of the induced grammar to that of the original grammar. These production rules cannot have disjunctions on the right-hand side, meaning a rule with the shape A -> b | c would be normalised to the two rules A -> b and A -> c. Smaller grammars are generally more interpretable, and for Gitta the size is also an indication of how well the grammar compacted information.
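The two measurements can be sketched with plain set operations. The function names and the dictionary representation of a grammar are illustrative assumptions; the generation sets are assumed to have been enumerated beforehand.

```python
def grammar_size(grammar):
    """Number of production rules after splitting every disjunction into separate rules."""
    return sum(len(alternatives) for alternatives in grammar.values())


def compare_generations(induced, original):
    """Return (number of induced generations in the original language,
    number of induced generations not in the original language)."""
    induced, original = set(induced), set(original)
    return len(induced & original), len(induced - original)


figure_1 = {
    "S": ["I like putting <T> on my <F>"],
    "T": ["cheese", "pineapple", "soy sauce"],
    "F": ["pizza", "salad", "muesli", "sushi"],
}
print(grammar_size(figure_1))  # 8
print(compare_generations({"a b", "a c", "a d"}, {"a b", "a c"}))  # (2, 1)
```

Under this normalisation the grammar of Figure 1 counts as 8 rules: one for the template, three for the toppings and four for the dishes.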

Out of all Tracery bots listed on BotWiki (159), we downloaded all sources that were available on CheapBotsDoneQuick (58) and used all grammars without advanced syntax (47) that only generated text (31) with at most one million possible generations (10), in order to make it feasible to calculate their full set of generations. We also removed non-terminal modifiers, used e.g. for capitalisation and pluralisation, from the grammars, leaving only the bare non-terminals. We ran Gitta five times on every grammar, on randomized subsets of its generations of size 25, 50 and 100 examples, and took the median values over the runs.

                     Original grammar   From 25 examples    From 50 examples     From 100 examples
id  Name             generations  size  in    not in  size  in     not in  size  in     not in  size
1   botdoesnot       380292       363   648   0       64    2420   0       115   1596   4       179
2   BotSpill         43452        249   75    0       32    150    0       62    324    0       126
3   coldteabot       448          24    39    0       38    149    19      63    388    9       78
4   hometapingkills  4080         138   440   0       48    1184   3240    76    2536   7481    106
5   InstallingJava   390096       95    437   230     72    2019   1910    146   1156   3399    228
6   pumpkinspiceit   6781         6885  25    0       26    50     0       54    100    8       110
7   SkoolDetention   224          35    132   0       31    210    29      41    224    29      49
8   soundesignquery  15360        168   256   179     52    76     2       83    217    94      152
9   whatkilledme     4192         132   418   0       45    1178   0       74    2646   0       108
10  Whinge_Bot       450805       870   3092  6       80    16300  748     131   59210  1710    222
Table 1: Grammar induction results given a specific number of random generations of the original grammar, measuring the median number of generations of the induced grammar that are in and not in the target language, as well as the median grammar sizes, over five runs.

As can be seen in Table 1, the algorithm is generally able to induce grammars that generate significantly more elements of the original language than were shown as examples to the algorithm, usually with relatively few elements not in the original language. However, Gitta also sometimes uses relatively more rules to produce relatively fewer generations, most notably in grammars 1 and 2. This indicates that many rules are likely redundant or should be decomposed into simpler rules to allow for more generations. For grammar 6, generalisation is not possible because the origin template has only one slot, and this slot maps to different word lists, which also explains why it has more production rules than generations.

For grammars 4 and 5, Gitta tends to induce grammars with relatively large numbers of generations that are not in the original language. This is usually due to overgeneralisation. For example, a grammar that has the production rule “”, might lead to Gitta creating a more general rule “”, with “” and mapping to all values of and . For grammar 5 in particular, the origin template has four consecutive non-terminals separated from two other non-terminals by only one terminal, all mapping to a varying number of terminals. This property makes it unclear to Gitta where slots start and end, thus leading to overly specific production rules being added instead of clear slot values being found.

Discussion & Future Work

Gitta could be employed in a collaborative generative grammar building tool, where a designer and the algorithm create a generative grammar together. In this scenario, the designer could first provide several examples, or use an existing corpus, specifying what the grammar should generate. The algorithm then proposes a suitable grammar by discovering latent templates, thus creating an initial grammar prototype. The designer can then add, remove and modify production rules to further suit their needs, allowing more meaningful interactions than black-box text generators generally allow. This direct control could be used e.g. for limiting the possibilities of generating offensive or unwanted content, which is an important aspect of many text generation domains such as game development.

One limitation compared to other grammar induction algorithms is that Gitta cannot induce recursive grammars. As such, recursive production rules, such as those of the bracket language, cannot be learned by our system. However, since recursion is generally an unwanted property of generative grammars, as it makes grammars able to generate unbounded texts, our proposed algorithm also prevents the overgeneralisation that recursion would cause.


Gitta creates a basis for learning more complex, interpretable generative models. It could be trivially extended by learning probabilities of rules as a post-processing step using the input sentences. Another interesting extension is learning constraints that hold between expansions of non-terminals, and thus creating complex generative schemas.

We mainly see the use for this algorithm in automatically mimicking patterns or extending data sets that have some sort of (possibly latent) template in their texts, such as forum topic titles or writing and comedy prompts. Template trees themselves could also be used for discovering frequently occurring templates in a corpus, and provide functionality similar to that of clustering algorithms. The code of Gitta is available on


We introduced a new way of learning context-free grammars, focusing on interpretability and generative performance. We introduced the notion of template trees to achieve this purpose, as well as a learning algorithm for this structure and its transformations. The experiments indicate that the grammar induction algorithm is able to induce real grammars from few examples, showing its potential for use in collaborative modelling of grammars. We hope that this system could be a stepping stone towards automatic co-creation of complex but interpretable generative grammars.


Thomas Winters is a fellow of the Research Foundation-Flanders (FWO-Vlaanderen).


  • [Binsted and Ritchie1994] Binsted, K., and Ritchie, G. 1994. An implemented model of punning riddles. CoRR abs/cmp-lg/9406022.
  • [Compton, Kybartas, and Mateas2015] Compton, K.; Kybartas, B.; and Mateas, M. 2015. Tracery: An author-focused generative text tool. In Schoenau-Fog, H.; Bruni, L. E.; Louchart, S.; and Baceviciute, S., eds., Interactive Storytelling, 154–161. Cham: Springer International Publishing.
  • [Hong and Ong2009] Hong, B. A., and Ong, E. 2009. Automatically extracting word relationships as templates for pun generation. In Proceedings of the Workshop on Computational Approaches to Linguistic Creativity, CALC ’09, 24–31. Association for Computational Linguistics.
  • [Nevill-Manning and Witten1997] Nevill-Manning, C. G., and Witten, I. H. 1997. Identifying hierarchical structure in sequences: A linear-time algorithm. Journal of Artificial Intelligence Research.
  • [Wagner and Fischer1974] Wagner, R. A., and Fischer, M. J. 1974. The string-to-string correction problem. Journal of the ACM (JACM) 21(1):168–173.
  • [Winters, Nys, and De Schreye2018] Winters, T.; Nys, V.; and De Schreye, D. 2018. Automatic joke generation: Learning humor from examples. In Distributed, Ambient and Pervasive Interactions: Technologies and Contexts, volume 10922 LNCS, 360–377. Springer International Publishing.
  • [Winters2019a] Winters, T. 2019a. Generating philosophical statements using interpolated markov models and dynamic templates. In 31st European Summer School in Logic, Language and Information Student Session Proceedings, 181–189. ESSLLI.
  • [Winters2019b] Winters, T. 2019b. Modelling mutually interactive fictional character conversational agents. In Proceedings of the 31st Benelux Conference on Artificial Intelligence (BNAIC 2019) and the 28th Belgian Dutch Conference on Machine Learning (Benelearn 2019), volume 2491.