Reinforcement learning of minimalist grammars

04/30/2020 ∙ by Peter beim Graben, et al. ∙ 0

Speech-controlled user interfaces make it easier for lay users to operate devices and household functions. State-of-the-art language technology scans the acoustically analyzed speech signal for relevant keywords that are subsequently inserted into semantic slots to interpret the user's intent. In order to develop proper cognitive information and communication technologies, simple slot-filling should be replaced by utterance meaning transducers (UMT) that are based on semantic parsers and a mental lexicon, comprising syntactic, phonetic and semantic features of the language under consideration. This lexicon must be acquired by a cognitive agent during interaction with its users. We outline a reinforcement learning algorithm for the acquisition of syntax and semantics of English utterances, based on minimalist grammar (MG), a recent computational implementation of generative linguistics. English declarative sentences are presented to the agent by a teacher in the form of utterance meaning pairs (UMP) where the meanings are encoded as formulas of predicate logic. Since MG codifies universal linguistic competence through inference rules, thereby separating innate linguistic knowledge from the contingently acquired lexicon, our approach unifies generative grammar and reinforcement learning, hence potentially resolving the still pending Chomsky-Skinner controversy.




1 Introduction

Speech-controlled user interfaces Young10 such as Amazon’s Alexa, Apple’s Siri or Cortana by Microsoft make it substantially easier for lay users to operate devices and household functions. Instead of using keyboard and display as input-output interfaces, the operator pronounces requests or instructions to the device and listens to its responses. One important future development will be Smart Home and Ambient Assisted Living applications in the health sciences MartinWeibel18 .

State-of-the-art language technology scans the acoustically analyzed speech signal for relevant keywords that are subsequently inserted into semantic frames Minsky74 ; Fillmore88 to interpret the user’s intent. This slot-filling procedure Allen03 ; TurHakkaniEA11 ; MesnilDauphinEA15 is based on large language corpora that are evaluated by standard machine learning methods, such as conditional random fields or by deep neural networks MesnilDauphinEA15 , for instance. The necessity to overcome traditional slot-filling by proper cognitive information and communication technologies BaranyiCsapoSallai15 has already been emphasized by Allen Allen17 . His research group trains semantic parsers from large language databases such as WordNet or VerbNet that are constrained by hand-crafted expert knowledge and semantic ontologies Allen03 ; Allen14 ; AllenBahkshandehEA18 ; PereraAllenEA18b .

One particular demand on cognitive user interfaces is the processing and understanding of declarative or imperative sentences. Consider, e.g., a speech-controlled heating device, a cognitive heating DuckhornHuberEA17 ; KlimczakWolffLindemannEA14 ; TschopeEA18 ; WolffMeyerRomer15 ; WolffRomerWirsching15 , with the operator’s utterance “I am cold!” This declarative sentence must firstly be analyzed syntactically to assign “I” to the subject position and “am cold” to the predicate. Secondly, a semantic analysis interprets “I” as the speaker and “am cold” as a particular subjective state. From this representation of the speaker’s state, the system must compute logical inferences and then respond accordingly, by increasing the room temperature and probably by giving a linguistic feedback signal “I increase the temperature to 22 degrees”. Technically, this could be achieved using feature-value relations (FVR) as semantic representations and modified Markov-decision processes (MDP) for behavior control TschopeEA18 ; WolffMeyerRomer15 ; WolffRomerWirsching15 .

Recent research in computational linguistics has demonstrated that quite different grammar formalisms, such as tree-adjoining grammar JoshiLevyTakahashi75 , multiple context-free grammar SekiEA91 , range concatenation grammar Boullier05 , and minimalist grammar Stabler97 ; StablerKeenan03 converge toward universal description models KuhlmannKollerSatta15 ; JoshiVijayWeir90 ; Michaelis01a ; Stabler11a . Minimalist grammar (MG) has been developed by Stabler Stabler97 to mathematically codify Chomsky’s Minimalist Program Chomsky95 in the generative grammar framework. A minimalist grammar consists of a mental lexicon storing linguistic signs as arrays of syntactic, phonetic and semantic features, on the one hand, and of two structure-building functions, called “merge” and “move”, on the other hand. Syntactic features in the lexicon are, e.g., the linguistic base types noun (n), verb (v), adjective (a), determiner (d), inflection (i.e. tense t), or preposition (p). These are syntactic heads selecting other categories either as complements or as adjuncts. The structure generation is controlled by selector categories that are “merged” together with their selected counterparts. Moreover, one distinguishes between licensors and licensees, triggering the movement of maximal projections. An MG does not comprise any phrase structure rules; all syntactic information is encoded in the feature array of the mental lexicon. Furthermore, syntax and compositional semantics can be combined via the lambda calculus Niyogi01 ; Kobele09 , while MG parsing can be straightforwardly implemented through bottom-up Harkema01 , top-down Harkema01 ; Mainguy10 ; Stabler11b , and more recently also through left-corner automata StanojevicStabler18 .

One important property of MG is their effective learnability in the sense of Gold’s formal learning theory Gold67 . Specifically, MG can be acquired by positive examples BonatoRetore01 ; KobeleCollierEA02 ; StablerEA03 from linguistic dependence graphs BostonHaleKuhlmann10 ; Nivre03 ; KleinManning04 , which is consistent with psycholinguistic findings on early-child language acquisition Ellis06 ; Diessel13 ; Gee94 ; Pinker95 ; Tomasello06 . However, learning through positive examples only could easily lead to overgeneralization. According to Pinker Pinker95 this could largely be avoided through reinforcement learning Skinner15 ; SuttonBarto18 . Although there is only little psycholinguistic evidence for reinforcement learning in human language acquisition Moerk83 ; SundbergEA96 , we outline in this chapter a machine learning algorithm that acquires an MG mental lexicon GrabenEA20 of the syntax and semantics of English declarative sentences through reinforcement learning. Instead of looking at pure syntactic dependencies as in BonatoRetore01 ; KobeleCollierEA02 ; StablerEA03 , our approach directly uses their underlying semantic dependencies for the simultaneous segmentation of syntax and semantics.

2 Minimalist Grammar

Our language acquisition approach for minimalist grammar combines methods from computational linguistics, formal logic, and abstract algebra. The starting point of our algorithm is the utterance meaning pair (UMP) WirschingLorenz13 ; GrabenEA20

$$ u = \langle e, \sigma \rangle \qquad (1) $$

where $e \in E$ is the spoken or written utterance, given as the exponent of a linguistic sign Kracht03 . Technically, exponents are strings taken from the Kleene hull of some finite alphabet $A$, i.e. $E = A^*$. The sign’s semantics $\sigma \in \Sigma$ is a logical term, expressed by means of predicate logic and the (untyped) lambda calculus Church36 .

2.1 Semantics

As an example, we consider the simple UMP

$$ u = \langle \texttt{the mouse eats cheese}, \texttt{eat(cheese)(mouse)} \rangle \qquad (2) $$

in the sequel. We use typewriter font throughout the chapter to emphasize that utterances are regarded plainly as symbolic tokens without any intended meaning in the first place. This applies even to the “semantic” representation in terms of first order predicate logic where we use the Schönfinkel-Curry Lohnstein11 ; Schonfinkel24 notation here. Therefore, the expression above indicates that $\texttt{eat}$ is a binary predicate, fetching first its direct object $\texttt{cheese}$ to form a unary predicate, $\texttt{eat(cheese)}$, that then takes its subject $\texttt{mouse}$ in the second step to build the proposition $\texttt{eat(cheese)(mouse)}$ of the utterance (2).
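The Schönfinkel-Curry notation can be illustrated with a minimal Python sketch. The set `FACTS` is a hypothetical stand-in for a universe of discourse and is not part of the original exposition; the point is only that the binary predicate takes its direct object first and returns a unary predicate that then takes the subject.

```python
# Hypothetical toy model: each fact is (predicate, direct object, subject).
FACTS = {("eat", "cheese", "mouse")}

def eat(direct_object):
    """Curried binary predicate: fetch the direct object first."""
    def unary_predicate(subject):
        # The second step takes the subject and yields a proposition.
        return ("eat", direct_object, subject) in FACTS
    return unary_predicate

eats_cheese = eat("cheese")          # unary predicate eat(cheese)
proposition = eats_cheese("mouse")   # proposition eat(cheese)(mouse)
print(proposition)
```

The intermediate value `eats_cheese` corresponds to the unary predicate obtained after the first argument has been consumed.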

Following Kracht Kracht03 , we regard a linguistic sign as an ordered triple

$$ z = \langle e, t, \sigma \rangle \qquad (3) $$

with the same exponent $e \in E$ and semantics $\sigma \in \Sigma$ as in the UMP (1). In addition, $t \in T$ is a syntactic type that we encode by means of minimalist grammar (MG) in its chain representation StablerKeenan03 . The type controls the generation of syntactic structure and hence the order of lambda application, analogously to the typed lambda calculus Church40 in Montague semantics Lohnstein11 . In order to avoid redundancy, we use the plain (untyped) lambda calculus in the following Church36 .

Lambda calculus is a mathematical formalism developed by Church in the 1930s “to model the mathematical notion of substitution of values for bound variables” according to Wegner2003 . Although the original application was in the area of computability (cf. Church36 ), the substitution of parts of a term with other terms is often the central notion when lambda calculus is used. This is also true in our case, and we have to clarify the relevant concepts first, namely variable, bound and free, term, and substitution.

To be applicable to any universe of discourse one prerequisite of lambda calculus is that variables are elements from “an enumerably infinite set of symbols” Church36 . However, for usage in a specific domain, a finite set is sufficient. Since we aim at terms from first order predicate logic, treating them with the operations from lambda calculus, all their predicates and individuals need to be in the set of variables. Additionally, we will use the symbols as variables for individuals and as variables for (parts of) logical terms. The set is thus used as the set of variables. Note, that the distinction made by and is not on the level of lambda calculus but rather a visual clue to the reader.

The term algebra of lambda calculus is inductively defined as follows. i) Every variable $x$ is a term and $x$ is a free variable in the term $x$; specifically, every well-formed formula of predicate logic is also a term. ii) Given a term $M$ and a variable $x$ which is free in $M$, the expression $\lambda x.M$ is also a term and the variable $x$ is now bound in $\lambda x.M$. Every other variable $y$ in $M$ different from $x$ is free (bound) in $\lambda x.M$ if it is free (bound) in $M$. iii) Given two terms $M$ and $N$, the expression $M(N)$ is also a term and every variable which is free (bound) in $M$ or $N$ is free (bound) in $M(N)$. Such a term is often referred to as operator-operand combination Wegner2003 or functional application Lohnstein11 . For disambiguation we also allow parentheses around terms. The introduced syntax differs from the original one where additionally braces and brackets are used to mark the different types of terms (cf. Church36 ). Sometimes, $M(N)$ is also written as $(M N)$ and the dot between $M$ and the variable is left out (cf. Wegner2003 ).

For a given variable $x$ and two terms $M$ and $N$ the operation of substitution is $M[x \leftarrow N]$ (originally written as $S^{x}_{N} M|$ in Church36 and sometimes without the right bar, i.e. as $S^{x}_{N} M$ in Wegner2003 ) and stands for the result of substituting $N$ for all instances of $x$ in $M$.

Church defined three conversions based on substitution.

  • Renaming bound variables by replacing any part $\lambda x.M$ of a term by $\lambda y.M[x \leftarrow y]$ when the variable $y$ does not occur in the term $M$.

  • Lambda application by replacing any part $(\lambda x.M)(N)$ of a term by $M[x \leftarrow N]$, when the bound variables in $M$ are distinct both from $x$ and the free variables in $N$.

  • Lambda abstraction by replacing any part $M[x \leftarrow N]$ of a term by $(\lambda x.M)(N)$, when the bound variables in $M$ are distinct both from $x$ and the free variables in $N$.

The first conversion simply states that names of bound variables have no particular meaning on their own. The second and third conversions are of special interest to our aims. Lambda application allows the composition of logical terms out of predicates, individuals and other logical terms while lambda abstraction allows the creation of templates of logical terms.
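The term algebra and the lambda application conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Var`, `Lam`, `App` and the helper names are hypothetical, predicates and individuals are treated as variables (as described above), and the variable order in the example term is chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

def free_vars(t):
    """Variables that occur free in term t."""
    if isinstance(t, Var):
        return {t.name}
    if isinstance(t, Lam):
        return free_vars(t.body) - {t.var}
    return free_vars(t.fun) | free_vars(t.arg)

def bound_vars(t):
    """Variables that are bound by some lambda inside term t."""
    if isinstance(t, Var):
        return set()
    if isinstance(t, Lam):
        return bound_vars(t.body) | {t.var}
    return bound_vars(t.fun) | bound_vars(t.arg)

def substitute(t, x, n):
    """Replace the free occurrences of variable x in t by term n."""
    if isinstance(t, Var):
        return n if t.name == x else t
    if isinstance(t, Lam):
        return t if t.var == x else Lam(t.var, substitute(t.body, x, n))
    return App(substitute(t.fun, x, n), substitute(t.arg, x, n))

def apply_once(t):
    """Lambda application at the root: (λx.M)(N) becomes M with N for x."""
    if isinstance(t, App) and isinstance(t.fun, Lam):
        x, m, n = t.fun.var, t.fun.body, t.arg
        # Side condition: bound variables of M differ from x and from
        # the free variables of N; otherwise renaming would be needed.
        if bound_vars(m) & (free_vars(n) | {x}):
            raise ValueError("rename bound variables first")
        return substitute(m, x, n)
    return t

def show(t):
    """Render a term in the M(N) notation used in the text."""
    if isinstance(t, Var):
        return t.name
    if isinstance(t, Lam):
        return "λ" + t.var + "." + show(t.body)
    fun = "(" + show(t.fun) + ")" if isinstance(t.fun, Lam) else show(t.fun)
    return fun + "(" + show(t.arg) + ")"

# Illustrative term (λx.λy.eat(x)(y)) applied to cheese and then to mouse.
template = Lam("x", Lam("y", App(App(Var("eat"), Var("x")), Var("y"))))
step1 = apply_once(App(template, Var("cheese")))
step2 = apply_once(App(step1, Var("mouse")))
print(show(step1), "->", show(step2))
```

The sketch refuses to reduce when the side condition of the conversion is violated instead of renaming automatically, which keeps it close to the wording of the three conversions.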

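Applied to our example (2), we have the sign

$$ \langle \texttt{the mouse eats cheese}, \texttt{:c}, \texttt{eat(cheese)(mouse)} \rangle \qquad (4) $$
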
where the now appearing MG type :c indicates that the sign is complex (not lexical) and a complementizer phrase of type c. Its compositional semantics Lohnstein11 can be described by lambda terms, the predicate $\texttt{eat}$ and the individuals $\texttt{cheese}$ and $\texttt{mouse}$. A lambda term with two bound variables that is applied to two operands is converted by two successive lambda applications into the logical term $\texttt{eat(cheese)(mouse)}$. It is also possible to arrange the parts of the term in a different way, by applying a lambda term to a logical term and an individual; two successive lambda applications again yield a logical term. Thus, logical terms can be composed through lambda application.

Moreover, given the logical term $\texttt{eat(cheese)(mouse)}$, two successive lambda abstractions yield a term in which both individuals are replaced by bound variables, leaving out the operand parts. In that way, templates of logical terms are created where different individuals can be inserted for term evaluation. Both processes are crucial for our utterance-meaning transducer and machine language acquisition algorithms below.
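The creation of templates through lambda abstraction can be sketched as follows. The nested-tuple encoding, the function names, and the variable names `x` and `y` are assumptions made for illustration only; the sketch abstracts first over the subject and then over the direct object of the example proposition.

```python
# Terms are strings (symbols) or tuples ("app", fun, arg) / ("lam", var, body).

def occurs(term, symbol):
    """True if symbol appears anywhere in term."""
    if isinstance(term, str):
        return term == symbol
    return any(occurs(part, symbol) for part in term[1:])

def replace(term, old, new):
    """Replace every occurrence of the symbol old in term by new."""
    if isinstance(term, str):
        return new if term == old else term
    tag, left, right = term
    if tag == "lam":
        return term if left == old else ("lam", left, replace(right, old, new))
    return ("app", replace(left, old, new), replace(right, old, new))

def abstract(term, individual, variable):
    """Lambda abstraction: return the template λvariable.M and its operand."""
    if occurs(term, variable):
        raise ValueError("variable must be fresh")
    return ("lam", variable, replace(term, individual, variable)), individual

def show(term):
    if isinstance(term, str):
        return term
    tag, left, right = term
    if tag == "lam":
        return "λ" + left + "." + show(right)
    fun = "(" + show(left) + ")" if not isinstance(left, str) and left[0] == "lam" else show(left)
    return fun + "(" + show(right) + ")"

proposition = ("app", ("app", "eat", "cheese"), "mouse")   # eat(cheese)(mouse)
template1, operand1 = abstract(proposition, "mouse", "y")
template2, operand2 = abstract(template1, "cheese", "x")
print(show(template1), "|", show(template2))
```

Leaving out the operands `operand1` and `operand2` gives the template into which other individuals can be inserted later.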

2.2 Syntax

An MG consists of a database, the mental lexicon, containing signs as arrays of syntactic, phonetic and semantic features, and of two structure-generating functions, called “merge” and “move”. Syntactic features are the basic types $b \in B$ from a finite set $B$, with $\texttt{n}, \texttt{v}, \texttt{a} \in B$, etc., together with a set of their respective selectors $S = \{ \texttt{=}b \mid b \in B \}$ that are unified by the “merge” operation. Moreover, one distinguishes between a set of licensers $L_+ = \{ \texttt{+}l \mid l \in L \}$ and another set of their corresponding licensees $L_- = \{ \texttt{-}l \mid l \in L \}$ triggering the “move” operation. $L$ is another finite set of movement identifiers. $F = B \cup S \cup L_+ \cup L_-$ is called the feature set. Finally, one has a two-element set $C = \{ \texttt{::}, \texttt{:} \}$ of categories, where “::” indicates simple, lexical categories while “:” denotes complex, derived categories. The ordering of syntactic features as they appear originally in the lexicon is prescribed as regular expressions, i.e. $T = C (S \cup L_+)^* B L_-^*$ is the set of syntactic lexical types Stabler97 ; StablerKeenan03 . The set of linguistic signs is then given as $Z = E \times T \times \Sigma$ Kracht03 .

[Footnote 1: Departing from the convention in the literature StablerKeenan03 , we call the elements of $C$ categories due to a tentative interpretation in terms of indexed grammars Aho68 ; Staudacher93 . We shall address this interesting issue in subsequent research.]

Let be exponents, semantic terms in the lambda calculus, one feature identifier, feature strings compatible with the regular types in , and sequences of signs, then and form signs in the sense of (3). A sequence of signs is called a minimalist expression, and the first sign of an expression is called its head, controlling the structure building through “merge” and “move” as follows.

The MG function “merge” is defined through inference schemata


Correspondingly, “move” is given through


where only one sign with licensee may appear in the expression licensed by in the head. This so-called shortest movement constraint (SMC) guarantees syntactic locality demands Stabler97 ; StablerKeenan03 .

A minimalist derivation terminates when all syntactic features except one distinguished start symbol have been consumed. We conventionally use the complementizer phrase c as the start symbol.

To illustrate the rules (5–9) and their applicability, let us stick with the example UMP (2). Its syntactic analysis in terms of generative grammar Haegeman94 yields the (simplified) phrase structure tree in Fig. 1(a).

[Footnote 2: For the sake of simplicity we refrain from presenting full-fledged X-bar hierarchies Haegeman94 .]

[Figure 1, panel (a) in bracket notation: [CP [DP [D the] [N mouse]] [IP [I ] [I′ [I -s] [VP [V ] [N cheese]]]]]; panel (b): semantic tree.]

Figure 1: Generative grammar analysis of example UMP (2). (a) Syntactic phrase structure tree. (b) Semantic tree from lambda calculus.

The syntactic categories in Fig. 1(a) are the maximal projections CP (complementizer phrase), IP (inflection phrase), VP (verbal phrase), and DP (determiner phrase). Furthermore, there are the intermediary node I′ and the heads I (inflection), D (determiner), V (verb), and N (noun), corresponding to t, d, v, and n in MG, respectively. Note that inflection is lexically realized only by the present tense suffix -s. Moreover, the verb eat has been moved out of its base-generated position, leaving the empty string there. Movement is indicated by co-indexing.

Correspondingly, we present a simple semantic analysis in Fig. 1(b) using the notation from Sect. 2.1 together with the lambda calculus of the binary predicate in its Schönfinkel-Curry representation Lohnstein11 ; Schonfinkel24 .

Guided by the linguistic analyses in Fig. 1, an expert could construct a minimalist lexicon as given in Tab. 1 by hand StablerKeenan03 .

Table 1: Minimalist lexicon for example grammar Fig. 1.

We adopt a shallow semantic model, where the universe of discourse only contains two individuals, the mouse and a piece of cheese. Then, the lexicon Tab. 1 is interpreted as follows. Since all entries are contained in the MG lexicon, they are of category “::”. There are two nouns (n), mouse and cheese, with their respective semantics as individual constants, $\texttt{mouse}$ and $\texttt{cheese}$. In contrast to mouse, the latter possesses a licensee -k for case marking. The same holds for the determiner the, which selects a noun (=n) as its complement to form a determiner phrase d that also requires case assignment (-k) afterwards. The verb (v) eat selects a noun as a complement and has to be moved for inflection (-f). Its compositional semantics is given by the binary predicate $\texttt{eat}$ whose argument variables are bound by two lambda expressions. Moreover, we have an inflection suffix -s for present tense in third person singular, taking a predicate (pred) as complement, then triggering firstly inflection movement +f and secondly case assignment +k, whose type is tense (t). Finally, there are two entries that are phonetically not realized. The first one selects a verbal phrase =v and assigns case +k afterwards; then, it selects a determiner phrase =d as subject and has its own type predicate pred; additionally, we prescribe an intertwiner of two abstract lambda expressions as its semantics. The last entry provides a simple type conversion from tense t to complementizer c in order to arrive at a well-formed sentence with start symbol c.

[Footnote 3: Moreover, we abstract our analysis from temporal and numeral semantics and also from the intricacies of the semantics of noun phrases in the present exposition.]

Using the lexicon Tab. 1, the sign (4) is obtained by the minimalist derivation (10).


In the first step, (10-1), the determiner the takes the noun mouse as its complement to form a determiner phrase d that requires licensing through case marking afterwards. In step 2, the finite verb eat selects the noun cheese as direct object, thus forming a verbal phrase v. As there remain unchecked features, only merge-3 applies, yielding a minimalist expression, i.e. a sequence of signs. In step 3, the phonetically empty predicate pred merges with the formerly built verbal phrase. Since pred assigns accusative case, the direct object is moved in (10-4) toward the first position through case marking, while the respective lambda terms are concatenated simultaneously. Thus, lambda application entails the expression in step 5. Then, in step 6, the predicate selects its subject, the formerly constructed determiner phrase. In the seventh step, (10-7), the present-tense suffix unifies with the predicate, entailing an inflection phrase pred, whose verb is moved into the first position in step 8, thereby yielding the inflected verb eat-s. In steps 9 and 10 two lambda applications result in the correct semantics, already displayed in Fig. 1(b). Step 11 assigns nominative case to the subject through movement into specifier position. A further lambda application in step 12 yields the intended interpretation of predicate logic. Finally, in step 13, the syntactic type t is converted into c to obtain the proper start symbol of the grammar.
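The syntactic part of this derivation can be replayed with a small Python sketch of “merge” and “move” on chains, following the standard chain-based rules merge-1, merge-2, merge-3, move-1 and move-2 of Stabler and Keenan. Several things are assumptions made for illustration: the lexicon is transcribed from the prose description of Tab. 1, semantic terms and the lambda application steps are omitted, and the helper `concat`, which glues a suffix such as -s to the preceding exponent, is a hypothetical convention.

```python
# A chain is (exponent, is_lexical, features); an expression is a list of
# chains whose first element is the head.

def lex(exponent, features):
    return [(exponent, True, features.split())]

def concat(left, right):
    """Join exponents; a leading '-' marks a suffix (hypothetical convention)."""
    if right.startswith("-"):
        return left + right[1:]
    return " ".join(part for part in (left, right) if part)

def merge(selector, selected):
    (e1, lexical, f1), rest1 = selector[0], selector[1:]
    (e2, _, f2), rest2 = selected[0], selected[1:]
    if not (f1 and f2 and f1[0] == "=" + f2[0]):
        raise ValueError("selector does not match selected category")
    t1, t2 = f1[1:], f2[1:]
    if t2:        # merge-3: selected item keeps licensees, stays a separate chain
        return [(e1, False, t1)] + rest1 + [(e2, False, t2)] + rest2
    if lexical:   # merge-1: lexical selector, complement attaches to the right
        return [(concat(e1, e2), False, t1)] + rest1 + rest2
    return [(concat(e2, e1), False, t1)] + rest1 + rest2   # merge-2: specifier left

def move(expression):
    (e1, _, f1), rest = expression[0], expression[1:]
    if not (f1 and f1[0].startswith("+")):
        raise ValueError("head has no licenser")
    licensee = "-" + f1[0][1:]
    movers = [i for i, (_, _, f) in enumerate(rest) if f and f[0] == licensee]
    if len(movers) != 1:   # shortest movement constraint (SMC)
        raise ValueError("SMC violated")
    i = movers[0]
    e2, _, f2 = rest[i]
    if f2[1:]:             # move-2: mover keeps further licensees
        return [(e1, False, f1[1:])] + rest[:i] + [(e2, False, f2[1:])] + rest[i + 1:]
    return [(concat(e2, e1), False, f1[1:])] + rest[:i] + rest[i + 1:]   # move-1

# Lexicon as described in the text (semantics omitted).
mouse, cheese = lex("mouse", "n"), lex("cheese", "n -k")
the, eat = lex("the", "=n d -k"), lex("eat", "=n v -f")
suffix_s = lex("-s", "=pred +f +k t")
empty_pred, empty_c = lex("", "=v +k =d pred"), lex("", "=t c")

dp = merge(the, mouse)          # the mouse : d -k
vp = merge(eat, cheese)         # eat : v -f, cheese : -k
pred = merge(empty_pred, vp)    # ε : +k =d pred, eat : -f, cheese : -k
pred = move(pred)               # cheese : =d pred, eat : -f
pred = merge(pred, dp)          # cheese : pred, eat : -f, the mouse : -k
tp = merge(suffix_s, pred)      # -s cheese : +f +k t, eat : -f, the mouse : -k
tp = move(tp)                   # eats cheese : +k t, the mouse : -k
tp = move(tp)                   # the mouse eats cheese : t
cp = merge(empty_c, tp)         # the mouse eats cheese : c
print(cp)
```

The intermediate expressions agree with the comma-separated exponent sequences shown in the derivation tree of Fig. 2.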

2.3 Utterance-Meaning Transducer

Derivations such as (10) are essential for minimalist grammar. However, their computation is neither incremental nor predictive. Therefore, they are not suitable for natural language processing in their present form of data-driven bottom-up processing. A number of different parsing architectures have been suggested in the literature to remedy this problem Harkema01 ; Mainguy10 ; Stabler11b ; StanojevicStabler18 . From a psycholinguistic point of view, predictive parsing appears most plausible, because a cognitive agent should be able to make informed guesses about a speaker’s intents as early as possible, without waiting for the end of an utterance Hale11 . This makes either a hypothesis-driven top-down parser or a mixed-strategy left-corner parser desirable also for language engineering applications. In this section, we briefly describe a bidirectional utterance-meaning transducer (UMT) for MG that is based upon Stabler’s top-down recognizer Stabler11b as outlined earlier in GrabenMeyerEA19 . Its generalization towards the recent left-corner parser StanojevicStabler18 is straightforward.

The central object for MG language processing is the derivation tree obtained from a bottom-up derivation as in (10). Figure 2 depicts this derivation tree, where we present a comma-separated sequence of exponents for the sake of simplicity. Additionally, every node is addressed by an index tuple that is computed according to Stabler’s algorithm Stabler11b .

the mouse eats cheese
  ε (0)
  the mouse eats cheese (1)
    eats cheese, the mouse (11, 10)
      -s cheese, eat, the mouse (111, 110, 10)
        -s (1110)
        cheese, eat, the mouse (1111, 110, 10)
          cheese, eat (1111, 110)
            ε, eat, cheese (11111, 110, 11110)
              ε (11111)
              eat, cheese (110, 11110)
                eat (110)
                cheese (11110)
          the mouse (10)
            the (100)
            mouse (101)

Figure 2: Simplified derivation tree of (10). Exponents of different signs are separated by commas. Nodes are also addressed by index tuples.

Pursuing the tree paths in Fig. 2 from the bottom to the top provides exactly the derivation (10). However, reading it from the top towards the bottom allows for an interpretation in terms of multiple context-free grammars SekiEA91 ; Michaelis01a (MCFG) where categories are n-ary predicates over string exponents. As in context-free grammars, every branching in the derivation tree Fig. 2 leads to one phrase structure rule in the MCFG. Thus, the MCFG enumerated in (11) codifies the MG Tab. 1.


In (11), the angular brackets enclose the MCFG categories that are formed by tuples of MG categories and syntactic types. These categories have the same number of string arguments as prescribed in the type tuples. Because MCFG serve only for syntactic parsing in our UMT, we deliberately omit the semantic terms here; they are reintroduced below. The MCFG rules (11-1) to (11-9) are directly obtained from the derivation tree Fig. 2 by reverting the merge and move operations of (10) through their “unmerge” and “unmove” counterparts Harkema01 . The MCFG axioms, i.e. the lexical rules (11-10) to (11-16), are reformulations of the entries in the MG lexicon Tab. 1.

Language Production

The UMT’s language production module finds a semantic representation of an intended utterance in the form of a Schönfinkel-Curry Lohnstein11 ; Schonfinkel24 formula of predicate logic, such as $\texttt{eat(cheese)(mouse)}$, for instance. According to Fig. 1(b) this is a hierarchical data structure that can control the MG derivation (10). Thus, the cognitive agent accesses its mental lexicon, either through Tab. 1 or its MCFG twin (11), in order to retrieve the linguistic signs for the denotations eat, cheese, and mouse. Then, the semantic tree Fig. 1(b) governs the correct derivation (10) up to lexicon entries that are phonetically empty. These must be queried from the database whenever required. At the end of the derivation the computed exponent the mouse eats cheese is uttered.

Language Understanding

The language understanding module of our UMT comprises three memory tapes: the input sequence, a syntactic priority queue, and also a semantic priority queue. Input tape and syntactic priority queue together constitute Stabler’s priority queue top-down parser Stabler11b . Yet, in order to compute the meaning of an utterance in the semantic priority queue, we slightly depart from the original proposal by omitting the simplifying trim function. Table 2 presents the temporal evolution of the top-down recognizer’s configurations while processing the utterance the mouse eats cheese.

step  input                  syntactic queue  operation
1.    the mouse eats cheese                   expand (11-1)
2.    the mouse eats cheese                   scan (11-16)
3.    the mouse eats cheese                   expand (11-2)
4.    the mouse eats cheese                   expand (11-3)
5.    the mouse eats cheese                   expand (11-4)
6.    the mouse eats cheese                   sort
7.    the mouse eats cheese                   expand (11-5)
8.    the mouse eats cheese                   sort
9.    the mouse eats cheese                   expand (11-9)
10.   the mouse eats cheese                   scan (11-12)
11.   mouse eats cheese                       scan (11-10)
12.   eats cheese                             expand (11-6)
13.   eats cheese                             expand (11-7)
14.   eats cheese                             sort
15.   eats cheese                             expand (11-8)
16.   eats cheese                             sort
17.   eats cheese                             scan (11-13)
18.   -s cheese                               scan (11-14)
19.   cheese                                  scan (11-11)
20.                                           scan (11-15)
21.                                           accept
Table 2: MG top-down parse of the mouse eats cheese.

The parser is initialized with the input string to be processed and the MCFG start symbol (corresponding to the MG start symbol c) at the top of the priority queue. For each rule of the MCFG (11), the algorithm replaces its string variables by an index tuple that addresses the corresponding nodes in the derivation tree Fig. 2 Stabler11b . These indices allow for an ordering relation where shorter indices are smaller than longer ones, while indices of equal length are ordered lexicographically. As a consequence, the MCFG axioms in (11) become ordered according to their temporal appearance in the utterance. Using the notation of the derivation tree Fig. 2, we get

ε (0) < the (100) < mouse (101) < eat (110) < -s (1110) < cheese (11110) < ε (11111).

Hence, index sorting ensures incremental parsing.
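The ordering relation on indices can be written as a sort key. In the following minimal sketch the index strings are taken from the leaves of the derivation tree in Fig. 2, while the labels `eps_c` and `eps_pred` for the two phonetically empty items are hypothetical names.

```python
def index_key(index):
    """Shorter indices are smaller; equal lengths compare lexicographically."""
    return (len(index), index)

# Leaf indices read off the derivation tree, listed in arbitrary order.
axioms = {
    "cheese": "11110",
    "the": "100",
    "-s": "1110",
    "eps_c": "0",
    "mouse": "101",
    "eat": "110",
    "eps_pred": "11111",
}

ordered = sorted(axioms, key=lambda item: index_key(axioms[item]))
print(ordered)
```

The resulting order coincides with the sequence of scan operations in Tab. 2, which is what makes the parse incremental.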

Besides the occasional sorting of the syntactic priority queue, the automaton behaves as a conventional context-free top-down parser. When the first item in the queue is an MCFG category appearing at the left hand side of an MCFG rule, this item is expanded into the right hand side of that rule. When the first item in the queue is a predicted MCFG axiom whose exponent also appears on top of the input tape, this item is scanned from the input and thereby removed from queue and input simultaneously. Finally, if queue and input both contain only the empty word, the utterance has been successfully recognized and the parser terminates in the accepting state.
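The expand, scan and accept operations can be illustrated with a conventional context-free top-down recognizer. The toy grammar below is hypothetical and much simpler than the MCFG (11); it has no index tuples and no sorting, so it only shows the control flow described above.

```python
# Hypothetical toy CFG: nonterminals map to lists of right-hand sides.
GRAMMAR = {
    "S": [["DP", "VP"]],
    "DP": [["the", "mouse"]],
    "VP": [["eats", "cheese"]],
}

def recognize(queue, tokens):
    """Top-down recognition with a queue of predicted items."""
    if not queue:
        return not tokens                     # accept: queue and input are empty
    first, rest = queue[0], queue[1:]
    if first in GRAMMAR:                      # expand a predicted category
        return any(recognize(rhs + rest, tokens) for rhs in GRAMMAR[first])
    # scan: the predicted word must be the next input word
    return bool(tokens) and tokens[0] == first and recognize(rest, tokens[1:])

print(recognize(["S"], "the mouse eats cheese".split()))
```

In the MG setting the queue would additionally be re-sorted by index after expansions, so that predicted axioms appear in the order of the input words.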

Interestingly, the index formalism leads to a straightforward implementation of the UMT’s semantic parser as well. The derivation tree Fig. 2 reveals that the index length correlates with the tree depth. In our example, the items with indices 11110 and 11111 in the priority queue have the longest indices. These correspond precisely to the two lambda terms that are unified by lambda application in derivation step (10-5). Moreover, the semantic analysis in Fig. 1(b) also illustrates that the deepest nodes are semantically unified first.

Every time the syntactic parser scans a correctly predicted item from the input tape, this item is removed from both input tape and syntactic priority queue. Simultaneously, the semantic content of its sign is pushed on top of the semantic priority queue, preserving its index. When some or all semantic items are stored in the queue, they are sorted in reversed index order to get highest semantic priority on top of the queue. Table 3 illustrates the semantic parsing for the given example.

step  input                  semantic queue  operation
1.    the mouse eats cheese                  scan (11-16)
2.    the mouse eats cheese                  apply
3.    the mouse eats cheese                  scan (11-12)
4.    mouse eats cheese                      apply
5.    mouse eats cheese                      scan (11-10)
6.    eats cheese                            scan (11-13)
7.    -s cheese                              scan (11-14)
8.    cheese                                 apply
9.    cheese                                 scan (11-11)
10.                                          scan (11-15)
11.                                          sort
11.                                          apply
12.                                          apply
13.                                          apply
14.                                          apply
15.                                          understand
Table 3: Semantic processing of the mouse eats cheese.

In analogy to the syntactic recognizer, the semantic parser operates in similar modes. Both processors share their common scan operation. In contrast to the syntactic parser which sorts indices in ascending order, the semantic module sorts them in descending order for operating on the deepest nodes in the derivation tree first. Most of the time, it attempts lambda application (apply) which is always preferred for -items on the queue. When apply has been sufficiently performed to a term, the last index number is removed from its index (sometimes it might also be necessary to exchange two items for lambda application). Finally, the semantic parser terminates in the understanding state.

3 Reinforcement Learning

So far we have discussed how a cognitive agent, being either human or an intelligent machine, could produce and understand utterances that are described in terms of minimalist grammar. An MG is given by a mental lexicon as in example Tab. 1, encoding a large amount of linguistic expert knowledge. Therefore, it seems unlikely that speech-controlled user interfaces could be built and sold by engineering companies at low cost.

Yet, it has been shown that MG are effectively learnable in the sense of Gold’s formal learning theory Gold67 . The studies BonatoRetore01 ; KobeleCollierEA02 ; StablerEA03 demonstrated how MG can be acquired by positive examples from linguistic dependence graphs BostonHaleKuhlmann10 ; Nivre03 . The required dependency structures can be extracted from linguistic corpora by means of big data machine learning techniques, such as the expectation maximization (EM) algorithm KleinManning04 .

In our terminology, such statistical learning methods only consider correlations at the exponent level of linguistic signs. By contrast, in the present study we propose an alternative training algorithm that simultaneously analyzes similarities between exponents and semantic terms. Moreover, we exploit both positive and negative examples to obtain a better performance through reinforcement learning Skinner15 ; SuttonBarto18 .

The language learner is a cognitive agent whose state is identified with its mental lexicon at the current training time. Initially, the learner is a tabula rasa with an empty lexicon and is exposed to UMPs produced by a teacher. Note that we assume that the teacher presents already complete UMPs, and not singular utterances, to the learner. Thus we circumvent the symbol grounding problem of firstly assigning meanings to uttered exponents Harnad90 , which will be addressed in future research. Moreover, we assume that the learner is instructed to reproduce the teacher’s utterances based on its own semantic understanding. This provides a feedback loop and therefore applicability of reinforcement learning Skinner15 ; SuttonBarto18 . For our introductory example, we adopt the simple semantic model from Sect. 2. In each iteration, the teacher utters a UMP that should be learned by the learner.

First iteration

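Let the teacher make the first utterance (2), i.e. the UMP $\langle \texttt{the mouse eats cheese}, \texttt{eat(cheese)(mouse)} \rangle$.

As long as the learner is not able to detect patterns or common similarities in the teacher’s UMPs, it simply adds new entries directly to its mental lexicon, assuming that all utterances are complex “:” and possess base type c, i.e. the MG start symbol. Hence, the learner’s state evolves according to an update rule that extends the current lexicon by the sign of type :c formed from the UMP presented by the teacher at that time.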
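The initial phase of learning, in which every presented UMP is stored as an unanalyzed complex sign of type :c, can be sketched as a set update. The function name `update` and the representation of signs as Python tuples are assumptions made for illustration.

```python
def update(lexicon, ump):
    """Add the presented UMP as a complex sign of base type c."""
    exponent, semantics = ump
    return lexicon | {(exponent, ":c", semantics)}

lexicon = frozenset()   # tabula rasa: the empty lexicon
lexicon = update(lexicon, ("the mouse eats cheese", "eat(cheese)(mouse)"))
print(sorted(lexicon))
```

Using an immutable set makes each training step return a new state, which mirrors the description of the lexicon as the learner's state at a given training time.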

In this way, the mental lexicon shown in Tab. 4 has been acquired after the first iteration.

Table 4: Learned minimalist lexicon after the first iteration.

Second iteration

Next, let the teacher utter another proposition, this time about the rat.

Looking at both UMPs together, the agent’s pattern matching module is able to find similarities between exponents and semantics. Thus, the learner creates two distinct items for the mouse and the rat, respectively, and carries out lambda abstraction to obtain the updated lexicon in Tab. 5.
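One possible way to realize this segmentation step is sketched below. It is not the authors' pattern matching algorithm: the second UMP, the restriction to a shared suffix of the exponents, and the induced type strings `:d` and `:=d c` are assumptions chosen to match the description in the text, where the induced symbols y, d and c are arbitrary.

```python
import re

def common_suffix(a, b):
    """Longest common suffix of two word lists."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]

def segment(ump1, ump2, variable="y"):
    """Split two UMPs into their differing parts and one shared template."""
    (e1, s1), (e2, s2) = ump1, ump2
    w1, w2 = e1.split(), e2.split()
    shared = common_suffix(w1, w2)
    sym1, sym2 = re.findall(r"\w+", s1), re.findall(r"\w+", s2)
    differing = [(a, b) for a, b in zip(sym1, sym2) if a != b]
    if not shared or len(sym1) != len(sym2) or len(differing) != 1:
        return None   # no usable similarity found
    c1, c2 = differing[0]
    # Lambda abstraction over the differing individual.
    template = "λ" + variable + "." + re.sub(rf"\b{re.escape(c1)}\b", variable, s1)
    return {
        (" ".join(w1[:len(w1) - len(shared)]), ":d", c1),
        (" ".join(w2[:len(w2) - len(shared)]), ":d", c2),
        (" ".join(shared), ":=d c", template),
    }

entries = segment(
    ("the mouse eats cheese", "eat(cheese)(mouse)"),
    ("the rat eats cheese", "eat(cheese)(rat)"),   # assumed second UMP
)
print(sorted(entries))
```

The shared part of the exponents receives the abstracted semantics, while the differing parts become separate items that can be selected by it.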

Table 5: Learned minimalist lexicon after the second iteration.

Note that the induced variable symbol y and syntactic types d, c are completely arbitrary and do not have any particular meaning to the agent.

As indicated by underlines in Tab. 5, the exponents the mouse and the rat could be further segmented through pattern matching, although this is not reflected by their semantic counterparts. Therefore, a revised lexicon, displayed in Tab. 6, can be devised.

Table 6: Revised minimalist lexicon.

To close the reinforcement cycle, the learner is supposed to produce utterances based on its own understanding. Thus, we assume that the learner wants to express the proposition $\texttt{eat(cheese)(rat)}$. According to our discussion in Sect. 2.3, the corresponding signs are retrieved from the lexicon and processed through a valid derivation leading to the correct utterance the rat eats cheese, which is subsequently endorsed by the teacher.

Third iteration

In the third training session, the teacher’s utterance might be


Now we have to compare this UMP with the lexicon entry for eats cheese in (17).


Another lambda abstraction entails the lexicon in Tab. 7.

Table 7: Learned minimalist lexicon after the third iteration.

Here, the learner assumes that eats is a simple, lexical category without having further evidence as in Sect. 2.

Since the learner is instructed to produce well-formed utterances, it could now generate a novel semantic representation. This leads, through a database query of the mental lexicon, to the correct derivation (18), which is rewarded by the teacher.

Fourth iteration

In the fourth iteration, we suppose that the teacher utters a further UMP


that is unified with the previous lexicon through our pattern matching algorithm to yield the lexicon in Tab. 8 in the first place.

Table 8: Learned minimalist lexicon after the fourth iteration.

Underlined are again common strings in exponents or semantics that could entail further revisions of the MG lexicon.

Next, let us assume that would express the meaning