Vector symbolic architectures for context-free grammars

03/11/2020 · by Peter beim Graben, et al.

Background / introduction. Vector symbolic architectures (VSA) are a viable approach for the hyperdimensional representation of symbolic data, such as documents, syntactic structures, or semantic frames. Methods. We present a rigorous mathematical framework for the representation of phrase structure trees and parse trees of context-free grammars (CFG) in Fock space, i.e. the infinite-dimensional Hilbert space as used in quantum field theory. We define a novel normal form for CFG by means of term algebras. Using a recently developed software toolbox, called FockBox, we construct Fock space representations for the trees built up by a CFG left-corner (LC) parser. Results. We prove a universal representation theorem for CFG term algebras in Fock space and illustrate our findings through a low-dimensional principal component projection of the LC parser states. Conclusions. Our approach could advance the development of VSA for explainable artificial intelligence (XAI) by means of hyperdimensional deep neural computation. It could be of significance for the improvement of cognitive user interfaces and other applications of VSA in machine learning.


1 Introduction

Claude E. Shannon, the pioneer of information theory, presented in 1952 a “maze-solving machine” as one of the first proper technical cognitive systems Shannon (1953) (see also Shannon’s instructive video demonstration at https://www.youtube.com/watch?v=vPKkXibQXGA). It comprises a maze in the form of a rectangular board partitioned into discrete cells that are partially separated by removable walls, and a magnetized “mouse” (nicknamed “Theseus”, after the ancient Greek hero) as a cognitive agent. The mouse’s actuator is a motorized electromagnet beneath the maze board that pulls the mouse through the maze. Sensation and memory are implemented by a circuit of relays, switching their states after encounters with a wall. In this way, Shannon technically realized a simple, non-hierarchic perception-action cycle (PAC) Young (2010), depicted in Fig. 1 as a viable generalization of a cybernetic feedback loop.

In general, PAC form the core of a cognitive dynamic system Young (2010); Haykin (2012). They describe the interaction of a cognitive agent with a dynamically changing world as shown in Fig. 1. The agent is equipped with sensors for the perception of its current state in the environment and with actuators allowing for active state changes. A central control prescribes goals and strategies for problem solving that could be trained by either trial-and-error learning, as in Shannon’s construction, or, more generally, by reinforcement learning Haykin (2012).

(Figure 1 diagram: the world is coupled to the agent by sensor and actuator signals; analysis and synthesis relate signals to strings, interpretation and articulation relate strings to meanings, and a behavior controller closes the loop. The dashed boundary marks the scope of this paper.)
Figure 1: Hierarchical perception-action cycle (PAC) for a cognitive dynamic system. The scope of the present paper is indicated by the dashed boundary.

In Shannon’s mouse-maze system, the motor (the actuator) pulls the mouse along a path until it bumps into a wall which is registered by a sensor. This perception is stored by switching a relay, subsequently avoiding the corresponding action. The behavior control prescribes a certain maze cell where the agent may find a “piece of cheese” as a goal. When the goal is eventually reached, no further action is necessary. In a first run, the mouse follows an irregular path according to a trial-and-error strategy, while building up a memory trace in the relay array. In every further run, the successfully learned path is pursued at once. However, when the operator modifies the arrangement of walls, the previously learned path becomes useless and the agent has to learn from the very beginning. Therefore, Shannon (1953, p. 1238) concludes:

The maze-solver may be said to exhibit at a very primitive level the abilities to (1) solve problems by trial and error, (2) repeat the solutions without the errors, (3) add and correlate new information to a partial solution, (4) forget a solution when it is no longer applicable.

In Shannon’s original approach, the mouse learns by trial-and-error whenever it bumps into a wall. More sophisticated cognitive dynamic systems should be able to draw logical inferences and to communicate either with each other or with an external operator Römer et al. (2012). This requires higher levels of mental representations such as formal logics and grammars. Consider, e.g., the operator’s utterance:

the mouse ate cheese (1)

(note that symbols will be set in typewriter font in order to abstract from their conventional meaning in the first place). In the PAC described in Fig. 1, the acoustic signal first has to be analyzed in order to obtain a phonetic string representation. To understand its meaning, the agent has to process the utterance grammatically through syntactic parsing in a second step. Finally, the syntactic representation, e.g. in the form of a phrase structure tree, must be interpreted as a semantic representation which the agent can ultimately understand Karttunen (1984). Depending upon such understanding, the agent can draw logical inferences and derive the appropriate behavior for controlling the actuators. In the case of verbal behavior Skinner (2015), the agent therefore computes an appropriate response, first as a semantic representation, which is then articulated into a syntactic and phonetic form and finally synthesized as an acoustic signal. In any case, high-level representations are symbolic and their processing is rule-driven, in contrast to low-level sensation and actuation where physical signals are essentially continuous.

Originally, Shannon used an array of relays as the agent’s memory. This has later been termed the “learning matrix” by Steinbuch and Schmitt (1967). Learning matrices and vector symbolic architectures (VSA) provide a viable interface between hierarchically organized symbolic data structures such as phrase structure trees or semantic representations and continuous state space approaches as required for neural networks. Beginning with seminal studies by Smolensky (1990) and Mizraji (1989), and later pursued by Plate (1995), beim Graben and Potthast (2009), and Kanerva (2009) among many others, those architectures have been dubbed VSA by Gayler (2006).

In a VSA, symbols and variables are represented as filler and role vectors of some underlying linear spaces, respectively. When a symbol is assigned to a variable, the corresponding filler vector is “bound” to the corresponding role vector. Different filler-role bindings can be “bundled” together to form a data structure, such as a list, a frame, or a table of a relational data base. Those structures can be recursively bound to other fillers and further bundled together to yield arbitrarily complex data structures.
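As an illustration of binding and bundling, the following minimal sketch uses one-hot filler and role vectors and the Kronecker product; the particular symbol and role names are our own choices for demonstration, not notation from the paper.

```python
# A minimal sketch of binding and bundling with tensor products (NumPy).
import numpy as np

fillers = {name: vec for name, vec in zip(["the", "mouse", "D", "N"], np.eye(4))}
roles = {name: vec for name, vec in zip(["left", "right", "mother"], np.eye(3))}

bind = lambda f, r: np.kron(f, r)          # binding: tensor (Kronecker) product
# bundling: plain vector superposition (addition) of several bindings
structure = bind(fillers["D"], roles["left"]) + bind(fillers["N"], roles["right"])

# With one-hot basis vectors the binding is faithful: contracting with a role
# recovers the filler that was bound to it.
unbind = lambda v, r: v.reshape(4, 3) @ r
print(unbind(structure, roles["left"]))    # -> one-hot vector of "D"
```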

VSA have recently been employed for semantic spaces Recchia et al. (2015), logical inferences Emruli et al. (2013); Widdows and Cohen (2014), data base queries Kleyko et al. (2016), and autoassociative memories Gritsenko et al. (2017); Mizraji et al. (2018). Wolff et al. (2018a) developed a VSA model for cognitive representations and their induction in Shannon’s mouse-maze system. In this article, we focus on the dashed region in Fig. 1, elaborating earlier approaches for VSA language processors beim Graben et al. (2004, 2008a, 2008b); beim Graben and Gerth (2012); Carmantini et al. (2017). We present rigorous proofs for the vector space representations of context-free grammars (CFG) and push-down automata Hopcroft and Ullman (1979). To this end, we suggest a novel normal form for CFG, which allows expressing CFG parse trees as terms over a symbolic term algebra. Rule-based derivations over that algebra are then represented as transformation matrices in Fock space Fock (1932); Aerts (2009).

Our approach could lead to the development of new machine learning algorithms for training neural networks as rule-based symbol processors. In contrast to black-box deep neural network models, our method is essentially transparent and hence explainable Doran et al. (2017).

2 Methods

We start from a symbolic, rule-based system that can be described in terms of formal grammar and automata theory. Specifically, we choose context-free grammars (CFG) and push-down automata as their processors here Hopcroft and Ullman (1979). In the second step, we reformulate these languages through term algebras and their processing through partial functions over term algebras. We introduce a novel normal form for CFG, called term normal form, and prove that any CFG in Chomsky normal form can be transformed into term normal form. Finally, we introduce a vector symbolic architecture by assigning basis vectors of a high-dimensional linear space to the respective symbols and their roles in a phrase structure tree. We suggest a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and prove a representation theorem for the partial rule-based processing functions.

2.1 Context-free Grammars

Consider again the simple sentence (1) as a motivating example. According to linguistic theory, sentences such as (1) exhibit a hierarchical structure, indicating a logical subject-predicate relationship. In (1) “the mouse” appears as subject and the phrase “ate cheese” as the predicate, which is further organized into a transitive verb “ate” and its direct object “cheese”. The hierarchical structure of sentence (1) can therefore be either expressed through regular brackets, as in (2)

[S [NP [D the] [N mouse]] [VP [V ate] [N cheese]]] (2)

or, likewise, as the phrase structure tree shown in Fig. 2:

[S [NP [D the] [N mouse]] [VP [V ate] [N cheese]]]

Figure 2: Phrase structure tree of example sentence (1).

In Fig. 2 every internal node of the tree denotes a syntactic category: S stands for “sentence”, NP for “noun phrase”, the sentence’s subject, VP for “verbal phrase”, the predicate, D for “determiner”, N for “noun”, and V for “verb”.

The phrase structure tree in Fig. 2 immediately gives rise to a context-free grammar (CFG) by interpreting every branch as a rewriting rule in Chomsky normal form Kracht (2003); Hopcroft and Ullman (1979):

S → NP VP (3a)
NP → D N (3b)
VP → V N (3c)
D → the (3d)
N → mouse (3e)
V → ate (3f)
N → cheese (3g)

where one distinguishes between syntactic rules (3a)–(3c) and lexical rules (3d)–(3g), respectively. More abstractly, a CFG is given as a quadruple G = (T, N, S, R), such that in our example T = {the, mouse, ate, cheese} is the set of words or terminal symbols, N = {S, NP, VP, D, N, V} is the set of categories or nonterminal symbols, S ∈ N is the distinguished start symbol, and R is the set of rules. A rule is usually written as a production A → γ, where A ∈ N denotes a category and γ a finite string of terminals or categories of length n = |γ|.
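For later reference, the grammar (3a)–(3g) can also be written down as plain data; the following sketch encodes it as a Python dictionary, a layout chosen purely for illustration.

```python
# The example grammar (3a)-(3g) as plain Python data; the comments refer to
# the equation numbers in the text.
GRAMMAR = {
    "S":  [["NP", "VP"]],          # (3a)
    "NP": [["D", "N"]],            # (3b)
    "VP": [["V", "N"]],            # (3c)
    "D":  [["the"]],               # (3d)
    "N":  [["mouse"], ["cheese"]], # (3e), (3g)
    "V":  [["ate"]],               # (3f)
}
TERMINALS = {"the", "mouse", "ate", "cheese"}
START = "S"
```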

Context-free grammars can be processed by push-down automata Hopcroft and Ullman (1979). Regarding psycholinguistic plausibility, the left-corner (LC) parser is particularly relevant because input-driven bottom-up and expectation-driven top-down processes are tightly intermingled with each other Hale (2011). An LC parser possesses, like any other push-down automaton, two memory tapes: firstly a working memory, called stack, operating in a last-in-first-out (LIFO) fashion, and secondly an input tape storing the sentence to be processed.

In the simplest cases, when a given CFG does not contain ambiguities (as in (3a)–(3g) for our example (1)), an LC parser can work deterministically. The LC parsing algorithm operates in four different modes: i) if nothing else is possible and if the input tape is not empty, the first word of the input is shifted into the stack; ii) if the first symbol in the stack is the left corner of a rule, the first stack symbol is rewritten by the predicted remainder of the rule’s right-hand side (indicated by square brackets in Tab. 1, and empty for lexical rules), followed by the left-hand side of the rule (project); iii) if a category in the stack was correctly predicted, the matching symbols are removed from the stack (complete); iv) if the input tape is empty and the stack only contains the start symbol of the grammar, the automaton moves into the accepting state; otherwise, syntactic language processing has failed. Applying the LC algorithm to our example CFG leads to the symbolic process shown in Tab. 1; a code sketch of this procedure follows the table.

step  stack                      input                   operation
0     (empty)                    the mouse ate cheese    shift
1     the                        mouse ate cheese        project (3d)
2     D                          mouse ate cheese        project (3b)
3     [N] NP                     mouse ate cheese        shift
4     mouse [N] NP               ate cheese              project (3e)
5     N [N] NP                   ate cheese              complete
6     NP                         ate cheese              project (3a)
7     [VP] S                     ate cheese              shift
8     ate [VP] S                 cheese                  project (3f)
9     V [VP] S                   cheese                  project (3c)
10    [N] VP [VP] S              cheese                  shift
11    cheese [N] VP [VP] S       (empty)                 project (3g)
12    N [N] VP [VP] S            (empty)                 complete
13    VP [VP] S                  (empty)                 complete
14    S                          (empty)                 accept
Table 1: Left-corner parser processing the example sentence (1). The stack expands to the left; (empty) marks an empty stack or input tape.
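The following runnable sketch implements the four parser modes for the example grammar and reprints the trace of Tab. 1; the data layout and function names are illustrative choices of ours and unrelated to FockBox.

```python
# A deterministic left-corner parser sketch for the example grammar.
GRAMMAR = {  # category -> list of right-hand sides, cf. (3a)-(3g)
    "S": [["NP", "VP"]], "NP": [["D", "N"]], "VP": [["V", "N"]],
    "D": [["the"]], "N": [["mouse"], ["cheese"]], "V": [["ate"]],
}

def left_corner_parse(words, start="S"):
    stack, buffer, step = [], list(words), 0
    while True:
        print(step, stack, buffer)
        if not buffer and stack == [start]:
            print("accept"); return True
        # complete: top category matches the prediction right below it
        if len(stack) >= 2 and stack[1] == "[%s]" % stack[0]:
            stack = stack[2:]
        # project: top symbol is the left corner of some rule
        elif stack and any(stack[0] == rhs[0] for rhss in GRAMMAR.values() for rhs in rhss):
            lhs, rhs = next((a, r) for a, rhss in GRAMMAR.items()
                            for r in rhss if r[0] == stack[0])
            stack = ["[%s]" % c for c in rhs[1:]] + [lhs] + stack[1:]
        # shift: otherwise move the next input word onto the stack
        elif buffer:
            stack = [buffer.pop(0)] + stack
        else:
            print("reject"); return False
        step += 1

left_corner_parse("the mouse ate cheese".split())
```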

The left-corner parser shown in Tab. 1 essentially operates autonomously in the modes project, complete and accept, but interactively in shift mode. Thus, we can significantly simplify the parsing process through a mapping from one intermediary automaton configuration to another one that is mediated by the interactively shifted input word beim Graben et al. (2008a); Wegner (1998). Expressing the configurations as temporary phrase structure trees then yields the symbolic computation in Fig. 3.

Figure 3: Interactive LC parse of the example sentence (1).

According to our previous definitions, the states of the processor are the automaton configurations in Tab. 1 or the temporary phrase structure trees in Fig. 3, both of which are interpretable in terms of LC parsing and language processing for an informed expert observer. Moreover, the processing steps in the last column of Tab. 1 and also the interactive mappings in Fig. 3 are understandable and thereby explainable by the observer. In principle, one could augment the left-corner parser with a “reasoning engine” Doran et al. (2017) that translates the formal language used in those symbolic representations into everyday language. The result would be something like the (syntactic) “meaning” of a word, which can be regarded as the operator mapping a tree in Fig. 3 to its successor. This interactive interpretation of meaning is well-known in dynamic semantics Gärdenfors (1988); Groenendijk and Stokhof (1991); Kracht (2002); beim Graben (2014). Therefore, symbolic AI is straightforwardly interpretable and explainable Doran et al. (2017).

2.2 Algebraic Description

In order to prepare the construction of a vector symbolic architecture (VSA) Dolan and Smolensky (1989); Mizraji (1989); Plate (1995); Gayler (2006); Kanerva (2009); beim Graben and Potthast (2009) in the next step, we need an algebraically more sophisticated description. This is provided by the concept of a term algebra beim Graben and Gerth (2012); Kracht (2003). A term algebra is defined over a signature consisting of a finite set of function symbols together with an arity function ar, assigning to each function symbol f an integer ar(f) indicating the number of arguments that f has to take.

To apply this idea to a CFG, we introduce a new kind of grammar normal form that we call term normal form in the following. A CFG is said to be in term normal form if for every category A the following holds: if A is expanded by the rules A → γ1, …, A → γk, then |γ1| = … = |γk|, i.e. all right-hand sides of A have the same length.

It can be easily demonstrated that every CFG can be transformed into a weakly equivalent CFG in term normal form, where weak equivalence means that two different grammars derive the same context-free language. A proof is presented in Appendix 6.1.

Obviously, the rules (3a)–(3c) of our example above are already in term normal form, simply because they are not ambiguous. Thus, we define a term algebra by regarding the set of terminals and categories as signature with arity function ar such that i) ar(a) = 0 for all terminals a, i.e. terminals are nullary symbols and hence constants; ii) ar(A) = |γ| for categories A that are expanded through rules A → γ. Moreover, when the grammar is given in Chomsky normal form, ar(A) = 1 for all categories A appearing exclusively in lexical rules A → a, i.e. lexical categories (D, N, V) are unary functions, whereas ar(A) = 2 for all categories that appear exclusively in syntactic rules, which are hence binary functions.

For a general CFG in term normal form, we define the term algebra inductively: i) every terminal symbol a is a term; ii) if A is a category with ar(A) = n and t1, …, tn are terms, then A(t1, …, tn) is a term. Additionally, we want to describe LC phrase structure trees as well. To this end, we extend the signature by the predicted categories [Z] for categories Z, which are interpreted as constants with ar([Z]) = 0. We also allow for the empty tree as a term of the enlarged LC term algebra.

In the enlarged LC term algebra, we encode the tree of step 1 in Fig. 3 (beginning with the empty tree in step 0) as the term

t1 = NP(D(the), [N]) (4)

because ar(NP) = 2, ar(D) = 1, and ar([N]) = 0. Likewise we obtain

t2 = S(NP(D(the), N(mouse)), [VP]) (5)

as the term representation of the succeeding step 2 in Fig. 3.
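In code, such terms can be held as nested data; the following minimal sketch encodes the terms (4) and (5) as Python tuples of the form (functor, child, …), which is an illustrative choice rather than part of the formal construction.

```python
# Terms (4) and (5) as nested tuples: (functor, child_1, ..., child_n); leaves
# (terminals and predicted categories) are 1-tuples.
t1 = ("NP", ("D", ("the",)), ("[N]",))                              # term (4)
t2 = ("S", ("NP", ("D", ("the",)), ("N", ("mouse",))), ("[VP]",))   # term (5)

def arity(term):
    """Number of subterms attached to the root functor."""
    return len(term) - 1

print(arity(t1), arity(t1[1]), arity(t1[2]))   # 2 1 0: ar(NP), ar(D), ar([N])
```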

Next, we define several partial functions over the LC term algebra as follows beim Graben and Gerth (2012); Smolensky (2006):

cat(A(t1, …, tn)) = A (6a)
ex_i(A(t1, …, tn)) = ti (6b)
cons(A, t1, …, tn) = A(t1, …, tn) (6c)

Here, the function cat yields the category, i.e. the function symbol, of a term t = A(t1, …, tn). The functions ex_i for term extraction and cons as term constructor are defined only partially: ex_i(t) is defined only if 1 ≤ i ≤ ar(cat(t)), and cons(A, t1, …, tn) only if ar(A) = n.

By means of the term transformations (6a)–(6c) we can express the action of an incrementally and interactively shifted word through a term operator. For the transition from, e.g., LC tree 1 to LC tree 2 in Fig. 3 we obtain

t2 = cons(S, cons(cat(t1), ex_1(t1), N(mouse)), [VP]). (7)

Therefore, the (syntactic) meaning of the word “mouse” is its impact on the symbolic term algebra.
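A small sketch of the transformations (6a)–(6c) and of the operator (7) over the tuple encoding from above may make this concrete; the Python names cat, ex and cons mirror the notation used here and need not coincide with the paper’s original symbols.

```python
# Term transformations over tuple-encoded terms and the word operator (7).
def cat(term):                 # (6a): root symbol of a term
    return term[0]

def ex(term, i):               # (6b): extract the i-th subterm (1-based)
    return term[i]

def cons(symbol, *subterms):   # (6c): build a new term from a root and children
    return (symbol,) + subterms

t1 = ("NP", ("D", ("the",)), ("[N]",))   # term (4), LC tree of step 1

def meaning_of_mouse(t):
    # (7): complete the predicted [N] with N(mouse), then project NP to S
    completed = cons(cat(t), ex(t, 1), ("N", ("mouse",)))
    return cons("S", completed, ("[VP]",))

print(meaning_of_mouse(t1))
# ('S', ('NP', ('D', ('the',)), ('N', ('mouse',))), ('[VP]',))
```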

2.3 Vector Symbolic Architectures

In vector symbolic architectures (VSA) Dolan and Smolensky (1989); Mizraji (1989); Plate (1995); Gayler (2006); Kanerva (2009); beim Graben and Potthast (2009), hierarchically organized complex data structures are represented as vectors in high-dimensional linear spaces. The composition of these structures is achieved by two basic operations: binding and bundling. While bundling is commonly implemented as vector superposition, i.e. addition, different VSA realize binding in particular ways: originally through tensor products Dolan and Smolensky (1989); Smolensky (1990); Mizraji (1989, 1992), through circular convolution in holographic reduced representations (HRR) Plate (1995), through the XOR spatter code Kanerva (1994), or through Hadamard products Levy and Gayler (2008). While HRR, spatter code, Hadamard products, or a combination of tensor products with nonlinear compression Smolensky (2006) are lossy representations that require a clean-up module (usually an attractor neural network, cf. Kanerva (2009)), tensor product representations of basis vectors are faithful, thereby allowing interpretable and explainable VSA Doran et al. (2017).

Coming back to our linguistic example, we construct a homomorphism from the term algebra, unified with its categories, to a vector space in such a way that the structure of the transformations (6a)–(6c) is preserved. The images of the term transformations then become vector space operators, i.e. essentially matrices acting on the representation space.

Again, we proceed inductively. First we map the symbols onto vectors: to each atomic symbol a we assign a so-called filler basis vector |a⟩, calling the spanned subspace the filler space. Its dimension corresponds to the number of atomic symbols, i.e. terminals, categories, and predicted categories, in our example.

Let further m be the length of the largest production of the grammar. Then, we define so-called role vectors |r_0⟩, …, |r_m⟩, spanning the role space. Note that we employ the so-called Dirac notation from quantum mechanics that allows a coordinate-free and hence representation-independent description here Dirac (1939). The role |r_1⟩ denotes the 1st daughter node, |r_2⟩ the 2nd daughter and so on, until the last daughter |r_m⟩. The remaining role |r_0⟩ binds the mother node in the phrase structure trees of the grammar. In our example, because the grammar has Chomsky normal form, we have m = 2 such that there are three roles for positions in a binary branching tree: left daughter |r_1⟩, right daughter |r_2⟩, and mother |r_0⟩. For binary trees, these three roles may also be written in a more intuitive symbolic notation.

Let t be a term. Then, we define the tensor product representation ψ(t) of t recursively as follows:

ψ(a) = |a⟩ for constants a (terminals and predicted categories),
ψ(A(t1, …, tn)) = |A⟩ ⊗ |r_0⟩ ⊕ ψ(t1) ⊗ |r_1⟩ ⊕ … ⊕ ψ(tn) ⊗ |r_n⟩. (8)

As a shorthand notation, we suggest the Dirac expression

|A(t1, …, tn)⟩ = |A r_0⟩ ⊕ |t1 r_1⟩ ⊕ … ⊕ |tn r_n⟩. (9)

Here the symbol “⊗” refers to the (Kronecker) tensor product, mapping two vectors onto another vector, in contrast to the dyadic (outer) tensor product, which yields a matrix, i.e. a vector space operator. In addition, “⊕” denotes the (outer) direct sum that is mandatory for the superposition of vectors from spaces with different dimensionality.
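A minimal computational sketch of the recursion (8), under the reconstruction given above and with one-hot basis vectors as an illustrative encoding, is the following; the direct sum over embedding depths is modelled as a Python dict from the number of role factors to a NumPy vector.

```python
# Recursive tensor product representation of tuple-encoded terms.
import numpy as np

SYMBOLS = ["S", "NP", "VP", "D", "N", "V", "the", "mouse", "ate", "cheese", "[N]", "[VP]"]
filler = {s: v for s, v in zip(SYMBOLS, np.eye(len(SYMBOLS)))}
role = {i: v for i, v in enumerate(np.eye(3))}   # 0: mother, 1: left, 2: right daughter

def psi(term):
    """Map a term to {depth: vector in V_F (x) V_R^(x)depth}, cf. (8)."""
    symbol, *children = term
    if not children:                                  # terminal or predicted category
        return {0: filler[symbol]}
    result = {1: np.kron(filler[symbol], role[0])}    # root bound to the mother role
    for i, child in enumerate(children, start=1):
        for depth, vec in psi(child).items():
            comp = np.kron(vec, role[i])              # bind the subtree to daughter role i
            result[depth + 1] = result.get(depth + 1, 0) + comp
    return result

t1 = ("NP", ("D", ("the",)), ("[N]",))
print({d: v.shape for d, v in psi(t1).items()})       # component dimensions per depth
```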

Obviously, the (in principle) infinite recursion of the mapping leads to an infinite-dimensional representation space

(10)

that is known as Fock space from quantum field theory Fock (1932); Aerts (2009); beim Graben and Potthast (2009); Smolensky (2012).

In quantum field theory, there is a distinguished state, the vacuum state, spanning a one-dimensional subspace, the vacuum sector, that is isomorphic to the underlying number field. According to (10), this sector is contained in the subspace spanned by the filler and role spaces. Therefore, we could represent the empty tree in Fig. 3 by an arbitrary role; a suitable choice is the mother role |r_0⟩, hence symbolizing the vacuum state.

Using the tensor product representation (8), we can recursively compute the images of our example terms above. For (4) we obtain

|t1⟩ = |NP⟩ ⊗ |r_0⟩ ⊕ (|D⟩ ⊗ |r_0⟩ ⊕ |the⟩ ⊗ |r_1⟩) ⊗ |r_1⟩ ⊕ |[N]⟩ ⊗ |r_2⟩
     = |NP r_0⟩ ⊕ |D r_0 r_1⟩ ⊕ |the r_1 r_1⟩ ⊕ |[N] r_2⟩ (11)

where we used the compressed Dirac notation (9) in the last step. The last line is easily interpretable in terms of phrase structure: It simply states that NP occupies the root of the tree, D appears as its immediate left daughter, the is the left daughter’s left daughter and a leaf, and finally [N] is a leaf bound to the right daughter of the root. Note that the Dirac kets have to be interpreted from the right to the left (reading the Arabic way). The vector |t1⟩ belongs to a Fock subspace of dimension

(12)

where f and r denote the dimensions of the filler and role spaces, respectively, and d the embedding depth of the phrase structure tree in step 1 of Fig. 3; the resulting dimensions for the successive parse states are listed in Tab. 2.

Similarly, we get for (5)

|t2⟩ = |S r_0⟩ ⊕ |NP r_0 r_1⟩ ⊕ |D r_0 r_1 r_1⟩ ⊕ |the r_1 r_1 r_1⟩ ⊕ |N r_0 r_2 r_1⟩ ⊕ |mouse r_1 r_2 r_1⟩ ⊕ |[VP] r_2⟩ (13)

again in compressed Dirac notation, which can be straightforwardly interpreted in terms of tree addresses as depicted in Fig. 3 (step 2). Computing the dimension of the respective Fock subspace according to (12) again yields the corresponding value in Tab. 2.

In Fock space, the interactive and incremental action of a word is then represented as a matrix operator. For the transition from (4) to (5) we obtain

(14)

In order to prove a homomorphism, we define the following linear maps on Fock space.

(15a)
(15b)
(15c)

Here, 1 denotes the unit operator (i.e. the unit matrix) and the Dirac “bra” vectors ⟨r_i| are linear forms from the dual role space that are adjoined to the role “ket” vectors |r_j⟩ such that ⟨r_i|r_j⟩ = δ_ij with Kronecker’s δ.
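The contraction with dual role vectors can be sketched in code as follows; it operates on the dict-of-depths encoding from the earlier sketch and illustrates role unbinding, not the paper’s exact operator definitions.

```python
# Role unbinding: contract the outermost role factor with a dual role vector.
import numpy as np

def unbind(fock, i, n_roles=3):
    """fock: {depth: vector}; return the component bound to daughter role i."""
    bra = np.eye(n_roles)[i]
    out = {}
    for depth, vec in fock.items():
        if depth == 0:
            continue                       # no role factor to contract with
        mat = vec.reshape(-1, n_roles)     # expose the outermost (last) role factor
        out[depth - 1] = mat @ bra         # apply the dual role vector <r_i|
    return out

# Example (assuming psi and t1 from the previous sketch are in scope):
# psi_left = unbind(psi(t1), 1)   # recovers psi(D(the)), the left subtree
```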

By means of these homomorphisms we compute the meaning of “mouse” as a Fock space operator through

(16)

Inserting (15a)–(15c) yields

(17)

where we have expanded the result as in (13) above. Note that the meaning of “mouse” crucially depends on the given state subjected to the operator, making meaning highly contextual. This is an important feature of dynamic semantics as well Gärdenfors (1988); Groenendijk and Stokhof (1991); Kracht (2002); beim Graben (2014).

3 Results

The main result of this study is a Fock space representation theorem for vector symbolic architectures of context-free grammars that follows directly from the definitions (15a)–(15c) and is proven in Appendix 6.2.

The tensor product representation is a homomorphism with respect to the term transformations (6a)–(6c). It holds

(18a)
(18b)
(18c)

For the particular example discussed above, we obtain the Fock space trajectory in Tab. 2.

step  dim  operation
0     16   shift the
1     172  shift mouse
2     523  shift ate
3     523  shift cheese
4     523  accept
Table 2: Fock space representation of LC parser processing the example sentence (1).

Moreover, as an illustration, we present in Fig. 4 the complete Fock space LC parse, generated by FockBox, a MATLAB toolbox provided by Wolff et al. (2018b), as its three-dimensional projection after principal component analysis (PCA; beim Graben et al. (2008a); beim Graben and Gerth (2012); Wolff et al. (2018b)).
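The projection itself can be sketched as follows; the vectors below are random stand-ins with the dimensions from Tab. 2, since the actual parser states would be produced by the Fock space construction above.

```python
# PCA projection of a sequence of Fock vectors of growing dimension.
import numpy as np

rng = np.random.default_rng(0)
states = [rng.normal(size=d) for d in (16, 172, 523, 523, 523)]   # dims as in Tab. 2

dim = max(len(s) for s in states)
X = np.stack([np.pad(s, (0, dim - len(s))) for s in states])      # zero-padded states
X = X - X.mean(axis=0)                                            # center the data

# PCA via singular value decomposition; keep the first three components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
pcs = X @ Vt[:3].T
print(pcs.shape)   # (5, 3): one 3-dimensional point per parser state
```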

Figure 4: Principal component (PC) projection of the LC parser’s Fock space representation. Shown are the first three PCs.

4 Discussion

In this article we developed a representation theory for context-free grammars and push-down automata in Fock space as a vector symbolic architecture (VSA). We presented rigorous proofs for the representations of suitable term algebras. To this end, we suggested a novel normal form for CFG, which allows expressing CFG parse trees as terms over a symbolic term algebra. Rule-based derivations over that algebra are then represented as transformation matrices in Fock space.

Motivated by a seminal study of Shannon (1953) on cognitive dynamic systems Haykin (2012), our work could be of significance for leveraging research on cognitive user interfaces (CUI) Young (2010); Duckhorn et al. (2017); Huber et al. (2018); Tschöpe et al. (2018). Such systems are the subject of ambitious current research. Instead of using keyboards and displays as input-output interfaces, users pronounce requests or instructions to a device as spoken language and listen to its uttered responses. To this end, state-of-the-art language technology scans the acoustically analyzed speech signal for relevant keywords that are subsequently inserted into semantic frames Minsky (1974) to interpret the user’s intent. This slot filling procedure Allen (2003); Tur et al. (2011); Mesnil et al. (2015) is based on large language corpora that are evaluated by machine learning methods, such as deep learning with neural networks Mesnil et al. (2015). The necessity to overcome traditional slot filling techniques by proper semantic analysis technologies has already been emphasized by Allen (2017). His research group trains semantic parsers from large language databases such as WordNet or VerbNet that are constrained by hand-crafted expert knowledge and semantic ontologies Allen (2003); Allen et al. (2018).

Another road toward realistic CUI systems is the development of utterance-meaning transducers (UMT) that map syntactic representations obtained from the speech signal onto semantic representations in terms of feature value relations (FVR) Karttunen (1984); Huber et al. (2018). This is achieved through a perception-action cycle comprising three components: perception, action, and behavior control. The perception module transforms the input from the signal layer to the semantic symbolic layer, the module for behavior control solves decision problems based on semantic information and computes appropriate actions, and finally the action module executes the result by producing acoustic feedback. Behavior control can flexibly adapt to the user’s demands through reinforcement learning.

For the implementation of rule-based symbolic computations in cognitive dynamic systems, such as neural networks, VSA provide a viable approach. Our results contribute a formally sound basis for this kind of future research and engineering.

5 Conclusion

We reformulated context-free grammars (CFG) through term algebras and their processing through push-down automata by partial functions over term algebras. We introduced a novel normal form for CFG, called term normal form, and proved that any CFG in Chomsky normal form can be transformed into term normal form. Finally, we introduced a vector symbolic architecture (VSA) by assigning basis vectors of a high-dimensional linear space to the respective symbols and their roles in a phrase structure tree. We suggested a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and proved a representation theorem for the partial rule-based processing functions. We illustrated our findings by an interactive left-corner parser and used FockBox, a freely accessible MATLAB toolbox, for the generation and visualization of Fock space VSA. Our approach directly encodes symbolic, rule-based knowledge into the hyperdimensional computing framework of VSA and can thereby supply substantial insights into the future development of explainable artificial intelligence (XAI).

Conflict of interest

The authors declare that they have no conflict of interest.

References

  • Shannon [1953] C. E. Shannon. Computers and automata. Proceedings of the Institute of Radio Engineering, 41(10):1234 – 1241, 1953.
  • Young [2010] Steve Young. Cognitive user interfaces. IEEE Signal Processing Magazine, 27(3):128–140, 2010.
  • Haykin [2012] S. Haykin. Cognitive Dynamic Systems. Cambridge University Press, 2012.
  • Römer et al. [2012] R. Römer, P. beim Graben, M. Huber, M. Wolff, G. Wirsching and I. Schmitt. Behavioral control of cognitive agents using database semantics and minimalist grammars. IEEE Conference on Cognitive Info-Communication, Naples, 2019.
  • Skinner [2015] B. F. Skinner. Verbal Behavior. Martino Publishing, Mansfield Centre (CT), 2015. 1st Edition 1957.
  • Steinbuch and Schmitt [1967] K. Steinbuch and E. Schmitt. Adaptive systems using learning matrices. In H. L. Oestericicher and D. R. Moore, editors, Biocybernetics in Avionics, pages 751 – 768. Gordon and Breach, New York, 1967. Reprinted in J. A. Anderson, Pellionisz and E. Rosenfeld (1990), pp. 65ff.
  • Smolensky [1990] P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1-2):159 – 216, 1990.
  • Mizraji [1989] E. Mizraji. Context-dependent associations in linear distributed memories. Bulletin of Mathematical Biology, 51(2):195 – 205, 1989.
  • Plate [1995] T. A. Plate. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623 – 641, 1995.
  • beim Graben and Potthast [2009] P. beim Graben and R. Potthast. Inverse problems in dynamic cognitive modeling. Chaos, 19(1):015103, 2009.
  • Kanerva [2009] P. Kanerva. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1(2):139 – 159, 2009.
  • Gayler [2006] R. W. Gayler. Vector symbolic architectures are a viable alternative for Jackendoff’s challenges. Behavioral and Brain Sciences, 29:78 – 79, 2 2006.
  • Recchia et al. [2015] G. Recchia, M. Sahlgren, P. Kanerva, and M. N. Jones. Encoding sequential information in semantic space models: Comparing holographic reduced representation and random permutation. Computational Intelligence and Neuroscience, 2015:58, 2015.
  • Emruli et al. [2013] B. Emruli, R. W. Gayler, and F. Sandin. Analogical mapping and inference with binary spatter codes and sparse distributed memory. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 1 – 8, 2013.
  • Widdows and Cohen [2014] D. Widdows and T. Cohen. Reasoning with vectors: A continuous model for fast robust inference. Logic Journal of the IGPL, 23(2):141 – 173, 11 2014.
  • Kleyko et al. [2016] D. Kleyko, E. Osipov, and R. W. Gayler. Recognizing permuted words with vector symbolic architectures: A Cambridge test for machines. Procedia Computer Science, 88:169 – 175, 2016. Proceedings of the 7th Annual International Conference on Biologically Inspired Cognitive Architectures (BICA 2016).
  • Gritsenko et al. [2017] V. I. Gritsenko, D. A. Rachkovskij, A. A. Frolov, R. Gayler, D. Kleyko, and E. Osipov. Neural distributed autoassociative memories : A survey. Cybernetics and Computer Engineering Journal, 188(2):5 – 35, 2017.
  • Mizraji et al. [2018] E. Mizraji, A. Pomi, and J. Lin. Improving neural models of language with input-output tensor contexts. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 430 – 440, Cham, 2018. Springer.
  • Wolff et al. [2018a] M. Wolff, M. Huber, G. Wirsching, R. Römer, P. beim Graben, and I. Schmitt. Towards a quantum mechanical model of the inner stage of cognitive agents. In IEEE International Conference on Cognitive Infocommunications (CogInfoCom), volume 9, pages 147 – 152, 2018a.
  • beim Graben et al. [2004] P. beim Graben, B. Jurish, D. Saddy, and S. Frisch. Language processing by dynamical systems. International Journal of Bifurcation and Chaos, 14(2):599 – 621, 2004.
  • beim Graben et al. [2008a] P. beim Graben, S. Gerth, and S. Vasishth. Towards dynamical system models of language-related brain potentials. Cognitive Neurodynamics, 2(3):229 – 255, 2008a.
  • beim Graben et al. [2008b] P. beim Graben, D. Pinotsis, D. Saddy, and R. Potthast. Language processing with dynamic fields. Cognitive Neurodynamics, 2(2):79 – 88, 2008b.
  • beim Graben and Gerth [2012] P. beim Graben and S. Gerth. Geometric representations for minimalist grammars. Journal of Logic, Language and Information, 21(4):393 – 432, 2012.
  • Carmantini et al. [2017] G. S. Carmantini, P. beim Graben, M. Desroches, and S. Rodrigues. A modular architecture for transparent computation in recurrent neural networks. Neural Networks, 85:85 – 105, 2017.
  • Hopcroft and Ullman [1979] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison–Wesley, Menlo Park, California, 1979.
  • Fock [1932] V. Fock. Konfigurationsraum und zweite Quantelung. Zeitschrift für Physik, 75(9):622 – 647, 1932.
  • Aerts [2009] D. Aerts. Quantum structure in cognition. Journal of Mathematical Psychology, 53(5):314 – 348, 2009.
  • Doran et al. [2017] D. Doran, S. Schulz, and T. R. Besold. What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794 [cs.AI], 2017.
  • Kracht [2003] M. Kracht. The Mathematics of Language. Number 63 in Studies in Generative Grammar. Mouton de Gruyter, Berlin, 2003.
  • Hale [2011] J. T. Hale. What a rational parser would do. Cognitive Science, 35(3):399 – 443, 2011.
  • Wegner [1998] P. Wegner. Interactive foundations of computing. Theoretical Computer Science, 192:315 – 351, 1998.
  • Dolan and Smolensky [1989] C. P. Dolan and P. Smolensky. Tensor product production system: A modular architecture and representation. Connection Science, 1(1):53 – 68, 1989.
  • Gärdenfors [1988] P. Gärdenfors. Knowledge in Flux. Modeling the Dynamics of Epistemic States. MIT Press, Cambridge (MA), 1988.
  • Groenendijk and Stokhof [1991] J. Groenendijk and M. Stokhof. Dynamic predicate logic. Linguistics and Philosophy, 14(1):39 – 100, 1991.
  • Kracht [2002] M. Kracht. Dynamic semantics. Linguistische Berichte, Sonderheft X:217 – 241, 2002.
  • beim Graben [2014] P. beim Graben. Order effects in dynamic semantics. Topics in Cognitive Science, 6(1):67 – 73, 2014.
  • Mizraji [1992] E. Mizraji. Vector logics: The matrix-vector representation of logical calculus. Fuzzy Sets and Systems, 50:179 – 185, 1992.
  • Kanerva [1994] P. Kanerva. The binary spatter code for encoding concepts at many levels. In M. Marinaro and P. Morasso, editors, Proceedings of International Conference on Artificial Neural Networks (ICANN 1994), volume 1, pages 226 – 229, London, 1994. Springer.
  • Levy and Gayler [2008] S. D. Levy and R. Gayler. Vector Symbolic Architectures: A new building material for artificial general intelligence. In Proceedings of the Conference on Artificial General Intelligence, pages 414 – 418, 2008.
  • Smolensky [2006] P. Smolensky. Harmony in linguistic cognition. Cognitive Science, 30:779 – 801, 2006.
  • Dirac [1939] P. A. M. Dirac. A new notation for quantum mechanics. Mathematical Proceedings of the Cambridge Philosophical Society, 35(3):416 – 418, 1939.
  • Smolensky [2012] P. Smolensky. Symbolic functions from neural computation. Philosophical Transactions of the Royal Society London, A 370(1971):3543 – 3569, 2012.
  • Wolff et al. [2018b] M. Wolff, G. Wirsching, M. Huber, P. beim Graben, R. Römer, and I. Schmitt. A Fock space toolbox and some applications in computational cognition. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 757 – 767, Cham, 2018b. Springer.
  • Duckhorn et al. [2017] F. Duckhorn, M. Huber, W. Meyer, O. Jokisch, C. Tschöpe, and M. Wolff. Towards an autarkic embedded cognitive user interface. In F. Lacerda, editor, Proceedings of the Interspeech Conference Stockholm, pages 3435 – 3436, 2017.
  • Huber et al. [2018] M. Huber, M. Wolff, W. Meyer, O. Jokisch, and K. Nowack. Some design aspects of a cognitive user interface. Online Journal of Applied Knowledge Management, 6(1):15 – 29, 2018.
  • Tschöpe et al. [2018] C. Tschöpe, F. Duckhorn, M. Huber, W. Meyer, and M. Wolff. A cognitive user interface for a multi-modal human-machine interaction. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 707 – 717, Cham, 2018. Springer.
  • Minsky [1974] M. Minsky. A framework for representing knowledge. Technical Report AIM-306, M.I.T., Cambridge (MA), 1974.
  • Allen [2003] J. F. Allen. Natural language processing. In Encyclopedia of Computer Science, pages 1218 – 1222. Wiley, Chichester (UK), 2003.
  • Tur et al. [2011] G. Tur, D. Hakkani-Tür, L. Heck, and S. Parthasarathy. Sentence simplification for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5628 – 5631, 2011.
  • Mesnil et al. [2015] G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu, and G. Zweig. Using recurrent neural networks for slot filling in spoken language understanding. IEEE Transactions on Audio, Speech and Language Processing, 23(3):530 – 539, 2015.
  • Allen [2017] J. Allen. Dialogue as collaborative problem solving. In Proceedings of Interspeech Conference, page 833, 2017.
  • Allen et al. [2018] J. F. Allen, O. Bahkshandeh, W. de Beaumont, L. Galescu, and C. M. Teng. Effective broad-coverage deep parsing. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
  • Karttunen [1984] L. Karttunen. Features and values. In Proceedings of the 10th International Conference on Computational Linguistics, pages 28 – 33, Stroudsburg (PA), 1984. Association for Computational Linguistics (ACL).

6 Appendix

6.1 Proof of Term Normal Form

Definition 1 (Context-Free Grammar)

A context-free grammar (CFG) is a quadruple G = (T, N, S, R) with a set of terminals T, a set of nonterminals N, the start symbol S ∈ N, and a set of rules R. A rule is usually written as a production A → γ with A ∈ N and γ a finite string over N ∪ T.

Definition 2 (Chomsky Normal Form)

According to Hopcroft and Ullman [1979] a CFG is said to be in Chomsky normal form iff every production is one of

A → B C (19a)
A → a (19b)
S → ε (19c)

with A ∈ N, B, C ∈ N \ {S}, and a ∈ T.

It is a known fact that for every CFG there is an equivalent CFG in Chomsky normal form Hopcroft and Ullman [1979]. It is also known that if a CFG does not produce the empty string — absence of production (19c) — then there is an equivalent CFG in Chomsky reduced form Hopcroft and Ullman [1979].

Definition 3 (Chomsky Reduced Form)

A CFG is said to be in Chomsky reduced form iff every production is one of

A → B C (20a)
A → a (20b)

with A, B, C ∈ N and a ∈ T.

By utilizing some of the construction steps for establishing Chomsky normal form from Hopcroft and Ullman [1979] we deduce

Corollary 1

For every CFG in Chomsky reduced form there is an equivalent CFG in Chomsky normal form without a rule corresponding to production (19c).

Proof

Let G = (T, N, S, R) be a CFG in Chomsky reduced form. Clearly, G does not produce the empty string. The only difference to Chomsky normal form is the allowed presence of the start symbol S on the right-hand side of rules in R. By introducing a new start symbol S′ and inserting a rule S′ → γ for every rule S → γ ∈ R we eliminate this presence and obtain an equivalent CFG in Chomsky normal form without a production of form (19c). ∎

Definition 4 (Term Normal Form)

A CFG G = (T, N, S, R) is said to be in term normal form iff G does not produce the empty string and for every two rules A → γ1 ∈ R and A → γ2 ∈ R it holds that |γ1| = |γ2|.

We state and prove by construction:

Theorem 6.1

For every CFG not producing the empty string there is an equivalent CFG in term normal form.

Proof

Let G be a CFG not producing the empty string. Let G′ = (T, N, S, R) be an equivalent CFG in Chomsky reduced form and let M ⊆ N be the set of all nonterminals from N which have productions of both forms (20a) and (20b).

We establish term normal form by applying the following transformations to G′ (a code sketch of the construction follows this proof):

  1. For every nonterminal A ∈ M we consider its rules of form (20a) and its rules of form (20b). We add

    1. new nonterminals A2 and A1,

    2. a new rule A2 → B C for every rule A → B C of form (20a) and

    3. a new rule A1 → a for every rule A → a of form (20b).

    Finally, we remove all rules with left-hand side A from R.

  2. For every nonterminal A ∈ M let PA be the set of rules where A appears at the first position on the right-hand side. For every rule B → A C ∈ PA we add

    1. a new rule B → A2 C and

    2. a new rule B → A1 C.

    Finally, we remove all rules in PA from R.

  3. For every nonterminal A ∈ M let QA be the set of rules where A appears at the second position on the right-hand side. For every rule B → C A ∈ QA we add

    1. a new rule B → C A2 and

    2. a new rule B → C A1.

    Finally, we remove all rules in QA from R.

  4. If S ∈ M then we add

    1. a new start symbol S′,

    2. a new rule S′ → S2 and

    3. a new rule S′ → S1.

  5. Finally, we remove the nonterminals in M from N. ∎
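A runnable sketch of this construction, under the reconstruction given above, is the following; the grammar is given in Chomsky reduced form as a dictionary, and the naming of the split nonterminals (A~1, A~2) is our own convention.

```python
# Transform a grammar in Chomsky reduced form into term normal form.
def to_term_normal_form(rules, start):
    # rules: {nonterminal: [rhs, ...]} with rhs a list like ["NP", "VP"] or ["the"]
    mixed = {A for A, rhss in rules.items()
             if any(len(r) == 1 for r in rhss) and any(len(r) == 2 for r in rhss)}
    new = {}
    for A, rhss in rules.items():                        # step 1: split mixed nonterminals
        if A in mixed:
            new[A + "~2"] = [r for r in rhss if len(r) == 2]
            new[A + "~1"] = [r for r in rhss if len(r) == 1]
        else:
            new[A] = list(rhss)
    for A in list(new):                                  # steps 2 and 3: reroute occurrences
        rerouted = []
        for rhs in new[A]:
            variants = [[]]
            for sym in rhs:
                options = [sym + "~2", sym + "~1"] if sym in mixed else [sym]
                variants = [v + [o] for v in variants for o in options]
            rerouted.extend(variants)
        new[A] = rerouted
    if start in mixed:                                   # step 4: new start symbol
        new[start + "~0"] = [[start + "~2"], [start + "~1"]]
        start = start + "~0"
    return new, start                                    # step 5: mixed symbols are dropped

# Example: a reduced-form grammar where "X" has rules of both forms
# rules = {"S": [["X", "X"]], "X": [["X", "X"], ["a"]]}
# print(to_term_normal_form(rules, "S"))
```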

We immediately deduce

Corollary 2

For every CFG only producing strings of either exactly length 1 or at least length 2 there is an equivalent CFG in term normal form which is also in Chomsky normal form.

Proof

We handle the two cases separately.

Case 1

Let G be a CFG producing only strings of exactly length 1. Since G does not produce the empty string there is an equivalent CFG in Chomsky reduced form where every rule is of form (20b) and the only nonterminal is the start symbol. Obviously, this CFG is in Chomsky normal form and also in term normal form.

Case 2

Let G be a CFG producing only strings of at least length 2. Since G does not produce the empty string there is an equivalent CFG in Chomsky reduced form, and from Corollary 1 it follows that there is an equivalent CFG in Chomsky normal form. Applying the construction from Theorem 6.1 to this CFG leads to a CFG in term normal form. Since G does not produce strings of length 1, step 4 is omitted by the construction and the grammar stays in Chomsky normal form. ∎

We also state the opposite direction.

Corollary 3

Every CFG for which an equivalent CFG in Chomsky normal form exists which is also in term normal form produces either only strings of length 1 or only strings of at least length 2.

Proof

Let G be a CFG in Chomsky normal form and term normal form at the same time. Clearly, G does not produce the empty string. Let RS be the set of rules with the start symbol S on the left side. Since G is in term normal form we have to consider the following two cases.

Case 1

Let S → a with a ∈ T be a rule in RS. Then every rule in the set RS has to be of the same form. It follows that G only produces strings of length 1.

Case 2

Let S → B C with B, C ∈ N be a rule in RS. Then every rule in the set RS has to be of the same form. It follows that strings produced by G have to be at least of length 2. ∎

We instantly deduce

Theorem 6.2

Those CFGs for which a Chomsky normal form in term normal form exists are exactly the CFGs producing either only strings of length 1 or only strings of at least length 2.

This follows directly from Corollaries 2 and 3.

6.2 Proof of Representation Theorem

The proof of the Fock space representation theorem for vector symbolic architectures follows from direct calculation using the definition of the tensor product representation (9).

Proof