1 Introduction
Claude E. Shannon, the pioneer of information theory, presented in 1952 a “mazesolving machine” as one of the first proper technical cognitive systems Shannon (1953).^{1}^{1}1 See also Shannon’s instructive video demonstration at https://www.youtube.com/watch?v=vPKkXibQXGA. It comprises a maze in form of a rectangular board partitioned into discrete cells that are partially separated by removable walls, and a magnetized “mouse” (nicknamed “Theseus”, after the ancient Greek hero) as a cognitive agent. The mouse possesses as an actuator a motorized electromagnet beneath the maze board. The magnet pulls the mouse through the maze. Sensation and memory are implemented by a circuit of relays, switching their states after encounters with a wall. In this way, Shannon technically realized a simple, nonhierarchic perceptionaction cycle (PAC) Young (2010), depicted in Fig. 1 as a viable generalization of a cybernetic feedback loop.
In general, PAC form the core of a cognitive dynamic system Young (2010); Haykin (2012). They describe the interaction of a cognitive agent with a dynamically changing world as shown in Fig. 1. The agent is equipped with sensors for the perception of its current state in the environment and with actuators
allowing for active state changes. A central control prescribes goals and strategies for problem solving that could be trained by either trialanderror learning as in Shannon’s construction, or, more generally, by reinforcement learning
Haykin (2012).In Shannon’s mousemaze system, the motor (the actuator) pulls the mouse along a path until it bumps into a wall which is registered by a sensor. This perception is stored by switching a relay, subsequently avoiding the corresponding action. The behavior control prescribes a certain maze cell where the agent may find a “piece of cheese” as a goal. When the goal is eventually reached, no further action is necessary. In a first run, the mouse follows an irregular path according to a trialanderror strategy, while building up a memory trace in the relay array. In every further run, the successfully learned path is pursued at once. However, when the operator modifies the arrangement of walls, the previously learned path becomes useless and the agent has to learn from the very beginning. Therefore, Shannon (1953, p. 1238) concludes:
The mazesolver may be said to exhibit at a very primitive level the abilities to (1) solve problems by trial and error, (2) repeat the solutions without the errors, (3) add and correlate new information to a partial solution, (4) forget a solution when it is no longer applicable.
In Shannon’s original approach, the mouse learns by trialanderror whenever it bumps into a wall. More sophisticated cognitive dynamic systems should be able to draw logical inferences and to communicate either with each other or with an external operator, respectively Römer et al. (2012). This requires higher levels of mental representations such as formal logics and grammars. Consider, e.g., the operator’s utterance:
the mouse ate cheese  (1) 
(note that symbols will be set in typewriter font in order to abstract from their conventional meaning in the first place). In the PAC described in Fig. 1, the acoustic signal has firstly to be analyzed in order to obtain a phonetic string representation. For understanding its meaning, the agent has secondly to process the utterance grammatically through syntactic parsing. Finally, the syntactic representation, e.g. in form of a phrase structure tree, must be interpreted as a semantic representation which the agent can ultimately understand Karttunen (1984). Depending upon such understanding, the agent can draw logical inferences and derive the appropriate behavior for controlling the actuators. In case of verbal behavior Skinner (2015), the agent therefore computes an appropriate response, first as a semantic representation, that is articulated into a syntactic and phonetic form and finally synthesized as an acoustic signal. In any case, highlevel representations are symbolic and their processing is ruledriven, in contrast to lowlevel sensation and actuation where physical signals are essentially continuous.
Originally, Shannon used an array of relays as the agent’s memory. This has later been termed the “learning matrix” by Steinbuch and Schmitt (1967). Learning matrices and vector symbolic architectures
(VSA) provide a viable interface between hierarchically organized symbolic data structures such as phrase structure trees or semantic representations and continuous state space approaches as required for neural networks. Beginning with seminal studies by
Smolensky (1990) and Mizraji (1989), and later pursued by Plate (1995), beim Graben and Potthast (2009), and Kanerva (2009) among many others, those architectures have been dubbed VSA by Gayler (2006).In a VSA, symbols and variables are represented as filler and role vectors of some underlying linear spaces, respectively. When a symbol is assigned to a variable, the corresponding filler vector is “bound” to the corresponding role vector. Different fillerrole bindings can be “bundled” together to form a data structure, such as a list, a frame, or a table of a relational data base. Those structures can be recursively bound to other fillers and further bundled together to yield arbitrarily complex data structures.
VSA have recently been employed for semantic spaces Recchia et al. (2015), logical inferences Emruli et al. (2013); Widdows and Cohen (2014), data base queries Kleyko et al. (2016), and autoassociative memories Gritsenko et al. (2017); Mizraji et al. (2018). Wolff et al. (2018a) developed a VSA model for cognitive representations and their induction in Shannon’s mousemaze system. In this article, we focus to the dashed region in Fig. 1, by elaborating earlier approaches for VSA language processors beim Graben et al. (2004, 2008a, 2008b); beim Graben and Gerth (2012); Carmantini et al. (2017). We present rigorous proofs for the vector space representations of contextfree grammars (CFG) and pushdown automata Hopcroft and Ullman (1979). To this end, we suggest a novel normal form for CFG, allowing to express CFG parse trees as terms over a symbolic term algebra. Rulebased derivations over that algebra are then represented as transformation matrices in Fock space Fock (1932); Aerts (2009).
Our approach could lead to the development of new machine learning algorithms for training neural networks as rulebased symbol processors. In contrast to blackbox deep neural network models, our method is essentially transparent and hence explainable Doran et al. (2017).
2 Methods
We start from a symbolic, rulebased system that can be described in terms of formal grammar and automata theory. Specifically, we chose contextfree grammars (CFG) and pushdown automata as their processors here
Hopcroft and Ullman (1979). In the second step, we reformulate these languages through term algebras and their processing through partial functions over term algebras. We introduce a novel normal form for CFG, called term normal form, and prove that any CFG in Chomsky normal form can be transformed into term normal form. Finally, we introduce a vector symbolic architecture by assigning basis vectors of a highdimensional linear space to the respective symbols and their roles in a phrase structure tree. We suggest a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and prove a representation theorem for the partial rulebased processing functions.2.1 Contextfree Grammars
Consider again the simple sentence (1) as a motivating example. According to linguistic theory, sentences such as (1) exhibit a hierarchical structure, indicating a logical subjectpredicate relationship. In (1) “the mouse” appears as subject and the phrase “ate cheese” as the predicate, which is further organized into a transitive verb “ate” and its direct object “cheese”. The hierarchical structure of sentence (1) can therefore be either expressed through regular brackets, as in (2)
(2) 
or, likewise as a phrase structure tree as in Fig. 2
In Fig. 2 every internal node of the tree denotes a syntactic category: S stands for “sentence”, NP for “noun phrase”, the sentence’s subject, VP for “verbal phrase”, the predicate, D for “determiner”, N for “noun”, and V for “verb”.
The phrase structure tree Fig. 2 immediately gives rise to a contextfree grammar (CFG) by interpreting every branch as a rewriting rule in Chomsky normal form Kracht (2003); Hopcroft and Ullman (1979)
S  (3a)  
NP  (3b)  
VP  (3c)  
D  (3d)  
N  (3e)  
V  (3f)  
N  (3g) 
where one distinguishes between syntactical rules (3a – 3c) and lexical rules (3d – 3g), respectively. More abstractly, a CFG is given as a quadruple , such that in our example is the set of words or terminal symbols, is the set of categories or nonterminal symbols, is the distinguished start symbol, and is a set of rules. A rule is usually written as a production where denotes a category and a finite string of terminals or categories of length .
Contextfree grammars can be processed by pushdown automata Hopcroft and Ullman (1979). Regarding psycholinguistic plausibilty, the leftcorner (LC) parser is particularly relevant because inputdriven bottomup and expectationdriven topdown processes are tightly intermingled with each other Hale (2011). An LC parser possesses, such as any other pushdown automaton, two memory tapes: firstly a working memory, called stack, operating in a lastinfirstout (LIFO) fashion, and an input tape storing the sentence to be processed.
In the most simple cases, when a given CFG does not contain ambiguities (as in (3a – 3g) for our example (1)), an LC parser can work deterministically. The LC parsing algorithm operates in four different modes: i) if nothing else is possible and if the input tape is not empty, the first word of the input is shifted into the stack; ii) if the first symbol in the stack is the left corner of a syntactic rule, the first stack symbol is rewritten by a predicted category (indicated by square brackets in Tab. 1) followed by the lefthand side of the rule (project); iii) if a category in the stack was correctly predicted, the matching symbols are removed from the stack (complete); iv) if the input tape is empty and the stack only contains the start symbol of the grammar, the automaton moves into the accepting state; otherwise, syntactic language processing had failed. Applying the LC algorithm to our example CFG leads to the symbolic process shown in Tab. 1.
step  stack  input  operation 

0  the mouse ate cheese  shift  
1  the  mouse ate cheese  project (3d) 
2  D  mouse ate cheese  project (3b) 
3  [N] NP  mouse ate cheese  shift 
4  mouse [N] NP  ate cheese  project (3e) 
5  N [N] NP  ate cheese  complete 
6  NP  ate cheese  project (3a) 
7  [VP] S  ate cheese  shift 
8  ate [VP] S  cheese  project (3f) 
9  V [VP] S  cheese  project (3c) 
10  [N] VP [VP] S  cheese  shift 
11  cheese [N] VP [VP] S  project (3g)  
12  N [N] VP [VP] S  complete  
13  VP [VP] S  complete  
15  S  accept 
The leftcorner parser shown in Tab. 1 essentially operates autonomously in modes project, complete and accept, but interactively in shift mode. Thus, we can significantly simplify the parsing process through a mapping from one intermediary automaton configuration to another one that is mediated by the interactively shifted input word beim Graben et al. (2008a); Wegner (1998). Expressing the configurations as temporary phrase structure trees yields then the symbolic computation in Fig. 3.
According to our previous definitions, the states of the processor are the automaton configurations in Tab. 1 or the temporary phrase structures trees in Fig. 3, that are both interpretable in terms of LC parsing and language processing for an informed expert observer. Moreover, the processing steps in the last column of Tab. 1 and also the interactive mappings Fig. 3 are understandable and thereby explainable by the observer. In principle, one could augment the leftcorner parser with a “reasoning engine” Doran et al. (2017) that translates the formal language used in those symbolic representations into everyday language. The result would be something like the (syntactic) “meaning” of a word that can be regarded as the operator mapping a tree in Fig. 3 to its successor. This interactive interpretation of meaning is wellknown in dynamic semantics Gärdenfors (1988); Groenendijk and Stokhof (1991); Kracht (2002); beim Graben (2014). Therefore, symbolic AI is straightforwardly interpretable and explainable Doran et al. (2017).
2.2 Algebraic Description
In order to prepare the construction of a vector symbolic architecture (VSA) Dolan and Smolensky (1989); Mizraji (1989); Plate (1995); Gayler (2006); Kanerva (2009); beim Graben and Potthast (2009) in the next step, we need an algebraically more sophisticated description. This is provided by the concept of a term algebra beim Graben and Gerth (2012); Kracht (2003). A term algebra is defined over a signature where is a finite set of function symbols and is an arity function, assigning to each symbol an integer indicating the number of arguments that has to take.
To apply this idea to a CFG, we introduce a new kind of grammar normal form that we call term normal form in the following. A CFG is said to be in term normal form if for every category holds: if is expanded into rules, to , then .
It can be easily demonstrated that every CFG can be transformed into a weakly equivalent CFG in term normal form, where weak equivalence means that two different grammars derive the same contextfree language. A proof is presented in Appendix 6.1.
Obviously, the rules (3a – 3c) of our example above are already in term normal form, simply because they are not ambiguous. Thus, we define a term algebra by regarding the set of variables as signature with arity function such that i) for all , i.e. terminals are nullary symbols and hence constants; ii) for categories , that are expanded through rules . Moreover, when is given in Chomsky normal form, for all categories appearing exclusively in lexical rules , i.e. lexical categories (D, N, V) are unary functions. Whereas, for all categories that appear exclusively in syntactic rules, which are hence binary functions.
For a general CFG in term normal form, we define the term algebra inductively. i) every terminal symbol is a term, . ii) Let be a category with and let be terms, then is a term. Additionally, we want to describe LC phrase structure trees as well. To this end, we extend the signature by the predicted categories , that are interpreted as constants with for . The enlarged term algebra is denoted by . We also allow for .
In the LC term algebra , we encode the tree of step 1 in Fig. 3 (beginning with the empty tree in step 0) as term
(4) 
because , , and . Likewise we obtain
(5) 
as the term representation of the succeeding step 2 in Fig. 3.
Next, we define several partial functions over as follows beim Graben and Gerth (2012); Smolensky (2006).
(6a)  
(6b)  
(6c) 
Here, the function yields the category, i.e. the function symbol of the term . The functions for term extraction and as term constructor are defined only partially, when , if and , as well as , if .
By means of the term transformations (6a – 6c) we can express the action of an incrementally and interactively shifted word through a term operator . For the transition from, e.g., LC tree 1 to LC tree 2 in Fig. 3 we obtain
(7) 
Therefore, the (syntactic) meaning of the word “mouse” is its impact on the symbolic term algebra.
2.3 Vector Symbolic Architectures
In vectorsymbolic architectures (VSA) Dolan and Smolensky (1989); Mizraji (1989); Plate (1995); Gayler (2006); Kanerva (2009); beim Graben and Potthast (2009)
hierarchically organized complex data structures are represented as vectors in high dimensional linear spaces. The composition of these structures is achieved by two basic operations: binding and bundling. While bundling is commonly implemented as vector superposition, i.e. addition, different VSA realize binding in particular ways: originally through tensor products
Dolan and Smolensky (1989); Smolensky (1990); Mizraji (1989, 1992), through circular convolution in reduced holographic representations (HRR) Plate (1995), through XOR spatter code Kanerva (1994) or through Hadamard products Levy and Gayler (2008). While HRR, spatter code, Hadamard products or a combination of tensor products with nonlinear compression Smolensky (2006) are lossy representations that require a cleanup module (usually an attractor neural network, cf. Kanerva (2009)), tensor product representations of basis vectors are faithful, thereby allowing interpretable and explainable VSA Doran et al. (2017).Coming back to our linguistic example, we construct a homomorphism from the term algebra unified with its categories to a vector space in such a way, that the structure of the transformations (6a – 6c) is preserved. The resulting images for terms become vector space operators, i.e. essentially matrices acting on .
Again, we proceed inductively. First we map the symbols in onto vectors. To each atomic symbol we assign a socalled filler basis vector , calling the subspace the filler space. Its dimension corresponds to the number of atomic symbols in , which is in our example.
Let further be the length of the largest production of grammar . Then, we define socalled role vectors , spanning the role space . Note that we employ the socalled Dirac notation from quantum mechanics that allows a coordinatefree and hence representationindependent description here Dirac (1939). Then, the role denotes the 1st daughter node, the 2nd daugther and so on, until the last daughter . The remaining role bounds the mother node in the phrase structure trees of grammar . In our example, because has Chomsky normal form, we have such that there are three roles for positions in a binary branching tree: left daughter , right daughter , and mother . For binary trees, we also use a more intuitive symbolic notation: left daughter , right daughter , and mother .
Let be a term. Then, we define the tensor product representation of in vector space recursively as follows
(8) 
As a shorthand notation, we suggest the Dirac expression
(9) 
Here the symbol “” refers to the (Kronecker) tensor product, mapping two vectors onto another vector, in contrast to the dyadic (outer) tensor product, which yields a matrix, which is a vector space operator. In addition, “” denotes the (outer) direct sum that is mandatory for the superposition of vectors from spaces with different dimensionality.
Obviously, the (in principle) infinite recursion of the mapping leads to an infinitedimensional representation space
(10) 
that is known as Fock space from quantum field theory Fock (1932); Aerts (2009); beim Graben and Potthast (2009); Smolensky (2012).
In quantum field theory, there is a distinguished state , the vacuum state, spanning a onedimensional subspace, the vacuum sector that is isomorphic to the underlying number field. According to (10), this sector is contained in the subspace spanned by filler and role spaces, . Therefore, we could represent the empty tree in Fig. 3 by an arbitrary role; a suitable choice is the mother role , hence symbolizing the vacuum state.
Using the tensor product representation (8), we can recursively compute the images of our example terms above. For (4) we obtain
(11) 
where we used the compressed Dirac notation in the last steps. The last line is easily interpretable in terms of phrase structure: It simply states that NP occupies the root of the tree, D appears as its immediate left daughter, the is the left daughter’s left daughter and a leave, and finally [N] is a leave bound to the right daughter of the root. Note that the Dirac kets have to be interpreted from the right to the left (reading the arabic way). The vector belongs to a Fock subspace of dimension
(12) 
where , and the embedding depth in the phrase structure tree step 1 of Fig. 3. This leads to for .
Similarly, we get for (5)
(13) 
where we have again utilized the more intuitive branching notation in the last line which can be straightforwardly interpreted in terms of tree addresses as depicted in Fig. 3 (step 2). Computing the dimension of the respective Fock subspace according to (12) yields for .
In Fock space, the interactive and incremental action of a word is then represented as a matrix operator . For the transition from (4) to (5) we obtain
(14) 
In order to prove a homomorphism, we define the following linear maps on .
(15a)  
(15b)  
(15c) 
here, denotes the unit operator (i.e. the unit matrix) and the Dirac “bra” vectors are linear forms from the dual role space that are adjoined to the role “ket” vectors such that with Kronecker’s for .
By means of these homomorphisms we compute the meaning of “mouse” as Fock space operator through
(16) 
(17) 
where we have expanded as in (13) above. Note that the meaning of “mouse” crucially depends on the given state subjected to the operator , making meaning highly contextual. This is an important feature of dynamic semantics as well Gärdenfors (1988); Groenendijk and Stokhof (1991); Kracht (2002); beim Graben (2014).
3 Results
The main result of this study is a Fock space representation theorem for vector symbolic architectures of contextfree grammars that follows directly from the definitions (15a – 15c) and is proven in Appendix 6.2.
The tensor product representation is a homomorphism with respect to the term transformations (6a – 6c). It holds
(18a)  
(18b)  
(18c) 
For the particular example discussed above, we obtain the Fock space trajectory in Tab. 2.
#  Fock vector  dim  operation 

0  16  shift the  
1  172  shift mouse  
2  523  shift ate  
3  523  shift cheese  
4  523  accept 
Moreover, we present the complete Fock space LC parse generated by FockBox which is a MATLAB toolbox provided by Wolff et al. (2018b)
as its threedimensional projection after principal component analysis (PCA
beim Graben et al. (2008a); beim Graben and Gerth (2012); Wolff et al. (2018b)) in Fig. 4 as illustration.4 Discussion
In this article we developed a representation theory for contextfree grammars and pushdown automata in Fock space as a vector symbolic architecture (VSA). We presented rigorous proofs for the representations of suitable term algebras. To this end, we suggested a novel normal form for CFG allowing to express CFG parse trees as terms over a symbolic term algebra. Rulebased derivations over that algebra are then represented as transformation matrices in Fock space.
Motivated by a seminal study of Shannon (1953) on cognitive dynamic systems Haykin (2012), our work could be of significance for levering research on cognitive user interfaces (CUI) Young (2010); Duckhorn et al. (2017); Huber et al. (2018); Tschöpe et al. (2018). Such systems are subject of ambitious current research. Instead of using keyboards and displays as inputoutput interfaces, users pronounce requests or instructions to a device as spoken language and listen to its uttered responses. To this aim, stateoftheart language technology scans the acoustically analyzed speech signal for relevant keywords that are subsequently inserted into semantic frames Minsky (1974) to interpret the user’s intent. This slot filling procedure Allen (2003); Tur et al. (2011); Mesnil et al. (2015)
is based on large language corpora that are evaluated by machine learning methods, such as deep learning of neural networks
Mesnil et al. (2015). The necessity to overcome traditional slot filling techniques by proper semantic analyses technologies has already been emphasized by Allen (2017). His research group trains semantic parsers from large language data bases such as WordNet or VerbNet that are constrained by handcrafted expert knowledge and semantic ontologies Allen (2003); Allen et al. (2018).Another road toward realistic CUI systems is the development of utterancemeaning transducers (UMT) that map syntactic representations obtained from the speech signal onto semantic representations in terms of feature value relations (FVR) Karttunen (1984); Huber et al. (2018). This is achieved through a perception action cycle, comprising the three components: perception, action and behavior control. The perception module transforms the input from the signal layer to the semantic symbolic layer, the module for behavior control solves decision problems based on semantic information and computes appropriate actions. Finally, the action module executes the result by producing acoustic feedback. Behavior control can flexibly adapt to user’s demands through reinforcement learning.
For the implementation of rulebased symbolic computations in cognitive dynamic systems, such as neural networks, VSA provide a viable approach. Our results contribute a formally sound basis for this kind of future research and engineering.
5 Conclusion
We reformulated contextfree grammars (CFG) through term algebras and their processing through pushdown automata by partial functions over term algebras. We introduced a novel normal form for CFG, called term normal form, and proved that any CFG in Chomsky normal form can be transformed into term normal form. Finally, we introduced a vector symbolic architecture (VSA) by assigning basis vectors of a highdimensional linear space to the respective symbols and their roles in a phrase structure tree. We suggested a recursive function for mapping CFG phrase structure trees onto representation vectors in Fock space and proved a representation theorem for the partial rulebased processing functions. We illustrated our findings by an interactive leftcorner parser and used FockBox, a freely accessible MATLAB toolbox, for the generation and visualization of Fock space VSA. Our approach directly encodes symbolic, rulebased knowledge into the hyperdimensional computing framework of VSA and can thereby supply substantial insights into the future development of explainable artifical intelligence (XAI).
Conflict of interest
The authors declare that they have no conflict of interest.
References
 Shannon [1953] C. E. Shannon. Computers and automata. Proceedings of the Institute of Radio Engineering, 41(10):1234 – 1241, 1953.
 Young [2010] Steve Young. Cognitive user interfaces. IEEE Signal Processing Magazine, 27(3):128–140, 2010.
 Haykin [2012] S. Haykin. Cognitive Dynamic Systems. Cambridge University Press, 2012.
 Römer et al. [2012] R. Römer, P. beim Graben, M. Huber, M. Wolff, G. Wirsching and I. Schmitt. Behavioral control of cognitive agents using database semantics and minimalist grammars. IEEE Conference on Cognitive InfoCommunication, Naples, 2019.
 Skinner [2015] B. F. Skinner. Verbal Behavior. Martino Publishing, Mansfield Centre (CT), 2015. 1st Edition 1957.
 Steinbuch and Schmitt [1967] K. Steinbuch and E. Schmitt. Adaptive systems using learning matrices. In H. L. Oestericicher and D. R. Moore, editors, Biocybernetics in Avionics, pages 751 – 768. Gordon and Breach, New York, 1967. Reprinted in J. A. Anderson, Pellionisz and E. Rosenfeld (1990), pp. 65ff.
 Smolensky [1990] P. Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(12):159 – 216, 1990.
 Mizraji [1989] E. Mizraji. Contextdependent associations in linear distributed memories. Bulletin of Mathematical Biology, 51(2):195 – 205, 1989.
 Plate [1995] T. A. Plate. Holographic reduced representations. IEEE Transactions on Neural Networks, 6(3):623 – 641, 1995.
 beim Graben and Potthast [2009] P. beim Graben and R. Potthast. Inverse problems in dynamic cognitive modeling. Chaos, 19(1):015103, 2009.

Kanerva [2009]
P. Kanerva.
Hyperdimensional computing: An introduction to computing in distributed representation with highdimensional random vectors.
Cognitive Computation, 1(2):139 – 159, 2009.  Gayler [2006] R. W. Gayler. Vector symbolic architectures are a viable alternative for Jackendoff’s challenges. Behavioral and Brain Sciences, 29:78 – 79, 2 2006.
 Recchia et al. [2015] G. Recchia, M. Sahlgren, P. Kanerva, and M. N. Jones. Encoding sequential information in semantic space models: Comparing holographic reduced representation and random permutation. Computational Intelligence and Neuroscience, 2015:58, 2015.
 Emruli et al. [2013] B. Emruli, R. W. Gayler, and F. Sandin. Analogical mapping and inference with binary spatter codes and sparse distributed memory. In Proceedings of the International Joint Conference on Neural Networks (IJCNN), pages 1 – 8, 2013.
 Widdows and Cohen [2014] D. Widdows and T. Cohen. Reasoning with vectors: A continuous model for fast robust inference. Logic Journal of the IGPL, 23(2):141 – 173, 11 2014.
 Kleyko et al. [2016] D. Kleyko, E. Osipov, and R. W. Gayler. Recognizing permuted words with vector symbolic architectures: A Cambridge test for machines. Procedia Computer Science, 88:169 – 175, 2016. Proceedings of the 7th Annual International Conference on Biologically Inspired Cognitive Architectures (BICA 2016).
 Gritsenko et al. [2017] V. I. Gritsenko, D. A. Rachkovskij, A. A. Frolov, R. Gayler, D. Kleyko, and E. Osipov. Neural distributed autoassociative memories : A survey. Cybernetics and Computer Engineering Journal, 188(2):5 – 35, 2017.
 Mizraji et al. [2018] E. Mizraji, A. Pomi, and J. Lin. Improving neural models of language with inputoutput tensor contexts. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 430 – 440, Cham, 2018. Springer.
 Wolff et al. [2018a] M. Wolff, M. Huber, G. Wirsching, R. Römer, P. beim Graben, and I. Schmitt. Towards a quantum mechanical model of the inner stage of cognitive agents. In IEEE International Conference on Cognitive Infocommunications (CogInfoCom), volume 9, pages 147 – 152, 2018a.
 beim Graben et al. [2004] P. beim Graben, B. Jurish, D. Saddy, and S. Frisch. Language processing by dynamical systems. International Journal of Bifurcation and Chaos, 14(2):599 – 621, 2004.
 beim Graben et al. [2008a] P. beim Graben, S. Gerth, and S. Vasishth. Towards dynamical system models of languagerelated brain potentials. Cognitive Neurodynamics, 2(3):229 – 255, 2008a.
 beim Graben et al. [2008b] P. beim Graben, D. Pinotsis, D. Saddy, and R. Potthast. Language processing with dynamic fields. Cognitive Neurodynamics, 2(2):79 – 88, 2008b.
 beim Graben and Gerth [2012] P. beim Graben and S. Gerth. Geometric representations for minimalist grammars. Journal of Logic, Language and Information, 21(4):393 – 432, 2012.

Carmantini et al. [2017]
G. S. Carmantini, P. beim Graben, M. Desroches, and S. Rodrigues.
A modular architecture for transparent computation in recurrent neural networks.
Neural Networks, 85:85 – 105, 2017.  Hopcroft and Ullman [1979] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison–Wesley, Menlo Park, California, 1979.
 Fock [1932] V. Fock. Konfigurationsraum und zweite Quantelung. Zeitschrift für Physik, 75(9):622 – 647, 1932.
 Aerts [2009] D. Aerts. Quantum structure in cognition. Journal of Mathematical Psychology, 53(5):314 – 348, 2009.
 Doran et al. [2017] D. Doran, S. Schulz, and T. R. Besold. What does explainable AI really mean? A new conceptualization of perspectives. arXiv:1710.00794 [cs.AI], 2017.
 Kracht [2003] M. Kracht. The Mathematics of Language. Number 63 in Studies in Generative Grammar. Mouton de Gruyter, Berlin, 2003.
 Hale [2011] J. T. Hale. What a rational parser would do. Cognitive Science, 35(3):399 – 443, 2011.
 Wegner [1998] P. Wegner. Interactive foundations of computing. Theoretical Computer Science, 192:315 – 351, 1998.
 Dolan and Smolensky [1989] C. P. Dolan and P. Smolensky. Tensor product production system: A modular architecture and representation. Connection Science, 1(1):53 – 68, 1989.
 Gärdenfors [1988] P. Gärdenfors. Knowledge in Flux. Modeling the Dynamics of Epistemic States. MIT Press, Cambridge (MA), 1988.
 Groenendijk and Stokhof [1991] J. Groenendijk and M. Stokhof. Dynamic predicate logic. Linguistics and Philosophy, 14(1):39 – 100, 1991.
 Kracht [2002] M. Kracht. Dynamic semantics. Linguistische Berichte, Sonderheft X:217 – 241, 2002.
 beim Graben [2014] P. beim Graben. Order effects in dynamic semantics. Topics in Cognitive Science, 6(1):67 – 73, 2014.
 Mizraji [1992] E. Mizraji. Vector logics: The matrixvector representation of logical calculus. Fuzzy Sets and Systems, 50:179 – 185, 1992.
 Kanerva [1994] P. Kanerva. The binary spatter code for encoding concepts at many levels. In M. Marinaro and P. Morasso, editors, Proceedings of International Conference on Artificial Neural Networks (ICANN 1994), volume 1, pages 226 – 229, London, 1994. Springer.
 Levy and Gayler [2008] S. D. Levy and R. Gayler. Vector Symbolic Architectures: A new building material for artificial general intelligence. In Proceedings of the Conference on Artificial General Intelligence, pages 414 – 418, 2008.
 Smolensky [2006] P. Smolensky. Harmony in linguistic cognition. Cognitive Science, 30:779 – 801, 2006.
 Dirac [1939] P. A. M. Dirac. A new notation for quantum mechanics. Mathematical Proceedings of the Cambridge Philosophical Society, 35(3):416 – 418, 1939.
 Smolensky [2012] P. Smolensky. Symbolic functions from neural computation. Philosophical Transactions of the Royal Society London, A 370(1971):3543 – 3569, 2012.
 Wolff et al. [2018b] M. Wolff, G. Wirsching, M. Huber, P. beim Graben, R. Römer, and I. Schmitt. A Fock space toolbox and some applications in computational cognition. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 757 – 767, Cham, 2018b. Springer.
 Duckhorn et al. [2017] F. Duckhorn, M. Huber, W. Meyer, O. Jokisch, C. Tschöpe, and M. Wolff. Towards an autarkic embedded cognitive user interface. In F. Lacerda, editor, Proceedings of the Interspeech Conference Stockholm, pages 3435 – 3436, 2017.
 Huber et al. [2018] M. Huber, M. Wolff, W. Meyer, O. Jokisch, and K. Nowack. Some design aspects of a cognitive user interface. Online Journal of Applied Knowledge Management, 6(1):15 – 29, 2018.
 Tschöpe et al. [2018] C. Tschöpe, F. Duckhorn, M. Huber, W. Meyer, and M. Wolff. A cognitive user interface for a multimodal humanmachine interaction. In A. Karpov, O. Jokisch, and R. Potapova, editors, Speech and Computer, pages 707 – 717, Cham, 2018. Springer.
 Minsky [1974] M. Minsky. A framework for representing knowledge. Technical Report AIM306, M.I.T., Cambridge (MA), 1974.
 Allen [2003] J. F. Allen. Natural language processing. In Encyclopedia of Computer Science, pages 1218 – 1222. Wiley, Chichester (UK), 2003.
 Tur et al. [2011] G. Tur, D. HakkaniTür, L. Heck, and S. Parthasarathy. Sentence simplification for spoken language understanding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5628 – 5631, 2011.
 Mesnil et al. [2015] G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. HakkaniTur, X. He, L. Heck, G. Tur, D. Yu, and G. Zweig. Using recurrent neural networks for slot filling in spoken language understanding. IEEE Transactions on Audio, Speech and Language Processing, 23(3):530 – 539, 2015.
 Allen [2017] J. Allen. Dialogue as collaborative problem solving. In Proceedings of Interspeech Conference, page 833, 2017.
 Allen et al. [2018] J. F. Allen, O. Bahkshandeh, W. de Beaumont, L. Galescu, and C. M. Teng. Effective broadcoverage deep parsing. In Proceedings of the ThirtySecond AAAI Conference on Artificial Intelligence, 2018.
 Karttunen [1984] L. Karttunen. Features and values. In Proceedings of the 10th International Conference on Computational Linguistics, pages 28 – 33, Stroudsburg (PA), 1984. Association for Computational Linguistics (ACL).
6 Appendix
6.1 Proof of Term Normal Form
Definition 1 (ContextFree Grammer)
A contextfree grammar (CFG) is a quadruple with a set of terminals , a set of nonterminals , the start symbol and a set of rules . A rule is usually written as a production .
Definition 2 (Chomsky Normal Form)
According to Hopcroft and Ullman [1979] a CFG is said to be in Chomsky normal form iff every production is one of
(19a)  
(19b)  
(19c) 
with , and .
It is a known fact, that for every CFG there is an equivalent CFG in Chomsky normal form Hopcroft and Ullman [1979]. It is also known that if does not produce the empty string — absence of production (19c) — then there is an equivalent CFG in Chomsky reduced form Hopcroft and Ullman [1979].
Definition 3 (Chomsky Reduced Form)
A CFG is said to be in Chomsky reduced form iff every production is one of
(20a)  
(20b) 
with and .
By utilizing some of the construction steps for establishing Chomsky normal form from Hopcroft and Ullman [1979] we deduce
Corollary 1
For every CFG in Chomsky reduced form there is an equivalent CFG in Chomsky normal form without a rule corresponding to production (19c).
Proof
Let be a CFG in Chomsky reduced form. Clearly does not produce the empty string. The only difference to Chomsky normal form is the allowed presence of the start symbol on the righthand side of rules in . By introducing a new start symbol and inserting rules we eliminate this presence and obtain an equivalent CFG in Chomsky normal form without a production of form (19c). ∎
Definition 4 (Term Normal Form)
A CFG is said to be in term normal form iff and for every two rules and
holds.
We state and proof by construction:
Theorem 6.1
For every CFG not producing the empty string there is an equivalent CFG in term normal form.
Proof
Let be a CFG not producing the empty string. Let be the equivalent CFG in Chomsky reduced form and be the set of all nonterminals from which have productions of both forms (20a) and (20b).
We establish term normal form by applying the following transformations to :

For every nonterminal let be the set of rules where appears at first position on the righthand side. For every rule we add

a new rule and

a new rule .
Finally, we remove all rules from .


For every nonterminal let be the set of rules where appears at second position on the righthand side. For every rule we add

a new rule and

a new rule .
Finally, we remove all rules from .


If then we add

a new start symbol ,

a new rule and

a new rule .


Finally, we remove from . ∎
We immediately deduce
Corollary 2
For every CFG only producing strings of either exactly length or at least length there is an equivalent CFG in term normal form which is also in Chomsky normal form.
Proof
We handle the two cases separately.
Case 1
Let be a CFG producing strings of exactly length . Since does not produce the empty string there is an equivalent CFG in Chomsky reduced form where every rule is of form (20b) and the only nonterminal being the start symbol. Obviously, is in Chomsky normal form and also in term normal form.
Case 2
Let be a CFG producing strings of at least length . Since does not produce the empty string there is an equivalent CFG in Chomsky reduced form and from corollary 1 follows that there is an equivalent CFG in Chomsky normal form. Applying the construction from theorem 6.1 to this CFG leads to a CFG in term normal formal. Since does not produce strings of length step 4 is omitted by the construction and stays in Chomsky normal form. ∎
We also state the opposite direction.
Corollary 3
Every CFG for which an equivalent CFG in Chomsky normal form exists which is also in term normal form, produces either only strings of length or at least of length .
Proof
Let be a CFG in Chomsky normal form and term normal form at the same time. Clearly, does not produce the empty string. Let be the set of rules with the start symbols on the left side. Since is in term normal form we have to consider the following two cases.
Case 1
Let be a rule where . Then every rule in the set has to be of the same form. It follows that only produces strings of length .
Case 2
Let be a rule with . Then every rule in the set has to be of the same form. It follows that strings produced by have to be at least of length . ∎
We instantly deduce
Theorem 6.2
Those CFGs for which a Chomsky normal form in term normal exists are exactly the CFGs producing either only strings of length or strings with at least length .
6.2 Proof of Representation Theorem
The proof of the Fock space representation theorem for vector symbolic architectures follows from direct calculation using the definition of the tensor product representation (9).
Proof
∎