1 Overview: SLang, SNet and SRep
Embedding words and sentences in vector spaces has brought many symbolic tasks (especially in NLP) within the scope of deep neural network (DNN) models (Hinton, 1988; Palangi et al., 2016; Pollack, 1990; Socher et al., 2010; Weston et al., 2015). In general, DNNs may be expected to benefit if they can incorporate some of the power of symbolic computation without compromising the power of deep learning. The problem of embedding general symbol structures in vector spaces, and performing symbolic computation with these vectors, has been addressed theoretically, but these methods can require either very large embedding spaces, e.g., Tensor Product Representation, TPR (Lee et al., 2016; Smolensky, 1990; Smolensky & Legendre, 2006), or major error-correction/cleanup processes, e.g., Holographic Reduced Representation, HRR (Crawford et al., 2016; Plate, 1993, 2002, 2003) (Sec. 4; see also Kanerva (2009); Touretzky (1990)). We show here (Sec. 3) that deep learning can itself discover satisfactory methods of embedding general symbol structures, methods that operate in relatively small vector spaces without need for cleanup processing.

We define a general formal language scheme in which expressions denote symbol structures. Such a language will be called an SLang (Sec. 2). Information within these structures is accessed by evaluating query expressions within the language. The model that learns to encode structure-denoting expressions and to evaluate queries over these structures (Zaremba & Sutskever, 2014) is a simple bidirectional encoder-decoder model that operates on symbols in the formal language one at a time (Cho et al., 2014). We call such a model an SNet (Sec. 3), and call the vector embedding of an SLang learned by an SNet an SRep.
2 The task: Embedding general symbolic structures in vector spaces and accessing their contents
Symbol-structure-denoting expressions: Structural-role binding. In general, a symbol structure can be characterized as a set of symbols each bound to a role that it plays in the structure (Newell, 1980, 141). The method is applicable to any type of symbol structure, but we focus on binary trees here. The simple binary tree consists of the symbols respectively bound to the roles , where is the role of left-child (‘0’) of right-child (‘1’) of root, etc. Symbolic structural roles are typically recursive. The recursive character of binary tree roles can be seen by viewing as the symbol bound to the role . In the simple formal language we develop here, SLang, the tree will be denoted by the expression , where respectively abbreviate the roles . The grammar of SLang is shown in Fig. 1.
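As a rough illustration of structural-role binding, the following Python sketch flattens a binary tree into a set of symbol/role bindings. The nested-tuple tree format, the function name `tree_to_bindings`, and the bit-string encoding of roles are assumptions made for this example, not SLang syntax. Each step toward a leaf is prepended to the role string, so that '01' reads as "left child ('0') of right child ('1') of root".

```python
def tree_to_bindings(tree, role=""):
    """Flatten a nested (left, right) tuple tree into {role: symbol}.

    Roles are bit strings (an assumed encoding): '0' = left child,
    '1' = right child, with the deepest step written first.
    """
    if isinstance(tree, str):            # a leaf: one symbol bound to one role
        return {role: tree}
    left, right = tree
    bindings = {}
    bindings.update(tree_to_bindings(left, "0" + role))
    bindings.update(tree_to_bindings(right, "1" + role))
    return bindings


# Hypothetical example tree: 'a' is the left child of the root,
# and the right child is the subtree with children 'b' and 'c'.
example = ("a", ("b", "c"))
bindings = tree_to_bindings(example)
print(bindings)
```

The recursion mirrors the recursive character of the roles: the role of a symbol inside a subtree is the role of the symbol within that subtree, followed by the role of the subtree itself.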
Query-denoting expressions: Structural-role unbinding. A minimal requirement for a vector embedding a symbol structure is that it be possible to extract, with vector computation, (the embedding of) the symbol that is bound to any specified role. In SLang (see Fig. 1), the query denoted asks for the structure (possibly a single symbol) bound to the role denoted ; this is role-unbinding. Thus the expression asks for the structure filling the role in , i.e., the structure forming the left child of the root, which in happens to be a single symbol, . Similarly, the expression asks for the left child of the right child of the root, and so has the value . An SLang query can return a structure rather than a single symbol. The expression asks for the right child of the root of , which is the structure , denoting ’s right subtree, .
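The symbolic behaviour that a role-unbinding query is meant to have can be sketched in a few lines of Python. This is only a symbolic reference model under assumed conventions (a dictionary from bit-string roles to symbols, with '01' meaning left child of right child of root); it is not the vector computation performed by SNet, and `unbind` is a hypothetical helper name.

```python
def unbind(bindings, role):
    """Return whatever fills `role`: a symbol, a sub-structure, or None.

    `bindings` maps bit-string roles to symbols (assumed encoding).
    """
    if role in bindings:                 # the role is filled by a single symbol
        return bindings[role]
    # Otherwise collect every binding located beneath `role`, re-expressed
    # relative to the root of the returned sub-structure.
    sub = {r[: len(r) - len(role)]: s
           for r, s in bindings.items()
           if r.endswith(role) and len(r) > len(role)}
    return sub or None                   # None models a "not found" query


tree = {"0": "a", "01": "b", "11": "c"}  # hypothetical example tree

left_child = unbind(tree, "0")           # a single symbol
left_of_right = unbind(tree, "01")       # a single symbol, one level deeper
right_subtree = unbind(tree, "1")        # a structure, not a symbol
missing = unbind(tree, "10")             # no such position in this tree
```

Returning a dictionary for `unbind(tree, "1")` corresponds to a query whose value is a whole subtree rather than a single symbol.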
Expressions combining querying and structure-building. The general expression in SLang allows structures returned by queries to be used to build new structures. Table 1 provides examples of expressions correctly evaluated by SNet.
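Table 1: Examples of expressions correctly evaluated by SNet.

| Expression | Value | Type |
|---|---|---|
| | | binding |
| | | unbind |
| | | unbind (not found) |
| | | 3bind, unbind, rebind |
| | | 4nested, unbind |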
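To make the idea of combining querying with structure-building concrete, here is a small symbolic sketch in Python. It assumes the same hypothetical encoding as a dictionary from bit-string roles to symbols ('01' = left child of right child of root); `unbind` and `bind` are illustrative helper names, and the example swaps the two children of the root by unbinding each and rebinding it to the other role.

```python
def unbind(bindings, role):
    """Return the symbol or sub-structure filling `role`, or None."""
    if role in bindings:
        return bindings[role]
    sub = {r[: len(r) - len(role)]: s
           for r, s in bindings.items()
           if r.endswith(role) and len(r) > len(role)}
    return sub or None


def bind(filler, role):
    """Bind a symbol or a sub-structure to `role`; returns role->symbol pairs."""
    if isinstance(filler, str):          # a single symbol
        return {role: filler}
    # A sub-structure: every inner role is now located beneath `role`.
    return {inner + role: s for inner, s in filler.items()}


tree = {"0": "a", "01": "b", "11": "c"}  # hypothetical tree (a, (b, c))

# Unbind both children of the root, then rebind them to the opposite roles.
swapped = {**bind(unbind(tree, "1"), "0"),
           **bind(unbind(tree, "0"), "1")}
print(swapped)
```

The result encodes the tree whose left child is the old right subtree and whose right child is the symbol 'a'.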
3 The SNet model and experimental results
SNet is a standard bidirectional encoder-decoder network in which the output of the bidirectional LSTM encoder is the SRep embedding of the input SLang expression. The SRep vector is then fed as input to an LSTM decoder. Some implementation details are given in Table 2, which also gives the results of training SNet on randomly selected input/output pairs.
| Accuracy (%) | Test perplexity | Train loss | Mini-batches | Batch size | Hidden dimension | Dropout | Learning rate | Optimizer | Attention; Beam |
|---|---|---|---|---|---|---|---|---|---|
| 96.16 | 1.02 | 0.187 | 54K | 128 | 128 | 0.2 | 0.001 | Adam | None |

Table 2: Performance and hyperparameters of the trained SNet model.

4 Analysis of SRep: The Superposition Principle
The Superposition Principle in theoretical structure-embedding schemes. Theoretical solutions to performing the task defined in Sec. 2 were proposed in the previous generation of neural network modeling. Two general solutions, TPR and HRR, were introduced in Sec. 1. The TPR embedding of a symbol structure $S$ with symbols $\{s_k\}$ respectively bound to roles $\{r_k\}$ is $\mathrm{TPR}(S) = \sum_k \mathbf{s}_k \otimes \mathbf{r}_k$, where $\otimes$ denotes the tensor (generalized outer) product and $\mathbf{s}_k$ and $\mathbf{r}_k$ are embeddings of the symbols and roles, with respective dimensions $d_s$ and $d_r$; the dimension of the TPR itself is then $d_s d_r$.

If the role-embedding vectors $\{\mathbf{r}_k\}$ are linearly independent, when collected together they form an invertible matrix $R$; the rows of $R^{-1}$ are the “unbinding” vectors $\{\mathbf{u}_k\}$: $\mathbf{u}_j \cdot \mathbf{r}_k = \delta_{jk}$, so these vectors can be used to unbind the roles in a TPR. The symbol that fills role $r_k$ in structure $S$ is exactly the symbol with embedding $\mathrm{TPR}(S) \cdot \mathbf{u}_k$.

A crucial property of TPR is that the embedding of a structure is the sum of the embeddings of its individual symbol/role bindings. This is TPR’s Superposition Principle. This is what enables extraction of symbols from any binding, since $\mathrm{TPR}(S) \cdot \mathbf{u}_k = \sum_j \mathbf{s}_j \, (\mathbf{r}_j \cdot \mathbf{u}_k) = \mathbf{s}_k$.
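The TPR binding and unbinding operations described above can be checked numerically. The NumPy sketch below is a toy example under stated assumptions: the particular symbol and role vectors, the helper name `tpr`, and the variable names are invented for illustration, and the roles are deliberately chosen to be linearly independent but not orthogonal, so that the unbinding vectors really do have to come from the inverse matrix.

```python
import numpy as np


def tpr(symbol_vecs, role_vecs):
    """Sum of outer products symbol_k (x) role_k; result has shape (d_s, d_r)."""
    return sum(np.outer(s, r) for s, r in zip(symbol_vecs, role_vecs))


# Two linearly independent, non-orthogonal role vectors (d_r = 2).
roles = np.array([[1.0, 0.0],
                  [1.0, 1.0]])           # row k is the role vector r_k

# Two symbol embeddings (d_s = 3); the values are arbitrary.
sym_a = np.array([1.0, 2.0, 3.0])
sym_b = np.array([-1.0, 0.5, 4.0])

T = tpr([sym_a, sym_b], roles)           # d_s * d_r = 6 entries in total

# Unbinding vectors: rows of the inverse of the matrix whose columns are r_k.
R = roles.T
U = np.linalg.inv(R)                     # U[j] @ roles[k] is 1 if j == k else 0

recovered_a = T @ U[0]                   # unbind role 0
recovered_b = T @ U[1]                   # unbind role 1

print(np.allclose(recovered_a, sym_a), np.allclose(recovered_b, sym_b))
```

Because the structure embedding is a plain sum of binding terms, multiplying by an unbinding vector zeroes out every term except the one for the requested role, which is why recovery is exact here.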
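HRRs are essentially contracted TPRs (Smolensky & Legendre, 2006, 260). The equation defining $\mathrm{TPR}(S)$ also defines $\mathrm{HRR}(S)$, provided $\otimes$ is reinterpreted to denote circular convolution: $[\mathbf{s} \otimes \mathbf{r}]_i = \sum_j s_j \, r_{(i-j) \bmod d}$, where $d$ is the common dimension of the symbol and role embeddings. Assuming the elements of the $\{\mathbf{r}_k\}$ are randomly (typically, normally) distributed, each role-embedding vector $\mathbf{r}_k$ can be used as its own unbinding vector. However, the HRR unbinding equation holds only approximately: unbinding role $r_k$ from $\mathrm{HRR}(S)$ yields $\mathbf{s}_k + \text{noise}$. This noise must be eliminated by ‘cleanup’ processes. Note that, like TPR, HRR obeys the Superposition Principle.

Testing the Superposition Principle in the learned representation. As a test of whether the Superposition Principle holds of SRep, let $\mathbf{v}(E)$ denote the SRep vector embedding of SLang expression $E$, and consider expressions containing two symbol/role bindings. Write $E[a, b]$ for the expression in which symbol $a$ is bound to role $r_1$ and symbol $b$ is bound to role $r_2$. Then if the Superposition Principle holds, we have¹:

$\mathbf{v}(E[a, b]) - \mathbf{v}(E[a, b']) = \mathbf{v}(E[a', b]) - \mathbf{v}(E[a', b'])$. (1)

¹ The simpler equation, in which the embedding of a two-binding expression equals the sum of the embeddings of the two corresponding one-binding expressions, does not hold in SRep; it appears that the manifolds of one- and two-binding embeddings are distinct. Eq. 1 is designed to stay within the latter. Eq. 1 is analogous to the famous equation $\mathbf{v}(\text{king}) - \mathbf{v}(\text{man}) = \mathbf{v}(\text{queen}) - \mathbf{v}(\text{woman})$ of Mikolov et al. (2013). To make the analogy exact, let the roles ‘gender’, ‘status’ be denoted $r_1, r_2$, so that each of the four words is a two-binding structure $E[\text{gender value}, \text{status value}]$. The word equation then has the form of Eq. 1.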
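Both claims about HRR, that it obeys the Superposition Principle and that its unbinding is only approximate, can be illustrated with a short NumPy sketch. Everything specific here is an assumption made for the example: the dimension, the random seed, the four-symbol vocabulary, the helper names, and the use of circular correlation followed by a nearest-neighbour lookup as the cleanup step. The difference test compares two pairs of two-binding embeddings that differ only in the symbol bound to the second role, which is the kind of identity a superposition-based embedding must satisfy.

```python
import numpy as np


def cconv(a, b):
    """Circular convolution, computed via the FFT."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))


def ccorr(a, b):
    """Circular correlation of a with b; approximately undoes cconv(a, .)."""
    return np.fft.irfft(np.conj(np.fft.rfft(a)) * np.fft.rfft(b), n=len(a))


rng = np.random.default_rng(0)
n = 1024                                  # embedding dimension (assumed)


def rand_vec():
    # Normally distributed elements with variance 1/n, so the norm is about 1.
    return rng.normal(0.0, 1.0 / np.sqrt(n), size=n)


symbols = {name: rand_vec() for name in ["a", "b", "c", "d"]}
r1, r2 = rand_vec(), rand_vec()           # two role vectors


def embed(x, y):
    """HRR of the two-binding structure: x bound to r1, y bound to r2."""
    return cconv(symbols[x], r1) + cconv(symbols[y], r2)


# Difference test: changing only the second symbol gives the same offset,
# whatever symbol occupies the first role.
lhs = embed("a", "b") - embed("a", "d")
rhs = embed("c", "b") - embed("c", "d")
superposition_holds = np.allclose(lhs, rhs)

# Approximate unbinding: the role vector acts as its own unbinding vector,
# but the result is the symbol plus noise, so a cleanup step is needed.
noisy = ccorr(r1, embed("a", "b"))
scores = {name: float(noisy @ vec) for name, vec in symbols.items()}
decoded = max(scores, key=scores.get)     # nearest-neighbour cleanup

print(superposition_holds, decoded, np.allclose(noisy, symbols["a"]))
```

The difference test passes to floating-point precision because circular convolution is linear in each argument, whereas the unbound vector matches the stored symbol only after cleanup. For a learned representation such as SRep, the analogous check would compare the two difference vectors with a similarity measure rather than expect exact equality.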
5 Conclusion
A standard bidirectional encoder-decoder model can generate vector embeddings of expressions denoting complex symbol structures and can successfully query the content of such representations. Like theoretical techniques for accomplishing this, the learned representation obeys the Superposition Principle (approximately; at least within the manifold of embeddings of two-binding expressions).
References
 Cho et al. (2014) Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. Empirical Methods in Natural Language Processing 2014, abs/1406.1078, 2014.
 Crawford et al. (2016) Eric Crawford, Matthew Gingerich, and Chris Eliasmith. Biologically plausible, human-scale knowledge representation. Cognitive Science, 40(4):782–821, 2016.
 Hinton (1988) Geoffrey E. Hinton. Representing part-whole hierarchies in connectionist networks. In Proceedings of the Tenth Annual Conference of the Cognitive Science Society, pp. 48–54. 1988.
 Kanerva (2009) Pentti Kanerva. Hyperdimensional computing: An introduction to computing in distributed representation with high-dimensional random vectors. Cognitive Computation, 1:139–159, 2009.
 Lee et al. (2016) Moontae Lee, Xiaodong He, Wen-tau Yih, Jianfeng Gao, Li Deng, and Paul Smolensky. Reasoning in vector space: An exploratory study of question answering. In Proceedings of the International Conference on Learning Representations 2016, 2016.
 Mikolov et al. (2013) Tomas Mikolov, Scott Wen-tau Yih, and Geoffrey Zweig. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013). May 2013.
 Newell (1980) Allen Newell. Physical symbol systems. Cognitive Science, 4(1):135–183, 1980.
 Palangi et al. (2016) Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, and Rabab Ward. Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 24(4):694–707, 2016.
 Plate (1993) Tony Plate. Holographic Recurrent Networks. In Stephen José Hanson and C Lee Giles (eds.), Advances in Neural Information Processing Systems 5. Morgan Kaufmann, San Mateo, CA, 1993.
 Plate (2002) Tony Plate. Distributed representations. Encyclopedia of Cognitive Science, 2002.
 Plate (2003) Tony Plate. Holographic Reduced Representation: Distributed Representation for Cognitive Structures. CSLI Publications, Stanford, CA, 2003.
 Pollack (1990) Jordan B. Pollack. Recursive distributed representations. Artificial Intelligence, 46(1):77–105, 1990.
 Smolensky (1990) Paul Smolensky. Tensor product variable binding and the representation of symbolic structures in connectionist networks. Artificial Intelligence, 46:159–216, 1990.
 Smolensky & Legendre (2006) Paul Smolensky and Géraldine Legendre. The harmonic mind: From neural computation to Optimality-Theoretic grammar. 2 vols. MIT Press, Cambridge, MA, 2006.
 Socher et al. (2010) Richard Socher, Christopher D Manning, and Andrew Y Ng. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop, pp. 1–9, 2010.
 Touretzky (1990) David S. Touretzky. BoltzCONS: Dynamic symbol structures in a connectionist network. Artificial Intelligence, 46:5–46, 1990.
 Weston et al. (2015) Jason Weston, Antoine Bordes, Sumit Chopra, Alexander M. Rush, Bart van Merrienboer, Armand Joulin, and Tomas Mikolov. Towards AI-complete question answering: A set of prerequisite toy tasks. arXiv:1502.05698, 2015.
 Zaremba & Sutskever (2014) Wojciech Zaremba and Ilya Sutskever. Learning to execute. arXiv:1410.4615, 2014.