A comparison of Vector Symbolic Architectures

01/31/2020
by   Kenny Schlegel, et al.
TU Chemnitz

Vector Symbolic Architectures (VSAs) combine a high-dimensional vector space with a set of carefully designed operators in order to perform symbolic computations with large numerical vectors. Major goals are the exploitation of their representational power and their ability to deal with fuzziness and ambiguity. Over the past years, VSAs have been applied to a broad range of tasks and several VSA implementations have been proposed. The available implementations differ in the underlying vector space (e.g., binary vectors or complex-valued vectors) and in the particular implementations of the required VSA operators - with important ramifications for the properties of these architectures. For example, not every VSA is equally well suited to address each task; in some cases a VSA may even be completely unsuited for a task. In this paper, we give an overview of eight available VSA implementations and discuss their commonalities and differences in the underlying vector space, bundling, and binding/unbinding operations. We create a taxonomy of available binding/unbinding operations and show an important ramification of non self-inverse binding operations using an example from analogical reasoning. A main contribution is the experimental comparison of the available implementations regarding (1) the capacity of bundles, (2) the approximation quality of non-exact unbinding operations, and (3) the influence of combined binding and bundling operations on the query answering performance. We expect this systematization and comparison to be relevant for the development and evaluation of new VSAs, but most importantly, to support the selection of an appropriate VSA for a particular task.


1 Introduction

This paper is about selecting the appropriate Vector Symbolic Architecture (VSA) to approach a given task. But what is a VSA? VSAs are a class of approaches to solve computational problems using mathematical operations on large vectors. A VSA consists of a particular vector space, for example V = [-1, 1]^10000 (the space of 10,000-dimensional vectors with real numbers between -1 and 1), and a set of well-chosen operations on these vectors. Although each vector from V is primarily a subsymbolic entity without particular meaning, we can assign a symbolic meaning to this vector. This is similar to how a symbol can be encoded in a binary pattern in a computer (e.g. to encode a number). In the computer, imperative algorithmic processing of this binary pattern is used to perform manipulation of the symbol (e.g., do calculations with numbers). The binary encodings in computers and the operations on these bitstrings are optimized for maximum storage efficiency (i.e., to be able to distinguish as many different numbers as possible in an n-dimensional bitstring) and for exact processing (i.e., there is no fuzziness in the encodings or the outcome of an operation). Vector Symbolic Architectures follow a considerably different approach:

1) Symbols are encoded in very large vectors, much larger than would be required to merely distinguish the symbols. VSAs use the additional space to introduce redundancy in the representations, usually combined with distributing information across many dimensions of the vector (e.g., there is no single bit that represents a particular property - hence a single error on this bit cannot alter this property). Moreover, it is known from mathematics that in very high-dimensional spaces randomly sampled vectors are very likely almost orthogonal [Kanerva09] - this can be used in VSAs to encode symbols using random vectors, and nevertheless there will be only a very low chance that two symbols are similar in terms of angular distance measures. Very importantly, measuring the distance between vectors allows us to evaluate a fuzzy relation between the corresponding symbols.

2) The operations in VSAs are mathematical operations that create, process and preserve the fuzziness of the representations in a systematic and useful way. For instance, an addition-like operator can overlay vectors and create a representation that is similar to the overlaid vectors. Let us look at an example (borrowed from [Kanerva09]): Suppose that we want to represent the country USA and its properties with symbolic entities - e.g. the currency Dollar and the capital Washington DC (abbreviated WDC). In a VSA representation, each entity is a high-dimensional vector. For basic entities, for which we don't have additional information to systematically create them, we can use a random vector (e.g., sampled from V). In our example, these might be Dollar and WDC - remember, these two high-dimensional random vectors will be very dissimilar. In contrast, the vector for USA shall reflect our knowledge that USA is related to Dollar and WDC. Using a VSA, a simple approach would be to create the vector for USA as a superposition of the vectors Dollar and WDC by using an operator that is called bundling: V_USA = Dollar ⊕ WDC. A VSA implements this operator such that it creates a vector (from the same vector space) that is similar to the input vectors - hence, V_USA will be similar to both WDC and Dollar. VSAs provide more operators to represent more complex relations between vectors. E.g., a binding operator ⊗ can be used to create role-filler pairs and to create and query more expressive terms like V_USA = Name ⊗ USA ⊕ Curr ⊗ Dollar ⊕ Cap ⊗ WDC, with Name, Curr, and Cap being random vectors that encode these three roles. Why is this useful? We can now query for the currency of the USA by another mathematical operation (called unbinding) on the vectors and calculate the result by Curr ⊘ V_USA ≈ Dollar. Most interestingly, this query would still work under significant amounts of fuzziness - either due to noise, ambiguities in the word meanings, or synonyms (e.g. querying with monetary unit instead of currency - provided that these synonym vectors are created in an appropriate way, i.e. they are similar to some extent). The following Sec. 2 will provide more details on these VSA operators.
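To make this interplay of operators concrete, the following sketch (not part of the original example) implements the USA record with a simple MAP-style bipolar VSA in Python/NumPy: bundling by thresholded element-wise addition, binding and unbinding by element-wise multiplication. All helper names and parameter choices are ours and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # dimensionality of the hypervectors

def random_vec():
    """Random bipolar hypervector from {-1, +1}^D (a basic entity)."""
    return rng.choice([-1, 1], size=D)

def bundle(*vs):
    """Bundling: element-wise addition followed by a sign threshold."""
    return np.sign(np.sum(vs, axis=0)).astype(int)

def bind(a, b):
    """Binding: element-wise multiplication (self-inverse for bipolar vectors)."""
    return a * b

def sim(a, b):
    """Cosine similarity between two hypervectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# random vectors for the roles and the fillers
Name, Curr, Cap = random_vec(), random_vec(), random_vec()
USA, Dollar, WDC = random_vec(), random_vec(), random_vec()

# record for the USA: a bundle of role-filler pairs
V_usa = bundle(bind(Name, USA), bind(Curr, Dollar), bind(Cap, WDC))

# query the currency: for this self-inverse binding, unbinding equals binding
noisy_result = bind(Curr, V_usa)

# the result is most similar to Dollar in the item memory
item_memory = {'USA': USA, 'Dollar': Dollar, 'WDC': WDC}
print(max(item_memory, key=lambda name: sim(item_memory[name], noisy_result)))  # -> Dollar
```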

Using embeddings in high-dimensional vector spaces to deal with ambiguities is well established in natural language processing [Widdows04]. VSAs make use of additional operations on high-dimensional vectors. A more exhaustive introduction to the properties of these operations can be found in the seminal paper of Pentti Kanerva [Kanerva09]. So far, VSAs have been applied in various fields including medical diagnosis [Widdows15], robotics [Neubert2019], fault detection [Kleyko15a], analogy mapping [Rachkovskij12], reinforcement learning [Kleyko15], long-short term memory [Danihelka16], text classification [Kleyko18], synthesis of finite state automata [Osipov17], and the creation of hyperdimensional stack machines [Yerxa18]. Interestingly, the intermediate and output layers of deep artificial neural networks can also provide high-dimensional vector embeddings for symbolic processing with a VSA [Neubert2019].

As stated initially, a VSA combines a vector space with a set of operations. However, depending on the chosen vector space and the implementation of the operations, a different VSA is created. In the above list of VSA applications, a broad range of different VSAs has been used. They all use a similar set of operations, but the different underlying vector spaces and the different implementations of the operations have a large influence on the properties of each individual VSA. Basically, each application of a VSA raises the question: Which VSA is the best choice for the task at hand? This question has gained relatively little attention in the literature. For instance, [Widdows15], [Kleyko2018a], [Rahimi2017] and [Plate1997] describe various possible vector spaces with corresponding bundling and binding operations, but do not provide an experimental evaluation of these VSAs on a common application. A capacity experiment of different VSAs in combination with a recurrent neural network memory was done in [Frady2018]. However, the authors focus on the application of the recurrent memory instead of the complete set of operators.

In this paper, we benchmark eight known VSA implementations from the literature. We provide an overview of the properties of these available implementations in the following Sec. 2. There, we also create a taxonomy of the different available binding operators and discuss the algorithmic ramifications of their mathematical properties. A more practically relevant contribution of this paper is the experimental comparison of the available VSAs in Sec. 3 with respect to the following important questions:

(1) How efficiently can the different VSAs store (bundle) information into one representation? (2) What is the approximation quality of non-exact unbind operators? (3) To what extent are binding and unbinding disturbed by bundled representations?

The paper closes with a summary of the main insights in Sec. 4.

We want to emphasize that a detailed introduction to VSAs and their operators is beyond the scope of this paper - instead, we focus on a comparison of available implementations. For more basic introductions to the topic please refer to [Kanerva09] or [Neubert2019].

2 VSAs AND THEIR PROPERTIES

A VSA combines a vector space with a set of operations. The set of operations can vary but typically includes operators for bundling, binding, and unbinding, as well as a similarity measure. Differences between VSAs can result from differences in one or multiple of these components. The following subsections will provide details on these components - in particular, by describing and comparing a set of eight available VSA implementations. Tab. 1 summarizes their properties. We selected the following implementations: the Multiply-Add-Permute (MAP-C and MAP-B, named after their respective vector spaces) from [Gayler98], the Binary Spatter Code (BSC) from [Kanerva09], the Binary Sparse Distributed Representation from [Rachkovskij2001] (BSDC-CDT and BSDC-S, respectively, with two different binding operations), the Holographic Reduced Representations (HRR) from [Plate1995] and its realization in the frequency domain (FHRR) from [Plate2003] and [Plate94Phd], as well as another implementation called Vector-derived Transformation Binding (VTB) from [Gosmann2019]. All these VSAs share the property of using high-dimensional representations (hypervectors), but they differ in their specific vector spaces V. Mathematically possible spaces are real, binary, complex or ternary vectors. Section 2.1 provides an introduction to the used vector spaces and explains their properties.

As described in the introduction, to do calculations or represent knowledge within a VSA, we need a set of operations: bundling will be the topic of Sec. 2.3, and binding and unbinding will be explained in Sec. 2.4. This section will also introduce a taxonomy to systematize the significant differences between the available binding implementations. The introduction emphasized the importance of a similarity measure to deal with the fuzziness of representations; instead of treating representations as same or different, VSAs typically evaluate their similarity. Sec. 2.2 will provide an overview of the used similarity metrics and describe them in detail. For a general overview, Tab. 1 summarizes the properties of the compared VSAs. Finally, as an important and interesting aspect, Sec. 2.5 will describe an application of VSAs to analogical reasoning using all previously described operators. It is similar to the USA-representation example at the beginning and reveals important properties of the different VSAs.

Name | Elements of the vector space | Sim. metric | Bundling | Binding (comm./assoc.) | Unbinding (comm./assoc.) | Ref.
MAP-C | real, uniform in [-1, 1] | cosine sim. | elem. addition | elem. multipl. (✓/✓) | elem. multipl. (✓/✓) | [Gayler98]
HRR | real, N(0, 1/D) | cosine sim. | elem. addition | circ. conv. (✓/✓) | circ. corr. (✗/✗) | [Plate1995], [Plate2003]
VTB | real, N(0, 1/D) | cosine sim. | elem. addition | VTB (✗/✗) | transpose VTB (✗/✗) | [Gosmann2019]
BSC | dense binary {0, 1}, p = 0.5 | overlap | elem. addition with threshold | XOR (✓/✓) | XOR (✓/✓) | [Kanerva09]
MAP-B | bipolar {-1, 1} | cosine sim. | elem. addition with threshold | elem. multipl. (✓/✓) | elem. multipl. (✓/✓) | [Gayler98], [Gayler2009]
BSDC-CDT | sparse binary {0, 1}, density p | overlap | disjunction | CDT (✓/✓) | - | [Rachkovskij2001]
BSDC-S | sparse binary {0, 1}, density p | overlap | disjunction | shifting (✗/✗) | shifting (✗/✗) | [Rachkovskij2001]
FHRR | complex, angles uniform in [0, 2π) | angle distance | angles of elem. addition | elem. angle addition (✓/✓) | elem. angle subtraction (✗/✗) | [Plate94Phd]

Table 1: Summary of the compared VSAs.

Uniform ranges and N(0, 1/D) denote i.i.d. sampling from the uniform distribution over the given range and from the normal distribution with mean 0 and variance 1/D, respectively. D denotes the number of dimensions and p the density. For each binding and unbinding operation, the algebraic properties (commutative/associative) are noted: a check mark (✓) for true and a cross (✗) for false.

2.1 Hypervectors - The elements of a VSA

A VSA works in a specific vector space with a defined set of operations. The generation of hypervectors within the specific vector space is one of the most important steps in high-dimensional symbolic processing. There are essentially three ways to create a vector in a VSA: (1) It can be the result of a VSA operation. (2) It can be the result of (engineered or learned) encoding of (real-world) data. (3) It can be a basic entity (e.g. a vector that represents a role in a role-filler pair), for which VSAs use random vectors. The first way will be the topic of the following subsections on the operators. The second way (encoding other data as vectors, e.g. by feeding an image through a ConvNet) is beyond the scope of this paper (for example see [Kleyko18] or [Neubert2019]). However, the third way of creating basic vectors plays an important role when using VSAs and since it is significantly different for the different available VSAs, it is the topic of this subsection.

When selecting vectors to represent basic entities (e.g., symbols for which we don't know any relation that we could encode), the goal is to create maximally different encodings (to be able to robustly distinguish them in the presence of noise or other ambiguities). High-dimensional vector spaces offer plenty of space to push these vectors apart and, moreover, they have the interesting property that random vectors are already very far apart. In particular for angular distance measures, this means that two random vectors are very likely almost orthogonal (this is called quasi-orthogonality): If we sample the direction of vectors independently and identically distributed (i.i.d.) from a uniform distribution, then the more dimensions the vectors have, the higher is the probability that the angle between two such random vectors is close to 90 degrees; for a 10,000-dimensional real vector, the probability to be within 90 ± 5 degrees is almost one. Please refer to [Neubert2019] for a more in-depth presentation and evaluation.
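The following small numerical illustration (ours, not from the paper) shows this concentration effect: the average absolute cosine similarity between independent random vectors shrinks as the dimensionality grows.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_abs_cosine(D, n_pairs=1000):
    """Average |cosine similarity| between pairs of random D-dimensional vectors."""
    a = rng.standard_normal((n_pairs, D))
    b = rng.standard_normal((n_pairs, D))
    cos = np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.mean(np.abs(cos))

for D in (10, 100, 1_000, 10_000):
    print(D, round(mean_abs_cosine(D), 4))
# the mean |cosine| shrinks roughly like 1/sqrt(D): random vectors become
# increasingly close to orthogonal (about 90 degrees) as D grows
```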

The quasi-orthogonality property is heavily used in VSA operations. Since the different available VSAs use different vector spaces and metrics (cf. Sec. 2.2), different approaches to create vectors are involved. The most common approach is based on real numbers in a continuous range. For instance, the Multiply-Add-Permute architecture (MAP-C, where C means continuous) uses real values from the range [-1, 1]. Other architectures such as HRR as well as the VTB VSA use real values that are normally distributed with a mean of 0 and a variance of 1/D, where D defines the number of dimensions. Another group uses a binary vector space. For example, the Binary Spatter Code (BSC) and the binary MAP (MAP-B) architecture generate vectors in {0, 1}^D or {-1, 1}^D, respectively. The creation of the binary values is based on a Bernoulli distribution with a probability of p = 0.5 for each value. By reducing the probability p, sparse vectors can be created for the BSDC-CDT and BSDC-S VSAs (where the acronym CDT means Context-Dependent Thinning and S means shifting, which are both binding operations and are explained in Sec. 2.4). [Rachkovskij2001] shows that a probability of p = 1/sqrt(D) (with D the number of dimensions) reaches the highest capacity of the vector and is therefore used in this architecture. Finally, the complex vector space can be used. One example is the Frequency Holographic Reduced Representations (FHRR) that uses complex numbers on the unit circle (the complex number in each vector dimension has magnitude one) [Plate94Phd]. It is therefore sufficient to use uniformly distributed values in the range [0, 2π) to define the angles of the complex values. The complex numbers are computed from the angles by e^(iθ) = cos(θ) + i·sin(θ).
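As a compact reference, the following sketch generates random elementary vectors for the different spaces described above (our reading of this section; in particular, the default sparse density p = 1/sqrt(D) and the angle range should be checked against the cited original papers).

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1024  # number of dimensions

def random_map_c():       # MAP-C: real values uniformly distributed in [-1, 1]
    return rng.uniform(-1, 1, D)

def random_hrr_vtb():     # HRR / VTB: real values from N(0, 1/D)
    return rng.normal(0.0, np.sqrt(1.0 / D), D)

def random_bsc():         # BSC: dense binary values, Bernoulli(p = 0.5)
    return rng.integers(0, 2, D)

def random_map_b():       # MAP-B: bipolar values from {-1, +1}
    return rng.choice([-1, 1], D)

def random_bsdc(p=None):  # BSDC: sparse binary values with density p (here p = 1/sqrt(D))
    p = 1.0 / np.sqrt(D) if p is None else p
    return (rng.random(D) < p).astype(int)

def random_fhrr():        # FHRR: only the angles of unit-magnitude complex numbers
    return rng.uniform(0.0, 2 * np.pi, D)
```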

2.2 Similarity measurement

VSAs use similarity metrics to evaluate vector representations. Such a metric is necessary to find relations between two given vectors (to figure out whether the represented symbols have a similar meaning). More precisely: if we have a noisy version of a hypervector (it could be the output of a hypervector calculation), we want to find the most similar elementary vector from our known database (retrieving the original information). Therefore, a carefully defined similarity metric is essential for a robustly working VSA. The term curse of dimensionality [Bellman61] describes the observation that algorithms that are designed for low-dimensional spaces often fail in higher-dimensional spaces - interestingly, this includes similarity measures based on the Euclidean distance without a normalization of the vectors [Beyer99]. Therefore, continuously-valued VSAs typically use the cosine similarity for comparing vectors.

As shown in Tab. 1, the architectures MAP-C, MAP-B, HRR and VTB use the cosine similarity (cosine of the angle) between vectors A and B: sim(A, B) = (A · B) / (||A|| · ||B||). The output is a scalar value within the range [-1, 1]. Note that -1 means collinear vectors in opposite directions and 1 means identical directions. A value of 0 represents two maximally dissimilar (orthogonal) vectors.

The binary vector spaces (the dense BSC as well as the sparse BSDC-CDT and BSDC-S) use the overlap (or the complementary Hamming distance), which can be normalized to the range [0, 1] (0 means non-similar and 1 means similar). Eq. 1 shows the formula to compute the similarity between binary vectors A and B, given the density p and the number of dimensions D:

sim(A, B) = (A · B) / (p · D)    (1)

The complex space needs a third similarity measure. As introduced in Sec. 2.1, the commonly used complex architecture of [Plate94Phd] (FHRR) stores only the angles of the complex numbers. To measure how similar two vectors are, the element-wise angle differences are averaged via their cosines (vectors A and B only contain the angles α_i and β_i):

sim(A, B) = (1/D) · Σ_{i=1}^{D} cos(α_i − β_i)    (2)
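A sketch of the three similarity measures in Python/NumPy, following our reconstruction of eqs. 1 and 2 (function names are ours):

```python
import numpy as np

def sim_cosine(a, b):
    """Cosine similarity, used by MAP-C, MAP-B, HRR and VTB."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def sim_overlap(a, b, p):
    """Normalized overlap for (sparse or dense) binary vectors, cf. eq. 1."""
    return np.sum(a & b) / (p * a.size)

def sim_angle(a, b):
    """Mean cosine of the element-wise angle differences for FHRR, cf. eq. 2."""
    return np.mean(np.cos(a - b))
```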

2.3 Bundling

VSAs use the bundling operator to superimpose (or overlay) given hypervectors (as in the introductory example). Bundling superimposes a set of input vectors from the space V and creates an output vector of the same space that is similar to its inputs. Plate [Plate1997] stated that the essential property of the bundling operator is unstructured similarity preservation. It means: a bundle of vectors A + B is still similar to vector A, to vector B, and to another bundle A + C. In the following, we will use the mathematical symbol ⊕ for bundling.

The implementation of the bundling operator is intuitive: all compared implementations use an addition-like operator (a simple element-wise addition of the vectors). Depending on the vector space, it is followed by a normalization step to the specific range (e.g. a threshold is used to get binary values after the addition). In the two BSDC architectures the corresponding addition operation is the logical OR function. The vectors of the complex VSA FHRR have to be converted from angles to complex numbers of the form cos(θ) + i·sin(θ) before bundling (the vectors contain only the angles). Afterward, the complex-valued vectors are added. Then, only the angles of the resulting complex numbers are used and the magnitudes are discarded - the output is the vector of resulting angles. Since it is basically an addition, bundling is commutative and associative in all compared VSA implementations, with one difference: the normalized bundling operations are only approximately associative ((A ⊕ B) ⊕ C ≈ A ⊕ (B ⊕ C)).
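The following sketch shows three representative bundling implementations (MAP-B, BSDC, FHRR) as we understand them from the description above; the tie-breaking for the MAP-B threshold is our own choice.

```python
import numpy as np

rng = np.random.default_rng(3)

def bundle_map_b(vectors):
    """MAP-B: element-wise addition followed by a sign threshold (ties broken randomly)."""
    s = np.sum(vectors, axis=0)
    ties = (s == 0)
    s[ties] = rng.choice([-1, 1], size=np.count_nonzero(ties))
    return np.sign(s).astype(int)

def bundle_bsdc(vectors):
    """BSDC: element-wise disjunction (logical OR)."""
    return np.bitwise_or.reduce(np.asarray(vectors, dtype=int), axis=0)

def bundle_fhrr(angle_vectors):
    """FHRR: add the unit-magnitude complex numbers and keep only the resulting angles."""
    z = np.sum(np.exp(1j * np.asarray(angle_vectors)), axis=0)
    return np.angle(z) % (2 * np.pi)
```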

2.4 Binding

The binding operator is used to connect two vectors (e.g., the role-filler pairs in the introduction) and is the most complex and most diverse operator of VSAs. Plate [Plate1997] defines the properties of the binding as follows:

  • the output is non-similar to the inputs: binding of A and B is non similar to A and B

  • it preserves structured similarity: binding of A and B is similar to binding of A’ and B’, if A’ is similar to A and B’ is similar to B

  • an inverse of the operation exists (defined as unbinding, with symbol ⊘)

The binding is typically indicated by the mathematical symbol ⊗ (there is one special, additive type of binding that is explained later).

As mentioned by Plate [Plate1997]: to recover the elemental vectors from a binding, the VSA needs an inverse binding operator: the unbinding (symbol ⊘). For a better understanding: if we have a binding C = A ⊗ B, we can retrieve the elemental vectors A or B from C with the unbinding operator: R = A ⊘ C (or B ⊘ C). R is then similar to the vector B or A, respectively.

From a historical perspective, one of the first ideas to associate connectionist representations comes from Smolensky [smolensky90]. He uses the tensor product (the outer product of the given vectors) to compute a representation that combines all information of the inputs. To recover (unbind) the input information from the created matrix, only the normalized inner product of the vector with the matrix (the tensor product) is required. Based on this procedure, it is possible to create an exact binding and an exact unbinding (recovering). Nevertheless, the problem with the tensor product is that the output is a matrix and the size of the representation grows with each level of computation. Therefore, it is preferable to have binding operations (and corresponding unbinding operations) that approximate the result of the outer product within a vector of the original dimensionality. Based on this idea, several binding and unbinding operations have been developed specifically for each vector domain. These different binding operations create the taxonomy drawn in Fig. 1.

Binding
  - multiplicative (⊗)
    - self-inverse
      - approximately invertible: MAP-C
      - exactly invertible: MAP-B, BSC
    - non self-inverse
      - approximately invertible: VTB, HRR
      - exactly invertible: FHRR, BSDC-S
  - additive: BSDC-CDT

Figure 1: Taxonomy of the different binding operations. The VSAs that use each binding are listed at the corresponding leaves (see Tab. 1 for more details).

Several binding operations are implemented multiplicatively (hence the symbol ⊗). The multiplicative bindings can be further distinguished into self-inverse and non self-inverse binding operations. Self-inverse means that the inverse of the binding is the binding operation itself. The opposite is the non self-inverse binding: it requires an additional unbinding operator (the inverse of the binding). Finally, each of these nodes is separated into approximately and exactly invertible binding (unbinding). For instance, the Smolensky tensor product is an exactly invertible binding, because the unbinding produces exactly the same vector that was put into the binding: B ⊘ (A ⊗ B) = A. An approximate inverse produces an unbinding output which is similar to the input of the binding, but not identical: B ⊘ (A ⊗ B) ≈ A.

Multiplicative binding can be implemented by element-wise multiplication (as in [Gayler98]). In case of bipolar values ({-1, 1}), element-wise multiplication is self-inverse, since (-1)·(-1) = 1·1 = 1 and binding a vector with itself therefore yields the ones-vector. The self-inverse property is essential for some VSA algorithms in the field of analogical reasoning (this will be the topic of Sec. 2.5). Element-wise multiplication is, for example, used in the MAP-C and MAP-B architectures. An important difference is that for the continuous space of MAP-C the unbinding is only approximate, while it is exact for the two-valued space of MAP-B. With a view to the Smolensky tensor product, element-wise multiplication approximates the outer product matrix by its diagonal. Furthermore, element-wise multiplication is both commutative and associative (as can be seen in Tab. 1).

Another self-inverse binding with an exact inverse is defined in the BSC architecture. It uses the exclusive or (XOR) and is equivalent to the element-wise multiplication in the bipolar space. As expected, the XOR is used for both binding and unbinding - it provides an exact inverse. Additionally, it is commutative and associative like element-wise multiplication.
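A quick numerical check of the self-inverse property for both variants (illustrative sketch, our code):

```python
import numpy as np

rng = np.random.default_rng(4)
D = 1024

# MAP-B: element-wise multiplication of bipolar vectors is its own exact inverse
a = rng.choice([-1, 1], D)
b = rng.choice([-1, 1], D)
c = a * b                                        # binding
print(np.array_equal(c * b, a))                  # unbinding with the same operator -> True

# BSC: XOR of dense binary vectors is likewise its own exact inverse
x = rng.integers(0, 2, D)
y = rng.integers(0, 2, D)
z = np.bitwise_xor(x, y)                         # binding
print(np.array_equal(np.bitwise_xor(z, y), x))   # -> True
```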

The second category within the multiplicative binding in our taxonomy in Fig. 1 is the non self-inverse one. Two VSAs have an approximate unbinding operator. One of them is the VTB architecture. The binding for these real-valued vectors (distributed as in [Plate1995]) can be computed by the Vector-derived Transformation Binding (VTB) as described in [Gosmann2019]. It uses a matrix multiplication for both binding and unbinding. The matrix is constructed from the second input vector y and is multiplied with the first vector x afterward. Eq. 3 formulates the VTB binding, where V_y represents a square matrix derived from the reshaped vector y:

x ⊗ y = V_y · x    (3)

The idea behind the VTB is to create a transformation matrix based on the second vector and specifically defined computation steps. It provides a stringent transformation of the first vector, which is invertible (it allows unbinding).

The unbinding operator is again a matrix multiplication, but the transposed matrix is used for the calculation, as shown in Eq. 4. Based on the matrix multiplication for both binding and unbinding, these operations are neither commutative nor associative.

y ⊘ (x ⊗ y) = V_y^T · (x ⊗ y) ≈ x    (4)
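The following sketch illustrates a VTB-style binding and its transpose-based approximate unbinding. It assumes the block-wise construction of V_y from the reshaped vector y with a square-root scaling, as we understand [Gosmann2019]; for the authoritative definition (and the exact normalization) please consult that paper.

```python
import numpy as np

def vtb_bind(x, y):
    """Bind x and y by applying the matrix V_y (built from y) block-wise to x; D must be a square number."""
    D = x.size
    dp = int(np.sqrt(D))
    assert dp * dp == D
    Vy = np.sqrt(dp) * y.reshape(dp, dp)          # scaling keeps the result roughly unit-norm
    return (x.reshape(dp, dp) @ Vy.T).reshape(D)  # apply Vy to every dp-sized chunk of x

def vtb_unbind(z, y):
    """Approximate unbinding: same construction, but with the transposed matrix."""
    D = z.size
    dp = int(np.sqrt(D))
    Vy = np.sqrt(dp) * y.reshape(dp, dp)
    return (z.reshape(dp, dp) @ Vy).reshape(D)

rng = np.random.default_rng(5)
D = 1024
x = rng.normal(0, np.sqrt(1.0 / D), D)
y = rng.normal(0, np.sqrt(1.0 / D), D)
x_approx = vtb_unbind(vtb_bind(x, y), y)
print(x @ x_approx / (np.linalg.norm(x) * np.linalg.norm(x_approx)))  # clearly above 0, but below 1
```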

The other non self-inverse binding with an approximate inverse is part of the HRR architecture: the circular convolution. Binding two vectors x and y with circular convolution is calculated by:

(x ⊗ y)_j = Σ_{k=0}^{D−1} x_k · y_{(j−k) mod D}    (5)

Circular convolution approximates Smolensky’s outer product matrix by sums over all its (wrap-around) diagonals (for more details we refer to [Plate1995]). Based on the algebraic properties of convolution, this operator is commutative as well as associative. However, the convolution is not self-inverse and requires a specific unbinding operator. The circular correlation (eq. 6) provides an approximated inverse of the circular convolution and is used for unbinding. This unbinding operator is neither commutative nor associative.

(y ⊘ z)_j = Σ_{k=0}^{D−1} y_k · z_{(k+j) mod D},  with z = x ⊗ y    (6)
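A sketch of circular convolution and circular correlation in Python/NumPy; computing both via the FFT is a standard efficient approach (the function names are ours).

```python
import numpy as np

def hrr_bind(x, y):
    """Circular convolution (eq. 5), computed via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def hrr_unbind(y, z):
    """Circular correlation (eq. 6): approximate recovery of x from z = x (*) y."""
    return np.real(np.fft.ifft(np.conj(np.fft.fft(y)) * np.fft.fft(z)))

rng = np.random.default_rng(6)
D = 1024
x = rng.normal(0, np.sqrt(1.0 / D), D)
y = rng.normal(0, np.sqrt(1.0 / D), D)
x_approx = hrr_unbind(y, hrr_bind(x, y))
print(x @ x_approx / (np.linalg.norm(x) * np.linalg.norm(x_approx)))  # high, but not exactly 1
```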

A useful property of the convolution is that it becomes an element-wise multiplication in the frequency domain (complex space). Thus, it is possible to operate entirely in the complex vector space and use element-wise multiplication as the binding operator (refer to Plate [Plate94Phd]). This leads to a VSA with an exactly invertible and non self-inverse binding, as shown in the taxonomy in Fig. 1: the FHRR. With the constraints described in Sec. 2.1 (using complex values with a magnitude of one), the computation of binding and unbinding becomes more efficient. If we have two complex numbers e^(iα) and e^(iβ) with angles α and β, we can write their multiplication as an addition of the angles:

e^(iα) · e^(iβ) = e^(i(α + β))    (7)

The same procedure applies to unbinding, but with a subtraction of the angles α and β. Note that a modulo operation with 2π (angles on the complex plane are in the range [0, 2π)) must follow the addition or subtraction. Based on these assumptions, it is possible to operate only with the angles rather than with whole complex numbers. Since addition is associative and commutative, the binding is as well. Subtraction, however, is neither commutative nor associative - therefore the unbinding is not either.
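A sketch of FHRR binding and unbinding on the angle representation (the helper names are ours; the modulo keeps the angles in [0, 2π)):

```python
import numpy as np

def fhrr_bind(a, b):
    """Binding: element-wise addition of the angles, modulo 2*pi (eq. 7)."""
    return (a + b) % (2 * np.pi)

def fhrr_unbind(b, c):
    """Unbinding: element-wise subtraction of the known angles b from c (exact inverse)."""
    return (c - b) % (2 * np.pi)

rng = np.random.default_rng(7)
D = 1024
a = rng.uniform(0, 2 * np.pi, D)
b = rng.uniform(0, 2 * np.pi, D)
print(np.allclose(fhrr_unbind(b, fhrr_bind(a, b)), a))  # -> True (up to floating point)
```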

Finally, we describe the last VSA with a multiplicative, exactly invertible and non self-inverse binding: the BSDC-S (binary sparse distributed representations with shifting). The shifting operation is a possible method to encode hypervectors into a new representation which is dissimilar to the inputs. Given two vectors, the amount of shifting can be computed from the positions of the non-zero elements of one vector and then applied to the second one. This operation has an exact inverse (shifting in the opposite direction), but it is neither commutative nor associative.

According to Fig. 1, there is one VSA that uses an additive binding. The BSDC-CDT from Rachkovskij et al. [Rachkovskij2001] introduces a binding operator for sparse binary vectors based on an additive operator: the disjunction (logical OR). Since sparse vectors produce roughly twice the number of on-bits after the disjunction, they propose a Context-Dependent Thinning (CDT) procedure to thin the vectors after the disjunction. The complete CDT procedure is described in [Rachkovskij2001a]. Surprisingly, this procedure stands in contrast to the properties defined by Plate [Plate1997] at the beginning of this section, because it creates an output that is similar to the inputs. That is why it requires only a similarity comparison instead of an unbinding to retrieve the elemental vectors and find the most similar vector. In particular, if the CDT is used for hierarchical binding (e.g. bundling role-filler pairs involves two levels - the first is binding and the second is bundling), the specific levels have to be stored. During retrieval, the similarity search (which replaces unbinding) must be done at the corresponding level of binding, because CDT preserves the similarity of all bound vectors (every elemental vector is similar to the bundled representation). Based on such an iterative search (from level to level), the CDT binding needs more computational steps and is not directly comparable with the other binding operations. That is why shifting is used as binding and unbinding for the BSDC VSA in the later experimental evaluations instead of the CDT.

2.5 Ramifications of non self-inverse binding

Sec. 2.4 distinguished two different types of binding operations: self-inverse and non self-inverse. We want to demonstrate possible ramifications of this property using the established example from Pentti Kanerva on analogical reasoning [Kanerva2010]: "What is the Dollar of Mexico?" The task is as follows: Similar to the representation of the country USA from the example in the introduction, we can define a second representation for the country Mexico:

V_MEX = Name ⊗ Mexico ⊕ Curr ⊗ Peso ⊕ Cap ⊗ MexCity    (8)

(with Mexico, Peso and MexCity being random vectors for the name, the currency Peso and the capital Mexico City)

Given these two representations, we, as humans, can answer Kanerva's question by analogical reasoning: Dollar is the currency of the USA, the currency of Mexico is Peso, thus the answer is Peso. The same can be done by a VSA, but the method described in [Kanerva2010] works only with self-inverse bindings, such as BSC and MAP. To understand why, we will explain the task in detail: Given are the records of both countries, V_MEX and V_USA (the latter is written out in the introduction). Now, we can combine the information from these two representations by binding them together (since it is important to find an analogy between these two countries). This creates a mapping M:

M = V_USA ⊗ V_MEX    (9)

With the resulting mapping vector M we can answer the initial question ("What is the Dollar of Mexico?") by binding the query vector (Dollar) to the mapping:

F = Dollar ⊗ M    (10)

To see what happens and why this actually works, we need a closer look. Eq. 9 can be expanded based on the algebraic properties of the binding and bundling operations (binding distributes over bundling). In case of a self-inverse binding (cf. the taxonomy in Fig. 1), the following terms result from eq. 9 (we refer to [Kanerva2010] for a more detailed explanation):

M = (Name ⊗ USA ⊕ Curr ⊗ Dollar ⊕ Cap ⊗ WDC) ⊗ (Name ⊗ Mexico ⊕ Curr ⊗ Peso ⊕ Cap ⊗ MexCity)
  = (Name ⊗ Name ⊗ USA ⊗ Mexico) ⊕ (Curr ⊗ Curr ⊗ Dollar ⊗ Peso) ⊕ (Cap ⊗ Cap ⊗ WDC ⊗ MexCity) ⊕ noise
  = (USA ⊗ Mexico) ⊕ (Dollar ⊗ Peso) ⊕ (WDC ⊗ MexCity) ⊕ noise    (11)

Based on the self-inverse property, terms like Name ⊗ Name cancel out (they create a ones-vector). Other terms, like Name ⊗ USA ⊗ Curr ⊗ Peso, can be treated as noise (they are summarized in the term noise). This is due to the property of binding, which creates an output that is not similar to its inputs. These terms behave essentially like random vectors with respect to similarity, based on the quasi-orthogonality of random vectors in high-dimensional spaces. Binding the vector Dollar to the mapping of USA and Mexico (eq. 10) creates the vector F in eq. 12 (only the most important terms are noted). The part Dollar ⊗ Dollar ⊗ Peso is important because it reduces to Peso, again based on the self-inverse property. The rest is noise and is bundled with the representation Peso. Since the elemental vectors (representations for, e.g., Peso or Dollar) are randomly generated, they are highly robust against noise. That is why the resulting vector is still very similar to the elemental vector for Peso.

F = Dollar ⊗ M = (Dollar ⊗ Dollar ⊗ Peso) ⊕ noise = Peso ⊕ noise    (12)

Notice that the previous description is only a brief summary of the "Dollar of Mexico" example. We refer to [Kanerva2010] for more details.
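The following sketch reproduces the "What is the Dollar of Mexico?" reasoning with a MAP-style bipolar VSA (self-inverse binding by element-wise multiplication, bundling by thresholded addition). The vector MexCity stands for the capital Mexico City; all implementation details are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(8)
D = 10_000

def vec():
    return rng.choice([-1, 1], D)

def bundle(*vs):
    return np.sign(np.sum(vs, axis=0)).astype(int)  # no ties occur for an odd number of inputs

def bind(a, b):                                      # self-inverse binding
    return a * b

Name, Curr, Cap = vec(), vec(), vec()
USA, Dollar, WDC = vec(), vec(), vec()
Mexico, Peso, MexCity = vec(), vec(), vec()

V_usa = bundle(bind(Name, USA), bind(Curr, Dollar), bind(Cap, WDC))
V_mex = bundle(bind(Name, Mexico), bind(Curr, Peso), bind(Cap, MexCity))

M = bind(V_usa, V_mex)        # mapping between the two records (eq. 9)
answer = bind(Dollar, M)      # "What is the Dollar of Mexico?" (eq. 10)

item_memory = {'USA': USA, 'Dollar': Dollar, 'WDC': WDC,
               'Mexico': Mexico, 'Peso': Peso, 'MexCity': MexCity,
               'Name': Name, 'Curr': Curr, 'Cap': Cap}
print(max(item_memory, key=lambda k: item_memory[k] @ answer))  # -> Peso
```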

However, we can see that the computation relies on a self-inverse binding operation. As described in Sec. 2 and the taxonomy in Fig. 1, some VSAs have no self-inverse binding and need an unbind operator to retrieve elemental vectors. But applying unbinding in the task "What is the Dollar of Mexico?" is not trivial, since it leads to an irreducible representation. For example, if we replace the binding in eq. 9 with unbinding (mapping the two different representations by unbinding), the following term is produced:

M' = V_USA ⊘ V_MEX = (Name ⊗ USA ⊕ Curr ⊗ Dollar ⊕ Cap ⊗ WDC) ⊘ (Name ⊗ Mexico ⊕ Curr ⊗ Peso ⊕ Cap ⊗ MexCity)    (13)

Since the combination of binding and unbinding is neither commutative nor associative, we can’t cancel out terms as required.

Thus, non self-inverse binding operations cannot be used in Kanerva's approach to analogical reasoning. Instead, they require an extended formalism to solve this problem. Plate [Plate1995] emphasized the need for a 'readout' machine for the HRR VSA to decode chunked sequences (hierarchical bindings). It retrieves the trace iteratively and finally generates the result. Transferred to the given example: at first, we have to figure out the meaning of Dollar (it is the currency of the USA) and then query the result (Currency) on the representation of Mexico (resulting in Peso). Such a readout requires more computation steps caused by iteratively traversing the hierarchy tree (please see [Plate1995] for more details). Presumably, this is a general problem of all non self-inverse binding operations.

3 Experimental Comparison

After the formal and mathematical description of the previous section, this section provides an experimental comparison of the different VSA implementations. We conducted two experiments. The first evaluates the bundling operations to answer the question: How efficiently can the different VSAs store (bundle) information into one representation? The second experiment includes the binding as well as the unbinding operation and is divided into two subparts. As described in Sec. 2.4 and visualized in the taxonomy in Fig. 1, some binding operations have only an approximate inverse. The first subpart evaluates the question: How good is the approximation of the binding inverse? The second part of the binding evaluation concentrates on the combination of bundling and binding and on recovering noisy representations. The leading question here is: To what extent are binding and unbinding disturbed by bundled representations?

3.1 Bundling Capacity

The leading question regarding the bundle capacity is: How efficiently can the different VSAs store (bundle) information into one representation? To answer this question, we extend the experimental setup from [Neubert2019] with different database sizes and a varying number of dimensions, and use this to experimentally compare the available VSAs. For each VSA, we create a database of random elementary vectors from the underlying vector space V. It represents basic entities stored in a so-called item memory (database). To evaluate the bundle capacity of a VSA, we randomly choose k elementary vectors from this database and create their superposition using the VSA's bundle operator. Now the question is whether this combined vector is still similar to the bundled elementary vectors. To answer this question, we query the database with the bundle vector to obtain the k elementary vectors which are most similar to the bundle (using the VSA's similarity metric). The evaluation criterion is the accuracy of the query result: the ratio of correctly retrieved elementary vectors among the k returned vectors from the database.

The capacity depends on the dimensionality of the vectors. Therefore, we vary the number of dimensions in the range 4…1156 and evaluate for k in the range 2…50. The number of elementary vectors in the item memory is fixed. To account for randomness, we repeat each experiment 10 times and report means and standard deviations.
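A simplified sketch of this capacity experiment for the MAP-B VSA (the item memory size of 1,000 and the other constants here are our own illustrative choices, not the exact settings of the experiment):

```python
import numpy as np

rng = np.random.default_rng(9)

def bundle_accuracy(D, k, n_items=1000):
    """Fraction of the k bundled MAP-B vectors found among the k most similar item-memory entries."""
    item_memory = rng.choice([-1, 1], size=(n_items, D))
    chosen = rng.choice(n_items, size=k, replace=False)
    s = np.sum(item_memory[chosen], axis=0)
    s[s == 0] = rng.choice([-1, 1], size=np.count_nonzero(s == 0))  # threshold with random tie-break
    b = np.sign(s)
    sims = item_memory @ b                    # proportional to cosine (all vectors have equal norm)
    top_k = np.argsort(sims)[-k:]
    return len(set(top_k) & set(chosen)) / k

for D in (256, 512, 1024):
    print(D, bundle_accuracy(D, k=15))
```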


Fig. 2 shows the results of the experiment. It provides an evaluation of the required number of dimensions to achieve perfect retrieval for different values of k (i.e., 100% accuracy = the query results are exactly the bundled vectors). To make the comparison more accessible, we fit a straight line to the data points and plot the result as a dotted line (the less steep the line, the more efficient the bundling). Dense binary spaces need the largest number of dimensions, real-valued vectors a little less, and the complex values require the least number of dimensions. It can be seen that the sparse binary (BSDC-S) and the complex domain (FHRR) achieve the most efficient results. They need fewer dimensions to bundle all vectors correctly. The sparse binary representations perform better than the dense binary vectors in this experiment. A more in-depth analysis of the benefits of sparse distributed representations can be found in [Ahmad2019]. Particularly interesting is also the comparison between the HRR VSA from [Plate1995] and the complex-valued FHRR VSA from [Plate94Phd]. Both the FHRR with its complex domain and the HRR architecture operate in a continuous space (FHRR uses only the angles of complex numbers). But treating the real values from a complex perspective increases the efficiency noticeably. Even if the HRR architecture is adapted to the same value range as the complex domain, the performance of the real-valued VSA does not change remarkably. This is an interesting insight: if real numbers are treated as if they were angles of complex numbers, then this increases the efficiency of bundling. Finally, we emphasize that each vector space expresses its values with a different number of bits. This also influences the performance, but is not discussed at this point. It can be seen that even a vector space with few bits per dimension can store many vectors efficiently (e.g. the sparse binary one).

Figure 2: Minimum required number of dimensions to reach 100% accuracy. The dotted lines represent the best linear fit.

3.2 Binding and unbinding performance

Performance of approximately invertible binding

We evaluate the approximation quality of the not exactly invertible binding operators from Fig. 1. From this taxonomy, we know that three VSAs have an approximate inverse: MAP-C, VTB and HRR. To evaluate the performance of the approximate inverses we use a setup similar to [Gosmann2019]. We extended the experiment to compare the accuracy of approximate unbinding for the three relevant VSAs. The experiment is defined as follows: we start with an initial random vector v and bind it sequentially with other random vectors into an encoded sequence S (see eq. 14). The task is then to retrieve the elemental vector v by sequentially unbinding the random vectors from S. The result is the vector v' and it should be highly similar to the original initial vector v (see eq. 15).

S = v ⊗ r_1 ⊗ r_2 ⊗ … ⊗ r_n    (14)
v' = r_1 ⊘ (r_2 ⊘ ( … (r_n ⊘ S) … )) ≈ v    (15)

We applied the described procedure to the three VSAs with approximate inverses (all exactly invertible bindings produce 100% accuracy and are not shown in the plots) with up to 40 sequential bindings (n = 1…40) and D = 1024 dimensions. The normalized similarity (scaled to the range between the most dissimilar and the most similar possible value) is the evaluation criterion and is drawn in Fig. 3. Note that with more than 40 sequential bindings the curves continue to decline slowly; there is no asymptote at a value greater than 0. As evaluated in [Gosmann2019], the VTB binding and unbinding is more potent than the circular convolution/correlation from HRR. It reaches the highest similarity over the whole range. The bind/unbind operator of the MAP-C architecture with values within the range [-1, 1] performs worst of the three. One reason might be the weak approximation (element-wise multiplication) of the tensor product from Smolensky [smolensky90] - it uses only the diagonal of the whole resulting matrix. HRR and VTB incorporate more values from the exactly invertible tensor product.

Figure 3: Normalized similarity between the initial vector v and the unbound sequence vector with different numbers of sequences.
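A sketch of this procedure for the HRR case (circular convolution/correlation via the FFT; the sequence length and dimensionality follow the text, everything else is our illustrative code):

```python
import numpy as np

rng = np.random.default_rng(10)
D = 1024

def hrr_bind(x, y):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def hrr_unbind(y, z):
    return np.real(np.fft.ifft(np.conj(np.fft.fft(y)) * np.fft.fft(z)))

def cos_sim(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

v = rng.normal(0, np.sqrt(1.0 / D), D)
randoms = rng.normal(0, np.sqrt(1.0 / D), (40, D))

S = v.copy()
for r in randoms:           # sequentially bind 40 random vectors (eq. 14)
    S = hrr_bind(S, r)

for r in randoms[::-1]:     # sequentially unbind them again (eq. 15)
    S = hrr_unbind(r, S)

print(cos_sim(S, v))        # similarity decays with the sequence length, but stays above 0
```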

Unbinding of bundled pairs

The second experiment for the binding combines the bundling, the binding and the unbinding operator in one scenario. It extends the example from the introduction where we bundled three role-filler pairs to encode the knowledge about one country. A VSA allows querying for a filler by unbinding the role. Now, the question is: How many property-value (role-filler) pairs can be bundled while still providing the correct answer to any query? This amounts to unbinding from a noisy representation and is similar to the experiment on VSA scaling properties in [Eliasmith2013, p. 141], but using only a single item memory size.

In a first step, we create a database (item memory) of random elemental vectors as in the bundle capacity experiment in Sec. 3.1. Then we combine randomly chosen elementary vectors from the item memory into vector pairs by binding two entities each. The result is a set of bound pairs, equivalent to the property-value pairs from the USA example (e.g., Curr ⊗ Dollar). These pairs are bundled into a single representation (like the representation V_USA), which creates a noisy version of all bound pairs. The goal is to retrieve all elemental vectors from this compact hypervector by unbinding. The evaluation criterion is defined as follows: we compute the ratio of correctly recovered vectors to the number of all initial vectors. As in the capacity experiment, we use a variable number of dimensions and a varying number of bundled pairs. Finally, we run the experiment 10 times and use the mean values.

We evaluate the VSAs by comparing their accuracies to those of the capacity experiment from Sec. 3.1 as follows: We select the minimum required number of dimensions to retrieve either 15 bundled vectors (capacity experiment in Sec. 3.1) or 15 bundled pairs (bound-vectors experiment). Tab. 2 summarizes the results and shows the growth between the bundle-only and the binding-plus-bundling experiment. Noticeable is the significant rise in the number of dimensions for the sparse binary VSA. It requires roughly 82% larger vectors when bundling is used in combination with binding. One reason could be the increasing density when binding sparsely distributed vectors, because only the disjunction is used without a thinning procedure. In contrast, the VSAs MAP-C, MAP-B and BSC show only a marginal change in the required number of dimensions. The two bindings of real, normally distributed vectors (circular convolution for HRR and VTB) require roughly 10% more dimensions. This might be caused by the approximate inverse of the binding operator. The overall best score is still achieved by the complex VSA. However, this might result mainly from the good bundling performance rather than from a better binding performance.

Vector space | # Dimensions to bundle 15 vectors | # Dimensions to bundle 15 pairs | Growth
MAP-C | 670 | 645 | -4%
HRR | 550 | 610 | +11%
VTB | 550 | 615 | +12%
BSC | 890 | 880 | -1%
MAP-B | 930 | 925 | -0.5%
BSDC | 360 | 656 | +82%
FHRR | 365 | 375 | +3%

Table 2: Comparison of the minimum required number of dimensions to reach a perfect retrieval of 15 bundled vectors and of 15 bundled pairs. The 4th column shows the growth between the first and the second experimental result.

4 Conclusion

We discussed and evaluated available VSA implementations theoretically and experimentally. We created a general overview of the most important properties and provided insights especially into the various implemented binding operators (taxonomy in Fig. 1). It was shown that self-inverse binding operations are beneficial in applications such as analogical reasoning ("What is the Dollar of Mexico?"). On the other hand, these self-inverse architectures, like MAP-B and MAP-C, show a trade-off between an exactly working binding (by using a two-valued vector space such as {0, 1}^D or {-1, 1}^D) and a high bundling capacity (by using real-valued vectors). In the bundling capacity experiment, the sparse binary VSA BSDC performed well and required only a small number of dimensions. However, in combination with binding, it required many more dimensions; presumably, this is due to the lack of a thinning procedure after bundling in our experiments and the therefore increased vector density. High performance of both overlaying (bundling) and connecting (binding) symbolic vectors could be observed in the simplified complex architecture FHRR that uses only the angles of the complex values. Since this architecture is not self-inverse, it requires a separate unbinding operation and cannot solve the reasoning example from Sec. 2.5 with Kanerva's approach - however, the example could presumably be solved using other methods that iteratively process the knowledge tree, but these come at increased computational cost.

This paper, in particular the taxonomy of binding operations, revealed a very large diversity among available VSAs and the necessity of continued efforts to systematize these approaches. For selecting an appropriate VSA for a particular task, an extension of the present work to a full benchmark would be desirable, as well as a clear definition of theorems and axioms of VSAs (or of particular sub-classes of VSAs); furthermore, general design patterns on how to solve tasks using VSAs would be helpful. Future work should also include the evaluation of VSAs on real-world data and the exploration of the mapping between subsymbolic and symbolic representations. This mapping is one of the key components when working with real-world data such as sensor outputs (e.g. images).

References