
A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part I: Models and Data Transformations

11/11/2021
by Denis Kleyko, et al.

This two-part comprehensive survey is devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Notable models in the HDC/VSA family are Tensor Product Representations, Holographic Reduced Representations, Multiply-Add-Permute, Binary Spatter Codes, and Sparse Binary Distributed Representations but there are other models too. HDC/VSA is a highly interdisciplinary area with connections to computer science, electrical engineering, artificial intelligence, mathematics, and cognitive science. This fact makes it challenging to create a thorough overview of the area. However, due to a surge of new researchers joining the area in recent years, the necessity for a comprehensive survey of the area has become extremely important. Therefore, amongst other aspects of the area, this Part I surveys important aspects such as: known computational models of HDC/VSA and transformations of various input data types to high-dimensional distributed representations. Part II of this survey is devoted to applications, cognitive computing and architectures, as well as directions for future work. The survey is written to be useful for both newcomers and practitioners.


1 Introduction

The two main approaches to Artificial Intelligence (AI) are symbolic and connectionist. The symbolic approach represents information via symbols and their relations. Symbolic AI (the alternative term is Good Old-Fashioned AI, GOFAI) solves problems or infers new knowledge through the processing of these symbols. In the alternative connectionist approach, information is processed in a network of simple computational units often called neurons, so another name for the connectionist approach is artificial neural networks. This article presents a survey of a research area that originated at the intersection of GOFAI and connectionism, which is known under the names Hyperdimensional Computing, HDC (term introduced in [57]) and Vector Symbolic Architectures, VSA (term introduced in [30]).

In order to be consistent and to avoid possible confusion amongst researchers outside the area, we will use the joint name HDC/VSA when referring to the area. It is also worth pointing out that probably the most influential and well-known (at least in the machine learning domain) HDC/VSA model is Holographic Reduced Representations [122] and, therefore, this term is also used when referring to the area. For consistency, however, we use HDC/VSA as the general term and refer to Holographic Reduced Representations only when discussing this particular model. HDC/VSA is the umbrella term for a family of computational models that rely on mathematical properties of high-dimensional random spaces and use high-dimensional distributed representations called hypervectors (HVs; another term, commonly used in the cognitive science literature, is “semantic pointers” [9]) for structured (“symbolic”) representation of data while maintaining the advantages of connectionist vector distributed representations. This opens a promising way to build AI systems [92].

For a long time, HDC/VSA did not gain much attention from the AI community. Recently, however, the situation has begun to change and HDC/VSA is now picking up momentum. We attribute this to a combination of several factors, such as dissemination efforts by members of the research community (the HDC/VSA Web portal, available at https://www.hd-computing.com, and the VSAONLINE webinar series, available at https://sites.google.com/ltu.se/vsaonline), several successful engineering applications in the machine learning domain (e.g., [148, 143]), and cognitive architectures [17, 129]. The main driving force behind the current interest is the global trend of searching for computing paradigms alternative to the conventional (von Neumann) one, such as neuromorphic and nanoscalable computing, where HDC/VSA is expected to play an important role (see [67] and references therein for perspective).

The major problem for researchers new to the area is that the previous work on HDC/VSA is spread across many venues and disciplines and cannot be tracked easily. Thus, understanding the state of the art of the area is not trivial. Therefore, in this article we survey HDC/VSA with the aim of providing broad coverage of the area, which is currently missing. While the survey is written to be accessible to a wider audience, it should not be considered as “the easiest entry point”. For anyone who has not yet been exposed to HDC/VSA, before reading this survey, we highly recommend starting with three tutorial-like introductory articles: [57], [59], and [107]. The first two provide a solid introduction and motivation behind the area, while the third focuses on introducing HDC/VSA within the context of a particular application domain, robotics. Someone who is looking for a very concise high-level introduction to the area without too many technical details might consult [58]. Finally, there is a book [122] that provides a comprehensive treatment of fundamentals of two particular HDC/VSA models (see Sections 2.3.3 and 2.3.5). While [122] focused on the two specific models, many of the aspects presented there apply to HDC/VSA in general.

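Articles compared (table rows, listed chronologically): [120], [57], [142], [143], [107], [31], [149], [67], [36], and this survey, Part I.

Topics assessed (table columns) and the sections of this survey, Part I, that cover them:

  • Representation types: Section 2.1.1

  • Connectionism challenges: Section 2.1.2

  • The HDC/VSA models: Section 2.3

  • Capacity of hypervectors: Section 2.4

  • Hypervectors for symbols and sets: Section 3.1

  • Hypervectors for numeric data: Section 3.2

  • Hypervectors for sequences: Section 3.3

  • Hypervectors for 2D images: Section 3.4

  • Hypervectors for graphs: Section 3.5

TABLE I: A qualitative assessment of existing HDC/VSA literature with elements of survey.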

To our knowledge, there have been no previous attempts to make a comprehensive survey of HDC/VSA, but there are articles that overview particular topics of HDC/VSA. Table I contrasts the coverage of this survey with those previous articles (listed chronologically). A partial-coverage mark indicates that an article partially addressed a particular topic, but either new results have been reported since then or not all related work was covered.

Part I of this survey has the following structure. In Section 2, we introduce the motivation behind HDC/VSA, their basic notions, and summarize currently known HDC/VSA models. Section 3 presents transformation of various data types to HVs. Discussion and conclusions follow in Sections 4 and 5, respectively.

Part II of this survey [73] will cover existing applications and the use of HDC/VSA in cognitive architectures.

Finally, due to the space limitations, there are topics that remain outside the scope of this survey. These topics include connections between HDC/VSA and other research areas as well as hardware implementations of HDC/VSA. We plan to cover these issues in a separate work.

2 Vector Symbolic Architectures

In this section, we describe the motivation that led to the development of early HDC/VSA models (Section 2.1), list their components (Section 2.2), overview existing HDC/VSA models (Section 2.3) and discuss the information capacity of HVs (Section 2.4).

2.1 Motivation and basic notions

The ideas relevant for HDC/VSA already appeared in the late 1980s and early 1990s [51, 103, 153, 127, 115, 86]. In this section, we first review the types of representations together with their advantages and disadvantages. Then we introduce some of the challenges posed to distributed representations, which turned out to be motivating factors for inspiring the development of HDC/VSA.

2.1.1 Types of representation

Symbolic representations


Symbolic representations (SRs) are natural for humans and widely used in computers. In SRs, each item or object is represented by a symbol.

Here, by objects we refer to items of various nature and complexity, such as features, relations, physical objects, scenes, their classes, etc. More complex symbolic representations can be composed from the simpler ones.

SRs naturally possess a combinatorial structure that allows producing indefinitely many propositions/symbolic expressions (see Section 2.1.2 below). This is achieved by composition using rules or programs as demonstrated by, e.g., Turing Machines. This process is, of course, limited by memory size that can, however, be expanded without changing the computational structure of a system. A vivid example of a symbolic representation/system is a natural language, where a plethora of words is composed from a small alphabet of letters. In turn, a finite number of words is used to compose an infinite number of sentences and so on.

SRs have all-or-none explicit similarity: the same symbols have maximal similarity, whereas different symbols have zero similarity and are, therefore, called dissimilar. To process symbolic structures, one needs to follow edges and/or match vertices of underlying graphs to, e.g., reveal the whole structure or calculate similarity. Therefore, symbolic models usually have problems with scaling because similarity search and reasoning in such models require complex and sequential operations, which quickly become intractable.

In the context of brain-like computations, the downside of the conventional implementation of symbolic computations is that they require reliable hardware [167], since any error in computation might result in a fatal fault. In general, it is also unclear how SRs and computations with them could be implemented in a biological tissue, especially when taking into account its unreliable nature.

Connectionist representations: localist and distributed


In this article, connectionism is used as an umbrella term for approaches related to neural networks and brain-like computations. Two main types of connectionist representations are distinguished: localist and distributed [164, 162].

Localist representations (LRs) are akin to SRs in that for each object there exists a single element in the representation. Examples of LRs are a single neuron (node) or a single vector component.

There is some evidence that LRs might be used in the brain (so-called “grandmother cells”) [126]. In order to link LRs, connections between components can be created, corresponding to pointers in SRs. However, constructing compositional structures that include combinations of already represented objects requires the allocation of (a potentially infinite number of) new additional elements and connections, which is neurobiologically questionable. For example, representing “abc”, “abd”, “acd”, etc. requires introducing new elements for them, as well as connections between “a”, “b”, “c”, “d” and so on. Also, LRs share the drawback of SRs of lacking a sufficient semantic basis, which may be considered as a lack of immediate explicit similarity between the representations. In other words, different neurons representing different objects are dissimilar and estimating object similarity requires additional processes.

Distributed representations (DRs) were inspired by the idea of a “holographic” representation as an alternative to the localist representation [38, 162, 164, 123]. They are attributed to a connectionist approach based on modeling the representation of information in the brain as “distributed” over many neurons. In DRs, the state of a set of neurons (of finite size) is modeled as a vector where each vector component represents a state of the particular neuron.

DRs are defined as a form of vector representations, where each object is represented by a subset of vector components, and each vector component can belong to representations of many objects. This concerns (fully distributed) representations of objects of various complexity, from elementary features or atomic objects to complex scenes/objects that are represented by (hierarchical) compositional structures.

In DRs, the state of individual components of the representation cannot be interpreted without knowing the states of other components. In other words, in DRs the semantics of individual components of the representation are usually undefined, in distinction to LRs.

In order to be useful, DRs of similar objects should be similar (according to some similarity measure of the corresponding vector representations; see Section 2.2.2), thus addressing the semantic basis issue of SRs and LRs.

DRs have the following attractive properties:

  • High representational capacity. For example, if one object is represented by $M$ binary components of a $D$-dimensional vector, then the number of representable objects equals the number of combinations $\binom{D}{M}$, in contrast to $D/M$ for LRs;

  • Direct access to the representation of an object. A compositional DR of a compositional structure can be processed directly without tracing pointers as in SRs or following connections between components as in LRs;

  • Explicit representation of similarity. Similar objects have similar representations that can be immediately compared by efficiently computable vector similarity measures (e.g., dot product, Minkowski distance, etc.);

  • A rich semantic basis due to the immediate use of representations based on features and the possibility of representing the similarity of the features themselves in their vector representations;

  • The possibility of using well-developed methods for processing vectors;

  • For many types of DRs – the ability to recover the original representations of objects;

  • The ability to work in the presence of noise, malfunction, and uncertainty, in addition to neurobiological plausibility.
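As a rough numerical illustration of the first property in the list above, the following sketch counts representable objects under the assumption that every object uses exactly M active binary components of a D-dimensional vector, and that a localist scheme dedicates M components to each object. The values D = 100 and M = 5 and the function names are chosen here for illustration only.

```python
from math import comb


def distributed_capacity(d: int, m: int) -> int:
    """Number of distinct patterns with exactly m active components out of d."""
    return comb(d, m)


def localist_capacity(d: int, m: int) -> int:
    """Number of objects when each object owns m dedicated components."""
    return d // m


# Illustrative values (assumptions, not taken from the survey).
D, M = 100, 5
print(distributed_capacity(D, M))  # 75287520
print(localist_capacity(D, M))     # 20
```

Even for this small vector, the distributed scheme distinguishes millions of patterns, whereas the localist scheme is limited to a number of objects that grows only linearly with D.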

2.1.2 Challenges for conventional connectionist representations

Let us consider several challenges faced by early DRs: the “superposition catastrophe” (e.g., [166, 130]) and, at a higher level, the demand for “systematicity” [19] and, much later, for fast compositionality [44]. These challenges led to the necessity to make DRs “structure-sensitive” by introducing the “binding” operation (Section 2.2.3).

“Superposition catastrophe”


It was believed that connectionist representations cannot represent hierarchical compositional structures because of the superposition catastrophe, which manifests itself in losing the information concerning object arrangements in structures, see [166, 130, 138] for discussion and references.

In the simplest case, let us activate binary LR elements corresponding to “a” & “b”, then to “b” & “c”. If we want to represent both “a” & “b”, and “b” & “c” simultaneously, we activate all three representations: “a”, “b”, “c”. Now, however, the information that “a” was with “b”, and “b” was with “c” is lost. For example, the same “a”, “b”, “c” activation could be obtained by “a” & “c”, and single “b”. The same situation occurs if “a”, “b”, “c” are represented by distributed patterns.
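The example above can be reproduced in a few lines. This is a minimal sketch that models binary localist activations as Python sets (an assumption made here for brevity; the function name `superpose` is hypothetical) and shows that two different groupings yield the same activation pattern.

```python
def superpose(*groups):
    """Activate every element that appears in any group (binary OR)."""
    active = set()
    for group in groups:
        active |= set(group)
    return active


# "a" & "b" together with "b" & "c" ...
first = superpose({"a", "b"}, {"b", "c"})
# ... versus "a" & "c" together with a single "b".
second = superpose({"a", "c"}, {"b"})

print(sorted(first))    # ['a', 'b', 'c']
print(first == second)  # True: the grouping information is lost
```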

Fodor & Pylyshyn's criticism of connectionism


In [19], criticism of connectionism was concerned with the parallel distributed processing approach covered in [38]. The authors claim that connectionism lacks Productivity, Systematicity, Compositionality, and Inferential Coherence that are inherent to systems operating with SRs. Their definitions of these intuitively appealing issues are rather vague and interrelated, and their criticism is constrained to the early particular connectionist model that the authors chose for their critique. Therefore, we restate these challenges as formulated in [122]:

  • Composition, decomposition, and manipulation: How are elements composed to form a structure, and how are elements extracted from a structure? Can the structures be manipulated using DRs?

  • Productivity: A few simple rules for composing elements can give rise to a huge variety of possible structures. Should a system be able to represent structures unlike any it has previously encountered, if they are composed of the same elements and relations?

  • Systematicity: Does the DR allow processes to be sensitive to the structure of the objects? To what degree are the processes independent of the identity of elements in compositional structures?

Challenges to connectionism posed by Jackendoff


Four challenges to connectionism have been posed by Jackendoff [44], see also [30] for their HDC/VSA treatment. In principle, they are relevant to cognition but in particular they are related to language. The problem, in general, is how to neurally instantiate the rapid construction and transformation of the compositional structures.

  • Challenge 1. The binding problem: the observation that linguistic representations must use compositional representations, taking into account order and occurring combinations. For example, the same words in different order and combination are going to produce different sentences.

  • Challenge 2. The problem of two: how are multiple instances of the same object instantiated? For example, how are the “little star” and the “big star” instantiated so that they are both stars, yet distinguishable?

  • Challenge 3. The problem of variables: concerns typed variables. One should be able to represent templates or relations with variables (e.g., names of relations) and values (e.g., arguments of relations).

  • Challenge 4. Binding in working and in long-term memories: representations of the same binding should be identical in various types of memory. In other words, the challenge concerns the transparency of the boundary between a working memory and a long-term memory. It has been argued that linguistic tasks require the same structures to be instantiated in the working memory and the long-term memory and that the two instantiations should be functionally equivalent.

2.1.3 Binding to address challenges of conventional connectionist representations

The challenges presented above made it clear that an adequate representation of compositional structures requires preserving information about their grouping and order. For example, in SRs, brackets and symbolic order can be used to achieve this. For the same purpose, some mechanism of binding (à la “grouping brackets”) was needed in DRs.

One of the approaches to binding in DRs is based on the temporal synchronization of constituent activations [97, 166, 151, 41]. Although this mechanism may be useful on a single level of composition, its capabilities to represent compositional structures with multiple levels of hierarchy are questionable as it requires many time steps and complex “orchestration” to represent compositional structures. Another major problem is that such a temporal representation cannot be immediately stored in a long-term memory.

An alternative approach to binding, which eliminates these issues, is the so-called conjunctive coding approach used in HDC/VSA. Its predecessors were “extra units” considered by [39] to represent various combinations of active units of distributed patterns as well as tensor products [103, 153], which, however, increased the dimensionality of representations (see details in Section 2.3.2).

In HDC/VSA, the binding operation does not change the dimensionality of DRs and preserves grouping information through a component-wise multiplication-like operation. Moreover, it does not require any training. It is rather an automatic consequence of the properties of the HDC/VSA operations. Importantly, the schemes for forming DRs of compositional structures that exploit the binding produce DRs that are similar for similar objects (i.e., they take into account the similarity of object elements, their grouping, and order at various levels of hierarchy).
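To make the dimensionality argument concrete, the sketch below contrasts a tensor (outer) product of two vectors with component-wise multiplication, used here as one example of a multiplication-like binding for bipolar HVs. The dimensionality D = 16 and the function names are assumptions made for illustration.

```python
import random


def random_bipolar(d, rng):
    """Random vector with components drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def tensor_product(a, b):
    """Flattened outer product: the result has len(a) * len(b) components."""
    return [x * y for x in a for y in b]


def componentwise_bind(a, b):
    """Component-wise multiplication: the result keeps the input dimensionality."""
    return [x * y for x, y in zip(a, b)]


rng = random.Random(0)
D = 16  # illustrative dimensionality
a, b = random_bipolar(D, rng), random_bipolar(D, rng)

print(len(tensor_product(a, b)))      # 256
print(len(componentwise_bind(a, b)))  # 16
```

Because the bound result has the same dimensionality as its inputs, it can itself be bound or superimposed again, which is what allows multi-level compositional structures without growth of the representation.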

In summary, the main motivation for developing HDC/VSA was to combine the advantages of early DRs and those of SRs, while avoiding their drawbacks, in pursuit of more efficient information processing and, ultimately, of better AI systems. One of the goals was to address the above challenges faced by conventional connectionist representations. The properties of HDC/VSA models introduced below allow addressing these challenges to a varying degree.

2.2 Structure-sensitive distributed representations

2.2.1 Atomic representations

When designing an HDC/VSA-based system it is common to define a set of the most basic objects/items/entities/concepts/symbols/scalars for the given problem and assign them HVs, which are referred to as atomic HVs. The process of assigning atomic HVs is often referred to as mapping, projection, embedding, formation or transformation. To be consistent, we will use the term transformation.

For a given problem, we need to choose atomic representations of objects such that the similarity between the representations of objects corresponds to the properties that we care about. HVs of other (compositional) objects are formed from the atomic HVs (see Section 2.2.4). As follows from their name, atomic HVs are high-dimensional vectors. Values of HV components could be binary, real, or complex numbers.

In the early days of HDC/VSA, most of the works were focused on symbolic problems. In the case of working with symbols one could easily imagine many problems where a reasonable assumption would be that symbols are not related at all. So their atomic HVs are generated at random and are considered dissimilar, i.e., their expected similarity value is considered to be “zero”. On the other hand, there are many problems where assigning atomic HVs fully randomly does not lead to any useful behavior of the designed system, see Section 3.2.

Fig. 1: Concentration of measure phenomenon between bipolar random HVs. The lines correspond to probability density functions of cosine similarities for different dimensionalities of HVs. Normal distributions were fitted to pair-wise similarities obtained for randomly chosen HVs.

As mentioned above, in HDC/VSA random independent HVs generated from some distribution are used for representing objects that are considered independent and dissimilar. For example, randomly generated binary HVs with components from the set $\{0, 1\}$, with the probability of a 1-component being $1/2$ for dense binary representations [57] or much smaller than $1/2$ for sparse binary representations [138], or with a fixed (small) number of randomly activated components. Such random HVs are analogous to “symbols” in SRs. However, the introduction of new symbols in SRs or nodes in LRs requires changing the dimensionality of representations, whereas in HDC/VSA they are simply introduced as new HVs of fixed dimensionality. So HDC/VSA can accommodate $N$ symbols with $D$-dimensional HVs such that $N \gg D$. This happens because in high-dimensional spaces randomly chosen HVs are pseudo-orthogonal to each other. This mathematical phenomenon is known as concentration of measure [91]. The peculiar property of this phenomenon is that pseudo-orthogonality converges to exact orthogonality with increased dimensionality of HVs. This is sometimes referred to as the “blessing of dimensionality” [32]. Fig. 1 provides a visual illustration of the case of bipolar HVs.
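The pseudo-orthogonality of random HVs can be checked empirically. The following sketch samples pairs of random bipolar HVs and measures how widely their cosine similarities are spread around zero; the dimensionalities, the number of pairs, and all function names are assumptions chosen for illustration. For bipolar HVs the spread is expected to shrink roughly as 1/sqrt(D), which the printout lets one compare.

```python
import math
import random


def random_bipolar(d, rng):
    """Random HV with components drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_spread(d, pairs, rng):
    """Standard deviation of cosine similarity over random pairs of HVs."""
    sims = [cosine(random_bipolar(d, rng), random_bipolar(d, rng))
            for _ in range(pairs)]
    mean = sum(sims) / len(sims)
    return math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))


rng = random.Random(0)
spreads = {d: similarity_spread(d, 100, rng) for d in (64, 512, 2048)}
for d, spread in spreads.items():
    # Second column: measured spread; third column: 1/sqrt(d) for comparison.
    print(d, round(spread, 4), round(1 / math.sqrt(d), 4))
```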

If the similarity between objects is important then HVs are generated in such a way that their similarity characterizes the similarity between objects. The similarity between HVs is measured by standard vector similarity (or distance) measures.

2.2.2 Similarity measures


Similarity measure for dense representations


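For dense HVs with real or integer components the most commonly used measures are the Euclidean distance:

$$\text{dist}_{\text{Euc}}(\mathbf{a}, \mathbf{b}) = \|\mathbf{a} - \mathbf{b}\|_2 \quad (1)$$

the dot (inner, scalar) product:

$$\text{sim}_{\text{dot}}(\mathbf{a}, \mathbf{b}) = \mathbf{a}^{\top} \mathbf{b} \quad (2)$$

and the cosine similarity:

$$\text{sim}_{\text{cos}}(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a}^{\top} \mathbf{b}}{\|\mathbf{a}\|_2 \, \|\mathbf{b}\|_2} \quad (3)$$

For dense HVs with binary components, the normalized Hamming distance is the most common choice:

$$\text{dist}_{\text{Ham}}(\mathbf{a}, \mathbf{b}) = \frac{1}{D} \, |\mathbf{a} \oplus \mathbf{b}| \quad (4)$$

where $D$ is the dimensionality of the HVs, $\oplus$ denotes a component-wise XOR operation, and $|\cdot|$ counts the nonzero components.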

Similarity measures for sparse representations


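The dot product (2) and the cosine similarity (3) are frequently used as similarity measures in the case of sparse HVs, i.e., when the number of nonzero components in an HV is small.

Sometimes, Jaccard similarity is also used for this purpose:

$$\text{sim}_{\text{Jac}}(\mathbf{a}, \mathbf{b}) = \frac{|\mathbf{a} \wedge \mathbf{b}|}{|\mathbf{a} \vee \mathbf{b}|} \quad (5)$$

where $\wedge$ and $\vee$ denote component-wise AND and OR operations, respectively.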
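The measures of this subsection are straightforward to compute. The sketch below implements them for plain Python lists; the function names and the two small example vectors are assumptions made for illustration, and returning 0.0 for the Jaccard similarity of two all-zero vectors is a convention chosen here, not one stated in the text.

```python
import math


def dist_euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sim_dot(a, b):
    """Dot (inner, scalar) product."""
    return sum(x * y for x, y in zip(a, b))


def sim_cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    return sim_dot(a, b) / (math.sqrt(sim_dot(a, a)) * math.sqrt(sim_dot(b, b)))


def dist_hamming(a, b):
    """Normalized Hamming distance for binary (0/1) vectors."""
    return sum(x ^ y for x, y in zip(a, b)) / len(a)


def sim_jaccard(a, b):
    """Jaccard similarity |a AND b| / |a OR b| for binary (0/1) vectors."""
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    return both / either if either else 0.0  # convention for empty vectors


a = [1, 0, 1, 1, 0, 0, 1, 0]
b = [1, 1, 0, 1, 0, 0, 0, 0]

print(sim_dot(a, b))       # 2
print(dist_hamming(a, b))  # 0.375
print(sim_jaccard(a, b))   # 0.4
```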

2.2.3 Operations

Superposition and binding operations that do not change the HV dimensionality are of particular interest because they allow one to apply the same operations to their resultant HVs. But note that some applications might require changing the dimensionality of representations.

Superposition


The superposition is the most basic operation used to form an HV that represents several HVs. In analogy to simultaneous activation of neural patterns represented by HVs, this can be modeled as a disjunction of binary HVs or addition of real-valued HVs. See more examples of particular implementations in Section 2.3. In HDC/VSA, this operation is known under several names: bundling, superposition, and addition. Due to its intuitive meaning, below, we will use the term superposition. The simplest unconstrained superposition (denoted as $\mathbf{s}$) is simply:

$$\mathbf{s} = \mathbf{a} + \mathbf{b} + \mathbf{c} \quad (6)$$

In order to be used at later stages, the resultant HV is often required to preserve certain properties of atomic HVs. Therefore some kind of normalization is often used to preserve, for instance, the norm, e.g., Euclidean:

$$\mathbf{s} = \frac{\mathbf{a} + \mathbf{b} + \mathbf{c}}{\|\mathbf{a} + \mathbf{b} + \mathbf{c}\|_2} \quad (7)$$

or the type of components as in atomic HVs, e.g., integers in a limited range:

$$\mathbf{s} = f(\mathbf{a} + \mathbf{b} + \mathbf{c}) \quad (8)$$

where $f(x)$ is a function limiting the range of values of $x$. Below, we will also use brackets, as in $[\mathbf{a} + \mathbf{b} + \mathbf{c}]$, when some sort of normalization is implied. For example, in the case of dense binary/bipolar HVs, the binarization of components via majority rule/sign function is used.

In the case of sparse binary HVs, their superposition increases the density of the resultant HV. Therefore, there are operations to “thin” the resultant HV and preserve the sparsity.

After superposition, the resultant HV is similar to its input HVs. Moreover, the superposition of HVs remains similar to each individual HV but the similarity decreases as more HVs are superimposed together. This could be seen as an unstructured and additive kind of similarity preservation.
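The behaviour described above can be observed with a small experiment. This is a minimal sketch, assuming bipolar HVs, component-wise addition as the superposition, and the sign function as normalization (with ties broken at random, which is one possible convention); the dimensionality, the seed, and the function names are illustrative assumptions.

```python
import math
import random


def random_bipolar(d, rng):
    """Random HV with components drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def superpose(*hvs):
    """Unconstrained superposition: component-wise sum."""
    return [sum(components) for components in zip(*hvs)]


def sign_normalize(hv, rng):
    """Map each component back to {-1, +1}; zeros (ties) are broken at random."""
    out = []
    for x in hv:
        if x > 0:
            out.append(1)
        elif x < 0:
            out.append(-1)
        else:
            out.append(rng.choice((-1, 1)))
    return out


def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(1)
D = 4000  # illustrative dimensionality
items = [random_bipolar(D, rng) for _ in range(15)]

sims = {}
for n in (3, 7, 15):
    s = sign_normalize(superpose(*items[:n]), rng)
    sims[n] = cosine(s, items[0])  # similarity to one of the superimposed HVs
    print(n, round(sims[n], 3))
```

The printed similarities stay clearly above zero but decrease as more HVs are superimposed, in line with the additive kind of similarity preservation noted in the text.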

Binding


As mentioned above, the superposition operation alone leads to the superposition catastrophe (due to its associative property), i.e., during the recursive application of the superposition, the information about combinations of the initial objects is lost. This issue needs to be addressed in order to represent compositional structures. A binding operation (denoted as $\circ$ when no particular implementation is specified) is used in HDC/VSA as a solution. There are two types of binding: via multiplication-like operation and via permutation operation.

Multiplicative binding

Multiplication-like operations are used in different HDC/VSA models for implementing the binding operation when two or more HVs should be bound together. Examples of multiplication-like binding operations are conjunction for binary HVs or component-wise multiplication for real-valued HVs. Please refer to Section 2.3 for concrete implementations for a particular HDC/VSA model. Fig. 1 in [149] also presents the taxonomy of the most commonly used multiplicative bindings.

The HV obtained after binding together several input HVs depends on all of them. Similar input HVs produce similar bound HVs. However, the way the similarity is preserved is (generally) different from that of the superposition operation [121, 138]. We discuss this aspect in detail later in Section 2.2.3 (“Similarity preservation by operations”).

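In many situations, there is a need for an operation to reverse the result of the binding operation. This operation is called unbinding or release (denoted as $\oslash$). It allows the recovery of a bound HV [169, 33], e.g., for two HVs bound as $\mathbf{a} \circ \mathbf{b}$:

$$\mathbf{b} = \mathbf{a} \oslash (\mathbf{a} \circ \mathbf{b}) \quad (9)$$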

Here, the binding of only two HVs is used for demonstration purposes. In general, many HVs can be bound to each other. Note that when the unbinding operation is applied to a superposition of HVs, the result will not be exactly equal to the original bound HV. It will also contain crosstalk noise; therefore, the unbinding operation is commonly followed by a clean-up procedure (see Section 2.2.5) to obtain the original bound HV. In some models, the binding operation is self-inverse, which means that the unbinding operation is the same as the binding operation. We specify how the unbinding operations are realized in different models in Section 2.3.
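The following sketch illustrates binding, unbinding, and crosstalk noise for one concrete choice: bipolar HVs with component-wise multiplication, which is a self-inverse binding. The clean-up step is modeled here as a nearest-neighbor search over a dictionary of atomic HVs; this is an assumption made for illustration, as are the names `memory` and `clean_up`, the dimensionality, and the seed.

```python
import random


def random_bipolar(d, rng):
    """Random HV with components drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(d)]


def bind(a, b):
    """Component-wise multiplication (one multiplication-like binding)."""
    return [x * y for x, y in zip(a, b)]


# For bipolar HVs this binding is self-inverse, so unbinding reuses it.
unbind = bind


def superpose(*hvs):
    """Unconstrained superposition: component-wise sum."""
    return [sum(components) for components in zip(*hvs)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def clean_up(noisy, item_memory):
    """Return the name of the stored atomic HV most similar to the noisy HV."""
    return max(item_memory, key=lambda name: dot(noisy, item_memory[name]))


rng = random.Random(2)
D = 2000  # illustrative dimensionality
memory = {name: random_bipolar(D, rng) for name in ("a", "b", "c", "d")}

# A single bound pair is recovered exactly.
recovered = unbind(memory["a"], bind(memory["a"], memory["b"]))
print(recovered == memory["b"])  # True

# Unbinding from a superposition of two bound pairs leaves crosstalk noise,
# so the result is only similar to "b"; clean-up identifies it.
s = superpose(bind(memory["a"], memory["b"]), bind(memory["c"], memory["d"]))
noisy = unbind(memory["a"], s)
print(noisy == memory["b"])     # False
print(clean_up(noisy, memory))  # b
```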

Binding by permutation

Another type of HV binding is by permutation, which might correspond to, e.g., some position of the object. We denote the application of a permutation to an HV as $\rho(\mathbf{a})$; if the same permutation is applied $i$ times, we write $\rho^{i}(\mathbf{a})$, and the corresponding inverse is then $\rho^{-i}(\mathbf{a})$. Note that the permuted HV is dissimilar to the original one. However, the use of partial permutations [84, 12] allows preserving some similarity (see Section 3.4.1).

The permutation operation can be implemented by matrix multiplication with the corresponding permutation matrix. Such an implementation of the permutation operation makes a connection to a particular implementation of the multiplication-like binding proposed in [27], where matrix multiplication is also used for positional binding by assigning each position with a random matrix (Section 2.3.4). For permutation binding, the unbinding is achieved by inverse permutation.

Often in practice a special case of permutation – cyclic shift – is used as it is very simple to implement. The use of cyclic shift for positional binding in HDC/VSA was originally proposed in [83] for representing 2D structures (images) and 1D structures (sequences). Next, the permutations were used for representing the order in [138]. In a similar spirit, the permutation was used in [118, 46] to avoid the commutativity of a circular convolution (implementation of the multiplication-like binding operation).
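A cyclic shift is easy to sketch with list slicing. The example below is a minimal illustration under stated assumptions: the function name `permute`, the shift direction, the dimensionality, and the seed are chosen here and are not taken from the cited works. It shows that a shifted random bipolar HV is nearly orthogonal to the original and that shifting back by the same amount undoes the operation.

```python
import math
import random


def permute(hv, i=1):
    """Cyclic shift of a list by i positions; a negative i applies the inverse."""
    if not hv:
        return list(hv)
    k = i % len(hv)
    return hv[-k:] + hv[:-k] if k else list(hv)


def cosine(a, b):
    """Cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(permute([1, 2, 3, 4], 1))  # [4, 1, 2, 3]

rng = random.Random(3)
D = 2000  # illustrative dimensionality
hv = [rng.choice((-1, 1)) for _ in range(D)]

shifted = permute(hv, 3)
print(round(cosine(hv, shifted), 3))  # close to 0: dissimilar to the original
print(permute(shifted, -3) == hv)     # True: inverse shift recovers the HV
```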

Later [57] introduced a primitive for representing a sequence in a compositional HV using multiple applications of the same fixed permutation to represent a position in the sequence. This primitive was popularized via applications to texts [147, 47].

Similarity preservation by operations


For the superposition operation, the influence of any input HV on the resultant HV is the same independently of other input HVs. In binding, the influence of any input HV on the result depends on all other input HVs. Also, the superposition and binding operations preserve similarity in a different manner. There is unstructured similarity, which is the similarity of the resultant HV to the HVs used as input to the operation. The superposition operation preserves the similarity of the resultant HV to each of the superimposed HVs, that is, it preserves the unstructured similarity. For example, $\mathbf{a} + \mathbf{b}$ is similar to $\mathbf{a}$ and $\mathbf{b}$. In fact, given some assumptions on the nature of the HVs being superimposed, it is possible to analytically analyze the information capacity of HVs (see Section 2.4 for details).

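Most realizations of the binding operation do not preserve unstructured similarity. They do, however, preserve structured similarity, that is, the similarity of the bound HVs to each other. Let us consider two i.i.d. random HVs: $\mathbf{a}$ and $\mathbf{b}$; $\mathbf{d} = \mathbf{a} \circ \mathbf{b}$ is similar to $\mathbf{d}' = \mathbf{a}' \circ \mathbf{b}'$ if $\mathbf{a}'$ is similar to $\mathbf{a}$ and $\mathbf{b}'$ is similar to $\mathbf{b}$. Thus, most realizations of the binding operation preserve structured similarity in a multiplicative fashion. When $\mathbf{a}$ and $\mathbf{b}$ are independent, as well as $\mathbf{a}'$ and $\mathbf{b}'$, then the similarity of $\mathbf{d}$ to $\mathbf{d}'$ is equal to the product of the similarities of $\mathbf{a}$ to $\mathbf{a}'$ and $\mathbf{b}$ to $\mathbf{b}'$. For instance, if $\mathbf{a} = \mathbf{a}'$, the similarity of $\mathbf{d}$ to $\mathbf{d}'$ will be equal to the similarity of $\mathbf{b}$ to $\mathbf{b}'$. If $\mathbf{a}$ is not similar to $\mathbf{a}'$, $\mathbf{d}$ will have no similarity with $\mathbf{d}'$ irrespective of the similarity between $\mathbf{b}$ and $\mathbf{b}'$ due to the multiplicative fashion of similarity preservation. This type of similarity is different from the unstructured similarity of the superimposed HVs: $\mathbf{a} + \mathbf{b}$ will still be similar to $\mathbf{a}' + \mathbf{b}$ even if $\mathbf{a}$ is dissimilar to $\mathbf{a}'$.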
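The two kinds of similarity preservation can be checked numerically. The sketch below is an illustration under stated assumptions: bipolar HVs, component-wise multiplication as the binding, component-wise addition as the superposition, and cosine similarity. Similar HVs are produced by negating a fixed number of randomly chosen components; the helper `flip` and all parameter values are hypothetical.

```python
import math
import random

D = 10_000  # assumed dimensionality
rng = random.Random(1)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def flip(hv, k):
    """Return a copy of hv with k randomly chosen components negated."""
    out = list(hv)
    for i in rng.sample(range(D), k):
        out[i] = -out[i]
    return out

def cos(x, y):
    num = sum(p * q for p, q in zip(x, y))
    return num / math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))

def bind(x, y):
    return [p * q for p, q in zip(x, y)]

def superpose(x, y):
    return [p + q for p, q in zip(x, y)]

a, b = random_hv(), random_hv()
a2 = flip(a, 1000)  # cosine similarity to a is 1 - 2 * 1000 / D
b2 = flip(b, 2000)  # cosine similarity to b is 1 - 2 * 2000 / D

sim_superposition = cos(superpose(a, b), a)        # unstructured similarity is kept
sim_binding = cos(bind(a, b), a)                   # unstructured similarity is lost
sim_structured = cos(bind(a, b), bind(a2, b2))     # close to cos(a, a2) * cos(b, b2)
```

Under these assumptions, `sim_structured` should land near the product of the two input similarities, while `sim_binding` should be close to zero.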

2.2.4 Representations of compositional structures

As introduced in Section 2.1.1, compositional structures are formed from objects where the objects can be either atomic or compositional. Atomic objects are the most basic (irreducible) elements of a compositional structure. More complex compositional objects are composed from atomic ones as well as from simpler compositional objects. Such a construction may be considered as a part-whole hierarchy, where lower-level parts are recursively composed to form higher-level wholes (see, e.g., an example of a graph in Section 3.5.2).

In HDC/VSA, compositional structures are transformed into their HVs using HVs of their elements and the superposition and binding operations introduced above. As mentioned in Section 2.1.3, it is a common requirement that similar compositional structures be represented by similar HVs. The ability to recover the original representation from its compositional HV, which we describe in the next section, might be an additional requirement. In Section 3 below, we will review a number of approaches to the formation of atomic and compositional HVs.

2.2.5 Recovery and clean-up

Given a compositional HV, it is often desirable to find the particular input HVs from which it was constructed. We will refer to the procedure implementing this as recovery. This procedure is also known as decoding, reconstruction, restoration, decomposition, parsing, and retrieval.

For recovery, it is necessary to know the set of the input HVs from (some of) which the HV was formed. This set is usually called dictionary, codebook, clean-up memory or item memory (in the context of HDC/VSA, the term clean-up was introduced in [115] while the term item memory was proposed in [56]). In its simplest form, the item memory is just a matrix storing HVs explicitly, but it can be implemented as, e.g., a content-addressable associative memory [158]. Moreover, recent works [150, 68, 16] suggested that it is not always necessary to store the item memory explicitly, as it can be easily rematerialized. Also, the recovery requires knowledge about the structure of the compositional HV, that is, information about the operations used to form a given HV (see an example below).

Most of the HDC/VSA models produce compositional HVs not similar to the input HVs due to the properties of their binding operations. So, to recover the HVs involved in the compositional HV, the use of the unbinding operation (Section 2.2.3 above) will typically be required. If the superposition operation was used when forming the compositional HV, after unbinding the obtained HVs will be noisy versions of the input HVs to be recovered (noiseless version can be obtained in the case when only binding operations were used to form a compositional HV). Therefore, a “clean-up” procedure is used at the final stage of the recovery, that is, a similarity search in the item memory for the HV(s) most similar to the noisy query HV(s). For HVs representing compositional structures of limited size, there are theoretical guarantees for the exact recovery [161].

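Let us consider a simple example. The compositional HV was obtained as follows: $\mathbf{s} = \mathbf{a} \circ \mathbf{b} + \mathbf{c} \circ \mathbf{d}$. For recovery from $\mathbf{s}$, we know that it was formed as the superposition of pair-wise bindings. In the simplest setup, we also know that the input HVs include $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$. The task is to find which HVs were in those pairs. Then, to find out which HV was bound with, e.g., $\mathbf{a}$, we unbind it from $\mathbf{s}$ as $\mathbf{a} \oslash \mathbf{s}$, resulting in $\mathbf{b} + \mathbf{a} \oslash (\mathbf{c} \circ \mathbf{d}) = \mathbf{b} + \text{noise}$. Then we use the clean-up procedure that returns $\mathbf{b}$ as the closest match (using the corresponding similarity measure). That way we know that $\mathbf{a}$ was bound with $\mathbf{b}$. In this setup, we immediately know that $\mathbf{c}$ was bound with $\mathbf{d}$. We can check this in the same manner, by first calculating, e.g., $\mathbf{d} \oslash \mathbf{s}$, and then cleaning it up.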
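The recovery of one member of a bound pair from a superposition of two pair-wise bindings can be sketched as follows. The sketch assumes bipolar HVs with component-wise multiplication as the binding (which is then its own inverse) and the dot product as the similarity measure; the item memory with eight entries and the helper `clean_up` are illustrative choices, not part of the original description.

```python
import random

D = 1000  # assumed dimensionality
rng = random.Random(2)

def random_hv():
    return [rng.choice((-1, 1)) for _ in range(D)]

def bind(x, y):
    """Component-wise multiplication; for bipolar HVs it also serves as unbinding."""
    return [p * q for p, q in zip(x, y)]

def superpose(x, y):
    return [p + q for p, q in zip(x, y)]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

# Item memory: the four input HVs plus four unrelated distractors.
item_memory = {name: random_hv() for name in "abcdefgh"}

def clean_up(query):
    """Return the name of the item-memory HV most similar to the noisy query."""
    return max(item_memory, key=lambda name: dot(query, item_memory[name]))

a, b, c, d = (item_memory[name] for name in "abcd")
s = superpose(bind(a, b), bind(c, d))

partner_of_a = clean_up(bind(a, s))  # unbinding gives b plus crosstalk noise
partner_of_d = clean_up(bind(d, s))  # unbinding gives c plus crosstalk noise
```

The unbound HV is only a noisy version of the partner, so the final similarity search is what turns it into an exact item-memory entry.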

In a more complicated setup, it is not known that the input HVs were $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$, $\mathbf{d}$, but the HVs in the item memory are known. So, we take those HVs, one by one, and repeat the operations above. A possible final step in the recovery procedure is to recalculate the compositional HV from the reconstructed HVs. The recalculated compositional HV should match the original compositional HV. Note that the recovery procedure becomes much more complex in the case where more than two HVs are bound together instead of just pairs. It is easy to recover one of the HVs used in the binding if the other HVs are known. They can simply be unbound from the compositional HV.

When the other HVs used in the binding are not known, the simplest way to find the HVs used for binding is to compute the similarity with all possible bindings of HVs from the item memory. The complexity of this search grows exponentially with the number of bound HVs. However, recent works [64, 21] proposed a mechanism called the resonator network to address this problem.

HDC/VSA | Ref. | Space of atomic HVs | Binding | Unbinding | Superposition | Similarity
TPR | [153] | unit HVs | tensor product | tensor-vector inner product | component-wise addition | dot product
HRR | [118] | unit HVs | circular convolution | circular correlation | component-wise addition | dot product
FHRR | [122] | complex unitary HVs | component-wise multiplication | component-wise multiplication with complex conjugate | component-wise addition | cosine similarity
SBDR | [138] | sparse binary HVs | context-dependent thinning | repeated context-dependent thinning | component-wise disjunction | dot product
BSC | [55] | dense binary HVs | component-wise XOR | component-wise XOR | majority rule | Hamming distance
MAP | [29] | dense bipolar HVs | component-wise multiplication | component-wise multiplication | component-wise addition | dot product
MCR | [154] | dense integer HVs | component-wise modular addition | component-wise modular subtraction | component-wise discretized vector sum | modified Manhattan
MBAT | [27] | dense bipolar HVs | vector-matrix multiplication | multiplication with inverse matrix | component-wise addition | dot product
SBC | [90] | sparse binary HVs | block-wise circular convolution | block-wise circular convolution with approximate inverse | component-wise addition | dot product
GAHRR | [3] | unit HVs | geometric product | geometric product with inverse | component-wise addition | unitary product
TABLE II: Summary of HDC/VSA models. Each model has its own atomic HVs and operations on them for binding and superposition, and a similarity measure.

2.3 The HDC/VSA models

In this section, various HDC/VSA models are overviewed. For each model, we provide the format of the employed HVs and the implementation of the basic operations introduced in Section 2.2.3 above. Table II provides the summary of the models (see also Table 1 in [149]). (Note that the binding via the permutation operation is not specified in the table. That is because it can be used in any of the models even if it was not originally proposed to be a part of it.) Here, we limit ourselves to specifying only the details of different models, but see, e.g., [149] for some comparisons of seven different models from Table II (the work did not cover the GAHRR, MCR, and TPR models) and some variations of the HRR, MAP, and SBDR models.

2.3.1 A guide on the navigation through the HDC/VSA models

Before exposing the reader to the details of each HDC/VSA model, it is important to comment on the nature of their diversity and to provide some intuition for selecting the best model for practical usage.

The current diversity of the HDC/VSA models is a result of the evolutionary development of the main vector symbolic paradigm by independent research groups and individual researchers. Initially, the diversity comes from different initial assumptions, variations in the neurobiological inspiration and the particular mathematical background of the originators. Therefore, from a historical perspective, the question of selecting a candidate for the best model is ill-posed. Recent work [149] started to perform a systematic experimental comparison between models. We emphasize the importance of further investigations in this direction in order to facilitate an informed choice of a model for a given problem and to raise HDC/VSA to the level of a mature engineering discipline.

However, the usage of different models could already be prioritized at this moment from the perspective of the target computing hardware. The recent developments of unconventional computing hardware aim at improving the energy efficiency over the conventional von Neumann architecture in AI applications. Various unconventional hardware platforms (see Section 4.3) promise to push energy efficiency and operational speed beyond the current standards.

Independently of the type of the computing hardware, any HDC/VSA model can be seen as an abstraction of the algorithmic layer and can thus be used for designing computational primitives, which can then be mapped to various hardware platforms using different models [67]. In the next subsections, we present the details of the currently known HDC/VSA models.

2.3.2 Tensor Product Representations

The Tensor Product Representations model (in short, the TPR model or just TPR) is one of the earliest models within HDC/VSA. The TPR model was originally proposed by Smolensky in [153]. However, it is worth noting that similar ideas were also presented in [103, 104] around the same time but received much less attention from the research community. Atomic HVs are vectors selected uniformly at random from the Euclidean unit sphere $S^{D-1}$. The binding operation is a tensor product (a generalized outer product) of HVs. So, the dimensionality of bound HVs grows exponentially with their number (e.g., 2 bound HVs have the dimension $D^2$, 3 bound vectors have the dimension $D^3$, and so on). The vectors need not be of the same dimension. This points to another important note: the TPR model may or may not be categorized as an HDC/VSA model depending on whether the fixed dimensionality of representations is considered a compulsory attribute of an HDC/VSA model or not.

The superposition is implemented by (tensor) addition. Since the dimensionality grows, a recursive application of the binding operation is challenging. The resultant tensor also depends on the order in which the HVs are presented. Binding similar HVs will result in similar tensors. The unbinding is realized by taking the tensor product representation of a compositional structure and extracting from it the HV of interest. For linearly independent HVs, the exact unbinding is done as the tensor multiplication by the unbinding HV(s). Unbinding HVs are obtained as the rows of the inverse of the matrix with atomic HVs in columns. Approximate unbinding is done using an atomic HV instead of the unbinding HV. If the atomic HVs are orthonormal, this results in the exact unbinding.

Though the similarity measure was not specified, we assume it to be the dot product.
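The approximate unbinding described above can be illustrated for two role-filler pairs. This is a sketch under stated assumptions: atomic HVs are random unit vectors, binding is the outer product of a filler with a role, and unbinding contracts the resulting matrix with the atomic role HV. The role/filler naming and the value of `D` are illustrative only.

```python
import math
import random

D = 128  # assumed dimensionality of atomic HVs
rng = random.Random(3)

def random_unit_hv():
    """Gaussian components followed by normalization: uniform on the unit sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(D)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(x, y):
    return sum(p * q for p, q in zip(x, y))

def outer(filler, role):
    """Tensor (outer) product binding: the result has D * D components."""
    return [[f * r for r in role] for f in filler]

def add(t1, t2):
    """Superposition by tensor addition."""
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(t1, t2)]

def unbind(tensor, role):
    """Approximate unbinding: contract the tensor with the atomic role HV."""
    return [dot(row, role) for row in tensor]

role1, role2 = random_unit_hv(), random_unit_hv()
filler1, filler2 = random_unit_hv(), random_unit_hv()

structure = add(outer(filler1, role1), outer(filler2, role2))
recovered = unbind(structure, role1)

norm_recovered = math.sqrt(dot(recovered, recovered))
sim_to_filler1 = dot(recovered, filler1) / norm_recovered
sim_to_filler2 = dot(recovered, filler2) / norm_recovered
```

Because random unit vectors are only approximately orthogonal, `recovered` contains a small contribution of `filler2`; with exactly orthonormal roles this crosstalk would vanish.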

2.3.3 Holographic Reduced Representations

The Holographic Reduced Representations model (HRR) was developed by Plate in the early 1990s [115, 117, 118]. The HRR model was inspired by Smolensky’s TPR model [153] and Hinton’s “reduced representations” [40]. Note that due to the usage of HRR in Semantic Pointer Architecture Unified Network [18], sometimes the HRR model is also referred to as the Semantic Pointer Architecture (SPA). The most detailed source of information on HRR is Plate’s book [122].

In HRR, the atomic HVs for representing dissimilar objects are real-valued and their components are independently generated from the normal distribution with mean $0$ and variance $1/D$. For large $D$, the Euclidean norm is close to $1$. The binding operation is defined on two HVs ($\mathbf{a}$ and $\mathbf{b}$) and implemented via the circular convolution:

$$z_j = \sum_{k=0}^{D-1} b_k \, a_{(j-k) \bmod D}, \tag{10}$$

where $z_j$ is the $j$th component of the resultant HV $\mathbf{z}$.

The circular convolution multiplies norms, and it preserves the unit norms of input HVs. The bound HV is not similar to the input HVs. However, the bound HVs of similar input HVs are similar. The unbinding is done by the circular correlation of the bound HV with one of the input HVs. The result is noisy, so a clean-up procedure (see Section 2.2.5) is required. There is also a recent proposal in [33] for an alternative realization of the binding operation called Vector-derived Transformation Binding (VTB). It is claimed to better suit implementations in spiking neurons and was demonstrated to recover HVs of sequences (transformed with the approach from [146]) better than the circular convolution.

The superposition operation is component-wise addition. Often, the normalization is used after the addition to preserve the unit Euclidean norm of compositional HVs. In HRR, both superposition and binding operations are commutative. The similarity measure is the dot product or the cosine similarity.
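The HRR binding and unbinding can be written down directly from their definitions. The sketch below uses a direct quadratic-time circular convolution and circular correlation; the dimensionality, the four-entry item memory, and the function names are assumptions made for illustration.

```python
import math
import random

D = 512  # assumed dimensionality
rng = random.Random(4)

def random_hv():
    """Components drawn from the normal distribution with mean 0 and variance 1/D."""
    return [rng.gauss(0.0, math.sqrt(1.0 / D)) for _ in range(D)]

def cconv(a, b):
    """Circular convolution: z_j = sum_k b_k * a_{(j - k) mod D}."""
    return [sum(b[k] * a[(j - k) % D] for k in range(D)) for j in range(D)]

def ccorr(a, z):
    """Circular correlation of a with z, used here as the approximate unbinding."""
    return [sum(a[k] * z[(k + j) % D] for k in range(D)) for j in range(D)]

def cos(x, y):
    num = sum(p * q for p, q in zip(x, y))
    return num / math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))

item_memory = {name: random_hv() for name in ("a", "b", "c", "d")}
a, b = item_memory["a"], item_memory["b"]

z = cconv(a, b)          # bound HV, not similar to a or b
b_noisy = ccorr(a, z)    # noisy version of b
best = max(item_memory, key=lambda name: cos(b_noisy, item_memory[name]))
```

The unbound HV is visibly noisy, which is why the similarity search over the item memory is needed as the last step. In practice the convolution is usually computed with the fast Fourier transform rather than the double loop used here.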

2.3.4 Matrix Binding of Additive Terms

In the Matrix Binding of Additive Terms model (MBAT) [27] by Gallant and Okaywe, it was proposed to implement the binding operation by matrix-vector multiplication. Such an option was also briefly mentioned in [122]. Generally, it can change the dimensionality of the resultant HV, if needed.

Atomic HVs are dense and their components are randomly selected independently from $\{-1, +1\}$. HVs with real-valued components are also possible. The matrices are random, e.g., with elements also from $\{-1, +1\}$. Matrix-vector multiplication results can be binarized by thresholding.

The properties of this binding are similar to those of most other models (HRR, MAP, and BSC). In particular, similar input HVs will result in similar bound HVs. The bound HVs are not similar to the input HVs (which is especially evident here, since even the HV’s dimensionality could change). The similarity measure is either or . The superposition is a component-wise addition, which can be normalized. The unbinding could be done using the (pseudo) inverse of the role matrix (or the inverse for square matrices, guaranteed for orthogonal matrices, as in [163]).

In order to check if a particular HV is present in the sum of HVs that were bound by matrix multiplication, that HV should be multiplied by the matrix and the similarity of the resultant HV should be calculated and compared to a similarity threshold.

2.3.5 Fourier Holographic Reduced Representations

The Fourier Holographic Reduced Representations model (FHRR) [117, 122] was introduced by Plate as a model inspired by HRR where the HVs’ components are complex numbers. The atomic HVs are complex-valued random vectors where each vector component can be considered as an angle (phasor) randomly selected independently from the uniform distribution over $[0, 2\pi)$. Usually, the unit magnitude of each component is used (unitary HVs). The similarity measure is the mean of the cosines of angle differences. The binding operation is a component-wise complex multiplication, which is often referred to as the Hadamard product. The unbinding is implemented via the binding with an HV conjugate (component-wise angle subtraction modulo $2\pi$). The clean-up procedure is used similarly to HRR.

The superposition is a component-wise complex addition, which can be followed by the magnitude normalization such that all components have the unit magnitude.
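A compact way to experiment with FHRR is to store only the angle of each unit-magnitude component. The sketch below makes that assumption; binding then becomes angle addition and unbinding angle subtraction, both modulo $2\pi$. Function names and the dimensionality are illustrative.

```python
import cmath
import math
import random

D = 2000  # assumed dimensionality
TWO_PI = 2.0 * math.pi
rng = random.Random(5)

def random_hv():
    """Each component is stored as the angle of a unit-magnitude phasor."""
    return [rng.uniform(0.0, TWO_PI) for _ in range(D)]

def bind(x, y):
    """Component-wise complex multiplication = angle addition modulo 2*pi."""
    return [(p + q) % TWO_PI for p, q in zip(x, y)]

def unbind(z, x):
    """Binding with the complex conjugate = angle subtraction modulo 2*pi."""
    return [(p - q) % TWO_PI for p, q in zip(z, x)]

def similarity(x, y):
    """Mean of the cosines of the angle differences."""
    return sum(math.cos(p - q) for p, q in zip(x, y)) / D

def superpose(*hvs):
    """Component-wise complex addition followed by magnitude normalization."""
    out = []
    for angles in zip(*hvs):
        total = sum(cmath.exp(1j * t) for t in angles)
        out.append(cmath.phase(total) % TWO_PI)
    return out

a, b = random_hv(), random_hv()
bound = bind(a, b)
sim_recovered = similarity(unbind(bound, a), b)  # unbinding is exact here
sim_bound = similarity(bound, a)                 # bound HV is dissimilar to a
sim_superposed = similarity(superpose(a, b), a)  # superposition stays similar to a
```

Keeping only the angle after the addition is one way to realize the magnitude normalization mentioned above.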

2.3.6 Binary Spatter Codes

The Binary Spatter Codes model (BSC; this acronym was not used in the original publications, but we use it here to be consistent with the later literature) was developed in a series of papers by Kanerva in the mid 1990s [52, 53, 54, 55]. As noted in [120], the BSC model could be seen as a special case of the FHRR model where angles are restricted to $0$ and $\pi$. In BSC, atomic HVs are dense binary HVs with components from $\{0, 1\}$. The superposition is a component-wise addition, thresholded (binarized) to obtain an approximately equal density of ones and zeros in the resultant HV. This operation is usually referred to as the majority rule or majority sum. It selects zero or one for each component depending on whether the number of zeros or ones in the summands is higher and ties are broken, e.g., at random. In order to have a deterministic implementation of the majority rule, it is common to assign a fixed random HV, which is included in the superposition when the number of summands is even. Another alternative is to use an exclusive OR operation on the nearby components of the first and last summands to break a tie [35].

The binding is a component-wise exclusive OR (XOR). The bound HV is not similar to the input HVs but bound HVs of two similar input HVs are similar. The unbinding operation is also XOR. The clean-up procedure is used similarly to the models above. The similarity measure is the Hamming distance.
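The BSC operations are short enough to state in full. The sketch below assumes a normalized Hamming distance and uses the deterministic tie-breaking variant in which a fixed random HV is added when the number of summands is even; the constant `TIE_BREAKER` and the dimensionality are illustrative.

```python
import random

D = 10_000  # assumed dimensionality
rng = random.Random(6)

def random_hv():
    return [rng.randint(0, 1) for _ in range(D)]

def xor(x, y):
    """Binding and unbinding: component-wise XOR."""
    return [p ^ q for p, q in zip(x, y)]

def hamming(x, y):
    """Normalized Hamming distance (0 = identical, about 0.5 = unrelated)."""
    return sum(p != q for p, q in zip(x, y)) / D

TIE_BREAKER = random_hv()  # fixed random HV, added only for an even number of summands

def majority(hvs):
    """Superposition by the majority rule with deterministic tie-breaking."""
    hvs = list(hvs)
    if len(hvs) % 2 == 0:
        hvs.append(TIE_BREAKER)
    half = len(hvs) / 2
    return [1 if sum(column) > half else 0 for column in zip(*hvs)]

a, b, c = random_hv(), random_hv(), random_hv()
bound = xor(a, b)
recovered = xor(bound, a)      # XOR is its own inverse
m3 = majority([a, b, c])
m2 = majority([a, b])
```

Since XOR is self-inverse, unbinding a single bound pair is exact, whereas the majority rule only keeps each summand at a reduced Hamming distance.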

2.3.7 Multiply-Add-Permute

The Multiply-Add-Permute model (MAP) proposed by Gayler [29] is similar to BSC. In the MAP model, atomic HVs are dense bipolar HVs with components from $\{-1, +1\}$. This is used more often than the originally proposed version with the real-valued components from $[-1, 1]$.

The binding is component-wise multiplication. The bound HV is not similar to the input HVs. However, bound HVs of two similar input HVs are similar. The unbinding is also component-wise multiplication and requires knowledge of the bound HV and one of the input HVs for the case of two bound input HVs.

The superposition is a component-wise addition. The result of the superposition can either be left as is, normalized using the Euclidean norm, or binarized to $\{-1, +1\}$ with the sign function where ties for 0-components should be broken randomly (but deterministically for a particular input HV set). The choice of normalization is use-case dependent. The similarity measure is either the dot product or the cosine similarity.

2.3.8 Sparse Binary Distributed Representations

The Sparse Binary Distributed Representations model (SBDR) [129], also known as Binary Sparse Distributed Codes [138, 130], proposed by Kussul, Rachkovskij, et al., emerged as a part of Associative-Projective Neural Networks [86] (see the section on Associative-Projective Neural Networks in Part II of this survey [73]). Below, we provide some details on the basic blocks of two versions of SBDR: Conjunction-Disjunction and Context-Dependent Thinning.

Conjunction-Disjunction


The Conjunction-Disjunction is one of the earliest HDC/VSA models. It uses binary vectors for atomic HVs [127]. Component-wise conjunction of HVs is used as the binding operation. Component-wise disjunction of HVs is used as the superposition operation. Both basic operations are defined for two or more HVs. Both operations produce HVs similar to the HVs from which they were obtained. That is, the bound HV is similar to the input HVs, unlike most of the other known binding operations, except for Context-Dependent Thinning (see below).

The HVs resulting from both basic operations are similar if their input HVs are similar. The kind of similarity preservation is different though, see [130]. The operations are commutative so the resultant HV does not depend on the order in which the input HVs are presented. This could be easily changed by using, e.g., permuted HVs where some random permutations are used to represent the order.

The probability of 1-components in the resultant HV can be controlled by the probability of 1-components in the input HVs. Sparse HVs allow using efficient algorithms such as inverted indexing or auto-associative memories for similarity search. Since conjunction decreases the density of the resultant HV compared to the density of the input HVs, it is literally an implementation of “reduced representations” from [40]. At the same time, the recursive application of such binding leads to HVs with all components set to zero. However, the superposition by disjunction increases the number of 1-components. These operations can therefore be used to compensate each other (up to a certain degree; see the next section). A modified version of the Conjunction-Disjunction preserves the density of 1-components in the resultant HV as shown in the following subsubsection.

Context-Dependent Thinning


The Context-Dependent Thinning (CDT) procedure proposed by Rachkovskij and Kussul [130] can be considered either as a binding operation or as a combination of superposition and binding. It was already used in the 1990s [127, 89, 87, 86] under the name “normalization” since it approximately maintains the desired density of 1-components in the resultant HV. This density, however, also determines the “degree” or “depth” of the binding as discussed in [130].

The CDT procedure is implemented as follows. First, $S$ input binary HVs $\mathbf{a}_i$ are superimposed by component-wise disjunction as:

$$\mathbf{z} = \bigvee_{i=1}^{S} \mathbf{a}_i. \tag{11}$$

This leads to an increased number of 1-components in the resultant HV and the input HVs are not bound yet. Second, in order to bind the input HVs, the resultant HV $\mathbf{z}$ from the first stage is permuted and the permuted HV is conjuncted with the HV from the first stage. This binds the input HVs and reduces the density of 1-components. Third, $T$ such conjunctive bindings can be obtained by using different random permutations or by recursive usage of a single permutation. They are superimposed by disjunction to produce the resultant HV. These steps are described by the following equation:

$$\langle \mathbf{z} \rangle = \bigvee_{k=1}^{T} \left( \mathbf{z} \wedge \rho_k(\mathbf{z}) \right) = \mathbf{z} \wedge \bigvee_{k=1}^{T} \rho_k(\mathbf{z}), \tag{12}$$

where $\rho_k$ denotes the $k$th random permutation. The density of the resultant HV depends on the number of permutation-disjunctions $T$. If many of them are used, we eventually get the HV from the first stage, i.e., with no binding of the input HVs. The described version of the CDT procedure is called “additive”. Several alternative versions of the CDT procedure were proposed in [130].

The CDT procedure preserves the information about HVs that were bound together. So the resultant HV can be bound with other HVs “as a whole”. The CDT procedure is commutative. It is defined for several input HVs. As well as conjunction-disjunction, the CDT procedure preserves both unstructured and structured similarity.

Note that, for SBDR, the unbinding is not necessary since the bound HV is similar to the input HVs. However, there is an analog of the unbinding in SBDR, which repeats the binding of the bound HV with some of the input HVs.
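The additive CDT procedure can be sketched with sparse binary HVs stored as sets of the indices of their 1-components, so that disjunction and conjunction become set union and intersection. This representation, the density of 1%, and the number of permutations `T` are assumptions of the sketch, not values taken from [130].

```python
import random

D = 10_000   # assumed dimensionality
ONES = 100   # assumed number of 1-components per atomic HV
T = 13       # assumed number of permutations
rng = random.Random(7)

def random_sparse_hv():
    """Sparse binary HV stored as the set of indices of its 1-components."""
    return set(rng.sample(range(D), ONES))

def random_permutation():
    perm = list(range(D))
    rng.shuffle(perm)
    return perm

def permute(hv, perm):
    return {perm[i] for i in hv}

PERMS = [random_permutation() for _ in range(T)]

def cdt(inputs):
    """Additive CDT: superimpose, then thin with T permuted copies."""
    z = set().union(*inputs)          # stage 1: component-wise disjunction
    thinned = set()
    for perm in PERMS:
        thinned |= z & permute(z, perm)  # stages 2 and 3: conjunction, then disjunction
    return z, thinned

a, b, c = random_sparse_hv(), random_sparse_hv(), random_sparse_hv()
z, bound = cdt([a, b, c])

# Equivalent form: conjunction of z with the disjunction of its permuted copies.
mask = set().union(*(permute(z, perm) for perm in PERMS))
```

Because the result is a subset of the superposition `z`, it keeps some overlap with every input HV, which is the unstructured similarity discussed above; which 1-components survive depends on the whole set of inputs.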

2.3.9 Sparse Block Codes

The main motivation behind the Sparse Block Codes model (SBC), proposed by Laiho et al. [90, 24], is to use sparse binary HVs as in SBDR but with a binding operation where the bound HV is not similar to the input HVs (in contrast to SBDR).

Similar to SBDR, atomic HVs in the SBC model are sparse and binary with components from $\{0, 1\}$. The HVs, however, have an additional structure imposed on them. An HV is partitioned into blocks of equal size (so that the HV’s dimensionality is a multiple of the block size). In each block there is only one nonzero component, i.e., the activity in each block is maximally sparse as in Potts associative memory, see, e.g., [34].

In SBC, the binding operation is defined for two HVs and implemented via circular convolution, which is applied block-wise. The unbinding is done by the block-wise circular correlation of the bound HV with one of the input HVs.

The superposition operation is a component-wise addition. If the compositional HV is going to be used for binding, it might be binarized by leaving only one of the components with the largest magnitude within each block active. If there are several components with the same largest magnitude, then a deterministic choice (e.g., the component with the largest position number could be chosen) can be made. The similarity measure is the dot product or the cosine similarity.
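Since every block of an SBC HV has exactly one active component, an HV can be stored as a list with the position of the active component in each block. Under that assumption, the block-wise circular convolution of two such HVs reduces to adding the positions modulo the block size, which the sketch below also checks against an explicit one-hot convolution for one block. The block count and block size are illustrative.

```python
import random

BLOCKS = 64      # assumed number of blocks
BLOCK_SIZE = 16  # assumed block size
rng = random.Random(8)

def random_hv():
    """One active position per block; the HV is stored as a list of positions."""
    return [rng.randrange(BLOCK_SIZE) for _ in range(BLOCKS)]

def bind(x, y):
    """Block-wise circular convolution of one-hot blocks = modular index addition."""
    return [(p + q) % BLOCK_SIZE for p, q in zip(x, y)]

def unbind(z, x):
    """Block-wise circular correlation = modular index subtraction."""
    return [(p - q) % BLOCK_SIZE for p, q in zip(z, x)]

def overlap(x, y):
    """Dot product of the binary HVs = number of blocks with matching positions."""
    return sum(p == q for p, q in zip(x, y))

def one_hot(index):
    return [1 if i == index else 0 for i in range(BLOCK_SIZE)]

def cconv(u, v):
    """Explicit circular convolution of two blocks, used only as a cross-check."""
    n = len(u)
    return [sum(v[k] * u[(j - k) % n] for k in range(n)) for j in range(n)]

a, b = random_hv(), random_hv()
bound = bind(a, b)
recovered = unbind(bound, a)
explicit_block = cconv(one_hot(a[0]), one_hot(b[0]))
```

The index form avoids materializing the binary HV at all, which is one reason this representation is convenient in software.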

2.3.10 Modular Composite Representations

The Modular Composite Representations model (MCR) [154] proposed by Snaider and Franklin shares some similarities with FHRR, BSC, and SBC. Components of atomic HVs are integers drawn uniformly from some limited range (denoted as $r$), which is a parameter of the MCR model. The binding operation is defined as component-wise modular addition (the modulus is set by the range limit $r$), which generalizes XOR used for binding in BSC. Binding properties are similar to those of most other models. The unbinding operation is the component-wise modular subtraction.

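The similarity measure is a variation of the Manhattan distance [154]:

$$\mathrm{dist}(\mathbf{a}, \mathbf{b}) = \sum_{i=1}^{D} \min\left( (a_i - b_i) \bmod r,\; (b_i - a_i) \bmod r \right). \tag{13}$$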

The superposition operation resembles that of FHRR. Integers are interpreted as discretized angles on a unit circle. First, phasors are superimposed for each component (i.e., vector addition is performed). Second, the result of the superposition is normalized by setting the magnitude to one and the phase to the nearest phase corresponding to an integer from the defined range.
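The MCR operations described in this subsection can be sketched as follows. The distance function is an assumed form of the modified Manhattan distance (the per-component distance is the shorter way around a circle with `R` positions), and the range `R` and dimensionality `D` are illustrative values.

```python
import cmath
import math
import random

D = 1000  # assumed dimensionality
R = 16    # assumed range of the integer components
rng = random.Random(9)

def random_hv():
    return [rng.randrange(R) for _ in range(D)]

def bind(x, y):
    """Component-wise modular addition."""
    return [(p + q) % R for p, q in zip(x, y)]

def unbind(z, x):
    """Component-wise modular subtraction."""
    return [(p - q) % R for p, q in zip(z, x)]

def distance(x, y):
    """Assumed form of the modified Manhattan distance on a circle of R positions."""
    return sum(min((p - q) % R, (q - p) % R) for p, q in zip(x, y))

def superpose(hvs):
    """Add the phasors of each component, then snap the phase to the nearest integer."""
    out = []
    for comps in zip(*hvs):
        total = sum(cmath.exp(2j * math.pi * c / R) for c in comps)
        out.append(round(cmath.phase(total) * R / (2.0 * math.pi)) % R)
    return out

a, b = random_hv(), random_hv()
recovered = unbind(bind(a, b), a)
s = superpose([a, a, b])  # a is weighted twice, so s should stay closer to a
```

With `R = 2` the binding reduces to XOR, which matches the remark that modular addition generalizes the BSC binding.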

2.3.11 Geometric Analogue of Holographic Reduced Representations

As its name suggests, the Geometric Analogue of Holographic Reduced Representations model (GAHRR) [3] (proposed by Aerts, Czachor and De Moor) was developed as a model alternative to HRR. An earlier proposal was a geometric analogue to BSC [2]. The main idea is to reformulate HRR in terms of geometric algebra. The binding operation is implemented via the geometric product (a generalization of the outer product). The superposition operation is component-wise addition.

So far, GAHRR is mainly an interesting theoretical effort, as its advantages over conceptually simpler models are not particularly clear, but it might become more relevant in the future. Readers interested in GAHRR are referred to the original publications. An introduction to the model is given in [4, 3, 112, 111], examples of representing data structures with GAHRR are in [114] and some experiments comparing GAHRR to other HDC/VSA models are presented in [113, 110].

2.4 Information capacity of HVs

An interesting question, which is often brought up by newcomers to the area, is how much information one could store in the superposition of HVs. This value is called the information capacity of HVs. Usually, it is assumed that atomic HVs in superposition are random and dissimilar to each other. In other words, one could think about such superposition as an HV representing, e.g., a set of symbols. In general, the capacity depends on parameters such as the number of symbols in the item memory (denoted as $N$), the dimensionality of HVs ($D$), the type of the superposition operation, and the number of HVs being superimposed ($M$).

Early results on the capacity were given in [117, 122]. Some ideas for the case of binary/bipolar HVs in BSC, MAP, and MBAT were also presented in [72, 27]. The capacity of SBDR was analyzed in [75]. The most general and comprehensive analysis of the capacity of different HDC/VSA models (and also some classes of recurrent neural networks) was recently presented in [23]. The key idea of the capacity theory [23] can be illustrated by the following scenario, in which an HV to be recovered contains a valid HV from the item memory and crosstalk noise from some other elements of the compositional HV (e.g., from role-filler bindings, see Section 3.1.3). Statistically, we can think of the problem of recovering the correct atomic HV from the item memory as a detection problem with two normal distributions: hit and reject; where hit corresponds to the distribution of similarity values (e.g., dot products) of the correct atomic HV while reject is the distribution of all other atomic HVs (assuming all HVs are random). Each distribution is characterized by its corresponding mean and standard deviation: $\mu_h$ & $\sigma_h$ and $\mu_r$ & $\sigma_r$, respectively. Given the values of $\mu_h$, $\sigma_h$, $\mu_r$, and $\sigma_r$, we can compute the expected accuracy ($p_{corr}$) of retrieving the correct atomic HV according to:

$$p_{corr} = \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2\pi}\,\sigma_h} \, e^{-\frac{(x - \mu_h)^2}{2\sigma_h^2}} \left[ \Phi(x, \mu_r, \sigma_r) \right]^{N-1}, \tag{14}$$

where $\Phi(x, \mu_r, \sigma_r)$ is the cumulative Gaussian and $N$ denotes the size of the item memory.
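The expected accuracy of the hit/reject detection problem can be evaluated numerically. The sketch below integrates the Gaussian density of the hit distribution multiplied by the cumulative Gaussian of the reject distribution raised to the power of the number of competing items, using a simple midpoint rule. The function name `expected_accuracy`, the integration width, and the number of steps are choices of this sketch; the example statistics at the end are hypothetical numbers, not values from [23].

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative Gaussian expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def expected_accuracy(mu_h, sigma_h, mu_r, sigma_r, n_items, steps=4000, width=10.0):
    """Probability that the hit similarity exceeds all n_items - 1 reject similarities."""
    lower = mu_h - width * sigma_h
    dx = 2.0 * width * sigma_h / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * dx  # midpoint of the i-th integration cell
        total += normal_pdf(x, mu_h, sigma_h) * normal_cdf(x, mu_r, sigma_r) ** (n_items - 1) * dx
    return total

# Hypothetical statistics, for illustration only.
p_example = expected_accuracy(mu_h=100.0, sigma_h=30.0, mu_r=0.0, sigma_r=30.0, n_items=64)
```

A quick sanity check of the formula is that identical hit and reject distributions give an accuracy of one over the item memory size, i.e., chance level.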

Fig. 2: The analytical and empirical accuracies of recovering sequences from HVs against the sequence length for three HDC/VSA models. The reported empirical accuracies were averaged over simulations with randomly initialized item memories.

Fig. 2 depicts the accuracy of retrieving sequence elements from its compositional HV (see Section 3.3) for three HDC/VSA models: BSC, MAP, and FHRR. The accuracies are obtained either empirically or with (14). As we can see, the capacity theory (14) perfectly predicts the expected accuracy.

The capacity theory [23] has recently been extended to also predict the accuracy of HDC/VSA models in classification tasks [76]. Bounds for the perfect retrieval of sets and sequences from their HVs were presented in [161]. Recent works [98, 149] have reported empirical studies of the capacity of HVs. Some of these results can be obtained analytically using the capacity theory. Additionally, [160, 23, 65, 37] elaborated on methods for recovering information from compositional HVs beyond the standard nearest neighbor search in the item memory, reaching a capacity of up to 1.2 bits/component [37]. The works above focused on the case where a single HV was used to store information, but, as demonstrated in [13], the decoding from HVs can be improved if redundant storage is used.

3 Data transformation to HVs

As mentioned above, the similarity of SRs and LRs is all-or-none: identical symbols are considered maximally similar, whereas different symbols are considered dissimilar. On the other hand, the similarity of structures consisting of symbols (such as sequences, graphs, etc.) is often calculated using computationally expensive procedures such as an edit distance. HVs are vector representations, and, therefore, rely on simple vector similarity measures that can be calculated component-wise and provide the resultant similarity value immediately. This also concerns similarity of compositional structures. Thus, for example, the similarity of relational structures, taking into account the similarity of their elements, their grouping, and order, can be estimated by measuring the similarity of their HVs, without the need for explicit decomposition such as following edges or matching vertices of underlying graphs. Moreover, HDC/VSA can overcome problems with SRs and LRs concerning the lack of semantic basis (i.e., the lack of immediate similarity of objects; see Section 2.1.1). HDC/VSA overcomes these problems by explicit similarity in their representations as HVs do not have to represent similarity in all-or-none fashion. These promises, however, bring the problem of designing concrete transformations that form HVs of various compositional structures such that the similarity between their HVs will reflect the similarity of the underlying objects.

In this section, we consider how data of various types are transformed to HVs to create such representations. The data types include symbols (Section 3.1.1), sets (Section 3.1.2), role-filler bindings (Section 3.1.3), numeric scalars and vectors (Section 3.2), sequences (Section 3.3), 2D images (Section 3.4), and graphs (Section 3.5).

3.1 Symbols and Sets

3.1.1 Symbols

As mentioned in Section 2.2.1, the simplest data type is a symbol. Usually, different symbols are considered dissimilar, therefore, using i.i.d. random HVs for different symbols is a standard way of transforming symbols into HVs. Such HVs correspond to SRs (Section 2.1.1) since they behave like symbols in the sense that the similarity between the HV and its copy is maximal (i.e., the same symbol) while the similarity between two i.i.d. random HVs is minimal (i.e., different symbols).

3.1.2 Sets

In LRs, sets are often represented as binary “characteristic vectors”, where each vector component corresponds to a particular symbol from the universe of all possible symbols. Symbols present in a particular set are represented by one in the corresponding vector component. In the case of multisets, the values of the components of the characteristic vector are the counters of the corresponding symbols.

In HDC/VSA, a set of symbols is usually represented by an HV formed by the superposition of the symbols’ HVs. The compositional HV preserves the similarity to the symbols’ HVs. Notice that Bloom filter [8] – a well-known data structure for approximate membership – is an HV where the superposition operation is obtained by the component-wise disjunction of binary HVs of symbols (as in SBDR) and, thus, can be seen as a special case of HDC/VSA [74].

Also note that, in principle, multiplicative bindings can be used to represent sets [101, 69]; however, the similarity properties will be completely different from those of the representation by the superposition.
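The superposition-based set representation can be sketched as follows. The sketch assumes bipolar HVs and component-wise addition as the superposition operation; `encode_set` and `membership_score` are hypothetical helper names, and the normalized dot product is only one possible membership measure.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000
symbols = list("abcdefgh")
item_memory = {s: rng.choice([-1, 1], size=D) for s in symbols}

def encode_set(members):
    """Superimpose (sum component-wise) the HVs of the set members."""
    return np.sum([item_memory[s] for s in members], axis=0)

def membership_score(set_hv, symbol):
    """Normalized dot product: close to 1 for members, close to 0 otherwise."""
    return float(set_hv @ item_memory[symbol]) / D

set_hv = encode_set(["a", "b", "c"])
scores = {s: membership_score(set_hv, s) for s in symbols}
```

Because the sum keeps multiplicities, the same function also encodes multisets: a symbol added twice yields a score close to two.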

3.1.3 Role-filler bindings

Role-filler bindings, which are also called slot-filler or key-value pairs, are a very general way of representing structured data records. For example, in LRs the role (key) is the component’s ID, whereas the filler (value) is the component’s value. In HDC/VSA, this is represented by the result of binding. Both multiplicative binding and binding by permutation can be used. With multiplicative binding, both the role and the filler are transformed to HVs, which are bound together. When using permutation, it is common that the filler is represented by its HV, while the role is associated with either a unique permutation or a number of times that some fixed permutation should be applied to the filler’s HV.

In the case of multiplicative binding, the associations do not have to be limited to two HVs and, therefore, data structures involving more associations can be represented. For example, the representation of “schemas” [106] might involve binding three HVs corresponding to {context, action, result} (note that the original proposal in [106] was based on the superposition of three role-filler bindings).

A set of role-filler bindings is represented by the superposition of their HVs. Note that the number of (role-filler) HVs superimposed is limited to preserve the information contained in them, e.g., if one wants to recover the HVs being superimposed (see Sections 2.2.5 and 2.4).
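A record built from role-filler bindings, and the recovery of a filler from it, can be sketched as below. The sketch assumes bipolar HVs with component-wise multiplication as a self-inverse multiplicative binding; the role and filler names are made up for illustration, and `query` is a hypothetical helper that unbinds a role and then performs a nearest-neighbor search among the filler HVs.

```python
import numpy as np

rng = np.random.default_rng(2)
D = 10_000

def random_hv():
    return rng.choice([-1, 1], size=D)

# Hypothetical roles and fillers, each with its own random HV.
roles = {r: random_hv() for r in ("context", "action", "result")}
fillers = {f: random_hv() for f in ("kitchen", "open_door", "door_open", "garden")}

record = {"context": "kitchen", "action": "open_door", "result": "door_open"}

# Superposition of the three role-filler bindings.
record_hv = np.sum([roles[r] * fillers[f] for r, f in record.items()], axis=0)

def query(hv, role):
    """Unbind the role, then pick the most similar filler HV."""
    noisy_filler = roles[role] * hv
    return max(fillers, key=lambda f: noisy_filler @ fillers[f])

recovered = {r: query(record_hv, r) for r in roles}
```

Unbinding returns the filler HV plus crosstalk from the other bindings, which is why the number of superimposed bindings has to stay limited for the search to remain reliable.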

3.2 Numeric scalars and vectors

In practice, numeric scalars are the data type present in many tasks, especially as components of vectors. Representing close numeric values with i.i.d. random HVs does not preserve the similarity of the original data. Therefore, when transforming numeric data to HVs, the usual requirement is that the HVs of nearby data points are similar and those of distant ones are dissimilar.

We distinguish three approaches to the transformation of numeric vectors to HVs [129]: compositional (Section 3.2.1), explicit receptive fields (Section 3.2.2), and random projections (Section 3.2.3).

3.2.1 Compositional approach

In order to represent close values of a scalar by similar HVs, correlated HVs should be generated, so that the similarity decreases with the increasing difference of scalar values.

Usually, a scalar is first normalized into some pre-specified range (e.g., $[0, 1]$). Next, it is quantized into finite grades (levels). Since HVs have a finite dimension and are random, they can reliably (with non-negligible differences in similarity) represent only a finite number of scalar grades. So only a limited number of grades are represented by correlated HVs, usually up to several dozens.

Early schemes for representing scalars by HVs were proposed independently in [127] and [152]. These and other schemes [134, 125] can be considered as implementing some kind of “flow” from one component set of an HV to another, with or without substitution. For example, “encoding by concatenation” [134] uses two random HVs: one is assigned to the minimum grade in the range, while the second one is used to represent the maximum grade. The HVs for intermediate grades are formed using some form of interpolation, such as the concatenation of the parts of the HVs proportional to the distances from the current grade to the minimum and maximum one. Similar types of interpolations were proposed in [11, 169, 75]. The scheme in [152] and the “subtractive-additive” one [134] start from a random HV and recursively flip the values of some components to obtain the HVs of the following grades. This scheme was popularized, e.g., in [141, 75], for representing values of features when applying HDC/VSA in classification tasks.
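The flip-based construction of correlated grade HVs can be sketched as follows. This is one possible variant, not necessarily the exact procedure of [152] or [134]: it assumes bipolar HVs, flips a fresh block of components at each step so that no component is flipped twice, and chooses the block size so that the minimum and maximum grades end up orthogonal. The helpers `level_hvs` and `quantize` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 10_000
n_levels = 11  # number of grades (an illustrative choice)

def level_hvs(n_levels, D, rng):
    """Start from a random HV and flip a fresh block of components per grade."""
    hv = rng.choice([-1, 1], size=D)
    order = rng.permutation(D)                 # each component is flipped at most once
    flips_per_step = D // (2 * (n_levels - 1))
    levels = [hv.copy()]
    for k in range(1, n_levels):
        idx = order[(k - 1) * flips_per_step : k * flips_per_step]
        hv[idx] *= -1
        levels.append(hv.copy())
    return np.array(levels)

def quantize(x, n_levels):
    """Map a scalar normalized to [0, 1] to a grade index."""
    return int(round(x * (n_levels - 1)))

levels = level_hvs(n_levels, D, rng)
sims = levels @ levels[0] / D  # similarity of every grade to the minimum grade
```

With these settings the similarity to the first grade falls linearly with the grade index, from one for the grade itself down to zero for the most distant grade.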

Another early scheme proposed for complex and real-valued HVs is called fractional power encoding (see Section 5.6 in [117]). It is based on the fact that complex-valued HVs (random angles on a unit circle) can be (component-wise) exponentiated to any value:

$$\mathbf{z}(x) = \mathbf{z}^{\beta x} \tag{15}$$

where $\mathbf{z}(x)$ is an HV representing scalar $x$, $\mathbf{z}$ is a random HV called the base HV [22], and $\beta$ is the bandwidth parameter controlling the width of the resultant similarity kernel. This approach requires neither normalizing scalars in a pre-specified range nor quantizing them. The fractional power encoding can be immediately applied to FHRR. It is also used for HRR by making a complex-valued HV using the fast Fourier transform, exponentiating, and then applying the inverse fast Fourier transform to return to real-valued HVs (see [22] for the details).
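A minimal sketch of the fractional power encoding for complex-valued (FHRR-style) HVs is given below. It assumes a base HV whose components are unit-magnitude phasors with phases drawn uniformly from [-π, π], and it exponentiates through the phases rather than calling a complex power function; `fpe` and `sim` are hypothetical helper names and `beta` plays the role of the bandwidth parameter.

```python
import numpy as np

rng = np.random.default_rng(4)
D = 10_000
beta = 1.0  # bandwidth parameter (an illustrative value)

# Base HV: random phases on the unit circle.
phases = rng.uniform(-np.pi, np.pi, size=D)

def fpe(x, beta=beta):
    """Component-wise exponentiation of the base HV to the power beta * x."""
    return np.exp(1j * phases * beta * x)

def sim(u, v):
    """Real part of the normalized inner product of two complex HVs."""
    return float(np.real(np.vdot(u, v))) / D

s_near = sim(fpe(2.0), fpe(2.1))  # close scalars: high similarity
s_far = sim(fpe(2.0), fpe(7.0))   # distant scalars: similarity near zero
```

No quantization is involved: any real scalar can be encoded directly, and component-wise multiplication of two encodings yields the encoding of the sum of the scalars.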

In the compositional approach to numeric vector representation by HVs [127, 156, 135], we first form HVs for scalars (i.e., for components of a numeric vector). Then, HVs corresponding to values of different components are combined (often using the superposition, but the multiplicative binding is used as well) to form the compositional HV representing the numeric vector. If scalars in different components are represented by dissimilar item memories, HVs can be superimposed as is. Sometimes it is more practical to keep a single scalar item memory, which is shared across all components (such an item memory is often called a continuous item memory since it stores correlated HVs). In this case, prior to applying the superposition operation, one has to associate the scalar’s HV with its component identity in the numeric vector. This can be done by representing role-filler bindings (see Section 3.1.3) with some form of binding.

Note that different schemes and models provide different types of similarity, for example, Manhattan distance in [135, 161]. Note also that, unlike usual vectors, where the components are orthogonal, role HVs of nearby components could be made to correlate (e.g., nearby filter banks are more correlated than banks located far away).

3.2.2 Explicit Receptive Fields

The receptive field approach is often called coarse coding since a numeric vector is coarsely represented by the receptive fields activated by the vector. There are several well-known schemes utilizing explicit receptive fields, such as Cerebellar Model Articulation Controller (CMAC) [6], Prager Codes [124], and Random Subspace Codes (RSC) [88, 79], where receptive fields are various kinds of randomly placed and sized hyperrectangles, often in only some of the input space dimensions. It is worth noting that these schemes form (sparse) binary HVs. Similar approaches, however, can produce real-valued HVs, e.g., by using Radial Basis Functions as receptive fields [5, 105].

Details and comparisons can be found in [133, 132]. In particular, a similarity function between the input numeric vectors represented by RSCs was obtained.

3.2.3 Random projections based approach

Random projection (RP) is another approach, which forms an HV $\mathbf{z}$ of a numeric vector $\mathbf{x}$ by multiplying it by an RP matrix $\mathbf{R}$:

$$\mathbf{z} = \mathbf{R}\mathbf{x} \tag{16}$$

and possibly performing some normalization, binarization and/or sparsification. It is a well-known approach, which has been extensively studied in mathematics and computer science [45, 43, 109, 165, 128]. Originally, the RP approach was used in the regime where the resultant vectors of smaller dimensionality were produced (i.e., used for dimensionality reduction). This is useful for, e.g., fast estimation of the Euclidean distance between the original high-dimensional numeric vectors (as well as of related similarity measures). Well-known applications are in similarity search [62, 43] and compressed sensing [15]. First, a random $\mathbf{R}$ with components from the normal distribution was investigated [45, 43, 165], but then the same properties were proved for RP matrices with components from $\{-1, +1\}$ (bipolar matrices) and $\{-1, 0, +1\}$ (ternary and sparse ternary matrices) [1, 94, 49].

In the context of HDC/VSA, the RP approach using a sparse ternary matrix was first applied in [50] (see also the section on random indexing in Part II of the survey [73] for details). In [99, 100] (see the section on context HVs in Part II of the survey [73]), it was proposed to binarize or ternarize (by thresholding) the result of $\mathbf{R}\mathbf{x}$, which produced sparse HVs. The analysis of similarity estimation by sparse HVs was first reported in 2006/2007 and then published in [131, 139, 140], whereby the RP with sparse ternary and binary matrices was used. In [14], the RP with sparse binary matrices was used for expanding the dimensionality of the original vector. Note that increasing the dimensionality of the original vector has been widely used previously in machine learning classification tasks as it allows linearizing nonlinear class boundaries. In [7], thresholding of the HV produced by $\mathbf{R}\mathbf{x}$ was investigated and applied in the context of fast (i.e., sublinear in the number of objects in the base) similarity search.

It is common to use a single RP matrix to form an HV, but there are some approaches that rely on the use of several RP matrices [144]: $\mathbf{z} = \sum_{i} \lambda_i \mathbf{R}_i \mathbf{x}$, where $\lambda_i$ determines the contribution of the $i$th RP to the resultant HV.

An appealing feature of the RP approach is that it can be applied to input vectors of an arbitrary dimensionality and number of component gradations, unlike the compositional or the receptive fields approaches. Some theoretical analysis of using RP for forming HVs was performed in [161].
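The RP approach with a sparse ternary matrix followed by binarization can be sketched as below. The input dimensionality, the probabilities of the matrix entries, and the use of the sign function for binarization are assumptions made for illustration only; `project` and `cos_sim` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(5)
n, D = 64, 10_000  # input and HV dimensionalities (illustrative choices)

# Sparse ternary RP matrix: entries in {-1, 0, +1}, two thirds of them zero.
R = rng.choice([-1, 0, 1], size=(D, n), p=[1 / 6, 2 / 3, 1 / 6])

def project(x):
    """Multiply by the RP matrix and binarize the result with the sign function."""
    return np.sign(R @ x)

def cos_sim(u, v):
    return float(u @ v) / float(np.linalg.norm(u) * np.linalg.norm(v))

x = rng.normal(size=n)
x_close = x + 0.1 * rng.normal(size=n)  # small perturbation of x
x_other = rng.normal(size=n)            # unrelated vector

sim_close = cos_sim(project(x), project(x_close))
sim_other = cos_sim(project(x), project(x_other))
```

Here the projection expands the dimensionality, and the similarity between the resulting HVs reflects the angle between the input vectors, so the perturbed vector stays much closer to the original than the unrelated one.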

3.3 Sequences

Below we present several standard approaches for representing sequences with HVs. These approaches include binding of elements’ HVs with their position HVs, bindings using permutations, binding with the context HVs, and using HVs for $n$-grams. For an overview of these approaches refer to, e.g., [155].

3.3.1 Binding of elements’ HVs with their position or context HVs

One of the standard ways of representing a sequence is to use a superposition of role-filler binding HVs where the filler is a symbol’s HV and the role is its position’s HV [89, 119, 35]. In [89], the order was represented by the weight of the subset of components selected from the initial HVs of sequence symbols (e.g., each preceding symbol was represented by more 1-components of the binary HV than the succeeding symbol). In [136], it was proposed to use thinning from the HVs of the previous sequence symbols (context). It is, however, most common to use i.i.d. random HVs to represent positions. An approach to avoid storing HVs for different positions is to use HDC/VSA models where the binding operation is not self-inverse. In this case, the absolute position in a sequence is represented by binding the position HV to itself a number of times determined by that position [116, 119]. Such a mechanism is called “trajectory association”. The trajectory association can be convenient when the sequence representation is formed sequentially, allowing the sequence HV to be formed in an incremental manner. However, the sequence HV does not preserve the similarity to symbols’ HVs in nearby positions, since the position HVs are not similar and so the role-filler binding HVs are dissimilar too. The MBAT model (Section 2.3.4) has similar properties, where the position binding is performed by multiplying by a random position matrix.

In order to make symbols in nearby positions similar to each other, one can use correlated position HVs [155] such that, e.g., shifted sequences will still result in a high similarity between their HVs. This approach was also proposed independently in [11, 169] (see Section 3.2). Similar ideas for 2D images are discussed in Section 3.4.2.

Also, the HV of the next symbol can be bound to the HV of the previous one, or to several (weighted) HVs of previous ones, i.e., implementing the “binding with the context” approach [136].

3.3.2 Representing positions via permutations

An alternative way of representing position in a sequence is to use permutation [83, 147, 57]. Before combining the HVs of sequence symbols, the order of each symbol is represented by applying some specific permutation $\rho$ to its HV a position-dependent number of times (e.g., $\rho^{2}(\mathbf{c})$ denotes the HV of symbol c permuted twice).

However, such a permutation-based representation does not preserve the similarity of the same symbols in nearby positions, since the permutation does not preserve the similarity between the permuted HV and the original one. To preserve the similarity of HVs when using permutations, in [84, 12], it was proposed to use partial (correlated) permutations.

3.3.3 Compositional HVs of sequences

Once symbols of a sequence are associated with their positions, the last step is to combine the sequence symbols’ HVs into a single compositional HV representing the whole sequence. There are two common ways to combine these HVs.

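The first way is to use the superposition operation, similar to the case of sets in Section 3.1.2. For example, for the sequence (a,b,c,d,e) the resultant HV (denoted as $\mathbf{s}$) is:

$$\mathbf{s} = \rho^{0}(\mathbf{a}) + \rho^{1}(\mathbf{b}) + \rho^{2}(\mathbf{c}) + \rho^{3}(\mathbf{d}) + \rho^{4}(\mathbf{e}) \tag{17}$$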

Here we exemplified the representation of positions via permutations but the composition will be the same for other approaches. The advantage of the approach with the superposition operation is that it is possible to estimate the similarity of two sequences by measuring the similarity of their HVs.

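The second way of forming the compositional HV of a sequence involves binding of the permuted HVs, e.g., the sequence above is represented as (denoted as $\mathbf{p}$, with $\circ$ denoting the binding operation):

$$\mathbf{p} = \rho^{0}(\mathbf{a}) \circ \rho^{1}(\mathbf{b}) \circ \rho^{2}(\mathbf{c}) \circ \rho^{3}(\mathbf{d}) \circ \rho^{4}(\mathbf{e}) \tag{18}$$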

The advantage of this sequence representation is that it allows forming pseudo-orthogonal HVs even for sequences that differ in only one position. Similar to the trajectory association, extending a sequence can be done by permuting the current sequence’s HV and adding or multiplying it with the next HV in the sequence, hence, incurring a fixed computational cost per symbol.
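The two ways of composing a sequence HV can be compared in a short sketch. It assumes bipolar HVs, a cyclic shift as the fixed permutation, component-wise multiplication as the binding operation, and zero-based positions; `seq_superposition`, `seq_binding`, and `decode` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(6)
D = 10_000
item_memory = {s: rng.choice([-1, 1], size=D) for s in "abcdex"}

def rho(hv, i):
    """Apply a fixed permutation (here a cyclic shift) i times."""
    return np.roll(hv, i)

def seq_superposition(seq):
    """Superimpose the permuted symbol HVs."""
    return np.sum([rho(item_memory[s], i) for i, s in enumerate(seq)], axis=0)

def seq_binding(seq):
    """Bind (multiply component-wise) the permuted symbol HVs."""
    return np.prod([rho(item_memory[s], i) for i, s in enumerate(seq)], axis=0)

def cos_sim(u, v):
    return float(u @ v) / float(np.linalg.norm(u) * np.linalg.norm(v))

def decode(seq_hv, position):
    """Undo the permutation for a position and find the nearest symbol HV."""
    candidate = rho(seq_hv, -position)
    return max(item_memory, key=lambda s: candidate @ item_memory[s])

# Two sequences that differ only in the third symbol.
sim_s = cos_sim(seq_superposition("abcde"), seq_superposition("abxde"))
sim_p = cos_sim(seq_binding("abcde"), seq_binding("abxde"))
third = decode(seq_superposition("abcde"), 2)
```

The superposition-based HVs stay similar in proportion to the shared elements, whereas the binding-based HVs of the two sequences are pseudo-orthogonal even though only one position differs.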

Note that compositional HVs of sequences can also be of a hybrid nature. For example, when representing positions via permutations the similarity of the same symbols in nearby positions is not preserved. One way to preserve similarity when some of the symbols in a sequence are permuted was presented in [70], where compositional HVs of sequences included two parts: a representation of a multiset of symbols constituting a sequence (a bag-of-symbols) and an ordered representation of symbols. The first part is transformed into an HV as a multiset of symbols (Section 3.1.2) by superimposing atomic HVs of all symbols present in a sequence. The second part is transformed into an HV as a sequence using permutations of symbols’ HVs to encode their order (Section 3.3.2) . Both representations are superimposed together to obtain the compositional HV.

3.3.4 n-grams

$n$-grams are $n$ consecutive symbols of a sequence. In the $n$-gram representation of a sequence, all $n$-grams of the sequence are extracted, sometimes for different $n$. Usually, a vector containing $n$-gram statistics is formed such that its components correspond to different $n$-grams. The value of a component is the frequency (counter) of the occurrence of the corresponding $n$-gram.

There are various transformations of $n$-grams into HVs. The possibility of representing multisets with HVs (Section 3.1.2) allows forming HVs representing $n$-gram statistics as the superposition of HVs of individual $n$-grams. The representations of $n$-grams as HVs usually differ in the following aspects:

  • They distinguish different $n$-grams, where “different” means differing even in a single symbol (e.g., (abc) vs. (abd)) or having the same symbols in different positions (e.g., (abc) vs. (acb));

  • They form similar HVs for similar $n$-grams.

To form dissimilar HVs for different $n$-grams, each $n$-gram can be assigned an i.i.d. random HV. However, an $n$-gram HV is usually generated from the HVs of its symbols. So in order to save space for the HV representations of $n$-grams, it is possible to represent them using compositional HVs [137, 47] instead of generating an atomic HV for each $n$-gram. Thus, the approaches above for representing sequences can be used. In [137, 47, 66], the permuted HVs of the $n$-gram symbols were bound by multiplicative binding. Alternatively, in [46, 48], the multiplicative binding of pairs of HVs was performed recursively, whereby the left HV corresponds to the already represented part of the $n$-gram, and the right HV corresponds to the next element of the $n$-gram. The left and right HVs are permuted by different fixed random permutations.

In contrast to the approaches above, in [145, 42] the permuted HVs of the $n$-gram symbols were not bound multiplicatively, but were superimposed, which gave similar HVs for similar $n$-grams. In [35], the superposition of HVs corresponding to symbols in their positions was also used. However, the position of a symbol in a bi-gram (possibly from non-adjacent symbols) was specified not by a permutation, but by the multiplicative binding with the HVs of the left and right positions.
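An HV of $n$-gram statistics built from bound, permuted symbol HVs can be sketched as follows. The sketch assumes bipolar HVs, cyclic shifts as permutations, component-wise multiplication as binding, and trigrams over a lowercase alphabet with a space; the example strings and the helper names `ngram_hv` and `ngram_stats_hv` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
D = 10_000
alphabet = "abcdefghijklmnopqrstuvwxyz "
item_memory = {ch: rng.choice([-1, 1], size=D) for ch in alphabet}

def ngram_hv(gram):
    """Bind the symbol HVs, each permuted according to its position in the n-gram."""
    return np.prod([np.roll(item_memory[ch], i) for i, ch in enumerate(gram)], axis=0)

def ngram_stats_hv(text, n=3):
    """Superimpose the HVs of all n-grams; repeated n-grams add up like counters."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    return np.sum([ngram_hv(g) for g in grams], axis=0)

def cos_sim(u, v):
    return float(u @ v) / float(np.linalg.norm(u) * np.linalg.norm(v))

t1 = ngram_stats_hv("the quick brown fox jumps")
t2 = ngram_stats_hv("the quick brown fox leaps")
t3 = ngram_stats_hv("zzyzx qqq vvv www kkk jjj")

sim_12 = cos_sim(t1, t2)  # most trigrams are shared
sim_13 = cos_sim(t1, t3)  # no trigrams are shared
```

No atomic HV is stored per $n$-gram: only the HVs of the individual symbols are kept, and the similarity of two texts follows the overlap of their $n$-gram statistics.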

3.3.5 Stacks

A stack can be seen as a special case of a sequence where elements are either inserted or removed in a last-in-first-out manner. At any given moment, only the top-most element of the stack can be accessed and elements written to the stack before are inaccessible until all later elements are removed. HDC/VSA-based representations of stacks were proposed in [117, 157, 171]. The HV of a stack is essentially the representation of a sequence with the superposition operation that always moves the top-most element to the beginning of the sequence.
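A stack along these lines can be sketched as a superposition-based sequence HV in which every push permutes the current content and adds the new element on top. The sketch assumes bipolar HVs, a cyclic shift as the permutation, and a nearest-neighbor search in the item memory to identify the top element; the class `HVStack` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(8)
D = 10_000
item_memory = {s: rng.choice([-1, 1], size=D) for s in "abcd"}

class HVStack:
    """Stack kept in a single HV: the top element is always unpermuted."""

    def __init__(self):
        self.hv = np.zeros(D, dtype=np.int64)

    def push(self, symbol):
        # Shift the older content one permutation deeper, then add the new top.
        self.hv = item_memory[symbol] + np.roll(self.hv, 1)

    def top(self):
        # The unpermuted element is the most similar one in the item memory.
        return max(item_memory, key=lambda s: self.hv @ item_memory[s])

    def pop(self):
        symbol = self.top()
        # Remove the top element and undo one permutation step.
        self.hv = np.roll(self.hv - item_memory[symbol], -1)
        return symbol

stack = HVStack()
for s in "abc":
    stack.push(s)
popped = [stack.pop() for _ in range(3)]
```

Popping subtracts the recognized element exactly, so after all elements are removed the stack HV returns to the zero vector.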

3.4 2D images

There are a number of proposals for representing objects with a two-dimensional structure, such as 2D images, in HDC/VSA models. In this section, we cluster the existing proposals for representations into three groups: permutation-based, role-filler binding-based, and neural network-based.

3.4.1 Permutation-based representations

As discussed in Section 2.2.3, permutations are used commonly as a way of implementing the binding operation. In the context of 2D images, permutations can be used by assigning one permutation for the $x$-axis and another one for the $y$-axis [83, 82]. The simplest way to implement these permutations is to use cyclic shifts. Then, to bind an HV of a pixel’s value (or of an image feature) with its position, the $x$-axis permutation is applied $x$ times and the $y$-axis permutation is applied $y$ times [101, 69]. The whole image is then represented as a compositional HV containing the superposition of all permuted HVs of pixels’ values. HVs representing pixels’ (features’) values can be formed using any of the approaches for representing scalars (see Section 3.2). Since permuted HVs are not similar in any of the works above, the same pixel’s value even in nearby positions will be represented by dissimilar HVs.
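A permutation-based image HV can be sketched as follows. For simplicity, the sketch uses two independent random permutations instead of cyclic shifts, a tiny binary image, and one random bipolar HV per pixel value; all of these are illustrative assumptions, and `permute`, `pixel_hv`, `image_hv`, and `read_pixel` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(9)
D = 10_000
perm_x = rng.permutation(D)  # permutation assigned to the x-axis
perm_y = rng.permutation(D)  # permutation assigned to the y-axis
value_hv = {v: rng.choice([-1, 1], size=D) for v in (0, 1)}

def permute(hv, perm, times):
    """Apply the same permutation the given number of times."""
    for _ in range(times):
        hv = hv[perm]
    return hv

def pixel_hv(value, x, y):
    """Bind a pixel value to its position: x-axis permutation x times, then y-axis y times."""
    return permute(permute(value_hv[value], perm_x, x), perm_y, y)

def image_hv(img):
    """Superimpose the permuted value HVs of all pixels."""
    h, w = img.shape
    return np.sum([pixel_hv(int(img[y, x]), x, y)
                   for y in range(h) for x in range(w)], axis=0)

def read_pixel(hv, x, y):
    """Pick the value whose position-bound HV is most similar to the image HV."""
    return max(value_hv, key=lambda v: hv @ pixel_hv(v, x, y))

img = rng.integers(0, 2, size=(4, 4))
hv = image_hv(img)
decoded = np.array([[read_pixel(hv, x, y) for x in range(4)] for y in range(4)])
```

The same value at neighboring positions yields pseudo-orthogonal HVs here, which is the lack of local similarity noted above.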

The issue of the absence of similarity in permutation-based representations was addressed in [84, 81] by using partial permutations. The number of permuted components of some pixel’s value HV (or, more generally, of some feature’s HV) increases with the coordinate change (within some radius), up to the full permutation. So, inside the radius the similarity decreases. Outside the radius, the same process is repeated again. As a result, a pixel’s value HV at (x,y) is similar to the HVs in nearby positions within the similarity radius. The features whose positions are represented are Random Local Descriptors (RLD). For instance, for binary images, the RLD feature is detected if some (varied for different features) pixels inside a local receptive field take zero and one values. Note that RLD may be considered as a version of receptive fields approach (Section 3.2.2) and can form HVs representing images, as in the case of the LInear Receptive Area (LIRA) features  [85].

3.4.2 Role-filler binding-based representations

In a 2D image, it is natural to represent a pixel’s position and its value as a role-filler binding, where the pixel’s position HV is the role and an HV corresponding to the pixel’s value is the filler. Similar to the case of permutations, the image HV is formed as a compositional HV containing the superposition of all role-filler binding HVs.

When it comes to pixels’ position HVs, the simplest approach is to treat each position as if it were a unique symbol so that it can be assigned a unique random HV. This approach allows forming image HVs through the role-filler bindings. It was investigated in [63, 72, 96] for real-valued and dense binary HVs and in [71] for sparse binary HVs. The approach with unique HVs for roles, however, neither imposes any relations between pixels’ position HVs nor does it allow preserving the similarity of pixels’ position HVs in a local neighborhood, which, e.g., is useful for robustness against small changes in the 2D image.

To address the latter issue, in [80] a method was described where the HVs of nearby $x$ and $y$ coordinates were also represented as correlated HVs, using approaches for representing scalars (Section 3.2). Then the pixel value in a position (x,y) was represented by binding three HVs: one representing a pixel’s value at (x,y) and two representing the $x$ and $y$ coordinates themselves (component-wise conjunction was used for the binding in [80]). Thus, similar values in close proximity were represented by similar HVs.

MOving MBAT (MOMBAT) [26] was proposed to explicitly preserve the similarity of HVs for nearby pixels. As in the MBAT model (Section 2.3.4), matrices were used as roles. In particular, the matrix for each coordinate (x,y) was formed as a weighted superposition of two random matrices. Thus, for both x and y, nearby coordinates were represented by similar (correlated) matrices. Then, each (x,y) pair was represented by a matrix obtained by binding (multiplying) those corresponding matrices. A value of a pixel at (x,y) was represented by binding its HV with the matrix of (x,y). Finally, the obtained HVs were superimposed. All the HVs participating in MOMBAT had real-valued components. A similar approach with local similarity preservation for BSC and dense binary HVs was considered in [10].

The Fractional Power Encoding approach (Section 3.2.1) to the representation of 2D images was employed in [168, 20, 78]. It imposes a structure between pixels’ positions and can preserve similarity in a local neighborhood. In particular, the image HV is formed as follows. Two random base HVs, $\mathbf{x}$ and $\mathbf{y}$, are assigned for the $x$-axis and $y$-axis, respectively. In order to represent $x$ and $y$ coordinates, the corresponding base HVs are exponentiated to the coordinate values, and $\mathbf{z}(x,y)$, the HV for the (x,y) coordinate, is formed as their binding (assuming $\beta = 1$, cf. (15)):

$$\mathbf{z}(x,y) = \mathbf{x}^{x} \circ \mathbf{y}^{y} \tag{19}$$

Finally, the binding with the HV of the pixel value is done. The variant of the fractional power encoding-based representation in hexagonal coordinates was presented in [77].
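The position HVs obtained with the fractional power encoding can be sketched for complex-valued (FHRR-style) HVs as below. The sketch assumes base HVs with phases drawn uniformly from [-π, π], component-wise multiplication as binding, and a bandwidth of one; `position_hv`, `pixel_hv`, and `sim` are hypothetical helper names, and the pixel value HV is just another random phasor HV chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
D = 10_000
phase_x = rng.uniform(-np.pi, np.pi, size=D)  # phases of the x-axis base HV
phase_y = rng.uniform(-np.pi, np.pi, size=D)  # phases of the y-axis base HV
value = np.exp(1j * rng.uniform(-np.pi, np.pi, size=D))  # HV of some pixel value

def position_hv(x, y):
    """Exponentiate both base HVs to the coordinates and bind the results."""
    return np.exp(1j * phase_x * x) * np.exp(1j * phase_y * y)

def pixel_hv(x, y):
    """Bind the position HV with the HV of the pixel value."""
    return position_hv(x, y) * value

def sim(u, v):
    """Real part of the normalized inner product of two complex HVs."""
    return float(np.real(np.vdot(u, v))) / D

near = sim(pixel_hv(3.0, 4.0), pixel_hv(3.2, 4.1))  # small displacement
far = sim(pixel_hv(3.0, 4.0), pixel_hv(8.0, 1.0))   # large displacement
```

With uniformly distributed phases the similarity decays as a product of sinc functions of the two displacements, so the width of the neighborhood in which similarity is preserved is governed by the bandwidth parameter.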

3.4.3 Neural network-based representations

Since neural networks are currently one of the main tools for processing 2D images, it is becoming common to use them for producing HVs of 2D images. This is especially relevant since directly transforming pixel values into HVs does not usually provide a good representation for solving machine learning tasks with competitive performance. For example, there were some studies [96, 36] directly transforming 2D images in MNIST to HVs. The reported accuracy was, in fact, lower than the one obtained with, e.g., a $k$NN classifier applied directly to pixels’ values. Therefore, it is important to either apply some feature extraction techniques (as was, e.g., the case in [85, 81]) or use neural networks as a front-end.

One of the earliest attempts to demonstrate this was presented in [173, 172]. The proposed approach takes activations of one of the hidden layers of a convolutional neural network caused by an input image. These activations are then binarized and the result of the binarization is used as an initial grid state of an elementary cellular automaton. The automaton is evolved for several steps and the results of previous steps are concatenated together, the result of which is treated as an HV corresponding to the image. The automaton evolution performs nonlinear feature extraction, in the spirit of LIRA and RLD.

Later works used activations of convolutional neural networks without extra cellular automata computations. For example, in [107, 102] pre-trained off-the-shelf convolutional neural networks were used, while in [61] a network’s attention mechanism and loss function were specifically designed to produce HVs with desirable properties. For example, it directed the output of the neural network to assign pseudo-orthogonal HVs to images from different classes [61]. It is important to note that obtaining HVs from neural networks is a rather new direction and there are no studies that would scrupulously compare HVs obtained from different neural network architectures.

3.5 Graphs

3.5.1 Undirected and directed graphs

Fig. 3: An example of an undirected and a directed graph with 5 nodes. In the case of the undirected graph, each node has two edges.

A graph, denoted as , consists of nodes and edges. Edges can either be undirected or directed. Fig. 3 presents examples of both directed and undirected graphs.

First, we consider the following simple transformation of graphs into HVs [28]. A random HV is assigned to each node of the graph; following Fig. 3, node HVs are denoted by letters (i.e., $\mathbf{a}$ for node “a” and so on). An edge is represented as the binding of HVs of the connected nodes, e.g., the edge between nodes “a” and “b” is $\mathbf{a} \circ \mathbf{b}$. The whole graph is represented simply as the superposition of HVs (denoted as $\mathbf{g}$) representing all edges in the graph, e.g., the undirected graph in Fig. 3 is:

$$\mathbf{g} = \mathbf{a} \circ \mathbf{b} + \mathbf{a} \circ \mathbf{e} + \mathbf{b} \circ \mathbf{c} + \mathbf{c} \circ \mathbf{d} + \mathbf{d} \circ \mathbf{e} \tag{20}$$

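To represent directed graphs, the directions of the edges should be included into their HVs. This could be done, e.g., by applying a permutation, so that the directed edge from node “a” to “b” in Fig. 3 is represented as $\mathbf{a} \circ \rho(\mathbf{b})$. Thus, the directed graph in Fig. 3 is represented by the following HV:

$$\mathbf{g} = \mathbf{a} \circ \rho(\mathbf{b}) + \mathbf{a} \circ \rho(\mathbf{e}) + \mathbf{c} \circ \rho(\mathbf{b}) + \mathbf{d} \circ \rho(\mathbf{c}) + \mathbf{e} \circ \rho(\mathbf{d}) \tag{21}$$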

The edges of a graph can be recovered from the described representations (Section 2.2.5). For graphs that have the same node HVs, the dot product of their HVs is a measure of the number of overlapping edges. The described graph representations do not represent isolated vertices, but this could be fixed; see [67].
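The edge-based graph transformation can be sketched as follows. The sketch assumes bipolar node HVs, component-wise multiplication as binding, and a cyclic shift as the permutation marking the target node of a directed edge; the edge lists are illustrative and are not meant to reproduce Fig. 3, and `undirected_graph_hv`, `directed_graph_hv`, and `has_directed_edge` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(11)
D = 10_000
nodes = {name: rng.choice([-1, 1], size=D) for name in "abcde"}

def rho(hv):
    """Fixed permutation (a cyclic shift) applied to the target node of a directed edge."""
    return np.roll(hv, 1)

def undirected_graph_hv(edges):
    """Superimpose the bindings of the node HVs of every edge."""
    return np.sum([nodes[u] * nodes[v] for u, v in edges], axis=0)

def directed_graph_hv(edges):
    """As above, but the target node HV is permuted before binding."""
    return np.sum([nodes[u] * rho(nodes[v]) for u, v in edges], axis=0)

def has_directed_edge(graph_hv, u, v):
    """Threshold the normalized dot product with the HV of the queried edge."""
    return float(graph_hv @ (nodes[u] * rho(nodes[v]))) / D > 0.5

g1 = undirected_graph_hv([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")])
g2 = undirected_graph_hv([("a", "b"), ("b", "c"), ("c", "d"), ("a", "c"), ("b", "e")])
overlap = float(g1 @ g2) / D  # estimates the number of shared edges

dg = directed_graph_hv([("a", "b"), ("c", "b")])
```

The two undirected graphs share three edges, so `overlap` is close to three, and the directed representation distinguishes an edge from its reverse.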

3.5.2 Labelled directed ordered graphs

When modeling, e.g., analogical reasoning (see the section on analogical reasoning in Part II of the survey [73]) or knowledge graphs, it is common to use graphs where both edges and nodes have associated labels, and edges are directed. Let us consider the representation of a graph shown in Fig. 4, which can also be written in the bracketed notation as: cause(bite(Spot,Jane),flee(Jane,Spot)).

Using the role-filler binding approach (see Section 3.1.3), a relation, e.g., bite(Spot,Jane), is represented by HVs (in HRR or BSC) as follows [122]:

(22)

Here , , and are atomic HVs, while and are the compositional ones. In the same way, is formed, and finally the HV of the whole graph:

(23)

This particular representation was chosen in order to preserve the similarity of the resultant HV to the HVs of its various elements and influence the similarity of the graph HVs, because of the binding operation properties in HRR and BSC. In SBDR, since the binding preserves unstructured similarity, the formation of the resultant HV of the graph is more compact [138]:

(24)
(25)
(26)

Also, Predicate-Arguments relation representation using random permutations to represent the order of relation arguments was proposed in [130, 138, 59]:

(27)
(28)
(29)

Note that the last six equations rely on the CDT procedure. These types of graph representations by HVs are further discussed in the section on analogical retrieval in Part II of the survey [73], which describes the application of HDC/VSA to analogical reasoning.

Fig. 4: An example of a labelled graph representing the episode – cause(bite(Spot,Jane),flee(Jane,Spot)).

Representations of knowledge graphs were further studied in [95]. The work proposed to use a Cauchy distribution to generate atomic HVs and demonstrated state-of-the-art results on the task of inferring missing links using HVs of knowledge graphs as an input to a neural network.

HVs of knowledge graphs can also be constructed using HVs of nodes and relations that are learned from data as proposed in [108], where HRR was used due to its differentiability.

3.5.3 Trees

Trees are an instance of graphs where a node at the lower level (child) belongs to a single higher-level node (parent). Therefore, trees can be represented in the same manner as the graphs above. Please refer to the examples of the transformations to HVs given, e.g., in [130] for unordered binary trees and in [21, 67] for ordered binary trees.

4 Discussion

In this section, we only discuss some of the HDC/VSA aspects which were covered in this first part of the survey. Please refer to Part II [73] for an extensive discussion of the aspects related to application areas, interplay with neural networks, and open issues.

4.1 Connections between HDC/VSA models

As presented in Section 2.3, there are several HDC/VSA models. Currently, there are many unanswered questions about the connections between models. For example, some models use interactions between multiple components of HVs when performing the multiplicative binding operation (e.g., circular convolution or the Context-Dependent Thinning procedure). Other models (e.g., Fourier Holographic Reduced Representations and Binary Spatter Codes) use multiplicative binding operations that are component-wise. It is not always clear in which situations one is better than the other. Therefore, it is important to develop a more principled understanding of the relations between the multiplicative binding operations and of the scope of their applicability. For example, a recent theoretical result in this direction [24] is that under certain conditions the multiplicative binding in Multiply-Add-Permute is mathematically equivalent to the multiplicative binding in Tensor Product Representations.

Moreover, in general it is not clear whether there is one best model, which will dominate all others, or whether each model has its own applicability scope. A recent work [149] started to perform a systematic experimental comparison between the models. We believe that this line of work should be continued. Finally, there is the question of whether there is a need for searching new HDC/VSA models.

4.2 Theoretical foundations of HDC/VSA

It should be noted that HDC/VSA have largely started as an empirical area. At the same time, HDC/VSA implicitly used well-known mathematical phenomena such as concentration of measure [24] and random projection [161].

Currently, there is a demand for laying out solid theoretical principles for HDC/VSA. We are starting to see promising results in this direction. For example, there is a recent “capacity theory” [23] (Section 2.4), which provides a way of estimating the amount of information that could be reconstructed from HVs using the nearest neighbor search in the item memory. Other examples are the upper bound guarantees for the reconstruction of data structures from HVs [161]. In [24], manipulations of HVs in HDC/VSA were shown to be connected to compressed sensing.

We foresee that, as HDC/VSA will be exposed more to theoretical computer scientists, we will see more works building the theoretical foundations of the area.

4.3 Implementation of the HDC/VSA models in hardware

A large part of the promise of HDC/VSA is based on the fact that they are suitable for implementations on a variety of unconventional hardware such as neuromorphic hardware [25], in-memory computing [60], monolithic 3D integration hardware [93, 170], etc. It is interesting that the motivation to use HDC/VSA on specialized hardware was present from the very beginning. Early examples are specialized neurocomputers [87] built to operate with Sparse Binary Distributed Representations. Moreover, the topic of hardware implementations has received a lot of attention recently. This is mostly due to the fact that modern machine learning algorithms such as deep neural networks require massive resources to train and run them [159]. In our opinion, an additional motivation comes from the fact that HDC/VSA can be seen as an abstraction algorithmic layer and can, thus, be used for designing computational primitives, which can then be mapped to various hardware platforms [67]. Nevertheless, hardware for HDC/VSA is an active research area and a topic of its own; therefore, we have decided to leave it outside the scope of this survey. However, we expect that in the coming years this topic is going to play an important role in the development of the community.

5 Conclusion

In this Part I of the survey, we provided comprehensive coverage of the computing framework known under the names Hyperdimensional Computing and Vector Symbolic Architectures. We paid particular attention to existing Hyperdimensional Computing/Vector Symbolic Architectures models and the transformations of input data of various types into hypervector representations.

Part II of the survey [73] reviews known applications, touches upon cognitive modeling and cognitive architectures, and discusses the open problems along with the most promising directions for future work.

References

  • [1] D. Achlioptas (2003) Database-friendly Random Projections: Johnson-Lindenstrauss with Binary Coins. Journal of Computer and System Sciences 66 (4), pp. 671–687. Cited by: §3.2.3.
  • [2] D. Aerts, M. Czachor, and B. D. Moor (2006) On Geometric Algebra Representation of Binary Spatter Codes. arXiv:cs/0610075, pp. 1–5. Cited by: §2.3.11.
  • [3] D. Aerts, M. Czachor, and B. D. Moor (2009) Geometric Analogue of Holographic Reduced Representation. Journal of Mathematical Psychology 53, pp. 389–398. Cited by: §2.3.11, §2.3.11, TABLE II.
  • [4] D. Aerts and M. Czachor (2008) Tensor-product versus Geometric-product Coding. Physical Review A 77 (012316), pp. 1–7. Cited by: §2.3.11.
  • [5] M. A. Aiserman, E. M. Braverman, and L. I. Rozonoer (1964) Theoretical Foundations of the Potential Function Method in Pattern Recognition. Avtomatika i Telemekhanika 25 (6), pp. 917–936. Cited by: §3.2.2.
  • [6] J. S. Albus (1975) Data Storage in the Cerebellar Model Articulation Controller (CMAC). Journal of Dynamic Systems, Measurement and Control 97 (3), pp. 228–233. Cited by: §3.2.2.
  • [7] A. Becker, L. Ducas, N. Gama, and T. Laarhoven (2016) New Directions in Nearest Neighbor Searching with Applications to Lattice Sieving. In Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 10–24. Cited by: §3.2.3.
  • [8] B. H. Bloom (1970) Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13 (7), pp. 422–426. Cited by: §3.1.2.
  • [9] P. Blouw, E. Solodkin, P. Thagard, and C. Eliasmith (2016) Concepts as Semantic Pointers: A Framework and Computational Model. Cognitive Science 40 (5), pp. 1128–1162. Cited by: footnote 1.
  • [10] A. Burco (2018) Exploring Neural-symbolic Integration Architectures for Computer Vision. Master’s Thesis, ETH Zurich. Cited by: §3.4.2.
  • [11] T. Cohen, D. Widdows, M. Wahle, and R. W. Schvaneveldt (2013) Orthogonality and Orthography: Introducing Measured Distance into Semantic Space. In International Symposium on Quantum Interaction (QI), Lecture Notes in Computer Science, Vol. 8369, pp. 34–46. Cited by: §3.2.1, §3.3.1.
  • [12] T. Cohen and D. Widdows (2018) Bringing Order to Neural Word Embeddings with Embeddings Augmented by Random Permutations (EARP). In Conference on Computational Natural Language Learning (CoNLL), pp. 465–475. Cited by: §2.2.3, §3.3.2.
  • [13] I. Danihelka, G. Wayne, B. Uria, N. Kalchbrenner, and A. Graves (2016) Associative Long Short-term Memory. In International Conference on Machine Learning (ICML), pp. 1986–1994. Cited by: §2.4.
  • [14] S. Dasgupta, C. F. Stevens, and S. Navlakha (2017) A Neural Algorithm for a Fundamental Computing Problem. Science 358 (6364), pp. 793–796. Cited by: §3.2.3.
  • [15] D. L. Donoho (2006) Compressed Sensing. IEEE Transactions on Information Theory 52 (4), pp. 1289–1306. Cited by: §3.2.3.
  • [16] M. Eggimann, A. Rahimi, and L. Benini (2021) A 5 μW Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing. IEEE Transactions on Circuits and Systems I: Regular Papers 68 (10), pp. 4116–4128. Cited by: §2.2.5.
  • [17] C. Eliasmith, T. C. Stewart, X. Choo, T. Bekolay, T. DeWolf, Y. Tang, and D. Rasmussen (2012) A Large-scale Model of the Functioning Brain. Science 338 (6111), pp. 1202–1205. Cited by: §1.
  • [18] C. Eliasmith (2013) How to Build a Brain: A Neural Architecture for Biological Cognition. Oxford University Press. Cited by: §2.3.3.
  • [19] J. A. Fodor and Z. W. Pylyshyn (1988) Connectionism and Cognitive Architecture: A Critical Analysis. Cognition 28 (1-2), pp. 3–71. Cited by: §2.1.2, §2.1.2.
  • [20] E. P. Frady, S. J. Kent, P. Kanerva, B. A. Olshausen, and F. T. Sommer (2018) Cognitive Neural Systems for Disentangling Compositions. In Cognitive Computing, pp. 1–3. Cited by: §3.4.2.
  • [21] E. P. Frady, S. J. Kent, B. A. Olshausen, and F. T. Sommer (2020) Resonator Networks, 1: An Efficient Solution for Factoring High-Dimensional, Distributed Representations of Data Structures. Neural Computation 32 (12), pp. 2311–2331. Cited by: §2.2.5, §3.5.3.
  • [22] E. P. Frady, D. Kleyko, C. J. Kymn, B. A. Olshausen, and F. T. Sommer (2021) Computing on Functions Using Randomized Vector Representations. arXiv:2109.03429, pp. 1–33. Cited by: §3.2.1.
  • [23] E. P. Frady, D. Kleyko, and F. T. Sommer (2018) A Theory of Sequence Indexing and Working Memory in Recurrent Neural Networks. Neural Computation 30, pp. 1449–1513. Cited by: §2.4, §2.4, §4.2.
  • [24] E. P. Frady, D. Kleyko, and F. T. Sommer (2021) Variable Binding for Sparse Distributed Representations: Theory and Applications. IEEE Transactions on Neural Networks and Learning Systems 99 (PP), pp. 1–14. Cited by: §2.3.9, §4.1, §4.2, §4.2.
  • [25] E. P. Frady and F. T. Sommer (2019) Robust Computation with Rhythmic Spike Patterns. Proceedings of the National Academy of Sciences 116 (36), pp. 18050–18059. Cited by: §4.3.
  • [26] S. I. Gallant and P. Culliton (2016) Positional Binding with Distributed Representations. In International Conference on Image, Vision and Computing (ICIVC), pp. 108–113. Cited by: §3.4.2.
  • [27] S. I. Gallant and T. W. Okaywe (2013) Representing Objects, Relations, and Sequences. Neural Computation 25 (8), pp. 2038–2078. Cited by: §2.2.3, §2.3.4, §2.4, TABLE II.
  • [28] R. W. Gayler and S. D. Levy (2009) A Distributed Basis for Analogical Mapping. In New Frontiers in Analogy Research, Second International Conference on the Analogy (ANALOGY), pp. 165–174. Cited by: §3.5.1.
  • [29] R. W. Gayler (1998) Multiplicative Binding, Representation Operators & Analogy. In Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences, pp. 1–4. Cited by: §2.3.7, TABLE II.
  • [30] R. W. Gayler (2003) Vector Symbolic Architectures Answer Jackendoff’s Challenges for Cognitive Neuroscience. In Joint International Conference on Cognitive Science (ICCS/ASCS), pp. 133–138. Cited by: §1, §2.1.2.
  • [31] L. Ge and K. K. Parhi (2020) Classification using Hyperdimensional Computing: A Review. IEEE Circuits and Systems Magazine 20 (2), pp. 30–47. Cited by: TABLE I.
  • [32] A. N. Gorban and I. Y. Tyukin (2018) Blessing of Dimensionality: Mathematical Foundations of the Statistical Physics of Data. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 376 (2118), pp. 1–18. Cited by: §2.2.1.
  • [33] J. Gosmann and C. Eliasmith (2019) Vector-Derived Transformation Binding: An Improved Binding Operation for Deep Symbol-Like Processing in Neural Networks. Neural Computation 31 (5), pp. 849–869. Cited by: §2.2.3, §2.3.3.
  • [34] V. I. Gritsenko, D. A. Rachkovskij, A. A. Frolov, R. W. Gayler, D. Kleyko, and E. Osipov (2017) Neural distributed autoassociative memories: a survey. Cybernetics and Computer Engineering 2 (188), pp. 5–35. Cited by: §2.3.9.
  • [35] T. Hannagan, E. Dupoux, and A. Christophe (2011) Holographic String Encoding. Cognitive Science 35 (1), pp. 79–118. Cited by: §2.3.6, §3.3.1, §3.3.4.
  • [36] E. Hassan, Y. Halawani, B. Mohammad, and H. Saleh (2021) Hyper-Dimensional Computing Challenges and Opportunities for AI Applications. IEEE Access, pp. 1–15. Cited by: TABLE I, §3.4.3.
  • [37] M. Hersche, S. Lippuner, M. Korb, L. Benini, and A. Rahimi (2021) Near-channel Classifier: Symbiotic Communication and Classification in High-dimensional Space. Brain Informatics 8, pp. 1–15. Cited by: §2.4.
  • [38] G. E. Hinton, J. L. McClelland, and D. E. Rumelhart (1986) Distributed Representations. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition: Foundations, pp. 77–109. Cited by: §2.1.1, §2.1.2.
  • [39] G. E. Hinton (1981) Implementing Semantic Networks in Parallel Hardware. In Parallel Models of Association Memory, pp. 191–217. Cited by: §2.1.3.
  • [40] G. E. Hinton (1990) Mapping Part-whole Hierarchies into Connectionist Networks. Artificial Intelligence 46 (1-2), pp. 47–75. Cited by: §2.3.3, §2.3.8.
  • [41] J. E. Hummel and K. J. Holyoak (1997) Distributed Representations of Structure: A Theory of Analogical Access and Mapping. Psychological Review 104 (3), pp. 427–466. Cited by: §2.1.3.
  • [42] M. Imani, T. Nassar, A. Rahimi, and T. Rosing (2018) HDNA: Energy-Efficient DNA Sequencing Using Hyperdimensional Computing. In IEEE International Conference on Biomedical and Health Informatics (BHI), pp. 271–274. Cited by: §3.3.4.
  • [43] P. Indyk and R. Motwani (1998) Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. In Annual ACM Symposium on Theory of Computing (STOC), pp. 604–613. Cited by: §3.2.3.
  • [44] R. Jackendoff (2002) Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford University Press. Cited by: §2.1.2, §2.1.2.
  • [45] W. B. Johnson and J. Lindenstrauss (1984) Extensions of Lipschitz Mapping into Hilbert Space. Contemporary Mathematics 26, pp. 189–206. Cited by: §3.2.3.
  • [46] M. N. Jones and D. J. K. Mewhort (2007) Representing Word Meaning and Order Information in a Composite Holographic Lexicon. Psychological Review 114 (1), pp. 1–37. Cited by: §2.2.3, §3.3.4.
  • [47] A. Joshi, J. T. Halseth, and P. Kanerva (2016) Language Geometry Using Random Indexing. In International Symposium on Quantum Interaction (QI), pp. 265–274. Cited by: §2.2.3, §3.3.4.
  • [48] G. Kachergis, G. E. Cox, and M. N. Jones (2011) OrBEAGLE: Integrating Orthography Into a Holographic Model of the Lexicon. In International Conference on Artificial Neural Networks (ICANN), pp. 307–314. Cited by: §3.3.4.
  • [49] D. M. Kane and J. Nelson (2014) Sparser Johnson-Lindenstrauss Transforms. Journal of the ACM 61 (1), pp. 1–23. Cited by: §3.2.3.
  • [50] P. Kanerva, J. Kristoferson, and A. Holst (2000) Random Indexing of Text Samples for Latent Semantic Analysis. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 1036. Cited by: §3.2.3.
  • [51] P. Kanerva (1988) Sparse Distributed Memory. The MIT Press. Cited by: §2.1.
  • [52] P. Kanerva (1994) The Spatter Code for Encoding Concepts at Many Levels. In International Conference on Artificial Neural Networks (ICANN), pp. 226–229. Cited by: §2.3.6.
  • [53] P. Kanerva (1995) A Family of Binary Spatter Codes. In International Conference on Artificial Neural Networks (ICANN), pp. 517–522. Cited by: §2.3.6.
  • [54] P. Kanerva (1996) Binary Spatter-Coding of Ordered K-tuples. In International Conference on Artificial Neural Networks (ICANN), Lecture Notes in Computer Science, Vol. 1112, pp. 869–873. Cited by: §2.3.6.
  • [55] P. Kanerva (1997) Fully Distributed Representation. In Real World Computing Symposium (RWC), pp. 358–365. Cited by: §2.3.6, TABLE II.
  • [56] P. Kanerva (1998) Dual Role of Analogy in the Design of a Cognitive Computer. In Advances in Analogy Research: Integration of Theory and Data from the Cognitive, Computational, and Neural Sciences, pp. 164–170. Cited by: footnote 4.
  • [57] P. Kanerva (2009) Hyperdimensional Computing: An Introduction to Computing in Distributed Representation with High-Dimensional Random Vectors. Cognitive Computation 1 (2), pp. 139–159. Cited by: TABLE I, §1, §1, §2.2.1, §2.2.3, §3.3.2.
  • [58] P. Kanerva (2014) Computing with 10,000-bit Words. In Annual Allerton Conference on Communication, Control, and Computing, pp. 1–7. Cited by: §1.
  • [59] P. Kanerva (2019) Computing with High-Dimensional Vectors. IEEE Design & Test 36 (3), pp. 7–14. Cited by: §1, §3.5.2.
  • [60] G. Karunaratne, M. L. Gallo, G. Cherubini, L. Benini, A. Rahimi, and A. Sebastian (2020) In-Memory Hyperdimensional Computing. Nature Electronics 3 (6), pp. 327–337. Cited by: §4.3.
  • [61] G. Karunaratne, M. Schmuck, M. L. Gallo, G. Cherubini, L. Benini, A. Sebastian, and A. Rahimi (2021) Robust High-dimensional Memory-augmented Neural Networks. Nature Communications 12 (1), pp. 1–12. Cited by: §3.4.3.
  • [62] S. Kaski (1998) Dimensionality Reduction by Random Mapping: Fast Similarity Computation for Clustering. In International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 413–418. Cited by: §3.2.3.
  • [63] M. A. Kelly, D. Blostein, and D. J. K. Mewhort (2013) Encoding Structure in Holographic Reduced Representations. Canadian Journal of Experimental Psychology 67 (2), pp. 79–93. Cited by: §3.4.2.
  • [64] S. J. Kent, E. P. Frady, F. T. Sommer, and B. A. Olshausen (2020) Resonator Networks, 2: Factorization Performance and Capacity Compared to Optimization-Based Methods. Neural Computation 32 (12), pp. 2332–2388. Cited by: §2.2.5.
  • [65] H.-S. Kim (2018) HDM: Hyper-Dimensional Modulation for Robust Low-Power Communications. In IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §2.4.
  • [66] Y. Kim, M. Imani, N. Moshiri, and T. Rosing (2020) GenieHD: Efficient DNA Pattern Matching Accelerator Using Hyperdimensional Computing. In Design, Automation Test in Europe Conference Exhibition (DATE), pp. 115–120. Cited by: §3.3.4.
  • [67] D. Kleyko, M. Davies, E. P. Frady, P. Kanerva, S. J. Kent, B. A. Olshausen, E. Osipov, J. M. Rabaey, D. A. Rachkovskij, A. Rahimi, and F. T. Sommer (2021) Vector Symbolic Architectures as a Computing Framework for Nanoscale Hardware. arXiv:2106.05268, pp. 1–28. Cited by: TABLE I, §1, §2.3.1, §3.5.1, §3.5.3, §4.3.
  • [68] D. Kleyko, E. P. Frady, and F. T. Sommer (2021) Cellular Automata Can Reduce Memory Requirements of Collective-State Computing. IEEE Transactions on Neural Networks and Learning Systems 99 (PP), pp. 1–13. Cited by: §2.2.5.
  • [69] D. Kleyko, R. W. Gayler, and E. Osipov (2020) Commentaries on “Learning Sensorimotor Control with Neuromorphic Sensors: Toward Hyperdimensional Active Perception” [Science Robotics Vol. 4 Issue 30 (2019) 1-10]. arXiv:2003.11458, pp. 1–10. Cited by: §3.1.2, §3.4.1.
  • [70] D. Kleyko, E. Osipov, and R. W. Gayler (2016) Recognizing Permuted Words with Vector Symbolic Architectures: A Cambridge Test for Machines. Procedia Computer Science 88, pp. 169–175. Cited by: §3.3.3.
  • [71] D. Kleyko, E. Osipov, and D. A. Rachkovskij (2016) Modification of Holographic Graph Neuron using Sparse Distributed Representations. Procedia Computer Science 88, pp. 39–45. Cited by: §3.4.2.
  • [72] D. Kleyko, E. Osipov, A. Senior, A. I. Khan, and Y. A. Sekercioglu (2017) Holographic Graph Neuron: A Bioinspired Architecture for Pattern Processing. IEEE Transactions on Neural Networks and Learning Systems 28 (6), pp. 1250–1262. Cited by: §2.4, §3.4.2.
  • [73] D. Kleyko, D. A. Rachkovskij, E. Osipov, and A. Rahimi (2021) A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part II: Applications, Cognitive Models, and Challenges. arXiv, pp. 1–36. Cited by: A Survey on Hyperdimensional Computing aka Vector Symbolic Architectures, Part I: Models and Data Transformations, §1, §2.3.8, §3.2.3, §3.5.2, §3.5.2, §4, §5.
  • [74] D. Kleyko, A. Rahimi, R. W. Gayler, and E. Osipov (2020) Autoscaling Bloom Filter: Controlling Trade-off Between True and False Positives. Neural Computing and Applications 32, pp. 3675–3684. Cited by: §3.1.2.
  • [75] D. Kleyko, A. Rahimi, D. A. Rachkovskij, E. Osipov, and J. M. Rabaey (2018) Classification and Recall with Binary Hyperdimensional Computing: Tradeoffs in Choice of Density and Mapping Characteristic. IEEE Transactions on Neural Networks and Learning Systems 29 (12), pp. 5880–5898. Cited by: §2.4, §3.2.1.
  • [76] D. Kleyko, A. Rosato, E. P. Frady, M. Panella, and F. T. Sommer (2020) Perceptron Theory for Predicting the Accuracy of Neural Networks. arXiv:2012.07881, pp. 1–12. Cited by: §2.4.
  • [77] B. Komer and C. Eliasmith (2020) Efficient Navigation using a Scalable, Biologically Inspired Spatial Representation. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 1532–1538. Cited by: §3.4.2.
  • [78] B. Komer, T. C. Stewart, A. R. Voelker, and C. Eliasmith (2019) A Neural Representation of Continuous Space using Fractional Binding. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 2038–2043. Cited by: §3.4.2.
  • [79] E. M. Kussul, T. N. Baidyk, V. V. Lukovich, and D. A. Rachkovskij (1994) Adaptive High Performance Classifier Based on Random Threshold Neurons. In European Meeting on Cybernetics and Systems (EMCSR), pp. 1687–1694. Cited by: §3.2.2.
  • [80] E. M. Kussul, T. N. Baidyk, and D. A. Rachkovskij (1992) Neural Network for Recognition of Small Images. In First All-Ukrainian conference (UkrOBRAZ), pp. 151–153. Cited by: §3.4.2.
  • [81] E. M. Kussul, T. N. Baidyk, D. C. Wunsch, O. Makeyev, and A. Martin (2006) Permutation Coding Technique for Image Recognition System. IEEE Transactions on Neural Networks 17 (6), pp. 1566–1579. Cited by: §3.4.1, §3.4.3.
  • [82] E. M. Kussul, T. N. Baidyk, and D. C. Wunsch (2010) Neural Networks and Micromechanics. Springer. Cited by: §3.4.1.
  • [83] E. M. Kussul and T. N. Baidyk (1993) On Information Encoding in Associative-Projective Neural Networks. Technical report Report 93-3, V. M. Glushkov Institute of Cybernetics (in Russian). Cited by: §2.2.3, §3.3.2, §3.4.1.
  • [84] E. M. Kussul and T. N. Baidyk (2003) Permutative Coding Technique for Handwritten Digit Recognition System. In International Joint Conference on Neural Networks (IJCNN), pp. 2163–2168. Cited by: §2.2.3, §3.3.2, §3.4.1.
  • [85] E. M. Kussul and T. N. Baidyk (2004) Improved Method of Handwritten Digit Recognition Tested on MNIST Database. Image and Vision Computing 22 (12), pp. 971–981. Cited by: §3.4.1, §3.4.3.
  • [86] E. M. Kussul, D. A. Rachkovskij, and T. N. Baidyk (1991) Associative-Projective Neural Networks: Architecture, Implementation, Applications. In International Conference on Neural Networks and Their Applications (NEURO), pp. 463–476. Cited by: §2.1, §2.3.8, §2.3.8.
  • [87] E. M. Kussul, D. A. Rachkovskij, and T. N. Baidyk (1991) On Image Texture Recognition by Associative-Projective Neurocomputer. In Intelligent Engineering Systems through Artificial Neural Networks (ANNIE), pp. 453–458. Cited by: §2.3.8, §4.3.
  • [88] E. M. Kussul, D. A. Rachkovskij, and D. C. Wunsch (1999) The Random Subspace Coarse Coding Scheme for Real-valued Vectors. In International Joint Conference on Neural Networks (IJCNN), Vol. 1, pp. 450–455. Cited by: §3.2.2.
  • [89] E. M. Kussul and D. A. Rachkovskij (1991) Multilevel Assembly Neural Architecture and Processing of Sequences. In Neurocomputers and Attention: Connectionism and Neurocomputers, Vol. 2, pp. 577–590. Cited by: §2.3.8, §3.3.1.
  • [90] M. Laiho, J. H. Poikonen, P. Kanerva, and E. Lehtonen (2015) High-Dimensional Computing with Sparse Vectors. In IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 1–4. Cited by: §2.3.9, TABLE II.
  • [91] M. Ledoux (2001) The Concentration of Measure Phenomenon. American Mathematical Society. Cited by: §2.2.1.
  • [92] S. D. Levy and R. W. Gayler (2008) Vector Symbolic Architectures: A New Building Material for Artificial General Intelligence. In Artificial General Intelligence (AGI), pp. 414–418. Cited by: §1.
  • [93] H. Li, T. F. Wu, A. Rahimi, K.-S. Li, M. Rusch, C.-H. Lin, J.-L. Hsu, M. M. Sabry, S. B. Eryilmaz, J. Sohn, W.-C. Chiu, M.-C. Chen, T.-T. Wu, J.-M. Shieh, W.-K. Yeh, J. M. Rabaey, S. Mitra, and H.-S. P. Wong (2016) Hyperdimensional Computing with 3D VRRAM In-Memory Kernels: Device-Architecture Co-Design for Energy-Efficient, Error-Resilient Language Recognition. In IEEE International Electron Devices Meeting (IEDM), pp. 1–4. Cited by: §4.3.
  • [94] P. Li, T. J. Hastie, and K. W. Church (2006) Very sparse random projections. In ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), pp. 287–296. Cited by: §3.2.3.
  • [95] Y. Ma, M. Hildebrandt, V. Tresp, and S. Baier (2018) Holistic Representations for Memorization and Inference. In Conference on Uncertainty in Artificial Intelligence (UAI), pp. 1–11. Cited by: §3.5.2.
  • [96] A. X. Manabat, C. R. Marcelo, A. L. Quinquito, and A. Alvarez (2019) Performance Analysis of Hyperdimensional Computing for Character Recognition. In International Symposium on Multimedia and Communication Technology (ISMAC), pp. 1–5. Cited by: §3.4.2, §3.4.3.
  • [97] P. M. Milner (1974) A model for visual shape recognition. Psychological Review 81 (6), pp. 521–535. Cited by: §2.1.3.
  • [98] F. Mirus, T. C. Stewart, and J. Conradt (2020) Analyzing the Capacity of Distributed Vector Representations to Encode Spatial Information. In International Joint Conference on Neural Networks (IJCNN), pp. 1–7. Cited by: §2.4.
  • [99] I. S. Misuno, D. A. Rachkovskij, S. V. Slipchenko, and A. M. Sokolov (2005) Searching for Text Information with the Help of Vector Representations. Problems of Programming. (In Russian) 4, pp. 50–59. Cited by: §3.2.3.
  • [100] I. S. Misuno, D. A. Rachkovskij, and S. V. Slipchenko (2005) Vector and Distributed Representations Reflecting Semantic Relatedness of Words. Mathematical Machines and Systems. (In Russian) 3, pp. 50–66. Cited by: §3.2.3.
  • [101] A. Mitrokhin, P. Sutor, C. Fermuller, and Y. Aloimonos (2019) Learning Sensorimotor Control with Neuromorphic Sensors: Toward Hyperdimensional Active Perception. Science Robotics 4 (30), pp. 1–10. Cited by: §3.1.2, §3.4.1.
  • [102] A. Mitrokhin, P. Sutor, D. Summers-Stay, C. Fermuller, and Y. Aloimonos (2020) Symbolic Representation and Learning with Hyperdimensional Computing. Frontiers in Robotics and AI, pp. 1–11. Cited by: §3.4.3.
  • [103] E. Mizraji (1989) Context-Dependent Associations in Linear Distributed Memories. Bulletin of Mathematical Biology 51, pp. 195–205. Cited by: §2.1.3, §2.1, §2.3.2.
  • [104] E. Mizraji (1992) Vector Logics: The Matrix-Vector Representation of Logical Calculus. Fuzzy Sets and Systems 50 (2), pp. 179–185. Cited by: §2.3.2.
  • [105] J. Moody and C. J. Darken (1989) Fast Learning in Networks of Locally-tuned Processing Units. Neural Computation 1 (2), pp. 281–294. Cited by: §3.2.2.
  • [106] P. Neubert and P. Protzel (2018) Towards Hypervector Representations for Learning and Planning with Schemas. In Joint German/Austrian Conference on Artificial Intelligence (Künstliche Intelligenz), Lecture Notes in Computer Science, Vol. 11117, pp. 182–189. Cited by: §3.1.3, footnote 7.
  • [107] P. Neubert, S. Schubert, and P. Protzel (2019) An Introduction to Hyperdimensional Computing for Robotics. KI - Künstliche Intelligenz 33 (4), pp. 319–330. Cited by: TABLE I, §1, §3.4.3.
  • [108] M. Nickel, L. Rosasco, and T. Poggio (2016) Holographic Embeddings of Knowledge Graphs. In AAAI Conference on Artificial Intelligence, pp. 1955–1961. Cited by: §3.5.2.
  • [109] C. H. Papadimitriou, P. Raghavan, H. Tamaki, and S. Vempala (2000) Latent Semantic Indexing: A Probabilistic Analysis. Journal of Computer and System Sciences 61 (2), pp. 217–235. Cited by: §3.2.3.
  • [110] A. Patyk-Lonska, M. Czachor, and D. Aerts (2011) A Comparison of Geometric Analogues of Holographic Reduced Representations, Original Holographic Reduced Representations and Binary Spatter Codes. In Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 221–228. Cited by: §2.3.11.
  • [111] A. Patyk-Lonska, M. Czachor, and D. Aerts (2011) Distributed Representations Based on Geometric Algebra: The Continuous Model. Informatica 35 (4), pp. 407–417. Cited by: §2.3.11.
  • [112] A. Patyk-Lonska (2010) Geometric Algebra Model of Distributed Representation. In Geometric Algebra Computing, pp. 401–430. Cited by: §2.3.11.
  • [113] A. Patyk-Lonska (2011) Experiments on Preserving Pieces of Information in a Given Order in Holographic Reduced Representations and the Continuous Geometric Algebra Model. Informatica 35 (4), pp. 419–427. Cited by: §2.3.11.
  • [114] A. Patyk-Lonska (2011) Preserving Pieces of Information in a Given Order in HRR and GAc. In Federated Conference on Computer Science and Information Systems (FedCSIS), pp. 213–220. Cited by: §2.3.11.
  • [115] T. A. Plate (1991) Holographic Reduced Representations: Convolution Algebra for Compositional Distributed Representations. In International Joint Conference on Artificial Intelligence (IJCAI), pp. 30–35. Cited by: §2.1, §2.3.3, footnote 4.
  • [116] T. A. Plate (1992) Holographic Recurrent Networks. In Advances in Neural Information Processing Systems (NIPS), pp. 34–41. Cited by: §3.3.1.
  • [117] T. A. Plate (1994) Distributed Representations and Nested Compositional Structure. University of Toronto, PhD Thesis. Cited by: §2.3.3, §2.3.5, §2.4, §3.2.1, §3.3.5.
  • [118] T. A. Plate (1995) Holographic Reduced Representations. IEEE Transactions on Neural Networks 6 (3), pp. 623–641. Cited by: §2.2.3, §2.3.3, TABLE II.
  • [119] T. A. Plate (1995) Networks Which Learn to Store Variable-length Sequences in a Fixed Set of Unit Activations. Preprint, pp. 1–19. Cited by: §3.3.1.
  • [120] T. A. Plate (1997) A Common Framework for Distributed Representation Schemes for Compositional Structure. In Connectionist Systems for Knowledge Representation and Deduction, pp. 15–34. Cited by: TABLE I, §2.3.6.
  • [121] T. A. Plate (2000) Analogy Retrieval and Processing with Distributed Vector Representations. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks 17 (1), pp. 29–40. Cited by: §2.2.3.
  • [122] T. A. Plate (2003) Holographic Reduced Representations: Distributed Representation for Cognitive Structures. Stanford: Center for the Study of Language and Information (CSLI). Cited by: §1, §1, §2.1.2, §2.3.3, §2.3.4, §2.3.5, §2.4, TABLE II, §3.5.2.
  • [123] T. A. Plate (2006) Distributed Representations. In Encyclopedia of Cognitive Science, pp. 1–9. Cited by: §2.1.1.
  • [124] R. W. Prager (1993) Networks Based on Kanerva’s Sparse Distributed Memory: Results Showing Their Strengths and Limitations and a New Algorithm to Design the Location Matching Layer. In IEEE International Conference on Neural Networks (ICNN), pp. 1040–1045. Cited by: §3.2.2.
  • [125] S. Purdy (2016) Encoding Data for HTM Systems. arXiv:1602.05925, pp. 1–11. Cited by: §3.2.1.
  • [126] R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch, and I. Fried (2005) Invariant Visual Representation by Single Neurons in the Human Brain. Nature 435 (7045), pp. 1102–1107. Cited by: §2.1.1.
  • [127] D. A. Rachkovskij and T. V. Fedoseyeva (1990) On Audio Signals Recognition by Multilevel Neural Network. In International Symposium on Neural Networks and Neural Computing (NEURONET), pp. 281–283. Cited by: §2.1, §2.3.8, §2.3.8, §3.2.1, §3.2.1.
  • [128] D. A. Rachkovskij and V. I. Gritsenko (2018) Distributed Representation of Vector Data based on Random Projections. Interservice. Cited by: §3.2.3.
  • [129] D. A. Rachkovskij, E. M. Kussul, and T. N. Baidyk (2013) Building a World Model with Structure-sensitive Sparse Binary Distributed Representations. Biologically Inspired Cognitive Architectures 3, pp. 64–86. Cited by: §1, §2.3.8, §3.2.
  • [130] D. A. Rachkovskij and E. M. Kussul (2001) Binding and Normalization of Binary Sparse Distributed Representations by Context-Dependent Thinning. Neural Computation 13 (2), pp. 411–452. Cited by: §2.1.2, §2.1.2, §2.3.8, §2.3.8, §2.3.8, §2.3.8, §3.5.2, §3.5.3.
  • [131] D. A. Rachkovskij, I. S. Misuno, and S. V. Slipchenko (2012) Randomized Projective Methods for the Construction of Binary Sparse Vector Representations. Cybernetics and Systems Analysis 48 (1), pp. 146–156. Cited by: §3.2.3.
  • [132] D. A. Rachkovskij, S. V. Slipchenko, A. A. Frolov, and D. Husek (2005) Resolution of Binary Coding of Real-valued Vectors by Hyperrectangular Receptive Fields. Cybernetics and Systems Analysis 41 (5), pp. 635–646. Cited by: §3.2.2.
  • [133] D. A. Rachkovskij, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk (2005) Properties of Numeric Codes for the Scheme of Random Subspaces RSC. Cybernetics and Systems Analysis 41 (4), pp. 509–520. Cited by: §3.2.2.
  • [134] D. A. Rachkovskij, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk (2005) Sparse Binary Distributed Encoding of Scalars. Journal of Automation and Information Sciences 37 (6), pp. 12–23. Cited by: §3.2.1.
  • [135] D. A. Rachkovskij, S. V. Slipchenko, I. S. Misuno, E. M. Kussul, and T. N. Baidyk (2005) Sparse Binary Distributed Encoding of Numeric Vectors. Journal of Automation and Information Sciences 37 (11), pp. 47–61. Cited by: §3.2.1, §3.2.1.
  • [136] D. A. Rachkovskij (1990) Development and Investigation of Multilevel Assembly Neural Networks. Glushkov Institute of Cybernetics, PhD Thesis. (In Russian). Cited by: §3.3.1, §3.3.1.
  • [137] D. A. Rachkovskij (1996) Application of Stochastic Assembly Neural Networks in the Problem of Interesting Text Selection. Neural Network Systems for Information Processing (in Russian), pp. 52–64. Cited by: §3.3.4.
  • [138] D. A. Rachkovskij (2001) Representation and Processing of Structures with Binary Sparse Distributed Codes. IEEE Transactions on Knowledge and Data Engineering 3 (2), pp. 261–276. Cited by: §2.1.2, §2.2.1, §2.2.3, §2.2.3, §2.3.8, TABLE II, §3.5.2.
  • [139] D. A. Rachkovskij (2014) Vector Data Transformation Using Random Binary Matrices. Cybernetics and Systems Analysis 50 (6), pp. 960–968. Cited by: §3.2.3.
  • [140] D. A. Rachkovskij (2015) Formation of Similarity-reflecting Binary Vectors with Random Binary Projections. Cybernetics and Systems Analysis 51 (2), pp. 313–323. Cited by: §3.2.3.
  • [141] A. Rahimi, S. Benatti, P. Kanerva, L. Benini, and J. M. Rabaey (2016) Hyperdimensional Biosignal Processing: A Case Study for EMG-based Hand Gesture Recognition. In IEEE International Conference on Rebooting Computing (ICRC), pp. 1–8. Cited by: §3.2.1.
  • [142] A. Rahimi, S. Datta, D. Kleyko, E. P. Frady, B. Olshausen, P. Kanerva, and J. M. Rabaey (2017) High-dimensional Computing as a Nanoscalable Paradigm. IEEE Transactions on Circuits and Systems I: Regular Papers 64 (9), pp. 2508–2521. Cited by: TABLE I.
  • [143] A. Rahimi, P. Kanerva, L. Benini, and J. M. Rabaey (2019) Efficient Biosignal Processing Using Hyperdimensional Computing: Network Templates for Combined Learning and Classification of ExG Signals. Proceedings of the IEEE 107 (1), pp. 123–143. Cited by: TABLE I, §1.
  • [144] O. Räsänen (2015) Generating Hyperdimensional Distributed Representations from Continuous Valued Multivariate Sensory Input. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 1943–1948. Cited by: §3.2.3.
  • [145] G. Recchia, M. N. Jones, M. Sahlgren, and P. Kanerva (2010) Encoding Sequential Information in Vector Space Models of Semantics: Comparing Holographic Reduced Representation and Random Permutation. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 865–870. Cited by: §3.3.4.
  • [146] G. Recchia, M. Sahlgren, P. Kanerva, and M. N. Jones (2015) Encoding Sequential Information in Semantic Space Models: Comparing Holographic Reduced Representation and Random Permutation. Computational Intelligence and Neuroscience, pp. 1–18. Cited by: §2.3.3.
  • [147] M. Sahlgren, A. Holst, and P. Kanerva (2008) Permutations as a Means to Encode Order in Word Space. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 1300–1305. Cited by: §2.2.3, §3.3.2.
  • [148] M. Sahlgren (2005) An Introduction to Random Indexing. In International Conference on Terminology and Knowledge Engineering (TKE), pp. 1–9. Cited by: §1.
  • [149] K. Schlegel, P. Neubert, and P. Protzel (2020) A Comparison of Vector Symbolic Architectures. arXiv:2001.11797, pp. 1–9. Cited by: TABLE I, §2.2.3, §2.3.1, §2.3, §2.4, §4.1, footnote 5.
  • [150] M. Schmuck, L. Benini, and A. Rahimi (2019) Hardware Optimizations of Dense Binary Hyperdimensional Computing: Rematerialization of Hypervectors, Binarized Bundling, and Combinational Associative Memory. ACM Journal on Emerging Technologies in Computing Systems 15 (4), pp. 1–25. Cited by: §2.2.5.
  • [151] L. Shastri and V. Ajjanagadde (1993) From Simple Associations to Systematic Reasoning: A Connectionist Representation of Rules, Variables and Dynamic Bindings using Temporal Synchrony. Behavioral and Brain Sciences 16, pp. 417–494. Cited by: §2.1.3.
  • [152] D. Smith and P. Stanford (1990) A Random Walk in Hamming Space. In International Joint Conference on Neural Networks (IJCNN), Vol. 2, pp. 465–470. Cited by: §3.2.1.
  • [153] P. Smolensky (1990) Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems. Artificial Intelligence 46, pp. 159–216. Cited by: §2.1.3, §2.1, §2.3.2, §2.3.3, TABLE II.
  • [154] J. Snaider and S. Franklin (2014) Modular Composite Representation. Cognitive Computation 6, pp. 510–527. Cited by: §2.3.10, §2.3.10, TABLE II.
  • [155] A. M. Sokolov and D. A. Rachkovskij (2006) Approaches to Sequence Similarity Representation. Information Theories and Applications 13 (3), pp. 272–278. Cited by: §3.3.1, §3.3.
  • [156] P. Stanford and D. Smith (1994) Multidimensional Scatter Code: A Data Fusion Technique with Exponential Capacity. In International Conference on Artificial Neural Networks (ICANN), Vol. 2, pp. 1432–1435. Cited by: §3.2.1.
  • [157] T. C. Stewart, X. Choo, and C. Eliasmith (2014) Sentence Processing in Spiking Neurons: A Biologically Plausible Left-corner Parser. In Annual Meeting of the Cognitive Science Society (CogSci), pp. 1533–1538. Cited by: §3.3.5.
  • [158] T. C. Stewart, Y. Tang, and C. Eliasmith (2011) A Biologically Realistic Cleanup Memory: Autoassociation in Spiking Neurons. Cognitive Systems Research 12 (2), pp. 84–92. Cited by: §2.2.5.
  • [159] E. Strubell, A. Ganesh, and A. McCallum (2019) Energy and Policy Considerations for Deep Learning in NLP. In Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3645–3650. Cited by: §4.3.
  • [160] D. Summers-Stay, P. Sutor, and D. Li (2018) Representing Sets as Summed Semantic Vectors. Biologically Inspired Cognitive Architectures 25, pp. 113–118. Cited by: §2.4.
  • [161] A. Thomas, S. Dasgupta, and T. Rosing (2021) A Theoretical Perspective on Hyperdimensional Computing. Journal of Artificial Intelligence Research 72, pp. 215–249. Cited by: §2.2.5, §2.4, §3.2.1, §3.2.3, §4.2, §4.2.
  • [162] S. J. Thorpe (2003) Localized Versus Distributed Representations. In The Handbook of Brain Theory and Neural Networks, pp. 643–646. Cited by: §2.1.1, §2.1.1.
  • [163] M. D. Tissera and M. D. McDonnell (2014) Enabling ‘Question Answering’ in the MBAT Vector Symbolic Architecture by Exploiting Orthogonal Random Matrices. In IEEE International Conference on Semantic Computing (ICSC), pp. 171–174. Cited by: §2.3.4.
  • [164] T. van Gelder (1999) Distributed vs. Local Representation. In The MIT Encyclopedia of the Cognitive Sciences, pp. 235–237. Cited by: §2.1.1, §2.1.1.
  • [165] S. S. Vempala (2005) The Random Projection Method. Vol. 65, American Mathematical Society. Cited by: §3.2.3.
  • [166] C. von der Malsburg (1986) Am I thinking assemblies? In Brain Theory, pp. 161–176. Cited by: §2.1.2, §2.1.2, §2.1.3.
  • [167] N. J. Wang, J. Quek, T. M. Rafacz, and S. J. Patel (2004) Characterizing the Effects of Transient Faults on a High-Performance Processor Pipeline. In International Conference on Dependable Systems and Networks (DSN), pp. 61–70. Cited by: §2.1.1.
  • [168] E. Weiss, B. Cheung, and B. A. Olshausen (2016) A Neural Architecture for Representing and Reasoning about Spatial Relationships. Note: OpenReview Preprint. Cited by: §3.4.2.
  • [169] D. Widdows and T. Cohen (2015) Reasoning with Vectors: A Continuous Model for Fast Robust Inference. Logic Journal of the IGPL 23 (2), pp. 141–173. Cited by: §2.2.3, §3.2.1, §3.3.1.
  • [170] T. F. Wu, H. Li, P.-C. Huang, A. Rahimi, G. Hills, B. Hodson, W. Hwang, J. M. Rabaey, H.-S. P. Wong, M. M. Shulaker, and S. Mitra (2018) Hyperdimensional Computing Exploiting Carbon Nanotube FETs, Resistive RAM, and Their Monolithic 3D Integration. IEEE Journal of Solid-State Circuits 53 (11), pp. 3183–3196. Cited by: §4.3.
  • [171] T. Yerxa, A. G. Anderson, and E. Weiss (2018) The Hyperdimensional Stack Machine. In Cognitive Computing, pp. 1–2. Cited by: §3.3.5.
  • [172] O. Yilmaz (2015) Machine Learning Using Cellular Automata Based Feature Expansion and Reservoir Computing. Journal of Cellular Automata 10 (5-6), pp. 435–472. Cited by: §3.4.3.
  • [173] O. Yilmaz (2015) Symbolic Computation Using Cellular Automata-Based Hyperdimensional Computing. Neural Computation 27 (12), pp. 2661–2692. Cited by: §3.4.3.