Brain regions such as hippocampus and olfactory cortex are thought to operate as associative memories [2, 3, 4], having the ability to learn patterns from presented inputs, store a large number of patterns, and retrieve them reliably in the face of noisy queries [5, 6, 7]. Mathematical models of associative memory are therefore designed to memorize a set of given patterns so that corrupted versions of the memorized patterns may later be presented and the correct memorized pattern retrieved.
Although such information storage and recall seemingly falls naturally into the information-theoretic framework, where an exponential number of messages can be communicated reliably using a linear number of symbols, classical associative memory models can only store a linear number of patterns with a linear number of symbols. A primary shortcoming of such classical models has been their requirement to memorize a randomly chosen set of patterns. By enforcing structure and redundancy in the possible set of memorizable patterns—much like natural stimuli, internal representations in neural systems, and codewords in error-control codes—new advances in associative memory design allow storage of an exponential number of patterns with a linear number of symbols [13, 14], just like in communication systems. (The idea of restricted pattern sets leading to associative memories with increased storage capacity was first suggested in an unpublished doctoral dissertation.)
Information-theoretic and associative memory models of storage have been used to predict experimentally measurable properties of synapses in the mammalian brain [16, 17]. But contrary to the fact that noise is present in the computational operations of the brain [18, 19, 20, 21, 22], associative memory models with exponential capacity have assumed no internal noise in the computational nodes; likewise with many classical models. The purpose of the present paper is to model internal noise in associative memories with exponential pattern retrieval capacity and study whether they are still able to operate reliably. Surprisingly, we find internal noise actually enhances recall performance without loss in capacity, thereby suggesting a functional role for variability in the brain.
In particular, we consider a convolutional, graph-code-based associative memory model and find that even if all components are noisy, the final error probability in recall can be made exceedingly small. We characterize a threshold phenomenon and show how to optimize algorithm parameters when statistical properties of the internal noise are known. Rather counterintuitively, the performance of the memory model improves in the presence of internal neural noise, echoing the stochastic resonance phenomenon observed previously in the literature [23, 21]. Deeper analysis shows mathematical connections to perturbed simplex algorithms for linear programming, where some internal noise helps the algorithm escape local minima.
I-A Related Work
Designing neural networks to learn a set of patterns and recall them later in the presence of noise has been an active topic of research for the past three decades. Inspired by Hebbian learning, Hopfield introduced an auto-associative neural mechanism of size $n$ with binary-state neurons, in which patterns are assumed to be binary vectors of length $n$. The capacity of a Hopfield network under vanishing block error probability was later shown to be $O(n/\log n)$. With the hope of increasing the capacity of the Hopfield network, extensions to non-binary states were explored.
In particular, Jankowski et al. investigated a multi-state complex-valued neural associative memory whose estimated capacity is still only linear in $n$; Muezzinoglu et al. showed that the capacity can be increased further, but at the cost of a prohibitively complicated learning rule. Lee proposed the Modified Gradient Descent learning Rule (MGDR) to overcome this drawback.
To further increase capacity and robustness, a recent line of work exploits structure in the patterns. This is done either by making use of correlations among patterns or by only memorizing patterns with redundancy (rather than any possible set of patterns). By utilizing neural cliques, it has been demonstrated that the pattern retrieval capacity of Hopfield networks can be increased substantially. Modification of the neural architecture to improve pattern retrieval capacity has also been considered by Venkatesh and Biswas [15, 30], where the capacity is increased to $\Theta(b^{n/b})$ for semi-random patterns, where $b$ is the size of the clusters. This significant boost to capacity is achieved by dividing the neural network into smaller fully interconnected disjoint or nested blocks, but comes at the price of limited worst-case noise tolerance. Deploying higher-order neural models, beyond the pairwise correlations considered in Hopfield networks, increases the storage capacity to $O(n^{d-2})$, where $d$ is the degree of correlation considered. In such models, the neuronal state depends not only on the states of the neighbors, but also on the correlations among them. A new model based on bipartite graphs that captures higher-order correlations (when patterns belong to a subspace), but without prohibitive computational complexity, improved the capacity to $a^{rn}$ for some $a > 1$ and $0 < r < 1$, that is to say, exponential in network size.
The basic memory architecture, learning rule, and recall algorithm used herein are from our previous work, which also achieves exponential capacity by capturing internal redundancy: the patterns are divided into smaller overlapping clusters, with each subpattern satisfying a set of linear constraints. The problem of learning linear constraints with neural networks was considered earlier, but without sparsity requirements. This has connections to compressed sensing; typical compressed sensing recall/decoding algorithms are too complicated to be implemented by neural networks, but some have suggested the biological plausibility of message-passing algorithms.
Our model relies on the fact that all patterns to be learned lie in a low-dimensional subspace. Learning features of a low-dimensional space is very similar to what autoencoders [37] and deep belief networks (DBNs) do, albeit with different objectives. DBNs are made of several consecutive stages, similar to the overlapping clusters in our model, where each stage extracts some features and feeds them to the following stage. The output of the last stage is then used for pattern classification. In contrast to DBNs, our associative memory model does not classify patterns but rather recalls patterns from noisy versions. Also, overlapping clusters can operate in parallel, saving the time needed for information to diffuse through a staged architecture.
In most deep or convolutional models, one not only has to find the proper dictionary for classification, but must also calculate the features for each input pattern. This increases the complexity of the whole system when the objective is simply recall. Here, the dictionary corresponds to the dual vectors learned from previously memorized patterns.
In this work, we reconsider this neural network model, but introduce internal computation noise consistent with biology. Note that the sparsity of the model architecture is also consistent with biology. We find that there is actually a functional benefit to internal noise.
The benefit of internal noise has been noted previously in associative memory models with stochastic update rules, through analysis of attractor dynamics. In particular, it has been shown that noise may reduce recall time in associative memory tasks by pushing the system from one attractor state to another. However, our framework differs from previous approaches in three key aspects. First, our memory model is different, which makes extension of previous analyses nontrivial. Second, and perhaps most importantly, pattern retrieval capacity in previous approaches decreases with internal noise, cf. [39, Figure 6.1]: increasing internal noise helps correct more external errors, but also reduces the number of memorizable patterns. In our framework, internal noise does not affect pattern retrieval capacity (up to a threshold) but improves recall performance. Finally, our noise model is bounded rather than Gaussian, so a suitable network may achieve perfect recall despite internal noise.
Reliably storing information in memory systems constructed completely from unreliable components is a classical problem in fault-tolerant computing [41, 42, 43], where typical models have used random access architectures with sequential correcting networks. Although direct comparison is difficult since notions of circuit complexity are slightly different, our work also demonstrates that associative memory architectures can store information reliably despite being constructed from unreliable components.
II Associative Memory Model
In this section, we introduce our main notation, the associative memory model, and the noise model. We also describe the recall algorithms.
II-A Notation and basic structure
In our model, a neuron can assume a bounded, non-negative integer-valued state, interpreted as the short-term firing rate of the neuron. A neuron updates its state based on the states of its neighbors as follows: it first computes a weighted sum $h = \sum_j w_j s_j + v$, where $w_j$ is the weight of the link from neighbor $j$, $s_j$ is that neighbor's state, and $v$ is the internal noise, and then applies a nonlinear function $f$ to $h$.
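The noisy update rule above can be sketched as follows. This is a minimal illustration under our assumptions: the nonlinearity is taken to be a sign-with-deadzone threshold (as used by the constraint neurons later), and all names (`phi`, `nu`) are ours, not the paper's.

```python
import random

def neuron_update(states, weights, phi, nu):
    """One noisy neuron update: weighted sum of neighbor states plus
    bounded uniform internal noise, followed by a nonlinear threshold.
    `phi` is the firing threshold, `nu` bounds the internal noise."""
    v = random.uniform(-nu, nu)                       # internal noise
    h = sum(w * s for w, s in zip(weights, states)) + v
    # sign-with-deadzone nonlinearity: fire only if |h| clears phi
    if h > phi:
        return 1
    if h < -phi:
        return -1
    return 0
```

With `nu = 0` the update is deterministic; increasing `nu` lets the noise occasionally flip a borderline decision, which is the mechanism exploited throughout the paper.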
An associative memory is represented by a weighted bipartite graph $G$ with $n$ pattern neurons and $m$ constraint neurons. Each pattern $x$ is an integer-valued vector of length $n$. Following our earlier work, the focus is on recalling patterns with strong local correlation among entries. Hence, we divide the entries of each pattern into $L$ overlapping subpatterns. Due to the overlaps, a pattern neuron can be a member of multiple subpatterns, as depicted in Figure 0(a). The $\ell$th subpattern is denoted $x^{(\ell)}$, and local correlations are assumed to take the form of subspaces, i.e., the subpatterns of cluster $\ell$ form a subspace of dimension $k_\ell$.
We capture the local correlations by learning a set of linear constraints over each subspace, corresponding to the dual vectors orthogonal to that subspace. More specifically, let $\{w^{(\ell)}_1, \dots, w^{(\ell)}_{m_\ell}\}$ be a set of dual vectors orthogonal to all subpatterns of cluster $\ell$. Then:

$\langle w^{(\ell)}_i, x^{(\ell)} \rangle = 0, \quad i = 1, \dots, m_\ell.$ (1)
Eq. (1) can be rewritten as $W^{(\ell)} \cdot x^{(\ell)} = 0$, where $W^{(\ell)}$ is the matrix whose rows are the dual vectors. Now we use a bipartite graph with connectivity matrix determined by $W^{(\ell)}$ to represent the subspace constraints learned from subpattern $x^{(\ell)}$; this graph is called cluster $\ell$. We developed an efficient way of learning $W^{(\ell)}$ in earlier work, also used here. Briefly, in each iteration of learning:
Pick a pattern $x$ at random from the dataset;
Adjust the weight vectors $w^{(\ell)}_i$ such that the projection of $x^{(\ell)}$ onto each $w^{(\ell)}_i$ is reduced, applying a sparsity penalty to favor sparse solutions.
This process repeats until all weights are orthogonal to the patterns in the dataset or the maximum iteration limit is reached. The learning rule allows us to assume, in this paper, that the weight matrices are known and satisfy $W^{(\ell)} \cdot x^{(\ell)} = 0$ for all patterns in the dataset.
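The two-step learning loop above can be sketched as follows. This is only an illustrative sketch: the learning rate, the hard-threshold sparsity penalty, and all parameter names are our assumptions, not the paper's exact rule.

```python
import random

def learn_dual_vector(patterns, dim, lr=0.05, penalty=0.001,
                      max_iter=20000, tol=1e-3):
    """Iteratively learn one dual vector w (nearly) orthogonal to all
    patterns: pick a pattern, shrink the projection of w onto it, and
    zero out tiny weights to favor sparse solutions."""
    w = [random.uniform(-1, 1) for _ in range(dim)]
    for _ in range(max_iter):
        x = random.choice(patterns)
        proj = sum(wi * xi for wi, xi in zip(w, x))     # <w, x>
        nx2 = sum(xi * xi for xi in x) or 1
        # move w away from x so that the projection shrinks
        w = [wi - lr * proj * xi / nx2 for wi, xi in zip(w, x)]
        # hard-threshold small weights toward zero (sparsity penalty)
        w = [0.0 if abs(wi) < penalty else wi for wi in w]
        if all(abs(sum(wi * xi for wi, xi in zip(w, x))) < tol
               for x in patterns):
            break
    return w
```

Running this several times with different random initializations yields a set of (approximate) dual vectors, i.e., the rows of $W^{(\ell)}$.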
For the forthcoming asymptotic analysis, we need to define a contracted graph $\widetilde{G}$ whose connectivity matrix is denoted $\widetilde{W}$ and has size $L \times n$. This is a bipartite graph in which the constraints in each cluster are represented by a single neuron. Thus, if pattern neuron $j$ is connected to cluster $\ell$, then $\widetilde{W}_{\ell j} = 1$; otherwise $\widetilde{W}_{\ell j} = 0$; see Figure 0(b). We also define the degree distribution polynomials from an edge perspective over $\widetilde{G}$,

$\lambda(x) = \sum_i \lambda_i x^{i-1}, \qquad \rho(x) = \sum_j \rho_j x^{j-1},$

where $\lambda_i$ (resp., $\rho_j$) equals the fraction of edges that connect to pattern (resp., cluster) nodes of degree $i$ (resp., $j$).
II-B Noise model
There are two types of noise in our model: external errors and internal noise. As mentioned earlier, a neural network should be able to retrieve a memorized pattern from a version corrupted by external errors. We assume the external error is an additive vector $z$ of size $n$, whose entries independently take the values $-1$ and $+1$, each with probability $p_e/2$, and $0$ otherwise. (The proposed algorithms also work with larger noise values taken from a larger symmetric integer set; see Sec. IV-A2. The $\pm 1$ model is presented here for simplicity.) The realization of the external error on subpattern $\ell$ is denoted $z^{(\ell)}$. Note that the subspace assumption implies $W^{(\ell)} \cdot (x^{(\ell)} + z^{(\ell)}) = W^{(\ell)} \cdot z^{(\ell)}$ for all $\ell$.
Neurons also suffer from internal noise. We consider a bounded noise model: a random value uniformly distributed in the interval $[-\upsilon, \upsilon]$ for the pattern neurons and in $[-\nu, \nu]$ for the constraint neurons ($\upsilon, \nu \geq 0$).
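Both noise sources can be sampled with a few lines of code; a minimal sketch, assuming the $\pm 1$ external-error model and the bounded uniform internal-noise model described above (function names are ours):

```python
import random

def external_error(n, p_e):
    """External error vector z: each entry is -1 or +1 with total
    probability p_e (split evenly), and 0 otherwise."""
    return [random.choice((-1, 1)) if random.random() < p_e else 0
            for _ in range(n)]

def internal_noise(bound):
    """Bounded internal noise: one draw, uniform on [-bound, bound]."""
    return random.uniform(-bound, bound)
```

The key distinction is that external errors corrupt the initial states once, while internal noise is drawn fresh inside every neural computation.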
The goal of recall is to filter out the external error $z$ to obtain the desired pattern $x$ as the correct states of the pattern neurons. When neurons compute noiselessly, this task may be achieved by exploiting the fact that the set of patterns satisfies the constraints in Eq. (1). However, it is not clear how to accomplish this objective when the neural computations are noisy. Rather surprisingly, we show that eliminating external errors is not only possible in the presence of internal noise, but that neural networks with moderate internal noise demonstrate better external error resilience.
II-C Recall algorithms
To efficiently deal with external errors in associative memory, we use two simple iterative message passing algorithms. The role of the first one, called the Intra-cluster algorithm and formally defined in Algorithm 1, is to correct at least a single external error in each cluster. However, without overlaps between clusters, the error resilience of this approach and the network in general is limited. The second algorithm, the Inter-cluster recall algorithm, exploits the overlaps: it helps clusters with external errors recover their correct states by using the reliable information from clusters that do not have external errors. The error resilience of the resulting combination thereby drastically improves.
To go further into details, and with some abuse of notation, let $x_j(t)$ and $y_i(t)$ denote the messages transmitted at iteration $t$ by pattern neuron $j$ and constraint neuron $i$, respectively. In the first iteration, we initialize the pattern neurons with a pattern $\hat{x}$ randomly drawn from the dataset, corrupted with some external error $z$. Thus, $x(0) = \hat{x} + z$. As a result, for cluster $\ell$ we have $W^{(\ell)} \cdot x^{(\ell)}(0) = W^{(\ell)} \cdot z^{(\ell)}$, where $z^{(\ell)}$ is the realization of the external error on cluster $\ell$.
With these notations in mind, Algorithm 1 iteratively performs a series of forward and backward steps in order to remove (at least) one external error from its input domain. Assuming that the algorithm is applied to cluster $\ell$, in the forward step of iteration $t$ the pattern neurons in cluster $\ell$ transmit their current states to their neighboring constraint neurons. Each constraint neuron $i$ then calculates the weighted sum of the messages received over its input links. However, since neurons suffer from internal noise, an additional noise term appears in the weighted sum, i.e., $h_i(t) = \sum_j W^{(\ell)}_{ij} x_j(t) + v_i$, where $v_i$ is the random internal noise affecting node $i$. As before, we consider a bounded noise model for $v_i$: it is uniformly distributed in the interval $[-\nu, \nu]$ for some $\nu \geq 0$.
A non-zero input sum, excluding the effect of $v_i$, is an indication of the presence of external errors among the pattern neurons. Thus, constraint neurons set their states to the sign of the received weighted sum if its magnitude is larger than a fixed threshold $\varphi$. More specifically, constraint neuron $i$ updates its state based on the received weighted sum according to the following rule:

$y_i(t) = \mathrm{sign}(h_i(t))$ if $|h_i(t)| > \varphi$, and $y_i(t) = 0$ otherwise.
Here, $x(t)$ is the vector of messages transmitted by the pattern neurons and $v_i$ is the random internal noise affecting node $i$. (Note that although the values of $y_i(t)$ could be shifted to non-negative values to match our assumption that neural states are non-negative, we leave them as such to simplify later analysis.)
In the backward step, the constraint neurons communicate their states to their neighboring pattern neurons. The pattern neurons then compute a normalized weighted sum of the messages they receive over their input links and update their current state if the amount of received (non-zero) feedback exceeds a threshold; otherwise, they retain their current state for the next round. More specifically, pattern node $j$ in cluster $\ell$ updates its state in round $t+1$ according to

$x_j(t+1) = x_j(t) - \mathrm{sign}(g_j(t))$ if $|g_j(t)| > \psi$, and $x_j(t+1) = x_j(t)$ otherwise,

where $\psi$ is the update threshold and

$g_j(t) = \frac{1}{d_j} \sum_i \mathrm{sign}\!\left(W^{(\ell)}_{ij}\right) y_i(t) + u_j.$
Note that $x_j(t+1)$ is further mapped to the valid state interval by saturating values that fall below or above its endpoints; this saturation is not stated mathematically for brevity. Here, $d_j$ is the degree of pattern node $j$ in cluster $\ell$, $y(t)$ is the vector of messages transmitted by the constraint neurons in cluster $\ell$, and $u_j$ is the random internal noise affecting pattern node $j$. Basically, the term $g_j(t)$ reflects the (average) belief of the constraint nodes connected to pattern neuron $j$ about its correct value. If $|g_j(t)|$ is larger than the specified threshold $\psi$, most of the connected constraints suggest the current state is incorrect, and hence a change is made. Note that this average belief is diluted by the internal noise of neuron $j$. As mentioned earlier, $u_j$ is uniformly distributed in the interval $[-\upsilon, \upsilon]$ for some $\upsilon \geq 0$.
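One full forward/backward round of the intra-cluster step can be sketched as follows. This is a sketch under our notational assumptions (variable names and the exact saturation handling are ours; state saturation is omitted, as in the text):

```python
import random

def intra_cluster_iteration(x, W, phi, psi, nu, upsilon):
    """One forward/backward round for a single cluster.
    x: current pattern-neuron states; W: constraint matrix (rows =
    constraint neurons); phi/psi: constraint/pattern thresholds;
    nu/upsilon: internal-noise bounds for each neuron type."""
    m, n = len(W), len(x)
    # forward: each constraint neuron thresholds its noisy weighted sum
    y = []
    for i in range(m):
        h = sum(W[i][j] * x[j] for j in range(n)) + random.uniform(-nu, nu)
        y.append(0 if abs(h) <= phi else (1 if h > 0 else -1))
    # backward: each pattern neuron averages the (signed) feedback
    x_new = list(x)
    for j in range(n):
        deg = sum(1 for i in range(m) if W[i][j] != 0) or 1
        g = sum((1 if W[i][j] > 0 else -1) * y[i]
                for i in range(m) if W[i][j]) / deg
        g += random.uniform(-upsilon, upsilon)       # internal noise u_j
        if abs(g) > psi:             # enough constraints disagree: move
            x_new[j] -= 1 if g > 0 else -1
    return x_new
```

With zero internal noise and the toy constraints `[[1, 1], [1, -1]]`, a single corrupted entry is corrected in one round, while an already-correct subpattern is left untouched.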
The error correction ability of Algorithm 1 is fairly limited, as determined analytically and through simulations in the sequel. In essence, Algorithm 1 can correct one external error with high probability, but degrades terribly against two or more external errors. Working independently, clusters cannot correct more than a few external errors, but their combined performance is much better. Because clusters overlap, they help each other resolve external errors: a cluster whose pattern neurons are in their correct states can always provide truthful information to neighboring clusters. This property is exploited in Algorithm 2 by applying Algorithm 1 in a round-robin fashion to each cluster. Clusters either eliminate the external errors in their domain, in which case they keep their new states and can then help other clusters, or revert to their original states. Note that under such a scheduling scheme, neurons can only change their states towards the correct values. This scheduling technique is similar in spirit to the peeling algorithm.
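The round-robin scheduling with revert-on-failure can be sketched as follows; here `correct_cluster` is a hypothetical stand-in for Algorithm 1 that returns the updated subpattern together with a flag indicating whether all of the cluster's constraints are satisfied afterwards (names and structure are our assumptions):

```python
def sequential_recall(x, clusters, correct_cluster, max_rounds=50):
    """Apply the intra-cluster step to each cluster in turn; keep the
    new local state only if the cluster's constraints are all satisfied,
    otherwise revert.  `clusters` lists the pattern-neuron indices of
    each cluster; clusters overlap, so a fixed cluster helps its
    neighbors once its own errors are gone."""
    for _ in range(max_rounds):
        changed = False
        for idx in clusters:
            sub = [x[j] for j in idx]
            new, ok = correct_cluster(sub)
            if ok and new != sub:            # keep only verified fixes
                for j, v in zip(idx, new):
                    x[j] = v
                changed = True
        if not changed:                      # no cluster made progress
            break
    return x
```

In the toy test below, two overlapping two-neuron clusters peel off one error each, so an error pattern neither cluster could fix alone is eliminated jointly.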
III Pattern Retrieval Capacity
Before proceeding to analyze recall performance, for completeness we review the pattern retrieval capacity results from our earlier work, which show that the proposed model is capable of memorizing an exponentially large number of patterns. First, note that since the patterns form a subspace, the number of patterns does not have any effect on the learning or recall algorithms (except for its obvious influence on the learning time). Thus, in order to show that the pattern retrieval capacity is exponential in $n$, all we need to demonstrate is that there exists a training set with $C$ patterns of length $n$ for which $C = a^{rn}$, for some $a > 1$ and $0 < r < 1$.
Theorem 1.
Let $X$ be the $C \times n$ matrix formed by the $C$ patterns of length $n$, with entries from a finite set of non-negative integers. Furthermore, let $k = rn$ for some $0 < r < 1$. Then, there exists a set of patterns for which $C = a^{rn}$, with $a > 1$, and $\mathrm{rank}(X) = k < n$.
The proof is constructive: we create a dataset such that it can be memorized by the proposed neural network and satisfies the required properties, i.e., the subpatterns form a subspace and the pattern entries are non-negative integers from a finite set. The complete proof can be found in our earlier work.
IV Recall Performance Analysis
Now let us analyze recall error performance. The following lemma shows that if the thresholds $\varphi$ and $\psi$ are chosen properly, then in the absence of external errors the constraints remain satisfied and internal noise cannot result in violations. This is a crucial property for Algorithm 2, as it allows one to determine whether a cluster has successfully eliminated external errors (Step 4 of the algorithm) by merely checking that all constraint nodes are satisfied.
Lemma 1. In the absence of external errors, the probability that a constraint neuron (resp. pattern neuron) in cluster $\ell$ makes a wrong decision due to its internal noise is given by $\pi_0 = \max\left(0, \frac{\nu - \varphi}{\nu}\right)$ (resp. $P_0 = \max\left(0, \frac{\upsilon - \psi}{\upsilon}\right)$).
Proof: To calculate the probability that a constraint node makes a mistake when there are no external errors, consider constraint node $i$, whose decision parameter reduces to its internal noise: $h_i = v_i$.
Therefore, the probability of making a mistake is

$\pi_0 = \Pr\{|v_i| > \varphi\} = \max\left(0, \frac{\nu - \varphi}{\nu}\right).$
Thus, to make $\pi_0 = 0$ we select $\varphi \geq \nu$. Note that this might not be possible in all cases since, as we will see, the minimum absolute value of the network weights should be at least $\varphi$; if $\varphi$ is too large we might not be able to find a proper set of weights. Nevertheless, assuming that it is possible to choose a proper $\varphi \geq \nu$, we will have $\pi_0 = 0$.
Now, knowing that the constraint neurons will not send any non-zero messages in the absence of external errors, we focus on the pattern neurons in the same circumstance. A given pattern node $j$ will receive a zero from all its neighbors among the constraint nodes. Therefore, its decision parameter reduces to $g_j = u_j$. As a result, a mistake can happen only if $|u_j| > \psi$. The probability of this event is

$P_0 = \Pr\{|u_j| > \psi\} = \max\left(0, \frac{\upsilon - \psi}{\upsilon}\right).$
Therefore, to make $P_0$ go to zero, we must select $\psi \geq \upsilon$. ∎
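The two error probabilities in the lemma follow directly from the uniform noise model; a short derivation sketch in our notation:

```latex
% Constraint neuron: with no external errors its decision parameter is
% just internal noise v ~ U[-\nu, \nu], so a spurious firing needs |v| > \varphi:
\Pr\{|v| > \varphi\}
  = \frac{\text{length of } [-\nu,-\varphi] \cup [\varphi,\nu]}{2\nu}
  = \max\!\left(0,\; 1 - \frac{\varphi}{\nu}\right)

% Pattern neuron: its decision parameter is u ~ U[-\upsilon, \upsilon]:
\Pr\{|u| > \psi\} = \max\!\left(0,\; 1 - \frac{\psi}{\upsilon}\right)
```

Both expressions vanish exactly when $\varphi \geq \nu$ and $\psi \geq \upsilon$, which is why this threshold choice guarantees silence of the network in the absence of external errors.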
In the sequel, we assume $\varphi \geq \nu$ and $\psi \geq \upsilon$ so that $\pi_0 = 0$ and $P_0 = 0$. However, an external error combined with internal noise may still push neurons to an incorrect state.
Given the above lemma and our neural architecture, we can prove the following surprising result: in the asymptotic regime of an increasing number of iterations of Algorithm 2, a neural network with internal noise outperforms one without, with the pattern retrieval capacity remaining intact. Let us denote the fraction of errors corrected by the noiseless and noisy neural networks (the latter parametrized by $\upsilon$ and $\nu$) after $t$ iterations of Algorithm 2 by $\Lambda(t)$ and $\Lambda_{\upsilon,\nu}(t)$, respectively. Note that both $\Lambda(t)$ and $\Lambda_{\upsilon,\nu}(t)$ are non-decreasing sequences of $t$. Hence, their limiting values are well defined: $\Lambda^* = \lim_{t \to \infty} \Lambda(t)$ and $\Lambda^*_{\upsilon,\nu} = \lim_{t \to \infty} \Lambda_{\upsilon,\nu}(t)$.
Theorem 2. Let us choose $\varphi$ and $\psi$ so that $\pi_0 = 0$ and $P_0 = 0$. Then, for the same realization of external errors, we have $\Lambda^*_{\upsilon,\nu} \geq \Lambda^*$.
Proof: We first show that, in the limit, the noisy network can correct any external error pattern that its noiseless counterpart can correct. If the noiseless decoder succeeds, then there is a non-zero probability $q$ that the noisy decoder succeeds in a given round as well (corresponding to the case that the noise values are small). Since Algorithm 2 does not introduce new errors, the number of errors in each round is no larger than in the previous round, hence the probability of success in every round is lower bounded by $q$. If Algorithm 2 is applied for $T$ rounds, then the probability of having corrected the external errors by the end of round $T$ is at least $1 - (1 - q)^T$, which tends to $1$ as $T \to \infty$.
Now, we turn attention to cases where the noiseless network fails to eliminate the external errors, and show that there exist external error patterns, called stopping sets, that the noisy network can eliminate while the noiseless network cannot; see Appendix A for further explication. Assuming that, in the absence of internal noise, each cluster can eliminate up to $e$ external errors in its domain (see the forthcoming Figure 2), stopping sets correspond to error patterns in which each cluster has more than $e$ errors; Algorithm 2 then cannot proceed any further. However, in the noisy network, there is a chance that in one of the rounds the noise acts favorably and a cluster corrects more than $e$ errors. (This is reflected in the forthcoming Figure 2, where the probability of correcting a given number of errors is larger when the network is noisy.) In this case, if the probability of getting out of the stopping set in each round is at least some $q' > 0$, then an argument similar to the previous case shows that the probability of eliminating the external errors tends to $1$ as the number of rounds grows. ∎
It should be noted that if the amount of internal noise or external errors is too high, the noisy architecture will eventually get stuck, just like the noiseless network. The high-level reason why a noisy network outperforms a noiseless one comes from understanding stopping sets: realizations of external errors that the iterative Algorithm 2 cannot fully correct. We showed that the set of stopping sets shrinks as we add internal noise, so the supposedly harmful internal noise helps Algorithm 2 avoid stopping sets. Appendix A illustrates this notion further.
Theorem 2 suggests the only possible downside to using a noisy network is running time: the noisy neural network may need more iterations to achieve the same error correction performance. Interestingly, our empirical experiments show that in certain scenarios, even the running time improves when using a noisy network.
Theorem 2 indicates that noisy neural networks (under our model) outperform noiseless ones, but does not specify the level of errors that such networks can correct. Now we derive a theoretical upper bound on error correction performance. To this end, let $P_c(e)$ be the average probability that a cluster can correct $e$ external errors in its domain. The following theorem gives a simple condition under which Algorithm 2 can correct a linear fraction of external errors (in terms of $n$) with high probability. The condition involves $\lambda$ and $\rho$, the degree distribution polynomials of the contracted graph $\widetilde{G}$.
Theorem 3. Under the assumptions that the graph $\widetilde{G}$ grows large and is chosen randomly with degree distributions given by $\lambda$ and $\rho$, Algorithm 2 is successful if

$p_e \cdot \lambda(\pi(z)) < z, \quad \forall z \in (0, p_e],$

where $\pi(z)$ is the average probability that a cluster sends a failure message when the average pattern-neuron error probability is $z$, made explicit in the proof.
Proof: The proof is based on the density evolution technique. Without loss of generality, assume that $P_c(1)$, $P_c(2)$, and $P_c(3)$ are positive and $P_c(e) = 0$ for $e \geq 4$; the proof extends easily to the case where $P_c(e) > 0$ for $e \geq 4$. Let $\pi(t)$ be the average probability that a super constraint node sends a failure message at iteration $t$, i.e., that it cannot correct the external errors lying in its domain. Then, the probability that a noisy pattern neuron with degree $d$ sends an erroneous message to a particular neighboring super constraint node is equal to the probability that none of its other neighboring super constraint nodes could have corrected its error, i.e., $\pi(t)^{d-1}$.
Averaging over the pattern-degree distribution, we find the average probability of error in iteration $t+1$:

$z(t+1) = p_e \cdot \lambda(\pi(t)).$
Now consider a cluster containing $d$ pattern neurons in the contracted graph. This cluster will not send a failure message over its edge to a given noisy pattern neuron in its domain with probability:

$P_c(1)$, if it is not connected to any other noisy pattern neuron;

$P_c(2)$, if it is connected to exactly one other noisy pattern neuron;

$P_c(3)$, if it is connected to exactly two other noisy pattern neurons; and

$0$, if it is connected to more than two other noisy pattern neurons.
Averaging over the cluster degree distribution $\rho$ yields:

$1 - \pi(t+1) = P_c(1)\, \rho(1 - z(t)) + P_c(2)\, z(t)\, \rho'(1 - z(t)) + P_c(3)\, \frac{z(t)^2}{2}\, \rho''(1 - z(t)),$
where $\rho'$ and $\rho''$ are the first and second derivatives of the function $\rho(x)$ with respect to $x$. The recursion converges to zero whenever the condition of the theorem holds. ∎
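The two-step recursion in the proof can be iterated numerically; a minimal density-evolution sketch, where `lambda_poly` and `cluster_failure` are caller-supplied stand-ins for $\lambda(\cdot)$ and the cluster-side expression above (the concrete degree distributions are an assumption of the caller):

```python
def density_evolution(z0, lambda_poly, cluster_failure, iters=100):
    """Iterate z(t+1) = z0 * lambda(pi(t)), where pi(t) is the
    probability that a cluster sends a failure message given incoming
    pattern-error probability z(t).  Returns the fixed-point estimate:
    a value near 0 means Algorithm 2 succeeds for this z0."""
    z = z0
    for _ in range(iters):
        pi = cluster_failure(z)          # cluster-side failure probability
        z_new = z0 * lambda_poly(pi)     # pattern-side update (edge view)
        done = abs(z_new - z) < 1e-12
        z = z_new
        if done:
            break
    return z
```

For example, with all pattern nodes of degree 2 ($\lambda(x) = x$) and a toy cluster rule `cluster_failure = lambda z: z`, the error probability contracts geometrically to zero; with a cluster that always fails, the recursion stays at its starting point.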
It must be mentioned that the above theorem holds when the decision subgraphs for the pattern neurons in graph $\widetilde{G}$ are tree-like for a depth of $\tau$, where $\tau$ is the total number of iterations performed by Algorithm 2.
Theorem 3 states that for any fraction of errors satisfying the above recursive formula, Algorithm 2 will be successful with probability close to one. Note that the first fixed point of the recursion dictates the maximum fraction of errors that our model can correct. For the special case of $P_c(1) = 1$ and $P_c(e) = 0$ for all $e \geq 2$, we obtain the condition $p_e \cdot \lambda(1 - \rho(1 - z)) < z$, the same condition given in our earlier work. Theorem 3 takes into account the contributions of all the $P_c(e)$ terms and, as we will see, their values change as we incorporate the effect of the internal noise parameters $\upsilon$ and $\nu$. Our results show that the maximum value of $P_c(e)$ does not occur when the internal noise is zero, i.e., $\upsilon = \nu = 0$, but instead when the neurons are contaminated with internal noise! As an example, Figure 2 illustrates how $P_c(e)$ behaves as a function of $\upsilon$ in the network considered (note that the maximum values are not at $\upsilon = 0$). This finding suggests that even individual clusters are able to correct more errors in the presence of internal noise.
To estimate the $P_c(e)$ values, we use numerical approaches. (Appendix B derives an analytical upper bound to estimate $P_c(e)$, but this requires approximations that are loose.) Given a set of clusters, for each cluster we randomly corrupt $e$ pattern neurons with external errors. Then, we apply Algorithm 1 to this cluster and calculate the success rate once finished. We take the average of this rate over all clusters to end up with $P_c(e)$. The results of this approach are shown in Figure 2, where the value of $P_c(e)$ is shown for several values of $e$ and various noise amounts at the pattern neurons (specified by the parameter $\upsilon$).
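The Monte Carlo procedure just described can be sketched as follows; `run_cluster` is a hypothetical callable wrapping Algorithm 1 on one cluster, returning True when all injected errors were corrected (the interface is our assumption):

```python
import random

def estimate_correction_prob(run_cluster, n, e, trials=1000, seed=0):
    """Monte Carlo estimate of P_c(e) for one cluster with n pattern
    neurons: corrupt e randomly chosen neurons, run the intra-cluster
    recall step, and average the success indicator over many trials."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        positions = rng.sample(range(n), e)   # e distinct corrupted neurons
        ok += bool(run_cluster(positions))
    return ok / trials
```

Averaging the returned estimate over all clusters yields the $P_c(e)$ curves of Figure 2.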
Now we consider simulation results for a finite system. To learn the subspace constraints (1) for each cluster, we use the learning algorithm from our earlier work. Henceforth, we assume that the weight matrices are known and given. In our setup, we consider a network of size $n$ divided into $L$ overlapping clusters, each containing a fixed average number of pattern and constraint nodes. The external error is modeled by randomly generated vectors $z$ whose entries are $\pm 1$ with probability $p_e$ and $0$ otherwise. The vector $z$ is added to the correct patterns, which satisfy (1). For recall, Algorithm 2 is used and results are reported in terms of the Symbol Error Rate (SER) as the level of external error ($p_e$) or internal noise ($\upsilon$, $\nu$) is changed; this involves counting the positions where the output of Algorithm 2 differs from the correct pattern.
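The SER metric used in all the following figures is simply the fraction of mismatched positions; as a small self-contained sketch (the function name is ours):

```python
def symbol_error_rate(recalled, correct):
    """Fraction of positions where the recalled pattern differs from
    the memorized one, i.e., the SER reported in the experiments."""
    assert len(recalled) == len(correct)
    return sum(r != c for r, c in zip(recalled, correct)) / len(correct)
```

A perfect recall gives an SER of 0; a decoder that leaves half the external errors in place on a half-corrupted pattern gives an SER near $p_e/2$.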
IV-A1 Symbol Error Rate as a function of Internal Noise
Figure 3 illustrates the final SER of our algorithm for different values of $\upsilon$ and $p_e$. Remember that $\upsilon$ and $\nu$ quantify the level of noise in the pattern and constraint neurons, respectively. Dashed lines in Figure 3 are simulation results, whereas solid lines are the theoretical upper bounds provided in this paper. As evident, there is a threshold phenomenon: the SER is negligible for $p_e$ below a threshold $p_e^*$ and grows beyond it. As expected, the simulation results are better than the theoretical bounds. In particular, the gap widens as the external error rate increases.
A more interesting trend in Figure 3 is that internal noise helps achieve better performance, as predicted by the theoretical analysis (Theorem 2). Notice how the threshold $p_e^*$ moves towards one as $\upsilon$ increases.
This phenomenon is inspected more closely in Figure 4, where $p_e$ is fixed while $\upsilon$ and $\nu$ vary. Figs. 4(a) and 4(b) display projected versions of the surface plot to investigate the effects of $\upsilon$ and $\nu$ separately. As we see again, a moderate amount of internal noise at both the pattern and constraint neurons improves performance, and there is an optimum point at which the SER reaches its minimum. Figure 4(b) indicates, for instance, an optimal value of $\nu$ beyond which the SER deteriorates. There is greater sensitivity to noise in the pattern neurons, reminiscent of results for decoding circuits with internal noise.
IV-A2 Larger noise values
So far, we have investigated the performance of the recall algorithm when the noise values are limited to $\pm 1$. Although this choice facilitates the analysis of the algorithm and increases error correction speed, our analysis remains valid for larger noise values. Figure 6 illustrates the SER for the same scenario as before, but with noise values chosen from a larger symmetric integer set. We see exactly the same behavior as we witnessed for $\pm 1$ noise values.
IV-B Recall Time as a function of Internal Noise
Figure 7 illustrates the number of iterations performed by Algorithm 2 when correcting the external errors with $p_e$ fixed. We stop whenever the algorithm corrects all external errors, and declare a recall error if any errors remain after a maximum number of iterations $t_{\max}$. Thus, the areas in the figure where the number of iterations reaches $t_{\max}$ indicate decoding failure. Figs. 7(a) and 7(b) are projected versions of Figure 7 and show the average number of iterations as a function of $\upsilon$ and $\nu$, respectively.
The amount of internal noise drastically affects the speed of Algorithm 2. First, from Figures 7 and 7(b), observe that the running time is more sensitive to noise at the constraint neurons than at the pattern neurons, and that the algorithm becomes slower as noise at the constraint neurons increases. In contrast, internal noise at the pattern neurons may improve the running time, as seen in Figure 7(a). The ordering of sensitivity to noise in the pattern and constraint neurons is thus opposite for running time as compared to error probability.
Note that the results presented so far are for the regime where the noiseless decoder also succeeds and its average number of iterations is close to optimal (see Figure 7). Figure 9 illustrates the number of iterations performed by Algorithm 2 when correcting external errors at a higher fixed $p_e$. In this case, the noiseless decoder encounters stopping sets while the noisy decoder is still capable of correcting the external errors. Here we see that the optimal running time occurs when the neurons have a fair amount of internal noise. Figs. 9(b) and 9(a) are projected versions of Figure 9 and show the average number of iterations as a function of $\nu$ and $\upsilon$, respectively.
IV-C Effect of internal noise on performance in the absence of external errors
Now we provide results for a slightly modified setting in which there is only internal noise and no external errors, and the thresholds are set so that $\varphi < \nu$ and $\psi < \upsilon$. Thus, the internal noise can now cause neurons to make wrong decisions, even in the absence of external errors. With some abuse of notation, we assume each pattern neuron is corrupted by an additive $\pm 1$ term with some fixed probability. The rest of the model setting is the same as before.
Figure 11 illustrates the effect of internal noise as a function of and , the noise parameters at the pattern and constraint nodes, respectively. This behavior is shown in Figs. 11(a) and 11(b) for closer inspection. Here we witness the more familiar phenomenon in which increasing the amount of internal noise degrades performance. This finding emphasizes the importance of choosing the update thresholds and properly, according to Lemma 1. See Appendix C for details on choosing these thresholds.
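A minimal Monte Carlo sketch of this setting, assuming a toy neuron model in which a pattern neuron fires spuriously whenever its bounded, uniform internal noise sample exceeds the update threshold. The names `upsilon` (noise bound) and `phi` (threshold) are illustrative stand-ins, not our notation:

```python
import random

def trial(n, upsilon, phi, rng):
    """One recall trial with internal noise only: recall succeeds iff
    no pattern neuron's noise sample exceeds the threshold phi."""
    flips = sum(1 for _ in range(n)
                if abs(rng.uniform(-upsilon, upsilon)) > phi)
    return flips == 0

rng = random.Random(0)
# More internal noise -> lower success rate, as in Figure 11.
rates = [sum(trial(10, u, 0.25, rng) for _ in range(500)) / 500
         for u in (0.1, 0.3, 0.5)]
```

Note that when the noise bound stays below the threshold, no neuron ever fires spuriously and recall always succeeds, which is the intuition behind choosing the thresholds according to Lemma 1.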
We have demonstrated that associative memories with exponential capacity still work reliably even when built from unreliable hardware, addressing a major problem in fault-tolerant computing and further arguing for the viability of associative memory models for the (noisy) mammalian brain. After all, brain regions modeled as associative memories, such as the hippocampus and the olfactory cortex, certainly do display internal noise [18, 46, 21].
The linear-nonlinear computations of Algorithm 1 are nearly identical to those of message-passing algorithms such as belief propagation, and are certainly biologically plausible [47, 48, 49, 50, 51, 52, 53]. The state reversion computation of Algorithm 2 requires keeping a state variable for a short amount of time, which has been suggested as realistic for biological neurons , but the general biological plausibility of Algorithm 2 remains an open question.
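The linear-nonlinear form can be sketched as a weighted sum followed by a thresholded nonlinearity. The following is an illustrative reconstruction in the spirit of Algorithm 1, not its exact update rules; `psi` and `phi` stand in for the firing thresholds:

```python
import numpy as np

def constraint_feedback(W, x, psi):
    """Constraint neurons: linear step (weighted sum of pattern states)
    followed by a nonlinear step (sign, gated by threshold psi)."""
    s = W @ x
    return np.where(np.abs(s) > psi, np.sign(s), 0.0)

def pattern_update(W, x, feedback, phi):
    """Pattern neurons: average the feedback from constraint neighbors
    and reverse state only when the evidence exceeds phi."""
    deg = np.count_nonzero(W, axis=0)
    evidence = (W.T @ feedback) / np.maximum(deg, 1)
    return x - np.where(np.abs(evidence) > phi, np.sign(evidence), 0.0)

# One corrupted entry (last coordinate of [1, -1, 1] zeroed out) is
# restored after a single round of message passing.
W = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]])
x = np.array([1.0, -1.0, 0.0])
x_new = pattern_update(W, x, constraint_feedback(W, x, 0.5), 0.6)
```

Each neuron performs only a weighted sum and a comparison against a threshold, which is what makes the scheme a candidate for biologically plausible hardware.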
We found a threshold phenomenon for reliable operation, which manifests the tradeoff between the amount of internal noise and the amount of external noise that the system can handle. In fact, we showed that internal noise actually improves the performance of the network in dealing with external errors, up to some optimal value. This is a manifestation of the stochastic facilitation or noise enhancement phenomenon that has been observed in other neuronal and signal processing systems, providing a functional benefit to variability in the operation of neural systems.
The associative memory design developed herein uses thresholding operations in the message-passing algorithm for recall; as part of our investigation, we optimized these neural firing thresholds based on the statistics of the internal noise. As noted by Sarpeshkar in describing the properties of analog and digital computing circuits, "In a cascade of analog stages, noise starts to accumulate. Thus, complex systems with many stages are difficult to build. [In digital systems] Round-off error does not accumulate significantly for many computations. Thus, complex systems with many stages are easy to build" . One key to our result is capturing this benefit of digital processing (thresholding to prevent the buildup of errors due to internal noise), together with a modular architecture that allows us to correct a number of external errors that is linear in the pattern length.
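Sarpeshkar's observation can be illustrated with a toy cascade: without restoration, per-stage Gaussian noise accumulates, while a threshold at each stage restores the signal to a clean level and wipes out small perturbations. All names and parameter values below are illustrative:

```python
import random

def cascade(n_stages, sigma, threshold=None, rng=random):
    """Pass the value 1.0 through a cascade of noisy stages."""
    x = 1.0
    for _ in range(n_stages):
        x += rng.gauss(0.0, sigma)          # per-stage internal noise
        if threshold is not None:           # digital-style restoration
            x = 1.0 if x > threshold else 0.0
    return x

rng = random.Random(1)
analog = cascade(50, 0.05, threshold=None, rng=rng)    # noise accumulates
digital = cascade(50, 0.05, threshold=0.5, rng=rng)    # noise wiped out
```

The thresholded cascade returns the clean value because each stage's noise is far too small to cross the decision boundary, mirroring the role of the firing thresholds in our recall algorithm.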
This paper focused on recall; however, learning is the other critical stage of associative memory operation. Indeed, information storage in nervous systems is said to be subject to storage (or learning) noise, in situ noise, and retrieval (or recall) noise [17, Figure 1]. It should be noted, however, that there is no essential loss in combining learning noise and in situ noise into what we have called external error herein; cf. [43, Fn. 1 and Prop. 1]. Thus our basic qualitative result extends to the setting where the learning and storage phases are also performed with noisy hardware.
Going forward, it is of interest to investigate other neural information processing models that explicitly incorporate internal noise, to see whether they provide insight into observed empirical phenomena. As an example, we might be able to explain the threshold phenomenon observed in the symbol error rate of human telegraph operators under heat stress [56, Figure 2] by invoking a thermal internal noise explanation. Returning to engineering, internal noise in decoders for limited-length error-correcting codes may improve performance, as observed herein, since stopping sets are a limiting phenomenon in that setting as well.
We thank S. S. Venkatesh for telling us about .
Appendix A Illustrating Proof of Theorem 2
Figure 12(a) illustrates an example of a stopping set over the graph in our empirical studies. For clarity, only the nodes corrupted with external noise are shown. Pattern neurons that are connected to at least one cluster with a single error are colored blue; the other pattern neurons are colored red. Figure 12(b) illustrates the same network after a number of decoding iterations that result in the algorithm getting stuck. We are left with a stopping set in which no cluster has a single error, and the algorithm cannot proceed further since for in a noiseless architecture. Thus, the external errors cannot be corrected.
As is evident from Figure 13, the stopping set results from clusters not being able to correct more than one external error; this is where internal noise may come to the rescue. Interestingly, an "unreliable" neural circuit in which could easily escape the stopping set shown in Figure 12(b) and correct all of the external errors, because we attempt several times to correct the errors in a cluster (and overall in the network) while ensuring that the algorithm does not introduce new errors of its own. Thus, the noise may act in our favor in one of these attempts, allowing the algorithm to avoid the stopping set, as depicted in Figure 13.
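A toy peeling decoder makes the mechanism concrete: a cluster can fix a node only when that node is the cluster's single error, so the configuration below is a stopping set for the noiseless decoder, while random neuron flips (internal noise) let the noisy decoder escape it. The construction is purely illustrative, not the graph used in our experiments:

```python
import random

def recall(errors, clusters, noise_p, rng, t_max=200):
    """Peel single-error clusters; when stuck, internal noise may flip a
    random neuron, changing the error pattern and breaking the deadlock."""
    nodes = sorted(set().union(*clusters))
    err = set(errors)
    for _ in range(t_max):
        if not err:
            return True                      # all external errors corrected
        progress = False
        for c in clusters:
            bad = err & c
            if len(bad) == 1:                # cluster with a single error
                err -= bad
                progress = True
        if not progress:
            if noise_p == 0.0:
                return False                 # noiseless decoder is stuck
            if rng.random() < noise_p:
                v = rng.choice(nodes)
                err ^= {v}                   # noise flips a random neuron
    return not err

# Every cluster sees exactly two errors: a stopping set for noise_p = 0.
clusters = [{0, 1}, {1, 2}, {2, 3}, {3, 0}]
stuck = recall({0, 1, 2, 3}, clusters, 0.0, random.Random(0))
```

Here a single noise-induced flip leaves some cluster with exactly one error, after which peeling completes, which is precisely the "noise acting in our favor" effect described above.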
Appendix B Estimating Theoretically
To bound , consider four event probabilities for a cluster:
(resp. ): The probability that a constraint neuron (resp. pattern neuron) in cluster makes a wrong decision due to its internal noise when there is no external noise introduced to cluster , i.e. .
(resp. ): The probability that a constraint neuron (resp. pattern neuron) in cluster makes a wrong decision due to its internal noise when one input error (external noise) is introduced, i.e. .
We derive an upper bound on the probability a constraint node makes a mistake in the presence of one external error.
In the presence of a single external error, the probability that a constraint neuron makes a wrong decision due to its internal noise is given by
where is the minimum absolute value of the non-zero weights in the neural graph and is chosen such that . This condition can be enforced during simulations as long as is not too large, which itself is determined by the level of constraint neuron internal noise, , as we must have .
Without loss of generality, assume that it is the first pattern node, , that is corrupted with noise . We now calculate the probability that a constraint node makes a mistake in these circumstances. We need only analyze the constraint neurons connected to , since the situation for the other constraint neurons is the same as when there is no external error. For a constraint neuron connected to , the decision parameter is
We consider two error events:
A constraint node makes a mistake and does not send a message at all. The probability of this event is denoted by .
A constraint node makes a mistake and sends a message with the opposite sign. The probability of this event is denoted by .
We first calculate the probability of . Without loss of generality, assume that , so that the probability of an error of the second type is as follows (the case for is exactly the same):
However, since and , we have and . Therefore, a constraint neuron will never send a message with the opposite sign to the one it should send. All that remains is to calculate the probability that it remains silent by mistake.
To this end, we have
This can be simplified if we assume that the absolute values of all weights in the network are larger than a constant . The above equation then simplifies to
Putting the above equations together, we obtain:
In the case , we could even make this probability equal to zero. However, we leave it as is and use (15) to calculate .
We start by calculating the probability that a non-corrupted pattern node makes a mistake, i.e., changes its state in round . Let us denote this probability by . To calculate , assume that has degree and common neighbors with , the corrupted pattern node.
Out of these common neighbors, will send messages and the others will mistakenly send nothing. Thus, the decision parameter of pattern node , , is bounded by
We denote by for brevity from this point on.
In this circumstance, a mistake happens when . Thus
where represents the neighborhood of pattern node among constraint nodes.
By simplifying (16) we get
We now average this equation over , , and . To start, suppose that out of the non-zero messages node receives, of them have the same sign as the link over which they are transmitted. Thus, we have . Assuming the probability of having the same sign is for each message, the probability of having equal signs out of elements is . Thus, we get
Now note that the probability of having mistakes from the constraint side is given by . With some abuse of notation we get:
Finally, the probability that and have common neighbors can be approximated by , where is the average degree of the pattern nodes. Thus (again with some abuse of notation), we obtain:
where is given by (16), is the probability of having common neighbors, which is estimated by with being the average degree of pattern nodes in cluster , and is the probability of having out of these nodes making mistakes. Hence, . We do not simplify the above equation any further, and use it as is in our numerical analysis in order to obtain the best parameter .
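The averaging over binomial counts used repeatedly in this appendix (sign matches among incoming messages, mistakes on the constraint side, common neighbors) follows one numerical pattern, sketched below with hypothetical function names:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def avg_over_count(n, p, conditional):
    """Average a conditional error probability conditional(k) over a
    Binomial(n, p) count k, as done for the counts in the text."""
    return sum(binom_pmf(k, n, p) * conditional(k) for k in range(n + 1))
```

As a sanity check, averaging `conditional(k) = k / n` recovers `p`, since the binomial mean is `n * p`; in the numerical analysis the conditional term is instead the per-configuration mistake probability.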
We now turn our attention to the probability that the corrupted node, , makes a mistake, which is either not to update at all or to update itself in the wrong direction. Recalling that we have assumed the external noise term in to be a noise, the wrong direction for node is to increase its current value instead of decreasing it. Furthermore, we assume that out of the neighbors of , some have made a mistake and will not send any messages to . Thus, the decision parameter of is . Denoting by the probability of a mistake at , we get:
which simplifies to
Noting that the probability of making mistakes on the constraint side is , we get
where is given by (21).
Putting the above results together, the overall probability of a mistake on the pattern-neuron side, when there is one bit of external noise, is
Finally, the probability that cluster corrects a single external error is the probability that all of its neurons take the correct decision, i.e.
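Numerically, this final expression is a product of per-neuron correctness probabilities, assuming the independence used throughout the appendix. The function below is a sketch with hypothetical argument names, not our exact formula:

```python
def cluster_success_prob(n_pattern, n_constraint,
                         p_pattern_err, p_constraint_err):
    """Probability that every neuron in the cluster decides correctly,
    i.e. that the cluster corrects its single external error."""
    return ((1.0 - p_pattern_err) ** n_pattern
            * (1.0 - p_constraint_err) ** n_constraint)
```

The product form makes the tradeoff explicit: driving either per-neuron mistake probability down (e.g. by tuning the firing thresholds) raises the cluster's correction probability toward one.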