I Introduction
Neural networks are famous for their ability to learn and reliably perform a required task. An important example is the case of (associative) memory where we are asked to memorize (learn) a set of given patterns. Later, corrupted versions of the memorized patterns will be shown to us and we have to return the correct memorized patterns. In essence, this problem is very similar to the one faced in communication systems where the goal is to reliably transmit and efficiently decode a set of patterns (so called codewords) over a noisy channel.
As one would naturally expect, reliability is a very important issue both in neural associative memories and in communication systems. Indeed, the last three decades have witnessed many reliable artificial associative neural networks; see, for instance, [4], [13], [14], [10], [12], [18].
However, despite common techniques and methods deployed in both fields (e.g., graphical models, iterative algorithms, etc.), there has been a quantitative difference in terms of another important criterion: efficiency. Over the past decade, by using probabilistic graphical models in communication systems, it has become clear that the number of patterns that can be reliably transmitted and efficiently decoded over a noisy channel is exponential in n, the length of the codewords [20]. However, using current neural networks of size n to memorize a set of randomly chosen patterns, the maximum number of patterns that can be reliably memorized scales linearly in n [11], [13].
There are multiple reasons for the inefficiency of the storage capacity of neural networks. First, neurons can only perform simple operations. As a result, most of the techniques used in communication systems (more specifically in coding theory) for achieving exponential storage capacity are prohibitive in neural networks. Second, a large body of past work (e.g., [4], [13], [14], [10]) followed a common assumption that a neural network should be able to memorize any subset of patterns drawn randomly from the set of all possible vectors of length n. Although this assumption gives the network a sense of generality, it reduces its storage capacity to a great extent.
An interesting question which arises in this context is whether one can increase the storage capacity of neural networks beyond the current linear scaling and achieve results similar to coding theory. To this end, Kumar et al. [2] suggested a new formulation of the problem where only a suitable set of patterns was considered for storing. This way they could show that the storage capacity of neural networks increases significantly. Following the same philosophy, we will focus on memorizing a random subset of patterns of length n such that the dimension of the training set is smaller than n. In other words, we are interested in memorizing a set of patterns that have a certain degree of structure and redundancy. We exploit this structure both to increase the number of patterns that can be memorized (from linear to exponential) and to increase the number of errors that can be corrected when the network is faced with corrupted inputs.
The success of [2] is mainly due to forming a bipartite network/graph (as opposed to a complete graph) whose role is to enforce the suitable constraints on the patterns, very similar to the role played by Tanner graphs in coding. More specifically, one layer is used to feed the patterns to the network (so-called variable nodes in coding) and the other takes into account the inherent structure of the input patterns (so-called check nodes in coding). A natural way to enforce structure on the inputs is to assume that the connectivity matrix of the bipartite graph is orthogonal to all of the input patterns. However, the authors in [2] rely heavily on the fact that the bipartite graph is fully known and given, and satisfies some sparsity and expansion properties. The expansion assumption is made to ensure that the resulting set of patterns is resilient against a fair amount of noise. Unfortunately, no algorithm for finding such a bipartite graph was proposed.
Our main contribution in this paper is to relax the above assumptions while achieving better error correction performance. More specifically, we first propose an iterative algorithm that can find a sparse bipartite graph that satisfies the desired set of constraints. We also provide an upper bound on the block error rate of the method that deploys this learning strategy. We then proceed to devise a multilayer network whose error tolerance improves significantly upon [2] and whose connectivity graph no longer needs to be an expander.
The remainder of this paper is organized as follows. In Section II we formally state the problem that is the focus of this work, namely neural association for a network of nonbinary neurons. We then provide an overview of the related work in this area in Section II-A. We present our pattern learning algorithm in Section III and the multilevel network design in Section IV. The pattern retrieval capacity is analyzed in Section V. The simulations supporting our analytical results are shown in Section VI. Finally, future work is discussed in Section VII.
II Problem Formulation
In contrast to the mainstream work in neural associative memories, we focus on nonbinary neurons, i.e., neurons that can assume a finite set of integer values for their states (where ). A natural way to interpret the multilevel states is to think of the short-term (normalized) firing rate of a neuron as its output. Neurons can only perform simple operations: we restrict the operations at each neuron to a linear summation over the inputs and a possibly nonlinear thresholding operation. In particular, a neuron updates its state based on the states of its neighbors as follows:

It computes the weighted sum where denotes the weight of the input link from .

It updates its state as where is a possibly nonlinear function from the field of real numbers to .
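The two-step update above can be sketched in a few lines of Python. This is only an illustration: the names `neuron_update` and `clip_round` are hypothetical, and the particular nonlinearity (rounding, then clipping to the states 0 to `num_states - 1`) is an assumption, since the text only requires a possibly nonlinear map from the reals to the set of admissible states.

```python
import numpy as np

def clip_round(h, num_states):
    """Assumed nonlinearity: round to the nearest integer, clip to 0..num_states-1."""
    return int(np.clip(np.rint(h), 0, num_states - 1))

def neuron_update(weights, neighbor_states, num_states):
    """One update of a single neuron: weighted sum, then nonlinear map."""
    h = float(np.dot(weights, neighbor_states))  # step 1: weighted sum of inputs
    return clip_round(h, num_states)             # step 2: map the sum to a valid state

# Illustrative (made-up) weights and neighbor states.
weights = np.array([0.5, -1.0, 2.0])
states = np.array([4, 1, 3])
new_state = neuron_update(weights, states, num_states=8)
```

Both steps use only a dot product and a pointwise map, which is the kind of simple operation the model allows at each neuron.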
Neural associative memory aims to memorize patterns of length by determining the weighted connectivity matrix of the neural network (learning phase) such that the given patterns are stable states of the network. Furthermore, the network should be able to tolerate a fair amount of noise so that it can return the correct memorized pattern in response to a corrupted query (recall phase). Among the networks with these two abilities, the one with largest is the most desirable.
We first focus on learning the connectivity matrix of a neural graph which memorizes a set of patterns having some inherent redundancy. More specifically, we assume to have vectors of length with nonnegative integer entries, where these patterns form a subspace of dimension . We would like to memorize these patterns by finding a set of nonzero vectors that are orthogonal to the set of given patterns. Furthermore, we are interested in rather sparse vectors. Putting the training patterns in a matrix and focusing on one such vector , we can formulate the problem as:
(1)
subject to
(2) 
where determines the degree of sparsity and prevents the all-zero solution. A solution to the above problem yields a sparse bipartite graph which corresponds to the basis vectors of the null space specified by the patterns in the training set. In other words, the inherent structure of the patterns is captured in terms of linear constraints on the entries of the patterns in the training set. It can therefore be described by Figure 1 with a connectivity matrix such that for all .
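To make the null-space picture concrete, the following Python sketch builds a small hypothetical training set that lies in a low-dimensional subspace and extracts a basis of vectors orthogonal to every pattern. It uses an offline SVD purely as a reference check; it is not the neural learning rule of this paper, and the vectors it returns are generally dense rather than sparse. The names `G`, `U`, `X`, `W` and all sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, num_patterns = 8, 3, 20

# Hypothetical training set: each pattern is a nonnegative integer
# combination of k basis rows, so all patterns lie in a subspace of
# dimension at most k.
G = rng.integers(0, 3, size=(k, n))
U = rng.integers(0, 4, size=(num_patterns, k))
X = U @ G                                  # one pattern per row

rank = np.linalg.matrix_rank(X)
_, _, Vt = np.linalg.svd(X)                # rows of Vt are right singular vectors
W = Vt[rank:]                              # rows of W span the null space of X

# Largest absolute inner product between any pattern and any constraint vector.
residual = np.abs(X @ W.T).max()
```

Every row of `W` is (numerically) orthogonal to every pattern, so `W` plays the role of a connectivity matrix for the constraint neurons; the learning problem in the text additionally asks for sparse rows.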
In the recall phase, the neural network is fed with noisy inputs. A possibly noisy version of an input pattern is initialized as the states of the pattern neurons . Here, we assume that the noise is integer-valued and additive; neural states that fall below the smallest or above the largest admissible value are set to that smallest or largest value, respectively. In formula, we have where is the noise added to pattern and we used the fact that . Therefore, one can use to eliminate the input noise . Consequently, we are searching for an algorithm that can provably eliminate the effect of noise and return the correct pattern.
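The observation that the constraint neurons see only the noise can be checked numerically. In the sketch below, the constraint matrix `W`, the pattern `x` and the noise `z` are small hypothetical examples chosen so that `W @ x` is zero; they are not taken from the paper's simulations.

```python
import numpy as np

# Hypothetical constraint matrix: each row forces two neighboring entries to be equal.
W = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [0, 0, 1, -1]])
x = np.array([3, 3, 3, 3])        # memorized pattern, orthogonal to every row of W
z = np.array([0, 1, 0, 0])        # integer-valued additive noise on one neuron

syndrome_clean = W @ x            # all zeros for a memorized pattern
syndrome_noisy = W @ (x + z)      # depends on the noise only: equals W @ z
```

Because the memorized pattern contributes nothing to the product, any recall algorithm can work with `W @ z` alone, which is the quantity the constraint neurons compute.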
Remark 1.
A solution in the learning/recall phase is acceptable only if it can be found by simple operations at neurons.
Before presenting our solution, we briefly overview the relation between the previous works and the one presented in this paper.
II-A Related Works
Designing a neural network capable of learning a set of patterns and recalling them later in the presence of noise has been an active topic of research for the past three decades. Inspired by the Hebbian learning rule [8], Hopfield in his seminal work [4] introduced the Hopfield network: an autoassociative neural mechanism of size with binary state neurons in which patterns are assumed to be binary vectors of length . The capacity of a Hopfield network under vanishing bit error probability was later shown to be by Amit et al. [6]. Later on, McEliece et al. proved that the capacity of Hopfield networks under the vanishing block error probability requirement is [11]. Similar results were obtained for sparse regular neural networks in [9]. It is also known that the capacity of neural associative memories could be enhanced if the patterns are sparse in the sense that at any time instant many of the neurons are silent [7]. However, even these schemes fail when required to correct a fair amount of erroneous bits, as their information retrieval is no better than that of normal networks.
In addition to neural networks capable of learning patterns gradually, in [13], the authors calculate the weight matrix offline (as opposed to gradual learning) using the pseudo-inverse rule [7], which in turn helps them improve the capacity of a Hopfield network to random patterns with the ability of one-bit error correction.
Due to the low capacity of Hopfield networks, extension of associative memories to nonbinary neural models has also been explored in the past. Hopfield addressed the case of continuous neurons and showed that, similar to the binary case, neurons with states between and can memorize a set of random patterns, albeit with less capacity [5]. In [14] the authors investigated multistate complex-valued neural associative memories for which the estimated capacity is . Under the same model but using a different learning method, Muezzinoglu et al. [10] showed that the capacity can be increased to . However, the complexity of the weight computation mechanism is prohibitive. To overcome this drawback, a Modified Gradient Descent learning Rule (MGDR) was devised in [15].
Given that even very complex offline learning methods cannot improve the capacity of binary or multistate Hopfield networks, a line of recent work has made considerable efforts to exploit the inherent structure of the patterns in order to increase both capacity and error correction capabilities. Such methods either make use of higher-order correlations of patterns or focus merely on those patterns that have some sort of redundancy. As a result, they differ from previous methods for which every possible random set of patterns was considered. Pioneering this prospect, Berrou and Gripon [18] achieved considerable improvements in the pattern retrieval capacity of Hopfield networks by utilizing clique-based coding. In some cases, the proposed approach results in capacities of around , which is much larger than in other methods. In [12], the authors used low correlation sequences similar to those employed in CDMA communications to increase the storage capacity of Hopfield networks to without requiring any separate decoding stage.
In contrast to the pairwise correlation of the Hopfield model [4], Peretto et al. [17] deployed higher-order neural models: the state of the neurons not only depends on the state of their neighbors, but also on the correlation among them. Under this model, they showed that the storage capacity of a higher-order Hopfield network can be improved to , where is the degree of correlation considered. The main drawback of this model was again the huge computational complexity required in the learning phase. To address this difficulty while being able to capture higher-order correlations, a bipartite graph inspired by iterative coding theory was introduced in [2]. Under the assumptions that the bipartite graph is known, sparse, and an expander, the proposed algorithm increased the pattern retrieval capacity to , for some . The main drawbacks of the proposed approach are the lack of a learning algorithm and the assumption that the weight matrix should be an expander. The sparsity criterion, on the other hand, as noted by the authors, is necessary in the recall phase and biologically more meaningful.
In this paper, we focus on solving the above two problems in [2]. We start by proposing an iterative learning algorithm that identifies a sparse weight matrix . The weight matrix should satisfy a set of linear constraints for all the patterns in the training data set, where . We then propose a novel network architecture which eliminates the need for the expansion criterion while achieving better performance than the error correction algorithm proposed in [2].
Constructing a factor-graph model for neural associative memory has also been addressed in [22]. However, there, the authors propose a general message-passing algorithm to memorize any set of random patterns, while we focus on memorizing patterns belonging to subspaces, with sparsity in mind as well. The difference would again be apparent in the pattern retrieval capacity (linear vs. exponential in network size).
Learning linear constraints by a neural network is hardly a new topic, as one can learn a matrix orthogonal to a set of patterns in the training set (i.e., ) using simple neural learning rules (we refer the interested readers to [3] and [16]). However, to the best of our knowledge, finding such a matrix subject to the sparsity constraints has not been investigated before. This problem can also be regarded as an instance of compressed sensing [21], in which the measurement matrix is given by the big patterns matrix and the set of measurements are the constraints we look to satisfy, denoted by the tall vector , which for simplicity we assume to be all zero. Thus, we are interested in finding a sparse vector such that .
III Learning Algorithm
We are interested in an iterative algorithm that is simple enough to be implemented by a network of neurons. Therefore, we first relax (II) as follows:
(3) 
In the above problem, we have approximated the constraint with since is not a well-behaved function. The function is chosen such that it favors sparsity. For instance, one can pick to be , which leads to norm minimizations. In this paper, we consider the function where is chosen appropriately. By calculating the derivative of the objective function and using primal-dual optimization techniques, we obtain the following iterative algorithm for (3):
(4) 
(5) 
(6) 
(7) 
where denotes the iteration number, is the transpose of matrix , and are small step sizes and denotes .
For our choice of , the entry of the function , denoted by reduces to . For very small values of , and for large values of , . Therefore, by looking at (5) we see that the last term is pushing small values in towards zero while leaving the larger values intact. Therefore, we remove the last term completely and enforce small entries to zero in each update which in turn enforces sparsity. The final iterative learning procedure is shown in Algorithm 1.
Here, is a positive threshold at iteration and is the pointwise soft-thresholding function given below:
(8) 
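As an illustration of the thresholding step, the sketch below implements the standard pointwise soft-thresholding operator used in sparse recovery, as in [1]. It is assumed here, not confirmed by the text, that (8) has this standard form; the function name `soft_threshold` and the sample values are hypothetical.

```python
import numpy as np

def soft_threshold(t, theta):
    """Pointwise soft-thresholding (assumed standard form).

    Entries with magnitude at most theta become zero; all other entries
    are shrunk toward zero by theta.
    """
    t = np.asarray(t, dtype=float)
    return np.sign(t) * np.maximum(np.abs(t) - theta, 0.0)

# Illustrative weight vector: small entries vanish, large ones shrink.
w = np.array([0.9, -0.05, 0.2, -0.6, 0.01])
w_sparse = soft_threshold(w, theta=0.1)
```

Applying such an operator after each weight update zeroes the small entries while only slightly shrinking the large ones, which is the mechanism by which the learning procedure favors sparse solutions.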
Remark 2.
The above choice of soft-thresholding function is very similar to the one selected by Donoho et al. in [1] in order to recover a sparse signal from a set of measurements. The authors prove that their choice of soft-threshold function results in an optimal sparsity-undersampling tradeoff.
The next theorem derives the necessary conditions on , and such that Algorithm 1 converges to a sparse solution.
Theorem 1.
If as and if is bounded above by , then there is a proper choice of in every iteration that ensures constant decrease in the objective function . Here and . For , i.e. , picking ensures gradual convergence.
Sketch of the proof.
Let . We would like to show that for all iterations . To this end, let us denote by . Furthermore, let the function be . Rewriting the second step of Algorithm 1, we have:
(9) 
Now we have
(10)  
where the last inequality follows because . Now expanding we will get
(11)  
Denoting the matrix by , we can further simplify inequality (10):
(12)  
where . Therefore, if we set (i.e. as ) and ensure , we get . The second requirement demands that for all elements of . Therefore, by letting we must have the following relationship for diagonal elements:
(13) 
which yields
Since for all and , the right-hand side of the above inequality is satisfied if . The left-hand side is satisfied for , where . Therefore, if there exists and ensuring . If , this is simply equivalent to having .
∎
IV Multilevel Network Architecture
In the previous section, we discussed the details of a simple iterative learning algorithm which yields rather sparse graphs. Now, for the recall phase, we propose a network structure together with a simple error correction algorithm (similar to the one in [2]) to achieve good block error rates in response to noisy input patterns. The suggested network architecture is shown in Figure 2. To keep the description clear and simple, we concentrate only on a two-level neural network. However, the generalization of this idea is trivial and left to the reader.
The proposed approach is in contrast to the one suggested in [2], where the authors exploit a single-level neural network with a sparse and expander connectivity graph to correct at least two initial input errors. However, enforcing expansion on connectivity graphs in a gradual neural learning algorithm is extremely difficult, especially when the algorithm is required to be very simple. Therefore, we use the learning algorithm explained above, which yields a rather sparse and not necessarily expander graph, and improve the error correction capabilities by modifying the network structure and the error correcting algorithm.
Note that in practice, we replace the condition and with and for some small positive number .
The idea behind this new architecture is that we divide the input pattern of size into subpatterns of length . Now we feed each subpattern to a neural network which enforces constraints on the subpattern in order to correct the input errors. (The number of constraints for different networks can vary; for simplicity of notation we assume equal sizes.) The local networks in the first level and the global network in the second level use Algorithm 2, which is a variant of the "bit-flipping" method proposed in [2], to correct the errors. Intuitively, if the states of the pattern neurons correspond to a pattern from (i.e., the noise-free case), then for all we have . The quantity can be interpreted as feedback to pattern neuron from the constraint neurons. Hence, the sign of provides an indication of the sign of the noise that affects , and indicates the confidence level in the decision regarding the sign of the noise.
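The feedback mechanism described above can be sketched in Python as follows. Since Algorithm 2 is not reproduced in this text, the exact update below is an assumption in the spirit of the bit-flipping idea of [2]: constraint neurons send the sign of their weighted input sum, each pattern neuron normalizes the feedback it receives by its weighted degree, and only neurons whose feedback magnitude exceeds a threshold are moved by one unit against the sign of the feedback. The function name `recall`, the threshold `phi`, the matrix `W` and the test patterns are all hypothetical.

```python
import numpy as np

def recall(W, x_noisy, phi=0.8, max_iters=10):
    """Sign-feedback error correction sketch (assumed form, not the paper's Algorithm 2)."""
    x = np.array(x_noisy, dtype=int)
    degree = np.abs(W).sum(axis=0)            # weighted degree of each pattern neuron
    for _ in range(max_iters):
        y = W @ x                             # constraint neurons: weighted input sums
        if not y.any():
            break                             # all constraints satisfied, stop
        g = (W.T @ np.sign(y)) / degree       # normalized feedback to pattern neurons
        update = np.abs(g) > phi              # act only where confidence is high
        x[update] -= np.sign(g[update]).astype(int)
    return x

# Hypothetical sparse constraint matrix whose null space contains constant patterns.
W = np.array([[1, -1, 0, 0],
              [0, 1, -1, 0],
              [0, 0, 1, -1],
              [1, 0, 0, -1]])
x_true = np.array([3, 3, 3, 3])
fixed_plus = recall(W, [4, 3, 3, 3])          # +1 error on the first neuron
fixed_minus = recall(W, [3, 3, 2, 3])         # -1 error on the third neuron
```

In both runs only the corrupted neuron receives feedback of magnitude 1, while its neighbors receive 0.5 and stay unchanged. A correct neuron would be updated by mistake only if it shared all of its constraint neighbors with the noisy one, which is the failure event analyzed in the proof of Theorem 2.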
Theorem 2.
Algorithm 2 can correct a single error in the input pattern with high probability if is chosen large enough.
Proof.
In the case of a single error, we are sure that the corrupted node will always be updated towards the correct direction. For simplicity, let us assume the first pattern neuron is the noisy one. Furthermore, let be the noise vector. Denoting the column of the weight matrix by , we will have . Then in Algorithm 2 . This means that the noisy node gets updated towards the correct direction.
Therefore, the only source of error would be a correct node being updated mistakenly. Let denote the probability that a correct pattern neuron gets updated. This happens if . For , this is equivalent to having . Note that in cases that the neighborhood of is different from the neighborhood of among the constraint nodes. More specifically, in the case that , there are nonzero entries in while is zero and vice versa. Therefore, letting be the probability of , we note that
Therefore, to get an upper bound on , we bound .
Let be the fraction of pattern neurons with degree , be the average degree of pattern neurons and finally be the minimum degree of pattern neurons. Then, we know that a noisy pattern neuron is connected to constraint neurons on average. Therefore, the probability that and share exactly the same neighborhood would be:
(14) 
Taking the average over the pattern neurons, we have
(15)  
where is the set of correct nodes at iteration and .
Therefore, the probability of correcting one noisy input, would be
(16)  
∎
Given that each local network is able to correct one error, such networks can correct input errors if they are separated such that only one error appears in the input of each local network. Otherwise, there would be a probability that the network could not handle the errors. In that case, we feed the overall pattern of length to the second layer with the connectivity matrix , which enforces global constraints. Since the probability of correcting two erroneous nodes increases with the input size, we expect to have a better error correction probability in the second layer. Therefore, using this simple scheme we expect a considerable gain in correcting errors in the patterns. In the next section, we provide simulation results which confirm our expectations and show that the block error rate can be improved by a factor of in some cases.
IV-A Some remarks
First of all, one should note that the above method only works if there is some redundancy at the global level as well. If the set of weight matrices define completely separate subspaces in the dimensional space, then for sure we gain nothing using this method.
Secondly, there is no need for the dimensions of the subspaces to be equal to each other. We can have different lengths for the subpatterns belonging to each subspace and a different number of constraints for each particular subspace. This gives us more degrees of freedom as well, since we can spend some time finding the optimal length of each subpattern for a particular training data set.
Thirdly, the number of constraints for the second layer affects the gain one obtains in the error performance. Intuitively, if the number of global constraints is large, we are enforcing more constraints, so we expect to obtain a better error performance. We can even think of determining this number adaptively, i.e., if the error performance that we are getting is unacceptable, we can look deeper into the patterns to identify their internal structure by searching for more constraints. This will be a subject of our future research.
V Pattern Retrieval Capacity
The following theorem shows that the proposed neural architecture is capable of memorizing an exponential number of patterns.
Theorem 3.
Let be a matrix, formed by vectors of length with nonnegative integer entries between and . Furthermore, let for some . Then, there exists a set of such vectors for which , with , and , and such that they can be memorized by the neural network given in Figure 2.
Proof.
The proof is based on construction: we construct a data set with the required properties, namely the entries of patterns should be nonnegative, patterns should belong to a subspace of dimension and each subpattern of length belongs to a subspace of dimension .
To start, consider a matrix with rank and , with . Let the entries of be nonnegative integers, between and , with . Furthermore, let be the submatrices of , where comprises the columns to of . Finally, assume that in each submatrix we have exactly nonzero rows with .
We start constructing the patterns in the data set as follows: consider a random vector with integer-valued entries between and , where . We set the pattern to be , if all the entries of are between and . Obviously, since both and have only nonnegative entries, all entries in are nonnegative. Therefore, it is the upper bound that we have to worry about.
The entry in is equal to , where is the column of . Suppose has nonzero elements. Then, we have:
Therefore, denoting , we could choose , and such that
(17) 
to ensure all entries of are less than .
As a result, since there are vectors with integer entries between and , we will have patterns forming . This means , which is an exponential number in if . ∎
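The counting argument in this proof can be illustrated with a toy enumeration in Python. All concrete choices below are hypothetical: the generator matrix `G` (2 rows, 4 columns), the bound `upsilon` on the entries of the coefficient vector `u`, and the number of neural states `S`. The sketch builds every pattern of the form `u @ G`, keeps those whose entries stay below `S`, and counts them.

```python
import itertools
import numpy as np

# Hypothetical k x n generator with nonnegative integer entries (k = 2, n = 4).
G = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1]])
upsilon = 3     # assumed: entries of u range over 0 .. upsilon - 1
S = 3           # assumed: pattern entries must lie in 0 .. S - 1

patterns = []
for u in itertools.product(range(upsilon), repeat=G.shape[0]):
    x = np.array(u) @ G
    if x.max() < S:             # keep only patterns inside the allowed state range
        patterns.append(x)
X = np.array(patterns)          # one admissible pattern per row
```

Here all 3^2 = 9 coefficient vectors give distinct admissible patterns, and they span a subspace of dimension 2 inside a space of dimension 4. Letting the subspace dimension grow proportionally to the pattern length makes the number of such patterns exponential in the pattern length, which is the content of the theorem.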
VI Simulation Results
We have simulated the proposed learning algorithm in the multilevel architecture to investigate the block error rate of the suggested approach and the gain we obtain in error rates by adding a second level. We constructed local networks, each with pattern and constraint nodes.
VI-A Learning Phase
We generated a sample data set of patterns of length where each block of belonged to a subspace of dimension . Note that can be an exponential number in . However, we selected as an example to show the performance of the algorithm because even for small values of , an exponential number in will become too large to handle numerically. The result of the learning algorithm is four different local connectivity matrices as well as a global weight matrix . The number of local constraints was and the number of global constraints was , where is the dimension of the subspace for the overall pattern. The learning steps are repeated until of the patterns in the training set have converged. Table I summarizes other simulation parameters.
Parameter  (when )  

Value 
For cases where , was fixed to .
Table II shows the average number of iterations executed before convergence is reached for different constraint nodes at the local and global level. It also gives the average sparsity ratio for the columns of matrix . The sparsity ratio is defined as , where is the number of nonzero elements. From the table one notices that as increases, the vectors become sparser.
Sparsity Ratio  Convergence Rate  

Local  
Global 
VI-B Recall Phase
For the recall phase, in each trial we pick a pattern randomly from the training set, corrupt a given number of its symbols with noise and use the suggested algorithm to correct the errors. As mentioned earlier, the errors are corrected first at the local and then at the global level. When finished, we compare the output of the first and the second level with the original (uncorrupted) pattern . A pattern error is declared if the output does not match at each stage. Table III shows the simulation parameters in the recall phase.
Parameter  

Value 
Figure 3 illustrates the pattern error rates with two different values of and . The results are also compared to that of the bit-flipping algorithm in [2] to show the improved performance of the proposed algorithm. As one can see, having a larger number of constraints at the global level, i.e. having a smaller , will result in better pattern error rates at the end of the second stage. Furthermore, note that since we stop the learning after of the patterns have been learned, it is natural to see some recall errors even for initial erroneous node.
Table IV shows the gain we obtain by adding an additional second level to the network architecture. The gain is calculated as the ratio between the pattern error rate at the output of the first layer and the pattern error rate at the output of the second layer.
Number of initial errors  Gain for  Gain for 
VI-C Comparison with Previous Work
For the sake of completeness, Table V compares the proposed algorithm with the previous work from four different perspectives: the pattern retrieval capacity, the number of initial errors that can be corrected in the recall phase (denoted by ), the existence of an online iterative learning algorithm, and whether there are any other restrictions such as the focus of the algorithm on particular patterns with some redundancy. In all cases it is assumed that the length of patterns is .
Table V: Neural associative memories compared for a pattern of size .
Algorithm & Pattern retrieval capacity & Correctable initial errors & Learning? & Restrictions?
[4] & & & yes & no
[13] & & & no & no
[14] & & & no & no
[10] & & & no & no
[18] & [a] & & yes & PWR[b]
[17] & & & yes & no
[2] & [d] & & no & PWR[b], EG[e]
This paper & [d] & & yes & PWR[b]
[a] The authors do not provide an exact relationship for the pattern retrieval capacity. However, they show that for a particular setup with , we have .
[b] PWR stands for Patterns With Redundancy.
[c] is the order of correlations considered among patterns.
[d] is determined according to network parameters.
[e] EG stands for Expander Graphs.
VII Future Works
In order to extend the multilevel neural network, we must first find a way to generate patterns that belong to a subspace with dimensions , where lies within the bounds . This will give us a way to investigate the trade-off between the maximum number of memorizable patterns and the degree of error correction possible.
Furthermore, so far we have assumed that the second level enforces constraints in the same space. However, it is possible that the second level imposes a set of constraints in a totally different space. For this purpose, we need a mapping from one space into another. A good example is written language. While there are local constraints on the spelling of the words, there are also constraints enforced by the grammar or the overall meaning of a sentence. The latter constraints are not on the space of letters but rather on the space of grammar or meaning. Therefore, in order to correct an error in, for instance, the word _at, we can replace _ with either h, to get hat, or c, to get cat. Without any other clue, we cannot find the correct answer. However, let us say we have the sentence "The _at ran away". Then from the constraints in the space of meaning we know that the subject must be an animal or a person. Therefore, we can return cat as the correct answer. Finding a proper mapping is the subject of our future work.
Acknowledgment
The authors would like to thank Prof. Amin Shokrollahi for helpful comments and discussions. This work was supported by Grant 228021ECCSciEng of the European Research Council.
References
 [1] D. L. Donoho, A. Maleki, A. Montanari, Message passing algorithms for compressed sensing, Proc. Nat. Acad. Sci., Vol. 106, 2009, pp. 18914–18919.
 [2] K. R. Kumar, A. H. Salavati, A. Shokrollahi, Exponential pattern retrieval capacity with nonbinary associative memory, Proc. IEEE Inf. Theory Workshop, 2011, pp. 80–84.
 [3] L. Xu, A. Krzyzak, E. Oja, Neural nets for dual subspace pattern recognition method, Int. J. Neur. Syst., Vol. 2, No. 3, 1991, pp. 169–184.
 [4] J. J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci., Vol. 79, 1982, pp. 2554–2558.
 [5] J. J. Hopfield, Neurons with graded response have collective computational properties like those of two-state neurons, Proc. Natl. Acad. Sci., Vol. 81, No. 10, 1984, pp. 3088–3092.
 [6] D. Amit, H. Gutfreund, H. Sompolinsky, Storing infinite numbers of patterns in a spin-glass model of neural networks, Phys. Rev. Lett., Vol. 55, 1985, pp. 1530–1533.
 [7] J. Hertz, A. Krogh, R. G. Palmer, Introduction to the theory of neural computation, USA: Addison-Wesley, 1991.
 [8] D. O. Hebb, The organization of behavior, New York: Wiley & Sons, 1949.
 [9] J. Komlos, R. Paturi, Effect of connectivity in an associative memory model, J. Computer and System Sciences, 1993, pp. 350–373.
 [10] M. K. Muezzinoglu, C. Guzelis, J. M. Zurada, A new design method for the complex-valued multistate Hopfield associative memory, IEEE Trans. Neur. Net., Vol. 14, No. 4, 2003, pp. 891–899.
 [11] R. McEliece, E. Posner, E. Rodemich, S. Venkatesh, The capacity of the Hopfield associative memory, IEEE Trans. Inf. Theory, Jul. 1987.
 [12] A. H. Salavati, K. R. Kumar, W. Gerstner, A. Shokrollahi, Neural precoding increases the pattern retrieval capacity of Hopfield and bidirectional associative memories, IEEE Intl. Symp. Inform. Theory (ISIT-11), 2011, pp. 850–854.
 [13] S. S. Venkatesh, D. Psaltis, Linear and logarithmic capacities in associative neural networks, IEEE Trans. Inf. Theory, Vol. 35, No. 3, 1989, pp. 558–568.
 [14] S. Jankowski, A. Lozowski, J. M. Zurada, Complex-valued multistate neural associative memory, IEEE Trans. Neur. Net., Vol. 1, No. 6, 1996, pp. 1491–1496.
 [15] D. L. Lee, Improvements of complex-valued Hopfield associative memory by using generalized projection rules, IEEE Trans. Neur. Net., Vol. 12, No. 2, 2001, pp. 439–443.
 [16] E. Oja, T. Kohonen, The subspace learning algorithm as a formalism for pattern recognition and neural networks, Neural Networks, Vol. 1, 1988, pp. 277–284.
 [17] P. Peretto, J. J. Niez, Long term memory storage capacity of multiconnected neural networks, Biological Cybernetics, Vol. 54, No. 1, 1986, pp. 53–63.
 [18] V. Gripon, C. Berrou, Sparse neural networks with large learning diversity, IEEE Trans. on Neural Networks, Vol. 22, No. 7, 2011, pp. 1087–1096.
 [19] J. A. Tropp, S. J. Wright, Computational methods for sparse solution of linear inverse problems, Proc. IEEE, Vol. 98, No. 6, 2010, pp. 948–958.
 [20] T. Richardson, R. Urbanke, Modern Coding Theory, Cambridge University Press, 2008.
 [21] E. Candès, T. Tao, Near optimal signal recovery from random projections: Universal encoding strategies?, IEEE Trans. on Information Theory, Vol. 52, No. 12, 2006, pp. 5406–5425.
 [22] A. Braunstein, R. Zecchina, Learning by message-passing in networks of discrete synapses, Phys. Rev. Lett., Vol. 96, No. 3, 2006, pp. 030201-1–030201-4.
 [23] D. J. Amit, S. Fusi, Learning in neural networks with material synapses, Neur. Comp., Vol. 6, No. 5, 1994, pp. 957–982.