1 Introduction
Biological synaptic plasticity is hypothesized to be one of the main phenomena responsible for human learning and memory. One mechanism of synaptic plasticity is inspired by the Hebbian learning principle, which states that connections between two units, e.g., neurons, are strengthened when they are simultaneously activated. In artificial neural networks, implementations of Hebbian plasticity are known to learn recurring patterns of activations. Extensions of this rule, such as Oja’s rule [8] or the Generalized Hebbian rule, also called Sanger’s rule [14], have permitted the development of algorithms that have proved particularly efficient at tasks such as online dimensionality reduction. Two important properties of brain-inspired models, namely competitive learning [13] and sparse coding [9], can be obtained using Hebbian and anti-Hebbian learning rules. Such properties can be achieved with inhibitory connections, which extend the capabilities of such learning rules beyond simple extraction of the principal component of input data. The continuous and local update dynamics of Hebbian learning also make it suitable for learning from a continuous stream of data. Such an algorithm can take one image at a time with memory requirements that are independent of the number of samples.
This study employs Hebbian/anti-Hebbian learning rules derived from a similarity matching cost-function [11] and applies them to perform online unsupervised learning of features from multiple image datasets. The rule proposed in [11] is applied here for the first time to online feature learning for image classification with single and multi-layer architectures. The quality of the features is assessed visually and by performing classification with a linear classifier working on the learned features. The simulations show that a simple single-layer Hebbian network can outperform more complex models such as Sparse Autoencoders (SAE) and Restricted Boltzmann Machines (RBM) on image classification tasks [2]. When applied to multi-layer architectures, the rule learns additional features. This study is the first of its kind to perform multi-layer sparse dictionary learning based on the similarity matching principle developed in [11] and to apply it to image classification.
2 Hebbian/anti-Hebbian Network Derived From a Similarity Matching Cost-Function
The rule implemented by the Hebbian/anti-Hebbian network used in this work derives from an adaptation of Classical MultiDimensional Scaling (CMDS). CMDS is a popular embedding technique [3]. Unlike most dimensionality reduction techniques, e.g. PCA, CMDS takes as input the matrix of similarities between inputs to generate a set of embedding coordinates. The advantage of CMDS is that any kind of distance or similarity matrix can be analyzed. However, in its simplest form, CMDS produces dense feature maps, which are often unsuitable for image classification. Therefore, an adaptation of CMDS introduced recently in [11] is used to overcome this weakness. The model implemented is a nonnegative classical multidimensional scaling that has three properties: it takes a similarity matrix as input, it produces sparse codes, and it can be implemented using a new biologically plausible Hebbian model. The Hebbian/anti-Hebbian rule introduced in [11] is given as follows: for a set of inputs $x_t \in \mathbb{R}^n$ for $t \in \{1, \dots, T\}$, the concatenation of the inputs defines an input matrix $X \in \mathbb{R}^{n \times T}$. The output matrix $Y$ of encodings is an element of $\mathbb{R}^{m \times T}$ that corresponds to a sparse overcomplete representation of the input if $m > n$, or to a low-dimensional embedding of the input if $m < n$. The objective function proposed by [11] is:
$$Y^* = \arg\min_{Y \geq 0} \left\| X^\top X - Y^\top Y \right\|_F^2 \tag{1}$$
where $\|\cdot\|_F$ is the Frobenius norm and $X^\top X$ is the Gram matrix of the inputs, which corresponds to the similarity matrix. Solving Eq. 1 directly requires storing $Y \in \mathbb{R}^{m \times T}$, which grows with time, making online learning difficult. Thus, an online version of Eq. 1 is expressed instead as:
$$y_T^* = \arg\min_{y_T \geq 0} \left\| X^\top X - Y^\top Y \right\|_F^2 \tag{2}$$
The components of the solution of Eq. 2, found in [11] using coordinate descent, are:
$$y_{T,i} = \max\left( \sum_{j} W_{T,ij}\, x_{T,j} - \sum_{j} M_{T,ij}\, y_{T,j},\; 0 \right) \tag{3}$$
$$W_{T,ij} = \frac{\sum_{t=1}^{T-1} y_{t,i}\, x_{t,j}}{\sum_{t=1}^{T-1} y_{t,i}^2}, \qquad M_{T,ij} = \frac{\sum_{t=1}^{T-1} y_{t,i}\, y_{t,j}}{\sum_{t=1}^{T-1} y_{t,i}^2}\, \mathbb{1}_{i \neq j} \tag{4}$$
$W_T$ and $M_T$ can be found using the recursive formulations:
$$\hat{Y}_{T,i} = \hat{Y}_{T-1,i} + y_{T-1,i}^2 \tag{5}$$
$$W_{T,ij} = W_{T-1,ij} + \frac{y_{T-1,i} \left( x_{T-1,j} - W_{T-1,ij}\, y_{T-1,i} \right)}{\hat{Y}_{T,i}} \tag{6}$$
$$M_{T,ij} = M_{T-1,ij} + \frac{y_{T-1,i} \left( y_{T-1,j} - M_{T-1,ij}\, y_{T-1,i} \right)}{\hat{Y}_{T,i}}, \quad i \neq j \tag{7}$$
$W_T$ (green arrows) and $M_T$ (blue arrows) can be interpreted respectively as feedforward synaptic connections between the input and the hidden layer and as lateral inhibitory synaptic connections within the hidden layer. The weight matrices are of fixed size and are updated sequentially, which makes the model suitable for online learning. The architecture of the Hebbian/anti-Hebbian network is represented in Figure 1.
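As an illustration of the update scheme described in this section, the following is a minimal NumPy sketch, not the implementation used in the experiments. It assumes the rectified coordinate-descent encoding and the running-average weight updates of [11]: each output is the feedforward drive minus the lateral inhibition, clipped at zero, and each weight moves towards the current pre/postsynaptic product with a step scaled by the neuron's cumulative squared activity. The class name `HebbianAntiHebbianLayer`, the fixed number of coordinate-descent sweeps, the random initialisation of the feedforward weights, and the unit initial value of the cumulative activity are all assumptions made for the example.

```python
import numpy as np


class HebbianAntiHebbianLayer:
    """Sketch of one online similarity-matching layer (illustrative names)."""

    def __init__(self, n_inputs, n_neurons, n_sweeps=20, seed=0):
        rng = np.random.default_rng(seed)
        # Feedforward weights W (m x n); random start is an assumption.
        self.W = rng.standard_normal((n_neurons, n_inputs)) / np.sqrt(n_inputs)
        # Lateral inhibitory weights M (m x m) with a zero diagonal.
        self.M = np.zeros((n_neurons, n_neurons))
        # Cumulative squared activity per neuron; starting at 1 is an assumption
        # that avoids a division by zero on the first update.
        self.y_hat = np.ones(n_neurons)
        self.n_sweeps = n_sweeps

    def encode(self, x):
        """Non-negative code obtained by cyclic coordinate descent on y."""
        y = np.zeros(self.W.shape[0])
        drive = self.W @ x
        for _ in range(self.n_sweeps):
            for i in range(y.size):
                # Feedforward excitation minus lateral inhibition, rectified.
                y[i] = max(drive[i] - self.M[i] @ y, 0.0)
        return y

    def update(self, x, y):
        """Local Hebbian (W) and anti-Hebbian (M) updates for one sample."""
        self.y_hat += y ** 2
        step = (y / self.y_hat)[:, None]
        self.W += step * (x[None, :] - self.W * y[:, None])
        self.M += step * (y[None, :] - self.M * y[:, None])
        np.fill_diagonal(self.M, 0.0)  # no self-inhibition

    def partial_fit(self, x):
        """Encode one input, then adapt the weights with that input and code."""
        y = self.encode(x)
        self.update(x, y)
        return y


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    layer = HebbianAntiHebbianLayer(n_inputs=16, n_neurons=32)
    for _ in range(100):
        code = layer.partial_fit(rng.standard_normal(16))
    print("active neurons on last sample:", int((code > 0).sum()))
```

Calling `partial_fit` once per incoming patch keeps the memory footprint fixed at the size of the two weight matrices and the cumulative activity vector, regardless of how many samples have been seen.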
3 A Model to Learn Features From Images
In the new model presented in this study, the input data vectors ($x_t$) are composed of patches taken randomly from a training dataset of images. For every new input presented, the model first computes a sparse postsynaptic activity $y_t$. Second, the synaptic weights are modified based on local Hebbian/anti-Hebbian learning rules requiring only the current pre- and postsynaptic neuronal activities. The model can be seen as a sparse encoding followed by a recursive updating scheme, both of which are well suited to large-scale online problems.
A multi-class SVM classifies the pictures using output vectors obtained by a simple pooling of the feature vectors, $y_t$, obtained for the input images from the trained network. In particular, given an input image, each neuron in the output layer produces a new image, called a feature map, which is pooled in quadrants [2] to form 4 terms of the input vector for the SVM.
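The quadrant pooling described above can be sketched as follows. This is only an illustration: the text specifies pooling in quadrants but not the pooling operator, so sum pooling is assumed here, and the function name `quadrant_pool` and the `(n_neurons, height, width)` array layout are choices made for the example.

```python
import numpy as np


def quadrant_pool(feature_maps):
    """Sum-pool each feature map over its four spatial quadrants.

    feature_maps: array of shape (n_neurons, height, width).
    Returns a vector of length 4 * n_neurons (4 terms per neuron).
    """
    _, height, width = feature_maps.shape
    h_mid, w_mid = height // 2, width // 2
    quadrants = [
        feature_maps[:, :h_mid, :w_mid],  # top-left
        feature_maps[:, :h_mid, w_mid:],  # top-right
        feature_maps[:, h_mid:, :w_mid],  # bottom-left
        feature_maps[:, h_mid:, w_mid:],  # bottom-right
    ]
    # Sum pooling is an assumption; the result has shape (n_neurons, 4).
    pooled = np.stack([q.sum(axis=(1, 2)) for q in quadrants], axis=1)
    return pooled.reshape(-1)


if __name__ == "__main__":
    maps = np.random.default_rng(0).random((5, 26, 26))
    print(quadrant_pool(maps).shape)
```

The resulting vector has four entries per output neuron, so its length grows linearly with the number of neurons in the layer.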
3.1 Multi-layer Hebbian/anti-Hebbian Neural Network
In the proposed approach, layers of Hebbian/anti-Hebbian networks are stacked similarly to the Convolutional DBN [4] and Hierarchical K-means. In the multi-layer Hebbian/anti-Hebbian network, the weights of both the first and the second layer are continuously updated. Unlike in other CNNs, the nonlinearity in each layer is due not only to the positivity constraint but to the combination of a rectified linear unit (ReLU) activation function and interneuronal competition. The model thus combines the powerful architecture of convolutional neural networks using ReLU activation with interneuronal competition, while all synaptic weights are updated using online local learning rules. In between layers, average pooling is used to downsample the feature maps.
3.2 Overcompleteness of the Representation and Multi-resolution
As part of the evaluation of the new model, it is important to assess its performance with different sizes ($m$) of the hidden layers. If the number of neurons exceeds the size of the input ($n$), the representation is called overcomplete. Overcompleteness may be beneficial, but requires increased computation, particularly for deep networks in which the number of neurons has to grow exponentially in order to keep this property. One motivation for overcompleteness is that it may allow more flexibility in matching the output structure with the input. However, not all learning algorithms can learn and take advantage of overcomplete representations. The behaviour of the algorithm is analysed in the transition between undercomplete ($m < n$) and overcomplete ($m > n$) representations.
Although the model might benefit from a large number of neurons, in practice an increase in the number of neurons is a challenge for such models because of the number of operations required in the coordinate descent. In order to limit the computational cost of training a large network while still benefiting from overcomplete representations, this study proposes to train simultaneously three single-layer neural networks, each with a different receptive field size. Thus, a variation of the model tested here is composed of three different networks. This architecture of parallel networks with different receptive field sizes requires less computational time and memory than a model with only one receptive field size and the same total number of neurons, because the synaptic weights only connect neurons within each neural network. This model will be called multi-resolution in the following.
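The saving from splitting one network into parallel ones can be made concrete by counting synapses. The sketch below is a back-of-the-envelope illustration under stated assumptions: each network with $m$ neurons is taken to have $m(m-1)$ lateral weights and $n \cdot m$ feedforward weights, and the sizes used (one network of 4800 neurons versus three networks of 1600) are hypothetical values chosen only to keep the total number of neurons equal. The function names are likewise illustrative.

```python
def lateral_weights(neurons_per_network):
    """Lateral weights when neurons are only connected within their own network."""
    return sum(m * (m - 1) for m in neurons_per_network)


def feedforward_weights(input_sizes, neurons_per_network):
    """Feedforward weights, one (n x m) block per network."""
    return sum(n * m for n, m in zip(input_sizes, neurons_per_network))


if __name__ == "__main__":
    # Hypothetical sizes with the same total number of neurons.
    single = lateral_weights([4800])
    parallel = lateral_weights([1600, 1600, 1600])
    print("single network lateral weights:   ", single)
    print("three parallel networks, lateral: ", parallel)
    print("reduction factor:                 ", single / parallel)
```

Because the lateral term grows quadratically with the number of neurons in a network, splitting the same neurons across three independent networks cuts the lateral connections, and the corresponding coordinate-descent work, by roughly a factor of three.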
3.3 Parameters and Preprocessing
The architecture used here has the following tunable parameters: the receptive field size of the neurons and the number of neurons. These parameters are standard in CNNs, but their influence on this online feedforward model needs to be investigated.
For computer vision models, understanding the influence of input preprocessing is of critical importance for both biological plausibility and practical applicability. Recent findings [1] confirm partial decorrelation of the input signal in the retinal ganglion cells. The influence of input decorrelation by whitening is therefore investigated.
4 Results
The effectiveness of the algorithm is assessed by measuring the performance on an image classification task. We acknowledge that classification accuracy is at best an implicit measure of the performance of representation learning algorithms, but it provides a standardised way of comparing them. In the following, single and multi-layer Hebbian/anti-Hebbian neural networks combined with a standard multi-class SVM are trained on the CIFAR-10 dataset [5].
4.1 Evaluation of the Single-layer Model
A first experiment tested the performance of the model with and without whitening of the input data. Although there exist Hebbian networks that can perform online whitening [10], an offline technique based on singular value decomposition [2] is applied in these experiments. Figures (a) and (b) show the features learned by the network from raw input and whitened input respectively. The features learned from raw data (Fig. (a)) are neither sharp nor localised filters and only slightly capture edges. With whitened data (Fig. (b)), the features are sharp, localised, and resemble Gabor filters, which are observed in the primary visual cortex [9].
In a second set of experiments, the performance of the network was tested for varying receptive field sizes (Figs. (c) and (d)) and varying network sizes (400, 500, 600, and 800 neurons). The results show that the performance peaks at a receptive field size of 7 pixels and then begins to decline. This property is common to most unsupervised learning algorithms [2] and shows the difficulty of learning spatially extended features. Figures (c) and (d) also show that, for every configuration, the performance of the algorithm is largely and uniformly improved when whitening is applied to the input.
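The offline whitening step used in the first experiment can be sketched as below. The text only states that the technique is based on singular value decomposition, so this example assumes ZCA whitening of mean-centred patches with a small regulariser, a common choice for image patches. The function names and the default value of `eps` are assumptions, not settings reported in this study.

```python
import numpy as np


def fit_zca_whitening(patches, eps=1e-2):
    """Estimate a ZCA whitening transform from flattened patches.

    patches: array of shape (n_samples, n_features).
    Returns the per-feature mean and a symmetric whitening matrix.
    """
    mean = patches.mean(axis=0)
    centered = patches - mean
    cov = centered.T @ centered / centered.shape[0]
    # cov is symmetric positive semi-definite, so its SVD yields its
    # eigenvectors (columns of U) and eigenvalues (s).
    U, s, _ = np.linalg.svd(cov)
    whitener = U @ np.diag(1.0 / np.sqrt(s + eps)) @ U.T
    return mean, whitener


def apply_whitening(patches, mean, whitener):
    """Centre the patches and decorrelate them with the fitted transform."""
    return (patches - mean) @ whitener


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mixing = np.eye(5) + 0.5 * np.triu(np.ones((5, 5)), 1)
    data = rng.standard_normal((2000, 5)) @ mixing  # correlated features
    mean, whitener = fit_zca_whitening(data, eps=1e-8)
    white = apply_whitening(data, mean, whitener)
    print(np.round(white.T @ white / white.shape[0], 3))
```

The transform is fitted once on a set of training patches and then reused unchanged for every later patch, which is what makes it an offline step in contrast to the online whitening networks of [10].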
4.2 Comparison to State-of-the-art Performance and Online Training
Various unsupervised learning algorithms have been tested on the CIFAR-10 dataset. Spherical K-means, in particular, was shown in [2] to outperform autoencoders and restricted Boltzmann machines, providing a very simple and efficient solution for dictionary learning for image classification. Thus, spherical K-means is used here as a benchmark to evaluate the performance of the single-layer network. As with other unsupervised learning algorithms, increasing the number of output neurons to reach overcompleteness also improved classification performance (Fig. (a)). Although the single-layer neural network has a higher degree of sparsity than the K-means proposed in [2] (results not shown here), the two appear to have the same performance in their optimal configurations (Fig. (a)).
The classification accuracy of the network during training is shown in Fig. (b). The graph suggests that the features learned by the network over time help the system improve the classification accuracy. This is significant because it demonstrates for the first time the effectiveness of features learned with a Hebb-like cost-function minimisation. It is not obvious a priori that the online optimisation of a cost-function for sparse similarity matching (Eq. 2) produces features suitable for image classification.
As shown in Table 1, the multi-resolution network outperforms the single-resolution network and the K-means algorithm [2], reaching 80.42% accuracy on CIFAR-10. The multi-resolution model shows better performance while requiring less computation and memory than the single-resolution model. It also outperforms the single-layer NOMP [6], sparse TIRBM [15], CKN-GM and CKN-PM [7], which are more complex models. It is outperformed only by combined models or models with three layers or more.
Table 1. Classification accuracy on CIFAR-10.
Algorithm                                                 | Accuracy
----------------------------------------------------------|---------
Single-Layer, Single Resolution (4k neurons)              | 79.58%
Single-Layer, Multi-Resolution (3 × 1.6k neurons)         | 80.42%
Single-layer K-means [2] (4k neurons)                     | 79.60%
Multi-layer K-means [2] (3 layers, 4k neurons)            | 82.60%
Sparse RBM                                                | 72.40%
Convolutional DBN [4]                                     | 78.90%
Sparse TIRBM [15] (4k neurons)                            | 80.10%
TIOMP-1/T [15] (combined transformations, 4k neurons)     | 82.20%
Single-Layer NOMP [6] (5k neurons)                        | 78.00%
Multi-Layer NOMP [6] (3 layers, 4k neurons)               | 82.90%
Multi-Layer CKN-GM [7]                                    | 74.84%
Multi-Layer CKN-PM [7]                                    | 78.30%
Multi-Layer CKN-CO [7] (combining CKN-GM and CKN-PM)      | 82.18%
4.3 Evaluation of the Multi-layer Model
A single-resolution, double-layer neural network with different numbers of neurons in each layer was trained similarly to the single-layer network in the previous section. Table 2 reports results for the features learned by the first and second layers. The results show that the second-layer features alone are less discriminative than the first-layer features, as indicated in Fig. (a). However, when the features of both layers are combined, the model achieves better performance than with each layer considered separately. Nevertheless, these preliminary results indicate that the sizes of the two layers affect the performance of the network unevenly. A future test may investigate whether a multi-layer architecture can outperform the largest shallow networks.
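Table 2. Classification accuracy of the double-layer network for different numbers of neurons in each layer.
Layer 1 neurons | Features used        | Layer 2 neurons: 50 | 100   | 200   | 400   | 800
----------------|----------------------|---------------------|-------|-------|-------|-------
100             | second layer only    | 54.9%               | 59.7% | 64.7% | 68.7% | 71.45%
100             | first + second layer | 67.2%               | 68.1% | 69.9% | 72.4% | 73.81%
200             | second layer only    | 55.8%               | 60.6% | 65.3% | 70.3% | 72.7%
200             | first + second layer | 69.9%               | 70.8% | 71.9% | 73.7% | 75.1%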
5 Conclusion
This work proposes a multi-layer neural network exploiting Hebbian/anti-Hebbian rules to learn features for image classification. The network is trained on the CIFAR-10 image dataset prior to feeding a linear classifier. The model successfully learns, online, more discriminative representations of the data as the number of neurons and the number of layers increase. The overcompleteness of the representation is critical for learning relevant features. The results show that a minimum unsupervised learning time is needed to optimise the network, leading to better classification accuracy. Finally, one key factor in improving image classification is the appropriate choice of the receptive field size used for training the network.
These findings show that neural networks can be trained to solve problems as complex as sparse dictionary learning with Hebbian learning rules, delivering accuracy competitive with other encoders, including deep neural networks. This makes deep Hebbian networks attractive for building large-scale image classification systems. The competitive performance on CIFAR-10 suggests that this model can offer an alternative to batch-trained neural networks. Ultimately, thanks to its bio-inspired architecture and learning rules, it also stands as a good candidate for memristive devices [12]. Moreover, adding a decaying factor to the proposed model might yield an algorithm that can deal with complex datasets whose distributions vary over time.
References
[1] Abbasi-Asl, R., Pehlevan, C., Yu, B., Chklovskii, D.B.: Do retinal ganglion cells project natural scenes to their principal subspace and whiten them? arXiv preprint arXiv:1612.03483 (2016)
[2] Coates, A., Lee, H., Ng, A.Y.: An analysis of single-layer networks in unsupervised feature learning. In: AISTATS 2011. vol. 1001 (2011)
[3] Cox, T.F., Cox, M.A.: Multidimensional scaling. CRC Press (2000)
[4] Krizhevsky, A., Hinton, G.: Convolutional deep belief networks on CIFAR-10. Unpublished manuscript 40 (2010)
[5] Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)
[6] Lin, T.-H., Kung, H.: Stable and efficient representation learning with nonnegativity constraints. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14). pp. 1323–1331 (2014)
[7] Mairal, J., Koniusz, P., Harchaoui, Z., Schmid, C.: Convolutional kernel networks. In: Advances in Neural Information Processing Systems. pp. 2627–2635 (2014)
[8] Oja, E.: Neural networks, principal components, and subspaces. International Journal of Neural Systems 1(01), 61–68 (1989)
[9] Olshausen, B.A., et al.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583), 607–609 (1996)
[10] Pehlevan, C., Chklovskii, D.: A normative theory of adaptive dimensionality reduction in neural networks. In: Advances in Neural Information Processing Systems. pp. 2269–2277 (2015)
[11] Pehlevan, C., Chklovskii, D.B.: A Hebbian/anti-Hebbian network derived from online non-negative matrix factorization can cluster and discover sparse features. In: 2014 48th Asilomar Conference on Signals, Systems and Computers. pp. 769–775. IEEE (2014)
[12] Poikonen, J.H., Laiho, M.: Online linear subspace learning in an analog array computing architecture. CNNA 2016 (2016)
[13] Rumelhart, D.E., Zipser, D.: Feature discovery by competitive learning. Cognitive Science 9(1), 75–112 (1985)
[14] Sanger, T.D.: Optimal unsupervised learning in a single-layer linear feedforward neural network. Neural Networks 2(6), 459–473 (1989)
[15] Sohn, K., Lee, H.: Learning invariant representations with local transformations. In: Proceedings of the 29th International Conference on Machine Learning (ICML-12). pp. 1311–1318 (2012)