1 Introduction
Machine learning (ML) methods in general, and neural nets (NNs) trained with backprop in particular, have posted tremendous successes in recent years [1, 2]. However, these methods, and NNs in particular, typically require large amounts of training data to attain high performance. This creates bottlenecks to deployment and constrains the types of problems that can be addressed [3]. It is thus desirable to improve ML methods' ability to learn from small training sets. This limited-data constraint is typical of a large and important group of ML targets, including tasks that use medical, scientific, or field-collected data, as well as artificial intelligence efforts focused on rapid learning.
In this work, we seek to improve the input feature space which an arbitrary ML method will use for training. In particular, we propose an architecture that can be bolted onto the front end of an ML method, and which automatically generates, from the existing feature set, a new set of strongly class-separating features to supplement (or even replace) the existing feature set.
Biological neural nets (BNNs) are able to learn rapidly, even from just one or two samples. On the assumption that rapid learning requires effective ways to separate classes given limited data, we may look to BNNs for effective feature-generators [4]. One of the simplest BNNs that can learn is the insect olfactory network [5], containing the Antennal Lobe (AL) [6] and Mushroom Body (MB) [7], which can learn a new odor given only five exposures. This simple but effective feedforward network is built around three key elements that are ubiquitous in BNN designs: competitive inhibition, high-dimensional sparse layers, and Hebbian update mechanisms. Specifically, the AL-MB network contains: (i) a pre-processing layer (the AL) built of units that competitively inhibit each other [8]; (ii) projection, with sparse connectivity, up into and then down out of a sparsely-firing high-dimensional layer (the MB) [9, 10], where the dimension shift is typically 10x to 100x [11]; and (iii) Hebbian updates of plastic synaptic connections to train the system. Roughly speaking, the Hebbian rule is “fire together, wire together”, i.e. updates are proportional to the product of the firing rates of the sending and receiving neurons [12, 13]. Synaptic connections are largely random [14]. A schematic is given in Fig 1.
MothNet is a computational model of the Manduca sexta moth AL-MB [15] that demonstrated very rapid learning of vectorized MNIST digits, with performance superior to standard ML methods in the 1 to 10 training sample regime [16]. That is, it was able to encode substantial class-relevant information from very few samples. But MothNet also appears to have limited capacity: Accuracy leveled off at about 80%, consistent with related results in [17] and the biological fact that a moth can only learn about 8 odors.
In this work we examine whether the MothNet architecture can usefully serve, not as a classifier itself, but rather as the first stage of a multi-stage network.
Our goal is to harness its class-information encoding abilities to generate strong features that can improve performance of a main downstream classifier.
In particular, we test the following hypotheses (see Acknowledgements):
1. The AL-MB architecture has an intrinsic clustering ability, due specifically to the competitive inhibition layer and/or the sparse high-dimensional layer.
That is, these structures have an inductive bias towards separating classes (just as convolutional neural nets have an inductive bias towards distinguishing visual data).
2. Despite its limitations, the trained AL-MB is an effective feature generator:
Its Readout neurons contain class-separating information that will boost an arbitrary ML algorithm’s ability to classify test samples.
We test these hypotheses by combining MothNet with a downstream ML module, so that the Readouts of the trained AL-MB model feed into the ML module as additional features (from the ML perspective, the AL-MB acts as an automatic feature generator; from the biological perspective, the ML module stands in for the downstream processing in more complex BNNs). Our Test Case is a non-spatially-correlated, 85-feature, 10-class task derived from the downsampled, vectorized MNIST dataset (hereafter “MNIST”, with the understanding that we always refer to this vectorized, non-spatial version). We restrict training set size to N samples per class, with N ≤ 100, so that the ML modules do not attain full accuracy on the task using the 85 features (pixels) alone.
We find evidence that these hypotheses are correct: the high-dimensional sparse layer and (to a lesser extent) the competitive inhibition layer, in combination with a Hebbian update rule, significantly improved the ability of ML methods (NN, SVM, and Nearest Neighbors) to classify the test set in all cases, and especially at low numbers of training samples per class. That is, the input pixel features contain class-separating information that is not being extracted by the ML methods alone. The MothNet module encodes this information in a form that is accessible to the ML methods. If the learning performance of BNNs is any guide, these layers are simple, general-purpose feature generators that can potentially improve performance of ML methods in tasks where training data is limited.
In addition, the cyborgs significantly out-performed models that used features generated by PCA (Principal Components Analysis), PLS (Projection to Latent Structures), and NNs. They also out-performed NNs that were pre-trained on the Omniglot dataset
[18] to initialize weights. These results indicate that the insect-derived network generated significantly stronger features than these other feature generator methods.

2 Experimental setup
To generate MNIST, we downsampled and preprocessed the original MNIST dataset [19, 20] to give samples with 85 pixels-as-features, stripped of spatial information, as in [16]. We note that this is not the “MNIST dataset” considered in its usual context of a task with spatial structure and large pools of training data. Rather, here the MNIST data served as raw material for a generic non-spatial Test Case. MNIST had the advantage that our baseline ML methods (Nearest Neighbors, SVM, and Neural Net) did not attain full accuracy at low N, so it acted as a good test of whether the AL-MB can improve classification by ML methods.
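For concreteness, the following Python sketch illustrates the kind of downsample-and-vectorize preprocessing described above. The helper name, the 2x2 block-averaging, and the variance-based pixel selection are our assumptions; the exact pipeline used in [16] differs in its details.

```python
import numpy as np

def vectorize_digits(images, keep_pixels=None):
    """Downsample 28x28 MNIST digits by 2x2 block-averaging, flatten them,
    and optionally keep a fixed subset of pixel indices as the feature set.
    Illustrative only; the preprocessing in [16] differs in its details."""
    imgs = np.asarray(images, dtype=float).reshape(-1, 28, 28)
    down = imgs.reshape(-1, 14, 2, 14, 2).mean(axis=(2, 4))  # 14x14 thumbnails
    flat = down.reshape(len(down), -1)                       # 196 raw pixels
    if keep_pixels is not None:
        flat = flat[:, keep_pixels]                          # e.g. 85 pixels
    return flat

# Hypothetical usage: keep the 85 most variable pixels of the training set.
# flat_train = vectorize_digits(train_images)
# keep = np.argsort(flat_train.var(axis=0))[-85:]
# X_train = vectorize_digits(train_images, keep_pixels=keep)
```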
Full wiring details of the AL-MB model are given in [15]. Full Matlab code for MothNet simulations and these cyborg experiments, including comparison methods, can be found at https://github.com/charlesDelahunt/PuttingABugInML
Competitive inhibition in the moth AL works roughly as follows. Each neural unit in the AL receives input from one feature, and has two outputs: An inhibitory signal to other neural units in the AL, and an excitatory signal to the MB. Thus, each feature tries to dampen other features’ presence in the sample’s output signature from the AL.
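As a caricature of this mechanism (not the actual MothNet dynamics, which evolve as stochastic differential equations), the following sketch applies subtractive lateral inhibition to a feature vector; the inhibition gain `gamma` is a made-up parameter.

```python
import numpy as np

def competitive_inhibition(x, gamma=0.1):
    """Each AL unit receives one feature, inhibits all other units, and sends
    a rectified excitatory output toward the MB.  This is a static sketch of
    the competitive dampening described above, with an assumed gain gamma."""
    x = np.asarray(x, dtype=float)
    lateral = gamma * (x.sum() - x)          # summed inhibition from all other units
    return np.maximum(x - lateral, 0.0)      # dampened, rectified output to the MB
```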
Sparsity in the MB is of two types: First, the projections from the AL to the MB are non-dense (15% non-zero). Second, MB neurons fire sparsely, in the sense that only the strongest 5% to 10% of the total population are allowed to fire (through a mechanism of global inhibition).
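The sketch below combines these two kinds of sparsity in one toy routine: a random, roughly 15%-dense, non-negative AL→MB weight matrix, followed by a winner-take-most step that lets only the top 5% to 10% of MB units fire. The MB size of 2000 units is an assumption, chosen only to be consistent with the 10x-100x expansion and the 150-200 active MB neurons mentioned later in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_al_to_mb_weights(n_features=85, n_mb=2000, density=0.15):
    """Random, non-negative AL->MB projection with ~15% non-zero connections."""
    mask = rng.random((n_mb, n_features)) < density
    return mask * rng.random((n_mb, n_features))

def mb_response(al_output, W_al_mb, firing_fraction=0.07):
    """Project the AL output into the MB, then let only the most strongly
    driven 5-10% of MB units fire (a stand-in for global inhibition)."""
    drive = W_al_mb @ al_output
    k = max(1, int(firing_fraction * drive.size))
    threshold = np.partition(drive, -k)[-k]          # k-th largest drive value
    return np.where(drive >= threshold, drive, 0.0)
```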
All weights are non-negative, and are initialized randomly. Weight updates affect only MB→Readout connections (the AL is not plastic, and AL→MB learning rates are slow). Hebbian updates occur according to: Δw_ij = α f_i f_j (if f_i f_j > 0), and Δw_ij = −δ w_ij (if f_i f_j = 0), where f_i, f_j are the firing rates of the connected neurons and α, δ are growth and decay rates.
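A minimal sketch of this update rule for the MB→Readout weight matrix follows. The growth and decay constants are illustrative, and MothNet applies the rule within time-stepped simulations rather than in the single batched step shown here.

```python
import numpy as np

def hebbian_update(W, mb_rates, readout_rates, alpha=0.01, delta=0.001):
    """Hebbian update of MB->Readout weights: grow w_ij in proportion to the
    product of pre- and post-synaptic firing rates when both fire, and decay
    w_ij otherwise.  Weights stay non-negative.  alpha/delta are illustrative."""
    coactivity = np.outer(readout_rates, mb_rates)   # f_i * f_j per synapse
    W = np.where(coactivity > 0,
                 W + alpha * coactivity,             # "fire together, wire together"
                 W * (1.0 - delta))                  # silent synapses decay
    return np.maximum(W, 0.0)
```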
Nearest-Neighbors and SVM used Matlab built-in functions, as in [16]. The Neural Nets used Matlab's NN toolbox, with one layer (more layers did not help) and as many hidden units as features (i.e. 85 or 95; more units did not help). MothNet instances were generated randomly from templates. All hyperparameter details can be found in the online codebase. We note that our goal was to see whether the MothNet-generated features improved on the baseline accuracy of the ML methods, whatever that baseline was, and that we deliberately varied the baseline by restricting training data. So the exact ML method hyperparameters were not central, as long as they were reasonable.
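The experiments used Matlab built-ins; as a rough, non-authoritative Python analogue of the three baselines, one could write something like the following (the neighbor count and SVM kernel are our assumptions, not the paper's settings):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def make_baseline_classifiers(n_features=85):
    """Rough scikit-learn stand-ins for the Matlab baselines described above:
    Nearest Neighbors, SVM, and a one-hidden-layer NN with as many hidden
    units as input features.  Hyperparameters here are illustrative only."""
    return {
        "NearNeigh": KNeighborsClassifier(n_neighbors=1),
        "SVM": SVC(kernel="linear"),
        "NeuralNet": MLPClassifier(hidden_layer_sizes=(n_features,),
                                   max_iter=2000, random_state=0),
    }
```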
We ran four sets of experiments:
1. Cyborg vs baseline ML methods experiments
The main experiments were structured as follows:
1. A random set of N training samples per class was drawn from MNIST.
2. The ML methods trained on these samples, to provide a baseline (switch in Fig 2).
3. MothNet was trained on these same samples, using time-evolved stochastic differential equation simulations and Hebbian updates as in [16] (switch in Fig 2).
4. The ML methods were then retrained from scratch, with the Readout Neuron outputs from the trained MothNet instance fed in as additional features (switches in Fig 2).
These were the “insect cyborgs”.
5. Trained ML accuracies of the baselines and cyborgs were compared to assess the value of the AL-MB as a feature generator.
These experiments were repeated 13 times per value of N, for each ML method. A code sketch of this pipeline is given below.
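The sketch covers steps 2 to 5, assuming the baseline classifiers from the earlier sketch and arrays `readouts_train`, `readouts_test` holding the 10 Readout responses of a trained MothNet instance (the MothNet simulation itself is not reproduced here).

```python
import numpy as np

def cyborg_features(X_pixels, readouts):
    """Cyborg feature matrix: the 85 pixel features plus the 10 MothNet
    Readout responses for each sample."""
    return np.hstack([X_pixels, readouts])

def run_cyborg_experiment(clf, X_train, y_train, X_test, y_test,
                          readouts_train, readouts_test):
    """Train and score the ML baseline, then retrain it from scratch with the
    MothNet Readouts appended as extra features, and return both accuracies."""
    clf.fit(X_train, y_train)
    baseline_acc = clf.score(X_test, y_test)
    clf.fit(cyborg_features(X_train, readouts_train), y_train)
    cyborg_acc = clf.score(cyborg_features(X_test, readouts_test), y_test)
    return baseline_acc, cyborg_acc
```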
2. Other feature generators vs baseline ML methods
To compare the effectiveness of MothNet features to those generated by conventional ML methods, we ran experiments structured as the MothNet experiments above, but with the MothNet feature module replaced by one of the following options (a code sketch follows this list):
1. PCA (Principal Components Analysis) applied to the MNIST training samples.
The new features were the projections onto each of the top 10 modes.
2. PLS (Projection to Latent Structures) applied to the MNIST training samples.
The new features were the projections onto each of the top 10 modes.
We expected PLS to do better than PCA because PLS takes class information into account.
3. NN pre-trained on the MNIST training samples.
The new features were the (logs of the) 10 output units.
This feature generator was used as a front end to SVM and Nearest Neighbors only.
4. NN with weights first modulated by training on the vectorized Omniglot dataset, then trained on the
MNIST training samples. (This last was a transfer learning method, not a feature generator.)
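The sketch below shows how the PCA and PLS features (projections onto the top 10 modes) and the NN-output features (logs of the 10 output units) can be constructed. The one-hot encoding of labels for PLS and the small epsilon inside the log are our assumptions, and the original experiments used Matlab built-ins rather than scikit-learn.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.neural_network import MLPClassifier

def pca_features(X_train, X_test, n_modes=10):
    """New features: projections onto the top 10 PCA modes of the training set."""
    pca = PCA(n_components=n_modes).fit(X_train)
    return pca.transform(X_train), pca.transform(X_test)

def pls_features(X_train, y_train, X_test, n_modes=10, n_classes=10):
    """New features: projections onto the top 10 PLS modes (labels one-hot encoded)."""
    Y = np.eye(n_classes)[y_train]                   # assumed encoding of class info
    pls = PLSRegression(n_components=n_modes).fit(X_train, Y)
    return pls.transform(X_train), pls.transform(X_test)

def nn_output_features(X_train, y_train, X_test):
    """New features: logs of the 10 output units of a NN trained on the samples."""
    nn = MLPClassifier(hidden_layer_sizes=(85,), max_iter=2000).fit(X_train, y_train)
    eps = 1e-12                                      # avoid log(0); assumed detail
    return (np.log(nn.predict_proba(X_train) + eps),
            np.log(nn.predict_proba(X_test) + eps))
```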
3. Relative importance of AL vs MB experiments
There are two key structural components in the AL-MB, the competitive inhibition layer (the AL) and projection into a high-dimensional sparse layer (the MB) with Hebbian synaptic updates. These two structures can be deployed separately or together. In particular, the (trainable) high-dimensional sparse layer can be deployed with or without the competitive inhibition layer. In order to assess the relative value of the competitive inhibition layer, mutant MothNets were generated from templates that had a “pass-through” AL, i.e. with uniform weights and no lateral inhibition (switch in Fig 2). Steps 1 to 4 above were followed using these mutant MothNets (so Step 4 corresponded to switches in Fig 2). The results from step (4) were then compared to those of full cyborgs.
4. Cyborg experiments on Omniglot
These experiments were set up as in (1), but used the Omniglot dataset, a collection of hand-drawn characters comprising 136 classes with 20 samples each. For each run, 10 Omniglot classes were randomly chosen. Thumbnails were subsampled down to 200 pixels and vectorized. N maxed out at 15, to ensure at least 5 test samples per class.
3 Results
The ML baseline methods (no added features) started at 10% to 30% accuracy for N = 1 sample per class, and rose to 80% to 88% accuracy (depending on method) at N = 100, where we stopped our sweep. This baseline accuracy is marked by the lower colored circles in Fig 3.
3.1 Gains due to MothNet features (i.e. cyborgs)
MothNet-ML cyborgs, i.e. networks in which the 10 Readouts of the trained MothNet were fed into the ML module as 10 additional features, showed consistently improved Test set performance versus their ML baselines, for all ML methods at all N, except for SVM at N = 1 (SVM required at least 2 training samples per class to run). Cyborg accuracy is marked by the upper colored circles in Fig 3, and the raw gains in accuracy are marked by thick vertical bars.
Raw increases in accuracy due to cyborgs were fairly stable across N for all ML models. This led to two trends in terms of relative changes. Relative gains, i.e. as a percentage of baseline accuracy, were highest at low numbers of training samples per class: average relative gains were 10% to 33% at low N, and 6% to 10% at higher N (see Fig 4 A). Conversely, the relative reduction in Test set error, as a percentage of baseline error, increased substantially as baseline accuracy grew (see Fig 4 B). Thus, MothNet cyborgs reduced Test set error by over 50% on the most accurate models, such as NNs with 80% baseline accuracy. Of the ML methods, the Neural Net cyborgs had the best performance and also showed the highest percentage gains.
Gains were significant in almost all cases at larger N. Table 1 gives the p-values of the gains due to MothNet features, for each N and ML method. p-values were calculated from the Fisher linear discriminant d = (μ_c − μ_b)² / (σ_c² + σ_b²), where μ_b, σ_b and μ_c, σ_c are the means and standard deviations of the ML baseline accuracy distribution and the cyborg accuracy distribution, respectively. These values are shown in the inset of Fig 3. Lower significance at low N was often due to large std dev (relative to the mean) in the ML baseline accuracies (these std devs are seen as dots at the bottom of Fig 3).
Table 1: p-values (as percentages) of cyborg gains over ML baselines, for each N and ML method.

ML method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 | 20 | 30 | 40 | 50 | 70 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NearNeigh | 58 | 42 | 20 | 2 | 4 | 4 | 2 | 1 | 9 | 16 | 7 | 0 | 3 |
SVM | 100 | 96 | 31 | 39 | 18 | 4 | 6 | 16 | 4 | 13 | 8 | 0 | 4 |
Neural Net | 89 | 76 | 48 | 7 | 3 | 1 | 1 | 1 | 0 | 0 | 1 | 8 | 0 |
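As a sketch of this significance calculation, the function below computes the Fisher linear discriminant between the two accuracy distributions. The conversion of the discriminant to a p-value via a one-sided normal tail is our assumption, since the text does not spell out that step.

```python
import numpy as np
from scipy.stats import norm

def cyborg_gain_pvalue(baseline_accs, cyborg_accs):
    """Fisher linear discriminant d = (mu_c - mu_b)^2 / (sigma_c^2 + sigma_b^2)
    between the baseline and cyborg accuracy distributions, plus a p-value
    from a one-sided normal tail (an assumed, not stated, conversion)."""
    mu_b, sd_b = np.mean(baseline_accs), np.std(baseline_accs)
    mu_c, sd_c = np.mean(cyborg_accs), np.std(cyborg_accs)
    d = (mu_c - mu_b) ** 2 / (sd_c ** 2 + sd_b ** 2)
    z = np.sign(mu_c - mu_b) * np.sqrt(d)            # signed effect size
    return d, 1.0 - norm.cdf(z)                      # p-value of the gain
```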
Remarkably, adding a MothNet front-end improved ML accuracy even in cases where the ML module baseline already exceeded the accuracy ceiling of MothNet itself (≈ 75% [16]), i.e. at N = 15 to 100 samples per class. This implies that the Readouts of MothNet contain valuable clustering information which the ML methods are able to leverage more effectively than MothNet itself does. Also remarkably, the highest gains posted by the NN-cyborg at low N came from using only the MothNet Readouts as features and ignoring the original pixel features, an indication of the strong clustering abilities of the AL-MB architecture.
3.2 Comparison to other feature generators
To compare with other methods, we ran the feature generation framework using PCA (Principal Components Analysis, projections onto 10 modes), PLS (Projection to Latent Structures, projection onto top 10 modes), and NN (logs of the 10 output units). In each case, the method used the training samples to generate 10 new features. Each method was run using a Matlab built-in function. The Matlab code can be found at https://github.com/charlesDelahunt/PuttingABugInML. For NN as baseline, we did not use NN-generated features, but instead initialized the NN network weights by pre-training on the Omniglot dataset, then trained on the MNIST data as usual.
With few exceptions, MothNet features were much more effective than these other methods. Tables 2, 3, and 4 give, for each baseline ML classifier, the relative increase in mean accuracy due to the various feature generators (or to pre-training). “MothNet” refers to the cyborgs. “NA” appears in the tables for PLS and SVM at N = 1, because PLS and SVM required at least 2 training samples per class to run. 13 runs per data point.
Table 2: Relative increase (%) in mean accuracy of the Nearest Neighbors baseline due to each feature generator (FG).

FG method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 | 20 | 30 | 40 | 50 | 70 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PCA | -67 | 0.7 | 0.6 | 1.4 | 1.2 | 1.2 | 1.5 | 1 | 1.3 | 1.4 | 0.0 | 0.9 | 1.5 |
PLS | NA | 1.4 | 0.6 | 1.6 | 2.1 | 1.5 | 1.1 | 1.9 | 1.2 | 1.1 | 0.4 | 0.9 | -0.1 |
Neural Net | -1.4 | 1.3 | 2.1 | 1.5 | 2.6 | 2.1 | 4.4 | 3.2 | 4.7 | 3.4 | 3.9 | 3.9 | 3.7 |
MothNet | 13.6 | 13.9 | 14.2 | 16.9 | 11.5 | 10 | 9.6 | 10 | 5.6 | 5.1 | 6.6 | 6.1 | 4.7 |
Table 3: Relative increase (%) in mean accuracy of the SVM baseline due to each feature generator (FG).

FG method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 | 20 | 30 | 40 | 50 | 70 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PCA | NA | 12.2 | -0.4 | -1.4 | 0.3 | 0.2 | 0.2 | -0.9 | 0.3 | -0.5 | -0.8 | -1.4 | -0.5 |
PLS | NA | -14.1 | 4.2 | 3.5 | 1.5 | -0.2 | -2.6 | -4 | -5.4 | -5.6 | -5.3 | -5.1 | -5.5 |
Neural Net | NA | 6.8 | -1.3 | -3.7 | -2 | -0.9 | 1.7 | 0.5 | 4.3 | 4.9 | 4.1 | 4.9 | 4.9 |
MothNet | NA | 0.8 | 6.5 | 10.7 | 10.8 | 10 | 7.8 | 6.3 | 7.2 | 5.8 | 6.9 | 8.3 | 6.2 |
Table 4: Relative increase (%) in mean accuracy of the Neural Net baseline due to each feature generator (FG), or due to pre-training on Omniglot (preTrainOmni).

FG method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 | 20 | 30 | 40 | 50 | 70 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PCA | -57 | 0.2 | -0.8 | 1.2 | 2.6 | 1.7 | 0.3 | 1.3 | -0.3 | 0 | 0.2 | 0.3 | 0.2 |
PLS | NA | 0.2 | 5.9 | 1.0 | 1.5 | 2.8 | -0.2 | 1.2 | 0.3 | 1.2 | 1.6 | 1.5 | 1.9 |
preTrainOmni | 15 | 4.2 | 5.8 | -3.1 | -1.1 | 0.2 | 1.3 | 1.5 | -3.4 | -2.5 | -0.4 | -4.7 | -1.1 |
MothNet | 4 | 17 | 15 | 13.1 | 13 | 11.3 | 10.8 | 9.0 | 9.7 | 8.4 | 8.5 | 7.1 | 6.4 |
3.3 Relative contribution of the AL and MB layers
MothNet has two key structures, a competitive inhibition layer (the AL) and a high-dimensional, sparse layer (the MB). Cyborgs built from MothNets with a “pass-through” (identity) AL still posted significant improvements in accuracy over baseline ML methods. The gains of cyborgs with pass-through ALs were generally between 60% and 100% of the gains posted by cyborgs with normal ALs (see Table 5), suggesting that the high-dimensional, trainable layer (the MB) was of primary importance. However, the competitive inhibition of the AL layer clearly added value in terms of generating strong features, contributing up to 40% of the total gain. NNs benefitted most from the competitive inhibition layer.
In terms of overall effect on downstream ML modules, a functioning AL enabled slightly better, more reliable gains: averaged over all ML methods and all numbers of training samples, a functioning AL gave a mean raw increase in accuracy of 5.6% (standard error 0.38), while a pass-through AL gave a mean raw increase in accuracy of 5.0% (standard error 0.43).

Table 5: Gains of pass-through-AL cyborgs as a percentage of the gains of full (normal-AL) cyborgs, per N and ML method.

ML method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 | 20 | 30 | 40 | 50 | 70 | 100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NearNeigh | 82 | 100 | 91 | 76 | 100 | 100 | 58 | 74 | 88 | 64 | 100 | 100 | 65 |
SVM | NA | NA | 100 | 87 | 79 | 97 | 75 | 94 | 98 | 82 | 100 | 76 | 15 |
Neural Net | 100 | 60 | 62 | 67 | 75 | 91 | 100 | 93 | 100 | 100 | 100 | 82 | 65 |
3.4 Cyborgs on the Omniglot dataset
We also ran the cyborg experiments on a downsampled, vectorized Omniglot dataset. The experimental set-up was the same as for MNIST, except that vectorized thumbnails had 200 pixels and 10 Omniglot classes were selected at random for each run. Table 6 below gives the relative percentage increases in accuracy due to MothNet. MothNet-generated features resulted in high relative gains in accuracy (up to 27%). However, due to the low N (N ≤ 15), the std dev of the baseline ML accuracy was always high (roughly twice as large as for MNIST), so the gains were not significant (in a p-value sense), except for SVMs.
Table 6: Relative increase (%) in accuracy due to MothNet features on the Omniglot task, per N and ML method.

ML method | N = 1 | 2 | 3 | 5 | 7 | 10 | 15 |
---|---|---|---|---|---|---|---|
NearNeigh | 7.5 | 12.3 | 9.5 | 14.8 | 7.3 | 1.6 | -2.6 |
SVM | NA | 4.1 | 22.7 | 27.7 | 26.1 | 19.1 | 12 |
Neural Net | 0.7 | 8.2 | 10.8 | 13.3 | 10.9 | 11.8 | 2.3 |
4 Discussion
Strong, automatically-generated feature sets enhance the power of ML algorithms to extract structure from data. They are always desirable tools, but especially so when training data is limited. Many ML targets, such as tasks for which data must be manually collected in medical, scientific, or field settings, do not have the luxury of vast amounts of (e.g. internet-generated) training data, so they must extract maximum value from the limited available data. This large class of ML targets also includes Artificial Intelligence systems that seek adaptive and rapid learning skills. In this context, biological structures and mechanisms are potentially useful tools, given that BNNs excel at rapid learning.
Our experiments deployed an architecture based on a simple BNN, the moth olfactory network, to generate features to support ML classifiers. The three key elements of this network are novel in the context of engineered NNs, but are endemic in BNNs of all complexity levels: (i) a competitive inhibition layer; (ii) a high-dimensional sparse layer; and (iii) a Hebbian plasticity mechanism for weight updates in training. Our experiments indicate that these structures, as combined in the MothNet model of the insect olfactory network, create a highly effective feature generator whose Readout Neurons contain strong class-specific information.
In particular, using MothNet as a feature generator upstream of standard ML methods significantly and consistently improved their learning abilities on MNIST. That is, some class-relevant information in the raw feature distributions was not extracted by the ML methods alone, but pre-processing by MothNet made that information accessible. The relative increase in accuracy averaged 10% to 33% at low N and 6% to 10% at higher N, while the relative reduction in Test set error exceeded 50% for NN models with higher (80%) baseline accuracy.
MothNet features were much more useful than features generated by standard methods such as PCA, PLS, or NNs, and also more useful than pre-training NNs on similar data. We hypothesize that the “orthogonality” (loosely used) of the BNN structures and mechanisms in MothNet, relative to the baseline ML methods and to methods such as PCA, allowed MothNet to extract otherwise inaccessible clustering information.
Not only can these structures be readily prepended as feature generators to arbitrary ML modules, as we did here, but they can perhaps also be inserted as layers into deep NNs. Indeed, this is what BNNs appear to do.
These gains can also be viewed as savings on training data: For example, with 30 training samples per class, a MothNet+NN cyborg attains the same Test accuracy (79%) as a NN baseline attains with 100 training samples per class, a savings of over 3x in training data. These savings in training data can be seen in Fig 3 by drawing horizontal lines between cyborg and baseline accuracies. Savings consistently ranged from 1.5x to 3x. If these accuracy gains and commensurate savings held for higher numbers of training samples in more difficult tasks, the savings in data requirements would be substantial, an important benefit for many ML use-cases.
Comparison of the Mushroom Body to sparse autoencoders
The insect MB is a biological means to project codes into a sparse, high-dimensional space. It naturally brings to mind sparse autoencoders (SA)
[21, 22]. However, there are several differences, beyond the fact that MBs are not trying to match the identity function. First, in SAs the goal is typically to detect lower-dimensional structures that carry the input data. Thus the sparse layers of SAs have fewer active neurons than the nominal dimension of the input. In the MB, the number of neurons increases manyfold (e.g. 30x), so that even with enforced sparsity the number of active MB neurons is much greater than the input dimension: in MothNet there are approximately 150 to 200 active neurons in the MB vs 85 input features. The functional effects are also different: in MNIST experiments in [22], a sparse layer with 100 active neurons (vs 784 input pixels, i.e. a ratio of 1:8) captured only very local features and was not effective for feeding into shallow neural nets (though it was useful for deeper nets). In our experiments, a ratio of 2:1 (i.e. 16x that of the SA) generated features that were very effective as input to a shallow net.
Second, there is no off-line training or pre-tuning step, as used in some SAs, though of course Mother Nature has been tinkering with this system for a long time. Third, SAs typically (to our knowledge) require large amounts of training data (e.g. 5000 per class in [22]), while the MB needs as few as one training sample per class to bake in structure that improves classification. Fourth, the updates in SAs are by backprop, while those in MBs are Hebbian. While the ramifications of this difference are unclear, we suspect that the two methods yield distinct results, and that the dissimilarity of the optimizers (MothNet vs ML) was an asset in our experiments.
The MB shares with Reservoir Networks [23] a (non-linear) projection into a high-dimensional space and (linear) projection out to a Readout layer. A major difference is that in the MB neurons are not recurrently connected, while in a Reservoir Network they are. SVMs also use projection into high-dimensional spaces, and it is perhaps due to this commonality that cyborgs were less beneficial to SVMs than to NNs and Nearest Neighbors.
Role of the competitive inhibition layer
The competitive inhibition layer may enhance classification by creating several attractor basins for inputs, each focused according to which subsets of features present most strongly, which in turn depends on the classes. This might serve to push otherwise similar samples (of different classes) away from each other, towards their respective class attractors, increasing the effective distance between the samples. Thus the outputs of the AL, after this competitive inhibition, may have better separation by class.
However, in our experiments on this particular dataset, while the competitive inhibition layer (AL) did benefit the downstream ML classifier, it was less important than the sparse layer (MB). We see two reasons why this might be so. First, the AL has other jobs to do in the insect olfactory network, such as gain control and corralling inputs from the noisy antennae [24, 25]. Perhaps these are the AL's primary tasks, and separating input signals is a secondary task. Second, the MothNet model was transferred to the MNIST task from a model developed to study odor learning that was calibrated to in vivo moth data [15]. Perhaps the AL has a larger role in the natural, odor-processing setting, and its transfer to the MNIST task modified the overall balance of the AL-MB system and reduced the importance of the AL relative to the MB. That said, the best results and also the most consistent improvements were posted by full cyborgs, i.e. those generating features using the full AL-MB network.
Role of Hebbian updates
We suspect that much of the success of BNNs (and MothNet) is due to the Hebbian update mechanism, which appears to be quite distinct from typical ML weight update methods. It has no objective function or output-based loss that is pushed back through the network as in backprop or agent-based reinforcement learning (there is no “agent” in the MothNet system). Hebbian weight updates, either growth or decay, occur on a local “use it or lose it” basis.
We also suspect that much of the success of the cyborgs was due to the stacking of two distinct update methods, e.g. Hebbian and backprop. In our experience, stacking dissimilar ML methods is more productive than stacking similar methods. This may be one reason MothNet cyborgs delivered improvement to ML accuracy even in cases where the baseline ML accuracy already exceeded the MothNet’s top performance, enabling up to 40% reductions in Test set error: Each system brings unique structure-extracting skills to the data. It may also explain why projecting into the high-dimensional MB is not redundant when paired with an SVM, which also projects into a high-dimensional space: The two methods of learning the projections are different.
Limitations
A practical limitation of this method, in its current form, is that MothNet trains on and evaluates samples by the time-evolution of systems of coupled differential equations. This is time-consuming (roughly 4 seconds per sample on a laptop), and the cost increases non-linearly for more complex datasets with high-dimensional feature spaces, since these would likely require larger networks with more neurons per layer and thus more equations to evolve. In addition, the time-evolution system does not conveniently mesh with other ML platforms such as Tensorflow. Thus, a key future project is to develop different methods of running MothNet-like architectures that bypass the computations of time evolution and mesh with other platforms, yet functionally preserve a Hebbian update mechanism.
Acknowledgments
Our thanks to Blake Richards, who articulated these hypotheses and suggested these experiments.
CBD gratefully acknowledges partial funding from the Swartz Foundation.
References
- [1] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;61(Supplement C):85 – 117. Available from: http://www.sciencedirect.com/science/article/pii/S0893608014002135.
- [2] Goodfellow I, Bengio Y, Courville A, Bengio Y. Deep learning. MIT press Cambridge; 2016.
- [3] Koller D, Bengio Y. A fireside chat with Daphne Koller. ICLR. 2018;Available from: https://www.youtube.com/watch?v=N4mdV1CIpvI.
- [4] Srinivasan S, Greenspan RJ, Stevens CF, Grover D. Deep(er) learning. Journal of Neuroscience. 2018;Available from: http://www.jneurosci.org/content/early/2018/07/13/JNEUROSCI.0153-18.2018.
- [5] Riffell JA, Lei H, Abrell L, Hildebrand JG. Neural Basis of a Pollinator’s Buffet: Olfactory Specialization and Learning in Manduca sexta. Science. 2012;Available from: http://science.sciencemag.org/content/early/2012/12/05/science.1225483.
- [6] Wilson RI. Neural and behavioral mechanisms of olfactory perception. Current Opinion in Neurobiology. 2008;18(4):408 – 412. Sensory systems. Available from: http://www.sciencedirect.com/science/article/pii/S0959438808000883.
- [7] Campbell RAA, Turner GC. The mushroom body. Current Biology. 2010;20(1):R11 – R12. Available from: http://www.sciencedirect.com/science/article/pii/S096098220901851X.
- [8] Bhandawat V, Olsen SR, Gouwens NW, Schlief ML, Wilson RI. Sensory processing in the Drosophila antennal lobe increases reliability and separability of ensemble odor representations. Nature Neuroscience. 2007;10:1474–1482.
- [9] Perisse E, Burke C, Huetteroth W, Waddell S. Shocking Revelations and Saccharin Sweetness in the Study of Drosophila Olfactory Memory. Curr Biol. 2013 Sep;23(17):R752–R763. S0960-9822(13)00921-4[PII], 24028959[pmid]. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3770896/.
- [10] Honegger KS, Campbell RAA, Turner GC. Cellular-Resolution Population Imaging Reveals Robust Sparse Coding in the Drosophila Mushroom Body. Journal of Neuroscience. 2011;31(33):11772–11785. Available from: http://www.jneurosci.org/content/31/33/11772.
- [11] Ganguli S, Sompolinsky H. Compressed Sensing, Sparsity, and Dimensionality in Neuronal Information Processing and Data Analysis. Annual Review of Neuroscience. 2012;35(1):485–508. PMID: 22483042. Available from: https://doi.org/10.1146/annurev-neuro-062111-150410.
- [12] Hebb DO. The organization of behavior : a neuropsychological theory. Wiley New York; 1949.
- [13] Roelfsema PR, Holtmaat A. Control of synaptic plasticity in deep cortical networks. Nature Reviews Neuroscience. 2018 Feb;19:166 EP –. Review Article. Available from: http://dx.doi.org/10.1038/nrn.2018.6.
- [14] Caron S, Ruta V, Abbott L, Axel R. Random convergence of olfactory inputs in the Drosophila mushroom body. Nature. 2013;497(5):113–7.
- [15] Delahunt CB, Riffell JA, Kutz JN. Biological Mechanisms for Learning: A Computational Model of Olfactory Learning in the Manduca sexta Moth, With Applications to Neural Nets. Frontiers in Computational Neuroscience. 2018;12:102. Available from: https://www.frontiersin.org/article/10.3389/fncom.2018.00102.
- [16] Delahunt CB, Kutz JN. Putting a bug in ML: The moth olfactory network learns to read MNIST. arXiv. 2018;In review. Available from: https://arxiv.org/abs/1802.05405.
- [17] Huerta R, Nowotny T. Fast and Robust Learning by Reinforcement Signals: Explorations in the Insect Brain. Neural Computation. 2009 Aug;21(8):2123–2151.
- [18] Lake BM, Salakhutdinov R, Tenenbaum JB. Human-level concept learning through probabilistic program induction. Science. 2015;350(6266):1332–1338.
- [19] LeCun Y, Cortes C. MNIST handwritten digit database. Website. 2010;Available from: http://yann.lecun.com/exdb/mnist/ [cited 2016-01-14 14:24:11].
- [20] Murphy KP. Machine Learning: A Probabilistic Perspective. The MIT Press; 2012.
- [21] Ng A. Sparse Autoencoder. Webpage. 2010;Available from: https://web.stanford.edu/class/archive/cs/cs294a/cs294a.1104/sparseAutoencoder.pdf.
- [22] Makhzani A, Frey BJ. k-Sparse Autoencoders. CoRR. 2013;abs/1312.5663. Available from: http://arxiv.org/abs/1312.5663.
- [23] Schrauwen B, Verstraeten D, Campenhout JV. An overview of reservoir computing: theory, applications and implementations. In: Proceedings of the 15th European Symposium on Artificial Neural Networks; 2007. p. 471–482.
- [24] Martin JP, Beyerlein A, Dacks AM, Reisenman CE, Riffell JA, Lei H, et al. The neurobiology of insect olfaction: Sensory processing in a comparative context. Progress in Neurobiology. 2011;95(3):427 – 447. Available from: http://www.sciencedirect.com/science/article/pii/S0301008211001742.
- [25] Olsen SR, Bhandawat V, Wilson RI. Divisive normalization in olfactory population codes. Neuron. 2010 Apr;66(2):287–299. 20435004[pmid]. Available from: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2866644/.