Machine Learning (ML) networks are frequently tasked with the recognition and classification of objects or features. Traditionally, the selection of a network architecture for classification has been based on an abstracted view of how mammalian brains are structured [goodfellow16]. These ML networks consist of nonlinear active units (‘neurons’) arranged in layers, with rules on how the active units in one internal (hidden) layer are influenced by the activity of the units in another. Matching inputs and associated outputs—constrained by the layer-to-layer rules—provides a metric for the success of the network’s selection capability. While ‘large enough’ networks of this sort have shown substantial success in generalization (prediction) [gulshan16], the customary architecture does not reflect how neurobiological networks perform the same tasks.
Huerta and Nowotny [huerta09] were the first to suggest that the neural circuits of the insect olfactory system might serve as a model for classification tasks in ML. Recognizing that biological olfaction networks rapidly and accurately identify chemical constituents in odors, they argued that an abstraction of the olfactory network could perform the same classification task on common data sets in the ML literature [Huerta2013]. We provide a concrete instantiation of these suggestions and demonstrate how they may be effectively used in an ML context—i.e., the identification of objects in a class and separation of mixtures of these objects.
II Insect Olfaction
The structure of the “front end” of the insect olfactory system is sketched in Fig (1). Even in this set of coupled functional networks, not all biophysical details are known. Our construction for ML purposes does not strive to reproduce details of observed biological networks, only to utilize their essential functionality.
In this paper, our abstraction focuses on the functional roles of the antennal lobe (AL) and mushroom body (MB). When stimulated by distinct current inputs, the AL produces trajectories in the network phase space that follow distinct heteroclinic loops among unstable points or regions of the state space. These trajectories move from one unstable region to another and continue to do so as long as the stimulus persists. When the stimulus ends, the trajectory retreats to a stable fixed point region, where it responds to environmental noise [Laurent1996, Laurent1999, LaurentWehr1996]. These observed properties of the AL led to the suggestion of an AL network structure [rabinovich2000, rabinovich2001, rabinovich2003] called a winnerless competition (WLC) network [rabinovich2001].
This idea of the AL as a WLC network was examined in experiments on the olfactory networks of locusts by Mazor and Laurent [mazor05]. They found that the AL responses to stimulating odors of varying duration were described by: (1) an “on-transient” when the stimulus is first received, (2) an “off-transient” as the neural activity returned to its stable baseline, and (3) movement around a fixed point in AL neuron phase space. Importantly, they noted that “optimal stimulus separation occurred during the transients…”, suggesting that the biological AL acts as both a WLC network and a longer time scale network exhibiting odor-specific fixed points.
Once the network architecture is established, the precise trajectory in the system phase space is determined by the specific stimulus. As the WLC network is composed of multiple regions of nonlinear unstable behavior, the phase space trajectory is seen to be quite sensitive to the selected stimulus. This sensitivity suggests that a WLC network could distinguish among many ‘nearby’ odors.
Using the extensively studied insect olfactory network in an ML classification setting enables us to use it as a guide, parallel to the function of the biology. Here we explore the development of an ML classification structure using components from biological olfaction.
In this paper we proceed with a reminder of how WLC networks can be built, and explore the properties of two such networks; each has the same basic architecture, but one uses FitzHugh-Nagumo (FHN) neurons [FitzHugh1961, Nagumo1962] at its nodes, while the other uses simple Hodgkin-Huxley (HH) neurons [jwu, willshaw] with Na, K, and leak channels (NaKL neurons). Here we discuss in detail the WLC network constructed with FHN nodes; the results using the biologically more realistic HH nodes are essentially the same.
This WLC architecture is then used to represent an AL. We build a WLC network containing 1000 neurons and report on its performance in classifying distinct ‘odors’ represented by currents presented to this AL.
Insects also employ a second stage of their classification network, the MB (Fig 1), possibly because it allows the specification and separation of phase space regions in a more precise manner than utilizing the AL alone. The MB is represented here by a support vector machine (SVM) [convex, cs229, winstonsvm] to produce a classifier operating on the AL projection neuron outputs. Huerta [Huerta2013] suggests that the MB does indeed act as an SVM in the insect brain.
III Building a WLC Network for Classification
We analyzed an AL built as a WLC network, constructed using the Brian development package [Brian] with FitzHugh-Nagumo neurons [FitzHugh1961, Nagumo1962] at the nodes of the AL network.
The FHN neurons satisfy

$$\tau_1 \frac{dx_i(t)}{dt} = x_i - \frac{x_i^3}{3} - y_i - z_i(t)\,[x_i - \nu] + I_i(t), \qquad \frac{dy_i(t)}{dt} = x_i - b\,y_i + a, \qquad (1)$$

with the inhibitory synaptic variable for each presynaptic/postsynaptic pair in the network obeying

$$\tau_2 \frac{dz_i(t)}{dt} = \sum_j g_{ji}\,\Theta(x_j(t)) - z_i(t). \qquad (2)$$

The parameters $a$, $b$, $\nu$, $\tau_1$, and $\tau_2$ set the single-neuron dynamics; $\Theta(x)$ is the Heaviside step function. The inhibitory coupling strength $g_{ji}$ is the critical parameter in the WLC network; it must be scaled with the size of the network and the number of connections between neurons. We have $N = 1000$ neurons, which is approximately the number of neurons in the locust AL, each connected randomly (Prob = 50%) with the simple inhibitory synapses of Eq.(2).
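As an illustration only—not the paper's Brian implementation—the WLC dynamics can be sketched by forward-Euler integration of FHN nodes coupled through random inhibition. All parameter values below are hypothetical stand-ins (textbook FHN values), and the network here is smaller than the 1000-neuron AL:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in parameters; the paper's actual values are not
# reproduced here.
N = 100                       # nodes in this sketch (the paper's AL uses 1000)
a, b, eps = 0.7, 0.8, 0.08    # standard textbook FitzHugh-Nagumo parameters
g = 2.0                       # total inhibitory coupling, scaled by fan-in below
I_B = 0.8                     # DC stimulus amplitude
dt, n_steps = 0.05, 4000

# Random inhibitory connectivity with 50% connection probability.
C = (rng.random((N, N)) < 0.5).astype(float)
np.fill_diagonal(C, 0.0)

# Stimulus: roughly 1/3 of the neurons receive a DC current.
I = np.where(rng.random(N) < 1.0 / 3.0, I_B, 0.0)

v = rng.uniform(-1.0, 1.0, N)   # fast, voltage-like variable
w = np.zeros(N)                 # slow recovery variable
V = np.empty((n_steps, N))

for t in range(n_steps):
    # Heaviside-gated inhibition from active presynaptic neurons,
    # normalized by the expected fan-in (0.5 * N).
    syn = (g / (0.5 * N)) * (C @ (v > 0.0).astype(float))
    dv = v - v**3 / 3.0 - w + I - syn
    dw = eps * (v + a - b * w)
    v = v + dt * dv
    w = w + dt * dw
    V[t] = v
```

Distinct stimulus patterns `I` then drive distinct trajectories `V` through the network phase space.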
All steps above were repeated using HH neurons [jwu, willshaw] to emphasize that it is the connectivity that dictates the desired network behavior. We do not report on the results of an AL with HH nodes here.
The ‘functional olfactory’ classifier is a WLC network whose output comprises N time series, observed at discrete times, for each distinct stimulus. In the insect the original stimulus is chemical, and it is transformed via the ORNs into a specific set of currents driving the neurons in the AL.
The stimuli are comprised of currents represented in $N$-dimensional neuron space by the vectors

$$I_i(t) = \xi_i\,[\Theta(t - t_{on}) - \Theta(t - t_{off})] + \sigma\,u_i(t), \quad i = 1, \ldots, N, \qquad (3)$$

where the amplitude $I_B$ is a constant and $u_i(t)$ is ‘noise’ drawn from a uniform distribution over a fixed range. The numbers $\xi_i$ are selected by drawing $N$ values from a random distribution such that 1/3 of the values are set to $I_B$ and the rest set to 0. This gives us $N/3$ DC currents, off before $t_{on}$ and off after $t_{off}$, and constant with amplitude $I_B$ for $t_{on} \le t \le t_{off}$. The currents in Eq.(3) have added noise $\sigma\,u_i(t)$, where $\sigma$ is a scalar and $u_i(t)$ is uniformly distributed. When $\sigma = 0$, we call this the baseline odor, and we will ask over what range of $\sigma$ we can distinguish odors as we add various levels of noise to the baseline stimuli. The signal-to-noise ratio is set by $I_B/\sigma$.
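This stimulus construction can be sketched as follows; the exact noise range and amplitudes in the paper are elided, so the values here (amplitude 1.0, noise uniform on [-1, 1]) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_baseline_odor(N=1000, I_B=1.0):
    """A baseline odor: an N-vector with 1/3 of its components at I_B, rest 0."""
    xi = np.zeros(N)
    on = rng.choice(N, size=N // 3, replace=False)
    xi[on] = I_B
    return xi

def noisy_stimulus(odor, sigma, n_steps):
    """Add uniform noise of scale sigma to the DC stimulus at each time step
    (noise range [-1, 1] is an assumption)."""
    u = rng.uniform(-1.0, 1.0, size=(n_steps, odor.size))
    return odor[None, :] + sigma * u

baseline = make_baseline_odor()                              # sigma = 0 case
trial = noisy_stimulus(baseline, sigma=0.3, n_steps=500)     # noisy test input
```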
III.3 Low Dimensional Projections of the 1000 Neuron WLC Network: PCA
We have no way to visualize the distinct phase space trajectories in N = 1000 dimensions. For this purpose only, we project the neuron voltage outputs into low dimensional spaces using Principal Component Analysis (PCA) [Press-Flannery-2007-NumRecipes]. Projection to low (three) dimensions produces compelling evidence that the AL does indeed yield distinguishable trajectories for distinct odors. Fig.(2) shows that for five different baseline odors, the separation in space and time is apparent in the three dimensions selected by PCA. This projection is performed for visualization alone; the more accurate separation of input signals is achieved in the full 1000-dimensional space.
We have chosen the three axes onto which we project the 1000-dimensional network output voltages by concatenating the five individual odor time series into a single data matrix. PCA is performed on this matrix, and the data are projected onto the space defined by its three largest principal components. This protocol mixes the five odors together and decorrelates their contributions to the overall set of odor classes.
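The projection protocol can be sketched as follows, with random data standing in for the five recorded odor responses (sizes and offsets here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the network output: five odors, each a (T, N) voltage record,
# each given a distinct mean offset so the classes are actually separated.
T, N, n_odors = 200, 1000, 5
records = [rng.normal(size=(T, N)) + 3.0 * rng.normal(size=N)
           for _ in range(n_odors)]

# Concatenate the five time series into one data matrix and center it.
X = np.vstack(records)
X = X - X.mean(axis=0)

# PCA via the SVD; keep the three largest principal components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
proj = X @ Vt[:3].T     # (n_odors * T, 3) coordinates for a 3D visualization
```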
As we noted, the plots we make in three dimensions are not quantitative but ‘suggestive,’ as there is no a priori reason the network voltage output signals should be separable, identifiable, or classifiable in such low dimensions. Our strategy is to use the class separation power of an SVM operating in higher dimensions to perform the precise odor identification we require.
III.4 Support Vector Machine
The separation in phase space of the network activity for distinct inputs, apparent in Fig.(2), lends itself to the idea of using an SVM. This idea is inspired by the operation of the mushroom body, which projects the activity of the AL into a much higher dimensional space, Fig.(1). A linear SVM finds the optimal separating hyperplanes between sets of points through a convex optimization process.
After the presentation of K “baseline” inputs, we have a series of time points for each of the K classes. We apply our multiclass SVM algorithm after normalizing this set of data, with each of the time points labeled by its class label k. The algorithm is implemented in Python using the liblinear package [liblinear] through scikit-learn [scikit-learn]. Since the WLC network activity is localized by the attractor defined by the input stimulus vector, each of the classes is separable. We now have K separating hyperplanes defining the boundaries of each of the classes.
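A minimal sketch of this training step with scikit-learn is given below; synthetic Gaussian clouds stand in for the WLC voltage time points, and all sizes are illustrative assumptions:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC   # liblinear backend

rng = np.random.default_rng(3)

# Synthetic stand-in data: K classes, each a cloud of T time points in N dims.
K, T, N = 5, 200, 50
centers = 5.0 * rng.normal(size=(K, N))
X = np.vstack([c + rng.normal(size=(T, N)) for c in centers])
y = np.repeat(np.arange(K), T)      # each time point carries its class label k

# Normalize the data, then fit a multiclass (one-vs-rest) linear SVM.
scaler = StandardScaler().fit(X)
clf = LinearSVC().fit(scaler.transform(X), y)
```

The K fitted hyperplanes (`clf.coef_`, `clf.intercept_`) then define the class boundaries used at test time.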
“Testing” data consist of noisy perturbations of the input vectors; this is not noise in the network but rather perturbation of the input itself. Once the network activity is generated for the given noisy inputs, we compare the amount of time spent by each input in the regions of phase space defined by the SVM. A probability is then specified, Eq.(4), as the fraction of the total time the network spends in a given region of phase space:

$$P(k) = \frac{T_k}{\sum_{k'} T_{k'}}, \qquad (4)$$

where $T_k$ is the time spent in the region belonging to class $k$.
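This time-in-region probability can be sketched directly, assuming the SVM supplies a class prediction for each time step of the trajectory:

```python
import numpy as np

def region_probabilities(predicted_labels, n_classes):
    """Probability assigned to class k = fraction of time steps the network
    trajectory spends in the SVM region belonging to class k."""
    counts = np.bincount(predicted_labels, minlength=n_classes)
    return counts / counts.sum()

# e.g. a 10-step presentation spending 7 steps in region 2 and 3 in region 0
p = region_probabilities(np.array([2, 2, 0, 2, 2, 2, 0, 2, 0, 2]), n_classes=3)
```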
IV Results and Discussion
IV.1 Robustness to Noise
To train K classes of inputs, we define K baseline odors, Eq.(3), with $\sigma = 0$. Each of these odors is then presented to the WLC network outlined in Section III for a fixed presentation time. The robustness of the network is tested by measuring the accuracy of classification versus the number of classes and the amount of noise, defined by $\sigma$ in Eq.(3). Results are shown in Fig.(3). The network shows high robustness to input noise, with accuracy slowly tailing off once $\sigma$ reaches 7. While we do not have a quantitative statement about the high robustness to additive noise, we attribute it to the large number of negative Lyapunov exponents in a WLC network, which shape the trajectory as it moves from one unstable region to another [rabinovich2001].
Mixtures correspond to the biologically natural scenario of the insect being exposed to multiple odors at once. Further analysis of mixtures of base odors can help understand how the network separates, and then classifies, inputs.
For the data presented in Fig.(4), the network was trained on 50 base odors. An important point is that the WLC network is set a priori, so the training is of the SVM alone. From these ‘pure’ odors, two were selected, which we label A and B. A mixture with fraction x of A and 1 − x of B was then presented to the trained network, and the classification observed. For all x, the mixture was identified as being either A or B, with some very small leakage into the other 48 odors available to the WLC+SVM device. The classification is shown in Fig.(4), with the results expressed as probabilities using Eq.(4). The figure shows P_A(x); at x = 0 it should be essentially zero, and at x = 1 it should be very close to unity, and the fit to the red dot data displays just this behavior.
If the WLC+SVM network described is presented with an unknown mixture of the pure odors A and B, resulting in probabilities P_A and P_B, we can recover x and 1 − x, and thus the fraction of each pure odor, from the relative sizes of P_A and P_B.
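Assuming the fractions are estimated from the relative sizes of the two dominant probabilities (the paper's exact recovery formula is not reproduced here), the recovery step is:

```python
def mixture_fractions(p_a, p_b):
    """Estimate the pure-odor fractions x and 1 - x from the two dominant
    class probabilities, ignoring the small leakage into the other classes."""
    total = p_a + p_b
    return p_a / total, p_b / total

# e.g. hypothetical SVM output P_A = 0.58, P_B = 0.39 (0.03 leakage elsewhere)
x_a, x_b = mixture_fractions(0.58, 0.39)
```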
V Conclusion and Discussion
The input stimulus acts as a direction in N-dimensional space; this direction defines the attractor, which the network activity traverses as long as the stimulus is on. Even when perturbed by high noise levels or mixed with other odors, this “net” directionality keeps the phase space activity of the network localized to the neighborhood of a particular trajectory. The SVM then interprets the localized activity as being indicative of a particular, learned odor.
The translation of what we have discussed here to an ML device is straightforward. Learning will be achieved using methods developed for SVMs [convex, cs229, winstonsvm]; these methods are well developed and well documented.
An item we have not studied in depth, but which requires more interpretation, is the exact trajectory of the network around the attractor. We anticipate that the theoretical capacity of the network [rabinovich2001] is only fully realized when each trajectory can be identified as its own odor, i.e., a particular member of the class of objects to be classified. The SVM, on the other hand, amalgamates similar inputs in the same region of space into a single odor; this interpretation erases the temporal information in the signal. Huerta [Huerta2013] speculates that the temporal encoding in the AL helps the insect interpret the distance to the source. Additional investigation of this topic might well include experiments on time dependent odors, as well as a different formalism for the MB that can take the temporal aspects of the inputs and the network into account.
We call attention to another investigation of an insect olfactory system and its ability to classify [delahunt_biological_2018]. The AL in that work was represented as a noisy relaxation oscillator. The AL presented here is a richer system with more capacity to classify, and it does not act as a noisy circuit, but as a deterministic chaotic device [rabinovich2001].
Another interesting connection with other work is to that of Ott and colleagues [ott18, ottdresden19] where a fixed, randomly connected, high dimensional ‘reservoir’ network with a linear output layer is trained to recognize patterns in both low and high dimensional nonlinear systems. The connection to this work may be that the reservoir is acting as a WLC network—the universality of large random networks with inhibitory connections operating as WLC devices is conjectured in [rabinovich2001]. In [ott18, ottdresden19] the readout is linear and easy to train, while here a WLC network has an SVM as a readout device. SVMs are also remarkably easy to train [convex, cs229, winstonsvm].
The results of this paper on robustness against noise and reliability in identifying mixtures of ‘pure’ objects in a class, using a WLC+SVM network abstracted from a functional biological neural device, demonstrate that this is a classification network built on principles we understand at a biophysical level. This removes some of the ‘mystery’ of how ML networks operate, as one can identify the physical mechanisms behind the success of this kind of supervised learning network. In this manner we gain control over the outcome of ML techniques as we would in other physically based devices.
Although we have not investigated nor reported here the value of other known functional networks in the same spirit as the insect olfactory system, it may be productive to analyze how avian song production networks can provide insight into developing natural language processing ML networks, and place cell networks can do the same for ML navigation networks.
We acknowledge partial support from Microsoft Research. Work done by Da Wei Li and Jonathan Lam helped frame the questions in this paper. Conversations with Charles Delahunt of the University of Washington and Mark A. Stopfer of the National Institutes of Health on the insect olfactory system have been useful in the development of this work. We acknowledge helpful discussions with Maxim Bazhenov on models of the antennal lobe.