This paper considers the problem of neural decoding from parallel neural measurement systems such as micro-electrocorticography (μ-ECoG). In systems with large numbers of array elements at very high sampling rates, the dimension of the raw measurement data may be large. Learning neural decoders for this high-dimensional data can be challenging, particularly when the number of training samples is limited. To address this challenge, this work presents a novel neural network decoder with a low-rank structure in the first hidden layer. The low-rank constraints dramatically reduce the number of parameters in the decoder while still enabling a rich class of nonlinear decoder maps. The low-rank decoder is illustrated on μ-ECoG data from the primary auditory cortex (A1) of awake rats. This decoding problem is particularly challenging due to the complexity of neural responses in the auditory cortex and the presence of confounding signals in awake animals. It is shown that the proposed low-rank decoder significantly outperforms models using standard dimensionality reduction techniques such as principal component analysis (PCA).
Keywords: auditory decoding; neural networks; low-rank filter; dimensionality reduction
Advancements in neural recording technologies, particularly calcium imaging and high-dimensional micro-electrocorticography (μ-ECoG), now enable measurements of tremendous numbers of neurons or brain regions in parallel Chang (2015); Fukushima et al. (2015); Stosiek et al. (2003). While these recordings offer the potential to observe neural activity at an unprecedented level of detail, the high dimensionality presents a fundamental challenge for learning neural decoding systems from data.
This dimensionality problem is particularly acute for the focus of this work, namely neural decoding of signals in the primary auditory cortex from state-of-the-art μ-ECoG. Most importantly, in modern μ-ECoG systems, the dimensionality of the measured responses often exceeds the number of training examples. For example, in the application we discuss below, the response from the μ-ECoG array Insanally et al. (2016) for each stimulus consists of approximately 160 time samples across 61 electrodes, resulting in a raw feature dimension of approximately $160 \times 61 = 9760$. However, due to experimental limits on the duration of the experiments, there are fewer than 400 training examples. Moreover, responses in the primary auditory cortex are known to be complex Zatorre et al. (2002); Młynarski and McDermott (2018). Also, for awake animals, the responses may have confounding components from movements. Consequently, neural decoding systems must be sufficiently rich to enable nonlinear decoding and confounding signal rejection.
This work presents a novel approach for neural decoding from parallel neural measurements that uses a small number of parameters while still capturing complex nonlinear relationships between the measurements and the stimulus. The approach is based on a traditional neural network structure, but with two key novel properties: (1) a discrete cosine transform (DCT) pre-processing stage used to reduce the sampling rate; and (2) an initial linear layer of the neural network with a low-rank structure. We argue that both structures are well justified by the underlying physical processes and can dramatically reduce the number of parameters. The method is demonstrated on neural decoding in the rat primary auditory cortex (A1) from a new high-dimensional μ-ECoG array Insanally et al. (2016).
2.1 Previous work
Despite advancements in machine learning tools, traditional methods are still common in auditory decoding Glaser et al. (2017). Some of these methods consider both linear and non-linear mappings of the neural responses to the auditory spectrogram Pasley et al. (2012). Other methods, such as canonical correlation analysis (CCA) as in de Cheveigné et al. (2018), have been used as linear models that transform the stimulus and the response and then measure the correlation between them as a goodness of fit.
Multi-layer neural networks have shown remarkable success in feature extraction and classification in machine vision and speech processing Yamins and DiCarlo (2016). Since auditory signals arrive at the cortex after passing through a number of sensory processing areas, these networks are appealing for modeling the responses in the auditory cortex Hackett (2011).
Dimensionality reduction methods are largely based on the unlabeled data and attempt to find a low-dimensional latent representation that can capture the bulk of the signal variance. Neural decoders can then be trained on the low-dimensional representation to reduce the number of parameters. As we will see in the results section below, our method can outperform these dimensionality-reduction-based techniques since it operates on the labeled data and, in essence, finds the directions of variance that are best tuned for the neural decoding task.
3 Model Description
We consider the problem of decoding stimuli from $d$-dimensional neural responses recorded from some area of the brain. Such responses can arise from any parallel measurement system, including responses measured by an ECoG microelectrode array with $d$ channels, calcium traces from $d$ neurons, or signals recorded by the recently developed Neuropixel probes Callaway and Garg (2017). Let $X \in \mathbb{R}^{d \times T}$ be the response to some stimulus $y$ recorded in a time window of length $T$ after the stimulus is applied. Given $N$ input-output sample pairs $(X_i, y_i)$, $i = 1, \ldots, N$, the neural decoding problem is to learn a decoder that can estimate the stimulus $y$ from a new response $X$. Depending on whether the stimulus is discrete or continuous-valued, the decoding problem can be viewed as either classification or regression.
The key challenge in this decoding problem is the potentially high dimensionality of the decoder input $X$. Since the response has $dT$ features, even linear classification or regression would require on the order of $dT$ parameters. This number of parameters may easily exceed the number of trials $N$ on which the decoder can be trained. Thus, some form of dimensionality reduction or structure on the decoder is required.
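To make the imbalance between parameters and training samples concrete, the following sketch counts the weights of a plain linear decoder on the raw response. It uses the figures quoted in the introduction (about 160 time samples, 61 electrodes, fewer than 400 trials); the variable names are illustrative, and 400 is used only as an upper bound on the number of trials.

```python
# Rough parameter count for a linear decoder on the raw response.
# Figures come from the text: ~160 time samples, 61 electrodes, and
# fewer than 400 training examples (400 is used as an upper bound).
n_channels = 61
n_time_samples = 160
n_trials_upper_bound = 400

n_features = n_channels * n_time_samples   # raw feature dimension
n_linear_params = n_features + 1           # one weight per feature plus a bias

ratio = n_linear_params / n_trials_upper_bound
print(f"raw features: {n_features}")
print(f"linear decoder parameters: {n_linear_params}")
print(f"parameters per training example (at least): {ratio:.1f}")
```

Even this simplest decoder has more than twenty parameters per available trial, which is why some structure on the decoder is needed.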
To address this challenge, we propose a novel low-rank neural network structure to reduce the number of parameters while still enabling rich nonlinear maps from the response to the stimulus estimate. Here, we present the model for a regression problem with a scalar target $y$. However, the same model can be used for classification or multi-target regression with minor modifications. Figure 1 shows the structure of the model proposed for decoding multidimensional neural processes. The first stage preprocesses the data by passing the $T$ time samples of each of the $d$ components through a discrete cosine transform (DCT). To low-pass filter the signal, only the first $K$ coefficients in the frequency domain are retained, hence reducing the dimension from $d \times T$ to $d \times K$. The low-pass filtering is justified under the assumption that the neural responses to the stimuli are band-limited.
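The DCT truncation step can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the authors' preprocessing code: it assumes an orthonormal DCT-II applied along the time axis, uses a random array in place of a recorded response, and takes the sizes (61 channels, 160 time samples, 55 retained coefficients) from the numbers quoted elsewhere in the text. The helper names `dct2_matrix` and `dct_lowpass` are hypothetical.

```python
import numpy as np

def dct2_matrix(n_points: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n_points, n_points)."""
    n = np.arange(n_points)
    k = n[:, None]
    mat = np.cos(np.pi * (2 * n[None, :] + 1) * k / (2 * n_points))
    mat *= np.sqrt(2.0 / n_points)
    mat[0, :] /= np.sqrt(2.0)  # rescale the DC row so that all rows are orthonormal
    return mat

def dct_lowpass(response: np.ndarray, n_keep: int) -> np.ndarray:
    """Apply a DCT along time (axis 1) and keep the first n_keep coefficients.

    response: array of shape (n_channels, n_time_samples)
    returns:  array of shape (n_channels, n_keep)
    """
    n_time = response.shape[1]
    basis = dct2_matrix(n_time)      # rows are DCT basis vectors
    coeffs = response @ basis.T      # (n_channels, n_time) DCT coefficients
    return coeffs[:, :n_keep]        # low-pass: drop the high-frequency columns

rng = np.random.default_rng(0)
x = rng.standard_normal((61, 160))   # synthetic stand-in for one response
z = dct_lowpass(x, n_keep=55)
print(z.shape)
```

Because the transform is orthonormal, truncation simply discards the energy carried by the higher-frequency basis vectors and leaves the retained coefficients unchanged.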
After the low-pass filtering, the resulting frequency-domain matrix $Z \in \mathbb{R}^{d \times K}$ is passed through a neural network with two hidden layers and one output layer,

$$h^{(1)}_j = \sigma\left(u_j^\top Z v_j + b^{(1)}_j\right), \quad j = 1, \ldots, N_h, \tag{1}$$
$$h^{(2)} = \sigma\left(W^{(2)} h^{(1)} + b^{(2)}\right), \tag{2}$$
$$\hat{y} = \sigma\left(w^{(3)\top} h^{(2)} + b^{(3)}\right), \tag{3}$$

where $\sigma$ is the sigmoid function. The key novel feature of this network is in the first layer (1), where each of the $N_h$ hidden units is computed from the inner product of the input $Z$ with a rank-one matrix $u_j v_j^\top$, with $u_j \in \mathbb{R}^d$ and $v_j \in \mathbb{R}^K$. The second hidden layer (2) and output layer (3) are mostly standard. The only slightly non-standard component is that, in the output, we have assumed that the stimulus is bounded and scaled to the range $[0, 1]$ so that we can use a sigmoid output.
The main motivation for the rank-one structure (1) is to reduce the number of parameters. A standard fully connected layer would require $dK$ parameters for each hidden unit, for a total of $N_h dK$ parameters. In contrast, the rank-one layer (1) uses only $N_h(d + K)$ parameters. We will see in the results section that these savings can be considerable.
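The rank-one units and the resulting parameter savings can be illustrated with a short NumPy sketch. It is a hedged example, not the authors' implementation: the function name `rank_one_layer` and the weight names `u`, `v`, `b` are introduced here for illustration, the weights are random, and the sizes (61 channels, 55 frequency coefficients, 10 units) are taken from the experimental section. The sketch also checks that a rank-one unit equals a dense unit whose weight matrix is the outer product of its two weight vectors.

```python
import numpy as np

def rank_one_layer(z, u, v, b):
    """Pre-activations of rank-one units: a_j = u_j^T Z v_j + b_j.

    z: (n_channels, n_freq) input matrix
    u: (n_units, n_channels) channel weights
    v: (n_units, n_freq) frequency weights
    b: (n_units,) biases
    """
    return np.einsum("jc,cf,jf->j", u, z, v) + b

n_channels, n_freq, n_units = 61, 55, 10
rng = np.random.default_rng(0)
z = rng.standard_normal((n_channels, n_freq))
u = rng.standard_normal((n_units, n_channels))
v = rng.standard_normal((n_units, n_freq))
b = np.zeros(n_units)

a_low_rank = rank_one_layer(z, u, v, b)

# Same result from a dense layer whose weight matrices are the outer products u_j v_j^T
w_full = np.einsum("jc,jf->jcf", u, v)
a_dense = np.einsum("jcf,cf->j", w_full, z) + b
assert np.allclose(a_low_rank, a_dense)

# Weight counts for the first layer, biases excluded
params_dense = n_units * n_channels * n_freq
params_rank_one = n_units * (n_channels + n_freq)
print(params_dense, params_rank_one)   # 33550 1160
```

With these sizes the rank-one constraint reduces the first-layer weight count by roughly a factor of 29.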
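The low-rank structure can be justified, at least heuristically, under the assumption of a low-rank structure of the neural responses. Specifically, suppose that the frequency-domain neural responses, $Z_{ik}$, are approximately given by

$$Z_{ik} \approx \sum_{\ell=1}^{r} a_{i\ell}\, b_{k\ell}\, s_\ell, \tag{4}$$

where $s_\ell$ are some latent variables caused by the stimuli, $\ell = 1, \ldots, r$, and $a_{i\ell}$ and $b_{k\ell}$ are, respectively, the responses of the latent variable $s_\ell$ over the measurement channel index $i$ and frequency index $k$. Under this assumption, a natural way to estimate the stimulus $y$ is to first estimate the vector of latent variables $s = (s_1, \ldots, s_r)$ from $Z$ and then estimate $y$ from the vector $s$. Now, we can write (4) as $\mathrm{vec}(Z) \approx A s$, where $A$ is a linear map. The (regularized) least squares estimate for $s$ given $Z$ is then given by $\hat{s} = (A^\top A + \lambda I)^{-1} A^\top \mathrm{vec}(Z)$ for some regularization level $\lambda$. Due to the separability structure (4), it is easily verified that each estimate will be of the form

$$\hat{s}_\ell = \sum_{m=1}^{r} G_{\ell m}\, a_m^\top Z\, b_m, \qquad G = (A^\top A + \lambda I)^{-1},$$

where $a_m$ and $b_m$ denote the vectors with entries $a_{im}$ and $b_{km}$. That is, each estimate is a linear combination of inner products of $Z$ with rank-one matrices, which is the form computed by the units in the first layer.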
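The separability argument can be checked numerically. The sketch below is an illustration on synthetic data under assumed notation: the columns of `a` and `b` hold the channel and frequency profiles of each latent variable, `s_true` holds the latent variables, and `lam` is an arbitrary regularization level. It compares the ridge estimate computed on the vectorized response with the same estimate assembled from rank-one inner products.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_freq, n_latent = 61, 55, 3
lam = 0.1                                          # regularization level (arbitrary)

a = rng.standard_normal((n_channels, n_latent))    # channel profiles, one column per latent
b = rng.standard_normal((n_freq, n_latent))        # frequency profiles, one column per latent
s_true = rng.standard_normal(n_latent)
# Separable response plus a small amount of noise
z = (a * s_true) @ b.T + 0.01 * rng.standard_normal((n_channels, n_freq))

# Direct ridge solution on the vectorized response: vec(Z) ~ M s
m = np.stack([np.outer(a[:, k], b[:, k]).ravel() for k in range(n_latent)], axis=1)
s_ridge = np.linalg.solve(m.T @ m + lam * np.eye(n_latent), m.T @ z.ravel())

# Same estimate from rank-one inner products a_k^T Z b_k followed by a small linear map
c = np.einsum("ck,cf,fk->k", a, z, b)
gram = (a.T @ a) * (b.T @ b)                       # equals M^T M by separability
s_separable = np.linalg.solve(gram + lam * np.eye(n_latent), c)

assert np.allclose(s_ridge, s_separable)
print(np.round(s_true, 3), np.round(s_ridge, 3))
```

The only operations applied to the full response matrix are the rank-one inner products; everything after that acts on a vector whose length equals the number of latent variables.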
4.1 ECoG data from auditory cortex
We evaluate the performance of our model using in vivo ECoG recordings from the A1 area of the auditory cortex in moving rodents. Signals are recorded from a high-resolution ECoG array with 61 electrodes with 420 µm spacing. The electrodes are arranged in an 8 x 8 grid from which three corner electrodes are omitted Insanally et al. (2016). In each experiment, single-frequency tones of different frequencies are played every second and the responses are recorded. Figure 2 shows the experimental setup and the electrode array. Recorded signals are then down-sampled for further processing. There are a total of 390 tones played in each experiment.
4.2 Decoder performance
To train our model and test its performance, we generate a dataset $\{(X_i, y_i)\}_{i=1}^{N}$. Each sample consists of the frequency of the stimulus $y_i$ and a window extracted from the signals after the stimulus is applied as the response $X_i$. Each $X_i$ is a matrix with 61 channels and approximately 160 time samples. The stimulus frequencies are shifted and rescaled to fall inside the interval $[0, 1]$. After taking the DCT of each channel along time, we retain the first 55 frequency coefficients to reduce the dimensionality. We then pass the signal through a low-rank layer with 10 rank-one units. This layer is followed by a dense layer with 4 hidden units and sigmoid activation. The output layer is a single unit with a sigmoid nonlinearity, which gives the predicted frequency index. Regularization is used in learning the weights of both the separable (low-rank) and fully connected layers. The model is trained on one part of the dataset and evaluated on the remaining samples as the test set. The goal is to estimate the index of the frequency as a regression problem, and the R-squared score is used as a measure of how closely the data follow the fitted regression model.
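The architecture described above can be written as a short forward pass. This is a sketch with random, untrained weights, intended only to show the shapes and the overall parameter count for the stated sizes (61 channels, 55 DCT coefficients, 10 rank-one units, 4 hidden units, one sigmoid output). Applying a sigmoid after the rank-one units is an assumption of this sketch, and the names `init_params` and `forward` are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def init_params(n_channels=61, n_freq=55, n_rank_one=10, n_hidden=4, seed=0):
    """Small random initial weights; the layer sizes follow the text."""
    rng = np.random.default_rng(seed)
    return {
        "u": 0.1 * rng.standard_normal((n_rank_one, n_channels)),
        "v": 0.1 * rng.standard_normal((n_rank_one, n_freq)),
        "b1": np.zeros(n_rank_one),
        "w2": 0.1 * rng.standard_normal((n_hidden, n_rank_one)),
        "b2": np.zeros(n_hidden),
        "w3": 0.1 * rng.standard_normal(n_hidden),
        "b3": 0.0,
    }

def forward(z_batch, p):
    """z_batch: (n_samples, n_channels, n_freq) -> predictions in (0, 1)."""
    # Rank-one units; the sigmoid here is an assumption of this sketch
    h1 = sigmoid(np.einsum("jc,ncf,jf->nj", p["u"], z_batch, p["v"]) + p["b1"])
    # Standard dense hidden layer with sigmoid activation
    h2 = sigmoid(h1 @ p["w2"].T + p["b2"])
    # Single sigmoid output, matching targets rescaled to the unit interval
    return sigmoid(h2 @ p["w3"] + p["b3"])

params = init_params()
z_batch = np.random.default_rng(1).standard_normal((8, 61, 55))
y_hat = forward(z_batch, params)
n_params = sum(np.size(w) for w in params.values())
print(y_hat.shape, n_params)   # (8,) 1219
```

Including biases, the whole network in this sketch has 1219 parameters, compared with the 9760 raw features of a single response.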
We compare the performance of the proposed low-rank neural network with three commonly used models:
- PCA + linear: the top $k$ principal components of the input are used for linear regression. We use both $\ell_1$ and $\ell_2$ regularizers.
- PCA + SVM: the top $k$ principal components of the input are taken, followed by a support vector regressor.
- PCA + NN: the top $k$ principal components of the input are taken, followed by a neural network with one hidden layer. Regularization is applied to the weights.
For all three models we take the same number $k$ of top principal components. Cross-validation is used to tune the parameters. For the SVM, both linear and radial basis function (RBF) kernels were tried, and the RBF kernel was found to give better results.
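As an illustration of the first baseline, the sketch below implements PCA followed by ridge (squared-norm regularized) linear regression in plain NumPy. It is not the authors' code: the data are synthetic, and the number of components (20), the regularization level (1.0), and the 300/90 split of the 390 samples are arbitrary choices made for this example. The function names are hypothetical.

```python
import numpy as np

def fit_pca_ridge(x_train, y_train, n_components, lam):
    """Fit PCA on the flattened responses, then ridge regression on the scores."""
    mean = x_train.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    components = vt[:n_components]
    scores = (x_train - mean) @ components.T
    y_mean = y_train.mean()
    gram = scores.T @ scores + lam * np.eye(n_components)
    weights = np.linalg.solve(gram, scores.T @ (y_train - y_mean))
    return mean, components, weights, y_mean

def predict_pca_ridge(x, model):
    mean, components, weights, y_mean = model
    return (x - mean) @ components.T @ weights + y_mean

# Synthetic stand-in data: 390 flattened responses of dimension 61 * 55
rng = np.random.default_rng(0)
x = rng.standard_normal((390, 61 * 55))
w_true = rng.standard_normal(61 * 55) / np.sqrt(61 * 55)
y = x @ w_true + 0.1 * rng.standard_normal(390)

model = fit_pca_ridge(x[:300], y[:300], n_components=20, lam=1.0)
y_pred = predict_pca_ridge(x[300:], model)
print(y_pred.shape)
```

Note that the principal directions are chosen without looking at the targets, which is the limitation of unsupervised dimensionality reduction that the low-rank decoder is meant to avoid.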
Figure 3 shows the performance of all four models in estimating the stimulus frequency on the test dataset. The estimated frequency ($\hat{y}$) from each model, with one-standard-deviation error bands, is plotted against the true frequency ($y$). The dashed line shows the line $\hat{y} = y$, corresponding to a perfect model. Therefore, the distance of each model's prediction curve from this line corresponds to the bias of the estimator, and the shaded error bands correspond to its variance. The low-rank neural network is closest to the reference line, showing that it performs better than the other models. Table 1 summarizes the performance of each model in estimating the log-frequency of the stimulus in terms of the R-squared metric along with the root-mean-square error (RMSE).
|Model|R-squared|RMSE|
|---|---|---|
|PCA + Linear|0.484|0.179|
|PCA + SVM|0.476|0.181|
|PCA + NN|0.510|0.174|
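The two summary metrics reported in the table can be computed as follows. This is a minimal sketch using the standard definitions of R-squared and RMSE; the example values are made up and are not the paper's data.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up targets on the unit interval and slightly biased predictions
y_true = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
y_pred = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
print(r_squared(y_true, y_pred), rmse(y_true, y_pred))
```

Higher R-squared and lower RMSE both indicate predictions closer to the targets, so the two columns of the table should rank the models consistently.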
The problem of decoding multidimensional neural responses can be challenging due to the high dimensionality of the data. In this work, we presented a neural network model whose first hidden layer has low-rank structured weights, which significantly reduces the number of parameters compared to a fully connected network. We tested the model on decoding ECoG data recorded from the A1 area of the auditory cortex of awake rats. We compared the proposed model with some of the most widely used models for decoding neural signals. On this task, the proposed model predicted the frequency of the stimulus more accurately than these baselines.
- Callaway, EM. and Garg, AK. (2017). Brain technology: Neurons recorded en masse. Nature, 551(7679), 172.
- Chang, EF. (2015). Towards large-scale, human-based, mesoscopic neurotechnologies. Neuron, 86(1), 68–78.
- Cunningham, JP. and Yu, BM. (2014). Dimensionality reduction for large-scale neural recordings. Nature neuroscience, 17(11), 1500.
- de Cheveigné, A., Wong, DD., Di Liberto, GM., Hjortkjær, J., Slaney, M. and Lalor, E. (2018). Decoding the auditory brain with canonical component analysis. NeuroImage, 172, 206–216.
- Francis, NA., Winkowski, DE., Sheikhattar, A., Armengol, K., Babadi, B. and Kanold, PO. (2018). Small networks encode decision-making in primary auditory cortex. Neuron, 97(4), 885–897.
- Fukushima, M., Chao, ZC. and Fujii, N. (2015). Studying brain functions with mesoscopic measurements: Advances in electrocorticography for non-human primates. Current opinion in neurobiology, 32, 124–131.
- Glaser, JI., Chowdhury, RH., Perich, MG., Miller, LE. and Kording, KP. (2017). Machine learning for neural decoding. arXiv preprint arXiv:1708.00909.
- Hackett, TA. (2011). Information flow in the auditory cortical network. Hearing research, 271(1-2), 133–146.
- Insanally, M., Trumpis, M., Wang, C., Chiang, CH., Woods, V., Palopoli-Trojani, K., ..., Viventi, J. (2016). A low-cost, multiplexed ECoG system for high-density recordings in freely moving rodents. Journal of neural engineering, 13(2), 026030.
- Mazzucato, L., Fontanini, A. and La Camera, G. (2016). Stimuli reduce the dimensionality of cortical activity. Frontiers in systems neuroscience, 10, 11.
- Młynarski, W. and McDermott, JH. (2018). Learning midlevel auditory codes from natural sound statistics. Neural computation, 30(3), 631–669.
- Pasley, BN., David, SV., Mesgarani, N., Flinker, A., Shamma, SA., Crone, NE., ..., Chang, EF. (2012). Reconstructing speech from human auditory cortex. PLoS biology, 10(1), e1001251.
- Sadtler, PT., Quick, KM., Golub, MD., Chase, SM., Ryu, SI., Tyler-Kabara, EC., ..., Batista, AP. (2014). Neural constraints on learning. Nature, 512(7515), 423.
- Stosiek, C., Garaschuk, O., Holthoff, K. and Konnerth, A. (2003). In vivo two-photon calcium imaging of neuronal networks. Proceedings of the National Academy of Sciences, 100(12), 7319–7324.
- Williamson, RC., Cowley, BR., Litwin-Kumar, A., Doiron, B., Kohn, A., Smith, MA. and Yu, BM. (2016). Scaling properties of dimensionality reduction for neural populations and network models. PLoS computational biology, 12(12), e1005141.
- Yamins, DL. and DiCarlo, JJ. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature neuroscience, 19(3), 356.
- Zatorre, RJ., Belin, P. and Penhune, VB. (2002). Structure and function of auditory cortex: music and speech. Trends in cognitive sciences, 6(1), 37–46.