Multi-View Broad Learning System for Primate Oculomotor Decision Decoding

Multi-view learning improves learning performance by utilizing multi-view data: data collected from multiple sources, or feature sets extracted from the same data source. This approach is suitable for primate brain state decoding using cortical neural signals, because the complementary components of simultaneously recorded neural signals, local field potentials (LFPs) and action potentials (spikes), can be treated as two views. In this paper, we extended the broad learning system (BLS), a recently proposed wide neural network architecture, from single-view learning to multi-view learning, and validated its performance in monkey oculomotor decision decoding from medial frontal LFPs and spikes. We demonstrated that medial frontal LFPs and spikes in non-human primates do contain complementary information about the oculomotor decision, and that the proposed multi-view BLS classifies the oculomotor decision more effectively than several classical and state-of-the-art single-view and multi-view learning approaches.


I Introduction

Multi-view learning attempts to improve learning performance by utilizing multi-view data, which can be collected from multiple data sources, or be different feature sets extracted from the same data source. For example, in an invasive brain-machine interface (BMI) using electrodes [1], effective BMI cursor control can be achieved using action potentials (spikes), which are high-pass filtered neural signals, or local field potentials (LFPs), which are low-pass filtered neural signals measured from the same electrodes. The spikes and LFPs can thus represent two views of the same task.

There have been a few studies on applying multi-view learning to human brain state decoding. Kandemir et al. [2] combined multi-task learning and multi-view learning in decoding a user's affective state, by treating different types of physiological sensors (e.g., electroencephalography, electrocardiography, etc.) as different views. Pasupa and Szedmak [3] used tensor-based multi-view learning to predict where people are looking in images (saliency prediction), by treating the image and the user's eye movement as two views. Spyrou et al. [4] used multi-view learning to integrate spatial, temporal, and frequency signatures of electroencephalography signals for interictal epileptic discharge classification. However, to our knowledge, no one has applied multi-view learning to non-human primate brain state decoding using invasive signals like LFPs and spikes (we discuss this in detail in Section IV-E).

A broad learning system (BLS) [5] is a flexible neural network, which can incrementally adjust the number of nodes for the best performance. It has achieved performance comparable to deep learning approaches, at much lower computational cost, in two applications [5]. The main difference between a BLS and a deep learning model is that a BLS improves the learning performance by increasing the width, instead of the depth, of the neural network. This paper proposes a multi-view BLS (MvBLS), which extends BLS from traditional single-view learning to multi-view learning, and applies it to monkey oculomotor decision decoding from both LFP and spike features. By using features from different views in generating the enhancement nodes, the proposed MvBLS can significantly outperform several classical and state-of-the-art single-view and multi-view learning approaches.

The main contributions of this paper are:

  1. We proposed three different MvBLS architectures, which have comparable performance but different computational costs.

  2. We applied MvBLS to monkey oculomotor decision decoding using neural signals recorded in the medial frontal cortex, and demonstrated that it outperformed some classical and state-of-the-art single-view and multi-view learning approaches.

  3. We verified through extensive experiments that combining LFP and spike features can improve the decoding performance in monkey oculomotor decision classification. This shows that, at least in this context, LFPs and spikes in the medial frontal cortex contain complementary information about oculomotor decisions.

The remainder of this paper is organized as follows: Section II introduces the BLS and our proposed MvBLS. Section III describes the neurophysiological dataset used in this work, and the experimental results. Section IV presents some additional discussions. Finally, Section V draws conclusions.

II BLS and MvBLS

This section introduces the single-view BLS, and our proposed MvBLS, for multi-class classification.

II-A Broad Learning System (BLS)

Single-layer feed-forward neural networks are universal approximators [6], and have been used in numerous applications. Random vector functional-link neural networks (RVFLNNs) accelerate the training of single-layer feed-forward neural networks by randomly generating the weight matrix [7]. BLS is a further improvement of the RVFLNN.

In an RVFLNN, the input and output layers are directly connected. In a BLS, the input layer first passes through a feature extractor for dimensionality reduction and noise suppression. Due to the use of sparse auto-encoders, the extracted features are more diverse. This helps improve the generalization performance.

Let $X \in \mathbb{R}^{N \times M}$ be the data matrix, where $N$ is the number of observations, and $M$ the feature dimensionality. Let $Y \in \mathbb{R}^{N \times C}$ be the one-hot coding matrix of the labels of $X$, where $C$ is the number of classes. The architecture of a BLS is shown in Fig. 1. It first constructs feature nodes $Z$ from $X$, and then enhancement nodes $H$ from $Z$. Finally, BLS estimates $Y$ from both $Z$ and $H$.

Fig. 1: The architecture of a BLS [5]. Diverse linear de-noised features $Z$ are extracted from data $X$, and further mapped into nonlinear features $H$. Linear features $Z$ and nonlinear features $H$ are then concatenated to predict $Y$.

The steps to build a BLS are:

  1. Construct the linear feature nodes $Z$.¹ Let $n$ be the number of groups of feature nodes, and $k$ the number of feature nodes in each group. We first concatenate $X$ with an all-one bias vector to form the augmented data matrix $\bar{X} = [X \; \mathbf{1}] \in \mathbb{R}^{N \times (M+1)}$, then construct each of the $n$ groups of feature nodes, $Z_i \in \mathbb{R}^{N \times k}$ ($i = 1, \ldots, n$), individually. For the $i$th group, we first randomly generate uniformly distributed feature weights $W_i \in \mathbb{R}^{(M+1) \times k}$ and compute the random feature nodes $\bar{X} W_i$, then use the least absolute shrinkage and selection operator (LASSO) to obtain sparse weights $\tilde{W}_i$:

    $\tilde{W}_i = \arg\min_{\tilde{W}} \|\bar{X} W_i \tilde{W} - \bar{X}\|_F^2 + \lambda_1 \|\tilde{W}\|_1,$    (1)

    where $\tilde{W}_i \in \mathbb{R}^{k \times (M+1)}$, and $\lambda_1$ is the L1 regularization coefficient. The alternating direction method of multipliers [8] is applied to solve (1). Then, we construct $Z_i = \bar{X} \tilde{W}_i^T$, and $Z = [Z_1, \ldots, Z_n] \in \mathbb{R}^{N \times nk}$.

    ¹In Section III-A of [5], it is stated that "In our BLS, to take the advantages of sparse autoencoder characteristics, we apply the linear inverse problem in (7) and fine-tune the initial weights to obtain better features." However, its context and Algorithms 1-3 use randomly initialized weights, and do not mention exactly how the sparse autoencoder is used. Here we describe the BLS procedure according to the sample code at http://www.broadlearning.ai/, which includes the details on how the sparse autoencoder is implemented. We also compared BLS with and without the sparse autoencoder, and found that the former indeed worked better.

  2. Construct the nonlinear enhancement nodes $H \in \mathbb{R}^{N \times m}$. Let $\xi$ be the hyperbolic tangent sigmoid function, i.e.,

    $\xi(x) = \frac{2}{1 + e^{-2x}} - 1.$    (2)

    Let $\bar{Z} = [Z \; \mathbf{1}]$ be the augmented feature node matrix. Then,

    $T = \bar{Z} W_h,$    (3)
    $H = \xi\left(\frac{s}{\tau} T\right),$    (4)

    where $W_h \in \mathbb{R}^{(nk+1) \times m}$ is a matrix of the orthonormal bases of a randomly generated uniformly distributed weight matrix in $\mathbb{R}^{(nk+1) \times m}$, $s$ is a scalar normalization factor, and $\tau$ is the maximum absolute value of all elements in $T$. The goal of $s/\tau$ is to constrain the input to $\xi$ to $[-s, s]$, i.e., it performs normalization.

  3. Calculate $W \in \mathbb{R}^{(nk+m) \times C}$, the weights from $[Z \; H]$ to $Y$. Ridge regression is used to compute $W$, i.e.,

    $W = \left([Z \; H]^T [Z \; H] + \lambda_2 I\right)^{-1} [Z \; H]^T Y,$    (5)

    where $\lambda_2$ is the L2 regularization coefficient.

The pseudocode of BLS is given in Algorithm 1. Through random feature weight matrices and L1 regularization, BLS extracts multiple sets of diverse linear de-noised features $Z$ (which helps increase its generalization ability). Then, orthogonal mapping and sigmoid functions are used to construct the enhancement nodes $H$ to introduce more nonlinearity (which helps increase its model fitting power). Finally, $Z$ and $H$ are concatenated as the features for predicting $Y$.

Input: $X \in \mathbb{R}^{N \times M}$, the training data matrix;
       $Y \in \mathbb{R}^{N \times C}$, the corresponding one-hot coding label matrix of $X$;
       $n$, the number of feature node groups;
       $k$, the number of feature nodes in each group;
       $m$, the number of enhancement nodes;
       $s$, the normalization factor;
       $\lambda_1$, the L1 regularization coefficient for determining $\tilde{W}_i$;
       $\lambda_2$, the L2 regularization coefficient for determining $W$.
Output: BLS weight matrices $\tilde{W}_i$ ($i = 1, \ldots, n$), $W_h$, and $W$.
  Construct $\bar{X} = [X \; \mathbf{1}]$;
  for $i = 1$ to $n$ do
     Initialize $W_i \in \mathbb{R}^{(M+1) \times k}$ randomly;
     Calculate the random feature nodes $\bar{X} W_i$;
     Calculate $\tilde{W}_i$ using (1);
     Calculate the feature nodes $Z_i = \bar{X} \tilde{W}_i^T$;
  end for
  Construct $Z = [Z_1, \ldots, Z_n]$ and $\bar{Z} = [Z \; \mathbf{1}]$;
  Construct an orthonormal basis matrix $W_h$ from a randomly generated matrix in $\mathbb{R}^{(nk+1) \times m}$;
  Calculate the enhancement nodes $H$ using (3) and (4);
  Calculate $W$ using (5).
Algorithm 1: The BLS training algorithm [5].
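For concreteness, the following is a minimal NumPy/scikit-learn sketch of Algorithm 1. The function and variable names (feature_nodes, enhancement_nodes, train_bls, lam1, lam2) are ours, scikit-learn's Lasso stands in for the ADMM solver of [8] in (1), and we assume $nk+1 \ge m$ so that the reduced QR decomposition yields $m$ orthonormal columns.

```python
import numpy as np
from sklearn.linear_model import Lasso

def feature_nodes(X, n_groups, k, lam1, rng):
    """Step (1): n_groups groups of k sparse linear feature nodes each."""
    Xbar = np.hstack([X, np.ones((len(X), 1))])        # augmented data [X | 1]
    Z_list, B_list = [], []
    for _ in range(n_groups):
        Wi = rng.uniform(-1, 1, (Xbar.shape[1], k))    # random feature weights
        A = Xbar @ Wi                                  # random feature nodes
        # Sparse autoencoder (Eq. 1): map A back to Xbar under an L1 penalty
        lasso = Lasso(alpha=lam1, fit_intercept=False, max_iter=50).fit(A, Xbar)
        B = lasso.coef_                                # = W~_i^T, shape (M+1, k)
        B_list.append(B)
        Z_list.append(Xbar @ B)                        # feature nodes Z_i
    return np.hstack(Z_list), B_list

def enhancement_nodes(Z, Wh, s, tau=None):
    """Step (2): nonlinear enhancement nodes (Eqs. 2-4); xi(x) = tanh(x)."""
    Zbar = np.hstack([Z, np.ones((len(Z), 1))])        # [Z | 1]
    T = Zbar @ Wh
    tau = np.abs(T).max() if tau is None else tau      # training-set normalizer
    return np.tanh(s * T / tau), tau

def train_bls(X, Y, n_groups=10, k=10, m=100, s=0.8, lam1=1e-3, lam2=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    Z, B_list = feature_nodes(X, n_groups, k, lam1, rng)
    # Orthonormal basis of a random (nk+1) x m matrix for the enhancement weights
    Wh = np.linalg.qr(rng.uniform(-1, 1, (Z.shape[1] + 1, m)))[0]
    H, tau = enhancement_nodes(Z, Wh, s)
    ZH = np.hstack([Z, H])
    # Step (3): ridge regression for the output weights (Eq. 5)
    W = np.linalg.solve(ZH.T @ ZH + lam2 * np.eye(ZH.shape[1]), ZH.T @ Y)
    return B_list, Wh, W, tau

def bls_predict(X, B_list, Wh, W, s, tau):
    Xbar = np.hstack([X, np.ones((len(X), 1))])
    Z = np.hstack([Xbar @ B for B in B_list])
    H, _ = enhancement_nodes(Z, Wh, s, tau)            # reuse the training tau
    return np.argmax(np.hstack([Z, H]) @ W, axis=1)    # predicted class labels
```

Note that, at test time, the normalization factor $\tau$ computed on the training set is reused, so that training and test enhancement nodes are scaled identically.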

II-B Multi-View Broad Learning System (MvBLS)

BLS has achieved performance comparable to deep learning approaches, at much lower computational cost, on two single-view image datasets [5]. However, it is not optimized for multi-view data. This subsection extends the single-view BLS to multi-view learning.

The architecture of the proposed MvBLS is shown in Fig. 2. Without loss of generality, we only consider two views; the extension to more than two views is straightforward. The general idea is to construct the linear de-noised feature nodes of each view separately, concatenate the feature nodes from all views to construct the nonlinear enhancement nodes, and finally fuse the feature nodes and enhancement nodes together for prediction. By separating the two views in the first layer of the MvBLS and optimizing the feature nodes $Z^A$ and $Z^B$ separately, we may obtain better features than optimizing a single $Z$ directly (as in the case that we concatenate $X^A$ and $X^B$ and feed them altogether into a single BLS), because the concatenated input may be too long to be optimized effectively.

Fig. 2: Architecture of the proposed MvBLS. Diverse linear de-noised features $Z^A$ and $Z^B$ are extracted from Views A and B, respectively. $Z^A$ and $Z^B$ are then mapped into nonlinear features $H$. $Z^A$, $Z^B$, and $H$ are next concatenated to predict $Y$.

Let the two views be A and B, the corresponding data matrices be $X^A \in \mathbb{R}^{N \times M_A}$ and $X^B \in \mathbb{R}^{N \times M_B}$ ($M_A$ and $M_B$ are the feature dimensionalities of Views A and B, respectively), and the shared label matrix be $Y \in \mathbb{R}^{N \times C}$. The procedure for constructing the MvBLS is:

  1. Construct the feature nodes $Z^A$ for View A, and $Z^B$ for View B, using Step (1) of Algorithm 1.

  2. Construct the enhancement nodes $H$, using the concatenated feature nodes $[Z^A \; Z^B]$ from both views and Step (2) of Algorithm 1.

  3. Calculate $W$, the weights from $[Z^A \; Z^B \; H]$ to $Y$. Again, ridge regression is used to compute $W$. Let $V = [Z^A \; Z^B \; H]$. Then,

    $W = \left(V^T V + \lambda_2 I\right)^{-1} V^T Y,$    (6)

    where $\lambda_2$ is the L2 regularization coefficient.

The pseudocode for MvBLS is shown in Algorithm 2.

Input: $X^A \in \mathbb{R}^{N \times M_A}$, the training data matrix for View A;
       $X^B \in \mathbb{R}^{N \times M_B}$, the training data matrix for View B;
       $Y \in \mathbb{R}^{N \times C}$, the corresponding one-hot coding label matrix;
       $n$, the number of feature node groups;
       $k$, the number of feature nodes in each group;
       $m$, the number of enhancement nodes;
       $s$, the normalization factor;
       $\lambda_1$, the L1 regularization coefficient for determining $\tilde{W}_i^A$ and $\tilde{W}_i^B$;
       $\lambda_2$, the L2 regularization coefficient for determining $W$.
Output: MvBLS weight matrices $\tilde{W}_i^A$ and $\tilde{W}_i^B$ ($i = 1, \ldots, n$), $W_h$, and $W$.
  Calculate $\tilde{W}_i^A$ and $\tilde{W}_i^B$ using $X^A$, $X^B$, and Step (1) of BLS to construct the feature nodes $Z^A$ and $Z^B$;
  Calculate $H$ using $[Z^A \; Z^B]$ and Step (2) of BLS;
  Calculate $W$ using (6).
Algorithm 2: The proposed MvBLS for two views.
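Under the same assumptions, a sketch of Algorithm 2 follows, reusing feature_nodes() and enhancement_nodes() from the BLS sketch in Section II-A; train_mvbls and its default parameter values are again our naming and choices, not the paper's.

```python
import numpy as np

def train_mvbls(XA, XB, Y, n_groups=10, k=10, m=100, s=0.8,
                lam1=1e-3, lam2=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    # Step (1): per-view linear de-noised feature nodes
    ZA, BA = feature_nodes(XA, n_groups, k, lam1, rng)
    ZB, BB = feature_nodes(XB, n_groups, k, lam1, rng)
    # Step (2): shared enhancement nodes from the concatenated feature nodes
    Z = np.hstack([ZA, ZB])
    Wh = np.linalg.qr(rng.uniform(-1, 1, (Z.shape[1] + 1, m)))[0]
    H, tau = enhancement_nodes(Z, Wh, s)
    # Step (3): ridge regression from V = [Z^A | Z^B | H] to Y (Eq. 6)
    V = np.hstack([ZA, ZB, H])
    W = np.linalg.solve(V.T @ V + lam2 * np.eye(V.shape[1]), V.T @ Y)
    return (BA, BB), Wh, W, tau
```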

III Experiments and Results

This section applies BLS and MvBLS to monkey oculomotor decision classification, and compares their performance with that of several classical and state-of-the-art single-view and multi-view learning approaches.

III-A The Neurophysiology Experiment

The invasive neurophysiological experimental setup and animal behavior protocol used here were reported in [9]. All animal care and experimental procedures were in compliance with the US Public Health Service policy on the humane care and use of laboratory animals, and were approved by the Johns Hopkins University Animal Care and Use Committee.

Two male rhesus monkeys (Monkey A: 7.5 kg; Monkey I: 7.2 kg) were trained to perform an oculomotor gambling task, as shown in Fig. 3. In each trial, the monkeys chose between two gamble options by making an eye movement (saccade) towards one of the visual cues. Two visual cues were randomly presented in two of four fixed locations (top right, bottom right, top left, and bottom left). Each cue was comprised of two colors (from a four-color library of cyan, red, blue, and green), and each color was associated with an amount of reward (1, 3, 5, or 9 units of water, respectively, where 1 unit equaled 30 μL). The background color of a visual cue was cyan (small reward), and the foreground color was either red, blue, or green (larger reward). The proportion of the two colors represented the probability of winning the corresponding reward. For the red/cyan target in Fig. 3, there was a 60% probability of winning one unit of water (cyan color), and a 40% probability of winning three units of water (red color). The expected reward value would then be $0.6 \times 1 + 0.4 \times 3 = 1.8$ units of water. There were a total of seven gamble options, representing three different expected reward values, as shown in Fig. 4.

Fig. 3: Sequence of events in the oculomotor gambling task. The figure is modified from Figure 2B in [9]. In the 'target' step, two visual cues appeared in two random directions. After the visual cues were presented, the monkey made a choice between the two options by making a saccade to the corresponding visual cue ('saccade' step), indicated by the black arrow (the black arrow was artificially added to the figure to better explain the experimental design; it was not included in the actual display shown to the monkeys). The lines at the bottom indicate the duration of various time periods in the gambling task.
Fig. 4: The seven visual cues used in the gambling task. The figure is modified from Figure 2A in [9]. Four different colors (cyan, red, blue, and green) indicated different amounts of reward (increasing from 1, 3, 5 to 9 units of water, where 1 unit equaled 30 μL). For example, the expected value of the right green/cyan target is: 9 units (reward amount) × 0.2 (reward probability) + 1 unit (reward amount) × 0.8 (reward probability) = 2.6 units.

In all trials, neural signals from the monkeys' supplementary eye field in the medial frontal cortex were recorded with one or more tungsten electrodes, and the monkeys' corresponding choices were recorded using an eye tracking system (EyeLink, SR Research Ltd., Ottawa, Canada).

The goal of our study was to decode eye movements (choice intentions) from neural signals in the primate medial frontal cortex, which is causally involved in risky decisions [10].

III-B Datasets

Forty-five datasets [9] were recorded from 45 different days of experiments on the two monkeys (33 from Monkey A, and 12 from Monkey I). Their statistics are shown in Table I, where D1-D4 denote the numbers of trials in the four different saccade directions (classes), #E the number of electrodes recording the LFPs, and #C the number of cells recording the spikes.

For each recording, electrodes were lowered into the monkeys' supplementary eye field using electric microdrives. While the monkeys were performing the task, activity was recorded extracellularly using 1 to 4 tungsten microelectrodes with an impedance of 2-4 MΩ (Frederick Haer, Bowdoinham, ME, USA), spaced 1-3 mm apart. Neural activity was measured against a local reference, a stainless steel guide tube, which carried the electrode array and was positioned above the dura.

At the preamplifier stage, signals were processed with a 0.5 Hz 1-pole high-pass filter and an 8 kHz 4-pole low-pass anti-aliasing Bessel filter, and then divided into two streams for the recording of LFPs and spiking activity. The stream used for LFP recording was amplified (500-2000 gain), processed by a 4-pole 200 Hz low-pass Bessel filter, and sampled at 1000 Hz. The stream used for spike detection was processed by a 4-pole Bessel high-pass filter (300 Hz) and a 2-pole Bessel low-pass filter (6000 Hz), and was sampled at 40 kHz. Up to four template spikes were identified using principal component analysis. The spiking activity was subsequently analyzed off-line to ensure that only single units were included in subsequent analyses. Finally, spikes were counted within each millisecond bin, so the spike sampling rate was reduced to 1000 Hz.

Dataset  D1  D2  D3  D4  Avg  Std  #E  #C
1 218 236 169 142 191 43.35 4 20
2 255 259 195 189 225 37.64 3 16
3 275 310 256 195 259 48.17 3 14
4 198 187 197 141 181 26.96 3 13
5 203 210 146 153 178 33.16 3 13
6 218 231 213 190 213 17.11 3 15
7 159 184 161 142 162 17.25 3 16
8 170 193 188 164 179 13.94 3 14
9 194 183 197 177 188 9.36 3 11
10 222 249 220 195 222 22.07 3 16
11 224 235 242 212 228 13.12 3 14
12 121 114 140 129 126 11.17 2 10
13 193 177 178 189 184 7.97 3 14
14 251 220 201 192 216 26.09 2 10
15 227 211 184 176 200 23.67 2 7
16 207 188 183 156 184 21.05 2 8
17 225 203 131 173 183 40.69 2 8
18 228 208 194 188 205 17.77 2 10
19 185 166 149 148 162 17.42 3 13
20 169 147 145 164 156 12.04 1 3
21 170 151 117 137 144 22.38 1 4
22 163 144 102 126 134 26.00 1 5
23 193 192 171 182 185 10.28 1 4
24 196 183 164 172 179 13.89 1 3
25 148 138 104 138 132 19.25 2 7
26 209 166 129 182 172 33.43 2 9
27 218 168 133 210 182 39.48 2 9
28 183 161 118 164 157 27.45 1 2
29 198 173 111 170 163 36.87 3 13
30 188 181 144 143 164 23.85 3 12
31 216 189 174 212 198 19.81 3 14
32 213 206 137 213 192 36.98 3 14
33 198 174 116 179 167 35.38 3 13
34 212 188 115 178 173 41.37 3 14
35 304 274 168 274 255 59.70 3 14
36 274 222 134 245 219 60.37 3 14
37 263 202 114 204 196 61.41 3 14
38 244 202 160 202 202 34.29 3 13
39 249 240 134 223 212 52.78 3 15
40 273 253 182 220 232 39.86 3 15
41 133 136 134 123 132 5.80 2 9
42 138 154 141 117 138 15.33 3 11
43 283 248 232 246 252 21.70 3 15
44 194 187 129 162 168 29.41 3 15
45 146 187 145 124 151 26.36 3 15
Avg 208 196 160 177 185 27.85
TABLE I: Statistics of the 45 datasets.

III-C LFP and Spike Feature Extraction

In this study, single-unit spikes were smoothed by a 100-point moving average. The LFPs and the processed spikes were then epoched to [0, 400] ms after target onset for each electrode/cell. The eye movement reaction times (the time between target onset and eye movement onset) for the monkeys ranged from 100 ms to 300 ms; therefore, the monkeys finished their eye movement before the end of each trial. The spike view had $400 n_c$ features, where $n_c$ is the number of cells for spikes (#C in Table I). For each LFP trial from each electrode, the power spectral density of the 400 samples was computed by Welch's method with a Hamming window of 88 ms and 50% overlap. Then, the log-power in eight frequency bands (theta, 4-8 Hz; alpha, 8-12 Hz; beta 1, 12-24 Hz; beta 2, 24-34 Hz; gamma 1, 34-55 Hz; gamma 2, 65-95 Hz; gamma 3, 130-170 Hz; gamma 4, 170-200 Hz), as used in [11], was calculated and concatenated with the 400 time domain samples as the features. Therefore, the LFP view had $408 n_e$ features, where $n_e$ is the number of electrodes for LFPs (#E in Table I). Finally, both spike and LFP features were z-normalized.
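The per-electrode LFP feature extraction can be sketched as follows; the zero-padded FFT length (nfft) is our own choice, needed so that the frequency grid is fine enough to resolve the narrower bands:

```python
import numpy as np
from scipy.signal import welch

BANDS = [(4, 8), (8, 12), (12, 24), (24, 34),          # theta, alpha, beta 1-2
         (34, 55), (65, 95), (130, 170), (170, 200)]   # gamma 1-4, in Hz

def lfp_features(x, fs=1000):
    """x: 400-sample LFP epoch from one electrode -> 400 + 8 = 408 features."""
    f, pxx = welch(x, fs=fs, window='hamming', nperseg=88, noverlap=44, nfft=1024)
    logpow = [np.log(pxx[(f >= lo) & (f < hi)].sum()) for lo, hi in BANDS]
    return np.concatenate([x, logpow])

def z_normalize(F):
    """Column-wise z-score of a trials-by-features matrix."""
    return (F - F.mean(axis=0)) / F.std(axis=0)
```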

III-D Algorithms

We compared the performance of different decoding algorithms, including both single-view learning and multi-view learning approaches:

  1. Support vector machine (SVM), which uses error-correcting output codes (ECOC) [12] for multi-class classification. SVM is a classical statistical machine learning approach, and has achieved outstanding performance in numerous applications. The box constraint was chosen from a grid of candidate values by nested cross-validation. Each binary SVM classifier was solved by sequential minimal optimization (SMO) [13], and the optimization stopped when the gradient difference between the upper and lower violators obtained by SMO fell below a preset tolerance.

  2. Ridge classification (Ridge), which performs ridge regression to approximate the output of each class, and then assigns the input to the class with the largest output. The L2 regularization coefficient was chosen from a grid of candidate values by nested cross-validation.

  3. BLS, which was introduced in Section II-A. We used fixed values for the normalization factor $s$ and the L1 regularization coefficient $\lambda_1$, and selected the number of feature node groups $n$, the number of feature nodes in each group $k$, the number of enhancement nodes $m$, and the L2 regularization coefficient $\lambda_2$ from grids of candidate values, using nested cross-validation. The alternating direction method of multipliers [8] used to solve (1) for feature node construction is iterative; it was terminated after 50 iterations.

  4. Multi-view discriminant analysis with view-consistency (MvDA) [14], which extends classical single-view linear discriminant analysis to multi-view learning, and adds a regularization term to enhance the view-consistency.

  5. Multi-view modular discriminant analysis (MvMDA) [15], which exploits the distance between class centers across different views.

  6. MvBLS, which has been introduced in Section II-B. Its parameter tuning was the same as that for BLS.

Note that the first three algorithms can be used for both single-view learning and multi-view learning. When they were used in multi-view learning, we simply concatenated the features from different views as a single-view input to the classifier. The last three approaches were used in multi-view learning only. The subspace dimensionality of MvDA and MvMDA was set to three (the number of classes minus one). After subspace alignment, the subspace features of all views were concatenated and fed into an ECOC-SVM classifier. A linear kernel was employed in all SVMs.

We randomly partitioned each dataset into three subsets: 60% for training, 20% for validation, and the remaining 20% for test. We repeated this process 30 times on each of the 45 datasets, and recorded the test classification accuracies as our performance measure.
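A sketch of this evaluation protocol follows; fit_and_select is a hypothetical placeholder that tunes a classifier's hyperparameters on the validation set and returns the fitted model:

```python
import numpy as np

def evaluate(X, y, fit_and_select, n_repeats=30, seed=0):
    """30 random 60%/20%/20% train/validation/test partitions of one dataset."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(len(y))
        n_tr, n_va = int(0.6 * len(y)), int(0.2 * len(y))
        tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
        model = fit_and_select(X[tr], y[tr], X[va], y[va])  # tune on validation
        accs.append(np.mean(model.predict(X[te]) == y[te])) # test accuracy
    return np.mean(accs), np.std(accs)
```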

III-E Classification Using only the LFPs

In the first experiment, we used only the LFPs in classification. The classification accuracies in the 45 sessions, each averaged over 30 cross-validation runs, are shown in the bar graph in the top panel of Fig. 5, and also in the box plot in the top-left panel of Fig. 6. The last group of the bar plot also shows the average accuracies across the 45 sessions, whose numerical values are given in Table II. Each standard deviation shown in Table II was computed from the 30 per-run accuracies, each averaged over the 45 sessions. On average, BLS slightly outperformed SVM and Ridge, which was also true in 29 and 32 of the 45 individual sessions, respectively.

Fig. 5: Classification accuracies of different algorithms, when different features were used.
Fig. 6: Boxplots of the classification accuracies of different algorithms, using different features. The red line in each box indicates the median, and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The whiskers extend to the most extreme data points, excluding outliers.

               SVM         Ridge       BLS         MvDA        MvMDA       MvBLS
LFPs           40.57±0.70  40.85±0.70  41.24±0.62
Spikes         43.49±0.62  44.21±0.47  43.48±0.59
LFPs + Spikes  46.12±0.56  46.51±0.41  45.65±0.52  39.49±0.50  36.62±0.55  47.94±0.62
TABLE II: Mean and standard deviation of classification accuracies (%) when different classifiers and features were used.

To find out whether there were statistically significant differences between the algorithms, non-parametric multiple pairwise comparison tests using Dunn's procedure [16], with p-value correction by the False Discovery Rate method [17], were performed on the cross-validation accuracies. The null hypothesis in each pairwise comparison was that a randomly selected value from the first group is larger than a randomly selected value from the second group with probability 0.5; it was rejected if the corrected p-value was smaller than 0.05. The p-values, when only the LFP features were used, are shown in the first part of Table III. There was no statistically significant difference between any pair of algorithms.
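This testing procedure can be reproduced, for example, with the third-party scikit-posthocs package (our choice of implementation, not necessarily the authors'):

```python
import numpy as np
import scikit_posthocs as sp

# One accuracy array per algorithm over the 30 runs (random data for illustration)
rng = np.random.default_rng(0)
accs = [rng.normal(mu, 0.01, 30) for mu in (0.46, 0.47, 0.46, 0.39, 0.37, 0.48)]

# Dunn's pairwise comparisons with Benjamini-Hochberg FDR p-value correction
p = sp.posthoc_dunn(accs, p_adjust='fdr_bh')   # matrix of corrected p-values
reject = p < 0.05                              # significance at alpha = 0.05
```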

        LFP Features       Spike Features
        SVM     Ridge      SVM     Ridge
Ridge   .4148              .1224
BLS     .2607   .1892      .3827   .1118
TABLE III: p-values of non-parametric multiple comparisons, when only the LFP features (first part) and the spike features (second part) were used.

In summary, we have shown that when only the LFP features were used, SVM, Ridge and BLS achieved comparable classification performances (BLS may be slightly better, but there was no statistically significant difference).

III-F Classification Using only the Spikes

In the second experiment, we used only the spikes in classification. The classification accuracies in the 45 sessions, each averaged over 30 cross-validation runs, are shown in the bar graph in the middle panel of Fig. 5, and also in the box plot in the top-right panel of Fig. 6. The last group of the bar graph shows the average accuracies across the 45 sessions, whose numerical values are also given in Table II. For all algorithms, using spikes only achieved better classification accuracy than using LFPs only. On average, Ridge slightly outperformed BLS, which was also true in 32 individual sessions. Interestingly, the opposite held when the LFP features were used. This may indicate that LFPs and spikes encode non-identical information about the oculomotor decision.

Non-parametric multiple comparison tests were also performed, and the p-values are shown in the second part of Table III. There was no statistically significant difference between any pair of algorithms.

In summary, we have shown that when only the spike features were used, SVM, Ridge and BLS again achieved comparable classification performance (Ridge may be slightly better, but there was no statistically significant difference).

III-G Classification Using both LFPs and Spikes

In the third experiment, we used both LFPs and spikes in classification. The classification accuracies in the 45 sessions, each averaged over 30 cross-validation runs, are shown in the bar graph in the bottom panel of Fig. 5, and also in the box plot in the bottom panel of Fig. 6. The last group of the bar graph shows the average accuracies across the 45 sessions, whose numerical values are also given in Table II. Observe that:

  1. On average the two subspace multi-view learning algorithms, i.e., MvDA and MvMDA, performed much worse than the three single-view algorithms, i.e., SVM, Ridge, and BLS.

  2. On average our proposed MvBLS outperformed the two subspace multi-view learning algorithms. This suggests that MvBLS can extract more discriminative features and better fuse them than the other two approaches.

  3. On average our proposed MvBLS also outperformed the three single-view learning algorithms. This suggests that fusing the two views in a more sophisticated way may be more advantageous than simply concatenating them and feeding them into a single-view classifier.

Non-parametric multiple comparison tests were also performed, and the p-values are shown in Table IV, where the statistically significant ones are marked in bold. There were statistically significant differences between MvBLS and each of the other five algorithms.

SVM Ridge BLS MvDA MvMDA
Ridge .2054
BLS .2652 .0772
MvDA .0000 .0000 .0000
MvMDA .0000 .0000 .0000 .0000
MvBLS .0001 .0019 .0000 .0000 .0000
TABLE IV: p-values of non-parametric multiple comparisons, when both LFPs and spikes were used.

We also compared MvBLS with random guess. The results are shown in Fig. 7. Note that the random guess approach obtained slightly different accuracies in different sessions (not always exactly 25% in 4-class classification), because different classes had different numbers of trials. On average, the classification accuracy of MvBLS was about twice that of random guess (47.94% vs 25.04%), suggesting that a sophisticated machine learning approach like MvBLS can indeed mine useful information from LFPs and spikes.

Fig. 7: Classification accuracies of random guess and MvBLS.

To validate that LFPs and spikes do contain complementary information, we counted the number of sessions in which LFPs+spikes achieved better performance than LFPs or spikes alone, and show the results in Table V. Regardless of which classifier was used, LFPs+spikes outperformed LFPs or spikes alone in most sessions. Moreover, when MvBLS was used, LFPs+spikes outperformed the best LFPs-only performance (among SVM, Ridge, and BLS) in 40 sessions (88.89%), and the best spikes-only performance in 32 sessions (71.11%).

        Number of sessions in which LFPs+spikes outperformed
        LFPs            Spikes
SVM 31 (68.89%) 40 (88.89%)
Ridge 37 (82.22%) 28 (62.22%)
BLS 31 (68.89%) 35 (77.78%)
TABLE V: Number and percentage of sessions (among the 45 sessions) that LFPs+spikes outperformed a single modality of feature alone.

In summary, we have shown that LFPs and spikes contain complementary information about the brain’s oculomotor decision, and our proposed MvBLS can better fuse these features than several classical and state-of-the-art single-view and multi-view learning approaches.

IV Discussion

This section presents some additional discussions on the proposed MvBLS.

IV-A Classification Accuracy versus the Number of Electrodes/Cells

Figs. 5 and 7 show that sometimes the classification accuracy was very low, e.g., close to 25% (random guess). The main reason is that the number of electrodes/cells was small in these cases. For example, the top panel of Fig. 5 also shows the number of LFP electrodes in different sessions. It has a strong correlation with the classification accuracy, regardless of which classification algorithm was used. Particularly, the sessions with the lowest classification accuracy (Sessions 20-24) had the smallest number of electrodes. The middle panel of Fig. 5 shows the number of cells in different sessions. Similar patterns can be observed.

Next, we performed a deeper investigation of how the number of electrodes in LFPs and the number of cells in spikes affected the performance of BLS. (We studied LFPs and spikes separately; each time there was only one view, and hence MvBLS degraded to BLS.)

The LFPs were studied first. We identified all datasets with three or more electrodes, and considered each one separately. Let us use a dataset with three electrodes to illustrate our experimental procedure. We randomly partitioned the dataset into 60% training, 20% validation, and 20% test. Then, we increased the number of chosen electrodes from one to two and then to three; for each number of electrodes, we used the LFPs from the corresponding electrodes to train a BLS and compute its test accuracy. All possible combinations of electrodes were considered, and the average test accuracy was computed. We then repeated the data partition 30 times and computed the grand average test accuracies, as shown in Fig. 8(a) (a code sketch of this subset analysis follows below). Clearly, the classification accuracy increased with the number of electrodes. We can expect that much higher classification accuracy could be obtained if many more electrodes were used.
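A sketch of the subset analysis, assuming a hypothetical helper bls_accuracy() that trains and tests a BLS on the LFP features of the given electrodes:

```python
from itertools import combinations
import numpy as np

def accuracy_vs_num_electrodes(lfp_by_electrode, y, bls_accuracy):
    """Average BLS test accuracy over all electrode subsets of each size e."""
    n = len(lfp_by_electrode)
    mean_acc = {}
    for e in range(1, n + 1):
        accs = [bls_accuracy([lfp_by_electrode[i] for i in subset], y)
                for subset in combinations(range(n), e)]
        mean_acc[e] = np.mean(accs)     # average over all C(n, e) subsets
    return mean_acc
```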

The spikes were studied next, and the results are shown in Fig. 8(b). The experimental procedure was very similar to that for the LFPs, except for one subtle difference: the numbers of cells associated with different electrodes were generally different, so the total number of cells from a given number of electrodes could take different values. As a result, unlike in Fig. 8(a), where each dataset has only one curve, in Fig. 8(b) each dataset may have multiple branches leading to the same end-point. To make the curves more distinguishable, we only show the results for 10 datasets in Fig. 8(b). Generally, the classification accuracy increased with the number of cells, which is intuitive.

Fig. 8: BLS classification accuracy versus (a) the number of electrodes, and (b) the number of cells. In (a), each curve represents a different dataset. In (b), curves with the same end-point are from the same dataset.

IV-B MvBLS Parameter Sensitivity

MvBLS has three structural parameters and three normalization/regularization parameters ($n$, $k$, $m$, $s$, $\lambda_1$, and $\lambda_2$ in Algorithm 2). It is important to know the sensitivity of MvBLS to them, which provides valuable guidelines for selecting these parameters in future applications.

We first assigned default values to $n$, $k$, $m$, $s$, $\lambda_1$, and $\lambda_2$. When studying the sensitivity of MvBLS to $n$, we fixed $k$, $m$, $s$, $\lambda_1$, and $\lambda_2$ at their default values, and varied $n$ over a wide range. For each $n$ on each dataset, we trained 30 MvBLSs on 30 different partitions of the dataset (80% for training and 20% for test), and recorded the average test accuracy across the 30 runs. Finally, we took the average over the 45 datasets, and show the results in the top-left panel of Fig. 9. Similarly, we varied $k$, $m$, $s$, $\lambda_1$, and $\lambda_2$ one at a time over wide ranges, and show the results in Fig. 9. Observe that:

  1. As $n$ or $k$ increased, the training accuracy increased quickly, but the test accuracy slightly decreased. This suggests that smaller $n$ and $k$ should be used for better generalization performance, which is also beneficial to the computational cost.

  2. The training and test accuracies almost did not change with $m$ and $\lambda_1$. So, we can set $m$ to a small value to save computational cost, and choose $\lambda_1$ safely in a wide range.

  3. As $s$ increased, the training accuracy increased slowly, but the test accuracy almost did not change. So, we can choose $s$ safely in a wide range.

  4. As $\lambda_2$ increased, the training accuracy decreased very quickly, but the test accuracy first increased slightly and then decreased slightly. This suggests that the regularization should be neither too small nor too large, which is a well-known fact in machine learning.

In general, we may conclude that MvBLS is robust to its parameters.

Fig. 9: MvBLS classification accuracy versus its parameters.

IV-C Computational Cost

It is also interesting to compare the computational cost of different algorithms, as in practice a faster algorithm is preferred over a slower one, given similar classification accuracies.

We recorded the mean running time (including training, validation, and test time) over the 45 sessions for each classifier, when different features were used. Since this process was repeated 30 times, we obtained 30 mean running times for each algorithm-feature combination. The mean and standard deviation for each combination were computed from these 30 numbers, and are shown in Table VI. The platform was a Linux workstation with an Intel Xeon CPU (E5-2699@2.20GHz) and 500 GB RAM. SVM was the most efficient single-view learning algorithm, and MvBLS the most efficient multi-view learning algorithm. Particularly, MvBLS was several times faster than the other two state-of-the-art multi-view learning approaches, and it also achieved the best classification performance. In summary, our proposed MvBLS is both effective and efficient.

It is also important to point out that the model training is time-consuming, because a large number of parameters need to be optimized; however, once the optimal model parameters are found, all models can be run very fast in testing, which involves mostly matrix operations.

               SVM         Ridge          BLS          MvDA          MvMDA         MvBLS
LFPs           7.72±0.30   4.40±0.53      37.52±2.24
Spikes         19.05±2.92  585.54±52.87   85.77±5.42
LFPs + Spikes  24.84±3.19  1033.14±89.47  102.40±6.83  748.06±66.55  741.87±68.08  129.66±7.93
TABLE VI: Mean and standard deviation of running time (seconds) when different classifiers and features were used.

IV-D Additional MvBLS Approaches

In addition to the MvBLS model in Fig. 2, other MvBLS architectures can also be configured, by constructing the inputs to the enhancement nodes and the output layer differently. Two additional configurations are shown in Fig. 10, denoted MvBLS2 and MvBLS3, respectively (a code sketch of the three variants follows below). Compared with MvBLS in Fig. 2, MvBLS2 in Fig. 10(a) first constructs enhancement nodes $H^A$ from $Z^A$ and $H^B$ from $Z^B$, and then feeds all of $Z^A$, $Z^B$, $H^A$, and $H^B$ into the output layer; so, it has more nodes and weights than MvBLS. Compared with MvBLS2, MvBLS3 in Fig. 10(b) further constructs enhancement nodes $H$ from $Z^A$ and $Z^B$, and then feeds $Z^A$, $Z^B$, $H^A$, $H^B$, and $H$ into the output layer. So, MvBLS3 has even more nodes and weights than MvBLS2.
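A compact sketch of how the three variants differ, under the same assumptions as the earlier sketches: only the matrix fed into the ridge output layer changes. Here enhance() is a self-contained version of the enhancement-node step (Eqs. 2-4), and it assumes the input has at least m columns.

```python
import numpy as np

def enhance(Z, m=100, s=0.8, seed=0):
    """Enhancement nodes from Z, with its own random orthonormal weights."""
    Zbar = np.hstack([Z, np.ones((len(Z), 1))])
    rng = np.random.default_rng(seed)
    Wh = np.linalg.qr(rng.uniform(-1, 1, (Zbar.shape[1], m)))[0]
    T = Zbar @ Wh
    return np.tanh(s * T / np.abs(T).max())

def output_features(ZA, ZB, variant="MvBLS"):
    """Matrix fed into the ridge output layer, for each architecture."""
    Z = np.hstack([ZA, ZB])
    if variant == "MvBLS":                      # shared enhancement nodes only
        return np.hstack([ZA, ZB, enhance(Z)])
    HA, HB = enhance(ZA), enhance(ZB)           # per-view enhancement nodes
    if variant == "MvBLS2":
        return np.hstack([ZA, ZB, HA, HB])
    # MvBLS3: per-view plus shared enhancement nodes
    return np.hstack([ZA, ZB, HA, HB, enhance(Z)])
```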

Fig. 10: Two additional configurations of MvBLS. (a) MvBLS2; (b) MvBLS3.

The performances of MvBLS, MvBLS2, and MvBLS3 in the 45 sessions are shown in Fig. 11. Their classification accuracies were almost identical, which is interesting, considering that MvBLS2 and MvBLS3 have more parameters and connections. Indeed, non-parametric multiple comparisons showed that there was no statistically significant difference between any two of them. Since MvBLS has a much simpler configuration and is easier to train, it is preferred in our application.

Fig. 11: Classification accuracies of the three different MvBLSs.

IV-E Related Work

Both LFPs and spikes contain information about the monkeys' oculomotor decision, and there has been independent research on both. Spikes are high-pass filtered neural signals, which can be decoded into high-performance movement control signals [18, 19]. However, since spikes often deteriorate as electrodes degrade over time, the more stable LFPs, which are low-pass filtered neural signals, are used in long-term BMIs [20, 21, 22, 23, 24, 25].

Because LFPs and spikes can be recorded from the same electrodes [26], and they convey complementary information [27, 28, 29, 30], a natural approach is to combine them for more accurate decoding [31, 32, 1, 33, 34, 11].

Bokil et al. [31] trained two macaque monkeys to perform a memory-saccade task and collected LFPs and spikes from the lateral intraparietal area. Two-dimensional Fourier transforms were performed to extract the features. Saccade prediction was achieved by maximizing the log-likelihood function of the observed neural activity. This approach was novel in that it did not use trial start time or other trial-related timing information. However, the performance degraded when switching from the preferred-or-anti-preferred binary classification to four-direction and eight-direction classifications.

Bansal et al. [34] trained two male macaque monkeys to perform reach-and-grasp tasks in three dimensions, and collected 192-channel LFPs and spikes from primary motor and ventral premotor areas. A linear Gaussian state-space representation and a Kalman filter were then used to decode the reach-and-grasp kinematics. The decoding was first conducted for each channel; then, about 30 channels were iteratively chosen based on the decoding performance (the Pearson correlation coefficient between the measured and the reconstructed kinematics). This approach requires a large number of channels to choose from, which may not be available in many human and non-human primate studies, including ours.

Hsieh et al. [11] trained one adult rhesus macaque to perform a center-out-and-back task and collected 137-channel LFPs and spikes from dorsal premotor cortex and ventral premotor cortex of both hemispheres. They then developed a multi-scale encoding model, a multi-scale adaptive learning algorithm, and a multi-scale filter for decoding the millisecond time-scale of spikes and slower LFPs. This approach solved a trajectory regression problem, whereas we focused on oculomotor decision classification.

However, most studies, other than [34] and [11] introduced above, have not shown significant improvements in decoding performance compared with using LFPs or spikes alone. Our research has shown that sophisticated machine learning approaches like MvBLS can better use LFPs and spikes, and hence achieve significant decoding performance improvements.

V Conclusion

Multi-view learning is very suitable for primate brain state decoding using medial frontal neural signals, because these simultaneously recorded neural signals comprise both low-frequency LFPs and high-frequency spikes, which can be treated as two views of the brain state. In this paper, we have extended the single-view BLS to the MvBLS, and validated its performance in monkey oculomotor decision decoding from medial frontal LFPs and spikes. We demonstrated that primate medial frontal LFPs and spikes do contain complementary information about the oculomotor decision, and that the proposed MvBLS fuses these two types of information more effectively in decoding the decision than several classical and state-of-the-art single-view and multi-view learning approaches. Moreover, we showed that MvBLS is fast, and robust to its parameters. Therefore, we expect that MvBLS will find broader applications in other primate brain state decoding tasks, and beyond.

Acknowledgement

This research was supported by the National Natural Science Foundation of China Grant 61873321 and the 111 Project on Computational Intelligence and Intelligent Control under Grant B18024 to DW. It was also supported by the National Institutes of Health through grants 2R01NS086104 and 1R01DA040990 to VS, and K99EY029759 to XC.

References

  • [1] S. D. Stavisky, J. C. Kao, P. Nuyujukian, S. I. Ryu, and K. V. Shenoy, “A high performing brain-machine interface driven by low-frequency local field potentials alone and together with spikes,” Journal of Neural Engineering, vol. 12, no. 3, p. 036009, 2015.
  • [2] M. Kandemir, A. Vetek, M. Gönen, A. Klami, and S. Kaski, “Multi-task and multi-view learning of user state,” Neurocomputing, vol. 139, pp. 97–106, 2014.
  • [3] K. Pasupa and S. Szedmák, “Utilising Kronecker decomposition and tensor-based multi-view learning to predict where people are looking in images,” Neurocomputing, vol. 248, pp. 80–93, 2017.
  • [4] L. Spyrou, S. Kouchaki, and S. Sanei, “Multiview classification and dimensionality reduction of scalp and intracranial EEG data through tensor factorisation,” Signal Processing Systems, vol. 90, no. 2, pp. 273–284, 2018.
  • [5] C. P. Chen and Z. Liu, “Broad learning system: An effective and efficient incremental learning system without the need for deep architecture,” IEEE Trans. on Neural Networks and Learning Systems, vol. 29, no. 1, pp. 10–24, 2018.
  • [6] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, "Multilayer feedforward networks with a nonpolynomial activation function can approximate any function," Neural Networks, vol. 6, no. 6, pp. 861–867, 1993.
  • [7] B. Igelnik and Y.-H. Pao, “Stochastic choice of basis functions in adaptive function approximation and the functional-link net,” IEEE Trans. on Neural Networks, vol. 6, no. 6, pp. 1320–1329, 1995.
  • [8] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010.
  • [9] X. Chen and V. Stuphorn, “Sequential selection of economic good and action in medial frontal cortex of macaques during value-based decisions,” Elife, vol. 4, p. e09418, 2015.
  • [10] X. Chen and V. Stuphorn, “Inactivation of medial frontal cortex changes risk preference,” Current Biology, vol. 28, no. 19, pp. 3114–3122.e4, 2018.
  • [11] H.-L. Hsieh, Y. T. Wong, B. Pesaran, and M. M. Shanechi, “Multiscale modeling and decoding algorithms for spike-field activity,” Journal of Neural Engineering, vol. 16, no. 1, 2019.
  • [12] T. G. Dietterich and G. Bakiri, “Solving multiclass learning problems via error-correcting output codes,” Journal of Artificial Intelligence Research, vol. 2, pp. 263–286, 1994.
  • [13] R.-E. Fan, P.-H. Chen, and C.-J. Lin, “Working set selection using second order information for training support vector machines,” Journal of Machine Learning Research, vol. 6, no. Dec, pp. 1889–1918, 2005.
  • [14] M. Kan, S. Shan, H. Zhang, S. Lao, and X. Chen, “Multi-view discriminant analysis,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 38, no. 1, pp. 188–194, 2016.
  • [15] G. Cao, A. Iosifidis, K. Chen, and M. Gabbouj, “Generalized multi-view embedding for visual recognition and cross-modal retrieval,” IEEE Trans. on Cybernetics, vol. 48, no. 9, pp. 2542–2555, 2018.
  • [16] O. Dunn, “Multiple comparisons using rank sums,” Technometrics, vol. 6, pp. 214–252, 1964.
  • [17] Y. Benjamini and Y. Hochberg, “Controlling the false discovery rate: A practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 57, pp. 289–300, 1995.
  • [18] D. R. Humphrey, E. Schmidt, and W. Thompson, “Predicting measures of motor performance from multiple cortical spike trains,” Science, vol. 170, no. 3959, pp. 758–762, 1970.
  • [19] G. Santhanam, S. I. Ryu, B. M. Yu, A. Afshar, and K. V. Shenoy, “A high-performance brain-computer interface,” Nature, vol. 442, no. 7099, pp. 195–198, 2006.
  • [20] R. A. Andersen, S. Musallam, and B. Pesaran, “Selecting the signals for a brain–machine interface,” Current Opinion in Neurobiology, vol. 14, no. 6, pp. 720–726, 2004.
  • [21] J. Rickert, S. C. de Oliveira, E. Vaadia, A. Aertsen, S. Rotter, and C. Mehring, “Encoding of movement direction in different frequency ranges of motor cortical local field potentials,” Journal of Neuroscience, vol. 25, no. 39, pp. 8815–8824, 2005.
  • [22] J. Zhuang, W. Truccolo, C. Vargas-Irwin, and J. P. Donoghue, “Decoding 3-D reach and grasp kinematics from high-frequency local field potentials in primate primary motor cortex,” IEEE Trans. on Biomedical Engineering, vol. 57, no. 7, pp. 1774–1784, 2010.
  • [23] A. K. Bansal, C. E. Vargas-Irwin, W. Truccolo, and J. P. Donoghue, “Relationships among low-frequency local field potentials, spiking activity, and three-dimensional reach and grasp kinematics in primary motor and ventral premotor cortices,” Journal of Neurophysiology, vol. 105, no. 4, pp. 1603–1619, 2011.
  • [24] R. D. Flint, E. W. Lindberg, L. R. Jordan, L. E. Miller, and M. W. Slutzky, “Accurate decoding of reaching movements from field potentials in the absence of spikes,” Journal of Neural Engineering, vol. 9, no. 4, p. 046006, 2012.
  • [25] R. D. Flint, Z. A. Wright, M. R. Scheid, and M. W. Slutzky, “Long term, stable brain machine interface performance using local field potentials and multiunit spikes,” Journal of Neural Engineering, vol. 10, no. 5, p. 056005, 2013.
  • [26] I. E. Monosov, J. C. Trageser, and K. G. Thompson, “Measurements of simultaneously recorded spiking activity and local field potentials suggest that spatial selection emerges in the frontal eye field,” Neuron, vol. 57, no. 4, pp. 614–625, 2008.
  • [27] A. Belitski, A. Gretton, C. Magri, Y. Murayama, M. A. Montemurro, N. K. Logothetis, and S. Panzeri, “Low-frequency local field potentials and spikes in primary visual cortex convey independent visual information,” Journal of Neuroscience, vol. 28, no. 22, pp. 5696–5709, 2008.
  • [28] G. Buzsaki, C. A. Anastassiou, and C. Koch, “The origin of extracellular fields and currents – EEG, ECoG, LFP and spikes,” Nature Reviews Neuroscience, vol. 13, pp. 407–420, 2012.
  • [29] X. Chen, M. Zirnsak, and T. Moore, “Dissonant representations of visual space in prefrontal cortex during eye movements,” Cell Reports, vol. 22, no. 8, pp. 2039–2052, 2018.
  • [30] G. T. Einevoll, C. Kayser, N. K. Logothetis, and S. Panzeri, “Modelling and analysis of local field potentials for studying the function of cortical circuits,” Nature Reviews Neuroscience, vol. 14, pp. 770–785, 2013.
  • [31] H. S. Bokil, B. Pesaran, R. A. Andersen, and P. P. Mitra, “A method for detection and classification of events in neural activity,” IEEE Trans. on Biomedical Engineering, vol. 53, no. 8, pp. 1678–1687, 2006.
  • [32] J. A. Perge, S. Zhang, W. Q. Malik, M. L. Homer, S. Cash, G. Friehs, E. N. Eskandar, J. P. Donoghue, and L. R. Hochberg, “Reliability of directional information in unsorted spikes and local field potentials recorded in human motor cortex,” Journal of Neural Engineering, vol. 11, no. 4, p. 046007, 2014.
  • [33] K. Ibayashi, N. Kunii, T. Matsuo, Y. Ishishita, S. Shimada, K. Kawai, and N. Saito, “Decoding speech with integrated hybrid signals recorded from the human ventral motor cortex,” Frontiers in Neuroscience, vol. 12, p. 221, 2018.
  • [34] A. K. Bansal, W. Truccolo, C. E. Vargas-Irwin, and J. P. Donoghue, “Decoding 3D reach and grasp from hybrid signals in motor and premotor cortices: spikes, multiunit activity, and local field potentials,” Journal of Neurophysiology, vol. 107, no. 5, pp. 1337–1355, 2011.