1 Introduction
Multielectrode recordings measure the dynamic activation of large networks of neurons in the brain and are a key enabling tool in studying neural function at scale [paulk2022large; anumanchipalli2019speech; willett2021high; carmena2003learning]. Modern multielectrode recordings can monitor neural activity continuously in numerous brain regions with high temporal frequency across days. In humans, implanting such electrode arrays is an invasive clinical procedure performed by neurosurgeons. However, individual electrode implants can fail during any of the recording days, almost always leading to missing electrode values in these difficult-to-acquire datasets. The current standard practice is to discard electrodes with missing data, which excludes potentially useful information from analysis and also limits the generalizability of models across days and individuals [banga2022reproducibility; anumanchipalli2019speech; ruebel2019nwb]. To address this challenge, we study the problem of imputing missing values in multielectrode data using recordings collected across multiple days in several human participants (Figure 1). While related to the generic problem of time series imputation (e.g., liu2019naomi; luo2018multivariate; more discussion in Section 5), previous works are not directly applicable here due to additional challenges associated with multielectrode recordings. Two major challenges are that (1) electrodes are often missing for entire sessions with no adjacent recorded timestamps, and (2) existing methods do not handle data across participants who have completely different sets of electrodes. To our knowledge, no prior work has conducted a rigorous study of how to recover or impute missing electrode recordings using data across sessions and different human participants.
We propose Deep Neural Imputation (DNI), a framework to recover missing electrode data by learning across days and participants. We instantiate our framework with both a linear nearest-neighbor method and two deep autoencoder-based generative models. DNI uses a self-supervised task based on mask-filling during training, which we call masked electrode modeling, similar to previous masking approaches in other domains [devlin2018bert; liu2019naomi]. To leverage data across participants, we extend our learning approach to use participant-specific encoding/decoding layers along with shared inner layers to jointly model participants.
We evaluate DNI's performance on multielectrode intracranial electrocorticography recordings from 12 human participants with naturalistic behavior across multiple days [peterson2022ajile12]. Because of the large inter-participant and inter-day variability [gonschorek2021removing],¹ each participant's recordings over a day (and often only a few hours of said recordings) would typically be treated as a separate dataset in neuroscience studies (e.g., collinger2013high; anumanchipalli2019speech; natraj2022compartmentalized). Despite these variations, we show that our approach can recover electrode recordings even when a significant fraction is missing. Further, we explore DNI's ability to recover frequency content, which is significant to the neuroscience community [miller2007spectral; jas2017learning; cole2019cycle]. We also apply our imputed data to a downstream neural decoding task [peterson2021generalized], showing that our method recovers significant decoding power compared to the non-imputed missing data. These results suggest that our methods can help address a major challenge in the experimental design and analysis of multielectrode arrays in neuroscience.

¹Spatial sampling in intracranial datasets is clinically determined and therefore highly variable. Consequently, recovering missing data is far more challenging than with, e.g., EEG data, where spatial sampling is easily controlled.
Our contributions are summarized as follows:

We propose the Deep Neural Imputation (DNI) Framework, a method for recovering missing multivariate electrode time series recordings across sessions and human participants. To our knowledge, DNI is the first framework to simultaneously impute fully missing spatial and temporal time series from multielectrode recordings.

We empirically demonstrate DNI's compatibility with both linear and nonlinear imputation methods, and its generalizability to the joint-participant regime.

We provide experimental evidence of DNI's direct and immediate utility in downstream scientific analyses: (1) DNI reconstructs not only time series content but also frequency-based power-spectral content, and (2) DNI's spatiotemporal reconstructions directly improve a brain decoder's classification accuracy when missing data is present.
2 Background & Problem Setup
Data Modality & Format. Our data, further described in Section 4.1, consists of participants (indexed by $p$), recording days (indexed by $d$),² electrodes corresponding to spatial locations (indexed by $e$), and time series instances (indexed by $i$). In other words, an observation $x_{p,d,e,i}$ corresponds to time series instance $i$ from electrode $e$ on day $d$ for participant $p$. Indices that follow others indicate dependence; for example, time series instance $i$ depends on the specific electrode $e$, the specific day $d$, and the specific participant $p$. Here we define generalizable as being able to perform on previously unseen data. When our models impute data across dimensions $(i, e, d)$, they generalize across time series instances, spatial locations, and days (Figure 1), and we call them day-generalizable. When our models impute data across dimension $p$ as well as dimensions $(i, e, d)$, they additionally share a joint participant embedding space; we call them Day-with-jOint-Participant-Embeddings-generalizable, or DOPE-generalizable. By necessity, DOPE-generalizable models are also day-generalizable.

²Most often, a neuroscience recording session is less than or equal to a few hours. However, for simplicity of exposition we use 24-hr days since that is how our data is organized.
Challenges in Modeling Electrode Recordings. Due to significant inter-participant and inter-day variability, data analysis methods in neuroscience commonly treat a few hours of recordings from single participants as individual datasets [collinger2013high; anumanchipalli2019speech; natraj2022compartmentalized] to be analyzed separately. The right panel of Figure 1 demonstrates the stark electrode configuration variability in our participants, illustrating a common phenomenon in this data type. Further sources of this variability include movement artifacts, insurmountable experimental noise, and electrode recording failure. For example, in a Neuropixels single-unit electrophysiology mouse study, only 55.4% of the data could be used for downstream analyses because of "recording failure[s]," "low yield," and "noise/artifact[s]" [banga2022reproducibility]. In an ECoG LFP human speech decoding study, electrode recordings were discarded from a majority of the participants due to "bad signal quality" [anumanchipalli2019speech]. The pervasiveness of missing data is exemplified by the standard data storage file format in neuroscience, Neurodata Without Borders, which explicitly emphasizes its support for "dense ragged arrays" that permit "missing fields" [ruebel2019nwb].
These ragged data tensors result from the lack of consistency across the participant, day, and electrode dimensions (Figure 1, top left). Since ragged data are difficult to analyze jointly, they are most commonly broken up into smaller, non-ragged tensors. This procedure yields more manageable data for neuroscience analysis pipelines, which is why discarding data remains common practice despite its inefficiency. Because neuroscience data processing pipelines seldom handle variation in the expected data structure, these pipelines fundamentally limit the generalization capability of neuroscience models.
Formal Imputation Goals. To address the above challenges, our goal is to impute missing electrode values by learning from multielectrode time series data across spatial locations, days, and participants.
Let $\mathcal{A}_{p,d}$ be the full set of active electrodes on day $d$ for participant $p$. $\mathcal{E}_p$ represents the full set of electrodes for participant $p$, which remains consistent across days. Let $\mathcal{M}_{p,d}$ represent the set of missing electrodes on day $d$ for participant $p$, and let $\mathcal{O}_{p,d}$ represent the set of observed electrodes.
Formally, DNI seeks to recover $\{x_{p,d,e,i} : e \in \mathcal{M}_{p,d}\}$ given $\{x_{p,d,e,i} : e \in \mathcal{O}_{p,d}\}$. Previous work on multivariate time series imputation [liu2019naomi; cao2018brits; luo2018multivariate] studies missing data at the level of sequences, where observations may be missing for a subset of timestamps within a series. However, these methods cannot impute when a feature is missing for all timestamps; in other words, each feature requires at least one observed timestamp. Therefore, these methods are not directly applicable to our setting, where an entire time series instance may be missing with no adjacent observed timestamps.
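To make this distinction concrete, the sketch below (our own illustration; the helper name and array shapes are not from the paper) flags which features a conventional within-series imputer could even attempt: a feature with scattered missing timestamps has observed anchors to interpolate from, while a feature missing at every timestamp, the case DNI targets, has none.

```python
import numpy as np

def within_series_imputable(x):
    """x: (n_features, n_timestamps) array with NaN marking missing values.
    Returns a boolean per feature: True if at least one timestamp is
    observed, i.e. a within-series imputer has something to anchor on."""
    return ~np.isnan(x).all(axis=1)

x = np.ones((3, 5))
x[0, 2] = np.nan   # feature 0: one missing timestamp (still imputable)
x[1, :] = np.nan   # feature 1: missing at every timestamp (not imputable)
print(within_series_imputable(x))   # [ True False  True]
```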
3 Deep Neural Imputation Framework
The goal of Deep Neural Imputation is to learn an imputation function $f$, as depicted in Figure 2 (left). $f$ may be either linear or nonlinear, and we study both instantiations. In the participant-specific case, we learn a function $f_p$ for each participant $p$ (Figure 2). In the joint-participant case, we learn a single function $f$ across all participants (Figure 3). In all cases, our learned functions do not vary by recording day, making our models day-generalizable.
High-Level Approach. To our knowledge, no prior work has rigorously studied neural electrode signal imputation; therefore, one of our key design goals is to identify the simplest approach that works well on this challenging imputation task. Two salient desiderata for imputation methods compatible with DNI are (1) the recovery of missing data without adjacent timestamps using observations from different days, and (2) robustness to variations in brain morphology and physical electrode placement across participants. Participant-specific modeling addresses (1) but not (2) and is day-generalizable, while joint-participant modeling addresses both (1) and (2) and is DOPE-generalizable.
To create day-generalizable models that address (1), the key idea is to learn conserved relationships between electrodes across recording days. A conceptually straightforward method is nearest-neighbor linear interpolation, which we propose as a natural baseline in our framework since no other baselines exist. We also study a nonlinear deep autoencoder model, discussed below. We then extend this participant-specific autoencoder model to train jointly over all participants; this DOPE-generalizable model additionally addresses (2).
Linear Baseline. In the absence of appropriate existing baselines, we looked to known properties of neural signals for inspiration. Correlations exist between spatially close electrode neighbors; this well-known observation motivated us to impute missing electrodes with linear combinations of observed, neighboring electrodes. In particular, we compute time series correlations between an electrode and its nearest neighbors on observed days (i.e., "training days"). Then, we use these correlations as weights to linearly combine the time series of the missing electrode's neighbors on the held-out "test day." These weighted linear combinations constitute the reconstructed neural signal and form our day-generalizable baseline model.
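A minimal sketch of this baseline, under assumed array shapes and with a hypothetical helper name (the paper's exact weight normalization and neighbor selection may differ in detail):

```python
import numpy as np

def linear_baseline_impute(train, test, missing, neighbors):
    """Impute the `missing` electrode's test-day series.

    train:     (n_electrodes, T) array from an observed training day.
    test:      (n_electrodes, T) array from the held-out test day.
    missing:   index of the electrode to impute.
    neighbors: indices of its k spatially nearest observed electrodes.
    """
    # Correlation weights, learned on the training day.
    w = np.array([np.corrcoef(train[missing], train[n])[0, 1]
                  for n in neighbors])
    w = w / np.abs(w).sum()          # normalize so weights sum to ~1
    # Weighted combination of the neighbors' test-day series.
    return w @ test[neighbors]
```

On synthetic data where the missing electrode is a fixed linear blend of its neighbors, this recovers the held-out series almost exactly.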
CNNAE. In our Convolutional Neural Network AutoEncoder (CNNAE) learning approach (Figure 2, bottom right), the encoder maps the observed electrodes $X_O$, with missing electrodes zero-filled, to an embedded representation $z$. The decoder then maps $z$ to a decoded output over the full set of electrodes, which is either a reconstruction or an imputation (a distinction we make below). For an electrode $e \in \mathcal{O}_{p,d}$, the decoded output corresponds to a reconstruction of observed electrode data. For $e \in \mathcal{M}_{p,d}$, the decoded output corresponds to an imputation of unobserved electrode data. Imputation is a more challenging task than reconstruction, since in imputation the decoding target is not an input to the model. The CNNAE is trained using the self-supervised objective described in Section 3.1. Specific architecture design choices are discussed in Section 4.2; the key criterion is to effectively encode and decode multivariate electrode time series data. The CNNAE is a day-generalizable model, and it forms the backbone for the DOPE-generalizable model class described below.

MCNNAE. For joint-participant imputation, the Multi-head CNNAE model (MCNNAE) learns a single function $f$ for all participants, instead of one per participant. When extending the CNNAE to jointly model different participants (Figure 3), we first use a participant-specific encoding layer to map input electrodes to a shared embedding space. This shared space is necessary because each participant has a different number of electrodes arranged in highly varied configurations. The shared encoder and decoder are then trained using the CNNAE's learning procedure. To map the representation from the shared embedding space back to the participant-specific electrode configuration, we also train a participant-specific decoding layer. All components are trained jointly with data from all participants via a self-supervised objective (Section 3.1). Unlike the CNNAE, the MCNNAE trains a shared representation across all participants, making it DOPE-generalizable.
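The head-plus-shared-backbone idea can be sketched with plain linear maps (a toy illustration only; the actual models use strided temporal convolutions and a WaveNet-style decoder, and all names and dimensions here are our own):

```python
import numpy as np

class MultiHeadAE:
    """Toy multi-head autoencoder: participant-specific linear heads
    around a shared backbone, mirroring the MCNNAE layout."""

    def __init__(self, electrode_counts, shared_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # One encoding/decoding head per participant p with n_p electrodes.
        self.enc = {p: rng.standard_normal((shared_dim, n)) * 0.1
                    for p, n in electrode_counts.items()}
        self.dec = {p: rng.standard_normal((n, shared_dim)) * 0.1
                    for p, n in electrode_counts.items()}
        # Shared backbone operating in the joint embedding space.
        self.core = rng.standard_normal((shared_dim, shared_dim)) * 0.1

    def forward(self, p, x):
        """x: (n_electrodes_p, T) with missing electrodes zero-filled."""
        z = np.tanh(self.core @ (self.enc[p] @ x))   # shared embedding
        return self.dec[p] @ z                       # back to p's electrodes
```

Participants with different electrode counts pass through the same `core`, which is what lets the joint model share statistical strength across individuals.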
3.1 Training Objective
We train our CNNAE and MCNNAE models with self-supervision using masked electrode modeling, where we randomly mask observed electrodes during training. Let $\tilde{X}_O$ represent the observed electrodes with random masking applied (masked electrodes are zero-filled). We then train the autoencoder to simultaneously impute the masked values and reconstruct the observed values. This task is similar to masking used in previous time series imputation approaches [liu2019naomi], except that we mask the entire time series from an electrode instead of a subset of timestamps. The autoencoder is trained with the following objective, averaged over training instances and random masks:

$\mathcal{L} = \lVert f(\tilde{X}_O) - X_O \rVert_2^2$,

which penalizes both imputation error on the masked electrodes and reconstruction error on the unmasked electrodes.
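A sketch of masked electrode modeling under these assumptions (helper names are our own, and the masking ratio per batch is illustrative):

```python
import numpy as np

def masked_electrode_batch(x_obs, mask_frac, rng):
    """Zero-fill a random subset of observed electrodes (rows of x_obs).

    Unlike timestamp masking, entire electrode time series are removed,
    matching the fully-missing-electrode setting DNI targets."""
    n = x_obs.shape[0]
    k = max(1, int(round(mask_frac * n)))
    masked = rng.choice(n, size=k, replace=False)
    x_in = x_obs.copy()
    x_in[masked] = 0.0                      # zero-fill masked electrodes
    return x_in, masked

def masked_mse(x_hat, x_obs):
    # Loss over ALL observed electrodes: reconstruction of unmasked rows
    # plus imputation of the masked (zero-filled) rows.
    return float(np.mean((x_hat - x_obs) ** 2))
```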
4 Experiments
We study Deep Neural Imputation with a linear baseline model, the CNNAE, and the MCNNAE on real-world multielectrode recordings from 12 human participants performing naturalistic behavior across multiple recording days. In Section 4.1 we further describe the dataset, in Section 4.2 we expand on our training and evaluation setups, and in Section 4.3 we compare our three methods. We then demonstrate DNI's value to computational neuroscience by showing the frequency content learned by our method in Section 4.4 and by recovering significant neural decoding performance from our imputations when missing values are present in Section 4.5.
4.1 Datasets for Deep Neural Imputation
To study Deep Neural Imputation we utilize recently released data from all 12 AJILE12 [peterson2022ajile12] participants, each with 100 electrodes recorded across multiple days. Each participant exhibits a wide range of naturalistic behaviors during recording, making reconstruction/imputation more difficult than with canonical task-based neural data [peterson2021behavioral]. In the past, deep learning models have been used, often on subsets of the AJILE12 participants; however, these models explored more classical neural decoding tasks, namely binary prediction and binary classification of arm movement [wang2018ajile; peterson2021generalized; peterson2021learning].

Data Processing Pipelines. Using the same training split as peterson2021generalized, where the last day is held out as the test day, we define two different data processing pipelines:

Procedure A: lower-frequency-content data, used in Section 4.3. We use Procedure A to study our models' ability to reconstruct time series data. We standardize a 50,000-sample segment, corresponding to a 100 s time series, on a per-electrode, per-day, per-participant basis. We divide the 100 s into 20 s and 80 s sections, calculate the mean ($\mu$) and standard deviation ($\sigma$) of the 20 s segment, and for each time step $x_t$ in the 80 s segment compute $(x_t - \mu)/\sigma$. As is typically done in neuroscience studies without task labels, performing local standardization on each time segment allows us to account for neural data distribution changes such as spatial electrode shifts, neural drift, and habituation to external stimuli. We then downsample this data by a factor of 100, resulting in 400 time steps of data at a frequency of 5 Hz.

Procedure B: higher-frequency-content data, used in Sections 4.4 and 4.5. Using Procedure B, we study the performance of our imputation model on two downstream tasks: frequency content preservation and a scientifically relevant neural decoding task. We follow the data processing pipeline outlined in peterson2021generalized, consisting of bandpass filtering, downsampling, and trimming, resulting in 1000 time steps of data at a frequency of 250 Hz.
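Procedure A's standardization and downsampling steps might look as follows (illustrative only; we assume a 500 Hz source rate consistent with the sample counts above, and use naive decimation where the authors may have used an anti-aliased method):

```python
import numpy as np

def procedure_a(segment, fs=500, baseline_s=20, factor=100):
    """Sketch of Procedure A: z-score an 80 s window using stats from the
    preceding 20 s window, then downsample by `factor`.

    segment: 1-D array of raw samples for one electrode (100 s at `fs` Hz).
    """
    n0 = baseline_s * fs
    baseline, rest = segment[:n0], segment[n0:]
    mu, sigma = baseline.mean(), baseline.std()
    z = (rest - mu) / sigma        # standardize the 80 s section
    return z[::factor]             # naive decimation: 40,000 -> 400 steps
```

With a 50,000-sample input this returns 400 standardized time steps, matching the counts stated in the text.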
Our main goal with DNI is to reconstruct the neural time series; therefore, in Procedure A we aggressively downsampled to reduce the high temporal resolution of the original data. When using Procedure A, the frequency content of the reconstructions was recovered in the limited frequency bands available for analysis (i.e., lower frequency bands, due to downsampling). The purpose of Procedure B was to (1) test our models in a regime featuring different time series lengths and frequency content in the training data, (2) verify the frequency-content-recovery property in higher frequency bands, and (3) use our reconstructions in a downstream neural decoding task.
4.2 Training and Evaluation Setup
Our train and test split follows peterson2021generalized, where each participant's last day is held out as the test day. For the linear baseline model, we use the 3 nearest neighbors to compute the correlation weights on the train days. We then apply these weights to the time series from the 3 nearest neighbors on the test day and combine them to reconstruct/impute the missing electrode's neural signal. For the CNNAE and MCNNAE, we build the encoder with strided temporal convolutions and base the decoder on oord2016wavenet; together these comprise the shared backbone architecture (see Appendix for more modeling details). During CNNAE and MCNNAE training, we use our self-supervised training objective to perform masked electrode modeling. Masked electrode modeling (Figure 2) consists of creating a random electrode mask in which a random fraction of the electrodes in each batch update are zero-filled. We then train the models to decode the full set of input electrodes; decoded electrode values are either reconstructed or imputed.

For evaluation, we use Pearson's correlation between the original time series and the reconstructed/imputed time series. We explore our models in 3 missing data regimes: 10% missing, 20% missing, and 50% missing. Within each missing data regime, we average our correlation results over all time series instances, electrodes, and 3 distinct randomly generated sets of missing electrodes. A crucial step in evaluating our models' decoding capabilities is comparing our reconstructions/imputations against ground truth data that is unseen by the model. It is worth noting that the 12 participants' original data does have naturally missing electrodes; in these cases we cannot evaluate our imputations because ground truth does not exist. This motivated zero-filled masked electrode modeling, which allows us to evaluate not only our decoded reconstructions but also our decoded imputations.
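The evaluation metric reduces to averaging per-electrode Pearson correlations, e.g. (a sketch; the paper's additional averaging over instances and random mask sets is omitted):

```python
import numpy as np

def eval_correlation(x_true, x_hat):
    """Mean Pearson correlation between original and decoded time series,
    averaged over electrodes (rows of the input arrays)."""
    rs = [np.corrcoef(t, h)[0, 1] for t, h in zip(x_true, x_hat)]
    return float(np.mean(rs))
```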
4.3 Benchmark Results
All time series correlation values report the comparison of our reconstruction/imputation with the original time series. It is worth re-emphasizing that imputation is a harder task than reconstruction: for reconstruction the model sees the input reconstruction targets, while for imputation the model only sees zero-filled time series values. In Figure 4 we compare the time series correlation values between the CNNAE and our baseline (top row), as well as between the MCNNAE and our baseline (bottom row). We note that the prior approach to missing data in the neuroscience community is zero-filling, which results in a correlation value of zero.

We find that Deep Neural Imputation's linear baseline performs moderately well, validating our intuition about shared information between neighboring electrodes. As the percentage of missing electrodes increases, the baseline's performance understandably falls because fewer neighboring electrodes are available. At the same time, the performance of both the CNNAE and MCNNAE holds across our missing data regimes and improves over the baseline. At the lower missing-data proportions the CNNAE outperforms the baseline for 9 participants, and in the most challenging evaluation regime (50% missing data) the CNNAE outperforms the baseline for all 12 participants in both reconstruction and imputation. The MCNNAE shares the same trend as the CNNAE relative to the baseline, and in the most difficult evaluation regime (50% missing data) it also outperforms the baseline for all 12 participants in both reconstruction and imputation.
When studying the differences between the MCNNAE and CNNAE, we found that participant 1 had the greatest performance improvement when using the MCNNAE. Despite there being 12 trained CNNAE models (one for each participant) and a single jointly trained MCNNAE model, there are more cases where the MCNNAE beats the baseline than where the CNNAE does. To decipher the differences between the MCNNAE and CNNAE, we explored the MCNNAE's joint-participant representation space and the CNNAE's participant-specific representation spaces. For a comparable analysis, we stacked all 12 of the CNNAE's participant-specific representation spaces and analyzed the concatenation. The MCNNAE's joint-participant embedding had more samples mapped to a shared space, compared to the CNNAE's concatenated embedding space, which had far more participant-specific clusters.
4.4 Frequency Correlation Analysis
In Figure 5, we explore the relationship between frequency correlation and time series correlation across two proportions of missing data (10% & 50%), motivated by frequency content's significance to the neuroscience community [jas2017learning; cole2019cycle]. We observe a positive relationship between these correlations, despite the fact that our models were not trained to perform reconstruction or imputation in the frequency domain. In particular, we note that the points for both reconstruction and imputation form a curve (Figure 5, left), suggesting that time series correlation is predictive of frequency correlation.

We additionally visualize the original time series and corresponding spectrogram, as well as the CNNAE-decoded time series and corresponding spectrogram, for both a typical and a high-performing example. Visual inspection shows that our CNNAE produces frequency reconstructions resembling the ground truth spectrograms even at low frequency correlation values. It is possible that alternative metrics for spectrogram evaluation, such as mutual information in frequency space [young2020precise; malladi2018mutual], could yield stronger quantitative correlation than the Pearson's correlation metric we used for evaluation.
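One plausible reading of "frequency correlation" is the Pearson correlation between power spectra, which can be sketched as follows (our interpretation for illustration; the paper's exact computation may differ):

```python
import numpy as np

def frequency_correlation(x_true, x_hat):
    """Pearson correlation between the power spectra of an original and a
    decoded time series."""
    p_true = np.abs(np.fft.rfft(x_true)) ** 2   # one-sided power spectrum
    p_hat = np.abs(np.fft.rfft(x_hat)) ** 2
    return float(np.corrcoef(p_true, p_hat)[0, 1])
```

Because Pearson correlation is invariant to positive rescaling, a decoded series that recovers the spectral shape scores highly even if its amplitude is off.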
4.5 Downstream Task: Neural Imputation + Neural Decoding
Given neural decoding's significance in the neuroengineering community [carmena2003learning; anumanchipalli2019speech; willett2021high], we study DNI's ability to directly improve neural decoding performance. We were further motivated by the fact that data used for neural decoding, particularly in humans, is precious and nearly impossible to recreate experimentally if corrupted [brunton2019data]. We use the movement neural decoder from peterson2021generalized, which takes neural time series data as input and predicts whether it corresponds to an arm movement event or rest. After recreating the random forest decoding performance on the original data, we randomly zero-filled either 50%, 70%, or 90% of the data for five random sets of missing electrodes and then computed the resulting performance. We purposefully chose these exaggerated proportions to test DNI's limits on downstream applications with highly corrupted or missing data.

As expected, when large portions of the data are masked via zero-filling, decoding performance decreases significantly. Across the 36 distinct experiments over 12 participants, our CNNAE-filled data either increased or maintained decoding accuracy in 34/36 experiments (94.44% of the time). Notably, the 2 cases in which CNNAE-filled performance dropped below zero-filled performance both occurred at 50% missing; in the other two experimental categories (70% and 90% missing), CNNAE-filled performance never dropped below zero-filled performance. These results suggest that there is value in adopting DNI in the neuroscience and neuroengineering communities.
5 Related Work
Multielectrode Neuroscience Experiments. Elucidating scientific questions in neuroengineering, neural mechanism discovery, and systems neuroscience has been critically enabled by advances in multielectrode array technology. Different electrode recording modalities (EEG, ECoG, Utah arrays, Neuropixels [steinmetz2018challenges; paulk2022large]) have specific trade-offs, such as varying signal attenuation or difficulty of implantation. However, they all share the commonality that when individual electrodes are corrupted or fail (from faulty manufacturing, electrical noise, scar tissue formation, etc.), the signal is lost and cannot be experimentally recovered. Previous approaches address neural drift through alignment [lee2019hierarchical; pandarinath2018latent], but not neural imputation. In this paper, we recover high-value incomplete brain recordings via our Deep Neural Imputation framework.
Time Series Imputation. Our work relates to time series imputation, which aims to recover missing timestamps from time series data. In particular, multivariate time series modeling, where more than one feature is observed at each timestamp, has been studied using statistical methods [beretta2016nearest; van2000multivariate; acuna2004treatment] as well as deep generative machine learning models [liu2019naomi; miao2021generative; cao2018brits; che2018recurrent]. Our linear baseline falls under statistical nearest-neighbor approaches, while our CNNAE and MCNNAE approaches are based on generative modeling. A few common model setups include directly regressing missing values [cao2018brits; che2018recurrent], adversarial training (such as GANs) [miao2021generative; fedus2018maskgan; luo2018multivariate; luo2019e2gan], and autoencoders [liu2019naomi; luo2019e2gan]. Our CNNAE architecture is based on dieleman2021variable, a model recently developed for time series representation learning, which we adapted for time series imputation via masked electrode modeling. Methods for generic time series imputation typically cannot impute when a feature is missing for all timestamps. These methods are orthogonal to our main contribution: a framework for multivariate electrode imputation in both day-generalizable and DOPE-generalizable regimes.

6 Discussion
In multielectrode recordings, signals from corrupted electrodes are commonly discarded and treated as missing. To fill these gaps, we propose Deep Neural Imputation: a framework to recover missing values using neural data across days and individuals. DNI is compatible with both linear and nonlinear models, such as deep generative autoencoders, and can easily be incorporated into existing neuroscience analysis pipelines and downstream tasks.
Key Observations. We find that our nearest-neighbor linear baseline reconstructs and imputes missing electrode data well when a small percentage of data is missing (e.g., 10%). Further, the CNNAE and MCNNAE models recover neural data in day-generalizable and DOPE-generalizable manners, respectively, even when large percentages of data are missing (e.g., 50%). These results may inspire neuroscientists to adjust neural electrode placement or enable novel experimental designs. Moreover, there is a positive relationship between frequency correlation and time series correlation, indicating that we can reconstruct and impute frequency content without explicitly training for it. Finally, DNI is practical in downstream neuroscience tasks, which we validate via significant improvement on a neural decoding task in the presence of missing data.
Limitations and Future Directions.
There are many directions for future work stemming from the neural imputation problem. For joint-participant experiments, our method currently requires a participant-specific layer to be trained for each spatial configuration; further exploration of transfer learning and meta-learning could eliminate this requirement. The joint-participant model offers an intriguing possibility of studying the joint embedding of neural representations to, for instance, understand individual variability in the neural correlates of similar behaviors [peterson2021behavioral]. Additionally, our models cannot currently impute data gathered from unobserved spatial locations; graph neural networks are one possible architecture that could develop this capability. Furthermore, these models may be improved by exploring multimodal fusion of neural, kinematic, and other measurement modalities [barnum2020benefits; peterson2021learning]. Since no prior work has conducted a rigorous study of how to recover missing electrode recordings across days and participants (see an early attempt in wang2018brains), establishing a multi-participant, multi-day benchmark of DNI methods would be valuable.

Significance and Broader Impacts. Participants' neural activity is deeply personal; therefore, we need to be thoughtful in our use of technology to analyze this information. The ethical considerations include issues of data management to guard participants' privacy and security. At the same time, we should consider this technology's assistive potential in improving participants' lives, because recovering missing data from incomplete brain recordings has the potential to transform the design and analysis of a wide range of neuroscience and neuroengineering studies. Further discussion and information on the data collection process can be found in the original dataset release paper [peterson2022ajile12].
7 Acknowledgements
We thank Albert Hao Li for thoughtful discussions and feedback throughout the project, Steve Peterson & Zoe SteineHanson for sharing their AJILE12 dataset knowledge, and Ann Kennedy for helpful conversations. This work was supported by an NSF Graduate Fellowship (to ST), NSERC Award #PGSD35326472019 (to JJS), and the Moore Distinguished Scholar Program at Caltech (to BWB).
References
Appendix
Appendix A Additional Experimental Results
We present additional experimental results for the experiments in Sections 4.3, 4.4, and 4.5 in Appendix Sections A.1, A.2, and A.3, respectively.
A.1 Benchmark Results with Standard Deviation

The means and standard deviations for the correlation comparison in Figure 4 of the main paper are shown in Tables 1 and 2. At the lower percentages of missing electrodes, both the CNNAE and MCNNAE generally outperform the baseline. In addition, the MCNNAE performs better than the CNNAE for participant 1, and similarly to the CNNAE for the other participants. The gap between the machine learning models and the baseline increases as the amount of missing electrodes increases; in particular, there is a significant improvement using the CNNAE and MCNNAE across all participants at 50% missing electrodes. As in Figure 4, the CNNAE and MCNNAE generally have higher correlation than the baseline for imputation as well as reconstruction.
A.2 Frequency Analysis for All Participants
Here we expand on the participant 3 results presented in the paper, Figure 5, by showing the results for all participants. For participants 1, 2, 4 we show not only the time series and frequency correlation plots but also some example time series and spectrograms (Figures 7, 8, 9 respectively). In addition, for the other participants 5, 6, 7, 8, 9, 10, 11, 12 we show the time series and frequency correlation plots, Figure 10. Results for all patients are shown across two proportions of missing data, 10% and 50%.
We observed a positive relationship between time series correlation and frequency correlation, despite not having trained our models on frequency reconstruction. This trend is strongest in participants 1, 2, 3, 6, 7, 8, 10, 11, and 12.
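To make the two quantities being compared concrete, the following sketch computes a time series correlation (Pearson correlation between raw traces) and a frequency correlation, which we take here to be the Pearson correlation between FFT magnitude spectra; the exact spectral estimator used in the paper may differ, so treat this as an illustrative assumption.

```python
import numpy as np

def ts_correlation(x, y):
    """Pearson correlation between two time series."""
    return float(np.corrcoef(x, y)[0, 1])

def freq_correlation(x, y):
    """Pearson correlation between FFT magnitude spectra (an assumed estimator)."""
    fx = np.abs(np.fft.rfft(x))
    fy = np.abs(np.fft.rfft(y))
    return float(np.corrcoef(fx, fy)[0, 1])

# Toy example: an "imputed" trace that tracks a "true" trace plus noise
rng = np.random.default_rng(0)
t = np.linspace(0, 4, 1000)
true = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 11 * t)
imputed = true + 0.3 * rng.standard_normal(t.size)

r_time = ts_correlation(true, imputed)
r_freq = freq_correlation(true, imputed)
```

A well-imputed trace scores highly on both measures, which is the relationship observed across participants above.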
A.3 Neural Decoding Results
Table 3 expands on the results presented in Figure 6. We include the performance of the random forest neural decoder with 50%, 70%, and 90% of electrodes missing. We report the mean performance across 5 random seeds for the zero-filled data and the CNNAE-filled data. In addition, we include the mean and standard deviation of the pairwise relative accuracy between the CNNAE-filled and zero-filled data. A positive mean relative accuracy indicates that the CNNAE-filled data outperforms the zero-filled data on the move/rest neural decoding task.
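Pairwise relative accuracy, as described above, can be computed as the per-seed difference in decoding accuracy between the two fill strategies; the accuracies below are hypothetical numbers for illustration only.

```python
import numpy as np

def pairwise_relative_accuracy(acc_model, acc_zero):
    """Per-seed accuracy difference between model-filled and zero-filled data.

    acc_model, acc_zero: decoding accuracy per random seed (same seeds, same order).
    Returns (mean, std) of the pairwise differences; a positive mean means the
    model-filled data decodes better on average.
    """
    diffs = np.asarray(acc_model) - np.asarray(acc_zero)
    return float(diffs.mean()), float(diffs.std())

# Hypothetical accuracies over 5 seeds (illustrative values, not from Table 3)
cnnae_acc = [0.72, 0.70, 0.74, 0.71, 0.73]
zero_acc = [0.65, 0.66, 0.64, 0.67, 0.63]
mean_rel, std_rel = pairwise_relative_accuracy(cnnae_acc, zero_acc)
```

Pairing the differences by seed, rather than comparing the two means directly, removes the shared seed-to-seed variance from the comparison.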
Table 1: Correlation (mean and standard deviation) for participants 1-6 at 0%, 10%, 20%, and 50% of electrodes missing; for each participant, rows report the Baseline, CNNAE, and MCNNAE.
Table 2: As in Table 1, for participants 7-12.
Table 3: Move/rest decoding performance for each participant at 50%, 70%, and 90% of electrodes missing; for each participant, rows report the zero-filled accuracy, the CNNAE-filled accuracy, and the pairwise relative accuracy between them.
Appendix B Implementation Details
We provide additional details on the data processing and the model implementation. Model hyperparameters are in Table 4.
B.1 Data Processing
We start by providing more intuition about the type of neural time series collected from the participants. Each electrode in the implanted ECoG array generates a local field potential (LFP) voltage trace, which reflects the bulk voltage activity of a few tens of thousands to hundreds of thousands of neurons in the brain. The LFP voltage traces that comprise the AJILE12 dataset are stored in the common neuroscience data file format, NWB, which is specifically designed to handle ragged data, missing values, and non-task-dependent (i.e., naturalistic) neural data [ruebel2019nwb].
The AJILE12 dataset contains 12 human participants; each participant has roughly 100 ECoG electrodes implanted at clinically determined locations. Before preprocessing, the data is collected at 500 Hz and, in most cases, continuously recorded for several days. Given the multiple goals of our paper, we created two data processing pipelines: Procedure A and Procedure B. In Procedure A, we standardize 50,000-sample segments of the AJILE12 dataset and downsample the data to 5 Hz to study time series reconstruction. In Procedure B, we follow the data processing pipeline of [peterson2021generalized], producing 250 Hz data that allows us to study frequency reconstruction and our models' ability to restore move/rest neural decoding performance.
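A minimal numpy sketch of Procedure A, assuming per-electrode z-scoring of each segment and simple decimation from 500 Hz to 5 Hz; the paper's actual pipeline may differ (e.g., it may apply an anti-aliasing filter before downsampling).

```python
import numpy as np

def procedure_a(segment, fs_in=500, fs_out=5):
    """Sketch of Procedure A on a (n_electrodes, 50000) segment.

    1. Standardize each electrode's trace (z-score per segment).
    2. Downsample by keeping every (fs_in // fs_out)-th sample.

    Note: plain decimation is an assumption for illustration; a production
    pipeline would low-pass filter first to avoid aliasing.
    """
    mu = segment.mean(axis=1, keepdims=True)
    sd = segment.std(axis=1, keepdims=True)
    standardized = (segment - mu) / (sd + 1e-8)  # avoid divide-by-zero
    step = fs_in // fs_out                       # 500 Hz -> 5 Hz: factor 100
    return standardized[:, ::step]

# Roughly 100 electrodes per participant, 50,000-sample segments
segment = np.random.default_rng(1).standard_normal((100, 50_000))
out = procedure_a(segment)   # shape (100, 500)
```

The 100x downsampling turns each 50,000-sample segment at 500 Hz into a 500-sample segment at 5 Hz.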
B.2 Model Details
Linear Baseline Model. In our linear baseline, we first split each participant's recording days into "train" days and a "test" day. This follows the dataset split in [peterson2021generalized], where the last day is held out as the test day. We then use the full set of electrodes for each participant to build a distance matrix, from which we find each electrode's nearest neighbors by Euclidean distance. The main idea is to impute a missing electrode as a linear combination of its nearest neighbors' time series, weighted by their train-day correlations with the target.
For every electrode on the test day, we compute the time series correlations between the target electrode and its three nearest neighbors, averaged across the observed training days. On the test day, the nearest neighbors are then combined linearly using these train-day correlations. Step by step: (1) select a test-day target electrode; (2) find one of its three nearest neighbors; (3) look up the time series correlation between the target and that neighbor, averaged across all training days and all time series; (4) weight (multiply) the neighbor's test-day time series by this train-day correlation; (5) repeat steps 2-4 for all three nearest neighbors (if a neighbor is not observed on the test day, we use only the observed neighbors); (6) take the sum of the weighted time series as the baseline reconstruction.
For evaluation, we (1) calculate the correlation between the baseline reconstruction and the actual recordings of the target electrode, and (2) average these correlations across all of the electrode's time series to obtain a per-electrode correlation.
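The imputation steps above can be sketched compactly; the data structures (dicts keyed by electrode id) and the toy signals are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def impute_electrode(test_day, neighbors, train_corrs):
    """Sketch of the linear baseline for one missing target electrode.

    test_day:    dict electrode_id -> test-day time series (1D array);
                 missing electrodes are simply absent from the dict
    neighbors:   the target's 3 nearest electrodes by Euclidean distance
    train_corrs: dict electrode_id -> target correlation averaged over train days

    Unobserved neighbors are skipped, per the bracketed rule in step (5).
    """
    observed = [n for n in neighbors if n in test_day]
    weighted = [train_corrs[n] * test_day[n] for n in observed]
    return np.sum(weighted, axis=0)

# Hypothetical toy data: neighbor "e7" is missing on the test day
t = np.linspace(0, 1, 100)
test_day = {"e2": np.sin(2 * np.pi * t), "e5": np.cos(2 * np.pi * t)}
train_corrs = {"e2": 0.9, "e5": 0.2, "e7": 0.4}
recon = impute_electrode(test_day, ["e2", "e5", "e7"], train_corrs)
```

Because the weights are fixed correlations learned on the training days, the baseline requires no optimization at test time.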
CNNAE. The CNNAE model is trained separately for each participant. The model architecture is based on the encoder-decoder setup in [dieleman2021variable], with strided temporal convolutions for the encoding layers, an upsampling network, and a WaveNet decoder [oord2016wavenet]. Our model hyperparameters are not tuned to AJILE12; a hyperparameter search on the validation set would likely further improve performance. The model is trained until the training loss converges, at epoch 100.
We input the time series data for each participant's full set of electrodes (with missing electrodes zero-filled) to the model. At the input layer, we provide both the electrode time series and its time derivative (the change in value across one timestamp). The encoder downsamples the time series by a factor of 8 with the kernel and stride configurations in Table 4. We then upsample the encoding by a factor of 8 using the upsampling network before passing it to the decoder. The WaveNet decoder preserves the sequence length, and its output is fed into final shallow temporal convolution layers that produce the electrode reconstruction and imputation. There are four sets of shallow networks, outputting the mean and variance of the time series data and of the time derivative. During evaluation, we use the output mean of the time series data.
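The 8x downsampling follows from Table 4's three encoding layers with kernel 4 and stride 2, assuming a padding of 1 per layer (the padding value is our assumption; it is not stated explicitly). The standard 1D convolution length formula makes this concrete:

```python
def conv_out_len(length, kernel=4, stride=2, padding=1):
    """Output length of a 1D strided convolution (standard floor formula)."""
    return (length + 2 * padding - kernel) // stride + 1

def encoder_out_len(length, n_layers=3):
    """Sequence length after the three strided encoding layers in Table 4."""
    for _ in range(n_layers):
        length = conv_out_len(length)
    return length

L = 50_000                    # Procedure A segment length
encoded = encoder_out_len(L)  # each stride-2 layer halves the sequence: 8x total
```

With these settings each layer halves the sequence length, so three layers give the stated factor-of-8 downsampling that the upsampling network later inverts.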
Our CNNAE models are trained either on Amazon EC2 p2 instances with a single Tesla K80 GPU or on a single NVIDIA GeForce RTX 2080 Ti GPU. In addition to maximizing the log-likelihood of the reconstruction and imputation during training, we use the margin loss and slowness penalty proposed in [dieleman2021variable] to regularize the embeddings.
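For intuition, a common form of a slowness penalty is the mean squared difference between consecutive embedding frames, which discourages the embedding from fluctuating faster than the underlying signal; this particular form is our illustrative assumption, and the exact losses follow [dieleman2021variable].

```python
import numpy as np

def slowness_penalty(z):
    """Mean squared difference between consecutive embedding frames.

    z: (time, z_dim) embedding sequence. One common form of a slowness
    regularizer; the exact loss used in training follows dieleman2021variable.
    """
    dz = np.diff(z, axis=0)
    return float(np.mean(dz ** 2))

# A slowly varying embedding is penalized far less than a noisy one
t = np.linspace(0, 1, 200)[:, None]
slow = np.tile(np.sin(2 * np.pi * t), (1, 64))    # smooth trajectory in z-space
noisy = np.random.default_rng(2).standard_normal((200, 64))
p_slow = slowness_penalty(slow)
p_noisy = slowness_penalty(noisy)
```
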
MCNNAE. The MCNNAE model is trained jointly across all participants, with an architecture similar to the CNNAE. Its hyperparameters are likewise not tuned to AJILE12, and a hyperparameter search on the validation set would likely further improve performance. The model is trained to epoch 45. The main difference from the CNNAE is participant-specific encoding and decoding layers, with CNNAE-style shared layers in between.
Given input time series data, a participant-specific encoding layer maps the data before it is fed into the shared encoder, a set of strided convolutions. As in the CNNAE, the encodings are then upsampled and fed into the WaveNet decoder. The decoder outputs are passed to final participant-specific decoding layers: shallow networks that map the decoder outputs to the mean and variance of the time series data and of the time derivative. During evaluation, we use the output mean of the time series data.
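The wiring that lets one model handle participants with completely different electrode sets can be sketched as participant-specific linear maps around a shared core; the linear maps and shapes below are illustrative stand-ins for the actual convolutional layers.

```python
import numpy as np

class MultiParticipantAE:
    """Sketch of the MCNNAE wiring: participant-specific input/output maps
    around a shared core (a stand-in for the shared encoder and WaveNet
    decoder). The layers here are toy linear maps, not the paper's exact
    architecture.
    """

    def __init__(self, electrode_counts, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # One encode/decode map per participant, sized to their electrode count
        self.enc = {p: 0.1 * rng.standard_normal((hidden, n))
                    for p, n in electrode_counts.items()}
        self.dec = {p: 0.1 * rng.standard_normal((n, hidden))
                    for p, n in electrode_counts.items()}
        # Inner weights shared by all participants
        self.shared = 0.1 * rng.standard_normal((hidden, hidden))

    def forward(self, participant, x):
        """x: (electrodes, time) array for this participant."""
        h = self.enc[participant] @ x      # participant-specific encoding
        h = np.tanh(self.shared @ h)       # shared inner layers
        return self.dec[participant] @ h   # participant-specific decoding

# Participants can have different electrode counts, yet share the core
model = MultiParticipantAE({"pt1": 96, "pt2": 118})
out1 = model.forward("pt1", np.zeros((96, 250)))
out2 = model.forward("pt2", np.zeros((118, 250)))
```

Only the outer maps depend on a participant's electrode layout, so the shared core can learn structure that transfers across participants.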
Our MCNNAE models are trained on Amazon EC2 p2 instances with a single Tesla K80 GPU. In addition to maximizing the log-likelihood of the reconstruction and imputation during training, we use the margin loss and slowness penalty proposed in [dieleman2021variable] to regularize the embeddings.
Table 4: Model hyperparameters.

Model   | Batch size | Learning rate | z-dim | Upsampling | Num units | Encoder                          | Decoder
CNNAE   | 16         | 0.0001        | 64    | 8x         | 256       | Kernel: [4,4,4], Stride: [2,2,2] | Layers: 2, Blocks: 2
MCNNAE  | 2 (per pt) | 0.0001        | 64    | 8x         | 256       | Kernel: [4,4,4], Stride: [2,2,2] | Layers: 2, Blocks: 2