Deep Metric Learning with Locality Sensitive Angular Loss for Self-Correcting Source Separation of Neural Spiking Signals

Neurophysiological time series, such as electromyographic signals and intracortical recordings, are typically composed of many individual spiking sources, the recovery of which can give fundamental insights into the biological system of interest or provide neural information for man-machine interfaces. For this reason, source separation algorithms have become an increasingly important tool in neuroscience and neuroengineering. However, in noisy or highly multivariate recordings these decomposition techniques often make a large number of errors, which degrades human-machine interfacing applications and often requires costly post-hoc manual cleaning of the output label set of spike timestamps. To address both the need for automated post-hoc cleaning and the need for robust separation filters, we propose a methodology based on deep metric learning, using a novel loss function which maintains intra-class variance, creating a rich embedding space suitable for both label cleaning and the discovery of new activations. We then validate this method with an artificially corrupted label set based on source-separated high-density surface electromyography recordings, recovering the original timestamps even in extreme degrees of feature and class-dependent label noise. This approach enables a neural network to learn to accurately decode neurophysiological time series using any imperfect method of labelling the signal.




I Introduction

Time series neurophysiological data is commonly characterised by repetitive activation events, for example the motor unit action potentials (MUAP) in electromyographic (EMG) signals or the spike potentials in microelectrode cortical recordings[1][2]. The ensemble of these activation events constitutes a neural code that gives a direct insight into the target system[3], whilst also providing an accurate control signal for human-machine interfacing applications, such as prosthetic control[4][5]. Neurophysiological signals are typically linear superpositions of many of these spiking sources, and their extraction from noisy systems has long been a major focus of applied source separation in neuroscience[6].

Identifying sources from multiunit activity was originally achieved through manual sorting by trained operators[7], but this tedious process was quickly supplanted by automated methods using early forms of blind source separation (BSS)[8]. BSS algorithms have since become extremely effective, able to automate the recovery of sources in highly noisy and complex systems[8][4][9]. An important trend within the field of applied source separation has been the increasing availability of highly multivariate data[10][11], as a result of developments in high-density electrode arrays[12][13]. By exploiting the increased spatial information collected by these systems, BSS pipelines can yield extremely large numbers of sources[11][14]. A more recent development is the adoption of deep learning approaches as a replacement for linear separation vectors, using neural networks to decode signals with a high degree of robustness to noise and signal non-stationarities[15][16]. These methods involve an offline supervised training phase using the augmented output of a BSS algorithm; an important requirement is therefore that the BSS decomposition contains relatively few errors if the network is to decode with high accuracy[15].

As the number of sources identified in a manual or automatic decomposition increases, so does the probability of labelling errors. Noise can be mistakenly labelled as an activation, an activation can be assigned to the wrong class or missed entirely, a class might be inappropriately partitioned or two distinct classes merged into one. These sources of label noise can be categorised based on whether the factors affecting the likelihood of a label from one class flipping to that of a different class are shared at the dataset, class or feature level[17]. As label-flipping is class or even feature-dependent, it is difficult to identify such errors automatically, and for automatic decompositions a degree of manual post-hoc cleaning is commonly employed, often using additional knowledge about the system of interest, such as temporal statistics in the source activations[18]. The nature of this manual cleaning generally relates to the mixing system of interest; for example, intracortical and intramuscular EMG (iEMG) decompositions generally require post-hoc examination of classes due to extensive class-dependent label noise[8][19]. On the other hand, surface EMG (sEMG) decompositions also contain a degree of feature-dependent noise and so require further inspection of specific activations[20]. Whilst accurate, manual post-hoc "cleaning" is an extremely time-consuming process and in some cases not feasible because of the size of the datasets being source-separated[21]. For this reason, modern source separation pipelines are increasingly using additional automated post-processing steps in an attempt to reduce the false label burden[18][22][10]. However, these methods only compensate for a relatively small proportion of incorrect labels, so there remains a need for new methods of post-hoc label cleaning. Additionally, if supervised deep learning frameworks are to be trained using BSS-labelled signals with increasing degrees of label noise, then they need to be able to detect and manage such errors, i.e. they need to be designed to be implicitly self-correcting.

Managing label noise is a fast-developing subfield of modern machine learning. As the size of datasets expands faster than the capacity of domain experts to screen and label them, data scientists are increasingly turning to new labelling methods that are more scalable at the cost of a greater proportion of label noise, such as Amazon's Mechanical Turk[23]. This is particularly true in a neuroscience setting, where datasets are generally labelled by a small pool of domain experts who can differ in professional opinion[24]. Approaches to learning in the presence of label noise can be broadly split into two categories: methods that aim to select models that are robust to label noise and methods that attempt to clean the label set prior to training[25]. Contemporary methods based on the latter approach generally rely on additional models which attempt to identify noisy labels using either a smaller pool of known correct labels or by comparing a label with other in-class labels using a similarity metric[26][27][28]. This principle of using similarity metrics to build embedding spaces that inform intra- and inter-class classification is closely related to deep metric learning (DML) approaches.

The objective of deep metric learning is the training of a deep neural network which maps discrete inputs to an embedding space in which positive pairs (two inputs from the same class) are closer than negative pairs (two inputs from different classes)[29]. The Euclidean distance is commonly used as the distance metric, although measures based on the angular difference between embeddings have become popular due to their inherent rotation and scale invariance[30]. The network is optimised either using the absolute similarity between pairs or, more commonly, the relative difference between one or more positive and negative pairings[31]. During optimisation, most negative pairs will quite quickly become much further away than positive pairs, so random generation of pairs for training is extremely inefficient. For this reason many implementations of DML seek to select the most informative pairings from each minibatch by applying some form of selection rule operating on the output embedding space[32][33]. Most of these approaches are designed purely to maximise the class separability of the embedding space, leading to dense clusters in each class[34]. More recent work has attempted to increase intra-class variability, as such tight embeddings potentially reduce the ability of the model to generalise to new in-class inputs[35][34][36]. These locality-sensitive approaches have a less distorting effect on the embedding space, giving better generalisation performance[36]. Whilst not the primary goal of these studies, another effect of preserving intra-class variance is a richer embedding of inputs, with semantically similar events sub-clustering[35]. Such rich embedding spaces could potentially be of use for detecting class outliers, perhaps even having utility for label-cleaning operations in event-driven neurophysiological recordings. DML pipelines have been designed to operate on noisy label sets for similar tasks such as person reidentification; however, these methods tend to use an external method to modify training rather than the embedding space itself, for example using label-correction based on cross-entropy.


Motivated by the need to further preserve a rich intra-class embedding, and inspired by ranking approaches to triplet sampling, such as[36], here we propose a novel locality-sensitive approach to sampling during DML optimisation. We use an efficient top-k query to identify the closest in-batch positive to each event and the N-closest in-batch negatives, which are then used to calculate an N-pair formulation of the popular angular loss[30]. The idea of using top-k queries within batch losses has been explored in the context of binary classification problems[38]; however, this is, to our knowledge, the first such implementation within the domain of DML. In this paper we demonstrate that this simple modification, which we call locality-sensitive angular loss (LSAL), generates an embedding space which can be used to detect and classify repetitive events, whilst importantly having the additional utility of being able to detect label-noise in the data used for training.

The main contribution of this paper is DeepLSAL, a novel DML pipeline that leverages LSAL to perform both label-cleaning and the identification of new activations in unseen data, operating directly on neurophysiological time series signals. The specific focus is on label sets generated by BSS algorithms for decoding convolutive mixtures, as this is an area where supervised deep learning methods are clearly beginning to outperform existing methods[15][16]; however, in principle the proposed methodology is applicable to any manual or automatic method of generating imperfect label sets. The effectiveness of DeepLSAL is validated robustly using an experimentally-collected high-density (HD) sEMG dataset that had been source-separated into constituent motor unit activity using the gradient convolutional kernel compensation (gCKC) algorithm[39]. The scenario of decomposition of HD-sEMG signals is highly convenient for validation of the proposed approach since both the generating system and decomposition methodology are well-studied. Moreover, the sEMG system is characterised by a high degree of feature-dependent label noise due to the complex superposition of MUAPs caused by volume-conduction effects[40].

II Theory and Algorithm

II-A Deep Metric Learning with N-pair Loss

The objective of ranking loss DML, also called triplet loss, is to train a deep learning function, such as a convolutional neural network, to map a sample $x$ taken from one of $C$ classes to an embedding vector $f$, such that for an arbitrarily selected anchor embedding $f_a$, the embedding space reduces the relative distance $D$ between positive samples from the same class $f_p$ and negative samples from different classes $f_n$. $D$ can be a number of different metrics, such as the Euclidean distance, cosine similarity or Kullback–Leibler divergence[41]. Commonly the loss function is formulated such that the relative difference be greater than a margin $m$ such that:

$$D(f_a, f_p) + m \le D(f_a, f_n) \qquad (1)$$

Fig. 1: a shows the principle behind N-pair loss, in this case with an N of 5. As multiple negative embeddings are utilised, the chance of sampling only uninformative pairings reduces. b illustrates the principle behind vertex selection in angular loss. Rather than directly using the triplet as vertices, only the negative embedding is used, whilst the other two vertices are constructed so as to build a right-angled triangle with the 90° angle at the midpoint between the anchor and positive embedding. c demonstrates the effect of random versus locality-sensitive sampling on the intra-class variance.

After a small amount of optimisation, the bulk of negative pairs will be much further away than the positive pairs, meaning most training examples in a batch will become uninformative[32]. N-pair loss seeks to avoid this problem by comparing each positive pair in an -size batch to multiple negative pairs (fig. 1a), which are then combined in a formulation[33]:


where is usually a hinge function such as . By taking an average across the negative pairs, it is likely that at least some informative negative pairings will be included in the loss calculation.
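The averaging over negatives described above can be illustrated with a small NumPy sketch. This is only one plausible reading, not the formulation of [33]: the function name `n_pair_hinge_loss`, the squared Euclidean distance and the margin value are assumptions made for the example.

```python
import numpy as np

def n_pair_hinge_loss(anchors, positives, negatives, margin=0.2):
    """Hinge loss of each positive pair against N negatives, averaged.

    anchors, positives: (B, d) arrays; negatives: (B, N, d) array.
    Distance metric and margin are illustrative assumptions.
    """
    # Squared Euclidean distance between each anchor and its positive: (B,)
    d_pos = np.sum((anchors - positives) ** 2, axis=1)
    # Squared Euclidean distance between each anchor and its N negatives: (B, N)
    d_neg = np.sum((anchors[:, None, :] - negatives) ** 2, axis=2)
    # A pairing contributes only if the negative is not yet far enough away
    hinge = np.maximum(0.0, d_pos[:, None] - d_neg + margin)
    # Average across negatives and across the batch
    return hinge.mean()
```

With several negatives per anchor, a single uninformative negative (hinge value of zero) no longer removes the whole training example from the gradient.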

II-B Angular Loss

Angular loss is based on a geometric reformulation of the distance metric; rather than minimising the distance of to relative to , it instead minimises the angle at of a triangle formed from the three embeddings. This has the effect of improving optimisation stability as angles are scale invariant, whilst using a triangle means all edges of the triplet are taken into account[30]. However, in certain circumstances the minimisation of the angle at will push towards . This can be avoided by constructing a right-angled triangle with and the midpoint between and (fig. 1b), with the final vertex being the point on the semicircle joining and which creates a right-angled triangle[30]. By dropping constant terms, this geometric relationship can be used for the in equation 2, expressed as:


where $\alpha$ is an angle in radians which sets the upper accepted bounds of the loss, analogous to the margin in equation 1.
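For orientation, the angular term as it is commonly written in the angular loss literature [30], after constant terms are dropped, can be sketched as below. The function name `angular_term` is hypothetical, the inputs are assumed to be L2-normalised, and whether this matches the exact expression used here is an assumption.

```python
import numpy as np

def angular_term(anchor, positive, negative, alpha=0.25):
    """Angular loss term for one triplet of L2-normalised embeddings.

    Follows the commonly cited form
        4 tan^2(alpha) (f_a + f_p)^T f_n - 2 (1 + tan^2(alpha)) f_a^T f_p,
    with alpha in radians. Larger values indicate a worse triplet.
    """
    t2 = np.tan(alpha) ** 2
    return (4.0 * t2 * np.dot(anchor + positive, negative)
            - 2.0 * (1.0 + t2) * np.dot(anchor, positive))
```

The term grows as the negative moves closer to the anchor-positive pair and shrinks as the anchor and positive align, which is the behaviour a ranking loss needs.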

II-C Inducing Rich Embeddings

Whilst a random selection of positive pairings within the batch is suitable if the objective is to maximise inter-class distance within the embedding space, it also has the tendency to collapse intra-class distances down to a point, as illustrated in figure 1c[35]. We theorised that the main explanation for this compression is that the sampling process is random, meaning that the neural network is induced during training to bring all embeddings from the same class together, which, as a complex non-linear function, it is quite capable of doing given sufficient training steps. We instead elected to sample positive and negative pairs based on the local neighbourhood of each embedding in the batch, modifying the selection of the positive and the set of negatives for each anchor in the batch, a process we term locality-sensitive sampling.

The batch size is first selected such that it is large enough to have a diverse representation of each class. As each embedding vector is L2-normalised, the tensor formed by finding the inner product of the batch tensor with its transpose is the pairwise cosine similarity. For each anchor, the positive is selected with a simple argmax, i.e. the closest different vector from the same class is selected. A similar procedure is used to select the set of negatives, using a top-k algorithm to select the N most similar vectors to the anchor that belong to a different class. GPU implementations of top-k algorithms have become extremely efficient in recent years, due to their increasing use within machine learning paradigms[42]. With these easily-implemented changes the N-pair formulation of the angular loss becomes:


where is the argmax set of the pairwise cosine similarity between and its associated set in and is the top-k values of the ordered pairwise cosine similarity between and its associated set in . is the indicator function.
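A minimal NumPy sketch of the locality-sensitive sampling step is given below, assuming integer class labels and that every class has at least two members in the batch. The function name `locality_sensitive_sample` is hypothetical, and a full sort stands in for the GPU top-k query.

```python
import numpy as np

def locality_sensitive_sample(emb, labels, n_neg=5):
    """Pick the closest in-batch positive and the n_neg closest negatives.

    emb: (B, d) L2-normalised embeddings; labels: (B,) integer class ids.
    Returns pos_idx of shape (B,) and neg_idx of shape (B, n_neg).
    """
    # Inner product of the batch with its transpose = pairwise cosine similarity
    sim = emb @ emb.T
    same = labels[:, None] == labels[None, :]
    not_self = ~np.eye(len(labels), dtype=bool)
    # Closest *different* vector from the same class (argmax)
    pos_sim = np.where(same & not_self, sim, -np.inf)
    pos_idx = np.argmax(pos_sim, axis=1)
    # The n_neg most similar vectors from other classes (top-k via sorting)
    neg_sim = np.where(~same, sim, -np.inf)
    neg_idx = np.argsort(-neg_sim, axis=1)[:, :n_neg]
    return pos_idx, neg_idx
```

Because each anchor is only pulled towards its nearest in-class neighbour, distant members of the same class are not forced together, which is what preserves the intra-class structure.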

To stabilise early optimisation we combined the LSAL loss with a categorical cross-entropy with temperature given by:


where and

is a trainable matrix that compresses the embedding vector down to a dimension C vector for comparison with the one-hot encoded class labels. This gives the final loss function:


II-D Source Separation

The timestamps used by DeepLSAL can be generated using a wide variety of manual and automated processes; however, for this study the gradient convolution kernel compensation (gCKC) algorithm was selected due to its strong performance in HD-sEMG signal decomposition[43][39]. In the gCKC framework for blind source separation, the vector of spiking sources at time $t$ is first extended with delayed versions of itself, allowing the mixing problem, which is convolutive in most neurophysiological settings, to be written in instantaneous form:

$$x(t) = H\,\bar{s}(t) + \omega(t)$$

where the signal observation vector $x(t)$ at time $t$ is a linear mixture parameterised by the operation of the mixing matrix $H$ on the extended source vector $\bar{s}(t)$ plus noise $\omega(t)$. In practice both the observation and source vectors are additionally extended with further delayed values for reasons of numerical stability during the source separation procedure.
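The extension step can be sketched as stacking delayed copies of every channel. The function name `extend`, the zero-padding at the start of the recording and the ordering of the stacked rows are assumptions made for illustration.

```python
import numpy as np

def extend(x, n_delays):
    """Stack delayed copies of each channel of a multichannel signal.

    x: (channels, T) array. Returns an array of shape
    (channels * (n_delays + 1), T), zero-padded at the start of each delay.
    """
    channels, T = x.shape
    out = np.zeros((channels * (n_delays + 1), T), dtype=x.dtype)
    for d in range(n_delays + 1):
        # Block d holds the signal delayed by d samples
        out[d * channels:(d + 1) * channels, d:] = x[:, :T - d]
    return out
```

After extension, a filter spanning several samples acts as a single linear map on each column, which is why the convolutive mixture can be treated as instantaneous.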

Unlike independent component analysis methods which seek to directly estimate a separation vector for each source, gCKC seeks to include the additional statistical information that the spiking sources generate repetitive events within the signal. Sources are instead estimated indirectly using a linear minimum mean square error estimator, with the estimated $j$th source at time point $t$ given by:

$$\hat{s}_j(t) = c_{s_j x}^{T}\, C_{xx}^{-1}\, x(t)$$

where $c_{s_j x}^{T}$ is the transposed cross-correlation vector between an activation of the $j$th source and the extended HD-sEMG matrix, and $C_{xx}^{-1}$ is the inverted autocorrelation matrix of the extended HD-sEMG matrix.
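As a rough sketch of this estimator, the cross-correlation vector can be approximated by averaging the extended observations at known activation times. The function name `lmmse_source` and the use of a pseudo-inverse for numerical safety are assumptions for the example.

```python
import numpy as np

def lmmse_source(x_ext, spike_times):
    """Linear minimum mean square error estimate of one spiking source.

    x_ext: (M, T) extended observation matrix.
    spike_times: sample indices of known activations of the source.
    Returns the estimated source activity, shape (T,).
    """
    T = x_ext.shape[1]
    # Autocorrelation matrix of the extended observations
    c_xx = x_ext @ x_ext.T / T
    # Cross-correlation vector: mean observation at the activation times
    c_sx = x_ext[:, spike_times].mean(axis=1)
    # Pseudo-inverse used here instead of a plain inverse (assumption)
    return c_sx @ np.linalg.pinv(c_xx) @ x_ext
```

Even a handful of activation times gives a usable estimate, which is the property exploited later when an artificially noisy separation vector is built from only 15 labels.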

The cross-correlation vector is usually initialised with a time point likely to contain a source activation, which can be estimated by, for example, the Mahalanobis distance calculated on the signal[39]. Once selected, it is then optimised to find the rest of the source's signal contributions. This can be done with either a fixed-point algorithm as in [10] or in the gCKC formulation by gradient descent:


where is the updated cross-correlation vector, is the learning rate and

a contrast function designed to estimate the non-gaussianity of the output source in a similar fashion to independent component analysis. Optimised sources can then be converted to timestamps using a linear threshold or a two-class k-means clustering algorithm.
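The conversion of an estimated source into timestamps with a two-class k-means threshold can be sketched in one dimension as follows. The function names are hypothetical, and a practical pipeline would probably also apply peak detection so that each activation yields one timestamp, which is omitted here.

```python
import numpy as np

def two_class_kmeans_threshold(activity, n_iter=50):
    """Threshold halfway between the two centroids of 1-D 2-means."""
    lo, hi = activity.min(), activity.max()
    for _ in range(n_iter):
        thr = (lo + hi) / 2.0
        upper = activity[activity > thr]
        lower = activity[activity <= thr]
        if upper.size == 0 or lower.size == 0:
            break
        new_lo, new_hi = lower.mean(), upper.mean()
        if new_lo == lo and new_hi == hi:
            break  # centroids have converged
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2.0

def timestamps(activity):
    """Indices of samples assigned to the high-activity cluster."""
    return np.flatnonzero(activity > two_class_kmeans_threshold(activity))
```

For example, `timestamps(np.array([0.1, 0.0, 0.2, 5.0, 0.1, 4.8, 0.0]))` selects samples 3 and 5.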

Fig. 2: a shows the model used in DeepLSAL, which takes the form of an easily-implemented convolutional neural network trained using a locality-sensitive angular loss to embed windows of neurophysiological time series into a low-dimensional space which can be used to both source separate and label clean. b shows the full DeepLSAL pipeline by which the noisy activation labels found from source-separating the high-density surface electromyography signal are cleaned. DeepLSAL is run twice: a cleaning phase to find the false positive labels and a refitting phase to find the false negatives. After the refitting phase the predicted class activity is much cleaner than that of the original source separation algorithm, as seen in c.

III Validation Methodology

III-A Experimental Dataset

The HD-sEMG data set consisted of a set of 20-second recordings taken from the dominant tibialis anterior muscle of 10 men performing an isometric contraction at 15% of maximal force, previously used to validate source separation techniques[44]. Maximal contraction was defined as the mean force of three 5-s maximal contractions separated by 3 min of rest, with force sampled at 2048 Hz by load cells mounted on an isometric brace. Force feedback was provided to the participants by an oscilloscope. The signal from a monopolar electrode array placed over the main muscle innervation zone was sampled at 2048 Hz having been band-pass filtered at 10-500 Hz.

Gradient convolution kernel compensation with an additional k-means source refinement step was implemented using the TensorFlow machine learning package[39][10]. As the label set was to be artificially corrupted it was important that the original be as noise-free as possible, so additional post hoc steps were taken to maximise the likelihood that the timestamps were correct. Sources were manually cleaned by examining interspike intervals and the source-to-noise ratio of each activation. An additional step of validating decomposition accuracy was implemented by comparing the sources to those found using the DEMUSE source-separation software package[43][39], with source cleaning completed by a different trained operator.

III-B Label Set Corruption

In experiment 1 we evaluated the ability of DeepLSAL to clean a label set corrupted by feature-dependent noise, where the flipping probability of a label is related to its associated features[17]. In the context of source-separated HD-sEMG, this most commonly occurs as a false positive, where a separation vector incorrectly assigns a high probability of an in-class MUAP being present when it is not, i.e. a noise class or other MU class label is flipped to the MU class of interest. To simulate this effect, we corrupted the label set by generating an artificially noisy separation vector for each MU class: 15 MUAP labels were randomly selected from that class and the average of the associated extended HD-sEMG vectors was used to generate a linear minimum mean square error prediction on the extended HD-sEMG matrix. A two-class k-means clustering algorithm was then used to parameterise a linear threshold to find activations, creating a label set with a high degree of feature-dependent noise. Five levels of increasing difficulty were generated by taking a number of false positives corresponding to 10/20/30/40/50% of the number of true labels, selected at random from the set of false positives.

In experiment 2 the DeepLSAL algorithm was evaluated on class-dependent label noise, when the probability of a label flipping to another class is stable across all labels in the class[17]. In HD-sEMG source separation this error generally occurs when the separation vectors are very similar, usually due to similar MUAP waveform shapes between two MU classes. This can be simulated by transferring a percentage of labels to a similar MU class. This was done by first averaging the MUAPs of each MU class and then cross-correlating these averages with the average MUAP of every other class in the recording, with 10/20/30/40% of the class labels transferred to the class with the highest value. If labels had already been transferred to the closest class then the next closest class was selected until all classes had had label transfers. A maximum label corruption of 40% was used to preserve the concept of a majority true and minority false class.

III-C DeepLSAL Pipeline and Training

To convert the source-separated HD-sEMG signal into labelled windows, first each channel of the HD-sEMG signal was standardised by z-scoring and then cut into overlapping 80-sample wide windows at a stride of 1. Each window was then labelled by reference to the predicted source activity at the final sample of the window. This meant the bulk of windows were labelled as part of the inactive class due to the sparse nature of motor neuron spiking. Due to this serious class imbalance, each minibatch was created from the entirety of the windows labelled as containing a motor neuron spike, with an additional 256 samples from the inactive class. Each class assignment was then converted to a one-hot representation, the bulk of which had only one class active at any one time, although rarely two activations would occur simultaneously on the same time-point. As the richness of the intra-class embedding of the inactive class windows was not of any great concern, the embeddings of these windows were not used as anchor vectors when calculating the LSAL component of the loss, although they were used both as negatives and in the calculation of the cross-entropy.
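The windowing and labelling step can be sketched with NumPy as below. The function name `make_windows`, the (time, channels) layout and the use of class 0 for the inactive class are assumptions; `sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def make_windows(emg, spike_labels, width=80):
    """Cut a z-scored multichannel signal into stride-1 windows.

    emg: (T, channels) array; spike_labels: (T,) integer class per sample,
    with 0 assumed to mean inactive. Each window takes the label found at
    its final sample. Returns windows of shape (T - width + 1, width,
    channels) and labels of shape (T - width + 1,).
    """
    # Standardise each channel independently
    z = (emg - emg.mean(axis=0)) / emg.std(axis=0)
    # Result has shape (T - width + 1, channels, width)
    windows = np.lib.stride_tricks.sliding_window_view(z, width, axis=0)
    windows = windows.transpose(0, 2, 1)
    # Window i covers samples i .. i + width - 1
    labels = spike_labels[width - 1:]
    return windows, labels
```

A minibatch in the spirit of the description above would then combine every window with a non-zero label and a random draw of 256 windows with label 0.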

Embeddings were calculated with a convolutional neural network implemented using the TensorFlow machine learning library in Python, as seen in figure 2a. Convolution steps used a 1D 3-sample wide kernel, with 32 filters and a drop-out of 0.2. 1D max-pooling was completed with 2-sample wide kernels. Each densely-connected layer had 64 neurons and a drop-out rate of 0.5 during training. Both the convolution and densely-connected layers used ReLU activation functions. Finally the output of the last densely-connected layer was densely connected to a bias-free and activation-free embedding layer 8 neurons wide, the output of which was then divided by its L2 norm. This was an intentionally low-dimension embedding compared to standard DML due to the desire to avoid dimensionality issues during the clustering steps in the refitting phase. The additional matrix used in the categorical cross-entropy was initialised with truncated normal noise, whilst the weights of the neural network layers were initialised by Glorot uniform. The Adam optimisation algorithm at a learning rate of 0.001 was then used to train the model over 500 epochs for both cleaning and refitting stages.

For both experiments the number of negatives N was set to 5, the cross-entropy temperature was set to 0.1, and the angle $\alpha$ to 0.25 radians in both the cleaning and refitting stages. After the cleaning stage the labels in the embedding space likely to relate to specific classes were selected by a simple density-estimator. First a local scale value was estimated by finding the mean cosine similarity of each embedding vector to its 20 nearest neighbours and taking a median of this value across all vectors. For each label the number of other labels with a cosine similarity greater than the local scale value was found, and the label with the highest number of neighbours was selected as the centre of the cluster. All labels with a cosine similarity to this centre higher than the local scale value were then added to the refitting training set. This simple approach was generally adequate for quickly finding the densest region of the embedding space, which was usually the cluster of true labels.
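The density-estimator can be sketched as follows for the embeddings of one class. The function name `select_dense_cluster` is hypothetical, and the assumption that the scale is computed per class rather than across the whole batch is not confirmed by the text.

```python
import numpy as np

def select_dense_cluster(emb, k=20):
    """Select the labels lying in the densest region of the embedding.

    emb: (n, d) L2-normalised embeddings of the labels of one class.
    Returns a boolean mask of the labels kept for refitting.
    """
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
    k = min(k, len(emb) - 1)
    # Mean cosine similarity of each vector to its k nearest neighbours
    knn_mean = np.sort(sim, axis=1)[:, -k:].mean(axis=1)
    # Local scale value: median across all vectors
    scale = np.median(knn_mean)
    # Centre of the cluster: the label with most neighbours above the scale
    counts = (sim > scale).sum(axis=1)
    centre = np.argmax(counts)
    mask = sim[centre] > scale
    mask[centre] = True
    return mask
```

Using the median neighbour similarity as the radius makes the selection conservative, which is consistent with a cleaning step that favours specificity over sensitivity.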

After the cleaning phase the set of timestamp labels is generally free of false positives; if DeepLSAL is retrained with this new label set then it should effectively generalise to find unlabelled activations in both the current and future data (fig. 2b). It should be noted that finding unlabelled activations in the current dataset is in some ways a "harder" problem than trying to generalise to completely unseen data, as these activations are included in training, but mislabelled. However, these mislabelled activations are a very small proportion of the total dataset, meaning they were predicted not to impact convergence on a model with good generalisation ability, particularly a heavily regularised model such as that used for DeepLSAL. Once DeepLSAL was retrained an average embedding vector of the current MUAP timestamps was found for each class, which was then cross-correlated with the entire embedded HD-sEMG signal to generate a predicted activity (fig. 2c). This activity was then timestamped by a linear threshold parameterised by a two-class k-means clustering algorithm. These labels were compared to the pre-corrupted data using the rate of agreement (RoA) metric, a percentage defined as the number of true positive matches divided by the total number of true positives, false positives and false negatives.
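The RoA metric as defined above can be computed with a few lines of Python. The function name `rate_of_agreement` and the matching tolerance `tol` (in samples) are assumptions; the text does not state how closely two timestamps must coincide to count as a match.

```python
import numpy as np

def rate_of_agreement(true_ts, pred_ts, tol=0):
    """RoA (%) = TP / (TP + FP + FN) between two timestamp sets.

    A predicted timestamp matches a true one if it lies within tol
    samples; each prediction can be matched at most once.
    """
    true_ts = np.asarray(true_ts)
    pred_ts = np.asarray(pred_ts)
    matched = np.zeros(len(pred_ts), dtype=bool)
    tp = 0
    for t in true_ts:
        candidates = np.flatnonzero(~matched & (np.abs(pred_ts - t) <= tol))
        if candidates.size:
            matched[candidates[0]] = True
            tp += 1
    fp = len(pred_ts) - tp
    fn = len(true_ts) - tp
    return 100.0 * tp / (tp + fp + fn)
```

For instance, with true timestamps `[10, 50, 90, 130]`, predictions `[10, 51, 200]` and `tol=1`, there are 2 true positives, 1 false positive and 2 false negatives, giving an RoA of 40%.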

IV Results

IV-A Feature-dependent Label Noise

In experiment 1, which tested the effect of feature-dependent label noise by simulating noisy separation vectors, DeepLSAL generated an embedding space with dense clusters for each class corresponding to the true labels. Surrounding each cluster is a large sparse periphery of false labels, with no apparent structure. The efficacy of locality-sensitive sampling at preserving a rich class embedding is clear when compared to random sampling using a two-dimensional principal component space, as in figure 3. The simple density-estimator operating on the cosine similarity between embeddings could then be used to select a subset of true labels by selecting the area with the highest density (fig. 4). This algorithm was weighted to favour specificity over sensitivity, meaning almost no false labels were included in the cleaned label-set at the cost of losing a percentage of the true labels.

Fig. 3: The effect of two different sampling strategies on the embedding space for two units as shown by the first and second principal components. a shows the effect of locality-sensitive sampling, with a rich intra-class embedding that clearly separates the true and false embeddings. In b the same optimisation was run again, but with the positive and negative pairs randomly selected, leading to all intra-class embeddings contracting down to a point.

DeepLSAL generated an embedding space with utility for removing false labels even at the maximum tested value of 50% of total correct values (fig. 5a), with a median post-cleaning false label retention of 1.3% of the total correct labels in the class (IQR 0.5 - 1.9). The number of true labels lost during the cleaning process fell as the pre-cleaning percentage of false labels increased, but even at the highest false label percentage tested, a median of 74.1% (IQR 68.9 - 80.3) of the true values were still retained (table I).

Fig. 4: The first and second principal components of the embedding space after the cleaning phase for all classes found in a single HD-sEMG sample, with labels initially extracted from a noisy separation filter. The samples selected by density analysis are circled; these will form the training labels for the refitting phase. The model is effective at inducing clustering of similar labels, with the area of highest density corresponding to the true labels.
Fig. 5: Median label accuracy and rate of agreement plots for both experiments across all recordings, with interquartile range. a shows the outcome of the DeepLSAL cleaning phase for experiment 1 simulating a noisy separation filter. This leads to a high number of false positives in the labels, plotted here as a fraction of the total correct labels in each class. These are almost completely removed, at the cost of losing a fraction of the true labels. A similar cleaning result was found in experiment 2 (b), where a proportion of labels from each class was transferred to the nearest correlated class. Refitting recovers the bulk of these lost labels, giving good final rates of agreement with the ground truth labels. c shows the RoA from experiment 1 before and after cleaning and refitting, whilst d shows the RoA change in experiment 2. The RoA is returned to values close to pre-corruption levels.
| False Labels Added as Proportion of Class (%) | Starting RoA (%) | Remaining True Labels After Cleaning (%) | RoA After Cleaning and Refitting (%) |
| --- | --- | --- | --- |
| 10 | 91.5 (91.5 - 91.7) | 84.1 (79.6 - 87.1) | 94.5 (91.2 - 97.2) |
| 20 | 83.9 (83.8 - 83.9) | 78.3 (74.4 - 83.9) | 94.5 (89.8 - 97.3) |
| 30 | 77.4 (77.3 - 77.5) | 80.3 (75.3 - 85.1) | 93.5 (90.2 - 97.7) |
| 40 | 71.8 (71.7 - 71.9) | 76.6 (71.7 - 81.8) | 93.8 (87.7 - 96.3) |
| 50 | 67.0 (66.9 - 67.0) | 74.1 (68.9 - 80.3) | 92.0 (86.7 - 96.4) |
TABLE I: Combined median (interquartile range) scores for all classes across all recordings from experiment 1 for different stages of the cleaning and refitting pipeline.

IV-B Class-dependent Label Noise

In experiment 2, when labels were randomly flipped to the MU class with the closest average MUAP shape, DeepLSAL again generated embedding spaces with clear separation between true and false labels (fig. 6). However, unlike in the first experiment, the false labels formed a second distinct cluster within the embedding space, again clearly visualised in the first two principal component dimensions. As the true label cluster always had more values, it was still clearly identified by the density estimator.
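The selection of the larger, denser cluster can be illustrated with a minimal sketch. It is not the density estimator used in the study: the Gaussian kernel, the `bandwidth` value, the `keep_fraction_of_peak` threshold and the synthetic two-dimensional clusters are all assumptions chosen only to show the principle that a 60/40 split between true and false labels leaves the true cluster with the higher peak density.

```python
import numpy as np

def density_select(embeddings, bandwidth=0.5, keep_fraction_of_peak=0.8):
    """Boolean mask of samples lying in the densest region of the embedding.

    Hypothetical helper: kernel, bandwidth and threshold are illustrative
    assumptions, not values taken from the paper.
    """
    x = np.asarray(embeddings, dtype=float)
    # Pairwise squared Euclidean distances between all embedded samples.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    # Unnormalised Gaussian kernel density estimate evaluated at each sample.
    density = np.exp(-sq_dist / (2.0 * bandwidth ** 2)).sum(axis=1)
    # Keep only samples whose density is close to the global peak.
    return density >= keep_fraction_of_peak * density.max()

# Synthetic stand-in for one class after a 40% label transfer:
# 60 true labels in one tight cluster, 40 false labels in another.
rng = np.random.default_rng(0)
true_cluster = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(60, 2))
false_cluster = rng.normal(loc=[5.0, 5.0], scale=0.2, size=(40, 2))
points = np.vstack([true_cluster, false_cluster])

mask = density_select(points)
print("true labels kept:", int(mask[:60].sum()), "false labels kept:", int(mask[60:].sum()))
```

Because the threshold is tied to the peak density, a simple rule of this kind also discards peripheral true labels, which matches the behaviour reported for the cleaning phase.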

As in experiment 1, the DeepLSAL cleaning phase was effective at removing almost all false labels (fig. 5b). Even at a 40% transfer, the median post-cleaning fraction of false labels was 0.0% (IQR 0 - 0.6) of the total correct labels. As true labels were lost both to the initial transfer to other classes and to the cleaning phase, far fewer were retained in the post-cleaning dataset than in experiment 1, and these would need to be recovered in the refitting stage (table II).

Fig. 6: Principal component plot of the embedding space after the cleaning phase for all classes found in a single HD-sEMG sample in experiment 2. 40% of the labels in each class have been removed and then added to the class with the closest correlation. The samples selected automatically for the refitting phase have been circled. Unlike in experiment 1, the false labels are correlated as they come from the same class. This results in two tight clusters, one for true and one for false labels; however, they are still clearly separable.
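For readers wishing to reproduce this kind of plot, the projection of an embedding onto its first two principal components can be sketched as follows. This is a generic PCA via singular value decomposition, not the authors' plotting code; the function name, the 8-dimensional embedding size and the random data are assumptions for illustration.

```python
import numpy as np

def first_two_principal_components(embeddings):
    """Project samples onto their first two principal components.

    Returns the (n_samples, 2) projection and the fraction of variance
    explained by each of the two components.
    """
    x = np.asarray(embeddings, dtype=float)
    centred = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    projected = centred @ vt[:2].T
    explained = singular_values[:2] ** 2 / np.sum(singular_values ** 2)
    return projected, explained

# Hypothetical embedding: 100 samples in an assumed 8-dimensional space.
rng = np.random.default_rng(1)
embedding = rng.normal(size=(100, 8))
projected, explained = first_two_principal_components(embedding)
print(projected.shape, explained)
```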

IV-C Rediscovering Unlabelled Activations

For the DeepLSAL-cleaned label set to be useful, the cleaning process must not bias the model against the true labels that are lost at this stage, as this would make them difficult to recover using the retained true labels. Lost labels tend to be more peripheral in the cluster, meaning their MUAP shapes are likely to be less similar to the MU class average, potentially due to superposition with a MUAP from a different class or due to a noise artefact. False negatives are also still used for training, but are labelled inappropriately, with a possibly detrimental effect on the ability of the model to generalise. To demonstrate that neither of these potential problems actually impacted training, DeepLSAL was refitted with the cleaned label sets after the cleaning stage of both experiments 1 and 2. For both experiments the predicted activity was generally both sparse and clean, with MUAPs easily identifiable (fig. 7). These labels were compared to the original data using the rate of agreement (RoA) metric, a percentage defined as the number of true positive matches divided by the total number of true positives, false positives and false negatives.
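The RoA definition above can be written out as a short sketch. The formula follows the text; the matching rule is an assumption, since the tolerance used to pair predicted and original timestamps is not stated here. The function name `rate_of_agreement` and the `tolerance` parameter (in samples) are hypothetical.

```python
def rate_of_agreement(true_times, predicted_times, tolerance=0):
    """RoA (%) = 100 * TP / (TP + FP + FN) for two lists of spike timestamps.

    A prediction counts as a true positive if it lies within `tolerance`
    samples of a not-yet-matched true timestamp (assumed matching rule).
    """
    true_sorted = sorted(true_times)
    pred_sorted = sorted(predicted_times)
    i = j = true_positives = 0
    # Two-pointer sweep: each true spike is paired with at most one prediction.
    while i < len(true_sorted) and j < len(pred_sorted):
        diff = pred_sorted[j] - true_sorted[i]
        if abs(diff) <= tolerance:
            true_positives += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # prediction precedes every remaining true spike: false positive
        else:
            i += 1  # true spike has no prediction close enough: false negative
    false_positives = len(pred_sorted) - true_positives
    false_negatives = len(true_sorted) - true_positives
    denominator = true_positives + false_positives + false_negatives
    return 100.0 * true_positives / denominator if denominator else 0.0

# Three matches, one spurious prediction and one missed spike: 3 / (3 + 1 + 1).
print(rate_of_agreement([100, 250, 400, 550], [101, 249, 330, 552], tolerance=2))  # 60.0
```

Because false positives and false negatives both enter the denominator, the metric penalises spurious and missed activations equally.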

The RoA of the predicted MUAP labels with the original data was generally good for both experiments at every level of difficulty (tables I and II). In experiment 1 there was little change in RoA as difficulty increased (fig. 5c), suggesting that DeepLSAL is able to generalise to unseen activations. This finding was also replicated in experiment 2 (fig. 5d), and even when the median post-cleaning training set was just over half of the total class activations a median RoA of 94% (86.9 - 97.7) was achieved after refitting.

Fig. 7: A single channel of unprocessed HD-sEMG and the post-decomposition predicted activity of a single class before and after cleaning and refitting, with true and false labels. a demonstrates the degree of complex superposition inherent to sEMG signal as opposed to "cleaner" recordings such as those from intracortical sources. A linear separation filter based on an average of only 15 labels is applied to the signal to generate b, which is consequently extremely noisy, simulating a poorly optimised filter. A number of false positives corresponding to 50% of the number of true class labels has been selected. After the cleaning and refitting phases the spiking motor neuron activity in c is clearly identifiable, whilst incorrect labels have been suppressed.
Proportion of Class Labels Transferred (%) | Starting RoA (%) | Remaining True Labels After Cleaning (%) | RoA After Cleaning and Refitting (%)
10 | 83.2 (82.1 – 83.9) | 84.1 (84.1 – 88.1) | 98.2 (95.4 – 98.8)
20 | 67.0 (66.1 – 69.0) | 75.4 (70.0 – 77.5) | 96.8 (93.3 – 99.0)
30 | 55.0 (53.1 – 56) | 60.9 (57.3 – 64.9) | 95.7 (88.9 – 98.8)
40 | 43.7 (42.1 – 44.9) | 53.1 (49.1 – 56.7) | 94 (86.9 – 97.7)
TABLE II: Combined median (interquartile range) scores for all classes across all recordings from experiment 2 for different stages of the cleaning and refitting pipeline.
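As a rough sanity check on the starting RoA columns of tables I and II, the RoA definition can be evaluated under an idealised corruption model. The model is an assumption, not part of the study: every class has the same number of true labels N, a corruption fraction p is applied exactly, and no added label coincides with a true one.

```latex
% Experiment 1: pN false positives are added and no true labels are removed.
\[
\mathrm{RoA}_1(p) = \frac{N}{N + pN} = \frac{1}{1+p},
\qquad \mathrm{RoA}_1(0.1) \approx 90.9\%, \quad \mathrm{RoA}_1(0.5) \approx 66.7\%
\]
% Experiment 2: pN true labels leave the class (false negatives) and
% pN labels arrive from another class of equal size (false positives).
\[
\mathrm{RoA}_2(p) = \frac{(1-p)N}{(1-p)N + pN + pN} = \frac{1-p}{1+p},
\qquad \mathrm{RoA}_2(0.1) \approx 81.8\%, \quad \mathrm{RoA}_2(0.4) \approx 42.9\%
\]
```

The reported medians (91.5% and 67.0% in table I, 83.2% and 43.7% in table II) sit slightly above these idealised values, so the model should be read only as an approximation of how the corruption level maps to the starting RoA.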

V Discussion

Source separation is often applied to decode the neural information embedded in neurophysiological data. This approach provides a window into the neural determinants of behaviour as well as a way to identify neural features for human-machine interfacing. However, conventional source separation approaches, as well as manual decompositions by expert operators, often provide noisy outputs due to decoding errors. In this study we demonstrated that a supervised DML paradigm can find self-correcting data embeddings that allow accurate cleaning of labels corrupted by common noise issues, whilst also predicting new labels not included within the training set. Importantly, when refitting, the model was not biased against labels not selected during the cleaning phase. This is particularly important in neurophysiological signals such as HD-sEMG, where the shape of a subset of activations will be distorted by temporal superpositions. This ability of the model to operate on un-preprocessed signals and generalise to unseen activations is particularly relevant to the field of sEMG decomposition for prosthetic control, where a current focus is on the use of neural networks to directly identify MUAPs in raw signal and so avoid the latency of the current complex preprocessing pipeline, which prevents online implementations[15][16].

Deep metric learning has a number of attractive properties over standard softmax-based binary classifiers for neurophysiological time series classification, needing fewer training examples in general and being able to adapt to new classes easily[45]. DML methods can adapt quickly to the changes in class activity commonly seen in neural systems over time, such as with dropped units in intracortical recordings or MU recruitment and derecruitment in sEMG and iEMG recordings[46][47]. However, most implementations of DML focus on a high degree of inter-class separation[48], rather than on the more descriptive intra-class embedding that is needed if this space is to be used for the identification of false labels. An important result of this study was that a loss function designed to operate only on samples local to each other in the embedding space can give rise to a richer intra-class distribution, avoiding the collapse to a dense point commonly seen in a more traditional triplet learning paradigm. Furthermore, we demonstrated that the utility of the embedding space generated by DeepLSAL for label cleaning is unaffected by feature- or class-dependence in the label noise.

Whilst this study focused on source-separated HD-sEMG signal, it is important to emphasise the broader applicability of this approach to any imperfectly-labelled neurophysiological time series data in which the underlying sources are repeating events, i.e. to any neural recording. Whilst the study focused specifically on action potentials, the proposed methodology could also be used for pattern recognition in bulk neurophysiological signal, supplementing recently-proposed systems of prosthetic control[49]. Additionally, the labelling process need not be by a BSS algorithm. For example, DeepLSAL could be applied to a dataset for which only a small component of the data has been manually labelled by an expert operator, to recover the rest of the labels accurately. In this way, the proposed approach can be viewed as a minimally supervised method for neurophysiological time series decomposition into individual cell activities.

A potential limitation of the study is the method of converting the embedding layer to a cleaned label set. One strength of the DeepLSAL algorithm was that it was usually quite simple to identify the main cluster of clean labels in the embedding space during the cleaning phase, meaning a simple density estimator was sufficient to set a decision threshold. However, this approach tended to cut out a larger proportion of true labels than was potentially necessary, even if the refitting stage was able to rediscover those lost labels. An improved sorting methodology is even more relevant to the specific case in which two classes are mixed in equal proportions, as there is then no way of identifying which of the label clusters is "correct". This implies that an interesting future direction for a DML-based approach is to investigate the use of adaptive k-means clustering to find distinct clusters within the error labels and hence potentially rehabilitate their averages with their origin class. Whilst it was not a focus of this study, an additional interesting future direction might be found in improving the richness of non-activation embeddings, which may allow the identification of new unlabelled classes through clustering events.

In summary, we have presented DeepLSAL, a deep metric learning pipeline for embedding source-separated multivariate neurophysiological time series into a dimensionally-reduced space suitable for both classification and label cleaning. Whilst this paper demonstrated the approach on electromyographic signal decomposition, the method is broadly applicable to other neurophysiological recordings, such as intracortical or intraneural signals.


  • [1] R. Merletti and D. Farina, Surface electromyography: physiology, engineering, and applications. John Wiley & Sons, 2016.
  • [2] E. Stark and M. Abeles, “Predicting movement from multiunit activity,” Journal of Neuroscience, vol. 27, pp. 8387–8394, Aug. 2007.
  • [3] E. Drebitz, B. Schledde, A. K. Kreiter, and D. Wegener, “Optimizing the yield of multi-unit activity by including the entire spiking activity,” Frontiers in Neuroscience, vol. 13, Feb. 2019.
  • [4] D. Farina and A. Holobar, “Characterization of human motor units from surface EMG decomposition,” Proceedings of the IEEE, vol. 104, pp. 353–373, Feb. 2016.
  • [5] T. Kapelner, I. Vujaklija, N. Jiang, F. Negro, O. C. Aszmann, J. Principe, and D. Farina, “Predicting wrist kinematics from motor unit discharge timings for the control of active prostheses,” Journal of NeuroEngineering and Rehabilitation, vol. 16, Apr. 2019.
  • [6] W. H. Calvin, “Some simple spike separation techniques for simultaneously recorded neurons,” Electroencephalography and Clinical Neurophysiology, vol. 34, pp. 94–96, Jan. 1973.
  • [7] W. Simon, “The real-time sorting of neuro-electric action potentials in multiple unit studies,” Electroencephalography and Clinical Neurophysiology, vol. 18, pp. 192–195, Feb. 1965.
  • [8] H. G. Rey, C. Pedreira, and R. Q. Quiroga, “Past, present and future of spike sorting techniques,” Brain Research Bulletin, vol. 119, pp. 106–117, Oct. 2015.
  • [9] J. Kevric and A. Subasi, “Comparison of signal decomposition methods in classification of EEG signals for motor-imagery BCI system,” Biomedical Signal Processing and Control, vol. 31, pp. 398–406, Jan. 2017.
  • [10] F. Negro, S. Muceli, A. M. Castronovo, A. Holobar, and D. Farina, “Multi-channel intramuscular and surface EMG decomposition by convolutive blind source separation,” Journal of Neural Engineering, vol. 13, p. 026027, Feb. 2016.
  • [11] M. Pachitariu, N. Steinmetz, S. Kadir, M. Carandini, and K. D. Harris, “Kilosort: realtime spike-sorting for extracellular electrophysiology with hundreds of channels,” bioRxiv preprint, June 2016.
  • [12] J. J. Jun, N. A. Steinmetz, J. H. Siegle, D. J. Denman, M. Bauza, B. Barbarits, A. K. Lee, C. A. Anastassiou, A. Andrei, Ç. Aydın, M. Barbic, T. J. Blanche, V. Bonin, J. Couto, B. Dutta, S. L. Gratiy, D. A. Gutnisky, M. Häusser, B. Karsh, P. Ledochowitsch, C. M. Lopez, C. Mitelut, S. Musa, M. Okun, M. Pachitariu, J. Putzeys, P. D. Rich, C. Rossant, W.-L. Sun, K. Svoboda, M. Carandini, K. D. Harris, C. Koch, J. O’Keefe, and T. D. Harris, “Fully integrated silicon probes for high-density recording of neural activity,” Nature, vol. 551, pp. 232–236, Nov. 2017.
  • [13] S. Muceli, W. Poppendieck, K.-P. Hoffmann, S. Dosen, J. Benito-León, F. O. Barroso, J. L. Pons, and D. Farina, “A thin-film multichannel electrode for muscle recording and stimulation in neuroprosthetics applications,” Journal of Neural Engineering, vol. 16, p. 026035, Feb. 2019.
  • [14] A. D. Vecchio and D. Farina, “Interfacing the neural output of the spinal cord: robust and reliable longitudinal identification of motor neurons in humans,” Journal of Neural Engineering, Oct. 2019.
  • [15] A. K. Clarke, S. F. Atashzar, A. D. Vecchio, D. Barsakcioglu, S. Muceli, P. Bentley, F. Urh, A. Holobar, and D. Farina, “Deep learning for robust decomposition of high-density surface EMG signals,” IEEE Transactions on Biomedical Engineering, vol. 68, pp. 526–534, Feb. 2021.
  • [16] Y. Wen, S. Avrillon, J. C. Hernandez-Pavon, S. J. Kim, F. Hug, and J. L. Pons, “A convolutional neural network to identify motor units from high-density surface electromyography signals in real time,” Journal of Neural Engineering, Mar. 2021.
  • [17] G. Algan and I. Ulusoy, “Label noise types and their effects on deep learning,” arXiv preprint arXiv:2003.10471, 2020.
  • [18] R. I. Kumar, M. M. Mallette, S. S. Cheung, D. W. Stashuk, and D. A. Gabriel, “A method for editing motor unit potential trains obtained by decomposition of surface electromyographic signals,” Journal of Electromyography and Kinesiology, vol. 50, p. 102383, Feb. 2020.
  • [19] K. C. McGill, Z. C. Lateva, and H. R. Marateb, “EMGLAB: An interactive EMG decomposition program,” Journal of Neuroscience Methods, vol. 149, pp. 121–133, Dec. 2005.
  • [20] F. Hug, S. Avrillon, A. D. Vecchio, A. Casolo, J. Ibanez, S. Nuccio, J. Rossato, A. Holobar, and D. Farina, “Analysis of motor unit spike trains estimated from high-density surface electromyography is highly reliable across operators,” bioRxiv preprint, Feb. 2021.
  • [21] D. Carlson and L. Carin, “Continuing progress of spike sorting in the era of big data,” Current Opinion in Neurobiology, vol. 55, pp. 90–96, Apr. 2019.
  • [22] P. Yger, G. L. Spampinato, E. Esposito, B. Lefebvre, S. Deny, C. Gardella, M. Stimberg, F. Jetter, G. Zeck, S. Picaud, J. Duebel, and O. Marre, “A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo,” eLife, vol. 7, Mar. 2018.
  • [23] H. Song, M. Kim, D. Park, and J.-G. Lee, “Learning from noisy labels with deep neural networks: A survey,” arXiv preprint, 2020.
  • [24] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, “Deep learning with noisy labels: exploring techniques and remedies in medical image analysis,” CoRR, vol. abs/1912.02911, 2019.
  • [25] B. Frenay and M. Verleysen, “Classification in the presence of label noise: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, pp. 845–869, May 2014.
  • [26] A. Veit, N. Alldrin, G. Chechik, I. Krasin, A. Gupta, and S. J. Belongie, “Learning from noisy large-scale datasets with minimal supervision,” CoRR, vol. abs/1701.01619, 2017.
  • [27] K. Lee, X. He, L. Zhang, and L. Yang, “Cleannet: Transfer learning for scalable image classifier training with label noise,” CoRR, vol. abs/1711.07131, 2017.
  • [28] J. Han, P. Luo, and X. Wang, “Deep self-learning from noisy labels,” CoRR, vol. abs/1908.02160, 2019.
  • [29] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR05), IEEE, 2005.
  • [30] J. Wang, F. Zhou, S. Wen, X. Liu, and Y. Lin, “Deep metric learning with angular loss,” 2017.
  • [31] F. Schroff, D. Kalenichenko, and J. Philbin, “Facenet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
  • [32] A. Hermans, L. Beyer, and B. Leibe, “In defense of the triplet loss for person re-identification,” CoRR, vol. abs/1703.07737, 2017.
  • [33] K. Sohn, “Improved deep metric learning with multi-class n-pair loss objective,” in Advances in Neural Information Processing Systems (D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, eds.), vol. 29, Curran Associates, Inc., 2016.
  • [34] C.-Y. Wu, R. Manmatha, A. J. Smola, and P. Krahenbuhl, “Sampling matters in deep embedding learning,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct 2017.
  • [35] O. Rippel, M. Paluri, P. Dollar, and L. Bourdev, “Metric learning with adaptive density discrimination,” 2016.
  • [36] X. Wang, Y. Hua, E. Kodirov, G. Hu, R. Garnier, and N. M. Robertson, “Ranked list loss for deep metric learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
  • [37] F. Xu, B. Ma, H. Chang, and S. Shan, “PRDP: Person reidentification with dirty and poor data,” IEEE Transactions on Cybernetics, pp. 1–13, 2021.
  • [38] Y. Fan, S. Lyu, Y. Ying, and B.-G. Hu, “Learning with average top-k loss,” arXiv preprint arXiv:1705.08826, 2017.
  • [39] A. Holobar and D. Zazula, “Multichannel blind source separation using convolution kernel compensation,” IEEE Transactions on Signal Processing, vol. 55, no. 9, pp. 4487–4496, 2007.
  • [40] D. Farina, F. Negro, S. Muceli, and R. M. Enoka, “Principles of motor unit physiology evolve with advances in technology,” Physiology, vol. 31, pp. 83–94, Mar. 2016.
  • [41] S. Ji, Z. Zhang, S. Ying, L. Wang, X. Zhao, and Y. Gao, “Kullback-leibler divergence metric learning,” IEEE Transactions on Cybernetics, pp. 1–12, 2020.
  • [42] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” CoRR, vol. abs/1701.06538, 2017.
  • [43] A. Holobar and D. Zazula, “Gradient convolution kernel compensation applied to surface electromyograms,” in Independent Component Analysis and Signal Separation, pp. 617–624, Springer Berlin Heidelberg, 2007.
  • [44] A. Holobar, M. A. Minetto, A. Botter, F. Negro, and D. Farina, “Experimental analysis of accuracy in the identification of motor unit spike trains from high-density surface EMG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 18, no. 3, pp. 221–229, 2010.
  • [45] H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese, “Deep metric learning via lifted structured feature embedding,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4004–4012, 2016.
  • [46] C. F. Dunlap, S. C. Colachis, E. C. Meyers, M. A. Bockbrader, and D. A. Friedenberg, “Classifying intracortical brain-machine interface signal disruptions based on system performance and applicable compensatory strategies: A review,” Frontiers in Neurorobotics, vol. 14, Oct. 2020.
  • [47] D. Farina, M. Fosci, and R. Merletti, “Motor unit recruitment strategies investigated by surface EMG variables,” Journal of Applied Physiology, vol. 92, pp. 235–247, Jan. 2002.
  • [48] Q. Qian, L. Shang, B. Sun, J. Hu, H. Li, and R. Jin, “Softtriple loss: Deep metric learning without triplet sampling,” 2020.
  • [49] S. Pancholi and A. M. Joshi, “Advanced energy kernel-based feature extraction scheme for improved EMG-PR-based prosthesis control against force variation,” IEEE Transactions on Cybernetics, pp. 1–10, 2020.