I Introduction
Time series neurophysiological data are commonly characterised by repetitive activation events, for example the motor unit action potentials (MUAPs) in electromyographic (EMG) signals or the spike potentials in microelectrode cortical recordings[1][2]. The ensemble of these activation events constitutes a neural code that gives direct insight into the target system[3], whilst also providing an accurate control signal for human-machine interfacing applications, such as prosthetic control[4][5]. Neurophysiological signals are typically linear superpositions of many of these spiking sources, and their extraction from noisy systems has long been a major focus of applied source separation in neuroscience[6].
Identifying sources from multi-unit activity was originally achieved through manual sorting by trained operators[7], but this tedious process was quickly supplanted by automated methods using early forms of blind source separation (BSS)[8]. BSS algorithms have since become extremely effective, able to automate the recovery of sources in highly noisy and complex systems[8][4][9]. An important trend within the field of applied source separation has been the increasing availability of highly multivariate data[10][11], as a result of developments in high-density electrode arrays[12][13]. By exploiting the increased spatial information collected by these systems, BSS pipelines can yield extremely large numbers of sources[11][14].

A more recent development is the adoption of deep learning approaches as a replacement for linear separation vectors, using neural networks to decode signals with a high degree of robustness to noise and signal non-stationarities[15][16]. These methods involve an offline supervised training phase using the augmented output of a BSS algorithm; an important requirement is therefore that the BSS decomposition contains relatively few errors if the network is to decode with high accuracy[15]. As the number of sources identified in a manual or automatic decomposition increases, so does the probability of labelling errors. Noise can be mistakenly labelled as an activation, an activation can be assigned to the wrong class or missed entirely, a class might be inappropriately partitioned or two distinct classes merged into one. These sources of label noise can be categorised based on whether the factors affecting the likelihood of a label from one class flipping to that of a different class are shared at the dataset, class or feature level
[17]. As label-flipping is class- or even feature-dependent, it is difficult to identify such errors automatically, and for automatic decompositions a degree of manual post-hoc cleaning is commonly employed, often using additional knowledge about the system of interest, such as the temporal statistics of the source activations[18]. The nature of this manual cleaning generally relates to the mixing system of interest; for example, intracortical and intramuscular EMG (iEMG) decompositions generally require post-hoc examination of classes due to extensive class-dependent label noise[8][19]. On the other hand, surface EMG (sEMG) decompositions also contain a degree of feature-dependent noise and so require further inspection of specific activations[20]. Whilst accurate, manual post-hoc "cleaning" is an extremely time-consuming process and in some cases not feasible because of the size of the datasets being source-separated[21]. For this reason, modern source separation pipelines are increasingly using additional automated post-processing steps in an attempt to reduce the false label burden[18][22][10]. However, these methods only compensate for a relatively small proportion of incorrect labels, so there remains a need for new methods of post-hoc label cleaning. Additionally, if supervised deep learning frameworks are to be trained using BSS-labelled signals with increasing degrees of label noise, then they need to be able to detect and manage such errors, i.e. they need to be designed to be implicitly self-correcting.

Managing label noise is a fast-developing subfield of modern machine learning. As the size of datasets expands faster than the capacity of domain experts to screen and label, data scientists are increasingly turning to new labelling methods that are more scalable at the cost of a greater proportion of label noise, such as Amazon's Mechanical Turk[23]. This is particularly true in a neuroscience setting, where datasets are generally labelled by a small pool of domain experts who can differ in professional opinion[24]. Approaches to learning in the presence of label noise can be broadly split into two categories: methods that aim to select models that are robust to label noise and methods that attempt to clean the label set prior to training[25]. Contemporary methods based on the latter approach generally rely on additional models which attempt to identify noisy labels using either a smaller pool of known correct labels or by comparing a label with other in-class labels using a similarity metric[26][27][28]. This principle of using similarity metrics to build embedding spaces that inform intra- and inter-class classification is closely related to deep metric learning (DML) approaches.

The objective of deep metric learning is the training of a deep neural network which maps discrete inputs to an embedding space in which positive pairs (two inputs from the same class) are closer than negative pairs (two inputs from different classes)[29]. The Euclidean distance is commonly used as the distance metric, although measures based on the angular difference between embeddings have become popular due to their inherent rotation and scale invariance[30]. The network is optimised using either the absolute similarity between pairs or, more commonly, the relative difference between one or more positive and negative pairings[31]. During optimisation, most negative pairs quite quickly become much further away than positive pairs, so random generation of pairs for training is extremely inefficient. For this reason, many implementations of DML seek to select the most informative pairings from each minibatch by applying some form of selection rule operating on the output embedding space[32][33].
Most of these approaches are designed purely to maximise the class separability of the embedding space, leading to dense clusters in each class[34]. More recent work has attempted to increase intra-class variability, as such tight embeddings potentially reduce the ability of the model to generalise to new in-class inputs[35][34][36]. These locality-sensitive approaches have a less distorting effect on the embedding space, giving better generalisation performance[36]. Whilst not the primary goal of these studies, another effect of preserving intra-class variance is a richer embedding of inputs, with semantically similar events sub-clustering[35]. Such rich embedding spaces could potentially be of use for detecting class outliers, perhaps even having utility for label-cleaning operations in event-driven neurophysiological recordings. DML pipelines have been designed to operate on noisy label sets for similar tasks such as person re-identification; however, these methods tend to use an external method to modify training rather than the embedding space itself, for example using label correction based on cross-entropy[37].

Motivated by the need to further preserve a rich intra-class embedding, and inspired by ranking approaches to triplet sampling such as [36], here we propose a novel locality-sensitive approach to sampling during DML optimisation. We use an efficient top-k query to identify the closest in-batch positive to each event and the N closest in-batch negatives, which are then used to calculate an N-pair formulation of the popular angular loss[30]. The idea of using top-k queries within batch losses has been explored in the context of binary classification problems[38]; however, this is, to our knowledge, the first such implementation within the domain of DML. In this paper we demonstrate that this simple modification, which we call locality-sensitive angular loss (LSAL), generates an embedding space which can be used to detect and classify repetitive events, whilst importantly having the additional utility of being able to detect label noise in the data used for training.
The main contribution of this paper is DeepLSAL, a novel DML pipeline that leverages LSAL to perform both label cleaning and the identification of new activations in unseen data, operating directly on neurophysiological time series signals. The specific focus is on label sets generated by BSS algorithms for decoding convolutive mixtures, as this is an area where supervised deep learning methods are clearly beginning to outperform existing methods[15][16]; however, in principle the proposed methodology is applicable to any manual or automatic method of generating imperfect label sets. The effectiveness of DeepLSAL is validated using an experimentally collected high-density (HD) sEMG dataset that had been source-separated into constituent motor unit activity using the gradient convolution kernel compensation (gCKC) algorithm[39]. The scenario of decomposition of HD-sEMG signals is highly convenient for validation of the proposed approach, since both the generating system and the decomposition methodology are well-studied. Moreover, the sEMG system is characterised by a high degree of feature-dependent label noise due to the complex superposition of MUAPs caused by volume-conduction effects[40].
II Theory and Algorithm
II-A Deep Metric Learning with N-pair Loss
The objective of ranking-loss DML, also called triplet loss, is to train a deep learning function such as a convolutional neural network $f(\cdot)$ to map a sample $x$ taken from one of $C$ classes to an embedding vector $z = f(x)$, such that for an arbitrarily selected anchor embedding $z_a$, the embedding space reduces the relative distance $d(z_a, z_p)$ to positive samples $z_p$ from the same class against the distance $d(z_a, z_n)$ to negative samples $z_n$ from different classes. $d(\cdot,\cdot)$ can be a number of different metrics, such as the Euclidean distance, cosine similarity or Kullback–Leibler divergence[41]. Commonly the loss function is formulated such that the relative difference must be greater than a margin $m$:

$$\mathcal{L}_{\text{triplet}} = \big[\, d(z_a, z_p) - d(z_a, z_n) + m \,\big]_+ \qquad (1)$$
After a small amount of optimisation, the bulk of negative pairs will be much further away than the positive pairs, meaning most training examples in a batch will become uninformative[32]. N-pair loss seeks to avoid this problem by comparing each positive pair in a batch of size $B$ to $N$ negative pairs (fig. 1a), which are then combined in a single formulation[33]:

$$\mathcal{L}_{N\text{-pair}} = \frac{1}{N} \sum_{i=1}^{N} \ell\big( d(z_a, z_p) - d(z_a, z_{n_i}) + m \big) \qquad (2)$$

where $\ell(\cdot)$ is usually a hinge function such as $[\,\cdot\,]_+ = \max(0, \cdot)$. By taking an average across the $N$ negative pairs, it is likely that at least some informative negative pairings will be included in the loss calculation.
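As a concrete illustration, equation (2) with a hinge can be sketched in a few lines of numpy. This is illustrative only: the squared Euclidean distance and all names are assumptions, not the paper's implementation.

```python
import numpy as np

def npair_hinge_loss(anchor, positive, negatives, margin=0.1):
    """N-pair ranking loss with a hinge, averaged over negatives.

    anchor, positive: (D,) embedding vectors; negatives: (K, D).
    Uses squared Euclidean distance (illustrative choice of metric).
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_negs = np.sum((anchor - negatives) ** 2, axis=1)
    # hinge: only negatives closer than (positive distance + margin) contribute
    return np.mean(np.maximum(0.0, d_pos - d_negs + margin))
```

A well-separated negative contributes zero loss, which is exactly why random pairing becomes uninformative as training proceeds.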
II-B Angular Loss
Angular loss is based on a geometric reformulation of the distance metric; rather than minimising the distance of $z_p$ to $z_a$ relative to $z_n$, it instead minimises the angle at $z_n$ of the triangle formed from the three embeddings. This has the effect of improving optimisation stability, as angles are scale invariant, whilst using a triangle means all edges of the triplet are taken into account[30]. However, in certain circumstances the minimisation of the angle at $z_n$ will simply push $z_p$ towards $z_a$. This can be avoided by constructing a right-angled triangle with $z_n$ and the midpoint $z_m$ between $z_a$ and $z_p$ (fig. 1b), with the final vertex being the point on the semicircle joining $z_n$ and $z_m$ which creates a right angle[30]. By dropping constant terms, this geometric relationship can be used as the triplet term in equation 2, expressed as:

$$f(z_a, z_p, z_n) = 4\tan^2\!\alpha\,(z_a + z_p)^\top z_n - 2(1 + \tan^2\!\alpha)\, z_a^\top z_p \qquad (3)$$

where $\alpha$ is an angle in radians which sets the upper accepted bound of the loss, analogous to the margin $m$ in equation 1.
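A numpy sketch of the angular term in equation (3) is given below, reduced over negatives with a softplus as a smooth stand-in for the hinge (an assumption for illustration; the exact reduction used in the paper is not specified in this excerpt). Inputs are assumed unit-norm.

```python
import numpy as np

def angular_term(za, zp, zn, alpha=0.25):
    """Angular triplet term of eq. (3) for unit-norm embeddings;
    alpha is the angle bound in radians. Illustrative sketch."""
    t2 = np.tan(alpha) ** 2
    return 4.0 * t2 * np.dot(za + zp, zn) - 2.0 * (1.0 + t2) * np.dot(za, zp)

def angular_npair_loss(za, zp, Zn, alpha=0.25):
    """Average over N negatives, softplus used as a smooth hinge."""
    f = np.array([angular_term(za, zp, zn, alpha) for zn in Zn])
    return np.mean(np.log1p(np.exp(f)))
```

Note how a negative lying close to the positive pair yields a larger term than one diametrically opposite, so harder negatives dominate the loss.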
II-C Inducing Rich Embeddings
Whilst a random selection of positive pairings in equation (2) is suitable if the objective is to maximise inter-class distance within the embedding space, it also has the tendency to collapse intra-class distances down to a point, as illustrated in figure 1c[35]. We theorised that the main explanation for this compression is that the sampling process is random, meaning that the neural network is induced during training to bring all embeddings from the same class together, which, as a complex non-linear function, it is quite capable of doing given sufficient training steps. We instead elected to sample positive and negative pairs based on the local neighbourhood of each embedding in the batch, modifying the selection of $z_p$ and the set of $z_n$'s for each anchor $z_i$ in the batch, a process we term locality-sensitive sampling.

The batch size $B$ is first selected such that it is large enough to contain a diverse representation of each class. As each embedding vector is L2-normalised, the tensor formed by finding the inner product of the batch tensor with its transpose is the pairwise cosine similarity. For each anchor $z_i$, the positive $z_{p(i)}$ is selected with a simple argmax, i.e. the closest different vector from the same class is selected. A similar procedure is used to select the set $\mathcal{N}_k(i)$ of $z_n$'s, using a top-k algorithm to select the $k$ most similar vectors to $z_i$ that belong to a different class. GPU implementations of top-k algorithms have become extremely efficient in recent years, due to their increasing use within machine learning paradigms[42]. With these easily-implemented changes, the N-pair formulation of the angular loss becomes:

$$\mathcal{L}_{\text{LSAL}} = \frac{1}{B} \sum_{i=1}^{B} \frac{1}{k} \sum_{j \in \mathcal{N}_k(i)} \ell\big( f(z_i, z_{p(i)}, z_j) \big) \qquad (4)$$

where $p(i) = \arg\max_{j \neq i} \mathbb{1}[y_j = y_i]\, z_i^\top z_j$ is the argmax of the pairwise cosine similarity between $z_i$ and its associated same-class set in the batch, $\mathcal{N}_k(i)$ is the top-$k$ of the ordered pairwise cosine similarity between $z_i$ and its associated different-class set in the batch, and $\mathbb{1}[\cdot]$ is the indicator function.
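The in-batch selection step can be sketched directly from the pairwise cosine-similarity matrix. The function below is a hypothetical numpy version; a TensorFlow implementation would apply `tf.math.top_k` to the same masked similarity matrix.

```python
import numpy as np

def locality_sensitive_pairs(E, y, k=5):
    """For each anchor, pick the closest in-batch positive and the k closest
    negatives by cosine similarity.

    E: (N, D) L2-normalised embeddings; y: (N,) integer labels.
    Returns (pos_idx, neg_idx): indices of shape (N,) and (N, k). Sketch only.
    """
    S = E @ E.T                                   # pairwise cosine similarity
    same = y[:, None] == y[None, :]
    np.fill_diagonal(same, False)                 # an anchor is not its own positive
    pos_idx = np.argmax(np.where(same, S, -np.inf), axis=1)
    neg_sim = np.where(y[:, None] != y[None, :], S, -np.inf)
    neg_idx = np.argsort(-neg_sim, axis=1)[:, :k] # k most similar negatives
    return pos_idx, neg_idx
```

The masked `-np.inf` entries guarantee that positives are only drawn from the same class and negatives only from other classes.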
To stabilise early optimisation we combined the LSAL loss with a categorical cross-entropy with temperature $\tau$, given by:

$$\mathcal{L}_{CE} = -\frac{1}{B} \sum_{i=1}^{B} \sum_{c=1}^{C} y_{i,c} \log \frac{\exp\!\big((W^\top z_i)_c / \tau\big)}{\sum_{c'=1}^{C} \exp\!\big((W^\top z_i)_{c'} / \tau\big)} \qquad (5)$$

where $y_{i,c}$ is the one-hot encoded label of $z_i$ and $W$ is a trainable matrix that compresses the embedding vector down to a dimension-$C$ vector for comparison with the one-hot encoded class labels. This gives the final loss function:

$$\mathcal{L} = \mathcal{L}_{\text{LSAL}} + \mathcal{L}_{CE} \qquad (6)$$
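Equation (5) reduces to a temperature-scaled softmax cross-entropy on a linear read-out of the embedding. A minimal numpy sketch (names are illustrative):

```python
import numpy as np

def ce_with_temperature(Z, Y_onehot, W, tau=0.1):
    """Temperature-scaled categorical cross-entropy on a linear read-out.

    Z: (B, D) embeddings; W: (D, C) trainable compression matrix;
    Y_onehot: (B, C) one-hot labels; tau: temperature. Sketch of eq. (5).
    """
    logits = (Z @ W) / tau
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    return -np.mean(np.sum(Y_onehot * np.log(p + 1e-12), axis=1))
```

A small $\tau$ sharpens the softmax, so the cross-entropy supplies strong early gradients while the LSAL term shapes the local geometry.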
II-D Source Separation
The timestamps used by DeepLSAL can be generated using a wide variety of manual and automated processes; for this study, however, the gradient convolution kernel compensation (gCKC) algorithm was selected due to its strong performance in HD-sEMG signal decomposition[43][39]. In the gCKC framework for blind source separation, the vector of spiking sources $\mathbf{s}(t)$ at time $t$ is first extended with delayed versions of each source, allowing the mixing problem, which is convolutive in most neurophysiological settings, to be written in instantaneous form:

$$\mathbf{y}(t) = \mathbf{A}\,\bar{\mathbf{s}}(t) + \mathbf{n}(t) \qquad (7)$$

where the signal observation vector $\mathbf{y}(t)$ at time $t$ is a linear mixture parameterised by the operation of the mixing matrix $\mathbf{A}$ on the extended source vector $\bar{\mathbf{s}}(t)$, plus noise $\mathbf{n}(t)$. In practice, both the observation and source vectors are additionally extended with a further $K$ delayed values for reasons of numerical stability during the source separation procedure.
Unlike independent component analysis methods, which seek to directly estimate a separation vector for each source, gCKC seeks to include the additional statistical information that the spiking sources generate repetitive events within the signal. Sources are instead estimated indirectly using a linear minimum mean square error (LMMSE) estimator, with the estimated $j$th source at time point $t$ given by:

$$\hat{s}_j(t) = \mathbf{c}^\top_{s_j \bar{\mathbf{y}}}\, \mathbf{C}^{-1}_{\bar{\mathbf{y}}\bar{\mathbf{y}}}\, \bar{\mathbf{y}}(t) \qquad (8)$$

where $\mathbf{c}^\top_{s_j \bar{\mathbf{y}}}$ is the transposed cross-correlation vector between an activation of the $j$th source and the extended HD-sEMG matrix, and $\mathbf{C}^{-1}_{\bar{\mathbf{y}}\bar{\mathbf{y}}}$ is the inverted autocorrelation matrix of the extended HD-sEMG matrix.
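Equation (8) can be sketched in numpy by estimating the cross-correlation vector from a handful of known activation times. All names are illustrative, and the sketch omits the delay-extension and refinement details of a full gCKC implementation.

```python
import numpy as np

def lmmse_source(Y_ext, spike_times):
    """LMMSE estimate of one source from the extended observation matrix.

    Y_ext: (M, T) extended HD-sEMG matrix; spike_times: sample indices of
    assumed activations of the source. A sketch of eq. (8), not full gCKC.
    """
    c_sy = Y_ext[:, spike_times].mean(axis=1)   # cross-correlation vector
    C_yy = (Y_ext @ Y_ext.T) / Y_ext.shape[1]   # autocorrelation matrix
    w = np.linalg.solve(C_yy, c_sy)             # avoids an explicit inverse
    return w @ Y_ext                            # estimated source activity
```

The estimator concentrates energy at time points whose observation vectors resemble the averaged activation signature, which is why even a few seed spikes suffice to initialise it.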
The vector $\mathbf{c}_{s_j \bar{\mathbf{y}}}$ is usually initialised with a time point likely to contain a source activation, which can be estimated by, for example, the Mahalanobis distance calculated on the signal[39]. Once selected, $\mathbf{c}_{s_j \bar{\mathbf{y}}}$ is then optimised to find the rest of the source's signal contributions. This can be done either with a fixed-point algorithm as in [10] or, in the gCKC formulation, by gradient descent:

$$\mathbf{c}^{(n+1)}_{s_j \bar{\mathbf{y}}} = \mathbf{c}^{(n)}_{s_j \bar{\mathbf{y}}} + \eta\, \nabla g\big(\hat{s}_j(t)\big) \qquad (9)$$

where $\mathbf{c}^{(n+1)}_{s_j \bar{\mathbf{y}}}$ is the updated cross-correlation vector, $\eta$ is the learning rate and $g(\cdot)$ a contrast function designed to estimate the non-gaussianity of the output source, in a similar fashion to independent component analysis. Optimised sources can then be converted to timestamps using a linear threshold or a two-class k-means clustering algorithm.
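The final timestamping step, a linear threshold parameterised by a two-class k-means, can be sketched as a one-dimensional two-means iteration (an illustrative implementation, not the paper's code):

```python
import numpy as np

def kmeans_threshold(activity, iters=50):
    """Timestamp a source estimate with a two-class 1-D k-means: the boundary
    between the 'noise' and 'spike' cluster means acts as a linear threshold.

    activity: (T,) estimated source activity. Returns spike sample indices.
    """
    lo, hi = activity.min(), activity.max()     # initial cluster centres
    for _ in range(iters):
        thr = (lo + hi) / 2.0
        low, high = activity[activity < thr], activity[activity >= thr]
        if low.size == 0 or high.size == 0:
            break
        lo, hi = low.mean(), high.mean()        # update centres
    return np.flatnonzero(activity >= (lo + hi) / 2.0)
```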
III Validation Methodology
III-A Experimental Dataset
The HD-sEMG dataset consisted of a set of 20-second recordings taken from the dominant tibialis anterior muscle of 10 men performing an isometric contraction at 15% of maximal force, previously used to validate source separation techniques[44]. Maximal contraction was defined as the mean force of three 5 s maximal contractions separated by 3 min of rest, with force sampled at 2048 Hz by load cells mounted on an isometric brace. Force feedback was provided to the participants by an oscilloscope. The signal from a monopolar electrode array placed over the main muscle innervation zone was sampled at 2048 Hz, having been band-pass filtered at 10–500 Hz.
Gradient convolution kernel compensation with an additional k-means source refinement step was implemented using the TensorFlow machine learning package[39][10]. As the label set was to be artificially corrupted, it was important that the original be as noise-free as possible, so additional post hoc steps were taken to maximise the likelihood that the timestamps were correct. Sources were manually cleaned by examining inter-spike intervals and the source-to-noise ratio of each activation. An additional step of validating decomposition accuracy was implemented by comparing the sources to those found using the DEMUSE source-separation software package[43][39], with source cleaning completed by a different trained operator.

III-B Label Set Corruption
In experiment 1 we evaluated the ability of DeepLSAL to clean a label set corrupted by feature-dependent noise, where a label's flipping probability is related to its associated features[17]. In the context of source-separated HD-sEMG, this most commonly occurs as a false positive, where a separation vector incorrectly assigns a high probability of an in-class MUAP being present when it is not, i.e. a noise-class or other MU-class label is flipped to the MU class of interest. To simulate this effect, we corrupted the label set by generating an artificially noisy separation vector for each MU class: randomly selecting 15 MUAP labels from that class and using the average of the associated extended HD-sEMG vectors to generate a linear minimum mean square error prediction on the extended HD-sEMG matrix. A two-class k-means clustering algorithm was then used to parameterise a linear threshold to find activations, creating a label set with a high degree of feature-dependent noise. Five levels of increasing difficulty were generated by taking a number of false positives corresponding to 10/20/30/40/50% of the number of true labels, selected at random from the set of false positives.
In experiment 2 the DeepLSAL algorithm was evaluated on class-dependent label noise, where the probability of a label flipping to another class is stable across all labels in the class[17]. In HD-sEMG source separation this error generally occurs when the separation vectors are very similar, usually due to similar MUAP waveform shapes between two MU classes. This can be simulated by transferring a percentage of labels to a similar MU class. This was done by first averaging the MUAPs of each MU class and then cross-correlating these averages with the average MUAP of every other class in the recording, with 10/20/30/40% of the class labels transferred to the class with the highest correlation. If labels had already been transferred to the closest class then the next closest class was selected, until all classes had had label transfers. A maximum label corruption of 40% was used to preserve the concept of a majority true and minority false class.
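The transfer procedure can be sketched as follows. This is a simplified, hypothetical version: similarity here is an inner product of average waveforms rather than a full cross-correlation, and the "next closest class" fallback is omitted.

```python
import numpy as np

def transfer_labels(labels, waveforms, frac=0.2, rng=None):
    """Simulate class-dependent label noise: move a fraction of each class's
    spikes to the class with the most similar average waveform.

    labels: (N,) integer class labels; waveforms: (N, W) window around each
    spike; frac: fraction of each class to transfer. Illustrative sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    original = labels
    labels = labels.copy()
    classes = np.unique(original)
    means = {c: waveforms[original == c].mean(axis=0) for c in classes}
    for c in classes:
        others = [o for o in classes if o != c]
        sims = [float(np.dot(means[c], means[o])) for o in others]
        target = others[int(np.argmax(sims))]      # most similar class
        idx = np.flatnonzero(original == c)        # indices from the clean set
        move = rng.choice(idx, size=int(frac * idx.size), replace=False)
        labels[move] = target
    return labels
```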
III-C DeepLSAL Pipeline and Training
To convert the source-separated HD-sEMG signal into labelled windows, each channel of the HD-sEMG signal was first standardised by z-scoring and then cut into overlapping 80-sample-wide windows at a stride of 1. Each window was then labelled by reference to the predicted source activity at the final sample of the window. This meant the bulk of windows were labelled as part of the inactive class due to the sparse nature of motor neuron spiking. Due to this serious class imbalance, each minibatch was created from the entirety of the windows labelled as containing a motor neuron spike, with an additional 256 samples from the inactive class. Each class assignment was then converted to a one-hot representation, the bulk of which had only one class active at any one time, although rarely two activations would occur simultaneously on the same time point. As the richness of the intra-class embedding of the inactive-class windows was not of any great concern, the embeddings of these windows were not used as anchor vectors when calculating the $\mathcal{L}_{\text{LSAL}}$ component of the loss, although they were used both as negative values and in the calculation of $\mathcal{L}_{CE}$.

Embeddings were calculated with a convolutional neural network implemented using the TensorFlow machine learning library in Python, as seen in figure 2a. Convolution steps used a 1D 3-sample-wide kernel, with 32 filters and a dropout of 0.2. 1D max-pooling was completed with 2-sample-wide kernels. Each densely-connected layer had 64 neurons and a dropout percentage of 0.5 during training. Both the convolution and densely-connected layers used ReLU activation functions. Finally, the output of the last densely-connected layer was densely connected to a bias- and activation-free embedding layer 8 neurons wide, which was then divided by its L2 norm. This was an intentionally low-dimensional embedding compared to standard DML, due to the desire to avoid dimensionality issues during the clustering steps in the refitting phase. The additional matrix $W$ used in the categorical cross-entropy was initialised with truncated normal noise, whilst the weights of the neural network layers were initialised by Glorot uniform. The Adam optimisation algorithm at a learning rate of 0.001 was then used to train the model over 500 epochs for both the cleaning and refitting stages.
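The windowing and labelling step described above can be sketched in numpy (the CNN itself is omitted; `make_windows` is a hypothetical name):

```python
import numpy as np

def make_windows(emg, spikes, width=80):
    """Z-score each channel, then cut the (C, T) signal into overlapping
    `width`-sample windows at stride 1, labelling each window by the source
    active at its final sample (0 = inactive). Sketch of the pipeline input.

    spikes: (T,) int array of per-sample source labels.
    """
    emg = (emg - emg.mean(axis=1, keepdims=True)) / emg.std(axis=1, keepdims=True)
    T = emg.shape[1]
    windows = np.stack([emg[:, t - width + 1:t + 1] for t in range(width - 1, T)])
    labels = spikes[width - 1:]                 # label = activity at final sample
    return windows, labels
```

Labelling at the final sample means the window only contains past signal for each decision point, which matters if the same scheme is later run online.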
For both experiments, $k$ was set to 5, $\tau$ was set to 0.1, and $\alpha$ to 0.25 radians in both the cleaning and refitting stages. After the cleaning stage, the labels in the embedding space likely to relate to specific classes were selected by a simple density estimator. First, a local scale value $\sigma$ was estimated by finding the mean cosine similarity of each embedding vector to its 20 nearest neighbours and taking a median of this value across all vectors. For each label, the number of other labels with a cosine similarity of more than $\sigma$ was found, and the label with the highest number of neighbours was selected as the centre of the cluster. All labels with a cosine similarity to this centre higher than $\sigma$ were then added to the refitting training set. This simple approach was generally adequate for quickly finding the densest region of the embedding space, which was usually the cluster of true labels.
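The density-estimation step can be sketched directly from the description above. This is a simplified reading (the kept set is taken as everything within $\sigma$ of the densest embedding), with illustrative names.

```python
import numpy as np

def select_dense_cluster(E, n_neighbours=20):
    """Select the densest region of unit-norm embeddings E (N, D).

    sigma = median over vectors of the mean cosine similarity to the n
    nearest neighbours; the centre is the vector with the most neighbours
    above sigma, and everything within sigma of it is kept. Sketch only.
    """
    S = E @ E.T
    np.fill_diagonal(S, -np.inf)                  # ignore self-similarity
    top = np.sort(S, axis=1)[:, -n_neighbours:]   # n highest sims per row
    sigma = np.median(top.mean(axis=1))           # local scale value
    counts = np.sum(S > sigma, axis=1)
    centre = int(np.argmax(counts))               # densest embedding
    keep = np.flatnonzero(S[centre] > sigma)
    return np.append(keep, centre), centre
```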
After the cleaning phase the set of timestamp labels is generally free of false positives; if DeepLSAL is retrained with this new label set then it should effectively generalise to find unlabelled activations in both the current and future data (fig. 2b). It should be noted that finding unlabelled activations in the current dataset is in some ways a "harder" problem than trying to generalise to completely unseen data, as these activations are included in training, but mislabelled. However, these mislabelled activations are a very small proportion of the total dataset, meaning they were predicted not to impact convergence on a model with good generalisation ability, particularly a heavily regularised model such as that used for DeepLSAL. Once DeepLSAL was retrained, an average embedding vector of the current MUAP timestamps was found for each class, which was then cross-correlated with the entire embedded HD-sEMG signal to generate a predicted activity (fig. 2c). This activity was then timestamped by a linear threshold parameterised by a two-class k-means clustering algorithm. These labels were compared to the pre-corrupted data using the rate of agreement (RoA) metric, a percentage defined as the number of true positive matches divided by the total number of true positives, false positives and false negatives.
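The RoA metric can be sketched as a greedy matching over spike timestamps. The matching tolerance `tol` is an assumption for illustration; this excerpt does not state the tolerance used.

```python
import numpy as np

def rate_of_agreement(pred, truth, tol=1):
    """RoA between predicted and reference spike trains (sample indices):
    100 * TP / (TP + FP + FN), where a prediction within +/- tol samples of
    an unmatched true spike counts as a true positive. Illustrative sketch.
    """
    pred, truth = np.asarray(pred), np.asarray(truth)
    matched = np.zeros(truth.size, dtype=bool)
    tp = 0
    for p in pred:
        d = np.abs(truth - p)
        j = int(np.argmin(d))
        if d[j] <= tol and not matched[j]:   # each true spike matches once
            matched[j] = True
            tp += 1
    fp, fn = pred.size - tp, truth.size - tp
    return 100.0 * tp / (tp + fp + fn)
```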
IV Results
IV-A Feature-dependent Label Noise
In experiment 1, which tested the effect of feature-dependent label noise by simulating noisy separation vectors, DeepLSAL generated an embedding space with dense clusters for each class corresponding to the true labels. Surrounding each cluster was a large sparse periphery of false labels with no apparent structure. The efficacy of locality-sensitive sampling at preserving a rich class embedding is clear when compared to random sampling in a two-dimensional principal component space, as in figure 3. The simple density estimator operating on the cosine similarity between embeddings could then be used to select a subset of true labels by selecting the area with the highest density (fig. 4). This algorithm was weighted to favour specificity over sensitivity, meaning almost no false labels were included in the cleaned label set at the cost of losing a percentage of the true labels.
DeepLSAL generated an embedding space with utility for removing false labels even at the maximum tested value of 50% of total correct values (fig. 5a), with a median post-cleaning false label retention of 1.3% of the total correct labels in the class (IQR 0.5–1.9). The number of true labels lost during the cleaning process fell as the pre-cleaning percentage of false labels increased, but even at the highest false label percentage tested, a median of 74.1% (IQR 68.9–80.3) of the true values was still retained (table I).
TABLE I

False Labels Added as Proportion of Class (%) | Starting RoA (%) | Remaining True Labels After Cleaning (%) | RoA After Cleaning and Refitting (%)
10 | 91.5 (91.5–91.7) | 84.1 (79.6–87.1) | 94.5 (91.2–97.2)
20 | 83.9 (83.8–83.9) | 78.3 (74.4–83.9) | 94.5 (89.8–97.3)
30 | 77.4 (77.3–77.5) | 80.3 (75.3–85.1) | 93.5 (90.2–97.7)
40 | 71.8 (71.7–71.9) | 76.6 (71.7–81.8) | 93.8 (87.7–96.3)
50 | 67.0 (66.9–67.0) | 74.1 (68.9–80.3) | 92.0 (86.7–96.4)
IV-B Class-dependent Label Noise
In experiment 2, where labels were randomly flipped to the MU class with the closest average MUAP shape, DeepLSAL again generated embedding spaces with clear separation between true and false labels (fig. 6). However, unlike in the first experiment, the false labels formed a second distinct cluster within the embedding space, again clearly visualised in the first two principal component dimensions. As the true label cluster always had more values, it was still clearly identified by the density estimator.
As in experiment 1, the DeepLSAL cleaning phase was effective at removing almost all false labels (fig. 5b). Even at a 40% transfer, the median post-cleaning false label retention was 0.0% (IQR 0.0–0.6) of the total correct labels. As true labels were lost both to the initial transfer to other classes and to the cleaning phase, far fewer were retained in the post-cleaning dataset than in experiment 1 and would need to be recovered in the refitting stage (table II).
IV-C Rediscovering Unlabelled Activations
An important requirement, if the DeepLSAL-cleaned label set is to be useful, is that the cleaning process is not overly biased against the true labels lost at this stage, making them difficult to recover using the retained true labels. Lost labels tend to be more peripheral in the cluster, meaning their MUAP shapes are likely to be less similar to the MU class average, potentially due to superposition with a MUAP from a different class or due to a noise artefact. False negatives are also still used for training, but labelled inappropriately, with a possibly detrimental effect on the ability of the model to generalise. To demonstrate that neither of these potential problems actually impacted training, after the cleaning stage of both experiments 1 and 2 DeepLSAL was refitted with the cleaned label sets. For both experiments the predicted activity was generally both sparse and clean, with MUAPs easily identifiable (fig. 7). These labels were compared to the original data using the RoA metric.
The RoA of the predicted MUAP labels with the original data was generally good for both experiments at every level of difficulty (tables I and II). In experiment 1 there was little change in RoA as difficulty increased (fig. 5c), suggesting that DeepLSAL is able to generalise to unseen activations. This finding was replicated in experiment 2 (fig. 5d): even when the median post-cleaning training set was just over half of the total class activations, a median RoA of 94.0% (IQR 86.9–97.7) was achieved after refitting.
TABLE II

Proportion of Class Labels Transferred (%) | Starting RoA (%) | Remaining True Labels After Cleaning (%) | RoA After Cleaning and Refitting (%)
10 | 83.2 (82.1–83.9) | 84.1 (84.1–88.1) | 98.2 (95.4–98.8)
20 | 67.0 (66.1–69.0) | 75.4 (70.0–77.5) | 96.8 (93.3–99.0)
30 | 55.0 (53.1–56.0) | 60.9 (57.3–64.9) | 95.7 (88.9–98.8)
40 | 43.7 (42.1–44.9) | 53.1 (49.1–56.7) | 94.0 (86.9–97.7)
V Discussion
Source separation is often applied to decode the neural information embedded in neurophysiological data. This approach provides a window into the neural determinants of behaviour as well as a way to identify neural features for human-machine interfacing. However, conventional source separation approaches, as well as manual decompositions by expert operators, often provide noisy outputs due to decoding errors. In this study we demonstrated that a supervised DML paradigm can find self-correcting data embeddings that allow accurate cleaning of labels corrupted by common noise issues, whilst also predicting new labels not included within the training set. Importantly, the model was not biased against labels not selected during the cleaning phase when refitting, which is particularly important in neurophysiological signals, such as HD-sEMG, where the shapes of a subset of activations will be distorted by temporal superpositions. This ability of the model to operate on unpreprocessed signals and generalise to unseen activations is particularly relevant to the field of sEMG decomposition for prosthetic control, where a current focus is the use of neural networks to directly identify MUAPs in the raw signal and so avoid the latency involved with the current complex preprocessing pipelines that prevent online implementations[15][16].
Deep metric learning has a number of attractive properties over standard softmax-based classifiers for neurophysiological time series classification, generally needing fewer training examples and being able to adapt to new classes easily[45]. DML methods can adapt quickly to the changes in class activity commonly seen in neural systems over time, such as dropped units in intracortical recordings or MU recruitment and de-recruitment in sEMG and iEMG recordings[46][47]. However, the focus of most implementations of DML is a high degree of inter-class separation[48], rather than the more descriptive intra-class embedding needed if this space is to be used for the identification of false labels. An important result of this study was that a loss function designed to operate only on samples local to each other in the embedding space can give rise to a richer intra-class distribution, avoiding the collapse to a dense point commonly seen in a more traditional triplet learning paradigm. Furthermore, we demonstrated that the utility of the embedding space generated by DeepLSAL for label cleaning is unaffected by feature- or class-dependence in the label noise.
Whilst this study focused on sourceseparated HDsEMG signal it is important to emphasise the broader applicability of this approach to any imperfectlylabelled neurophysiological time series data in which the underlying sources are repeating events, i.e. to any neural recording. Whilst the study focused specifically on action potentials, the proposed methodology could also be used for pattern recognition in bulk neurophysiological signal, supplementing recentlyproposed systems of prosthetic control
[49]. Additionally, the labelling process need not be performed by a BSS algorithm. For example, DeepLSAL could be applied to a dataset in which only a small component of the data has been manually labelled by an expert operator, in order to accurately recover the rest of the labels. In this way, the proposed approach can be viewed as a minimally supervised method for decomposing neurophysiological time series into individual cell activities.

A potential limitation of the study is the method of converting the embedding layer to a cleaned label set. One strength of the DeepLSAL algorithm was that the main cluster of clean labels was usually quite simple to identify in the embedding space during the cleaning phase, meaning a simple density estimator was sufficient to set a decision threshold. However, this approach tended to cut out a larger proportion of true labels than was strictly necessary, even if the refitting stage was able to rediscover those lost labels. An improved sorting methodology is even more relevant to the specific case in which two classes were mixed in equal proportions, since there was then no way of identifying which of the label clusters was “correct”. An interesting future direction for a DML-based approach is therefore to investigate the use of adaptive k-means clustering to find distinct clusters within the error labels, and hence potentially reassign them to their origin classes by matching their averages. Whilst it was not a focus of this study, an additional interesting future direction might be found in improving the richness of non-activation embeddings, which may allow the identification of new unlabelled classes through clustering of events.
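The density-based cleaning step described above could be sketched as follows. This is a hypothetical stand-in, assuming a simple unnormalised Gaussian kernel density with a fixed bandwidth and a peak-relative threshold; the actual estimator and threshold rule used in the study may differ:

```python
import numpy as np

def density_clean(emb, bandwidth=1.0, keep_frac=0.1):
    """Flag labels as clean when their kernel density in the embedding space
    exceeds a fraction of the peak density (hypothetical simplification of
    the density-estimator thresholding step; parameters are assumptions).
    """
    # squared pairwise distances between all embedded samples
    d2 = np.sum((emb[:, None] - emb[None, :]) ** 2, axis=-1)
    # unnormalised Gaussian kernel density evaluated at each sample
    dens = np.exp(-d2 / (2.0 * bandwidth ** 2)).sum(axis=1)
    # keep samples lying in sufficiently dense regions of the embedding
    return dens >= keep_frac * dens.max()
```

Samples sitting in the dense core of a class survive the threshold, while isolated points in the embedding space are flagged as likely label errors; as noted above, a conservative threshold of this kind tends to discard some true labels, which the refitting stage can then recover.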
In summary, we have presented DeepLSAL, a deep metric learning pipeline for embedding source-separated multivariate neurophysiological time series into a dimensionally-reduced space suitable for both classification and label cleaning. Whilst the approach was demonstrated here on electromyographic signal decomposition, the method is broadly applicable to other neurophysiological recordings, such as intracortical or intraneural signals.
References
 [1] R. Merletti and D. Farina, Surface electromyography: physiology, engineering, and applications. John Wiley & Sons, 2016.
 [2] E. Stark and M. Abeles, “Predicting movement from multiunit activity,” Journal of Neuroscience, vol. 27, pp. 8387–8394, Aug. 2007.
 [3] E. Drebitz, B. Schledde, A. K. Kreiter, and D. Wegener, “Optimizing the yield of multiunit activity by including the entire spiking activity,” Frontiers in Neuroscience, vol. 13, Feb. 2019.
 [4] D. Farina and A. Holobar, “Characterization of human motor units from surface EMG decomposition,” Proceedings of the IEEE, vol. 104, pp. 353–373, Feb. 2016.
 [5] T. Kapelner, I. Vujaklija, N. Jiang, F. Negro, O. C. Aszmann, J. Principe, and D. Farina, “Predicting wrist kinematics from motor unit discharge timings for the control of active prostheses,” Journal of NeuroEngineering and Rehabilitation, vol. 16, Apr. 2019.
 [6] W. H. Calvin, “Some simple spike separation techniques for simultaneously recorded neurons,” Electroencephalography and Clinical Neurophysiology, vol. 34, pp. 94–96, Jan. 1973.
 [7] W. Simon, “The real-time sorting of neuroelectric action potentials in multiple unit studies,” Electroencephalography and Clinical Neurophysiology, vol. 18, pp. 192–195, Feb. 1965.
 [8] H. G. Rey, C. Pedreira, and R. Q. Quiroga, “Past, present and future of spike sorting techniques,” Brain Research Bulletin, vol. 119, pp. 106–117, Oct. 2015.
 [9] J. Kevric and A. Subasi, “Comparison of signal decomposition methods in classification of EEG signals for motor-imagery BCI system,” Biomedical Signal Processing and Control, vol. 31, pp. 398–406, Jan. 2017.
 [10] F. Negro, S. Muceli, A. M. Castronovo, A. Holobar, and D. Farina, “Multichannel intramuscular and surface EMG decomposition by convolutive blind source separation,” Journal of Neural Engineering, vol. 13, p. 026027, Feb. 2016.
 [11] M. Pachitariu, N. Steinmetz, S. Kadir, M. Carandini, and K. D. Harris, “Kilosort: real-time spike-sorting for extracellular electrophysiology with hundreds of channels,” bioRxiv preprint, June 2016.
 [12] J. J. Jun, N. A. Steinmetz, J. H. Siegle, D. J. Denman, M. Bauza, B. Barbarits, A. K. Lee, C. A. Anastassiou, A. Andrei, Ç. Aydın, M. Barbic, T. J. Blanche, V. Bonin, J. Couto, B. Dutta, S. L. Gratiy, D. A. Gutnisky, M. Häusser, B. Karsh, P. Ledochowitsch, C. M. Lopez, C. Mitelut, S. Musa, M. Okun, M. Pachitariu, J. Putzeys, P. D. Rich, C. Rossant, W.-L. Sun, K. Svoboda, M. Carandini, K. D. Harris, C. Koch, J. O’Keefe, and T. D. Harris, “Fully integrated silicon probes for high-density recording of neural activity,” Nature, vol. 551, pp. 232–236, Nov. 2017.
 [13] S. Muceli, W. Poppendieck, K.-P. Hoffmann, S. Dosen, J. Benito-León, F. O. Barroso, J. L. Pons, and D. Farina, “A thin-film multichannel electrode for muscle recording and stimulation in neuroprosthetics applications,” Journal of Neural Engineering, vol. 16, p. 026035, Feb. 2019.
 [14] A. D. Vecchio and D. Farina, “Interfacing the neural output of the spinal cord: robust and reliable longitudinal identification of motor neurons in humans,” Journal of Neural Engineering, Oct. 2019.
 [15] A. K. Clarke, S. F. Atashzar, A. D. Vecchio, D. Barsakcioglu, S. Muceli, P. Bentley, F. Urh, A. Holobar, and D. Farina, “Deep learning for robust decomposition of high-density surface EMG signals,” IEEE Transactions on Biomedical Engineering, vol. 68, pp. 526–534, Feb. 2021.
 [16] Y. Wen, S. Avrillon, J. C. Hernandez-Pavon, S. J. Kim, F. Hug, and J. L. Pons, “A convolutional neural network to identify motor units from high-density surface electromyography signals in real time,” Journal of Neural Engineering, Mar. 2021.
 [17] G. Algan and I. Ulusoy, “Label noise types and their effects on deep learning,” arXiv preprint arXiv:2003.10471, 2020.
 [18] R. I. Kumar, M. M. Mallette, S. S. Cheung, D. W. Stashuk, and D. A. Gabriel, “A method for editing motor unit potential trains obtained by decomposition of surface electromyographic signals,” Journal of Electromyography and Kinesiology, vol. 50, p. 102383, Feb. 2020.
 [19] K. C. McGill, Z. C. Lateva, and H. R. Marateb, “EMGLAB: An interactive EMG decomposition program,” Journal of Neuroscience Methods, vol. 149, pp. 121–133, Dec. 2005.
 [20] F. Hug, S. Avrillon, A. D. Vecchio, A. Casolo, J. Ibanez, S. Nuccio, J. Rossato, A. Holobar, and D. Farina, “Analysis of motor unit spike trains estimated from high-density surface electromyography is highly reliable across operators,” bioRxiv preprint, Feb. 2021.
 [21] D. Carlson and L. Carin, “Continuing progress of spike sorting in the era of big data,” Current Opinion in Neurobiology, vol. 55, pp. 90–96, Apr. 2019.
 [22] P. Yger, G. L. Spampinato, E. Esposito, B. Lefebvre, S. Deny, C. Gardella, M. Stimberg, F. Jetter, G. Zeck, S. Picaud, J. Duebel, and O. Marre, “A spike sorting toolbox for up to thousands of electrodes validated with ground truth recordings in vitro and in vivo,” eLife, vol. 7, Mar. 2018.
 [23] H. Song, M. Kim, D. Park, and J.-G. Lee, “Learning from noisy labels with deep neural networks: A survey,” arXiv preprint, 2020.
 [24] D. Karimi, H. Dou, S. K. Warfield, and A. Gholipour, “Deep learning with noisy labels: exploring techniques and remedies in medical image analysis,” CoRR, vol. abs/1912.02911, 2019.
 [25] B. Frenay and M. Verleysen, “Classification in the presence of label noise: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, pp. 845–869, May 2014.
 [26] A. Veit, N. Alldrin, G. Chechik, I. Krasin, A. Gupta, and S. J. Belongie, “Learning from noisy largescale datasets with minimal supervision,” CoRR, vol. abs/1701.01619, 2017.
 [27] K. Lee, X. He, L. Zhang, and L. Yang, “CleanNet: Transfer learning for scalable image classifier training with label noise,” CoRR, vol. abs/1711.07131, 2017.
 [28] J. Han, P. Luo, and X. Wang, “Deep selflearning from noisy labels,” CoRR, vol. abs/1908.02160, 2019.

 [29] S. Chopra, R. Hadsell, and Y. LeCun, “Learning a similarity metric discriminatively, with application to face verification,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), IEEE, 2005.
 [30] J. Wang, F. Zhou, S. Wen, X. Liu, and Y. Lin, “Deep metric learning with angular loss,” 2017.
 [31] F. Schroff, D. Kalenichenko, and J. Philbin, “FaceNet: A unified embedding for face recognition and clustering,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2015.
 [32] A. Hermans, L. Beyer, and B. Leibe, “In defense of the triplet loss for person re-identification,” CoRR, vol. abs/1703.07737, 2017.
 [33] K. Sohn, “Improved deep metric learning with multi-class N-pair loss objective,” in Advances in Neural Information Processing Systems (D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, eds.), vol. 29, Curran Associates, Inc., 2016.
 [34] C.-Y. Wu, R. Manmatha, A. J. Smola, and P. Krahenbuhl, “Sampling matters in deep embedding learning,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), Oct. 2017.
 [35] O. Rippel, M. Paluri, P. Dollar, and L. Bourdev, “Metric learning with adaptive density discrimination,” 2016.
 [36] X. Wang, Y. Hua, E. Kodirov, G. Hu, R. Garnier, and N. M. Robertson, “Ranked list loss for deep metric learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.
 [37] F. Xu, B. Ma, H. Chang, and S. Shan, “PRDP: Person re-identification with dirty and poor data,” IEEE Transactions on Cybernetics, pp. 1–13, 2021.
 [38] Y. Fan, S. Lyu, Y. Ying, and B.-G. Hu, “Learning with average top-k loss,” arXiv preprint arXiv:1705.08826, 2017.
 [39] A. Holobar and D. Zazula, “Multichannel blind source separation using convolution kernel compensation,” IEEE Transactions on Signal Processing, vol. 55, no. 9, pp. 4487–4496, 2007.
 [40] D. Farina, F. Negro, S. Muceli, and R. M. Enoka, “Principles of motor unit physiology evolve with advances in technology,” Physiology, vol. 31, pp. 83–94, Mar. 2016.
 [41] S. Ji, Z. Zhang, S. Ying, L. Wang, X. Zhao, and Y. Gao, “Kullback-Leibler divergence metric learning,” IEEE Transactions on Cybernetics, pp. 1–12, 2020.
 [42] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. V. Le, G. E. Hinton, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” CoRR, vol. abs/1701.06538, 2017.
 [43] A. Holobar and D. Zazula, “Gradient convolution kernel compensation applied to surface electromyograms,” in Independent Component Analysis and Signal Separation, pp. 617–624, Springer Berlin Heidelberg, 2007.
 [44] A. Holobar, M. A. Minetto, A. Botter, F. Negro, and D. Farina, “Experimental analysis of accuracy in the identification of motor unit spike trains from high-density surface EMG,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 18, no. 3, pp. 221–229, 2010.
 [45] H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese, “Deep metric learning via lifted structured feature embedding,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4004–4012, 2016.
 [46] C. F. Dunlap, S. C. Colachis, E. C. Meyers, M. A. Bockbrader, and D. A. Friedenberg, “Classifying intracortical brain-machine interface signal disruptions based on system performance and applicable compensatory strategies: A review,” Frontiers in Neurorobotics, vol. 14, Oct. 2020.
 [47] D. Farina, M. Fosci, and R. Merletti, “Motor unit recruitment strategies investigated by surface EMG variables,” Journal of Applied Physiology, vol. 92, pp. 235–247, Jan. 2002.
 [48] Q. Qian, L. Shang, B. Sun, J. Hu, H. Li, and R. Jin, “SoftTriple loss: Deep metric learning without triplet sampling,” 2020.

 [49] S. Pancholi and A. M. Joshi, “Advanced energy kernel-based feature extraction scheme for improved EMG-PR-based prosthesis control against force variation,” IEEE Transactions on Cybernetics, pp. 1–10, 2020.