I Introduction
The data collected by a robot are often naturally represented as matrices or tensors, i.e., generalization of matrices to arrays of higher dimensions [1]. Examples include video streams [2], movements in multiple coordinate systems [3], electroencephalography (EEG) [4, 5] or tactile myography (TMG) data [6]
. Many approaches described in the literature consist of reorganizing the elements of these tensors into vectors before applying learning algorithms based on linear algebra, operating on vector spaces. This flattening operation ignores the underlying structure of the original data. Moreover, the dimensionality of the resulting problem dramatically increases, creating high computational and memory requirements. Finally, the number of model parameters to estimate in the learning method may become high, which constitutes an important issue in the cases where only few training data are available.
With the burst of multidimensional data available in various fields of research, important efforts were turned toward extending standard dimensionality reduction and learning techniques to tensor data. In this context, Ozdemir et al. [7]
proposed a review of tensor decomposition methods by dividing them into three categories of problems usually targeted by principal component analysis (PCA), namely lowrank tensor approximation, lowrank tensor recovery, and feature extraction. In particular, multilinear PCA (MPCA)
[8] and weighted MPCA [5] were proposed to extract features from tensor objects as a preprocessing step for classification. Similarly, linear discriminant analysis was extended to multilinear discriminant analysis in [9] and factor analysis was adapted to tensor data in [10]. In the context of regression, Guo et al. [11]proposed generalizations of ridge regression (RR) and supportvector regression (SVR) methods to tensor data, where they showed the superiority of tensorbased algorithms over the vectorbased algorithms in various applications. A similar extension of RR to tensor data was proposed in
[12]with an application in magnetic resonance imaging (MRI). Following a similar process, tensorvariate logistic regression (LR) was proposed in
[13, 14] for the classification of multidimensional data. Moreover, kernelbased frameworks such as Gaussian processes (GPs) [2] were also adapted to tensors [15].In this paper, we introduce a tensorvariate mixture of experts (TME) model for regression. Mixture of experts (ME) models, first introduced by Jacobs et al. [16]
, combine the predictions of several experts based on their probability of being active in a given region of the input space. Each expert is a regression function and a gate determines the regions of the input space where each expert is trustworthy. The output of the model is a weighted sum of the experts predictions. Over the years, ME models were widely improved with different gates, regression and classification models for the experts, and used in many applications (see
[17] for a review).In order to handle tensor data in a ME model, we propose to use tensorvariate models for the experts and for the gate. As explained in Section III, the experts are defined using tensor linear models and the gate is set as a tensorvariate softmax function. Both elements are based on the inner product between the input tensor data and model parameters (see Section II). The resulting TME model is trained with an EM algorithm using the CANDECOMP/PARAFAC (CP) decomposition [18, 19].
The functionality of the proposed approach is first evaluated and compared to the corresponding vectorbased approach using artificially generated data (Section IVA). The TME model is then exploited in the context of prosthetic hands to recognize hand movements from tactile myography (TMG).
Our TMG sensor is made of 320 cells organized in a array forming a bracelet [6], therefore providing intrinsically matrixvalued data. We show the effectiveness of our approach in an offline experiment with the aim of detecting finger and wrist movements from TMG data (Section IVB). The TME model outperforms the standard ME model and achieves similar performances as a nonlinear model, namely a GP, at a lower computational cost. Note that a ME model with linear experts was shown to achieve similar performance as more complex nonlinear methods at a lower computational cost in wrist movements recognition based on electromyographic (EMG) signals [20]. We finally validate the use of the proposed approach in a realtime teleoperation experiment, where participants controlled a robotic arm and hand by moving their wrist and closing/opening their hand (Section IVC).
The contributions of this paper are twofold: (i) we propose a tensorvariate mixture of experts model that exploits tensorial representations to take into account the structure of the data in the regression process; and (ii) we demonstrate the efficiency of tensorbased approaches in the context of prosthetic hands to recognize hand movements from tactile myography. In particular, we present a teleoperation experiment based on tactile myography to control a robotic arm and hand.
Ii Background
Tensors are generalization of matrices to arrays of higher dimensions [1], where vectors and matrices correspond to 1st and 2ndorder tensors. Tensor representation permits to represent and exploit the intrinsic structure of multidimensional arrays. We introduce here the tensor operations necessary for the proposed TME, as well as the extensions of two forms of the generalized linear model, namely ridge regression and logistic regression, to tensorvariate data.
Iia Tensor operations
The inner product of two tensors , is defined as the sum of the products of their entries, so that
(1) 
Note that the inner product of two tensors is equivalent to the Frobenius inner product of their mode matricization or unfolding and to the inner product of their vectorization
(2) 
A rankone tensor of order is a tensor that can be written as the outer product of vectors, i.e.,
(3) 
with the outer product between vectors, so that each element of is equal to and the number of dimensions or modes of the tensor.
The CANDECOM/PARAFAC (CP) decomposition factorizes a tensor as a sum of rankone tensors, i.e.,
(4) 
The smallest number of rankone tensors that generate as their sum is defined as the rank of the tensor . It corresponds to the smallest number of components in the CP decomposition. The CP decomposition can also be expressed in terms of the mode matricization and vectorization of the tensor and as
(5)  
(6) 
where denotes the KhatriRao product, are factor matrices defined as , , is a vector of ones. The KhatriRao product of two matrices and results in a matrix whose columns are equal to the Kronecker product of the columns of and , i.e., .
IiB Generalized linear model for tensors
Given a vectorvalued input data , the generalized linear model (GLM) is given by
(8) 
where is the predicted output, is a vector of weights, is the bias and is a function, see Figure 1top. This model can be naturally extended to matrixvalued data as
(9) 
where and are vectors of weights. Following a similar procedure, the model is generalized to dimensional tensorvalued data as
(10) 
as shown in Figure 1middle. The key advantages of this representation compared to vectorvalued representation are that the underlying structure of the tensorvalued data is taken into account in the model and that the number of parameters is reduced from to . Moreover, more complex features can be represented by encoding the weight tensor as a sum of rankone tensors with
(11) 
This model is represented in Figure 1bottom.
Similarly to the vector case, we obtain the tensorvalued linear and logistic regression models by defining the function as identity and as the softmax function, respectively.
IiC Tensor Ridge regression
Given a vectorvalued input data , the classical regression model is of the form
(12) 
where is the predicted output, is a vector of weights, is the bias and is a zeromean Gaussian noise variable. Following (11), as shown in [11, 12], the model can be generalized to dimensional tensorvalued data
(13) 
therefore taking the underlying structure of the data into account.
Given a dataset of dimensional tensor inputs and corresponding outputs , the parameters of the tensor ridge regression (TRR) model (13) are learned by maximizing its likelihood or equivalently its corresponding loglikelihood
(14) 
where
is the variance of the zeromean Gaussian random variable
. By using the inner product equality (7), one can observe that the model (13) is linear in individually, so that the parameters can be learned by optimizing a sequence of generalized linear models (see [11, 12] for details). Therefore, by adding a zeromean Gaussian prior on the weight tensor, equivalent to the regularization term , to the loglikelihood function, the bias and factor matrices are updated at each iteration until convergence with(15)  
(16) 
where the th row of the matrix is equal to , the th element of the vector is and is a vector of ones. Note that other types of regularization are also proposed in [11].
IiD Tensor logistic regression
In the classical multiclass logistic regression model, the posterior probability of the class
is given by the softmax function(17) 
where denotes the parameters of the model and the number of classes. Similarly as ridge regression, the logistic regression model can be extended to tensorvalued data by encoding the tensor of weights as a sum of rankone tensors, thus leading to the following tensorvalued softmax function [13, 14]
(18) 
where . Similarly to TRR, the tensor logistic regression (TLR) model takes into account the underlying structure of the data and reduces the number of parameters in the model compared to a vectorbased representation of the tensorvalued data.
Given a dataset of inputs and corresponding unit vector outputs where indicates that the th data belong to the th class, the loglikelihood of the multivariate tensor logistic regression model is
(19) 
Note that a regularization term can be added to the loglikelihood function to avoid the problem of overfitting. The parameters can be learned by minimizing the negative loglikelihood of the model via any gradientbased optimizer, e.g., Newton’s method or limited memory BFGS. By exploiting (7), the gradients of the regularized negative loglikelihood used in the optimization process can be computed as
(20)  
(21) 
where .
Iii TensorVariate Mixture of Experts
A mixture of experts (ME) regression model [16]
aims at solving a nonlinear supervised learning problem by combining the predictions of a set of experts. The model is composed of a gate determining a soft division of the input space and several experts making predictions in the different regions of the input space. In this section, we propose to generalize the ME regression model to tensorvariate data by using tensorvariate models for the experts and for the gate.
Iiia TME model
Given a tensorvariate input and an output , the tensorvariate mixture distribution is
(22) 
where is the number of experts, is the probability of the th expert to be activated (gate’s rating) and is the model of the th expert. , where and denote the parameters of the gate and the set of experts, respectively.
Similarly to the original ME model, we define the gate of the TME model by the tensorvariate softmax function, so that
(23) 
with defined by (18), and the rank of the weight tensors . The experts follow the Gaussian model
(24) 
where , and is the rank of the weight tensors . Note that one weight tensor is defined for each element of . This is similar to the vector case, where different vectors weight the input for each element of the output, so that .
IiiB Training of TME
Similarly to ME, the TME model can be trained using the expectationmaximization (EM) algorithm. By introducing a set of binary latent variables
where indicate that the data was generated by the th mixture component, the expected complete data loglikelihood is given by(26) 
where denotes the responsibility of the th component for the th data point so that . In the Estep, the responsibilities are computed using
(27) 
In the Mstep, the parameters are updated to maximize the expected complete data loglikelihood (26). First, the parameters of the experts are updated iteratively until convergence, based on (15) and (16), with
(28)  
(29) 
where the th row of the matrix is equal to , the th element of the vector is and , are the scaled input tensors and output vectors, respectively. The covariance of the experts Gaussian model is then updated as
(30) 
Finally, the gate parameters are updated by maximizing the loglikelihood of the multivariate tensor logistic regression model
(31) 
based on (19). Similarly to tensor logistic regression, a gradientbased optimizer is used to minimize the negative loglikelihood with gradients given by (IID) and (21), where is replaced by .
Note that regularization terms have been added in the Mstep to avoid the overfitting problem. The Estep and Mstep are iterated until convergence of the algorithm.
IiiC Model selection and initialization
Selecting the number of experts of ME is known as a difficult problem [17]. In the case where the structure of the application allows it, as in the experiments of Section IV, the number of experts can be determined by the experimenter. Otherwise, strategies used for ME, such as exhaustive search, growing or pruning models can be adapted for TME.
The TME model assumes fixed ranks and for the gate and experts weight tensors, respectively. The appropriate rank can be estimated using crossvalidation or through usual model selection criterion, e.g., the Bayesian information criterion (BIC).
Previous works on TRR [11, 12] and TLR [14] showed that both TRR and TLR models converge to a similar solution independently of the initial weight values. Therefore, the weight tensors of TME can be initialized with random values. In order to facilitate the convergence, we initialized the weights of the expert model and of the gate as equal to the weights obtained from TRR.
Iv Experiments
In this section, we first evaluate the functionality and the performance of the proposed TME on artificially generated data. The approach is then applied to the detection and recognition of hand movements from tactile myography (TMG) data. An offline experiment and a realtime teleoperation experiment, where participants controlled a robotic arm and hand based on their hand movements, illustrate the effectiveness of the proposed TME model. A video of the teleoperation experiment accompanies the paper. Source codes related to the experiments will be available after publication.
Iva 2D shape example
In this illustrative example, we propose to evaluate the performance of the proposed TME model for different ranks under various sample sizes and signal strengths. To do so, we generate artificial data following the model (22) from known parameters and we evaluate the recovery of those parameters by the model. In this illustrative example, we consider matrixvariate inputs
whose elements are independent and normally distributed. The output
is normally distributed with a mean given by a 2classes TME model with zero biases(32) 
and a standard deviation
. The weight matrices are equal to the binary 2D shapes represented in Figure 2(a), where the black and white regions correspond to and , respectively. The use of these 2D shapes was inspired by the illustrative example presented by [12].We first examine the performances of the proposed TME model for ranks and varying from to with a sample size equally divided between the two classes and a noise level equal to of the standard deviation of the mean . The regularization terms and were fixed as equal to . Moreover, we compare the TME model with the standard ME regression model whose gate is defined by the softmax function (17) and experts follow a Gaussian model with a mean given by (12).
Figure 3 shows the original and recovered weight matrices by the ME and TME models along with the rootmeansquare error (RMSE) for the estimation of the weight matrices and the BIC value for TME. We observe that TME outperforms ME for all the tested rank values as the maximum RMSE value achieved by TME is (, ) versus for the ME model. Moreover, we observe that the weight matrices retrieved by ME are noisier and the shapes of the experts weights are not clearly delimited and tend to be fused together compared to those retrieved by TME. Similar results were obtained for different sample sizes and noise levels.






Due to their structure, a rank2 setting is sufficient to capture a cross or a tshape, while a low rank setting does not allow to exactly represent a disk shape. As expected, the cross and tshape are fully recovered by TME in the cases where and , respectively, while approximations of the shapes are obtained for lower ranks (see Figures 2(c)–2(f)). Moreover, while the disk shape is approximated by a square in a rank1 setting (Figure 2(c)), it is already fairly recovered by a rank2 or rank3 setting (see Figures 2(d)–2(f)).
Consistently with the aforementioned observations, the minimum RMSE value is obtained by TME with a rank(2, 2) setting. Moreover, TME with ranks obtains a slightly higher RMSE than with ranks . This can be explained by the fact that approximating the cross and tshape with a rank3 setting, while a rank2 setting is sufficient, leads to an overfitted estimation with a higher influence of the noise contained in the training data. According to the BIC values reported for the tested TME models, the model with and should be selected (lowest BIC cost). However, in practice, one may prefer the rank(2,2) setting in this case, suggesting that other rank selection methods, such as crossvalidation, may be used in function of the application. Note that similar observations were made for different sample sizes and noise levels.
As shown in Figure 4, the estimation accuracy increases with the sample size and decreases with the noise level , validating the consistency of the proposed method.

IvB Detection of hand movements from tactile myography
In the context of prosthetic hands, tactile myography (TMG) has recently been proposed as a complementary or alternative approach to the traditional surface electromyography (sEMG) to achieve simultaneous and proportional control of multiples degrees of freedom (DOFs) of a hand prosthesis (see, e.g.,
[21, 6]). In this context, the aim of TMG is to measure the pressure related to the deformation induced by the muscles activity of the forearm. This signal is then used to determine the corresponding hand and wrist movements. Our TMG sensor, developed in [6] and showed in Figure 4(a), is composed of 320 resistive taxels organized in a array forming a bracelet. Therefore, the data provided by the sensor are intrinsically matrixvalued.Previous studies showed that Ridge regression (RR) directly applied to the data of the bracelet allows the prediction of different finger and wrist movements [6], which could outperform detection using sEMG [22]. However, RR does not take into account the matrix structure of the TMG data as they are vectorized before the application of the regression method. Moreover, the same weight vectors are used independently of the activated movements, which may result in false positive detection of activations. Therefore, the motivations to use TME for this application are the following: (1) the structure of the data is taken into account in the regression process; (2) the problem is decomposed in two subparts, namely detecting which movements are activated and determining their individual level of activation; and (3) the low computational complexity to evaluate one test sample allows TME to be used for realtime detection of hand and wrist movements from TMG data.
In this experiment, we investigate the performance of TME on the dataset^{1}^{1}1The dataset is available online at http://www.idiap.ch/paper/mdpi/data/exp2/. presented in the second experiment of [23]. The dataset was gathered from 9 healthy participants requested to replicate the movements of a stimulus in the form of a 9DOF hand model while wearing the tactile bracelet. Ground truth was obtained from the values of the animated hand model displayed on a monitor. This method has the drawback of possibly reducing the precision of the prediction of the intended activations due to the delay required by the participant to replicate the displayed movement. However, this approach allows the association of intended activations with input signal patterns in the case of amputees (since ground truth data can obviously not be collected by other means in this case). Each participant executed three times a sequence of six movements, namely wrist flexion, wrist extension, wrist supination, thumb flexion, index flexion and littlefinger flexion. The data were recorded during the whole cycle of the stimulus, namely transition, activation, transition and relaxation phases, in order to obtain the whole range of activation from rest to complete finger and wrist movements (see [23] for more details).
The training dataset is composed of data recorded at zero and full activation. The testing dataset is composed of data recorded during the transition parts, containing the whole range of intermediate activation levels. Therefore, the evaluation of the performance of the model is compatible with the evaluation in forecasted studies with amputees, as they cannot provide accurate intermediate training data.
We compared the performance of vectorbased and tensorbased algorithms, namely RR, ME and Gaussian process regression (GPR), as well as TRR and TME, on this dataset. The TMG data were centered for all the methods. The regularization parameter of RR and ME were fixed to
. GPR was used with a radial basis function (RBF) kernel whose parameters were optimized using GPy
[24]. Note that GPR with RBF kernel with an Euclidean distance measure was proved to reach good performances on this dataset in [23]. The rank of TRR was determined using fold crossvalidation and the regularization parameter was fixed to . For TME, as for ME, one expert was considered for each of the six finger and wrist movements. In order to facilitate the training process, we considered a common value for the ranks . The ranks and were determined using fold crossvalidation for . The regularization parameters of TME, and , were fixed by the experimenter as equal to . Note that small variations of the regularization parameters did not change significantly the results for the different regression methods.Table I shows the mean and standard deviation over the 9 participants of the RMSE between the ground truth and the prediction for the aforementioned methods. We observe that taking into account the structure of the data in the regression process improves the quality of predictions as both TRR and TME outperform their vector counterparts. Moreover, taking into account the structure of the data allows a linear method (TRR) to achieve performances comparable to those obtained by nonlinear methods. GP and TME achieve the best performance compared to the other methods, with TME obtaining a slightly lower RMSE () than GP (). Moreover, TME obtained the minimum RMSE for 5 participants out of 9. Note that GP with RBF kernel slightly outperformed GP with linear (), Matérn 32 () and Matérn 52 () kernels, therefore all the results are presented with RBF kernel.
Figure 6 shows an example of original and recovered activations for all the movements over the time for GP and TME. We observe that TME is generally recovering a more stable signal than GP when one movement is not activated. In the zones of zero activations, the signal recovered by GP tends to oscillate around zero. However, the signal recovered by TME can have a bigger delay than GP to detect an activation different than zero (see, e.g., Fig. 6, wrist extension).
RR  ME  GP  TRR  TME 

Table II
shows the average training and testing computation time for the tested regression methods. The computation times were measured using a nonoptimized Python code on a laptop with 2.7GHz CPU and 32 GB of RAM. Despite the fact that TME converged with less than 10 iterations of the EM algorithm for all the participants, we observe that this method has the highest computation time for training. This is mainly due to the fact that a TLR model is optimized at each step of the EM algorithm. Therefore, the training time of TME could be improved by optimizing the training time of TLR in the same way as machine learning libraries do for standard logistic regression. However, this training time remains reasonable for the method to be applied in realtime with amputees. Moreover, a testing time of
ms is also reasonable for realtime applications with TME allowing predictions at a frequency Hz, as usually targeted by realtime detection of hand movements. As opposed to GP, the computation testing time of TME is independent of the number of training data and depends only on the number of experts. Therefore, TME would be adapted to realtime predictions independently on the number of provided training data.RR  ME  GP  TRR  TME  

Training [s]  
Testing [ms] 
IvC Realtime teleoperation with tactile myography
To evaluate our method in a scenario closer to the enduser case, we conducted a realtime teleoperation experiment where 11 nonamputated participants (one female and ten males) controlled a robotic hand and arm based on the activation of the muscles on their forearm.
In the first part of the experiment, a protocol similar to the data collection of the experiment of Section IVB was applied to collect TMG data associated with the hand postures of the participants. The tactile bracelet was placed on the forearm of the participant with the closing gap on the ulna bone. The participant, wearing the tactile bracelet and sitting in front of a monitor, was asked to replicate the movements of a model of the 24DOFs dexterous motor hand of the Shadow Robot Company [25]. Similarly to the previous experiment, ground truth was obtained from the values of the animated hand model. Each participant executed four times the sequence of four movements, namely wrist flexion, wrist extension, power grasp and fingers extension. The participants were asked to perform the different movements in a relaxed way, particularly the fingers were relaxed during wrist movements. Each stimulus follows a cycle of 14 s composed of a transition phase (2 s), an activation phase (6 s), a transition phase (2 s) and a relaxing phase (4 s). The data collected during the activation and relaxing phases, i.e., at zero and full activations, were used to train the regression models.
During the second part of the experiment, the participant teleoperated a Shadow robot hand mounted on a 7DOFs Mitsubishi PA10 robot arm. S/he was sitting in front of the robotic system with the palm of the Shadow robot hand facing right, as showed in Figure 6(a). The different movements taught to the model in the first part of the experiments were mapped to the robotic system as follows: wrist flexion and extension were used to move the arm forward (in the direction of the palm) and backward (in the direction of the back of the hand), respectively. Power grasp and fingers extension were used to close and open the Shadow robot hand. When wrist flexion or extension was detected above a certain activation threshold, the velocity of the robot arm was incremented in the corresponding direction proportionally to the detected activation. Similarly, the posture of the robotic hand was incremented proportionally to the activation of power grasp or fingers extension if they were detected above a predefined threshold. The detected activations were also displayed on the Shadow robot hand model as in the first part of the experiment.
At the beginning of the second part of the experiment, the participant could get used to the learned mapping by controlling the simulated Shadow robot hand for a few minutes. Then, while teleoperating the real robotic system, the participant was asked to control the arm in order to approach it close to an object placed on a cube, to grasp this object and to bring it to a specific location on the left (A) or on the right (B) side and to release it. The complete setup is showed in Figure 6(a). Three objects with different diameters were considered, namely a chips cylinder ( mm), a thin woodstick ( mm) and a PET bottle ( mm), as shown in Figure 6(b). A total of 8 tasks were executed by each participant. The first six tasks consisted of bringing each object to A and then to B. Once a contact with the object was detected by the tactile fingertip sensors of the Shadow robot hand, the grasp pose was automatically maintained by the hand so that the subject could relax his/her fingers and focus on the wrist motion to steer the arm. The grasp pose was released as soon as a fingers extension command was detected. The maintenance of the grasp and the release were announced verbally by the system. The two last tasks consisted of bringing the PET bottle to A and to B without any holding assistance by the robot system. The time to complete each task was limited to two minutes. In case the object felt from the cube or was released out of the desired area, the experimenter replaced it at the initial position and the participant continued to execute the task in the remaining time. In case the participant could not control both the arm and the hand, e.g., if the arm was drifting continuously in one direction, the arm commands were disabled and the participant was requested to maintain a grasp on the object for 10 s before releasing it. Each participant tried to complete the 8 tasks with two different regression methods, namely TME and RR, trained on the data collected in the first part of the experiment. RR was chosen for comparison since it is considered as the baseline method for regression with TMG data. The order in which the two methods were tested was alternated between the participants.
Figure 8 shows snapshots of a participant executing different tasks. 6, respectively 7, out of the 11 participants were able to control both the arm and the hand during the whole experiment by using TME and RR, respectively. Note that the cases where the arm commands had to be disabled occurred mainly for the second tested method (4 participants out of 5 testing TME as second method and 3 out of 4 participants testing RR as second method), suggesting a decreasing of performance over time. One participant was not able to control both arm and hands for both methods and an other participant was able to control them for the 4 first tasks of the first tested method only.
The success rates, or ratios of successful tasks, for the case in which the participants controlled both robot arm and hand are presented in Table III. We observe that TME outperformed the performance RR by when all the objects and both locations A and B are considered. The time needed to accomplish successful tasks were s and s for TME and RR, respectively, showing almost no difference between the two methods.
For both methods, the tasks involving the woodstick result in the lowest success rate. Due to the small diameter of this object, the arm had to be positioned very precisely and a complete grasp activation had to be detected in order to perform a successful grasp. Therefore, the tasks involving the woodstick were the hardest to complete for the participants. In the case of TME, the success rate for the chips cylinder is lower than for the PET bottle. This can be explained by the fact that a small activation of the grasp movement was sometimes detected when the participants were flexing their wrist to make the arm move in the direction of the object. However, the robotic hand had to be completely opened to be able to be placed around the chips cylinder before grasping, while it could still be placed around the bottle if a small grasping activation was detected. In the case of RR, the success rate diminishes for the bottle compared to the chips cylinder. This may suggest a stronger decreasing of performance over time with this method.
We observe that the success rates for the bottle are similar with and without the holding assistance activated for both methods. This result is particularly interesting as it shows that combinations of hand and wrist movements, namely grasping with wrist flexion and grasping with wrist extension in this experiment, can be detected while training only on individual movements. Moreover, both aforementioned combinations were equally detected as the number of completed tasks for each location A or B was similar, i.e., 4 and 3 successful tasks for A and B with TME and 2 for each location with RR. Moreover, some of the participant did not wait that the contact with the object was detected before bringing it to its final location. Therefore, they managed to complete other tasks without using the holding assistance.
Total  Chips cylinder  Woodstick  Bottle  Bottle (no HA)  

TME  
RR 
Table IV shows the proportion of the failed tasks for which the time ran out during each of the task steps, namely grasping, moving and releasing the object, for TME and RR. We observe that the proportions are similar for both methods with the grasping step being the main cause of failure, followed by the moving step. Failures during grasping occurred mainly because the detected grasp activation was not sufficient to grasp the object or when it was activated too soon, therefore resulting in the object being pushed out of the support box. Failures while moving were due to difficulties to detect wrist flexion and extension or to the object falling down while the participant was trying to reach A or B. Finally, if the fingers extension movement was not detected properly while the holding assistance was activated, the opening of the hand was not triggered, resulting in failure to release the object.
Grasping  Moving  Releasing  

TME  
RR 
The success rates for the case in which the arm commands where disabled and the participant was only requested to grasp the object are presented in Table V. In this case, RR outperforms TME, especially for the tasks involving the woodstick. This can be explained by the fact that, in most of the cases, the arm commands had to be disabled for RR because the arm was drifting on the left or on the right, while it had to be disabled for TME because no activation was detected to move the arm, so that the participant could not position it to grasp the object. Generally, the detected activation of the grasp movement was also limited, therefore some tasks were difficult to achieve, especially those involving the woodstick.
Total  Chips cylinder  Woodstick  Bottle  Bottle (no HA)  

TME  
RR 
V Discussion
The proposed TME model allows the structure of tensorvalued data to be taken into account in the regression problem. Overfitting can then be reduced, which is particularly important when only few tensorvalued training data are available. We showed the effectiveness of the approach to detect hand movements from TMG data, outperforming the other tested methods in an offline experiment and allowing participants to teleoperate a robotic arm and hand in realtime. Particularly, the method was able to successfully detect intermediate and combination of activation, while trained only with zero and complete individual movements. Notably, participants managed to activate wrist flexion or extension along with power grasp. This indicates that a holding assistance may not be required. However, some participants reported that the holding assistance was helpful as a feedback indicating that the grasp was effective or to make them feel more comfortable while teleoperating the arm as they could focus on one movement only.
A decrease of performance over time, indicated by the necessity of deactivating the arm commands while testing the second method, seemed to have occurred during the realtime experiment. Moreover, some participants reported that they felt that the control of the robotic arm and hand was harder to perform over time. Moreover, we qualitatively observed that this problem seemed to occur particularly for participants who were trying to apply high forces to execute the different movements. We hypothesize that this is due to small displacements of the TMG bracelet over time, inducing a shift of the testing data compared to the training data. This problem could be overcome by improving the placement of the bracelet and by adapting the model over time. Techniques such as covariate shift adaptation in the case of sEMG [26] could be investigated.
It is important to emphasize the fact that the participants were able to adapt in some extent to the predictions of the method. They slightly modified their hand movements in order to obtain the desired action of the robotic arm and hand. Therefore, we observe a form of active learning, where the method learned from the training data, while the participants learned from the method in order to achieve the desired performance.
In both experiments, the ranks of the experts were given by a common value. The performance of TME may be further improved by selecting a specific rank for each expert. However, to avoid increasing computation time, automatic rank selection procedures have to be investigated. In particular, the automatic rank selection presented in [11] could be exploited to determine the rank of the expert models. The suggested method uses a norm regularization and optimizes the model with iteratively reweighted least squares (IRLS) algorithm. This approach seems promising as the authors reported in their experiments that the automatic procedure provided the same rank as the one selected by crossvalidation.
Vi Conclusion
This paper presented an extension of mixture of experts to tensorvalued data. Our method brings together the advantages and robustness of mixture models and tensor methods. Therefore, it allows an efficiently combination of predictions from experts specialized in different regions of the input space, while taking into account the structure of tensorvalued data in the soft space division and in the predictions of the experts. The data are efficiently exploited, so that a model trained with a small amount of training data is able to achieve good performances, while overcoming the overfitting problem. This is particularly important in robotics as the amount of training data is often small compared to the dimensionality of the data. The effectiveness of our model was illustrated with artificially generated data and in two experiments aiming at detecting hand movements by measuring the pressure induced by the muscles activity of the forearm with tactile myography. We showed that the testing computational time of the proposed model is low, due to a computational cost independent of the number of training data, therefore making it compatible with realtime robotic applications.
Future work will investigate automatic rank selection procedures with the objective of automatically determining all the ranks of the model, therefore avoiding the use of crossvalidation in the training process. Moreover, we will investigate extensions of the proposed tensorvariate mixture of experts (TME) model to other applications in robotics, as well as to more complex models, such as hierarchical TME [17]
. It is worth noting that the proposed TME model permits to incorporate structural information of the data as a special case of neural network. Extensions of this model could then also lead to interesting perspectives in the development of neural networks for tensor data that would have better interpretability, that would allow training with smaller amount of data and that would provide better generalization results by avoiding overfitting.
Acknowledgment
We thank Guillaume Walck for his significant help on the preparation and execution of the experiment on the real robotic setup and for his suggestions during the preparation of the manuscript.
References
 [1] T. Kolda and B. Bader, “Tensor decompositions and applications,” SIAM Review, vol. 51, no. 3, pp. 455–500, 2009.
 [2] Q. Zhao, G. Zhou, L. Zhang, and A. Cichocki, “TensorVariate Gaussian Processes Regression and Its Application to Video Surveillance,” in IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014, pp. 1265–1269.
 [3] S. Calinon, “A tutorial on taskparameterized movement learning and retrieval,” Intelligent Service Robotics, vol. 9, no. 1, pp. 1–29, January 2016.
 [4] F. Miwakeichi, E. MartínezMontes, P. A. ValdésSosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, “Decomposing EEG data into spacetimefrequency components using Parallel Factor Analysis,” NeuroImage, vol. 22, no. 3, pp. 1035–1045, 2004.
 [5] Y. Washizawa, H. Higashi, T. Rutkowski, T. Tanaka, and A. Cichocki, “Tensor Based Simultaneous Feature Extraction and Sample Weighting for EEG Classification,” in Intl. Conf. on Neural Information Processing (ICONIP), 2010, pp. 26–33.
 [6] R. Kõiva, E. Riedenklau, C. Viegas, and C. Castellini, “Shape Conformable High Spatial Resolution Tactile Bracelet for Detecting Hand and Wrist Activity,” in IEEE Intl. Conf. on Rehabilitation Robotics (ICORR), 2015, pp. 157–162.
 [7] A. Ozdemir, A. Zare, M. A. Iwen, and S. Aviyente, “Extension of PCA to higher order data structures: An introduction to tensors, tensor decompositions, and tensor PCA,” Proceedings of the IEEE, vol. 106, no. 8, pp. 1341–1358, 2018.
 [8] H. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos, “MPCA: Multilinear principal component analysis of tensor objects,” IEEE Transactions on Neural Networks, vol. 19, no. 1, pp. 18–39, 2008.
 [9] D. Tao, X. Li, X. Wu, and S. J. Maybank, “General Tensor Discriminant Analysis and Gabor Features for Gait Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 29, no. 10, pp. 1700–1715, 2007.
 [10] Y. Tang, R. Salakhutdinov, and G. Hinton, “Tensor analyzers,” in Proceedings of the 30th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 28, no. 3, 2013, pp. 163–171.
 [11] W. Guo, I. Kotsia, and I. Patras, “Tensor learning for regression,” IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 816–827, 2012.
 [12] H. Zhou, L. Li, and H. Zhu, “Tensor Regression with Applications in Neuroimaging Data Analysis,” Journal of the American Statistical Association, vol. 108, no. 502, pp. 540–552, 2013.
 [13] H. Hung and C. Wang, “Matrix variate logistic regression model with application to eeg data.” Biostatistics, vol. 14, no. 1, pp. 189–202, 2013.
 [14] X. Tan, Y. Zhang, S. Tang, J. Shao, F. Wu, and Y. Zhuang, “Logistic Tensor Regression for Classification,” in Intelligent Science and Intelligent Data Engineering, 2013, pp. 573–581.
 [15] M. Signoretto, L. De Lathauwer, and J. A. K. Suykens, “A KernelBased Framework to Tensorial Data Analysis,” Neural Networks, vol. 24, no. 8, pp. 861–874, 2011.
 [16] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive Mixture of Local Experts,” Neural Computation, vol. 3, no. 1, pp. 79–87, 1991.
 [17] S. E. Yuksel, J. N. Wilson, and P. D. Gader, “Twenty years of mixture of experts,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 8, pp. 1177–1193, 2012.
 [18] J. D. Carroll and J. J. Chang, “Analysis of individual differences in multidimensional scaling via an nway generalization of ”EckartYoung” decomposition,” Psychometrika, vol. 35, no. 3, pp. 283–319, 1970.
 [19] R. A. Harshman, “Foundations of the PARAFAC procedure: Models and conditions for an ”explanatory” multimodal factor analysis,” UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.
 [20] J. M. Hahne, F. Bießmann, N. Jiang, H. Rehbaum, D. Farina, F. C. Meinecke, K. R. Müller, and L. C. Parra, “Linear and nonlinear regression techniques for simultaneous and proportional myoelectric control,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 2, pp. 269–279, 2014.
 [21] S. L. Phillips and W. Craelius, “Residual kinetic imaging: a versatile interface for prosthetic control,” Robotica, vol. 23, no. 3, pp. 277–282, 2005.
 [22] C. Nissler, M. Connan, M. Nowak, and C. Castellini, “Online tactile myography for simultaneous and proportional hand and wrist myocontrol,” in Proceedings of Myoelectric Control Symposium (MEC), 2017.
 [23] N. Jaquier, M. Connan, C. Castellini, and S. Calinon, “Tensor decompositions and applications,” Technologies, vol. 4, no. 5, 2017.
 [24] GPy, “GPy: A Gaussian process framework in python,” http://github.com/SheffieldML/GPy, since 2012.
 [25] “Shadow robot company,” https://www.shadowrobot.com/.

[26]
M. Vidovic, H. Hwang, S. Amsüss, J. M. Hahne, D. Farina, and K.R. Müller, “Improving the Robustness of Myoelectric Pattern Recognition for Upper Limb Prostheses by Covariate Shift Adaptation,”
IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 24, no. 9, 2016.