Transferrable Operative Difficulty Assessment in Robot-assisted Teleoperation: A Domain Adaptation Approach

06/12/2019 ∙ by Ziheng Wang, et al. ∙ The University of Texas at Dallas 0

Providing an accurate and efficient assessment of operative difficulty is important for designing robot-assisted teleoperation interfaces that are easy and natural for human operators to use. In this paper, we aim to develop a data-driven approach to numerically characterize the operative difficulty demand of complex teleoperation. In effort to provide an entirely task-independent assessment, we consider using only data collected from the human user including: (1) physiological response, and (2) movement kinematics. By leveraging an unsupervised domain adaptation technique, our approach learns the user information that defines task difficulty in a well-known source, namely, a Fitt's target reaching task, and generalizes that knowledge to a more complex human motor control scenario, namely, the teleoperation of a robotic system. Our approach consists of two main parts: (1) The first part accounts for the inherent variances of user physiological and kinematic response between these cross-domain motor control scenarios that are vastly different. (2) A stacked two-layer learner is designed to improve the overall modeling performance, yielding a 96.6 Fitts' reaching task when using movement kinematic features. We then validate the effectiveness of our model by investigating teleoperated robotic needle steering as a case study. Compared with a standard NASA TLX user survey, our results indicate significant differences in the difficulty demand for various choices of needle steering control algorithms, p<0.05, as well as the difficulty of steering the needle to different targets, p<0.05. The results highlight the potential of our approach to be used as a design tool to create more intuitive and natural teleoperation interfaces in robot-assisted systems.



There are no comments yet.


page 1

page 3

page 5

page 13

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

1.1 Motivations

In current robot-assisted teleoperation systems, human users typically control remote, robotic slave tools to carry out complex operations by physically manipulating a master controller [1]

. The decision for how to map user inputs to robotic slave outputs is largely the choice of the designer, with careful considerations needed in cases where the degrees of freedom of the master manipulator and slave end-effectors, or robots, do not match. In general, the use of teleoperation has enabled improvements in the capabilities and performance of the human operator in many scenarios, such as robot-assisted surgery 

[2], search and rescue robots [3], and space exploration [4], to name a few. However, due to limitations such as that kinematic dissimilarity between master and slave devices, time delay, and the lack of sensory feedback from the slave environment [5, 6], it has been shown that teleoperation also can be mentally and physically challenging for operators to use. Furthermore, user perception of difficulty can affect one’s ability to perform specific tasks [7, 8]. As such, a poor choice of the teleoperation interface can negatively affect task performance, degrade user experience, and reduce the overall effectiveness of the system [9, 10]. Interest in this problem has also increased in the collaborative robotics community. Studies suggest that an efficient assessment of difficulty is imperative for determining how to appropriately share levels of autonomy between the robot and human operators to deliver adaptive robotic assistance and improve user satisfaction [11]. Thus, our goal is to provide an objective approach to quantify the intuitiveness, or alternatively, the difficulty demand, of robot-assisted teleoperated tasks.

1.2 Background & Prior Work

Accurately inferring the difficulty of a teleoperated task is practically challenging. Current work on teleoperation difficulty assessment has primarily focused on the administration of customized questionnaires or standard user ratings, such as the NASA Task Load Index (TLX) [12]. This approach, while helpful and easy to obtain, has the risk of being biased due to human interpretation [13, 14, 15]. Furthermore, the post-experiment surveys do not describe the real-time cognitive effort or behavioral and physical demand of the task. These drawbacks not only limit the usefulness of user surveys, but could also explain differences that have been found in the literature regarding a mismatch between user performance and perceived task workload demand [16, 17, 18, 19, 20].

An objective measure of difficulty is possible in simple human motor control tasks. Commonly used in human-computer interaction and ergonomics research, Fitts’ Law describes underlying difficulty demand of a simple target reaching task [21]. The difficulty of the target-reaching motion, i.e., the index of difficulty (ID), is mathematically quantified as a function of the target width (W) and distance to the target (D): ID . As a widely accepted empirical rule, Fitts’ difficulty model has been applied in a variety of studies for ergonomic design and general human-computer interaction analysis [22, 23].

Figure 1: Overview of objective difficulty assessment in complex robot-assisted teleoperation. Our approach takes the user sensorimotor measures (physiological response and movement kinematics) as input. A model was fitted on an established source: simple motor control (Fitts’ target reaching), and then adaptively transferred for analyzing the target domain: a complex teleoperation task (robotic needle steering), while compensating the distribution variances of user response for cross-domain applicability. The model output is a numerical index indicating the inherent difficulty demand of teleoperation task.

However, directly applying the traditional Fitts’ paradigm to real-world teleoperation could be an oversimplification as the tasks are typically more involved than simple target reaching and may require more complex motor manipulation from the human operator. Additionally, in teleoperated tasks, the robotic system often has no knowledge of particular objectives of the human operator making task-specific metrics such as time and errors less effective in assessing task difficulty.

In this work, we aim to provide an objective task difficulty assessment explicitly devoted to characterizing the underlying difficulty demand of robot-assisted teleoperation tasks. We propose a data-driven modeling approach to automatically predict the difficulty of complex operation based on the real-time measures of human sensorimotor response (i.e., physiological response and movement kinematics). Fig. 1 shows the overview. Our approach leverages the known difficulty information from a well-established source, namely, a simple human motor control task (Fitts’ reaching), and generalizes to a related but more complex motor control scenario (i.e., robot-assisted teleoperation). Meanwhile, we explicitly consider the variability of the human sensorimotor response [24, 25, 26], which has historically made objective assessment of operative difficulty difficult. Based on unsupervised domain adaptation, our approach compensates for the inherent user response variances from two different motor control tasks. Thus, the resulting model could be applicable to the target domain of interests without a priori knowledge of particular tasks. Furthermore, a stacked machine learning architecture consisting of multiple single-algorithm learners is used to improve the overall modeling accuracy. Finally, we validated the effectiveness of our approach by analyzing the teleoperation of robotic steerable needles and compared our difficulty predictions with the results of standard NASA TLX user surveys. To our knowledge, this is the first study to provide a transferable operative difficulty assessment for better understanding the underlying difficulty demand of complex robot-assisted teleoperation.

2 Methodology

Figure 2:

Detailed architectures of proposed model for transferrable operative difficulty assessment. The domain adaptation is accomplished by weighting source domain samples for modeling while minimizing the Kullback-Leibler divergences of the distributions between source and target domains.

2.1 Problem Definition & Formulation

The primary goal of our study is to develop a learner that can numerically infer the difficulty demand of a complex robot-assisted teleoperation task, while leveraging the established knowledge of difficulty in a simple motor control task. Fig. 2 illustrates the overall framework of the operative difficulty assessment. We formulated the process as an unsupervised domain adaptation problem, which does not require the availability of any a priori labels of particular real-world teleoperation tasks [27]. To align with domain adaptation literature, we refer to the data of complex teleoperation as the “target domain” and the simple motor control task as the “source domain”, respectively.

Formally, we denote the source domain as . Each -th sample pair is composed by the input data

that is drawn from a probability distribution

, with a corresponding label of difficulty index

that follows a conditional probability distribution

. Here, is the dimension of input data. Similarly, we denote the target domain as , where is the unlabeled target data drawn independently from a probability distribution . The and are the sample sizes in source and target domains, respectively.

According to the above formulation, predicting the difficulty of complex teleoperation task in target domain can be solved by minimizing the empirical risk , namely, the expected loss, , under the target data distribution, , given the model parameters, :


where denotes the prediction function that maps input data to the output label . The prediction function can be equivalently expressed in a conditional probability form with respect to the target distribution as .

2.2 Covariate Shift Adaptation

Covariate shift refers to a phenomenon that the samples in source and target domains follow different data distributions, namely, , while the conditional distributions of the output values assumably remain unchanged (the learned prediction function is applicable to both), namely,  [28]. Under the covariate shift, the learned model from source domain can not be directly applied to target domain. Instead, the distribution discrepancy should be explicitly taken into account for enabling the cross-domain applicability. One popular technique to compensate the covariate shift is the sample-based importance re-weighting. Given a set of source domain and target domain data, the expectation over each source sample is adaptively weighted in accordance with their similarity to the target data distribution [29]. Intuitively, the target difficulty labels can be more accurately inferred from labeled source domain samples that shared more similarity of user response. To this end, the importance weight of source domain samples, , is defined as the ratio of target domain distribution to the source domain distribution:


In order to estimate the importance weight for each sample, several protocols have been proposed, including direct density estimation 

[29], kernel mean matching [30], unconstrained least-squares importance fitting [31], and Kullback-Leibler importance estimation procedure (KLIEP[32]. In this work, we implemented the KLIEP procedure for domain adaptation due to its computational efficiency and stability [33]. Specifically, we compute the weight estimate by following steps:

First, we parametrize the weight as a mixture of Gaussians:


where are the mixing coefficients, is a fixed coefficient size, denotes the Gaussian kernel function with a kernel width , and is the Gaussian centre that is randomly sampled from target domain distribution . The Gaussian kernel width is chosen based on a built-in cross-validation procedure. Then, the optimal coefficients are determined such that the Kullback-Leibler (KL) divergence between the distributions and , namely, , is minimized. Since the KL divergence function is convex, the obtained weight estimate can thus be guaranteed as globally unique [32].

Finally, based on the KLIEP

domain adaptation, the expected loss function over the distribution of target domain

can thus be computed with respect to the distribution of source domain as:


Note that, estimating the importance weights of samples is performed in a fully unsupervised way and thus does not require any a priori labels in either source or target domain.

2.3 Stacked Two-layer Architecture

While the aforementioned domain adaptation adjusts the domain of applicability so that source domain data can adaptively align with that of the target domain for building a model, it does not necessarily guarantee an optimal prediction accuracy as a plain covariate-shift adapted learner can still produce high-variance estimations, thus causing instability. To alleviate this problem, we seek an optimal function that minimizes the expected loss (Eq. 4

). To this end, we proposed a stacked two-layer architecture that combines multiple machine learning algorithms as base learners. The idea behind stacked learning is that averaging predictions from diverse single learners might better enhance accuracy and simultaneously reduce variances. Followed by the aforementioned adaptation process, two stages of learning were performed: First, five single machine learning algorithms were chosen as components in the first layer, including ridge regression, support vector machine, random forests, stochastic gradient boosting, and adaptive boosting (AdaBoost). Each single learner was fitted on source domain data

given the importance weights

as estimated from the previous adaptation step. Then, the second layer algorithm takes the predictions generated by the single-algorithm learners as input and is optimally trained to output a final prediction. We choose a linear regression with

regularization to ensemble the output of first-layer base learners while avoiding over-fitting.

Taken together, transferrable difficulty assessment for complex teleoperation tasks from target domain can be ultimately solved by minimizing the weighted loss of stacked learner along with the importance weights with respect to the distribution of source domain :


In the following, we briefly introduced the five single-algorithm base learners in our proposed architecture.

2.3.1 Ridge Regression

Ridge regression is a linear regression algorithm with regularization to alleviate multicollinearity among feature variables in regression [34]. The algorithm of ridge regression can be described as , where the coefficients,

, are solved by the ordinary least-square optimization such that the penalized residual sum of squares is minimized.

2.3.2 Support Vector Machine

Support vector machine performs a robust regression by exploiting a maximum-margin separating hyperplane in a transformed subspace 

[35]. The regression version of SVM algorithm can be described as , where denotes a kernel function that maps input features into a linearly separable subspace via nonlinear transformations. Within the new feature space, it searches for an optimal separating hyperplane by solving an inequality-constrained quadratic optimization problem.

2.3.3 Random Forests

Random forests regression is an ensemble algorithm that fits a collection of decision trees 


. The algorithm generates a set of unpruned decision trees based on bootstrapped data samples and then randomly selects the subsets as candidates to split tree nodes. This process is repeated multiple times to blend input data for the purpose of alleviating over-fitting and noisy outliers. Given the input, final regression prediction of random forests model is made by averaging ensemble outputs of the generated trees.

2.3.4 Stochastic Gradient Boosting

Stochastic gradient boosting is another ensemble algorithm that improves the predictive performance. This approach combines a set of weak learners in a greedy fashion via sequentially minimizing residual errors [37]. First, the predictions of a base learner (commonly a decision tree) are made and its residual errors are then fitted to the initial learner. Then, it updates the model by adding an additional learner to the previous learner. The final model is obtained via iteratively repeating above steps such that the prediction loss is minimized. In contrast to basic gradient boosting [38], stochastic gradient boosting randomly selects a subset of samples for learning, which alleviates the potential over-fitting.

2.3.5 Adaptive Boosting (AdaBoost)

Different from the stochastic gradient boosting, AdaBoost model fits a set of weak learners while sequentially re-weighting samples to obtain an incremental prediction improvement [39]. Given initial sample weights, a base learner is first learned by fitting the input samples. Then, additional learners are sequentially added and trained on the same data but the initial weights are individually adjusted at each iteration to minimize prediction loss. Last, the final model is obtained by applying the weighted average to the output of multiple individual weak learners [40]. Note that the sample weights for AdaBoost here are first initialized based on the importance weights obtained from the previous domain adaptation process and then sequentially updated for modeling optimization.

2.4 Implementation Details

We employed a nested cross validation for training and testing models using labeled source domain data. The nested cross validation consists of an inner loop and an outer loop. Within each loop, a random subset of samples was used as a holdout for validation, while the remaining data was used for -th () fold training. In our study, we used -fold cross-validation for the inner () and -fold cross-validation for the outer loop (). We obtained the optimal sample importance weights and model hyper-parameters through the process of inner loop cross validation. Then, the outer loop cross validation was applied to evaluate prediction performance on the testing set.

We implemented our methodology using Scikit-learn and Python 3.6. All codes for model training and testing were run on the UT Southwestern BioHPC high-performance computing cloud platform with an NVIDIA Tesla P100 GPU.

3 Experimental Methods

Figure 3: Experimental setup for real-time acquisition of user response. The subject manipulates a master controller (haptic device) to teleoperate a simulated slave end-effector. Wearable body sensors capture the user sensorimotor response in real time during operations: (1) physiological signals, including electromyography (EMG), galvanic skin response (GSR), electroencephalography (EEG), and heart rate (HR), (2) Movement kinematics of dominant forearm and upper arm captured by Inertial Measurement Units (IMUs) sensors.

To validate the effectiveness of our approach, we performed the experiments to objectively inquire the inherent difficulty demand of a complex teleoperation task. Two independently-collected datasets were used in this study: one is the source domain dataset from a simple motor control task, and the other one is the target domain dataset from a complex teleoperation task. Fig. 3 shows the common experimental setup for real-time acquisition of user response. In both experiments, subjects manipulate the position and orientation of a simulated slave tool by controlling a master controller. A haptic device (Geomagic Touch) is used as the master controller while providing haptic feedback (depending on the control task). Simultaneously, all user data were non-invasively recorded in real-time using Robot Operating System (ROS). The collected user data include the signals of human physiological response (electromyography, galvanic skin response, electroencephalography, and heart rate) and movement kinematic signals (limb angular velocities and linear accelerations). More details regarding the sensor setup, data acquisition, and signal preprocessing can be found in our prior work [41, 42].

3.1 Datasets

Figure 4: Source domain: a simple motor control task (Fitts’ target reaching). Subjects control a virtual tooltip to reach a set of targets (see Fig. 6 (a)) by manipulating the master controller (Geomagic Touch). The difficulty of Fitts’ reaching was known and labeled as the index of difficulty (ID) according to Fitts’ law. Changing the distance (D) between the target and a fixed starting point could change the difficulty of Fitts’ reaching.
Figure 5: Target domain: a complex motor control task (teleoperation of a robotic steerable needle). Four different control algorithms were available for subjects to steer the simulated needle-like robot to reach a series of predefined targets (see Fig. 6 (b)) by manipulating the master controller (Geomagic Touch). In both joint space and steering control, the needle is controlled via insertion and rotation inputs from the master controller. In Cartesian space control, the operator defines the desired trajectory of the needle tip in Cartesian space. Haptic feedback regarding the inherent kinematic constraints of the needle is provided to the human operator [42].
Figure 6: Illustration of target layouts from two human motor control tasks in (a) source domain (Fitts’ target reaching) and (b) target domain (teleoperation of robotic needle).

3.1.1 Source Domain Dataset

Source domain data was collected from subjects when performing a simple motor control task: Fitts’ target reaching. Fig. 4 illustrates the Fitts’ target reaching task. In the experiment, subjects are required to reach a series of predefined targets (see Fig. 6 (a)) by controlling the stylus of the haptic device. According to Fitts’ law, the difficulty levels of Fitts’ target reaching were varied by changing the distances between the target and a fixed starting point. We chose six different difficulty levels of target reaching (ID) ranging from 2.0 to 7.0 (bits). To obtain a comparable difficulty scale, we further normalized the original difficulty indices into the range from 0 to 1.

Dataset Task Master Controller Slave End-effector Label Subject Recorded Sensory Data Sample Size
Source domain Fitts’ Target Reaching Geomagic Tough Virtual Tooltip Labeled (Fitts’ IDs) 14 Physiological Response Movement Kinematics 420
Target domain Robotic Needle Steering Geomagic Tough Robotic Steerable Needle Unlabeled 6 Physiological Response Movement Kinematics 960
Table 1: A description of datasets collected from independent experiments: Fitts’ target reaching and robotic needle steering.

3.1.2 Target Domain Dataset

Target domain data was collected from subjects when performing a higher-level motor control task: the teleoperation of robotic steerable needles [42]. The needle is a complex non-holonomic robot that is controlled through the axial insertion and rotation forces [43]. The inherent kinematics of the steerable needle make manual control for these needles very challenging, thus creating opportunities for performance improvements for these systems through robot-assisted teleoperation [44, 45, 20]. While various potential algorithms exist for these needles, in this paper, we focus on four different control algorithms, including (1) joint space control (JC), (2) steering control (SC), (3) Cartesian space control (CC), and (4) Cartesian space control with haptic force feedback (CFB). Fig. 5 illustrates the teleoperation of steerable needle, as well as the different types of control algorithms employed. Fig. 6 (b) shows the target layout with the four target locations for needle steering. We briefly introduce the four control algorithms below and detailed definitions can be found in our prior work [20] and [42].

  • Joint space control: In joint space control, the user defines the insertion velocity and rotation angle for the steerable needle, similar to steering a car. As the needle inherently curves during insertion, the user must actively control curvature by varying the rotation angle.

  • Steering control: Steering control was implemented by direction-pointing (left or right-pointing), similar to hub-centered steering (i.e., driving a motorcycle). Users manipulate the stylus to define steering angle which translates to axial needle rotation.

  • Cartesian space control: In Cartesian space control, the user defines a desired trajectory for the needle tip with the haptic stylus in Cartesian space and a nonholonomic kinematic model for needle curvature is used to determine corresponding insertion and rotation inputs for the needle [20]. Virtual haptic boundary is employed to alert the user when the needle trajectory is beyond the boundary of reachable space.

  • Cartesian control with haptic force feedback: In addition, a secondary form of Cartesian space control was implemented to provide an additional haptic cue. A position-error-based force, , proportional to the distance between needle tip position and current user position was added to prevent the user from moving too far ahead of the needle when defining a needle trajectory.

These four algorithms were chosen for this experiment because not only are they all complex control tasks, but also there is no objective way to assess the difficulty of these algorithms relative to each other. Table 1 summarizes collected datasets from both experiments in the source domain and target domain. Additional details related to these two experiments can be found in our prior work [41, 20].

3.2 Participants

For the source domain data, fourteen subjects participated in the experiment. Each subject completed tasks over five separate sessions, where in each session subjects are required to reach each of six targets that were presented in a random order. The experiment results in the total sample size of 420 individual Fitts’ reaching trials.

In the target domain dataset, six subjects in total participated in the experiment. All subjects completed four sessions wherein each session subjects are required to perform needle steering using a single control algorithm that was randomly selected. Each session included 40 randomized trials (10 repetitions for each of four different target locations) for each of the control algorithms. The experiment resulted in a the total sample size of 960 individual needle steering trials.

3.3 Feature Extraction

Instead of focusing on performance metrics that are dependent on the task objectives, such as target locations, ideal trajectories, or ideal task completion times, we characterize the user response using features that are task independent. Our features consist of two different modalities, i.e., physiological response features (Physiological) and kinematic movement features (Kinematic) of human subjects, extracted from signals measured in real-time during task completion.

3.3.1 Physiological Features

To quantify user physiological response, all physiological features were normalized to the baseline on a per-subject basis in order to eliminate individual variances across subjects. Details of obtaining the baseline of physiological response for each subject are available in [41].

We extracted the normalized root mean square (RMS) and normalized mean absolute value (MAV) to quantify muscle activations from surface EMG signals. Two normalized probabilistic metrics, Engagement and Workload, were extracted using BIOPAC Cognitive State Analysis (BIOPAC Systems, Inc.) to assess subjects’ cognitive states from the EEG data [46]. For galvanic skin response, we extracted the normalized skin conductance () and normalized conductance variances (). Finally, we calculated the normalized heart rate (HR) based on peak detection of photoplethysmography signals collected from an optical pulse sensor.

3.3.2 Kinematic Features

To characterize user movement, we extracted kinematic features from user dominant forearm and upper arm’s motion profiles that were collected from IMU sensors. We calculated the normalized angular velocity (AngVel), normalized linear acceleration (LinAcc) and average jerk (Jerk). For any motion trajectory, the normalized motion features, i.e., normalized angular velocity and linear acceleration, were calculated as the mean divided by the peak velocity or acceleration. The average jerk was used to assess the movement smoothness given the motion profiles of user arms [47]. In addition, we extracted features measured from slave end-effectors to characterize movement kinematics in the simulated workspace, including the path straightness (PathStrDev) and path efficiency (PathEff[42]. The path straightness and path efficiency quantify the efficiency of tool movements and user motor ability to continuously manipulate tools. It is important to note that these kinematic features are derived using only the position and velocity data, and thus do not depend on a priori knowledge of actual desired paths.

3.4 Modeling Evaluation

Evaluation of our proposed method was carried out in two steps. First, we evaluate the prediction accuracy on the labeled source domain data . According to the aforementioned nested cross-validation scheme, prediction accuracy on testing data are averaged over the 10-fold outer-loop cross validation. We used the root mean squared error (RMSE) and to assess overall prediction accuracy. A lower RMSE value and higher indicate a better prediction accuracy.

Next, we aim to evaluate the prediction accuracy for the unlabeled target domain data . However, this is not directly possible as there are no ground-truth difficulty labels for the target domain. Therefore, we depend on other methods of assessing difficulty as a comparison. We used the NASA Task Load Index (TLX) rating survey, a structured grading tool that was used to subjectively assess difficulty based on subject experience in teleoperation. The six measurable scales of NASA TLX rating include: Mental Demand, Physical Demand, Temporal Demand, Performance, Effort and Frustration. In the experiment, NASA TLX rating was conducted for each control algorithm when subjects completed a teleoperation session; therefore, four NASA TLX surveys were acquired from each subject. The values of each scale, ranging from 0 to 20, were normalized by dividing by the maximum value selected for each subject across all surveys into the range of 0 to 1. An average combined score was then calculated from the six NASA TLX scales scores to access the overall user-perceived difficulty level for a given control algorithm. The larger value indicates a higher user-perceived difficulty demand.

To quantify the strength and statistical significance of our prediction, two-way ANOVA was conducted to determine significant differences between groups (i.e., control algorithm and targets) on the predicted difficulty. One-way ANOVA was conducted to determine the group of control algorithms on the TLX difficulty ratings. Post-hoc pairwise Tukey test was used for the significant results to determine the degree to which the group levels of interests differed. For all statistics, significance was determined by value of 0.05.

Figure 7:

(a) t-SNE visualization of distribution variances among extracted physiological response (Physiological) and movement kinematics (Kinematic) features. High-dimensional data from target (red dots) and source domains (blue dots) were projected into two-dimensional space for visualization. Plot is shown for a preliminary qualitative analysis of covariate shift. (b) A histogram of importance weights estimated from

KLIEP method. Data sample with a less distribution discrepancy of the extracted features between source and target domains was assigned with a larger importance weight for modeling.
Modeling Methods Physiological Kinematic

Single Learner

Ridge Regression 0.233 0.499 0.097 0.916
Support Vector Machine 0.219 0.560 0.083 0.938
Random Forest 0.224 0.544 0.074 0.949
Stochastic Gradient Boosting 0.248 0.444 0.142 0.819
AdaBoost 0.246 0.453 0.074 0.950
Stacked Learning Model 0.215 0.576 0.061 0.966
Table 2: Source-domain evaluation: comparison of our proposed approach with different modeling methods. The prediction performance was evaluated given the ground-truth Fitts’ difficulty index. The results are the RMSE and of test sets averaged from the outer 10-fold cross-validation. Bolded numbers denote the highest average accuracy based on the input data of different features: physiological response features () and movement kinematic features ().
Feature Control Algorithm
Physiological 0.555 0.453 0.451 0.450
Kinematic 0.748 0.425 0.291 0.497
Feature Target Configuration
T1 T2 T3 T4
Physiological 0.432 0.498 0.488 0.482
Kinematic 0.255 0.679 0.623 0.521
Table 3: Target-domain prediction: mean predicted difficulty values of control algorithms (JC, SC, CC, and CFB) and target configurations (T1, T2, T3, and T4) based on different features (Physiological and Kinematic features) as input. Higher value indicates a higher difficulty demand for respective operation.
Feature Control Algorithm Target Configuration Interaction
post-hoc comparisons post-hoc comparisons
Physiological CFBCC, CFBJC T1(T4T3T2)
Kinematic CC(SCCFB)JC T1T4(T3T2)
represents significance
Table 4: Target-domain prediction: two-way ANOVA statistical analysis and post-hoc Tukey comparisons of control algorithms and targets on predicted difficulty demand. Round bracket denotes no significance between the pairs of group levels.

4 Results & Discussion

It is important to note that our numerical difficulty prediction in this study is similar to, but different from that originally proposed by Fitts. Both models are aimed to quantify the underlying difficulty demand of certain operations. However, the difficulty model in Fitts’ definition is quantified as a logarithmic function of target distance and target width only, which depend on particular task settings. In contrast, difficulty assessment in this work is modeled as a prediction function ID , based on extracted human sensorimotor response features as input. It is mathematically solved (see Eq. 5) while taking into the account the covariate shift of user response for an optimal modeling. Furthermore, our approach is data-driven and task-independent in the sense that the assessment is informed by using user physiological or kinematic features only. It does not depend on any knowledge of certain task constraints or settings, such as target locations, ideal trajectories.

4.1 Distribution Variances & Shift Adaptation

Understanding the distribution variances of extracted features among the simple motor control and complex teleoperation from the source and target domains is an important step when employing our learning scheme. For a direct qualitative analysis, we used the t-Distributed Stochastic Neighbor Embedding (t-SNE) method to visualize the distribution variances cross domains. As a popular approach of visualizing complex data, t-SNE projects the high-dimensional data into lower-dimensional space while the original distributions are preserved in the projection [48].

Fig. 7(a) shows the t-SNE projection in the two-dimensional space based on the input data of different feature modalities. For both physiological and kinematic features, the structure of data distribution in target domain is characteristically different from the source domain data. Specifically, the distribution of source domain data (depicted in blue color) falls into a well-grouped structure, which is overlapped with the target domain data that is more scattered (depicted in red color). It reveals that an underlying distribution discrepancy exists among the extracted features (, ) in the target and source domains. This observation can be explained by the stochastic nature of user physiological response and motion kinematics, and thus justifies the need for further adaptation in our learning model. It is important to note that the t-SNE projection is a nonlinear, stochastic process that depends on a choice of parameters [49]. Therefore, we only interpret t-SNE outputs as the tool for a preliminary qualitative analysis and visualization of the distribution variances.

Since two domains have different distributions, the key factor is to bridge the gap between them to enable knowledge transfer. Specially, the above-mentioned covariate shift was quantitatively adapted by the KLIEP procedure (see Section II.B.) to align the user data distributions between source and target domains. Fig. 7(b) presents the histogram of estimated importance weights of all source domain samples. Higher value of weights indicates a larger similarity of data distributions cross domains. The majority of sample importance weights was within a range from 0 to 1. This result further confirms the existing data distribution discrepancies of the selected features across domains.

4.2 Prediction Accuracy at Source Domain

As the model evaluation process contains two steps, we first show the prediction accuracy on the labeled source domain. Table 2 summarizes the average testing accuracy from the 10-fold outer-loop cross-validation using different modeling methods. The results show that the presented stacked learning model outperformed the other single-algorithm learners with the highest prediction accuracy, given the input data of either physiological features or kinematic features. In particular, from the stacked learning model, the movement-based difficulty prediction showed a highest accuracy with the RMSE , , whilst the physiological response-based model provided a lower accuracy with the average RMSE , . This result indicates that the movement features carry the most representative information for difficulty prediction. This result makes sense as the Fitts’ reaching task directly affects the way and extent to which the human user moves his or her arm. In contrast, extracted physiological features may not be sensitive enough to detect difficulty changes accurately. One explanation is that the user physiological signals, such as EMG and EEG, have relatively higher inherent signal noises. Further improvement of physiological signal acquisition might help to achieve better difficulty identification.

Figure 8: Box plot summarizing mean predicted difficulty values of control algorithms (JC, SC, CC and CFB) in needle steering using physiological response (left) and kinematic features (right).
Figure 9: Box plot summarizing mean predicted difficulty values of target configurations (T1, T2, T3 and T4) in needle steering using physiological response (left) and kinematic features (right).

4.3 Operative Difficulty of Robotic Needle Steering

Operative difficulty demand of complex motor control in the target domain, i.e., teleoperation of robotic needle, was characterized by the difficulty predictions output by our developed model, as shown in Table 3. Fig. 8 and Fig. 9 are the box plots summarizing the mean values of predicted difficulty based on the candidate physiological features (left) and kinematic features (right) for characterizing control algorithm (JC, SC, CC, and CFB) and target layout (T1, T2, T3, T4), respectively. Each box represents the 25th, 50th, and 75th percentiles, while the whiskers represent the confidence intervals. The difficulty prediction was further analyzed using a two-way ANOVA to determine the significant differences between the groups of control algorithms and target layouts. Post-hoc pairwise Tukey comparisons highlight the significant effects among each group. Table 4 summarizes the statistical results from two-way ANOVA and post-hoc Tukey comparisons.

4.3.1 Effects of Control Algorithms

As shown in Table 4, the analysis consistently shows significant differences of the control algorithms in needle steering for all physiological features (), and kinematic features (), respectively. In particular, from both modalities of selected features, joint space control was associated with the highest mean difficulty value. It indicates that joint space control was significantly more difficult to manipulate, physiologically and kinematically. Cartesian space control shows smaller difficulty values from both physiology-based and movement-based predictions. These results match our expectations because by abstracting the nonholonomic task constraints into haptic constraints, Cartesian space control no longer requires subjects to maintain an accurate internal model of the needle kinematics, and thus potentially decreasing the difficulty demand for the control task.

Interestingly, from the physiology-based prediction, Cartesian space control with haptic feedback (resistant force feedback in this study) had the lowest average difficulty value at 0.450, which was also significantly less than Cartesian space control without haptic feedback (CC), ; however, from the movement-based prediction, Cartesian space control with haptic feedback (CFB) has significantly higher difficulty of 0.497, compared to the Cartesian space control without haptic feedback (CC) at 0.291, . It is evident that the addition of haptic feedback diminished the difficulty demand physiologically, however, significantly increased the kinematic cost. This result might be linked to previous studies that haptic feedback contributes to a higher overall physical workload [50]. Although haptic force feedback was generally designed to improve overall control performance [51, 52, 53], the presence of such a feedback might not always guarantee intuitive user experience kinematically. As indicated from the movement-based prediction, haptics could potentially interfere with user movements due to the additional force resistance, and thus resulted in the increase of underlying difficulty demand for the task. Though seemingly contradictory, the physiology-based and movement-based predictions are able to provide a meaningful difficulty assessment from different aspects of the individual user response. Our analysis indicates that a better understanding of the impacts of force feedback on the user physiological response and movement behaviors would be helpful to fine-tune haptics settings and deliver a more intuitive robotic interface for the human user.

Figure 10: Predicted operative difficulty of target configurations in conjunction with particular control algorithms. Error bars denote the 95% confidence intervals.
NASA-TLX Control Algorithm ANOVA
JC SC CC CFB post-hoc comparisons
Mental Demand 0.833 0.722 0.361 0.278 0.157 -
Physical Demand 0.735 0.412 0.441 0.500 0.397 -
Temporal Demand 0.676 0.882 0.412 0.471 0.147 -
Performance 0.382 0.294 0.147 0.088 0.206 -
Effort 0.917 0.806 0.194 0.167 0.271 -
Frustration 0.563 0.438 0.063 0.094 0.232 -
Average 0.775 0.665 0.258 0.231 0.141 -
represents significance
Table 5: One-way ANOVA and post-hoc Tukey comparisons of control algorithms on NASA-TLX user ratings. The average score aggregating measures of six TLX scales represents an overall user-perceived difficulty demand. No significant groups were found in post-hoc comparisons.
Figure 11: Box plot summarizing mean values of normalized NASA TLX user ratings from six user-perceived scales (Mental Demand, Physical Demand, Temporal Demand, Performance, Effort, and Frustration) together with an average scale.

4.3.2 Effects of Target Configurations

Similar to the effects of control algorithms, the statistical analysis showed that target layouts in needle steering produced significant differences on the difficulty demand for all features. As shown in Table 4, for both physiological and movement kinematic features, the closest target, T1, had the lowest difficulty demand. For the difficulty model using kinematic features, target T4 was also significantly less difficult to reach than T2 and T3. This can be explained by the fact that in all control modes, this target was either reached by a simple arm reach (in Cartesian space) or a continuous insertion with a constantly changing steering angle (in Joint and Steering control).

4.3.3 Interaction Between Control Algorithms and Targets

Next, we investigated whether any potential interaction effects exists between the different targets and control algorithms on task difficulty. Fig. 10 shows the mean difficulty values of different control algorithms for each target configuration. Using the joint space control (JC), it was significantly less difficult for subjects to reach target T1 while no statistical significance was found for reaching the other targets. For the targets T1, T3, and T4, needle steering using Cartesian control was found to have the lowest difficulty demand. For the target T2, the steering control (SC) is associated with the lowest average difficulty demand compared to the other control algorithms. These comparisons suggest that it is possible to improve difficulty demand for operators by combining the consideration of control algorithms and specific setups. The result represents an opportunity for future studies to adaptively adjust control algorithms according to certain task objectives for a more intuitive robotic teleoperation.

4.3.4 Comparison to Subjective Difficulty Assessments

To validate our proposed approach, we compared the predicted difficulty demand of needle steering operations with the user perceived difficulty demand measured from standard NASA TLX user ratings. Table. 5 shows the results from one-way ANOVA statistics of normalized user ratings within different control algorithms. Fig. 11 shows the mean values of normalized scores from respective NASA TLX scales.

We acknowledge that the results of user-perceived difficulty demand from NASA TLX ratings were aligned with our difficulty prediction in general, however, no statistical significance were found in user ratings among all scales. In particular, joint space control was associated with the highest average TLX score (), followed by steering control (), Cartesian control (), and Cartesian control with haptic feedback (). This result shows that the joint space control was perceived as the most difficult one for operators. Though not significantly, Cartesian control with haptic feedback had the lowest score in the scale of Mental Demand while it was associated with higher score of Physical Demand, compared to steering control and Cartesian control without haptics. The companions of NASA TLX user ratings confirmed the validity of our proposed approach, while the difficulty prediction was able to provide a statistically significant assessment.

4.4 Limitations and Future Work

The first limitation can be found in the challenge of physiology-based difficulty modeling due to high variances of user physiological response. Despite our results indicating some potential for difficulty prediction, larger sample sizes would be needed to further improve accuracy and increase the statistical power. Secondly, we tested only the teleoperation of robotic needle as a case study of complex motor control. Further investigations of broader robot-assisted teleoperation are warranted to confirm the generalizability of the proposed difficulty modeling approach. Moreover, manually defining and selecting meaningful features from both the source and target domain is laborious. We might not have found the most distinguishing features of user data to capture the underlying difficulty information in complex control tasks. For future work, it could be desirable to automatically learn features from raw time-series data of user sensorimotor signals [54]. Finally, we are also interested in expanding our method towards online difficulty prediction that can be ultimately integrated with a robot system to provide better robotic assistance for complex teleoperation.

5 Conclusion

Accurately evaluating complex human motor control tasks and obtaining a clear understanding of inherent difficulty demand is a crucial step for designing an efficient, user-friendly robot-assisted teleoperation system. In this study, we present a novel method to infer the underlying difficulty demand of complex teleoperation tasks based on the real-time measures of human operator sensorimotor response. The proposed approach transfers the known difficulty information in a simple motor control task to analyze a complex teleoperation. Based on an unsupervised domain adaptation, our approach explicitly takes into account the distribution discrepancies of user response in different motor control scenarios. A stacked two-layer architecture, combining multiple single-algorithm learners, was implemented to refine predictions. We validated our approach by analyzing teleoperated robotic needle steering as a case study and compared our results with a standard NASA-TLX user survey. We showed that the operative difficulty demand of complex motor control can be readily inferred from user sensorimotor response, either physiologically or kinematically. In particular, our results confirmed that Cartesian space control provides the lowest operative difficulty at 0.291, with a statistical significance when compared to other control algorithms. The target configurations could also significantly affect the difficulty demand, .

Overall, our proposed approach is data-driven and task-independent in the sense that inferring underlying operative difficulty demand from the user physiological response and movement kinematics would not require any a priori knowledge of specific tasks or operational environment. As such, our method could potentially be generalized for analyzing a broader range of complex motor control scenarios in robot-assisted systems.


  • [1] M. Shahbazi, S. F. Atashzar, and R. Patel, “A systematic review of multilateral teleoperation systems,” IEEE Transactions on Haptics, 2018.
  • [2] J. W. Hill, et al., “Telepresence surgery demonstration system,” in Proceedings of the 1994 IEEE International Conference on Robotics and Automation.    IEEE, 1994, pp. 2302–2307.
  • [3] J. Casper and R. R. Murphy, “Human-robot interactions during the robot-assisted urban search and rescue response at the world trade center,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 33, no. 3, pp. 367–385, 2003.
  • [4] T. B. Sheridan, “Space teleoperation through time delay: Review and prognosis,” IEEE Transactions on robotics and Automation, vol. 9, no. 5, pp. 592–606, 1993.
  • [5] J. Y. Chen, E. C. Haas, and M. J. Barnes, “Human performance issues and user interface design for teleoperated robots,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 37, no. 6, pp. 1231–1245, 2007.
  • [6] E. Yang and M. C. Dorneich, “The emotional, cognitive, physiological, and performance effects of variable time delay in robotic teleoperation,” International Journal of Social Robotics, vol. 9, no. 4, pp. 491–508, 2017.
  • [7] P. Liu and Z. Li, “Task complexity: A review and conceptualization framework,” International Journal of Industrial Ergonomics, vol. 42, no. 6, pp. 553–568, 2012.
  • [8] K. Stowers, et al., “A framework to guide the assessment of human–machine systems,” Human factors, vol. 59, no. 2, pp. 172–188, 2017.
  • [9] A. Zunjic, “Performance and workload of operators in a human - tele-rebot system,” pp. 01–03, 08 2015.
  • [10] D. Pan, Y. Zhang, and Z. Li, “The effects of task complexity and spatial ability on teleoperation performance,” in International Conference on Engineering Psychology and Cognitive Ergonomics.    Springer, 2017, pp. 42–50.
  • [11] D.-J. Kim, et al., “How autonomy impacts performance and satisfaction: Results from a study with spinal cord injured subjects using an assistive robot,” IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 42, no. 1, pp. 2–14, 2012.
  • [12] S. G. Hart and L. E. Staveland, “Development of nasa-tlx (task load index): Results of empirical and theoretical research,” Adv. in Psycho., vol. 52, pp. 139–183, 1988.
  • [13] J. C. de Winter, “Controversy in human factors constructs and the explosive use of the nasa-tlx: a measurement perspective,” Cognition, technology & work, vol. 16, no. 3, pp. 289–297, 2014.
  • [14] S. G. Hart, “Nasa-task load index (nasa-tlx); 20 years later,” in Proceedings of the human factors and ergonomics society annual meeting, vol. 50, no. 9.    Sage Publications Sage CA: Los Angeles, CA, 2006, pp. 904–908.
  • [15] L. A. Cavuoto, et al., “Improving teamwork: evaluating workload of surgical team during robot-assisted surgery,” Urology, vol. 107, pp. 120–125, 2017.
  • [16] D. Whitney, et al., “Comparing robot grasping teleoperation across desktop and virtual reality with ros reality,” in Proceedings of the International Symposium on Robotics Research, 2017.
  • [17] D. Cannon and M. Siegel, “Perceived mental workload and operator performance of dexterous manipulators under time delay with master-slave interfaces,” in Computational Intelligence and Virtual Environments for Measurement Systems and Applications (CIVEMSA), 2015 IEEE International Conference on.    IEEE, 2015, pp. 1–6.
  • [18] D. B. Kaber, et al., “Effects of visual interface design, and control mode and latency on performance, telepresence and workload in a teleoperation task,” in Proceedings of the human factors and ergonomics society annual meeting, vol. 44, no. 5.    SAGE Publications Sage CA: Los Angeles, CA, 2000, pp. 503–506.
  • [19] C. D. Pham, H. N. T. Phan, and P. J. From, “Evaluation of Subjective and Objective Performance Metrics for Haptically Controlled Robotic Systems,” Modeling, Identification and Control, vol. 35, no. 3, pp. 147–157, 2014.
  • [20] A. Majewicz and A. M. Okamura, “Cartesian and joint space teleoperation for nonholonomic steerable needles,” in 2013 World Haptics Conf. (WHC).    IEEE, Apr. 2013, pp. 395–400.
  • [21] P. M. Fitts, “The information capacity of the human motor system in controlling the amplitude of movement.” J. Exp. Psychol., vol. 47, no. 6, p. 381, 1954.
  • [22] R. Meyer, et al., “Development and evaluation of an input method using virtual hand models for pointing to spatial objects in a stereoscopic desktop environment with a fitts’ pointing task,” in Adv. Ergonomic Design Syst., Products Process.    Springer, 2016, pp. 327–342.
  • [23] R. W. Soukoreff and I. S. MacKenzie, “Towards a standard for pointing device evaluation, perspectives on 27 years of fitts’ law research in hci,” Intl. J. Human-computer Studies, vol. 61, no. 6, pp. 751–789, 2004.
  • [24] W. Klonowski, “Everything you wanted to ask about eeg but were afraid to get the right answer,” Nonlinear Biomedical Physics, vol. 3, no. 1, p. 2, 2009.
  • [25] W. Samek, F. C. Meinecke, and K.-R. Müller, “Transferring subspaces between subjects in brain–computer interfacing,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 8, pp. 2289–2298, 2013.
  • [26] N. Stergiou and L. M. Decker, “Human movement variability, nonlinear dynamics, and pathology: is there a connection?” Human movement science, vol. 30, no. 5, pp. 869–888, 2011.
  • [27] S. J. Pan, Q. Yang, et al.

    , “A survey on transfer learning,”

    IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
  • [28] M. Sugiyama, et al., Dataset shift in machine learning.    The MIT Press, 2017.
  • [29] H. Shimodaira, “Improving predictive inference under covariate shift by weighting the log-likelihood function,” Journal of statistical planning and inference, vol. 90, no. 2, pp. 227–244, 2000.
  • [30] J. Huang, et al., “Correcting sample selection bias by unlabeled data,” in Advances in neural information processing systems, 2007, pp. 601–608.
  • [31] T. Kanamori, S. Hido, and M. Sugiyama, “A least-squares approach to direct importance estimation,” Journal of Machine Learning Research, vol. 10, no. Jul, pp. 1391–1445, 2009.
  • [32] M. Sugiyama, et al., “Direct importance estimation with model selection and its application to covariate shift adaptation,” in Advances in neural information processing systems, 2008, pp. 1433–1440.
  • [33] M. Sugiyama, T. Suzuki, and T. Kanamori, “Density ratio estimation: A comprehensive review,” RIMS Kokyuroku, pp. 10–31, 01 2010.
  • [34] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.
  • [35] H. Drucker, et al., “Support vector regression machines,” Advances in neural information processing systems, vol. 9, pp. 155–161, 1997.
  • [36] K. Fawagreh, M. M. Gaber, and E. Elyan, “Random forests: from early developments to recent advancements,” Systems Science & Control Engineering: An Open Access Journal, vol. 2, no. 1, pp. 602–609, 2014.
  • [37] J. H. Friedman, “Stochastic gradient boosting,” Computational Statistics & Data Analysis, vol. 38, no. 4, pp. 367–378, 2002.
  • [38] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
  • [39] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of computer and system sciences, vol. 55, no. 1, pp. 119–139, 1997.
  • [40] H. Drucker, “Improving regressors using boosting techniques,” in ICML, vol. 97, 1997, pp. 107–115.
  • [41] Z. Wang and A. M. Fey, “Human-centric predictive model of task difficulty for human-in-the-loop control tasks,” PloS one, vol. 13, no. 4, p. e0195053, 2018.
  • [42] Z. Wang, R. Isabella, and A. M. Fey, “Toward intuitive teleoperation in surgery: Human-centric evaluation of teleoperation algorithms for robotic needle steering,” in Robotics and Automation (ICRA), 2018 IEEE International Conference on.    IEEE, 2018, pp. 5799–5806.
  • [43] R. J. Webster III, et al., “Nonholonomic modeling of needle steering,” The International Journal of Robotics Research, vol. 25, no. 5-6, pp. 509–525, 2006.
  • [44] J. M. Romano, R. J. Webster, and A. M. Okamura, “Teleoperation of steerable needles,” in Proceedings 2007 IEEE International Conference on Robotics and Automation.    IEEE, 2007, pp. 934–939.
  • [45] C. Pacchierotti, et al., “Teleoperation of steerable flexible needles by combining kinesthetic and vibratory feedback,” IEEE transactions on haptics, vol. 7, no. 4, pp. 551–556, 2014.
  • [46] R. R. Johnson, et al., “Drowsiness/alertness algorithm development and validation using synchronized eeg and cognitive performance to individualize a generalized model,” Biological psychology, vol. 87, no. 2, pp. 241–250, 2011.
  • [47] S. Estrada, et al., “Smoothness of surgical tool tip motion correlates to skill in endovascular tasks,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 647–659, 2016.
  • [48] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605, 2008.
  • [49] M. Wattenberg, F. Viégas, and I. Johnson, “How to use t-sne effectively,” Distill, 2016. [Online]. Available:
  • [50] J. C. Huegel and M. K. O’malley, “Workload and performance analyses with haptic and visually guided training in a dynamic motor skill task,” in Computational Surgery and Dual Training.    Springer, 2014, pp. 377–387.
  • [51] M. K. O’Malley, et al., “Shared control in haptic systems for performance enhancement and training,” Journal of Dynamic Systems, Measurement, and Control, vol. 128, no. 1, pp. 75–85, 2006.
  • [52] H. U. Yoon, et al., “Customizing haptic and visual feedback for assistive human–robot interface and the effects on performance improvement,” Robotics and Autonomous Systems, vol. 91, pp. 258–269, 2017.
  • [53] R. J. Kuiper, et al., “Evaluation of haptic and visual cues for repulsive or attractive guidance in nonholonomic steering tasks,” IEEE Transactions on Human-Machine Systems, vol. 46, no. 5, pp. 672–683, 2016.
  • [54]

    Z. Wang and A. M. Fey, “Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery,”

    International Journal of Computer Assisted Radiology and Surgery, vol. 13, pp. 1959 – 1970, 2018.