1 Introduction
Virtual Reality (VR) training simulators are growing in popularity and are used by aircraft pilots Yavrucuk et al. (2011), surgeons and clinicians Judkins et al. (2009), Huang et al. (2005), Vaughan et al. (2014b), military and defence applications Khooshabeh et al. (2017). Training simulators enable trainees to learn the skills required to perform skilled tasks by practicing on a virtual model invitro. The advantages of using simulators for surgery are well documented including the reduced risk of injury associated with practice on patients Lendvay et al. (2013), ability to safely practice emergency procedures and ability to practice on various models of different patients Vaughan et al. (2015). Additionally, due to changing training structure and compliance with the European Working Time Directive, clinical experts are being required to reduce their time spent assessing novices Williams (2014). Virtual training is useful in high risk activities such as epidural needle insertion, helping to avoid injuries Gupta et al. (2007). Automated training with VR simulators could assist, however methods for skill assessment in VR, particularly dynamically in realtime, are still undeveloped. The following sections outline recent methods for classifying skill and define our proposed method, which is applied and tested for accuracy.
1.1 Background
Skill classification in surgical VR training has been attempted using various approaches Vaughan and Gabrys (2016), including time series clustering and classification which received considerable research attention Montero et al. (2014), Paparrizos and Gravano (2015). The JIGSAWS dataset Gao et al. (2014) provides benchmark data from Da Vinci robot for testing skill classification and 100% has been achieved using deep learning Forestier et al. (2018)
specifically for surgical skill assessment. Since 2018 this emerging field of Surgical Data Science rapidly accelerated with increases in Deep Learning for multivariate Time Series Classification. Recent advances producing state of art results in general time series classification include long short term memory (LSTM), a recurrent Fully Convolutional Network (FCN) for multivariate series
Karim et al. (2017)and TimeNet, a multilayered Recurrent Neural Network (RNN)
Malhotra et al. (2017), inspired by successful image feature extraction. Machine learning algorithms with inertial measurement units can improve the predictive power of surgeon motion analysis
Watson (2014). Support Vector Machine (SVM) classification can achieve 86% sensitivity whereas the nonmachine learning Lempel–Ziv (LZ) complexity metric gave 64% sensitivity. This suggests that nonparametric supervised learning algorithm such as SVM applied to surgical skills classification can be useful for motion pattern recognition.
Oropesa et al. (2014) assessed skill for MIS, handeye bimanual coordination, spatial perception in a sample population of 4 experts, 22 residents, 16 novices, applying Linear discriminant analysis (LDA) (71%), Nonlinear support vector machine (SVM) (78.2%) and Adaptive neurofuzzy inference systems (ANFIS) (71.2%). Fuzzy classification and radial bias function (RBF) was used Huang et al. (2005)with MISTVR dataset containing 4 experts, 4 intermediate, 4 novices over 200 Epochs gaining 33%. Support vector machine (SVM)
Allen et al. (2009)gained 91.6%. Hidden Markov Models (HMMs) can be used to classify surgeon skills from surgical gestures with accuracy up to 100% and discovers rules governing task ordering
Tao et al. (2012). Surgical skill can be classified using global measurements of the simulation, such as the distance travelled Datta et al. (2001), the total time taken Judkins et al. (2009), force or pressure signatures Yamauchi et al. (2002), the number and speed of hand motions Datta et al. (2001). These global measurements offer the quickest methods but lack information about task structure. Dynamic time warping (DTW) could further benefit classification.1.2 Dynamic Time Warping
Given and are two multivariate time series, various local distance functions, denoted , are compatible with DTW. Euclidean distance (Eq. 1) can be applied if both series are of equal length. Squared Euclidean distance uses the same formula (Eq. 1) without the square root, reducing computation. Manhattan city block is widely used and rapidly computed. Minkowski forms a generalisation of Euclidean and Manhattan distances. Others distance measures include Mahalanobis Mahalanobis (1930), Bhattacharyya Bhattacharyya (1943) and Canberra Lance and Williams (1966).
(1) 
where and refer to variable from element within and , both containing elements with variables.
The dynamic programming algorithm DTW distance measure supports multivariate time series of unequal length where Berndt and Clifford (1994). A comparison between Euclidean and DTW is shown in Fig. 1.
Visualization of the DTW with warping path is shown in Fig. 2, left. The Global constraints Itakura Parallelogram Sakoe and Chiba (1978) applies (Fig. 2, middle) (Eq. 2).
(2) 
1.3 Time Series Prototyping
Several time series clustering methods require combination of several time series into a single prototype time series, representing characteristics of all time series within a cluster Gusfield (1997). Prototyping is an essential tool for many clustering algorithms like
Means, or Ascendant Hierarchical Clustering to reposition cluster centroids and describe each cluster. Mean and median prototyping does not perform accurately and perturbs convergence of clustering algorithms, producing a nonrepresentative prototype
Petitjean et al. (2011), Vaughan and Gabrys (2016). Partition Around Medoids (PAM) prototyping has a benefit that it avoids modifying any time series by calculating for each time series the sum of distances to all other series, and selecting the prototype to be the one time series which has lowest total distance (Eq. 3).(3) 
DTW barycenter averaging (DBA) Petitjean et al. (2011) is an iterative global method. The global nature of DBA enables the avoidance of iterative pairwise averaging, so results are unaffected by ordering. Shape based extraction Paparrizos and Gravano (2015) or fuzzybased prototypes use fuzzy clustering such as fuzzy cmedoids (FCMdd) Izakian et al. (2015). For this work, we propose a new method of time series prototyping: DTW Multivariate Prototyping (DTWMP).
1.4 Dynamic monitoring with Upper and Lower Envelopes
Upper and lower envelopes are generated in this work to create a tunnel of acceptable motion using lower bounding so that an alert can be triggered if a VR object exits from the normal path of motion. Existing lower bounding methods have been proposed including: Yi et al. (1998), Kim et al. (2001), Keogh and Ratanamahatana (2005), Lemire (2009). For this work there are three purposes of applying lower bounds: (1) To speed up the classification of new insertions by reducing the complexity of the similarity search required for new insertions, (2) To enable an alarm to be raised if the new time series does not stay within the tunnel of acceptable motion, between the upper and lower envelope around the cluster’s prototype insertion. (3) The upper and lower envelopes define an area which will contain all of the expert insertions, and our proposed prototype.
1.5 Dynamic Assessment of Incomplete Series
Skill classification taking place during an insertion only has access to a partial time series, which is the first part of a surgical procedure, but not the end, because the remainder of the procedure has not yet been completed. This problem is related to time series subsequences Rakthanmanon et al. (2012). The given time series of length represents a sequence, . A subsequence , is a shorter region of length from within which starts at any position within (whereby is restricted such that , whereby
. There are requirements which we aim to achieve when dealing with dynamic data: (1) Estimate during a procedure what proportion of the procedure has been completed. (2) Identify if the procedure is running fast or slow in comparison to the training insertions. (3) Compute the distance between a new partial trajectory and the cluster prototype. Recent research on detection of unusual time series events refer to discord subsequences
Yankov et al. (2008), outliers, unusual, abnormal
Naftel and Khalid (2006), Pokrajac et al. (2007), novel, deviant or anomalous time series subsequences Keogh et al. (2005), Laptev et al. (2015). Our previous research has outlined methods for prediction of time series Lemke and Gabrys (2010), Ruta et al. (2010), which are particularly relevant in the context of the incomplete time series and could be applied to predict events which are likely to arise in the time series, which is a future work. We apply comparison of incomplete time series in this work to enable dynamic monitoring of insertions in realtime.2 Methods
2.1 Development of an Epidural Simulator
We developed a virtual reality epidural training simulator using 3 Degrees Of Freedom (DOF) haptic input with force feedback and epidural pressure measurements (Fig. 2, right)
Vaughan et al. (2014b). The 3D graphics model contains vertebrae with software and haptic based biomechanical models of soft tissues based on measurements from our clinical trial with obstetric patients of various Body Mass Index (BMI). The data generated from the VR simulator consists of multivariate time series recording position, force and pressure over time. The 3D motion of the tool in x, y and z planes is recorded over time by the haptic device. The fourth dimension is force applied by the user, measured within the haptic device. The fifth dimension is pressure, measured within the syringe plunger using a custom wireless microcontroller system Vaughan et al. (2014a). Each epidural procedure tends to last around 20 seconds, during which the measurements were recorded at 500MHz, with 2 millisecond intervals. The resulting time series lengths are approximately 10,000 for each of the 5 measurements.2.2 Data Collection Trial
We recorded simulator training data using our VR epidural training simulator Vaughan et al. (2014a)
. Seven participants were in two groups: GroupC contained 3 medical trained NHS clinicians, each with varying experience of performing epidurals on real patients: ClinicianA had performed around 1000 real epidurals, ClinicianB had performed around 100 insertions and ClinicianC had performed around 20 epidurals. GroupN contained 4 nonclinicians who were not medically trained. Within GroupN, NonClinicianA had performed over 300 simulated epidurals, and the other 3 participants (NonClinicianB, NonClinicianC, NonClinicianD) had never performed real or simulated epidurals before. Skill labels (N=Novice, I=Intermediate, E=Expert) were assigned to each insertion based on the experience level, clinical background and number of clinical epidurals each clinician had previously completed. In total 271 needle insertions were recorded. NonClinicianA recorded 101 epidurals. NonClinicianB, NonClinicianC and NonClinicianD recorded 95, 31 and 20 epidurals. ClinicianA recorded 10, ClinicianB recorded 9 and ClinicianC recorded 5. To avoid data skew, epidural40 subset of the 271 epidural dataset was created containing 40 insertions which exactly matches the class distribution and size of JIGSAWS dataset
with 8 participants performing 5 insertions each. This 40 epidural dataset was created at both lengths of 500 and 5000. All of the data recordings were of simulated epidural insertion, using the same haptic device, the same build version of the VR software and on the same computer. Recordings from 271 insertions were stored as multivariate time series containing 5 variables: . Each series has different length, relative to the time taken.2.3 Normalization of the Data
Due to collection methods, standardization of the raw data recorded is necessary to set microcontroller sensor data onto the same scale as the haptic device data. A standard normalization method is applied which scales each element within the time series by subtracting the population mean from
and diving by the standard deviation
. After normalization if one time series contains very subtle movement on axis and another has large movement on axis, these will be normalized into the same scale.2.4 Skill classification Method 1: DTW 1NN
The DTWNN classifier was used to classify insertion skill, making use of all recorded insertions. Each of the training examples is checked to identify which most closely resembles the new insertion, by calculating the DTW distance between the new insertion and all previous recorded examples. The new insertion is labelled the same class as the closest training example (N, I or E).
2.5 Skill classification Method 2: Nearest Centroid
Nearest centroid classifier applies the NN classifier by measuring DTW distance between the new insertions and the cluster prototypes, reducing the number of DTW computations required compared to DTWNN which requires all insertions. A range of 7 stateofart prototype methods were applied: Mean, SoftDTW, DBA, PAM, Shape Extraction, and we propose 2 new prototype methods: DTWMP and DTWMP. The prototype of each skill level (N,I,E) is calculated to represent a prototypical insertion for each cluster. The prototypical insertions are built from all insertions in each cluster. We propose the DTW Multivariate Prototyping (DTWMP) algorithm to produce prototypical time series using DTW Vaughan and Gabrys (2016). Our proposed prototyping method (DTWMP) has advantages over Mean and Median: (1) DTWMP retains features which occur in two time series at different times which would not be aligned by Mean. (2) DTWMP can handle two series of different length unlike Mean. (3) The DTWMP prototype is guaranteed to stay within the summativeenvelope. Our proposed prototyping method (DTWMP) starts by calculating the warping path between two time series (Fig. 2, left). The new DTWMP prototype is created with the same length as the warping path . Each element in is set to the mean of the two elements from and which were aligned in the equivalent element of the warping path . Therefore, the length of the new DTWMP prototype , will be the same length as the warping path , which is .
2.6 Skill classification Method 3: Deep learning
Four deep learning techniques were used for time series classification: (1) a relatively deep Residual Network (ResNet) with 9 convolutional layers and a Global Average Pooling (GAP) layer He et al. (2016). (2) Fully Convolutional Network (FCN) Long et al. (2015)
with a final layer of Global Average Pooling (GAP). (3) Convolutional Neural Network (CNN)
Zhao et al. (2017)with final discriminative layer taking the result of the convolutions to give probability distribution over class variables. (4) MultiChannel Deep Convolutional Neural Network (MCDCNN)
Zheng et al. (2016)with architecture of a traditional deep CNN, plus the convolutions are applied in parallel on each dimension of the input MTS. These four DL architectures were chosen to provide a range of frameworks due to their previous successful applications to time series classification tasks. The DL implementation architectures were matching with the open source time series classification framework
Fawaz et al. (2019). Each method was applied to the 40 epidural subset for skill classification.Details of each deep learning architecture are in Fig. 3. Our method harnesses transfer learning, within FCN architecture, as one advantage of the utilized FCN method is the invariance which enables the use of a transfer learning approaches to train models on one dataset and further tune it on other target datasets. All the convolutions in the framework have a stride of 1 which preserves the length of the time series after convolution. The methods we used to incorporate the time series data into the deep learning architectures is shown in Fig. 3 (top). The architectures for ResNet (Fig. 3, middle) and FCN (Fig. 3, lower) are shown. Architectures were based on adopted highest performing frameworks
Fawaz et al. (2019). This illustrates that deep learning with multivariate time series is a challenging problem.3 Methods for Dynamic Monitoring
3.1 Phase estimation to detect proportion of time series
It is first required to dynamically estimate how much of the procedure has been completed at any given time. Phase estimation is achieved by producing 10 subsequences of the prototype increasing in size from 10% to 100% at 10% intervals. The DTW distance is computed between the incomplete new insertion and each of the 10 prototype subsequences. The best match identifies the proportion of the procedure which is completed, irrespective of whether one time series occurred faster than the other.
3.2 Cluster Prototyping
Hierarchical clustering is applied to group similar good or bad insertions techniques together. In order to perform clustering, a distance matrix comparing each time series to all others is generated. The DTW Euclidean normalised distance is precalculated between all pairs of insertions to produce a distance matrix of size , which requires DTW computations. Within each time series , the number of elements is approximately 1000, so each DTW comparison requires approximately element alignments which equates to Euclidean distance calculations. The distance matrix is used as input to hierarchical clustering.
3.3 Upper and Lower Bounds tunnel of acceptable motion
To enable dynamic realtime monitoring a tunnel or envelope of acceptable motion is created. We propose summativeenvelope algorithm which creates an envelope pathway guided in shape by the closest insertions to the expert cluster prototype. It generates a new pair of time series which contain a summative upper (Eq. 4) and summative lower (Eq. 5) envelopes from the expert insertion envelopes. Each element of the summative upper envelope is set to the maximum of any expert upper envelope at that time (Eq. 5), denoted as , , , … The summativeenvelope is generated for all 5 variables in the multivariate time series, so the summativeenvelope could be visualised as a 5D tunnel of acceptable motion.
(4) 
(5) 
where refers to variable from element within time series .
This has benefit that new insertions will fit within the summativeenvelope if they are within the envelope of any previously expert insertion, but not if it contains an event unlike any previously seen event in an insertion. The strength of our proposed summativeenvelope technique is that the summativeenvelope combines data from all insertions, whereas the lower bounding upper and lower envelopes only contain information from one time series. The summativeenvelope only needs to be computed once from the training data. It can subsequently be used to raise an alarm each time a new insertion goes outside of the summativeenvelope. During a simulation, summativeenvelopes could enable the allowable motion to be visualised by the user in realtime, providing more clarity of procedural requirements. When reviewing completed simulations, the summativeenvelope can enable a visualisation showing where and when it went wrong. If a new insertion is one of the closest to the prototype, the envelope is then regenerated when the insertion completes, adapting to the new data. This was previously a problem with blackbox skill classification.
3.4 Scoring a procedure based on tunnel of acceptable motion
The next step is to score each new insertion by checking that it remains inside a tunnel or envelope of acceptable motion which is created using prototypical insertion data. In order to score a new insertion dynamically during the insertion, the summative envelope from the identified cluster is used. The new insertion is monitored dynamically to identify whether it stays inside the envelope. If an insertion goes out of the envelope, an alarm is raised. The score is updated dynamically using Eq. 6, during the insertion taking into account what proportion of the insertion was outside of the envelope and the distance outside the envelope.
(6) 
Where is the total sum of distances the new insertion was outside the summative upper and summative lower envelopes, which have 5 dimensions: x, y, z, pressure and force.
4 Results for Skill Classification
This section describes the application of the proposed methods to our VR simulator for epidural needle insertion.
4.1 Skill Classification 1: DTW1NN
DTW1NN was applied to classify skill of the Epidural40 subset into classes N, I, E. Accuracy of DTW1NN was 60%. 1NN outperformed NN with = 2 to 9. Results were verified by LOOCV and 5 fold CV. Accuracy wasn’t affected by length reduction from 5000 to 500, which doesn’t largely affect DTW distances Ratanamahatana and Keogh (2004), Vaughan and Gabrys (2016). The DTWNN classification was also applied to the full dataset of 271 epidural insertions giving 90.03% accuracy.
4.2 Skill Classification 2: Nearest Centroid
Results from the nearest centroid classifier applied to the 40epidural subset for skill classification showed that accuracy depends largely on the algorithm used for generating the centroid. Of the 7 prototype algorithms applied, SoftDTW gave highest accuracy (77.5%) for skill classification, results for each are in Table I. The cluster prototypes are shown in Fig. 4. SoftDTW (red), Partition Around Medoids (PAM) (pink), DTW Barycenter Averaging (DBA)(Yellow), Shape Extraction (SE) (cyan), DTWMP (green), DTWMP (navy), Mean (black), and individual time series (grey) in each class (N, I, E).
SoftDTW  Mean  DBA  SE  PAM  DTWMPD  DTWMPI 
77.5%  70%  47.5%  60%  60%  50%  52.5% 
4.3 Skill Classification 3: Deep Learning
Four deep learning techniques for time series classification were applied for skill classification. Time series length of 5000 and 500 were tested and produced similar results. Results are shown in Table II validated with both LOOCV (and 5fold CV in brackets).
ResNet  FCN  CNN  MCDCNN 
85% (60.2%)  75% (82.5%)  72.5% (72.5%)  28.5% (23.6%) 
5 Results From Dynamic Assessment
5.1 Clustering epidural insertions
Clustering was applied to the 271 insertions which produced seven clusters, as shown in the dendrogram in Fig. 5. This identifies the optimum separation between clusters and the optimum similarity within clusters according to the Cluster Validity Indices (CVIs). Most of the clusters contain a mixture of skill levels and individuals (Fig. 6). One individual can use several techniques or one technique can be used by several individuals. Clusters may represent numerous valid techniques of performing a good insertion, or common mistakes repeated by different people. For these reasons, when clustering was applied to produce 3 clusters representing each skill level (N, I, E), insertions from each skill level were not all grouped together.
Assessment of cluster validity was performed using seven CVIs including Sil, Dunn, COP, DB, DBStar, SF and CH Arbelaitz et al. (2013). The CVIs showed that highest validity was achieved by hierarchical clustering. Generating seven clusters produced various numbers of insertions in each cluster: 129, 81, 3, 4, 37, 16, 1 with hierarchical clustering or 45, 83, 11, 24, 76, 26, 6 with partitional clustering.
5.2 Prototypes of the 7 clusters
A prototype was generated for each of the 7 clusters identified. Fig. 7 shows a comparison between the 7 prototypes on the axis. This reveals why certain insertions were clustered separately. Our proposed prototyping method (DTWMP) was used.
5.3 Scoring by distance from cluster prototype
All insertions from cluster 1 are shown in Fig. 8. Highlighted black is the cluster 1 prototype insertion. Highlighted in Purple is the number one discord found in all trajectories which was furthest away from the prototype and highlighted in green is the sequence closest to the prototype, based on DTW with Euclidean distance. The recorded trajectories and centroid was plotted into the VR simulator as shown in Fig 7, showing all trajectories in cluster 1. This is useful for the VR trainee who can visualise their performance in comparison to previous insertions.
Each insertion can be dynamically scored according to the DTW distance from the prototype, generating the cumulative error cost matrix (Fig. 9). Cumulative error is much lower in the best trajectory (left) and higher error in the worst trajectory (right). This graph could be useful for clinicians to visualise which stage of the procedure contained most error. In the cost matrices (Fig. 9), the worst insertion has total cost of 656 whereas the best insertion has lower cost of 112 (shown on axis in leftmost corner). This indicates that the worst insertion was approximately 5.8 times further from the ideal trajectory.
5.4 Generating envelopes for dynamic score monitoring
The upper and lower bounds were generated from the prototypes of each cluster. is not compatible with multivariate data, so the upper and lower envelopes and were generated for each axis individually (Fig. 10).
The summativeenvelopes were computed around the prototypes of each cluster, including the 4 best insertions in each cluster, according to their DTW distance from the cluster prototype. In Fig. 11, and were created by taking the four insertions (solid black lines) which were closest to the prototype of cluster 1. Then lower bounding algorithm was applied to generate the 4 upper envelopes (dashed red lines) and 4 lower envelopes (dashed green lines). The envelopes were combined by creating the summativeenvelope for cluster 1 shown in Fig. 11 as bold lines. Two unseen insertions (solid purple lines) are shown, which both fit within the summative envelope, so they would be classified as good insertions. Where subsections of the insertion require higher accuracy than others, the summativeenvelope intuitively produces a thinner tunnel in those areas.
5.5 Phase estimation with incomplete time series
The phase estimation algorithm predicted correctly 90% of the time which proportion of the insertion had taken place. Even with a bad insertion, this method can detect that it’s a bad insertion even with only the first 10% of the data, as the normalised distance is already much greater than that of the best insertions. During insertion, Fig. 12 shows that the completed part of an incomplete insertion has low normalised distance when compared to a similar percentage of the prototype, but a high distance when compared to a different percentage of the prototype. The percentages within Fig. 12 indicate the current percentage that has been completed of an incomplete insertion.
5.6 Dynamic scoring
The dynamic scoring begins by computing the distance between a new incomplete insertion and equal proportion of the prototype predicted by phase estimation. This results in a dynamic system able to score a partial insertion in realtime during insertion. Fig. 13 shows the dynamic scoring process applied to four new unseen partial insertions (green). All four new insertions stayed within the summativeenvelope upper (dark blue) and summativeenvelope lower (light blue) but some were closer to the cluster prototype (black) than others. The envelope has differing width in some parts, which represents the variation in accuracy required in some stages of the procedure.
The system can offer dynamic adaptive learning by adding new insertions into the training set. When a new insertion is combined with an existing cluster, it is possible either to recompute the prototype including the new insertions or combine the new insertion with the existing prototype with weighting. Unusual new insertions can be identified if they are allocated into existing clusters with known bad insertions. These can then be added into the training set as bad insertions.
5.7 Identifying the individual
The DTWNN classifier was applied to identify who performed each insertion. Of the 247 nonclinician insertions 95.4% were classified as nonclinicians. Some individuals were easier to classify than others, NonClinicianA was correctly classified 100% of the time due to high consistency (Fig. 14), whereas ClinicianA only 20%. Overall, the individual was classified correctly in 81.2% of insertions.
6 Discussions
This research proposed three methods for skill classification by analyzing multivariate time series recorded during a VR epidural training simulator procedure. Our data collection of 271 virtual reality epidural procedures included three trained clinicians from the NHS. (1) DTW1NN achieved 60% classification accuracy. (2) Nearest centroid classifier SoftDTW achieved 77.5%. (3) Within deep learning methods, ResNet achieved 85%, FCN 75%, CNN 72.5% and MCDCNN was inaccurate with 28.5%. Therefore the nearest centroid approach is competitive and only outperformed by the ResNet deep learning architecture. High performance of ResNet matches with previous TSC results applying DL methods (Fawaz et al., 2019). Insights into why SoftDTW was the optimum prototyping method for nearest centroid include that SoftDTW is differentiable and both its value and gradient can be computed with quadratic time/space complexity making it more suited to cluster time series under the DTW geometry. In this case ResNet outperformed SoftDTW leveraging favorable features of our 5dimensional multivariate time series dataset. Our dataset originated from epidural needle procedures which are relatively short compared to DaVinci surgeries which commonly produce longer, higher dimensional time series and future work could investigate whether SoftDTW or ResNet would perform similarly on those additional datasets. We proposed a new time series prototyping algorithm, DTWMP which is applied to create a prototype insertion for each cluster. New insertions are classed according to their DTW distance from the cluster prototype. The research developed dynamic methods to assess the score of a virtual reality task while the task is being completed. (1) Clustering is performed to divide the time series training set into groups according to the different techniques. (2) We propose the Summativeenvelopes algorithm, which takes the best time series from each cluster including the prototype to create a combined envelope using lower bounds. This can raise alarms in realtime if a time series exits the envelope tunnel. Our experiment showed that DTW1NN can recognise which trainee performed a virtual reality task in 81% of the cases. The developed methods enable trainees to view their score, clustering and summativeenvelope in realtime during insertion. The summativeenvelope reveals which parts of the procedure were abnormal. After reviewing the performance, the trainee’s technique can improve by repetitive practice until their motion becomes closer to the cluster prototype representing an expert, improving performance and skill of trainees. Over time, trajectory clustering algorithms can enable the measurement of consistency within a single trainee’s performances, and identify the trainee’s improvement of consistency over time. Future work can use these results for several purposes including: (i) Adaptation and automation of VR training based on the recorded data to customise VR training for individual requirements. (ii) Detecting the type of motion or hand gestures using classification. (iii) Recognising actions which increase the risk of injury to raise an alert. In future, the developed methods could be applied to invivo data, tracking devices or cameras monitoring surgeons with hospital patients as well as being applied to trajectories from VR training simulators.
7 Ethics
The Bournemouth University Ethics service has reviewed the study plan prior to initiation. The dataset does not contain identifying information such as names or personal information.
Acknowledgements.
This project was supported by the Royal Academy of Engineering (RAEng) under the Research Fellowship scheme awarded to Professor Neil Vaughan, also support from University of Exeter, University of Technology Sydney and Bournemouth University during the time of the research.Conflict of interest
The authors declare that they have no conflict of interest.
References
 Support vector machines improve the accuracy of evaluation for the performance of laparoscopic training tasks. Surgical Endoscopy 11 (1), pp. 170. Cited by: §1.1.
 An extensive comparative study of cluster validity indices. Pattern Recognition 46 (1), pp. 243–256. Cited by: §5.1.
 Using dynamic time warping to find patterns in time series.. In KDD workshop, Vol. 10(16), pp. 359–370. Cited by: §1.2.
 On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc. 35, pp. 99–109. Cited by: §1.2.
 The use of electromagnetic motion tracking analysis to objectively measure open surgical skill in the laboratorybased model. Journal of the American College of Surgeons 193 (5), pp. 479–485. Cited by: §1.1.
 Deep learning for time series classification: a review. Data Mining and Knowledge Discovery 33 (4), pp. 917–963. Cited by: §2.6, §2.6.
 Surgical motion analysis using discriminative interpretable patterns. Artificial intelligence in medicine 91, pp. 3–11. Cited by: §1.1.
 Jhuisi gesture and skill assessment working set (jigsaws): a surgical activity dataset for human motion modeling. In MICCAI Workshop: M2CAI, Vol. 3, pp. 3. Cited by: §1.1.
 Increasing dural tap rate: is this a national trend. Int J Obstet Anesth 16, pp. S17. Cited by: §1.
 Algorithms on stings, trees, and sequences: computer science and computational biology. Acm Sigact News 28 (4), pp. 41–60. Cited by: §1.3.

Deep residual learning for image recognition.
In
Proceedings of the IEEE conference on computer vision and pattern recognition
, pp. 770–778. Cited by: §2.6.  Fuzzy classification: towards evaluating performance on a surgical simulator. Studies in health technology and informatics 111, pp. 194–200. Cited by: §1.1, §1.
 Fuzzy clustering of time series data using dynamic time warping distance. Engineering Applications of Artificial Intelligence 39, pp. 235–244. Cited by: §1.3.
 Objective evaluation of expert and novice performance during robotic surgical training tasks. Surgical endoscopy 23 (3), pp. 590. Cited by: §1.1, §1.
 LSTM fully convolutional networks for time series classification. IEEE access 6, pp. 1662–1669. Cited by: §1.1.
 Hot sax: efficiently finding the most unusual time series subsequence. In Fifth IEEE International Conference on Data Mining (ICDM’05), pp. 8–pp. Cited by: §1.5.
 Exact indexing of dynamic time warping. Knowledge and information systems 7 (3), pp. 358–386. Cited by: §1.4.
 Mixed reality training for tank platoon leader communication skills. In 2017 IEEE Virtual Reality (VR), pp. 333–334. Cited by: §1.
 An indexbased approach for similarity search supporting time warping in large sequence databases. In Proceedings 17th International Conference on Data Engineering, pp. 607–614. Cited by: §1.4.
 Computer programs for hierarchical polythetic classification (“similarity analyses”). The Computer Journal 9 (1), pp. 60–64. Cited by: §1.2.

Generic and scalable framework for automated timeseries anomaly detection
. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1939–1947. Cited by: §1.5.  Faster retrieval with a twopass dynamictimewarping lower bound. Pattern recognition 42 (9), pp. 2169–2180. Cited by: §1.4.
 Metalearning for time series forecasting and forecast combination. Neurocomputing 73 (1012), pp. 2006–2016. Cited by: §1.5.
 Virtual reality robotic surgery warmup improves task performance in a dry laboratory environment: a prospective randomized controlled study. Journal of the American College of Surgeons 216 (6), pp. 1181–1192. Cited by: §1.
 Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440. Cited by: §2.6.
 On test and measures of group divergence: theoretical formulae. Journal and Proceedings of Asiatic Society of Bengal 26, pp. 541–588. Cited by: §1.2.
 TimeNet: pretrained deep recurrent neural network for time series classification. arXiv preprint arXiv:1706.08838. Cited by: §1.1.
 TSclust: an r package for time series clustering. Journal of Statistical Software 62 (1), pp. 1–43. Cited by: §1.1.

Classifying spatiotemporal object trajectories using unsupervised learning in the coefficient feature space
. Multimedia Systems 12 (3), pp. 227–238. Cited by: §1.5.  Supervised classification of psychomotor competence in minimally invasive surgery based on instruments motion analysis. Surgical endoscopy 28 (2), pp. 657–670. Cited by: §1.1.
 Kshape: efficient and accurate clustering of time series. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1855–1870. Cited by: §1.1, §1.3.
 A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition 44 (3), pp. 678–693. Cited by: §1.3, §1.3.

Incremental local outlier detection for data streams
. In 2007 IEEE symposium on computational intelligence and data mining, pp. 504–515. Cited by: §1.5.  Searching and mining trillions of time series subsequences under dynamic time warping. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 262–270. Cited by: §1.5.
 Everything you know about dynamic time warping is wrong. In Third workshop on mining temporal and sequential data, Vol. 32. Cited by: §4.1.
 A generic multilevel architecture for time series prediction. IEEE Transactions on Knowledge and Data Engineering 23 (3), pp. 350–359. Cited by: §1.5.
 Dynamic programming algorithm optimization for spoken word recognition. IEEE transactions on acoustics, speech, and signal processing 26 (1), pp. 43–49. Cited by: §1.2.
 Sparse hidden markov models for surgical gesture classification and skill evaluation. In International conference on information processing in computerassisted interventions, pp. 167–177. Cited by: §1.1.
 Does virtualreality training on orthopaedic simulators improve performance in the operating room?. In 2015 Science and Information Conference (SAI), pp. 51–54. Cited by: §1.
 Epidural pressure measurements from various bmi obstetric patients. Journal of Medical Devices 8 (3). Cited by: §2.1, §2.2.
 Parametric model of human body shape and ligaments for patientspecific epidural simulation. Artificial intelligence in medicine 62 (2), pp. 129–140. Cited by: §1, §2.1.
 Comparing and combining time series trajectories using dynamic time warping. Procedia Computer Science 96, pp. 465–474. Cited by: §1.1, §1.3, §2.5, §4.1.
 Use of a machine learning algorithm to classify expertise: analysis of hand motion patterns during a simulated surgical task. Academic Medicine 89 (8), pp. 1163–1167. Cited by: §1.1.
 The implementation of the working time directive and its impact on the nhs and health professionals, taskforce, independent working time regulations. London: The Royal College of Surgeons of England. Cited by: §1.
 Surgical skill evaluation by force data for endoscopic sinus surgery training system. In International Conference on Medical Image Computing and ComputerAssisted Intervention, pp. 44–51. Cited by: §1.1.
 Disk aware discord discovery: finding unusual time series in terabyte sized datasets. Knowledge and Information Systems 17 (2), pp. 241–262. Cited by: §1.5.
 A low cost flight simulator using virtual reality tools. IEEE Aerospace and Electronic Systems Magazine 26 (4), pp. 10–14. Cited by: §1.
 Efficient retrieval of similar time sequences under time warping. In Proceedings 14th International Conference on Data Engineering, pp. 201–208. Cited by: §1.4.
 Convolutional neural networks for time series classification. Journal of Systems Engineering and Electronics 28 (1), pp. 162–169. Cited by: §2.6.
 Exploiting multichannels deep convolutional neural networks for multivariate time series classification. Frontiers of Computer Science 10 (1), pp. 96–112. Cited by: §2.6.
Comments
There are no comments yet.