I Introduction
Participation in physical therapy and rehabilitation programs is critical for postoperative recovery and for the treatment of a wide array of musculoskeletal conditions. However, it is infeasible and economically unjustified to offer patients access to a clinician for every single rehabilitation session [1]. Accordingly, current healthcare systems around the world are organized such that an initial portion of rehabilitation programs is performed in an inpatient facility under direct supervision by a clinician, followed by a second portion performed in an outpatient setting, where patients perform a set of prescribed exercises in their own residence. Reports in the literature indicate that more than 90% of all therapy sessions are performed in a home-based setting [2]. Under these circumstances, patients are asked to record their daily progress and to visit the clinic periodically for progress assessment. Still, numerous medical sources report low levels of patient motivation and adherence to the recommended exercise regimens in home-based rehabilitation, leading to prolonged treatment times and increased healthcare cost [3], [4]. Although many different factors have been identified that contribute to the low compliance rates, the major factor is the absence of continuous feedback and timely oversight of patient exercises in a home environment by a healthcare professional [5].
Despite the development of a variety of new tools and devices in support of physical rehabilitation, such as robotic assistive systems [6], virtual reality and gaming interfaces [7], and Kinect-based assistants [2], [8], there is still a lack of versatile and robust systems for automatic monitoring and assessment of patient performance.
The article proposes a novel rehabilitation framework that encompasses formulation of metrics for quantifying movement performance, scoring functions for mapping the performance metrics into numerical scores of movement quality, and deep learning-based end-to-end models for encoding the relationship between movement data and quality scores.
The studied performance metrics are classified into model-less and model-based groups of metrics [9]. The model-less metrics employ distance functions, such as Euclidean and Mahalanobis distance, or dynamic time warping (DTW) [10] deviation between data sequences. The model-based metrics apply probabilistic approaches for modeling the movement data, and consequently, employ the log-likelihood for performance evaluation [11]. Next, the article investigates the effectiveness of deep autoencoder networks for dimensionality reduction of captured data. Further, we propose scoring functions for scaling the values of the studied performance metrics into the range [0, 1]. The resulting movement quality scores are employed as the ground truth for training the proposed deep neural networks (NNs) for rehabilitation applications.
The proposed framework compares the performance of convolutional neural networks (CNNs), recurrent neural networks (RNNs), and hierarchical neural networks (HNNs) [12]. The framework is validated on the University of Idaho – Physical Rehabilitation Movement Dataset (UI-PRMD) [13]. To the best of our knowledge, this is the first framework that employs deep NNs for assessment of rehabilitation exercises.
The main contributions of the paper are: (1) a novel framework for assessment of rehabilitation exercises using deep NNs; (2) a comparison of NN architectures for movement assessment, and the use of autoencoder NNs for dimensionality reduction of rehabilitation data; and (3) a comparison of performance metrics, and formulation of scoring functions for mapping the performance metrics into movement quality scores.
The article is organized as follows. The next section provides an overview of related work. Section III describes the problem and introduces the mathematical notation. The ensuing section surveys common performance metrics for movement assessment. The proposed framework for rehabilitation assessment is presented in Section V, covering performance metric, dimensionality reduction, scoring functions, and deep NN architectures. The validation of the proposed framework on a dataset of exercises is reported in Section VI. The last two sections discuss the results and summarize the paper.
II Related Work
II-A Human Movement Modeling
Conventional approaches for mathematical modeling and representation of human movements are broadly classified into two categories: top-down approaches that introduce latent states for describing the temporal dynamics of the movements, and bottom-up approaches that employ local features for representing the movements. Commonly used methods in the first category include Kalman filters [14], hidden Markov models [15], and Gaussian mixture models [16]. The main shortcomings of these methods originate from employing linear models for the transitions among the latent states (as in Kalman filters), or from adopting simple internal structures of the latent states (typical for hidden Markov models). The approaches based on extracting local features employ predefined criteria for identifying key points [17] or a collection of statistics of the movements (e.g., mean, standard deviation, mode, median) [18]. Such local features are typically motion-specific, which limits the ability to efficiently handle arbitrary spatiotemporal variations within movement data.
Recent developments in artificial NNs stirred significant interest in their application for modeling and analysis of human motions. Numerous works employed NNs for motion classification and applied the trained models for activity recognition, gait identification, gesture recognition, action localization, and related applications. NN-based motion classifiers utilizing different computational units have been proposed, including convolutional units [19], [20], long short-term memory (LSTM) recurrent units [21], [22][23], and combinations [24] or modifications of these computational units [25]. Also, NNs with different layer structures have been implemented, such as encoder-decoder networks [22], spatiotemporal graphs [26], and attention mechanism models [27], [28]. Besides the task of classification, a body of work in the literature focused on modeling and representation of human movements for prediction of future motion patterns [29], synthesis of movement sequences [22], and density estimation [11]. Conversely, little research has been conducted on the application of NNs for evaluation of movement quality, which can otherwise find use in various applications (physical rehabilitation being one of them).
II-B Movement Assessment
Quantifying the level of correctness in completing prescribed exercises is important for the development of tools and devices in support of home- and clinic-based rehabilitation. The movement assessment in existing studies is typically accomplished by comparing a patient's performance of an exercise to the desired performance by healthy participants.
Several studies in the literature on exercise evaluation employed machine learning methods to classify the individual repetitions into correct or incorrect classes of movements. Methods used for this purpose include Adaboost classifier [30], nearest neighbors [31], Bayesian classifier [32], and an ensemble of multi-layer perceptron NNs [33]. The outputs in these approaches are discrete class values of 0 or 1 (i.e., incorrect or correct repetition). However, these methods do not provide the capacity to detect varying levels of movement quality or identify incremental changes in patient performance over the program duration.
The majority of related studies employed distance functions for deriving movement quality scores. Concretely, Houmanfar et al. [18] used a variant of the Mahalanobis distance to quantify the level of correctness of rehabilitation movements, based on a calculated distance between patient-performed repetitions and a set of repetitions performed by a group of healthy individuals. Similarly, a body of work utilized the dynamic time warping (DTW) algorithm [10] for calculating the distance between a patient's performance and healthy subjects' performance [34]–[36]. The advantage of the distance functions is that they are not exercise-specific, and thus can be applied for assessment of new types of exercises. However, the distance functions also have shortcomings, because they do not attempt to derive a model of the rehabilitation data, and the distances are calculated at the level of individual time steps in the raw measurements.
Another body of research work utilized probabilistic approaches for modeling and evaluation of rehabilitation movements. Studies based on hidden Markov models [37], [38] and mixtures of Gaussian distributions [11] typically perform a quality assessment based on the likelihood that the individual sequences are being drawn from a trained model. Whereas the probabilistic models are advantageous in handling the variability due to the stochastic character of human movements, models with abilities for a hierarchical data representation can produce more reliable outcomes for movement quality assessment, and better generalize to new exercises.
III Preliminaries
III-A Problem Description
In a physical rehabilitation setting, a clinician prescribes a collection of rehabilitation exercises to a patient, by either performing the movements in front of the patient or physically moving the patient's body parts along the required paths. It is assumed that the clinician provides several demonstrations of the exercises in order to reinforce the perception of the required movements by the patient, in relation to the range and speed of movements, or other postural constraints in the performance of the movements. Then the patient is tasked to perform the prescribed set of exercises at home, according to the instructed rehabilitation regimen. It is assumed here that a daily rehabilitation session requires completing a series of exercises, where the patient is instructed to complete a certain number of repetitions of each exercise during each session.
This article applies machine learning algorithms for mathematical modeling and assessment of rehabilitation movements captured in the form of skeletal data, consisting of time-ordered sequences of position or angular displacements of the joints in the human body. The main task is to evaluate the level of correctness in performing the exercises. Toward this goal, we employ performance metrics to quantify the level of performance, and define scoring functions for assigning quality scores to the repetitions of an exercise. NNs are trained in a supervised manner by using the obtained quality scores as labels for the individual repetitions of an exercise.
III-B Notation
We assume that a sensory system is available to capture the skeletal data during the performance of rehabilitation exercises. It is further assumed that a control group of healthy subjects will perform multiple repetitions of each rehabilitation exercise. The healthy subjects will provide reference movement performance completed in a correct manner. The data acquired by the sensory system for the movements of one particular exercise performed by $S$ subjects is denoted by $\mathbf{X}$, and hereafter they are referred to as reference movements. The symbol $L_s$ is used for the number of repetitions of the exercise by the $s$-th subject. The combined data for all repetitions of the exercise by the $s$-th subject is denoted $\mathbf{X}_s$. Similarly, $L$ is used for the total number of all repetitions by the subjects, i.e., $L = \sum_{s=1}^{S} L_s$. Using the notation $\mathbf{X}_{s,l}$ for the collected data of the $l$-th repetition by the $s$-th subject, we have $\mathbf{X} = \{\mathbf{X}_s\}$ for $s \in \underline{S}$, where $\mathbf{X}_s = \{\mathbf{X}_{s,l}\}$ for $l \in \underline{L_s}$. For convenience, throughout the text the underscore symbol denotes a set of indices, e.g., $\underline{n} = \{1, 2, \ldots, n\}$ for any positive integer $n$. The data for each repetition is a temporal sequence of $T$ measurements, therefore $\mathbf{X}_{s,l} = (\mathbf{x}_{s,l}^{(1)}, \mathbf{x}_{s,l}^{(2)}, \ldots, \mathbf{x}_{s,l}^{(T)})$, where the superscripts are used for indexing the temporal order of the displacement vectors within the repetition. Furthermore, the individual measurement $\mathbf{x}_{s,l}^{(t)}$ for $t \in \underline{T}$ is a $D$-dimensional vector, consisting of the values for all joint displacements in the human body, i.e., $\mathbf{x}_{s,l}^{(t)} \in \mathbb{R}^{D}$.
Similarly, it is assumed that movement data will be collected from a group of patients, who may not be able to perform the exercises in a correct manner due to a musculoskeletal condition. The collected data for the patient group are referred to in the article as patient movements, and are denoted with the symbol $\mathbf{Y}$. By analogy to the introduced notation for the reference movements, $\mathbf{Y} = \{\mathbf{Y}_{s,l}\}$, where $\mathbf{Y}_{s,l}$ is the data of the $l$-th repetition by the $s$-th subject. Analogously, the repetition $\mathbf{Y}_{s,l}$ is comprised of a sequence of multidimensional vectors $\mathbf{y}_{s,l}^{(t)}$.
IV Survey of Performance Metrics for Movement Assessment
Performance metrics evaluate the level of correctness of each repetition with respect to the set of reference movements. This section surveys metrics that have been commonly used in prior studies on rehabilitation assessment [18], [34], [36], [37]. The metrics are classified into two categories: model-less and model-based [9]. The model-less metrics are calculated directly from low-level measurements of trajectories of body joints as acquired by the sensory system, without modeling the movement data. The model-based metrics evaluate the repetition data with respect to a model of the exercise, e.g., using probabilistic modeling approaches and employing the log-likelihood for performance evaluation. In addition, considering that existing datasets of movements contain data collected from multiple subjects, the performance metrics can be calculated on a between-subject or within-subject basis.
IV-A Model-less Performance Metrics
Euclidean distance: the Euclidean distance between two movement data and has been commonly used for movement assessment, and it is defined by
(1)  
for and .
Mahalanobis distance: The Mahalanobis distance between two sequences and is
(2)  
where is the covariance matrix of the set . In fact, the Euclidean distance represents a special case of the Mahalanobis distance when the covariance matrix is the identity matrix.
DTW distance: Dynamic time warping (DTW) is an algorithm for aligning time-series data via nonlinear warping of the temporal order of the data points in order to reduce a distance function between the time series. The most commonly used distance function in DTW is the Euclidean distance. The optimal alignment path in DTW is calculated by minimizing the sum of the cumulative distances between the two time series (e.g., ) and the minimum distances between the neighboring data points (here denoted ), i.e.,
(3)  
For a more detailed description of DTW, refer to [10].
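As an illustration of the DTW distance described above, the following minimal Python sketch computes the cumulative alignment cost between two sequences with the standard dynamic programming recursion, using the Euclidean distance between individual measurement vectors. The function names `frame_distance` and `dtw_distance` are hypothetical, and the unconstrained step pattern (diagonal, vertical, and horizontal moves) is an assumption rather than the exact formulation in (3).

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two measurement vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def dtw_distance(seq_x, seq_y):
    """Cumulative DTW cost between two sequences of vectors.

    Assumption: the classic step pattern is used, where each cell extends
    the cheapest of its three neighbors (diagonal, up, left).
    """
    n, m = len(seq_x), len(seq_y)
    inf = float("inf")
    # cost[i][j] is the minimal cumulative cost of aligning the first
    # i frames of seq_x with the first j frames of seq_y.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_x[i - 1], seq_y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1]
            )
    return cost[n][m]


# Two toy one-dimensional sequences that differ only in timing.
reference = [[0.0], [1.0], [2.0], [1.0], [0.0]]
slower = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0], [0.0]]
print(dtw_distance(reference, slower))
```

Because the warping absorbs differences in execution speed, two repetitions with the same shape but different timing receive a small DTW cost, whereas a frame-by-frame Euclidean comparison would penalize the timing offset.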
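IV-B Model-based Performance Metrics
GMM log-likelihood: A Gaussian mixture model (GMM) is a parametric probabilistic model for representing data with a mixture of Gaussian probability density functions [39]. GMMs are frequently used for modeling human movements. For a dataset consisting of multidimensional vectors $\mathbf{x}$, a GMM with $C$ Gaussian components has the form
$$p(\mathbf{x} \mid \lambda) = \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c) \tag{4}$$
where $\pi_c$, $\boldsymbol{\mu}_c$, and $\boldsymbol{\Sigma}_c$ are the mixing coefficient, mean, and covariance of the $c$-th Gaussian component, respectively. Subsequently, for a GMM with parameters $\lambda = \{\pi_c, \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c\}_{c=1}^{C}$, the log-likelihood for a repetition $\mathbf{Y} = (\mathbf{y}^{(1)}, \ldots, \mathbf{y}^{(T)})$ is used as a performance metric, and is calculated as
$$\log p(\mathbf{Y} \mid \lambda) = \sum_{t=1}^{T} \log \sum_{c=1}^{C} \pi_c \, \mathcal{N}(\mathbf{y}^{(t)} \mid \boldsymbol{\mu}_c, \boldsymbol{\Sigma}_c) \tag{5}$$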
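To make the log-likelihood metric in (5) concrete, the sketch below evaluates the log-likelihood of a short sequence under a small mixture model. It is a simplified illustration: diagonal covariance matrices are assumed for brevity, the model parameters are hand-picked hypothetical values rather than fitted ones, and the function names are not taken from the article.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance (simplifying assumption)."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(sequence, weights, means, variances):
    """Sum over time steps of the log of the mixture density."""
    total = 0.0
    for x in sequence:
        # Per-component terms: log(weight) + log N(x | mean, var).
        terms = [
            math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        # Log-sum-exp keeps the computation numerically stable.
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


# Hypothetical two-component model over 2-dimensional data.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]

close_repetition = [[0.1, -0.1], [2.9, 3.1]]
far_repetition = [[10.0, 10.0], [-8.0, 7.0]]
print(gmm_log_likelihood(close_repetition, weights, means, variances))
print(gmm_log_likelihood(far_repetition, weights, means, variances))
```

A repetition whose measurements lie near the component means obtains a higher (less negative) log-likelihood than one that lies far from them, which is the property exploited when the log-likelihood is used as a performance metric.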
IV-C Between- and Within-Subject Metrics
Between-subject metrics calculate the deviation between a repetition of an exercise and the set of reference repetitions of the exercise performed by a group of healthy subjects . Hence, the deviation for the model-less metrics is obtained as
(6) 
where is one of the metrics formulated in (1)–(3). For the model-based metric in (5), the deviation for the repetition and a GMM model of with parameters is
(7) 
Within-subject metrics quantify the deviation between a repetition of an exercise and a set of repetitions of the exercise performed by the same subject . The deviation between and is given by
(8) 
for and for the model-less metrics. The deviation for the model-based metric is also obtained from (7), where a separate GMM is used for encoding the repetition data for each subject , i.e., .
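The difference between the between-subject and within-subject cases can be sketched in a few lines of Python. The sketch assumes that the deviation of a repetition from a reference set is the mean of the pairwise metric values; this aggregation, the toy metric `abs_sum_distance`, and all function names are assumptions made for illustration and are not confirmed by the text.

```python
def abs_sum_distance(rep_a, rep_b):
    """Toy stand-in metric: sum of absolute differences between aligned samples."""
    return sum(abs(a - b) for a, b in zip(rep_a, rep_b))


def mean_deviation(repetition, reference_set, metric):
    """Average metric value between one repetition and every reference repetition."""
    values = [metric(repetition, ref) for ref in reference_set]
    return sum(values) / len(values)


def between_subject_deviation(repetition, reps_by_subject, metric):
    """Compare against the pooled reference repetitions of all subjects."""
    pooled = [rep for reps in reps_by_subject.values() for rep in reps]
    return mean_deviation(repetition, pooled, metric)


def within_subject_deviation(repetition, reps_by_subject, subject, metric):
    """Compare only against the reference repetitions of the same subject."""
    return mean_deviation(repetition, reps_by_subject[subject], metric)


# Hypothetical reference repetitions (one-dimensional for brevity).
healthy = {
    "s1": [[0.0, 1.0, 0.0], [0.0, 1.2, 0.0]],
    "s2": [[0.0, 2.0, 0.0], [0.0, 2.2, 0.0]],
}
patient_rep = [0.0, 1.1, 0.0]  # repetition performed by subject s1
print(between_subject_deviation(patient_rep, healthy, abs_sum_distance))
print(within_subject_deviation(patient_rep, healthy, "s1", abs_sum_distance))
```

In this toy example the within-subject deviation is smaller than the between-subject one, because the reference repetitions of the same subject vary less than the pooled repetitions of all subjects.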
V Proposed Method
The proposed framework for assessing rehabilitation exercises encompasses dimensionality reduction, performance metrics, scoring functions, and NN models. A block diagram of the envisioned framework is depicted in Fig. 1. The joint coordinate data measured by the sensory system are processed via dimensionality reduction, performance metric, and scoring function to obtain movement quality scores that are used for training an NN model. The trained NN model is afterward used to automatically generate movement quality scores for input movement data acquired by the sensory system.
V-A Dimensionality Reduction
Outputs from the sensory systems for capturing human joint displacements are high-dimensional data, typically ranging between and dimensions. Dimensionality reduction of recorded data is often considered an essential step in processing human movements, in order to suppress unimportant, redundant, or highly correlated dimensions. The aim is to project the data into a lower-dimensional representation. A common approach for dimensionality reduction of human movement data is maximum variance [40], which simply retains the dimensions with the largest variance and discards the remaining dimensions. Principal component analysis (PCA) and its variants [41] are also widely used for reducing the dimensionality of movement data, where a matrix containing the leading eigenvectors corresponding to the largest eigenvalues of the covariance matrix is used for projecting the data into a lower-dimensional space. Although PCA is one of the most common approaches for dimensionality reduction in general, it employs a linear mapping of high-dimensional data into a lower-dimensional representation. Likewise, the shortcomings of maximum variance originate from its simplicity.
In the proposed framework, we introduce autoencoder NNs [42] for dimensionality reduction. Autoencoder NNs are a nonlinear technique for dimensionality reduction, which allows extracting richer data representations in comparison to the linear techniques (such as PCA). Furthermore, deep autoencoder NNs, created by stacking multiple consecutive layers of hidden neurons, can additionally increase the representational capacity of the network.
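For comparison with the autoencoder approach, the maximum variance baseline mentioned above is simple enough to sketch directly. The example below keeps the `k` dimensions with the largest variance; the function names are hypothetical, and computing the variance over a single sequence (rather than over a whole dataset) is an assumption made to keep the sketch short.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def max_variance_reduce(sequence, k):
    """Keep the k dimensions with the largest variance over time.

    Returns the reduced sequence and the indices of the retained dimensions.
    """
    num_dims = len(sequence[0])
    variances = [
        variance([frame[d] for frame in sequence]) for d in range(num_dims)
    ]
    keep = sorted(range(num_dims), key=lambda d: variances[d], reverse=True)[:k]
    keep.sort()  # preserve the original ordering of the retained dimensions
    reduced = [[frame[d] for d in keep] for frame in sequence]
    return reduced, keep


# Toy sequence: dimension 0 is constant, dimension 1 varies strongly,
# dimension 2 varies slightly.
toy_sequence = [
    [1.0, 0.0, 5.0],
    [1.0, 10.0, 5.5],
    [1.0, -10.0, 4.5],
]
reduced, kept = max_variance_reduce(toy_sequence, k=2)
print(kept)
print(reduced)
```

The selection ignores correlations between dimensions entirely, which illustrates why the simplicity of this method is also its main limitation.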
Autoencoders are an unsupervised form of NNs designed to learn an alternative representation of input data, through a process of data compression and reconstruction. The data processing involves an encoding step of compressing input data through one or multiple hidden layers, followed by a decoding step of reconstructing the output from the encoded representation through one or multiple hidden layers. If $\Phi$ denotes a class of mapping functions from $\mathbb{R}^{D}$ to $\mathbb{R}^{d}$, and $\Psi$ is a class of mapping functions from $\mathbb{R}^{d}$ to $\mathbb{R}^{D}$, then for any function $\phi \in \Phi$ and $\psi \in \Psi$, the encoder portion projects an input $\mathbf{x}$ into a lower-dimensional representation $\phi(\mathbf{x})$ (referred to as a code), and the decoder portion converts the code into an output $\psi(\phi(\mathbf{x}))$. Autoencoders are trained to find functions $\phi^{*}$ and $\psi^{*}$ which minimize the mean squared deviation between the input data and output data over the $N$ input vectors, i.e.,
$$\{\phi^{*}, \psi^{*}\} = \arg\min_{\phi \in \Phi,\, \psi \in \Psi} \frac{1}{N} \sum_{n=1}^{N} \left\lVert \mathbf{x}_n - \psi(\phi(\mathbf{x}_n)) \right\rVert^{2} \tag{9}$$
A graphical representation of the adopted architecture for the autoencoder network is presented in Fig. 2. The encoder portion consists of three intermediate layers of LSTM recurrent units with 30, 10, and 4 computational units, and the corresponding decoder portion has three intermediate layers of LSTM units with 10, 30, and 117 computational units, respectively. The input time-series data are 117-dimensional vectors of joint coordinates. The code representation of the proposed network is a temporal sequence of 4-dimensional vectors.
V-B Performance Metric
We adopt the metric based on GMM log-likelihood given in (5) and (7). This choice stems from the demonstrated capacity of statistical methods to encode stochastic variability in human movements, which results in improved ability by the model-based metrics to handle spatiotemporal variations in rehabilitation data.
V-C Scoring Functions
Scoring functions map the value of the performance metrics into a movement quality score in the range between 0 and 1. The resulting movement quality scores play a dual role in the proposed framework. First, in a real-world exercise assessment setting, the quality scores allow for intuitive understanding of the calculated values of the used performance metric. For instance, a movement quality score of 88% presented to a patient is easy to understand, and it can also enable the patient to self-monitor his/her progress toward functional recovery based on received quality scores over a period of time. Second, the quality scores are used here for supervised training of the studied NN models.
For a sequence of metric values of the reference movements and a sequence related to the patient movements, we propose the following scoring functions:
(10) 
(11) 
(12) 
In the above equations, , , , , and to are data-specific parameters. The proposed scoring functions in (10) and (11) are monotonically decreasing, and are designed to preserve the distribution of the values of the performance metrics. The values for the reference movements are scaled by in (10) or by the maximum value in (11) to ensure scores close to 1 for inputs approximately in the range or no larger than , respectively. Similarly, for the patient movements the scoring functions in (10) and (11) are designed to preserve their distribution in mapping the metric values to quality scores. The scoring function in (12) scales the metric values based on the absolute distance from the mean of the reference movements. Experimental evaluation of the scoring functions is provided in Section VI.
V-D Neural Networks
Three different deep NN architectures are developed, implemented, and evaluated in this work: CNNs, RNNs, and HNNs. For the networks we performed a grid search with various combinations of layers, numbers of layers, computational units per layer, size of convolutional filters, batch size, and other related hyperparameters. For all models, mean squared error was selected as the cost function, and the Adam optimizer was employed. A batch size of 5 was applied, with early stopping regularization. Inputs are 117-dimensional sequences of joint displacements corresponding to single repetitions of an exercise. The output layer has linear activations, and outputs a numerical movement quality score for an input repetition.
The network architectures are presented in Table I, and are also illustrated in Fig. 3. The adopted CNNs contain three convolutional layers, two fully connected hidden layers, and an output layer. They utilize strided one-dimensional convolutional filters, leaky ReLU activations, and dropout of 0.2.
The RNN models with recurrent architecture consist of two bidirectional layers of LSTM units, one intermediate fully connected layer, and an output layer. The recurrent layers use a recurrent dropout of 0.5, and are also followed by a dropout layer with a rate of 0.25.
The HNNs [12] are based on a hierarchical model that employs five recurrent subnetworks that take as inputs joint displacements of the left arm, right arm, left leg, right leg, and torso, respectively. The outputs from the five subnetworks are progressively merged through a series of layers into a unified representation. Such hierarchical organization of the layers allows low-level spatial information from joint coordinates to be exploited for obtaining a high-level representation of how the body parts move in accomplishing the exercises. The network employs three bidirectional layers with simple recurrent units, one bidirectional layer with LSTM units, and an output layer. The structure of the network was selected via a grid search and is displayed in Table I.
Network  Layers^{a}

CNNs  Conv1D (60, 5, LR, D:0.2, St:2); Conv1D (30, 3, LR, D:0.2, St:2); Conv1D (10, 3, LR, D:0.2); FC (200, LR, D:0.2); FC (100, LR, D:0.2); FC (1, L)
RNNs  BiLSTM (20, RD:0.5, D:0.25); FC (30, LR, TH, D:0.5); BiLSTM (10, RD:0.5, D:0.25); FC (1, L)
HNNs  BiRNN (10, TH, RD:0.5) * 5; BiRNN (20, TH, RD:0.5) * 4; BiRNN (20, TH, RD:0.5) * 2; BiLSTM (30, TH, RD:0.5); FC (1, L)
^{a}Acronyms: Conv1D – Layer with one-dimensional convolutional units with kernels of size , FC – Fully connected layer, BiLSTM – Layer with bidirectional LSTM units, BiRNN – Layer with bidirectional simple recurrent units, LR – Leaky ReLU activation, D – Dropout, RD – Recurrent dropout, St – Stride, L – Linear activation, TH – Tanh activation, * – Merged layers.
Data augmentation is crucial in image processing with NNs, where applying various transformations on input images leads to improved performance and model robustness. Consequently, we posit that data augmentation is important for processing movement data with NNs, particularly because existing datasets of human movement have relatively small size. For this purpose, data augmentation to the movement sequences is performed by introducing additive noise
(13) 
for and , where represents a random number drawn from a uniform probability distribution , and is a constant parameter.
VI Experimental Results
VI-A Dataset
For validation of the presented framework, we created the UI-PRMD dataset [13]. The dataset comprises skeletal data collected from 10 healthy subjects. Each subject completed 10 repetitions of 10 rehabilitation exercises, listed in Table II. The data were acquired with a Vicon optical tracking system, and consist of 117-dimensional sequences of angular joint displacements. The subjects performed the exercises both in a correct manner, hereafter referred to as correct movements, and in an incorrect manner, i.e., simulating performance by patients with musculoskeletal constraints, hereafter referred to as incorrect movements. A detailed description of the UI-PRMD dataset is provided in [13].
Order  Exercise 

E1  Deep squat 
E2  Hurdle step 
E3  Inline lunge 
E4  Side lunge 
E5  Sit to stand 
E6  Standing active straight leg raise 
E7  Standing shoulder abduction 
E8  Standing shoulder extension 
E9  Standing shoulder internal external rotation 
E10  Standing shoulder scaption 
VI-B Comparison of Performance Metrics
In this section, the performance metrics presented in Section IV (Euclidean, Mahalanobis, and DTW distances, and GMM log-likelihood) are evaluated on the UI-PRMD dataset.
Data scaling: To compare the metrics on the same basis, their values are first linearly scaled to the same range. In this study the range was selected based on an empirical understanding of the data. For obtained values of the performance metrics for repetitions of the correct movements denoted and for the metrics of the incorrect movements the following scaling functions were used
(14) 
where denote the scaled values of the performance metrics, , and .
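A linear scaling of this kind can be sketched as a min-max transformation. The snippet below is a minimal illustration that assumes the minimum and maximum are taken over the pooled values of the correct and incorrect movements, so that both groups share one scale; the function name `scale_metrics` and the target range passed in the example are hypothetical and are not the values used in the study.

```python
def scale_metrics(correct, incorrect, low, high):
    """Linearly map pooled metric values into the range [low, high].

    Assumption: one shared minimum and maximum, computed over both groups,
    so that the relative positions of all values are preserved.
    """
    pooled = list(correct) + list(incorrect)
    min_val, max_val = min(pooled), max(pooled)
    span = max_val - min_val
    if span == 0:
        raise ValueError("all metric values are identical; scaling is undefined")

    def scale(value):
        return low + (value - min_val) * (high - low) / span

    return [scale(v) for v in correct], [scale(v) for v in incorrect]


# Hypothetical metric values and target range.
correct_values = [0.0, 5.0]
incorrect_values = [10.0]
scaled_correct, scaled_incorrect = scale_metrics(
    correct_values, incorrect_values, low=1.0, high=5.0
)
print(scaled_correct, scaled_incorrect)
```

Because the mapping is linear and shared by both groups, the ordering and relative spacing of the metric values are unchanged, which keeps metrics with different native units comparable.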
The scaled values of the Euclidean distance for exercises E1 and E2 are shown in Fig. 4. Green circle markers are used for the repetitions of the correct movements, whereas the red squares symbolize the repetitions of the incorrect movements. Note that inconsistent data (associated with measurement errors or subjects performing the exercise with their left arm/leg in a set of mostly right arm/leg exercises) were manually removed from the original dataset, resulting in fewer than 100 repetitions per subject. For example, there are 90 correct and 90 incorrect movements for E1 in Fig. 4(a), and 55 correct and 55 incorrect movements for E2 in Fig. 4(b).
Separation degree: For comparison of the scaled values of the performance metrics we propose the concept of separation degree. Specifically, for any positive real numbers their separation degree is defined as . The separation degree between two positive sequences and is defined by
(15) 
Values of the separation degree close to -1 or 1 indicate good separation between the two sequences. Conversely, for values of the separation degree close to 0, the sequences do not separate well and they are almost mixed together.
When applied to the values of the distance metrics, a larger separation degree indicates a greater ability of the used metric to differentiate between correct and incorrect repetitions of an exercise. For instance, in Fig. 4(b) one can observe a clearer differentiation between the correct and incorrect movements, in comparison to Fig. 4(a). This results in a larger value of the separation degree for the repetitions of exercise E2: the calculated values are 0.384 for E1 and 0.497 for E2.
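The separation degree can be illustrated with a short sketch. The code assumes a pairwise definition of the form (x - y) / (x + y) for positive numbers, averaged over all pairs drawn from the two sequences; this form yields values between -1 and 1 for positive inputs, but it is an assumption for illustration and is not confirmed by the text here. The function names and sample values are hypothetical.

```python
def pair_separation(x, y):
    """Assumed pairwise separation of two positive numbers, in (-1, 1)."""
    return (x - y) / (x + y)


def separation_degree(xs, ys):
    """Mean pairwise separation over all pairs (x, y) from the two sequences."""
    total = sum(pair_separation(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


# Hypothetical scaled metric values.
incorrect_scaled = [3.0, 4.0, 5.0]
correct_scaled = [1.0, 1.5, 2.0]
mixed_a = [2.0, 3.0]
mixed_b = [3.0, 2.0]

print(separation_degree(incorrect_scaled, correct_scaled))
print(separation_degree(mixed_a, mixed_b))
```

Under this assumed definition, two groups whose values barely overlap produce a degree well away from zero, while two groups with the same values produce a degree of zero.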
The values of the separation degree for the four studied performance metrics are presented in Table III. Each cell in the table corresponds to the average separation degree over the 10 exercises in the dataset. The shown values are the mean, with the standard deviation in parentheses. For the comparison, scaled values of the metrics according to (14) are used. The table also compares the values of the metrics for the cases of raw 117-dimensional data, and low-dimensional data obtained with the presented methods of maximum variance, PCA, and autoencoder NNs. The largest values for the separation degree are indicated in each row with a bold font.
Metric  Euclidean distance  Mahalanobis distance  DTW distance  GMM log-likelihood

Between-subject
D^{a}=117  0.445 (0.087)  0.195 (0.152)  0.487 (0.063)  n/a
D=3 (MV)  0.309 (0.101)  0.063 (0.130)  0.310 (0.100)  0.344 (0.049)
D=3 (PCA)  0.296 (0.103)  0.108 (0.169)  0.265 (0.093)  0.360 (0.060)
D=4 (AE)  0.423 (0.092)  0.229 (0.102)  0.427 (0.094)  0.515 (0.106)
Within-subject
D=117  0.568 (0.058)  0.441 (0.118)  0.570 (0.059)  n/a
D=3 (MV)  0.472 (0.048)  0.325 (0.118)  0.455 (0.053)  0.471 (0.098)
D=3 (PCA)  0.508 (0.032)  0.322 (0.169)  0.501 (0.031)  0.518 (0.057)
D=4 (AE)  0.582 (0.057)  0.474 (0.133)  0.574 (0.060)  0.603 (0.073)
^{a}D: data dimensions; MV: maximum variance; PCA: principal component analysis; AE: autoencoder neural networks.
In conclusion, the GMM log-likelihood metric applied to low-dimensional data obtained with the autoencoder NN resulted in the largest separation between the correct and incorrect movements for both the between-subject and within-subject cases. The within-subject case provides improved separation because the repetitions performed by the same subject are characterized by a lower level of variability. The value of the GMM log-likelihood is not provided for the 117-dimensional data because GMM delivers better performance on low-dimensional data. Furthermore, the performance of the Euclidean and DTW distances in Table III is comparable, and better than that of the Mahalanobis distance. Also, the autoencoder NN lost less information in compressing the high-dimensional data sequences in comparison to maximum variance and PCA, because the separation degree values for all metrics using autoencoders are very close to the corresponding metric values of the 117-dimensional data without dimensionality reduction. In implementing GMM on the dataset, the number of Gaussian components was set to 6.
VI-C Comparison of Neural Networks
Next, the performance of the presented NN architectures is evaluated. For training the networks, the movement quality scores based on the GMM log-likelihood calculated with autoencoder-reduced data are employed. Only the between-subject case is considered, because for the within-subject case the number of repetitions per subject is too low for NN training.
Scoring functions: Among the scoring functions presented in Section V-C, the one that provided the best results for the used dataset is used to calculate the quality scores. The values of its parameters are empirically selected as and . For example, Fig. 5 depicts the values of the log-likelihood and the corresponding performance scores for exercise E1 (i.e., deep squat). The scores for the correct movements are shown in Fig. 5(b) and have values close to 1, and the majority of the scores for the incorrect movements are in the range between 0.7 and 0.9.
NN movement assessment: Inputs to the NNs are pairs of repetitions data and quality scores. The networks are trained in a supervised regression manner, where the output is a predicted value of the movement quality for an input repetition (i.e., the quality scores can be considered equivalent to class labels in classification tasks). Also, note that inputs to the network are the raw measurement data with 117 dimensions.
Each network model is run five times, and we report the average absolute deviation between the input quality scores and the quality scores predicted by the network. The results for the 10 exercises in the UIPRMD dataset are displayed in Table IV. Lower values of the deviation indicate low errors by the network model in predicting the quality scores for input data. Accordingly, CNNs outperformed RNNs and HNNs on most of the exercise data. The results are further discussed in the subsequent section.
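The reported error measure can be sketched as follows. This is an illustrative reading of the description above, not the authors' code: the function names and the toy scores are hypothetical, and the article does not state on which subset of the data the deviation is computed.

```python
def mean_absolute_deviation(target_scores, predicted_scores):
    """Average of |target - prediction| over all repetitions."""
    if len(target_scores) != len(predicted_scores):
        raise ValueError("score lists must have the same length")
    differences = [abs(t - p) for t, p in zip(target_scores, predicted_scores)]
    return sum(differences) / len(differences)


def average_over_runs(target_scores, predictions_per_run):
    """Average the deviation over several independent training runs."""
    deviations = [
        mean_absolute_deviation(target_scores, predictions)
        for predictions in predictions_per_run
    ]
    return sum(deviations) / len(deviations)


# Toy example with two runs (the article uses five runs per model).
targets = [1.0, 0.9, 0.8]
runs = [
    [1.0, 0.9, 0.8],  # perfect predictions: deviation 0
    [0.9, 0.9, 0.9],  # deviations 0.1, 0.0, 0.1
]
print(average_over_runs(targets, runs))
```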
A performance example by the used CNN for exercise E1 is depicted in Fig. 6. The set of 90 correct and 90 incorrect repetitions was split approximately 0.7/0.3 into a training set of 124 and a validation set of 56 repetitions. The input scores and predicted scores for the training and validation sets are shown in Figs. 6(a) and 6(b), respectively. In the two subfigures, the first half of the scores are for the correct sequences and have values close to one, and the second half of the scores pertain to the incorrect sequences and have lower values. Overall, the network predictions are close to the assigned quality scores for almost all data instances.
Exercise  CNN      RNN      HNN
E1        0.01357  0.01670  0.03010
E2        0.02953  0.04934  0.07742
E3        0.04141  0.09382  0.13766
E4        0.01640  0.01609  0.03580
E5        0.01300  0.02536  0.06367
E6        0.02349  0.02166  0.04676
E7        0.03346  0.04090  0.19280
E8        0.02905  0.04590  0.07260
E9        0.02495  0.04419  0.06508
E10       0.03667  0.05198  0.16009
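The statement that CNNs outperformed the other architectures on most exercises can be checked directly against the tabulated deviations. The short script below copies the values from the table and counts, per architecture, the exercises on which it achieved the lowest deviation; the variable names are illustrative only.

```python
# Average absolute deviations copied from the table above: (CNN, RNN, HNN).
deviations = {
    "E1": (0.01357, 0.01670, 0.03010),
    "E2": (0.02953, 0.04934, 0.07742),
    "E3": (0.04141, 0.09382, 0.13766),
    "E4": (0.01640, 0.01609, 0.03580),
    "E5": (0.01300, 0.02536, 0.06367),
    "E6": (0.02349, 0.02166, 0.04676),
    "E7": (0.03346, 0.04090, 0.19280),
    "E8": (0.02905, 0.04590, 0.07260),
    "E9": (0.02495, 0.04419, 0.06508),
    "E10": (0.03667, 0.05198, 0.16009),
}
models = ("CNN", "RNN", "HNN")

# Count how often each architecture has the lowest deviation.
wins = {name: 0 for name in models}
for exercise, values in deviations.items():
    best_index = values.index(min(values))
    wins[models[best_index]] += 1

print(wins)  # {'CNN': 8, 'RNN': 2, 'HNN': 0}
```

The CNN has the lowest deviation on eight of the ten exercises, and the RNN is marginally better on E4 and E6.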
Next, the effect of data augmentation on the employed NNs is investigated. Multiple values were adopted for the parameter in (13), resulting in differing levels of added noise. By adding random noise to the 90 correct instances for exercise E1, 360 new instances were synthetically created. After the data augmentation, movement quality scores were calculated for the generated sequences, and the NNs were trained on the combined set of original and synthetic data. The results are displayed in Table V. The results indicate low errors in predicting the movement quality scores on the augmented dataset for all three network architectures.
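The augmentation step can be illustrated with a minimal sketch. The assumptions here are substantial and are not taken from the article: zero-mean Gaussian noise is used in place of the exact form of (13), four noise levels are assumed because 360 generated instances from 90 originals implies four noisy copies per repetition, and the specific values in `noise_levels` as well as the function name are hypothetical.

```python
import random


def augment_with_noise(sequences, noise_levels, seed=0):
    """Create one noisy copy of every sequence for every noise level.

    Assumes zero-mean Gaussian noise added independently to each value;
    the exact noise model of the article is not reproduced here.
    """
    rng = random.Random(seed)
    augmented = []
    for sigma in noise_levels:
        for sequence in sequences:
            noisy = [[x + rng.gauss(0.0, sigma) for x in frame] for frame in sequence]
            augmented.append(noisy)
    return augmented


# 90 toy "repetitions", each with 5 frames of 3 values (placeholder data).
originals = [[[0.1 * i, 0.2, 0.3] for _ in range(5)] for i in range(90)]
noise_levels = (0.01, 0.02, 0.03, 0.04)  # hypothetical values

synthetic = augment_with_noise(originals, noise_levels)
print(len(originals), len(synthetic))  # 90 360
```

In the article, quality scores are then recomputed for the generated sequences before the networks are trained on the combined set.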
VII Discussion
The article introduces a novel framework for the assessment of rehabilitation exercises via deep NNs. The framework includes performance metrics, scoring functions, and NN models.
Data type       CNN      RNN      HNN
Original data   0.01357  0.01670  0.03010
Augmented data  0.00656  0.00688  0.01404
Common metrics for quantifying the level of consistency in captured rehabilitation movements are surveyed and compared. The metrics include a model-less category that calculates distances between low-level measurement data points, and a model-based group that employs high-level latent states for estimating the data consistency. The studied metrics in the former group are the Euclidean, Mahalanobis, and DTW distances, and in the latter group the GMM log-likelihood. The concept of separation degree is proposed for metric comparison. GMM log-likelihood outperformed the model-less metrics on the UI-PRMD dataset. Such results confirm our hypothesis that efficient movement assessment is strongly predicated on the provision of efficient models of human movements. Probabilistic approaches, such as the used GMM approach, have improved ability to handle the inherent variability and measurement uncertainty in human movement data, in comparison to the model-less approaches.
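As an illustration of the model-less category, the sketch below implements the textbook dynamic-programming form of the DTW distance. It is simplified to one-dimensional sequences with an absolute-difference local cost; the article applies distances to multidimensional joint data, and the function name is illustrative.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both sequences
            )
    return cost[n][m]


# The same movement performed at a different pace aligns with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
# A constant offset cannot be removed by warping.
print(dtw_distance([0, 0], [1, 1]))  # 2.0
```

Unlike the Euclidean distance, DTW tolerates differences in timing and sequence length, but it still compares raw data points rather than a learned model of the movement.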
We compared the performance of PCA and maximum variance approaches for dimensionality reduction of human movements to a proposed approach that employs autoencoder NNs. As expected, the nonlinear neuron activation functions in autoencoders provided richer representational capacity for projecting the data into a lower-dimensional space, in comparison to the linear technique of PCA and the simple concept of maximum variance.
The article introduces scoring functions for mapping the values of the performance metrics into quality scores in the range [0, 1]. The quality scores are afterward employed for training the NN models.
The performance of three deep NN architectures is investigated: CNNs, RNNs, and HNNs. The networks are trained via supervised regression, where for inputs comprising repetitions of rehabilitation exercises the inferred outputs are quality scores. The best performance was recorded by the CNN models. Although RNNs and HNNs employ recurrent layers that are specifically designed for processing sequential data, the results were not too surprising to our team for two key reasons: (1) the employed dataset is fairly small, consisting of fewer than 200 repetitions per exercise, and (2) a growing body of work reports improved performance by CNNs on time-series and movement data [43], [44]. More specifically, recurrent networks utilize a larger number of parameters, and thus they are more prone to overfitting on smaller datasets. Further, we observed improved network performance with additive noise for data augmentation, which may be of interest in related biomedical studies, where data collection is expensive.
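The size of the improvement from data augmentation can be read off the reported deviations for exercise E1. The following short calculation uses only the tabulated values for original and augmented data; the variable names are illustrative.

```python
# Average absolute deviations for exercise E1, copied from the reported results.
original = {"CNN": 0.01357, "RNN": 0.01670, "HNN": 0.03010}
augmented = {"CNN": 0.00656, "RNN": 0.00688, "HNN": 0.01404}

# Relative reduction of the deviation, in percent, for each architecture.
reduction = {
    name: 100.0 * (original[name] - augmented[name]) / original[name]
    for name in original
}

for name, value in reduction.items():
    print(f"{name}: {value:.1f}% lower deviation with augmented data")
```

For all three architectures the deviation drops by more than half, while the ranking of the architectures (CNN, then RNN, then HNN) is unchanged.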
Our presented research has several limitations. First, the dataset used for validation of the approach comprises rehabilitation exercises collected with healthy subjects, rather than patients in rehabilitation programs. Second, we do not have a ground truth assessment of the movement quality by clinicians. For example, a movement quality score of 0.8 does not translate into a meaningful clinical score. Third, the approach is validated on measurements acquired with an expensive optical motion capturing system, whereas for practical applications in home-based rehabilitation we envision using an inexpensive color/depth camera (e.g., Kinect-like) for motion capture.
In future work, we will attempt to address the above-listed shortcomings of this study. Specifically, we will focus on creating a dataset of rehabilitation exercises performed by patients and labeled by a group of clinicians who will assign quality scores. In addition, we will investigate the implementation of advanced NN architectures for hierarchical spatio-temporal modeling of rehabilitation data.
VIII Conclusion
The article proposes a deep learning-based framework for assessment of rehabilitation exercises. To this end, an autoencoder NN is employed for reducing the dimensionality of skeleton data captured during the performance of repetitions of rehabilitation exercises. Further, the low-dimensional data representation is probabilistically modeled with a GMM and the log-likelihood of the movement repetitions is utilized as a metric for performance evaluation. A scoring function maps the values of the performance metric into movement quality scores. For each rehabilitation exercise a deep NN model is trained to learn the relationship between the movement data and quality scores, and generate quality scores for unseen repetitions of rehabilitation exercises. The experimental results indicate that the movement quality scores generated by the proposed deep learning-based framework closely follow the ground truth quality scores for the movements, and confirm the potential of deep learning models for assessing rehabilitation exercises.
References
 [1] S. R. Machlin, J. Chevan, W. W. Yu, and M. W. Zodet, Determinants of utilization and expenditures for episodes of ambulatory physical therapy among adults, Phys Ther, vol. 91, no. 7, pp. 1018–1029, Jul. 2011.
 [2] R. Komatireddy, A. Chokshi, J. Basnett, M. Casale, D. Goble, and T. Shubert, Quality and Quantity of Rehabilitation Exercises Delivered By A 3D Motion Controlled Camera: A Pilot Study, Int J Phys Med Rehabil, vol. 2, no. 4, Aug. 2014.
 [3] S. F. Bassett and H. Prapavessis, Home-based physical therapy intervention with adherence-enhancing strategies versus clinic-based management for patients with ankle sprains, Phys Ther, vol. 87, no. 9, pp. 1132–1143, Sep. 2007.
 [4] K. Jack, S. M. McLean, J. K. Moffett, and E. Gardiner, Barriers to treatment adherence in physiotherapy outpatient clinics: A systematic review, Man Ther, vol. 15, no. 3, pp. 220–228, Jun. 2010.
 [5] K. K. Miller, R. E. Porter, E. DeBaun-Sprague, M. Van Puymbroeck, and A. A. Schmid, Exercise after Stroke: Patient Adherence and Beliefs after Discharge from Rehabilitation, Top Stroke Rehabil, vol. 24, no. 2, pp. 142–148, 2017.
 [6] P. Maciejasz, J. Eschweiler, K. Gerlach-Hahn, A. Jansen-Troy, and S. Leonhardt, A survey on robotic devices for upper limb rehabilitation, Journal of NeuroEngineering and Rehabilitation, vol. 11, no. 1, p. 3, Jan. 2014.
 [7] L. V. Gauthier et al., Video Game Rehabilitation for Outpatient Stroke (VIGoROUS): protocol for a multi-center comparative effectiveness trial of in-home gamified constraint-induced movement therapy for rehabilitation of chronic upper extremity hemiparesis, BMC Neurology, vol. 17, no. 1, p. 109, Jun. 2017.
 [8] D. Antón, A. Goñi, A. Illarramendi, J. J. Torres-Unda, and J. Seco, KiReS: A Kinect-based telerehabilitation system, in 2013 IEEE 15th International Conference on e-Health Networking, Applications and Services (Healthcom 2013), 2013, pp. 444–448.
 [9] A. Vakanski, J. M. Ferguson, and S. Lee, Metrics for Performance Evaluation of Patient Exercises during Physical Therapy, International Journal of Physical Medicine & Rehabilitation, vol. 05, no. 03, 2017.
 [10] H. Sakoe, Dynamic programming algorithm optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, pp. 43–49, 1978.
 [11] A. Vakanski, J. M. Ferguson, and S. Lee, Mathematical Modeling and Evaluation of Human Motions in Physical Therapy Using Mixture Density Neural Networks, J Physiother Phys Rehabil, vol. 1, no. 4, Dec. 2016.
 [12] Y. Du, W. Wang, and L. Wang, Hierarchical recurrent neural network for skeleton-based action recognition, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1110–1118.
 [13] A. Vakanski, H. Jun, D. Paul, and R. Baker, A Data Set of Human Body Movements for Physical Rehabilitation Exercises, Data, vol. 3, no. 1, p. 2, Jan. 2018.
 [14] X. Yun and E. R. Bachmann, Design, Implementation, and Experimental Results of a Quaternion-Based Kalman Filter for Human Body Motion Tracking, IEEE Transactions on Robotics, vol. 22, no. 6, pp. 1216–1227, Dec. 2006.
 [15] J. Yang, Y. Xu, and C. S. Chen, Human action learning via hidden Markov model, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, vol. 27, no. 1, pp. 34–44, Jan. 1997.
 [16] Y. Huang, K. B. Englehart, B. Hudgins, and A. D. C. Chan, A Gaussian mixture model based classification scheme for myoelectric control of powered upper limb prostheses, IEEE Transactions on Biomedical Engineering, vol. 52, no. 11, pp. 1801–1811, Nov. 2005.
 [17] A. Vakanski, I. Mantegh, A. Irish, and F. Janabi-Sharifi, Trajectory Learning for Robot Programming by Demonstration Using Hidden Markov Model and Dynamic Time Warping, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 4, pp. 1039–1052, Aug. 2012.
 [18] R. Houmanfar, M. Karg, and D. Kulić, Movement Analysis of Rehabilitation Exercises: Distance Metrics for Measuring Patient Progress, IEEE Systems Journal, vol. 10, no. 3, pp. 1014–1025, Sep. 2016.
 [19] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, Sequential Deep Learning for Human Action Recognition, in Human Behavior Understanding, 2011, pp. 29–39.
 [20] T. T. Um, V. Babakeshizadeh, and D. Kulić, Exercise motion classification from large-scale wearable sensor data using convolutional neural networks, in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017, pp. 2385–2390.
 [21] G. Lefebvre, S. Berlemont, F. Mamalet, and C. Garcia, BLSTM-RNN Based 3D Gesture Classification, in Artificial Neural Networks and Machine Learning - ICANN 2013, 2013, pp. 381–388.
 [22] K. Fragkiadaki, S. Levine, P. Felsen, and J. Malik, Recurrent Network Models for Human Dynamics, in Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 2015, pp. 4346–4354.
 [23] S. Yao, S. Hu, Y. Zhao, A. Zhang, and T. Abdelzaher, DeepSense: A Unified Deep Learning Framework for Time-Series Mobile Sensing Data Processing, in Proceedings of the 26th International Conference on World Wide Web, Republic and Canton of Geneva, Switzerland, 2017, pp. 351–360.
 [24] F. J. Ordóñez and D. Roggen, Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition, Sensors, vol. 16, no. 1, p. 115, Jan. 2016.
 [25] A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis, arXiv:1604.02808 [cs], Apr. 2016.
 [26] A. Jain, A. R. Zamir, S. Savarese, and A. Saxena, Structural-RNN: Deep Learning on Spatio-Temporal Graphs, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 5308–5317.
 [27] Y. Wang, S. Wang, J. Tang, N. O'Hare, Y. Chang, and B. Li, Hierarchical Attention Network for Action Recognition in Videos, arXiv:1607.06416 [cs], Jul. 2016.
 [28] S. Song, C. Lan, J. Xing, W. Zeng, and J. Liu, An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data, in Association for the Advancement of Artificial Intelligence (AAAI), 2017, pp. 4263–4270.
 [29] J. Bütepage, M. J. Black, D. Kragic, and H. Kjellström, Deep Representation Learning for Human Motion Prediction and Classification, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1591–1599.
 [30] P. E. Taylor, G. J. M. Almeida, T. Kanade, and J. K. Hodgins, Classifying human motion quality for knee osteoarthritis using accelerometers, Conf Proc IEEE Eng Med Biol Soc, vol. 2010, pp. 339–343, 2010.
 [31] Z. Zhang, Q. Fang, L. Wang, and P. Barrett, Template matching based motion classification for unsupervised poststroke rehabilitation, in International Symposium on Bioelectronics and Bioinformations 2011, 2011, pp. 199–202.
 [32] I. Ar and Y. S. Akgul, A computerized recognition system for the home-based physiotherapy exercises using an RGB-D camera, IEEE Trans Neural Syst Rehabil Eng, vol. 22, no. 6, pp. 1160–1171, Nov. 2014.
 [33] J.-Y. Jung, J. I. Glasgow, and S. H. Scott, Feature selection and classification for assessment of chronic stroke impairment, in 2008 8th IEEE International Conference on BioInformatics and BioEngineering, 2008, pp. 1–5.
 [34] C.-J. Su, C.-Y. Chiang, and J.-Y. Huang, Kinect-enabled home-based rehabilitation system using Dynamic Time Warping and fuzzy logic, Applied Soft Computing, vol. 22, pp. 652–666, Sep. 2014.
 [35] Z. Zhang, Q. Fang, and X. Gu, Objective Assessment of Upper-Limb Mobility for Poststroke Rehabilitation, IEEE Transactions on Biomedical Engineering, vol. 63, no. 4, pp. 859–868, Apr. 2016.
 [36] D. Antón, A. Goñi, and A. Illarramendi, Exercise Recognition for Kinect-based Telerehabilitation, Methods Inf Med, vol. 54, no. 02, pp. 145–155, 2015.
 [37] M. Capecci et al., A Hidden Semi-Markov Model based approach for rehabilitation exercise assessment, Journal of Biomedical Informatics, vol. 78, pp. 1–11, Feb. 2018.
 [38] J. F. Lin, M. Karg, and D. Kulić, Movement Primitive Segmentation for Human Motion Modeling: A Framework for Analysis, IEEE Transactions on Human-Machine Systems, vol. 46, no. 3, pp. 325–339, Jun. 2016.
 [39] C. M. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2011.
 [40] J. F. Lin and D. Kulić, Online Segmentation of Human Motion for Automated Rehabilitation Exercise Analysis, IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 168–180, Jan. 2014.
 [41] F. Bashir, W. Qu, A. Khokhar, and D. Schonfeld, HMM-based motion recognition system using segmented PCA, in IEEE International Conference on Image Processing 2005, 2005, vol. 3, pp. III–1288.
 [42] H. Bourlard and Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, vol. 59, no. 4–5, pp. 291–294, 1988.
 [43] T. M. Le, N. Inoue, and K. Shinoda, A Fine-to-Coarse Convolutional Neural Network for 3D Human Action Recognition, arXiv:1805.11790 [cs], May 2018.
 [44] M. Liu, C. Chen, and H. Liu, 3D action recognition using data visualization and convolutional neural networks, in 2017 IEEE International Conference on Multimedia and Expo (ICME), 2017, pp. 925–930.