Using machine learning for efficient and accurate automated disease diagnosis has the potential to reduce labor and cost in healthcare and to improve patient care. Deep learning has been widely studied for automated and quantitative disease assessment, particularly in medical imaging, where convolutional neural networks (CNNs) have been employed [1, 2]. Other medical applications are typically less studied, mainly because datasets in these contexts are not as readily available as imaging datasets.
An area of medicine where this problem is prevalent is the quantitative assessment of motor disorders such as Parkinson’s disease (PD) and Essential Tremor (ET). Both conditions are debilitating and are increasing in prevalence. Both PD and ET typically manifest as hand tremors, which severely impact patients’ quality of life. Distinguishing between the different types of tremor (PD vs ET) is critical for correct treatment and long-term management of the disease. The Static Spiral Test (SST) is widely used in tremor diagnosis.
This simple and short test requires the patient to retrace Archimedean spirals on paper using a pen. An expert neurologist performs an observational assessment of the subject as they carry out the task, as well as a visual analysis of the drawn spiral. Healthcare professionals and patients could greatly benefit from automation of the SST, since observation-based data is qualitative and subjective and hence prone to bias and inaccuracy, which may lead to incorrect treatment. To the best of our knowledge, there is no known human baseline for PD, ET and control discrimination. Additionally, an automated analysis of the SST would allow patients in clinics without expert neurologists to be triaged.
Attempts to automate the SST typically use digitized tablets or instrumented pens embedded with sensors, such as accelerometers and/or pressure sensors, that capture information as the patient draws the spiral [7, 8, 9]. Although these methods have the potential to capture the relevant motor information from the SST and can easily record quantitative data for analysis, practical clinical considerations often limit their usage. In particular, the cost of additional, expensive high-end hardware is often a limitation.
Additionally, most analyses with these devices focus on binary classification of PD vs controls or ET vs controls, while movement disorder clinics require discrimination between PD, ET and controls, since early-stage symptoms of PD and ET are mild and may resemble those of control subjects.
This paper therefore makes the following contributions to address these limitations:
- Discrimination between PD, ET and controls using an end-to-end deep learning solution.
- Automated analysis of the SST based on conventional pen-and-paper tests.
- An ablation study highlighting the performance improvement that can be obtained through correct hyper-parameter optimization of deep neural networks.
In this work we present a deep learning-based solution applied to the discrimination of Parkinson’s Disease (PD) patients from controls, and of PD patients from Essential Tremor (ET) patients and from controls. We propose an end-to-end system, illustrated in Figure 1, which uses a convolutional neural network (CNN) to analyze images of the hand-drawn SST and therefore does not require any manual feature engineering.
The dataset consists of camera captured images of hand-drawn Archimedean spirals which were acquired from subjects performing the Static Spiral Test (SST) using a pen and paper. The image dimensions are 300 x 300 x 3. Examples of the images of the spirals in the three groups studied, Parkinson’s disease (PD), Essential Tremor (ET) and controls are illustrated in Figure 2.
The data was acquired during routine neurological assessments in a tertiary hospital and labeled by the examining neurologists. The re-use of the dataset was approved by the hospital as well as the university ethics committee.
The dataset consists of spirals of the following categories:
- 370 Parkinson’s disease subjects.
- 669 Essential Tremor subjects.
- 357 control subjects.
II-B End-to-End Deep Learning System
The proposed end-to-end deep learning system removes the need for manual feature engineering, with the network learning the underlying feature representations needed for discrimination. Three sub-modules are central to the overall system and are described below.
II-B1 Data Augmentation
Deep neural networks typically require large quantities of data for training. Even in cases where transfer learning is applied, an increased number of samples is beneficial to training. Since changes in orientation and contrast do not affect the image class or the overall perceptual task associated with the SST, we apply the following data augmentations to the dataset:
- Random application of a horizontal flip to the image with a probability of 0.5.
- Random change of image contrast with a probability of 0.1.
- Random zoom and crop on the image with a probability of 0.75.
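The three augmentations above can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in the study: only the three probabilities come from the text, while the contrast range, the maximum zoom factor and the function names are assumptions made for the example.

```python
import numpy as np

def random_zoom_crop(img, rng, max_zoom=1.2):
    """Crop a random window and resize it back to the original size.

    max_zoom is an assumed value; the source does not state the zoom range.
    """
    h, w = img.shape[:2]
    zoom = rng.uniform(1.0, max_zoom)
    ch, cw = int(round(h / zoom)), int(round(w / zoom))
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    crop = img[top:top + ch, left:left + cw]
    # Nearest-neighbour resize of the crop back to (h, w).
    rows = np.arange(h) * ch // h
    cols = np.arange(w) * cw // w
    return crop[rows][:, cols]

def augment(img, rng, p_flip=0.5, p_contrast=0.1, p_zoom=0.75):
    """Apply each augmentation independently with its own probability."""
    out = img.astype(np.float32)
    if rng.random() < p_flip:
        out = out[:, ::-1]  # horizontal flip
    if rng.random() < p_contrast:
        factor = rng.uniform(0.8, 1.2)  # assumed contrast range
        mean = out.mean()
        out = np.clip((out - mean) * factor + mean, 0, 255)
    if rng.random() < p_zoom:
        out = random_zoom_crop(out, rng)
    return out.astype(np.uint8)

rng = np.random.default_rng(0)
# Stand-in for one 300 x 300 x 3 spiral image.
image = rng.integers(0, 256, size=(300, 300, 3), dtype=np.uint8)
augmented = augment(image, rng)
```

Each transformation is sampled independently, so a single image may receive none, some or all of them in one pass.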
II-B2 Convolutional Neural Network (CNN) Approach
We aim to demonstrate the value of transfer learning in a biomedical computer vision application, where a pre-trained CNN is used as a base network and fine-tuned for the specific task. The hypothesis is that the generic features (e.g. edges, gradients) learned in the earlier CNN layers remain useful representations for the SST task, whilst the later layers of the network, which learn task-specific feature representations, are re-trained to be representative of the SST classification task at hand. Furthermore, we aim to demonstrate that transfer learning is valuable on small datasets (such as the SST dataset): it reduces the time and compute resources required to train the CNN compared to training from scratch, and it remains a viable method for attaining strong discriminative performance.
Consequently, we make use of a ResNet-32 CNN architecture with pre-trained ImageNet weights, which is one of the state-of-the-art networks trained on ImageNet. Whilst other base architectures could be used, the ResNet (Deep Residual Network) architecture was chosen because its deeper and thinner representation has been shown to provide better generalization, which should allow for better translation to the specific task. Moreover, the smoother loss surfaces of ResNets allow for easier forward and backward propagation, leading to easier optimization when fine-tuning on the task.
Finally, the fine-tuned architecture built on top of the pre-trained ResNet consists of a fully-connected dense layer whose softmax has as many outputs as there are classes. Specifics of training the network are discussed under the experimental analysis.
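As an illustration of the classification head described above, the following NumPy sketch maps a batch of feature vectors to class probabilities through one dense layer and a softmax. The feature width of 512, the random weights and the function names are assumptions for the example only; in the actual system the features would come from the pre-trained ResNet and the weights would be learned.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def classification_head(features, weights, bias):
    """Dense layer followed by softmax: one output per class."""
    return softmax(features @ weights + bias)

rng = np.random.default_rng(0)
n_features = 512  # assumed width of the backbone's feature vector
n_classes = 3     # PD, ET and controls (use 2 for PD vs controls)

features = rng.normal(size=(4, n_features))  # stand-in for backbone outputs
weights = rng.normal(scale=0.01, size=(n_features, n_classes))
bias = np.zeros(n_classes)

probs = classification_head(features, weights, bias)
predicted_class = probs.argmax(axis=1)
```

Switching between the two experiments only changes `n_classes`, which is why the same backbone can serve both the two-class and the three-class task.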
II-B3 CNN Hyper-parameter optimization
Neural networks are also highly sensitive to the specific hyper-parameters with which they are trained. Therefore, we aim to demonstrate the value of hyper-parameter optimization in ensuring optimal classification performance.
In particular, we optimize the learning rate as it directly impacts the magnitude of parameter updates in the network. We apply two techniques that demonstrate the importance of this optimization:
Cyclical learning rate:
Following prior work on cyclical learning rates, the aim of a cyclical learning rate policy is to allow the traversal of saddle points and local minima in the loss landscape. This is done by varying the learning rate over an epoch between a lower and an upper threshold, where the periodically higher learning rate assists in traversing saddle points and local minima. Furthermore, this not only requires fewer experiments (and by extension fewer computations) to find optimal learning rates, but also results in superior accuracy when compared to a single learning rate with decay.
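One common form of such a policy is the triangular schedule, in which the learning rate rises linearly from the lower to the upper bound and then falls back again. The sketch below illustrates that form; the bounds and the step size are placeholder values, since the source does not state the ones used, and the function name is hypothetical.

```python
import math

def triangular_lr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate.

    The rate climbs from base_lr to max_lr over step_size iterations,
    then descends back to base_lr over the next step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# Placeholder bounds and step size, chosen only for illustration.
BASE_LR, MAX_LR, STEP_SIZE = 1e-4, 1e-2, 50

# Two full cycles: iterations 0..200.
schedule = [triangular_lr(i, BASE_LR, MAX_LR, STEP_SIZE) for i in range(201)]
```

With these placeholder values the rate peaks at iterations 50 and 150 and returns to the lower bound at iterations 0, 100 and 200.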
Discriminative learning rate:
As per the work of [12, 13], the network is divided into groups of layers, from earlier to later layers. Earlier groups of layers are trained with a lower learning rate, as their weights represent generic features that do not need to be adapted to the task, whilst later layers are trained at a higher learning rate, as their weights need to be adapted specifically to the classification task. Hence, we divide our network into three weight groups: early, middle and late. We apply an adaptive policy of learning rates to these three weight groups as follows: Early: , Middle: , Late: .
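The grouping idea can be sketched in plain Python as below. The layer names and the three learning rates are hypothetical placeholders (the source values are not available here), and splitting the layers into contiguous groups of near-equal size is an assumption about how the early, middle and late groups are formed.

```python
def split_into_groups(layers, n_groups=3):
    """Split an ordered list of layers into contiguous, near-equal groups."""
    size, extra = divmod(len(layers), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(layers[start:end])
        start = end
    return groups

def discriminative_lrs(layers, group_lrs):
    """Map every layer to the learning rate of the group it falls in."""
    groups = split_into_groups(layers, len(group_lrs))
    return {layer: lr for group, lr in zip(groups, group_lrs) for layer in group}

# Hypothetical layer names, ordered from input to output.
layers = [f"block{i}" for i in range(1, 8)] + ["head"]

# Placeholder rates for the early, middle and late groups (lowest first).
layer_lrs = discriminative_lrs(layers, group_lrs=(1e-5, 1e-4, 1e-3))
```

In a deep learning framework the resulting mapping would typically be turned into per-group optimizer settings, so that early layers move little while the late layers and the new head adapt quickly.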
II-C Technical pipeline
Our technical evaluation pipeline follows this protocol:
1. Perform 5-fold cross-validation such that, in each fold, 80% of the data is used for training and 20% is held out as an unseen test set.
2. Apply data augmentation, as described in Section II-B1, to the training dataset.
3. Use a ResNet-32 CNN pre-trained on ImageNet as the base network. Since the pre-trained ResNet-32 requires an input image size of 224 x 224 x 3, all images are resized using nearest-neighbor interpolation.
4. Remove the final fully-connected layer from the pre-trained network.
5. Add a dense fully-connected layer (multi-layer perceptron) whose softmax has one output per class (two for experiment 1 and three for experiment 2).
6. Freeze the pre-trained CNN's weights and train the dense layers with a high learning rate () for 5 epochs.
7. Unfreeze the entire network's weights and fine-tune the network for 3 epochs at a low learning rate using the three weight groups (Early: , Middle: , Late: ).
8. Repeat the cross-validation process (steps 2-7) to ensure numerical stability and robustness, and compute the mean and standard deviation over the 5 folds.
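The cross-validation bookkeeping in this protocol (the 80/20 split per fold and the final mean and standard deviation) can be sketched with the standard library alone. The shuffling seed, the absence of class stratification and the use of the sample standard deviation are assumptions; the source does not specify them. The accuracies passed to `summarize` are made-up numbers to show the call, not results from the study.

```python
import random
import statistics

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each sample appears in exactly one test fold; with k=5 every fold
    trains on roughly 80% of the data and tests on the remaining 20%.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def summarize(fold_accuracies):
    """Mean and sample standard deviation of per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)

# Dataset size from the three groups listed earlier: PD + ET + controls.
n_samples = 370 + 669 + 357
splits = list(k_fold_indices(n_samples))

# Illustrative (made-up) per-fold accuracies, only to show the call.
mean_acc, std_acc = summarize([90.0, 92.0, 94.0])
```

Augmentation (step 2) would be applied to the `train` indices of each fold only, so that the held-out 20% stays untouched.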
III Experimental Analysis
Two sets of experiments are carried out on the hand-drawn SST data. The first experiment carries out discrimination between PD and control subjects and the second experiment carries out discrimination between PD, ET and control subjects.
Both experiments make use of the experimental protocol outlined in Section II-C. An ablation study is carried out for steps 6-7, which are performed with and without the learning rate hyper-parameter optimization techniques described in Section II-B3. The aim is to demonstrate the performance benefit obtained by making use of these optimizations.
III-A Discrimination between Parkinson’s disease and control subjects
This experiment aims to classify between PD subjects and controls based on the SST test images. The results shown in Table 1 are obtained from the 5-fold cross-validation.
|Configuration|Accuracy (%), mean ± std over 5 folds|
|---|---|
|With hyper-parameter optimization|98.2 ± 1.35|
|Without hyper-parameter optimization|95.3 ± 1.66|
III-B Discrimination between Parkinson’s disease, Essential Tremor and controls
This experiment aims to classify between PD subjects, ET subjects and controls based on the SST test images. The results shown in Table 2 are obtained from the 5-fold cross-validation.
|Configuration|Accuracy (%), mean ± std over 5 folds|
|---|---|
|With hyper-parameter optimization|92 ± 0.614|
|Without hyper-parameter optimization|87.67 ± 1.02|
This study presents automated machine learning discrimination of PD, ET and controls using images of the handwritten SST. This is preliminary work that aimed to validate the techniques within this application domain.
The results suggest that an end-to-end deep learning solution using a CNN is both robust and accurate at discriminating between these three classes in a reliable and autonomous manner, whilst still fitting in with current clinical practice.
The proposed solution, in which the SST is conducted using off-the-shelf writing equipment and paper, eliminates the need for additional and expensive hardware such as digitized tablets or specialized instrumented writing instruments. This would make the test easy to carry out, even in busy clinical settings.
The results imply that the optimal configuration for both PD vs control discrimination and PD, ET and control discrimination is a ResNet-32 CNN with pre-trained ImageNet weights fine-tuned for the task. The mean 5-fold cross-validation accuracy for PD vs control discrimination was 98.2%, whilst the mean accuracy for PD, ET and control discrimination was 92%.
In particular, the confusion matrix for the PD, ET and control discrimination shows that misclassification is typically between ET and PD. This is understandable, as both PD and ET are movement disorders that result in tremor, which would manifest in the subjects' spirals. The strength of this result is that even when PD or ET is misclassified, it is still classified as a movement disorder rather than as a control, which is beneficial in a triaging and referral scenario.
The results of the ablation study demonstrate the value of learning rate hyper-parameter optimization. Combined, the cyclical and discriminative learning rate policies increased three-class discrimination accuracy by 4.33 percentage points (from 87.67% to 92%) with no change to the network architecture, highlighting the value of correct hyper-parameter tuning for maximizing the performance of neural networks.
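The size of the ablation effect follows directly from the mean accuracies reported in Tables 1 and 2. The short sketch below reproduces that arithmetic for both experiments; the dictionary layout and function name are illustrative choices, while the four accuracy values are taken from the tables.

```python
# Mean 5-fold accuracies (%) as reported in Tables 1 and 2.
results = {
    "PD vs controls": {"with": 98.2, "without": 95.3},
    "PD vs ET vs controls": {"with": 92.0, "without": 87.67},
}

def gain_in_points(with_opt, without_opt):
    """Absolute accuracy gain in percentage points, rounded to 2 decimals."""
    return round(with_opt - without_opt, 2)

gains = {
    task: gain_in_points(acc["with"], acc["without"])
    for task, acc in results.items()
}

for task, gain in gains.items():
    print(f"{task}: +{gain} percentage points")
```

The gains are absolute differences in percentage points: 2.9 for the two-class task and 4.33 for the three-class task.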
Finally, the value of transfer learning was demonstrated by the overall high discriminative accuracy of the networks. This highlights the value of using pre-trained networks even in biomedical applications, and shows that the high-level feature representations from pre-trained networks are useful even on tasks significantly different from the original ImageNet task. Moreover, fewer epochs are required to train the networks, which reduces both the computational requirements and the training time.
The combined ease of use and machine learning discrimination proposed in this study has the potential to assist healthcare professionals in motor disorder evaluation using the SST, while easily fitting into the clinical environment by making use of the current pen and paper SST. The method could allow healthcare practitioners to easily and quantitatively diagnose motor disorder subjects using standard tools.
Future work could expand this study and aim at finer-grained classification, seeking to classify the severity, or stage of the motor diseases.
- [1] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, “Brain tumor segmentation with deep neural networks,” Medical Image Analysis, vol. 35, pp. 18-31, 2017.
- [2] N. Tomasev, X. Glorot, J. Rae, M. Zielinski, H. Askham, A. Saraiva, A. Mottram, C. Meyer, S. Ravuri, and I. Protsyuk, “A clinically applicable approach to continuous prediction of future acute kidney injury,” Nature, vol. 572, no. 7767, p. 116, 2019.
- [3] E. R. Dorsey et al., “Global, regional, and national burden of Parkinson’s disease, 1990-2016: a systematic analysis for the Global Burden of Disease Study 2016,” The Lancet Neurology, vol. 17, no. 11, pp. 939-953, Nov. 2018.
- [4] M. A. Thenganatt and E. D. Louis, “Distinguishing essential tremor from Parkinson’s disease: bedside tests and laboratory evaluations,” Expert Review of Neurotherapeutics, vol. 12, no. 6, pp. 687-696, 2012.
- [5] R. Saunders-Pullman, C. Derby et al., “Validity of spiral analysis in early Parkinson’s disease,” Movement Disorders, vol. 23, no. 4, pp. 531-537, Mar. 2008.
- [6] M. Algarni and A. Fasano, “The overlap between Essential tremor and Parkinson disease,” Parkinsonism & Related Disorders, vol. 46, pp. 101-104, Jan. 2018.
- [7] M. Gil-Martín, J. M. Montero, and R. San-Segundo, “Parkinson’s disease detection from drawing movements using convolutional neural networks,” Electronics, vol. 8, no. 8, p. 907, 2019.
- [8] C. R. Pereira, S. A. Weber, C. Hook, G. H. Rosa, and J. P. Papa, “Deep learning-aided Parkinson’s disease diagnosis from handwritten dynamics,” in 2016 29th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pp. 340-346, IEEE, Oct. 2016.
- [9] P. Khatamino, I. Cantürk, and L. Özyılmaz, “A Deep Learning-CNN Based System for Medical Diagnosis: An Application on Parkinson’s Disease Handwriting Drawings,” in 2018 6th International Conference on Control Engineering & Information Technology (CEIT), pp. 1-6, IEEE, Oct. 2018.
- [10] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
- [11] L. N. Smith, “Cyclical learning rates for training neural networks,” in 2017 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 464-472, IEEE, Mar. 2017.
- [12] J. Howard and S. Ruder, “Universal language model fine-tuning for text classification,” arXiv preprint arXiv:1801.06146, 2018.
- [13] B. Singh, S. De, Y. Zhang, T. Goldstein, and G. Taylor, “Layer-specific adaptive learning rates for deep networks,” in 2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), pp. 364-368, IEEE, Dec. 2015.