Erythemato-squamous disease (ESD) is a form of skin disease. It generally causes redness of the skin and may also cause loss of skin. ESDs are generally due to genetic or environmental factors (Elsayad et al., 2018). ESD comprises six classes of skin conditions, namely pityriasis rubra pilaris, lichen planus, chronic dermatitis, psoriasis, seborrheic dermatitis and pityriasis rosea. However, the diagnosis of ESD is regarded as a difficult problem in dermatology. One reason is that these diseases share many clinical and histopathological attributes, including erythema and scaling. Another reason is that one disease may show the symptoms of another disease at the initial stages (Demiroz et al., 1998). Thus, detailed observation skills and considerable experience are required from physicians to evaluate both clinical and histopathological features and correctly diagnose ESD (Menai and Altayash, 2014). The automated diagnosis of ESD can therefore help doctors and dermatologists by reducing their effort and enabling faster treatment decisions.
In the literature, quite a few works have proposed machine learning methods such as decision trees, support vector machines and artificial neural networks for automated detection of the type of erythemato-squamous disease (Menai and Altayash, 2014). We discuss these works in detail in Section 2. In recent years, with the rise in computing power and the availability of cheap memory devices along with cloud computing, deep learning has been very successful in many fields such as natural language processing (Deng and Liu, 2018), biomedicine (Mamoshina et al., 2016) and more.
The main contribution of this paper is the development of a novel hybrid deep learning approach, Derm2Vec, for the diagnosis of erythemato-squamous disease (ESD), which to the best of our knowledge has not been reported in the literature. Derm2Vec is a hybrid deep learning approach that comprises both Autoencoders and Deep Neural Networks. We also find that few works in the literature apply deep neural networks to the classification of ESD, although the literature is replete with works that use conventional machine learning methods (such as random forests, artificial neural networks, Extreme Gradient Boosting, K-nearest neighbors, decision trees, support vector machines and more) for the diagnosis of ESD. In this paper, we apply both Derm2Vec and a DNN (after tuning the hyperparameters) along with other conventional machine learning methods to a real-world dermatology dataset. Derm2Vec is found to be the best performer in terms of prediction accuracy.
The rest of this paper is organized as follows. In Section 2, we present a brief review of the literature. In Section 3, we describe the dataset used in this paper. This is followed by Sections 4 and 5 where we describe our proposed methodology and the experimental results respectively. Finally, Section 6 concludes the paper.
2 Related Work
There have been many works in the medical informatics literature on the applications of machine learning and expert systems and how they complement physicians/practitioners in decision making. For example, Cruz and Wishart (2006) used machine learning for cancer prediction. Random forests were used for the classification of Alzheimer’s disease (Gray et al., 2013). Deep learning was used for the classification of self-care problems in children with physical disabilities (Putatunda, 2018). There have also been many applications of machine learning for glaucoma screening (Cheng et al., 2013), retinal hemorrhage detection (Tang et al., 2013), lymphoma classification (Luo et al., 2014) and many more.
Automated classification of the type of erythemato-squamous disease using machine learning and expert systems has been reported in the literature. The first such work is that of Demiroz et al. (1998), where the authors developed a new classifier called “Voting feature intervals-5” for the differential diagnosis of ESD. Guvenir and Emeksiz (2000) used three classification algorithms, namely Voting feature intervals-5, Naïve Bayes and nearest neighbor classification, for diagnosis of the type of ESD. Ubeyli (2009) used multilayer perceptron neural networks and Xie and Wang (2011) used support vector machines for the classification of ESD. Tree-based methods such as CHAID decision trees (Elsayad et al., 2018) and ensembles of decision trees (Menai and Altayash, 2014) have also been used for analysis and diagnosis of ESD. Nanni (2006) used an ensemble of support vector machines on random subspaces and Menai (2015) applied random forests for the diagnosis of ESD. Other reported methods include k-means clustering (Ubeyli and Dogdu, 2010) and boosting (Badrinath et al., 2013). To the best of our knowledge, none of the past works reported in the literature have used deep learning. In this paper, we use deep neural networks for the differential diagnosis of ESD and propose a novel hybrid deep learning method, Derm2Vec (see Section 4.3).
3 Dermatology Data
In this paper, we use the dermatology dataset that was first used by Ubeyli and Guler (2005), where the aim was to determine the type of ESD. This dataset is publicly available in the UCI machine learning repository (Dheeru and Karra Taniskidou, 2017). The dataset contains 34 attributes/predictor variables, of which 12 are clinical attributes (namely, (a) erythema, (b) scaling, (c) definite borders, (d) itching, (e) koebner phenomenon, (f) polygonal papules, (g) follicular papules, (h) oral mucosal involvement, (i) knee and elbow involvement, (j) scalp involvement, (k) family history and (l) age).
The remaining 22 features are histopathological attributes (namely, (a) melanin incontinence, (b) eosinophils in the infiltrate, (c) PNL infiltrate, (d) fibrosis of the papillary dermis, (e) exocytosis, (f) acanthosis, (g) hyperkeratosis, (h) parakeratosis, (i) clubbing of the rete ridges, (j) elongation of the rete ridges, (k) thinning of the suprapapillary epidermis, (l) spongiform pustule, (m) Munro microabscess, (n) focal hypergranulosis, (o) disappearance of the granular layer, (p) vacuolisation and damage of basal layer, (q) spongiosis, (r) saw-tooth appearance of retes, (s) follicular horn plug, (t) perifollicular parakeratosis, (u) inflammatory mononuclear infiltrate and (v) band-like infiltrate). One-hot encoding of the categorical variables increases the total number of features.
Some observations in the dataset have missing values for the “Age” variable, and we exclude these observations from our analysis. The target variable has six classes: (a) psoriasis, (b) seborrheic dermatitis, (c) lichen planus, (d) pityriasis rosea, (e) chronic dermatitis and (f) pityriasis rubra pilaris.
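The two preprocessing steps described above (dropping observations with a missing Age value and one-hot encoding categorical variables) can be sketched in plain Python. This is only an illustrative sketch: the toy rows, the column layout, the `"?"` missing-value marker and the helper names `drop_missing` and `one_hot` are assumptions made for the example, not details taken from this paper.

```python
# Assumption: missing values are marked with "?" in the raw file.
MISSING = "?"

def drop_missing(rows, column):
    """Keep only the rows whose value in `column` is not the missing marker."""
    return [row for row in rows if row[column] != MISSING]

def one_hot(rows, column):
    """Replace one categorical column by a 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        indicators = [1 if row[column] == c else 0 for c in categories]
        encoded.append(row[:column] + indicators + row[column + 1:])
    return encoded, categories

# Hypothetical toy rows: [erythema grade, family history, age].
rows = [
    ["2", "0", "55"],
    ["3", "1", "?"],   # Age is missing, so this observation is dropped.
    ["1", "0", "26"],
    ["2", "1", "40"],
]

complete = drop_missing(rows, column=2)
encoded, categories = one_hot(complete, column=0)
print(categories)
print(encoded)
```

Each one-hot encoded column adds one feature per category, which is why the feature count grows after encoding.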
4 Methodology
Sections 4.1 and 4.2 give a brief overview of artificial neural networks and deep learning respectively. In Section 4.3, we discuss our proposed method, Derm2Vec, and Section 4.4 mentions the different machine learning methods that we use for performance comparison in this paper.
4.1 Artificial Neural Networks
To understand deep learning (discussed in Section 4.2), we first need to understand Artificial Neural Networks (ANNs). The origin of ANNs can be traced to the study of information processing in the life sciences (McCulloch and Pitts, 1943; Rosenblatt, 1962; Rumelhart et al., 1986). McCulloch and Pitts (1943) worked on developing nets of simple logical operators to model biological systems. Rosenblatt (1958) introduced the concept of the “perceptron”, a biologically inspired learning algorithm. Neural networks are used for various statistical modeling and data analysis tasks and are seen as an alternative to non-linear regression (Cheng and Titterington, 1994).
In this paper, we focus on “feed-forward neural networks”. Bishop (2006) describes a two-layer feed-forward neural network architecture consisting of an input layer followed by a hidden layer (which consists of hidden nodes) and, finally, an output layer. The hidden nodes are processing units that contain activation functions. Some of the commonly used activation functions are sigmoid, ReLU, tanh and more (Putatunda, 2019). A feed-forward neural network performs layered computations in which the hidden unit activations are computed from the input layer and the output is then calculated from the hidden unit activations (Larsen, 1999). Please refer to Bishop (2006) for more details on the workings of feed-forward neural networks.
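The layered computation described above can be illustrated with a minimal forward pass in plain Python. This is a sketch under stated assumptions: the tanh hidden activation, the sigmoid output, the weights and the helper names `dense` and `forward` are all chosen for illustration and are not taken from this paper.

```python
import math

def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """Compute hidden activations from the input, then the output from them."""
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    return dense(hidden, output_w, output_b, sigmoid)

# Hypothetical network: 2 inputs, 2 hidden nodes, 1 output node.
output = forward(
    x=[1.0, -1.0],
    hidden_w=[[0.5, 0.5], [1.0, -1.0]],
    hidden_b=[0.0, 0.0],
    output_w=[[1.0, 0.0]],
    output_b=[0.0],
)
print(output)
```

Training (for example by backpropagation) would adjust the weights and biases; the sketch only covers the forward computation.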
Nowadays, artificial neural networks are widely used in various applications such as weather forecasting (Kumar et al., 2012), clinical medicine (Baxt, 1995), Forex prediction (Eng et al., 2008), location/travel time prediction for GPS taxis (Laha and Putatunda, 2018, 2017; Putatunda, 2017) and more. Please see Bishop (1995) for more details on artificial neural networks.
4.2 Deep learning: Deep Neural Networks and Autoencoders
Goodfellow et al. (2016) describe deep learning as a subset of machine learning and as a form of “representation learning”, where the focus is on using the raw data to extract high-level features. Chollet (2017) describes deep learning as learning from successive layers, each layer being some meaningful representation. In recent years, deep learning has been successful in various applications such as natural language processing (Deng and Liu, 2018), biomedicine (Mamoshina et al., 2016), computer vision (Voulodimos et al., 2018) and more (see Goodfellow et al. (2016) for more details on different deep learning techniques). However, in this paper we focus on Deep Neural Networks (DNNs) and Autoencoders.
Deep neural networks (DNNs) originated from artificial neural networks (see Section 4.1). ANNs with many hidden layers are known as DNNs (Mamoshina et al., 2016). The number of hidden layers determines the “depth” of a DNN (Chollet, 2017). An Autoencoder is a type of deep learning method where the input and the output are the same. It is classified as a self-supervised learning method by Chollet (2017). An Autoencoder consists of two functions, namely (a) the encoder function, where the raw input data is converted into representations, and (b) the decoder function, where the representations from the encoder layer are converted back to the input data. The goal of an autoencoder is to preserve as much information as possible and also add new representations on top of the raw input data (Goodfellow et al., 2016). Applications of Autoencoders include dimensionality reduction (Wang et al., 2012), cyber-empathic design (Ghosh et al., 2018), molecular design (Blaschke et al., 2018) and many more.
4.3 Proposed Method: Derm2Vec
In this paper, we first apply a conventional DNN to the dataset for the prediction of erythemato-squamous disease. The use of a DNN for the diagnosis of ESD has not been reported in the literature (see Section 2), although ANNs have been used earlier, as mentioned in Section 2.
We then propose a novel hybrid deep learning approach, a two-step modeling approach comprising an autoencoder and a DNN, for multi-class classification of the type of ESD. To the best of our knowledge, this has not been reported in the dermatology informatics literature. We refer to our proposed method as “Derm2Vec”. Figure 1 shows how the Derm2Vec method works.
We can see in Figure 1 that the high-dimensional input data (i.e. features related to clinical and histopathological attributes along with Age) is passed through an Autoencoder that comprises three encoder and three decoder layers. Several encoding dimensions are used by the encoder (see Section 5.2). The values from the innermost encoder layer, i.e. the encoded output, are taken as a dense patient vector. We then apply a DNN (comprising either a single hidden layer or two hidden layers) to the patient vector to get the predicted output. Since the target variable contains six classes, i.e. this is a multi-class classification problem, we use the “Softmax” activation function in the output layer of the DNN.
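The data flow described above (features, then three encoder layers, then a dense patient vector, then a DNN with a softmax output) can be sketched in plain Python. This is not the authors' implementation: the layer sizes, the ReLU activations, the untrained random weights and every helper name below are hypothetical, and the decoder is omitted because it is only needed while the autoencoder is being trained to reconstruct its input.

```python
import math
import random

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for every unit."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(rng, n_in, n_out):
    """Untrained layer with small random weights (illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def derm2vec_forward(features, encoder_layers, hidden_layer, output_layer):
    """Encode the features into a patient vector, then classify that vector."""
    vector = features
    for weights, biases in encoder_layers:
        vector = dense(vector, weights, biases, relu)
    patient_vector = vector
    hidden_w, hidden_b = hidden_layer
    output_w, output_b = output_layer
    hidden = dense(patient_vector, hidden_w, hidden_b, relu)
    scores = dense(hidden, output_w, output_b, identity)
    return patient_vector, softmax(scores)

# Hypothetical sizes; only the six output classes come from the text.
N_FEATURES, ENCODER_SIZES, N_HIDDEN, N_CLASSES = 40, [32, 16, 8], 10, 6

rng = random.Random(0)
sizes = [N_FEATURES] + ENCODER_SIZES
encoder = [random_layer(rng, n_in, n_out) for n_in, n_out in zip(sizes, sizes[1:])]
hidden = random_layer(rng, ENCODER_SIZES[-1], N_HIDDEN)
output = random_layer(rng, N_HIDDEN, N_CLASSES)

features = [rng.random() for _ in range(N_FEATURES)]
patient_vector, probabilities = derm2vec_forward(features, encoder, hidden, output)
print(len(patient_vector), probabilities)
```

In a two-step setup of this kind, the encoder weights would come from a trained autoencoder and the classifier weights from supervised training on the encoded vectors; the sketch only shows how the pieces connect.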
4.4 Other Methods
In this paper, we will compare the performance of both our proposed method i.e. Derm2Vec and a conventional DNN method with other conventional machine learning techniques that have been used for the prediction of the Erythemato-Squamous disease in the literature as discussed in Section 2. Some of the techniques we will use in this paper for comparison are Decision trees (Breiman et al., 1984; Loh, 2014), Artificial Neural Networks (see Section 4.1), ensemble learning methods such as Extreme Gradient Boosting (Chen and Guestrin, 2016) and Random Forests (Breiman, 2001), K nearest neighbors (Cover and Hart, 1967), Support Vector Classification (Cortes and Vapnik, 1995) and finally, the Gaussian Naïve Bayes (Webb, 2010).
5 Experimental Results
Our goal is to predict the type of erythemato-squamous disease using our proposed method, Derm2Vec, along with a Deep Neural Network on the dermatology dataset described in Section 3. This is a multi-class classification problem as the target variable has six classes. We compare the performance of these methods with other conventional machine learning techniques that have been used in the literature for the prediction of ESD (see Section 2) and are mentioned in Section 4.4, such as Extreme Gradient Boosting (XGBoost), Artificial Neural Network (ANN), Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), K-nearest neighbors (KNN) and Support Vector Classification (SVC).
All the experiments are conducted on a Mac OS X system with an Intel Core i7 processor. The data analysis and model development were done in Python (Rossum, 1995). We use the scikit-learn (Pedregosa et al., 2011), TensorFlow (Abadi et al., 2016) and Keras (Chollet et al., 2015) libraries for implementing the various machine learning and deep learning techniques used in this paper.
5.1 Evaluation Metrics
We apply the above-mentioned methods to the dermatology dataset and perform k-fold cross-validation (Refaeilzadeh et al., 2009). In this paper, we perform 10-fold cross-validation, i.e. the dataset is partitioned into 10 equal sets or folds and then 10 iterations are performed, in each of which 9 folds are used for model training and 1 fold is withheld for validation. We use the mean cross-validation score, i.e. the mean CV score, as the evaluation metric for our experiments (see Section 5.2). The mean CV score is the average of the accuracy scores obtained in the iterations of the 10-fold cross-validation.
A higher mean CV score indicates better performance of the method. It is reported as a percentage (%) in this paper.
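The mean CV score can be made concrete with a short plain-Python sketch. It is written under stated assumptions: the folds are contiguous and unshuffled, the toy data and the 1-nearest-neighbor classifier are hypothetical stand-ins, and the helper names are invented for the example. In practice a library routine such as scikit-learn's `cross_val_score` would typically be used instead.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mean_cv_score(X, y, fit, predict, k=10):
    """Average held-out accuracy over k folds, reported as a percentage."""
    accuracies = []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return 100.0 * sum(accuracies) / k

# Hypothetical stand-in classifier: 1-nearest neighbor on squared distance.
def fit_1nn(X_train, y_train):
    return list(zip(X_train, y_train))

def predict_1nn(model, x):
    def squared_distance(point):
        return sum((a - b) ** 2 for a, b in zip(x, point))
    return min(model, key=lambda pair: squared_distance(pair[0]))[1]

# Toy data: two well-separated classes, interleaved so every fold has both.
X = [[(i % 2) * 10.0 + i * 0.01] for i in range(20)]
y = [i % 2 for i in range(20)]

score = mean_cv_score(X, y, fit_1nn, predict_1nn, k=10)
print(f"Mean CV score: {score:.2f}%")
```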
5.2 Results: Classification of the Erythemato-Squamous Disease
Table 1 reports the mean CV scores when we apply a Deep Neural Network (DNN) with different hyperparameters to the dataset and perform 10-fold cross-validation. We run multiple iterations of the DNN with different numbers of hidden layers and hidden nodes. Since this is a multi-class classification setting, the output layer of the DNN has a Softmax activation function. We also use “Dropout”, which helps ensure that the deep neural network model does not overfit (Srivastava et al., 2014). Here, a dropout rate of 0.5 means that 50% of the units are dropped randomly during training. The DNN configuration with the highest mean CV score is identified in Table 1.
Table 1: Mean CV scores (%) of the DNN for different hyperparameters.
|Sl no.|Hidden layers|Hidden nodes|Dropout|Mean CV score (%)|
|3|2|(100, 100)|Yes (0.5)|96.37|
|5|3|(100, 100, 100)|No|96.37|
|6|3|(100, 100, 100)|Yes (0.5)|96.08|
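The effect of a dropout rate of 0.5, as listed in the Dropout column above, can be illustrated with a small plain-Python sketch. It assumes the "inverted dropout" variant, in which surviving activations are rescaled by 1/(1 - rate) during training so that nothing needs to change at prediction time; the function name and example values are hypothetical.

```python
import random

def dropout(activations, rate, rng, training=True):
    """Zero each unit with probability `rate` during training and rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 1000

dropped = dropout(hidden, rate=0.5, rng=rng)
unchanged = dropout(hidden, rate=0.5, rng=rng, training=False)

fraction_zeroed = sum(1 for a in dropped if a == 0.0) / len(dropped)
print(f"Fraction of units dropped: {fraction_zeroed:.2f}")
```

Because a different random subset of units is silenced at each training step, the network cannot rely on any single unit, which is the intuition behind dropout as a guard against overfitting.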
We now apply the Derm2Vec method to the dermatology dataset with different hyperparameters and run multiple iterations as described in Table 2. Here we tune the hyperparameters of both the Autoencoder and the DNN. For the Autoencoder, the encoder and decoder comprise three layers each, as shown in Figure 1. The only Autoencoder hyperparameter we tune is the encoding dimension, i.e. we compress the high-dimensional dataset into a low-dimensional space, varying the encoding dimension over a range of values as described in Table 2. We also tune the hyperparameters of the subsequent DNN of Derm2Vec, i.e. dropout, the number of hidden layers and the number of hidden nodes. We find that the highest mean CV score of Derm2Vec is higher than what we obtained for the DNN in Table 1. Thus, Derm2Vec performs better than the DNN. In fact, from Tables 1 and 2 we can see that the best-performing configuration of Derm2Vec, i.e. a DNN complemented with an Autoencoder, performs better than the stand-alone DNN with a similar configuration. This shows that the proposed hybrid deep neural network approach, Derm2Vec, is a better performer in terms of prediction accuracy than a conventional deep neural network.
In Table 3, we compare the performance of our proposed Derm2Vec method and the DNN with some of the conventional machine learning methods used in the literature for the diagnosis of ESD (see Sections 2 and 4.4). We compare Derm2Vec and DNN with Extreme Gradient Boosting (XGBoost), Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), Artificial Neural Network (ANN), K-nearest neighbors (KNN) and Support Vector Classification (SVC). For ANN, we use a simple architecture comprising one hidden layer with two hidden nodes. For the SVC method, we choose the “Radial basis function (RBF)” kernel (Cortes and Vapnik, 1995). For the KNN method, we use K = 5. Hyperparameters were also selected for the Random Forest and XGBoost methods.
Notes: the ANN has 1 hidden layer with 2 hidden nodes; the SVC uses an RBF kernel; K = 5 in KNN.
Derm2Vec is found to be the best performer in terms of prediction accuracy, followed by DNN and Extreme Gradient Boosting, as described in Table 3. Both Derm2Vec and DNN perform better than XGBoost, RF, DT, ANN, SVC, NB and KNN, whose mean CV scores are also reported in Table 3.
6 Conclusion
In this paper, we propose a novel hybrid deep learning approach, Derm2Vec, for the diagnosis of erythemato-squamous disease (ESD), which to the best of our knowledge has not been reported in the literature. We also find that few works in the literature apply deep neural networks to the classification of ESD, although the literature is replete with works that use conventional machine learning methods (namely, random forests, artificial neural networks, Extreme Gradient Boosting, K-nearest neighbors, decision trees, support vector machines and Naïve Bayes) for the diagnosis of ESD.
We apply both Derm2Vec and a Deep Neural Network (after tuning the hyperparameters) along with the conventional machine learning methods mentioned above to a real-world dermatology dataset. Derm2Vec is found to be the best performer in terms of prediction accuracy. Thus, we conclude that our proposed hybrid deep learning approach, Derm2Vec, is an effective method for the diagnosis of ESD. We believe that Derm2Vec can be extended, with some modifications, to other areas of medicine such as the diagnosis of liver disease, cancer prediction and the prediction of diabetes. We plan to work in this direction in the future.
References
- Elsayad et al. (2018) A. M. Elsayad, M. Al-Dhaifallah, A. M. Nassef, Analysis and Diagnosis of Erythemato-Squamous Diseases Using CHAID Decision Trees, in: 15th International Multi-Conference on Systems, Signals and Devices (SSD), IEEE, 2018. doi:10.1109/SSD.2018.8570553.
- Demiroz et al. (1998) G. Demiroz, H. A. Guvenir, N. Ilter, Learning Differential Diagnosis of Erythemato-Squamous Diseases using Voting Feature Intervals, Artificial Intelligence in Medicine 13 (1998) 147–165.
- Menai and Altayash (2014) M. E. B. Menai, N. Altayash, Differential Diagnosis of Erythemato-Squamous Diseases Using Ensemble of Decision Trees, in: Modern Advances in Applied Intelligence , 2014, pp. 369–377.
- Deng and Liu (2018) L. Deng, Y. Liu, Deep Learning in Natural Language Processing, 1 ed., Springer, Singapore, 2018. doi:10.1007/978-981-10-5209-5.
- Mamoshina et al. (2016) P. Mamoshina, A. Vieira, E. Putin, A. Zhavoronkov, Applications of deep learning in biomedicine, Mol. Pharmaceutics 13 (2016) 1445–1454.
- Voulodimos et al. (2018) A. Voulodimos, N. Doulamis, A. Doulamis, E. Protopapadakis, Deep learning for computer vision: A brief review, Computational Intelligence and Neuroscience (2018).
- Cruz and Wishart (2006) J. Cruz, D. Wishart, Applications of machine learning in cancer prediction and prognosis, Cancer Informat 2 (2006).
- Gray et al. (2013) K. R. Gray, P. Aljabar, R. A. Heckemann, A. Hammers, D. Rueckert, Random forest-based similarity measures for multi-modal classification of alzheimer’s disease, NeuroImage 65 (2013) 167–175.
- Putatunda (2018) S. Putatunda, Care2vec: A deep learning approach for the classification of self-care problems in physically disabled children, arXiv:1812.00715 [cs.LG], 2018.
- Cheng et al. (2013) J. Cheng, J. Liu, Y. Xu, F. Yin, D. W. K. Wong, N.-M. Tan, D. Tao, C.-Y. Cheng, T. Aung, T. Y. Wong, Superpixel classification based optic disc and optic cup segmentation for glaucoma screening, IEEE Transactions on Medical Imaging 32 (2013) 1019–1032.
- Tang et al. (2013) L. Tang, M. Niemeijer, J. M. Reinhardt, M. K. Garvin, M. D. Abramoff, Splat feature classification with application to retinal hemorrhage detection in fundus images, IEEE Transactions on Medical Imaging 32 (2013) 364–375.
- Luo et al. (2014) Y. Luo, A. R. Sohani, E. P. Hochberg, P. Szolovits, Automatic lymphoma classification with sentence subgraph mining from pathology reports, Journal of the American Medical Informatics Association 21 (2014) 824–832.
- Guvenir and Emeksiz (2000) H. Guvenir, N. Emeksiz, An expert system for the differential diagnosis of erythemato-squamous diseases, Expert Systems with Applications 18 (2000) 43–49.
- Ubeyli (2009) E. D. Ubeyli, Combined neural networks for diagnosis of erythemato-squamous diseases, Expert Systems with Applications 36 (2009) 5107–5112.
- Xie and Wang (2011) J. Xie, C. Wang, Using support vector machines with a novel hybrid feature selection method for diagnosis of erythemato-squamous diseases, Expert Systems with Applications 38 (2011) 5809–5815.
- Nanni (2006) L. Nanni, An ensemble of classifiers for the diagnosis of erythemato-squamous diseases, Neurocomputing 69 (2006) 842–845.
- Menai (2015) M. E. B. Menai, Random forests for automatic differential diagnosis of erythemato–squamous diseases, International Journal of Medical Engineering and Informatics 7 (2015).
- Lekkas and Mikhailov (2010) S. Lekkas, L. Mikhailov, Evolving fuzzy medical diagnosis of Pima Indians diabetes and of dermatological diseases, Artificial Intelligence in Medicine 50 (2010) 117–126.
- Ubeyli and Guler (2005) E. Ubeyli, I. Guler, Automatic detection of erythemato-squamous diseases using adaptive neuro-fuzzy inference systems, Comput. Biol. Med. 35 (2005) 421–433.
- Ubeyli and Dogdu (2010) E. D. Ubeyli, E. Dogdu, Automatic detection of erythemato-squamous diseases using k-means clustering, Journal of Medical Systems 34 (2010) 179–184.
- Badrinath et al. (2013) N. Badrinath, G. Gopinath, K. Ravichandran, Design of automatic detection of erythemato-squamous diseases through threshold-based abc-felm algorithm, Journal of Artificial Intelligence 6 (2013) 245–256.
- Bojarczuk et al. (2001) C. C. Bojarczuk, H. S. Lopes, A. A. Freitas, Data mining with constrained-syntax genetic programming: Applications in medical data set, in: Data Analysis in Medicine and Pharmacology (IDAMAP- 2001), London, UK, 2001.
- Dheeru and Karra Taniskidou (2017) D. Dheeru, E. Karra Taniskidou, UCI machine learning repository, Available: https://archive.ics.uci.edu/ml/datasets/SCADI, 2017. [Dataset].
- McCulloch and Pitts (1943) W. S. McCulloch, W. Pitts, A logical calculus of the ideas immanent in nervous activity, The bulletin of mathematical biophysics 5 (1943) 115–133.
- Rosenblatt (1962) F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, 1962.
- Rumelhart et al. (1986) D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, in: D. E. Rumelhart, J. L. McClelland, C. PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, MA, USA, 1986, pp. 318–362.
- Rosenblatt (1958) F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review 65 (1958) 386–408.
- Cheng and Titterington (1994) B. Cheng, D. M. Titterington, Neural networks: A review from a statistical perspective, Statistical Science 9 (1994) 2–30.
- Bishop (2006) C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag, Berlin, Heidelberg, 2006.
- Putatunda (2019) S. Putatunda, Machine Learning: An Introduction, in: A. K. Laha (Ed.), Advances in Analytics and Applications, Springer Proceedings in Business and Economics, Springer Nature Singapore Pte Ltd., 2019, pp. 1–9. doi:10.1007/978-981-13-1208-3_1.
- Larsen (1999) J. Larsen, Introduction to Artificial Neural Network, 1st ed., Technical University of Denmark, 1999.
- Kumar et al. (2012) A. Kumar, M. Singh, S. Ghosh, A. Anand, Weather forecasting model using artificial neural network, Procedia Technology 4 (2012) 311–318.
- Baxt (1995) W. G. Baxt, Application of neural networks to clinical medicine, Lancet 346 (1995) 1135–1138.
- Eng et al. (2008) M. H. Eng, Y. Li, Q.-G. Wang, T. H. Lee, Forecast forex with ann using fundamental data, in: 2008 International Conference on Information Management, Innovation Management and Industrial Engineering, IEEE, Taipei, Taiwan, 2008. doi:10.1109/ICIII.2008.302.
- Laha and Putatunda (2018) A. Laha, S. Putatunda, Real time location prediction with taxi-gps data streams, Transportation Research Part C: Emerging Technologies 92 (2018) 298–322.
- Laha and Putatunda (2017) A. K. Laha, S. Putatunda, Travel Time Prediction for GPS Taxi Data Streams, Indian Institute of Management Ahmedabad, Working Paper No. 2017-03-03 (2017).
- Putatunda (2017) S. Putatunda, Streaming Data: New Models and Methods with Applications in the Transportation Industry, Ph.D. thesis, Indian Institute of Management Ahmedabad, 2017.
- Bishop (1995) C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Inc., New York, NY, USA, 1995.
- Goodfellow et al. (2016) I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
- Chollet (2017) F. Chollet, Deep Learning with Python, 1st ed., Manning Publications Co., 2017.
- Wang et al. (2012) J. Wang, H. He, D. V. Prokhorov, A folded neural network autoencoder for dimensionality reduction, Procedia Computer Science 13 (2012) 120–127.
- Ghosh et al. (2018) D. Ghosh, A. Olewnik, K. Lewis, Application of autoencoders in cyber-empathic design, Design Science 4 (2018).
- Blaschke et al. (2018) T. Blaschke, M. Olivecrona, O. Engkvist, J. Bajorath, H. Chen, Application of generative autoencoder in de novo molecular design, Molecular Informatics 37 (2018).
- Breiman et al. (1984) L. Breiman, J. Friedman, C. J. Stone, R. Olshen, Classification and regression trees, Taylor & Francis, 1984.
- Loh (2014) W.-Y. Loh, Fifty years of classification and regression trees, International Statistical Review 82 (2014) 329–348.
- Chen and Guestrin (2016) T. Chen, C. Guestrin, XGBoost: A Scalable Tree Boosting System, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, ACM, New York, NY, USA, 2016, pp. 785–794.
- Breiman (2001) L. Breiman, Random forests, Machine Learning 45 (2001) 5–32. Kluwer Academic Publishers, Hingham, MA, USA.
- Cover and Hart (1967) T. Cover, P. Hart, Nearest neighbor pattern classification, IEEE Trans. Inf. Theor. 13 (1967) 21–27.
- Cortes and Vapnik (1995) C. Cortes, V. Vapnik, Support-Vector Networks, Mach. Learn. 20 (1995) 273–297.
- Webb (2010) G. I. Webb, Naïve Bayes, Springer, Boston, MA, 2010, pp. 713–714.
- Hastie et al. (2009) T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, Springer series in statistics, 2nd ed., Springer, 2009. New York.
- James et al. (2013) G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2013.
- Rossum (1995) G. Rossum, Python Reference Manual, Technical Report, CWI (Centre for Mathematics and Computer Science), Amsterdam, The Netherlands, 1995.
- Pedregosa et al. (2011) F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine Learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.
- Abadi et al. (2016) M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, Tensorflow: A system for large-scale machine learning, in: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, USENIX Association, Berkeley, CA, USA, 2016, pp. 265–283.
- Chollet et al. (2015) Chollet et al., Keras, https://keras.io, 2015.
- Refaeilzadeh et al. (2009) P. Refaeilzadeh, L. Tang, H. Liu, Cross-validation, in: L. Liu, M. Özsu (Eds.), Encyclopedia of Database Systems, Springer, Boston, MA, 2009.
- Srivastava et al. (2014) N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research 15 (2014) 1929–1958.