Individual Recognition in Schizophrenia using Deep Learning Methods with Random Forest and Voting Classifiers: Insights from Resting State EEG Streams

06/20/2017 ∙ by Lei Chu, et al. ∙ Tennessee Tech University ∙ Shanghai Jiao Tong University

Recently, there has been growing interest in monitoring brain activity for individual recognition systems. So far, these works have mainly focused on single-channel data or on fragments of data collected by advanced brain monitoring modalities. In this study, we propose new individual recognition schemes based on spatio-temporal resting state electroencephalography (EEG) data. Instead of using features derived from manually designed procedures, we develop modified deep learning architectures that automatically extract an individual's unique features and perform classification. Replacing the softmax layer of the designed deep learning frameworks with a Random Forest (RF) yields a small but consistent advantage. Additionally, a voting layer is added on top of the designed neural networks to tackle the classification problem arising from EEG streams. Lastly, various experiments are conducted to evaluate the performance of the designed deep learning architectures. The results indicate that the proposed EEG-based individual recognition scheme yields a high degree of classification accuracy: 81.6% for characteristics in high risk (CHR) individuals, 96.7% for clinically stable first episode patients with schizophrenia (FES) and 99.2% for healthy controls (HC).

I Introduction

Schizophrenia is a chronic, frequently disabling mental disorder that affects about one per cent of the world's population [1], and it is widely perceived as one of the most severe mental disorders, compromising multiple aspects of everyday quality of life [2]. This predicament often continues in spite of pharmacological treatment of psychotic symptoms [3]. Accordingly, increasing attention is being paid to studies on individual recognition in schizophrenia (IRS) with the aim of surveillance, early detection or pre-diagnosis.

A typical IRS scheme consists of two phases: off-line training and online recognition [4, 5]. During the off-line training phase, informative and unique individual characteristics are measured and recorded by advanced brain monitoring modalities, which include EEG, electrical impedance tomography (EIT), magnetoencephalography (MEG), quantitative susceptibility mapping (MAP), electroneurogram (ENG), etc. These modalities promise to piece together different factors of the brain and provide new insights to help detect and treat diseases, so they are particularly appropriate for a disease such as schizophrenia, which impacts many aspects of the brain [6]. In this work, we use EEG as the brain monitoring modality for the following reasons:

  1. EEG provides data with high spatio-temporal resolution, a vivid reflection of the dynamics of the brain [7].

  2. EEG is the least expensive neuroimaging method, which plays a fundamental role in implementing deep learning methods [8].

  3. The EEG of a normal, healthy brain differs from that of a brain that is diseased, functioning abnormally, or in a different health condition [9].

  4. EEG shows small intra-personal differences and large inter-personal differences [10].

  5. A number of brain diseases can be diagnosed, studied and analyzed by EEG, including headache, Parkinson's disease, schizophrenia and attention deficit hyperactivity disorder. See the monographs [6, 8] for a more detailed summary.

Note that preprocessing of the raw data is also accomplished in this phase. A "brain fingerprint" is then constructed for every kind of candidate. During the online recognition phase, once the online individual characteristics have been collected and refined, various methods utilizing the recorded data and their extracted features can be applied to classify the candidates.

I-A Related Works and Motivations

EEG-based biometrics offer an exciting new form of human-computer interface in which a device can be controlled and data can be made available for individual recognition analysis. Related works on EEG-based individual recognition are summarized briefly as follows.

So far, the manually designed experimental protocols and EEG features commonly used in EEG biometric systems have aimed to recognize characteristics of spatially limited sets of brain regions. Nakayama and Abe discuss the feasibility of using single-channel EEG waveforms for single-trial classification of viewed characters [11]. Berthomier and coauthors present an automatic analysis of single-channel EEG measurements, validated in healthy individuals [12]. The monograph [13] gives a comprehensive study of the classification of EEG signals using single-channel independent component analysis, power spectrum, and linear discriminant analysis. To take advantage of the spatial information provided by multi-channel EEG and obtain higher classification accuracy, Krajca et al. achieve automatic identification of significant graphoelements in multi-channel EEG recordings by adaptive segmentation and fuzzy clustering [14]. Prasad and coauthors perform single-trial EEG classification using logistic regression based on ensemble synchronization; their method classifies each single trial of EEG as belonging to a patient with schizophrenia or a healthy control subject with 73% accuracy [15]. However, these works are based on features extracted from single-channel or multi-channel EEG fragments and fail to obtain accurate and robust classification results, which makes them infeasible in practice. Fortunately, recent advances in big data analysis based on streaming data [16, 17, 18] provide new means and ideas for the planning and design of classification schemes.

In this paper, we conduct the IRS task based on resting state EEG, for two reasons. First, evidence suggests that resting state electrical activity organizes and coordinates neuronal functions [19]. Second, certain tasks cannot be performed by certain groups of people, e.g., those with schizophrenia, attention deficit disorder, or hyperactivity disorder [20].

The difficulty encountered in a resting state EEG based IRS scheme is that resting state EEG streams [21] lack task-related features, which makes it hard to obtain the best and unique features for an individual. Accordingly, there is a great need for the capability to extract features automatically. Kottaimalai and coauthors put forward EEG signal classification using principal component analysis with a neural network in brain-computer interface applications [22]. Li and Fan suggest a classification method to separate schizophrenia and depression by EEG with artificial neural networks (ANN) [23]. Ruffini et al. present EEG-driven classification for the prognosis of neurodegeneration in at-risk patients by recurrent neural networks (RNN) [24]. ANN-based and RNN-based network structures require non-vectorial inputs such as matrices to be converted into vectors, which has proved problematic [25, 26]: vectorizing EEG streams loses spatio-temporal information and yields a very large solution space that demands very special treatment of the network parameters and a high computational cost. As a novel alternative, a convolutional neural network (CNN) can help improve a learning system through three advantages: sparse interactions, parameter sharing and equivariant representations [27]. Recently, Ma et al. conducted resting state EEG-based biometrics for individual identification using convolutional neural networks [28]; their results indicate that the CNN-based joint-optimized EEG-based biometric system yields a high degree of identification accuracy (88%), which still does not meet practical requirements. In summary, to obtain higher classification accuracy, combining a CNN-based network structure with spatio-temporal EEG analysis is indispensable.

I-B Our Contributions

Based on the considerations above, we propose a new IRS scheme using advanced deep learning methods, aiming at automatically extracting features and performing classification. Our main contributions are summarized as follows.

  1. Instead of utilizing short-term EEG data, which were found insufficient to provide the information required for IRS analysis, we employ streaming EEG data collected by multi-channel scalp electrodes.

  2. Three kinds of advanced deep learning methods are developed for IRS analysis.

  3. The softmax classifier widely used in classical deep learning methods is replaced by RF with the aim of improving classification accuracy.

  4. To tackle the classification problem with EEG data streams, a voting layer is added on top of the employed neural networks.

  5. Various experiments are conducted to investigate the effectiveness and robustness of the proposed IRS scheme.

The remainder of this paper is structured as follows. Section II first introduces the collection, preprocessing and mathematical representation of the EEG data streams; the proposed IRS scheme based on advanced deep learning methods is then developed in the second part of Section II. In Section III, numerical case studies are provided to evaluate the performance of the proposed IRS scheme. The conclusion and acknowledgements are given in Section IV and Section V, respectively.

II Materials and Methods

II-A EEG Data Collection

The present work studies the IRS problem by assessing three types of subjects: characteristics in high risk (CHR) individuals, clinically stable first episode patients with schizophrenia (FES) and healthy controls (HC). A total of 120 subjects (40 CHRs, 40 FESs and 40 HCs) were included, all recruited from outpatients at Shanghai Mental Health Center. All subjects were free of mental retardation, neurological diseases, substance or alcohol abuse, and any physical illness that might influence their cognitive function. The study protocol was approved by the Institutional Review Board of Shanghai Mental Health Center, and informed consent was obtained from all subjects.

The experimental data were provided by the Department of EEG Source Imaging at Shanghai Mental Health Center, so the data collection process is the same as in their previous work [29]. Participants were seated 1 m from the screen in a sound-attenuated and electrically shielded chamber with dim illumination. EEGs were recorded from 64-channel scalp electrodes mounted in an elastic cap (BrainCap, Brain Products, Inc., Bavaria, Germany), including two pairs of vertical and horizontal electrooculography (EOG) electrodes. The electrode-scalp impedance was kept below 5 kΩ for each electrode. Our analysis was performed on eyes-open resting conditions, with each recording lasting over 300 seconds. Data recording was referenced to the tip of the nose and sampled at Hz.

II-B EEG Data Preprocessing and Mathematical Representation

The Brain Vision Analyzer (1.05, Brain Products, Inc.) was utilized for EEG preprocessing [30]. Artifacts caused by vertical and horizontal eye movements and blinks were removed off-line by an ocular correction algorithm [31]. All the artifact-reduced EEG data were re-referenced using the common average reference and band-pass filtered to 0.01–50 Hz using a zero phase-shift IIR filter (24 dB/Oct). See Fig. 1 and Fig. 2 for an illustration. After that, the broadband EEG signals still containing artifacts were excluded using EEGLAB [31]. After the preprocessing, in addition to preserving the complete signals, the EEG signals were also band-pass filtered into four classic frequency bands, i.e., Hz, Hz, Hz, Hz and Hz bands, respectively, using least-squares FIR filters [32].

Fig. 1: Raw EEG data.
Fig. 2: Filtered EEG data.

In order to facilitate subsequent analysis, the mathematical representation of filtered EEG streams are described in the following. Let and denote the number of the available channel number of scalp electrodes and sampling time, respectively. To ensure the same length of collected EEG data in the following analysis, we have for all subjects. The total length of EEG data collected at every scalp electrode is . For th type subject, a sliding window based data allocation scheme for the large EEG data matrix is presented as follows. Let be the sliding window size and , then a sequence of matrix

(1)

is obtained to represent the collected EEG data streams. As shown in Fig. 3, these fragments are regarded as the raw brain fingerprints of all subjects.
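The sliding-window allocation described above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sliding_fragments`, the parameters `window` and `step`, and the default of non-overlapping windows are all hypothetical, since the window size and hop are not given in the text above.

```python
from typing import List, Optional

Matrix = List[List[float]]  # rows = channels, columns = time samples


def sliding_fragments(eeg: Matrix, window: int,
                      step: Optional[int] = None) -> List[Matrix]:
    """Cut a channels-by-time EEG matrix into fixed-width fragments.

    `window` is the fragment width in samples and `step` is the hop between
    consecutive fragments (defaults to `window`, i.e. no overlap).
    Both names are hypothetical, chosen for this sketch.
    """
    if not eeg or window <= 0:
        raise ValueError("need a non-empty matrix and a positive window")
    n_samples = len(eeg[0])
    if any(len(row) != n_samples for row in eeg):
        raise ValueError("all channels must have the same number of samples")
    hop = window if step is None else step
    if hop <= 0:
        raise ValueError("step must be positive")
    # Each fragment keeps every channel and a contiguous slice of time;
    # trailing samples that do not fill a whole window are dropped.
    return [
        [row[start:start + window] for row in eeg]
        for start in range(0, n_samples - window + 1, hop)
    ]


if __name__ == "__main__":
    toy = [[0, 1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15]]  # 2 channels, 6 samples
    for fragment in sliding_fragments(toy, window=3):
        print(fragment)
```

Each returned fragment is itself a channels-by-window matrix, so the spatial (channel) and temporal (sample) axes are both preserved for the network input.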

(a) FES subject
(b) CHR subject
(c) HC subject
Fig. 3: Examples of raw brain fingerprinting of three kinds of subjects (fragment size are: and ).

II-C Proposed IRS Scheme Based on Advanced Deep Learning Methods

II-C1 Classical Deep Learning Structures

As introduced in Section I-A, previous biometric systems have embraced classical deep learning methods such as ANN, RNN, CNN and their modified versions. Here, some technical details are discussed in order to better understand how to apply them. The input, the network and the classifier are the basic elements of a deep learning structure (see Fig. 4). The implementation of classical deep learning methods for the IRS problem is discussed in the following.

Fig. 4: The basic deep learning structure.

As introduced in the Section II-B, the collected EEG streams recorded for off-line analysis are represented by data fragments , where , , . These fragments were utilized to train the neural networks (ANN, CNN and RNN) after local normalization scheme represented by

(2)

For IRS classification problems based on deep learning methods, it is standard to use a softmax classifier at the top. Let the subjects be a finite space with a finite observation space . Let be the learned model of the conditional probability of seeing observation data with people . Let and be the activation of the penultimate layer nodes and the weight connecting the penultimate layer to the classifier layer, respectively. The total input into the classifier layer, denoted by , is

(3)

Given the softmax classifier, we have

(4)

The predicted class for the single fragment would be

(5)
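For reference, the standard softmax formulation that matches the description of the total input, the class probability and the predicted class can be written as below. The symbols are assumed notation for this sketch and are not necessarily the authors' own: $h_i$ for the penultimate-layer activations, $W_{ik}$ for the weights into the classifier layer, $a_k$ for the total input, $p_k$ for the class probability, and $K$ for the number of classes.

```latex
a_k = \sum_{i} h_i W_{ik}, \qquad
p_k = \frac{\exp(a_k)}{\sum_{j=1}^{K} \exp(a_j)}, \qquad
\hat{k} = \arg\max_{k \in \{1, \dots, K\}} p_k .
```

In this study there are three subject types (FES, CHR and HC), so $K = 3$ under this reading.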

II-C2 Deep Learning Methods using RF Classifier

Most deep learning methods for classification that utilize convolutional and fully connected layers use a softmax classifier to learn the small-size parameters. There are exceptions, notably the works [33, 34], supervised embedding with nonlinear NCA [35], semi-supervised deep embedding [36] and deep learning using linear support vector machines [37]. In this paper, we replace the softmax with RF for classification. RF has been studied extensively in the field of nonparametric statistics [38] and continues to be very popular because of its simplicity and its success on many practical problems [39, 40]. Here we first summarize the basic principle of RF as follows.

Let and be the data features and the corresponding labels. RF is built from a training set that make predictions for new points by looking at the neighborhood of the point, formalized by a weight function in th tree:

(6)

Here, is the non-negative weight of the th training point relative to the new point and is the number of nodes in the penultimate layer of the neural network in this work. For any , the weights sum to one. Since a forest averages the predictions of a group of trees with individual weight functions , its predictions are

(7)

then the predicted class is

(8)

For more technical details about RF, interested readers are referred to the distinguished works by Breiman [41, 42].
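To make the averaging step concrete, the sketch below shows how a forest can combine its trees: each tree returns a class-probability vector for the new point, the forest averages them, and the predicted class is the one with the largest average. The per-tree outputs are made-up numbers, and `forest_predict` is a hypothetical helper, not part of any library or of the authors' code.

```python
from typing import List, Sequence, Tuple


def forest_predict(tree_probs: Sequence[Sequence[float]]) -> Tuple[List[float], int]:
    """Average per-tree class probabilities and return (average, predicted class).

    `tree_probs[j][c]` is the probability that tree j assigns to class c.
    Ties are broken in favor of the lowest class index.
    """
    if not tree_probs:
        raise ValueError("the forest needs at least one tree")
    n_classes = len(tree_probs[0])
    if any(len(p) != n_classes for p in tree_probs):
        raise ValueError("every tree must score the same number of classes")
    n_trees = len(tree_probs)
    # Forest prediction = unweighted mean of the individual tree predictions.
    average = [sum(p[c] for p in tree_probs) / n_trees for c in range(n_classes)]
    predicted = max(range(n_classes), key=lambda c: average[c])
    return average, predicted


if __name__ == "__main__":
    # Four hypothetical trees scoring three classes (e.g. FES, HC, CHR).
    votes = [
        [0.50, 0.25, 0.25],
        [0.00, 0.75, 0.25],
        [0.25, 0.50, 0.25],
        [0.25, 0.50, 0.25],
    ]
    print(forest_predict(votes))
```

In the scheme described here, the input to such a forest would be the penultimate-layer features of the trained network rather than the raw EEG fragment.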

Another advanced classifier, a linear multi-class support vector machine (mSVM), which has been reported to achieve higher classification accuracy in [43, 44], is also adopted for comparison. We verified the effectiveness of the proposed scheme on the well-known dataset: a nine-layer CNN achieved a test error with RF classifier, with mSVM classifier and with softmax classifier.

II-C3 Streaming EEG Classification with a Voting Scheme

It is worth noting that the above modified deep learning methods are suitable to classify subjects with single fragment . Let . To handle the scenario that the subjects are with EEG streams , we develop a voting layer whose decision rule is denoted by

(9)

where .
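A simple way to realize such a voting layer is a majority vote over the per-fragment predictions of one subject. The sketch below assumes plain majority voting with ties broken in favor of the label seen first; the decision rule in Eq. (9) may differ in detail, and `vote` is a hypothetical name.

```python
from collections import Counter
from typing import Hashable, Sequence


def vote(fragment_labels: Sequence[Hashable]) -> Hashable:
    """Return the label predicted for the most fragments of one EEG stream."""
    if not fragment_labels:
        raise ValueError("no fragment predictions to vote on")
    counts = Counter(fragment_labels)
    # most_common orders by count; labels with equal counts keep the order
    # in which they were first encountered (Python 3.7+).
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical per-fragment outputs of the fragment-level classifier.
    print(vote(["FES", "HC", "FES", "CHR", "FES"]))
```

As long as most fragments of a subject are classified correctly, isolated fragment-level errors do not change the subject-level decision.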

Moreover, in this paper, we also adopt some other recently developed techniques to improve the performance of the deep learning methods employed in the IRS analysis. Specifically, we use the exponential linear unit (ELU) proposed in [45] to accelerate learning in deep neural networks. The dropout technique [46] is utilized to prevent substantial overfitting.

Lastly, for the readers' convenience, we give a brief summary of the modified deep learning methods employed in this work in the following two tables (Tab. I and Tab. II).

Layer | CNN | ANN | RNN
1st | Input | Input | Input
2nd | 2(Conv. + ELU) (kernel: , stride: 2) | Vectorization | Vectorization
3rd | Max Pooling (kernel: ); Dropout(Rate: 0.25) | Dense(512) | Recurrent layer (hidden units = 100)
4th | 2(Conv. + ELU) (kernel: ) | Activation('relu') | Dense(3)
5th | Max Pooling (kernel: , stride: 2); Dropout(Rate: 0.25) | Dropout(Rate: 0.25) | softmax or mSVM or RF
6th | 2(Conv. + ELU) (kernel: ) | Dense(512) | Output(Predict)
7th | Max Pooling (kernel: , stride: 2); Dropout(Rate: 0.25) | Activation('relu') | -
8th | Dense(128) + ELU | Dropout(Rate: 0.25) | -
9th | Dropout(Rate: 0.5) | Dense(512) | -
10th | Dense(128) + ELU | Activation('relu') | -
11th | Dropout(Rate: 0.5) | Dropout(Rate: 0.25) | -
12th | Dense(3) + ELU | Dense(3) | -
13th | Dropout(Rate: 0.5) | Dropout(Rate: 0.5) | -
14th | softmax or mSVM or RF | softmax or mSVM or RF | -
15th | Output(Predict) | Output(Predict) | -
TABLE I: THE STRUCTURES OF NEURAL NETWORKS.
Deep Learning Method | Explanation
ANNV | Classical ANN with softmax classifier and a voting layer
RNNV | Classical RNN with softmax classifier and a voting layer
CNNV | Classical CNN with softmax classifier and a voting layer
ANNV+mSVM | Modified ANN using mSVM classifier and a voting layer
RNNV+mSVM | Modified RNN using mSVM classifier and a voting layer
CNNV+mSVM | Modified CNN using mSVM classifier and a voting layer
ANNV+RF | Modified ANN using RF classifier and a voting layer
RNNV+RF | Modified RNN using RF classifier and a voting layer
CNNV+RF | Modified CNN using RF classifier and a voting layer
TABLE II: MODIFIED DEEP LEARNING METHODS

III Case Studies and Discussion

In this section, various experiments are conducted to evaluate the performance of the proposed IRS schemes using cross validation. Our results are averages over 1000 independent runs on a GeForce GTX 750.

III-A The Accuracy of Time-domain and Frequency-domain EEG Data Streams

Time-domain (as introduced in Section II-B) and frequency-domain (amplitude of the Fourier transform) EEG data streams are first used to assess the accuracy of the proposed IRS schemes. The window size is set as . The test data size is kept the same as the training data size for every subject. Tab. III shows that the proposed CNNV+RF achieves the best classification accuracy in every case except one HC entry, where CNNV+mSVM is marginally higher (0.985 vs. 0.981).
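The frequency-domain representation mentioned above, the amplitude of the Fourier transform, can be illustrated with a direct discrete Fourier transform of a single channel. This is a naive O(N²) sketch for clarity only; a real pipeline would use an FFT routine, and whether the transform is applied per fragment and per channel is an assumption here. The name `dft_amplitude` is hypothetical.

```python
import cmath
from typing import List, Sequence


def dft_amplitude(signal: Sequence[float]) -> List[float]:
    """Return |X_k| for k = 0..N-1, where X is the DFT of `signal`."""
    n = len(signal)
    if n == 0:
        raise ValueError("signal must contain at least one sample")
    amplitudes = []
    for k in range(n):
        # X_k = sum_t x_t * exp(-2*pi*i*k*t / N)
        coeff = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))
        amplitudes.append(abs(coeff))
    return amplitudes


if __name__ == "__main__":
    # One period of a cosine sampled at four points.
    print([round(a, 6) for a in dft_amplitude([1.0, 0.0, -1.0, 0.0])])
```

Applying such a transform to every channel of a fragment gives a matrix of the same shape as the time-domain fragment, so the same networks can consume either representation.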

Method | FES (time) | HC (time) | CHR (time) | FES (freq.) | HC (freq.) | CHR (freq.)
ANNV | 0.809 | 0.831 | 0.622 | 0.807 | 0.824 | 0.631
RNNV | 0.742 | 0.803 | 0.594 | 0.731 | 0.792 | 0.588
CNNV | 0.923 | 0.952 | 0.755 | 0.915 | 0.949 | 0.749
ANNV+mSVM | 0.811 | 0.841 | 0.643 | 0.804 | 0.929 | 0.639
RNNV+mSVM | 0.759 | 0.826 | 0.602 | 0.744 | 0.807 | 0.589
CNNV+mSVM | 0.946 | 0.983 | 0.790 | 0.937 | 0.985 | 0.766
ANNV+RF | 0.827 | 0.846 | 0.657 | 0.813 | 0.839 | 0.655
RNNV+RF | 0.766 | 0.839 | 0.613 | 0.793 | 0.816 | 0.649
CNNV+RF | 0.967 | 0.992 | 0.816 | 0.955 | 0.981 | 0.799
TABLE III: Classification Accuracy of Time-domain and Frequency-domain EEG Data Streams
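The size of the advantage from switching the classifier to RF can be read directly off Tab. III. The short sketch below recomputes it for the first three numeric columns (FES, HC, CHR), assuming the plain ANNV, RNNV and CNNV rows correspond to the softmax classifier. The dictionary and function names are hypothetical.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float]  # accuracies for (FES, HC, CHR)

# Values copied from the first three numeric columns of Tab. III.
SOFTMAX: Dict[str, Scores] = {
    "ANNV": (0.809, 0.831, 0.622),
    "RNNV": (0.742, 0.803, 0.594),
    "CNNV": (0.923, 0.952, 0.755),
}
RF: Dict[str, Scores] = {
    "ANNV": (0.827, 0.846, 0.657),
    "RNNV": (0.766, 0.839, 0.613),
    "CNNV": (0.967, 0.992, 0.816),
}


def rf_gains() -> Dict[str, Scores]:
    """Accuracy gain of the RF variant over the plain variant, per class."""
    return {
        net: tuple(round(r - s, 3) for r, s in zip(RF[net], SOFTMAX[net]))
        for net in SOFTMAX
    }


if __name__ == "__main__":
    for net, gain in rf_gains().items():
        print(net, gain)
```

Under this reading, every gain is positive, ranging from 0.015 to 0.061, which is consistent with the "small but consistent advantage" reported for RF.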

IV Conclusion

In conclusion, we have shown that CNNV+RF performs better than CNNV with softmax and CNNV+mSVM on a well-known dataset () and on the resting state EEG streams used in this paper. Switching from softmax or mSVM to RF is simple and appears to be helpful for classification problems. The experimental results show that classification performance improves as the size of the training database becomes larger. In the future, the proposed biometric system should be tested on a larger group and on more classes of subjects to further establish its accuracy, robustness and applicability. The experiments also suggest that our results can be improved simply by waiting for faster GPUs.

V Acknowledgements

We are grateful to the Department of EEG Source Imaging, led by Professor Jijun Wang and Professor Chunbo Li from SHJC, for providing the data and for helpful discussions. We are also grateful to Dr. Tianhong Zhang from SHJC for his expert collaboration on data analysis.

References

  • [1] T. R. Insel, “Rethinking schizophrenia.” Nature, vol. 468, no. 7321, pp. 187–193, 2010.
  • [2] S. Aswin, A. R. Bialas, D. R. Heather, D. Avery, T. R. Hammond, K. Nolan, T. Katherine, P. Jessy, B. Matthew, and V. D. Vanessa, “Schizophrenia risk from complex variation of complement component 4,” Nature, vol. 530, no. 7589, p. 177, 2016.
  • [3] R. A. Gould, K. T. Mueser, E. Bolton, V. Mays, and D. Goff, “Cognitive therapy for psychosis in schizophrenia: an effect size analysis.” Schizophrenia Research, vol. 48, no. 2–3, pp. 335–342, 2001.
  • [4] A. Ulas, U. Castellani, V. Murino, M. Bellani, M. Tansella, and P. Brambilla, Heat Diffusion Based Dissimilarity Analysis for Schizophrenia Classification.   Springer Berlin Heidelberg, 2011.
  • [5] H. S. Choi, B. Lee, and S. Yoon, “Biometric authentication using noisy electrocardiograms acquired by mobile sensors,” IEEE Access, vol. 4, pp. 1266–1273, 2016.
  • [6] A. E. Hassanien and A. T. Azar, Brain Computer Interfaces: Current Trends and Applications, 2014.
  • [7] C. Guger, A. Schlogl, C. Neuper, D. Walterspacher, T. Strein, and G. Pfurtscheller, “Rapid prototyping of an EEG-based BCI,” IEEE Trans. on Neural Systems and Rehab. Eng., 2001.
  • [8] C. P. Panayiotopoulos, EEG and brain imaging.   Springer London, 2010.
  • [9] M. G. Knyazeva and G. M. Innocenti, “EEG coherence studies in the normal brain and after early-onset cortical pathologies,” Brain Research Reviews, vol. 36, no. 2–3, pp. 119–128, 2001.
  • [10] M. A. Guevara, I. Lorenzo, C. Arce, J. Ramos, and M. Corsi-Cabrera, “Inter- and intrahemispheric EEG correlation during sleep and wakefulness.” Sleep, vol. 18, no. 4, p. 257, 1995.
  • [11] M. Nakayama and H. Abe, “Feasibility of using single-channel EEG waveforms for single-trial classification of viewed characters,” in International Conference on Information Society, 2010, pp. 218–223.
  • [12] C. Berthomier, X. Drouot, M. Herman-Stoica, P. Berthomier, J. Prado, D. Bokar-Thire, O. Benoit, J. Mattout, and M. P. D’Ortho, “Automatic analysis of single-channel sleep EEG: validation in healthy individuals.” Sleep, vol. 30, no. 11, pp. 1587–1595, 2007.
  • [13] H. Tjandrasa and S. Djanali, Classification of EEG Signals Using Single Channel Independent Component Analysis, Power Spectrum, and Linear Discriminant Analysis, 2016.
  • [14] V. Krajca, S. Petranek, I. Patakova, and A. Varri, “Automatic identification of significant graphoelements in multichannel EEG recordings by adaptive segmentation and fuzzy clustering.” International Journal of Bio-Medical Computing, vol. 28, no. 28, pp. 71–89, 1991.
  • [15] P. D. Prasad, H. N. Halahalli, J. P. John, and K. K. Majumdar, “Single-trial EEG classification using logistic regression based on ensemble synchronization.” IEEE Journal of Biomedical and Health Informatics, vol. 18, no. 3, pp. 1074–1080, 2014.
  • [16] E. Landhuis, “Neuroscience: Big brain, big data,” Nature, 2017.
  • [17] L. Chu, R. Qiu, X. He, Z. Ling, and Y. Liu, “Massive streaming pmu data modeling and analytics in smart grid state evaluation based on multiple high-dimensional covariance tests,” 2017, accepted by IEEE TRANSACTIONS ON BIG DATA.
  • [18] R. C. Qiu and P. Antonik, Smart Grid and Big Data: Theory and Practice.   Hoboken, NJ, USA: John Wiley and Sons, 2015.
  • [19] S. Kuhn and J. Gallinat, “Resting-state brain activity in schizophrenia and major depression: a quantitative meta-analysis.” Schizophrenia Bulletin, vol. 39, no. 2, pp. 358–65, 2011.
  • [20] C. Karatekin and R. F. Asarnow, “Working memory in childhood-onset schizophrenia and attention-deficit/hyperactivity disorder.” Psychiatry Research, vol. 80, no. 2, pp. 165–176, 1998.
  • [21] S. R. Sponheim, B. A. Clementz, W. G. Iacono, and M. Beiser, “Resting EEG in first-episode and chronic schizophrenia.” Psychophysiology, vol. 31, no. 1, pp. 37–43, 1994.
  • [22] R. Kottaimalai, M. P. Rajasekaran, V. Selvam, and B. Kannapiran, EEG signal classification using Principal Component Analysis with Neural Network in Brain Computer Interface applications, 2013.
  • [23] Y. J. Li and F. Y. Fan, “Classification of schizophrenia and depression by EEG with ANNs,” in International Conference of the Engineering in Medicine and Biology Society, 2006, pp. 2679–2682.
  • [24] G. Ruffini, D. Ibanez, M. Castellano, S. Dunne, and A. Soria-Frisch, EEG-driven RNN Classification for Prognosis of Neurodegeneration in At-Risk Patients.   Springer International Publishing, 2016.
  • [25] J. Gao, Y. Guo, and Z. Wang, “Matrix neural networks,” CoRR, vol. abs/1601.03805, 2016. [Online]. Available: http://arxiv.org/abs/1601.03805
  • [26] J. Schmidhuber, “Deep learning in neural networks: An overview.” Neural Networks, vol. 61, pp. 85–117, 2015.
  • [27] Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
  • [28] L. Ma, J. W. Minett, T. Blu, and S. Y. Wang, “Resting state EEG-based biometrics for individual identification using convolutional neural networks,” in International Conference of the IEEE Engineering in Medicine and Biology Society, 2015, p. 2848.
  • [29] J. Sun, Y. Tang, K. O. Lim, J. Wang, S. Tong, H. Li, and B. He, “Abnormal dynamics of EEG oscillations in schizophrenia patients on multiple time scales.” IEEE Transactions on Biomedical Engineering, vol. 61, no. 6, pp. 1756–1764, 2014.
  • [30] M. M. Lorist, E. Bezdan, M. T. Caat, M. M. Span, J. B. T. M. Roerdink, and N. M. Maurits, “The influence of mental fatigue and motivation on neural network dynamics; an EEG coherence study,” Brain Research, vol. 1270, no. 2, pp. 95–106, 2009.
  • [31] C. Brunner, A. Delorme, and S. Makeig, “EEGLAB - an open source MATLAB toolbox for electrophysiological research.” Biomedizinische Technik and biomedical Engineering, vol. 58, no. 18, pp. 3234–3234, 2013.
  • [32] P. P. Vaidyanathan and T. Nguyen, “Eigenfilters: A new approach to least-squares FIR filter design and applications including Nyquist filters,” IEEE Transactions on Circuits and Systems, vol. 34, no. 1, pp. 11–23, 1987.
  • [33] J. Weston, F. Ratle, H. Mobahi, and R. Collobert, Deep Learning via Semi-supervised Embedding.   Springer Berlin Heidelberg, 2012.
  • [34] J. Weston, F. Ratle, and R. Collobert, “Lecture notes in computer science,” in International Conference, 2008, pp. 1168–1175.
  • [35] G. W. Taylor, R. Fergus, G. Williams, I. Spiro, and C. Bregler, “Pose-sensitive embedding by nonlinear nca regression.” in Advances in Neural Information Processing Systems 23: Conference on Neural Information Processing Systems 2010. Proceedings of A Meeting Held 6-9 December 2010, Vancouver, British Columbia, Canada, 2010, pp. 2280–2288.
  • [36] J. Weston, F. Ratle, and R. Collobert, “Deep learning via semi-supervised embedding,” in International Conference on Machine Learning, 2008, pp. 1168–1175.
  • [37] C. Gentile, “A new approximate maximal margin classification algorithm,” Journal of Machine Learning Research, vol. 2, no. 2, pp. 213–242, 2001.
  • [38] R. Genuer, “Variance reduction in purely random forests,” Journal of Nonparametric Statistics, vol. 24, no. 3, pp. 1–20, 2012.
  • [39] M. Pal, “Random forest classifier for remote sensing classification,” International Journal of Remote Sensing, vol. 26, no. 1, pp. 217–222, 2005.
  • [40] R. Díaz-Uriarte and S. Alvarez de Andrés, “Gene selection and classification of microarray data using random forest.” BMC Bioinformatics, vol. 7, no. 1, p. 3, 2006.
  • [41] L. Breiman, “Random forests,” Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
  • [42] ——, “Consistency for a simple model of random forests,” Applied Economics - APPL ECON, 2004.
  • [43] R. Arunkumar and P. Karthigaikumar, “Multi-retinal disease classification by reduced deep learning features,” Neural Computing and Applications, pp. 1–6, 2015.
  • [44] Y. Tang, “Deep learning using linear support vector machines,” Computer Science, 2015.
  • [45] A. Shah, E. Kadam, H. Shah, S. Shinde, and S. Shingade, “Deep residual networks with exponential linear unit,” in International Symposium on Computer Vision and the Internet, 2016, pp. 59–65.
  • [46] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in International Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.