1 Introduction
A deoxyribonucleic acid (DNA) microarray, a collection of microscopic DNA spots attached to a solid surface, is an important biotechnology that allows scientists to measure the expression levels of a large number of genes simultaneously. One of the main purposes of conducting DNA microarray experiments is to classify biological samples and predict clinical or treatment strategies for certain diseases, such as various cancers, using gene expression data. Although classification is not a new problem to the statistical or data mining communities, gene expression data from DNA microarray experiments exhibit some unique features. The two most important are high dimensionality and small sample size, which are due to the fact that the number of genes collected from microarray experiments is much larger than the number of available samples
[1, 2, 3, 4, 5, 6, 8, 7, 9]. Therefore, classifying DNA microarray gene expression data requires a new set of statistical or data mining methods that can reduce the dimension of the data or select variables of great significance while maintaining a high level of classification accuracy. From the biological perspective, this is equivalent to identifying informative genes associated with the occurrence of the disease under study. There are many results in the literature that address the problem of informative gene selection for DNA microarray classification. A large group of these results [10, 13, 12, 11, 14, 15, 16] are based on the support vector machine (SVM) and its variants, a class of nonprobabilistic machine learning algorithms that seek a nonlinear decision boundary efficiently. For instance, an fMRI-based hierarchical SVM was applied to the automatic classification and grading of liver fibrosis in [11]. A post-processing SVM [12] was proposed to detect incorrect and correct measurements in real-time continuous glucose monitoring systems. In [13], Maulik et al. proposed a novel transductive SVM that achieved better classification accuracy in selecting potential gene markers to predict cancer subtypes. Another group of researchers [17, 18, 19, 20, 21, 22, 23] address the problem of dimension reduction for microarray gene expression data by adding penalty terms, related to the number of features, to the cost functional the algorithm aims to minimize. This penalty strategy gives rise to algorithms including the LASSO with its $\ell_1$ norm penalty [17] and its improved variants [18, 19, 20], and sparse logistic or multinomial logistic regression with Bayesian regularization
[22, 23]. To improve the classification performance on DNA microarray gene expression data, the idea of selecting informative genes in groups has been exploited, because complex biological processes, such as tumor and cancer prediction and diagnosis, are determined not by a single gene but by the interactions of a few genes in groups. Group gene selection counterparts of some of the previously reviewed algorithms have been developed. In [24], Yuan and Lin presented the group lasso (GL) algorithm, which was extended to logistic regression by Meier et al. in [25]. To obtain group-wise sparsity and within-group sparsity simultaneously, Simon et al. [26] presented the sparse group lasso (SGL) algorithm, which is solved numerically by the method of accelerated generalized gradient descent. The multiclass classification variant, the multinomial sparse group lasso, was then developed in [27].
Though the sparse group lasso [26] and its improved variants can obtain sparsity within a group by introducing an extra $\ell_1$ norm penalty, they do not reveal the biological significance of the genes within the same selected group. To estimate the gene coefficients that represent the importance of individual genes while simultaneously performing gene selection, statistical methods that can adaptively select variables, such as the adaptive LASSO
[20] and the adaptive elastic net [21], should be applied. The adaptive lasso [20] is a weighted penalization method with adaptive weights initially estimated from the data, and it enjoys the oracle properties. Subsequently, Zou and Zhang proposed the adaptive elastic net, an $\ell_1$ and $\ell_2$ mixed penalization method that can be applied to high-dimensional data while maintaining the oracle properties [21]. It improves on the adaptive LASSO and the original elastic net by combining an adaptively weighted $\ell_1$ penalization term, with weights estimated from an initial elastic net fit, with the $\ell_2$ penalization term of the elastic net. However, when applying the adaptive elastic net to high-dimensional gene expression data, owing to the low precision of the initial estimator, some significant genes might be falsely assigned small weight values in the initial estimate. These important genes would then be incorrectly deleted from the model by shrinkage, which leads to lower prediction accuracy in informative gene selection for DNA microarray data. In addition, the adaptive elastic net might not perform well if the pairwise correlations between variables are not high. To address these issues, we propose a new method, the adaptive elastic net with conditional mutual information (AENCMI), which weighs both the $\ell_1$ and $\ell_2$ penalization terms with weights estimated from the conditional mutual information among the genes in the data set. How the weights are estimated is the major departure of our method from the well established adaptive elastic net. The idea of applying information theory to gene selection problems has been explored in the literature (see [28, 29]
for example), where various information-theory-based feature selection algorithms have been developed to search for the best subset of genes. In particular, conditional mutual information has been used to infer the gene regulatory networks of cancer gene expression datasets
[30]. By incorporating the conditional mutual information, that is, the conditional dependency among genes, into the adaptive weight estimation, the aforementioned drawbacks of the adaptive elastic net can be avoided to a certain degree. In this article, we present a full mathematical description of the new AENCMI method and prove a theorem that explains why our method encourages a grouping effect. The optimization problem is then solved by a regularized solution path algorithm, the pathwise coordinate descent (PCD) algorithm. We then evaluate the performance of our algorithm on the colon cancer and leukemia gene expression datasets. The performance of the proposed algorithm is also compared with other popular methods, including SVM, the classic elastic net, and the adaptive elastic net. The experimental results show that our algorithm performs best in the sense that it obtains the highest classification accuracy using the smallest number of genes. The rest of the paper is organized as follows: Section 2 briefly states the research problem and the preliminaries. The adaptive elastic net with conditional mutual information (AENCMI) is presented in Section 3. The regularized solution path algorithm, PCD, is developed in Section 4. Experimental results on the two cancer gene expression datasets are provided in Section 5, and Section 6 concludes the article.
2 Problem Statement and Preliminaries
Microarray classification is in essence a binary classification problem, the abstract formulation of which we give below. Given a training set $T = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, where $\mathbf{x}_i \in \mathbb{R}^p$ is the input vector and $y_i \in \{0, 1\}$ denotes its class label, the classification problem aims to learn a discrimination rule $f: \mathbb{R}^p \to \{0, 1\}$. Hence, we can assign a class label to any new sample. For microarray gene expression data, the two class labels represent the tumor types and $p$ represents the number of genes. Let $\mathbf{y} = (y_1, \dots, y_n)^{T}$ be the response vector and $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)^{T} \in \mathbb{R}^{n \times p}$ be the model matrix. Let $\mathbf{x}^{(j)} = (x_{1j}, \dots, x_{nj})^{T}$ represent the $j$th predictor. We also assume that the response vector is centered and the columns of $X$ are standardized, i.e.,
$\sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} x_{ij} = 0, \qquad \sum_{i=1}^{n} x_{ij}^{2} = 1, \qquad j = 1, 2, \dots, p.$  (1)
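The preprocessing of Eq. (1) can be sketched as follows; the unit-squared-norm convention for the columns is the reading of the standardization adopted in this sketch.

```python
import numpy as np

def center_and_standardize(X, y):
    """Center y and each column of X, then scale each column so that
    sum_i x_ij^2 = 1 (a common standardization in lasso-type papers;
    this exact convention is an assumption of this sketch)."""
    y_c = y - y.mean()
    Xc = X - X.mean(axis=0)
    norms = np.sqrt((Xc ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # leave constant (all-zero) columns untouched
    return Xc / norms, y_c

# Toy data shaped like the colon set: 62 samples, 2000 genes
rng = np.random.default_rng(0)
X = rng.normal(size=(62, 2000))
y = rng.integers(0, 2, size=62).astype(float)
Xs, yc = center_and_standardize(X, y)
```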
According to a general linear regression model, we can predict the response vector $\mathbf{y}$ by
$\hat{\mathbf{y}} = X \hat{\boldsymbol{\beta}},$  (2)
where $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \dots, \hat{\beta}_p)^{T}$ is the estimated coefficient vector. Note that the number of nonzero estimated coefficients in $\hat{\boldsymbol{\beta}}$ is equal to the final number of selected genes. Let $f(\mathbf{x}) = I(\mathbf{x}^{T} \hat{\boldsymbol{\beta}} > 0.5)$ be the classification function, where $I(\cdot)$ denotes the indicator function and $\mathbf{x}^{T} \hat{\boldsymbol{\beta}}$ is the prediction value for the given sample under the discrimination rule. Hence, the binary classification problem can be handled by regression models.
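The regression-to-classification step just described can be sketched as follows; the 0.5 cut-off assumes 0/1-coded class labels and is an assumption of this sketch.

```python
import numpy as np

def classify(X, beta, cutoff=0.5):
    """Apply the discrimination rule: compute the fitted values X @ beta
    and use the indicator function (fitted value > cutoff) as the label."""
    return (X @ beta > cutoff).astype(int)

X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.8, 0.1]])
beta = np.array([1.0, 0.0])  # only the first "gene" is informative here
labels = classify(X, beta)   # fitted values: 1.0, 0.0, 0.8
```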
The aim of this paper is twofold: to predict the type of tumor for a new sample and to automatically select the important genes relevant to a biological process, in particular for gene selection in microarray data classification. These two challenging problems are solved using a novel algorithm, the adaptive elastic net with conditional mutual information (AENCMI), which incorporates conditional mutual information into the variable selection process.
For the sake of completeness, we review some basic definitions from information theory, including entropy, mutual information, and conditional mutual information, in the next subsection.
2.1 InformationTheoretic Measures
This section concisely reviews the principles of information theory, focusing on entropy and mutual information; the basic concepts follow [31]. Let $X$, $Y$, and $Z$ be three (sets of) discrete random variables. For simplicity, the probability $\Pr(X = x)$ is written simply as $p(x)$ in this paper. The information entropy of the variable $X$ is defined as $H(X) = -\sum_{x} p(x) \log p(x)$, where $p(x)$ denotes the probability distribution of each value $x$. The entropy of a random variable is an average measure of its uncertainty. The conditional information entropy of $X$ given $Y$ is $H(X \mid Y) = -\sum_{x, y} p(x, y) \log p(x \mid y)$, where $p(x \mid y)$ denotes the conditional probability distribution. The conditional entropy $H(X \mid Y)$ is the entropy of a variable conditional on another given variable. Mutual information (MI) measures the amount of information shared by $X$ and $Y$; it describes the degree of correlation between the two variables and is defined as follows:
$I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)},$  (3)
where $p(x, y)$ denotes the joint probability of $X$ and $Y$. By the definition of mutual information, the larger $I(X; Y)$ is, the more relevant the variables $X$ and $Y$ are.
Conditional mutual information (CMI) measures the conditional dependency between two variables given another variable. The CMI of the variables $X$ and $Y$ given $Z$ is defined as
$I(X; Y \mid Z) = \sum_{x, y, z} p(x, y, z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)},$  (4)
where $I(X; Y \mid Z)$ denotes the amount of information shared by the variables $X$ and $Y$ given the variable $Z$. The CMI plays a particularly important role in the results of this paper.
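For discrete samples, the quantities above can be estimated from empirical frequencies. A minimal sketch using the identities $I(X;Y) = H(X) + H(Y) - H(X,Y)$ and $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$, with the natural logarithm:

```python
import numpy as np
from collections import Counter

def entropy(*cols):
    """Empirical (joint) entropy H of one or more discrete sequences,
    estimated from observed frequencies, in nats."""
    n = len(cols[0])
    counts = Counter(zip(*cols))
    return -sum((c / n) * np.log(c / n) for c in counts.values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), cf. Eq. (3)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z), cf. Eq. (4)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)
```

For identical sequences the MI equals the entropy, and the CMI of two genes given a third vanishes when the conditioning variable already explains their dependence.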
3 Adaptive Elastic Net with Conditional Mutual Information
In this section, we propose a strategy of adaptive gene selection, which is then developed into the AENCMI algorithm.
3.1 Strategy of adaptive gene selection
For cancer gene expression data, two variables (genes) $\mathbf{x}^{(j)}$ and $\mathbf{x}^{(k)}$ are vectors whose elements are their expression values under different conditions or samples. Mutual information describes the degree of correlation between the genes $\mathbf{x}^{(j)}$ and $\mathbf{x}^{(k)}$. Conditional mutual information not only describes the degree of correlation between pairwise genes given the class label but also captures the conditional dependency between two genes when the class label is given. In the following, we propose a mechanism to assess the importance of the $j$th gene by applying conditional mutual information.
Define $C(j)$ to be the individual significance of the $j$th gene, i.e.,
$C(j) = \frac{1}{p - 1} \sum_{k \neq j} I(\mathbf{x}^{(j)}; \mathbf{x}^{(k)} \mid \mathbf{y}),$  (5)
where $\mathbf{x}^{(j)}$ and $\mathbf{x}^{(k)}$ respectively denote the $j$th and $k$th gene expression levels among all $p$ genes, $k \neq j$. $I(\mathbf{x}^{(j)}; \mathbf{x}^{(k)} \mid \mathbf{y})$ is the class-conditional correlation between gene $j$ and each of the other genes; thus $C(j)$ measures the average information shared by gene $j$ and the remaining genes conditionally on the class label $\mathbf{y}$. Here, $C(j)$ includes the complementary information between gene $j$ and all other genes, which enables us to assess the correlation between genes in groups and helps us make more accurate predictions. Moreover, $C(j)$ can be used as a quantitative index of how significant a gene is: the higher the value of $C(j)$, the more significant the gene. As an extreme case, if $C(j) = 0$, the gene cannot provide useful information about the class label.
Based on (5), we further construct the weight coefficient for the $j$th gene:
$w_j = \begin{cases} 1 / C(j), & C(j) \geq \varepsilon, \\ 1 / \varepsilon, & C(j) < \varepsilon, \end{cases}$  (6)
where the controllable parameter $\varepsilon > 0$ is a given threshold. The $j$th gene has distinct significance when $C(j) \geq \varepsilon$; however, the $j$th gene is not significant for predicting $\mathbf{y}$ if $C(j) < \varepsilon$. We denote the matrix of weights as
$W = \operatorname{diag}(w_1, w_2, \dots, w_p),$  (7)
where $w_j > 0$, $j = 1, 2, \dots, p$.
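A sketch of one plausible implementation of the significance scores of Eq. (5) and the weight matrix of Eq. (7) for discretized expression levels. The averaging of class-conditional CMI values, the reciprocal weight form, and the handling of genes below the threshold are all assumptions of this sketch, not the authors' exact recipe.

```python
import numpy as np
from collections import Counter

def _entropy(*cols):
    """Empirical joint entropy of discrete sequences, in nats."""
    n = len(cols[0])
    return -sum((c / n) * np.log(c / n)
                for c in Counter(zip(*cols)).values())

def cmi(x, y, z):
    """Empirical I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return _entropy(x, z) + _entropy(y, z) - _entropy(x, y, z) - _entropy(z)

def gene_significance(X_disc, y):
    """C(j): average CMI between gene j and every other gene, given the
    class label, following the reading of Eq. (5) used in this sketch."""
    n, p = X_disc.shape
    return np.array([
        np.mean([cmi(list(X_disc[:, j]), list(X_disc[:, k]), list(y))
                 for k in range(p) if k != j])
        for j in range(p)])

def weight_matrix(C, eps=1e-3):
    """Diagonal weight matrix W = diag(w_1, ..., w_p) of Eq. (7); genes
    below the threshold eps receive the largest weight so that shrinkage
    removes them (an assumption of this sketch)."""
    return np.diag(1.0 / np.maximum(C, eps))
```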
Remark 1
The computation of the weights and their meanings are not given in the multinomial sparse group lasso model [27]. An initial consistent estimator is used to construct the weights for the adaptive lasso [20], and an initial elastic net estimator is used to construct the weights for the adaptive elastic net [21]. Although these two kinds of weights have clear statistical meanings and can be broadly applied to evaluate gene importance, they do not indicate obvious biological significance. The strategy of adaptive gene selection presented in this paper does carry biological significance.
3.2 Statistical learning model
Utilizing the weight matrix (7), which contains the conditional mutual information of individual genes, we propose the following penalty term for the adaptive elastic net:
$P_{\lambda_1, \lambda_2}(\boldsymbol{\beta}) = \lambda_1 \sum_{j=1}^{p} w_j |\beta_j| + \lambda_2 \sum_{j=1}^{p} w_j \beta_j^{2},$  (8)
where $w_j$ is given by (6).
The proposed AENCMI algorithm seeks the following estimator:
$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \Big\{ \|\mathbf{y} - X \boldsymbol{\beta}\|_2^2 + \lambda_1 \sum_{j=1}^{p} w_j |\beta_j| + \lambda_2 \sum_{j=1}^{p} w_j \beta_j^{2} \Big\},$  (9)
where $\lambda_1, \lambda_2 \geq 0$ are the regularization parameters. Here, we use a squared error loss term.
The connection between the proposed AENCMI and some classic methods is the following: AENCMI (9) reduces to the adaptive lasso [20] if $\lambda_2 = 0$, and to the elastic net [18] if the weight matrix $W$ is the identity matrix.
Remark 2
In comparison with the adaptive elastic net [21], the proposed model (9) uses adaptive weights based on conditional mutual information instead of weights derived from an initial ridge regression estimate. Since the same weight $w_j$ is imposed on both the $\ell_1$-norm penalized coefficient and the $\ell_2$-norm penalized coefficient, and conditional mutual information is robust to outliers in the dataset, the shrinkage of the adaptive elastic net with conditional mutual information produces better performance in automatic gene selection and has clear biological significance.
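The AENCMI criterion of Eq. (9) can be written down directly. The sketch below assumes the squared-error loss with CMI-weighted $\ell_1$ and squared $\ell_2$ penalties described in the text; it evaluates the objective only, not the minimizer.

```python
import numpy as np

def aencmi_objective(beta, X, y, w, lam1, lam2):
    """Squared-error loss plus the weighted l1 and squared-l2 penalties
    of the AENCMI criterion, Eq. (9), as read in this sketch."""
    resid = y - X @ beta
    return (resid @ resid
            + lam1 * np.sum(w * np.abs(beta))
            + lam2 * np.sum(w * beta ** 2))
```

Setting `lam2 = 0` recovers an adaptive-lasso-type criterion, and taking all weights equal to 1 recovers the classic elastic net, matching the connections noted above.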
Since complex diseases are caused by disruptions in gene pathways rather than in individual genes, disease diagnosis using gene expression data should bring insights into the grouping information. The elastic net algorithms [21, 18] are widely known for encouraging a grouping effect. In general, if the regression coefficients of a group of highly correlated variables tend to be equal, then the regression approach can detect the grouping effect. It should be noted that, because important genes may be highly correlated with some inessential genes, redundant noise variables could be included in these models. The following theorem shows that the AENCMI model can select the important genes within each group adaptively, which in turn encourages an adaptive grouping effect. The quantities $C(i)$ and $C(j)$ in the following theorem are the "significance of gene ranking".
Theorem 1
Suppose that the predictors $\mathbf{x}^{(i)}$ and $\mathbf{x}^{(j)}$ are standardized. If $\hat{\beta}_i \hat{\beta}_j > 0$ holds, then we have
(10) 
where and .
Proof Let
(11) 
Note that Equ. (11) is an unconstrained convex optimization problem, so the subgradient of Equ. (11) with respect to $\boldsymbol{\beta}$ satisfies
(12) 
For , according to Equ. (12) we have
(13) 
It should be noted that . Hence, Equ. (13) can be represented as
(14) 
Similarly, it can easily be obtained that
(15) 
Subtracting Equ. (14) from Equ. (15) yields
(16) 
According to Equs. (7) and (11), we can obtain
Hence
From Equ. (10), it can be easily obtained
(17)  
Finally, substituting Equ. (3.2) and Equ. (17) into Equ. (16) yields (10), which completes the proof.
Note that Theorem 1 still holds for the case and . The only difference of representation is substituting for . If the and , then the following Corollary can be easily obtained.
Corollary 1
Assume that the predictors are standardized. Let denote the optimal solution of Equ. (9). If , and , then
(18) 
Remark 3
It should be noted that the parameter in Equ. (18) that quantitatively describes the grouping effect is far less than 1. Therefore, AENCMI has a stronger grouping effect in this case, in comparison with the elastic net. This implies that more genes are deleted together by the $\ell_1$ norm shrinkage if they are less important to the classification. On the basis of Theorem 1, AENCMI assigns identical coefficients to two genes only if their predictors coincide and their significance scores are equal. It is shown that, given two gene groups with similar significance of ranking, the one with more genes has a larger group size. This implies that AENCMI can adaptively control the size of the selected groups and thus adaptively select the important genes within each group by assessing the significance of the gene ranking.
4 Algorithm
In this section, we give the algorithm for solving the adaptive elastic net with conditional mutual information (AENCMI) on the augmented space, so that popular algorithms such as the LASSO, Forward Stagewise, and LARS [16, 17] can be used to solve these models efficiently.
It should be noted that the augmented space contains additional observations and predictors, and the number of predictors is very large for cancer microarray data. Thus, numerically solving the optimization problem proposed in AENCMI is computationally expensive. To this end, we select the pathwise coordinate descent (PCD) algorithm, owing to its fast computational speed for large numbers of predictors and its availability in the R package "glmnet".
This section details an efficient algorithm for solving AENCMI and selecting the optimal subset of informative genes, summarized in Algorithm 1.
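A minimal sketch of the cyclic coordinate-descent update used by PCD-type solvers (cf. [32]), applied to the AENCMI criterion as reconstructed above. The closed-form update assumes standardized columns with $\sum_i x_{ij}^2 = 1$, and the fixed number of sweeps used as a stopping rule is an assumption of this sketch.

```python
import numpy as np

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def coordinate_descent_aencmi(X, y, w, lam1, lam2, n_sweeps=200):
    """Cyclic coordinate descent for
        ||y - X b||^2 + lam1 * sum_j w_j |b_j| + lam2 * sum_j w_j b_j^2.
    With x_j'x_j = 1, each coordinate minimizer has the closed form
        b_j = S(x_j' r_j, lam1 * w_j / 2) / (1 + lam2 * w_j),
    where r_j is the partial residual excluding gene j."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            r_j = r + X[:, j] * beta[j]          # partial residual
            z = X[:, j] @ r_j
            b_new = soft_threshold(z, lam1 * w[j] / 2.0) / (1.0 + lam2 * w[j])
            r = r_j - X[:, j] * b_new            # keep residual in sync
            beta[j] = b_new
    return beta
```

With both penalties set to zero the iteration converges to the ordinary least-squares solution, which gives a simple sanity check on the update.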
5 Experiment Results
5.1 Colon Cancer Dataset
To test the effectiveness of the adaptive elastic net with conditional mutual information (AENCMI), we conduct experiments on colon cancer gene expression data. The aim of the colon dataset [2, 8] is to distinguish cancerous tissues from normal colon tissues. The data are available online: http://www.weizmann.ac.il/mcb/UriAlon/download/downloadabledata. The colon data were obtained from 22 normal and 40 colon cancer tissues. Gene expression information was extracted from DNA microarray data, resulting, after preprocessing, in a matrix containing the expression of the 2,000 genes with the highest minimal intensity across the 62 tissues. Since there are no predefined training and test sets, we split the data randomly into a training set of 31 samples and a test set of the other 31 samples.
We then compute the adaptive weight matrix $W$ by equations (6) and (7). Finally, we solve AENCMI with this penalty factor and determine the relevant genes. The curve of misclassification errors for the initial elastic net is displayed in Fig. 1. It shows that the number of misclassified samples decreases as the number of adaptive adjustments increases. However, the classification accuracy is not improved when the number of adaptive adjustments becomes larger; in fact, using a third adaptive adjustment results in worse classification accuracy. It is also shown that the minimum misclassification error (about 3 samples) is obtained.
Next, we focus on the solution paths of AENCMI. To this end, we randomly split the data set into two parts: two-thirds for training and one-third for testing. The solution paths for the proposed AENCMI algorithm are illustrated in Fig. 2. The horizontal axis represents the natural logarithm of the regularization parameter, the vertical axis represents the coefficient values, and each line corresponds to the coefficient path of a particular gene. Note that any line segment between two inflection points is linear; hence, every coefficient path of AENCMI is piecewise linear with respect to the regularization parameter.
We compare AENCMI with SVM, the Elastic Net, and the Adaptive Elastic Net (AEN) based on two measurements: classification accuracy and gene selection performance. The entire process is repeated 10 times, and the results are summarized in Table 1. As shown in the second column of Table 1, AENCMI achieves the best average classification accuracy: higher than AEN and much higher than the other two methods. The standard deviation for AENCMI is the smallest among all the methods, which implies that AENCMI is more stable than the other methods.
As shown in the third column of Table 1, AENCMI and AEN select similar average numbers of genes, which are much smaller than those of the other two methods. The adaptive strategy contributes to the improved gene selection of both methods. AENCMI achieves the smallest standard deviation for the number of selected genes among the four methods.
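The evaluation protocol above (repeated random splits, mean accuracy and standard deviation over 10 runs) can be sketched as follows. The least-squares fit used as the plug-in classifier here is only an illustration, not the AENCMI estimator itself, and the 0.5 cut-off assumes 0/1 labels.

```python
import numpy as np

def repeated_split_accuracy(X, y, fit, n_repeats=10, train_frac=0.5, seed=0):
    """Mean and standard deviation of test accuracy over repeated random
    train/test splits, mirroring the 10-run protocol of the experiments.
    `fit(X_tr, y_tr)` must return a coefficient vector; prediction uses
    the indicator rule with a 0.5 cut-off for 0/1 labels."""
    rng = np.random.default_rng(seed)
    n = len(y)
    accs = []
    for _ in range(n_repeats):
        idx = rng.permutation(n)
        n_tr = int(train_frac * n)
        tr, te = idx[:n_tr], idx[n_tr:]
        beta = fit(X[tr], y[tr])
        pred = (X[te] @ beta > 0.5).astype(int)
        accs.append(np.mean(pred == y[te]))
    return float(np.mean(accs)), float(np.std(accs))

def least_squares_fit(X_tr, y_tr):
    """Illustrative plug-in fit: ordinary least squares."""
    return np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]
```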
Table 1: Comparison of classification performance on the colon cancer dataset (standard deviations in parentheses).

Method  Average classification accuracy  Average number of selected genes
SVM  0.7651 (0.049)  52.11 (4.73)
Elastic Net  0.7803 (0.032)  67.54 (4.51)
AEN  0.8432 (0.042)  25.21 (3.31)
AENCMI  0.8512 (0.012)  24.43 (1.52)
Table 2: Top twelve genes most frequently selected by AENCMI on the colon cancer dataset.

EST name  GenBank Acc No  Gene description  Selected frequency
Hsa.8147  M63391  Human desmin gene, complete cds.  9/10
Hsa.36689  Z50753  H.sapiens mRNA for GCAP-II/uroguanylin precursor.  10/10
Hsa.3152  D31885  Human mRNA (KIAA0069) for ORF (novel protein), partial cds.  9/10
Hsa.42186  AVPR1A  Positive regulation of cellular pH reduction  7/10
Hsa.2487  D14812  Human mRNA for ORF, complete cds.  10/10
Hsa.3306  X12671  Human gene for heterogeneous nuclear ribonucleoprotein (hnRNP) core protein A1  8/10
Hsa.1920  X06614  Human mRNA for receptor of retinoic acid.  10/10
Hsa.692  M76378  Human cysteine-rich protein (CRP) gene, exons 5 and 6.  10/10
Hsa.447  U06698  Human neuronal kinesin heavy chain mRNA, complete cds.  9/10
Hsa.1588  U09587  Human glycyl-tRNA synthetase mRNA, complete cds.  10/10
Hsa.2051  X01060  Human mRNA for transferrin receptor.  8/10
Hsa.41260  L11706  Human hormone-sensitive lipase (LIPE) gene, complete cds.  7/10
Table 2 lists the top twelve genes and their selection frequencies over the experiments. Compared with AEN, AENCMI achieves almost the same test error while selecting fewer genes with the smallest standard deviation. It should also be noted that AENCMI achieves an adaptive grouping effect in gene selection; for example, Hsa.3306 and Hsa.692 are selected in the same group. From the computational point of view, the regularization parameter and the kernel parameter of SVM are optimized by searching a two-dimensional grid of values for both parameters, which can slow down the computation considerably, whereas the regularization parameters of AENCMI can be optimized by the regularized solution path algorithm (PCD).
5.2 Leukemia Cancer Dataset
To further illustrate the effectiveness of AENCMI, we also conduct experiments on the leukemia dataset [2], which includes the expression profiles of 7,129 genes in 47 acute lymphoblastic leukemia (ALL) and 25 acute myeloid leukemia (AML) samples. The data are available online: http://portals.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=43. The dataset is preprocessed as in [2]; after preprocessing, the 3,571 most significant genes are selected for module detection. We label the 47 ALL samples 0 and the 25 AML samples 1, and split the data randomly into 43 training samples and 29 test samples for the two types of acute leukemia.
Table 3: Comparison of classification performance on the leukemia dataset (standard deviations in parentheses).

Method  Average classification accuracy  Average number of selected genes
SVM  0.8002 (0.065)  54.32 (5.33)
Elastic Net  0.7983 (0.043)  46.54 (5.03)
AEN  0.8211 (0.031)  28.00 (2.02)
AENCMI  0.8398 (0.020)  23.43 (1.69)
We also compare AENCMI with SVM, the Elastic Net, and the Adaptive Elastic Net (AEN) on the leukemia dataset based on classification accuracy and gene selection performance. The entire process is repeated 10 times, and the results are summarized in Table 3. As shown in Table 3, AENCMI again achieves the best average classification accuracy with the smallest standard deviation, and selects the smallest average number of genes, also with the smallest standard deviation.
Table 4: Top ten informative genes selected by AENCMI on the leukemia dataset.

EST name  GenBank Acc No  Gene description  Selected frequency
M27891_at  CST3  Cystatin C (amyloid angiopathy and cerebral hemorrhage)  10/10
M84526_at  DF  D component of complement (adipsin)  10/10
U18271_cds1_at  MPO  Myeloperoxidase  8/10
D87024_at  GB  DEF = (lambda) DNA for immunoglobin light chain  9/10
M31166_at  PTX3  Pentaxin-related gene, rapidly induced by IL-1 beta  10/10
X57809_at  IGL  Immunoglobulin lambda light chain  9/10
M13792_at  ADA  Adenosine deaminase 1  10/10
M98399_s_at  CD36  CD36 antigen (collagen type I receptor, thrombospondin receptor)  9/10
U77948_at  KAI1  Kangai 1 (suppression of tumorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen, antigen detected by monoclonal antibody IA4))  8/10
U05572_s_at  MANB  Mannosidase alpha-B (lysosomal)  9/10
The summaries of the 10 top-ranked informative genes found by AENCMI for the leukemia gene expression dataset are shown in Table 4, where we give comparisons for both the grouped and ungrouped cases. The important genes obtained by AENCMI are emphasized in bold. Some of the genes included in the frequently selected gene sets have been biologically verified to be functionally related to carcinogenesis or tumor histogenesis. For example, in the most frequently selected gene set of AENCMI, the cystatin C (CST3) and myeloperoxidase (MPO) genes are experimentally proved to be correlated with ALL or AML. The cystatin C gene product is located in the extracellular region of the cell and has a role in the invasiveness of human glioblastoma cells; a decrease of cystatin C in the CSF might contribute to the process of metastasis and the spread of cancer cells in the leptomeningeal tissues. Matsuo et al. [33] concluded that the percentage of MPO-positive blast cells is the simplest and most useful factor for predicting the prognosis of AML patients in this category. As further examples, the genes CST3, MPO, and IGL are highly correlated with the occurrence of leukemia; they are selected by AENCMI in the same group owing to the adaptive grouping effect.
6 Conclusion
We have proposed a novel algorithm, the adaptive elastic net with conditional mutual information (AENCMI), in this paper. It is shown that the proposed learning algorithm encourages an adaptive grouping effect by evaluating the significance of the gene ranking, and reduces the influence of a wrong initial estimate on gene selection and microarray classification. A fast solving algorithm is also developed to implement the proposed AENCMI method numerically. Applications of AENCMI to two cancer microarray data sets show that the proposed algorithm outperforms other algorithms such as SVM, the Elastic Net, and the classic Adaptive Elastic Net.
Acknowledgments.
Xinguang Yang and Yongjin Lu were supported by the Key Project of Science and Technology of Henan Province (Grant No. 182102410069).
References
 [1] T. Golub, D. Slonim, P. Tamayo, et al., "Molecular classification of cancer: class discovery and class prediction by gene expression monitoring," Science, vol. 286, no. 5439, pp. 531-536, 1999.
 [2] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1, pp. 389-422, 2002.
 [3] T. Amaral, S. J. McKenna, K. Robertson, and A. Thompson, "Classification and immunohistochemical scoring of breast tissue microarray spots," IEEE Transactions on Biomedical Engineering, vol. 60, no. 10, pp. 2806-2814, 2013.
 [4] F. Zhang, Y. Song, W. D. Cai, M. Z. Lee, Y. Zhou, et al., "Lung nodule classification with multilevel patch-based context analysis," IEEE Transactions on Biomedical Engineering, vol. 61, no. 4, pp. 1155-1166, 2014.
 [5] Y. Wang, X. Li, and R. Ruiz, "Weighted general group lasso for gene selection in cancer classification," IEEE Transactions on Cybernetics, doi:10.1109/TCYB.2018.2829811, 2018.
 [6] Z. Sun, H. Wang, W. Lau, G. Seet, D. Wang, and K. Lam, "Microarray data classification using the spectral-feature-based TLS ensemble algorithm," IEEE Transactions on NanoBioscience, vol. 13, no. 3, pp. 289-299, 2014.
 [7] J. X. Liu, Y. Xu, C. H. Zheng, H. Kong, and Z. H. Lai, "RPCA-based tumor classification using gene expression data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 4, pp. 11, 2014.
 [8] C. Zheng, Y. Chong, and H. Wang, "Gene selection using independent variable group analysis for tumor classification," Neural Computing and Applications, vol. 20, pp. 161-170, 2011.
 [9] Z. Yu, L. Li, J. Liu, and G. Han, "Hybrid adaptive classifier ensemble," IEEE Transactions on Cybernetics, vol. 45, no. 2, pp. 177-190, 2015.
 [10] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, "Gene selection for cancer classification using support vector machines," Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.
 [11] Y. Sela, M. Freiman, E. Dery, et al., "fMRI-based hierarchical SVM model for the classification and grading of liver fibrosis," IEEE Transactions on Biomedical Engineering, vol. 58, no. 9, pp. 2574-2581, 2011.
 [12] Y. Leal, L. Gonzalez-Abril, C. Lorencio, J. Bondia, and J. Vehi, "Detection of correct and incorrect measurements in real-time continuous glucose monitoring systems by applying a postprocessing support vector machine," IEEE Transactions on Biomedical Engineering, vol. 60, no. 7, pp. 1891-1899, 2013.
 [13] U. Maulik, A. Mukhopadhyay, and D. Chakraborty, "Gene-expression-based cancer subtypes prediction through feature selection and transductive SVM," IEEE Transactions on Biomedical Engineering, vol. 60, no. 4, pp. 1111-1117, 2013.
 [14] I. Sen, M. Saraclar, and Y. P. Kahya, "A comparison of SVM and GMM-based classifier configurations for diagnostic classification of pulmonary sounds," IEEE Transactions on Biomedical Engineering, vol. 62, no. 7, pp. 1768-1776, 2015.
 [15] Y. Tian, Z. Qi, X. Ju, Y. Shi, and X. Liu, "Nonparallel support vector machines for pattern classification," IEEE Transactions on Cybernetics, vol. 44, no. 7, pp. 1067-1079, 2014.

 [16] Z. Qi, Y. Tian, and Y. Shi, "Successive overrelaxation for Laplacian support vector machine," IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 4, pp. 674-683, 2015.
 [17] R. Tibshirani, "Regression shrinkage and selection via the lasso," Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267-288, 1996.
 [18] H. Zou and T. Hastie, "Regularization and variable selection via the elastic net," Journal of the Royal Statistical Society, Series B, vol. 67, no. 2, pp. 301-320, 2005.
 [19] D. Angelosante, J. A. Bazerque, and G. B. Giannakis, "Online adaptive estimation of sparse signals: where RLS meets the $\ell_1$-norm," IEEE Transactions on Signal Processing, vol. 58, no. 7, pp. 3436-3447, 2010.
 [20] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.
 [21] H. Zou and H. H. Zhang, "On the adaptive elastic-net with a diverging number of parameters," Annals of Statistics, vol. 37, no. 4, pp. 1733-1751, 2009.
 [22] G. C. Cawley and N. L. C. Talbot, "Gene selection in cancer classification using sparse logistic regression with Bayesian regularisation," Bioinformatics, vol. 22, no. 19, pp. 2348-2355, 2006.
 [23] B. Krishnapuram, L. Carin, M. A. Figueiredo, and A. J. Hartemink, "Sparse multinomial logistic regression: fast algorithms and generalization bounds," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 957-968, 2005.
 [24] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," Journal of the Royal Statistical Society, Series B, vol. 68, no. 1, pp. 49-67, 2006.
 [25] L. Meier, S. van de Geer, and P. Bühlmann, "The group lasso for logistic regression," Journal of the Royal Statistical Society, Series B, vol. 70, pp. 53-71, 2008.
 [26] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, "A sparse-group lasso," Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 231-245, 2013.
 [27] M. Vincent and N. R. Hansen, "Sparse group lasso and high dimensional multinomial classification," Computational Statistics and Data Analysis, vol. 71, pp. 771-786, 2014.
 [28] S. Ghorai, A. Mukherjee, S. Sengupta, and P. K. Dutta, "Cancer classification from gene expression data by NPPC ensemble," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 659-671, 2011.
 [29] J. Meng, L. Yao, X. Sheng, D. Zhang, and X. Zhu, "Simultaneously optimizing spatial spectral features based on mutual information for EEG classification," IEEE Transactions on Biomedical Engineering, vol. 62, no. 1, pp. 227-240, 2014.
 [30] X. J. Zhang, X. M. Zhao, K. He, L. Lu, Y. W. Cao, J. D. Liu, J. K. Hao, Z. P. Liu, and L. N. Chen, "Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information," Bioinformatics, vol. 28, no. 1, pp. 98-104, 2012.
 [31] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
 [32] J. Friedman, T. Hastie, H. Höfling, and R. Tibshirani, "Pathwise coordinate optimization," Annals of Applied Statistics, vol. 1, no. 2, pp. 302-332, 2007.
 [33] T. Matsuo, K. Kuriyama, Y. Miyazaki, S. Yoshida, M. Tomonaga, et al., "The percentage of myeloperoxidase-positive blast cells is a strong independent prognostic factor in acute myeloid leukemia, even in the patients with normal karyotype," Leukemia, vol. 17, no. 8, pp. 1538-1543, 2003.