A deoxyribonucleic acid (DNA) microarray, a collection of microscopic DNA spots attached to a solid surface, is an important biotechnology that allows scientists to measure the expression levels of a large number of genes simultaneously. One of the main purposes of DNA microarray experiments is to classify biological samples and to predict clinical or treatment strategies for certain diseases, such as various cancers, using gene expression data. Although classification is not a new problem for the statistical or data mining community, gene expression data from DNA microarray experiments exhibit some unique features. The two most important ones are high dimensionality and small sample size: the number of genes measured in a microarray experiment is much larger than the number of available samples [1, 2, 3, 4, 5, 6, 7, 8, 9]. Therefore, classifying DNA microarray data requires a new set of statistical or data mining methods that can reduce the dimension of the data, or select the most significant variables, while maintaining a high level of classification accuracy. From the biological perspective, this is equivalent to identifying informative genes associated with the occurrence of the disease under study.

There are many results in the literature that address the problem of informative gene selection for DNA microarray classification. A large group of these results [10, 11, 12, 13, 14, 15, 16] are based on the support vector machine (SVM) and its variants, a class of non-probabilistic machine learning algorithms that seek a nonlinear decision boundary efficiently. For instance, an fMRI-based hierarchical SVM was applied to the automatic classification and grading of liver fibrosis. A postprocessing SVM was proposed to detect incorrect and correct measurements in real-time continuous glucose monitoring systems. Maulik et al. proposed a novel transductive SVM that achieved better classification accuracy when selecting potential gene markers to predict cancer subtypes. Another group of researchers [17, 18, 19, 20, 21, 22, 23] address the problem of dimension reduction for microarray gene expression data by adding penalty terms related to the number of features to the cost function that the algorithm aims to minimize. This penalty strategy gives rise to algorithms including LASSO with an $\ell_1$-norm penalty and its improved variants [18, 19, 20], and sparse logistic or multinomial logistic regression with Bayesian regularization [22, 23].
To improve classification performance on DNA microarray gene expression data, the idea of selecting informative genes in groups has been exploited, because complex biological processes, such as those underlying tumor and cancer prediction and diagnosis, are not determined by a single gene but by the interactions of a few genes in groups. Group gene selection counterparts of some of the previously reviewed algorithms have been developed. Yuan et al. presented the group lasso (GL) algorithm, which was extended to logistic regression by Meier et al. In order to obtain group-wise sparsity and within-group sparsity simultaneously, Simon et al. presented the sparse group lasso (SGL) algorithm, which is solved numerically by accelerated generalized gradient descent. A multi-class classification variant, the multinomial sparse group lasso, was then developed.
Though the sparse group lasso and its improved variants can obtain sparsity within a group by introducing an extra $\ell_1$-norm penalty, they do not give the biological significance of the genes within a selected group. To estimate gene coefficients that represent the importance of individual genes while performing gene selection at the same time, statistical methods that adaptively select variables, such as the adaptive LASSO and the adaptive elastic net, should be applied. The adaptive lasso is a weighted $\ell_1$-penalization method with adaptive weights initially estimated from the data, and it enjoys the oracle properties. Subsequently, Zou and Zhang proposed the adaptive elastic net, a mixed $\ell_1$ and $\ell_2$ penalization method that can be applied to high-dimensional data while maintaining the oracle properties. It improves on the adaptive LASSO and the original elastic net by combining an adaptively weighted $\ell_1$ penalization term, whose weights are estimated from the original elastic net, with the $\ell_2$ penalization term from the elastic net. However, when the adaptive elastic net is applied to high-dimensional gene expression data, the limited precision of the initial estimator means that some significant genes might be falsely assigned smaller weight values. These important genes would then be incorrectly deleted from the model by shrinkage, which leads to lower prediction accuracy in informative gene selection for microarray DNA data. In addition, the adaptive elastic net might not perform well if the pairwise correlations between variables are not high.
To address these issues, we propose a new method, the adaptive elastic net with conditional mutual information (AEN-CMI), which weighs both the $\ell_1$ and $\ell_2$ penalization terms with weights estimated from the conditional mutual information among different genes in the data set. How the weights are estimated is the major departure of our method from the well-established adaptive elastic net. The idea of applying information theory to gene selection problems has been explored in the literature (see [28, 29] for example), where various information-theoretic feature selection algorithms have been developed to search for the best subset of genes. In particular, conditional mutual information has been used to infer the gene regulatory networks of cancer gene expression datasets. By incorporating the conditional mutual information, which captures the conditional dependency among genes, into the adaptive weight estimation, the aforementioned drawbacks of the adaptive elastic net can be avoided to a certain degree.

In this article, we present a full mathematical description of AEN-CMI and prove a theorem that explains why our method encourages a grouping effect. The optimization problem is then solved by a regularized solution path algorithm, the pathwise coordinate descent (PCD) algorithm. We evaluate the performance of our algorithm on colon cancer and leukemia gene expression datasets and compare it with other popular methods, including SVM, the classic elastic net and the adaptive elastic net. The experimental results show that our algorithm performs best in the sense that it obtains the highest classification accuracy using the smallest number of genes.
The rest of the paper is organized as follows. Section 2 briefly states the research problem and the preliminaries. The adaptive elastic net with conditional mutual information (AEN-CMI) is presented in Section 3. The regularized solution path algorithm, PCD, is developed in Section 4. Experimental results on the two cancer gene expression datasets are provided in Section 5, and Section 6 concludes the article.
2 Problem Statement and Preliminaries
Microarray classification is, in essence, a binary classification problem, whose abstract formulation we give below. Given a training set $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i = (x_{i1}, \ldots, x_{ip})$ is the input vector and $y_i$ denotes its class label, the classification problem aims to learn a discrimination rule $f: x \mapsto y$, so that a class label can be allotted to any new sample. For microarray gene expression data, $n$ and $p$ respectively represent the number of samples and the number of genes. Let $y = (y_1, \ldots, y_n)^T$ be the response vector and $X = (X_1, \ldots, X_p)$ be the $n \times p$ model matrix, where $X_j = (x_{1j}, \ldots, x_{nj})^T$ represents the $j$th predictor. We also assume that the response vector is centered and the columns of $X$ are standardized, i.e.,

$$\sum_{i=1}^{n} y_i = 0, \qquad \sum_{i=1}^{n} x_{ij} = 0, \qquad \sum_{i=1}^{n} x_{ij}^2 = 1, \qquad j = 1, \ldots, p. \tag{1}$$

According to a general linear regression model, we can predict the response vector by

$$\hat{y} = X \hat{\beta}, \tag{2}$$

where $\hat{\beta} = (\hat{\beta}_1, \ldots, \hat{\beta}_p)^T$ is the estimated coefficient vector. Note that the number of non-zero estimated coefficients in $\hat{\beta}$ equals the final number of selected genes. The classification function is obtained by applying an indicator function to the value predicted for the given sample by the discrimination rule. Hence, the binary classification problem can be handled by regression models.
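The centering and standardization assumptions above can be sketched in NumPy as follows. This is a minimal illustration, assuming the usual elastic net convention that each predictor column has zero mean and unit sum of squares; the function name `standardize` and the toy data are hypothetical.

```python
import numpy as np

def standardize(X, y):
    """Center y; center each column of X and scale it to unit sum of squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    y_centered = y - y.mean()
    X_centered = X - X.mean(axis=0)
    norms = np.sqrt((X_centered ** 2).sum(axis=0))
    norms[norms == 0] = 1.0  # constant columns stay at zero instead of dividing by 0
    return X_centered / norms, y_centered

# Toy data: 6 samples, 4 genes (values are arbitrary).
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(6, 4))
y_raw = rng.integers(0, 2, size=6)
X_std, y_cen = standardize(X_raw, y_raw)

# A hypothetical sparse coefficient vector: prediction is X @ beta,
# and the number of selected genes is the number of non-zero coefficients.
beta_hat = np.array([0.5, 0.0, -0.2, 0.0])
y_pred = X_std @ beta_hat
n_selected = int(np.count_nonzero(beta_hat))
```

Counting the non-zero entries of the coefficient vector mirrors the remark that the number of non-zero coefficients is the number of selected genes.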
The aim of this paper is two-fold: to predict the type of tumor for a new sample, and to automatically select the genes relevant to a biological process, that is, to perform gene selection in microarray data classification. These two challenging problems will be solved using a novel algorithm, the Adaptive Elastic Net with Conditional Mutual Information (AEN-CMI), which incorporates conditional mutual information into the variable selection process.
For the sake of completeness, we review some basic definitions from information theory, including entropy, mutual information and conditional mutual information, in the next subsection.
2.1 Information-Theoretic Measures
This subsection concisely reviews some basic concepts of information theory, focusing on entropy and mutual information.
Let $X$, $Y$ and $Z$ be three sets of discrete random variables. For simplicity, $P(X = x)$ is expressed as $p(x)$ in this paper. The information entropy of the variable $X$ can be defined as $H(X) = -\sum_{x} p(x) \log p(x)$, where $p(x)$ denotes the probability distribution of each $x$. The entropy of a random variable is an average measure of its uncertainty. The conditional information entropy of $X$ given $Y$ is denoted as $H(X \mid Y) = -\sum_{x} \sum_{y} p(x, y) \log p(x \mid y)$, where $p(x \mid y)$ denotes the conditional probability distribution. The conditional entropy $H(X \mid Y)$ is the entropy of a variable conditional on the knowledge of another variable.

Mutual information (MI) measures the amount of information shared by $X$ and $Y$ and describes the degree of correlation between the two variable sets. It is defined as

$$I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}, \tag{3}$$

where $p(x, y)$ denotes the joint probability of $x$ and $y$. By the definition of mutual information, the larger $I(X; Y)$ is, the more relevant the variables $X$ and $Y$ are.

Conditional mutual information (CMI) measures the conditional dependency between two variables given another variable. The CMI of variables $X$ and $Y$ given $Z$ is defined as

$$I(X; Y \mid Z) = \sum_{x} \sum_{y} \sum_{z} p(x, y, z) \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)}, \tag{4}$$

where $I(X; Y \mid Z)$ denotes the amount of information shared by variables $X$ and $Y$ given variable $Z$. The CMI plays a particularly important role in understanding the results of this paper.
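The measures reviewed in this subsection can be estimated from discrete samples by replacing probabilities with empirical frequencies. The sketch below is an illustration only: the function names are hypothetical, natural logarithms are assumed, and the identities $I(X;Y) = H(X) + H(Y) - H(X,Y)$ and $I(X;Y \mid Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)$ are used instead of the triple sums.

```python
from collections import Counter
from math import log

def entropy(*cols):
    """Plug-in joint entropy (in nats) of one or more equal-length discrete sequences."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * log(c / n) for c in Counter(joint).values())

def mutual_information(x, y):
    # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(x) + entropy(y) - entropy(x, y)

def conditional_mutual_information(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# Toy example: x and y are independent fair bits and z = x XOR y.
x = [0, 0, 1, 1]
y = [0, 1, 0, 1]
z = [0, 1, 1, 0]
mi_xx = mutual_information(x, x)                     # equals H(X) = log 2
mi_xy = mutual_information(x, y)                     # 0: x and y share no information
cmi_xy_z = conditional_mutual_information(x, y, z)   # log 2: given z, x determines y
```

The toy example shows why conditioning matters: two variables that look unrelated on their own can become fully dependent once a third variable is known.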
3 Adaptive Elastic Net with Conditional Mutual Information
In this section, we propose a strategy of adaptive gene selection, which is then developed into the AEN-CMI algorithm.
3.1 Strategy of adaptive gene selection
For cancer gene expression data, the variables (genes) $X_j$ and $X_k$ are two vectors whose elements denote their expression values in different conditions or samples. Mutual information describes the degree of correlation between genes $X_j$ and $X_k$. Conditional mutual information additionally takes the class label $y$ into account: it describes the correlation between a pair of genes, and hence their conditional dependency, when the class label is given. In the following, we propose a mechanism to assess the importance of the $j$th gene by applying the conditional mutual information.

Define $S_j$ to be the individual significance of the $j$th gene, i.e.,

$$S_j = \frac{1}{p - 1} \sum_{k \neq j} I(X_j; X_k \mid y), \tag{5}$$

where $X_j$ and $X_k$ respectively denote the $j$th and $k$th gene expression levels among all the genes, with $k \neq j$. $S_j$ is the class-conditional correlation between the $j$th gene and all the other genes; thus it measures the average information shared by gene $j$ and the remaining genes conditionally on $y$. Here, $S_j$ includes the complementary information between $X_j$ and all other genes, which enables us to assess correlation between genes in groups and could help us make more accurate predictions. Moreover, $S_j$ can be used as a quantitative index of how significant a gene is: the higher the value of $S_j$, the more significant the gene. As an extreme case, if $S_j = 0$, the gene cannot provide useful information for the class label.
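As an illustration of how such a per-gene significance score could be computed, the sketch below averages plug-in conditional mutual information estimates between each gene and all other genes, given the class label. The function names, the median-based discretization and the uniform average over the remaining genes are assumptions made for illustration, not details confirmed by the text.

```python
from collections import Counter
from math import log
import numpy as np

def entropy(*cols):
    """Plug-in joint entropy (in nats) of equal-length discrete sequences."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * log(c / n) for c in Counter(joint).values())

def cmi(x, y, z):
    # I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

def discretize(X):
    """Binarize each gene at its median (an assumed, deliberately simple choice)."""
    return (X > np.median(X, axis=0)).astype(int)

def gene_significance(X_disc, labels):
    """Average CMI between each gene and every other gene, given the labels.

    Requires at least two genes; the cost grows quadratically with their number.
    """
    n, p = X_disc.shape
    cols = [X_disc[:, j].tolist() for j in range(p)]
    lab = list(labels)
    S = np.zeros(p)
    for j in range(p):
        for k in range(j + 1, p):
            v = cmi(cols[j], cols[k], lab)  # CMI is symmetric in its first two arguments
            S[j] += v
            S[k] += v
    return S / (p - 1)

# Toy data: 20 samples, 4 genes, where gene 3 duplicates gene 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
X[:, 3] = X[:, 0]
labels = rng.integers(0, 2, size=20)
S = gene_significance(discretize(X), labels)
```

Because every pair of genes is visited, this direct version is only practical after the gene set has been pre-filtered or when the pairwise scores are computed in parallel.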
Based on (5), we further construct the weight coefficient for the th gene
where the controllable parameter is a given threshold. The th gene has distinct significance when . However, the th gene is not significant in predicting if . We denote the matrix of weights as
The computation of the weights and their meanings are not given in the multinomial sparse group lasso model. The initial consistent estimator is used to construct the weights for the adaptive lasso, and the initial elastic net estimator is used to construct the weights for the adaptive elastic net. Although these two kinds of weights have clear statistical meanings and can be broadly applied to evaluate gene importance, they do not carry an obvious biological significance. The strategy of adaptive gene selection presented in this paper does.
3.2 Statistical learning model
Utilizing the weight matrix (7) that contains the conditional mutual information of individual genes, we propose the following penalty term for the adaptive elastic net:
The proposed AEN-CMI algorithm aims to seek the following:
where $\lambda_1$ and $\lambda_2$ are the regularization parameters and $I$ is the identity matrix. Here, we use a squared error loss term.
In comparison with the adaptive elastic net, the proposed model (9) uses adaptive weights based on conditional mutual information instead of ridge regression. Since the same weight is imposed on both the 1-norm penalized coefficient and the 2-norm penalized coefficient, and conditional mutual information is robust to outliers in the dataset, the shrinkage of the adaptive elastic net with conditional mutual information would produce better performance in the process of automatic gene selection and have clear biological significance.
Since complex diseases are caused by disruptions in gene pathways rather than in individual genes, disease diagnosis using gene expression data should take grouping information into account. The elastic net algorithms [21, 18] are widely known for encouraging a grouping effect. In general, a regression approach exhibits the grouping effect if the regression coefficients of a group of highly correlated variables tend to be equal. It should be noted that important genes may be highly correlated with some inessential genes, so redundant noise variables could be included in these models. The following theorem shows that the AEN-CMI model can select the important genes within each group adaptively, which in turn encourages an adaptive grouping effect. The and in the following theorem are “significance of gene ranking”.
Suppose that the predictors , are standardized, if holds, then we have
where and .
For , according to Equ. (12) we have
It should be noted that . Hence, Equ. (13) can be represented as
Similarly to the above, it can be easily obtained that
From Equ. (10), it can be easily obtained
Note that Theorem 1 still holds for the case and . The only difference of representation is substituting for . If the and , then the following Corollary can be easily obtained.
Assume that the predictors are standardized. Let denote the optimal solution of Equ. (9). If , and , then
It should be noted that the , the parameter that quantitatively describes the grouping effect, in Equ. (18) is far less than 1. Therefore, AEN-CMI has stronger grouping effect for the case , in comparison with Elastic Net. This implies that more genes are deleted together by -norm shrinkage if they are less important to the classification. On the basis of Theorem 1, AEN-CMI can allot identical coefficients to the genes only if and . It is shown that given two gene groups with similar significance of the ranking (), the one with more genes should have a larger group size. This implies that AEN-CMI can adaptively control the size of the selected groups and thus adaptively select the important genes within each group by assessing the significance of the gene ranking.
In this section, we give the algorithm for solving the Adaptive Elastic Net with Conditional Mutual Information (AEN-CMI) on the augmented space, where popular algorithms such as LASSO, Forward Stagewise and LARS [16, 17] can be used to solve these models efficiently.
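To make the idea of an augmented space concrete, the sketch below shows the standard construction for the ordinary (unweighted) naive elastic net of [18], in which the ridge term is absorbed into extra rows of the data so that a lasso solver can be applied. It is an illustration under that assumption only; the weighted AEN-CMI transformation is not reproduced here, and the function name `augment` is hypothetical.

```python
import numpy as np

def augment(X, y, lam2):
    """Build augmented data (X*, y*) with n + p rows for a ridge parameter lam2.

    With beta* = sqrt(1 + lam2) * beta, the residual sum of squares on the
    augmented data equals ||y - X beta||^2 + lam2 * ||beta||^2.
    """
    n, p = X.shape
    X_aug = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
    y_aug = np.concatenate([y, np.zeros(p)])
    return X_aug, y_aug

# Numerical check of the identity on random data (5 samples, 8 predictors).
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))
y = rng.normal(size=5)
beta = rng.normal(size=8)
lam2 = 0.7

X_aug, y_aug = augment(X, y, lam2)
beta_aug = np.sqrt(1.0 + lam2) * beta
lhs = np.sum((y_aug - X_aug @ beta_aug) ** 2)
rhs = np.sum((y - X @ beta) ** 2) + lam2 * np.sum(beta ** 2)
```

The augmented matrix has more rows than columns even when the original data have far fewer samples than genes, which is what allows a lasso-type solver to be reused.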
It should be noted that there are $n + p$ observations and $p$ predictors on the augmented space, and $p$ is very large for cancer microarray data. Thus, solving the optimization problem proposed in AEN-CMI numerically is computationally expensive. To this end, we select the pathwise coordinate descent (PCD) algorithm owing to its fast computational speed for large $p$ and its availability in the R package “glmnet”.
This section gives the procedure of an efficient algorithm for solving AEN-CMI and selecting the optimal subset of informative genes, which is detailed in Algorithm 1.
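A pathwise coordinate descent scheme of the kind referred to above can be sketched as follows. This is not the paper's Algorithm 1: it assumes a generic weighted elastic net objective, $\frac{1}{2}\|y - X\beta\|^2 + \lambda_1 \sum_j w_j |\beta_j| + \frac{\lambda_2}{2} \sum_j w_j \beta_j^2$, with one non-negative weight $w_j$ per gene, and all function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def weighted_enet_cd(X, y, w, lam1, lam2, beta0=None, n_iter=200, tol=1e-8):
    """Cyclic coordinate descent for the assumed weighted elastic net objective."""
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else np.array(beta0, dtype=float)
    col_sq = (X ** 2).sum(axis=0)   # equals 1 for standardized columns
    resid = y - X @ beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (coordinate j removed).
            z = X[:, j] @ resid + col_sq[j] * old
            new = soft_threshold(z, lam1 * w[j]) / (col_sq[j] + lam2 * w[j])
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta

def solution_path(X, y, w, lam1_grid, lam2):
    """Fit along a decreasing grid of lam1 values, warm-starting each fit."""
    beta = np.zeros(X.shape[1])
    path = []
    for lam1 in sorted(lam1_grid, reverse=True):
        beta = weighted_enet_cd(X, y, w, lam1, lam2, beta0=beta)
        path.append((lam1, beta.copy()))
    return path

# Orthonormal toy design, where the solution is S(y_j, lam1 * w_j) / (1 + lam2 * w_j).
X = np.eye(3)
y = np.array([3.0, -0.5, 1.0])
w = np.ones(3)
beta_hat = weighted_enet_cd(X, y, w, lam1=1.0, lam2=1.0)
path = solution_path(X, y, w, lam1_grid=[0.5, 4.0], lam2=1.0)
```

Warm starts along the grid are what make the pathwise variant fast: each fit begins from the previous solution, which is usually close to the next one.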
5 Experiment Results
5.1 Colon Cancer Dataset
To test the effectiveness of the adaptive elastic net with conditional mutual information (AEN-CMI), we conduct experiments on the colon cancer gene expression data. The aim of the colon data [2, 8] is to distinguish cancerous tissues from normal colon tissues. The data are available online: http://www.weizmann.ac.il/mcb/UriAlon/download/downloadable-data. The colon data are obtained from 22 normal and 40 colon cancer tissues. Gene expression information is extracted from the DNA microarray data, resulting, after pre-processing, in a matrix containing the expression of the 2,000 genes with the highest minimal intensity across the 62 tissues. Since there is no predefined training and test set, we split the data randomly into a training set of 31 samples and a test set of the other 31 samples.
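The random split described above can be sketched as follows. The helper name `random_split` and the seed are hypothetical, and a plain (non-stratified) permutation is assumed, since the text does not say whether class proportions were preserved.

```python
import numpy as np

def random_split(n_samples, n_train, seed=None):
    """Return disjoint, sorted index arrays for a random train/test split."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    return np.sort(perm[:n_train]), np.sort(perm[n_train:])

# 62 colon tissue samples split into 31 training and 31 test samples.
train_idx, test_idx = random_split(62, 31, seed=0)
```

Repeating the call with different seeds gives the independent splits needed when an experiment is repeated several times.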
Then, we compute the adaptive weight matrix by equations (6) and (7). Finally, we solve AEN-CMI with the penalty factor and determine the relevant genes. The curves of misclassification error for the initial elastic net with the penalty factor are displayed in Fig. 1. They show that the number of misclassifications decreases as the number of adaptive adjustments increases. However, the classification accuracy does not keep improving as the number of adaptive adjustments becomes larger. In fact, using the third adaptive adjustment results in worse classification accuracy. The minimum misclassification error is about 3.
Next, we focus on the solution paths of AEN-CMI. To this end, we randomly split the data set into two parts: two-thirds for training and one-third for testing. The solution paths for the proposed AEN-CMI algorithm are illustrated in Fig. 2. The horizontal axis represents the natural logarithm of the regularization parameter, the vertical axis represents the coefficient values, and each line corresponds to the coefficient path of a particular gene. Note that any line segment between two inflection points is linear. Hence, every coefficient path of AEN-CMI is piecewise linear.
We compare AEN-CMI with SVM, Elastic Net, and Adaptive Elastic Net (AEN) based on two measurements: classification accuracy and gene selection performance. The entire process is repeated 10 times and the results are summarized in Table 1. As shown in the second column of Table 1, AEN-CMI achieves the best average classification accuracy (0.8512), slightly higher than AEN (0.8432) and much higher than the other two methods (0.7651 for SVM and 0.7803 for Elastic Net). The standard deviation for AEN-CMI (0.012) is the smallest among all the methods, which implies that AEN-CMI is more stable than the other methods.
As shown in the third column of Table 1, AEN-CMI and AEN select a similar average number of genes (24.43 and 25.21), far fewer than the other two methods (52.11 for SVM and 67.54 for Elastic Net). The adaptive strategy contributes to the improved gene selection of both methods. AEN-CMI also achieves the smallest standard deviation (1.52) for the number of selected genes among the four methods.
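Table 1: Average classification accuracy and average number of selected genes on the colon cancer dataset over 10 runs (standard deviations in parentheses).

| Method | Average classification accuracy | Average number of selected genes |
|---|---|---|
| SVM | 0.7651 (0.049) | 52.11 (4.73) |
| Elastic Net | 0.7803 (0.032) | 67.54 (4.51) |
| AEN | 0.8432 (0.042) | 25.21 (3.31) |
| AEN-CMI | 0.8512 (0.012) | 24.43 (1.52) |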

Table 2: Top twelve genes selected by AEN-CMI on the colon cancer dataset and their selected frequencies.

| EST name | GenBank Acc No | Gene description | Selected frequency |
|---|---|---|---|
| Hsa.8147 | M63391 | Human desmin gene, complete cds. | 9/10 |
| Hsa.36689 | Z50753 | H.sapiens mRNA for GCAP-II/uroguanylin precursor. | 10/10 |
| Hsa.3152 | D31885 | Human mRNA (KIAA0069) for ORF (novel protein), partial cds. | 9/10 |
| Hsa.42186 | AVPR1A | positive regulation of cellular pH reduction | 7/10 |
| Hsa.2487 | D14812 | Human mRNA for ORF, complete cds. | 10/10 |
| Hsa.3306 | X12671 | Human gene for heterogeneous nuclear ribonucleoprotein (hnRNP) core protein A1 | 8/10 |
| Hsa.1920 | X06614 | Human mRNA for receptor of retinoic acid. | 10/10 |
| Hsa.692 | M76378 | Human cysteine-rich protein (CRP) gene, exons 5 and 6. | 10/10 |
| | | Human neuronal kinesin heavy chain mRNA, complete cds. | |
| Hsa.1588 | U09587 | Human glycyl-tRNA synthetase mRNA, complete cds. | 10/10 |
| Hsa.2051 | X01060 | Human mRNA for transferrin receptor. | 8/10 |
| Hsa.41260 | L11706 | Human hormone-sensitive lipase (LIPE) gene, complete cds. | 7/10 |

Table 2 lists the top twelve genes and their selected frequencies over the experiments. Compared with AEN, AEN-CMI achieves almost the same test error, while it selects fewer genes and achieves the smallest standard deviations. It should also be noted that AEN-CMI can achieve an adaptive grouping effect in gene selection. For example, Hsa.3306 and Hsa.692 are selected in the same group. From the computational point of view, the regularization parameter and the kernel parameter of the SVM are optimized by searching a two-dimensional grid of values for both parameters, which can slow down the computation considerably. In AEN-CMI, the regularization parameters can instead be optimized by the regularization solution path algorithm (PCD).
5.2 Leukemia Cancer Dataset
To illustrate the effectiveness of AEN-CMI, we also conduct experiments on the leukemia dataset, which includes the expression profiles of 7129 genes in 47 acute lymphoblastic leukemia (ALL) and 25 acute myeloid leukemia (AML) samples. The data are available online: http://portals.broadinstitute.org/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=43. The data set is preprocessed following a previously published procedure, after which the 3571 most significant genes are selected for module detection. We let the label of the 47 ALL samples be 0 and that of the 25 AML samples be 1. In the following, we split the data randomly into 43 training samples and 29 test samples for the two types of acute leukemia.
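Table 3: Average classification accuracy and average number of selected genes on the leukemia dataset over 10 runs (standard deviations in parentheses).

| Method | Average classification accuracy | Average number of selected genes |
|---|---|---|
| SVM | 0.8002 (0.065) | 54.32 (5.33) |
| Elastic Net | 0.7983 (0.043) | 46.54 (5.03) |
| AEN | 0.8211 (0.031) | 28.00 (2.02) |
| AEN-CMI | 0.8398 (0.020) | 23.43 (1.69) |
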
We also compare AEN-CMI with SVM, Elastic Net, and Adaptive Elastic Net (AEN) on the leukemia dataset based on classification accuracy and gene selection performance. The entire process is repeated 10 times and the results are summarized in Table 3. As shown in Table 3, AEN-CMI again achieves the best average classification accuracy (0.8398) with the smallest standard deviation (0.020), and the smallest average number of selected genes (23.43), also with the smallest standard deviation (1.69).

Table 4: The 10 top-ranked informative genes found by AEN-CMI on the leukemia dataset and their selected frequencies.

| EST name | GenBank Acc No | Gene description | Selected frequency |
|---|---|---|---|
| M27891_at | CST3 | Cystatin C (amyloid angiopathy and cerebral hemorrhage) | 10/10 |
| M84526_at | DF | D component of complement (adipsin) | 10/10 |
| U18271_cds1_at | MPO | Myeloperoxidase | 8/10 |
| D87024_at | GB | DEF = (lambda) DNA for immunoglobin light chain | 9/10 |
| M31166_at | PTX3 | Pentaxin-related gene, rapidly induced by IL-1 beta | 10/10 |
| X57809_at | IGL | Immunoglobulin lambda light chain | 9/10 |
| M13792_at | ADA | Adenosine deaminase 1 | 10/10 |
| M98399_s_at | CD36 | CD36 antigen (collagen type I receptor, thrombospondin receptor) | 9/10 |
| U77948_at | KAI1 | Kangai 1 (suppression of tumorigenicity 6, prostate; CD82 antigen (R2 leukocyte antigen, antigen detected by monoclonal and antibody IA4)) | 8/10 |
| U05572_s_at | MANB | Mannosidase alpha-B (lysosomal) | 9/10 |

The 10 top-ranked informative genes found by AEN-CMI for the leukemia gene expression dataset are summarized in Table 4, where we give the comparisons for both the group case and the ungroup case. The important genes obtained by AEN-CMI are emphasized in bold. Some of the genes in the frequently selected gene sets have been biologically verified to be functionally related to carcinogenesis or tumor histogenesis. For example, in Table 4, the most frequently selected gene set of AEN-CMI includes the cystatin C (CST3) and myeloperoxidase (MPO) genes, which have been experimentally shown to be correlated with ALL or AML leukemia. Cystatin C is located in the extracellular region of the cell and has a role in the invasiveness of human glioblastoma cells. A decrease of cystatin C in the CSF might contribute to the process of metastasis and the spread of cancer cells in the leptomeningeal tissues. Matsuo et al. believed that the percentage of MPO-positive blast cells is the simplest and most useful factor for predicting the prognosis of AML patients in this category. As a further example, the genes CST3, MPO and IGL are highly correlated with the occurrence of leukemia. They are selected by AEN-CMI in the same group owing to the adaptive grouping effect.
In this paper, we propose a novel algorithm, the adaptive elastic net with conditional mutual information (AEN-CMI). It is shown that the proposed learning algorithm encourages an adaptive grouping effect by evaluating the gene ranking significance, and that it reduces the influence of a wrong initial estimate on gene selection and microarray classification. A fast algorithm is also developed to solve the proposed AEN-CMI method numerically. Applications of AEN-CMI to two cancer microarray data sets show that the proposed algorithm outperforms other algorithms, such as SVM, Elastic Net and the classic Adaptive Elastic Net.
Xinguang Yang and Yongjin Lu were supported by the Key Project of Science and Technology of Henan Province (Grant No. 182102410069).
- [1] T. Golub, D. Slonim, P. Tamayo, et al., “Molecular classification of cancer: class discovery and class prediction by gene expression monitoring,” Science, vol. 286, no. 5439, pp. 531-536, 1999.
- [2] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machine,” Machine Learning, vol. 46, no. 1, pp. 389-422, 2002.
- [3] T. Amaral, S. J. McKenna, K. Robertson, and A. Thompson, “Classification and immunohistochemical scoring of breast tissue microarray spots,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 10, pp. 2806-2814, 2013.
- [4] F. Zhang, Y. Song, W. D. Cai, M. Z. Lee, Y. Zhou, et al., “Lung nodule classification with multilevel patch-based context analysis,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 4, pp. 1155-1166, 2014.
- [5] Y. Wang, X. Li, and R. Ruiz, “Weighted general group lasso for gene selection in cancer classification,” IEEE Transactions on Cybernetics, doi:10.1109/TCYB.2018.2829811, 2018.
- [6] Z. Sun, H. Wang, W. Lau, G. Seet, D. Wang, and K. Lam, “Microarray data classification using the spectral-feature-based TLS ensemble algorithm,” IEEE Transactions on NanoBioscience, vol. 13, no. 3, pp. 289-299, 2014.
- [7] J. X. Liu, Y. Xu, C. H. Zheng, H. Kong, and Z. H. Lai, “RPCA-based tumor classification using gene expression data,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 12, no. 4, pp. 1-1, 2014.
- [8] C. Zheng, Y. Chong, and H. Wang, “Gene selection using independent variable group analysis for tumor classification,” Neural Computing and Applications, vol. 20, pp. 161-170, 2011.
- [9] Z. Yu, L. Li, J. Liu, and G. Han, “Hybrid adaptive classifier ensemble,” IEEE Transactions on Cybernetics, vol. 45, no. 2, pp. 177-190, 2015.
- [10] I. Guyon, J. Weston, S. Barnhill, and V. Vapnik, “Gene selection for cancer classification using support vector machine,” Machine Learning, vol. 46, no. 1-3, pp. 389-422, 2002.
- [11] Y. Sela, M. Freiman, E. Dery, et al., “fMRI-based hierarchical SVM model for the classification and grading of liver fibrosis,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 9, pp. 2574-2581, 2011.
- [12] Y. Leal, L. Gonzalez-Abril, C. Lorencio, J. Bondia, and J. Vehi, “Detection of correct and incorrect measurements in real-time continuous glucose monitoring systems by applying a postprocessing support vector machine,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 7, pp. 1891-1899, 2013.
- [13] U. Maulik, A. Mukhopadhyay, and D. Chakraborty, “Gene-expression-based cancer subtypes prediction through feature selection and transductive SVM,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 4, pp. 1111-1117, 2013.
- [14] I. Sen, M. Saraclar, and Y. P. Kahya, “A comparison of SVM and GMM-based classifier configurations for diagnostic classification of pulmonary sounds,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 7, pp. 1768-1776, 2015.
- [15] Y. Tian, Z. Qi, X. Ju, Y. Shi, and X. Liu, “Nonparallel support vector machines for pattern classification,” IEEE Transactions on Cybernetics, vol. 44, no. 7, pp. 1067-1079, 2014.
- [16] Z. Qi, Y. Tian, and Y. Shi, “Successive overrelaxation for Laplacian support vector machine,” IEEE Transactions on Neural Networks and Learning Systems, vol. 26, no. 4, pp. 674-683, 2015.
- [17] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society, Series B (Methodological), vol. 58, no. 1, pp. 267-288, 1996.
- [18] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society, Series B, vol. 67, no. 2, pp. 301-320, 2005.
- [19] D. Angelosante, J. A. Bazerque, and G. B. Giannakis, “Online adaptive estimation of sparse signals: where RLS meets the $\ell_1$-norm,” IEEE Transactions on Signal Processing, vol. 58, no. 7, pp. 3436-3447, 2010.
- [20] H. Zou, “The adaptive lasso and its oracle properties,” Journal of the American Statistical Association, vol. 101, no. 476, pp. 1418-1429, 2006.
- [21] H. Zou and H. H. Zhang, “On the adaptive elastic net with a diverging number of parameters,” Annals of Statistics, vol. 37, no. 4, pp. 1733-1751, 2009.
- [22] G. C. Cawley and N. L. C. Talbot, “Gene selection in cancer classification using sparse logistic regression with Bayesian regularisation,” Bioinformatics, vol. 22, no. 19, pp. 2348-2355, 2006.
- [23] B. Krishnapuram, L. Carin, M. A. Figueiredo, and A. J. Hartemink, “Sparse multinomial logistic regression: fast algorithms and generalization bounds,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 6, pp. 957-968, 2005.
- [24] M. Yuan and Y. Lin, “Model selection and estimation in regression with grouped variables,” Journal of the Royal Statistical Society, Series B, vol. 68, no. 1, pp. 49-67, 2006.
- [25] L. Meier, S. van de Geer, and P. Bühlmann, “The group lasso for logistic regression,” Journal of the Royal Statistical Society, Series B, vol. 70, pp. 53-71, 2008.
- [26] N. Simon, J. Friedman, T. Hastie, and R. Tibshirani, “A sparse-group lasso,” Journal of Computational and Graphical Statistics, vol. 22, no. 2, pp. 231-245, 2013.
- [27] M. Vincent and N. R. Hansen, “Sparse group lasso and high dimensional multinomial classification,” Computational Statistics and Data Analysis, vol. 71, pp. 771-786, 2014.
- [28] S. Ghorai, A. Mukherjee, S. Sengupta, and P. K. Dutta, “Cancer classification from gene expression data by NPPC ensemble,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 8, no. 3, pp. 659-671, 2011.
- [29] J. Meng, L. Yao, X. Sheng, D. Zhang, and X. Zhu, “Simultaneously optimizing spatial spectral features based on mutual information for EEG classification,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 1, pp. 227-240, 2014.
- [30] X. J. Zhang, X. M. Zhao, K. He, L. Lu, Y. W. Cao, J. D. Liu, J. K. Hao, Z. P. Liu, and L. N. Chen, “Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information,” Bioinformatics, vol. 28, no. 1, pp. 98-104, 2012.
- [31] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
- [32] J. Friedman, T. Hastie, H. Hofling, and R. Tibshirani, “Pathwise coordinate optimization,” Annals of Applied Statistics, vol. 1, no. 2, pp. 302-332, 2007.
- [33] T. Matsuo, K. Kuriyama, Y. Miyazaki, S. Yoshida, M. Tomonaga, et al., “The percentage of myeloperoxidase-positive blast cells is a strong independent prognostic factor in acute myeloid leukemia, even in the patients with normal karyotype,” Leukemia, vol. 17, no. 8, pp. 1538-1543, 2003.