Important Molecular Descriptors Selection Using Self Tuned Reweighted Sampling Method for Prediction of Antituberculosis Activity

02/21/2014
by Doreswamy, et al.

In this paper, a new descriptor selection method, named self tuned reweighted sampling (STRS), is developed for selecting an optimal combination of important descriptors from sulfonamide derivative data. Important descriptors are defined as the descriptors with large absolute coefficients in a multivariate linear regression model such as partial least squares (PLS). In this study, the absolute values of the regression coefficients of the PLS model are used as an index for evaluating the importance of each descriptor. Then, based on the importance level of each descriptor, STRS sequentially selects N subsets of descriptors from N Monte Carlo (MC) sampling runs in an iterative and competitive manner. In each sampling run, a fixed ratio (e.g. 80%) of the samples is selected to establish a regression model. Next, based on the regression coefficients, a two-step procedure is adopted to select the important descriptors: rapidly decreasing function (RDF) based enforced descriptor selection, followed by self tuned sampling (STS) based competitive descriptor selection. After running the loops, a number of subsets of descriptors are obtained, and the root mean squared error of cross validation (RMSECV) of the PLS models established with each subset is computed. The subset with the lowest RMSECV is considered the optimal descriptor subset. The performance of the proposed algorithm is evaluated on a sulfonamide derivative dataset. The results reveal a good characteristic of STRS: it can usually locate an optimal combination of important descriptors that are interpretable with respect to the biological activity of interest. Additionally, our study shows that STRS gives better prediction than full descriptor set PLS modeling and Monte Carlo uninformative variable elimination (MC-UVE).
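The two-step selection described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the RDF or the STS rule, so the code below assumes an exponential decay from all descriptors down to two (a form used in related reweighted-sampling methods) and weighted sampling with replacement on the absolute coefficients. The names `rdf_ratio` and `select_step` are hypothetical, and the fixed `coefs` list stands in for PLS coefficients that the actual method would refit in every Monte Carlo run.

```python
import math
import random


def rdf_ratio(i, n_runs, n_descriptors):
    """Fraction of descriptors kept at run i (1-based).

    Assumed form: exponential decay from 1.0 at run 1
    down to 2 / n_descriptors at the last run.
    """
    a = (n_descriptors / 2) ** (1 / (n_runs - 1))
    k = math.log(n_descriptors / 2) / (n_runs - 1)
    return a * math.exp(-k * i)


def select_step(coefs, active, ratio, n_total, rng):
    """One enforced-then-competitive selection step (illustrative only)."""
    # Importance index: absolute regression coefficient of each descriptor.
    weights = {j: abs(coefs[j]) for j in active}
    # Step 1 (enforced): keep only the top-ranked descriptors allowed by the RDF.
    n_keep = max(1, round(ratio * n_total))
    ranked = sorted(active, key=lambda j: weights[j], reverse=True)[:n_keep]
    ranked_weights = [weights[j] for j in ranked]
    if sum(ranked_weights) == 0:
        return sorted(ranked)  # nothing to compete on
    # Step 2 (competitive): weighted sampling with replacement, then keep
    # the distinct descriptors that were drawn at least once.
    drawn = rng.choices(ranked, weights=ranked_weights, k=len(ranked))
    return sorted(set(drawn))


rng = random.Random(0)
# Hypothetical coefficients; a real run would refit PLS on a random
# subset of samples and on the surviving descriptors each time.
coefs = [0.9, -0.05, 0.4, 0.0, -0.7, 0.02, 0.3, -0.01, 0.15, 0.6]
n_runs = 5
active = list(range(len(coefs)))
subsets = []
for i in range(1, n_runs + 1):
    ratio = rdf_ratio(i, n_runs, len(coefs))
    active = select_step(coefs, active, ratio, len(coefs), rng)
    subsets.append(active)
```

Each entry of `subsets` is one candidate descriptor subset. Under the procedure in the abstract, a PLS model would then be cross-validated on every subset and the one with the lowest RMSECV kept; that step is omitted here because it depends on the data and the PLS implementation.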

research
04/23/2018

Descriptor Selection via Self-Paced Learning for Bioactivity of Molecular Structure in QSAR Classification

Quantitative structure-activity relationship (QSAR) modelling is effecti...
research
04/23/2018

QSAR Classification Modeling for Bioactivity of Molecular Structure via SPL-Logsum

Quantitative structure-activity relationship (QSAR) modelling is effecti...
research
10/17/2022

Asymptotic control of the mean-squared error for Monte Carlo sensitivity index estimators in stochastic models

In global sensitivity analysis for stochastic models, the Sobol' sensiti...
research
12/12/2017

A Mathematical Programming Approach for Integrated Multiple Linear Regression Subset Selection and Validation

Subset selection for multiple linear regression aims to construct a regr...
research
04/20/2021

Bayesian subset selection and variable importance for interpretable prediction and classification

Subset selection is a valuable tool for interpretable learning, scientif...
research
11/17/2017

Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure

Genetic algorithms are a widely used method in chemometrics for extracti...
research
04/12/2016

Thesis: Multiple Kernel Learning for Object Categorization

Object Categorization is a challenging problem, especially when the imag...
