Pre-processing in AI based Prediction of QSARs

by Om Prasad Patri, et al.

Machine learning, data mining, and artificial intelligence (AI) based methods have been used to determine the relationships between chemical structure and biological activity, known as quantitative structure-activity relationships (QSARs). The first step in this process is pre-processing the dataset: mapping a large number of molecular descriptors in the original high-dimensional space to a small number of components in a lower-dimensional space while retaining the features of the original data. A common practice is to apply a mapping method to a dataset without first analyzing the data. Our work stresses the importance of this pre-analysis by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating the hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods to each dataset. From this analysis, we infer the nature of the inherent relationships between the elements of each dataset and, hence, the mapping method best suited to it. We also show that proper pre-processing can help in choosing the right feature extraction tool and can give insight into the type of classifier pertinent to the given problem.
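
The abstract does not name its three mapping methods or its comparison criterion, so the following is only an illustrative sketch of the kind of pre-analysis it describes. It assumes PCA as the linear mapping and Isomap as one possible nonlinear mapping, and scores each with a residual-variance measure (1 minus the squared correlation between reference distances and distances in the embedding), a criterion commonly reported for Isomap. The data are synthetic, not QSAR descriptors, and the functions `pca_embed`, `isomap_embed`, and `residual_variance` are hypothetical helpers written for this example. The idea: if a nonlinear mapping preserves the distance structure in far fewer components than a linear one, the descriptors likely carry nonlinear relationships.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X (exact zeros on the diagonal)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff**2, axis=-1))


def pca_embed(X, m):
    """Linear mapping (assumed stand-in): project centred data onto its top-m principal axes."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :m] * S[:m]


def geodesic_distances(X, k):
    """Shortest-path distances over a symmetric k-nearest-neighbour graph."""
    D = pairwise_distances(X)
    n = len(X)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        nbrs = np.argsort(D[i])[1:k + 1]  # index 0 is the point itself
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]  # keep the graph undirected
    for via in range(n):  # Floyd-Warshall relaxation through each vertex
        G = np.minimum(G, G[:, via:via + 1] + G[via:via + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase k")
    return G


def isomap_embed(X, m, k):
    """Nonlinear mapping (assumed stand-in): classical MDS on geodesic distances."""
    G = geodesic_distances(X, k)
    n = len(G)
    J = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    B = -0.5 * J @ (G**2) @ J  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:m]  # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0)), G


def residual_variance(ref_dist, Y):
    """1 - R^2 between reference distances and distances in the embedding Y."""
    iu = np.triu_indices(len(Y), k=1)
    r = np.corrcoef(ref_dist[iu], pairwise_distances(Y)[iu])[0, 1]
    return 1.0 - r**2


# Hypothetical stand-in for a descriptor matrix: 100 "compounds" lying on a
# 270-degree arc (a 1-D nonlinear structure) embedded in 10 descriptors.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.5 * np.pi, 100)
arc = np.column_stack([np.cos(t), np.sin(t)])
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal matrix
X = arc @ Q[:2]  # isometric embedding, shape (100, 10)

Y_pca = pca_embed(X, 1)
Y_iso, geo = isomap_embed(X, 1, k=5)
rv_pca = residual_variance(pairwise_distances(X), Y_pca)
rv_iso = residual_variance(geo, Y_iso)
print(f"PCA residual variance:    {rv_pca:.3f}")
print(f"Isomap residual variance: {rv_iso:.3f}")
```

On this toy arc the one-component nonlinear embedding should leave far less residual variance than the linear one, which is the kind of signal the pre-analysis would read as evidence of nonlinear structure. On real descriptor data one would sweep the number of components `m` (and the neighbourhood size `k`) rather than fix them.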
