Pre-processing in AI based Prediction of QSARs

10/03/2009
by Om Prasad Patri, et al.

Machine learning, data mining, and artificial intelligence (AI) based methods have been used to determine the relations between the chemical structure of compounds and their biological activity, called quantitative structure-activity relationships (QSARs). The first step in this process is pre-processing of the dataset, which includes mapping a large number of molecular descriptors in the original high-dimensional space to a small number of components in a lower-dimensional space while retaining the features of the original data. A common practice is to apply a mapping method to a dataset without analyzing the dataset first. Our work stresses this pre-analysis by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods to each dataset. Based on this analysis, we infer the nature of the inherent relationships between the elements of each dataset and hence the mapping method best suited to it. We also show that proper pre-processing can help in choosing the right feature extraction tool and can give insight into the type of classifier pertinent to the given problem.
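As a rough illustration of what a pre-analysis step might look like, the sketch below checks how much of a dataset's variance a single linear component retains. It is not the authors' procedure: the abstract does not name the mapping methods used, so principal component analysis is assumed here as the linear mapping, and the helper names (`top_eigenpair`, `first_component_ratio`) and the toy datasets are hypothetical. Both toy datasets are intrinsically one-dimensional, so a low ratio for one linear component hints at nonlinear structure that a nonlinear mapping may capture better.

```python
import math
import random


def top_eigenpair(matrix, iters=1000, seed=0):
    """Dominant eigenvalue/eigenvector of a symmetric PSD matrix (power iteration)."""
    rng = random.Random(seed)
    n = len(matrix)
    vec = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        new = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    # Rayleigh quotient gives the eigenvalue for the (unit-length) vector.
    mv = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
    value = sum(v * m for v, m in zip(vec, mv))
    return value, vec


def first_component_ratio(rows):
    """Fraction of total variance explained by the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Covariance matrix of the centered descriptors.
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    total = sum(cov[j][j] for j in range(d))
    value, _ = top_eigenpair(cov)
    return value / total


# Hypothetical toy data: 41 evenly spaced points in [-1, 1].
xs = [-1.0 + 0.05 * i for i in range(41)]
linear_data = [[x, 2.0 * x + 0.01 * math.sin(37.0 * i)] for i, x in enumerate(xs)]
curved_data = [[x, x * x] for x in xs]

linear_ratio = first_component_ratio(linear_data)
curved_ratio = first_component_ratio(curved_data)
print("linear dataset:", round(linear_ratio, 3))
print("curved dataset:", round(curved_ratio, 3))
```

On real QSAR descriptors the same check would be run over many components and compared against nonlinear mappings; the single-component ratio here only shows the idea on a small scale.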

research
09/12/2017

Meta-QSAR: a large-scale application of meta-learning to drug design and discovery

We investigate the learning of quantitative structure activity relations...
research
04/03/2019

Optimized Preprocessing and Machine Learning for Quantitative Raman Spectroscopy in Biology

Raman spectroscopy's capability to provide meaningful composition predic...
research
12/26/2022

Artificial Intelligence to Enhance Mission Science Output for In-situ Observations: Dealing with the Sparse Data Challenge

In the Earth's magnetosphere, there are fewer than a dozen dedicated pro...
research
01/22/2021

Chemistry42: An AI-based platform for de novo molecular design

Chemistry42 is a software platform for de novo small molecule design tha...
research
09/08/2015

HEp-2 Cell Classification: The Role of Gaussian Scale Space Theory as A Pre-processing Approach

Indirect Immunofluorescence Imaging of Human Epithelial Type 2 (HEp-2) c...
research
01/21/2021

Toxicity Detection in Drug Candidates using Simplified Molecular-Input Line-Entry System

The need for analysis of toxicity in new drug candidates and the require...
research
05/16/2003

Applying Data Mining and Machine Learning Techniques to Submarine Intelligence Analysis

We describe how specialized database technology and data analysis method...
