Compound virtual screening by learning-to-rank with gradient boosting decision tree and enrichment-based cumulative gain

05/04/2022
by Kairi Furui, et al.

Learning-to-rank, a machine learning technique widely used in information retrieval, has recently been applied to ligand-based virtual screening to accelerate the early stages of new drug development. Ranking prediction models learn from ordinal relationships, which makes them suitable for integrating assay data measured in different environments. Existing studies of rank prediction in compound screening have generally used a learning-to-rank method called RankSVM. However, RankSVM has not been compared with or validated against the gradient boosting decision tree (GBDT)-based learning-to-rank methods that have recently gained popularity. Furthermore, although the ranking metric Normalized Discounted Cumulative Gain (NDCG) is widely used in information retrieval, it only determines whether one model's predictions are better than those of other models. In other words, NDCG cannot recognize when a prediction model produces worse-than-random results. Nevertheless, NDCG is still used to evaluate the performance of compound screening with learning-to-rank. This study applied GBDT models with the ranking loss functions lambdarank and lambdaloss to ligand-based virtual screening and compared the results with existing RankSVM methods and with GBDT models trained by regression. We also propose a new ranking metric, Normalized Enrichment Discounted Cumulative Gain (NEDCG), which aims to properly evaluate the goodness of ranking predictions. The results show that the GBDT models with learning-to-rank outperformed both GBDT regression and RankSVM on diverse datasets. Moreover, NEDCG showed that predictions by regression were comparable to random predictions on multi-assay, multi-family datasets, demonstrating its usefulness for a more direct assessment of compound screening performance.
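The limitation of NDCG described above can be illustrated with a small sketch. It assumes the common NDCG formulation with gain 2^rel - 1 and a log2 rank discount; the abstract does not state which variant the paper uses, and it does not give the definition of NEDCG, so NEDCG is not reproduced here. The relevance labels and the helper names `dcg` and `ndcg` are hypothetical and chosen only for illustration.

```python
import math
import random


def dcg(relevances):
    """Discounted cumulative gain: gain 2^rel - 1, discount log2(rank + 1), ranks from 1."""
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(true_relevance, scores):
    """NDCG of the ranking obtained by sorting items by descending predicted score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal_dcg = dcg(sorted(true_relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical graded activity labels for ten compounds (higher = more active).
relevance = [3, 2, 2, 1, 1, 0, 0, 0, 0, 0]

# A perfect ranking, the fully reversed ranking, and the average over random rankings.
perfect = ndcg(relevance, relevance)
worst = ndcg(relevance, [-r for r in relevance])
rng = random.Random(0)
trials = 2000
random_mean = sum(
    ndcg(relevance, [rng.random() for _ in relevance]) for _ in range(trials)
) / trials

print(f"perfect={perfect:.3f}  random_mean={random_mean:.3f}  worst={worst:.3f}")
```

Under these assumptions, even the fully reversed ranking receives a positive NDCG, and the random baseline sits somewhere between the worst and the perfect score. An NDCG value taken alone therefore does not say whether a model is better or worse than random, which is the gap the abstract says NEDCG is meant to address.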

