Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning

04/19/2022
by Katsuhisa Morita, et al.

Adverse events are a serious issue in drug development, and many machine learning methods for predicting them have been developed. Random-split cross-validation is the de facto standard for model building and evaluation in machine learning, but it should be applied with care in adverse event prediction because it tends to give overoptimistic estimates compared with the real-world situation. The time split, which partitions the data along the time axis, is considered suitable for real-world prediction. However, the differences in model performance between the time and random splits are not fully understood. To characterize these differences, we compared model performance under the two splits using eight types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split yielded higher area under the curve (AUC) values than the time split for six of the eight targets. The chemical spaces of the training and test datasets in the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences arising from the splitting strategy. The AUC differences were smaller for the protein interaction dataset than for the other datasets. Subsequent detailed analyses suggested a danger of confounding when knowledge-based information is used in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction, and suggest that appropriate use of the splitting strategies and careful interpretation of the results are necessary for real-world prediction of adverse events.
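To make the two splitting strategies concrete, the following is a minimal sketch in Python using only the standard library. It is an illustration, not the authors' pipeline: the abstract does not state which date defines the time axis, so the `approved` field, the toy records, and the helper names `random_split` and `time_split` are all hypothetical.

```python
import random
from datetime import date


def random_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and hold out a random fraction as the test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def time_split(records, test_fraction=0.25, date_key="approved"):
    """Sort the records by date and hold out the most recent fraction as the test set."""
    ordered = sorted(records, key=lambda r: r[date_key])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical toy data: one compound per year, with a binary adverse event label.
records = [
    {"compound": f"cpd_{i}", "approved": date(2000 + i, 1, 1), "adverse_event": i % 2}
    for i in range(12)
]

train_r, test_r = random_split(records)
train_t, test_t = time_split(records)

# In the time split, every test compound is newer than every training compound,
# which mimics predicting for compounds that did not exist at training time.
latest_train = max(r["approved"] for r in train_t)
earliest_test = min(r["approved"] for r in test_t)
print(latest_train < earliest_test)
```

In the random split, old and new compounds are mixed across the training and test sets, so the model can be evaluated on compounds that predate some of its training data. That mixing is one plausible reason the random split may look more optimistic than the time split.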


