PIMKL: Pathway Induced Multiple Kernel Learning

03/29/2018
by Matteo Manica, et al.

Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to improve in performance, most of these methods cannot integrate different data types and generalize poorly, which limits their application in clinical settings. Furthermore, many methods behave as black boxes, so the mechanisms that lead to a given prediction are poorly understood. While opaque machine behaviour might not be a problem in deterministic domains, in health care it is crucial to explain which molecular factors and phenotypes drive the classification in order to build trust in the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology that classifies samples reliably and, at the same time, provides a pathway-based molecular fingerprint of the signature that underlies the classification. PIMKL exploits prior knowledge in the form of molecular interaction networks and annotated gene sets by optimizing a mixture of pathway-induced kernels with a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After the combination of kernels has been optimized for the prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in light of the ingested prior knowledge and that can be used in transfer learning tasks.
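
The idea of mixing pathway-induced kernels can be illustrated with a minimal sketch. The abstract does not specify the kernel form or the MKL solver, so everything below is an assumption made for illustration: each pathway kernel is taken as K = X_p L_p X_p^T, with L_p the normalized Laplacian of the pathway's subgraph, and a simple kernel-target alignment heuristic stands in for a real MKL algorithm. The function names and the toy data are hypothetical.

```python
import numpy as np

def pathway_kernel(X, adjacency, gene_idx):
    """Kernel induced by one pathway (assumed form): K = X_p L_p X_p^T,
    where L_p is the symmetric normalized Laplacian of the subgraph
    spanned by the pathway's genes."""
    A = adjacency[np.ix_(gene_idx, gene_idx)].astype(float)
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5  # isolated genes keep 0
    L = np.eye(len(gene_idx)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    Xp = X[:, gene_idx]
    return Xp @ L @ Xp.T

def alignment_weights(kernels, y):
    """Stand-in for MKL: weight each kernel by its (clipped) alignment
    with the label kernel y y^T, then normalize the weights to sum to 1."""
    Y = np.outer(y, y)
    scores = []
    for K in kernels:
        denom = np.linalg.norm(K) * np.linalg.norm(Y)
        score = np.sum(K * Y) / denom if denom > 0 else 0.0
        scores.append(max(score, 0.0))
    scores = np.array(scores)
    if scores.sum() == 0:
        # Fall back to a uniform mixture when no kernel aligns positively.
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / scores.sum()

def combine(kernels, weights):
    """Convex combination of the pathway kernels."""
    return sum(w * K for w, K in zip(weights, kernels))

# Toy data (hypothetical): 6 samples, 5 genes on a chain-shaped network.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
y = np.array([1, 1, 1, -1, -1, -1])
adjacency = np.zeros((5, 5))
for i in range(4):
    adjacency[i, i + 1] = adjacency[i + 1, i] = 1
pathways = [[0, 1, 2], [2, 3, 4]]  # two overlapping gene sets

kernels = [pathway_kernel(X, adjacency, p) for p in pathways]
weights = alignment_weights(kernels, y)
K_combined = combine(kernels, weights)
```

In this sketch the weight vector plays the role of the pathway-based fingerprint: a larger weight marks a pathway whose kernel agrees more with the labels. The combined matrix could then be passed to any classifier that accepts a precomputed kernel.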


research
01/14/2021

Feature reduction for machine learning on molecular features: The GeneScore

We present the GeneScore, a concept of feature reduction for Machine Lea...
research
05/28/2020

Machine learning and excited-state molecular dynamics

Machine learning is employed at an increasing rate in the research field...
research
06/12/2021

SIMPLE: Sparse Interaction Model over Peaks of moLEcules for fast, interpretable metabolite identification from tandem mass spectra

Motivation: Recent success in metabolite identification from tandem mass...
research
12/02/2022

COmic: Convolutional Kernel Networks for Interpretable End-to-End Learning on (Multi-)Omics Data

Motivation: The size of available omics datasets is steadily increasing ...
research
11/24/2020

Making Graph Neural Networks Worth It for Low-Data Molecular Machine Learning

Graph neural networks have become very popular for machine learning on m...
research
11/01/2022

Informed Priors for Knowledge Integration in Trajectory Prediction

Informed machine learning methods allow the integration of prior knowled...
research
02/15/2018

Simulation assisted machine learning

Predicting how a proposed cancer treatment will affect a given tumor can...
