A Framework for Implementing Machine Learning on Omics Data

The potential benefits of applying machine learning methods to -omics data are becoming increasingly apparent, especially in clinical settings. However, the characteristics of these data do not always suit machine learning techniques: they are often generated with different technologies in different labs, and they are frequently high-dimensional. In this paper we present a framework for combining -omics data sets and for handling high-dimensional data, making -omics research more accessible to machine learning applications. We demonstrate the framework through the integration and analysis of multi-analyte data for a set of 3,533 breast cancers. We then use this data set to predict survival for breast cancer patients at risk of an impending event, with higher accuracy and lower variance than methods trained on individual data sets. We hope that our pipelines for data set generation and transformation will open up -omics data to machine learning researchers. We have made these pipelines freely available for noncommercial use at www.ccg.ai.
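The two tasks named in the abstract, combining data sets and handling high dimensionality, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the sample IDs, the synthetic matrices, and the choice of PCA as the reduction step are all assumptions made for illustration only.

```python
import numpy as np

def combine_omics(ids_a, X_a, ids_b, X_b):
    """Keep only samples present in both data sets and join their features.

    Hypothetical helper: rows are samples, columns are features.
    """
    shared = sorted(set(ids_a) & set(ids_b))
    pos_a = {s: i for i, s in enumerate(ids_a)}
    pos_b = {s: i for i, s in enumerate(ids_b)}
    rows_a = [pos_a[s] for s in shared]
    rows_b = [pos_b[s] for s in shared]
    # Reorder both matrices to the shared sample order, then stack columns.
    return shared, np.hstack([X_a[rows_a], X_b[rows_b]])

def zscore(X):
    """Standardise each feature to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features at zero instead of dividing by 0
    return (X - mu) / sd

def pca_reduce(X, n_components):
    """Project samples onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

# Synthetic stand-ins for two analytes measured on overlapping samples.
rng = np.random.default_rng(0)
ids_expr = ["s1", "s2", "s3", "s4", "s5"]
ids_cna = ["s3", "s1", "s5", "s6"]
expr = rng.normal(size=(5, 200))  # e.g. expression-like features (assumed)
cna = rng.normal(size=(4, 150))   # e.g. copy-number-like features (assumed)

shared_ids, combined = combine_omics(ids_expr, expr, ids_cna, cna)
reduced = pca_reduce(zscore(combined), n_components=2)
```

Here only the three samples measured in both tables survive the join, and the 350 concatenated features are reduced to two components per sample. A real pipeline would also need to address batch effects and missing values, which this sketch ignores.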


