METCC: METric learning for Confounder Control Making distance matter in high dimensional biological analysis

12/07/2018
by   Kabir Manghnani, et al.

High-dimensional data acquired from biological experiments such as next-generation sequencing are subject to a number of confounding effects. These include technical effects, such as batch-to-batch variation from instrument noise or sample processing and institution-specific differences in sample acquisition and physical handling, as well as biological effects arising from true but irrelevant differences in the biology of each sample, such as age biases in diseases. Prior work has used linear methods to adjust for such batch effects. Here, we apply contrastive metric learning with a non-linear triplet network to optimize the ability to distinguish biologically distinct sample classes in the presence of irrelevant technical and biological variation. Using whole-genome cell-free DNA data from 817 patients, we demonstrate that our approach, METric learning for Confounder Control (METCC), matches or exceeds the classification performance achieved using a best-in-class linear method (HCP) or no normalization. Critically, results from METCC appear less confounded by irrelevant technical variables such as institution and batch than those from other methods, even without access to the high-quality metadata required by many existing techniques, offering hope for improved generalization.
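To make the triplet idea concrete, the following is a minimal NumPy sketch of a generic triplet objective. It is not the authors' implementation: the abstract does not state the network architecture, the exact loss, the margin, or how triplets are sampled, so the function names (`embed`, `triplet_loss`, `sample_triplet`), the single ReLU layer, the Euclidean distance, and the margin of 1.0 are all illustrative assumptions.

```python
import numpy as np


def embed(x, w1, w2):
    """Toy non-linear embedding: one ReLU hidden layer (assumed architecture)."""
    return np.maximum(0.0, x @ w1) @ w2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge triplet loss over a batch of embedded samples.

    Each row is one embedded sample. The loss is zero once the anchor is
    closer to the positive than to the negative by at least `margin`
    (the margin value here is an assumption, not taken from the paper).
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)
    d_neg = np.linalg.norm(anchor - negative, axis=1)
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


def sample_triplet(labels, rng):
    """Pick indices (anchor, positive, negative) using class labels only.

    Assumes every class has at least two samples and that at least two
    classes are present. No batch or institution metadata is used.
    """
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    a = int(rng.integers(len(labels)))
    same = idx[(labels == labels[a]) & (idx != a)]
    diff = idx[labels != labels[a]]
    return a, int(rng.choice(same)), int(rng.choice(diff))
```

In this sketch the triplets are chosen from class labels alone, which mirrors the abstract's point that the approach does not depend on high-quality batch or institution metadata. Training would minimize `triplet_loss` over the weights of `embed` with any gradient-based optimizer, which is omitted here.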


