CLARITY – Comparing heterogeneous data using dissimiLARITY

05/29/2020
by Daniel J. Lawson, et al.

Integrating datasets from different disciplines is hard because the data are often qualitatively different in meaning, scale, and reliability. When two datasets describe the same entities, many scientific questions can be phrased as whether the similarities between entities are conserved. Our method, CLARITY, quantifies consistency across datasets, identifies where inconsistencies arise, and aids in their interpretation. We explore three diverse comparisons: gene methylation vs. gene expression, the evolution of language sounds vs. word use, and country-level economic metrics vs. cultural beliefs. The non-parametric approach is robust to noise and to differences in scaling, and makes only weak assumptions about how the data were generated. It works by decomposing similarities into two components: a "structural" component, analogous to a clustering, and an underlying "relationship" between those structures. This enables a "structural comparison" of two similarity matrices, based on how well one can be predicted from the structure of the other. The software, CLARITY, is available as an R package from https://github.com/danjlawson/CLARITY.
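The decomposition and structural comparison described in the abstract can be illustrated with a small sketch. It is not the CLARITY package's algorithm or API: it assumes the "structure" is a known hard clustering (an indicator matrix A, so that a similarity matrix Y is modelled as Y ≈ A X Aᵀ with a K × K "relationship" matrix X), and the function names and example matrices are hypothetical. The structure is held fixed, X is refitted for the second dataset, and the residuals show where the second dataset is not predicted by the first dataset's structure.

```python
from typing import List

Matrix = List[List[float]]


def fit_relationship(Y: Matrix, labels: List[int], k: int) -> Matrix:
    """Least-squares K x K "relationship" X for Y ~ A X A^T.

    Assumption (not from the package): the "structure" A is a hard
    clustering, i.e. A[i][c] = 1 if labels[i] == c, else 0. With such an
    indicator matrix the least-squares X is simply the mean of each block.
    """
    sums = [[0.0] * k for _ in range(k)]
    counts = [[0] * k for _ in range(k)]
    for i, row in enumerate(Y):
        for j, value in enumerate(row):
            a, b = labels[i], labels[j]
            sums[a][b] += value
            counts[a][b] += 1
    return [[sums[a][b] / counts[a][b] if counts[a][b] else 0.0
             for b in range(k)] for a in range(k)]


def predict(X: Matrix, labels: List[int]) -> Matrix:
    """Reconstruct A X A^T: entry (i, j) equals X[labels[i]][labels[j]]."""
    return [[X[li][lj] for lj in labels] for li in labels]


def residuals(Y: Matrix, labels: List[int], k: int) -> Matrix:
    """Structural comparison: hold the structure (labels) fixed, refit X
    for this Y, and return the residual matrix Y - A X A^T."""
    Y_hat = predict(fit_relationship(Y, labels, k), labels)
    return [[y - yh for y, yh in zip(row, hrow)]
            for row, hrow in zip(Y, Y_hat)]


def squared_error(R: Matrix) -> float:
    """Total squared residual: how poorly the structure predicts Y."""
    return sum(r * r for row in R for r in row)


def entity_scores(R: Matrix) -> List[float]:
    """Per-entity squared residual, to locate where inconsistency arises."""
    return [sum(r * r for r in row) for row in R]


if __name__ == "__main__":
    # Hypothetical dissimilarities for 4 entities in two datasets.
    labels = [0, 0, 1, 1]  # structure, e.g. a clustering learned from Y1
    Y1 = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
    # In Y2, entity 1 has moved close to entity 3.
    Y2 = [[0, 1, 5, 5], [1, 0, 5, 1], [5, 5, 0, 1], [5, 1, 1, 0]]
    print(squared_error(residuals(Y1, labels, 2)))  # 2.0
    print(squared_error(residuals(Y2, labels, 2)))  # 26.0
    print(entity_scores(residuals(Y2, labels, 2)))  # [2.5, 10.5, 2.5, 10.5]
```

In this toy example, entities 1 and 3 receive the largest per-entity scores, flagging the pair whose relationship differs between the two datasets. A hard clustering is the simplest possible choice of structure; a continuous (soft) A would typically require an iterative or spectral fit instead of block means.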
