
Invariant Risk Minimisation for Cross-Organism Inference: Substituting Mouse Data for Human Data in Human Risk Factor Discovery

11/14/2021
by Odhran O'Donoghue et al.
University of Oxford
Mayo Foundation for Medical Education and Research
King's College London
NASA
Cornell University
Intel
University of St. Gallen

Human medical data can be challenging to obtain due to data privacy concerns, difficulties in conducting certain types of experiments, or prohibitive associated costs. In many settings, data from animal models or in vitro cell lines are available to help augment our understanding of human data. However, these data are known to have low etiological validity in comparison to human data. In this work, we augment small human medical datasets with in vitro data and data from animal models. We use Invariant Risk Minimisation (IRM) to elucidate invariant features by treating cross-organism data as belonging to different data-generating environments. Our models identify genes of relevance to human cancer development. We observe a degree of consistency in the results as the amounts of human and mouse data used are varied; however, further work is required to obtain conclusive insights. As a secondary contribution, we enhance existing open-source datasets and provide two uniformly processed, cross-organism, homologue gene-matched datasets to the community.
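
To illustrate the idea of treating each organism as a separate environment, the following is a minimal sketch of an IRM-style objective. It is not taken from the paper: it assumes the commonly used IRMv1 penalty (the squared gradient of each environment's risk with respect to a dummy scale fixed at 1), a linear predictor, squared-error loss, and synthetic toy data. The names `irm_v1_penalty`, `irm_objective`, `make_env`, and `lam` are hypothetical.

```python
import numpy as np


def irm_v1_penalty(preds, targets):
    """IRMv1-style penalty for squared-error loss (an assumed variant).

    With risk R(w) = mean((w * preds - targets)^2), the derivative at
    w = 1 is 2 * mean((preds - targets) * preds); the penalty is its square.
    """
    grad = 2.0 * np.mean((preds - targets) * preds)
    return grad ** 2


def irm_objective(theta, environments, lam=1.0):
    """Sum over environments of (risk + lam * penalty) for a linear model."""
    total = 0.0
    for X, y in environments:
        preds = X @ theta
        risk = np.mean((preds - y) ** 2)
        total += risk + lam * irm_v1_penalty(preds, y)
    return total


# Synthetic stand-ins for two environments (illustrative only, not real data):
# a larger "mouse" set and a smaller "human" set sharing one invariant mapping.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0, 0.5])


def make_env(n_samples, scale):
    X = rng.normal(scale=scale, size=(n_samples, 3))
    return X, X @ true_theta


envs = [make_env(200, 1.0), make_env(20, 2.0)]

obj_invariant = irm_objective(true_theta, envs)
obj_other = irm_objective(np.array([0.5, 0.5, 0.5]), envs)
print(obj_invariant, obj_other)
```

In this toy setting the shared coefficients give zero risk and zero penalty in both environments, so they score lower than an arbitrary alternative. Real gene-expression data would need a trained feature extractor and a tuned penalty weight.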

