Towards IID representation learning and its application on biomedical data

03/01/2022
by   Jiqing Wu, et al.

Due to the heterogeneity of real-world data, the widely accepted independent and identically distributed (IID) assumption has been criticized in recent studies on causality. In this paper, we argue that instead of being a questionable assumption, IID is a fundamental task-relevant property that needs to be learned. Consider k independent random vectors 𝖷^i, i = 1, …, k; we elaborate on how a variety of causal questions can be reformulated as learning a task-relevant function ϕ that induces IID among 𝖹^i := ϕ∘𝖷^i, which we term IID representation learning. As a proof of concept, we examine IID representation learning on Out-of-Distribution (OOD) generalization tasks. Concretely, using the representation obtained via the learned function that induces IID, we predict molecular characteristics (molecular prediction) on two biomedical datasets with real-world distribution shifts introduced by a) preanalytical variation and b) sampling protocol. To enable reproducibility and comparison with state-of-the-art (SOTA) methods, we follow the OOD benchmarking guidelines recommended by WILDS. Compared to the SOTA baselines supported in WILDS, the results confirm the superior performance of IID representation learning on OOD tasks. The code is publicly accessible via https://github.com/CTPLab/IID_representation_learning.
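To make the core idea concrete, below is a minimal, hypothetical sketch (not the paper's actual implementation; see the linked repository for that) of learning an encoder ϕ so that the representations 𝖹^i = ϕ(𝖷^i) from k environments become approximately identically distributed. It assumes a simple RBF-kernel MMD penalty between environment representations as the alignment criterion, plus an ordinary supervised loss; the network sizes, `sigma`, and `lam` are illustrative placeholders.

```python
# Hypothetical sketch: learn phi so that Z^i = phi(X^i) are (approximately) IID
# across k environments, by adding a pairwise MMD alignment penalty to a task loss.
import torch
import torch.nn as nn


def rbf_mmd(z_a, z_b, sigma=1.0):
    """Biased RBF-kernel MMD^2 estimate between two batches of representations."""
    def kernel(x, y):
        d2 = torch.cdist(x, y).pow(2)
        return torch.exp(-d2 / (2 * sigma ** 2))
    return kernel(z_a, z_a).mean() + kernel(z_b, z_b).mean() - 2 * kernel(z_a, z_b).mean()


class EncoderWithHead(nn.Module):
    """phi maps inputs to a representation Z; a linear head predicts the task label."""
    def __init__(self, in_dim=128, rep_dim=32, out_dim=10):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, rep_dim))
        self.head = nn.Linear(rep_dim, out_dim)

    def forward(self, x):
        z = self.phi(x)
        return z, self.head(z)


def training_step(model, env_batches, criterion, lam=1.0):
    """env_batches: list of (x, y) pairs, one batch per environment (k environments)."""
    task_loss, reps = 0.0, []
    for x, y in env_batches:
        z, logits = model(x)
        task_loss = task_loss + criterion(logits, y)
        reps.append(z)
    # Alignment penalty: encourage Z^i from different environments to share one distribution.
    align = sum(rbf_mmd(reps[i], reps[j])
                for i in range(len(reps)) for j in range(i + 1, len(reps)))
    return task_loss + lam * align
```

In this sketch, the MMD term stands in for whatever criterion one uses to test or enforce identical distribution of the 𝖹^i; the paper's own choice of criterion and architecture may differ.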


