SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

04/14/2022
by Samuel Cahyawijaya, et al.

Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent work in genomics has also adopted these pre-training methods for genome understanding. However, these methods focus only on understanding haploid sequences, which hinders their applicability to genetic variations, also known as single nucleotide polymorphisms (SNPs), which are crucial for genome-wide association studies. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNPs. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset at https://github.com/HLTCHKUST/snp2vec.
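To make the setting concrete, the sketch below is an illustrative toy example of the general recipe named in the abstract, not the authors' released implementation: two haplotypes are merged into a single diploid token stream in which heterozygous SNP positions become IUPAC ambiguity codes, a fraction of tokens is masked, and a small encoder is trained to recover them (masked-token self-supervised pre-training). The vocabulary, encoding choice, masking ratio, and model size are all assumptions made for the sketch.

    # Illustrative sketch only; NOT the authors' released code.
    # Shows masked-token self-supervised pre-training over a diploid sequence.
    import random
    import torch
    import torch.nn as nn

    # Toy vocabulary: 4 bases, IUPAC ambiguity codes for heterozygous SNPs,
    # and [MASK]/[PAD] special tokens (assumed for this sketch).
    VOCAB = ["[PAD]", "[MASK]", "A", "C", "G", "T", "R", "Y", "S", "W", "K", "M"]
    TOK2ID = {t: i for i, t in enumerate(VOCAB)}
    IUPAC = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
             frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M"}

    def diploid_to_tokens(hap1: str, hap2: str) -> list:
        """Merge two haploid sequences into one token stream; heterozygous
        positions collapse to IUPAC ambiguity codes (e.g., A/G -> R)."""
        tokens = [a if a == b else IUPAC[frozenset((a, b))]
                  for a, b in zip(hap1, hap2)]
        return [TOK2ID[t] for t in tokens]

    def mask_tokens(ids, ratio=0.15):
        """Standard masked-LM corruption: hide a fraction of positions."""
        inp, labels = list(ids), [-100] * len(ids)  # -100 = ignored by the loss
        for i in range(len(ids)):
            if random.random() < ratio:
                labels[i], inp[i] = ids[i], TOK2ID["[MASK]"]
        if all(l == -100 for l in labels):          # ensure at least one target
            labels[0], inp[0] = ids[0], TOK2ID["[MASK]"]
        return torch.tensor(inp), torch.tensor(labels)

    class TinyGenomeEncoder(nn.Module):
        """A small transformer encoder standing in for the long-sequence model."""
        def __init__(self, vocab_size=len(VOCAB), dim=64, layers=2, heads=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            block = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            self.encoder = nn.TransformerEncoder(block, layers)
            self.head = nn.Linear(dim, vocab_size)

        def forward(self, x):
            return self.head(self.encoder(self.embed(x)))

    # One illustrative pre-training step on a toy diploid pair.
    hap1, hap2 = "ACGTACGTAC", "ACGTACATAC"            # one heterozygous site (G/A)
    inp, labels = mask_tokens(diploid_to_tokens(hap1, hap2))
    model = TinyGenomeEncoder()
    logits = model(inp.unsqueeze(0))                   # (1, seq_len, vocab)
    loss = nn.functional.cross_entropy(logits.squeeze(0), labels, ignore_index=-100)
    loss.backward()                                    # gradients for an optimizer step

In practice the pre-training corpus, sequence length, and encoder architecture would differ; the point of the sketch is only that SNP information can be carried in the token stream itself, so the same masked-prediction objective used for haploid genomes extends to genotyped (diploid) sequences.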

Related research

10/11/2021 - Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
In the genome biology research, regulatory genome modeling is an importa...

10/30/2022 - token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text
Self-supervised pre-training has been successful in both text and speech...

10/29/2020 - Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation
Speech separation has been well-developed while there are still problems...

03/25/2021 - Contrast to Divide: Self-Supervised Pre-Training for Learning with Noisy Labels
The success of learning with noisy labels (LNL) methods relies heavily o...

10/19/2022 - Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering
Recent self-supervised pre-training methods on Heterogeneous Information...

09/30/2022 - Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Self-supervised methods have achieved remarkable success in transfer lea...

03/02/2023 - Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT
We aim to investigate whether UNMT approaches with self-supervised pre-t...
