Gene-induced Multimodal Pre-training for Image-omic Classification

09/06/2023
by Ting Jin, et al.

Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework that jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work addresses the two main challenges of multimodal image-omic classification: (1) extracting patient-level features from gigapixel WSIs and tens of thousands of genes, and (2) fusing the modalities effectively with high-order relevance modeling. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We then design a masked patch modeling (MPM) paradigm to capture the latent pathological characteristics of different tissues; the masking strategy randomly masks a fixed-length contiguous subsequence of the patch embeddings of a WSI. Finally, we combine the classification tokens of the paired modalities and propose a triplet learning module to learn high-order relevance and discriminative patient-level information. After pre-training, simple fine-tuning suffices to obtain the classification results. Experimental results on the TCGA dataset show the superiority of our network architecture and pre-training framework, achieving 99.47% classification accuracy. The code is publicly available at https://github.com/huangwudiduan/GIMP.
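
The masking strategy described above (hiding a fixed-length contiguous run of a WSI's patch embeddings) can be illustrated with a minimal sketch. The function name, the learnable mask token, and the reconstruction-target handling below are assumptions for illustration only and are not taken from the paper.

```python
import torch

def mask_contiguous_patches(patch_embeddings, mask_len, mask_token):
    """Minimal sketch of the MPM masking step, assuming mask_len < num_patches.

    patch_embeddings: (num_patches, dim) embeddings of one WSI's patches
    mask_len:         fixed length of the contiguous subsequence to mask
    mask_token:       (dim,) learnable embedding substituted for masked patches
    """
    num_patches = patch_embeddings.size(0)
    # Pick a random start index so the masked run stays inside the sequence.
    start = torch.randint(0, num_patches - mask_len + 1, (1,)).item()
    masked = patch_embeddings.clone()
    masked[start:start + mask_len] = mask_token        # hide the contiguous run
    target = patch_embeddings[start:start + mask_len]  # original embeddings to reconstruct
    return masked, target, start
```

During pre-training, the encoder would then be trained to predict the hidden run from the surrounding patches, which is what encourages the model to capture the latent pathological characteristics of the tissue context.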

Related research

06/12/2023 · Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
The use of self-supervised pre-training has emerged as a promising appro...

07/27/2023 · Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
With the overwhelming trend of mask image modeling led by MAE, generativ...

05/23/2023 · Training Transitive and Commutative Multimodal Transformers with LoReTTa
Collecting a multimodal dataset with two paired modalities A and B or B ...

04/13/2023 · Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Integrating whole-slide images (WSIs) and bulk transcriptomics for predi...

11/03/2021 · VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts
We present a unified Vision-Language pretrained Model (VLMo) that jointl...

09/11/2023 · CNN or ViT? Revisiting Vision Transformers Through the Lens of Convolution
The success of Vision Transformer (ViT) has been widely reported on a wi...

11/18/2021 · SimMIM: A Simple Framework for Masked Image Modeling
This paper presents SimMIM, a simple framework for masked image modeling...
