A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models

by Bilel Guetarni, et al.

Determining the lymphoma subtype is a crucial step toward better-targeted patient treatment, which can potentially increase survival chances. The existing gold-standard diagnosis method, based on gene expression technology, is expensive and time-consuming, which limits its accessibility. Although alternative diagnosis methods based on IHC (immunohistochemistry) technologies exist and are recommended by the WHO, they suffer from similar limitations and are less accurate. WSI (Whole Slide Image) analysis with deep learning models has shown promising new directions for cancer diagnosis that would be cheaper and faster than existing alternative methods. In this work, we propose a vision transformer-based framework for distinguishing DLBCL (Diffuse Large B-Cell Lymphoma) cancer subtypes from high-resolution WSIs. To this end, we propose a multi-modal architecture to train a classifier model from various WSI modalities. We then exploit this model through a knowledge distillation mechanism to efficiently drive the learning of a mono-modal classifier. Our experimental study, conducted on a dataset of 157 patients, shows the promising performance of our mono-modal classification model, which outperforms six recent state-of-the-art methods dedicated to cancer classification. Moreover, the power-law curve estimated on our experimental data shows that our classification model requires a reasonable number of additional training patients to potentially reach the same diagnosis accuracy as IHC technologies.
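The knowledge distillation step mentioned above can be illustrated with a minimal sketch. The abstract does not state which distillation objective the framework uses, so the code below assumes the generic temperature-scaled formulation (hard-label cross-entropy plus a KL term between softened teacher and student outputs). The function names, the `temperature` and `alpha` parameters, and the two-class logits are hypothetical and chosen only for illustration; here the "teacher" stands in for the multi-modal classifier and the "student" for the mono-modal one.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation objective (an assumption, not the paper's stated loss).

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence between softened teacher and student distributions.
    """
    # Hard-label term: cross-entropy of the student against the true subtype.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-label term: KL(teacher || student) at the given temperature.
    p_teacher_soft = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher_soft, p_student_soft) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for a two-subtype problem.
teacher = [2.0, -1.0]          # multi-modal model, confident in class 0
agreeing_student = [2.0, -1.0]
disagreeing_student = [-1.0, 2.0]

loss_agree = distillation_loss(agreeing_student, teacher, label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, label=0)
```

Under these assumptions, a student whose outputs match the teacher's incurs no soft-label penalty, so `loss_agree` reduces to the weighted cross-entropy, while `loss_disagree` is larger because both terms penalize the mismatch.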




