TFormer: A throughout fusion transformer for multi-modal skin lesion diagnosis

11/21/2022
by Yilan Zhang, et al.

Multi-modal skin lesion diagnosis (MSLD) has achieved remarkable success with modern computer-aided diagnosis technology based on deep convolutions. However, information aggregation across modalities in MSLD remains challenging, owing to severely misaligned spatial resolutions (dermoscopic versus clinical images) and heterogeneous data (dermoscopic images versus patients' meta-data). Limited by the intrinsically local attention of convolutions, most recent MSLD pipelines built on pure convolutions struggle to capture representative features in shallow layers, so fusion across modalities is usually performed near the end of the pipeline, or even at the last layer, leading to insufficient information aggregation. To tackle this issue, we introduce a pure transformer-based method, which we refer to as the "Throughout Fusion Transformer (TFormer)", for sufficient information integration in MSLD. Unlike existing convolution-based approaches, the proposed network uses a transformer as the feature extraction backbone, yielding more representative shallow features. We then carefully design a stack of dual-branch hierarchical multi-modal transformer (HMT) blocks to fuse information across the two image modalities stage by stage. With the aggregated image information, a multi-modal transformer post-fusion (MTP) block then integrates features across image and non-image data. This strategy of fusing the image modalities first and the heterogeneous meta-data afterwards lets us divide and conquer the two major challenges while ensuring that inter-modality dynamics are effectively modeled. Experiments on the public Derm7pt dataset validate the superiority of the proposed method: TFormer outperforms other state-of-the-art methods, and ablation experiments further confirm the effectiveness of our designs.
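To make the two-stage fusion order concrete, here is a minimal PyTorch sketch of the overall flow described in the abstract: per-modality transformer encoders, cross-attention fusion between the two image branches at every stage, and a final post-fusion step that injects patient meta-data. The class names (ToyTFormer, CrossAttentionFusion), dimensions, and the use of a single meta-data query token are illustrative assumptions; the paper's actual HMT and MTP block internals are not reproduced here.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """One dual-branch fusion step: each image branch attends to the other.
    A sketch of the stage-wise image-image fusion idea only (hypothetical)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn_d = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_c = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_d = nn.LayerNorm(dim)
        self.norm_c = nn.LayerNorm(dim)

    def forward(self, derm, clin):
        # Dermoscopic tokens query clinical tokens, and vice versa.
        d, _ = self.attn_d(derm, clin, clin)
        c, _ = self.attn_c(clin, derm, derm)
        return self.norm_d(derm + d), self.norm_c(clin + c)

class ToyTFormer(nn.Module):
    """Assumed end-to-end flow: transformer encoders per modality,
    stage-by-stage cross-modal fusion, then late fusion with meta-data."""
    def __init__(self, dim=64, stages=3, meta_dim=10, num_classes=5):
        super().__init__()
        enc = lambda: nn.TransformerEncoderLayer(
            dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)
        self.derm_stages = nn.ModuleList([enc() for _ in range(stages)])
        self.clin_stages = nn.ModuleList([enc() for _ in range(stages)])
        self.fusions = nn.ModuleList(
            [CrossAttentionFusion(dim) for _ in range(stages)])
        self.meta_proj = nn.Linear(meta_dim, dim)
        self.post_fusion = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, derm, clin, meta):
        # derm, clin: (B, N, dim) patch tokens; meta: (B, meta_dim).
        for enc_d, enc_c, fuse in zip(self.derm_stages,
                                      self.clin_stages, self.fusions):
            derm, clin = enc_d(derm), enc_c(clin)
            derm, clin = fuse(derm, clin)        # image-image fusion per stage
        img = torch.cat([derm, clin], dim=1)     # aggregated image tokens
        q = self.meta_proj(meta).unsqueeze(1)    # meta-data as one query token
        fused, _ = self.post_fusion(q, img, img) # image/non-image post-fusion
        return self.head(fused.squeeze(1))

if __name__ == "__main__":
    model = ToyTFormer()
    derm = torch.randn(2, 49, 64)   # dermoscopic patch tokens
    clin = torch.randn(2, 49, 64)   # clinical patch tokens
    meta = torch.randn(2, 10)       # patient meta-data vector
    print(model(derm, clin, meta).shape)  # torch.Size([2, 5])
```

The point of the sketch is the ordering: the spatial-resolution mismatch is handled inside the image-image fusion stages, while the heterogeneous meta-data is deferred to a single post-fusion step, matching the divide-and-conquer strategy the abstract describes.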

research · 08/31/2022
NestedFormer: Nested Modality-Aware Transformer for Brain Tumor Segmentation
Multi-modal MR imaging is routinely used in clinical practice to diagnos...

research · 08/28/2021
AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data
The use of multi-modal data such as the combination of whole slide image...

research · 06/02/2023
Transformer-based Multi-Modal Learning for Multi Label Remote Sensing Image Classification
In this paper, we introduce a novel Synchronized Class Token Fusion (SCT...

research · 04/16/2023
TransFusionOdom: Interpretable Transformer-based LiDAR-Inertial Fusion Odometry Estimation
Multi-modal fusion of sensors is a commonly used approach to enhance the...

research · 11/26/2022
CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion
Multi-modality (MM) image fusion aims to render fused images that mainta...

research · 09/22/2022
UniColor: A Unified Framework for Multi-Modal Colorization with Transformer
We propose the first unified framework UniColor to support colorization ...

research · 04/20/2020
Transformer Reasoning Network for Image-Text Matching and Retrieval
Image-text matching is an interesting and fascinating task in modern AI ...
