Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency

06/23/2022
by Weijie Ma, et al.

Colorectal polyp classification is a critical clinical examination. To improve classification accuracy, most computer-aided diagnosis algorithms recognize colorectal polyps using Narrow-Band Imaging (NBI). However, NBI is often underused in real clinical scenarios, because acquiring these images requires manually switching the light mode once polyps have been detected in White-Light (WL) images. To avoid this, we propose a novel method that directly achieves accurate white-light colonoscopy image classification by enforcing structured cross-modal representation consistency. In practice, a pair of multi-modal images, i.e. NBI and WL, is fed into a shared Transformer to extract hierarchical feature representations. Then a newly designed Spatial Attention Module (SAM) is adopted to calculate the similarities between the class token and the patch tokens of each image. By aligning the class tokens and spatial attention maps of paired NBI and WL images at different levels, the Transformer learns to keep both global and local representations consistent across the two modalities. Extensive experimental results show that the proposed method outperforms recent studies by a margin, realizing multi-modal prediction with a single Transformer while greatly improving classification accuracy when only WL images are available.
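The alignment idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the normalization, or the loss, so the cosine similarity, softmax, mean-squared-error terms, and every name below (`spatial_attention`, `consistency_loss`, `local_weight`) are assumptions chosen only to make the global (class token) and local (attention map) consistency terms concrete.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def spatial_attention(cls_token, patch_tokens):
    """Assumed stand-in for SAM: softmax over class-to-patch similarities."""
    sims = [cosine(cls_token, p) for p in patch_tokens]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def consistency_loss(wl_levels, nbi_levels, local_weight=1.0):
    """Sum global and local gaps between paired WL and NBI features.

    Each level is a (cls_token, patch_tokens) pair from the shared encoder.
    local_weight is a hypothetical knob, not taken from the paper.
    """
    loss = 0.0
    for (wl_cls, wl_patches), (nbi_cls, nbi_patches) in zip(wl_levels, nbi_levels):
        global_term = mse(wl_cls, nbi_cls)  # class-token alignment
        local_term = mse(
            spatial_attention(wl_cls, wl_patches),
            spatial_attention(nbi_cls, nbi_patches),
        )  # attention-map alignment
        loss += global_term + local_weight * local_term
    return loss


# Toy features for one level: a class token and three patch tokens per modality.
wl = [([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])]
nbi = [([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])]
print(consistency_loss(wl, wl))   # identical inputs
print(consistency_loss(wl, nbi))  # mismatched class tokens
```

In this sketch the loss vanishes when the two modalities produce identical features and grows as their class tokens or attention maps diverge, which is the behavior a consistency objective of this kind would be expected to have.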

