A New Perspective to Boost Vision Transformer for Medical Image Classification

01/03/2023
by Yuexiang Li, et al.

The Transformer has achieved impressive success on various computer vision tasks. However, most existing studies require pretraining the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) to achieve satisfactory performance, and such datasets are usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement provided by ImageNet pretrained weights degrades significantly when the weights are transferred to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach tailored to medical image classification with a Transformer backbone. BOLT consists of two networks, an online branch and a target branch, for self-supervised representation learning. Concretely, the online network is trained to predict the target network's representation of the same patch-embedding tokens under a different perturbation. To make the most of the Transformer with limited medical data, we further propose an auxiliary difficulty-ranking task: the Transformer must identify which branch (i.e., online or target) is processing the more heavily perturbed tokens. Overall, the Transformer learns to distill transformation-invariant features from the perturbed tokens, simultaneously measuring perturbation difficulty and maintaining the consistency of the self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results show that BOLT outperforms both ImageNet pretrained weights and state-of-the-art self-supervised learning approaches for medical image classification.
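To make the described training scheme concrete, below is a minimal PyTorch sketch of a BOLT-style step: a BYOL-style online/target pair over perturbed patch-embedding tokens, plus the auxiliary difficulty-ranking head. The module names, dimensions, perturbations, the form of the difficulty head, and the equal loss weighting are illustrative assumptions, not the paper's implementation; the stand-in encoder replaces a real ViT backbone.

```python
# Hedged sketch of a BOLT-style training step; details are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyEncoder(nn.Module):
    """Stand-in for the ViT backbone: mean-pools patch tokens."""
    def forward(self, tokens):  # (B, N, D) -> (B, D)
        return tokens.mean(dim=1)


class BOLT(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int = 768, proj_dim: int = 256):
        super().__init__()
        # Online branch: encoder + projector + predictor (BYOL-style).
        self.online_encoder = encoder
        self.online_projector = nn.Sequential(
            nn.Linear(dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        self.predictor = nn.Sequential(
            nn.Linear(proj_dim, proj_dim), nn.ReLU(), nn.Linear(proj_dim, proj_dim))
        # Target branch: momentum copy of the online encoder/projector.
        self.target_encoder = copy.deepcopy(encoder)
        self.target_projector = copy.deepcopy(self.online_projector)
        for p in list(self.target_encoder.parameters()) + \
                 list(self.target_projector.parameters()):
            p.requires_grad = False
        # Auxiliary difficulty-ranking head (an assumption: a single
        # linear classifier on the online prediction).
        self.difficulty_head = nn.Linear(proj_dim, 1)

    @torch.no_grad()
    def momentum_update(self, tau: float = 0.996):
        # Exponential moving average of online weights into the target branch.
        for online, target in ((self.online_encoder, self.target_encoder),
                               (self.online_projector, self.target_projector)):
            for po, pt in zip(online.parameters(), target.parameters()):
                pt.mul_(tau).add_(po.detach(), alpha=1.0 - tau)

    def forward(self, online_tokens, target_tokens, online_is_harder):
        # online_tokens / target_tokens: two perturbed views of the same
        # patch-embedding tokens, shape (B, N, D).
        q = self.predictor(self.online_projector(self.online_encoder(online_tokens)))
        with torch.no_grad():
            k = self.target_projector(self.target_encoder(target_tokens))
        # Consistency loss: the online branch predicts the target representation.
        consistency = 2.0 - 2.0 * F.cosine_similarity(q, k, dim=-1).mean()
        # Difficulty-ranking loss: identify which branch got the harder tokens.
        logit = self.difficulty_head(q).squeeze(-1)  # (B,)
        ranking = F.binary_cross_entropy_with_logits(logit, online_is_harder.float())
        return consistency + ranking  # equal weighting is an assumption


# Usage: two perturbations of the same tokens; the target view is harder here.
model = BOLT(ToyEncoder())
tokens = torch.randn(4, 196, 768)
easy = tokens + 0.1 * torch.randn_like(tokens)
hard = tokens + 0.5 * torch.randn_like(tokens)
loss = model(easy, hard, online_is_harder=torch.zeros(4))
loss.backward()
model.momentum_update()
```

In a faithful implementation the encoder would be a ViT operating on the perturbed patch-embedding tokens, and the momentum update would run after each optimizer step so that the target branch tracks a slow-moving average of the online branch.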

Related Research

01/13/2021 · Big Self-Supervised Models Advance Medical Image Classification
Self-supervised pretraining followed by supervised fine-tuning has seen ...

07/25/2022 · Jigsaw-ViT: Learning Jigsaw Puzzles in Vision Transformer
The success of Vision Transformer (ViT) in various computer vision tasks...

12/17/2021 · Unified 2D and 3D Pre-training for Medical Image classification and Segmentation
Self-supervised learning (SSL) opens up huge opportunities for better ut...

01/18/2023 · ViT-AE++: Improving Vision Transformer Autoencoder for Self-supervised Medical Image Representations
Self-supervised learning has attracted increasing attention as it learns...

12/22/2021 · Improved skin lesion recognition by a Self-Supervised Curricular Deep Learning approach
State-of-the-art deep learning approaches for skin lesion recognition of...

03/09/2022 · Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image Modeling Transformer for Ophthalmic Image Classification
A large-scale labeled dataset is a key factor for the success of supervi...

03/19/2023 · DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification
Diffusion Probabilistic Models have recently shown remarkable performanc...
