Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

07/14/2023
by Xiaofei Chen, et al.

Foundation models built on large-scale pre-training have moved artificial intelligence from theory toward practical application, and they have made widely deployable computer-aided diagnosis feasible. Medical contrastive vision-language pre-training, which requires no human annotation, is an effective way to guide representation learning with the descriptive information in diagnostic reports. However, its effectiveness is limited by large-scale semantic overlap and semantic shift in the medical domain. To address these issues, we propose the Knowledge-Boosting Contrastive Vision-Language Pre-training framework (KoBo), which integrates clinical knowledge into the learning of vision-language semantic consistency. The framework uses an unbiased, open-set, sample-wise knowledge representation to measure negative-sample noise and to supplement the correspondence between vision-language mutual information and clinical knowledge. Extensive experiments on eight tasks, including classification, segmentation, retrieval, and semantic relatedness, validate the framework, which achieves comparable or better performance under zero-shot and few-shot settings. Our code is available at https://github.com/ChenXiaoFei-CS/KoBo.
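The abstract does not include code, so the following is only a rough illustration of the noise problem KoBo targets. The minimal PyTorch sketch below contrasts a standard CLIP-style symmetric InfoNCE loss, which treats every off-diagonal pair in a batch as a negative, with a hypothetical knowledge-weighted variant that down-weights negatives an external clinical-knowledge source marks as semantically overlapping. The knowledge_sim matrix and all function names are assumptions made for illustration, not the authors' actual implementation.

import torch
import torch.nn.functional as F

def clip_contrastive_loss(img_emb, txt_emb, tau=0.07):
    """Standard symmetric InfoNCE loss over a batch of image/text pairs.

    Every off-diagonal pair is treated as a negative. In medical data this
    is exactly where training breaks down: two different reports can
    describe the same finding (semantic overlap), so many "negatives" are
    actually false negatives.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)

def knowledge_weighted_loss(img_emb, txt_emb, knowledge_sim, tau=0.07):
    """Hypothetical variant: down-weight negatives whose clinical-knowledge
    similarity to the anchor is high, since they are likely false negatives.

    `knowledge_sim` is assumed to be a (B, B) matrix in [0, 1] produced by
    some external knowledge representation (e.g. report entities mapped to
    a clinical ontology); its construction is an assumption, not the
    paper's exact method.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / tau
    B = logits.size(0)
    eye = torch.eye(B, device=logits.device)
    # Negative weights: 1 for knowledge-dissimilar pairs, approaching 0 for
    # pairs the knowledge source says are near-duplicates. Positives
    # (the diagonal) keep weight 1.
    neg_weight = (1.0 - knowledge_sim) * (1.0 - eye) + eye
    # InfoNCE with a weighted denominator, image-to-text direction.
    exp_logits = torch.exp(logits) * neg_weight
    log_prob = logits - torch.log(exp_logits.sum(dim=1, keepdim=True))
    loss_i2t = -(eye * log_prob).sum(dim=1).mean()
    # Symmetric text-to-image direction.
    exp_logits_t = torch.exp(logits.t()) * neg_weight.t()
    log_prob_t = logits.t() - torch.log(exp_logits_t.sum(dim=1, keepdim=True))
    loss_t2i = -(eye * log_prob_t).sum(dim=1).mean()
    return 0.5 * (loss_i2t + loss_t2i)

# Toy usage with random embeddings and a random knowledge-similarity matrix:
B, D = 8, 512
loss = knowledge_weighted_loss(torch.randn(B, D), torch.randn(B, D),
                               knowledge_sim=torch.rand(B, B))

The design choice sketched here, reweighting the contrastive denominator rather than discarding suspect negatives outright, keeps the batch size and gradient structure of standard CLIP training intact while softening the penalty on pairs that clinical knowledge suggests are not true negatives.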

