MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training

01/05/2023
by Chaoyi Wu et al.

In this paper, we consider the problem of enhancing self-supervised visual-language pre-training (VLP) with medical-specific knowledge, by exploiting the paired image-text reports from daily radiological practice. In particular, we make the following contributions: First, unlike existing works that directly process the raw reports, we adopt a novel report filter to extract the medical entities, avoiding unnecessary complexity from language grammar and strengthening the supervision signals. Second, we propose a novel entity embedding module that queries an external knowledge description base, exploiting the rich contextual information the medical domain affords and implicitly building relationships between entities in the language embedding space. Third, we propose a novel Transformer-based fusion model that spatially aligns entity descriptions with visual signals at the image-patch level using only self-supervised learning, thus enabling spatial grounding. Fourth, we conduct thorough experiments to validate the effectiveness of the proposed architecture, benchmarking on numerous public datasets, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax, COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning settings, our model demonstrates strong performance on disease classification and grounding compared with prior methods.
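The abstract describes the fusion module only at a high level. As a rough illustration of the core idea, the spatial alignment between entity descriptions and image patches can be sketched as single-head cross-attention, where entity embeddings act as queries over patch features and the resulting attention weights double as a coarse grounding map. All names, shapes, and dimensions below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entity_patch_cross_attention(entity_emb, patch_feats):
    """Single-head cross-attention sketch (illustrative, not MedKLIP's code).

    entity_emb:  (n_entities, d) embeddings of filtered medical entities
    patch_feats: (n_patches, d)  visual features, one per image patch

    Returns fused per-entity features and an attention map whose rows can
    be reshaped onto the patch grid as a coarse spatial grounding heatmap.
    """
    d = entity_emb.shape[-1]
    scores = entity_emb @ patch_feats.T / np.sqrt(d)  # (n_entities, n_patches)
    attn = softmax(scores, axis=-1)                   # grounding map, rows sum to 1
    fused = attn @ patch_feats                        # (n_entities, d)
    return fused, attn

# toy example: 3 entities, 16 patches (e.g. a 4x4 grid), feature dim 8
rng = np.random.default_rng(0)
fused, attn = entity_patch_cross_attention(rng.normal(size=(3, 8)),
                                           rng.normal(size=(16, 8)))
print(fused.shape, attn.shape)  # (3, 8) (3, 16)
```

In the paper's setting, the fused per-entity features would feed the self-supervised pre-training objective, while the attention rows provide the zero-shot grounding signal without any pixel-level annotation.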
