MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

05/18/2023
by Qiuhui Chen, et al.

Vision-language pre-training (VLP) models have been shown to be effective in many computer vision applications. In this paper, we develop a VLP model for the medical domain that makes computer-aided diagnoses (CAD) from image scans and the text descriptions in electronic health records, as is done in practice. To this end, we present MedBLIP, a lightweight CAD system and a new paradigm for bootstrapping VLP from off-the-shelf frozen pre-trained image encoders and frozen large language models. We design a MedQFormer module to bridge the gap between 3D medical images and both 2D pre-trained image encoders and language models. To evaluate the effectiveness of MedBLIP, we collect more than 30,000 image volumes from five public Alzheimer's disease (AD) datasets: ADNI, NACC, OASIS, AIBL, and MIRIAD. On this dataset, the largest AD dataset we are aware of, our model achieves state-of-the-art (SOTA) performance on zero-shot classification of healthy, mild cognitive impairment (MCI), and AD subjects, and demonstrates its capability for medical visual question answering (VQA). The code and pre-trained models are available online: https://github.com/Qybc/MedBLIP.
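The abstract does not describe MedQFormer's internals. As a rough illustration of the general query-transformer (Q-Former) idea from BLIP-2, the sketch below cuts a 3D volume into cubic patches and lets a fixed set of query vectors cross-attend to the patch features, so a volume is summarized into a fixed number of tokens that a frozen language model could consume. Everything here is an assumption for illustration only: the names `patchify_3d` and `query_cross_attention`, the patch size, the dimensions, and the random projection standing in for a frozen image encoder are hypothetical and are not taken from the MedBLIP code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def patchify_3d(volume, patch):
    # Hypothetical helper: split a (D, H, W) volume into non-overlapping
    # cubic patches and flatten each patch into a vector.
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    v = volume.reshape(d // patch, patch, h // patch, patch, w // patch, patch)
    v = v.transpose(0, 2, 4, 1, 3, 5)  # group the patch-grid axes first
    return v.reshape(-1, patch ** 3)  # (num_patches, patch^3)

def query_cross_attention(queries, tokens, w_q, w_k, w_v):
    # Hypothetical single-head cross-attention: each query attends over all
    # image tokens. queries: (num_queries, d_model); tokens: (num_tokens, d_model).
    q = queries @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v  # (num_queries, d_model), independent of num_tokens

rng = np.random.default_rng(0)
volume = rng.standard_normal((32, 32, 32))  # toy stand-in for an MRI volume
patches = patchify_3d(volume, patch=8)  # (64, 512)

d_model = 64
# Random projection standing in for a frozen pre-trained image encoder (assumption).
proj = rng.standard_normal((patches.shape[1], d_model)) / np.sqrt(patches.shape[1])
tokens = patches @ proj

num_queries = 32
queries = rng.standard_normal((num_queries, d_model))  # would be learned in practice
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model) for _ in range(3))

out = query_cross_attention(queries, tokens, w_q, w_k, w_v)
print(out.shape)  # (32, 64)
```

Because the number of queries is fixed, the output size does not depend on how many patches the volume produces. A trained module of this kind would typically learn the queries and projections and stack several attention layers; this sketch omits training, multiple heads, and self-attention among the queries.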
