Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing

03/02/2023
by   Sheng Zhang, et al.

Contrastive pretraining on parallel image-text data has attained great success in vision-language processing (VLP), as exemplified by CLIP and related methods. However, prior explorations have tended to focus on general domains on the web. Biomedical images and text are rather different, yet publicly available datasets are small and skewed toward chest X-rays, severely limiting progress. In this paper, we conduct by far the largest study on biomedical VLP, using 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. Our dataset (PMC-15M) is two orders of magnitude larger than existing biomedical image-text datasets such as MIMIC-CXR, and spans a diverse range of biomedical images. The standard CLIP method is suboptimal for the biomedical domain; we propose BiomedCLIP, with domain-specific adaptations tailored to biomedical VLP. We conduct extensive experiments and ablation studies on standard biomedical imaging tasks ranging from retrieval to classification to visual question answering (VQA). BiomedCLIP establishes new state of the art on a wide range of standard datasets, substantially outperforming prior VLP approaches. Surprisingly, BiomedCLIP even outperforms radiology-specific state-of-the-art models such as BioViL on radiology-specific tasks such as RSNA pneumonia detection, highlighting the utility of large-scale pretraining across all biomedical image types. We will release our models at https://aka.ms/biomedclip to facilitate future research in biomedical VLP.
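The contrastive pretraining objective underlying CLIP (and hence BiomedCLIP) can be sketched as a symmetric InfoNCE loss over a batch of paired image and text embeddings: each image's matched caption is treated as the positive class among all captions in the batch, and vice versa. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function name and temperature default are illustrative.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) arrays where row i of each
    array forms a matched image-caption pair.
    """
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.T / temperature

    # The correct "class" for row i is column i (its paired caption).
    labels = np.arange(logits.shape[0])

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels)
                  + cross_entropy(logits.T, labels))
```

When the paired embeddings agree (matched rows nearly identical), the diagonal logits dominate and the loss approaches zero; for unrelated pairs it approaches log(batch_size), which is why large batches and large, diverse datasets such as PMC-15M matter for this objective.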


Related research

07/31/2020 · Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
Pretraining large neural language models, such as BERT, has led to impre...

04/21/2022 · Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing
Multi-modal data abounds in biomedicine, such as radiology images and re...

06/25/2021 · Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature
Information overload is a prevalent challenge in many high-value domains...

06/01/2023 · LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day
Conversational generative AI has demonstrated remarkable promise for emp...

08/30/2021 · BioFors: A Large Biomedical Image Forensics Dataset
Research in media forensics has gained traction to combat the spread of ...

03/01/2023 · RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training
Vision-and-language multi-modal pretraining and fine-tuning have shown g...

08/04/2021 · Terabyte-scale supervised 3D training and benchmarking dataset of the mouse kidney
The performance of machine learning algorithms used for the segmentation...
