Med-Flamingo: a Multimodal Medical Few-shot Learner

07/27/2023
by Michael Moor, et al.

Medicine, by its nature, is a multifaceted domain that requires the synthesis of information across various modalities. Medical generative vision-language models (VLMs) take a first step in this direction and promise many exciting clinical applications. However, existing models typically must be fine-tuned on sizeable downstream datasets, a significant limitation because data is scarce in many medical applications, necessitating models that can learn from a few examples in real time. Here we propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain. Starting from OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks. Med-Flamingo unlocks few-shot generative medical visual question answering (VQA) abilities, which we evaluate on several datasets, including a novel, challenging open-ended VQA dataset of visual USMLE-style problems. Furthermore, we conduct the first human evaluation of generative medical VQA, in which physicians review the problems and blinded generations in an interactive app. Med-Flamingo improves performance in generative medical VQA by up to 20% in clinicians' ratings and, for the first time, enables multimodal medical few-shot adaptations such as rationale generation. We release our model, code, and evaluation app at https://github.com/snap-stanford/med-flamingo.
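The few-shot interface follows OpenFlamingo's interleaved image-text prompting style: solved (image, question, answer) demonstrations are concatenated ahead of the query image. The Python sketch below illustrates this with the open-source open_flamingo package; the checkpoint names, the commented-out weight-loading step, the image file paths, and the chest X-ray questions are illustrative assumptions, not the verified Med-Flamingo setup (see the linked repository for the authors' actual usage).

# Minimal sketch of few-shot multimodal prompting in the OpenFlamingo style.
# Assumptions: LLaMA-7B as the base language model and a local Med-Flamingo
# checkpoint path; consult the repository for the real configuration.
import torch
from PIL import Image
from open_flamingo import create_model_and_transforms

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="huggyllama/llama-7b",   # assumed base LM
    tokenizer_path="huggyllama/llama-7b",
    cross_attn_every_n_layers=4,
)
# Load the Med-Flamingo weights here (path is hypothetical):
# model.load_state_dict(torch.load("med_flamingo_checkpoint.pt"), strict=False)

# Two solved demonstrations followed by the query image.
images = [Image.open(p) for p in ["xray1.png", "xray2.png", "query.png"]]
vision_x = torch.stack([image_processor(im) for im in images])
vision_x = vision_x.unsqueeze(0).unsqueeze(2)  # (batch, num_images, frames=1, C, H, W)

# <image> and <|endofchunk|> are OpenFlamingo's interleaving tokens.
prompt = (
    "<image>Question: Is there a pleural effusion? Answer: yes.<|endofchunk|>"
    "<image>Question: Is the heart enlarged? Answer: no.<|endofchunk|>"
    "<image>Question: What abnormality is shown? Answer:"
)
lang_x = tokenizer(prompt, return_tensors="pt")

generated = model.generate(
    vision_x=vision_x,
    lang_x=lang_x["input_ids"],
    attention_mask=lang_x["attention_mask"],
    max_new_tokens=20,
)
print(tokenizer.decode(generated[0], skip_special_tokens=True))

Adding or removing demonstration chunks in the prompt (with matching images in vision_x) is what makes the adaptation "few-shot": no gradient updates are involved at inference time.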


Related research

06/30/2023 · Multimodal Prompt Retrieval for Generative Visual Question Answering
Recent years have witnessed impressive results of pre-trained vision-lan...

04/03/2021 · MMBERT: Multimodal BERT Pretraining for Improved Medical VQA
Images in the medical domain are fundamentally different from the genera...

02/25/2023 · Medical visual question answering using joint self-supervised learning
Visual Question Answering (VQA) has become one of the most active research ...

02/18/2021 · SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Medical visual question answering (Med-VQA) has tremendous potential in ...

05/18/2023 · MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Vision-language pre-training (VLP) models have been demonstrated to be e...

05/24/2022 · Rethinking Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization
Vision-and-language (V&L) models pretrained on large-scale multimodal ...

12/27/2021 · Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
Contrastive Language-Image Pre-training (CLIP) has shown remarkable succ...
