Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Masked Contrastive Learning

09/12/2023
by   Weijian Huang, et al.
0

Recently, multi-modal vision-language foundation models have gained significant attention in the medical field. While these models offer great opportunities, they still face a number of challenges, such as the requirement for fine-grained knowledge understanding in computer-aided diagnosis and capability of utilizing very limited or no task-specific labeled data in real-world clinical applications. In this study, we present MaCo, a novel multi-modal medical foundation model that explores masked contrastive learning to achieve granular alignment and zero-shot learning for a variety of medical imaging tasks. MaCo incorporates a correlation weighting mechanism to adjust the correlation between masked image patches and their corresponding reports, thereby enhancing the representation learning capabilities. We evaluate MaCo on six well-known open-source X-ray datasets, and the experimental results show it outperforms seven state-of-the-art approaches for classification, segmentation, and zero-shot phase grounding, demonstrating its great potential to promote a wide range of medical image analysis tasks.

READ FULL TEXT

page 14

page 15

research
02/27/2023

Knowledge-enhanced Pre-training for Auto-diagnosis of Chest Radiology Images

Despite of the success of multi-modal foundation models pre-trained on l...
research
04/10/2023

SAM.MD: Zero-shot medical image segmentation capabilities of the Segment Anything Model

Foundation models have taken over natural language processing and image ...
research
09/04/2023

An Empirical Analysis for Zero-Shot Multi-Label Classification on COVID-19 CT Scans and Uncurated Reports

The pandemic resulted in vast repositories of unstructured data, includi...
research
04/25/2023

Generalist Vision Foundation Models for Medical Imaging: A Case Study of Segment Anything Model on Zero-Shot Medical Segmentation

We examine the recent Segment Anything Model (SAM) on medical images, an...
research
05/18/2023

OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

We introduce OpenShape, a method for learning multi-modal joint represen...
research
08/15/2023

A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision

Foundation vision-language models are currently transforming computer vi...
research
08/04/2023

Towards Generalist Foundation Model for Radiology

In this study, we aim to initiate the development of Radiology Foundatio...

Please sign up or login with your details

Forgot password? Click here to reset