ConTEXTual Net: A Multimodal Vision-Language Model for Segmentation of Pneumothorax

03/02/2023
by   Zachary Huemann, et al.
0

Clinical imaging databases contain not only medical images but also text reports generated by physicians. These narrative reports often describe the location, size, and shape of the disease, but using descriptive text to guide medical image analysis has been understudied. Vision-language models are increasingly used for multimodal tasks like image generation, image captioning, and visual question answering but have been scarcely used in medical imaging. In this work, we develop a vision-language model for the task of pneumothorax segmentation. Our model, ConTEXTual Net, detects and segments pneumothorax in chest radiographs guided by free-form radiology reports. ConTEXTual Net achieved a Dice score of 0.72 ± 0.02, which was similar to the level of agreement between the primary physician annotator and the other physician annotators (0.71 ± 0.04). ConTEXTual Net also outperformed a U-Net. We demonstrate that descriptive language can be incorporated into a segmentation model for improved performance. Through an ablative study, we show that it is the text information that is responsible for the performance gains. Additionally, we show that certain augmentation methods worsen ConTEXTual Net's segmentation performance by breaking the image-text concordance. We propose a set of augmentations that maintain this concordance and improve segmentation training.

READ FULL TEXT

page 6

page 7

research
09/03/2020

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

Joint image-text embedding extracted from medical images and associated ...
research
06/06/2022

Implementation of a Modified U-Net for Medical Image Segmentation on Edge Devices

Deep learning techniques, particularly convolutional neural networks, ha...
research
02/11/2019

MultiResUNet : Rethinking the U-Net Architecture for Multimodal Biomedical Image Segmentation

In recent years Deep Learning has brought about a breakthrough in Medica...
research
03/30/2023

Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime

This paper explores training medical vision-language models (VLMs) – whe...
research
08/10/2017

TandemNet: Distilling Knowledge from Medical Images Using Diagnostic Reports as Optional Semantic References

In this paper, we introduce the semantic knowledge of medical images fro...
research
05/09/2023

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Webpages have been a rich resource for language and vision-language task...
research
05/20/2023

Bi-VLGM : Bi-Level Class-Severity-Aware Vision-Language Graph Matching for Text Guided Medical Image Segmentation

Medical reports with substantial information can be naturally complement...

Please sign up or login with your details

Forgot password? Click here to reset