RepsNet: Combining Vision with Language for Automated Medical Reports

09/27/2022
by Ajay Kumar Tanwani, et al.

Writing reports by analyzing medical images is error-prone for inexperienced practitioners and time-consuming for experienced ones. In this work, we present RepsNet, which adapts pre-trained vision and language models to interpret medical images and generate automated reports in natural language. RepsNet consists of an encoder-decoder model: the encoder aligns the images with natural language descriptions via contrastive learning, while the decoder predicts answers by conditioning on the encoded images and on prior context of descriptions retrieved by nearest neighbor search. We formulate the problem in a visual question answering setting to handle both categorical and descriptive natural language answers. We perform experiments on two challenging tasks, medical visual question answering (VQA-Rad) and report generation (IU-Xray), on radiology image datasets. Results show that RepsNet outperforms state-of-the-art methods with 81.08 IU-Xray. Supplementary details are available at https://sites.google.com/view/repsnet
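
The two mechanisms named in the abstract, contrastive image-text alignment and nearest neighbor retrieval of prior descriptions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`contrastive_loss`, `retrieve_context`), the symmetric InfoNCE form of the loss, the temperature value, and the toy two-dimensional embeddings are all assumptions chosen for illustration, and real embeddings would come from pre-trained vision and language encoders.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE loss (an assumed choice) over matched pairs.

    image_embs[k] and text_embs[k] are treated as the positive pair;
    every other combination in the batch is treated as a negative.
    """
    n = len(image_embs)
    logits = [[cosine(i, t) / temperature for t in text_embs]
              for i in image_embs]

    def cross_entropy(rows):
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / n

    columns = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(columns))


def retrieve_context(image_emb, text_embs, texts, k=2):
    """Return the k descriptions whose embeddings are closest to the image."""
    ranked = sorted(range(len(texts)),
                    key=lambda j: cosine(image_emb, text_embs[j]),
                    reverse=True)
    return [texts[j] for j in ranked[:k]]


# Toy embeddings (hypothetical values, for illustration only).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
misaligned_texts = [[0.0, 1.0], [1.0, 0.0]]
descriptions = ["description A", "description B"]

loss_aligned = contrastive_loss(images, aligned_texts)
loss_misaligned = contrastive_loss(images, misaligned_texts)
nearest = retrieve_context([1.0, 0.1], aligned_texts, descriptions, k=1)
```

In this sketch the loss is small when each image embedding sits closest to its own description and large otherwise, which is the property an encoder trained this way would be pushed toward. The retrieved descriptions stand in for the prior context that a decoder would condition on alongside the encoded image.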

research
05/17/2023

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

In this paper, we focus on the problem of Medical Visual Question Answer...
research
09/03/2020

A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports

Joint image-text embedding extracted from medical images and associated ...
research
07/11/2023

Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting

Radiology reporting is a crucial part of the communication between radio...
research
05/18/2023

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Vision-language pre-training (VLP) models have been demonstrated to be e...
research
04/03/2021

MMBERT: Multimodal BERT Pretraining for Improved Medical VQA

Images in the medical domain are fundamentally different from the genera...
research
07/08/2017

MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis Network

The inability to interpret the model prediction in semantically and visu...
research
06/06/2020

Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation

Beyond the common difficulties faced in the natural image captioning, me...
