Contextualized Keyword Representations for Multi-modal Retinal Image Captioning

04/26/2021
by Jia-Hong Huang, et al.

Medical image captioning automatically generates a description of the content of a given medical image. A traditional medical image captioning model produces a description based only on a single medical image input, which makes it difficult to generate abstract medical descriptions or concepts and limits the effectiveness of the approach. Multi-modal medical image captioning is one way to address this problem: a textual input, e.g., a set of expert-defined keywords, is treated as one of the main drivers of description generation. Encoding the textual input and the medical image effectively are therefore both important for the task. In this work, a new end-to-end deep multi-modal medical image captioning model is proposed, built on contextualized keyword representations, textual feature reinforcement, and masked self-attention. Experimental results on the existing multi-modal medical image captioning dataset show that the proposed model is effective, with an improvement of +53.2% over the state-of-the-art method.
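The abstract's three ingredients can be pictured as a single encoder-decoder pipeline: a visual backbone encodes the retinal image, a Transformer encoder produces contextualized representations of the expert-defined keywords, the pooled keyword feature is re-injected ("reinforced") into the fused representation, and a decoder with masked (causal) self-attention generates the caption. The PyTorch sketch below illustrates this wiring only; all module names, dimensions, and the exact fusion and reinforcement scheme are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch of a multi-modal captioning pipeline in the spirit of the
# abstract. Everything here (names, sizes, fusion scheme) is hypothetical.
import torch
import torch.nn as nn

class KeywordImageCaptioner(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, n_heads=8, n_layers=2):
        super().__init__()
        # Visual encoder: a tiny CNN stands in for the image backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Contextualized keyword encoder: a Transformer encoder over the
        # expert-defined keyword tokens (a BERT-style encoder in spirit).
        self.keyword_emb = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.keyword_encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # "Textual feature reinforcement", read here as re-injecting the
        # pooled keyword feature into the fused image-text representation.
        self.reinforce = nn.Linear(2 * d_model, d_model)
        # Caption decoder with masked self-attention over generated tokens.
        self.token_emb = nn.Embedding(vocab_size, d_model)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, image, keyword_ids, caption_ids):
        img_feat = self.cnn(image).flatten(1)                          # (B, d)
        kw_ctx = self.keyword_encoder(self.keyword_emb(keyword_ids))  # (B, K, d)
        kw_feat = kw_ctx.mean(dim=1)                                   # pooled keywords
        fused = self.reinforce(torch.cat([img_feat, kw_feat], dim=-1))
        # Decoder memory: fused global feature + contextualized keywords.
        memory = torch.cat([fused.unsqueeze(1), kw_ctx], dim=1)
        tgt = self.token_emb(caption_ids)
        # Causal mask: each caption position attends only to earlier tokens.
        T = caption_ids.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        out = self.decoder(tgt, memory, tgt_mask=causal)
        return self.lm_head(out)                                       # (B, T, vocab)

model = KeywordImageCaptioner()
logits = model(torch.randn(2, 3, 224, 224),
               torch.randint(0, 10000, (2, 5)),   # 5 keyword tokens
               torch.randint(0, 10000, (2, 12)))  # 12 caption tokens
print(logits.shape)  # torch.Size([2, 12, 10000])
```

The causal mask is what the abstract calls masked self-attention: during training, each caption position can only attend to earlier positions, matching autoregressive generation at inference time.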

