Cross-modal Prototype Driven Network for Radiology Report Generation

07/11/2022
by Jun Wang, et al.

Radiology report generation (RRG) aims to automatically describe a radiology image in human-like language and could support the work of radiologists by reducing the burden of manual reporting. Previous approaches often adopt an encoder-decoder architecture and focus on single-modal feature learning, while few studies explore cross-modal feature interaction. Here we propose a Cross-modal PROtotype driven NETwork (XPRONET) to promote cross-modal pattern learning and exploit it to improve radiology report generation. This is achieved by three well-designed, fully differentiable and complementary modules: a shared cross-modal prototype matrix to record the cross-modal prototypes; a cross-modal prototype network to learn the cross-modal prototypes and embed the cross-modal information into the visual and textual features; and an improved multi-label contrastive loss to enable and enhance multi-label prototype learning. XPRONET obtains substantial improvements on the IU-Xray and MIMIC-CXR benchmarks: it exceeds recent state-of-the-art approaches by a large margin on IU-Xray and achieves comparable performance on MIMIC-CXR.
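
To make the idea of a shared prototype matrix more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that visual and textual features live in the same embedding dimension, that each feature attends over its most similar prototypes with a softmax, and that the result is fused by simple addition. The function name `query_prototypes`, the `top_k` parameter, and all shapes are hypothetical choices for illustration only.

```python
import numpy as np


def query_prototypes(features, prototypes, top_k=2):
    """Enrich each feature row with a weighted sum of its top_k most similar prototypes.

    features:   array of shape (N, D), e.g. visual patch or word embeddings
    prototypes: array of shape (K, D), one matrix shared by both modalities
    Assumption: dot-product similarity and additive fusion, chosen for simplicity.
    """
    # Dot-product similarity between every feature and every prototype, shape (N, K).
    sims = features @ prototypes.T
    # Indices of the top_k most similar prototypes for each feature, shape (N, top_k).
    idx = np.argsort(-sims, axis=1)[:, :top_k]
    top_sims = np.take_along_axis(sims, idx, axis=1)
    # Numerically stable softmax over the selected similarities.
    weights = np.exp(top_sims - top_sims.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    # Weighted sum of the selected prototypes, shape (N, D).
    response = np.einsum("nk,nkd->nd", weights, prototypes[idx])
    # Additive fusion of the original feature and the prototype response.
    return features + response


rng = np.random.default_rng(0)
prototypes = rng.normal(size=(6, 4))  # hypothetical: 6 prototypes, dimension 4
visual = rng.normal(size=(3, 4))      # hypothetical: 3 visual features
textual = rng.normal(size=(5, 4))     # hypothetical: 5 textual features

# Both modalities query the same prototype matrix.
visual_out = query_prototypes(visual, prototypes)
textual_out = query_prototypes(textual, prototypes)
```

Because both modalities read from the same matrix, a prototype can carry information learned from one modality into the features of the other. How the prototypes are initialised, selected, and trained in XPRONET, including the multi-label contrastive loss, is described in the full paper and is not reproduced here.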


