CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation

11/02/2022
by   Jun Wang, et al.

Radiology report generation (RRG) has attracted growing research attention because of its potential to mitigate medical resource shortages and to support radiologists in disease decision making. Recent advances in RRG are largely driven by improving models' ability to encode single-modal feature representations, while few studies explicitly explore the cross-modal alignment between image regions and words. Radiologists typically focus on abnormal image regions before composing the corresponding text descriptions, so cross-modal alignment is important for learning an abnormality-aware RRG model. Motivated by this, we propose a Class Activation Map guided Attention Network (CAMANet), which explicitly promotes cross-modal alignment by using aggregated class activation maps to supervise cross-modal attention learning, and simultaneously enriches the discriminative information. Experimental results demonstrate that CAMANet outperforms previous SOTA methods on two commonly used RRG benchmarks.
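
The abstract does not spell out how the class activation maps are aggregated or how they supervise attention, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that a CAM is the classifier-weighted sum of per-region features, that class maps are aggregated by predicted class probability, and that supervision is a mean squared error between word-averaged attention and the normalized aggregated map. All function names are hypothetical.

```python
def class_activation_maps(features, class_weights):
    """Compute one activation map per class (assumed standard CAM formulation).

    features: N regions, each a list of C channel values.
    class_weights: K classes, each a list of C classifier weights.
    Returns K maps, each a list of N scores.
    """
    return [
        [sum(w * f for w, f in zip(weights, region)) for region in features]
        for weights in class_weights
    ]


def aggregate_cams(cams, class_probs):
    """Sum class maps weighted by predicted probability (an assumption),
    clip negatives to zero, and normalize so the result sums to 1."""
    n = len(cams[0])
    agg = [
        max(0.0, sum(p * cam[i] for p, cam in zip(class_probs, cams)))
        for i in range(n)
    ]
    total = sum(agg)
    if total == 0:
        return [1.0 / n] * n  # fall back to a uniform map
    return [a / total for a in agg]


def attention_alignment_loss(attention, target):
    """Mean squared error between word-averaged attention and the target map.

    attention: T words, each a list of N attention weights over regions.
    target: N values, e.g. the output of aggregate_cams.
    """
    t = len(attention)
    mean_att = [sum(row[i] for row in attention) / t for i in range(len(target))]
    return sum((a - b) ** 2 for a, b in zip(mean_att, target)) / len(target)


# Toy example: 3 image regions, 2 feature channels, 2 classes.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_weights = [[1.0, 0.0], [0.0, 2.0]]
cams = class_activation_maps(features, class_weights)
target = aggregate_cams(cams, class_probs=[0.5, 0.5])

uniform_attention = [[1 / 3, 1 / 3, 1 / 3]]
aligned_attention = [list(target)]
loss_uniform = attention_alignment_loss(uniform_attention, target)
loss_aligned = attention_alignment_loss(aligned_attention, target)
```

In this toy setup, attention that already matches the aggregated map incurs zero loss, while uniform attention is penalized, which is the behaviour an alignment term of this kind is meant to encourage.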

research
07/11/2022

Cross-modal Prototype Driven Network for Radiology Report Generation

Radiology report generation (RRG) aims to describe automatically a radio...
research
06/25/2021

Graph Pattern Loss based Diversified Attention Network for Cross-Modal Retrieval

Cross-modal retrieval aims to enable flexible retrieval experience by co...
research
09/13/2019

Co-Attentive Cross-Modal Deep Learning for Medical Evidence Synthesis and Decision Making

Modern medicine requires generalised approaches to the synthesis and int...
research
09/02/2021

AnANet: Modeling Association and Alignment for Cross-modal Correlation Classification

The explosive increase of multimodal data makes a great demand in many c...
research
11/09/2020

Generating Image Descriptions via Sequential Cross-Modal Alignment Guided by Human Gaze

When speakers describe an image, they tend to look at objects before men...
research
02/25/2023

Cross-modal Contrastive Learning for Multimodal Fake News Detection

Automatic detection of multimodal fake news has gained a widespread atte...
research
05/23/2023

Text-guided 3D Human Generation from 2D Collections

3D human modeling has been widely used for engaging interaction in gamin...
