A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering

10/01/2022
by   Xiaofei Huang, et al.

Research in medical visual question answering (MVQA) can contribute to the development of computer-aided diagnosis. MVQA is a task that aims to predict accurate and convincing answers from a given medical image and an associated natural language question. The task requires extracting feature content rich in medical knowledge and understanding it at a fine granularity, so an effective feature extraction and understanding scheme is key to modeling. Existing MVQA question extraction schemes mainly focus on word-level information and ignore the medical information in the text, while some visual and textual feature understanding schemes cannot effectively capture the correlations between regions and keywords that reasonable visual reasoning requires. In this study, a dual-attention learning network with word and sentence embedding (WSDAN) is proposed. We design a transformer with sentence embedding (TSE) module to extract a double embedding representation of questions that contains both keywords and medical information. A dual-attention learning (DAL) module consisting of self-attention and guided attention is proposed to model intensive intramodal and intermodal interactions; stacking multiple DAL modules (DALs) to learn visual and textual co-attention increases the granularity of understanding and improves visual reasoning. Experimental results on the ImageCLEF 2019 VQA-MED (VQA-MED 2019) and VQA-RAD datasets demonstrate that the proposed method outperforms previous state-of-the-art methods, and ablation studies and Grad-CAM maps show that WSDAN extracts rich textual information and has strong visual reasoning ability.
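The abstract's description of the DAL module (self-attention within each modality plus guided attention across modalities) maps onto standard multi-head attention blocks. Below is a minimal PyTorch sketch of that idea; the module names, dimensions, stacking depth, and the exact guided-attention wiring (image regions attending to question tokens) are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class DAL(nn.Module):
    """Illustrative dual-attention learning block (sketch, not the paper's code):
    self-attention models intramodal interactions in each modality, then
    guided attention models intermodal interactions between them."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.txt_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_self = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_guided = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_t = nn.LayerNorm(dim)
        self.norm_i = nn.LayerNorm(dim)
        self.norm_g = nn.LayerNorm(dim)

    def forward(self, txt, img):
        # Intramodal interaction: each modality attends to itself.
        txt = self.norm_t(txt + self.txt_self(txt, txt, txt)[0])
        img = self.norm_i(img + self.img_self(img, img, img)[0])
        # Intermodal interaction: image regions attend to question tokens,
        # so keywords guide which regions matter (guided attention).
        img = self.norm_g(img + self.img_guided(img, txt, txt)[0])
        return txt, img

# Stacking multiple DALs increases the granularity of co-attention.
blocks = nn.ModuleList([DAL() for _ in range(4)])
txt = torch.randn(2, 20, 512)  # stand-in for TSE's word+sentence question embedding
img = torch.randn(2, 49, 512)  # stand-in for a grid of image region features
for blk in blocks:
    txt, img = blk(txt, img)

In this sketch, txt plays the role of TSE's double embedding (word embeddings augmented with a sentence embedding); here both inputs are random tensors used purely to show the stacked co-attention data flow.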


Related research

06/25/2019 · Deep Modular Co-Attention Networks for Visual Question Answering
Visual Question Answering (VQA) requires a fine-grained and simultaneous...

07/07/2021 · MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering
Medical Visual Question Answering (VQA) is a multi-modal challenging tas...

02/28/2023 · VQA with Cascade of Self- and Co-Attention Blocks
The use of complex attention modules has improved the performance of the...

11/15/2018 · Exploiting Sentence Embedding for Medical Question Answering
Despite the great success of word embedding, sentence embedding remains ...

07/11/2023 · CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery
Medical students and junior surgeons often rely on senior surgeons and s...

09/27/2021 · Recall and Learn: A Memory-augmented Solver for Math Word Problems
In this article, we tackle the math word problem, namely, automatically ...

11/23/2019 · Unsupervised Keyword Extraction for Full-sentence VQA
In existing studies on Visual Question Answering (VQA), which aims to tr...
