MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering

07/07/2021
by   Haiwei Pan, et al.

Medical Visual Question Answering (VQA) is a challenging multi-modal task widely studied by the computer vision and natural language processing research communities. Since most current medical VQA models focus on visual content while ignoring the importance of text, this paper proposes a multi-view attention-based model (MuVAM) for medical visual question answering that integrates the high-level semantics of medical images on the basis of the text description. Firstly, different methods are used to extract features from the image and the question for the two modalities, vision and text. Secondly, the paper proposes a multi-view attention mechanism that includes Image-to-Question (I2Q) attention and Word-to-Text (W2T) attention. Multi-view attention correlates the question with both the image and individual words in order to better analyze the question and produce an accurate answer. Thirdly, a composite loss is presented to predict the answer accurately after multi-modal feature fusion and to improve the similarity between visual and textual cross-modal features; it consists of a classification loss and an image-question complementary (IQC) loss. Finally, to address data errors and missing labels in the VQA-RAD dataset, the authors collaborated with medical experts to correct and complete the dataset, constructing an enhanced version, VQA-RADPh. Experiments on both datasets show that MuVAM surpasses state-of-the-art methods.
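To make the Image-to-Question (I2Q) idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: it assumes a single global image feature vector, per-word question features, and simple dot-product scoring (the feature dimension, scoring function, and variable names are all assumptions for illustration). The image feature scores each question word, and a softmax over those scores pools the words into one image-guided question vector.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def i2q_attention(img_feat, word_feats):
    """Image-to-Question attention sketch: score each question-word
    feature against the global image feature, then pool the words
    by the resulting attention weights.

    img_feat:   (d,)   global image feature
    word_feats: (n, d) one feature vector per question word
    """
    scores = word_feats @ img_feat          # (n,) relevance of each word to the image
    weights = softmax(scores)               # attention distribution over words
    attended = weights @ word_feats         # (d,) image-guided question vector
    return attended, weights

# Toy usage with random features (dimension 16, 5 question words).
rng = np.random.default_rng(0)
img = rng.standard_normal(16)
words = rng.standard_normal((5, 16))
q_vec, w = i2q_attention(img, words)
```

W2T attention would follow the same pattern with word features attending over the full text representation; in the paper both views are combined before multi-modal fusion and the composite (classification + IQC) loss.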


