Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder

04/04/2023
by   Yunyi Liu, et al.
0

Medical Visual Question Answering (VQA) systems play a supporting role to understand clinic-relevant information carried by medical images. The questions to a medical image include two categories: close-end (such as Yes/No question) and open-end. To obtain answers, the majority of the existing medical VQA methods relies on classification approaches, while a few works attempt to use generation approaches or a mixture of the two. The classification approaches are relatively simple but perform poorly on long open-end questions. To bridge this gap, in this paper, we propose a new Transformer based framework for medical VQA (named as Q2ATransformer), which integrates the advantages of both the classification and the generation approaches and provides a unified treatment for the close-end and open-end questions. Specifically, we introduce an additional Transformer decoder with a set of learnable candidate answer embeddings to query the existence of each answer class to a given image-question pair. Through the Transformer attention, the candidate answer embeddings interact with the fused features of the image-question pair to make the decision. In this way, despite being a classification-based approach, our method provides a mechanism to interact with the answer information for prediction like the generation-based approaches. On the other hand, by classification, we mitigate the task difficulty by reducing the search space of answers. Our method achieves new state-of-the-art performance on two medical VQA benchmarks. Especially, for the open-end questions, we achieve 79.19 VQA-RAD and 54.85 respectively.

READ FULL TEXT
research
02/17/2020

CQ-VQA: Visual Question Answering on Categorized Questions

This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end ...
research
04/02/2022

Co-VQA : Answering by Interactive Sub Question Sequence

Most existing approaches to Visual Question Answering (VQA) answer quest...
research
09/23/2020

Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering

Different approaches have been proposed to Visual Question Answering (VQ...
research
08/05/2019

Answering Questions about Data Visualizations using Efficient Bimodal Fusion

Chart question answering (CQA) is a newly proposed visual question answe...
research
02/19/2023

Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning

Medical visual question answering (VQA) aims to answer clinically releva...
research
06/22/2022

Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer

Visual question answering (VQA) in surgery is largely unexplored. Expert...
research
11/23/2019

Unsupervised Keyword Extraction for Full-sentence VQA

In existing studies on Visual Question Answering (VQA), which aims to tr...

Please sign up or login with your details

Forgot password? Click here to reset