MCQA: Multimodal Co-attention Based Network for Question Answering

04/25/2020
by Abhishek Kumar, et al.

We present MCQA, a learning-based algorithm for multimodal question answering. MCQA explicitly fuses and aligns the multimodal input (i.e., text, audio, and video), which forms the context for the query (question and answer). Our approach also fuses and aligns the question and the answer within this context. Moreover, we use the notion of co-attention to perform cross-modal alignment and multimodal context-query alignment. Our context-query alignment module matches the relevant parts of the multimodal context and the query with each other and aligns them to improve the overall performance. We evaluate the performance of MCQA on Social-IQ, a benchmark dataset for multimodal question answering. We compare the performance of our algorithm with prior methods and observe an accuracy improvement of 4-7%.
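The abstract names co-attention as the mechanism for aligning context and query but gives no formulation. As a rough illustration only, the sketch below shows a generic dot-product co-attention between a context sequence and a query sequence in NumPy. It is not the authors' implementation: the function names `softmax` and `co_attention`, the dot-product affinity, and the feature shapes are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def co_attention(context, query):
    """Generic dot-product co-attention (illustrative, not MCQA itself).

    context: array of shape (m, d), e.g. fused multimodal features (assumed).
    query:   array of shape (n, d), e.g. question/answer features (assumed).
    Returns (c2q, q2c):
      c2q has shape (m, d): for each context step, a weighted sum of query steps.
      q2c has shape (n, d): for each query step, a weighted sum of context steps.
    """
    # Pairwise affinity between every context step and every query step.
    affinity = context @ query.T                    # (m, n)
    # Normalize over query steps: each context step attends to the query.
    c2q = softmax(affinity, axis=1) @ query         # (m, d)
    # Normalize over context steps: each query step attends to the context.
    q2c = softmax(affinity, axis=0).T @ context     # (n, d)
    return c2q, q2c


rng = np.random.default_rng(0)
context = rng.normal(size=(5, 4))   # 5 context steps, 4 features (toy sizes)
query = rng.normal(size=(3, 4))     # 3 query steps, 4 features (toy sizes)
c2q, q2c = co_attention(context, query)
```

Each output row is a convex combination of rows from the other sequence, which is what lets one side be re-expressed in terms of the parts of the other side it matches most strongly.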

