LiVLR: A Lightweight Visual-Linguistic Reasoning Framework for Video Question Answering

11/29/2021
by Jingjing Jiang, et al.

Video Question Answering (VideoQA), which aims to answer a given question by understanding multi-modal video content, is challenging because that content is rich and diverse. From the perspective of video understanding, a good VideoQA framework needs to understand the video content at different semantic levels and flexibly integrate the diverse content to distill what is relevant to the question. To this end, we propose a Lightweight Visual-Linguistic Reasoning framework named LiVLR. Specifically, LiVLR first uses graph-based Visual and Linguistic Encoders to obtain multi-grained visual and linguistic representations. The resulting representations are then integrated by the proposed Diversity-aware Visual-Linguistic Reasoning module (DaVL). DaVL accounts for the differences between representation types and can flexibly adjust the importance of each type when generating the question-related joint representation, making it an effective and general method for integrating representations. LiVLR is lightweight and shows a performance advantage on two VideoQA benchmarks, MSRVTT-QA and KnowIT VQA. Extensive ablation studies demonstrate the effectiveness of LiVLR's key components.
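The abstract describes DaVL only at a high level, so what follows is a hedged illustration of the general idea (question-guided weighting of different representation types), not the authors' method. It assumes plain scaled dot-product attention at two levels: first within each representation type, then across types. The function names `attend` and `fuse_by_type`, and the type labels in the usage example, are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, items: List[Vector]) -> Vector:
    """Scaled dot-product attention of the question vector over one set of items."""
    d = len(query)
    weights = softmax([dot(query, item) / math.sqrt(d) for item in items])
    # Weighted sum of the items, one output dimension at a time.
    return [sum(w * item[k] for w, item in zip(weights, items)) for k in range(d)]


def fuse_by_type(
    query: Vector, reps: Dict[str, List[Vector]]
) -> Tuple[Vector, Dict[str, float]]:
    """Hypothetical type-aware fusion (not the paper's DaVL equations).

    Step 1: summarize each representation type separately, so types are not
            mixed before their relevance is judged.
    Step 2: score each per-type summary against the question, softmax the
            scores into type weights, and return the weighted sum as the
            joint representation together with the weights.
    """
    d = len(query)
    names = list(reps)
    summaries = [attend(query, reps[name]) for name in names]
    type_weights = softmax([dot(query, s) / math.sqrt(d) for s in summaries])
    joint = [sum(w * s[k] for w, s in zip(type_weights, summaries)) for k in range(d)]
    return joint, dict(zip(names, type_weights))


if __name__ == "__main__":
    # Toy 3-d features; the type names are illustrative placeholders.
    question = [1.0, 0.0, 0.0]
    reps = {
        "visual_object": [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
        "visual_frame": [[0.0, 0.0, 1.0]],
        "linguistic_word": [[0.5, 0.5, 0.0]],
    }
    joint, weights = fuse_by_type(question, reps)
    print(max(weights, key=weights.get))  # → visual_object
```

Keeping a separate summary per type before weighting is one simple way to let the question raise or lower the influence of a whole type (for example, object-level visual features versus word-level linguistic features); the paper's actual DaVL module may differ.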

Related research

Learning to Reason with Relational Video Representation for Question Answering (07/10/2019)
How does machine learn to reason about the content of a video in answeri...

Cross-modal Knowledge Reasoning for Knowledge-based Visual Question Answering (08/31/2020)
Knowledge-based Visual Question Answering (KVQA) requires external knowl...

Dynamic Key-value Memory Enhanced Multi-step Graph Reasoning for Knowledge-based Visual Question Answering (03/06/2022)
Knowledge-based visual question answering (VQA) is a vision-language tas...

3D Question Answering (12/15/2021)
Visual Question Answering (VQA) has witnessed tremendous progress in rec...

iVQA: Inverse Visual Question Answering (10/10/2017)
In recent years, visual question answering (VQA) has become topical as a...

StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding (09/20/2023)
Charts are common in literature across different scientific fields, conv...

Visual Superordinate Abstraction for Robust Concept Learning (05/28/2022)
Concept learning constructs visual representations that are connected to...