Semantic-aware Dynamic Retrospective-Prospective Reasoning for Event-level Video Question Answering

05/14/2023
by Chenyang Lyu, et al.

Event-Level Video Question Answering (EVQA) requires complex reasoning across video events to obtain the visual information needed to provide optimal answers. However, despite significant progress in model performance, few studies have focused on using the explicit semantic connections between the question and the visual information, especially at the event level. Such semantic connections are needed to facilitate complex reasoning across video frames. Therefore, we propose a semantic-aware dynamic retrospective-prospective reasoning approach for video-based question answering. Specifically, we explicitly use the Semantic Role Labeling (SRL) structure of the question in the dynamic reasoning process, where we decide which frame to move to next based on which part of the question's SRL structure (agent, verb, patient, etc.) is being focused on. We conduct experiments on a benchmark EVQA dataset, TrafficQA. Results show that our proposed approach achieves superior performance compared to previous state-of-the-art models. Our code will be made publicly available for research use.
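The SRL-guided frame movement described above can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not say how the focus is computed or how a move is chosen, so the rule here is an assumption. The role names use PropBank-style labels (ARG0 for agent, V for verb, ARG1 for patient). The frame features and role embeddings are hand-made vectors. The function `srl_guided_traversal`, its `threshold` parameter, and the greedy one-frame move rule are all hypothetical.

At each step, the current frame attends over the SRL roles that have not yet been grounded, and the most-attended role becomes the focus. If the frame matches that role well enough, the role is grounded at this frame. Otherwise the walker steps one frame backward (retrospective) or forward (prospective), toward the neighbour that better matches the focused role.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def srl_guided_traversal(
    frames: List[Vector],
    roles: Dict[str, Vector],
    start: int = 0,
    threshold: float = 0.5,  # hypothetical "role is visible in this frame" cut-off
    max_steps: int = 20,
) -> Tuple[Dict[str, int], List[Tuple[int, str]]]:
    """Hypothetical SRL-guided retrospective-prospective frame walk (toy sketch)."""
    grounded: Dict[str, int] = {}      # SRL role -> frame where it was grounded
    trace: List[Tuple[int, str]] = []  # (frame index, focused role) per step
    idx = start
    for _ in range(max_steps):
        pending = [r for r in roles if r not in grounded]
        if not pending:
            break  # every SRL role of the question has been grounded
        # Attention of the current frame over the roles still to be grounded.
        weights = softmax([dot(frames[idx], roles[r]) for r in pending])
        focus = pending[max(range(len(pending)), key=weights.__getitem__)]
        trace.append((idx, focus))
        if dot(frames[idx], roles[focus]) >= threshold:
            grounded[focus] = idx  # stay on this frame and re-focus next step
            continue
        # Retrospective (idx - 1) or prospective (idx + 1) move toward the
        # neighbour that better matches the focused role; ties go forward.
        neighbours = [j for j in (idx - 1, idx + 1) if 0 <= j < len(frames)]
        if not neighbours:
            break  # single-frame video: nowhere to move
        idx = max(neighbours, key=lambda j: (dot(frames[j], roles[focus]), j))
    return grounded, trace


# Toy question roles (one-hot embeddings) and four toy frame features.
roles = {"ARG0": [1.0, 0.0, 0.0], "V": [0.0, 1.0, 0.0], "ARG1": [0.0, 0.0, 1.0]}
frames = [
    [0.9, 0.1, 0.0],
    [0.2, 0.3, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.2, 0.8],
]
grounded, trace = srl_guided_traversal(frames, roles)
print(grounded)
print(trace)
```

With these toy vectors, the walker grounds the agent at frame 0. It then moves forward through frame 1 to ground the verb at frame 2, and finally grounds the patient at frame 3. A learned model would presumably replace the dot products and the fixed threshold with trained scoring functions. The greedy one-frame move is used only to keep the example small.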

research
07/20/2017

Video Question Answering via Attribute-Augmented Attention Network Learning

Video Question Answering is a challenging problem in visual information ...
research
05/16/2023

Is a Video worth n × n Images? A Highly Efficient Approach to Transformer-based Video Question Answering

Conventional Transformer-based Video Question Answering (VideoQA) approa...
research
03/29/2021

TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events

Traffic event cognition and reasoning in videos is an important task tha...
research
03/06/2023

Confidence-based Event-centric Online Video Question Answering on a Newly Constructed ATBS Dataset

Deep neural networks facilitate video question answering (VideoQA), but ...
research
12/12/2021

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering

Video question answering requires the models to understand and reason ab...
research
11/23/2016

A dataset and exploration of models for understanding video data through fill-in-the-blank question-answering

While deep convolutional neural networks frequently approach or exceed h...
research
05/21/2022

NS3: Neuro-Symbolic Semantic Code Search

Semantic code search is the task of retrieving a code snippet given a te...
