Video Question Answering: Datasets, Algorithms and Challenges

03/02/2022
by   Yaoyao Zhong, et al.

Video Question Answering (VideoQA) aims to answer natural language questions according to the given videos. It has earned increasing attention with recent research trends in joint vision and language understanding. Yet, compared with ImageQA, VideoQA is largely underexplored and progresses slowly. Although different algorithms have continually been proposed and shown success on different VideoQA datasets, we find that a meaningful survey categorizing them is lacking, which seriously impedes the field's advancement. This paper thus provides a clear taxonomy and comprehensive analysis of VideoQA, focusing on the datasets, algorithms, and unique challenges. We then point out the research trend of moving beyond factoid QA to inference QA, towards the cognition of video content. Finally, we conclude with some promising directions for future exploration.

