Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering

08/11/2021
by Donggeon Lee, et al.

Video question answering has recently received considerable attention from multimodal video researchers. Most video question answering datasets use a multiple-choice format. However, a model trained for the multiple-choice task does not infer the answer; rather, it compares the answer candidates to pick the correct one. This formulation also makes the model difficult to extend to other tasks. In this paper, we challenge the existing multiple-choice setting by recasting it as open-ended video question answering. To tackle open-ended question answering, we use a pretrained GPT2 model, fine-tuned with video inputs and subtitles. We convert the existing DramaQA dataset to an open-ended format and perform an ablation study, which shows that performance can be improved by using video metadata.
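
As a rough illustration of the text side of such a setup, the sketch below shows one way a question, subtitles, and video metadata might be serialized into a single sequence for a decoder-only language model, with a flag to drop the metadata as in an ablation. The abstract does not specify the input format, so the `Clip` container, the `build_prompt` function, and the `[META]`, `[SUB]`, `[Q]`, `[A]` markers are all hypothetical, and visual features are left out entirely.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Clip:
    """Hypothetical container for the text-side inputs of one video clip."""
    question: str
    subtitles: List[str] = field(default_factory=list)
    # Metadata entries such as people or actions; the format is an assumption.
    metadata: List[str] = field(default_factory=list)


def build_prompt(clip: Clip, use_metadata: bool = True) -> str:
    """Join metadata, subtitles, and the question into one prompt string.

    The prompt ends with an answer marker so that a language model could
    generate the open-ended answer as a continuation. The marker strings
    are illustrative, not taken from the paper.
    """
    parts = []
    if use_metadata and clip.metadata:
        parts.append("[META] " + " ; ".join(clip.metadata))
    if clip.subtitles:
        parts.append("[SUB] " + " ".join(clip.subtitles))
    parts.append("[Q] " + clip.question)
    parts.append("[A]")
    return "\n".join(parts)


if __name__ == "__main__":
    clip = Clip(
        question="Who is talking?",
        subtitles=["Hello.", "Hi there."],
        metadata=["person: A", "action: talk"],
    )
    print(build_prompt(clip))                      # with metadata
    print(build_prompt(clip, use_metadata=False))  # ablated variant
```

Keeping the metadata behind a single flag makes it easy to build paired inputs with and without it, which is the kind of comparison an ablation over metadata would need.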

