VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks

04/16/2021
by Hung Le, et al.

Neural module networks (NMN) have achieved success in image-grounded tasks such as Visual Question Answering (VQA) on synthetic images. However, very little work has studied NMN in video-grounded language tasks. These tasks extend the complexity of traditional visual tasks with additional temporal variance in the visual input. Motivated by recent NMN approaches to image-grounded tasks, we introduce the Video-grounded Neural Module Network (VGNMN), which models the information retrieval process in video-grounded language tasks as a pipeline of neural modules. VGNMN first decomposes all language components to explicitly resolve entity references and detect the corresponding action-based inputs from the question. The detected entities and actions are then used as parameters to instantiate neural module networks that extract visual cues from the video. Our experiments show that VGNMN achieves promising performance on two video-grounded language tasks: video QA and video-grounded dialogues.
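The pipeline idea in the abstract (detected entities and actions become parameters that instantiate modules, which are then applied to the video in sequence) can be illustrated with a toy sketch. This is not the authors' implementation. The `Frame` structure, the module names `find` and `when`, and the hand-written program are all hypothetical, and symbolic labels stand in for the learned modules and continuous video features a real model would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Frame:
    """Toy stand-in for per-frame video features (hypothetical structure)."""
    objects: FrozenSet[str]
    actions: FrozenSet[str]


Weights = List[float]
Module = Callable[[List[Frame], Weights], Weights]


def make_find(entity: str) -> Module:
    """Instantiate a module that keeps weight only on frames showing `entity`."""
    def find(frames: List[Frame], weights: Weights) -> Weights:
        return [w if entity in f.objects else 0.0 for f, w in zip(frames, weights)]
    return find


def make_when(action: str) -> Module:
    """Instantiate a module that keeps weight only on frames where `action` occurs."""
    def when(frames: List[Frame], weights: Weights) -> Weights:
        return [w if action in f.actions else 0.0 for f, w in zip(frames, weights)]
    return when


# Hypothetical module inventory: name -> factory parameterized by a detected word.
MODULE_FACTORIES: Dict[str, Callable[[str], Module]] = {
    "find": make_find,
    "when": make_when,
}


def run_program(program: List[Tuple[str, str]], frames: List[Frame]) -> Weights:
    """Run (module_name, parameter) steps in order, starting from uniform weights."""
    weights = [1.0] * len(frames)
    for name, param in program:
        module = MODULE_FACTORIES[name](param)
        weights = module(frames, weights)
    return weights


video = [
    Frame(frozenset({"man"}), frozenset({"walk"})),
    Frame(frozenset({"man", "cup"}), frozenset({"drink"})),
    Frame(frozenset({"cup"}), frozenset()),
]

# Assumed parse of a question such as "What is the man doing while drinking?":
# entity "man", action "drink". In this sketch the parse is written by hand.
program = [("find", "man"), ("when", "drink")]
print(run_program(program, video))
```

Only the second frame keeps non-zero weight, because it is the only one that contains both the entity and the action. In a learned model the hard 0/1 filters would presumably be replaced by soft attention over video features.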


