Visual Discourse Parsing

03/06/2019
by Arjun R Akula, et al.

Text-level discourse parsing aims to uncover how two segments (or sentences) of a text are related to each other. We propose the task of Visual Discourse Parsing, which requires understanding the discourse relations among scenes in a video. Here we use the term scene to refer to a subset of video frames that summarizes the video. To collect a dataset for learning discourse cues from videos, one needs to manually identify the scenes in a large pool of video frames and then annotate the discourse relations between them, which is a time-consuming, expensive, and tedious task. In this work, we propose an approach that identifies discourse cues from videos without the need to explicitly identify and annotate the scenes. We also present a novel dataset of 310 videos and their corresponding discourse cues to evaluate our approach. We believe that many multidisciplinary artificial intelligence problems, such as Visual Dialog and Visual Storytelling, would greatly benefit from the use of visual discourse cues.
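To make the terms in the abstract concrete, here is a minimal Python sketch of what an explicitly annotated video could look like, assuming a scene is stored as a set of frame indices and a discourse relation as a labeled pair of scenes. The class names, the `relate` and `num_candidate_pairs` helpers, and the example label `"cause"` are hypothetical illustrations; they are not the dataset format or relation inventory used in the paper, whose approach avoids this explicit scene annotation altogether.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Scene:
    """A scene: a non-empty subset of a video's frames (hypothetical representation)."""
    frame_indices: FrozenSet[int]


@dataclass(frozen=True)
class DiscourseRelation:
    """A labeled, directed relation between two scenes of the same video."""
    source: Scene
    target: Scene
    label: str  # hypothetical label vocabulary, not taken from the paper


@dataclass
class AnnotatedVideo:
    """A video with manually selected scenes and manually labeled relations."""
    num_frames: int
    scenes: List[Scene] = field(default_factory=list)
    relations: List[DiscourseRelation] = field(default_factory=list)

    def add_scene(self, frames: Iterable[int]) -> Scene:
        # Step 1 of the manual pipeline: pick a subset of frames as a scene.
        indices = frozenset(frames)
        if not indices:
            raise ValueError("a scene must contain at least one frame")
        if any(i < 0 or i >= self.num_frames for i in indices):
            raise ValueError("frame index out of range")
        scene = Scene(indices)
        self.scenes.append(scene)
        return scene

    def relate(self, source: Scene, target: Scene, label: str) -> DiscourseRelation:
        # Step 2 of the manual pipeline: label the relation between two scenes.
        if source not in self.scenes or target not in self.scenes:
            raise ValueError("both scenes must be among this video's scenes")
        relation = DiscourseRelation(source, target, label)
        self.relations.append(relation)
        return relation

    def num_candidate_pairs(self) -> int:
        # Unordered scene pairs an annotator might have to inspect: k(k-1)/2.
        k = len(self.scenes)
        return k * (k - 1) // 2
```

For example, `AnnotatedVideo(num_frames=100)` with three scenes added through `add_scene` has `num_candidate_pairs() == 3`. Because the number of candidate pairs grows quadratically with the number of scenes, this sketch also suggests one reason the manual pipeline described in the abstract could scale poorly.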

research
01/17/2022

Discourse Analysis for Evaluating Coherence in Video Paragraph Captions

Video paragraph captioning is the task of automatically generating a coh...
research
08/11/2017

Automatic Identification of AltLexes using Monolingual Parallel Corpora

The automatic identification of discourse relations is still a challengi...
research
09/08/2023

RST-style Discourse Parsing Guided by Document-level Content Structures

Rhetorical Structure Theory based Discourse Parsing (RST-DP) explores ho...
research
09/07/2018

Textual Analogy Parsing: What's Shared and What's Compared among Analogous Facts

To understand a sentence like "whereas only 10 below the poverty line, 2...
research
06/28/2015

Unsupervised Semantic Parsing of Video Collections

Human communication typically has an underlying structure. This is refle...
research
08/19/2017

The CLaC Discourse Parser at CoNLL-2016

This paper describes our submission "CLaC" to the CoNLL-2016 shared task...
research
10/16/2022

Motion-Based Weak Supervision for Video Parsing with Application to Colonoscopy

We propose a two-stage unsupervised approach for parsing videos into pha...