Weakly Supervised Dense Event Captioning in Videos

12/10/2018
by Xuguang Duan, et al.

Dense event captioning aims to detect and describe all events of interest contained in a video. Despite the advanced development in this area, existing methods tackle this task by making use of dense temporal annotations, which are highly resource-consuming to obtain. This paper formulates a new problem: weakly supervised dense event captioning, which does not require temporal segment annotations for model training. Our solution is based on the one-to-one correspondence assumption: each caption describes one temporal segment, and each temporal segment has one caption. This assumption holds in current benchmark datasets and in most real-world cases. We decompose the problem into a pair of dual problems, event captioning and sentence localization, and present a cycle system to train our model. Extensive experimental results demonstrate the ability of our model on both dense event captioning and sentence localization in videos.
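
The cycle system can be illustrated with a short training sketch: given a caption but no segment annotation, a sentence localizer proposes a temporal segment, a caption generator re-captions that segment, and the reconstruction error against the original caption supervises both modules. The PyTorch sketch below is a minimal illustration under assumed shapes; the module names (SentenceLocalizer, CaptionGenerator), feature dimensions, and the Gaussian soft mask over frames are illustrative assumptions, not the authors' exact architecture.

# Minimal sketch of the weakly supervised captioning-localization cycle.
# All architectural details here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SentenceLocalizer(nn.Module):
    """Predicts a temporal segment (center, width) for a caption from video features."""
    def __init__(self, vid_dim=512, txt_dim=512, hid=256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(vid_dim + txt_dim, hid), nn.ReLU(), nn.Linear(hid, 2))

    def forward(self, vid_feats, cap_feat):
        # vid_feats: (T, vid_dim) frame features; cap_feat: (txt_dim,) sentence embedding
        pooled = vid_feats.mean(dim=0)
        center, width = torch.sigmoid(self.score(torch.cat([pooled, cap_feat]))).unbind(-1)
        return center, width.clamp(min=1e-3)

class CaptionGenerator(nn.Module):
    """Generates caption logits from the features inside a (soft) temporal segment."""
    def __init__(self, vid_dim=512, vocab=1000, hid=256):
        super().__init__()
        self.rnn = nn.GRU(vid_dim, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, vid_feats, center, width, max_len=10):
        # A Gaussian soft mask keeps segment selection differentiable end-to-end.
        T = vid_feats.size(0)
        t = torch.linspace(0, 1, T)
        mask = torch.exp(-((t - center) ** 2) / (2 * width ** 2)).unsqueeze(-1)
        seg_feats = (vid_feats * mask).unsqueeze(0)        # (1, T, vid_dim)
        h, _ = self.rnn(seg_feats)
        return self.out(h[:, -max_len:, :])                # (1, max_len, vocab)

# One weakly supervised step: localize the caption, re-caption the predicted
# segment, and use caption reconstruction as the only training signal.
localizer, captioner = SentenceLocalizer(), CaptionGenerator()
optim = torch.optim.Adam(list(localizer.parameters()) + list(captioner.parameters()), lr=1e-4)

vid_feats = torch.randn(64, 512)               # toy frame features (T=64)
cap_feat = torch.randn(512)                    # toy sentence embedding of the caption
cap_tokens = torch.randint(0, 1000, (1, 10))   # the caption's token ids (no segment label)

center, width = localizer(vid_feats, cap_feat)
logits = captioner(vid_feats, center, width)
loss = F.cross_entropy(logits.reshape(-1, 1000), cap_tokens.reshape(-1))
loss.backward()
optim.step()

Because the reconstruction loss backpropagates through the soft segment into the localizer, the caption annotations alone can drive both event captioning and sentence localization, which is the intuition behind the dual-problem decomposition above.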

Related research

- Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos (06/25/2018)
- Weakly Supervised Dense Video Captioning (04/05/2017)
- Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020 (06/21/2020)
- WSLLN: Weakly Supervised Natural Language Localization Networks (08/31/2019)
- Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning (02/27/2023)
- Weakly Supervised Dense Video Captioning via Jointly Usage of Knowledge Distillation and Cross-modal Matching (05/18/2021)
- Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos (07/28/2020)
