
ActivityNet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos

by Shizhe Chen, et al.
Renmin University of China; Carnegie Mellon University

Contextual reasoning is essential for understanding events in long untrimmed videos. In this work, we systematically explore captioning models with various contexts for the dense-captioning events in videos task, which aims to generate captions for the different events in an untrimmed video. We propose five types of contexts and two categories of event captioning models, and evaluate their contributions to event captioning in terms of both accuracy and diversity. The proposed captioning models are plugged into our pipeline system for the dense video captioning challenge. The overall system achieves state-of-the-art performance on the dense-captioning events in videos task, with a 9.91 METEOR score on the challenge testing set.
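Caption accuracy in this task is reported with METEOR. As a rough illustration of what the metric rewards, the sketch below (a hypothetical simplification, not the official evaluation code) computes only the exact-unigram F-mean component of METEOR; the real metric additionally uses stemming, synonym matching, and a fragmentation penalty.

```python
# Simplified METEOR-style score: exact unigram matching with the
# recall-weighted harmonic mean (alpha = 0.9, as in METEOR).
# This is an illustrative sketch, not the official challenge scorer.
from collections import Counter

def simple_meteor(hypothesis: str, reference: str, alpha: float = 0.9) -> float:
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()
    # Count overlapping unigrams, clipped by reference counts.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp)
    recall = overlap / len(ref)
    # Recall-weighted harmonic mean (the METEOR F-mean).
    return precision * recall / (alpha * recall + (1 - alpha) * precision)

score = simple_meteor("a man is cooking in the kitchen",
                      "a man cooks food in a kitchen")
```

The heavy recall weighting (alpha = 0.9) is why METEOR favors captions that cover the reference content, even at some cost in precision.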



