Dense-Captioning Events in Videos: SYSU Submission to ActivityNet Challenge 2020

06/21/2020
by Teng Wang, et al.

This technical report presents a brief description of our submission to the dense video captioning task of ActivityNet Challenge 2020. Our approach follows a two-stage pipeline: first, we extract a set of temporal event proposals; then we propose a multi-event captioning model to capture the event-level temporal relationships and effectively fuse the multi-modal information. Our approach achieves a 9.28 METEOR score on the test set.
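
The two-stage structure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the submission's implementation: the `Proposal` type, `select_proposals`, `caption_events`, and `dummy_captioner` are hypothetical names, the top-k score filter stands in for whatever proposal extractor the report uses, and the dummy captioner stands in for the multi-event captioning model. The one property the sketch tries to reflect is that the second stage receives all events of a video together, so it can use event-level temporal context.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Proposal:
    """A temporal event proposal (hypothetical structure for illustration)."""
    start: float  # seconds
    end: float    # seconds
    score: float  # confidence assigned by the proposal stage


def select_proposals(candidates: Sequence[Proposal], top_k: int) -> List[Proposal]:
    """Stage 1 stand-in: keep the top_k highest-scoring proposals, in temporal order."""
    best = sorted(candidates, key=lambda p: p.score, reverse=True)[:top_k]
    return sorted(best, key=lambda p: (p.start, p.end))


def caption_events(
    proposals: Sequence[Proposal],
    captioner: Callable[[Sequence[Proposal]], List[str]],
) -> List[Tuple[Proposal, str]]:
    """Stage 2 stand-in: caption all events jointly rather than one at a time."""
    captions = captioner(proposals)
    if len(captions) != len(proposals):
        raise ValueError("captioner must return one caption per proposal")
    return list(zip(proposals, captions))


def dummy_captioner(events: Sequence[Proposal]) -> List[str]:
    """Placeholder model: it only reports each event's position among its neighbours."""
    total = len(events)
    return [
        f"event {i + 1} of {total}: {e.start:.1f}s to {e.end:.1f}s"
        for i, e in enumerate(events)
    ]


if __name__ == "__main__":
    candidates = [
        Proposal(10.0, 20.0, 0.9),
        Proposal(0.0, 5.0, 0.8),
        Proposal(3.0, 4.0, 0.1),
    ]
    for proposal, caption in caption_events(select_proposals(candidates, 2), dummy_captioner):
        print(proposal, caption)
```

Passing the whole proposal list to the captioner, instead of calling it once per event, is the design choice that leaves room for modeling relationships between events.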

research
06/14/2020

Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning

Detecting meaningful events in an untrimmed video is essential for dense...
research
04/13/2022

Semantic-Aware Pretraining for Dense Video Captioning

This report describes the details of our approach for the event dense-ca...
research
10/15/2019

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019

This notebook paper presents our model in the VATEX video captioning cha...
research
10/13/2019

VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning

Multi-modal information is essential to describe what has happened in a ...
research
12/10/2018

Weakly Supervised Dense Event Captioning in Videos

Dense event captioning aims to detect and describe all events of interes...
research
06/25/2018

Best Vision Technologies Submission to ActivityNet Challenge 2018-Task: Dense-Captioning Events in Videos

This note describes the details of our solution to the dense-captioning ...
research
06/05/2020

Multi-modal Feature Fusion with Feature Attention for VATEX Captioning Challenge 2020

This report describes our model for VATEX Captioning Challenge 2020. Fir...
