Semantic-Aware Pretraining for Dense Video Captioning

04/13/2022
by Teng Wang, et al.

This report details our approach to the event dense-captioning task in the ActivityNet Challenge 2021. We present a semantic-aware pretraining method for dense video captioning that enables the learned video features to recognize high-level semantic concepts. Video features from multiple modalities are then fed into an event captioning module to generate accurate and meaningful sentences. Our final ensemble model achieves a METEOR score of 10.00 on the test set.
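The abstract does not specify how the semantic-aware pretraining is implemented. One plausible reading is a multi-label concept-classification objective on clip-level video features, where the concept vocabulary is mined from ground-truth captions. The sketch below is a minimal, hypothetical PyTorch rendering of that idea; the module names, dimensions, concept vocabulary size, and the BCE objective are all assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of semantic-aware pretraining as multi-label
# concept classification. Nothing here is taken from the paper's code.
import torch
import torch.nn as nn

class SemanticAwareEncoder(nn.Module):
    """Video feature encoder pretrained to recognize semantic concepts."""
    def __init__(self, feat_dim=2048, hidden_dim=512, num_concepts=1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Multi-label head: each video may contain several concepts
        # (objects, actions, scenes) mined from its captions.
        self.concept_head = nn.Linear(hidden_dim, num_concepts)

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim) per-clip features
        h = self.backbone(clip_feats).mean(dim=1)  # pool over clips
        return self.concept_head(h)                # concept logits

# Pretraining step: binary cross-entropy against multi-hot concept labels.
model = SemanticAwareEncoder()
criterion = nn.BCEWithLogitsLoss()
feats = torch.randn(4, 16, 2048)               # dummy feature batch
labels = (torch.rand(4, 1000) > 0.99).float()  # dummy multi-hot targets
loss = criterion(model(feats), labels)
loss.backward()
```

After pretraining, the backbone features (rather than the concept logits) would serve as the semantic-aware inputs to the downstream event captioning module, alongside features from the other modalities.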

