Multi-modal Feature Fusion with Feature Attention for VATEX Captioning Challenge 2020

06/05/2020
by Ke Lin, et al.

This report describes our model for the VATEX Captioning Challenge 2020. First, to gather information from multiple domains, we extract motion, appearance, semantic, and audio features. We then design a feature attention module that attends to the different features during decoding. We apply two types of decoders, top-down and X-LAN, and ensemble these models to obtain the final result. The proposed method outperforms the official baseline by a significant margin. We achieve 76.0 CIDEr and 50.0 CIDEr on the English and Chinese private test sets, respectively. We rank 2nd on both the English and Chinese private test leaderboards.
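
The abstract does not specify how the feature attention module is built. As a rough illustration only, the sketch below assumes a standard additive attention over one pooled vector per modality (motion, appearance, semantic, audio), conditioned on the decoder hidden state. The function names, dimensions, and random weights are all hypothetical and are not taken from the report.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a length-n vector by an n x m matrix, giving a length-m list."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def feature_attention(hidden, features, W_h, W_f, v):
    """Assumed additive attention over modality features.

    hidden:   decoder hidden state, length d_h
    features: one vector per modality, each of length d_f
    W_h, W_f: projections into a shared attention space of size d_a
    v:        scoring vector of length d_a
    Returns the fused feature (length d_f) and one weight per modality.
    """
    h_proj = matvec(hidden, W_h)
    scores = []
    for f in features:
        f_proj = matvec(f, W_f)
        scores.append(sum(vk * math.tanh(a + b) for vk, a, b in zip(v, f_proj, h_proj)))
    weights = softmax(scores)
    fused = [sum(w * f[k] for w, f in zip(weights, features)) for k in range(len(features[0]))]
    return fused, weights


# Toy dimensions and random parameters, for illustration only.
rng = random.Random(0)
d_h, d_f, d_a = 8, 6, 4


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


modalities = ["motion", "appearance", "semantic", "audio"]
features = rand_matrix(len(modalities), d_f)
hidden = [rng.gauss(0, 1) for _ in range(d_h)]
W_h = rand_matrix(d_h, d_a)
W_f = rand_matrix(d_f, d_a)
v = [rng.gauss(0, 1) for _ in range(d_a)]

fused, weights = feature_attention(hidden, features, W_h, W_f, v)
for name, w in zip(modalities, weights):
    print(f"{name}: {w:.3f}")
```

In a real decoder the weights would be recomputed at every decoding step, so the model can lean on different modalities for different words; the actual module in the report may differ in its scoring function and in how features are pooled.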

