Multi-View Features and Hybrid Reward Strategies for Vatex Video Captioning Challenge 2019

10/17/2019
by Xinxin Zhu, et al.

This document describes our solution for the VATEX Captioning Challenge 2019, which requires generating descriptions of videos in both English and Chinese. We identify three factors that are crucial to performance: multi-view features, a hybrid reward, and diverse ensembling. Our method placed 2nd on the Chinese video captioning track and 3rd on the English track.
