Delving Deeper into the Decoder for Video Captioning

01/16/2020
by   Haoran Chen, et al.
7

Video captioning is an advanced multi-modal task which aims to describe a video clip using a natural language sentence. The encoder-decoder framework is the most popular paradigm for this task in recent years. However, there still exist some non-negligible problems in the decoder of a video captioning model. We make a thorough investigation into the decoder and adopt three techniques to improve the performance of the model. First of all, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate the problem of overfitting. Secondly, a new method is proposed to evaluate the performance of a model on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called professional learning is proposed which develops the strong points of a captioning model and bypasses its weaknesses. It is demonstrated in the experiments on Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets that our model has achieved the best results evaluated by BLEU, CIDEr, METEOR and ROUGE-L metrics with significant gains of up to 11.7 state-of-the-art models.

READ FULL TEXT

page 4

page 6

research
12/20/2020

Guidance Module Network for Video Captioning

Video captioning has been a challenging and significant task that descri...
research
03/30/2018

Reconstruction Network for Video Captioning

In this paper, the problem of describing visual contents of a video sequ...
research
02/21/2020

Image to Language Understanding: Captioning approach

Extracting context from visual representations is of utmost importance i...
research
11/21/2019

Empirical Autopsy of Deep Video Captioning Frameworks

Contemporary deep learning based video captioning follows encoder-decode...
research
06/03/2019

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

In this paper, the problem of describing visual contents of a video sequ...
research
08/08/2017

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

Video captioning in essential is a complex natural process, which is aff...
research
10/11/2021

CLIP4Caption ++: Multi-CLIP for Video Caption

This report describes our solution to the VALUE Challenge 2021 in the ca...

Please sign up or login with your details

Forgot password? Click here to reset