Delving Deeper into the Decoder for Video Captioning

01/16/2020
by   Haoran Chen, et al.

Video captioning is an advanced multi-modal task that aims to describe a video clip with a natural-language sentence. The encoder-decoder framework has been the most popular paradigm for this task in recent years. However, the decoder of a video captioning model still has some non-negligible problems. We thoroughly investigate the decoder and adopt three techniques to improve model performance. First, a combination of variational dropout and layer normalization is embedded into a recurrent unit to alleviate overfitting. Second, a new method is proposed to evaluate a model's performance on a validation set so as to select the best checkpoint for testing. Finally, a new training strategy called professional learning is proposed, which develops the strong points of a captioning model and bypasses its weaknesses. Experiments on the Microsoft Research Video Description Corpus (MSVD) and MSR-Video to Text (MSR-VTT) datasets demonstrate that our model achieves the best results as evaluated by the BLEU, CIDEr, METEOR and ROUGE-L metrics, with significant gains of up to 11.7 over state-of-the-art models.
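To make the first technique concrete, the following is a minimal NumPy sketch of one way variational dropout and layer normalization can be combined inside an LSTM-style recurrent unit. It is an illustration under stated assumptions, not the authors' implementation: the function names (`layer_norm`, `make_mask`, `run_lstm`), the gate layout, the placement of the normalization, and the `keep_prob` value are all hypothetical choices made for this example. The property it shows is that a variational dropout mask is sampled once per sequence and reused at every time step, while layer normalization rescales the pre-activations within each step.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    # Normalize each row to zero mean and unit variance (no learned gain/bias here).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_mask(rng, shape, keep_prob):
    # Inverted dropout mask: kept units are scaled by 1 / keep_prob.
    return rng.binomial(1, keep_prob, size=shape) / keep_prob


def run_lstm(inputs, W_x, W_h, b, keep_prob=0.8, training=True, seed=0):
    """Run an LSTM-style unit over a sequence (hypothetical sketch).

    inputs: (T, B, D); W_x: (D, 4H); W_h: (H, 4H); b: (4H,).
    Returns the hidden states with shape (T, B, H).
    """
    T, B, D = inputs.shape
    H = W_h.shape[0]
    rng = np.random.default_rng(seed)

    # Variational dropout: sample the masks ONCE, then reuse them at every step.
    if training:
        mask_x = make_mask(rng, (B, D), keep_prob)
        mask_h = make_mask(rng, (B, H), keep_prob)
    else:
        mask_x = np.ones((B, D))
        mask_h = np.ones((B, H))

    h = np.zeros((B, H))
    c = np.zeros((B, H))
    outputs = []
    for t in range(T):
        # Layer-normalize the input and recurrent contributions separately.
        gates = (layer_norm((inputs[t] * mask_x) @ W_x)
                 + layer_norm((h * mask_h) @ W_h) + b)
        i, f, g, o = np.split(gates, 4, axis=-1)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(layer_norm(c))
        outputs.append(h)
    return np.stack(outputs)
```

At test time the sketch is called with `training=False`, which replaces both masks with ones so that no units are dropped.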


12/20/2020

Guidance Module Network for Video Captioning

Video captioning has been a challenging and significant task that descri...

03/30/2018

Reconstruction Network for Video Captioning

In this paper, the problem of describing visual contents of a video sequ...

02/21/2020

Image to Language Understanding: Captioning approach

Extracting context from visual representations is of utmost importance i...

11/21/2019

Empirical Autopsy of Deep Video Captioning Frameworks

Contemporary deep learning based video captioning follows encoder-decode...

10/11/2021

CLIP4Caption ++: Multi-CLIP for Video Caption

This report describes our solution to the VALUE Challenge 2021 in the ca...

04/12/2022

Video Captioning: a comparative review of where we are and which could be the route

Video captioning is the process of describing the content of a sequence ...

08/08/2017

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning

Video captioning in essential is a complex natural process, which is aff...

Code Repositories

delving-deeper-into-the-decoder-for-video-captioning

Source code for Delving Deeper into the Decoder for Video Captioning

