Reconstruction Network for Video Captioning

03/30/2018
by Bairui Wang, et al.

This paper addresses the problem of describing the visual contents of a video sequence in natural language. Unlike previous video captioning work, which mainly exploits cues from the video contents to generate a language description, we propose a reconstruction network (RecNet) with a novel encoder-decoder-reconstructor architecture that leverages both the forward (video to sentence) and backward (sentence to video) flows for video captioning. Specifically, the encoder-decoder uses the forward flow to produce the sentence description from the encoded video semantic features. Two types of reconstructors are customized to employ the backward flow and reproduce the video features from the hidden state sequence generated by the decoder. The generation loss yielded by the encoder-decoder and the reconstruction loss introduced by the reconstructor are jointly used to train the proposed RecNet in an end-to-end fashion. Experimental results on benchmark datasets demonstrate that the proposed reconstructor can boost encoder-decoder models and lead to significant gains in video captioning accuracy.
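
The joint objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two losses are combined, so a weighted sum is assumed here, with a hypothetical trade-off parameter `recon_weight`. The squared Euclidean distance used as the reconstruction measure and all function names are likewise assumptions made for illustration.

```python
import math
from typing import Sequence


def generation_loss(token_probs: Sequence[float]) -> float:
    """Negative log-likelihood of the reference caption.

    token_probs[i] is the probability the decoder assigned to the
    i-th ground-truth word (forward flow: video to sentence).
    """
    return -sum(math.log(p) for p in token_probs)


def reconstruction_loss(
    video_feats: Sequence[Sequence[float]],
    recon_feats: Sequence[Sequence[float]],
) -> float:
    """Mean squared Euclidean distance between the original and the
    reconstructed feature vectors (backward flow: sentence to video).

    The distance measure is an assumption, not taken from the abstract.
    """
    if len(video_feats) != len(recon_feats):
        raise ValueError("feature sequences must have the same length")
    total = 0.0
    for orig, recon in zip(video_feats, recon_feats):
        total += sum((o - r) ** 2 for o, r in zip(orig, recon))
    return total / len(video_feats)


def joint_loss(
    token_probs: Sequence[float],
    video_feats: Sequence[Sequence[float]],
    recon_feats: Sequence[Sequence[float]],
    recon_weight: float = 0.2,  # hypothetical trade-off value
) -> float:
    """Assumed combination: generation loss + weight * reconstruction loss."""
    return generation_loss(token_probs) + recon_weight * reconstruction_loss(
        video_feats, recon_feats
    )


if __name__ == "__main__":
    probs = [0.5, 0.5]                    # decoder probabilities of two caption words
    feats = [[1.0, 0.0], [0.0, 1.0]]      # toy video features
    recon = [[1.0, 0.0], [0.0, 0.0]]      # toy reconstructed features
    print(joint_loss(probs, feats, recon))
```

In an actual model both terms would be differentiable functions of shared parameters, so that minimizing the sum trains the encoder, decoder, and reconstructor together, which is what the abstract means by end-to-end training.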

research
06/03/2019

Reconstruct and Represent Video Contents for Captioning via Reinforcement Learning

In this paper, the problem of describing visual contents of a video sequ...
research
12/20/2020

Guidance Module Network for Video Captioning

Video captioning has been a challenging and significant task that descri...
research
04/04/2019

An End-to-End Baseline for Video Captioning

Building correspondences across different modalities, such as video and ...
research
01/16/2020

Delving Deeper into the Decoder for Video Captioning

Video captioning is an advanced multi-modal task which aims to describe ...
research
11/21/2019

Empirical Autopsy of Deep Video Captioning Frameworks

Contemporary deep learning based video captioning follows encoder-decode...
research
10/11/2021

CLIP4Caption ++: Multi-CLIP for Video Caption

This report describes our solution to the VALUE Challenge 2021 in the ca...
research
12/02/2021

Controllable Video Captioning with an Exemplar Sentence

In this paper, we investigate a novel and challenging task, namely contr...
