Video Captioning with Transferred Semantic Attributes

11/23/2016
by   Yingwei Pan, et al.
0

Automatically generating natural language descriptions of videos plays a fundamental challenge for computer vision community. Most recent progress in this problem has been achieved through employing 2-D and/or 3-D Convolutional Neural Networks (CNN) to encode video content and Recurrent Neural Networks (RNN) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred Semantic Attributes (LSTM-TSA)---a novel deep architecture that incorporates the transferred semantic attributes learnt from images and videos into the CNN plus RNN framework, by training them in an end-to-end manner. The design of LSTM-TSA is highly inspired by the facts that 1) semantic attributes play a significant contribution to captioning, and 2) images and videos carry complementary semantics and thus can reinforce each other for captioning. To boost video captioning, we propose a novel transfer unit to model the mutually correlated attributes learnt from images and videos. Extensive experiments are conducted on three public datasets, i.e., MSVD, M-VAD and MPII-MD. Our proposed LSTM-TSA achieves to-date the best published performance in sentence generation on MSVD: 52.8 and CIDEr-D. Superior results when compared to state-of-the-art methods are also reported on M-VAD and MPII-MD.

READ FULL TEXT
research
11/05/2016

Boosting Image Captioning with Attributes

Automatically describing an image with a natural language has been an em...
research
05/07/2015

Jointly Modeling Embedding and Translation to Bridge Video and Language

Automatically describing video content with natural language is a fundam...
research
08/17/2017

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Image captioning often requires a large set of training image-sentence p...
research
02/22/2022

Exploiting long-term temporal dynamics for video captioning

Automatically describing videos with natural language is a fundamental c...
research
05/23/2018

CNN+CNN: Convolutional Decoders for Image Captioning

Image captioning is a challenging task that combines the field of comput...
research
08/01/2019

Convolutional Auto-encoding of Sentence Topics for Image Paragraph Generation

Image paragraph generation is the task of producing a coherent story (us...
research
05/03/2019

Temporal Deformable Convolutional Encoder-Decoder Networks for Video Captioning

It is well believed that video captioning is a fundamental but challengi...

Please sign up or login with your details

Forgot password? Click here to reset