Controllable Video Captioning with an Exemplar Sentence

12/02/2021
by Yitian Yuan, et al.

In this paper, we investigate a novel and challenging task: controllable video captioning with an exemplar sentence. Formally, given a video and a syntactically valid exemplar sentence, the task is to generate a single caption that not only describes the semantic content of the video but also follows the syntactic form of the exemplar sentence. To tackle this exemplar-based video captioning task, we propose a novel Syntax Modulated Caption Generator (SMCG) incorporated in an encoder-decoder-reconstructor architecture. The proposed SMCG takes the video semantic representation as input and conditionally modulates the gates and cells of a long short-term memory (LSTM) network with respect to the encoded syntactic information of the given exemplar sentence. SMCG is therefore able to control the states used for word prediction and achieve syntax-customized caption generation. We conduct experiments by collecting auxiliary exemplar sentences for two public video captioning datasets. Extensive experimental results demonstrate the effectiveness of our approach in generating syntax-controllable and semantics-preserving video captions. By providing different exemplar sentences, our approach can produce captions with various syntactic structures, indicating a promising way to strengthen the diversity of video captioning.
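To make the idea of modulating LSTM gates and cells with syntactic information more concrete, the following is a minimal sketch in plain Python. It is not the authors' SMCG implementation: the abstract does not state the exact form of the modulation, so the per-unit scale-and-shift used here is an assumption, and the class name `SyntaxModulatedLSTMCell`, its parameters, and the random weights are all hypothetical. The sketch only shows how a syntax vector could change the hidden state produced from the same video input.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix, stored as nested lists, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SyntaxModulatedLSTMCell:
    """Hypothetical LSTM cell whose gate and cell pre-activations are
    scaled and shifted by a syntax vector (an assumed form of modulation)."""

    NAMES = ("input", "forget", "output", "cell")

    def __init__(self, video_dim, syntax_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Standard LSTM weights over the concatenation [video input; previous hidden state].
        self.w = {n: random_matrix(hidden_dim, video_dim + hidden_dim, rng)
                  for n in self.NAMES}
        # Assumed modulation weights: map the syntax vector to a per-unit scale and shift.
        self.w_scale = {n: random_matrix(hidden_dim, syntax_dim, rng) for n in self.NAMES}
        self.w_shift = {n: random_matrix(hidden_dim, syntax_dim, rng) for n in self.NAMES}

    def _modulated(self, name, xh, syntax):
        pre = linear(self.w[name], xh)
        # A zero syntax vector gives scale 1 and shift 0, i.e. an ordinary LSTM.
        scale = [1.0 + s for s in linear(self.w_scale[name], syntax)]
        shift = linear(self.w_shift[name], syntax)
        return [g * p + b for g, p, b in zip(scale, pre, shift)]

    def step(self, video, syntax, h_prev, c_prev):
        xh = list(video) + list(h_prev)
        i = [sigmoid(v) for v in self._modulated("input", xh, syntax)]
        f = [sigmoid(v) for v in self._modulated("forget", xh, syntax)]
        o = [sigmoid(v) for v in self._modulated("output", xh, syntax)]
        g = [math.tanh(v) for v in self._modulated("cell", xh, syntax)]
        # Usual LSTM state update, now driven by syntax-dependent gates and candidate cell.
        c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c


# Same video feature, two different (made-up) syntax encodings.
cell = SyntaxModulatedLSTMCell(video_dim=4, syntax_dim=3, hidden_dim=5)
video = [0.5, -0.2, 0.1, 0.9]
zeros = [0.0] * 5
h_a, _ = cell.step(video, [1.0, 0.0, 0.0], zeros, zeros)
h_b, _ = cell.step(video, [0.0, 1.0, 0.0], zeros, zeros)
print(h_a != h_b)  # the hidden state depends on the syntax vector
```

In a full captioning model, the hidden state would feed a word classifier at each decoding step, so a different exemplar encoding would steer word prediction while the video representation stays fixed. The encoder and reconstructor mentioned in the abstract are not covered by this sketch.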
