Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

08/17/2016
by Rakshith Shetty, et al.

We present our submission to the Microsoft Video to Language Challenge, whose task is to generate short captions describing the videos in the challenge dataset. Our model is based on the encoder-decoder pipeline that is popular in image and video captioning systems. We propose to use two different kinds of video features: one captures the video content in terms of objects and attributes, and the other captures motion and action information. Using these diverse features, we train models that each specialize in one of two separate input sub-domains. We then train an evaluator model, which picks the best caption from the pool of candidates generated by these domain-expert models. We argue that, because of the diversity of the dataset, this approach is better suited to the current video captioning task than a single model. The efficacy of our method is supported by the fact that it was rated best in the MSR Video to Language Challenge according to human evaluation. We were also ranked second in the table based on automatic evaluation metrics.
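
The candidate-pool step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `pick_best_caption` and `toy_score` are hypothetical, and the scorer is a simple word-overlap count standing in for the trained evaluator model, whose architecture the abstract does not specify.

```python
from typing import Callable, List, Set, Tuple


def toy_score(video_tags: Set[str], caption: str) -> float:
    """Hypothetical stand-in for a trained evaluator.

    Counts the distinct caption words that also appear among the
    tags assumed to have been detected in the video.
    """
    return float(len(set(caption.lower().split()) & video_tags))


def pick_best_caption(
    candidates: List[str],
    video_tags: Set[str],
    score: Callable[[Set[str], str], float],
) -> Tuple[str, float]:
    """Return the highest-scoring caption in the pool and its score.

    The pool is assumed to hold captions produced by several
    specialized generator models. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("candidate pool is empty")
    scored = [(caption, score(video_tags, caption)) for caption in candidates]
    return max(scored, key=lambda pair: pair[1])


# Illustrative data only, not taken from the challenge dataset.
tags = {"man", "guitar", "playing"}
pool = [
    "a person is cooking",
    "a man is playing a guitar",
    "a man is talking",
]
best_caption, best_score = pick_best_caption(pool, tags, toy_score)
print(best_caption, best_score)
```

In this sketch the scoring function is passed in as an argument, so a learned evaluator that rates video and caption pairs could replace `toy_score` without changing the selection logic.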

