Characterizing the impact of using features extracted from pre-trained models on the quality of video captioning sequence-to-sequence models

11/22/2019
by Menatallh Hammad, et al.

The task of video captioning, i.e., the automatic generation of sentences describing a sequence of actions in a video, has recently attracted increasing attention. The complex, high-dimensional representation of video data makes it difficult for typical encoder-decoder architectures to recognize relevant features and encode them in a proper format. Video data contains different modalities that can be recognized using a mix of image, scene, action, and audio features. In this paper, we characterize the different features affecting video descriptions and explore how the interactions among these features affect the final quality of a video representation. Building on existing encoder-decoder models that utilize only a limited range of video information, our comparisons show how the inclusion of multi-modal video features can significantly improve the quality of the generated sentences. The work is of special interest to scientists and practitioners who use sequence-to-sequence models to generate video captions.
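As a rough illustration of the multi-modal setup the abstract describes, the sketch below concatenates per-frame features from several pre-trained extractors into one joint vector per time step, as might be fed to the encoder of a sequence-to-sequence captioner. All feature dimensions and extractor names (ResNet, Places, I3D, VGGish) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

np.random.seed(0)

# Hypothetical per-frame features from pre-trained models
# (dimensions are illustrative assumptions, not from the paper):
T = 16                                   # frames sampled from the video
img_feats   = np.random.randn(T, 2048)   # e.g. ResNet image features
scene_feats = np.random.randn(T, 365)    # e.g. Places scene features
act_feats   = np.random.randn(T, 1024)   # e.g. I3D action features
aud_feats   = np.random.randn(T, 128)    # e.g. VGGish audio features

# Early fusion: concatenate the modalities per frame so the encoder of a
# sequence-to-sequence model sees one multi-modal vector per time step.
fused = np.concatenate([img_feats, scene_feats, act_feats, aud_feats], axis=1)
print(fused.shape)  # (16, 3565)

# A simple fixed-length video representation (mean pooling over frames),
# one common way to initialize a decoder's hidden state.
video_repr = fused.mean(axis=0)
print(video_repr.shape)  # (3565,)
```

In practice the fused vectors would pass through a learned projection and a recurrent or attention-based encoder rather than plain mean pooling; the point here is only that each modality contributes its own slice of the per-frame representation.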
