Imperial College London Submission to VATEX Video Captioning Task

10/16/2019
by Ozan Caglayan, et al.

This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge. We first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a Transformer model, which generate captions from I3D action features. We then investigate the effect of dropping the encoder and the attention mechanism, instead conditioning the GRU decoder on one of two vectorial representations: (i) a max-pooled action feature vector and (ii) the output of a multi-label classifier trained to predict visual entities from the action features. Our baselines achieved scores comparable to the official baseline. Conditioning on entity predictions performed substantially better than conditioning on the max-pooled feature vector, and only marginally worse than the GRU-based sequence-to-sequence baseline.
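Variant (i) above replaces the encoder and attention with a single pooled video vector that initializes the decoder. A minimal PyTorch sketch of that idea, assuming illustrative layer sizes and names (this is not the authors' code; `PooledFeatureDecoder`, the dimensions, and the tanh initialization are assumptions):

```python
import torch
import torch.nn as nn

class PooledFeatureDecoder(nn.Module):
    """GRU caption decoder conditioned on a max-pooled I3D feature vector
    (a sketch of variant (i) from the abstract; all sizes are illustrative)."""
    def __init__(self, vocab_size=1000, feat_dim=1024, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Project the pooled video feature to the GRU's hidden size.
        self.init_h = nn.Linear(feat_dim, hid_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, i3d_feats, tokens):
        # i3d_feats: (batch, n_frames, feat_dim); tokens: (batch, seq_len)
        pooled = i3d_feats.max(dim=1).values               # (batch, feat_dim)
        h0 = torch.tanh(self.init_h(pooled)).unsqueeze(0)  # (1, batch, hid_dim)
        emb = self.embed(tokens)                           # (batch, seq_len, emb_dim)
        hidden, _ = self.gru(emb, h0)
        return self.out(hidden)                            # (batch, seq_len, vocab)

dec = PooledFeatureDecoder()
logits = dec(torch.randn(2, 32, 1024), torch.randint(0, 1000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 1000])
```

Variant (ii) would differ only in what conditions the decoder: instead of the pooled features themselves, the (sigmoid) output of a multi-label entity classifier over those features would be projected into the initial hidden state.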
