
Imperial College London Submission to VATEX Video Captioning Task

by   Ozan Caglayan, et al.

This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge. We first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a Transformer model, which generate captions from the I3D action features. We then investigate the effect of dropping the encoder and the attention mechanism and instead conditioning the GRU decoder on two different vectorial representations: (i) a max-pooled action feature vector and (ii) the output of a multi-label classifier trained to predict visual entities from the action features. Our baselines achieved scores comparable to the official baseline. Conditioning on the entity predictions performed substantially better than conditioning on the max-pooled feature vector, and only marginally worse than the GRU-based sequence-to-sequence baseline.
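The encoder-free variant described above replaces attention with a single conditioning vector fed into the decoder's initial hidden state. The following is a minimal sketch of option (i), max-pooling the I3D clip features and projecting them into a GRU decoder's initial state. This is not the authors' implementation; all dimensions, weight names (`W_init`, `W_out`), and the greedy decoding loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_HID, D_EMB, VOCAB = 1024, 256, 128, 32  # illustrative sizes

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def max_pool(feats):
    """Collapse (T, D_FEAT) I3D clip features into one vector (option i)."""
    return feats.max(axis=0)

class GRUCell:
    """Minimal GRU cell; no encoder and no attention, as in the paper's variant."""
    def __init__(self, d_in, d_hid):
        s = 0.1
        self.Wz = rng.normal(0, s, (d_hid, d_in + d_hid))  # update gate
        self.Wr = rng.normal(0, s, (d_hid, d_in + d_hid))  # reset gate
        self.Wh = rng.normal(0, s, (d_hid, d_in + d_hid))  # candidate state

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)
        r = sigmoid(self.Wr @ xh)
        h_tilde = np.tanh(self.Wh @ np.concatenate([x, r * h]))
        return (1 - z) * h + z * h_tilde

# Conditioning: project the pooled feature vector into the initial hidden state.
W_init = rng.normal(0, 0.1, (D_HID, D_FEAT))
feats = rng.normal(size=(20, D_FEAT))        # e.g. 20 clips of I3D features
h = np.tanh(W_init @ max_pool(feats))        # conditioning vector -> h0

cell = GRUCell(D_EMB, D_HID)
emb = rng.normal(0, 0.1, (VOCAB, D_EMB))     # token embedding table
W_out = rng.normal(0, 0.1, (VOCAB, D_HID))   # output projection

token, caption = 1, []                        # 1 stands in for <bos>
for _ in range(5):                            # greedy-decode a few steps
    h = cell.step(emb[token], h)
    token = int(np.argmax(W_out @ h))
    caption.append(token)
```

Option (ii) would only change the conditioning vector: the max-pooled features would be replaced by the multi-label classifier's entity-prediction output before the `W_init` projection.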

