
Imperial College London Submission to VATEX Video Captioning Task

10/16/2019
by   Ozan Caglayan, et al.

This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge. We first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a Transformer model, which generate captions from the I3D action features. We then investigate the effect of dropping the encoder and the attention mechanism, instead conditioning the GRU decoder on one of two vectorial representations: (i) a max-pooled action feature vector and (ii) the output of a multi-label classifier trained to predict visual entities from the action features. Our baselines achieved scores comparable to the official baseline. Conditioning on entity predictions performed substantially better than conditioning on the max-pooled feature vector, and only marginally worse than the GRU-based sequence-to-sequence baseline.
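The two encoder-free conditioning schemes in the abstract can be sketched in a few lines of NumPy: max-pool the per-segment I3D features over time (or pass them through a multi-label entity classifier), then project the resulting vector into the GRU decoder's initial hidden state. All dimensions, weight matrices, and the entity classifier below are illustrative placeholders, not the authors' actual settings or trained parameters; only the I3D feature size of 1024 matches the standard I3D output.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Hypothetical sizes: I3D clip features are 1024-d; the hidden size,
# entity count, and embedding size are illustrative choices.
T, d_feat, d_hid, n_entities, d_emb = 32, 1024, 256, 600, 64

i3d = rng.standard_normal((T, d_feat))    # per-segment I3D action features

# (i) max-pool the features over time into one conditioning vector
v_pool = i3d.max(axis=0)                  # shape (d_feat,)

# (ii) alternatively, condition on the output of a multi-label entity
# classifier; a random weight matrix stands in for a trained one
W_ent = rng.standard_normal((n_entities, d_feat)) * 0.01
v_ent = sigmoid(W_ent @ v_pool)           # per-entity scores, shape (n_entities,)

# Either vector initialises the decoder hidden state via a projection
def init_hidden(v, W):
    return np.tanh(W @ v)

W_pool = rng.standard_normal((d_hid, d_feat)) * 0.01
W_cond = rng.standard_normal((d_hid, n_entities)) * 0.01
h_pool = init_hidden(v_pool, W_pool)
h_ent = init_hidden(v_ent, W_cond)

# One step of a minimal GRU cell: no encoder, no attention, just the
# conditioned hidden state and the current token embedding
def gru_step(x, h, P):
    z = sigmoid(P["Wz"] @ x + P["Uz"] @ h)           # update gate
    r = sigmoid(P["Wr"] @ x + P["Ur"] @ h)           # reset gate
    h_tilde = np.tanh(P["Wh"] @ x + P["Uh"] @ (r * h))
    return (1.0 - z) * h + z * h_tilde

P = {k: rng.standard_normal(s) * 0.01
     for k, s in [("Wz", (d_hid, d_emb)), ("Uz", (d_hid, d_hid)),
                  ("Wr", (d_hid, d_emb)), ("Ur", (d_hid, d_hid)),
                  ("Wh", (d_hid, d_emb)), ("Uh", (d_hid, d_hid))]}

x_bos = rng.standard_normal(d_emb)        # embedding of a <bos> token
h1 = gru_step(x_bos, h_ent, P)
print(h1.shape)
```

The design point the abstract makes is that variant (ii) replaces a raw pooled feature vector with a semantically grounded one (entity probabilities), which is what narrows the gap to the full attention-based sequence-to-sequence baseline.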
