
-
Seeing past words: Testing the cross-modal capabilities of pretrained V L models
We investigate the ability of general-purpose pretrained vision and lang...
read it
-
A Study on the Autoregressive and non-Autoregressive Multi-label Learning
Extreme classification tasks are multi-label tasks with an extremely lar...
read it
-
Are scene graphs good enough to improve Image Captioning?
Many top-performing image captioning models rely solely on object featur...
read it
-
ImagiFilter: A resource to enable the semi-automatic mining of images at scale
Datasets (semi-)automatically collected from the web can easily scale to...
read it
-
VisualSem: a high-quality knowledge graph for vision and language
We argue that the next frontier in natural language understanding (NLU) ...
read it
-
English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too
Intermediate-task training has been shown to substantially improve pretr...
read it
-
Latent Visual Cues for Neural Machine Translation
In this work, we propose to model the interaction between visual and tex...
read it
-
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation
We introduce a Multi-modal Neural Machine Translation model in which a d...
read it
-
Multilingual Multi-modal Embeddings for Natural Language Processing
We propose a novel discriminative model that learns embeddings from mult...
read it
-
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
We introduce multi-modal, attention-based neural machine translation (NM...
read it