
-
The Role of Syntactic Planning in Compositional Image Captioning
Image captioning has focused on generalizing to images drawn from the sa...
-
Multimodal Pretraining Unmasked: Unifying the Vision and Language BERTs
Large-scale pretraining and task-specific fine-tuning is now the standar...
-
Multimodal Speech Recognition with Unstructured Audio Masking
Visual context has been shown to be useful for automatic speech recognit...
-
Textual Supervision for Visually Grounded Spoken Language Understanding
Visually-grounded models of spoken language understanding extract semant...
-
Fine-Grained Grounding for Multimodal Speech Recognition
Multimodal automatic speech recognition systems integrate information fr...
-
CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning
Approaches to Grounded Language Learning typically focus on a single tas...
-
The Sensitivity of Language Models and Humans to Winograd Schema Perturbations
Large-scale pretrained language models are the major driving force behin...
-
Multimodal Machine Translation through Visuals and Speech
Multimodal machine translation involves drawing information from more th...
-
Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning
Recent work has highlighted the advantage of jointly learning grounded s...
-
Compositional Generalization in Image Captioning
Image captioning models are usually evaluated on their ability to descri...
-
Cross-lingual Visual Verb Sense Disambiguation
Recent work has shown that visual context improves cross-lingual sense d...
-
How2: A Large-scale Dataset for Multimodal Language Understanding
In this paper, we introduce How2, a multimodal collection of instruction...
-
Lessons learned in multilingual grounded language learning
Recent work has shown how to learn better visual-semantic embeddings by ...
-
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description
We present the results from the second shared task on multimodal machine...
-
Cross-linguistic differences and similarities in image descriptions
Automatic image description systems are commonly trained and evaluated o...
-
Imagination improves Multimodal Translation
We decompose multimodal translation into two sub-tasks: learning to tran...
-
Room for improvement in automatic image description: an error analysis
In recent years we have seen rapid and significant progress in automatic...
-
Pragmatic factors in image description: the case of negations
We provide a qualitative analysis of the descriptions containing negatio...
-
Multi30K: Multilingual English-German Image Descriptions
We introduce the Multi30K dataset to stimulate multilingual multimodal r...
-
Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures
Automatic description generation from natural images is a challenging pr...
-
Multilingual Image Description with Neural Sequence Models
In this paper we present an approach to multi-language image description...