CaMEL: Mean Teacher Learning for Image Captioning

02/21/2022
by Manuele Barraco, et al.

Describing images in natural language is a fundamental step towards automatically modeling the connections between the visual and textual modalities. In this paper, we present CaMEL, a novel Transformer-based architecture for image captioning. Our approach leverages the interaction of two interconnected language models that learn from each other during training. The interplay between the two language models follows a mean teacher learning paradigm with knowledge distillation. Experimentally, we assess the effectiveness of the proposed solution on the COCO dataset, in conjunction with different visual feature extractors. Compared with existing proposals, our model provides state-of-the-art caption quality with a significantly reduced number of parameters. According to the CIDEr metric, we obtain a new state of the art on COCO when training without external data. The source code and trained models are publicly available at https://github.com/aimagelab/camel.
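To make the mean teacher idea concrete, the following is a minimal sketch of its two generic ingredients: a teacher whose weights track an exponential moving average (EMA) of the student's weights, and a distillation term that pulls the student's next-word distribution towards the teacher's. It is not taken from the CaMEL code base. The function names, the flat weight lists, the decay value, and the choice of a KL-divergence distillation term are all assumptions made for illustration; the abstract does not specify these details.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """Return teacher weights moved towards the student weights (EMA step).

    `teacher` and `student` are flat lists of floats standing in for model
    parameters (a hypothetical simplification of real network weights).
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits):
    """KL(teacher || student) for one next-word distribution.

    Using KL divergence here is an assumption; other distance measures
    between the two predictions could be used instead.
    """
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Toy usage: one EMA step, then compare predictions over a 3-word vocabulary.
teacher_weights = [0.0, 0.0]
student_weights = [1.0, -1.0]
teacher_weights = ema_update(teacher_weights, student_weights, decay=0.9)

same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
different = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In a full training loop, the student would typically be updated by gradient descent on a captioning loss plus this distillation term, and `ema_update` would run after each student step so that the teacher stays a smoothed copy of the student.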

research
02/11/2022

ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning

Recent research that applies Transformer-based architectures to image ca...
research
12/17/2019

M^2: Meshed-Memory Transformer for Image Captioning

Transformer-based architectures represent the state of the art in sequen...
research
07/22/2022

Efficient Modeling of Future Context for Image Captioning

Existing approaches to image captioning usually generate the sentence wo...
research
07/26/2022

Retrieval-Augmented Transformer for Image Captioning

Image captioning models aim at connecting Vision and Language by providi...
research
08/23/2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Image captioning, like many tasks involving vision and language, current...
research
03/27/2018

Neural Baby Talk

We introduce a novel framework for image captioning that can produce nat...
research
11/17/2022

Progressive Tree-Structured Prototype Network for End-to-End Image Captioning

Studies of image captioning are shifting towards a trend of a fully end-...
