ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning

02/11/2022
by Jia Huei Tan, et al.

Transformer-based architectures have recently achieved state-of-the-art image captioning performance, capitalising on the success of Transformers on natural language tasks. Although these models perform well, a major drawback is their large size. To address this, we present three parameter reduction methods for image captioning Transformers: Radix Encoding, cross-layer parameter sharing, and attention parameter sharing. By combining these methods, our proposed ACORT models have 3.7x to 21.6x fewer parameters than the baseline model without compromising test performance. Results on the MS-COCO dataset demonstrate that our ACORT models are competitive against baselines and SOTA approaches, with CIDEr scores >= 126. Finally, we present qualitative results and ablation studies to further demonstrate the efficacy of the proposed changes. Code and pre-trained models are publicly available at https://github.com/jiahuei/sparse-image-captioning.
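
To give a rough idea of how a radix-style encoding can shrink an embedding table, here is a minimal sketch. It is not the authors' implementation. It assumes that Radix Encoding means writing each vocabulary index as a fixed number of digits in a smaller base, so that the embedding table needs one row per digit value rather than one row per token. The function names, the vocabulary size of 10,000, and the base of 100 are all hypothetical choices made for illustration.

```python
def radix_encode(token_id, base, num_digits):
    """Write token_id as num_digits digits in the given base, most significant first.

    Assumption: this is only an illustration of the general idea, not the
    encoding scheme used in the ACORT code base.
    """
    if not 0 <= token_id < base ** num_digits:
        raise ValueError("token_id does not fit in the requested number of digits")
    digits = []
    for _ in range(num_digits):
        token_id, digit = divmod(token_id, base)
        digits.append(digit)
    return digits[::-1]


def radix_decode(digits, base):
    """Inverse of radix_encode: rebuild the token id from its digits."""
    token_id = 0
    for digit in digits:
        token_id = token_id * base + digit
    return token_id


# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 10_000   # rows needed by a conventional embedding table
BASE = 100            # rows needed when embedding digits instead of tokens
NUM_DIGITS = 2        # 100 ** 2 == 10_000, so two digits cover the vocabulary

encoded = radix_encode(1234, BASE, NUM_DIGITS)
decoded = radix_decode(encoded, BASE)
row_reduction = VOCAB_SIZE // BASE
```

Under these assumed sizes, each token becomes two digit symbols, so the output sequence is twice as long while the embedding table has 100 rows instead of 10,000. That trade-off between sequence length and table size is the general price of any such scheme.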

