ExpansionNet: exploring the sequence length bottleneck in the Transformer for Image Captioning

07/07/2022
by   Jia Cheng Hu, et al.

Most recent state-of-the-art architectures rely on combinations and variations of three approaches: convolutional, recurrent, and self-attentive methods. Our work attempts to lay the basis for a new research direction in sequence modeling based upon the idea of modifying the sequence length. To that end, we propose a new method called the "Expansion Mechanism", which transforms, either dynamically or statically, the input sequence into a new one featuring a different sequence length. Furthermore, we introduce a novel architecture that exploits this method and achieves competitive performance on the MS-COCO 2014 dataset, yielding 134.6 and 131.4 CIDEr-D on the Karpathy test split in the ensemble and single-model configurations respectively, and 130 CIDEr-D on the official online testing server, despite being neither recurrent nor fully attentive. At the same time, we address efficiency in our design and introduce a convenient training strategy suitable for most computational resources, in contrast to the standard one. Source code is available at https://github.com/jchenghu/ExpansionNet
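To make the core idea concrete, the following is a minimal illustrative sketch (not the paper's exact formulation) of a static expansion step: a set of learned expansion vectors attends over the input sequence to produce an intermediate sequence of a different, fixed length, which is then contracted back to the original length. The function names, the attention-based forward/backward mapping, and the choice of `N` are assumptions for illustration only.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def static_expansion(x, queries):
    # x: (L, d) input sequence; queries: (N, d) learned expansion vectors
    # (hypothetical names; in practice these would be trained parameters).
    d = x.shape[-1]
    # Map the L input tokens onto N expansion slots via attention.
    attn = softmax(queries @ x.T / np.sqrt(d))   # (N, L)
    expanded = attn @ x                          # (N, d): new length N
    # Contract the expanded sequence back to the original length L.
    back = softmax(x @ expanded.T / np.sqrt(d))  # (L, N)
    return back @ expanded                       # (L, d)

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 64))        # L = 10 tokens, d = 64
queries = rng.normal(size=(24, 64))  # N = 24 > L: sequence is lengthened
y = static_expansion(x, queries)
print(y.shape)                       # (10, 64)
```

The intermediate length `N` is decoupled from the input length `L`, which is the bottleneck the paper probes: the model can process the sequence at a length chosen independently of the input.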


