ExpansionNet v2: Block Static Expansion in fast end to end training for Image Captioning

08/13/2022
by Jia Cheng Hu, et al.

Expansion methods explore whether the input length is a performance bottleneck in Deep Learning methods. In this work, we introduce Block Static Expansion, which distributes and processes the input over a heterogeneous and arbitrarily large collection of sequences whose lengths differ from that of the input. Using this method, we introduce a model called ExpansionNet v2, which is trained with our novel training strategy, designed to be not only effective but also 6 times faster than the standard approach of recent works in Image Captioning. The model achieves state-of-the-art performance on the MS-COCO 2014 captioning challenge, with a score of 143.7 CIDEr-D on the offline test split, 140.8 CIDEr-D on the online evaluation server, and 72.9 All-CIDEr on the nocaps validation set. Source code is available at: https://github.com/jchenghu/ExpansionNet_v2


