ZoDIAC: Zoneout Dropout Injection Attention Calculation

06/28/2022
by Zanyar Zohourianshahzadi, et al.

Recently, the use of self-attention has yielded state-of-the-art results in vision-language tasks such as image captioning, in natural language understanding and generation (NLU and NLG) tasks, and in computer vision tasks such as image classification. This is because self-attention maps the internal interactions among the elements of the input source and target sequences. Although self-attention successfully calculates attention values and maps the relationships among the elements of the input source and target sequences, there is no mechanism to control the intensity of attention. In the real world, when communicating with each other face to face or vocally, we express different visual and linguistic contexts with varying intensity. Some words carry (are spoken with) more stress and weight, indicating their importance in the context of the whole sentence. Based on this intuition, we propose Zoneout Dropout Injection Attention Calculation (ZoDIAC), in which the intensities of the attention values for the elements of the input sequence are calculated with respect to the context of those elements. Our experiments reveal that ZoDIAC outperforms the self-attention module in the Transformer model. The ultimate goal is to find out whether the self-attention module in the Transformer model can be modified with a method that is potentially extensible to other models that rely on self-attention at their core. Our findings suggest that this goal deserves further attention and investigation by the research community. The code for ZoDIAC is available at www.github.com/zanyarz/zodiac.
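The abstract names the ingredients (zoneout, dropout, and a context-dependent attention intensity) but not the exact formulation, which lives in the paper and the linked repository. Purely as a hypothetical sketch of the idea, the PyTorch module below gates standard scaled dot-product attention with a learned per-element intensity and injects dropout- and zoneout-style stochasticity; the class name, the sigmoid `intensity_gate`, and the placement of both regularizers are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of intensity-gated self-attention, NOT the exact
# ZoDIAC formulation (see github.com/zanyarz/zodiac for that).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelfAttention(nn.Module):
    def __init__(self, d_model: int, dropout: float = 0.1, zoneout: float = 0.1):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        # Assumed mechanism: a learned gate that rescales ("intensifies")
        # each element's attention weights based on its context vector.
        self.intensity_gate = nn.Linear(d_model, 1)
        self.dropout = nn.Dropout(dropout)
        self.zoneout = zoneout

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        attn = F.softmax(scores, dim=-1)              # (batch, seq, seq)
        attn = self.dropout(attn)                     # dropout injection
        gate = torch.sigmoid(self.intensity_gate(x))  # (batch, seq, 1)
        gated = gate * attn                           # per-element intensity
        if self.training and self.zoneout > 0:
            # Zoneout-style mixing: randomly keep the ungated weights
            # for some elements instead of their gated version.
            keep = (torch.rand_like(gate) < self.zoneout).float()
            gated = keep * attn + (1.0 - keep) * gated
        return gated @ v
```

Note that this sketch rescales the softmax-normalized weights rather than the raw scores, so attention rows no longer sum to one; whether and how ZoDIAC renormalizes is one of the details the abstract does not pin down.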


