Auto-Parsing Network for Image Captioning and Visual Question Answering

08/24/2021
by Xu Yang, et al.

We propose an Auto-Parsing Network (APN) to discover and exploit the hidden tree structure of the input data, improving the effectiveness of Transformer-based vision-language systems. Specifically, we impose a Probabilistic Graphical Model (PGM), parameterized by the attention operations, on each self-attention layer to incorporate a sparsity assumption. This PGM softly segments an input sequence into a few clusters, where each cluster can be treated as the parent of the entities inside it. By stacking these PGM-constrained self-attention layers, the clusters in a lower layer compose into a new sequence, which the PGM in the next higher layer further segments. Iteratively, a sparse tree is implicitly parsed, and its hierarchical knowledge is incorporated into the transformed embeddings, which can then be used to solve the target vision-language tasks. We showcase that APN strengthens Transformer-based networks in two major vision-language tasks: Image Captioning and Visual Question Answering. We also develop a PGM probability-based parsing algorithm that reveals the hidden structure of the input during inference.
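To make the mechanism concrete, below is a minimal PyTorch sketch of one PGM-constrained self-attention layer. It is an illustration under stated assumptions, not the paper's implementation: we assume a chain-structured PGM whose boundary ("break") probabilities between adjacent tokens are predicted from their embeddings, and we convert the resulting soft same-segment probabilities into an additive attention bias. The names `PGMConstrainedSelfAttention` and `break_scorer` are hypothetical.

```python
# Illustrative sketch only; the chain-structured PGM, the break-probability
# parameterization, and the log-bias masking scheme are assumptions, not the
# paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PGMConstrainedSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Scores the chance of a segment boundary between tokens i and i+1.
        self.break_scorer = nn.Linear(2 * dim, 1)

    def segment_mask(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, D). Boundary probability between adjacent positions.
        pairs = torch.cat([x[:, :-1], x[:, 1:]], dim=-1)    # (B, T-1, 2D)
        p_break = torch.sigmoid(self.break_scorer(pairs))   # (B, T-1, 1)
        # Soft probability that positions i and j lie in the same segment:
        # the product of (1 - p_break) over every boundary between them.
        log_keep = torch.log1p(-p_break.squeeze(-1) + 1e-6)  # (B, T-1)
        cum = F.pad(torch.cumsum(log_keep, dim=-1), (1, 0))  # (B, T)
        # same_seg[b, i, j] = exp(-|cum[i] - cum[j]|), in (0, 1], diagonal = 1.
        diff = cum.unsqueeze(2) - cum.unsqueeze(1)           # (B, T, T)
        return torch.exp(-diff.abs())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mask = self.segment_mask(x)
        # A float attn_mask is added to the attention logits, so turn the
        # soft same-segment probability into a log-space bias that
        # down-weights attention across segment boundaries.
        bias = torch.log(mask + 1e-6)
        bias = bias.repeat_interleave(self.attn.num_heads, dim=0)  # (B*H, T, T)
        out, _ = self.attn(x, x, x, attn_mask=bias)
        return out
```

Stacking several such layers realizes the iterative composition described above, and thresholding each layer's break probabilities at inference time gives one way to read a sparse parse tree off the trained model, in the spirit of the PGM probability-based parsing algorithm mentioned in the abstract.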

