Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision

03/06/2021
by Andrew Shin, et al.

Transformer architectures have brought about fundamental changes to the field of computational linguistics, which had been dominated by recurrent neural networks for many years. Their success also implies drastic changes in cross-modal tasks with language and vision, and many researchers have already tackled the issue. In this paper, we review some of the most critical milestones in the field, as well as overall trends in how the transformer architecture has been incorporated into visuolinguistic cross-modal tasks. Furthermore, we discuss its current limitations and speculate upon some of the prospects that we find imminent.

