Grounded Situation Recognition with Transformers

11/19/2021
by Junhyeong Cho, et al.

Grounded Situation Recognition (GSR) is the task of not only classifying a salient action (verb) but also predicting the entities (nouns) associated with semantic roles and their locations in a given image. Inspired by the remarkable success of Transformers in vision tasks, we propose a GSR model based on a Transformer encoder-decoder architecture. The attention mechanism of our model enables accurate verb classification by effectively capturing high-level semantic features of an image, and it allows the model to deal flexibly with the complicated, image-dependent relations between entities for improved noun classification and localization. Our model is the first Transformer architecture for GSR and achieves state-of-the-art results on every evaluation metric of the SWiG benchmark. Our code is available at https://github.com/jhcho99/gsrtr.
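
To make the task definition above concrete, the following is a minimal sketch of what a single GSR prediction might look like as a data structure: one verb, plus a noun and an optional bounding box for each semantic role. The class names, field names, box format, and example values are all hypothetical and chosen for illustration; they are not taken from the authors' code or from the SWiG annotation format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical types for illustration only; not the authors' implementation.
# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class GroundedRole:
    role: str                  # semantic role, e.g. "agent"
    noun: str                  # entity predicted for that role
    box: Optional[Box] = None  # location in the image; None if no box is given


@dataclass
class Situation:
    verb: str                                         # salient action
    roles: List[GroundedRole] = field(default_factory=list)

    def grounded_roles(self) -> List[str]:
        """Return the names of the roles that carry a bounding box."""
        return [r.role for r in self.roles if r.box is not None]


# Made-up example values, not results from the paper.
example = Situation(
    verb="feeding",
    roles=[
        GroundedRole("agent", "person", (10.0, 20.0, 110.0, 220.0)),
        GroundedRole("food", "hay", (120.0, 150.0, 180.0, 200.0)),
        GroundedRole("place", "farm"),  # assumed case: role without a box
    ],
)
```

Under these assumptions, a GSR model would fill in one such `Situation` per image: the verb comes from verb classification, and each role's noun and box come from noun classification and localization.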
