Grounded Situation Recognition

03/26/2020
by Sarah Pratt, et al.

We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities, overcoming semantic sparsity, and disambiguating roles. Moreover, unlike in captioning, GSR is straightforward to evaluate. To study this new task we create the Situations With Groundings (SWiG) dataset, which adds 278,336 bounding-box groundings to the 11,538 entity classes in the imSitu dataset. We propose a Joint Situation Localizer and find that jointly predicting situations and groundings with end-to-end training handily outperforms independent training on the entire grounding metric suite, with relative gains between 8% and 32%. Finally, we show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval. Code and data are available at https://prior.allenai.org/projects/gsr.
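To make the structured output described in the abstract concrete, here is a minimal Python sketch of what a grounded situation could look like: one verb, a set of roles, and for each role a noun plus an optional bounding box. All class, field, and function names are hypothetical and are not taken from the SWiG release or the authors' code. The intersection-over-union check with a 0.5 threshold is a common detection convention assumed here for illustration, not a statement of the paper's exact metric suite.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass
class RoleFiller:
    """The entity filling one role; box is None if it has no grounding."""
    noun: str
    box: Optional[Box] = None


@dataclass
class GroundedSituation:
    """Primary activity plus a role -> (noun, box) mapping."""
    verb: str
    roles: Dict[str, RoleFiller] = field(default_factory=dict)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def role_is_grounded(pred: RoleFiller, gold: RoleFiller,
                     threshold: float = 0.5) -> bool:
    """Assumed criterion: nouns match and boxes overlap enough.

    If either side has no box, both must have no box.
    """
    if pred.noun != gold.noun:
        return False
    if pred.box is None or gold.box is None:
        return pred.box is None and gold.box is None
    return iou(pred.box, gold.box) >= threshold


# Illustrative annotation and prediction (made-up values).
gold = GroundedSituation("cutting", {
    "agent": RoleFiller("person", (10, 20, 110, 220)),
    "tool": RoleFiller("knife", (90, 120, 150, 160)),
    "place": RoleFiller("kitchen"),  # no box for this role
})
pred = GroundedSituation("cutting", {
    "agent": RoleFiller("person", (12, 18, 108, 225)),
    "tool": RoleFiller("knife", (200, 200, 240, 230)),  # wrong location
    "place": RoleFiller("kitchen"),
})

results = {
    role: role_is_grounded(pred.roles[role], gold.roles[role])
    for role in gold.roles
}
```

In this sketch the agent counts as correctly grounded, the tool does not because its predicted box misses the annotated one, and the place matches because neither side has a box. Because each role is scored separately against a fixed structure, a check like this is simple to automate, which is the sense in which the abstract contrasts GSR evaluation with captioning.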


