
Grounded Situation Recognition

03/26/2020
by Sarah Pratt, et al.
Allen Institute for Artificial Intelligence

We introduce Grounded Situation Recognition (GSR), a task that requires producing structured semantic summaries of images describing: the primary activity, entities engaged in the activity with their roles (e.g. agent, tool), and bounding-box groundings of entities. GSR presents important technical challenges: identifying semantic saliency, categorizing and localizing a large and diverse set of entities, overcoming semantic sparsity, and disambiguating roles. Moreover, unlike in captioning, GSR is straightforward to evaluate. To study this new task we create the Situations With Groundings (SWiG) dataset, which adds 278,336 bounding-box groundings to the 11,538 entity classes in the imSitu dataset. We propose a Joint Situation Localizer and find that jointly predicting situations and groundings with end-to-end training handily outperforms independent training on the entire grounding metric suite, with relative gains between 8% and 32%. Finally, we show initial findings on three exciting future directions enabled by our models: conditional querying, visual chaining, and grounded semantic aware image retrieval. Code and data available at https://prior.allenai.org/projects/gsr.
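
To make the structured output described above concrete, here is a minimal Python sketch of what one grounded situation might look like: a primary activity, a set of roles filled by entities, and an optional bounding box per entity. All class and field names, the box convention, and the example values are assumptions made for illustration; this is not the SWiG annotation format or the Joint Situation Localizer API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed box convention (hypothetical): (x_min, y_min, x_max, y_max) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class GroundedEntity:
    """An entity filling one role, with an optional bounding-box grounding."""
    noun: str                   # entity class, e.g. "person"
    box: Optional[BBox] = None  # assumption: None when the entity has no grounding


@dataclass
class GroundedSituation:
    """One structured summary: the primary activity plus role -> entity mapping."""
    verb: str
    roles: Dict[str, GroundedEntity] = field(default_factory=dict)

    def grounded_roles(self) -> List[str]:
        """Return the role names whose entities carry a bounding box."""
        return [name for name, ent in self.roles.items() if ent.box is not None]


# Illustrative (made-up) example: someone cutting with a knife in a kitchen.
situation = GroundedSituation(
    verb="cutting",
    roles={
        "agent": GroundedEntity("person", (10.0, 20.0, 200.0, 300.0)),
        "tool": GroundedEntity("knife", (120.0, 150.0, 180.0, 200.0)),
        "place": GroundedEntity("kitchen"),  # no box in this sketch
    },
)
```

Keeping the box optional is a design choice of this sketch: it lets a role be named even when nothing in the image is localized for it.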


03/30/2022

Collaborative Transformers for Grounded Situation Recognition

Grounded situation recognition is the task of predicting the main activi...
12/03/2016

Commonly Uncommon: Semantic Sparsity in Situation Recognition

Semantic sparsity is a common challenge in structured visual classificat...
12/17/2018

Grounded Video Description

Video description is one of the most challenging problems in vision and ...
10/31/2017

Semantic Image Retrieval via Active Grounding of Visual Situations

We describe a novel architecture for semantic image retrieval---in parti...
11/19/2021

Grounded Situation Recognition with Transformers

Grounded Situation Recognition (GSR) is the task that not only classifie...
03/18/2017

Recurrent Models for Situation Recognition

This work proposes Recurrent Neural Network (RNN) models to predict stru...
10/19/2022

Grounded Video Situation Recognition

Dense video understanding requires answering several questions such as w...

Code Repositories

swig

Situation With Groundings (SWiG) dataset and Joint Situation Localizer (JSL)

