Open Scene Understanding: Grounded Situation Recognition Meets Segment Anything for Helping People with Visual Impairments

07/15/2023
by Ruiping Liu, et al.

Grounded Situation Recognition (GSR) recognizes and interprets visual scenes in a contextually intuitive way, yielding the salient activities (verbs) and the involved entities (roles) depicted in images. In this work, we focus on applying GSR to assist people with visual impairments (PVI). Such users, however, often need precise localization information for detected objects in order to navigate their surroundings confidently and make informed decisions. For the first time, we propose an Open Scene Understanding (OpenSU) system that generates pixel-wise dense segmentation masks of the involved entities instead of bounding boxes. Specifically, we build OpenSU on top of GSR by additionally adopting an efficient Segment Anything Model (SAM). Furthermore, to enhance feature extraction and the interaction between the encoder and decoder, we construct OpenSU with a solid pure transformer backbone, which improves GSR performance. To accelerate convergence, we replace all activation functions within the GSR decoders with GELU, thereby reducing training duration. In quantitative analysis, our model achieves state-of-the-art performance on the SWiG dataset. Moreover, field testing on dedicated assistive technology datasets and application demonstrations show that the proposed OpenSU system can be used to enhance scene understanding and facilitate the independent mobility of people with visual impairments. Our code will be available at https://github.com/RuipingL/OpenSU.
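
The abstract mentions replacing the decoder activation functions with GELU. As a minimal illustration of what that activation computes, the sketch below evaluates the standard exact GELU definition, GELU(x) = x * Phi(x), next to ReLU using only the Python standard library. It is not taken from the OpenSU code; the function names `gelu` and `relu` are chosen here for illustration, and the activation being replaced in the GSR decoders is assumed (not stated in the abstract) to be ReLU-like.

```python
import math


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def relu(x: float) -> float:
    """ReLU, shown only as an assumed baseline for comparison."""
    return max(0.0, x)


if __name__ == "__main__":
    # Unlike ReLU, GELU is smooth and lets small negative values pass through.
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

In a deep learning framework the same swap would typically be done by substituting the activation module in each decoder layer; how OpenSU does this is not described in the abstract.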

research
11/19/2021

Grounded Situation Recognition with Transformers

Grounded Situation Recognition (GSR) is the task that not only classifie...
research
03/30/2022

Collaborative Transformers for Grounded Situation Recognition

Grounded situation recognition is the task of predicting the main activi...
research
12/10/2021

Rethinking the Two-Stage Framework for Grounded Situation Recognition

Grounded Situation Recognition (GSR), i.e., recognizing the salient acti...
research
12/01/2022

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

We tackle open-world semantic segmentation, which aims at learning to se...
research
03/26/2020

Grounded Situation Recognition

We introduce Grounded Situation Recognition (GSR), a task that requires ...
research
06/21/2023

ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining

Scene text removal (STR) aims at replacing text strokes in natural scene...
