1st Place Solution for PSG competition with ECCV'22 SenseHuman Workshop

02/06/2023
by Qixun Wang, et al.

Panoptic Scene Graph (PSG) generation aims to generate scene graph representations based on panoptic segmentation instead of rigid bounding boxes. Existing PSG methods follow either a one-stage paradigm, which simultaneously generates scene graphs and predicts semantic segmentation masks, or a two-stage paradigm, which first applies an off-the-shelf panoptic segmentor and then predicts pairwise relationships between the predicted objects. Although the one-stage approach has a simpler training paradigm, its segmentation results are usually unsatisfactory, while the two-stage approach lacks global context and therefore performs poorly on relation prediction. To bridge this gap, in this paper we propose GRNet, a Global Relation Network in the two-stage paradigm, where the pre-extracted local object features and their corresponding masks are fed into a transformer with class embeddings. To handle relation ambiguity and the predicate classification bias caused by the long-tailed distribution, we formulate relation prediction in the second stage as a multi-class classification task with soft labels. We conduct comprehensive experiments on the OpenPSG dataset and achieve state-of-the-art performance on the leaderboard. We also show the effectiveness of our soft label strategy for long-tailed classes in ablation studies. Our code has been released at https://github.com/wangqixun/mfpsg.
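The abstract does not spell out how the soft labels are built. The sketch below is only an illustration under one stated assumption: when a subject-object pair is annotated with several predicates (the "relation ambiguity" case), the target is the normalized distribution over those predicates, optionally mixed with uniform label smoothing. The helper names `soft_label` and `soft_cross_entropy` are hypothetical and do not come from the authors' repository. Training against such a distribution means the classifier is not forced to pick one predicate over an equally valid, possibly rarer one.

```python
import math
from typing import List, Sequence


def soft_label(predicates: Sequence[int], num_classes: int,
               smoothing: float = 0.0) -> List[float]:
    """Hypothetical helper (assumption, not the paper's exact scheme):
    turn the annotated predicate ids of one subject-object pair into a
    probability distribution over all predicate classes."""
    if not predicates:
        raise ValueError("a pair needs at least one annotated predicate")
    target = [0.0] * num_classes
    for p in predicates:
        target[p] += 1.0  # count each annotated predicate
    total = sum(target)
    target = [t / total for t in target]  # normalize to sum to 1
    if smoothing > 0.0:
        # Optional uniform label smoothing; the result still sums to 1.
        target = [(1.0 - smoothing) * t + smoothing / num_classes
                  for t in target]
    return target


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def soft_cross_entropy(logits: Sequence[float],
                       target: Sequence[float]) -> float:
    """Cross-entropy between a soft target distribution and the
    softmax of the predicted logits."""
    return -sum(t * lp for t, lp in zip(target, log_softmax(logits)))


if __name__ == "__main__":
    # A pair annotated with predicates 1 and 3 out of 5 classes.
    target = soft_label([1, 3], num_classes=5)
    print(target)
    print(soft_cross_entropy([0.0, 2.0, 0.0, 2.0, 0.0], target))
```

For a pair annotated with predicates 1 and 3, the target is `[0.0, 0.5, 0.0, 0.5, 0.0]`, so logits that split probability evenly between those two predicates are rewarded rather than penalized for not choosing a single one.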

research
08/24/2023

Boosting Semantic Segmentation from the Perspective of Explicit Class Embeddings

Semantic segmentation is a computer vision task that associates a label ...
research
07/22/2022

Panoptic Scene Graph Generation

Existing research addresses scene graph generation (SGG) – a critical te...
research
09/09/2021

Self Supervision to Distillation for Long-Tailed Visual Recognition

Deep learning has achieved remarkable progress for visual recognition on...
research
03/25/2023

VL-SAT: Visual-Linguistic Semantics Assisted Training for 3D Semantic Scene Graph Prediction in Point Cloud

The task of 3D semantic scene graph (3DSSG) prediction in the point clou...
research
03/19/2022

Relationformer: A Unified Framework for Image-to-Graph Generation

A comprehensive representation of an image requires understanding object...
research
09/09/2018

Visual Relationship Prediction via Label Clustering and Incorporation of Depth Information

In this paper, we investigate the use of an unsupervised label clusterin...
research
03/13/2017

A Localisation-Segmentation Approach for Multi-label Annotation of Lumbar Vertebrae using Deep Nets

Multi-class segmentation of vertebrae is a non-trivial task mainly due t...