Human-centric Relation Segmentation: Dataset and Solution

05/24/2021
by Si Liu, et al.

Vision and language understanding techniques have achieved remarkable progress, but it is still difficult to handle problems that involve very fine-grained details. For example, when a robot is told to "bring me the book in the girl's left hand", most existing methods would fail if the girl holds one book in each hand. In this work, we introduce a new task named human-centric relation segmentation (HRS), a fine-grained case of human-object interaction detection (HOI-det). HRS aims to predict the relations between a human and the surrounding entities and to identify the relation-correlated human parts, all of which are represented as pixel-level masks. For the example above, the HRS task produces results in the form of relation triplets such as <girl [left hand], hold, book> together with exact segmentation masks of the book, with which the robot can easily accomplish the grabbing task. Correspondingly, we collect a new Person In Context (PIC) dataset for this task, which contains 17,122 high-resolution images with densely annotated entity segmentation and relations, covering 141 object categories, 23 relation categories and 25 semantic human parts. We also propose a Simultaneous Matching and Segmentation (SMS) framework as a solution to the HRS task. The outputs of its three branches are fused to produce the final HRS results. Extensive experiments on the PIC and V-COCO datasets show that the proposed SMS method outperforms baselines while running at an inference speed of 36 FPS.
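
To make the triplet output format concrete, here is a minimal Python sketch of how an HRS result such as <girl [left hand], hold, book> might be stored and queried. It is an illustration under stated assumptions, not the authors' data format or the PIC annotation schema: the names `RelationTriplet`, `find_object_masks` and `mask_area`, and the toy 2x4 masks, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical representation: a binary mask as rows of 0/1 pixels.
Mask = List[List[int]]


@dataclass
class RelationTriplet:
    """One HRS-style result: <subject [human part], relation, object> plus masks."""
    subject: str
    human_part: str
    relation: str
    obj: str
    part_mask: Mask    # pixels of the relation-correlated human part
    object_mask: Mask  # pixels of the related object


def find_object_masks(triplets: List[RelationTriplet],
                      human_part: str, relation: str, obj: str) -> List[Mask]:
    """Return the masks of objects matching a fine-grained query."""
    return [t.object_mask for t in triplets
            if t.human_part == human_part
            and t.relation == relation
            and t.obj == obj]


def mask_area(mask: Mask) -> int:
    """Count the foreground pixels of a binary mask."""
    return sum(sum(row) for row in mask)


# Toy scene (made-up data): the girl holds one book in each hand.
left_book = [[1, 1, 0, 0],
             [1, 1, 0, 0]]
right_book = [[0, 0, 0, 1],
              [0, 0, 0, 1]]
left_hand = [[0, 0, 0, 0],
             [1, 0, 0, 0]]
right_hand = [[0, 0, 0, 0],
              [0, 0, 0, 1]]

predictions = [
    RelationTriplet("girl", "left hand", "hold", "book", left_hand, left_book),
    RelationTriplet("girl", "right hand", "hold", "book", right_hand, right_book),
]

# "Bring me the book in the girl's left hand" resolves to exactly one mask.
matches = find_object_masks(predictions, "left hand", "hold", "book")
```

Because each triplet carries the human part alongside the relation, the two books are distinguishable even though they share the same category and the same relation.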


research
03/09/2020

Cascaded Human-Object Interaction Recognition

Rapid progress has been witnessed for human-object interaction (HOI) rec...
research
03/19/2021

ClawCraneNet: Leveraging Object-level Relation for Text-based Video Segmentation

Text-based video segmentation is a challenging task that segments out th...
research
07/12/2020

SkyScapes – Fine-Grained Semantic Understanding of Aerial Scenes

Understanding the complex urban infrastructure with centimeter-level acc...
research
09/18/2019

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

Reasoning human object interactions is a core problem in human-centric s...
research
08/07/2022

Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications

Egocentric videos offer fine-grained information for high-fidelity model...
research
10/16/2019

RGB-D Individual Segmentation

Fine-grained recognition task deals with sub-category classification pro...
