Occluded Video Instance Segmentation

02/02/2021
by Jiyang Qi, et al.

Can our video understanding systems perceive objects when heavy occlusion exists in a scene? To answer this question, we collect a large-scale dataset called OVIS for occluded video instance segmentation, that is, to simultaneously detect, segment, and track instances in occluded scenes. OVIS consists of 296k high-quality instance masks from 25 semantic categories in which object occlusions commonly occur. While our human vision systems can understand those occluded instances by contextual reasoning and association, our experiments suggest that current video understanding systems are not satisfactory. On the OVIS dataset, the highest AP achieved by state-of-the-art algorithms is only 14.4, which reveals that we are still at a nascent stage of understanding objects, instances, and videos in real-world scenarios. Moreover, to complement missing object cues caused by occlusion, we propose a plug-and-play module called temporal feature calibration. Built upon MaskTrack R-CNN and SipMask, the module achieves an AP of 15.2 and 15.0, respectively. The OVIS dataset is released at http://songbai.site/ovis, and the project code will be available soon.
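The AP figures quoted above are for video instance segmentation, where a prediction is a whole track of masks rather than a single-frame mask. As a rough illustration only, and assuming the benchmark follows the common video instance segmentation convention of matching predicted and ground-truth tracks by spatio-temporal mask IoU (the abstract does not spell out the metric), the sketch below shows how a frame missed under occlusion lowers a track's IoU. The function name `video_mask_iou` and the toy masks are hypothetical, not taken from the OVIS code.

```python
import numpy as np


def video_mask_iou(pred_masks, gt_masks):
    """Spatio-temporal IoU between two instance tracks (illustrative sketch).

    Each argument is a boolean array of shape (T, H, W). An all-False frame
    means the instance is absent from that frame, e.g. missed under occlusion.
    """
    pred = np.asarray(pred_masks, dtype=bool)
    gt = np.asarray(gt_masks, dtype=bool)
    if pred.shape != gt.shape:
        raise ValueError("tracks must have the same (T, H, W) shape")
    # Intersection and union are accumulated over all frames of the track.
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection) / float(union) if union > 0 else 0.0


# Toy example: a 2-frame track on a 4x4 image (assumed data).
gt = np.zeros((2, 4, 4), dtype=bool)
gt[:, 0:2, 0:2] = True        # object visible in both frames (4 pixels each)

pred = np.zeros((2, 4, 4), dtype=bool)
pred[0, 0:2, 0:2] = True      # frame 0 segmented perfectly
                              # frame 1 missed entirely (e.g. heavy occlusion)

iou = video_mask_iou(pred, gt)
print(iou)
```

Here the intersection is 4 pixels and the union is 8, so a track that is perfect in one frame but lost in the other scores only 0.5. Under this assumed convention, a track counts as a true positive only when its IoU exceeds a threshold, which is one way that losing instances during occlusion can hold down AP.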

research
11/15/2021

Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge

Although deep learning methods have achieved advanced video object recog...
research
09/23/2021

Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling

Instance-aware segmentation of unseen objects is essential for a robotic...
research
03/16/2020

PS-RCNN: Detecting Secondary Human Instances in a Crowd via Primary Object Suppression

Detecting human bodies in highly crowded scenes is a challenging problem...
research
07/29/2020

MessyTable: Instance Association in Multiple Camera Views

We present an interesting and challenging dataset that features a large ...
research
08/16/2021

Real-time Human-Centric Segmentation for Complex Video Scenes

Most existing video tasks related to "human" focus on the segmentation o...
research
11/17/2020

SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation

Amodal recognition is the ability of the system to detect occluded objec...
research
05/29/2022

Perceiving the Invisible: Proposal-Free Amodal Panoptic Segmentation

Amodal panoptic segmentation aims to connect the perception of the world...
