PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation

by Xiangtai Li, et al.

Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works use separate approaches to handle thing, stuff, and part predictions, without shared computation or task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework, named Panoptic-PartFormer. Moreover, we find that the previous metric, PartPQ, is biased toward PQ. To handle both issues, we make the following contributions. Firstly, we design a meta-architecture that decouples part features from thing/stuff features. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. Secondly, we propose a new metric, Part-Whole Quality (PWQ), to better measure this task from both pixel-region and part-whole perspectives. It can also decouple the errors for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former and building on our meta-architecture, we propose Panoptic-PartFormer++, which uses a new part-whole cross-attention scheme, based on masked cross attention, to further boost part segmentation quality. Finally, extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with the previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2 improvements on the Cityscapes PPS dataset and 5 PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70 on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
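
The abstract names masked cross attention as the mechanism behind the part-whole interaction but does not spell out the design. As a rough illustration only, the NumPy sketch below shows generic masked cross attention in the style of Mask2Former: each query attends only to the feature locations allowed by its mask. The function name, the tensor shapes, the omission of learned projections, and the idea of restricting part queries to a region mask are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def masked_cross_attention(queries, features, attn_mask):
    """Hypothetical sketch: single-head masked cross attention.

    queries:   (Q, D) query embeddings (e.g. part queries; an assumption).
    features:  (N, D) flattened pixel features.
    attn_mask: (Q, N) boolean, True where a query is allowed to attend.
    Learned query/key/value projections are omitted for brevity.
    """
    dim = queries.shape[-1]
    logits = queries @ features.T / np.sqrt(dim)  # (Q, N) similarity scores

    # A query with an empty mask falls back to attending everywhere,
    # which avoids a softmax over no valid positions.
    empty = ~attn_mask.any(axis=1, keepdims=True)
    allowed = attn_mask | empty

    # Disallowed positions get -inf so they receive zero weight.
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Residual update of the queries with the attended features.
    return queries + weights @ features, weights

# Toy usage: two queries, six pixel features of dimension eight.
rng = np.random.default_rng(0)
part_queries = rng.normal(size=(2, 8))
pixel_features = rng.normal(size=(6, 8))
mask = np.array([[1, 1, 1, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0]], dtype=bool)
updated, weights = masked_cross_attention(part_queries, pixel_features, mask)
```

In this toy run the first query places all of its attention on the first three locations, while the second query, whose mask is empty, attends over all six. In a real decoder the mask would typically come from the mask predicted by the previous layer.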




