DeepAI AI Chat
Log In Sign Up

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

by   Inkyu Shin, et al.

Video Panoptic Segmentation (VPS) aims to achieve comprehensive pixel-level scene understanding by segmenting all pixels and associating objects in a video. Current solutions can be categorized into online and near-online approaches. Evolving over the time, each category has its own specialized designs, making it nontrivial to adapt models between different categories. To alleviate the discrepancy, in this work, we propose a unified approach for online and near-online VPS. The meta architecture of the proposed Video-kMaX consists of two components: within clip segmenter (for clip-level segmentation) and cross-clip associater (for association beyond clips). We propose clip-kMaX (clip k-means mask transformer) and HiLA-MB (Hierarchical Location-Aware Memory Buffer) to instantiate the segmenter and associater, respectively. Our general formulation includes the online scenario as a special case by adopting clip length of one. Without bells and whistles, Video-kMaX sets a new state-of-the-art on KITTI-STEP and VIPSeg for video panoptic segmentation, and VSPW for video semantic segmentation. Code will be made publicly available.


page 3

page 5

page 10

page 11


TubeFormer-DeepLab: Video Mask Transformer

We present TubeFormer-DeepLab, the first attempt to tackle multiple core...

Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation

The goal of video segmentation is to accurately segment and track every ...

NOVIS: A Case for End-to-End Near-Online Video Instance Segmentation

Until recently, the Video Instance Segmentation (VIS) community operated...

Attention-guided Unified Network for Panoptic Segmentation

This paper studies panoptic segmentation, a recently proposed task which...

Online Adaptation of Convolutional Neural Networks for Video Object Segmentation

We tackle the task of semi-supervised video object segmentation, i.e. se...

STEP: Segmenting and Tracking Every Pixel

In this paper, we tackle video panoptic segmentation, a task that requir...

Probabilistic Attention for Interactive Segmentation

We provide a probabilistic interpretation of attention and show that the...