Temporal-aware Hierarchical Mask Classification for Video Semantic Segmentation

09/14/2023
by Zhaochong An, et al.

Modern approaches have demonstrated the potential of treating semantic segmentation as a mask classification task, a formulation widely used in instance-level segmentation. This paradigm trains models by assigning a subset of object queries to ground truths via conventional one-to-one matching. However, we observe that the popular video semantic segmentation (VSS) dataset has limited categories per video, meaning that fewer than 10 queries receive meaningful gradient updates during VSS training. This inefficiency prevents the queries from reaching their full expressive potential. We therefore present THE-Mask, a novel solution for VSS that introduces temporal-aware hierarchical object queries for the first time. Specifically, we propose a simple two-round matching mechanism that involves more queries, matched with minimal cost, during training while adding no extra cost during inference. To support this more-to-one assignment, we further design a hierarchical loss that trains each query according to the hierarchy level, primary or secondary, given by the matching results. Moreover, to effectively capture temporal information across frames, we propose a temporal aggregation decoder that fits seamlessly into the mask-classification paradigm for VSS. Using temporal-sensitive multi-level queries, our method achieves state-of-the-art performance on the latest challenging VSS benchmark, VSPW, without bells and whistles.
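The two-round matching idea can be illustrated with a minimal sketch. The abstract does not spell out the procedure, so everything below is an assumption: round one performs a one-to-one, minimum-cost assignment of queries to ground truths (the primary matches), and round two repeats the assignment over the queries left unmatched (the secondary matches). The names `one_to_one_match` and `two_round_match` and the toy cost matrix are hypothetical, and the brute-force search stands in for a proper assignment solver.

```python
from itertools import permutations


def one_to_one_match(cost, query_ids):
    """Minimum-cost one-to-one assignment of every ground truth to a query.

    cost[q][g] is the (hypothetical) matching cost of query q to ground truth g.
    Brute force over permutations: only suitable for tiny illustrative inputs.
    Returns a dict mapping ground-truth index -> query index.
    """
    num_gt = len(cost[0])
    if len(query_ids) < num_gt:
        raise ValueError("need at least as many candidate queries as ground truths")
    best, best_total = None, float("inf")
    for perm in permutations(query_ids, num_gt):
        # perm[g] is the query assigned to ground truth g.
        total = sum(cost[q][g] for g, q in enumerate(perm))
        if total < best_total:
            best, best_total = perm, total
    return {g: q for g, q in enumerate(best)}


def two_round_match(cost):
    """Assumed two-round scheme: primary matches first, then secondary
    matches chosen among the queries the first round left unmatched."""
    all_queries = list(range(len(cost)))
    primary = one_to_one_match(cost, all_queries)
    used = set(primary.values())
    remaining = [q for q in all_queries if q not in used]
    secondary = one_to_one_match(cost, remaining)
    return primary, secondary


# Toy example: 5 queries (rows) and 2 ground-truth segments (columns).
cost = [
    [0.9, 0.8],
    [0.1, 0.7],
    [0.2, 0.6],
    [0.8, 0.1],
    [0.7, 0.3],
]
primary, secondary = two_round_match(cost)
print(primary)    # {0: 1, 1: 3}
print(secondary)  # {0: 2, 1: 4}
```

In this toy case each ground truth ends up supervising two queries, one primary and one secondary, which is the more-to-one assignment the abstract refers to. A hierarchical loss would then treat the two levels differently during training; how it does so is not stated in the abstract and is not sketched here.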


research
07/13/2021

Per-Pixel Classification is Not All You Need for Semantic Segmentation

Modern approaches typically formulate semantic segmentation as a per-pix...
research
06/11/2023

3rd Place Solution for PVUW Challenge 2023: Video Panoptic Segmentation

In order to deal with the task of video panoptic segmentation in the wil...
research
01/11/2022

Pyramid Fusion Transformer for Semantic Segmentation

The recently proposed MaskFormer gives a refreshed perspective on...
research
07/07/2021

WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations

Compared with tedious per-pixel mask annotating, it is much easier to an...
research
07/20/2018

Future Semantic Segmentation with Convolutional LSTM

We consider the problem of predicting semantic segmentation of future fr...
research
11/25/2022

Pixels Together Strong: Segmenting Unknown Regions Rejected by All

Semantic segmentation methods typically perform per-pixel classification...
research
03/29/2018

MaskRNN: Instance Level Video Object Segmentation

Instance level video object segmentation is an important technique for v...
