Semantics Meets Temporal Correspondence: Self-supervised Object-centric Learning in Videos

08/19/2023
by   Rui Qian, et al.

Self-supervised methods have shown remarkable progress in learning high-level semantics and low-level temporal correspondence. Building on these results, we take one step further and explore integrating these two kinds of features to enhance object-centric representations. Our preliminary experiments indicate that query slot attention can extract different semantic components from the RGB feature map, while random-sampling-based slot attention can exploit temporal correspondence cues between frames to assist instance identification. Motivated by this, we propose a novel semantic-aware masked slot attention on top of the fused semantic features and correspondence maps. It comprises two slot attention stages that share a set of learnable Gaussian distributions. In the first stage, we use the mean vectors as slot initialization to decompose potential semantics and generate semantic segmentation masks through iterative attention. In the second stage, for each semantic component, we randomly sample slots from the corresponding Gaussian distribution and perform masked feature aggregation within the semantic area to exploit temporal correspondence patterns for instance identification. We adopt semantic- and instance-level temporal consistency as self-supervision to encourage temporally coherent object-centric representations. Our model effectively identifies multiple object instances with semantic structure, reaching promising results on unsupervised video object discovery. Furthermore, we achieve state-of-the-art performance on dense label propagation tasks, demonstrating the potential for object-centric analysis. The code is released at https://github.com/shvdiwnkozbw/SMTC.
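
The two-stage procedure described above can be illustrated with a minimal NumPy sketch. This is not the released implementation: it assumes a heavily simplified slot attention (no learned projections, GRU, or MLP updates), and the names `slot_attention`, `two_stage_masked_slots`, `feats`, `mu`, `sigma`, and `n_inst` are hypothetical. Here `feats` stands in for the fused semantic features and correspondence maps, and `mu`/`sigma` for the shared Gaussian parameters, which would be learned in practice.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(feats, slots, mask=None, iters=3):
    """Simplified slot attention (assumption: no learned layers).

    feats: (N, D) features, slots: (K, D) initial slots,
    mask: optional (N,) boolean array restricting aggregation.
    Returns the updated slots and the (K, N) attention map.
    """
    d = feats.shape[1]
    for _ in range(iters):
        logits = slots @ feats.T / np.sqrt(d)      # (K, N)
        attn = softmax(logits, axis=0)             # slots compete per location
        if mask is not None:
            attn = attn * mask[None, :]            # zero out locations outside the mask
        weights = attn / (attn.sum(axis=1, keepdims=True) + 1e-8)
        slots = weights @ feats                    # weighted mean of the features
    return slots, attn


def two_stage_masked_slots(feats, mu, sigma, n_inst=2, rng=None):
    """Stage 1: Gaussian means as slot init, giving semantic masks.
    Stage 2: per semantic component, slots sampled from that Gaussian
    aggregate features only inside the component's mask."""
    rng = np.random.default_rng(0) if rng is None else rng
    _, sem_attn = slot_attention(feats, mu)
    sem_label = sem_attn.argmax(axis=0)            # (N,) hard semantic assignment
    inst_attn = {}
    for k in range(mu.shape[0]):
        region = sem_label == k
        if not region.any():
            continue                               # this component claimed no location
        noise = rng.standard_normal((n_inst, mu.shape[1]))
        init = mu[k] + sigma[k] * noise            # sample slots from N(mu_k, sigma_k^2)
        _, attn = slot_attention(feats, init, mask=region)
        inst_attn[k] = attn                        # (n_inst, N) instance attention
    return sem_label, inst_attn
```

Calling `two_stage_masked_slots(feats, mu, sigma)` with `feats` of shape `(N, D)` and `mu`, `sigma` of shape `(K, D)` returns one semantic label per location and, for each non-empty semantic component, an instance-level attention map that is zero outside that component. The temporal-consistency losses mentioned in the abstract are not covered by this sketch.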

research
07/18/2023

Unsupervised Conditional Slot Attention for Object Centric Learning

Extracting object-level representations for downstream reasoning tasks i...
research
10/17/2022

Unsupervised Object-Centric Learning with Bi-Level Optimized Query Slot Attention

The ability to decompose complex natural scenes into meaningful object-c...
research
08/25/2022

Refine and Represent: Region-to-Object Representation Learning

Recent works in self-supervised learning have demonstrated strong perfor...
research
06/07/2023

Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities

Unsupervised video-based object-centric learning is a promising avenue t...
research
02/16/2023

Object-centric Learning with Cyclic Walks between Parts and Whole

Learning object-centric representations from complex natural environment...
research
12/16/2021

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation

Video Panoptic Segmentation (VPS) aims at assigning a class label to eac...
research
05/15/2021

STAGE: Tool for Automated Extraction of Semantic Time Cues to Enrich Neural Temporal Ordering Models

Despite achieving state-of-the-art accuracy on temporal ordering of even...
