Unsupervised Video Segmentation via Spatio-Temporally Nonlocal Appearance Learning

by Kaihua Zhang, et al.

Video object segmentation is challenging due to factors such as fast motion, cluttered backgrounds, arbitrary object appearance variation and shape deformation. Most existing methods only explore appearance information between two consecutive frames, and therefore do not make full use of the long-term nonlocal information that helps stabilize the learned appearance. As a result, they tend to fail when the targets undergo large viewpoint changes and significant non-rigid deformations. In this paper, we propose a simple yet effective approach to mine long-term spatio-temporally nonlocal appearance information for unsupervised video segmentation. The motivation of our algorithm comes from the spatio-temporal nonlocality of region appearance reoccurrence in a video. Specifically, we first generate a set of superpixels to represent the foreground and background, and then update the appearance of each superpixel with its long-term spatio-temporally nonlocal counterparts, which are found by approximate nearest neighbor search using the efficient KD-tree algorithm. Then, with the updated appearances, we formulate a spatio-temporal graphical model comprised of superpixel label consistency potentials. Finally, we generate the segmentation by optimizing the graphical model, iteratively updating the appearance model and estimating the labels. Extensive evaluations on the SegTrack and YouTube-Objects datasets demonstrate the effectiveness of the proposed method, which performs favorably against some state-of-the-art methods.
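The nonlocal appearance update step can be illustrated with a small sketch. This is not the authors' implementation and rests on several assumptions: each superpixel is reduced to a feature tuple (for example its mean color), an exact KD-tree k-nearest-neighbor search stands in for the approximate search used in the paper, neighbors are restricted to other frames to make them temporally nonlocal, and the linear blending weight `alpha` is a hypothetical update rule. The helper names `build`, `knn` and `nonlocal_update` are likewise hypothetical.

```python
import heapq
from dataclasses import dataclass
from typing import Container, Dict, List, Optional, Set, Tuple

Vector = Tuple[float, ...]


@dataclass
class Node:
    point: Vector
    index: int          # position of this superpixel in the input list
    axis: int           # coordinate used to split at this node
    left: Optional["Node"]
    right: Optional["Node"]


def build(items: List[Tuple[Vector, int]], depth: int = 0) -> Optional[Node]:
    """Recursively build a KD-tree by splitting on the median along one axis."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, index = items[mid]
    return Node(point, index, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Vector, k: int,
        exclude: Container[int] = frozenset()) -> List[int]:
    """Exact k-nearest-neighbor search (assumption: the paper uses an approximate variant)."""
    if k <= 0:
        return []
    heap: List[Tuple[float, int]] = []  # max-heap of (-squared distance, index)

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        if node.index not in exclude:
            d = sq_dist(node.point, query)
            if len(heap) < k:
                heapq.heappush(heap, (-d, node.index))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, node.index))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Search the far side only if the splitting plane is closer than the current k-th best.
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [index for _, index in sorted(heap, reverse=True)]  # nearest first


def nonlocal_update(features: List[Vector], frame_ids: List[int],
                    k: int = 2, alpha: float = 0.5) -> List[Vector]:
    """Blend each feature with the mean of its k nearest neighbors from other frames.
    Assumption: this linear blend is illustrative, not the paper's update rule."""
    tree = build([(f, i) for i, f in enumerate(features)])
    by_frame: Dict[int, Set[int]] = {}
    for i, fid in enumerate(frame_ids):
        by_frame.setdefault(fid, set()).add(i)
    updated: List[Vector] = []
    for i, f in enumerate(features):
        # Excluding the superpixel's own frame forces temporally nonlocal matches.
        nbrs = knn(tree, f, k, exclude=by_frame[frame_ids[i]])
        if not nbrs:
            updated.append(f)
            continue
        mean = tuple(sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(len(f)))
        updated.append(tuple((1 - alpha) * a + alpha * m for a, m in zip(f, mean)))
    return updated


# Usage: five 2-D superpixel features spread over three frames.
features = [(0.0, 0.0), (10.0, 10.0), (0.2, 0.0), (10.0, 10.4), (0.0, 0.4)]
frame_ids = [0, 0, 1, 1, 2]
print(nonlocal_update(features, frame_ids)[0])  # → (0.05, 0.1)
```

Restricting matches to other frames is only one way to make the counterparts temporally nonlocal; the paper's actual neighbor selection, feature descriptor and update rule may differ.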




