Motion Segmentation using Frequency Domain Transformer Networks

04/18/2020
by Hafez Farazi, et al.

Self-supervised prediction is a powerful mechanism for learning representations that capture the underlying structure of the data. Despite recent progress, self-supervised video prediction remains challenging. One of the critical factors that make the task hard is motion segmentation, i.e., segmenting the individual objects and the background and estimating their motions separately. In video prediction, the shape, appearance, and transformation of each object must be inferred solely from the objective of predicting the next frame in pixel space. To address this task, we propose a novel end-to-end learnable architecture that predicts the next frame by modeling the foreground and background separately while simultaneously estimating and predicting the foreground motion using Frequency Domain Transformer Networks. Experimental evaluations show that this yields interpretable representations and that our approach can outperform some widely used video prediction methods, such as Video Ladder Network and Predictive Gated Pyramids, on synthetic data.
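The core idea of estimating and predicting motion in the frequency domain, then recombining a moving foreground with a background, can be illustrated with a minimal NumPy sketch. This is not the paper's end-to-end learned architecture. It assumes a single foreground layer undergoing a global integer translation, a foreground mask and static background that are already known (the paper learns these), and constant velocity between frames. Motion is estimated with classic phase correlation (the Fourier shift theorem), and all function names (`estimate_shift`, `apply_shift`, `predict_next`) are hypothetical.

```python
import numpy as np


def estimate_shift(prev: np.ndarray, curr: np.ndarray) -> tuple[int, int]:
    """Estimate the integer (dy, dx) translation mapping prev onto curr by phase correlation."""
    cross = np.fft.fft2(curr) * np.conj(np.fft.fft2(prev))
    cross /= np.abs(cross) + 1e-12  # keep only the phase difference
    corr = np.real(np.fft.ifft2(cross))  # ideally a delta at the displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = prev.shape
    # Peaks past the midpoint correspond to negative (wrapped-around) shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


def apply_shift(frame: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate frame by (dy, dx) with wrap-around by multiplying its spectrum with a phase ramp."""
    h, w = frame.shape
    ky = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles per sample
    kx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles per sample
    phase = np.exp(-2j * np.pi * (ky * dy + kx * dx))
    return np.real(np.fft.ifft2(np.fft.fft2(frame) * phase))


def predict_next(fg_prev: np.ndarray, fg_curr: np.ndarray,
                 mask_curr: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Predict the next frame, assuming the foreground keeps moving with the same shift."""
    dy, dx = estimate_shift(fg_prev, fg_curr)  # motion between the last two frames
    fg_next = apply_shift(fg_curr, dy, dx)  # extrapolate foreground appearance
    mask_next = np.clip(apply_shift(mask_curr, dy, dx), 0.0, 1.0)  # extrapolate its mask
    # Alpha-composite the moved foreground over the static background.
    return mask_next * fg_next + (1.0 - mask_next) * background
```

Normalizing the cross-power spectrum discards magnitude and keeps only phase, so the correlation peak stays sharp regardless of the object's appearance; this separation of "what" (magnitude) from "where" (phase) is the intuition behind doing motion estimation in the frequency domain.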

Related research
03/01/2019

Frequency Domain Transformer Networks for Video Prediction

The task of video prediction is forecasting the next frames given some p...

04/15/2021

Self-supervised Video Object Segmentation by Motion Grouping

Animals have evolved highly functional visual systems to understand moti...

06/27/2018

CeMNet: Self-supervised learning for accurate continuous ego-motion estimation

In this paper, we propose a novel self-supervised learning model for est...

04/25/2019

On guiding video object segmentation

This paper presents a novel approach for segmenting moving objects in un...

09/21/2023

MoDA: Leveraging Motion Priors from Videos for Advancing Unsupervised Domain Adaptation in Semantic Segmentation

Unsupervised domain adaptation (UDA) is an effective approach to handle ...

08/30/2022

Stabilize, Decompose, and Denoise: Self-Supervised Fluoroscopy Denoising

Fluoroscopy is an imaging technique that uses X-ray to obtain a real-tim...

09/12/2020

Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning

Self-supervised learning has shown great potentials in improving the vid...