Adaptive Human Matting for Dynamic Videos

04/12/2023
by Chung-Ching Lin, et al.

Recent efforts in video matting have focused on eliminating trimap dependency, since trimap annotations are expensive and trimap-based methods are less adaptable to real-time applications. Although the latest trimap-free methods show promising results, their performance often degrades on highly diverse and unstructured videos. We address this limitation by introducing Adaptive Matting for Dynamic Videos, termed AdaM, a framework designed to simultaneously differentiate foregrounds from backgrounds and capture alpha matte details of human subjects in the foreground. Two interconnected network designs are employed to achieve this goal: (1) an encoder-decoder network that produces alpha mattes and intermediate masks, which are used to guide the transformer in adaptively decoding foregrounds and backgrounds, and (2) a transformer network in which long- and short-term attention combine to retain spatial and temporal contexts, facilitating the decoding of foreground details. We benchmark and study our method on recently introduced datasets, showing that our model notably improves matting realism and temporal coherence in complex real-world videos and achieves new best-in-class generalizability. Further details and examples are available at https://github.com/microsoft/AdaM.
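To make the idea of combining long- and short-term attention more concrete, here is a minimal toy sketch in plain Python. It is not the AdaM implementation. The function names, the representation of each memory as a (keys, values) pair, and the choice to fuse the two attention outputs by simple addition are all assumptions made for illustration; the abstract states only that the two kinds of attention are combined.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

def long_short_term_attention(query, long_memory, short_memory):
    """Attend to a long-term and a short-term memory, then fuse the results.

    Each memory is a hypothetical (keys, values) pair. Fusing by addition
    is an assumption of this sketch, not a detail taken from the paper.
    """
    long_out = attend(query, *long_memory)
    short_out = attend(query, *short_memory)
    return [l + s for l, s in zip(long_out, short_out)]

# Toy usage: one query feature from the current frame.
query = [1.0, 0.0]
# Long-term memory: features kept from earlier in the video (assumed).
long_memory = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
# Short-term memory: features from the previous frame (assumed).
short_memory = ([[1.0, 0.0]], [[0.5, 0.5]])
fused = long_short_term_attention(query, long_memory, short_memory)
```

In this toy setup the long-term branch weights the stored entry that best matches the query, while the short-term branch contributes the most recent context. A real model would operate on learned feature maps rather than hand-written vectors.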


