Intrinsic Temporal Regularization for High-resolution Human Video Synthesis

by   Lingbo Yang, et al.

Temporal consistency is crucial for extending image processing pipelines to the video domain, which is often enforced with flow-based warping error over adjacent frames. Yet for human video synthesis, such scheme is less reliable due to the misalignment between source and target video as well as the difficulty in accurate flow estimation. In this paper, we propose an effective intrinsic temporal regularization scheme to mitigate these issues, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation via temporal loss modulation. This creates a shortcut for back-propagating temporal loss gradients directly to the front-end motion estimator, thus improving training stability and temporal coherence in output videos. We apply our intrinsic temporal regulation to single-image generator, leading to a powerful "INTERnet" capable of generating 512×512 resolution human action videos with temporal-coherent, realistic visual details. Extensive experiments demonstrate the superiority of proposed INTERnet over several competitive baselines.


page 1

page 2

page 3

page 5

page 6

page 7

page 8


Domain Adaptive Video Segmentation via Temporal Consistency Regularization

Video semantic segmentation is an essential task for the analysis and un...

Video-to-Video Synthesis

We study the problem of video-to-video synthesis, whose goal is to learn...

FISR: Deep Joint Frame Interpolation and Super-Resolution with A Multi-scale Temporal Loss

Super-resolution (SR) has been widely used to convert low-resolution leg...

Learning Blind Video Temporal Consistency

Applying image processing algorithms independently to each frame of a vi...

C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer

Human video motion transfer (HVMT) aims to synthesize videos that one pe...

Component-Based Distributed Framework for Coherent and Real-Time Video Dehazing

Traditional dehazing techniques, as a well studied topic in image proces...

Decomposition, Compression, and Synthesis (DCS)-based Video Coding: A Neural Exploration via Resolution-Adaptive Learning

Inspired by the facts that retinal cells actually segregate the visual s...