Exemplar-based Video Colorization with Long-term Spatiotemporal Dependency

03/27/2023
by Siqi Chen, et al.

Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well in still scenes or scenes with regular movement, they often lack robustness in moving scenes because of their limited ability to model long-term dependency both spatially and temporally, which leads to color fading, color discontinuity, and other artifacts. To solve this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double head non-local operation. The proposed CNN-Transformer block better combines long-term spatial dependency with local texture and structural features, and the double head non-local operation makes further use of the augmented features. To enhance long-term temporal dependency, we further introduce a novel linkage subnet, which propagates motion information across adjacent frame blocks and helps maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively. Our model also generates more colorful, realistic, and stable results, especially for scenes where objects change greatly and irregularly.
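The abstract does not say how the double head non-local operation is built. As a rough illustration only, assuming it follows the standard non-local block (scaled dot-product attention in which every spatial position attends to every other position) with two parallel heads whose outputs are concatenated, projected back to the input width, and added residually, one forward pass could look like the pure-Python sketch below. The function names (`non_local_head`, `double_head_non_local`), the weight shapes, and the concatenate-then-project fusion are hypothetical choices, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (a: n x m, b: m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of similarity scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_head(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """One non-local head: each of the N positions in x (N x C) attends to all N positions."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    k_t = [list(col) for col in zip(*k)]  # transpose keys to d x N
    scale = 1.0 / math.sqrt(len(w_q[0]))  # scaled dot-product similarity
    attn = [softmax([s * scale for s in row]) for row in matmul(q, k_t)]  # N x N, rows sum to 1
    return matmul(attn, v)  # N x d_v


def double_head_non_local(
    x: Matrix,
    heads: Sequence[Tuple[Matrix, Matrix, Matrix]],
    w_out: Matrix,
) -> Matrix:
    """Hypothetical double head variant: concatenate head outputs, project back to C, add residual."""
    outs = [non_local_head(x, w_q, w_k, w_v) for w_q, w_k, w_v in heads]
    concat = [[val for out in outs for val in out[i]] for i in range(len(x))]  # N x (heads * d_v)
    proj = matmul(concat, w_out)  # N x C
    return [[a + b for a, b in zip(x_row, p_row)] for x_row, p_row in zip(x, proj)]


# Usage: 3 spatial positions with 2 channels each; identity projections for both heads,
# and an output projection that averages the two head outputs before the residual add.
identity = [[1.0, 0.0], [0.0, 1.0]]
features = [[10.0, 0.0], [0.0, 10.0], [1.0, 2.0]]
w_out = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]
out = double_head_non_local(features, [(identity, identity, identity)] * 2, w_out)
```

Because each attention row is a softmax over all positions, a head can pull information from anywhere in the frame rather than only from a local convolutional window, which is the kind of long-term spatial dependency the abstract refers to. In a real model the projections would be learned and the inputs would be CNN/Transformer feature maps flattened to N positions.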

research
08/22/2023

How Much Temporal Long-Term Context is Needed for Action Segmentation?

Modeling long-term context in videos is crucial for many fine-grained ta...
research
07/07/2023

Predicting Outcomes in Long COVID Patients with Spatiotemporal Attention

Long COVID is a general term of post-acute sequelae of COVID-19. Patient...
research
03/26/2023

Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation

Video-based 3D human pose and shape estimations are evaluated by intra-f...
research
08/07/2023

DSformer: A Double Sampling Transformer for Multivariate Time Series Long-term Prediction

Multivariate time series long-term prediction, which aims to predict the...
research
03/16/2023

TemporalMaxer: Maximize Temporal Context with only Max Pooling for Temporal Action Localization

Temporal Action Localization (TAL) is a challenging task in video unders...
research
06/30/2022

Masked Part-Of-Speech Model: Does Modeling Long Context Help Unsupervised POS-tagging?

Previous Part-Of-Speech (POS) induction models usually assume certain in...
research
12/02/2021

Self-supervised Video Transformer

In this paper, we propose self-supervised training for video transformer...
