MetaDance: Few-shot Dancing Video Retargeting via Temporal-aware Meta-learning
Dancing video retargeting aims to synthesize a video that transfers the dance movements from a source video to a target person. Previous work need collect a several-minute-long video of a target person with thousands of frames to train a personalized model. However, the trained model can only generate videos of the same person. To address the limitations, recent work tackled few-shot dancing video retargeting, which learns to synthesize videos of unseen persons by leveraging a few frames of them. In practice, given a few frames of a person, these work simply regarded them as a batch of individual images without temporal correlations, thus generating temporally incoherent dancing videos of low visual quality. In this work, we model a few frames of a person as a series of dancing moves, where each move contains two consecutive frames, to extract the appearance patterns and the temporal dynamics of this person. We propose MetaDance, which utilizes temporal-aware meta-learning to optimize the initialization of a model through the synthesis of dancing moves, such that the meta-trained model can be efficiently tuned towards enhanced visual quality and strengthened temporal stability for unseen persons with a few frames. Extensive evaluations show large superiority of our method.
READ FULL TEXT