Visual Rhythm Prediction with Feature-Aligning Network

01/29/2019
by Yutong Xie, et al.

In this paper, we propose a data-driven visual rhythm prediction method that overcomes a deficiency of previous work, in which predictions are made primarily by hand-crafted hard rules. In our approach, we first extract features including the original frames and their residuals, optical flow, scene changes, and body pose. These visual features are then fed as inputs into an end-to-end neural network. We observe slight misalignments among the features along the timeline and assume that they arise from differences in how the individual features are computed. To solve this problem, the extracted features are aligned by a carefully designed layer, which can also be applied to other models that suffer from mismatched features and boost their performance. The aligned features are then fed into sequence labeling layers implemented with a BiLSTM and a CRF to predict the onsets. Because no public training and evaluation set exists, we experiment on a dataset we constructed from professionally edited music videos (MVs), on which our approach reaches an F1 score of 79.6.
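The abstract reports an F1 score for onset prediction but does not say how predicted onsets are matched to ground truth. The sketch below shows one common way such a score is computed: each predicted onset counts as a hit if it falls within a tolerance window of a not-yet-matched reference onset, using a greedy one-to-one sweep over time-sorted lists. The function name `onset_f1` and the 50 ms default tolerance are assumptions for illustration (50 ms is a frequent choice in onset evaluation), not the authors' evaluation protocol.

```python
from typing import List, Tuple


def onset_f1(predicted: List[float], reference: List[float],
             tolerance: float = 0.05) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) for onset times given in seconds.

    Hypothetical helper: the tolerance window and greedy matching rule are
    illustrative assumptions, not taken from the paper.
    """
    pred = sorted(predicted)
    ref = sorted(reference)
    i = j = hits = 0
    # Two-pointer sweep: each reference onset is matched at most once.
    while i < len(pred) and j < len(ref):
        diff = pred[i] - ref[j]
        if abs(diff) <= tolerance:
            hits += 1  # prediction lies inside the tolerance window
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # prediction is too early for every remaining reference
        else:
            j += 1  # reference is too early for every remaining prediction
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1
```

For example, `onset_f1([0.10, 0.52, 1.00, 1.70], [0.12, 0.50, 1.30, 1.71])` matches three of the four predictions and returns `(0.75, 0.75, 0.75)`. A wider tolerance makes the score more forgiving of small timing offsets, which is exactly the kind of slight temporal misalignment the abstract attributes to differently computed features.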


