AIMusicGuru: Music Assisted Human Pose Correction

03/24/2022
by Snehesh Shrestha, et al.

Pose estimation techniques rely on visual cues available through observations represented as pixels. However, their performance is bounded by the frame rate of the video and suffers from motion blur, occlusions, and poor temporal coherence. These problems are magnified when people interact with objects and instruments, for example when playing the violin. Standard postprocessing approaches use interpolation and smoothing functions to filter noise and fill gaps, but they cannot model highly non-linear motion. We present a method that leverages the strong causal relationship between the sound produced and the motion that produces it. We use the audio signature to refine and predict accurate human body pose motion models. We propose MAPnet (Music Assisted Pose network) for generating a fine-grained motion model from sparse input pose sequences and continuous audio. To accelerate further research in this domain, we also open-source MAPdat, a new multi-modal dataset of 3D violin playing motion with music. We compare several standard machine learning models and analyze input modalities, sampling techniques, and audio and motion features. Experiments on MAPdat suggest that multi-modal approaches like ours are a promising direction for tasks previously approached with visual methods only. Our results show, both qualitatively and quantitatively, how audio can be combined with visual observation to help improve any pose estimation method.
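The interpolation-and-smoothing postprocessing that the abstract describes as the standard baseline can be sketched roughly as below. This is a minimal illustration only, not the authors' code and not MAPnet. It assumes a single joint coordinate stored as a list with `None` for missing frames, and the function names `fill_gaps_linear` and `moving_average` are hypothetical.

```python
from typing import List, Optional


def fill_gaps_linear(values: List[Optional[float]]) -> List[float]:
    """Fill missing frames (None) by linear interpolation between observed frames.

    Frames before the first or after the last observation are held constant.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("need at least one observed frame")
    out = list(values)
    # Hold the nearest observation at both ends of the sequence.
    for i in range(known[0]):
        out[i] = values[known[0]]
    for i in range(known[-1] + 1, len(values)):
        out[i] = values[known[-1]]
    # Interpolate linearly inside each gap between consecutive observations.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            out[i] = (1.0 - w) * values[a] + w * values[b]
    return out


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Smooth with a centered moving average; the window shrinks at the edges."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


# Hypothetical sparse track of one coordinate; None marks dropped frames.
sparse_track = [0.0, None, None, None, 4.0, None, 2.0]
filled = fill_gaps_linear(sparse_track)
smoothed = moving_average(filled, window=3)
```

Any motion that happened inside a gap, such as a fast change of bow direction, is replaced by a straight line between the surrounding observations. This is the limitation on non-linear motion that the abstract points to.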

research
03/12/2021

Deep Dual Consecutive Network for Human Pose Estimation

Multi-frame human pose estimation in complicated situations is challengi...
research
10/26/2021

TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation

The recent success of transformer models in language, such as BERT, has ...
research
07/04/2019

Sim2real transfer learning for 3D pose estimation: motion to the rescue

Simulation is an anonymous, low-bias source of data where annotation can...
research
10/13/2020

A review of 3D human pose estimation algorithms for markerless motion capture

Human pose estimation (HPE) in 3D is an active research field that have ...
research
10/31/2020

Temporal Smoothing for 3D Human Pose Estimation and Localization for Occluded People

In multi-person pose estimation actors can be heavily occluded, even bec...
research
07/31/2020

Looking At The Body: Automatic Analysis of Body Gestures and Self-Adaptors in Psychological Distress

Psychological distress is a significant and growing issue in society. Au...
research
12/19/2017

Audio to Body Dynamics

We present a method that gets as input an audio of violin or piano playi...
